Friday, January 23, 2009

SQL Tuning & Troubleshooting

Recently, an interesting performance issue cropped up at one of my clients' site. I thought I will share the issue, troubleshooting, and the takeaways with you.

It was stated that a job that used to run exceptionally well was taking a long time. Especially, considering the fact that it is a simple "DELETE" statement. The interesting part was that the "Commit" issued after the "DELETE" took a long time.

Below is a screenshot.

08:35:30 SQL> DELETE FROM apc.prep
08:35:38 2 WHERE event_id = '122608T09' AND lead_or_assoc = 'A';
0 rows deleted.
Elapsed: 00:00:00.03
08:35:40 SQL> commit;
Commit complete.
Elapsed: 00:00:32.89

That is 32 seconds for a commit of essentially "0" rows! Something was up.

Tanel Poder discussed a while ago, about the discipline of methodical common sense tuning and troubleshooting. He even had a nice presentation on this topic. This is one of the classic examples.

Instead of looking into, I/O issues, redo logs, log file sync issues etc. I decided to try a simple "delete/commit" on a sample table. It worked just fine. So I decided that something in this "candidate" object was the root cause.

By looking to see if there were any triggers on the object, I did find one. Great. Perhaps this is the problem for the slowdown. Upon looking at the code for the trigger...

CREATE OR REPLACE TRIGGER APC.prep_sheet_audit
BEFORE INSERT OR UPDATE
ON apc.prep
FOR EACH ROW
BEGIN
IF INSERTING
THEN
:NEW.added_by := SUBSTR (USER, 1, 10);
:NEW.date_added := SYSDATE;
:NEW.date_last_modified := SYSDATE;
:NEW.last_modified_by := SUBSTR (USER, 1, 10);
:NEW.status_flag := 'A';
END IF;
END;

Tt only applies to "INSERTS" and we are doing a delete. So I decided to take a 10046 trace to see what was happening.

Upon reviewing the tkprof'd trace:

I came across some interesting statements.

delete from "APC"."VW_APC_LEAD_STYLE"

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 7.07 15.39 1464 8703 515264 120497

Fetch 0 0.00 0.00 0 0 0 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 2 7.07 15.39 1464 8703 515264 120497

INSERT /*+ */ INTO "APC"."VW_APC_LEAD_STYLE"("EVENT_ID","MARKET_CODE",
"PAGE_ID","PAGE_SECTION_CODE","STYLE_NUM","LEAD_OR_ASSOC"
) SELECT
<.....sniped .....>
"PREP_SHEET"."DESCRIPTION",MAX("PREP_SHEET"."BLOCK_NUMBER") FROM
"APC"."PREP_SHEET" "PREP_SHEET" WHERE "PREP_SHEET"."LEAD_OR_ASSOC"<>'A'
GROUP BY "PREP_SHEET"."EVENT_ID","PREP_SHEET"."MARKET_CODE",
"PREP_SHEET"."PAGE_ID","PREP_SHEET"."PAGE_SECTION_CODE",
<...>

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.02 0.00 0 34 0 0

Execute 1 7.68 17.72 63327 64462 388792 120497

Fetch 0 0.00 0.00 0 0 0 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 2 7.70 17.72 63327 64496 388792 120497

The total elapsed time of these two statements is approximately 32 seconds.

As we know (pasted again below..) the Commit, we noticed, the total time was

08:35:30 SQL> DELETE FROM apc.prep
08:35:38 2 WHERE event_id = '122608T09' AND lead_or_assoc = 'A';
0 rows deleted.
Elapsed: 00:00:00.03
08:35:40 SQL> commit;
Commit complete.
Elapsed: 00:00:32.89

So, these are the culprits. But a View (based on the name starting with VW)? what is this? where did this come from?

The delete followed by an insert gave me an idea that this may not be an ordinary view but perhaps a Materialized View.

Upon checking the DBA_OBJECTS,

SQL> select owner, object_name,object_id,object_type from dba_objects where object_name='VW_APC_LEAD_STYLE';

OWNER OBJECT_NAME OBJECT_ID OBJECT_TYPE

------------------------------ ------------------------------ ---------- -------------------

APC VW_APC_LEAD_STYLE 50243 TABLE

APC VW_APC_LEAD_STYLE 50245 MATERIALIZED VIEW

With the existence of a materialized view confirmed, the last piece of the puzzle is to make sure that the materialized view was created on the base table

"PREP_SHEET" though this definitely must be the case, we need proof.

So upon checking the DDL for the materialized view,

CREATE MATERIALIZED VIEW APC.VW_APC_LEAD_STYLE
BUILD IMMEDIATE
USING INDEX
TABLESPACE APC_DATA
REFRESH COMPLETE ON COMMIT
WITH PRIMARY KEY
AS
select EVENT_ID,
<....sniped...>
max(BLOCK_NUMBER) as BLOCK_NUMBER
from apc.prep_sheet
where LEAD_OR_ASSOC != 'A'
GROUP BY
<......sniped.....>
DESCRIPTION;

Now it was proved that the reason for the commit to take all that time with no rows getting deleted was not a problem with the performance of the database but due to objects that were created from behind the scenes.

Later on, this issue started throwing ORA-603 errors like the one below.

ksedmp: internal or fatal error
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Current SQL statement for this session:
COMMIT
Checking the alert.log confirmed
Errors in file /dkha3090/ora01/app/oracle/admin/apcu/udump/apcu_ora_10641626.trc:
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Mon Jan 19 16:54:37 2009
Following on-commit snapshots not refreshed :
APC.VW_APC_LEAD_STYLE
Error 600 trapped in 2PC on transaction 4.31.61701. Cleaning up.
Error stack returned to user:
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Mon Jan 19 16:54:37 2009
Errors in file /dkha3090/ora01/app/oracle/admin/apcu/udump/apcu_ora_10641626.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4080], [1], [131], [], [], [], [], []
Mon Jan 19 16:54:41 2009
DISTRIB TRAN APCU.WORLD.b6b63999.4.31.61701
is local tran 4.31.61701 (hex=04.1f.f105)
insert pending collecting tran, scn=83987547675 (hex=13.8e0c461b)
Mon Jan 19 16:54:42 2009
DISTRIB TRAN APCU.WORLD.b6b63999.4.31.61701
is local tran 4.31.61701 (hex=04.1f.f105))
delete pending collecting tran, scn=83987547675 (hex=13.8e0c461b)

which led us to note Bug 1385495

With the materialized view changed to refresh on DEMAND. The problem was resolved.

Takeaways:

1. Simplistic Logical approach to performance troubleshooting always works.

2. Naming convention is very imporrtant. Generally, VW is used to refer to views though this is not written in stone. Perhaps, a prefix

of MVIEW would have helped instantly recognize that it was a materialized view instead of a view. Though this was identified quickly.

3. Someone in the past mentioned that 10046 trace is a treasure trove. This still is valid.

No comments: