We recently upgraded 10g Database from 10.2.0.4 to 10.2.0.5 psu patch 10. Upgradation was perfect, but unfortunately we faced an issue. The auto AWR snapshot was not generating. If we try to go for manual snapshot, it use to get hanged. By hang I mean to say stuck at command prompt. We had raised an SR with Oracle for this issue. Took a long while to identify and resolve. I thought why not share this issue with resolution.

Scenario:

AWR snapshot was not generated. With manual snap creation the execution used to get hanged.

exec DBMS_WORKLOAD_REPOSITORY.create_snapshot();

They asked for 10046 trace. To take trace for manual snap creation “execute dbms_workload_repository.create_snaphot();”

SQL> oradebug setospid 25807 -- ospid
SQL> oradebug unlimit
SQL> oradebug Event 10046 trace name context forever, level 12
-- waiting for few minutes
SQL> oradebug Event 10046 trace name context off
SQL> oradebug dump errorstack 3
-- waiting for few minutes
SQL> oradebug dump errorstack 3
SQL> exit

oradebug level 3 errorstack provides the following information:

Current statement
Call Stack Trace
Files Currently Opened
Process State for the pid
Pinned Buffer History
Cursor Dump
Fixed PGA, SGA and UGA
In memory trace dump

How to read ERRORSTACK output

Trace file shows:

application name: sqlplus.exe, hash value=254292535
waiting for 'enq: WF - contention' wait_time=0, seconds since wait started=1432
name|mode=57460006, 0=0, 0=0
blocking sess=0x000000057A706358 seq=37
Dumping Session Wait History
for 'enq: WF - contention' count=1 wait_time=2.999985 sec
name|mode=57460006, 0=0, 0=0

The session collecting snapshot is continually waiting for “enq: WF – contention”. So, the waiting was for this wait to flush the statistics to AWR. mmon trace shows, it error out while flushing the stats.

MMON trace are generated in bdump location.

MMON trace shows:

*** KEWRAFM1: Error=13518 encountered by Auto Flush Main.

13518, 00000, "Invalid database id (%s)"
// *Cause: The operation failed because the specified database ID
// does not exist in the Workload Repository.
// *Action: check the database id and retry the operation.

The mmon trace provided some more hint. The error state that database ID is invalid as the database ID doesn’t exists.

The MMON process takes snapshots at regular intervals (default 60 minutes) and inserts that data into the AWR tables. These tables containing AWR information are under SYS schema. The AWR related tables all begin with “WR”:

SQL> select distinct dbid from WRM$_DATABASE_INSTANCE ;

no row selected.

It has to show DB ID similar to the output for next statement. If it doesn’t, then in that case the DB ID needs to be populated in AWR tables i.e the missing DB ID.

SQL> select dbid from v$database;

WRM$_DATABASE_INSTANCE will be populated when instance is bounced. This is the case when we try to recreate AWR with the help of catnoawr.sql and catawrtb.sql. The DBID is not populated in the AWR tables. The Database needs to be bounced.

When a function / procedure is invoked from DBMS_WORKLOAD_REPOSITORY package, it refers to WRM$_DATABASE_INSTANCE table to get dbid information. If the AWR repositories are recreated, the table doesn’t have any data in it.

Ref: ORA-13518 : ‘Invalid Database Id’ On Call to DBMS_WORKLOAD_REPOSITORY Package [ID 984447.1]

After a reboot of the database, the dbid was recorded in WRM$_DATABASE_INSTANCE table of sys. But this didn’t resolved our problem. We were still not able to generate AWR snaps.

Then we went for systemstate 266 trace. Collected system state dump level 266 for the session for creating manual snapshot.

sqlplus /nolog
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
-- waiting for few minutes
SQL> oradebug dump systemstate 266
Statement processed.
SQL> oradebug tracefile_name
c:\oracle112\diag\rdbms\orcl\orcl\trace\orcl_ora_1136.trc
SQL> exit

mmon trace:

*** 2011-10-04 11:23:47.339
*** KEWRAFC: Flush slave failed, AWR Enqueue Timeout
*** 2011-10-04 11:24:47.339
*** KEWRAFC: Flush slave failed, AWR Enqueue Timeout
*** 2011-10-04 11:25:50.339

The trace shows the waiting is for flushing snapshots. Remedy for the problem is to disable the flushing of AWR data.

alter system set "_awr_disabled_flush_tables" = 'wrh$_tempstatxs';

Its a dynamic parameter, doesn’t require bounce of database.

Ref: How to disable flush of ASH data to AWR?

For More: AUTOMATED WORKLOAD REPOSITORY

Advertisements