Exadata Critical Issue EX20

A new Exadata Critical Issue, EX20, has been announced in MOS note 1270094.1. It applies to Exadata Storage Server versions 12.1.1.1.0 and 12.1.1.1.1.

The issue is caused by bug 19211091:

CELLSRV Internal Error ORA-600 [DiskIOSched::GetCatIndex:2]

Further details can be found in MOS note 1967985.1.

You might hit this bug if your database resource manager plan contains sub-plans and OTHER_GROUPS is present in a sub-plan instead of the top plan.
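To make the triggering condition concrete, here is a minimal Python sketch that flags top-level plans whose OTHER_GROUPS directive appears only in a sub-plan. It assumes the directives have been exported as (plan, group_or_subplan, is_subplan) tuples; that tuple layout, the function names and the sample plan names are hypothetical and not part of the MOS note. It also reads "instead of the top plan" as "the top plan has no OTHER_GROUPS directive of its own", which is an interpretation rather than a confirmed rule.

```python
from collections import defaultdict

OTHER_GROUPS = "OTHER_GROUPS"


def _has_other_groups(plan, children):
    """True if `plan` directly contains an OTHER_GROUPS consumer group."""
    return any(m == OTHER_GROUPS and not sub for m, sub in children.get(plan, []))


def _subplan_has_other_groups(plan, children, seen):
    """True if any sub-plan below `plan` contains OTHER_GROUPS."""
    for member, is_subplan in children.get(plan, []):
        if not is_subplan or member in seen:
            continue
        seen.add(member)  # guard against accidental cycles in the input
        if _has_other_groups(member, children):
            return True
        if _subplan_has_other_groups(member, children, seen):
            return True
    return False


def plans_with_misplaced_other_groups(directives):
    """Return sorted top-level plans with OTHER_GROUPS only in a sub-plan.

    `directives` is an iterable of (plan, group_or_subplan, is_subplan)
    tuples: a hypothetical export format, not an Oracle API.
    """
    children = defaultdict(list)
    subplans = set()
    for plan, member, is_subplan in directives:
        children[plan].append((member, is_subplan))
        if is_subplan:
            subplans.add(member)

    affected = []
    for plan in children:
        if plan in subplans:
            continue  # only top-level plans are reported
        if _has_other_groups(plan, children):
            continue  # OTHER_GROUPS is already in the top plan
        if _subplan_has_other_groups(plan, children, set()):
            affected.append(plan)
    return sorted(affected)
```

With a top plan DAY_PLAN that holds only a sub-plan containing OTHER_GROUPS, the function reports DAY_PLAN; a plan that carries OTHER_GROUPS at the top level is not reported.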

The CELLSRV trace file will contain one or more entries indicating CELLSRV process failure similar to the following:

ORA-00600: internal error code, arguments: [DiskIOSched::GetCatIndex:2], [4294967295], [], [], [], [], [], [], [], [], [], []

CELLSRV encountered a fatal signal 11. LWPID: 28000 userId: 80 kernelId: 80 pthreadID: 139785595115840
Ignoring fatal signal encountered during Cellsrv state dump LWPID: 28000 userId: 80 kernelId: 80 pthreadID: 139785595115840

If CELLSRV fails on multiple cells simultaneously, then the ASM disk groups may dismount or ASM instances may crash, potentially causing databases to crash.

Typically, the Restart Server (RS) process will restart CELLSRV after it fails.  However, too many CELLSRV failures will trigger “flood control” and prevent further CELLSRV restarts.  Flood control is indicated in the trace file with entries similar to the following:

[RS] monitoring process /opt/oracle/cell/cellsrv/bin/cellrsomt (pid: 26763) returned with error: 126
[RS] Monitoring process for service CELLSRV detected a flood of restarts. Disable monitoring process.
RS-7445 [CELLSRV monitor disabled] [Detected a flood of restarts] [] [] [] [] [] [] [] [] [] []
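If you want to check trace files for these signatures without reading them by eye, a small scanner along the following lines may help. This is only a sketch: the patterns are taken from the excerpts quoted above, real trace lines may differ, and the function name and result layout are made up for illustration.

```python
import re

# Signatures copied from the trace excerpts quoted in this post.
ORA600_SIGNATURE = re.compile(r"ORA-00600: .*\[DiskIOSched::GetCatIndex:2\]")
FLOOD_SIGNATURE = re.compile(
    r"RS-7445 \[CELLSRV monitor disabled\]|detected a flood of restarts"
)


def scan_trace(lines):
    """Summarise crash and flood-control signatures in an iterable of lines."""
    summary = {"cellsrv_crashes": 0, "flood_control": False}
    for line in lines:
        if ORA600_SIGNATURE.search(line):
            summary["cellsrv_crashes"] += 1
        if FLOOD_SIGNATURE.search(line):
            summary["flood_control"] = True
    return summary
```

Because the function accepts any iterable of strings, an open file handle for a trace file can be passed in directly.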

Workarounds
The recommended action is to upgrade to Exadata Storage Server software version 12.1.1.1.2 (or higher) or 12.1.2.1.1 (or higher).

Alternatively, you can apply patch 19211091.

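As a temporary workaround, you can disable the Resource Manager on the affected databases, modify the appropriate plan so that the OTHER_GROUPS directive is in the top plan (and not in any sub-plan), and then re-enable the Resource Manager. The statements below do this in order: clear the active plan, list the plans that have no OTHER_GROUPS directive, add the directive to the top plan (MY_PLAN and the 80 are example values), and set the plan again: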

ALTER SYSTEM SET resource_manager_plan='' SCOPE=both SID='*';

SELECT DISTINCT name
FROM v$rsrc_plan_history
WHERE name NOT IN (
    SELECT plan
    FROM dba_rsrc_plan_directives
    WHERE plan IN (
        SELECT DISTINCT name
        FROM v$rsrc_plan_history)
    AND group_or_subplan = 'OTHER_GROUPS');

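BEGIN
  SYS.DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();  -- changes are staged, then validated and submitted
  SYS.DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'MY_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    mgmt_p2          => 80,
    switch_estimate  => FALSE,
    comment          => NULL);
  SYS.DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  SYS.DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/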

ALTER SYSTEM SET resource_manager_plan='MY_PLAN' SCOPE=both SID='*';
