Overnight, Oracle announced a new Exadata Critical Issue (EX19) that applies to storage cells running ESS software version 18.104.22.168.1 or earlier.
The bug is 19695225 and more information can be found on MOS 1991445.1.
Cell disk metadata corruption and loss of cell disk content (i.e. grid disk, ASM disk) will occur if many CREATE GRIDDISK or ALTER GRIDDISK commands that modify cell disk space configuration are run over time for the same cell disk.
If CellCLI griddisk commands are run in parallel on all storage servers simultaneously, which is a common maintenance practice, and the issue occurs on multiple storage servers at the same time such that all redundant disk extents are lost for files in an ASM disk group, then the disk group will dismount, the database will crash, and the affected files will have to be restored from backup.
Rolling cell maintenance commands that change grid disk state, such as ALTER GRIDDISK INACTIVE and ALTER GRIDDISK ACTIVE, do not contribute to this issue.
If you have recreated or reconfigured grid disks using the CellCLI commands CREATE GRIDDISK or ALTER GRIDDISK more than 31 times since initial system deployment, then the likelihood of occurrence is high.
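To make the 31-command threshold concrete, here is a minimal Python sketch that counts contributing commands in a list of text lines. It assumes you have some record of the CellCLI commands run against a cell (one command per line); that record, its format, and the function names are hypothetical and not something the MOS note provides. It also cannot tell whether a given ALTER GRIDDISK actually changed cell disk space configuration, so it only excludes the ACTIVE/INACTIVE state changes. The script attached to the MOS note remains the authoritative check.

```python
import re

# Threshold quoted in the text: more than 31 CREATE/ALTER GRIDDISK runs
# since deployment means the likelihood of occurrence is high.
HIGH_RISK_THRESHOLD = 31

# Matches CREATE GRIDDISK or ALTER GRIDDISK anywhere in a line.
GRIDDISK_CMD = re.compile(r"\b(CREATE|ALTER)\s+GRIDDISK\b", re.IGNORECASE)

# State-only changes (ALTER GRIDDISK ... ACTIVE / INACTIVE) do not contribute.
STATE_CHANGE = re.compile(
    r"\bALTER\s+GRIDDISK\b.*\b(INACTIVE|ACTIVE)\s*;?\s*$", re.IGNORECASE
)


def count_contributing_commands(lines):
    """Count CREATE/ALTER GRIDDISK lines, skipping ACTIVE/INACTIVE changes."""
    count = 0
    for line in lines:
        if not GRIDDISK_CMD.search(line):
            continue  # not a CREATE/ALTER GRIDDISK command
        if STATE_CHANGE.search(line):
            continue  # state change only, does not contribute
        count += 1
    return count


def likelihood_is_high(lines):
    """True when the count exceeds the 31-command threshold from the text."""
    return count_contributing_commands(lines) > HIGH_RISK_THRESHOLD
```

Because the corruption builds up per cell disk, a history kept per cell (or per cell disk) would give a more meaningful count than one aggregated across the whole rack.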
Risk and Detection
The risk to test and development systems is expected to be higher than production systems due to the dynamic manner in which they may be reconfigured.
To determine whether your system is exposed to this issue, and how close it is to cell disk metadata corruption, download the script attached to the MOS note and run it on all storage servers as the root user.
Possible symptoms that cell disk metadata corruption has occurred as a result of this bug include the following:
- ASM disk group(s) dismount and database crash following CREATE GRIDDISK or ALTER GRIDDISK.
- ASM disk group(s) cannot be mounted following the disk group dismount.
- Error ORA-600 [addNewSegmentsToGDisk_2] is reported in the cell alert.log.
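The last symptom in the list above is easy to check for mechanically. The following Python sketch scans the lines of a cell alert.log for the error argument quoted above. The exact layout of the error line in the log is an assumption, so the sketch searches only for the bracketed argument `addNewSegmentsToGDisk_2`; the function name is hypothetical, and the location of alert.log on your cells is not stated here, so the caller has to supply the opened file.

```python
# Error argument quoted in the symptom list. Only this token is searched for,
# because the surrounding line format in alert.log is an assumption.
ERROR_ARGUMENT = "addNewSegmentsToGDisk_2"


def find_corruption_symptom(lines):
    """Return (line_number, text) pairs for lines mentioning the error argument."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_ARGUMENT in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

Any iterable of strings works, so an open file handle for the alert.log can be passed directly, for example `find_corruption_symptom(open(path, errors="replace"))` with `path` pointing at the log on that cell. An empty result does not prove the cell is unaffected; it only means this one symptom has not been logged.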
The cell disk corruption cannot be repaired once it occurs. Recovery requires recreating cell disks, grid disks, and ASM disk groups, then restoring affected databases from backup.
Perform one of the following actions to prevent bug 19695225:
- Upgrade to Exadata Storage Server version 22.214.171.124.1 or later (Exadata 126.96.36.199.0 contains the fix for this issue; however, 188.8.131.52.1 or later is the recommended version).
- Upgrade to Exadata Storage Server version 184.108.40.206.2 or later 220.127.116.11.x.
- Apply patch 19695225 to all Exadata Storage Servers. At the time of writing, a patch is available for Exadata versions 18.104.22.168.1, 22.214.171.124.1, and 126.96.36.199.0.
- Avoid running CellCLI commands CREATE GRIDDISK or ALTER GRIDDISK until the code fix is applied via upgrade or patch apply.
I think it’s a good idea to run the check script on your storage cells as root to determine whether there’s any immediate risk (probably unlikely). If necessary, consider applying the patch – but you should be planning your patching to the QFSDP April 2015 now, right? 🙂