Category Archives: Configuration

Dude … Where’s My GUI?

Are you running Exadata Storage Server 12.1.2.1.0?

Have you tried to get a graphical tool, such as DBCA, DBUA or even VNC, to run on your Exadata lately?

If, like me, you had quite the struggle until a friendly sysadmin installed a bunch of packages for you, you might be interested in reading this MOS note:

1969308.1 – Unable to run graphical tools (gui) on Exadata 12.1.2.1.0 or later.

I understand that Oracle have been increasingly hardening Exadata since the birth of their X3-2 machines, but you’d think that you wouldn’t need to add extra packages to a system that’s meant to be “ready to use” once your choice of consultant has finished with the Deployment Assistant.

After all, aren’t DBCA / DBUA Oracle’s tools of choice? Do they really want DBAs to spend their time creating response files and running these tools from the command line?

Odd.


Exadata Critical Issue EX21

Oracle announced a new Exadata Critical Issue this morning (EX21) which applies to the ESS software versions 12.1.1.1.2 and 12.1.2.1.1.

“This issue is encountered only when a disk media error occurs while synchronous I/O is performed. Because the majority of I/O operations issued with Exadata storage are done asynchronously, and this problem is possible only when disk media errors are experienced while synchronous I/O is performed, the likelihood of experiencing this problem is low. However, the impact of hitting this problem can potentially be high.

This problem affects Exadata Storage Server software versions 12.1.2.1.1 and 12.1.1.1.2.

Disk corruption symptoms are varied. Some corruptions will be resolved automatically by Oracle Database, while other corruptions will lead to unexpected process shutdown due to internal errors.”

ESS 12.1.1.1.2 DOES have a patch available, but 12.1.2.1.1 does not at the moment (the patch is “pending”). I’m sure it will become available soon.
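If you have a lot of cells to check, a small shell function can flag the two versions named in EX21. This is a minimal sketch: it assumes you already have each cell’s active ESS version as a string (for example, from the cell’s imageinfo output), and the function name ex21_affected is hypothetical. It only matches the two versions listed in the alert; it cannot tell you whether the fix has been applied.

```shell
# Hypothetical helper: report whether an Exadata Storage Server (ESS)
# software version is one of the two versions named in EX21.
ex21_affected() {
  local version="$1"
  case "$version" in
    12.1.1.1.2|12.1.2.1.1)
      echo "affected"     # listed in EX21; see MOS 1270094.1 for the fix status
      ;;
    *)
      echo "not listed"   # not one of the two versions named in EX21
      ;;
  esac
}
```

For example, `ex21_affected 12.1.2.1.1` prints `affected`.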

I have MOS email me whenever the Exadata Critical Issues document (1270094.1) is updated so I’m quickly aware of the latest important bugs. It’s pretty neat and I’d advise other Exadata types to make use of it as well.


Leap Second 2015

L’Observatoire de Paris has decided that there will be a “leap second” on June 30th, 2015.  At 23:59:60 on this date, an additional second will be “inserted” into UTC (Coordinated Universal Time) to take into account the slightly irregular rotation of our planet.

The last “leap second” was on June 30th, 2012, when a number of Linux systems had problems (including, but not limited to, those at Qantas Airways, reddit and anything running Hadoop).

This year, Google and Amazon both plan to implement a “leap smear” whereby they will add the “leap second” over an extended period on June 30th.

Be aware that a number of AWS services are affected and resolving issues with your EC2 instances is your responsibility.
 

The “Leap Second” and Oracle
The Oracle database requires no patches and has no problem with the “leap second” changes at the O/S level.

No action is required for Exadata servers which are NOT running 12.1.2.1.0.  If you ARE running this version, you will need to follow MOS note 1986986.1 to update your NTP configuration.
 

Linux Servers
However, any derivative of Red Hat Enterprise Linux (including Oracle Enterprise Linux, Oracle Unbreakable Enterprise Kernel and Asianux) versions 4.4 through 6.2, using kernel versions 2.4 to 2.6.39, may be affected. This applies to both bare-metal and virtualized environments.
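As a rough first check, the kernel range above can be compared against `uname -r`. The sketch below is an illustration under several assumptions, not a substitute for MOS 1472421.1: it relies on GNU `sort -V`, trims the release string to major.minor.patch, treats 2.4 to 2.6.39 inclusive as the affected range, and ignores the distribution release (4.4 through 6.2) entirely. The function names are hypothetical.

```shell
# True if version $1 <= version $2 (relies on GNU sort -V ordering).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: is a kernel release inside the 2.4 .. 2.6.39 range?
# Defaults to the running kernel when no release string is given.
kernel_in_leap_range() {
  local release="${1:-$(uname -r)}"
  local base="${release%%-*}"               # drop the "-build.distro" suffix
  base="$(echo "$base" | cut -d. -f1-3)"    # keep major.minor.patch only
  if version_le 2.4 "$base" && version_le "$base" 2.6.39; then
    echo "in range"
  else
    echo "out of range"
  fi
}
```

With no argument, `kernel_in_leap_range` checks the running kernel; `kernel_in_leap_range 2.6.32-300.10.1.el5uek` prints `in range`.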

In MOS 1472421.1, Oracle state that impacted servers may become unresponsive sometime before the “leap second” on June 30th, with the following seen in various logs (system, console, netconsole, etc.):
 

INFO: task kjournald:1119 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880028087f00 0 1119 2 0x00000000
ffff8807ac15dc40 0000000000000246 ffffffff8100e6a1 ffffffffb053069f
ffff8807ac22e140 ffff8807ada96080 ffff8807ac22e510 ffff880028073000
ffff8807ac15dcd0 ffff88002802ea60 ffff8807ac15dc20 ffff8807ac22e140

 
Alternatively, Java applications may suddenly start to use 100% of the CPU, an issue described as “Leap second insertion causes futex to repeatedly timeout”.
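To see whether a server has already logged the hung-task symptom, you can count matching lines in its logs. A minimal sketch, assuming the message text matches the example quoted above; the number of seconds depends on hung_task_timeout_secs, so the pattern accepts any value. The function name is hypothetical, and it does not detect the Java/futex CPU symptom.

```shell
# Hypothetical helper: count kernel hung-task warnings of the form
# "INFO: task <name>:<pid> blocked for more than N seconds." in a log file.
count_hung_tasks() {
  local logfile="$1"
  # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
  grep -cE 'blocked for more than [0-9]+ seconds' "$logfile" || true
}
```

For example, run `count_hung_tasks /var/log/messages` as root; a non-zero count is worth investigating before the leap second arrives.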

The primary workaround is to stop the NTP service, reset the system clock and restart the NTP service:
 

# Run as root
/etc/init.d/ntpd stop
date -s "`date`"    # re-set the system clock to the current time
/etc/init.d/ntpd start

 

An additional workaround is to reboot the server.
 

Oracle Enterprise Manager
Per MOS 1472651.1, any version of OEM from 10.2.0.5 to 12c running on Linux may see the OEM agent or the OMS service consume excessive CPU on or around “leap seconds”.

Suggested workarounds are identical to those for Linux servers (reset the system clock or reboot the server).
 

Oracle Clusterware on Solaris Servers
Per MOS 759143.1, servers running Solaris 5.8 to 5.10 and running Oracle Clusterware 10.1 to 11.1 may suffer a node reboot unless they have the required patches.

The workaround for this issue is either to configure the local xntpd daemon to disable PLL mode and enable slewing, or to apply the Oracle Clusterware patch bundles / MLRs and increase the oprocd daemon timeout margin appropriately.
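The xntpd side of that workaround is a configuration change. The sketch below assumes the two directives usually documented for Oracle Clusterware on Solaris, `slewalways yes` and `disable pll`, and the default /etc/inet/ntp.conf path; check both against MOS 759143.1 before relying on them. The function name is hypothetical, it needs bash and a grep that supports -q and -x (on Solaris 10 that may mean /usr/xpg4/bin/grep), and xntpd must be restarted afterwards for the change to take effect.

```shell
# Hypothetical helper: append the xntpd directives for slewing / no PLL
# to an ntp.conf file, skipping any directive that is already present.
enable_ntp_slewing() {
  local conf="${1:-/etc/inet/ntp.conf}"   # assumed default Solaris path
  local directive
  for directive in "slewalways yes" "disable pll"; do
    # -x: whole-line match, so running this twice adds nothing new
    if ! grep -qx "$directive" "$conf" 2>/dev/null; then
      echo "$directive" >> "$conf"
    fi
  done
}
```

Because each directive is only appended when missing, the function is safe to re-run across all cluster nodes.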
 

References

  • Leap seconds (extra second in a year) and impact on the Oracle database. (Doc ID 730795.1)
  • Leap Second Time Adjustment (e.g. on June 30, 2015 at 23:59:59 UTC) and Its Impact on Exadata Database Machine (Doc ID 1986986.1)
  • Enterprise Manager Management Agent or OMS CPU Use Is Excessive near Leap Second Additions on Linux (Doc ID 1472651.1)
  • NTP leap second event causing Oracle Clusterware node reboot (Doc ID 759143.1)
  • Leap Second Hang – CPU Can Be Seen at 100% (Doc ID 1472421.1)

 


Exadata: why a half-rack is the “recommended minimum size”

Lots of shops dipped their toes in the Exadata water with a quarter-rack first.

(For those who are new to the Exadata party and don’t know of a world without elastic configurations, a quarter-rack is a machine with two compute nodes and three storage cells).

If you are / were one of those customers, you’ll probably have winced at the difference between the “raw” storage capacity and the “usable” storage capacity when you got to play with it for the first time.

While you could choose to configure your DATA and RECO diskgroups with HIGH redundancy in ASM, did you notice that you couldn’t do the same with the DBFS_DG / SYSTEM_DG?

Check out page 5 of Oracle’s white paper on best practices for database consolidation on Exadata.

“A slight HA disadvantage of an Oracle Exadata Database Machine X3-2 quarter or eighth rack is that there are insufficient Exadata cells for the voting disks to reside in any high redundancy disk group which can be worked around by expanding with 2 more Exadata cells. Voting disks require 5 failure groups or 5 Exadata cells; this is one of the main reasons why an Exadata half rack is the recommended minimum size.”

Basically, you need at least 5 storage cells for each Exadata environment if you want to have true “high availability” with your Exadata machine.

While quarter-rack machines have 3 storage cells, half-rack machines have 7 or 8 storage cells, depending on the model.

Let’s say that you have the model with 8 storage cells: if you split a half-rack machine equally, you’ll have two quarter-rack-sized machines with 4 storage cells each, so you would need one more storage cell per machine to provide HIGH redundancy for the DBFS_DG / SYSTEM_DG diskgroup.
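The arithmetic above fits in a few lines. A minimal sketch, assuming the white paper’s rule of one failure group per cell and five failure groups for voting disks in a HIGH redundancy disk group; the function name is hypothetical.

```shell
# Hypothetical helper: how many storage cells must be added before the
# voting disks can live in a HIGH redundancy disk group.
extra_cells_for_high_redundancy_voting() {
  local cells="$1"
  local required=5   # 5 failure groups = 5 cells, per the white paper quote
  if [ "$cells" -ge "$required" ]; then
    echo 0
  else
    echo $(( required - cells ))
  fi
}
```

A quarter-rack (3 cells) gives 2, matching the white paper’s “2 more Exadata cells”; each half of an evenly split 8-cell half-rack (4 cells) gives 1.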

For some reason, this nugget escaped my attention until recently. All the more reason to have a standby Exadata machine at your DR site …

Mark

 


Exadata Critical Issue DB27

Oracle announced a new Exadata Critical Issue yesterday (DB27) as per MOS 2004572.1.

11.2.0.4 databases running with Grid Infrastructure 12.1 (either 12.1.0.1 or 12.1.0.2) will crash whenever a health update is received (such as when a cell disk is marked “predictive failure”).

The database ASMB process terminates, causing the database instance to crash. The following errors are reported in the database alert.log:

ORA-15064: communication failure with ASM instance
ORA-03115: unsupported network datatype or representation
ASMB: terminating the instance due to error 15064

Perform one of the following actions to prevent bug 20361671:

  1. Upgrade the Grid Infrastructure home to 12.1.0.2.7 (Database Patch for Engineered Systems and DB In-Memory 12.1.0.2.7) or later.
  2. Apply patch 20361671 to the Grid Infrastructure home.
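If you need to check a number of clusters, the exposure rule in DB27 can be expressed as a small version comparison. This is a sketch under stated assumptions: it uses GNU `sort -V`, assumes GI versions in the five-part bundle numbering used above (for example 12.1.0.2.7), and cannot tell whether patch 20361671 has already been applied to the GI home, so an `exposed` result still needs checking with opatch. The function names are hypothetical.

```shell
# True if version $1 <= version $2 (relies on GNU sort -V ordering).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: is this database / Grid Infrastructure combination
# exposed to DB27 (bug 20361671), ignoring any one-off patch?
db27_exposure() {
  local db_version="$1" gi_version="$2"
  case "$db_version" in
    11.2.0.4*) ;;                         # only 11.2.0.4 databases are named
    *) echo "not exposed"; return ;;
  esac
  case "$gi_version" in
    12.1.*) ;;                            # only GI 12.1 homes are named
    *) echo "not exposed"; return ;;
  esac
  if version_le 12.1.0.2.7 "$gi_version"; then
    echo "not exposed"                    # GI at 12.1.0.2.7 or later
  else
    echo "exposed"                        # unless patch 20361671 is applied
  fi
}
```

For example, `db27_exposure 11.2.0.4.0 12.1.0.2.0` prints `exposed`.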

At the time of writing, the patch README incorrectly omits the rootcrs.pl commands required to unlock and lock the Grid Infrastructure home before and after patching, respectively.

Before running the opatch command to apply the patch, run the following rootcrs.pl command as the root user to unlock the Grid Infrastructure home:

$GI_HOME/crs/install/rootcrs.pl -unlock

After applying the patch, run the following rootcrs.pl command as the root user to lock the Grid Infrastructure home:

$GI_HOME/crs/install/rootcrs.pl -patch
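Putting the corrected sequence together: unlock as root, apply the patch with opatch as the Grid Infrastructure owner, then lock the home again as root. The wrapper below is a hypothetical sketch, not the README procedure; it assumes bash, that it is run as root on each node in turn, that GI_HOME and PATCH_DIR (the unzipped patch 20361671) are set, and that the GI software owner is called grid unless GI_OWNER says otherwise. By default it only prints the commands (DRY_RUN=1); follow the patch README for the opatch options it actually requires.

```shell
# Print each command (DRY_RUN=1, the default) or run it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper for the DB27 patch sequence on one node, run as root.
db27_patch_gi_home() {
  : "${GI_HOME:?set GI_HOME to the Grid Infrastructure home}"
  : "${PATCH_DIR:?set PATCH_DIR to the unzipped patch 20361671}"
  local owner="${GI_OWNER:-grid}"   # assumed GI software owner

  # 1. Unlock the Grid Infrastructure home (root)
  run "$GI_HOME/crs/install/rootcrs.pl" -unlock
  # 2. Apply the patch from the patch directory (GI owner)
  run su - "$owner" -c "cd $PATCH_DIR && $GI_HOME/OPatch/opatch apply -oh $GI_HOME"
  # 3. Lock the Grid Infrastructure home again (root)
  run "$GI_HOME/crs/install/rootcrs.pl" -patch
}
```

Review the printed commands first, then re-run with DRY_RUN=0 once you are happy with them.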


Oracle OpenWorld 2015

Submitted my Oracle OpenWorld 2015 presentation earlier.  Today is the last day to submit proposals for presentations or tutorials.

Oracle have extended their deadline for proposals until May 6th!

 

 


Exadata Critical Issue EX19

Overnight, Oracle announced a new Exadata Critical Issue (EX19) which applies to storage cells running ESS software version 12.1.1.1.1 or earlier.

The bug is 19695225 and more information can be found on MOS 1991445.1.

Cell disk metadata corruption and loss of cell disk content (i.e. grid disk, ASM disk) will occur if many CREATE GRIDDISK or ALTER GRIDDISK commands that modify cell disk space configuration are run over time for the same cell disk.

If CellCLI griddisk commands are typically run in parallel on all storage servers simultaneously, which is a common maintenance practice, and the issue occurs on multiple storage servers at the same time such that all redundant disk extents are lost for files in an ASM disk group, then the disk group will dismount and database will crash, and will require restoring files from backup.

Rolling cell maintenance commands that change grid disk state, such as ALTER GRIDDISK INACTIVE and ALTER GRIDDISK ACTIVE, do not contribute to this issue.

If, since initial system deployment, you have recreated or reconfigured grid disks using the CellCLI commands CREATE GRIDDISK or ALTER GRIDDISK more than 31 times, then the likelihood of occurrence is high.

 

Risk and Detection
The risk to test and development systems is expected to be higher than production systems due to the dynamic manner in which they may be reconfigured.

To determine whether your system is exposed to this issue, and how close it is to having cell disk metadata corruption, download the script attached to MOS 1991445.1 and run it on all storage servers as the root user.

Possible symptoms that cell disk metadata corruption has occurred as a result of this bug include the following:

  • ASM disk group(s) dismount and database crash following CREATE GRIDDISK or ALTER GRIDDISK.
  • ASM disk group(s) cannot be mounted following the disk group dismount.
  • Error ORA-600 [addNewSegmentsToGDisk_2] is reported in the cell alert.log.
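For a quick look at the third symptom, you can search each cell’s alert.log for the internal error argument. A minimal sketch: the exact layout of the ORA-600 line in the cell alert.log is an assumption, so it matches on the argument string addNewSegmentsToGDisk_2 alone. The function name is hypothetical, and it is no substitute for the script attached to MOS 1991445.1, which tells you how close a cell is to corruption rather than whether it has already happened.

```shell
# Hypothetical helper: look for the EX19 internal error signature
# (ORA-600 [addNewSegmentsToGDisk_2]) in a cell alert.log.
ex19_symptom_found() {
  local alertlog="$1"
  # -F: fixed-string match, so no regex escaping is needed
  if grep -qF 'addNewSegmentsToGDisk_2' "$alertlog"; then
    echo "symptom found"
  else
    echo "no symptom"
  fi
}
```

Pass the path to each cell’s alert.log; the location varies by cell, so it is a parameter rather than hard-coded.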

 

The cell disk corruption cannot be repaired once it occurs.  Recovery requires recreating cell disks, grid disks, and ASM disk groups, then restoring affected databases from backup.

Perform one of the following actions to prevent bug 19695225:

  • Upgrade to Exadata Storage Server version 12.1.2.1.1 or later (Exadata 12.1.2.1.0 contains the fix to this issue, however 12.1.2.1.1 or later is the recommended version).
  • Upgrade to Exadata Storage Server version 12.1.1.1.2 or later 12.1.1.1.x.
  • Apply patch 19695225 to all Exadata Storage Servers. At the time of writing a patch is available for Exadata versions 12.1.1.1.1, 11.2.3.3.1, and 11.2.3.3.0.
  • Avoid running CellCLI commands CREATE GRIDDISK or ALTER GRIDDISK until the code fix is applied via upgrade or patch apply.

 

I think it’s a good idea to run the check script on your storage cells as root to determine whether there’s any immediate risk (probably unlikely). If necessary, consider applying the patch – but you should be planning your patching to the QFSDP April 2015 now, right? 🙂

 

Mark


DBA 3.0 – How to Become a Real-World Exadata DBA – IOUG Collaborate 2015

According to a Book of Lists survey, “public speaking” is the biggest fear for 41% of people. To put that into perspective, “death” is the biggest fear for 19%, “flying” for 18% and “clowns” don’t even register (which does make me seriously doubt the survey’s credibility).

I gave my first public presentation at IOUG Collaborate 2015 last week in Las Vegas and I didn’t die.

Why did you make your presentation debut at the second-largest Oracle event on the calendar? Excellent question.



Exadata and OVM

Exadata and OVM.

OVM and Exadata.

It’s not been the best-kept secret in the world, but it is now a reality with Oracle’s new X5 engineered systems.

I don’t like it either, though I admit that I might just be a purist snob. As far as I can see, this might be useful in two possible scenarios:

1) Saving on additional cost option licensing.
Picture this: you have four databases on your Exadata machine and only one of them needs the {INSERT EXPENSIVE COST OPTION HERE} option.

Instead of buying, for instance, an Advanced Security license for all 144 cores, you might consider dividing up your X5-2 half-rack into four virtual machines – one for each database – and only license Advanced Security for the virtual machine on which that particular database resides.

Assuming each virtual machine is provisioned identically (with 36 cores each instead of the full 144), the cost of licensing ASO is 25% of what it would be if you licensed the entire machine.
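The 25% figure is simply the licensed VM’s share of the machine’s cores. A trivial sketch of that arithmetic, using the example figures from the text (36 of 144 cores) and integer percentages; the function name is hypothetical, and it ignores processor core factors and option pricing.

```shell
# Hypothetical helper: percentage of the machine's cores that must carry the
# cost option when only one VM is licensed (integer arithmetic, rounds down).
licensed_share_pct() {
  local vm_cores="$1" total_cores="$2"
  echo $(( 100 * vm_cores / total_cores ))
}
```

`licensed_share_pct 36 144` prints `25`; licensing all four VMs (144 of 144 cores) brings you back to `100`.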

Some of those cost options are expensive, definitely. But why not consider a smaller, dedicated Exadata machine for that database instead? Or an alternative, such as ODA?

2) Capacity on-demand licensing.
Let’s say that you KNOW you’re going to migrate more databases onto your Exadata machine in the future, but you’re not using its full capabilities to support the databases that are running there right now. Bear with me for argument’s sake…

With OVM, you’re able to license a minimum of 40% of the cores on your Exadata system. If you’re not getting close to fifth gear right now, but you know you will be at some point, you could use OVM to license in a “capacity on-demand” fashion and crank things up as your needs increase.

Of course, given the exponential improvements that come with each new version of Exadata, wouldn’t you try your best to wait until a couple of months before you DID need the extra horsepower so you could buy the latest and greatest Exadata then?

Even if you DO eventually get to 100% usage, you still have that extra virtualization layer in the stack and whatever issues go with it, including having to maintain it. To remove it, one assumes that the machine would need to be rebuilt, which isn’t a particularly attractive option.

“Exadata is expensive”
I understand the “Exadata is expensive” argument, but I don’t really think this helps with that very much – you’re still laying down a big wad of cash when you buy the hardware, no matter how you slice the licensing up. Is it really going to be worth the hassle of that extra virtualization layer to save (and possibly only temporarily) on licenses?

Oddly, I think the new elastic configuration capability in X5 makes the argument harder to make: you could achieve the same thing by choosing a different hardware configuration and/or adding compute nodes or storage cells as your needs dictate.

I’m sure there’s a compelling reason out there for putting OVM on Exadata that I haven’t figured out yet; there usually is. Until then, I’m back to scratching my head…
