Category Archives: Engineered systems

Exadata X6

Blink and you might have missed it, but the Exadata X6 was officially announced today.

As has become the norm, Oracle have doubled-down on the specs compared to the X5:

  • 2x disk capacity
  • 2x Flash capacity
  • 2x faster Flash
  • 25% faster CPUs
  • 13% faster DRAM

With the X6-2 machine, you still have Infiniband running at 40Gb/sec, but the compute nodes and the storage servers now have the following:

X6-2 Compute Node

  • 2x 22-core Broadwell CPUs
  • 256GB of DDR4 DRAM (expandable to 768GB)
  • 4x 600GB 10,000 RPM disks for local storage (expandable to 6)

High Capacity Storage Server

  • 2x 10-core Broadwell CPUs
  • 128GB of DDR4 DRAM
  • 12x 8TB 7,200 RPM Helium SAS3 disks
  • 4x 3.2TB NVMe PCIe 3.0 Flash cards

Extreme Flash Storage Server

  • 2x 10-core Broadwell CPUs
  • 128GB of DDR4 DRAM
  • 8x 3.2TB NVMe PCIe 3.0 Flash cards

What does all of that give you when it comes down to it?

Well, remember that the eighth-rack is the same as a quarter-rack, but you have access to half the cores and half the storage across the board (you still have two compute nodes and three storage servers):

High Capacity Eighth-Rack

  • 44-core compute nodes
  • 30-core storage servers
  • 144TB raw disk storage
  • 19.2TB Flash storage

Extreme Flash Eighth-Rack

  • 44-core compute nodes
  • 30-core storage servers
  • 38.4TB Flash storage

The minimum licensing requirement is 16 cores for the eighth-rack and 28 cores for the quarter-rack.
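If you want to sanity-check the eighth-rack storage figures, they fall out of the per-server specs with a bit of arithmetic. Here is a rough sketch, assuming (as described above) three storage servers with half of the storage enabled. The function and variable names are mine, purely for illustration.

```python
# Sketch: derive the eighth-rack capacities quoted above from the per-server specs.
# Assumption: an eighth-rack exposes half the storage of a three-cell quarter-rack.

STORAGE_SERVERS = 3

def eighth_rack_tb(devices_per_server, tb_per_device):
    """Raw capacity in TB available in an eighth-rack."""
    quarter_rack = STORAGE_SERVERS * devices_per_server * tb_per_device
    return round(quarter_rack / 2, 1)

hc_disk = eighth_rack_tb(12, 8)      # High Capacity: 12 disks of 8TB per server
hc_flash = eighth_rack_tb(4, 3.2)    # High Capacity: 4 flash cards of 3.2TB per server
ef_flash = eighth_rack_tb(8, 3.2)    # Extreme Flash: 8 flash cards of 3.2TB per server

print(hc_disk, hc_flash, ef_flash)
```

The rounding is only there to hide floating-point noise from the 3.2TB card size.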

I’m sure you can read through the sales stuff yourself, but aside from the UUUUGE increase in hardware, two new features of the X6 really pop out for me.

Exadata now has the ability to preserve storage indexes through a storage cell reboot. Anyone who had to support an older Exadata machine will remember quite how big a deal that used to be: the storage indexes could take hours to rebuild, and it often took a good deal of understanding from the user population and management to get through the first day or so after maintenance.

Probably the biggest thing is that Oracle have introduced high availability quorum disks for the quarter-rack and eighth-rack machines. I blogged about this before as I thought it had the potential to be a real “gotcha” if you were expecting to run high redundancy diskgroups on anything less than a half-rack.

No longer.

Now, a copy of the quorum disk is stored locally on each database node, allowing you to lose a storage cell and still be able to maintain your high redundancy.

This is a particularly useful development when you remember that Oracle have doubled the size of the high-capacity disks from 4TB to 8TB. Why? Well, because re-balancing a bunch of 8TB disks is going to take longer than re-balancing the same number of 4TB disks.

I’ll be going to Collaborate IOUG 2016 next week and I’m looking forward to hearing more about the new kit there.

Mark


Today’s Nugget: Oracle OpenWorld 2015 … Or Not

Alas, my submission for this year’s Oracle OpenWorld was turned down by Oracle a little while ago.

Maybe I shouldn’t have installed this browser extension?

Tee-hee 🙂

I’m nothing if not persistent(ly annoying) – so I submitted a similar abstract to the 2016 RMOUG Training Days.


Dude … Where’s My GUI?

Are you running Exadata Storage Server 12.1.2.1.0?

Have you tried to get a graphical tool, such as DBCA, DBUA or even VNC to run on your Exadata lately?

If, like me, you had quite the struggle until a friendly sysadmin installed a bunch of packages for you, you might be interested in reading this MOS note:

1969308.1 – Unable to run graphical tools (gui) on Exadata 12.1.2.1.0 or later.

I understand that Oracle have been increasingly hardening Exadata since the birth of their X3-2 machines, but you’d think that you wouldn’t need to add extra packages to a system that’s meant to be “ready to use” once your choice of consultant has finished with the Deployment Assistant.

After all, aren’t DBCA / DBUA Oracle’s tools of choice? Do they really want DBAs to spend their time creating response files and running these tools from the command line?

Odd.


New Exadata Critical Issues – EX23 and EX24

Over the weekend, Oracle announced two new Critical Issues for Exadata Storage Server (EX23 and EX24), both impacting version 12.1.2.1.

Both the new Critical Issues can be patched by either applying the one-off patch 21251493 or by upgrading the Exadata Storage Server software to 12.1.2.1.2.

 

Critical Issue EX23
Affects Exadata Storage Server 12.1.2.1.0 and 12.1.2.1.1

Bug 21174310 – wrong results, ORA-1438 errors or other internal errors (ORA-00600 and ORA-07445) are possible from smart scan offloaded queries against HCC or OLTP compressed tables stored on Exadata storage if:

  • Exadata Storage Server is version 12.1.2.1.0 or 12.1.2.1.1 AND
  • Oracle Database was upgraded from 11.2 to 12.1 AND
  • A smart scan offloaded query is issued against an OLTP compressed table or an HCC table containing OLTP compressed blocks

The workaround is to recreate the table.

The recommended action is to upgrade to Exadata Storage Server software version 12.1.2.1.2 (or higher).

Alternatively, apply patch 21251493 to Exadata Storage Servers running version 12.1.2.1.1.  Note that patch 21251493 contains additional fixes required to resolve other critical issues.

MOS 2032464.1 has additional details.
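If you look after more than a handful of cells, a small helper can tell you which ones report a version that EX23 lists as affected. This is only a sketch: the version numbers come from this post, while the helper names and the idea of feeding it plain version strings are my own assumptions.

```python
# Sketch: is a storage server version one of those EX23 lists as affected?
# Version numbers are taken from the post; the helper names are hypothetical.

AFFECTED = {(12, 1, 2, 1, 0), (12, 1, 2, 1, 1)}
FIXED_IN = (12, 1, 2, 1, 2)

def parse_version(text):
    """Turn '12.1.2.1.1' into a tuple of ints so versions compare numerically.

    Only the first five components are kept; anything after them is ignored.
    """
    return tuple(int(part) for part in text.strip().split("."))[:5]

def ex23_status(version_text):
    version = parse_version(version_text)
    if version in AFFECTED:
        return "affected"
    if version >= FIXED_IN:
        return "fixed"
    return "not listed"
```

Comparing tuples of integers rather than raw strings matters here: as strings, "12.1.2.1.10" would sort before "12.1.2.1.2".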

 

Critical Issue EX24
Affects Exadata Storage Server 12.1.2.1.1

After replacing a failed system disk (disk 0 or disk 1), the new disk is not correctly configured, leaving the system vulnerable to the other system disk failing. The likelihood of occurrence is high when running Exadata version 12.1.2.1.1 and a failed system disk is replaced.

The workaround is to follow the instructions in MOS 2003674.1.

The recommended action is to upgrade to Exadata Storage Server software version 12.1.2.1.2 (or higher).

Alternatively, apply Patch 21251493 to Exadata Storage Servers running version 12.1.2.1.1. Note that patch 21251493 contains additional fixes required to resolve other critical issues.

MOS 2032402.1 has additional details.

 

References

  • Bug 21174310 – Wrong results or ORA-1438 errors possible from smart scan offloaded queries against HCC or OLTP compressed tables stored on Exadata storage (Doc ID 2032464.1)
  • Important Fixes Required for System Disk Replacement on Exadata Storage Servers Running Version 12.1.2.1.1 (Doc ID 2032402.1)
  • Exadata Storage Software 12.1.2.1.0 and 12.1.2.1.1 System Disk Replacement Issues (Doc ID 2003674.1)
  • Exadata Critical Issues (Doc ID 1270094.1)
  • Patch 21251493

Oracle Critical Patch Update for July 2015

Oracle’s Critical Patch Update is out for July 2015:

http://www.oracle.com/technetwork/topics/security/cpujul2015-2367936.html

Affected are database versions 11.1.0.7, 11.2.0.3, 11.2.0.4, 12.1.0.1 and 12.1.0.2.

This is the final patch for both the 11.1.0.7 and 11.2.0.3 releases. The final patch for 12.1.0.1 will be released in January 2016.

The most prominent bug on the risk matrix is CVE-2015-2629 whereby a remote authenticated user can exploit a flaw in the Java VM component to gain elevated privileges.

For the 11.2.0.4 patches, you can apply one of the following:

11.2.0.4 SPU for UNIX: patch 20803583
11.2.0.4.7 PSU for UNIX: patch 20760982
11.2.0.4.17 Quarterly Database Patch for Exadata (July 2015): patch 21142006
July 2015 Quarterly Full-Stack Patch for Exadata: patch 21186703

Don’t forget your Grid Infrastructure patching:

11.2.0.4 PSU for UNIX: patch 20996923

And, of course, ever since those Java bugs were discovered, we should also patch the JVM:

11.2.0.4.4 Database PSU for UNIX: patch 21068539

Happy patching!


Exadata Critical Issue EX21

Oracle announced a new Exadata Critical Issue this morning (EX21) which applies to the ESS software versions 12.1.1.1.2 and 12.1.2.1.1.

“This issue is encountered only when a disk media error occurs while synchronous I/O is performed. Because the majority of I/O operations issued with Exadata storage are done asynchronously, and this problem is possible only when disk media errors are experienced while synchronous I/O is performed, the likelihood of experiencing this problem is low. However, the impact of hitting this problem can potentially be high.

This problem affects Exadata Storage Server software versions 12.1.2.1.1 and 12.1.1.1.2.

Disk corruption symptoms are varied. Some corruptions will be resolved automatically by Oracle Database, while other corruptions will lead to unexpected process shutdown due to internal errors.”

ESS 12.1.1.1.2 DOES have a patch available, but 12.1.2.1.1 does not at the moment (the patch is “pending”). I’m sure it will become available soon.

I have MOS email me whenever the Exadata Critical Issues document (1270094.1) is updated so I’m quickly aware of the latest important bugs. It’s pretty neat and I’d advise other Exadata types to make use of it as well.


Leap Second 2015

L’Observatoire de Paris has decided that there will be a “leap second” on June 30th, 2015. At 23:59:60 on this date, an additional second will be “inserted” into UTC (Coordinated Universal Time) to take into account the slightly irregular rotation of our planet.

The last “leap second” was on June 30th, 2012, when a bunch of servers running Linux had problems (including, but not limited to, Qantas Airways, reddit and anything running Hadoop).

This year, Google and Amazon both plan to implement a “leap smear” whereby they will add the “leap second” over an extended period on June 30th.
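To make the “leap smear” idea concrete, here is a minimal sketch of a linear smear, where the extra second is absorbed evenly across a window instead of being inserted in one go. This is not either company's actual implementation: the 20-hour window and the function name are assumptions I have made for illustration.

```python
# Sketch of a linear "leap smear": spread one extra second evenly over a window.
# The 20-hour window is an illustrative assumption, not a documented value.

SMEAR_WINDOW_SECONDS = 20 * 60 * 60

def smeared_offset(elapsed_seconds, window_seconds=SMEAR_WINDOW_SECONDS):
    """Fraction of the leap second already absorbed after `elapsed_seconds`."""
    if elapsed_seconds <= 0:
        return 0.0
    if elapsed_seconds >= window_seconds:
        return 1.0
    return elapsed_seconds / window_seconds
```

Halfway through the window, clocks following this scheme would be half a second off UTC, and no clock ever has to show 23:59:60.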

Be aware that a number of AWS services are affected and resolving issues with your EC2 instances is your responsibility.
 

The “Leap Second” and Oracle
The Oracle database requires no patches and has no problem with the “leap second” changes on the O/S level.

No action is required for Exadata servers which are NOT running 12.1.2.1.0.  If you ARE running this version, you will need to follow MOS note 1986986.1 to update your NTP configuration.
 

Linux Servers
However, any derivative of Red Hat Enterprise Linux (including Oracle Enterprise Linux, Oracle Unbreakable Enterprise Kernel and Asianux) versions 4.4 through 6.2, using kernel versions 2.4 to 2.6.39, may be affected. This applies to both bare-metal and virtualized environments.

In MOS 1472421.1, Oracle state that impacted servers may become unresponsive sometime before the “leap second” on June 30th, with the following seen in various logs (system, console, netconsole, etc):
 

INFO: task kjournald:1119 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880028087f00 0 1119 2 0x00000000
ffff8807ac15dc40 0000000000000246 ffffffff8100e6a1 ffffffffb053069f
ffff8807ac22e140 ffff8807ada96080 ffff8807ac22e510 ffff880028073000
ffff8807ac15dcd0 ffff88002802ea60 ffff8807ac15dc20 ffff8807ac22e140
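If you would rather not eyeball every log by hand, a few lines of Python can pull out the hung-task warnings. This is a sketch only: the pattern is written against the sample line above and nothing else, and the function name is hypothetical.

```python
import re

# Sketch: pick hung-task warnings like the one above out of a log.
# The pattern is written against the sample line in this post only.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every matching log line."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append((match["name"], int(match["pid"]), int(match["secs"])))
    return hits

sample = [
    "INFO: task kjournald:1119 blocked for more than 120 seconds.",
    "kjournald D ffff880028087f00 0 1119 2 0x00000000",
]
print(find_hung_tasks(sample))
```

In practice you would pass it the lines of whichever system or console log you are checking.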

 
Alternatively, Java applications may suddenly start to use 100% of the CPU with the event “Leap second insertion causes futex to repeatedly timeout”.

The primary workaround is to stop the NTP service, reset the system clock and restart the NTP service:
 

/etc/init.d/ntpd stop
date -s "`date`"
/etc/init.d/ntpd start

 

An additional workaround is to reboot the server.
 

Oracle Enterprise Manager
Per MOS 1472651.1, any version of OEM from 10.2.0.5 to 12c running on Linux may see the OEM agent or the OMS service consume excessive CPU on or around “leap seconds”.

Suggested workarounds are identical to the Linux servers (reset the system clock or reboot the server).
 

Oracle Clusterware on Solaris Servers
Per MOS 759143.1, servers running Solaris 5.8 to 5.10 and running Oracle Clusterware 10.1 to 11.1 may suffer a node reboot unless they have the required patches.

The workaround for this issue is to configure the local xntpd daemon to disable PLL mode and enable skewing or apply Oracle Clusterware patch bundles / MLRs and increase the oprocd daemon timeout margin appropriately.
 

References

  • Leap seconds (extra second in a year) and impact on the Oracle database. (Doc ID 730795.1)
  • Leap Second Time Adjustment (e.g. on June 30, 2015 at 23:59:59 UTC) and Its Impact on Exadata Database Machine (Doc ID 1986986.1)
  • Enterprise Manager Management Agent or OMS CPU Use Is Excessive near Leap Second Additions on Linux (Doc ID 1472651.1)
  • NTP leap second event causing Oracle Clusterware node reboot (Doc ID 759143.1)
  • Leap Second Hang – CPU Can Be Seen at 100% (Doc ID 1472421.1)

 


Exadata Critical Issue EX20

A new Exadata Critical Issue – EX20 – has been announced on MOS note 1270094.1 and applies to Exadata Storage Server versions 12.1.1.1.0 and 12.1.1.1.1.

The issue is caused by bug 19211091:

CELLSRV Internal Error ORA-600 [DiskIOSched::GetCatIndex:2]

Further details can be found in MOS 1967985.1

You might hit this bug if your database resource manager plan contains sub-plans and OTHER_GROUPS is present in a sub-plan instead of the top plan.

The CELLSRV trace file will contain one or more entries indicating CELLSRV process failure similar to the following:

ORA-00600: internal error code, arguments: [DiskIOSched::GetCatIndex:2], [4294967295], [], [], [], [], [], [], [], [], [], []

CELLSRV encountered a fatal signal 11. LWPID: 28000 userId: 80 kernelId: 80 pthreadID: 139785595115840
Ignoring fatal signal encountered during Cellsrv state dump LWPID: 28000 userId: 80 kernelId: 80 pthreadID: 139785595115840

If CELLSRV fails on multiple cells simultaneously, then the ASM disk groups may dismount or ASM instances may crash, potentially causing databases to crash.

Typically, the Restart Server (RS) process will restart CELLSRV after it fails. However, too many CELLSRV failures will trigger “flood control” and prevent further CELLSRV restarts. Flood control is indicated in the trace file with entries similar to the following:

[RS] monitoring process /opt/oracle/cell/cellsrv/bin/cellrsomt (pid: 26763) returned with error: 126
[RS] Monitoring process for service CELLSRV detected a flood of restarts. Disable monitoring process.
RS-7445 [CELLSRV monitor disabled] [Detected a flood of restarts] [] [] [] [] [] [] [] [] [] []

Workarounds
The recommended action is to upgrade to Exadata Storage Server software version 12.1.1.1.2 (or higher) or 12.1.2.1.1 (or higher).

Alternatively, you can apply patch 19211091.

As a temporary workaround, you can disable the Resource Manager on the affected databases, modify the appropriate plan so that the OTHER_GROUPS directive is in the top plan (and not any sub-plan) and re-enable the Resource Manager:

ALTER SYSTEM SET resource_manager_plan='' SCOPE=both SID='*';

SELECT UNIQUE name
FROM   v$rsrc_plan_history
WHERE  name NOT IN (
         SELECT plan
         FROM   dba_rsrc_plan_directives
         WHERE  plan IN (
                  SELECT UNIQUE name
                  FROM   v$rsrc_plan_history)
         AND    group_or_subplan = 'OTHER_GROUPS');

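-- MY_PLAN and the mgmt_p2 value are placeholders; substitute your own plan and allocation
BEGIN
  SYS.DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  SYS.DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'MY_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    mgmt_p2          => 80,
    switch_estimate  => FALSE,
    comment          => NULL);
  SYS.DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/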

ALTER SYSTEM SET resource_manager_plan='MY_PLAN' SCOPE=both SID='*';
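If the logic of the query above is not obvious at first glance, here it is again as a short Python sketch over (plan, group_or_subplan) pairs. The sample rows and plan names are made up for illustration, and the function is not part of any Oracle tooling.

```python
# Sketch: the same check as the query above, over (plan, group_or_subplan) pairs.
# The sample rows and plan names are made up for illustration.

def plans_missing_other_groups(active_plans, directives):
    """Return active plans that have no OTHER_GROUPS directive of their own."""
    has_other_groups = {
        plan for plan, member in directives if member == "OTHER_GROUPS"
    }
    return sorted(set(active_plans) - has_other_groups)

directives = [
    ("DAY_PLAN", "OLTP_GROUP"),
    ("DAY_PLAN", "BATCH_SUBPLAN"),
    ("BATCH_SUBPLAN", "OTHER_GROUPS"),   # OTHER_GROUPS only in the sub-plan
    ("NIGHT_PLAN", "OTHER_GROUPS"),
]
print(plans_missing_other_groups(["DAY_PLAN", "NIGHT_PLAN"], directives))
```

Here DAY_PLAN is flagged because its only OTHER_GROUPS directive lives in a sub-plan, which is the situation described for this bug.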


Exadata: why a half-rack is the “recommended minimum size”

Lots of shops dipped their toes in the Exadata water with a quarter-rack first of all.

(For those who are new to the Exadata party and don’t know of a world without elastic configurations, a quarter-rack is a machine with two compute nodes and three storage cells.)

If you are / were one of those customers, you’ll probably have winced at the difference between the “raw” storage capacity and the “usable” storage capacity when you got to play with it for the first time.

While you could choose to configure your DATA and RECO diskgroups with HIGH redundancy in ASM, did you notice that you couldn’t do the same with the DBFS_DG / SYSTEM_DG?

Check out page 5 in this document about best practices for consolidation on Exadata.

“A slight HA disadvantage of an Oracle Exadata Database Machine X3-2 quarter or eighth rack is that there are insufficient Exadata cells for the voting disks to reside in any high redundancy disk group which can be worked around by expanding with 2 more Exadata cells. Voting disks require 5 failure groups or 5 Exadata cells; this is one of the main reasons why an Exadata half rack is the recommended minimum size.”

Basically, you need at least 5 storage cells for each Exadata environment if you want to have true “high availability” with your Exadata machine.

While quarter-rack machines have 3 storage cells, half-rack machines have 7 or 8 storage cells, depending on the model.

Let’s say that you have the model with 8 storage cells: if you split a half-rack machine equally, you’ll have 2x quarter-rack machines with 4 storage cells each, so you would need one more storage cell per machine to provide HA for the DBFS_DG / SYSTEM_DG diskgroup.
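The arithmetic is simple enough to do in your head, but here it is as a tiny sketch anyway, using the five-failure-group figure from the quote above. The function name is mine, and it assumes one failure group per storage cell.

```python
# Sketch: extra storage cells needed before voting disks can sit in a
# high redundancy disk group, using the five-failure-group figure quoted above.
# Assumption: each storage cell provides exactly one failure group.

REQUIRED_FAILURE_GROUPS = 5

def extra_cells_needed(storage_cells):
    """Number of cells to add to reach five failure groups (never negative)."""
    return max(0, REQUIRED_FAILURE_GROUPS - storage_cells)

print(extra_cells_needed(3))  # quarter-rack
print(extra_cells_needed(4))  # one half of a split 8-cell half-rack
print(extra_cells_needed(7))  # half-rack
```

The quarter-rack case gives 2, which matches the “expanding with 2 more Exadata cells” workaround in the quoted document.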

For some reason, this nugget escaped my attention until recently. Even more reason to have a standby Exadata machine at your DR site …

Mark

 


Oracle OpenWorld 2015

Submitted my Oracle OpenWorld 2015 presentation earlier.  Today is the last day to submit proposals for presentations or tutorials.

Oracle have extended their deadline for proposals until May 6th!

 

 
