Oracle Database In Action: ORA-15064: communication failure with ASM instance after adding ASM disks to an existing diskgroup

My Oracle Support note "ASM (ORA-600 [kffMapMesgAcquire02]) and Database (ORA-15064: communication failure with ASM instance)" [ID 1483294.1]:

Applies to:
Oracle Server - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.
Symptoms

4-node RAC, 11.2.0.3.0 (no interim patches), Solaris

While attempting to add a disk to an ASM disk group, +ASM2 raised the following errors and became unresponsive (all queries hung):

alert_+ASM2.log:
~~~~~~~~~~~~~~~~~~
...
Mon Jun 18 12:23:18 2012
Errors in file /vzwhome/oracle/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_ckpt_332.trc (incident=104134):
ORA-00600: internal error code, arguments: [kffMapMesgAcquire02], [], [], [], [], [], [], [], [], [], [], []
Mon Jun 18 12:23:18 2012
Dumping diagnostic data in directory=[cdmp_20120618122318], requested by (instance=4, osid=5601 (LMD0)), summary=[incident=41683].
Incident details in: /vzwhome/oracle/app/oracle/diag/asm/+asm/+ASM2/incident/incdir_104134/+ASM2_ckpt_332_i104134.trc
Dumping diagnostic data in directory=[cdmp_20120618122324], requested by (instance=2, osid=332 (CKPT)), summary=[incident=104134].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /vzwhome/oracle/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_ckpt_332.trc:
ORA-00600: internal error code, arguments: [kffMapMesgAcquire02], [], [], [], [], [], [], [], [], [], [], []
CKPT (ospid: 332): terminating the instance due to error 469
Mon Jun 18 12:23:27 2012
ORA-1092 : opitsk aborting process
Mon Jun 18 12:23:27 2012
License high water mark = 24
Mon Jun 18 12:23:29 2012
System state dump requested by (instance=2, osid=332 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /vzwhome/oracle/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_304.trc
Mon Jun 18 12:23:30 2012
Instance terminated by CKPT, pid = 332
USER (ospid: 9254): terminating the instance
Instance terminated by USER, pid = 9254
...

All databases on node2 crashed:

alert_mydb2.log:
~~~~~~~~~~~~~~~~~~
...
Mon Jun 18 12:23:25 2012
NOTE: ASMB terminating
Errors in file /logs/diag/rdbms/mydb/mydb2/trace/mydb2_asmb_3979.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
...

Curiously, the LMD0 process on +ASM4 reported ORA-4031 (unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap (1,0)","ges enqueues")) just before +ASM2 left the cluster.

Cause

When disks are added to a disk group and a rebalance is running, the ASM instances need more DLM (Distributed Lock Manager) locks and therefore consume more SGA memory. With shared_pool_size set to 128 MB, there is not enough memory, and the DLM daemons fail with ORA-4031.
In ASM, this surfaces as ORA-600 [kffMapMesgAcquire02].
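To check whether a crash on another cluster matches this signature, you can scan the ASM alert log and the LMD0 trace files for both errors. The sketch below is a minimal, hypothetical helper (the function name scan_asm_alert is not an Oracle tool); it simply counts lines containing ORA-4031 (also written ORA-04031) and the kffMapMesgAcquire02 argument in each file you pass it.

```shell
#!/bin/sh
# Hypothetical helper: count the two error signatures described in this
# note in each ASM alert/trace file given as an argument.
# Usage: scan_asm_alert FILE...
scan_asm_alert() {
    for f in "$@"; do
        # Skip files we cannot read instead of aborting the whole scan.
        [ -r "$f" ] || { echo "skip: cannot read $f" >&2; continue; }
        # ORA-4031 may be logged as ORA-04031 or ORA-4031.
        # grep -c prints 0 and exits 1 on no match; "|| true" keeps the 0.
        n4031=$(grep -c -E 'ORA-0?4031' "$f" || true)
        # The ORA-600 first argument seen in the +ASM2 alert log above.
        nkff=$(grep -c 'kffMapMesgAcquire02' "$f" || true)
        echo "$f: ORA-4031=$n4031 kffMapMesgAcquire02=$nkff"
    done
}
```

Run it against the alert log in each ASM instance's trace directory (for example the diag/asm/+asm/+ASM2/trace path shown above) and the LMD0 traces on the other nodes; non-zero counts for both patterns suggest this shared pool shortage rather than an unrelated ASM crash.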

Solution

Either set shared_pool_size, large_pool_size, or the other instance memory parameters to higher values, or, when using Automatic Memory Management (AMM), unset them and let AMM (via memory_target) manage the instance memory components automatically.

Check that you have enough shared memory.

Reference: http://docs.oracle.com/cd/E11882_01/install.112/e24326/toc.htm
"
Automatic Memory Management

Starting with Oracle Database 11g, the Automatic Memory Management feature requires more shared memory (/dev/shm) and file descriptors. The shared memory should be sized to be at least the greater of MEMORY_MAX_TARGET and MEMORY_TARGET for each Oracle instance on that computer.

To determine the amount of shared memory available, enter the following command:

# df -h /dev/shm/
"

oracle@oraclenode2:~/logs
$ df -h /dev/shm/
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            12G  4.2G  7.6G  36% /dev/shm

That leaves 7.6G of free shared memory.

So, reset large_pool_size and let AMM manage the instance memory with a 1G target:

SQL> ALTER SYSTEM RESET large_pool_size SCOPE=SPFILE SID='*';

System altered.

SQL> ALTER SYSTEM SET memory_max_target=1G SCOPE=SPFILE SID='*';

System altered.

SQL> ALTER SYSTEM SET memory_target=1G SCOPE=SPFILE SID='*';

System altered.
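Before restarting, it is worth confirming that the new targets fit in /dev/shm. The quoted documentation asks for at least the greater of MEMORY_MAX_TARGET and MEMORY_TARGET per instance; the sketch below assumes those per-instance requirements add up across all instances on the node and compares the total with the size of /dev/shm from POSIX-format df output. The function name shm_fits, the "max_target_mb:target_mb" argument format, and the SHM_KB override are all hypothetical conventions for this sketch.

```shell
#!/bin/sh
# Hypothetical sanity check: does /dev/shm hold the AMM targets of all
# instances on this node? Each argument is "max_target_mb:target_mb".
# Assumption: per-instance need is max(MEMORY_MAX_TARGET, MEMORY_TARGET)
# and the needs of all instances on the node add up.
# SHM_KB, if set, overrides the measured /dev/shm size (in KB).
shm_fits() {
    need_mb=0
    for pair in "$@"; do
        max_mb=${pair%%:*}   # part before the colon
        tgt_mb=${pair#*:}    # part after the colon
        # Count the larger of the two parameters for this instance.
        if [ "$max_mb" -ge "$tgt_mb" ]; then
            need_mb=$((need_mb + max_mb))
        else
            need_mb=$((need_mb + tgt_mb))
        fi
    done
    # Total size of /dev/shm in KB: column 2 of "df -Pk" output.
    shm_kb=${SHM_KB:-$(df -Pk /dev/shm | awk 'NR==2 {print $2}')}
    shm_mb=$((shm_kb / 1024))
    echo "needed=${need_mb}MB available=${shm_mb}MB"
    # Exit status 0 if everything fits, 1 otherwise.
    [ "$need_mb" -le "$shm_mb" ]
}
```

For example, `shm_fits 1024:1024 4096:4096 || echo "enlarge /dev/shm or lower the targets"` checks the 1G targets set above alongside a hypothetical 4 GB database instance on the same node.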

Then restart the ASM and database instances so that the SPFILE changes take effect.


Tuesday, 30 October 2012
