Oracle: Oracle Cluster Health Monitor (CHM) using large amount of space (crfclust.bdb)

Last night both nodes of my 2-node RAC cluster went down for OS patching and were rebooted, but after the reboot the CRS resources did not come up on either node:

Connect as the root user and check the status of the init resources (the lower-stack resources managed by OHASD):

[root@oradev11 bin]# ./crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       oradev11
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       oradev11
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  OFFLINE
ora.gpnpd
      1        ONLINE  OFFLINE
ora.mdnsd
      1        ONLINE  OFFLINE                               STARTING
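With a dozen resources in the list, it helps to filter out only the ones that should be up but are not. A minimal sketch, assuming the 11.2 tabular layout shown above (a resource-name line followed by an instance line that starts with the instance number); the function name not_online is hypothetical:

```shell
#!/usr/bin/env bash
# Print init resources whose TARGET is ONLINE but whose STATE is not.
# Reads "crsctl stat res -t -init" output on stdin. Assumes each
# resource name line starts with "ora." and each instance line starts
# with the instance number, e.g. "1  ONLINE  OFFLINE".
not_online() {
  awk '
    /^ora\./ { name = $1; next }          # remember the resource name
    $1 ~ /^[0-9]+$/ && NF >= 3 {          # instance line: N TARGET STATE ...
      if ($2 == "ONLINE" && $3 != "ONLINE") print name, $3
    }
  '
}

# Usage (as root, from $GRID_HOME/bin):
#   ./crsctl stat res -t -init | not_online
```

Resources whose TARGET is OFFLINE (such as ora.diskmon after the fix) are deliberately not reported, since OFFLINE is their intended state.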

The CRS alert log shows:

[root@oradev11 ] # cd $GRID_HOME/log/hostname
[root@oradev11 oradev11]# tail -50f alertoradev11.log
(output trimmed)
2015-04-22 20:06:01.173:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(24990)]CRS-5818:Aborted
2015-04-22 20:06:05.177:
[ohasd(12696)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.4/grid/log/sl73vmhasd/ohasd.log.
2015-04-22 20:06:05.658:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25614)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.
2015-04-22 20:06:05.659:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25614)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:06:06.176:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25631)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.
2015-04-22 20:06:06.176:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25631)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:06:06.272:
[gpnpd(25644)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/gpnpd/gpnpd.log". Additional diagnostics: LFI-00004: ibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.
2015-04-22 20:06:06.272:
[gpnpd(25644)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/gpnpd/gpnpd.log"
2015-04-22 20:06:09.314:
[gpnpd(25644)]CRS-2329:GPNPD on node oradev11 shutdown.
2015-04-22 20:08:06.226:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25631)]CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/logbd001/agent/ohasd/oraagent_grid/oraagent_grid.log.
2015-04-22 20:08:10.229:
[ohasd(12696)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.gpnpd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.4/grid/log/sl73vmhasd/ohasd.log.
2015-04-22 20:08:10.710:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26582)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.
2015-04-22 20:08:10.710:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26582)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:08:11.280:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26604)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.
2015-04-22 20:08:11.280:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26604)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:08:11.347:
[mdnsd(26617)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/mdnsd/mdnsd.log". Additional diagnostics: LFI-00004: ibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.
2015-04-22 20:08:11.347:
[mdnsd(26617)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/mdnsd/mdnsd.log"
2015-04-22 20:08:11.351:
[mdnsd(26617)]CRS-5602:mDNS service stopping by request.
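In hindsight, the repeated "OSD return value = 28" is the clue: it is the OS error number behind each failed write, and on Linux errno 28 is ENOSPC ("No space left on device"). A minimal sketch that counts those failures in an alert log so a disk-full cause stands out quickly; the function name enospc_hits is hypothetical, and the search string is copied from the messages above:

```shell
#!/usr/bin/env bash
# Count LFI-01518 write failures with OSD return value 28 (ENOSPC on
# Linux) in a Clusterware alert log, then show the most recent one.
# Usage (assuming the 11.2 log location used above):
#   enospc_hits "$GRID_HOME/log/$(hostname -s)/alert$(hostname -s).log"
enospc_hits() {
  local log=$1
  local pattern='write() failed(OSD return value = 28)'
  local count
  # -F: match the text literally; -c: count matching lines
  count=$(grep -cF "$pattern" "$log")
  echo "ENOSPC write failures: $count"
  # Keep only the latest matching line
  grep -F "$pattern" "$log" | tail -n 1
}
```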

After spending a lot of time troubleshooting, I checked the free space on the server and realized the cause: the mount point holding the Grid home (/u01/app) was 100% full, so the CRS resources could not come up:

[root@oradev11 bin]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootlv
                      5.8G  1.8G  3.8G  33% /
tmpfs                 3.0G     0  3.0G   0% /dev/shm
/dev/sda1             190M   86M   95M  48% /boot
/dev/mapper/rootvg-homelv
                      2.0G  9.2M  1.8G   1% /home
/dev/mapper/rootvg-optlv
                      9.8G  2.0G  7.3G  22% /opt
/dev/mapper/rootvg-securlv
                      1.5G  211M  1.2G  16% /opt/security
/dev/mapper/rootvg-tmplv
                      2.0G  375M  1.5G  21% /tmp
/dev/mapper/rootvg-varlv
                      9.8G  1.1G  8.2G  12% /var
/dev/mapper/datavg-gridbaselv
                       50G   49G     0 100% /u01/app
/dev/mapper/datavg-rdbmsbaselv
                       50G  4.8G   42G  11% /u01/app/oracle
/dev/mapper/datavg-adrrepolv
                       50G  2.6G   45G   6% /oratrace
/dev/mapper/datavg-oemagentlv
                       20G  651M   18G   4% /u01/app/emagent
/dev/mapper/datavg-gglv
                       50G   52M   47G   1% /gg
/dev/mapper/datavg-dbawslv
                       99G   16G   79G  17% /oraworkspace
/dev/mapper/datavg-auditfslv
                       50G  230M   47G   1% /oradbaudit
/dev/mapper/datavg-dbtoolslv
                      9.8G   86M  9.2G   1% /oratools
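A full filesystem is easy to miss in a long df listing, especially when long /dev/mapper names wrap onto a second line as they do above. A minimal sketch that reports only mount points at or above a usage threshold; it reads POSIX "df -P" output (one line per filesystem) on stdin, and the function name full_mounts and the 90% default are illustrative choices, not anything from the original post:

```shell
#!/usr/bin/env bash
# List mount points whose usage is at or above a threshold (default 90%).
# Expects "df -P" output on stdin: the 5th column is Capacity ("100%")
# and the 6th is the mount point. Mount points containing spaces are
# not handled.
full_mounts() {
  local limit=${1:-90}
  awk -v limit="$limit" '
    NR > 1 {                          # skip the header line
      use = $5; sub(/%$/, "", use)    # "100%" -> "100"
      if (use + 0 >= limit) print $6, $5
    }
  '
}

# Usage: df -P | full_mounts 90
```

On the server above, this would have pointed straight at /u01/app.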

While checking whether anything on the /u01/app mount point could be deleted, I found that "crfclust.bdb" was consuming far more space than anything else:

[root@oradev11 bin]# cd ../crf/db
[root@oradev11 db]# ls -lrht
total 4.0K
drwxr-x--- 2 root oinstall 4.0K Apr 22 20:45 oradev11
[root@oradev11 db]# cd oradev11
[root@oradev11 oradev11]# ls -lrth
total 38G
-rw-r--r-- 1 root root 1.1M Sep  8  2014 08-SEP-2014-09:24:06.txt
-rw-r--r-- 1 root root 1.9M Sep  8  2014 08-SEP-2014-10:07:28.txt
-rw-r--r-- 1 root root 1.2M Sep  8  2014 08-SEP-2014-10:20:00.txt
-rw-r----- 1 root root 8.0K Nov 20 09:44 repdhosts.bdb
-rw-r--r-- 1 root root  74K Mar  9 10:53 09-MAR-2015-10:53:37.txt
-rw-r--r-- 1 root root 856K Mar  9 10:56 09-MAR-2015-10:56:42.txt
-rw-r--r-- 1 root root  77K Mar 13 19:21 13-MAR-2015-19:21:26.txt
-rw-r--r-- 1 root root 218K Mar 13 19:21 13-MAR-2015-19:21:44.txt
-rw-r----- 1 root root  16M Apr 22 12:19 log.0000007983
-rw-r----- 1 root root  24K Apr 22 20:42 __db.001
-rw-r--r-- 1 root root 115M Apr 22 20:42 oradev11.ldb
-rw-r----- 1 root root 8.0K Apr 22 20:43 crfconn.bdb
-rw-r--r-- 1 root root 777K Apr 22 20:45 22-APR-2015-20:45:53.txt
-rw-r----- 1 root root  56K Apr 22 20:56 __db.006
-rw-r----- 1 root root 392K Apr 22 20:56 __db.002
-rw-r----- 1 root root 812M Apr 22 20:56 crfloclts.bdb
-rw-r----- 1 root root 668M Apr 22 20:56 crfcpu.bdb
-rw-r----- 1 root root 743M Apr 22 20:56 crfalert.bdb
-rw-r----- 1 root root 526M Apr 22 20:56 crfts.bdb
-rw-r----- 1 root root 607M Apr 22 20:56 crfhosts.bdb
-rw-r----- 1 root root  34G Apr 22 20:56 crfclust.bdb
-rw-r----- 1 root root  16M Apr 22 20:56 log.0000007984
-rw-r----- 1 root root 1.2M Apr 22 20:56 __db.005
-rw-r----- 1 root root 2.1M Apr 22 20:56 __db.004
-rw-r----- 1 root root 2.6M Apr 22 20:56 __db.003
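Sorting by time with ls -lrth works, but when hunting for space it is quicker to list the biggest files directly. A minimal sketch that prints the N largest regular files under a directory, biggest first; it relies on GNU find's -printf (standard on Linux), and the function name largest_files and the repository path in the usage comment are assumptions based on the cd ../crf/db/oradev11 above:

```shell
#!/usr/bin/env bash
# Print the N largest regular files under a directory, biggest first,
# as "<size in bytes><TAB><path>".
# Usage: largest_files "$GRID_HOME/crf/db/$(hostname -s)" 5
largest_files() {
  local dir=$1 n=${2:-10}
  # %s = size in bytes, %p = path; sort numerically, largest first
  find "$dir" -type f -printf '%s\t%p\n' | sort -nr | head -n "$n"
}
```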

The output shows that crfclust.bdb alone accounts for 34G of the 38G in this directory, so I followed the steps in the Oracle Support note (Doc ID 1343105.1) to free up the space on the server.

First stop the ora.crf (Cluster Health Monitor) resource, then remove crfclust.bdb and check the free space:

[root@oradev11 bin]# ./crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'oradev11'
CRS-2677: Stop of 'ora.crf' on 'oradev11' succeeded
[root@oradev11 oradev11]# rm crfclust.bdb
[root@oradev11 oradev11]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootlv
                      5.8G  1.8G  3.8G  33% /
tmpfs                 3.0G  854M  2.2G  28% /dev/shm
/dev/sda1             190M   86M   95M  48% /boot
/dev/mapper/rootvg-homelv
                      2.0G  9.2M  1.8G   1% /home
/dev/mapper/rootvg-optlv
                      9.8G  2.0G  7.3G  22% /opt
/dev/mapper/rootvg-securlv
                      1.5G  211M  1.2G  16% /opt/security
/dev/mapper/rootvg-tmplv
                      2.0G  376M  1.5G  21% /tmp
/dev/mapper/rootvg-varlv
                      9.8G  1.1G  8.2G  12% /var
/dev/mapper/datavg-gridbaselv
                       50G   13G   34G  28% /u01/app
/dev/mapper/datavg-rdbmsbaselv
                       50G  4.8G   42G  11% /u01/app/oracle
/dev/mapper/datavg-adrrepolv
                       50G  2.6G   45G   6% /oratrace
/dev/mapper/datavg-oemagentlv
                       20G  651M   18G   4% /u01/app/emagent
/dev/mapper/datavg-gglv
                       50G   52M   47G   1% /gg
/dev/mapper/datavg-dbawslv
                       99G   16G   79G  17% /oraworkspace
/dev/mapper/datavg-auditfslv
                       50G  231M   47G   1% /oradbaudit
/dev/mapper/datavg-dbtoolslv
                      9.8G   86M  9.2G   1% /oratools
/dev/asm/ggatevol-387
                       20G  562M   20G   3% /gg/GG11

With /u01/app down from 100% to 28% used, start ora.crf again and recheck the resources:

[root@oradev11 bin]# ./crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'oradev11'
CRS-2676: Start of 'ora.crf' on 'oradev11' succeeded
[root@oradev11 bin]# ./crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       oradev11                 Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       oradev11
ora.crf
      1        ONLINE  ONLINE       oradev11
ora.crsd
      1        ONLINE  ONLINE       oradev11
ora.cssd
      1        ONLINE  ONLINE       oradev11
ora.cssdmonitor
      1        ONLINE  ONLINE       oradev11
ora.ctssd
      1        ONLINE  ONLINE       oradev11                 OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       oradev11
ora.evmd
      1        ONLINE  ONLINE       oradev11
ora.gipcd
      1        ONLINE  ONLINE       oradev11
ora.gpnpd
      1        ONLINE  ONLINE       oradev11
ora.mdnsd
      1        ONLINE  ONLINE       oradev11

All the resources are now up and running. (ora.diskmon has a TARGET of OFFLINE, so its OFFLINE state is expected.)
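The manual fix above (stop ora.crf, remove crfclust.bdb, start ora.crf, check space) can be sketched as a single function for repeat use. This is a minimal sketch mirroring those steps, not a replacement for the Oracle Support note. Assumptions: it runs as root, GRID_HOME is set, and the CHM repository is $GRID_HOME/crf/db/<short hostname> as on this server. The function name reset_chm_clust_db and the CRSCTL and CHM_REPO override variables are hypothetical, added only so the steps can be dry-run:

```shell
#!/usr/bin/env bash
# Repeat the manual steps: stop ora.crf, delete crfclust.bdb, restart
# ora.crf, then show free space. Run as root.
# Hypothetical overrides: CRSCTL (path to crsctl), CHM_REPO (repository dir).
reset_chm_clust_db() {
  local crsctl=${CRSCTL:-$GRID_HOME/bin/crsctl}
  local repo=${CHM_REPO:-$GRID_HOME/crf/db/$(hostname -s)}

  # 1. Stop the Cluster Health Monitor resource before touching its files
  "$crsctl" stop res ora.crf -init || return 1
  # 2. Remove the oversized crfclust.bdb file
  rm -f "$repo/crfclust.bdb" || return 1
  # 3. Start ora.crf again
  "$crsctl" start res ora.crf -init || return 1
  # 4. Show the space now available on the repository's filesystem
  df -h "$repo"
}
```

Each step stops the sequence on failure, so the file is never removed if ora.crf could not be stopped first.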

Reference:

Oracle Cluster Health Monitor (CHM) using large amount of space (more than default) (Doc ID 1343105.1)

Thursday, April 23, 2015
