Oracle ASM Diagnostics Script

As I mentioned in Oracle Tablespace Monitoring, I worked for a company that operated a very large internet site, where we were consuming space at a rate of 800GB every month. We were adding 4x400GB LUNs to our RAC cluster every two months, and this space needed to be added not only to the primary production site, but also to the snapshot DB, the reporting DB and the physical standby DB in a remote datacenter. Needless to say, I’ve seen a number of multipath/storage and ASM issues, and to make my life easier I developed an Oracle ASM Diagnostics Script that allowed me to perform some basic health-checks on the state of ASM GROUPs and DISKs as seen from the database side.

The script is called lasmdsk.sh and it does the following:

Check if ASM is running and parse the ASM instance ID (ASM1, ASM2, ASM3 and so on).

Parse the ASM ORACLE_HOME from /etc/oratab using the ASM instance ID from the previous step, and set the ORACLE_SID and ORACLE_HOME environment variables accordingly.

Run the asmcmd command and print the following attributes (see the sketch after this list):

  • State
  • Type
  • Rebal
  • Sector
  • Block
  • AU
  • Total_MB
  • Free_MB
  • Req_mir_free_MB
  • Usable_file_MB
  • Offline_disks
  • Voting_files
  • Name
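
These attributes line up with the columns reported by asmcmd’s lsdg subcommand, so the diskgroup summary most likely comes from something like this (a sketch; the exact invocation inside lasmdsk.sh may differ):

## with ORACLE_SID and ORACLE_HOME pointing at the ASM/Grid home
## (see the environment setup later in this post):
asmcmd lsdg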

And finally dig deep into the ASM data dictionary, joining gv$asm_disk and gv$asm_diskgroup to check for the most common issues we’ve seen in our shop while adding SAN-provisioned multipathed LUNs to our databases.
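
The exact query lives inside lasmdsk.sh, but a minimal sketch of that kind of join looks like this (the column list is illustrative; the outer join keeps candidate disks that don’t belong to a group yet):

select d.inst_id,
       g.name          group_name,
       g.state         group_state,
       d.name          disk_name,
       d.path,
       d.header_status,
       d.mode_status,
       d.state         disk_state
from   gv$asm_disk d
       left join gv$asm_diskgroup g
              on g.inst_id = d.inst_id
             and g.group_number = d.group_number
order  by d.inst_id, g.name, d.name;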

However, the true utility of this script is in how quickly and easily it allows me to filter the output of the above query. To really demonstrate this, let me give you a real example of how I add new LUNs under ASM and provision them to a real production disk group:

Let’s say I just created four new ASM DISKs (DATA_105, DATA_106, DATA_107 and DATA_108) using the sudo /etc/init.d/oracleasm createdisk command, and then ran scandisks and listdisks on all RAC nodes.
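
For reference, that provisioning step looks roughly like this (the device paths below are hypothetical; substitute your real multipath devices):

## on one node: label the new LUNs for ASMLib
for n in 105 106 107 108
do
   sudo /etc/init.d/oracleasm createdisk DATA_${n} /dev/mapper/mpath${n}p1
done

## then on every RAC node: pick up and list the new labels
sudo /etc/init.d/oracleasm scandisks
sudo /etc/init.d/oracleasm listdisks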

Now it’s time to verify that gv$asm_disk.HEADER_STATUS = PROVISIONED. I could set up the ORACLE_HOME, SID and PATH variables to point to the ASM/Grid OH, then log in to sqlplus and run a query selecting HEADER_STATUS from gv$asm_disk where NAME ...

hmm … do you see the problem?

I now have to use an IN or OR operator to get all 4 DISKs, because there is no common pattern to give to a LIKE operator unless I use a regex, and who is going to do that on the fly?
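
In other words, the hand-typed query ends up looking something like this (a sketch of the tedious way):

select inst_id, name, header_status
from   gv$asm_disk
where  name in ('DATA_105','DATA_106','DATA_107','DATA_108');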

Contrast this with my script instead:

wget https://s3.amazonaws.com/mve-shared/bin/lasmdsk.sh
./lasmdsk.sh "_105|_106|_107|_108"

Easy! And it works because inside the script I wrap the query in a shell function and then pipe its output to egrep, which does the filtering faster and easier than is possible inside Oracle:

get_asm | egrep "INST_ID|^--|${ASMDISK}"
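
If you’re curious what that wrapper might look like, here’s a minimal sketch (the real get_asm in lasmdsk.sh is more elaborate; note the escaped \$ so the shell doesn’t expand the view name inside the heredoc):

ASMDISK=${1}

get_asm () {
sqlplus -S / as sysasm <<EOF
set lines 200 pages 1000
select inst_id, name, path, header_status, state
from   gv\$asm_disk
order  by inst_id, name;
EOF
}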

Now I simply run ./lasmdsk.sh "_105|_106|_107|_108", check that HEADER_STATUS = PROVISIONED, and move on to the next step, which is creating a TEST disk group and adding the 4 new DISKs to it to make sure everything works as expected:

## figure out the local ASM instance name (e.g. +ASM1) from its pmon process
ASMID=`ps -ef | grep pmon | grep ASM | awk '{print $NF}' | sed 's/asm_pmon_//g'`
## look up the matching ORACLE_HOME in /etc/oratab
ORACLE_HOME=`grep ${ASMID} /etc/oratab | awk -F: '{print $2}'`
export ORACLE_HOME
PATH=$PATH:$ORACLE_HOME/bin
export PATH
ORACLE_SID=${ASMID}; export ORACLE_SID

sqlplus / as sysasm <<EOF
create diskgroup TESTGRP EXTERNAL REDUNDANCY
disk
   'ORCL:DATA_105'
,  'ORCL:DATA_106'
,  'ORCL:DATA_107'
,  'ORCL:DATA_108';
exit;
EOF

Next I mount the new TESTGRP group on the remaining nodes of the RAC cluster using another script of mine called mntgrp.sh, which I previously installed on all nodes in the RAC cluster using the following method:

for x in {2..6}
do
   scp mntgrp.sh racdb0${x}:/oracle/dba/bin/
done

which now allows me to call it with a mount|dismount option like so:

for x in {2..6}
do
   ssh racdb0${x} /oracle/dba/bin/mntgrp.sh TESTGRP mount
done
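
mntgrp.sh itself isn’t shown in this post, but a minimal sketch of such a wrapper could look like this (hypothetical; the real script surely has more error handling):

#!/bin/bash
## mntgrp.sh -- mount or dismount a diskgroup on the local ASM instance
GRP=${1}      ## diskgroup name, e.g. TESTGRP
ACTION=${2}   ## mount | dismount

ASMID=`ps -ef | grep pmon | grep ASM | awk '{print $NF}' | sed 's/asm_pmon_//g'`
ORACLE_HOME=`grep ${ASMID} /etc/oratab | awk -F: '{print $2}'`
PATH=$PATH:$ORACLE_HOME/bin
export ORACLE_HOME PATH ORACLE_SID=${ASMID}

sqlplus -S / as sysasm <<EOF
alter diskgroup ${GRP} ${ACTION};
exit;
EOF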

I then verify that the TESTGRP is mounted on all nodes in the cluster using the following:

./lasmdsk.sh TESTGRP

Next I proceed to check the ASM alert logs for any errors (actually I simply check our custom monitoring system that keeps an eye on these things), and if none are found I drop the TESTGRP and assign the 4 new DISKs to the real production DISK GROUP they were destined for. Here’s the remainder of the process:

Dismount the TESTGRP on all but the first node (the node where I am now):

for x in {2..6}
do
   ssh racdb0${x} /oracle/dba/bin/mntgrp.sh TESTGRP dismount
done

Take a storage-level snapshot of the database (this is a scripted process, so I will spare you the details as they are very specific to our environment and storage).

Once the snapshot is taken – drop the TESTGRP:

## first make sure it’s dismounted on all but this node
./lasmdsk.sh TESTGRP
## then sign into sqlplus as sysasm and issue:
drop diskgroup TESTGRP;

Verify that the HEADER_STATUS = FORMER using lasmdsk.sh:

./lasmdsk.sh "_105|_106|_107|_108"

and finally add the 4 DISKs to the real production data group as sysasm:

ALTER DISKGROUP PROD_DATA1 ADD
DISK
   'ORCL:DATA_105'
,  'ORCL:DATA_106'
,  'ORCL:DATA_107'
,  'ORCL:DATA_108';

alter diskgroup PROD_DATA1 rebalance power 10;
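
While the rebalance runs you can keep an eye on its progress; this check isn’t part of lasmdsk.sh, but it’s a natural companion (EST_MINUTES counts down to zero as ASM redistributes the data):

select inst_id, operation, state, power, sofar, est_work, est_minutes
from   gv$asm_operation;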

At this point I am almost done. I just do one final check on the state of DISKs and GROUPs using lasmdsk.sh:

./lasmdsk.sh "_105|_106|_107|_108"

I check above output for some key things that we watch out for:

  1. gv$asm_disk.PATH doesn’t match gv$asm_disk.NAME (something terribly wrong happened at OS level, rescan or bounce to clear it up)
  2. DISK belongs to a wrong GROUP (ditto)
  3. GROUP’s state is not what it is supposed to be (mount/dismount manually)
  4. DISK’s state is HUNG (should be NORMAL; solution: ALTER DISKGROUP name UNDROP DISKS)
  5. DISK’s header_status is other than MEMBER (PROVISIONED is OK for newly formatted unused/free disks)
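
Check number 1 can even be expressed as a quick SQL sanity test (assuming ASMLib-style paths, where PATH is simply ORCL: followed by the disk NAME; adjust for your naming scheme):

select inst_id, name, path
from   gv$asm_disk
where  path like 'ORCL:%'
and    path != 'ORCL:' || name;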

For example, we once caught an issue where the CRS DISKs were pointing to the wrong paths.

I hope this helps you with chasing down ASM/SAN-storage issues! And if you have any questions head over to our Oracle DBA Community and simply ask.

December 22, 2015

Posted In: Operations, Scripts
