
Pillar Axiom 500 SAN & NAS Storage
About a year and a half ago our shop made a decision to implement ORACLE RAC after it became apparent we will have to stop using SUN (Solaris) for our database needs. The cost of SUN hardware and ORACLE’s commitment to Linux were the final nails into SUN’s coffin. Oracle RAC was getting a lot of traction as more and more shops put their mission critical apps onto RAC.
I immediately got engaged in research and implementation of an ORACLE RAC Cluster. After the initial research was done it became apparent that we needed to decide on what shared storage to use. The two contenders we worked with were NetAPP and Pillar. Pillar won the bid because they offered something NetAPP couldn’t — closest integration with ORACLE RDBMS by offering Quality of Service (QOS) depending on data/access type. QOS and strong technical people that they brought to the table won us over.
It took a lot of time to get everyone on board with our decision to move away from SUN but eventually we built the right team and setup the QA cluster in a very short period of time. The cluster consists of 4 HP Proliant 380DL G5’s each with 16G RAM and dual E5440 procs running 64bit OEL 4.5. Shared storage is provided by the Pillar Axiom 500 SAN patched via dual brocade switches to two single port qla2400’s on each node. I think you’ve had enough of the geek talk for now?
I finished the installation and it was now time to burn in the hardware. I chose a simple test — load lots of data into a partitioned table and in the process hit all of the spindles on Pillar. I didn’t care about how realistic the test was I just wanted to put as much load on the components for long periods of time. Here’s the test case:
connect / as sysdba
grant connect, resource to perftst identified by perftst;
alter user perftst default tablespace users
temporary tablespace temp;
grant select on dba_objects to perftst ;
connect perftst/perftst
drop table pillar_test;
create table pillar_test(
pkey number(15) not null,
pdate date not null,
pchar varchar2(4000) not null)
partition by hash(pkey) (
partition pt_a tablespace users
, partition pt_b tablespace users
, partition pt_c tablespace users
, partition pt_d tablespace users
, partition pt_e tablespace users
, partition pt_f tablespace users
, partition pt_g tablespace users
, partition pt_h tablespace users);
drop sequence pillar_test_s;
create sequence pillar_test_s;
create or replace procedure pillar_test_proc(p_pass in number)
as
begin
for x in 1 .. p_pass
loop
insert into pillar_test
select pillar_test_s.nextval,
sysdate,
object_name||'.'||object_type||'.'||owner
from sys.dba_objects;
end loop;
end;
/
sqlplus /nolog <<EOF
connect perftst/perftst
set time on
set timing on
exec pillar_test_proc(1000);
exit
EOF
First I ran this test from a single node, then I span it onto four nodes monitoring the wait events which were recorded by turning sql-trace ON for one of the sessions … While doing that I got sucked into the performance aspect of this test because the timings were horrible when the test ran simultaneously on all four nodes. For example the first data load from a single node took 33 minutes (46043000 rows were created) and when the same data load ran simultaneously on four nodes it took 3 hours and 57 minutes each! It was kind of hard to ignore this so I decided to dig into the stats and try to understand what was causing this degradation in throughput. In the process I’ve learned invaluable information — here are the results:

Dual Brocades are used to connect to SAN
Test #1
1 node / 1 data load process
Waits:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
row cache lock 486489 0.00 19.20
DFS lock handle 341 0.49 22.87
Test #2
4 nodes / each doing it’s data load
Elapsed: 03:57:53.21
Elapsed: 03:56:53.87
Elapsed: 03:56:52.30
Elapsed: 03:55:01.58
Waits: (just ignore the low “Total Waited” numbers — I turned the trace late in the game):
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
row cache lock 42638 0.70 419.65
gc cr block busy 44301 0.91 133.91
db file sequential read 4992 0.73 66.54
enq: TX - contention 346 0.49 101.44
Notes: The biggest issue here was waiting on “row cache lock” my research showed that the culprit was the uncached sequence (see ML Note:4390868.8).
Solution:
alter sequence pillar_test_s cache 10000;
Test #3
This is the same as Test #2 but with cached seq.
database got corrupted (issue with SAN storage)
Fix:
upgrade pillar firmware
replace pillar CU1's mobo
Test #4
4 nodes / each doing it’s data load — this is the fist successful test after the sequence was cached.
Elapsed: 01:21:29.46
Elapsed: 01:21:12.16
Elapsed: 01:20:33.10
Elapsed: 01:20:49.23
Waits:
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 276718 0.38 2442.71
gc cr block busy 5631 0.33 342.28
Notes: This is a huge performance boost to the overall throughput in comparison to Test #2. We went down from almost 4 hours to 1 hour and 20 minutes average! Caching the sequence made an incredible difference.
Test #5
Same as Test #4 no changes just to further burn pillar in
Elapsed: 02:15:08.94
unable to extend table [USERS ts filled up]
Elapsed: 01:42:50.61
unable to extend table [USERS ts filled up]
Waits:
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
gc cr block busy 30473 1.00 1098.09
gc current block busy 64267 0.57 1978.75
gc current block 2-way 59767 0.43 541.29
db file sequential read 49086 0.20 329.69
gc cr block 2-way 1020376 0.02 237.45
gc current block 3-way 155133 0.62 1196.94
gc current retry 2526 0.22 140.10
gc buffer busy 13361 0.97 338.95
Notes: since the load had to be rolled back on two nodes this test was simply an exception, nevertheless, it’s a good example of what to expect when a large transaction is being rolled back in a 4-way cluster — you see a different set of wait events and the throughput of the other sessions are heavily hindered by the competing I/O.
Solution:
ALTER TABLESPACE USERS
ADD DATAFILE '+DATA(DATAFILE)' SIZE 10000M
AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED;
Test #6
Same as test 4 and 5 — no changes other than having more space in USERS tablespace
Elapsed: 01:14:21.17
Elapsed: 01:15:00.04
Elapsed: 01:11:37.21
Elapsed: 01:13:52.00
Waits:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
gc current block busy 29946 0.48 1408.29
gc buffer busy 16755 0.36 271.87
gc cr block busy 5939 0.87 474.33
gc current grant busy 20965 0.37 94.89
row cache lock 7815 0.88 163.39
db file sequential read 9472 0.23 91.23
Notes: Something interesting happened here — since we have to ignore the results of the Test #5 (due to rollback) lets focus on comparison with Test #4 which on average, ran 10 minutes longer and had the most waits on “db file sequential read” while this test waited mostly on “gc current block busy”.
Why would this be the case? There were only two changes between these two tests:
- addition of one “pre-extended” datafile
- Test #5 pre-extended the table and then rolled back all of the data
Perhaps the “db file sequential read” waits were on creating new extents for the partitions …
Test #8
Ever since I cached the sequence I never went back to retest a single load on a single server, so I figured it’s time to do that now:
Waits:
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 4673 0.06 15.85
gc current multi block request 52281 0.02 11.15
log file switch completion 139 0.27 19.18
Data file init write 4393 0.07 25.35
Notes: It’s called Test #8 because I ran it after #7 was already done but it important to show this result before we get to Test #7 and the conclusion of all this …
Test #7
One final test I wanted to do was to run four instances of the data-loading procedure all from the same NODE to see how it compares to the clustered test. Here are the results:
Elapsed: 00:37:57.90
Elapsed: 00:38:05.00
Elapsed: 00:38:41.37
Elapsed: 00:40:24.61
Waits:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
log buffer space 1169 0.24 45.19
db file sequential read 82460 0.26 195.84
buffer busy waits 2759 0.98 53.22
log file switch completion 622 0.97 107.32
Data file init write 9041 0.22 93.90
Notes: Totally different picture is painted here — first of all the timings are GREAT! 37-40 minutes per run — that’s the closest I got so far to the single run on a single NODE which took 23 minutes (Test #8).
Lets ignore the “log file switch completion” — that’s an easy fix. Lets take a look at what blocks we were hitting while waiting on “db file sequential read” since it’s the highest wait here …
bash-3.00$ grep "db file sequential read" xracq3_ora_15494.trc | head
WAIT #1: nam='db file sequential read' ela= 425 file#=8 block#=1500250 blocks=1 obj#=47469 tim=1190035629498382
WAIT #1: nam='db file sequential read' ela= 284 file#=8 block#=1481753 blocks=1 obj#=47185 tim=1190035629531711
WAIT #1: nam='db file sequential read' ela= 382 file#=8 block#=1500313 blocks=1 obj#=47469 tim=1190035629532250
WAIT #1: nam='db file sequential read' ela= 465 file#=8 block#=1500188 blocks=1 obj#=47469 tim=1190035629705544
WAIT #1: nam='db file sequential read' ela= 2428 file#=8 block#=1481754 blocks=1 obj#=47185 tim=1190035629716264
WAIT #1: nam='db file sequential read' ela= 418 file#=8 block#=1500251 blocks=1 obj#=47469 tim=1190035629750103
WAIT #1: nam='db file sequential read' ela= 2248 file#=8 block#=1481817 blocks=1 obj#=47185 tim=1190035629759915
WAIT #1: nam='db file sequential read' ela= 260 file#=8 block#=1500314 blocks=1 obj#=47469 tim=1190035629800526
WAIT #1: nam='db file sequential read' ela= 319 file#=8 block#=1500377 blocks=1 obj#=47469 tim=1190035629835047
WAIT #1: nam='db file sequential read' ela= 325 file#=8 block#=1481755 blocks=1 obj#=47185 tim=1190035629836766
bash-3.00$
Lets find the obj# (I am not interested in block# since I already knew they were all different — it’s an insert and I am not going to identify a hot block I am just trying to identify the objects we are waiting on with the “db file sequential read”):
grep "db file sequential read" xracq3_ora_15494.trc |
sed -e 's/.*file#=/ file /' \
-e 's/ block#=.*.blocks=.*.obj#=/ obj /' \
-e 's/ tim=.*//' |
head
bash-3.00$ grep "db file sequential read" xracq3_ora_15494.trc |
> sed -e 's/.*file#=/ file /' \
> -e 's/ block#=.*.blocks=.*.obj#=/ obj /' \
> -e 's/ tim=.*//' |
> head
file 8 obj 47469
file 8 obj 47185
file 8 obj 47469
file 8 obj 47469
file 8 obj 47185
file 8 obj 47469
file 8 obj 47185
file 8 obj 47469
file 8 obj 47469
file 8 obj 47185
bash-3.00$
And finally get counts of waits grouped by obj and file to find the heavy hitters:
grep "db file sequential read" xracq3_ora_15494.trc |
sed -e 's/.*file#=/ file /' \
-e 's/ block#=.*.blocks=.*.obj#=/ obj /' \
-e 's/ tim=.*//' |
sort |
uniq -c |
sort -nr
bash-3.00$ grep "db file sequential read" xracq3_ora_15494.trc |
> sed -e 's/.*file#=/ file /' \
> -e 's/ block#=.*.blocks=.*.obj#=/ obj /' \
> -e 's/ tim=.*//' |
> sort |
> uniq -c |
> sort -nr
11293 file 8 obj 47184
11225 file 8 obj 47185
10713 file 8 obj 47467
10706 file 8 obj 47186
10436 file 8 obj 47468
10228 file 8 obj 47469
9986 file 8 obj 47183
7774 file 8 obj 47182
26 file 5 obj 47185
20 file 5 obj 47183
18 file 5 obj 47186
16 file 5 obj 47182
15 file 5 obj 47469
12 file 5 obj 47467
10 file 5 obj 47184
8 file 5 obj 47468
4 file 5 obj 0
2 file 8 obj 0
bash-3.00$
Here are the heavy hitters:
11293 file 8 obj 47184
11225 file 8 obj 47185
10713 file 8 obj 47467
10706 file 8 obj 47186
10436 file 8 obj 47468
10228 file 8 obj 47469
9986 file 8 obj 47183
7774 file 8 obj 47182
Now lets see what they are:
col object_name format a20
col owner format a10
col subobject_name format a20
select object_name,subobject_name,
object_type,owner
from dba_objects
where object_id in (47184,
47185,
47467,
47186,
47468,
47469,
47183,
47182);
SQL> select object_name,subobject_name,
2 object_type,owner
3 from dba_objects
4 where object_id in (47184,
5 47185,
6 47467,
7 47186,
8 47468,
9 47469,
10 47183,
11 47182);
OBJECT_NAME SUBOBJECT_NAME OBJECT_TYPE OWNER
-------------------- -------------------- ------------------- ----------
PILLAR_TEST PT_H TABLE PARTITION PERFTST
PILLAR_TEST PT_G TABLE PARTITION PERFTST
PILLAR_TEST PT_F TABLE PARTITION PERFTST
PILLAR_TEST PT_E TABLE PARTITION PERFTST
PILLAR_TEST PT_D TABLE PARTITION PERFTST
PILLAR_TEST PT_C TABLE PARTITION PERFTST
PILLAR_TEST PT_B TABLE PARTITION PERFTST
PILLAR_TEST PT_A TABLE PARTITION PERFTST
8 rows selected.
SQL>
While not related to RAC it’s not something I expected here. Why would ORACLE want to do “db file sequential read” on the table we are INSERTing into? There are no indexes and no select clause that references this table … My guess is that ORACLE reads in each block before it can insert into it to see if there’s any space left there. Maybe it has something to do with ASSM (this tablespace is ASSM) or maybe it’s the HASH partitioning — I really don’t know. Whatever the reason is — there’s no denying it — INSERT INTO table_a incurs “db file sequential read” on table_a even if there are no indexes.
Conclusion
Going back to the subject of this page — Oracle RAC Scalability. Something became apparent to me throughout this test — ORACLE RAC doesn’t scale AT ALL when the same blocks are worked on by multiple instances. This is not something you will hear from ORACLE, instead it’s “On-demand scalability — Expand capacity by simply adding servers to your cluster” (copied from oracle.com’s RAC page). Perhaps it should say that ORACLE RAC will give you an opportunity to provide On-demand scalability when more nodes are added to the cluster. But that doesn’t sound as good as “On-demand scalability — Expand capacity by simply adding servers to your cluster” does it? No, because providing an opportunity implies some work to be done on your part instead of things being done automagically.
Update [SEP-02-2008]
I just ran another series of tests because I realized the following:
- When one of the Pillar Control Unit’s (CU) failed the multipath software directed all of the data to the secondary CU. When we replaced failed unit I never verified if the multipath was back to optimal using both CU’s. Thus there’s a possibility that we were only using a single data path even though multiple ASM LUNs were created and assigned to alternating control units.
- I never tested multiple datafile scenario to see if it would improve the overall throughput.
For this series of tests I verified multipath was back to optimal with both CU’s engaged and I created a secondary table on top of four datafiles to see if it would make any difference. Here’s the new DDL for both single and multi-datafile tests:
sqlplus /nolog
connect / as sysdba
CREATE SMALLFILE TABLESPACE single_dbf
DATAFILE '+DATA(DATAFILE)' SIZE 10000M
AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED
EXTENT MANAGEMENT LOCAL
SEGMENT SPACE MANAGEMENT AUTO;
CREATE SMALLFILE TABLESPACE multi_dbf
DATAFILE '+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED ,
'+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED ,
'+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED ,
'+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED
EXTENT MANAGEMENT LOCAL
SEGMENT SPACE MANAGEMENT AUTO;
connect perftst/perftst
drop table pillar_test_single_dbf;
create table pillar_test_single_dbf(
pkey number(15) not null,
pdate date not null,
pchar varchar2(4000) not null)
partition by hash(pkey) (
partition pt_a tablespace single_dbf
, partition pt_b tablespace single_dbf
, partition pt_c tablespace single_dbf
, partition pt_d tablespace single_dbf
, partition pt_e tablespace single_dbf
, partition pt_f tablespace single_dbf
, partition pt_g tablespace single_dbf
, partition pt_h tablespace single_dbf);
create or replace procedure pillar_test_proc_single_dbf(p_pass in number)
as
begin
for x in 1 .. p_pass
loop
insert into pillar_test_single_dbf
select pillar_test_s.nextval,
sysdate,
object_name||'.'||object_type||'.'||owner
from sys.dba_objects;
end loop;
end;
/
drop table pillar_test_multi_dbf;
create table pillar_test_multi_dbf(
pkey number(15) not null,
pdate date not null,
pchar varchar2(4000) not null)
partition by hash(pkey) (
partition pt_a tablespace multi_dbf
, partition pt_b tablespace multi_dbf
, partition pt_c tablespace multi_dbf
, partition pt_d tablespace multi_dbf
, partition pt_e tablespace multi_dbf
, partition pt_f tablespace multi_dbf
, partition pt_g tablespace multi_dbf
, partition pt_h tablespace multi_dbf);
create or replace procedure pillar_test_proc_multi_dbf(p_pass in number)
as
begin
for x in 1 .. p_pass
loop
insert into pillar_test_multi_dbf
select pillar_test_s.nextval,
sysdate,
object_name||'.'||object_type||'.'||owner
from sys.dba_objects;
end loop;
end;
/
SQL> select REFERENCED_NAME,REFERENCED_TYPE
2 from user_dependencies
3 where name = 'PILLAR_TEST_PROC_SINGLE_DBF';
REFERENCED_NAME REFERENCED_TYPE
---------------------------------------------------------------- -----------------
STANDARD PACKAGE
DBA_OBJECTS VIEW
SYS_STUB_FOR_PURITY_ANALYSIS PACKAGE
PILLAR_TEST_S SEQUENCE
PILLAR_TEST_SINGLE_DBF TABLE
SQL>
SQL> select REFERENCED_NAME,REFERENCED_TYPE
2 from user_dependencies
3 where name = 'PILLAR_TEST_PROC_MULTI_DBF';
REFERENCED_NAME REFERENCED_TYPE
---------------------------------------------------------------- -----------------
STANDARD PACKAGE
DBA_OBJECTS VIEW
SYS_STUB_FOR_PURITY_ANALYSIS PACKAGE
PILLAR_TEST_S SEQUENCE
PILLAR_TEST_MULTI_DBF TABLE
SQL>
SQL> r
1 select table_name,PARTITION_NAME,TABLESPACE_NAME
2 from user_tab_partitions
3* where table_name = 'PILLAR_TEST_SINGLE_DBF'
TABLE_NAME PARTITION_NAME TABLESPACE_NAME
------------------------------ ------------------------------ ------------------------------
PILLAR_TEST_SINGLE_DBF PT_A SINGLE_DBF
PILLAR_TEST_SINGLE_DBF PT_B SINGLE_DBF
PILLAR_TEST_SINGLE_DBF PT_C SINGLE_DBF
PILLAR_TEST_SINGLE_DBF PT_D SINGLE_DBF
PILLAR_TEST_SINGLE_DBF PT_E SINGLE_DBF
PILLAR_TEST_SINGLE_DBF PT_F SINGLE_DBF
PILLAR_TEST_SINGLE_DBF PT_G SINGLE_DBF
PILLAR_TEST_SINGLE_DBF PT_H SINGLE_DBF
8 rows selected.
SQL>
SQL> r
1 select table_name,PARTITION_NAME,TABLESPACE_NAME
2 from user_tab_partitions
3* where table_name = 'PILLAR_TEST_MULTI_DBF'
TABLE_NAME PARTITION_NAME TABLESPACE_NAME
------------------------------ ------------------------------ ------------------------------
PILLAR_TEST_MULTI_DBF PT_A MULTI_DBF
PILLAR_TEST_MULTI_DBF PT_B MULTI_DBF
PILLAR_TEST_MULTI_DBF PT_C MULTI_DBF
PILLAR_TEST_MULTI_DBF PT_D MULTI_DBF
PILLAR_TEST_MULTI_DBF PT_E MULTI_DBF
PILLAR_TEST_MULTI_DBF PT_F MULTI_DBF
PILLAR_TEST_MULTI_DBF PT_G MULTI_DBF
PILLAR_TEST_MULTI_DBF PT_H MULTI_DBF
8 rows selected.
SQL>
SQL> r
1 select file_name from dba_data_files
2* where tablespace_name = 'SINGLE_DBF'
FILE_NAME
---------------------------------------------
+DATA/xracq/datafile/single_dbf.270.663958519
SQL>
SQL> r
1 select file_name from dba_data_files
2* where tablespace_name = 'MULTI_DBF'
FILE_NAME
---------------------------------------------
+DATA/xracq/datafile/multi_dbf.271.663958699
+DATA/xracq/datafile/multi_dbf.272.663958819
+DATA/xracq/datafile/multi_dbf.273.663958937
+DATA/xracq/datafile/multi_dbf.274.663959053
SQL>
-- note I am reusing previously created sequence pillar_test_s ...
Create a shell script to simplify the test process:
## NOTE:
## /u02 is an OCFS2 fs accessible from all nodes
##
bash-3.00$ cat /u02/app/oracle/admin/scripts/bin/perf-test.sh
sqlplus /nolog <<EOF
connect perftst/perftst
set time on
set timing on
exec ${1}(1000);
exit
EOF
Test 1 (II)
## run 4 instances of pillar_test_proc_single_dbf ALL ON A SINGLE NODE
##
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
Elapsed: 00:38:07.44
Elapsed: 00:38:20.21
Elapsed: 00:38:48.51
Elapsed: 00:40:14.67
Test 2 (II)
## run pillar_test_proc_single_dbf on all 4 nodes
##
qarac1-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
qarac2-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
qarac3-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
qarac4-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
Elapsed: 00:52:29.73
Elapsed: 00:52:25.54
Elapsed: 00:51:20.80
Elapsed: 00:52:19.83
Test 3 (II)
## run 4 instances of pillar_test_proc_multi_dbf ALL ON A SINGLE NODE
##
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
Elapsed: 00:32:22.48
Elapsed: 00:32:49.22
Elapsed: 00:33:25.12
Elapsed: 00:33:28.65
Test 4 (II)
## now run pillar_test_proc_multi_dbf on all 4 nodes
##
qarac1-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
qarac2-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
qarac3-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
qarac4-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
Elapsed: 00:45:32.65
Elapsed: 00:47:47.58
Elapsed: 00:47:30.09
Elapsed: 00:47:01.17
Conclusion II
There’s no doubt that by ensuring both Control Units are utilized and by laying down a table on top of multiple data-files we were able to significantly improve the throughput of this test. Lets just review the most significant timings and the changes we made to improve them starting from the top:
+---+------------+-------------+--------------+---------------------------------+
| | Test Name | Nodes/Tests | Elapsed Time | Change from last test |
| | | (parallel) | (best) | |
+---+------------+-------------+--------------+---------------------------------+
|1. | Test #8 | One/One | 00:23:29.39 | -- |
|2. | Test #7 | One/Four | 00:37:57.90 | -- |
|3. | Test #2 | Four/Four | 03:57:53.21 | NOTE: this is what you'd get |
| | | | | if you went from a single |
| | | | | instance to a RAC and did |
| | | | | nothing at all |
| | | | | |
|4. | Test #4 | Four/Four | 01:20:33.10 | cache the sequence |
| | | | | |
|5. | Test 2 (II)| Four/Four | 00:51:20.80 | ensure data is passed via both |
| | | | | control units |
|6. | Test 3 (II)| One/Four | 00:32:22.48 | load data into a table which is |
| | | | | created on top of 4 data-files |
|7. | Test 4 (II)| Four/Four | 00:45:32.65 | -- |
+---+------------+-------------+--------------+---------------------------------+
The bottom line is that after all of the improvements the best time on a 4-way RAC cluster was 00:45:32.65 vs 00:32:22.48 on a single node (see 7 vs 6 above). That’s not bad you might think, but realize that we are still taking a 13 minute penalty with RAC even after doing all the right things … You might think “What’s 13 minutes? Nothing!” well yes, it’s not as severe as 3 hour and 20 minute penalty we initially took going directly from a single instance to a 4-way RAC cluster (see 2 vs 3 above) but realize that 13 minutes is a 40% penalty (on a 32 minute baseline)! Just think about it — 40% penalty!
The moral here is that an extreme care must be taken designing an application that goes on top of an Oracle RAC. And if you are responsible for porting an existing one you must carefully watch for the pitfalls and stress test like you’ve never stress tested before. Complex architecture of the RAC system creates multitude of unforseen scenarios which are very hard to predict during the QA phase.
Oracle RAC implementation must be taken very seriously by all team members not just the DBAs and SAs it’s not as simple as throwing couple of nodes at it whenever you hit the scalability ceiling. Realize that once you go live on an ORACLE RAC, the database, storage and applications become tightly coupled and will have to be carefully evaluated every time a change is made to the application. Is your organization ready for such change in deployment procedures? Only you can answer these questions, but it’s your job as a DBA to bring these to the table and to get your team involved with the process as early as possible.

4 way HP DL380 G5 ORACLE RAC Cluster (Front)

4 way HP DL380 G5 ORACLE RAC Cluster (Back)