You’ve been there haven’t you? You buy a product, use it, have an issue, cringe and call support. What comes next either breaks or makes your future relations with the company and the product they sold you. It’s at this point that you find out if they are your partner or you are on your own. With consumer products my expectation for a good one is to never break — I don’t ever want to have to call their support. But when it comes to the toys we use in data centers I want to hit all of the big issues during QA and I want it to break down so badly that it will stump the technical support team so we get a chance to test their infrastructure and protocols.

That’s exactly what happened to one of the key products we selected for our latest large scale implementation. It’s irrelevant to this article what that product is, what’s relevant is that the support organization experience was disappointing. I am not going to focus on what went wrong — it really isn’t that important, what’s important is that we got the message across and it appears we’ll have a meeting with the people that might be able to make a difference. And I really hope that they do — I believe this company has potential.

I was thinking about the upcoming meeting and what to address and here’s what I realized. It’s actually very simple — here’s my list of things I Need From a Support Organization of a technology company:

  1. Community
  2. Communication
  3. Response

Community

Community is where you go first if you have a non urgent question. Community needs to consist of three components:

  1. Knowledge Base
  2. Bug Database
  3. Forums

Knowledge Base needs to offer an up to date list of known issues and white papers that bridge the gap between official documentation and what customers are actually going through while doing hands on implementations. Bug Database is where you get a chance to feel the pulse of the company — it’s an integral part of the community and it doesn’t have to be an open Bug database that we see in the open source communities, no, just a read-only access where you get to see what kinds of issues development is working on these days. Through Forums you share experiences with other customers and learn best practices by picking the brains of the community gurus. Forums is what connects you with the rest of the customer base and gives you a chance to see the bigger picture, it also shows that the company is open about it’s business and is not afraid of public scrutiny.

Communication

Communication is one of the most critical aspects of a support organization. It needs to flow both ways — customer to support and support to the customer. Support needs to drive Communication, they need to deliver instant updates to your issue and they need to squeeze the last drop of information from you because even the smallest piece of information can play a huge role in how effective a solution will be or how it will help other customers that might be facing similar symptoms but have not yet identified the cause of the problem. Communication is the only thing that allows a good support organization to create strong Knowledge Base because it’s the only way to gage what you are really experiencing in the field.

For example a failure of their product might have an adverse affect on other products within your technology stack so it’s imperative for support to ask you for the error messages that you might have seen with the rest of the components so that these symptoms can be properly documented and published in the knowledge base as a white paper or an alert.

Response

Response is the most critical aspect of the support organization. The worst thing you as a customer can experience is when your requests are being ignored or lost in the queue. You need to know as soon as possible that someone is looking into the issue and it better NOT be an boilerplate response from an automated system. Response needs to include action plan — it’s not enough to simply say “We are working on it” — it should provide an action plan even if it involves you — the customer. Response goes hand in hand with communication and it needs to be recorded into a system that you can login to view history of events. Just like Communication, Response goes both ways and support needs to drive it — if a customer is delaying response to a request for information support needs to follow up immediately.

And there you have it — Community, Communication, Response = effective support organization.

Pillar Axiom 500 SAN & NAS Storage

Pillar Axiom 500 SAN & NAS Storage

About a year and a half ago our shop made a decision to implement ORACLE RAC after it became apparent we will have to stop using SUN (Solaris) for our database needs. The cost of SUN hardware and ORACLE’s commitment to Linux were the final nails into SUN’s coffin. Oracle RAC was getting a lot of traction as more and more shops put their mission critical apps onto RAC.

I immediately got engaged in research and implementation of an ORACLE RAC Cluster. After the initial research was done it became apparent that we needed to decide on what shared storage to use. The two contenders we worked with were NetAPP and Pillar. Pillar won the bid because they offered something NetAPP couldn’t — closest integration with ORACLE RDBMS by offering Quality of Service (QOS) depending on data/access type. QOS and strong technical people that they brought to the table won us over.

It took a lot of time to get everyone on board with our decision to move away from SUN but eventually we built the right team and setup the QA cluster in a very short period of time. The cluster consists of 4 HP Proliant 380DL G5’s each with 16G RAM and dual E5440 procs running 64bit OEL 4.5. Shared storage is provided by the Pillar Axiom 500 SAN patched via dual brocade switches to two single port qla2400’s on each node. I think you’ve had enough of the geek talk for now?

I finished the installation and it was now time to burn in the hardware. I chose a simple test — load lots of data into a partitioned table and in the process hit all of the spindles on Pillar. I didn’t care about how realistic the test was I just wanted to put as much load on the components for long periods of time. Here’s the test case:

connect / as sysdba
grant connect, resource to perftst identified by perftst;
alter user perftst default tablespace users
temporary tablespace temp;
grant select on dba_objects to perftst ;
 
connect perftst/perftst
 
drop table pillar_test;
 
create table pillar_test(
   pkey number(15) not null,
   pdate date not null,
   pchar varchar2(4000) not null)
partition by hash(pkey) (
   partition pt_a tablespace users
,  partition pt_b tablespace users
,  partition pt_c tablespace users
,  partition pt_d tablespace users
,  partition pt_e tablespace users
,  partition pt_f tablespace users
,  partition pt_g tablespace users
,  partition pt_h tablespace users);
 
drop sequence pillar_test_s;
create sequence pillar_test_s;
 
create or replace procedure pillar_test_proc(p_pass in number)
as
begin
   for x in 1 .. p_pass
   loop
      insert into pillar_test
      select pillar_test_s.nextval,
             sysdate,
             object_name||'.'||object_type||'.'||owner
             from sys.dba_objects;
   end loop;
end;
/
 
sqlplus /nolog <<EOF
connect perftst/perftst
set time on
set timing on
exec pillar_test_proc(1000);
exit
EOF

First I ran this test from a single node, then I span it onto four nodes monitoring the wait events which were recorded by turning sql-trace ON for one of the sessions … While doing that I got sucked into the performance aspect of this test because the timings were horrible when the test ran simultaneously on all four nodes. For example the first data load from a single node took 33 minutes (46043000 rows were created) and when the same data load ran simultaneously on four nodes it took 3 hours and 57 minutes each! It was kind of hard to ignore this so I decided to dig into the stats and try to understand what was causing this degradation in throughput. In the process I’ve learned invaluable information — here are the results:

Dual Brocades are used to connect to SAN

Dual Brocades are used to connect to SAN

Test #1

1 node / 1 data load process

   Elapsed: 00:33:35.29

Waits:

  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  row cache lock                             486489        0.00         19.20
  DFS lock handle                               341        0.49         22.87

Test #2

4 nodes / each doing it’s data load

   Elapsed: 03:57:53.21
   Elapsed: 03:56:53.87
   Elapsed: 03:56:52.30
   Elapsed: 03:55:01.58

Waits: (just ignore the low “Total Waited” numbers — I turned the trace late in the game):

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  row cache lock                              42638        0.70        419.65
  gc cr block busy                            44301        0.91        133.91
  db file sequential read                      4992        0.73         66.54
  enq: TX - contention                          346        0.49        101.44

Notes: The biggest issue here was waiting on “row cache lock” my research showed that the culprit was the uncached sequence (see ML Note:4390868.8).

Solution:

   alter sequence pillar_test_s cache 10000;

Test #3

This is the same as Test #2 but with cached seq.

   database got corrupted (issue with SAN storage)

Fix:

   upgrade pillar firmware
   replace pillar CU1's mobo

Test #4

4 nodes / each doing it’s data load — this is the fist successful test after the sequence was cached.

   Elapsed: 01:21:29.46
   Elapsed: 01:21:12.16
   Elapsed: 01:20:33.10
   Elapsed: 01:20:49.23

Waits:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                    276718        0.38       2442.71
  gc cr block busy                             5631        0.33        342.28

Notes: This is a huge performance boost to the overall throughput in comparison to Test #2. We went down from almost 4 hours to 1 hour and 20 minutes average! Caching the sequence made an incredible difference.

Test #5

Same as Test #4 no changes just to further burn pillar in

   Elapsed: 02:15:08.94
   unable to extend table  [USERS ts filled up]
   Elapsed: 01:42:50.61
   unable to extend table  [USERS ts filled up]

Waits:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  gc cr block busy                            30473        1.00       1098.09
  gc current block busy                       64267        0.57       1978.75
  gc current block 2-way                      59767        0.43        541.29
  db file sequential read                     49086        0.20        329.69
  gc cr block 2-way                         1020376        0.02        237.45
  gc current block 3-way                     155133        0.62       1196.94
  gc current retry                             2526        0.22        140.10
  gc buffer busy                              13361        0.97        338.95

Notes: since the load had to be rolled back on two nodes this test was simply an exception, nevertheless, it’s a good example of what to expect when a large transaction is being rolled back in a 4-way cluster — you see a different set of wait events and the throughput of the other sessions are heavily hindered by the competing I/O.

Solution:

      ALTER TABLESPACE USERS
      ADD DATAFILE '+DATA(DATAFILE)' SIZE 10000M
      AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED;

Test #6

Same as test 4 and 5 — no changes other than having more space in USERS tablespace

   Elapsed: 01:14:21.17
   Elapsed: 01:15:00.04
   Elapsed: 01:11:37.21
   Elapsed: 01:13:52.00

Waits:

  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  gc current block busy                       29946        0.48       1408.29
  gc buffer busy                              16755        0.36        271.87
  gc cr block busy                             5939        0.87        474.33
  gc current grant busy                       20965        0.37         94.89
  row cache lock                               7815        0.88        163.39
  db file sequential read                      9472        0.23         91.23

Notes: Something interesting happened here — since we have to ignore the results of the Test #5 (due to rollback) lets focus on comparison with Test #4 which on average, ran 10 minutes longer and had the most waits on “db file sequential read” while this test waited mostly on “gc current block busy”.

Why would this be the case? There were only two changes between these two tests:

  1. addition of one “pre-extended” datafile
  2. Test #5 pre-extended the table and then rolled back all of the data

Perhaps the “db file sequential read” waits were on creating new extents for the partitions …

Test #8

Ever since I cached the sequence I never went back to retest a single load on a single server, so I figured it’s time to do that now:

   Elapsed: 00:23:29.39

Waits:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                      4673        0.06         15.85
  gc current multi block request              52281        0.02         11.15
  log file switch completion                    139        0.27         19.18
  Data file init write                         4393        0.07         25.35

Notes: It’s called Test #8 because I ran it after #7 was already done but it important to show this result before we get to Test #7 and the conclusion of all this …

Test #7

One final test I wanted to do was to run four instances of the data-loading procedure all from the same NODE to see how it compares to the clustered test. Here are the results:

   Elapsed: 00:37:57.90
   Elapsed: 00:38:05.00
   Elapsed: 00:38:41.37
   Elapsed: 00:40:24.61

Waits:

  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  log buffer space                             1169        0.24         45.19
  db file sequential read                     82460        0.26        195.84
  buffer busy waits                            2759        0.98         53.22
  log file switch completion                    622        0.97        107.32
  Data file init write                         9041        0.22         93.90

Notes: Totally different picture is painted here — first of all the timings are GREAT! 37-40 minutes per run — that’s the closest I got so far to the single run on a single NODE which took 23 minutes (Test #8).

Lets ignore the “log file switch completion” — that’s an easy fix. Lets take a look at what blocks we were hitting while waiting on “db file sequential read” since it’s the highest wait here …

bash-3.00$ grep "db file sequential read" xracq3_ora_15494.trc | head
WAIT #1: nam='db file sequential read' ela= 425 file#=8 block#=1500250 blocks=1 obj#=47469 tim=1190035629498382
WAIT #1: nam='db file sequential read' ela= 284 file#=8 block#=1481753 blocks=1 obj#=47185 tim=1190035629531711
WAIT #1: nam='db file sequential read' ela= 382 file#=8 block#=1500313 blocks=1 obj#=47469 tim=1190035629532250
WAIT #1: nam='db file sequential read' ela= 465 file#=8 block#=1500188 blocks=1 obj#=47469 tim=1190035629705544
WAIT #1: nam='db file sequential read' ela= 2428 file#=8 block#=1481754 blocks=1 obj#=47185 tim=1190035629716264
WAIT #1: nam='db file sequential read' ela= 418 file#=8 block#=1500251 blocks=1 obj#=47469 tim=1190035629750103
WAIT #1: nam='db file sequential read' ela= 2248 file#=8 block#=1481817 blocks=1 obj#=47185 tim=1190035629759915
WAIT #1: nam='db file sequential read' ela= 260 file#=8 block#=1500314 blocks=1 obj#=47469 tim=1190035629800526
WAIT #1: nam='db file sequential read' ela= 319 file#=8 block#=1500377 blocks=1 obj#=47469 tim=1190035629835047
WAIT #1: nam='db file sequential read' ela= 325 file#=8 block#=1481755 blocks=1 obj#=47185 tim=1190035629836766
bash-3.00$

Lets find the obj# (I am not interested in block# since I already knew they were all different — it’s an insert and I am not going to identify a hot block I am just trying to identify the objects we are waiting on with the “db file sequential read”):

grep "db file sequential read" xracq3_ora_15494.trc |
   sed -e 's/.*file#=/  file /' \
       -e 's/ block#=.*.blocks=.*.obj#=/  obj /' \
       -e 's/ tim=.*//' |
   head
 
bash-3.00$ grep "db file sequential read" xracq3_ora_15494.trc |
>    sed -e 's/.*file#=/  file /' \
>        -e 's/ block#=.*.blocks=.*.obj#=/  obj /' \
>        -e 's/ tim=.*//' |
>    head
  file 8  obj 47469
  file 8  obj 47185
  file 8  obj 47469
  file 8  obj 47469
  file 8  obj 47185
  file 8  obj 47469
  file 8  obj 47185
  file 8  obj 47469
  file 8  obj 47469
  file 8  obj 47185
bash-3.00$

And finally get counts of waits grouped by obj and file to find the heavy hitters:

grep "db file sequential read" xracq3_ora_15494.trc |
   sed -e 's/.*file#=/  file /' \
       -e 's/ block#=.*.blocks=.*.obj#=/  obj /' \
       -e 's/ tim=.*//' |
   sort |
   uniq -c |
   sort -nr
 
bash-3.00$ grep "db file sequential read" xracq3_ora_15494.trc |
>    sed -e 's/.*file#=/  file /' \
>        -e 's/ block#=.*.blocks=.*.obj#=/  obj /' \
>        -e 's/ tim=.*//' |
>    sort |
>    uniq -c |
>    sort -nr
  11293   file 8  obj 47184
  11225   file 8  obj 47185
  10713   file 8  obj 47467
  10706   file 8  obj 47186
  10436   file 8  obj 47468
  10228   file 8  obj 47469
   9986   file 8  obj 47183
   7774   file 8  obj 47182
     26   file 5  obj 47185
     20   file 5  obj 47183
     18   file 5  obj 47186
     16   file 5  obj 47182
     15   file 5  obj 47469
     12   file 5  obj 47467
     10   file 5  obj 47184
      8   file 5  obj 47468
      4   file 5  obj 0
      2   file 8  obj 0
bash-3.00$

Here are the heavy hitters:

  11293   file 8  obj 47184
  11225   file 8  obj 47185
  10713   file 8  obj 47467
  10706   file 8  obj 47186
  10436   file 8  obj 47468
  10228   file 8  obj 47469
   9986   file 8  obj 47183
   7774   file 8  obj 47182

Now lets see what they are:

col object_name format a20
col owner format a10
col subobject_name format a20
 
select object_name,subobject_name,
       object_type,owner
  from dba_objects
 where object_id in (47184,
                     47185,
                     47467,
                     47186,
                     47468,
                     47469,
                     47183,
                     47182);
 
SQL> select object_name,subobject_name,
  2         object_type,owner
  3    from dba_objects
  4   where object_id in (47184,
  5                       47185,
  6                       47467,
  7                       47186,
  8                       47468,
  9                       47469,
 10                       47183,
 11                       47182);
 
OBJECT_NAME          SUBOBJECT_NAME       OBJECT_TYPE         OWNER
-------------------- -------------------- ------------------- ----------
PILLAR_TEST          PT_H                 TABLE PARTITION     PERFTST
PILLAR_TEST          PT_G                 TABLE PARTITION     PERFTST
PILLAR_TEST          PT_F                 TABLE PARTITION     PERFTST
PILLAR_TEST          PT_E                 TABLE PARTITION     PERFTST
PILLAR_TEST          PT_D                 TABLE PARTITION     PERFTST
PILLAR_TEST          PT_C                 TABLE PARTITION     PERFTST
PILLAR_TEST          PT_B                 TABLE PARTITION     PERFTST
PILLAR_TEST          PT_A                 TABLE PARTITION     PERFTST
 
8 rows selected.
 
SQL>

While not related to RAC it’s not something I expected here. Why would ORACLE want to do “db file sequential read” on the table we are INSERTing into? There are no indexes and no select clause that references this table … My guess is that ORACLE reads in each block before it can insert into it to see if there’s any space left there. Maybe it has something to do with ASSM (this tablespace is ASSM) or maybe it’s the HASH partitioning — I really don’t know. Whatever the reason is — there’s no denying it — INSERT INTO table_a incurs “db file sequential read” on table_a even if there are no indexes.

Conclusion

Going back to the subject of this page — Oracle RAC Scalability. Something became apparent to me throughout this test — ORACLE RAC doesn’t scale AT ALL when the same blocks are worked on by multiple instances. This is not something you will hear from ORACLE, instead it’s “On-demand scalability — Expand capacity by simply adding servers to your cluster” (copied from oracle.com’s RAC page). Perhaps it should say that ORACLE RAC will give you an opportunity to provide On-demand scalability when more nodes are added to the cluster. But that doesn’t sound as good as “On-demand scalability — Expand capacity by simply adding servers to your cluster” does it? No, because providing an opportunity implies some work to be done on your part instead of things being done automagically.

Update [SEP-02-2008]

I just ran another series of tests because I realized the following:

  1. When one of the Pillar Control Unit’s (CU) failed the multipath software directed all of the data to the secondary CU. When we replaced failed unit I never verified if the multipath was back to optimal using both CU’s. Thus there’s a possibility that we were only using a single data path even though multiple ASM LUNs were created and assigned to alternating control units.
  2. I never tested multiple datafile scenario to see if it would improve the overall throughput.

For this series of tests I verified multipath was back to optimal with both CU’s engaged and I created a secondary table on top of four datafiles to see if it would make any difference. Here’s the new DDL for both single and multi-datafile tests:

sqlplus /nolog
 
connect / as sysdba
 
CREATE SMALLFILE TABLESPACE single_dbf
DATAFILE '+DATA(DATAFILE)' SIZE 10000M 
AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED 
EXTENT MANAGEMENT LOCAL 
SEGMENT SPACE MANAGEMENT AUTO;
 
 
CREATE SMALLFILE TABLESPACE multi_dbf
DATAFILE '+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED ,
         '+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED ,
         '+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED ,
         '+DATA(DATAFILE)' SIZE 10000M AUTOEXTEND ON NEXT 1280K MAXSIZE UNLIMITED 
EXTENT MANAGEMENT LOCAL 
SEGMENT SPACE MANAGEMENT AUTO;
 
 
connect perftst/perftst
 
drop table pillar_test_single_dbf;
 
create table pillar_test_single_dbf(
   pkey number(15) not null,
   pdate date not null,
   pchar varchar2(4000) not null)
partition by hash(pkey) (
   partition pt_a tablespace single_dbf
,  partition pt_b tablespace single_dbf
,  partition pt_c tablespace single_dbf
,  partition pt_d tablespace single_dbf
,  partition pt_e tablespace single_dbf
,  partition pt_f tablespace single_dbf
,  partition pt_g tablespace single_dbf
,  partition pt_h tablespace single_dbf);
 
 
create or replace procedure pillar_test_proc_single_dbf(p_pass in number)
as
begin
   for x in 1 .. p_pass
   loop
      insert into pillar_test_single_dbf
      select pillar_test_s.nextval,
             sysdate,
             object_name||'.'||object_type||'.'||owner
             from sys.dba_objects;
   end loop;
end;
/
 
drop table pillar_test_multi_dbf;
 
create table pillar_test_multi_dbf(
   pkey number(15) not null,
   pdate date not null,
   pchar varchar2(4000) not null)
partition by hash(pkey) (
   partition pt_a tablespace multi_dbf
,  partition pt_b tablespace multi_dbf
,  partition pt_c tablespace multi_dbf
,  partition pt_d tablespace multi_dbf
,  partition pt_e tablespace multi_dbf
,  partition pt_f tablespace multi_dbf
,  partition pt_g tablespace multi_dbf
,  partition pt_h tablespace multi_dbf);
 
 
create or replace procedure pillar_test_proc_multi_dbf(p_pass in number)
as
begin
   for x in 1 .. p_pass
   loop
      insert into pillar_test_multi_dbf
      select pillar_test_s.nextval,
             sysdate,
             object_name||'.'||object_type||'.'||owner
             from sys.dba_objects;
   end loop;
end;
/
 
 
SQL> select REFERENCED_NAME,REFERENCED_TYPE
  2  from user_dependencies
  3  where name = 'PILLAR_TEST_PROC_SINGLE_DBF';
 
REFERENCED_NAME                                                  REFERENCED_TYPE
---------------------------------------------------------------- -----------------
STANDARD                                                         PACKAGE
DBA_OBJECTS                                                      VIEW
SYS_STUB_FOR_PURITY_ANALYSIS                                     PACKAGE
PILLAR_TEST_S                                                    SEQUENCE
PILLAR_TEST_SINGLE_DBF                                           TABLE
 
SQL>
SQL> select REFERENCED_NAME,REFERENCED_TYPE
  2  from user_dependencies
  3  where name = 'PILLAR_TEST_PROC_MULTI_DBF';
 
REFERENCED_NAME                                                  REFERENCED_TYPE
---------------------------------------------------------------- -----------------
STANDARD                                                         PACKAGE
DBA_OBJECTS                                                      VIEW
SYS_STUB_FOR_PURITY_ANALYSIS                                     PACKAGE
PILLAR_TEST_S                                                    SEQUENCE
PILLAR_TEST_MULTI_DBF                                            TABLE
 
SQL>
 
 
SQL> r
  1  select table_name,PARTITION_NAME,TABLESPACE_NAME
  2  from user_tab_partitions
  3* where table_name = 'PILLAR_TEST_SINGLE_DBF'
 
TABLE_NAME                     PARTITION_NAME                 TABLESPACE_NAME
------------------------------ ------------------------------ ------------------------------
PILLAR_TEST_SINGLE_DBF         PT_A                           SINGLE_DBF
PILLAR_TEST_SINGLE_DBF         PT_B                           SINGLE_DBF
PILLAR_TEST_SINGLE_DBF         PT_C                           SINGLE_DBF
PILLAR_TEST_SINGLE_DBF         PT_D                           SINGLE_DBF
PILLAR_TEST_SINGLE_DBF         PT_E                           SINGLE_DBF
PILLAR_TEST_SINGLE_DBF         PT_F                           SINGLE_DBF
PILLAR_TEST_SINGLE_DBF         PT_G                           SINGLE_DBF
PILLAR_TEST_SINGLE_DBF         PT_H                           SINGLE_DBF
 
8 rows selected.
 
SQL>
 
 
 
SQL> r
  1  select table_name,PARTITION_NAME,TABLESPACE_NAME
  2  from user_tab_partitions
  3* where table_name = 'PILLAR_TEST_MULTI_DBF'
 
TABLE_NAME                     PARTITION_NAME                 TABLESPACE_NAME
------------------------------ ------------------------------ ------------------------------
PILLAR_TEST_MULTI_DBF          PT_A                           MULTI_DBF
PILLAR_TEST_MULTI_DBF          PT_B                           MULTI_DBF
PILLAR_TEST_MULTI_DBF          PT_C                           MULTI_DBF
PILLAR_TEST_MULTI_DBF          PT_D                           MULTI_DBF
PILLAR_TEST_MULTI_DBF          PT_E                           MULTI_DBF
PILLAR_TEST_MULTI_DBF          PT_F                           MULTI_DBF
PILLAR_TEST_MULTI_DBF          PT_G                           MULTI_DBF
PILLAR_TEST_MULTI_DBF          PT_H                           MULTI_DBF
 
8 rows selected.
 
SQL>
 
 
SQL> r
  1  select file_name from dba_data_files
  2* where tablespace_name = 'SINGLE_DBF'
 
FILE_NAME
---------------------------------------------
+DATA/xracq/datafile/single_dbf.270.663958519
 
SQL>
 
 
SQL> r
  1  select file_name from dba_data_files
  2* where tablespace_name = 'MULTI_DBF'
 
FILE_NAME
---------------------------------------------
+DATA/xracq/datafile/multi_dbf.271.663958699
+DATA/xracq/datafile/multi_dbf.272.663958819
+DATA/xracq/datafile/multi_dbf.273.663958937
+DATA/xracq/datafile/multi_dbf.274.663959053
 
SQL>
 
-- note I am reusing previously created sequence pillar_test_s ...

Create a shell script to simplify the test process:

## NOTE: 
##    /u02 is an OCFS2 fs accessible from all nodes
##
 
bash-3.00$ cat /u02/app/oracle/admin/scripts/bin/perf-test.sh
sqlplus /nolog <<EOF
connect perftst/perftst
set time on
set timing on
exec ${1}(1000);
exit
EOF

Test 1 (II)

## run 4 instances of pillar_test_proc_single_dbf ALL ON A SINGLE NODE
##
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf &
 
Elapsed: 00:38:07.44
Elapsed: 00:38:20.21
Elapsed: 00:38:48.51
Elapsed: 00:40:14.67

Test 2 (II)

## run pillar_test_proc_single_dbf on all 4 nodes
##
qarac1-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
qarac2-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
qarac3-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
qarac4-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_single_dbf
 
Elapsed: 00:52:29.73
Elapsed: 00:52:25.54
Elapsed: 00:51:20.80
Elapsed: 00:52:19.83

Test 3 (II)

## run 4 instances of pillar_test_proc_multi_dbf ALL ON A SINGLE NODE
##
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
/u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf &
 
Elapsed: 00:32:22.48
Elapsed: 00:32:49.22
Elapsed: 00:33:25.12
Elapsed: 00:33:28.65

Test 4 (II)

## now run pillar_test_proc_multi_dbf on all 4 nodes
##
qarac1-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
qarac2-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
qarac3-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
qarac4-> /u02/app/oracle/admin/scripts/bin/perf-test.sh pillar_test_proc_multi_dbf
 
Elapsed: 00:45:32.65
Elapsed: 00:47:47.58
Elapsed: 00:47:30.09
Elapsed: 00:47:01.17

Conclusion II

There’s no doubt that by ensuring both Control Units are utilized and by laying down a table on top of multiple data-files we were able to significantly improve the throughput of this test. Lets just review the most significant timings and the changes we made to improve them starting from the top:

+---+------------+-------------+--------------+---------------------------------+
|   | Test Name  | Nodes/Tests | Elapsed Time | Change from last test           |
|   |            | (parallel)  | (best)       |                                 |
+---+------------+-------------+--------------+---------------------------------+
|1. | Test #8    | One/One     | 00:23:29.39  | --                              |
|2. | Test #7    | One/Four    | 00:37:57.90  | --                              |
|3. | Test #2    | Four/Four   | 03:57:53.21  | NOTE: this is what you'd get    |
|   |            |             |              | if you went from a single       |
|   |            |             |              | instance to a RAC and did       |
|   |            |             |              | nothing at all                  |
|   |            |             |              |                                 |
|4. | Test #4    | Four/Four   | 01:20:33.10  | cache the sequence              |
|   |            |             |              |                                 |
|5. | Test 2 (II)| Four/Four   | 00:51:20.80  | ensure data is passed via both  |
|   |            |             |              | control units                   |
|6. | Test 3 (II)| One/Four    | 00:32:22.48  | load data into a table which is |
|   |            |             |              | created on top of 4 data-files  |
|7. | Test 4 (II)| Four/Four   | 00:45:32.65  | --                              |
+---+------------+-------------+--------------+---------------------------------+

The bottom line is that after all of the improvements the best time on a 4-way RAC cluster was 00:45:32.65 vs 00:32:22.48 on a single node (see 7 vs 6 above). That’s not bad you might think, but realize that we are still taking a 13 minute penalty with RAC even after doing all the right things … You might think “What’s 13 minutes? Nothing!” well yes, it’s not as severe as 3 hour and 20 minute penalty we initially took going directly from a single instance to a 4-way RAC cluster (see 2 vs 3 above) but realize that 13 minutes is a 40% penalty (on a 32 minute baseline)! Just think about it — 40% penalty!

The moral here is that an extreme care must be taken designing an application that goes on top of an Oracle RAC. And if you are responsible for porting an existing one you must carefully watch for the pitfalls and stress test like you’ve never stress tested before. Complex architecture of the RAC system creates multitude of unforseen scenarios which are very hard to predict during the QA phase.

Oracle RAC implementation must be taken very seriously by all team members not just the DBAs and SAs it’s not as simple as throwing couple of nodes at it whenever you hit the scalability ceiling. Realize that once you go live on an ORACLE RAC, the database, storage and applications become tightly coupled and will have to be carefully evaluated every time a change is made to the application. Is your organization ready for such change in deployment procedures? Only you can answer these questions, but it’s your job as a DBA to bring these to the table and to get your team involved with the process as early as possible.

4 way HP DL380 G5 ORACLE RAC Cluster (Front)

4 way HP DL380 G5 ORACLE RAC Cluster (Front)

4 way HP DL380 G5 ORACLE RAC Cluster (Back)

4 way HP DL380 G5 ORACLE RAC Cluster (Back)

After running ORACLE on SUN for over 7 years and being the big proponent of it I have finally said enough is enough. Starting this summer our shop made a switch to Linux (HP) and so far I could not be happier. I just don’t get it with SUN anymore they keep digging a deeper whole by separating themselves further and further from ORACLE. And with ORACLE now offering Linux support I just don’t see why would someone not yet invested in SUN hardware would want to choose them over Linux.

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes (Front)

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes (Back)

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes (Back)