Making sense of ASH TIME_WAITED units

It’s widely believed that ASH (v$active_session_history.TIME_WAITED) reports its times in microseconds. It sure seems that way, but something is amiss, as my small test shows. Using two sessions, I created a test table in the first, inserted a row, committed, and then updated the row without committing; from the second session I tried to select the same row for update:

| session A > SQL> create table ash_time_waited_test (t number);
| session A > 
| session A > Table created.
| session A > 
| session A > SQL> insert into ash_time_waited_test values(1);
| session A > 
| session A > 1 row created.
| session A > 
| session A > SQL> commit;
| session A > 
| session A > Commit complete.
| session A > 
| session A > SQL> update ash_time_waited_test set t=2 where t=1;
| session A > 
| session A > 1 row updated.

| session B > SQL> !date
| session B > Thu Jul 30 18:20:15 PDT 2009
| session B > 
| session B > SQL> select * from ash_time_waited_test for update;

| session A > 
| session A > SQL> commit;
| session A > 
| session A > Commit complete.
| session A > 
| session A > SQL>
| session A > 

| session B > 
| session B >          T
| session B > ----------
| session B >          2
| session B > 
| session B > SQL> !date
| session B > Thu Jul 30 18:20:38 PDT 2009
| session B > 

What I found next was rather strange — the sum(TIME_WAITED) from ASH (v$active_session_history) did not add up to the total time session A held the lock on the test table.

First we get the SQL_ID for session B (the waiter) so we can tie into ASH.SQL_ID later:

SQL> select sql_id from v$sql where sql_text = 'select * from ash_time_waited_test for update';

SQL_ID
-------------
fpm89fyrsf6gv

Now, with the SQL_ID in hand, we can dig into ASH to see what waits our session experienced:

SQL> col SAMPLE_TIME format a25
SQL> col event format a30
SQL> select sample_time,EVENT,WAIT_TIME,TIME_WAITED from v$active_session_history
  2  where sql_id = 'fpm89fyrsf6gv'
  3  order by sample_time;

SAMPLE_TIME               EVENT                           WAIT_TIME TIME_WAITED
------------------------- ------------------------------ ---------- -----------
30-JUL-09 06.20.25.751 PM enq: TX - row lock contention           0      489233
30-JUL-09 06.20.26.761 PM enq: TX - row lock contention           0      489244
30-JUL-09 06.20.27.761 PM enq: TX - row lock contention           0      489242
30-JUL-09 06.20.28.761 PM enq: TX - row lock contention           0      489238
30-JUL-09 06.20.29.761 PM enq: TX - row lock contention           0      489244
30-JUL-09 06.20.30.761 PM enq: TX - row lock contention           0      489242
30-JUL-09 06.20.31.761 PM enq: TX - row lock contention           0      489244
30-JUL-09 06.20.32.761 PM enq: TX - row lock contention           0      489253
30-JUL-09 06.20.33.761 PM enq: TX - row lock contention           0      489233
30-JUL-09 06.20.34.761 PM enq: TX - row lock contention           0      489245
30-JUL-09 06.20.35.761 PM enq: TX - row lock contention           0      489245

11 rows selected.

SQL>

Based on the output above we know that the first sample that trapped this wait occurred at “06.20.25.751 PM” and the last one at “06.20.35.761 PM”. The difference (delta) between these two timestamps is just over 10 seconds:

SQL> select e.SAMPLE_TIME - s.SAMPLE_TIME
  2  from v$active_session_history s,
  3       v$active_session_history e
  4  where s.sample_id = (select min(sample_id) from v$active_session_history where sql_id = 'fpm89fyrsf6gv')
  5    and e.sample_id = (select max(sample_id) from v$active_session_history where sql_id = 'fpm89fyrsf6gv')
  6  ;

E.SAMPLE_TIME-S.SAMPLE_TIME
---------------------------------------------------------------------------
+000000000 00:00:10.010

SQL>

So far so good. Now let’s see what we get by summing up the TIME_WAITED:

SQL> select sum(TIME_WAITED) from v$active_session_history
  2  where sql_id = 'fpm89fyrsf6gv'
  3  ;

SUM(TIME_WAITED)
----------------
         5381663

Interesting, isn’t it? If ASH.TIME_WAITED were measured in microseconds, then the total wait time would be 5381663/1000000 = 5.381663 seconds, and as we know that’s not correct. In fact, we can prove this using another metric, the ELAPSED_TIME_* columns from DBA_HIST_SQLSTAT:

SQL> select ELAPSED_TIME_TOTAL, ELAPSED_TIME_DELTA,
  2  APWAIT_TOTAL, APWAIT_DELTA,
  3  CCWAIT_TOTAL, CCWAIT_DELTA
  4  from DBA_HIST_SQLSTAT
  5  where sql_id = 'fpm89fyrsf6gv'
  6  ;

ELAPSED_TIME_TOTAL ELAPSED_TIME_DELTA APWAIT_TOTAL APWAIT_DELTA CCWAIT_TOTAL CCWAIT_DELTA
------------------ ------------------ ------------ ------------ ------------ ------------
          11110172           11110172     11106973     11106973          254          254

SQL>

DBA_HIST_SQLSTAT is dead on — Elapsed Time Total = 11110172/1000000 = 11.110172 seconds … So what’s happening with ASH then? Why such a large (more than 2x) discrepancy?

I have my theory — I think that because an ASH sample is taken every second, and half of that second is actually spent processing the data, the TIME_WAITED gets truncated. Whatever the reason, one thing is for sure — ASH.TIME_WAITED, at least in this particular case, cannot be trusted.

The final question is: how do you then use ASH? How do you arrive at more or less accurate metrics, specifically for TIME_WAITED? One way is this — we know that ASH sampling takes place every second, so instead of relying on the buggy TIME_WAITED values we can rely on the actual number of times a SQL_ID appeared in ASH … If a SQL_ID was trapped by ASH 10 consecutive times with WAIT_TIME=0 (waiting), we know for sure that it waited for at least 10 seconds …
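Against the test case above, that estimate is as simple as counting samples. A minimal sketch (it floors the estimate at whole seconds and assumes the regular 1-second sample interval):

SQL> select sql_id, event, count(*) approx_seconds_waited
  2  from v$active_session_history
  3  where sql_id = 'fpm89fyrsf6gv'
  4    and wait_time = 0
  5  group by sql_id, event;

For our test this returns 11, which is a far better match for the ~11.1 seconds reported by DBA_HIST_SQLSTAT than the 5.38 seconds we got from sum(TIME_WAITED).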

UPDATE Aug 12, 2009

Graham Wood from Oracle explains the issue with ASH TIME_WAITED in his paper Sifting through the ASHes (see page 35). He also lists some other “ASH data gotchas” on page 27. What really caught my attention, though, was page 36, which covers “ASH vs AWR/Statspack” and specifically says that for “Counts/occurrence data” analysis you should go to AWR, not ASH.

And it really makes sense. After all, ASH is sampled every second, and while that sounds like “often enough”, it really isn’t. All you have to do is run sql_trace on a busy session and you’ll see that a lot goes on between the 1-second intervals — all of which would be missed by ASH. ASH is great at putting a spotlight on the “big issues”; it’s not so great for subtle, sub-second waits, which can and do add up to a considerable wait time.

The bottom line for me is that there’s no substitute for the good old sql_trace, but overall ASH is a huge improvement over what we had (or rather didn’t have) before. It definitely replaces my own monitoring scripts, which I’ve used for many years now, sampling v$session_wait where wait_time=0 every 5 seconds.
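For reference, the core of that DIY poll is just a query along these lines, run in a loop every few seconds (a sketch; the idle-event filter is simplified):

select sid, event, p1, p2, seconds_in_wait
from v$session_wait
where wait_time = 0
  and event not like 'SQL*Net message from client%';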

Sat, January 2, 2016: Update!

I am building a new monitoring framework that:

  1. Samples Oracle ASH every N minutes and captures the live heartbeat of the database.
  2. Allows you to define thresholds on top of this sampling and notifies you if something is amiss.
  3. Saves all this data for historical purposes so that when AWR is wiped out you have solid performance metrics years later.
  4. Monitors Oracle LOCKS.
  5. Monitors Oracle SPACE.
  6. Delivers this functionality in an easy to deploy (single binary) distribution that only takes a minute to install on a new host.

Does this sound like something you’d like in your shop? If yes – then I’d like to hear from you – sign up for the mailing list below to start the discussion and get the private beta as soon as it’s ready (no SPAM here, I guarantee it!):

EvenTorEx Mailing List

July 31, 2009

Posted In: Tuning


Notes from the 11g Upgrade & Best Practices hosted by Tom Kyte

I finally got out of the dungeon and made it to an Oracle-hosted event titled “Oracle Technology Day: 11g Upgrade & Best Practices”, hosted by Tom Kyte and the Oracle Center of Excellence (COE) folks in Los Angeles on June 16 ’09. And it was definitely worth it. I got a chance to finally meet Tom Kyte face to face, and I even had my original WROX “Expert One-on-One” signed by him. The place was packed, and I heard that the event organizers had to turn some people away because there was no room left.

The presentation was focused on the 11g upgrade — I guess Oracle has had it with us “conservative” DBAs who stick to our guns and don’t upgrade until the R2 is out. Well, they sure got my attention with the “Revolutionary New Capabilities brought to you by the upcoming 11g R2” … But before I get to that, I’ll run through the notes on the stuff that was most interesting to me as a DBA. Instead of stuffing the notepad someplace where I’d never find it, I figured it would be beneficial to both you and me if I wrote it up here:

Interesting stuff to research:
Quiesce
Incident Packaging
Total Recall
Data Masking

Capture & Replay Workload AKA Real Application Testing:
You can now achieve realistic testing with Database Capture and Replay. DB Capture is capable of recording bind variables, SCNs and the order of operations on your production instance, and using DB Replay you can then reproduce the exact same load on a test system, which allows you to compare metrics (v$..), test upgrades and even test capacity plans.

Database Capture is available as of 9.2.0.8 and up, and it can be used across platforms, so you can easily take a workload captured on 9.2.0.8 and replay it on an 11g test instance. Replay only works on 11g (11.1.0.7 — see Note:738538.1).
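For 11g-to-11g testing, the whole cycle is driven through two PL/SQL packages. A minimal sketch, assuming a directory object CAPDIR pointing at the capture directory (the names here are mine, not from the talk):

-- on the source database
exec DBMS_WORKLOAD_CAPTURE.START_CAPTURE(name => 'peak_load', dir => 'CAPDIR');
-- ... let the production workload run ...
exec DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE;

-- on the test database, after copying the capture files over
exec DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(capture_dir => 'CAPDIR');
exec DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(replay_name => 'peak_load', replay_dir => 'CAPDIR');
exec DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY;
-- start the wrc replay clients (see below), then:
exec DBMS_WORKLOAD_REPLAY.START_REPLAY;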

The core of Real Application Testing is SPA (SQL Performance Analyzer) [pronounced like “health SPA”] and the SQL Tuning Advisor. SPA can be used in conjunction with Replay to look for regressed SQL, allowing you to build acceptable optimizer plans for the new release of the database before you upgrade. Replay commits changes — use Flashback DB to revert and replay again. 10.2.0.5 Grid Control can automate the replay and restore of the database using the GUI.

11g R2 will be able to scale UP the workload so it can be effectively used for future capacity planning.

Advice: when making comparisons using these tools, rely on CPU TIME and BUFFER GETS rather than ELAPSED TIME.

Replay clients are called WRC. You can start multiple WRC processes to simulate concurrency. Filters can be used during Capture to limit what is recorded. 11g R2 will also have a selective replay capability.
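The wrc client itself is a command-line tool run on the test side. A quick sketch (the paths are hypothetical):

## ask wrc how many replay clients this capture needs
wrc mode=calibrate replaydir=/copy/db_replay/cap

## then start one or more clients; replay begins once START_REPLAY is issued
wrc system/manager mode=replay replaydir=/copy/db_replay/cap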

Data Recovery Advisor:
If you want a second pair of “eyes”, or have no idea what to do when your database requires recovery, Data Recovery Advisor might be able to help. Of course, this requires you to use RMAN to back up your database in the first place, but you should already be doing that anyway, right? It also requires an RMAN Catalog …

Flashback Data Archive:
Flashback is designed to “cure” the “right command / wrong database syndrome“, where you urgently want to UNDO a mistake.

9i – brought to us Flashback Query
10g – Flashback Tables and DB
11g – brings Flashback Data Archive & Transactions

Flashback Data Archive is enabled on a table-by-table basis: once a table is placed in the Data Archive, a shadow table is created where all changes are recorded and kept. It works through a new background process and has very little (if any) overhead because it mines the UNDO. This is the best option for auditing and should replace all of your DIY audit triggers. While audit triggers can be disabled or dropped, once a table is placed in the Flashback Data Archive it cannot be dropped even by the SYS DBA – a perfect AUDIT.
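The SQL side of it is tiny. A minimal sketch (the archive, tablespace and table names are hypothetical):

create flashback archive hr_audit
   tablespace fba_ts
   retention 5 year;

alter table hr.employees flashback archive hr_audit;

-- later, an ordinary flashback query transparently digs into the archive:
select * from hr.employees as of timestamp (systimestamp - interval '30' day);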

Revolutionary New Capabilities brought to you by the upcoming 11g R2:
Expect to hear a lot of buzz about 11g R2 by October ’09. 80% of the Open World “talk” will be about 11g R2. What can we expect? How about upgrading a database application online, using a new, “revolutionary” feature called an “EDITION”?

create EDITION v2;
alter session set EDITION = v2;
— from now on everything takes place in this new virtual schema called v2 —
[compile new PKGs]
[add indexes and tables ???] Tom mentioned later that EDITIONs only work for things that “live” in SYSTEM, i.e. compiled and virtual stuff, so this might not work for tables and indexes …
[v1/original is still fully operational and running]
[test v2]
[pause DB]
alter database set schema XYZ to V2 edition; [not real syntax …]
DONE.

Obviously, this will only work if you use PL/SQL. V2 becomes the current EDITION once it’s enabled. V1 can be kept around if you so desire. This is NOT source code control — it’s only designed to let you easily deploy database applications without disrupting your online users.
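For the record, this shipped in 11g R2 as Edition-Based Redefinition, and the final syntax ended up very close to the sketch above. A minimal example (the edition name is hypothetical):

create edition v2 as child of ora$base;
alter session set edition = v2;
-- compile the new package versions here; sessions on ora$base keep running the old code
alter database default edition = v2;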

11g Upgrade Best Practices:
Upgrading to 10g? Review Note:466181.1
Upgrading to 11g? Review Note:601807.1

Oracle support really, really wants you to use OCM (Oracle Configuration Manager). They also want you to use the new Metalink (I don’t use it, just in case you wanted to know …).

USE DBUA (I agree). DBUA == easy. Manual Upgrade == hard.

Backup your optimizer stats — see Note:465787.1

Don’t set any optimizer_* or db_multiblock_* params that you found on Google. This is BAD practice. Most of the default values are good for 99% of installations; don’t touch them.

Gather dictionary statistics before you upgrade: it can dramatically reduce the time it takes to upgrade your database.

Post upgrade, run the following (with a representative workload in between):
execute DBMS_STATS.GATHER_SYSTEM_STATS('START');
execute DBMS_STATS.GATHER_SYSTEM_STATS('STOP');

On 11g use AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT — it is 5-6 times faster and close to 100% accurate.
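In practice that just means letting DBMS_STATS pick the sample size. A sketch (the schema name is hypothetical):

begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size);
end;
/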

Export AWR repository BEFORE the upgrade.
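The export itself is done with a script that ships with the database; it prompts for the DBID, a snapshot range and a directory object, then writes a Data Pump dump file. The matching awrload.sql loads it into the new database:

SQL> @?/rdbms/admin/awrextr.sql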

Plan Stability:
On 11g use the Plan Stability feature — see DBMS_SPM. SPM allows you to create a SQL plan baseline where only known and accepted plans are used for a specific SQL statement. This can be either automated or set up manually by the DBA. SPM works in three phases:

  1. Capture (stored in SYSAUX)
  2. Selection (DBA marks what’s accepted)
  3. Evolution (evaluate all unverified plans to see if they are worth considering)

Capture: OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = true
Selection: OPTIMIZER_USE_SQL_PLAN_BASELINES = true + The Plan must be ENABLED=YES AND ACCEPTED=YES to be used by SPM.
Evolution: scheduled by the DBA via the DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE function.

See Note:456518.1.

You can also pack the stored SQL plan baselines on DEV and then unpack them into PROD. A great thing to ask of your vendors — provide the SQL plan baselines with all your code.

If you are going to 11g from 10g you can utilize an STS — create a SQL Tuning Set (STS) on 10g, unload it to a staging table and load it into 11g’s plan baseline.
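A minimal sketch of the 11g end of that move (the STS name is hypothetical; the STS itself is created on 10g with DBMS_SQLTUNE and carried over via its staging table):

declare
   n pls_integer;
begin
   n := dbms_spm.load_plans_from_sqlset(sqlset_name => 'sts_from_10g');
   dbms_output.put_line(n || ' plans loaded into the SQL plan baseline');
end;
/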

Some Random Bits and funny Pieces:
Evolution of APEX:
Browse (good) -> WebDB (good) -> Portal (messy) -> dead end
few years later …
HTML DB (good) -> APEX (better)

The guy who wrote APEX (Mike Hichwa) is also largely responsible for the existence of SQL*Plus and SQL Developer.

Bryn Llewellyn owns PLSQL.

DBMS_OUTPUT.put_line (danger)
sys.DBMS_OUTPUT.put_line (safe)

Tom Kyte:

This is a really big table. Really? How big? It’s really big — imagine the largest table and insert one more record into it — that’s how big.

Rexx is my favorite language …

:)

June 17, 2009

Posted In: Events


Oracle RAC’s share everything vs share nothing …

Google’s share-nothing approach to application development has led to the #1 search engine in both performance and functionality. Notice that I said “application development”, because for the share-nothing approach to work it needs to be built into the application from day one, not as an afterthought.

On the other end of the spectrum we have ERP apps, where designs with thousands of tables per module are the norm and the UNION ALL joins span a multi-page printout. In these types of applications Oracle RAC’s “share everything” approach is clearly superior, scratch that, it’s the only solution, period.

For an interesting read on this specific issue take a look at Kevin Closson’s post titled “Nearly Free or Not, GridSQL for EnterpriseDB is Simply Better Than Real Application Clusters. It is Shared-Nothing Architecture After All!” and a sort of reply to it by a blogger called “bonglonglong” titled “All in the assumptions”.

June 15, 2009

Posted In: RAC


What the heck is “snapshot standby”?

I was just reviewing Note:565535.1 Flashback Database Best Practices & Performance and came across the following paragraph under the title of “Measuring Impact of turning Flashback Database ON / Analyzing the Performance Impact“:

If you’re on RDBMS release 11g and have a physical standby, then you can enable flashback database on your physical standby and then convert it to a snapshot standby (this can also be done in 10.2 with a manual snapshot standby, but no redo is received and applied as in the 11g Snapshot Standby; see Dell 10g Snapshot Standby). Then when you run a workload against the snapshot standby it will be with flashback database logging enabled.

So what the heck is “snapshot standby”?

Essentially it allows you to convert your 11g physical standby to a “snapshot standby” so that you can “make changes to the primary and the snapshot standby in some common tables” and then “convert the snapshot standby back to a physical standby”. Makes sense? NO? OK, I know, how about this:

  • A snapshot standby database is a fully updateable standby database created by converting a physical standby database into a snapshot standby database.
  • A snapshot standby database receives and archives, but does not apply, redo data from a primary database. The redo data received from the primary database is applied once a snapshot standby database is converted back into a physical standby database, after discarding all local updates to the snapshot standby database.
  • A snapshot standby can be created from Enterprise Manager, the Data Guard Broker command line interface (DGMGRL) or from SQL*Plus.

Better? YES. This is probably THE reason to go 11g if you haven’t already. Basically you get an instant REPORTING/TEST server without losing your physical standby database. See: Using Snapshot Standby for an example.
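The conversion itself is a one-liner in each direction. A minimal sketch on the 11g standby (redo apply must be stopped and the database mounted first):

-- physical -> snapshot: open it up for read-write testing
alter database convert to snapshot standby;
alter database open;

-- ... test away; redo keeps arriving but is not applied ...

-- snapshot -> physical: discard the test changes
shutdown immediate
startup mount
alter database convert to physical standby;
-- then bounce to mount again and restart redo apply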

June 15, 2009

Posted In: Data Guard


’09 IOUG Collaborate – notes for Oracle DBAs

I just reviewed Tom Kyte’s presentation from the ’09 IOUG Collaborate event and made a list of things that are specifically interesting to an Oracle DBA, with links for further research into each feature.

I. Those managing PHP/Oracle shops will be happy to learn that there’s a new Database Resident Connection Pool [DRCP] that performs as well as a dedicated server (it really does) with the benefits of the shared server model.
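Getting started with DRCP is refreshingly simple. A sketch (the host and service names are hypothetical):

-- start the default pool, once, as SYSDBA
execute dbms_connection_pool.start_pool;

-- clients then ask for a pooled server, e.g. via EZConnect:
--    myhost.domain.com/orcl.domain.com:POOLED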

II. Are you still using TOAD? Perhaps it’s time to give Oracle SQL Developer a shot, because it offers a host of SQL/PLSQL tuning features built right in.

June 10, 2009

Posted In: Tuning


Formatting Corrupted Block Not Part of Any Segment

It’s been a while since my last post … but the issue I ran into last night is worth mentioning. Here’s what happened — one of my RMAN backups failed with:

   RMAN-00571: ===========================================================
   RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
   RMAN-00571: ===========================================================
   RMAN-03002: failure of backup plus archivelog command at 05/05/2009 23:16:15
   ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/czgddata/a_txn_data02.dbf

The corrupt block was being reported as part of a segment — a table:

   SQL> connect / as sysdba
   Connected.
   SQL>
   SQL>
   SQL> select owner, segment_name, segment_type
     2  from   dba_extents
     3  where  file_id = 401
     4  and    257532 between block_id and block_id + blocks - 1;
   
   OWNER
   ------------------------------
   SEGMENT_NAME
   --------------------------------------------------------------------------------
   SEGMENT_TYPE
   ------------------
   CZ
   CZ_DEVL_PROJECTS
   TABLE

But the data was still readable, and I was able to move the table to “map the corrupt block” outside of the segment boundaries:

   SQL> alter table CZ.CZ_DEVL_PROJECTS move;
   
   Table altered.
   
   SQL> select count(*) from CZ.CZ_DEVL_PROJECTS;
   
     COUNT(*)
   ----------
          312
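
One follow-up worth remembering: ALTER TABLE ... MOVE changes the table’s rowids, so every index on the moved table goes UNUSABLE and needs a rebuild (the index name below is hypothetical):

   SQL> select index_name, status from dba_indexes
     2  where table_owner = 'CZ' and table_name = 'CZ_DEVL_PROJECTS';
   
   SQL> alter index CZ.CZ_DEVL_PROJECTS_PK rebuild;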

Still, even after the move the block was reported corrupt by dbv, and I had to deploy a procedure from ML Note 336133.1, which I fully documented here –> How to Format Corrupted Block Not Part of Any Segment.

May 6, 2009

Posted In: Operations


What I Need From Support

You’ve been there, haven’t you? You buy a product, use it, have an issue, cringe and call support. What comes next either breaks or makes your future relations with the company and the product they sold you. It’s at this point that you find out whether they are your partner or you are on your own. With consumer products my expectation for a good one is to never break — I don’t ever want to have to call their support. But when it comes to the toys we use in data centers, I want to hit all of the big issues during QA, and I want it to break down so badly that it stumps the technical support team, so we get a chance to test their infrastructure and protocols.

That’s exactly what happened with one of the key products we selected for our latest large-scale implementation. It’s irrelevant to this article what that product is; what’s relevant is that the support organization experience was disappointing. I am not going to focus on what went wrong — it really isn’t that important. What’s important is that we got the message across, and it appears we’ll have a meeting with the people that might be able to make a difference. And I really hope that they do — I believe this company has potential.

I was thinking about the upcoming meeting and what to address and here’s what I realized. It’s actually very simple — here’s my list of things I Need From a Support Organization of a technology company:

  1. Community
  2. Communication
  3. Response

Community

Community is where you go first if you have a non urgent question. Community needs to consist of three components:

  1. Knowledge Base
  2. Bug Database
  3. Forums

The Knowledge Base needs to offer an up-to-date list of known issues, plus white papers that bridge the gap between the official documentation and what customers actually go through during hands-on implementations. The Bug Database is where you get a chance to feel the pulse of the company — it’s an integral part of the community, and it doesn’t have to be the kind of open bug database we see in open source communities; read-only access, where you get to see what kinds of issues development is working on these days, is enough. Through the Forums you share experiences with other customers and learn best practices by picking the brains of the community gurus. The Forums are what connect you with the rest of the customer base and give you a chance to see the bigger picture; they also show that the company is open about its business and is not afraid of public scrutiny.

Communication

Communication is one of the most critical aspects of a support organization. It needs to flow both ways — customer to support and support to customer. Support needs to drive Communication: they need to deliver instant updates on your issue, and they need to squeeze the last drop of information from you, because even the smallest piece of information can play a huge role in how effective a solution will be, or how it will help other customers that might be facing similar symptoms but have not yet identified the cause of the problem. Communication is the only thing that allows a good support organization to create a strong Knowledge Base, because it’s the only way to gauge what you are really experiencing in the field.

For example, a failure of their product might have an adverse effect on other products within your technology stack, so it’s imperative for support to ask you for the error messages you might have seen from the rest of the components, so that these symptoms can be properly documented and published in the knowledge base as a white paper or an alert.

Response

Response is the most critical aspect of a support organization. The worst thing you as a customer can experience is having your requests ignored or lost in the queue. You need to know as soon as possible that someone is looking into the issue, and it had better NOT be a boilerplate response from an automated system. A response needs to include an action plan — it’s not enough to simply say “We are working on it”; it should provide an action plan even if it involves you, the customer. Response goes hand in hand with Communication, and it needs to be recorded in a system you can log in to in order to view the history of events. Just like Communication, Response goes both ways and support needs to drive it — if a customer is delaying a response to a request for information, support needs to follow up immediately.

And there you have it — Community, Communication, Response = effective support organization.

August 14, 2008

Posted In: Operations


No more SUN

After running ORACLE on SUN for over 7 years, and being a big proponent of it, I have finally said enough is enough. Starting this summer our shop made the switch to Linux (HP), and so far I could not be happier. I just don’t get it with SUN anymore; they keep digging a deeper hole by separating themselves further and further from ORACLE. And with ORACLE now offering Linux support, I just don’t see why someone not yet invested in SUN hardware would want to choose them over Linux.

[Photos: overpriced SUN hardware replaced by powerful and reasonably priced HP boxes (front and back views)]

August 7, 2008

Posted In: Linux


Create Oracle 10gR2 standby database using Real Time Apply

  1. Configuration Details
  2. Configure Primary Database
  3. Create PHYSICAL standby
  4. Here’s how Real Time Apply works
  5. References

Configuration Details

PRIMARY site

hostname=zephir
ORACLE_SID=10GR2
DB_UNIQUE_NAME=leftcube

------------- $HOME/.profile ----------------
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
export ORACLE_SID=10GR2
export ORACLE_HOSTNAME=zephir.domain.com
umask 022
##export DISPLAY="zephir:0.0"
export PATH=$PATH:$ORACLE_HOME/bin:/usr/ccs/bin:/usr/bin:/etc:/usr/openwin/bin
PS1="`/usr/ucb/hostname`.$ORACLE_SID-> "
set -o vi
set history=50
###################################################################
# Oracle Alias' For Trace Files and Alert Files
###################################################################
alias   -x  pfile="cd /u01/app/oracle/admin/10GR2/pfile"
alias   -x  bdump="cd /u01/app/oracle/admin/10GR2/bdump"
alias   -x  cdump="cd /u01/app/oracle/admin/10GR2/cdump"
alias   -x  udump="cd /u01/app/oracle/admin/10GR2/udump"
###################################################################

STANDBY site


hostname=atlas
ORACLE_SID=10GR2
DB_UNIQUE_NAME=rightcube

------------- $HOME/.profile ----------------
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
export ORACLE_SID=10GR2
export ORACLE_HOSTNAME=atlas.domain.com
umask 022
##export DISPLAY="atlas:0.0"
export PATH=$PATH:$ORACLE_HOME/bin:/usr/ccs/bin:/usr/bin:/etc:/usr/openwin/bin
PS1="`/usr/ucb/hostname`.$ORACLE_SID-> "
set -o vi
set history=50
###################################################################
# Oracle Alias' For Trace Files and Alert Files
###################################################################
alias   -x  pfile="cd /u01/app/oracle/admin/10GR2/pfile"
alias   -x  bdump="cd /u01/app/oracle/admin/10GR2/bdump"
alias   -x  cdump="cd /u01/app/oracle/admin/10GR2/cdump"
alias   -x  udump="cd /u01/app/oracle/admin/10GR2/udump"
###################################################################

Configure Primary Database

Create a Password File (or verify you have one)


ls -l $ORACLE_HOME/dbs/orapw$ORACLE_SID
-rw-r-----   1 oracle   dba         1536 Oct  1 15:00 /u01/app/oracle/product/10.2.0/db_1/dbs/orapw10GR2

Enable Forced Logging


alter database force logging;

Configure Standby Redo Logs

NOTE: A standby redo log is required for the maximum protection and maximum availability modes and the LGWR ASYNC transport mode is recommended for all databases. Data Guard can recover and apply more redo data from a standby redo log than from archived redo log files alone.

The size of the current standby redo log files must exactly match the size of the current primary database online redo log files.

Minimally, the configuration should have one more standby redo log file group than the number of online redo log file groups on the primary database.

The recommended number of standby redo log file groups is:


   (maximum number of logfiles for each thread + 1) * maximum number of threads

For example, if the primary database has 2 log files for each thread and 2 threads, then 6 standby redo log file groups are needed on the standby database.

Although the standby redo log is only used when the database is running in the standby role, Oracle recommends that you create a standby redo log on the primary database so that the primary database can switch over quickly to the standby role without the need for additional DBA intervention.


SQL> col member format a35
SQL> set lines 132
SQL> set trims on
SQL>
SQL> r
  1* select group#,type,member from v$logfile

    GROUP# TYPE    MEMBER
---------- ------- -----------------------------------
         1 ONLINE  /u01/oradata/10GR2/redo01.log
         1 ONLINE  /u02/oradata/10GR2/redo01.log
         1 ONLINE  /u03/oradata/10GR2/redo01.log
         2 ONLINE  /u01/oradata/10GR2/redo02.log
         2 ONLINE  /u02/oradata/10GR2/redo02.log
         2 ONLINE  /u03/oradata/10GR2/redo02.log
         3 ONLINE  /u01/oradata/10GR2/redo03.log
         3 ONLINE  /u02/oradata/10GR2/redo03.log
         3 ONLINE  /u03/oradata/10GR2/redo03.log

9 rows selected.

SQL> select group#,THREAD#,BYTES,status from v$log;

    GROUP#    THREAD#      BYTES STATUS
---------- ---------- ---------- ----------------
         1          1   52428800 INACTIVE
         2          1   52428800 INACTIVE
         3          1   52428800 CURRENT

Based on the above information, create 3+1 = 4 new standby log groups, each with 3 members. You could do 2 members or just 1, it’s up to you; the main thing is that there need to be 4 standby groups. Multiple members simply add redundancy to the setup; they do not increase performance.

Note that I give each group’s members the same file name; this underscores that these files are multiplexed and identical.


alter database add standby logfile group 4
   ('/u01/oradata/10GR2/stby04.log',
    '/u02/oradata/10GR2/stby04.log',
    '/u03/oradata/10GR2/stby04.log')
size 52428800;

alter database add standby logfile group 5
   ('/u01/oradata/10GR2/stby05.log',
    '/u02/oradata/10GR2/stby05.log',
    '/u03/oradata/10GR2/stby05.log')
size 52428800;

alter database add standby logfile group 6
   ('/u01/oradata/10GR2/stby06.log',
    '/u02/oradata/10GR2/stby06.log',
    '/u03/oradata/10GR2/stby06.log')
size 52428800;

alter database add standby logfile group 7
   ('/u01/oradata/10GR2/stby07.log',
    '/u02/oradata/10GR2/stby07.log',
    '/u03/oradata/10GR2/stby07.log')
size 52428800;

Verify the standby redo log file groups were created


SQL> select group#,type,member from v$logfile;

    GROUP# TYPE    MEMBER
---------- ------- -----------------------------------
         1 ONLINE  /u01/oradata/10GR2/redo01.log
         1 ONLINE  /u02/oradata/10GR2/redo01.log
         1 ONLINE  /u03/oradata/10GR2/redo01.log
         2 ONLINE  /u01/oradata/10GR2/redo02.log
         2 ONLINE  /u02/oradata/10GR2/redo02.log
         2 ONLINE  /u03/oradata/10GR2/redo02.log
         3 ONLINE  /u01/oradata/10GR2/redo03.log
         3 ONLINE  /u02/oradata/10GR2/redo03.log
         3 ONLINE  /u03/oradata/10GR2/redo03.log
         4 STANDBY /u01/oradata/10GR2/stby04.log
         4 STANDBY /u02/oradata/10GR2/stby04.log

    GROUP# TYPE    MEMBER
---------- ------- -----------------------------------
         4 STANDBY /u03/oradata/10GR2/stby04.log
         5 STANDBY /u01/oradata/10GR2/stby05.log
         5 STANDBY /u02/oradata/10GR2/stby05.log
         5 STANDBY /u03/oradata/10GR2/stby05.log
         6 STANDBY /u01/oradata/10GR2/stby06.log
         6 STANDBY /u02/oradata/10GR2/stby06.log
         6 STANDBY /u03/oradata/10GR2/stby06.log
         7 STANDBY /u01/oradata/10GR2/stby07.log
         7 STANDBY /u02/oradata/10GR2/stby07.log
         7 STANDBY /u03/oradata/10GR2/stby07.log

21 rows selected.

SQL> select group#,thread#,sequence#,archived,status from v$standby_log;

    GROUP#    THREAD#  SEQUENCE# ARC STATUS
---------- ---------- ---------- --- ----------
         4          0          0 YES UNASSIGNED
         5          0          0 YES UNASSIGNED
         6          0          0 YES UNASSIGNED
         7          0          0 YES UNASSIGNED

SQL>

Setup Archive Log

NOTE: this is not the right setup for Data Guard, but it is a typical setup on our production systems, so I am simulating it here; it will be changed in the next step.


zephir.10GR2-> mkdir /u02/oradata/10GR2/arch
zephir.10GR2-> sqlplus /nolog

SQL> connect / as sysdba
Connected.

SQL> alter database close;

Database altered.

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /u01/app/oracle/product/10.2.0
                                                 /db_1/dbs/spfile10GR2.ora

SQL> alter system set log_archive_dest_1='LOCATION=/u02/oradata/10GR2/arch' scope=both;

System altered.

SQL> alter database archivelog;

Database altered.

SQL> shutdown immediate;

ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area  327155712 bytes
Fixed Size                  2029880 bytes
Variable Size             171968200 bytes
Database Buffers          150994944 bytes
Redo Buffers                2162688 bytes
Database mounted.
Database opened.
SQL>

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u02/oradata/10GR2/arch
Oldest online log sequence     28
Next log sequence to archive   30
Current log sequence           30
SQL>

SQL> !ls -l /u02/oradata/10GR2/arch
total 0

SQL> alter system switch logfile;

System altered.

SQL> r
  1* alter system switch logfile

System altered.

SQL> r
  1* alter system switch logfile

System altered.

SQL> !ls -l /u02/oradata/10GR2/arch
total 64826
-rw-r-----   1 oracle   dba      33161216 Oct  5 14:08 1_30_634834860.dbf
-rw-r-----   1 oracle   dba         1024 Oct  5 14:08 1_31_634834860.dbf
-rw-r-----   1 oracle   dba         3584 Oct  5 14:08 1_32_634834860.dbf

SQL>

Set Primary Database Initialization Parameters

In this case there are two boxes, both in the same cubicle. Since DB_UNIQUE_NAME is normally tied to a physical location, which removes any confusion when database roles are switched, I am setting DB_UNIQUE_NAME to:

leftcube AND rightcube

at this moment leftcube’s role is PRIMARY and rightcube’s role is STANDBY

DB_NAME on both will be 10GR2

TWO tns-aliases will be set up, one for each of these names.

Note that the example specifies the LGWR process and asynchronous (ASYNC) network transmission to transmit redo data in the LOG_ARCHIVE_DEST_2 initialization parameter. These are the recommended settings and require standby redo log files.


alter system set db_unique_name=leftcube scope=spfile;

alter system set log_archive_config='DG_CONFIG=(leftcube,rightcube)' scope=spfile;

alter system set log_archive_dest_1=
   'LOCATION=/u02/oradata/10GR2/arch
    VALID_FOR=(all_logfiles,all_roles)
    DB_UNIQUE_NAME=leftcube'
scope=spfile;

alter system set log_archive_dest_2=
   'SERVICE=rightcube LGWR ASYNC
    VALID_FOR=(online_logfiles,primary_role)
    DB_UNIQUE_NAME=rightcube'
scope=spfile;

alter system set log_archive_dest_state_1=ENABLE scope=spfile;
alter system set log_archive_dest_state_2=ENABLE scope=spfile;

NOTE: Specifying the initialization parameters shown above sets up the primary database to resolve gaps, convert new datafile and log file path names from a new primary database, and archive the incoming redo data when this database is in the standby role. With the initialization parameters for both the primary and standby roles set as described, none of the parameters need to change after a role transition.

I am not using DB_FILE_NAME_CONVERT because I always try to keep the directory structure exactly the same.


alter system set fal_server=rightcube scope=spfile;
alter system set fal_client=leftcube scope=spfile;
alter system set standby_file_management=AUTO scope=spfile;

## this one is just a DUMMY setup to avoid ORA-19527 see:
##    http://kb.dbatoolz.com/tp/2692.avoid_ora-19527_set_dummy_log_file_name_convert.html
##
alter system set log_file_name_convert='junk','junk' scope=spfile;

shutdown immediate;
startup;

AFTER you enable these settings, and until you have set up the actual standby database, you will be receiving the following errors in the alert log:


ORA-12154: TNS:could not resolve the connect identifier specified
LGWR: Error 12154 creating archivelog file 'rightcube'
LNS: Failed to archive log 2 thread 1 sequence 35 (12154)

This is due to LGWR not being able to send redo over to rightcube. IT’S OK, since log_archive_dest_2 has the ASYNC flag, which implicitly makes this destination OPTIONAL.
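
You can also keep an eye on the destination state from SQL instead of tailing the alert log (a quick check):

SQL> select dest_id, status, error from v$archive_dest_status where dest_id <= 2;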

Take a Hot Backup of the Primary Database

I used a custom backup script


zephir.10GR2-> pwd
/u01/app/oracle/admin/10GR2/backup

zephir.10GR2-> grep "Header" hotbk.sh
# $Header$ hotbk.sh 05-OCT-2007 2.1

zephir.10GR2-> grep bkdir= hotbk.sh
bkdir=/copy/db_backup/10GR2/dbfs

zephir.10GR2-> mkdir -p /copy/db_backup/10GR2/dbfs

zephir.10GR2-> ./hotbk.sh
zephir.10GR2-> Fri Oct  5 16:48:03 PDT 2007:    getting datafile list ...
Fri Oct  5 16:48:04 PDT 2007:   building backup script ...
mv: cannot access /copy/db_backup/10GR2/dbfs/recover.sh
Fri Oct  5 16:48:04 PDT 2007:   running backup script (logfile=/tmp/13363.hotbk.sh.log) ...

zephir.10GR2-> ls -l /copy/db_backup/10GR2/dbfs/
total 1831154
-rw-r-----   1 oracle   dba      14270464 Oct  5 16:48 2007_10_05.backup.ctl
-rw-r--r--   1 oracle   dba          212 Oct  5 16:48 recover.sh
-rw-r-----   1 oracle   dba      230694912 Oct  5 16:48 sysaux01.dbf
-rw-r-----   1 oracle   dba      471867392 Oct  5 16:48 system01.dbf
-rw-r-----   1 oracle   dba      214966272 Oct  5 16:48 undotbs01.dbf
-rw-r-----   1 oracle   dba      5251072 Oct  5 16:48 users01.dbf
zephir.10GR2-> cat /copy/db_backup/10GR2/dbfs/recover.sh
cp -p ./sysaux01.dbf /u03/oradata/10GR2/sysaux01.dbf
cp -p ./system01.dbf /u02/oradata/10GR2/system01.dbf
cp -p ./undotbs01.dbf /u03/oradata/10GR2/undotbs01.dbf
cp -p ./users01.dbf /u02/oradata/10GR2/users01.dbf
zephir.10GR2->

Create a Control File for the Standby Database


alter database create standby controlfile as
   '/copy/db_backup/10GR2/dbfs/stby_control.ctl';

alter system archive log current;

Prepare Initialization Parameter File for the Standby Database


## run this on PRIMARY
create pfile='/copy/db_backup/10GR2/dbfs/init10GR2.ora' from spfile;

## Set initialization parameters on the physical standby database
##
cp -p /copy/db_backup/10GR2/dbfs/init10GR2.ora \
      /copy/db_backup/10GR2/dbfs/init10GR2_stby.ora

vi /copy/db_backup/10GR2/dbfs/init10GR2_stby.ora
--------- MODIFY AS FOLLOWS --------
*.db_unique_name='RIGHTCUBE'

*.fal_client='RIGHTCUBE'
*.fal_server='LEFTCUBE'

*.log_archive_dest_1='LOCATION=/u02/oradata/10GR2/arch
    VALID_FOR=(all_logfiles,all_roles)
    DB_UNIQUE_NAME=rightcube'

*.log_archive_dest_2='SERVICE=leftcube LGWR ASYNC
    VALID_FOR=(online_logfiles,primary_role)
    DB_UNIQUE_NAME=leftcube'

zephir.10GR2-> diff /copy/db_backup/10GR2/dbfs/init10GR2.ora \
>                   /copy/db_backup/10GR2/dbfs/init10GR2_stby.ora
15c15
< *.db_unique_name='LEFTCUBE'
---
> *.db_unique_name='RIGHTCUBE'
17,18c17,18
< *.fal_client='LEFTCUBE'
< *.fal_server='RIGHTCUBE'
---
> *.fal_client='RIGHTCUBE'
> *.fal_server='LEFTCUBE'
23,25d22
<     DB_UNIQUE_NAME=leftcube'
< *.log_archive_dest_2='SERVICE=rightcube LGWR ASYNC
<     VALID_FOR=(online_logfiles,primary_role)
26a24,26
> *.log_archive_dest_2='SERVICE=leftcube LGWR ASYNC
>     VALID_FOR=(online_logfiles,primary_role)
>     DB_UNIQUE_NAME=leftcube'

Copy Files from the Primary System to the Standby System

NOTE: I used rdist; you can also use FTP or any other method.


## on standby verify primary host is added to .rhosts
atlas.10GR2-> grep zephir $HOME/.rhosts
zephir.domain.com oracle

## rdist all files over to standby host
zephir.10GR2-> rsh -l oracle -n atlas "mkdir -p /copy/db_backup/10GR2/dbfs"

zephir.10GR2-> rdist -c /copy/db_backup/10GR2/dbfs/* oracle@atlas:/copy/db_backup/10GR2/dbfs/
updating host atlas
installing: /copy/db_backup/10GR2/dbfs/2007_10_05.backup.ctl
installing: /copy/db_backup/10GR2/dbfs/init10GR2.ora
installing: /copy/db_backup/10GR2/dbfs/init10GR2_stby.ora
installing: /copy/db_backup/10GR2/dbfs/recover.sh
installing: /copy/db_backup/10GR2/dbfs/stby_control.ctl
installing: /copy/db_backup/10GR2/dbfs/sysaux01.dbf
installing: /copy/db_backup/10GR2/dbfs/system01.dbf
installing: /copy/db_backup/10GR2/dbfs/undotbs01.dbf
installing: /copy/db_backup/10GR2/dbfs/users01.dbf
zephir.10GR2->

## rdist archive logs
zephir.10GR2-> rsh -l oracle -n atlas "mkdir -p /u02/oradata/10GR2/arch"

zephir.10GR2-> rdist -c /u02/oradata/10GR2/arch/* oracle@atlas:/u02/oradata/10GR2/arch/
updating host atlas
installing: /u02/oradata/10GR2/arch/1_30_634834860.dbf
installing: /u02/oradata/10GR2/arch/1_31_634834860.dbf
installing: /u02/oradata/10GR2/arch/1_32_634834860.dbf
installing: /u02/oradata/10GR2/arch/1_33_634834860.dbf
installing: /u02/oradata/10GR2/arch/1_34_634834860.dbf
installing: /u02/oradata/10GR2/arch/1_35_634834860.dbf
installing: /u02/oradata/10GR2/arch/1_36_634834860.dbf

## copy password file
zephir.10GR2-> cd /u01/app/oracle/product/10.2.0/db_1/dbs/

zephir.10GR2-> rdist -c orapw10GR2 oracle@atlas:/u01/app/oracle/product/10.2.0/db_1/dbs/orapw10GR2
updating host atlas
installing: orapw10GR2

Configure listeners for the primary and standby databases.


## primary (leftcube)
zephir.10GR2-> cat listener.ora

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = 10GR2.DOMAIN.COM)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (SID_NAME = 10GR2)
    )
  )

LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = zephir.domain.com)(PORT = 1521))
  )


zephir.10GR2-> cat tnsnames.ora
rightcube =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = atlas.domain.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SID = 10GR2)
    )
  )

10GR2 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = zephir.domain.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = 10GR2.domain.com)
    )
  )




## standby (rightcube)

atlas.10GR2-> cat listener.ora

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = 10GR2.DOMAIN.COM)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (SID_NAME = 10GR2)
    )
  )

LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = atlas.domain.com)(PORT = 1521))
  )


atlas.10GR2-> cat tnsnames.ora
leftcube =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = zephir.domain.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SID = 10GR2)
    )
  )

10GR2 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = atlas.domain.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = 10GR2.domain.com)
    )
  )

Startup Listener Services On Both Standby And Primary


zephir.10GR2-> lsnrctl start
atlas.10GR2-> lsnrctl start

# test TNS aliases

zephir.10GR2-> tnsping rightcube

TNS Ping Utility for Solaris: Version 10.2.0.3.0 - Production on 05-OCT-2007 17:44:43

Copyright (c) 1997, 2006, Oracle.  All rights reserved.

Used parameter files:


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL =
TCP)(HOST = atlas.domain.com)(PORT = 1521))) (CONNECT_DATA = (SID = 10GR2)))
OK (0 msec)


atlas.10GR2-> tnsping leftcube

TNS Ping Utility for Solaris: Version 10.2.0.3.0 - Production on 05-OCT-2007 17:45:09

Copyright (c) 1997, 2006, Oracle.  All rights reserved.

Used parameter files:


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL =
TCP)(HOST = zephir.domain.com)(PORT = 1521))) (CONNECT_DATA = (SID = 10GR2)))
OK (0 msec)

Create PHYSICAL standby

Create server parameter file on the standby database.


atlas.10GR2-> ls -l /copy/db_backup/10GR2/dbfs/
total 1859050
-rw-r-----   1 oracle   dba      14270464 Oct  5 16:48 2007_10_05.backup.ctl
-rw-r--r--   1 oracle   dba         1391 Oct  5 16:56 init10GR2.ora
-rw-r--r--   1 oracle   dba         1391 Oct  5 17:06 init10GR2_stby.ora
-rw-r--r--   1 oracle   dba          212 Oct  5 16:48 recover.sh
-rw-r-----   1 oracle   dba      14270464 Oct  5 16:55 stby_control.ctl
-rw-r-----   1 oracle   dba      230694912 Oct  5 16:48 sysaux01.dbf
-rw-r-----   1 oracle   dba      471867392 Oct  5 16:48 system01.dbf
-rw-r-----   1 oracle   dba      214966272 Oct  5 16:48 undotbs01.dbf
-rw-r-----   1 oracle   dba      5251072 Oct  5 16:48 users01.dbf
atlas.10GR2->


atlas.10GR2-> sqlplus /nolog

SQL*Plus: Release 10.2.0.3.0 - Production on Fri Oct 5 17:46:49 2007

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.

SQL> connect / as sysdba
Connected to an idle instance.
SQL>
SQL>
SQL> !ls -l /u01/app/oracle/product/10.2.0/db_1/dbs
total 48
-rw-r-----   1 oracle   dba         8385 Sep 11  1998 init.ora
-rw-r--r--   1 oracle   dba        12920 May  3  2001 initdw.ora
-rw-r-----   1 oracle   dba         1536 Oct  1 15:00 orapw10GR2

SQL> create spfile from pfile='/copy/db_backup/10GR2/dbfs/init10GR2_stby.ora';

File created.

SQL> !ls -l /u01/app/oracle/product/10.2.0/db_1/dbs
total 56
-rw-r-----   1 oracle   dba         8385 Sep 11  1998 init.ora
-rw-r--r--   1 oracle   dba        12920 May  3  2001 initdw.ora
-rw-r-----   1 oracle   dba         1536 Oct  1 15:00 orapw10GR2
-rw-r-----   1 oracle   dba         3584 Oct  5 17:48 spfile10GR2.ora

Copy Standby Controlfile To Its Proper Location


SQL> !grep control /u01/app/oracle/product/10.2.0/db_1/dbs/spfile10GR2.ora
*.control_files='/u01/oradata/10GR2/control01.ctl',
   '/u02/oradata/10GR2/control02.ctl','/u03/oradata/10GR2/control03.ctl'

SQL> !cp -p /copy/db_backup/10GR2/dbfs/stby_control.ctl /u01/oradata/10GR2/control01.ctl

SQL> !cp -p /copy/db_backup/10GR2/dbfs/stby_control.ctl /u02/oradata/10GR2/control02.ctl

SQL> !cp -p /copy/db_backup/10GR2/dbfs/stby_control.ctl /u03/oradata/10GR2/control03.ctl

Copy All Datafiles To Their Locations


SQL> !cat /copy/db_backup/10GR2/dbfs/recover.sh
cp -p ./sysaux01.dbf /u03/oradata/10GR2/sysaux01.dbf
cp -p ./system01.dbf /u02/oradata/10GR2/system01.dbf
cp -p ./undotbs01.dbf /u03/oradata/10GR2/undotbs01.dbf
cp -p ./users01.dbf /u02/oradata/10GR2/users01.dbf


SQL> !cd /copy/db_backup/10GR2/dbfs/; sh ./recover.sh

SQL> !find /*/oradata/10GR2 -type f -ls
95290 13944 -rw-r-----   1 oracle   dba      14270464 Oct  5 18:01 /u01/oradata/10GR2/control01.ctl
 5829 32408 -rw-r-----   1 oracle   dba      33161216 Oct  5 14:08 /u02/oradata/10GR2/arch/1_30_634834860.dbf
 5830    1 -rw-r-----   1 oracle   dba          1024 Oct  5 14:08 /u02/oradata/10GR2/arch/1_31_634834860.dbf
 5831    4 -rw-r-----   1 oracle   dba          3584 Oct  5 14:08 /u02/oradata/10GR2/arch/1_32_634834860.dbf
 5832  192 -rw-r-----   1 oracle   dba        187904 Oct  5 14:51 /u02/oradata/10GR2/arch/1_33_634834860.dbf
 5833  144 -rw-r-----   1 oracle   dba        134656 Oct  5 15:01 /u02/oradata/10GR2/arch/1_34_634834860.dbf
 5834 1648 -rw-r-----   1 oracle   dba       1678336 Oct  5 16:48 /u02/oradata/10GR2/arch/1_35_634834860.dbf
 5835   12 -rw-r-----   1 oracle   dba         11776 Oct  5 16:55 /u02/oradata/10GR2/arch/1_36_634834860.dbf
 5836 13944 -rw-r-----   1 oracle   dba      14270464 Oct  5 18:01 /u02/oradata/10GR2/control02.ctl
 5837 461048 -rw-r-----   1 oracle   dba      471867392 Oct  5 16:48 /u02/oradata/10GR2/system01.dbf
 5838 5136 -rw-r-----   1 oracle   dba       5251072 Oct  5 16:48 /u02/oradata/10GR2/users01.dbf
 5905 13944 -rw-r-----   1 oracle   dba      14270464 Oct  5 18:01 /u03/oradata/10GR2/control03.ctl
 5906 225408 -rw-r-----   1 oracle   dba      230694912 Oct  5 16:48 /u03/oradata/10GR2/sysaux01.dbf
 5907 210040 -rw-r-----   1 oracle   dba      214966272 Oct  5 16:48 /u03/oradata/10GR2/undotbs01.dbf

Start The Physical Standby Database


SQL> startup mount;
ORACLE instance started.

Total System Global Area  327155712 bytes
Fixed Size                  2029880 bytes
Variable Size             192939720 bytes
Database Buffers          130023424 bytes
Redo Buffers                2162688 bytes
Database mounted.
SQL>

Create Standby Redo Files

NOTE: this is a pre-req to enable “Real Time Apply”:

  1. this should match exactly what was done on the primary database (see “Configure Standby Redo Logs” above)
  2. this is optional; in fact, the first time around I forgot to create these logs on the standby and the recovery process still worked, it just reported the following error: RFS[1]: Unable to open standby log 5: 313 and used ARCHIVE logs instead

alter database add standby logfile group 4
   ('/u01/oradata/10GR2/stby04.log',
    '/u02/oradata/10GR2/stby04.log',
    '/u03/oradata/10GR2/stby04.log')
size 52428800;

alter database add standby logfile group 5
   ('/u01/oradata/10GR2/stby05.log',
    '/u02/oradata/10GR2/stby05.log',
    '/u03/oradata/10GR2/stby05.log')
size 52428800;

alter database add standby logfile group 6
   ('/u01/oradata/10GR2/stby06.log',
    '/u02/oradata/10GR2/stby06.log',
    '/u03/oradata/10GR2/stby06.log')
size 52428800;

alter database add standby logfile group 7
   ('/u01/oradata/10GR2/stby07.log',
    '/u02/oradata/10GR2/stby07.log',
    '/u03/oradata/10GR2/stby07.log')
size 52428800;

Start Redo Apply

NOTE: The statement includes the DISCONNECT FROM SESSION option so that Redo Apply runs in a background session. See Section 6.3, “Applying Redo Data to Physical Standby Databases” for more information.

OPTION A is recommended


## OPTION A) [enables "Real Time Apply"]

SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered.

## OPTION B) ["Real Time Apply" is disabled]
SQL> alter database recover managed standby database disconnect from session;

Database altered.

On Primary: verify ARCHIVE seqs


SQL> alter session set nls_date_format='YYYY-MON-DD HH24:MI:SS';

Session altered.

SQL> select sequence#, first_time, next_time
  2  from v$archived_log order by sequence#;

 SEQUENCE# FIRST_TIME           NEXT_TIME
---------- -------------------- --------------------
        30 2007-OCT-03 19:35:08 2007-OCT-05 14:08:29
        31 2007-OCT-05 14:08:29 2007-OCT-05 14:08:31
        32 2007-OCT-05 14:08:31 2007-OCT-05 14:08:37
        33 2007-OCT-05 14:08:37 2007-OCT-05 14:51:48
        34 2007-OCT-05 14:51:48 2007-OCT-05 15:01:54
        35 2007-OCT-05 15:01:54 2007-OCT-05 16:48:24
        35 2007-OCT-05 15:01:54 2007-OCT-05 16:48:24
        36 2007-OCT-05 16:48:24 2007-OCT-05 16:55:46
        36 2007-OCT-05 16:48:24 2007-OCT-05 16:55:46
        37 2007-OCT-05 16:55:46 2007-OCT-05 18:09:07
        37 2007-OCT-05 16:55:46 2007-OCT-05 18:09:07

 SEQUENCE# FIRST_TIME           NEXT_TIME
---------- -------------------- --------------------
        38 2007-OCT-05 18:09:07 2007-OCT-05 18:11:42
        38 2007-OCT-05 18:09:07 2007-OCT-05 18:11:42

13 rows selected.

SQL>

Compare them to the standby:


SQL> alter session set nls_date_format='YYYY-MON-DD HH24:MI:SS';

Session altered.

SQL> select sequence#, first_time, next_time
  2  from v$archived_log order by sequence#;

 SEQUENCE# FIRST_TIME           NEXT_TIME
---------- -------------------- --------------------
        35 2007-OCT-05 15:01:54 2007-OCT-05 16:48:24
        36 2007-OCT-05 16:48:24 2007-OCT-05 16:55:46
        37 2007-OCT-05 16:55:46 2007-OCT-05 18:09:07
        38 2007-OCT-05 18:09:07 2007-OCT-05 18:11:42

SQL>

Verify new archived redo log files were applied


SQL> select sequence#,applied from v$archived_log order by sequence#;

 SEQUENCE# APP
---------- ---
        35 YES
        36 YES
        37 YES
        38 YES

SQL>

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u02/oradata/10GR2/arch
Oldest online log sequence     34
Next log sequence to archive   0
Current log sequence           39
SQL> !ls -lta /u02/oradata/10GR2/arch
total 71926
-rw-r-----   1 oracle   dba      52429312 Oct  5 18:17 1_39_634834860.dbf
-rw-r-----   1 oracle   dba      1379328 Oct  5 18:11 1_37_634834860.dbf
drwxr-xr-x   2 oracle   dba          512 Oct  5 18:11 .
-rw-r-----   1 oracle   dba        11776 Oct  5 18:11 1_36_634834860.dbf
-rw-r-----   1 oracle   dba      1678336 Oct  5 18:11 1_35_634834860.dbf
-rw-r-----   1 oracle   dba       113152 Oct  5 18:11 1_38_634834860.dbf
drwxr-xr-x   3 oracle   dba          512 Oct  5 18:03 ..
-rw-r-----   1 oracle   dba       134656 Oct  5 15:01 1_34_634834860.dbf
-rw-r-----   1 oracle   dba       187904 Oct  5 14:51 1_33_634834860.dbf
-rw-r-----   1 oracle   dba         3584 Oct  5 14:08 1_32_634834860.dbf
-rw-r-----   1 oracle   dba      33161216 Oct  5 14:08 1_30_634834860.dbf
-rw-r-----   1 oracle   dba         1024 Oct  5 14:08 1_31_634834860.dbf

SQL>

Here’s how Real Time Apply works

Test 1) – switch a log on PRIMARY

here’s what you’ll see in the primary alert log:


LNS1 started with pid=18, OS id=13637
Fri Oct  5 18:48:38 2007
Thread 1 advanced to log sequence 41
 Current log# 2 seq# 41 mem# 0: /u01/oradata/10GR2/redo02.log
 Current log# 2 seq# 41 mem# 1: /u02/oradata/10GR2/redo02.log
 Current log# 2 seq# 41 mem# 2: /u03/oradata/10GR2/redo02.log
Fri Oct  5 18:48:39 2007
ARC0: Standby redo logfile selected for thread 1 sequence 40 for destination LOG_ARCHIVE_DEST_2
Fri Oct  5 18:48:39 2007
LNS: Standby redo logfile selected for thread 1 sequence 41 for destination LOG_ARCHIVE_DEST_2

here’s what’s reported in the STANDBY alert log:


Fri Oct  5 18:48:38 2007
RFS[1]: Successfully opened standby log 4: '/u01/oradata/10GR2/stby04.log'
Fri Oct  5 18:48:39 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[2]: Assigned to RFS process 3909
RFS[2]: Identified database type as 'physical standby'
Primary database is in MAXIMUM PERFORMANCE mode
Primary database is in MAXIMUM PERFORMANCE mode
RFS[2]: Successfully opened standby log 4: '/u01/oradata/10GR2/stby04.log'
Fri Oct  5 18:48:40 2007
Media Recovery Log /u02/oradata/10GR2/arch/1_40_634834860.dbf
Media Recovery Waiting for thread 1 sequence 41 (in transit)
Fri Oct  5 18:48:40 2007
Recovery of Online Redo Log: Thread 1 Group 4 Seq 41 Reading mem 0
 Mem# 0: /u01/oradata/10GR2/stby04.log
 Mem# 1: /u02/oradata/10GR2/stby04.log
 Mem# 2: /u03/oradata/10GR2/stby04.log

Test 2) – WITHOUT SWITCHING A LOG

On STANDBY run the following SQL:


SQL> select process, status,sequence#,block#,blocks, delay_mins from v$managed_standby;

PROCESS   STATUS        SEQUENCE#     BLOCK#     BLOCKS DELAY_MINS
--------- ------------ ---------- ---------- ---------- ----------
ARCH      CLOSING              40          1         16          0
ARCH      CLOSING              41          1       1611          0
MRP0      APPLYING_LOG         42         42     102400          0
RFS       IDLE                  0          0          0          0
RFS       IDLE                 42         45          1          0

SQL>

On PRIMARY create a test table


SQL> create table test(t number);

Table created.

SQL> insert into test values (1);

1 row created.

SQL> insert into test values (2);

1 row created.

SQL> insert into test values (3);

1 row created.

SQL> commit;

Commit complete.

SQL>

*** DO NOT SWITCH A LOG ***

On STANDBY run the same SQL (note the BLOCK# increasing):


SQL> r
 1* select process, status,sequence#,block#,blocks, delay_mins from v$managed_standby

PROCESS   STATUS        SEQUENCE#     BLOCK#     BLOCKS DELAY_MINS
--------- ------------ ---------- ---------- ---------- ----------
ARCH      CLOSING              40          1         16          0
ARCH      CLOSING              41          1       1611          0
MRP0      APPLYING_LOG         42         76     102400          0
RFS       IDLE                  0          0          0          0
RFS       IDLE                 42         78          1          0

SQL>

Cancel out of Real Time Apply and open the database in read-only mode:


SQL> alter database recover managed standby database cancel;

Database altered.

SQL> alter database open read only;

Database altered.

SQL> select * from test;

        T
----------
        1
        2
        3

Place the standby back in managed recovery mode. This will take the standby directly from read-only mode and place it in managed recovery mode:


   SQL> alter database recover managed standby database using current logfile disconnect;

   Database altered.

   SQL>

References

  • http://download.oracle.com/docs/cd/B19306_01/server.102/b14239.pdf
  • http://www.oracle.com/technetwork/database/features/availability/maa-wp-10grecoverybestpractices-129577.pdf
  • https://www.scribd.com/document/252240161/MAA-WP-10gR2-SwitchoverFailoverBestPractices-pdf

July 7, 2007

Posted In: Data Guard
