DBAToolZ

Lightweight Oracle DBA Tools

 

Notes from the 11g Upgrade & Best Practices hosted by Tom Kyte

I finally got out of the dungeon and made it to an Oracle hosted event titled “Oracle Technology Day - 11g Upgrade & Best Practices” which was hosted by Tom Kyte and Oracle Center of Excellence (COE) folks in Los Angeles on June 16 ‘09. And it was definitely worth it. I got a chance to finally meet Tom Kyte face to face and I even had my original, WROX “Expert One-One-One” signed by him. The place was packed and I even heard that the event organizers had to turn some people down because there was no room left.

The presentation was focused on 11g upgrade — I guess Oracle has had it with us “conservative” DBAs that stick to our guns and don’t upgrade until the R2 is out. Well, they sure got my attention with the “Revolutionary New Capabilities brought to you by the upcoming 11g R2” … But before I get to that I figured I’ll run through the notes of the stuff that was most interesting to me as a DBA, so instead of just stuffing the notepad someplace where I will never find it I figured it would be beneficial to both you and I if I wrote it up here:

Interesting stuff to research:
Quiesce
Incident Packaging
Total Recall
Data Masking

Capture & Replay Workload AKA Real Application Testing:
You can now achieve realistic testing with Database Capture and Replay. DB Capture is capable of recording bind variables, SCN and order of operations on your production instance and then using DB Replay you can reproduce the same exact load on a test system which will allow you to compare metrics (v$..), test upgrades and even test capacity plan.

Database Capture is available as of 9.2.0.8/up and it can be used across platforms, so you can easily use Captured workload from 9.2.0.8 and replay it on 11g test instance. Replay only works on 11g (11.1.0.7 see Note:738538.1).

The core of Real Application Testing is SPA (SQL Performance Analyzer) [pronounced like "heath SPA"] and SQL Tuning Advisor. SPA can be used in conjunction with Replay to look for regressed SQL, allowing you to build acceptable optimizer plans for the new release of the database before you upgrade. Replay commits changes — use Flashback DB to revert back and replay. 10.2.0.5 Grid Control can automate the Replay and restore of the database using GUI.

11g R2 will be able to UP the workload so it can be effectively used for future capacity planning.

Advise: when making comparisons using these tools rely on CPU TIME and BUFFER GETS rather than ELAPSED TIME.

Replay clients are called WRC. You can start multiple WRC processes to simulate concurrency. Filters can be used during Capture to limit what is being recorded. 11g R2 will also have selective replay capability.

Data Recovery Advisor:
If you want a second pair of “eyes” or have no idea what to do when your database requires recovery — Data Recovery Advisor might be able to help. Of course this requires you to use RMAN to backup your database in the first place, but you should be already doing that anyway right? But it also requires RMAN Catalog …

Flashback Data Archive:
Flashback is designed to “cure” the “right command / wrong database syndrome“, where you urgently want to UNDO a mistake.

9i - brought to us Flashback Query
10g - Flashback Tables and DB
11g - brings Flashback Data Archive & Transactions

Flash Data Archive is enabled on a table by table basis, where once a table is placed in the Data Archive a shadow table is created where all changes are recorded and kept. It works through a new background process and has very little (if any) overhead because it mines the UNDO. This is the best option for auditing and should replace all of your DIY audit triggers. While Audit triggers can be disabled or dropped, once a table is placed in the flash data archive it cannot be dropped even by the SYS DBA - perfect AUDIT.

Revolutionary New Capabilities brought to you by the upcoming 11g R2:
Expect to hear a lot of buzz about 11g R2 by October ‘09. 80% of the Open World “talk” will be about 11g R2. What can we expect? How about “Upgrade Database Application” Online using a new, “revolutionary” feature called “ADDITION“?

create ADDITION v2;
alter session set ADDITION = v2;
– from now on everything takes place in this new virtual schema called v2 –
[compile new PKGs]
[add indexes and tables ???] Tom mentioned later that ADDITIONs only work for things that “live” in SYSTEM, i.e. compiled and virtual stuff, so this might not work for tables and indexes …
[v1/original is still fully operational and running]
[test v2]
[pause DB]
alter database set schema XYZ to V2 addition; [not real syntax ...]
DONE.

Obviously, this will only work if you use PL/SQL. V2 becomes the current ADDITION once it’s enabled. V1 can be kept around if you desire so. This is NOT a source code control — it’s only designed to easily deploy database applications without disrupting your online users.

11g Upgrade Best Practices:
Upgrading to 10g? Review Note:466181.1
Upgrading to 11g? Review Note:601807.1

Oracle support really, really wants you to use OCM (Oracle Configuration Manager). They also want you to use the new Metalink (I don’t use it just in case you wanted to know …).

USE DBUA (I agree). DBUA == easy. Manual Upgrade == hard.

Backup your optimizer stats — see Note:465787.1

Don’t set any optimizer_* or db_multiblock_* params that you found on google. This is BAD practice. Most of the default values are good for 99% of installations — don’t touch them.

Gather dictionary statistics before you upgrade, it can dramatically improve the time it takes to upgrade your database.

Post upgrade run:
execute DBMS_STATS.GATHER_SYSTEM_STATS(start); [not exact command]
execute DBMS_STATS.GATHER_SYSTEM_STATS(stop); [not exact command]

On 11g use AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT — this is 5-6 times faster and close to 100% accurate.

Export AWR repository BEFORE the upgrade.

Plan Stability:
On 11g use Plan Stability feature — see DBMS_SPM. SPM allows you to create SQL Baseline where only known and accepted plans are used for specific SQL. This can be either automated or setup manually by DBA. SPM works in three phases:

  1. Capture (stored in SYSAUX)
  2. Selection (DBA marks what’s accepted)
  3. Evolution (EVAL all unverified plans to see if they are worth considering)

Capture: OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = true
Selection: OPTIMIZER_USE_SQL_PLAN_BASELINES = true + The Plan must be ENABLED=YES AND ACCEPTED=YES to be used by SPM.
Evolution: Scheduled by DBA via EVAL procedure.

See Note:456518.1.

You can also pack the stored SQL Plan Baselines on DEV and then unpack them into PROD. Great thing to ask of your vendors — provide the SQL Plan Baselines with all your code.

If you are going to 11g from 10g you can utilize STS — create STS (SQL Tuning Set) on 10g, unload them to staging table and load them into 11g’s plan baseline.

Some Random Bits and funny Pieces:
Evolution of APEX:
Browse (good) -> WebDB (good) -> Portal (messy) -> dead end
few years later …
HTML DB (good) -> APEX (better)

The guy who wrote APEX (Mike Hichwa) is also largely responsible for existence of SQLPLUS and SQL Developer.

Bryn Llewellyn owns PLSQL.

DBMS_OUTPUT.put_line (danger)
sys.DBMS_OUTPUT.put_line (safe)

Tom Kyte:

This is a really big table. Really? How big? It’s really big — imagine the largest table and insert one more record into it — that’s how big.

Rexx is my favorite language …

:)

Oracle RAC’s share everything vs share nothing …

Google’s share nothing approach to application development has lead to the #1 search engine solution both in performance and functionality. Notice that I said “application development” because for the share nothing approach to work it needs to be built into the application from day one not as an afterthought.

On the other end of the spectrum we have ERP APPS where design with thousands of tables per module are the norm and the UNION ALL joins span a multi-page printout. In these types of applications Oracle RAC’s “share everything” approach is clearly superior, scratch that, it’s the only solution, period.

For an interesting read on this specific issue take a look at Kevin Closson’s post titled “Nearly Free or Not, GridSQL for EnterpriseDB is Simply Better Than Real Application Clusters. It is Shared-Nothing Architecture After All!” and a sort of reply to it by a blogger called “bonglonglong” titled “All in the assumptions“.

What the heck is “snapshot standby”?

I was just reviewing Note:565535.1 Flashback Database Best Practices & Performance and came across the following paragraph under the title of “Measuring Impact of turning Flashback Database ON / Analyzing the Performance Impact“:

If you’re on RDBMS release 11g and have a physical standby, then you can enable flashback database on your physical standby and then convert it to a snapshot standby (this can also be done in 10.2 with a manual snapshot standby, but no redo is received and applied as in the 11g Snapshot Standby. See Dell 10g Snapshot Standby - Dell 10g Snapshot Standby). Then when you run a workload against the snapshot standby it will be with flashback database logging enabled.

So what the heck is “snapshot standby”?

Essentially it allows you to convert your 11g physical standy to a “Snapshot standby” so that you can “Make changes to the Primary and the the Snapshot standby in some common tables” and then “Convert the Snapshot standby back to a Physical standby”. Makes sense? NO? OK, I know, how about this:

  • A snapshot standby database is a fully updateable standby database created by converting a physical standby database into a snapshot standby database.
  • A snapshot standby database receives and archives, but does not apply, redo data from a primary database. The redo data received from the primary database is applied once a snapshot standby database is converted back into a physical standby database, after discarding all local updates to the snapshot standby database.
  • A snapshot standby can be created from Enterprise Manager, the Data Guard Broker command line interface (DGMGRL) or from SQL*Plus.

Better? YES. This is probably THE reason to go 11g if you haven’t already. Basically you get an instant REPORTING/TEST server without loosing your Physical Standby database. See: Using Snapshot Standby for an example.

09′ IOUG Collaborate - notes for Oracle DBAs

I just reviewed Tom Kyte’s presentation from the 09′ IOUG Collaborate event and made a list of things that are specifically interesting to an Oracle DBA with the links for further research into each feature.

I. Those managing PHP/Oracle shops will be happy to learn that there’s a new Database Resident Connection Pool [DRCP] that performs as well as dedicated server (it really is) with benefits of shared server model.

II. Are you sill using TOAD? Perhaps it’s time to give Oracle Sql Developer a shoot because it’s offering a host of Sql/PLSQL Tuning Features built right in:

Formatting Corrupted Block Not Part of Any Segment

It’s been a while since my last post … but this issue I ran into last night is worth mentioning. Here’s what happened — one of my RMAN backups failed with:

1
2
3
4
5
   RMAN-00571: ===========================================================
   RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
   RMAN-00571: ===========================================================
   RMAN-03002: failure of backup plus archivelog command at 05/05/2009 23:16:15
   ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/czgddata/a_txn_data02.dbf

The corrupt block was being reported as part of a segment — a table:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
   SQL> CONNECT / AS sysdba
   Connected.
   SQL>
   SQL>
   SQL> SELECT owner, segment_name, segment_type
   FROM   dba_extents
   WHERE  file_id = 401
   AND    257532 BETWEEN block_id AND block_id + blocks - 1;
 
     2    3    4
 
   OWNER
   ------------------------------
   SEGMENT_NAME
   --------------------------------------------------------------------------------
   SEGMENT_TYPE
   ------------------
   CZ
   CZ_DEVL_PROJECTS
   TABLE

But the data was still readable and I was able to move the table to “map the corrupt block” outside of the segment boundaries:

1
2
3
4
5
6
7
8
9
   SQL> ALTER TABLE CZ.CZ_DEVL_PROJECTS move;
 
   TABLE altered.
 
   SQL> SELECT COUNT(*) FROM CZ.CZ_DEVL_PROJECTS;
 
     COUNT(*)
   ----------
          312

Still, even after the move the block was still reported corrupt by the dbv and I had to deploy a procedure from the ML-Note: 336133.1. Which I fully documented in here –> How to Format Corrupted Block Not Part of Any Segment.

What I Need From Support

You’ve been there haven’t you? You buy a product, use it, have an issue, cringe and call support. What comes next either breaks or makes your future relations with the company and the product they sold you. It’s at this point that you find out if they are your partner or you are on your own. With consumer products my expectation for a good one is to never break — I don’t ever want to have to call their support. But when it comes to the toys we use in data centers I want to hit all of the big issues during QA and I want it to break down so badly that it will stump the technical support team so we get a chance to test their infrastructure and protocols.

That’s exactly what happened to one of the key products we selected for our latest large scale implementation. It’s irrelevant to this article what that product is, what’s relevant is that the support organization experience was disappointing. I am not going to focus on what went wrong — it really isn’t that important, what’s important is that we got the message across and it appears we’ll have a meeting with the people that might be able to make a difference. And I really hope that they do — I believe this company has potential.

I was thinking about the upcoming meeting and what to address and here’s what I realized. It’s actually very simple — here’s my list of things I Need From a Support Organization of a technology company:

  1. Community
  2. Communication
  3. Response

Community

Community is where you go first if you have a non urgent question. Community needs to consist of three components:

  1. Knowledge Base
  2. Bug Database
  3. Forums

Knowledge Base needs to offer an up to date list of known issues and white papers that bridge the gap between official documentation and what customers are actually going through while doing hands on implementations. Bug Database is where you get a chance to feel the pulse of the company — it’s an integral part of the community and it doesn’t have to be an open Bug database that we see in the open source communities, no, just a read-only access where you get to see what kinds of issues development is working on these days. Through Forums you share experiences with other customers and learn best practices by picking the brains of the community gurus. Forums is what connects you with the rest of the customer base and gives you a chance to see the bigger picture, it also shows that the company is open about it’s business and is not afraid of public scrutiny.

Communication

Communication is one of the most critical aspects of a support organization. It needs to flow both ways — customer to support and support to the customer. Support needs to drive Communication, they need to deliver instant updates to your issue and they need to squeeze the last drop of information from you because even the smallest piece of information can play a huge role in how effective a solution will be or how it will help other customers that might be facing similar symptoms but have not yet identified the cause of the problem. Communication is the only thing that allows a good support organization to create strong Knowledge Base because it’s the only way to gage what you are really experiencing in the field.

For example a failure of their product might have an adverse affect on other products within your technology stack so it’s imperative for support to ask you for the error messages that you might have seen with the rest of the components so that these symptoms can be properly documented and published in the knowledge base as a white paper or an alert.

Response

Response is the most critical aspect of the support organization. The worst thing you as a customer can experience is when your requests are being ignored or lost in the queue. You need to know as soon as possible that someone is looking into the issue and it better NOT be an boilerplate response from an automated system. Response needs to include action plan — it’s not enough to simply say “We are working on it” — it should provide an action plan even if it involves you — the customer. Response goes hand in hand with communication and it needs to be recorded into a system that you can login to view history of events. Just like Communication, Response goes both ways and support needs to drive it — if a customer is delaying response to a request for information support needs to follow up immediately.

And there you have it — Community, Communication, Response = effective support organization.

No more SUN

After running ORACLE on SUN for over 7 years and being the big proponent of it I have finally said enough is enough. Starting this summer our shop made a switch to Linux (HP) and so far I could not be happier. I just don’t get it with SUN anymore they keep digging a deeper whole by separating themselves further and further from ORACLE. And with ORACLE now offering Linux support I just don’t see why would someone not yet invested in SUN hardware would want to choose them over Linux.

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes (Front)

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes (Back)

Overpriced SUN Hardware is replaced by powerful and reasonably priced HP Boxes (Back)