Friday, May 21, 2010

MAX_UTILIZATION_LIMIT - A Very useful option in 11gR2

Resource allocation and management in large SMP systems has always been a challenge especially in large data warehouse systems. At the client sites that I have been to, the challenge will be in the form of critical ETL jobs that does massive data processing while trickle-down feeds that gets incrementally loaded in the staging tables for subsequent processing.

While Oracle manages CPU resources efficiently using database resource manager APIs, there are cases when this does not satisfy the real world requirements.

For instance, the CPU resource allocation pretty much goes by allocation based on how much resource is currently available at the time a process started. Once the process starts it is impossible to take away resources from the job for redistribution.

Imagine a scenario where a slew of jobs from a low priority group kicks off when the system is totally idle. Based on the way resource management is handled in Oracle, all the available CPU resources will be divided among the processes. Now while these processes are consuming CPU, if a job from a high priority group kicks off, the resources are not immediately available and the high priority job has to work with what is available. Potentially, if the low priority jobs are many and as a whole consumes a significant amount of CPU, then this directly will impact the high priority job.

Oracle Database 11gR2 provides a new parameter "MAX_UTILIZATION_LIMIT" for the resource manager APIs that let's us set an "absolute" maximum CPU that a group and it's associated processes can use. This is really significant. With this now we can guarantee some low priority jobs will not eat up CPU cycles while more important jobs need to take a hit.

Below is a sample.

Some portion of the code was re-used from Jonathan Lewis's kill_cpu.

The test was conducted in a completely idle system running 11gR2/OEL5/ASM 11R2.

In the first case, the plan with 2 groups with "max_utilization" resource allocation of 50/50 was created and a simple create table statement was executed. The tkprof output,

create table tab_stingy_cpu_new_50_50 as
  select a.name,a.type,a.line,a.text,b.*
from dba_source a, dba_objects b
 where rownum <=1000000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.12       0.18          0          0          0           0
Execute      1      7.04      27.22      10554      12559      27033     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      7.16      27.41      10554      12559      27033     1000000


The next sample shows, the elapsed time change when the max_utilization_limit is set to a higly imbalanced (1,99) fro the same respective groups.

exec dbms_resource_manager.create_pending_area;
exec dbms_resource_manager.update_plan_directive('STINGY_CPU','STINGY_CPU_CON_GROUP',NEW_MAX_UTILIZATION_LIMIT=>1);
exec dbms_resource_manager.update_plan_directive('STINGY_CPU','OTHER_GROUPS',NEW_MAX_UTILIZATION_LIMIT=>99);
exec dbms_resource_manager.validate_pending_area;
exec dbms_resource_manager.submit_pending_area


SQL> select a.sid,a.resource_consumer_group,
b.MAX_UTILIZATION_LIMIT from v$session a, DBA_RSRC_PLAN_DIRECTIVES  b where plan='STINGY_CPU'
and a.resource_consumer_group=b.group_or_subplan
and a.sid=(select sys_context('USERENV','SID') from dual)  2    3    4 
  5  /

       SID RESOURCE_CONSUMER_GROUP          MAX_UTILIZATION_LIMIT
---------- -------------------------------- ---------------------
       146 STINGY_CPU_CON_GROUP                                 1

create table tab_stingy_cpu_new_1_99 as
  select a.name,a.type,a.line,a.text,b.*
from dba_source a, dba_objects b
 where rownum <=1000000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.12       6.17          0          0          0           0
Execute      1      7.03     305.25      10554      12559      27033     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      7.16     311.43      10554      12559      27033     1000000



Amazingly, to my surprise the CPU time is exactly the same (7.16 seconds). I confess, I did not expect this :) However, if you notice the elapsed time is 311 seconds. This is because, the max_utilization_limit enforces that the process does not take too much CPU though the machine is complete idle.

This in my opinion is an amazing feature from Data Warehousing systems perspective where I can be confident with my ETL load and Materialized view refresh timings.

After all, consistency is more important than speed when it comes to explaining performance to end users.

All comments welcome.

Friday, May 7, 2010

UNIX Group Permissions when Installing 10g on 11g ASM

The following was a lesson learned with proper group permissions when interacting with the new and improved 11gR2 ASM. I thought perhaps, this might be useful for someone who is facing this scenario in the future. Hope Google, provides this as a possible link if you ever search this problem.
I tried to install Oracle 10g database on 11.2 ASM. I already have an 11.2 RAC DB running fine on this 11.2 ASM (also RAC) instance.
Upon completing the DB binary install (10.2.0.1), I tried to create the database using DBCA. At the point where it asks to select the type of the file, When I selected ASM: The following message pops up.




It erroed saying that the ASM instance was not running. Obviously, it was. So, It was time to look under the covers.

First place was to look at the trace.log in $OH/cfgtoollogs/dbca
trace.log (final few lines ) were...

[AWT-EventQueue-0] [22:31:45:302] [ASMInstance.discoverASMInstance:369] Found existing ASM -- Sid: +ASM2 OracleHome: /u01/app/11.2.0/grid
[AWT-EventQueue-0] [22:31:45:303] [ASMInstance.discoverASMInstance:374] ORATAB_FILE /etc/oratab
[AWT-EventQueue-0] [22:31:45:305] [ASMInstance.discoverASMInstance:376]discoverOSMInstance:bExists=true
[AWT-EventQueue-0] [22:31:45:310] [ASMInstance.initialize:339]initialize:m_bASMPresent=true
[AWT-EventQueue-0] [22:31:46:925] [ASMInstance.validateASM:475] Validating SI-ASM...
[AWT-EventQueue-0] [22:31:46:952] [SQLEngine.initialize:244] Execing SQLPLUS/SVRMGR process...
[AWT-EventQueue-0] [22:31:46:973] [SQLEngine.initialize:272] m_bReaderStarted: false
[AWT-EventQueue-0] [22:31:46:975] [SQLEngine.initialize:276] Starting Reader Thread...
[AWT-EventQueue-0] [22:31:47:33] [ASMInstance.detectASMInstance:775]detectOSMInstance:m_bASMPresent=true
[AWT-EventQueue-0] [22:31:47:34] [ASMInstance.detectASMInstance:776]detectOSMInstance:m_bASMRunning=false
[AWT-EventQueue-0] [22:31:47:147] [SQLEngine.done:1960] Done called

Here is the problem. It just stops abruptly instead of listing the ASM disk groups to proceed further with the install. So something is amiss.
[AWT-EventQueue-0] [22:31:47:148] [SQLEngine.reInitialize:605] Reinitializing SQLEngine...
[AWT-EventQueue-0] [22:31:47:151] [SQLEngine.initialize:244] Execing SQLPLUS/SVRMGR process...

Looking into metalink, notes: 948456.1

Point #1 was not the reason for my error, as the cluster in linux2 (my node) is pinned.
[root@linux2 ~]# olsnodes -t -n
linux1 1 Unpinned
linux2 2 Pinned

I looked at Point#2 and this was the exact error message I received. So I decided to take that route and did an upgrade to 10.2.0.5. I could not find out whether (8288940) is included in 10.2.0.5. It was a leap of faith.

After Upgrade, I started DBCA and came to the same error. So I decided to compare the o/s level permissions between the 11.2 oracle id (oracle) and 10.2 oracle id (oracle10).
I saw a difference.

[oracle@linux2 ~]$ id <-- This is the oracle db software owner id for 11gR2.
uid=1101(oracle) gid=1000(oinstall) groups=502(dba),503(oper),1000(oinstall),1201(asmdba)

[oracle10@linux2 ~]$ id <-- This is the Oracle s/w owner for 10g
uid=1104(oracle10) gid=1000(oinstall) groups=1000(oinstall),1200(asmadmin),1203(dba10)


As you can see, the asmdba group is missing for Oracle10.
Oracle 11.2 provides new security restrictions to manage and maintain ASM. Gone are the days when the "sys" user of the database can do everything on ASM. Specifically, oracle provides (asmadmin, asmdba, and asmoper). You can read more about the roles and privileges.

It states:

Members of the (OSASM group), a.k.a asmadmin can:

use SQL to connect to an Oracle ASM instance as SYSASM using operating system authentication. The SYSASM privileges permit mounting and dismounting disk groups, and other storage administration tasks.

Whereas,
Members of the ASM Database Administrator group (OSDBA for ASM) are granted read and write access to files managed by Oracle ASM. The grid infrastructure installation owner and all Oracle Database software owners must be a member of this group, and all users with OSDBA membership on databases that have access to the files managed by Oracle ASM must be members of the OSDBA group for ASM.

Translation:

Though one might think (I certainly did) "asmadmin" is some sort of "admin" group that would have a superset of privileges, in fact, this is not true. It is more important to have the oracle software owners to be part of "asmdba" group than "asmadmin” group.

Upon adding the “asmdba” group to “oracle10” id, I restarted the install. This time the DBCA went past the problem area.
















This time the trace.log has the following.

94 [AWT-EventQueue-0] [23:9:41:819] [ASMInstance.discoverASMInstance:369] Found existing ASM -- Sid: +ASM2 OracleHome: /u01 /app/11.2.0/grid
95 [AWT-EventQueue-0] [23:9:41:820] [ASMInstance.discoverASMInstance:374]ORATAB_FILE /etc/oratab
96 [AWT-EventQueue-0] [23:9:41:821] [ASMInstance.discoverASMInstance:376]discoverOSMInstance:bExists=true
97 [AWT-EventQueue-0] [23:9:41:825] [ASMInstance.initialize:339]initialize:m_bASMPresent=true
98 [AWT-EventQueue-0] [23:9:42:721] [ASMInstance.validateASM:475] Validating SI-ASM...
99 [AWT-EventQueue-0] [23:9:42:742] [SQLEngine.initialize:244] Execing SQLPLUS/SVRMGR process...
100 [AWT-EventQueue-0] [23:9:42:757] [SQLEngine.initialize:272] m_bReaderStarted: false
101 [AWT-EventQueue-0] [23:9:42:758] [SQLEngine.initialize:276] Starting Reader Thread...
102 [AWT-EventQueue-0] [23:9:42:809] [ASMInstance.detectASMInstance:775]detectOSMInstance:m_bASMPresent=true
103 [AWT-EventQueue-0] [23:9:42:810] [ASMInstance.detectASMInstance:776]detectOSMInstance:m_bASMRunning=false
104 [AWT-EventQueue-0] [23:9:42:919] [ASMInstance.detectASMInstance:803] OSM Instance is running
105 [AWT-EventQueue-0] [23:9:42:990] [ASMInstance.detectASMInstance:880] ASM version = 11.2.0.1.0
106 [AWT-EventQueue-0] [23:9:43:206] [InitParamAttributes.loadDBParams:4123]Checking if SPFILE is used
107 [AWT-EventQueue-0] [23:9:43:263] [InitParamAttributes.loadDBParams:4134]spParams = [Ljava.lang.String;@1e2a069
108 [AWT-EventQueue-0] [23:9:43:265] [ASMParameters.loadASMParameters:441]useSPFile=true
109 [AWT-EventQueue-0] [23:9:43:265] [StorageOptionsPage.validate:569] ASM is present and started.
110 [AWT-EventQueue-0] [23:9:43:347] [DiskGroupsPanel.initialize:262] Loading Data in the Table..
111 [AWT-EventQueue-0] [23:9:43:348] [DiskGroupDataSource.loadData:252] Load the data from the datasource
112 [AWT-EventQueue-0] [23:9:43:422] [ASMInstance.loadDiskGroups:1811] sql to be executed:=select NAME||'|'||round(TOTAL_MB)||'|'||round(USABLE_FILE_MB)||'|'||nvl(TYPE, 'DBCA_NULL')||'|'||STATE from v$asm_diskgroup order by NAME
113 [AWT-EventQueue-0] [23:9:43:611] [ASMInstance.loadDiskGroups:1815] Loading the diskgroup from schema....
114 [AWT-EventQueue-0] [23:9:43:612] [ASMInstance.loadDiskGroups:1823] diskgroup: CRS|3069|2673|EXTERN|MOUNTED
115 [AWT-EventQueue-0] [23:9:43:615] [ASMInstance.loadDiskGroups:1823] diskgroup: DATADG|30719|28615|EXTERN|MOUNTED
116 [AWT-EventQueue-0] [23:9:43:616] [ASMInstance.loadDiskGroups:1823] diskgroup: DATARAC10G|30719|30624|EXTERN|MOUNTED
117 [AWT-EventQueue-0] [23:9:43:617] [ASMInstance.loadDiskGroups:1823] diskgroup: FRADG|30719|30192|EXTERN|MOUNTED
118 [AWT-EventQueue-0] [23:9:43:688] [ASMInstance.setBestDiskGroup:1432] sql to be executed:=select name from v$asm_diskgrou p where free_mb= (select max(free_mb) from v$asm_diskgroup)

Just to ensure "asmdba" is the only critical group needed, I took out "asmadmin" and tried the install once again.

[oracle10@linux2 ~]$ id
uid=1104(oracle10) gid=1000(oinstall) groups=1000(oinstall),1201(asmdba),1203(dba10)

It was able to recognize the disks.

So, please make sure that the oracle binary owners are part of the "asmdba" group when creating databases on 11.2 ASM.