Oracle: Technology, Application, and more: May 2010

Resource allocation and management in large SMP systems has always been a challenge especially in large data warehouse systems. At the client sites that I have been to, the challenge will be in the form of critical ETL jobs that does massive data processing while trickle-down feeds that gets incrementally loaded in the staging tables for subsequent processing.

While Oracle manages CPU resources efficiently using database resource manager APIs, there are cases when this does not satisfy the real world requirements.

For instance, the CPU resource allocation pretty much goes by allocation based on how much resource is currently available at the time a process started. Once the process starts it is impossible to take away resources from the job for redistribution.

Imagine a scenario where a slew of jobs from a low priority group kicks off when the system is totally idle. Based on the way resource management is handled in Oracle, all the available CPU resources will be divided among the processes. Now while these processes are consuming CPU, if a job from a high priority group kicks off, the resources are not immediately available and the high priority job has to work with what is available. Potentially, if the low priority jobs are many and as a whole consumes a significant amount of CPU, then this directly will impact the high priority job.

Oracle Database 11gR2 provides a new parameter "MAX_UTILIZATION_LIMIT" for the resource manager APIs that let's us set an "absolute" maximum CPU that a group and it's associated processes can use. This is really significant. With this now we can guarantee some low priority jobs will not eat up CPU cycles while more important jobs need to take a hit.

Below is a sample.

Some portion of the code was re-used from Jonathan Lewis's kill_cpu.

The test was conducted in a completely idle system running 11gR2/OEL5/ASM 11R2.

In the first case, the plan with 2 groups with "max_utilization" resource allocation of 50/50 was created and a simple create table statement was executed. The tkprof output,

create table tab_stingy_cpu_new_50_50 as
select a.name,a.type,a.line,a.text,b.*
from dba_source a, dba_objects b
where rownum <=1000000

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.12       0.18          0          0          0           0
Execute      1      7.04      27.22      10554      12559      27033     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        2      7.16      27.41      10554      12559      27033     1000000

The next sample shows, the elapsed time change when the max_utilization_limit is set to a higly imbalanced (1,99) fro the same respective groups.

exec dbms_resource_manager.create_pending_area;
exec dbms_resource_manager.update_plan_directive('STINGY_CPU','STINGY_CPU_CON_GROUP',NEW_MAX_UTILIZATION_LIMIT=>1);
exec dbms_resource_manager.update_plan_directive('STINGY_CPU','OTHER_GROUPS',NEW_MAX_UTILIZATION_LIMIT=>99);
exec dbms_resource_manager.validate_pending_area;
exec dbms_resource_manager.submit_pending_area

SQL> select a.sid,a.resource_consumer_group,
b.MAX_UTILIZATION_LIMIT from v$session a, DBA_RSRC_PLAN_DIRECTIVES b where plan='STINGY_CPU'
and a.resource_consumer_group=b.group_or_subplan
and a.sid=(select sys_context('USERENV','SID') from dual) 2    3    4
5 /

       SID RESOURCE_CONSUMER_GROUP          MAX_UTILIZATION_LIMIT
---------- -------------------------------- ---------------------
       146 STINGY_CPU_CON_GROUP                                 1

create table tab_stingy_cpu_new_1_99 as
select a.name,a.type,a.line,a.text,b.*
from dba_source a, dba_objects b
where rownum <=1000000

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.12       6.17          0          0          0           0
Execute      1      7.03     305.25      10554      12559      27033     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        2      7.16     311.43      10554      12559      27033     1000000

Amazingly, to my surprise the CPU time is exactly the same (7.16 seconds). I confess, I did not expect this :) However, if you notice the elapsed time is 311 seconds. This is because, the max_utilization_limit enforces that the process does not take too much CPU though the machine is complete idle.

This in my opinion is an amazing feature from Data Warehousing systems perspective where I can be confident with my ETL load and Materialized view refresh timings.

After all, consistency is more important than speed when it comes to explaining performance to end users.

All comments welcome.

The following was a lesson learned with proper group permissions when interacting with the new and improved 11gR2 ASM. I thought perhaps, this might be useful for someone who is facing this scenario in the future. Hope Google, provides this as a possible link if you ever search this problem.

I tried to install Oracle 10g database on 11.2 ASM. I already have an 11.2 RAC DB running fine on this 11.2 ASM (also RAC) instance.

Upon completing the DB binary install (10.2.0.1), I tried to create the database using DBCA. At the point where it asks to select the type of the file, When I selected ASM: The following message pops up.

It erroed saying that the ASM instance was not running. Obviously, it was. So, It was time to look under the covers.

First place was to look at the trace.log in $OH/cfgtoollogs/dbca

trace.log (final few lines ) were...

[AWT-EventQueue-0] [22:31:45:302] [ASMInstance.discoverASMInstance:369] Found existing ASM -- Sid: +ASM2 OracleHome: /u01/app/11.2.0/grid

[AWT-EventQueue-0] [22:31:45:303] [ASMInstance.discoverASMInstance:374] ORATAB_FILE /etc/oratab

[AWT-EventQueue-0] [22:31:45:305] [ASMInstance.discoverASMInstance:376]discoverOSMInstance:bExists=true

[AWT-EventQueue-0] [22:31:45:310] [ASMInstance.initialize:339]initialize:m_bASMPresent=true

[AWT-EventQueue-0] [22:31:46:925] [ASMInstance.validateASM:475] Validating SI-ASM...

[AWT-EventQueue-0] [22:31:46:952] [SQLEngine.initialize:244] Execing SQLPLUS/SVRMGR process...

[AWT-EventQueue-0] [22:31:46:973] [SQLEngine.initialize:272] m_bReaderStarted: false

[AWT-EventQueue-0] [22:31:46:975] [SQLEngine.initialize:276] Starting Reader Thread...

[AWT-EventQueue-0] [22:31:47:33] [ASMInstance.detectASMInstance:775]detectOSMInstance:m_bASMPresent=true

[AWT-EventQueue-0] [22:31:47:34] [ASMInstance.detectASMInstance:776]detectOSMInstance:m_bASMRunning=false

[AWT-EventQueue-0] [22:31:47:147] [SQLEngine.done:1960] Done called

Here is the problem. It just stops abruptly instead of listing the ASM disk groups to proceed further with the install. So something is amiss.

[AWT-EventQueue-0] [22:31:47:148] [SQLEngine.reInitialize:605] Reinitializing SQLEngine...

[AWT-EventQueue-0] [22:31:47:151] [SQLEngine.initialize:244] Execing SQLPLUS/SVRMGR process...

Looking into metalink, notes: 948456.1

Point #1 was not the reason for my error, as the cluster in linux2 (my node) is pinned.

[root@linux2 ~]# olsnodes -t -n

linux1 1 Unpinned

linux2 2 Pinned

I looked at Point#2 and this was the exact error message I received. So I decided to take that route and did an upgrade to 10.2.0.5. I could not find out whether (8288940) is included in 10.2.0.5. It was a leap of faith.

After Upgrade, I started DBCA and came to the same error. So I decided to compare the o/s level permissions between the 11.2 oracle id (oracle) and 10.2 oracle id (oracle10).

I saw a difference.

[oracle@linux2 ~]$ id <-- This is the oracle db software owner id for 11gR2.

uid=1101(oracle) gid=1000(oinstall) groups=502(dba),503(oper),1000(oinstall),1201(asmdba)

[oracle10@linux2 ~]$ id <-- This is the Oracle s/w owner for 10g

uid=1104(oracle10) gid=1000(oinstall) groups=1000(oinstall),1200(asmadmin),1203(dba10)

As you can see, the asmdba group is missing for Oracle10.

Oracle 11.2 provides new security restrictions to manage and maintain ASM. Gone are the days when the "sys" user of the database can do everything on ASM. Specifically, oracle provides (asmadmin, asmdba, and asmoper). You can read more about the roles and privileges.

It states:

Members of the (OSASM group), a.k.a asmadmin can:

use SQL to connect to an Oracle ASM instance as SYSASM using operating system authentication. The SYSASM privileges permit mounting and dismounting disk groups, and other storage administration tasks.

Whereas,

Members of the ASM Database Administrator group (OSDBA for ASM) are granted read and write access to files managed by Oracle ASM. The grid infrastructure installation owner and all Oracle Database software owners must be a member of this group, and all users with OSDBA membership on databases that have access to the files managed by Oracle ASM must be members of the OSDBA group for ASM.

Translation:

Though one might think (I certainly did) "asmadmin" is some sort of "admin" group that would have a superset of privileges, in fact, this is not true. It is more important to have the oracle software owners to be part of "asmdba" group than "asmadmin” group.

Upon adding the “asmdba” group to “oracle10” id, I restarted the install. This time the DBCA went past the problem area.

This time the trace.log has the following.

94 [AWT-EventQueue-0] [23:9:41:819] [ASMInstance.discoverASMInstance:369] Found existing ASM -- Sid: +ASM2 OracleHome: /u01 /app/11.2.0/grid

95 [AWT-EventQueue-0] [23:9:41:820] [ASMInstance.discoverASMInstance:374]ORATAB_FILE /etc/oratab

96 [AWT-EventQueue-0] [23:9:41:821] [ASMInstance.discoverASMInstance:376]discoverOSMInstance:bExists=true

97 [AWT-EventQueue-0] [23:9:41:825] [ASMInstance.initialize:339]initialize:m_bASMPresent=true

98 [AWT-EventQueue-0] [23:9:42:721] [ASMInstance.validateASM:475] Validating SI-ASM...

99 [AWT-EventQueue-0] [23:9:42:742] [SQLEngine.initialize:244] Execing SQLPLUS/SVRMGR process...

100 [AWT-EventQueue-0] [23:9:42:757] [SQLEngine.initialize:272] m_bReaderStarted: false

101 [AWT-EventQueue-0] [23:9:42:758] [SQLEngine.initialize:276] Starting Reader Thread...

102 [AWT-EventQueue-0] [23:9:42:809] [ASMInstance.detectASMInstance:775]detectOSMInstance:m_bASMPresent=true

103 [AWT-EventQueue-0] [23:9:42:810] [ASMInstance.detectASMInstance:776]detectOSMInstance:m_bASMRunning=false

104 [AWT-EventQueue-0] [23:9:42:919] [ASMInstance.detectASMInstance:803] OSM Instance is running

105 [AWT-EventQueue-0] [23:9:42:990] [ASMInstance.detectASMInstance:880] ASM version = 11.2.0.1.0

106 [AWT-EventQueue-0] [23:9:43:206] [InitParamAttributes.loadDBParams:4123]Checking if SPFILE is used

107 [AWT-EventQueue-0] [23:9:43:263] [InitParamAttributes.loadDBParams:4134]spParams = [Ljava.lang.String;@1e2a069

108 [AWT-EventQueue-0] [23:9:43:265] [ASMParameters.loadASMParameters:441]useSPFile=true

109 [AWT-EventQueue-0] [23:9:43:265] [StorageOptionsPage.validate:569] ASM is present and started.

110 [AWT-EventQueue-0] [23:9:43:347] [DiskGroupsPanel.initialize:262] Loading Data in the Table..

111 [AWT-EventQueue-0] [23:9:43:348] [DiskGroupDataSource.loadData:252] Load the data from the datasource

112 [AWT-EventQueue-0] [23:9:43:422] [ASMInstance.loadDiskGroups:1811] sql to be executed:=select NAME||'|'||round(TOTAL_MB)||'|'||round(USABLE_FILE_MB)||'|'||nvl(TYPE, 'DBCA_NULL')||'|'||STATE from v$asm_diskgroup order by NAME

113 [AWT-EventQueue-0] [23:9:43:611] [ASMInstance.loadDiskGroups:1815] Loading the diskgroup from schema....

114 [AWT-EventQueue-0] [23:9:43:612] [ASMInstance.loadDiskGroups:1823] diskgroup: CRS|3069|2673|EXTERN|MOUNTED

115 [AWT-EventQueue-0] [23:9:43:615] [ASMInstance.loadDiskGroups:1823] diskgroup: DATADG|30719|28615|EXTERN|MOUNTED

116 [AWT-EventQueue-0] [23:9:43:616] [ASMInstance.loadDiskGroups:1823] diskgroup: DATARAC10G|30719|30624|EXTERN|MOUNTED

117 [AWT-EventQueue-0] [23:9:43:617] [ASMInstance.loadDiskGroups:1823] diskgroup: FRADG|30719|30192|EXTERN|MOUNTED

118 [AWT-EventQueue-0] [23:9:43:688] [ASMInstance.setBestDiskGroup:1432] sql to be executed:=select name from v$asm_diskgrou p where free_mb= (select max(free_mb) from v$asm_diskgroup)

Just to ensure "asmdba" is the only critical group needed, I took out "asmadmin" and tried the install once again.

[oracle10@linux2 ~]$ id

uid=1104(oracle10) gid=1000(oinstall) groups=1000(oinstall),1201(asmdba),1203(dba10)

It was able to recognize the disks.

So, please make sure that the oracle binary owners are part of the "asmdba" group when creating databases on 11.2 ASM.

Oracle: Technology, Application, and more

Friday, May 21, 2010

MAX_UTILIZATION_LIMIT - A Very useful option in 11gR2

Friday, May 7, 2010

UNIX Group Permissions when Installing 10g on 11g ASM

About Me

Blog Archive

Welcome

Categories

Sitemeter