📋Steps in this guide1/6

Attribute Clustering in Oracle Database 12c Release 1

Attribute clustering groups specific attributes together so they are in close physical proximity, making the processing of related attributes more efficient.

oracle 12cconfigurationintermediate

by OracleDba

19 views

Setup

We create a customers, products and sales table. We create 100 customers and 10 products. We populate the table with 100,000 rows, each of which has a random value between 1 and 100, a random value between 1 and 10 and a random between 1 and 50. The padding is used for this test to increase the size of the rows, and therefore the number of blocks required to store them. Looking at a sample of 20 rows we see the and values are randomly distributed throughout the blocks. We'll keep the table as our starting point, and use it as the data source for our tests.

Code/Command (click line numbers to comment):

-- drop table sales purge;
-- drop table products purge;
-- drop table customers purge;

create table customers (
  customer_id  number,
  name         varchar2(50),
  constraint customers_pk PRIMARY KEY (customer_id)
);


create table products (
  product_id  number,
  name        varchar2(50),
  constraint products_pk PRIMARY KEY (product_id)
);


create table sales (
  id              number generated always as identity,
  customer_id     number,
  product_id      number,
  quantity        number,
  created_date    date,
  padding_string  varchar2(4000),
  constraint sales_pk primary key (id),
  constraint sales_customers_fk foreign key (customer_id) references customers(customer_id),
  constraint sales_products_fk foreign key (product_id) references products(product_id)
);

insert into customers
select level,
       'Customer ' || level
from   dual
connect by level <= 100;

commit;

insert into products
select level,
       'Product ' || level
from   dual
connect by level <= 10;

commit;

insert into sales (customer_id, product_id, quantity, created_date, padding_string)
select trunc(dbms_random.value(1, 101)),
       trunc(dbms_random.value(1, 11)),
       trunc(dbms_random.value(1, 51)),
       sysdate,
       dbms_random.string('x',100)
from   dual
connect by level <= 100000;

commit;

set autotrace off

select customer_id, product_id
from   sales
where  rownum <= 20;

CUSTOMER_ID PRODUCT_ID
----------- ----------
         62          4
         65          2
         56          1
         65          1
         96          3
         99          2
         37          3
         27          1
         50          5
         75          4
         90          2

CUSTOMER_ID PRODUCT_ID
----------- ----------
         64          4
         15          4
         89          5
         72          7
         93         10
         54          6
         74          6
          9          7
         76          4

20 rows selected.

SQL>

The Problem

We create a table called as a copy of the table. If we pull back all the records for a single customer, we can see it's taking 1864 consistent gets to return the data. We add an index on the column and gather fresh statistics. We also display the clustering factor of the new index. We run the query again, but we don't see a dramatic improvement in the number of consistent gets, because we have not used the index. We force the query to use the new index, which drops our consistent gets down to 860. So why was the index not used until we forced it? The data for customer "1" is spread throughout the table, and the database determined it was still going to have to hit a lot of blocks to get it all, whether using the index or not, so it chose not to use it. This is a classic case of the index having a high cluster factor, so the optimizer has chosen not to use it in this scenario.

Code/Command (click line numbers to comment):

-- drop table sales2 purge;

create table sales2 as
select * from sales;

set autotrace trace stat

select /* attempt 1 */ * from sales2 where customer_id = 1;

998 rows selected.


Statistics
----------------------------------------------------------
         17  recursive calls
         31  db block gets
       1864  consistent gets
       1786  physical reads
       6212  redo size
     130591  bytes sent via SQL*Net to client
        778  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        998  rows processed

SQL>

create index sales2_customer_id_idx on sales2 (customer_id);
exec dbms_stats.gather_table_stats(null, 'sales2');

set autotrace off

select clustering_factor
from   user_indexes
where  index_name = 'SALES2_CUSTOMER_ID_IDX';

CLUSTERING_FACTOR
-----------------
            76808

SQL>

set autotrace trace stat

select /* attempt 2 */ * from sales2 where customer_id = 1;

998 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1857  consistent gets
          0  physical reads
          0  redo size
     130591  bytes sent via SQL*Net to client
        778  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        998  rows processed

SQL>

set autotrace trace stat

select /*+ index(sales2) */ * from sales2 where customer_id = 1;

998 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        860  consistent gets
          0  physical reads
          0  redo size
     142053  bytes sent via SQL*Net to client
        778  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        998  rows processed

SQL>

The Solution: Attribute Clustering

We add linear clustering to the table, based on the column, and rebuild the table using an online operation. The data has now been rearranged into ascending order based on the column. We can see this by checking a sample of the data. The online move of the table maintains the indexes. When we check the clustering factor of the index, we see it has reduced drastically. We try the original query again, without the index hint. There are two things of note here. - The optimizer has chosen to use the index without being forced to. - When using the index, the consistent gets have dropped from 860 to 154. Now the data in the table is ordered in the same order as the index leaf blocks, the index clustering factor is much lower, making it a more interesting index to the optimizer. This is why the optimizer chose to use the index without being forced. What's more, the fact rows for the same customer are clustered together in the same blocks mean less total blocks have to be retrieved to get the data. This is why the total number of consistent gets dropped drastically.

Code/Command (click line numbers to comment):

alter table sales2 add clustering by linear order (customer_id);
alter table sales2 move online;

set autotrace off

select customer_id, product_id
from   sales2
where  rownum <= 20;

CUSTOMER_ID PRODUCT_ID
----------- ----------
          1          9
          1          8
          1          6
          1          1
          1          3
          1          4
          1          8
          1          7
          1          5
          1         10
          1          2

CUSTOMER_ID PRODUCT_ID
----------- ----------
          1          8
          1          5
          1          9
          1          5
          1         10
          1          6
          1          3
          1          4
          1          3

20 rows selected.

SQL>

set autotrace off

select clustering_factor
from   user_indexes
where  index_name = 'SALES2_CUSTOMER_ID_IDX';

CLUSTERING_FACTOR
-----------------
             1786

SQL>

set autotrace trace stat

select /* attempt 3 */ * from sales2 where customer_id = 1;

998 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        154  consistent gets
          4  physical reads
          0  redo size
     142053  bytes sent via SQL*Net to client
        778  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        998  rows processed

SQL>

Attribute Clustering Basic Syntax

Attribute clustering can be defined during table creation, including during a CTAS operation. By default, attribute clustering is enabled for direct-path inserts ( ) and data movement operations ( ). The data movement operations include , , , and online table redefinitions. The attribute clustering defaults can be modified at creation time. The attribute clustering settings of an existing table can be modified using the command. The existing data is untouched. The attribute clustering settings of load operations can be overridden using the and hints. It's possible to do join attribute clustering. Data from the joined tables is not stored in the clustered table, but the cluster order is determined by the join columns data in joined tables. This is meant for star schemas and fact/dimension tables.

Code/Command (click line numbers to comment):

drop table sales3 purge;

create table sales3 (
  id              number,
  customer_id     number,
  product_id      number,
  quantity        number,
  created_date    date,
  padding_string  varchar2(4000),
  constraint sales3_pk primary key (id),
  constraint sales3_customers_fk foreign key (customer_id) references customers(customer_id),
  constraint sales3_products_fk foreign key (product_id) references products(product_id)
)
clustering by linear order (customer_id)
;


drop table sales3 purge;

create table sales3
clustering by linear order (customer_id)
as
select * from sales;

drop table sales3 purge;

-- Limit attribute clustering to data movement operations.
create table sales3 (
  id              number,
  customer_id     number,
  product_id      number,
  quantity        number,
  created_date    date,
  padding_string  varchar2(4000),
  constraint sales3_pk primary key (id),
  constraint sales3_customers_fk foreign key (customer_id) references customers(customer_id),
  constraint sales3_products_fk foreign key (product_id) references products(product_id)
)
clustering by linear order (customer_id)
no on load yes on data movement
;


drop table sales3 purge;

-- Specify the attribute clustering defaults explicitly.
create table sales3 (
  id              number,
  customer_id     number,
  product_id      number,
  quantity        number,
  created_date    date,
  padding_string  varchar2(4000),
  constraint sales3_pk primary key (id),
  constraint sales3_customers_fk foreign key (customer_id) references customers(customer_id),
  constraint sales3_products_fk foreign key (product_id) references products(product_id)
)
clustering by linear order (customer_id)
yes on load yes on data movement
;

-- Drop clustering.
alter table sales3 drop clustering;

-- Add clustering.
alter table sales3 add clustering by linear order (customer_id);

-- Modify ON LOAD and ON DATA MOVEMENT settings.
alter table sales3 modify clustering yes on load no on data movement;

alter table sales3 modify clustering yes on load yes on data movement;

insert /*+ append no_clustering */ into sales3 select * from sales;
commit;

drop table sales4 purge;

create table sales4 (
  id              number,
  customer_id     number,
  product_id      number,
  quantity        number,
  created_date    date,
  padding_string  varchar2(4000),
  constraint sales4_pk primary key (id),
  constraint sales4_customers_fk foreign key (customer_id) references customers(customer_id),
  constraint sales4_products_fk foreign key (product_id) references products(product_id)
)
clustering
  sales4 join products on (sales4.product_id = products.product_id)
  by linear order (customer_id, products.name)
;

Views

The column of the views shows us the clustering status of a table. The following views provide more information about attribute clustering. - : Information about the attribute clustering settings for the table. - : Information about the columns used for attribute clustering. - : Information about the dimension tables by which a fact table is clustered. - : Information about the joins of the fact table and dimension tables. Here are some example queries.

Code/Command (click line numbers to comment):

set autotrace off
column table_name format a30
column clustering format a10

select table_name,
       clustering
from   user_tables
order by table_name;

TABLE_NAME                     CLUSTERING
------------------------------ ----------
CUSTOMERS                      NO
PRODUCTS                       NO
SALES                          NO
SALES2                         YES
SALES3                         YES
SALES4                         YES

6 rows selected.

SQL>

set linesize 200
column owner format a10
column table_name format a10
column clustering_type format a25
column on_load format a7
column on_datamovement format a15
column valid format a5
column with_zonemap format a12
column last_load_clst format a30
column last_datamove_clst format a30

select owner,
       table_name,
       clustering_type,
       on_load,
       on_datamovement,
       valid,
       with_zonemap,
       last_load_clst,
       last_datamove_clst
from   user_clustering_tables
order by owner, table_name;

OWNER      TABLE_NAME CLUSTERING_TYPE           ON_LOAD ON_DATAMOVEMENT VALID WITH_ZONEMAP LAST_LOAD_CLST                 LAST_DATAMOVE_CLST
---------- ---------- ------------------------- ------- --------------- ----- ------------ ------------------------------ ------------------------------
TESTUSER1  SALES2     LINEAR                    YES     YES             YES   NO                                          24-DEC-20 08.37.43.591982 PM
TESTUSER1  SALES3     LINEAR                    YES     YES             YES   NO
TESTUSER1  SALES4     LINEAR                    YES     YES             YES   NO

SQL>


column owner format a10
column table_name format a10
column detail_owner format a13
column detail_name format a11
column detail_column format a14

select owner,
       table_name,
       detail_owner,
       detail_name,
       detail_column,
       position,
       groupid
from   user_clustering_keys
order by owner, table_name;

OWNER      TABLE_NAME DETAIL_OWNER  DETAIL_NAME DETAIL_COLUMN    POSITION    GROUPID
---------- ---------- ------------- ----------- -------------- ---------- ----------
TESTUSER1  SALES2     TESTUSER1     SALES2      CUSTOMER_ID             1          0
TESTUSER1  SALES3     TESTUSER1     SALES3      CUSTOMER_ID             1          0
TESTUSER1  SALES4     TESTUSER1     SALES4      CUSTOMER_ID             1          0
TESTUSER1  SALES4     TESTUSER1     PRODUCTS    NAME                    2          0

SQL>


column table_name form a10
column dimension_owner form a15
column dimension_name form a14

select table_name,
       dimension_owner,
       dimension_name
from   user_clustering_dimensions
order by table_name;

TABLE_NAME DIMENSION_OWNER DIMENSION_NAME
---------- --------------- --------------
SALES4     TESTUSER1       PRODUCTS

SQL>


column table_name form a10
column tab1_owner form a10
column tab1_name form a10
column tab1_column form a11
column tab2_owner form a10
column tab2_name form a10
column tab2_column form a11

select table_name,
       tab1_owner,
       tab1_name,
       tab1_column,
       tab2_owner,
       tab2_name,
       tab2_column
from   user_clustering_joins
order by table_name;

TABLE_NAME TAB1_OWNER TAB1_NAME  TAB1_COLUMN TAB2_OWNER TAB2_NAME  TAB2_COLUMN
---------- ---------- ---------- ----------- ---------- ---------- -----------
SALES4     TESTUSER1  SALES4     PRODUCT_ID  TESTUSER1  PRODUCTS   PRODUCT_ID

SQL>

Considerations

There are some things to consider when using attribute clustering. - Attribute clustering is an Enterprise Edition feature. - Zone maps can only be used as part of attribute clustering if you are on the Exadata platform. - Attribute clustering works on direct path loads and data movement operations, so it is a more natural fit for data warehouses. - It's possible to do join attribute clustering. Data from the joined tables is not stored in the clustered table, but the cluster order is determined by the join columns in data in joined tables. This is meant for star schemas and fact/dimension tables. - This article ignores as it is for multidimensional clustering on multiple columns, and I don't really understand it. For more information see: Hope this helps. Regards Tim...

Comments (0)

Please to add comments

No comments yet. Be the first to comment!