DBA Hub

📋Steps in this guide1/2

APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database 12cR1 (12.1.0.2)

Use the APPROX_COUNT_DISTINCT to get quick counts of distinct values in 12.1.0.2 onward.

oracle 12cconfigurationintermediate
by OracleDba
12 views
1

Basic Usage

In previous database versions, if we wanted to perform a count of distinct values, we would probably have done something like the following. This query gives us the exact count of the distinct values based on Oracle's read-consistency model. As such, we see all the committed data, along with any uncommitted changes made by the current session. In contrast, the new function does not provide exact results, but should give "negligible deviation from the exact result" according to the documentation. The function can also be used as part of a group query.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
SELECT COUNT(DISTINCT object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     47171

1 row selected.

SQL>

SELECT APPROX_COUNT_DISTINCT(object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     46788

1 row selected.

SQL>

SELECT tablespace_name, APPROX_COUNT_DISTINCT(table_name) AS tab_count
FROM   all_tables
GROUP BY tablespace_name
ORDER BY tablespace_name;


TABLESPACE_NAME                 TAB_COUNT
------------------------------ ----------
SYSAUX                                 73
SYSTEM                                 36
USERS                                   7
                                       44

4 rows selected.

SQL>
2

Performance

Even in the following simple example we can see a repeatable difference in the speed of the two approaches, but the performance difference does not seem too dramatic. In reality, the function is designed to process much bigger workloads than this, so we can create a bigger table to test. Now we have a table with a little over 100 million rows and 10,000 distinct values. Now we can see the difference in performance of the two methods is more significant. Christian Antognini has an interesting blog post about the performance benefits of the function here , where he shows the elapsed time and memory usage of the function remains consistent regardless of the number of distinct values in the data set. In contrast, as the number of distinct values increase, the elapsed time and memory usage of the " " method increase drastically. For more information see: Hope this helps. Regards Tim...

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
SET TIMING ON

SELECT COUNT(DISTINCT object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     47171

1 row selected.

Elapsed: 00:00:02.39
SQL>


SELECT APPROX_COUNT_DISTINCT(object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     46788

1 row selected.

Elapsed: 00:00:02.00
SQL>

DROP TABLE t1 PURGE;

CREATE TABLE t1 AS
SELECT level AS  data
FROM   dual
CONNECT BY level <= 10000;

INSERT /*+ APPEND */ INTO t1
SELECT a.data FROM t1 a
CROSS JOIN t1 b;

COMMIT;

EXEC DBMS_STATS.gather_table_stats(USER,'T1');

SET TIMING ON

SELECT COUNT(DISTINCT data) AS data_count
FROM   t1;

DATA_COUNT
----------
     10000

1 row selected.

Elapsed: 00:00:19.66
SQL>


SELECT APPROX_COUNT_DISTINCT(data) AS data_count
FROM   t1;

DATA_COUNT
----------
     10050

1 row selected.

Elapsed: 00:00:10.46
SQL>

Comments (0)

Please to add comments

No comments yet. Be the first to comment!