DBA Hub

📋Steps in this guide1/6

KURTOSIS_POP and KURTOSIS_SAMP Analytic Functions

The KURTOSIS_POP and KURTOSIS_SAMP analytic functions were added in Oracle 21c to describe the 'tailedness' or shape of a probability distribution.

oracle 21cconfigurationintermediate
by OracleDba
14 views
1

Basics

You can read a basic introduction to kurtosis here . The and aggregate and analytic functions were added in Oracle 21c describe the "tailedness" or shape of a probability distribution. In both cases they return a numeric value with the following meaning. - Negative : The data has a flatter peak than a normal distribution, and shorter tails. It's more dome shaped. - Zero : Data has a normal distribution. Most of the values cluster around the mean, with fewer at the tails. Zero is also returned if the data set has less than 3 rows. - Positive : Fewer values are located near the mean, and more are located near the tails. - Null : Null values in the expression are ignored, and the function will return null if all rows have a null value for the expression. The and functions will not return the same results, but both should be representative of the distribution of data. The larger the data set, the more similar their results will be.
2

Setup

We need some data with various distribution patterns to measure using the and aggregate functions. The following table contains columns that represent skewed data and a normal distribution. We will use the following table to demonstrate the use of the and analytic functions. The sample size is too small, but the table will be familiar to people practicing analytic function using the articles on this site.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- drop table t1 purge;

create table t1 (
  id      number generated always as identity,
  skew    number,
  normal  number
);


insert into t1 (skew, normal)
select case
         when level > 9800 then 1
         else dbms_random.value(400, 500)
       end,
       dbms_random.normal
from   dual
connect by level <= 10000;
commit;

create table emp (
  empno    number(4) constraint pk_emp primary key,
  ename    varchar2(10),
  job      varchar2(9),
  mgr      number(4),
  hiredate date,
  sal      number(7,2),
  comm     number(7,2),
  deptno   number(2) );

insert into emp values (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
insert into emp values (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
insert into emp values (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
insert into emp values (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
insert into emp values (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
insert into emp values (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
insert into emp values (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
insert into emp values (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
insert into emp values (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
insert into emp values (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
insert into emp values (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
insert into emp values (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
insert into emp values (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
insert into emp values (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
commit;
3

KURTOSIS_POP and KURTOSIS_SAMP as Aggregate Functions

The following query uses the aggregate function to display the "tailedness" of the data in the and columns. As expected the column returns a non-zero value, and the column returns a near-zero value The function uses a sample size of 100% of the rows, which can represent an overhead for large data sets. In contrast the function uses a smaller sample size, making it more efficient for large data sets, whilst still returning representative results. Using or keywords mean only unique values in the expression are used for the calculation. The keyword is that same as the default action. The , and keywords are also available for the analytic functions.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
select kurtosis_pop(skew) as skew,
       kurtosis_pop(normal) as normal
from   t1;

      SKEW     NORMAL
---------- ----------
30.0819301 .0453792598

SQL>

select kurtosis_samp(skew) as skew,
       kurtosis_samp(normal) as normal
from   t1;

      SKEW     NORMAL
---------- ----------
30.0975767 .0460021977

SQL>

select kurtosis_samp(distinct skew) as skew_distinct,
       kurtosis_samp(unique skew) as skew_unique,
       kurtosis_samp(all skew) as skew_all,
       kurtosis_samp(skew) as skew
from   t1;

SKEW_DISTINCT SKEW_UNIQUE   SKEW_ALL       SKEW
------------- ----------- ---------- ----------
   4.44000461  4.44000461 30.0975767 30.0975767

SQL>

SQL>
4

KURTOSIS_POP Analytic Function

Using an empty clause turns the function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the analytic function to display the "tailedness" of the data in the column, as well as all the original data. Adding the partitioning clause allows us to display the salary kurtosis per department, along with the employee data for each department.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
select empno,
       ename,
       deptno,
       sal,
       round(kurtosis_pop(sal) over (),2) as sal_kurtosis 
from   emp;

     EMPNO ENAME          DEPTNO        SAL SAL_KURTOSIS
---------- ---------- ---------- ---------- ------------
      7369 SMITH              20        800          .49
      7499 ALLEN              30       1600          .49
      7521 WARD               30       1250          .49
      7566 JONES              20       2975          .49
      7654 MARTIN             30       1250          .49
      7698 BLAKE              30       2850          .49
      7782 CLARK              10       2450          .49
      7788 SCOTT              20       3000          .49
      7839 KING               10       5000          .49
      7844 TURNER             30       1500          .49
      7876 ADAMS              20       1100          .49
      7900 JAMES              30        950          .49
      7902 FORD               20       3000          .49
      7934 MILLER             10       1300          .49

SQL>

select empno,
       ename,
       deptno,
       sal,
       round(kurtosis_pop(sal) over (partition by deptno),2) as sal_kurtosis_by_dept
from   emp;

     EMPNO ENAME          DEPTNO        SAL SAL_KURTOSIS_BY_DEPT
---------- ---------- ---------- ---------- --------------------
      7782 CLARK              10       2450                    0
      7839 KING               10       5000                    0
      7934 MILLER             10       1300                    0
      7566 JONES              20       2975                -1.77
      7902 FORD               20       3000                -1.77
      7876 ADAMS              20       1100                -1.77
      7369 SMITH              20        800                -1.77
      7788 SCOTT              20       3000                -1.77
      7521 WARD               30       1250                  .46
      7844 TURNER             30       1500                  .46
      7499 ALLEN              30       1600                  .46
      7900 JAMES              30        950                  .46
      7698 BLAKE              30       2850                  .46
      7654 MARTIN             30       1250                  .46

SQL>
5

KURTOSIS_SAMP Analytic Function

Using an empty clause turns the function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the analytic function to display the "tailedness" of the data in the column, as well as all the original data. Adding the partitioning clause allows us to display the salary kurtosis per department, along with the employee data for each department.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
select empno,
       ename,
       deptno,
       sal,
       round(kurtosis_samp(sal) over (),2) as sal_kurtosis 
from   emp;

     EMPNO ENAME          DEPTNO        SAL SAL_KURTOSIS
---------- ---------- ---------- ---------- ------------
      7369 SMITH              20        800         1.32
      7499 ALLEN              30       1600         1.32
      7521 WARD               30       1250         1.32
      7566 JONES              20       2975         1.32
      7654 MARTIN             30       1250         1.32
      7698 BLAKE              30       2850         1.32
      7782 CLARK              10       2450         1.32
      7788 SCOTT              20       3000         1.32
      7839 KING               10       5000         1.32
      7844 TURNER             30       1500         1.32
      7876 ADAMS              20       1100         1.32
      7900 JAMES              30        950         1.32
      7902 FORD               20       3000         1.32
      7934 MILLER             10       1300         1.32

SQL>

select empno,
       ename,
       deptno,
       sal,
       round(kurtosis_samp(sal) over (partition by deptno),2) as sal_kurtosis_by_dept
from   emp;

     EMPNO ENAME          DEPTNO        SAL SAL_KURTOSIS_BY_DEPT
---------- ---------- ---------- ---------- --------------------
      7782 CLARK              10       2450                    0
      7839 KING               10       5000                    0
      7934 MILLER             10       1300                    0
      7566 JONES              20       2975                 -3.1
      7902 FORD               20       3000                 -3.1
      7876 ADAMS              20       1100                 -3.1
      7369 SMITH              20        800                 -3.1
      7788 SCOTT              20       3000                 -3.1
      7521 WARD               30       1250                 3.84
      7844 TURNER             30       1500                 3.84
      7499 ALLEN              30       1600                 3.84
      7900 JAMES              30        950                 3.84
      7698 BLAKE              30       2850                 3.84
      7654 MARTIN             30       1250                 3.84

SQL>
6

Quick Links

The "*" indicates the function supports the full analytic syntax, including the windowing clause. For more information see: - KURTOSIS_POP - KURTOSIS_SAMP - Analytic Functions : All Articles Hope this helps. Regards Tim...

Comments (0)

Please to add comments

No comments yet. Be the first to comment!