DBA Hub

📋Steps in this guide1/8

Analytic Functions

An introduction to analytic functions in Oracle.

oracle miscconfigurationintermediate
by OracleDba
15 views
1

Setup

The examples in this article require the following table.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
--DROP TABLE emp PURGE;

CREATE TABLE emp (
  empno    NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY,
  ename    VARCHAR2(10),
  job      VARCHAR2(9),
  mgr      NUMBER(4),
  hiredate DATE,
  sal      NUMBER(7,2),
  comm     NUMBER(7,2),
  deptno   NUMBER(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
COMMIT;
2

Introduction

Probably the easiest way to understand analytic functions is to start by looking at aggregate functions. An aggregate function, as the name suggests, aggregates data from several rows into a single result row. For example, we might use the aggregate function to give us an average of all the employee salaries in the EMP table. The clause allows us to apply aggregate functions to subsets of rows. For example, we might want to display the average salary for each department. In both cases, the aggregate function reduces the number of rows returned by the query. Analytic functions also operate on subsets of rows, similar to aggregate functions in queries, but they do not reduce the number of rows returned by the query. For example, the following query reports the salary for each employee, along with the average salary of the employees within the department. This time is an analytic function, operating on the group of rows defined by the contents of the clause. This group of rows is known as a window, which is why analytic functions are sometimes referred to as window[ing] functions. Notice how the function is still reporting the departmental average, like it did in the query, but the result is present in each row, rather than reducing the total number of rows returned. This is because analytic functions are performed on a result set after all join, , and clauses are complete, but before the final operation is performed.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
SELECT AVG(sal)
FROM   emp;

  AVG(SAL)
----------
2073.21429

SQL>

SELECT deptno, AVG(sal)
FROM   emp
GROUP BY deptno
ORDER BY deptno;

    DEPTNO   AVG(SAL)
---------- ----------
	10 2916.66667
	20	 2175
	30 1566.66667

SQL>

SET PAGESIZE 50
BREAK ON deptno SKIP 1 DUPLICATES

SELECT empno, deptno, sal,
       AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
FROM   emp;

     EMPNO     DEPTNO        SAL AVG_DEPT_SAL
---------- ---------- ---------- ------------
      7782         10       2450   2916.66667
      7839         10       5000   2916.66667
      7934         10       1300   2916.66667

      7566         20       2975         2175
      7902         20       3000         2175
      7876         20       1100         2175
      7369         20        800         2175
      7788         20       3000         2175

      7521         30       1250   1566.66667
      7844         30       1500   1566.66667
      7499         30       1600   1566.66667
      7900         30        950   1566.66667
      7698         30       2850   1566.66667
      7654         30       1250   1566.66667

14 rows selected.

SQL>
3

Analytic Function Syntax

There are some variations in the syntax of the individual analytic functions, but the basic syntax for an analytic function is as follows. The breaks down into the following optional elements. The sub-elements of the each have their own syntax diagrams, shown here . Rather than repeat the syntax diagrams, the following sections describe what each section of the is used for.

Code/Command (click line numbers to comment):

1
2
3
analytic_function([ arguments ]) OVER (analytic_clause)

[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
4

query_partition_clause

The divides the result set into partitions, or groups, of data. The operation of the analytic function is restricted to the boundary imposed by these partitions, similar to the way a clause affects the action of an aggregate function. If the is omitted, the whole result set is treated as a single partition. The following query uses an empty clause, so the average presented is based on all the rows of the result set. If we change the clause to include a based on the department, the averages presented are specifically for the department the employee belongs too.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
CLEAR BREAKS

SELECT empno, deptno, sal,
       AVG(sal) OVER () AS avg_sal
FROM   emp;

     EMPNO     DEPTNO        SAL    AVG_SAL
---------- ---------- ---------- ----------
      7369         20        800 2073.21429
      7499         30       1600 2073.21429
      7521         30       1250 2073.21429
      7566         20       2975 2073.21429
      7654         30       1250 2073.21429
      7698         30       2850 2073.21429
      7782         10       2450 2073.21429
      7788         20       3000 2073.21429
      7839         10       5000 2073.21429
      7844         30       1500 2073.21429
      7876         20       1100 2073.21429
      7900         30        950 2073.21429
      7902         20       3000 2073.21429
      7934         10       1300 2073.21429

SQL>

BREAK ON deptno SKIP 1 DUPLICATES

SELECT empno, deptno, sal,
       AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal
FROM   emp;

     EMPNO     DEPTNO        SAL AVG_DEPT_SAL
---------- ---------- ---------- ------------
      7782         10       2450   2916.66667
      7839         10       5000   2916.66667
      7934         10       1300   2916.66667

      7566         20       2975         2175
      7902         20       3000         2175
      7876         20       1100         2175
      7369         20        800         2175
      7788         20       3000         2175

      7521         30       1250   1566.66667
      7844         30       1500   1566.66667
      7499         30       1600   1566.66667
      7900         30        950   1566.66667
      7698         30       2850   1566.66667
      7654         30       1250   1566.66667

SQL>
5

order_by_clause

The is used to order rows, or siblings, within a partition. So if an analytic function is sensitive to the order of the siblings in a partition you should include an . The following query uses the function to return the first salary reported in each department. Notice we have partitioned the result set by the department, but there is no . Now compare the values of the column when we include an to order the siblings by ascending salary. In this case the " " keywords are unnecessary as is the default for an and is the default for orders. When ordering by , the default is . It is important to understand how the affects display order. The is guaranteed to affect the order of the rows as they are processed by the analytic function, but it may not always affect the display order. As a result, you must always use a conventional clause in the query if display order is important. Do not rely on any implicit ordering done by the analytic function. Remember, the conventional clause is performed after the analytic processing, so it will always take precedence.

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
BREAK ON deptno SKIP 1 DUPLICATES

SELECT empno, deptno, sal, 
       FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno) AS first_sal_in_dept
FROM   emp;

     EMPNO     DEPTNO        SAL FIRST_SAL_IN_DEPT
---------- ---------- ---------- -----------------
      7782         10       2450              2450
      7839         10       5000              2450
      7934         10       1300              2450

      7566         20       2975              2975
      7902         20       3000              2975
      7876         20       1100              2975
      7369         20        800              2975
      7788         20       3000              2975

      7521         30       1250              1250
      7844         30       1500              1250
      7499         30       1600              1250
      7900         30        950              1250
      7698         30       2850              1250
      7654         30       1250              1250

SQL>

SELECT empno, deptno, sal, 
       FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno ORDER BY sal ASC NULLS LAST) AS first_val_in_dept
FROM   emp;

     EMPNO     DEPTNO        SAL FIRST_VAL_IN_DEPT
---------- ---------- ---------- -----------------
      7934         10       1300              1300
      7782         10       2450              1300
      7839         10       5000              1300

      7369         20        800               800
      7876         20       1100               800
      7566         20       2975               800
      7788         20       3000               800
      7902         20       3000               800

      7900         30        950               950
      7654         30       1250               950
      7521         30       1250               950
      7844         30       1500               950
      7499         30       1600               950
      7698         30       2850               950

SQL>
6

windowing_clause

We have seen previously the controls the window, or group of rows, the analytic operates on. The gives some analytic functions a further degree of control over this window within the current partition, or whole result set if no partitioning clause is used. The is an extension of the and as such, it can only be used if an is present. The has two basic forms, with a third form added in Oracle 21c. You can read more about the enhancements to the windowing clause from Oracle 21c onward here. When using , you are indicating a specific number of rows relative to the current row, either directly, or via an expression. Assuming you don't cross a partition boundary, that number of rows is fixed. In contrast, when you use you are referring to a range of values in a specific column relative to the value in the current row. As a result, Oracle doesn't know how many rows are included in the range until the ordered set is created. treats all rows with the same value as a group, and the start and end points refer to the group rather than the row. It is possible to omit the keyword and specify a single endpoint. In this case, Oracle assumes your specified is the start point and the end point it the current row. I would advise against using this syntax as it will be unclear to anyone who doesn't understand this default action. Possible values for "start_point" and "end_point" are: - : The window starts at the first row of the partition, or the whole result set if no partitioning clause is used. Only available for start points. - : The window ends at the last row of the partition, or the whole result set if no partitioning clause is used. Only available for end points. - : The window starts or ends at the current row. Can be used as start or end point. - : A physical or logical offset before the current row using a constant or expression that evaluates to a positive numerical value. When used with , it can also be an interval literal if the uses a column. - : As above, but an offset after the current row. The start point must be before or equal to the end point. In addition, the current row does not have to be part of the window. The window can be defined to start and end before or after the current row. For analytic functions that support the , the default action is . The following query is similar to one used previously to report the employee salary and average department salary, but now we have included an so we also get the default . There are two things to notice here. - The addition of the without a means the query is now returning a running average. - The default is , not . The fact it is , not , means it includes all rows with the same value as the value in the current row, even if they are further down the result set. As a result, the window may extend beyond the current row, even though you may not think this is the case. To illustrate the last point, let's look at the values if we compare and for the last query. Notice the differences between those lines in bold. In my opinion, the default should have been . This would make the accidental inclusion of the much less confusing. The following query shows one method for accessing data from previous and following rows within the current row using the . This can also be accomplished with LAG and LEAD .

Code/Command (click line numbers to comment):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
RANGE BETWEEN start_point AND end_point
ROWS BETWEEN start_point AND end_point
GROUPS BETWEEN start_point AND end_point (21c onward)

SELECT empno, deptno, sal, 
       AVG(sal) OVER (PARTITION BY deptno ORDER BY sal) AS avg_dept_sal_sofar
FROM   emp;

     EMPNO     DEPTNO        SAL AVG_DEPT_SAL_SOFAR
---------- ---------- ---------- ------------------
      7934         10       1300               1300
      7782         10       2450               1875
      7839         10       5000         2916.66667

      7369         20        800                800
      7876         20       1100                950
      7566         20       2975               1625
      7788         20       3000               2175
      7902         20       3000               2175

      7900         30        950                950
      7654         30       1250               1150
      7521         30       1250               1150
      7844         30       1500             1237.5
      7499         30       1600               1310
      7698         30       2850         1566.66667

SQL>

SELECT empno, deptno, sal, 
       AVG(sal) OVER (PARTITION BY deptno ORDER BY sal
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_avg,
       AVG(sal) OVER (PARTITION BY deptno ORDER BY sal
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_avg
FROM   emp;

     EMPNO     DEPTNO        SAL  RANGE_AVG   ROWS_AVG
---------- ---------- ---------- ---------- ----------
      7934         10       1300       1300       1300
      7782         10       2450       1875       1875
      7839         10       5000 2916.66667 2916.66667

      7369         20        800        800        800
      7876         20       1100        950        950
      7566         20       2975       1625       1625
7788         20       3000       2175    1968.75
      7902         20       3000       2175       2175
7900         30        950        950        950
7654         30       1250       1150       1100
      7521         30       1250       1150       1150
7844         30       1500     1237.5     1237.5
      7499         30       1600       1310       1310
      7698         30       2850 1566.66667 1566.66667

SQL>

CLEAR BREAKS

SELECT empno, deptno, sal, 
       FIRST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sal,
       LAST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_sal
FROM   emp;

     EMPNO     DEPTNO        SAL PREVIOUS_SAL   NEXT_SAL
---------- ---------- ---------- ------------ ----------
      7369         20        800          800        950
      7900         30        950          800       1100
      7876         20       1100          950       1250
      7521         30       1250         1100       1250
      7654         30       1250         1250       1300
      7934         10       1300         1250       1500
      7844         30       1500         1300       1600
      7499         30       1600         1500       2450
      7782         10       2450         1600       2850
      7698         30       2850         2450       2975
      7566         20       2975         2850       3000
      7788         20       3000         2975       3000
      7902         20       3000         3000       5000
      7839         10       5000         3000       5000

SQL>
7

exclude_clause

The clause was added in Oracle 21c. You can read about the syntax enhancements here.
8

Using Analytic Functions

The best way to understand what analytic functions are capable of is to play around with them. This article contains links to other articles I've written about specific analytic functions or links to the documentation. The "*" indicates the function supports the full analytic syntax, including the windowing clause. For more information see: Hope this helps. Regards Tim...

Comments (0)

Please to add comments

No comments yet. Be the first to comment!