📋Steps in this guide1/11

Column-Level Collation and Case-Insensitive Database in Oracle Database 12c Release 2 (12.2)

Oracle Database 12c Release 2 (12.2) lets you specify the collation used for columns that hold string data, allowing you to easily perform case insensitive queries, as well as control the order of queried data.

oracle 12cconfigurationintermediate

by OracleDba

23 views

Assumptions

This article assumes the following.

Default Behaviour

To understand the impact of collation, it's important to understand the default behaviour of an Oracle database. Before we start using UTF data we need to make sure SQL*Plus will handle it properly. We can do this by setting the environment variable correctly before starting SQL*Plus. Create the following test table and populate it with some data containing special characters. Notice the variation in the first letter of the first name. The following query shows how Oracle treats special characters with regards to sort operations. We can see upper case comes before lower case in the sort order, and the regular characters come before the special characters. The following query shows how Oracle treats special characters in comparisons. We can see an exact match is made. The following query shows how Oracle treats special characters in group operations. Once again, the variants on the first character are treated separately. In the following examples, think back to this default behaviour.

Code/Command (click line numbers to comment):

# set NLS_LANG=AMERICAN_AMERICA.UTF8
export NLS_LANG=AMERICAN_AMERICA.UTF8

sqlplus test/test@pdb1

DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(15 CHAR),
  location    VARCHAR2(15 CHAR),
  CONSTRAINT t1_pk PRIMARY KEY (id)
);

INSERT INTO t1 VALUES (1, 'LÃ¶wenbrauerei', 'BrÃ¤unlingen');
INSERT INTO t1 VALUES (2, 'LÃ–wenbrauerei', 'BrÃ„unlingen');
INSERT INTO t1 VALUES (3, 'Lowenbrauerei', 'Braunlingen');
INSERT INTO t1 VALUES (4, 'LOwenbrauerei', 'BrAunlingen');
COMMIT;

COLUMN company FORMAT A15
COLUMN location FORMAT A15

SELECT *
FROM   t1
ORDER BY company;

	ID COMPANY	   LOCATION
---------- --------------- ---------------
	 4 LOwenbrauerei   BrAunlingen
	 3 Lowenbrauerei   Braunlingen
         2 LÃ–wenbrauerei   BrÃ„unlingen
         1 LÃ¶wenbrauerei   BrÃ¤unlingen

SQL>

SELECT *
FROM   t1
WHERE  company = 'LÃ¶wenbrauerei';

	ID COMPANY	   LOCATION
---------- --------------- ---------------
         1 LÃ¶wenbrauerei   BrÃ¤unlingen

SQL>


SELECT *
FROM   t1
WHERE  company LIKE '%Ã¶%';

	ID COMPANY	   LOCATION
---------- --------------- ---------------
         1 LÃ¶wenbrauerei   BrÃ¤unlingen

SQL>

SELECT company, COUNT(*) AS amount
FROM   t1
GROUP BY company
ORDER BY company;

COMPANY 	    AMOUNT
--------------- ----------
LOwenbrauerei		 1
Lowenbrauerei		 1
LÃ–wenbrauerei            1
LÃ¶wenbrauerei            1

SQL>

Setting Collation

Collation can be set at a number of different levels, which will be demonstrated below. Each section will not repeat all possible syntax variations or collations as that would be too repetitive. There are two basic types of collation. - Binary : Ordering and comparisons of string data are based on the numeric value of the characters in the strings. - Linguistic : Ordering and comparisons of string data are based on the alphabetic sequence of the characters, regardless of their numeric values. The list of linguistic collations is available here . When using collations there are three suffixes that alter the behaviour of sorts and comparisons. - "_CI" : Case insensitive, but accent sensitive. - "_AI" : Both case and accent insensitive. - "_CS" : Both case and accent sensitive. This is default if no extension is used. If no collation is specified, directly or via a default setting, the default pseudo-collation is used, which means the and parameters are used to determine the actual collation used. The only supported collation for and columns is the pseudo-collation.

Column-Level

The collation of a specific column can be defined when the table is created. In the following example we set the collation of the column to , which will make sorts and comparisons of that columns data case insensitive, but will still treat special characters as separate. We can see the collation has made a difference if we query data in the column. We can also set the collation of new columns added to an existing table. In the following example we set the collation of the column to , which will make sorts and comparisons of that columns data both case insensitive and accent insensitive. The output below shows the impact of the collation on this new column. We can see the collation of the columns using the views.

Code/Command (click line numbers to comment):

DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(15 CHAR)
COLLATE BINARY_CI
,
  CONSTRAINT t1_pk PRIMARY KEY (id)
);

INSERT INTO t1 VALUES (1, 'LÃ¶wenbrauerei');
INSERT INTO t1 VALUES (2, 'LÃ–wenbrauerei');
INSERT INTO t1 VALUES (3, 'Lowenbrauerei');
INSERT INTO t1 VALUES (4, 'LOwenbrauerei');
COMMIT;

SELECT company, COUNT(*) AS amount
FROM   t1
GROUP BY company
ORDER BY company;

COMPANY 	    AMOUNT
--------------- ----------
Lowenbrauerei		 2
LÃ¶wenbrauerei            2

SQL>

 
SELECT *
FROM   t1
WHERE  company LIKE '%Ã¶%';

	ID COMPANY
---------- ---------------
         1 LÃ¶wenbrauerei
         2 LÃ–wenbrauerei

SQL>

ALTER TABLE t1 ADD (
  location  VARCHAR2(15 CHAR)
COLLATE BINARY_AI
);


UPDATE t1 SET location = 'BrÃ¤unlingen' WHERE id = 1;
UPDATE t1 SET location = 'BrÃ„unlingen' WHERE id = 2;
UPDATE t1 SET location = 'Braunlingen' WHERE id = 3;
UPDATE t1 SET location = 'BrAunlingen' WHERE id = 4;
COMMIT;

SELECT location, COUNT(*) AS amount
FROM   t1
GROUP BY location
ORDER BY location;

LOCATION	    AMOUNT
--------------- ----------
BrÃ¤unlingen              4

SQL>

 
SELECT *
FROM   t1
WHERE  location LIKE '%Ã¤%';

	ID COMPANY	   LOCATION
---------- --------------- ---------------
         1 LÃ¶wenbrauerei   BrÃ¤unlingen
         2 LÃ–wenbrauerei   BrÃ„unlingen
	 3 Lowenbrauerei   Braunlingen
	 4 LOwenbrauerei   BrAunlingen

SQL>

COLUMN column_name FORMAT A30
COLUMN collation FORMAT A20

SELECT column_id,
       column_name,
       collation
FROM   user_tab_columns
WHERE  table_name = 'T1'
ORDER BY column_id;

 COLUMN_ID COLUMN_NAME			  COLLATION
---------- ------------------------------ --------------------
	 1 ID
	 2 COMPANY			  BINARY_CI
	 3 LOCATION			  BINARY_AI

SQL>

Table-Level

The default collation for the whole table can be defined when the table is created. The table default collation can be changed using the statement, but this only affects new columns added to table, not existing columns. By displaying the column level collation we can see the existing column was not affected by the change in the default collation of the table. The default table collation is displayed using the views.

Code/Command (click line numbers to comment):

DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(15 CHAR),
  CONSTRAINT t1_pk PRIMARY KEY (id)
)
DEFAULT COLLATION BINARY_CI
;

ALTER TABLE t1
DEFAULT COLLATION BINARY_AI
;

ALTER TABLE t1 ADD (
  location  VARCHAR2(15 CHAR)
);

COLUMN column_name FORMAT A30
COLUMN collation FORMAT A20

SELECT column_id,
       column_name,
       collation
FROM   user_tab_columns
WHERE  table_name = 'T1'
ORDER BY column_id;

 COLUMN_ID COLUMN_NAME			  COLLATION
---------- ------------------------------ --------------------
	 1 ID
	 2 COMPANY			  BINARY_CI
	 3 LOCATION			  BINARY_AI

SQL>

SELECT default_collation
FROM   user_tables
WHERE  table_name = 'T1';

DEFAULT_COLLATION
--------------------------------------------------------------------------------
BINARY_AI

SQL>

Schema-Level

The default collation for a schema can be defined when the user is created. If we connect to the user and create a table we can see the default setting is used to define the default table collation. The default schema collation is changed using the command, but this doesn't affect the default collation of any existing objects. If we add another column to the existing table you may think it will use the new default schema collation, but you would be wrong. Remember, changes to the default schema collation do not affect existing objects, so the default table collation is unchanged.

Code/Command (click line numbers to comment):

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

CREATE USER test2 IDENTIFIED BY test2
  DEFAULT TABLESPACE users
  QUOTA UNLIMITED ON users
DEFAULT COLLATION BINARY_CI
;

GRANT CREATE SESSION, CREATE TABLE to test2;

CONN test2/test2@pdb1

DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(15 CHAR),
  CONSTRAINT t1_pk PRIMARY KEY (id)
);


SELECT default_collation
FROM   user_tables
WHERE  table_name = 'T1';

DEFAULT_COLLATION
--------------------------------------------------------------------------------
BINARY_CI

SQL>

ALTER USER test2
DEFAULT COLLATION BINARY_AI
;

ALTER TABLE t1 ADD (
  location  VARCHAR2(15 CHAR)
);


COLUMN column_name FORMAT A30
COLUMN collation FORMAT A20

SELECT column_id,
       column_name,
       collation
FROM   user_tab_columns
WHERE  table_name = 'T1'
ORDER BY column_id;

 COLUMN_ID COLUMN_NAME			  COLLATION
---------- ------------------------------ --------------------
	 1 ID
	 2 COMPANY			  BINARY_CI
	 3 LOCATION			  BINARY_CI

SQL>

SELECT default_collation
FROM   user_tables
WHERE  table_name = 'T1';

DEFAULT_COLLATION
--------------------------------------------------------------------------------
BINARY_CI

SQL>

Session-Level

There are two ways to set the default collation for a session. The parameter can be set at session level. Alternatively the and parameters can be set at session level, making them the defaults for the session. Notice the default references to the pseudo-collation, which means the and parameters are used to determine the actual collation used.

Code/Command (click line numbers to comment):

CONN test/test@pdb1
ALTER SESSION SET
DEFAULT_COLLATION=BINARY_CI
;

SELECT SYS_CONTEXT('USERENV', 'SESSION_DEFAULT_COLLATION') FROM DUAL;

SYS_CONTEXT('USERENV','SESSION_DEFAULT_COLLATION')
--------------------------------------------------------------------------------
BINARY_CI

SQL>


DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(15 CHAR),
  location    VARCHAR2(15 CHAR),
  CONSTRAINT t1_pk PRIMARY KEY (id)
);


SELECT column_id, column_name, collation
FROM   user_tab_columns
WHERE  table_name = 'T1'
ORDER BY column_id;

 COLUMN_ID COLUMN_NAME			  COLLATION
---------- ------------------------------ --------------------
	 1 ID
	 2 COMPANY			  BINARY_CI
	 3 LOCATION			  BINARY_CI

SQL>


SELECT default_collation
FROM   user_tables
WHERE  table_name = 'T1';

DEFAULT_COLLATION
--------------------------------------------------------------------------------
BINARY_CI

SQL>

CONN test/test@pdb1
ALTER SESSION SET NLS_SORT=BINARY_CI;
ALTER SESSION SET NLS_COMP=LINGUISTIC;
DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(10),
  location    VARCHAR2(10),
  CONSTRAINT t1_pk PRIMARY KEY (id)
);


SELECT column_id, column_name, collation
FROM   user_tab_columns
WHERE  table_name = 'T1'
ORDER BY column_id;

 COLUMN_ID COLUMN_NAME			  COLLATION
---------- ------------------------------ --------------------
	 1 ID
	 2 COMPANY			  USING_NLS_COMP
	 3 LOCATION			  USING_NLS_COMP

SQL>


SELECT default_collation
FROM   user_tables
WHERE  table_name = 'T1';

DEFAULT_COLLATION
--------------------------------------------------------------------------------
USING_NLS_COMP

SQL>

Database-Level

If the and parameters are set at the CDB level they represent the default values for the CDB and all associated PDBs. If the parameters are set at the PDB level they override the CDB settings. Remember to reset these values if you don't want any unexpected behaviour in your instance.

Code/Command (click line numbers to comment):

CONN / AS SYSDBA

-- Alter the CDB defaults.
ALTER SYSTEM SET NLS_SORT=BINARY_CI SCOPE=SPFILE;
ALTER SYSTEM SET NLS_COMP=LINGUISTIC SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;


SHOW PARAMETER NLS_SORT

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_sort                             string      BINARY_CI
SQL>

SHOW PARAMETER NLS_COMP

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_comp                             string      LINGUISTIC
SQL>


-- Switch to the PDB.
ALTER SESSION SET CONTAINER = pdb1;

SHOW PARAMETER NLS_SORT

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_sort                             string      BINARY_CI
SQL>

SHOW PARAMETER NLS_COMP

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_comp                             string      LINGUISTIC
SQL>

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

-- Alter the PDB defaults.
ALTER SYSTEM SET NLS_SORT=BINARY SCOPE=SPFILE;
ALTER SYSTEM SET NLS_COMP=BINARY SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;


SHOW PARAMETER NLS_SORT

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_sort                             string      BINARY
SQL>

SHOW PARAMETER NLS_COMP

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_sort                             string      BINARY
SQL>

CONN / AS SYSDBA
ALTER SYSTEM RESET NLS_SORT SCOPE=SPFILE;
ALTER SYSTEM RESET NLS_COMP SCOPE=SPFILE;

ALTER SESSION SET CONTAINER = pdb1;

ALTER SYSTEM RESET NLS_SORT SCOPE=SPFILE;
ALTER SYSTEM RESET NLS_COMP SCOPE=SPFILE;

CONN / AS SYSDBA
SHUTDOWN IMMEDIATE;
STARTUP;

Statement-Level

There are a number of ways to influence the results of statements, regardless of the collation associated with the columns. The following examples show how you might use the operator and function. First we need a new session and a new table with no collation defined. Now we can try a few examples of using the operator and function.

Code/Command (click line numbers to comment):

CONN test/test@pdb1

DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(15 CHAR),
  location    VARCHAR2(15 CHAR),
  CONSTRAINT t1_pk PRIMARY KEY (id)
);

INSERT INTO t1 VALUES (1, 'LÃ¶wenbrauerei', 'BrÃ¤unlingen');
INSERT INTO t1 VALUES (2, 'LÃ–wenbrauerei', 'BrÃ„unlingen');
INSERT INTO t1 VALUES (3, 'Lowenbrauerei', 'Braunlingen');
INSERT INTO t1 VALUES (4, 'LOwenbrauerei', 'BrAunlingen');
COMMIT;

SELECT *
FROM   t1
ORDER BY company
COLLATE BINARY_CI
;

	ID COMPANY	   LOCATION
---------- --------------- ---------------
	 3 Lowenbrauerei   Braunlingen
	 4 LOwenbrauerei   BrAunlingen
         1 LÃ¶wenbrauerei   BrÃ¤unlingen
         2 LÃ–wenbrauerei   BrÃ„unlingen

SQL>


SELECT *
FROM   t1
ORDER BY
NLSSORT(company, 'NLS_SORT=BINARY_AI')
;

	ID COMPANY	   LOCATION
---------- --------------- ---------------
         1 LÃ¶wenbrauerei   BrÃ¤unlingen
	 4 LOwenbrauerei   BrAunlingen
	 3 Lowenbrauerei   Braunlingen
         2 LÃ–wenbrauerei   BrÃ„unlingen

SQL>


COLUMN sorted_name FORMAT A15

SELECT company
COLLATE BINARY_AI
AS sorted_name,
       COUNT(*) AS amount
FROM   t1
GROUP BY company
COLLATE BINARY_AI
ORDER BY 1;

SORTED_NAME	    AMOUNT
--------------- ----------
LÃ¶wenbrauerei            4

SQL>


COLUMN sorted_name FORMAT A15

SELECT
NLSSORT(company, 'NLS_SORT=BINARY_AI')
AS sorted_name,
       COUNT(*) AS amount
FROM   t1
GROUP BY
NLSSORT(company, 'NLS_SORT=BINARY_AI')
ORDER BY 1;

SORTED_NAME	    AMOUNT
--------------- ----------
6C6F77656E62726 	 4

SQL>


SELECT *
FROM   t1
WHERE SUBSTR(company,2,1)
COLLATE BINARY_AI
= 'Ã¶'
COLLATE BINARY_AI
ORDER BY company;

	ID COMPANY	   LOCATION
---------- --------------- ---------------
	 4 LOwenbrauerei   BrAunlingen
	 3 Lowenbrauerei   Braunlingen
         2 LÃ–wenbrauerei   BrÃ„unlingen
         1 LÃ¶wenbrauerei   BrÃ¤unlingen

SQL>


SELECT *
FROM   t1
WHERE
NLSSORT(SUBSTR(company,2,1), 'NLS_SORT=BINARY_AI') = NLSSORT('Ã¶', 'NLS_SORT=BINARY_AI')
ORDER BY company;

	ID COMPANY	   LOCATION
---------- --------------- ---------------
	 4 LOwenbrauerei   BrAunlingen
	 3 Lowenbrauerei   Braunlingen
         2 LÃ–wenbrauerei   BrÃ„unlingen
         1 LÃ¶wenbrauerei   BrÃ¤unlingen

SQL>

Linguistic Indexes

Any index created on a column using a named collation is actually a function-based index, but you can create additional indexes with alternative collations.

Code/Command (click line numbers to comment):

CREATE INDEX t1_company_1_idx ON t1(
NLSSORT(company, 'NLS_SORT=GERMAN_AI')
);
CREATE INDEX t1_company_2_idx ON t1(
NLSSORT(company, 'NLS_SORT=FRENCH_AI')
);

CREATE INDEX t1_location_idx ON t1(location
COLLATE BINARY_AI
);

Views and Materialized Views

Views and materialized views can take advantage of collation. The following example creates a clean table, a view that uses collation, then queries the view to shown the collation in action. For more information see: Hope this helps. Regards Tim...

Code/Command (click line numbers to comment):

CONN test/test@pdb1

DROP TABLE t1 PURGE;

CREATE TABLE t1 (
  id          NUMBER,
  company     VARCHAR2(15 CHAR),
  location    VARCHAR2(15 CHAR),
  CONSTRAINT t1_pk PRIMARY KEY (id)
);

INSERT INTO t1 VALUES (1, 'LÃ¶wenbrauerei', 'BrÃ¤unlingen');
INSERT INTO t1 VALUES (2, 'LÃ–wenbrauerei', 'BrÃ„unlingen');
INSERT INTO t1 VALUES (3, 'Lowenbrauerei', 'Braunlingen');
INSERT INTO t1 VALUES (4, 'LOwenbrauerei', 'BrAunlingen');
COMMIT;


-- Create a view using collation.
CREATE OR REPLACE VIEW t1_binary_ai_v (
  id,
  company,
  location
)
AS
SELECT id,
       company
COLLATE BINARY_AI
,
       location
COLLATE BINARY_AI
FROM   t1;


--Query using collation-based sort.
SELECT *
FROM   t1_binary_ai_v
ORDER BY company;

	ID COMPANY	   LOCATION
---------- --------------- ---------------
         1 LÃ¶wenbrauerei   BrÃ¤unlingen
	 4 LOwenbrauerei   BrAunlingen
	 3 Lowenbrauerei   Braunlingen
         2 LÃ–wenbrauerei   BrÃ„unlingen

SQL>


-- Query using collation-based group by.
SELECT company, COUNT(*) AS amount
FROM   t1_binary_ai_v
GROUP BY company
ORDER BY company;

COMPANY 	    AMOUNT
--------------- ----------
LÃ¶wenbrauerei            4

SQL>

Comments (0)

Please to add comments

No comments yet. Be the first to comment!