[this is a fork of sqlite's ext/icu/ extension]

This directory contains source code for the SQLite "ICU" extension, an
integration of the "International Components for Unicode" library with
SQLite. Documentation follows.

    1. Features

        1.1  SQL Scalars upper() and lower()
        1.2  Unicode Aware LIKE Operator
        1.3  ICU Collation Sequences
        1.4  SQL REGEXP Operator

    2. Compilation and Usage

    3. Bugs, Problems and Security Issues

        3.1  The "case_sensitive_like" Pragma
        3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
        3.3  Collation Sequence Security Issue


1. FEATURES

  1.1  SQL Scalars upper() and lower()

    SQLite's built-in implementations of these two functions only
    provide case mapping for the 26 letters used in the English
    language. The ICU-based functions provided by this extension
    provide case mapping, where defined, for the full range of
    Unicode characters.

    ICU provides two types of case mapping, "general" case mapping and
    "language specific". Refer to ICU documentation for the differences
    between the two. Specifically:

       https://www.icu-project.org/userguide/caseMappings.html
       https://www.icu-project.org/userguide/posix.html#case_mappings

    To utilise "general" case mapping, the upper() or lower() scalar
    functions are invoked with one argument:

        upper('abc') -> 'ABC'
        lower('ABC') -> 'abc'

    To access ICU "language specific" case mapping, upper() or lower()
    should be invoked with two arguments. The second argument is the name
    of the locale to use. Passing an empty string ("") or SQL NULL value
    as the second argument is the same as invoking the one-argument
    version of upper() or lower():

        lower('I', 'en_us') -> 'i'
        lower('I', 'tr_tr') -> 'ı' (small dotless i)
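
    The calling convention above can be tried without ICU installed. The
    following is a minimal sketch using Python's standard sqlite3 module;
    lower_with_locale() is a hypothetical toy stand-in, not the
    extension's code, and it only knows the Turkish/Azerbaijani rule for
    the letter I. It illustrates how the one- and two-argument forms
    relate, including the empty-string and NULL cases.

```python
import sqlite3


def lower_with_locale(text, locale=None):
    """Toy stand-in for a locale-aware lower(); NOT the ICU implementation.

    Assumption: only the Turkic dotted/dotless I rule is modelled.
    """
    if text is None:
        return None
    language = (locale or "").lower().split("_")[0]
    if language in ("tr", "az"):
        # In Turkic locales 'I' lowercases to dotless 'ı' (U+0131)
        # and 'İ' (U+0130) lowercases to plain 'i'.
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()


conn = sqlite3.connect(":memory:")
# Register both arities, mirroring the one- and two-argument SQL forms.
conn.create_function("lower", 1, lower_with_locale)
conn.create_function("lower", 2, lower_with_locale)

row = conn.execute(
    "SELECT lower('I'), lower('I', 'en_us'), lower('I', 'tr_tr'),"
    " lower('I', ''), lower('I', NULL)"
).fetchone()
print(row)
```

    An empty or NULL locale falls through to the same code path as the
    one-argument call, which is the behaviour described above.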

  1.2  Unicode Aware LIKE Operator

    Like the built-in upper() and lower() functions, the built-in SQLite
    LIKE operator understands case equivalence only for the 26 letters of
    the English language alphabet. The implementation of LIKE included in
    this extension uses the ICU function u_foldCase() to provide case
    independent comparisons for the full range of Unicode characters.

    The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning that
    no Turkish-specific folding is applied: the dotless uppercase 'I'
    used in the Turkish language is considered to be in the same
    equivalence class as the dotted lowercase 'i' used by many languages
    (including English).
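
    The idea of replacing LIKE with a case-folding comparison can be
    sketched with Python's standard sqlite3 module. This is an
    illustration under stated assumptions, not the extension's code: the
    helper names are hypothetical, the pattern is translated to a regular
    expression for brevity, and Python's str.casefold() performs full
    case folding (for example 'ß' becomes 'ss'), whereas u_foldCase()
    folds one code point at a time, so results can differ for such
    characters. The ESCAPE clause (three-argument like()) is not handled.

```python
import re
import sqlite3


def like_pattern_to_regex(pattern):
    """Translate a LIKE pattern ('%' = any run, '_' = any one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def folded_like(pattern, text):
    """like(pattern, text): SQLite passes the pattern first, then the string."""
    if pattern is None or text is None:
        return None
    regex = like_pattern_to_regex(str(pattern).casefold())
    return 1 if regex.fullmatch(str(text).casefold()) else 0


conn = sqlite3.connect(":memory:")
# Overriding the two-argument like() changes what the LIKE operator does.
conn.create_function("like", 2, folded_like)

row = conn.execute(
    "SELECT '\u00c9COLE' LIKE '\u00e9c%', 'I' LIKE 'i', 'abc' LIKE 'a_d'"
).fetchone()
print(row)
```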

  1.3  ICU Collation Sequences

    A special SQL scalar function, icu_load_collation(), is provided that
    may be used to register ICU collation sequences with SQLite. It
    is always called with exactly two arguments: the ICU locale
    identifying the collation sequence to ICU, and the name of the
    SQLite collation sequence to create. For example, to create an
    SQLite collation sequence named "turkish" using Turkish language
    sorting rules, use the SQL statement:

        SELECT icu_load_collation('tr_TR', 'turkish');

    Or, for Australian English:

        SELECT icu_load_collation('en_AU', 'australian');

    The identifiers "turkish" and "australian" may then be used
    as collation sequence identifiers in SQL statements:

        CREATE TABLE aust_turkish_penpals(
          australian_penpal_name TEXT COLLATE australian,
          turkish_penpal_name    TEXT COLLATE turkish
        );
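
    To show what a named collation changes once it is registered, here
    is a small sketch using Python's standard sqlite3 module. The
    collation "folded" and the function folded_compare() are
    hypothetical; the comparison is plain case folding standing in for
    real ICU locale rules, which this sketch does not implement.

```python
import sqlite3


def folded_compare(a, b):
    """Three-way comparison on case-folded text: negative, zero or positive."""
    fa, fb = a.casefold(), b.casefold()
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")
# Register the collation before any schema refers to it by name.
conn.create_collation("folded", folded_compare)
conn.execute("CREATE TABLE penpals(name TEXT COLLATE folded)")
conn.executemany(
    "INSERT INTO penpals VALUES (?)",
    [("cherry",), ("Banana",), ("apple",)],
)

# The column's declared collation is used by default for ORDER BY.
folded_order = [r[0] for r in conn.execute(
    "SELECT name FROM penpals ORDER BY name")]
# An explicit COLLATE clause overrides it for one query.
binary_order = [r[0] for r in conn.execute(
    "SELECT name FROM penpals ORDER BY name COLLATE BINARY")]
print(folded_order, binary_order)
```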

  1.4  SQL REGEXP Operator

    This extension provides an implementation of the SQL binary
    comparison operator "REGEXP", based on the regular expression functions
    provided by the ICU library. The syntax of the operator is as described
    in SQLite documentation:

        <string> REGEXP <re-pattern>

    This extension uses the ICU defaults for regular expression matching
    behavior. Specifically, this means that:

        * Matching is case-sensitive,
        * Regular expression comments are not allowed within patterns, and
        * The '^' and '$' characters match the beginning and end of the
          <string> argument, not the beginning and end of lines within
          the <string> argument.

    Even more specifically, the value passed to the "flags" parameter
    of the ICU C function uregex_open() is 0.
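
    The three defaults listed above can be observed with a stand-in
    regexp() function. The sketch below uses Python's re module with
    flags=0 in place of ICU, so it is an analogy rather than the
    extension's behaviour. In particular it uses an unanchored search,
    which is an assumption; whether the extension requires the pattern
    to match the whole string should be checked against icu.c. The
    helper names are hypothetical.

```python
import re
import sqlite3


def regexp(pattern, text):
    """regexp(pattern, text): SQLite rewrites 'X REGEXP Y' as regexp(Y, X)."""
    if pattern is None or text is None:
        return None
    # flags=0: case-sensitive, no comment syntax, '^'/'$' are not per-line.
    return 1 if re.search(pattern, text, flags=0) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)


def matches(text, pattern):
    """Evaluate <text> REGEXP <pattern> through SQLite."""
    return conn.execute("SELECT ? REGEXP ?", (text, pattern)).fetchone()[0]


print(matches("abc", "^a.c$"))      # anchored match on the whole string
print(matches("ABC", "abc"))        # differs only in case
print(matches("x\nabc", "^abc"))    # 'abc' starts a line, not the string
```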


2. COMPILATION AND USAGE

  The easiest way to compile and use the ICU extension is to build
  and use it as a dynamically loadable SQLite extension. To do this
  using gcc on *nix:

    gcc -fPIC -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so

  You may need to add "-I" flags so that gcc can find sqlite3ext.h
  and sqlite3.h. On systems where ICU no longer ships icu-config,
  `pkg-config --cflags --libs icu-uc icu-i18n` can usually supply the
  equivalent flags. The resulting shared lib, libSqliteIcu.so, may be
  loaded into sqlite in the same way as any other dynamically loadable
  extension.
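
  As one example of loading the library, the sketch below uses Python's
  standard sqlite3 module. try_load_icu() is a hypothetical helper, and
  the default path assumes the library sits in the current directory.
  Some Python builds are compiled without extension loading, and
  depending on how the library was built an explicit entry-point name
  may be needed; neither case is handled beyond returning False.

```python
import sqlite3


def try_load_icu(conn, path="./libSqliteIcu.so"):
    """Try to load the ICU extension; return True on success, else False."""
    try:
        conn.enable_load_extension(True)
        try:
            conn.load_extension(path)
        finally:
            # Re-disable loading so later SQL cannot load arbitrary code.
            conn.enable_load_extension(False)
    except (AttributeError, sqlite3.Error):
        # AttributeError: this Python build lacks extension loading support.
        # sqlite3.Error: the file is missing or could not be initialised.
        return False
    return True


conn = sqlite3.connect(":memory:")
if try_load_icu(conn):
    print(conn.execute("SELECT lower('I', 'tr_tr')").fetchone()[0])
else:
    print("ICU extension not loaded; built-in functions remain in use")
```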


3. BUGS, PROBLEMS AND SECURITY ISSUES

  3.1  The "case_sensitive_like" Pragma

    This extension does not work well with the "case_sensitive_like"
    pragma. If this pragma is used before the ICU extension is loaded,
    then the pragma has no effect. If the pragma is used after the ICU
    extension is loaded, then SQLite ignores the ICU implementation and
    always uses the built-in LIKE operator.

    The ICU extension LIKE operator is always case insensitive.
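
    The interaction can be explored without ICU by registering a
    stand-in like() function. In the sketch below (Python's standard
    sqlite3 module), always_match() is a hypothetical placeholder for an
    extension-provided LIKE. The last query only reports which
    implementation answers after the pragma runs; the value is printed
    rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def like(text, pattern):
    """Evaluate <text> LIKE <pattern> through SQLite."""
    return conn.execute("SELECT ? LIKE ?", (text, pattern)).fetchone()[0]


# Built-in LIKE: ASCII case-insensitive until the pragma is switched on.
before = like("ABC", "abc")
conn.execute("PRAGMA case_sensitive_like = ON")
after = like("ABC", "abc")


def always_match(pattern, text):
    """Hypothetical stand-in for a replacement like(); matches everything."""
    return 1


conn.create_function("like", 2, always_match)
overridden = like("ABC", "xyz")      # answered by the stand-in

# Running the pragma after the override is the problem case described above.
conn.execute("PRAGMA case_sensitive_like = OFF")
restored = like("ABC", "xyz")        # 0 here means the built-in LIKE is back

print(before, after, overridden, restored)
```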

  3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

    Passing very long patterns to the built-in SQLite LIKE operator can
    cause excessive CPU usage. To curb this problem, SQLite defines the
    SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
    pattern in bytes (irrespective of encoding). The default value is
    defined in the internal header file "limits.h".

    The ICU extension LIKE implementation suffers from the same
    problem and uses the same solution. However, since the ICU extension
    code does not include the SQLite file "limits.h", modifying
    the default value therein does not affect the ICU extension.
    The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
    the ICU extension LIKE operator is 50000, defined in source
    file "icu.c".
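
    Because the limit is counted in bytes rather than characters, a
    pattern of non-ASCII text reaches it sooner than its character count
    suggests. The sketch below illustrates such a guard in Python;
    check_like_pattern() and its error message are hypothetical, and
    UTF-8 is assumed as the encoding being measured.

```python
MAX_LIKE_PATTERN_LENGTH = 50000  # mirrors the default value quoted above


def check_like_pattern(pattern, limit=MAX_LIKE_PATTERN_LENGTH):
    """Return the pattern unchanged, or raise if it exceeds the byte limit."""
    n_bytes = len(pattern.encode("utf-8"))  # assumption: UTF-8 byte length
    if n_bytes > limit:
        raise ValueError(
            "LIKE pattern too long: %d bytes (limit %d)" % (n_bytes, limit)
        )
    return pattern


check_like_pattern("a" * 50000)          # exactly at the limit: accepted
try:
    # 25001 two-byte characters are 50002 bytes, which is over the limit.
    check_like_pattern("\u00e9" * 25001)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```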

  3.3  Collation Sequence Security Issue

    Internally, SQLite assumes that indices stored in database files
    are sorted according to the collation sequence indicated by the
    SQL schema. Changing the definition of a collation sequence after
    an index has been built is therefore equivalent to database
    corruption. The SQLite library is not very well tested under
    these conditions, and may contain buffer overruns or other
    programming errors that could be exploited by a malicious
    programmer.

    If the ICU extension is used in an environment where potentially
    malicious users may execute arbitrary SQL (e.g. Google Gears), they
    should be prevented from invoking the icu_load_collation() function,
    possibly using the authorisation callback (see
    sqlite3_set_authorizer()).
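
    A minimal sketch of such an authorisation callback follows, written
    with Python's standard sqlite3 module. The names authorizer() and
    BLOCKED_FUNCTIONS are hypothetical, and a do-nothing stub is
    registered under the name icu_load_collation so that the sketch runs
    without the ICU extension; with the real extension loaded the stub
    would be omitted.

```python
import sqlite3

# Action code for "a SQL function is being used"; 31 is SQLite's value and
# serves as a fallback if this Python build does not export the constant.
SQLITE_FUNCTION = getattr(sqlite3, "SQLITE_FUNCTION", 31)
BLOCKED_FUNCTIONS = {"icu_load_collation"}


def authorizer(action, arg1, arg2, db_name, source):
    """Deny calls to blocked SQL functions; allow everything else."""
    # For SQLITE_FUNCTION, the function name arrives in the second argument.
    if action == SQLITE_FUNCTION and arg2 and arg2.lower() in BLOCKED_FUNCTIONS:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
# Hypothetical stub standing in for the real extension function.
conn.create_function("icu_load_collation", 2, lambda locale, name: None)
conn.set_authorizer(authorizer)

try:
    conn.execute("SELECT icu_load_collation('tr_TR', 'turkish')")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True
print(blocked)
```

    The callback is consulted while a statement is being prepared, so a
    denied call fails before any collation is registered. Other
    statements on the same connection are unaffected.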