[This is a fork of SQLite's ext/icu/ extension.]

This directory contains source code for the SQLite "ICU" extension, an
integration of the "International Components for Unicode" library with
SQLite. Documentation follows.

    1. Features

        1.1  SQL Scalars upper() and lower()
        1.2  Unicode Aware LIKE Operator
        1.3  ICU Collation Sequences
        1.4  SQL REGEXP Operator

    2. Compilation and Usage

    3. Bugs, Problems and Security Issues

        3.1  The "case_sensitive_like" Pragma
        3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
        3.3  Collation Sequence Security Issue


1. FEATURES

  1.1  SQL Scalars upper() and lower()

  SQLite's built-in implementations of these two functions only
  provide case mapping for the 26 letters used in the English
  language. The ICU-based functions provided by this extension
  provide case mapping, where defined, for the full range of
  Unicode characters.

  ICU provides two types of case mapping, "general" case mapping and
  "language specific". Refer to ICU documentation for the differences
  between the two. Specifically:

      https://www.icu-project.org/userguide/caseMappings.html
      https://www.icu-project.org/userguide/posix.html#case_mappings

  To utilise "general" case mapping, the upper() or lower() scalar
  functions are invoked with one argument:

      upper('abc') -> 'ABC'
      lower('ABC') -> 'abc'

  To access ICU "language specific" case mapping, upper() or lower()
  should be invoked with two arguments. The second argument is the name
  of the locale to use.
  Passing an empty string ("") or SQL NULL value as the second
  argument is the same as invoking the one-argument version
  of upper() or lower():

      lower('I', 'en_us') -> 'i'
      lower('I', 'tr_tr') -> 'ı' (small dotless i)

  1.2  Unicode Aware LIKE Operator

  Similarly to the upper() and lower() functions, the built-in SQLite LIKE
  operator understands case equivalence only for the 26 letters of the
  English language alphabet. The implementation of LIKE included in this
  extension uses the ICU function u_foldCase() to provide case
  independent comparisons for the full range of Unicode characters.

  The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the
  dotless 'I' character used in the Turkish language is considered
  to be in the same equivalence class as the dotted 'I' character
  used by many languages (including English).

  1.3  ICU Collation Sequences

  A special SQL scalar function, icu_load_collation(), is provided that
  may be used to register ICU collation sequences with SQLite. It
  is always called with exactly two arguments: the ICU locale
  identifying the collation sequence to ICU, and the name of the
  SQLite collation sequence to create.
  For example, to create an SQLite collation sequence named "turkish"
  using Turkish language sorting rules, use the SQL statement:

      SELECT icu_load_collation('tr_TR', 'turkish');

  Or, for Australian English:

      SELECT icu_load_collation('en_AU', 'australian');

  The identifiers "turkish" and "australian" may then be used
  as collation sequence identifiers in SQL statements:

      CREATE TABLE aust_turkish_penpals(
        australian_penpal_name TEXT COLLATE australian,
        turkish_penpal_name    TEXT COLLATE turkish
      );

  1.4  SQL REGEXP Operator

  This extension provides an implementation of the SQL binary
  comparison operator "REGEXP", based on the regular expression functions
  provided by the ICU library. The syntax of the operator is as described
  in SQLite documentation:

      <string> REGEXP <re-pattern>

  This extension uses the ICU defaults for regular expression matching
  behavior. Specifically, this means that:

    * Matching is case-sensitive,
    * Regular expression comments are not allowed within patterns, and
    * The '^' and '$' characters match the beginning and end of the
      <string> argument, not the beginning and end of lines within
      the <string> argument.

  Even more specifically, the value passed to the "flags" parameter
  of ICU C function uregex_open() is 0.


2. COMPILATION AND USAGE

  The easiest way to compile and use the ICU extension is to build
  and use it as a dynamically loadable SQLite extension. To do this
  using gcc on *nix:

      gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so

  You may need to add "-I" flags so that gcc can find sqlite3ext.h
  and sqlite3.h. The resulting shared lib, libSqliteIcu.so, may be
  loaded into sqlite in the same way as any other dynamically loadable
  extension.
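
  As an illustration of loading the shared library from a host program,
  here is a minimal sketch using Python's standard sqlite3 module. It
  assumes the fork still builds as a loadable extension, that the Python
  build supports extension loading, and that the library sits at the
  hypothetical path "./libSqliteIcu.so". Whether SQLite's default
  entry-point lookup succeeds depends on the init symbol this fork
  exports, which is not stated here. The helper name
  load_icu_extension() is made up for this example.

```python
import sqlite3


def load_icu_extension(conn, path):
    """Try to load the ICU extension into conn; return True on success."""
    try:
        # Extension loading is disabled by default and must be enabled first.
        conn.enable_load_extension(True)
    except AttributeError:
        # This Python build was compiled without extension-loading support.
        return False
    try:
        conn.load_extension(path)
        return True
    except sqlite3.OperationalError:
        # Missing file, unresolved ICU symbols, or no matching init symbol.
        return False
    finally:
        # Switch loading back off so later SQL cannot load other libraries.
        conn.enable_load_extension(False)


conn = sqlite3.connect(":memory:")
# Hypothetical path: wherever the gcc command above wrote its output.
if load_icu_extension(conn, "./libSqliteIcu.so"):
    print(conn.execute("SELECT lower('I', 'tr_tr')").fetchone()[0])
else:
    print("ICU extension not loaded")
```

  Disabling extension loading again after the call keeps untrusted SQL
  from loading further libraries through the same connection.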


3. BUGS, PROBLEMS AND SECURITY ISSUES

  3.1  The "case_sensitive_like" Pragma

  This extension does not work well with the "case_sensitive_like"
  pragma. If this pragma is used before the ICU extension is loaded,
  then the pragma has no effect. If the pragma is used after the ICU
  extension is loaded, then SQLite ignores the ICU implementation and
  always uses the built-in LIKE operator.

  The ICU extension LIKE operator is always case insensitive.

  3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

  Passing very long patterns to the built-in SQLite LIKE operator can
  cause excessive CPU usage. To curb this problem, SQLite defines the
  SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
  pattern in bytes (irrespective of encoding). The default value is
  defined in internal header file "limits.h".

  The ICU extension LIKE implementation suffers from the same
  problem and uses the same solution. However, since the ICU extension
  code does not include the SQLite file "limits.h", modifying
  the default value therein does not affect the ICU extension.
  The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
  the ICU extension LIKE operator is 50000, defined in source
  file "icu.c".

  3.3  Collation Sequence Security Issue

  Internally, SQLite assumes that indices stored in database files
  are sorted according to the collation sequence indicated by the
  SQL schema. Changing the definition of a collation sequence after
  an index has been built is therefore equivalent to database
  corruption. The SQLite library is not very well tested under
  these conditions, and may contain potential buffer overruns
  or other programming errors that could be exploited by a malicious
  programmer.

  If the ICU extension is used in an environment where potentially
  malicious users may execute arbitrary SQL (e.g. Gears), they
  should be prevented from invoking the icu_load_collation() function,
  possibly using the authorisation callback (see
  sqlite3_set_authorizer()).
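
  One way the authorisation callback could be used for this is sketched
  below with Python's standard sqlite3 module. It is an illustration
  under assumptions, not part of this extension: the callback name
  deny_icu_load_collation() is made up, and a do-nothing stand-in
  function is registered under the name icu_load_collation so that the
  sketch runs without ICU installed.

```python
import sqlite3


def deny_icu_load_collation(action, arg1, arg2, db_name, source):
    """Authorizer callback: refuse any call to icu_load_collation()."""
    # For SQLITE_FUNCTION events the function name arrives in arg2.
    is_function_call = action == sqlite3.SQLITE_FUNCTION
    if is_function_call and (arg2 or "").lower() == "icu_load_collation":
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
# Stand-in so the sketch runs without ICU; the real function comes
# from the loaded extension.
conn.create_function("icu_load_collation", 2, lambda locale, name: None)
conn.set_authorizer(deny_icu_load_collation)

try:
    conn.execute("SELECT icu_load_collation('tr_TR', 'turkish')")
    blocked = False
except sqlite3.DatabaseError:
    # SQLite rejects the statement while preparing it.
    blocked = True
print("icu_load_collation blocked:", blocked)
```

  Trusted code can register the collations it needs before the
  authorizer is installed; statements that merely use an already
  registered collation do not call icu_load_collation() and are not
  affected by this check.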