kdb/docs/sqlite_issues.txt

0001 ---------------------------------------------------------
0002  SQLITE DRIVER IDEAS, ISSUES, PROPOSALS
0003  Copyright (C) 2003 Jarosław Staniek staniek at kde.org
0004  Started: 2003-07-09
0005 ---------------------------------------------------
0006
0007
0008 1. In most situations (especially on massive data operations) we do not want get types of the columns,
0009 so:
0010
0011 PRAGMA show_datatypes = OFF;
0012
0013
0014 2. SQLite automatically adds primary key to the table if there is no such key.
0015 Such pkey column is not visible for statemets like 'select * from table',
0016 'select oid,* from table' need to be executed to also get this special column.
0017
0018 See section '3.1 The ROWID of the most recent insert' of c_interface.html file.
0019
0020
0021 3. For smaller tables (how small? -- add configuration for this) sqlite_get_table() 'in memory'
0022 function could be used to speed up rows retrieving.
0023
0024
0025 4. Queries entered by user in the Query Designer should be checked for syntactically or logically validity and transformed to SQLite-compatible form befor execution. It is nonsense to ask SQLite engine if the given sql statement is valid, because then we wouldn't show too detailed error message to the user.
0026
0027
0028 5. SQLite not only doesn't handles column types but also doesn't checks value sizes, eg. it is possible to insert string of length 100 to the column of size 20.
0029 These checks should be made in KDb SQLite driver. In fact for each driver these checks could be made because user wants get a descriptive, localized, friendly message what's wrong. No single engine provides this of course. We need to store such a parameters like field size in project meta-data as sqlite doesn't stores that in any convenient way. It stores only 'CREATE TABLE' statement, as is.
0030
0031
0032 6. Possible storage methods for SQLite database embedded in KDb project:
0033         A. Single SQLite-compatible database file (let's name it: .sqlite file)
0034                 - Advantages: Best deal for bigger databases - no need for rewriting data form SQLite file to another,
0035                         fastest open and save times. DB data consumes disk space only once. Other applications that uses SQLite library could also make use of standard format of .sqlite file's contents. KDb project and data would be easily, defacto, separated, what is considered as good method in DB programming.
0036                 - Disadvantages: User (who may want to transfer a database) need to know that .kexi file doesn't stores his data but .sqlite is for that.
0037
0038         B. Single SQLite-compatible database file embedded inside a KDb project .kexi file.
0039                 SQLite requires an access to a file in its own (raw) format to be available somewhere in the path. If SQLite storing layer could be patched to adding an option for seek to given file position, sqlite data can be stored after KDb project data. When sqlite raw data file could be saved after a KDb project's data, rewriting the project contents should be performed (and this is done quite frequently). So, finally storing both files concatenated during normal operations is risky, costly and difficult to implement cleanly.
0040                 - Advantages: User do not need to know that there is sqlite used in KDb as embedded DB engine (and even if there is any sql engine). Transferring just one file between machines means successfully transferring data and project.
0041                 - Disadvantages: lack of everything described as advantages of A. method: difficult and costly open and save operations (unless SQLite storing layer could be patched).
0042
0043         Extensions and compilations of the both above methods:
0044                 - .sqlite files are really good compressable, so compress option can be added (if not for regular saving, then at least for "Email project & data" or 'Save As' actions. For these actions concatenating the sqlite data with KDb project's data would be another option convenient from user's point of view.
0045
0046         CURRENT IMPLEMENTATION: B way is selected with above extensions added to the TODO list.
0047
0048
0049 7. SQLite-builtin views are read-only. So the proposal is not to use them. Here is why:
0050         We want have rw queries in KDb if main table in a query is rw.
0051         <DEFINITION>: Main table T in a query Q is a table that is not at 'many' side of query relations.
0052         </DEFINITION>
0053         <Example>:
0054         table persons (name varchar, city integer);
0055         table cities (id integer primary key, name varchar);
0056
0057         DATA: [Jarek, 1]-------[1, Warsaw]
0058                               /
0059               [Jakub, 1]-----/
0060
0061         query: select * from persons, cities
0062         Now: 'cities' table is the main table (in other words it is MASTER table in this query).
0063         'cities' table is rw table in this query, while 'persons' table is read-only because it is at 'many' side
0064         in persons-cities relation. Modifying cities.id field, appropriate persons.city values in related
0065         records will be updated if there is cascade update enabled.
0066         </Example>
0067         IDEAS:
0068         A) Query result view (table view, forms, etc.) should allow editing fields from
0069                 main (master) table of this query, so every field object KDbField should have a method:
0070                 bool KDbField::isWritable() to allow GUI check if editing is allowed. Look that given field object
0071                 should be allocated for given query independently from the same field allocated for table schema.
0072                 The first field object can disallow editing while the latter can allow editing (because it is
0073                 component of regular table).
0074         B) Also add method for QString KDbField that returns i18n'd message about the reasons
0075                 of disallowing for editing given field in a context of given query.
0076
0077
0078 ----------------------------------------------------------------
0079 8. ERRORS Found
0080 8.1 select * from (select name from persons limit 1) limit 2
0081  -should return 1 row; returns 2
0082
0083 ----------------------------------------------------------------
0084
0085 HINTS:
0086
0087 PRAGMA table_info(table-name);
0088 For each column in the named table, invoke the callback function
0089 once with information about that column, including the
0090 column name, data type, whether or not the column can be NULL,
0091 and the default value for the column.
0092
0093
0094 ---------------------------------------------------------------
0095 OPTIMIZATION:
0096
0097 https://www.mail-archive.com/sqlite-users@sqlite.org/msg04117.html
0098
0099 From: D. Richard Hipp [mailto:[EMAIL PROTECTED]
0100 Sent: Friday, October 08, 2004 5:59 PM
0101 To: [EMAIL PROTECTED]
0102 Subject: Re: [sqlite] Questions about sqlite's join translation
0103
0104
0105 Keith Herold wrote:
0106 > The swiki says that making JOINs into a where clause is more
0107 > efficient, since sqlite translates the join condition into a where
0108 > clause.
0109
0110 When SQLite sees this:
0111
0112     SELECT * FROM a JOIN b ON a.x=b.y;
0113
0114 It translate it into the following before compiling it:
0115
0116     SELECT * FROM a, b WHERE a.x=b.y;
0117
0118 Neither form is more efficient that the other.  Both will generate
0119 identical code.  (There are subtle differences on an LEFT OUTER JOIN,
0120 but those details can be ignored when you are looking at things at a
0121 high level, as we are.)
0122
0123  > It also
0124 > says that you make queries more effiecient by minimizing the number of
0125
0126 > rows returned in the FROM clause as far to the left as possible in the
0127
0128 > join.  Does the latter matter if you are translating everything into a
0129
0130 > where  clause anyway?
0131 >
0132
0133 SQLite implements joins using nested loops with the outer
0134 loop formed by the first table in the join and the inner loop formed by
0135 the last table in the join.  So for the example above you would have:
0136
0137     For each row in a:
0138       For each row in b such that b.y=a.x:
0139         Return the row
0140
0141 If you reverse the order of the tables in the FROM clause like
0142 this:
0143
0144     SELECT * FROM b, a WHERE a.x=b.y;
0145
0146 You should get an equivalent result on output, but SQLite will implement
0147 the query differently.  Specifically it does this:
0148
0149     For each row in b:
0150       For each row in a such that a.x=b.y:
0151         Return the row
0152
0153 The trick is that you want to arrange the order of tables so that the
0154 "such that" clause on the inner loop is able to use an index to jump
0155 right to the appropriate row instead of having to do a full table scan.
0156 Suppose, for example, that you have an index on a(x) but not on b(y).
0157 Then if you do this:
0158
0159     SELECT * FROM a, b WHERE a.x=b.y;
0160
0161     For each row in a:
0162       For each row in b such that b.y=a.x:
0163         Return the row
0164
0165 For each row in a, you have to do a full scan of table b.  So the time
0166 complexity will be O(N^2).  But if you reverse the order of the tables
0167 in the FROM clause, like this:
0168
0169     SELECT * FROM b, a WHERE b.y=a.x;
0170
0171     For each row in b:
0172       For each row in a such that a.x=b.y
0173         Return the row
0174
0175 No the inner loop is able to use an index to jump directly to the rows
0176 in a that it needs and does not need to do a full scan of the table.
0177 The time complexity drops to O(NlogN).
0178
0179 So the rule should be:  For every table other than the first, make sure
0180 there is a term in the WHERE clause (or the ON or USING clause if that
0181 is your preference) that lets the search jump directly to the relavant
0182 rows in that table based on the results from tables to the left.
0183
0184 Other database engines with more complex query optimizers will typically
0185 attempt to reorder the tables in the FROM clause in order to give you
0186 the best result.  SQLite is more simple-minded - it codes whatever you
0187 tell it to code.
0188
0189 Before you ask, I'll point out that it makes no different whether you
0190 say "a.x=b.y" or "b.y=a.x".  They are equivalent.  All of the following
0191 generate the same code:
0192
0193       ON a.x=b.y
0194       ON b.y=a.x
0195       WHERE a.x=b.y
0196       WHERE b.y=a.x
0197
0198 ---------------------------------------------------------------