Warning, /office/calligra/sheets/doc/FormulaEngine.dox is written in an unsupported language. File is not indexed.
0001 /** 0002 \page formulaengine Formula Engine 0003 \author Ariya Hidayat (<a href="mailto:ariya@kde.org">ariya@kde.org</a>) 0004 \date 2004 0005 0006 \par Status: 0007 FINISHED; NEEDS UPDATE (inline arrays) 0008 0009 <p>This formula engine is just an expression evaluator. To offer better 0010 performance, the expression is first compiled into byte codes which will 0011 be executed later by a virtual machine.</p> 0012 0013 <p>Before compilation, the expression is separated into pieces, called tokens. 0014 This step, which is also known as lexical analysis, takes places at once 0015 and will produce sequence of tokens. They are however not stored and used only 0016 for the purpose of the subsequent step. 0017 Tokens are supplied to the parser, also known as syntax analyzer. In this 0018 design, the parser is also a code generator. It involve the generation 0019 of byte codes which represents the expression. 0020 Evaluating the formula expression is now basically running the virtual 0021 machine to execute compiled byte codes. No more scanning or parsing 0022 are performed during evaluation, this saves time a lot.</p> 0023 0024 <p>The virtual machine itself (and of course the byte codes) are designed to be 0025 as simple as possible. This is supposed to be stack-based, i.e. the virtual 0026 machine has an execution stack of values which would be manipulated 0027 as each byte code is processed. Beside the stack, there will be a list of 0028 constant (sometimes also called as "constants pool") to hold Boolean, 0029 integer, floating-point or string values. When a certain byte code needs 0030 a constant for the operand, an index is specified which should be used 0031 to look up the constant in the constants pool.</p> 0032 0033 <p>There are only few byte code, sufficient enough to perform calculation. 0034 Yes, this is really minimalist but yet does the job fairly well. 0035 The following provides brief description for each type of bytecode.</p> 0036 0037 <div style="margin-left: 4em"> 0038 0039 <p><i>Nop</i> means no operation.</p> 0040 0041 <p><i>Load</i> means loads a constant and push it to the stack. The constant can 0042 be found at constant pools, at position by 'index', it could be a Boolean, 0043 integer, floating-point or string value.</p> 0044 0045 <p><i>Ref</i> means gets a value from a reference. Member variable 'index' will 0046 refers to a string value in the constant pools, i.e. the name of the reference. 0047 Typically the reference is either a cell (e.g. A1), range of cells (A1:B10) 0048 or possibly function name. Example: expression A2+B2 will be compiled as:<br/> 0049 Constants:<br/> 0050 #0: "A2"<br/> 0051 #1: "B2"<br/> 0052 Codes:<br/> 0053 Ref #0<br/> 0054 Ref #1<br/> 0055 Add 0056 </p> 0057 0058 <p><i>Function</i>. 0059 Example: expression "sin(x)" will be compiled as:<br/> 0060 Constants:<br/> 0061 #0: "sin"<br/> 0062 #1: "x"<br/> 0063 Codes:<br/> 0064 Ref #0<br/> 0065 Ref #1<br/> 0066 Function 1 0067 </p> 0068 0069 <p><i>Neg</i> is a unary operator, a value is popped from stack and negated and then 0070 pushed back to the stack. If it is not number (Boolean or string), it 0071 will be converted first.</p> 0072 0073 <p><i>Add, Sub, Mul, Div and Pow</i> are binary operators, two values are popped from 0074 stack and processed (added, subtracted, multiplied, divided, or power) and 0075 the result is pushed to the stack.</p> 0076 0077 <p><i>Concat</i> is string operation, two values are popped from stack (and converted 0078 to string if they are not string values), concatenated, and the result is 0079 pushed to the stack.</p> 0080 0081 <p><i>Not</i> is a logical operation, a value is popped from stack and its Boolean 0082 not is pushed into the stack. When it is not Boolean value, there will be 0083 a cast.</p> 0084 0085 <p><i>Equal, Less, and Greater</i> are comparison operators, two values are 0086 popped from stack and compared appropriately. The result, which is a Boolean 0087 value, is pushed into the stack. To simplify, there no "not equal" 0088 comparison because it can be regarded as "equal" followed by 0089 "not" byte codes. Same goes for "less than or equal to" and 0090 "greater than or equal to".</p> 0091 0092 </div> 0093 0094 <p>The expression scanner is based on finite state acceptor. The state denotes 0095 the position of cursor, e.g. inside a cell token, inside an identifier, etc. 0096 State transition is following by emitting the associated token to the 0097 result buffer. Rather than showing the state diagrams here, it is much more 0098 convenience and less complicated to browse the scanner source code and try 0099 to follow its algorithm from there.</p> 0100 0101 <p>The parser is designed using one of bottom-up parsing technique, namely 0102 based on Polish notation. Instead of ordering the tokens in suffix Polish 0103 form, the parser (which is also the code generator) simply outputs 0104 byte codes. In its operation, the parser requires the knowledge of operator 0105 precedence to correctly translate unparenthesized infix expression and 0106 thus requires the use of a syntax stack.</p> 0107 0108 <p>The parser algorithm is given as follows:</p> 0109 0110 <div style="margin-left: 4em"> 0111 Repeat the following steps:<br/> 0112 Step 1: Get next token<br/> 0113 Step 2: If it is an identifier<br/> 0114 - push it to syntax stack<br/> 0115 - generated "Ref"<br/> 0116 Step 3: If it is a Boolean, integer, float or string value<br/> 0117 - push it to syntax stack<br/> 0118 - generated "Load"<br/> 0119 Step 4: If it is an operator<br/> 0120 - check for reduce rules<br/> 0121 - when no more rules applies, push token to the syntax stack<br/> 0122 </div> 0123 0124 <p>The reduce rules are:</p> 0125 0126 <p>Rule A: <i>function argument</i>: 0127 if token is semicolon or right parenthesis, 0128 if syntax stack looks as: 0129 <ul type="square"> 0130 <li>non-operator <--- top</li> 0131 <li>operator ;</li> 0132 <li>non-operator</li> 0133 <li>operator (</li> 0134 <li>identifier</li> 0135 </ul> 0136 then reduce to 0137 <ul type="circle"> 0138 <li>non operator</li> 0139 <li>operator (</li> 0140 <li>identifier</li> 0141 <li>increase number of function arguments</li> 0142 </ul> 0143 </p> 0144 0145 <p>Rule B: last function argument<br> 0146 if syntax stack looks as:<br> 0147 <ul type="square"> 0148 <li>operator )</li> 0149 <li>non-operator</li> 0150 <li>operator (</li> 0151 <li>identifier</li> 0152 </ul> 0153 then reduce to:<br> 0154 <ul type="circle"> 0155 <li>non-operator</li> 0156 <li>generated "Function" + number of function arguments</li> 0157 </ul> 0158 </p> 0159 0160 <p>Rule C: function without argument<br> 0161 if syntax stack looks as:<br> 0162 <ul type="square"> 0163 <li>operator )</li> 0164 <li>operator (</li> 0165 <li>identifier</li> 0166 </ul> 0167 then reduce to:<br> 0168 <ul type="circle"> 0169 <li>non-operator (dummy)</li> 0170 </ul> 0171 </p> 0172 0173 <p>Rule D: parenthesis removal<br> 0174 if syntax stack looks as:<br> 0175 <ul type="square"> 0176 <li>operator (</li> 0177 <li>non-operator</li> 0178 <li>operator )</li> 0179 </ul> 0180 then reduce to:<br> 0181 <ul type="circle"> 0182 <li>non-operator</li> 0183 </ul> 0184 </p> 0185 0186 <p>Rule E: binary operator<br> 0187 if syntax stack looks as:<br> 0188 <ul type="square"> 0189 <li>non-operator</li> 0190 <li>binary operator</li> 0191 <li>non-operator</li> 0192 <li>and if the precedence of the binary operator in the syntax stack 0193 is greater or equals to the precedence of token</li> 0194 </ul> 0195 then reduce to:<br> 0196 <ul type="circle"> 0197 <li>non-operator</li> 0198 <li>and generated appropriate byte code for the binary operator</li> 0199 </ul> 0200 </p> 0201 0202 <p>Rule F: unary operator<br> 0203 if syntax stack looks as:<br> 0204 <ul type="square"> 0205 <li>non-operator</li> 0206 <li>unary operator</li> 0207 <li>operator</li> 0208 </ul> 0209 then reduce to:<br> 0210 <ul type="circle"> 0211 <li>operator</li> 0212 <li>and generated "Neg" if unary operator is '-'</li> 0213 </ul> 0214 </p> 0215 0216 <p>Percent operator is a special case and not handled the above mentioned rule. 0217 When the parser finds the percent operator, it checks whether there's a non-operator 0218 token right before the percent. If yes, then the following code is generated: 0219 <tt>load 0.01</tt> followed by <tt>multiply</tt>.</p> 0220