kdev-ruby/parser/README.md

# Parser

## Intro

This is a bison-generated pure LALR parser, based on the MRI parser. It has
been designed to be small and simple, yet powerful. Its code resides in four
files: parser.y, node.h, node.c and main.c. The main.c file exists only for
testing purposes (see the section "Testing and Debugging the Parser"). More
important are node.h and node.c, which define what a node is and what we (and
the parser) can do with nodes. Finally, parser.y is the grammar of the parser.

That is the parser in its most basic shape. To see what it really looks like,
three more files have to be generated: parser\_gen.h, parser\_gen.c and
hash.c. The parser\_gen.{h, c} files are generated by bison from the grammar
file; cmake runs this step. The last (but not least) file is hash.c, which is
generated by gperf from tools/gperf.txt by means of the tools/gperf.rb script.
It contains a hash table that the parser uses to match keywords quickly.

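To illustrate the keyword matching mentioned above, here is a minimal sketch
in C of a keyword lookup backed by a hash table. It is not the generated
hash.c: gperf computes a perfect hash function ahead of time, whereas this
sketch fills a small open-addressing table at run time using an FNV-1a hash.
The keyword subset and the names `hash_bytes`, `build_table` and `is_keyword`
are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of Ruby keywords; the real list is in tools/gperf.txt. */
static const char *const KEYWORDS[] = {
    "begin", "class", "def", "do", "else", "end", "if", "module", "while"
};

enum {
    NUM_KEYWORDS = sizeof(KEYWORDS) / sizeof(KEYWORDS[0]),
    TABLE_SIZE = 32 /* larger than NUM_KEYWORDS, so empty slots always exist */
};

static const char *table[TABLE_SIZE]; /* open-addressing table, NULL = empty */
static int table_ready = 0;

/* FNV-1a hash over the first len bytes of str. */
static uint32_t hash_bytes(const char *str, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char) str[i];
        h *= 16777619u;
    }
    return h;
}

/* Inserts every keyword, resolving collisions with linear probing. */
static void build_table(void)
{
    for (size_t k = 0; k < NUM_KEYWORDS; k++) {
        size_t slot = hash_bytes(KEYWORDS[k], strlen(KEYWORDS[k])) % TABLE_SIZE;
        while (table[slot] != NULL)
            slot = (slot + 1) % TABLE_SIZE;
        table[slot] = KEYWORDS[k];
    }
    table_ready = 1;
}

/* Returns 1 if the len-byte token starting at str is a keyword, 0 otherwise. */
int is_keyword(const char *str, size_t len)
{
    if (!table_ready)
        build_table();

    size_t slot = hash_bytes(str, len) % TABLE_SIZE;
    while (table[slot] != NULL) {
        if (strlen(table[slot]) == len && memcmp(table[slot], str, len) == 0)
            return 1;
        slot = (slot + 1) % TABLE_SIZE; /* probe the next slot */
    }
    return 0; /* reached an empty slot: not a keyword */
}
```

A lexer could call `is_keyword(token, length)` on each identifier it scans,
for example `is_keyword("def", 3)`, and treat the token as a plain identifier
when the lookup fails.
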
## Testing and Debugging the Parser

All the information on testing the parser can be found [on KDE TechBase](http://techbase.kde.org/Projects/KDevelop4/Ruby#Testing).

What do all those integers in the output represent? In short, they are
the representation of an AST printed in pre-order. As you will see, the
parser tries to beautify this output by telling you whether an expression is
a condition inside of, for example, a for statement, and it outputs "Root"
and "Next" when there is a list of inner statements. Moreover, the parser
sometimes outputs names between parentheses. Those names are variables, the
names of functions, classes, etc. Sadly, the output is sometimes scary and a
complete mess. In those cases, experience and patience will be our friends ;)

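As a rough illustration of what a pre-order dump looks like, the following C
sketch writes each node before its children and labels a statement list with
"Root" and "Next". The `struct node` layout, the "Condition", "Left" and
"Right" labels, and the function names are assumptions for the example; the
real definitions live in node.h and node.c, and the actual output format may
differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node layout; the real definition is in node.h. */
struct node {
    int kind;            /* integer identifying the node type */
    const char *name;    /* optional identifier, or NULL */
    struct node *cond;   /* optional condition, or NULL */
    struct node *l, *r;  /* left and right children, or NULL */
    struct node *next;   /* next statement in a list, or NULL */
};

/* Writes the subtree rooted at n into buf (capacity cap) starting at offset
 * pos, and returns the offset just past the last byte written. */
static size_t dump_node(const struct node *n, const char *label, int depth,
                        char *buf, size_t cap, size_t pos)
{
    int written;

    if (n == NULL || pos >= cap)
        return pos;

    /* Pre-order: emit the node itself first, indented by its depth. */
    if (n->name != NULL)
        written = snprintf(buf + pos, cap - pos, "%*s%s: %d (%s)\n",
                           depth * 2, "", label, n->kind, n->name);
    else
        written = snprintf(buf + pos, cap - pos, "%*s%s: %d\n",
                           depth * 2, "", label, n->kind);
    if (written < 0)
        return pos;
    if ((size_t) written >= cap - pos)
        return cap - 1; /* truncated; buf is still NUL-terminated */
    pos += (size_t) written;

    /* Then the children, and finally the next statement at the same depth. */
    pos = dump_node(n->cond, "Condition", depth + 1, buf, cap, pos);
    pos = dump_node(n->l, "Left", depth + 1, buf, cap, pos);
    pos = dump_node(n->r, "Right", depth + 1, buf, cap, pos);
    return dump_node(n->next, "Next", depth, buf, cap, pos);
}

/* Dumps a whole statement list into buf and returns the number of bytes
 * written, not counting the terminating NUL. */
size_t dump_ast(const struct node *root, char *buf, size_t cap)
{
    if (cap > 0)
        buf[0] = '\0';
    return dump_node(root, "Root", 0, buf, cap, 0);
}
```

For a node of kind 1 named foo that has a condition of kind 2 and is followed
by a node of kind 3 named bar, `dump_ast` would produce three lines:
`Root: 1 (foo)`, an indented `Condition: 2`, and `Next: 3 (bar)`.
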
## Character encodings

As stated before, this parser is meant to be simple and small. This means
that, for now, only the UTF-8 encoding is supported. This doesn't mean that
other encodings will never be supported by this parser; it's just that
the developers haven't had enough time to write the code.