language/duchain/Mainpage.dox

0001 /*!
0002  * @mainpage Definition-Use Chain and Type System
0003  *
0004  * Overview | \ref duchain-design "Design" | \ref Implementing "Implementing" | \ref Using "Using"
0005  *
0006  * The definition-use chain and type system provide a language-neutral
0007  * representation of source code structure, used to provide language-based
0008  * features to all implemented languages in a generic manner.
0009  *
0010  * An introduction to the duchain can be found in the \ref duchain-design document.
0011  *
0012  * Details about how to provide a duchain and type system for your favourite
0013  * language can be found here: \ref Implementing.
0014  *
0015  * @licenses
0016  * @lgpl
0017  *
0018  * For questions and discussions about editor either contact the author
0019  * or the <a href="mailto:kdevelop-devel@kde.org">kdevelop-devel@kde.org</a>
0020  * mailing list.
0021  */
0022
0023 /** \page duchain-design Definition-Use Chain Design
0024
0025  * @ref index "Overview" | Design | \ref Implementing "Implementing" | \ref Using "Using"
0026
0027 \section overview Overview
0028
0029 The duchain is a sequence of contexts in a code file, and the associated definitions which occur in those contexts.  A simplified way of thinking about it is that for each set of brackets (curly {} or not ()), there is a separate context.  Each context is represented by a \ref KDevelop::DUContext.  Each context will have one parent context (except in the case of the top level context which has none), and any number of child contexts (including none).  Additionally, each context can import any number of other contexts.  The reason for this will become clear later.  Thus, the \ref KDevelop::DUContext structure resembles a directed acyclic graph, for those familiar with the concept.
0030
0031
0032 \section parsing Parsing
0033
0034 These \ref KDevelop::DUContext "DUContexts" are created on the first pass after parsing the code to an AST (abstract syntax tree).  Also, in this stage the data types are parsed, and any declarations which are encountered are recorded against the context in which they are encountered in.  Each declaration is represented by a Declaration.
0035
0036 Parsing code is arranged into builder classes, which subclass the AST visitor pattern.  They are designed to be able to subclass each other, thus achieving multiple goals with each pass (as described in the above paragraph).
0037
0038 For most languages, the first pass is accomplished by the \ref KDevelop::AbstractContextBuilder "AbstractContextBuilder", \ref KDevelop::AbstractTypeBuilder "AbstractTypeBuilder", and \ref KDevelop::AbstractDeclarationBuilder "AbstractDeclarationBuilder".  The customised builder class is a subclass of each of these classes. Thus, in the first pass, the \ref KDevelop::AbstractContextBuilder "AbstractContextBuilder" creates the \ref KDevelop::DUContext "DUContext" tree, the \ref KDevelop::AbstractTypeBuilder "AbstractTypeBuilder" records which \ref KDevelop::AbstractType "types" are encountered, and the \ref KDevelop::AbstractDeclarationBuilder "AbstractDeclarationBuilder" creates \ref KDevelop::Declaration "Declaration" instances which are associated with the current type and context.
0039
0040 The second pass is the creation of uses, accomplished a subclass of both the \ref KDevelop::AbstractContextBuilder and the \ref KDevelop::AbstractUseBuilder.  On the second pass, we only iterate previously parsed contexts (as they are already created).  Then, as variable uses are encountered, a \ref KDevelop::Use is created for each.  A \ref KDevelop::Declaration is searched for in the current context, and if one is found, they are associated with each other.
0041
0042
0043 \section classes Classes and their purposes
0044
0045 \li \ref KDevelop::DUChain - a global object which keeps track of all loaded source files and the top level context of their definition-use chains.
0046
0047 \li \ref KDevelop::DUContext - an object which represents a single context in a source file, and stores information about parent and child \ref KDevelop::DUContext "DUContexts", and \ref KDevelop::Declaration "Declarations", \ref KDevelop::Definition "Definitions" and \ref KDevelop::Use "Uses" which occur in them.  Also provides convenience methods for searching the chain.
0048
0049 \li \ref KDevelop::Declaration - an object which represents a single declaration.  Has several subclasses which store more information specific to the type of declaration which is being represented.
0050
0051 \li \ref KDevelop::Use - an object which represents a use of a particular declaration.
0052
0053 \li \ref KDevelop::PersistentSymbolTable - a hash which stores identifiers available in the top level context of a source file and their respective \ref KDevelop::Declaration "Declarations".
0054
0055 \li KDevelop::*Builder - objects whose purpose is to iterate the parsed AST and produce instances of the duchain objects.
0056
0057 \li \ref KDevelop::AbstractType - the base class for types.
0058
0059
0060 \section searching-duchain Definition-use chain searching
0061
0062 Because iterating a complete definition-use chain can become expensive when they are large, when a search is being performed (eg. for a declaration corresponding to a certain identifier) it is first performed up to the top level context, then the symbol table is consulted.  The symbol table is a hash of all identifiers which are known to the entire duchain.  All potential matches are evaluated to see if they are visible from the location of the use.
0063
0064
0065 \section locking Locking
0066
0067 The duchain is designed to operate in a multithreaded environment.  This means that multiple parse jobs may be operating simultaneously, reading from and writing to the duchain.  Thus, locking is required.
0068
0069 A single read-write lock is used to serialise writes to the chain and allow concurrent reads.  Thus, to call non-const methods, you must hold a write lock, and for const methods, a read lock.  Customised read/write lockers have been created, called DUChainWriteLocker and DUChainReadLocker.  You must not request a write lock while holding a read lock, or you could cause a deadlock.
0070
0071 Also, when manipulating text editor ranges, the \ref KTextEditor::SmartInterface must be locked. \warning You must <em>never</em> attempt to acquire the duchain read or write lock when holding the smart lock, else you may cause a deadlock. See code in \ref KDevelop::AbstractContextBuilder::openContextInternal and \ref KDevelop::DUChainBase.
0072
0073
0074 \section plugin-interface Interface for plugins
0075
0076 As plugins will be accessing the \ref KDevelop::DUChain from the main thread, they will need to hold a read lock.
0077
0078
0079 \section text-editor-integration Text editor integration
0080
0081 The main classes are subclasses of a base class, \ref KDevelop::DUChainBase.  This object holds a reference to the text range.  When the source file is opened in an editor, the \ref KDevelop::EditorIntegrator will create smart text ranges, which are bound to the editor's copy of the document.  From there, highlighting can be applied to these ranges, as well as other advanced functions (see the \ref KTextEditor documentation for possibilities).  The language support will convert these ranges to smart ranges when the corresponding document is loaded into an editor.
0082
0083
0084 \section future Future features - ideas
0085
0086 The completed duchain should allow for code refactoring, intelligent navigation, improved automatic code generation (eg. "create switch statement"), context-sensitive code completion, integration of documentation, debugger integration, a code structure view, call graph, static code analysis etc.
0087
0088 */
0089
0090 /**
0091  * \page Implementing Implementing Definition-Use Chains for a specific language
0092  *
0093  * \ref index "Overview" | \ref duchain-design "Design" | Implementing | \ref Using "Using"
0094  *
0095  * \section create Creating the Definition-Use Chain
0096  *
0097  * To create a definition-use chain for a programming language, you need the following:
0098  * \li a parser for the language,
0099  * \li a context builder,
0100  * \li a type builder,
0101  * \li a declaration builder,
0102  * \li and a use builder.
0103  *
0104  * Once you have everything up to the declaration builder, your language's classes, functions etc.
0105  * will automatically appear in the class browser, and be able to perform limited refactoring.
0106  *
0107  * Once you have the use builder, you will automatically have full support for context browsing.
0108  *
0109  * Code completion support requires further work specific to your language, see \ref cc
0110  *
0111  * \subsection parser Parser
0112  * Parsers in %KDevelop can be created in any way as long as they produce an AST (abstract
0113  * syntax tree).  Most supported languages have parsers generated by kdevelop-pg-qt.
0114  * This is a LL parser generator, and allows you to specify the grammar from which the
0115  * parser and AST are generated.  The parser will also need a lexer, common solutions are to
0116  * use flex to create one for you, or to create one by hand.
0117  *
0118  * \subsection builders DUChain Builders
0119  *
0120  * The abstract builder classes (detailed below) provide convenience functions for creating a
0121  * definition-use chain.  They are template classes which require 2 or 3 class
0122  * types:
0123  * - T: your base AST node type
0124  * - NameT: your identifier AST node type, if you have only one, or your base AST node type
0125  *          if more than one exist
0126  * - Base class: your base class, eg. for your use builder, you will usually supply your custom
0127  *               context builder here.
0128  *
0129  * \subsection context Context Builder
0130  * By subclassing \ref KDevelop::AbstractContextBuilder "AbstractContextBuilder", you will have everything you need to
0131  * keep track of contexts as you iterate the AST.  When a new context is encountered, such
0132  * as a new block (eg. between {} brackets), create a new context with KDevelop::AbstractContextBuilder::openContext(),
0133  * and close it with KDevelop::AbstractContextBuilder::closeContext().
0134  *
0135  * Some languages do not need a context to be created for each block, for example languages
0136  * where declarations are visible after the block in which they were defined (eg. php).
0137  *
0138  * \subsection type Type Builder
0139  * By subclassing \ref KDevelop::AbstractTypeBuilder "AbstractTypeBuilder", you can create types
0140  * when one is encountered in your AST by calling openType().  Again, you need to closeType()
0141  * when the type is exited.  Complex types are built up this way by creating the type at each node,
0142  * ie. with int[], first an array type is opened, then an integral type representing an integer
0143  * is opened and closed, then when the array type is closed, you can retrieve the lastType()
0144  * and set that as the type which is being made into an array.
0145  *
0146  * \subsection declaration Declaration Builder
0147  * By subclassing \ref KDevelop::AbstractDeclarationBuilder "AbstractDeclarationBuilder", you can create
0148  * declarations when they are encountered in your AST.  Usually you will assign the lastType()
0149  * or currentType() to them within closeDeclaration().
0150  *
0151  * \subsection use Use Builder
0152  * By subclassing \ref KDevelop::AbstractUseBuilder "AbstractUseBuilder", you can create uses when they are encountered
0153  * in your AST, and they will be automatically registered with the current context.
0154  *
0155  * \subsection expr Expression Visitor
0156  * By subclassing \ref KDevelop::AbstractExpressionVisitor "AbstractExpressionVisitor", you can provide a convenient
0157  * way to request the type and/or instance information for a given expression.  This is useful in both
0158  * figuring out what code completion to offer when it is invoked after or inside an expression, and to
0159  * allow complete use building, ie. for uses within expressions.  In c++, this is a very complex matter,
0160  * involving type conversion and overload resolution, while still tracking template types.
0161  *
0162  * \section cc Implementing Code Completion
0163  *
0164  * To provide code completion for your language, you will need to implement the following:
0165  * \todo complete this section
0166  */
0167
0168 /**
0169  * \page Using Using already created Definition-Use Chains in plugins
0170  *
0171  * \ref index "Overview" | \ref duchain-design "Design" | \ref Implementing "Implementing" | Using
0172  *
0173  * \section intro Introduction
0174  * This section is designed for developers who want to use definition-use chains, for example to provide
0175  * code generation, refactoring, or other advanced language-specific functionality.  First some important
0176  * fundamentals of using the duchain classes will be covered.
0177  *
0178  * \subsection pointers Definition-use chain pointers and references
0179  * As the definition-chain is a dynamic entity, safe pointers (DU*Pointer) and indirect references (Indexed*)
0180  * are required to reference objects in a thread-safe way, and in a way that allows minimisation of memory use by saving
0181  * non-referenced chains to disk.  While you do not hold the KDevelop::DUChain::lock(),
0182  * these pointers and references should not be accessed, because the objects they will return may be
0183  * modified by other threads.
0184  *
0185  * The KDevelop::DUChain::lock() is a read-write lock, which means that if you don't intend to
0186  * change the chain (which you won't, unless you are a language plugin developer), you only need a read-lock.
0187  * This has the advantage of allowing multiple threads to safely read from the chain simultaneously.
0188  * The easiest way to acquire this lock is to use KDevelop::DUChainReadLocker:
0189  * \code
0190  *    KDevelop::TopDUContextPointer topContext;
0191  *
0192  *    // Retrieve the top context for myUrl (see explanation below)
0193  *    topContext = KDevelop::DUChainUtils::standardContextForUrl( myUrl );
0194  *
0195  *    // Lock the duchain for reading
0196  *    KDevelop::DUChainReadLocker readLock( KDevelop::DUChain::lock() );
0197  *
0198  *    // Check if the top context pointer is valid
0199  *    if ( topContext ) {
0200  *      ...
0201  *    }
0202  * \endcode
0203  * Before accessing the top context, this code will block until a read-only lock has been acquired.
0204  * It is then safe to access const functions of all duchain objects.  The lock will continue to be held
0205  * until readLock goes out of scope.
0206  *
0207  * \note It is safe to recursively acquire a read-lock (or a write-lock), but not safe to request a write lock
0208  * once a read lock is held (this may result in a deadlock).
0209  * \note You must not attempt to acquire the duchain lock when you already hold the smart lock (this may result in a deadlock).
0210  *
0211  * In debug builds, if you attempt to access something in the duchain which you do not hold the proper lock
0212  * for, you will encounter an assert (usually triggered by the ENSURE_CHAIN_READ_LOCKED or ENSURE_CHAIN_WRITE_LOCKED macros).
0213  *
0214  * For more information about duchain pointers, see KDevelop::DUChainPointer.
0215  *
0216  * \section searching-declarations Searching for declarations
0217  * If you have a qualified identifier and want to retrieve declaration(s) which share the
0218  * identifier, you can do this by using the symbol table (KDevelop::PersistentSymbolTable):
0219  *
0220  * \code
0221  *  KDevelop::DUChainReadLocker lock( KDevelop::DUChain::lock() );
0222  *  KDevelop::PersistentSymbolTable::Declarations declarations = KDevelop::PersistentSymbolTable::self().getDeclarations( qualifiedIdentifier );
0223  *
0224  *  for (PersistentSymbolTable::Declarations::Iterator it = declarations.iterator(); it; ++it) {
0225  *    KDevelop::Declaration* decl = it->declaration();
0226  *    // Use the declaration...
0227  *  }
0228  * \endcode
0229  *
0230  * \section accessing Accessing a definition-use chain
0231  * The first step in using a duchain is to retrieve the chain that you are interested in.
0232  * Presumably you will know the URL of the file for which you want to retrieve the chain.
0233  * Some languages (notably C and C++) can have several different chains for one file depending on
0234  * what the definitions of macros were when the files were parsed.  Because of this, the recommended
0235  * way to access the duchain for a document is via KDevelop::DUChainUtils.
0236  *
0237  * \subsection accessing-topcontext Accessing top level contexts
0238  * \todo include a note on how to request loading of contexts from disk, and requesting parsing of files which
0239  *  are not currently in the duchain.
0240  *
0241  * Top level contexts (TopDUContext) can be retrieved through KDevelop::DUChainUtils::standardContextForUrl().
0242  * This is the context which is presented to the user when the file is opened (for highlighting, completion etc.).
0243  * In case it is not the context which you are after, all contexts for a file can be retrieved via
0244  * KDevelop::DUChain::chainsForDocument().
0245  *
0246  * \subsection accessing-declaration Accessing declarations at a specific location
0247  * If you have a url and a cursor location, you can attempt to retrieve the declaration located at that position
0248  * with KDevelop::DUChainUtils::itemUnderCursor().
0249  *
0250  * \section navigating Navigating a definition-use chain
0251  * \subsection navigating-duobject All duchain objects
0252  * All duchain objects inherit from KDevelop::DUChainBase.  This is in turn a subclass of KDevelop::DocumentRangeObject.
0253  * Thus, you can retrieve the text range of every object via KDevelop::DocumentRangeObject::range().  If the document
0254  * is currently loaded in a text editor, it will likely have a smart range (KTextEditor::SmartRange), which tracks the position
0255  * of the range when the document is changed.  This can be accessed via KDevelop::DocumentRangeObject::smartRange().
0256  *
0257  * \subsection navigating-contexts Contexts
0258  * Now that you have a chain, you'll probably want to be able to navigate around it.  You can iterate contexts
0259  * by using KDevelop::DUContext::childContexts().  You can then retrieve from each
0260  * context a list of local declarations with KDevelop::DUContext::localDeclarations(), and a list of all
0261  * uses in the context with KDevelop::DUContext::uses().
0262  *
0263  * Imported contexts are usually contexts which have declarations which are visible in the current context.
0264  * For example:
0265  * \code
0266  *   for (int i = 0; i < count(); ++i) {
0267  *     qDebug() << i;
0268  *   }
0269  * \endcode
0270  * The code which contains the debug statement will import the for conditions context, which contains the declaration
0271  * of i.  Thus, i's declaration is visible to the debug statement.
0272  *
0273  * Usually, you will not have to worry about these details, as the search functions already take them into account.
0274  * If you want to find a declaration for a given identifier in a given context, you can use one of the
0275  * KDevelop::DUContext::findDeclarations() or KDevelop::DUContext::findLocalDeclarations() functions.
0276  *
0277  * \subsection navigating-declarations Declarations
0278  * Declarations always occur within a context, which can be accessed through KDevelop::Declaration::parentContext().
0279  * Some declarations (eg. namespaces, classes) create a new context which can then contain child declarations, eg.
0280  * variables and functions within a class.  For these declarations, the associated context which contains these
0281  * child declarations can be accessed through KDevelop::Declaration::internalContext(), if one exists.
0282  *
0283  * \subsection navigating-uses Uses
0284  * Uses are instances where a declaration is referenced in the code.  All uses for a declaration can be calculated
0285  * from the duchain, although this can potentially be a time-consuming task. KDevelop::Declaration::uses() will return
0286  * all uses for a declaration, and KDevelop::Declaration::smartUses() will return smart ranges which represent all uses
0287  * in the currently opened documents.
0288  *
0289  * \subsection navigating-types Types
0290  * Declarations may have a type, which can be retrieved through KDevelop::Declaration::abstractType().  Types can then
0291  * be visited using KDevelop::TypeVisitor, or manually with the corresponding calls in the type subclasses.  Types can be
0292  * compared for equality using KDevelop::AbstractType::equals().
0293  *
0294  * \section efficiency DUChain efficiency issues
0295  * It was confirmed during the implementation of the DUChain that there is too much information to store the duchain for
0296  * an entire project in memory at the same time (KDevPlatform itself was >1Gb).  Subsequently, saving the chains to disk
0297  * has been implemented.  Following are some of the ramifications of this design.
0298  *
0299  * \subsection referenced-topcontexts Top Context Referencing
0300  * In order to determine which chains can be unloaded from memory, a referenced pointer was introduced called
0301  * KDevelop::ReferencedTopDUContext.  If you are using duchain objects outside of a duchain lock, and you need them to
0302  * remain in memory, you should create a KDevelop::ReferencedTopDUContext for the top context of each of the chains you need.
0303  * This will ensure it is not unloaded.  However, do not use this excessively or %KDevelop will have the same problem
0304  * of using large amounts of memory.
0305  *
0306  * \code
0307  *    KDevelop::TopDUContextPointer topContext;
0308  *    KDevelop::ReferencedTopDUContext topReferenced
0309  *
0310  *    topContext = KDevelop::DUChainUtils::standardContextForUrl( myUrl );
0311  *    topReferenced =  KDevelop::DUChainUtils::standardContextForUrl( myOtherUrl );
0312  *
0313  *    // Both of these pointers may be valid here
0314  *
0315  *    // Sleep
0316  *    sleep(10);
0317  *
0318  *    // Lock the duchain for reading
0319  *    KDevelop::DUChainReadLocker readLock( KDevelop::DUChain::lock() );
0320  *
0321  *    // topContext may not be valid any more, because it may have been saved to disk and unloaded from memory.
0322  *    // topReferenced will still be valid if it was valid when it was retrieved (above).
0323  * \endcode
0324  *
0325  * \subsection code-model Code Model
0326  * In order to facilitate easy access to top level declarations, a list of top level declarations is available from
0327  * KDevelop::CodeModel.  For each parsed file, you can call KDevelop::CodeModel::items() to retrieve a list of declarations
0328  * and some basics about their type.  If you need any further information, the chain must be loaded from disk.
0329  * \code
0330  *   uint count;
0331  *   const CodeModelItem* items;
0332  *   IndexedString file = \<yourFile\>;
0333  *
0334  *   // Retrieve the items for the given file
0335  *   KDevelop::CodeModel::self().items(file, count, items);
0336  *
0337  *   for (int i = 0; i < count; ++i) {
0338  *     CodeModelItem* thisItem = items++;
0339  *
0340  *      // Use the item here.
0341  *      ...
0342  *   }
0343  * \endcode
0344  *
0345  * To access the declaration for each item, use KDevelop::PersistentSymbolTable::declarations().
0346  *
0347  * \section inadequate When the duchain doesn't contain all the information
0348  * If you need more information than is available in the duchain, you're most likely looking at using the AST generated by the
0349  * language support.  Note that this is obviously not language-independent, so it should be a last resort in cases where
0350  * the functionality being supplied is not language-specific.  If the duchain is missing some information that would make
0351  * sense to add, please raise it with the %KDevelop developers.
0352  *
0353  * \todo add mechanism to get at the AST
0354  * \todo keep the AST in memory for loaded files
0355  */
0356
0357
0358 // DOXYGEN_REFERENCES = language/editor
0359 // DOXYGEN_SET_WARN_LOGFILE=language/duchain/doxygen.log
0360 // DOXYGEN_SET_RECURSIVE=yes