<?xml version="1.0" ?>
<!DOCTYPE book PUBLIC "-//KDE//DTD DocBook XML V4.5-Based Variant V1.1//EN" "dtd/kdedbx45.dtd" [
  <!ENTITY kcachegrind '<application>KCachegrind</application>'>
  <!ENTITY cachegrind "<application>Cachegrind</application>">
  <!ENTITY calltree "<application>Calltree</application>">
  <!ENTITY callgrind "<application>Callgrind</application>">
  <!ENTITY valgrind "<application>Valgrind</application>">
  <!ENTITY oprofile "<application>OProfile</application>">
  <!ENTITY EBS "<acronym>EBS</acronym>">
  <!ENTITY TBS "<acronym>TBS</acronym>">
  <!ENTITY % addindex "IGNORE">
  <!ENTITY % English "INCLUDE">
]>

<book id="kcachegrind" lang="&language;">

<bookinfo>
<title>The &kcachegrind; Handbook</title>

<authorgroup>
<author>
<firstname>Josef</firstname>
<surname>Weidendorfer</surname>
<affiliation>
<address><email>Josef.Weidendorfer@gmx.de</email></address>
</affiliation>
<contrib>Original author of the documentation</contrib>
</author>

<author>
<firstname>Federico</firstname>
<surname>Zenith</surname>
<affiliation>
<address><email>federico.zenith@member.fsf.org</email></address>
</affiliation>
<contrib>Updates and corrections</contrib>
</author>

<!-- TRANS:ROLES_OF_TRANSLATORS -->

</authorgroup>

<copyright>
<year>2002-2004</year>
<holder>&Josef.Weidendorfer;</holder>
</copyright>
<copyright>
<year>2009</year>
<holder>Federico Zenith</holder>
</copyright>
<legalnotice>&FDLNotice;</legalnotice>

<date>2016-11-18</date>
<releaseinfo>0.8.0 (Applications 17.04)</releaseinfo>

<abstract>
<para>
&kcachegrind; is a profile data visualization tool, written using &kde-frameworks;.
</para>
</abstract>

<keywordset>
<keyword>KDE</keyword>
<keyword>kdesdk</keyword>
<keyword>Cachegrind</keyword>
<keyword>Callgrind</keyword>
<keyword>Valgrind</keyword>
<keyword>Profiling</keyword>
</keywordset>

</bookinfo>


<chapter id="introduction">
<title>Introduction</title>

<para>
&kcachegrind; is a browser for data produced by profiling tools.
This chapter explains what profiling is for, how it is done, and
gives some examples of profiling tools available.
</para>

<sect1 id="introduction-profiling">
<title>Profiling</title>

<para>
When developing a program, one of the last steps often involves performance
optimization. Because optimizing rarely used functions would be a waste of time,
one needs to know in which parts of a program most of the time is spent.
</para>

<para>
For sequential code, collecting statistical data on the program's runtime
characteristics, such as the time spent in functions and code lines, is usually
enough.
This is called Profiling. The program is run under the control of a profiling
tool, which gives a summary of the execution run at the end.
In contrast, for parallel code, performance problems are typically caused when
one processor is waiting for data from another. As this waiting time usually
cannot be easily attributed, it is better to generate timestamped event traces
here. &kcachegrind; cannot visualize this kind of data.
</para>

<para>
After analyzing the produced profile data, it should be easy to see the hot
spots and bottlenecks of the code: for example, assumptions about call counts
can be checked, and identified code regions can be optimized.
Afterwards, the success of the optimization should be verified with another
profile run.
</para>
</sect1>

<sect1 id="introduction-methods">
<title>Profiling Methods</title>

<para>To exactly measure the time passed or record the events happening during
the execution of a code region (&eg; a function), additional measurement code
needs to be inserted before and after the given region. This code reads the
time, or a global event count, and calculates differences. Thus, the original
code has to be changed before execution. This is called instrumentation.
Instrumentation can be done by the programmers themselves, the compiler, or the
runtime system. As interesting regions are usually nested, the overhead of
measurement always influences the measurement itself. Thus, instrumentation
should be done selectively and results have to be interpreted with care. Of
course, this makes performance analysis by exact measurement a very complex
process.</para>
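
<para>
To illustrate manual instrumentation, the following Python sketch inserts
measurement code around two nested regions. The helper
<function>measured</function> and the region names are hypothetical and not part
of any profiling tool. Because the <quote>inner</quote> regions are nested in the
<quote>outer</quote> one, the time charged to <quote>outer</quote> also contains
the measurement code of the inner regions, which is the perturbation described
above.
</para>

<programlisting language="python"><![CDATA[import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical instrumentation helper (not part of any profiling tool).
region_ns = defaultdict(int)      # region name -> accumulated nanoseconds
region_calls = defaultdict(int)   # region name -> number of executions

@contextmanager
def measured(name):
    """Measurement code executed before and after a code region."""
    start = time.perf_counter_ns()      # read the time before the region
    try:
        yield
    finally:
        # Charge the time difference to the region. Nested regions, and
        # their own measurement code, are included in this difference.
        region_ns[name] += time.perf_counter_ns() - start
        region_calls[name] += 1

def inner():
    return sum(i * i for i in range(10_000))

def outer():
    with measured("outer"):
        for _ in range(3):
            with measured("inner"):
                inner()

outer()
for name in ("outer", "inner"):
    print(f"{name}: {region_calls[name]} executions, {region_ns[name]} ns")
]]></programlisting>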

<para>Exact measurement is possible because of hardware counters (including
counters incrementing on a time tick) provided in modern processors, which are
incremented whenever an event happens. As we want to attribute events to code
regions, without the counters we would have to handle every event by
incrementing a counter for the current code region ourselves. Doing this in
software is, of course, not feasible; but, on the assumption that the event
distribution over source code is similar when looking only at every n-th event
instead of every event, a measurement method whose overhead is tunable has been
developed: it is called Sampling. Time Based Sampling (&TBS;) uses a timer to
regularly look at the program counter to create a histogram over the program
code. Event Based Sampling (&EBS;) exploits the hardware counters of modern
processors, and uses a mode where an interrupt handler is called on counter
underflow to generate a histogram of the corresponding event distribution:
in the handler, the event counter is always reinitialized to the
<symbol>n</symbol> of the sampling method. The advantage of sampling is that the
code does not have to be changed, but it is still a compromise: the above
assumption will be more correct if <symbol>n</symbol> is small, but the smaller
the <symbol>n</symbol>, the higher the overhead of the interrupt handler.</para>
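
<para>
The sampling mechanism can be simulated in a few lines of Python. This sketch
assumes, for simplicity, that the events arrive as a list labeled with the code
region in which each one occurred; real hardware instead charges the sample to
the instruction at the current program counter. The counter reaching zero stands
in for the underflow, and all names are illustrative.
</para>

<programlisting language="python"><![CDATA[from collections import Counter

def sample_events(event_regions, n):
    """Simulate event-based sampling with sampling period n.

    event_regions lists, in execution order, the code region in which each
    event occurred. Only every n-th event is recorded in the histogram.
    """
    histogram = Counter()
    counter = n                      # counter initialized to n
    for region in event_regions:
        counter -= 1                 # the hardware counts every event
        if counter == 0:             # "underflow": the interrupt handler runs,
            histogram[region] += 1   # charges the current region,
            counter = n              # and reinitializes the counter to n
    return histogram

# Hypothetical event stream: 100 events in "init", then "parse" and "loop"
# events interleaved in a 1:2 ratio.
events = ["init"] * 100 + ["parse", "loop", "loop"] * 300
sampled = sample_events(events, n=10)
# Multiplying by n estimates the exact counts given by Counter(events).
estimate = {region: count * 10 for region, count in sampled.items()}
print(sampled)
print(estimate)
]]></programlisting>

<para>
With <literal>n=1</literal>, every event is recorded, giving exact counts at the
highest overhead; larger values of <symbol>n</symbol> call the handler less
often, but the estimate then relies on the sampled events being representative.
</para>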

<para>Another measurement method is to simulate what happens in the computer
system when executing given code, &ie; execution-driven simulation. The
simulation is always derived from a more or less accurate machine model;
however, with very detailed machine models, giving very close approximations to
reality, the simulation time can be unacceptably high in practice.
The advantage of simulation is that arbitrarily complex measurement/simulation
code can be inserted into the given code without perturbing results. Doing this
directly before execution (called runtime instrumentation), using the original
binary, is very convenient for the user: no re-compilation is necessary.
Simulation becomes usable when simulating only parts of a machine with a simple
model; another advantage is that the results produced by simple models are often
easier to understand: often, the problem with real hardware is that results
include overlapping effects from different parts of the machine.</para>
</sect1>

<sect1 id="introduction-tools">
<title>Profiling Tools</title>

<para>
The best known is the GCC profiling tool <application>gprof</application>: one
needs to compile the program with the option <option>-pg</option>; running the
program generates a file <filename>gmon.out</filename>, which can be transformed
into human-readable form with <command>gprof</command>.
One disadvantage is the required re-compilation step to prepare the executable,
which has to be statically linked.
The method used here is compiler-generated instrumentation, which measures call
arcs happening among functions and corresponding call counts, in conjunction
with &TBS;, which gives a histogram of time distribution over the code. Using
both pieces of information, it is possible to heuristically calculate inclusive
time of functions, &ie; time spent in a function together with all functions
called from it.
</para>
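
<para>
The heuristic for inclusive time can be sketched in Python under simplifying
assumptions: the call graph contains no cycles (which
<application>gprof</application> treats specially), and the self times and call
counts below are invented. The inclusive time of a callee is divided among its
callers in proportion to their call counts, which is only accurate if every call
takes the same time.
</para>

<programlisting language="python"><![CDATA[from functools import lru_cache

# Hypothetical self times in seconds, as a TBS histogram might give them
self_time = {"main": 0.5, "parse": 2.0, "compute": 6.0, "helper": 4.0}

# Call arcs from compiler instrumentation: (caller, callee) -> call count
calls = {
    ("main", "parse"): 1,
    ("main", "compute"): 1,
    ("parse", "helper"): 1,
    ("compute", "helper"): 3,
}

def total_calls_to(callee):
    return sum(n for (_, c), n in calls.items() if c == callee)

@lru_cache(maxsize=None)
def inclusive_time(func):
    """Self time plus a share of each callee's inclusive time, where the
    share is proportional to the number of calls made by this caller.
    Only valid for acyclic call graphs."""
    total = self_time[func]
    for (caller, callee), n in calls.items():
        if caller == func:
            total += inclusive_time(callee) * n / total_calls_to(callee)
    return total

for f in self_time:
    print(f"{f:8} {inclusive_time(f):5.2f}")
]]></programlisting>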

<para>For exact measurement of events, there are libraries with functions that
read out hardware performance counters. The best known here are the PerfCtr
patch for &Linux;, and the architecture-independent libraries PAPI and PCL.
Still, exact measurement needs instrumentation of code, as stated above. Either
one uses the libraries directly or uses automatic instrumentation systems like
ADAPTOR (for FORTRAN source instrumentation) or DynaProf (code injection via
DynInst).</para>

<para>
&oprofile; is a system-wide profiling tool for &Linux; using Sampling.
</para>

<para>
In many respects, a convenient way of profiling is to use &cachegrind; or
&callgrind;, which are simulators using the runtime instrumentation framework
&valgrind;. Because there is no need to access hardware counters (often
difficult with today's &Linux; installations), and binaries to be profiled can
be left unmodified, they are a good alternative to other profiling tools.
The disadvantage of simulation, namely slowdown, can be reduced by doing the
simulation on only the interesting program parts, and perhaps only on a few
iterations of a loop. Without measurement/simulation instrumentation, running a
program under &valgrind; only slows it down by a factor of 3 to 5.
Also, when only the call graph and call counts are of interest, the cache
simulator can be switched off.
</para>

<para>
Cache simulation is the first step in approximating real times since, on modern
systems, runtime is very sensitive to the exploitation of so-called
<emphasis>caches</emphasis>: small and fast buffers which accelerate repeated
accesses to the same main memory cells.
&cachegrind; does cache simulation by catching memory accesses.
The data produced includes the number of instruction/data memory accesses and
first- and second-level cache misses, and relates them to source lines and
functions of the executed program.
By combining these miss counts, using miss latencies from typical processors,
an estimation of spent time can be given.
</para>

<para>
&callgrind; is an extension of &cachegrind; that builds up the call graph of a
program on-the-fly, &ie; how the functions call each other and how many events
happen while running a function. Also, the profile data to be collected can be
separated by threads and call chain contexts. It can provide profiling data on
an instruction level to allow for annotation of disassembled code.
</para>
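
<para>
The principle of building the call graph on-the-fly can be sketched with a
shadow call stack. The Python code below is a simplified model, not the
implementation of &callgrind;: it assumes a hypothetical trace of call, cost and
return records, whereas &callgrind; observes the instrumented machine code
directly. Exception handling or long jumps would additionally require popping
several frames at once.
</para>

<programlisting language="python"><![CDATA[from collections import defaultdict

def build_call_graph(trace, root="main"):
    """Derive self costs and call arcs from a trace of events.

    Records are ("call", name), ("cost", n) or ("ret",). A shadow stack of
    [function, inclusive cost] frames mirrors the program's call stack.
    """
    self_cost = defaultdict(int)
    arc_calls = defaultdict(int)   # (caller, callee) -> number of calls
    arc_cost = defaultdict(int)    # (caller, callee) -> inclusive cost
    stack = [[root, 0]]
    for record in trace:
        kind = record[0]
        if kind == "call":
            arc_calls[(stack[-1][0], record[1])] += 1
            stack.append([record[1], 0])
        elif kind == "cost":
            self_cost[stack[-1][0]] += record[1]
            stack[-1][1] += record[1]
        elif kind == "ret":
            callee, inclusive = stack.pop()
            arc_cost[(stack[-1][0], callee)] += inclusive
            stack[-1][1] += inclusive   # the caller includes the callee's cost
    return dict(self_cost), dict(arc_calls), dict(arc_cost)

trace = [
    ("cost", 5),                                  # in main
    ("call", "parse"), ("cost", 20),              # main -> parse
    ("call", "alloc"), ("cost", 3), ("ret",),     # parse -> alloc
    ("ret",),                                     # back in main
    ("call", "alloc"), ("cost", 4), ("ret",),     # main -> alloc
]
self_cost, arc_calls, arc_cost = build_call_graph(trace)
print(self_cost)
print(arc_calls)
print(arc_cost)
]]></programlisting>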
</sect1>

<sect1 id="introduction-visualization">
<title>Visualization</title>

<para>
Profiling tools typically produce a large amount of data. The wish to easily
browse down and up the call graph, together with fast switching of the sorting
mode of functions and display of different event types, motivates a &GUI;
application to accomplish this task.
</para>

<para>
&kcachegrind; is a visualization tool for profile data fulfilling these wishes.
Although it was originally written with browsing the data from &cachegrind; and
&calltree; in mind, converters are available to display profile data produced
by other tools. In the appendix, a description of the
&cachegrind;/&callgrind; file format is given.
</para>
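
<para>
To give an impression of the text-based file format, the following Python sketch
writes a tiny profile. The input dictionary layout and the function name
<function>to_callgrind</function> are invented for this example; the emitted
<literal>events:</literal>, <literal>fl=</literal>, <literal>fn=</literal>,
<literal>cfi=</literal>, <literal>cfn=</literal> and <literal>calls=</literal>
lines follow the structure of the format, where the line after each
<literal>calls=</literal> line gives the source line of the call and the
inclusive cost of those calls.
</para>

<programlisting language="python"><![CDATA[def to_callgrind(events, functions):
    """Serialize a tiny profile into the callgrind text format.

    functions maps a function name to a dict with the (hypothetical) keys
      "file", "line": source position of the function,
      "self":         self cost, one number per event type,
      "calls":        list of (callee, count, call_line, inclusive_costs).
    """
    lines = ["# callgrind format", "events: " + " ".join(events), ""]
    for name, f in functions.items():
        lines.append(f"fl={f['file']}")
        lines.append(f"fn={name}")
        lines.append(" ".join(str(v) for v in [f["line"], *f["self"]]))
        for callee, count, call_line, inclusive in f["calls"]:
            target = functions[callee]
            if target["file"] != f["file"]:
                lines.append(f"cfi={target['file']}")   # file of the callee
            lines.append(f"cfn={callee}")
            lines.append(f"calls={count} {target['line']}")
            # The line after calls= holds the inclusive cost of these calls.
            lines.append(" ".join(str(v) for v in [call_line, *inclusive]))
        lines.append("")
    return "\n".join(lines)

profile = {
    "main": {"file": "main.c", "line": 16, "self": [20],
             "calls": [("work", 2, 17, [400])]},
    "work": {"file": "work.c", "line": 3, "self": [400], "calls": []},
}
print(to_callgrind(["Ir"], profile))
]]></programlisting>

<para>
The inclusive cost of a call (400 in this example) already contains the cost
spent in the callee, so it is not part of the caller's self cost.
</para>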

<para>
Besides a list of functions sorted according to exclusive or inclusive cost
metrics, and optionally grouped by source file, shared library or C++ class,
&kcachegrind; features various views for a selected function, namely:
<itemizedlist>
<listitem><para>a call-graph view, which shows a section of the call graph
around the selected function,</para>
</listitem>
<listitem><para>a tree-map view, which allows nested-call relations to be
visualized, together with the inclusive cost metric for fast visual detection of
problematic functions,</para>
</listitem>
<listitem><para>source code and disassembler annotation views, showing
details of cost related to source lines and assembler instructions.</para>
</listitem>
</itemizedlist>

</para>
</sect1>
</chapter>

<chapter id="using-kcachegrind">
<title>Using &kcachegrind;</title>

<sect1 id="using-profile">
<title>Generate Data to Visualize</title>

<para>First, one wants to generate performance data by measuring aspects of the
runtime characteristics of an application, using a profiling tool. &kcachegrind;
itself does not include any profiling tool, but works well together with
&callgrind; and, by using a converter, can also be used to visualize data
produced with &oprofile;. Although the scope of this manual is not to document
profiling with these tools, the next section provides short quickstart tutorials
to get you started.
</para>

<sect2>
<title>&callgrind;</title>

<para>
&callgrind; is a part of <ulink url="http://valgrind.org">&valgrind;</ulink>.
Note that it previously was called &calltree;, but that name was misleading.
</para>

<para>
The most common use is to prefix the command line to start your application with
<userinput><command>valgrind</command> <option>--tool=callgrind</option>
</userinput>, as in:

<blockquote><para><userinput>
<command>valgrind</command> <option>--tool=callgrind</option>
<replaceable>myprogram</replaceable> <replaceable>myargs</replaceable>
</userinput></para></blockquote>

At program termination, a file
<filename>callgrind.out.<replaceable>pid</replaceable></filename> will be
generated, which can be loaded into &kcachegrind;.
</para>

<para>
A more advanced use is to dump out profile data whenever a given function of your
application is called. For example, for &konqueror;, to see profile data only for
the rendering of a Web page, you could decide to dump the data whenever you select
the menu item <menuchoice><guimenu>View</guimenu><guimenuitem>Reload
</guimenuitem></menuchoice>. This corresponds to a call to
<methodname>KonqMainWindow::slotReload</methodname>. Use:

<blockquote><para><userinput>
<command>valgrind</command> <option>--tool=callgrind</option>
<option>--dump-before=KonqMainWindow::slotReload</option>
<replaceable>konqueror</replaceable>
</userinput></para></blockquote>

This will produce multiple profile data files with an additional sequential
number at the end of the filename. A file without such a number at the end
(ending only in the process PID) will also be produced; by loading this file
into &kcachegrind;, all others are loaded too, and can be seen in the
<guilabel>Parts Overview</guilabel> and <guilabel>Parts</guilabel> list.
</para>
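
<para>
When profile runs are scripted, it can be useful to locate these files
programmatically. The Python sketch below assumes the naming scheme described
above, with the sequential number appended after a dot:
<filename>callgrind.out.<replaceable>pid</replaceable></filename> for the file
without a number and
<filename>callgrind.out.<replaceable>pid</replaceable>.<replaceable>n</replaceable></filename>
for the numbered dumps. The function name is hypothetical. The numbered parts are
sorted numerically, so that part 10 comes after part 9.
</para>

<programlisting language="python"><![CDATA[import re
from pathlib import Path

def callgrind_parts(directory, pid):
    """Return (base_file, numbered_parts) for one profiled process.

    Assumes files named callgrind.out.<pid> and callgrind.out.<pid>.<n>.
    """
    pattern = re.compile(r"callgrind\.out\." + re.escape(str(pid))
                         + r"(?:\.(\d+))?")
    base = None
    numbered = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match is None:
            continue                      # other processes or unrelated files
        if match.group(1) is None:
            base = path                   # the file to open in KCachegrind
        else:
            numbered.append((int(match.group(1)), path))
    numbered.sort()                       # numeric, not lexicographic, order
    return base, [path for _, path in numbered]
]]></programlisting>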

</sect2>

<sect2>
<title>&oprofile;</title>

<para>
&oprofile; is available from <ulink url="http://oprofile.sf.net">its home
page</ulink>. Follow the installation instructions on the Web site, but, before
you do, check whether your distribution already provides it as a package
(like &SuSE;).
</para>

<para>
System-wide profiling is only permitted to the root user, as all actions on the
system can be observed; therefore, the following has to be done as root.
First, configure the profiling process, using the &GUI;
<command>oprof_start</command> or the command-line tool
<command>opcontrol</command>. The standard configuration should be timer mode
(&TBS;, see introduction).
To start the measurement, run <userinput><command>opcontrol</command>
<option>-s</option></userinput>.
Then run the application you are interested in and, afterwards, run
<userinput><command>opcontrol</command> <option>-d</option></userinput>. This
will write out the measurement results into files under the folder <filename
class="directory">/var/lib/oprofile/samples/</filename>.
To be able to visualize the data in &kcachegrind;, run the following in an empty
directory:

<blockquote><para><userinput>
<command>opreport</command> <option>-gdf</option> |
<command>op2callgrind</command>
</userinput></para></blockquote>

This will produce a lot of files, one for every program which was running
on the system. Each one can be loaded into &kcachegrind; on its own.
</para>

</sect2>
</sect1>

<sect1 id="using-basics">
<title>User Interface Basics</title>

<para>
When starting &kcachegrind; with a profile data file as argument, or after
loading one with <menuchoice><guimenu>File</guimenu>
<guimenuitem>Open</guimenuitem></menuchoice>, you will see a navigation panel
containing the function list on the left and, on the right, the main part: an
area with views for a selected function. This view area can be arbitrarily
configured to show multiple views at once.
</para>

<para>
At the first start, this area will be
divided into a top and a bottom part, each with different tab-selectable views.
To move views, use the tabs' context menu, and adjust the splitters between
views. To switch quickly between different viewing layouts, use
<menuchoice><shortcut><keycombo action="simul">&Ctrl;<keycap>→</keycap>
</keycombo></shortcut> <guimenu>View</guimenu><guisubmenu>Layout</guisubmenu>
<guimenuitem>Go to Next</guimenuitem></menuchoice> and
<menuchoice><shortcut><keycombo action="simul">&Ctrl;<keycap>←</keycap>
</keycombo></shortcut> <guimenu>View</guimenu><guisubmenu>Layout</guisubmenu>
<guimenuitem>Go to Previous</guimenuitem></menuchoice>.
</para>

<para>
The active event type is important for visualization: for &callgrind;, this is,
for example, cache misses or cycle estimation; for &oprofile;, this is
<quote>Timer</quote> in the simplest case. You can change the event type via a
combobox in the toolbar or in the <guilabel>Event Type</guilabel> view.
To get a first overview of the runtime characteristics, select the function
<function>main</function> in the left list, then look at the call graph view.
There, you see the calls occurring in your program. Note that the call graph
view only shows functions with a high event count.
Double-clicking a function in the graph changes the view to show the called
functions around the selected one.
</para>

<para>
To explore the &GUI; further, in addition to this manual, also have a look at
the documentation section <ulink url="https://kcachegrind.github.io">on the Web
site</ulink>.
Also, every widget in &kcachegrind; has <quote>What's this</quote> help.
</para>
</sect1>

</chapter>


<chapter id="kcachegrind-concepts">
<title>Basic Concepts</title>

<para>This chapter explains some concepts of &kcachegrind;, and introduces
terms used in the interface.
</para>

<sect1 id="concepts-model">
<title>The Data Model for Profile Data</title>

<sect2>
<title>Cost Entities</title>

<para>
Cost counts of event types (like L2 Misses) are attributed to cost entities,
which are items related to the source code or data structures of a given
program. Cost entities can be not only simple code or data positions, but also
position tuples. For example, a call has a source and a target, or a data
address can have a data type and a code position where its allocation happened.
</para>

<para>
The cost entities known to &kcachegrind; are listed below.
Simple positions:
<variablelist>
<varlistentry>
<term>Instruction</term>
<listitem><para>
An assembler instruction at a specified address.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Source Line of a Function</term>
<listitem><para>
All instructions that the compiler (via debug information) maps to a given
source line specified by source file name and line number, and which are
executed in the context of some function. The latter is needed because a source
line inside of an inlined function can appear in the context of multiple
functions. Instructions without any mapping to an actual source line are mapped
to line number 0 in file <filename>???</filename>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Function</term>
<listitem><para>
All source lines of a given function make up the function itself. A function is
specified by its name and its location in some binary object if available. The
latter is needed because different binary objects of a single program can each
hold functions with the same name (these can be accessed &eg; with
<function>dlopen</function> or <function>dlsym</function>; the runtime linker
resolves functions in a given search order of binary objects used). If a
profiling tool cannot detect the symbol name of a function, &eg; because debug
information is not available, typically either the address of the first
executed instruction is used, or <function>???</function>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Binary Object</term>
<listitem><para>
All functions whose code is inside the range of a given binary object, either
the main executable or a shared library.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Source File</term>
<listitem><para>
All functions whose first instruction is mapped to a line of the given source
file.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Class</term>
<listitem><para>
Symbol names of functions are typically hierarchically ordered in name spaces,
&eg; C++ namespaces, or classes of object-oriented languages; thus, a class can
hold functions of the class or embedded classes itself.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Profile Part</term>
<listitem><para>
A time section of a profile run, with a given thread ID, process ID, and
command line executed.
</para></listitem>
</varlistentry>
</variablelist>
As can be seen from the list, a set of cost entities often defines another cost
entity; thus, there is an inclusion hierarchy of cost entities.
</para>
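
<para>
Because of this inclusion hierarchy, the cost of a coarser entity can be obtained
by summing the costs of the entities it contains. The Python sketch below does
this for binary objects and classes; the input data are invented, and splitting
symbol names at <literal>::</literal> is a simplification that ignores, &eg;,
template arguments containing <literal>::</literal>.
</para>

<programlisting language="python"><![CDATA[from collections import defaultdict

# Hypothetical function entities: (symbol name, binary object) -> self cost
function_cost = {
    ("Parser::Lexer::next", "libparse.so"): 120,
    ("Parser::Lexer::peek", "libparse.so"): 30,
    ("Parser::parse", "libparse.so"): 50,
    ("main", "app"): 10,
}

def enclosing_scopes(symbol):
    """'A::B::f' -> ['A', 'A::B']; free functions have no scope."""
    parts = symbol.split("::")[:-1]
    return ["::".join(parts[:i]) for i in range(1, len(parts) + 1)]

def cost_by_object(costs):
    totals = defaultdict(int)
    for (_symbol, obj), cost in costs.items():
        totals[obj] += cost
    return dict(totals)

def cost_by_class(costs):
    totals = defaultdict(int)
    for (symbol, _obj), cost in costs.items():
        # A class also holds the functions of its embedded classes.
        for scope in enclosing_scopes(symbol):
            totals[scope] += cost
    return dict(totals)

print(cost_by_object(function_cost))
print(cost_by_class(function_cost))
]]></programlisting>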

<para>
Position tuples:
<itemizedlist>
<listitem><para>
Call from instruction address to target function.
</para></listitem>
<listitem><para>
Call from source line to target function.
</para></listitem>
<listitem><para>
Call from source function to target function.
</para></listitem>
<listitem><para>
(Un)conditional jump from source to target instruction.
</para></listitem>
<listitem><para>
(Un)conditional jump from source to target line.
</para></listitem>
</itemizedlist>
Jumps between functions are not allowed, as this makes no sense in a call graph;
thus, constructs like exception handling and long jumps in C have to be
translated to popping the call stack as needed.
</para>

</sect2>


<sect2>
<title>Event Types</title>

<para>
Arbitrary event types can be specified in the profile data by giving them a
name. Their cost related to a cost entity is a 64-bit integer.
</para>
<para>
Event types whose costs are specified in a profile data file are called real
events. Additionally, one can specify formulas for event types calculated from
real events, which are called inherited events.
</para>
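
<para>
As an illustration, the following Python sketch evaluates inherited events
defined as weighted sums of other events, which may themselves be inherited.
The event names, the formula syntax (&eg; <literal>Ir + 10 L1m</literal>) and the
miss penalties of 10 and 100 cycles in the cycle estimate
<literal>CEst</literal> are assumptions for this example; the syntax actually
accepted by &kcachegrind; may differ, and cyclic definitions are not detected.
</para>

<programlisting language="python"><![CDATA[import re

# Hypothetical real event costs of one cost entity
real = {"Ir": 1000, "I1mr": 20, "D1mr": 30, "D1mw": 10,
        "I2mr": 1, "D2mr": 4, "D2mw": 0}

# Hypothetical inherited events: weighted sums of other (real or inherited)
# events
inherited = {
    "L1m": "I1mr + D1mr + D1mw",
    "L2m": "I2mr + D2mr + D2mw",
    "CEst": "Ir + 10 L1m + 100 L2m",
}

# One term: an optional integer factor (optionally followed by '*') and an
# event name
TERM = re.compile(r"\s*(?:(\d+)\s*\*?\s*)?([A-Za-z]\w*)\s*")

def event_cost(name, real, inherited):
    """Cost of a real or inherited event for one cost entity."""
    if name in real:
        return real[name]
    total = 0
    for term in inherited[name].split("+"):
        match = TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"cannot parse term {term!r} of {name}")
        factor = int(match.group(1) or 1)
        total += factor * event_cost(match.group(2), real, inherited)
    return total

print(event_cost("CEst", real, inherited))
]]></programlisting>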
</sect2>

</sect1>

<sect1 id="concepts-state">
<title>Visualization State</title>

<para>
The visualization state of a &kcachegrind; window includes:
<itemizedlist>
<listitem><para>
the primary and secondary event types chosen for display,
</para></listitem>
<listitem><para>
the function grouping (used in the <guilabel>Function Profile</guilabel> list
and entity coloring),
</para></listitem>
<listitem><para>
the profile parts whose costs are to be included in the visualization,
</para></listitem>
<listitem><para>
an active cost entity (&eg; a function selected from the function profile
sidedock),
</para></listitem>
<listitem><para>
a selected cost entity.
</para></listitem>
</itemizedlist>
This state influences the views.
</para>

<para>
Views are always shown for one cost entity, the active one. When a given view
is inappropriate for a cost entity, it is disabled: when selecting &eg; an &ELF;
object in the group list, source annotation makes no sense.
</para>

<para>
For example, for an active function, the callee list shows all the functions
called from the active one: one can select one of these functions without making
it active. Also, if the call graph is shown alongside, it will automatically
select the same function.
</para>

</sect1>

<sect1 id="concepts-guiparts">
<title>Parts of the &GUI;</title>

<sect2>
<title>Sidedocks</title>
<para>
Sidedocks are side windows which can be placed at any border of a &kcachegrind;
window. They always contain a list of cost entities sorted in some way.
<itemizedlist>
<listitem><para>
The <guilabel>Function Profile</guilabel> is a list of functions showing
inclusive and exclusive cost, call count, name and position of functions.
</para></listitem>
<listitem><para>
<guilabel>Parts Overview</guilabel>
</para></listitem>
<listitem><para>
<guilabel>Call Stack</guilabel>
</para></listitem>
</itemizedlist>
</para>
</sect2>

<sect2>
<title>View Area</title>
<para>
The view area, typically the right part of a &kcachegrind; main window, is made
up of one (default) or more tabs, lined up either horizontally or vertically.
Each tab holds different views of only one cost entity at a time.
The name of this entity is given at the top of the tab. If there are multiple
tabs, only one is active. The entity name in the active tab is shown in bold,
and determines the active cost entity of the &kcachegrind; window.
</para>
</sect2>

<sect2>
<title>Areas of a Tab</title>
<para>
Each tab can hold up to four view areas, namely Top, Right, Left, and Bottom.
Each area can hold multiple stacked views. The visible part of an area is
selected by a tab bar. The tab bars of the top and right areas are at the top;
the tab bars of the left and bottom areas are at the bottom. You can specify
which kind of view should go into which area by using the tabs' context menus.
</para>
</sect2>

<sect2>
<title>Synchronized View with Selected Entity in a Tab</title>
<para>
Besides an active entity, each tab has a selected entity. As most view types
show multiple entities with the active one somehow centered, you can change
the selected item by navigating inside a view (by clicking with the mouse
or using the keyboard). Typically, selected items are shown in a highlighted
state. By changing the selected entity in one of the views of a tab, all other
views highlight the newly selected entity accordingly.
</para>
</sect2>

<sect2>
<title>Synchronization between Tabs</title>
<para>
If there are multiple tabs, a selection change in one tab leads to an activation
change in the next tab, whether it is to the right of the former or below it.
This kind of linkage should, for example, allow for fast browsing in call graphs.
</para>
</sect2>

<sect2>
<title>Layouts</title>
<para>
The layout of all the tabs of a window can be saved (<menuchoice><guimenu>View
</guimenu><guisubmenu>Layout</guisubmenu></menuchoice>). After duplicating the
current layout (<menuchoice><shortcut><keycombo action="simul">&Ctrl;
<keycap>+</keycap></keycombo></shortcut> <guimenu>View</guimenu>
<guisubmenu>Layout</guisubmenu><guimenuitem>Duplicate</guimenuitem>
</menuchoice>)
and changing some sizes or moving a view to another area of a tab, you can
quickly switch between the old and the new layout via <keycombo action="simul">
&Ctrl;<keycap>←</keycap></keycombo> and <keycombo action="simul">&Ctrl;
<keycap>→</keycap></keycombo>. The set of layouts will be stored between
&kcachegrind; sessions of the same profiled command. You can make the current
set of layouts the default one for new &kcachegrind; sessions, or restore the
default layout set.
</para>
</sect2>
0675 </sect1>
0676 
0677 <sect1 id="concepts-sidedocks">
0678 <title>Sidedocks</title>
0679 
0680 <sect2>
0681 <title>Flat Profile</title>
0682 <para>
0683 The <guilabel>Flat Profile</guilabel> contains a group list and a function list.
0684 The group list contains all groups where cost is spent in, depending on the
0685 chosen group type. The group list is hidden when grouping is switched off.
0686 </para>
0687 <para>
0688 The function list contains the functions of the selected group (or all functions
0689 if grouping is switched off), ordered by some column, &eg; inclusive or self
0690 costs spent therein. There is a maximum number of functions shown in the list,
0691 configurable in <menuchoice><guimenu>Settings</guimenu><guimenuitem>Configure
0692 KCachegrind</guimenuitem></menuchoice>.
0693 </para>
0694 </sect2>
0695 
0696 <sect2>
0697 <title>Parts Overview</title>
0698 <para>
0699 In a profile run, multiple profile data files can be produced, which can be
0700 loaded together into &kcachegrind;. The <guilabel>Parts Overview</guilabel>
0701 sidedock shows these, ordered horizontally according to creation time; the
0702 rectangle sizes are proportional to the cost spent each part. You can select one
0703 or several parts to constrain the costs shown in the other &kcachegrind; views
0704 to these parts only.
0705 </para>
<para>
Each part is further subdivided in one of two modes:
<variablelist>
<varlistentry>
<term><guilabel>Partitioning Mode</guilabel></term>
<listitem><para>
Each profile data part is split into the groups of the selected group type. For
example, if &ELF; object groups are selected, you see a colored rectangle for
each &ELF; object used (shared library or executable), sized according to the
cost spent in it.
</para></listitem>
</varlistentry>
<varlistentry>
<term><guilabel>Diagram Mode</guilabel></term>
<listitem><para>
A rectangle shows the inclusive cost of the currently active function in the
part. This rectangle is in turn split up to show the inclusive costs of the
function's callees.
</para></listitem>
</varlistentry>
</variablelist>
</para>
</sect2>

<sect2>
<title>Call Stack</title>
<para>
This is a purely fictional <quote>most probable</quote> call stack. It is built
by starting with the currently active function, then adding the callers with
the highest cost at the top and the callees with the highest cost at the bottom.
</para>
<para>
The <guilabel>Cost</guilabel> and <guilabel>Calls</guilabel> columns show,
respectively, the cost spent in all calls from the function in the line above
and the number of those calls.
</para>
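<para>
A minimal Python sketch of this greedy construction follows. It assumes the call
data is available as a mapping from (caller, callee) pairs to the cost spent in
those calls; this representation, the function name, and the
<literal>max_depth</literal> limit are hypothetical. Functions already on the
stack are skipped so that cycles cannot make the walk loop forever.
</para>
<programlisting language="python"><![CDATA[
```python
def most_probable_stack(call_costs, active, max_depth=50):
    """Build a 'most probable' call stack around `active`.

    call_costs maps (caller, callee) pairs to the cost spent in those calls.
    Starting from `active`, repeatedly prepend the most expensive caller of the
    topmost function and append the most expensive callee of the bottom one.
    Functions already on the stack are skipped, so cycles cannot loop forever.
    """
    stack = [active]
    on_stack = {active}
    # Walk upwards through the callers.
    while len(stack) < max_depth:
        top = stack[0]
        candidates = [(cost, caller) for (caller, callee), cost in call_costs.items()
                      if callee == top and caller not in on_stack]
        if not candidates:
            break
        _, best = max(candidates)
        stack.insert(0, best)
        on_stack.add(best)
    # Walk downwards through the callees.
    while len(stack) < max_depth:
        bottom = stack[-1]
        candidates = [(cost, callee) for (caller, callee), cost in call_costs.items()
                      if caller == bottom and callee not in on_stack]
        if not candidates:
            break
        _, best = max(candidates)
        stack.append(best)
        on_stack.add(best)
    return stack
```
]]></programlisting>
<para>
For example, if <function>main()</function> spends 70 in calls to
<function>compute()</function> and 30 in calls to <function>parse()</function>,
and <function>compute()</function> spends 60 in <function>matmul()</function> and
5 in <function>log()</function>, activating <function>compute()</function> yields
the stack <function>main()</function>, <function>compute()</function>,
<function>matmul()</function>.
</para>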
</sect2>
</sect1>

<sect1 id="concepts-views">
<title>Views</title>

<sect2>
<title>Event Type</title>
<para>
The <guilabel>Event Type</guilabel> list shows all available event types and,
for each, the self and inclusive cost of the currently active function.
</para>
<para>
Selecting an event type in the list makes it the type of cost shown throughout
&kcachegrind;.
</para>
</sect2>

<sect2>
<title>Call Lists</title>
<para>
These lists show calls to and from the currently active function.
<guilabel>All Callers</guilabel> and <guilabel>All Callees</guilabel> are the
functions reachable in the caller and callee direction, respectively, even when
other functions lie in between.
</para>

<para>
Call list views include:
<itemizedlist>
<listitem><para>Direct <guilabel>Callers</guilabel></para></listitem>
<listitem><para>Direct <guilabel>Callees</guilabel></para></listitem>
<listitem><para><guilabel>All Callers</guilabel></para></listitem>
<listitem><para><guilabel>All Callees</guilabel></para></listitem>
</itemizedlist>
</para>
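<para>
Conceptually, the <guilabel>All Callers</guilabel> and <guilabel>All
Callees</guilabel> lists are the transitive closure of the direct caller and
callee relations. The following Python sketch computes both with a breadth-first
search over a list of (caller, callee) pairs; the edge list and the function
names are hypothetical, and &kcachegrind; may compute these lists differently.
</para>
<programlisting language="python"><![CDATA[
```python
from collections import deque


def direct(edges, func, direction="callees"):
    """Direct callees (or callers) of func, from (caller, callee) pairs."""
    if direction == "callees":
        return {callee for caller, callee in edges if caller == func}
    return {caller for caller, callee in edges if callee == func}


def reachable(edges, func, direction="callees"):
    """All callees (or callers) of func, even with other functions in between.

    Breadth-first search over the call graph; func itself is left out of the
    result even if it is reachable through recursion.
    """
    graph = {}
    for caller, callee in edges:
        src, dst = (caller, callee) if direction == "callees" else (callee, caller)
        graph.setdefault(src, set()).add(dst)
    found = set()
    queue = deque([func])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in found:
                found.add(nxt)
                queue.append(nxt)
    found.discard(func)
    return found
```
]]></programlisting>
<para>
With calls <function>main()</function> to <function>compute()</function>,
<function>compute()</function> to <function>matmul()</function> and
<function>matmul()</function> to <function>dot()</function>, the direct callees
of <function>compute()</function> are just <function>matmul()</function>, while
all its callees are <function>matmul()</function> and <function>dot()</function>.
</para>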
</sect2>

<sect2>
<title>Maps</title>
<para>
A treemap view of the primary event type, up or down the call
hierarchy. Each colored rectangle represents a function; its size is
approximately proportional to the cost spent in it while the active function
is running (by default, drawing constraints can distort this proportion).
</para>
<para>
For the <guilabel>Caller Map</guilabel>, the graph shows the nested hierarchy of
all callers of the currently active function; for the <guilabel>Callee
Map</guilabel>, it shows that of all callees.
</para>
<para>
Appearance options can be found in the context menu. To get exact size
proportions, choose <guimenuitem>Skip Incorrect Borders</guimenuitem>. As this
mode can be very time-consuming, you may want to limit the maximum drawn
nesting level first. <guilabel>Best</guilabel> determines the split direction
for children from the aspect ratio of the parent. <guilabel>Always
Best</guilabel> decides the split direction for each sibling based on the
remaining space. <guilabel>Ignore Proportions</guilabel> reserves space for
drawing the function name before drawing the children; note that size
proportions can then be badly wrong.
</para>
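<para>
The following Python sketch shows one way to lay out a callee map as nested
rectangles, including a limit on the drawn nesting level. It is an illustration
under assumptions, not &kcachegrind;'s drawing code: the (name, cost, children)
tuple representation is hypothetical, and splitting each parent along its longer
side is only one possible interpretation of the <guilabel>Best</guilabel> option.
</para>
<programlisting language="python"><![CDATA[
```python
def treemap(node, x, y, w, h, depth=0, max_depth=3, out=None):
    """Lay out a callee tree as nested rectangles.

    node is a hypothetical (name, inclusive_cost, children) tuple. Each child
    gets the fraction child_cost / parent_cost of the parent rectangle, so the
    uncovered remainder represents the parent's self cost. The split direction
    follows the parent's aspect ratio (split along the longer side).
    Returns a list of (name, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    name, cost, children = node
    out.append((name, x, y, w, h))
    if depth >= max_depth or cost == 0:
        return out  # nesting limit reached, or nothing to subdivide
    offset = 0.0
    for child in children:
        share = child[1] / cost
        if w >= h:  # wider than tall: place children side by side
            treemap(child, x + offset * w, y, w * share, h,
                    depth + 1, max_depth, out)
        else:  # taller than wide: stack children on top of each other
            treemap(child, x, y + offset * h, w, h * share,
                    depth + 1, max_depth, out)
        offset += share
    return out
```
]]></programlisting>
<para>
Lowering <literal>max_depth</literal> plays the role of limiting the maximum
drawn nesting level: deeper callees are simply not drawn, which keeps the layout
cheap for large call trees.
</para>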
<para>
Keyboard navigation is available: the left and right arrow keys traverse
siblings, and the up and down arrow keys move one nesting level up or down.
&Enter; activates the current item.
</para>
</sect2>

<sect2>
<title>Call Graph</title>
<para>
This view shows the call graph around the active function. The cost shown for
each function is only the cost spent while the active function was actually
running. For example, the cost shown for <function>main()</function> (if it is
visible) should equal the inclusive cost of the active function, as that is the
part of the inclusive cost of <function>main()</function> spent while the active
function was running.
</para>
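<para>
This rule can be made concrete with a simplified model in which every unit of
cost is recorded together with the full call stack at that moment. The model and
the function names below are assumptions made for illustration only; they do not
describe how profile data files store costs.
</para>
<programlisting language="python"><![CDATA[
```python
def inclusive(samples, func):
    """Inclusive cost of func: all cost recorded while func was on the stack."""
    return sum(cost for stack, cost in samples if func in stack)


def cost_while_active(samples, active):
    """For every function, the cost it accumulated while `active` was running.

    samples is a list of (stack, cost) pairs, each stack listed outermost
    first. This full-stack model is a simplification for illustration.
    """
    shown = {}
    for stack, cost in samples:
        if active not in stack:
            continue  # cost spent while `active` was not running is ignored
        for func in set(stack):  # count recursive functions only once
            shown[func] = shown.get(func, 0) + cost
    return shown
```
]]></programlisting>
<para>
If a program spends 30 in <function>parse()</function>, 20 in
<function>compute()</function> itself and 50 in <function>matmul()</function>
called from <function>compute()</function>, the inclusive cost of
<function>main()</function> is 100, but with <function>compute()</function>
active, <function>main()</function> is shown with 70, the inclusive cost of
<function>compute()</function>.
</para>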
<para>
For cycles, blue call arrows indicate artificial calls, which never actually
happened but were added so that the graph is drawn correctly.
</para>
<para>
If the graph is larger than the drawing area, a bird's eye view is shown on one
side. There are view options similar to those of the call maps; the selected
function is highlighted.
</para>
</sect2>

<sect2>
<title>Annotations</title>
<para>
The annotated source and assembler lists show the source lines or disassembled
instructions of the currently active function, together with the (self) cost
spent executing each source line or instruction. Where a call is made, lines
with details on the call are inserted into the listing: the (inclusive) cost
spent inside the call, the number of calls made, and the call destination.
</para>
<para>
Select such a call information line to activate the call destination.
</para>
</sect2>
</sect1>

</chapter>


<chapter id="commands">
<title>Command Reference</title>

<sect1 id="kcachegrind-mainwindow">
<title>The main &kcachegrind; window</title>

<sect2>
<title>The File Menu</title>
<para>
<variablelist>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo>&Ctrl;<keycap>N</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>New</guimenuitem>
</menuchoice></term>
<listitem><para>
<action>Opens an empty top-level window</action> in which you can
load profile data. This action is not really necessary, as <menuchoice>
<guimenu>File</guimenu><guimenuitem>Open</guimenuitem></menuchoice> gives you a
new top-level window if the current one already shows some data.
</para></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo>&Ctrl;<keycap>O</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>Open</guimenuitem>
</menuchoice></term>
<listitem>
<para>
<action>Pops up the &kde; file selector</action> to choose a
profile data file to be loaded. If some data is already shown in the
current top-level window, this will open a new window; if you want to open
additional profile data in the current window, use <menuchoice>
<guimenu>File</guimenu><guimenuitem>Add</guimenuitem></menuchoice>.
</para>
<para>
The names of profile data files usually end in <literal role="extension"
>.<replaceable>pid</replaceable>.<replaceable>part</replaceable>-<replaceable
>threadID</replaceable></literal>, where <replaceable>part</replaceable> and
<replaceable>threadID</replaceable> are optional. <replaceable>pid</replaceable>
and <replaceable>part</replaceable> are used for multiple profile data files
belonging to one application run.
When you load a file ending only in <literal role="extension"><replaceable
>pid</replaceable></literal>, any existing data files for this run with
additional endings are loaded as well.
</para>
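<para>
The loading rule above can be sketched in Python as follows. The regular
expression encodes names of the form
<literal>prefix.pid[.part][-threadID]</literal>, assuming that pid, part and
threadID are decimal numbers. The pattern, the helper name
<literal>files_for_run</literal>, and passing in a list of candidate names (for
example, the files in the same directory) are hypothetical; &kcachegrind;'s own
matching may differ in detail.
</para>
<programlisting language="python"><![CDATA[
```python
import re

# Hypothetical pattern for names of the form prefix.pid[.part][-threadID],
# assuming pid, part and threadID are decimal numbers.
NAME = re.compile(
    r"^(?P<prefix>.+?)\.(?P<pid>\d+)(?:\.(?P<part>\d+))?(?:-(?P<thread>\d+))?$"
)


def files_for_run(chosen, candidates):
    """Return the files to load when the user opens `chosen`.

    If `chosen` ends only in a pid, every candidate with the same prefix and
    pid (that is, from the same run, with any additional endings) is loaded
    as well; otherwise only `chosen` itself is loaded.
    """
    m = NAME.match(chosen)
    if m is None or m["part"] or m["thread"]:
        return [chosen]
    related = {chosen}
    for name in candidates:
        c = NAME.match(name)
        if c and c["prefix"] == m["prefix"] and c["pid"] == m["pid"]:
            related.add(name)
    return sorted(related)
```
]]></programlisting>
<para>
For instance, given the candidates <filename>cachegrind.out.123</filename>,
<filename>cachegrind.out.123.1</filename> and
<filename>cachegrind.out.456</filename>, choosing
<filename>cachegrind.out.123</filename> returns the first two names, while
choosing <filename>cachegrind.out.123.1</filename> returns only that file.
</para>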
<informalexample><para>
If the profile data files <filename>cachegrind.out.123</filename> and
<filename>cachegrind.out.123.1</filename> exist, loading the first automatically
loads the second too.
</para></informalexample></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<guimenu>File</guimenu><guimenuitem>Add</guimenuitem>
</menuchoice></term>
<listitem><para>
<action>Adds a profile data file</action> to the current window.
Using this, you can force multiple data files to be loaded into the same
top-level window even if, according to the profile data file naming convention,
they are not from the same run. This can be used, for example, for side-by-side
comparison.
</para></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo><keycap>F5</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>Reload</guimenuitem>
</menuchoice></term>
<listitem><para>
<action>Reloads the profile data</action>. This is useful when another profile
data file was generated for an already loaded application run.
</para></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo>&Ctrl;<keycap>Q</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>Quit</guimenuitem>
</menuchoice></term>
<listitem><para><action>Quits</action> &kcachegrind;</para></listitem>
</varlistentry>
</variablelist>
</para>

</sect2>

</sect1>
</chapter>

<chapter id="faq">
<title>Questions and Answers</title>

<qandaset id="faqlist">


<qandaentry>
<question>
<para>
What is &kcachegrind; for? I have no idea.
</para>
</question>
<answer>
<para>
&kcachegrind; is helpful during profiling, a late stage of software development.
If you don't develop applications, you don't need &kcachegrind;.
</para>
</answer>
</qandaentry>

<qandaentry>
<question>
<para>
What is the difference between <guilabel>Incl.</guilabel> and
<guilabel>Self</guilabel>?
</para>
</question>
<answer>
<para>These are cost attributes of functions for a given event type. As
functions can call each other, it makes sense to distinguish the cost of the
function itself (<quote>Self Cost</quote>) from the cost including all called
functions (<quote>Inclusive Cost</quote>). <quote>Self</quote> cost is sometimes
also referred to as <quote>Exclusive</quote> cost.
</para>
<para>
So, for example, <function>main()</function> will always have an inclusive
cost of almost 100%, whereas its self cost is negligible when the real work is
done in other functions.
</para>
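<para>
For a small program whose calls form a tree (no recursion, and every function
called from exactly one place), the relation between the two costs can be
sketched in Python as follows; the function name and the example costs are
hypothetical.
</para>
<programlisting language="python"><![CDATA[
```python
def inclusive_costs(self_cost, callees, root):
    """Compute inclusive costs for a call tree.

    self_cost maps each function to its self (exclusive) cost; callees maps a
    function to the functions it calls. Assumes a pure tree: no recursion and
    every function called from exactly one place.
    """
    result = {}

    def visit(func):
        # Inclusive cost = own self cost + inclusive costs of all callees.
        total = self_cost[func]
        for callee in callees.get(func, []):
            total += visit(callee)
        result[func] = total
        return total

    visit(root)
    return result
```
]]></programlisting>
<para>
With self costs of 2 for <function>main()</function>, 18 for
<function>parse()</function>, 30 for <function>compute()</function> and 50 for
<function>matmul()</function>, where <function>main()</function> calls
<function>parse()</function> and <function>compute()</function>, and
<function>compute()</function> calls <function>matmul()</function>, the
inclusive cost of <function>main()</function> is 100 (the whole program) even
though its self cost is only 2.
</para>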
</answer>
</qandaentry>

<qandaentry>
<question>
<para>
If I double-click on a function further down in the <guilabel>Call Graph</guilabel>
view, <function>main()</function> is shown with the same cost as the selected
function. Isn't this supposed to be constant at 100%?
</para>
</question>
<answer>
<para>
You have activated a function below <function>main()</function>, which obviously
costs less than <function>main()</function> itself. For every function, only
the part of its cost spent while the <emphasis>activated</emphasis> function is
running is shown; that is, the cost shown for any function can never be higher
than the cost of the activated function.
</para>
</answer>
</qandaentry>


</qandaset>
</chapter>


<glossary>

<glossentry id="costentity">
<glossterm>Cost Entity</glossterm>
<glossdef><para>An abstract item related to source code to which event counts
can be attributed. Dimensions for cost entities are code location (&eg; source
line, function), data location (&eg; accessed data type, data object), execution
location (&eg; thread, process), and pairs or triples of the aforementioned
positions (&eg; calls, object access from statement, evicted data from
cache).</para></glossdef>
</glossentry>

<glossentry id="eventcosts">
<glossterm>Event Costs</glossterm>
<glossdef><para>Sum of events of some event type occurring while the execution
is related to some cost entity. The cost is attributed to the
entity.</para></glossdef>
</glossentry>

<glossentry id="eventtype">
<glossterm>Event Type</glossterm>
<glossdef><para>The kind of event whose costs can be attributed to a cost
entity. There are real event types and inherited event types.</para></glossdef>
</glossentry>

<glossentry id="inheritedeventtype">
<glossterm>Inherited Event Type</glossterm>
<glossdef><para>A virtual event type, only visible in the view, defined by a
formula to be calculated from real event types.</para></glossdef>
</glossentry>

<glossentry id="profiledatafile">
<glossterm>Profile Data File</glossterm>
<glossdef><para>A file containing data measured in a profile experiment, or part
of one, or produced by post-processing a trace. Its size is typically linear
in the code size of the program.</para></glossdef>
</glossentry>

<glossentry id="profiledatapart">
<glossterm>Profile Data Part</glossterm>
<glossdef><para>Data from a profile data file.</para></glossdef>
</glossentry>

<glossentry id="profileexperiment">
<glossterm>Profile Experiment</glossterm>
<glossdef><para>A program run supervised by a profiling tool, possibly producing
multiple profile data files from parts or threads of the run.</para></glossdef>
</glossentry>

<glossentry id="profileproject">
<glossterm>Profile Project</glossterm>
<glossdef><para>A configuration of profile experiments for one program to be
profiled, perhaps in multiple versions. Comparisons of profile data typically
only make sense between profile data produced in experiments of one profile
project.</para></glossdef>
</glossentry>

<glossentry id="profiling">
<glossterm>Profiling</glossterm>
<glossdef><para>The process of collecting statistical information about runtime
characteristics of program runs.</para></glossdef>
</glossentry>

<glossentry id="realeventtype">
<glossterm>Real Event Type</glossterm>
<glossdef><para>An event type that can be measured by a tool. This requires the
existence of a sensor for the given event type.</para></glossdef>
</glossentry>

<glossentry id="trace">
<glossterm>Trace</glossterm>
<glossdef><para>A sequence of timestamped events that occurred while tracing a
program run. Its size is typically linear in the execution time of the program
run.</para></glossdef>
</glossentry>

<glossentry id="tracepart">
<glossterm>Trace Part</glossterm>
<glosssee otherterm="profiledatapart"/>
</glossentry>

<glossentry id="tracing">
<glossterm>Tracing</glossterm>
<glossdef><para>The process of supervising a program run and storing its events,
sorted by timestamp, in an output file, the trace.</para></glossdef>
</glossentry>

</glossary>

<chapter id="credits">

<title>Credits and License</title>

<para>
Thanks to Julian Seward for his excellent &valgrind;, and Nicholas Nethercote
for the &cachegrind; addition. Without these programs, &kcachegrind; would not
exist. Some ideas for this &GUI; came from them, too.
</para>
<para>
Thanks to all users for their bug reports and suggestions.
</para>

<!-- TRANS:CREDIT_FOR_TRANSLATORS -->
&underFDL;               <!-- FDL License -->

</chapter>

&documentation.index;
</book>