<?xml version="1.0" ?>
<!DOCTYPE book PUBLIC "-//KDE//DTD DocBook XML V4.5-Based Variant V1.1//EN" "dtd/kdedbx45.dtd" [
<!ENTITY kcachegrind '<application>KCachegrind</application>'>
<!ENTITY cachegrind "<application>Cachegrind</application>">
<!ENTITY calltree "<application>Calltree</application>">
<!ENTITY callgrind "<application>Callgrind</application>">
<!ENTITY valgrind "<application>Valgrind</application>">
<!ENTITY oprofile "<application>OProfile</application>">
<!ENTITY EBS "<acronym>EBS</acronym>">
<!ENTITY TBS "<acronym>TBS</acronym>">
<!ENTITY % addindex "IGNORE">
<!ENTITY % English "INCLUDE">
]>

<book id="kcachegrind" lang="&language;">

<bookinfo>
<title>The &kcachegrind; Handbook</title>

<authorgroup>
<author>
<firstname>Josef</firstname>
<surname>Weidendorfer</surname>
<affiliation>
<address><email>Josef.Weidendorfer@gmx.de</email></address>
</affiliation>
<contrib>Original author of the documentation</contrib>
</author>

<author>
<firstname>Federico</firstname>
<surname>Zenith</surname>
<affiliation>
<address><email>federico.zenith@member.fsf.org</email></address>
</affiliation>
<contrib>Updates and corrections</contrib>
</author>

<!-- TRANS:ROLES_OF_TRANSLATORS -->

</authorgroup>

<copyright>
<year>2002-2004</year>
<holder>&Josef.Weidendorfer;</holder>
</copyright>
<copyright>
<year>2009</year>
<holder>Federico Zenith</holder>
</copyright>
<legalnotice>&FDLNotice;</legalnotice>

<date>2016-11-18</date>
<releaseinfo>0.8.0 (Applications 17.04)</releaseinfo>

<abstract>
<para>
&kcachegrind; is a profile data visualization tool, written using &kde-frameworks;.
</para>
</abstract>

<keywordset>
<keyword>KDE</keyword>
<keyword>kdesdk</keyword>
<keyword>Cachegrind</keyword>
<keyword>Callgrind</keyword>
<keyword>Valgrind</keyword>
<keyword>Profiling</keyword>
</keywordset>

</bookinfo>


<chapter id="introduction">
<title>Introduction</title>

<para>
&kcachegrind; is a browser for data produced by profiling tools.
This chapter explains what profiling is for, how it is done, and
gives some examples of available profiling tools.
</para>

<sect1 id="introduction-profiling">
<title>Profiling</title>

<para>
When developing a program, one of the last steps often involves performance
optimizations. Since optimizing rarely used functions would be a waste of
time, one needs to know in which parts of a program most of the time is
spent.
</para>

<para>
For sequential code, collecting statistical data about the program's runtime
characteristics, such as the time spent in functions and code lines, is
usually enough.
This is called Profiling. The program is run under the control of a profiling
tool, which gives a summary of the execution run at the end.
In contrast, for parallel code, performance problems are typically caused when
one processor is waiting for data from another. As this waiting time usually
cannot easily be attributed, it is better here to generate timestamped event
traces. &kcachegrind; cannot visualize this kind of data.
</para>

<para>
After analyzing the produced profile data, it should be easy to see the hot
spots and bottlenecks of the code: for example, assumptions about call counts
can be checked, and identified code regions can be optimized.
Afterwards, the success of the optimization should be verified with another
profile run.
</para>
</sect1>

<sect1 id="introduction-methods">
<title>Profiling Methods</title>

<para>To exactly measure the time passed or record the events happening during
the execution of a code region (&eg; a function), additional measurement code
needs to be inserted before and after the given region. This code reads the
time, or a global event count, and calculates differences. Thus, the original
code has to be changed before execution. This is called instrumentation.
Instrumentation can be done by the programmers themselves, by the compiler, or
by the runtime system. As interesting regions are usually nested, the overhead
of measurement always influences the measurement itself. Thus, instrumentation
should be done selectively and results have to be interpreted with care. Of
course, this makes performance analysis by exact measurement a very complex
process.</para>

<para>Exact measurement is possible because of hardware counters (including
counters incrementing on a time tick) provided in modern processors, which are
incremented whenever an event happens. As we want to attribute events to
code regions, without the counters, we would have to handle every event by
incrementing a counter for the current code region ourselves. Doing this in
software is, of course, not possible; but, on the assumption that the event
distribution over source code is similar when looking only at every n-th event
instead of every event, a measurement method whose overhead is tunable has been
developed: it is called Sampling. Time Based Sampling (&TBS;) uses a timer to
regularly look at the program counter to create a histogram over the program
code.
Event Based Sampling (&EBS;) exploits the hardware counters of modern
processors, and uses a mode where an interrupt handler is called on counter
underflow to generate a histogram of the corresponding event distribution:
in the handler, the event counter is always reinitialized to the
<symbol>n</symbol> of the sampling method. The advantage of sampling is that the
code does not have to be changed, but it is still a compromise: the above
assumption will be more accurate if <symbol>n</symbol> is small, but the smaller
the <symbol>n</symbol>, the higher the overhead of the interrupt handler.</para>

<para>Another measurement method is to simulate things happening in the computer
system when executing a given code, &ie; execution-driven simulation. The
simulation is always derived from a more or less accurate machine model;
however, with very detailed machine models, giving very close approximations to
reality, the simulation time can be unacceptably high in practice.
The advantage of simulation is that arbitrarily complex measurement/simulation
code can be inserted in a given code without perturbing results. Doing this
directly before execution (called runtime instrumentation), using the original
binary, is very convenient for the user: no re-compilation is necessary.
Simulation becomes usable when simulating only parts of a machine with a simple
model; another advantage is that the results produced by simple models are often
easier to understand: often, the problem with real hardware is that results
include overlapping effects from different parts of the machine.</para>
</sect1>

<sect1 id="introduction-tools">
<title>Profiling Tools</title>

<para>
The best known is the GCC profiling tool <application>gprof</application>: one needs
to compile the program with option <option>-pg</option>; running the program
generates a file <filename>gmon.out</filename>, which can be transformed into
human-readable form with <command>gprof</command>.
One disadvantage is the required re-compilation step to prepare the executable,
which has to be statically linked.
The method used here is compiler-generated instrumentation, which measures call
arcs happening among functions and corresponding call counts, in conjunction
with &TBS;, which gives a histogram of time distribution over the code. Using
both pieces of information, it is possible to heuristically calculate inclusive
time of functions, &ie; time spent in a function together with all functions
called from it.
</para>

<para>For exact measurement of events happening, libraries exist with functions
able to read out hardware performance counters. The best known here are the
PerfCtr patch for &Linux; and the architecture-independent libraries PAPI and PCL.
Still, exact measurement needs instrumentation of code, as stated above. Either
one uses the libraries directly or uses automatic instrumentation systems like
ADAPTOR (for FORTRAN source instrumentation) or DynaProf (code injection via
DynInst).</para>

<para>
&oprofile; is a system-wide profiling tool for &Linux; using Sampling.
</para>

<para>
In many respects, a convenient way of profiling is to use &cachegrind; or
&callgrind;, which are simulators using the runtime instrumentation framework
&valgrind;. Because there is no need to access hardware counters (often
difficult with today's &Linux; installations), and binaries to be profiled can
be left unmodified, they are a good alternative to other profiling tools.
The disadvantage of simulation (slowdown) can be reduced by doing the
simulation on only the interesting program parts, and perhaps only on a few
iterations of a loop. Without measurement/simulation instrumentation,
&valgrind;'s usage only has a slowdown factor in the range of 3 to 5.
Also, when only the call graph and call counts are of interest, the cache
simulator can be switched off.
</para>

<para>
Cache simulation is the first step in approximating real times, since on modern
systems runtime is very sensitive to the exploitation of so-called
<emphasis>caches</emphasis>, small and fast buffers which accelerate repeated
accesses to the same main memory cells.
&cachegrind; does cache simulation by catching memory accesses.
The data produced includes the number of instruction/data memory accesses and
first- and second-level cache misses, and relates them to source lines and
functions of the program run.
By combining these miss counts, using miss latencies from typical processors,
an estimation of spent time can be given.
</para>

<para>
&callgrind; is an extension of &cachegrind; that builds up the call graph of a
program on-the-fly, &ie; how the functions call each other and how many events
happen while running a function. Also, the profile data to be collected can be
separated by threads and call chain contexts. It can provide profiling data on
an instruction level to allow for annotation of disassembled code.
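To illustrate the difference between self (exclusive) and inclusive cost, here
is a minimal Python sketch; it is not code from &callgrind; or &kcachegrind;.
It assumes that, for every call arc, the profile records the number of calls
and the inclusive cost spent in those calls; all function names and numbers are
hypothetical.
<programlisting>
# Hypothetical self (exclusive) cost of each function, e.g. executed instructions.
self_cost = {"main": 10, "parse": 40, "render": 100, "helper": 25}

# Hypothetical call arcs: (caller, callee, number of calls, inclusive cost of these calls).
call_arcs = [
    ("main", "parse", 1, 65),     # parse: 40 self + 25 spent in helper
    ("main", "render", 2, 100),   # render calls nothing, so its inclusive cost equals its self cost
    ("parse", "helper", 5, 25),
]


def inclusive_cost(function):
    """Self cost plus the inclusive cost recorded on every call arc leaving the function."""
    return self_cost[function] + sum(
        cost for caller, _callee, _calls, cost in call_arcs if caller == function
    )


for name in self_cost:
    print(f"{name:8} self={self_cost[name]:4} inclusive={inclusive_cost(name):4}")
</programlisting>
With these numbers, the inclusive cost of <function>main</function> (175) equals
the sum of all self costs, because every other function is only reached through it.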
</para>
</sect1>

<sect1 id="introduction-visualization">
<title>Visualization</title>

<para>
Profiling tools typically produce a large amount of data. The wish to easily
browse down and up the call graph, together with fast switching of the sorting
mode of functions and display of different event types, motivates a &GUI;
application to accomplish this task.
</para>

<para>
&kcachegrind; is a visualization tool for profile data fulfilling these wishes.
Although it was originally written with browsing the data from &cachegrind; and
&calltree; in mind, converters are available to display profile data produced
by other tools. In the appendix, a description of the
&cachegrind;/&callgrind; file format is given.
</para>

<para>
Besides a list of functions sorted according to exclusive or inclusive cost
metrics, and optionally grouped by source file, shared library or C++ class,
&kcachegrind; features various views for a selected function, namely:
<itemizedlist>
<listitem><para>a call-graph view, which shows a section of the call graph
around the selected function,</para>
</listitem>
<listitem><para>a tree-map view, which allows nested-call relations to be
visualized, together with an inclusive cost metric for fast visual detection of
problematic functions,</para>
</listitem>
<listitem><para>source code and disassembler annotation views, allowing you to see
details of cost related to source lines and assembler instructions.</para>
</listitem>
</itemizedlist>

</para>
</sect1>
</chapter>

<chapter id="using-kcachegrind">
<title>Using &kcachegrind;</title>

<sect1 id="using-profile">
<title>Generate Data to Visualize</title>

<para>First, one needs to generate performance data by measuring aspects of the
runtime characteristics of
an application, using a profiling tool. &kcachegrind;
itself does not include any profiling tool, but works well together
with &callgrind;, and, by using a converter, can also be used to visualize data
produced with &oprofile;. Although this manual does not aim to document
profiling with these tools, the next section provides short quickstart tutorials
to get you started.
</para>

<sect2>
<title>&callgrind;</title>

<para>
&callgrind; is a part of <ulink url="http://valgrind.org">&valgrind;</ulink>.
Note that it was previously called &calltree;, but that name was misleading.
</para>

<para>
The most common use is to prefix the command line to start your application with
<userinput><command>valgrind</command> <option>--tool=callgrind</option>
</userinput>, as in:

<blockquote><para><userinput>
<command>valgrind</command> <option>--tool=callgrind</option>
<replaceable>myprogram</replaceable> <replaceable>myargs</replaceable>
</userinput></para></blockquote>

At program termination, a file
<filename>callgrind.out.<replaceable>pid</replaceable></filename> will be
generated, which can be loaded into &kcachegrind;.
</para>

<para>
A more advanced use is to dump out profile data whenever a given function of your
application is called. For example, for &konqueror;, to see profile data only for
the rendering of a Web page, you could decide to dump the data whenever you select
the menu item <menuchoice><guimenu>View</guimenu><guimenuitem>Reload
</guimenuitem></menuchoice>. This corresponds to a call to
<methodname>KonqMainWindow::slotReload</methodname>.
Use:

<blockquote><para><userinput>
<command>valgrind</command> <option>--tool=callgrind</option>
<option>--dump-before=KonqMainWindow::slotReload</option>
<replaceable>konqueror</replaceable>
</userinput></para></blockquote>

This will produce multiple profile data files with an additional sequential
number at the end of the filename. A file without such a number at the end
(only ending in the process PID) will also be produced; by loading this file
into &kcachegrind;, all others are loaded too, and can be seen in the
<guilabel>Parts Overview</guilabel> and <guilabel>Parts</guilabel> list.
</para>

</sect2>

<sect2>
<title>&oprofile;</title>

<para>
&oprofile; is available from <ulink url="http://oprofile.sf.net">its home
page</ulink>. Follow the installation instructions on the Web site, but, before
you do, check whether your distribution already provides it as a package
(as &SuSE; does).
</para>

<para>
System-wide profiling is only permitted to the root user, as all actions on the
system can be observed; therefore, the following has to be done as root.
First, configure the profiling process, using the &GUI;
<command>oprof_start</command> or the command-line tool
<command>opcontrol</command>. The standard configuration should be timer mode
(&TBS;, see introduction).
To start the measurement, run <userinput><command>opcontrol</command>
<option>-s</option></userinput>.
Then run the application you are interested in and, afterwards, run
<userinput><command>opcontrol</command> <option>-d</option></userinput>. This
will write out the measurement results into files under folder <filename
class="directory">/var/lib/oprofile/samples/</filename>.
To be able to visualize the data in &kcachegrind;, run the following in an
empty directory:

<blockquote><para><userinput>
<command>opreport</command> <option>-gdf</option> |
<command>op2callgrind</command>
</userinput></para></blockquote>

This will produce a lot of files, one for every program which was running
on the system. Each one can be loaded into &kcachegrind; on its own.
</para>

</sect2>
</sect1>

<sect1 id="using-basics">
<title>User Interface Basics</title>

<para>
When starting &kcachegrind; with a profile data file as argument, or after
loading one with <menuchoice><guimenu>File</guimenu>
<guimenuitem>Open</guimenuitem></menuchoice>, you will see a navigation panel
containing the function list on the left and, on the right, the main part: an
area with views for a selected function. This view area can be arbitrarily
configured to show multiple views at once.
</para>

<para>
At first start, this area will be
divided into a top and a bottom part, each with different tab-selectable views.
To move views, use the tabs' context menu, and adjust the splitters between
views. To switch quickly between different viewing layouts, use
<menuchoice><shortcut><keycombo action="simul">&Ctrl;<keycap>→</keycap>
</keycombo></shortcut> <guimenu>View</guimenu><guisubmenu>Layout</guisubmenu>
<guimenuitem>Go to Next</guimenuitem></menuchoice> and
<menuchoice><shortcut><keycombo action="simul">&Ctrl;<keycap>←</keycap>
</keycombo></shortcut> <guimenu>View</guimenu><guisubmenu>Layout</guisubmenu>
<guimenuitem>Go to Previous</guimenuitem></menuchoice>.
</para>

<para>
The active event type is important for visualization: for &callgrind;, this is,
for example, cache misses or cycle estimation; for &oprofile;, this is
<quote>Timer</quote> in the simplest case.
You can change the event type via a
combo box in the toolbar or in the <guilabel>Event Type</guilabel> view.
To get a first overview of the runtime characteristics, select function
<function>main</function> in the left list and look at the call graph
view. There, you see the calls occurring in your program. Note that the call
graph view only shows functions with a high event count.
Double-clicking a function in the graph changes the graph to show the called
functions around the selected one.
</para>

<para>
To explore the &GUI; further, in addition to this manual, also have a look at
the documentation section <ulink url="https://kcachegrind.github.io">on the Web
site</ulink>.
Also, every widget in &kcachegrind; has <quote>What's this</quote> help.
</para>
</sect1>

</chapter>


<chapter id="kcachegrind-concepts">
<title>Basic Concepts</title>

<para>This chapter explains some concepts of &kcachegrind;, and introduces
terms used in the interface.
</para>

<sect1 id="concepts-model">
<title>The Data Model for Profile Data</title>

<sect2>
<title>Cost Entities</title>

<para>
Cost counts of event types (like L2 Misses) are attributed to cost entities,
which are items related to the source code or data structures of a given
program. Cost entities can be not only simple code or data positions, but also
position tuples. For example, a call has a source and a target, or a data
address can have a data type and a code position where its allocation happened.
</para>

<para>
The cost entities known to &kcachegrind; are given in the following.
Simple Positions:
<variablelist>
<varlistentry>
<term>Instruction</term>
<listitem><para>
An assembler instruction at a specified address.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Source Line of a Function</term>
<listitem><para>
All instructions that the compiler (via debug information) maps to a given
source line specified by source file name and line number, and which are
executed in the context of some function. The latter is needed because a source
line inside an inlined function can appear in the context of multiple
functions. Instructions without any mapping to an actual source line are mapped
to line number 0 in file <filename>???</filename>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Function</term>
<listitem><para>
All source lines of a given function make up the function itself. A function is
specified by its name and its location in some binary object if available. The
latter is needed because the binary objects of a single program can each hold
functions with the same name (these can be accessed &eg; with
<function>dlopen</function> and <function>dlsym</function>; the runtime linker
resolves functions in a given search order of binary objects used). If a
profiling tool cannot detect the symbol name of a function, &eg; because debug
information is not available, typically either the address of the first executed
instruction or <function>???</function> is used.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Binary Object</term>
<listitem><para>
All functions whose code is inside the range of a given binary object, either
the main executable or a shared library.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Source File</term>
<listitem><para>
All functions whose first instruction is mapped to a line of the given source
file.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Class</term>
<listitem><para>
Symbol names of functions are typically ordered hierarchically in name spaces,
&eg; C++ namespaces, or classes of object-oriented languages; thus, a class can
hold the functions of the class itself as well as those of embedded classes.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Profile Part</term>
<listitem><para>
Some time section of a profile run, with a given thread ID, process ID, and
command line executed.
</para></listitem>
</varlistentry>
</variablelist>
As can be seen from the list, a set of cost entities often defines another cost
entity; thus, there is an inclusion hierarchy of cost entities.
</para>

<para>
Position tuples:
<itemizedlist>
<listitem><para>
Call from instruction address to target function.
</para></listitem>
<listitem><para>
Call from source line to target function.
</para></listitem>
<listitem><para>
Call from source function to target function.
</para></listitem>
<listitem><para>
(Un)conditional jump from source to target instruction.
</para></listitem>
<listitem><para>
(Un)conditional jump from source to target line.
</para></listitem>
</itemizedlist>
Jumps between functions are not allowed, as this makes no sense in a call graph;
thus, constructs like exception handling and long jumps in C have to be
translated to popping the call stack as needed.
</para>

</sect2>


<sect2>
<title>Event Types</title>

<para>
Arbitrary event types can be specified in the profile data by giving them a
name. Their cost related to a cost entity is a 64-bit integer.
</para>
<para>
Event types whose costs are specified in a profile data file are called real
events.
Additionally, one can specify event types that are calculated from real events
by formulas; these are called inherited events.
</para>
</sect2>

</sect1>

<sect1 id="concepts-state">
<title>Visualization State</title>

<para>
The visualization state of a &kcachegrind; window includes:
<itemizedlist>
<listitem><para>
the primary and secondary event type chosen for display,
</para></listitem>
<listitem><para>
the function grouping (used in the <guilabel>Function Profile</guilabel> list
and entity coloring),
</para></listitem>
<listitem><para>
the profile parts whose costs are to be included in visualization,
</para></listitem>
<listitem><para>
an active cost entity (&eg; a function selected from the function profile
sidedock),
</para></listitem>
<listitem><para>
a selected cost entity.
</para></listitem>
</itemizedlist>
This state influences the views.
</para>

<para>
Views are always shown for one cost entity, the active one. When a given view
is inappropriate for a cost entity, it is disabled: when selecting &eg; an &ELF;
object in the group list, source annotation makes no sense.
</para>

<para>
For example, for an active function, the callee list shows all the functions
called from the active one: one can select one of these functions without making
it active. Also, if the call graph is shown alongside, it will automatically
select the same function.
</para>

</sect1>

<sect1 id="concepts-guiparts">
<title>Parts of the &GUI;</title>

<sect2>
<title>Sidedocks</title>
<para>
Sidedocks are side windows which can be placed at any border of a &kcachegrind;
window. They always contain a list of cost entities sorted in some way.
<itemizedlist>
<listitem><para>
The <guilabel>Function Profile</guilabel> is a list of functions showing
inclusive and exclusive cost, call count, name and position of functions.
</para></listitem>
<listitem><para>
<guilabel>Parts Overview</guilabel>
</para></listitem>
<listitem><para>
<guilabel>Call Stack</guilabel>
</para></listitem>
</itemizedlist>
</para>
</sect2>

<sect2>
<title>View Area</title>
<para>
The view area, typically the right part of a &kcachegrind; main window, is made
up of one (default) or more tabs, lined up either horizontally or vertically.
Each tab holds different views of only one cost entity at a time.
The name of this entity is given at the top of the tab. If there are multiple
tabs, only one is active. The entity name in the active tab is shown in bold,
and determines the active cost entity of the &kcachegrind; window.
</para>
</sect2>

<sect2>
<title>Areas of a Tab</title>
<para>
Each tab can hold up to four view areas, namely Top, Right, Left, and Bottom.
Each area can hold multiple stacked views. The visible part of an area is
selected by a tab bar. The tab bars of the top and right area are at the top;
the tab bars of the left and bottom area are at the bottom. You can specify
which kind of view should go into which area by using the tabs' context menus.
</para>
</sect2>

<sect2>
<title>Synchronized View with Selected Entity in a Tab</title>
<para>
Besides an active entity, each tab has a selected entity. As most view types
show multiple entities with the active one somehow centered, you can change
the selected item by navigating inside a view (by clicking with the mouse
or using the keyboard). Typically, selected items are shown in a highlighted
state.
By changing the selected entity in one of the views of a tab, all other
views highlight the newly selected entity accordingly.
</para>
</sect2>

<sect2>
<title>Synchronization between Tabs</title>
<para>
If there are multiple tabs, a selection change in one tab leads to an activation
change in the next tab, whether that tab is to the right of the former or below
it. This kind of linkage should, for example, allow for fast browsing in call
graphs.
</para>
</sect2>

<sect2>
<title>Layouts</title>
<para>
The layout of all the tabs of a window can be saved (<menuchoice><guimenu>View
</guimenu><guisubmenu>Layout</guisubmenu></menuchoice>). After duplicating the
current layout (<menuchoice><shortcut><keycombo action="simul">&Ctrl;
<keycap>+</keycap></keycombo></shortcut> <guimenu>View</guimenu>
<guisubmenu>Layout</guisubmenu><guimenuitem>Duplicate</guimenuitem>
</menuchoice>)
and changing some sizes or moving a view to another area of a tab, you can
quickly switch between the old and the new layout via <keycombo action="simul">
&Ctrl;<keycap>←</keycap></keycombo> and <keycombo action="simul">&Ctrl;
<keycap>→</keycap></keycombo>. The set of layouts will be stored between
&kcachegrind; sessions of the same profiled command. You can make the current
set of layouts the default one for new &kcachegrind; sessions, or restore the
default layout set.
</para>
</sect2>
</sect1>

<sect1 id="concepts-sidedocks">
<title>Sidedocks</title>

<sect2>
<title>Flat Profile</title>
<para>
The <guilabel>Flat Profile</guilabel> contains a group list and a function list.
The group list contains all groups in which cost is spent, depending on the
chosen group type. The group list is hidden when grouping is switched off.
</para>
<para>
The function list contains the functions of the selected group (or all functions
if grouping is switched off), ordered by some column, &eg; inclusive or self
costs spent therein. There is a maximum number of functions shown in the list,
configurable in <menuchoice><guimenu>Settings</guimenu><guimenuitem>Configure
KCachegrind</guimenuitem></menuchoice>.
</para>
</sect2>

<sect2>
<title>Parts Overview</title>
<para>
In a profile run, multiple profile data files can be produced, which can be
loaded together into &kcachegrind;. The <guilabel>Parts Overview</guilabel>
sidedock shows these, ordered horizontally according to creation time; the
rectangle sizes are proportional to the cost spent in each part. You can select
one or several parts to constrain the costs shown in the other &kcachegrind;
views to these parts only.
</para>
<para>
The parts are further subdivided in one of two modes, a partitioning mode and an
inclusive cost split mode:
<variablelist>
<varlistentry>
<term><guilabel>Partitioning Mode</guilabel></term>
<listitem><para>
The partitioning is shown in groups for a profile data part, according to the
group type selected. For example, if &ELF; object groups are selected, you see
colored rectangles for each used &ELF; object (shared library or executable),
sized according to the cost spent therein.
</para></listitem>
</varlistentry>
<varlistentry>
<term><guilabel>Diagram Mode</guilabel></term>
<listitem><para>
A rectangle showing the inclusive cost of the current active function in the
part is shown. This, again, is split up to show the inclusive costs of its
callees.
</para></listitem>
</varlistentry>
</variablelist>
</para>
</sect2>

<sect2>
<title>Call Stack</title>
<para>
This is a purely fictional <quote>most probable</quote> call stack. It is built
by starting with the current active function and adding the callers with the
highest cost at the top and the callees with the highest cost at the bottom.
</para>
<para>
The <guilabel>Cost</guilabel> and <guilabel>Calls</guilabel> columns show the
cost used for all calls from the function in the line above.
</para>
</sect2>
</sect1>

<sect1 id="concepts-views">
<title>Views</title>

<sect2>
<title>Event Type</title>
<para>
The <guilabel>Event Type</guilabel> list shows all cost types available and the
corresponding self and inclusive cost of the current active function for that
event type.
</para>
<para>
By choosing an event type from the list, you change the type of costs shown all
over &kcachegrind; to the selected one.
</para>
</sect2>

<sect2>
<title>Call Lists</title>
<para>
These lists show calls to and from the current active function.
<guilabel>All Callers</guilabel> and <guilabel>All Callees</guilabel> refer to
those functions reachable in the caller and callee direction, even when other
functions are in between.
</para>

<para>
Call list views include:
<itemizedlist>
<listitem><para>Direct <guilabel>Callers</guilabel></para></listitem>
<listitem><para>Direct <guilabel>Callees</guilabel></para></listitem>
<listitem><para><guilabel>All Callers</guilabel></para></listitem>
<listitem><para><guilabel>All Callees</guilabel></para></listitem>
</itemizedlist>
</para>
</sect2>

<sect2>
<title>Maps</title>
<para>
A treemap view of the primary event type, up or down the call
hierarchy.
Each colored rectangle represents a function; its size is
approximately proportional to the cost spent therein while the active function
is running (however, there are drawing constraints).
</para>
<para>
For the <guilabel>Caller Map</guilabel>, the map shows the nested hierarchy of
all callers of the currently activated function; for the <guilabel>Callee
Map</guilabel>, it shows that of all callees.
</para>
<para>
Appearance options can be found in the context menu. To get exact size
proportions, choose <guimenuitem>Skip Incorrect Borders</guimenuitem>. As this
mode can be very time-consuming, you may want to limit the maximum drawn
nesting level first. <guilabel>Best</guilabel> determines the split direction
for children from the aspect ratio of the parent. <guilabel>Always
Best</guilabel> decides the split direction based on the remaining space for
each sibling. <guilabel>Ignore Proportions</guilabel> reserves space for
drawing the function name before drawing the children; note that with this
option, size proportions can be badly off.
</para>
<para>
For keyboard navigation, use the left and right arrow keys to move between
siblings, and the up and down arrow keys to go one nesting level up or down.
&Enter; activates the current item.
</para>
</sect2>

<sect2>
<title>Call Graph</title>
<para>
This view shows the call graph around the active function. The cost shown is
only the cost spent while the active function was actually running; &ie; the
cost shown for <function>main()</function> (if it is visible) should be the same
as the cost of the active function, because that is the part of the inclusive
cost of <function>main()</function> spent while the active function was running.
</para>
<para>
In cycles, blue call arrows indicate artificial calls that never actually
happened; they are added so that the graph can be drawn correctly.
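To make the cost attribution rule for <function>main()</function> described
above more concrete, here is a minimal Python sketch. It assumes a simplified,
hypothetical profile model in which each sample is a complete call stack with
an event count; this is not the format that &callgrind; or &cachegrind;
actually write, and the names <varname>SAMPLES</varname>,
<function>inclusive_cost()</function> and <function>cost_while_active()</function>
are invented for illustration only.
<programlisting language="python"># Simplified, hypothetical profile model (not Callgrind's file format):
# each sample is (call stack, event count), with the stack listed from
# the outermost function to the innermost one.
SAMPLES = [
    (("main", "parse", "read"), 40),
    (("main", "parse"), 10),
    (("main", "compute", "read"), 30),
    (("main", "compute"), 20),
]


def inclusive_cost(func, samples):
    """Events counted while func was anywhere on the call stack."""
    return sum(cost for stack, cost in samples if func in stack)


def cost_while_active(func, active, samples):
    """Part of func's inclusive cost spent while `active` was running."""
    return sum(cost for stack, cost in samples
               if func in stack and active in stack)


if __name__ == "__main__":
    active = "parse"
    for name in ("main", "parse", "read", "compute"):
        print(name, cost_while_active(name, active, SAMPLES))
</programlisting>
With <function>parse()</function> as the active function, the sketch prints a
cost of 50 for both <function>main()</function> and <function>parse()</function>,
40 for <function>read()</function> and 0 for <function>compute()</function>. In
this model, the cost shown for <function>main()</function> equals the inclusive
cost of the active function because <function>main()</function> is on every
call stack.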
</para>
<para>
If the graph is larger than the drawing area, a bird's-eye view is shown on one
side. There are view options similar to those of the call maps; the selected
function is highlighted.
</para>
</sect2>

<sect2>
<title>Annotations</title>
<para>
The annotated source and assembler lists show the source lines or disassembled
instructions of the currently active function, together with the (self) cost
spent executing each source line or instruction. Where a call happened, lines
with details on the call are inserted: the (inclusive) cost spent inside the
call, the number of calls, and the call destination.
</para>
<para>
Select such a call information line to activate the call destination.
</para>
</sect2>
</sect1>

</chapter>


<chapter id="commands">
<title>Command Reference</title>

<sect1 id="kcachegrind-mainwindow">
<title>The main &kcachegrind; window</title>

<sect2>
<title>The File Menu</title>
<para>
<variablelist>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo>&Ctrl;<keycap>N</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>New</guimenuitem>
</menuchoice></term>
<listitem><para>
<action>Opens an empty top-level window</action> in which you can
load profile data. This action is rarely needed, as <menuchoice>
<guimenu>File</guimenu><guimenuitem>Open</guimenuitem></menuchoice> opens a
new top-level window if the current one already shows some data.
</para></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo>&Ctrl;<keycap>O</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>Open</guimenuitem>
</menuchoice></term>
<listitem>
<para>
<action>Pops up the &kde; file selector</action> to choose a
profile data file to be loaded. If some data is already shown in the current
top-level window, this opens a new window; to open additional profile data in
the current window, use <menuchoice>
<guimenu>File</guimenu><guimenuitem>Add</guimenuitem></menuchoice>.
</para>
<para>
The names of profile data files usually end in <literal role="extension"
>.<replaceable>pid</replaceable>.<replaceable>part</replaceable>-<replaceable
>threadID</replaceable></literal>, where <replaceable>part</replaceable> and
<replaceable>threadID</replaceable> are optional. The <replaceable>pid</replaceable>
and <replaceable>part</replaceable> components identify multiple profile data
files belonging to one application run.
When you load a file whose name ends only in <literal role="extension"><replaceable
>pid</replaceable></literal>, any existing data files for this run with
additional endings are loaded as well.
</para>
<informalexample><para>
If the profile data files <filename>cachegrind.out.123</filename> and
<filename>cachegrind.out.123.1</filename> exist, loading the first one
automatically loads the second one too.
</para></informalexample></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<guimenu>File</guimenu><guimenuitem>Add</guimenuitem>
</menuchoice></term>
<listitem><para>
<action>Adds a profile data file</action> to the current window.
This lets you force multiple data files to be loaded into the same top-level
window even if they do not belong to the same run according to the profile
data file naming convention. This is useful, for example, for side-by-side
comparison.
</para></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo><keycap>F5</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>Reload</guimenuitem>
</menuchoice></term>
<listitem><para>
<action>Reloads the profile data</action>. This is useful when another profile
data file has been generated for an already loaded application run.
</para></listitem>
</varlistentry>

<varlistentry>
<term><menuchoice>
<shortcut>
<keycombo>&Ctrl;<keycap>Q</keycap></keycombo>
</shortcut>
<guimenu>File</guimenu><guimenuitem>Quit</guimenuitem>
</menuchoice></term>
<listitem><para><action>Quits</action> &kcachegrind;.</para></listitem>
</varlistentry>
</variablelist>
</para>

</sect2>

</sect1>
</chapter>

<chapter id="faq">
<title>Questions and Answers</title>

<qandaset id="faqlist">


<qandaentry>
<question>
<para>
What is &kcachegrind; for? I have no idea.
</para>
</question>
<answer>
<para>
&kcachegrind; is helpful at a late stage of software development called
profiling. If you don't develop applications, you don't need &kcachegrind;.
</para>
</answer>
</qandaentry>

<qandaentry>
<question>
<para>
What is the difference between <guilabel>Incl.</guilabel> and
<guilabel>Self</guilabel>?
</para>
</question>
<answer>
<para>These are cost attributes of functions for a given event type.
As functions can call each other, it makes sense to distinguish between the
cost of the function itself (<quote>Self Cost</quote>) and the cost including
all called functions (<quote>Inclusive Cost</quote>). <quote>Self</quote> cost
is sometimes also referred to as <quote>Exclusive</quote> cost.
</para>
<para>
So, for example, <function>main()</function> always has an inclusive cost of
almost 100%, whereas its self cost is negligible when the real work is done in
other functions.
</para>
</answer>
</qandaentry>

<qandaentry>
<question>
<para>
If I double-click on a function further down in the <guilabel>Call Graph</guilabel>
view, it shows the same cost for <function>main()</function> as for the
selected function. Isn't this supposed to be constant at 100%?
</para>
</question>
<answer>
<para>
You have activated a function below <function>main()</function>, which obviously
costs less than <function>main()</function> itself. For every function, only
the part of its cost spent while the <emphasis>activated</emphasis> function is
running is shown; that is, the cost shown for any function can never be higher
than the cost of the activated function.
</para>
</answer>
</qandaentry>


</qandaset>
</chapter>


<glossary>

<glossentry id="costentity">
<glossterm>Cost Entity</glossterm>
<glossdef><para>An abstract item related to source code to which event counts
can be attributed.
Dimensions for cost entities are code location (&eg; source
line, function), data location (&eg; accessed data type, data object), execution
location (&eg; thread, process), and pairs or triples of the aforementioned
locations (&eg; calls, object access from statement, evicted data from
cache).</para></glossdef>
</glossentry>

<glossentry id="eventcosts">
<glossterm>Event Costs</glossterm>
<glossdef><para>The sum of events of a given event type that occur while
execution is related to a given cost entity. This cost is attributed to the
entity.</para></glossdef>
</glossentry>

<glossentry id="eventtype">
<glossterm>Event Type</glossterm>
<glossdef><para>The kind of event whose costs can be attributed to a cost
entity. There are real event types and inherited event types.</para></glossdef>
</glossentry>

<glossentry id="inheritedeventtype">
<glossterm>Inherited Event Type</glossterm>
<glossdef><para>A virtual event type, visible only in the view, that is defined
by a formula calculated from real event types.</para></glossdef>
</glossentry>

<glossentry id="profiledatafile">
<glossterm>Profile Data File</glossterm>
<glossdef><para>A file containing data measured in a profile experiment (or part
of one), or produced by post-processing a trace.
Its size typically grows linearly
with the code size of the program.</para></glossdef>
</glossentry>

<glossentry id="profiledatapart">
<glossterm>Profile Data Part</glossterm>
<glossdef><para>Data from a profile data file.</para></glossdef>
</glossentry>

<glossentry id="profileexperiment">
<glossterm>Profile Experiment</glossterm>
<glossdef><para>A program run supervised by a profiling tool, possibly producing
multiple profile data files from parts or threads of the run.</para></glossdef>
</glossentry>

<glossentry id="profileproject">
<glossterm>Profile Project</glossterm>
<glossdef><para>A configuration for profile experiments on one program, possibly
in multiple versions. Comparing profile data typically only makes sense between
data produced in experiments of the same profile project.</para></glossdef>
</glossentry>

<glossentry id="profiling">
<glossterm>Profiling</glossterm>
<glossdef><para>The process of collecting statistical information about runtime
characteristics of program runs.</para></glossdef>
</glossentry>

<glossentry id="realeventtype">
<glossterm>Real Event Type</glossterm>
<glossdef><para>An event type that can be measured by a tool. This requires the
existence of a sensor for the given event type.</para></glossdef>
</glossentry>

<glossentry id="trace">
<glossterm>Trace</glossterm>
<glossdef><para>A sequence of timestamped events that occurred while tracing a
program run.
Its size typically grows linearly with the execution time of the program
run.</para></glossdef>
</glossentry>

<glossentry id="tracepart">
<glossterm>Trace Part</glossterm>
<glosssee otherterm="profiledatapart"/>
</glossentry>

<glossentry id="tracing">
<glossterm>Tracing</glossterm>
<glossdef><para>The process of supervising a program run and storing its events,
sorted by timestamp, in an output file called the trace.</para></glossdef>
</glossentry>

</glossary>

<chapter id="credits">

<title>Credits and License</title>

<para>
Thanks to Julian Seward for his excellent &valgrind;, and Nicholas Nethercote
for the &cachegrind; addition. Without these programs, &kcachegrind; would not
exist. Some ideas for this &GUI; also came from them.
</para>
<para>
Thanks to all the users who sent bug reports and suggestions.
</para>

<!-- TRANS:CREDIT_FOR_TRANSLATORS -->
&underFDL; <!-- FDL License -->

</chapter>

&documentation.index;
</book>