doc/user/combined.docbook

0001 <?xml version="1.0" encoding="UTF-8"?>
0002 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
0003  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
0004
0005 <chapter id="ch-combined">
0006 <title>Combined Arms Tactics</title>
0007
0008 <para>While each particular PO processing tool from Pology and other packages may be documented in itself, it may not always be obvious how to use these tools together. This chapter presents some scenarios where combined tool usage may increase the quality and efficiency of daily work on translation.</para>
0009
0010 <!-- ======================================== -->
0011 <sect1 id="sec-cbcompend">
0012 <title>Creating and Using PO Compendia</title>
0013
0014 <para>A <emphasis>PO compendium</emphasis> is simply a PO file which aggregates messages from many other normal PO files, usually all same-language PO files in a given translation project. It may aggregate only the messages currently present in project PO files, but also messages that were present once and are no longer. As such, the compendium can be regarded as an instance of a <emphasis>translation memory</emphasis>. This section explains how to create, update, and apply such a translation memory.</para>
0015
0016 <sect2 id="sec-cbcomptm">
0017 <title>Why Translation Memory?</title>
0018
0019 <para>Imagine that the translator wants to start translating a PO file that was so far never translated, but which has content similar to some other, translated PO files. Perhaps it was even derived from those other PO files, by merging, splitting, etc. This means that many messages in the present PO file may have been translated already in some other PO file, or at least that very similar translated messages exist in other PO files. Since the translation memory (TM) contains all known translated messages, it can be used to automatically produce translated and fuzzy messages in the present PO file, significantly reducing translation effort. Matching against the TM can be performed either as the translator goes from message to message in the editor (if the editor has a TM feature), or at once for all messages (by a specialized command) before starting to go through messages in the editor.</para>
0020
0021 <para>In most non-PO based translation workflows, translation memories are crucial for efficiency. This is because most non-PO formats have no concept of merging with templates. Each new revision of the source material results in (an equivalent of) entirely empty translation files, and it is the translator's duty to somehow bring old translations into the new context. A carefully maintained TM, with a corresponding matching tool, is the foremost way to do this.</para>
0022
0023 <para>In a PO-based translation workflow, merging with templates already provides most of what TM is essential for. In effect, the old PO file that is being merged can be considered as a TM for the new PO file that will become based on the new template. Even when PO files are renamed, merged, or split, if that is properly done, no translations will be lost. A TM for PO files is therefore useful mostly to smooth out glitches in translation maintenance procedures (e.g. a PO file improperly split).<footnote>
0024 <para>To be sure, some short messages can be quite similar in many unrelated PO files. But having TM matches only on such messages will result in very small time savings, if measurable at all.</para>
0025 </footnote> Nevertheless, having a well maintained TM in the form of a PO compendium cannot hurt, while providing for the (hopefully) rare situations where TM matching is actually needed.</para>
0026
0027 <para>Many dedicated PO editors will automatically maintain an internal TM, usually in a database format, into which they will scoop messages from all PO files that were opened in them. However, in a team environment, these internal TMs are inferior to a PO compendium. For one, different translators will have different TMs; a translator may start to work on a file for which there are TM matches in another translator's internal TM. Internal TMs may be volatile, for example corrupted due to an editor bug, or perish during system maintenance. There is no control over which messages are scooped by the editor, and how they are treated (e.g. which message parts are being ignored).</para>
0028
0029 <para>On the other hand, a PO compendium can be maintained in a central place, and, being a PO file in itself, kept in version control just like all other PO files. In this way, all translators have fast access to a unified TM, which is secured from accidental corruption. Tight control over which messages are collected and how they are collected can be exercised in the script which is written to update the compendium. This script can be made to run periodically, and to automatically commit the updated compendium to the version control repository.</para>
0030
0031 </sect2>
0032
0033 <sect2 id="sec-cmcompcr">
0034 <title>Maintaining a Centralized PO Compendium</title>
0035
0036 <para>As a first attempt, the PO compendium can be created simply by concatenating all PO files in the project into one called <filename>compendium.po</filename>, using <command>msgcat</command>. If PO files are organized by language (all PO files of a given language kept in the directory of that language), then the concatenation command would be:
0037 <programlisting language="bash">
0038 $ cd $LANGDIR
0039 $ find -iname \*.po | xargs msgcat -o compendium.po
0040 </programlisting>
0041 Unfortunately, a compendium created in this way has a number of drawbacks:
0042 <itemizedlist>
0043
0044 <listitem>
0045 <para>Aside from translated messages, the compendium will also contain untranslated and fuzzy messages. While untranslated messages are obviously dead weight, a case could be made for taking in fuzzy messages. But in light of the suggested usage of the compendium in the following section, fuzzy messages too should be ignored.</para>
0046 </listitem>
0047
0048 <listitem>
0049 <para>Messages in the compendium will contain all parts as normal messages do. Some of these parts (such as source references) are unnecessary, since they will be ignored when applying the compendium later. Other than increasing the size of the compendium, another problem with these parts is that changes in them will cause unnecessary version control differences, so they should be stripped from the compendium.</para>
0050 </listitem>
0051
0052 <listitem>
0053 <para>Messages will be ordered as they are seen in concatenated PO files. The ordering of messages in the compendium is also of no importance for application. But, any changes in message ordering between two compendium updates will cause unnecessary version control differences, so it is best to sort messages by their keys (<varname>msgid</varname> and <varname>msgctxt</varname> fields).</para>
0054 </listitem>
0055
0056 <listitem>
0057 <para>When two or more PO files contain the same message by key (<varname>msgid</varname> and <varname>msgctxt</varname>) but with different translations (due to context), such as:
0058 <programlisting language="po">
0059 msgid "Open File"
0060 msgstr "Otvori datoteku"
0061 ⁠
0062 msgid "Open File"
0063 msgstr "Otvaranje datoteke"
0064 </programlisting>
0065 <command>msgcat</command> will <emphasis>aggregate</emphasis> translations (and translator comments if any) in the compendium message, and make it fuzzy:
0066 <programlisting language="po">
0067 #, fuzzy
0068 msgid "Open File"
0069 msgstr ""
0070 "#-#-#-#-#  alpha.po (alpha-1.2.9)  #-#-#-#-#\n"
0071 "Otvori datoteku\n"
0072 "#-#-#-#-#  bravo.po (bravo-0.8.12)  #-#-#-#-#\n"
0073 "Otvaranje datoteke"
0074 </programlisting>
0075 Since the context should be double-checked anyway when applying the compendium later (especially for short messages), it is better to instead pick one of the translations and have a normal translated compendium message. If each translation appears only once, then it does not matter which is picked; but if one translation appears 10 times and the other once, clearly the former should be picked. That is, the most frequent translation should be picked.</para>
0076 </listitem>
0077
0078 <listitem>
0079 <para>The PO header is treated in the same way as messages by <command>msgcat</command>: since all headers have equal <varname>msgid</varname> field (empty), their <varname>msgstr</varname> fields will be aggregated. This too is just dead weight since the header is not used in applications of the compendium. Instead, a brief and informative header should be explicitly set (mentioning that this is a compendium PO file, for which project and language, etc).</para>
0080 </listitem>
0081
0082 <listitem>
0083 <para>In some translation projects, PO files frequently contain <emphasis>meta-messages</emphasis>, such as those where translators can add their names and contact addresses. These messages have the same key (<varname>msgid</varname>) in all PO files, but should in general be translated differently, the more so the more people there are in the translation team. So it may be better to omit such messages from the compendium.</para>
0084 </listitem>
0085
0086 </itemizedlist>
0087 It must be noted that none of these problems are an actual deficiency of <command>msgcat</command> itself. Since its function is general concatenation of PO files, it cannot make any of the assumptions necessary for the present application. Instead, <command>msgcat</command> should be used as a part of a wider script, in which the necessary additional processing happens, tailored to the particular translation project and translation team.</para>
0088
0089 <para>Let us assume the following layout of the top directory for the translation project <literal>foo</literal> and translation team (language) <literal>nn</literal>:
0090 <programlisting>
0091 foo-nn/
0092     ui/
0093         alpha.po
0094         bravo.po
0095         ...
0096     doc/
0097         alpha.po
0098         bravo.po
0099         ...
0100     update-compendium-foo-nn.sh
0101     compendium-foo-nn.po
0102 </programlisting>
0103 <filename>update-compendium-foo-nn.sh</filename> will be the script to create or update the compendium, <filename>compendium-foo-nn.po</filename> the compendium itself. It helps clarity to add the project name and language into names of these two files, because both are tailored to that project and that language. Taking into account the aforementioned drawbacks of a simple compendium made by <command>msgcat</command> and the suggested resolutions, <filename>update-compendium-foo-nn.sh</filename> could look like this<footnote>
0104 <para>At one point, the script creates a temporary PO file for each original PO file, and then calls <command>msgcat</command> on these temporary files to create the first, raw compendium. These temporary files have fuzzy and untranslated messages removed, and some other adjustments made, before concatenation. One could think that all these adjustments could instead be done on the raw compendium. The problem is that then there would be no unambiguous way to tell which fuzzy messages in the raw compendium were fuzzy to begin with, and which were made fuzzy by <command>msgcat</command> due to aggregation of translations. With fuzzy messages removed prior to concatenation, in the de-aggregation by frequency that follows it is known that the messages with fuzzy flags are exactly those that were aggregated.</para>
0105 </footnote>:
0106 <programlisting language="bash">
0107 #!/bin/sh
0108 #
0109 # Create the PO compendium of Foo in Nevernissian language.
0110 #
0111 # Usage:
0112 #   update-compendium-foo-nn.sh [trim]
0113 #
0114 # The script can be called from anywhere, because PO paths are
0115 # hardcoded within the script relative to its own location.
0116 # If the 'trim' argument is not given (i.e. script is called
0117 # without arguments), messages in the old compendium that are
0118 # no longer found in project PO files are preserved in
0119 # the new compendium; if 'trim' is given, they are removed.
0120
0121 # Directory where this script resides.
0122 cmddir=`dirname $0`
0123 # Paths of directories containing PO files, space-separated.
0124 # (Make sure the compendium itself is not in here!)
0125 podirs="$cmddir/ui $cmddir/doc"
0126 # Path to the compendium.
0127 comppo="$cmddir/compendium-foo-nn.po"
0128
0129 trim=$1
0130
0131 # If there is already a compendium, preserve it for later.
0132 test -f $comppo &amp;&amp; mv $comppo $comppo.old
0133
0134 # Collect PO files from given paths into a file.
0135 find $podirs -iname \*.po | sort > polist
0136
0137 # Pre-process PO files in the project, creating temporary
0138 # PO files named *.po.tmpcomp:
0139 # - remove fuzzy and untranslated messages
0140 # - declare obsolete messages non-obsolete
0141 # - remove extracted comments, source references, flags
0142 for pofile in `cat polist`; do
0143     msgattrib $pofile \
0144         --translated --no-fuzzy --clear-obsolete --force-po \
0145     | grep -v '^#[:.,]' > $pofile.tmpcomp
0146 done
0147 # Update file list to contain temporary PO files.
0148 sed -i "s/$/.tmpcomp/" polist
0149
0150 # Reduce headers of temporary PO files to necessary minimum,
0151 # proper header for the compendium will be added later.
0152 posieve -q set-header -f polist \
0153     -srmallcomm \
0154     -sremoverx:'^(?!MIME-Version$|Content-Type$|Content-Transfer-Encoding$)'
0155
0156 # Create raw compendium from temporary PO files:
0157 # - aggregate translations for repeated messages
0158 # - sort messages by key
0159 msgcat --sort-output --force-po -f polist -o $comppo
0160
0161 # Clean up temporary PO files and file list.
0162 cat polist | xargs rm
0163 rm polist
0164
0165 # Resolve aggregated messages to most frequent variant.
0166 # It is safe to unfuzzy resolved messages, since at
0167 # this point it is assured that only translated messages
0168 # have been aggregated.
0169 posieve -q resolve-aggregates $comppo -sunfuzzy
0170
0171 # Remove meta-messages which are found in many PO files but
0172 # should in general be differently translated in each.
0173 msggrep -v $comppo -o $comppo \
0174     -JFe 'NAME OF TRANSLATORS' \
0175     -JFe 'EMAIL OF TRANSLATORS' \
0176     -JFe 'ROLES_OF_TRANSLATORS' \
0177     -JFe 'CREDIT_FOR_TRANSLATORS' \
0178
0179 # Set the compendium header.
0180 # Use current date as revision date.
0181 dtnow=`date '+%Y-%m-%d %H:%M%z'`
0182 posieve -q set-header $comppo -screate \
0183     -stitle:'Compendium of Foo translation into Nevernissian.' \
0184     -sfield:'Project-Id-Version:compendium-foo-nn' \
0185     -sfield:"PO-Revision-Date:$dtnow" \
0186     -sfield:'Last-Translator:Simulacrum' \
0187     -sfield:'Language-Team:Nevernissian &lt;l10n-nn@neverwhere.org&gt;' \
0188     -sfield:'Language:nn' \
0189     -sfield:'Plural-Forms:nplurals=9; plural=n==1 ? ...' \
0190
0191 # If the old compendium was preserved, add it to the new compendium
0192 # in order to retain messages no longer found in the project
0193 # (unless trimming was requested).
0194 if test -f $comppo.old &amp;&amp; test x"$trim" != xtrim; then
0195     msgcat --use-first --sort-output $comppo $comppo.old -o $comppo
0196     # ...old compendium must be the second argument, in order
0197     # not to override possibly updated translations of
0198     # existing messages in the project.
0199 fi
0200
0201 # Test if new compendium is different from the old, with
0202 # the exception of revision date. If they are the same,
0203 # discard the new compendium.
0204 if test -f $comppo.old; then
0205     for cpfile in $comppo $comppo.old; do
0206         grep -v '^"PO-Revision-Date:.*\\n"$' $cpfile >$cpfile.nrd
0207     done
0208     if cmp -s $comppo.nrd $comppo.old.nrd; then
0209         mv $comppo.old $comppo
0210     else
0211         rm $comppo.old
0212     fi
0213     rm $comppo.nrd $comppo.old.nrd
0214 fi
0215
0216 # Canonically wrap the compendium.
0217 msgcat $comppo -o $comppo
0218
0219 # All done.
0220 </programlisting>
0221 This script should be called periodically to update the compendium, and the updated file committed, so that all translators automatically get it when they update their local repository copies. If after some (long) time the compendium becomes too big due to accumulation of old messages, running the script once with the <literal>trim</literal> argument will cause all old messages to be dropped.</para>
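<para>The periodic update and commit can be combined into a small wrapper. The following is only a sketch, assuming that the project is kept in a Git working copy and that the compendium is already tracked there; the function name <literal>update_and_commit</literal> and the commit message are hypothetical, and other version control systems will need different commands:
<programlisting language="bash">
# Hypothetical helper: refresh the compendium in the given top
# directory and commit it if its content has changed.
# Assumes Git; the compendium must already be under version control.
update_and_commit () {
    topdir=$1
    comppo=compendium-foo-nn.po

    # Rebuild the compendium (old messages are preserved, no trimming).
    "$topdir/update-compendium-foo-nn.sh" || return 1

    # 'git diff --quiet' exits with non-zero status if the file changed.
    if ! git -C "$topdir" diff --quiet -- "$comppo"; then
        git -C "$topdir" commit -q -m "Update PO compendium." -- "$comppo"
    fi
}
</programlisting>
The function could then be called with the path to <filename>foo-nn/</filename> from a scheduled job (e.g. cron). Since the update script restores the old compendium when only the revision date would differ, no commit is made on runs without real changes.</para>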
0222
0223 </sect2>
0224
0225 <sect2 id="sec-cmcompuse">
0226 <title>Applying the PO Compendium</title>
0227
0228 <para>Translators who use a dedicated PO editor with internal TM should configure the editor to read the compendium into the internal TM. This may be done, for example, by including the compendium PO file (or the directory in which it resides) into the editor's translation project paths. If the compendium is kept under version control, the editor should automatically update its internal TM from the compendium whenever the repository is updated and the editor started again. In this way, the editor's internal TM becomes transient in nature, and there is no problem if it gets corrupted or deleted.</para>
0229
0230 <para>When working on a particular PO file with a properly configured PO editor, as the translator jumps from one incomplete (untranslated or fuzzy) message to another, whenever the message is similar to one or a few messages in the compendium (i.e. in the internal TM), the editor will "offer" those similar messages in some way. Ideally, for each similar message the editor should show not only the possible translation, but also the difference between the two original texts (that of the current message and that of the TM match). This will allow the translator to quickly see how the offered translation should be adapted to fit the current original.</para>
0231
0232 <para>Dedicated PO editors may also offer <emphasis>batch</emphasis> application of the TM. This means that when the PO file is opened, the translator executes a command which fills in all untranslated messages with matches from the TM, making some translated (on exact matches) and some fuzzy (on partial matches). However, simpleminded <emphasis>batch application of the TM should be considered dangerous</emphasis>. For one, an exact match on the original text does not guarantee that the same translation fits; short messages in particular frequently need different translations depending on context. But the translator will simply jump over each batch-translated message and fail to see this. The other problem comes up if the material in the compendium is not sufficiently reviewed, in which case every match from the TM, even on long messages, should be at least casually reviewed by the translator. Thus, if there is no way to configure batch application to be less indiscriminate, it is best to avoid it altogether, or else the quality of translation may suffer.</para>
0233
0234 <para>Translators who use a general text editor to work on PO files can still make use of the compendium. One option could be merging the PO file with its template in presence of the compendium, just before starting to work on it:
0235 <programlisting language="bash">
0236 $ msgmerge alpha.po alpha.pot -C compendium.po --update --previous
0237 </programlisting>
0238 The <option>-C</option> option to <command>msgmerge</command> specifies the compendium from which to draw exact and partial matches, when there is no match in the PO file itself. This option can be repeated to add several compendia. The <option>--update</option> option is to modify the PO file in place, rather than writing the merged PO file to standard output. The <option>--previous</option> option is to get previous fields (<literal>#| ...</literal> comments) on fuzzy messages. Unfortunately, this method is a command line version of the batch application of the TM in a dedicated PO editor, and suffers from the same problem of indiscriminate exact matches that the translator will later fail to check. Therefore it should not be used (at least not for general translation).</para>
0239
0240 <para>Fortunately, Pology provides the <command>poselfmerge</command> command, which is a wrapper around <command>msgmerge</command>, and has several options to mitigate the indiscriminacy problem of batch application of the TM. To avoid silent exact matches on short messages, the <option>-W</option>/<option>--min-words-exact</option> option can be used to set the minimum length of a message in words at which the exact match will be accepted; otherwise the message is made fuzzy. If every exact match should be checked by the translator, no matter the length of the message, there is the <option>-x</option>/<option>--fuzzy-exact</option> option to make all exact matches fuzzy.<footnote>
0241 <para>The translator can still see when the match was exact, because normal fuzzy messages will have previous fields and fuzzied exact matches will not.</para>
0242 </footnote> These options have counterpart fields in Pology user configuration, so that the translator does not have to remember to use them on every run. Since <command>poselfmerge</command> merges the PO file with itself, the PO template is not used at all. See <xref linkend="sec-miselfmerge"/> for details.</para>
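<para>As an illustration, a self-merge against the compendium with a word-count threshold could look as follows. This is a sketch only: the threshold of 5 words and the file paths are arbitrary, the helper name is hypothetical, and the <option>-C</option> option for naming the compendium is assumed here to work as it does in <command>msgmerge</command>, which should be checked against the <command>poselfmerge</command> documentation:
<programlisting language="bash">
# Hypothetical helper around poselfmerge.
#   $1  compendium PO file
#   $2  PO file to self-merge
#   $3  minimum number of words for accepting an exact match (default 5)
selfmerge_with_compendium () {
    poselfmerge -C "$1" -W "${3:-5}" "$2"
}

# To make every exact match fuzzy instead, regardless of length:
#   poselfmerge -C compendium-foo-nn.po -x ui/alpha.po
</programlisting>
With the default threshold, an exact compendium match on a message shorter than five words is made fuzzy, so that the translator stops at it in the editor.</para>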
0243
0244 </sect2>
0245
0246 </sect1>
0247
0248 <!-- ======================================== -->
0249 <sect1 id="sec-cbeffedit">
0250 <title>Efficiently Translating with a Text Editor</title>
0251
0252 <para>Dedicated PO editors provide not only direct editing enhancements (no dealing with PO format syntax, jumping through incomplete messages, automatic removal of fuzzy elements, etc), but also translation-oriented features like spell checking, translation memory collection and application, glossary suggestions, and, going beyond standalone PO files, translation project overview and statistics. Why would someone, in spite of this, prefer to work on PO files with a general text editor? There are various reasons. Some people do not like how elements of the currently translated PO message are scattered all over the window (as is typical of many PO editors), out of eye focus, with some elements not shown at all. Other people like to have modularity in the translation workflow, rather than relying on the PO editor for everything and accepting its limitations. Some people are simply well accustomed to their text editor and do not want a higher level editor "abstracting" the PO format for them.</para>
0253
0254 <para>When translating PO files with a general text editor, you will have to use some command line tools to achieve reasonable efficiency and quality.</para>
0255
0256 <sect2 id="sec-cbeffedfeat">
0257 <title>Expected Features of the Text Editor</title>
0258
0259 <para>Starting from the text editor itself, it should have several general text-editing features. Capable editors all have these features, but they should nevertheless be mentioned, so that you can look for them.</para>
0260
0261 <para>The most important feature is probably <emphasis>syntax highlighting</emphasis>, where special parts of the text are displayed in different color, weight, or slant. In a PO file, message field keywords (<varname>msgid</varname>, <varname>msgstr</varname>) should stand out from the text itself, text in comments should look different from the text in fields, internal text elements (e.g. markup tags) should be highlighted, etc. In this way you can quickly focus on what you should be editing, and on the surrounding context of the text. Syntax highlighting was originally introduced for various programming language source files, but has since spread to other types of structured text files; established editors should have syntax highlighting for PO files as well.</para>
0262
0263 <para>Capable editors usually provide special methods of navigating through the file, beyond simply scrolling up and down line by line or page by page. One particularly useful method is <emphasis>line bookmarking</emphasis>. Suppose that, while in the middle of editing a given line, you have to search through the PO file for something (e.g. how a certain phrase was translated earlier): you can then bookmark the line, search as much as you like, and return to the same line by jumping to the bookmark. Otherwise you would have to remember which line (by number) it was to jump back to it, or search for the text that you remember from that line.<footnote>
0264 <para>Another trick is hitting undo once, which will normally skip to the line in which the last modification was made, and then hitting redo to recover the modification.</para>
0265 </footnote></para>
0266
0267 <para>It will usually be possible to start the editor with one or more file paths as command-line arguments, to open those files at once. This is useful when a selection of PO files in need of some editing is determined by an external command, which writes out their paths. These paths can then be fed directly to the editor, rather than having to open them manually one by one (and possibly missing some) through the editor's file dialog.</para>
0268
0269 </sect2>
0270
0271 <sect2 id="sec-cbeffstats">
0272 <title>Statistics on PO Files</title>
0273
0274 <para>Having good statistics on a single PO file or a group of PO files is necessary for estimating the translation effort, for example how much time should be allotted for updating the existing translation for the impending next release of the source material. Pology's workhorse for computing statistics is <link linkend="sv-stats">the <command>stats</command> sieve</link> of <command>posieve</command>.</para>
0275
0276 <para id="p-trpsetup1">Assume the following arrangement of PO files for language <literal>nn</literal> and their templates:
0277 <programlisting>
0278 l10n-nn/
0279     ui/
0280         alpha.po
0281         bravo.po
0282         ...
0283     doc/
0284         alpha.po
0285         bravo.po
0286         ...
0287 l10n-templates/
0288     ui/
0289         alpha.pot
0290         bravo.pot
0291         ...
0292     doc/
0293         alpha.pot
0294         bravo.pot
0295         ...
0296 </programlisting>
0297 If the current working directory is <filename>l10n-nn/</filename>, to compute statistics on a single PO file, <command>posieve</command> can be executed like this:
0298 <programlisting language="bash">
0299 $ posieve stats ui/alpha.po
0300 </programlisting>
0301 This will display a table with message counts, word counts and character counts, as well as ratios to total, per category of messages (translated, fuzzy, untranslated, obsolete). To have the same output for all PO files in the <filename>ui/</filename> directory taken together, or in the whole project, respectively:
0302 <programlisting language="bash">
0303 $ posieve stats ui/
0304 $ posieve stats
0305 </programlisting>
0306 Note that word count is a much better base for estimating the translation effort than message count.</para>
0307
0308 <para>When statistics are computed for several PO files (or a directory, or several directories full of PO files), frequently it is necessary to get statistics per file (or per directory). This is done by adding the <option>byfile</option> or <option>bydir</option> sieve parameter:
0309 <programlisting language="bash">
0310 $ posieve stats -s byfile ui/
0311 </programlisting>
0312 However, this will output one full table for each file, which may be a bit too much data to grasp. Instead, you can request bar display, where each file is represented by a single-line bar. The bar shows either the number of messages or the number of words per category, depending on whether <option>msgbar</option> or <option>wbar</option> was issued. To get word bars per file in the <filename>ui/</filename> directory, you can execute:
0313 <programlisting language="bash">
0314 $ posieve stats -s byfile -s wbar ui/
0315 </programlisting>
0316 </para>
0317
0318 <para>Fuzzy messages introduce some uncertainty in effort estimation. If the statistics show 50 fuzzy messages with 700 words, you cannot conclude from that whether the changes in those messages are small (e.g. cleaned style, punctuation) and the translation can be quickly updated, or substantial (entirely new messages with passing similarity to an earlier message) and require heavy editing. For this reason the <command>stats</command> sieve provides the <option>ondiff</option> parameter: for each fuzzy message the difference from the previous message is computed, and based on that a part of the word count is assigned to the translated category and the rest to the untranslated (thus leaving nominally zero words in the fuzzy category). The result is that, for example, a PO file with a lot of messages fuzzy due to punctuation changes will show in statistics as almost completely translated by number of words.</para>
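<para>For instance, the per-file word bars shown earlier can be recomputed with fuzzy messages split by difference. This is a minimal sketch; the helper name is hypothetical, and only sieve parameters described in this section are used:
<programlisting language="bash">
# Hypothetical helper: per-file word bars, with the word count of each
# fuzzy message divided between translated and untranslated categories.
stats_ondiff () {
    posieve stats -s byfile -s wbar -s ondiff "$@"
}

# Example use, from within l10n-nn/:
#   stats_ondiff ui/
</programlisting>
Comparing this output with that of the same command without <option>ondiff</option> gives a rough idea of how heavy the changes in fuzzy messages are.</para>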
0319
0320 <para>If the translation project is organized such that new empty PO files are not automatically derived from new PO templates, then when running statistics just over language PO files it will happen that templates which do not have a counterpart PO file are not counted as fully empty PO files. To have such templates counted, the two-argument <option>templates</option> parameter can be issued; the first parameter is a path segment of the language directory, and the second parameter what to replace it with to get the corresponding template directory path. In the translation project setup as above, this is how you would compute the statistics on <filename>ui/</filename> directory while taking templates into account:
0321 <programlisting language="bash">
0322 $ posieve stats -s templates:l10n-nn:l10n-templates ui/
0323 </programlisting>
0324 The path replacement is always done on absolute paths, so in this example it is not a problem that the relative paths (<filename>ui/alpha.po</filename>...) do not contain original and replacement segments.</para>
0325
0326 <para>The translation project may not be organized such that each language has its own top directory. Instead, language PO files may be grouped by application and PO domain, and named by language code:
0327 <programlisting>
0328 project/
0329     alpha/
0330         po/
0331             aa.po
0332             bb.po
0333             ...
0334     bravo/
0335         po/
0336             aa.po
0337             bb.po
0338             ...
0339     ...
0340 </programlisting>
0341 In this setup the <command>stats</command> sieve can still be run on directory paths as arguments, in order to get statistics on all PO files of a given language, by using the <option>-I</option>/<option>--include-path</option> option of <command>posieve</command> to single out the desired language. For example, to get statistics on all PO files of the <literal>nn</literal> language in a single table:
0342 <programlisting language="bash">
0343 $ posieve stats project/ -I 'nn.po'
0344 </programlisting>
0345 or by file in form of message bars:
0346 <programlisting language="bash">
0347 $ posieve stats -s byfile -s msgbar project/ -I 'nn.po'
0348 </programlisting>
0349 The value of the <option>-I</option> option is in fact a <link linkend="sec-cmregex">regular expression</link>, and the option can be repeated, which allows fine-tuning of the file selection when necessary.</para>
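<para>Because the value is a regular expression, a pattern such as <literal>nn.po</literal> is looser than it looks: the dot matches any character, so that a file named <filename>unn.po</filename> could be selected too. Before running a sieve, the selection can be previewed with standard tools. The sketch below assumes that <command>posieve</command> searches for the expression anywhere in the file path (an assumption, to be checked against its documentation), and uses <command>grep -E</command>, whose syntax agrees with that of <command>posieve</command> only for simple patterns like these; the helper name is hypothetical:
<programlisting language="bash">
# Hypothetical helper: list PO files under a directory whose paths
# match the given extended regular expression.
#   $1  directory to search
#   $2  regular expression to try out
preview_selection () {
    find "$1" -name '*.po' | sort | grep -E -e "$2"
}

# Example use: only files named exactly nn.po
#   preview_selection project/ '/nn\.po$'
</programlisting>
Once the listed paths are the desired ones, the same expression can be given as the value of the <option>-I</option> option.</para>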
0350
0351 <para>As for other statistics tools, Gettext's <command>msgfmt</command> with the <option>--statistics</option> option could be considered as one (though it shows only translated, fuzzy, and untranslated message counts), and especially <ulink url="http://translate.sourceforge.net/wiki/toolkit/pocount">the <command>pocount</command> command from Translate Toolkit</ulink>.</para>
0352
0353 </sect2>
0354
0355 <sect2 id="sec-cbeffupdate">
0356 <title>Updating PO Files After Merging</title>
0357
<para>When a single PO file is to be translated from scratch, it is easy to just open it in the editor and start translating messages one by one. More frequent, however, is translation maintenance, in which you need to go through a bunch of freshly merged PO files and update new untranslated and fuzzy messages. The problem then is twofold: how to efficiently check which files need updating, and how to efficiently go through the messages that need to be updated within a file.</para>

<para>To see which PO files need to be updated, you can simply run the <command>stats</command> sieve with the <option>byfile</option> and <option>msgbar</option>/<option>wbar</option> parameters (and possibly <option>ondiff</option>), as explained in the previous section. You would then have to pick out incomplete files by eye and open them in the editor one by one, which is tedious and prone to oversight. Instead, you can add the <option>incompfile</option> parameter to <command>stats</command>, which will write the paths of all incomplete PO files into a file. If PO files are organized as in <link linkend="p-trpsetup1">the previous example</link>, and you want to update translations in the <filename>ui/</filename> subdirectory, you would run:
<programlisting language="bash">
$ posieve stats -s byfile -s wbar -s incompfile:toupdate.out ui/
</programlisting>
Now <filename>toupdate.out</filename> will contain the paths of incomplete files. If the editor can be started from the command line with a number of file path arguments, you can feed it the contents of <filename>toupdate.out</filename> directly, e.g. by adding <literal>`cat toupdate.out`</literal> to the editor command.</para>

<para>If the translation project is organized such that each new template results in a new empty PO file, you may wish to update only those PO files which were worked on before, i.e. those not entirely empty. For this you can add the <option>mincomp</option> parameter, which sets the minimal completeness (the ratio of translated to total messages) at which to take a PO file into consideration, and give it a very small value:
<programlisting language="bash">
$ posieve stats -s mincomp:1e-6 -s incompfile:toupdate.out ui/
</programlisting>
<literal>1e-6</literal> is short for <literal>0.000001</literal>, which means to take into consideration only those PO files in which more than one in a million messages is translated. Since there is no PO file with a million messages, this effectively means to include every PO file which has at least one translated message in it.</para>

<para>Once the incomplete PO files are open in the editor, you need to somehow use the editor's search function to jump through incomplete messages. For fuzzy messages it is easy: you can just search for the <literal>, fuzzy</literal> string. Untranslated messages, on the other hand, are more problematic. You may think of searching for <literal>msgstr ""</literal>, but this would also find long wrapped messages that are translated:
<programlisting language="po">
msgid ""
"Blah blah blah [...]"
"blah blah."
msgstr ""
"Bla bla bla [...]"
"bla bla."
</programlisting>
To make untranslated messages stand out unambiguously, there is <link linkend="sv-tag-untranslated">the <command>tag-untranslated</command> sieve</link>. It simply adds the <literal>untranslated</literal> flag to all untranslated messages (but not to fuzzy ones unless explicitly requested), so that you can search for <literal>, untranslated</literal> in the editor. It is most convenient to run <command>tag-untranslated</command> on the <filename>toupdate.out</filename> file produced by <command>stats</command>, using the <option>-f</option>/<option>--from-files</option> option:
<programlisting language="bash">
$ posieve tag-untranslated -f toupdate.out
</programlisting>
</para>

<para>Fuzzy messages may be such only due to small changes in the original text, for example a single word changed in a paragraph-length message. Such a change is not easy to spot by manually comparing the original and the translation. However, since fuzzy messages should have the previous original text in comments (if merged with the <option>--previous</option> option of <command>msgmerge</command>), it is possible to automatically embed differences into those comments with <link linkend="sv-diff-previous">the <command>diff-previous</command> sieve</link>; see its documentation for an example. You should run this sieve on <filename>toupdate.out</filename> as well:
<programlisting language="bash">
$ posieve diff-previous -f toupdate.out
</programlisting>
Your editor may even highlight the difference segments added to the previous original text, making them stand out quite clearly.</para>

<para>Since normally you want both to mark untranslated messages and to add differences to fuzzy messages before going through PO files, you can run the two sieves at once:
<programlisting language="bash">
$ posieve tag-untranslated,diff-previous -f toupdate.out
</programlisting>
</para>

<para>As you go through incomplete messages and update the translation, you should remove any <literal>fuzzy</literal> or <literal>untranslated</literal> flags, and previous fields in <literal>#| ...</literal> comments, so that in the end you can commit (upload, send) clean updated PO files. Sometimes, however, you will realize that you do not have enough time to update everything, and you will want to commit what you have completed by that moment. The problem is that there will still be some <literal>untranslated</literal> flags and embedded differences remaining throughout the files, and leftover embedded differences would e.g. interfere with subsequent merging. To automatically remove these remaining elements, you simply run the two sieves with the <option>strip</option> parameter:
<programlisting language="bash">
$ posieve tag-untranslated,diff-previous -s strip -f toupdate.out
</programlisting>
</para>
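
<para>Before committing, you may want to verify that no helper flags were left behind. The following is only a sketch built on the standard <command>grep</command> command, not a Pology feature. The function name <literal>list_flagged</literal> is made up for this example, and the sketch assumes that the list file holds one path per line and that no path contains whitespace. It prints the paths of those listed files in which some flag comment still mentions <literal>untranslated</literal>:
<programlisting language="bash">
# Print the paths, taken from the list file given as first argument,
# of PO files that still have a flag comment (a line starting with "#,")
# mentioning "untranslated".
list_flagged() {
    # /dev/null keeps grep from waiting on standard input
    # when the list file happens to be empty.
    grep -l '^#,.*untranslated' /dev/null `cat "$1"`
}
</programlisting>
Running <literal>list_flagged toupdate.out</literal> after the strip step should then print nothing.</para>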

<para>When you update a PO file, for the sake of clarity and copyright you should also update its header with your personal data (the author comment, the <literal>Last-Translator:</literal> field, etc.). You could do this manually, but it is much simpler to set your data once in the <link linkend="sec-cmconfig">Pology user configuration</link> and run <link linkend="sv-update-header">the <command>update-header</command> sieve</link> over all updated files<footnote>
<para>If you use <link linkend="ch-ascript">ascription</link>, you should instead tell <command>poascribe</command> to update headers for you when committing. This is done by adding <literal>update-headers = yes</literal> to the <literal>[poascribe]</literal> section in the user configuration.</para>
</footnote>:
<programlisting language="bash">
$ posieve update-header -f toupdate.out
</programlisting>
</para>
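
<para>The steps of this section can be collected into two small shell functions. This is merely a sketch: the function names and the <filename>toupdate.out</filename> file name are arbitrary, and only the <command>posieve</command> invocations already shown in this section are used.
<programlisting language="bash">
# Before editing: record incomplete PO files under the given directory,
# then flag untranslated messages and embed differences into fuzzy ones.
prepare_update() {
    posieve stats -s byfile -s wbar -s incompfile:toupdate.out "$1" || return
    posieve tag-untranslated,diff-previous -f toupdate.out
}

# After editing: remove leftover flags and embedded differences,
# then update the headers of the files that were worked on
# (leave the last step out if poascribe updates headers for you).
finish_update() {
    posieve tag-untranslated,diff-previous -s strip -f toupdate.out || return
    posieve update-header -f toupdate.out
}
</programlisting>
You would run <literal>prepare_update ui/</literal>, go through the files listed in <filename>toupdate.out</filename> in the editor, and run <literal>finish_update</literal> before committing.</para>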

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-cbsumasc">
<title>Summit with Ascription</title>

<para>Summit and ascription workflows, described in <xref linkend="ch-summit"/> and <xref linkend="ch-ascript"/>, fit excellently together. Ascription enables review-based release control on summit scatter (<xref linkend="sec-sucfgascf"/> shows how to do it), while summit removes the need for different ascription file trees per branch (and the associated effort at branch cycling). All the information that you need to set up a summit with ascription is given in the chapters mentioned; the only thing left for this section is to show the order of actions and the resulting file structure, as implied by the technical requirements.</para>

<para>The first thing to set up is the summit. From the viewpoint of ascription, it is not important which summit mode is used; indeed, while the direct summit is still not advised, putting ascription on top would alleviate some of its disadvantages. In the following, the summit over dynamic templates is assumed, because it is a bit less involved than the summit over static templates, but nevertheless demonstrates all important points.</para>

<para>After configuring and initializing the summit over dynamic templates, let the summit top directory alone (that is, omitting branches) look like this:
<programlisting>
l10n-nn/
    summit/
        foo-module/
            alpha.po
            bravo.po
            ...
        bar-module/
            kilo.po
            lima.po
            ...
        ...
    summit-config
</programlisting>
PO files in the summit are shown split into several submodules for generality. Unlike in the chapter on summit, the summit directory is placed here within a parent language directory, and the summit configuration file <filename>summit-config</filename> is in the parent directory instead of the summit directory. This gives a clearer structure once ascription is added.</para>

<para>The ascription is set up after the summit, such that it takes only the summit directory into account, having nothing to do with branches. After the ascription is configured and initialized, the summit-with-ascription tree should look like this:
<programlisting>
l10n-nn/
    summit/
        foo-module/
            alpha.po
            bravo.po
            ...
        bar-module/
            kilo.po
            lima.po
            ...
        ...
    summit-ascript/
        foo-module/
            alpha.po
            bravo.po
            ...
        bar-module/
            kilo.po
            lima.po
            ...
        ...
    ascription-config
    summit-config
</programlisting>
Here the ascription tree root is set to <filename>summit-ascript/</filename> in the ascription configuration file <filename>ascription-config</filename>. With this, the setup of the summit-with-ascription workflow is complete.</para>

<sect2 id="sec-cbsamultsum">
<title>Several Summits with Unified Ascription</title>

<para>In some circumstances you may want to have <emphasis>several</emphasis> separate summits with unified ascription. This may be the case, for example, when the translation project is such that the user interface and documentation PO files are put into separate file trees in branches, and most paired UI-documentation PO files have the same names.<footnote>
<para>On the other hand, you may still have a unified summit, by defining a <link linkend="sec-sustpptransf">path transformation</link> in the summit configuration to disambiguate UI and documentation PO files sharing the same domain name.</para>
</footnote></para>

<para>The parent language directory in this scenario, with summits and ascription set up, could look like this:
<programlisting>
l10n-nn/
    summit/
        ui/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            ...
            summit-config
        doc/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            ...
            summit-config
    summit-ascript/
        ui/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            ...
        doc/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            ...
    ascription-config
</programlisting>
Note here the location of the <literal>summit-config</literal> files: each is within its own summit directory, <filename>summit/ui/</filename> or <filename>summit/doc/</filename>. On the other hand, there is a single <literal>ascription-config</literal> file, which covers all summits. This means that summit operations (merging, scattering) must be performed from within their respective summit directories (since <command>posummit</command> looks through the parent directories for the first <literal>summit-config</literal> file), while ascription operations can be performed from anywhere.</para>
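
<para>The upward lookup can be pictured with the following shell sketch. It only illustrates the idea of walking from a starting directory towards the filesystem root until a <filename>summit-config</filename> file is found; it is not the actual implementation in <command>posummit</command>, and the function name is made up for this example.
<programlisting language="bash">
# Print the path of the nearest summit-config file, looking in the
# given directory (default: current directory) and then in each of
# its parents in turn. Fails if no such file is found.
find_summit_config() {
    # Resolve the starting directory to an absolute path.
    dir=`cd "${1:-.}" || exit; pwd` || return 1
    while :; do
        if [ -f "$dir/summit-config" ]; then
            echo "$dir/summit-config"
            return 0
        fi
        # Stop once the filesystem root has been checked.
        if [ "$dir" = "/" ]; then
            return 1
        fi
        dir=`dirname "$dir"`
    done
}
</programlisting>
In the tree above, calling this function from anywhere below <filename>summit/ui/</filename> would find <filename>summit/ui/summit-config</filename>, while calling it from <filename>l10n-nn/</filename> itself would find nothing, unless some directory further up happens to contain such a file.</para>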

<para>Having unified ascription is especially convenient in <link linkend="sec-sumntbasic">centralized summit maintenance</link>, since translators and reviewers are concerned only with ascription (running <command>poascribe</command> to commit, select for review, etc.) regardless of how many summits there are.</para>

</sect2>

</sect1>

</chapter>