<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">

<chapter id="ch-combined">
<title>Combined Arms Tactics</title>

<para>While each particular PO processing tool from Pology and other packages may be documented in itself, it may not always be obvious how to use these tools together. This chapter presents some scenarios where combined tool usage may increase the quality and efficiency of daily work on translation.</para>

<!-- ======================================== -->
<sect1 id="sec-cbcompend">
<title>Creating and Using PO Compendia</title>

<para>A <emphasis>PO compendium</emphasis> is simply a PO file which aggregates messages from many other normal PO files, usually all same-language PO files in a given translation project. It may aggregate only the messages currently present in project PO files, but also messages that were present once and are no longer. As such, the compendium can be regarded as an instance of a <emphasis>translation memory</emphasis>. This section explains how to create, update, and apply such a translation memory.</para>

<sect2 id="sec-cbcomptm">
<title>Why Translation Memory?</title>

<para>Imagine that the translator wants to start translating a PO file that has never been translated so far, but which has content similar to some other, translated PO files. Perhaps it was even derived from those other PO files, by merging, splitting, etc. This means that many messages in the present PO file may have been translated already in some other PO file, or at least that very similar translated messages exist in other PO files. Since the translation memory (TM) contains all known translated messages, it can be used to automatically produce translated and fuzzy messages in the present PO file, significantly reducing translation effort.
Matching against the TM can be performed either as the translator goes from message to message in the editor (if the editor has a TM feature), or at once for all messages (by a specialized command) before starting to go through messages in the editor.</para>

<para>In most non-PO based translation workflows, translation memories are crucial for efficiency. This is because most non-PO formats have no concept of merging with templates. Each new revision of the source material results in (an equivalent of) entirely empty translation files, and it is the translator's duty to somehow bring old translations into the new context. A carefully maintained TM, with a corresponding matching tool, is the foremost way to do this.</para>

<para>In a PO-based translation workflow, merging with templates already provides most of what a TM is essential for. In effect, the old PO file that is being merged can be considered as a TM for the new PO file that will become based on the new template. Even when PO files are renamed, merged, or split, if that is properly done, no translations will be lost. A TM for PO files is therefore useful mostly to smooth out glitches in translation maintenance procedures (e.g. a PO file improperly split).<footnote>
<para>To be sure, some short messages can be quite similar in many unrelated PO files. But having TM matches only on such messages will result in very small time savings, if measurable at all.</para>
</footnote> Nevertheless, having a well maintained TM in the form of a PO compendium cannot hurt, while providing for the (hopefully) rare situations where TM matching is actually needed.</para>

<para>Many dedicated PO editors will automatically maintain an internal TM, usually in a database format, into which they will scoop messages from all PO files that were opened in them. However, in a team environment, these internal TMs are inferior to a PO compendium.
For one, different translators will have different TMs; a translator may start to work on a file for which there are TM matches in another translator's internal TM. Internal TMs may be volatile, for example corrupted due to an editor bug, or perish during system maintenance. There is no control over which messages are scooped by the editor, and how they are treated (e.g. which message parts are being ignored).</para>

<para>On the other hand, a PO compendium can be maintained in a central place, and, being a PO file in itself, kept in version control just like all other PO files. In this way, all translators have fast access to a unified TM, which is secured from accidental corruption. Tight control over which messages are collected and how they are collected may be asserted in the script which is written to update the compendium. This script can be made to run periodically, and to automatically commit the updated compendium to the version control repository.</para>

</sect2>

<sect2 id="sec-cmcompcr">
<title>Maintaining a Centralized PO Compendium</title>

<para>As a first attempt, the PO compendium can be created simply by concatenating all PO files in the project into one called <filename>compendium.po</filename>, using <command>msgcat</command>. If PO files are organized by language (all PO files of a given language kept in the directory of that language), then the concatenation command would be:
<programlisting language="bash">
$ cd $LANGDIR
$ find . -iname \*.po | xargs msgcat -o compendium.po
</programlisting>
Unfortunately, a compendium created in this way has a number of drawbacks:
<itemizedlist>

<listitem>
<para>Aside from translated messages, the compendium will also contain untranslated and fuzzy messages. While untranslated messages are obviously dead weight, a case could be made for taking in fuzzy messages.
But in light of the suggested usage of the compendium in the following section, fuzzy messages too should be ignored.</para>
</listitem>

<listitem>
<para>Messages in the compendium will contain all parts as normal messages do. Some of these parts (such as source references) are unnecessary, since they will be ignored when applying the compendium later. Other than increasing the size of the compendium, another problem with these parts is that changes in them will cause unnecessary version control differences, so they should be stripped from the compendium.</para>
</listitem>

<listitem>
<para>Messages will be ordered as they are seen in concatenated PO files. The ordering of messages in the compendium is also of no importance for application. But, any changes in message ordering between two compendium updates will cause unnecessary version control differences, so it is best to sort messages by their keys (<varname>msgid</varname> and <varname>msgctxt</varname> fields).</para>
</listitem>

<listitem>
<para>When two or more PO files contain the same message by key (<varname>msgid</varname> and <varname>msgctxt</varname>) but with different translations (due to context), such as:
<programlisting language="po">
msgid "Open File"
msgstr "Otvori datoteku"

msgid "Open File"
msgstr "Otvaranje datoteke"
</programlisting>
<command>msgcat</command> will <emphasis>aggregate</emphasis> translations (and translator comments if any) in the compendium message, and make it fuzzy:
<programlisting language="po">
#, fuzzy
msgid "Open File"
msgstr ""
"#-#-#-#-#  alpha.po (alpha-1.2.9)  #-#-#-#-#\n"
"Otvori datoteku\n"
"#-#-#-#-#  bravo.po (bravo-0.8.12)  #-#-#-#-#\n"
"Otvaranje datoteke"
</programlisting>
Since the context should be double-checked anyway when applying the compendium later (especially for short messages), it is better to instead
pick one of the translations and have a normal translated compendium message. If each translation appears only once, then it does not matter which is picked; but if one translation appears 10 times and the other once, clearly the former should be picked. That is, the most frequent translation should be picked.</para>
</listitem>

<listitem>
<para>The PO header is treated in the same way as messages by <command>msgcat</command>: since all headers have the same <varname>msgid</varname> field (empty), their <varname>msgstr</varname> fields will be aggregated. This too is just dead weight, since the header is not used in applications of the compendium. Instead, a brief and informative header should be explicitly set (mentioning that this is a compendium PO file, for which project and language, etc).</para>
</listitem>

<listitem>
<para>In some translation projects, PO files frequently contain <emphasis>meta-messages</emphasis>, such as those where translators can add their names and contact addresses. These messages have the same key (<varname>msgid</varname>) in all PO files, but should in general be translated differently in each, the more so the more people there are in the translation team. So it may be better to omit such messages from the compendium.</para>
</listitem>

</itemizedlist>
It must be noted that none of these problems are an actual deficiency of <command>msgcat</command> itself. Since its function is general concatenation of PO files, it cannot make any of the assumptions necessary for the present application.
Instead, <command>msgcat</command> should be used as a part of a wider script, in which the necessary additional processing happens, tailored to the particular translation project and translation team.</para>

<para>Let us assume the following layout of the top directory for the translation project <literal>foo</literal> and translation team (language) <literal>nn</literal>:
<programlisting>
foo-nn/
    ui/
        alpha.po
        bravo.po
        ...
    doc/
        alpha.po
        bravo.po
        ...
    update-compendium-foo-nn.sh
    compendium-foo-nn.po
</programlisting>
<filename>update-compendium-foo-nn.sh</filename> will be the script to create or update the compendium, <filename>compendium-foo-nn.po</filename> the compendium itself. It helps clarity to add the project name and language into the names of these two files, because both are tailored to that project and that language. Taking into account the aforementioned drawbacks of a simple compendium made by <command>msgcat</command> and the suggested resolutions, <filename>update-compendium-foo-nn.sh</filename> could look like this<footnote>
<para>At one point, the script creates a temporary PO file for each original PO file, and then calls <command>msgcat</command> on these temporary files to create the first, raw compendium. These temporary files have fuzzy and untranslated messages removed, and some other adjustments, before concatenation. One could think that all these adjustments could instead be done on the raw compendium. The problem is that then there would be no unambiguous way to tell which fuzzy messages in the raw compendium were fuzzy to begin with, and which were made fuzzy by <command>msgcat</command> due to aggregation of translations.
With fuzzy messages removed prior to concatenation, in the de-aggregation by frequency that follows it is known that messages with fuzzy flags are those that were aggregated.</para>
</footnote>:
<programlisting language="bash">
#!/bin/sh
#
# Create the PO compendium of Foo in Nevernissian language.
#
# Usage:
#   update-compendium-foo-nn.sh [trim]
#
# The script can be called from anywhere, because PO paths are
# hardcoded within the script relative to its own location.
# If the 'trim' argument is not given (i.e. script is called
# without arguments), messages in the old compendium that are
# no longer found in project PO files are preserved in
# the new compendium; if 'trim' is given, they are removed.

# Directory where this script resides.
cmddir=`dirname "$0"`
# Paths of directories containing PO files, space-separated.
# (Make sure the compendium itself is not in here!)
podirs="$cmddir/ui $cmddir/doc"
# Path to the compendium.
comppo="$cmddir/compendium-foo-nn.po"

trim=$1

# If there is already a compendium, preserve it for later.
test -f $comppo &amp;&amp; mv $comppo $comppo.old

# Collect PO files from given paths into a file.
find $podirs -iname \*.po | sort &gt; polist

# Pre-process PO files in the project, creating temporary
# PO files named *.po.tmpcomp:
# - remove fuzzy and untranslated messages
# - declare obsolete messages non-obsolete
# - remove extracted comments, source references, flags
for pofile in `cat polist`; do
    msgattrib $pofile \
        --translated --no-fuzzy --clear-obsolete --force-po \
    | grep -v '^#[:.,]' &gt; $pofile.tmpcomp
done
# Update file list to contain temporary PO files.
sed -i 's/$/.tmpcomp/' polist

# Reduce headers of temporary PO files to necessary minimum,
# proper header for the compendium will be added later.
posieve -q set-header -f polist \
    -srmallcomm \
    -sremoverx:'^(?!MIME-Version$|Content-Type$|Content-Transfer-Encoding$)'

# Create raw compendium from temporary PO files:
# - aggregate translations for repeated messages
# - sort messages by key
msgcat --sort-output --force-po -f polist -o $comppo

# Clean up temporary PO files and file list.
cat polist | xargs rm
rm polist

# Resolve aggregated messages to most frequent variant.
# It is safe to unfuzzy resolved messages, since at
# this point it is assured that only translated messages
# have been aggregated.
posieve -q resolve-aggregates $comppo -sunfuzzy

# Remove meta-messages which are found in many PO files but
# should in general be differently translated in each.
msggrep -v $comppo -o $comppo \
    -JFe 'NAME OF TRANSLATORS' \
    -JFe 'EMAIL OF TRANSLATORS' \
    -JFe 'ROLES_OF_TRANSLATORS' \
    -JFe 'CREDIT_FOR_TRANSLATORS'

# Set the compendium header.
# Use current date as revision date.
dtnow=`date '+%Y-%m-%d %H:%M%z'`
posieve -q set-header $comppo -screate \
    -stitle:'Compendium of Foo translation into Nevernissian.' \
    -sfield:'Project-Id-Version:compendium-foo-nn' \
    -sfield:"PO-Revision-Date:$dtnow" \
    -sfield:'Last-Translator:Simulacrum' \
    -sfield:'Language-Team:Nevernissian &lt;l10n-nn@neverwhere.org&gt;' \
    -sfield:'Language:nn' \
    -sfield:'Plural-Forms:nplurals=9; plural=n==1 ? ...'

# If the old compendium was preserved, add it to the new compendium
# in order to retain messages no longer found in the project
# (unless trimming was requested).
if test -f $comppo.old &amp;&amp; test x"$trim" != xtrim; then
    msgcat --use-first --sort-output $comppo $comppo.old -o $comppo
    # ...old compendium must be the second argument, in order
    # not to override possibly updated translations of
    # existing messages in the project.
fi

# Test if new compendium is different from the old, with
# the exception of creation time. If they are the same,
# discard the new compendium.
if test -f $comppo.old; then
    for cpfile in $comppo $comppo.old; do
        grep -v '^"PO-Revision-Date:.*\\n"$' $cpfile &gt;$cpfile.nrd
    done
    if cmp -s $comppo.nrd $comppo.old.nrd; then
        mv $comppo.old $comppo
    else
        rm $comppo.old
    fi
    rm $comppo.nrd $comppo.old.nrd
fi

# Canonically wrap the compendium.
msgcat $comppo -o $comppo

# All done.
</programlisting>
This script should be called periodically to update the compendium, and the updated file committed, such that all translators will automatically get it when they update their local repository copies. If after some (long) time the compendium becomes too big due to accumulation of old messages, running the script once with the <literal>trim</literal> argument will cause all old messages to be dropped.</para>

</sect2>

<sect2 id="sec-cmcompuse">
<title>Applying the PO Compendium</title>

<para>Translators who use a dedicated PO editor with internal TM should configure the editor to read the compendium into the internal TM. This may be done, for example, by including the compendium PO file (or the directory in which it resides) into the editor's translation project paths. If the compendium is kept under version control, the editor should automatically update its internal TM from the compendium whenever the repository is updated and the editor started again.
In this way, the editor's internal TM becomes transient in nature, there being no problem if it gets corrupted or deleted.</para>

<para>When working on a particular PO file with a properly configured PO editor, the translator jumps from one incomplete (untranslated or fuzzy) message to another. Whenever the current message is similar to one or a few messages in the compendium (i.e. in the internal TM), the editor will somehow "offer" those similar messages. Ideally, for each similar message the editor should show not only the possible translation, but also the difference between the two original texts (that of the current message and the TM match). This will allow the translator to quickly see how the offered translation should be adapted to fit the current original.</para>

<para>Dedicated PO editors may also offer <emphasis>batch</emphasis> application of the TM. This means that when the PO file is opened, the translator executes a command which fills in all untranslated messages with matches from the TM, making some translated (on exact matches) and some fuzzy (partial matches). However, simpleminded <emphasis>batch application of the TM should be considered dangerous</emphasis>. For one, an exact match on the original text does not guarantee that the same translation is appropriate; short messages especially often need different translations depending on context. But the translator will simply jump over each batch-translated message and fail to see this. The other problem comes up if the material in the compendium is not sufficiently reviewed, in which case every match from the TM, even on long messages, should be at least casually reviewed by the translator. Thus, if there is no way to configure batch application to be less indiscriminate, it is best to avoid it altogether, or else the quality of translation may suffer.</para>

<para>Translators who use a general text editor to work on PO files can still make use of the compendium.
One option could be merging the PO file with its template in the presence of the compendium, just before starting to work on it:
<programlisting language="bash">
$ msgmerge alpha.po alpha.pot -C compendium.po --update --previous
</programlisting>
The <option>-C</option> option to <command>msgmerge</command> specifies the compendium from which to draw exact and partial matches, when there is no match in the PO file itself. This option can be repeated to add several compendia. The <option>--update</option> option is to modify the PO file in place, rather than writing the merged PO file to standard output. The <option>--previous</option> option is to get previous fields (<literal>#| ...</literal> comments) on fuzzy messages. Unfortunately, this method is a command line version of the batch application of the TM in a dedicated PO editor, and suffers from the same problem of indiscriminate exact matches that the translator will later fail to check. Therefore it should not be used (at least not for general translation).</para>

<para>Fortunately, Pology provides the <command>poselfmerge</command> command, which is a wrapper around <command>msgmerge</command>, and has several options to mitigate the indiscriminateness of batch application of the TM. To avoid silent exact matches on short messages, the <option>-W</option>/<option>--min-words-exact</option> option can be used to set the minimum length of a message in words at which the exact match will be accepted; otherwise the message is made fuzzy.
If every exact match should be checked by the translator, no matter the length of the message, there is the <option>-x</option>/<option>--fuzzy-exact</option> option to make all exact matches fuzzy.<footnote>
<para>The translator can still see when the match was exact, because normal fuzzy messages will have previous fields and fuzzied exact matches will not.</para>
</footnote> These options have counterpart fields in the Pology user configuration, so that the translator does not have to remember to use them on every run. Note also that, unlike with <command>msgmerge</command>, the PO template is not used at all. See <xref linkend="sec-miselfmerge"/> for details.</para>

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-cbeffedit">
<title>Efficiently Translating with a Text Editor</title>

<para>Dedicated PO editors provide not only direct editing enhancements (no dealing with PO format syntax, jumping through incomplete messages, automatic removal of fuzzy elements, etc), but also translation-oriented features like spell checking, translation memory collection and application, glossary suggestions, and, going beyond standalone PO files, translation project overview and statistics. Why would someone, in spite of this, prefer to work on PO files with a general text editor? There are various reasons. Some people do not like how elements of the currently translated PO message are scattered all over the window (as is typical of many PO editors), out of eye focus, with some elements not shown at all. Other people like to have modularity in the translation workflow, rather than relying on the PO editor for everything and accepting its limitations.
Some people are simply well accustomed to their text editor and do not want a higher level editor "abstracting" the PO format for them.</para>

<para>When translating PO files with a general text editor, you will have to use some command line tools to achieve reasonable efficiency and quality.</para>

<sect2 id="sec-cbeffedfeat">
<title>Expected Features of the Text Editor</title>

<para>Starting from the text editor itself, it should have several general text-editing features. Capable editors all have these features, but they should nevertheless be mentioned, so that you can look for them.</para>

<para>The most important feature is probably <emphasis>syntax highlighting</emphasis>, where special parts of the text are displayed in different color, weight, or slant. In a PO file, message field keywords (<varname>msgid</varname>, <varname>msgstr</varname>) should stand out from the text itself, text in comments should look different from the text in fields, internal text elements (e.g. markup tags) should be highlighted, etc. In this way you can quickly focus on what you should be editing, and on the surrounding context of the text. Syntax highlighting was originally introduced for various programming language source files, but has since spread to other types of structured text files; established editors should have syntax highlighting for PO files as well.</para>

<para>Capable editors usually provide special methods of navigating through the file, beyond simply scrolling up and down line by line or page by page. One particularly useful method is <emphasis>line bookmarking</emphasis>. Suppose that, while in the middle of editing a given line, you have to search through the PO file for something (e.g. how a certain phrase was translated earlier): you can then bookmark the line, search as much as you like, and return to the same line by jumping to the bookmark.
Otherwise you would have to remember which line (by number) it was to jump back to it, or search for the text that you remember from that line.<footnote>
<para>Another trick is to hit undo once, which will normally skip to the line in which the last modification was made, and then hit redo to recover the modification.</para>
</footnote></para>

<para>It will usually be possible to start the editor with one or more file paths as command-line arguments, to open those files at once. This is useful when a selection of PO files in need of some editing is determined by an external command, which writes out their paths. These paths can then be fed directly to the editor, rather than having to open them manually one by one (and possibly missing some) through the editor's file dialog.</para>

</sect2>

<sect2 id="sec-cbeffstats">
<title>Statistics on PO Files</title>

<para>Having good statistics on a single PO file or a group of PO files is necessary for estimating the translation effort, for example how much time should be allotted for updating the existing translation for the impending next release of the source material. Pology's workhorse for computing statistics is <link linkend="sv-stats">the <command>stats</command> sieve</link> of <command>posieve</command>.</para>

<para id="p-trpsetup1">Assume the following arrangement of PO files for language <literal>nn</literal> and their templates:
<programlisting>
l10n-nn/
    ui/
        alpha.po
        bravo.po
        ...
    doc/
        alpha.po
        bravo.po
        ...
l10n-templates/
    ui/
        alpha.pot
        bravo.pot
        ...
    doc/
        alpha.pot
        bravo.pot
        ...
</programlisting>
If the current working directory is <filename>l10n-nn/</filename>, to compute statistics on a single PO file, <command>posieve</command> can be executed like this:
<programlisting language="bash">
$ posieve stats ui/alpha.po
</programlisting>
This will display a table with message counts, word counts and character counts, as well as ratios to total, per category of messages (translated, fuzzy, untranslated, obsolete). To have the same output for all PO files in the <filename>ui/</filename> directory taken together, or in the whole project, respectively:
<programlisting language="bash">
$ posieve stats ui/
$ posieve stats
</programlisting>
Note that word count is a much better base for estimating the translation effort than message count.</para>

<para>When statistics are computed for several PO files (or a directory, or several directories full of PO files), it is frequently necessary to get statistics per file (or per directory). This is done by adding the <option>byfile</option> or <option>bydir</option> sieve parameter:
<programlisting language="bash">
$ posieve stats -s byfile ui/
</programlisting>
However, this will output one full table for each file, which may be a bit too much data to grasp. Instead, you can request bar display, where each file is represented by a single-line bar. The bar shows either the number of messages or the number of words per category, depending on whether <option>msgbar</option> or <option>wbar</option> was issued. To get word bars per file in the <filename>ui/</filename> directory, you can execute:
<programlisting language="bash">
$ posieve stats -s byfile -s wbar ui/
</programlisting>
</para>

<para>Fuzzy messages introduce some uncertainty in effort estimation. If the statistics show 50 fuzzy messages with 700 words, you cannot tell from that whether the changes in those messages are small (e.g.
cleaned style, punctuation) and translation can be quickly updated, or substantial (entirely new messages with passing similarity to earlier messages) and require heavy editing. For this reason the <command>stats</command> sieve provides the <option>ondiff</option> parameter: for each fuzzy message the difference from the previous message is computed, and based on that a part of the word count is assigned to the translated category and the rest to untranslated (thus leaving nominally zero words in the fuzzy category). The result is that, for example, a PO file with a lot of messages fuzzy due to punctuation changes will show in statistics as almost completely translated by number of words.</para>

<para>If the translation project is organized such that new empty PO files are not automatically derived from new PO templates, then when running statistics just over language PO files, templates which do not have a counterpart PO file will not be counted as fully empty PO files. To have such templates counted, the two-argument <option>templates</option> parameter can be issued; the first argument is a path segment of the language directory, and the second is what to replace it with to get the corresponding template directory path. In the translation project setup as above, this is how you would compute the statistics on the <filename>ui/</filename> directory while taking templates into account:
<programlisting language="bash">
$ posieve stats -s templates:l10n-nn:l10n-templates ui/
</programlisting>
The path replacement is always done on absolute paths, so in this example it is not a problem that the relative paths (<filename>ui/alpha.po</filename>...) do not contain original and replacement segments.</para>

<para>The translation project may not be organized such that each language has its own top directory.
Instead, language PO files may be grouped by application and PO domain, and named by language code:
<programlisting>
project/
    alpha/
        po/
            aa.po
            bb.po
            ...
    bravo/
        po/
            aa.po
            bb.po
            ...
    ...
</programlisting>
In this setup the <command>stats</command> sieve can still be run on directory paths as arguments, in order to get statistics on all PO files of a given language, by using the <option>-I</option>/<option>--include-path</option> option of <command>posieve</command> to single out the desired language. For example, to get statistics on all PO files of the <literal>nn</literal> language in a single table:
<programlisting language="bash">
$ posieve stats project/ -I 'nn.po'
</programlisting>
or by file in the form of message bars:
<programlisting language="bash">
$ posieve stats -s byfile -s msgbar project/ -I 'nn.po'
</programlisting>
The value of the <option>-I</option> option is in fact a <link linkend="sec-cmregex">regular expression</link>, and the option can be repeated, which allows fine-tuning of the file selection when necessary.</para>

<para>As for other statistics tools, Gettext's <command>msgfmt</command> with the <option>--statistics</option> option could be considered as one (though it shows only translated, fuzzy, and untranslated message counts), and especially <ulink url="http://translate.sourceforge.net/wiki/toolkit/pocount">the <command>pocount</command> command from Translate Toolkit</ulink>.</para>

</sect2>

<sect2 id="sec-cbeffupdate">
<title>Updating PO Files After Merging</title>

<para>When a single PO file is to be translated from scratch, then it is easy to just open it in the text editor and start translating messages one by one.
More frequent than this, however, is translation maintenance, in which you need to go through a bunch of freshly merged PO files and update new untranslated and fuzzy messages. The problem then is twofold: how to efficiently check which files need updating, and how to efficiently go through messages that need to be updated within a file.</para>

<para>To see which PO files need to be updated, you can simply run the <command>stats</command> sieve with <option>byfile</option> and <option>msgbar</option>/<option>wbar</option> parameters (and possibly <option>ondiff</option>), as explained in the previous section. After that you would have to manually observe incomplete files and open them in the editor one by one, which is tedious and prone to oversight. Instead, you can also add the <option>incompfile</option> parameter to <command>stats</command>, which will write paths of all incomplete PO files into a file. If PO files are organized as in <link linkend="p-trpsetup1">the previous example</link>, and you want to update translations in the <filename>ui/</filename> subdirectory, you would run:
<programlisting language="bash">
$ posieve stats -s byfile -s wbar -s incompfile:toupdate.out ui/
</programlisting>
Now <filename>toupdate.out</filename> will contain the paths of incomplete files. If the editor can be started from the command line with a number of file path arguments, you can directly feed it <filename>toupdate.out</filename>, e.g. by adding <filename>`cat toupdate.out`</filename> to the editor command.</para>

<para>If the translation project is organized such that each new template results in a new empty PO file, you may wish to update only those PO files which were worked on before, i.e. those not entirely empty.
For this you can add the <option>mincomp</option> parameter, which sets the minimal completeness (the ratio of translated to total messages) at which to take a PO file into consideration, with a very small value:
<programlisting language="bash">
$ posieve stats -s mincomp:1e-6 -s incompfile:toupdate.out ui/
</programlisting>
<literal>1e-6</literal> is short for <literal>0.000001</literal>, which means to take into consideration only those PO files in which more than one in a million messages is translated. Since there is no PO file with a million messages, this effectively means to include every PO file which has at least one translated message in it.</para>

<para>Once the incomplete PO files are open in the editor, you need to somehow use the editor's search function to jump through incomplete messages. For fuzzy messages it is easy, you can just search for the <literal>, fuzzy</literal> string. Untranslated messages, on the other hand, are more problematic. You may think of searching for <literal>msgstr ""</literal>, but this would also find long wrapped messages:
<programlisting language="po">
msgid ""
"Blah blah blah [...]"
"blah blah."
msgstr ""
"Bla bla bla [...]"
"bla bla."
</programlisting>
To make untranslated messages stand out unambiguously, there is <link linkend="sv-tag-untranslated">the <command>tag-untranslated</command> sieve</link>. It simply adds the <literal>untranslated</literal> flag to all untranslated messages (but not to fuzzy unless explicitly requested), so that you can search for <literal>, untranslated</literal> in the editor.
It is most convenient to run <command>tag-untranslated</command> on the <filename>toupdate.out</filename> file produced by <command>stats</command>, using the <option>-f</option>/<option>--from-files</option> option:
<programlisting language="bash">
$ posieve tag-untranslated -f toupdate.out
</programlisting>
</para>

<para>Fuzzy messages may be such only due to small changes in the original text, for example a single word changed in a paragraph-length message. This is not so easy to see by manually comparing the original and the translation. However, since fuzzy messages should have the previous original text in comments (if merged with the <option>--previous</option> option of <command>msgmerge</command>), it is possible to automatically embed differences into those comments with <link linkend="sv-diff-previous">the <command>diff-previous</command> sieve</link>; see its documentation for an example. You should run this sieve on <filename>toupdate.out</filename> as well:
<programlisting language="bash">
$ posieve diff-previous -f toupdate.out
</programlisting>
Your editor may even highlight the difference segments added to the previous original text, making them stand out quite clearly.</para>

<para>Since normally you want both to mark untranslated messages and to add differences to fuzzy messages before going through PO files, you can run the two sieves at once:
<programlisting language="bash">
$ posieve tag-untranslated,diff-previous -f toupdate.out
</programlisting>
</para>

<para>As you go through incomplete messages and update the translation, you should remove any <literal>fuzzy</literal> or <literal>untranslated</literal> flags, and previous fields in <literal>#| ...</literal> comments, so that in the end you can commit (upload, send) clean updated PO files.
But sometimes you will realize that you do not have enough time to update everything, and you will want to commit what you have completed by that moment. The problem is that there will still be some <literal>untranslated</literal> flags and embedded differences remaining throughout the files, and leftover embedded differences would e.g. interfere with subsequent merging. To automatically remove these remaining elements, you simply run the two sieves with the <option>strip</option> parameter:
<programlisting language="bash">
$ posieve tag-untranslated,diff-previous -s strip -f toupdate.out
</programlisting>
</para>

<para>When you update a PO file, for the sake of clarity and copyright you should also update its header with your personal data (the author comment, the <literal>Last-Translator:</literal> field, etc.). You could do this manually, but it is much simpler to set your data once in the <link linkend="sec-cmconfig">Pology user configuration</link> and run <link linkend="sv-update-header">the <command>update-header</command> sieve</link> over all updated files<footnote>
<para>If you use <link linkend="ch-ascript">ascription</link>, you should instead tell <command>poascribe</command> to update headers for you when committing. This is done by adding <literal>update-headers = yes</literal> to the <literal>[poascribe]</literal> section in user configuration.</para>
</footnote>:
<programlisting language="bash">
$ posieve update-header -f toupdate.out
</programlisting>
</para>

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-cbsumasc">
<title>Summit with Ascription</title>

<para>Summit and ascription workflows, described in <xref linkend="ch-summit"/> and <xref linkend="ch-ascript"/>, fit excellently together.
Ascription enables review-based release control on summit scatter (<xref linkend="sec-sucfgascf"/> shows how to do it), while summit removes the need for different ascription file trees per branch (and the associated effort at branch cycling). All the information that you need to set up a summit with ascription is explained in the chapters mentioned; the only thing left for this section is to show the order of actions and the resulting file structure, as implied by the technical requirements.</para>

<para>The first thing to set up is the summit. From the viewpoint of ascription, it is not important which summit mode is used; indeed, while the direct summit is still not advised, putting ascription on top would alleviate some of its disadvantages. In the following, the summit over dynamic templates is assumed, because it is a bit less involved than the summit over static templates, but nevertheless demonstrates all important points.</para>

<para>After configuring and initializing the summit over dynamic templates, let the summit top directory only (that is, omitting branches) look like this:
<programlisting>
l10n-nn/
    summit/
        foo-module/
            alpha.po
            bravo.po
            ...
        bar-module/
            kilo.po
            lima.po
            ...
        ...
    summit-config
</programlisting>
PO files in the summit are shown split into several submodules for generality. Unlike in the chapter on summit, the summit directory is placed here within a parent language directory, and the summit configuration file <filename>summit-config</filename> is in the parent directory instead of the summit directory. This is in order to have a clearer structure when the ascription is added.</para>

<para>The ascription is set up after the summit, such that it takes only the summit directory into account, having nothing to do with branches.
After the ascription is configured and initialized, the summit with ascription tree should look like this:
<programlisting>
l10n-nn/
    summit/
        foo-module/
            alpha.po
            bravo.po
            ...
        bar-module/
            kilo.po
            lima.po
            ...
        ...
    summit-ascript/
        foo-module/
            alpha.po
            bravo.po
            ...
        bar-module/
            kilo.po
            lima.po
            ...
        ...
    ascription-config
    summit-config
</programlisting>
Here the ascription tree root is set to <filename>summit-ascript/</filename> in the ascription configuration file <filename>ascription-config</filename>. With this, setting up the summit with ascription workflow is completed.</para>

<sect2 id="sec-cbsamultsum">
<title>Several Summits with Unified Ascription</title>

<para>In some circumstances you may want to have <emphasis>several</emphasis> separate summits with unified ascription. This may be the case, for example, when the translation project is such that the user interface and documentation PO files are put into separate file trees in branches, and most paired UI-documentation PO files have the same names.<footnote>
<para>On the other hand, you may still have a unified summit, by defining a <link linkend="sec-sustpptransf">path transformation</link> in the summit configuration to disambiguate UI and documentation PO files sharing the same domain name.</para>
</footnote></para>

<para>The parent language directory in this scenario, with summits and ascription set up, could look like this:
<programlisting>
l10n-nn/
    summit/
        ui/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            ...
            summit-config
        doc/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            summit-config
    summit-ascript/
        ui/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            ...
        doc/
            foo-module/
                alpha.po
                bravo.po
                ...
            bar-module/
                kilo.po
                lima.po
                ...
            ...
    ascription-config
</programlisting>
Note here the location of the <literal>summit-config</literal> files: each is within its own summit directory, namely <filename>summit/ui/</filename> and <filename>summit/doc/</filename>. On the other hand, there is a single <literal>ascription-config</literal> file, which covers all summits. This means that summit operations (merging, scattering) must be performed from within their respective summit directories (since <command>posummit</command> looks through the parent directories for the first <literal>summit-config</literal> file), while ascription operations can be performed from anywhere.</para>

<para>Having unified ascription is especially convenient in <link linkend="sec-sumntbasic">centralized summit maintenance</link>, since translators and reviewers are concerned only with ascription (running <command>poascribe</command> to commit, select for review, etc.) regardless of how many summits there are.</para>

</sect2>

</sect1>

</chapter>