0001 <?xml version="1.0" encoding="UTF-8"?>
0002 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
0003  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
0004 
0005 <chapter id="ch-sieve">
0006 
0007 <title>Sieving</title>
0008 
<para>A translator may want to apply batch-type operations to every message in a single PO file or in a collection of PO files, such as searching and replacing text, computing statistics, or validating. However, batch-processing tools for general plain text (<command>grep</command>, <command>sed</command>, <command>awk</command>, etc.) are not well suited to processing PO files. For example, when looking for a particular word, a generic search tool will not find it if it contains an <link linkend="sec-poaccel">accelerator marker</link>; or, when looking for a two-word phrase, a generic tool will miss it if it is <link linkend="sec-powrap">wrapped</link>. Therefore many tools tailored specifically for batch-processing messages in PO files have been developed, such as those bundled with <ulink url="http://www.gnu.org/software/gettext/">Gettext</ulink> (<command>msggrep</command>, <command>msgfilter</command>, <command>msgattrib</command>...), or those from <ulink url="http://translate.sourceforge.net/wiki/toolkit/index">Translate Toolkit</ulink> (<command>pocount</command>, <command>pogrep</command>, <command>pofilter</command>...).</para>
0010 
<para>Pology also provides a per-message batch-processing tool, <command>posieve</command>. What was the need for it, given the myriad of powerful tools already available? In accordance with the philosophy of Pology, <command>posieve</command> goes deeper than these other tools. <command>posieve</command> makes easy what is possible but awkward to achieve by combining generic command line tools. <command>posieve</command> is modular from the ground up, so that it is never a design problem to add new functionality to it, even when it is of narrow applicability. Users who know some Python can even write their own "plugins" for it. Several processing modules can be applied in a single run of <command>posieve</command>, possibly affecting each other, in ways not possible with generic shell piping and without requiring temporary intermediate files.</para>
0012 
0013 <!-- ======================================== -->
0014 <sect1 id="sec-svbasics">
0015 <title>Basic Usage of <command>posieve</command></title>
0016 
0017 <para>The <command>posieve</command> script itself is actually a simple shell for applying various processing modules, called <emphasis>sieves</emphasis>, to every message in one or more PO files. Some sieves can also request to operate on the header of the PO file, which <command>posieve</command> will then feed to them. A sieve can both examine and modify messages; if any message is modified, by default the modified PO file will be written out in place. Naturally, <command>posieve</command> has a number of options, but more interestingly, each sieve can define some parameters which determine its behavior. Pology comes with many internal sieves, which do things from general to obscure (possibly language or project specific), and users can define their own sieves.</para>
0018 
<para>Here is how you would run the <link linkend="sv-stats"><command>stats</command></link> sieve to collect statistics on all PO files in the <filename>frobaz/</filename> directory:
0020 <programlisting language="bash">
0021 $ posieve stats frobaz/
0022 </programlisting>
While PO files in <filename>frobaz/</filename> are being processed, you will see a progress bar with the current file and the number of files to process, and after some time the <command>stats</command> sieve will present its findings in a table.</para>
0024 
0025 <para>The first non-option argument in the <command>posieve</command> command line is the sieve name, and then any number of directory or file paths can be specified.
0026 <command>posieve</command> will consider file path arguments to be PO files, and recursively search directory paths to collect all files ending with <filename>.po</filename> or <filename>.pot</filename>. If no paths are specified, PO files to process will be collected from the current working directory.</para>
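<para>The path collection just described can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the code that <command>posieve</command> itself uses: the function name <literal>collect_po_files</literal> is hypothetical, and the sorting of names is added only to make the result predictable.
<programlisting language="python">
import os

def collect_po_files(paths):
    """Return PO/POT file paths for the given file and directory paths."""
    if not paths:
        paths = ["."]  # no paths given: use the current working directory
    collected = []
    for path in paths:
        if os.path.isdir(path):
            # Directories are searched recursively for .po and .pot files.
            for root, dirs, files in os.walk(path):
                dirs.sort()
                for name in sorted(files):
                    if name.endswith((".po", ".pot")):
                        collected.append(os.path.join(root, name))
        else:
            # File path arguments are taken to be PO files as they are.
            collected.append(path)
    return collected
</programlisting>
Note that only directory arguments are filtered by file name ending; a file given explicitly is accepted whatever its name.</para>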
0027 
0028 <para>If the sieve modifies a message and the new PO file is written out in place of the old, the user will be informed by an exclamation mark followed by the file path. An example of a sieve which modifies messages is the <link linkend="sv-tag-untranslated"><command>tag-untranslated</command></link> sieve; it adds
the <literal>untranslated</literal> flag to every untranslated message, so that you can look them up in a plain text editor (as opposed to a <link linkend="sec-poedlist">dedicated PO editor</link>):
0030 <programlisting language="bash">
0031 $ posieve tag-untranslated frobaz/
0032 ! frobaz/alfa.po
0033 ! frobaz/bravo.po
0034 ! frobaz/charlie.po
0035 Tagged 42 untranslated messages.
0036 </programlisting>
0037 <command>posieve</command> itself tracks message modifications and informs about modified PO files, whereas the final line in this example has been output by the <command>tag-untranslated</command> sieve. Sieves will frequently issue such final reports of their actions.</para>
0038 
<para id="p-svparam">If a sieve defines some parameters to control its behavior, these can be issued using the <option>-s</option> option. This option takes the parameter specification as its argument, which is of the form <literal><replaceable>name</replaceable>:<replaceable>value</replaceable></literal> or just <literal><replaceable>name</replaceable></literal> for switch-type parameters. More than one parameter can be issued by repeating <option>-s</option>. For example, the <command>stats</command> sieve can be instructed to take into account only messages with at most 5 words:
0040 <programlisting language="bash">
0041 $ posieve stats -s maxwords:5 frobaz/
0042 </programlisting>
0043 to show statistics in greater detail:
0044 <programlisting language="bash">
0045 $ posieve stats -s detail frobaz/
0046 </programlisting>
0047 or to ignore a certain accelerator marker and show bar-type statistics instead of tabular:
0048 <programlisting language="bash">
0049 $ posieve stats -s accel:_ -s msgbar frobaz/
0050 </programlisting>
0051 </para>
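<para>As a rough model of the two forms of parameter specification, the following Python sketch splits a specification into a name and a value. The function name <literal>parse_param_spec</literal> is hypothetical, and splitting at the first colon only is an assumption made here so that a value could itself contain colons; the actual parsing in <command>posieve</command> may differ.
<programlisting language="python">
def parse_param_spec(spec):
    """Split a parameter specification into (name, value)."""
    # Assumption: only the first colon separates the name from the value.
    name, sep, value = spec.partition(":")
    if not sep:
        return name, True  # switch-type parameter: no value given
    return name, value

print(parse_param_spec("maxwords:5"))  # ('maxwords', '5')
print(parse_param_spec("detail"))      # ('detail', True)
print(parse_param_spec("accel:_"))     # ('accel', '_')
</programlisting>
</para>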
0052 
<para><command>posieve</command> lists and describes its options with the usual <option>-h</option>/<option>--help</option> option. Help for a sieve can be requested by issuing <option>-H</option>/<option>--help-sieves</option> while a sieve name is present in the command line. All available internal sieves, with short descriptions, are listed using <option>-l</option>/<option>--list-sieves</option>.</para>
0054 
<para>Some sieves are language-specific, which can be seen by their names being of the form <command><replaceable>langcode</replaceable>:<replaceable>name</replaceable></command>. These sieves are primarily intended for use on PO files translated into the indicated language, but depending on particularities, may be applicable to several closely related languages. (A sieve which does language-specific things, but which is applicable to many languages, is more likely to be named as a general sieve.)</para>
0056 
0057 <para>If <link linkend="sec-cmshellcomp">shell completion</link> is active, it can be used to complete sieve names and their parameters.</para>
0058 
0059 </sect1>
0060 
0061 <!-- ======================================== -->
0062 <sect1 id="sec-svchains">
0063 <title>Sieve Chains</title>
0064 
<para>It is possible to issue several sieves at once, by passing a comma-separated list of sieve names to <command>posieve</command> in place of a single sieve name. This is called a <emphasis>sieve chain</emphasis>.</para>
0066 
<para>At minimum, chaining sieves is a performance-improving measure, since each PO file is opened (and possibly written out) only once, instead of once per sieve run. For example, you can in one run compute the statistics to see how many messages need to be updated and tag all untranslated messages:
0068 <programlisting language="bash">
0069 $ posieve stats,tag-untranslated frobaz/
0070 ! frobaz/alfa.po
0071 ! frobaz/bravo.po
0072 ! frobaz/charlie.po
0073 ... (table with statistics) ...
0074 Tagged 42 untranslated messages.
0075 </programlisting>
A message in the PO file is passed through each sieve in turn, in the order in which they are issued, before proceeding to the next message. If a sieve modifies the message, the next sieve in the chain will operate on that modified version of the message. This means that the ordering of sieves in the command line is significant in general, and that it is interchangeable only if the sieves in the chain are independent of each other (as in this example). Chain order also determines the order in which sieve reports are shown; if in this example the order had been <literal>tag-untranslated,stats</literal>, then first the tagged messages line would be written out, followed by the statistics table.</para>
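<para>The effect of chain order can be mimicked in a few lines of Python. This is a simplified model, not Pology's implementation: messages are plain dictionaries, and the names <literal>run_chain</literal>, <literal>tag_untranslated</literal> and <literal>record_flag</literal> are hypothetical. It shows that a sieve sees a modification only if the modifying sieve comes earlier in the chain.
<programlisting language="python">
def run_chain(messages, sieves):
    """Pass each message through all sieves before taking the next one."""
    for msg in messages:
        for sieve in sieves:
            sieve(msg)  # a sieve may modify the message in place

def tag_untranslated(msg):
    if not msg["msgstr"]:
        msg["flags"].add("untranslated")

seen = []

def record_flag(msg):
    # Sees the flag only if tag_untranslated ran earlier in the chain.
    seen.append("untranslated" in msg["flags"])

def make_messages():
    return [{"msgid": "Open", "msgstr": "", "flags": set()}]

run_chain(make_messages(), [tag_untranslated, record_flag])
run_chain(make_messages(), [record_flag, tag_untranslated])
print(seen)  # [True, False]
</programlisting>
</para>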
0077 
<para>Other than for performance, sieve chains are useful when messages should be modified in a particular way before a sieve gets to operate on them. A good example is when statistics are to be computed on PO files which contain old <link linkend="p-embctxt">embedded contexts</link>, where, if nothing were done, contexts would add to the word count of the original text. To avoid this, a <link linkend="sv-normctxt-sep">context normalization</link> sieve (which converts embedded contexts to <varname>msgctxt</varname>) can be chained with the statistics sieve, and <command>posieve</command> instructed not to write modifications to the PO file. If the embedded context is of the single-separator type, with separator character <literal>|</literal>, the sieve chain is:
0079 <programlisting language="bash">
0080 $ posieve --no-sync normctxt-sep,stats -s sep:'|' frobaz/
0081 Converted 21 separator-embedded contexts.
0082 ... (table with statistics) ...
0083 </programlisting>
0084 The <option>--no-sync</option> option prevents writing modified messages in the PO file on disk. Note that <literal>|</literal> as parameter value is quoted, because it would be interpreted as a shell pipe otherwise.</para>
0085 
<para>Finally, some sieves can stop messages from being pushed further through the sieve chain, so they can be used as a prefilter to other sieves. The archetypal example of this is the <link linkend="sv-find-messages"><command>find-messages</command></link> sieve, which stops non-matched messages from further sieving. For example, to include into statistics only the messages containing the word "quasar", this would be executed:
0087 <programlisting language="bash">
0088 $ posieve find-messages,stats -s msgid:quasar -s nomsg
0089 Found 12 messages satisfying the conditions.
0090 ... (table with statistics) ...
0091 </programlisting>
The <option>msgid:</option> parameter specifies the word (actually, a regular expression) to be looked up in the original text, while the <option>nomsg</option> parameter tells <command>find-messages</command> not to write out matched messages to standard output, which it would do by default. Note that no path was specified, meaning that all PO files in the current working directory and below will be sieved.</para>
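<para>Prefiltering can be modeled in Python as a sieve that signals that the current message should go no further. The exception <literal>StopMessage</literal> and the helper names below are hypothetical and only stand in for whatever mechanism Pology really uses; messages are again plain dictionaries.
<programlisting language="python">
import re

class StopMessage(Exception):
    """Hypothetical signal: do not pass this message further."""

def make_finder(pattern):
    regex = re.compile(pattern)
    def find(msg):
        if not regex.search(msg["msgid"]):
            raise StopMessage
    return find

def run_chain(messages, sieves):
    for msg in messages:
        try:
            for sieve in sieves:
                sieve(msg)
        except StopMessage:
            continue  # remaining sieves never see this message

counted = []

def count(msg):
    counted.append(msg["msgid"])

messages = [{"msgid": "Distant quasar"}, {"msgid": "Nearby star"}]
run_chain(messages, [make_finder("quasar"), count])
print(counted)  # ['Distant quasar']
</programlisting>
</para>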
0093 
<para>Examples of sieve chaining so far should have raised the following question: when several sieves are issued, to which of them are the parameters specified by <option>-s</option> options passed? The answer is that a parameter is sent to all sieves which accept a parameter of that name. Continuing the previous example, if message texts can contain the accelerator marker <literal>&amp;</literal>, this would be specified like this:
0095 <programlisting language="bash">
0096 $ posieve find-messages,stats -s msgid:quasar -s nomsg -s accel:'&amp;'
0097 </programlisting>
0098 <command>find-messages</command> will accept <option>accel</option> in order to also match messages like <literal>"Charybdis Q&amp;uasar"</literal>, while <command>stats</command> will use it to properly split text into words for counting them.</para>
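<para>The rule that a parameter goes to every sieve accepting its name can be sketched as follows. The function <literal>dispatch_params</literal> is hypothetical, and the sets of accepted parameter names are abbreviated to those mentioned in this chapter; the marker <literal>_</literal> is used in place of the one above only to keep the listing free of markup escapes.
<programlisting language="python">
def dispatch_params(issued, accepted):
    """Give each sieve the issued parameters whose names it accepts."""
    return {
        sieve: {name: value for name, value in issued.items() if name in names}
        for sieve, names in accepted.items()
    }

# Abbreviated, illustrative sets of accepted parameter names.
accepted = {
    "find-messages": {"msgid", "nomsg", "accel"},
    "stats": {"accel", "detail"},
}
issued = {"msgid": "quasar", "nomsg": True, "accel": "_"}
result = dispatch_params(issued, accepted)
print(result["stats"])  # {'accel': '_'}
</programlisting>
Here <literal>accel</literal> reaches both sieves, while <literal>msgid</literal> and <literal>nomsg</literal> reach only <command>find-messages</command>.</para>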
0099 
0100 </sect1>
0101 
0102 <!-- ======================================== -->
0103 <sect1 id="sec-svoptions">
0104 <title>Command Line Options</title>
0105 
0106 <para>
0107 Options specific to <command>posieve</command>:
0108 <variablelist>
0109 
0110 <varlistentry>
0111 <term><option>-a</option>, <option>--announce-entry</option></term>
0112 <listitem>
<para>A sieve may be buggy and crash or keep <command>posieve</command> in an infinite loop on a particular PO entry (header or message). When this option is given, each PO entry will be announced before sieving it, so that you can see exactly where the problem occurs.</para>
0114 </listitem>
0115 </varlistentry>
0116 
0117 <varlistentry>
0118 <term><option>-b</option>, <option>--skip-obsolete</option></term>
0119 <listitem>
0120 <para>By default <command>posieve</command> will process all messages in the PO file, including the obsolete. Sometimes sieving obsolete messages is not desired, for example when running translation validation sieves. This option can then be used to skip obsolete messages.</para>
0121 </listitem>
0122 </varlistentry>
0123 
0124 <varlistentry>
0125 <term><option>-c</option>, <option>--msgfmt-check</option></term>
0126 <listitem>
<para>For <command>posieve</command> to process the PO file, it is only necessary that basic PO syntax is valid, i.e. that <command>msgfmt</command> can compile the file. <command>msgfmt</command> also offers a stricter validation mode; to have <command>posieve</command> run this stricter validation on the PO file, issue this option. Invalid files will be reported and will not be sieved.</para>
0128 </listitem>
0129 </varlistentry>
0130 
0131 <varlistentry>
0132 <term><option>--force-sync</option></term>
0133 <listitem>
<para>When some messages in the PO file are modified, by default only those messages will be reformatted (e.g. strings wrapped as selected) when the PO file is written to disk. This makes <command>posieve</command> friendly to version control systems. Sometimes, however, you may want all messages to be reformatted, modified or not, and then you can issue this option.</para>
0135 </listitem>
0136 </varlistentry>
0137 
0138 <varlistentry>
0139 <term><option>-h</option>, <option>--help</option></term>
0140 <listitem>
0141 <para>General help on <command>posieve</command>.</para>
0142 </listitem>
0143 </varlistentry>
0144 
0145 <varlistentry>
0146 <term><option>-H</option>, <option>--help-sieves</option></term>
0147 <listitem>
<para><option>-h</option>/<option>--help</option> shows only the description of <command>posieve</command> and its options, while this option shows the descriptions and available parameters of issued sieves. For example:
0149 <programlisting language="bash">
0150 $ posieve find-messages,stats -H
0151 </programlisting>
0152 would output help for <command>find-messages</command> and <command>stats</command> sieves.</para>
0153 </listitem>
0154 </varlistentry>
0155 
0156 <varlistentry>
0157 <term><option>--issued-params</option></term>
0158 <listitem>
0159 <para>List of all sieve parameters and their values that would be issued. Used to check <link linkend="p-svparconfcmd">the interplay of command line and configuration</link> on sieve parameters.</para>
0160 </listitem>
0161 </varlistentry>
0162 
0163 <varlistentry>
0164 <term><option>-l</option>, <option>--list-sieves</option></term>
0165 <listitem>
0166 <para>List of all internal sieves, with short descriptions.</para>
0167 </listitem>
0168 </varlistentry>
0169 
0170 <varlistentry>
0171 <term><option>--list-options</option>; <option>--list-sieve-names</option>; <option>--list-sieve-params</option></term>
0172 <listitem>
<para>Simple listings of global options, internal sieve names, and parameters of issued sieves. Intended mainly for writing shell completion definitions.</para>
0174 </listitem>
0175 </varlistentry>
0176 
0177 <varlistentry>
0178 <term><option>-m <replaceable>OUTFILE</replaceable></option>, <option>--output-modified=<replaceable>OUTFILE</replaceable></option></term>
0179 <listitem>
<para>If some PO files were modified by sieving, you may want to follow up with a command to process only those files. <command>posieve</command> will by default output the paths of modified PO files, but also other information, which makes parsing this output for modified paths ungainly. Instead, this option can be used to specify a file to which the paths of all modified PO files will be written, one per line.</para>
0181 </listitem>
0182 </varlistentry>
0183 
0184 <varlistentry>
0185 <term><option>--no-skip</option></term>
0186 <listitem>
<para>If a sieve reports an error, <command>posieve</command> normally skips the problematic message and continues sieving the rest of the PO file, if possible. When this is not desired, this option tells <command>posieve</command> to abort with an error message in such cases.</para>
0188 </listitem>
0189 </varlistentry>
0190 
0191 <varlistentry>
0192 <term><option>--no-sync</option></term>
0193 <listitem>
<para>All messages modified by sieves are by default written back to disk, i.e. their PO files are modified. This option prevents modification of PO files. This comes in handy in two cases. One is when you want to check what effect a modifying sieve will have before actually accepting it (a "dry" run). The other case is when you use a modifying sieve as a filter for the next sieve in the chain, which only needs to examine messages.</para>
0195 </listitem>
0196 </varlistentry>
0197 
0198 <varlistentry>
0199 <term><option>-q</option>, <option>--quiet</option></term>
0200 <listitem>
0201 <para><command>posieve</command> normally shows the progress of sieving, which can be cancelled by this option. (Sieves will still output their own lines.)</para>
0202 </listitem>
0203 </varlistentry>
0204 
0205 <varlistentry>
0206 <term><option>-s <replaceable>PARAM</replaceable>[:<replaceable>VALUE</replaceable>]</option></term>
0207 <listitem>
0208 <para>The central option of <command>posieve</command>, which is used to issue parameters to sieves.</para>
0209 </listitem>
0210 </varlistentry>
0211 
0212 <varlistentry>
0213 <term><option>-S <replaceable>PARAM</replaceable></option></term>
0214 <listitem>
0215 <para>When a sieve parameter is issued <link linkend="p-confsvpar">through user configuration</link>, this option can be used to cancel it for one particular run.</para>
0216 </listitem>
0217 </varlistentry>
0218 
0219 <varlistentry>
0220 <term><option>--version</option></term>
0221 <listitem>
0222 <para>Release and copyright information on <command>posieve</command>.</para>
0223 </listitem>
0224 </varlistentry>
0225 
0226 <varlistentry>
0227 <term><option>-v</option>, <option>--verbose</option></term>
0228 <listitem>
0229 <para>More verbose output, where <command>posieve</command> shows the sieving modes, lists files which are being sieved, etc.</para>
0230 </listitem>
0231 </varlistentry>
0232 
0233 </variablelist>
0234 </para>
0235 
0236 <para>
0237 Options common with other Pology tools:
0238 <variablelist>
0239 
0240 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
0241             href="stdopt-filesfrom.docbook"/>
0242 
0243 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
0244             href="stdopt-incexc.docbook"/>
0245 
0246 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
0247             href="stdopt-colors.docbook"/>
0248 
0249 </variablelist>
0250 </para>
0251 
0252 </sect1>
0253 
0254 <!-- ======================================== -->
0255 <sect1 id="sec-svconfig">
0256 <title>User Configuration</title>
0257 
0258 <para>The following <link linkend="sec-cmconfig">configuration</link> fields can be used to modify general behavior of <command>posieve</command>:
0259 <variablelist>
0260 
0261 <varlistentry>
0262 <term><literal>[posieve]/skip-on-error=[*yes|no]</literal></term>
0263 <listitem>
0264 <para>Setting to <literal>no</literal> is counterpart to <option>--no-skip</option> command line option.</para>
0265 </listitem>
0266 </varlistentry>
0267 
0268 <varlistentry>
0269 <term><literal>[posieve]/msgfmt-check=[yes|*no]</literal></term>
0270 <listitem>
0271 <para>Setting to <literal>yes</literal> is counterpart to <option>-c</option>/<option>--msgfmt-check</option> command line option.</para>
0272 </listitem>
0273 </varlistentry>
0274 
0275 <varlistentry>
0276 <term><literal>[posieve]/skip-obsolete=[yes|*no]</literal></term>
0277 <listitem>
0278 <para>Setting to <literal>yes</literal> is counterpart to <option>-b</option>/<option>--skip-obsolete</option> command line option.</para>
0279 </listitem>
0280 </varlistentry>
0281 
0282 </variablelist>
0283 For configuration fields that have counterpart command line options, the command line option always takes precedence if issued.</para>
0284 
0285 <para id="p-confsvpar">Configuration can also be used to issue sieve parameters, by specifying <literal>[posieve]/param-<replaceable>name</replaceable></literal> fields. For example, parameters <option>transl</option> (a switch) and <option>accel</option> (with value <literal>&amp;</literal>) are issued to all sieves that accept them by writing:
0286 <programlisting language="ini">
0287 [posieve]
0288 param-transl = yes
0289 param-accel = &amp;
0290 </programlisting>
0291 </para>
0292 
0293 <para>To issue parameters only to certain sieves, parameter name can be followed
0294 by a sieve list of the form <literal>/<replaceable>sieve1</replaceable>,<replaceable>sieve2</replaceable>,...</literal>; to <emphasis>prevent</emphasis> the parameter from being issued only to certain sieves, prepend <literal>~</literal> to the sieve list. For example:
0295 <programlisting language="ini">
0296 [posieve]
0297 param-transl/find-messages = yes  # only for find-messages
0298 param-accel/~stats = &amp;            # not for stats
0299 </programlisting>
0300 </para>
0301 
<para>The same parameter can sometimes be repeated in the command line, when it is logically meaningful to provide several values of that type to a sieve. However, same-name fields cannot be used in configuration to supply several values, because they override each other. Instead, a dot and a unique string (within the sequence) can be appended to the parameter name to make it a unique configuration field:
0303 <programlisting language="ini">
0304 [posieve]
0305 param-accel.0 = &amp;
0306 param-accel.1 = _
0307 </programlisting>
0308 Strings after the dot can be anything, but a sequence of numbers or letters in alphabetical order is the least confusing choice.</para>
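<para>The structure of these configuration field names can be summarized with a small Python sketch. It is an illustration only: the function <literal>parse_param_field</literal> is hypothetical, and the sketch assumes that, if both a dot suffix and a sieve list were present, the suffix would come before the slash, which the text above does not specify.
<programlisting language="python">
def parse_param_field(field):
    """Split a [posieve] parameter field name into its parts."""
    if not field.startswith("param-"):
        raise ValueError("not a sieve parameter field: %s" % field)
    rest = field[len("param-"):]
    # Optional sieve list after a slash; a leading ~ negates it.
    rest, _, sieve_list = rest.partition("/")
    exclude = sieve_list.startswith("~")
    sieves = [s for s in sieve_list.lstrip("~").split(",") if s]
    # Optional uniquifying string after a dot.
    name, _, suffix = rest.partition(".")
    return {"name": name, "suffix": suffix,
            "sieves": sieves, "exclude": exclude}

for field in ("param-transl/find-messages",
              "param-accel/~stats",
              "param-accel.0"):
    print(parse_param_field(field))
</programlisting>
An empty sieve list in the result means that the parameter is offered to all sieves that accept it.</para>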
0309 
<para id="p-svparconfcmd">Sieve parameters should be issued from the configuration only as a matter of convenience, when they are almost always used in sieve runs. But occasionally the parameter issued from the configuration is not appropriate for the given run. Instead of going to the configuration and commenting the parameter out temporarily, it can be cancelled in the command line using the <option>-S</option> option (note capital S) followed by the parameter name. You can use the <option>--issued-params</option> option to confirm which parameters will be issued after both the command line and the configuration have been taken into account.</para>
0311 
0312 </sect1>
0313 
0314 <!-- ======================================== -->
0315 <sect1 id="sec-svinternal">
0316 <title>Internal Sieves</title>
0317 
<para>This section describes the sieves which are contained in the Pology distribution and provides instructions for their use.</para>
0319 
0320 <para><!--Some sieve parameters are mandatory, i.e. they have to be issued when the sieve is run. Parameters for which this is the case will have (*) added to their header in the parameter list.--> Parameters which take a value (which are not switches) may or may not have a default value, and when they do, it will be given in square brackets (<literal>[...]</literal>) in the header.</para>
0321 
0322 <sect2 id="sv-apply-filter">
0323 <title><command>apply-filter</command></title>
0324 
0325 <para><command>apply-filter</command> is used to pipe translation through one or several <emphasis>hooks</emphasis> (see <xref linkend="sec-cmhooks"/>). The hooks may modify the translation, validate it, or do something else. More precisely, the following hook types are applicable:
0326 <itemizedlist>
0327 <listitem>
0328 <para>F1A, F3A, F3C, to modify the translation and write changes back to the PO file;</para>
0329 </listitem>
0330 <listitem>
0331 <para>V1A, V3A, V3C, to validate the translation, with standard validation output (highlighted spans and problem messages);</para>
0332 </listitem>
0333 <listitem>
0334 <para>S1A, S3A, S3C, for any side-effect processing on translation (but no modification).</para>
0335 </listitem>
0336 </itemizedlist>
0337 </para>
0338 
0339 <para>Parameters:
0340 <variablelist>
0341 
0342 <varlistentry>
0343 <term><option>filter:<replaceable>hookspec</replaceable></option></term>
0344 <listitem>
0345 <para>The hook specification. Can be repeated to add several hooks, which are then applied in the order of specification.</para>
0346 </listitem>
0347 </varlistentry>
0348 
0349 <varlistentry>
0350 <term><option>showmsg</option></term>
0351 <listitem>
0352 <para>Report every modified message to standard output. (For validation hooks, message is automatically reported if not valid.)</para>
0353 </listitem>
0354 </varlistentry>
0355 
0356 </variablelist>
0357 </para>
0358 
0359 </sect2>
0360 
0361 <sect2 id="sv-apply-header-filter">
0362 <title><command>apply-header-filter</command></title>
0363 
0364 <para><command>apply-header-filter</command> is the counterpart to <command>apply-filter</command> to operate on headers instead of messages. Here the applicable hook types are accordingly F4B, V4B, S4B.</para>
0365 
0366 <para>Parameters:
0367 <variablelist>
0368 
0369 <varlistentry>
0370 <term><option>filter:<replaceable>hookspec</replaceable></option></term>
0371 <listitem>
0372 <para>The hook specification. Can be repeated to add several hooks, which are then applied in the order of specification.</para>
0373 </listitem>
0374 </varlistentry>
0375 
0376 </variablelist>
0377 </para>
0378 
0379 </sect2>
0380 
0381 <sect2 id="sv-bad-patterns">
0382 <title><command>bad-patterns</command></title>
0383 
0384 <caution><para>This sieve is deprecated. Use <link linkend="sv-check-rules"><command>check-rules</command></link> instead, which applies Pology's <link linkend="sec-lgrules">validation rules</link>.</para></caution>
0385 
0386 <para>Sometimes it is possible to use simple pattern matching to discover things that should never appear in the text, such as common grammar or orthographical errors. <command>bad-patterns</command> can apply such patterns to translation, either as plain substring matching or <link linkend="sec-cmregex">regular expressions</link>. Patterns can be given as parameters, or more conveniently, read from files.</para>
0387 
0388 <para>Parameters:
0389 <variablelist>
0390 
0391 <varlistentry>
0392 <term><option>pattern:<replaceable>string</replaceable></option></term>
0393 <listitem>
0394 <para>The pattern to search for. Can be repeated to search for several patterns.</para>
0395 </listitem>
0396 </varlistentry>
0397 
0398 <varlistentry>
0399 <term><option>fromfile:<replaceable>path</replaceable></option></term>
0400 <listitem>
<para>Read patterns to search for from the file. Each line contains one pattern. If a line starts with <literal>#</literal>, it is treated as a comment. Empty lines are ignored. Trailing and leading whitespace is removed from patterns; if it is significant, it can be given inside the <literal>[...]</literal> regex operator. This parameter can be repeated to read patterns from several files.</para>
0402 </listitem>
0403 </varlistentry>
0404 
0405 <varlistentry>
0406 <term><option>rxmatch</option></term>
0407 <listitem>
0408 <para>By default patterns are treated as plain substrings. This parameter requests to treat patterns as regular expressions.</para>
0409 </listitem>
0410 </varlistentry>
0411 
0412 <varlistentry>
0413 <term><option>casesens</option></term>
0414 <listitem>
<para>By default patterns are case-insensitive. This parameter makes them case-sensitive.</para>
0416 </listitem>
0417 </varlistentry>
0418 
0419 </variablelist>
0420 </para>
0421 
0422 </sect2>
0423 
0424 <sect2 id="sv-check-docbook4">
0425 <title><command>check-docbook4</command></title>
0426 
0427 <para><command>check-docbook4</command> checks PO files extracted from Docbook 4.x files. Docbook is an XML format, typically used for documenting software.</para>
0428 
0429 <para>Parameters:
0430 <variablelist>
0431 
0432 <varlistentry>
0433 <term><option>showmsg</option></term>
0434 <listitem>
<para>Instead of just showing the message location and problem description, also show the complete message with problematic segments highlighted.</para>
0436 </listitem>
0437 </varlistentry>
0438 
0439 <varlistentry>
0440 <term><option>lokalize</option></term>
0441 <listitem>
0442 <para>Open the PO file on reported messages in Lokalize. Lokalize must be already running with the project that contains the PO file opened.</para>
0443 </listitem>
0444 </varlistentry>
0445 
0446 </variablelist>
0447 </para>
0448 
0449 <para>Currently performed checks:
0450 <itemizedlist>
0451 
0452 <listitem>
<para>Markup validity. Docbook is a complex XML format, and nothing short of full validation of XML files generated from translated PO files can show if the translation is technically valid. Therefore <command>check-docbook4</command> checks only well-formedness, whether tags are defined by Docbook, and some nesting constraints, and only at the level of a single message. But this is already enough to catch the great majority of usual translation errors.</para>
0454 
0455 <para>This check can be skipped on a message by adding to it the <literal>no-check-markup</literal> <link linkend="p-trflag">translator flag</link>.</para>
0456 </listitem>
0457 
0458 <listitem>
0459 <para>Message insertion placeholders. Some extractors of Docbook split out into standalone messages contextually separate units that are found in the middle of flowing paragraphs (e.g. footnotes). When that happens, a special placeholder is left in the originating message, so that the markup can be reconstructed when the translated Docbook file is built. Such placeholders must be carried into translation.</para>
0460 </listitem>
0461 
0462 </itemizedlist>
0463 </para>
0464 
0465 </sect2>
0466 
0467 <sect2 id="sv-check-grammar">
0468 <title><command>check-grammar</command></title>
0469 
<para><command>check-grammar</command> checks translation with LanguageTool, an open source grammar and style checker (<ulink url="http://www.languagetool.org/">http://www.languagetool.org/</ulink>). LanguageTool supports a number of languages to a greater or lesser extent, which you can check on <ulink url="http://www.languagetool.org/languages/">its web site</ulink>.</para>
0471 
<para>LanguageTool can be run as a standalone program or in client-server mode, and this sieve expects the latter. This means that LanguageTool has to be up and running before this sieve is run. Messages in which problems are discovered are reported to standard output.</para>
0473 
0474 <para>Parameters:
0475 <variablelist>
0476 
0477 <varlistentry>
0478 <term><option>lang:<replaceable>code</replaceable></option></term>
0479 <listitem>
0480 <para>The language code for which to apply the rules. If not given, it will be read from each PO file in turn, and if not found there either, an error will be signaled.</para>
0481 </listitem>
0482 </varlistentry>
0483 
0484 <varlistentry>
0485 <term><option>host:<replaceable>hostname</replaceable></option> [<literal>localhost</literal>]</term>
0486 <listitem>
0487 <para>Name of the host where the LanguageTool server is running. The default value of <literal>localhost</literal> means that it is running on the same computer where the sieve is run.</para>
0488 </listitem>
0489 </varlistentry>
0490 
0491 <varlistentry>
0492 <term><option>port:<replaceable>number</replaceable></option> [<literal>8081</literal>]</term>
0493 <listitem>
0494 <para>TCP port of the host on which the LanguageTool server listens for queries.</para>
0495 </listitem>
0496 </varlistentry>
0497 
0498 </variablelist>
0499 </para>
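
<para>As a rough illustration only, assuming the usual <command>posieve</command> convention of passing sieve parameters with the <option>-s</option> option, a run against a LanguageTool server on the default host and port might look like the following. The language code <literal>fr</literal> and the path <filename>ui/</filename> are hypothetical:
<programlisting language="bash">
$ posieve check-grammar -s lang:fr -s host:localhost -s port:8081 ui/
</programlisting>
Since <literal>localhost</literal> and <literal>8081</literal> are the defaults, the last two parameters could be left out here; they are spelled out only to show the <replaceable>name</replaceable>:<replaceable>value</replaceable> form.</para>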
0500 
0501 </sect2>
0502 
0503 <sect2 id="sv-check-kde4">
0504 <title><command>check-kde4</command></title>
0505 
0506 <para><command>check-kde4</command> checks PO files extracted from program code based on the KDE4 library and its translation system. Note that this really means what it says; this sieve should <emphasis>not</emphasis> be used to check just any PO file which happens to be part of the KDE project (e.g. PO files covering .desktop files, pure Qt code, etc.).</para>
0507 
0508 <para>Parameters:
0509 <variablelist>
0510 
0511 <varlistentry>
0512 <term><option>strict</option></term>
0513 <listitem>
0514 <para>Partly due to historical reasons, and partly due to programmers being sloppy, the original text itself is sometimes not valid by some checks. By default, when the original is not valid, the translation is not expected to be valid either, i.e. it is not checked. This parameter requires that the translation is always checked, regardless of the validity of the original (problems can almost always be avoided in the translation).</para>
0515 </listitem>
0516 </varlistentry>
0517 
0518 <varlistentry>
0519 <term><option>lokalize</option></term>
0520 <listitem>
0521 <para>Open the PO file on reported messages in Lokalize. Lokalize must be already running with the project that contains the PO file opened.</para>
0522 </listitem>
0523 </varlistentry>
0524 
0525 </variablelist>
0526 </para>
0527 
0528 <para>Currently performed checks:
0529 <itemizedlist>
0530 
0531 <listitem>
0532 <para>Markup validity. KDE4 messages can contain a mix of <ulink url="http://techbase.kde.org/Development/Tutorials/Localization/i18n_Semantics">KUIT</ulink> and <ulink url="http://doc.qt.nokia.com/4.6/richtext-html-subset.html">Qt rich text</ulink> markup. Although Qt rich text does not have to be well-formed in the XML sense, this check expects well-formedness to be preserved in translation if the original is well-formed (also see the <option>strict</option> parameter).</para>
0533 
0534 <para>This check can be skipped on a message by adding to it the <literal>no-check-markup</literal> <link linkend="p-trflag">translator flag</link>.</para>
0535 </listitem>
0536 
0537 </itemizedlist>
0538 </para>
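
<para>A minimal sketch of a stricter run, assuming the usual <option>-s</option> option of <command>posieve</command> for passing sieve parameters; the path <filename>messages/kdebase/</filename> is a hypothetical location of KDE4 code catalogs:
<programlisting language="bash">
$ posieve check-kde4 -s strict messages/kdebase/
</programlisting>
With <option>strict</option>, translations are checked even where the original text itself fails a check, so somewhat more reports can be expected than in a default run.</para>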
0539 
0540 </sect2>
0541 
0542 <sect2 id="sv-check-rules">
0543 <title><command>check-rules</command></title>
0544 
0545 <para><command>check-rules</command> applies language- and project-dependent Pology <emphasis>validation rules</emphasis> to translation. See <xref linkend="sec-lgrules"/> for a detailed discussion of writing and applying rules.</para>
0546 
0547 <para>Parameters:
0548 <variablelist>
0549 
0550 <varlistentry>
0551 <term><option>lang:<replaceable>code</replaceable></option></term>
0552 <listitem>
0553 <para>The language code for which to apply the rules. If not given, it will be read from each PO file in turn, and if not found there either, an error will be signaled.</para>
0554 </listitem>
0555 </varlistentry>
0556 
0557 <varlistentry>
0558 <term><option>env:<replaceable>environment</replaceable></option></term>
0559 <listitem>
0560 <para>The language environment for which to apply the rules (see <xref linkend="sec-lglangenv"/>). Several environments can be given as a comma-separated list, in which case a later environment in the list takes precedence when rules conflict. If not given, it may also be read from PO files (see <link linkend="hdr-x-environment"><literal>X-Environment</literal></link> in <xref linkend="sec-cmheader"/>).</para>
0561 </listitem>
0562 </varlistentry>
0563 
0564 <varlistentry>
0565 <term><option>envonly</option></term>
0566 <listitem>
0567 <para>When a language environment is given, only the rules explicitly belonging to it are applied, while general rules for the selected language are ignored.</para>
0568 </listitem>
0569 </varlistentry>
0570 
0571 <varlistentry>
0572 <term><option>rule:<replaceable>identifiers</replaceable></option></term>
0573 <listitem>
0574 <para>Comma-separated list of rule identifiers, to apply only those rules. If a rule selected in this way is disabled in its definition, this enables it.</para>
0575 </listitem>
0576 </varlistentry>
0577 
0578 <varlistentry>
0579 <term><option>rulerx:<replaceable>regexes</replaceable></option></term>
0580 <listitem>
0581 <para>Like <option>rule</option>, but the values are interpreted as regular expressions by which to match rule identifiers.</para>
0582 </listitem>
0583 </varlistentry>
0584 
0585 <varlistentry>
0586 <term><option>norule:<replaceable>identifiers</replaceable></option></term>
0587 <listitem>
0588 <para>Inverse of the <option>rule</option> parameter: selected rules are not applied, and all others are applied.</para>
0589 </listitem>
0590 </varlistentry>
0591 
0592 <varlistentry>
0593 <term><option>norulerx:<replaceable>regexes</replaceable></option></term>
0594 <listitem>
0595 <para>Inverse of the <option>rulerx</option> parameter: selected rules are not applied, and all others are applied.</para>
0596 </listitem>
0597 </varlistentry>
0598 
0599 <varlistentry>
0600 <term><option>stat</option></term>
0601 <listitem>
0602 <para>Rules can take time to apply to all sieved PO files. This parameter requests that some statistics of rule application be written out at the end of sieving.</para>
0603 </listitem>
0604 </varlistentry>
0605 
0606 <varlistentry>
0607 <term><option>accel:<replaceable>characters</replaceable></option></term>
0608 <listitem>
0609 <para>Characters to consider as <link linkend="sec-poaccel">accelerator markers</link>. If not given, they may be read from sieved PO files. Note that this parameter in itself does nothing: it only makes it possible for a particular rule or group of rules to remove the accelerator before matching.</para>
0610 </listitem>
0611 </varlistentry>
0612 
0613 <varlistentry>
0614 <term><option>markup:<replaceable>types</replaceable></option></term>
0615 <listitem>
0616 <para>The type of text markup used in messages, by keyword. It can also be a comma-separated list of keywords. If not given, it may be read from sieved PO files. See the description of <link linkend="hdr-x-text-markup"><literal>X-Text-Markup</literal></link> in <xref linkend="sec-cmheader"/> for the list of markup keywords currently known to Pology. As with the <option>accel</option> parameter, this parameter only enables rules to remove the markup (or do something else) before matching.</para>
0617 </listitem>
0618 </varlistentry>
0619 
0620 <varlistentry>
0621 <term><option>xml:<replaceable>file</replaceable></option></term>
0622 <listitem>
0623 <para>By default, messages failed by rules are reported to standard output; this parameter requests that they be written to the given file in a custom (but simple) XML format. This also causes results to be cached: on subsequent runs of <command>check-rules</command> only modified PO files will be checked again, and results for non-modified files will be pulled from the cache. The cache can be found in the <filename>$HOME/.pology-check_rules-cache/</filename> directory.</para>
0624 </listitem>
0625 </varlistentry>
0626 
0627 <varlistentry>
0628 <term><option>rfile:<replaceable>file</replaceable></option></term>
0629 <listitem>
0630 <para>By default, internal Pology rules are applied. This parameter can be used to apply external rules instead, defined in the given rule file.</para>
0631 </listitem>
0632 </varlistentry>
0633 
0634 <varlistentry>
0635 <term><option>rdir:<replaceable>directory</replaceable></option></term>
0636 <listitem>
0637 <para>Like <option>rfile</option>, but external rules are read from a directory containing any number of rule files.</para>
0638 </listitem>
0639 </varlistentry>
0640 
0641 <varlistentry>
0642 <term><option>branch:<replaceable>branch</replaceable></option></term>
0643 <listitem>
0644 <para>Apply rules only to messages from the given branch (<link linkend="ch-summit">summit</link>). Several branches may be given as a comma-separated list.</para>
0645 </listitem>
0646 </varlistentry>
0647 
0648 <varlistentry>
0649 <term><option>showfmsg</option></term>
0650 <listitem>
0651 <para>Rules are sometimes applied to the filtered instead of the original message, and when such a message fails, it may not be obvious what triggered the rule. This parameter requests that the filtered message be written out too when the original message is reported.</para>
0652 </listitem>
0653 </varlistentry>
0654 
0655 <varlistentry>
0656 <term><option>nomsg</option></term>
0657 <listitem>
0658 <para>When a message is failed, by default it is output in full together with the problem description. This parameter requests that only the problem description is output.</para>
0659 </listitem>
0660 </varlistentry>
0661 
0662 <varlistentry>
0663 <term><option>lokalize</option></term>
0664 <listitem>
0665 <para>Open the PO file on reported messages in Lokalize. Lokalize must be already running with the project that contains the PO file opened.</para>
0666 </listitem>
0667 </varlistentry>
0668 
0669 <varlistentry>
0670 <term><option>mark</option></term>
0671 <listitem>
0672 <para>To each failed message a <literal>failed-rule</literal> flag is added, modifying the PO file. Modified files can then be opened in the editor, and failed messages looked up by this flag.</para>
0673 </listitem>
0674 </varlistentry>
0675 
0676 <varlistentry>
0677 <term><option>byrule</option></term>
0678 <listitem>
0679 <para>As usual for sieving, by default each failed message is output as soon as it is processed. With this parameter, failed messages are instead output grouped by rule, with rules sorted alphabetically by their identifiers. Note that this means there will be no output until all messages have been sieved.</para>
0680 </listitem>
0681 </varlistentry>
0682 
0683 <varlistentry>
0684 <term><option>ruleinfo</option></term>
0685 <listitem>
0686 <para>Shows information on loading of rules during sieving, including switching of environments and listing manually selected rules.</para>
0687 </listitem>
0688 </varlistentry>
0689 
0690 </variablelist>
0691 </para>
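
<para>To show how several of these parameters combine, here is a hedged sketch of two runs, assuming the usual <option>-s</option> option of <command>posieve</command> for passing sieve parameters. The language code <literal>sr</literal>, the environment <literal>kde</literal>, the rule identifiers, the rule file <filename>my.rules</filename> and the path <filename>ui/</filename> are all hypothetical:
<programlisting language="bash">
$ posieve check-rules -s lang:sr -s env:kde -s norule:ruleid1,ruleid2 -s stat ui/
$ posieve check-rules -s lang:sr -s rfile:my.rules -s byrule -s nomsg ui/
</programlisting>
The first run applies all internal rules for the language and environment except the two excluded ones, and writes out statistics at the end. The second run applies only the rules from the external file, and reports problem descriptions grouped by rule, without the full messages.</para>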
0692 
0693 <para>One or more rules can be disabled on a particular message in the PO file itself, by adding a special translator comment that starts with <literal>skip-rule:</literal> and continues with a comma-separated list of rule identifiers:
0694 <programlisting language="po">
0695 # skip-rule: <replaceable>ruleid1</replaceable>, <replaceable>ruleid2</replaceable>, ...
0696 </programlisting>
0697 </para>
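
<para>Placed in a complete message, such a comment might look as follows. This is only a sketch: the rule identifiers and the message text are placeholders, not taken from any real rule set or PO file:
<programlisting language="po">
# skip-rule: ruleid1, ruleid2
msgid "Original text"
msgstr "Translated text"
</programlisting>
Only the listed rules are skipped for this message; all other selected rules are still applied to it.</para>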
0698 
0699 </sect2>
0700 
0701 <sect2 id="sv-check-spell">
0702 <title><command>check-spell</command></title>
0703 
0704 <para><command>check-spell</command> checks spelling of translation by splitting it into words and passing them through GNU Aspell (<ulink url="http://aspell.net/">http://aspell.net/</ulink>). This sieve is a more specific counterpart to <link linkend="sv-check-spell-ec"><command>check-spell-ec</command></link>: it exposes some options specific to Aspell and requires no external Python modules, only the Aspell installation. Also read <xref linkend="sec-lgspell"/> for details on spell-checking in Pology.</para>
0705 
0706 <para><command>check-spell</command> behaves mostly the same as <command>check-spell-ec</command>, and accepts all the same parameters with the same meanings; the exception is the <option>provider</option> parameter, which is not present here since Aspell is the fixed provider. Only the parameters specific to this sieve are described in the following:
0707 <variablelist>
0708 
0709 <varlistentry>
0710 <term><option>enc:<replaceable>encoding</replaceable></option></term>
0711 <listitem>
0712 <para>The encoding in which the text should be sent to Aspell.</para>
0713 </listitem>
0714 </varlistentry>
0715 
0716 <varlistentry>
0717 <term><option>var:<replaceable>variety</replaceable></option></term>
0718 <listitem>
0719 <para>The variety of the Aspell dictionary, if any.</para>
0720 </listitem>
0721 </varlistentry>
0722 
0723 <varlistentry>
0724 <term><option>skip:<replaceable>regex</replaceable></option></term>
0725 <listitem>
0726 <para>Words matched by this regular expression are not sent to the spell-checker.</para>
0727 </listitem>
0728 </varlistentry>
0729 
0730 <varlistentry>
0731 <term><option>case</option></term>
0732 <listitem>
0733 <para>Matching patterns given as parameter values (e.g. with <option>skip:</option>) are by default case-insensitive, and this parameter switches them to case-sensitive.</para>
0734 </listitem>
0735 </varlistentry>
0736 
0737 <varlistentry>
0738 <term><option>xml:<replaceable>file</replaceable></option></term>
0739 <listitem>
0740 <para>By default, messages with unknown words are reported to standard output; this parameter requests that they be written to the given file in a custom (but simple) XML format.</para>
0741 </listitem>
0742 </varlistentry>
0743 
0744 </variablelist>
0745 </para>
0746 
0747 <para>Aspell can be configured for use in Pology through user configuration, so that it is not necessary to issue some parameters on every run. See <xref linkend="sec-cmcfgaspell"/>.</para>
0748 
0749 <caution><para>This sieve is deprecated. Use <link linkend="sv-check-spell-ec"><command>check-spell-ec</command></link> instead, which can apply various spell-checking backends through Enchant.</para></caution>
0750 
0751 </sect2>
0752 
0753 <sect2 id="sv-check-spell-ec">
0754 <title><command>check-spell-ec</command></title>
0755 
0756 <para><command>check-spell-ec</command> uses the Enchant library (<ulink url="http://www.abisource.com/projects/enchant/">http://www.abisource.com/projects/enchant/</ulink>) through the PyEnchant Python module (<ulink url="http://pyenchant.sourceforge.net">http://pyenchant.sourceforge.net</ulink>) to provide uniform access to different spell-checkers, such as Aspell, Ispell, Hunspell, etc. Translation is first split into words, possibly eliminating markup and other literal content, and the words are then fed to the spell-checker. Messages containing unknown words are reported to standard output, with a list of replacement suggestions.</para>
0757 
0758 <para>Parameters:
0759 <variablelist>
0760 
0761 <varlistentry>
0762 <term><option>provider:<replaceable>keyword</replaceable></option></term>
0763 <listitem>
0764 <para>The spell-checker that Enchant should use. The value is one of the keywords defined by Enchant (e.g. <literal>aspell</literal>, <literal>myspell</literal>...); the available keywords can be seen by running the <command>enchant-lsmod</command> command (only providers available on the system are shown). If not given either by this parameter or in user configuration, Enchant will try to select a provider on its own.</para>
0765 </listitem>
0766 </varlistentry>
0767 
0768 <varlistentry>
0769 <term><option>lang:<replaceable>code</replaceable></option></term>
0770 <listitem>
0771 <para>The language code for which the spelling is checked. If not given, it will be read from each PO file in turn, and if not found there either, an error will be signaled.</para>
0772 </listitem>
0773 </varlistentry>
0774 
0775 <varlistentry>
0776 <term><option>env:<replaceable>environment</replaceable></option></term>
0777 <listitem>
0778 <para>The language environment for which to include supplemental dictionaries (see <xref linkend="sec-lglangenv"/>). Several environments can be given as a comma-separated list, in which case the union of their dictionaries is used. If not given, environments may be read from PO files (see <link linkend="hdr-x-environment"><literal>X-Environment</literal></link> in <xref linkend="sec-cmheader"/>) or from user configuration.</para>
0779 </listitem>
0780 </varlistentry>
0781 
0782 <varlistentry>
0783 <term><option>accel:<replaceable>characters</replaceable></option></term>
0784 <listitem>
0785 <para>Characters to consider as <link linkend="sec-poaccel">accelerator markers</link>, to remove them before splitting text into words. If not given, they may be read from PO files (see <link linkend="hdr-x-accelerator-marker"><literal>X-Accelerator-Marker</literal></link> in <xref linkend="sec-cmheader"/>).</para>
0786 </listitem>
0787 </varlistentry>
0788 
0789 <varlistentry>
0790 <term><option>markup:<replaceable>types</replaceable></option></term>
0791 <listitem>
0792 <para>The type of text markup used in messages, by keyword. It can also be a comma-separated list of keywords. If not given, it may be read from PO files (see <link linkend="hdr-x-text-markup"><literal>X-Text-Markup</literal></link> in <xref linkend="sec-cmheader"/>; there the list of markup keywords currently known to Pology is given as well).</para>
0793 </listitem>
0794 </varlistentry>
0795 
0796 <varlistentry>
0797 <term><option>skip:<replaceable>regex</replaceable></option></term>
0798 <listitem>
0799 <para>Words matched by this regular expression are not sent to the spell-checker.</para>
0800 </listitem>
0801 </varlistentry>
0802 
0803 <varlistentry>
0804 <term><option>case</option></term>
0805 <listitem>
0806 <para>Matching patterns given as parameter values (e.g. with <option>skip:</option>) are by default case-insensitive, and this parameter switches them to case-sensitive.</para>
0807 </listitem>
0808 </varlistentry>
0809 
0810 <varlistentry>
0811 <term><option>filter:<replaceable>hookspec</replaceable></option></term>
0812 <listitem>
0813 <para>The hook to modify the text before splitting into words and spell-checking them (see <xref linkend="sec-cmhooks"/>). The hook type must be F1A, F3A, or F3C. The parameter can be repeated to add several hooks, which are then applied in the order of specification.</para>
0814 </listitem>
0815 </varlistentry>
0816 
0817 <varlistentry>
0818 <term><option>suponly</option></term>
0819 <listitem>
0820 <para>By default, internal supplemental spelling dictionaries are added to the system dictionary of the selected spell-checker. This parameter can be issued to instead use only internal dictionaries and not the system dictionary.</para>
0821 </listitem>
0822 </varlistentry>
0823 
0824 <varlistentry>
0825 <term><option>list</option></term>
0826 <listitem>
0827 <para>By default, when an unknown word is found, the complete message is output, with the problematic word highlighted and possibly the replacement suggestions. With this parameter, only a plain sorted list of unknown words, one per line, is output at the end of sieving. This is useful when a lot of false positives are expected, to quickly add them to the supplemental dictionary.</para>
0828 </listitem>
0829 </varlistentry>
0830 
0831 <varlistentry>
0832 <term><option>lokalize</option></term>
0833 <listitem>
0834 <para>Open the PO file on messages containing unknown words in Lokalize. Lokalize must be already running with the project that contains the PO file opened.</para>
0835 </listitem>
0836 </varlistentry>
0837 
0838 </variablelist>
0839 </para>
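
<para>The following is a sketch of two typical runs, assuming the usual <option>-s</option> option of <command>posieve</command> for passing sieve parameters. The language code <literal>sr</literal>, the regular expression and the path <filename>ui/</filename> are hypothetical, and the <literal>aspell</literal> provider is assumed to be installed:
<programlisting language="bash">
$ posieve check-spell-ec -s lang:sr -s provider:aspell -s skip:'^[A-Z0-9]+$' -s case ui/
$ posieve check-spell-ec -s lang:sr -s list ui/
</programlisting>
The first run skips words written entirely in capital letters and digits; <option>case</option> is needed for that, because the pattern would otherwise match case-insensitively. The second run only lists the unknown words at the end, which can then be reviewed as candidates for a supplemental dictionary.</para>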
0840 
0841 <para><command>check-spell-ec</command> may be told to skip checking specific messages and words, and it may use internal supplemental spelling dictionaries. See <xref linkend="sec-lgspell"/> for these and other details on spell-checking in Pology.</para>
0842 
0843 <para>Enchant can be configured for use in Pology through user configuration, so that it is not necessary to issue some parameters on every run. See <xref linkend="sec-cmcfgenchant"/>.</para>
0844 
0845 </sect2>
0846 
0847 <sect2 id="sv-check-tp-kde">
0848 <title><command>check-tp-kde</command></title>
0849 
0850 <para>The <ulink url="http://l10n.kde.org/">KDE Translation Project</ulink> contains a great number of PO files extracted from various types of sources. As a result, for each message there are things that the translation can, must, or must not contain in order to be technically valid. When run over PO files within the KDE TP, <command>check-tp-kde</command> will first try to determine the type of each message and then apply appropriate technical checks to it. Message type is determined based on file location, file header, message flags and contexts; even a particular message in a particular file may be checked for some very specific issue.</para>
0851 
0852 <para id="p-techprob">"Technical" issues are those which should be fixed regardless of the language and style of translation, because they can lead to loss of functionality, information or presentation to the user. For example, a technical issue would be badly paired XML tags in translation, when in the original they were well paired. A non-technical issue (and thus not checked) would be when the original ends with a certain punctuation, but the translation does not; whether such details are errors or not depends on the target language and translation style.</para>
0853 
0854 <para>For the sieve to function properly, it needs to detect the project subdirectory of each PO file up to the topmost division within the branch, e.g. <filename>messages/kdebase</filename> or <filename>docmessages/kdegames</filename>. This means that the local copy of the repository tree needs to follow the repository layout up to that point, e.g. <filename>kde-trunk-ui/kdebase</filename> and <filename>kde-trunk-doc/kdegames</filename> would not be valid local paths.</para>
0855 
0856 <para>Parameters:
0857 <variablelist>
0858 
0859 <varlistentry>
0860 <term><option>strict</option></term>
0861 <listitem>
0862 <para>Sometimes the original text itself may not be valid against a certain check. When this is the case, by default the translation is not expected to be valid either, and the check is skipped. Issuing this parameter will force all checks on translation, regardless of whether the original is valid or not. It may still be possible to avoid some checks on those messages that just cannot be repaired through translation, if those checks define their own mechanism of cancelation (like adding a special translator comment).</para>
0863 </listitem>
0864 </varlistentry>
0865 
0866 <varlistentry>
0867 <term><option>check:<replaceable>keywords</replaceable></option></term>
0868 <listitem>
0869 <para>Comma-separated list of checks to apply, by keyword, instead of all. Available checks are listed below.</para>
0870 </listitem>
0871 </varlistentry>
0872 
0873 <varlistentry>
0874 <term><option>showmsg</option></term>
0875 <listitem>
0876 <para>By default, when the message does not pass a check, only its location and the problem are reported. This parameter requests that the message be reported in full, possibly with problematic segments of translation highlighted.</para>
0877 </listitem>
0878 </varlistentry>
0879 
0880 <varlistentry>
0881 <term><option>lokalize</option></term>
0882 <listitem>
0883 <para>Open the PO file on reported messages in Lokalize. Lokalize must be already running with the project that contains the PO file opened.</para>
0884 </listitem>
0885 </varlistentry>
0886 
0887 </variablelist>
0888 </para>
0889 
0890 <para>Currently available checks (keywords in parentheses):
0891 <itemizedlist>
0892 
0893 <listitem>
0894 <para>KDE4 markup checking (<literal>kde4markup</literal>).</para>
0895 </listitem>
0896 
0897 <listitem>
0898 <para>Qt markup checking (<literal>qtmarkup</literal>).</para>
0899 </listitem>
0900 
0901 <listitem>
0902 <para>Docbook markup checking (<literal>dbmarkup</literal>).</para>
0903 </listitem>
0904 
0905 <listitem>
0906 <para>HTML markup checking (<literal>htmlmarkup</literal>).</para>
0907 </listitem>
0908 
0909 <listitem>
0910 <para>No translation scripting in "dumb" messages (<literal>nots</literal>). Translations fetched at runtime by the KDE4 translation system may use <ulink url="http://techbase.kde.org/Localization/Concepts/Transcript">translation scripting</ulink>. This check will make sure that scripting is not attempted for other types of messages (used by Qt-only code, for .desktop files, etc.).</para>
0911 </listitem>
0912 
0913 <listitem>
0914 <para>Qt datetime format messages (<literal>qtdt</literal>). A message is considered to be in this format if it contains the string <literal>qtdt-format</literal> in its <varname>msgctxt</varname> string or among flags.</para>
0915 </listitem>
0916 
0917 <listitem>
0918 <para>Validity of translator credits (<literal>trcredits</literal>). PO files may contain meta-messages to input translator credits, which should have both valid translations on their own and some congruence between them.</para>
0919 </listitem>
0920 
0921 <listitem>
0922 <para>Query placeholders in Plasma runners (<literal>plrunq</literal>). Messages in Plasma runners may contain the special query placeholder <literal>:q:</literal>, which should be present in translation too.</para>
0923 </listitem>
0924 
0925 <listitem>
0926 <para>File-specific checking (<literal>catspec</literal>). Certain messages in certain PO files have special validity requirements, and this check activates all such file-specific checks.</para>
0927 </listitem>
0928 
0929 </itemizedlist>
0930 </para>
0931 
0932 <para>All markup checks can be skipped on a message by adding the <literal>no-check-markup</literal> <link linkend="p-trflag">translator flag</link>.</para>
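
<para>As an illustration, a run limited to two of the checks above might look like this, assuming the usual <option>-s</option> option of <command>posieve</command> for passing sieve parameters. The path is hypothetical, but follows the repository layout that the sieve needs in order to detect the project subdirectory:
<programlisting language="bash">
$ posieve check-tp-kde -s check:kde4markup,trcredits -s showmsg messages/kdebase/
</programlisting>
Here only KDE4 markup and translator credits are checked, and each failing message is shown in full rather than just by location.</para>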
0933 
0934 </sect2>
0935 
0936 <sect2 id="sv-check-tp-wesnoth">
0937 <title><command>check-tp-wesnoth</command></title>
0938 
0939 <para>PO files of <ulink url="http://www.wesnoth.org/">The Battle for Wesnoth</ulink> contain a mix of well-known and custom markup and format directives. <command>check-tp-wesnoth</command> heuristically determines the type of each message in a Wesnoth PO file and applies appropriate technical checks to it (where "technical" has the same meaning as in <link linkend="p-techprob">the <command>check-tp-kde</command> sieve</link>).</para>
0940 
0941 <para>Parameters:
0942 <variablelist>
0943 
0944 <varlistentry>
0945 <term><option>check:<replaceable>keywords</replaceable></option></term>
0946 <listitem>
0947 <para>Comma-separated list of checks to apply, by keyword, instead of all. Available checks are listed below.</para>
0948 </listitem>
0949 </varlistentry>
0950 
0951 <varlistentry>
0952 <term><option>showmsg</option></term>
0953 <listitem>
0954 <para>Instead of just showing the message location and problem description, also show the complete message, possibly with highlighted problematic segments.</para>
0955 </listitem>
0956 </varlistentry>
0957 
0958 <varlistentry>
0959 <term><option>lokalize</option></term>
0960 <listitem>
0961 <para>Open the PO file on reported messages in Lokalize. Lokalize must be already running with the project that contains the PO file opened.</para>
0962 </listitem>
0963 </varlistentry>
0964 
0965 </variablelist>
0966 </para>
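
<para>A minimal sketch of a run restricted to two of the checks listed below, assuming the usual <option>-s</option> option of <command>posieve</command> for passing sieve parameters; the path <filename>wesnoth-po/</filename> is hypothetical:
<programlisting language="bash">
$ posieve check-tp-wesnoth -s check:ctxtsep,interp -s showmsg wesnoth-po/
</programlisting>
Without the <option>check</option> parameter, all available checks would be applied.</para>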
0967 
0968 <para>Currently available checks (keywords in parentheses):
0969 <itemizedlist>
0970 
0971 <listitem>
0972 <para>Stray context separators in translation (<literal>ctxtsep</literal>). Wesnoth still embeds disambiguating context into <varname>msgid</varname>, by putting it in front of the actual text, separated from it by <literal>^</literal>. An unwary translator will sometimes mistake such context for part of the original text, and translate it too.</para>
0973 </listitem>
0974 
0975 <listitem>
0976 <para>Congruence of WML interpolations (<literal>interp</literal>). WML interpolations look like <literal>"...side $side_number is..."</literal> and normally must match between the original and translation, or else the player would lose information. Only in very rare cases (e.g. some plurals and Markov chain generators) may some interpolations be missing in translation, and then they can be listed space-separated in a translator comment to silence the check:
0977 <programlisting language="po">
0978 # ignore-interpolations: <replaceable>interp1</replaceable> <replaceable>interp2</replaceable> ...
0979 </programlisting>
0980 (the <literal>$</literal> character is not necessary in the list).
0981 </para>
0982 </listitem>
0983 
0984 <listitem>
0985 <para>WML markup checking (<literal>wml</literal>). If WML in translation is not valid, the player may see some visual artifacts. Also, links in WML must match between original and translation, to avoid loss of information.</para>
0986 </listitem>
0987 
0988 <listitem>
0989 <para>Pango markup checking (<literal>pango</literal>). Pango is used in some places for visual text markup instead of WML.</para>
0990 </listitem>
0991 
0992 <listitem>
0993 <para>Congruence of leading and trailing space (<literal>space</literal>). For many languages, significant leading and trailing space from the original should be preserved. A heuristic is used to determine when leading or trailing space is significant. Only languages explicitly specified internally are checked for this.</para>
0994 </listitem>
0995 
0996 <listitem>
0997 <para>Docbook validity (<literal>docbook</literal>). Docbook is actually not used as a source format anywhere in Wesnoth, but the Wesnoth manual is converted into Docbook specifically to facilitate translation (weird as it may sound).</para>
0998 </listitem>
0999 
1000 <listitem>
1001 <para>Man page validity (<literal>man</literal>).</para>
1002 </listitem>
1003 
1004 </itemizedlist>
1005 </para>
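
<para>To make the first two checks more concrete, here is a sketch of two messages as they could appear in a PO file. Both messages are invented for illustration (they are not taken from actual Wesnoth catalogs), with German used as the example target language:
<programlisting language="po">
# The context "female" and the separator must not appear in msgstr.
msgid "female^Archer"
msgstr "Bogenschützin"

# The interpolation $side_number is carried over unchanged.
msgid "Side $side_number has been defeated."
msgstr "Seite $side_number wurde besiegt."
</programlisting>
A translation such as <literal>"weiblich^Bogenschützin"</literal> for the first message would be reported by <literal>ctxtsep</literal>, and dropping <literal>$side_number</literal> from the second would be reported by <literal>interp</literal>.</para>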
1006 
1007 </sect2>
1008 
1009 <sect2 id="sv-collect-pmap">
1010 <title><command>collect-pmap</command></title>
1011 
1012 <para><emphasis>Property maps</emphasis> (or <emphasis>pmaps</emphasis> for short) are one way in which arbitrary properties of language phrases can be defined for use in scripted translations, such as provided by <ulink url="http://techbase.kde.org/Localization/Concepts/Transcript">Transcript</ulink>, the translation scripting system in KDE 4.</para>
1013 
1014 <para>A property map is a text file with a number of entries, each defining the properties of a certain phrase. A pmap entry starts with one or more keys and continues with an arbitrary number of key-value properties. An example entry would be the grammatical declensions of a noun:
1015 <programlisting>
1016 =/Athens/Atina/nom=Atina/gen=Atine/dat=Atini/acc=Atinu//
1017 </programlisting>
1018 The first two characters define, in order, the key-value separator (here <literal>=</literal>) and the property separator (here <literal>/</literal>) for the current entry. The two separators can be any non-alphanumeric characters, and must be different. Then follow a number of entry keys, delimited by property separators, and then a number of key-value properties, each internally delimited by the key-value separator. The entry is terminated by a double property separator. Properties of an entry can be fetched in the translation scripting system by any of the entry keys; keys are case- and whitespace-insensitive.</para>
1019 
1020 <para><command>collect-pmap</command> will parse pmap entries from manual comments in messages, collect them, and write out a property map file. It is not necessary to explicitly specify entry keys, since the contents of <varname>msgid</varname> and <varname>msgstr</varname> are automatically added as keys. Since each manual comment is one line, it is also allowed to drop
1021 the final double separator which would normally terminate the entry.
1022 The above example would thus look like this in a PO message:
1023 <programlisting language="po">
1024 # pmap: =/nom=Atina/gen=Atine/dat=Atini/acc=Atinu/
1025 msgctxt "Greece/city"
1026 msgid "Athens"
1027 msgstr "Atina"
1028 </programlisting>
1029 The manual comment starts with the <literal>pmap:</literal> keyword, which is followed by a normal pmap entry, except for missing keys (but additional keys can be specified when <varname>msgid</varname> and <varname>msgstr</varname> are not sufficient). It is also possible to split the entry into several comments,
1030 with the only condition being that all share the same set of separators:
1031 <programlisting language="po">
1032 # pmap: =/nom=Atina/gen=Atine/
1033 # pmap: =/dat=Atini/acc=Atinu/
1034 </programlisting>
1035 After collecting pmap entries from all processed PO files, if two or more entries end up having the same keys, they are all removed from the collection and a warning is reported.</para>
1036 
1037 <para>Pmap entries are collected only from translated, non-plural messages.</para>
1038 
1039 <para>Parameters:
1040 <variablelist>
1041 
1042 <varlistentry>
1043 <term><option>outfile:<replaceable>file</replaceable></option></term>
1044 <listitem>
1045 <para>File path into which the property map should be written. If not given, nothing is written out; this is useful for validating entries.</para>
1046 </listitem>
1047 </varlistentry>
1048 
1049 <varlistentry>
1050 <term><option>propcons:<replaceable>file</replaceable></option></term>
1051 <listitem>
1052 <para>Path to the file which defines constraints on property keys and values, used to validate parsed entries (see <xref linkend="sec-svvalpmap"/>).</para>
1053 </listitem>
1054 </varlistentry>
1055 
1056 <varlistentry>
1057 <term><option>extrakeys</option></term>
1058 <listitem>
1059 <para>By default, it is not possible to add any additional entry keys besides the automatically added <varname>msgid</varname> and <varname>msgstr</varname>. This gives extra safety against errors, such as a translator mistyping a key-value pair. If additional keys are actually needed, this parameter can be issued to accept them.</para>
1060 </listitem>
1061 </varlistentry>
1062 
1063 <varlistentry>
1064 <term><option>derivs:<replaceable>file</replaceable></option></term>
1065 <listitem>
1066 <para>Path to the file which defines derivators for synder entries (see <xref linkend="sec-svsynder"/>).</para>
1067 </listitem>
1068 </varlistentry>
1069 
1070 <varlistentry>
1071 <term><option>pmhead:<replaceable>string</replaceable></option></term>
1072 <listitem>
1073 <para>The default <literal>pmap:</literal> entry prefix may not be the most convenient; for example, when the language of translation is not written with Latin script. This parameter makes it possible to use an arbitrary string for the entry prefix.</para>
1074 </listitem>
1075 </varlistentry>
1076 
1077 <varlistentry>
1078 <term><option>sdhead:<replaceable>string</replaceable></option></term>
1079 <listitem>
1080 <para>Like <option>pmhead</option>, but for the prefix of synder entries, replacing the default <literal>synder:</literal> (see <xref linkend="sec-svsynder"/>).</para>
1081 </listitem>
1082 </varlistentry>
1083 
1084 </variablelist>
1085 </para>
1086 
1087 <sect3 id="sec-svsynder">
1088 <title>Derivating Entries</title>
1089 
1090 <para>There is another, more succinct way to define pmap entries in comments. Instead of writing out all key-value combinations, it is possible to generate them by using <emphasis>syntagma derivators</emphasis> (or <emphasis>synders</emphasis> for short). From the earlier example:
1091 <programlisting language="po">
1092 # pmap: =/nom=Atina/gen=Atine/dat=Atini/acc=Atinu/
1093 </programlisting>
1094 it can be observed that each form has the same root, <literal>Atin</literal>, followed by the appropriate ending for that form type. This makes it convenient
1095 to reformulate it as a syntagma derivation:
1096 <programlisting language="po">
1097 # synder: Atin|a
1098 </programlisting>
1099 Here <literal>|a</literal> is a <emphasis>derivator</emphasis>; all derivators are defined in a separate synder file (with <filename>.sd</filename> extension by convention) and made known to the sieve through the <option>derivs</option> parameter. The derivator in this example would be defined like this:
1100 <programlisting>
1101 |a: nom=a, gen=e, dat=i, acc=u
1102 </programlisting>
1103 First comes the derivator name, starting with <literal>|</literal> and ending with <literal>:</literal>, and then the comma-separated list of key-value pairs, similar to those in the pmap entry, except that now only the endings for the given form are specified. Synders are actually a standalone subsystem of Pology; see <xref linkend="sec-lgsynder"/> for all details.</para>
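<para>How a derivator turns a root into full forms can be sketched in a few lines of Python. This is a simplified illustration and not Pology's synder implementation: the function names <literal>parse_derivator</literal> and <literal>derive</literal> are hypothetical, and only the plain <literal>root|name</literal> case from the example is handled:
<programlisting language="python">
def parse_derivator(line):
    """Parse '|a: nom=a, gen=e, ...' into (name, {key: ending})."""
    name, _, rest = line.partition(":")
    endings = {}
    for pair in rest.split(","):
        key, _, ending = pair.strip().partition("=")
        endings[key] = ending
    return name.strip(), endings

def derive(synder, derivators):
    """Expand a 'root|name' derivation into a dictionary of full forms."""
    root, _, dname = synder.partition("|")
    endings = derivators["|" + dname]
    # Each form is the shared root followed by the ending for that key.
    return {key: root + ending for key, ending in endings.items()}

name, endings = parse_derivator("|a: nom=a, gen=e, dat=i, acc=u")
forms = derive("Atin|a", {name: endings})
print(forms["dat"])  # Atini
</programlisting>
The resulting dictionary holds the same key-value pairs as the explicit pmap entry shown earlier.</para>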
1104 
1105 <para>It is possible to mix pmap (<literal># pmap: ...</literal>) and synder (<literal># synder: ...</literal>) entries in translator comments. For example, synder entries may be used to cover the majority of cases, which follow the general language rules, while pmap entries can be used for exceptions.</para>
1106 
1107 <para>On the other hand, every pmap entry can be reformulated as a synder entry which does not refer to an external derivator:
1108 <programlisting language="po">
1109 # synder: nom=Atina, gen=Atine, dat=Atini, acc=Atinu
1110 </programlisting>
1111 This raises the question of why pmap entries are needed at all, if synder entries can be used in the same capacity and beyond. Pmap entries are still useful because synders have a lot of special syntax and rules to keep in mind (e.g. what if the phrase itself contains a comma?), while raw pmaps have none beyond what was described above.</para>
1112 
1113 </sect3>
1114 
1115 <sect3 id="sec-svvalpmap">
1116 <title>Validating Entries</title>
1117 
1118 <para>The <option>propcons</option> parameter can be used to specify a file which defines constraints on acceptable property keys, and on the values for each key. Its format is the following:
1119 <programlisting>
1120 # Full-line comment.
1121 /key_regex_1/value_regex_1/flags # a trailing comment
1122 /key_regex_2/value_regex_2/flags
1123 :key_regex_3:value_regex_3:flags # different separator
1124 # etc.
1125 </programlisting>
1126 Regular expressions for keys and values are delimited by a separator defined by the first non-whitespace character in the line, which must also be non-alphanumeric. Before being compiled, regular expressions are automatically wrapped as <literal>^(<replaceable>regex</replaceable>)$</literal>, so that an expression to require a certain prefix is given as <literal><replaceable>prefix</replaceable>.*</literal> and a suffix as <literal>.*<replaceable>suffix</replaceable></literal>. A property key must match at least one of the key regexes, or else it is considered invalid. The value of that property must then match the value regexes attached to all matched key regexes.</para>
1127 
1128 <para>For example, a constraint file defining no constraints on either
1129 property keys or values is:
1130 <programlisting>
1131 /.*/.*/
1132 </programlisting>
1133 while a file explicitly listing all allowed property keys, and constraining the values of some of them, would be:
1134 <programlisting>
1135 /nom|gen|dat|acc/.*/
1136 /gender/m|f|n/
1137 /number/s|p/
1138 </programlisting>
1139 </para>
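<para>The matching rules can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions, not the sieve's actual code: the names <literal>parse_constraints</literal> and <literal>is_valid</literal> are hypothetical, flags and trailing comments are ignored, and regular expressions are assumed not to contain the separator character:
<programlisting language="python">
import re

def parse_constraints(text):
    """Parse constraint lines into (key_regex, value_regex) pairs."""
    constraints = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and full-line comments
        sep = line[0]  # first non-whitespace character is the separator
        key_rx, value_rx, _flags = line[1:].split(sep, 2)
        # Wrap as ^(regex)$ so that the whole key or value must match.
        constraints.append((re.compile("^(%s)$" % key_rx),
                            re.compile("^(%s)$" % value_rx)))
    return constraints

def is_valid(key, value, constraints):
    """Key must match some key regex; value must satisfy all matched ones."""
    matched = [vrx for krx, vrx in constraints if krx.match(key)]
    return bool(matched) and all(vrx.match(value) for vrx in matched)

constraints = parse_constraints("""
/nom|gen|dat|acc/.*/
/gender/m|f|n/
/number/s|p/
""")
print(is_valid("gen", "Atine", constraints))   # True
print(is_valid("gender", "mx", constraints))   # False
print(is_valid("voc", "Atino", constraints))   # False
</programlisting>
The second check fails only because of the automatic wrapping: without the anchors, <literal>m|f|n</literal> would match the beginning of <literal>mx</literal>.</para>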
1140 
1141 <para>The last separator in the constraint can be followed by a string of single-character flags. These flags are currently defined:
1142 <itemizedlist>
1143 <listitem>
1144 <para><literal>i</literal>: case-insensitive matching for the value.</para>
1145 </listitem>
1146 <listitem>
1147 <para><literal>I</literal>: case-insensitive matching for the key.</para>
1148 </listitem>
1149 <listitem>
1150 <para><literal>t</literal>: the value must both match the regular expression and be equal to <varname>msgstr</varname>. If <literal>i</literal> flag is added too, equality check is also case-insensitive.</para>
1151 </listitem>
1152 <listitem>
1153 <para><literal>r</literal>: regular expression for the key must match at least one key among all defined properties.</para>
1154 </listitem>
1155 </itemizedlist>
1156 </para>
1157 
1158 <para>The constraint definition file must be encoded in UTF-8.</para>
1159 
1160 </sect3>
1161 
1162 </sect2>
1163 
1164 <sect2 id="sv-diff-previous">
1165 <title><command>diff-previous</command></title>
1166 
1167 <para>When PO files are merged with the <option>--previous</option> option to <command>msgmerge</command>, fuzzy messages will retain the previous version of the original text (<varname>msgctxt</varname>, <varname>msgid</varname> and <varname>msgid_plural</varname>) under <literal>#|</literal> comments. Then <command>diff-previous</command> can be used to embed differences from the previous to the current original into the previous original strings. For example, the message:
1168 <programlisting language="po">
1169 #: main.c:110
1170 #, fuzzy
1171 #| msgid "The Record of The Witch River"
1172 msgid "Records of The Witch River"
1173 msgstr "Beleška o Veštičjoj reci"
1174 </programlisting>
1175 will become after sieving:
1176 <programlisting language="po">
1177 #: main.c:110
1178 #, fuzzy
1179 #| msgid "{-The Record-}{+Records+} of The Witch River"
1180 msgid "Records of The Witch River"
1181 msgstr "Beleška o Veštičjoj reci"
1182 </programlisting>
1183 Text editors may even provide highlighting for the wrapped difference segments
1184 (e.g. Kwrite/Kate).</para>
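<para>The embedded notation can be reproduced with a short Python sketch based on the standard <literal>difflib</literal> module. This is only a word-level approximation written for illustration; the function name <literal>embed_diff</literal> is hypothetical, and the diffing done by the sieve itself may segment the text differently:
<programlisting language="python">
from difflib import SequenceMatcher

def embed_diff(old, new):
    """Embed a word-level diff from old to new into a single string."""
    a, b = old.split(), new.split()
    parts = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            parts.append(" ".join(a[i1:i2]))
            continue
        segment = ""
        if a[i1:i2]:  # words removed from the old text
            segment += "{-" + " ".join(a[i1:i2]) + "-}"
        if b[j1:j2]:  # words added in the new text
            segment += "{+" + " ".join(b[j1:j2]) + "+}"
        parts.append(segment)
    return " ".join(parts)

print(embed_diff("The Record of The Witch River",
                 "Records of The Witch River"))
# {-The Record-}{+Records+} of The Witch River
</programlisting>
</para>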
1185 
1186 <para>This sieve is very useful if your PO editor does not show differences in the original by itself. To be able to easily see exactly what was changed in the original is important both for efficiency and for quality. Think of a long paragraph in which only one word was changed: without a diff it will take you time to reread it, and you may even miss that changed word.</para>
1187 
1188 <para>Parameters:
1189 <variablelist>
1190 
1191 <varlistentry>
1192 <term><option>strip</option></term>
1193 <listitem>
1194 <para>Instead of embedding diffs, remove them from messages, recovering the original form of previous strings. This is useful if you did not update all fuzzy messages but nevertheless want to send the PO file away (commit it to the repository, etc.).</para>
1195 </listitem>
1196 </varlistentry>
1197 
1198 <varlistentry>
1199 <term><option>branch:<replaceable>branch</replaceable></option></term>
1200 <listitem>
1201 <para>Embed diffs only into messages from the given branch (<link linkend="ch-summit">summit</link>). Several branches may be given as a comma-separated list.</para>
1202 </listitem>
1203 </varlistentry>
1204 
1205 </variablelist>
1206 </para>
1207 
1208 </sect2>
1209 
1210 <sect2 id="sv-empty-fuzzies">
1211 <title><command>empty-fuzzies</command></title>
1212 
1213 <para>For every fuzzy message, <command>empty-fuzzies</command> removes the translation and fuzzy data (the <literal>fuzzy</literal> flag, previous strings). Translator comments are kept by default, but they can be removed as well. Obsolete fuzzy messages are completely removed.</para>
1214 
1215 <para>Parameters:
1216 <variablelist>
1217 
1218 <varlistentry>
1219 <term><option>rmcomments</option></term>
1220 <listitem>
1221 <para>Also remove translator comments from fuzzy messages.</para>
1222 </listitem>
1223 </varlistentry>
1224 
1225 <varlistentry>
1226 <term><option>noprev</option></term>
1227 <listitem>
1228 <para>Empty only those fuzzy messages which do not have previous strings (i.e. when the PO file was merged without <option>--previous</option> option to <command>msgmerge</command>).</para>
1229 </listitem>
1230 </varlistentry>
1231 
1232 </variablelist>
1233 </para>
1234 
1235 </sect2>
1236 
1237 <sect2 id="sv-equip-header-tp-kde">
1238 <title><command>equip-header-tp-kde</command></title>
1239 
1240 <para><command>equip-header-tp-kde</command> applies <link linkend="hk-proj-kde-header-equip-header">the <literal>kde%header/equip-header</literal> hook</link> to headers of PO files within the KDE Translation Project.</para>
1241 
1242 <para>There are no parameters.</para>
1243 
1244 </sect2>
1245 
1246 <sect2 id="sv-fancy-quote">
1247 <title><command>fancy-quote</command></title>
1248 
1249 <para>Ordinary ASCII quotes are easy to type on most keyboard layouts, so in non-typeset English texts they are encountered far more often than proper English quotes, which are sometimes called "fancy" quotes. When translating from English, translators can thus easily be tempted to use ASCII quotes themselves, instead of the fancy quotes appropriate for their language. To somewhat correct this, <command>fancy-quote</command> can be used to replace ASCII quotes in the translation with selected pairs of fancy quotes.</para>
1250 
1251 <para>ASCII quotes that are part of text markup (e.g. attribute values in XML-like tags) must not be replaced, and this sieve will use heuristics to determine such places. In fact, it will replace quotes rather conservatively. Nevertheless, unless some sort of automatic validation is available, converted text should be manually inspected for correctness.</para>
1252 
1253 <para>Parameters:
1254 <variablelist>
1255 
1256 <varlistentry>
1257 <term><option>single:<replaceable>quotes</replaceable></option></term>
1258 <listitem>
1259 <para>Opening and closing quote to replace ASCII single quotes (i.e. <replaceable>quotes</replaceable> is a two-character string). If not given, single quotes are not replaced (but see the <option>longsingle</option> parameter).</para>
1260 </listitem>
1261 </varlistentry>
1262 
1263 <varlistentry>
1264 <term><option>double:<replaceable>quotes</replaceable></option></term>
1265 <listitem>
1266 <para>Opening and closing quote to replace ASCII double quotes. If not given, double quotes are not replaced (but see the <option>longdouble</option> parameter).</para>
1267 </listitem>
1268 </varlistentry>
1269 
1270 <varlistentry>
1271 <term><option>longsingle:<replaceable>open</replaceable>,<replaceable>close</replaceable></option></term>
1272 <listitem>
1273 <para>Alternative to <option>single</option>, if opening and closing quotes are not single characters. The value consists of the opening quote string and the closing quote string, separated by a comma.</para>
1274 </listitem>
1275 </varlistentry>
1276 
1277 <varlistentry>
1278 <term><option>longdouble:<replaceable>open</replaceable>,<replaceable>close</replaceable></option></term>
1279 <listitem>
1280 <para>Alternative to <option>double</option>, if opening and closing quotes are not single characters.</para>
1281 </listitem>
1282 </varlistentry>
1283 
1284 </variablelist>
1285 </para>
1286 
1287 </sect2>
1288 
1289 <sect2 id="sv-find-messages">
1290 <title><command>find-messages</command></title>
1291 
1292 <para><command>find-messages</command> is the search and replace workhorse of Pology. It applies one or several conditions to different parts of the PO message, with selectable boolean linking between them. If the message is matched as a whole, it is reported and possibly some replacements are done. Messages are by default reported to standard output, with full location reference (PO file path, line and entry number), but can also be opened directly in one of the supported PO editors (see <xref linkend="sec-cmsupped"/>).</para>
1293 
1294 <para>When used in a sieve chain, <command>find-messages</command> will stop further sieving of messages which did not satisfy the conditions. This makes it useful as a filter for selecting subsets of messages on which other sieves should operate.</para>
1295 
1296 <para>There are three logical groups of parameters: matching parameters, replacement parameters, and general parameters. Matching and replacement parameters have certain relationships between themselves, while general parameters have mutually independent effects (i.e. as usual for sieve parameters).</para>
1297 
1298 <sect3 id="sec-svfmmpar">
1299 <title>Matching Parameters</title>
1300 
1301 <para>Matching parameters specify patterns for matching by parts of the message, or represent binary conditions (whether the message is translated, etc.). For example:
1302 <programlisting language="bash">
1303 $ posieve find-messages -s msgid:'foo bar'
1304 </programlisting>
1305 will report all messages which contain the phrase "foo bar" in their <varname>msgid</varname> (or <varname>msgid_plural</varname>) string. When several matching parameters are given, by default the message is matched if all patterns match; that is, boolean linking of conditions is AND. This:
1306 <programlisting language="bash">
1307 $ posieve find-messages -s msgid:'foo bar' -s transl
1308 </programlisting>
1309 will report all messages that contain "foo bar" in the original <emphasis>and</emphasis> are translated. Boolean linking can be switched to OR by issuing the <option>or</option> parameter. To find all messages that contain the word "tooltip" in <emphasis>either</emphasis> context or comments:
1310 <programlisting language="bash">
1311 $ posieve find-messages -s msgctxt:tooltip -s comment:tooltip -s or
1312 </programlisting>
1313 (Actually, the effect of <option>or</option> is somewhat more specific, see its description below.) String matching is by default case insensitive, which can be changed globally by issuing the <option>case</option> parameter.</para>
1314 
1315 <para>Every matching parameter has a negative counterpart, named by prepending <literal>n</literal> to the original parameter, which matches when the original parameter does not. Running:
1316 <programlisting language="bash">
1317 $ posieve find-messages -s msgid:'hello' -s nmsgstr:'zdravo'
1318 </programlisting>
1319 would find all messages that contain "hello" in the original and do <emphasis>not</emphasis> contain "zdravo" in the translation (a typical usage pattern in quick terminology checks).</para>
1320 
1321 <para>To find all messages not matching a set of conditions, in principle it would be possible to negate the whole condition set by switching between positive/negative parameters and AND/OR-linking, but this can be cumbersome. Instead, the <option>invert</option> parameter can be issued to report messages that are not matched by the condition set.</para>
1322 
1323 <para>Sometimes neither simple AND nor simple OR boolean linking is sufficient to form the search. Therefore the <option>fexpr</option> parameter is provided, which can be used to specify a search expression with explicit boolean operators and parentheses for controlling the evaluation order. With <option>fexpr</option>, the previous example could be reformulated as:
1324 <programlisting language="bash">
1325 $ posieve find-messages -s fexpr:'msgid/hello/ and not msgstr/zdravo/'
1326 </programlisting>
1327 For details, see the description of <option>fexpr</option> below.</para>
1328 
1329 <para>Currently defined matching parameters:
1330 <variablelist>
1331 
1332 <varlistentry>
1333 <term><option>(n)msgctxt:<replaceable>regex</replaceable></option></term>
1334 <listitem>
1335 <para>Regular expression to match the <varname>msgctxt</varname> string.</para>
1336 </listitem>
1337 </varlistentry>
1338 
1339 <varlistentry>
1340 <term><option>(n)msgid:<replaceable>regex</replaceable></option></term>
1341 <listitem>
1342 <para>Regular expression to match the <varname>msgid</varname> and <varname>msgid_plural</varname> strings. The condition is satisfied as a whole if <emphasis>either</emphasis> of these strings matches.</para>
1343 </listitem>
1344 </varlistentry>
1345 
1346 <varlistentry>
1347 <term><option>(n)msgstr:<replaceable>regex</replaceable></option></term>
1348 <listitem>
1349 <para>Regular expression to match <varname>msgstr</varname> strings. The condition is satisfied as a whole if any of the <varname>msgstr</varname> strings matches.</para>
1350 </listitem>
1351 </varlistentry>
1352 
1353 <varlistentry>
1354 <term><option>(n)comment:<replaceable>regex</replaceable></option></term>
1355 <listitem>
1356 <para>Regular expression to match extracted comments, translator comments and source reference comments. The condition is satisfied as a whole if any of these comments matches.</para>
1357 </listitem>
1358 </varlistentry>
1359 
1360 <varlistentry>
1361 <term><option>(n)flag:<replaceable>regex</replaceable></option></term>
1362 <listitem>
1363 <para>Regular expression to match flags. This matches each flag in turn, and not the flag comment as a monolithic string. The condition is satisfied as a whole if any flag matches.</para>
1364 </listitem>
1365 </varlistentry>
1366 
1367 <varlistentry>
1368 <term><option>(n)transl</option></term>
1369 <listitem>
1370 <para>The message must be translated.</para>
1371 </listitem>
1372 </varlistentry>
1373 
1374 <varlistentry>
1375 <term><option>(n)obsol</option></term>
1376 <listitem>
1377 <para>The message must be obsolete.</para>
1378 </listitem>
1379 </varlistentry>
1380 
1381 <varlistentry>
1382 <term><option>(n)active</option></term>
1383 <listitem>
1384 <para>The message must be active, i.e. translated and not obsolete.</para>
1385 </listitem>
1386 </varlistentry>
1387 
1388 <varlistentry>
1389 <term><option>(n)plural</option></term>
1390 <listitem>
1391 <para>The message must be a plural message.</para>
1392 </listitem>
1393 </varlistentry>
1394 
1395 <varlistentry>
1396 <term><option>(n)maxchar:<replaceable>number</replaceable></option></term>
1397 <listitem>
1398 <para>Original and translation can have at most this many characters. The condition is satisfied as a whole if all these strings satisfy it.</para>
1399 </listitem>
1400 </varlistentry>
1401 
1402 <varlistentry>
1403 <term><option>(n)lspan:<replaceable>start</replaceable>:<replaceable>end</replaceable></option></term>
1404 <listitem>
1405 <para>The referent line number of the message (the line in which its <varname>msgid</varname> string starts) must fall within the given range. The starting number is included in the range, the ending number is not.</para>
1406 </listitem>
1407 </varlistentry>
1408 
1409 <varlistentry>
1410 <term><option>(n)espan:<replaceable>start</replaceable>:<replaceable>end</replaceable></option></term>
1411 <listitem>
1412 <para>Like <option>lspan</option>, but instead of line numbers it applies to entry numbers. These are the numbers that dedicated PO editors usually report in their user interfaces.</para>
1413 </listitem>
1414 </varlistentry>
1415 
1416 <varlistentry>
1417 <term><option>(n)branch:<replaceable>branch</replaceable></option></term>
1418 <listitem>
1419 <para>The message must belong to this branch (<link linkend="ch-summit">summit</link>). Several branches may be given as a comma-separated list.</para>
1420 </listitem>
1421 </varlistentry>
1422 
1423 <varlistentry id="p-fexprdesc">
1424 <term><option>(n)fexpr:<replaceable>expression</replaceable></option></term>
1425 <listitem>
1426 
1427 <para>Boolean expression with explicit boolean operators and parentheses for priority, constructed out of any of the other matching parameters. If a match parameter needs a value (like a regular expression), in the expression it is given as <literal><replaceable>match</replaceable>/<replaceable>value</replaceable>/</literal>, where any non-alphanumeric character can be used consistently instead of <literal>/</literal> (in case the value itself contains <literal>/</literal>). For example, the expression:
1428 <programlisting>
1429 fexpr:'(msgctxt/foo/ or comment/foo/) and msgid/bar/'
1430 </programlisting>
1431 is satisfied if either the context or comments contain "foo", and the original text contains "bar".</para>
1432 
1433 <para>If matching is influenced by a general parameter (e.g. case sensitivity), in the expression it may be able to take overriding modifiers in the form of single characters after the value, i.e. <literal><replaceable>match</replaceable>/<replaceable>value</replaceable>/<replaceable>modifiers</replaceable></literal>. Assuming that the <option>case</option> parameter has not been issued, the expression:
1434 <programlisting>
1435 fexpr:'msgid/quuk/ and msgstr/Qaak/c'
1436 </programlisting>
1437 will be satisfied if the original text contains "quuk" in any casing, and translation contains exactly "Qaak". Currently available modifiers are:
1438 <itemizedlist>
1439 <listitem>
1440 <para><literal>c</literal>: matching is case-sensitive.</para>
1441 </listitem>
1442 <listitem>
1443 <para><literal>i</literal>: matching is case-insensitive. May be needed when string matching is globally case-sensitive due to <option>case</option> being issued.</para>
1444 </listitem>
1445 </itemizedlist>
1446 </para>
1447 
1448 </listitem>
1449 </varlistentry>
1450 
1451 </variablelist>
1452 </para>
1453 
1454 </sect3>
1455 
1456 <sect3 id="sec-svfmrpar">
1457 <title>Replacement Parameters</title>
1458 
1459 <para>Replacement is done in pair with matching the appropriate string in the message. For example, to replace each appearance of "foobar" with "fumbar" in translation, this would be run:
1460 <programlisting language="bash">
1461 $ posieve find-messages -s msgstr:foobar -s replace:fumbar
1462 </programlisting>
1463 The <option>replace</option> parameter works in pair with <option>msgstr</option>, i.e. <option>replace</option> cannot be issued without issuing <option>msgstr</option> as well. There are two possible problems with replacement as straightforward as this. The first is that if "foobar" was a whole word (or start of a word), and this word in the text started with upper-case letter, the replacement would make it lower-case. This can be avoided by executing replacement twice with case sensitivity:
1464 <programlisting language="bash">
1465 $ posieve find-messages -s msgstr:foobar -s replace:fumbar -scase
1466 $ posieve find-messages -s msgstr:Foobar -s replace:Fumbar -scase
1467 </programlisting>
1468 The other problem is if the word is split by an accelerator marker, for example:
1469 <programlisting language="po">
1470 msgstr "... f_oobar ..."
1471 </programlisting>
1472 The search may still find the word (see the <option>accel</option> parameter below), but direct replacement would cause the loss of accelerator marker, and therefore it is not done.<footnote>
1473 <para>Some heuristics for reinsertion of the accelerator marker may be implemented in the future.</para>
1474 </footnote> To see such cases, you should monitor the output of <command>find-messages</command> (always a good idea when doing batch replacement), where matched and replaced parts of the text will be highlighted.</para>
1475 
1476 <para>As usual for replacement based on regular expression, the replacement string may contain <literal>\<replaceable>number</replaceable></literal> references to groups defined in the matching pattern. For example, the previous example of case-aware replacement could be more efficiently and more elegantly performed with:
1477 <programlisting language="bash">
1478 $ posieve find-messages -s msgstr:'(f)oobar' -s replace:'\1umbar'
1479 </programlisting>
1480 (Though this is possible only if the original and the replacement start with the same letter.)</para>
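<para>The same group-reference mechanism is available in Python's standard <literal>re</literal> module, which makes for a convenient way to try out a pattern and replacement pair before running the sieve. The following is only a sketch of the regular expression behavior, not of <command>find-messages</command> itself; the function name <literal>replace_keep_initial</literal> is hypothetical:
<programlisting language="python">
import re

def replace_keep_initial(text):
    # (f) captures the first letter exactly as it appears in the text,
    # so \1 puts it back in its original case before "umbar".
    return re.sub(r"(f)oobar", r"\1umbar", text, flags=re.IGNORECASE)

print(replace_keep_initial("Foobar and foobar"))  # Fumbar and fumbar
</programlisting>
</para>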
1481 
1482 <para>Currently defined replacement parameters:
1483 <variablelist>
1484 
1485 <varlistentry>
1486 <term><option>replace:<replaceable>string</replaceable></option></term>
1487 <listitem>
1488 <para>The string that replaces the text matched by the <option>msgstr</option> parameter. Can contain regular expression group references.</para>
1489 </listitem>
1490 </varlistentry>
1491 
1492 </variablelist>
1493 </para>
1494 
1495 </sect3>
1496 
1497 <sect3 id="sec-svfmgpar">
1498 <title>General Parameters</title>
1499 
1500 <para>Parameters influencing general behavior of <command>find-messages</command> are as follows:
1501 <variablelist>
1502 
1503 <varlistentry>
1504 <term><option>or</option></term>
1505 <listitem>
1506 <para>Boolean OR instead of AND linking of conditions, but only for string matchers: <option>msgctxt</option>, <option>msgid</option>, <option>msgstr</option>, <option>comment</option>. This restriction may seem odd, but it is what is mostly needed in practice. For example, the set of conditions:
1507 <programlisting language="bash">
1508 -s msgctxt:tooltip -s comment:tooltip -s transl -s or
1509 </programlisting>
1510 would match all translated messages which have "tooltip" in context or in comments, and not messages which are either translated or have "tooltip" in context or in comments. For full control over the expression, use the <option>fexpr</option> parameter.</para>
1511 </listitem>
1512 </varlistentry>
1513 
1514 <varlistentry>
1515 <term><option>invert</option></term>
1516 <listitem>
1517 <para>Inverts the selection: messages satisfying the condition set are <emphasis>not</emphasis> selected.</para>
1518 </listitem>
1519 </varlistentry>
1520 
1521 <varlistentry>
1522 <term><option>accel:<replaceable>characters</replaceable></option></term>
1523 <listitem>
1524 <para>Characters to consider as <link linkend="sec-poaccel">accelerator markers</link>, to remove before applying matching patterns. If not given, they may be read from PO files (see <link linkend="hdr-x-accelerator-marker"><literal>X-Accelerator-Marker</literal></link> in <xref linkend="sec-cmheader"/>).</para>
1525 </listitem>
1526 </varlistentry>
1527 
1528 <varlistentry>
1529 <term><option>case</option></term>
1530 <listitem>
1531 <para>Matching patterns for strings and comments are by default case-insensitive, and this parameter switches them to case-sensitive.</para>
1532 </listitem>
1533 </varlistentry>
1534 
1535 <varlistentry>
1536 <term><option>mark</option></term>
1537 <listitem>
1538 <para>To each selected message a <literal>match</literal> flag is added, modifying the PO file. Modified files can then be opened in the editor,
1539 and selected messages looked up by this flag. This is typically done when something should be modified in selected messages, but doing that automatically (using <option>replace</option> parameter) is not possible or safe enough. Also useful here is the option <option>-m</option>/<option>--output-modified</option> of <command>posieve</command>, to write out the paths of modified PO files into a separate file, which can then be fed to the editor.</para>
1540 </listitem>
1541 </varlistentry>
1542 
1543 <varlistentry>
1544 <term><option>filter:<replaceable>hookspec</replaceable></option></term>
1545 <listitem>
1546 <para>The hook to modify the translation before applying the <option>msgstr</option> matcher to it. The hook type must be F1A. The parameter can be repeated to add several hooks.</para>
1547 </listitem>
1548 </varlistentry>
1549 
1550 <varlistentry>
1551 <term><option>nomsg</option></term>
1552 <listitem>
1553 <para>Do not report selected messages, either to standard output or to PO editors. Useful when <command>find-messages</command> is a pre-filter in the sieve chain.</para>
1554 </listitem>
1555 </varlistentry>
1556 
1557 <varlistentry>
1558 <term><option>lokalize</option></term>
1559 <listitem>
1560 <para>Open the PO file on selected messages in Lokalize (unless <option>nomsg</option> is in effect). Lokalize must be already running with the project that contains the PO file opened.</para>
1561 </listitem>
1562 </varlistentry>
1563 
1564 </variablelist>
1565 </para>
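<para>The restricted effect of the <option>or</option> parameter, as described above, can be summarized with a small Python sketch. The function <literal>message_matches</literal> and its arguments are hypothetical and only model the described behavior: results of string matchers are linked with OR when requested, while binary conditions always remain linked with AND:
<programlisting language="python">
def message_matches(string_results, binary_results, use_or=False):
    """Combine per-condition results for one message.

    string_results: booleans from msgctxt, msgid, msgstr, comment matchers.
    binary_results: booleans from conditions such as transl or plural.
    """
    link = any if use_or else all
    strings_ok = link(string_results) if string_results else True
    # Binary conditions are AND-linked regardless of the 'or' parameter.
    return strings_ok and all(binary_results)

# -s msgctxt:tooltip -s comment:tooltip -s transl -s or
print(message_matches([False, True], [True], use_or=True))   # True
print(message_matches([False, True], [False], use_or=True))  # False
</programlisting>
The second call models an untranslated message with "tooltip" in its comments: it is not selected, because the <option>transl</option> condition must still hold.</para>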
1566 
1567 </sect3>
1568 
1569 </sect2>
1570 
1571 <sect2 id="sv-generate-xml">
1572 <title><command>generate-xml</command></title>
1573 
1574 <para><command>generate-xml</command> creates a partial XML representation of a group of PO files.</para>
1575 
1576 <!-- TODO: Document the format in more detail. -->
1577 <para>The output XML format is as follows. Each PO file in the group is represented by a <literal>&lt;po&gt;</literal> element, which contains a list of <literal>&lt;msg&gt;</literal> elements, one for each message. The <literal>&lt;msg&gt;</literal> element contains the usual parts of a PO message:
1578 <itemizedlist>
1579 <listitem>
1580 <para><literal>&lt;line&gt;</literal>: referent line number of the message</para>
1581 </listitem>
1582 <listitem>
1583 <para><literal>&lt;refentry&gt;</literal>: referent entry number of the message</para>
1584 </listitem>
1585 <listitem>
1586 <para><literal>&lt;status&gt;</literal>: current status of the message (obsolete, translated, untranslated, fuzzy)</para>
1587 </listitem>
1588 <listitem>
1589 <para><literal>&lt;msgid&gt;</literal>: the original text</para>
1590 </listitem>
1591 <listitem>
1592 <para><literal>&lt;msgstr&gt;</literal>: the translation</para>
1593 </listitem>
1594 <listitem>
1595 <para><literal>&lt;msgctxt&gt;</literal>: disambiguating context</para>
1596 </listitem>
1597 </itemizedlist>
1598 If the PO message contains plural forms, they will be represented with <literal>&lt;plural&gt;</literal> subelements of <literal>&lt;msgstr&gt;</literal>.</para>
1599 
1600 <para>Parameters:
1601 <variablelist>
1602 
1603 <varlistentry>
1604 <term><option>xml:<replaceable>file</replaceable></option></term>
1605 <listitem>
<para>By default the XML content is written to standard output. This parameter sends it to the given file instead.</para>
1607 </listitem>
1608 </varlistentry>
1609 
1610 <varlistentry>
1611 <term><option>translatedOnly</option></term>
1612 <listitem>
1613 <para>Only translated messages are exported to XML (i.e. fuzzy, untranslated and obsolete are ignored).</para>
1614 </listitem>
1615 </varlistentry>
1616 
1617 </variablelist>
1618 </para>
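<para>As a minimal sketch, the following run would export only the translated messages and write them to a file rather than to standard output. The directory <filename>my_lang/</filename> and the output file name <filename>my_lang.xml</filename> are placeholders chosen for illustration:
<programlisting language="bash">
$ posieve generate-xml -s xml:my_lang.xml -s translatedOnly my_lang/
</programlisting>
</para>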
1619 
1620 </sect2>
1621 
1622 <sect2 id="sv-merge-corr-tree">
1623 <title><command>merge-corr-tree</command></title>
1624 
<para>When corrections are made on a copy of a tree of PO files, it is not easy to merge back only the updated translations, because word wrapping in the PO files can differ, producing many more differences than there should be.</para>
1626 
<para>Additionally, tools like <command>pogrep</command> from <ulink url="http://translate.sourceforge.net/wiki/toolkit/index">Translate Toolkit</ulink> will create a new partial tree as output, containing matched messages only. <command>merge-corr-tree</command> will help you to merge changes made in that partial tree back into the main tree.</para>
1628 
1629 <para>The main PO files tree is the input, and the <option>pathdelta</option> parameter is used to provide the path difference to where the partial correction tree is located.</para>
1630 
1631 <para>Parameters:
1632 <variablelist>
1633 
1634 <varlistentry>
1635 <term><option>pathdelta:<replaceable>search</replaceable>:<replaceable>replace</replaceable></option></term>
1636 <listitem>
<para>Specifies that the partial tree is located at the path obtained when <literal><replaceable>search</replaceable></literal> is replaced with <literal><replaceable>replace</replaceable></literal> in the input path.</para>
1638 </listitem>
1639 </varlistentry>
1640 
1641 </variablelist>
1642 </para>
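<para>For example, assuming that the main tree is in <filename>my_lang/</filename> and the partial correction tree is in a sibling directory <filename>my_lang-corr/</filename> (both names are hypothetical), the corrections could be merged back into the main tree with something like:
<programlisting language="bash">
$ posieve merge-corr-tree -s pathdelta:my_lang:my_lang-corr my_lang/
</programlisting>
Here each input path under <filename>my_lang/</filename> is mapped to the corresponding path under <filename>my_lang-corr/</filename> by replacing the first string with the second.</para>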
1643 
1644 </sect2>
1645 
1646 <sect2 id="sv-normalize-header">
1647 <title><command>normalize-header</command></title>
1648 
1649 <para><command>normalize-header</command> applies <link linkend="hk-normalize-canonical-header">the <literal>normalize/canonical-header</literal> hook</link> to PO file headers.</para>
1650 
1651 <para>There are no parameters.</para>
1652 
1653 </sect2>
1654 
1655 <sect2 id="sv-normctxt-delim">
1656 <title><command>normctxt-delim</command></title>
1657 
1658 <para>In older PO files, disambiguating contexts may be embedded into <varname>msgid</varname> strings, as the initial part of the string delimited from the actual text with predefined substrings, here called the "head" and the "tail". For example, in:
1659 <programlisting language="po">
1660 msgid ""
1661 "_:this-is-context\n"
1662 "This is original text"
1663 msgstr "This is translated text"
1664 </programlisting>
1665 the head is the underscore-colon sequence (<literal>_:</literal>), and the tail the newline (<literal>\n</literal>). <command>normctxt-delim</command> will convert embedded contexts of the delimiter-type to proper <varname>msgctxt</varname> strings.</para>
1666 
1667 <para>Parameters:
1668 <variablelist>
1669 
1670 <varlistentry>
1671 <term><option>head:<replaceable>string</replaceable></option></term>
1672 <listitem>
1673 <para>The head of the delimiter-type embedded context.</para>
1674 </listitem>
1675 </varlistentry>
1676 
1677 <varlistentry>
1678 <term><option>tail:<replaceable>string</replaceable></option></term>
1679 <listitem>
1680 <para>The tail of the delimiter-type embedded context.</para>
1681 </listitem>
1682 </varlistentry>
1683 
1684 </variablelist>
1685 </para>
1686 
1687 </sect2>
1688 
1689 <sect2 id="sv-normctxt-sep">
1690 <title><command>normctxt-sep</command></title>
1691 
1692 <para>In older PO files, disambiguating contexts may be embedded into <varname>msgid</varname> strings, as the initial part of the string separated from the actual text by a predefined substring. For example, in:
1693 <programlisting language="po">
1694 msgid "this-is-context|This is original text"
1695 msgstr "This is translated text"
1696 </programlisting>
1697 the separator string is the pipe character (<literal>|</literal>). <command>normctxt-sep</command> will convert embedded contexts of the separator-type to proper <varname>msgctxt</varname> strings.</para>
1698 
1699 <para>Parameters:
1700 <variablelist>
1701 
1702 <varlistentry>
1703 <term><option>sep:<replaceable>string</replaceable></option></term>
1704 <listitem>
1705 <para>The string that separates the context and the text in separator-type embedded context.</para>
1706 </listitem>
1707 </varlistentry>
1708 
1709 </variablelist>
1710 </para>
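<para>A minimal sketch of converting the message shown above, with <filename>my_lang/</filename> standing in for the actual directory of PO files:
<programlisting language="bash">
$ posieve normctxt-sep -s sep:'|' my_lang/
</programlisting>
After such a run, the example message would be expected to carry <literal>this-is-context</literal> in its <varname>msgctxt</varname> string and only <literal>This is original text</literal> in its <varname>msgid</varname> string.</para>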
1711 
1712 </sect2>
1713 
1714 <sect2 id="sv-remove-fuzzy-comments">
1715 <title><command>remove-fuzzy-comments</command></title>
1716 
1717 <para>Being translator's input, translator comments are copied verbatim to fuzzy messages created on merging with template. Depending on the purpose of translator comments (e.g. see <xref linkend="sec-cmskipcheck"/> for some special types), it may be better to automatically remove some of them from fuzzy messages (and then possibly add them back manually when updating the translation). If run without any parameters <command>remove-fuzzy-comments</command> will do nothing, so one or more parameters need to be given to actually remove any comment.</para>
1718 
1719 <para>Parameters:
1720 <variablelist>
1721 
1722 <varlistentry>
1723 <term><option>all</option></term>
1724 <listitem>
1725 <para>Simply all translator comments in fuzzy messages are removed.</para>
1726 </listitem>
1727 </varlistentry>
1728 
1729 <varlistentry>
1730 <term><option>nopipe</option></term>
1731 <listitem>
1732 <para>Translator comments containing <link linkend="p-trflag">translator flags</link> (see <xref linkend="sec-cmskipcheck"/>) are removed.</para>
1733 </listitem>
1734 </varlistentry>
1735 
1736 <varlistentry>
1737 <term><option>pattern:<replaceable>regex</replaceable></option></term>
1738 <listitem>
1739 <para>Translator comment must match the given regular expression to be removed.</para>
1740 </listitem>
1741 </varlistentry>
1742 
1743 <varlistentry>
1744 <term><option>exclude:<replaceable>regex</replaceable></option></term>
1745 <listitem>
1746 <para>Translator comment is removed if it does not match the given regular expression.</para>
1747 </listitem>
1748 </varlistentry>
1749 
1750 <varlistentry>
1751 <term><option>case</option></term>
1752 <listitem>
1753 <para>Matching patterns are by default case-insensitive, and this parameter switches to case-sensitivity.</para>
1754 </listitem>
1755 </varlistentry>
1756 
1757 </variablelist>
1758 </para>
1759 
1760 <para>When several removal criteria are specified, first those other than <option>pattern</option> and <option>exclude</option> are applied in unspecified order, then the <option>pattern</option> match, and finally the <option>exclude</option> match.</para>
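<para>As an illustration, the following run would remove from fuzzy messages those translator comments that match a regular expression, with case-sensitive matching. The expression <literal>todo</literal> and the path <filename>my_lang/</filename> are examples only:
<programlisting language="bash">
$ posieve remove-fuzzy-comments -s pattern:todo -s case my_lang/
</programlisting>
</para>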
1761 
1762 </sect2>
1763 
1764 <sect2 id="sv-remove-obsolete">
1765 <title><command>remove-obsolete</command></title>
1766 
1767 <para><command>remove-obsolete</command> simply removes all obsolete messages, whether fuzzy or translated, from the PO file.</para>
1768 
1769 <para>There are no parameters.</para>
1770 
1771 </sect2>
1772 
1773 <sect2 id="sv-remove-previous">
1774 <title><command>remove-previous</command></title>
1775 
1776 <para><command>remove-previous</command> removes previous strings, i.e. <literal>#| ...</literal> comments, from messages.</para>
1777 
1778 <para>Parameters:
1779 <variablelist>
1780 
1781 <varlistentry>
1782 <term><option>all</option></term>
1783 <listitem>
1784 <para>Previous strings are by default removed only from non-fuzzy messages. This parameter specifies to remove previous strings from all messages, including fuzzy.</para>
1785 </listitem>
1786 </varlistentry>
1787 
1788 </variablelist>
1789 </para>
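<para>For example, to strip previous strings from every message, fuzzy ones included, in a hypothetical directory <filename>my_lang/</filename>:
<programlisting language="bash">
$ posieve remove-previous -s all my_lang/
</programlisting>
Without <option>all</option>, fuzzy messages would keep their previous strings.</para>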
1790 
1791 </sect2>
1792 
1793 <sect2 id="sv-resolve-aggregates">
1794 <title><command>resolve-aggregates</command></title>
1795 
1796 <para>In its default mode of operation, <command>msgcat(1)</command> produces an aggregate message when in different catalogs it encounters a message with the same key but different translation or translator or extracted comments. A general aggregate message looks like this:
1797 <programlisting language="po">
1798 # #-#-#-#-#  po-file-name-1 (project-version-id-1)  #-#-#-#-#
1799 # manual-comments-1
1800 # #-#-#-#-#  po-file-name-2 (project-version-id-2)  #-#-#-#-#
1801 # manual-comments-2
1802 # ...
1803 # #-#-#-#-#  po-file-name-n (project-version-id-n)  #-#-#-#-#
1804 # manual-comments-n
1805 #. #-#-#-#-#  po-file-name-1 (project-version-id-1)  #-#-#-#-#
1806 #. automatic-comments-1
1807 #. #-#-#-#-#  po-file-name-2 (project-version-id-2)  #-#-#-#-#
1808 #. automatic-comments-2
1809 #. ...
1810 #. #-#-#-#-#  po-file-name-n (project-version-id-n)  #-#-#-#-#
1811 #. automatic-comments-n
1812 #: source-refs-1 source-refs-2 ... source-refs-n
1813 #, fuzzy, other-flags
1814 msgctxt "context"
1815 msgid "original-text"
1816 msgstr ""
1817 "#-#-#-#-#  po-file-name-1 (project-version-id-1)  #-#-#-#-#\n"
1818 "translated-text-1\n"
1819 "#-#-#-#-#  po-file-name-2 (project-version-id-2)  #-#-#-#-#\n"
1820 "translated-text-2\n"
1821 "..."
1822 "#-#-#-#-#  po-file-name-n (project-version-id-n)  #-#-#-#-#\n"
1823 "translated-text-n"
1824 </programlisting>
Each message part is aggregated only if it differs in at least one message
in the group. For example, extracted comments may be aggregated while translations are not.</para>
1827 
1828 <para><command>resolve-aggregates</command> is used to resolve aggregate messages of this kind into normal messages, by picking one variant from each aggregated part.</para>
1829 
1830 <para>Parameters:
1831 <variablelist>
1832 
1833 <varlistentry>
1834 <term><option>first</option></term>
1835 <listitem>
<para>By default, the picked variant is the one with the most occurrences, or the first of several with the same number of occurrences. If this parameter is issued, the first variant is picked unconditionally.</para>
1837 </listitem>
1838 </varlistentry>
1839 
1840 <varlistentry>
1841 <term><option>unfuzzy</option></term>
1842 <listitem>
1843 <para>Aggregated messages are always made fuzzy, leaving no way to determine
1844 if and which of the original messages were fuzzy. Therefore, by default, the resolved message is left fuzzy too. If, however, it is known beforehand that none of the original messages were fuzzy, resolved messages can be unfuzzied by issuing this parameter.</para>
1845 </listitem>
1846 </varlistentry>
1847 
1848 <varlistentry>
1849 <term><option>keepsrc</option></term>
1850 <listitem>
1851 <para>Since there is no information based on which the aggregated source references can be split into originating groups, they are entirely removed unless this parameter is issued.</para>
1852 </listitem>
1853 </varlistentry>
1854 
1855 </variablelist>
1856 </para>
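<para>A sketch of a typical use, assuming a catalog <filename>merged.po</filename> (a hypothetical name) that was produced by <command>msgcat</command> from catalogs known to contain no fuzzy messages:
<programlisting language="bash">
$ posieve resolve-aggregates -s first -s unfuzzy merged.po
</programlisting>
This would pick the first variant of every aggregated part, clear the fuzzy state of resolved messages, and, since <option>keepsrc</option> is not given, drop the aggregated source references.</para>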
1857 
1858 </sect2>
1859 
1860 <sect2 id="sv-resolve-alternatives">
1861 <title><command>resolve-alternatives</command></title>
1862 
1863 <para><command>resolve-alternatives</command> resolves <emphasis>alternatives directives</emphasis> found in the translation into one of the alternatives.</para>
1864 
1865 <para>An alternative directive is a substring of the form <literal>~@/.../.../...</literal>, for example:
1866 <programlisting language="po">
1867 msgstr "I see a ~@/pink/white/ elephant."
1868 </programlisting>
<literal>~@</literal> is the directive head, which is followed by a character that defines the delimiter of alternatives (it can be arbitrary), and then by the alternatives themselves. The number of alternatives per directive is not defined by the directive itself, but is provided as a sieve parameter (i.e. all alternatives directives must have the same number of alternatives).</para>
1870 
1871 <para>Parameters:
1872 <variablelist>
1873 
1874 <varlistentry>
1875 <term><option>alt:<replaceable>N</replaceable>,<replaceable>M</replaceable>t</option></term>
1876 <listitem>
1877 <para>Specifies how to resolve alternatives. <literal><replaceable>N</replaceable></literal> is the index (starting from 1) of the alternative to take from each directive, and <literal><replaceable>M</replaceable></literal> is the number of alternatives per directive. Example: <literal>alt:1,2t</literal>.</para>
1878 </listitem>
1879 </varlistentry>
1880 
1881 </variablelist>
1882 </para>
1883 
<para>If an alternatives directive is invalid (e.g. it has too few alternatives), it is reported to standard output. If at least one alternatives directive in the text is not valid, the text is not modified.</para>
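<para>For instance, with two alternatives per directive and the first one selected, a run could look as follows (<filename>my_lang/</filename> is a placeholder path):
<programlisting language="bash">
$ posieve resolve-alternatives -s alt:1,2t my_lang/
</programlisting>
Going by the description of the parameter, the example translation <literal>I see a ~@/pink/white/ elephant.</literal> would be expected to become <literal>I see a pink elephant.</literal>, while <literal>alt:2,2t</literal> would select <literal>white</literal> instead.</para>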
1885 
1886 </sect2>
1887 
1888 <sect2 id="sv-resolve-entities">
1889 <title><command>resolve-entities</command></title>
1890 
<para>XML entities are substrings of the form <literal>&amp;<replaceable>entityname</replaceable>;</literal>, typically encountered in XML-like text markups, but elsewhere too. They are resolved into underlying, human-readable values at build time (when translated text documents are created) or at run time (in translated user interfaces). Sometimes it may be better to have them resolved already in the PO file itself, and that is what <command>resolve-entities</command> does.</para>
1892 
1893 <para>Parameters:
1894 <variablelist>
1895 
1896 <varlistentry>
1897 <term><option>entdef:<replaceable>file</replaceable></option></term>
1898 <listitem>
<para>Path to the file which contains entity definitions. It can be repeated to add several files.</para>
1900 
1901 <para>Entity definition files are plain text files of the following format:
1902 <programlisting>
&lt;!-- This is a comment. --&gt;
1904 &lt;!ENTITY name1 'value1'&gt;
1905 &lt;!ENTITY name2 'value2'&gt;
1906 &lt;!ENTITY name3 'value3'&gt;
1907 ...
1908 </programlisting>
1909 </para>
1910 </listitem>
1911 </varlistentry>
1912 
1913 <varlistentry>
1914 <term><option>ignore:<replaceable>entitynames</replaceable></option></term>
1915 <listitem>
1916 <para>Entities which should be ignored during resolution. Standard XML entities (<literal>&amp;lt;</literal>, <literal>&amp;gt;</literal>, <literal>&amp;apos;</literal>, <literal>&amp;quot;</literal>, <literal>&amp;amp;</literal>) are ignored by default.</para>
1917 </listitem>
1918 </varlistentry>
1919 
1920 </variablelist>
1921 </para>
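<para>As a sketch, entity definitions could be read from two files by repeating the <option>entdef</option> parameter. The file names <filename>general.entities</filename> and <filename>product.entities</filename>, and the path <filename>my_lang/</filename>, are hypothetical:
<programlisting language="bash">
$ posieve resolve-entities -s entdef:general.entities -s entdef:product.entities my_lang/
</programlisting>
</para>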
1922 
1923 </sect2>
1924 
1925 <sect2 id="sv-set-header">
1926 <title><command>set-header</command></title>
1927 
1928 <para>Sometimes a PO header field or comment needs to be updated in many PO files at once, and <command>set-header</command> serves that purpose.</para>
1929 
1930 <para>Parameters for setting and removing header fields:
1931 <variablelist>
1932 
1933 <varlistentry>
1934 <term><option>field:<replaceable>name</replaceable>:<replaceable>value</replaceable></option></term>
1935 <listitem>
1936 <para>Set the field with given name to given value. This parameter can be repeated to set several fields in one run.</para>
1937 
1938 <para>By default, <option>field</option> will actually set the field only if it is already present in the header. To add the field if not present, the <option>create</option> parameter must be issued as well. If the field is being added, parameters <option>after</option> and <option>before</option> can be used to specify where to insert it, or else the new field is appended at the end of the header. If the field is present but not positioned according to <option>after</option> and <option>before</option>, the <option>reorder</option> parameter can be issued to move the field within the header.</para>
1939 </listitem>
1940 </varlistentry>
1941 
1942 <varlistentry>
1943 <term><option>create</option></term>
1944 <listitem>
1945 <para>The field should be added if it is not present in the header.</para>
1946 </listitem>
1947 </varlistentry>
1948 
1949 <varlistentry>
1950 <term><option>after</option></term>
1951 <listitem>
1952 <para>When a field is added, it should be inserted after this field.</para>
1953 </listitem>
1954 </varlistentry>
1955 
1956 <varlistentry>
1957 <term><option>before</option></term>
1958 <listitem>
1959 <para>When a field is added, it should be inserted before this field.</para>
1960 </listitem>
1961 </varlistentry>
1962 
1963 <varlistentry>
1964 <term><option>reorder</option></term>
1965 <listitem>
1966 <para>If the field is present, but it is in the wrong place according to <option>after</option> and <option>before</option>, this parameter will cause it to be reinserted in proper place.</para>
1967 </listitem>
1968 </varlistentry>
1969 
1970 <varlistentry>
1971 <term><option>remove:<replaceable>field</replaceable></option></term>
1972 <listitem>
<para>Remove the field with this name. If there are several fields of that name, all are removed.</para>
1974 </listitem>
1975 </varlistentry>
1976 
1977 <varlistentry>
1978 <term><option>removerx:<replaceable>regex</replaceable></option></term>
1979 <listitem>
1980 <para>Remove all fields matched by the given regular expression.</para>
1981 </listitem>
1982 </varlistentry>
1983 
1984 </variablelist>
1985 </para>
1986 
1987 <para>Parameters for setting and removing header comments:
1988 <variablelist>
1989 
1990 <varlistentry>
1991 <term><option>title:<replaceable>value</replaceable></option></term>
1992 <listitem>
1993 <para>Set the title comment to the given value. It can be repeated, since the title can be composed of multiple comment lines.</para>
1994 </listitem>
1995 </varlistentry>
1996 
1997 <varlistentry>
1998 <term><option>rmtitle</option></term>
1999 <listitem>
2000 <para>Remove title comments.</para>
2001 </listitem>
2002 </varlistentry>
2003 
2004 <varlistentry>
2005 <term><option>copyright:<replaceable>value</replaceable></option></term>
2006 <listitem>
2007 <para>Set the copyright comment to the given value.</para>
2008 </listitem>
2009 </varlistentry>
2010 
2011 <varlistentry>
2012 <term><option>rmcopyright</option></term>
2013 <listitem>
2014 <para>Remove the copyright comment.</para>
2015 </listitem>
2016 </varlistentry>
2017 
2018 <varlistentry>
2019 <term><option>license:<replaceable>value</replaceable></option></term>
2020 <listitem>
2021 <para>Set the license comment to the given value.</para>
2022 </listitem>
2023 </varlistentry>
2024 
2025 <varlistentry>
2026 <term><option>rmlicense</option></term>
2027 <listitem>
2028 <para>Remove the license comment.</para>
2029 </listitem>
2030 </varlistentry>
2031 
2032 <varlistentry>
2033 <term><option>author:<replaceable>value</replaceable></option></term>
2034 <listitem>
2035 <para>Set the author comment to the given value. It can be repeated, since there may be more authors (i.e. translators).</para>
2036 </listitem>
2037 </varlistentry>
2038 
2039 <varlistentry>
2040 <term><option>rmauthor</option></term>
2041 <listitem>
2042 <para>Remove author comments.</para>
2043 </listitem>
2044 </varlistentry>
2045 
2046 <varlistentry>
2047 <term><option>comment:<replaceable>value</replaceable></option></term>
2048 <listitem>
2049 <para>Set the free comment to the given value. It can be repeated, since there can be any number of free comment lines.</para>
2050 </listitem>
2051 </varlistentry>
2052 
2053 <varlistentry>
2054 <term><option>rmcomment</option></term>
2055 <listitem>
2056 <para>Remove free comments.</para>
2057 </listitem>
2058 </varlistentry>
2059 
2060 <varlistentry>
2061 <term><option>rmallcomm</option></term>
2062 <listitem>
2063 <para>Remove all header comments.</para>
2064 </listitem>
2065 </varlistentry>
2066 
2067 </variablelist>
2068 </para>
2069 
2070 <para>Note that all existing comments of given type are removed before setting the new ones, i.e. the new comments are <emphasis>not</emphasis> appended to the existing. For example, if single <option>author</option> parameter is issued, with a translator name and email address as value, this one translator will replace all existing translators in the header comments.</para>
2071 
2072 <para>Comment values are checked for some minimal consistency, e.g. author comments must contain email addresses, licence comments the word "licence", etc.</para>
2073 
<para>Value strings (both of fields and comments) may contain %-directives,
which are expanded to catalog-dependent substrings prior to setting the value.
Currently available directives are:
2077 <itemizedlist>
2078 <listitem>
2079 <para><literal>%poname</literal>: PO domain name (equal to file name without <filename>.po</filename> extension)</para>
2080 </listitem>
2081 </itemizedlist>
If a literal % character is needed (e.g. when setting the <literal>Plural-Forms</literal> field), it can be escaped by doubling it, <literal>%%</literal>. The directive can also be given inside braces, as <literal>%{...}</literal>, when it would be ambiguous otherwise.</para>
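<para>The following invocations sketch how these parameters might be combined. All values (the language code, the translator name and address, the title wording, and the path <filename>my_lang/</filename>) are made up for illustration, and the title text is only assumed to pass the consistency checks:
<programlisting language="bash">
$ posieve set-header -s field:Language:sr -s create my_lang/
$ posieve set-header -s author:'Jane Doe &lt;jane@example.org&gt;' my_lang/
$ posieve set-header -s title:'Translation of %{poname}.po into Serbian.' my_lang/
</programlisting>
The first run sets the <literal>Language</literal> field, adding it where it is missing. The second replaces all existing author comments with a single one. The third uses the braced form of the <literal>%poname</literal> directive, so that the directive name is clearly separated from the text that follows it.</para>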
2083 
2084 </sect2>
2085 
2086 <sect2 id="sv-stats">
2087 <title><command>stats</command></title>
2088 
2089 <para><command>stats</command> collects statistics on PO files, such as message and word counts, and more. Statistics can be presented in several ways and on several levels.</para>
2090 
2091 <para>Parameters:
2092 <variablelist>
2093 
2094 <varlistentry>
2095 <term><option>accel:<replaceable>characters</replaceable></option></term>
2096 <listitem>
<para>Characters to consider as <link linkend="sec-poaccel">accelerator markers</link>, to remove them when splitting text to count words. If not given, they may be read from PO files (see <link linkend="hdr-x-accelerator-marker"><literal>X-Accelerator-Marker</literal></link> in <xref linkend="sec-cmheader"/>), or else some usual accelerator marker characters are removed.</para>
2098 </listitem>
2099 </varlistentry>
2100 
2101 <varlistentry>
2102 <term><option>detail</option></term>
2103 <listitem>
2104 <para>In table views, by default only message, word, and character counts are given. This parameter requests additional derived data, such as expansion factors (ratio of words in translation to words in original), number of words per message, etc.</para>
2105 </listitem>
2106 </varlistentry>
2107 
2108 <varlistentry>
2109 <term><option>incomplete</option></term>
2110 <listitem>
2111 <para>When run over a collection of PO files, all non-fully translated PO files are listed separately, with very brief statistics of incompleteness.</para>
2112 </listitem>
2113 </varlistentry>
2114 
2115 <varlistentry>
2116 <term><option>incompfile:<replaceable>file</replaceable></option></term>
2117 <listitem>
2118 <para>Write a file with paths of all non-fully translated PO files, one per line. This file can then be fed with <option>-f</option>/<option>--from-files</option> back to <command>posieve</command> or another script, to process only incomplete PO files.</para>
2119 </listitem>
2120 </varlistentry>
2121 
2122 <varlistentry>
2123 <term><option>templates:<replaceable>search</replaceable>:<replaceable>replace</replaceable></option></term>
2124 <listitem>
<para>If there is both a directory with translated PO files and a directory with POT (template) files, and not every POT file has a corresponding PO file, this parameter can be used to count POT files without a PO counterpart as fully untranslated in the statistics. The parameter value consists of two strings separated by a colon: the first string is searched for in the directory paths of processed PO files, and replaced with the second string to construct the corresponding directory paths of POT files. For example:
2126 <programlisting language="bash">
2127 $ cd $MYTRANSLATIONS
2128 $ ls
2129 my_lang  templates
2130 $ posieve stats -s templates:my_lang:templates my_lang/
2131 </programlisting>
2132 </para>
2133 </listitem>
2134 </varlistentry>
2135 
2136 <varlistentry>
2137 <term><option>minwords:<replaceable>number</replaceable></option></term>
2138 <listitem>
2139 <para>Only messages with at least this many words (in any of original or translation strings) are counted into statistics.</para>
2140 </listitem>
2141 </varlistentry>
2142 
2143 <varlistentry>
2144 <term><option>maxwords:<replaceable>number</replaceable></option></term>
2145 <listitem>
2146 <para>Only messages with at most this many words (in any of original or translation strings) are counted into statistics.</para>
2147 </listitem>
2148 </varlistentry>
2149 
2150 <varlistentry>
2151 <term><option>lspan:<replaceable>start</replaceable>:<replaceable>end</replaceable></option></term>
2152 <listitem>
2153 <para>Only messages with referent line numbers (line number of <varname>msgid</varname>) in this range are counted into statistics. The starting line is included in the range, the ending line is not. If start is omitted (e.g. <option>lspan::500</option>) it is assumed 0, and if end is omitted (e.g. <option>lspan:300</option> or <option>lspan:300:</option>) it is assumed the total number of lines.</para>
2154 </listitem>
2155 </varlistentry>
2156 
2157 <varlistentry>
2158 <term><option>espan:<replaceable>start</replaceable>:<replaceable>end</replaceable></option></term>
2159 <listitem>
2160 <para>Only messages with entry numbers (as reported by PO editors) in this range are counted into statistics. Same boundary inclusion and omission rules as for <option>lspan</option> apply; e.g. <option>espan:4:8</option> means to count messages with entry numbers 4, 5, 6, and 7.</para>
2161 </listitem>
2162 </varlistentry>
2163 
2164 <varlistentry>
2165 <term><option>branch:<replaceable>branch</replaceable></option></term>
2166 <listitem>
2167 <para>Only messages from given branch are counted into statistics (<link linkend="ch-summit">summit</link>). Several branches may be given as comma-separated list.</para>
2168 </listitem>
2169 </varlistentry>
2170 
2171 <varlistentry>
2172 <term><option>bydir</option></term>
2173 <listitem>
<para>Statistics are broken down by directory, that is, a report is displayed for each group of PO files in the same directory (and not below it). This is more often used with bar displays than with tabular displays.</para>
2175 </listitem>
2176 </varlistentry>
2177 
2178 <varlistentry>
2179 <term><option>byfile</option></term>
2180 <listitem>
<para>Statistics are broken down by file, that is, a report is displayed for each PO file. Usually used with bar displays.</para>
2182 </listitem>
2183 </varlistentry>
2184 
2185 <varlistentry>
2186 <term><option>msgbar</option></term>
2187 <listitem>
2188 <para>Instead of a table with detailed statistics, only message counts are shown, accompanied with a text-art bar. Mostly useful in combination with <option>bydir</option> and <option>byfile</option>.</para>
2189 </listitem>
2190 </varlistentry>
2191 
2192 <varlistentry>
2193 <term><option>wbar</option></term>
2194 <listitem>
2195 <para>Like <option>msgbar</option>, but to have word instead of message counts.</para>
2196 </listitem>
2197 </varlistentry>
2198 
2199 <varlistentry>
2200 <term><option>msgfmt</option></term>
2201 <listitem>
2202 <para>Like <option>msgbar</option>, but a msgfmt-style plain-text summary is printed.</para>
2203 </listitem>
2204 </varlistentry>
2205 
2206 <varlistentry>
2207 <term><option>absolute</option></term>
2208 <listitem>
<para>Bar displays (on <option>msgbar</option> and <option>wbar</option>) are normally relative, meaning that when <option>byfile</option> or <option>bydir</option> is in effect, each bar is of the same length. This parameter makes bars scaled to the sizes of PO files or directories. For example, if <option>msgbar</option> and <option>byfile</option> are issued, then the bar of a PO file with twice as many messages as another PO file will be twice as long.</para>
2210 </listitem>
2211 </varlistentry>
2212 
2213 <varlistentry>
2214 <term><option>ondiff</option></term>
2215 <listitem>
2216 <para>Fuzzy messages are often very easy to correct (e.g. a typo fixed), which may make their word count misleading when estimating translation effort. This can be amended by issuing this parameter, to split word and character counts of fuzzy messages into translated and untranslated counts. The split is based on the difference ratio between current and previous original text, and a threshold. If the difference ratio is larger than the threshold, everything is counted as untranslated. The fuzzy count is left at zero. If previous original text is missing, the correction is not made, and counts are assigned to fuzzy as usual.</para>
2217 </listitem>
2218 </varlistentry>
2219 
2220 <varlistentry>
2221 <term><option>mincomp:<replaceable>fraction</replaceable></option></term>
2222 <listitem>
<para>Only those PO files which have translation completeness (measured by the ratio of translated to all messages, excluding obsolete) equal to or higher than the given fraction are included into statistics. This is especially useful when for each new template an empty PO file is automatically produced (instead of translators having to start work from a template), to include into statistics only those files which have actually seen some translation (using a small non-zero number for the fraction, e.g. <option>mincomp:1e-6</option>).</para>
2224 </listitem>
2225 </varlistentry>
2226 
2227 <varlistentry>
2228 <term><option></option></term>
2229 <listitem>
2230 <para>The hook to modify the translation before splitting it to count words and characters (see <xref linkend="sec-cmhooks"/>). The hook type must be F1A. The parameter can be repeated to add several hooks, which are then applied in the order of specification.</para>
2231 </listitem>
2232 </varlistentry>
2233 
2234 </variablelist>
2235 </para>
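<para>A few invocations that combine the parameters above, given as sketches only. The paths <filename>my_lang/</filename> and <filename>my_lang/frobaz.po</filename> are placeholders:
<programlisting language="bash">
$ posieve stats -s byfile -s msgbar my_lang/
$ posieve stats -s espan:4:8 my_lang/frobaz.po
$ posieve stats -s incomplete -s mincomp:1e-6 my_lang/
</programlisting>
The first run shows one message-count bar per PO file. The second counts only the messages with entry numbers 4 through 7 in a single file. The third leaves out PO files with no translated messages at all and lists the remaining incomplete files separately.</para>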
2236 
2237 <sect3 id="sec-svstemctxt">
2238 <title>Handling Embedded Contexts</title>
2239 
2240 <para>Some older PO files will have disambiguating contexts embedded into the <varname>msgid</varname> string, instead of using the newer standard <varname>msgctxt</varname> string. There are several customary ways in which this is done, but in general it depends on the translation environment where such PO files are used.</para>
2241 
<para>Embedded contexts will skew the statistics. Pology contains several sieves for converting embedded contexts into <varname>msgctxt</varname> contexts, named <command>normctxt-*</command>. When statistics on such PO files are computed, a sieve chain should be used in which the <command>stats</command> sieve is preceded by the context conversion sieve. For example, if the embedded context starts the <varname>msgid</varname> and ends with <literal>|</literal>, statistics should be computed with:
2243 <programlisting language="bash">
2244 $ posieve --no-sync normctxt-sep,stats -s sep:'|' ...
2245 </programlisting>
2246 Note that <command>normctxt-*</command> sieves, since they modify messages, would by default cause PO files to be modified on disk. Option <option>--no-sync</option> is therefore issued to prevent modifications to sieved files.</para>
2247 
2248 </sect3>
2249 
2250 <sect3 id="sec-svstoutleg">
2251 <title>Output Legend</title>
2252 
2253 <para>The default output from <command>stats</command> is a table where rows present statistics for a category of messages, and columns the particular categories of data:
2254 <programlisting language="bash">
2255 $ posieve stats frobaz/
2256 -              msg  msg/tot  w-or  w/tot-or  w-tr  ch-or  ch-tr
2257 translated     ...    ...    ...     ...     ...    ...    ...
2258 fuzzy          ...    ...    ...     ...     ...    ...    ...
2259 untranslated   ...    ...    ...     ...     ...    ...    ...
2260 total          ...    ...    ...     ...     ...    ...    ...
2261 obsolete       ...    ...    ...     ...     ...    ...    ...
2262 </programlisting>
The <literal>total</literal> row is the sum of the <literal>translated</literal>, <literal>fuzzy</literal>, and <literal>untranslated</literal> rows, whereas the <literal>obsolete</literal> row is excluded. The columns are as follows:
2264 <itemizedlist>
2265 <listitem>
2266 <para><literal>msg</literal>: number of messages</para>
2267 </listitem>
2268 <listitem>
2269 <para><literal>msg/tot</literal>: percentage of messages relative to total</para>
2270 </listitem>
2271 <listitem>
2272 <para><literal>w-or</literal>: number of words in the original</para>
2273 </listitem>
2274 <listitem>
2275 <para><literal>w/tot-or</literal>: percentage of words in the original relative to total</para>
2276 </listitem>
2277 <listitem>
2278 <para><literal>w-tr</literal>: number of words in the translation</para>
2279 </listitem>
2280 <listitem>
2281 <para><literal>ch-or</literal>: number of characters in original</para>
2282 </listitem>
2283 <listitem>
2284 <para><literal>ch-tr</literal>: number of characters in the translation</para>
2285 </listitem>
2286 </itemizedlist>
2287 </para>

<para>The output with the <option>detail</option> parameter in effect is the same as the default, with several columns of derived data appended to the table:
<itemizedlist>
<listitem>
<para><literal>w-ef</literal>: word expansion factor (increase in words from the original to the translation)</para>
</listitem>
<listitem>
<para><literal>ch-ef</literal>: character expansion factor (increase in characters from the original to the translation)</para>
</listitem>
<listitem>
<para><literal>w/msg-or</literal>: average number of words per message in the original</para>
</listitem>
<listitem>
<para><literal>w/msg-tr</literal>: average number of words per message in the translation</para>
</listitem>
<listitem>
<para><literal>ch/w-or</literal>: average number of characters per word in the original</para>
</listitem>
<listitem>
<para><literal>ch/w-tr</literal>: average number of characters per word in the translation</para>
</listitem>
</itemizedlist>
</para>
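
<para>The derived columns follow from the base counts of a row. A hedged Python sketch is given below; <function>derived_columns</function> is a hypothetical name, the expansion factors are assumed here to be the relative increase from the original to the translation, and the units in which <command>stats</command> prints them are not reproduced:
<programlisting language="python">
def derived_columns(msgs, w_or, w_tr, ch_or, ch_tr):
    """Compute the derived columns of one table row from its base counts."""

    def ratio(numerator, denominator):
        # Guard against empty rows (e.g. a catalog without fuzzy messages).
        return numerator / denominator if denominator else 0.0

    return {
        "w-ef": ratio(w_tr - w_or, w_or),      # assumed: relative increase
        "ch-ef": ratio(ch_tr - ch_or, ch_or),  # assumed: relative increase
        "w/msg-or": ratio(w_or, msgs),
        "w/msg-tr": ratio(w_tr, msgs),
        "ch/w-or": ratio(ch_or, w_or),
        "ch/w-tr": ratio(ch_tr, w_tr),
    }


# Made-up counts: 10 messages, 100 to 120 words, 500 to 660 characters.
row = derived_columns(10, 100, 120, 500, 660)
print(row["w/msg-or"], row["ch/w-tr"])  # 10.0 5.5
</programlisting>
</para>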

<para>If any of the sieve parameters that restrict or modify counting (such as <option>ondiff</option>, <option>lspan</option>, etc.) have been issued, this is indicated in the output by a <literal>modifiers: ...</literal> line:
<programlisting language="bash">
$ posieve stats -s maxwords:5 -s ondiff frobaz/
(...the statistics table...)
modifiers: at most 5 words and scaled fuzzy counts
</programlisting>
</para>

<para>When the <option>incomplete</option> parameter is given, the statistics table is followed by a table of incompletely translated PO files, with counts of fuzzy and untranslated messages and words:
<programlisting language="bash">
$ posieve stats -s incomplete frobaz/
(...the overall statistics table...)
catalog              msg/f   msg/u   msg/f+u   w/f   w/u   w/f+u
frobaz/foxtrot.po        0      11        11     0   123     123
frobaz/november.po      19      14        33    85    47     132
frobaz/sierra.po        22       0        22   231     0     231
</programlisting>
In the column names, <literal>msg/*</literal> and <literal>w/*</literal> stand for messages and words; <literal>*/f</literal>, <literal>*/u</literal>, and <literal>*/f+u</literal> stand for fuzzy, untranslated, and the two summed.</para>

<para>When the <option>msgbar</option> or <option>wbar</option> parameter is in effect, the statistics are presented as a text-art bar, which shows the proportions of translated, fuzzy, and untranslated messages or words:
<programlisting language="bash">
$ posieve stats -s wbar frobaz/
4572/1829/2533 w-or |¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤×××××××××············|
</programlisting>
A typical condensed overview of translation state is obtained by:
<programlisting language="bash">
$ posieve stats -s byfile -s msgbar frobaz/
frobaz/foxtrot.po   34/ -/11 msgs |¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤·····|
frobaz/november.po  58/19/14 msgs |¤¤¤¤¤¤¤¤¤¤¤×××××····|
frobaz/sierra.po    65/22/ - msgs |¤¤¤¤¤¤¤¤¤¤¤¤¤¤××××××|
(overall)          147/41/25 msgs |¤¤¤¤¤¤¤¤¤¤¤¤¤××××···|
</programlisting>
Note that while message counts are the classic choice for bar overviews (<option>msgbar</option>), you are probably better off looking at word counts (<option>wbar</option>) instead, because word counts represent more closely the amount of work needed to complete the translation. Fractions are rounded for bars such that, as long as there is at least one fuzzy or untranslated message (or word), the bar shows at least one incomplete cell.</para>
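
<para>One rounding rule with the property described above is to round the fuzzy and untranslated shares up to whole cells and give the remaining cells to the translated share. The Python sketch below illustrates that rule only; it is an assumption, not necessarily how <command>stats</command> computes its bars, and <function>text_bar</function> and its default width are hypothetical:
<programlisting language="python">
def text_bar(translated, fuzzy, untranslated, width=20):
    """Render a bar of `width` cells for three message (or word) counts."""
    total = translated + fuzzy + untranslated
    if total == 0:
        return "|" + " " * width + "|"

    def cells_up(count):
        # Integer ceiling of count * width / total (no float rounding).
        return -(-count * width // total)

    n_fuzzy = cells_up(fuzzy)
    n_untr = cells_up(untranslated)
    # Rounding both shares up can overshoot by one cell;
    # take that cell from the larger of the two.
    if n_fuzzy + n_untr > width:
        if n_fuzzy > n_untr:
            n_fuzzy -= 1
        else:
            n_untr -= 1
    n_tr = width - n_fuzzy - n_untr
    return "|" + "¤" * n_tr + "×" * n_fuzzy + "·" * n_untr + "|"


print(text_bar(58, 19, 14))
</programlisting>
For example, <literal>text_bar(58, 19, 14)</literal> gives 11 translated, 5 fuzzy, and 4 untranslated cells, and a single untranslated message among thousands still occupies one cell.</para>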

</sect3>

<sect3 id="sec-svstnotes">
<title>Notes on Counting</title>

<para>Word and character counts for a message string are obtained by processing it in the following order:
<itemizedlist>
<listitem>
<para>Accelerator markers are removed.</para>
</listitem>
<listitem>
<para>Text markup is eliminated (e.g. XML-like tags).</para>
</listitem>
<listitem>
<para>Other special substrings, such as format directives, are also eliminated (e.g. <literal>%s</literal> in messages with <literal>c-format</literal> flag).</para>
</listitem>
<listitem>
<para>Text is split into words by taking all contiguous sequences of "word characters", which include letters, numbers, and underscore.</para>
</listitem>
<listitem>
<para>All words not starting with a letter are eliminated.</para>
</listitem>
<listitem>
<para>Words that remain are counted into statistics. Whitespace is not included in character count.</para>
</listitem>
</itemizedlist>
</para>
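
<para>The steps above can be sketched in Python as follows. This is an illustration rather than the code of the <command>stats</command> sieve: the accelerator marker is assumed to be <literal>&amp;</literal>, only a few C-style format directives are recognized, removed markup is replaced by a space, and the character count is assumed to cover the characters of the counted words:
<programlisting language="python"><![CDATA[
import re


def count_words_chars(text, accel="&"):
    """Return (words, characters) for one message string."""
    text = text.replace(accel, "")                 # 1. accelerator markers
    text = re.sub(r"<[^>]*>", " ", text)           # 2. XML-like tags
    text = re.sub(r"%\d*\$?[sd]", " ", text)       # 3. a few C-style directives
    words = re.findall(r"\w+", text)               # 4. runs of word characters
    words = [w for w in words if w[0].isalpha()]   # 5. must start with a letter
    return len(words), sum(len(w) for w in words)  # 6. whitespace is never counted


print(count_words_chars("Export as <b>HTML</b> to %s"))  # (4, 14)
]]></programlisting>
With these assumptions, <literal>&amp;File</literal> counts as one word of four characters, and <literal>3 files</literal> as one word of five characters, because <literal>3</literal> does not start with a letter.</para>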

<para>In plural messages, counts for the original are the average of the <varname>msgid</varname> and <varname>msgid_plural</varname> strings, and likewise the average of all <varname>msgstr</varname> strings for the translation. In this way, the comparative statistics between the original and the translation are not skewed for languages that have more or fewer than two plural forms.</para>
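
<para>A short Python sketch of this averaging is given below. The names are hypothetical, the word counter is a whitespace-splitting stand-in for the real counting procedure, and the translated forms are placeholders rather than actual translations:
<programlisting language="python">
def average_count(strings, count=lambda s: len(s.split())):
    """Average a per-string count over all forms of one plural message."""
    return sum(count(s) for s in strings) / len(strings)


# Two forms in the original, four placeholder forms in the translation.
original = average_count(["%d file removed", "%d files removed"])
translation = average_count(["f1 a b", "f2 a b", "f3 a b", "f4 a b"])
print(original, translation)  # 3.0 3.0
</programlisting>
Summing instead of averaging would give 6 words against 12, and make the translation look twice as long as the original only because the language has four plural forms.</para>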

</sect3>

</sect2>

<sect2 id="sv-tag-untranslated">
<title><command>tag-untranslated</command></title>

<para>Some translators like to edit PO files with a plain text editor, which may provide no special support for editing PO files, other than perhaps PO syntax highlighting. In this scenario, <command>tag-untranslated</command> can be used to equip untranslated messages with the <literal>untranslated</literal> flag, so that they can be easily looked up in the editor.</para>

<para>Since <literal>untranslated</literal> is not one of the defined PO flags, it will be lost if the PO file is merged with the template. This is intentional: the only purpose of this flag is to facilitate immediate editing of the PO file, and you may forget to remove some of the flags while editing. There is no reason for <literal>untranslated</literal> flags to persist in that case. Also, if the flag is not removed after the message has been translated, a subsequent run of this sieve will remove the flag.</para>

<para>Parameters:
<variablelist>

<varlistentry>
<term><option>strip</option></term>
<listitem>
<para>Instead of being added, <literal>untranslated</literal> flags are stripped. This is useful when you had no time to translate all messages but you want to send the PO file away.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>wfuzzy</option></term>
<listitem>
<para><literal>untranslated</literal> flags are added to fuzzy messages as well. This makes it possible to jump through all incomplete messages in the text editor by searching only for <userinput>, untranslated</userinput><footnote>
<para>Alternatively, if the editor provides regular expressions for searches, you can search for <userinput>, fuzzy|, untranslated</userinput>.</para>
</footnote>; it is also useful when the set of messages to be updated has been limited somehow (e.g. by the <option>branch</option> parameter).</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>branch:<replaceable>branch</replaceable></option></term>
<listitem>
<para>Tag only untranslated messages from the given branch (<link linkend="ch-summit">summit</link>). Several branches may be given as a comma-separated list.</para>
</listitem>
</varlistentry>

</variablelist>
</para>

</sect2>

<sect2 id="sv-unfuzzy-context-only">
<title><command>unfuzzy-context-only</command></title>

<para>Sometimes a message is made fuzzy during merging only because of a change in the <varname>msgctxt</varname> string, or its addition or removal. Some translators and languages may be less dependent on contexts than others, or the translators may be in a hurry prior to the release of the translation; <command>unfuzzy-context-only</command> can then be used to unfuzzy those messages in which only the context was modified. This state is detected by comparing the current and the previous strings in the fuzzy message, which means that the PO file must have been merged with the <option>--previous</option> option to <command>msgmerge</command>.</para>
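
<para>The comparison of current and previous strings can be pictured with the following Python sketch. It is not the sieve's implementation: <classname>FuzzyMessage</classname> is a hypothetical stand-in for a fuzzy PO message, with the previous fields set to <literal>None</literal> when they are absent, and plural strings are ignored for brevity:
<programlisting language="python">
from dataclasses import dataclass
from typing import Optional


@dataclass
class FuzzyMessage:
    msgctxt: Optional[str]
    msgid: str
    msgctxt_previous: Optional[str]
    msgid_previous: Optional[str]


def only_context_changed(msg):
    """True if the fuzzy message differs from its previous form only in context."""
    # Without previous strings (no --previous on merge) nothing can be decided.
    if msg.msgid_previous is None:
        return False
    return (msg.msgid_previous == msg.msgid
            and msg.msgctxt_previous != msg.msgctxt)


# A context was added to a message whose text stayed the same.
msg = FuzzyMessage("@title:window", "Export as HTML", None, "Export as HTML")
print(only_context_changed(msg))  # True
</programlisting>
</para>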

<para>Parameters:
<variablelist>

<varlistentry>
<term><option>noreview</option></term>
<listitem>
<para>By default, unfuzzied messages will also be given a translator comment with the <literal>unreviewed-context</literal> string, so that you may find and review these messages at a later time. This parameter will prevent the addition of this comment, but it is usually safer to review automatically unfuzzied messages when you find the time.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>eqmsgid</option></term>
<listitem>
<para>Sometimes a lot of messages in the code may be semi-automatically equipped with contexts (e.g. to group items by a common property), and then it may be necessary to review only those messages which got split into two or more messages due to newly added contexts. This parameter may be issued to specifically report all translated messages which have their <varname>msgid</varname> string equal to that of an unfuzzied message, including the unfuzzied messages themselves. Depending on exactly what kind of contexts have been added, the <option>noreview</option> parameter may be useful here as well.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>lokalize</option></term>
<listitem>
<para>Open the PO file on reported messages in Lokalize. Lokalize must be already running with the project that contains the PO file opened.</para>
</listitem>
</varlistentry>

</variablelist>
</para>

</sect2>

<sect2 id="sv-unfuzzy-ctxmark-only">
<title><command>unfuzzy-ctxmark-only</command></title>

<para><command>unfuzzy-ctxmark-only</command> has a similar but narrower effect than the <command>unfuzzy-context-only</command> sieve. It unfuzzies a message only if the change that caused fuzziness is confined to a specific part of the <varname>msgctxt</varname> string, the <emphasis>UI context marker</emphasis>.</para>

<para>UI context markers are an element of <ulink url="http://techbase.kde.org/Development/Tutorials/Localization/i18n_Semantics">KUIT markup</ulink> (KDE user interface text). They state more formally the user interface context in which the text given by the PO message is used. This may be important for translation, since style guidelines will typically depend to some extent on where in the UI the text is seen. For example, there may be two messages in the code which have exactly the same text in English, but one is used as a menu item, and the other as a dialog title; with KUIT, they would be marked as:
<programlisting language="po">
msgctxt "@action:inmenu File"
msgid "Export as HTML"
msgstr ""

msgctxt "@title:window"
msgid "Export as HTML"
msgstr ""
</programlisting>
The UI context marker here is the leading part of <varname>msgctxt</varname>, starting with <literal>@...</literal> and ending at the first whitespace. <command>unfuzzy-ctxmark-only</command> will unfuzzy the message if only this marker has changed (or was added or removed), but not if the change was in the rest of the context (after the first whitespace).</para>
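
<para>The split between the marker and the rest of the context can be sketched in Python as below. Both functions are hypothetical illustrations of the rule just described, not the sieve's code, and the <literal>@action:intoolbar</literal> marker in the example is only an illustrative value:
<programlisting language="python">
def split_ctxmark(msgctxt):
    """Split msgctxt into (UI context marker, rest of the context)."""
    if not msgctxt or not msgctxt.startswith("@"):
        return "", msgctxt or ""
    parts = msgctxt.split(None, 1)  # split at the first run of whitespace
    marker = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return marker, rest


def only_ctxmark_changed(old_ctxt, new_ctxt):
    """True if the two contexts differ in the marker and nowhere else."""
    old_marker, old_rest = split_ctxmark(old_ctxt)
    new_marker, new_rest = split_ctxmark(new_ctxt)
    return old_marker != new_marker and old_rest == new_rest


print(split_ctxmark("@action:inmenu File"))  # ('@action:inmenu', 'File')
print(only_ctxmark_changed("@action:inmenu File", "@action:intoolbar File"))  # True
print(only_ctxmark_changed("@action:inmenu File", "@action:inmenu Edit"))     # False
</programlisting>
</para>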

<para>Parameters:
<variablelist>

<varlistentry>
<term><option>noreview</option></term>
<listitem>
<para>See the parameter of the same name of <command>unfuzzy-context-only</command>. Using it here is probably somewhat safer, but in general this depends on the translation style guidelines.</para>
</listitem>
</varlistentry>

</variablelist>
</para>

</sect2>

<sect2 id="sv-unfuzzy-inplace-only">
<title><command>unfuzzy-inplace-only</command></title>

<para>Some text markups have a "permissive" or "sloppy" mode, in which some tags do not have to be explicitly terminated. The typical example is HTML, where <literal>&lt;br&gt;</literal>, <literal>&lt;hr&gt;</literal>, etc. do not have to be written as <literal>&lt;br/&gt;</literal>. (This is unlike XHTML, which is an XML instance and therefore strict in this respect.) When such permissive markup has been used in the code, a programmer revisiting that code at a later time may consider it poor style, and go about fixing it. This may cause some messages in the PO file to become fuzzy. <command>unfuzzy-inplace-only</command> will recognize some of these situations in a fuzzy message (by comparing the current and previous strings), automatically modify the translation accordingly, and unfuzzy the message.</para>
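
<para>As an illustration, the Python sketch below handles only <literal>&lt;br&gt;</literal> and <literal>&lt;hr&gt;</literal>. Which tags and situations the sieve actually recognizes is not stated here, so the tag set and both function names are assumptions, and the translation in the example is a placeholder:
<programlisting language="python"><![CDATA[
import re

# Unterminated tags handled by this sketch (an illustrative subset).
_OPEN_ONLY = re.compile(r"<(br|hr)>")


def close_inplace_tags(text):
    """Rewrite <br> and <hr> as <br/> and <hr/>."""
    return _OPEN_ONLY.sub(r"<\1/>", text)


def unfuzzy_inplace(msgid_previous, msgid, msgstr):
    """Return the adapted translation, or None if the message must stay fuzzy."""
    if msgid_previous != msgid and close_inplace_tags(msgid_previous) == msgid:
        # The only change in the original was tag termination:
        # apply the same change to the translation.
        return close_inplace_tags(msgstr)
    return None


print(unfuzzy_inplace("One<br>Two", "One<br/>Two", "T1<br>T2"))    # T1<br/>T2
print(unfuzzy_inplace("One<br>Two", "One<br/>Three", "T1<br>T2"))  # None
]]></programlisting>
</para>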

<para>There are no parameters.</para>

</sect2>

<sect2 id="sv-unfuzzy-qtclass-only">
<title><command>unfuzzy-qtclass-only</command></title>

<para>PO messages obtained by conversion from Qt Linguist translation files can contain in the <varname>msgctxt</varname> an automatically extracted C++ class name, referring to the class where the message is located in the code. In the following two example messages, the C++ class name is the text before the <literal>|</literal> character:
<programlisting language="po">
#: ui/configdialog.cpp:50
msgctxt "Sonnet::ConfigDialog|"
msgid "Spell Checking Configuration"
msgstr ""

#: core/loader.cpp:206
#, qt-format
msgctxt "Sonnet::Loader|%1 = language name, %2 = country name"
msgid "%1 (%2)"
msgstr ""
</programlisting>
If the programmer later changes a class name in the code, all messages inside that class will become fuzzy. The <command>unfuzzy-qtclass-only</command> sieve can be used to unfuzzy such messages, by verifying that the only difference between the old and the new message is in the part of <varname>msgctxt</varname> before the <literal>|</literal> character. For this to work, the PO file must have been merged with the <option>--previous</option> option to <command>msgmerge</command>.</para>
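
<para>The verification can be pictured with the Python sketch below. It is a hedged illustration, not the sieve's code: messages are represented as plain dictionaries, both function names are hypothetical, and the renamed class <literal>Sonnet::PluginLoader</literal> is invented for the example:
<programlisting language="python">
def strip_qtclass(msgctxt):
    """Return the part of msgctxt after the first '|' (class name dropped)."""
    head, sep, tail = msgctxt.partition("|")
    return tail if sep else msgctxt


def only_qtclass_changed(old, new):
    """True if old and new message differ only in the class-name part."""
    same_text = old["msgid"] == new["msgid"]
    same_rest = strip_qtclass(old["msgctxt"]) == strip_qtclass(new["msgctxt"])
    return same_text and same_rest and old["msgctxt"] != new["msgctxt"]


old = {"msgctxt": "Sonnet::Loader|%1 = language name, %2 = country name",
       "msgid": "%1 (%2)"}
new = {"msgctxt": "Sonnet::PluginLoader|%1 = language name, %2 = country name",
       "msgid": "%1 (%2)"}
print(only_qtclass_changed(old, new))  # True
</programlisting>
</para>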

<para>There are no parameters.</para>

</sect2>

<sect2 id="sv-update-header">
<title><command>update-header</command></title>

<para>When translation of a PO file starts for the first time, or when a previously translated PO file is being updated after merging, <command>update-header</command> can be used to automatically set and update PO header fields to proper values. The revision date is taken as current, while other pieces of information are read from the user configuration (see <xref linkend="sec-cmconfig"/>). Note that this sieve is normally only of use when you are translating with a plain text editor, while dedicated PO editors should do this automatically when the PO file is saved after editing.</para>

<para>Parameters:
<variablelist>

<varlistentry>
<term><option>proj:<replaceable>projectid</replaceable></option></term>
<listitem>
<para>The ID of the project to which the PO files to be updated belong. This ID is used to construct the name of the configuration section as <literal>[project-<replaceable>projectid</replaceable>]</literal>, which contains the project data fields. Also used are the fields from the <literal>[user]</literal> section, whenever they are not overridden in the project's section. See <xref linkend="sec-cmcfguser"/> and <xref linkend="sec-cmcfgproj"/>.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>init</option></term>
<listitem>
<para>By default, the sieve tries to detect whether the header has already been initialized, because what should be changed in the header differs somewhat between initialization and update. This parameter can be issued to unconditionally treat the header as not initialized, i.e. overwrite any existing content.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>onmod</option></term>
<listitem>
<para>The header should be updated only if the PO file was otherwise modified. This parameter makes sense only in a sieve chain, when this sieve is preceded by a potentially modifying sieve.</para>
</listitem>
</varlistentry>

</variablelist>
</para>

<para>An example of a user configuration appropriate for this sieve would be:
<programlisting language="ini">
[user]
name = Chusslove Illich
original-name = Часлав Илић
email = caslav.ilic@gmx.net
po-editor = Kate

[project-kde]
language = sr
language-team = Serbian
team-email = kde-i18n-sr@kde.org
plural-forms = nplurals=4; plural=n==1 ? 3 : n%%10==1 &amp;&amp; \
               n%%100!=11 ? 0 : n%%10>=2 &amp;&amp; n%%10&lt;=4 &amp;&amp; \
               (n%%100&lt;10 || n%%100>=20) ? 1 : 2;
</programlisting>
Note that percent characters in the <literal>plural-forms</literal> field are escaped by doubling, because a single <literal>%</literal> in the configuration has special meaning. Also note that the field is split into several lines by trailing <literal>\</literal> (only for better looks, since configuration lines can be arbitrarily long).</para>
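
<para>The effect of doubling the percent characters can be tried out with Python's standard <literal>configparser</literal> module, whose default interpolation treats <literal>%</literal> specially in the same way. That Pology's configuration behaves identically is an assumption of this sketch; the value is also kept on a single line, to avoid relying on how trailing <literal>\</literal> is processed:
<programlisting language="python"><![CDATA[
import configparser

CONFIG = """\
[project-kde]
language = sr
plural-forms = nplurals=4; plural=n==1 ? 3 : n%%10==1 && n%%100!=11 ? 0 : n%%10>=2 && n%%10<=4 && (n%%100<10 || n%%100>=20) ? 1 : 2;
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# Each doubled %% in the file is read back as a single %.
plural_forms = parser.get("project-kde", "plural-forms")
print(plural_forms)
]]></programlisting>
</para>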

</sect2>

<sect2 id="sv-fr:setUbsp">
<title><command>fr:setUbsp</command></title>

<para>In French, some punctuation characters are separated from the preceding word by an unbreakable space. This is unlike English, so unwary French translators sometimes forget to add the required unbreakable space after or before such punctuation when translating from English. <command>fr:setUbsp</command> will heuristically detect such places and insert an unbreakable space.</para>
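
<para>A deliberately naive Python sketch of such a heuristic is shown below. It is not the rule set of <command>fr:setUbsp</command>: the function name is hypothetical, only the characters <literal>; : ! ?</literal> are handled, and only the space before them is set:
<programlisting language="python">
import re

NBSP = "\u00a0"  # unbreakable (no-break) space

# A letter, an optional ordinary space, then punctuation that is followed
# by whitespace or the end of the text (so URLs and times are left alone).
_BEFORE_PUNCT = re.compile(r"([^\W\d_]) ?([;:!?]+)(?=\s|$)")


def set_ubsp(text):
    """Put an unbreakable space between a word and ; : ! ? that follow it."""
    return _BEFORE_PUNCT.sub(r"\1" + NBSP + r"\2", text)


print(set_ubsp("Vraiment ?") == "Vraiment" + NBSP + "?")  # True
</programlisting>
</para>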

<para>There are no parameters.</para>

</sect2>

<sect2 id="sv-fr:setApostrophe">
<title><command>fr:setApostrophe</command></title>

<para>In French, the <literal>’</literal> character serves as the apostrophe. The rule in the French team, however, is to use <literal>'</literal> directly. <command>fr:setApostrophe</command> will detect occurrences of <literal>’</literal> and transform them to <literal>'</literal>.</para>

<para>There are no parameters.</para>

</sect2>

<sect2 id="sv-ru:fill-doc-date-kde">
<title><command>ru:fill-doc-date-kde</command></title>

<para>Each translation file for DocBook documentation in KDE has a string giving the date of the last documentation update, in the format 'yyyy-mm-dd'. This sieve automatically translates those strings into Russian. The sieve uses the <command>date</command> command to change the date formatting, but the Russian names of months are hardcoded, so you do not need to set up a Russian locale to use the sieve.</para>
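
<para>The idea of hardcoded month names can be illustrated in Python as follows. This sketch does not call the <command>date</command> command, the function name is hypothetical, and the output format is an assumption, since the exact format produced by the sieve is not specified here:
<programlisting language="python">
# Russian month names in the genitive case, as used in dates.
RU_MONTHS = [
    "января", "февраля", "марта", "апреля", "мая", "июня",
    "июля", "августа", "сентября", "октября", "ноября", "декабря",
]


def ru_doc_date(iso_date):
    """Turn 'yyyy-mm-dd' into a Russian date such as '9 мая 2010 г.'."""
    year, month, day = (int(part) for part in iso_date.split("-"))
    return "{} {} {} г.".format(day, RU_MONTHS[month - 1], year)


print(ru_doc_date("2010-05-09"))  # 9 мая 2010 г.
</programlisting>
</para>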

<para>There are no parameters.</para>

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-svexternal">
<title>Using External Sieves</title>

<para>Each internal sieve is a single Python file in the <filename>sieve/</filename> subdirectory (and in <filename>lang/<replaceable>langcode</replaceable>/sieve/</filename> for language-specific sieves). The Python file is named like the sieve, only with hyphens replaced with underscores and with the <filename>.py</filename> extension. <command>posieve</command> therefore knows how to find which file to execute when an internal sieve name is given as its first argument.</para>
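
<para>The naming rule can be written down as a small Python function. This is only a sketch of the mapping described above: <function>internal_sieve_path</function> and its <literal>root</literal> argument are hypothetical, and <command>posieve</command> itself may locate the files differently:
<programlisting language="python">
import os


def internal_sieve_path(name, root="."):
    """Map a sieve name such as 'unfuzzy-context-only' or 'fr:setUbsp' to a file."""
    # A language-specific sieve is written as langcode:name.
    lang, sep, base = name.rpartition(":")
    filename = base.replace("-", "_") + ".py"
    if sep:
        return os.path.join(root, "lang", lang, "sieve", filename)
    return os.path.join(root, "sieve", filename)


print(internal_sieve_path("unfuzzy-context-only"))
print(internal_sieve_path("ru:fill-doc-date-kde"))
</programlisting>
</para>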

<para>However, instead of an internal sieve name, the first argument to <command>posieve</command> can also be an explicit path (relative or absolute) to a Python file which implements a sieve. Explicit paths can also be part of a sieve chain, mixed with internal sieve names. This is all there is to running external sieves; see <xref linkend="sec-prsieves"/> for instructions on how to write one.</para>

</sect1>

</chapter>