0001 <?xml version="1.0" encoding="UTF-8"?>
0002 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
0003  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
0004 [
0005     <!ENTITY apibase "../../api/en_US">
0006     <!ENTITY ap "&apibase;/pology.">
0007     <!ENTITY am "-module.html#">
0008 ]>
0009 
0010 <chapter id="ch-common">
0011 <title>Common Functionality</title>
0012 
0013 <para>Different parts of Pology provide common functionality, such as thematic groups of options to scripts, file selection patterns, reliance on PO metadata, etc. This chapter describes such common functionality.</para>
0014 
0015 <!-- ======================================== -->
0016 <sect1 id="sec-cmshellcomp">
0017 <title>Shell Completion</title>
0018 
0019 <para>Shell completion means that, just as with command names, you can contextually complete command parameters by pressing the <keycap>Tab</keycap> key. This lets you type command lines efficiently, and quickly remind yourself of options and option parameters without resorting to documentation or browsing the file system.</para>
0020 
0021 <para>For example, pressing <keycap>Tab</keycap> just after <link linkend="ch-sieve">the <command>posieve</command> command</link> will complete sieve names, and <keycap>Tab</keycap> after the <option>-s</option> option will complete sieve parameters based on sieves that precede it in the command line. This:
0022 <programlisting language="bash">
0023 $ posieve s&lt;TAB&gt;
0024 </programlisting>
0025 will show all sieves beginning with <literal>s</literal>, and complete the sieve name once enough characters have been entered to determine it uniquely, while this:
0026 <programlisting language="bash">
0027 $ posieve stats -s m&lt;TAB&gt;
0028 </programlisting>
0029 will show all parameters to <command>stats</command> beginning with <literal>m</literal>, and complete one of them after a few more characters are typed in.</para>
0030 
0031 </sect1>
0032 
0033 <!-- ======================================== -->
0034 <sect1 id="sec-cmconfig">
0035 <title>User Configuration</title>
0036 
0037 <para>Various parts of Pology can be configured through the configuration file <filename>.pologyrc</filename> in the root of the user's home directory (<filename>~/.pologyrc</filename> for short). The configuration file does not have to exist, so you must create it yourself when you first want to configure something. It must be UTF-8 encoded.</para>
0038 
0039 <para>The configuration file is in the <ulink url="http://en.wikipedia.org/wiki/INI_file">INI</ulink> format, which is composed of sections beginning with a <literal>[<replaceable>section</replaceable>]</literal> line, and fields of the form <literal><replaceable>field</replaceable> = <replaceable>value</replaceable></literal> within a section. Comments can be written after a <literal>#</literal> character at the beginning of the line. Here is an example of a <filename>~/.pologyrc</filename> file:
0040 <programlisting language="ini">
0041 [global]
0042 
0043 [user]
0044 name = Chusslove Illich
0045 original-name = Часлав Илић
0046 email = caslav.ilic@gmx.net
0047 po-editor = Kate
0048 
0049 [enchant]
0050 # Autodetection sufficient.
0051 
0052 [posieve]
0053 msgfmt-check = yes
0054 param-ondiff/stats = yes
0055 
0056 # Project setups follow.
0057 
0058 [project-kde]
0059 language = sr
0060 language-team = Serbian
0061 team-email = kde-i18n-sr@kde.org
0062 plural-forms = nplurals=4; plural=n==1 ? ...
0063 </programlisting>
0064 This configuration contains five sections: <literal>[global]</literal>, <literal>[user]</literal>, <literal>[enchant]</literal>, <literal>[posieve]</literal>, and <literal>[project-kde]</literal>. The <literal>[global]</literal> section sets options that have an effect throughout Pology, and here it is empty. The <literal>[user]</literal> section provides some information on the person who uses Pology. The <literal>[enchant]</literal> section configures the Enchant spell checker wrapper, used by Pology for spell checking. The <literal>[posieve]</literal> section configures the behavior of the <link linkend="ch-sieve"><command>posieve</command></link> script. The <literal>[project-kde]</literal> section provides information on a project that the user contributes translations to.</para>
0065 
0066 <para>Some details about the configuration file syntax are as follows. Leading and trailing whitespace in section and field names and values is not significant, e.g. <literal>foo=bar</literal> is the same as <literal>foo = bar</literal>. The percent character (<literal>%</literal>) is used to expand the value of another field, for example:
0067 <programlisting language="ini">
0068 rootdir = /path/to/somewhere
0069 datadir = %(rootdir)s/data
0070 </programlisting>
0071 where the <literal>%(...)s</literal> is Python's string interpolation syntax. Importantly, when you need a literal <literal>%</literal> character within a value (such as in the <literal>plural-forms</literal> field in the previous example), you must double it, <literal>%%</literal>. Switch-type fields (<literal>msgfmt-check</literal> in the previous example) can take any of the following values for the two states: <literal>0</literal>, <literal>no</literal>, <literal>false</literal>, or <literal>off</literal>; and <literal>1</literal>, <literal>yes</literal>, <literal>true</literal>, or <literal>on</literal> (case is not important).</para>
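<para>These syntax rules resemble those of the <literal>configparser</literal> module in the Python 3 standard library, so they can be tried out with a short script. This is only a sketch under that assumption: Pology's own configuration reader may differ in details, and the <literal>[paths]</literal> section and its fields are invented for the demonstration:
<programlisting language="python">
import configparser

# Hypothetical configuration text; [paths] is not a real Pology section.
SAMPLE = """
[paths]
rootdir = /path/to/somewhere
datadir = %(rootdir)s/data
# A literal percent sign must be doubled.
label = 100%% done

[posieve]
msgfmt-check = Yes
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# %(rootdir)s is expanded from the field in the same section.
datadir = config.get("paths", "datadir")
# The doubled %% comes out as a single % character.
label = config.get("paths", "label")
# Switch values are matched regardless of case.
msgfmt_check = config.getboolean("posieve", "msgfmt-check")

print(datadir)
print(label)
print(msgfmt_check)
</programlisting>
With the standard parser, this prints <literal>/path/to/somewhere/data</literal>, <literal>100% done</literal>, and <literal>True</literal>.</para>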
0072 
0073 <para>Sections in the configuration can be of one of four general types:
0074 <itemizedlist>
0075 <listitem>
0076 <para>General sections, which provide information used by various parts of Pology as they need it. The <literal>[global]</literal> and <literal>[user]</literal> sections from the previous example are general sections.</para>
0077 </listitem>
0078 <listitem>
0079 <para>External tool sections, which are used to configure external libraries and programs used within Pology. The <literal>[enchant]</literal> section from the previous example is of this type.</para>
0080 </listitem>
0081 <listitem>
0082 <para>Internal tool sections, which configure the behavior of Pology's own scripts. This is the <literal>[posieve]</literal> section from the previous example.</para>
0083 </listitem>
0084 <listitem>
0085 <para>Project sections, which provide information related to particular translation projects that the user is contributing to. Names of these sections always start with <literal>project-</literal>, such as <literal>[project-kde]</literal> from the previous example.</para>
0086 </listitem>
0087 </itemizedlist>
0088 Internal tool sections are documented together with the respective tools, while sections of other types are described in the following.</para>
0089 
0090 <para>When mentioning configuration fields in their documentation and elsewhere, they are referred to as <literal>[<replaceable>section</replaceable>]/<replaceable>field</replaceable></literal>. If there is only a fixed number of possible values to a field, this is denoted as <literal>[<replaceable>section</replaceable>]/<replaceable>field</replaceable>=[<replaceable>VALUE1</replaceable>|<replaceable>VALUE2</replaceable>|<replaceable>VALUE3</replaceable>|...]</literal>; if one of the values is the default, it is prefixed with a star (*).</para>
0091 
0092 <sect2 id="sec-cmcfgglobal">
0093 <title>The <literal>[global]</literal> section</title>
0094 
0095 <para>The <literal>[global]</literal> section contains options which can have effect on various otherwise unrelated parts of Pology.</para>
0096 
0097 <para>Known configuration fields are as follows:
0098 <variablelist>
0099 
0100 <varlistentry>
0101 <term><literal>[global]/show-backtrace=[yes|*no]</literal></term>
0102 <listitem>
0103 <para>When a Pology command stops execution with an error, by default only the error message is shown. However, for reporting problems and debugging, it is much better to get a <emphasis>backtrace</emphasis> instead. Backtraces can be activated by this option.</para>
0104 
0105 <para><emphasis>Whenever you want to report a problem where a Pology command aborts with an error, make sure to activate this option and submit the full backtrace.</emphasis></para>
0106 </listitem>
0107 </varlistentry>
0108 
0109 </variablelist>
0110 </para>
0111 
0112 </sect2>
0113 
0114 <sect2 id="sec-cmcfguser">
0115 <title>The <literal>[user]</literal> section</title>
0116 
0117 <para>Many parts of Pology can take advantage of information about you and the tools you use. This information is given in the <literal>[user]</literal> section. For example, when initializing a PO file from a template, your name and email address can be filled into the PO header, or a PO file can be opened in the translation editor that you use (if it is supported).</para>
0118 
0119 <para>Known configuration fields are as follows:
0120 <variablelist>
0121 
0122 <varlistentry>
0123 <term><literal>[user]/name</literal></term>
0124 <listitem>
0125 <para>Your name if it is written in Latin script, or the romanized equivalent of your name. The intention is that it is readable (or semi-readable) to people from various places in the world, who would use it to contact you if necessary.</para>
0126 </listitem>
0127 </varlistentry>
0128 
0129 <varlistentry>
0130 <term><literal>[user]/original-name</literal></term>
0131 <listitem>
0132 <para>This is your name in your native language and script, whatever it may be. If it would be the same as the name in the <literal>[user]/name</literal> field, setting this field is not necessary.</para>
0133 </listitem>
0134 </varlistentry>
0135 
0136 <varlistentry>
0137 <term><literal>[user]/email</literal></term>
0138 <listitem>
0139 <para>Your email address.</para>
0140 </listitem>
0141 </varlistentry>
0142 
0143 <varlistentry>
0144 <term><literal>[user]/language</literal></term>
0145 <listitem>
0146 <para>The language code of the language you translate into. If by any chance you translate into several languages, this field can be overridden in per-project configuration sections.</para>
0147 </listitem>
0148 </varlistentry>
0149 
0150 <varlistentry>
0151 <term><literal>[user]/encoding</literal></term>
0152 <listitem>
0153 <para>The encoding of the PO files you work on. Nowadays this should really, really be UTF-8. If it is not UTF-8 for everything that you work on, you can override it in per-project configuration sections.</para>
0154 </listitem>
0155 </varlistentry>
0156 
0157 <varlistentry>
0158 <term><literal>[user]/plural-forms</literal></term>
0159 <listitem>
0160 <para>The value for the <link linkend="sec-poplurals"><literal>Plural-Forms</literal> PO header field</link> used for your language. If it differs between projects, you can override the value set here in per-project configuration sections.</para>
0161 </listitem>
0162 </varlistentry>
0163 
0164 <varlistentry>
0165 <term><literal>[user]/po-editor</literal></term>
0166 <listitem>
0167 <para>The human-readable name of the editor with which you translate (it does not have to be a dedicated PO editor). This is used in contexts where your editor preference is announced, such as through the <literal>X-Generator</literal> PO header field.</para>
0168 </listitem>
0169 </varlistentry>
0170 
0171 <varlistentry>
0172 <term><literal>[user]/po-editor-id=[lokalize]</literal></term>
0173 <listitem>
0174 <para>The keyword under which the PO editor that you use is known to Pology. For the moment, only Lokalize is supported. This is used when a Pology tool is told to open PO files on the messages it matched.</para>
0175 </listitem>
0176 </varlistentry>
0177 
0178 </variablelist>
0179 </para>
0180 
0181 </sect2>
0182 
0183 <sect2 id="sec-cmcfgenchant">
0184 <title>The <literal>[enchant]</literal> section</title>
0185 
0186 <para>This section configures <ulink url="http://www.abisource.com/projects/enchant/">Enchant</ulink>, a wrapper library for spell checking, which is used for Pology's <link linkend="sec-lgspell">spell checking functionality</link>. Through Enchant it is possible to use various spell checkers, such as Aspell, Ispell, Hunspell, etc. in a uniform way.</para>
0187 
0188 <para>Known configuration fields are as follows:
0189 <variablelist>
0190 
0191 <varlistentry>
0192 <term><literal>[enchant]/provider=[aspell|ispell|myspell|...]</literal></term>
0193 <listitem>
0194 <para>The keyword denoting the spell checker that Enchant should use. It can also be a comma-separated list of several keywords, in which case Enchant will use the first available spell checker in the list. You can find the up-to-date list of all known provider keywords in the <command>enchant(1)</command> man page, and run the <command>enchant-lsmod</command> command to see exactly which of those are recognized as available on the system.</para>
0195 </listitem>
0196 </varlistentry>
0197 
0198 <varlistentry>
0199 <term><literal>[enchant]/language</literal></term>
0200 <listitem>
0201 <para>The spell checking dictionary that should be used, by language code. This value is used only if the language is not specified in any other way, such as in the PO header or through the command line.</para>
0202 </listitem>
0203 </varlistentry>
0204 
0205 <varlistentry>
0206 <term><literal>[enchant]/environment</literal></term>
0207 <listitem>
0208 <para>The sub-language environment for spell checking. This is related to Pology's internal spelling dictionary supplements, see the <link linkend="sec-lgspell">section on spell checking</link>. This value is used only if the environment is not specified in any other way, such as in the PO header or through the command line.</para>
0209 </listitem>
0210 </varlistentry>
0211 
0212 </variablelist>
0213 </para>
0214 
0215 </sect2>
0216 
0217 <sect2 id="sec-cmcfgaspell">
0218 <title>The <literal>[aspell]</literal> section</title>
0219 
0220 <para>At first Pology used <ulink url="http://aspell.net/">Aspell</ulink> for spell checking, before Enchant was introduced. Direct support for Aspell was nevertheless kept, due to some specifics that the Enchant wrapper does not support yet. (This means that you should prefer Enchant if it satisfies your needs.)</para>
0221 
0222 <para>Known configuration fields are as follows:
0223 <variablelist>
0224 
0225 <varlistentry>
0226 <term><literal>[aspell]/language</literal></term>
0227 <listitem>
0228 <para>See <literal>[enchant]/language</literal>.</para>
0229 </listitem>
0230 </varlistentry>
0231 
0232 <varlistentry>
0233 <term><literal>[aspell]/encoding</literal></term>
0234 <listitem>
0235 <para>Encoding for the text sent to Aspell.</para>
0236 </listitem>
0237 </varlistentry>
0238 
0239 <varlistentry>
0240 <term><literal>[aspell]/variety</literal></term>
0241 <listitem>
0242 <para>The sub-language variety of the Aspell spelling dictionary.</para>
0243 </listitem>
0244 </varlistentry>
0245 
0246 <varlistentry>
0247 <term><literal>[aspell]/environment</literal></term>
0248 <listitem>
0249 <para>See <literal>[enchant]/environment</literal>.</para>
0250 </listitem>
0251 </varlistentry>
0252 
0253 <varlistentry>
0254 <term><literal>[aspell]/supplements-only=[yes|*no]</literal></term>
0255 <listitem>
0256 <para>Whether to ignore the system spelling dictionary and use only Pology's internal dictionary supplements.</para>
0257 </listitem>
0258 </varlistentry>
0259 
0260 <varlistentry>
0261 <term><literal>[aspell]/simple-split=[yes|*no]</literal></term>
0262 <listitem>
0263 <para>By default, Pology splits the text into words in a clever fashion (eliminating text markup, format directives, etc.) before sending them to the spell checker. Sometimes this leads to bad results, and then this field can be set to <literal>yes</literal> to split text simply on whitespace (possibly, in the given context, in combination with <link linkend="sec-cmhooks">a pre-filtering hook</link> on the text).</para>
0264 </listitem>
0265 </varlistentry>
0266 
0267 </variablelist>
0268 </para>
0269 
0270 </sect2>
0271 
0272 <sect2 id="sec-cmcfgproj">
0273 <title>Per-project sections (<literal>[project-*]</literal>)</title>
0274 
0275 <para>You will easily find yourself in a situation where you need to translate and maintain translated material within different projects, each with its own set of rules and conventions. Pology is designed to support project switching extensively, and one element of that is per-project configuration sections.</para>
0276 
0277 <para>A project configuration section has the name <literal>[project-<replaceable>PKEY</replaceable>]</literal>, where <replaceable>PKEY</replaceable> is the project keyword. You can choose the project keyword freely, but it should contain only ASCII letters, digits, underscores and hyphens. Project configuration fields frequently have fallbacks to fields in other configuration sections. This means that when the project field is not set, its corresponding field in that other (more general) section gets used instead. In the following, this is the case whenever you are instructed to see a field in another section.</para>
0278 
0279 <para>Per-project configuration fields are as follows:
0280 <variablelist>
0281 
0282 <varlistentry>
0283 <term><literal>[project-*]/name</literal></term>
0284 <listitem>
0285 <para>See <literal>[user]/name</literal>.</para>
0286 </listitem>
0287 </varlistentry>
0288 
0289 <varlistentry>
0290 <term><literal>[project-*]/original-name</literal></term>
0291 <listitem>
0292 <para>See <literal>[user]/original-name</literal>.</para>
0293 </listitem>
0294 </varlistentry>
0295 
0296 <varlistentry>
0297 <term><literal>[project-*]/email</literal></term>
0298 <listitem>
0299 <para>See <literal>[user]/email</literal>.</para>
0300 </listitem>
0301 </varlistentry>
0302 
0303 <varlistentry>
0304 <term><literal>[project-*]/language</literal></term>
0305 <listitem>
0306 <para>See <literal>[user]/language</literal>.</para>
0307 </listitem>
0308 </varlistentry>
0309 
0310 <varlistentry>
0311 <term><literal>[project-*]/language-team</literal></term>
0312 <listitem>
0313 <para>This is the name of the team which translates this project into the given language. Since usually there is only one translation team per language in a project, the value of this field is just the human-readable name of the language (as opposed to the language code) in English.</para>
0314 </listitem>
0315 </varlistentry>
0316 
0317 <varlistentry>
0318 <term><literal>[project-*]/team-email</literal></term>
0319 <listitem>
0320 <para>The email address for communication with the translation team as a whole (usually the team's mailing list).</para>
0321 </listitem>
0322 </varlistentry>
0323 
0324 <varlistentry>
0325 <term><literal>[project-*]/encoding</literal></term>
0326 <listitem>
0327 <para>See <literal>[user]/encoding</literal>.</para>
0328 </listitem>
0329 </varlistentry>
0330 
0331 <varlistentry>
0332 <term><literal>[project-*]/plural-forms</literal></term>
0333 <listitem>
0334 <para>See <literal>[user]/plural-forms</literal>.</para>
0335 </listitem>
0336 </varlistentry>
0337 
0338 </variablelist>
0339 </para>
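<para>The fallback from a project field to its more general counterpart can be sketched in a few lines of Python. This is an illustration only, not Pology's actual lookup code: it assumes the standard <literal>configparser</literal> module, the helper name <literal>project_field</literal> and the sample values are hypothetical, and it falls back to <literal>[user]</literal> for every field, although only some project fields have such a counterpart:
<programlisting language="python">
import configparser

# Hypothetical configuration text, with made-up values.
SAMPLE = """
[user]
language = sr
encoding = UTF-8

[project-kde]
language = sr@latin
"""

def project_field(config, pkey, field):
    """Return the project field, or the [user] field if the project lacks it."""
    section = "project-" + pkey
    if config.has_option(section, field):
        return config.get(section, field)
    # Not set for the project: use the more general section, if it has it.
    return config.get("user", field, fallback=None)

config = configparser.ConfigParser()
config.read_string(SAMPLE)

print(project_field(config, "kde", "language"))
print(project_field(config, "kde", "encoding"))
</programlisting>
Here the language comes from <literal>[project-kde]</literal>, while the encoding, which the project section does not set, is taken from <literal>[user]</literal>.</para>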
0340 
0341 </sect2>
0342 
0343 </sect1>
0344 
0345 <!-- ======================================== -->
0346 <sect1 id="sec-cmregex">
0347 <title>Regular Expressions</title>
0348 
0349 <para>There are a great many places in Pology where you can supply a matching pattern, to select or deselect something. This could be a PO file by its path, a PO message by its <varname>msgid</varname>, etc. Almost always and by default, this matching pattern will be a <emphasis>regular expression</emphasis> (or <emphasis>regex</emphasis> for short). Regular expressions are a powerful pattern matching language, a fascinating topic in their own right, and they will serve you well in just about any context of searching on computers. The plain text editor that you use probably offers regular expressions in its search dialog, so does your office text processor, and so on.</para>
0350 
0351 <para>Actually, the only point of this brief section is to impress the importance and usefulness of regular expressions onto you, in case you have not used them yet. The Internet is full of tutorials on regular expressions, so there is no point in linking to any particular one here.</para>
0352 
0353 <para>It should be mentioned that different regular expression engines have somewhat different syntax and expressiveness. Pology uses regular expressions from the Python Standard Library, described here: <ulink url="http://docs.python.org/library/re.html">http://docs.python.org/library/re.html</ulink> (keep in mind that this page is a reference, and not a tutorial, so you should look elsewhere to learn the basics of regular expressions).</para>
0354 
0355 </sect1>
0356 
0357 <!-- ======================================== -->
0358 <sect1 id="sec-cmincexc">
0359 <title>Path Inclusion and Exclusion</title>
0360 
0361 <para>Pology scripts that can recursively search directory paths for PO files will usually provide several options by which certain files can be included or excluded from processing. The first pair of these options include or exclude files by path:
0362 <variablelist>
0363 
0364 <varlistentry>
0365 <term><option>-E <replaceable>REGEX</replaceable></option>, <option>--exclude-path=<replaceable>REGEX</replaceable></option></term>
0366 <listitem>
0367 <para>Every file whose path matches the supplied pattern is excluded from processing. This option can be repeated, in which case a file is excluded only if its path matches <emphasis>every</emphasis> pattern. When you want to exclude by <emphasis>any</emphasis> pattern matching the path, you can connect those patterns with the regular expression <literal>|</literal> operator in a single option. This allows you to build up complex exclusion conditions if necessary.</para>
0368 </listitem>
0369 </varlistentry>
0370 
0371 <varlistentry>
0372 <term><option>-I <replaceable>REGEX</replaceable></option>, <option>--include-path=<replaceable>REGEX</replaceable></option></term>
0373 <listitem>
0374 <para>Only those files whose path matches the supplied pattern are included in processing. If the option is repeated, a file is included only if its path matches every pattern.</para>
0375 </listitem>
0376 </varlistentry>
0377 
0378 </variablelist>
0379 </para>
0380 
0381 <para>PO files, especially those used at runtime (as opposed to those used for <link linkend="p-dynstattr">static translation</link>), can frequently be identified well enough by their <emphasis>domain</emphasis> name alone. The domain name is the base name of the installed MO file without the extension, e.g. for <filename>/usr/share/locale/sr/LC_MESSAGES/foobar.mo</filename> the domain name is <literal>foobar</literal>. If, in a given translation project, PO files for a given language are all collected under one top directory of that language, their base names are also formed of domain names.<footnote>
0382 <para>The other frequently encountered file organization is when there is one directory per PO domain, and that directory contains PO files for all languages, named as <filename><replaceable>LANG</replaceable>.po</filename>.</para>
0383 </footnote> When this is the case, it may be more convenient or safer to match PO files by their domain names instead of paths, which is done with the following options:
0384 <variablelist>
0385 
0386 <varlistentry>
0387 <term><option>-e <replaceable>REGEX</replaceable></option>, <option>--exclude-name=<replaceable>REGEX</replaceable></option></term>
0388 <listitem>
0389 <para>Counterpart to <option>-E</option>/<option>--exclude-path</option> which matches by domain name.</para>
0390 </listitem>
0391 </varlistentry>
0392 
0393 <varlistentry>
0394 <term><option>-i <replaceable>REGEX</replaceable></option>, <option>--include-name=<replaceable>REGEX</replaceable></option></term>
0395 <listitem>
0396 <para>Counterpart to <option>-I</option>/<option>--include-path</option> which matches by domain name.</para>
0397 </listitem>
0398 </varlistentry>
0399 
0400 </variablelist>
0401 </para>
0402 
0403 <para>All inclusion and exclusion options can be freely mixed and repeated, and they are resolved together as follows. A file is processed if it matches all inclusion patterns (if any are given) and fails to match at least one exclusion pattern (if any are given). Conversely, a file is not processed if it fails to match at least one inclusion pattern (if any are given) or it matches all exclusion patterns (if any are given).</para>
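<para>The resolution rule can be sketched as a small Python function. This is an illustration of the rule as stated here, not Pology's actual code: the function name <literal>is_processed</literal> and its parameters are hypothetical, and it assumes that patterns are matched anywhere in the path or domain name (<literal>re.search</literal>), which this section does not specify:
<programlisting language="python">
import os
import re

def is_processed(path, inc_path=(), exc_path=(), inc_name=(), exc_name=()):
    """Decide whether a file is processed, given lists of regexes."""
    # Domain name: base name of the file without the extension.
    name = os.path.splitext(os.path.basename(path))[0]
    inc = [re.search(rx, path) for rx in inc_path]
    inc += [re.search(rx, name) for rx in inc_name]
    exc = [re.search(rx, path) for rx in exc_path]
    exc += [re.search(rx, name) for rx in exc_name]
    # Included if every inclusion pattern matches (or none was given).
    included = all(inc)
    # Excluded only if exclusions were given and every one of them matches.
    excluded = bool(exc) and all(exc)
    return included and not excluded

print(is_processed("sr/xray/alpha.po", inc_path=["xray/"]))
print(is_processed("sr/xray/alpha.po", exc_name=["alpha", "bravo"]))
print(is_processed("sr/xray/alpha.po", exc_name=["alpha|bravo"]))
</programlisting>
The three calls print <literal>True</literal>, <literal>True</literal>, and <literal>False</literal>: two separate exclusion patterns must both match to exclude the file, whereas a single pattern using the <literal>|</literal> operator excludes it when either alternative matches.</para>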
0404 
0405 </sect1>
0406 
0407 <!-- ======================================== -->
0408 <sect1 id="sec-cmffrom">
0409 <title>Reading Paths From a File</title>
0410 
0411 <para>Sometimes it is convenient to make a temporary or semi-permanent grouping of files, such that the file group can be referenced through a single argument instead of repeating all the files all the time. This is particularly useful when shell piping is not applicable or not comfortable enough. The classic and simple way to group files is by having a file-list file, which contains one file path per line, and which a shell command can read to collect files to process.</para>
0412 
0413 <para>Many Pology scripts can write and read file-list files. Having scripts write such files automatically is simple enough: just check the given script's documentation to see if it has this capability (e.g. the <option>-m</option> option to <command>posieve</command>). More interesting are the special features that you can use when writing a file-list file manually. You would do this for standing categories which are periodically updated, such as a list of PO files ready for release.</para>
0414 
0415 <para>For completeness, here is first an example of a basic file-list file:
0416 <programlisting>
0417 xray/alpha.po
0418 xray/bravo.po
0419 yankee/charlie.po
0420 yankee/delta.po
0421 </programlisting>
0422 </para>
0423 
0424 <para>As is usual for path arguments to Pology scripts, you can specify both file and directory paths, and directory paths will be searched recursively for PO files (or whatever the file type that the script is processing):
0425 <programlisting>
0426 xray/
0427 yankee/
0428 zulu/echo.po
0429 zulu/foxtrot.po
0430 </programlisting>
0431 </para>
0432 
0433 <para>You can add comments by starting the line with hash (<literal>#</literal>), and have empty lines:
0434 <programlisting>
0435 # Translations ready for release.
0436 
0437 # Full modules.
0438 xray/
0439 yankee/
0440 
0441 # Specific files.
0442 zulu/echo.po
0443 zulu/foxtrot.po
0444 </programlisting>
0445 </para>
0446 
0447 <para>The inclusion-exclusion functionality equivalent to <link linkend="sec-cmincexc">inclusion-exclusion command line options</link> is provided through inclusion-exclusion directives. They are specified by starting the line with a colon (<literal>:</literal>), followed by a directive type token, followed by a regular expression. The directives are:
0448 <itemizedlist>
0449 <listitem>
0450 <para><literal>:/-<replaceable>REGEX</replaceable></literal> to exclude files by path,</para>
0451 </listitem>
0452 <listitem>
0453 <para><literal>:/+<replaceable>REGEX</replaceable></literal> to include files by path,</para>
0454 </listitem>
0455 <listitem>
0456 <para><literal>:-<replaceable>REGEX</replaceable></literal> to exclude files by base name without extension, and</para>
0457 </listitem>
0458 <listitem>
0459 <para><literal>:+<replaceable>REGEX</replaceable></literal> to include files by base name without extension.</para>
0460 </listitem>
0461 </itemizedlist>
0462 For example, if a whole module should be processed but for one PO file in it, it is easier to list the whole module and exclude that one file, as compared to listing all other files:
0463 <programlisting>
0464 # Modules.
0465 xray/
0466 yankee/
0467 # Exclude november.po (in whichever module it is).
0468 :-november
0469 </programlisting>
0470 The ordering and position of include-exclude directives are not significant, as they are all applied to all collected files. The semantics of applying multiple directives are the same as those of the <link linkend="sec-cmincexc">counterpart command line options</link>.</para>
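<para>To make the file-list format concrete, here is a minimal Python sketch of a reader for it. It is not Pology's own parser: the function name <literal>parse_file_list</literal> and the returned structure are hypothetical, and stripping whitespace around each line is an assumption:
<programlisting language="python">
def parse_file_list(lines):
    """Split file-list lines into paths and inclusion-exclusion regexes."""
    paths = []
    rules = {"exc_path": [], "inc_path": [], "exc_name": [], "inc_name": []}
    prefixes = [(":/-", "exc_path"), (":/+", "inc_path"),
                (":-", "exc_name"), (":+", "inc_name")]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        for prefix, kind in prefixes:
            if line.startswith(prefix):
                # The rest of the line is the regular expression.
                rules[kind].append(line[len(prefix):])
                break
        else:
            # No directive prefix: an ordinary file or directory path.
            paths.append(line)
    return paths, rules

paths, rules = parse_file_list([
    "# Modules.",
    "xray/",
    "yankee/",
    "",
    ":-november",
])
print(paths)
print(rules["exc_name"])
</programlisting>
For the example file-list above, this collects the paths <literal>xray/</literal> and <literal>yankee/</literal> and a single exclusion by name, <literal>november</literal>. Because directives are gathered separately from paths, their position in the file does not matter.</para>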
0471 
0472 <para>File-list files are normally fed to Pology scripts with the following option:
0473 <variablelist>
0474 
0475 <varlistentry>
0476 <term><option>-f <replaceable>FILE</replaceable></option>, <option>--files-from=<replaceable>FILE</replaceable></option></term>
0477 <listitem>
0478 <para>Read files to process from a file which contains one path per line, or special entries as described above. This option can be repeated to read several file lists. Additional paths to process can still be given as command line arguments. Any inclusion-exclusion options will be applied to the files read from the file as well (in addition to the file's internal inclusion-exclusion directives, if any).</para>
0479 </listitem>
0480 </varlistentry>
0481 
0482 </variablelist>
0483 </para>
0484 
0485 </sect1>
0486 
0487 <!-- ======================================== -->
0488 <sect1 id="sec-cmcolors">
0489 <title>Output Coloring</title>
0490 
0491 <para>In some contexts, Pology scripts color the terminal output for better visual separation and highlighting of important parts of the text. Examples include warning and error messages, data presented as tables and bars, and, importantly, matched segments of the text in search and validation operations.</para>
0492 
0493 <para>Output coloring is turned on by default, but sensitive to output destination: the text is colored if the output goes to the terminal (using terminal escape sequences), but not if it is redirected to a file or piped to another command. Pology scripts provide the following options by which you can influence this behavior:
0494 <variablelist>
0495 
0496 <varlistentry>
0497 <term><option>-R</option>, <option>--raw-colors</option></term>
0498 <listitem>
0499 <para>Disables output destination sensitivity, such that the text is always colored. This is useful when the output is piped to another command which can understand the terminal escape sequences by which colors are produced, such as <command>less(1)</command>. A typical example would be piping search results from <link linkend="sv-find-messages">the <command>find-messages</command> sieve</link> to be able to scroll them back and forth:
0500 <programlisting language="bash">
0501 $ posieve find-messages ... -R | less -R
0502 </programlisting>
0503 The <option>-R</option> of <command>less</command> tells it to interpret escape sequences as colors, rather than showing them as literal text.</para>
0504 </listitem>
0505 </varlistentry>
0506 
0507 <varlistentry>
0508 <term><option>--coloring-type=[none|*term|html]</option></term>
0509 <listitem>
0510 <para>Instead of coloring for the terminal, with this option you can choose another coloring type. <literal>none</literal> disables coloring, <literal>term</literal> is the default, while <literal>html</literal> will produce HTML-tagged text ready for embedding into a web page (e.g. inside a &lt;pre&gt; element). For example, with a little bit of additional scripting, you could use <link linkend="sv-stats">the <command>stats</command> sieve</link> and <literal>html</literal> coloring to periodically update a web page with translation statistics.</para>
0511 </listitem>
0512 </varlistentry>
0513 
0514 </variablelist>
0515 </para>
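<para>The destination sensitivity described in this section can be sketched in Python as follows. This is not Pology's implementation: the function name <literal>colorize</literal> and its parameters are hypothetical, and the sketch assumes plain ANSI escape sequences, with the <literal>raw</literal> argument playing the role of <option>--raw-colors</option>:
<programlisting language="python">
import io
import sys

def colorize(text, code, stream=None, raw=False):
    """Wrap text in an ANSI color escape sequence when appropriate."""
    stream = sys.stdout if stream is None else stream
    # Color only when writing to a terminal, unless raw coloring is forced.
    if raw or stream.isatty():
        return "\033[" + code + "m" + text + "\033[0m"
    return text

buffer = io.StringIO()  # stands in for a file or a pipe
print(repr(colorize("warning", "31", stream=buffer)))
print(repr(colorize("warning", "31", stream=buffer, raw=True)))
</programlisting>
Since the in-memory buffer is not a terminal, the first call returns the text unchanged, while the second wraps it in escape sequences regardless of destination.</para>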
0516 
0517 </sect1>
0518 
0519 <!-- ======================================== -->
0520 <sect1 id="sec-cminttools">
0521 <title>Integration with Other Tools</title>
0522 
0523 <para>One of the general aims of Pology is to fit well with other tools typically found in translation workflows based on PO. Although examples of this can be seen throughout the manual, this section gives an overview of integration for each supported tool.</para>
0524 
0525 <sect2 id="sec-cmsupped">
0526 <title>PO Editors</title>
0527 
0528 <para>When Pology is used to validate the translation, be it through informal but <link linkend="sv-find-messages">precise searches</link> or formal <link linkend="sec-lgrules">validation rules</link>, those translations found to be invalid must be modified (or possibly a special translator comment added to the message to silence a false positive). Pology normally reports the PO file path and the location of the message within the file, so that you can get to it in your preferred PO editor. For greater efficiency, however, Pology can directly open the PO files on problematic messages in some PO editors. Currently these are:
0529 <variablelist>
0530 
0531 <varlistentry>
0532 <term><ulink url="http://userbase.kde.org/Lokalize">Lokalize</ulink></term>
0533 <listitem>
0534 <para>Many <link linkend="ch-sieve">sieves</link>, notably <command>find-messages</command>, <command>check-rules</command>, or <command>check-spell</command>, provide the parameter <option>lokalize</option> to open PO files on reported messages in Lokalize. This means that when run over a collection of PO files, each PO file with at least one reported message will be loaded into one of Lokalize tabs, and only the reported messages will be shown for editing under each tab. A slight catch is that Lokalize must be manually started before a sieve is run, and the Lokalize project which contains all the sieved PO files must be opened; otherwise, simply nothing will happen.</para>
0535 </listitem>
0536 </varlistentry>
0537 
0538 </variablelist>
0539 </para>
0540 
0541 </sect2>
0542 
0543 <sect2 id="sec-cmsuppvcs">
0544 <title>Version Control Systems</title>
0545 
0546 <para>From the viewpoint of translators, PO files are frequently (though not always) handled in the same way as program code, through version control systems (VCS). Pology defines an abstraction of version control functionality, which enables its tools to transparently cooperate with several VCS. Usually it is necessary to tell a Pology tool which VCS is used, which is done by specifying one of the VCS keywords. Currently supported VCS and their keywords are:
0547 <itemizedlist>
0548 
0549 <listitem>
0550 <para><ulink url="http://git-scm.com/">Git</ulink>: <literal>git</literal></para>
0551 </listitem>
0552 
0553 <listitem>
0554 <para><ulink url="http://subversion.tigris.org/">Subversion</ulink>: <literal>svn</literal>, <literal>subversion</literal></para>
0555 </listitem>
0556 
0557 <listitem>
0558 <para>none (when specifying a VCS is required, but none is actually used): <literal>none</literal>, <literal>noop</literal></para>
0559 </listitem>
0560 
0561 </itemizedlist>
0562 </para>
0563 
0564 <para>VCS integration is available in the following places:
0565 <itemizedlist>
0566 
0567 <listitem>
0568 <para>Producing embedded diffs with <command>poediff</command> (see <xref linkend="ch-diffpatch"/>). Option <option>-c</option>/<option>--vcs</option> can be used to switch <command>poediff</command> into VCS mode, such that it diffs given paths between repository head and working copy, or between given revisions.</para>
0569 </listitem>
0570 
0571 <listitem>
0572 <para>Translating in summit (see <xref linkend="ch-summit"/>). <command>posummit</command> will automatically add or remove files from version control as well as to and from disk, so that the modified repository tree can be directly committed after a summit maintenance operation has completed its run.</para>
0573 </listitem>
0574 
0575 <listitem>
0576 <para>Review ascription (see <xref linkend="ch-ascript"/>). VCS support is a central part of <command>poascribe</command>, so it will automatically add, remove and commit files to version control as particular ascription operations require.</para>
0577 </listitem>
0578 
0579 </itemizedlist>
0580 </para>
0581 
0582 <para>Another interesting aspect of VCS support is that, when writing modified PO files to disk, by default Pology will reformat them (almost) only as much as necessary. For example, if only one <varname>msgstr</varname> string in the whole PO file has changed, and wrapping is active, only this string and nothing else will be rewrapped when the file is written out. This makes VCS revision deltas smaller and more informative.</para>
0583 
0584 </sect2>
0585 
0586 </sect1>
0587 
0588 <!-- ======================================== -->
0589 <sect1 id="sec-cmwrap">
0590 <title>Line Wrapping in PO Messages</title>
0591 
0592 <para>While <link linkend="sec-powrap">line wrapping of message strings</link> is irrelevant to programs that fetch translations from them, it may be significant to the translator, especially when editing the PO file with a plain text editor. Well-wrapped strings make it easier for the translator to follow the text structure, especially in longer messages.</para>
0593 
0594 <para>Most Gettext tools (<command>msgmerge</command>, <command>msgcat</command>, <command>msgfilter</command>, etc.) provide options to wrap or not to wrap strings, where wrapping is done on the given column and escaped newlines (<literal>\n</literal>). Pology can produce this type of wrapping ("basic" wrapping) as well, but it can also wrap on expected visual line breaks in known text markup, e.g. <literal>&lt;p&gt;</literal> and <literal>&lt;br&gt;</literal> in HTML ("fine" wrapping). Compare this message in basic wrapping alone:
0595 <programlisting language="po">
0596 msgid ""
0597 "&lt;p>These settings control the storage of the corrected images. "
0598 "There are four modes to choose from:&lt;/p>&lt;p>&lt;ul>&lt;li>&lt;b>Subfolder:&lt;/"
0599 "b> The corrected images will be saved in a subfolder under the "
0600 "current album path.&lt;/li>&lt;li>&lt;b>Prefix:&lt;/b> A custom prefix will be "
0601 "added to the corrected image.&lt;/li>&lt;li>&lt;b>Suffix:&lt;/b> A custom "
0602 "suffix will be added to the corrected image.&lt;/li>&lt;li>&lt;b>Overwrite:&lt;/"
0603 "b> All original images will be replaced.&lt;/li>&lt;/ul>&lt;/p>&lt;p>Each of "
0604 "the four modes allows you to add an optional keyword to the image "
0605 "metadata.&lt;/p>"
0606 msgstr ""
0607 </programlisting>
0608 and in basic and fine wrapping together:
0609 <programlisting language="po">
0610 msgid ""
0611 "&lt;p>These settings control the storage of the corrected images. "
0612 "There are four modes to choose from:&lt;/p>"
0613 "&lt;p>"
0614 "&lt;ul>"
0615 "&lt;li>&lt;b>Subfolder:&lt;/b> The corrected images will be saved in a "
0616 "subfolder under the current album path.&lt;/li>"
0617 "&lt;li>&lt;b>Prefix:&lt;/b> A custom prefix will be added to the corrected "
0618 "image.&lt;/li>"
0619 "&lt;li>&lt;b>Suffix:&lt;/b> A custom suffix will be added to the corrected "
0620 "image.&lt;/li>"
0621 "&lt;li>&lt;b>Overwrite:&lt;/b> All original images will be replaced.&lt;/li>"
0622 "&lt;/ul>"
0623 "&lt;/p>"
0624 "&lt;p>Each of the four modes allows you to add an optional keyword "
0625 "to the image metadata.&lt;/p>"
0626 msgstr ""
0627 </programlisting>
0628 If you are editing the PO file with a dedicated PO editor, it may itself provide finely tuned wrapping and ignore the wrapping in the PO file, in which case Pology's wrapping facilities are superfluous to you<footnote>
0629 <para>But if several people are working on a collection of PO files, it is nevertheless good to agree on fixed wrapping. This is both friendly to those exposed to original wrapping, and to version control systems.</para>
0630 </footnote>. But a PO editor may also present strings wrapped just as they are in the PO file (and most do!), in which case Pology's fine wrapping is just as useful as in combination with a plain text editor.</para>
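<para>The idea behind fine wrapping can be sketched in a few lines of Python. This is only a rough illustration under simplifying assumptions, not Pology's actual algorithm: the function <literal>fine_split</literal> is hypothetical, it knows only a handful of HTML tags, and it ignores the wrapping column entirely. It breaks the text before block-opening tags, and after block-closing tags and <literal>&lt;br&gt;</literal>:</para>
<programlisting language="python"><![CDATA[
import re

# Tags assumed to start or end a visual block (a small, illustrative set).
_OPENERS = re.compile(r"(<(?:p|ul|ol|li)>)")
_CLOSERS = re.compile(r"(</(?:p|ul|ol|li)>|<br ?/?>)")


def fine_split(text):
    """Split text into segments at expected visual line breaks."""
    # Mark a break before each opening tag and after each closing tag.
    text = _OPENERS.sub(r"\n\1", text)
    text = _CLOSERS.sub(r"\1\n", text)
    # Drop empty segments produced by adjacent tags.
    return [seg for seg in text.split("\n") if seg]


msgid = "<p>Modes:</p><p><ul><li><b>Prefix:</b> added.</li></ul></p>"
for segment in fine_split(msgid):
    print('"%s"' % segment)
]]></programlisting>
<para>Each printed segment corresponds to one quoted line of the message in the PO file. Basic wrapping, when also active, would then further break any segment that is longer than the wrapping column.</para>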
0631 
0632 <para>At least for alphabetic languages, the most convenient wrapping may be fine wrapping alone (no basic wrapping), while turning on the editor's dynamic (visual) line wrapping. This both makes the text structure easy to follow, and allows editing the translation by logical units (paragraphs, list items) without manually adjusting column breaks or putting up with ugly overlength or mid-broken lines. However, for ideographic languages, the editor's dynamic line wrapping may produce bad results, and there basic wrapping might be necessary. In fact, for the moment, for ideographic languages it may be better to bypass Pology's wrapping entirely and stick with Gettext's wrapping, since the wrapping algorithm in Gettext is more sophisticated and directly supports ideographic writing systems.</para>
0633 
0634 <para>If no wrapping mode is specified when the given PO file is written out, Pology will apply basic wrapping, just as Gettext tools do. There are three general sources from which Pology tools may try to determine the wrapping mode for the given PO file, in decreasing priority: from the command line options, from the PO file's header, and from the user configuration. A tool may or may not provide command line options and configuration fields for wrapping, but PO file headers are always consulted (since this is in Pology's core PO file handling facilities). See the description of <link linkend="hdr-x-wrapping">the <literal>X-Wrapping</literal> header field</link> for how to set the wrapping mode in the PO header, and <link linkend="sv-set-header">the <literal>set-header</literal> sieve</link> for how to set this field in many PO files at once.</para>
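<para>The priority order among the three sources can be sketched as follows. This is a simplified model rather than Pology's implementation: the function names are hypothetical, and each source is assumed to either say nothing about wrapping (<literal>None</literal>) or to fully determine it as a pair of booleans (basic, fine).</para>
<programlisting language="python"><![CDATA[
def parse_x_wrapping(value):
    """Turn an X-Wrapping header value into a (basic, fine) pair.

    An empty value means that neither basic nor fine wrapping is done.
    """
    modes = [m.strip() for m in value.split(",") if m.strip()]
    return ("basic" in modes, "fine" in modes)


def resolve_wrapping(cmdline=None, header=None, config=None):
    """Pick the wrapping mode from the highest-priority source that sets it.

    Each argument is None (the source does not specify wrapping)
    or a (basic, fine) pair of booleans.
    """
    for source in (cmdline, header, config):
        if source is not None:
            return source
    # Nothing specified anywhere: basic wrapping only, as in Gettext tools.
    return (True, False)


# The header wins over the user configuration...
print(resolve_wrapping(header=parse_x_wrapping("basic, fine"), config=(True, False)))
# ...but the command line wins over the header.
print(resolve_wrapping(cmdline=(False, True), header=parse_x_wrapping("basic")))
]]></programlisting>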
0635 
0636 <sect2 id="sec-cmwropts">
0637 <title>Common Command Line Options for Wrapping</title>
0638 
0639 <para>Pology tools in which the wrapping mode can be set from the command line will provide the following options:
0640 <variablelist>
0641 
0642 <varlistentry>
0643 <term><option>--wrap</option></term>
0644 <listitem>
0645 <para>Perform basic wrapping, at a certain column.</para>
0646 </listitem>
0647 </varlistentry>
0648 
0649 <varlistentry>
0650 <term><option>--no-wrap</option></term>
0651 <listitem>
0652 <para>Do not perform basic wrapping.</para>
0653 </listitem>
0654 </varlistentry>
0655 
0656 <varlistentry>
0657 <term><option>--fine-wrap</option></term>
0658 <listitem>
0659 <para>Perform fine wrapping, on various expected visual breaks introduced by text markup in rendered text.</para>
0660 </listitem>
0661 </varlistentry>
0662 
0663 <varlistentry>
0664 <term><option>--no-fine-wrap</option></term>
0665 <listitem>
0666 <para>Do not perform fine wrapping.</para>
0667 </listitem>
0668 </varlistentry>
0669 
0670 <varlistentry>
0671 <term><option>--wrap-column=<replaceable>COL</replaceable></option></term>
0672 <listitem>
0673 <para>The column at which the text should be wrapped. The wrapped line in the PO file will never be longer than this many columns, including the outer quotes. If not given, the default is 79.</para>
0674 </listitem>
0675 </varlistentry>
0676 
0677 </variablelist>
0678 Both positive and negative wrapping options are provided in order to be able to override the wrapping mode defined by the user configuration or the PO header. As in Gettext tools, strings are always wrapped on <literal>\n</literal> regardless of the wrapping mode.</para>
0679 
0680 </sect2>
0681 
0682 <sect2 id="sec-cmwrcfg">
0683 <title>Common User Configuration Fields for Wrapping</title>
0684 
0685 <para>The following configuration fields will be read by the tools which consult the user configuration for wrapping mode, in their respective configuration sections:
0686 <variablelist>
0687 
0688 <varlistentry>
0689 <term><literal>[<replaceable>section</replaceable>]/wrap=[*yes|no]</literal></term>
0690 <listitem>
0691 <para>Whether to perform basic wrapping, counterpart to <option>--wrap</option> and <option>--no-wrap</option> command line options.</para>
0692 </listitem>
0693 </varlistentry>
0694 
0695 <varlistentry>
0696 <term><literal>[<replaceable>section</replaceable>]/fine-wrap=[yes|*no]</literal></term>
0697 <listitem>
0698 <para>Whether to perform fine wrapping, counterpart to <option>--fine-wrap</option> and <option>--no-fine-wrap</option> command line options.</para>
0699 </listitem>
0700 </varlistentry>
0701 
0702 </variablelist>
0703 </para>
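<para>As an illustration of how these fields and their defaults behave, the following sketch reads them with Python's standard <literal>configparser</literal> module. It assumes an INI-style configuration text; the section name <literal>porewrap</literal> and the function <literal>wrapping_from_config</literal> are used here only as examples, and this is not the code Pology itself uses to read its configuration.</para>
<programlisting language="python"><![CDATA[
import configparser

# Example configuration text; the section name is only illustrative.
SAMPLE = """
[porewrap]
wrap = no
fine-wrap = yes
"""


def wrapping_from_config(text, section):
    """Return (basic, fine) for the given configuration section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Defaults follow the starred values above: wrap=yes, fine-wrap=no.
    basic = parser.getboolean(section, "wrap", fallback=True)
    fine = parser.getboolean(section, "fine-wrap", fallback=False)
    return basic, fine


print(wrapping_from_config(SAMPLE, "porewrap"))
# A section that sets nothing falls back to the defaults.
print(wrapping_from_config(SAMPLE, "posieve"))
]]></programlisting>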
0704 
0705 </sect2>
0706 
0707 </sect1>
0708 
0709 <!-- ======================================== -->
0710 <sect1 id="sec-cmheader">
0711 <title>Influential Header Fields</title>
0712 
0713 <para><link linkend="sec-poheader">The PO header</link> is a natural place to provide the information which holds for the PO file as a whole. Pology scripts, sieves, and hooks can take into account a number of header fields, when available, to automatically determine some aspects of processing. The fields considered are as follows:
0714 <variablelist>
0715 
0716 <varlistentry id="hdr-language">
0717 <term><literal>Language</literal></term>
0718 <listitem>
0719 <para>This field contains the language code of the translation, which Pology will take into account in all contexts where language-dependent processing is done (such as when <link linkend="sec-lgspell">spell-checking</link>). You can also specify the language into which you translate in <link linkend="sec-cmconfig">user configuration</link>, and sometimes in the command line. The language stated by the PO header will override the user configuration, but it will be in turn overridden by the command line. See also <xref linkend="sec-lglangenv"/>.</para>
0720 </listitem>
0721 </varlistentry>
0722 
0723 <varlistentry id="hdr-x-accelerator-marker">
0724 <term><literal>X-Accelerator-Marker</literal></term>
0725 <listitem>
0726 <para><link linkend="sec-poaccel">Accelerator markers</link> are a frequent obstacle in text processing, such as <link linkend="sv-find-messages">searching</link> or <link linkend="sec-lgspell">spell-checking</link>, because they can split words apart. This field can be used to specify which character is used as accelerator marker throughout the file, if any. If there are several possible characters, they can be given as comma-separated list<footnote>
0727 <para>This does mean that the case when the comma itself is the accelerator marker is not covered, but this case is beyond unlikely.</para>
0728 </footnote>. While it is usually possible to specify the accelerator marker through the command line, the header field is much more convenient and flexible: there is no need to remember to add the command line option at every run, and different PO files can have different accelerator markers. However, if the command line option is given, it will override the header field.</para>
0729 
0730 <para>There is a difference between this field not existing in the header, and existing but with an empty value (i.e. <literal>"X-Accelerator-Marker: \n"</literal>). If the field does not exist, some processing elements will go into the "greedy" mode, where they use a list of known frequent accelerator markers (e.g. to remove them from the text). If the field is set to an empty value, these processing elements will take it that there are no accelerator markers in the text.</para>
0731 </listitem>
0732 </varlistentry>
0733 
0734 <varlistentry id="hdr-x-associated-ui-catalogs">
0735 <term><literal>X-Associated-UI-Catalogs</literal></term>
0736 <listitem>
0737 <para>This field lists the PO domains which are the source of user interface references (button labels, menu items, etc.) throughout the text in current PO file. This makes it possible to automatically fetch and insert UI translations, rather than having to look them up manually and maintain them against changes; see <xref linkend="sec-lguirefs"/> for details. Several PO domains can be given as space- or comma-separated list. If the UI message is found in more than one listed PO domain, the earlier in the list takes precedence.</para>
0738 </listitem>
0739 </varlistentry>
0740 
0741 <varlistentry id="hdr-x-environment">
0742 <term><literal>X-Environment</literal></term>
0743 <listitem>
0744 <para>The language environment to which the translation belongs; see <xref linkend="sec-lglangenv"/> for details. It can be a single keyword, or a comma-separated list of keywords. If several environments are given, the later in the list (which is usually the more specific) takes precedence.</para>
0745 </listitem>
0746 </varlistentry>
0747 
0748 <varlistentry id="hdr-x-text-markup">
0749 <term><literal>X-Text-Markup</literal></term>
0750 <listitem>
0751 <para>When the text contains <link linkend="sec-pomarkup">markup</link>, it may be useful to remove it such that only the plain text remains. This is the case, for example, when computing word counts or applying <link linkend="sec-lgrules">terminology validation rules</link>. Another use case would be the validation of markup itself (whether a tag is properly closed, whether a tag exists, etc.). This header field specifies the markup type found in the text, as a keyword, so that Pology can determine how to process it. Several markup types can be given as a comma-separated list.</para>
0752 
0753 <para>Pology currently recognizes the following markup types:
0754 <itemizedlist>
0755 <listitem><para><literal>docbook4</literal> -- DocBook 4.x markup, in documentation POs</para></listitem>
0756 <listitem><para><literal>html</literal> -- HTML 4.01</para></listitem>
0757 <listitem><para><literal>kde4</literal> -- markup in KDE4 UI POs, a mix of Qt rich-text and KUIT</para></listitem>
0758 <listitem><para><literal>kuit</literal> -- UI semantic markup in KDE 4</para></listitem>
0759 <listitem><para><literal>qtrich</literal> -- Qt rich-text, (almost) a subset of HTML</para></listitem>
0760 <listitem><para><literal>xmlents</literal> -- only XML-like entities, no other formal markup</para></listitem>
0761 </itemizedlist>
0762 </para>
0763 
0764 </listitem>
0765 </varlistentry>
0766 
0767 <varlistentry id="hdr-x-wrapping">
0768 <term><literal>X-Wrapping</literal></term>
0769 <listitem>
0770 <para>This header field can be set to tell Pology how to <link linkend="sec-cmwrap">wrap strings</link> in the current PO file, for example, when <command>posieve</command> modifies a message and writes the modified PO file, or when rewrapping is done explicitly by <command>porewrap</command>. The value is a comma-separated list of wrapping modes, chosen from:
0771 <itemizedlist>
0772 <listitem><para><literal>basic</literal> -- wrapping at a certain column</para></listitem>
0773 <listitem><para><literal>fine</literal> -- wrapping on logical breaks (such as <literal>&lt;p&gt;</literal> or <literal>&lt;br/&gt;</literal> tags)</para></listitem>
0774 </itemizedlist>
0775 Wrapping on escaped newline <literal>\n</literal> is always performed, regardless of the wrapping mode. If the field value is empty, no other wrapping is done. If more than one wrapping mode is given (e.g. <literal>"X-Wrapping: basic, fine\n"</literal>), it is specifically defined how modes are combined, so the ordering is not important. As usual, if wrapping is specified by a command line option, that will override the header field.</para>
0776 </listitem>
0777 </varlistentry>
0778 
0779 </variablelist>
0780 </para>
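<para>The distinction between a missing and an empty <literal>X-Accelerator-Marker</literal> field can be made concrete with a short sketch. The function name, the representation of the header as a dictionary, and the contents of the list of known markers are all assumptions made for this example; Pology's own list of frequent markers may differ.</para>
<programlisting language="python"><![CDATA[
# Assumed list, for illustration only; Pology's real list may differ.
KNOWN_MARKERS = ["&", "_", "~"]


def accelerator_markers(header_fields):
    """Return the accelerator markers to consider for a PO file.

    header_fields is a dict mapping header field names to their values.
    """
    value = header_fields.get("X-Accelerator-Marker")
    if value is None:
        # Field absent: "greedy" mode, fall back to known frequent markers.
        return list(KNOWN_MARKERS)
    # Field present: comma-separated list, possibly empty (no markers at all).
    return [m.strip() for m in value.split(",") if m.strip()]


print(accelerator_markers({}))
print(accelerator_markers({"X-Accelerator-Marker": ""}))
print(accelerator_markers({"X-Accelerator-Marker": "&, _"}))
]]></programlisting>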
0781 
0782 <para>All of the listed header fields may be set manually, when you get to work on the particular PO file. But frequently it is possible to set them automatically, or at least automatically for the first time with later manual corrections where needed. For this you may use <link linkend="sv-set-header">the <command>set-header</command> sieve</link>. If PO files are periodically merged by the translation project automation (rather than each translator merging on his own only the PO files which he is about to update), the natural moment to run <command>set-header</command> is just after the merging. If translation is done <link linkend="ch-summit">in summit</link>, you can specify in summit configuration to set header fields on merging.</para>
0783 
0784 </sect1>
0785 
0786 <!-- ======================================== -->
0787 <sect1 id="sec-cmhooks">
0788 <title>Processing Hooks</title>
0789 
0790 <para>Pology enables the user to insert special processing elements, called <emphasis>hooks</emphasis>, at many places in the processing chain. Hooks are Python functions with certain prescribed input, output, and behavior. Depending on the exact combination of these three ingredients, there are various <emphasis>hook types</emphasis>. Finally, some hooks can be adapted to a given context through their <emphasis>hook factories</emphasis>. Pology defines many hooks internally, and users can add their own external hooks.</para>
0791 
0792 <para>Usage of hooks is best illustrated through examples. Suppose that you want to use <link linkend="sv-find-messages">the <command>find-messages</command> sieve</link> to look for a certain word, but the text contains XML-like tags of the form <literal>&lt;<replaceable>tagname</replaceable>&gt;...&lt;/<replaceable>tagname</replaceable>&gt;</literal> which happen to be throwing off your search. Suppose that there exists a hook called <literal>remove-xml-tags</literal>, in the Pology library module <literal>remove</literal>, which takes a piece of text as input and returns that piece of text cleared of any XML-like tags. Then you could insert this hook into the search to clear the tags before matching the text, by using the <option>filter:</option> parameter to <command>find-messages</command>:
0793 <programlisting language="bash">
0794 $ posieve find-messages -s filter:'remove/remove-xml-tags' ...
0795 </programlisting>
0796 Here <literal>remove/remove-xml-tags</literal> is the hook specification, and this is its usual simplest form: the module name, followed by slash, followed by the hook name. (Sometimes it can be only the module name, when the hook function within that module has the same name as the module, but this is rare.) The hook specification was enclosed in single quotes, for the shell to see it as single string; this was not necessary here, but it is a good habit to keep up when adding hooks through command line, because hook specification can get quite involved.</para>
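<para>Since hooks are ordinary Python functions, a hook like the one supposed above could look roughly as follows. This is a minimal sketch and not the code of any hook shipped with Pology; the regular expression is deliberately naive. The function name uses underscores where the hook specification uses hyphens.</para>
<programlisting language="python"><![CDATA[
import re

# Naive pattern for XML-like opening, closing, and empty tags.
_TAG_RX = re.compile(r"</?[a-zA-Z][^<>]*>")


def remove_xml_tags(text):
    """Take a piece of text and return it cleared of XML-like tags."""
    return _TAG_RX.sub("", text)


print(remove_xml_tags("Press <b>OK</b> to <i>continue</i>."))
]]></programlisting>
<para>A function of this shape, which receives only the text and returns the modified text, is the simplest kind of filtering hook.</para>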
0797 
0798 <para>Suppose now that there is a single hook that can remove any kind of markup from the text (not only XML-like tags) called <literal>remove-markup</literal>, but that it has to be told which markup to remove, by giving it one of the markup type keywords known to Pology. Continuing the previous example, this could be done like this:
0799 <programlisting language="bash">
0800 $ posieve find-messages -s filter:'remove/remove-markup~"docbook4"' ...
0801 </programlisting>
0802 Now the hook specification is <literal>remove/remove-markup~"docbook4"</literal>. Note that outer single quotes in the command line are necessary, as otherwise the shell would strip internal double quotes, which are here an integral part of the hook specification. <literal>remove-markup</literal> is actually a <emphasis>hook factory</emphasis>, which produces a hook based on the parameters given after the tilde (<literal>~</literal>) character. Here <literal>"docbook4"</literal> is that parameter; why must it be quoted? Because the part after the tilde is passed as the argument list to a Python function, and <literal>"docbook4"</literal> must be of string type, which is in Python denoted by quotes. For a hook factory <literal>foo/bar</literal> which would take a string and a number, the hook specification would be <literal>foo/bar~"qwyx",5</literal>. Sometimes a hook factory has default values for some or all of its arguments; in the latter case, if the defaults are sufficient, the part after the tilde in the hook specification can be left empty (e.g. <literal>foo/bar~</literal>).</para>
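<para>A hook factory of this kind is simply a Python function that returns another function. The following minimal sketch illustrates the pattern; it is not the code of any actual Pology hook, the tag-stripping logic is deliberately naive, and the set of accepted markup keywords is an assumption made for the example.</para>
<programlisting language="python"><![CDATA[
import re

# Naive pattern for XML-like tags; real markup removal is more involved.
_TAG_RX = re.compile(r"</?[a-zA-Z][^<>]*>")

# Markup types this toy factory accepts (an assumption for the example).
_TAG_BASED = ("docbook4", "html", "qtrich")


def remove_markup(mtype):
    """Hook factory: build a text-filtering hook for the given markup type."""
    if mtype not in _TAG_BASED:
        raise ValueError("unsupported markup type: %r" % mtype)

    def hook(text):
        # The produced hook takes text and returns the modified text.
        return _TAG_RX.sub("", text)

    return hook


# The specification remove/remove-markup~"docbook4" would correspond
# to a call like this one:
hook = remove_markup("docbook4")
print(hook("See <guilabel>Settings</guilabel> for details."))
]]></programlisting>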
0803 
0804 <para>Hooks can be language- and project-dependent. Suppose that in your language the letters are sometimes accented, but the accents should be ignored on spell-checking. Then Pology may contain a hook which strips accents from text in your language. If your language code is <literal>ll</literal>, and the hook is <literal>remove-accents</literal> in (language-specific) module <literal>remove</literal>, you could check spelling while ignoring accents using <link linkend="sv-find-messages">the <command>check-spell-ec</command> sieve</link>:
0805 <programlisting language="bash">
0806 $ posieve check-spell-ec -s filter:'ll:remove/remove-accents' ...
0807 </programlisting>
0808 The hook specification now also contains the language code separated by colon, as <literal>ll:...</literal>. If the hook is project-specific instead, it is prefixed with <literal>pp%...</literal>, where <literal>pp</literal> is the project identifier and percent sign the separator. If the hook is both language- and project-specific, then the specification is <literal>ll:pp%...</literal> or <literal>pp%ll:...</literal>.</para>
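<para>The parts of a hook specification described so far can be taken apart mechanically. The sketch below shows one way to do it; the function <literal>parse_hook_spec</literal> and the shape of its result are hypothetical, and this is not how Pology itself resolves hooks. Factory arguments are evaluated with <literal>ast.literal_eval</literal>, so in this sketch only Python literals (strings, numbers, and the like) are accepted after the tilde.</para>
<programlisting language="python"><![CDATA[
import ast


def parse_hook_spec(spec):
    """Split a hook specification into its components."""
    # Everything after the first tilde is the factory argument list.
    path, tilde, argstr = spec.partition("~")
    lang = proj = None
    # Language (ll:) and project (pp%) prefixes may come in either order.
    for _ in range(2):
        ci, pi = path.find(":"), path.find("%")
        if ci >= 0 and (pi < 0 or ci < pi) and lang is None:
            lang, path = path.split(":", 1)
        elif pi >= 0 and proj is None:
            proj, path = path.split("%", 1)
    # Module and hook name; the name defaults to the module name.
    module, slash, name = path.partition("/")
    if not slash:
        name = module
    args = None  # None means a direct hook, not a factory call.
    if tilde:
        args = ast.literal_eval("(%s,)" % argstr) if argstr.strip() else ()
    return {
        "lang": lang,
        "project": proj,
        "module": module,
        "function": name.replace("-", "_"),
        "args": args,
    }


print(parse_hook_spec('remove/remove-markup~"docbook4"'))
print(parse_hook_spec("ll:pp%remove/remove-accents"))
]]></programlisting>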
0809 
0810 <sect2 id="sec-cmhooktypes">
0811 <title>Hook Types</title>
0812 
0813 <para>In places where a hook can be inserted, it is convenient to succinctly state which types of hooks are acceptable. Hook types are therefore coded with letter-number-letter combinations. The first letter can be F, V, or S, standing for filtering, validation, or side-effect hook, in that order. Filtering hooks modify their input, validation hooks report problems in input in a way understood by their clients, while side-effect hooks can do anything except modifying the input. The number after the first letter describes the composition of input, which can be pure text, PO message, PO header, etc. and their combinations. The final letter indicates the semantics of the input, like whether the input text is supposed to be the original (<varname>msgid</varname>) or the translation (<varname>msgstr</varname>) or can be any of them.</para>
0814 
0815 <para>The following hook types are currently defined (the hook type is followed by the expected input in parentheses):
0816 <variablelist>
0817 
0818 <varlistentry>
0819 <term>F1A (text)</term>
0820 <listitem>
0821 <para>Modifies the input text.</para>
0822 </listitem>
0823 </varlistentry>
0824 
0825 <varlistentry>
0826 <term>V1A (text)</term>
0827 <listitem>
0828 <para>Validates the input text.</para>
0829 </listitem>
0830 </varlistentry>
0831 
0832 <varlistentry>
0833 <term>S1A (text)</term>
0834 <listitem>
0835 <para>Side-effects based on the input text.</para>
0836 </listitem>
0837 </varlistentry>
0838 
0839 <varlistentry>
0840 <term>F3A (text, message, file)</term>
0841 <listitem>
0842 <para>Modifies the input text, which is one of the strings in the given PO message, which belongs to the given PO file. The difference between F1A and F3A hooks is that an F1A hook can process text based only on the text itself, while an F3A hook can process text by taking into account the information elsewhere in the PO message (e.g. in comments) and the PO file (e.g. in header). This holds for all *1* and *3* hook types.</para>
0843 </listitem>
0844 </varlistentry>
0845 
0846 <varlistentry>
0847 <term>V3A (text, message, file)</term>
0848 <listitem>
0849 <para>Validates the input text, which is one of the strings in the given PO message, which belongs to the given PO file.</para>
0850 </listitem>
0851 </varlistentry>
0852 
0853 <varlistentry>
0854 <term>S3A (text, message, file)</term>
0855 <listitem>
0856 <para>Side-effects based on the input text, which is one of the strings in the given PO message, which belongs to the given PO file.</para>
0857 </listitem>
0858 </varlistentry>
0859 
0860 <varlistentry>
0861 <term>F3B (original, message, file)</term>
0862 <listitem>
0863 <para>Modifies the input text, which is the <varname>msgid</varname> (or <varname>msgid_plural</varname>) string in the given PO message, which belongs to the given PO file. The difference between F3A and F3B hooks is that the input text of an F3B hook is expected to be precisely the original string in the message, while giving anything else will lead to undefined results. This holds for all *3A, *3B, *3C hook types.</para>
0864 </listitem>
0865 </varlistentry>
0866 
0867 <varlistentry>
0868 <term>V3B (original, message, file)</term>
0869 <listitem>
0870 <para>Validates the input text, which is the <varname>msgid</varname> (or <varname>msgid_plural</varname>) string in the given PO message, which belongs to the given PO file.</para>
0871 </listitem>
0872 </varlistentry>
0873 
0874 <varlistentry>
0875 <term>S3B (original, message, file)</term>
0876 <listitem>
0877 <para>Side-effects based on the input text, which is the <varname>msgid</varname> (or <varname>msgid_plural</varname>) string in the given PO message, which belongs to the given PO file.</para>
0878 </listitem>
0879 </varlistentry>
0880 
0881 <varlistentry>
0882 <term>F3C (translation, message, file)</term>
0883 <listitem>
0884 <para>Modifies the input text, which is one of the <varname>msgstr</varname> strings in the given PO message, which belongs to the given PO file.</para>
0885 </listitem>
0886 </varlistentry>
0887 
0888 <varlistentry>
0889 <term>V3C (translation, message, file)</term>
0890 <listitem>
0891 <para>Validates the input text, which is one of the <varname>msgstr</varname> strings in the given PO message, which belongs to the given PO file.</para>
0892 </listitem>
0893 </varlistentry>
0894 
0895 <varlistentry>
0896 <term>S3C (translation, message, file)</term>
0897 <listitem>
0898 <para>Side-effects based on the input text, which is one of the <varname>msgstr</varname> strings in the given PO message, which belongs to the given PO file.</para>
0899 </listitem>
0900 </varlistentry>
0901 
0902 <varlistentry>
0903 <term>F4A (message, file)</term>
0904 <listitem>
0905 <para>Modifies the input PO message, which belongs to the given PO file. The difference between F4A and F3A hooks is that an F3A hook can modify only the given string in the message, while an F4A hook can modify any number of strings, comments, etc. in the message. This holds for all *3* and *4* hook types.</para>
0906 </listitem>
0907 </varlistentry>
0908 
0909 <varlistentry>
0910 <term>V4A (message, file)</term>
0911 <listitem>
0912 <para>Validates the input PO message, which belongs to the given PO file.</para>
0913 </listitem>
0914 </varlistentry>
0915 
0916 <varlistentry>
0917 <term>S4A (message, file)</term>
0918 <listitem>
0919 <para>Side-effects based on the input PO message, which belongs to the given PO file.</para>
0920 </listitem>
0921 </varlistentry>
0922 
0923 <varlistentry>
0924 <term>F4B (header, file)</term>
0925 <listitem>
0926 <para>Modifies the input PO header, which belongs to the given PO file.</para>
0927 </listitem>
0928 </varlistentry>
0929 
0930 <varlistentry>
0931 <term>V4B (header, file)</term>
0932 <listitem>
0933 <para>Validates the input PO header, which belongs to the given PO file.</para>
0934 </listitem>
0935 </varlistentry>
0936 
0937 <varlistentry>
0938 <term>S4B (header, file)</term>
0939 <listitem>
0940 <para>Side-effects based on the input PO header, which belongs to the given PO file.</para>
0941 </listitem>
0942 </varlistentry>
0943 
0944 <varlistentry>
0945 <term>F5A (file)</term>
0946 <listitem>
0947 <para>Modifies the input PO file. As opposed to F1* and F3* hooks, which can modify only elements within PO messages, F5* hooks can also add, remove, and change positions of messages within the PO file.</para>
0948 </listitem>
0949 </varlistentry>
0950 
0951 <varlistentry>
0952 <term>V5A (file)</term>
0953 <listitem>
0954 <para>Validates the input PO file. As opposed to V1* and V3* hooks, which report only problems confined to PO messages, V5* hooks can also report problems due to relation between several PO messages each of which is valid in itself.</para>
0955 </listitem>
0956 </varlistentry>
0957 
0958 <varlistentry>
0959 <term>S5A (file)</term>
0960 <listitem>
0961 <para>Side-effects based on the input PO file.</para>
0962 </listitem>
0963 </varlistentry>
0964 
0965 <varlistentry>
0966 <term>F6A (any file)</term>
0967 <listitem>
0968 <para>Modifies the input file, whether in PO or another format, on the level of pure text lines. This is unlike F5A hooks which operate on the level of entries in the PO file; F6A hooks are also typically limited to certain types of files, perhaps even only PO files. This holds for all *6* hook types.</para>
0969 </listitem>
0970 </varlistentry>
0971 
0972 <varlistentry>
0973 <term>V6A (raw file)</term>
0974 <listitem>
0975 <para>Validates the input file.</para>
0976 </listitem>
0977 </varlistentry>
0978 
0979 <varlistentry>
0980 <term>S6A (raw file)</term>
0981 <listitem>
0982 <para>Side-effects based on the input file.</para>
0983 </listitem>
0984 </varlistentry>
0985 
0986 </variablelist>
0987 </para>
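<para>The type codes listed above can be decoded mechanically, as the following sketch shows. The tables are transcribed from the list in this section, while the function <literal>describe_hook_type</literal> is hypothetical and is not part of Pology.</para>
<programlisting language="python"><![CDATA[
# First letter of the type code: what the hook does.
KINDS = {"F": "filtering", "V": "validation", "S": "side-effect"}

# Number and final letter: what the hook receives as input.
INPUTS = {
    "1A": ("text",),
    "3A": ("text", "message", "file"),
    "3B": ("original", "message", "file"),
    "3C": ("translation", "message", "file"),
    "4A": ("message", "file"),
    "4B": ("header", "file"),
    "5A": ("file",),
    "6A": ("raw file",),
}


def describe_hook_type(code):
    """Return (kind, inputs) for a hook type code such as "F3C"."""
    kind, inputs = code[:1], code[1:]
    if kind not in KINDS or inputs not in INPUTS:
        raise ValueError("unknown hook type: %s" % code)
    return KINDS[kind], INPUTS[inputs]


print(describe_hook_type("F3C"))
print(describe_hook_type("S4B"))
]]></programlisting>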
0988 
0989 </sect2>
0990 
0991 <sect2 id="sec-cminthooks">
0992 <title>List of Internal Hooks</title>
0993 
0994 <para>Pology does not establish strict separation between users and programmers, but presents a continuum between pure use and pure programming, so that users can engage according to their needs and abilities. Hooks, in particular, occupy the middle of this range. On the one hand, they can be used even from the command line; on the other hand, they are actually Python functions, and hook specifications (in the command line and elsewhere) sometimes require Python argument lists (the part after the tilde). This makes it hard both to list all available hooks<footnote>
0995 <para>For example, any Python function in Pology that takes one string and returns the modified version of that string can be considered an F1A hook!</para>
0996 </footnote>, and to decide where and how to document them: in the user manual or in the library programming interface (API) documentation. Therefore, the following approach is taken. Here, in the user manual, only functions written specifically to be used as hooks are listed (sometimes grouped by similarity), with their types and short descriptions. Each entry links to the complete hook description in the API documentation.<footnote>
0997 <para>In the API documentation, the very first line of the function description will show if the function is a direct hook or a hook factory, the function header will list the inputs for a direct hook (which conform to the declared hook type) or the factory parameters for a hook factory, and the rest of the description will explain the operation of the hook and the meaning of factory parameters.</para>
0998 </footnote></para>
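<para>The distinction between a direct hook and a hook factory can be sketched in a few lines of Python. This is only an illustration of the two shapes, assuming the F1A form described above (one string in, the modified string out); the function names are hypothetical and do not exist in Pology:
<programlisting language="python">
def squeeze_spaces(text):
    """Direct hook of F1A shape: takes a string, returns the modified string."""
    return " ".join(text.split())


def make_replacer(old, new):
    """Hook factory: returns an F1A-shaped hook configured by its parameters."""
    def hook(text):
        return text.replace(old, new)
    return hook


# A factory is called once with its parameters; the result is the hook.
dots_to_ellipsis = make_replacer("...", "\u2026")
</programlisting>
A direct hook can be used as it is, while a factory must first be given its parameters, which is where a Python argument list in the hook specification comes in.</para>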
0999 
1000 <sect3 id="sec-cmghooks">
1001 <title>General Hooks</title>
1002 
1003 <para>
1004 <variablelist>
1005 
1006 <varlistentry id="hk-bpatterns-bad-patterns">
1007 <term><ulink url="&ap;bpatterns&am;bad_patterns"><literal>bpatterns/bad-patterns</literal></ulink> (S3A)</term>
1008 <term><ulink url="&ap;bpatterns&am;bad_patterns_msg"><literal>bpatterns/bad-patterns-msg</literal></ulink> (S4A)</term>
1009 <term><ulink url="&ap;bpatterns&am;bad_patterns_msg_sp"><literal>bpatterns/bad-patterns-msg-sp</literal></ulink> (V4A)</term>
1010 <listitem>
1011 <para>Detects unwanted patterns in text, by regular expression matching. Patterns can be specified either as direct arguments, or listed in a file given as an argument.</para>
1012 
1013 <caution><para>This hook is deprecated. Use <link linkend="sec-lgrules">validation rules</link> instead, which are a much richer method of defining and checking for problems.</para></caution>
1014 </listitem>
1015 </varlistentry>
1016 
1017 <varlistentry id="hk-gtxtools-msgfilter">
1018 <term><ulink url="&ap;gtxtools&am;msgfilter"><literal>gtxtools/msgfilter</literal></ulink> (F6A)</term>
1019 <listitem>
1020 <para>Pipes the PO file through Gettext's <command>msgfilter(1)</command>. The filter argument and options to <command>msgfilter</command> can be specified as parameters to the hook factory. (May be used to wrap the PO file canonically, as Pology does not produce exactly the same wrapping as Gettext tools.)</para>
1021 </listitem>
1022 </varlistentry>
1023 
1024 <varlistentry id="hk-gtxtools-msgfmt">
1025 <term><ulink url="&ap;gtxtools&am;msgfmt"><literal>gtxtools/msgfmt</literal></ulink> (S6A)</term>
1026 <listitem>
1027 <para>Pipes the PO file through Gettext's <command>msgfmt(1)</command>, discarding output and reporting any errors as warnings. Useful for a hard check of the PO file syntax, and for the extended checks performed when <command>msgfmt</command> is run with the <option>--check</option> option.</para>
1028 </listitem>
1029 </varlistentry>
1030 
1031 <varlistentry id="hk-markup-check-xml">
1032 <term><ulink url="&ap;markup&am;check_xml"><literal>markup/check-xml</literal></ulink> (S3C)</term>
1033 <term><ulink url="&ap;markup&am;check_xml_sp"><literal>markup/check-xml-sp</literal></ulink> (V3C)</term>
1034 <listitem>
1035 <para>Checks whether general XML markup in translation is well-formed, and possibly also whether entities are defined. Checks can be performed either only when the original text itself is valid or unconditionally.</para>
1036 </listitem>
1037 </varlistentry>
1038 
1039 <varlistentry id="hk-markup-check-xml-spec">
1040 <term><ulink url="&ap;markup&am;check_docbook4"><literal>markup/check-docbook4</literal></ulink> (S3C)</term>
1041 <term><ulink url="&ap;markup&am;check_docbook4_sp"><literal>markup/check-docbook4-sp</literal></ulink> (V3C)</term>
1042 <term><ulink url="&ap;markup&am;check_docbook4_msg"><literal>markup/check-docbook4-msg</literal></ulink> (V4A)</term>
1043 <term><ulink url="&ap;markup&am;check_html"><literal>markup/check-html</literal></ulink> (S3C)</term>
1044 <term><ulink url="&ap;markup&am;check_html_sp"><literal>markup/check-html-sp</literal></ulink> (V3C)</term>
1045 <term><ulink url="&ap;markup&am;check_qtrich"><literal>markup/check-qtrich</literal></ulink> (S3C)</term>
1046 <term><ulink url="&ap;markup&am;check_qtrich_sp"><literal>markup/check-qtrich-sp</literal></ulink> (V3C)</term>
1047 <term><ulink url="&ap;markup&am;check_kde4"><literal>markup/check-kde4</literal></ulink> (S3C)</term>
1048 <term><ulink url="&ap;markup&am;check_kde4_sp"><literal>markup/check-kde4-sp</literal></ulink> (V3C)</term>
1049 <term><ulink url="&ap;markup&am;check_pango"><literal>markup/check-pango</literal></ulink> (S3C)</term>
1050 <term><ulink url="&ap;markup&am;check_pango_sp"><literal>markup/check-pango-sp</literal></ulink> (V3C)</term>
1051 <listitem>
1052 <para>Specializations of the <link linkend="hk-markup-check-xml"><literal>markup/check-xml</literal></link> hook for various XML formats. Aside from well-formedness, these hooks can also check whether the tags used really exist in the format, whether tags are properly nested, etc. (Full conformance to DTD or schema cannot be checked due to chunking into messages.)</para>
1053 </listitem>
1054 </varlistentry>
1055 
1056 <varlistentry id="hk-markup-check-xmlents">
1057 <term><ulink url="&ap;markup&am;check_xmlents"><literal>markup/check-xmlents</literal></ulink> (S3C)</term>
1058 <term><ulink url="&ap;markup&am;check_xmlents_sp"><literal>markup/check-xmlents-sp</literal></ulink> (V3C)</term>
1059 <listitem>
1060 <para>Checks whether XML-like entities (<literal>&amp;foo;</literal>) are defined. This can be used when the markup is not truly XML-like but uses XML-like entities, or simply to have separate checking of tagging (by <link linkend="hk-markup-check-xml-spec"><literal>markup/check-xml-*</literal></link> hooks) and entities for convenience.</para>
1061 </listitem>
1062 </varlistentry>
1063 
1064 <varlistentry id="hk-noop">
1065 <term><ulink url="&ap;noop&am;text"><literal>noop/text</literal></ulink> (F1A)</term>
1066 <term><ulink url="&ap;noop&am;textm"><literal>noop/textm</literal></ulink> (F3A)</term>
1067 <term><ulink url="&ap;noop&am;msg"><literal>noop/msg</literal></ulink> (F4A)</term>
1068 <term><ulink url="&ap;noop&am;hdr"><literal>noop/hdr</literal></ulink> (F4B)</term>
1069 <term><ulink url="&ap;noop&am;cat"><literal>noop/cat</literal></ulink> (F5A)</term>
1070 <term><ulink url="&ap;noop&am;path"><literal>noop/path</literal></ulink> (F6A)</term>
1071 <listitem>
1072 <para>Filtering hooks that do nothing ("no-operation"). These are useful in contexts where a filtering hook is required, but the input should not actually be modified.</para>
1073 </listitem>
1074 </varlistentry>
1075 
1076 <varlistentry id="hk-normalize-demangle-srcrefs">
1077 <term><ulink url="&ap;normalize&am;demangle_srcrefs"><literal>normalize/demangle-srcrefs</literal></ulink> (F4A)</term>
1078 <listitem>
1079 <para>In some message extraction scenarios, the source references end up pointing to dummy files which existed only during the extraction, but true source references can still be reconstructed (based on dummy file names or extracted comments). This hook will reconstruct true source references and replace dummy references with them.</para>
1080 </listitem>
1081 </varlistentry>
1082 
1083 <varlistentry id="hk-normalize-uniq-source">
1084 <term><ulink url="&ap;normalize&am;uniq_source"><literal>normalize/uniq-source</literal></ulink> (F4A)</term>
1085 <listitem>
1086 <para>Sometimes source references in a PO message end up doubled (e.g. one prefixed with <filename>./</filename> and the other not) due to peculiarities of the extraction process. This hook will make source references unique.</para>
1087 </listitem>
1088 </varlistentry>
1089 
1090 <varlistentry id="hk-normalize-uniq-auto-comment">
1091 <term><ulink url="&ap;normalize&am;uniq_auto_comment"><literal>normalize/uniq-auto-comment</literal></ulink> (F4A)</term>
1092 <listitem>
1093 <para>When extracted comments are automatically added to messages by the extraction tool, if the message is repeated in several source files it may end up containing multiple equal extracted comments. This hook can be used to make extracted comments unique (either all or those matching some criteria).</para>
1094 </listitem>
1095 </varlistentry>
1096 
1097 <varlistentry id="hk-normalize-canonical-header">
1098 <term><ulink url="&ap;normalize&am;canonical_header"><literal>normalize/canonical-header</literal></ulink> (F4B)</term>
1099 <listitem>
1100 <para>Rearranges the content of the PO header into canonical form. For example, translator comments will be sorted according to years of contribution, any repeated translator comments will be merged, etc.</para>
1101 </listitem>
1102 </varlistentry>
1103 
1104 <varlistentry id="hk-remove-remove-accel">
1105 <term><ulink url="&ap;remove&am;remove_accel_text"><literal>remove/remove-accel-text</literal></ulink> (F3A)</term>
1106 <term><ulink url="&ap;remove&am;remove_accel_text_greedy"><literal>remove/remove-accel-text-greedy</literal></ulink> (F3A)</term>
1107 <term><ulink url="&ap;remove&am;remove_accel_msg"><literal>remove/remove-accel-msg</literal></ulink> (F4A)</term>
1108 <term><ulink url="&ap;remove&am;remove_accel_msg_greedy"><literal>remove/remove-accel-msg-greedy</literal></ulink> (F4A)</term>
1109 <listitem>
1110 <para>These hooks remove the accelerator marker from one or all strings in the message. They check whether the PO file <link linkend="hdr-x-accelerator-marker">specifies the accelerator marker</link>; if it does not, non-greedy variants will do nothing, while greedy variants will remove everything that is frequently used as an accelerator marker.</para>
1111 </listitem>
1112 </varlistentry>
1113 
1114 <varlistentry id="hk-remove-remove-markup">
1115 <term><ulink url="&ap;remove&am;remove_markup_text"><literal>remove/remove-markup-text</literal></ulink> (F3A)</term>
1116 <term><ulink url="&ap;remove&am;remove_markup_msg"><literal>remove/remove-markup-msg</literal></ulink> (F4A)</term>
1117 <listitem>
1118 <para>Converts markup (e.g. XML tags) in one or all strings in the message to plain text. The PO file will be asked for <link linkend="hdr-x-text-markup">the expected markup types in text</link>; if no markup type is specified, these hooks will do nothing.</para>
1119 </listitem>
1120 </varlistentry>
1121 
1122 <varlistentry id="hk-remove-remove-fmtdirs">
1123 <term><ulink url="&ap;remove&am;remove_fmtdirs_text"><literal>remove/remove-fmtdirs-text</literal></ulink> (F3A)</term>
1124 <term><ulink url="&ap;remove&am;remove_fmtdirs_text_tick"><literal>remove/remove-fmtdirs-text-tick</literal></ulink> (F3A)</term>
1125 <term><ulink url="&ap;remove&am;remove_fmtdirs_msg"><literal>remove/remove-fmtdirs-msg</literal></ulink> (F4A)</term>
1126 <term><ulink url="&ap;remove&am;remove_fmtdirs_msg_tick"><literal>remove/remove-fmtdirs-msg-tick</literal></ulink> (F4A)</term>
1127 <listitem>
1128 <para>Removes <link linkend="sec-poformdir">format directives</link>
1129 in one or all strings in the message, or replaces them with a fixed placeholder. The type of format directives is determined by <literal>*-format</literal> message flags.</para>
1130 </listitem>
1131 </varlistentry>
1132 
1133 <varlistentry id="hk-remove-remove-literals">
1134 <term><ulink url="&ap;remove&am;remove_literals_text"><literal>remove/remove-literals-text</literal></ulink> (F3A)</term>
1135 <term><ulink url="&ap;remove&am;remove_literals_text_tick"><literal>remove/remove-literals-text-tick</literal></ulink> (F3A)</term>
1136 <term><ulink url="&ap;remove&am;remove_literals_msg"><literal>remove/remove-literals-msg</literal></ulink> (F4A)</term>
1137 <term><ulink url="&ap;remove&am;remove_literals_msg_tick"><literal>remove/remove-literals-msg-tick</literal></ulink> (F4A)</term>
1138 <listitem>
1139 <para>Removes "literal" segments from one or all strings in the message, or replaces them with a fixed placeholder. Literal segments are those which are used as computer input somewhere along the line, such as URLs, email addresses, command line options, etc., and which therefore generally do not conform to human language rules. The translator can also explicitly declare literal segments, by adding a special translator comment.</para>
1140 </listitem>
1141 </varlistentry>
1142 
1143 <varlistentry id="hk-remove-remove-marlits">
1144 <term><ulink url="&ap;remove&am;remove_marlits_text"><literal>remove/remove-marlits-text</literal></ulink> (F3A)</term>
1145 <term><ulink url="&ap;remove&am;remove_marlits_msg"><literal>remove/remove-marlits-msg</literal></ulink> (F4A)</term>
1146 <listitem>
1147 <para><link linkend="hk-remove-remove-literals"><literal>remove/remove-literals-*</literal></link> hooks can positively determine only certain types of literals based on the text alone. If the text contains semantic markup, such as DocBook, literal segments can also be determined based on tags, and these hooks will remove both such tags and their text. The markup type will be taken <link linkend="hdr-x-text-markup">from the PO file</link>. (When these hooks are used, <literal>remove/remove-literals-*</literal> is not needed.)</para>
1148 </listitem>
1149 </varlistentry>
1150 
1151 <varlistentry id="hk-remove-rewrite-msgid">
1152 <term><ulink url="&ap;remove&am;rewrite_msgid"><literal>remove/rewrite-msgid</literal></ulink> (F4A)</term>
1153 <listitem>
1154 <para><link linkend="sec-lgrules">Checks are sometimes defined</link> such that something is first looked up in the original text, and if it is found, something is expected in the translation. No matter how well written these checks are, the original text will sometimes be a bit out of the ordinary, and the check will fail the translation although everything is fine. This can usually be corrected by the translator manually adding a directive, in a special translator comment, to "rewrite" the problematic part of the original before the check is applied.</para>
1155 </listitem>
1156 </varlistentry>
1157 
1158 <varlistentry id="hk-remove-rewrite-inverse">
1159 <term><ulink url="&ap;remove&am;rewrite_inverse"><literal>remove/rewrite-inverse</literal></ulink> (F4A)</term>
1160 <listitem>
1161 <para>The original text in the message needs to be modified for the same reasons as described in <link linkend="hk-remove-rewrite-msgid"><literal>remove/rewrite-msgid</literal></link>, but it is actually easiest to replace the original text entirely with the original text from another message sharing the same translation (i.e. by "inverse" pairing of messages over translation).</para>
1162 </listitem>
1163 </varlistentry>
1164 
1165 <varlistentry id="hk-remove-remove-paired-ents">
1166 <term><ulink url="&ap;remove&am;remove_paired_ents"><literal>remove/remove-paired-ents</literal></ulink> (F4A)</term>
1167 <term><ulink url="&ap;remove&am;remove_paired_ents_tick"><literal>remove/remove-paired-ents-tick</literal></ulink> (F4A)</term>
1168 <listitem>
1169 <para>Removes all XML-like entities (<literal>&amp;foo;</literal>) from the original text, and all XML-like entities from the translation that were encountered in the original. This may be useful prior to markup validity checks, when the list of defined entities cannot be provided.</para>
1170 </listitem>
1171 </varlistentry>
1172 
1173 <varlistentry id="hk-spell">
1174 <term><ulink url="&ap;spell&am;check_spell"><literal>spell/check-spell</literal></ulink> (S3A)</term>
1175 <term><ulink url="&ap;spell&am;check_spell_sp"><literal>spell/check-spell-sp</literal></ulink> (V3A)</term>
1176 <listitem>
1177 <para>Spell-checking hooks for Aspell, as one element of <link linkend="sec-lgspell">Pology's spell-checking functionality</link>.</para>
1178 </listitem>
1179 </varlistentry>
1180 
1181 <varlistentry id="hk-spell-ec">
1182 <term><ulink url="&ap;spell&am;check_spell_ec"><literal>spell/check-spell-ec</literal></ulink> (S3A)</term>
1183 <term><ulink url="&ap;spell&am;check_spell_ec_sp"><literal>spell/check-spell-ec-sp</literal></ulink> (V3A)</term>
1184 <listitem>
1185 <para>Spell-checking hooks for various spell checkers through the Enchant wrapper, as one element of <link linkend="sec-lgspell">Pology's spell-checking functionality</link>.</para>
1186 </listitem>
1187 </varlistentry>
1188 
1189 <varlistentry id="hk-uiref-resolve-ui">
1190 <term><ulink url="&ap;uiref&am;resolve_ui"><literal>uiref/resolve-ui</literal></ulink> (F3C)</term>
1191 <term><ulink url="&ap;uiref&am;resolve_ui_docbook4"><literal>uiref/resolve-ui-docbook4</literal></ulink> (F3C)</term>
1192 <term><ulink url="&ap;uiref&am;resolve_ui_kde4"><literal>uiref/resolve-ui-kde4</literal></ulink> (F3C)</term>
1193 <listitem>
1194 <para>When translating program documentation, these hooks make it possible to leave UI references (button labels, menu items, etc.) untranslated and let them be <link linkend="sec-lguirefs">automatically inserted into the translation later on</link>. The basic hook requires UI references to be manually wrapped in the translation in order to be detected, while specialized versions will also use semantic markup for detection (e.g. the <literal>&lt;guilabel&gt;</literal> element in DocBook).</para>
1195 </listitem>
1196 </varlistentry>
1197 
1198 <varlistentry id="hk-uiref-check-ui">
1199 <term><ulink url="&ap;uiref&am;check_ui"><literal>uiref/check-ui</literal></ulink> (V3C)</term>
1200 <term><ulink url="&ap;uiref&am;check_ui_docbook4"><literal>uiref/check-ui-docbook4</literal></ulink> (V3C)</term>
1201 <term><ulink url="&ap;uiref&am;check_ui_kde4"><literal>uiref/check-ui-kde4</literal></ulink> (V3C)</term>
1202 <listitem>
1203 <para>While <link linkend="hk-uiref-resolve-ui"><literal>uiref/resolve-ui</literal></link> hooks will complain when they cannot find a translation for a UI reference, when checking the overall validity of the translation it is more convenient to use specialized check-only hooks which will not modify the PO file on successfully resolved UI references.</para>
1204 </listitem>
1205 </varlistentry>
1206 
1207 </variablelist>
1208 </para>
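<para>As an illustration of what a normalization such as <literal>normalize/uniq-source</literal> has to deal with, the following sketch removes source references that differ only in path spelling. It is not Pology's implementation: the function name and the representation of references as <literal>(path, line_number)</literal> pairs are assumptions made for this example:
<programlisting language="python">
import posixpath


def unique_source_refs(refs):
    """Drop source references that differ only in path spelling.

    `refs` is a list of (path, line_number) pairs; this layout is an
    assumption for illustration.
    """
    seen = set()
    unique = []
    for path, lineno in refs:
        # normpath turns "./src/main.cpp" into "src/main.cpp".
        key = (posixpath.normpath(path), lineno)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
</programlisting>
The first occurrence of each reference is kept, in normalized form, so the original order of references in the message is preserved.</para>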
1209 
1210 </sect3>
1211 
1212 <sect3 id="sec-cmlshooks">
1213 <title>Language-Specific Hooks</title>
1214 
1215 <para>
1216 <variablelist>
1217 
1218 <varlistentry id="hk-lang-ja-katakana">
1219 <term><ulink url="&ap;lang.ja.katakana&am;katakana"><literal>ja:katakana</literal></ulink> (F1A)</term>
1220 <listitem>
1221 <para>Removes everything but Katakana words from Japanese text, and separates retained words with spaces. (Used as a filter prior to spell-checking words in Katakana.)</para>
1222 </listitem>
1223 </varlistentry>
1224 
1225 <varlistentry id="hk-lang-nn-exclusion-inofficial-forms">
1226 <term><ulink url="&ap;lang.nn.exclusion&am;inofficial_forms"><literal>nn:exclusion/inofficial-forms</literal></ulink> (V3C)</term>
1227 <listitem>
1228 <para>Checks if there are any unofficial word forms in a Norwegian Nynorsk translation.</para>
1229 </listitem>
1230 </varlistentry>
1231 
1232 <varlistentry id="hk-lang-sr-accents-resolve-agraphs">
1233 <term><ulink url="&ap;lang.sr.accents&am;resolve_agraphs"><literal>sr:accents/resolve-agraphs</literal></ulink> (F1A)</term>
1234 <listitem>
1235 <para>Converts "accent graphs" to proper accented letters in Serbian Cyrillic text (e.g. <literal>^а</literal> becomes <literal>а̂</literal>).</para>
1236 </listitem>
1237 </varlistentry>
1238 
1239 <varlistentry id="hk-lang-sr-accents-remove-accents">
1240 <term><ulink url="&ap;lang.sr.accents&am;remove_accents"><literal>sr:accents/remove-accents</literal></ulink> (F1A)</term>
1241 <listitem>
1242 <para>Replaces accented letters in Serbian Cyrillic text with their non-accented counterparts. (Useful as a filter prior to spell-checking.)</para>
1243 </listitem>
1244 </varlistentry>
1245 
1246 <varlistentry id="hk-lang-sr-limit-charset">
1247 <term><ulink url="&ap;lang.sr.charsets&am;limit_to_isocyr"><literal>sr:charsets/limit-to-isocyr</literal></ulink> (F1A)</term>
1248 <term><ulink url="&ap;lang.sr.charsets&am;limit_to_isolat"><literal>sr:charsets/limit-to-isolat</literal></ulink> (F1A)</term>
1249 <listitem>
1250 <para>In situations where it is necessary to use an 8-bit encoding instead of Unicode for Serbian text, these hooks can be used to constrain characters in text to only those representable by the target 8-bit encoding.</para>
1251 </listitem>
1252 </varlistentry>
1253 
1254 <varlistentry id="hk-lang-sr-checks-naked-latin">
1255 <term><ulink url="&ap;lang.sr.checks&am;naked_latin"><literal>sr:checks/naked-latin</literal></ulink> (V3C)</term>
1256 <term><ulink url="&ap;lang.sr.checks&am;naked_latin_origui"><literal>sr:checks/naked-latin-origui</literal></ulink> (V3C)</term>
1257 <term><ulink url="&ap;lang.sr.checks&am;naked_latin_se"><literal>sr:checks/naked-latin-se</literal></ulink> (S3C)</term>
1258 <term><ulink url="&ap;lang.sr.checks&am;naked_latin_origui_se"><literal>sr:checks/naked-latin-origui-se</literal></ulink> (S3C)</term>
1259 <listitem>
1260 <para>In translations into Serbian using Cyrillic script, ordinary segments in Latin script may indicate an error or omission in translation. These hooks will look for such stray Latin segments, while ignoring recognizable literal segments such as URLs, commands, options, etc.</para>
1261 </listitem>
1262 </varlistentry>
1263 
1264 <varlistentry id="hk-lang-sr-nobr-to-nobr-hyphens">
1265 <term><ulink url="&ap;lang.sr.nobr&am;to_nobr_hyphens"><literal>sr:nobr/to-nobr-hyphens</literal></ulink> (F1A)</term>
1266 <listitem>
1267 <para>The ordinary hyphen (-) is normally treated as a character on which the text can be split into the next line. In Serbian texts, hyphens are sometimes used to attach case endings to nouns (especially acronyms), which should not be split into the next line. This hook guesses such positions and replaces the ordinary hyphen with the no-break hyphen.</para>
1268 </listitem>
1269 </varlistentry>
1270 
1271 <varlistentry id="hk-lang-sr-reduce">
1272 <term><ulink url="&ap;lang.sr.reduce&am;words_ec"><literal>sr:reduce/words-ec</literal></ulink> (F1A)</term>
1273 <term><ulink url="&ap;lang.sr.reduce&am;words_ec_lw"><literal>sr:reduce/words-ec-lw</literal></ulink> (F1A)</term>
1274 <term><ulink url="&ap;lang.sr.reduce&am;words_ic"><literal>sr:reduce/words-ic</literal></ulink> (F1A)</term>
1275 <term><ulink url="&ap;lang.sr.reduce&am;words_ic_lw"><literal>sr:reduce/words-ic-lw</literal></ulink> (F1A)</term>
1276 <term><ulink url="&ap;lang.sr.reduce&am;words_ic_lw_dlc"><literal>sr:reduce/words-ic-lw-dlc</literal></ulink> (F1A)</term>
1277 <listitem>
1278 <para>Various reductions of Serbian text to a subset of words of certain type, possibly rearranged in a particular way.</para>
1279 </listitem>
1280 </varlistentry>
1281 
1282 <varlistentry id="hk-lang-sr-trapres-froments">
1283 <term><ulink url="&ap;lang.sr.trapres&am;froments"><literal>sr:trapres/froments</literal></ulink> (F3C)</term>
1284 <term><ulink url="&ap;lang.sr.trapres&am;froments_t1"><literal>sr:trapres/froments-t1</literal></ulink> (F3C)</term>
1285 <term><ulink url="&ap;lang.sr.trapres&am;froments_t1db"><literal>sr:trapres/froments-t1db</literal></ulink> (F3C)</term>
1286 <listitem>
1287 <para>Hooks which resolve grammatical inserts in the form of XML entities in Serbian text, based on the "trapnakron" contained within Pology. See the documentation in the Serbian section for details.</para>
1288 </listitem>
1289 </varlistentry>
1290 
1291 <varlistentry id="hk-lang-sr-uiref">
1292 <term><ulink url="&ap;lang.sr.uiref&am;"><literal>sr:uiref/mod_entities</literal></ulink> (F1A)</term>
1293 <listitem>
1294 <para>When UI references are <link linkend="hk-uiref-resolve-ui">automatically resolved</link> in documentation, and the UI texts may contain <link linkend="hk-lang-sr-trapres-froments">grammatical inserts in the form of XML entities</link>, these inserts may need to be slightly modified to keep the documentation structure valid.</para>
1295 </listitem>
1296 </varlistentry>
1297 
1298 <varlistentry id="hk-lang-sr-wconv">
1299 <term><ulink url="&ap;lang.sr.wconv&am;ctol"><literal>sr:wconv/ctol</literal></ulink> (F1A)</term>
1300 <term><ulink url="&ap;lang.sr.wconv&am;cltoa"><literal>sr:wconv/cltoa</literal></ulink> (F1A)</term>
1301 <term>and many more</term>
1302 <listitem>
1303 <para>Hooks for various transliterations and hybridizations of Serbian text, by script (Cyrillic, Latin) and dialect (Ekavian, Ijekavian). See the documentation in the Serbian section for details.</para>
1304 </listitem>
1305 </varlistentry>
1306 
1307 </variablelist>
1308 </para>
1309 
1310 </sect3>
1311 
1312 <sect3 id="sec-cmpshooks">
1313 <title>Project-Specific Hooks</title>
1314 
1315 <para>
1316 <variablelist>
1317 
1318 <varlistentry id="hk-proj-kde-header-equip-header">
1319 <term><ulink url="&ap;proj.kde.header&am;equip_header"><literal>kde%header/equip-header</literal></ulink> (F4B)</term>
1320 <listitem>
1321 <para>Adds assorted header fields to PO files within the KDE Translation Project, with values based on their name and position in the repository tree, so that Pology and other tools are better informed how to process them.</para>
1322 </listitem>
1323 </varlistentry>
1324 
1325 </variablelist>
1326 </para>
1327 
1328 </sect3>
1329 
1330 </sect2>
1331 
1332 <sect2 id="sec-cmexthooks">
1333 <title>Using External Hooks</title>
1334 
1335 <para>[Not implemented yet.]</para>
1336 
1337 <para>See <xref linkend="sec-prhooks"/> for instructions on how to write and contribute hooks.</para>
1338 
1339 </sect2>
1340 
1341 </sect1>
1342 
1343 <!-- ======================================== -->
1344 <sect1 id="sec-cmskipcheck">
1345 <title>Skipping and Selecting Checks</title>
1346 
1347 <para>With all the different heuristic checks and rules that Pology can apply, false positives -- messages proclaimed invalid when they are actually valid -- are inevitable. False positives are <emphasis>very</emphasis> inconvenient in a serious automatic quality control effort. They make it harder for translators to spot real problems, which in turn discourages them from applying automatic checks at all. If there are one or a few dedicated people in the translation team who tweak and apply automatic checks, they would be particularly hard hit by this negative feedback. False positives can reduce automatic quality control from a strong normative element in the workflow to a merely advisory "run-if-you-have-the-time" extra.</para>
1348 
1349 <para>For this reason, most checks in Pology provide a way to disable them on certain messages, files, or the whole processing batch, so that false positives can be methodically canceled. Conversely, it is usually possible to run one or a few checks on their own, which makes them easier to define and debug. Each checking tool and element documents such functionality; only some general patterns are described in the following.</para>
1350 
1351 <para>The simplest method to disable or enable some checks is "dynamically", for a single validation run, through an option to the tool which is being run. For example, <link linkend="sv-check-rules">the <command>check-rules</command> sieve</link> provides several parameters to select and deselect validation rules which are to be applied. The important point here is that checks in Pology usually have some sort of unique identifier, a keyword, by which they can be referred to.</para>
1352 
1353 <para>"Static" methods to disable or enable checks are those where the instruction is written down somewhere, in a specific format, and automatically taken into account by the validation tool in subsequent runs. There may be several static methods to disable a certain check, differing in their reach: a group of PO files, a single PO file, a single message, or even a part of the text in the message. Within one PO file, the following methods are common:
1354 <itemizedlist>
1355 
1356 <listitem>
1357 <para>The PO header is a natural place to disable or enable checks for the complete PO file, by adding a custom <literal>X-</literal> header field.</para>
1358 </listitem>
1359 
1360 <listitem>
1361 <para>On the single message level, the only place where it is possible to add a manual processing instruction is a <link linkend="sec-pomancmnt">translator comment</link>. This is because if it were put anywhere else (e.g. as an extracted comment or a flag), it would be removed on subsequent merging with the template. These instructions are usually kept simple, like this:
1362 <programlisting language="po">
1363 # <replaceable>some-instruction</replaceable>: <replaceable>arguments</replaceable>
1364 #: ...
1365 msgid "..."
1366 msgstr "..."
1367 </programlisting>
1368 Instructions are always composed of two or more words, separated by hyphens, ended by a colon, and followed by an arbitrary argument string (e.g. a list of identifiers of checks to skip on this message). This makes it sufficiently unlikely that another, free-form translator comment will be accidentally interpreted as a known instruction.<footnote>
1369 <para>Especially considering that free-form translator comments are more usually written in the language of the translation.</para>
1370 </footnote></para>
1371 </listitem>
1372 
1373 <listitem>
1374 <para id="p-trflag">A special type of translator comment with processing instructions is a comment of the following form:
1375 <programlisting language="po">
1376 # |, <replaceable>flag1</replaceable>, <replaceable>flag2</replaceable>, ...
1377 </programlisting>
1378 This is a "translator flag" comment, which is used to set processing instructions too simple to occupy one whole comment line (e.g. those of the switch type, never needing arguments). It starts with <literal>|,</literal> and continues with a comma-separated list of flag-like keywords.</para>
1379 </listitem>
1380 
1381 </itemizedlist>
1382 </para>
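<para>The two comment forms described above can be recognized with a few lines of Python. The following is a minimal sketch, not Pology's own parser: the function names, the example instruction name <literal>skip-check</literal>, and the restriction of instruction words to lowercase letters and digits are all assumptions made for illustration. The functions expect the comment text with the leading <literal>#</literal> already removed:
<programlisting language="python">
import re

# Two or more words joined by hyphens, then a colon, then the arguments.
# Limiting words to lowercase letters and digits is an assumption.
_INSTRUCTION_RX = re.compile(r"^\s*([a-z0-9]+(?:-[a-z0-9]+)+):\s*(.*)$")


def parse_instruction(comment):
    """Return (instruction, arguments), or None for a free-form comment."""
    match = _INSTRUCTION_RX.match(comment)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def parse_translator_flags(comment):
    """Return the flags from a '|, flag1, flag2' comment, else an empty list."""
    text = comment.strip()
    if not text.startswith("|,"):
        return []
    return [flag.strip() for flag in text[2:].split(",") if flag.strip()]
</programlisting>
With these definitions, <literal>parse_instruction("skip-check: ident1, ident2")</literal> yields the instruction name and its argument string, while a single-word comment such as <literal>"Note: check this"</literal> is rejected because it contains no hyphenated keyword.</para>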
1383 
1384 </sect1>
1385 
1386 </chapter>