doc/user/diffpatch.docbook

0001 <?xml version="1.0" encoding="UTF-8"?>
0002 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
0003  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
0004
0005 <chapter id="ch-diffpatch">
0006 <title>Diffing and Patching</title>
0007
0008 <para><emphasis>Line-level</emphasis> diffing of plain text files assumes that the file is chunked into lines as largest well-defined units, that each line has a significant standalone meaning, and that the ordering of lines is not arbitrary. For example, this is typical of programming language code.</para>
0009
0010 <para>Superficially, PO files could also be considered "a programming language of translation", and amenable to same line-level treatment on diffing. However, some of the outlined assumptions, which make line-level diffing viable, are violated in the PO format. Firstly, the minimal unit of PO file is one message, whereas one line has little semantic value. Secondly, ordering of messages can be arbitrary in principle (e.g. dependent on the order of extraction from program code files), such that two line-wise very different PO files are actually equivalent from translator's viewpoint. And thirdly, good number of lines in the PO file are auxiliary, neither original text nor translation, generated either automatically or by the programmer (e.g. source references, extracted comments), all of which are out of translator's scope for modifications.</para>
0011
0012 <para>Due to these difficulties, the common way to use line-level diffing with PO files is only for review, and even that with some preparations. Due to myriad line-wise different but semantically equivalent representations of the PO file, it is almost useless to send line-level diffs as patches. Translators are instead told to always send full PO files to the reviewer or the commiter, no matter what is the amount of modifications. Then, the reviewer merges the received PO file (new version), and possibly the original (old version), with current PO template, without wrapping of message strings (<varname>msgid</varname>, <varname>msgstr</varname>, etc.). This "normalizes" the old and the new file with respect to all semantically non-significant elements, and only then can line-level diffing be performed. Additionally, since a long non-wrapped line of text may differ only in few words, a dedicated diff viewer which can highlight <emphasis>word-level</emphasis> differences should be used. Ordinary diff syntax highlighting (e.g. in shell, or in general text editor) would waste reviewer's time in trying to see those few changed words.</para>
0013
0014 <para>Even with preparations and dedicated diff viewer at hand, there is at least one significant case which is still not reasonably covered: when a fuzzy message with previous strings (i.e. when PO file was merged with <option>--previous</option> option to <command>msgmerge</command>) has been updated and unfuzzied. For example:
0015 <informaltable>
0016 <tgroup cols="2">
0017 <colspec colwidth="2cm"/>
0018 <colspec />
0019 <tbody>
0020 <row>
0021 <entry>old</entry>
0022 <entry><programlisting language="po">
0023 #: main.c:110
0024 #, fuzzy
0025 #| msgid "The Record of The Witch River"
0026 msgid "Records of The Witch River"
0027 msgstr "Beleška o Veštičjoj reci"
0028 </programlisting></entry>
0029 </row>
0030 <row>
0031 <entry>new</entry>
0032 <entry><programlisting language="po">
0033 #: main.c:110
0034 msgid "Records of The Witch River"
0035 msgstr "Beleške o Veštičjoj reci"
0036 </programlisting></entry>
0037 </row>
0038 <row>
0039 <entry>diff</entry>
0040 <entry><programlisting language="diff">
0041 ⁠ #: main.c:110
0042 - #, fuzzy
0043 - #| msgid "The Record of The Witch River"
0044   msgid "Records of The Witch River"
0045 - msgstr "Beleška o Veštičjoj reci"
0046 + msgstr "Beleške o Veštičjoj reci"
0047 </programlisting></entry>
0048 </row>
0049 </tbody>
0050 </tgroup>
0051 </informaltable>
0052 The line-level diff viewer will know to show word-level diff for modified translation, but it cannot know that it should also show word-level diff between the removed previous and current <varname>msgid</varname> strings, so that reviewer can see what has changed <emphasis>in the original</emphasis> text (i.e. why had the message became fuzzy), and based on that judge whether the translation was properly adapted.</para>
0053
0054 <para>A dedicated PO editor may be able to show the truly proper, <emphasis>message-level</emphasis> difference.<footnote>
0055 <para>For example Lokalize, when operating in <ulink url="http://docs.kde.org/stable/en/kdesdk/lokalize/sync.html">merge mode</ulink>.</para>
0056 </footnote> Even then, however, it remains necessary to send around full PO files, and possibly to normalize them to a lesser extent before comparing. Additionally, the diff format becomes tied to the given PO editor, instead of being self-contained and processable by various tools (such as line-level diffs are).</para>
0057
0058 <para>This chapter therefore introduces the format and semantics for self-contained, message-level diffing of PO files -- the <emphasis>embedded diff</emphasis> -- and presents the Pology tools which implement it.</para>
0059
0060 <!-- ======================================== -->
0061 <sect1 id="sec-dpformat">
0062 <title>The Embedded Diff Format</title>
0063
0064 <para>Difference between two PO messages should primarily, though not exclusively, consist of differences between its string parts (<varname>msgid</varname>, <varname>msgstr</varname>, etc.) To be well observable, differences between strings should be as localized as possible -- think of a long paragraph in which only the spelling of a word or some punctuation was changed. Finally, the format of the complete PO message diff should be intuitively comprehensible to translators which are used to the PO format itself, and to some extent compatible with existing PO processing tools.</para>
0065
0066 <para>These considerations lead to making the diff of two PO messages be a PO message itself. In other words, the diff gets <emphasis>embedded</emphasis> into the regular parts of a PO message. An embedded diff (<emphasis>ediff</emphasis> for short) message should be at least syntactically valid, if not semantically (it should not cause a simple <command>msgfmt</command> run to fail, though <command>msgfmt --check</command> could). To be possible to exchange ediffs as patches for PO files, the embedding should be resolvable into the old and the new messages from which the diff was created.</para>
0067
0068 <para>In this way, if ediff messages are packed into a PO file (an <emphasis>ediff PO</emphasis>), existing PO tools can be used to review and modify the diff. For example, highlighting in a text editor will need only minimal upgrades to show the embedded differences (more on that below), and otherwise it will already highlight ediff message parts as usual.</para>
0069
0070 <para>To fully define the ediff format, the following questions should be answered:
0071 <itemizedlist>
0072 <listitem>
0073 <para>How to represent embedded differences in strings?</para>
0074 </listitem>
0075 <listitem>
0076 <para>Which parts of the PO message should be diffed?</para>
0077 </listitem>
0078 <listitem>
0079 <para>How to pair for diffing messages from two PO files?</para>
0080 </listitem>
0081 <listitem>
0082 <para>How to present collection of diffed messages?</para>
0083 </listitem>
0084 </itemizedlist>
0085 </para>
0086
0087 <sect2 id="sec-dpfrmembstr">
0088 <title>Embedding Differences into Strings</title>
0089
0090 <para>Once the word-level difference between the old and the new string has been computed, it should be somehow embedded it into the new string (or, equivalently, the old string). This can be done by wrapping removed and added text segments with <literal>{-...-}</literal> and <literal>{+...+}</literal>, respectively:
0091 <informaltable>
0092 <tgroup cols="2">
0093 <colspec colwidth="2cm"/>
0094 <colspec />
0095 <tbody>
0096 <row>
0097 <entry>old</entry>
0098 <entry><programlisting language="po">
0099 "The Record of The Witch River"
0100 </programlisting></entry>
0101 </row>
0102 <row>
0103 <entry>new</entry>
0104 <entry><programlisting language="po">
0105 "Records of The Witch River"
0106 </programlisting></entry>
0107 </row>
0108 <row>
0109 <entry>diff</entry>
0110 <entry><programlisting language="po">
0111 "{-The Record-}{+Records+} of The Witch River"
0112 </programlisting></entry>
0113 </row>
0114 </tbody>
0115 </tgroup>
0116 </informaltable>
0117 </para>
0118
0119 <para>It may happen that an opening or closing wrapper sequence occurs as a literal part of diffed strings<footnote>
0120 <para>Although this should be quite rare. In the collection of PO files from several translation projects, with over 2 million words in total, there was not a single occurence where one of the chosen wrapper sequences was part of the text.</para>
0121 </footnote>, so some method of escaping is necessary. This is done by inserting a <literal>~</literal> (tilde) in the middle of the literal sequence:
0122 <informaltable>
0123 <tgroup cols="2">
0124 <colspec colwidth="2cm"/>
0125 <colspec />
0126 <tbody>
0127 <row>
0128 <entry>old</entry>
0129 <entry><programlisting language="po">
0130 "Foo {+ bar"
0131 </programlisting></entry>
0132 </row>
0133 <row>
0134 <entry>new</entry>
0135 <entry><programlisting language="po">
0136 "Foo {+ qwyx"
0137 </programlisting></entry>
0138 </row>
0139 <row>
0140 <entry>diff</entry>
0141 <entry><programlisting language="po">
0142 "Foo {~+ {-bar-}{+qwyx+}"
0143 </programlisting></entry>
0144 </row>
0145 </tbody>
0146 </tgroup>
0147 </informaltable>
0148 If strings instead contain the literal sequence <literal>{~+</literal>, then another tilde is inserted, and so on. In this way, ediff can be unambiguously resolved to old and new versions of the string. Escaping by inserting tildes also makes it easier to write a syntax higlighting definition for an editor, as the wrapper pattern is automatically broken by the tilde.</para>
0149
0150 <para>It may happen that a given string is not merely empty in the old or new PO message, but that it does not exist at all (e.g. <literal>msgctxt</literal>). For this reason it is possible to make ediff between an existing and non-existing string as well, in which case a tilde is appended to the very end of the ediff:
0151 <informaltable>
0152 <tgroup cols="2">
0153 <colspec colwidth="2cm"/>
0154 <colspec />
0155 <tbody>
0156 <row>
0157 <entry>old</entry>
0158 <entry><programlisting language="po">
0159
0160 </programlisting></entry>
0161 </row>
0162 <row>
0163 <entry>new</entry>
0164 <entry><programlisting language="po">
0165 "a-context-note"
0166 </programlisting></entry>
0167 </row>
0168 <row>
0169 <entry>diff</entry>
0170 <entry><programlisting language="po">
0171 "{+a-context-note+}~"
0172 </programlisting></entry>
0173 </row>
0174 </tbody>
0175 </tgroup>
0176 </informaltable>
0177 Here too escaping is provided, by inserting further tildes if the ediff between two existing strings would result in a trailing tilde (if the old string is <literal>"~"</literal> and the new <literal>"foo~"</literal>, the ediff is <literal>"{+foo+}~~"</literal>).</para>
0178
0179 <para>It is not necessary to prescribe the exact algorithm for computing the difference between two strings. In fact, the diffing tool may allow translator to select between several diffing algorithms, depending on personal taste and situation. For example, the default algorithm of <link linkend="sec-dpdiff">Pology's <command>poediff</command></link> does the following: words are diffed as atomic sequences, all non-word segments (punctuation, markup tags, etc.) are diffed character by character, and equal non-word segments in between two different words (e.g. whitespace) are included into the difference segment. Hence the above ediff
0180 <programlisting language="po">
0181 "{-The Record-}{+Records+} of The Witch River"
0182 </programlisting>
0183 instead of the smaller
0184 <programlisting language="po">
0185 "{-The -}Record{+s+} of The Witch River"
0186 </programlisting>
0187 as the former is (tentatively) easier to comprehend.</para>
0188
0189 <para>Since every difference segment in the ediff message is represented in the described way, it is sufficient to upgrade the PO syntax highlighting of an editor<footnote>
0190 <para>At the moment, the following text and PO editors are known to have highlighting for ediffs: Kate, Kwrite, Lokalize.</para>
0191 </footnote> to indiscriminately highlight <literal>{-...-}</literal> and <literal>{+...+}</literal> segments everywhere in the message.</para>
0192
0193 </sect2>
0194
0195 <sect2 id="sec-dpfrmincparts">
0196 <title>Message Parts Included in Diffing</title>
0197
0198 <para>A PO message consists of several types of parts: strings, comments, flags, source references, etc. It would not be very constructive to diff all of them; for example, while <varname>msgstr</varname> strings should clearly be included into diffing, source references most probably should not. To avoid pondering over the advantages and disadvantages of including each and every message part, there already exists a well-defined splitting of message parts into two groups, one of which will be taken into diffing, and the other not. These two groups are:
0199 <itemizedlist>
0200 <listitem>
0201 <para id="p-msgparts-extinv"><emphasis>Extraction-invariant</emphasis> parts are those which do not depend on placement (or even presence) of the message in the source file. These are <varname>msgid</varname> string, <varname>msgstr</varname> strings, manual comments, etc.</para>
0202 </listitem>
0203 <listitem>
0204 <para id="p-msgparts-extpre"><emphasis>Extraction-prescribed</emphasis> parts are those which cannot exist independently of the source file from which the message is extracted, such as format flags or extracted comments.</para>
0205 </listitem>
0206 </itemizedlist>
0207 Only extraction-invariant parts will be diffed. The working definition of which parts belong to this group is provided by what remains in obsolete messages in PO files:
0208 <itemizedlist>
0209 <listitem>
0210 <para>current original text: <varname>msgctxt</varname>, <varname>msgid</varname>, and <varname>msgid_plural</varname> strings</para>
0211 </listitem>
0212 <listitem>
0213 <para>previous original text: <literal>#| msgctxt</literal>, <literal>#| msgid</literal>, and <literal>#| msgid_plural</literal> comments</para>
0214 </listitem>
0215 <listitem>
0216 <para>translation text: <varname>msgstr</varname> strings</para>
0217 </listitem>
0218 <listitem>
0219 <para>translator comments</para>
0220 </listitem>
0221 <listitem>
0222 <para>fuzzy state (whether the <literal>fuzzy</literal> flag is present)</para>
0223 </listitem>
0224 <listitem>
0225 <para>obsolete state (whether the message is obsolete)</para>
0226 </listitem>
0227 </itemizedlist>
0228 </para>
0229
0230 <para>Strings and translator comments are presented in the ediff message as embedded word-level differences, as described earlier. Changes in state, fuzzy and obsolete, are represented differently. A special "extracted" comment is added to the ediff message, starting with <literal>#. ediff:</literal> and listing any extra information needed to describe the ediff, including the state changes. Here is an example of two messages and the ediff they would produce<footnote>
0231 <para>Whether two messages such as these would get paired for diffing in the first place, will be discussed later on.</para>
0232 </footnote>:
0233 <informaltable>
0234 <tgroup cols="2">
0235 <colspec colwidth="2cm"/>
0236 <colspec />
0237 <tbody>
0238 <row>
0239 <entry>old</entry>
0240 <entry><programlisting language="po">
0241 #, fuzzy
0242 #~| msgid "Accurate subpolar weather cycles"
0243 #~ msgid "Accurate subpolar climate cycles"
0244 #~ msgstr "Tačni ciklusi subpolarnog vremena"
0245 </programlisting></entry>
0246 </row>
0247 <row>
0248 <entry>new</entry>
0249 <entry><programlisting language="po">
0250 #. ui: property (text), widget (QCheckBox, accCyclesTrop)
0251 #: config.ui:180
0252 #, fuzzy
0253 #| msgid "Accurate tropical weather cycles"
0254 msgctxt "some-superfluous-context"
0255 msgid "Accurate tropical climate cycles"
0256 msgstr "Tačni ciklusi tropskog vremena"
0257 </programlisting></entry>
0258 </row>
0259 <row>
0260 <entry>diff</entry>
0261 <entry><programlisting language="po">
0262 #. ediff: state {-obsolete-}
0263 #. ui: property (text), widget (QCheckBox, accCyclesTrop)
0264 #: config.ui:180
0265 #, fuzzy
0266 #| msgid "Accurate {-subpolar-}{+tropical+} weather cycles"
0267 msgctxt "{+some-superfluous-context+}~"
0268 msgid "Accurate {-subpolar-}{+tropical+} climate cycles"
0269 msgstr "Tačni ciklusi {-subpolarnog-}{+tropskog+} vremena"
0270 </programlisting></entry>
0271 </row>
0272 </tbody>
0273 </tgroup>
0274 </informaltable>
0275 </para>
0276
0277 <para>The first thing to note is that the ediff message contains not only the extraction-invariant parts, but also verbatim copies of extraction-prescribed parts from the new message. Effectively, the ediff is embedded into the copy of the new message. Extraction-prescribed parts are not simply discarded in order to provide more context when reviewing the diff. Here, for example, the extracted comment states that the text is a checkbox label, which may be important for the style of translation.</para>
0278
0279 <para>The other important element is the <literal>#. ediff:</literal> dummy extracted comment, which here indicates that the obsolete state has been "removed", i.e. the message was unobsoleted betwen then old and the new version of the PO file. Aside from state changes, few other indicators may be present in this comment, and they will be mentioned later on. The ediff comment is present only when necessary, if there are any indicators to show.</para>
0280
0281 <para>If diffing of two messages would always be conducted part for part, for all message parts which are taken into diffing, then in some cases the resulting ediff would not be very useful. Consider how the first example in this chapter, the line-level diff of a fuzzy and translated message, would look like as ediff if diffed part for part:
0282 <informaltable>
0283 <tgroup cols="2">
0284 <colspec colwidth="2cm"/>
0285 <colspec />
0286 <tbody>
0287 <row>
0288 <entry>old</entry>
0289 <entry><programlisting language="po">
0290 #: main.c:110
0291 #, fuzzy
0292 #| msgid "The Record of The Witch River"
0293 msgid "Records of The Witch River"
0294 msgstr "Beleška o Veštičjoj reci"
0295 </programlisting></entry>
0296 </row>
0297 <row>
0298 <entry>new</entry>
0299 <entry><programlisting language="po">
0300 #: main.c:110
0301 msgid "Records of The Witch River"
0302 msgstr "Beleške o Veštičjoj reci"
0303 </programlisting></entry>
0304 </row>
0305 <row>
0306 <entry>diff</entry>
0307 <entry><programlisting language="po">
0308 #. ediff: state {-fuzzy-}
0309 #: main.c:110
0310 #| msgid "{-The Record of The Witch River-}~"
0311 msgid "Records of The Witch River"
0312 msgstr "{-Beleška-}{+Beleške+} o Veštičjoj reci"
0313 </programlisting></entry>
0314 </row>
0315 </tbody>
0316 </tgroup>
0317 </informaltable>
0318 This ediff suffers from the same problem as the line-level diff: instead of showing the difference from previous to current <varname>msgid</varname> string, the current <varname>msgid</varname> is left untouched, while the previous <varname>msgid</varname> is simply shown to have been removed.</para>
0319
0320 <para>Therefore, instead of diffing directly part for part, a special transformation takes place when <emphasis>exactly one</emphasis> of the two diffed messages is fuzzy and contains previous original strings. This splits into two directions: from fuzzy to non-fuzzy, and from non-fuzzy to fuzzy.</para>
0321
0322 <para>Diffing from a fuzzy to a non-fuzzy message is the more usual of the two directions. It typically appears when the translation has been updated after merging with template. In this case, the old and the new message are shuffled prior to diffing in the following way (<literal>*-rest</literal> denotes all diffed parts that are neither original text nor fuzzy state):
0323 <informaltable>
0324 <tgroup cols="2">
0325 <colspec colwidth="2cm"/>
0326 <colspec />
0327 <tbody>
0328 <row>
0329 <entry>old</entry>
0330 <entry><programlisting>
0331 fuzzy                   -->     fuzzy
0332 old-previous-strings    -->     old-previous-strings
0333 old-current-strings     -->     old-previous-strings
0334 old-rest                -->     old-rest
0335 </programlisting></entry>
0336 </row>
0337 <row>
0338 <entry>new</entry>
0339 <entry><programlisting>
0340 -                       -->     -
0341 -                       -->     old-current-strings
0342 new-current-strings     -->     new-current-strings
0343 new-rest                -->     new-rest
0344 </programlisting></entry>
0345 </row>
0346 </tbody>
0347 </tgroup>
0348 </informaltable>
0349 When these shuffled messages are diffed, the resulting ediff message's current strings will show the important difference, that between the previous original text of the old (fuzzy) message and the current original text of the new (non-fuzzy) message. Ediff message's previous strings will show the less important difference between the old message's previous and current strings, but <emphasis>only</emphasis> if it is not the same as the difference between current strings. This may sound confusing, but the actual ediff produced in this way is quite intuitive:
0350 <informaltable id="t-ediff-f-to-nf">
0351 <tgroup cols="2">
0352 <colspec colwidth="2cm"/>
0353 <colspec />
0354 <tbody>
0355 <row>
0356 <entry>old</entry>
0357 <entry><programlisting language="po">
0358 #: main.c:110
0359 #, fuzzy
0360 #| msgid "The Record of The Witch River"
0361 msgid "Records of The Witch River"
0362 msgstr "Beleška o Veštičjoj reci"
0363 </programlisting></entry>
0364 </row>
0365 <row>
0366 <entry>new</entry>
0367 <entry><programlisting language="po">
0368 #: main.c:110
0369 msgid "Records of The Witch River"
0370 msgstr "Beleške o Veštičjoj reci"
0371 </programlisting></entry>
0372 </row>
0373 <row>
0374 <entry>diff</entry>
0375 <entry><programlisting language="po">
0376 #. ediff: state {-fuzzy-}
0377 #: main.c:110
0378 msgid "{-The Record-}{+Records+} of The Witch River"
0379 msgstr "{-Beleška-}{+Beleške+} o Veštičjoj reci"
0380 </programlisting></entry>
0381 </row>
0382 </tbody>
0383 </tgroup>
0384 </informaltable>
0385 From this the reviewer can see that the message was unfuzzied, the change in the original text that caused the message to become fuzzy, and what was changed in the translation to unfuzzy it. The old version of the text (in removed and equal segments) is that from the message <emphasis>before</emphasis> it got fuzzied, and the new version (in added and equal segments) is that from the message after it was unfuzzied.</para>
0386
0387 <para>The other special direction, from a non-fuzzy to a fuzzy message, should be less frequent. It appears, for example, when the diff is taken from the old, completely translated PO file, to the new PO file which has been merged with the latest template. In this case, the shuffling is as follows:
0388 <informaltable>
0389 <tgroup cols="2">
0390 <colspec colwidth="2cm"/>
0391 <colspec />
0392 <tbody>
0393 <row>
0394 <entry>old</entry>
0395 <entry><programlisting>
0396 -                       -->     -
0397 -                       -->     new-previous-strings
0398 old-current-strings     -->     old-current-strings
0399 old-rest                -->     old-rest
0400 </programlisting></entry>
0401 </row>
0402 <row>
0403 <entry>new</entry>
0404 <entry><programlisting>
0405 fuzzy                   -->     fuzzy
0406 new-previous-strings    -->     new-current-strings
0407 new-current-strings     -->     new-current-strings
0408 new-rest                -->     new-rest
0409 </programlisting></entry>
0410 </row>
0411 </tbody>
0412 </tgroup>
0413 </informaltable>
0414 The difference in ediff messages's current strings will again be the most important one, and in previous strings the less important one and shown only if not equal to the difference in current strings. Here is what this will result in when applied one step earlier, just after merging with template:
0415 <informaltable id="t-ediff-nf-to-f">
0416 <tgroup cols="2">
0417 <colspec colwidth="2cm"/>
0418 <colspec />
0419 <tbody>
0420 <row>
0421 <entry>old</entry>
0422 <entry><programlisting language="po">
0423 #: main.c:89
0424 msgid "The Record of The Witch River"
0425 msgstr "Beleška o Veštičjoj reci"
0426 </programlisting></entry>
0427 </row>
0428 <row>
0429 <entry>new</entry>
0430 <entry><programlisting language="po">
0431 #: main.c:110
0432 #, fuzzy
0433 #| msgid "The Record of The Witch River"
0434 msgid "Records of The Witch River"
0435 msgstr "Beleška o Veštičjoj reci"
0436 </programlisting></entry>
0437 </row>
0438 <row>
0439 <entry>diff</entry>
0440 <entry><programlisting language="po">
0441 #. ediff: state {+fuzzy+}
0442 #: main.c:110
0443 #, fuzzy
0444 msgid "{-The Record-}{+Records+} of The Witch River"
0445 msgstr "Beleška o Veštičjoj reci"
0446 </programlisting></entry>
0447 </row>
0448 </tbody>
0449 </tgroup>
0450 </informaltable>
0451 The reviewer can see that the message became fuzzy, and the change in the original text that caused that.</para>
0452
0453 <para>The diffing tool may add custom additional information at the end of any strings in the ediff message (<varname>msgid</varname>, <varname>msgstr</varname>, etc.), separated with a newline, a repeated block of one or more characters, and a newline. When this is done, the <literal>#. ediff:</literal> comment will have the <literal>infsep</literal> indicator, which states the character block used and the number of repetitions in the separator:
0454 <programlisting language="po">
0455 #. ediff: state {+fuzzy+}, infsep +- 20
0456 #: main.c:110
0457 #, fuzzy
0458 msgid "{-The Record-}{+Records+} of The Witch River"
0459 msgstr ""
0460 "Beleška o Veštičjoj reci\n"
0461 "+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-\n"
0462 "<replaceable>some-additional-information</replaceable>"
0463 </programlisting>
0464 Of course, the diffing tool should compute the appropriate separator such that it does not conflict with a part of the text in one of the strings. What could be this additional information? For example, it could be a filtered version of the text, to ease some special review type.</para>
0465
0466 </sect2>
0467
0468 <sect2 id="sec-dpfrmpairing">
0469 <title>Pairing Messages From Two PO Files</title>
0470
0471 <para>By now it was described how to make an embedded diff out of two messages, once it has been decided that those messages should be diffed. However, the translator is not expected to decide which messages to diff, but which PO files to diff. The diffing tools should then automatically <emphasis>pair</emphasis> for diffing the messages from the two PO files, and this section describes the several pairing criteria.</para>
0472
0473 <para>Most obviously, messages should be <emphasis>paired by key</emphasis>, which can be called primary pairing. The PO message key is the unique combination of <varname>msgctxt</varname> and <varname>msgid</varname> strings. In the most usual case -- reviewing an ediff from incomplete PO file with fuzzy and untranslated messages, to an updated PO file with those messages translated -- pairing by key will be fully sufficient, as both PO files will contain exactly the same set of messages. These two messages will be paired by key:
0474 <informaltable>
0475 <tgroup cols="2">
0476 <colspec colwidth="2cm"/>
0477 <colspec />
0478 <tbody>
0479 <row>
0480 <entry>old</entry>
0481 <entry><programlisting language="po">
0482 #: main.c:110
0483 #, fuzzy
0484 #| msgid "The Record of The Witch River"
0485 msgid "Records of The Witch River"
0486 msgstr "Beleška o Veštičjoj reci"
0487 </programlisting></entry>
0488 </row>
0489 <row>
0490 <entry>new</entry>
0491 <entry><programlisting language="po">
0492 #: main.c:110
0493 msgid "Records of The Witch River"
0494 msgstr "Beleške o Veštičjoj reci"
0495 </programlisting></entry>
0496 </row>
0497 </tbody>
0498 </tgroup>
0499 </informaltable>
0500 </para>
0501
0502 <para>But what should happen if some messages are left unpaired after pairing by key? Consider the earlier example where the diff was taken from the older fully translated to the newer merged PO file:
0503 <informaltable>
0504 <tgroup cols="2">
0505 <colspec colwidth="2cm"/>
0506 <colspec />
0507 <tbody>
0508 <row>
0509 <entry>old</entry>
0510 <entry><programlisting language="po">
0511 #: main.c:89
0512 msgid "The Record of The Witch River"
0513 msgstr "Beleška o Veštičjoj reci"
0514 </programlisting></entry>
0515 </row>
0516 <row>
0517 <entry>new</entry>
0518 <entry><programlisting language="po">
0519 #: main.c:110
0520 #, fuzzy
0521 #| msgid "The Record of The Witch River"
0522 msgid "Records of The Witch River"
0523 msgstr "Beleška o Veštičjoj reci"
0524 </programlisting></entry>
0525 </row>
0526 </tbody>
0527 </tgroup>
0528 </informaltable>
0529 The keys, here just current <varname>msgid</varname> strings, of the two messages do not match, so they cannot be paired by key. Yet it would be ungainly to represent the old message as fully removed, and the new message as fully added, in the resulting ediff:
0530 <informaltable>
0531 <tgroup cols="2">
0532 <colspec colwidth="2cm"/>
0533 <colspec />
0534 <tbody>
0535 <row>
0536 <entry>diff</entry>
0537 <entry><programlisting language="po">
0538 #: main.c:89
0539 msgid "{-The Record of The Witch River-}~"
0540 msgstr "{-Beleška o Veštičjoj reci-}~"
0541 ⁠
0542 #. ediff: state {+fuzzy+}
0543 #: main.c:110
0544 #, fuzzy
0545 #| msgid "{+The Record of The Witch River+}~"
0546 msgid "{+Records of The Witch River+}~"
0547 msgstr "{+Beleška o Veštičjoj reci+}~"
0548 </programlisting></entry>
0549 </row>
0550 </tbody>
0551 </tgroup>
0552 </informaltable>
0553 (That the message has been fully added or removed can be seen by trailing tilde in the <varname>msgid</varname> string, which indicates that the old or new <varname>msgid</varname> does not exist at all, and so neither the message with it.)</para>
0554
0555 <para id="p-pairing-pivoting">Instead, messages left unpaired by key should be tested for <emphasis>pairing by pivoting</emphasis> around previous strings (secondary pairing). The two messages above will thus be paired due to the fact that the current <varname>msgid</varname> of the old message is equal to the previous <varname>msgid</varname> of the new message, and will produce a single ediff message <link linkend="t-ediff-nf-to-f">as shown earlier</link>.</para>
0556
0557 <para id="p-pairing-merging">Finally, consider the third related combination, when the old PO file has not yet been merged with the template, while the new PO file has both been merged and its translation updated:
0558 <informaltable>
0559 <tgroup cols="2">
0560 <colspec colwidth="2cm"/>
0561 <colspec />
0562 <tbody>
0563 <row>
0564 <entry>old</entry>
0565 <entry><programlisting language="po">
0566 #: main.c:89
0567 msgid "The Record of The Witch River"
0568 msgstr "Beleška o Veštičjoj reci"
0569 </programlisting></entry>
0570 </row>
0571 <row>
0572 <entry>new</entry>
0573 <entry><programlisting language="po">
0574 #: main.c:110
0575 msgid "Records of The Witch River"
0576 msgstr "Beleške o Veštičjoj reci"
0577 </programlisting></entry>
0578 </row>
0579 </tbody>
0580 </tgroup>
0581 </informaltable>
0582 Once again it would be a waste to present the old message as fully removed and the new message as fully added in the resulting ediff. When a message is left unpaired after both pairing by key and pairing by pivoting, then the two PO files can be merged in the background -- as if the new is the template for the old, and vice versa -- and then tested for chained pairing by pivoting and by key with the merged PO file as intermediary. This <emphasis>pairing by merging</emphasis> (tertiary pairing) will then produce another natural ediff:
0583 <informaltable>
0584 <tgroup cols="2">
0585 <colspec colwidth="2cm"/>
0586 <colspec />
0587 <tbody>
0588 <row>
0589 <entry>diff</entry>
0590 <entry><programlisting language="po">
0591 #: main.c:110
0592 msgid "{-The Record-}{+Records+} of The Witch River"
0593 msgstr "{-Beleška-}{+Beleške+} o Veštičjoj reci"
0594 </programlisting></entry>
0595 </row>
0596 </tbody>
0597 </tgroup>
0598 </informaltable>
0599 </para>
0600
0601 <para>It can be left to the diffing tool to decide which pairing methods beyond the primary pairing, by key, to use. There should not be much reason not to perform secondary pairing, by pivoting, as well. If tertiary pairing, by merging, is done, the user should be allowed to disable it, as it can sometimes produce strange results (subject to the fuzzy matching algorithm).</para>
0602
0603 </sect2>
0604
0605 <sect2 id="sec-dpfrmcollect">
0606 <title>Collecting Diffed Messages</title>
0607
0608 <para>For the ediff of two PO files to also be a syntactically valid PO file, constructed ediff messages should be preceded by a PO header in output. At first glance, this PO header could be itself the ediff of headers of the PO files which were diffed. However, there are several issues with this approach:
0609 <itemizedlist>
0610
0611 <listitem>
0612 <para>The reviewer of the ediff PO file would not be informed at once if there was any difference between the headers. Headers tend to be long, and a small change in one of header fields may go visually unnoticed.</para>
0613 </listitem>
0614
0615 <listitem>
0616 <para>Depending on the amount of changes between the two headers, the resulting ediff message of the header could be too badly formed to represent the header as such. For example, if some header fields in <varname>msgstr</varname> were added or removed, embedded difference wrappers would invalidate the MIME-header format of <varname>msgstr</varname>, which could confuse PO processing tools.</para>
0617 </listitem>
0618
0619 <listitem>
0620 <para>How would the diff of two <emphasis>collections</emphasis> of PO files (e.g. directories) be packed into a single ediff PO? To pack diffs of several file pairs into one diff file is an expected feature of diffing tools.</para>
0621 </listitem>
0622
0623 </itemizedlist>
0624 </para>
0625
0626 <para>To avert these difficulties, the following is done instead. First, a minimal valid header is constructed for the ediff PO file, <emphasis>independently</emphasis> of the headers in diffed PO files. The precise content can be left to the diffing tool, with <link linkend="sec-dpdiff">Pology's <command>poediff</command></link> producing something like:
0627 <programlisting language="po">
0628 # +- ediff -+
0629 msgid ""
0630 msgstr ""
0631 "Project-Id-Version: ediff\n"
0632 "PO-Revision-Date: 2009-02-08 01:20+0100\n"
0633 "Last-Translator: J. Random Translator\n"
0634 "Language-Team: Differs\n"
0635 "MIME-Version: 1.0\n"
0636 "Content-Type: text/plain; charset=UTF-8\n"
0637 "Content-Transfer-Encoding: 8bit\n"
0638 "X-Ediff-Header-Context: ~\n"
0639 </programlisting>
0640 The <literal>PO-Revision-Date</literal> header field is naturally set to the date when the ediff was made. Values for the <literal>Last-Translator</literal> and <literal>Language-Team</literal> fields can be somehow pulled from the environment (<command>poediff</command> will fetch them from <link linkend="sec-cmcfguser">Pology user configuration</link>, or set some dummy values). Encoding of the ediff PO can be chosen at will, so long as all constructed ediff messages can be encoded with it (<command>poediff</command> will always use UTF-8). The purpose of the final, <literal>X-Ediff-Header-Context</literal> field will be explained shortly.</para>
0641
0642 <para>It is the first next entry in the ediff PO file that will actually be the ediff of headers of the two diffed PO files. Headers are diffed just like any other message, but the resulting ediff is given a few additional decorations:
0643 <programlisting language="po">
0644 # =========================================================
0645 # Translation of The Witch River into Serbian.
0646 # Koja Kojic &lt;koja.kojic@nedohodnik.net>, 2008.
0647 # {+Era Eric &lt;era.eric@ledopad.net>, 2008.+}~
0648 msgctxt "~"
0649 msgid ""
0650 "- l10n-wr/sr/wriver-main.po\n"
0651 "+ l10n-wr/sr-mod/wriver-main.po\n"
0652 msgstr ""
0653 "Project-Id-Version: wriver 0.1\n"
0654 "POT-Creation-Date: 2008-09-22 09:17+0200\n"
0655 "PO-Revision-Date: 2008-09-{-25 20:44-}{+28 21:49+}+0100\n"
0656 "Last-Translator: {-Koja Kojic &lt;koja.kojic@nedohodnik-}"
0657 "{+Era Eric &lt;era.eric@ledopad+}.net>\n"
0658 "Language-Team: Serbian\n"
0659 "MIME-Version: 1.0\n"
0660 "Content-Type: text/plain; charset=UTF-8\n"
0661 "Content-Transfer-Encoding: 8bit\n"
0662 </programlisting>
0663 Observe the usual ediff segments: translator comment with a new translator who updated the PO file has been added, and the <literal>PO-Revision-Date</literal> and <literal>Last-Translator</literal> header fields contain ediffs reflecting the update. These are the only actual differences between the two headers. More interesting are the additional decorations:</para>
0664 <itemizedlist>
0665
0666 <listitem>
0667 <para>The very first translator comment (here a long line of equality signs) can be anything, and serves as a strong visual indicator of the header ediff. This is especially convenient when the ediff PO file contains diffs of several pairs of PO files.</para>
0668 </listitem>
0669
0670 <listitem>
0671 <para>That this particular message is a header ediff, is indicated by the <varname>msgctxt</varname> string set to a special value, here a single tilde. This value is given up front by the <literal>X-Ediff-Header-Context</literal> of the ediff PO header. It should be computed during diffing such that it does not conflict with <varname>msgctxt</varname> of one of the message ediffs (e.g. it may simply be a sufficiently long sequence of tildes).</para>
0672 </listitem>
0673
0674 <listitem>
0675 <para>The <varname>msgid</varname> string of the header ediff contains newline-separated paths of the diffed PO files. More precisely, the two lines of the <varname>msgid</varname> string are in the form <literal>[+-] <replaceable>file-path</replaceable>[ &lt;&lt;&lt; <replaceable>comment</replaceable>]\n</literal>. The trailing newline of the second file path is elided if the <varname>msgstr</varname> string does not end in newline, to prevent <command>msgfmt</command> from complaining. The file path is followed by the optional, <literal>&lt;&lt;&lt;</literal>-separated comment. This comment can be used for any purpose, one which will be demonstrated in <command>poediff</command>.</para>
0676 </listitem>
0677
0678 </itemizedlist>
0679
0680 <para>Although when a PO file is properly updated there should always be some difference in the header, it may happen that there is none. In such case, the header ediff message is still added, but it contains only the additional decorations: the visual separator comment, the special <varname>msgctxt</varname>, and the <varname>msgid</varname> with file paths. All other comments and <varname>msgstr</varname> are empty; the empty <varname>msgstr</varname> immediatelly shows that there is no difference between the headers. This "empty" header ediff is needed to provide the file paths of diffed PO files, and, if several pairs of PO files were diffed, to separate their  diffs in the ediff PO file.</para>
0681
0682 <para>After the header ediff message, ordinary ediff messages follow. When all constructed ediff messages from the current pair of PO files are listed, the next pair starts with a new header ediff message, and so on.</para>
0683
0684 <para>Especially when diffing several pairs of PO files, it may happen that two ediff messages have same keys (<varname>msgid</varname> and <varname>msgctxt</varname> strings) and thus cannot be both added as such to the ediff PO file. When that happens, the ediff message which was added after the first with the same key, will have its <varname>msgctxt</varname> string <emphasis>padded</emphasis> by few random alphanumerics, to make its key unique. This padding sequence will be recorded in the <literal>#. ediff:</literal> comment, as <literal>ctxtpad</literal> field. For example:
0685 <programlisting language="po">
0686 # =========================================================
0687 msgctxt "~"
0688 msgid "...(first PO header ediff)..."
0689 msgstr "..."
0690 ⁠
0691 #. ediff: state {-fuzzy-}
0692 msgid "White{+ horizon+}"
0693 msgstr "Belo{+ obzorje+}"
0694 ⁠
0695 # =========================================================
0696 msgctxt "~"
0697 msgid "...(second PO header ediff)..."
0698 msgstr "..."
0699 ⁠
0700 #. ediff: state {-fuzzy-}, ctxtpad q9ac3
0701 msgctxt "|q9ac3~"
0702 msgid "White{+ horizon+}"
0703 msgstr "Belo{+ obzorje+}"
0704 </programlisting>
0705 The padding sequence is appended to the original <varname>msgctxt</varname>, separated by <literal>|</literal>. If there was no original <varname>msgctxt</varname>, the padding sequence is further extended by a tilde.</para>
0706
0707 </sect2>
0708
0709 </sect1>
0710
0711 <!-- ======================================== -->
0712 <sect1 id="sec-dpdiff">
0713 <title>Producing Ediffs with <command>poediff</command></title>
0714
0715 <para>The <command>poediff</command> script in Pology implements embedded diffing of PO files as defined in the previous section. To diff two PO files, running the usual:
0716 <programlisting language="bash">
0717 $ poediff orig/foo.po mod/foo.po
0718 </programlisting>
0719 will write out the ediff PO content to standard output, with some basic shell coloring of difference segments. The ediff can be written into a file (an ediff PO file) either with shell redirection, or the <option>-o</option>/<option>--output</option>. It is equally simple to diff directories:
0720 <programlisting language="bash">
0721 $ poediff orig/ mod/
0722 </programlisting>
0723 By default, given directories are recursively searched for PO files, and the PO files present in only one of the directories will also be included in the ediff.</para>
0724
0725 <sect2 id="sec-dpdiffvcs">
0726 <title>Diffing With Underlying VCS</title>
0727
0728 <para>When PO files are handled by a version control system (VCS), <command>poediff</command> can be put into VCS mode using the <option>-c/--vcs <replaceable>VCS</replaceable></option> option, where the value is the keyword of one of the <link linkend="sec-cmsuppvcs">version control systems supported by Pology</link>. In VCS mode, instead of giving two paths to diff, any number of version-controlled paths (files or directories) are given. Without other options, all locally modified PO files in these paths are diffed against the last commit known to local repository. For example, if a program is using a Subversion repository, then the PO files in its <filename>po/</filename> directory can be diffed with:
0729 <programlisting language="bash">
0730 $ poediff -c svn prog/po/
0731 </programlisting>
0732 </para>
0733
0734 <para>Specific revisions to diff can be given by the <option>-r/--revision <replaceable>REV1</replaceable>[:<replaceable>REV2</replaceable>]</option>. <literal><replaceable>REV1</replaceable></literal> and <literal><replaceable>REV2</replaceable></literal> are not necessarily direct revision IDs, but any strings that the underlying VCS can convert into revision IDs. If <literal><replaceable>REV2</replaceable></literal> is omitted, diffing is preformed from <literal><replaceable>REV1</replaceable></literal> to current working copy.</para>
0735
0736 <para>When ediff is made in VCS mode, <varname>msgid</varname> strings in header ediffs will state revision IDs, in &lt;&lt;&lt;-separated comments next to file paths:
0737 <programlisting language="po">
0738 # =========================================================
0739 # ...
0740 msgctxt "~"
0741 msgid ""
0742 "- prog/po/sr.po &lt;&lt;&lt; 20537\n"
0743 "+ prog/po/sr.po"
0744 msgstr "..."
0745 </programlisting>
0746 </para>
0747
0748 </sect2>
0749
0750 <sect2 id="sec-dpdiffopt">
0751 <title>Command Line Options</title>
0752
0753 <para>
0754 Options specific to <command>poediff</command>:
0755 <variablelist>
0756
0757 <varlistentry>
0758 <term><option>-b</option>, <option>--skip-obsolete</option></term>
0759 <listitem>
0760 <para>By default, <link linkend="sec-poobsol">obsolete messages</link> are treated equally to non-obsolete, and can feature in the ediff output. This makes it possible to detect when a message has become obsolete, or has returned from obsolescence, and show this in the ediff. But sometimes including obsolete messages into diffing may not desired, and then this option can be issued to ignore them.</para>
0761 </listitem>
0762 </varlistentry>
0763
0764 <varlistentry>
0765 <term><option>-c <replaceable>VCS</replaceable></option>, <option>--vcs=<replaceable>VCS</replaceable></option></term>
0766 <listitem>
0767 <para>The keyword of the underlying version control system, to switch <command>poediff</command> into <link linkend="sec-dpdiffvcs">VCS mode</link>. See <xref linkend="sec-cmsuppvcs"/> for the list of supported version control systems (or issue <option>--list-vcs</option> option).</para>
0768 </listitem>
0769 </varlistentry>
0770
0771 <varlistentry>
0772 <term><option>--list-options</option>, <option>--list-vcs</option></term>
0773 <listitem>
0774 <para>Simple listings of options and VCS keywords. Intended mainly for writting shell completion definitions.</para>
0775 </listitem>
0776 </varlistentry>
0777
0778 <varlistentry>
0779 <term><option>-n</option>, <option>--no-merge</option></term>
0780 <listitem>
0781 <para>Disable pairing of messages by <link linkend="p-pairing-merging">by internal merging</link> of diffed PO files. Merging is performed only if there were some messages left unpaired after pairing by key and <link linkend="p-pairing-pivoting">by pivoting</link>, so in the usual circumstances it is not done anyway. But when it is done, it may produce strange results, so this option can be used to prevent it.</para>
0782 </listitem>
0783 </varlistentry>
0784
0785 <varlistentry>
0786 <term><option>-o <replaceable>FILE</replaceable></option>, <option>--output=<replaceable>FILE</replaceable></option></term>
0787 <listitem>
0788 <para>The ediff is by default written to the standard output, and this option can be used to send it to a file instead.</para>
0789 </listitem>
0790 </varlistentry>
0791
0792 <varlistentry>
0793 <term><option>-p</option>, <option>--paired-only</option></term>
0794 <listitem>
0795 <para>When directories are diffed, by default the PO files present in only one of them will be included into the ediff, i.e. all their messages will be shown as added or removed. This option will limit diffing only to files present in both directories, in the sense of having the same relative paths (rather than e.g. same PO domain name).</para>
0796 </listitem>
0797 </varlistentry>
0798
0799 <varlistentry>
0800 <term><option>-Q</option>, <option>--quick</option></term>
0801 <listitem>
0802 <para>Produced maximally stripped-down output, sometimes useful for quick visual observation of changes, but which cannot be used as patch. Equivalent to <option>-bns</option>.</para>
0803 </listitem>
0804 </varlistentry>
0805
0806 <varlistentry>
0807 <term><option>-r <replaceable>REV1</replaceable>[:<replaceable>REV2</replaceable>]</option>, <option>--revision=<replaceable>REV1</replaceable>[:<replaceable>REV2</replaceable>]</option></term>
0808 <listitem>
0809 <para>When operating in <link linkend="sec-dpdiffvcs">VCS mode</link>, the default is to make the diff from the last commit to the current working copy. This option can be used to diff between any two revisions. If the second revision is omitted, the diff is taken from first revision to current working copy.</para>
0810 </listitem>
0811 </varlistentry>
0812
0813 <varlistentry>
0814 <term><option>-s</option>, <option>--strip-headers</option></term>
0815 <listitem>
0816 <para>Prevents diffing of PO headers, as well as inclusion of top ediff header in the output. This reduces clutter when the intention is to see only changes in messages through many PO files, but the resulting ediff cannot be used as patch.</para>
0817 </listitem>
0818 </varlistentry>
0819
0820 <varlistentry>
0821 <term><option>-U</option>, <option>--update-effort</option></term>
0822 <listitem>
0823 <para>Instead of outputing the diff, the <emphasis>translation update effort</emphasis> is computed. It is expressed as the nominal number of newly translated words, from old to new paths. The procedure to compute this quantity is not straightforward, but the intention is that it roughly approximate the number of words (in original text) as if messages were translated from scratch. Options <option>-b</option> and <option>-n</option> are ignored.</para>
0824 </listitem>
0825 </varlistentry>
0826
0827 </variablelist>
0828 </para>
0829
0830 <para>
0831 Options common with other Pology tools:
0832 <variablelist>
0833
0834 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
0835             href="stdopt-colors.docbook"/>
0836
0837 </variablelist>
0838 </para>
0839
0840 </sect2>
0841
0842 <sect2 id="sec-dpdiffcfg">
0843 <title>User Configuration</title>
0844
0845 <para><command>poediff</command> will consult <link linkend="sec-cmcfguser">the <literal>[user]</literal> section</link> in <link linkend="sec-cmconfig">user configuration</link> to fill out some of the header of the ediff PO file. It also consults its own section, with the following fields avaialbe:
0846 <variablelist>
0847
0848 <varlistentry>
0849 <term><literal>[poediff]/merge=[*yes|no]</literal></term>
0850 <listitem>
0851 <para>Setting to <literal>no</literal> is counterpart to <option>--no-merge</option> command line option, i.e. this field can be used to permanently disable message pairing <link linkend="p-pairing-merging">by merging</link>.</para>
0852 </listitem>
0853 </varlistentry>
0854
0855 </variablelist>
0856 </para>
0857
0858 </sect2>
0859
0860 </sect1>
0861
0862 <!-- ======================================== -->
0863 <sect1 id="sec-dppatch">
0864 <title>Applying Ediffs as Patches with <command>poepatch</command></title>
0865
0866 <para>Basic application of an ediff patch is much easier than that of a line-level patch, because there will be no conflicts if messages have different wrapping, ordering, or <link linkend="p-msgparts-extpre">extraction-prescribed</link> parts (source references, etc.). The patch is applied by resolving each ediff message from it into the originating old and new message, and if either the old or the new message exists (by key) in the target PO file and has equal <link linkend="p-msgparts-extinv">extraction-invariant</link> parts, then the message modification is applied, and otherwise rejected.</para>
0867
0868 <para>Applying the modification to the target message means overwriting its extraction-invariant parts with those from the new message from the ediff, and leaving other parts untouched. If the target message is already equal to the new message by extraction-invariant parts, then the patch is silently ignored. This means that if the same patch is applied twice to the target PO file, the second application makes no modifications. Likewise if, by chance, the modifications given by the patch were already independently performed by another translator (e.g. a few simple updates to unfuzzy messages).</para>
0869
0870 <para>Command-line interface of Pology's <command>poepatch</command> is much like that of <command>patch(1)</command>, sans the myriad of its more obscure options. There is the <option>-p</option> option to strip leading elements of file paths in the ediff, and <option>-d</option> option to append to them a directory path where target PO files are to be looked up. If the ediff was produced in <link linkend="sec-dpdiffvcs">VCS mode</link>, then it can be applied as patch in any of the following ways:
0871 <programlisting language="bash">
0872 $ cd repos/prog/po &amp;&amp; poepatch &lt;ediff.po
0873 $ cd repos/ &amp;&amp; poepatch -p0 &lt;ediff.po
0874 $ poepatch -d repos/app/po &lt;ediff.po
0875 </programlisting>
0876 </para>
0877
0878 <para>Header modifications (coming from the header ediff message) are applied in a slightly relaxed fashion: some of the standard header fields are ignored when checking whether the patch is applicable. These are the fields which are known to be volatile as the PO file goes through different translators, and do not influence the processing of the PO file (e.g. such as encoding or plural forms). The ignored fields are: <literal>POT-Creation-Date</literal>, <literal>PO-Revision-Date</literal>, <literal>Last-Translator</literal>, <literal>X-Generator</literal>. When the header modification is accepted, the ignored fields in the target header are overwritten with those from the patch (including being added or removed).</para>
0879
0880 <sect2 id="sec-dppatchrej">
0881 <title>Handling Rejected Ediffs</title>
0882
0883 <para>All ediff messages which were rejected as patches will be written out to <filename>stdin.rej.po</filename> in the current working directory if the patch was read from standard input, or to <filename><replaceable>FILE</replaceable>.rej.po</filename> if the patch file was given by <option>-i <replaceable>FILE</replaceable>.po</option> option.</para>
0884
0885 <para>The file with rejected ediff messages will again be an ediff PO file. It will have the header as before, except that its comment will mention that the file contains rejects of a patching operation. Afterwards, rejected ediff messages rejected will follow. Every header ediff message will be present whether rejected or not, for the same purpose of separation and provision of file paths, but if it was not rejected as patch itself, it will be stripped of comments and <varname>msgstr</varname> string.</para>
0886
0887 <para>Furthermore, to every straigh-out rejected ediff message an <literal>ediff-no-match</literal> flag will be added. This is done, naturally, because some ediff messages may not be rejected straight-out. Consider the following scenario. A PO file has been merged to produce the fuzzy message:
0888 <informaltable>
0889 <tgroup cols="2">
0890 <colspec colwidth="2cm"/>
0891 <colspec />
0892 <tbody>
0893 <row>
0894 <entry>old</entry>
0895 <entry><programlisting language="po">
0896 #: tools/power.c:348
0897 msgid "Active sonar low frequency"
0898 msgstr "Niska frekvencija aktivnog sonara"
0899 </programlisting></entry>
0900 </row>
0901 <row>
0902 <entry>new</entry>
0903 <entry><programlisting language="po">
0904 #: tools/power.c:361
0905 #, fuzzy
0906 #| msgid "Active sonar low frequency"
0907 msgid "Active sonar high frequency"
0908 msgstr "Niska frekvencija aktivnog sonara"
0909 </programlisting></entry>
0910 </row>
0911 </tbody>
0912 </tgroup>
0913 </informaltable>
0914 The translator updates the PO file, which produces the usual ediff message when going from fuzzy to translated:
0915 <informaltable>
0916 <tgroup cols="2">
0917 <colspec colwidth="2cm"/>
0918 <colspec />
0919 <tbody>
0920 <row>
0921 <entry>diff</entry>
0922 <entry><programlisting language="po">
0923 #. ediff: state {-fuzzy-}
0924 #: tools/power.c:361
0925 msgid "Active sonar {-low-}{+high+} frequency"
0926 msgstr "{-Niska-}{+Visoka+} frekvencija aktivnog sonara"
0927 </programlisting></entry>
0928 </row>
0929 </tbody>
0930 </tgroup>
0931 </informaltable>
0932 However, <emphasis>before</emphasis> this patch could have been applied, the programmer adds a trailing colon to the same message, and the catalog is merged again to produce:
0933 <informaltable>
0934 <tgroup cols="2">
0935 <colspec colwidth="2cm"/>
0936 <colspec />
0937 <tbody>
0938 <row>
0939 <entry>new-2</entry>
0940 <entry><programlisting language="po">
0941 #: tools/power.c:361
0942 #, fuzzy
0943 #| msgid "Active sonar low frequency"
0944 msgid "Active sonar high frequency:"
0945 msgstr "Niska frekvencija aktivnog sonara"
0946 </programlisting></entry>
0947 </row>
0948 </tbody>
0949 </tgroup>
0950 </informaltable>
0951 The patch cannot be cleanly applied at this point, due to the extra colon added in the meantime to the <varname>msgid</varname>, so it has to be rejected. If nothing else is done, it would appear in the file of rejects as:
0952 <programlisting language="po">
0953 #. ediff: state {-fuzzy-}
0954 #: tools/power.c:361
0955 #, ediff-no-match
0956 msgid "Active sonar {-low-}{+high+} frequency"
0957 msgstr "{-Niska-}{+Visoka+} frekvencija aktivnog sonara"
0958 </programlisting>
0959 </para>
0960
0961 <para id="p-split-rejects">It is wastefull to reject such a near-matching patch without any indication that it could be easily adapted to the latest message in the target PO file. Therefore, when an ediff message is rejected, the following analysis is performed: by trying out message pairings as on diffing, could the old message from the patch be paired with a current message from the target PO, and that current message with the new message from the patch? Or, in other words, can an existing message in the target PO be "fitted in between" the old and new messages defined by the patch? If this is the case, instead of the original, two special ediff messages -- <emphasis>split rejects</emphasis> -- are constructed and written out: one from the old to the current message, and another from the current to the new message. They are flagged as <literal>ediff-to-cur</literal> and <literal>ediff-to-new</literal>, respectively:
0962 <programlisting language="po">
0963 #: tools/power.c:361
0964 #, fuzzy, ediff-to-cur
0965 #| msgid "Active sonar low frequency"
0966 msgid "Active sonar high frequency{+:+}"
0967 msgstr "Niska frekvencija aktivnog sonara"
0968 ⁠
0969 #. ediff: state {-fuzzy-}
0970 #: tools/power.c:361
0971 #, ediff-to-new
0972 #| msgid "Active sonar {-low-}{+high+} frequency{+:+}"
0973 msgid "Active sonar {-low-}{+high+} frequency"
0974 msgstr "{-Niska-}{+Visoka+} frekvencija aktivnog sonara"
0975 </programlisting>
0976 </para>
0977
0978 <para>There are more ways to interpret split rejects, depending on the circumstances. In this example, from the <literal>ediff-to-cur</literal> message the reviewer can see what had changed in the target message after the translator made the ediff. This can also be seen by comparing difference embedded into previous and current <varname>msgid</varname> strings in the <literal>ediff-to-new</literal> message. With a bit of editing, the reviewer can fold these two messages into an applicable patch:
0979 <programlisting language="po">
0980 #. ediff: state {-fuzzy-}
0981 #: tools/power.c:361
0982 #, ediff
0983 msgid "Active sonar {-low-}{+high+} frequency:"
0984 msgstr "{-Niska-}{+Visoka+} frekvencija aktivnog sonara:"
0985 </programlisting>
0986 Since the file of rejects is also an ediff PO, after edits such as this to make some patches applicable, it can be <emphasis>reapplied</emphasis> as patch. When that is done, <command>poepatch</command> will silently ignore all ediff messages having <literal>ediff-no-match</literal> or <literal>ediff-to-new</literal> flags, as these have already been determined inapplicable. That is why in this example the reviewer has replaced the <literal>ediff-to-new</literal> flag with the plain <literal>ediff</literal> in the folded ediff message.</para>
0987
0988 </sect2>
0989
0990 <sect2 id="sec-dppatchemb">
0991 <title>Embedding Patches</title>
0992
0993 <para>Depending on the kind of text which is being translated, and distance between the source and target language grammar, ortography, and style, it may be difficult to review the ediff in isolation. In general, messages in ediff PO file will lack <emphasis>positional context</emphasis>, which is in the full PO provided by messages immediately preceding and following the observed message. For example, a long passage from documentation probably needs no positional context. But a short, newly added message such as "Crimson" could very well need one, if it has neither <varname>msgctxt</varname> nor an extracted comment describing it: is it really a color? what grammatical ending should it have (in a language which matches adjective to noun gender)? Several messages around it in the full PO file could easily show whether it is just another color in a row, and their grammatical endings (determined by a translator earlier).</para>
0994
0995 <para>Another difficulty is when an ediff message needs some editing before being applied. This may not be easy to do this directly in the ediff PO file. Everything is fine so long as only the added text segments (<literal>{+...+}</literal>) are edited, but if the sentence needs to be restructured more thoroughly, the reviewer would have to make sure to put all additions into existing or new <literal>{+...+}</literal> segments, and to wrap all removals as <literal>{-...-}</literal> segments. If this is not carefully performed, the patch will not be applicable any more, as the old message resolved from it will no longer exactly match a message in the target PO file.</para>
0996
0997 <para>For these reasons, <command>poepatch</command> can apply the patch such as not to resolve the ediff, but to set all its extraction-invariant fields to the message in the target PO file. In effect, the target PO file becomes an ediff PO by itself, but only in the messages which were actually patched. To mark these messages for lookup, the usual <literal>ediff</literal> flag is added to them. For example, if the message in the patch file was:
0998 <programlisting language="po">
0999 #: title.c:274
1000 msgid "Tutorial"
1001 msgstr "{-Tutorijal-}{+Podučavanje+}"
1002 </programlisting>
1003 then when the patch is successfully applied with embedding, the patched message in target PO file will look like this, among other messages:
1004 <programlisting language="po">
1005 #: main.c:110
1006 msgid "Records of The Witch River"
1007 msgstr "Beleške o Veštičjoj reci"
1008 ⁠
1009 #: title.c:292
1010 #, ediff
1011 msgid "Tutorial"
1012 msgstr "{-Tutorijal-}{+Podučavanje+}"
1013 ⁠
1014 #: title.c:328
1015 msgid "Start the Expedition"
1016 msgstr "Pođi u ekspediciju"
1017 </programlisting>
1018 Other than the addition of the <literal>ediff</literal> flag, note that the patched message kept its own source reference, rather than being overwritten by that from the patch. Same holds for all extraction-prescribed parts.</para>
1019
1020 <para>The reviewer can now jump over <literal>ediff</literal> flags, always having the full positional context for each patched message, and being able to edit it to heart's content, with only minimal care not to invalidate the ediff format. Wrapped difference segments can be entirely removed, non-wrapped segments can be freely edited; it should only not happen that a wrapped segment looses its opening or closing sequence. But this does not mean that the reviewer has to remove difference segments, that is, to manually unembed patched messages. <command>poepatch</command> can do this automatically, when run on the embedded-patched PO file with the <option>-u</option>/<option>--unembed</option> option.</para>
1021
1022 <para>A patch is applied with embedding by issuing the <option>-e</option>/<option>--embed</option> option:
1023 <programlisting language="bash">
1024 $ poepatch -e &lt;ediff.po
1025 patched (E): foo.po
1026 </programlisting>
1027 where <literal>(E)</literal> in the output indicates that the embedding is engaged. After the patched PO file had been reviewed and patched messages possibly edited, all remaining embedded differences are removed, i.e. resolved to new versions, by running:
1028 <programlisting language="bash">
1029 $ poepatch -u foo.po
1030 </programlisting>
1031 More precisely, only those messages having the <literal>ediff</literal> flag are resolved, therefore the reviewer <emphasis>must not</emphasis> remove them (unless manually unembedding the whole message).</para>
1032
1033 <para>What happens with rejected patches when embedding is engaged? They are also added into the target PO file, with heuristic positioning, and no separate file with rejects is created. Same as on plain patching, straight-out rejects will have the <literal>ediff-no-match</literal> flag, and split rejects <literal>ediff-to-cur</literal> or <literal>ediff-to-new</literal>. If these are not manually resolved during the review (<literal>ediff-no-match</literal> messages removed, <literal>ediff-to-*</literal> messages removed or folded), when <command>poepatch</command> is run to unembed the differences, it will remove all <literal>ediff-no-match</literal> and <literal>ediff-to-new</literal> messages, and resolve <literal>ediff-to-cur</literal> messages to current version.</para>
1034
1035 </sect2>
1036
1037 <sect2 id="sec-dppatchopt">
1038 <title>Command Line Options</title>
1039
1040 <para>
1041 Options specific to <command>poepatch</command>:
1042 <variablelist>
1043
1044 <varlistentry>
1045 <term><option>-a</option>, <option>--aggressive</option></term>
1046 <listitem>
1047 <para>After the messages from the patch and the target PO file have been paired, normally only those differences that have no conflicts (e.g. in translation) will be applied. This option can be issued to instead unconditionally overwrite all <link linkend="p-msgparts-extinv">extraction-invariant</link> parts of the message in the target PO file with those defined by the paired patch.</para>
1048 </listitem>
1049 </varlistentry>
1050
1051 <varlistentry>
1052 <term><option>-d</option>, <option>--directory</option></term>
1053 <listitem>
1054 <para>The directory path to prepend to file paths read from the patch file, when trying to match the files on disk to patch.</para>
1055 </listitem>
1056 </varlistentry>
1057
1058 <varlistentry>
1059 <term><option>-e</option>, <option>--embed</option></term>
1060 <listitem>
1061 <para>Apply patch <link linkend="sec-dppatchemb">with embedding</link>.</para>
1062 </listitem>
1063 </varlistentry>
1064
1065 <varlistentry>
1066 <term><option>-i <replaceable>FILE</replaceable></option>, <option>--input=<replaceable>FILE</replaceable></option></term>
1067 <listitem>
1068 <para>Read the patch from the given file instead from standard input.</para>
1069 </listitem>
1070 </varlistentry>
1071
1072 <varlistentry>
1073 <term><option>-n</option>, <option>--no-merge</option></term>
1074 <listitem>
1075 <para>When <link linkend="p-split-rejects">split rejects</link> are computed, all <link linkend="sec-dpfrmpairing">methods for pairing messages</link> like on diffing are used. Pairing by merging can sometimes lead to same strange results as on diffing, and this option disables it.</para>
1076 </listitem>
1077 </varlistentry>
1078
1079 <varlistentry>
1080 <term><option>-p <replaceable>NUM</replaceable></option>, <option>--strip=<replaceable>NUM</replaceable></option></term>
1081 <listitem>
1082 <para>Strips the smallest prefix containing the given number of slashes from file paths read from the patch file, when trying to match the files on disk to patch. If this option is not given, only the base name of each read file path is taken as relative path to match on disk. (This is the same behavior as in <command>patch(1)</command>.)</para>
1083 </listitem>
1084 </varlistentry>
1085
1086 <varlistentry>
1087 <term><option>-u</option>, <option>--unembed</option></term>
1088 <listitem>
1089 <para>Clears all embedded differences in input PO files, after they have been patched <link linkend="sec-dppatchemb">with embedding</link>.</para>
1090 </listitem>
1091 </varlistentry>
1092
1093 </variablelist>
1094 </para>
1095
1096 </sect2>
1097
1098 <sect2 id="sec-dppatchcfg">
1099 <title>User Configuration</title>
1100
1101 <para><command>poepatch</command> consults the following <link linkend="sec-cmconfig">user configuration</link> fields:
1102 <variablelist>
1103
1104 <varlistentry>
1105 <term><literal>[poepatch]/merge=[*yes|no]</literal></term>
1106 <listitem>
1107 <para>Setting to <literal>no</literal> is counterpart to <option>--no-merge</option> command line option, i.e. this field can be used to permanently disable <link linkend="p-pairing-merging">pairing by mergingM</link> when computing split rejects.</para>
1108 </listitem>
1109 </varlistentry>
1110
1111 </variablelist>
1112 </para>
1113
1114 </sect2>
1115
1116 </sect1>
1117
1118 </chapter>
1119