<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
[
 <!ENTITY apibase "../../api/en_US">
 <!ENTITY ap "&apibase;/pology.">
 <!ENTITY amm "-module.html">
 <!ENTITY am "&amm;#">
 <!ENTITY acc "-class.html">
 <!ENTITY ac "&acc;#">
]>

<chapter id="ch-prog">
<title>Programming with Pology</title>

<para>You may find it odd that the user manual contains a section on programming, as that is normally a matter for a separate, programmer-oriented document. On the other hand, while reading the "pure user" sections of this manual, you may have noticed that in Pology the distinction between a user and a programmer is more blurry than one would expect of a translation-related tool. Indeed, before getting into writing standalone Python programs which use the Pology library, there are many places in Pology itself where you can plug in some Python code to adapt the behavior to your language and translation environment. This section exists to support and stimulate such interaction with Pology.</para>

<para>The Pology library is quite simple conceptually and organizationally. It consists of a small core abstraction of the PO format, and a lot of mutually unrelated functionality that may come in handy in particular translation processing scenarios. Everything is covered by <ulink url="&apibase;">the Pology API documentation</ulink>, but since API documentation tends to be non-linear and full of details obstructing the bigger picture, the following subsections are there to provide synthesis and rationale of salient points.</para>

<!-- ======================================== -->
<sect1 id="sec-prfile">
<title>PO Format Abstraction</title>

<para>The PO format abstraction in Pology is a quite direct and fine-grained reflection of PO format elements and conventions.
This was a design goal from the start; no attempt was made at a more general abstraction, which would tentatively support various translation file formats.</para>

<para>There is, however, one glaring but intentional omission: multi-domain PO files (those which contain <literal>domain "..."</literal> directives) are not supported. We have never observed a multi-domain PO file in the wild, nor thought of a significant advantage it could have today over multiple single-domain PO files. Supporting multi-domain PO files would mean not only always needing two nested loops to iterate through messages in a PO file, but it would also interfere with higher levels in Pology which assume equivalence between PO files and domains. Pology will simply report an error when trying to read a multi-domain PO file.</para>

<sect2 id="sec-prflmon">
<title>Monitored Objects</title>

<para>Because the PO abstraction is intended to be robust against programming errors when quickly writing custom scripts, and frugal on file modifications, by default some of the abstracted objects are "monitored". This means that they are checked for expected data types and have modification counters. The main monitored objects are PO files, PO headers, and PO messages, but also their attributes which are not plain data types (strings or numbers). For the moment, these secondary monitored types include <ulink url="&ap;monitored.Monlist&acc;"><classname>Monlist</classname></ulink> (the monitored counterpart to built-in <type>list</type>), <ulink url="&ap;monitored.Monset&acc;"><classname>Monset</classname></ulink> (counterpart to <type>set</type>), and <ulink url="&ap;monitored.Monpair&acc;"><classname>Monpair</classname></ulink> (like two-element <type>tuple</type>).
Monitored types do not in general provide the full scope of functionality of their built-in counterparts, so sometimes it may be easier (and faster) to work with built-in types and convert them to monitored at the moment of adding to PO objects.</para>

<para>To take a <classname>Monlist</classname> instance as an example, here is how it behaves on its own:
<programlisting language="python">
>>> from pology.monitored import Monlist
>>> l = Monlist(["a", "b", "c"])
>>> l.modcount
0
>>> l.append(10)
>>> l
Monlist(["a", "b", "c", 10])
>>> l.modcount
1
>>>
</programlisting>
Appending an element has caused the modification counter to increase, but, as expected, it was possible to add an integer in spite of previous elements being strings. However, if the monitored list comes from a PO message:
<programlisting language="python">
>>> from pology.message import Message
>>> msg = Message()
>>> msg.msgstr
Monlist([])
>>> msg.msgstr.append(10)
Traceback (most recent call last):
  ...
pology.PologyError: Expected &lt;type 'unicode'&gt; for sequence element, got &lt;type 'int'&gt;.
>>> msg.msgstr.append("bar")
>>> msg.msgstr.modcount
1
>>> msg.modcount
1
</programlisting>
The <classname>Message</classname> class has type constraints added to its attributes, and therefore addition of an integer to the <varname>.msgstr</varname> list was rejected: only <type>str</type> values are allowed, to prevent carelessness with encodings. Once a proper string was added to <varname>.msgstr</varname> list, its modification counter increased, but also the modification counter of the parent object.</para>

<para>A few more notes on modification counters.
Consider this example:
<programlisting language="python">
>>> msg = Message()
>>> msg.msgstr = Monlist(["foo"])
>>> msg.msgstr.modcount
0
>>> msg.msgstr_modcount
1
>>> msg.modcount
1
>>> msg.msgstr[0] = "foo"
>>> msg.msgstr.modcount
0
>>> msg.msgstr = Monlist(["foo"])
>>> msg.msgstr_modcount
1
>>> msg.modcount
1
</programlisting>
<literal>Monlist(["foo"])</literal> itself is a fresh list with modification counter at 0, so after it was assigned to <varname>msg.msgstr</varname>, its modification counter is still 0. However, every attribute of a parent monitored object also has the associated <emphasis>attribute</emphasis> modification counter, denoted with trailing <literal>_modcount</literal>; therefore <varname>msg.msgstr_modcount</varname> did increase on assignment, and so did the parent <varname>msg.modcount</varname>. Modification tracking actually checks for equality of values, so when same-valued objects are repeatedly assigned (starting from <literal>msg.msgstr[0] = "foo"</literal> above), modification counters do not increase.</para>

<para>Compound monitored objects may also have the attributes themselves constrained, to prevent typos and other brain glitches from causing mysterious wrong behavior when processing PO files. For example:
<programlisting language="python">
>>> msg = Message()
>>> msg.msgtsr = Monlist(["foo"])
Traceback (most recent call last):
  ...
pology.PologyError: Attribute 'msgtsr' is not among specified.
>>>
</programlisting>
</para>

<para>You may conclude that modification tracking and type and attribute constraining would slow down processing, and you would be right.
Since PO messages are by far the most processed objects, a non-monitored counterpart to <classname>Message</classname> is provided as well, for occasions where the code is only reading PO files, or has been sufficiently tested, and speed is of importance. See <xref linkend="sec-prflmsg"/> for details.</para>

</sect2>

<sect2 id="sec-prflmsg">
<title>Message</title>

<para>PO messages are by default represented with the <ulink url="&ap;message.Message&acc;"><classname>Message</classname></ulink> class. It is monitored for modifications, and constrained on attributes and attribute types. It provides direct attribute access to parts of a PO message:
<programlisting language="python">
>>> from pology.monitored import Monpair
>>> from pology.message import Message
>>> msg = Message()
>>> msg.msgid = "Foo %s"
>>> msg.msgstr.append("Bar %s")
>>> msg.flag.add("c-format")
>>> msg.fuzzy = True
>>> print msg.to_string(),
#, fuzzy, c-format
msgid "Foo %s"
msgstr "Bar %s"

>>>
</programlisting>
Attribute access provides the least hassle, while being guarded by monitoring, and makes clear the semantics of particular message parts. For example, the <varname>.flag</varname> attribute is a set, to indicate that the order of flags should be of no importance to either a human translator or a PO processor, and the <varname>.msgstr</varname> attribute is always a list in order to prevent the programmer from not taking into account plural messages. While the fuzzy state is formally indicated by a flag, it is considered special enough to have a separate attribute.</para>

<para>Some message parts may or may not be present in a message, and when they are not present, the corresponding attributes are either empty if sequences (e.g.
<varname>.manual_comment</varname> list for translator comments), or set to <literal>None</literal> if strings<footnote>
<para>The canonical way to check if a message is a plural message is <literal>msg.msgid_plural is not None</literal>.</para>
</footnote> (e.g. <varname>.msgctxt</varname>).</para>

<para>There are also several derived, read-only attributes for special purposes. For example, if in some context the messages are to be tracked in a dictionary by their keys, there is the <varname>.key</varname> attribute available, which is an undefined but unique combination of <varname>.msgctxt</varname> and <varname>.msgid</varname> attributes. Or, there is the <varname>.active</varname> attribute which is <literal>True</literal> if the message is neither fuzzy nor obsolete, i.e. its translation (if there is one) would be used by the consumer of the PO file that the message is part of.</para>

<para><classname>Message</classname> has a number of methods for frequent operations that need to read or modify more than one attribute.
For example, to thoroughly unfuzzy a message, it is not sufficient to just remove its fuzzy flag (by setting <varname>.fuzzy</varname> to <literal>False</literal> or removing <literal>"fuzzy"</literal> from <varname>.flag</varname> set), but previous field comments (<literal>#| ...</literal>) should be removed as well, and this is what <function>.unfuzzy()</function> method does:
<programlisting language="python">
>>> print msg.to_string(),
#| msgid "Foubar"
#, fuzzy
msgid "Foobar"
msgstr "Fubar"

>>> msg.unfuzzy()
>>> print msg.to_string(),
msgid "Foobar"
msgstr "Fubar"

</programlisting>
Other methods include those to copy over a subset of parts from another message, to revert the message to pristine untranslated state, and so on.</para>

<para>There exists a non-monitored counterpart to <classname>Message</classname>, the <ulink url="&ap;message.MessageUnsafe&acc;"><classname>MessageUnsafe</classname></ulink> class. Its attributes are of built-in types, e.g. <varname>.msgstr</varname> is plain <classname>list</classname>, and there is neither type nor attribute checking. By using <classname>MessageUnsafe</classname>, a speedup of 50% to 100% has been observed in practical applications, so it makes for a good trade-off when you know what you are doing (e.g. you are certain that no modifications will be made). A PO file is opened with non-monitored messages by issuing the <literal>monitored=False</literal> argument to <classname>Catalog</classname> constructor.</para>

<para>Read-only code should work with <classname>Message</classname> and <classname>MessageUnsafe</classname> objects without any type-based specialization.
Code that writes may need some care to achieve the same, for example:
<programlisting language="python">
def translate_moo_as_mu (msg):

    if msg.msgid == "Moo!": # works for both
        msg.msgstr = ["Mu!"] # raises exception if Message
        msg.msgstr[:] = ["Mu!"] # works for both
        msg.msgstr[0] = "Mu!" # works for both (when not empty)
</programlisting>
If you need to create an empty message of the same type as another message, or make a same-type copy of the message, you can use <function>type</function> built-in:
<programlisting language="python">
newmsg1 = type(msg)() # create empty
newmsg2 = type(msg)(msg) # copy
</programlisting>
<classname>Message</classname> and <classname>MessageUnsafe</classname> share the virtual base class <classname>Message_base</classname>, so you can use <literal>isinstance(obj, Message_base)</literal> to check if an object is a PO message of either type.</para>

</sect2>

<sect2 id="sec-prflhead">
<title>Header</title>

<para>The PO header could be treated as just another message, but that would both be inconvenient for operating on it, and disruptive in iteration over a catalog. Instead the <ulink url="&ap;header.Header&acc;"><classname>Header</classname></ulink> class is introduced. Similar to <classname>Message</classname>, it provides both direct attribute access to parts of the header (like the <varname>.field</varname> list of name-value pairs), and methods for usual manipulations which would need a sequence of basic data manipulations (like <function>.set_field()</function> to either modify an existing or add a new header field with the given value).</para>

<para>In particular, header comments are represented by a number of attributes (<varname>.title</varname>, <varname>.author</varname>, etc.), some of which are strings and some lists, depending on semantics.
Unfortunately, the PO format does not define this separation formally, so when the PO file is parsed, comments are split heuristically (<varname>.title</varname> will be the first comment line, <varname>.author</varname> will get every line which looks like it has an email address and a year in it, etc.)</para>

<para><classname>Header</classname> is a monitored class just like <classname>Message</classname>, but unlike <classname>Message</classname> it has no non-monitored counterpart. This is because in practice the header operations make a small part of total processing, so there is no real advantage in having non-monitored headers.</para>

</sect2>

<sect2 id="sec-prflcat">
<title>Catalog</title>

<para>PO files are read and written through <ulink url="&ap;catalog.Catalog&acc;"><classname>Catalog</classname></ulink> objects. A small script to open a PO file on disk (given as the first argument), find all messages that contain a certain substring in the original text (given as the second argument), and write those messages to standard output, would look like this:
<programlisting language="python">
import sys
from pology.catalog import Catalog
from pology.msgreport import report_msg_content

popath = sys.argv[1]
substr = sys.argv[2]

cat = Catalog(popath)
for msg in cat:
    if substr in msg.msgid:
        report_msg_content(msg, cat)
</programlisting>
Note the minimalistic code, both by raw length and access interface. Instead of using something like <literal>print msg.to_string()</literal> to output the message, already in this example we introduce the <ulink url="&ap;msgreport&amm;"><literal>msgreport</literal></ulink> module, which contains various functions for reporting on PO messages;<footnote>
<para>There is also the <ulink url="&ap;report&amm;"><literal>report</literal></ulink> module for reporting general strings.
In fact, all code in Pology distribution is expected to use functions from these modules for writing to output streams, and there should not be a <function>print</function> in sight.</para>
</footnote> <function>report_msg_content()</function> will first output the PO file name and location of the message (line and entry number) within the file, and then the message content itself, with some highlighting (for field keywords, fuzzy state, etc.) if the output destination permits it. Since no modifications are done to messages, this example would be just as safe but run significantly faster if the PO file were opened in non-monitored mode. This is done by adding the <literal>monitored=False</literal> argument to <classname>Catalog</classname> constructor:
<programlisting language="python">
cat = Catalog(popath, monitored=False)
</programlisting>
and no other modification is required.</para>

<para>When some messages are modified in a catalog created by opening a PO file on disk, the modifications will not be written back to disk until the <function>.sync()</function> method is called -- not even if the program exits. If the catalog is monitored and there were no modifications to it up to the moment <function>.sync()</function> is called, the file on disk will not be touched, and <function>.sync()</function> will return <literal>False</literal> (it returns <literal>True</literal> if the file is written).<footnote>
<para>This holds only for catalogs created with monitoring, i.e. without the <literal>monitored=False</literal> constructor argument. For non-monitored catalogs, <function>.sync()</function> will always touch the file and report <literal>True</literal>.</para>
</footnote> In a scenario where a bunch of PO files are processed, this allows you to report only those which were actually modified.
Take as an example a simplistic<footnote>
<para>As opposed to <link linkend="sv-find-messages">the <command>find-messages</command> sieve</link>.</para>
</footnote> script to search and replace in translation:
<programlisting language="python">
import sys
from pology.catalog import Catalog
from pology.fsops import collect_catalogs
from pology.report import report

searchstr = sys.argv[1]
replacestr = sys.argv[2]
popaths = sys.argv[3:]

popaths = collect_catalogs(popaths)
for popath in popaths:
    cat = Catalog(popath)
    for msg in cat:
        for i, text in enumerate(msg.msgstr):
            msg.msgstr[i] = text.replace(searchstr, replacestr)
    if cat.sync():
        report("%s (%d)" % (cat.filename, cat.modcount))
</programlisting>
This script takes the search and replace strings as the first two arguments, followed by any number of PO paths. The paths do not have to be only file paths, but can also be directory paths, in which case the <function>collect_catalogs()</function> function from <ulink url="&ap;fsops&amm;"><literal>fsops</literal></ulink> module will recursively collect any PO files in them. After the search and replace iteration through a catalog is done (<varname>msgstr</varname> being properly handled on plain and plural messages alike), its <function>.sync()</function> method is called, and if it reports that the file was modified, the file's path and number of modified texts are output. The latter is obtained simply as the modification counter state of the catalog, since it was bumped up by one on each text that actually got modified. Note the use of <varname>.filename</varname> attribute for illustration, although in this particular case we had the path available in <varname>popath</varname> variable.</para>

<para>Syncing to disk is an atomic operation.
This means that if you or something else aborts the program in the middle of execution, none of the processed PO files will become corrupted; they will either be in their original state, or in the expected modified state.</para>

<para>As can be seen, at its base the <classname>Catalog</classname> class is an iterable container of messages. However, the precise nature of this container is less obvious. To the consumer (a program or converter) the PO file is a dictionary of messages by keys (<varname>msgctxt</varname> and <varname>msgid</varname> fields); there can be no two messages with the same key, and the order of messages is of no importance. For the human translator, however, the order of messages in the PO file is of great importance, because it is one of <link linkend="sec-pocontext">context indicators</link>. Message keys are parts of the messages themselves, which means that a message is both its own dictionary key and the value. Taking these constraints together, in Pology the PO file is treated as an <emphasis>ordered set</emphasis>, and the <classname>Catalog</classname> class interface is made to reflect this.</para>

<para>The ordered set nature of catalogs comes into play when the composition of messages, rather than just the messages themselves, is modified. For example, to remove all obsolete messages from the catalog, the <function>.remove()</function> method <emphasis>could</emphasis> be used:
<programlisting language="python">
for msg in list(cat):
    if msg.obsolete:
        cat.remove(msg)
cat.sync()
</programlisting>
Note that the message sequence was first copied into a list, since the removal would otherwise clobber the iteration. Unfortunately, this code will be very slow (linear time with respect to catalog size), since when a message is removed, internal indexing has to be updated to maintain both the order and quick lookups.
Instead, the better way to remove messages is the <function>.remove_on_sync()</function> method, which marks the message for removal on syncing. This runs fast (constant time with respect to catalog size) and requires no copying into a list prior to iteration:
<programlisting language="python">
for msg in cat:
    if msg.obsolete:
        cat.remove_on_sync(msg)
cat.sync()
</programlisting>
</para>

<para>A message is added to the catalog using the <function>.add()</function> method. If <function>.add()</function> is given only the message itself, it will overwrite the message with the same key if there is one such, or else insert it according to source references, or append it to the end. If <function>.add()</function> is also given the insertion position, it will insert the message at that position only if the message with the same key does not exist in the catalog; if it does, it will ignore the given position and overwrite the existing message. When the message is inserted, <function>.add()</function> suffers the same performance problem as <function>.remove()</function>: it runs in linear time. However, the common case when an empty catalog is created and messages added one by one to the end can run in constant time, and this is what <function>.add_last()</function> method does.<footnote>
<para>In fact, <function>.add_last()</function> does a bit more: if both non-obsolete and obsolete messages are added in mixed order, in the catalog they will be separated such that all non-obsolete come before all obsolete, but otherwise maintaining the order of addition.</para>
</footnote></para>

<para>The basic way to check if a message with the same key exists in the catalog is to use the <literal>in</literal> operator. Since the catalog is ordered, if the position of the message is wanted, <function>.find()</function> method can be used instead. Both of these are fast, running in constant time.
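As a rough illustration of why key lookups are cheap while removals from the middle are not, here is a toy model of an ordered set with a position index. It is only a sketch under assumed names (<classname>ToyOrderedSet</classname> and its methods are hypothetical, and keys are plain tuples standing in for messages), not how <classname>Catalog</classname> is actually implemented:
<programlisting language="python">
# Toy model only: NOT Pology's implementation; all names are hypothetical.

class ToyOrderedSet (object):
    """Ordered set of (msgctxt, msgid) keys with a position index."""

    def __init__ (self):
        self._keys = []  # keys in file order
        self._pos = {}   # key to position map, for constant-time lookup

    def add_last (self, key):
        # Appending touches no existing index entries: constant time.
        if key in self._pos:
            raise KeyError("duplicate key: %r" % (key,))
        self._pos[key] = len(self._keys)
        self._keys.append(key)

    def __contains__ (self, key):
        # Toy counterpart of the "in" operator on a catalog.
        return key in self._pos

    def find (self, key):
        # In this toy: position of the key, or -1 if it is absent.
        return self._pos.get(key, -1)

    def remove (self, key):
        # Every later entry shifts by one, so the index has to be
        # renumbered from the removal point onwards: linear time.
        pos = self._pos.pop(key)
        del self._keys[pos]
        for i in range(pos, len(self._keys)):
            self._pos[self._keys[i]] = i


toy = ToyOrderedSet()
for key in [(None, "Open"), ("menu", "Open"), (None, "Close")]:
    toy.add_last(key)
found_pos = toy.find(("menu", "Open"))
toy.remove((None, "Open"))
shifted_pos = toy.find(("menu", "Open"))
</programlisting>
In this toy, appending and looking up touch a single index entry, whereas removal has to renumber every entry after the removed one.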
There is a series of <function>.select_*()</function> methods for looking up messages by other than the key, which run in linear time, and return lists of messages since the result may not be unique any more.</para>

<para>Since it is ordered, the catalog can be indexed, and that either by a position or by a message (whose key is used for lookup). To replace a message in the catalog with a message which has the same key but is otherwise different, you can either first fetch its position and then use it as the index, or use the message itself as the index:
<programlisting language="python">
# Indexing by position.
pos = cat.find(msg)
cat[pos] = msg

# Indexing by message key.
cat[msg] = msg
</programlisting>
This leads to the following question: what happens if you modify the key of a message (its <varname>.msgctxt</varname> or <varname>.msgid</varname> attributes) in the catalog? In that case the internal index goes out of sync, rather than being automatically updated. This is a necessary performance measure. If you need to change message keys, while doing that you should treat the catalog as a pure list, using only <literal>in</literal> iteration and positional indexing. Afterwards you should either call <function>.sync()</function> if you are done with the catalog, or <function>.sync_map()</function> to only update indexing (and remove messages marked with <function>.remove_on_sync()</function>) without writing out the PO file.</para>

<para>The <classname>Catalog</classname> class provides a number of convenience methods which report things about the catalog based on the header information, rather than having to manually examine the header. These include the number of plural forms, the <varname>msgstr</varname> index for the given plural number, as well as information important in some Pology contexts, like language code, accelerator markers, markup types, etc.
Each of these methods has a counterpart which sets the appropriate value, but this value is not written to disk when the catalog is synced. This is because frequently there are more ways in which the value can be determined from the header, so it is ambiguous how to write it out. Instead, these methods are used to set or override values provided by the catalog (e.g. based on command line options) for the duration of processing only.</para>

<para>To create an empty catalog if it does not exist on disk, the <literal>create=True</literal> argument can be added to the constructor. If the catalog does exist, it will be opened as usual; if it did not exist, the new PO file will be written to disk on sync. To unconditionally create an empty catalog, whether the PO file exists or not at the given path, the <literal>truncate=True</literal> parameter should be added as well. In this case, if the PO file did exist, it will be overwritten with the new content only when the catalog is synced. The catalog can also be created with an empty string for path, in which case it is guaranteed to be empty even without setting <literal>truncate=True</literal>.
If a catalog with empty path should later be synced (as opposed to being transient during processing), its <varname>.filename</varname> attribute can simply be assigned a valid path before calling <function>.sync()</function>.</para>

<para>In summary, it can be said that the <classname>Catalog</classname> class is biased, in terms of performance and ease of use, towards processing existing PO files rather than creating PO files from scratch, and towards processing existing messages in the PO file rather than shuffling them around.</para>

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-prcodconv">
<title>Coding Conventions</title>

<para>This section describes the style and conventions that the code which is intended to be included in Pology distribution should adhere to. The general coding style is expected to follow the Python style guide described in <ulink url="http://www.python.org/dev/peps/pep-0008/">PEP 8</ulink>.</para>

<para>Lines should be up to 80 characters long. Class names should be written in camel case, and all other names in lower case with underscores:
<programlisting language="python">
class SomeThingy (object):
    ...

    def some_method (self, ...):

        ...
        longer_variable = ...


def some_function (...):
    ...
</programlisting>
Long expressions with operators should be wrapped in parentheses and before the binary operator, with the first line indented to the level of the other operand:
<programlisting language="python">
some_quantity = (  a_number_of_thingies * quantity_of_that_per_unit
                 + the_base_offset)
</programlisting>
In particular, long conditions in <literal>if</literal> and <literal>while</literal> statements should be written like this:
<programlisting language="python">
if (    something and something_else and yet_something
    and somewhere_in_between and who_knows_what_else
):
    do_something_appropriate()
</programlisting>
</para>

<para>All messages, warnings, and errors should be issued through <ulink url="&ap;report&amm;"><literal>report</literal></ulink> and <ulink url="&ap;msgreport&amm;"><literal>msgreport</literal></ulink> modules. There should be no <function>print</function> statements or raw writes to <literal>sys.stdout</literal>/<literal>sys.stderr</literal>.</para>

<para>For the code in Pology library, it is always preferable to raise an exception instead of aborting execution. On the other hand, it is fine to add optional parameters by which the client can select if the function should abort rather than raise an exception. All topical problems should raise <classname>pology.PologyError</classname> or a subclass of it, and built-in exceptions only for simple general problems (e.g. <classname>IndexError</classname> for indexing past the end of something).</para>

<sect2 id="sec-prcsi18n">
<title>User-Visible Text and Internationalization</title>

<para>All user-visible text, be it reports, warnings, or errors (including exception messages), should be wrapped for internationalization through Gettext.
The top <ulink url="&ap;pology&amm;"><literal>pology</literal></ulink> module provides several wrappers for Gettext functions, which have the following special traits: context is mandatory on every wrapped text, all format directives must be named, and arguments are specified as keyword-value pairs just after the text argument (unless deferred translation is used). Some examples:
<programlisting language="python">
# Simple message with context marker.
_("@info",
  "Trying to sync unnamed catalog.")

# Simple message with extended context.
_("@info command description",
  "Keep track of who, when, and how, has translated, modified, "
  "or reviewed messages in a collection of PO files.")

# Another context marker and extended context.
_("@title:column words per message in original",
  "w/msg-or")

# Parameter substitution.
_("@info",
  "Review tag '%(tag)s' not defined in '%(file)s'.",
  tag=rev_tag, file=config_path)

# Plural message
n_("@item:inlist",
   "written %(num)d word", "written %(num)d words",
   num=nwords)

# Deferred translation, when arguments are known later.
tmsg = t_("@info:progress",
          "Examining state: %(file)s")
...
msg = tmsg.with_args(file=some_path).to_string()
</programlisting>
Every context starts with the "context marker" in form of <literal>@<replaceable>keyword</replaceable></literal>, drawn from a predefined set (see the <ulink url="http://techbase.kde.org/Development/Tutorials/Localization/i18n_Semantics#Context_Markers">article on i18n semantics</ulink> at KDE Techbase); it is most often <literal>@info</literal> in Pology code. The context marker may be, and should be, followed by a free-form extended context whenever it can help the translator to understand how and where the message is used.
It is usual to have the context, text and arguments in different lines, though not necessary if they are short enough to fit one line.</para>

<para>Pology defines lightweight XML markup for coloring text in the <ulink url="&ap;colors&amm;"><literal>colors</literal></ulink> module. In fact, Gettext wrappers do not return ordinary strings, but <ulink url="&ap;colors.ColorString&acc;"><classname>ColorString</classname></ulink> objects, and functions from <literal>report</literal> and <literal>msgreport</literal> modules know how to convert them to raw strings for given output destination (file, terminal, web page...). Therefore you can use colors in any wrapped string:
<programlisting language="python">
_("@info:progress",
  "&lt;green&gt;History follows:&lt;/green&gt;")

_("@info",
  "&lt;bold&gt;Context:&lt;/bold&gt; %(snippet)s",
  snippet=some_text)
</programlisting>
Coloring should be used sparingly, only when it will help to cue the user's eyes to significant elements of the output.</para>

<para>There are two consequences of having text markup available throughout. The first is that every message must be well-formed XML, which means that it must contain no unbalanced tags, and that literal <literal>&lt;</literal> characters must be escaped (and then also <literal>&gt;</literal> for good style):
<programlisting language="python">
_("@item automatic name for anonymous input stream",
  "&amp;lt;stream-%(num)s&amp;gt;",
  num=strno)
</programlisting>
The other consequence is that <classname>ColorString</classname> instances must be joined and interpolated with dedicated functions; see <function>cjoin()</function> and <function>cinterp()</function> functions in <literal>colors</literal> module.</para>

<para>Unless the text of the message is specifically intended to be a title or an insert (i.e.
<literal>@title</literal> or <literal>@item</literal> context markers), it should be a proper sentence, starting with a capital letter and ending with a dot.</para>

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-prsieves">
<title>Writing Sieves</title>

<para><link linkend="ch-sieve">Pology sieves</link> are filtering-like processing elements applied by the <command>posieve</command> script to collections of PO files. A sieve can examine as well as modify the PO entries passed through it. Each sieve is written in a separate file. If the sieve file is put into the <filename>sieve/</filename> directory of the Pology distribution (or installation), the sieve can be referenced on the <command>posieve</command> command line by the shorthand notation; otherwise the path to the sieve file is given. The former is called an internal sieve, and the latter an external sieve, but the sieve file layout and the sieve definition are the same in both cases.</para>

<para>In the following, <command>posieve</command> will be referred to as "the client". This is because tools other than <command>posieve</command> may start to use sieves in the future, and this section also describes what such clients should adhere to when using sieves.</para>

<sect2 id="sec-prsvlayout">
<title>Sieve Layout</title>

<para>The sieve file must define the <classname>Sieve</classname> class, with some mandatory and some optional interface methods and instance variables.
There are no restrictions on what you can put into the sieve file besides this class; only keep in mind that <command>posieve</command> will load the sieve file as a Python module, exactly once during a single run.</para>

<para>Here is a simple sieve (also the complete sieve file) which just counts the number of translated messages:
<programlisting language="python">
# Reporting function from the report module.
from pology.report import report

class Sieve (object):

    def __init__ (self, params):

        self.ntranslated = 0

    def process (self, msg, cat):

        if msg.translated:
            self.ntranslated += 1

    def finalize (self):

        report("Total translated: %d" % self.ntranslated)
</programlisting>
The constructor takes as argument an object specifying any sieve parameters (more on that soon). The <methodname>process</methodname> method gets called for each message in each PO file processed by the client, and must take as parameters the message (instance of <ulink url="&ap;message.Message_base&acc;"><classname>Message_base</classname></ulink>) and the catalog which contains it (<ulink url="&ap;catalog.Catalog&acc;"><classname>Catalog</classname></ulink>). The client calls the <methodname>finalize</methodname> method after no more messages will be fed to the sieve, but this method does not need to be defined (the client should check if it exists before placing the call).</para>

<para>Another optional method is <methodname>process_header</methodname>, which the client calls on the PO header:
<programlisting language="python">
def process_header (self, hdr, cat):
    # ...
</programlisting>
<literal>hdr</literal> is an instance of <ulink url="&ap;header.Header&acc;"><classname>Header</classname></ulink>, and <literal>cat</literal> is the containing catalog. The client will check for the presence of this method, and if it is defined, it will call it prior to any <methodname>process</methodname> call on the messages from the given catalog.
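A client honoring this calling order might be sketched as follows. The <function>run_client</function> function and the tuple-based stand-ins for catalogs are hypothetical, introduced only for illustration; this is not how <command>posieve</command> itself is implemented:
<programlisting language="python">
# Stand-in sieve and client loop; everything except the names of
# the sieve interface methods is hypothetical.
class Sieve (object):

    def __init__ (self, params):
        self.calls = []

    def process_header (self, hdr, cat):
        self.calls.append(("header", cat))

    def process (self, msg, cat):
        self.calls.append(("message", msg))


def run_client (sieve, catalogs):
    # catalogs: list of (name, header, messages) tuples standing in
    # for real catalog objects.
    for name, hdr, msgs in catalogs:
        # Optional method: call it only if the sieve defines it,
        # and always before the first message of the catalog.
        process_header = getattr(sieve, "process_header", None)
        if process_header is not None:
            process_header(hdr, name)
        for msg in msgs:
            sieve.process(msg, name)
    # finalize is optional as well.
    finalize = getattr(sieve, "finalize", None)
    if finalize is not None:
        finalize()


sieve = Sieve(None)
run_client(sieve, [("a.po", "hdr-a", ["m1", "m2"]),
                   ("b.po", "hdr-b", ["m3"])])
print(sieve.calls)
</programlisting>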
In other words, the client is not allowed to switch catalogs between two calls to <methodname>process</methodname> without calling <methodname>process_header</methodname> in between.</para>

<para>There is also the optional <methodname>process_header_last</methodname> method, for which everything holds just like for <methodname>process_header</methodname>, except that, when present, the client must call it <emphasis>after</emphasis> all consecutive <methodname>process</methodname> calls on messages from the same catalog:
<programlisting language="python">
def process_header_last (self, hdr, cat):
    # ...
</programlisting>
</para>

<para>Sieve methods should not abort program execution in case of errors; instead they should raise an exception. In particular, if the <methodname>process</methodname> method raises <ulink url="&ap;sieve.SieveMessageError&acc;"><classname>SieveMessageError</classname></ulink>, it means that the sieve can still process other messages in the same catalog; if it raises <ulink url="&ap;sieve.SieveCatalogError&acc;"><classname>SieveCatalogError</classname></ulink>, then any following messages from the same catalog must be skipped, but other catalogs may be processed. Similarly, if <methodname>process_header</methodname> raises <classname>SieveCatalogError</classname>, other catalogs may still be processed. Any other type of exception tells the client that the sieve should no longer be used.</para>

<para>The <methodname>process</methodname> and <methodname>process_header</methodname> methods should either return <literal>None</literal> or an integer exit code.
A return value which is neither <literal>None</literal> nor <literal>0</literal> indicates that while the evaluation was successful (no exception was raised), the processed entry (message or header) should not be passed further along the <link linkend="sec-svchains">sieve chain</link>.</para>

</sect2>

<sect2 id="sec-prsvparams">
<title>Sieve Parameter Handling</title>

<para>The <literal>params</literal> parameter of the sieve constructor is an object with data attributes as <link linkend="p-svparam">parameters which may influence</link> the sieve operation. The sieve file can define the <function>setup_sieve</function> function, which the client will call with
a <ulink url="&ap;subcmd.SubcmdView&acc;"><classname>SubcmdView</classname></ulink> object as the single argument, to fill in the sieve description and define all mandatory and optional parameters. For example, if the sieve takes an optional parameter named <literal>checklevel</literal>, which controls the level (an integer) at which to perform some checks, here is how <function>setup_sieve</function> could look:
<programlisting language="python">
def setup_sieve (p):

    p.set_desc("An example sieve.")
    p.add_param("checklevel", int, defval=0,
                desc="Validity checking level.")


class Sieve (object):

    def __init__ (self, params):

        if params.checklevel >= 1:
            # ...setup some level 1 validity checks...
            pass
        if params.checklevel >= 2:
            # ...setup some level 2 validity checks...
            pass
        #...

    ...
</programlisting>
See the <ulink url="&ap;subcmd.SubcmdView&ac;add_param"><methodname>add_param</methodname></ulink> method for details on defining sieve parameters.</para>

<para>The client is not obliged to call <function>setup_sieve</function>, but it must make sure that the object it sends to the sieve as <literal>params</literal> has all the instance variables according to the defined parameters.</para>

</sect2>

<sect2 id="sec-prsvregime">
<title>Catalog Regime Indicators</title>

<para>There are two boolean instance variables that the sieve may define, and
which the client may check for to decide on the regime in which the
catalogs are opened and closed:
<programlisting language="python">
class Sieve (object):

    def __init__ (self, params):

        # These are the defaults:
        self.caller_sync = True
        self.caller_monitored = True

    ...
</programlisting>
The variables are:
<itemizedlist>

<listitem>
<para><varname>caller_sync</varname> instructs the client whether catalogs processed by the sieve should be synced to disk at the end. If the sieve does not define this variable, the client should assume <literal>True</literal> and sync catalogs. This variable is typically set to <literal>False</literal> in sieves which do not modify anything, because syncing catalogs takes time.</para>
</listitem>

<listitem>
<para><varname>caller_monitored</varname> tells the client whether it should open catalogs in monitored mode. If this variable is not set, the client should assume it is <literal>True</literal>. This is another way of reducing processing time for sieves which do not modify PO entries.</para>
</listitem>

</itemizedlist>
</para>

<para>Usually a modifying sieve will set neither of these variables, i.e.
catalogs will be monitored and synced by default, while a checker sieve will set both to <literal>False</literal>. For a modifying sieve that unconditionally modifies all entries sent to it, only <varname>caller_monitored</varname> may be set to <literal>False</literal> and <varname>caller_sync</varname> left undefined (i.e. <literal>True</literal>).</para>

<para>If a sieve requests no monitoring or no syncing, the client is not obliged to satisfy these requests. On the other hand, if a sieve does request monitoring or syncing (either explicitly or by not defining the corresponding variables), the client must provide catalogs in that regime. This is because there may be several sieves operating at the same time (a sieve chain), and monitoring and syncing are usually necessary for proper operation of those sieves that request them.</para>

</sect2>

<sect2 id="sec-prsvnotes">
<title>Further Notes on Sieves</title>

<para>Since monitored catalogs have modification counters, the sieve may use them within its <methodname>process*</methodname> methods to find out if any modification really took place. The proper way to do this is to record the counter at the start, and check for an increase at the end:
<programlisting language="python">
def process (self, msg, cat):

    startcount = msg.modcount

    # ...
    # ... do some stuff
    # ...

    if msg.modcount > startcount:
        self.nmodified += 1
</programlisting>
The <emphasis>wrong</emphasis> way to do it would be to merely check if <literal>msg.modcount > 0</literal>, because several modifying sieves may be operating at the same time, each increasing the counters.</para>

<para>If the sieve wants to remove the message from the catalog, if at all possible it should use the catalog's <methodname>remove_on_sync</methodname> method instead of <methodname>remove</methodname>, to defer actual removal to sync time.
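The difference between immediate and deferred removal can be seen on a toy example. The <classname>ToyCatalog</classname> class below is a hypothetical stand-in and not the Pology <classname>Catalog</classname>; it only mimics the idea of marking entries and dropping them at sync time, next to a plain Python list that is modified while being iterated over:
<programlisting language="python">
# Toy catalog; not the Pology Catalog class.
class ToyCatalog (object):

    def __init__ (self, msgs):
        self.msgs = list(msgs)
        self._doomed = set()

    def remove_on_sync (self, msg):
        # Only record the request; the list is left untouched.
        self._doomed.add(msg)

    def sync (self):
        # Single pass that drops everything marked earlier.
        self.msgs = [m for m in self.msgs if m not in self._doomed]
        self._doomed.clear()


# Deferred removal: every message is visited.
cat = ToyCatalog(["a", "obsolete-1", "obsolete-2", "b"])
seen = []
for msg in cat.msgs:
    seen.append(msg)
    if msg.startswith("obsolete"):
        cat.remove_on_sync(msg)
cat.sync()

# Immediate removal from a plain list: iteration skips an element.
direct = ["a", "obsolete-1", "obsolete-2", "b"]
visited = []
for msg in direct:
    visited.append(msg)
    if msg.startswith("obsolete"):
        direct.remove(msg)

print(seen, cat.msgs)
print(visited, direct)
</programlisting>
In the second loop the entry following the removed one is never visited, so it survives in the list.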
Calling <methodname>remove</methodname> directly will probably ruin the client's iteration over the catalog, so if it must be used, the sieve documentation should state it clearly. <methodname>remove</methodname> also runs in linear time, while <methodname>remove_on_sync</methodname> runs in constant time.</para>

<para>If the sieve is to become part of the Pology distribution, it should be properly documented. This means a fully equipped <function>setup_sieve</function> function in the sieve file, and a piece of user manual documentation.
The <classname>Sieve</classname> class itself should not be documented in general. Only when the <methodname>process*</methodname> methods return an exit code should this be stated in their own comments (and in the user manual).</para>

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-prhooks">
<title>Writing Hooks</title>

<para>Hooks are functions with specified sets of input parameters, return values, processing intent, and behavioral constraints. They can be used as modification and validation plugins in many processing contexts in Pology. There are three broad categories of hooks: filtering, validation and side-effect hooks.</para>

<para>Filtering hooks modify some of their inputs. Modifications are done in-place whenever the input is mutable (like a PO message), otherwise the modified input is provided in a return value (like a PO message text field).</para>

<para>Validation hooks perform certain checks on their inputs, and return a list of <emphasis>annotated spans</emphasis> or <emphasis>annotated parts</emphasis>, which record all the encountered errors:
<itemizedlist>

<listitem>
<para id="p-annspans">Annotated spans are reported when the object of validation is a piece of text. Each span is a tuple of start and end index of the problematic segment in the text, and a note which explains the problem.
The return value of a text-validation hook will thus be a list:
<programlisting language="python">
[(start1, end1, "note1"), (start2, end2, "note2"), ...]
</programlisting>
The note can also be <literal>None</literal>, if there is nothing to say about the problem.</para>
</listitem>

<listitem>
<para id="p-annparts">Annotated parts are reported for an object which has more than one distinct piece of text, such as a PO message. Each annotated part is a tuple stating the name of the problematic part of the object (e.g. <literal>"msgid"</literal>, <literal>"msgstr"</literal>), the item index for array-like parts (e.g. for <literal>msgstr</literal>), and the list of problems in appropriate form (for a PO message this is a list of annotated spans).
The return value of a PO message-validation hook will look like this:
<programlisting language="python">
[("part1", item1, [(start11, end11, "note11"), ...]),
 ("part2", item2, [(start21, end21, "note21"), ...]),
 ...]
</programlisting>
</para>
</listitem>

</itemizedlist>
</para>

<para>Side-effect hooks neither modify their inputs nor report validation information, but can be used for any purpose which is independent of the processing chain into which the hook is inserted. For example, a validation hook can be implemented like this as well, when it is enough that it reports problems to standard output, or where the hook client does not know how to use structured validation data (annotated spans or parts). The return value of a side-effect hook is the number of errors encountered internally by the hook (an integer). Clients may use this number to decide upon further behavior.
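As a minimal sketch of the two return conventions, here is a text-validation hook that returns annotated spans, wrapped by a side-effect hook that prints the problems and returns their count. Both function names and the check itself (runs of repeated spaces) are made up for illustration and are not part of Pology:
<programlisting language="python">
import re

def check_repeated_space (text):
    # Validation sketch: (text) -> spans
    spans = []
    for m in re.finditer(r" {2,}", text):
        spans.append((m.start(), m.end(), "repeated space"))
    return spans

def report_repeated_space (text):
    # Side-effect sketch: (text) -> numerr
    spans = check_repeated_space(text)
    for start, end, note in spans:
        print("%d-%d: %s" % (start, end, note))
    return len(spans)

spans = check_repeated_space("One  two   three")
numerr = report_repeated_space("One  two   three")
</programlisting>
The integer returned by the second function is what a client would branch on.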
For example, if a side-effect hook modified a temporary copy of a file, the client may decide to abandon the result and use the original file if there were some errors.</para>

<sect2 id="sec-prhktypes">
<title>Hook Taxonomy</title>

<para>In this section a number of hook types are described and assigned a formal
type keyword, so that they can be conveniently referred to elsewhere in Pology documentation.</para>

<para>Each type keyword has the form <emphasis>&lt;letter1&gt;&lt;number&gt;&lt;letter2&gt;</emphasis>, e.g. F1A. The first letter represents the hook category: <emphasis>F</emphasis> for filtering hooks, <emphasis>V</emphasis> for validation hooks, and <emphasis>S</emphasis> for side-effect hooks. The number enumerates the input signature by parameter types, and the final letter denotes the difference in semantics of input parameters for equal input signatures. As a handy mnemonic, each type is also given an informal signature in the form of <literal>(param1, param2, ...)
-> result</literal>; in them, <literal>spans</literal> stands for <link linkend="p-annspans">annotated spans</link>, <literal>parts</literal> for <link linkend="p-annparts">annotated parts</link>, and <literal>numerr</literal> for number of errors.</para>

<para>Hooks on pure text:
<itemizedlist>
<listitem>
<para>F1A (<literal>(text) -> text</literal>): filters the text</para>
</listitem>
<listitem>
<para>V1A (<literal>(text) -> spans</literal>): validates the text</para>
</listitem>
<listitem>
<para>S1A (<literal>(text) -> numerr</literal>): side-effects on text</para>
</listitem>
</itemizedlist>
</para>

<para>Hooks on text fields in a PO message in a catalog:
<itemizedlist>
<listitem>
<para>F3A (<literal>(text, msg, cat) -> text</literal>): filters any text field</para>
</listitem>
<listitem>
<para>V3A (<literal>(text, msg, cat) -> spans</literal>): validates any text field</para>
</listitem>
<listitem>
<para>S3A (<literal>(text, msg, cat) -> numerr</literal>): side-effects on any text field</para>
</listitem>
<listitem>
<para>F3B (<literal>(msgid, msg, cat) -> msgid</literal>): filters an original text field; original fields are either <literal>msgid</literal> or <literal>msgid_plural</literal></para>
</listitem>
<listitem>
<para>V3B (<literal>(msgid, msg, cat) -> spans</literal>): validates an original text field</para>
</listitem>
<listitem>
<para>S3B (<literal>(msgid, msg, cat) -> numerr</literal>): side-effects on an original text field</para>
</listitem>
<listitem>
<para>F3C (<literal>(msgstr, msg, cat) -> msgstr</literal>): filters a translation text field; translation fields are the <literal>msgstr</literal> array</para>
</listitem>
<listitem>
<para>V3C (<literal>(msgstr, msg, cat) -> spans</literal>): validates a translation text field</para>
</listitem>
<listitem>
<para>S3C (<literal>(msgstr, msg, cat) -> numerr</literal>): side-effects on a translation text field</para>
</listitem>
</itemizedlist>
</para>

<para>The *3B and *3C hook series are introduced next to *3A for cases when it does not make sense for the text field to be anything other than an original field or a translation field, respectively. For example, to process the translation sometimes the original (obtained through the <literal>msg</literal> parameter) must be consulted. If a *3B or *3C hook is applied on an inappropriate text field, the results are undefined.</para>

<para>Hooks on PO entries in a catalog:
<itemizedlist>
<listitem>
<para>F4A (<literal>(msg, cat) -> numerr</literal>): filters a message, modifying it</para>
</listitem>
<listitem>
<para>V4A (<literal>(msg, cat) -> parts</literal>): validates a message</para>
</listitem>
<listitem>
<para>S4A (<literal>(msg, cat) -> numerr</literal>): side-effects on a message (no modification)</para>
</listitem>
<listitem>
<para>F4B (<literal>(hdr, cat) -> numerr</literal>): filters a header, modifying it</para>
</listitem>
<listitem>
<para>V4B (<literal>(hdr, cat) -> parts</literal>): validates a header</para>
</listitem>
<listitem>
<para>S4B (<literal>(hdr, cat) -> numerr</literal>): side-effects on a header (no modification)</para>
</listitem>
</itemizedlist>
</para>

<para>Hooks on PO catalogs:
<itemizedlist>
<listitem>
<para>F5A (<literal>(cat) -> numerr</literal>): filters a catalog, modifying it in any way</para>
</listitem>
<listitem>
<para>S5A (<literal>(cat) -> numerr</literal>): side-effects on a catalog (no modification)</para>
</listitem>
</itemizedlist>
</para>

<para>Hooks on file paths:
<itemizedlist>
<listitem>
<para>F6A (<literal>(filepath) -> numerr</literal>): filters a file, modifying it in
any way</para>
</listitem>
<listitem>
<para>S6A (<literal>(filepath) -> numerr</literal>): side-effects on a file, no modification</para>
</listitem>
</itemizedlist>
</para>

<para>The *2* hook series (with signatures <literal>(text, msg) -> ...</literal>) has been skipped because no need for it has been observed so far next to the *3* hooks.</para>

</sect2>

<sect2 id="sec-prhkfact">
<title>Hook Factories</title>

<para>Since hooks have fixed input signatures by type, the way to customize
a given hook behavior is to produce its function by another function.
The hook-producing function is called a <emphasis>hook factory</emphasis>. It works by
preparing anything needed for the hook, and then defining the hook proper
and returning it, thereby creating a lexical closure around it:
<programlisting language="python">
def hook_factory (param1, param2, ...):

    # Use param1, param2, ... to prepare for hook definition.

    def hook (...):

        # Perhaps use param1, param2, ... in the hook definition too.

    return hook
</programlisting>
</para>

<para>In fact, most internal Pology hooks are defined by factories.</para>

</sect2>

<sect2 id="sec-prhknotes">
<title>Further Notes on Hooks</title>

<para>General hooks should be defined in top level modules, language-dependent hooks in <literal>lang.<replaceable>code</replaceable>.<replaceable>module</replaceable></literal>,
project-dependent hooks in <literal>proj.<replaceable>name</replaceable>.<replaceable>module</replaceable></literal>,
and hooks that are both language- and project-dependent in <literal>lang.<replaceable>code</replaceable>.proj.<replaceable>name</replaceable>.<replaceable>module</replaceable></literal>.
Hooks placed like this can be fetched by <ulink url="&ap;getfunc&am;get_hook_ireq"><function>getfunc.get_hook_ireq</function></ulink> in various non-code contexts, in particular from Pology utilities which allow users to insert hooks into processing through command line options or configurations. If the complete module is dedicated to a single hook, the hook function (or factory) should be named the same as the module, so that users can select it by giving only the hook module name.</para>

<para><link linkend="p-annparts">Annotated parts</link> for PO messages returned by hooks are a reduced but valid instance of highlight specifications used by reporting functions, e.g. <ulink url="&ap;msgreport&am;report_msg_content"><function>msgreport.report_msg_content</function></ulink>. Annotated parts do not have the optional fourth element of a tuple in highlight specification, which is used to provide the filtered text against which spans were constructed, instead of the original text. If a validation hook constructs the list of problematic spans against the filtered text, just before returning it can apply <ulink url="&ap;diff&am;adapt_spans"><function>diff.adapt_spans</function></ulink>
to reconstruct the spans against the original text.</para>

<para>The documentation to a hook function should state the hook type within the short description, in square brackets at the end as <literal>[type ... hook]</literal>. Input parameters should be named like in the informal signatures in the taxonomy above, and should not be omitted in <literal>@param:</literal> Epydoc entries; but the return should be given under <literal>@return:</literal>, also using one of the listed return names, in order to complete the hook signature.</para>

<para>The documentation to a hook factory should have <literal>[hook factory]</literal> at the end of the short description.
It should normally list all the input parameters, while the return value should be given as <literal>@return: type ... hook</literal>, and
the hook signature as the <literal>@rtype:</literal> Epydoc field.</para>

</sect2>

</sect1>

<!-- ======================================== -->
<sect1 id="sec-prascsel">
<title>Writing Ascription Selectors</title>

<para>Ascription selectors are functions used by <command>poascribe</command> in the translation review workflow as described in <xref linkend="ch-ascript"/>. This section describes how you can write your own ascription selector, which you can then put to use by following the instructions in <xref linkend="sec-asccustsels"/>.</para>

<para>In terms of code, an ascription selector is a function factory, which constructs the actual selector function based on supplied selector arguments. It has the following form:
<programlisting language="python">
from pology import PologyError

# Selector factory.
def selector_foo (args):

    # Validate input arguments.
    if (...):
        raise PologyError(...)

    # Prepare selector definition.
    ...

    # The selector function itself.
    def selector (msg, cat, ahist, aconf):

        # Prepare selection process.
        ...

        # Iterate through ascription history looking for something.
        for i, asc in enumerate(ahist):
            ...

        # Return False or True if a shallow selector,
        # and 0 or 1-based history index if history selector.
        return ...

    return selector
</programlisting>
It is customary to name the selector factory <function>selector_<replaceable>something</replaceable></function>, where <replaceable>something</replaceable> will also be used as the selector name (in command line, etc). The input <varname>args</varname> parameter is always a list of strings.
This list should first be validated, insofar as possible without having in hand the particular message, catalog, ascription history or ascription configuration. Whatever does not depend on any of these can also be precomputed for later use in the selector function.</para>

<para>The selector function takes as arguments the message (an instance of <ulink url="&ap;message.Message_base&acc;"><classname>Message_base</classname></ulink>), the catalog (<ulink url="&ap;catalog.Catalog&acc;"><classname>Catalog</classname></ulink>) it comes from, the ascription history (list of <ulink url="&ap;ascript.AscPoint&acc;"><classname>AscPoint</classname></ulink> objects), and the ascription configuration (<ulink url="&ap;ascript.AscConfig&acc;"><classname>AscConfig</classname></ulink>). For the most part, <classname>AscPoint</classname> and <classname>AscConfig</classname> are simple attribute objects; check their API documentation for the list and description of attributes. Some of the attributes of <classname>AscPoint</classname> objects that you will usually inspect are <varname>.msg</varname> (the historical version of the message), <varname>.user</varname> (the user to whom the ascription was made), or <varname>.type</varname> (the type of the ascription, one of <varname>AscPoint.ATYPE_*</varname> constants). The ascription history is sorted from the latest to the earliest ascription. If the <varname>.user</varname> of the first entry in the history is <literal>None</literal>, that means that the current version of the message has not been ascribed yet (e.g. if its translation has been modified compared to the latest ascribed version). If you are writing a shallow selector, it should return <literal>True</literal> to select the message, or <literal>False</literal> otherwise.
In a history selector, the return value should be a 1-based index of an entry in the ascription history which caused the message to be selected, or <literal>0</literal> if the message was not selected.<footnote>
<para>In this way the history selector can automatically behave as a shallow selector as well, because simply testing for falsity on the return value will show whether the message has been selected or not.</para>
</footnote></para>

<para>The entry index returned by history selectors is used to compute the embedded difference from a historical to the current version of the message, e.g. on <literal>poascribe diff</literal>. Note that <command>poascribe</command> will actually take as base for differencing the first non-fuzzy historical message <emphasis>after</emphasis> the indexed one, because it is assumed that the historical message which triggered the selection already contains some changes to be inspected. (When this behavior is not sufficient, <command>poascribe</command> lets the user specify a second history selector, which directly selects the historical message to base the difference on.)</para>

<para>Most of the time the selector will operate on messages covered by a single ascription configuration, which means that the ascription configuration argument sent to it will always be the same. On the other hand, the resolution of some of the arguments to the selector factory will depend only on the ascription configuration (e.g. a list of users). In this scenario, it would be a waste of performance if such arguments were resolved anew in each call to the selector. You could instead write a small caching (memoizing) resolver function, which when called for the second and subsequent times with the same configuration object, returns the previously resolved argument value from the cache.
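A minimal sketch of such a resolver follows. The function name, the comma-separated user specification and the <classname>ToyConfig</classname> class are all hypothetical, chosen only to show the memoizing pattern; they are not taken from the Pology API:
<programlisting language="python">
# Cache shared by all calls; keyed by configuration identity
# and by the raw argument string.
_resolved_users = {}

def resolve_users_cached (aconf, user_spec):
    # Hypothetical resolver: parses the specification only once
    # per (configuration, specification) pair.
    # Note: id() is only a safe key while aconf stays alive.
    key = (id(aconf), user_spec)
    if key not in _resolved_users:
        users = set(u.strip() for u in user_spec.split(",") if u.strip())
        _resolved_users[key] = users
    return _resolved_users[key]


class ToyConfig (object):
    # Stand-in for an ascription configuration object.
    pass

aconf = ToyConfig()
first = resolve_users_cached(aconf, "alice, bob")
second = resolve_users_cached(aconf, "alice, bob")
print(sorted(first), first is second)
</programlisting>
The second call returns the very same object as the first, without parsing the specification again.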
A few such caching resolvers for some common arguments have been provided in the <ulink url="&ap;ascript&amm;"><literal>ascript</literal></ulink> module, as functions named <function>cached_*()</function> (e.g. <ulink url="&ap;ascript&am;cached_users"><function>cached_users()</function></ulink>).</para>

</sect1>

</chapter>