# KItinerary Workbench

Interactive test and inspection tool for developing extractor scripts for
the [Itinerary data extraction engine](https://invent.kde.org/pim/kitinerary).

![KItinerary Workbench script editor](doc/kitinerary-workbench-script-editor.png)

## Installing

The easiest way to get KItinerary Workbench is from KDE's nightly Flatpak repository.

```
flatpak remote-add --if-not-exists kdeapps --from https://distribute.kde.org/kdeapps.flatpakrepo
flatpak install org.kde.kitinerary-workbench
```

## Usage

KItinerary Workbench has two main UI parts: the input panel on the left
and the output panel on the right.

The input panel allows specifying the input data and its context, as well as inspecting
pre-processed input data (such as textual and image data extracted from a PDF). The output
panel lets you inspect the results of the various extractor and post-processing stages.

To test an extractor, specify the input data on the source tab of the input panel, either
by opening a file (via the file open dialog, or e.g. by dragging and dropping an email attachment
onto the file input line), or by entering textual source data directly in the text field. If not detected
automatically, you also need to set the right input data type (plain text, HTML, PDF, Apple Wallet
passes, IATA boarding pass codes, UIC 918.3 train ticket codes, etc.).

For structured data extractors, this should already show results in the output panel. For
unstructured data extractors, you additionally need to specify the sender email address (used to pick
the right extractor script) and optionally a context date (used to resolve date/time ambiguities).

## Extractor Development

For quick iterations during extractor script development, you can use the 'Reload' action to reload
and re-run the extractor on the already loaded input data. Any change to the input data triggers
this as well, so you can easily test variations in the input by editing the input text field.

For reloading to work, your extractor script must be placed in the file system rather than being compiled
in. It's therefore convenient to symlink the extractor sources to $XDG_DATA_DIRS/kitinerary/extractors
(see https://api.kde.org/kdepim/kitinerary/html/classKItinerary_1_1ScriptExtractor.html). Note that
reloading only works for extractor scripts, not for extractor metadata.
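The symlinking step could be scripted roughly as follows. This is only a sketch under stated assumptions: `link_extractors` is a hypothetical helper name, and it assumes the per-user data directory (`$XDG_DATA_HOME`, falling back to `~/.local/share`) is among the locations searched for extractors. A Flatpak install runs in a sandbox with its own data directory, so the target path may differ there. Call it with the directory holding your extractor sources, for example `link_extractors path/to/your/extractors` (a placeholder path).

```sh
# Sketch: symlink extractor scripts and their metadata into a per-user data dir.
# link_extractors is a hypothetical helper, not part of KItinerary.
link_extractors() {
    # Resolve to an absolute path so the symlinks do not dangle.
    src_dir=$(cd "$1" && pwd) || return 1
    # Assumption: this per-user directory is searched for extractors.
    dest_dir="${XDG_DATA_HOME:-$HOME/.local/share}/kitinerary/extractors"
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir"/*.js "$src_dir"/*.json; do
        [ -e "$f" ] || continue   # skip glob patterns that matched nothing
        ln -sf "$f" "$dest_dir/"
    done
}
```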

For PDF extractors, the input panel provides two additional tabs, one showing the extracted plain text
and one showing a list of images. For the image list, context menu actions provide the ability to perform
Aztec or PDF417 barcode decoding. If successful, the result is set as the new input text, and
selecting 'IATA BCBP' or 'UIC 918.3' as the input type will show the result of decoding the respective
barcode message.

For HTML extractors, the input panel provides an additional tab showing the DOM tree of the parsed
document and the attributes of the selected element.

The output view shows not only the final result ('Post-processed') but also the direct output of the
extractor script ('Extractor'), before it has been normalized, validated and
augmented in the post-processing stage (see https://api.kde.org/kdepim/kitinerary/html/classKItinerary_1_1ExtractorPostprocessor.html).

## Contributing

See the contributions section of the [Itinerary data extraction engine](https://invent.kde.org/pim/kitinerary)
README for information on contributing new or improved extractor scripts and on donating sample data.