.. meta::
   :description: The digiKam OCR Text Converter
   :keywords: digiKam, documentation, user manual, photo management, open source, free, learn, easy, ocr, text, tesseract

.. metadata-placeholder

   :authors: - digiKam Team

   :license: see Credits and License page for details (https://docs.digikam.org/en/credits_license.html)

.. _ocrtext_converter:

OCR Text Converter
==================

.. contents::

The OCR Text Converter is a tool that parses the contents of an image, detects areas containing text, and converts them into editable and translatable text files.

The tool can run optical character recognition (OCR) over a batch of images and translate the results into many languages using an online translation engine. It also lets you review the text, make corrections, and run spell checking.

In the background, the tool uses `Tesseract <https://en.wikipedia.org/wiki/Tesseract_(software)>`_, a powerful open-source optical character recognition engine available for Linux, macOS, and Windows.

To perform text conversions, select the scanned images containing the text to recognize and start the tool from the menu :menuselection:`Tools --> OCR Text Converter`, or use the **OCR Text Converter** icon from the **Tools** tab on the right sidebar. The following dialog should appear:

.. figure:: images/ocrtext_converter_dialog.webp
    :alt:
    :align: center

    The digiKam OCR Text Converter Dialog

On the right side, the **Text recognition** tab shows, at the top of the view, the version of the **Tesseract** binary program detected on your system. If none is present, you will need to install it. Below, the **Tesseract** settings can be customized to process images.

The **Languages** setting specifies the language used for OCR.
In the **Default** mode, when processing digital text with multiple languages, **Tesseract** can automatically recognize languages using Latin alphabets such as English or French, but this mode is not compatible with languages using ideographic characters such as Chinese or Japanese. For those, use the **Orientation and Script Detection** mode or a specific language module if available.

The **Segmentation mode** setting specifies the **Tesseract** page segmentation mode to use while processing images. Possible choices are listed below:

- **OSD only**: Orientation and Script Detection (OSD) only.
- **With OSD**: Automatic page segmentation with OSD.
- **No OSD**: Automatic page segmentation, but no OSD or OCR.
- **Default**: Fully automatic page segmentation, but no OSD.
- **Col of text**: Assume a single column of text of variable sizes.
- **Vertically aligned**: Assume a single uniform block of vertically aligned text.
- **Block**: Assume a single uniform block of text.
- **Line**: Treat the image as a single text line.
- **Word**: Treat the image as a single word.
- **Word in circle**: Treat the image as a single word in a circle.
- **Character**: Treat the image as a single character.
- **Sparse text**: Find as much text as possible in no particular order.
- **Sparse text + OSD**: Sparse text with OSD.
- **Raw line**: Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

For more details about the Tesseract segmentation modes, see this `online tutorial <https://pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/>`_.

The **Engine mode** setting specifies the internal **Tesseract** OCR engine to use while processing images. Possible choices are listed below:

- **Legacy**: Legacy engine only (older engine not based on a neural network).
- **LSTM**: Neural network LSTM (Long Short-Term Memory deep-learning) engine only.
- **Legacy + LSTM**: Both legacy and LSTM engines will be used.
- **Default**: Default value. Let Tesseract choose the best engine based on what is available.

The **Resolution Dpi** setting specifies the resolution of the input images in dots per inch (DPI).

If the **Use Multi-cores** setting is enabled, files from the list will be processed in parallel with Tesseract.

The **Store result in** setting specifies where to place the text contents recognized by Tesseract while processing images. Possible choices are listed below:

- **Text file**: Store the OCR result in a separate text file in the same directory as the processed image.
- **Metadata**: Store the OCR result in the alternative-language XMP tag of the image metadata.

At the bottom of this view, the OCR result can be translated into different languages using an online translation engine. You can set more than one translation language to process images. The corresponding translations are stored in separate text files or in extra metadata entries, depending on the **Store result in** setting. See :ref:`this page from the manual <localize_settings>` for more details about the **Localize Settings**.

The **Text Review** tab on the right side allows editing the OCR result for each image processed with Tesseract. Select one item from the list on the left side and the OCR result will be displayed in a text editor. You can fix the text if necessary or apply spell checking. See :ref:`this page from the manual <spellcheck_settings>` for more details about the **Spell-Checking Settings**.

At the bottom of the dialog, the **Default** button resets all settings to their default values. The **Start OCR** drop-down button starts processing either the currently selected images from the list or all items.
Finally, the **Close** button stops any running OCR processes and closes the dialog.

.. figure:: images/ocrtext_converter_review.webp
    :alt:
    :align: center

    The digiKam OCR Text Converter Content to Review on the Right Side with the Corresponding Image Open in Showfoto
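
The **Languages**, **Segmentation mode**, **Engine mode**, and **Resolution Dpi** settings described above appear to correspond to the ``-l``, ``--psm``, ``--oem``, and ``--dpi`` options of the ``tesseract`` command-line program. The sketch below only illustrates that correspondence. It assumes the dialog labels map to Tesseract's numeric modes in the order they are listed on this page, and the helper name ``build_tesseract_command`` is hypothetical. It is not how digiKam necessarily calls Tesseract internally.

```python
# Illustrative only: maps the dialog labels from this page to tesseract
# command-line options. The label-to-number mapping is an assumption based
# on the order in which Tesseract documents its modes.

PSM_BY_LABEL = {
    "OSD only": 0,
    "With OSD": 1,
    "No OSD": 2,
    "Default": 3,
    "Col of text": 4,
    "Vertically aligned": 5,
    "Block": 6,
    "Line": 7,
    "Word": 8,
    "Word in circle": 9,
    "Character": 10,
    "Sparse text": 11,
    "Sparse text + OSD": 12,
    "Raw line": 13,
}

OEM_BY_LABEL = {
    "Legacy": 0,
    "LSTM": 1,
    "Legacy + LSTM": 2,
    "Default": 3,
}


def build_tesseract_command(image, output_base, languages=(),
                            segmentation="Default", engine="Default",
                            dpi=None):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    cmd = ["tesseract", image, output_base]
    if languages:
        # Tesseract accepts several languages joined with '+', e.g. eng+fra.
        cmd += ["-l", "+".join(languages)]
    cmd += ["--psm", str(PSM_BY_LABEL[segmentation])]
    cmd += ["--oem", str(OEM_BY_LABEL[engine])]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    return cmd


if __name__ == "__main__":
    # Build (but do not run) a command for a scanned page in English and French.
    args = build_tesseract_command("scan.png", "scan", ["eng", "fra"],
                                   segmentation="Block", engine="LSTM",
                                   dpi=300)
    print(" ".join(args))
```

The function returns a list of arguments, so it can be passed to ``subprocess.run`` directly without shell quoting. With the values shown, Tesseract would write the recognized text to ``scan.txt``, which is similar in spirit to the **Text file** storage option.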