# Settings

The templating engine uses several types of modules to extract data from source files.

## **OCR Module**

**OCR Module** — the module used for text recognition.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-5f96cd0cd89b6165e670c3fc7bd2b0889edc95c4%2Fimage%20(191).png?alt=media" alt=""><figcaption></figcaption></figure>

Several OCR modules are integrated into the Sherpa RPA platform. Two of them are provided with the robot: Tesseract OCR and Microsoft OCR. These modules can work offline, without an internet connection.

**Tesseract OCR** is an open-source optical character recognition (OCR) engine and one of the most widely used OCR libraries. It uses neural networks to find and recognize text in images.

**Yandex Vision and ABBYY OCR** are online modules that utilize the features of their respective cloud services.

**Yandex Handwriting** is a module that allows for the recognition of handwritten text.

**Microsoft OCR** is a module that enables text recognition in images and scanned documents using optical character recognition (OCR).

**OCR Space** is a module that allows for the recognition of Cyrillic fonts in .jpg images.

**ABBYY FineReader** is a commercial offline module that requires a separate license for use.

The Sherpa RPA platform lets you configure how a script works with image recognition and switch between these OCR modules at any time.

## **OCR Scale**

**OCR Scale** — a parameter that allows you to improve recognition quality if documents have poor resolution.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-322882a0773ae9513a2bd61d2802c27a97017d59%2Fimage%20(192).png?alt=media" alt=""><figcaption></figcaption></figure>

For scanned documents of high or medium quality, it is recommended to leave the scale value at "2".

## **Recognition Language**

**Recognition Language** — a parameter that allows you to increase the accuracy of document recognition. You can also specify multiple languages, using commas as separators.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-89fa8e83df9151e79a024c171e515ee57fcf9ecc%2Fimage%20(193).png?alt=media" alt=""><figcaption></figcaption></figure>

When you select the “Recognition Language” setting, a dropdown list opens, where you can tick the checkboxes for the languages you need.

## **Recognition Language for Anchors**

**Recognition Language for Anchors** — a parameter that allows you to set a separate recognition language for anchors. You can also specify multiple languages, using commas as separators.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-393d1861be0f91855f85084c51ea6b7161100dae%2Fimage%20(194).png?alt=media" alt=""><figcaption></figcaption></figure>

## **Direct Text Extraction from PDF**

**Direct Text Extraction from PDF** — a parameter that allows you to manage direct text extraction from the page.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-6c57cc12a27225151f816d8b4f0d647527c913c1%2Fimage%20(195).png?alt=media" alt=""><figcaption></figcaption></figure>

Possible values:

* No — disabled;
* Yes — only direct text extraction is used;
* Auto — automatic mode (if there is no text on the page, text recognition will be performed using the specified OCR module).
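The three values above can be read as a small decision rule. The following Python sketch is only an illustration of that rule: the function name `extract_page_text`, its arguments, and the assumption that "No" means the page always goes through OCR are not taken from the Sherpa RPA API.

```python
from typing import Callable


def extract_page_text(mode: str, embedded_text: str, run_ocr: Callable[[], str]) -> str:
    """Pick the text source for one PDF page (hypothetical helper)."""
    if mode == "No":
        # Direct extraction is disabled; assumed here to mean OCR is always used.
        return run_ocr()
    if mode == "Yes":
        # Only the embedded text layer is used, even when it is empty.
        return embedded_text
    if mode == "Auto":
        # OCR is used only when the page has no embedded text.
        return embedded_text if embedded_text.strip() else run_ocr()
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch `run_ocr` is passed as a callable so that OCR is only invoked when the selected mode actually needs it.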

## **OCR Cell Size Horizontally**

**OCR Cell Size Horizontally** — a parameter that allows you to specify the horizontal divisor that determines the table cells on the page. The value must be greater than or equal to 1.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-3cc7177049b0c276d5299bc3b64961926404ca8a%2Fimage%20(196).png?alt=media" alt=""><figcaption></figcaption></figure>

The cell size is defined as the size of the image divided by this value.

The templating engine uses two parameters (horizontally and vertically) for more accurate table identification.

The default value of OCR Cell Size Horizontally is 40. This value is optimal for recognizing documents with standard (or close to standard) table cell sizes.

It is recommended to leave this value unchanged and modify it only if recognition is incorrect (after verification).

If the table is not recognized with the specified value, the templating algorithm automatically increases the value by 10 and tries again.
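The retry behaviour can be pictured with a short Python sketch. It is a simplified model, not the engine's code: `detect` stands in for the table detector, and `max_attempts` is a hypothetical limit, since the number of retries is not documented here.

```python
from typing import Callable, Optional, Tuple


def detect_with_retry(
    image_width: int,
    detect: Callable[[float], Optional[list]],
    divisor: int = 40,      # default for OCR Cell Size Horizontally
    step: int = 10,         # the divisor grows by 10 after each failed attempt
    max_attempts: int = 3,  # hypothetical limit, not taken from the documentation
) -> Tuple[Optional[list], int]:
    """Return (table, divisor used); table is None if every attempt failed."""
    if divisor < 1:
        raise ValueError("the divisor must be greater than or equal to 1")
    for _ in range(max_attempts):
        cell_width = image_width / divisor  # cell size = image size / divisor
        table = detect(cell_width)
        if table is not None:
            return table, divisor
        divisor += step
    # Report the last divisor that was actually tried.
    return None, divisor - step
```

A larger divisor gives a smaller cell size, so each retry looks for a finer grid.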

## **OCR Cell Size Vertically**

**OCR Cell Size Vertically** — a parameter that allows you to specify the vertical divisor that determines the table cells on the page. The value must be greater than or equal to 1.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-0e0a0d3c8cdda38aba730fdd84e4aa49960b301d%2Fimage%20(197).png?alt=media" alt=""><figcaption></figcaption></figure>

The cell size is defined as the size of the image divided by this value.

The templating engine uses two parameters (horizontally and vertically) for more accurate table identification.

The default value of OCR Cell Size Vertically is 20. This value is optimal for recognizing documents with standard (or close to standard) table cell sizes.

It is recommended to leave this value unchanged and modify it only if recognition is incorrect (after verification).

If the table is not recognized with the specified value, the templating algorithm automatically increases the value by 10 and tries again.
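As a worked example of the two divisors together, the Python sketch below computes the cell size for a page. It assumes that the horizontal divisor is applied to the image width and the vertical divisor to the image height; the function name `ocr_cell_size` is illustrative only.

```python
from typing import Tuple


def ocr_cell_size(
    image_width: int,
    image_height: int,
    horizontal_divisor: int = 40,  # default for OCR Cell Size Horizontally
    vertical_divisor: int = 20,    # default for OCR Cell Size Vertically
) -> Tuple[float, float]:
    """Return (cell width, cell height) in pixels."""
    if horizontal_divisor < 1 or vertical_divisor < 1:
        raise ValueError("both divisors must be greater than or equal to 1")
    return image_width / horizontal_divisor, image_height / vertical_divisor


# An A4 page scanned at 300 dpi is about 2480 x 3508 pixels.
cell_width, cell_height = ocr_cell_size(2480, 3508)  # (62.0, 175.4)
```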

## **Horizontal Line Filter**

**Horizontal Line Filter** — a parameter that specifies the minimum line length as a percentage of the image width; shorter lines are ignored. The percentage is calculated as the ratio of the horizontal line's length to the width of the image. This parameter is used for attributes.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-eb671091e329aad58e1a028d9a4a16105a449481%2Fimage%20(198).png?alt=media" alt=""><figcaption></figcaption></figure>

By default, the value of this parameter is 5.

Often, scanned documents contain lines that are not table borders (various artifacts that need to be filtered out). With the specified parameter, all unnecessary artifacts (lines drawn by hand or lines that appeared due to poor scanning) will be ignored.
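The filter can be illustrated with a few lines of Python. This is a sketch under the assumption that line lengths and the image width are measured in pixels; the function name is hypothetical.

```python
from typing import List


def filter_horizontal_lines(
    line_lengths: List[float],
    image_width: float,
    min_percent: float = 5.0,  # default value of Horizontal Line Filter
) -> List[float]:
    """Keep only lines whose length is at least min_percent of the image width."""
    # Comparing length * 100 with min_percent * width avoids a float division.
    return [
        length
        for length in line_lengths
        if length * 100 >= min_percent * image_width
    ]
```

With an image 1000 pixels wide and the default value of 5, a 20-pixel scratch is dropped, while lines of 50 pixels or more are kept.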

## **Auto Page Rotation**

**Auto Page Rotation** — a parameter that allows pages to be automatically rotated to an angle that is a multiple of 90 degrees (90°, 180°, and 270°).

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-f22e4342a953f272650db58ed708d8eaf11484bd%2Fimage%20(199).png?alt=media" alt=""><figcaption></figcaption></figure>

Auto page rotation does not rotate the document at small angles. By default, the parameter value is “True”, and it is recommended to leave it unchanged.

## **Auto Page Alignment**

**Auto Page Alignment** — a parameter that allows you to align the content of the page when the sheet is scanned incorrectly. Unlike “Auto Page Rotation”, “Auto Page Alignment” rotates the document at small angles.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-e4b39509982a8a1ca01804d580f4f92765f443ae%2Fimage%20(200).png?alt=media" alt=""><figcaption></figcaption></figure>

During auto-alignment, the engine finds the longest line on the page (most often a line from a table or attribute) and determines its tilt angle relative to the horizontal. The document is then rotated so that this line becomes horizontal.

By default, the parameter value is “True”, and it is recommended to leave it unchanged.

Using this parameter is not advisable if the scanned document is rotated by more than 40°. In that case, the templating engine cannot determine in which direction to align the document and returns an error. In such a situation, you can use the “Angle Adjustment” parameter.
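The alignment idea (longest line, tilt angle, counter-rotation) can be sketched in Python as follows. It is a geometric illustration only: the line segments are assumed to be already detected, the function name is hypothetical, and the actual detection and rotation code of the engine is not shown here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def alignment_angle(lines: List[Segment], max_angle: float = 40.0) -> float:
    """Return the rotation (degrees) that makes the longest line horizontal."""
    if not lines:
        raise ValueError("no lines found on the page")
    # The longest detected line is taken as the reference.
    (x1, y1), (x2, y2) = max(lines, key=lambda seg: math.dist(seg[0], seg[1]))
    if x2 < x1:
        # Orient the segment left to right so the angle stays within -90..90.
        x1, y1, x2, y2 = x2, y2, x1, y1
    tilt = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if abs(tilt) > max_angle:
        # Mirrors the documented limit: beyond 40 degrees the direction is ambiguous.
        raise ValueError(f"tilt of {tilt:.1f} degrees exceeds {max_angle} degrees")
    # Rotating by the opposite angle cancels the tilt (same coordinate convention).
    return -tilt
```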

## **Length Criterion. Auto Page Alignment**

**Length Criterion. Auto Page Alignment** — a parameter that limits the minimum length of the line used for alignment. The value is used as a divisor for the width of the page: if a line is shorter than the page width divided by this value, the algorithm skips it. If the page is initially significantly rotated, specify a larger value, or specify 0 to disable the parameter.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-aa18a7f149dc35f7ae40d405c2d1852e8a370848%2Fimage%20(201).png?alt=media" alt=""><figcaption></figcaption></figure>

By default, the value is 10.
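A short Python sketch of this criterion, assuming line lengths and page width are given in pixels (the function name is illustrative, not part of the product):

```python
from typing import List


def lines_for_alignment(
    line_lengths: List[float],
    page_width: float,
    divisor: float = 10,  # default value of the length criterion
) -> List[float]:
    """Drop lines shorter than page_width / divisor; 0 disables the check."""
    if divisor == 0:
        return list(line_lengths)
    min_length = page_width / divisor
    return [length for length in line_lengths if length >= min_length]
```

For a page 2480 pixels wide, the default of 10 skips every line shorter than 248 pixels, while a larger divisor such as 20 lowers the threshold to 124 pixels.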

## **Process PDF Annotations**

**Process PDF Annotations** — a parameter that allows you to enable the processing of PDF file annotations.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-14c2eba831008418300c63cf78b085618a05a660%2Fimage%20(63).png?alt=media" alt=""><figcaption></figcaption></figure>

## **Process All Pages**

**Process All Pages** — a parameter that allows you to enable the processing of all pages in the document. This mode does not check the LastPage anchor.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-d7f3be05751c384af738eedbbf5dcc2947dcb9fc%2Fimage%20(64).png?alt=media" alt=""><figcaption></figcaption></figure>

## **Merge Blocks**

**Merge Blocks** — a parameter that allows you to merge adjacent blocks into one block.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-edcb78867653cc0322ee24cc195546d2a0b1f818%2Fimage%20(65).png?alt=media" alt=""><figcaption></figcaption></figure>

## **Split Blocks**

**Split Blocks** — a parameter that allows you to split blocks containing spaces into blocks without spaces.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-fdae67963c8d06bc73c4d035b16f13cec8a1417e%2Fimage%20(66).png?alt=media" alt=""><figcaption></figcaption></figure>

## **Ignore Anchor Errors**

**Ignore Anchor Errors** — a parameter that allows you to disable error generation if anchors (any) are not found. In this case, the anchor area is considered zero.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-3e147f3e2a6c24438e0e7d5484ff63d75b9815f0%2Fimage%20(67).png?alt=media" alt=""><figcaption></figcaption></figure>

## **Remove Blocks Exceeding Size**

**Remove Blocks Exceeding Size** — a parameter that allows you to remove blocks that exceed the specified size. The input field is located next to the parameter name. If you specify a single number, blocks with a width or height greater than this value will be removed. You can also specify values separated by commas in the format: width, height.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-4c871e928ce54849b7947cc1eb037681ef9e8d92%2Fimage%20(68).png?alt=media" alt=""><figcaption></figcaption></figure>

The size must be specified in pixels.
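The two input formats can be illustrated with a Python sketch. The helper names are hypothetical, blocks are modelled as plain `(width, height)` tuples, and the handling of the two-value form (a block is removed if its width exceeds the first number or its height exceeds the second) is an assumption.

```python
from typing import List, Tuple


def parse_size_limit(value: str) -> Tuple[int, int]:
    """Parse '500' or '500, 200' into (max_width, max_height) in pixels."""
    parts = [int(part) for part in value.split(",")]
    if len(parts) == 1:
        # A single number limits both the width and the height.
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("expected 'size' or 'width, height'")


def remove_oversized(blocks: List[Tuple[int, int]], value: str) -> List[Tuple[int, int]]:
    """Keep only (width, height) blocks that fit within the limit."""
    max_width, max_height = parse_size_limit(value)
    return [(w, h) for (w, h) in blocks if w <= max_width and h <= max_height]
```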

## **Image Percentage for Recognition**

**Image Percentage for Recognition** — a parameter that allows you to specify the percentage of the image that will be used for recognition by the OCR engine. The input field is located next to the parameter name. The value should be written as a single number or two numbers separated by a dash.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-27e7eeabc1f9a396ae37fe36955bc4e0e8171f11%2Fimage%20(69).png?alt=media" alt=""><figcaption></figcaption></figure>

For example:

30 (or 0-30) — the top 30% of the image will be recognized;\
30-70 — the middle part of the image, from 30% to 70%, will be recognized;\
70-100 — the lower 30% of the image will be recognized.
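The value formats above can be turned into pixel rows with a small Python sketch. It assumes that percentages are counted from the top of the image along its height; the function names are illustrative.

```python
from typing import Tuple


def parse_percent_range(value: str) -> Tuple[int, int]:
    """Parse '30' or '30-70' into a (start, end) percentage pair."""
    text = value.strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        # A single number N is treated as the range 0-N.
        start, end = 0, int(text)
    if not 0 <= start < end <= 100:
        raise ValueError(f"invalid percentage range: {value!r}")
    return start, end


def rows_to_recognize(image_height: int, value: str) -> Tuple[int, int]:
    """Return the (first, last-exclusive) pixel rows covered by the range."""
    start, end = parse_percent_range(value)
    return image_height * start // 100, image_height * end // 100
```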

## **Find Stamps**

**Find Stamps** — a parameter that allows you to enable the search for stamps on the document. A key “Stamps” will be added to the attributes, which will return an array of StampItem objects with properties X, Y, Width, Height, PageIndex.
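To show how the returned data might be consumed, the Python sketch below mirrors the listed `StampItem` properties in a dataclass and groups stamps by page. The dataclass is a stand-in written for this example (including the integer field types); it is not the platform's own type.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StampItem:
    # Field names follow the properties listed above; the types are assumed.
    X: int
    Y: int
    Width: int
    Height: int
    PageIndex: int


def stamps_by_page(stamps: List[StampItem]) -> Dict[int, List[StampItem]]:
    """Group the items from the 'Stamps' attribute by their PageIndex."""
    pages: Dict[int, List[StampItem]] = defaultdict(list)
    for stamp in stamps:
        pages[stamp.PageIndex].append(stamp)
    return dict(pages)
```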

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-8f5f4782ac14b216a277cbb1648ebc6acd50acf1%2Fimage%20(70).png?alt=media" alt=""><figcaption></figcaption></figure>

## **Ignore Watermarks**

**Ignore Watermarks** — when enabled, characters and text from watermarks will not be extracted.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-384ce909209cece1c1aaa0daca36171cf7ff89ea%2F2025-09-26_23-13-18.png?alt=media" alt=""><figcaption></figcaption></figure>

## **Return Tables as Dictionary**

**Return Tables as Dictionary** — when enabled, tables will be returned as a dictionary. The key of the dictionary will be the name of the table.

<figure><img src="https://3212714295-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FI0zUnKkOuy6lWt7DZ46u%2Fuploads%2Fgit-blob-898954a6be099f58880a496bb1a8ea28591a7e67%2F2025-09-26_23-14-53.png?alt=media" alt=""><figcaption></figcaption></figure>
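The difference between the two result shapes can be pictured with a Python sketch. The input shape used here (a list of `(name, rows)` pairs) and the table name "Items" are assumptions made for illustration; only the "dictionary keyed by table name" part comes from the description above.

```python
from typing import Dict, List, Tuple

Rows = List[List[str]]


def tables_to_dict(tables: List[Tuple[str, Rows]]) -> Dict[str, Rows]:
    """Turn a list of (table name, rows) pairs into a dictionary keyed by name."""
    return {name: rows for name, rows in tables}


# Hypothetical table data used only for this example.
tables = [("Items", [["Pen", "2"], ["Paper", "10"]]), ("Totals", [["12"]])]
by_name = tables_to_dict(tables)
first_item_row = by_name["Items"][0]  # access by table name instead of position
```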
