RU2520407C1

RU2520407C1 - Method and system of text improvement at digital copying of printed documents

Info

Publication number: RU2520407C1
Application number: RU2012148763/08A
Authority: RU
Inventors: Ilya Vasilievich Kurilin; Ilya Vladimirovich Safonov
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2012-11-16
Filing date: 2012-11-16
Publication date: 2014-06-27
Also published as: RU2012148763A; KR20140063378A

Abstract

FIELD: printing.

SUBSTANCE: the invention relates to means of copying text documents. In the method, a printed document is scanned to obtain a scanned image; connected symbol regions are identified; characteristic colours are determined for groups of connected symbol regions; the contours of these regions are approximated by sequences of line segments and curve segments; the approximated contours are rasterised and their interior is filled with the corresponding characteristic colours; and the modified image is printed.

EFFECT: reduced text degradation in repeated copying of a printed document.

8 cl, 11 dwg

Description

The claimed invention relates to digital information processing, and more specifically to methods for digital copying of printed documents.

It is known from practice that copying a printed document yields a copy of lower quality than the original, especially for text. The effect is most noticeable in multiple sequential copying, when a copy obtained at the previous step is itself copied. The main causes of quality loss in color and black-and-white text are blurring of the image during scanning, halftoning of text areas during printing, and color fringing. As a result, sharp high-contrast text in the original printed document is replaced by a less sharp, halftoned copy with altered color.

The most common approach to improving the quality of a copied document is to detect categories of visual information in the scanned image, such as text, halftoned photographs and background. The detection yields image regions (connected groups of pixels), each assigned one of the selected categories. The detected regions are then processed in the way best suited to their category. For example, edge enhancement and local contrast enhancement are often applied to text regions, and adaptive smoothing is applied to halftoned photographs. However, such approaches have drawbacks that are visible in the resulting copy, for example color variation inside a character, which is especially noticeable for small characters, and distortion of character shapes over repeated copying of the original printed document.

US Patent No. 7557963 [1] describes a method of enhancing a scanned image for subsequent printing of a high-quality copy. Regions of the scanned image are identified and labeled according to predetermined categories: text, image, boundary pixels and background. Each labeled region is then enhanced according to its category. For example, the edges of a region classified as text can be enhanced with an unsharp mask.

RF Patent No. 2308166 [2] discloses a method of improving the quality of an image copy by pre-scanning the object at low resolution, storing the scanned image in computer memory, determining copy quality enhancement parameters, and scanning the object at high resolution with correction by an image processor that applies a list of copy quality enhancement procedures and parameters.

US Patent No. 8,169,661 [3] describes a method that processes the color and black-and-white regions of an image separately; the black-and-white part is compressed as much as possible, giving a high-quality copy with a small file size.

US Patent No. 7177049 [4] discloses a method of reconstructing a digital image that includes text detection and enhancement. The text enhancement targets black text on a white background and increases its sharpness and contrast by redistributing brightness between dark and light pixels within a predefined mask.

The solution closest to the claimed invention is presented in US Patent No. 7079686 [5], which describes a system that classifies the pixels of an image of a printed document and then enhances the image according to the classification results. Each image pixel is assigned a feature vector on which the classification is based. Further processing may include applying an edge-sharpening filter to pixels classified as text and a smoothing filter to pixels classified as image.

The patents described above make the text in a scanned image sharper. However, the pixel values within text characters remain unevenly distributed. Moreover, the structure and shape of characters represented as raster arrays of pixels are distorted when halftoned by the raster image processor, which can in turn distort the text in the resulting copy.

The problem addressed by the claimed invention is to develop a method that reduces the degradation of text (distortion of shape and shift of fill color) caused by repeated copying of a printed document, and that reproduces the shape and fill color of characters as closely as possible to the original. The focus is on text specifically, since approaches other than the claimed one can be applied successfully to other image regions.

The technical result is achieved by an improved method of enhancing the quality of copies of printed documents (or keeping their quality close to the original) under repeated copying. The claimed method of text enhancement in digital copying of printed documents comprises the following operations:

- a printed document is scanned to obtain a scanned image;

- connected symbol regions are identified in the scanned image;

- characteristic colors are determined for groups of connected symbol regions;

- the contours of the connected symbol regions are approximated by sequences of line segments and curve segments;

- the approximated contours are rasterized onto the scanned image, and their interior is filled with the corresponding characteristic colors;

- the modified image is printed.
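The second operation, identifying connected symbol regions, can be illustrated with a minimal sketch. It assumes a grayscale image held as nested lists, a fixed binarization threshold and 8-connectivity; none of these choices is stated in the text above, and the names `binarize` and `label_regions` are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Mark dark pixels as symbol candidates (threshold is an assumption)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def label_regions(mask):
    """Label 8-connected regions of set pixels; returns (labels, region_count)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood of one region
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy scan: two separate dark blobs on a white background.
gray = [
    [255, 255, 255, 255, 255, 255],
    [255,  20,  30, 255, 255, 255],
    [255,  25, 255, 255,  40, 255],
    [255, 255, 255, 255,  35, 255],
]
labels, count = label_regions(binarize(gray))
```

Each nonzero label marks one connected region; the background keeps label 0.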

The main advantages of the claimed method over the prior art are:

- Improved shape of the copied text through vectorization of its contour. Rasterizing symbols represented by a set of vectorized contours gives a text print quality close to that of the original printed document. This is because a vectorized symbol has, in effect, unlimited resolution and can therefore supply the rasterizer with more data than the scanned image contains. In addition, most modern raster image processors (RIP) support high-quality printing of vector graphics and render the printed symbol as an object with sharp, continuous edges.

- A uniform, identical fill color for groups of symbols, including small symbols. This is achieved by enlarging the sample used for color estimation: nearby symbols are grouped and then clustered by color.
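The benefit of a larger sample can be seen in a small sketch that pools the pixels of every symbol in a group before estimating the color. The per-channel median is an assumption made for illustration; the text above does not specify the estimator. The function name `characteristic_color` and the pixel values are hypothetical.

```python
from statistics import median


def characteristic_color(regions):
    """Pool the pixels of all regions in a group; take the per-channel median."""
    pooled = [pixel for region in regions for pixel in region]
    return tuple(int(median(channel)) for channel in zip(*pooled))


# A large symbol with stable color and a small one dominated by blurred edge pixels.
large_symbol = [(20, 22, 21)] * 40
small_symbol = [(90, 60, 70), (30, 25, 28), (20, 20, 22)]

alone = characteristic_color([small_symbol])                   # (30, 25, 28)
grouped = characteristic_color([large_symbol, small_symbol])   # (20, 22, 21)
```

Estimated alone, the small symbol drifts toward its blurred edge pixels; estimated with its group, it receives the same fill color as its neighbors.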

Thus, the claimed method comprises five stages. In the first stage, the printed document to be copied is scanned. In the second stage, the scanned image is segmented into two predetermined categories: connected symbol regions and the background region. Symbol regions include text as well as lines, tables and the like. The background region includes white or colored background as well as halftoned photographs and drawings. In the third stage, the connected symbol regions are grouped by predetermined features, and characteristic colors are computed for each group. In the fourth stage, the contour pixels of the symbols are extracted and approximated by closed sequences of vector functions. In the fifth stage, the approximated contours are rasterized onto the scanned image and their interior is filled with the corresponding characteristic color. Finally, the modified scanned image is printed.
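The fourth stage can be sketched for the straight-segment case only. The fitting algorithm is not named in this part of the description, so the sketch below substitutes Ramer-Douglas-Peucker simplification as a stand-in; it does not fit curve segments. The function names, the tolerance value and the sample contour are all assumptions.

```python
def _distance_to_segment(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


def simplify_closed(contour, tolerance):
    """Split a closed contour at the point farthest from its start, simplify both halves."""
    x0, y0 = contour[0]
    far = max(range(len(contour)),
              key=lambda i: (contour[i][0] - x0) ** 2 + (contour[i][1] - y0) ** 2)
    first_half = simplify(contour[: far + 1], tolerance)
    second_half = simplify(contour[far:] + [contour[0]], tolerance)
    return first_half[:-1] + second_half[:-1]


# Pixel-by-pixel trace of a square, with one jittered point on the top edge.
contour = [(0, 0), (1, 0), (2, 0.3), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3),
           (4, 4), (3, 4), (2, 4), (1, 4), (0, 4), (0, 3), (0, 2), (0, 1)]
corners = simplify_closed(contour, tolerance=0.5)
```

Sixteen traced contour points collapse to the four corners of the square, and the jitter below the tolerance is absorbed.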

To implement the claimed method, a system for text enhancement in digital copying of printed documents has been developed, which comprises:

- a scanning module configured to scan the original printed document and deliver the scanned image to its outputs, one output of the scanning module being connected to the input of the segmentation module and the second output being connected to the inputs of the characteristic color determination module and the rasterization module;

- a text segmentation module configured to create a binary marker image that defines the text and non-text regions of the scanned image and is delivered to the three outputs of the text segmentation module, one output being connected to the second input of the characteristic color determination module, the second output being connected to the input of the vectorization module, and the third output being connected to one of the four inputs of the rasterization module;

- a characteristic color determination module configured to detect groups of connected symbol regions whose colors differ by no more than a predetermined value and to determine characteristic colors for these groups; the output of the characteristic color determination module is connected to one of the four inputs of the rasterization module;

- a vectorization module configured to approximate the contours of connected symbol regions in the binary marker image by sequences of line segments and curve segments; the output of the vectorization module is connected to one of the four inputs of the rasterization module;

- a rasterization module configured to rasterize the approximated contours onto the scanned image and fill their interiors with the corresponding characteristic colors; the output of the rasterization module is connected to the input of the printing module;

- a printing module configured to print the modified image.
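The connections listed above form a small dataflow graph, which can be written down and checked as follows. The module keys are hypothetical shorthand for the modules named in the list; the sketch only encodes which module consumes which output and derives one valid execution order.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules whose output it consumes.
INPUTS = {
    "scan": set(),
    "segment_text": {"scan"},
    "characteristic_colors": {"scan", "segment_text"},
    "vectorize": {"segment_text"},
    "rasterize": {"scan", "segment_text", "characteristic_colors", "vectorize"},
    "print": {"rasterize"},
}

# Any topological order is a valid sequential schedule for the modules.
order = list(TopologicalSorter(INPUTS).static_order())
```

The color determination and vectorization modules depend only on the scanning and segmentation outputs, so in this reading they could run in either order or in parallel before rasterization.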

The essence of the claimed invention is explained below with reference to the drawings.

Fig. 1. Illustration of the improvement of copied text by the claimed method in comparison with the prior art.

Fig. 2. Flowchart of the main steps of the claimed method of text enhancement in digital copying of printed documents.

Fig. 3. Block diagram of a system implementing the method of text enhancement in digital copying of printed documents.

Fig. 4. Illustration of the detection of symbol regions in a scanned image.

Fig. 5. Flowchart of the estimation of characteristic colors for connected symbol regions.

Fig. 6. Illustration of determining the average color of a single connected symbol region.

Fig. 7. Illustration of the steps of merging nearby symbol regions into a group.

Fig. 8. Illustration of forming groups of nearby connected symbol regions whose colors differ by no more than a predetermined value.

Fig. 9. Illustration of clustering the characteristic colors of nearby symbol regions.

Fig. 10. Flowchart of the main steps of vectorizing the contours of connected symbol regions.

Fig. 11. Illustration of the vectorization of the contours of connected symbol regions.

An example of the operation of the claimed invention in comparison with the prior art is shown in Fig. 1, which presents a fragment 102 of scanned text obtained by scanning the original printed document at 300 dots per inch (dpi). The original printed document was produced by printing an electronic document containing a fragment 101 of source text typed in a text editor.

Text fragment 103 was obtained with a conventional prior-art copying procedure. Here, the prior-art copying process on a digital copier comprises the following operations:

- the printed document is scanned at the preferred copying resolution; the result of this step is a digital image of the printed document;

- the scanned image is prepared for printing by the necessary conversions, which may include, for example, contrast enhancement of the scanned image, gamma correction, conversion from the RGB color space to CMYK, color correction, halftoning, etc.;

- the resulting image is printed on a printing device.
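To show why the halftoning step in this conventional pipeline harms text, the sketch below applies ordered dithering with a 2x2 Bayer matrix to ink coverage values in the range 0 to 1. The dither matrix and the function name `halftone` are assumptions for illustration; real copiers use device-specific screens.

```python
BAYER_2X2 = [[0, 2], [3, 1]]


def halftone(ink, matrix=BAYER_2X2):
    """Ordered dithering: place a dot where ink coverage exceeds the tiled threshold."""
    n = len(matrix)
    levels = n * n
    return [
        [1 if value > (matrix[y % n][x % n] + 0.5) / levels else 0
         for x, value in enumerate(row)]
        for y, row in enumerate(ink)
    ]


# A solid stroke keeps every dot; a stroke blurred to 50 % coverage by scanning
# is printed as a checkerboard of dots.
solid = [[1.0] * 4 for _ in range(4)]
blurred = [[0.5] * 4 for _ in range(4)]

solid_dots = sum(map(sum, halftone(solid)))      # 16
blurred_dots = sum(map(sum, halftone(blurred)))  # 8
```

Once scanning has softened a stroke to an intermediate tone, the printer reproduces it as a dot pattern rather than a solid area, and the next copy starts from that pattern.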

Text fragment 104 illustrates a copy of the printed document made with the claimed method. In subsequent copies, conventional copying degrades the text even further (fragment 105) than copying with the proposed method (result 106).

Fig. 2 illustrates the main steps of the claimed invention. At step 201, a scanned image of the printed document to be copied is obtained. A person skilled in the art will understand that any device suitable for capturing or otherwise obtaining a raster image can be used for this purpose. The result of this step is a digital image in the form of an array of pixels. Each pixel is represented by a triplet of RGB components for a color image or by a single component for a grayscale image. At step 202, the scanned image is analyzed to detect connected symbol regions, which include text characters, lines, tables and the like. The remaining image regions, not classified as connected symbol regions, are treated as background; these include raster graphics elements with textured or non-uniform fill, drawings, halftoned photographs and the like. Here, an image region means a group of connected pixels of the raster image localized in some part of the scanned image. At step 203, the connected symbol regions are grouped by predetermined features, and characteristic colors are determined for each group found. The characteristic color of a group of connected symbol regions is a fill color that is the same for all symbols in the group, for example the color of all characters in a paragraph or a line of text. At step 204, the contour pixels of the connected symbol regions are determined and approximated by closed sequences of straight line segments and curve segments. At step 205, the approximated contours are rasterized onto the scanned image and their interior is filled with the corresponding characteristic color. Finally, the modified scanned image is printed (step 206).
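Step 205 can be sketched as a scanline fill that writes a characteristic color into the scanned image inside an approximated contour. The sketch assumes a contour made of straight segments only (curve segments would first have to be flattened), pixel-center sampling and the even-odd rule; the function name `fill_polygon` and the sample values are hypothetical.

```python
WHITE = (255, 255, 255)


def fill_polygon(image, polygon, color):
    """Even-odd scanline fill sampled at pixel centers; modifies image in place."""
    height, width = len(image), len(image[0])
    n = len(polygon)
    for y in range(height):
        yc = y + 0.5  # scanline through the pixel centers of row y
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
            if (y0 <= yc) != (y1 <= yc):  # edge crosses this scanline
                crossings.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Pixels between each pair of crossings lie inside the contour.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for x in range(width):
                if left <= x + 0.5 < right:
                    image[y][x] = color
    return image


page = [[WHITE] * 5 for _ in range(5)]          # stand-in for the scanned image
outline = [(1, 1), (4, 1), (4, 4), (1, 4)]      # approximated contour of one symbol
text_color = (20, 22, 21)                       # characteristic color of its group
fill_polygon(page, outline, text_color)
filled = sum(row.count(text_color) for row in page)
```

Every pixel inside the contour receives exactly the same color, which is what removes the uneven pixel values left by scanning.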

The claimed method copies printed documents effectively in the sense that it reduces the degradation of symbols (distortion of shape and shift of fill color) during copying, which is especially noticeable after repeated copying of a printed document. The shape of a symbol is preserved by vectorizing its contour. When symbols represented by a set of vectorized contours are rasterized, the raster image processor (RIP) in the printing device has, in effect, unlimited resolution of these symbols at its disposal. In addition, most modern raster image processors support high-quality printing of vector graphics and render the printed symbol as an object with sharp, continuous edges. The fill color is preserved by filling the interior of the rasterized contours of all symbols in one group with the same characteristic color. The characteristic color is estimated over the whole group of symbols and is therefore more objective and stable, owing to the relatively large sample, than a color estimate made for each symbol independently.

Fig. 3 schematically illustrates a system that implements the claimed method. The system for text improvement at digital copying of printed documents includes the following modules. A scanning module 301 is configured to scan the original printed document and to transmit the scanned image to the segmentation module, the characteristic color determination module and the rasterization module. A text segmentation module 302 is configured to create a marker binary image that defines the text and non-text areas of the scanned image; its input is the scanned image from the scanning module, and its output is the marker binary image, which is transmitted to the characteristic color determination module, the vectorization module and the rasterization module. A characteristic color determination module 303 is configured to detect groups of connected symbol regions whose colors differ by no more than a predetermined value and to determine the characteristic colors of these groups; its inputs are the marker binary image from the segmentation module and the scanned image from the scanning module, and its output is the characteristic color values for the corresponding groups of connected symbol regions, which are transmitted to the rasterization module. A vectorization module 304 is configured to approximate the contours of the connected symbol regions on the marker binary image by sequences of line segments and curve segments; its input is the marker binary image from the text segmentation module, and its output is the approximating sequences of line segments and curve segments, which are transmitted to the rasterization module. A rasterization module 305 is configured to rasterize the approximated contours on the scanned image and to fill their inner areas with the corresponding characteristic colors; its inputs are the scanned image from the scanning module, the sequences of line segments and curve segments from the vectorization module, and the characteristic colors for the corresponding connected symbol regions from the characteristic color determination module; its output is the modified image, which is then transmitted to a printing module 306 that prints this image.

All of the listed modules can be implemented as systems on a chip (SoC), field-programmable gate arrays (FPGA), or application-specific integrated circuits (ASIC). The operation of the modules is clear from their description and from the description of the corresponding method.

Fig. 4 illustrates an example of a scanned image containing two categories of visual information: background 401, which includes a picture or photograph 403, and symbol regions represented by text 402 and table 404. The result of step 202, in which connected symbol regions are detected, i.e. the scanned image is segmented into the two categories named above, is a marker binary image 405 whose non-zero pixels correspond to the text pixels of the scanned image. Step 202 can be carried out by any suitable method known from the prior art. For example, the method described in the article "A.M. Vil'kin, I.V. Safonov, M.A. Egorova, "Bottom-up Document Segmentation Method Based on Textural Features", Pattern Recognition and Image Analysis, vol. 21, No. 3, pp. 565-568, 2011" [6] can be used for this purpose, or the method for segmenting scanned images described in "Jonghyon Yi; Sunghyun Lim; Document image enhancement algorithm for digital color copier. Proc. SPIE 5293, Color Imaging IX: Processing, Hardcopy, and Applications, 57 (December 18, 2003)" [7].

Fig. 5 illustrates the main stages of step 203, in which the characteristic colors for groups of connected symbol regions are determined. At stage 501, the connected symbol regions described by the marker binary image are labeled. To do this, connected regions of pixels are extracted from the binary image and each is assigned a unique identification label that marks its pixels as belonging to the region of a given symbol. In addition, parameters of each labeled region, such as its bounding rectangle and its area, are determined at this stage. At stage 502, the mean color of each extracted connected symbol region is estimated independently of the other regions. At stage 503, closely spaced connected symbol regions are combined into groups. At stage 504, the mean color of each group is estimated. If a group of closely spaced symbol regions is characterized by several colors, the group is split into smaller groups. In the simplest case, when the original printed document contains text of a single color, each group of closely spaced symbol regions is described by a single color value. The result of stage 504 is the groups of closely spaced connected symbol regions and the mean color value corresponding to each symbol. At stage 505, the groups are merged in the chosen color space by clustering the corresponding mean color values into compact clusters. At stage 506, the centers of the resulting clusters are chosen as the characteristic colors for the groups of connected symbol regions that correspond to these clusters.
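The labeling of stage 501 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the patent: the function name `label_regions`, the use of 8-connectivity and the nested-list image representation are all choices made here for the example.

```python
from collections import deque

def label_regions(mask):
    """Label connected regions of non-zero pixels in a binary mask.

    Assumption: 8-connectivity (the text does not fix the connectivity).
    Returns (labels, regions): labels is a 2-D list of region ids
    (0 = background); regions maps id -> bounding box and area.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    regions = {}
    next_id = 1
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            # Breadth-first flood fill from an unlabeled text pixel.
            labels[y][x] = next_id
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_id
                            queue.append((nx, ny))
            # bbox is (x_min, y_min, x_max, y_max), inclusive.
            regions[next_id] = {"bbox": (x0, y0, x1, y1), "area": area}
            next_id += 1
    return labels, regions
```

The bounding boxes and areas returned here are the per-region parameters that the later grouping stages rely on.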

Fig. 6 shows stage 502, in which the mean color of the analyzed connected symbol region is estimated. The values of the pixels that belong to the analyzed labeled connected region 602 according to the marker binary image are extracted from the scanned image 601. The result of extracting the symbol pixels from the scanned image is illustrated by fragment 603; it contains only the pixels of the symbol region, without the background. The color of a symbol is estimated from its inner part only. This reduces the color estimation error, because the boundary pixels, which are most affected by negative effects such as blurring during scanning and color fringing, are discarded. Any suitable method can be used to find the inner pixels of a symbol, for example applying morphological erosion with a given structuring element to the connected symbol region on the marker binary image. In a preferred embodiment of the claimed method, the inner pixels of a symbol are selected by the following operations:

- the general color attribute of the symbol is determined as either a dark symbol on a light background or a light symbol on a dark background by comparing the mean brightness of the symbol pixels with the mean brightness of the background pixels within the bounding rectangle of the analyzed symbol region. If the mean brightness of the symbol pixels is lower than the mean brightness of the background pixels, the symbol is classified as a dark symbol on a light background, otherwise as a light symbol on a dark background;

- a color scanned image is converted to grayscale. For symbols classified as dark on a light background, the transformation Y = min{R, G, B} is used, where min selects the minimum value, R, G, B are the components of the color value in RGB space, and Y is the brightness of the grayscale pixel. For symbols classified as light on a dark background, the transformation Y = max{R, G, B} is used, where max selects the maximum value;

- the converted pixel values of the symbol region are then divided into groups of dark and light pixels by comparing them with a threshold. In a preferred embodiment of the claimed invention, the threshold is computed by Otsu's method (N. Otsu, "A threshold selection method from grey level histogram", IEEE Transactions on System Man Cybernetics, vol. 9, no. 1, 1979, pp. 62-66) [8];

- for a symbol region classified as a dark symbol on a light background, the inner pixels of the symbol are taken to be those in the group of dark pixels. The selection of the inner pixels of a symbol is illustrated in Fig. 6 by fragment 604. For a symbol classified as a light symbol on a dark background, the inner pixels of the symbol are taken to be those in the group of light pixels.

As illustrated, the mean color estimate 605 for the analyzed connected symbol region is obtained by averaging the pixels of the scanned image that were selected as inner pixels 604 of that region.
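The operations of stage 502 listed above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes 8-bit RGB tuples, takes brightness as the plain mean of R, G and B (the text does not define it), and falls back to all symbol pixels when the thresholding leaves nothing; the function names are hypothetical.

```python
def otsu_threshold(values):
    """Otsu threshold for integers in 0..255 (maximizes between-class variance).

    Values <= threshold form the dark class, values > threshold the light class.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0
    sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def mean_brightness(pixels):
    # Assumption: brightness is the plain mean of R, G and B.
    return sum(sum(p) / 3 for p in pixels) / len(pixels)

def mean_symbol_color(symbol_pixels, background_pixels):
    """Estimate the symbol color from its inner pixels only.

    Both arguments are lists of (R, G, B) tuples taken inside the
    bounding rectangle of the analyzed region.
    """
    dark_on_light = mean_brightness(symbol_pixels) < mean_brightness(background_pixels)
    # Y = min{R, G, B} for dark symbols, Y = max{R, G, B} for light ones.
    gray = [min(p) if dark_on_light else max(p) for p in symbol_pixels]
    t = otsu_threshold(gray)
    if dark_on_light:
        inner = [p for p, y in zip(symbol_pixels, gray) if y <= t]
    else:
        inner = [p for p, y in zip(symbol_pixels, gray) if y > t]
    if not inner:  # uniform region: nothing to discard
        inner = symbol_pixels
    n = len(inner)
    return tuple(sum(p[c] for p in inner) / n for c in range(3))
```

With four pure dark pixels and two blurred edge pixels on a white background, the edge pixels fall into the light class and are left out of the average, which is the effect the text describes.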

Fig. 7 illustrates stage 503, in which closely spaced connected symbol regions are combined into groups. The creation of each group of closely spaced symbol regions begins with a first (starting) region 701, chosen at random or according to a predefined rule. In a preferred embodiment of the claimed method, the region with the largest area is chosen as the starting symbol region of a group. Next, the Euclidean distance 702, 704 between the nearest vertices of the bounding rectangles of neighboring symbol regions is computed. If this distance is less than a predetermined threshold, the regions are combined into one group. As illustrated, the nearby regions 703 and 705 are attached to the current group at this stage. At the next merging stage, these newly attached regions 706, 708 are taken as starting regions, and the nearest neighboring symbol regions 707, 709 whose distance is below the threshold are found for them in the same way. The procedure is repeated until no symbol regions remain that are closer to the symbols of the group than the threshold distance. A new group is then created. This continues until no symbol region remains that is not included in one of the groups.
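The region growing of stage 503 can be sketched as below. This is an illustrative sketch only; it assumes that "nearest vertices" means the closest pair of bounding-rectangle corners, that regions are given as a dictionary of bounding boxes and areas, and that ties are broken by region id. The names `corner_distance` and `group_regions` are hypothetical.

```python
from math import hypot

def corner_distance(a, b):
    """Smallest Euclidean distance between a corner of box a and a corner of box b.

    Boxes are (x_min, y_min, x_max, y_max).
    """
    ca = [(a[0], a[1]), (a[2], a[1]), (a[0], a[3]), (a[2], a[3])]
    cb = [(b[0], b[1]), (b[2], b[1]), (b[0], b[3]), (b[2], b[3])]
    return min(hypot(p[0] - q[0], p[1] - q[1]) for p in ca for q in cb)

def group_regions(regions, max_dist):
    """Combine closely spaced regions into groups.

    regions: dict id -> {"bbox": (...), "area": n}.
    Returns a list of groups, each a list of region ids.
    """
    unassigned = set(regions)
    groups = []
    while unassigned:
        # Preferred embodiment: start each group from the largest free region.
        seed = max(sorted(unassigned), key=lambda r: regions[r]["area"])
        unassigned.remove(seed)
        group, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [r for r in sorted(unassigned)
                    if corner_distance(regions[current]["bbox"],
                                       regions[r]["bbox"]) < max_dist]
            for r in near:
                # Newly attached regions become starting regions in turn.
                unassigned.remove(r)
                group.append(r)
                frontier.append(r)
        groups.append(group)
    return groups
```

Raising `max_dist` merges words and lines into fewer, larger groups; lowering it yields more groups with fewer symbols each.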

An example of creating groups of nearby symbol regions is shown in Fig. 8. Five groups of symbol regions are defined for text fragment 802. The number of groups may differ depending on the chosen threshold that sets the maximum distance between symbol regions to be combined into a group. Compared with estimating the mean color of individual symbols, each group provides enough sample data to estimate, at stage 504, the mean color of the symbols in the group. In the simplest case, when the original printed document contains text of a single color, each group of closely spaced symbol regions is described by one mean color value characteristic of that group. Here the characteristic color of a group means the mean of the RGB color components of the symbol regions that belong to the group and whose mutual Euclidean distance in the RGB color space does not exceed a predetermined value. To detect the situation in which the closely spaced symbol regions of a group have different mean colors, i.e. the group has several characteristic colors, the following operations are performed at stage 504: the connected symbol region that belongs to the current group and has the largest area among the regions of the group is selected; the characteristic color of the group is set equal to the mean color of that region; the next connected region of the current group is selected and its mean color is compared with the characteristic color of the group by computing the Euclidean distance between them in the RGB color space and checking whether it exceeds a predetermined threshold T1. If the threshold is not exceeded, the selected symbol region is considered to match the characteristic color of the group. In this case the characteristic color of the group is updated by averaging the mean colors of all symbol regions assigned to that characteristic color. If the threshold is exceeded, the current group is considered to have more than one characteristic color, and the group of symbol regions is split into smaller groups. This splitting of the original group of closely spaced symbol regions continues until each of the smaller groups is associated with exactly one characteristic color. Fig. 8 illustrates an example of splitting groups of closely spaced symbol regions. The parts of groups that belong to different characteristic colors are shown in the figure as filled rectangles of different brightness. For example, group 801 is split into two groups 804 and 805.
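One possible reading of the splitting performed at stage 504 is sketched below. It is an interpretation, not the patented procedure itself: regions that fail the T1 test are set aside and processed again as a new, smaller group, starting from the largest of them. The function name and the `(region_id, area, mean_rgb)` tuple layout are assumptions made for the example.

```python
from math import dist

def split_group_by_color(members, t1):
    """Split one spatial group into sub-groups with a single characteristic color.

    members: list of (region_id, area, mean_rgb).
    Returns a list of (region_ids, characteristic_rgb).
    """
    remaining = sorted(members, key=lambda m: -m[1])  # largest area first
    result = []
    while remaining:
        seed = remaining[0]
        ids, colors = [seed[0]], [seed[2]]
        characteristic = seed[2]
        leftover = []
        for rid, area, rgb in remaining[1:]:
            if dist(characteristic, rgb) <= t1:
                ids.append(rid)
                colors.append(rgb)
                # Update: average of the mean colors of all assigned regions.
                characteristic = tuple(
                    sum(c[i] for c in colors) / len(colors) for i in range(3))
            else:
                leftover.append((rid, area, rgb))
        result.append((ids, characteristic))
        remaining = leftover  # regions of other colors form smaller groups
    return result
```

For a group that mixes near-black and red symbols, the call returns two sub-groups, each with its own averaged characteristic color.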

In other embodiments of the claimed method, stages 504 and 505 can be combined into one, so that finding closely spaced symbol regions and checking that they match a single characteristic color are performed simultaneously.

Fig. 9 illustrates stage 505, in which the characteristic colors of the groups are clustered and the characteristic colors for the connected symbol regions are computed. The groups are shown in the figure as circles whose fill brightness is chosen in the same way as in the depiction of the groups of symbol regions in Fig. 8. The radius of each circle is proportional to the number of symbol regions in the corresponding group. Clustering begins with the group that contains the largest number of symbol regions. In the illustration, clustering begins with the characteristic color value 902, which corresponds to group 807. Clustering is performed by computing the Euclidean distance between the compared characteristic colors of the groups in the chosen color space. If the computed distance is less than a predetermined threshold T2, the compared colors are merged into one cluster. In a preferred embodiment of the claimed method, the value of threshold T1 exceeds the value of threshold T2. According to Fig. 8 and Fig. 9, the characteristic color 903 of group 805 is the closest to the starting characteristic color 902 of group 807. The clustering procedure continues until all characteristic colors of groups that lie less than the threshold distance T2 from one another have been merged into one cluster. To prevent excessive growth of the clusters, the maximum size of each cluster 905 is limited by the distance to its center 904. The center of a cluster is computed as the mean color of all connected symbol regions whose groups are included in the cluster, and this value is updated each time the cluster changes. At the end of stage 505, the centers of the resulting clusters are chosen as the characteristic colors for the connected symbol regions that correspond to these clusters.
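The clustering of stage 505 can be approximated by a simple greedy scheme, sketched below. This is a simplification under stated assumptions rather than the procedure of the patent: each group color is compared with the current cluster centers, so the threshold T2 also serves as the limit on the distance to the center, and the center is the mean of the group colors weighted by their number of symbol regions. The function name and data layout are hypothetical.

```python
from math import dist

def cluster_group_colors(groups, t2):
    """Merge group colors closer than t2 into clusters.

    groups: list of (group_id, n_regions, rgb).
    Returns a list of clusters: {"center", "weight", "members"}.
    """
    clusters = []
    # Start from the group that contains the most symbol regions.
    for gid, n, rgb in sorted(groups, key=lambda g: -g[1]):
        best = None
        for c in clusters:
            d = dist(c["center"], rgb)
            if d < t2 and (best is None or d < best[0]):
                best = (d, c)
        if best is None:
            clusters.append({"center": tuple(float(v) for v in rgb),
                             "weight": n, "members": [gid]})
        else:
            c = best[1]
            w = c["weight"] + n
            # Re-compute the center on every change of the cluster.
            c["center"] = tuple((cv * c["weight"] + v * n) / w
                                for cv, v in zip(c["center"], rgb))
            c["weight"] = w
            c["members"].append(gid)
    return clusters
```

The `center` of each returned cluster plays the role of the characteristic color assigned to every symbol region of the member groups.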

Fig. 10 schematically illustrates the main stages of step 204, in which the contour pixels of the connected symbol regions are determined and approximated by closed sequences of straight line segments and curve segments. Step 204 uses the marker binary image that describes the connected symbol regions. At stage 1001, each contour of the analyzed closed symbol region is traced, including the outer and inner contours. Contour tracing starts from some starting point on the contour and proceeds along the contour in a predetermined direction until the starting point is reached again. A symbol region can be bounded by only one outer contour. There may be several inner contours, or none. At this stage the contour is a closed sequence of points connected by segments one pixel long, so it can be treated as a polygon. After the contours of the analyzed region have been traced, stage 1002 reduces the number of contour elements (polygon vertices) by finding the most significant inflection points of the contour. Finding the inflection points corresponds to finding the optimal polygonal approximation of the contour for a given approximation error. The approximation error is calculated as the sum of the squared distances from each point of the approximated portion of the contour to the corresponding approximating line. At stage 1003, the simplified contour, represented by a polygon, is approximated by a sequence of straight line segments and curve segments. In the preferred embodiment of the claimed method, cubic Bezier curves are used as the approximating curves.

In general, approximation by straight line segments involves determining the coordinates of their start and end points, while approximation by curve segments described by Bezier curves involves the coordinates of two control points together with the start and end points of each segment. Fig. 11 illustrates an example of approximating a fragment of a connected symbol region. Vertex 1103 of the approximating polygon, located between polygon edges 1101-1103 and 1103-1106, can be approximated by segment 1105 of a cubic Bezier curve bounded by points 1102 and 1104, which are the midpoints of those edges. An example of the approximation of connected symbol region 1107 is shown as 1108.
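The approximation error of stage 1002 and the corner rounding of stage 1003 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the claimed method. The function names are hypothetical, the distance is measured to the infinite line through the two polygon vertices, and the placement of the two control points (two thirds of the way from each edge midpoint towards the vertex, which reproduces a quadratic curve controlled by the vertex) is an assumption: the description fixes only the segment end points at the edge midpoints.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def approximation_error(points: List[Point], a: Point, b: Point) -> float:
    """Sum of squared distances from contour points to the line through a and b."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    total = 0.0
    for px, py in points:
        if length_sq == 0.0:
            # Degenerate line (a == b): fall back to the distance to the point a.
            total += (px - ax) ** 2 + (py - ay) ** 2
        else:
            cross = dx * (py - ay) - dy * (px - ax)
            total += cross * cross / length_sq
    return total


def lerp(p: Point, q: Point, t: float) -> Point:
    """Point at fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def corner_bezier(prev_v: Point, v: Point, next_v: Point,
                  pull: float = 2.0 / 3.0) -> Tuple[Point, Point, Point, Point]:
    """Cubic Bezier segment rounding vertex v between the midpoints of its edges.

    Returns (start, control1, control2, end). `pull` is an assumed parameter
    that moves the control points from the midpoints towards the vertex.
    """
    start = lerp(prev_v, v, 0.5)   # midpoint of edge prev_v-v (cf. point 1102)
    end = lerp(v, next_v, 0.5)     # midpoint of edge v-next_v (cf. point 1104)
    return start, lerp(start, v, pull), lerp(end, v, pull), end


def bezier_point(segment: Tuple[Point, Point, Point, Point], t: float) -> Point:
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]."""
    p0, p1, p2, p3 = segment
    u = 1.0 - t
    weights = (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)
    x = sum(w * p[0] for w, p in zip(weights, (p0, p1, p2, p3)))
    y = sum(w * p[1] for w, p in zip(weights, (p0, p1, p2, p3)))
    return (x, y)
```

For a right-angle corner at (2, 0) between vertices (0, 0) and (2, 2), `corner_bezier` yields a segment running from (1, 0) to (2, 1) that stays inside the corner, so the sharp polygon vertex is replaced by a smooth arc.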

Rasterizing the resulting image on the printing device includes the following stages: the symbol regions in the original scanned image are retouched according to the marker binary image by estimating the average color of the background surrounding the current symbol region and replacing the pixels of that region in the scanned image with this color; the modified scanned image is rasterized according to the printer settings and parameters; the approximated contours of the symbol regions are rasterized onto that image, with their interiors filled with the corresponding characteristic colors.
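The retouching stage can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation from the description: the function name `retouch_region` is hypothetical, images are plain nested lists of RGB tuples, and "surrounding background" is taken to mean the non-symbol pixels inside the region's bounding box enlarged by an assumed `margin`, since the text does not specify the neighbourhood.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]


def retouch_region(image: List[List[Color]],
                   mask: List[List[bool]],
                   region_pixels: List[Tuple[int, int]],
                   margin: int = 1) -> List[List[Color]]:
    """Return a copy of image with one symbol region painted over by background.

    image: rows of (r, g, b) tuples from the scanned image.
    mask: marker binary image, True where a pixel belongs to any symbol.
    region_pixels: (row, col) coordinates of the current symbol region.
    """
    height, width = len(image), len(image[0])
    rows = [r for r, _ in region_pixels]
    cols = [c for _, c in region_pixels]
    top = max(min(rows) - margin, 0)
    bottom = min(max(rows) + margin, height - 1)
    left = max(min(cols) - margin, 0)
    right = min(max(cols) + margin, width - 1)

    # Background samples: pixels near the region that are not marked as symbols.
    background = [image[r][c]
                  for r in range(top, bottom + 1)
                  for c in range(left, right + 1)
                  if not mask[r][c]]

    result = [list(row) for row in image]
    if not background:
        return result  # no background found nearby, leave the region as it is

    count = len(background)
    mean = tuple(round(sum(p[ch] for p in background) / count) for ch in range(3))
    for r, c in region_pixels:
        result[r][c] = mean
    return result
```

After this step the symbol no longer appears in the scanned image, so the vector contours rasterized later are drawn over a clean background rather than over the blurred original glyph.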

The claimed invention is intended for implementation in black-and-white and color multifunction printing devices and digital copiers. The method can also be implemented as part of the software of scanning devices.

It is clear to those skilled in the art that various embodiments, additions and substitutions are possible without departing from the scope and spirit of the present invention as disclosed in the appended claims.

Claims

1. A method for improving text in digital copying of printed documents, comprising the following operations:
- scanning a printed document to obtain a scanned image;
- detecting connected symbol regions in the scanned image;
- determining characteristic colors for groups of connected symbol regions;
- approximating the contours of the connected symbol regions using sequences of line segments and curve segments;
- rasterizing the approximated contours onto the scanned image, filling their interiors with the corresponding characteristic colors;
- printing the modified image.

2. The method according to claim 1, characterized in that, based on the results of detecting the connected symbol regions in the scanned image, a marker binary image is created that defines the connected symbol regions in the scanned image.

3. The method according to claim 1, characterized in that the characteristic colors for the connected symbol regions are determined by performing the following operations:
- labeling the connected regions in the marker binary image;
- determining, in the scanned image, the color of the symbol regions corresponding to the labeled connected regions in the marker binary image;
- grouping closely spaced connected symbol regions that differ in color by no more than a predetermined value;
- determining the average color value for each of these groups of connected symbol regions;
- merging these groups by clustering their average color values;
- selecting the centers of the resulting clusters as the characteristic colors for the groups of connected symbol regions corresponding to these clusters.

4. The method according to claim 1, characterized in that the contours of the connected symbol regions are approximated using sequences of line segments and curve segments by performing the following operations:
- tracing the points of the outer and inner contours of each connected region in the marker binary image;
- simplifying the contours of the connected regions by selecting the inflection points of each contour;
- approximating the simplified contours of the connected symbol regions using sequences of line segments and curve segments.

5. The method according to any one of claims 1 to 3, characterized in that closely spaced connected symbol regions that differ in color by no more than a predetermined value are grouped by performing the following operations:
- calculating the Euclidean distance in the RGB color space between the average values of the color components of the closely spaced connected symbol regions being compared;
- grouping these symbol regions if the Euclidean distance between their average color values does not exceed a predetermined value.

6. The method according to any one of claims 1 to 3, characterized in that the groups are merged by clustering their average color values by performing the following operations:
- selecting the group of closely spaced connected symbol regions that contains the largest number of pixels of the scanned image;
- taking the average color of the selected group as the center of the cluster;
- including in the current cluster another group of closely spaced connected symbol regions for which the Euclidean distance between its average color and the center of the cluster does not exceed a predetermined value;
- adjusting the center of the cluster by calculating a new average color value of the groups that make up the cluster;
- repeating these operations until all groups of closely spaced connected symbol regions are included in the corresponding clusters.
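
For illustration only, the clustering operations listed in claim 6 can be sketched in Python as below. The function name `cluster_groups` and the data layout are hypothetical. Two details are assumptions not fixed by the claim: the adjusted center is computed as a pixel-weighted average of the member groups, and candidate groups are visited in order of decreasing pixel count, with the center updated after each inclusion.

```python
import math
from typing import List, Tuple

Group = Tuple[Tuple[float, float, float], int]  # (average RGB color, pixel count)


def cluster_groups(groups: List[Group], threshold: float):
    """Greedy color clustering; returns a list of (center_rgb, member_indices)."""
    # Unassigned group indices, largest group first.
    remaining = sorted(range(len(groups)),
                       key=lambda i: groups[i][1], reverse=True)
    clusters = []
    while remaining:
        seed = remaining.pop(0)              # group with the most pixels
        members = [seed]
        center = tuple(float(v) for v in groups[seed][0])
        still_free = []
        for i in remaining:
            # Euclidean distance in RGB between the group color and the center.
            if math.dist(center, groups[i][0]) <= threshold:
                members.append(i)
                total = sum(groups[m][1] for m in members)
                # Assumed: the new center is the pixel-weighted mean of members.
                center = tuple(
                    sum(groups[m][0][ch] * groups[m][1] for m in members) / total
                    for ch in range(3))
            else:
                still_free.append(i)
        remaining = still_free
        clusters.append((center, members))
    return clusters
```

The cluster centers returned by such a procedure would play the role of the characteristic colors used to fill the rasterized contours.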

7. The method according to claims 1 to 4, characterized in that the simplified contours of the connected symbol regions are approximated using sequences of line segments and curve segments by determining the coordinates of the start and end points for the line segments, and the coordinates of two control points and of the start and end points for the curve segments described by cubic Bezier curves.

8. A text improvement system that implements the method according to claim 1, including:
- a scanning module, configured to scan the original printed document and deliver the scanned image to its outputs, wherein one output of the scanning module is connected to the input of the text segmentation module, and the second output of the scanning module is connected to the inputs of the characteristic color determination module and the rasterization module;
- a text segmentation module, configured to create a marker binary image that defines text and non-text areas of the scanned image and is delivered to the three outputs of the text segmentation module, wherein one output of the text segmentation module is connected to the second input of the characteristic color determination module, the second output of the text segmentation module is connected to the input of the vectorization module, and the third output of the text segmentation module is connected to one of the four inputs of the rasterization module;
- a characteristic color determination module, configured to identify groups of connected symbol regions that differ in color by no more than a predetermined value and to determine characteristic colors for these groups; the output of the characteristic color determination module is connected to one of the four inputs of the rasterization module;
- a vectorization module, configured to approximate the contours of the connected symbol regions in the marker binary image using sequences of line segments and curve segments; the output of the vectorization module is connected to one of the four inputs of the rasterization module;
- a rasterization module, configured to rasterize the approximated contours onto the scanned image, filling their interiors with the corresponding characteristic colors; the output of the rasterization module is connected to the input of the print module;
- a print module, configured to print the modified image.