RU2446464C2

RU2446464C2 - Method and system for embedding and extracting hidden data in printed documents

Info

Publication number: RU2446464C2
Application number: RU2010117994/08A
Authority: RU
Inventors: Ilya Vasilyevich Kurilin (RU); Ilya Vladimirovich Safonov (RU)
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2010-05-06
Filing date: 2010-05-06
Publication date: 2012-03-27
Also published as: RU2010117994A

Abstract

FIELD: information technology.

SUBSTANCE: the system for embedding hidden data in a printed document includes an image forming module, a point combination sequence generator module, a data embedding module and an output module. The system for extracting hidden data from a printed document includes an image capture module, a point configuration detection module, a module for calculating the distribution of occurrences of unit bit values, a hidden data extraction module and an output module. The methods for embedding hidden data in, and extracting it from, a printed document describe the operation of these systems.

EFFECT: high resistance of embedded messages to slight modifications.

16 cl, 12 dwg

Description

The claimed invention relates to the field of digital image processing, and more particularly to the protection of printed documents.

Considerable importance is currently attached to developing ways of protecting copyright and confidential information. One of the most common approaches is to embed invisible digital watermarks or messages into the medium that carries the protected information, for example an image, audio or video signal. Similar embedded marks are also used for analogue media, such as a document printed on paper. This is done, in particular, to prevent counterfeiting or unauthorized modification of printed documents, to identify them, to control the circulation of documents within an organization, and so on.

Many methods are known for protecting printed documents, for example watermarked paper, security fibers, holograms or special inks. Their relatively high cost and the need for special equipment hinder the widespread use of such techniques. There are also cases in which a printed document must be marked in order to discreetly convey additional digital information that makes it easier to verify the document's authenticity. A slight modification of the document that embeds a unique hidden digital message, invisible to the naked eye and destroyed by copying, therefore provides a useful and economical mechanism for subsequently authenticating the document.

Various solutions are known in the art for protecting against copying, controlling the copying of documents and establishing their authenticity by embedding security information directly into the protected document. However, most existing solutions target the embedding of hidden information in multimedia documents or digital images, and they cannot be applied directly to printed documents because printing, rasterization, scanning and similar processes are difficult to formalize.

One solution to the problem of protecting printed documents is described in US Patent Application Publication No. 20090125723 [1], which discloses a method for encoding digital data on the surface of a document using infrared ink. The coded surface consists of densely packed, adjacent tags. The surface is coded so that the capture field is large enough to guarantee that an entire tag is read successfully and that the area containing the tag is identified. The method allows information to be read reliably from the document surface with a special pen-shaped reader. Its drawback is the need for special devices to print and read the information.

US Patent Application Publication No. 20080292129 [2] proposes a method for adding information to a printed document by inserting special information marks at predetermined positions. The marks, each consisting of a set of dots, are placed in blank areas of the document. The image is then converted for printing and printed.

The closest prior art to the claimed invention is the solution described in US Patent Application Publication No. 20090021795 [3]. It proposes a method for inserting identification marks at pseudo-random positions in a document, together with a system for implementing the method. The marks are clusters of black dots placed in white areas of the document or of white dots placed in black areas. Marks created in this way are expected to withstand changes in image contrast and the rasterization performed when a document is faxed. The drawback of this method is a high probability of losing the information contained in the marks when the document is even slightly damaged.

The object of the claimed invention is to develop a method and a system for embedding hidden data in printed documents and extracting it that are more effective than the prototype. In particular, the aim is to increase the robustness of embedded messages to minor modifications, so that the data can be read even from a damaged document.

The technical result is achieved by two interrelated methods and systems: one embeds digital information into a document at the printing stage, and the other extracts that information at the scanning stage.

Within a group of inventions linked by a single inventive concept, a method is claimed for embedding hidden data in a printed document by applying marks in the form of groups of barely visible dots, characterized in that the following operations are performed:

- convert (rasterize) the original document into a bitmap image;

- generate an orientation-invariant, damage-resistant sequence of point configurations: the data to be embedded are divided into portions, each portion is encoded by one point configuration, and each point configuration consists of several dots of the minimum printable size arranged with at least two-sided symmetry;

- determine all possible positions in the rasterized image for embedding point configurations;

- embed the point configurations at the detected positions;

- print the rasterized image with the embedded data.
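The step of dividing the data into portions can be pictured with the following minimal Python sketch. It assumes a hypothetical fixed payload of `payload_bits` bits per portion and keeps each portion's sequence number (the unique number mentioned in the description of FIG. 2) as a separate integer; how the number and the payload are actually laid out as dots (FIG. 4) is not reproduced here.

```python
def split_into_portions(message: bytes, payload_bits: int = 8) -> list[tuple[int, list[int]]]:
    """Split a message into numbered data portions.

    Hypothetical layout: each portion is (sequence_number, payload), where the
    payload holds `payload_bits` message bits. The last portion is zero-padded.
    """
    # Expand the bytes into a flat list of bits, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    portions = []
    for index, start in enumerate(range(0, len(bits), payload_bits)):
        chunk = bits[start:start + payload_bits]
        chunk += [0] * (payload_bits - len(chunk))  # pad the final portion
        portions.append((index, chunk))
    return portions


portions = split_into_portions(b"Hi", payload_bits=6)
```

With a 6-bit payload, the 16 bits of `b"Hi"` yield three portions numbered 0 to 2, the last one padded with two zero bits.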

To embed data in a printed document, configurations of black dots (for black-and-white documents) or colored dots (for color documents) are used, each dot having the minimum printable size supported by the settings of the printing equipment, such as a printer. This makes the modification of the printed document practically invisible to the naked eye.
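For a sense of scale, the sketch below computes the side length of one device pixel at a given print resolution. It assumes, as a simplification, that the minimum printable dot equals one pixel of the printer's raster grid; real devices may not reproduce isolated single pixels reliably.

```python
MM_PER_INCH = 25.4


def dot_size_um(dpi: int) -> float:
    """Side length, in micrometres, of one raster pixel at `dpi` dots per inch."""
    return MM_PER_INCH / dpi * 1000.0


size_600 = dot_size_um(600)    # about 42.3 micrometres
size_1200 = dot_size_um(1200)  # about 21.2 micrometres
```

At 600 dpi a single-pixel dot is therefore about 42 µm across, which is why such dots are hard to notice without magnification.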

The structure of the embedded point configurations imposes no requirements on the order in which they are placed in the document. Point configurations can be located in any blank area of the document, or in an area partly occupied by printed content, independently of one another and in arbitrary order.

The symmetrical arrangement of dots within each configuration makes the method robust to rotation and skew of the document during scanning.
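The symmetry property can be checked mechanically. The sketch below represents a configuration as a set of dot offsets from its centre and tests whether it is unchanged by a mirror reflection or a 180° rotation. Which of these transforms the phrase "at least two-sided symmetry" refers to is our interpretation, and the example configurations are hypothetical rather than those shown in FIG. 7.

```python
from collections.abc import Callable

Point = tuple[int, int]  # (dx, dy) offset of a dot from the configuration centre


def mirror_x(p: Point) -> Point:
    """Reflect across the vertical axis."""
    return (-p[0], p[1])


def mirror_y(p: Point) -> Point:
    """Reflect across the horizontal axis."""
    return (p[0], -p[1])


def rotate_180(p: Point) -> Point:
    """Rotate by 180 degrees about the centre."""
    return (-p[0], -p[1])


def is_invariant(config: set[Point], transform: Callable[[Point], Point]) -> bool:
    """True if the transformed dots coincide exactly with the original dots."""
    return {transform(p) for p in config} == config


# Hypothetical configurations for illustration only.
square = {(-2, -2), (2, -2), (-2, 2), (2, 2), (0, 0)}
arrow = {(-2, 0), (2, 0), (0, 1)}
```

Here `square` is invariant under all three transforms, while `arrow` survives only the reflection across the vertical axis; a configuration of the first kind looks the same to a detector whether or not the page was scanned upside down.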

The claimed method for embedding digital information in a printed document guarantees the maximum capacity of the embedded message for the overwhelming majority of text documents.

Repeating the embedded information many times and extracting it statistically allows the bits of the extracted data to be replaced by relative weights equal to the frequencies (probabilities) with which each bit takes the value one. This makes the method robust to minor modifications of the printed document and to noise. For example, half of an original A4 document is enough to extract the hidden message just as from the whole, undamaged document.
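The replacement of bit values by relative weights can be sketched as follows for a single data portion that was detected several times. The decision threshold of 0.5, and sending exact ties to zero, are assumptions made for this illustration.

```python
def bit_weights(observations: list[list[int]]) -> list[float]:
    """Fraction of detected copies in which each bit position equals one."""
    if not observations:
        raise ValueError("no copies of this portion were detected")
    n = len(observations)
    width = len(observations[0])
    return [sum(obs[i] for obs in observations) / n for i in range(width)]


def decide_bits(weights: list[float], threshold: float = 0.5) -> list[int]:
    """Turn weights into hard bits; a weight exactly at the threshold gives 0."""
    return [1 if w > threshold else 0 for w in weights]


# Four damaged copies of the same 3-bit portion; each copy has one error or none.
copies = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
weights = bit_weights(copies)  # [0.75, 0.25, 0.75]
bits = decide_bits(weights)    # [1, 0, 1]
```

Because each copy contributes only a vote, losing some copies (for example, half of the page) lowers the number of votes but usually leaves the majority unchanged.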

The proposed embedding method takes a binary raster image as input and processes it in strips, which allows the method to be implemented in most existing printing devices.

In summary, the claimed embedding method modifies a binary rasterized image before printing by inserting groups (configurations, clusters) of small dots into blank areas (not covered with ink on the paper) or partially occupied areas (covered with ink) of the image. The dots have the minimum printable size for the print resolution used and are therefore invisible to the naked eye. The information to be embedded is divided into data portions, each encoded by a corresponding point configuration whose structure does not depend on its location in the document. The point configurations are embedded in cyclic order: after the last configuration has been embedded, the first one is embedded again. This continues until all positions in the printed document suitable for embedding data portions are filled. Repeating the same point configurations, which encode the same data portions, many times makes the embedded information redundant, and this redundancy gives the embedded message high resistance to modification of and damage to the printed document.
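The cyclic embedding order can be expressed compactly with `itertools.cycle`. In this sketch the configurations are placeholder labels and the positions are hypothetical coordinates; the point is only that the sequence restarts after its last element until every suitable position is used.

```python
from itertools import cycle


def assign_cyclically(configs: list, positions: list) -> list:
    """Pair every suitable position with a configuration, restarting the
    sequence after its last element, until all positions are used."""
    if not configs:
        raise ValueError("the configuration sequence is empty")
    # cycle() repeats configs indefinitely; zip() stops when positions run out.
    return list(zip(positions, cycle(configs)))


positions = [(0, 0), (0, 32), (0, 64), (32, 0), (32, 32), (32, 64), (64, 0)]
placement = assign_cyclically(["C0", "C1", "C2"], positions)
```

With seven positions and three configurations, `C0` is placed three times and `C1` and `C2` twice each, so every portion appears at least `len(positions) // len(configs)` times.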

Because the embedded information must be extracted from the document during verification, a matching extraction method is also claimed. It analyzes the point configurations detected in the scanned image of the printed document and accumulates, for each bit of the extracted message, the distribution of occurrences of the value one. The approach assumes that embedded point configurations may have been partly lost or damaged during printing and scanning; therefore, at the analysis stage, the bit values of the extracted message are replaced by the probability that each bit is one or zero.

The claimed method for extracting embedded hidden information comprises the following operations:

- scan the printed document;

- binarize the scanned image;

- exclude closely spaced connected regions from consideration;

- detect the embedded point configurations;

- calculate the distribution of occurrences of unit bit values;

- restore the data portions;

- extract the hidden message.
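The first steps of this list can be illustrated with a minimal pure-Python sketch: binarization with a fixed threshold, labelling of 8-connected ink regions, and a size filter that keeps only components small enough to be embedded dots. The threshold, the `max_area` limit and the size-based criterion are assumptions for illustration; this section does not specify how closely spaced connected regions are identified and excluded.

```python
from collections import deque


def binarize(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Map 0..255 grey levels to 1 (ink, darker than threshold) or 0 (paper)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def connected_components(binary: list[list[int]]) -> list[list[tuple[int, int]]]:
    """Label 8-connected ink regions with breadth-first search."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def candidate_dots(binary: list[list[int]], max_area: int = 4) -> list[list[tuple[int, int]]]:
    """Keep components no larger than max_area pixels (hypothetical criterion)."""
    return [c for c in connected_components(binary) if len(c) <= max_area]
```

On a page containing a one-pixel dot and a 3 × 3 blob of text, `connected_components` finds two regions and `candidate_dots` keeps only the dot.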

The methods for embedding digital information into, and extracting it from, a printed document using position- and orientation-invariant point configurations can be implemented with the claimed systems for embedding hidden data in a printed document and for extracting such data from a printed document, respectively.

The system for embedding hidden data in a printed document consists of the following modules:

- an image forming module configured to convert the input document into a bitmap image, the output of the image forming module being connected to the first input of the data embedding module;

- a point combination sequence generator module configured to divide the data to be embedded into data portions and to prepare a sequence of point combination images according to a predetermined scheme and the data portions to be embedded, the output of the generator module being connected to the second input of the data embedding module;

- a data embedding module configured to identify, in the input rasterized image, all possible positions suitable for embedding point configurations and to embed the sequence of point configurations, repeated many times, at the identified positions of the bitmap image, the output of the data embedding module being connected to the input of the output module;

- an output module configured to produce a printed document with the embedded hidden data.
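The position search performed by the data embedding module could look roughly like the sketch below. It is a simplification under stated assumptions: the image is a list of rows of 0/1 pixels (1 = ink), candidate positions are non-overlapping square cells of a hypothetical side `size`, and a cell qualifies only when the cell and a `margin`-pixel border around it are completely blank. The description also allows partially occupied areas and strip-wise processing, which this sketch does not model.

```python
def find_free_positions(binary: list[list[int]], size: int, margin: int = 1) -> list[tuple[int, int]]:
    """Return (row, col) top-left corners of blank size x size cells.

    A cell is accepted only if the cell plus a `margin`-pixel border contains
    no ink, so embedded dots never touch existing content (assumed criterion).
    """
    h, w = len(binary), len(binary[0])
    positions = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            # Clip the enlarged window to the image borders.
            y0, y1 = max(0, y - margin), min(h, y + size + margin)
            x0, x1 = max(0, x - margin), min(w, x + size + margin)
            if not any(binary[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1)):
                positions.append((y, x))
    return positions
```

On an 8 × 8 image with a single ink pixel in the top-left corner and `size=4`, the first cell is rejected and the other three are returned; an ink pixel at (3, 3) lies within the margin of every cell and rejects all four.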

The claimed system for extracting hidden data from a printed document consists of the following modules:

- an image capture module configured to obtain a digital image of the printed document, the output of the image capture module being connected to the first input of the module for detecting point configurations that encode portions of hidden information;

- a point configuration detection module configured to detect point configurations that encode portions of hidden data, to extract the data from each detected point configuration and to count the detected point configurations having the same sequence number, the first output of the point configuration detection module being connected to the input of the module for calculating the distribution of occurrences of unit bit values, and its second output being connected to the first input of the hidden data extraction module;

- a module for calculating the distribution of occurrences of unit bit values, configured to accumulate, for each data portion, the distribution of occurrences of the value one for each bit, the output of this module being connected to the second input of the hidden data extraction module;

- a hidden data extraction module configured to analyze the incoming distribution of occurrences of unit bit values for each data portion and to extract the hidden data, the output of the hidden data extraction module being connected to the input of the output module;

- an output module configured to output the extracted data.
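The data flow between the detection module, the distribution module and the extraction module can be sketched as below. Detections are assumed to arrive as hypothetical `(sequence_number, bits)` pairs; the sketch counts copies per sequence number, accumulates the number of ones per bit, and then decides each bit by simple majority (a threshold of 0.5 is our assumption), reporting portions that were never detected as `None`.

```python
from collections import defaultdict


def accumulate(detections, payload_bits: int) -> dict[int, tuple[int, list[int]]]:
    """Group detections by sequence number: {number: (copies, ones_per_bit)}."""
    stats = defaultdict(lambda: [0, [0] * payload_bits])
    for index, bits in detections:
        entry = stats[index]
        entry[0] += 1                      # one more copy of this portion
        for i, b in enumerate(bits):
            entry[1][i] += b               # count of ones for bit i
    return {k: (v[0], v[1]) for k, v in stats.items()}


def reassemble(stats: dict[int, tuple[int, list[int]]], num_portions: int, threshold: float = 0.5):
    """Decide each bit from its frequency of ones; None marks a missing portion."""
    message = []
    for index in range(num_portions):
        if index not in stats:
            message.append(None)
            continue
        count, ones = stats[index]
        message.append([1 if o / count > threshold else 0 for o in ones])
    return message


detections = [(0, [1, 0]), (1, [0, 1]), (0, [1, 1]), (0, [1, 0])]
stats = accumulate(detections, payload_bits=2)  # {0: (3, [3, 1]), 1: (1, [0, 1])}
portions = reassemble(stats, num_portions=3)    # [[1, 0], [0, 1], None]
```

Keeping the per-portion copy count alongside the ones counts mirrors the two outputs of the detection module: the counts let the extraction module judge how reliable each recovered portion is.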

For a better understanding of the claimed group of inventions linked by a single inventive concept, a detailed explanation with reference to the drawings is given below.

FIG. 1. Systems for embedding hidden digital data in, and extracting it from, a printed document;

FIG. 2. Flowchart of the steps of the method for embedding hidden data in a printed document;

FIG. 3. Representation of service marks;

FIG. 4. Example of a data portion and of the data layout scheme in a point configuration;

FIG. 5. Example of point configuration placement;

FIG. 6. Flowchart of the steps of searching for a position at which to embed a point configuration;

FIG. 7. Examples of point configurations (the embedded dots are enlarged four times for demonstration purposes);

FIG. 8. Flowchart of the steps of the method for extracting hidden data from a printed document;

FIG. 9. Flowchart of the steps of service mark recognition;

FIG. 10. Flowchart of the steps of detecting valid point configurations;

FIG. 11. Examples of point configuration recognition;

FIG. 12. Examples of extracting information from mark configurations.

A system implementing the method for embedding hidden digital information in a printed document is shown in FIG. 1.1. The image forming module 102 converts the input document into a rasterized binary image. Any suitable device can be used for this purpose, such as a scanner, a digital camera or a raster image processor (RIP). Binary images for each color channel are native to electrophotographic and inkjet printers, so a raster image processor is the preferred implementation of this module in the claimed invention. The rasterized image is passed to the input of the data embedding module 103. The point configuration sequence generator 101 divides the input message to be embedded in the printed document into data portions and prepares a sequence of point configuration images according to a predetermined rule. Each point configuration in this sequence encodes one data portion. The sequence of point configurations is passed to the data embedding module 103, which determines the positions in the input binary image at which the configurations are embedded. The point configurations from the prepared sequence are embedded in cyclic order, so the same configurations are repeated at different places in the document. The binary image with the embedded information is then passed to the output module 104, which produces the printed document. A printer, plotter or similar printing device may be used for this purpose.

A system implementing the method for extracting digital information from a printed document is shown in FIG. 1.2. The image capture module 105 obtains a digital image of the input printed document. Any device that produces a digital representation of the paper original, such as a scanner or a digital camera, can serve as the image capture module; a flatbed scanner is the preferred implementation in the claimed invention. The digital image output by module 105 is passed to the point configuration detection module 107, which detects the information hidden in the document. The data extracted from each detected point configuration, which encodes a particular data portion, are passed to module 106, which calculates the distribution of occurrences of unit bit values. Each element of this distribution is the number of times the corresponding bit of the extracted hidden information was found to be one. Thus, instead of a crisp value of 1 or 0, each bit is assigned the probability that it equals one. The distribution of occurrences of unit bit values in the extracted message and the number of detected point configurations are passed to the hidden data extraction module 108. The extracted hidden information is passed to the output module 109, which can be any device suitable for presenting the extracted data to the user, such as a display or a printer, or a security management system used to control the circulation of printed documents. All of these blocks and modules can be implemented as a system on a chip (SoC), a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). The operation of the modules is clear from their description and from the description of the corresponding method.

Fig. 2 shows the generalized steps of the proposed method for embedding hidden data in a printed document. At step 201 the original PDL document is obtained, and at step 202 it is rasterized into a binary image. Here PDL (Page Description Language) is a language that describes the pages of a document and tells the printing device what the printed document should look like, for example Adobe PostScript or HP's HP-GL and PCL.
At step 203 the information to be hidden in the printed document is divided into data chunks, and a sequence of point configurations encoding these chunks is generated according to a predetermined rule. Each data chunk is assigned its own unique sequence number. At step 204 the candidate positions for the prepared point configurations are identified, and at step 205 the configurations are embedded in cyclic order: after the last configuration of the sequence has been embedded, embedding restarts from the first one. This continues until every position on the printed document that is suitable for embedding a data chunk has been filled. The approach distributes identical data chunks uniformly over the image, which makes the embedded message highly robust to modification of and damage to the printed document. At step 206 the modified binary image is printed.
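A minimal Python sketch of the cyclic assignment at step 205, assuming the candidate positions from step 204 are already available as a list in scan order and each prepared point configuration is represented by an opaque label. `assign_cyclically` is a hypothetical helper for illustration, not part of the claimed system.

```python
from itertools import cycle
from typing import Hashable, List, Sequence, Tuple

Position = Tuple[int, int]  # (x, y) of a free embedding position, in pixels


def assign_cyclically(
    configs: Sequence[Hashable], positions: Sequence[Position]
) -> List[Tuple[Position, Hashable]]:
    """Pair every suitable position with a point configuration, restarting
    from the first configuration after the last one has been used."""
    if not configs:
        raise ValueError("need at least one point configuration")
    # zip() stops when the finite list of positions is exhausted,
    # so iterating the infinite cycle() over the configurations is safe.
    return list(zip(positions, cycle(configs)))


layout = assign_cyclically(["C0", "C1", "C2"], [(i * 100, 0) for i in range(7)])
print([label for _, label in layout])  # ['C0', 'C1', 'C2', 'C0', 'C1', 'C2', 'C0']
```

With seven free positions and three configurations, every chunk is printed at least twice and the copies are spread across the page rather than clustered.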

In a preferred embodiment of the invention, each point configuration consists of a set of ordered dots of the minimum printable size arranged with fourfold or twofold symmetry. This keeps the modification of the printed document practically invisible to the naked eye. The minimum printable size is the smallest dot size that still guarantees that toner or ink adheres to the paper for at least 70% of the embedded dots. It depends on the characteristics of the printing device: for example, at a print resolution of 600 dpi a minimum-size dot may consist of one, two, three or four hardware pixels. In black-and-white printing the dots are black; in color printing any of the primary colors in use can be chosen, so for CMYK printing the dots may be black, yellow, cyan or magenta.
Yellow is the preferred option for color printing, because people can hardly detect yellow dots with the naked eye.

A point configuration consists of two main parts: four service marks that indicate the presence of hidden information, and the configuration body, which encodes the corresponding data chunk together with its sequence number.

In a preferred embodiment of the claimed invention, the point configuration includes the four service marks shown in Fig. 3. The service marks 301 are located at the corners of the point configuration 305, and each consists of three dots (see Fig. 3.1) forming the vertices of a right triangle. The sides of the triangle have predetermined lengths, and its right angle 302 points toward the center of the point configuration. The length a of the horizontal side 304 is always greater than the length b of the vertical side 303.
Thanks to this arrangement, the position of the configuration body can be determined even if only one service mark has been detected. In a preferred embodiment a ≈ 0.5 mm (11 pixels at a print resolution of 600 dpi) and b ≈ 0.4 mm (9 pixels at 600 dpi). Fig. 3.2 shows the service marks located at predetermined positions relative to the body that encodes the data chunk. It will be apparent to those skilled in the art that other ways of identifying the presence of a point configuration in an image are possible. For example, the service marks may differ in number or shape, or be omitted altogether; in that case the hidden information can be detected by searching the document for specific geometric structures inherent in the point configurations that encode the data chunks.
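The following Python sketch computes the three dots of one service mark in image coordinates (x to the right, y downward). It assumes that the right-angle dot is the one nearest the configuration center and that both legs run outward from it; the corner names and the function `service_mark` are hypothetical, and the default leg lengths are the 600 dpi values a = 11 and b = 9 pixels given above.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) in printer pixels, y grows downward

# Unit step pointing away from the configuration center for each corner.
OUTWARD: Dict[str, Tuple[int, int]] = {
    "top_left": (-1, -1),
    "top_right": (1, -1),
    "bottom_left": (-1, 1),
    "bottom_right": (1, 1),
}


def service_mark(corner: str, vertex: Point, a: int = 11, b: int = 9) -> List[Point]:
    """Return the three dots of a service mark.

    `vertex` is the right-angle dot, which faces the configuration center;
    the horizontal leg has length a and the vertical leg length b (a > b).
    """
    if a <= b:
        raise ValueError("the horizontal leg a must be longer than the vertical leg b")
    dx, dy = OUTWARD[corner]
    x, y = vertex
    return [(x, y), (x + dx * a, y), (x, y + dy * b)]


print(service_mark("top_left", (100, 100)))  # [(100, 100), (89, 100), (100, 91)]
```

Because both legs point away from the center, the configuration body always lies in the direction opposite to the two legs of any single mark, which is what lets one detected mark locate the body.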

In a preferred embodiment of the invention, the body of the point configuration is formed by placing minimum-printable-size dots on a rectangular grid with a predetermined pitch. To make the embedded information more robust, each bit encoded in the point configuration is repeated four times. If the bit value is one, a dot is printed; if it is zero, the dot is absent.
An example arrangement of the encoded information inside a body with fourfold symmetry is shown in Fig. 4. In a preferred embodiment of the invention, the information encoded by one point configuration consists of 16 bits (Fig. 4.1) divided into two main parts: the sequence number ID 401 with a parity bit 402, and the data chunk 403 of the embedded message, bN, that corresponds to this sequence number. In this example the sequence number is encoded with three bits, ID0, ID1 and ID2, plus one parity bit, ID3, and the data chunk occupies twelve bits (b0-b11). This scheme allows a sequence of at most eight data chunks with unique sequence numbers, so the maximum capacity of the information embedded in a document is 12 · 8 = 96 bits. If four bits are used for the sequence number, the four ID bits and the parity bit leave 11 of the 16 bits for data, and the maximum capacity increases to 11 · 16 = 176 bits. It will be apparent to those skilled in the art that other embodiments are possible that change the maximum capacity of the embedded information, the layout of the point configurations and their number.
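A Python sketch of how a message could be split into chunks and packed into the 16-bit words described above. The bit order (ID0-ID2 first, then ID3, then b0-b11) and the use of even parity for ID3 are assumptions, since the text only states that ID3 is a parity bit; the helper names are hypothetical.

```python
from typing import List

ID_BITS = 3      # sequence number ID0..ID2
DATA_BITS = 12   # data chunk b0..b11
CAPACITY = DATA_BITS * 2 ** ID_BITS  # 12 * 8 = 96 bits


def pack_chunk(seq: int, data: int) -> List[int]:
    """Return the 16 bits [ID0, ID1, ID2, ID3, b0, ..., b11] of one chunk."""
    if not (0 <= seq < 2 ** ID_BITS and 0 <= data < 2 ** DATA_BITS):
        raise ValueError("sequence number or data out of range")
    ids = [(seq >> i) & 1 for i in range(ID_BITS)]
    parity = sum(ids) % 2  # assumed even parity over ID0..ID3
    payload = [(data >> i) & 1 for i in range(DATA_BITS)]
    return ids + [parity] + payload


def split_message(message: int, n_bits: int) -> List[List[int]]:
    """Split an n_bits-long message into 12-bit chunks numbered 0, 1, 2, ..."""
    if n_bits > CAPACITY or message >> n_bits:
        raise ValueError(f"message must fit in n_bits bits, at most {CAPACITY}")
    n_chunks = -(-n_bits // DATA_BITS)  # ceiling division
    mask = 2 ** DATA_BITS - 1
    return [pack_chunk(k, (message >> (k * DATA_BITS)) & mask) for k in range(n_chunks)]


words = split_message(0xCAFE, 16)
print(len(words), words[1][:4])  # 2 [1, 0, 0, 1]
```

Setting `ID_BITS = 4` and `DATA_BITS = 11` in this sketch gives `CAPACITY = 176`, matching the alternative capacity quoted above.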

Figs. 4.2 and 4.3 show possible layouts of the point configuration body, with fourfold (Fig. 4.2) and twofold (Fig. 4.3) symmetry. The layout with fourfold symmetry (Fig. 4.2) is invariant to rotation by 90 degrees, and the layout with twofold symmetry (Fig. 4.3) is invariant to rotation by 180 degrees.
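Fig. 4.2 is not reproduced here, so the following Python sketch builds a hypothetical 8 × 8 layout with the same property: each of the 16 bits occupies four cells that map onto one another under quarter turns (4 × 16 = 64 cells, matching the fourfold repetition of each bit). Rotating the body by 90 degrees then leaves the bit-to-cell assignment unchanged. The grid size and the helper names are assumptions.

```python
from typing import List

Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a square grid by 90 degrees clockwise."""
    n = len(grid)
    return [[grid[n - 1 - c][r] for c in range(n)] for r in range(n)]


def fourfold_layout(n: int = 8) -> Grid:
    """Assign a bit index to every cell of an n x n grid so that each bit
    occupies the four cells of one orbit under 90-degree rotations."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    grid = [[-1] * n for _ in range(n)]
    for r in range(half):
        for c in range(half):
            bit = r * half + c
            # (r, c) and its images under 1, 2 and 3 clockwise quarter turns:
            # exactly one cell in each quadrant.
            for rr, cc in ((r, c), (c, n - 1 - r), (n - 1 - r, n - 1 - c), (n - 1 - c, r)):
                grid[rr][cc] = bit
    return grid


def render(bits: List[int], layout: Grid) -> Grid:
    """Dot map of the body: 1 where a dot is printed (bit value 1)."""
    return [[bits[k] for k in row] for row in layout]


layout = fourfold_layout()
print(rotate90(layout) == layout)  # True: the layout is rotation invariant
```

Because every bit is read from all four of its cells, a decoder can count the printed dots in each orbit without knowing whether the page was rotated by a multiple of 90 degrees.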

Fig. 5 shows a point configuration template; for clarity, it corresponds to a configuration in which all bits equal one. The pitch m of the rectangular body grid is larger than the spacing of the dots in the service marks. This difference is a good distinguishing feature for telling the dots of the service marks apart from the dots of the configuration body. In a preferred embodiment of the invention m ≈ 1 mm (25 hardware pixels at a print resolution of 600 dpi). The service marks are located at a distance d ≈ 0.6 mm (15 hardware pixels at 600 dpi) from the outermost dots of the configuration body.
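The template geometry can be sketched in Python as follows. The sketch converts millimetre values to pixels and places the right-angle dot of each service mark d pixels outside the corresponding corner dot of an n × n body. Offsetting by d along both axes is an assumption (the text only gives the distance d), and `body_dots` and `mark_vertices` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) in printer pixels


def mm_to_px(mm: float, dpi: int) -> float:
    """Convert a length in millimetres to pixels at the given resolution."""
    return mm * dpi / 25.4


def body_dots(bits_grid: List[List[int]], origin: Point, m: int = 25) -> List[Point]:
    """Pixel positions of the printed body dots: one per grid node whose bit is 1."""
    x0, y0 = origin
    return [
        (x0 + c * m, y0 + r * m)
        for r, row in enumerate(bits_grid)
        for c, bit in enumerate(row)
        if bit
    ]


def mark_vertices(n: int, origin: Point, m: int = 25, d: int = 15) -> Dict[str, Point]:
    """Right-angle dot of each service mark for an n x n body whose top-left node is `origin`."""
    x0, y0 = origin
    span = (n - 1) * m  # distance between the outermost body dots
    return {
        "top_left": (x0 - d, y0 - d),
        "top_right": (x0 + span + d, y0 - d),
        "bottom_left": (x0 - d, y0 + span + d),
        "bottom_right": (x0 + span + d, y0 + span + d),
    }


print(round(mm_to_px(1.0, 600), 1))               # 23.6: m = 25 px is about 1.06 mm
print(mark_vertices(8, (100, 100))["top_right"])  # (290, 85)
```

With an assumed 8 × 8 body, neighboring body dots are 25 px apart while the mark legs are at most 11 px long, which is the pitch difference used to separate the two kinds of dots.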

Repeating identical data chunks many times across the printed document preserves the embedded hidden information even if some of the embedded dots are lost. Because a partial loss of embedded dots does not prevent successful extraction of the message, point configurations may slightly overlap small objects such as characters, lines and dots. The main sub-steps of step 204, which finds the positions on the document where point configurations can be embedded, are shown in more detail in the flowchart of Fig. 6. At step 601 a position for the point configuration is sought at which the image areas needed for the four service marks are free. If such a position is found, the image region bounded by the service marks is examined.
At step 602 the printed image elements located in the analyzed region are detected and their areas are computed. The current position is considered suitable for embedding a point configuration (step 603) if neither the area of any single printed element in the region nor the total area of these elements exceeds its predetermined value.
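A minimal Python sketch of the test at steps 602 and 603, assuming the analyzed region is available as a binary window (1 = printed pixel) and that printed elements are 4-connected groups of pixels. The parameters `max_single` and `max_total` stand in for the unspecified predetermined values.

```python
from collections import deque
from typing import List

Window = List[List[int]]  # binary crop of the region bounded by the service marks


def component_areas(window: Window) -> List[int]:
    """Areas, in pixels, of the 4-connected printed elements inside the window."""
    h, w = len(window), len(window[0])
    seen = [[False] * w for _ in range(h)]
    areas: List[int] = []
    for y in range(h):
        for x in range(w):
            if not window[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, area = deque([(y, x)]), 0
            while queue:  # breadth-first flood fill of one element
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and window[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def position_is_suitable(window: Window, max_single: int, max_total: int) -> bool:
    """Step 603: no element larger than max_single and total area at most max_total."""
    areas = component_areas(window)
    return all(a <= max_single for a in areas) and sum(areas) <= max_total


window = [[1, 1, 0, 0],
          [1, 1, 0, 1],
          [0, 0, 0, 0]]
print(component_areas(window))              # [4, 1]
print(position_is_suitable(window, 4, 5))   # True
print(position_is_suitable(window, 3, 10))  # False: one element is too large
```

Checking both limits lets a position next to a few thin strokes still be accepted, which is what allows configurations to overlap small objects slightly.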

As an example, Fig. 7 shows fragments of a binary image with embedded point configurations; for clarity, the embedded dots are enlarged fourfold. The fragment in Fig. 7.1 shows a point configuration in a blank area of the image, and the fragments in Figs. 7.2 and 7.3 show embedded point configurations in partially occupied areas.

Fig. 8 shows a generalized flowchart of the method for extracting information embedded in a printed document by the method described above. At step 801 a digital image of the printed document is obtained by scanning. At step 802 the image is binarized by comparing each pixel with a predetermined threshold.

The next two steps perform a preliminary classification of the connected regions in the image as belonging or not belonging to embedded point configurations. At step 803 the connected regions, i.e. groups of adjacent dark pixels, are found in the binary image, and only small connected regions (blobs) whose area is below a predetermined value are kept. In a preferred embodiment of the invention, the maximum allowed area is 9 pixels at a scan resolution of 300 dpi (about 0.065 sq. mm), and the region is required to be compact, fitting inside a square with a side of 0.25 mm (3 pixels at 300 dpi). At step 804 the connected regions that have other objects within a predetermined distance are discarded.
In a preferred embodiment of the invention, this distance is 0.3 mm from the edge of the analyzed region (4 pixels at a scan resolution of 300 dpi).
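The filtering at steps 802 to 804 can be sketched in Python as follows, using the 300 dpi values from the text (area at most 9 pixels, extent at most 3 pixels, isolation distance 4 pixels). Two points are assumptions: regions are treated as 8-connected, and only regions rejected as too large count as the "other objects" of step 804, so that neighboring dots of the same service mark (about 0.4 mm, or 4.7 px, apart) do not eliminate each other. All function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Step 802: 1 for pixels darker than the threshold, 0 for background."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def components(binary: List[List[int]]) -> List[List[Pixel]]:
    """8-connected groups of dark pixels, each returned as a list of pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    groups = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            groups.append(pixels)
    return groups


def bbox(pixels: List[Pixel]) -> Tuple[int, int, int, int]:
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), max(ys), min(xs), max(xs)


def is_small(pixels: List[Pixel], max_area: int = 9, max_side: int = 3) -> bool:
    """Step 803: compact blob with area <= max_area inside a max_side x max_side square."""
    y0, y1, x0, x1 = bbox(pixels)
    return len(pixels) <= max_area and y1 - y0 < max_side and x1 - x0 < max_side


def candidate_dots(binary: List[List[int]], gap: int = 4) -> List[List[Pixel]]:
    """Step 804: keep small blobs with no large object within `gap` pixels."""
    groups = components(binary)
    large = [p for g in groups if not is_small(g) for p in g]
    kept = []
    for g in groups:
        if not is_small(g):
            continue
        y0, y1, x0, x1 = bbox(g)
        near = any(y0 - gap <= py <= y1 + gap and x0 - gap <= px <= x1 + gap for py, px in large)
        if not near:
            kept.append(g)
    return kept


page = [[0] * 12 for _ in range(12)]
page[1][1] = 1        # isolated dot: kept
page[8][4] = 1        # dot next to a text stroke: discarded
for x in range(2, 8):
    page[10][x] = 1   # a 6-pixel stroke, too wide to be a dot
print(candidate_dots(page))  # [[(1, 1)]]
```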

The subsequent steps select and recognize the connected regions that belong to embedded point configurations. At step 805 the centers of mass of the connected regions selected at the previous step are computed; the resulting coordinates serve as estimates of the positions at which dots were embedded at the printing stage. At step 806 the service marks are recognized. At step 807 these marks are analyzed to find reliable point configurations, i.e. those with a sufficiently high probability of correct detection. A configuration is accepted as reliable only if it satisfies a set of requirements, which are listed in detail below. At step 808 the orientation of the image (rotation and skew) is estimated from the orientation of the detected reliable point configurations. At step 809 the less reliable point configurations that were missed at step 807 are detected, using the estimated orientation of the scanned document.
At step 810 the distribution of occurrences of bit values equal to one is computed for point configurations with the same sequence number. All point configurations detected so far are ordered by their sequence numbers. During this step the regular rectangular grid inside each detected point configuration is reconstructed, which gives the positions of the embedded dots within that configuration on the printed document. The presence of a connected region (blob) at the computed position of a grid node is interpreted as a value of one for the bit associated with that position.
The number of one-values of a bit, summed over all data chunks in which the bit occurs, is its weight, which is later used to decide the actual value of that bit in the extracted information. The elements of the distribution are these weights, i.e. the number of one-values found for each bit of the data chunk. At step 811 the bit weights are converted into bit values by computing a threshold for each data chunk and comparing the weights against it. At step 812 the complete extracted message is obtained by concatenating the data chunks in the order of their sequence numbers. The most important steps are described in more detail below.
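A Python sketch of steps 805 and 810 to 812. It assumes that each detected point configuration has already been reduced to its sequence number and, for each of the 12 data bits, the number of its four grid nodes where a blob was found. The per-chunk threshold (half of the number of observations, four per configuration) is an assumption, since the text only says that a threshold is computed for each chunk; the placement of chunk k at message bits 12k to 12k+11 is likewise assumed.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DATA_BITS = 12
REPEATS = 4  # each bit is printed four times in a configuration


def centroid(pixels: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Step 805: center of mass (row, column) of a connected region."""
    n = len(pixels)
    return sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n


def bit_weights(readings: Sequence[Tuple[int, Sequence[int]]]) -> Dict[int, Tuple[List[int], int]]:
    """Step 810: per sequence number, sum the one-values seen for each data bit.

    Each reading is (sequence number, per-bit count of printed nodes, 0..REPEATS).
    Returns {seq: (weights, number of configurations with that seq)}.
    """
    weights: Dict[int, List[int]] = defaultdict(lambda: [0] * DATA_BITS)
    counts: Dict[int, int] = defaultdict(int)
    for seq, ones in readings:
        row = weights[seq]
        for i, v in enumerate(ones):
            row[i] += v
        counts[seq] += 1
    return {seq: (weights[seq], counts[seq]) for seq in weights}


def decide_bits(weights: List[int], n_configs: int) -> List[int]:
    """Step 811: a bit is 1 if its weight exceeds half of its observations (assumed rule)."""
    threshold = n_configs * REPEATS / 2
    return [1 if w > threshold else 0 for w in weights]


def assemble(distribution: Dict[int, Tuple[List[int], int]], n_chunks: int) -> int:
    """Step 812: chain the chunks by sequence number into one integer message."""
    message = 0
    for seq in range(n_chunks):
        if seq not in distribution:
            raise ValueError(f"no point configuration with sequence number {seq} was found")
        weights, n_configs = distribution[seq]
        for i, bit in enumerate(decide_bits(weights, n_configs)):
            message |= bit << (seq * DATA_BITS + i)
    return message


readings = [
    (0, [4, 0, 3] + [0] * 9),  # two damaged copies of chunk 0
    (0, [4, 1, 4] + [0] * 9),
    (1, [0] * 11 + [4]),       # one copy of chunk 1
]
print(assemble(bit_weights(readings), n_chunks=2))  # 8388613, i.e. 5 + 2**23
```

The stray single hit on bit 1 of chunk 0 (weight 1 against a threshold of 4) is outvoted, which illustrates why accumulating weights over repeated copies tolerates lost or spurious dots.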

Step 806, detection of the service marks, is illustrated in more detail by the flowchart in Fig. 9. A group of closely spaced connected regions can be considered a service mark (see Fig. 3) if it satisfies several predetermined conditions. At step 901 the Euclidean distances between neighboring regions of the binary image that remain after the filtering of steps 803 and 804 are checked against a predetermined range. In a preferred embodiment, the distance between neighboring regions must be less than 0.6 mm (7 pixels at a scan resolution of 300 dpi).
At step 902 the number of regions in the selected group is checked. If it is neither two nor three, the regions are considered noise (step 905) and discarded. If there are three regions (step 903) and one of the angles between the lines through their centers is close to 90 degrees within an acceptable tolerance (step 904), the group is marked as a detected service mark (step 906). If there are two regions, the pair is marked as a partially damaged service mark (step 907). In further analysis, a partially damaged service mark is treated as a mark that carries no information about the location of the whole point configuration relative to it.
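The grouping and classification of steps 901 to 907 can be sketched in Python as follows, operating on blob centroids in pixels at 300 dpi. The single-linkage grouping, the 10-degree angle tolerance and the treatment of a three-blob group without a right angle as noise are assumptions; the text specifies only the 7-pixel distance and the three outcomes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def group_points(points: Sequence[Point], max_dist: float = 7.0) -> List[List[Point]]:
    """Step 901: link centroids closer than max_dist (union-find, single linkage)."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < max_dist:
                parent[find(i)] = find(j)
    groups: Dict[int, List[Point]] = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def has_right_angle(tri: Sequence[Point], tol_deg: float = 10.0) -> bool:
    """Step 904: is any interior angle of the triangle within tol_deg of 90 degrees?"""
    for k in range(3):
        v, p, q = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
        a = (p[0] - v[0], p[1] - v[1])
        b = (q[0] - v[0], q[1] - v[1])
        norm = math.hypot(*a) * math.hypot(*b)
        if norm == 0:
            continue  # coincident centroids: no angle to measure
        cos = max(-1.0, min(1.0, (a[0] * b[0] + a[1] * b[1]) / norm))
        if abs(math.degrees(math.acos(cos)) - 90.0) <= tol_deg:
            return True
    return False


def classify(group: Sequence[Point]) -> str:
    """Steps 902-907: 'mark', 'partial' (damaged mark) or 'noise'."""
    if len(group) == 3 and has_right_angle(group):
        return "mark"
    if len(group) == 2:
        return "partial"
    return "noise"


centroids = [(0, 0), (0, 4.7), (5.9, 0),  # a mark: legs of about 0.4 and 0.5 mm
             (50, 50),                    # isolated blob
             (100, 100), (100, 104)]      # two surviving dots of a mark
print(sorted(classify(g) for g in group_points(centroids)))  # ['mark', 'noise', 'partial']
```

Single linkage matters here: the two acute-angle dots of a mark are about 7.5 px apart, so they join the same group only through the right-angle dot.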

The step of detecting reliable point configurations (step 807) is shown in more detail in the flowchart of FIG. 10. At step 1001, the first previously detected service mark that has not yet been analyzed is selected. At step 1002, the orientation of the current service mark and its position within the point configuration are calculated. This uses the difference between the side lengths of the service-mark triangle (see FIG. 3.1) and an estimate of the angle between those sides. In the preferred embodiment, a service mark can occupy one of four positions in the point configuration: upper left, upper right, lower left or lower right (see FIG. 5). At step 1003, the orientation of the point configuration is calculated from the orientation and position of the selected service mark, which gives the estimated positions of the remaining service marks. At step 1004, the remaining service marks are detected at their estimated locations.

The subsequent steps mainly verify the detected point configuration against a number of conditions, in order to reduce the probability of false detection. At step 1005, the number of service marks detected in the configuration is checked; it must be greater than one. At step 1006, the orientations of the detected service marks are checked against the expected ones. At step 1007, the number of connected regions inside the detected point configuration is checked; it must exceed a predetermined minimum value. In the preferred embodiment, the minimum number of regions inside the configuration is seven. At step 1008, the center of mass of the regions inside the detected configuration is checked to coincide with the center of the configuration, within an allowable offset. At step 1009, the parity of the sequence number extracted from the detected configuration is checked against the value of the parity bit. This prevents errors in determining the sequence number of the data portion encoded in the detected configuration.

In addition to these conditions, the reliability of the extracted data itself is analyzed. The distribution of occurrence of unit (one) values of the detected bits is calculated, and each bit is assigned to one of two categories: reliable or unreliable. In the preferred embodiment, a bit is unreliable if the number of unit values found for it in the analyzed point configuration equals one; such values are presumably caused by noise. The ratio of the number of reliable bits to the number of unreliable bits must exceed two. A point configuration that satisfies all of these conditions is marked as reliable (step 1010), and its service marks are marked as detected and excluded from further analysis. If any condition is violated, the current service mark is flagged as already analyzed and the procedure moves on to the next mark. This is repeated until all detected service marks have been considered.
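The validity checks of steps 1005 to 1009, together with the reliable/unreliable bit ratio, can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the patented implementation. The `Candidate` container and its field names are hypothetical, and the centre tolerance `MAX_CENTER_SHIFT` is an assumed value because the text gives none. The text says the region count must exceed a minimum that equals seven, and the sketch reads this as "at least seven". Even parity over the sequence number is also assumed.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one detected point configuration."""
    mark_count: int         # service marks found (four expected)
    orientations_ok: bool   # result of the step-1006 orientation check
    region_centroids: list  # (x, y) centres of mass of the inner regions
    center: tuple           # expected centre of the configuration
    serial_number: int      # decoded sequence number of the data portion
    parity_bit: int         # decoded parity bit
    bit_counts: list        # per-bit number of "1" detections


MIN_REGIONS = 7           # preferred embodiment; read here as "at least seven"
MAX_CENTER_SHIFT = 2.0    # assumed tolerance in pixels (not given in the text)
MIN_RELIABLE_RATIO = 2.0  # reliable/unreliable ratio must exceed two


def is_reliable(c: Candidate) -> bool:
    # Step 1005: more than one service mark must be found.
    if c.mark_count <= 1:
        return False
    # Step 1006: mark orientations must match the expected ones.
    if not c.orientations_ok:
        return False
    # Step 1007: enough connected regions inside the configuration.
    if len(c.region_centroids) < MIN_REGIONS:
        return False
    # Step 1008: centre of mass of the regions close to the configuration centre.
    n = len(c.region_centroids)
    cx = sum(x for x, _ in c.region_centroids) / n
    cy = sum(y for _, y in c.region_centroids) / n
    if math.hypot(cx - c.center[0], cy - c.center[1]) > MAX_CENTER_SHIFT:
        return False
    # Step 1009: parity of the sequence number must equal the parity bit
    # (even parity is an assumption).
    if bin(c.serial_number).count("1") % 2 != c.parity_bit:
        return False
    # Data reliability: a count of exactly one is treated as possible noise.
    reliable = sum(1 for k in c.bit_counts if k >= 2)
    unreliable = sum(1 for k in c.bit_counts if k == 1)
    return unreliable == 0 or reliable / unreliable > MIN_RELIABLE_RATIO
```

The checks run from cheapest to most expensive, so most false candidates are rejected before any arithmetic on the bit distribution.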

The main stages of detecting and recognizing a point configuration are illustrated in FIG. 11. FIG. 11.1 shows a fragment of the original binary image with an embedded point configuration for a print resolution of 600 dpi. FIG. 11.2 shows the result of binarizing the image scanned at 300 dpi; some of the embedded dots were lost during printing and scanning. FIG. 11.3 shows the result of recognizing the point configuration, including the service marks. Squares enclose the detected service marks and circles enclose partially damaged service marks. Crosses mark the detected elements of the configuration body, which encode a data portion together with its sequence number and parity bit.

FIG. 12 illustrates how a data portion is obtained, using the point configuration of FIG. 11 as an example. For comparison, FIG. 12.1 shows the original and the resulting distributions of occurrence of unit values for each bit. The embedded information is repeated four times within the point configuration and is shown by light gray bars in the figure. The extracted information is shown by dark gray bars. The maximum number of unit-value occurrences for a bit is three, the minimum is one, and there are no false detections. Accordingly, about 47% of all embedded dots were lost. Despite this partial loss of data, accumulating statistics over all point configurations with the same sequence number makes it possible to recover the information with high accuracy. The parity bit provides an additional check that the information was extracted correctly. FIG. 12.2 shows an example of the total distribution of unit-value occurrences for one data portion. The sequence number has already been used to build the total distribution and is therefore not shown in the diagram. Contributions from different point configurations are shown in different shades of gray. The threshold for data extraction equals the number of detected point configurations with the same sequence number. Finally, the recovered data portions are assembled into a single message by ordering them according to their sequence numbers.
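The accumulation and thresholding described for FIG. 12.2 can be sketched as follows. This is an illustrative reading, not the authors' code. The input format, one `(sequence_number, bit_counts)` pair per detected configuration, is hypothetical. The text fixes the threshold at the number of configurations sharing a sequence number but does not state the comparison direction, so the `>=` used here is an assumption.

```python
from collections import defaultdict


def assemble_message(detections):
    """Combine per-configuration bit counts into one ordered bit message.

    detections: iterable of (sequence_number, bit_counts) pairs, one per
    detected point configuration; bit_counts[i] is how many times bit i was
    read as 1 inside that configuration (hypothetical input format).
    """
    totals = {}                    # sequence number -> summed distribution
    n_configs = defaultdict(int)   # sequence number -> configurations seen
    for serial, counts in detections:
        if serial not in totals:
            totals[serial] = [0] * len(counts)
        for i, k in enumerate(counts):
            totals[serial][i] += k
        n_configs[serial] += 1

    message = []
    # Order the portions by sequence number; a missing number is simply skipped.
    for serial in sorted(totals):
        threshold = n_configs[serial]  # threshold = number of configurations
        # Assumed comparison: the bit is 1 when the summed count reaches it.
        message.extend(1 if s >= threshold else 0 for s in totals[serial])
    return message
```

For example, two configurations with sequence number 0 and counts `[2, 1, 0, 3]` and `[3, 0, 0, 2]` sum to `[5, 1, 0, 5]`; with a threshold of 2 this portion decodes to `[1, 0, 0, 1]`.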

The claimed method of embedding a digital message in printed documents is intended for implementation in printing devices. It can be implemented in a raster image processor (RIP), which in turn is implemented either in the printer driver or directly in the printing device.

The claimed method of extracting a hidden digital message from a printed document can be implemented in a stand-alone scanner or in the scanner of a multifunction printer (MFP).

It should be borne in mind that the above embodiment of the invention has been described for illustration only. It will be clear to those skilled in the art that various modifications, additions and substitutions are possible without departing from the scope and spirit of the present invention as disclosed in the appended claims.

Claims

1. A method of embedding hidden data in a printed document by applying marks in the form of groups of barely visible points, characterized in that the following operations are performed:
converting (rasterizing) the source document into a raster image;
generating an orientation-invariant and damage-resistant sequence of point configurations, wherein the embedded data is divided into portions, each data portion is encoded by one point configuration, and each point configuration comprises a plurality of points of minimum printable size whose arrangement has at least two-sided symmetry;
determining, in the rasterized image, all possible positions for embedding the point configurations by checking the following conditions: the service marks fall in areas of the image not occupied by printed content, and the total area of printed elements in the inner area of the point configuration bounded by the service marks does not exceed a predetermined value;
embedding the point configurations at the detected positions;
printing the rasterized image with the embedded data.
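The position search of claim 1 can be sketched on a binary raster. This is a minimal sketch under assumed geometry: the configuration is modelled as a square of `size` pixels with a square service-mark cell of side `mark` in each corner, and the "inner area" is taken to be the square between those cells. The parameter names, the fractional form of the area limit, and the non-overlapping tiling are illustrative choices, not details from the patent.

```python
def find_positions(raster, size, mark, max_inner_fill, step=None):
    """Return top-left (row, col) corners where a point configuration fits.

    raster: list of rows, 1 = printed pixel, 0 = blank paper.
    size: side of the square configuration in pixels (hypothetical).
    mark: side of the square corner cell reserved for each service mark.
    max_inner_fill: allowed fraction of printed pixels in the inner area.
    """
    step = step or size  # non-overlapping tiling by default
    h, w = len(raster), len(raster[0])

    def printed(y0, x0, hh, ww):
        # Number of printed pixels in a rectangle.
        return sum(raster[y][x] for y in range(y0, y0 + hh)
                   for x in range(x0, x0 + ww))

    inner = size - 2 * mark
    far = size - mark
    positions = []
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            corners = [(y, x), (y, x + far), (y + far, x), (y + far, x + far)]
            # Condition 1: every service-mark cell must fall on blank paper.
            if any(printed(cy, cx, mark, mark) for cy, cx in corners):
                continue
            # Condition 2: limited printed area inside the configuration.
            if printed(y + mark, x + mark, inner, inner) <= max_inner_fill * inner * inner:
                positions.append((y, x))
    return positions
```

Because the service marks must sit on blank paper while the body may overlap a little printed content, this test admits more positions than requiring the whole square to be empty would.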

2. The method according to claim 1, characterized in that each point configuration consists of two main parts: service marks indicating the presence of hidden information, and the body of the point configuration, which encodes the corresponding data portion together with its sequence number.
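The division of the embedded data into portions, each carrying a sequence number, can be sketched as bit-level framing. The description also mentions a parity bit that is checked against the parity of the sequence number. The field widths (`portion_len`, `serial_bits`), the field order, zero padding of the last portion, and even parity are all assumptions made for illustration.

```python
def make_portions(bits, portion_len, serial_bits):
    """Split a bit list into framed portions: [sequence number | parity | payload].

    portion_len and serial_bits are hypothetical field widths; the last
    payload is zero-padded to portion_len bits.
    """
    portions = []
    for serial, start in enumerate(range(0, len(bits), portion_len)):
        if serial >= 2 ** serial_bits:
            raise ValueError("message too long for the sequence-number field")
        payload = bits[start:start + portion_len]
        payload = payload + [0] * (portion_len - len(payload))
        # Sequence number, most significant bit first.
        serial_field = [(serial >> i) & 1 for i in reversed(range(serial_bits))]
        parity = sum(serial_field) % 2  # assumed even parity over the number
        portions.append(serial_field + [parity] + payload)
    return portions
```

With `make_portions([1, 0, 1, 1, 0], 3, 2)`, the first portion is `[0, 0, 0, 1, 0, 1]` and the second is `[0, 1, 1, 1, 0, 0]`, where the trailing zero is padding.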

3. The method according to claim 1, characterized in that the points of minimum printable size are printed using black, yellow, cyan, magenta or other colorants of the printing system or device.

4. The method according to claim 1, characterized in that the point configurations are embedded at the detected positions in the rasterized image as many times as possible.

5. A method of extracting hidden data from a printed document, comprising the following operations:
scanning the printed document and storing the scanned image in an intermediate memory buffer;
binarizing the scanned image by comparison with a predetermined threshold;
detecting connected regions in the binary image whose area is less than a predetermined value;
excluding from consideration the connected regions whose distance to the edge of neighboring regions is less than a predetermined value;
calculating the coordinates of the centers of mass of the detected connected regions;
recognizing service marks by analyzing the coordinates of the centers of mass of the connected regions;
detecting reliable point configurations;
estimating the orientation of the document;
detecting less reliable point configurations, taking the orientation of the document into account;
obtaining portions of hidden data from the detected point configurations;
ordering the data portions according to their sequence numbers to recover the sequence of hidden data.
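The first image-processing operations of claim 5 can be sketched in plain Python. These are binarization, connected-region detection with an area limit, exclusion of regions too close to their neighbours, and centre-of-mass calculation. The sketch assumes 8-connectivity, measures the gap as the smallest pixel-to-pixel Euclidean distance, and counts large regions such as text glyphs as neighbours. The parameter names are hypothetical. A production version would more likely use a library labelling routine than this explicit breadth-first search.

```python
import math
from collections import deque


def dot_centroids(gray, threshold, max_area, min_gap):
    """Return (x, y) centres of mass of small, isolated dark regions.

    gray: list of rows of intensities (0 = ink, 255 = paper).
    threshold, max_area and min_gap are hypothetical tuning parameters.
    """
    h, w = len(gray), len(gray[0])
    # Binarize by comparison with a fixed threshold.
    ink = [[gray[y][x] < threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first search over 8-connected ink pixels.
            pixels, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and ink[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(pixels)

    def gap(a, b):
        # Smallest pixel-to-pixel distance between two regions.
        return min(math.dist(p, q) for p in a for q in b)

    # Keep small regions that are not too close to any other region
    # (large regions such as text still count as neighbours).
    small = [r for r in regions if len(r) < max_area]
    kept = [r for r in small
            if all(gap(r, o) >= min_gap for o in regions if o is not r)]
    return [(sum(px for _, px in r) / len(r), sum(py for py, _ in r) / len(r))
            for r in kept]
```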

6. The method according to claim 5, characterized in that the service marks are recognized by performing the following steps: selecting nearby regions whose Euclidean distances from one another fall within a predetermined range of values; checking that the number of selected regions equals a predetermined value; and checking that the angles between the lines drawn through the centers of mass of the regions correspond to predetermined values.
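The recognition test of claim 6 can be sketched under the assumption, suggested by the triangle of FIG. 3.1, that each service mark consists of exactly three regions. The candidate triangle's sorted interior angles are compared with expected values within a tolerance. The grouping rule (neighbours of one anchor point within `[d_min, d_max]`), the tolerance, and all names are hypothetical. The actual distances and angles are given only in the patent figures.

```python
import math


def triangle_angles(a, b, c):
    """Interior angles (degrees) of triangle abc, sorted ascending."""
    def angle(p, q, r):  # angle at p between rays p->q and p->r
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return sorted([angle(a, b, c), angle(b, a, c), angle(c, a, b)])


def find_service_marks(points, d_min, d_max, expected_angles, tol=5.0):
    """Group region centroids into three-point service marks (assumed shape)."""
    marks, used = [], set()
    for i, p in enumerate(points):
        if i in used:
            continue
        # Select nearby regions whose distance lies in the allowed range.
        group = [i] + [j for j, q in enumerate(points)
                       if j != i and j not in used and d_min <= math.dist(p, q) <= d_max]
        # The number of regions must equal the expected count (three here).
        if len(group) != 3:
            continue
        angles = triangle_angles(*(points[k] for k in group))
        # The angles between the connecting lines must match the expected ones.
        if all(abs(x - e) <= tol for x, e in zip(angles, sorted(expected_angles))):
            marks.append(tuple(group))
            used.update(group)
    return marks
```

A 3-4-5 right triangle, for instance, has sorted angles of about 36.87, 53.13 and 90 degrees. Its unequal sides are what let the mark's orientation be recovered later, as step 1002 of the description requires.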

7. The method according to claim 5, characterized in that reliable point configurations are detected by performing the following steps:
selecting the first recognized service mark;
calculating the orientation of the point configuration by determining the orientation of the selected service mark;
estimating the positions of the remaining service marks by calculating their expected locations in the current point configuration;
detecting the remaining service marks in accordance with their estimated locations;
verifying the validity of the detected point configuration by checking the following conditions: the number of service marks found in the point configuration exceeds one; the orientation of the service marks in the point configuration is as expected; the number of connected regions inside the point configuration bounded by the service marks exceeds a predetermined number; the center of mass of these regions coincides, within the allowable offset, with the center of the point configuration; and the parity of the extracted unique sequence number of the data portion encoded in the current detected point configuration equals the parity bit value;
marking the current point configuration as reliable, and its service marks as detected and belonging to the current point configuration, if the validity conditions are fulfilled.

8. The method according to claim 5, characterized in that the orientation of the document is estimated by analyzing the orientations of the reliable point configurations.
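Claim 8 does not say how the orientations of reliable configurations are combined. One simple possibility is a majority vote after snapping each angle to the nearest multiple of 90 degrees, sketched below. Both the vote and the snapping are assumptions, not the patent's stated rule.

```python
from collections import Counter


def document_orientation(config_angles):
    """Most common orientation (0, 90, 180 or 270 degrees) among reliable
    point configurations; snapping and voting are assumptions."""
    if not config_angles:
        return None
    snapped = [int(round(a / 90.0)) % 4 * 90 for a in config_angles]
    return Counter(snapped).most_common(1)[0][0]
```

A vote is robust to a few mis-oriented detections, and the resulting angle can then guide the search for damaged service marks in claim 9.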

9. The method according to claim 5, characterized in that less reliable point configurations are detected by performing the following steps:
analyzing damaged service marks having fewer points than a predetermined value, using the previously estimated document orientation to calculate the possible locations of point configurations relative to the currently analyzed service mark, and
calculating the expected positions of the remaining service marks of the current point configuration;
detecting the remaining service marks in accordance with their expected positions;
selecting, from the possible point configurations, one for which the validity conditions for detected point configurations are satisfied, and marking it as less reliable and its service marks as belonging to the current point configuration.

10. The method according to claim 5, characterized in that a portion of the hidden data is obtained from the detected point configurations by calculating and analyzing the distribution of occurrence of unit bit values for detected point configurations with the same sequence number, including the following:
extracting bit values at predetermined positions within the detected point configuration: if a connected region is present at a predetermined position, the bit value is taken to be one, otherwise zero;
calculating the distribution of occurrence of unit bit values for each detected point configuration;
binarizing the elements of the distribution of occurrence of unit bit values for each data portion by comparison with a threshold.
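The first two operations of claim 10 can be sketched for a single configuration: reading a bit as one when a connected region lies at its predetermined position, then counting, for each bit, how many of its repeated copies were read as one. The `layout` table and the matching tolerance `tol` are hypothetical, since the real positions are defined only in the patent figures. The centroids are assumed to be already transformed into the configuration's own coordinate frame using the estimated orientation.

```python
import math


def bit_counts(centroids, layout, tol):
    """Per-bit count of copies read as 1 inside one point configuration.

    centroids: (x, y) centres of connected regions in configuration coordinates.
    layout: layout[i] lists the nominal positions where bit i is repeated
    (hypothetical layout).
    tol: maximum distance for a region to count as present at a position.
    """
    def occupied(pos):
        # A bit copy is 1 if some connected region lies at its position.
        return any(math.dist(pos, c) <= tol for c in centroids)

    return [sum(1 for pos in copies if occupied(pos)) for copies in layout]
```

The resulting list is the per-configuration distribution; summing these lists over configurations with the same sequence number and comparing them with a threshold completes the final binarization step.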

11. A system for embedding hidden data in a printed document, characterized in that it comprises the following modules:
an image forming module configured to convert the input document into a raster image, wherein the output of the image forming module is connected to a first input of the data embedding module;
a point configuration sequence generator module configured to divide the embedded data into data portions and to prepare a sequence of point configuration images in accordance with a predetermined scheme and the data portions to be embedded, wherein the output of the point configuration sequence generator module is connected to a second input of the data embedding module;
a data embedding module configured to detect, in the input rasterized image, all positions suitable for embedding point configurations and to embed the sequence of point configurations, repeated multiple times, at the detected positions of the raster image, wherein the output of the data embedding module is connected to the input of the output module;
an output module configured to produce a printed document with embedded hidden data.

12. The system according to claim 11, characterized in that the image forming module is a device configured to generate a rasterized image and selected from a group of devices including a scanner, a digital camera and a raster image processor.

13. The system according to claim 11, characterized in that the output module is a device configured to produce a printed document and selected from a group of devices including a printer and a plotter.

14. A system for extracting hidden data from a printed document, characterized in that it comprises the following modules:
an image capture module configured to receive a digital image of a printed document, wherein the output of the image capture module is connected to a first input of the module for detecting point configurations that encode portions of hidden information;
a point configuration detection module configured to detect point configurations encoding portions of hidden data, to extract data from each detected point configuration, and to count the number of detected point configurations with the same sequence number, wherein the first output of the point configuration detection module is connected to the input of the module for calculating the distribution of occurrence of unit bit values, and the second output of the point configuration detection module is connected to the first input of the hidden data extraction module;
a module for calculating the distribution of occurrence of unit bit values, configured to accumulate data on the distribution of occurrence of unit bit values for each data portion, wherein the output of this module is connected to a second input of the hidden data extraction module;
a hidden data extraction module configured to analyze the incoming data on the distribution of occurrence of unit bit values for each data portion and to extract the hidden data, wherein the output of the hidden data extraction module is connected to the input of the output module;
an output module configured to output the extracted data.

15. The system of claim 14, wherein the image capture module is a device configured to capture a digital image and selected from the group including a scanner and a digital camera.

16. The system of claim 14, wherein the output module is a device configured to display the extracted message and selected from the group including a display, a printer, and a security management system for controlling the circulation of printed documents.