EA044732B1

EA044732B1 - METHOD AND SYSTEM FOR PROTECTING INFORMATION FROM LEAKAGE WHEN PRINTING DOCUMENTS USING THE IMPLEMENTATION OF DIGITAL MARKS

Info

Publication number: EA044732B1
Application number: EA202293485
Authority: EA
Inventors: Михаил Артурович Анистратенко; Александр Артурович Анистратенко; Иван Александрович Оболенский; Дмитрий Алексеевич Борисов; Валентин Валерьевич Сысоев
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date: 2022-03-10
Filing date: 2022-12-27
Publication date: 2023-09-27

Description

Field of technology

The claimed solution relates to the field of information security, in particular to solutions for preventing information leakage when documents are printed.

State of the art

Data Leak Prevention (DLP) technologies are technologies for preventing leaks of confidential information from an information system to the outside, together with the technical means (software or hardware-software systems) that implement such prevention.

Patent application US 20080091954 A1 (Morris et al., 17.04.2008) discloses a solution for verifying the integrity of data presented in printed documents. The solution is based on a unique identifier that is used to analyze the content of the document. Each segment of the document is assigned a digit or group of digits, and each page or segment may be assigned a single digit of the overall identifier. The digits associated with the document are combined into an authentication string. When a request for further processing of the document is received, the document is authenticated and its integrity verified by reading the submitted document to obtain an authentication string and comparing the new string with the previously stored one. If the strings match, the document is considered valid, authenticated, and unaltered.

The drawback of this solution is that it cannot be used to prevent leaks by identifying the employee responsible for a leak when documents are printed. A further drawback is insufficiently effective document protection: the code is used only to check the authenticity of a document, which establishes that the document is unaltered and authentic but does not prevent information leakage.

Summary of the invention

The claimed invention addresses the technical problem of creating an effective means of protecting digital information from leakage when it is printed.

The technical result is more effective protection of data against leakage, achieved by embedding into the document digital marks that encode a unique user identifier, so that the user can subsequently be identified when printed documents are analyzed.

The claimed result is achieved by a method of encoding information to protect it from leakage when documents are printed, performed by the processor of a computer device, the method comprising the steps of:

receiving, on the user's computer device, information about the printing of at least one digital document containing at least text, the computer device being associated with a unique identifier (UID) of the user;

before the digital document is sent to print, processing it, during which the letters contained in the digital document are recognized;

encoding the user's UID into a set of digital marks located on the outlines of the letters of the digital document and/or near those outlines;

sending the digital document with the encoded user UID to print.

In one particular embodiment of the method, the digital document is recognized using optical character recognition (OCR). In another particular embodiment, all characters on each page of the digital document are recognized.

In another particular embodiment, each character of the user's UID is encoded into binary code.

In another particular embodiment, the area in which the digital marks are placed is determined by the bits of the binary code.

The claimed technical result is also achieved by a method of protecting information from leakage via printed documents, performed by the processor of a computer device, the method comprising the steps of:

obtaining at least part of an image of a printed document in which the user's UID has been encoded by the method described above;

recognizing the obtained image;

identifying the letters that have digital marks in their vicinity;

determining and extracting the encoded UID.

In one particular embodiment of the method, recognition is performed using OCR.

The claimed solution is also implemented by corresponding systems comprising a processor and a memory storing machine-readable instructions for implementing each of the methods described above.

Brief description of the drawings

Fig. 1 is a flowchart of the digital mark encoding method.

Figs. 2A-2C illustrate examples of placing digital marks in a digital document.

- 1 044732- 1 044732

Fig. 3 is a flowchart of digital mark decoding.

Fig. 4 is a diagram of the disclosure frequencies of the UID positions.

Fig. 5 shows a general view of a computing device.

Detailed description of the invention

Fig. 1 shows a method (100) for protecting information in digital documents from leakage by encoding the user's UID into the document in the form of digital marks. At the first step (101), information about the printing of a digital document is obtained. Method (100) is performed on the computer device of a user, for example an employee; a UID that identifies the user is associated with the device. Step (101) is implemented by software logic executed by the computer device, for example as a software agent or module that receives signals from the processor indicating that a digital document is being sent to print. A digital document is typically a file and may contain text, graphics, or a combination of both.

Once the device has received a command to intercept and analyze the document before it is sent to the printer, the digital document is recognized at step (102). The document is processed using OCR to recognize the letters and symbols it contains.

After the digital document has been recognized, the UID is encoded at step (103). The UID is, for example, the employee's numeric personnel number, a digital code TAB consisting of, for example, 8 digits. This code can be represented as an array of digits $TAB_8 = \{n_1, n_2, \ldots, n_m\}$, $n_i \in [0 \ldots 9]$, $m = 8$.

A schematic view of the code is given in Table 1.

Table 1. Schematic representation of the personnel number

Digit of TAB_8 | n_1 | n_2 | n_3 | n_4 | n_5 | n_6 | n_7 | n_8
Position       | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8

Each element of the personnel number is a number from 0 to 9; accordingly, each element can be represented in binary with 4 bits, i.e. as a binary number from 0001 to 1010 obtained by a mapping with a shift of one (d is mapped to d + 1), as shown in Table 2.

Table 2

Mapping of personnel-number digits from the decimal to the binary number system

TAB_dec | TAB_bin
0 | 0001
1 | 0010
2 | 0011
3 | 0100
4 | 0101
5 | 0110
6 | 0111
7 | 1000
8 | 1001
9 | 1010

Mapping 0 to 0001 is needed to record the presence of a 0 in the personnel number: since marks are placed only for bits equal to 1, the code 0000 would produce no marks at all. To encode an element of the personnel number in binary, $TAB_8^{bin} = \{b_1, b_2, b_3, \ldots, b_l\}$, $l = 8$, four bits $b_i = \{c_1, c_2, c_3, c_4\}$ are required; an example is given in Table 3.
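The digit-to-code mapping of Table 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation, and the function names `digit_to_code` and `uid_to_codes` are hypothetical. It writes each digit d of an 8-digit personnel number as the 4-bit binary form of d + 1, so that no digit ever yields 0000.

```python
DIGITS = "0123456789"


def digit_to_code(d: int) -> str:
    """Return the shifted 4-bit code of one decimal digit (Table 2): d -> d + 1."""
    if not 0 <= d <= 9:
        raise ValueError(f"not a decimal digit: {d}")
    return format(d + 1, "04b")  # zero-padded to 4 bits, e.g. 0 -> '0001'


def uid_to_codes(uid: str, m: int = 8) -> list[str]:
    """Split an m-digit personnel number into n_1..n_m and encode each digit."""
    if len(uid) != m or any(ch not in DIGITS for ch in uid):
        raise ValueError(f"expected {m} decimal digits, got {uid!r}")
    return [digit_to_code(int(ch)) for ch in uid]


if __name__ == "__main__":
    for position, code in enumerate(uid_to_codes("00013400"), start=1):
        print(position, code)
```

For the UID 00013400 used in the Table 5 example, this gives the codes 0001, 0001, 0001, 0010, 0100, 0101, 0001, 0001 for positions 1 to 8.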

Table 3

Schematic division of a binary number into bits

Thus, any digit can be encoded into a letter by means of binary encoding.

An example of this division for subsequent encoding is shown in Figs. 2A-2C. Each recognized letter (20) is divided in the plane into four quarters, numbered clockwise starting from the lower-left corner.

If bit $c_1$ of the binary representation of a personnel-number digit is 1, a mark is placed in quarter I. The same operation is performed for every bit of the binary representation of the digit.
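The quarter geometry and the bit-to-quarter rule can be sketched as below. The text leaves several details open, so the following are assumptions: image coordinates with y increasing downwards, bounding boxes given as (left, top, right, bottom), $c_1$ taken as the leftmost bit of the 4-bit code, and each mark centred in its quarter. The names `quadrants`, `marked_quadrants` and `mark_centres` are hypothetical.

```python
Box = tuple[int, int, int, int]  # (left, top, right, bottom) in pixels, y grows downwards


def quadrants(box: Box) -> dict[int, Box]:
    """Split a letter's bounding box into quarters I-IV, clockwise from lower-left."""
    left, top, right, bottom = box
    cx, cy = (left + right) // 2, (top + bottom) // 2
    return {
        1: (left, cy, cx, bottom),   # I: lower-left
        2: (left, top, cx, cy),      # II: upper-left
        3: (cx, top, right, cy),     # III: upper-right
        4: (cx, cy, right, bottom),  # IV: lower-right
    }


def marked_quadrants(code: str) -> list[int]:
    """Quarter j receives a mark iff bit c_j is 1 (c_1 assumed to be the leftmost bit)."""
    return [j for j, bit in enumerate(code, start=1) if bit == "1"]


def mark_centres(box: Box, code: str) -> list[tuple[int, int]]:
    """Pixel centre of every quarter that must carry a mark for this 4-bit code."""
    quads = quadrants(box)
    centres = []
    for j in marked_quadrants(code):
        l, t, r, b = quads[j]
        centres.append(((l + r) // 2, (t + b) // 2))
    return centres
```

For example, a digit 4 (code 0101) marks quarters II and IV of its letter.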


In the method of applying a mark near a letter, as shown in Figs. 2B-2C, a digital mark is applied in the given quarter either as a line (21) on the body of the letter or as a point (22) in the vicinity of the letter. An example of encoding marks into letters is given in Table 4.

Table 4

Position coding scheme

Table 4 above means that each position of the personnel number can be encoded into any of four letters. Letters for marking are selected page by page. Let document $D$ contain $l$ pages; then $D$ is an array of pages, $D = \{p_1, p_2, p_3, \ldots, p_l\}$, $l \in \mathbb{N}$.

On each page $p_i$, $i \in [1 \ldots l]$, the text is read character by character and written into an array of characters $S_{p_i} = \{s_1, s_2, s_3, \ldots, s_{l_{p_i}}\}$, where $l_{p_i}$ is the number of characters on page $p_i$; the Russian letters are then extracted from it:

$W^{rus}_{p_i} = \{ w \in S_{p_i} \mid w \text{ is a letter of the Russian alphabet} \}$.

Next, eight arrays $Pos_1, Pos_2, \ldots, Pos_8$ are created, one for each position of the personnel number. Each array $Pos_j$ is filled with those characters of $W^{rus}_{p_i}$ that correspond to position $j$ in Table 4. For example, $Pos_1$ is filled with all characters of $W^{rus}_{p_i}$ equal to {а, з, п, ч}, regardless of case.
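A sketch of building $W^{rus}_{p_i}$ and the arrays $Pos_1, \ldots, Pos_8$ for one page follows. Only $Pos_1$ = {а, з, п, ч} is stated in the text; the remaining letter groups in `LETTER_GROUPS` are an assumption taken from the grouping used in Table 8, since the body of Table 4 is not reproduced here. The sketch stores character indices rather than the characters themselves, so that the selected letters can be located again when the marks are applied; the names are hypothetical.

```python
# Letter groups per UID position. Only group 1 is given explicitly in the text;
# groups 2-8 are ASSUMED from the row layout of Table 8.
LETTER_GROUPS = {
    1: "азпч", 2: "бирш", 3: "вйсщ", 4: "гктъ",
    5: "длуы", 6: "емфь", 7: "ёнхэя", 8: "жоцю",
}
RUSSIAN = set("".join(LETTER_GROUPS.values()))  # the 33 letters of the alphabet


def russian_letters(page_text: str) -> list[tuple[int, str]]:
    """W^rus_{p_i}: (index in S_{p_i}, character) for every Russian letter on the page."""
    return [(k, ch) for k, ch in enumerate(page_text) if ch.lower() in RUSSIAN]


def build_pos_arrays(page_text: str) -> dict[int, list[int]]:
    """Pos_j collects the indices of the page's letters assigned to UID position j."""
    pos: dict[int, list[int]] = {j: [] for j in LETTER_GROUPS}
    for k, ch in russian_letters(page_text):
        for j, group in LETTER_GROUPS.items():
            if ch.lower() in group:  # case-insensitive, as the text requires
                pos[j].append(k)
                break
    return pos
```

For the page text "Мама", the letters "а" at indices 1 and 3 go to $Pos_1$ and the letters "М"/"м" at indices 0 and 2 go to $Pos_6$ under the assumed grouping.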

The arrays $Pos_1, Pos_2, \ldots, Pos_8$ are shuffled, for example with the Knuth shuffle.
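The Knuth (Fisher-Yates) shuffle mentioned above can be written out explicitly as below. This is only to make the algorithm visible; Python's standard `random.shuffle` already implements the same algorithm in place. Passing `random.SystemRandom()` instead of a seeded `random.Random` is one option if the selection should not be reproducible; that choice is a suggestion, not something the text specifies.

```python
import random


def knuth_shuffle(items: list, rng: random.Random) -> list:
    """Return a shuffled copy of items using the Knuth (Fisher-Yates) shuffle."""
    out = list(items)  # leave the caller's Pos_j array untouched
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform index in 0..i inclusive
        out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    print(knuth_shuffle([3, 8, 15, 21, 40], random.Random(42)))
```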

Let $l_{Pos_1}, l_{Pos_2}, \ldots, l_{Pos_8}$ be the sizes of the resulting arrays and $P$ the fraction of characters used for embedding marks, $P \in [0.3 \ldots 0.7]$. Each of the arrays $Pos_1, Pos_2, \ldots, Pos_8$ is then truncated from the end to the size $P \cdot l_{Pos_j}$ (rounded to a whole number of characters), which gives the arrays $Pos^P_1, Pos^P_2, \ldots, Pos^P_8$.
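The truncation step can be sketched as follows, assuming the arrays have already been shuffled. The use of Python's `round()` is an assumption chosen to match the whole-number counts reported in Table 9; the text does not state the rounding rule. The function name is hypothetical.

```python
def truncate_all(pos_arrays: dict[int, list], p: float) -> dict[int, list]:
    """Cut each shuffled Pos_j from the end, keeping about P * l_Pos_j entries."""
    if not 0.3 <= p <= 0.7:
        raise ValueError("P is expected to lie in [0.3, 0.7]")
    # Rounding rule is an assumption; the text only gives the fraction P.
    return {j: arr[: round(p * len(arr))] for j, arr in pos_arrays.items()}
```

Because the arrays were shuffled beforehand, keeping the first entries amounts to a random selection of the letters to be marked.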

The resulting arrays $Pos^P_1, Pos^P_2, \ldots, Pos^P_8$ are used to apply digital marks as described above. Marks are embedded by cutting out the letters located by OCR, adding the marks at pixel coordinates, and inserting the marked letters back into the document being sent to print. After all marks (21, 22) have been embedded on the current page $p_i$, the same is done for the next page $p_{i+1}$, and so on to the last page of the document, $p_l$.
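The cut-mark-paste step can be illustrated on a binary page bitmap (a list of pixel rows, 1 = ink). The bitmap model, the function names and the default radius are illustrative assumptions; the text only states that marks are added at pixel coordinates and that a point mark may have a radius of about 1-2 pixels. A real implementation would work on the rendered page image.

```python
Bitmap = list[list[int]]  # rows of pixels, 1 = ink, 0 = paper


def crop(page: Bitmap, left: int, top: int, right: int, bottom: int) -> Bitmap:
    """Cut a letter's bounding box out of the page (right/bottom exclusive)."""
    return [row[left:right] for row in page[top:bottom]]  # slices are copies


def stamp_dot(img: Bitmap, cx: int, cy: int, r: int = 1) -> None:
    """Ink every pixel within distance r of (cx, cy), clipped to the image."""
    for y in range(max(0, cy - r), min(len(img), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(img[y]), cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                img[y][x] = 1


def paste(page: Bitmap, patch: Bitmap, left: int, top: int) -> None:
    """Write the marked letter patch back into the page at its original place."""
    for dy, row in enumerate(patch):
        page[top + dy][left:left + len(row)] = row
```

With r = 1 the dot covers the centre pixel and its four direct neighbours; r = 2 gives a slightly larger disc.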

Table 5 gives an example of mark encoding for the user UID 00013400.

Table 5

Example of encoding digital marks in the vicinity of letters


After the digital marks encoding the UID have been added to the document, it is sent to print at step (104). The printed document contains an encoded UID that is imperceptible to the human eye. The size of the digital marks can be chosen freely (for example, marks with a radius of 1-2 pixels).

Fig. 3 shows the sequence of steps of method (300) for recognizing a UID on printed documents. At step (301), the computing device used to determine the UID in a printed document obtains an image of that document. The image may contain all or part of the text with the encoded UID and may be obtained, for example, by photographing the printout with an external device (smartphone, camera, etc.) or by scanning it for subsequent OCR.

At step (302), the letters in the document are recognized, again using OCR; if the document has several pages, each page is recognized. At step (303), the digital marks in the vicinity of the recognized letters are read. The marks can be analyzed following the example in Table 5, which can serve as a lookup table mapping marks to the corresponding digit of the user UID. The UID is then decoded at step (304), and from it the employee's personnel number is established, identifying the user from whose computer device the document was printed.
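Steps (303)-(304) can be sketched as follows: for each UID position, the marked quarters observed on each letter are turned back into a 4-bit code, and the +1 shift of Table 2 is undone. Majority voting over several letters of the same position is an assumption added here to tolerate misread marks; the text does not say how conflicting readings are resolved. As in the encoding direction, quarter I is assumed to correspond to the leftmost bit $c_1$, and all names are hypothetical.

```python
from collections import Counter
from typing import Optional


def quadrants_to_code(marked: set[int]) -> str:
    """Quarters I..IV with a detected mark -> bits c_1..c_4 (c_1 leftmost, assumed)."""
    return "".join("1" if j in marked else "0" for j in range(1, 5))


def code_to_digit(code: str) -> Optional[int]:
    """Undo the shift of Table 2 (0001 -> 0, ..., 1010 -> 9); reject other codes."""
    value = int(code, 2) - 1
    return value if 0 <= value <= 9 else None


def decode_uid(readings: dict[int, list[set[int]]], m: int = 8) -> Optional[str]:
    """readings[j] lists the marked quarters seen on each letter of position j."""
    digits = []
    for j in range(1, m + 1):
        votes = Counter(
            d
            for d in (code_to_digit(quadrants_to_code(q)) for q in readings.get(j, []))
            if d is not None
        )
        if not votes:
            return None  # position j was not disclosed in this image
        digits.append(str(votes.most_common(1)[0][0]))
    return "".join(digits)
```

Returning `None` when a position has no readable letter mirrors the notion of a position being disclosed or not, used in the mathematical justification.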

Mathematical justification of the method.

To this end, we verify that the disclosure frequencies $\nu_{n_1}, \nu_{n_2}, \ldots, \nu_{n_m}$ of the positions $n_1, \ldots, n_m$ ($m = 8$) are uniformly distributed, which makes it possible to estimate the probability of extracting the personnel number (UID) from the text of a page.

For the mathematical justification, the frequency of letters was studied in texts of varied content; as an example, consider the distribution typical of literary works. The literary works used in the experiment were: The Silmarillion, J.R.R. Tolkien; Twenty Thousand Leagues Under the Sea, Jules Verne; Twenty Years After, Alexandre Dumas; The Three Musketeers, Alexandre Dumas; Gone with the Wind, Margaret Mitchell; Ivanhoe, Walter Scott; A Hero of Our Time, M.Yu. Lermontov; War and Peace, L.N. Tolstoy; Inhabited Island, Boris and Arkady Strugatsky; Crime and Punishment, F.M. Dostoevsky; The Living and the Dead, K.M. Simonov. In total, 8,366,594 characters on 3919 pages. Mathematical linguistics gives the following frequencies of occurrence of the letters of the Russian alphabet in texts (Table 6).

Table 6

Frequency of occurrence of the letters of the Russian alphabet in fiction

Letter | Frequency, % | Letter | Frequency, %
а | 8.31 | р | 4.32
б | 1.65 | с | 5.24
в | 4.59 | т | 6.06
г | 1.72 | у | 2.95
д | 3.06 | ф | 0.13
е | 8.42 | х | 0.84
ё | 0.02 | ч | 1.56
ж | 1.01 | ц | 0.44
з | 1.71 | ш | 0.97
и | 6.84 | щ | 0.32
й | 1.11 | ъ | 0.03
к | 3.42 | ы | 1.81
л | 4.99 | ь | 1.93
м | 3.16 | э | 0.27
н | 6.46 | ю | 0.57
о | 11.42 | я | 1.95
п | 2.71 | |


To obtain the disclosure frequencies $\nu_{n_1}, \nu_{n_2}, \ldots, \nu_{n_m}$, the following steps are performed. Tables 4 and 5 give the letters into which the positions are encoded. For the method of applying a mark near a letter, the disclosure frequency of a position is obtained by summing the frequencies of the letters into which its marks are encoded, since a position is disclosed when a mark is detected on at least one of them. The result is Table 7.

Table 7

Disclosure frequencies of the personnel-number positions

i | Letters and their frequency of occurrence, % | Position disclosure frequency, %
1 | а 8.31; з 1.71; п 2.71; ч 0.44 | 13.17
2 | б 1.65; и 6.84; р 4.32; ш 0.97 | 13.78
3 | в 4.59; й 1.11; с 5.24; щ 0.32 | 11.26
4 | г 1.72; к 3.42; т 6.06; ъ 0.03 | 11.23
5 | д 3.06; л 4.99; у 2.95; ы 1.81 | 12.81
6 | е 8.42; м 3.16; ф 0.13; э 0.27 | 11.84
7 | ё 0.02; н 6.46; х 0.84; ь 1.93; я 1.95 | 11.57
8 | ж 1.01; о 11.42; ц 1.56; ю 0.57 | 14.56

The diagram in Fig. 4 is constructed from Table 7. It shows that the disclosure frequencies of all positions are distributed relatively evenly, ranging from 11.23% to 14.56%.
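The summation behind Table 7 can be sketched as follows. Because a single character is exactly one letter, the events "this character is а", "this character is з", and so on are disjoint, so the sum of the group's letter frequencies is the probability that a randomly chosen letter belongs to that position. The `spread` ratio (largest over smallest position frequency) is an illustrative evenness measure chosen here, not one used in the text; the function names are hypothetical.

```python
def disclosure_frequency(letter_freqs: dict[str, float]) -> float:
    """Sum of the frequencies (%) of the letters that encode one UID position."""
    return round(sum(letter_freqs.values()), 2)


def spread(position_freqs: list[float]) -> float:
    """Largest / smallest position frequency; 1.0 would be perfectly even."""
    return max(position_freqs) / min(position_freqs)


if __name__ == "__main__":
    # Row 1 of Table 7
    print(disclosure_frequency({"а": 8.31, "з": 1.71, "п": 2.71, "ч": 0.44}))
    # Last column of Table 7
    print(spread([13.17, 13.78, 11.26, 11.23, 12.81, 11.84, 11.57, 14.56]))
```

For the Table 7 column the ratio is about 1.3, i.e. the most frequently disclosed position is disclosed roughly 30% more often than the least frequent one.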

We now count the occurrences of each letter of the Russian alphabet in the experimental sample:

Table 8

Letter-positional quantitative characteristics of the experimental sample (for each letter: total number of characters, with characters per page in parentheses)

i | Letter 1 | Letter 2 | Letter 3 | Letter 4 | Letter 5
1 | а: 689 971 (176) | з: 142 242 (36) | п: 213 560 (54) | ч: 121 417 (31) |
2 | б: 140 050 (36) | и: 572 432 (146) | р: 372 611 (95) | ш: 74 968 (19) |
3 | в: 372 447 (95) | й: 92 969 (24) | с: 448 533 (114) | щ: 26 501 (7) |
4 | г: 152 827 (39) | к: 283 925 (72) | т: 516 921 (132) | ъ: 2 588 (1) |
5 | д: 255 254 (65) | л: 420 003 (107) | у: 234 845 (60) | ы: 162 890 (42) |
6 | е: 709 671 (181) | м: 257 188 (66) | ф: 12 025 (3) | ь: 165 110 (42) |
7 | ё: 5 953 (2) | н: 536 626 (137) | х: 75 243 (19) | э: 24 582 (6) | я: 171 316 (44)
8 | ж: 88 798 (23) | о: 940 740 (240) | ц: 31 199 (8) | ю: 51 195 (13) |

For the method of placing a point near a letter, the following assumptions are made: the fraction of characters used for embedding marks is P = 0.3, and a fraction M = 0.7 of the marks is lost when the image is transmitted via messengers. On this basis, the probability of recognizing the encoded positions can be calculated when the following is available for decoding:

a whole page;

half a page;

a quarter of a page.


Table 9

Explanations and probabilities of recognizing text encoded by placing a point near a letter

i | Letters of position i per page, 30% selected | Remaining after messenger transfer (70% of marks lost) | Whole page: letters | Whole page: recognized (1 = yes) | 1/2 page: letters | 1/2 page: recognized | 1/4 page: letters | 1/4 page: recognized
1 | 89 | 27 | 27 | 1 | 13.4 | 1 | 6.7 | 1
2 | 89 | 27 | 27 | 1 | 13.3 | 1 | 6.7 | 1
3 | 72 | 22 | 22 | 1 | 10.8 | 1 | 5.4 | 1
4 | 73 | 22 | 22 | 1 | 11.0 | 1 | 5.5 | 1
5 | 82 | 25 | 25 | 1 | 12.3 | 1 | 6.2 | 1
6 | 88 | 26 | 26 | 1 | 13.1 | 1 | 6.6 | 1
7 | 49 | 15 | 15 | 1 | 7.4 | 1 | 3.7 | 1
8 | 85 | 26 | 26 | 1 | 12.8 | 1 | 6.4 | 1
Probability of recognition, % | | | | 100 | | 100 | | 100
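The arithmetic behind Table 9 can be reproduced under its stated assumptions (P = 0.3 of a position's letters marked, M = 0.7 of the marks lost in the messenger). A position is counted as recognized if at least one marked letter is expected to survive on the available part of the page. This is an expected-value sketch with hypothetical function names; unlike the table, it does not round intermediate counts.

```python
def expected_marks(letters_per_page: float, p: float = 0.3, m: float = 0.7,
                   page_fraction: float = 1.0) -> float:
    """Expected number of surviving marked letters of one position."""
    return letters_per_page * p * (1 - m) * page_fraction


def position_recognized(letters_per_page: float, page_fraction: float) -> bool:
    """A position counts as recognized if at least one mark is expected to survive."""
    return expected_marks(letters_per_page, page_fraction=page_fraction) >= 1


if __name__ == "__main__":
    # Position 1: а + з + п + ч = 176 + 36 + 54 + 31 = 297 letters per page (Table 8)
    for fraction in (1.0, 0.5, 0.25):
        print(fraction, round(expected_marks(297, page_fraction=fraction), 1))
```

For position 1 this gives about 26.7, 13.4 and 6.7 surviving marked letters for a whole, half and quarter page, matching the first row of Table 9.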

Example of experimental application.

During testing, about 500 pages of varied content were printed and analyzed: plain text, sparse text, text with tables, text with graphs, and text with formulas;

with different fonts: Arial, Calibri, Times New Roman;

with different text styles: regular, italic, bold, underlined;

with different sizes: 12 px, 14 px;

with different line spacing: 0.5, 1.15, 1.5;

with different character spacing: normal, expanded, condensed.

In each case, the possibility of extracting the mark was examined from:

the printout directly;

a photograph of the printout;

a photograph of the printout sent via a messenger.

Printing was done on a Lexmark MX711de office black-and-white laser printer using Snegurochka office paper with a CIE whiteness of 146 according to ISO 11475. Photographs were taken with a Samsung A51 phone under office lighting, with the paper lying flat on a desk; the shots were taken at random, slight angles of about 2-4% in three dimensions. The photographs were sent via the Telegram messenger, which compresses images on sending.

During the experiment, parameters such as the size of the marks, their optimal locations and the methods of applying them were selected. The results of the final phase of the experiment are shown in Table 10.

Table 10

Experiment result (letter detected at each position)

| Content | Pos. 1 | Pos. 2 | Pos. 3 | Pos. 4 | Pos. 5 | Pos. 6 | Pos. 7 | Pos. 8 | Recognized |
|---|---|---|---|---|---|---|---|---|---|
| Text | а | и | с | т | д | е | н | о | 100% |
| Sparse text | а | и | с | т | л | е | н | о | 100% |
| Table | п | и | с | т | д | м | н | ж | 100% |

For the pages with graphs and with formulas (positions 1-8), the following letters were detected: а, и, в, т, л, ж (graph) and ч, и, к, ф, н (formula).

The table above shows good results for the analysis of messenger-transmitted photographs of printouts made on an office black-and-white printer. As a result of the experiment, optimal parameters for embedding the marks were selected such that, on the one hand, the marks would pass for printer defects if noticed on a printout and, on the other hand, they could be reliably extracted from photographs sent via messengers.
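Once marks have been located in a photograph, the UID still has to be reassembled from the redundant copies that survived compression; the source does not describe this step. A minimal decoding sketch, assuming each recovered mark has already been mapped to its bit index (for example from the letter's order on the page) and that every UID character occupies 8 bits, could combine the copies by majority vote. `decode_uid` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional


def decode_uid(
    observations: Iterable[tuple[int, int]],
    uid_len: int,
    bits_per_char: int = 8,
) -> Optional[str]:
    """Rebuild a UID from (bit_index, bit) pairs read off a printout or photo.

    Every UID bit is assumed to be printed many times per page, so the copies
    that survive are combined by majority vote. Returns None if some bit has
    no surviving copy at all.
    """
    total_bits = uid_len * bits_per_char
    votes: dict[int, Counter] = defaultdict(Counter)
    for index, bit in observations:
        # The bit stream is assumed to repeat cyclically across the page.
        votes[index % total_bits][bit] += 1

    chars = []
    for c in range(uid_len):
        code = 0
        for b in range(bits_per_char):
            counter = votes.get(c * bits_per_char + b)
            if not counter:
                return None  # this bit position was not recovered
            bit = counter.most_common(1)[0][0]  # ties go to the first value seen
            code = (code << 1) | bit
        chars.append(chr(code))
    return "".join(chars)
```

A majority vote tolerates occasional misread marks as long as correct copies outnumber wrong ones; when a fragment is so small that some bit has no copy, the function reports failure instead of guessing.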


Claims

Fig. 5 shows a general view of a computing device (500) suitable for performing the methods described above. The device (500) may be, for example, a computer, a server, or another suitable type of computing device.

In general, the computing device (500) contains one or more processors (501), memory devices such as RAM (502) and ROM (503), input/output interfaces (504), input/output devices (505), and a network communication device (506).

The processor (501) (or multiple processors, or a multi-core processor) may be selected from a variety of devices in widespread use today, such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, etc. A graphics processor, for example from Nvidia, AMD, Graphcore, etc., can also be used as the processor (501).

RAM (502) is a random access memory and is designed to store machine-readable instructions executed by the processor (501) to perform the necessary operations for logical data processing. RAM (502) typically contains executable operating system instructions and related software components (applications, program modules, etc.).

ROM (503) is one or more persistent storage devices, such as a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

Various types of I/O interfaces (504) are used to organize the operation of the components of the device (500) and of connected external devices.

The choice of appropriate interfaces depends on the specific design of the computing device; they can include, but are not limited to: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, Type-C), TRS/audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

To ensure user interaction with the computing device (500), various I/O means (505) are used, for example, a keyboard, a display (monitor), a touch display, a touch pad, a joystick, a mouse, a light pen, a stylus, a trackball, speakers, a microphone, augmented reality tools, optical sensors, a tablet, light indicators, a projector, a camera, biometric identification tools (retina scanner, fingerprint scanner, voice recognition module), etc.

The network communication device (506) enables the device (500) to communicate over an internal or external computer network, such as an intranet, the Internet, a LAN, etc. One or more such devices (506) may be used, including but not limited to: an Ethernet card, a GSM modem, a GPRS modem, an LTE modem, a 5G modem, a satellite communication module, an NFC module, a Bluetooth and/or BLE module, a Wi-Fi module, etc.

Additionally, satellite navigation tools, for example GPS, GLONASS, BeiDou or Galileo, can be used as part of the device (500).

The submitted application materials disclose preferred embodiments of the technical solution and should not be interpreted as limiting other particular embodiments that do not go beyond the scope of the claimed legal protection and are obvious to specialists in the relevant field of technology.

CLAIMS

1. A method for encoding information to protect against leaks when printing documents, performed using a processor of a computer device, the method comprising the steps of:

receiving, on the user's computer device, information about printing at least one digital document containing at least text, wherein the computer device is associated with a unique identifier (UID) of the user;

processing the digital document before it is sent for printing, during which the letters contained in the digital document are recognized;

encoding the user's UID into a set of digital marks located on and/or near the outlines of the letters of the digital document;

transmitting the digital document with the encoded user UID for printing.

2. The method according to claim 1, characterized in that recognition of the digital document is performed using optical character recognition (OCR).

3. The method according to claim 2, characterized in that all characters on each page of the digital document are recognized.

4. The method according to claim 1, characterized in that each user UID character is encoded into binary code.

5. The method according to claim 4, characterized in that based on the bit of the binary code, the area where the digital marks are placed is determined.
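The steps of claims 1-5 can be illustrated with a minimal sketch. It assumes the OCR step has already produced letter bounding boxes (the `Letter` records stand in for real OCR output), that each UID character is encoded as 8-bit ASCII, and that a bit selects one of two hypothetical areas: a dot just left of the letter at baseline height for 0 and just right of it for 1. The names `Letter`, `Mark`, `uid_to_bits`, `mark_area` and `embed_uid`, the 8-bit width and the 0.15 offset ratio are illustrative assumptions, not the patented parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    """A recognized letter and its bounding box in image coordinates (y grows downward)."""
    char: str
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Mark:
    """A dot to be drawn at (x, y) that carries one bit of the UID."""
    x: float
    y: float
    bit: int


def uid_to_bits(uid: str, bits_per_char: int = 8) -> list[int]:
    """Claim 4: encode each UID character as a fixed-width binary code (assumed ASCII)."""
    bits: list[int] = []
    for ch in uid:
        code = ord(ch)
        if code >= 2 ** bits_per_char:
            raise ValueError(f"{ch!r} does not fit into {bits_per_char} bits")
        bits.extend(int(b) for b in format(code, f"0{bits_per_char}b"))
    return bits


def mark_area(letter: Letter, bit: int, offset_ratio: float = 0.15) -> tuple[float, float]:
    """Claim 5 (hypothetical mapping): the bit selects where the dot goes.
    0 -> just left of the letter at baseline height, 1 -> just right of it."""
    offset = offset_ratio * letter.w
    x = letter.x - offset if bit == 0 else letter.x + letter.w + offset
    return x, letter.y + letter.h


def embed_uid(letters: list[Letter], uid: str) -> list[Mark]:
    """Claim 1: spread the UID bits over the recognized letters, repeating the
    bit stream cyclically so that every bit occurs many times per page."""
    bits = uid_to_bits(uid)
    if not bits:
        return []
    marks: list[Mark] = []
    for letter in letters:
        if not letter.char.isalpha():  # only actual letters carry marks
            continue
        bit = bits[len(marks) % len(bits)]
        x, y = mark_area(letter, bit)
        marks.append(Mark(x, y, bit))
    return marks


if __name__ == "__main__":
    # Fake OCR output for the word "Hello": 8x12 px boxes spaced 10 px apart.
    letters = [Letter(c, 20.0 + 10.0 * i, 50.0, 8.0, 12.0) for i, c in enumerate("Hello")]
    for mark in embed_uid(letters, "A"):
        print(mark)
```

Repeating the bit stream cyclically is one simple way to obtain the redundancy that Table 9 relies on; an actual implementation could additionally mark only a fraction P of the letters.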

6. A method for protecting information from leaks on printed documents, performed using a processor of a computer device, the method comprising the steps of:
