RU2744769C1

RU2744769C1 - Method for image processing using adaptive technologies based on neural networks and computer vision

Info

Publication number: RU2744769C1
Application number: RU2020122196A
Authority: RU
Inventors: Арташес Левонович Дадаян; Шамиль Анасович КАЮМОВ; Константин Юрьевич МАЙОРОВ; Павел Владимирович ЮРАСОВ
Original assignee: Общество с ограниченной ответственностью "СЭНДБОКС"
Priority date: 2020-07-04
Filing date: 2020-07-04
Publication date: 2021-03-15

Abstract

FIELD: image processing.

SUBSTANCE: group of inventions relates to image processing, in particular to recognition of the type of document and its content from a stream of arbitrary images. The method includes the steps of extracting a document image using machine vision algorithms from a set of arbitrary images; determining the proposed type of document using a convolutional neural network based on the extracted document image, while the convolutional network sets the degree of similarity of the document image with known document types based on the specified document samples; determining the document type using a recurrent neural network based on the proposed document type received from the convolutional neural network, while the recurrent neural network evaluates the degree of similarity of the document with the intended document type based on machine learning algorithms. Data contained in a document are recognized based on a specific type of document; the recognized data are saved.

EFFECT: the group of inventions ensures correct recognition of a document and restoration of its content from a stream of images, even when the text is illegible or absent on part of the document.

40 cl, 2 dwg, 3 tbl

Description

Technical field

The present invention relates to image processing by an unskilled user. In particular, the invention relates to the recognition and classification of documents from a stream of arbitrary images using computer vision algorithms and neural networks.

State of the art

Modern smartphones make it possible to capture images quickly and store them in large quantities, often far more than the user is able to sort and process. The functionality of such devices keeps growing, the number of unprocessed photos keeps increasing, and it becomes almost impossible for the user to find a photo taken several months or a year ago.

A separate task is keeping track of photographs of documents or information materials. The need for stricter classification of such photographs and for extracting information from them is obvious.

For example, a sales receipt becomes unreadable after some time because it is printed on thermal paper. To avoid this, it is recommended to make copies of receipts, both paper and electronic, that can be printed at any time. However, because of the large number of images, finding a photo of a receipt taken more than a year ago is laborious and often impossible.

In addition, the likelihood that a copy of any given receipt or other document or material will be used is small, so the user is not willing to enter the various attributes and parameters of the photographed document manually.

This creates the need for a virtual assistant on the phone, implemented, for example, as a mobile application, that would on its own, without involving the user, detect documents, determine their type, and recognize text and other important features (a QR code and others).

Such an assistant could not only pick out documents and materials from the image stream but also make decisions on the user's behalf based on a number of fields found in the document, for example setting reminders, without the user's involvement, for the expiration date of an OSAGO (compulsory motor third-party liability insurance) policy, a foreign-travel passport, and other documents.

At present, the user has a set of various programs and technical tools capable of performing some of these actions. In particular, neural networks are known to be used either for recognizing an object in an image or for recognizing text. However, it is not possible to delegate these actions to a virtual assistant.

In addition, technologies for classifying images and extracting information from them are in demand not only on smartphones but also on other mobile personal devices, such as augmented reality glasses, and on non-mobile, non-personal devices, such as corporate scanners and CCTV cameras that capture and record documents.

Disclosure of the invention

The subject of the invention is an information system that processes images of documents. As input, the system receives an image in electronic form; as output, it produces a block of information containing: an improved image with corrected geometry and color; the document type, determined by machine learning algorithms; and the content of the document fields, which depend on the previously determined document type, supplemented where necessary with information from other sources. The information system is able to determine the document type using neural network algorithms and to correct text recognition errors through cascaded image processing. This ability is achieved through the neural network training method; in particular, the neural network itself derives the formula by which it determines the degree of similarity between the analyzed image and a document of a given type.

The essence of the invention is the use of a cascade of recognition technologies to fully automate the process and make it independent of the user. These technologies are adaptive, and decisions are made by taking into account many parameters, including but not limited to:

- the general appearance of documents of a given kind;

- the ways in which documents of a given kind are photographed or, more broadly, captured;

- the text of the document that can be recognized;

- barcodes, QR codes, and other machine-readable information on the document;

- information extracted from documents of other types that belong to the same user;

- the context in which the image was taken, including previous images;

- various verification databases obtained from open sources, including those published under the concept of open access to government data.

Brief description of the drawings

The accompanying drawings illustrate how the virtual assistant works and how documents are detected in an image stream. In the drawings:

Fig. 1 is a block diagram depicting the virtual assistant and some examples of the context sources it can process;

Fig. 2 is a block diagram of document detection and processing.

Implementation of the invention

Preferred embodiments of the invention are presented below.

In one embodiment, the claimed invention is implemented on a smartphone as a virtual assistant. The general operating principles of the virtual assistant are shown in Fig. 1.

The virtual assistant processes document images obtained either directly from the user (by photographing) or on its own, given the user's permission. After processing, the virtual assistant classifies the processed images by the type of document they contain (receipts, tickets, contracts, insurance policies, and so on).

Fig. 2 shows the general algorithm for recognizing the type and parameters of a document in one embodiment of the invention.

Document image processing begins at step 1.

The document is preprocessed first. At step 2, the image is checked to make sure it is not a previously recognized document. If the image is a duplicate of an already recognized document, it is ignored.
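The text does not say how the duplicate check at step 2 is performed. As a minimal sketch, assuming the image has already been reduced to a small grayscale thumbnail, one common option is an average hash compared by Hamming distance. The function names, the thumbnail representation, and the `max_distance` threshold are all illustrative assumptions, not part of the described method.

```python
from typing import Iterable, List


def average_hash(gray: List[List[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(gray: List[List[int]], known_hashes: Iterable[int],
                 max_distance: int = 2) -> bool:
    """Treat the image as a duplicate if its hash is close to a stored one.

    max_distance is an assumed tolerance for small differences between shots.
    """
    h = average_hash(gray)
    return any(hamming(h, k) <= max_distance for k in known_hashes)
```

In practice the thumbnail would be larger (for example 8x8), and the stored hashes would come from documents the assistant has already recognized.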

If the document is not a duplicate, then at step 3 the boundaries of the document are determined, that is, the boundaries of the paper sheet or other base on which the document is printed or depicted. After the part of the image outside the document base is cropped, the image geometry is corrected at step 4, for example by removing keystone distortion or correcting the viewing angle.

At step 5, the initial key color parameters are saved so that, at step 6, color correction can be performed if necessary, along with contrast and brightness optimization to improve text recognition. Image sharpness is also adjusted.

After that, at step 7, a first attempt is made to recognize text in the image, and the top-bottom orientation is determined (step 8). At step 9, it is determined whether the top and bottom of the document were identified. If text recognition fails, an attempt is made to find lines of text without reading them (step 10), and it is also determined whether the previous document processed for the same user had similar parameters and what orientation decision was made for it (step 11). Then, based on the collected information, the document is rotated according to the determined orientation (step 12).
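The fallback logic of steps 7 to 12 can be sketched as a small decision function. This is only an illustration under stated assumptions: the OCR result, the text-line axis, and the previous decision are passed in as plain values, and the names `choose_rotation`, `ocr_angle`, `line_axis`, and `previous_angle` are hypothetical.

```python
from typing import Optional


def choose_rotation(ocr_angle: Optional[int],
                    line_axis: Optional[str],
                    previous_angle: Optional[int]) -> int:
    """Pick a rotation angle (degrees) for the document.

    ocr_angle      -- orientation found by the first OCR pass, or None if
                      the top and bottom could not be determined (steps 7-9)
    line_axis      -- "horizontal" or "vertical" if text lines were found
                      without reading them, else None (step 10)
    previous_angle -- decision taken for a similar earlier image from the
                      same user, or None (step 11)
    """
    if ocr_angle is not None:
        return ocr_angle
    # Line direction alone narrows the choice to two opposite angles.
    if line_axis == "horizontal":
        candidates = (0, 180)
    elif line_axis == "vertical":
        candidates = (90, 270)
    else:
        candidates = (0, 90, 180, 270)
    # Reuse the earlier decision when it is compatible with the evidence.
    if previous_angle in candidates:
        return previous_angle
    return candidates[0]
```

The design choice here is that a successful OCR pass always wins, and the earlier decision only breaks ties that line detection cannot resolve.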

Then, at step 13, the convolutional neural network performs the initial recognition of the document type. The convolutional network assigns a degree of similarity between the image and every known document type, based on typical images of documents of each type that are known to it in advance.

The convolutional network then tries to find known visual patterns (non-textual features) on the document, including a face image (step 14), a QR code or barcode (step 15), and a coat of arms or logo (step 16), as well as other non-textual information, and to extract the coordinates and proportions of these elements.

At the next stage, all the information obtained about the document is packed and passed to the recurrent neural network, including:

- the "opinion" of the convolutional network, in the form of a tuple of similarity probability values;

- the extracted unstructured text obtained during initial processing;

- the presence of stop words and their possible variants caused by low image quality;

- information about the date, time, and place where the photograph was taken;

- information about the key color parameters of the image;

- information about the presence of non-text elements and their characteristics.
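The bundle listed above could be represented, for illustration, as a single record. The following is a sketch only: the class and field names are hypothetical, and the source does not specify the actual data layout passed to the recurrent network.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NonTextElement:
    kind: str                        # e.g. "face", "qr_code", "logo"
    box: Tuple[int, int, int, int]   # x, y, width, height in pixels


@dataclass
class DocumentFeatures:
    """Everything collected about one image before the second classifier."""
    cnn_scores: Tuple[float, ...]                   # similarity per known type
    raw_text: str                                   # unstructured first-pass text
    stop_words_found: List[str] = field(default_factory=list)
    captured_at: Optional[str] = None               # ISO 8601 timestamp
    location: Optional[Tuple[float, float]] = None  # latitude, longitude
    color_params: Dict[str, float] = field(default_factory=dict)
    non_text_elements: List[NonTextElement] = field(default_factory=list)


features = DocumentFeatures(
    cnn_scores=(0.91, 0.05, 0.04),
    raw_text="РОССИЙСКАЯ ФЕДЕРАЦИЯ ...",
    non_text_elements=[NonTextElement("face", (40, 120, 180, 240))],
)
payload = asdict(features)  # plain dict, ready to serialize
```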

At step 17, the recurrent neural network produces its own estimates of how similar the document is to a document of each type, using machine learning algorithms.

At step 18, a template corresponding to the document type is applied to recognize the document.

At step 19, it is determined whether the quality of the recognized text is sufficient. If it is not, the document is rotated at step 20 and the quality of the recognized text is checked again.
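Steps 19 and 20 amount to a retry loop. A minimal sketch, assuming the recognizer is a callable that takes a rotation angle and returns the text with a quality score between 0 and 1; the `min_quality` threshold, the set of angles, and the function names are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def recognize_with_retries(
    recognize: Callable[[int], Tuple[str, float]],
    min_quality: float = 0.8,
    angles: Sequence[int] = (0, 90, 180, 270),
) -> Tuple[str, float, int]:
    """Try each rotation until the quality is sufficient.

    Returns (text, quality, angle). If no angle reaches min_quality,
    the best attempt seen is returned.
    """
    best = ("", -1.0, angles[0])
    for angle in angles:
        text, quality = recognize(angle)
        if quality > best[1]:
            best = (text, quality, angle)
        if quality >= min_quality:
            break  # good enough, no further rotation needed
    return best


# Usage with a stand-in recognizer (a real one would call an OCR engine).
fake_quality = {0: 0.3, 90: 0.9, 180: 0.5, 270: 0.2}
result = recognize_with_retries(lambda a: (f"text@{a}", fake_quality[a]))
```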

At step 21, a refined template for high-quality text recognition is applied. It is based on the document type and takes into account the relative positions of the text elements, their color, font, and other features. In particular, this step makes it possible to separate the text from the background pattern, which could not be done at the previous stage. The extracted blocks of text are saved as a "field-value" structure.

Next, for some kinds of documents, the information extracted from the document is enriched (step 22). Additional information can be obtained from external sources. For example, to obtain information about a sales receipt, a request is made to the Open API of the Federal Tax Service of Russia. For other documents, such as a SNILS (individual insurance account number) certificate, the full name of the phone owner is compared with the full name on the document. If the two differ in only a small number of characters, and those characters are statistically known to be easily confused in the given font, the full name of the smartphone owner is added as a candidate correction.
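The name comparison described above can be sketched as follows. The set of confusable character pairs and the `max_diff` limit are made up for illustration; in the described system such pairs would come from per-font statistics, which the text does not list.

```python
# Hypothetical pairs of Cyrillic letters that OCR tends to confuse.
CONFUSABLE = {
    frozenset(pair)
    for pair in [("И", "Н"), ("Ш", "Щ"), ("З", "Э"), ("О", "С")]
}


def is_correction_candidate(ocr_name: str, owner_name: str,
                            max_diff: int = 2) -> bool:
    """True if the owner's name plausibly corrects the recognized name.

    The names must have equal length, differ in at most max_diff positions,
    and every differing position must be a known confusable pair.
    """
    a, b = ocr_name.upper(), owner_name.upper()
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if not diffs or len(diffs) > max_diff:
        return False  # identical (nothing to fix) or too different
    return all(frozenset(d) in CONFUSABLE for d in diffs)
```

For example, `is_correction_candidate("НВАНОВ", "ИВАНОВ")` accepts the owner's name as a candidate, while an unrelated surname is rejected.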

Next, at step 22, the document is analyzed to determine whether it is multi-page, based on features characteristic of the kind of document, such as page numbers, continuity of the text, or the same passport number appearing on images of different passport pages. Previously entered images are analyzed for this purpose.

If the images are found to belong to the same document, then at step 24 it is checked whether the new image shows the same page of the document. If it is the same page of a multi-page document, the page with the better text recognition and higher overall image sharpness is selected. The images from which the multi-page document was composed are deleted (step 26).

If the image is found to be a different page of a multi-page document, then at step 25 it is determined whether to create a multi-page document or to stitch several images into one. For example, a sales receipt can be very long, and the user may need to take several photos to capture it in full. In the first case, a new page is added to the multi-page document (step 27); in the second case, the images are stitched into one.
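The branching in steps 24 to 27 can be summarized in a short sketch. The action names, the quality scores, and the rule that only receipts are stitched are illustrative assumptions; the text gives the receipt only as an example and does not define the criterion used at step 25.

```python
from enum import Enum


class Action(Enum):
    KEEP_EXISTING_PAGE = "keep_existing_page"
    REPLACE_PAGE = "replace_page"
    APPEND_PAGE = "append_page"
    STITCH = "stitch"


# Assumed: document types captured as one long strip rather than as pages.
STITCHABLE_TYPES = {"receipt"}


def place_image(doc_type: str, same_page: bool,
                new_quality: float, old_quality: float) -> Action:
    """Decide what to do with a new image of an already known document."""
    if same_page:
        # Step 24: keep whichever shot of the page was recognized better.
        if new_quality > old_quality:
            return Action.REPLACE_PAGE
        return Action.KEEP_EXISTING_PAGE
    # Step 25: a different page, so either stitch or add a page.
    if doc_type in STITCHABLE_TYPES:
        return Action.STITCH
    return Action.APPEND_PAGE  # step 27
```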

The trained convolutional neural network (CNN) uses the InceptionV3 architecture. This network analyzes the image and, based on the weights stored in it, determines the visual similarity of the current image to a set of images of documents of each type. The network outputs a tuple of scalar values, each being the probability that the image matches a given type. These probabilities range from 0% to 100%.
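The text does not state how the network's raw outputs become percentages. A common choice for a classification head, assumed here purely for illustration, is a softmax over the per-type scores; the function name and the input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def softmax_percent(logits: Sequence[float]) -> Tuple[float, ...]:
    """Convert raw per-type scores into percentages that sum to 100."""
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return tuple(100.0 * e / total for e in exps)


scores = softmax_percent([2.0, 1.0, 0.1])      # one value per document type
most_likely = max(range(len(scores)), key=scores.__getitem__)
```

With softmax the values are mutually exclusive probabilities. If the network instead scored each type independently, the values would not need to sum to 100%.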

In one embodiment, the convolutional neural network can determine the probability that a document belongs to a particular type using the parameter values shown in Table 1:

Table 1

| Parameter | Description | Value |
|---|---|---|
| Epoch | Number of complete passes over the entire dataset | 200 |
| BatchSize | Size of the data batch in mini-batch training | 10 |
| LearningRate | Learning rate coefficient | 0.01 |

The recurrent neural network (RNN), also built on TensorFlow Inception, analyzes many parameters of the image, including the visual similarity decisions made by the CNN. The input to this network is a set of numeric and string values, including the data obtained from the first pass of the text recognition module. Then, using a multi-class classifier and the multinomial logistic regression algorithm, these inputs are compared with the coefficients obtained earlier through machine learning.

In one embodiment, the recurrent neural network determines the probable document type using the following formula:

$$T = \max_i U[i], \qquad U[i] = \sum_{k=1}^{N} W_{ik} F_{ik},$$

where

- T is the probability value of the priority document type;

- Max U[] is the function that finds the maximum in an array of scalar values, each of which is the sum of N weighted values;

- W_ik*F_ik is the scalar probability value for the k-th parameter of the i-th document type;

- W_ik is the weighting factor for the k-th parameter of the i-th document type;

- F_ik is the k-th parameter (factor) of the i-th document type, represented by the convolution function of the corresponding input parameter.
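To make the definitions concrete, the following sketch computes a weighted sum per document type and takes the maximum. It is a simplification under stated assumptions: each factor F_ik is reduced to a single number shared by all types (1.0 when a Boolean feature is present), and the weights are a small subset of the sample values in Table 3. The function and variable names are illustrative.

```python
from typing import Dict, Tuple

# Weights W_ik per document type i and parameter k (fractions, not percent).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "T1": {"b_pattern_birthplace": 0.85, "b_face": 0.42},
    "T2": {"b_pattern_birthplace": 0.00, "b_face": 0.31},
    "T3": {"b_pattern_birthplace": 0.00, "b_face": 0.88},
}


def classify(weights: Dict[str, Dict[str, float]],
             factors: Dict[str, float]) -> Tuple[str, float]:
    """Return the type with the largest sum of W_ik * F_ik and that sum."""
    sums = {
        doc_type: sum(w * factors.get(param, 0.0) for param, w in params.items())
        for doc_type, params in weights.items()
    }
    best = max(sums, key=sums.get)
    return best, sums[best]


# A face and the phrase 'Место рождения' were both found.
best_type, t_value = classify(WEIGHTS, {"b_pattern_birthplace": 1.0, "b_face": 1.0})
```

With only a face detected, the same weights favor T3, which shows how textual features shift the decision made from visual evidence alone.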

Пример параметров весовых коэффициентов представлен в таблице 2.An example of the parameters of the weighting factors is presented in Table 2.

Таблица 2table 2

| Parameter | Description | Type |
| --- | --- | --- |
| cnn_document_type | Result of processing the document with the CNN | String |
| b_pattern_rf | Presence of the phrase ‘РОССИЙСКАЯ ФЕДЕРАЦИЯ’ (‘RUSSIAN FEDERATION’) | Boolean |
| b_pattern_birthdate | Presence of the phrase ‘Дата рождения’ (‘Date of birth’) | Boolean |
| b_pattern_birthplace | Presence of the phrase ‘Место рождения’ (‘Place of birth’) | Boolean |
| b_pattern_sign | Presence of the phrase ‘Подпись владельца’ (‘Owner's signature’) | Boolean |
| b_pattern_lname | Presence of the phrase ‘Фамилия’ (‘Last name’) | Boolean |
| b_pattern_fname | Presence of the phrase ‘Имя’ (‘First name’) | Boolean |
| b_pattern_mname | Presence of the phrase ‘Отчество’ (‘Patronymic’) | Boolean |
| b_pattern_insurpolice | Presence of the phrase ‘Страховой полис’ (‘Insurance policy’) | Boolean |
| b_psprt_issued | Presence of the phrase ‘Паспорт выдан’ (‘Passport issued’) | Boolean |
| b_psprt_branch | Presence of the phrase ‘ОТДЕЛЕНИЕМ УФМС РОССИИ’ (‘BRANCH OF THE FMS OF RUSSIA’) | Boolean |
| b_psprt_branch | Presence of the phrase ‘ОТДЕЛОМ ВНУТРЕННИХ ДЕЛ’ (‘DEPARTMENT OF INTERNAL AFFAIRS’) | Boolean |
| b_psprt_branch | Presence of the phrase ‘ОТДЕЛОМ УФМС РОССИИ’ (‘DEPARTMENT OF THE FMS OF RUSSIA’) | Boolean |
| b_psprt_branch | Presence of the phrase ‘ОУФМС РОССИИ’ (‘OUFMS OF RUSSIA’) | Boolean |
| b_psprt_isdate | Presence of the phrase ‘Дата выдачи’ (‘Date of issue’) | Boolean |
| b_psprt_issuer | Presence of the phrase ‘Код подразделения’ (‘Department code’) | Boolean |
| b_psprt_code | Presence of the phrase ‘Личный код’ (‘Personal code’) | Boolean |
| b_psprt_sign | Presence of the phrase ‘Личная подпись’ (‘Personal signature’) | Boolean |
| b_psprt_gendate | Presence of the "gender + date" pattern | Boolean |
| b_face | Presence of a face image | Boolean |
| n_pattern_rf | Location of the phrase ‘РОССИЙСКАЯ ФЕДЕРАЦИЯ’ (‘RUSSIAN FEDERATION’) relative to the document | Integer |
| n_pattern_birthdate | Location of the phrase ‘Дата рождения’ (‘Date of birth’) relative to the document | Integer |
| n_pattern_birthplace | Location of the phrase ‘Место рождения’ (‘Place of birth’) relative to the document | Integer |
| n_pattern_sign | Location of the phrase ‘Подпись владельца’ (‘Owner's signature’) relative to the document | Integer |
| n_pattern_lname | Location of the phrase ‘Фамилия’ (‘Last name’) relative to the document | Integer |
| n_pattern_fname | Location of the phrase ‘Имя’ (‘First name’) relative to the document | Integer |
| n_pattern_mname | Location of the phrase ‘Отчество’ (‘Patronymic’) relative to the document | Integer |
| n_pattern_insurpolice | Location of the phrase ‘Страховой полис’ (‘Insurance policy’) relative to the document | Integer |
| n_psprt_issued | Location of the phrase ‘Паспорт выдан’ (‘Passport issued’) relative to the document | Integer |
| n_psprt_branch | Location of the phrase ‘ОТДЕЛЕНИЕМ УФМС РОССИИ’ (‘BRANCH OF THE FMS OF RUSSIA’) relative to the document | Integer |
| n_psprt_branch | Location of the phrase ‘ОТДЕЛОМ ВНУТРЕННИХ ДЕЛ’ (‘DEPARTMENT OF INTERNAL AFFAIRS’) relative to the document | Integer |
| n_psprt_branch | Location of the phrase ‘ОТДЕЛОМ УФМС РОССИИ’ (‘DEPARTMENT OF THE FMS OF RUSSIA’) relative to the document | Integer |
| n_psprt_branch | Location of the phrase ‘ОУФМС РОССИИ’ (‘OUFMS OF RUSSIA’) relative to the document | Integer |
| n_psprt_isdate | Location of the phrase ‘Дата выдачи’ (‘Date of issue’) relative to the document | Integer |
| n_psprt_issuer | Location of the phrase ‘Код подразделения’ (‘Department code’) relative to the document | Integer |
| n_psprt_code | Location of the phrase ‘Личный код’ (‘Personal code’) relative to the document | Integer |
| n_psprt_sign | Location of the phrase ‘Личная подпись’ (‘Personal signature’) relative to the document | Integer |
| n_psprt_gendate | Location of the "gender + date" pattern relative to the document | Integer |
| n_face | Location of the face image relative to the document | Integer |
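To make the parameters in the table above more concrete, the following Python sketch shows one way the Boolean presence flags and integer location values might be derived from recognized text. It is an illustration under stated assumptions, not the implementation described in the patent: the function name `extract_features`, the `PHRASES` subset, and the choice of encoding a "location" as the index of the first text line containing the phrase (with -1 meaning absent) are all hypothetical. The source only states that the location is an integer.

```python
# Hypothetical subset of the parameter table: name suffix -> phrase searched for.
PHRASES = {
    "pattern_rf": "РОССИЙСКАЯ ФЕДЕРАЦИЯ",
    "pattern_birthdate": "Дата рождения",
    "psprt_issued": "Паспорт выдан",
}


def extract_features(cnn_document_type, ocr_lines):
    """Build a flat feature dictionary from recognized text lines.

    Assumption: the integer n_* "location" is the index of the first text
    line that contains the phrase, or -1 when the phrase is absent.
    """
    features = {"cnn_document_type": cnn_document_type}
    for suffix, phrase in PHRASES.items():
        needle = phrase.casefold()
        # Index of the first line containing the phrase, case-insensitively.
        position = next(
            (i for i, line in enumerate(ocr_lines) if needle in line.casefold()),
            -1,
        )
        features[f"b_{suffix}"] = position >= 0
        features[f"n_{suffix}"] = position
    return features
```

Calling `extract_features("T1", lines)` on the text lines of a passport page would yield one `b_*` and one `n_*` entry per phrase, alongside the type proposed by the CNN.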

Example values of the weighting coefficients are given in Table 3.

Table 3

Columns T1-T7 are document types.

| Parameter | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| b_pattern_rf | 5% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_pattern_birthdate | 43% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_pattern_birthplace | 85% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_pattern_sign | 82% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_pattern_lname | 47% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_pattern_fname | 81% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_pattern_mname | 61% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_pattern_insurpolice | 35% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_issued | 50% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_branch | 26% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_branch | 52% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_branch | 5% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_branch | 60% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_isdate | 71% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_issuer | 37% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_code | 66% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_sign | 2% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_psprt_gendate | 55% | 0% | 0% | 0% | 0% | 0% | 0% |
| b_face | 42% | 31% | 88% | 14% | 0% | 0% | 11% |
| n_pattern_rf | 16% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_pattern_birthdate | 9% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_pattern_birthplace | 48% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_pattern_sign | 20% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_pattern_lname | 30% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_pattern_fname | 19% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_pattern_mname | 50% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_pattern_insurpolice | 65% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_issued | 5% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_branch | 62% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_branch | 55% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_branch | 2% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_branch | 94% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_isdate | 43% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_issuer | 57% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_code | 39% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_sign | 29% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_psprt_gendate | 35% | 0% | 0% | 0% | 0% | 0% | 0% |
| n_face | 39% | 20% | 19% | 11% | 0% | 0% | 8% |
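The claims refer to a multi-class classifier using multinomial logistic regression, but the text does not spell out how the percentages in Table 3 enter the computation. The Python sketch below is one possible reading, offered only as an illustration: it assumes each percentage acts as a linear coefficient (as a fraction) that is added to a document type's score when the corresponding Boolean parameter is true, and that the scores are then normalized with a softmax. The names `WEIGHTS`, `DOC_TYPES` and `classify` are hypothetical, and only three rows of the table are used.

```python
import math

DOC_TYPES = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]

# Three rows copied from Table 3, expressed as fractions (assumed coefficients).
WEIGHTS = {
    "b_pattern_rf": [0.05, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "b_psprt_issued": [0.50, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "b_face": [0.42, 0.31, 0.88, 0.14, 0.0, 0.0, 0.11],
}


def classify(features):
    """Return a probability per document type via a softmax over linear scores."""
    scores = [0.0] * len(DOC_TYPES)
    for name, row in WEIGHTS.items():
        if features.get(name):  # only parameters that are present contribute
            for k, weight in enumerate(row):
                scores[k] += weight
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {doc_type: e / total for doc_type, e in zip(DOC_TYPES, exps)}
```

Under these assumptions, a face image alone favors T3, while a face together with the two passport phrases shifts the highest probability to T1. A trained model would also include bias terms and the integer location parameters, which this sketch omits.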

In one embodiment, the neural network is trained not in real time but during scheduled maintenance periods. The following steps may be performed:

- Perform initial labeling, in the database, of documents and images for which complaints about incorrect recognition were received during operation, or for which a number of indirect parameters fall outside the statistical margin of error.

- Examine the problem documents manually.

- Prepare a cleaned dataset.

- Train the neural network on the cleaned dataset.

- Analyze the training results.

- Load the refined coefficients into the database and switch the user flow to the updated recognition logic.
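The first and last of the steps above can be sketched in Python as follows. This is a hedged illustration rather than the patented procedure: `ProcessedDocument`, `select_for_labeling` and `CoefficientStore` are hypothetical names, the "indirect parameter" is modeled as a single number compared against an expected value and a tolerance, and the switch to updated recognition logic is modeled as changing which stored coefficient version is active.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessedDocument:
    doc_id: str
    has_complaint: bool      # a user reported incorrect recognition
    indirect_metric: float   # hypothetical indirect parameter, e.g. a confidence


def select_for_labeling(docs, expected, tolerance):
    """Pick documents with complaints or with a metric outside the tolerance."""
    return [
        d for d in docs
        if d.has_complaint or abs(d.indirect_metric - expected) > tolerance
    ]


@dataclass
class CoefficientStore:
    """Keeps every uploaded coefficient set; exactly one can be active."""
    versions: Dict[str, Dict[str, List[float]]] = field(default_factory=dict)
    active: Optional[str] = None

    def upload(self, version, coefficients):
        self.versions[version] = coefficients

    def switch_to(self, version):
        # Refuse to activate a version that was never uploaded.
        if version not in self.versions:
            raise KeyError(f"unknown coefficient version: {version}")
        self.active = version

    def current(self):
        if self.active is None:
            raise RuntimeError("no coefficient version is active")
        return self.versions[self.active]
```

Keeping earlier versions in the store makes it possible to switch back if the analysis of the training results turns out to be wrong, although the source does not say whether such a rollback is supported.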

Example of application of the invention

The user launches the application on a smartphone and photographs a sales receipt using the tools built into the application. The application processes the image on its own using the method of the present invention. If the user later needs the receipt, they can launch the application and issue, for example, the voice command "find the receipt for the iron". The application searches for all sales receipts that list an iron as a purchased item. The user prints the receipt and attaches the copy, for example, to a complaint sent to the store. The user performs no actions other than photographing the receipt and issuing a command to the application; the application performs all remaining processing itself.
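The search step in this scenario can be pictured with a short Python sketch. It assumes, purely for illustration, that each recognized receipt is stored as a dictionary with an `items` list of product names; the function `find_receipts` and this storage layout are hypothetical, and the conversion of the voice command into a query string is not shown.

```python
def find_receipts(receipts, item_query):
    """Return stored receipts that list an item matching the query."""
    query = item_query.casefold()
    return [
        receipt for receipt in receipts
        # Case-insensitive substring match against every item on the receipt.
        if any(query in item.casefold() for item in receipt["items"])
    ]
```

For example, `find_receipts(stored_receipts, "iron")` would return every stored receipt whose item list mentions an iron.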

The present invention makes it possible to reliably recognize and classify a document from a stream of arbitrary images.

Claims

1. A method for recognizing and classifying documents from a stream of arbitrary images, comprising the steps of:

i) extracting a document image using machine vision algorithms from a set of arbitrary images;

ii) determining an assumed document type using a convolutional neural network based on the extracted document image, wherein the convolutional neural network assigns a degree of similarity between the document image and known document types based on specified document samples;

iii) determining the document type using a recurrent neural network based on the assumed document type obtained from the convolutional neural network, wherein the recurrent neural network evaluates the degree of similarity of the document to the assumed document type based on machine learning algorithms;

iv) recognizing the data contained in the document based on the determined document type;

v) saving the recognized data.

2. The method according to claim 1, further comprising the step of extracting key non-textual features of the document,

wherein, in step iii), the document type is additionally determined based on the extracted non-textual features of the document.

3. The method according to claim 1, further comprising the step of extracting key text parameters of the document,

wherein, in step iii), the document type is additionally determined based on the extracted key text parameters of the document.

4. The method according to any one of claims 1-3, further comprising the step of recognizing the text contained in the document based on the recognized document type and storing the recognized text.

5. The method according to claim 4, wherein at the stage of recognizing the text contained in the document, based on the recognized document type, a template corresponding to the document type is applied.

6. The method according to any one of claims 1-3, further comprising the step of enriching the recognized data with additional information extracted from external sources based on the recognized data.

7. The method according to any one of claims 1-3, further comprising the steps of:

determining whether the recognized data belongs to a previously recognized document, and,

if it is determined that the recognized data belongs to the previously recognized document, determining whether to create a multi-page document or to merge the recognized data with the previously recognized data.

8. The method according to claim 7, wherein, if the recognized data belongs to the previously recognized document, it is checked whether the recognized data matches the previously recognized data, and,

if the recognized data matches the previously recognized data, either the previously recognized data or the recognized data is deleted.

9. The method according to any one of claims 1-3, further comprising the steps in which:

before the step of determining the assumed document type by the convolutional neural network, it is determined whether the document image has already been recognized, and,

if the document image has already been recognized, the method is terminated.

10. The method according to any one of claims 1-3, further comprising a step in which,

before step ii), the boundaries of the document are determined and the geometry of the document image is corrected.

11. The method according to any one of claims 1-3, further comprising a step in which,

before step ii), color correction of the document image is performed and the contrast and brightness of the image are optimized for better text recognition.

12. The method according to any one of claims 1-3, in which, before step ii), the orientation of the document is determined and, if necessary, the document is rotated.

13. The method according to claim 1, in which at step ii) the image is analyzed using a convolutional neural network, the visual similarity of the current image to an array of images of documents of various types is determined based on the weight coefficients stored in the network, and a tuple is output consisting of scalar values of the probabilities that the image belongs to different document types.

14. The method according to claim 1, wherein in step iii) a plurality of document image parameters are determined and, using a multi-class classifier and a multinomial logistic regression algorithm, these parameters are compared with the coefficients obtained as a result of machine learning.

15. The method of claim 1, wherein the set of arbitrary images required in step i) is obtained by photographing documents.

16. The method of claim 15, wherein documents are photographed using a smartphone camera.

17. The method of claim 15, wherein the set of arbitrary images is contained in a photo bank specified by the user.

18. The method of claim 1, wherein the steps of the method are performed using software.

19. The method of claim 18, wherein the software is a mobile application installed on a smartphone.

20. The method of claim 18, in which the software is located in the cloud on the servers of the operating organization.

21. A system for recognizing and classifying documents from a stream of arbitrary images, comprising:

a preprocessing module configured to extract a document image using machine vision algorithms from a set of arbitrary images;

a convolutional neural network configured to determine an assumed document type based on the extracted document image, wherein the convolutional neural network is configured to assign a degree of similarity between the document image and known document types based on specified document samples;

a recurrent neural network configured to determine the document type based on the assumed document type received from the convolutional neural network, wherein the recurrent neural network is configured to evaluate the degree of similarity of the document to the assumed document type based on machine learning algorithms, and

to recognize the data contained in the document based on the determined document type; and

a storage module configured to store the recognized data.

22. The system of claim 21, in which the recurrent neural network is configured to extract key non-text features of the document and

to determine the document type additionally based on the extracted non-text features of the document.

23. The system according to claim 21, in which the recurrent neural network is additionally configured to recognize the key text parameters of the document and

to determine the document type additionally based on the recognized key text parameters of the document.

24. The system according to any one of claims 21-23, in which the recurrent neural network is configured to recognize the text contained in the document based on the recognized document type, and the storage module is configured to store the recognized text.

25. The system of claim 24, wherein the recurrent neural network is configured to apply a template corresponding to the type of document when recognizing the text contained in the document.

26. The system according to any one of claims 21-23, in which the recurrent neural network is configured to enrich the recognized data with additional information extracted from external sources based on the recognized data.

27. The system according to any one of claims 21-23, in which the recurrent neural network is additionally configured to:

determine whether the recognized data belongs to a previously recognized document, and,

if it is determined that the recognized data belongs to a previously recognized document, determine whether to create a multi-page document or to merge the recognized data with the previously recognized data.

28. The system of claim 27, in which the recurrent neural network is configured, if the recognized data belongs to a previously recognized document, to check whether the recognized data matches the previously recognized data,

and the storage module is configured, if the recognized data matches the previously recognized data, to delete either the previously recognized data or the recognized data.

29. The system according to any one of claims 21-23, in which the preprocessing module is configured to:

determine whether the document image has already been recognized, and,

if the document image has already been recognized, stop processing the document.

30. The system of claim 20, wherein the preprocessing module is further configured

to determine the boundaries of the document and to correct the geometry of the document image.

31. The system according to any one of claims 21-23, in which the preprocessing module is further configured to perform color correction on the document image and to optimize the contrast and brightness of the image to provide better text recognition.

32. The system according to any one of claims 21-23, in which the preprocessing module is further configured to determine the orientation of the document and, if necessary, rotate the document.

33. The system according to claim 21, in which the convolutional neural network is additionally configured to analyze the image, determine, based on the weight coefficients stored in it, the visual similarity of the current image with an array of images of documents of various types, and issue a tuple consisting of scalar values of the probabilities of the image belonging to different types of document.

34. The system according to claim 21, in which the recurrent neural network is configured to determine a plurality of document image parameters and, using a multi-class classifier and a multinomial logistic regression algorithm, to compare said parameters with coefficients obtained as a result of machine learning.

35. The system of claim 21, wherein said set of arbitrary images is obtained by photographing documents.

36. The system according to claim 35, wherein the photographing of documents is carried out using a smartphone camera.

37. The system of claim 35, wherein the set of arbitrary images is contained in a bank of photographs specified by the user.

38. The system of claim 21, wherein the preprocessing module, convolutional neural network, recurrent neural network, and storage module are implemented as software.

39. The system of claim 38, in which the software is a mobile application installed on a smartphone.

40. The system of claim 38, in which the software is located in the cloud on the servers of the operating organization.