WO2020197428A1 - Method and system for verifying an electronic set of documents - Google Patents

Method and system for verifying an electronic set of documents

Info

Publication number
WO2020197428A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
page
signer
attribute
type
Prior art date
Application number
PCT/RU2019/000197
Other languages
English (en)
Russian (ru)
Inventor
Евгений Сергеевич ЛАТЫШЕВ
Кирилл Геннадьевич ТАРАСОВ
Original Assignee
Публичное Акционерное Общество "Сбербанк России"
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Публичное Акционерное Общество "Сбербанк России"

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/30 Writer recognition; Reading and verifying signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Definitions

  • The presented technical solution relates generally to the field of image analysis, and in particular to methods and systems for verifying an electronic set of documents, for example, scanned documents of a corporate client of a bank.
  • The technical problem addressed by this solution is the creation of a new, effective method for automated verification of a set of documents, for example, documents of a corporate client of a bank.
  • The technical result is improved accuracy of automated verification of documents for completeness.
  • An additional technical result is increased speed of automated verification of documents for completeness.
  • The document page vector is formed on the basis of the values of the words contained in the text information, the structure of the dependencies between the words, and the weight values of said words.
  • Determining the type of the document and the type of its page based on the document page vector is carried out by classifying the document as belonging to predetermined types of pages and documents, and the mathematical model for classification is implemented by means of the "random forest" machine learning algorithm.
  • the step of checking for the presence of at least one attribute of the signer on the resulting image of the document includes the steps in which:
  • checking for the presence of at least one signer's attribute on the obtained document image to determine the completeness of the document is carried out by comparing information about the location of the signer's attribute on the document page image with information indicating where the signer's attribute should be located on this page type.
  • the detection of at least one signer attribute is carried out only on those images of document pages of the type indicating that the page data contains the signer's attributes.
  • At least one signer attribute is additionally classified, and the classification is carried out based on the location information of the signer attribute.
  • the signer attribute is a signature and / or a seal.
  • a system for verifying a set of documents comprising at least one computing device and at least one memory containing machine-readable instructions that, when executed by at least one computing device, perform the above method.
  • FIG. 1 shows a general diagram of the interaction of system elements for checking a set of documents.
  • FIG. 2 shows an example of a scanned document.
  • FIG. 3 shows an example of a general view of a system for checking a set of documents.
  • The system means, among other things, a computer system, a computer (electronic computing machine), a CNC (computer numerical control) system, a PLC (programmable logic controller), computerized control systems, and any other devices capable of performing a given, well-defined sequence of operations (actions, instructions).
  • A command processing device means an electronic unit or an integrated circuit (microprocessor) that executes machine instructions (programs).
  • The command processing device reads and executes machine instructions (programs) from one or more data storage devices.
  • Data storage devices can include, but are not limited to, hard disk drives (HDD), flash memory, ROM (read-only memory), solid-state drives (SSD), and optical drives.
  • A program is a sequence of instructions intended for execution by a computer control device or a command processing device.
  • A database (DB) is a collection of data organized in accordance with a conceptual structure describing the characteristics of the data and the relationships between them, where the collection supports one or more application areas (ISO/IEC 2382:2015, 2121423 "database").
  • The system 1 for verifying a set of documents contains the following interconnected modules: a data conversion module 10, a page classification module 20, a module 30 for verifying signer attributes such as signatures and/or seals, and a document set verification module 40.
  • These modules can be implemented on the basis of the software and hardware of the system 1, for example, on the basis of at least one computing device, in particular a microprocessor, and at least one memory containing machine-readable instructions for implementing the functions assigned to the modules, as described below.
  • The data conversion module 10 may contain a vector generating module 11 and an image filtering module 12, and can be implemented on the basis of the open-source Tesseract tool (Tesseract Open Source OCR Engine) and the TF-IDF algorithm.
  • The page classification module 20 can be implemented on the basis of a mathematical model pre-trained with a machine learning algorithm, namely a random forest of decision trees.
  • The signer attribute verification module 30 may be implemented on the basis of a YOLOv3 neural network pre-trained on a typical set of signatures and seals.
  • The document set verification module 40 may include at least one database 41 for storing information that may be required to verify the document set.
  • The system (200) for verifying a set of documents contains one or more processors (201) connected by a common data bus, memory such as RAM (202) and ROM (203), input/output interfaces (204), input/output devices (205), and a networking device (206).
  • The processor (201) (or multiple processors, a multi-core processor, etc.) can be selected from a range of devices currently in wide use, for example, from manufacturers such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, etc. The processor, or one of the processors used in the system (200), may also be a GPU, for example, an NVIDIA GPU with a CUDA-compatible programming model, or a Graphcore processor; such devices are also suitable for full or partial execution of the method and can be used for training and applying machine learning models in various information systems.
  • the RAM (202) is a random access memory and is intended for storing machine-readable instructions executed by the processor (201) for performing the necessary operations for logical data processing.
  • RAM (202) contains executable instructions of the operating system and corresponding software components (applications, software modules, etc.).
  • the available memory of the graphics card or graphics processor can act as RAM (202).
  • The ROM (203) is one or more persistent storage devices, such as a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, i.e. electrically erasable programmable read-only memory, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.
  • I/O interfaces (204) are used to organize the operation of the components of the system (200) and of externally connected devices.
  • The choice of interfaces depends on the specific design of the computing device; they can be, but are not limited to: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, Type-C), TRS/audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.
  • The I/O devices (205) can be, for example, a keyboard, a display (monitor), a touch display, a touchpad, a joystick, a mouse, a light pen, a stylus, a touch panel, a trackball, speakers, a microphone, augmented reality devices, optical sensors, a tablet, light indicators, a projector, a camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module), etc.
  • The networking device (206) provides data transmission over an internal or external computer network, for example, an intranet, the Internet, a LAN, and the like.
  • One or more of the following can be used as the device (206), without limitation: an Ethernet card, GSM modem, GPRS modem, LTE modem, 5G modem, satellite communication module, NFC module, Bluetooth and/or BLE module, Wi-Fi module, etc.
  • Satellite navigation means can also be used as part of the system (200), for example, GPS, GLONASS, BeiDou, Galileo.
  • The data conversion module 10 receives at least one image of a document, in particular a scanned document, for example, a multipage PDF, JPEG, or TIFF file, or a file in any other known format that can be used to store a scanned electronic document set.
  • The document image may come from an image data source 50, in particular directly from a document scanning device such as a scanner, or it may be retrieved from a suitable image database in which the document image data is stored in advance.
  • The document whose image is supplied to the data conversion module 10 can be any document that consists of at least one page, may contain signer attributes, and is filled in according to a known template.
  • The document can be, for example, an agreement concluded between companies "A" and "B", between a company and an individual, or between individuals; or it can be a document signed by only one signatory, a company or an individual, for example, a power of attorney from a company or from an individual.
  • The data conversion module 10 performs character recognition on at least one image of a document page and converts the recognized characters into text information.
  • The data conversion module 10 can be configured to preprocess the resulting text information to reduce the variety of possible texts of recognized document images and thereby simplify the operation of the subsequent system modules.
  • The text information is tokenized.
  • The tokenization stage involves selecting basic text elements (tokens) delimited on both sides by separator characters: spaces or punctuation marks.
  • The elements here are words, numbers, dates, abbreviations, compound prepositions, etc. Tokenization yields discrete units of text that serve as the basis for further work at the stages of morphological and syntactic analysis. As a result of tokenization, each element is assigned an appropriate type: word, number, date, address, etc.
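The tokenization stage can be illustrated with a minimal sketch. The source does not say how tokens are recognized, so the regular expressions, the type names, and the `tokenize` function below are assumptions made for illustration; only three token types (date, number, word) are covered.

```python
import re

# Hypothetical token patterns (not from the source). Order matters:
# earlier alternatives win, so a date is not split into numbers.
TOKEN_PATTERNS = [
    ("date", r"\d{1,2}\.\d{1,2}\.\d{4}"),
    ("number", r"\d+(?:[.,]\d+)?"),
    ("word", r"[^\W\d_]+(?:-[^\W\d_]+)*"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(text):
    """Return (token, type) pairs; spaces and punctuation act as separators."""
    return [(match.group(), match.lastgroup) for match in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for token, token_type in tokenize("Agreement No. 15 dated 06.06.2019"):
        print(token_type, token)
```

Each token carries its assigned type, which later stages (for example, weighting only tokens of type `word`) could use.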
  • The data conversion module 10 then proceeds to the step of generating document page vectors by means of the vector generating module 11.
  • For each word obtained after text processing, the vector generating module 11 determines the word weight using the TF-IDF statistical measure.
  • TF-IDF is a statistical measure used to assess the importance of a word in the context of a document that is part of a document collection or corpus.
  • The weight of a word is proportional to the number of times the word is used in a document and inversely proportional to the frequency with which the word is used in other documents in the collection.
  • The TF-IDF measure is often used in text analysis and information retrieval tasks, for example, as one of the criteria for the relevance of a document to a search query, or when calculating the measure of document proximity during clustering.
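  • TF (term frequency) is the ratio of the number of occurrences of a word to the total number of words in the document: $\mathrm{tf}(t, d) = n_t / \sum_k n_k$, where $n_t$ is the number of occurrences of the word $t$ in document $d$, and $\sum_k n_k$ is the total number of words in the given user request and/or document.
  • IDF (inverse document frequency) is the inverse of the frequency with which a word occurs in the documents of the collection.
  • Taking IDF into account reduces the weight of commonly used words. There is only one IDF value for each unique word within a particular collection of documents.
  • In its standard form, the IDF characteristic is defined by the following relationship: $\mathrm{idf}(t, D) = \log\left(|D| / |\{d_i \in D : t \in d_i\}|\right)$, where $|D|$ is the number of documents in the collection and the denominator is the number of documents in which the word $t$ occurs.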
  • a high TF-IDF weight is given to words with high frequency within a particular document and with low frequency in other documents.
  • the TF-IDF measure is often used to represent collection documents as numerical vectors reflecting the importance of using each word from a set of words (the number of words in a set determines the dimension of the vector) in each document.
  • Such a model is called a vector model and makes it possible to compare texts by comparing the vectors that represent them in some metric (Euclidean distance, cosine measure, Manhattan distance, Chebyshev distance, etc.), that is, by performing cluster analysis.
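The TF-IDF weighting and the vector model described above can be sketched as follows. This is an illustration under assumptions: it uses the standard TF-IDF definition with a natural logarithm, treats each page as one document of the collection, and ignores the word-dependency structure that the source also mentions. The function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(pages):
    """pages: list of token lists. Returns (vocabulary, one vector per page)."""
    vocabulary = sorted({token for page in pages for token in page})
    n_pages = len(pages)
    # Document frequency: the number of pages in which each word occurs.
    df = Counter(token for page in pages for token in set(page))
    idf = {token: math.log(n_pages / df[token]) for token in vocabulary}
    vectors = []
    for page in pages:
        counts = Counter(page)
        total = len(page)
        vectors.append(
            [counts[t] / total * idf[t] if total else 0.0 for t in vocabulary]
        )
    return vocabulary, vectors


def cosine_similarity(a, b):
    """Cosine measure between two equally sized vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    pages = [["agreement", "party", "party"], ["passport", "party"]]
    vocabulary, vectors = tf_idf_vectors(pages)
    print(vocabulary)
    print(vectors)
    print(cosine_similarity(vectors[0], vectors[1]))
```

In this toy collection the word "party" occurs on every page, so its IDF and therefore its weight are zero, which is exactly the down-weighting of commonly used words described above.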
  • Based on the word values obtained after preprocessing, the structure of the dependencies between the words in the text information, and the weight values of the mentioned words, the vector generating module 11 forms the document page vector.
  • The generated document page vector is sent to the page classification module 20 to determine the document type and the document page type, i.e. to classify the document as belonging to predefined types of pages and documents.
  • The vector generating module 11 generates a document page vector in the same way for each page of the document, and these vectors are also sent to the page classification module 20.
  • To determine the type of the document and the types of its pages, the page classification module 20 contains a mathematical model whose input receives the data on the document page vectors.
  • The mathematical model can be implemented by means of the "random forest" machine learning algorithm, which consists in using a committee (ensemble) of decision trees.
  • The classification of objects is carried out by voting: each committee tree assigns the object to be classified, in this case a document page, to one of the classes characterizing the page type and document type, and the class for which the largest number of trees voted wins.
  • The optimal number of trees is selected so as to minimize the classifier error on the test sample.
  • At the output of the mathematical model, each decision tree produces indicators of the document type and the page type.
  • The page classification module 20 counts the indicators obtained at the output of the model and determines the document type and page type from the indicator that occurs most often, i.e. the one for which the largest number of trees voted.
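The voting step can be sketched in a few lines. The sketch assumes that each tree's output is already available as a (document type, page type) pair; the label strings and the `majority_vote` function are hypothetical, and the trees themselves are not modelled.

```python
from collections import Counter


def majority_vote(tree_votes):
    """tree_votes: one (document_type, page_type) label per decision tree.

    Returns the label chosen by the largest number of trees. On a tie,
    the label that was seen first wins (an arbitrary choice for this sketch).
    """
    if not tree_votes:
        raise ValueError("at least one vote is required")
    label, _count = Counter(tree_votes).most_common(1)[0]
    return label


if __name__ == "__main__":
    votes = [("contract", "signature_page")] * 3
    votes += [("power_of_attorney", "signature_page")] * 2
    print(majority_vote(votes))
```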
  • the document type will be determined as a "contract"
  • the data on the page type will indicate that the page in the image is a page of the contract, which must contain the attributes of signers in the form of signatures and seals of companies "A" and "B" located in the specified areas on the page (for example, in areas 105 and 106, see Fig. 2).
  • the document type can be defined, for example, as a power of attorney from company "A" or from an individual
  • the page type is a page of a power of attorney with the signer's attributes, for example, in the form of a signature and a seal in a specified area of the page, for example, area 106, if the power of attorney is from a company, or only in the form of a signature in the specified area of the page, if the power of attorney is from an individual.
  • the input of the mathematical model receives data on two or more vectors of document pages.
  • the page classification module 20 similarly analyzes the number of the mentioned pointers at the output of the mathematical model and determines the type of document and the list of its pages based on that document type indicator, the number of which is greater at the output of the mathematical model. In this case, the document type is determined based on the vectors of all its pages.
  • The page classification module 20 can determine the type of the document as an Agreement between companies "A" and "B" consisting of 4 pages, and the data on the types of pages may indicate that the first page is a page of the Agreement that does not contain the signer's attributes, the second page is a page of the Agreement with the signer's attributes in the specified areas, and pages 3 and 4 are appendices that do not contain the signer's attributes.
  • The data on the type of the document and the types of its pages is sent by the page classification module 20 to the document set verification module 40 and to the image filtering module 12. The image filtering module 12 determines which page types carry at least one signer attribute, extracts the corresponding page images from the document image, and sends them to the signer attribute verification module 30 for further analysis.
  • Because the module 30 receives not the entire document image but only the images of pages whose type implies the presence of at least one signer attribute, the computational load is reduced and the module 30 detects signer attribute images faster, which increases the speed of automated verification of documents for completeness.
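The filtering performed by the image filtering module 12 amounts to selecting page indices by page type. A minimal sketch, assuming page types are plain strings and that the set of types carrying signer attributes is known in advance (both the type names and the function are hypothetical):

```python
def pages_to_inspect(page_types, types_with_signer_attributes):
    """Return zero-based indices of pages whose type implies signer attributes."""
    return [
        index
        for index, page_type in enumerate(page_types)
        if page_type in types_with_signer_attributes
    ]


if __name__ == "__main__":
    # Four-page Agreement: only the second page carries signatures and seals.
    page_types = ["agreement_page", "agreement_signature_page", "appendix", "appendix"]
    print(pages_to_inspect(page_types, {"agreement_signature_page"}))
```

Only the selected pages would then be passed to the detector, which is where the reduction in computational load comes from.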
  • The signer attribute verification module 30 proceeds to the step of detecting at least one image of a signer's attribute on each received image of a document page in order to determine its location on the document page.
  • The signer attribute verification module 30 may determine that the signer's attribute image is a signature and/or seal image in document area 105 or 106 (see FIG. 2).
  • Area 101 of document 100 may contain information about the Agreement number, area 102 the name of the city, area 103 the date of the Agreement, and area 104 the text of the Agreement.
  • To detect images of the signer's attributes, the well-known algorithms of a neural network of the YOLOv3 architecture, trained on a selected dataset of signatures and seals, are used; the architecture is disclosed, for example, in an article published on the Internet at: https://pjreddie.com/media/files/papers/YOLOv3.pdf.
  • the data on the detected attributes of the signer, in particular information on their location on the document page, is transmitted to the module 40 for checking the set of documents.
  • In the course of its work, the document set verification module 40 checks for the presence of the list of pages that is mandatory for this type of document and of the signers' attributes, such as a seal and/or a signature and/or a set of signatures, in the specified areas of the pages.
  • The document set verification module 40 can be equipped with a corresponding database 41 with information about the templates of documents, their lists of pages, and the attributes of the signers whose presence must be checked in a given area of the pages from the list of pages of a given type of document.
  • Since page vectors are based on textual information, which may include the names of one or more companies or of one or more individuals, the information about the page type also determines in which area of the page image the signers' attributes should be located.
  • The document set verification module 40 checks areas 105 or 106 of the second page for the presence of the signers' attributes, and the location of the attributes of the first and second signers in these areas is determined by the type of document and the types of its pages.
  • Based on the data on the type of the document received from the page classification module 20, the document set verification module 40 searches the database 41 for a template of this type of document, against which the module 40 will check the set of documents, and extracts information about the types of pages of this document template. For example, if the page classification module 20 determines that the scanned document is an Agreement between companies "A" and "B", then, based on this information about the type of document, the document set verification module 40 finds in the database a template of the Agreement between companies "A" and "B" and retrieves information about the types of pages present in the template of the Agreement.
  • The signer's attributes should be located on the first page of the document.
  • The information that the signer's attributes should be on the first page, as well as their location on the page, can be contained in the page type information, according to which the document set verification module 40 will check for the presence of the signers' attributes on the first page of the contract.
  • information about the type of the last page of the document may contain information that the attributes of the signer should be in a given area (for example, in areas 105 or 106) on this page.
  • information about the type of the first page or the type of document can contain information that the signer's attributes should be in the specified areas on the second or other page in the document.
  • If the page types received from the module 20 do not coincide with the page types of the document template, the document set verification module 40 decides that the document set is incomplete.
  • For example, suppose the scanned document is an Agreement concluded with an individual, consisting of 3 pages, where the first 2 pages are the pages of the Agreement and the third page is a scan of the passport.
  • If the scanned Agreement does not contain a scan of the passport, or another document is attached instead of the passport scan and its image is processed by system 1, then the information on the type of the third page received from module 20 will not coincide with the information on the page types of the document template.
  • Information that the scanned set of documents is incomplete, for example, in the form of the message "missing passport scan", can be output to the I/O devices (205).
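The page-list check can be sketched as a position-by-position comparison of the detected page types with the template. The page type names, the message format, and the `page_list_problems` function are assumptions for illustration; the source does not specify how the template is stored in the database 41.

```python
def page_list_problems(template_page_types, detected_page_types):
    """Compare detected page types with the template, position by position.

    Returns a list of human-readable messages; an empty list means the
    page list matches the template.
    """
    problems = []
    for index, expected in enumerate(template_page_types):
        actual = (
            detected_page_types[index] if index < len(detected_page_types) else None
        )
        if actual != expected:
            problems.append(
                f"page {index + 1}: expected {expected!r}, got {actual!r}"
            )
    return problems


if __name__ == "__main__":
    template = ["agreement_page", "agreement_page", "passport_scan"]
    scanned = ["agreement_page", "agreement_page", "invoice"]
    print(page_list_problems(template, scanned))
```

A non-empty result corresponds to the "incomplete" decision; a missing third page is reported the same way as a page of the wrong type.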
  • the document set verification module 40 extracts from the database 41 information about the location of at least one signer's attribute on at least one page according to the document template for those types of pages that must contain at least one signer attribute.
  • the mentioned information about the location of at least one attribute of the signer can be obtained experimentally based on the data on the average coordinates of the location of signatures and seals in document templates.
  • the data classification module 20 extracts from the database 41 information about the location of signatures and / or seals (i.e.
  • The DB 41 can store both a type of document template in which the location information indicates that the attributes of the signer of company "A" should be in area 105 of the page of the Agreement 100 and the attributes of the signer of company "B" in area 106 of the Agreement 100, and a type of document template in which the attributes of the signer of company "B" should be in area 105 of the Agreement 100 and the attributes of the signer of company "A" in area 106 of the Agreement 100.
  • The information about the location of at least one signer's attribute extracted from the database 41 in the previous step is compared by the document set verification module 40 with the information about the location of at least one attribute on the document page received from the module 30. If the location information extracted from the database 41 does not match the location information received from the module 30, the document set verification module 40 decides that the document set is incomplete. Information that the scanned set of documents is incomplete, for example, in the form of the message "there is no client signature on page 3", can be displayed on the I/O devices (205).
  • If the location information matches, the document set verification module 40 decides that the document set meets the specified completeness requirements. Information that the scanned document set is complete can also be output to the I/O devices (205).
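The location comparison can be illustrated with a small geometric check. The source does not define what counts as a match, so this sketch assumes one possible criterion: an expected area is satisfied if the centre of at least one detected bounding box falls inside it. The `Box` type, the normalized coordinates, and the area names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates normalized to 0..1."""
    left: float
    top: float
    right: float
    bottom: float

    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

    def contains_point(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def missing_attributes(expected_areas, detections):
    """expected_areas: {area_name: Box} from the template;
    detections: list of Box returned by the detector.

    Returns the sorted names of expected areas with no detection centred inside.
    """
    missing = []
    for name, area in expected_areas.items():
        if not any(area.contains_point(*box.center()) for box in detections):
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    expected = {
        "area_105": Box(0.05, 0.80, 0.45, 0.95),
        "area_106": Box(0.55, 0.80, 0.95, 0.95),
    }
    detected = [Box(0.10, 0.82, 0.30, 0.90)]
    print(missing_attributes(expected, detected))
```

An empty result corresponds to a complete page; any returned area name could be turned into a message such as the one quoted above.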
  • the document set verification module 40 may be configured to classify at least one signer attribute based on the signer's attribute location information.
  • DB 41 additionally contains information about which side of the Agreement the signer's attribute belongs to, depending on its location on the page.
  • The database 41 may contain information that area 105 of the page of the Agreement 100 holds the attribute of the client's signer, and area 106 holds the attribute of the signer of the executor of the Agreement (the Contractor).
  • The document set verification module 40 classifies the signer's attribute image, for example, as an attribute of the client's signer if the attribute is located in page area 105, or as an attribute of the Contractor's signer if the attribute is located in page area 106.
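Classifying a signer attribute by its location reduces to a lookup of the area that contains it. A minimal sketch, assuming areas are stored as (left, top, right, bottom) tuples in normalized page coordinates and that a detection is represented by its centre point; the party names and the `classify_signer` function are hypothetical.

```python
def classify_signer(detection_center, party_areas):
    """party_areas: {party: (left, top, right, bottom)}.

    Returns the party whose area contains the point, or None if no area does.
    """
    x, y = detection_center
    for party, (left, top, right, bottom) in party_areas.items():
        if left <= x <= right and top <= y <= bottom:
            return party
    return None


if __name__ == "__main__":
    party_areas = {
        "client": (0.05, 0.80, 0.45, 0.95),      # stands in for area 105
        "contractor": (0.55, 0.80, 0.95, 0.95),  # stands in for area 106
    }
    print(classify_signer((0.20, 0.86), party_areas))
```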

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Input (AREA)

Abstract

The present invention relates generally to the field of image analysis, and in particular to methods and systems for verifying an electronic set of documents, such as scanned documents of a corporate client of a bank. The technical result is an increase in the accuracy of automated verification of documents for completeness. This technical result is achieved by performing a method for verifying an electronic set of documents that is implemented on at least one computing device and comprises the following steps: obtaining an image of a document comprising at least one page; recognizing the characters on the document page and converting them into text information; generating a document page vector based on the text information obtained in the previous step; determining, based on the document page vector, the type of the document and the type of its page; determining a list of pages and at least one signer attribute whose presence must be checked for the given type of document; checking for the presence of the list of pages and of at least one signer attribute in the obtained image of the document in order to determine the completeness of the document.
PCT/RU2019/000197 2019-03-28 2019-06-06 Method and system for verifying an electronic set of documents WO2020197428A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2019109055 2019-03-28
RU2019109055A RU2702967C1 (ru) 2019-03-28 2019-03-28 Способ и система для проверки электронного комплекта документов

Publications (1)

Publication Number Publication Date
WO2020197428A1 (fr) 2020-10-01

Family

ID=68280239

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2019/000197 WO2020197428A1 (fr) 2019-03-28 2019-06-06 Method and system for verifying an electronic set of documents

Country Status (3)

Country Link
EA (1) EA201990647A1 (fr)
RU (1) RU2702967C1 (fr)
WO (1) WO2020197428A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11361528B2 (en) * 2020-08-11 2022-06-14 Nationstar Mortgage LLC Systems and methods for stamp detection and classification
WO2024030042A1 (fr) * 2022-08-04 2024-02-08 Публичное Акционерное Общество "Сбербанк России" Procédé et système de traitement d'images de documents

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050066172A1 (en) * 2001-07-20 2005-03-24 Vorbruggen Dr Jan C Method and device for confirming the authenticity of a document and a safe for storing data
US20070220259A1 (en) * 2006-03-02 2007-09-20 Microsoft Corporation Verification of electronic signatures
US20110255788A1 (en) * 2010-01-15 2011-10-20 Copanion, Inc. Systems and methods for automatically extracting data from electronic documents using external data
US20140281946A1 (en) * 2013-03-14 2014-09-18 Yossi Avni System and method of encoding content and an image
RU2014118012A (ru) * 2014-05-05 2015-11-10 Галина Эдуардовна Добрякова Система и способ дистанционного заключения и регистрации электронных сделок
US20170212875A1 (en) * 2016-01-27 2017-07-27 Microsoft Technology Licensing, Llc Predictive filtering of content of documents

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2229744C2 (ru) * 2002-02-28 2004-05-27 ЗАО "НИИИН МНПО "Спектр" Способ и устройство компьютеризированной оптической обработки документов
RU56682U1 (ru) * 2006-06-08 2006-09-10 Александр Алексеевич Бойко Информационно-аналитическая торгово-операционная система электронных торгов
JP5207688B2 (ja) * 2007-08-30 2013-06-12 キヤノン株式会社 画像処理装置および統合ドキュメント生成方法
JP5448766B2 (ja) * 2009-12-08 2014-03-19 キヤノン株式会社 画像処理装置、画像処理装置の制御方法、プログラム

Also Published As

Publication number Publication date
RU2702967C1 (ru) 2019-10-14
EA201990647A1 (ru) 2020-09-30

Similar Documents

Publication Publication Date Title
US10482174B1 (en) Systems and methods for identifying form fields
US11816138B2 (en) Systems and methods for parsing log files using classification and a plurality of neural networks
US20220004878A1 (en) Systems and methods for synthetic document and data generation
WO2007080642A1 (fr) Programme de traitement et dispositif a programme de fiche
US11507901B1 (en) Apparatus and methods for matching video records with postings using audiovisual data processing
RU2702967C1 (ru) Способ и система для проверки электронного комплекта документов
EP4141818A1 (fr) Numérisation, transformation et validation de documents
CN113221570A (zh) 基于线上问诊信息的处理方法、装置、设备及存储介质
JP2019212115A (ja) 検査装置、検査方法、プログラム及び学習装置
US20230138491A1 (en) Continuous learning for document processing and analysis
KR102282025B1 (ko) 컴퓨터를 이용한 문서 분류 및 문자 추출 방법
CN112464927B (zh) 一种信息提取方法、装置及系统
KR102280490B1 (ko) 상담 의도 분류용 인공지능 모델을 위한 훈련 데이터를 자동으로 생성하는 훈련 데이터 구축 방법
RU2732071C1 (ru) Способ и система автоматического принятия правового решения
RU2739342C1 (ru) Способ и система интеллектуальной обработки документа
WO2021054850A1 (fr) Procédé et système de traitement intelligent de document
EA043496B1 (ru) Способ и система для проверки электронного комплекта документов
CN114064893A (zh) 一种异常数据审核方法、装置、设备及存储介质
CN111341404B (zh) 一种基于ernie模型的电子病历数据组解析方法及系统
CN114443834A (zh) 一种证照信息提取的方法、装置及存储介质
EP3640861A1 (fr) Systèmes et procédés d'analyse de fichiers de journal utilisant la classification et une pluralité de réseaux neuronaux
EA040560B1 (ru) Способ и система интеллектуальной обработки документа
WO2021075998A1 (fr) Système de classification de données pour révéler des informations confidentielles dans un texte
CN111291726A (zh) 医疗票据分拣方法、装置、设备和介质
US12014561B2 (en) Image reading systems, methods and storage medium for performing geometric extraction

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19920686

Country of ref document: EP

Kind code of ref document: A1