CN116758561A - Document image classification method and device based on multi-mode structured information fusion - Google Patents

Document image classification method and device based on multi-mode structured information fusion Download PDF

Info

Publication number
CN116758561A
Authority
CN
China
Prior art keywords
text
key
document image
key information
image classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311033101.4A
Other languages
Chinese (zh)
Inventor
申意萍
陈友斌
张志坚
徐一波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Micropattern Technology Development Co ltd
Original Assignee
Hubei Micropattern Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Micropattern Technology Development Co ltd filed Critical Hubei Micropattern Technology Development Co ltd
Priority to CN202311033101.4A
Publication of CN116758561A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document

Abstract

The invention discloses a document image classification method and device based on multi-mode structured information fusion, relating to the technical field of document image classification. Layout analysis is first performed on a document image to locate its key regions, such as titles, body text, figures, figure titles, tables, table titles and seals; text key information is then extracted from each region according to its type, word segmentation and word-vector extraction are applied to that key information, and finally all word vectors are fused for classification. The invention enables fast and accurate classification and filing of large volumes of electronic material, and effectively avoids classification difficulty and misclassification caused by varying capture environments. It also copes with classification problems caused by deformation of paper material, text partially occluded by seals or other objects, incomplete documents, documents with or without titles, and complex, variable material content.

Description

Document image classification method and device based on multi-mode structured information fusion
Technical Field
The invention relates to the technical field of document image classification, in particular to a document image classification method and device based on multi-mode structured information fusion.
Background
As digital transformation progresses across industries, the number of electronic document images keeps growing. In the financial field (banks, insurance, securities, tax, etc.), a wide variety of paper materials must be digitized for long-term preservation, forming huge electronic document image collections. In recent years, remote financial activities such as remote account opening and online reimbursement have become increasingly common, driven in part by the pandemic; in these activities the paper material is digitized, typically with the user's mobile phone or tablet. The resulting mass of electronic material must be sorted, archived and recognized. Electronic documents carry a large amount of industry-related image and text information, and processing it manually is time-consuming and costly, so automatic classification of electronic document images is highly desirable. However, classifying these document images faces the following difficulties:
(1) Capture environments (illumination, angle, background) and capture devices (e.g. resolution) differ widely, so the resulting document images vary greatly and are hard to unify and normalize;
(2) Paper material is non-rigid and deforms easily, distorting the text and reducing text-recognition accuracy;
(3) Text may be partially occluded by a seal or other objects; for example, the titles of many receipts are covered by seals;
(4) Documents may be incomplete;
(5) Documents are of many kinds; some have titles while others do not;
(6) Document content is complex and variable, layouts of documents of the same kind are not uniform, and intra-class differences are large. Taking medical documents as an example, examination reports with different names have no fixed form: some contain only images taken by medical cameras, some only tables, some only textual descriptions, and some contain at least two of these three. Case records exist in both printed and handwritten form; printed case records generally have no uniform format, while handwritten case records are usually written on record books pre-printed with keywords, and the handwritten content is very difficult to recognize;
(7) Even for the same type of document, documents produced by different institutions differ.
Disclosure of Invention
To solve the above technical problems, the invention provides a document image classification method and device based on multi-mode structured information fusion, adopting the following technical scheme:
the document image classification method based on multi-mode structured information fusion comprises the following steps:
step 1, performing layout analysis on a document image to be classified, and locating a key area;
step 2, extracting text key information from the key areas according to types;
step 3, word segmentation and word vector extraction are respectively carried out on the text key information;
and 4, classifying the documents based on the word vectors and the types to which the word vectors belong.
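For orientation, the four steps above can be organized as a single pipeline. The sketch below is illustrative only (Python); the concrete models for layout analysis, key-text extraction, word vectors and classification are left as pluggable components, and all function names are assumptions rather than the patent's reference implementation.

    # Illustrative pipeline skeleton (assumed names, not the patent's reference code).
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Region:
        kind: str           # "title" | "text" | "figure" | "figure_title" | "table" | "table_title" | "seal"
        image_crop: object  # cropped sub-image of the region

    def analyze_layout(document_image) -> List[Region]:
        """Step 1: locate key regions with a layout-analysis model."""
        raise NotImplementedError

    def extract_key_text(region: Region) -> List[str]:
        """Step 2: per-region key text (OCR, key-value keys, caption, or generated description)."""
        raise NotImplementedError

    def to_word_vectors(texts: List[str]) -> list:
        """Step 3: word segmentation + word vectors (e.g. jieba + word2vec)."""
        raise NotImplementedError

    def classify(word_vectors, region_kinds) -> str:
        """Step 4: fuse word vectors with their source types and classify (e.g. TextCNN)."""
        raise NotImplementedError

    def classify_document(document_image) -> str:
        vectors, kinds = [], []
        for region in analyze_layout(document_image):    # step 1
            texts = extract_key_text(region)             # step 2
            vecs = to_word_vectors(texts)                # step 3
            vectors.extend(vecs)
            kinds.extend([region.kind] * len(vecs))
        return classify(vectors, kinds)                  # step 4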
Optionally, in step 1, the key regions are titles, body text, figures, figure titles, tables, table titles and seals.
With this scheme, layout analysis is first performed on the document image to locate its key regions, such as titles, body text, figures, figure titles, tables, table titles and seals; text key information is then extracted from each region according to its type, word segmentation and word-vector extraction are applied to that information, and finally all word vectors are fused for classification.
The method enables fast and accurate classification, filing and recognition of large volumes of electronic material, and effectively avoids classification difficulty and misclassification caused by varying capture environments. It also copes with text-recognition problems caused by deformation of paper material, with text partially occluded by seals or other objects, with incomplete documents, with documents that may or may not have titles, and with complex material content.
Optionally, in step 1, the key regions are located by a layout analysis algorithm.
With this scheme, the key regions can be located quickly, for example using the LayoutLM algorithm.
Optionally, in step 2, the text key information is extracted as follows:
for a title region, text detection and text recognition are performed, and the resulting text content is used as key information;
for a body-text region, text detection, text recognition, semantic entity recognition and relation extraction are performed to obtain a number of key-value pairs, and the keys are used as key information;
for a figure, if a corresponding figure title exists, text detection and text recognition are performed on the figure title and the resulting text content is used as key information; if no figure title exists, a text description is generated from the figure and used as key information;
for a table, if a table title exists, text detection and text recognition are performed on the table title and the resulting text content is used as key information; if no title exists, the table is processed as a body-text region to obtain key information;
for a seal, text recognition is used to extract the text content inside the seal as key information.
Optionally, in extracting the text key information of a title region, text detection is performed with DBNet or FCENet and text recognition with CRNN+CTC;
in extracting the text key information of a body-text region, semantic entity recognition classifies each detected text with a LayoutXLM model into types including key, value and title;
relation extraction pairs keys with values, also based on a LayoutXLM model;
in extracting the text key information of a figure, ViT or CNN+LSTM is used to generate the text description.
With this scheme, a body-text region undergoes text detection, text recognition, semantic entity recognition and relation extraction, yielding a series of key-value pairs whose keys are taken as key information. Semantic entity recognition classifies each detected text into types such as key, value and title, and can use a LayoutXLM model; relation extraction pairs keys with values and can likewise be trained with LayoutXLM. For a figure, if a corresponding figure title exists, text detection and recognition on that title yield text content used as key information; if not, a text description is generated from the figure itself, for example with ViT or CNN+LSTM, and used as key information. In this way a figure is converted into text content, achieving fusion of structured information from different modalities. For a table, if a table title exists, text detection and recognition on the title yield the key information; if not, the table is processed as a body-text region to obtain key information.
For a seal, text recognition extracts the text content inside the seal as key information. Seal information strongly influences document classification: for example, a valid invoice bears two seals and a medical expense list bears one, while among medical document images a case record may or may not carry a seal, and examination reports and card-type documents such as second-generation ID cards and bank cards carry none.
Optionally, in step 3, word vectors are extracted with word2vec or GloVe.
With this scheme, word2vec or GloVe extracts word vectors quickly and accurately.
Optionally, in step 4, the documents are classified with the TextCNN algorithm based on the word vectors and the types to which they belong, where the type of a word vector indicates whether it originates from a title, body text, a figure, a table or a seal.
With this scheme, the documents are classified from the word vectors and their types, for example with the TextCNN algorithm. The word vectors obtained from the five types of text key information (title, body text, figure, table, seal) can be combined by feature fusion, decision fusion or hybrid fusion for classification.
The document image classification device based on multi-mode structured information fusion comprises a buffer, a processor and a memory, wherein images to be classified are stored in the buffer, a document image classification program is preloaded in the memory, and the processor runs the document image classification program in the memory to complete classification of the images to be classified.
Optionally, the system further comprises a shooting device, wherein the shooting device is in communication connection with the buffer and is used for shooting images of the documents to be classified and storing the images in the buffer.
Optionally, the system further comprises a display, wherein the display is in communication connection with the processor and displays the classification result of the image to be classified under the control of the processor.
In summary, the invention provides at least the following beneficial technical effects:
The invention provides a document image classification method and device based on multi-mode structured information fusion. Layout analysis is first performed on the document image to locate its key regions; text key information is then extracted from each region according to its type, word segmentation and word-vector extraction are applied, and finally all word vectors are fused for classification. Large volumes of electronic material can thus be classified and filed quickly and accurately, classification difficulty and misclassification caused by varying capture environments are effectively avoided, and the method copes with classification problems caused by paper deformation, text partially occluded by seals or other objects, incomplete documents, documents with or without titles, and complex material content.
Drawings
FIG. 1 is a flow diagram of a document image classification method based on multimodal structured information fusion of the present invention;
FIG. 2 is a schematic diagram of the connection principle of the document image classification device based on multi-modal structured information fusion;
fig. 3 is a schematic representation of an embodiment of the present invention.
Reference numerals illustrate: 1. a buffer; 2. a processor; 3. a memory; 4. a photographing device; 5. a display.
Description of the embodiments
The present invention will be described in further detail with reference to the accompanying drawings.
The embodiment of the invention discloses a document image classification method and device based on multi-mode structured information fusion.
Referring to fig. 1 to 3, the document image classification method based on multi-modal structured information fusion includes the steps of:
step 1, performing layout analysis on a document image to be classified, and locating a key area;
step 2, extracting text key information from the key areas according to types;
step 3, word segmentation and word vector extraction are respectively carried out on the text key information;
and 4, classifying the documents based on the word vectors and the types to which the word vectors belong.
In step 1, the key regions are titles, body text, figures, figure titles, tables, table titles and seals.
Layout analysis is first performed on the document image to locate these key regions; text key information is then extracted from each region according to its type, word segmentation and word-vector extraction are applied to that information, and finally all word vectors are fused for classification.
The method enables fast and accurate classification, filing and recognition of large volumes of electronic material, and effectively avoids classification difficulty and misclassification caused by varying capture environments. It also copes with text-recognition problems caused by paper deformation, with text partially occluded by seals or other objects, with incomplete documents, with documents that may or may not have titles, and with complex material content.
In step 1, the key regions are located by a layout analysis algorithm.
In particular, the LayoutLM algorithm can locate the key regions quickly.
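The patent names the LayoutLM family for this step. As one concrete, openly available stand-in (an assumption about toolchain, not the patent's implementation), the layoutparser toolkit with a PubLayNet detection model can locate title, text, figure and table regions; the seal class is not covered by PubLayNet and would need a separately trained detector. This requires detectron2 to be installed.

    # Hedged sketch: layout analysis with layoutparser + a PubLayNet Detectron2 model.
    # Stand-in for the LayoutLM-based analysis named in the text; seals need a custom detector.
    import layoutparser as lp
    import cv2

    model = lp.Detectron2LayoutModel(
        "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    )

    image = cv2.imread("document.jpg")[..., ::-1]   # BGR -> RGB
    layout = model.detect(image)
    for block in layout:
        x1, y1, x2, y2 = map(int, block.coordinates)
        print(block.type, round(block.score, 2), (x1, y1, x2, y2))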
In step 2, the text key information is extracted as follows:
for a title region, text detection and text recognition are performed, and the resulting text content is used as key information;
for a body-text region, text detection, text recognition, semantic entity recognition and relation extraction are performed to obtain a number of key-value pairs, and the keys are used as key information;
for a figure, if a corresponding figure title exists, text detection and text recognition are performed on the figure title and the resulting text content is used as key information; if no figure title exists, a text description is generated from the figure and used as key information;
for a table, if a table title exists, text detection and text recognition are performed on the table title and the resulting text content is used as key information; if no title exists, the table is processed as a body-text region to obtain key information;
for a seal, text recognition is used to extract the text content inside the seal as key information.
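The detection-plus-recognition step shared by the title, table-title and seal regions above can be sketched with an off-the-shelf OCR toolkit. PaddleOCR is used here purely as an illustration (its default detector is DBNet-based and its recognizer uses CTC decoding), not as the patent's mandated implementation; the result structure may vary slightly across PaddleOCR versions.

    # Hedged sketch: text detection + recognition on a cropped key region with PaddleOCR.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=True, lang="ch")    # DBNet-style detection, CTC-decoded recognition
    result = ocr.ocr("title_region.jpg", cls=True)

    key_text = []
    for line in result[0]:                            # each line: [box, (text, confidence)]
        box, (text, score) = line
        key_text.append(text)
    print(" ".join(key_text))                         # e.g. the title text used as key information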
In extracting the text key information of a title region, text detection is performed with DBNet or FCENet and text recognition with CRNN+CTC;
in extracting the text key information of a body-text region, semantic entity recognition classifies each detected text with a LayoutXLM model into types including key, value and title;
relation extraction pairs keys with values, also based on a LayoutXLM model;
in extracting the text key information of a figure, ViT or CNN+LSTM is used to generate the text description.
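For figures without a title, the text above mentions generating a description with ViT- or CNN+LSTM-style captioning. A minimal CNN-encoder / LSTM-decoder skeleton in PyTorch is sketched below as an illustration of the architecture only; the vocabulary, training loop and pretrained weights are assumptions and are omitted.

    # Hedged sketch: a tiny CNN+LSTM image-captioning skeleton (illustrative architecture only).
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class CaptionNet(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
            super().__init__()
            backbone = models.resnet18(weights=None)
            self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # global image feature
            self.fc_feat = nn.Linear(512, embed_dim)
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, images, captions):
            feats = self.encoder(images).flatten(1)          # (B, 512)
            feats = self.fc_feat(feats).unsqueeze(1)         # (B, 1, E), used as the first "token"
            tokens = self.embed(captions)                    # (B, T, E)
            seq = torch.cat([feats, tokens], dim=1)
            hidden, _ = self.lstm(seq)
            return self.out(hidden)                          # logits over the vocabulary

    model = CaptionNet(vocab_size=5000)
    logits = model(torch.randn(2, 3, 224, 224), torch.zeros(2, 10, dtype=torch.long))
    print(logits.shape)                                      # torch.Size([2, 11, 5000])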
A body-text region undergoes text detection, text recognition, semantic entity recognition and relation extraction, yielding a series of key-value pairs whose keys are taken as key information. Semantic entity recognition classifies each detected text into types such as key, value and title, and can use a LayoutXLM model; relation extraction pairs keys with values and can likewise be trained with LayoutXLM. For a figure, if a corresponding figure title exists, text detection and recognition on that title yield text content used as key information; if not, a text description is generated from the figure itself, for example with ViT or CNN+LSTM, and used as key information. In this way a figure is converted into text content, achieving fusion of structured information from different modalities. For a table, if a table title exists, text detection and recognition on the title yield the key information; if not, the table is processed as a body-text region to obtain key information.
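The semantic entity recognition step (labelling each detected text as key, value or title) can be sketched by loading the LayoutXLM weights into a LayoutLMv2 token-classification head from the transformers library. This is an assumed toolchain, it requires the detectron2-based visual backbone that LayoutLMv2 models depend on, fine-tuning on labelled form data is needed before the labels are meaningful, and the key-value pairing (relation extraction) head is omitted here.

    # Hedged sketch: semantic entity recognition with LayoutXLM weights (assumed toolchain).
    from transformers import (LayoutLMv2FeatureExtractor, LayoutXLMTokenizer,
                              LayoutXLMProcessor, LayoutLMv2ForTokenClassification)
    from PIL import Image
    import torch

    labels = ["OTHER", "KEY", "VALUE", "TITLE"]
    feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)   # OCR is done upstream
    tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
    processor = LayoutXLMProcessor(feature_extractor, tokenizer)
    model = LayoutLMv2ForTokenClassification.from_pretrained(
        "microsoft/layoutxlm-base", num_labels=len(labels))

    image = Image.open("text_region.jpg").convert("RGB")
    words = ["姓名", "张三", "性别", "男"]                             # OCR words (illustrative)
    boxes = [[60, 40, 160, 80], [170, 40, 270, 80],
             [60, 100, 160, 140], [170, 100, 270, 140]]               # boxes on a 0-1000 scale
    encoding = processor(image, words, boxes=boxes, return_tensors="pt")

    with torch.no_grad():
        logits = model(**encoding).logits                             # (1, sequence_length, num_labels)
    predictions = logits.argmax(-1).squeeze(0).tolist()
    # In practice, map token predictions back to words and then pair KEY entities with
    # VALUE entities (relation extraction); the classification head is random until fine-tuned.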
For a seal, text recognition extracts the text content inside the seal as key information. Seal information strongly influences document classification: for example, a valid invoice bears two seals and a medical expense list bears one, while among medical document images a case record may or may not carry a seal, and examination reports and card-type documents such as second-generation ID cards and bank cards carry none.
In step 3, word vectors are extracted with word2vec or GloVe.
word2vec or GloVe extracts word vectors quickly and accurately.
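A minimal sketch of step 3 follows, assuming jieba for Chinese word segmentation and gensim's word2vec implementation; the sample texts, vector size and other hyper-parameters are placeholders, and in practice the embedding model would be trained on a large domain corpus or loaded from pretrained vectors.

    # Hedged sketch: word segmentation + word-vector extraction with jieba and gensim word2vec.
    import jieba
    from gensim.models import Word2Vec

    key_texts = ["某市人民医院超声检查报告单", "姓名", "检查部位"]       # key information from step 2 (illustrative)
    tokenized = [jieba.lcut(t) for t in key_texts]                     # word segmentation

    # Placeholder training corpus; a pretrained domain model would normally be used instead.
    model = Word2Vec(sentences=tokenized, vector_size=100, window=5, min_count=1, epochs=10)

    word_vectors = [model.wv[w] for sent in tokenized for w in sent]   # one vector per word
    print(len(word_vectors), word_vectors[0].shape)                    # N vectors of dimension 100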
In step 4, the documents are classified with the TextCNN algorithm based on the word vectors and the types to which they belong, where the type of a word vector indicates whether it originates from a title, body text, a figure, a table or a seal.
The documents are classified from the word vectors and their types, for example with the TextCNN algorithm. The word vectors obtained from the five types of text key information (title, body text, figure, table, seal) can be combined by feature fusion, decision fusion or hybrid fusion for classification.
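A minimal sketch of the step-4 classifier is given below, assuming feature-level fusion: each word vector is concatenated with a learned embedding of its source type before the TextCNN convolutions. The class count, dimensions and training loop are placeholders; decision- or hybrid-fusion variants would instead combine per-type classifiers or their scores.

    # Hedged sketch: TextCNN over word vectors fused with source-type embeddings (feature fusion).
    import torch
    import torch.nn as nn

    TYPES = ["title", "text", "figure", "table", "seal"]

    class FusionTextCNN(nn.Module):
        def __init__(self, word_dim=100, type_dim=16, num_classes=10, kernel_sizes=(2, 3, 4), channels=64):
            super().__init__()
            self.type_embed = nn.Embedding(len(TYPES), type_dim)
            in_dim = word_dim + type_dim
            self.convs = nn.ModuleList([nn.Conv1d(in_dim, channels, k) for k in kernel_sizes])
            self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

        def forward(self, word_vecs, type_ids):
            # word_vecs: (B, T, word_dim); type_ids: (B, T) indices into TYPES
            x = torch.cat([word_vecs, self.type_embed(type_ids)], dim=-1)   # feature fusion
            x = x.transpose(1, 2)                                           # (B, C_in, T)
            pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
            return self.fc(torch.cat(pooled, dim=1))                        # document-class logits

    model = FusionTextCNN()
    logits = model(torch.randn(4, 32, 100), torch.randint(0, len(TYPES), (4, 32)))
    print(logits.shape)                                                     # torch.Size([4, 10])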
The document image classification device based on multi-mode structured information fusion comprises a buffer 1, a processor 2 and a memory 3, wherein images to be classified are stored in the buffer 1, a document image classification program is preloaded in the memory 3, and the processor 2 runs the document image classification program in the memory 3 to finish classification of the images to be classified.
The system further comprises a shooting device 4, wherein the shooting device 4 is in communication connection with the buffer 1 and is used for shooting images of the documents to be classified and storing the images in the buffer 1.
The system also comprises a display 5, wherein the display 5 is in communication connection with the processor 2 and displays the classification result of the images to be classified under the control of the processor 2.
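An illustrative wiring of the apparatus (buffer, processor and memory running the classification program, optional photographing device and display) is sketched below with OpenCV; classify_document stands in for the document image classification program and is an assumption, not the patent's code.

    # Hedged sketch: apparatus wiring, photographing device -> buffer -> program -> display.
    import cv2

    def classify_document(image):
        # Placeholder for the document image classification program held in memory.
        return "examination report"

    buffer = []                                     # the buffer holding images to be classified

    camera = cv2.VideoCapture(0)                    # photographing device (optional)
    ok, frame = camera.read()
    if ok:
        buffer.append(frame)
    camera.release()

    for image in buffer:                            # processor runs the program on buffered images
        label = classify_document(image)
        cv2.putText(image, label, (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)
        cv2.imshow("classification result", image)  # display shows the classification result
        cv2.waitKey(0)
    cv2.destroyAllWindows()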
The implementation principle of the document image classification method and device based on multi-mode structured information fusion in the embodiment of the invention is as follows:
A batch of case documents is to be digitally filed. Photographs of the documents are taken with the photographing device 4 and stored in the buffer 1, and the processor 2 runs the document image classification program on each photograph to obtain its classification result. Referring to fig. 3, layout analysis yields a title region, a body-text region and an image region. For the title region, the title text, an ultrasound examination report form of a municipal people's hospital, is extracted as key information. For the image region no figure title is detected, so a description of the region, "ultrasound image", is generated from the image and used as key information. The body-text region undergoes text detection, text recognition, semantic entity recognition and relation extraction, finally yielding a series of key-value pairs with 11 keys: name, gender, age, examination number, report date, referring department, hospitalization number, bed number, examination equipment, examination site and description. Word segmentation and word-vector extraction are applied to these three types of key information; finally the word vectors and their types (title, image and body text) are fed to TextCNN for classification, and the document is classified as an examination report.
The above embodiments are not intended to limit the scope of the present invention; all equivalent changes in structure, shape and principle of the invention shall fall within its scope of protection.

Claims (8)

1. The document image classification method based on multi-mode structured information fusion is characterized by comprising the following steps of:
step 1, performing layout analysis on a document image to be classified, and locating a key area;
step 2, extracting text key information from the key areas according to types;
step 3, word segmentation and word vector extraction are respectively carried out on the text key information;
step 4, classifying the documents based on the word vectors and the types to which the word vectors belong;
in the step 2, the specific method for respectively extracting the text key information comprises the following steps:
for the title area, text detection and text recognition are carried out to obtain text content which is used as key information;
for a text region, text detection, text recognition, semantic entity recognition and relation extraction are carried out to obtain a plurality of key-value pairs, and the keys are taken as key information;
for a figure, if a corresponding figure title exists, text detection and text recognition are carried out on the figure title to obtain text content which is used as key information; if no figure title exists, a text description is generated from the figure and used as key information;
for a table, if a table title exists, text detection and text recognition are carried out on the table title to obtain text content serving as key information; if no title exists, the table is processed as a text region to acquire key information;
for the seal, text recognition is adopted to extract text content in the seal as key information;
in the text key information extraction of the title area, text detection is carried out based on DBNet or FCENet, and text recognition is carried out based on CRNN+CTC;
in the text key information extraction of the text region, semantic entity identification is to classify each detected text based on a LayoutXLM model, and the types comprise keys, values and titles;
the relation extraction is to pair the key and the value based on a LayoutXLM model;
in the text key information extraction of the figure, ViT or CNN+LSTM is adopted to generate the text description.
2. The document image classification method based on multi-modal structured information fusion according to claim 1, wherein: in step 1, the key regions are titles, body text, figures, figure titles, tables, table titles and seals.
3. The document image classification method based on multi-modal structured information fusion according to claim 2, wherein:
in step 1, a key area is located based on a layout analysis algorithm.
4. The document image classification method based on multi-modal structured information fusion according to claim 3, wherein: in step 3, word vectors are extracted based on word2vec or GloVe.
5. The document image classification method based on multi-modal structured information fusion according to claim 4, wherein: in step 4, the documents are classified with the TextCNN algorithm based on the word vectors and the types to which they belong, the type of a word vector indicating whether it originates from a title, body text, a figure, a table or a seal.
6. Document image classification device based on multi-mode structured information fusion, characterized in that: it comprises a buffer (1), a processor (2) and a memory (3), wherein the buffer (1) stores images to be classified, the memory (3) is preloaded with a document image classification program implementing the method of claim 5, and the processor (2) runs the document image classification program in the memory (3) to complete classification of the images to be classified.
7. The document image classification apparatus based on multi-modal structured information fusion as set forth in claim 6, wherein: the system further comprises a shooting device (4), wherein the shooting device (4) is in communication connection with the buffer (1) and is used for shooting images of the documents to be classified and storing the images in the buffer (1).
8. The document image classification apparatus based on multi-modal structured information fusion as set forth in claim 7, wherein: the system also comprises a display (5), wherein the display (5) is in communication connection with the processor (2) and displays the classification result of the images to be classified under the control of the processor (2).
CN202311033101.4A 2023-08-16 2023-08-16 Document image classification method and device based on multi-mode structured information fusion Pending CN116758561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311033101.4A CN116758561A (en) 2023-08-16 2023-08-16 Document image classification method and device based on multi-mode structured information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311033101.4A CN116758561A (en) 2023-08-16 2023-08-16 Document image classification method and device based on multi-mode structured information fusion

Publications (1)

Publication Number Publication Date
CN116758561A (en) 2023-09-15

Family

ID=87951774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311033101.4A Pending CN116758561A (en) 2023-08-16 2023-08-16 Document image classification method and device based on multi-mode structured information fusion

Country Status (1)

Country Link
CN (1) CN116758561A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733658A (en) * 2020-12-31 2021-04-30 北京华宇信息技术有限公司 Electronic document filing method and device
CN113849657A (en) * 2021-08-11 2021-12-28 杭州云嘉健康管理有限公司 Structured data processing method of intelligent supervision black box
CN114241501A (en) * 2021-12-20 2022-03-25 北京中科睿见科技有限公司 Image document processing method and device and electronic equipment
CN114299528A (en) * 2021-12-27 2022-04-08 万达信息股份有限公司 Information extraction and structuring method for scanned document
CN115880704A (en) * 2023-02-16 2023-03-31 中国人民解放军总医院第一医学中心 Automatic case cataloging method, system, equipment and storage medium
CN116543404A (en) * 2023-05-09 2023-08-04 重庆师范大学 Table semantic information extraction method, system, equipment and medium based on cell coordinate optimization
CN116434028A (en) * 2023-06-15 2023-07-14 上海蜜度信息技术有限公司 Image processing method, system, model training method, medium and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QQ_16952303: "OCR基于图像数据的信息抽取任务" [OCR-based information extraction from image data], retrieved from the Internet <URL:https://blog.csdn.net/qq_16952303/article/details/127083237> *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination