CN115223183A - Information extraction method and device and electronic equipment

Information extraction method and device and electronic equipment

Info

Publication number
CN115223183A
CN115223183A
Authority
CN
China
Prior art keywords
anchor frame
data
text
form image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211046852.5A
Other languages
Chinese (zh)
Inventor
孙强
常鹏
周辉
冯兴祥
黎利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202211046852.5A
Publication of CN115223183A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an information extraction method, an information extraction apparatus and an electronic device. The method comprises the following steps: acquiring a form image to be recognized, and extracting the text data in the form image; inputting the form image into a trained anchor frame recognition model to obtain the anchor frame data of the form image; matching a pre-generated offline form template library according to the anchor frame data; and fusing the anchor frame data and the text data based on the matching result to obtain structured form information. The embodiment of the invention obtains the anchor frame coordinate data with an image-based target detection model, obtains the text corresponding to each anchor frame with a text-based template matching algorithm, and uses the offline form template library for matching, thereby improving the accuracy of the final information extraction and reducing the rate of false and missed detections of cross-line or cross-page text.

Description

Information extraction method and device and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an information extraction method and apparatus, and an electronic device.
Background
Image recognition technology is widely applied in banking, where a large number of documents must be authenticated, recognized and checked during operations. In the traditional document approval process, documents are first classified and recognized manually and then entered by hand, which is inefficient, time-consuming and error-prone. With the development of image recognition technology, existing OCR form recognition can already handle common bank forms and certificates, such as identity cards, social security cards, credit card application forms and cheques: customers can upload forms for direct recognition by scanning or photographing on applets, web pages and H5 pages, and administrators can automatically extract the key information of the verified forms with image recognition products.
A bank's international settlement business involves extracting and auditing a large number of letter-of-credit forms. Because letters of credit come in many varieties with differing layouts and small fonts, the existing image recognition technology recognizes them with low accuracy: the extracted text contains errors and omissions, and manual re-processing is needed to correct the extracted content. A large amount of human effort is therefore spent on simple, repetitive, low-value work, and it is difficult to improve the efficiency of international settlement document auditing. The extracted forms need to be structured and the accuracy of extraction increased in order to accelerate the document auditing process.
In the prior art, the problem of structuring a form image is divided into two independent tasks, text reading and information extraction: after the text in the image is detected and recognized, the key elements in the text are analyzed and extracted. However, this approach focuses mainly on improving the information extraction task and ignores the relevance between text reading and information extraction. Although it can solve the form image structuring problem, running the two tasks independently causes the accuracy of information extraction to drop when text crosses lines or pages, which affects the finally generated text.
Accordingly, there is a need for improvements and developments in the art.
Disclosure of Invention
In view of the defects of the prior art, the invention provides an information extraction method, an information extraction apparatus and an electronic device, aiming to solve the problem that, in the prior art, the accuracy of information extraction is low when text crosses lines or pages.
The technical scheme of the invention is as follows:
a first embodiment of the present invention provides an information extraction method, including:
acquiring a form image to be identified, and extracting text data in the form image;
inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
matching a pre-generated offline form template library according to the anchor frame data;
and fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
Further, the acquiring a form image to be recognized and extracting text data in the form image includes:
acquiring a form image to be identified, and performing text identification on the form image based on a text identification algorithm;
and obtaining text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
Further, the inputting the form image to be recognized into the trained anchor frame recognition model to obtain the anchor frame data of the form image includes:
obtaining an anchor frame marking result of the form image sample, and obtaining an anchor frame marking data set based on the anchor frame marking result;
training a target detection model based on an anchor frame labeling data set to generate a trained anchor frame identification model;
and inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
Further, the matching a pre-generated offline form template library according to the anchor frame data includes:
retrieving the pre-generated offline form template library according to the position coordinates of the anchor frames, and outputting the positions corresponding to the anchor frames and the specific content of the corresponding cross-line or cross-page text, recorded as a first text;
and searching for the other texts outside the anchor frame positions based on the position coordinates of the anchor frame data, recorded as a second text.
Further, the fusing the anchor frame data and the text data based on the matching result to obtain structured form information includes:
and merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
Further, the training a target detection model based on the anchor frame labeling data set to generate a trained anchor frame recognition model includes:
constructing a target detection model based on a YOLOv5 network in advance;
preprocessing the anchor frame labeling data set to generate anchor frame labeling data samples of uniform size;
and inputting the anchor frame labeling data samples into the target detection model for training to generate the trained anchor frame recognition model.
Further, the inputting the anchor frame labeling data sample into a target detection model for training to generate a trained anchor frame recognition model includes:
inputting the anchor frame marking data sample into a ResNet network of a target detection model for feature extraction;
inputting the extracted first features into a neck network part of the target detection model, and acquiring second features output by the neck network part;
merging the second features according to the resolution to generate third features;
inputting the third characteristic into a head network part to obtain a prediction result output by the head network;
and evaluating the prediction result, and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
Further, the method further comprises:
and acquiring anchor frame data coordinates output by an anchor frame recognition model, and converting the coordinates of the anchor frame data into coordinates in the form image according to a preset rule.
Another embodiment of the present invention provides an information extracting apparatus, including:
the text data acquisition module is used for acquiring a form image to be identified and extracting text data in the form image;
the anchor frame data acquisition module is used for inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
the form template matching module is used for matching a pre-generated offline form template library according to the anchor frame data;
and the data fusion module is used for fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
Further, the text data acquisition module includes:
the form image recognition unit is used for acquiring a form image to be recognized and performing text recognition on the form image based on a text recognition algorithm;
and the text data acquisition unit is used for acquiring text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
Further, the anchor frame data acquisition module comprises:
the data set acquisition unit is used for acquiring an anchor frame labeling result of the form image sample and acquiring an anchor frame labeling data set based on the anchor frame labeling result;
the model training unit is used for training the target detection model based on the anchor frame marking data set to generate a trained anchor frame identification model;
and the model output unit is used for inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
Further, the form template matching module includes:
the first text matching unit is used for retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, outputting the corresponding position of the anchor frame and the specific content of the corresponding cross-line or cross-page text, and recording the corresponding position and the specific content as a first text;
and the second text matching unit is used for searching for the other texts outside the anchor frame positions based on the position coordinates of the anchor frame data, recording them as the second text.
Further, the data fusion module comprises:
and the data merging unit is used for merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
Further, the anchor frame data acquisition module further comprises:
the model construction unit is used for constructing a target detection model based on a YOLOv5 network in advance;
the data preprocessing unit is used for preprocessing the anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and the training unit is used for inputting the anchor frame marking data sample into a target detection model for training to generate a trained anchor frame identification model.
Further, the anchor frame data acquisition module further comprises:
the first feature extraction unit is used for inputting the anchor frame marking data sample into a ResNet network of a target detection model for feature extraction;
the second feature extraction unit is used for inputting the extracted first features into a neck network part of the target detection model and acquiring second features output by the neck network part;
the third feature generation unit is used for merging the second features according to the resolution and generating third features;
the prediction unit is used for inputting the third characteristic to the head network part and acquiring a prediction result output by the head network;
and the prediction result evaluation unit is used for evaluating the prediction result and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
Further, the apparatus further comprises:
and the coordinate conversion module is used for acquiring the category and the coordinate of the anchor frame data output by the anchor frame recognition model and converting the coordinate of the anchor frame data into the coordinate in the form image according to a preset rule.
Another embodiment of the present invention provides an electronic device comprising at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described information extraction method.
Another embodiment of the present invention also provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above-described information extraction method.
Beneficial effects: according to the information extraction method, the anchor frame coordinate data are obtained through an image-based target detection model, the text corresponding to each anchor frame is obtained through a text-based template matching algorithm, and the offline form template library is used for matching, which improves the accuracy of the final information extraction and reduces the rate of false and missed detections of cross-line or cross-page text.
Drawings
The invention will be further described with reference to the following drawings and examples, in which:
FIG. 1 is a flow chart of a preferred embodiment of an information extraction method according to the present invention;
FIG. 2 is a flowchart illustrating a detailed step of step S100 according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a detailed step of step S200 according to an embodiment of the present invention;
FIG. 4 is a flowchart of a detailed step of step S300 according to an embodiment of the present invention;
FIG. 5 is a functional block diagram of an information extraction apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the functional modules of the text data acquisition module 11 according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the functional modules of the anchor frame data acquisition module 12 according to a specific application embodiment of the information extraction apparatus of the present invention;
FIG. 8 is a functional block diagram of the form template matching module 13 according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device according to a preferred embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Embodiments of the present invention will be described below with reference to the accompanying drawings.
In view of the above problems, an embodiment of the present invention provides an information extraction method, please refer to fig. 1, where fig. 1 is a flowchart of a preferred embodiment of the information extraction method according to the present invention. As shown in fig. 1, it includes:
s100, obtaining a form image to be identified, and extracting text data in the form image;
In a specific implementation, the information extraction method of the embodiment of the invention is used to extract information from an image, for example a form image containing form data, such as a letter-of-credit image. The original document contains tables and characters, and different tables carry different contents. After the form image to be recognized is acquired, the embodiment of the invention performs image processing on the form image and extracts the text data in it. The text data may be extracted by OCR (optical character recognition); the specific implementation of OCR is prior art and is not described again here.
S200, inputting the form image into the trained anchor frame recognition model to obtain anchor frame data of the form image;
In a specific implementation, in order to better acquire the form image data, the embodiment of the invention divides the form image into a plurality of check-box anchor frames. The form image to be recognized is input into the trained anchor frame recognition model to obtain the anchor frame data of the form image. The anchor frame recognition model may employ a target detection algorithm model. By obtaining the anchor frames, the specific role of the characters in the form image is accurately determined.
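As a minimal sketch of this inference step, assuming the detector follows the YOLOv5 interface described in the later embodiments (the weight file name and the torch.hub loading are illustrative assumptions, not part of the disclosure):

```python
# Sketch only: "anchor_frame_best.pt" is a hypothetical weight file produced
# by the training step described below; torch.hub loading is an assumption.
import torch

# Load a custom-trained YOLOv5 model (a one-stage target detector).
model = torch.hub.load("ultralytics/yolov5", "custom", path="anchor_frame_best.pt")

def detect_anchor_frames(image_path: str):
    """Run the trained anchor frame recognition model on a form image."""
    results = model(image_path)
    # Each record: xmin, ymin, xmax, ymax, confidence, class, name
    return results.pandas().xyxy[0].to_dict("records")
```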
Step S300, matching a pre-generated offline form template library according to the anchor frame data;
In a specific implementation, in order to improve the processing speed of the algorithm, a form template library is generated in advance, and the form template matching the anchor frame data is retrieved through the anchor frames, which facilitates the subsequent data processing.
And S400, fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
In a specific implementation, if a form template is matched, the anchor frame data are filled in according to the form template, and the text data and the anchor frame data are fused to obtain the structured form information.
For the extraction and structuring of the key information of a form, the embodiment of the invention decomposes the related tasks, improves and fuses two types of models (recognition and detection), and locates the positions of the key information with a higher-precision detector. This additional detection step performed in advance effectively improves the precision of the finally generated key text information without affecting the recognition precision of the corresponding text.
In one embodiment, as shown in fig. 2, step S100 includes:
s101, obtaining a form image to be identified, and performing text identification on the form image based on a text identification algorithm;
and S102, obtaining text data in the form image based on the text recognition result, wherein the text data comprises text content and corresponding position coordinates.
In specific implementation, after a form image to be recognized is obtained, text recognition is performed on the form image based on an existing text recognition algorithm model, and all text contents and position coordinates of the text contents in the form image are extracted.
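A minimal sketch of this step, assuming the pytesseract OCR engine purely for illustration (the disclosure only states that an existing text recognition algorithm is used):

```python
# Illustrative only: the disclosure does not name an OCR engine;
# pytesseract is assumed here.
import pytesseract
from PIL import Image

def extract_text_data(image_path: str):
    """Return text content with position coordinates from a form image."""
    image = Image.open(image_path)
    ocr = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    text_data = []
    for i, word in enumerate(ocr["text"]):
        if word.strip():  # skip empty detections
            text_data.append({
                "text": word,
                "x": ocr["left"][i], "y": ocr["top"][i],
                "w": ocr["width"][i], "h": ocr["height"][i],
            })
    return text_data
```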
In one embodiment, as shown in fig. 3, step S200 includes:
step S201, obtaining an anchor frame labeling result of the form image sample, and obtaining an anchor frame labeling data set based on the anchor frame labeling result;
step S202, training a target detection model based on an anchor frame labeling data set to generate a trained anchor frame recognition model;
step S203, inputting the form image into an anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
In a specific implementation, existing form image samples are labeled manually to generate the anchor frame labeling results, from which the anchor frame labeling data set is generated. Specifically, the 500 provided form images are labeled manually, the formats of the check-boxes are screened manually, and the check-boxes are divided into 10 categories according to their different selection modes; finally, the labeled position coordinates are converted into the format required by the algorithm, with the labeled data stored in csv format.
The anchor frame labeling data set is divided into a training set and a validation set to train the target detection model and generate the trained anchor frame recognition model. The form image to be processed is input into the trained anchor frame recognition model to obtain the anchor frame data, which include but are not limited to the anchor frame categories and the corresponding position coordinates. Specifically, the task is to train a model that can detect the specific coordinates of the check-boxes marking the positions of the key text information: the input is a form image, and the output is the categories and specific position coordinates of all check-boxes in the image. The data set is divided into a training set and a validation set at a ratio of 8:2.
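A short sketch of the labeling-set split, with hypothetical csv column names (the disclosure only states that the labels are stored in csv format and split 8:2):

```python
# Column names are assumptions; only the csv format and the 8:2 split are stated.
import pandas as pd
from sklearn.model_selection import train_test_split

labels = pd.read_csv("anchor_frame_labels.csv")  # assumed columns: image, category, x1, y1, x2, y2
train_set, val_set = train_test_split(labels, test_size=0.2, random_state=42)
```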
In one embodiment, as shown in fig. 4, step S300 includes:
s301, retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, and outputting the corresponding position of the anchor frame and the specific content of the corresponding cross-line or cross-page text, wherein the specific content is recorded as a first text;
and S302, searching other texts except the anchor frame position based on the position coordinates of the anchor frame data, and recording the texts as second texts.
In a specific implementation, the corresponding template result is retrieved first: according to the anchor frame coordinates obtained in the previous step, the constructed template dictionary is searched, and the positions corresponding to the key information and the specific content of the corresponding cross-line or cross-page text are found, recorded as the first text and output. The corresponding text result is then retrieved: the other normal texts in the regions corresponding to the key information are searched according to the anchor frames, and all the found key information is combined.
Because labeling the content and layout of the key text information is laborious, and the data set is small, data enhancement is performed by constructing a template dictionary offline. After a small number of data sets are merged and summarized, the layout and text of the final templates are summarized; for example, form image A consists of three parts, and the specific layout of each part and the specific content and position coordinates of the related cross-line and cross-page texts are determined. Before the text is input into the model, the matched template is found and fed into the model together with it, which improves the accuracy of the final information extraction and reduces the rate of false and missed detections of cross-line and cross-page text.
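A sketch of the template-dictionary retrieval under an assumed dictionary structure (each template storing its anchor frame regions together with the cross-line or cross-page text belonging to each region); the overlap-based matching criterion is an illustrative choice, not stated in the disclosure:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def retrieve_first_text(anchor_frames, template_library, threshold=0.5):
    """Return template entries whose regions overlap the detected anchor frames."""
    first_text = []
    for template in template_library:          # offline template dictionary
        for region in template["regions"]:     # assumed per-template structure
            if any(iou(a["box"], region["box"]) >= threshold for a in anchor_frames):
                first_text.append(region["text"])
    return first_text
```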
In one embodiment, step S400 includes:
and merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
In a specific implementation, the first text and the second text are obtained through the position coordinates of the anchor frames and the corresponding texts; the first text and the second text are then fused to obtain all the final content of the key information of the form image, so that the form image data are structured.
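A sketch of the fusion step; the field names and the record layout are assumptions, since the disclosure only states that the two text sets are merged into structured form information:

```python
def fuse(anchor_frames, first_text, second_text):
    """Merge template-derived (first) and OCR-derived (second) text."""
    structured_form = {}
    # Illustrative simplification: assumes one template entry per anchor frame.
    for anchor, key_info in zip(anchor_frames, first_text):
        structured_form[anchor["name"]] = {   # "name" is a hypothetical field
            "key_info": key_info,             # cross-line / cross-page content
            "box": anchor["box"],
        }
    structured_form["other_text"] = second_text   # remaining normal text
    return structured_form
```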
In one embodiment, training a target detection model based on an anchor frame labeling dataset to generate a trained anchor frame recognition model comprises:
constructing a target detection model based on a YOLOv5 network in advance;
preprocessing an anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and inputting the anchor frame labeling data sample into a target detection model for training to generate a trained anchor frame recognition model.
In a specific implementation, the YOLOv5 target detection algorithm model pre-constructed in the embodiment of the invention is modified to improve its performance in detecting smaller target objects for this task. Due to the particularity of the data set used, the detected objects occupy only a small pixel area and their features are not obvious, so the input image sizes are unified to (2000, 2000). Training is performed with the YOLOv5 target detection algorithm model, whose input is the form data and whose output is the check-box coordinate results. YOLOv5 is a typical one-stage target detection model.
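A sketch of the size-unification step; keeping the scale factors allows the predicted coordinates to be mapped back to the original image later:

```python
import cv2

def preprocess(image_path: str):
    """Resize a form image to the unified (2000, 2000) input size."""
    image = cv2.imread(image_path)
    scale_x = 2000 / image.shape[1]   # shape[1] = original width
    scale_y = 2000 / image.shape[0]   # shape[0] = original height
    resized = cv2.resize(image, (2000, 2000))
    return resized, (scale_x, scale_y)
```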
In one embodiment, the inputting the anchor frame labeling data samples into the target detection model for training to generate a trained anchor frame recognition model includes:
inputting an anchor frame marking data sample into a ResNet network of a target detection model for feature extraction;
inputting the extracted first features into a neck network part of the target detection model, and acquiring second features output by the neck network part;
the second characteristic is that a third characteristic is generated after merging according to the resolution;
inputting the third characteristic into the head network part to obtain a prediction result output by the head network;
and evaluating the prediction result, and adjusting the network parameters of the anchor frame recognition model according to the evaluation result until the recognition precision of the anchor frame recognition model meets the preset condition.
In a specific implementation, feature extraction is performed on the anchor frame labeling data samples through a ResNet network to generate the first features; the extracted first features are fed into the Neck part, whose outputs are recorded as the second features; the second features are merged according to resolution to generate the third features; and the third features are fed into the Head part for the final inference, producing the prediction result. In the embodiment of the invention, to make the network better suited to this task, the Neck part of the target detection framework is modified: a network layer that aggregates lower-resolution features is added, and the connection between the Neck and Head parts is changed, so that the network focuses on specific feature maps and can finally detect smaller targets.
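The modified Neck itself is not published, so the following is only a conceptual sketch of "merging the second features by resolution" into a third feature before the Head:

```python
# Conceptual sketch only: the actual modified Neck of the disclosure differs.
import torch
import torch.nn.functional as F

def merge_by_resolution(neck_features):
    """Align the Neck outputs to one resolution and concatenate them."""
    target = neck_features[0].shape[-2:]  # assumed highest-resolution map
    aligned = [F.interpolate(f, size=target, mode="nearest") for f in neck_features]
    return torch.cat(aligned, dim=1)      # "third feature" fed to the Head
```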
In a further embodiment, the method further comprises:
and acquiring anchor frame data coordinates output by the anchor frame identification model, and converting the coordinates of the anchor frame data into coordinates in the form image according to a preset rule.
In a specific implementation, the categories and specific coordinates of all the required check-boxes in the form image data are inferred with the detection model and then converted into specific coordinates in the original image according to the rule: new coordinates = original coordinates × the corresponding image scale.
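Using the scale factors saved in the preprocessing sketch above, the quoted conversion rule can be sketched as:

```python
def to_original_coords(box, scale_x, scale_y):
    """Map a predicted (x1, y1, x2, y2) box back to original-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 / scale_x, y1 / scale_y, x2 / scale_x, y2 / scale_y)
```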
According to the above method embodiment, for the extraction and structuring of the key information of a form, the related tasks are decomposed, the previously studied models are improved using the two model types of recognition and detection, and the improved models are fused; the key information positions are located with a higher-precision detector, and the additional detection step performed in advance effectively improves the precision of the finally generated key text information without affecting the corresponding text recognition precision. Meanwhile, compared with related research, the model training time is greatly reduced. Finally, the template-retrieval data enhancement method effectively alleviates the false or missed detection of cross-line and cross-page texts.
It should be noted that, a certain order does not necessarily exist between the above steps, and it can be understood by those skilled in the art according to the description of the embodiments of the present invention that, in different embodiments, the above steps may have different execution orders, that is, may be executed in parallel, may also be executed interchangeably, and the like.
Another embodiment of the present invention provides an information extracting apparatus, as shown in fig. 5, the apparatus 1 includes:
the text data acquisition module 11 is configured to acquire a form image to be identified and extract text data in the form image;
the anchor frame data acquisition module 12 is configured to input the form image into the trained anchor frame recognition model to obtain anchor frame data of the form image;
the form template matching module 13 is used for matching a pre-generated offline form template library according to the anchor frame data;
and the data fusion module 14 is configured to fuse the anchor frame data and the text data based on the matching result to obtain structured form information.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, as shown in fig. 6, the text data acquisition module 11 includes:
the form image recognition unit 111 is configured to acquire a form image to be recognized, and perform text recognition on the form image based on a text recognition algorithm;
a text data obtaining unit 112, configured to obtain text data in the form image based on the text recognition result, where the text data includes text content and corresponding position coordinates.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, as shown in FIG. 7, the anchor frame data acquisition module 12 includes:
the data set obtaining unit 121 is configured to obtain an anchor frame labeling result of the form image sample, and obtain an anchor frame labeling data set based on the anchor frame labeling result;
the model training unit 122 is configured to train a target detection model based on the anchor frame labeling data set, and generate a trained anchor frame recognition model;
and the model output unit 123 is configured to input the form image into the anchor frame recognition model to obtain anchor frame data of the form image, where the anchor frame data includes an anchor frame category and corresponding position coordinates.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, as shown in FIG. 8, the form template matching module 13 includes:
the first text matching unit 131 is configured to retrieve a pre-generated offline form template library according to the position coordinates of the anchor frame, output a position corresponding to the anchor frame and corresponding specific contents of a cross-line or cross-page text, and record the position as a first text;
and a second text matching unit 132, configured to search for other texts besides the anchor box position based on the position coordinates of the anchor box data, and record the texts as a second text.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the data fusion module 14 includes:
and the data merging unit is used for merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the anchor frame data acquisition module further comprises:
the model construction unit is used for constructing a target detection model based on a YOLOv5 network in advance;
the data preprocessing unit is used for preprocessing the anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and the training unit is used for inputting the anchor frame marking data sample into the target detection model for training and generating a trained anchor frame identification model.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the anchor frame data acquisition module further comprises:
the first feature extraction unit is used for inputting the anchor frame marking data sample into a ResNet network of the target detection model for feature extraction;
the second feature extraction unit is used for inputting the extracted first features into a neck network part of the target detection model and acquiring second features output by the neck network part;
the third feature generation unit is used for merging the second features according to the resolution and then generating third features;
the prediction unit is used for inputting the third characteristic to the head network part and acquiring a prediction result output by the head network;
and the prediction result evaluation unit is used for evaluating the prediction result and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the apparatus further comprises:
and the coordinate conversion module is used for acquiring the category and the coordinate of the anchor frame data output by the anchor frame recognition model and converting the coordinate of the anchor frame data into the coordinate in the form image according to a preset rule.
The specific implementation is shown in the method embodiment, and is not described herein again.
Another embodiment of the present invention provides an electronic device. As shown in fig. 9, the electronic device 10 includes:
one or more processors 110 and a memory 120; one processor 110 is illustrated in fig. 9. The processor 110 and the memory 120 may be connected by a bus or other means; fig. 9 takes the bus connection as an example.
The processor 110 is used to implement the various control logic of the electronic device 10, and may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single chip, an ARM (Acorn RISC Machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. Also, the processor 110 may be any conventional processor, microprocessor, or state machine. Processor 110 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The memory 120, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions corresponding to the information extraction method in the embodiment of the present invention. The processor 110 executes various functional applications and data processing of the device 10, i.e. implements the information extraction method in the above-described method embodiments, by running non-volatile software programs, instructions and units stored in the memory 120.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to the use of the device 10, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 120 optionally includes memory located remotely from processor 110, which may be connected to device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120, which when executed by the one or more processors 110 perform the information extraction method in any of the method embodiments described above, e.g. performing the method steps S100 to S400 in fig. 1 described above.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S400 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The disclosed memory controllers or memories of the operating environments described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the information extraction method of the above-described method embodiment. For example, the method steps S100 to S400 in fig. 1 described above are performed.
The above-described embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Through the above description of the embodiments, it is clear to those skilled in the art that the embodiments may be implemented by software plus a general hardware platform, or by hardware. Based on such understanding, the technical solutions, in essence or in the part contributing to the related art, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or parts thereof.
Conditional language such as "can," "might," or "may" is generally intended to convey that a particular embodiment can include (yet other embodiments do not include) particular features, elements, and/or operations, among others, unless specifically stated otherwise or otherwise understood within the context as used. Thus, such conditional language is also generally intended to imply that features, elements and/or operations are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without input or prompting, whether these features, elements and/or operations are included or are to be performed in any particular embodiment.
What has been described herein in the specification and drawings includes examples capable of providing an information extraction method and apparatus. It will, of course, not be possible to describe every conceivable combination of components and/or methodologies for purposes of describing the various features of the present disclosure, but it can be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications may be made to the disclosure without departing from the scope or spirit thereof. In addition, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. It is intended that the examples set forth in this specification and the drawings be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (18)

1. An information extraction method, characterized in that the method comprises:
acquiring a form image to be identified, and extracting text data in the form image;
inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
matching a pre-generated offline form template library according to the anchor frame data;
and fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
2. The method according to claim 1, wherein the obtaining of the form image to be recognized and the extracting of the text data in the form image comprises:
acquiring a form image to be identified, and performing text identification on the form image based on a text identification algorithm;
and obtaining text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
3. The method according to claim 2, wherein the inputting the form image to be recognized into the trained anchor frame recognition model to obtain anchor frame data of the form image comprises:
obtaining an anchor frame marking result of the form image sample, and obtaining an anchor frame marking data set based on the anchor frame marking result;
training a target detection model based on an anchor frame labeling data set to generate a trained anchor frame identification model;
and inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
4. The method of claim 3, wherein matching a pre-generated library of offline form templates from the anchor frame data comprises:
retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, outputting the corresponding position of the anchor frame and the specific content of the corresponding line-crossing or page-crossing text, and recording as a first text;
and searching other texts except the anchor frame position based on the position coordinates of the anchor frame data, and recording the texts as second texts.
5. The method according to claim 4, wherein fusing the anchor frame data with the text data based on the matching result to obtain the structured form information comprises:
and combining the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
6. The method according to any one of claims 3-5, wherein training a target detection model based on an anchor frame labeling dataset to generate a trained anchor frame recognition model comprises:
a target detection model based on a YOLOv5 network is constructed in advance;
preprocessing the anchor frame labeling data set to generate anchor frame labeling data samples with uniform sizes;
and inputting the anchor frame marking data sample into a target detection model for training to generate a trained anchor frame identification model.
7. The method of claim 6, wherein the inputting the anchor frame labeling data samples into a target detection model for training and generating a trained anchor frame recognition model comprises:
inputting the anchor frame labeling data sample into a ResNet network of a target detection model for feature extraction;
inputting the extracted first features into a neck network part of the target detection model, and acquiring second features output by the neck network part;
merging the second features according to the resolution to generate third features;
inputting the third characteristic into a head network part to obtain a prediction result output by the head network;
and evaluating the prediction result, and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
8. The method of claim 7, further comprising:
and acquiring anchor frame data coordinates output by an anchor frame identification model, and converting the coordinates of the anchor frame data into coordinates in the form image according to a preset rule.
9. An information extraction apparatus, characterized in that the apparatus comprises:
the text data acquisition module is used for acquiring a form image to be identified and extracting text data in the form image;
the anchor frame data acquisition module is used for inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
the form template matching module is used for matching a pre-generated offline form template library according to the anchor frame data;
and the data fusion module is used for fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
10. The apparatus of claim 9, wherein the text data obtaining module comprises:
the form image recognition unit is used for acquiring a form image to be recognized and performing text recognition on the form image based on a text recognition algorithm;
and the text data acquisition unit is used for obtaining text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
11. The apparatus of claim 10, wherein the anchor frame data acquisition module comprises:
the data set acquisition unit is used for acquiring an anchor frame marking result of the form image sample and obtaining an anchor frame marking data set based on the anchor frame marking result;
the model training unit is used for training the target detection model based on the anchor frame marking data set to generate a trained anchor frame recognition model;
and the model output unit is used for inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
12. The apparatus of claim 11, wherein the form template matching module comprises:
the first text matching unit is used for retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, outputting the corresponding position of the anchor frame and the specific content of the corresponding cross-line or cross-page text, and recording the corresponding position and the specific content as a first text;
and the second text matching unit is used for searching other texts except the anchor frame position based on the position coordinates of the anchor frame data and recording the texts as second texts.
13. The apparatus of claim 12, wherein the data fusion module comprises:
and the data merging unit is used for merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
14. The apparatus of any of claims 11-13, wherein the anchor frame data acquisition module further comprises:
the model construction unit is used for constructing a target detection model based on a YOLOv5 network in advance;
the data preprocessing unit is used for preprocessing the anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and the training unit is used for inputting the anchor frame marking data sample into a target detection model for training to generate a trained anchor frame identification model.
15. The apparatus of claim 14, wherein the anchor frame data acquisition module further comprises:
a first feature extraction unit, configured to input the anchor frame annotation data sample into a ResNet network of a target detection model to perform feature extraction;
the second feature extraction unit is used for inputting the extracted first features into a neck network part of the target detection model and acquiring second features output by the neck network part;
the third feature generation unit is used for merging the second features according to the resolution and generating third features;
the prediction unit is used for inputting the third characteristic to the head network part and acquiring a prediction result output by the head network;
and the prediction result evaluation unit is used for evaluating the prediction result and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
16. The apparatus of claim 15, further comprising:
and the coordinate conversion module is used for acquiring the category and the coordinate of the anchor frame data output by the anchor frame recognition model and converting the coordinate of the anchor frame data into the coordinate in the form image according to a preset rule.
17. An electronic device, characterized in that the electronic device comprises at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of information extraction of any one of claims 1-8.
18. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the information extraction method of any one of claims 1-8.
CN202211046852.5A 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment Pending CN115223183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211046852.5A 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211046852.5A 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115223183A 2022-10-21

Family

ID=83617919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211046852.5A Pending CN115223183A (en) 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115223183A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058689A (en) * 2023-10-09 2023-11-14 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production
CN117058689B (en) * 2023-10-09 2024-02-20 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination