CN115223183A - Information extraction method and device and electronic equipment

Information extraction method and device and electronic equipment

Info

Publication number
CN115223183A
CN115223183A
Authority
CN
China
Prior art keywords
anchor frame
data
text
form image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211046852.5A
Other languages
Chinese (zh)
Inventor
孙强
常鹏
周辉
冯兴祥
黎利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202211046852.5A
Publication of CN115223183A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an information extraction method, an information extraction apparatus and an electronic device. The method comprises the following steps: acquiring a form image to be recognized, and extracting the text data in the form image; inputting the form image into a trained anchor frame recognition model to obtain the anchor frame data of the form image; matching a pre-generated offline form template library according to the anchor frame data; and fusing the anchor frame data and the text data based on the matching result to obtain structured form information. The embodiment of the invention obtains the anchor frame coordinate data with an image-based target detection model, obtains the text corresponding to each anchor frame with a text-based template matching algorithm, and uses the offline form template library for matching, thereby improving the accuracy of the final information extraction and reducing the rate of false and missed detections of cross-line or cross-page text.

Description

Information extraction method and device and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an information extraction method and apparatus, and an electronic device.
Background
Image recognition technology is widely applied in banking, where a large number of documents must be authenticated, recognized and checked during operations. In the traditional document approval process, documents are first classified and recognized manually and then entered by hand, which is inefficient, time-consuming and error-prone. With the development of image recognition technology, existing OCR form recognition can already handle common bank forms and certificates, such as identity cards, social security cards, credit card application forms and cheques: customers can upload forms for direct recognition by scanning or photographing on applets, web pages and H5 pages, and administrators can automatically extract the key information of the verified forms with image recognition products.
A bank's international settlement business involves extracting and auditing a large number of letter-of-credit forms. Because letters of credit come in many varieties with differing layouts and small fonts, the existing image recognition technology recognizes them with low accuracy: the extracted text contains errors and omissions, and manual re-processing is needed to correct the extracted content. A large amount of human effort is therefore spent on simple, repetitive, low-value work, and it is difficult to improve the efficiency of international settlement document auditing. The extracted forms need to be structured and the accuracy of extraction increased in order to accelerate the document auditing process.
In the prior art, the problem of structuring a form image is divided into two independent tasks, text reading and information extraction: after the text in the image is detected and recognized, the key elements in the text are analyzed and extracted. However, this approach focuses mainly on improving the information extraction task and ignores the relevance between text reading and information extraction. Although it can solve the form image structuring problem, running the two tasks independently causes the accuracy of information extraction to drop when text crosses lines or pages, which affects the finally generated text.
Accordingly, there is a need for improvements and developments in the art.
Disclosure of Invention
In view of the defects of the prior art, the invention provides an information extraction method, an information extraction apparatus and an electronic device, aiming to solve the problem that, in the prior art, the accuracy of information extraction is low when text crosses lines or pages.
The technical scheme of the invention is as follows:
a first embodiment of the present invention provides an information extraction method, including:
acquiring a form image to be identified, and extracting text data in the form image;
inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
matching a pre-generated offline form template library according to the anchor frame data;
and fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
Further, the acquiring a form image to be recognized and extracting text data in the form image includes:
acquiring a form image to be identified, and performing text identification on the form image based on a text identification algorithm;
and obtaining text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
Further, the inputting the form image to be recognized into the trained anchor frame recognition model to obtain the anchor frame data of the form image includes:
obtaining an anchor frame marking result of the form image sample, and obtaining an anchor frame marking data set based on the anchor frame marking result;
training a target detection model based on an anchor frame labeling data set to generate a trained anchor frame identification model;
and inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
Further, the matching a pre-generated offline form template library according to the anchor frame data includes:
retrieving the pre-generated offline form template library according to the position coordinates of the anchor frames, and outputting the positions corresponding to the anchor frames and the specific content of the corresponding cross-line or cross-page text, recorded as a first text;
and searching for the other texts outside the anchor frame positions based on the position coordinates of the anchor frame data, recorded as a second text.
Further, the fusing the anchor frame data and the text data based on the matching result to obtain structured form information includes:
and merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
Further, the training a target detection model based on the anchor frame labeling data set to generate a trained anchor frame recognition model includes:
constructing a target detection model based on a YOLOv5 network in advance;
preprocessing the anchor frame labeling data set to generate anchor frame labeling data samples of uniform size;
and inputting the anchor frame labeling data samples into the target detection model for training to generate the trained anchor frame recognition model.
Further, the inputting the anchor frame labeling data sample into a target detection model for training to generate a trained anchor frame recognition model includes:
inputting the anchor frame marking data sample into a ResNet network of a target detection model for feature extraction;
inputting the extracted first features into a neck network part of the target detection model, and acquiring second features output by the neck network part;
merging the second features according to the resolution to generate third features;
inputting the third characteristic into a head network part to obtain a prediction result output by the head network;
and evaluating the prediction result, and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
Further, the method further comprises:
and acquiring anchor frame data coordinates output by an anchor frame recognition model, and converting the coordinates of the anchor frame data into coordinates in the form image according to a preset rule.
Another embodiment of the present invention provides an information extracting apparatus, including:
the text data acquisition module is used for acquiring a form image to be identified and extracting text data in the form image;
the anchor frame data acquisition module is used for inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
the form template matching module is used for matching a pre-generated offline form template library according to the anchor frame data;
and the data fusion module is used for fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
Further, the text data acquisition module includes:
the form image recognition unit is used for acquiring a form image to be recognized and performing text recognition on the form image based on a text recognition algorithm;
and the text data acquisition unit is used for acquiring text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
Further, the anchor frame data acquisition module comprises:
the data set acquisition unit is used for acquiring an anchor frame labeling result of the form image sample and acquiring an anchor frame labeling data set based on the anchor frame labeling result;
the model training unit is used for training the target detection model based on the anchor frame marking data set to generate a trained anchor frame identification model;
and the model output unit is used for inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
Further, the form template matching module includes:
the first text matching unit is used for retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, outputting the corresponding position of the anchor frame and the specific content of the corresponding cross-line or cross-page text, and recording the corresponding position and the specific content as a first text;
and the second text matching unit is used for searching for the other texts outside the anchor frame positions based on the position coordinates of the anchor frame data, recording them as the second text.
Further, the data fusion module comprises:
and the data merging unit is used for merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
Further, the anchor frame data acquisition module further comprises:
the model construction unit is used for constructing a target detection model based on a YOLOv5 network in advance;
the data preprocessing unit is used for preprocessing the anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and the training unit is used for inputting the anchor frame marking data sample into a target detection model for training to generate a trained anchor frame identification model.
Further, the anchor frame data acquisition module further comprises:
the first feature extraction unit is used for inputting the anchor frame marking data sample into a ResNet network of a target detection model for feature extraction;
the second feature extraction unit is used for inputting the extracted first features into a neck network part of the target detection model and acquiring second features output by the neck network part;
the third feature generation unit is used for merging the second features according to the resolution and generating third features;
the prediction unit is used for inputting the third characteristic to the head network part and acquiring a prediction result output by the head network;
and the prediction result evaluation unit is used for evaluating the prediction result and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
Further, the apparatus further comprises:
and the coordinate conversion module is used for acquiring the category and the coordinate of the anchor frame data output by the anchor frame recognition model and converting the coordinate of the anchor frame data into the coordinate in the form image according to a preset rule.
Another embodiment of the present invention provides an electronic device comprising at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described information extraction method.
Another embodiment of the present invention also provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above-described information extraction method.
Beneficial effects: according to the information extraction method, the anchor frame coordinate data are obtained through an image-based target detection model, the text corresponding to each anchor frame is obtained through a text-based template matching algorithm, and the offline form template library is used for matching, which improves the accuracy of the final information extraction and reduces the rate of false and missed detections of cross-line or cross-page text.
Drawings
The invention will be further described with reference to the following drawings and examples, in which:
FIG. 1 is a flow chart of a preferred embodiment of an information extraction method according to the present invention;
FIG. 2 is a flowchart illustrating a detailed step of step S100 according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a detailed step of step S200 according to an embodiment of the present invention;
FIG. 4 is a flowchart of a detailed step of step S300 according to an embodiment of the present invention;
FIG. 5 is a functional block diagram of an information extraction apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the functional modules of the text data acquisition module 11 according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the functional modules of the anchor frame data acquisition module 12 according to a specific application embodiment of the information extraction apparatus of the present invention;
FIG. 8 is a functional block diagram of the form template matching module 13 according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device according to a preferred embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Embodiments of the present invention will be described below with reference to the accompanying drawings.
In view of the above problems, an embodiment of the present invention provides an information extraction method, please refer to fig. 1, where fig. 1 is a flowchart of a preferred embodiment of the information extraction method according to the present invention. As shown in fig. 1, it includes:
s100, obtaining a form image to be identified, and extracting text data in the form image;
In a specific implementation, the information extraction method of the embodiment of the invention is used to extract information from an image, for example a form image containing form data, such as a letter-of-credit image. The original document contains tables and characters, and different tables carry different contents. After the form image to be recognized is acquired, the embodiment of the invention performs image processing on the form image and extracts the text data in it. The text data may be extracted by OCR (optical character recognition); the specific implementation of OCR is prior art and is not described again here.
S200, inputting the form image into the trained anchor frame recognition model to obtain anchor frame data of the form image;
In a specific implementation, in order to better acquire the form image data, the embodiment of the invention divides the form image into a plurality of check-box anchor frames. The form image to be recognized is input into the trained anchor frame recognition model to obtain the anchor frame data of the form image. The anchor frame recognition model may employ a target detection algorithm model. By obtaining the anchor frames, the specific role of the characters in the form image is accurately determined.
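As a minimal sketch of this inference step, assuming the detector follows the YOLOv5 interface described in the later embodiments (the weight file name and the torch.hub loading are illustrative assumptions, not part of the disclosure):

```python
# Sketch only: "anchor_frame_best.pt" is a hypothetical weight file produced
# by the training step described below; torch.hub loading is an assumption.
import torch

# Load a custom-trained YOLOv5 model (a one-stage target detector).
model = torch.hub.load("ultralytics/yolov5", "custom", path="anchor_frame_best.pt")

def detect_anchor_frames(image_path: str):
    """Run the trained anchor frame recognition model on a form image."""
    results = model(image_path)
    # Each record: xmin, ymin, xmax, ymax, confidence, class, name
    return results.pandas().xyxy[0].to_dict("records")
```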
Step S300, matching a pre-generated offline form template library according to the anchor frame data;
In a specific implementation, in order to improve the processing speed of the algorithm, a form template library is generated in advance, and the form template matching the anchor frame data is retrieved through the anchor frames, which facilitates the subsequent data processing.
And S400, fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
In a specific implementation, if a form template is matched, the anchor frame data are filled in according to the form template, and the text data and the anchor frame data are fused to obtain the structured form information.
For the extraction and structuring of the key information of a form, the embodiment of the invention decomposes the related tasks, improves and fuses two types of models (recognition and detection), and locates the positions of the key information with a higher-precision detector. This additional detection step performed in advance effectively improves the precision of the finally generated key text information without affecting the recognition precision of the corresponding text.
In one embodiment, as shown in fig. 2, step S100 includes:
s101, obtaining a form image to be identified, and performing text identification on the form image based on a text identification algorithm;
and S102, obtaining text data in the form image based on the text recognition result, wherein the text data comprises text content and corresponding position coordinates.
In specific implementation, after a form image to be recognized is obtained, text recognition is performed on the form image based on an existing text recognition algorithm model, and all text contents and position coordinates of the text contents in the form image are extracted.
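A minimal sketch of this step, assuming the pytesseract OCR engine purely for illustration (the disclosure only states that an existing text recognition algorithm is used):

```python
# Illustrative only: the disclosure does not name an OCR engine;
# pytesseract is assumed here.
import pytesseract
from PIL import Image

def extract_text_data(image_path: str):
    """Return text content with position coordinates from a form image."""
    image = Image.open(image_path)
    ocr = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    text_data = []
    for i, word in enumerate(ocr["text"]):
        if word.strip():  # skip empty detections
            text_data.append({
                "text": word,
                "x": ocr["left"][i], "y": ocr["top"][i],
                "w": ocr["width"][i], "h": ocr["height"][i],
            })
    return text_data
```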
In one embodiment, as shown in fig. 3, step S200 includes:
step S201, obtaining an anchor frame labeling result of the form image sample, and obtaining an anchor frame labeling data set based on the anchor frame labeling result;
step S202, training a target detection model based on an anchor frame labeling data set to generate a trained anchor frame recognition model;
step S203, inputting the form image into an anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
In a specific implementation, existing form image samples are labeled manually to generate the anchor frame labeling results, from which the anchor frame labeling data set is generated. Specifically, the 500 provided form images are labeled manually, the formats of the check-boxes are screened manually, and the check-boxes are divided into 10 categories according to their different selection modes; finally, the labeled position coordinates are converted into the format required by the algorithm, with the labeled data stored in csv format.
The anchor frame labeling data set is divided into a training set and a validation set to train the target detection model and generate the trained anchor frame recognition model. The form image to be processed is input into the trained anchor frame recognition model to obtain the anchor frame data, which include but are not limited to the anchor frame categories and the corresponding position coordinates. Specifically, the task is to train a model that can detect the specific coordinates of the check-boxes marking the positions of the key text information: the input is a form image, and the output is the categories and specific position coordinates of all check-boxes in the image. The data set is divided into a training set and a validation set at a ratio of 8:2.
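A short sketch of the labeling-set split, with hypothetical csv column names (the disclosure only states that the labels are stored in csv format and split 8:2):

```python
# Column names are assumptions; only the csv format and the 8:2 split are stated.
import pandas as pd
from sklearn.model_selection import train_test_split

labels = pd.read_csv("anchor_frame_labels.csv")  # assumed columns: image, category, x1, y1, x2, y2
train_set, val_set = train_test_split(labels, test_size=0.2, random_state=42)
```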
In one embodiment, as shown in fig. 4, step S300 includes:
s301, retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, and outputting the corresponding position of the anchor frame and the specific content of the corresponding cross-line or cross-page text, wherein the specific content is recorded as a first text;
and S302, searching other texts except the anchor frame position based on the position coordinates of the anchor frame data, and recording the texts as second texts.
In a specific implementation, the corresponding template result is retrieved first: according to the anchor frame coordinates obtained in the previous step, the constructed template dictionary is searched, and the positions corresponding to the key information and the specific content of the corresponding cross-line or cross-page text are found, recorded as the first text and output. The corresponding text result is then retrieved: the other normal texts in the regions corresponding to the key information are searched according to the anchor frames, and all the found key information is combined.
Because labeling the content and layout of the key text information is laborious, and the data set is small, data enhancement is performed by constructing a template dictionary offline. After a small number of data sets are merged and summarized, the layout and text of the final templates are summarized; for example, form image A consists of three parts, and the specific layout of each part and the specific content and position coordinates of the related cross-line and cross-page texts are determined. Before the text is input into the model, the matched template is found and fed into the model together with it, which improves the accuracy of the final information extraction and reduces the rate of false and missed detections of cross-line and cross-page text.
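A sketch of the template-dictionary retrieval under an assumed dictionary structure (each template storing its anchor frame regions together with the cross-line or cross-page text belonging to each region); the overlap-based matching criterion is an illustrative choice, not stated in the disclosure:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def retrieve_first_text(anchor_frames, template_library, threshold=0.5):
    """Return template entries whose regions overlap the detected anchor frames."""
    first_text = []
    for template in template_library:          # offline template dictionary
        for region in template["regions"]:     # assumed per-template structure
            if any(iou(a["box"], region["box"]) >= threshold for a in anchor_frames):
                first_text.append(region["text"])
    return first_text
```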
In one embodiment, step S400 includes:
and merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
In a specific implementation, the first text and the second text are obtained through the position coordinates of the anchor frames and the corresponding texts; the first text and the second text are then fused to obtain all the final content of the key information of the form image, so that the form image data are structured.
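A sketch of the fusion step; the field names and the record layout are assumptions, since the disclosure only states that the two text sets are merged into structured form information:

```python
def fuse(anchor_frames, first_text, second_text):
    """Merge template-derived (first) and OCR-derived (second) text."""
    structured_form = {}
    # Illustrative simplification: assumes one template entry per anchor frame.
    for anchor, key_info in zip(anchor_frames, first_text):
        structured_form[anchor["name"]] = {   # "name" is a hypothetical field
            "key_info": key_info,             # cross-line / cross-page content
            "box": anchor["box"],
        }
    structured_form["other_text"] = second_text   # remaining normal text
    return structured_form
```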
In one embodiment, training a target detection model based on an anchor frame labeling dataset to generate a trained anchor frame recognition model comprises:
constructing a target detection model based on a YOLOv5 network in advance;
preprocessing an anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and inputting the anchor frame labeling data sample into a target detection model for training to generate a trained anchor frame recognition model.
In a specific implementation, the YOLOv5 target detection algorithm model pre-constructed in the embodiment of the invention is modified to improve its performance in detecting smaller target objects for this task. Due to the particularity of the data set used, the detected objects occupy only a small pixel area and their features are not obvious, so the input image sizes are unified to (2000, 2000). Training is performed with the YOLOv5 target detection algorithm model, whose input is the form data and whose output is the check-box coordinate results. YOLOv5 is a typical one-stage target detection model.
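A sketch of the size-unification step; keeping the scale factors allows the predicted coordinates to be mapped back to the original image later:

```python
import cv2

def preprocess(image_path: str):
    """Resize a form image to the unified (2000, 2000) input size."""
    image = cv2.imread(image_path)
    scale_x = 2000 / image.shape[1]   # shape[1] = original width
    scale_y = 2000 / image.shape[0]   # shape[0] = original height
    resized = cv2.resize(image, (2000, 2000))
    return resized, (scale_x, scale_y)
```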
In one embodiment, the inputting the anchor frame labeling data samples into the target detection model for training to generate a trained anchor frame recognition model includes:
inputting an anchor frame marking data sample into a ResNet network of a target detection model for feature extraction;
inputting the extracted first features into a neck network part of the target detection model, and acquiring second features output by the neck network part;
the second characteristic is that a third characteristic is generated after merging according to the resolution;
inputting the third characteristic into the head network part to obtain a prediction result output by the head network;
and evaluating the prediction result, and adjusting the network parameters of the anchor frame recognition model according to the evaluation result until the recognition precision of the anchor frame recognition model meets the preset condition.
In a specific implementation, feature extraction is performed on the anchor frame labeling data samples through a ResNet network to generate the first features; the extracted first features are fed into the Neck part, whose outputs are recorded as the second features; the second features are merged according to resolution to generate the third features; and the third features are fed into the Head part for the final inference, producing the prediction result. In the embodiment of the invention, to make the network better suited to this task, the Neck part of the target detection framework is modified: a network layer that aggregates lower-resolution features is added, and the connection between the Neck and Head parts is changed, so that the network focuses on specific feature maps and can finally detect smaller targets.
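The modified Neck itself is not published, so the following is only a conceptual sketch of "merging the second features by resolution" into a third feature before the Head:

```python
# Conceptual sketch only: the actual modified Neck of the disclosure differs.
import torch
import torch.nn.functional as F

def merge_by_resolution(neck_features):
    """Align the Neck outputs to one resolution and concatenate them."""
    target = neck_features[0].shape[-2:]  # assumed highest-resolution map
    aligned = [F.interpolate(f, size=target, mode="nearest") for f in neck_features]
    return torch.cat(aligned, dim=1)      # "third feature" fed to the Head
```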
In a further embodiment, the method further comprises:
and acquiring anchor frame data coordinates output by the anchor frame identification model, and converting the coordinates of the anchor frame data into coordinates in the form image according to a preset rule.
In a specific implementation, the categories and specific coordinates of all the required check-boxes in the form image data are inferred with the detection model and then converted into specific coordinates in the original image according to the rule: new coordinates = original coordinates × the corresponding image scale.
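Using the scale factors saved in the preprocessing sketch above, the quoted conversion rule can be sketched as:

```python
def to_original_coords(box, scale_x, scale_y):
    """Map a predicted (x1, y1, x2, y2) box back to original-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 / scale_x, y1 / scale_y, x2 / scale_x, y2 / scale_y)
```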
According to the above method embodiment, for the extraction and structuring of the key information of a form, the related tasks are decomposed, the previously studied models are improved using the two model types of recognition and detection, and the improved models are fused; the key information positions are located with a higher-precision detector, and the additional detection step performed in advance effectively improves the precision of the finally generated key text information without affecting the corresponding text recognition precision. Meanwhile, compared with related research, the model training time is greatly reduced. Finally, the template-retrieval data enhancement method effectively alleviates the false or missed detection of cross-line and cross-page texts.
It should be noted that, a certain order does not necessarily exist between the above steps, and it can be understood by those skilled in the art according to the description of the embodiments of the present invention that, in different embodiments, the above steps may have different execution orders, that is, may be executed in parallel, may also be executed interchangeably, and the like.
Another embodiment of the present invention provides an information extracting apparatus, as shown in fig. 5, the apparatus 1 includes:
the text data acquisition module 11 is configured to acquire a form image to be identified and extract text data in the form image;
the anchor frame data acquisition module 12 is configured to input the form image into the trained anchor frame recognition model to obtain anchor frame data of the form image;
the form template matching module 13 is used for matching a pre-generated offline form template library according to the anchor frame data;
and the data fusion module 14 is configured to fuse the anchor frame data and the text data based on the matching result to obtain structured form information.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, as shown in fig. 6, the text data acquisition module 11 includes:
the form image recognition unit 111 is configured to acquire a form image to be recognized, and perform text recognition on the form image based on a text recognition algorithm;
a text data obtaining unit 112, configured to obtain text data in the form image based on the text recognition result, where the text data includes text content and corresponding position coordinates.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, as shown in FIG. 7, the anchor frame data acquisition module 12 includes:
the data set obtaining unit 121 is configured to obtain an anchor frame labeling result of the form image sample, and obtain an anchor frame labeling data set based on the anchor frame labeling result;
the model training unit 122 is configured to train a target detection model based on the anchor frame labeling data set, and generate a trained anchor frame recognition model;
and the model output unit 123 is configured to input the form image into the anchor frame recognition model to obtain anchor frame data of the form image, where the anchor frame data includes an anchor frame category and corresponding position coordinates.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, as shown in FIG. 8, the form template matching module 13 includes:
the first text matching unit 131 is configured to retrieve a pre-generated offline form template library according to the position coordinates of the anchor frame, output a position corresponding to the anchor frame and corresponding specific contents of a cross-line or cross-page text, and record the position as a first text;
and a second text matching unit 132, configured to search for other texts besides the anchor box position based on the position coordinates of the anchor box data, and record the texts as a second text.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the data fusion module 14 includes:
and the data merging unit is used for merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the anchor frame data acquisition module further comprises:
the model construction unit is used for constructing a target detection model based on a YOLOv5 network in advance;
the data preprocessing unit is used for preprocessing the anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and the training unit is used for inputting the anchor frame marking data sample into the target detection model for training and generating a trained anchor frame identification model.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the anchor frame data acquisition module further comprises:
the first feature extraction unit is used for inputting the anchor frame marking data sample into a ResNet network of the target detection model for feature extraction;
the second feature extraction unit is used for inputting the extracted first features into a neck network part of the target detection model and acquiring second features output by the neck network part;
the third feature generation unit is used for merging the second features according to the resolution and then generating third features;
the prediction unit is used for inputting the third characteristic to the head network part and acquiring a prediction result output by the head network;
and the prediction result evaluation unit is used for evaluating the prediction result and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
The specific implementation is shown in the method embodiment, and is not described herein again.
In one embodiment, the apparatus further comprises:
and the coordinate conversion module is used for acquiring the category and the coordinate of the anchor frame data output by the anchor frame recognition model and converting the coordinate of the anchor frame data into the coordinate in the form image according to a preset rule.
The specific implementation is shown in the method embodiment, and is not described herein again.
Another embodiment of the present invention provides an electronic device. As shown in fig. 9, the electronic device 10 includes:
one or more processors 110 and a memory 120; one processor 110 is illustrated in fig. 9. The processor 110 and the memory 120 may be connected by a bus or other means; fig. 9 takes the bus connection as an example.
The processor 110 is used to implement the various control logic of the electronic device 10, and may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single chip, an ARM (Acorn RISC Machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. Also, the processor 110 may be any conventional processor, microprocessor, or state machine. Processor 110 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The memory 120, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions corresponding to the information extraction method in the embodiment of the present invention. The processor 110 executes various functional applications and data processing of the device 10, i.e. implements the information extraction method in the above-described method embodiments, by running non-volatile software programs, instructions and units stored in the memory 120.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to the use of the device 10, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 120 optionally includes memory located remotely from processor 110, which may be connected to device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120, which when executed by the one or more processors 110 perform the information extraction method in any of the method embodiments described above, e.g. performing the method steps S100 to S400 in fig. 1 described above.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S400 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The disclosed memory controllers or memories of the operating environments described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the information extraction method of the above-described method embodiment. For example, the method steps S100 to S400 in fig. 1 described above are performed.
The above-described embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Through the above description of the embodiments, it is clear to those skilled in the art that the embodiments may be implemented by software plus a general hardware platform, or by hardware. Based on such understanding, the technical solutions, in essence or in the part contributing to the related art, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or parts thereof.
Conditional language such as "can," "might," or "may" is generally intended to convey that a particular embodiment can include (yet other embodiments do not include) particular features, elements, and/or operations, among others, unless specifically stated otherwise or otherwise understood within the context as used. Thus, such conditional language is also generally intended to imply that features, elements and/or operations are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without input or prompting, whether these features, elements and/or operations are included or are to be performed in any particular embodiment.
What has been described herein in the specification and drawings includes examples capable of providing an information extraction method and apparatus. It will, of course, not be possible to describe every conceivable combination of components and/or methodologies for purposes of describing the various features of the present disclosure, but it can be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications may be made to the disclosure without departing from the scope or spirit thereof. In addition, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. It is intended that the examples set forth in this specification and the drawings be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (18)

1. An information extraction method, characterized in that the method comprises:
acquiring a form image to be identified, and extracting text data in the form image;
inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
matching a pre-generated offline form template library according to the anchor frame data;
and fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
2. The method according to claim 1, wherein the obtaining of the form image to be recognized and the extracting of the text data in the form image comprises:
acquiring a form image to be identified, and performing text identification on the form image based on a text identification algorithm;
and obtaining text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
3. The method according to claim 2, wherein the inputting the form image to be recognized into the trained anchor frame recognition model to obtain anchor frame data of the form image comprises:
obtaining an anchor frame marking result of the form image sample, and obtaining an anchor frame marking data set based on the anchor frame marking result;
training a target detection model based on an anchor frame labeling data set to generate a trained anchor frame identification model;
and inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
4. The method of claim 3, wherein matching a pre-generated library of offline form templates from the anchor frame data comprises:
retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, outputting the corresponding position of the anchor frame and the specific content of the corresponding line-crossing or page-crossing text, and recording as a first text;
and searching other texts except the anchor frame position based on the position coordinates of the anchor frame data, and recording the texts as second texts.
5. The method according to claim 4, wherein fusing the anchor frame data with the text data based on the matching result to obtain the structured form information comprises:
and combining the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
6. The method according to any one of claims 3-5, wherein training a target detection model based on an anchor frame labeling dataset to generate a trained anchor frame recognition model comprises:
a target detection model based on a YOLOv5 network is constructed in advance;
preprocessing the anchor frame labeling data set to generate anchor frame labeling data samples with uniform sizes;
and inputting the anchor frame marking data sample into a target detection model for training to generate a trained anchor frame identification model.
7. The method of claim 6, wherein the inputting the anchor frame labeling data samples into a target detection model for training and generating a trained anchor frame recognition model comprises:
inputting the anchor frame labeling data sample into a ResNet network of a target detection model for feature extraction;
inputting the extracted first features into a neck network part of the target detection model, and acquiring second features output by the neck network part;
merging the second features according to the resolution to generate third features;
inputting the third characteristic into a head network part to obtain a prediction result output by the head network;
and evaluating the prediction result, and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
8. The method of claim 7, further comprising:
and acquiring anchor frame data coordinates output by an anchor frame identification model, and converting the coordinates of the anchor frame data into coordinates in the form image according to a preset rule.
9. An information extraction apparatus, characterized in that the apparatus comprises:
the text data acquisition module is used for acquiring a form image to be identified and extracting text data in the form image;
the anchor frame data acquisition module is used for inputting the form image into a trained anchor frame recognition model to obtain anchor frame data of the form image;
the form template matching module is used for matching a pre-generated offline form template library according to the anchor frame data;
and the data fusion module is used for fusing the anchor frame data and the text data based on the matching result to obtain structured form information.
10. The apparatus of claim 9, wherein the text data obtaining module comprises:
the form image recognition unit is used for acquiring a form image to be recognized and performing text recognition on the form image based on a text recognition algorithm;
and the text data acquisition unit is used for obtaining text data in the form image based on a text recognition result, wherein the text data comprises text content and corresponding position coordinates.
11. The apparatus of claim 10, wherein the anchor frame data acquisition module comprises:
the data set acquisition unit is used for acquiring an anchor frame marking result of the form image sample and obtaining an anchor frame marking data set based on the anchor frame marking result;
the model training unit is used for training the target detection model based on the anchor frame marking data set to generate a trained anchor frame recognition model;
and the model output unit is used for inputting the form image into the anchor frame recognition model to obtain anchor frame data of the form image, wherein the anchor frame data comprises anchor frame categories and corresponding position coordinates.
12. The apparatus of claim 11, wherein the form template matching module comprises:
the first text matching unit is used for retrieving a pre-generated offline form template library according to the position coordinates of the anchor frame, outputting the corresponding position of the anchor frame and the specific content of the corresponding cross-line or cross-page text, and recording the corresponding position and the specific content as a first text;
and the second text matching unit is used for searching other texts except the anchor frame position based on the position coordinates of the anchor frame data and recording the texts as second texts.
13. The apparatus of claim 12, wherein the data fusion module comprises:
and the data merging unit is used for merging the anchor frame data and the text data based on the first text and the second text to obtain the structured form information.
14. The apparatus of any of claims 11-13, wherein the anchor frame data acquisition module further comprises:
the model construction unit is used for constructing a target detection model based on a YOLOv5 network in advance;
the data preprocessing unit is used for preprocessing the anchor frame marking data set to generate anchor frame marking data samples with uniform sizes;
and the training unit is used for inputting the anchor frame marking data sample into a target detection model for training to generate a trained anchor frame identification model.
15. The apparatus of claim 14, wherein the anchor frame data acquisition module further comprises:
a first feature extraction unit, configured to input the anchor frame annotation data sample into a ResNet network of a target detection model to perform feature extraction;
the second feature extraction unit is used for inputting the extracted first features into a neck network part of the target detection model and acquiring second features output by the neck network part;
the third feature generation unit is used for merging the second features according to the resolution and generating third features;
the prediction unit is used for inputting the third characteristic to the head network part and acquiring a prediction result output by the head network;
and the prediction result evaluation unit is used for evaluating the prediction result and adjusting the network parameters of the anchor frame identification model according to the evaluation result until the identification precision of the anchor frame identification model meets the preset condition.
16. The apparatus of claim 15, further comprising:
and the coordinate conversion module is used for acquiring the category and the coordinate of the anchor frame data output by the anchor frame recognition model and converting the coordinate of the anchor frame data into the coordinate in the form image according to a preset rule.
17. An electronic device, characterized in that the electronic device comprises at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of information extraction of any one of claims 1-8.
18. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the information extraction method of any one of claims 1-8.
CN202211046852.5A 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment Pending CN115223183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211046852.5A 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211046852.5A 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115223183A 2022-10-21

Family

ID=83617919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211046852.5A Pending CN115223183A (en) 2022-08-30 2022-08-30 Information extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115223183A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058689A (en) * 2023-10-09 2023-11-14 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production
CN117058689B (en) * 2023-10-09 2024-02-20 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination