CN113568965A - Method and device for extracting structured information, electronic equipment and storage medium - Google Patents

Method and device for extracting structured information, electronic equipment and storage medium

Info

Publication number
CN113568965A
Authority
CN
China
Prior art keywords
text
form picture
preset
field
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110864888.3A
Other languages
Chinese (zh)
Inventor
厉超
王巍
石明
潘仰耀
李捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202110864888.3A priority Critical patent/CN113568965A/en
Publication of CN113568965A publication Critical patent/CN113568965A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/25 Integrating or interfacing systems involving database management systems
    • G06F 16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)

Abstract

The embodiment of the invention discloses a method and a device for extracting structured information, electronic equipment and a storage medium. The method comprises: acquiring an original form picture in any format, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture; determining the text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generating field content according to the association between the text content and the text coordinates; and determining the field name matched with the field content according to a preset graph convolution model, and outputting the field name and the field content as key-value pairs to obtain an information extraction result. By adopting a graph convolution model, the method and the device realize structured extraction of form data in any format and improve the efficiency and precision of information extraction.

Description

Method and device for extracting structured information, electronic equipment and storage medium
Technical Field
The present invention relates to information extraction technologies, and in particular, to a method and an apparatus for extracting structured information, an electronic device, and a storage medium.
Background
Payment instructions differ across card-holding and non-card-holding financial institutions, and even the same institution uses payment instructions of different formats as time passes or products change.
Existing structured extraction of payment instruction information relies on template matching. Based on the text positions and contents in the instruction picture, a matching search is performed against preset form templates, and, for example, the template with the highest matching degree, or a manually specified template, is used to extract the instruction content.
Template matching requires that the formats of all payment instructions to be extracted are sorted out and that corresponding templates are configured one by one. Matching must compare the matching degree of every template, so the more payment instruction formats there are, the longer matching takes. Considerable manpower and time must also be invested in arranging and configuring templates: whenever an instruction format changes, the corresponding template must be reconfigured and a new structured extraction rule written for it, which hinders later maintenance and updating and harms the efficiency and precision of information extraction.
Disclosure of Invention
The embodiments of the invention provide a method and an apparatus for extracting structured information, an electronic device and a storage medium, so as to improve the efficiency and precision of structured information extraction.
In a first aspect, an embodiment of the present invention provides a method for extracting structured information, where the method includes:
acquiring an original form picture of a payment instruction in any format, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture;
determining the text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generating field content according to the association between the text content and the text coordinates;
and determining the field name matched with the field content according to a preset graph convolution model, and outputting the field name and the field content as key-value pairs to obtain an information extraction result of the payment instruction.
In a second aspect, an embodiment of the present invention further provides an apparatus for extracting structured information, where the apparatus includes:
the target picture obtaining module is used for acquiring an original form picture of a payment instruction in any format, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture;
the field content generation module is used for determining the text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generating field content according to the association between the text content and the text coordinates;
and the structure information extraction module is used for determining the field name matched with the field content according to a preset graph convolution model, and outputting the field name and the field content as key-value pairs to obtain an information extraction result of the payment instruction.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for extracting structured information according to any embodiment of the present invention.
In a fourth aspect, the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for extracting structured information according to any embodiment of the present invention.
In the embodiment of the invention, the form picture of a payment instruction in any format is rotated and adjusted to obtain a target form picture with a uniform orientation, text recognition is performed on the target form picture, and the field content in the picture is extracted. The field name matched with the field content is determined according to a preset graph convolution model to obtain the structured information. This solves the prior-art problem of having to configure a corresponding template for every layout of payment instruction: with the graph convolution neural network, information can be extracted from form pictures of any layout, the template configuration needed for payment instructions of multiple layouts is reduced, manpower and time are saved, and the efficiency and precision of information extraction are improved.
Drawings
Fig. 1 is a schematic flow chart of a method for extracting structured information according to a first embodiment of the present invention;
fig. 2 is a schematic flow chart of a method for extracting structured information according to a second embodiment of the present invention;
fig. 3 is a block diagram of a device for extracting structured information according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for extracting structured information in the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart illustrating a method for extracting structured information according to an embodiment of the present invention, where the method is applicable to extracting structured information in a payment instruction, and the method can be executed by an apparatus for extracting structured information. As shown in fig. 1, the method specifically includes the following steps:
and 110, acquiring an original form picture of any format payment instruction, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture.
A financial institution may issue a payment instruction to the bank to perform operations such as transferring funds, for example a pension payment instruction or a fund payment instruction. The payment instruction can be submitted as a form, and the forms of different financial institutions differ: one institution's payment instruction may contain the three fields amount, initiating account and target account, while another also contains a time field. The payment instruction information is submitted to the bank as a form, and the bank extracts the structured information from the form picture to complete the payment. When a payment instruction in any format is received, the form picture of the payment instruction is acquired as the original form picture. "Any format" means that the field order, the number of fields and so on in the form pictures submitted by different institutions may differ. The financial institution may photograph, scan or fax the form containing the payment instruction information to the bank, or upload a PDF (Portable Document Format) electronic file. After the original form picture is obtained, its angle is adjusted so that all original form pictures have a unified orientation. An angle correction model may be preset to adjust the picture angle; the angle correction model may be a ResNet (deep residual network) model. For example, the original form picture may be rotated 90 degrees to the left, 90 degrees to the right, or 180 degrees, so that the direction of the characters in the form is corrected by correcting the form angle. The original form picture after angle adjustment is the target form picture.
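As a rough illustration of this angle-correction step, the sketch below classifies a picture into one of the four orientations 0°/90°/180°/270° with a ResNet-style network and rotates it upright. The patent only states that a ResNet model is used; the specific torchvision backbone, preprocessing, class order and rotation sign convention are assumptions for illustration.

```python
# A rough sketch of the angle-correction step, assuming a 4-class ResNet
# orientation classifier. Backbone choice, preprocessing, label order and
# rotation sign are illustrative assumptions, not the patent's exact model.
import torch
import torchvision.transforms as T
from torchvision.models import resnet18
from PIL import Image

ANGLE_CLASSES = [0, 90, 180, 270]  # assumed label order: rotation needed to restore upright text

def load_angle_model(weights_path=None):
    model = resnet18(num_classes=len(ANGLE_CLASSES))
    if weights_path:  # trained weights would normally be loaded here
        model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    return model.eval()

def correct_orientation(model, image: Image.Image) -> Image.Image:
    tf = T.Compose([T.Resize((224, 224)), T.ToTensor()])
    with torch.no_grad():
        logits = model(tf(image.convert("RGB")).unsqueeze(0))
    angle = ANGLE_CLASSES[int(logits.argmax(dim=1))]
    # PIL rotates counter-clockwise; the class is assumed to encode the
    # counter-clockwise rotation needed to make the text upright.
    return image.rotate(angle, expand=True) if angle else image
```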
In this embodiment, optionally, before acquiring the original form picture of a payment instruction in any format, the method further includes: receiving a message request of at least one payment instruction, and judging whether the parameters of the received message request meet a preset parameter rule; if so, determining whether the number of message requests exceeds a preset number threshold; if not, parsing the message request to obtain the original form picture; and judging whether the picture format of the original form picture meets a preset format requirement, and if so, performing the steps of acquiring the original form picture of the payment instruction in any format and adjusting the rotation angle of the original form picture based on the preset angle correction model to obtain the target form picture.
Specifically, a message request of a payment instruction submitted by a financial institution is received. The message request parameters are obtained, and whether they conform to a preset parameter rule is judged. The parameter rule may cover the information format of the payment instruction, the type of the payment instruction, and so on; the information format of the payment instruction may be a picture-source parameter, for example image_path, image_url or image_base64, which respectively represent a local path, a download link and a base64 data stream. For example, if the information format of the payment instruction in the message request parameters is a text format while the parameter rule requires a picture format, the message request does not conform to the parameter rule. If the message request does not conform to the parameter rule, the original form picture of the payment instruction is not accepted, a "parameter error" message is returned, and the information extraction process ends directly.
If the parameters of the received message requests conform to the preset parameter rule, it is judged whether the number of received message requests exceeds a preset number threshold. For example, the threshold may be preset to five message requests, that is, the GPU (Graphics Processing Unit) processes the information extraction of at most five payment instructions at a time. If the number of currently received or pending message requests exceeds five, a "signal acquisition failure" message is returned and the process ends directly. If the number of message requests does not exceed the preset threshold, the message requests are downloaded and parsed to obtain the original form pictures of the payment instructions. After the original form picture is obtained, it can be further verified whether it meets a preset format requirement, which may concern the size of the picture and the like. For example, it may be verified whether the size of the original form picture meets a preset size requirement. If not, a "payment instruction format error" message is returned and the process ends. If so, the original form picture of the payment instruction is confirmed, and its rotation angle is adjusted according to the preset angle correction model to obtain the target form picture. The benefit of this arrangement is that, before information extraction, it is judged whether the message request of the payment instruction conforms to the preset rules, which prevents non-conforming requests from being parsed; the preset format requirement standardizes the format of the original form picture, avoids extraction errors caused by inconsistent formats, and improves the accuracy and efficiency of information extraction.
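These pre-checks can be pictured as a small gatekeeper in front of the extraction pipeline. In the sketch below, the parameter keys image_path/image_url/image_base64, the five-request limit and the error strings come from the examples in the text; the function names and the way the pending-request count is obtained are hypothetical glue code.

```python
# Gatekeeper sketch for the pre-checks described above; names and structure
# are illustrative assumptions.
import base64

MAX_PENDING_REQUESTS = 5
ALLOWED_SOURCE_KEYS = {"image_path", "image_url", "image_base64"}

def validate_request(params: dict, pending_count: int):
    """Return (ok, error_message); ok=True means the request may proceed."""
    if not ALLOWED_SOURCE_KEYS & params.keys():
        return False, "parameter error"             # picture parameter missing or wrong
    if pending_count >= MAX_PENDING_REQUESTS:
        return False, "signal acquisition failure"  # GPU queue is already full
    return True, ""

def decode_picture(params: dict) -> bytes:
    """Parse the message request into raw picture bytes (local path or base64)."""
    if "image_base64" in params:
        return base64.b64decode(params["image_base64"])
    if "image_path" in params:
        with open(params["image_path"], "rb") as f:
            return f.read()
    raise ValueError("payment instruction format error")  # e.g. unsupported source
```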
Step 120: determine the text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generate field content according to the association between the text content and the text coordinates.
A text detection and recognition model is preset for recognizing the text in the target form picture. The target form picture is input into the text detection and recognition model, which outputs the text content in the target form picture together with the text coordinates of the position of each piece of text content. The text content and text coordinates are associated one by one, and the field content is generated according to this association. That is, the field content comprises multiple pieces of text information, and each piece of text information contains text content and the corresponding text coordinates.
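A minimal sketch of this "field content" structure follows, pairing each recognized text with its box coordinates; the exact data layout is an assumption, since the patent only requires that content and coordinates stay associated.

```python
# Minimal "field content" structure: every recognized line keeps its text and
# the four-point coordinates of its box. The dataclass layout is an assumption.
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class TextItem:
    box: List[Point]  # four vertices of the text box
    text: str         # recognized content of that box

def build_field_content(boxes: List[List[Point]], texts: List[str]) -> List[TextItem]:
    """Associate every detected text box with its recognized content, one by one."""
    assert len(boxes) == len(texts)
    return [TextItem(box=b, text=t) for b, t in zip(boxes, texts)]
```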
Step 130: determine the field name matched with the field content according to a preset graph convolution model, and output the field name and the field content as key-value pairs to obtain the information extraction result of the payment instruction.
The graph convolution model is trained in advance and may be a GCNN (Graph Convolutional Neural Network) model. When the model is trained, the names of the fields to be extracted may be determined in advance; for example, if three fields to be extracted are set in advance, the model output gives the field contents corresponding to those three fields in the input form picture.
The field content, that is, the text content and the text coordinates, is input into the trained GCNN model. The GCNN model matches the text content with field names; for example, if a field name is "account", digital content in account-number form is matched to the account field. From the text coordinates, the field name corresponding to each text coordinate in the target form picture can be obtained, and the field content and field name are output as key-value pairs, with the field name as the key and the field content as the value, giving the association among the field name, the text content and the text coordinates. The structured information extraction result of the payment instruction is obtained from the key-value pairs of field names and field contents.
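The key-value assembly can be sketched as follows; the result layout (field name as key, a list of content-plus-coordinates entries as value) follows the description above, while the concrete dictionary shape is an assumption.

```python
# Assembling the extraction result as key-value pairs: the field name is the
# key, the matched content plus its coordinates is the value.
from collections import defaultdict

def to_key_value_result(items, predicted_names):
    """items: list of (text, box) pairs; predicted_names: field name per item."""
    result = defaultdict(list)
    for (text, box), name in zip(items, predicted_names):
        result[name].append({"text": text, "box": box})
    return dict(result)

# e.g. {"amount": [{"text": "1,000.00", "box": [...]}], "account": [...], ...}
```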
In this embodiment, optionally, determining a field name matched with the field content according to a preset graph convolution model includes: determining the longitudinal distance between the text boxes according to a pre-trained graph convolution model; clustering the text coordinates of the text box according to the longitudinal distance to obtain the field position in the target form picture; performing feature extraction on the text content at the field position, and determining the matching relation between the text content and the field to be extracted preset in the graph convolution model; and determining the field name matched with the field position and the text content according to the matching relation.
Specifically, the input of the GCNN graph convolution model used in this embodiment consists of two items of information: text coordinates and text content. A piece of text content may be enclosed by a text box, and the text coordinates may be the coordinates of the four vertices of the text box. On the text-coordinate side, because the payment instruction is a form whose fields are arranged longitudinally, the distance between different fields in the form can be determined from this layout: two adjacent lines of text that belong to the same field have a small longitudinal distance, whereas two lines of text that belong to two different fields have a large longitudinal distance. The longitudinal distance between text boxes is determined from the text coordinates of the text contents in the target form picture. The text boxes are then clustered according to the longitudinal distance, and text boxes whose longitudinal distance is smaller than a preset distance threshold are grouped into one cluster. The content of the text boxes in one cluster is the full text content of one field, and the position of the clustered text boxes is the field position of that field. The number of clusters is the number of fields in the target form picture.
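A possible realization of this longitudinal-distance clustering is sketched below: boxes whose vertical gap to the current cluster is at most a threshold are merged into one field region. The greedy single-pass grouping and the default threshold are assumptions; the patent only specifies clustering by longitudinal distance against a preset distance threshold.

```python
# Greedy longitudinal clustering: text boxes whose vertical gap to the current
# cluster is at most max_gap (a preset distance threshold) belong to the same
# field region. Threshold value and single-pass strategy are assumptions.
def cluster_rows(boxes, max_gap=12.0):
    """boxes: list of four-point boxes [(x, y), ...]; returns clusters as index lists."""
    spans = sorted(
        (min(p[1] for p in b), max(p[1] for p in b), i) for i, b in enumerate(boxes)
    )
    clusters = []  # each cluster: {"ids": [...], "bottom": largest y seen so far}
    for top, bottom, idx in spans:
        if clusters and top - clusters[-1]["bottom"] <= max_gap:
            clusters[-1]["ids"].append(idx)
            clusters[-1]["bottom"] = max(clusters[-1]["bottom"], bottom)
        else:
            clusters.append({"ids": [idx], "bottom": bottom})
    return [c["ids"] for c in clusters]
```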
On the text-content side, feature extraction is performed on all the text content at each field position, the extracted features are matched against the preset field names to be extracted, and the matching relation between the text content and the fields to be extracted is determined; the matching relation may be the matching probability between the text content and each field to be extracted. For example, if there are three fields to be extracted and the matching probabilities between the text content at a certain field position and the three fields are 10%, 5% and 85% respectively, the field name matching that text content is the one with the 85% probability. A Bag-of-Words model can be added to the GCNN model to represent the text content as an N-dimensional vector that describes the feature information of the payment instruction text. The benefit of this arrangement is that the graph convolution model can cluster the text coordinates so that the text content is divided into fields, the text content belonging to one field is grouped together, and the features of that group are extracted, which improves the matching precision between field content and field name and thus the precision of information extraction.
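To make the graph-convolution idea concrete, the sketch below treats each text box as a graph node, uses bag-of-words character counts as node features, and scores every node against a preset field list with a single normalized-adjacency graph-convolution layer. The field names, vocabulary handling, network size and adjacency construction are all assumptions for illustration; this is not the patent's trained GCNN.

```python
# Toy graph-convolution classifier over text-box nodes. Bag-of-words character
# counts are the node features; one normalized-adjacency graph convolution
# layer scores every node against the preset fields.
import torch
import torch.nn as nn

FIELDS = ["amount", "source_account", "target_account"]  # hypothetical field names

def bag_of_words(texts, vocab):
    """vocab: dict mapping character -> feature index."""
    feats = torch.zeros(len(texts), len(vocab))
    for i, t in enumerate(texts):
        for ch in t:
            if ch in vocab:
                feats[i, vocab[ch]] += 1.0
    return feats

class TinyGCN(nn.Module):
    def __init__(self, in_dim, hidden, num_fields):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, num_fields)

    def forward(self, x, adj):
        # adj: row-normalized N x N adjacency, e.g. built so that text boxes in
        # the same longitudinal cluster are connected.
        h = torch.relu(adj @ self.w1(x))
        return self.w2(adj @ h)  # one score per preset field, per text box

# Usage (shapes only): logits = TinyGCN(len(vocab), 64, len(FIELDS))(features, adj)
# field index: logits.argmax(dim=1); matching confidence: logits.softmax(dim=1).max(dim=1).values
```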
The GCNN model removes the need for template configuration and realizes end-to-end information extraction for multi-format payment instructions. There is no need to spend large amounts of time and labor arranging extraction templates for payment instructions of various formats, and when a payment instruction format changes, template information does not have to be reconfigured one by one nor extraction rules adjusted. Because this embodiment uses model inference instead of template matching, the information extraction time does not grow as the number of layouts grows. The structured extraction model also has a certain generalization capability and can perform general structured extraction on the same kind of multi-format voucher bills or on new bills with similar formats.
According to the technical scheme of this embodiment, the form picture of a payment instruction in any format is rotated and adjusted to obtain a target form picture with a uniform orientation, text recognition is performed on the target form picture, and the field content in the picture is extracted. The field name matched with the field content is then determined according to a preset graph convolution model to obtain the structured information. This solves the prior-art problem of having to configure a corresponding template for every layout of payment instruction: with the graph convolution neural network, information can be extracted from form pictures of any layout, the template configuration needed for payment instructions of multiple layouts is reduced, manpower and time are saved, and the efficiency and precision of information extraction are improved.
Example two
Fig. 2 is a flowchart illustrating a method for extracting structured information according to a second embodiment of the present invention, which is further optimized on the basis of the first embodiment. As shown in fig. 2, the method specifically includes the following steps:
step 210, obtaining an original form picture of any format payment instruction, and adjusting a rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture.
The original form picture in the format used by the financial institution is determined from the financial institution's payment instruction, and the preset angle correction model judges whether the original form picture is rotated. If not, no angle adjustment is needed; if so, the rotation angle of the original form picture is adjusted so that the characters in it face the forward direction, and the rotated form picture is the target form picture. If the original form picture does not need rotation adjustment, the original form picture itself is the target form picture.
In this embodiment, optionally, the adjusting the rotation angle of the original form picture based on the preset angle correction model to obtain the target form picture includes: determining whether the rotation angle of the original form picture is a preset rotation angle type or not according to a preset angle classification model; if so, adjusting the original form picture according to the rotation angle of the original form picture to obtain an initial form picture; determining the current corner coordinates of the initial form picture according to a preset corner detection algorithm; judging whether the current corner coordinates are in rectangular arrangement or not, if not, determining target corner coordinates in rectangular arrangement according to the current corner coordinates; and adjusting the initial form picture according to the transformation matrix of the current corner coordinates and the target corner coordinates to obtain a target form picture.
Specifically, the angle correction model may include an angle classification model used to classify the rotation angle of the original form picture; for example, the rotation angle may be divided into five classes: 0°, 90°, 180°, 270° and other angles. A ResNet model can be adopted as the angle classification model to detect the orientation of the picture; the model learns the character and overall-picture feature information in the four orientations 0°, 90°, 180° and 270° to determine whether the picture is rotated. If the rotation angle of the picture is none of 0°, 90°, 180° or 270°, the angle class is determined to be "other angles"; the model cannot judge the orientation of a picture at such angles, and an "instruction submission error" message can be returned to prompt the user to resubmit the payment instruction. If the picture orientation is judged to be 0°, the original form picture is correctly oriented, the original form picture is the initial form picture, and the next step can proceed. If the orientation is judged to be 90°, 180° or 270°, it is corrected to 0°. The rotation adjustment may take the center point of the picture as the rotation center and rotate the picture according to the detected angle class; the rotated picture is the initial form picture.
A corner detection algorithm is preset. After the initial form picture is obtained, the position of the largest form frame in the initial form picture is determined from the corner detection algorithm and the form information in the picture, yielding the four corner coordinates of the form as the current corner coordinates. For example, the largest form frame position can be obtained with methods such as channel filtering and opening/closing morphological operations. Channel filtering may filter the red channel of the initial form picture, for example when the picture is stamped with red seals that would interfere with determining the corner points. From the four current corner coordinates of the form, it is determined whether they are arranged as a rectangle, that is, whether the four current corner coordinates can form a rectangle.
Whether the initial form picture has perspective distortion or not can be detected by checking the arrangement of the coordinates of the corner points. The perspective distortion means that when the user uploads the original form picture of the payment instruction, the original form picture is not a standard rectangle due to the problem of the photographing angle, for example, the original form picture is a trapezoid with a short upper side and a long lower side.
If the current corner coordinates are arranged as a rectangle, it is determined that the initial form picture has no perspective distortion, the angle-adjustment preprocessing ends, and the initial form picture is the target form picture. If the current corner coordinates are not arranged as a rectangle, perspective distortion exists in the initial form picture. Target corner coordinates are then determined from the current corner coordinates; the target corner coordinates are coordinates that form a rectangular arrangement after correction. They can be determined from the slopes of the four sides of the largest form frame of the initial form picture; for example, the longest side of the quadrilateral in the initial form picture is determined from the current corner coordinates, and the target corner coordinates are obtained from the aspect ratio of the form and the length of that longest side. A transformation matrix between the current corner coordinates and the corrected target corner coordinates is computed, the initial form picture is transformed with this matrix, and the possible perspective distortion of the image is corrected to obtain the target form picture. The benefit of this arrangement is that preprocessing the image corrects possible perspective and rotation problems, which improves the effect of subsequent information extraction, reduces extraction errors, and improves the efficiency and precision of information extraction.
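This corner-based perspective correction can be sketched with OpenCV as one possible implementation: order the four detected corners, build the transformation matrix to a rectangle, and warp the picture. The corner-ordering helper and the output-size heuristic are assumptions.

```python
# Perspective correction sketch: compute the transformation matrix from the
# detected corners to a rectangle and warp the picture accordingly.
import cv2
import numpy as np

def order_corners(pts):
    """Order four points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.array(pts, dtype="float32")
    s = pts.sum(axis=1)               # x + y: smallest at top-left, largest at bottom-right
    d = np.diff(pts, axis=1).ravel()  # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[s.argmin()], pts[d.argmin()], pts[s.argmax()], pts[d.argmax()]],
                    dtype="float32")

def warp_to_rectangle(image, corners):
    tl, tr, br, bl = order_corners(corners)
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    target = np.array([[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
                      dtype="float32")
    matrix = cv2.getPerspectiveTransform(np.array([tl, tr, br, bl]), target)
    return cv2.warpPerspective(image, matrix, (width, height))
```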
Step 220: determine the text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generate field content according to the association between the text content and the text coordinates.
The characters in the target form picture are extracted to obtain the text content and text coordinates, which serve the subsequent structured extraction.
In this embodiment, optionally, determining the text content and text coordinates in the target form picture according to the preset text detection and recognition model includes: obtaining the text coordinates in the target form picture according to a preset text detection model; slicing the regions corresponding to the text coordinates out of the target form picture according to the text coordinates to obtain text box image slices; and performing character recognition on the text box image slices according to a preset character recognition model to obtain the text content corresponding to each set of text coordinates.
Specifically, the text detection model extracts the text coordinate information in the target form picture, and the characters at those coordinates are enclosed by bounding boxes to obtain text boxes. Each text box region is sliced out to obtain a text box image slice, and each slice is recognized with the character recognition model to obtain the text content in that slice.
The text detection model may be a PSENet (Progressive Scale Expansion Network) model, which detects the text coordinate information in the target form picture and stores it as four-point coordinates, that is, the four vertex coordinates of each text box. The text box region is sliced according to the coordinates of its four vertices to obtain the text box image slice; in this embodiment, the text in each text box is a single line of text. The character recognition model may be a convolutional recurrent neural network model, which performs character recognition on the text box image slices to obtain the character content of each slice. The text coordinates of each slice are assembled with its text content to generate the field content containing all text coordinates and the corresponding text content. The benefit of this arrangement is that splitting the text by slicing makes it easy to recognize each text box individually, avoids recognition errors caused by too much text at once, and improves the precision and efficiency of text recognition.
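The slice-then-recognize flow can be sketched as below, with the detector and recognizer treated as black boxes: `detect_text_boxes` and `recognize_line` are hypothetical callables standing in for a PSENet-style detector and a CRNN-style single-line recognizer.

```python
# Slice-then-recognize sketch: crop each detected text box out of the picture
# and hand it to a single-line recognizer. The model callables are hypothetical.
def slice_text_boxes(image, boxes):
    """image: H x W (x C) array; boxes: four-point polygons. Returns axis-aligned crops."""
    slices = []
    for box in boxes:
        xs = [int(p[0]) for p in box]
        ys = [int(p[1]) for p in box]
        slices.append(image[min(ys):max(ys), min(xs):max(xs)])
    return slices

def detect_and_recognize(image, detect_text_boxes, recognize_line):
    boxes = detect_text_boxes(image)                              # four-point coordinates per line
    texts = [recognize_line(s) for s in slice_text_boxes(image, boxes)]
    return list(zip(boxes, texts))                                # (coordinates, content) pairs
```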
Step 230: determine the field name matched with the field content according to a preset graph convolution model, and output the field name and the field content as key-value pairs to obtain the information extraction result of the payment instruction.
Wherein the longitudinal distance between text boxes is determined according to a pre-trained graph convolution model. And clustering the text coordinates of the text box according to the longitudinal distance to obtain the field position in the target form picture. And performing feature extraction on the text content at the field position, and determining the matching relation between the text content and the field to be extracted preset in the graph convolution model. And determining the field name matched with the field position and the text content according to the matching relation.
In this embodiment, optionally, there are at least two graph convolution models; correspondingly, determining the field name matched with the field content according to the preset graph convolution model further includes: inputting the field content into the at least two graph convolution models to obtain, from each graph convolution model, the field name matched with the field content and the matching confidence between the field content and that field name; comparing the matching confidences of any field content across the at least two graph convolution models; and determining the target field name matched with the field content according to the comparison result.
Specifically, several graph convolution models can be trained in advance; different graph convolution models can correspond to different types of payment instructions, and their fields to be extracted can differ. For example, the payment instruction types may include pension payment instructions and fund payment instructions, and different financial institutions may use different formats for the same type of payment instruction. Since the fields to be extracted from the same type of payment instruction are the same or similar, the graph convolution model corresponding to one type of payment instruction can handle multiple formats of that instruction.
When the financial institution submits the payment instruction, the type of the payment instruction can be declared in the message parameters in advance. After a message request of a payment instruction is received, it can be checked whether a declared payment instruction type exists in the request. If it does, it is judged whether the declared type is consistent with a payment instruction type of some graph convolution model; if so, the check of whether the parameters of the received message request conform to the preset parameter rule continues, followed by the subsequent processing, and after the field content is generated, the graph convolution model corresponding to the payment instruction type in the message request is selected to extract the structured information. If the declared type is not consistent with any graph convolution model's payment instruction type, a "payment instruction type error" message can be returned and the information extraction process ends. If no declared payment instruction type exists in the message request, the check of whether the parameters of the received message request conform to the preset parameter rule continues, followed by the subsequent processing.
When no declared payment instruction type exists in the message request, after the field content is generated it is input into all the graph convolution models; each graph convolution model outputs the field name matched with each field content and the matching confidence between the field content and that field name. The matching confidences of the graph convolution models for the same field content are compared, and according to the comparison result the field name with the higher matching confidence, or the field name whose matching confidence exceeds a preset confidence threshold, is taken as the target field name of that field content. For example, if model I matches a certain field content to the field name "account number" with a matching confidence of 3, while model II matches it to the field name "card number" with a matching confidence of 10, the target field name of that field content is determined to be "card number".
Each graph convolution model determines a matched field name for every field content. After every graph convolution model has determined the field name corresponding to each field content, the average of the matching confidences between the field contents and the field names determined by that model can be computed, the averages of the graph convolution models are compared, and the model whose average is higher and exceeds an average threshold is selected as the target graph convolution model; its matching result is taken as the final result. If all averages are below the threshold, a default model is used. The benefit of this arrangement is that multiple GCNN models can be integrated to extract payment instruction information of multiple types and multiple layouts, improving the flexibility and precision of information extraction.
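A sketch of this multi-model selection logic follows: run every graph convolution model, average its matching confidences, keep the model with the highest average if it clears a threshold, and otherwise fall back to a default model. The data structures and the threshold value are assumptions.

```python
# Multi-model selection sketch; data structures and threshold are assumptions.
def pick_best_model(field_items, models, default_model, avg_threshold=0.5):
    """models: dict name -> callable returning [(field_name, confidence), ...]."""
    best_name, best_avg, best_result = None, float("-inf"), None
    for name, model in models.items():
        result = model(field_items)
        avg = sum(conf for _, conf in result) / max(len(result), 1)
        if avg > best_avg:
            best_name, best_avg, best_result = name, avg, result
    if best_avg < avg_threshold:
        return "default", default_model(field_items)  # fall back to the default model
    return best_name, best_result
```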
Step 240: judge whether the text content in the key-value pairs meets a preset format check rule; if not, correct the text content.
After the structured information of the payment instruction is obtained, post-processing may be performed on it. The post-processing may check the information in the key-value pairs and judge whether the text content in each key-value pair meets a preset format check rule. The format check rule may constrain the number of characters in the text content; for example, for the text content of the key "account", the format may be specified in advance as 16 digits, and content longer than 16 digits does not satisfy the rule. If the text content in the key-value pairs meets the preset format check rule, the structured information extraction is considered complete and the extraction process ends; if not, the text content that violates the rule can be corrected. For example, a correction rule may require that an output amount carry two decimal places and thousands separators; when the recognition result is missing these or is misrecognized, the output result is formatted according to the correction rule, that is, the decimal places or separators are added to the recognized text content.
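Illustrative post-processing in the spirit of these rules is sketched below: a 16-digit check for an "account" value and amount re-formatting with two decimal places and thousands separators. The concrete rules and field names are taken from the examples in the text and are not an exhaustive rule set.

```python
# Illustrative format check and correction; rules and field names are examples only.
import re

def check_account(value: str) -> bool:
    return bool(re.fullmatch(r"\d{16}", value))

def correct_amount(value: str) -> str:
    """Strip stray characters, then re-format as e.g. 1,234.56."""
    digits = re.sub(r"[^\d.]", "", value)
    try:
        return f"{float(digits):,.2f}"
    except ValueError:
        return value  # leave unrecoverable content unchanged

# check_account("1234567890123456") -> True
# correct_amount("1234.5")          -> "1,234.50"
```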
In the embodiment of the invention, the form picture of a payment instruction in any format is rotated and adjusted to obtain a target form picture with a uniform orientation, text recognition is performed on the target form picture, and the field content in the picture is extracted. The field name matched with the field content is determined according to a preset graph convolution model to obtain the structured information, and the structured information is checked to improve its precision. This solves the prior-art problem of having to configure a corresponding template for every layout of payment instruction: with the graph convolution neural network, information can be extracted from form pictures of any layout, the template configuration needed for payment instructions of multiple layouts is reduced, and an end-to-end multi-layout payment instruction extraction flow is realized in which image correction, text detection and recognition, and structured extraction are performed automatically without manual intervention, improving the efficiency and precision of information extraction.
Example three
Fig. 3 is a block diagram of a structure of an apparatus for extracting structured information according to a third embodiment of the present invention, which is capable of executing a method for extracting structured information according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method. As shown in fig. 3, the apparatus specifically includes:
the target picture obtaining module 301 is configured to obtain an original form picture of any format payment instruction, and adjust a rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture;
a field content generating module 302, configured to determine text content and text coordinates in the target form picture according to a preset text detection and identification model, and generate field content according to an association relationship between the text content and the text coordinates;
and the structure information extraction module 303 is configured to determine a field name matched with the field content according to a preset graph convolution model, and output the field name and the field content in a key-value pair manner to obtain an information extraction result of the payment instruction.
Optionally, the apparatus further comprises:
the message parameter judgment module is used for receiving a message request of at least one payment instruction before acquiring an original form picture of any format payment instruction, and judging whether parameters of the received message request meet preset parameter rules or not;
the request quantity determining module is used for determining whether the message request quantity exceeds a preset quantity threshold value if the message request quantity exceeds the preset quantity threshold value;
the original picture obtaining module is used for analyzing the message request to obtain an original form picture if the original picture is not the form picture;
and the picture format judging module is used for judging whether the picture format of the original form picture meets the preset format requirement, if so, executing the original form picture for acquiring any format payment instruction, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture.
Optionally, the target picture obtaining module 301 includes:
the angle type determining unit is used for determining whether the rotation angle of the original form picture is a preset rotation angle type according to a preset angle classification model;
the initial picture obtaining unit is used for adjusting the original form picture according to its rotation angle to obtain an initial form picture if the rotation angle is a preset rotation angle type;
a current corner coordinate determining unit, configured to determine a current corner coordinate of the initial form picture according to a preset corner detection algorithm;
a target corner coordinate determining unit, configured to determine whether the current corner coordinates are in rectangular arrangement, and if not, determine target corner coordinates in rectangular arrangement according to the current corner coordinates;
and the target picture determining unit is used for adjusting the initial form picture according to the transformation matrix of the current corner point coordinates and the target corner point coordinates to obtain a target form picture.
Optionally, the field content generating module 302 includes:
the text coordinate obtaining unit is used for obtaining a text coordinate in the target form picture according to a preset text detection model;
the image slice obtaining unit is used for slicing the region corresponding to the text coordinates out of the target form picture according to the text coordinates to obtain a text box image slice;
and the text content obtaining unit is used for carrying out character recognition on the text box image slices according to a preset character recognition model to obtain text contents corresponding to text coordinates.
Optionally, the structure information extraction module 303 includes:
the longitudinal distance determining unit is used for determining the longitudinal distance between the text boxes according to a pre-trained graph convolution model;
a field position determining unit, configured to cluster the text coordinates of the text box according to the longitudinal distance to obtain a field position in the target form picture;
the matching relation determining unit is used for extracting the characteristics of the text content at the field position and determining the matching relation between the text content and the field to be extracted preset in the graph convolution model;
and the field name determining unit is used for determining the field name matched with the field position and the text content according to the matching relation.
In this embodiment, optionally, the apparatus further includes:
the information checking module is used for judging, after the field name and the field content are output as key-value pairs, whether the text content in the key-value pairs meets a preset format check rule;
and the information correction module is used for correcting the text content if it does not.
Optionally, the number of graph convolution models is at least two;
correspondingly, the structure information extraction module 303 is further specifically configured to:
inputting the field content into at least two graph convolution models to obtain a field name which is output by any one graph convolution model and matched with the field content and a matching confidence coefficient of the field content and the field name;
comparing the matching confidence of any field content in at least two graph convolution models;
and determining the name of the target field matched with the field content according to the comparison result.
In the embodiment of the invention, the form picture of a payment instruction in any format is rotated and adjusted to obtain a target form picture with a uniform orientation, text recognition is performed on the target form picture, and the field content in the picture is extracted. The field name matched with the field content is determined according to a preset graph convolution model to obtain the structured information. This solves the prior-art problem of having to configure a corresponding template for every layout of payment instruction: with the graph convolution neural network, information can be extracted from form pictures of any layout, the template configuration needed for payment instructions of multiple layouts is reduced, manpower and time are saved, and the efficiency and precision of information extraction are improved.
Example four
Fig. 4 is a schematic structural diagram of an apparatus for extracting structured information according to a fourth embodiment of the present invention. The structured information extraction device is an electronic device and fig. 4 shows a block diagram of an exemplary electronic device 400 suitable for use in implementing embodiments of the present invention. The electronic device 400 shown in fig. 4 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in fig. 4, electronic device 400 is embodied in the form of a general purpose computing device. The components of electronic device 400 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples the various system components (including the system memory 402 and the processing unit 401). In this embodiment, the electronic device 400 may further include a graphics processor.
Bus 403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 400 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 400 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)404 and/or cache memory 405. The electronic device 400 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. Memory 402 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 408 having a set (at least one) of program modules 407 may be stored, for example, in memory 402, such program modules 407 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 407 generally perform the functions and/or methods of the described embodiments of the invention.
The electronic device 400 may also communicate with one or more external devices 409 (e.g., keyboard, pointing device, display 410, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 400 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interface 411. Also, the electronic device 400 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 412. As shown in FIG. 4, the network adapter 412 communicates with the other modules of the electronic device 400 over the bus 403. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 401 executes various functional applications and data processing by running the program stored in the system memory 402, for example, implementing a method for extracting structured information provided by an embodiment of the present invention, including:
acquiring an original form picture of a payment instruction in any format, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture;
determining the text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generating field content according to the association between the text content and the text coordinates;
and determining the field name matched with the field content according to a preset graph convolution model, and outputting the field name and the field content as key-value pairs to obtain an information extraction result of the payment instruction.
Example five
The fifth embodiment of the present invention further provides a storage medium containing computer-executable instructions, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the method for extracting structured information provided in the fifth embodiment of the present invention is implemented, where the method includes:
acquiring an original form picture of a payment instruction in any format, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture;
determining the text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generating field content according to the association between the text content and the text coordinates;
and determining the field name matched with the field content according to a preset graph convolution model, and outputting the field name and the field content as key-value pairs to obtain an information extraction result of the payment instruction.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for extracting structured information is characterized by comprising the following steps:
acquiring an original form picture of a payment instruction in any format, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture;
determining text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generating field content according to the association between the text content and the text coordinates;
and determining a field name matched with the field content according to a preset graph convolution model, and outputting the field name and the field content in the form of a key-value pair to obtain an information extraction result of the payment instruction.
2. The method of claim 1, further comprising, prior to acquiring the original form picture of the payment instruction in any format:
receiving a message request of at least one payment instruction, and judging whether parameters of the received message request meet preset parameter rules or not;
if so, determining whether the number of the message requests exceeds a preset number threshold;
if not, parsing the message request to obtain an original form picture;
and judging whether the picture format of the original form picture meets a preset format requirement; if so, executing the step of acquiring the original form picture of the payment instruction in any format and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture.
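A minimal sketch of such a pre-check in Python. The parameter rule (required keys), the request threshold, and the allowed picture formats are assumptions chosen for illustration only:

    import base64

    ALLOWED_FORMATS = (".jpg", ".jpeg", ".png")   # assumed preset format requirement
    MAX_REQUESTS = 50                             # assumed preset number threshold

    def validate_and_parse(message_requests):
        # Reject the whole batch if the number of requests exceeds the threshold.
        if len(message_requests) > MAX_REQUESTS:
            return []
        pictures = []
        for request in message_requests:
            # Parameter rule check: required fields must be present.
            if "file_name" not in request or "file_data" not in request:
                continue
            # Parse the request to obtain the original form picture bytes.
            picture_bytes = base64.b64decode(request["file_data"])
            # Picture format check before angle correction is attempted.
            if request["file_name"].lower().endswith(ALLOWED_FORMATS):
                pictures.append(picture_bytes)
        return pictures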
3. The method of claim 1, wherein adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture comprises:
determining whether the rotation angle of the original form picture is of a preset rotation angle type according to a preset angle classification model;
if so, adjusting the original form picture according to the rotation angle of the original form picture to obtain an initial form picture;
determining the current corner coordinates of the initial form picture according to a preset corner detection algorithm;
judging whether the current corner coordinates are in a rectangular arrangement or not; if not, determining target corner coordinates in a rectangular arrangement according to the current corner coordinates;
and adjusting the initial form picture according to the transformation matrix between the current corner coordinates and the target corner coordinates to obtain a target form picture.
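A compact sketch of this rectification step using OpenCV. getPerspectiveTransform and warpPerspective are standard OpenCV calls; the corner ordering and the way the output size is derived are assumptions made for illustration, and the corner detection itself is left to the preset algorithm:

    import cv2
    import numpy as np

    def rectify(initial_form_picture, current_corners):
        # current_corners: four (x, y) points from the corner detection step,
        # assumed ordered top-left, top-right, bottom-right, bottom-left.
        src = np.float32(current_corners)
        width = int(max(np.linalg.norm(src[1] - src[0]), np.linalg.norm(src[2] - src[3])))
        height = int(max(np.linalg.norm(src[3] - src[0]), np.linalg.norm(src[2] - src[1])))
        # Target corner coordinates arranged as a rectangle.
        dst = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
        # Transformation matrix between the current and target corner coordinates.
        matrix = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(initial_form_picture, matrix, (width, height))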
4. The method of claim 1, wherein determining the text content and the text coordinates in the target form picture according to a preset text detection and recognition model comprises:
obtaining a text coordinate in the target form picture according to a preset text detection model;
according to the text coordinates, slicing the area corresponding to the text coordinates in the target form picture to obtain a text box image slice;
and according to a preset character recognition model, performing character recognition on the text box image slice to obtain the text content corresponding to the text coordinates.
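One way the slicing-then-recognition loop could look in Python; the recognizer argument is a placeholder for the preset character recognition model, and the box format (x_min, y_min, x_max, y_max) is an assumption:

    def recognize_text_boxes(target_form_picture, text_boxes, recognize_text):
        # target_form_picture: image array (H x W x C); text_boxes: list of
        # (x_min, y_min, x_max, y_max) text coordinates from the detection model.
        results = []
        for (x_min, y_min, x_max, y_max) in text_boxes:
            # Slice the area corresponding to the text coordinates.
            text_box_slice = target_form_picture[y_min:y_max, x_min:x_max]
            # Run the (placeholder) character recognition model on the slice.
            results.append(((x_min, y_min, x_max, y_max), recognize_text(text_box_slice)))
        return results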
5. The method of claim 4, wherein determining the field name matching the field content according to a preset graph convolution model comprises:
determining the longitudinal distance between the text boxes according to a pre-trained graph convolution model;
clustering the text coordinates of the text boxes according to the longitudinal distance to obtain the field position in the target form picture;
performing feature extraction on the text content at the field position, and determining a matching relation between the text content and a preset field to be extracted in the graph convolution model;
and determining the field name matched with the field position and the text content according to the matching relation.
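A toy illustration of clustering text boxes into field positions by their vertical distance. A fixed pixel threshold is used here for simplicity; in the claim the longitudinal distance comes from the pre-trained graph convolution model:

    def cluster_by_vertical_distance(text_boxes, max_gap=10):
        # text_boxes: list of (x_min, y_min, x_max, y_max); boxes whose vertical
        # centers are within max_gap pixels of the previous row are grouped together.
        rows = []
        for box in sorted(text_boxes, key=lambda b: (b[1] + b[3]) / 2):
            center = (box[1] + box[3]) / 2
            if rows and abs(center - rows[-1]["center"]) <= max_gap:
                rows[-1]["boxes"].append(box)
            else:
                rows.append({"center": center, "boxes": [box]})
        return [row["boxes"] for row in rows]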
6. The method of claim 1, further comprising, after outputting the field name and field content in the form of a key-value pair:
judging whether the text content in the key-value pair meets a preset format check rule or not;
and if not, correcting the text content.
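As a concrete (and purely illustrative) example of such a check, an amount field could be validated against a regular expression and lightly corrected when it fails:

    import re

    AMOUNT_RULE = re.compile(r"\d{1,3}(,\d{3})*(\.\d{2})?")   # assumed format check rule

    def check_and_correct_amount(text_content):
        # Keep the value unchanged if it already satisfies the rule.
        if AMOUNT_RULE.fullmatch(text_content):
            return text_content
        # Otherwise apply a simple correction: strip everything except digits and dots.
        corrected = re.sub(r"[^\d.]", "", text_content)
        return corrected or text_content

    print(check_and_correct_amount("1,000.00"))     # already valid -> "1,000.00"
    print(check_and_correct_amount("¥1,000.00元"))  # corrected -> "1000.00"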
7. The method of claim 1, wherein there are at least two graph convolution models;
correspondingly, determining a field name matched with the field content according to a preset graph convolution model further comprises:
inputting the field content into the at least two graph convolution models to obtain, from each graph convolution model, a field name matched with the field content and a matching confidence between the field content and that field name;
comparing the matching confidences obtained for any field content from the at least two graph convolution models;
and determining the target field name matched with the field content according to the comparison result.
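A small sketch of the selection step: each candidate model returns a (field name, matching confidence) pair and the highest-confidence name is kept. The two stand-in models and their fixed outputs are illustrative only:

    def best_field_name(field_content, models):
        # Each model maps field content to (field_name, matching_confidence);
        # the field name with the highest confidence across all models wins.
        candidates = [model(field_content) for model in models]
        return max(candidates, key=lambda pair: pair[1])

    model_a = lambda content: ("payee_name", 0.72)      # stand-in graph convolution model
    model_b = lambda content: ("payee_account", 0.64)   # stand-in graph convolution model

    print(best_field_name("ACME Corp", [model_a, model_b]))   # ('payee_name', 0.72)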
8. An apparatus for extracting structured information, comprising:
the target picture obtaining module is used for acquiring an original form picture of a payment instruction in any format, and adjusting the rotation angle of the original form picture based on a preset angle correction model to obtain a target form picture;
the field content generation module is used for determining text content and text coordinates in the target form picture according to a preset text detection and recognition model, and generating field content according to the association between the text content and the text coordinates;
and the structured information extraction module is used for determining a field name matched with the field content according to a preset graph convolution model, and outputting the field name and the field content in the form of a key-value pair to obtain an information extraction result of the payment instruction.
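For illustration, the three modules of claim 8 could be grouped in a single class as sketched below; the method names correct, read_fields and name_for are assumptions, not an interface defined by the patent:

    class StructuredInfoExtractor:
        def __init__(self, angle_model, ocr_model, gcn_model):
            self.angle_model = angle_model   # backs the target picture obtaining module
            self.ocr_model = ocr_model       # backs the field content generation module
            self.gcn_model = gcn_model       # backs the structured information extraction module

        def extract(self, original_form_picture):
            # Rotation correction, field content generation, field name matching.
            target = self.angle_model.correct(original_form_picture)
            field_contents = self.ocr_model.read_fields(target)
            return {self.gcn_model.name_for(c): c for c in field_contents}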
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of extracting structured information according to any one of claims 1 to 7 when executing the program.
10. A storage medium containing computer-executable instructions for performing the method of extracting structured information according to any one of claims 1 to 7 when executed by a computer processor.
CN202110864888.3A 2021-07-29 2021-07-29 Method and device for extracting structured information, electronic equipment and storage medium Pending CN113568965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110864888.3A CN113568965A (en) 2021-07-29 2021-07-29 Method and device for extracting structured information, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110864888.3A CN113568965A (en) 2021-07-29 2021-07-29 Method and device for extracting structured information, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113568965A true CN113568965A (en) 2021-10-29

Family

ID=78169130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110864888.3A Pending CN113568965A (en) 2021-07-29 2021-07-29 Method and device for extracting structured information, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113568965A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419640A (en) * 2022-02-25 2022-04-29 北京百度网讯科技有限公司 Text processing method and device, electronic equipment and storage medium
CN114419640B (en) * 2022-02-25 2023-08-11 北京百度网讯科技有限公司 Text processing method, device, electronic equipment and storage medium
CN114708603A (en) * 2022-05-25 2022-07-05 杭州咏柳科技有限公司 Method, system, device and medium for identifying key information in medical bill
CN117593752A (en) * 2024-01-18 2024-02-23 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment
CN117593752B (en) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
US10755093B2 (en) Hierarchical information extraction using document segmentation and optical character recognition correction
CN113568965A (en) Method and device for extracting structured information, electronic equipment and storage medium
US11080910B2 (en) Method and device for displaying explanation of reference numeral in patent drawing image using artificial intelligence technology based machine learning
WO2021042747A1 (en) Invoice picture recognition and verification method and system, device, and readable storage medium
US20210295114A1 (en) Method and apparatus for extracting structured data from image, and device
US20240012846A1 (en) Systems and methods for parsing log files using classification and a plurality of neural networks
EP3588376A1 (en) System and method for enrichment of ocr-extracted data
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
US20210390299A1 (en) Techniques to determine document recognition errors
CN114612921B (en) Form recognition method and device, electronic equipment and computer readable medium
US20240160616A1 (en) Text-based machine learning extraction of table data from a read-only document
US20220156490A1 (en) Method and system for extracting information from a document image
US11335108B2 (en) System and method to recognise characters from an image
CN115294593A (en) Image information extraction method and device, computer equipment and storage medium
WO2021174869A1 (en) User image data processing method, apparatus, computer device, and storage medium
CN111309850B (en) Data feature extraction method and device, terminal equipment and medium
CN113158988A (en) Financial statement processing method and device and computer readable storage medium
CN114049646A (en) Bank card identification method and device, computer equipment and storage medium
CN113011249A (en) Bill auditing method, device, equipment and storage medium
US8380690B2 (en) Automating form transcription
TWI773444B (en) Image recognition system and method
CN114820211B (en) Method, device, computer equipment and storage medium for checking and verifying quality of claim data
US20240078270A1 (en) Classifying documents using geometric information
US20240135739A1 (en) Method of classifying a document for a straight-through processing
US20240104054A1 (en) Smart content load

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination