CN113011144B - Form information acquisition method, device and server - Google Patents

Form information acquisition method, device and server

Info

Publication number
CN113011144B
Authority
CN
China
Prior art keywords
target
text data
type
image
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110339506.5A
Other languages
Chinese (zh)
Other versions
CN113011144A (en)
Inventor
李兆佳
许明
姜璐
张宝华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110339506.5A
Publication of CN113011144A
Application granted
Publication of CN113011144B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Input (AREA)

Abstract

The specification provides a method, a device and a server for acquiring form information. In the method, the topic type of a target form in a target image is determined, and the text data in the target form, the position information of the text data, and a first type of position-based dependency relationship between the text data are obtained. A preset knowledge graph is then introduced and, in combination with the topic type, a generalized nested form model of the target form is constructed; the model contains the text data in the target form together with both the first type of position-based dependency relationship and a second type of semantic-based dependency relationship between the text data. Using the generalized nested form model, the two types of dependency relationship can be used together to accurately extract, according to a target rule, the corresponding text data as target form information. The method is therefore applicable to forms with different form styles and generalizes well.

Description

Form information acquisition method, device and server
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, and a server for acquiring form information.
Background
In many data processing scenarios, it is often necessary for a staff member to extract some or all of the form information of interest in a paper form for subsequent digitized data processing.
With existing methods, an extraction model usually has to be built and trained separately, in advance, for each form style in order to extract form information from forms of that style. Once the form style changes, an additional extraction model has to be trained to extract form information from forms of the changed style. Existing methods therefore generalize relatively poorly in practice and cannot effectively extract form information from forms with different form styles.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The specification provides a method, a device and a server for acquiring form information that can be applied to forms with different form styles to extract specific form information.
The specification provides a method for acquiring form information, which comprises the following steps:
acquiring a target image; the target image comprises a target form to be processed;
determining the topic type of the target form contained in the target image;
acquiring text data in the target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data;
determining a second type of semantic-based dependency relationship between the text data according to a preset knowledge graph, the topic type, the text data in the target form and the position information of the text data, and constructing a generalized nested form model of the target form; the generalized nested form model of the target form comprises the text data, and the first type of position-based dependency relationship and the second type of semantic-based dependency relationship between the text data;
and extracting corresponding text data from the generalized nested form model of the target form according to a target rule to obtain target form information.
In one embodiment, after acquiring the target image, the method further comprises:
preprocessing the target image; wherein the preprocessing comprises at least one of: image normalization processing, tilt correction processing, and distortion repair processing.
In one embodiment, where the preprocessing includes distortion repair processing, preprocessing the target image includes:
detecting whether the target form in the target image is distorted;
determining a distortion type when the target form in the target image is determined to be distorted; wherein the distortion type includes: distortion existing in the target form itself, and distortion introduced when the target image is acquired;
when the distortion type is determined to be distortion existing in the target form itself, calling a preset distortion repair processing model to process the target image; the preset distortion repair processing model is a deep learning model comprising a DocUNet structure formed by stacking two U-Nets.
In one embodiment, determining the topic type of the target form contained in the target image includes:
processing the target image by using a SIFT algorithm to extract target image features;
and calling a preset topic classification model to process the target image features so as to determine the topic type of the target form contained in the target image.
In one embodiment, the topic type includes at least one of: real estate certificate, marriage certificate, financial statement, invoice.
In one embodiment, according to the target image, acquiring text data in a target form and position information of the text data, and determining a first type of position-based dependency relationship between the text data, including:
invoking a preset text detection model, and processing the target image according to the topic type of the target form so as to identify and determine a plurality of text image areas in the target image; wherein the text image area contains text data in the form of an image;
invoking a preset processing model, and processing a plurality of text image areas in the target image to extract text data in each text image area;
performing table structure restoration processing on the target form to determine the position information of the text data;
and determining a first type of dependency relationship based on the position between the text data according to the position information of the text data.
In one embodiment, performing table structure restoration processing on the target form to determine the position information of the text data includes:
converting the target image into a grayscale image;
dividing the target form in the target image into a combination of a plurality of rectangular units according to the grayscale image;
correcting the combination of rectangular units according to the text image areas determined in the target form to obtain a corrected combination of rectangular units;
and determining the rectangular unit where each text data is located according to the combination of the plurality of corrected rectangular units, and obtaining the position information of each text data.
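The last two steps above, locating each text in a rectangular unit and then deriving position-based dependencies, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the cell grid is assumed to be already segmented and corrected, boxes are `(left, top, right, bottom)` pixel tuples, and the function names, relation labels, and sample texts are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom), an assumed layout
Cell = Tuple[int, int]            # (row, column) index of a rectangular unit


def locate_cell(text_box: Box, cells: Dict[Cell, Box]) -> Cell:
    """Return the rectangular unit that contains the centre of a text box."""
    cx = (text_box[0] + text_box[2]) / 2
    cy = (text_box[1] + text_box[3]) / 2
    for cell, (left, top, right, bottom) in cells.items():
        if left <= cx < right and top <= cy < bottom:
            return cell
    raise ValueError("text box lies outside every rectangular unit")


def positional_dependencies(positions: Dict[str, Cell]) -> List[Tuple[str, str, str]]:
    """Link each text to the texts in the cell to its right and the cell below."""
    by_cell = {cell: text for text, cell in positions.items()}
    deps = []
    for text, (row, col) in positions.items():
        right = by_cell.get((row, col + 1))
        below = by_cell.get((row + 1, col))
        if right is not None:
            deps.append((text, "has_right_neighbor", right))
        if below is not None:
            deps.append((text, "has_lower_neighbor", below))
    return deps


# Hypothetical 2x2 grid and three detected text boxes.
cells = {(0, 0): (0, 0, 100, 40), (0, 1): (100, 0, 200, 40),
         (1, 0): (0, 40, 100, 80), (1, 1): (100, 40, 200, 80)}
boxes = {"Name": (10, 5, 60, 30), "Zhang San": (110, 8, 190, 35), "Date": (12, 45, 50, 75)}
positions = {text: locate_cell(box, cells) for text, box in boxes.items()}
deps = positional_dependencies(positions)
```

Using the box centre rather than full containment is a design choice that tolerates printed text slightly overrunning a form line, which the description notes is common in overprinted forms.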
In one embodiment, determining a second type of semantic-based dependency relationship between the text data according to a preset knowledge graph, the topic type, the text data in the target form and the position information of the text data, and constructing a generalized nested form model of the target form, includes:
determining a matched schema layer from the preset knowledge graph according to the topic type;
and constructing a corresponding metadata layer and instance layer according to the matched schema layer, the text data in the target form and the position information of the text data, so as to obtain the generalized nested form model of the target form.
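One way to picture the three layers is the sketch below. It assumes, purely for illustration, that the preset knowledge graph is a dictionary mapping each topic type to a schema of concepts and their attribute names, and that a value sits in the cell to the right of its key. The graph contents, function name, and output layout are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# Hypothetical preset knowledge graph: topic type -> schema layer (concept -> attributes).
KNOWLEDGE_GRAPH: Dict[str, Dict[str, List[str]]] = {
    "real estate certificate": {
        "Owner": ["Owner name", "ID number"],
        "Property": ["Address", "Floor area"],
    },
}


def build_form_model(topic: str, cells: Dict[Cell, str],
                     graph: Dict[str, Dict[str, List[str]]]) -> dict:
    """Build a toy nested form model with metadata and instance layers."""
    schema = graph[topic]  # schema layer matched by topic type
    attr_to_concept = {a: c for c, attrs in schema.items() for a in attrs}
    metadata, instances, semantic = {}, {}, []
    for (row, col), text in cells.items():
        if text not in attr_to_concept:
            continue
        metadata[text] = (row, col)                      # metadata layer: key and its position
        semantic.append((attr_to_concept[text], "has_attribute", text))
        value = cells.get((row, col + 1))                # assumed: value is the right neighbor
        if value is not None:
            instances[text] = value                      # instance layer: key -> concrete value
    return {"topic": topic, "metadata": metadata,
            "instances": instances, "semantic": semantic}


cells = {(0, 0): "Owner name", (0, 1): "Li Hua",
         (1, 0): "Address", (1, 1): "1 Example Road"}
model = build_form_model("real estate certificate", cells, KNOWLEDGE_GRAPH)
```

The semantic edges come from the schema rather than from cell positions, so they stay the same if a different form style moves "Address" to another row.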
In one embodiment, the target rule includes a preset custom extraction rule; wherein the custom extraction rule includes: a custom target key value of the text data to be extracted, and/or a custom extraction condition.
In one embodiment, according to a target rule, extracting corresponding text data from a generalized nested form model of the target form to obtain target form information includes:
determining whether the current user-defined extraction condition is met according to the target rule;
when the custom extraction condition is determined to be currently met, searching the text data in the generalized nested form model of the target form according to the target rule to determine the text data corresponding to the target key value as first target text data, and determining, based on the first type of position-based dependency relationship and/or the second type of semantic-based dependency relationship, the text data corresponding to the first target text data as second target text data;
and combining the first target text data and the second target text data as the target form information.
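The extraction steps above can be sketched as a small lookup over a form model. The model layout used here (a dictionary with `texts` and a list of `(source, kind, target)` dependency triples) and the function name `extract` are assumptions made for this example only.

```python
from typing import Callable, Optional


def extract(model: dict, target_key: str,
            condition: Optional[Callable[[dict], bool]] = None) -> Optional[dict]:
    """Apply a custom rule: check the condition, find the key, follow its dependencies."""
    if condition is not None and not condition(model):
        return None                      # custom extraction condition not met
    if target_key not in model["texts"]:
        return None                      # no first target text data in this form
    related = []
    for source, _kind, target in model["dependencies"]:
        # Follow both position-based and semantic-based edges from the key.
        if source == target_key and target not in related:
            related.append(target)
    return {"key": target_key, "related": related}


# Hypothetical model of a financial statement.
model = {
    "topic": "financial statement",
    "texts": ["Total assets", "1,200", "Assets"],
    "dependencies": [
        ("Total assets", "position", "1,200"),
        ("Total assets", "semantic", "Assets"),
    ],
}
hit = extract(model, "Total assets", lambda m: m["topic"] == "financial statement")
miss = extract(model, "Total assets", lambda m: m["topic"] == "invoice")
```

Here `hit` combines the first target text data (the key) with the second target text data reached through either kind of dependency, while `miss` is `None` because the condition fails.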
The specification also provides a device for acquiring form information, which comprises:
the acquisition module is used for acquiring a target image; the target image comprises a target form to be processed;
the determining module is used for determining the topic type of the target form contained in the target image;
the first processing module is used for acquiring text data in the target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data;
the second processing module is used for determining a second type of semantic-based dependency relationship between the text data according to a preset knowledge graph, the topic type, the text data in the target form and the position information of the text data, and constructing a generalized nested form model of the target form; the generalized nested form model of the target form comprises the text data, and the first type of position-based dependency relationship and the second type of semantic-based dependency relationship between the text data;
and the extraction module is used for extracting corresponding text data from the generalized nested form model of the target form according to the target rule so as to obtain target form information.
The present specification also provides a server comprising a processor and a memory for storing processor-executable instructions, which when executed by the processor implement the relevant steps of the form information acquisition method.
The present specification also provides a computer readable storage medium having stored thereon computer instructions which when executed implement the relevant steps of the form information acquisition method.
According to the method, the device and the server for acquiring form information provided by the specification, a preset knowledge graph is introduced and used, in combination with the determined topic type of the target form, to construct for the target form in the target image a generalized nested form model that contains the text data in the target form together with the first type of position-based dependency relationship and the second type of semantic-based dependency relationship between the text data. According to the generalized nested form model of the target form, the associations in both dimensions, position-based and semantic-based, are used together to accurately extract the corresponding text data as the target form information according to the target rule. The method is therefore applicable to forms with different form styles, can extract specific form information from them, and generalizes well.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure, the drawings that are required for the embodiments will be briefly described below, in which the drawings are only some of the embodiments described in the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of one embodiment of the structural composition of a system to which the form information acquisition method provided by the embodiments of the present specification is applied;
FIG. 2 is a flow chart of a method for obtaining form information according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the structural composition of a server according to one embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an apparatus for acquiring form information according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of one embodiment of a method for acquiring form information provided by embodiments of the present disclosure, in one example scenario;
FIG. 6 is a schematic diagram of an embodiment of a method for acquiring form information provided by the embodiments of the present disclosure, in one scenario example.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Existing methods often require an extraction model to be built and trained separately for each form style, and that model is then used to process forms of that style and extract the corresponding form information. With such methods, once the form style changes, a new extraction model has to be trained to process forms of the changed style. These methods generalize poorly and cannot be applied effectively to forms of different form styles.
In analyzing this problem, the present specification observes that the extraction models trained and used by conventional methods usually consider and use only the position-based dependency relationship between the text data in a form. Consequently, when the form style changes and the positions of the text data within the form change, the extraction model has to be retrained for the new position-based dependency relationship. To address this, the specification proposes introducing a preset knowledge graph and, in combination with the topic type of the form, determining from the semantic dimension a second type of semantic-based dependency relationship between the text data in the form. This is combined with the first type of position-based dependency relationship between the text data, determined from the structural dimension, to construct a generalized nested form model that contains the text data in the form together with both types of dependency relationship. The generalized nested form model then allows the relationships in both dimensions to be used together when extracting the required form information. Because extraction no longer relies excessively on the first type of position-based dependency relationship, it generalizes better and can be applied to forms of different form styles to extract the corresponding form information.
The embodiment of the specification provides a method for acquiring form information, which can be particularly applied to a system comprising a server and terminal equipment. Reference may be made in particular to fig. 1. The server and the terminal equipment can be connected in a wired or wireless mode so as to perform specific data interaction.
In this embodiment, the server may specifically include a background server applied to a service platform side and capable of implementing functions such as data transmission and data processing. Specifically, the server may be, for example, an electronic device having data operation, storage function and network interaction function. Alternatively, the server may be a software program running in the electronic device that provides support for data processing, storage, and network interactions. In the present embodiment, the number of servers is not particularly limited. The server may be one server, several servers, or a server cluster formed by several servers.
In this embodiment, the terminal device may specifically include a front-end electronic device that is applied to a user (for example, a staff member) side and is capable of implementing functions such as data acquisition and data transmission. Specifically, the terminal device may be, for example, a desktop computer, a tablet computer, a notebook computer, a smart phone, etc. Alternatively, the terminal device may be a software application capable of running on such an electronic device, for example an app running on a smart phone.
In this embodiment, after a worker receives a paper form of a financial statement (a target form) provided by a client, in order to quickly extract a few pieces of form information focused on in the financial statement, the worker may first take a photograph including the financial statement as a target image including the target form through a camera disposed in a terminal device. The terminal device may then send the acquired target image to the server in a wired or wireless manner, so as to request the server to extract the form information of interest from the target image.
Correspondingly, the server receives and acquires the target image. The server can determine that the topic type of the target form contained in the target image is a financial statement through corresponding processing.
Then, in combination with the topic type of the target form, the server can first call a preset text detection model to process the target image so as to identify and determine a plurality of text image areas in the target image. Each text image area contains text data of the target form in image form. The server may then call a preset processing model to process the plurality of text image areas in the target image, so as to identify and extract the text data in each text image area. Further, the server may determine the position information of the text data by performing table structure restoration processing on the target form, and determine a first type of position-based dependency relationship between the text data according to the position information of the text data.
Further, the server may determine a second type of semantic-based dependency relationship between the text data in the form according to the preset knowledge graph, the topic type, the text data in the form and the position information of the text data. By constructing a metadata layer and an instance layer, the server can obtain a generalized nested form model that contains the text data in the form together with the first type and second type of dependency relationships between the text data.
The server may then obtain the corresponding target rule. According to the target rule, the server can use the generalized nested form model without regard to the differences in form structure between form styles, and can use the relationships in both dimensions, the first type and the second type of dependency relationship, to extract the corresponding text data more accurately and obtain the form information of interest. The server may transmit the extracted form information to the terminal device.
Correspondingly, the terminal equipment receives and stores the form information. Meanwhile, the terminal equipment can display the extracted form information to the staff, so that the staff can provide matched business service for the client according to the form information.
The system is simultaneously suitable for the forms with a plurality of different form styles, extracts corresponding form information, and has better generalization.
Referring to FIG. 2, an embodiment of the present disclosure provides a method for acquiring form information. The method may specifically be applied on the server side. In particular implementations, the method may include the following.
S201: acquiring a target image; the target image comprises a target form to be processed.
S202: determining the topic type of the target form contained in the target image.
S203: acquiring text data in the target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data.
S204: determining a second type of semantic-based dependency relationship between the text data according to a preset knowledge graph, the topic type, the text data in the target form and the position information of the text data, and constructing a generalized nested form model of the target form; the generalized nested form model of the target form comprises the text data, and the first type of position-based dependency relationship and the second type of semantic-based dependency relationship between the text data.
S205: extracting corresponding text data from the generalized nested form model of the target form according to a target rule to obtain target form information.
Through the embodiment, the method and the device can be simultaneously and well applied to form processing of various different form styles, can efficiently and conveniently extract specific form information, and have good generalization.
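Steps S201 to S205 can be read as a pipeline in which each stage is a pluggable component. The skeleton below is only a sketch of that data flow: the class names, field names, and the idea of passing each stage as a callable are assumptions for illustration, and the lambdas in the usage example stand in for the trained models described in this specification.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class TextItem:
    text: str
    cell: Tuple[int, int]          # (row, column) from table structure restoration


@dataclass
class FormModel:
    topic: str
    items: List[TextItem]
    positional: List[Tuple[int, int]] = field(default_factory=list)  # first-type edges (item indices)
    semantic: List[Tuple[int, int]] = field(default_factory=list)    # second-type edges (item indices)


def acquire_form_info(image: Any,
                      classify_topic: Callable, read_text: Callable,
                      positional_deps: Callable, semantic_deps: Callable,
                      rule: Callable) -> Any:
    """Run S202 to S205 on an already acquired target image (S201)."""
    topic = classify_topic(image)                    # S202
    items = read_text(image, topic)                  # S203: text data and positions
    positional = positional_deps(items)              # S203: first-type dependencies
    semantic = semantic_deps(topic, items)           # S204: second-type dependencies
    model = FormModel(topic, items, positional, semantic)  # S204: nested form model
    return rule(model)                               # S205: apply the target rule


# Usage with stand-in stages (all hypothetical).
items = [TextItem("Total assets", (0, 0)), TextItem("1,200", (0, 1))]
result = acquire_form_info(
    image=None,
    classify_topic=lambda img: "financial statement",
    read_text=lambda img, topic: items,
    positional_deps=lambda its: [(0, 1)],
    semantic_deps=lambda topic, its: [(0, 1)],
    rule=lambda m: {m.items[a].text: m.items[b].text for a, b in m.semantic},
)
```

Keeping the rule as the last, separate stage mirrors the point made above: a change of form style affects the earlier stages' outputs, not the extraction rule itself.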
In one embodiment, the target image may specifically be a photograph including the target form to be processed, a screenshot including the target form to be processed, a photocopy file including the target form to be processed, and so on.
In one embodiment, the target form may be specifically understood as an overprint form to be processed. Specifically, the target form may be document data that has a true table structure or an approximate table structure and contains related text data, for example a financial statement (with a true table structure) or a real estate certificate (with an approximate table structure).
Specifically, an overprint form can be understood as a form in non-digital form (for example, a paper form) that is generated in an overprinting scenario. In an overprinting scenario, the specific content text is printed during business handling onto a pre-printed form base (possibly more than one). Forms generated in this scenario may suffer from misplaced or tilted printed text, so that the text adheres to the base content or runs over the form lines, which makes automatic recognition and extraction of the form information more difficult.
In addition, when an image containing the form is obtained by photographing or similar means, automatic recognition and extraction of the form information in the overprint form becomes more difficult still, because the photographed physical form (for example, a paper form) may be folded or curled, or the screen of the device used during capture may be affected by noise such as screen flicker and creases.
In one embodiment, after the target image is acquired, the method may further include the following when implemented: preprocessing the target image; wherein the preprocessing comprises at least one of: image normalization processing, tilt correction processing, and distortion repair processing.
Through the embodiment, the target image can be correspondingly preprocessed, so that partial data errors in the image are eliminated, and the preprocessed target image with relatively good effect and relatively high precision is obtained and used for participating in the extraction of the follow-up form information, so that the extraction precision of the follow-up form information can be improved.
In one embodiment, acquiring the target image may specifically include: taking a photograph containing a target form as the target image by a photographing device (e.g., a mobile phone, a camera, etc.); alternatively, the target form is scanned by a scanner to obtain a photocopy file containing the target form as the target image or the like. Of course, the above-listed ways of acquiring the target image are only illustrative. In specific implementation, other suitable manners may be adopted to obtain the target image according to specific application scenarios and processing requirements. The present specification is not limited to this.
In one embodiment, the image normalization process may specifically include the following: performing binarization processing on the target image by combining a global threshold value and a Niblack method; and/or denoising the target image by using a Gaussian filter; and/or performing a size scaling operation on the target image, etc., according to the specific situation and processing requirements.
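A minimal pure-Python sketch of the binarization step follows. Niblack's method sets a local threshold of mean + k × standard deviation over a window around each pixel. How the global threshold is combined with it is not stated in the text, so the rule used here (a pixel counts as ink only if it is darker than both thresholds) is an assumption, as are the function name and the default parameter values.

```python
from statistics import mean, pstdev
from typing import List


def binarize(gray: List[List[int]], window: int = 3, k: float = -0.2,
             global_threshold: int = 128) -> List[List[int]]:
    """Return 1 for ink pixels and 0 for background in a grayscale image."""
    height, width = len(gray), len(gray[0])
    half = window // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Local window, clipped at the image border.
            patch = [gray[j][i]
                     for j in range(max(0, y - half), min(height, y + half + 1))
                     for i in range(max(0, x - half), min(width, x + half + 1))]
            local_threshold = mean(patch) + k * pstdev(patch)   # Niblack threshold
            # Assumed combination: darker than both the local and the global threshold.
            if gray[y][x] < min(local_threshold, global_threshold):
                out[y][x] = 1
    return out


dark_dot = binarize([[200, 200, 200], [200, 20, 200], [200, 200, 200]])
faint_noise = binarize([[200, 200, 200], [200, 199, 200], [200, 200, 200]])
```

In the second call the centre pixel is only marginally darker than its neighbors. Niblack's local threshold alone would mark it as ink, which is the method's familiar weakness on flat background; the global threshold suppresses it.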
In one embodiment, the above-described tilt correction processing may specifically include the following: tilt angle detection and image rotation. The detected tilt angle may specifically include: a tilt angle in the lateral direction within the same plane, and a tilt angle in the vertical direction along the z-axis.
In the present embodiment, the target image may be subjected to tilt correction processing by perspective transformation. Specifically, tilt correction based on the Hough transform can detect geometric shapes in the target image by exploiting the duality between coordinate space and parameter space; in addition, a character angle detection model trained with a four-class classification algorithm can be introduced to detect which of four character orientations (0 degrees, 90 degrees, 180 degrees or 270 degrees) applies. The results of the two kinds of detection can then be combined to perform targeted tilt correction on the target image, which improves the effect of the tilt correction.
In an embodiment, in a case where the preprocessing includes distortion repair processing, the preprocessing of the target image may include the following steps when implemented: detecting whether the target form in the target image is distorted; determining a distortion type when the target form in the target image is determined to be distorted; wherein the distortion type includes: distortion existing in the target form itself, and distortion introduced when the target image is acquired; when the distortion type is determined to be distortion existing in the target form itself, calling a preset distortion repair processing model to process the target image; the preset distortion repair processing model is a deep learning model comprising a DocUNet structure formed by stacking two U-Nets.
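The combination of the two detections can be illustrated with a small sketch. It is a simplified stand-in, not the method of the specification: the fine skew is estimated with a basic Hough-style vote over candidate angles on a set of foreground points, the coarse orientation (0, 90, 180 or 270 degrees) is assumed to come from a separate classifier, and both angles are assumed to be measured in the same rotational direction. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def hough_skew(points: Iterable[Tuple[float, float]], max_angle: int = 15) -> int:
    """Estimate skew in whole degrees by voting in (angle, distance) parameter space."""
    points = list(points)
    best_angle, best_score = 0, -1
    for angle in range(-max_angle, max_angle + 1):
        t = math.radians(angle)
        # Distance of each point from the origin along the normal of a line at this angle.
        bins = Counter(round(-x * math.sin(t) + y * math.cos(t)) for x, y in points)
        # Points on the same straight line share one bin only at the true angle,
        # so the sum of squared bin counts peaks there.
        score = sum(count * count for count in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def correction_angle(coarse_orientation: int, skew: int) -> int:
    """Rotation (degrees, modulo 360) that undoes the coarse orientation plus fine skew."""
    if coarse_orientation not in (0, 90, 180, 270):
        raise ValueError("coarse orientation must be 0, 90, 180 or 270")
    return (-(coarse_orientation + skew)) % 360


# Two synthetic text baselines tilted by 5 degrees.
slope = math.tan(math.radians(5))
points = [(x, c + x * slope) for c in (10, 50) for x in range(0, 100, 5)]
skew = hough_skew(points)
rotation = correction_angle(90, skew)
```

The coarse classifier resolves the ambiguity that a line-based estimate cannot: a page rotated by 180 degrees has the same baseline angles as an upright one.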
By the embodiment, the target image can be more accurately and effectively subjected to distortion restoration processing, so that the target image with relatively good effect and relatively high precision can be obtained.
In one embodiment, before implementation, an initial model including a DocUNet structure formed by stacking two U-Nets may be constructed; the initial model is then trained by deep learning on a sample data set from the corresponding scenario to obtain the preset distortion repair processing model, which corrects and recovers distortion existing in the target form itself.
In one embodiment, in a case where the distortion type is determined to be distortion introduced when the target image is acquired, the method may further include, when implemented: processing the target image with a pre-constructed automatic corner detection algorithm based on Hough lines, so as to automatically determine the four corners of the target form in the target image, and then performing perspective correction processing based on the four corners.
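Before a perspective correction can be computed from four detected corners, the corners usually have to be put into a fixed order and a target rectangle size chosen. The sketch below shows one common heuristic for that preparatory step; it is an assumption for illustration and does not perform the corner detection or the perspective warp itself. Image coordinates are assumed to have y increasing downward, and the heuristic assumes the form is not rotated by much more than about 45 degrees.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_corners(corners: Sequence[Point]) -> List[Point]:
    """Order four corners as top-left, top-right, bottom-right, bottom-left."""
    if len(corners) != 4:
        raise ValueError("exactly four corners are required")
    top_left = min(corners, key=lambda p: p[0] + p[1])       # smallest x + y
    bottom_right = max(corners, key=lambda p: p[0] + p[1])   # largest x + y
    top_right = max(corners, key=lambda p: p[0] - p[1])      # largest x - y
    bottom_left = min(corners, key=lambda p: p[0] - p[1])    # smallest x - y
    return [top_left, top_right, bottom_right, bottom_left]


def target_size(ordered: Sequence[Point]) -> Tuple[int, int]:
    """Width and height of the upright rectangle the form will be mapped onto."""
    tl, tr, br, bl = ordered
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)


detected = [(210, 30), (20, 40), (30, 300), (220, 310)]   # hypothetical detector output
ordered = order_corners(detected)
size = target_size(ordered)
```

The ordered corners and the target size are the inputs a perspective transform routine would need to map the form onto an upright rectangle.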
In one embodiment, the target forms may be file data of different types and contents corresponding to different application scenarios. Accordingly, the topic types of the target forms may also vary.
In one embodiment, the topic type may include at least one of: real estate certificate, marriage certificate, financial statement, invoice, and the like. Of course, the topic types listed above are merely illustrative. In specific implementations, other topic types may also be included according to the application scenario and processing requirements. For example, in a banking business scenario, the topic types of the target form may further include: real estate certificates, driver's licenses, checks, and the like. The present specification is not limited in this respect.
Through this embodiment, the form information acquisition method provided in this specification can be applied widely across scenarios to process forms of various topic types.
In one embodiment, determining the topic type of the target form contained in the target image may include the following: processing the target image with the SIFT algorithm to extract target image features; and calling a preset topic classification model to process the target image features so as to determine the topic type of the target form contained in the target image.
SIFT, the scale-invariant feature transform, is used in image processing to determine key points that are invariant to scale. Accordingly, in implementation, key points can be found by processing the target image with the SIFT algorithm, and the image features extracted at those key points are used as the target image features, so that effective target image features can be found more quickly and accurately.
The preset topic classification model can be specifically understood as a pre-trained classification model for determining topic types of the forms in the image according to image features.
Specifically, the preset topic classification model may be a classification model based on a CNN structure. Correspondingly, in the specific implementation, the preset topic classification model can find the topic type with the highest similarity with the target form in the current target image from a plurality of topic types which are trained and learned before by carrying out feature matching according to the input target image features, and the topic type is used as the topic type of the target form.
Through this embodiment, the topic type of the target form in the target image can be determined more efficiently and accurately.
In one embodiment, the first type of dependency relationship may be understood as an association between different text data that is determined from position information in the form structure of the target form. The position information may be the coordinates of the text data in the form structure of the target form, or the unit number of the rectangular unit in which the text data is located in the target form.
Specifically, for example, in the target form, the unit number of the rectangular unit containing the text data "name" is 4, and the unit number of the rectangular unit containing the text data "Zhang San" is 5. From this position information, it can be determined that "name" and "Zhang San" are text data at adjacent positions, and it can therefore be judged that a first type of position-based dependency relationship exists between them, that is, the two text data are associated.
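The unit-number example above can be sketched as follows. This is a minimal illustration that assumes two units are "adjacent" when their unit numbers differ by one; the function name and the extra "date" unit are hypothetical, and the specification does not fix a particular adjacency test.

```python
def position_dependencies(cells):
    """cells maps unit number -> text; returns text pairs found in adjacent units.

    Assumption: units whose numbers differ by exactly 1 count as adjacent.
    """
    pairs = []
    for number in sorted(cells):
        if number + 1 in cells:
            pairs.append((cells[number], cells[number + 1]))
    return pairs


cells = {4: "name", 5: "Zhang San", 9: "date"}
print(position_dependencies(cells))  # [('name', 'Zhang San')]
```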
In one embodiment, obtaining the text data in the target form and the position information of the text data according to the target image, and determining the first type of position-based dependency relationship between the text data, may include the following: calling a preset text detection model to process the target image according to the topic type of the target form, so as to identify a plurality of text image areas in the target image, where each text image area contains text data in image form; calling a preset processing model to process the plurality of text image areas so as to extract the text data in each text image area; performing form structure restoration processing on the target form to determine the position information of the text data; and determining the first type of position-based dependency relationship between the text data according to the position information of the text data.
Through this embodiment, the text data in the target form in the target image can be accurately identified, and the position of the text data within the form structure of the target form can be accurately located; the first type of position-based dependency relationship between the text data can then be determined from the dimension of positional structure according to the position information of the text data.
In an embodiment, the preset text detection model may specifically be a text detection model obtained by training the PSENet model by using training sample data collected under a complex scene in advance.
The training sample data collected in complex scenes may include: sample data in which the overprinted characters are tilted to some degree, and/or sample data in which the overprinted characters touch or overlap the base-plate characters, and the like.
A text image area can be understood as an image area in the target image that contains text data clustered at a particular position of the target form. The text data in a text image area is in image form, that is, it is image data and cannot be extracted directly.
The text detection model is suited to detecting the text data of target forms in target images captured in complex and varied environments. It can detect and locate text image areas of various shapes in the target image and separate nearby but distinct text image areas, so that the text data in each text image area can subsequently be extracted more accurately.
In one embodiment, a preset processing model may be called to perform targeted image processing on each determined text image area, so as to recognize the text data in each text image area and convert it into text data in text form (also called character form), so that the text data in each text image area can be extracted precisely.
In one embodiment, the preset processing model may be a pre-trained neural network model based on a CNN+RNN+CTC architecture. CTC is a loss function that addresses the problem that, during training, the character sequence output by the network model cannot be aligned with the ground-truth character sequence, so that the trained preset processing model achieves a better processing effect.
When the plurality of text image areas in the target image are processed with the preset processing model, taking any one text image area as an example, the image features of the text data in image form can first be extracted through the CNN structure of the preset processing model; the LSTM structure is then called to process the image features and extract the character sequence of the text data in that text image area; and the text data in text form for that text image area can then be obtained from the character sequence.
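To illustrate why CTC removes the need for frame-level alignment, the sketch below shows standard greedy CTC decoding: per-frame labels are collapsed by merging consecutive repeats and then dropping the blank symbol. The blank symbol "-" and the function name are assumptions for illustration, and the specification does not state which decoding strategy the preset processing model uses.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol; real models reserve a dedicated class index


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)


# Twelve frames of per-frame predictions collapse to a four-character string.
print(ctc_greedy_decode(list("nn-aa-mm--ee")))  # name
# A blank between two identical labels keeps a genuine double letter.
print(ctc_greedy_decode(list("to-oo")))  # too
```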
Through the processing, the text data of each position area can be extracted from the target image, and the text data of different position areas can be distinguished; at the same time, the location area of each text data is also approximately located.
In one embodiment, determining the position information of the text data by performing form structure restoration processing on the target form may include the following: converting the target image into a grayscale image; dividing the target form in the target image into a combination of a plurality of rectangular units according to the grayscale image; correcting the combination of rectangular units according to the text image areas determined in the target form to obtain a corrected combination of rectangular units; and determining, according to the corrected combination, the rectangular unit in which each text data is located, so as to obtain the position information of each text data.
Through the embodiment, the form structure of the target form in the target image can be accurately and completely restored, and further, the position information of each text data in the target form can be accurately determined according to the form structure.
In one embodiment, after the grayscale image is obtained, the rectangular units that make up the form structure of the target form in the target image may be sorted out and combined based on the grayscale image. In implementation, erosion, blurring, brightness equalization, and binarization may first be applied to the grayscale image to obtain a black-and-white image with enhanced lines. Horizontal and vertical structuring elements may then be constructed based on the form structure of the target form in the black-and-white image. Erosion and dilation operations are then performed with these elements to obtain the horizontal and vertical straight lines respectively, and the intersection points between the lines are determined. Further, the rectangular units (or rectangular cells) may be reconstructed from the obtained intersection points to form a plurality of preliminarily divided rectangular units, thereby obtaining a combination of rectangular units corresponding to the form structure of the target form.
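The last step, reconstructing rectangular units from line intersections, can be sketched as follows. The sketch assumes the horizontal and vertical lines have already been extracted and that every horizontal line crosses every vertical line, which is the preliminary division that is corrected afterwards. The function name and coordinates are hypothetical.

```python
def grid_cells(row_lines, col_lines):
    """Build preliminary cells (left, top, right, bottom) from line coordinates.

    row_lines: y-coordinates of horizontal lines; col_lines: x-coordinates of
    vertical lines. Assumes a full grid with no merged cells.
    """
    ys = sorted(set(row_lines))
    xs = sorted(set(col_lines))
    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            cells.append((left, top, right, bottom))
    return cells


cells = grid_cells(row_lines=[0, 40, 80], col_lines=[0, 100, 300])
print(len(cells))  # 4
print(cells[0])    # (0, 0, 100, 40)
```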
The preliminarily obtained combination of rectangular units may still contain errors: adjacent cells that should be merged because they were split by a line between two points that does not exist in the original form, or extra cells produced by noise. Therefore, the combination of rectangular units also needs to be corrected through operations such as merging and deleting.
Specifically, considering that characters and borders easily overlap in overprinted forms (e.g., the target form) in some scenes, the positioning result output by the preset text detection model and the recognition result output by the preset processing model can be introduced as an aid during correction. Each rectangular unit in the combination is analyzed: first, whether a text block (e.g., a text image area) exists near a border is determined according to the positioning result; if so, the pixels of the text block may interfere with recognition of the border. Therefore, the image area of the text block can be cut out of the border area, and it is then judged whether the proportion of pixels in the remaining part is sufficient to form a straight line segment. If the remaining part can form a straight line segment, this border of the rectangular unit is determined to exist. Otherwise, according to the other positioning results, the rectangular unit is merged with the adjacent rectangular unit, or is deleted directly, and so on. Through the above correction, a combination of rectangular units that better conforms to the form structure of the target form can be obtained.
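The border check described above can be sketched on a one-dimensional strip of pixels sampled along a candidate border. The sketch assumes a binarized strip (1 for a dark pixel) and a hypothetical threshold of 0.8 on the proportion of dark pixels; neither the representation nor the threshold is given in the specification.

```python
def edge_is_present(edge_pixels, text_spans, min_ratio=0.8):
    """Decide whether a candidate border is a real line.

    edge_pixels: list of 0/1 values along the border (1 = dark pixel).
    text_spans: (start, end) index ranges covered by text blocks, end exclusive.
    Pixels inside text spans are cut out before the dark-pixel ratio is computed.
    """
    masked = set()
    for start, end in text_spans:
        masked.update(range(max(start, 0), min(end, len(edge_pixels))))
    remaining = [p for i, p in enumerate(edge_pixels) if i not in masked]
    if not remaining:
        return False
    return sum(remaining) / len(remaining) >= min_ratio


# A real line interrupted where a text block overlaps it: kept after masking.
print(edge_is_present([1] * 10 + [0] * 5 + [1] * 5, [(10, 15)]))  # True
# Dark pixels that come only from text strokes: no border once they are cut out.
print(edge_is_present([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], [(3, 7)]))  # False
```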
In one embodiment, it is considered that some target forms may be printed with watermark characters or the like as base-plate characters. In order to extract the required form information more accurately, the base-plate characters in the target form can be detected and filtered to eliminate their interference with form information extraction.
In one embodiment, it is also considered that an overprinted form may contain both base-plate characters and overprinted characters. The position of the base-plate characters relative to the form is generally accurate, whereas the overprinted characters are often offset relative to the form. Therefore, position restoration processing can be performed on the overprinted characters, so that the form structure of the target form can be better restored and a better combination of rectangular units can be obtained.
The position restoration algorithm is designed on the assumption that all overprinted characters have the same offset relative to the base-plate characters. Specifically, the input of the algorithm is the combination of rectangular units, together with the positioning result output by the preset text detection model and the recognition result output by the preset processing model.
In one embodiment, when position restoration is performed with the position restoration algorithm, it may first be considered that base-plate characters in the form are generally vertically centered in their rectangular units. If the distance between the center of a text image area (e.g., a text block) and the center of the rectangular unit in which it is located is smaller than a set threshold, the text data in that text image area is considered to be base-plate characters. The extracted text data can then be filled directly into the corresponding rectangular unit, and the text image area is removed from the text list W. In addition, besides the rectangular units that form its main structure, a form usually contains content such as a title at the head and content such as a date and signature at the tail; these two parts are excluded from the form during processing. The text image areas containing these two parts can be treated directly as two large rows forming the head and the tail of the form model to be generated subsequently.
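The centering test for base-plate characters can be sketched as follows, assuming boxes are given as (left, top, right, bottom) and that only vertical centers are compared. The function name, box layout, and threshold value are hypothetical.

```python
def is_base_plate_text(block_box, cell_box, threshold):
    """Return True if the text block is vertically centered in its cell.

    Boxes are (left, top, right, bottom). A block whose vertical center lies
    within `threshold` of the cell's vertical center is treated as base-plate text.
    """
    block_center = (block_box[1] + block_box[3]) / 2
    cell_center = (cell_box[1] + cell_box[3]) / 2
    return abs(block_center - cell_center) < threshold


cell = (0, 0, 100, 40)
print(is_base_plate_text((10, 12, 60, 28), cell, threshold=3))  # True
print(is_base_plate_text((10, 22, 60, 38), cell, threshold=3))  # False
```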
Then, the offset of the overprinted characters needs to be calculated. When all the overprinted characters match the rectangular units well and do not conflict with the base-plate characters, the total deviation is considered small, and the corresponding offset value is considered closest to the actual offset. For the other rectangular units, the height of the rectangular unit in the middle may be denoted MH, and ten values may be taken uniformly within the range [-MH, MH] to obtain an offset candidate set, denoted T. For each offset value $\tau \in T$ in the offset candidate set, a penalty function is defined over the text blocks.
In the penalty function, $w$ denotes any one of the text blocks (i.e., text image areas) in the text list W, $w_y$ denotes the vertical coordinate of the text block $w$, and $\mathrm{minoff}(w_y + \tau)$ denotes the vertical center distance between $w$ and the nearest row when the offset is $\tau$. A conflict term, also evaluated at $w_y + \tau$, takes a positive value when the rectangular unit closest to $w$ already contains a base-plate character, and is 0 otherwise. T is traversed over all text blocks, and the penalty function is calculated for each offset. The offset that minimizes the penalty function is the one under which filling the text blocks into the table is most appropriate, that is, the sum of the deviations is smallest.
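The offset search can be sketched as follows. The exact penalty formula is not reproduced in this text, so the sketch assumes the penalty is the sum, over all text blocks, of the distance to the nearest row plus a fixed conflict cost when that row already holds a base-plate character. The function names, the conflict cost of 100, and the row-level (rather than cell-level) conflict test are all assumptions.

```python
def candidate_offsets(mh, count=10):
    """Evenly spaced offsets in [-mh, mh] (ten by default, as in the text)."""
    step = 2 * mh / (count - 1)
    return [-mh + i * step for i in range(count)]


def penalty(tau, block_ys, row_centers, occupied_rows, conflict_cost):
    """Assumed form: sum over blocks of minoff(w_y + tau) plus a conflict term."""
    total = 0.0
    for w_y in block_ys:
        shifted = w_y + tau
        nearest = min(row_centers, key=lambda center: abs(center - shifted))
        total += abs(nearest - shifted)  # distance to the nearest row
        if nearest in occupied_rows:     # nearest row already holds base-plate text
            total += conflict_cost
    return total


def best_offset(block_ys, row_centers, occupied_rows, mh, conflict_cost=100.0):
    """Traverse the candidate set T and return the offset with minimal penalty."""
    return min(
        candidate_offsets(mh),
        key=lambda tau: penalty(tau, block_ys, row_centers, occupied_rows, conflict_cost),
    )


# Rows centered at 20, 60, 100 (row height MH = 40); the row at 100 is occupied.
tau = best_offset(block_ys=[29, 69], row_centers=[20, 60, 100],
                  occupied_rows={100}, mh=40)
print(round(tau, 2))  # -13.33
```

In this example the two text blocks sit 9 units below their rows, and the candidate closest to the true correction that avoids the occupied row is selected.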
Furthermore, the text blocks can be filled into the corresponding rectangular units according to the offset value, obtaining a restored form structure that serves as a better combination of rectangular units carrying the corresponding text image areas. From this combination of rectangular units, the position information of each text data can be determined more accurately and conveniently.
In one embodiment, determining the second type of semantic-based dependency relationship between the text data according to the preset knowledge graph, the topic type, the text data in the target form, and the position information of the text data, and constructing the generalized nested form model of the target form, may include the following: determining a matched mode layer from the preset knowledge graph according to the topic type; and constructing a corresponding metadata layer and instance layer according to the matched mode layer, the text data in the target form, and the position information of the text data, so as to obtain the generalized nested form model of the target form.
Through this embodiment, the knowledge accumulated in the preset knowledge graph can be introduced and used to find, from the semantic dimension, the second type of semantic-based dependency relationship between different text data in the target form. A complete and effective generalized nested form model of the target form can then be constructed, containing both the first type of position-based dependency relationship and the second type of semantic-based dependency relationship.
In one embodiment, the generalized nested form model of the target form is a data model that contains the text data in the target form together with the first type of position-based dependency relationship and the second type of semantic-based dependency relationship between the text data. Specifically, the generalized nested form model of the target form includes a metadata layer and an instance layer. Based on the generalized nested form model of the target form, the first type of dependency relationship and/or the second type of dependency relationship can be used selectively according to the specific situation and processing requirements, so that the method is applicable to more form styles and the required form information can be extracted accurately.
In implementation, the matched mode layer in the preset knowledge graph can be imported into the combination of rectangular units obtained through form structure restoration, which contains the text data and the first type of position-based dependency relationship between the text data, so as to obtain the generalized nested form model of the target form.
In one embodiment, the preset knowledge graph may be understood as a semantic network obtained by learning and training on a large amount of external corpus data from different fields. It has accumulated a large amount of knowledge from different fields and can reflect relatively completely the association relationships between data objects (e.g., entity objects) in those fields. Entities in the real world and their interrelationships can be formally described using the preset knowledge graph.
The logic structure of the preset knowledge graph can be divided into two layers: a data layer and a mode layer.
In the data layer, specific data in the preset knowledge graph can be represented by triples, for example, G = (E1, R, E2), where G represents the knowledge graph; E (E1, E2) represents entities in the knowledge graph; and R represents a relationship in the knowledge graph, which connects two entities and describes the association between them. Essentially, the preset knowledge graph can be understood as a semantic network that reveals the associations between entities and can formally describe entity objects and their interrelationships.
Further, the mode layer (i.e., the schema layer) above the data layer can be understood as the core of the preset knowledge graph. The mode layer stores refined knowledge. An ontology library is generally used to manage the mode layer of the preset knowledge graph. An ontology is a specification for modeling concepts; it can be understood as an abstract model describing the objective world, giving explicit definitions of concepts and the links between them in a formal manner. The most important feature of an ontology is that it is shared: the knowledge an ontology reflects is a well-defined consensus. Specifically, a domain ontology is a more specialized ontology that describes the concepts in a specific domain and the relationships between them, and provides a vocabulary of those concepts and relationships.
In this embodiment, a preset knowledge graph covering multiple fields may be introduced, and the previously determined topic type is semantically matched against the mode layers of the preset knowledge graph in the multiple fields, so that the mode layer of the knowledge graph in the corresponding field (i.e., the matched mode layer) is found and used as the metadata layer of the model, thereby providing a standardized basis for the subsequent extraction of form information.
In the instance layer, the data representation used in the data layer of the preset knowledge graph can be adopted, and each instance is represented as a triple of the form (subject, predicate, object), so as to build a large network of entity relationships. This representation allows instance information to be processed by a computer and facilitates subsequent intelligent applications.
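The triple representation of the instance layer can be sketched as follows. The predicate names `has_field` and `has_value` and the identifier `invoice_001` are hypothetical; the specification does not list the actual predicates used.

```python
from collections import namedtuple

# One instance-layer fact, in (subject, predicate, object) form.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


instances = [
    Triple("invoice_001", "has_field", "name"),
    Triple("name", "has_value", "Zhang San"),
]
print(objects_of(instances, "name", "has_value"))  # ['Zhang San']
```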
The generalized nested form model for generalized scenes in this embodiment can be divided into: header information, rectangular unit instances (cell instances), additional trailer information, and the like. The header information and the additional trailer information may fall outside the scope of the nested form model, which is concerned only with the text data inside the form; they are recognized uniformly by a general-purpose character recognition module and then packaged. Each rectangular unit instance (e.g., a cell instance) forms part of the main body of the overprinted form and may contain base-plate characters and/or overprinted characters. In the model, the position information and the text data of each cell are recorded together to form a specific cell instance.
In one embodiment, the target rule may specifically include a preset custom extraction rule; wherein, the custom extraction rule includes: and customizing the target key value of the extracted text data and/or customizing the extraction condition.
The user-defined extraction rule can be specifically an extraction rule which is found out from a plurality of existing extraction rules stored in a rule base and is matched with own requirements according to specific conditions by a user; or a brand new extraction rule which is redesigned and written by the user according to specific situations and requirements.
Through the embodiment, the user can be allowed to select to use the customized extraction rule to extract the target form information according to specific conditions and processing requirements, so that the diversified requirements of the user can be met, and the use experience of the user is improved.
In one embodiment, extracting the corresponding text data from the generalized nested form model of the target form according to the target rule to obtain the target form information may include the following: determining, according to the target rule, whether the current custom extraction condition is met; when the current custom extraction condition is determined to be met, searching the text data in the generalized nested form model of the target form according to the target rule to determine the text data corresponding to the target key value as first target text data, and determining, based on the first type of position-based dependency relationship and/or the second type of semantic-based dependency relationship, the text data corresponding to the first target text data as second target text data; and combining the first target text data and the second target text data as the target form information.
Through the embodiment, the form information meeting the user requirements can be accurately extracted according to the target rule, aiming at complex and various form information extraction scenes and based on the generalized nested form model of the target form.
In one embodiment, because the form information is extracted based on the generalized nested form model of the target form, the form information can be extracted more flexibly and accurately based on various extraction bases (for example, based on the first type of dependency, or based on the second type of dependency, or based on both the first type of dependency and the second type of dependency), and errors in extracting the form information can be reduced.
Specifically, the text data in the determined target form may be classified according to the first type of dependency relationship and/or the second type of dependency relationship, and whether the type of each text data belongs to "key" or "value" is determined; searching text data with a key type according to a target rule, and finding out text data corresponding to a target key value to serve as first text data; further, according to the first text data, the first type dependency relationship and/or the second type dependency relationship, text data with the type of value is searched to find out text data with an association relationship with the first text data and serve as second text data; and finally, combining the first text data and the second text data to obtain structured data serving as target form information.
In one embodiment, when form information is specifically extracted, a generalized nested form model of the target form may be used as input to identify and convert unstructured form data into structured data in a key-value pair (key-value) format.
Further, the specific processing can be subdivided into two serial sub-flows: a text classification flow and a rule matching flow. The text classification flow may include classifying the text data in each rectangular unit as either a key or a value. The rule matching flow may include: supported by a form-structure-oriented rule engine, taking the form content and the text classification results as input, and converting the form information of interest in the target form into structured data in key-value format according to target rules such as custom extraction rules.
Specifically, the text classification flow classifies the text data in the rectangular units based on the generalized nested form model. The rectangular unit instances in the generalized nested form model are extracted from the base-plate characters and the overprinted characters, and fall into two classes: keys and values. In implementation, the metadata layer of the generalized nested form model can be used as the reference for classification, and the rectangular unit instances can be classified efficiently and accurately by calculating the semantic similarity between the text data of each rectangular unit instance and the data in the metadata layer.
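The classification step can be sketched as follows. The specification does not say how semantic similarity is computed, so this sketch substitutes a plain string-similarity ratio from the Python standard library as a stand-in; a real system would more likely compare learned text embeddings. The metadata entries, the threshold of 0.8, and the function name are hypothetical.

```python
from difflib import SequenceMatcher


def classify_cell(text, metadata_keys, threshold=0.8):
    """Label a cell 'key' if its text closely matches a metadata-layer entry.

    String similarity is used here only as a stand-in for semantic similarity.
    """
    best = max(
        SequenceMatcher(None, text.lower(), key.lower()).ratio()
        for key in metadata_keys
    )
    return "key" if best >= threshold else "value"


metadata_keys = ["name", "date of birth", "address"]
print(classify_cell("Name", metadata_keys))       # key
print(classify_cell("Zhang San", metadata_keys))  # value
```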
The rule matching flow makes greater use of the position information of the rectangular unit instances. Using the pre-constructed form-structure-oriented rule engine, text data in the form that conforms to the target rule is matched according to the key or value class of each rectangular unit and the relationships among rectangular units, rows, and columns, thereby obtaining structured data in JSON format and extracting the relevant form information from the generalized nested form model through the target rule.
In one embodiment, when rule matching is performed, the input to the rule engine is the spreadsheet (Excel) data restored from the generalized nested form model, and the output is structured data in JSON format.
The rule engine can be understood as an inference engine. By means of the engine, rules can be matched from a rule base according to existing facts, conflicting rules are processed, and finally screened rules (namely target rules) can be executed.
Furthermore, by clustering a large number of form structures, it was found that form structures generally fall into two cases: 1) a key and its value are adjacent within a row; 2) the first row (or first column) consists entirely of keys, and the remaining rectangular units are values.
Based on the above findings, in the present embodiment, it is desirable to be able to match out rectangular unit instances conforming to rules according to key, value categories to which rectangular units, rows, and columns belong, and relationships between them, thereby obtaining structured data in JSON format.
Based on the above considerations, a form-oriented rules engine is designed and implemented. Meanwhile, the rule engine also configures and maintains a rule base to support operations such as manually adding, deleting, modifying and checking the customized extraction rules.
In this embodiment, when form information is extracted based on the rule engine, the rule definition format followed by the rule engine may include conditional extraction based on when and then. Here, when represents the precondition of the rule, and then represents the output of the rule. That is, only text data in the form that satisfies the when precondition can be matched successfully, and the result is output in the manner specified by then. The extraction rules may be stored in the rule base in JSON format, and users are supported in customizing extraction rules.
Specifically, for example, the attributes of a custom rule may be saved as the values corresponding to when and then. If the text data to be matched is located in row m, column n, and is denoted self, then top_0 represents the rectangular unit at the top of the column containing self, that is, row 0, column n. The rule may be expressed as: if the text data in the current rectangular unit is a "value" and the text data in the topmost rectangular unit is a "key", a key-value pair is obtained, in which the key is the text data in the rectangular unit instance denoted by "top_0" and the value is the text data in the current rectangular unit instance. For example, such a rule may be written as: {"when": {"top_0": "key", "self": "value"}, "then": {"key": "top_0", "value": "self"}}.
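How such a when/then rule might be evaluated can be sketched as follows, with the rule written using lowercase `when` and `then` keys as the prose describes. The sketch assumes the form has been reduced to two parallel grids, one holding cell text and one holding the key/value label of each cell, and supports only the `self` and `top_0` references. The function names and grid layout are hypothetical and are not the rule engine of the specification.

```python
import json

RULE = json.loads(
    '{"when": {"top_0": "key", "self": "value"},'
    ' "then": {"key": "top_0", "value": "self"}}'
)


def resolve(ref, row, col):
    """Map a reference name to a (row, col) position; only two names are sketched."""
    if ref == "self":
        return row, col
    if ref == "top_0":
        return 0, col
    raise ValueError(f"unsupported reference: {ref}")


def apply_rule(rule, texts, kinds):
    """Return the key-value pairs produced by every cell that satisfies `when`."""
    pairs = []
    for row in range(len(texts)):
        for col in range(len(texts[row])):
            matched = all(
                kinds[r][c] == expected
                for ref, expected in rule["when"].items()
                for r, c in [resolve(ref, row, col)]
            )
            if matched:
                key_r, key_c = resolve(rule["then"]["key"], row, col)
                val_r, val_c = resolve(rule["then"]["value"], row, col)
                pairs.append({texts[key_r][key_c]: texts[val_r][val_c]})
    return pairs


texts = [["name", "amount"], ["Zhang San", "100"]]
kinds = [["key", "key"], ["value", "value"]]
print(json.dumps(apply_rule(RULE, texts, kinds)))
# [{"name": "Zhang San"}, {"amount": "100"}]
```

This corresponds to the layout in which the first row consists entirely of keys and the remaining cells are values.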
In addition, the rule engine also supports custom attributes such as "top", "top_2", ... After the custom extraction rules have been matched by the rule engine's inference, the data information contained in the generalized nested form model can be converted into unified structured relational data.
In one embodiment, extracting form information in the above manner yields not only the text data in the generalized nested form model but also the dependency relationships, namely the key-value relationships, between different rectangular units in the form. Furthermore, the data can be uniformly packaged into rectangular unit key-value information and header and trailer note information, extracted in key-value pair format as the form information of the generalized nested form model, and output in a unified format, thereby obtaining the corresponding target form information and providing a standardized result for subsequent applications.
In one embodiment, after extracting the target form information, the method may further include: and feeding back and displaying the target form information to a user so that the user can perform corresponding data processing according to the extracted target form information concerned. For example, matching business services are provided to customers based on the target form information.
From the above, in the method for acquiring form information provided in the embodiments of the present disclosure, after acquiring a target image of a target form including form information to be extracted, a topic type of the target form in the target image may be determined first; according to the target image, text data in the target form and position information of the text data are acquired through corresponding processing, and a first type of dependence relationship based on the position between the text data is determined; further, introducing and utilizing a preset knowledge graph, and determining semantic-based second-class dependency relationships among text data in the target form by combining the topic type of the target form, the text data in the target form and the position information of the text data, and constructing a generalized nested form model of the target form, which simultaneously contains the text data in the target form, the first-class dependency relationships among the text data and the semantic-based second-class dependency relationships; and further, according to the target rule, corresponding text data can be extracted from the generalized nested form model of the target form to serve as target form information. 
In this method, a preset knowledge graph is introduced and combined with the topic type of the target form to construct, for the target form in the target image, a generalized nested form model that contains both the first type of position-based dependency relationship and the second type of semantic-based dependency relationship between the text data in the target form. Based on this model, the two relationships of different dimensions between the text data are used together, and the corresponding text data is accurately extracted as the target form information according to the target rule. The method is therefore applicable to forms of different styles and generalizes well. In addition, the method can effectively reduce error interference during form information extraction, so that target form information meeting requirements is obtained more efficiently and accurately.
The embodiment of the specification also provides a server, which comprises a processor and a memory for storing instructions executable by the processor, wherein the processor can execute the following steps according to the instructions when being implemented: acquiring a target image; the target image comprises a target form to be processed; determining the topic type of a target form contained in the target image; acquiring text data in a target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data; determining a second type of semantic-based dependency relationship between text data according to a preset knowledge graph, the topic type and text data in the target form and position information of the text data, and constructing a generalized nested form model for obtaining the target form; the generalized nested form model of the target form comprises text data, and a first type of position-based dependency relationship and a second type of semantic-based dependency relationship between the text data; and extracting corresponding text data from the generalized nested form model of the target form according to the target rule to obtain target form information.
In order to more accurately complete the above instructions, referring to fig. 3, another specific server is further provided in this embodiment of the present disclosure, where the server includes a network communication port 301, a processor 302, and a memory 303, and the above structures are connected by an internal cable, so that each structure may perform specific data interaction.
The network communication port 301 may be specifically configured to acquire a target image; the target image comprises a target form to be processed.
The processor 302 may be specifically configured to determine a topic type of a target form included in the target image; acquiring text data in a target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data; determining a second type of semantic-based dependency relationship between text data according to a preset knowledge graph, the topic type and text data in the target form and position information of the text data, and constructing a generalized nested form model for obtaining the target form; the generalized nested form model of the target form comprises text data, and a first type of position-based dependency relationship and a second type of semantic-based dependency relationship between the text data; and extracting corresponding text data from the generalized nested form model of the target form according to the target rule to obtain target form information.
The memory 303 may be used for storing a corresponding program of instructions.
In this embodiment, the network communication port 301 may be a virtual port bound to different communication protocols, so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. The network communication port may also be a physical communication interface or communication chip. For example, it may be a wireless mobile network communication chip, such as a GSM or CDMA chip; it may also be a Wi-Fi chip or a Bluetooth chip.
In this embodiment, the processor 302 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and the like. This specification is not limited in this respect.
In this embodiment, the memory 303 may include a plurality of layers, and in a digital system, the memory may be any memory as long as it can hold binary data; in an integrated circuit, a circuit with a memory function without a physical form is also called a memory, such as a RAM, a FIFO, etc.; in the system, the storage device in physical form is also called a memory, such as a memory bank, a TF card, and the like.
The embodiments of the present specification also provide a computer storage medium storing computer program instructions that when executed implement a method for acquiring form information described above: acquiring a target image; the target image comprises a target form to be processed; determining the topic type of a target form contained in the target image; acquiring text data in a target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data; determining a second type of semantic-based dependency relationship between text data according to a preset knowledge graph, the topic type and text data in the target form and position information of the text data, and constructing a generalized nested form model for obtaining the target form; the generalized nested form model of the target form comprises text data, and a first type of position-based dependency relationship and a second type of semantic-based dependency relationship between the text data; and extracting corresponding text data from the generalized nested form model of the target form according to the target rule to obtain target form information.
In the present embodiment, the storage medium includes, but is not limited to, a random access Memory (Random Access Memory, RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects achieved by the program instructions stored in the computer storage medium may be understood with reference to the other embodiments and are not described again here.
Referring to fig. 4, on a software level, the embodiment of the present disclosure further provides a form information acquiring apparatus, where the apparatus may specifically include the following structural modules.
The acquiring module 401 may be specifically configured to acquire a target image; the target image comprises a target form to be processed;
the determining module 402 may be specifically configured to determine a topic type of a target form included in the target image;
the first processing module 403 may be specifically configured to obtain, according to the target image, text data in a target form and location information of the text data, and determine a first type of location-based dependency relationship between the text data;
the second processing module 404 may be specifically configured to determine a second type of semantic-based dependency relationship between text data according to a preset knowledge graph, the topic type, text data in the target form and position information of the text data, and construct a generalized nested form model of the target form; the generalized nested form model of the target form comprises text data, and a first type of position-based dependency relationship and a second type of semantic-based dependency relationship between the text data;
the extracting module 405 may be specifically configured to extract corresponding text data from the generalized nested form model of the target form according to a target rule, so as to obtain target form information.
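As an illustrative sketch only, the division into modules 401 to 405 may be pictured as a chain of stages that each receive and return a shared context object. The stage bodies below are placeholders that return fixed sample data; the names `FormContext`, `run_pipeline`, and the stage functions are hypothetical, and a real implementation would call the corresponding models instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]  # (row, column)


@dataclass
class FormContext:
    image: bytes = b""
    topic_type: str = ""
    texts: List[Tuple[str, Position]] = field(default_factory=list)
    model: Dict = field(default_factory=dict)
    form_info: Dict[str, str] = field(default_factory=dict)


def acquiring_module(ctx: FormContext) -> FormContext:
    # 401: placeholder for reading the uploaded target image.
    ctx.image = b"<image bytes>"
    return ctx


def determining_module(ctx: FormContext) -> FormContext:
    # 402: placeholder for the topic classification model.
    ctx.topic_type = "invoice"
    return ctx


def first_processing_module(ctx: FormContext) -> FormContext:
    # 403: placeholder for text detection and recognition.
    ctx.texts = [("Amount", (0, 0)), ("100", (1, 0))]
    return ctx


def second_processing_module(ctx: FormContext) -> FormContext:
    # 404: placeholder model holding the text data and both relationship types.
    ctx.model = {
        "topic": ctx.topic_type,
        "cells": {pos: text for text, pos in ctx.texts},
        "position_deps": [((1, 0), (0, 0))],  # (value cell, cell above it)
        "semantic_deps": [],  # would be derived from the knowledge graph
    }
    return ctx


def extracting_module(ctx: FormContext) -> FormContext:
    # 405: follow the position-based dependencies to form key-value pairs.
    cells = ctx.model["cells"]
    ctx.form_info = {cells[k]: cells[v] for v, k in ctx.model["position_deps"]}
    return ctx


def run_pipeline(ctx: FormContext,
                 stages: List[Callable[[FormContext], FormContext]]) -> FormContext:
    """Run the stages in order, passing the context from one to the next."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx


RESULT = run_pipeline(FormContext(), [
    acquiring_module,
    determining_module,
    first_processing_module,
    second_processing_module,
    extracting_module,
])
```

Keeping each module as an independent stage mirrors the functional division described above and allows a stage to be replaced without changing the others.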
It should be noted that, the units, devices, or modules described in the above embodiments may be implemented by a computer chip or entity, or may be implemented by a product having a certain function. For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, when the present description is implemented, the functions of each module may be implemented in the same piece or pieces of software and/or hardware, or a module that implements the same function may be implemented by a plurality of sub-modules or a combination of sub-units, or the like. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
From the above, the form information acquiring device provided in the embodiment of the present disclosure may be suitable for various forms with different form styles, and perform corresponding form information extraction, so that the device has better generalization.
In a specific scenario example, the method provided in the present disclosure may be applied to construct a data processing system capable of identifying and extracting form information, and further the data processing system may be used to identify and extract corresponding form information. The specific implementation process can be referred to as follows.
In this scenario example, a data processing system capable of recognizing and extracting form information may be constructed in combination with the form information acquisition method provided in this specification. Referring specifically to FIG. 5, the data processing system may include: an external user interface, a form picture import interface, a data persistence layer, a table format export interface, a custom rule modification interface, and a form recognition and information extraction apparatus.
The external user interface provides a visual operation page for the form recognition and information extraction apparatus, through which an external user can operate intuitively and conveniently, for example by importing various form pictures (e.g., target images), viewing form information extraction results (e.g., target form information), and adding, deleting, and querying custom rules.
The form picture import interface is used for receiving various form import requests from an external user interface, supporting batch uploading of user pictures, and effectively supporting batch unified identification and information extraction of historical form data.
The data persistence layer, as the persistent storage part of the system, mainly stores four parts of data, including the externally uploaded original form pictures, the external knowledge graph (such as the preset knowledge graph) referenced by form topic, and the custom rule base (including the target rules) maintained by the rule engine. The underlying storage is deployed using an unstructured database (HDFS).
The table format export interface is used to output, in key-value pair format, the relevant information (for example, the form information) of a form in the generalized overprinting scenario after the form has been parsed by the functional modules. The interface can also output the specific extracted information in a standardized manner, including the header and footer note information and the individual key-value pairs of the table, and can restore them into an electronic form for visual output.
The custom rule modification interface is used to receive the add, delete, and query operations performed by an external user on the custom rule base used by the rule engine, and to synchronize these operations to the data persistence layer in real time.
Referring to fig. 6, the form recognition and information extraction apparatus mainly includes the following parts: a nested form picture acquisition module, an image preprocessing module, a text positioning and recognition module, an adaptive table restoration module, a generalized nested form model, an information extraction module, and nested form formatting output.
The nested form picture acquisition module serves as the data input interface. Overprinted forms in the generalized scenario are input mainly in three ways: photographing and uploading with a mobile phone, access through a scanner, and direct upload. The module also supports batch uploading of forms, which facilitates the structured output of historical form data.
The image preprocessing module is specifically configured to perform unified preprocessing on an input original form image (for example, a target image including a target form), mainly including image standardization, tilt correction, distortion restoration, and form feature matching, so as to provide clear, uniformly sized form pictures and form topics to the subsequent modules. In a specific implementation, the module combines traditional image processing methods with deep learning methods to preprocess pictures uniformly, so that the diverse forms in a generalized scenario can be preprocessed effectively and comprehensively.
The text positioning and recognition module may specifically include: text localization and general text (character) recognition based on the PSENet model. The module can be used to locate each text region in the overprinted form and recognize the corresponding character information. With this module, after text positioning and recognition, the text content (e.g., text data) in the overprinted form can be extracted and the text regions (e.g., text image regions) of different cells (e.g., rectangular units) can be distinguished; the positioning coordinates of each text region and the extracted character information can then be output as input to subsequent modules.
The adaptive table restoration module may specifically include: table structure restoration, text position restoration, and construction of the generalized nested form model. The module takes the result of the text positioning and recognition module as an aid and uses a position restoration algorithm to restore the table structure and the positions of the text characters (so as to determine the first type of position-based dependency relationship between the text). Table structure restoration further includes: basic cell acquisition, cell cleaning, and cell merging. Text position restoration further includes: bottom-plate text extraction, form region classification, and registration offset calculation. Finally, the module constructs the generalized nested form model as its normalized output. To this end, an external knowledge graph (for example, the preset knowledge graph) is introduced: the form topic obtained by the image preprocessing module is used to retrieve the corresponding metadata from the knowledge graph, which serves as the metadata layer of the generalized nested form model. This metadata layer is combined with the cell instances of the restored form model (including the document header and footer, which are restored separately from the first and last rows), and the module outputs the final generalized nested form model.
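As a simplified illustration of how row and column indices might be derived from text-region coordinates, the following sketch groups box centers that lie within a fixed tolerance of one another. It is a stand-in under stated assumptions (axis-aligned boxes, a hand-chosen tolerance `tol`) and is not the position restoration algorithm of this specification, which additionally involves cell cleaning, cell merging, and offset calculation. The names `cluster`, `nearest`, and `to_grid` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[str, float, float, float, float]  # (text, x, y, width, height)


def cluster(values: List[float], tol: float) -> List[float]:
    """Group sorted coordinates whose gap is within `tol`; return group means."""
    centers: List[float] = []
    group: List[float] = []
    for v in sorted(values):
        if group and v - group[-1] > tol:
            centers.append(sum(group) / len(group))
            group = []
        group.append(v)
    if group:
        centers.append(sum(group) / len(group))
    return centers


def nearest(centers: List[float], v: float) -> int:
    """Index of the center closest to `v`."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))


def to_grid(boxes: List[Box], tol: float = 10.0) -> Dict[Tuple[int, int], str]:
    """Map each text box to a (row, column) index based on its center point."""
    x_centers = cluster([x + w / 2 for _, x, _, w, _ in boxes], tol)
    y_centers = cluster([y + h / 2 for _, _, y, _, h in boxes], tol)
    return {
        (nearest(y_centers, y + h / 2), nearest(x_centers, x + w / 2)): text
        for text, x, y, w, h in boxes
    }


# Hypothetical detection results with slightly misaligned boxes.
BOXES = [
    ("Name", 10, 10, 80, 20),
    ("Amount", 110, 12, 80, 20),
    ("Li", 12, 50, 80, 20),
    ("100", 108, 49, 80, 20),
]

GRID = to_grid(BOXES)
```

Once every text region has a (row, column) index, a position-based relationship such as "the unit at the top of the same column" reduces to a lookup of row 0 in that column.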
The generalized nested form model is the model referred to in the adaptive table restoration module. It is proposed to make form information extraction accurate and standardized, and is obtained by importing the corresponding external open knowledge graph according to the current business topic. The model can be divided into a metadata layer and an instance layer (cell instances, and header and footer note information).
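One possible in-memory representation of the two layers, together with the two types of dependency relationship, is sketched below. The class and field names (`MetadataLayer`, `InstanceLayer`, `NestedFormModel`, `depends_on`) are assumptions made for illustration and are not defined in this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a cell in the restored table


@dataclass
class MetadataLayer:
    topic_type: str  # e.g. "invoice"
    concepts: List[str] = field(default_factory=list)  # terms from the knowledge graph


@dataclass
class InstanceLayer:
    cells: Dict[Position, str] = field(default_factory=dict)  # cell instances
    header: List[str] = field(default_factory=list)  # header note text
    footer: List[str] = field(default_factory=list)  # footer note text


@dataclass
class NestedFormModel:
    metadata: MetadataLayer
    instances: InstanceLayer
    # Each dependency is (dependent cell, cell it depends on).
    position_deps: List[Tuple[Position, Position]] = field(default_factory=list)
    semantic_deps: List[Tuple[Position, Position]] = field(default_factory=list)

    def depends_on(self, pos: Position) -> List[str]:
        """Texts of all cells that `pos` depends on, by either relationship."""
        targets = [t for s, t in self.position_deps + self.semantic_deps if s == pos]
        return [self.instances.cells[t] for t in targets]


MODEL = NestedFormModel(
    metadata=MetadataLayer("invoice", ["amount"]),
    instances=InstanceLayer(
        cells={(0, 0): "Amount", (1, 0): "100"},
        header=["Invoice"],
    ),
    position_deps=[((1, 0), (0, 0))],
)
```

Storing both relationship lists alongside the cell instances lets a later extraction step follow either kind of dependency from the same object.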
The information extraction module may specifically include: text classification based on semantic similarity and table matching based on a custom rule engine. The module takes the generalized nested form model as its input and recognizes and converts unstructured form data into structured data in key-value pair format. The module can be subdivided into two serial sub-modules: a text classification module and a table rule matching module. The text classification module classifies the text content in each cell as a "key" or a "value". The rule matching module is supported by a rule engine oriented to the table structure; it takes the table content and the text classification result as input and converts the electronic table into structured data in key-value pair format according to the custom rules.
The nested form formatting output serves as the final output module of the apparatus. Based on this module, the text information of the generalized nested form is finally obtained, and the dependency relationships between form cells, namely the key-value relationships, are restored as well. The module also uniformly packages the cell key-value information and the header and footer note information for output, taking the key-value pair format as the unified format for generalized nested form information extraction, thereby providing standardized results for subsequent applications.
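The key/value classification step may be illustrated with the following sketch. Note that it substitutes a character-level string similarity (Python's `difflib.SequenceMatcher`) for the semantic similarity described above, purely so that the example is self-contained; the concept list, the threshold of 0.8, and the function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(text: str, concepts: List[str], threshold: float = 0.8) -> str:
    """Label `text` as "key" if it is close to any metadata concept, else "value"."""
    best = max((similarity(text, c) for c in concepts), default=0.0)
    return "key" if best >= threshold else "value"


# Hypothetical concept names taken from the metadata layer.
CONCEPTS = ["invoice number", "amount", "date"]

LABELS = {text: classify(text, CONCEPTS) for text in ["Amount", "Amounts", "2021-03-30"]}
```

In this sketch, cell text that closely matches a concept name from the metadata layer is treated as a key, and all other text is treated as a value; the resulting labels are what a when/then rule would test against.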
The scenario example shows that the method provided in this specification solves the problem that existing overprinted-form recognition techniques depend excessively on the template features of the form. By exploiting the fact that forms in the same business scenario vary in style, introducing an open knowledge graph of the relevant business scenario, designing and maintaining a custom rule engine, and constructing and using the generalized nested form model, a form recognition and information extraction apparatus based on the generalized nested form model is obtained. The apparatus recognizes generalized overprinted forms and outputs structured information, improving the universality and practicability of form recognition and information extraction technology.
In connection with the application of a specific scenario example, the application performance parameter comparison list shown in table 1 can be obtained.
Table 1 application performance parameter comparison list
Compared with the prior art, the form recognition and information extraction apparatus based on the generalized nested form model, built with the method provided in this specification, can effectively improve the universality, adaptability, and semantic richness of the character recognition technology while maintaining recognition accuracy. These characteristics appear in the following respects. In terms of the form classification mechanism, the image preprocessing module in this specification does not rely on a fixed form template for classification; a neural network model is used for topic classification, which improves the coverage and flexibility of form types. In terms of the table restoration mechanism, text position and character information are introduced into the adaptive table restoration module as an aid, a penalty function is designed to achieve accurate offset calculation, and an adaptive table restoration algorithm is provided, so that bottom-plate characters and printed characters are separated automatically in an overprinting scenario and their positions are restored. In terms of the information extraction mechanism, the external knowledge graph introduced in the adaptive table restoration module serves as the metadata basis for information extraction, and semantic analysis and similarity calculation are performed during text classification, which effectively ensures the accuracy of text classification. Meanwhile, a rule-engine text matching method based on the table structure is designed and implemented; the engine rules are simple and flexible to define and correspond semantically to the table structure, and custom rules can cover most table structures, so that the complete restored information of the unstructured form is output in key-value pair format.
In this scenario example, in order to extract and express form information accurately and in a standard manner, a generalized nested form model is designed and constructed. The schema layer of an external domain knowledge graph is extracted according to the form topic to serve as the metadata layer, and the cell instances, header information, and the like of the form serve as instance information. Various overprinted forms are thereby structured and expressed uniformly, their semantic features are enriched, and the accuracy of subsequent information recognition and extraction is effectively ensured. In addition, in this scenario example, the final output of the apparatus is in key-value pair format. In an overprinting scenario, the dependency relationship between bottom-plate characters and printed characters is also key to information extraction. Output in key-value pair format not only restores the text information of the form but also gives the dependency relationships and structural information present in the form, associates and combines the text information in the generalized scenario, and fully represents and restores the picture information of the unstructured form in a structured way, thereby effectively improving the practicability and completeness of the apparatus.
Although the present description provides method operational steps as described in the examples or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. When implemented by an apparatus or client product in practice, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, it is not excluded that additional identical or equivalent elements may be present in a process, method, article, or apparatus that comprises a described element. The terms first, second, etc. are used to denote a name, but not any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller can be regarded as a hardware component, and means for implementing various functions included therein can also be regarded as a structure within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of embodiments, it will be apparent to those skilled in the art that the present description may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be embodied essentially in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments of the present specification.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. The specification is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Although the present specification has been described by way of example, it will be appreciated by those skilled in the art that there are many variations and modifications to the specification without departing from the spirit of the specification, and it is intended that the appended claims encompass such variations and modifications as do not depart from the spirit of the specification.

Claims (11)

1. A method for acquiring form information, characterized by comprising:
acquiring a target image; the target image comprises a target form to be processed;
determining the topic type of a target form contained in the target image;
acquiring text data in a target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data;
determining a second type of semantic-based dependency relationship between text data according to a preset knowledge graph, the topic type and text data in the target form and position information of the text data, and constructing a generalized nested form model for obtaining the target form; the generalized nested form model of the target form comprises text data, and a first type of position-based dependency relationship and a second type of semantic-based dependency relationship between the text data;
Extracting corresponding text data from the generalized nested form model of the target form according to a target rule to obtain target form information; the target rule comprises a preset custom extraction rule; wherein, the custom extraction rule includes: customizing a target key value of the extracted text data and/or customizing an extraction condition;
wherein extracting the corresponding text data from the generalized nested form model of the target form according to the target rule to obtain the target form information comprises: determining, according to the target rule, whether the current custom extraction condition is met; in the case where it is determined that the current custom extraction condition is met, searching the text data in the generalized nested form model of the target form according to the target rule to determine the text data corresponding to the target key value as first target text data, and determining, based on the first type of position-based dependency relationship and/or the second type of semantic-based dependency relationship, the text data corresponding to the first target text data as second target text data; and combining the first target text data and the second target text data as the target form information.
2. The method of claim 1, wherein after acquiring the target image, the method further comprises:
preprocessing the target image; wherein the preprocessing comprises at least one of: image normalization processing, tilt correction processing, and warp restoration processing.
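As a rough illustration of the "at least one of" wording, the sketch below chains optional preprocessing steps. The claim does not define image normalization; min-max intensity scaling is used here purely as an assumed example, and the function names are hypothetical. Tilt correction and warp restoration are not implemented.

```python
def normalize_image(pixels):
    """Min-max scale a 2D list of intensities to the range [0.0, 1.0]."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[0.0 for _ in row] for row in pixels]
    span = hi - lo
    return [[(v - lo) / span for v in row] for row in pixels]


def preprocess(pixels, steps):
    """Apply whichever preprocessing steps were selected, in order."""
    for step in steps:
        pixels = step(pixels)
    return pixels


image = [[10, 20], [20, 10]]
print(preprocess(image, [normalize_image]))  # [[0.0, 1.0], [1.0, 0.0]]
```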
3. The method according to claim 2, wherein, in the case where the preprocessing includes a warp restoration process, preprocessing the target image includes:
detecting whether a target form in the target image has distortion or not;
determining a distortion type in the case where the target form in the target image is determined to have distortion; wherein the distortion type includes: distortion existing in the target form itself, and distortion introduced when the target image is acquired;
in the case where the distortion type is determined to be distortion existing in the target form itself, calling a preset distortion repair processing model to process the target image; the preset distortion repair processing model is a deep learning model comprising a DocUNet structure formed by stacking two U-Nets.
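The branching in claim 3 can be sketched as follows. The enum, the `classify` callable and the `repair_model` callable are hypothetical stand-ins; in particular, `repair_model` is only a placeholder for the DocUNet-style network, which is not implemented here. The claim does not say how capture-time distortion is handled, so the sketch leaves such images unchanged.

```python
from enum import Enum


class Distortion(Enum):
    NONE = "none"
    IN_FORM = "in_form"        # the form itself is warped (folds, curls)
    AT_CAPTURE = "at_capture"  # introduced while acquiring the image


def repair_if_needed(image, classify, repair_model):
    """Invoke the repair model only for distortion inherent to the form."""
    kind = classify(image)
    if kind is Distortion.IN_FORM:
        return repair_model(image)
    # Other cases are outside what claim 3 specifies; pass the image through.
    return image


# Stub detector and stub model, for illustration only.
repaired = repair_if_needed("warped.png", lambda img: Distortion.IN_FORM,
                            lambda img: "flat:" + img)
print(repaired)  # flat:warped.png
```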
4. The method of claim 1, wherein determining the topic type of the target form contained in the target image comprises:
processing the target image by using a SIFT algorithm to extract target image features;
and calling a preset topic classification model to process the target image features so as to determine the topic type of the target form contained in the target image.
5. The method of claim 4, wherein the topic type comprises at least one of: real estate certificate, marriage certificate, financial statement, invoice.
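One possible shape for the classification in claims 4 and 5 is sketched below. The claims do not specify the classifier; this sketch assumes a bag-of-visual-words histogram followed by nearest-centroid matching, which is a common way to classify images from SIFT descriptors but is not stated in the patent. Descriptors are assumed to be already extracted; the 2-dimensional toy vectors stand in for real 128-dimensional SIFT descriptors, and the codebook and centroids are invented values.

```python
import math


def nearest(vec, candidates):
    """Index of the candidate closest to vec in Euclidean distance."""
    return min(range(len(candidates)), key=lambda i: math.dist(vec, candidates[i]))


def bovw_histogram(descriptors, codebook):
    """Normalised histogram of visual words for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist) or 1.0  # avoid dividing by zero for an empty image
    return [h / total for h in hist]


def classify_topic(descriptors, codebook, centroids):
    """Return the topic label whose centroid histogram is nearest."""
    hist = bovw_histogram(descriptors, codebook)
    labels = list(centroids)
    return labels[nearest(hist, [centroids[label] for label in labels])]


codebook = [[0.0, 0.0], [1.0, 1.0]]            # toy visual words
centroids = {"invoice": [0.9, 0.1],            # toy per-topic histograms
             "financial statement": [0.1, 0.9]}
descriptors = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]]
print(classify_topic(descriptors, codebook, centroids))  # invoice
```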
6. The method of claim 1, wherein acquiring the text data in the target form and the position information of the text data according to the target image, and determining the first type of position-based dependency relationship between the text data comprises:
invoking a preset text detection model, and processing the target image according to the topic type of the target form so as to identify and determine a plurality of text image areas in the target image; wherein the text image area contains text data in the form of an image;
invoking a preset processing model, and processing a plurality of text image areas in the target image to extract text data in each text image area;
performing table structure restoration processing on the target form to determine the position information of the text data;
and determining the first type of position-based dependency relationship between the text data according to the position information of the text data.
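The final step of claim 6, deriving position-based dependencies from position information, might look like the sketch below. The claim does not define which spatial relations count as dependencies; linking each cell to its right-hand and lower neighbour is an assumption made for illustration, as are the relation names and the (row, column) position encoding.

```python
def position_dependencies(cells):
    """Derive neighbour links from grid positions.

    cells maps each text to its (row, col) position.
    Returns a list of (source, relation, target) triples.
    """
    by_pos = {pos: text for text, pos in cells.items()}
    deps = []
    for text, (row, col) in sorted(cells.items(), key=lambda kv: kv[1]):
        right = by_pos.get((row, col + 1))
        below = by_pos.get((row + 1, col))
        if right is not None:
            deps.append((text, "has_right_neighbour", right))
        if below is not None:
            deps.append((text, "has_lower_neighbour", below))
    return deps


cells = {"Name": (0, 0), "Li": (0, 1), "Amount": (1, 0), "99.00": (1, 1)}
for triple in position_dependencies(cells):
    print(triple)
```

In a key-value style form, the right-neighbour link is the one that typically pairs a field label with its value.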
7. The method of claim 6, wherein determining the position information of the text data by performing table structure restoration processing on the target form comprises:
converting the target image into a grayscale image;
dividing the target form in the target image into a combination of a plurality of rectangular units according to the grayscale image;
correcting the combination of the rectangular units according to the determined text image areas in the target form to obtain the corrected combination of the rectangular units;
and determining the rectangular unit where each text data is located according to the combination of the plurality of corrected rectangular units, and obtaining the position information of each text data.
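Two of the steps in claim 7 are simple enough to sketch directly: grayscale conversion and locating the rectangular unit that holds each text item. The sketch assumes the segmentation and correction steps have already produced the list of rectangles, uses the standard ITU-R BT.601 luma weights for grayscale (the patent does not name a formula), and assigns a text box to the unit containing its centre point, which is an illustrative choice.

```python
def to_gray(rgb):
    """Convert one (r, g, b) pixel to a grayscale value using BT.601 weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def locate(text_boxes, units):
    """Map each text to the index of the rectangular unit containing its box centre.

    Boxes and units are (x0, y0, x1, y1) tuples in pixel coordinates.
    """
    positions = {}
    for text, (x0, y0, x1, y1) in text_boxes.items():
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        for idx, (ux0, uy0, ux1, uy1) in enumerate(units):
            if ux0 <= cx < ux1 and uy0 <= cy < uy1:
                positions[text] = idx
                break
    return positions


units = [(0, 0, 100, 50), (100, 0, 200, 50)]  # assumed corrected rectangles
boxes = {"Name": (10, 10, 60, 40), "Li": (110, 10, 150, 40)}
print(locate(boxes, units))  # {'Name': 0, 'Li': 1}
```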
8. The method of claim 1, wherein determining the second type of semantic-based dependency relationship between the text data according to the preset knowledge graph, the topic type, the text data in the target form, and the position information of the text data, and constructing the generalized nested form model of the target form comprises:
determining a matching schema layer from the preset knowledge graph according to the topic type;
and constructing a corresponding metadata layer and instance layer according to the matching schema layer, the text data in the target form, and the position information of the text data, so as to obtain the generalized nested form model of the target form.
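The layered construction in claim 8 can be sketched under several assumptions. The matched layer of the knowledge graph is read here as a schema layer (field names mapped to value concepts); the metadata layer is read as the schema fields actually found in this form together with their positions; the instance layer is read as the concrete value attached to each field. None of these interpretations, nor the `KNOWLEDGE_GRAPH` contents, come from the patent text.

```python
# Hypothetical schema layers, one per topic type: field label -> value concept.
KNOWLEDGE_GRAPH = {
    "invoice": {"Invoice No.": "identifier", "Amount": "money"},
}


def build_model(topic, texts, right_neighbour):
    """Assemble a three-layer model for one form.

    texts maps each text to its (row, col) position;
    right_neighbour maps a text to the text in the cell to its right.
    """
    schema = KNOWLEDGE_GRAPH[topic]
    # Metadata layer: schema fields that are present in this form.
    metadata = {
        name: {"concept": concept, "position": texts[name]}
        for name, concept in schema.items()
        if name in texts
    }
    # Instance layer: the concrete value linked to each present field.
    instances = {name: right_neighbour.get(name) for name in metadata}
    return {"schema": schema, "metadata": metadata, "instances": instances}


texts = {"Invoice No.": (0, 0), "12345": (0, 1), "Amount": (1, 0), "99.00": (1, 1)}
links = {"Invoice No.": "12345", "Amount": "99.00"}
form_model = build_model("invoice", texts, links)
print(form_model["instances"])  # {'Invoice No.': '12345', 'Amount': '99.00'}
```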
9. An apparatus for acquiring form information, comprising:
the acquisition module is used for acquiring a target image; the target image comprises a target form to be processed;
the determining module is used for determining the topic type of the target form contained in the target image;
the first processing module is used for acquiring text data in the target form and position information of the text data according to the target image, and determining a first type of position-based dependency relationship between the text data;
the second processing module is used for determining a second type of semantic-based dependency relationship between the text data according to a preset knowledge graph, the topic type, the text data in the target form, and the position information of the text data, and constructing a generalized nested form model of the target form; the generalized nested form model of the target form comprises the text data, and the first type of position-based dependency relationship and the second type of semantic-based dependency relationship between the text data;
the extraction module is used for extracting corresponding text data from the generalized nested form model of the target form according to a target rule so as to obtain target form information; the target rule comprises a preset custom extraction rule; wherein the custom extraction rule includes: a custom target key value for the text data to be extracted and/or a custom extraction condition;
the extraction module is specifically configured to determine, according to the target rule, whether the custom extraction condition is currently met; in the case where the custom extraction condition is determined to be currently met, search the text data in the generalized nested form model of the target form according to the target rule to determine the text data corresponding to the target key value as first target text data, and determine, based on the first type of position-based dependency relationship and/or the second type of semantic-based dependency relationship, the text data corresponding to the first target text data as second target text data; and combine the first target text data and the second target text data as the target form information.
10. A server comprising a processor and a memory for storing processor-executable instructions, which when executed by the processor implement the steps of the method of any one of claims 1 to 8.
11. A computer readable storage medium having stored thereon computer instructions which when executed implement the steps of the method of any of claims 1 to 8.
CN202110339506.5A 2021-03-30 2021-03-30 Form information acquisition method, device and server Active CN113011144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110339506.5A CN113011144B (en) 2021-03-30 2021-03-30 Form information acquisition method, device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110339506.5A CN113011144B (en) 2021-03-30 2021-03-30 Form information acquisition method, device and server

Publications (2)

Publication Number Publication Date
CN113011144A CN113011144A (en) 2021-06-22
CN113011144B CN113011144B (en) 2024-01-30

Family

ID=76409247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110339506.5A Active CN113011144B (en) 2021-03-30 2021-03-30 Form information acquisition method, device and server

Country Status (1)

Country Link
CN (1) CN113011144B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343658B (en) * 2021-07-01 2024-04-09 湖南四方天箭信息科技有限公司 PDF file information extraction method and device and computer equipment
CN113468342B (en) * 2021-07-22 2023-12-05 北京京东振世信息技术有限公司 Knowledge graph-based data model construction method, device, equipment and medium
CN113568965A (en) * 2021-07-29 2021-10-29 上海浦东发展银行股份有限公司 Method and device for extracting structured information, electronic equipment and storage medium
CN114428839A (en) * 2022-01-27 2022-05-03 北京百度网讯科技有限公司 Data processing method, paragraph text determination device and electronic equipment
CN114220103B (en) * 2022-02-22 2022-05-06 成都明途科技有限公司 Image recognition method, device, equipment and computer readable storage medium
CN114639107B (en) * 2022-04-21 2023-03-24 北京百度网讯科技有限公司 Table image processing method, apparatus and storage medium
CN117542067B (en) * 2023-12-18 2024-06-21 北京长河数智科技有限责任公司 Region labeling form recognition method based on visual recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543690A (en) * 2018-11-27 2019-03-29 北京百度网讯科技有限公司 Method and apparatus for extracting information
CN110489424A (en) * 2019-08-26 2019-11-22 北京香侬慧语科技有限责任公司 A kind of method, apparatus, storage medium and the electronic equipment of tabular information extraction
CN111260586A (en) * 2020-01-20 2020-06-09 北京百度网讯科技有限公司 Method and device for correcting distorted document image
CN111611990A (en) * 2020-05-22 2020-09-01 北京百度网讯科技有限公司 Method and device for identifying table in image
CN112052305A (en) * 2020-09-02 2020-12-08 平安资产管理有限责任公司 Information extraction method and device, computer equipment and readable storage medium
CN112434691A (en) * 2020-12-02 2021-03-02 上海三稻智能科技有限公司 HS code matching and displaying method and system based on intelligent analysis and identification and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733433B2 (en) * 2018-03-30 2020-08-04 Wipro Limited Method and system for detecting and extracting a tabular data from a document

Also Published As

Publication number Publication date
CN113011144A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN113011144B (en) Form information acquisition method, device and server
CN109543690B (en) Method and device for extracting information
CN110532834B (en) Table extraction method, device, equipment and medium based on rich text format document
RU2668717C1 (en) Generation of marking of document images for training sample
US20190385054A1 (en) Text field detection using neural networks
CN110427972B (en) Certificate video feature extraction method and device, computer equipment and storage medium
US20210271857A1 (en) Method and apparatus for identity verification, electronic device, computer program, and storage medium
CN111104941B (en) Image direction correction method and device and electronic equipment
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN115238688A (en) Electronic information data association relation analysis method, device, equipment and storage medium
Igorevna et al. Document image analysis and recognition: a survey
Zhang et al. Landmark‐Guided Local Deep Neural Networks for Age and Gender Classification
CN114282258A (en) Screen capture data desensitization method and device, computer equipment and storage medium
CN116524574A (en) Facial area recognition method and device and electronic equipment
Vishwanath et al. Deep reader: Information extraction from document images via relation extraction and natural language
CN116110110A (en) Fake image detection method, terminal and storage medium based on face key points
CN111008295A (en) Page retrieval method and device, electronic equipment and storage medium
CN115880702A (en) Data processing method, device, equipment, program product and storage medium
CN115294557A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN112686129B (en) Face recognition system and method
JP5414631B2 (en) Character string search method, character string search device, and recording medium
CN116958615A (en) Picture identification method, device, equipment and medium
JP7420578B2 (en) Form sorting system, form sorting method, and program
JP4418726B2 (en) Character string search device, search method, and program for this method
CN112926585A (en) Cross-domain semantic segmentation method based on regenerative kernel Hilbert space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant