CN114511865A - Method and device for generating structured information and computer readable storage medium - Google Patents


Info

Publication number
CN114511865A
CN114511865A (application CN202111599215.6A)
Authority
CN
China
Prior art keywords
image
identification
result
matching
text recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111599215.6A
Other languages
Chinese (zh)
Inventor
游照林
熊剑平
陈媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111599215.6A priority Critical patent/CN114511865A/en
Publication of CN114511865A publication Critical patent/CN114511865A/en
Pending legal-status Critical Current

Landscapes

  • Character Input (AREA)

Abstract

The application discloses a method, an apparatus, and a computer-readable storage medium for generating structured information. The method comprises the following steps: acquiring a template image, wherein the template image comprises a plurality of reference fields and a plurality of identification areas, and the identification areas are different from the areas where the reference fields are located; performing direction correction on an acquired first image to obtain a second image; performing text recognition on the second image to obtain a text recognition result; matching the text recognition result with the reference fields to obtain a matching result; correcting the second image based on the matching result to obtain a third image whose angle is the same as that of the template image; and generating structured information based on the text recognition result, the identification areas, and the third image. In this way, extraction efficiency is improved and the scheme adapts to a wide range of document formats.

Description

Method and device for generating structured information and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating structured information, and a computer-readable storage medium.
Background
A large amount of information is generated every day, and key information must be screened from it for storage and management. Taking registration of an electric vehicle as an example, documents such as the vehicle's qualification certificate must be submitted, and the certificate information is entered and checked manually. This is slow, error-prone, and leaves no record for tracing the entry and verification process.
Disclosure of Invention
The application provides a method and a device for generating structured information and a computer-readable storage medium, which can improve efficiency and have wide adaptability.
In order to solve the technical problem, the technical scheme adopted by the application is as follows: a method for generating structured information is provided, which comprises the following steps: acquiring a template image, wherein the template image comprises a plurality of reference fields and a plurality of identification areas, and the identification areas are different from the areas where the reference fields are located; carrying out direction correction processing on the acquired first image to obtain a second image; performing text recognition processing on the second image to obtain a text recognition result; matching the text recognition result with the reference field to obtain a matching result; correcting the second image based on the matching result to obtain a third image, wherein the angle of the third image is the same as that of the template image; structured information is generated based on the text recognition result, the recognition area, and the third image.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a document structuring apparatus comprising: the template image comprises a plurality of reference fields and a plurality of identification areas, and the identification areas are different from the areas where the reference fields are located; the processing module is connected with the acquisition module and is used for carrying out direction correction processing on the acquired first image to obtain a second image; performing text recognition processing on the second image to obtain a text recognition result; matching the text recognition result with the reference field to obtain a matching result; correcting the second image based on the matching result to obtain a third image, wherein the angle of the third image is the same as that of the template image; the generating module is connected with the processing module and used for generating the structured information based on the text recognition result, the recognition area and the third image.
In order to solve the above technical problem, another technical solution adopted by the present application is: the document structuring device comprises a memory and a processor which are connected with each other, wherein the memory is used for storing a computer program, and the computer program is used for realizing the method for generating the structuring information in the technical scheme when being executed by the processor.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer-readable storage medium for storing a computer program, which, when executed by a processor, is used to implement the method for generating structured information in the above technical solution.
Through the above scheme, the beneficial effects of the application are as follows. First, a user makes a template image comprising a plurality of reference fields and a plurality of identification areas, where the identification areas differ from the areas in which the reference fields are located. The document structuring apparatus then performs direction correction on the first image to generate a second image; recognizes the characters in the second image to obtain a text recognition result; matches that result against the reference fields to obtain a matching result; and uses the matching result to correct the second image into a third image with the same angle as the template image. Finally, the text recognition results of the regions corresponding to the identification areas are extracted from the third image to obtain the structured information. Because the template image can be customized by the user, the scheme meets the needs of different applications and has wide applicability; and because the template image is compared with the image to be processed, the desired content can be extracted quickly without manual effort, improving the efficiency of information extraction.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a flowchart illustrating an embodiment of a method for generating structured information provided herein;
FIG. 2 is a schematic flow chart diagram illustrating a method for generating structured information according to another embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a direction detection network provided herein;
FIG. 4 is a schematic diagram of a text recognition network provided herein;
FIG. 5 is a schematic diagram of a reference field in a template image provided herein;
FIG. 6 is a schematic illustration of an identified region in the template image shown in FIG. 5;
FIG. 7 is a schematic illustration of a fourth image provided herein, obtained by rotating the first image from 90° to 0°;
FIG. 8 is a diagram illustrating Hough line detection results of the fourth image of FIG. 7;
FIG. 9 is a schematic illustration of a second image corresponding to the fourth image shown in FIG. 7;
FIG. 10 is a diagram of fields in the second image shown in FIG. 9 corresponding to reference fields;
FIG. 11 is a schematic illustration of a third image provided herein;
FIG. 12 is the corresponding structured information for the first image of FIG. 7;
FIG. 13 is a schematic diagram illustrating an embodiment of a document structuring apparatus provided herein;
FIG. 14 is a schematic structural diagram of another embodiment of a document structuring device provided herein;
FIG. 15 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be noted that the following examples are only illustrative of the present application, and do not limit the scope of the present application. Likewise, the following examples are only some examples and not all examples of the present application, and all other examples obtained by a person of ordinary skill in the art without any inventive work are within the scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
It should be noted that the terms "first", "second" and "third" in the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of indicated technical features. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a method for generating structured information according to the present application, where the method includes:
S11: and acquiring a template image.
The user can analyze different documents, make a template image, and save it; the template image comprises a plurality of reference fields and a plurality of identification areas, and the identification areas are different from the areas where the reference fields are located. Specifically, fields whose position is fixed and whose content does not change across different images of the same format are used as reference fields; they can be selected with bounding boxes and serve as anchor points for matching and correcting subsequently submitted images. The number of reference fields can be set according to the specific application; for example, when making a template image, at least 6 reference fields may be used. An identification area is a region of the image whose content needs to be recognized.
It will be appreciated that the reference fields may be completely different from the name fields of the identification areas, or may be at least partially identical to them.
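The template described above can be sketched as a small data structure. The "at least 6 reference fields" check follows the text; the field names and the layout of the structure itself are assumptions made for illustration:

```python
class TemplateImage:
    """Minimal sketch of a user-made template image: reference fields
    (anchor texts with fixed positions) plus identification areas, which
    together form the name -> area mapping table described later."""

    def __init__(self, reference_fields, identification_areas):
        # reference_fields: {anchor_text: (x1, y1, x2, y2)}
        # identification_areas: {identification_name: (x1, y1, x2, y2)}
        if len(reference_fields) < 6:
            raise ValueError("the text suggests at least 6 reference fields")
        self.reference_fields = reference_fields
        self.mapping_table = identification_areas  # key -> value correspondence

    def area_for(self, name):
        """Look up the identification area box for a given name."""
        return self.mapping_table[name]
```

A template built this way is saved once and reused for every subsequently submitted image of the same format.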
S12: and carrying out direction correction processing on the acquired first image to obtain a second image.
The first image may be acquired from an image database or captured by photographing the document to be processed with a camera. The first image contains a document with a fixed format, such as a product certificate, an identification card, or a ticket. For the acquired first image, it can be detected whether the image is upright, i.e., whether most of the characters in it are tilted; if they are, the first image is tilted. To improve the accuracy of subsequent text recognition, direction correction is first applied to adjust the first image to an upright image, i.e., its angle is adjusted to 0°, generating the second image.
S13: and performing text recognition processing on the second image to obtain a text recognition result.
After the direction correction is finished, the second image is processed with a text recognition method to generate the text recognition result. The text recognition method may be any method commonly used in the related art, such as Optical Character Recognition (OCR).
S14: and matching the text recognition result with the reference field to obtain a matching result.
After the second image is recognized, the reference fields in the template image are matched against the text recognition result. Specifically, the text recognition result comprises a plurality of sub-recognition results; for each sub-recognition result, it is judged whether it is identical to a reference field, or whether its similarity to the reference field is greater than a set value. If so, the match is considered successful, and the sub-recognition result and the corresponding reference field are added to the matching result.
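The matching step above can be sketched as follows. The patent only says the similarity must exceed "a set value"; the `difflib` ratio and the 0.8 threshold are assumptions for illustration:

```python
from difflib import SequenceMatcher

def match_reference_fields(sub_results, reference_fields, threshold=0.8):
    """Match OCR sub-recognition results against template reference fields.

    sub_results: list of (text, box) from text recognition on the second image.
    reference_fields: list of (anchor_text, box) from the template image.
    A pair matches when the strings are equal or their similarity ratio
    exceeds `threshold` (hypothetical value)."""
    matches = []
    for text, box in sub_results:
        for ref_text, ref_box in reference_fields:
            similarity = SequenceMatcher(None, text, ref_text).ratio()
            if text == ref_text or similarity > threshold:
                matches.append({"recognized": text, "reference": ref_text,
                                "recognized_box": box, "reference_box": ref_box})
                break  # each sub-result pairs with at most one reference field
    return matches
```

The matched pairs, with both boxes, are exactly what the later correction step needs as anchor points.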
S15: and correcting the second image based on the matching result to obtain a third image.
After the matching of the text recognition result and the template image is completed, the second image is corrected by using the matching result, namely, the angle of the second image is adjusted, so that the angle of the generated third image is the same as the angle of the template image.
S16: structured information is generated based on the text recognition result, the recognition area, and the third image.
Since the angle of the third image is the same as the angle of the template image, the positions of the corresponding regions in the two images are substantially the same, and the corresponding region (denoted as a candidate region) can be found from the third image by using the position information of the recognition region, and the text recognition result corresponding to the candidate region can be output as the structured information. It is to be understood that other information in the third image may also be output as structured information, and is not limited herein.
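Because the third image is aligned with the template, finding the candidate region can be reduced to a box-overlap search. The IoU criterion and the 0.5 cutoff below are assumptions; the text only says the positions are "substantially the same":

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def extract_structured_info(sub_results, recognition_areas, min_iou=0.5):
    """For each template identification area, pick the text box in the
    corrected (third) image that overlaps it most; min_iou is a
    hypothetical cutoff rejecting spurious overlaps."""
    info = {}
    for name, area_box in recognition_areas.items():
        best = max(sub_results, key=lambda r: iou(r[1], area_box), default=None)
        if best and iou(best[1], area_box) >= min_iou:
            info[name] = best[0]
    return info
```

The returned dictionary is the key-value structured information: identification name on the left, recognized text on the right.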
This embodiment provides a document structuring scheme involving character detection, character recognition, and structuring, used to extract key information from the image to be processed, for example the valid information in an electric vehicle qualification certificate during registration. The user can customize the template image; image processing together with character detection and recognition is applied to the image to be processed, so that the desired position information and content can be extracted quickly from the document. Valid information no longer needs to be extracted manually, i.e., the method is carried out by machine, improving the efficiency of information extraction. Since users can customize the specific content of the template image, the scheme meets different application requirements and has wide applicability.
Referring to fig. 2, fig. 2 is a schematic flowchart of another embodiment of a method for generating structured information, the method includes:
S21: and acquiring the template image and the identification name field corresponding to the identification area, and establishing a corresponding relation between the identification name field and the identification area to obtain a mapping table.
The user first makes a template image and selects the identification areas with bounding boxes. By naming each selected box, a correspondence between an identification name field and an identification area (i.e., a key-value correspondence) is constructed, which is later used for structured recognition of the content at the same positions in subsequently submitted images of the same format; the area where an identification name field is located differs from the identification area itself. After the template image has been made, it can be saved for subsequent use.
S22: and detecting the direction of the first image to obtain a first inclination angle.
A pre-trained direction detection network is used to detect the first image and obtain its direction (i.e., the first inclination angle). Specifically, the angle of the first image may be detected with the direction detection network shown in fig. 3, which uses VGG16 (Visual Geometry Group): "conv" denotes a convolution operation with 3 × 3 kernels, "pool" denotes max pooling, and "fc" denotes a fully connected layer. The output of the last "fc" layer is 4-dimensional, representing the four classes 0°, 90°, 180°, and 270°.
S23: and judging whether the first inclination angle is a preset angle or not.
After the angle of the first image is acquired, whether the first inclination angle is a preset angle is judged, and the preset angle may be 0 °.
S24: and if the first inclination angle is not the preset angle, performing rotation processing on the first image to obtain a fourth image.
After the direction of the first image is determined by the direction detection network, the first image is rotated to 0°, i.e., to the upright orientation in which a person would read it, generating the fourth image.
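Since the network outputs only the four classes 0°, 90°, 180°, and 270°, this coarse correction is just a whole-image rotation by a multiple of 90°. A minimal sketch on a 2-D grid standing in for an image (the clockwise-undo convention here is an assumption; the real convention depends on how the training labels were defined):

```python
def rotate90(grid):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def to_upright(grid, predicted_angle):
    """Undo the tilt reported by the direction network so the content
    ends up at 0 degrees."""
    if predicted_angle not in (0, 90, 180, 270):
        raise ValueError("the direction network outputs only these four classes")
    # if the content is tilted counter-clockwise by `predicted_angle`,
    # rotating clockwise that many quarter turns restores it
    for _ in range(predicted_angle // 90):
        grid = rotate90(grid)
    return grid
```

In a real pipeline the same rotation would be applied to the pixel array of the first image to produce the fourth image.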
S25: and carrying out correction processing on the fourth image to obtain a second image.
Rotary text correction processing is performed on the fourth image to generate the second image. Specifically, Hough line detection is first performed on the fourth image to obtain a line detection result; the fourth image is then corrected based on the line detection result to obtain the second image.
Furthermore, Hough detection is an effective algorithm for detecting straight lines. Target points in the rectangular coordinate system are mapped into a polar coordinate system and accumulated: all points on any straight line in the rectangular-coordinate plane accumulate onto the same point set in polar coordinates. Long line features are then found by locating the peaks of these point sets in the polar coordinate system, which also tolerates discontinuities in the line.
The basic principle of the Hough transform is the duality of points and lines: a given curve in the original image space is mapped, through its parametric expression, to a single point in a parameter space. The problem of detecting the curve in the image space is thus converted into clustering pixels that are linked by a common analytic form, i.e., into finding the corresponding accumulation points — the peaks — in the parameter space. Any straight line in the plane can be expressed by a polar equation in ρ and θ; for any point (x, y) in the image space the relationship is ρ = x·cos(θ) + y·sin(θ), where ρ is the distance from the origin to the line and θ determines the line's direction. Because the digital image takes discrete values in the image space (x, y) and in the Hough space H(ρ, θ), each pixel can be projected into the parameter space. If this transform is applied to n points on the same line, the n points in the original image space yield n sinusoidal curves in the parameter space, and these curves intersect at the same point. Collinear points in the image space therefore correspond to curves through a common point in the parameter space, so a line in the image space can be determined simply by finding the common intersection of curves in the parameter space.
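The accumulation described above can be demonstrated with a tiny voting loop over ρ = x·cos(θ) + y·sin(θ). This is a toy sketch with 1° bins and rounded ρ, not the optimized detector an OpenCV-style library would provide:

```python
import math
from collections import Counter

def hough_peak(points, angle_step=1):
    """Tiny Hough accumulator: project each edge point (x, y) into
    (rho, theta) space and return the cell with the most votes,
    i.e. the dominant straight line."""
    votes = Counter()
    for x, y in points:
        for deg in range(0, 180, angle_step):
            theta = math.radians(deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, deg)] += 1
    # most_common(1)[0] is ((rho, theta_in_degrees), vote_count)
    return votes.most_common(1)[0]
```

Five points on the horizontal line y = 2 all vote for the same (ρ, θ) cell near ρ = 2, θ = 90°, which is how the peak reveals the line.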
In a specific embodiment, the line detection result includes the inclination angle of at least one line. The inclination angles of all the lines are averaged to obtain the second inclination angle, or alternatively the mode of the inclination angles is taken as the second inclination angle. The fourth image is then rotated by the second inclination angle to obtain the second image; for example, if the angle of the fourth image is the included angle between the characters and the positive horizontal direction, the fourth image is rotated clockwise by the second inclination angle.
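The two aggregation options named above — mean or mode of the detected line angles — can be sketched directly with the standard library:

```python
from statistics import mean, multimode

def second_tilt_angle(line_angles, method="mean"):
    """Aggregate the inclination angles returned by Hough line detection
    into the single angle used to deskew the fourth image."""
    if method == "mean":
        return mean(line_angles)
    if method == "mode":
        return multimode(line_angles)[0]  # first mode if several are tied
    raise ValueError("the text describes only 'mean' and 'mode'")
```

`multimode` (Python 3.8+) is used rather than `mode` so that a tie among angles does not raise an error.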
It is understood that Hough line detection may be configured to detect only lines with smaller inclination angles (less than 90°), the second inclination angle then being obtained by averaging or taking the mode of the detected angles; or only lines with larger inclination angles (greater than 90°) may be detected and aggregated in the same way; or all lines in the image may be detected, the lines with larger inclination angles filtered out, and the average or mode of the remaining angles taken as the second inclination angle.
When the first inclination angle is the preset angle, the angle of the first image is nominally 0°; however, since the direction detection network outputs only 4 angle values, the first image may still be slightly tilted. In that case the first image itself is corrected in the same way to obtain the second image.
S26: and performing text detection processing on the second image to obtain a text detection result.
Global text detection is performed on the second image with a text detection network to recognize the characters in it. Specifically, as shown in fig. 4, the network used may be DBNet (Differentiable Binarization Network), a segmentation-based text detection method: "32", "64", "128", "256" and "512" denote the numbers of output feature channels; "1/2", "1/4", "1/8", "1/16" and "1/32" denote the scale relative to the input image; "upsample" is an upsampling operation, where the upsampling factor may be 2; "concat" is a concatenation operation; the five convolution stages are the first to fifth convolutional layers; "basic map" denotes the text probability map, and "threshold map" is the threshold map.
Further, the input image passes through a feature pyramid: features are first extracted by the five stages of convolution layers, and features of different scales are then cascaded through upsampling operations. This improves robustness when detecting text at different scales. The outputs of DBNet are a text probability map and a threshold map, and text boxes are obtained by post-processing these two maps. It is understood that the specific post-processing is the same as in the related art and is not described again here.
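The patent does not spell out how the probability map and threshold map are combined; in the DBNet paper they feed the differentiable binarization B = 1 / (1 + exp(−k·(P − T))) with amplifying factor k ≈ 50. A per-pixel sketch of that formula, offered as background rather than as this application's method:

```python
import math

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map from a text probability map P and a
    threshold map T: B = 1 / (1 + exp(-k * (P - T))).
    k = 50 follows the DBNet paper; maps are lists of rows."""
    return [[1.0 / (1.0 + math.exp(-k * (p - t)))
             for p, t in zip(prow, trow)]
            for prow, trow in zip(prob_map, thresh_map)]
```

Pixels well above their local threshold saturate toward 1 and pixels below it toward 0, which is what makes the subsequent box extraction straightforward.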
The first image has been corrected into the second image, i.e., the angle of the second image is 0° or close to 0°; recognizing the second image therefore helps improve the accuracy of character recognition.
S27: and matching the text recognition result with the reference field to obtain a matching result.
S27 is the same as S14 in the previous embodiment, and is not repeated here.
S28: and correcting the second image based on the matching result to obtain a third image.
The matching result comprises a plurality of matching fields. Whether the similarity between a sub-recognition result and the reference field is greater than a preset similarity is judged; if so, the sub-recognition result is determined to be a matching field. Perspective transformation is then applied to the second image based on the matching fields to obtain the third image.
By template-matching all the sub-recognition results in the text detection result against the reference fields, and performing perspective transformation with the matched fields (i.e., the matching fields) as anchor points, the input image can be corrected to be consistent with the template image.
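As a deliberately simplified stand-in for the perspective transform, the sketch below estimates only an average translation from the matched anchor pairs; a real implementation would solve a full homography from at least 4 anchors (e.g. with OpenCV's `getPerspectiveTransform`/`findHomography`):

```python
def estimate_translation(matches):
    """From matched anchor pairs ((src_x, src_y), (tpl_x, tpl_y)),
    estimate the average (dx, dy) moving the second image's anchors
    onto the template's. A translation-only simplification of the
    perspective transform described in the text."""
    if not matches:
        raise ValueError("need at least one matched anchor field")
    dx = sum(tx - sx for (sx, _), (tx, _) in matches) / len(matches)
    dy = sum(ty - sy for (_, sy), (_, ty) in matches) / len(matches)
    return dx, dy

def apply_translation(box, offset):
    """Shift a (x1, y1, x2, y2) box by the estimated offset."""
    dx, dy = offset
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

Once every text box has been mapped into template coordinates this way, it can be compared directly against the identification areas.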
S29: and generating structured information based on the mapping table, the text detection result, the identification area and the third image.
The text detection result comprises a plurality of sub-recognition results, including the recognition results of the regions corresponding to the identification areas. The position of the region where each sub-recognition result is located is matched against the positions of the identification areas to obtain candidate areas; the sub-recognition result corresponding to each candidate area is then matched against the mapping table to obtain its identification name. The identification name and its corresponding sub-recognition result are determined to be the structured information, i.e., the information in the regions whose positions match the template image is finally output, realizing user-defined structured extraction with high efficiency.
In other embodiments, the text detection result includes a plurality of sub-recognition results, the sub-recognition results include a recognition result of a region corresponding to the recognition area and a recognition result of a region corresponding to the recognition name field, and the sub-recognition results may be matched with the mapping table to obtain the recognition name; and then determining the identification name and the sub-identification result corresponding to the identification name as the structured information.
In a specific embodiment, taking an electric vehicle qualification certificate as an example, as shown in fig. 5 to 12: fig. 5 and 6 are template images, with 6 reference fields (the fields in dashed boxes) shown in fig. 5 and 5 identification areas (the regions in dashed boxes) shown in fig. 6; fig. 7 is a schematic illustration of the first image undergoing rectification; fig. 8 is the result of Hough line detection on the fourth image; fig. 9 is the second image; fig. 10 shows the 5 matching fields in the second image; fig. 11 is the third image, in which the text can be seen to be substantially upright; fig. 12 is the structured information.
In order to register the valid information of the electric vehicle qualification certificate, the user defines a template image and establishes the key-value correspondence; the correspondence and the template image are then used to extract information from the image to be processed, realizing fast extraction of the required position information and content from the certificate. The user can customize the template as needed, and the valid certificate information is obtained purely by digital image processing, structuring the certificate without dedicated hardware; the approach is simple, convenient, and accurate. Moreover, no other auxiliary tools are required, and the method can extract information from documents submitted at different angles, making it both convenient and robust.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an embodiment of a document structuring device provided in the present application, in which the document structuring device 130 includes a memory 131 and a processor 132 connected to each other, the memory 131 is used for storing a computer program, and the computer program is used for implementing a method for generating structured information in the foregoing embodiment when being executed by the processor 132.
Referring to fig. 14, fig. 14 is a schematic structural diagram of another embodiment of the document structuring device provided in the present application, in which the document structuring device 140 includes: an acquisition module 141, a processing module 142, and a generation module 143.
The obtaining module 141 is configured to obtain a template image, where the template image includes a plurality of reference fields and a plurality of identification areas, and the areas of the identification areas are different from the areas of the reference fields.
The processing module 142 is connected to the obtaining module 141, and is configured to perform direction correction processing on the obtained first image to obtain a second image; performing text recognition processing on the second image to obtain a text recognition result; matching the text recognition result with the reference field to obtain a matching result; and correcting the second image based on the matching result to obtain a third image, wherein the angle of the third image is the same as that of the template image.
The generating module 143 is connected to the processing module 142, and is configured to generate the structured information based on the text recognition result, the recognition area, and the third image.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application. The computer-readable storage medium 150 stores a computer program 151 which, when executed by a processor, implements the method for generating structured information in the foregoing embodiments.
The computer-readable storage medium 150 may be a server, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division, and an actual implementation may use another division; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit.
The above description is only an example of the present application and is not intended to limit its scope; all equivalent structural or process modifications made using the contents of the specification and drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of the present application.

Claims (12)

1. A method for generating structured information, comprising:
acquiring a template image, wherein the template image comprises a plurality of reference fields and a plurality of identification areas, and the identification areas are different from the areas where the reference fields are located;
carrying out direction correction processing on the acquired first image to obtain a second image;
performing text recognition processing on the second image to obtain a text recognition result;
matching the text recognition result with the reference field to obtain a matching result;
correcting the second image based on the matching result to obtain a third image, wherein the angle of the third image is the same as that of the template image;
structured information is generated based on the text recognition result, the recognition area, and the third image.
2. The method according to claim 1, further comprising:
acquiring an identification name field corresponding to the identification area, wherein the area where the identification name field is located is different from the identification area;
and establishing a corresponding relation between the identification name field and the identification area to obtain a mapping table.
3. The method according to claim 2, wherein the text recognition result includes a plurality of sub-recognition results, and the step of generating the structured information based on the text recognition result, the recognition area, and the third image includes:
matching the position of the area where the sub-identification result is located with the position of the identification area to obtain a candidate area;
matching the sub-identification result corresponding to the candidate area with the mapping table to obtain the identification name of the sub-identification result;
and determining the identification name and a sub-identification result corresponding to the identification name as the structured information.
4. The method according to claim 3, wherein the matching result includes a plurality of matching fields, and the step of rectifying the second image based on the matching result to obtain a third image includes:
judging whether the similarity between the sub-identification result and the reference field is greater than a preset similarity;
if yes, determining the sub-identification result as the matching field;
and performing perspective transformation on the second image based on the matching field to obtain the third image.
5. The method according to claim 2, wherein the text recognition result includes a plurality of sub-recognition results, and the step of generating the structured information based on the text recognition result, the recognition area, and the third image includes:
matching the sub-identification result with the mapping table to obtain an identification name;
and determining the identification name and a sub-identification result corresponding to the identification name as the structured information.
6. The method according to claim 1, wherein the step of performing the orientation correction processing on the acquired first image to obtain the second image includes:
detecting the direction of the first image to obtain a first inclination angle;
judging whether the first inclination angle is a preset angle;
if not, performing rotation processing on the first image to obtain a fourth image;
and correcting the fourth image to obtain the second image.
7. The method according to claim 6, wherein the step of performing rectification processing on the fourth image to obtain the second image comprises:
performing Hough line detection on the fourth image to obtain a line detection result;
and based on the straight line detection result, correcting the fourth image to obtain the second image.
8. The method according to claim 7, wherein the line detection result includes a tilt angle of at least one line, and the step of performing rectification processing on the fourth image based on the line detection result to obtain the second image comprises:
averaging the inclination angles of all the straight lines to obtain a second inclination angle;
and rotating the fourth image by the second inclination angle to obtain the second image.
9. The method for generating structured information according to claim 6, further comprising:
and when the first inclination angle is the preset angle, correcting the first image to obtain the second image.
10. A document structuring apparatus, comprising:
an acquisition module, configured to acquire a template image, wherein the template image comprises a plurality of reference fields and a plurality of identification areas, and the identification areas are different from the areas where the reference fields are located;
the processing module is connected with the acquisition module and is used for carrying out direction correction processing on the acquired first image to obtain a second image; performing text recognition processing on the second image to obtain a text recognition result; matching the text recognition result with the reference field to obtain a matching result; correcting the second image based on the matching result to obtain a third image, wherein the angle of the third image is the same as that of the template image;
and the generating module is connected with the processing module and is used for generating structured information based on the text recognition result, the recognition area, and the third image.
11. A document structuring apparatus, comprising a memory and a processor connected to each other, wherein the memory is used for storing a computer program, and the computer program is used for implementing the method for generating structured information according to any one of claims 1 to 9 when being executed by the processor.
12. A computer-readable storage medium for storing a computer program, wherein the computer program is configured to implement the method for generating structured information according to any one of claims 1 to 9 when executed by a processor.
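The deskew described in claims 7 and 8 reduces to averaging the tilt angles of the detected lines and rotating the fourth image back by that mean. A minimal sketch of the angle computation only, assuming the line detector itself (e.g. a Hough transform over edge pixels) is available and reports segment endpoints; the function name is illustrative.

```python
import math

def second_inclination_angle(lines):
    """lines: iterable of ((x1, y1), (x2, y2)) segment endpoints reported by
    a line detector. Returns the mean tilt angle in degrees; rotating the
    fourth image by this angle (claim 8) yields the deskewed second image."""
    angles = [math.degrees(math.atan2(y2 - y1, x2 - x1))
              for (x1, y1), (x2, y2) in lines]
    return sum(angles) / len(angles)
```

Averaging over all detected lines damps the influence of any single spurious detection, which is why claim 8 uses the mean of the inclination angles rather than a single line's angle.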
CN202111599215.6A 2021-12-24 2021-12-24 Method and device for generating structured information and computer readable storage medium Pending CN114511865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111599215.6A CN114511865A (en) 2021-12-24 2021-12-24 Method and device for generating structured information and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114511865A true CN114511865A (en) 2022-05-17

Family

ID=81548622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111599215.6A Pending CN114511865A (en) 2021-12-24 2021-12-24 Method and device for generating structured information and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114511865A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166729A (en) * 2023-02-15 2023-05-26 中国科学院广州地球化学研究所 Processing method and device of drilling histogram data, electronic equipment and storage medium
CN116166729B (en) * 2023-02-15 2023-11-17 中国科学院广州地球化学研究所 Processing method and device of drilling histogram data, electronic equipment and storage medium
CN116402026A (en) * 2023-04-13 2023-07-07 广州文石信息科技有限公司 Application content annotating method, device, equipment and storage medium
CN116402026B (en) * 2023-04-13 2023-12-19 广州文石信息科技有限公司 Application content annotating method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
US8611662B2 (en) Text detection using multi-layer connected components with histograms
US20140056480A1 (en) Image matching using inlier coverage
US9092697B2 (en) Image recognition system and method for identifying similarities in different images
JP5229050B2 (en) Document area extraction apparatus, method, and program from image
CN114511865A (en) Method and device for generating structured information and computer readable storage medium
CN109947273B (en) Point reading positioning method and device
Chiang et al. Automatic and accurate extraction of road intersections from raster maps
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN106874901B (en) Driving license identification method and device
CN112613506A (en) Method and device for recognizing text in image, computer equipment and storage medium
CN114359553B (en) Signature positioning method and system based on Internet of things and storage medium
CN114038004A (en) Certificate information extraction method, device, equipment and storage medium
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN110738204A (en) Method and device for positioning certificate areas
JP4859061B2 (en) Image correction method, correction program, and image distortion correction apparatus
CN108090728B (en) Express information input method and system based on intelligent terminal
JP6542230B2 (en) Method and system for correcting projected distortion
CN113221897B (en) Image correction method, image text recognition method, identity verification method and device
CN114998347B (en) Semiconductor panel corner positioning method and device
CN111260574A (en) Seal photo correction method, terminal and computer readable storage medium
CN110660091A (en) Image registration processing method and device and photographing correction operation system
CN115205113A (en) Image splicing method, device, equipment and storage medium
CN104462111A (en) Image retrieval database establishing method
CN111213157A (en) Express information input method and system based on intelligent terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination