CN113641746B - Document structuring method, device, electronic equipment and storage medium

Info

Publication number: CN113641746B
Application number: CN202110961595.7A
Authority: CN
Inventors: 朱辉辉; 张建树; 宋时德
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2021-08-20
Filing date: 2021-08-20
Publication date: 2024-02-20
Anticipated expiration: 2041-08-20
Also published as: CN113641746A

Abstract

The invention provides a document structuring method, a document structuring device, electronic equipment and a storage medium. The method comprises the following steps: extracting visual features of each text line in a target document; based on the visual features of each text line, performing structural relationship decoding on the text lines line by line and performing structured type decoding based on the structural relationship, to obtain the structural relationship among the text lines and the structured type of each text line; and structuring the target document based on the structural relationship among the text lines and the structured type of each text line. Because the structural relationship among the text lines and the structured type of each text line are determined from the visual features of the text lines in the target document, and the target document is structured on that basis, the structured target document can accurately represent the spatial structure information among the text lines, and the method has higher robustness.

Description

Document structuring method, device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of document processing technologies, and in particular, to a method and apparatus for structuring a document, an electronic device, and a storage medium.

Background

In the existing paper document office scenario, a typical application scenario is that the paper document needs to be scanned into an electronic document so as to facilitate subsequent inquiry, arrangement or storage.

At present, the electronic document is obtained by scanning a paper document into a picture, identifying the content of a text line in the document from the picture based on an OCR (optical character recognition) network, and then performing spatial modeling sequencing on the text line by using preset rules (such as rules from top to bottom and left to right). However, when the text scene of the paper document is changed, the electronic document constructed based on the preset rule cannot accurately reflect the spatial sequence of the document, and the robustness is poor; in addition, the electronic documents obtained by the method cannot be arranged in batches, for example, when the title directory information of the electronic documents needs to be extracted in batches automatically, the title directory information cannot be extracted from the electronic documents effectively.

Disclosure of Invention

The invention provides a document structuring method, a document structuring device, electronic equipment and a storage medium, which are used for solving the defects that document structuring robustness is poor and electronic documents cannot be sorted in batches in the prior art.

The invention provides a document structuring method, which comprises the following steps:

extracting visual features of each text line in the target document;

based on the visual features of each text line, performing structural relationship decoding on the text lines line by line and performing structured type decoding based on the structural relationship, to obtain the structural relationship among the text lines and the structured type of each text line;

and structuring the target document based on the structural relationship among the text lines and the structured type of each text line.

According to the document structuring method provided by the invention, performing, based on the visual features of each text line, the line-by-line structural relationship decoding and the structured type decoding based on the structural relationship, to obtain the structural relationship among the text lines and the structured type of each text line, comprises the following steps:

based on the visual features of each text line, performing structural relationship decoding on the current text line to obtain the correlation between the parent node of the current text line and each text line;

determining the parent node text line of the current text line based on the correlation between the parent node of the current text line and each text line;

and performing structured type decoding on the current text line based on the visual features of each text line and the correlation between the parent node of the current text line and each text line, to obtain the structured type of each text line.

According to the document structuring method provided by the invention, performing structural relationship decoding on the current text line based on the visual features of each text line, to obtain the correlation between the parent node of the current text line and each text line, comprises the following steps:

determining a parent node state feature of the current text line based on the visual feature of the current text line and the state feature of the previous text line of the current text line, wherein the state feature of the previous text line is determined based on the visual feature of the previous text line and the correlation between the parent node of the previous text line and each text line;

and determining the correlation between the parent node of the current text line and each text line based on the parent node state feature of the current text line and the visual features of each text line.

According to the document structuring method provided by the invention, performing structured type decoding on the current text line based on the visual features of each text line and the correlation between the parent node of the current text line and each text line, to obtain the structured type of each text line, comprises the following steps:

determining a parent node attention representation of the current text line based on the correlation between the parent node of the current text line and each text line, and on the visual features of each text line;

and performing structured type decoding on the current text line based on the visual feature of the current text line and the parent node attention representation of the current text line, to obtain the structured type of the current text line.

According to the document structuring method provided by the invention, performing structured type decoding on the current text line based on the visual feature of the current text line and the parent node attention representation of the current text line, to obtain the structured type of the current text line, comprises the following steps:

determining a state feature of the current text line based on the parent node attention representation of the current text line and the parent node state feature of the current text line, wherein the parent node state feature of the current text line is determined based on the visual feature of the current text line and the state feature of the previous text line of the current text line;

and determining the structured type of the current text line based on the state feature of the current text line and the visual feature of the current text line.

According to the document structuring method provided by the invention, extracting the visual features of each text line in the target document comprises the following steps:

extracting the overall visual features of the target document;

and extracting the visual features of each text line from the overall visual features based on the sequence information and the position information of each text line in the target document.

According to the document structuring method provided by the invention, extracting the visual features of each text line from the overall visual features based on the sequence information and the position information of each text line in the target document comprises the following steps:

extracting regional visual features of each text line from the overall visual features based on the position information;

and, based on the sequence information, ordering and integrating the regional visual features of each text line, and extracting the visual features of each text line based on the integrated regional visual features.
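The two sub-steps above can be illustrated with a minimal sketch. It assumes the overall visual feature is a small grid of feature vectors, that the position information of a text line is a box in grid coordinates, and that the sequence information is an integer reading-order index. Average pooling inside the box is an assumption made for illustration, since the text does not name the pooling operator, and the names `region_feature` and `line_features` are hypothetical. The further feature extraction over the integrated regional features is not shown.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]
Box = Tuple[int, int, int, int]       # (top, left, bottom, right), end-exclusive


def region_feature(feature_map: FeatureMap, box: Box) -> List[float]:
    """Average-pool the feature vectors inside one text-line box (assumed pooling)."""
    top, left, bottom, right = box
    cells = [feature_map[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not cells:
        raise ValueError(f"empty box: {box}")
    channels = len(cells[0])
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(channels)]


def line_features(feature_map: FeatureMap, lines: List[Tuple[int, Box]]) -> List[List[float]]:
    """lines holds (order_index, box) pairs; the result follows reading order."""
    ordered = sorted(lines, key=lambda item: item[0])
    return [region_feature(feature_map, box) for _, box in ordered]


# Toy 2x4 feature map with 2 channels: channel 0 is the row index, channel 1 the column index.
toy_map = [[[float(r), float(c)] for c in range(4)] for r in range(2)]
# Two text lines, deliberately listed out of reading order.
toy_lines = [(2, (1, 0, 2, 4)), (1, (0, 0, 1, 2))]
features = line_features(toy_map, toy_lines)
print(features)
```

In a real system the ordered regional features would typically be passed to a further encoder, which is where context between neighbouring text lines could be integrated.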

The invention also provides a document structuring device, which comprises:

a feature extraction unit, used for extracting visual features of each text line in the target document;

a structure determining unit, used for performing, based on the visual features of each text line, structural relationship decoding on the text lines line by line and structured type decoding based on the structural relationship, to obtain the structural relationship among the text lines and the structured type of each text line;

and a structure processing unit, used for structuring the target document based on the structural relationship among the text lines and the structured type of each text line.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the document structuring methods described above when executing the computer program.

The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a document structuring method as described in any of the above.

According to the document structuring method, device, electronic equipment and storage medium, the structural relation among text lines and the structuring type of the text lines are determined based on the visual characteristics of the text lines in the target document, and the target document is structured based on the structural relation among the text lines and the structuring type of the text lines, so that the structured target document can accurately represent the spatial structure information among the text lines, and the robustness is high.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow diagram of a document structuring method provided by the present invention;

FIG. 2 is a schematic representation of a sample of a target document provided by the present invention;

FIG. 3 is a schematic diagram of a spatial structure tree of a target document provided by the present invention;

FIG. 4 is a flowchart of step 120 in the document structuring method provided by the present invention;

FIG. 5 is a flowchart of step 121 in the document structuring method provided by the present invention;

FIG. 6 is a flow chart of step 123 in the document structuring method provided by the present invention;

FIG. 7 is a flow chart of step 1232 in the document structuring method provided by the present invention;

FIG. 8 is a flow chart of step 110 in the document structuring method provided by the present invention;

FIG. 9 is a flow chart of step 112 in the document structuring method provided by the present invention;

FIG. 10 is a schematic diagram of a structure based on a document space structure model provided by the present invention;

FIG. 11 is a schematic diagram of a document structuring device provided by the present invention;

FIG. 12 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Most enterprises and institutions need to digitize historical paper documents into electronic documents that are easy to find and manage and are not easily lost. The method commonly used at present is as follows: each page of a paper document is scanned into a picture; the position of each text line in the picture is detected by an OCR detection model; the regional features of each text line are input into an OCR recognition network to obtain the content of all text lines of the document; and finally the text lines are spatially modeled and ordered using preset rules (such as top to bottom and left to right) to obtain the electronic document.

However, this method builds the electronic document mainly from preset rules: for example, the detected and recognized text lines are ordered and modeled from top to bottom and from left to right, or the visual information of each text line is fed into a classification network to classify it into categories such as title, body text, table and picture. When the text scene of the paper document changes, an electronic document that accurately reflects the spatial structure information cannot be obtained from preset rules, so robustness is poor. In addition, an electronic document obtained in this way cannot be further organized: for example, when title directory information needs to be automatically extracted from electronic documents in batches, the information cannot be extracted effectively, because a document modeled with preset rules lacks spatial structure information among text lines.

In this regard, the present invention provides a document structuring method. FIG. 1 is a schematic flow chart of the document structuring method provided by the invention. As shown in FIG. 1, the method comprises the following steps:

Step 110, extracting visual features of each text line in the target document.

Specifically, the target document refers to a document to be structured, and the target document may be a paper document or an electronic document. When the target document is a paper document, an image of the paper document can be obtained through scanning, and then the visual characteristics of the image area where each text line is located are extracted from the image. When the target document is an electronic document, the electronic document can be converted into an image format, then the visual characteristics of the image area where each text line is located are extracted from the image, for example, if the electronic document is in a PDF format, the electronic document in the PDF format is converted into the image format, and then the visual characteristics of the image area where each text line is located are extracted from the electronic document in the image format.

When extracting the visual features of each text line, an image feature extraction algorithm (such as HOG or SIFT) may be used to extract the visual features of each text line from the image of the paper document or from the electronic document in image format. Alternatively, the image of the paper document or the electronic document in image format may be input into a feature extraction model to obtain the visual features of each text line output by the model. The feature extraction model may be built on a residual network such as ResNet50, a BP neural network, or the like.

Step 120, based on the visual features of each text line, performing structural relationship decoding on the text lines line by line and performing structured type decoding based on the structural relationship, to obtain the structural relationship among the text lines and the structured type of each text line.

Specifically, the visual features of each text line may characterize the semantic information and the text scene of the image region in which the text line is located. Structural relationship decoding refers to determining the structural relationships among the text lines based on their visual features, such as the parent node text line of each text line. For example, if the current text line is the first text line under heading 1, the text line corresponding to heading 1 is the parent node text line of the current text line.

The decoding of the structured type refers to determining the structured type of each text line based on the visual characteristics of each text line, for example, if the current text line is a text line corresponding to a primary title, the structured type of the current text line is the primary title; if the current text line is the text line corresponding to the secondary title, the structured type of the current text line is the secondary title; if the current text line is paragraph text, the structured type of the current text line is text.

Because the text lines are spatially related to one another, the structural relationship of a text line depends not only on its own features but also on the features and structural relationships of adjacent text lines. Structural relationship decoding may therefore be performed line by line. For example, the structural relationship of the current text line may be decoded based on the visual features of the current text line and the structural relationship obtained by decoding the previous text line, to obtain the structural relationship between the current text line and the other text lines.

Similarly, the structured type of a text line depends not only on its own features but also on the features of adjacent text lines and, in particular, on the structural relationship between the text line and other text lines. Structured type decoding may therefore also be performed line by line, and when the current text line is decoded, both its own features and its structural relationship information are taken into account. For example, structured type decoding may be performed based on the visual features of the current text line and the structural relationship between the current text line and other text lines, to obtain the structured type of the current text line.

Therefore, based on the visual features of each text line in the target document, the structural relationship among the text lines and the structured type of each text line can be accurately determined. In the traditional method, the electronic document is spatially modeled and ordered based on preset rules, so the spatial structure information among the text lines cannot adapt to changes in the text scene. In contrast, the visual features used in the document structuring method provided by the embodiment of the invention can represent the semantic information and the text scene of each text line in the target document, so that the spatial structure information among the text lines and the structured type of each text line can be accurately determined for target documents in different text scenes, and robustness is higher.

Step 130, structuring the target document based on the structural relationship among the text lines and the structured type of each text line.

Specifically, the structural relationship among the text lines and the structural type of each text line can represent the spatial structure information among the text lines in the target document, and the spatial structure information among the text lines in the target document can be obtained by carrying out structural processing on the target document based on the structural relationship among the text lines and the structural type of each text line, for example, a spatial structure tree of the target document can be constructed.

Further, after the structuring process is performed on the target document, a corresponding downstream task, such as extracting the title structure of the target document, acquiring the document content under the specified title in the target document, or the like, may be performed based on the spatial structure information contained in the structured target document.

According to the document structuring method provided by the embodiment of the invention, based on the visual characteristics of each text line in the target document, the structural relation among the text lines and the structuring type of each text line are determined, and based on the structural relation among the text lines and the structuring type of each text line, the target document is structured, so that the structured target document can accurately represent the spatial structure information among the text lines, and the robustness is high. In addition, the structural type of each text line is obtained by decoding the structural type of each text line based on the structural relationship, namely, when the structural type of the current text line is determined, the characteristics of the current text line are considered, and the structural relationship between the current text line and other text lines is considered, so that the structural type of the current text line obtained by decoding has higher robustness.

In addition, the embodiment of the invention can adopt a document structuring model, such as an encoder-decoder framework to acquire the structural relation among the text lines and the structuring type of the text lines, and the method specifically comprises the following steps: inputting a target document (such as a scanned paper document image or an electronic document converted into an image format) into an encoder in a document structuring model for encoding, extracting visual features of each text line, and then inputting the visual features of each text line into a decoder in the document structuring model for decoding to obtain structural relations among each text line and structuring types of each text line.

Before inputting the image corresponding to the target document into the document structured model, the document structured model can be trained in advance, and the method can be realized by executing the following steps: first, a large number of sample target documents are collected, and sample structural relations among corresponding text lines and sample structural type labels of the text lines are determined through manual labeling. And then training the initial model based on the sample target document, the sample structural relation among the text lines and the sample structural type labels of the text lines, so as to obtain a document structural model.

Based on the above embodiment, the spatial structure tree of the target document is constructed based on the following process:

As shown in FIG. 2, the target document is composed of 36 text lines. The 1st text line (id 1) belongs to the document name class; the 2nd, 7th, 11th, 16th, 25th and 32nd text lines belong to the primary title class; the 17th and 20th text lines belong to the secondary title class; and the remaining text lines belong to the text class. The document name is "xx verification function optimization project contract". The document comprises six primary titles ("contract standard", "application scope", "contract payment mode", "contract demand, progress and workload control", "demand change" and "contract effect"). Under each primary title there is either paragraph text (such as the paragraph text under the primary title "contract standard") or secondary titles (such as the secondary titles "personnel composition" and "schedule" under the primary title "contract demand, progress and workload control"). The spatial structure information among the text lines in the document can be determined based on this information.

As shown in FIG. 3, before the spatial structure tree of the target document is constructed, the priority of the structured type of each text line may be determined. For example, if the structured types include document name (title), primary title (header 1), secondary title (header 2), tertiary title (header 3), quaternary title (header 4), fifth-level title (header 5), sixth-level title (header 6), header (header), footer (footer), table (table), picture (picture) and text (text), the priority relationship is: document name > primary title > secondary title > tertiary title > quaternary title > fifth-level title > sixth-level title > header = footer = table = picture = text (paragraph first line) > text (other).

The parent node of each text line is the nearest preceding text line whose priority is higher than its own. For example, for the secondary-title text line id20, the preceding text lines with higher priority are id1, id2, id7, id11 and id16, and id16 is the nearest of these, so the parent node text line of id20 is id16, and the structured type of id20 is secondary title (header 2). Similarly, the parent node text line of the primary-title text line id16 is id1, and the structured type of id16 is primary title (header 1). In addition, for a paragraph composed of multiple text lines, such as the paragraph formed by text lines id27-id30 in FIG. 2, the parent node of every text line in the paragraph other than the first is the first text line of the paragraph. The parent node text line of id28-id30 is therefore id27, and the structured type of id28-id30 is text (text).
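The labelling rule just described (nearest preceding text line with strictly higher priority) can be sketched as follows. This is an illustration of the rule only, not of the decoding model. The numeric priority values, the function name `assign_parents` and the label `text_first` (used here to mark the first line of a paragraph) are hypothetical, header 5 is ranked between header 4 and header 6, and a parent id of 0 is used to mean "no parent". The toy input is the structured types of id1-id10.

```python
# Larger number = higher priority. The values only encode the ordering.
PRIORITY = {
    "title": 8,
    "header 1": 7, "header 2": 6, "header 3": 5,
    "header 4": 4, "header 5": 3, "header 6": 2,
    # Page header, footer, table, picture and paragraph-first-line text share one level.
    "header": 1, "footer": 1, "table": 1, "picture": 1, "text_first": 1,
    "text": 0,  # non-first lines of a paragraph
}


def assign_parents(types):
    """types[i] is the structured type of text line id i+1.

    Returns one parent id per text line, where 0 means the line has no parent.
    """
    parents = []
    for i, line_type in enumerate(types):
        parent_id = 0
        # Scan backwards for the nearest line with strictly higher priority.
        for j in range(i - 1, -1, -1):
            if PRIORITY[types[j]] > PRIORITY[line_type]:
                parent_id = j + 1
                break
        parents.append(parent_id)
    return parents


types_1_to_10 = [
    "title", "header 1", "text_first", "text", "text_first",
    "text", "header 1", "text_first", "text", "text",
]
parents = assign_parents(types_1_to_10)
print(parents)
```

With these inputs the rule yields the parent ids 0, 1, 2, 3, 2, 5, 1, 7, 8, 8 for id1-id10.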

Thus, based on the above method, the subtree sequence of the spatial structure tree can be determined. For example, the subtree sequence of id1-id10 in FIG. 3 can be expressed in order as (1, 0, title), (2, 1, header 1), (3, 2, text), (4, 3, text), (5, 2, text), (6, 5, text), (7, 1, header 1), (8, 7, text), (9, 8, text), (10, 8, text), where each triple gives the id of the text line, the id of its parent node text line, and its structured type.

The parent node text lines corresponding to the text lines can represent the structural relationship among the text lines, so that modeling is performed based on the parent node text lines corresponding to the text lines and the structural types of the text lines, a spatial structure tree of the target document shown in fig. 3 can be obtained, and the spatial structure tree contains spatial structure information among the text lines, and can be used by downstream tasks, such as extracting a document title structure, acquiring document contents under a specified title, and the like.
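As a minimal sketch of how such a tree might serve the two downstream tasks mentioned, the code below builds a child index from (id, parent id, type) triples and then extracts the title structure and the content under a given title. The triples are the ones listed for id1-id10; the function names are hypothetical, and a parent id of 0 is assumed to mark the root.

```python
from collections import defaultdict

nodes = [
    (1, 0, "title"), (2, 1, "header 1"), (3, 2, "text"), (4, 3, "text"),
    (5, 2, "text"), (6, 5, "text"), (7, 1, "header 1"), (8, 7, "text"),
    (9, 8, "text"), (10, 8, "text"),
]


def build_children(nodes):
    """Map each parent id to the list of its child ids, in document order."""
    children = defaultdict(list)
    for node_id, parent_id, _ in nodes:
        children[parent_id].append(node_id)
    return children


def title_outline(nodes):
    """Return (depth, id) pairs for the document name and numbered titles."""
    types = {node_id: node_type for node_id, _, node_type in nodes}
    children = build_children(nodes)
    outline = []

    def visit(node_id, depth):
        # "header " with a trailing space matches "header 1".."header 6"
        # but not the page-header type "header".
        if types[node_id] == "title" or types[node_id].startswith("header "):
            outline.append((depth, node_id))
        for child in children.get(node_id, []):
            visit(child, depth + 1)

    for root in children.get(0, []):
        visit(root, 0)
    return outline


def content_under(nodes, heading_id):
    """Return the ids of all descendants of heading_id, in pre-order."""
    children = build_children(nodes)
    result = []
    stack = list(reversed(children.get(heading_id, [])))
    while stack:
        node_id = stack.pop()
        result.append(node_id)
        stack.extend(reversed(children.get(node_id, [])))
    return result


print(title_outline(nodes))
print(content_under(nodes, 2))
```

Because every parent precedes its children in the sequence, the pre-order traversal returns text lines in their original document order.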

Based on any of the above embodiments, FIG. 4 is a schematic flow chart of step 120 in the document structuring method provided by the present invention. As shown in FIG. 4, step 120 includes:

Step 121, based on the visual features of each text line, performing structural relationship decoding on the current text line to obtain the correlation between the parent node of the current text line and each text line.

Specifically, the correlation between the parent node of the current text line and each text line characterizes how strongly the parent node of the current text line is related to that text line: the higher the correlation, the higher the probability that the text line is the parent node of the current text line. The visual features of each text line contain the semantic information of the text line, so structural relationship decoding of the current text line based on these visual features yields the correlation between the parent node of the current text line and each text line.

Step 122, determining the text line of the parent node of the current text line based on the correlation between the parent node of the current text line and each text line.

Specifically, as described above, the higher the correlation between the parent node of the current text line and a text line, the higher the probability that this text line is the parent node of the current text line. The correlation thus characterizes the structural relationship between the current text line and the other text lines. Optionally, the text line with the greatest correlation may be taken as the parent node text line of the current text line.

Step 123, based on the visual characteristics of each text line and the correlation between the parent node of the current text line and each text line, decoding the structured type of the current text line to obtain the structured type of each text line.

Specifically, the visual features of each text line include the semantic information of the text line, and the correlation between the parent node of the current text line and each text line represents how likely each text line is to be that parent node. When structured type decoding is performed on the current text line based on these two inputs, the state feature of the current text line can be obtained, and the structured type of the current text line is determined based on this state feature.

Here, since the correlation between the parent node of the current text line and each text line represents the structural relationship between the current text line and other text lines, step 123 takes this structural relationship into account when decoding the structured type of the current text line, so that the decoded structured type is more stable.

Based on any of the above embodiments, FIG. 5 is a schematic flow chart of step 121 in the document structuring method provided by the present invention. As shown in FIG. 5, step 121 includes:

Step 1211, determining a parent node state feature of the current text line based on the visual feature of the current text line and the state feature of the previous text line of the current text line, wherein the state feature of the previous text line is determined based on the visual feature of the previous text line and the correlation between the parent node of the previous text line and each text line;

Step 1212, determining the correlation between the parent node of the current text line and each text line based on the parent node state feature of the current text line and the visual features of each text line.

Specifically, the state feature of the previous text line of the current text line includes the feature information of the previous text line, and the parent node feature information of the current text line, that is, the parent node state feature of the current text line, can be determined in combination with the visual feature of the current text line. Wherein the status feature of the previous text line is determined based on the visual features of the previous text line and the correlation between the parent node of the previous text line and each text line.

After the parent node state feature of the current text line is obtained, it is compared with the visual features of each text line. The better the information contained in the visual features of a text line matches the parent node state feature of the current text line, the higher the probability that this text line is the parent node of the current text line, that is, the higher the correlation between the parent node of the current text line and this text line.

Optionally, embodiments of the present invention may obtain the parent node state feature of the current text line through a gated recurrent unit (GRU). For example, the visual feature of the current text line and the state feature of the previous text line of the current text line are input into the Parent GRU to obtain the parent node state feature of the current text line.
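As an illustration of the Parent GRU step, the following is a minimal sketch of one standard GRU update written with the Python standard library. The names `v_t`, `s_prev` and `h_parent`, the dimensions, the random weights and the omission of bias terms are all assumptions made for this example; the text does not give the network's actual configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gru_step(x, h_prev, params):
    """One GRU update without biases; returns the new hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h_prev))]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, gated))]
    # One common convention: interpolate between the old state and the candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
feat_dim, hidden_dim = 4, 3  # illustrative sizes only
# Even-indexed matrices act on the input, odd-indexed ones on the hidden state.
params = tuple(
    random_matrix(hidden_dim, feat_dim if k % 2 == 0 else hidden_dim, rng)
    for k in range(6)
)

v_t = [0.5, -0.2, 0.1, 0.9]   # visual feature of the current text line (made up)
s_prev = [0.0, 0.0, 0.0]      # state feature of the previous text line (made up)
h_parent = gru_step(v_t, s_prev, params)  # parent node state feature
```

In a trained model the weight matrices would be learned rather than drawn at random.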

Next, an attention transformation is performed between the parent node state feature of the current text line and a feature map A (the feature map A is obtained by splicing the visual features of each text line) to obtain the correlation between the parent node of the current text line and each text line.

In the specific implementation of this attention transformation, a_i denotes the i-th text line in feature map A, and the weights applied in the transformation are learnable parameters.
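One way such an attention transformation could be realized is additive attention followed by a softmax. The sketch below is an assumption for illustration, not the formula of the present invention: the parameter names `W_h`, `W_a` and `w_out`, the tanh scoring function and the softmax normalization are all hypothetical choices.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def parent_correlation(h_parent, feature_map, W_h, W_a, w_out):
    """Score h_parent against every row a_i of feature map A, then normalize."""
    proj_h = matvec(W_h, h_parent)
    scores = []
    for a_i in feature_map:
        proj_a = matvec(W_a, a_i)
        hidden = [math.tanh(p + q) for p, q in zip(proj_h, proj_a)]
        scores.append(sum(w * x for w, x in zip(w_out, hidden)))
    return softmax(scores)


# Toy inputs: three text lines with two-dimensional visual features.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha = parent_correlation([1.0, 0.0], A, identity, identity, [1.0, 1.0])
```

With a softmax, the correlations are non-negative and sum to one, so they can be read as a probability distribution over candidate parent text lines.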

Based on any of the above embodiments, fig. 6 is a schematic flow chart of step 123 in the document structuring method provided by the present invention, and as shown in fig. 6, step 123 includes:

step 1231, determining a parent node attention representation of the current text line based on the correlation between the parent node of the current text line and each text line, and the visual characteristics of each text line;

step 1232, based on the visual features of the current text line and the attention representation of the parent node of the current text line, decoding the structured type of the current text line to obtain the structured type of the current text line.

Specifically, after the correlation between the parent node of the current text line and each text line is obtained, the correlation may be used as a weight and combined with the visual features of each text line to determine the parent node attention representation of the current text line. For example, the parent node attention representation of the current text line may be calculated as a combination of the visual features of the text lines weighted by these correlations.

The parent node attention representation of the current text line contains the contextual feature information of the parent node of the current text line; decoding the structured type in combination with the visual features of the current text line yields the structured type of the current text line.
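Using the correlations as weights can be sketched as a weighted sum over the visual features of the text lines. The weighted-sum form and the name `attention_representation` are assumptions for this example, since the text only states that the correlation is used as a weight.

```python
def attention_representation(correlations, feature_map):
    """Weighted sum of the rows of feature_map, weighted by correlations."""
    if len(correlations) != len(feature_map):
        raise ValueError("one weight per text line is required")
    dim = len(feature_map[0])
    return [
        sum(w * row[d] for w, row in zip(correlations, feature_map))
        for d in range(dim)
    ]


# Two text lines with made-up two-dimensional visual features.
context = attention_representation([0.25, 0.75], [[4.0, 0.0], [0.0, 8.0]])
```

Here the second text line carries three quarters of the weight, so the result is dominated by its features.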

Based on any of the above embodiments, fig. 7 is a schematic flow chart of step 1232 in the document structuring method provided by the present invention, and as shown in fig. 7, step 1232 includes:

step 1232-1, determining a status feature of the current text line based on the parent node attention representation of the current text line and the parent node status feature of the current text line; the parent node state characteristics of the current text line are determined based on the visual characteristics of the current text line and the state characteristics of the previous text line of the current text line;

step 1232-2, determining the structured type of the current line of text based on the status features of the current line of text and the visual features of the current line of text.

Specifically, the parent node attention representation of the current text line includes the contextual feature information of the parent node of the current text line, and the parent node state feature of the current text line includes the feature information of the parent node; the state feature of the current text line can be determined based on both. The parent node state feature of the current text line is determined based on the visual feature of the current text line and the state feature of the previous text line of the current text line.

The visual features of the current text line comprise contextual feature information of the current text line, and the structural type of the current text line is determined by combining the state features of the current text line.

Optionally, in embodiments of the present invention, the parent node attention representation of the current text line and the parent node state feature of the current text line may be input to a Child GRU, which parses them to obtain the state feature of the current text line.

Then, the state feature of the current text line and the visual feature of the current text line are jointly parsed to obtain the structured type of the current text line, where the weight used in this parsing is a learnable parameter.
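The joint parsing of the state feature and the visual feature can be illustrated by a linear layer over their concatenation followed by a softmax. This is a sketch under stated assumptions: the concatenation, the single weight matrix `W_out`, and the label set are hypothetical and are not taken from the text.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_line(state, visual, W_out, labels):
    """Predict a structured type from the state and visual features."""
    joint = state + visual  # list concatenation of the two feature vectors
    logits = [sum(w * x for w, x in zip(row, joint)) for row in W_out]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


labels = ["title", "paragraph", "list_item"]  # hypothetical structured types
W_out = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
]
label, probs = classify_line([1.0, 0.0], [0.0, 1.0], W_out, labels)
```

In practice `W_out` would be learned together with the rest of the decoder.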

Based on any of the above embodiments, fig. 8 is a schematic flow chart of step 110 in the document structuring method provided by the present invention, and as shown in fig. 8, step 110 includes:

step 111, extracting the integral visual characteristics of the target document;

step 112, extracting the visual features of each text line from the overall visual features based on the sequence information and the position information of each text line in the target document.

Specifically, the feature information of all text lines in the target document is integrated in the overall visual feature, so that the regional visual feature of each text line can be extracted based on the position information of each text line in the target document (such as the coordinates of each text line in the target document). Based on the regional visual features and the sequence information of each text line (such as text lines are arranged from front to back according to page numbers, the same page numbers are arranged from top to bottom and from left to right), the visual features used for representing the context information of each text line can be extracted from the regional features of each text line.
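The ordering rule described above (by page number, then top to bottom, then left to right) can be sketched as a sort key. The `TextLine` record and its field names are hypothetical; coordinates are assumed to have their origin at the top-left corner of the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLine:
    page: int
    left: float
    top: float
    text: str = ""


def reading_order(lines):
    """Sort by page, then top to bottom, then left to right."""
    return sorted(lines, key=lambda ln: (ln.page, ln.top, ln.left))


lines = [
    TextLine(page=2, left=50, top=40, text="page two heading"),
    TextLine(page=1, left=300, top=100, text="right of first row"),
    TextLine(page=1, left=50, top=100, text="left of first row"),
    TextLine(page=1, left=50, top=60, text="page one heading"),
]
ordered = [ln.text for ln in reading_order(lines)]
```

A plain `(page, top, left)` key is a simplification: it treats lines as being on the same row only when their `top` values are exactly equal, and it does not handle multi-column layouts.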

Based on any of the above embodiments, fig. 9 is a schematic flow chart of step 112 in the document structuring method provided by the present invention, and as shown in fig. 9, step 112 includes:

step 1121, extracting regional visual features of each text line from the overall visual features based on the position information;

step 1122, based on the sequence information, sorting and integrating the regional visual features of each text line, and extracting the visual features of each text line based on the integrated regional visual features of each text line.

Specifically, the location information is used to characterize the coordinate location of each text line in the target document, so based on the location information, the area where each text line is located can be determined from the overall visual characteristics, and the visual characteristics of the area of each text line in the corresponding area can be extracted.

The sequence information is used for representing the context sequence information of each text line, so that based on the sequence information, the regional visual features of each text line can be integrated in a sequencing way, and the regional visual features of each text line after the sequencing integration are extracted in a feature way, so that the visual features used for representing the context information of each text line are obtained.

Optionally, the target document may be input into a ResNet50 network to extract the overall visual feature map; the labeled text line coordinates are then used to crop the regional visual features corresponding to each text line from the overall visual feature map, and these regional visual features are input into a bidirectional GRU network to extract the visual features of each text line. The visual features of each text line thus include the context information of that text line, which may enhance their robustness.
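Cropping a regional feature from the overall feature map can be sketched as follows. The sketch assumes a feature map stored as rows of per-cell channel vectors, a uniform `stride` between image pixels and feature-map cells, and average pooling of the covered cells into one vector; the stride value and the pooling step are illustrative assumptions, not details given in the text.

```python
import math


def crop_and_pool(feature_map, box, stride):
    """Average the feature vectors of the cells covered by a text line box.

    feature_map: list of rows, each row a list of channel vectors.
    box: (left, top, right, bottom) in image pixels.
    stride: image pixels per feature-map cell (assumed uniform).
    """
    left, top, right, bottom = box
    height, width = len(feature_map), len(feature_map[0])
    x0 = max(0, min(width - 1, int(left // stride)))
    y0 = max(0, min(height - 1, int(top // stride)))
    # Cover at least one cell, and never run past the map boundary.
    x1 = max(x0 + 1, min(width, math.ceil(right / stride)))
    y1 = max(y0 + 1, min(height, math.ceil(bottom / stride)))
    cells = [feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    channels = len(cells[0])
    return [sum(c[k] for c in cells) / len(cells) for k in range(channels)]


# A 2x2 feature map with one channel per cell.
fmap = [[[1.0], [3.0]], [[5.0], [7.0]]]
top_row = crop_and_pool(fmap, (0, 0, 64, 32), stride=32)
bottom_right = crop_and_pool(fmap, (32, 32, 64, 64), stride=32)
```

The pooled vectors, ordered by the sequence information, would then be the inputs to the bidirectional GRU network.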

Based on any one of the above embodiments, the present invention further provides a document structuring method, and fig. 10 is a schematic structural diagram based on a document space structure model provided by the present invention, as shown in fig. 10, where in the embodiment of the present invention, a structuring process is performed on a target document by using a document space structure model (encoder+decoder), and specifically includes:

Firstly, the target document is converted into a picture format and input into the ResNet50 network module in the Encoder to extract the overall visual feature map of the target document. Using the coordinate labels of all text lines in the target document, the clipping module crops the overall visual feature map to obtain the regional visual features of all text lines. These regional visual features are input into the bidirectional GRU network module in the Encoder, which extracts the context information of all text lines to obtain the visual features of all text lines.

Then, the visual feature of the current text line and the state feature of the previous text line of the current text line are input into the Parent GRU module in the Decoder to obtain the parent node state feature of the current text line.

After the parent node state feature of the current text line is obtained, it is input, together with the feature map A obtained by splicing the visual features of each text line, into the Parent Attention module in the Decoder. The Parent Attention module performs an attention transformation between the parent node state feature of the current text line and the feature map A to obtain the correlation between the parent node of the current text line and each text line, as well as the parent node attention representation of the current text line.

Next, the parent node attention representation of the current text line and the parent node state feature of the current text line are input to the Child GRU module, which parses them to obtain the state feature of the current text line.

Finally, the Child GRU module parses the state feature of the current text line together with the visual feature of the current text line to obtain the structured type of the current text line.
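The data flow of the Decoder described above can be summarized as a line-by-line loop. The sketch below only shows how the modules are chained; the four callables are placeholders supplied by the caller, and the toy stand-ins used in the example are invented for illustration and carry no modelling meaning.

```python
def decode_document(visual_features, initial_state, parent_gru,
                    parent_attention, child_gru, classify):
    """Decode a parent index and a structured type for every text line."""
    state = initial_state
    parents, types = [], []
    for v_t in visual_features:
        h_parent = parent_gru(v_t, state)                  # Parent GRU
        alpha, context = parent_attention(h_parent, visual_features)
        parents.append(max(range(len(alpha)), key=lambda i: alpha[i]))
        state = child_gru(context, h_parent)               # Child GRU
        types.append(classify(state, v_t))                 # structured type
    return parents, types


# Toy stand-ins so that the loop can be run end to end.
def toy_parent_gru(v_t, state):
    return [v_t[0] + state[0]]


def toy_parent_attention(h_parent, feature_map):
    n = len(feature_map)
    alpha = [1.0 / n] * n
    context = [sum(row[0] for row in feature_map) / n]
    return alpha, context


def toy_child_gru(context, h_parent):
    return [0.5 * (context[0] + h_parent[0])]


def toy_classify(state, v_t):
    return "heading" if v_t[0] < 2.0 else "body"


parents, types = decode_document(
    [[1.0], [2.0], [3.0]], [0.0],
    toy_parent_gru, toy_parent_attention, toy_child_gru, toy_classify,
)
```

The state produced for one text line is carried into the Parent GRU of the next, which is how the previously decoded structural relationship influences later text lines.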

The loss function of the document space structure model is as follows:

Loss = λ₁ξ_p + λ₂ξ_c

wherein the ground-truth quantities used in the loss are the true vector representation of the parent node corresponding to the text line at time t and the true value of the structured type of the text line at time t, and λ₁ and λ₂ are parameters that can be set according to the actual situation, e.g. λ₁ = λ₂ = 1.
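The weighted combination of the two loss terms can be sketched as follows. The use of cross-entropy for both ξ_p and ξ_c, and the representation of the ground truth as target indices rather than vectors, are assumptions made for this example; the text specifies only the weighted sum and the ground-truth quantities.

```python
import math


def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[target_index], eps))


def total_loss(parent_probs, parent_targets, type_probs, type_targets,
               lambda_1=1.0, lambda_2=1.0):
    """Loss = lambda_1 * xi_p + lambda_2 * xi_c, summed over time steps t."""
    xi_p = sum(cross_entropy(p, t) for p, t in zip(parent_probs, parent_targets))
    xi_c = sum(cross_entropy(p, t) for p, t in zip(type_probs, type_targets))
    return lambda_1 * xi_p + lambda_2 * xi_c


# One text line: predicted parent distribution and predicted type distribution.
loss = total_loss(
    parent_probs=[[0.5, 0.5]], parent_targets=[0],
    type_probs=[[0.25, 0.75]], type_targets=[1],
)
```

Setting `lambda_1` and `lambda_2` to different values shifts the emphasis between the parent prediction and the structured type prediction.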

The document structuring device provided by the invention is described below, and the document structuring device described below and the document structuring method described above can be referred to correspondingly.

Based on any one of the above embodiments, the present invention further provides a document structuring device, and fig. 11 is a schematic structural diagram of the document structuring device provided by the present invention, as shown in fig. 11, where the device includes:

a feature extraction unit 1110 for extracting visual features of each text line in the target document;

a structure determining unit 1120, configured to decode the structural relationship of each text line by line and decode the structural type based on the structural relationship based on the visual feature of each text line, so as to obtain the structural relationship between each text line and the structural type of each text line;

and a structure processing unit 1130, configured to perform a structuring process on the target document based on the structural relationship between the text lines and the structuring type of the text lines.
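As an illustration of what the structure processing unit might produce, the sketch below assembles a nested tree from a parent index and a structured type per text line. The output format, the use of -1 to mark a line without a parent, and the sample labels are hypothetical; the text does not prescribe a concrete data structure.

```python
def build_tree(parents, types, texts):
    """Nest text lines under their parents; -1 marks a top-level line."""
    nodes = [
        {"type": t, "text": x, "children": []}
        for t, x in zip(types, texts)
    ]
    roots = []
    for i, p in enumerate(parents):
        if p == i:
            raise ValueError(f"text line {i} cannot be its own parent")
        if p < 0:
            roots.append(nodes[i])
        else:
            nodes[p]["children"].append(nodes[i])
    return roots


tree = build_tree(
    parents=[-1, 0, 0, -1],
    types=["heading", "body", "body", "heading"],
    texts=["1 Scope", "First paragraph.", "Second paragraph.", "2 Terms"],
)
```

Each top-level entry then carries its subordinate text lines in document order.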

Based on any of the above embodiments, the structure determining unit 1120 includes:

a correlation determining unit, configured to decode the structural relationship of the current text line based on the visual features of each text line to obtain the correlation between the parent node of the current text line and each text line;

a parent node text line determining unit, configured to determine a parent node text line of the current text line based on the correlation between the parent node of the current text line and each text line;

and a structuring type determining unit, configured to decode the structured type of the current text line based on the visual features of each text line and the correlation between the parent node of the current text line and each text line to obtain the structured type of each text line.

Based on any one of the above embodiments, the correlation determination unit includes:

a parent node state feature determining unit, configured to determine a parent node state feature of the current text line based on the visual feature of the current text line and a state feature of a previous text line of the current text line; the state characteristics of the previous text line are determined based on the visual characteristics of the previous text line and the correlation between the parent node of the previous text line and the text lines;

and a first determining unit, configured to determine the correlation between the parent node of the current text line and each text line based on the parent node state feature of the current text line and the visual features of each text line.

Based on any of the above embodiments, the structuring type determining unit comprises:

an attention representation unit for determining a parent node attention representation of the current text line based on a correlation between the parent node of the current text line and the text lines and visual features of the text lines;

and a second determining unit, configured to decode the structured type of the current text line based on the visual features of the current text line and the parent node attention representation of the current text line to obtain the structured type of the current text line.

Based on any one of the above embodiments, the second determining unit includes:

a text line state determining unit, configured to determine a state feature of the current text line based on a parent node attention representation of the current text line and a parent node state feature of the current text line; the parent node state characteristics of the current text line are determined based on the visual characteristics of the current text line and the state characteristics of a previous text line of the current text line;

and a third determining unit, configured to determine a structured type of the current text line based on the state feature of the current text line and the visual feature of the current text line.

Based on any of the above embodiments, the feature extraction unit 1110 includes:

the overall characteristic extraction unit is used for extracting overall visual characteristics of the target document;

and the visual feature extraction unit is used for extracting the visual features of the text lines from the integral visual features based on the sequence information and the position information of the text lines in the target document.

Based on any one of the above embodiments, the visual feature extraction unit includes:

a region feature extraction unit, configured to extract a region visual feature of each text line from the overall visual feature based on the location information;

and the visual feature extraction subunit is used for sorting and integrating the regional visual features of each text line based on the sequence information and extracting the visual features of each text line based on the integrated regional visual features of each text line.

Fig. 12 is a schematic structural diagram of an electronic device according to the present invention. As shown in fig. 12, the electronic device may include: a processor 1210, a memory 1220, a communication interface 1230 and a communication bus 1240, wherein the processor 1210, the memory 1220 and the communication interface 1230 communicate with each other via the communication bus 1240. The processor 1210 may invoke logic instructions in the memory 1220 to perform a document structuring method comprising: extracting visual characteristics of each text line in the target document; decoding the structural relation of each text line by line and decoding the structural type based on the structural relation based on the visual characteristics of each text line to obtain the structural relation among each text line and the structural type of each text line; and carrying out structuring processing on the target document based on the structural relation among the text lines and the structuring type of the text lines.

Further, the logic instructions in the memory 1220 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the document structuring method provided above, the method comprising: extracting visual characteristics of each text line in the target document; decoding the structural relation of each text line by line and decoding the structural type based on the structural relation based on the visual characteristics of each text line to obtain the structural relation among each text line and the structural type of each text line; and carrying out structuring processing on the target document based on the structural relation among the text lines and the structuring type of the text lines.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the document structuring method provided above, the method comprising: extracting visual characteristics of each text line in the target document; decoding the structural relation of each text line by line and decoding the structural type based on the structural relation based on the visual characteristics of each text line to obtain the structural relation among each text line and the structural type of each text line; and carrying out structuring processing on the target document based on the structural relation among the text lines and the structuring type of the text lines.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of structuring a document, comprising:

extracting visual characteristics of each text line in the target document;

based on the structural relation among the text lines and the structural type of the text lines, carrying out structural processing on the target document;

decoding the text lines row by row structural relationship based on the visual features of the text lines, including: based on the visual characteristics of each text line, decoding the structural relation of the current text line to obtain the correlation between the father node of the current text line and each text line;

based on the visual characteristics of each text line, decoding the structural relation of the current text line to obtain the correlation between the father node of the current text line and each text line, including:

2. The document structuring method according to claim 1, wherein the decoding the structural relationship of each text line by line and the decoding the structural type based on the structural relationship based on the visual features of each text line to obtain the structural relationship between each text line and the structural type of each text line comprises:

3. The method for structuring a document according to claim 2, wherein said decoding the structured type of the current text line based on the visual features of the text lines and the correlation between the parent node of the current text line and the text lines to obtain the structured type of the text lines comprises:

4. A document structuring method according to claim 3, wherein said decoding the structured type of the current text line based on the visual features of the current text line and the parent node attention representation of the current text line to obtain the structured type of the current text line comprises:

5. The document structuring method according to any one of claims 1 to 4, wherein the extracting visual features of each text line in the target document comprises:

extracting the integral visual characteristics of the target document;

6. The document structuring method according to claim 5, wherein the extracting visual features of each text line from the overall visual features based on the sequence information and the position information of each text line in the target document comprises:

7. A document structuring apparatus, comprising:

The structure processing unit is used for carrying out structural processing on the target document based on the structural relation among the text lines and the structural type of the text lines;

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the document structuring method according to any one of claims 1 to 6 when the program is executed.

9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the document structuring method according to any one of claims 1 to 6.