WO2018161764A1

WO2018161764A1 - Document reading-order detection method, computer device, and storage medium

Info

Publication number: WO2018161764A1
Application number: PCT/CN2018/075626
Authority: WO
Inventors: 朱传聪
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2017-03-08
Filing date: 2018-02-07
Publication date: 2018-09-13
Also published as: CN108334805A; CN108334805B

Abstract

A document reading-order detection method comprises: a computer device identifying text blocks in a document image and constructing a block set; determining a start text block from the block set; performing, according to feature information of the start text block, a path searching operation on the start text block to determine a first text block of the block set corresponding to the start text block, the feature information of a text block comprising position information of the text block in the document image and layout information of the text block; iteratively performing the above steps until an order of execution of the path searching operations respectively corresponding to the text blocks in the block set can be uniquely determined; and determining the order of execution of the path searching operations corresponding to the text blocks in the block set, and obtaining, according to the order of execution, a reading-order of the text blocks in the document image.

Description

Method, computer device and storage medium for detecting document reading order

The present application claims priority to the Chinese patent application entitled "Method and Apparatus for Detecting the Reading Order of Documents", filed with the Chinese Patent Office on March 8, 2017, the entire disclosure of which is hereby incorporated by reference.

Technical field

The present application relates to the field of computer technology, and in particular, to a method, a computer device and a storage medium for detecting a reading order of a document.

Background

OCR (Optical Character Recognition) refers to a class of algorithms for recognizing document images. The text of a paper document is converted optically into a black-and-white dot-matrix image file, and recognition software then converts the text in that image into a text format for further processing by word-processing software.

In OCR technology, methods based on directed graphs, fixed rules, and semantic analysis are commonly used to identify the reading order of documents. However, in complex environments or for complex document images, these methods have a high error rate when recognizing the reading order, and their recognition performance is unstable.

Summary of the invention

Various embodiments provided in accordance with the present application provide a method, computer device, and storage medium for detecting a reading order of a document.

A method of detecting a reading order of a document, comprising:

The computer device identifies a block of text contained in the document picture to construct a block set;

The computer device determines a starting text block from the set of blocks;

The computer device performs a routing operation on the starting text block according to the feature information of the starting text block to determine a first text block corresponding to the starting text block in the block set, the feature information of a text block including location information of the text block in the document picture and layout information of the text block;

The computer device performs a routing operation on the first text block according to the feature information of the first text block to determine a text block corresponding to the first text block in the block set, and so on, until the execution order of the routing operations corresponding to each text block in the block set can be uniquely determined; and

The computer device determines an execution order of the routing operations corresponding to the text blocks in the block set, and obtains a reading order of the text blocks in the document picture according to the execution order.

A computer device comprising a memory and a processor, the memory storing computer readable instructions, the computer readable instructions being executed by the processor such that the processor performs the following steps:

Identify a block of text contained in the document image to construct a block set;

Determining a starting text block from the set of blocks;

Performing a routing operation on the starting text block according to the feature information of the starting text block to determine a first text block in the block set corresponding to the starting text block, the feature information of a text block including position information of the text block in the document picture and layout information of the text block;

Performing a routing operation on the first text block according to the feature information of the first text block to determine a text block corresponding to the first text block in the block set, and so on, until the execution order of the routing operations corresponding to each text block in the block set can be uniquely determined;

Determining an execution order of the routing operations corresponding to the text blocks in the block set, and obtaining a reading order of the text blocks in the document picture according to the execution order.

One or more non-volatile storage media storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

Determining a starting text block from the set of blocks;

Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below. Other features, objects, and advantages of the invention will be apparent from the description and appended claims.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an application environment of a solution of the present application in an embodiment;

FIG. 2 is a schematic flowchart of a method for detecting a reading order of a document according to an embodiment;

FIG. 3 is a schematic diagram of the text blocks included in a document picture of an embodiment;

FIG. 4 is a schematic diagram of a neural network model of an embodiment;

FIG. 5 is a schematic flowchart of training a neural network model with a training sample according to an embodiment;

FIG. 6 is a schematic structural diagram of an apparatus for detecting a reading order of a document according to an embodiment; and

FIG. 7 is a schematic structural diagram of an apparatus for detecting a reading order of a document according to another embodiment.

Detailed description

In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting.

FIG. 1 is a schematic diagram of an application environment of the solution of the present application in an embodiment. The application environment for the method for detecting a reading order of a document in this embodiment is a smart terminal provided with an OCR system. The smart terminal includes at least a processor, a display module, a power interface, and a memory connected through a system bus, the memory including a non-volatile storage medium and an internal memory. The smart terminal identifies and displays the text information contained in a document picture through the OCR system. The display module can display the text information recognized by the OCR system. The power interface is used for connecting to an external power source, which supplies power to the battery of the smart terminal through the power interface. The non-volatile storage medium stores at least an operating system, an OCR system, a database, and computer readable instructions that, when executed, cause the processor to perform the method for detecting a reading order of a document. The smart terminal may be a mobile phone, a tablet computer, or the like, or another device having the above structure. Those skilled in the art will understand that the structure shown in FIG. 1 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.

In conjunction with FIG. 1 and the above description of the application environment, an embodiment of a method of detecting a document reading order will be described below.

FIG. 2 is a schematic flowchart of a method for detecting a reading order of a document according to an embodiment; as shown in FIG. 2, the method for detecting a reading order of a document in the embodiment includes the following steps:

S110. Identify a text block included in a document picture, and construct a block set.

In this embodiment, the document picture may be binarized to obtain a binarized document picture. In the binarized document picture, the value of each pixel is represented by 0 or 1. Then, based on the binarized document image, the scale analysis and the layout analysis are performed to obtain all the text blocks contained in the document. The scale analysis refers to finding the scale information of each character in the binarized document picture. The scale is in pixels, and the value is the square root of the area of the rectangular area occupied by the characters. Layout analysis refers to an algorithm in OCR that divides the content of a document image into a plurality of non-overlapping regions according to information such as paragraphs and pagination. This will result in all the text blocks contained in the document, as shown in Figure 3 or Figure 5.

In another embodiment, the step of pre-processing the document picture further includes the step of correcting the document picture. That is, if the initial state of the document image to be detected is deviated from the preset standard state, the document picture is corrected to conform to the standard state. For example, if it is detected that there is a tilt, upside down, etc. in the initial state of the document picture, the direction of the document picture needs to be corrected first.

S120. Determine a starting text block from all the text blocks (ie, in the block set).

Generally, people start reading a document from one of its vertices (for example, the upper left corner). Based on this, in one embodiment, a text block whose center-point coordinates lie at a vertex of the document picture can be selected from the block set and determined as the starting text block. For example, the text block located at the top left of the document picture is determined as the starting text block, such as the text block R₁ shown in FIG. 3 or the text block R₁ shown in FIG. 5.

It will be appreciated that in other embodiments, other text blocks may also be determined as the starting text block for different documents and actual reading habits (eg, documents formatted from right to left).

S130. Start path finding from the starting text block: perform a routing operation on the starting text block according to the feature information of the starting text block to determine a first text block in the block set corresponding to the starting text block; perform a routing operation on the first text block according to the feature information of the first text block to determine a text block corresponding to the first text block in the block set; and so on, until the execution order of the routing operations corresponding to each text block in the block set can be uniquely determined.

The feature information of the text block includes location information of the text block in the document image and layout information of the text block.

The routing (path-finding) operation on a text block obtains, from the feature information of that text block, the feature prediction information of the corresponding next text block. In an embodiment, the routing operation on a text block includes: learning the feature information of the text block with a pre-trained machine learning model to obtain feature prediction information of the next text block corresponding to the text block; calculating the correlation between the feature prediction information and the feature information of each text block in the block set on which the routing operation has not yet been performed; and then determining the next text block corresponding to the text block according to the calculated correlations.

In this embodiment, step S130 is a process of automatic path finding over the text blocks included in the document, beginning at the starting text block; each routing operation only needs to determine the next text block corresponding to the current text block. For example, for the document picture shown in FIG. 3, if the current text block is R₁, this routing determines that the next text block of R₁ is R₂; R₂ is then taken as the current text block and routing is performed again, giving R₄ as the next text block of R₂; and so on, until the routing operation is performed on R₆ and the next text block corresponding to R₆ is determined to be R₇. Although the routing operation has not been performed on R₇ and R₈ at this point, since the next text block corresponding to R₆ has been determined to be R₇, the execution order of the routing operations corresponding to R₇ and R₈ can be uniquely determined (i.e., R₇ first and then R₈). This automatic path-finding method is robust to the size and style of the document picture. Moreover, because automatic path finding relies on the positions of the text blocks and their layout information, it can better overcome the influence of image noise or of the recognition environment on the detection results, which helps to ensure the accuracy of the detection results.

In this embodiment, the machine learning model is trained in advance with suitable training samples so that it outputs more accurate predictions; an accurate next text block can then be determined from the correlation, which makes the method applicable to reading-order detection for documents of various mixed types. The machine learning model may be a neural network model or a non-neural-network probabilistic model.

S140. Determine an execution sequence of the routing operation corresponding to the text block in the block set, and obtain a reading order of the text block in the document picture according to the execution sequence.

Through the automatic path finding in step S130, each text block and its corresponding next text block are obtained. When the automatic path finding ends, the reading order of all the text blocks can be obtained from all the text blocks and the next text block corresponding to each of them. For example, after the automatic path finding is completed, the reading order of the text blocks in the document picture shown in FIG. 3 is obtained as R₁ → R₂ → R₄ → R₅ → R₃ → R₆ → R₇ → R₈.

With the method for detecting the reading order of a document according to the above embodiment, all the text blocks included in the document picture are first identified; a starting text block is determined from them; and, starting from the starting text block, the location information of each text block in the document picture and the layout information of the text block are used to decide which text block should be visited next, until the reading order of all the text blocks is obtained. The method is therefore compatible with a variety of scenes and is robust to the size, noise, and style of the document picture, so that it can accurately recognize the reading order corresponding to each type of document picture.
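The loop formed by steps S130 and S140 can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function names `reading_order` and `rotate`, the 4-dimensional toy feature vectors, and the stand-in predictor are all hypothetical, and the dot product is assumed as the correlation measure (it is the measure used in the training steps described later).

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, assumed here as the correlation measure."""
    return sum(x * y for x, y in zip(a, b))


def reading_order(
    blocks: Dict[str, Vector],
    start: str,
    predict: Callable[[Vector], Vector],
) -> List[str]:
    """Greedy path finding: from the current block, predict the features of
    the next block and pick the best-correlated block not yet visited."""
    order = [start]
    remaining = {name for name in blocks if name != start}
    current = start
    # Stop once a single block is left: its position is uniquely determined.
    while len(remaining) > 1:
        prediction = predict(blocks[current])
        # sorted() only makes tie-breaking deterministic.
        current = max(sorted(remaining), key=lambda n: dot(blocks[n], prediction))
        order.append(current)
        remaining.remove(current)
    order.extend(remaining)
    return order


def rotate(v: Vector) -> Vector:
    """Hypothetical stand-in for the trained model: shifts a one-hot vector."""
    return tuple(v[-1:]) + tuple(v[:-1])


# Toy one-hot features chosen so that the stand-in predictor leads
# from R1 to R3, then to R2, leaving R4 as the last block.
toy_blocks = {
    "R1": (1, 0, 0, 0),
    "R3": (0, 1, 0, 0),
    "R2": (0, 0, 1, 0),
    "R4": (0, 0, 0, 1),
}
toy_order = reading_order(toy_blocks, "R1", rotate)
```

With n text blocks the predictor is called n-2 times in this sketch, which mirrors the observation that the last two blocks need no routing of their own.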

In one embodiment, the machine learning model includes a plurality of parameters, and the method for detecting a reading order of a document further includes a step of training the machine learning model so that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition. The Euclidean distance, also called the Euclidean metric, is the spatial distance between two vectors of the same dimension.

In one embodiment, the machine learning model may be trained as follows:

First, training samples are obtained. In machine learning, samples are data that have been labeled in advance, including input data and output data. In this embodiment, the training samples are a plurality of sample blocks that participate in the training of the machine learning model, and the reading order of these sample blocks is known.

Then, a corresponding sample library M={G, S, T} is established based on the training samples. Where G denotes a set of sample blocks, S denotes a set of sequential states of the sample blocks in successive trainings, and T denotes a sequence of state changes to be determined during training. If the total number of sample blocks in G is n, then,

S={s _i ;i∈[1,n],s _i ∈[0,n]};

T={{R ₁ ,S ₁ ,S ₂ },{R ₂ ,S ₂ ,S ₃ },...{R _n-2 ,S _n-2 ,S _n-1 }};

Here s_i = 0 indicates that the reading order of the sample block R_i has not been determined (i.e., the order in which its routing operation is performed has not been determined), and s_i > 0 indicates that the reading order of the sample block R_i has been determined (i.e., the order in which its routing operation is performed has been determined), with the reading order given by the value of s_i, expressed as S(R_i) = s_i. Each item of T consists of the sample block currently participating in training, the current set of sequential states of the sample blocks in G, and the next set of sequential states of the sample blocks in G that is to be predicted. Taking the sequence {R₂, S₂, S₃} as an example, R₂ is the sample block currently participating in training, S₂ is the sequential state of each sample block in G when R₂ participates in training, and S₃ is the next sequential state of each sample block in G that is to be predicted when R₂ participates in training. Because the order of the last two remaining sample blocks can be determined directly by elimination, they need no training, so T only needs to contain n-2 sequences.

Then, based on the sample library M = {G, S, T} described above, the machine learning model is trained with each state change sequence in T in turn; after all the state change sequences in T have participated in training, the parameters of the machine learning model are saved.

In an embodiment, the specific implementation of training the parameters in the machine learning model according to the kth sequence {R _k , S _k , S _k+1 } in T may include the following steps 1 to 5:

Step 1: input the feature information of the sample block R_k into the machine learning model and obtain the feature prediction information O_k, k ∈ [1, n-2], of the next text block of R_k output by the machine learning model;

Step 2: obtain the sample blocks R_i whose sequential state in S_k is 0, giving a set G*:

G* = {R_i; S_k(R_i) = 0}, i ∈ [1, n];

The dimension of the set G* is n-k;

Step 3: take the dot product of each sample block in G* with O_k to obtain a set V* = {v_i = R_i · O_k};

Step 4: obtain the sequential state in S_{k+1} of each sample block R_i in G* to obtain a set V^π = {v′_i = S_{k+1}(R_i)}; the dimension of V^π is equal to the dimension of G*;

Step 5: normalize V* to obtain a set V** = {v**_i = v_i / sum(V*)}, and normalize V^π to obtain a set V^ππ = {v′′_i = v′_i / sum(V^π)}; construct the loss function corresponding to the sample block R_k from V** and V^ππ, and update the parameters in the machine learning model by the BP algorithm based on the loss function, where the loss function is the Euclidean distance between V** and V^ππ:

loss = sqrt(∑_i (v**_i - v′′_i)²).

In this embodiment, the loss function is the error computed during machine learning; the error can be measured with various functions, and the function is generally convex. Here, the loss function corresponding to the sample block R_k participating in training is constructed from the Euclidean distance between V** and V^ππ. The Euclidean distance, or Euclidean metric, is the spatial distance between two vectors of the same dimension. The loss obtained in each learning pass is used by the BP algorithm to adjust the parameters of the machine learning model; as the loss converges, the output accuracy of the machine learning model increases accordingly. The BP (error back propagation) algorithm is particularly suitable for training multi-layer feedforward network models: the error accumulated at the output layer during training is propagated backward through each feedforward network layer in order to adjust the parameters of that layer.
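The quantities built in steps 2 to 5 can be sketched for a single training sequence as below. This is an illustrative sketch only: the 2-dimensional feature vectors and the prediction O₁ are made-up values, the helper names are hypothetical, and the loss is taken to be the plain Euclidean distance between the two normalized sets, as the text describes. The parameter update by the BP algorithm is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(values: Sequence[float]) -> List[float]:
    """Divide every entry by the sum (assumes the sum is non-zero)."""
    total = sum(values)
    return [v / total for v in values]


def step_loss(
    features: Dict[str, Vector],
    state_k: Dict[str, int],
    state_k1: Dict[str, int],
    prediction: Vector,
) -> Tuple[List[str], List[float], List[float], float]:
    # Step 2: G* holds the sample blocks whose order is still 0 in S_k.
    candidates = [name for name in features if state_k[name] == 0]
    # Step 3: V* is the dot product of each candidate with the prediction.
    v_star = [dot(features[name], prediction) for name in candidates]
    # Step 4: V^pi is the state of each candidate in S_{k+1}.
    v_pi = [state_k1[name] for name in candidates]
    # Step 5: normalize both sets and take their Euclidean distance.
    v_star_norm = normalize(v_star)
    v_pi_norm = normalize(v_pi)
    loss = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_star_norm, v_pi_norm)))
    return candidates, v_star_norm, v_pi_norm, loss


# Hypothetical 2-D features; real feature vectors have six components.
sample_features = {
    "R1": (0.0, 0.0),
    "R2": (1.0, 0.0),
    "R3": (0.0, 1.0),
    "R4": (1.0, 0.0),
    "R5": (0.0, 1.0),
}
s1 = {"R1": 1, "R2": 0, "R3": 0, "R4": 0, "R5": 0}
s2 = {"R1": 1, "R2": 0, "R3": 2, "R4": 0, "R5": 0}
o1 = (1.0, 1.0)  # made-up model output for R1

g_star, v_norm, target, loss = step_loss(sample_features, s1, s2, o1)
```

In this toy case the prediction correlates equally with all four candidates, so the normalized V* is uniform while the target puts all its weight on R₃; the resulting non-zero loss is what back propagation would then reduce.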

In an embodiment, in order to accurately learn the feature information of each text block, the recognized text block is marked with a text box, and the feature information of each text block is expressed in the form of a feature vector:

R={x,y,w,h,s,d};

R represents the feature vector of a text block and contains six items of feature information: x is the x coordinate of the center point of the text block; y is the y coordinate of the center point of the text block; w is the width of the text block; h is the height of the text block; s is the mean scale of all connected regions in the text block; and d is the density information of the text block. A connected region is a region formed by connected pixels in a binarized image. Connectivity between pixels can be defined with a 4-neighborhood or an 8-neighborhood algorithm. With the 8-neighborhood algorithm, for example, if one of the 8 points adjacent to the pixel at position (x, y) has the same pixel value as (x, y), the two pixels are 8-neighborhood connected; all connected points are found recursively, and the set of these points is a connected region.

Here, W and H denote the functions that take the length and the width, respectively, r_i is the connected region i, K is the total number of connected regions included in a text block, and p is the pixel value of a pixel.
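The 8-neighborhood search for connected regions can be sketched as follows. The sketch makes several assumptions that the text does not state: only foreground pixels (value 1) are grouped, an iterative breadth-first search replaces the recursive search, each region is summarized by its bounding box, and the scale of a region is taken as the square root of the bounding-box area, following the definition of scale given for step S110. The function names are hypothetical.

```python
import math
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_regions(image: List[List[int]]) -> List[Box]:
    """Bounding boxes of the 8-connected regions of pixels equal to 1."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (the centre is already marked seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def mean_scale(boxes: List[Box]) -> float:
    """Mean over regions of sqrt(box height * box width), in pixels."""
    scales = [math.sqrt((b - t + 1) * (r - l + 1)) for t, l, b, r in boxes]
    return sum(scales) / len(scales)


binary_image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
region_boxes = connected_regions(binary_image)
s_value = mean_scale(region_boxes)
```

The two pixels at the right of the toy image touch only diagonally, so they form one region under the 8-neighborhood rule; under a 4-neighborhood rule they would be two separate regions.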

In one embodiment, after the text blocks included in the document picture are identified, the method further includes a step of acquiring the feature vector R = {x, y, w, h, s, d} of each text block. In order to make the machine learning model insensitive to scale, the feature information of each text block is further normalized, for example by the convention:

w=1.0; h=1.0; max(p)=1.0.

In one embodiment, the manner in which a starting text block is determined from all of the text blocks may include:

An XOY coordinate system is established with the upper-left vertex of the document picture as the origin (see FIG. 3 and FIG. 5); the positive x-axis points along the width of the document picture and the positive y-axis points along its length. First, the text block whose center point has the smallest x coordinate is obtained from the block set as text block A. Then, the text blocks whose center-point y coordinate is smaller than that of text block A are acquired to construct a text block set G′, and each text block B in G′ is compared with text block A in turn. If the projections of text block B and text block A in the x-axis direction do not intersect, text block B is deleted from G′; if the projections intersect, text block A is updated to text block B and text block B is deleted from G′. After each comparison, it is detected whether G′ is empty: if so, the current text block A is determined as the starting text block; if not, G′ is updated whenever text block A has been updated, and each text block in the updated G′ is compared with the current text block A; and so on until G′ is empty. This method of determining the starting text block is applicable to a wide range of complicated documents and can accurately identify the starting text block.
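The selection procedure can be sketched as follows. The `Block` class, the function names, and the sample layout are hypothetical; the sketch assumes that the x-projection of a block is the interval of width w centred on x, and it visits the candidates from the largest y coordinate to the smallest, which is one possible comparison order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    name: str
    x: float  # x coordinate of the center point
    y: float  # y coordinate of the center point
    w: float  # width
    h: float  # height


def x_projections_intersect(a: Block, b: Block) -> bool:
    """True if the intervals [x - w/2, x + w/2] of the two blocks overlap."""
    return abs(a.x - b.x) <= (a.w + b.w) / 2


def find_start_block(blocks: List[Block]) -> Block:
    # Text block A: the block whose center point has the smallest x.
    current = min(blocks, key=lambda b: b.x)
    # G': blocks whose center lies above A, nearest ones first.
    candidates = sorted(
        (b for b in blocks if b.y < current.y),
        key=lambda b: b.y,
        reverse=True,
    )
    while candidates:
        block = candidates.pop(0)
        if x_projections_intersect(block, current):
            # A is updated to B, and G' is rebuilt relative to the new A.
            current = block
            candidates = [b for b in candidates if b.y < current.y]
        # Otherwise B is simply dropped from G'.
    return current


# Hypothetical two-column page: the lower-left block is leftmost,
# but the block above it in the same column is the true start.
page = [
    Block("left_top", x=25, y=10, w=40, h=10),
    Block("left_bottom", x=24, y=40, w=40, h=40),
    Block("right_column", x=75, y=30, w=40, h=50),
]
start_block = find_start_block(page)
```

In the sample layout the right-hand column lies above the leftmost block but does not overlap it horizontally, so it is discarded, and the search climbs within the left column instead.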

In one embodiment, the feature vector of each text block is represented as R = {r₁, r₂, r₃, r₄, r₅, r₆} = {x, y, w, h, s, d}, abbreviated as R = {r_j; j ∈ [0, 6)}, where r_j is the feature information j of the sample block. The machine learning model is chosen to be a neural network model. Correspondingly, as shown in FIG. 4, the neural network model may include a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer. In the neural network model, the input layer receives the input and distributes it to the hidden layers (so called because the user cannot see them); the hidden layers perform the required calculations and pass their results to the output layer, where the user can see the final result.

Preferably, the first hidden layer and the second hidden layer are 12-dimensional and 20-dimensional, respectively. When R = {r_j; j ∈ [0, 6)} is input into the neural network model, the output of the first hidden layer is K₁:

K₁ = {k_1i = sigmoid(∑ a_1i r_j + b_1i); i ∈ [0, 12), j ∈ [0, 6)};

The output of the second hidden layer is K₂:

K₂ = {k_2m = sigmoid(∑ a_2m k_1i + b_2m); m ∈ [0, 20), i ∈ [0, 12)};

The output of the 6-dimensional output layer is O:

O = {o_n = sigmoid(∑ a_on k_2m + b_on); n ∈ [0, 6), m ∈ [0, 20)};

where a_1i and b_1i are the parameters of the first hidden layer and k_1i is the i-th output of the first hidden layer; a_2m and b_2m are the parameters of the second hidden layer and k_2m is the m-th output of the second hidden layer; a_on and b_on are the parameters of the 6-dimensional output layer and o_n is the n-th output; and sigmoid denotes the S-shaped nonlinear function.
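A forward pass through a network of this shape (6 inputs, hidden layers of 12 and 20 units, 6 outputs, sigmoid activations) can be sketched as below. The sketch assumes a standard fully connected layer with one weight per connection and one bias per unit; the random initialization, the helper names, and the sample feature vector are hypothetical and do not come from the embodiment.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights and biases in [-1, 1]; untrained, for illustration."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def dense(inputs: Sequence[float], layer: Layer) -> List[float]:
    """One fully connected layer: sigmoid(weighted sum + bias) per unit."""
    weights, biases = layer
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(features: Sequence[float], layers: List[Layer]) -> List[List[float]]:
    """Return the activations of every layer, input included."""
    activations = [list(features)]
    for layer in layers:
        activations.append(dense(activations[-1], layer))
    return activations


rng = random.Random(0)
network = [init_layer(6, 12, rng), init_layer(12, 20, rng), init_layer(20, 6, rng)]

# Hypothetical normalized feature vector {x, y, w, h, s, d}.
r = [0.2, 0.1, 1.0, 1.0, 0.05, 0.3]
_, k1, k2, o = forward(r, network)
```

Because every unit ends in a sigmoid, each component of K₁, K₂ and O lies strictly between 0 and 1, which is one reason normalized feature values are convenient as inputs and targets.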

To illustrate the training of the neural network model described above, the text blocks in FIG. 5 are used as sample blocks. The sample blocks R₁, R₂, R₃, R₄, and R₅ can be expressed as:

R₁ = {x₁, y₁, w₁, h₁, s₁, d₁};

R₂ = {x₂, y₂, w₂, h₂, s₂, d₂};

R₃ = {x₃, y₃, w₃, h₃, s₃, d₃};

R₄ = {x₄, y₄, w₄, h₄, s₄, d₄};

R₅ = {x₅, y₅, w₅, h₅, s₅, d₅};

It is also known that the correct reading order of R ₁ , R ₂ , R ₃ , R ₄ and R ₅ is R ₁ → R ₃ → R ₂ → R ₄ → R ₅ .

From the training samples, the set of current sequential states of the sample blocks is S = {s_i; i ∈ [1, 5], s_i ∈ [0, 5]}, where s_i = 0 indicates that the execution order of the routing operation for the corresponding text block R_i has not been determined (i.e., the reading order of R_i has not been determined), and s_i > 0 indicates that the execution order of the routing operation for the corresponding text block R_i has been determined (i.e., the reading order of R_i has been determined), with the order given by the value of s_i, expressed as S(R_i) = s_i. The reading states of the training samples during training therefore include:

S₀ = (0, 0, 0, 0, 0);

S₁ = (1, 0, 0, 0, 0);

S₂ = (1, 0, 2, 0, 0);

S₃ = (1, 3, 2, 0, 0);

S₄ = (1, 3, 2, 4, 0);

S₅ = (1, 3, 2, 4, 5);

Further, the training samples R ₁ , R ₂ , R ₃ , R ₄ , R ₅ may also be described as a sequence of states:

{R ₁ , S ₁ , S ₂ }, {R ₃ , S ₂ , S ₃ }, {R ₂ , S ₃ , S ₄ }, {R ₄ , S ₄ , S ₅ };
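The states S₀ to S₅ and the state sequences can be derived mechanically from the known reading order, as the following sketch shows. The function names are hypothetical, and the final slice reflects the earlier statement that the sample library only needs n-2 sequences.

```python
from typing import List, Tuple

State = Tuple[int, ...]


def build_states(names: List[str], order: List[str]) -> List[State]:
    """Return S_0 .. S_n, where entry i of a state is the reading position
    of names[i], or 0 while that position is still undetermined."""
    position = {name: 0 for name in names}
    states = [tuple(position[name] for name in names)]
    for rank, name in enumerate(order, start=1):
        position[name] = rank
        states.append(tuple(position[name] for name in names))
    return states


def build_sequences(
    order: List[str], states: List[State]
) -> List[Tuple[str, State, State]]:
    """One item (R, S_k, S_{k+1}) for every block except the last one read."""
    return [(order[k - 1], states[k], states[k + 1]) for k in range(1, len(order))]


names = ["R1", "R2", "R3", "R4", "R5"]
order = ["R1", "R3", "R2", "R4", "R5"]  # known correct reading order

states = build_states(names, order)
sequences = build_sequences(order, states)
# The last sequence needs no training, leaving n-2 items for T.
T = sequences[:-1]
```

Each state differs from its predecessor in exactly one position, namely the block that was just placed in the reading order.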

Since the {R ₄ , S ₄ , S ₅ } sequence can be directly determined, it does not require training, so in the sample library, T = {{R ₁ , S ₁ , S ₂ }, {R ₃ , S ₂ , S ₃ }, {R ₂ , S ₃ , S ₄ }}. Based on the sample library, the training of the neural network model is first performed using the {R ₁ , S ₁ , S ₂ } sequence, as follows:

R₁ is input into the neural network model, and the prediction information O₁ of the next reading state output by the neural network model is obtained. The sample blocks whose value in S₁ is 0 are selected, giving the set G* = {R₂, R₃, R₄, R₅}. The dot product of each sample block in G* with O₁ is taken, giving V* = {v₂, v₃, v₄, v₅}, and normalization then gives V** = {v**₂, v**₃, v**₄, v**₅}.

The values in state S₂ corresponding to the sample blocks in G* are obtained, giving the set V^π:

V^π = {v′₂, v′₃, v′₄, v′₅} = {0, 2, 0, 0};

Normalization yields V^ππ = {v′′₂, v′′₃, v′′₄, v′′₅} = {0, 1, 0, 0}.

From the set V** and the set V^ππ, the loss function corresponding to the sample block R₁ participating in training can be constructed as the Euclidean distance between them, and all parameters in the neural network model can then be updated by the BP algorithm.

Training then continues in the same way with the sequences {R₃, S₂, S₃} and {R₂, S₃, S₄}, which completes the training of the neural network model. In this embodiment, a neural network model with stable performance can be obtained by selecting appropriate training samples; path finding over text blocks based on the trained neural network model can accurately obtain the next text block of the current text block, which helps to detect accurately the reading order of the document in each type of document picture.

The method for detecting the reading order of the document in the above embodiment of the present application can be applied to an automatic document analysis module in an OCR system, and the automatic document analysis module sorts the identified text blocks after identifying the text block included in the document image. Then, the reading order of the text block is output to the text recognition module, and after the text recognition is performed in the text recognition module, the final readable document is organized based on the already obtained reading order, thereby performing automatic analysis and storage. Specifically, when the automatic document analysis module sorts the text blocks, the information processing process includes:

The selection algorithm A=Α(R, S) is set, and the algorithm derives the state S of the next reading order according to the current text block R and the state S of the current reading order, which can be expressed as:

Where S ₀ = {s _i =0; i ∈ [1, n]}, S _n = {s _i = i; i ∈ [1, n]}, where n represents the total number of text blocks contained in the document picture.

Further, the algorithm A can be divided into three parts:

1) R _start selector Ψ ₁

Ψ₁ is used to select the starting text block, which is marked as R_start. Among all the text blocks R, select the R whose center point is located at the leftmost side of the document picture, denoted R_l. Then, from the remaining R, select those with y(R) < y(R_l) to construct a set of text blocks G′; preferably, the R in G′ are also sorted in descending order of y coordinate. Next, compare R_l with each R in G′ in turn: if the projections of R_l and R in the x-axis direction intersect, mark this R as the new R_l and delete it from G′; otherwise, leave R_l unchanged and delete this R from G′. Repeat the above operation until G′ is empty, at which point R_start = R_l.

In a preferred embodiment, each time a new R is marked as R_l and deleted from G′, if the set G′ is detected to be non-empty at that time, the set G′ is updated (that is, all text blocks whose center point y coordinate is smaller than that of the updated R_l are acquired to obtain a new set G′). Updating the set G′ in this way can further reduce the time needed to select the starting text block.
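The selection procedure for R_start described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the `Block` class, the function names, and the coordinates in `EXAMPLE_BLOCKS` are hypothetical, and the origin is assumed to be the upper left corner of the picture, so that a smaller y means higher on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block: center (x, y), width w, height h."""
    name: str
    x: float
    y: float
    w: float
    h: float


def x_projections_intersect(a, b):
    """True if the x-axis intervals [x - w/2, x + w/2] of a and b overlap."""
    return abs(a.x - b.x) < (a.w + b.w) / 2


def blocks_above(blocks, current):
    """Build G': blocks with a smaller center y, sorted in descending order of y."""
    return sorted((b for b in blocks if b.y < current.y),
                  key=lambda b: b.y, reverse=True)


def select_start_block(blocks):
    """Start from the leftmost block and move to overlapping blocks above it."""
    current = min(blocks, key=lambda b: b.x)      # R_l: leftmost center point
    pending = blocks_above(blocks, current)       # G'
    while pending:
        candidate = pending.pop(0)                # compare, then delete from G'
        if x_projections_intersect(current, candidate):
            current = candidate                   # candidate becomes the new R_l
            if pending:                           # refresh G' only if not empty
                pending = blocks_above(blocks, current)
    return current


# Assumed coordinates, loosely modelled on a title above two columns.
EXAMPLE_BLOCKS = [
    Block("R1", 50, 10, 90, 10),
    Block("R2", 75, 30, 40, 20),
    Block("R3", 25, 40, 40, 40),
    Block("R4", 75, 55, 40, 20),
    Block("R5", 50, 85, 90, 10),
]
```

With these assumed coordinates, the leftmost block is R3, G′ is (R2, R1), R2 is discarded because its x-projection does not overlap that of R3, and R1 is selected.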

2) Feature Generator Ψ ₂

Ψ₂ is used to derive, from the current text block R_i, the feature prediction information O_{i+1} for the next reading order state, which can be described as:

O_{i+1} = Ψ₂(R_i)

As mentioned above, each text block can be described as R = {x, y, w, h, s, d}. Correspondingly, Ψ₂ can be chosen as a fully connected neural network with a 6-dimensional input, a 6-dimensional output, and two hidden layers of 12 and 20 dimensions, respectively. Its structure is shown in FIG. 4, in which each circle represents a neuron. For each sample block, expressed as R = {r_i; i ∈ [0, 6)}, the output K₁ of the first hidden layer is:

The output of the second hidden layer is:

The output of the 6-dimensional output layer is:

O = {o_i = sigmoid(∑ a_oi k_2j + b_oi); i ∈ [0, 6), j ∈ [0, 20)}

Where a and b are parameters that require training. O is the output of Ψ ₂ .
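The forward pass of such a 6-12-20-6 fully connected network can be sketched as below. This is an illustrative sketch only: the function names, the random initialization, and the use of a sigmoid activation in the two hidden layers are assumptions (the text above confirms the sigmoid only for the output layer), and the weights here are untrained.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(sum_j a_ij * input_j + b_i) per neuron i."""
    return [sigmoid(sum(a * v for a, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights a and zero biases b (hypothetical initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_model(seed=0):
    """Layers: 6 -> 12 (first hidden), 12 -> 20 (second hidden), 20 -> 6 (output)."""
    rng = random.Random(seed)
    return [init_layer(6, 12, rng), init_layer(12, 20, rng), init_layer(20, 6, rng)]


def predict(model, features):
    """Map a feature vector R = (x, y, w, h, s, d) to the prediction O."""
    out = list(features)
    for weights, biases in model:
        out = dense(out, weights, biases)
    return out
```

Calling `predict(init_model(), [0.5, 0.1, 0.9, 0.1, 0.2, 0.3])` returns a list of six values in the open interval (0, 1); in practice the parameters a and b would be obtained by training rather than random initialization.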

3) Feature Synthesizer Ψ ₃

After obtaining the feature prediction information of the next reading order state by Ψ ₂ , the current reading order state S is updated as follows to obtain the next reading order state:

I) Acquire the text blocks whose value in the current reading order state S is 0 to construct a set G^*,

G^* = {R_i; S(R_i) = 0}, i ∈ [1, n];

II) For each R_i ∈ G^*, calculate v_i = R_i·O to obtain a set V^* = {v_i = R_i·O};

III) Find the maximum value in V^* and the text block corresponding to that value, denoted R^*;

IV) Update the current reading order state S by setting the value of R^* in S to S(R^*) = max(S) + 1, thereby obtaining the next reading order state, that is, the next text block. By analogy, all the text blocks can be sorted.
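Steps I) to IV) above can be sketched as a single state update. The function and variable names are hypothetical, feature vectors are assumed to be plain lists of six numbers, and the state is assumed to be a list in which 0 marks a block whose order is not yet determined.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def next_state(features, state, prediction):
    """Return (new_state, index of the chosen block), or (state, None) if done.

    features:   one 6-dimensional vector per text block
    state:      reading order state S; 0 means "order not yet determined"
    prediction: output O of the feature generator for the current block
    """
    pending = [i for i, s in enumerate(state) if s == 0]        # I)   set G*
    if not pending:
        return list(state), None
    scores = {i: dot(features[i], prediction) for i in pending}  # II)  set V*
    best = max(pending, key=lambda i: scores[i])                 # III) block R*
    new_state = list(state)
    new_state[best] = max(state) + 1                             # IV)  S(R*) = max(S) + 1
    return new_state, best
```

Repeating this update, with the chosen block fed back into the feature generator as the new current block, orders all text blocks one at a time.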

The method for detecting the reading order of a document in the present application is illustrated below, taking the document picture shown in FIG. 5 as an example. It includes steps one to five, as follows:

Step one: performing binarization processing and direction correction processing on the original document image; and performing layout analysis on the document image subjected to the binarization processing and the direction correction processing to obtain all the text blocks included in the document. As shown in FIG. 5, the text blocks contained in the document are obtained as R ₁ , R ₂ , R ₃ , R ₄ and R ₅ .

Step two: determine the starting text block.

Among R₁, R₂, R₃, R₄ and R₅, the center point x coordinate of R₃ is the leftmost, so R_start is initially assigned to R₃.

Obtaining all text blocks whose center point y coordinate is smaller than that of R₃, and sorting them by y coordinate, gives the set G′ = (R₂, R₁).

R_start is then updated iteratively. The projections of the text blocks R₂ and R₃ in the x-axis direction do not intersect, so R₂ is deleted from the set G′. The projections of the text blocks R₁ and R₃ in the x-axis direction do intersect, so R_start is updated to R₁ and R₁ is deleted from the set G′. Since the set G′ is now empty, there is no need to update it (that is, there is no need to obtain all the text blocks whose center point y coordinate is smaller than that of R₁), and the loop ends. The text block corresponding to the current R_start is R₁, so the starting text block of the document shown in FIG. 5 is determined to be R₁.

Step three: perform automatic path finding starting from the starting text block R₁.

The current text block is R₁ = {x₁, y₁, w₁, h₁, s₁, d₁} and the current state is S₁ = (1, 0, 0, 0, 0). R₁ = {x₁, y₁, w₁, h₁, s₁, d₁} is input to the trained neural network model, and the prediction information output by the neural network model is O = {o₁, o₂, o₃, o₄, o₅, o₆};

Based on the current state S₁ = (1, 0, 0, 0, 0), the set G^* = {R₂, R₃, R₄, R₅} is obtained;

It further follows that:

V^* = {R₂·O, R₃·O, R₄·O, R₅·O};

R_i·O = x_i×o₁ + y_i×o₂ + w_i×o₃ + h_i×o₄ + s_i×o₅ + d_i×o₆;

Select the maximum value in V^* and the text block corresponding to that value. In this embodiment, R₃·O is the maximum, so in the current reading order state S₁ = (1, 0, 0, 0, 0) the value corresponding to the text block R₃ is updated to s₃ = 1 + 1 = 2. The next state is therefore S₂ = (1, 0, 2, 0, 0), and the next text block is determined to be R₃.

Then R₃ is taken as the current text block. In the same way, the next state corresponding to R₃ is S₃ = (1, 3, 2, 0, 0), that is, the next text block corresponding to R₃ is R₂. R₂ is then taken as the current text block; in the same way, the next state corresponding to R₂ is S₄ = (1, 3, 2, 4, 0), that is, the next text block corresponding to R₂ is R₄. R₄ is then taken as the current text block; since only one text block (R₅) remains in the corresponding set G^* at this time, it can be used directly as the next text block of the current text block, and the corresponding next state is S₅ = (1, 3, 2, 4, 5). The automatic path finding then ends.

Step four: according to the result of automatic path finding, the document reading order is R₁ → R₃ → R₂ → R₄ → R₅.

Step five: perform text recognition on the text blocks in the order R₁ → R₃ → R₂ → R₄ → R₅ to obtain readable text information corresponding to the document, and save and output the readable text information.

The text recognition of the text block includes steps of line segmentation and line recognition, and character recognition is performed in units of rows in sequence, thereby obtaining text information of the entire text block.
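The final state reached in step three encodes the reading order: the value stored for each block is its position in the order. A minimal sketch of the conversion used in step four follows; the function name is hypothetical, and blocks are identified by their 1-based index.

```python
def reading_order(state):
    """Return 1-based block indices sorted by their position in the reading order.

    state[i - 1] holds the reading position of block R_i, so the final state
    (1, 3, 2, 4, 5) means R1 is read first, R3 second, R2 third, and so on.
    """
    n = len(state)
    if sorted(state) != list(range(1, n + 1)):
        raise ValueError("state must assign each position 1..n exactly once")
    indexed = sorted(enumerate(state, start=1), key=lambda pair: pair[1])
    return [index for index, _ in indexed]
```

For the final state S₅ = (1, 3, 2, 4, 5) of the example, `reading_order` returns `[1, 3, 2, 4, 5]`, that is, R₁ → R₃ → R₂ → R₄ → R₅.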

With the method for detecting the reading order of a document in the above embodiment, because the neural network has a large number of parameters, the trained neural network model can be compatible with various scenes and is more robust to the size, noise and pattern of the document picture.

It should be noted that, for brevity, the foregoing method embodiments are all described as a series of combined actions; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because, in accordance with the present application, certain steps may be performed in other orders or concurrently. Further, the above embodiments can be combined in any way to obtain other embodiments.

Based on the same idea as the method for detecting the reading order of a document in the above embodiments, the present application also provides an apparatus for detecting the reading order of a document, which can be used to perform the above method. For convenience of description, the structural schematic diagram of the apparatus embodiment shows only the parts related to the embodiments of the present application. Those skilled in the art can understand that the illustrated structure does not constitute a limitation on the apparatus, which may include more or fewer parts than illustrated, combine some parts, or arrange the parts differently.

In an embodiment, a computer device is also provided, the internal structure of which may be as shown in FIG. 2. The computer device includes an apparatus for detecting the reading order of a document; the apparatus includes the modules described below, each of which may be implemented in whole or in part by software, hardware, or a combination thereof.

FIG. 6 is a schematic structural diagram of an apparatus for detecting a reading order of a document according to an embodiment of the present invention. As shown in FIG. 6, the apparatus includes a block identification module 610, a start block selection module 620, an automatic path finding module 630, and a sequence determining module 640, detailed as follows:

The block identification module 610 is configured to identify a text block included in a document picture, and construct a block set;

In an embodiment, the block identification module 610 may specifically include: a pre-processing sub-module for performing binarization processing and direction correction processing on the document picture; and a layout recognition sub-module for performing layout analysis on the document picture that has undergone the binarization processing and direction correction processing, to obtain the text blocks included in the document. Here, layout analysis refers to the OCR algorithm that divides the content of a document picture into a plurality of non-overlapping regions according to paragraphs, pagination, and the like. This yields all the text blocks contained in the document, as shown in FIG. 3 or FIG. 5.

The start block selection module 620 is configured to determine a starting text block from the block set.

In general, a person reads a document starting from one corner of the document. Based on this, in an embodiment, the start block selection module 620 can be used to select, from the block set, a text block whose center point is located at a vertex of the document picture, and to determine that text block as the starting text block. For example, the start block selection module 620 can be configured to select, from all the text blocks, the text block whose center point is located at the left side and the top of the document picture (that is, the text block in the upper left corner), and determine it as the starting text block, such as the text block R₁ shown in FIG. 3 or the text block R₁ shown in FIG. 5.

It will be appreciated that in other embodiments, the start block selection module 620 may also determine other text blocks as the starting text block, depending on the document and the actual reading habits (for example, documents laid out from right to left).

The automatic path finding module 630 is configured to perform a routing operation on the starting text block according to the feature information of the starting text block to determine a first text block in the block set corresponding to the starting text block. The feature information of the text block includes location information of the text block in the document image and layout information of the text block; performing a routing operation on the first text block according to the feature information of the first text block to determine a text block corresponding to the first text block in the set of blocks; and so on until the execution order of the routing operation corresponding to each text block in the block set can be uniquely determined.

In this embodiment, the automatic path finding module 630 automatically finds a path through the text blocks included in the document, starting from the starting text block; each path finding operation only needs to determine the next text block corresponding to the current text block. For example, in the document picture shown in FIG. 3, with R₁ as the current text block, this path finding determines that the next text block of R₁ is R₂; path finding is then performed again with R₂ as the current text block, giving R₄ as the next text block of R₂; and so on, until it is determined that the next text block of R₆ is R₇, at which point the execution order of the routing operations corresponding to each text block can be uniquely determined.

The sequence determining module 640 is configured to determine an execution order of the routing operations corresponding to the text blocks in the block set, and obtain a reading order of the text blocks in the document picture according to the execution order.

For example, the sequence determining module 640 can obtain the reading order of the text blocks in the document picture shown in FIG. 3 as R ₁ → R ₂ → R ₄ → R ₅ → R ₃ → R ₆ → R ₇ → R ₈ .

In an embodiment, the start block selection module 620 is specifically configured to: establish an XOY coordinate system with the vertex of the upper left corner of the document picture as the origin, where the positive direction of the x-axis points in the width direction of the document picture and the positive direction of the y-axis points in the length direction of the document picture; and obtain, from the block set, the text block whose center point has the smallest x coordinate as the text block A;

obtain the text blocks whose center point y coordinate is smaller than that of the text block A to construct a text block set G′; and compare each text block B in the set G′ with the text block A in turn;

if the projections of the text block B and the text block A in the x-axis direction do not intersect, delete the text block B from the set G′; if the projections of the text block B and the text block A in the x-axis direction intersect, update the text block A to the text block B and delete the text block B from the set G′; detect whether the set G′ is empty after each text block comparison; if so, determine the current text block A as the starting text block; if not, update the set G′ when the text block A has been updated, and perform the above comparison between each text block in the updated set G′ and the current text block A; and so on until the set G′ is empty.

In one embodiment, each time after updating the text block A with a new text block B and deleting the text block B from G', if it is detected that the set G' is not empty at this time, the set is updated. G' (i.e., obtaining a text block in which all center point y coordinates are smaller than the updated text block A center point y coordinate to obtain a new set G'), by updating the set G', the time for selecting the start text block can be further reduced.

In an embodiment, as shown in FIG. 7, the apparatus for detecting a reading order of a document further includes a training module 650, configured to pre-train the machine learning model so that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.

In one embodiment, the training module 650 can include a sample library construction sub-module and a training sub-module. The sample library construction sub-module is configured to acquire training samples and establish a sample library M = {G, S, T}, where G represents the set of sample blocks, S represents the set of sequential states of the sample blocks in successive trainings, and T represents the state change sequences to be determined during training. If the total number of sample blocks in G is n, then:

S = {s_i; i ∈ [1, n], s_i ∈ [0, n]};

T = {{R₁, S₁, S₂}, {R₂, S₂, S₃}, ..., {R_{n-2}, S_{n-2}, S_{n-1}}};

Here s_i = 0 indicates that the reading order of the sample block R_i has not been determined (that is, the order in which its routing operation is performed has not been determined), and s_i > 0 indicates that the reading order of the sample block R_i has been determined (that is, the order in which its routing operation is performed has been determined), the reading order being the value of s_i, expressed as S(R_i) = s_i. Each item in T contains, respectively, the sample block currently participating in training, the set of current sequential states of all sample blocks, and the set of next sequential states of all sample blocks to be predicted.
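As an illustration of how the sequential states and the items of T relate, the sketch below builds them from a known reading order. The function names are hypothetical, and blocks are identified by 1-based index rather than by their full feature vectors.

```python
def build_states(order):
    """Build the states S_1 .. S_{n-1} from a ground-truth reading order.

    order lists 1-based block indices in the order they are read. In S_k the
    first k blocks of the order carry their positions 1..k and all others are 0.
    """
    state = [0] * len(order)
    states = []
    for position, block in enumerate(order[:-1], start=1):
        state[block - 1] = position
        states.append(tuple(state))
    return states


def build_transitions(order):
    """Build the n-2 items {R, S_k, S_{k+1}} of T.

    The k-th item pairs the k-th block in the reading order with the state
    before and after its successor is chosen.
    """
    states = build_states(order)
    return [(order[k], states[k], states[k + 1]) for k in range(len(order) - 2)]
```

For a five-block document read in the order R₁ → R₃ → R₂ → R₄ → R₅, `build_transitions([1, 3, 2, 4, 5])` produces three items whose current blocks are R₁, R₃ and R₂, with states running from (1, 0, 0, 0, 0) to (1, 3, 2, 4, 0).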

The training sub-module is configured to sequentially train the parameters in the machine learning model by using each sequence in the T; and after all the sequences in the T participate in the training, save the parameters in the machine learning model.

In one embodiment, the training sub-module implements the following process when training the parameters in the machine learning model according to the k-th sequence {R_k, S_k, S_{k+1}} in T:

Inputting the feature information of the sample block R_k into the machine learning model, and acquiring the feature prediction information O_k of the next text block of R_k output by the machine learning model, k ∈ [1, n-2];

Obtaining the sample blocks R_i whose sequential state in S_k is 0 to obtain a set G^*,

G^* = {R_i; S_k(R_i) = 0}, i ∈ [1, n];

Taking the dot product of each item in the set G^* with O_k to obtain a set V^* = {v_i = R_i·O_k};

Obtaining the sequential state in S_{k+1} of each item in the set G^* to obtain a set V^π = {v′_i = S_{k+1}(R_i)};

Normalizing the set V^* to obtain a set V^**, and normalizing the set V^π to obtain a set V^ππ; constructing, from the set V^** and the set V^ππ, the loss function corresponding to the sample block R_k participating in training; and updating the parameters in the machine learning model by the BP algorithm based on the loss function, where the loss function is:

Loss = |V^** − V^ππ|.
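The loss for one training sequence can be sketched as follows. The normalization functions are not specified in this passage, so the two used here are assumptions: V^* is normalized with a softmax, and V^π is divided by its maximum, which reproduces a target such as {0, 1, 0, 0}. The norm |·| is taken to be the Euclidean distance, and all names are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def softmax(values):
    """Assumed normalization for V*: exponentiate and scale to sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scale_by_max(values):
    """Assumed normalization for V^pi: divide by the largest value."""
    top = max(values)
    return [v / top for v in values] if top > 0 else [float(v) for v in values]


def step_loss(features, prediction, state_k, state_next):
    """Loss = |V** - V^pipi| for one item {R_k, S_k, S_{k+1}} of T."""
    pending = [i for i, s in enumerate(state_k) if s == 0]     # set G*
    v_star = [dot(features[i], prediction) for i in pending]    # set V*
    v_pi = [state_next[i] for i in pending]                     # set V^pi
    v_star_norm = softmax(v_star)                               # set V**
    v_pi_norm = scale_by_max(v_pi)                              # set V^pipi
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_star_norm, v_pi_norm)))
```

The loss approaches 0 when the prediction scores the correct next block far above the others, which is the quantity the BP update would then reduce.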

In an embodiment, the block identification module 610 is further configured to acquire a feature vector R={x, y, w, h, s, d} of each text block; wherein x represents an x coordinate of a center point of the text block, y represents the y coordinate of the center point of the text block, w represents the width of the text block, h represents the height of the text block, s represents the scale mean of all connected regions in the text block, and d represents the density information of the text block.

Correspondingly, the machine learning model is a 6-dimensional input and 6-dimensional output neural network model. For example, the neural network model includes a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer, wherein the first hidden layer and the second hidden layer are 12-dimensional and 20-dimensional hidden layers, respectively;

If the feature information of each text block is represented as R = {r_j; j ∈ [0, 6)}, where r_j represents the j-th item of feature information of the sample block, the output K₁ of the first hidden layer and the output K₂ of the second hidden layer are:

The output O of the 6-dimensional output layer is:

O = {o_n = sigmoid(∑ a_on k_2m + b_on); n ∈ [0, 6), m ∈ [0, 20)};

In an embodiment, the apparatus for detecting a reading order of the document further includes: a text recognition module 660, configured to perform text recognition on each of the text blocks, and obtain text information of the document image according to the determined reading order.

The apparatus for detecting the reading order of a document according to the above embodiment can identify all the text blocks included in the document picture and determine a starting text block from among them; path finding then starts from the starting text block, and the pre-trained machine learning model determines which text block should be visited next, until the reading order of all text blocks is obtained. Because the path finding is based on the position information of the text block in the document picture and the layout information of the text block, it can be compatible with various scenes, is more robust to the size, noise and pattern of the document picture, and can accurately identify the reading order corresponding to various types of document pictures.

It should be noted that, in the implementation of the apparatus for detecting the reading order of a document in the above example, the information interaction and execution processes between the modules are based on the same concept as the foregoing method embodiments of the present application and bring about the same technical effects. For details, refer to the description in the method embodiments of the present application; they are not repeated here.

In addition, in the implementation of the apparatus for detecting the reading order of a document in the above example, the logical division of the functional modules is merely an example. In actual applications, for example to meet the configuration requirements of the corresponding hardware or for convenience of software implementation, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. Each functional module can be implemented in the form of hardware or in the form of a software functional module.

It will be understood by those skilled in the art that all or part of the processes in the above embodiments may be implemented by a computer program instructing related hardware, and the program may be stored in a computer readable storage medium and sold or used as an independent product. The program, when executed, may perform all or part of the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.

It should be understood that the various steps in the various embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Except as explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the embodiments may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be executed at different times; the order in which these sub-steps or stages are executed is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

One of ordinary skill in the art can understand that all or part of the processes of the above embodiments can be implemented by a computer program instructing related hardware, and the program can be stored in a non-volatile computer readable storage medium. The program, when executed, may include the flows of the embodiments of the methods described above. Any reference to a memory, storage, database or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above-described embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.

The above described embodiments are merely illustrative of several embodiments of the present application and are not to be construed as limiting the scope of the claims. It should be noted that a number of variations and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention. Therefore, the scope of the invention should be determined by the appended claims.

Claims

A method of detecting a reading order of a document, comprising:

The computer device identifies a block of text contained in the document picture to construct a block set;

The computer device determines a starting text block from the set of blocks;

The computer device performs a routing operation on the starting text block according to the feature information of the starting text block to determine a first text block corresponding to the starting text block in the block set; the feature information of a text block includes at least position information of the text block in the document picture and layout information of the text block;

The computer device performs a routing operation on the first text block according to the feature information of the first text block to determine a text block corresponding to the first text block in the block set; and so on until the execution order of the routing operations corresponding to each text block in the block set can be uniquely determined; and

The computer device obtains a reading order of the text blocks in the document picture according to the execution order.
The method for detecting a reading order of a document according to claim 1, wherein the determining, by the computer device, a starting text block from the set of blocks comprises:

The computer device selects, from the set of blocks, a text block whose center point coordinates are located at one vertex of the document picture, and determines the text block as the starting text block.
The method for detecting a reading order of a document according to claim 1, wherein the determining, by the computer device, a starting text block from the set of blocks comprises:

The computer device establishes an XOY coordinate system with a vertex of the document picture as an origin, where the positive direction of the x-axis of the XOY coordinate system points in the width direction of the document picture, and the positive direction of the y-axis points in the length direction of the document picture;

The computer device obtains, from the block set, a text block having the smallest x coordinate of the center point as the text block A;

The computer device acquires the text blocks whose center point y coordinate is smaller than that of the text block A to construct a text block set G′, and compares each text block B in the text block set G′ with the text block A in turn;

If the projections of the text block B and the text block A in the x-axis direction do not intersect, the computer device deletes the text block B from the text block set G′; if the projections of the text block B and the text block A in the x-axis direction intersect, the computer device updates the text block A to the text block B and deletes the text block B from the text block set G′;

The computer device detects whether the text block set G′ is empty after each text block comparison; if so, the current text block A is determined as the starting text block; if not, the text block set G′ is updated when the text block A has been updated, and each text block in the updated text block set G′ is compared with the current text block A; and so on until the text block set G′ is empty.
The method of detecting a reading order of a document according to claim 1, wherein the routing operation comprises:

The computer device learns the feature information of the text block by using a pre-trained machine learning model, and obtains feature prediction information of the text block corresponding to the text block;

The computer device calculates a correlation between feature information of each text block in which the path finding operation is not performed in the block set and the feature prediction information; and

The computer device determines a text block corresponding to the text block according to the calculated correlation degree.
The method of detecting a reading order of a document according to claim 1, further comprising:

The computer device pre-trains the machine learning model such that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
The method of detecting a reading order of a document according to claim 5, wherein the computer device pre-trains the machine learning model, comprising:

The computer device establishes a sample library, the information in the sample library including: a set of sample blocks, the sequence state of each sample block in the set of sample blocks in successive trainings, and the state change sequences to be determined by the training; if the total number of sample blocks in the set of sample blocks is n, there are n-2 state change sequences to be determined by the training, and the information in each state change sequence includes: the sample block currently participating in the training, the current sequence state of each sample block in the set of sample blocks, and the next sequence state of each sample block in the set of sample blocks;

The computer device sequentially trains the machine learning model with each state change sequence; after n-2 state change sequences are all involved in the training, the parameters in the machine learning model are saved.
The method for detecting a reading order of a document according to claim 6, wherein the computer device trains the machine learning model by using the kth state change sequence, comprising:

The computer device inputs the feature information of the sample block R_k in the set of sample blocks into the machine learning model, and acquires the feature prediction information O_k output by the machine learning model for the sample block R_k, k ∈ [1, n-2];

The computer device obtains, according to the sequence state of each sample block in the set of sample blocks when the sample block R_k participates in training, the sample blocks whose reading order is not determined, to obtain a set G^*;

The computer device performs a dot product operation on the feature information of each sample block in the set G^* with O_k to obtain a set V^*;

The computer device acquires the sequence state of each sample block in the set G^* when the (k+1)-th sample block participates in training, and obtains a set V^π;

The computer device normalizes the set V^* to obtain a set V^**, and normalizes the set V^π to obtain a set V^ππ; constructs, according to the set V^** and the set V^ππ, the loss function corresponding to the sample block R_k participating in training; and updates the parameters in the machine learning model by an error backpropagation (BP) algorithm based on the loss function.
The method for detecting a reading order of a document according to claim 1, wherein:

The position information of the text block in the document picture includes: an x coordinate of a center point of the text block in the document picture, and a y coordinate of a center point of the text block in the document picture;

The layout information of the text block includes: a width of the text block, a height of the text block, a scale mean of all connected regions in the text block, and density information of the text block;

The machine learning model is a 6-dimensional input and 6-dimensional output neural network model.
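The six features named above can be packed into a single input vector. The sketch below is one possible layout; the class name `TextBlock`, the field order, and the absence of any scaling are assumptions, since the claim lists the features but not how they are arranged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    cx: float          # x coordinate of the block's center point
    cy: float          # y coordinate of the block's center point
    width: float       # width of the block
    height: float      # height of the block
    scale_mean: float  # scale mean of all connected regions in the block
    density: float     # density information of the block

    def features(self) -> List[float]:
        # Assumed ordering: position information first, then layout information.
        return [self.cx, self.cy, self.width, self.height,
                self.scale_mean, self.density]


block = TextBlock(cx=120.0, cy=80.0, width=200.0, height=40.0,
                  scale_mean=12.5, density=0.35)
vector = block.features()
print(len(vector))  # 6
```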
The method for detecting a reading order of a document according to claim 8, wherein the neural network model comprises a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer, wherein the first hidden layer and the second hidden layer are 12-dimensional and 20-dimensional hidden layers, respectively.
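A forward pass through a 6-12-20-6 network can be sketched in plain Python as below. Only the layer sizes come from the claim; the tanh hidden activation, the linear output layer, the uniform random initialization, and the names `init_params` and `forward` are assumptions for this example.

```python
import math
import random
from typing import List, Sequence, Tuple

LAYER_SIZES = [6, 12, 20, 6]  # input, first hidden, second hidden, output

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_params(seed: int = 0) -> List[Layer]:
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params: Sequence[Layer], x: Sequence[float]) -> List[float]:
    activation = list(x)
    for index, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(params) - 1
        # Assumed activations: tanh on hidden layers, identity on the output.
        activation = z if is_output else [math.tanh(v) for v in z]
    return activation


params = init_params()
prediction = forward(params, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
print(len(prediction))  # 6
```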
The method for detecting a reading order of a document according to claim 1, wherein the computer device identifying the text blocks included in the document picture comprises:

The computer device performs binarization processing and direction correction processing on the document picture;

The computer device performs layout analysis on the document image subjected to the binarization processing and the direction correction processing to obtain a text block included in the document image.
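The claim does not say which binarization method is used. As a minimal sketch, assuming a grayscale image stored as rows of 0-255 values and a fixed global threshold (both assumptions), binarization could look like the following; direction correction and layout analysis are not shown.

```python
from typing import List


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each pixel to 1 (dark, treated as ink) or 0 (light, treated as background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


image = [[0, 200],
         [130, 90]]
print(binarize(image))  # [[1, 0], [0, 1]]
```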
The method for detecting a reading order of a document according to claim 1, further comprising:

The computer device performs text recognition on each text block, and obtains text information of the document picture according to the determined reading order.
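Once each block has been recognized and the reading order is known, assembling the document text is a matter of concatenating the block texts in that order. The sketch below assumes the recognized text is already available per block; the identifiers and the newline separator are illustrative choices.

```python
from typing import Dict, List


def assemble_text(block_texts: Dict[str, str], reading_order: List[str]) -> str:
    """Join the recognized text of each block in the detected reading order."""
    return "\n".join(block_texts[block_id] for block_id in reading_order)


texts = {"right": "second column", "title": "Title", "left": "first column"}
document_text = assemble_text(texts, ["title", "left", "right"])
print(document_text)
```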
A computer device comprising a memory and a processor, the memory storing computer readable instructions, the computer readable instructions being executed by the processor such that the processor performs the following steps:

Identifying the text blocks contained in a document picture to construct a block set;

Determining a starting text block from the block set;

Performing a path finding operation on the starting text block according to the feature information of the starting text block to determine a first text block in the block set corresponding to the starting text block, the feature information of a text block including position information of the text block in the document picture and layout information of the text block;

Performing the path finding operation on the first text block according to the feature information of the first text block to determine a text block in the block set corresponding to the first text block; and so on until the execution order of the path finding operations corresponding to the text blocks in the block set can be uniquely determined;

Determining the execution order of the path finding operations corresponding to the text blocks in the block set, and obtaining a reading order of the text blocks in the document picture according to the execution order.
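The overall iteration in this claim can be sketched as a loop that repeatedly asks a path finding function for the next block. The function `detect_reading_order` and the stand-in `nearest_center` are hypothetical; in particular, `nearest_center` simply picks the closest remaining center point and is not the learned path finding operation described in the claims. Features are reduced to 2-D center points for brevity.

```python
import math
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
NextFinder = Callable[[Features, Dict[str, Features]], str]


def detect_reading_order(blocks: Dict[str, Features], start: str,
                         find_next: NextFinder) -> List[str]:
    order = [start]
    remaining = {name: f for name, f in blocks.items() if name != start}
    current = start
    # Stop when at most one block is left: its position is then uniquely determined.
    while len(remaining) > 1:
        current = find_next(blocks[current], remaining)
        order.append(current)
        del remaining[current]
    order.extend(remaining)
    return order


def nearest_center(current: Features, remaining: Dict[str, Features]) -> str:
    # Stand-in for the path finding operation: the closest center point wins.
    return min(remaining, key=lambda name: math.dist(current, remaining[name]))


page = {"title": (50, 10), "left": (20, 40), "right": (75, 45), "footer": (50, 90)}
order = detect_reading_order(page, "title", nearest_center)
print(order)  # ['title', 'left', 'right', 'footer']
```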
The computer device according to claim 12, wherein said determining a starting text block from said set of blocks comprises:

A text block whose center point coordinates are located at one vertex of the document picture is selected from the set of blocks, and the text block is determined as the start text block.
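One way to read this claim is to pick the block whose center point lies nearest to a chosen vertex of the picture. That interpretation, the default top-left vertex at (0, 0), and the function name are assumptions for this sketch.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def pick_start_near_vertex(centers: Dict[str, Point],
                           vertex: Point = (0.0, 0.0)) -> str:
    """Return the block whose center point is closest to the given vertex."""
    return min(centers, key=lambda name: math.dist(centers[name], vertex))


centers = {"title": (50.0, 10.0), "left": (20.0, 40.0), "footer": (50.0, 90.0)}
start = pick_start_near_vertex(centers)
print(start)  # left
```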
The computer device according to claim 12, wherein said determining a starting text block from said set of blocks comprises:

Establishing an XOY coordinate system with one vertex of the document picture as an origin, the positive direction of the x-axis of the XOY coordinate system points to the width direction of the document picture, and the positive direction of the y-axis points to the length direction of the document picture;

Obtaining, from the block set, a text block having a smallest x coordinate of the center point as the text block A;

Obtaining the text blocks whose center point has a y coordinate smaller than that of the text block A, to construct a text block set G′; and sequentially comparing each text block B in the text block set G′ with the text block A;

If the projections of the text block B and the text block A in the x-axis direction do not intersect, the text block B is deleted from the text block set G′; if the projections of the text block B and the text block A in the x-axis direction intersect, the text block A is updated to the text block B, and the text block B is deleted from the text block set G′;

Detecting whether the text block set G′ is empty after each text block comparison; if so, determining the current text block A as the starting text block; if not, and the text block A has been updated, updating the text block set G′ and comparing each text block in the updated text block set G′ with the current text block A; and so on until the text block set G′ is empty.
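The start block search in this claim can be sketched as follows. The sketch assumes the origin is the top-left vertex with y increasing downward, that G′ holds the blocks whose center lies above A (smaller y), and that "updating G′" means keeping only the remaining candidates that lie above the new A. These readings, and the names `Block` and `pick_start_block`, are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    name: str
    cx: float     # x coordinate of the center point
    cy: float     # y coordinate of the center point
    width: float


def x_projections_intersect(a: Block, b: Block) -> bool:
    a_left, a_right = a.cx - a.width / 2, a.cx + a.width / 2
    b_left, b_right = b.cx - b.width / 2, b.cx + b.width / 2
    return a_left <= b_right and b_left <= a_right


def pick_start_block(blocks: List[Block]) -> Block:
    a = min(blocks, key=lambda blk: blk.cx)             # smallest center x
    g_prime = [blk for blk in blocks if blk.cy < a.cy]  # blocks above A
    while g_prime:
        b = g_prime.pop(0)                              # B leaves G' either way
        if x_projections_intersect(a, b):
            a = b                                       # A is updated to B
            g_prime = [blk for blk in g_prime if blk.cy < a.cy]
    return a


page = [Block("left", 25, 50, 40), Block("right", 75, 50, 40),
        Block("title", 50, 10, 80)]
print(pick_start_block(page).name)  # title
```

Here the left column has the smallest center x, but the title sits above it and overlaps it horizontally, so the title becomes the starting block.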
The computer device according to claim 12, wherein the path finding operation comprises:

Learning the feature information of the text block by using a pre-trained machine learning model to obtain feature prediction information of the text block corresponding to the text block;

Calculating a correlation between the feature prediction information and the feature information of each text block in the block set on which the path finding operation has not been performed; and

A text block corresponding to the text block is determined according to the correlation calculated above.
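A single path finding operation can be sketched as: predict, score every unvisited block against the prediction, and pick the best score. Using a dot product as the correlation mirrors the dot product in the training claims, but treating it as the inference-time correlation and choosing the highest score are assumptions; `predict` stands in for the trained model, and the identity function used in the demo is purely illustrative.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def dot(a: Features, b: Features) -> float:
    return sum(x * y for x, y in zip(a, b))


def find_next_block(current: Features,
                    unvisited: Dict[str, Features],
                    predict: Callable[[Features], List[float]]) -> str:
    prediction = predict(current)  # feature prediction information
    scores = {name: dot(features, prediction)
              for name, features in unvisited.items()}
    return max(scores, key=scores.get)  # assumed rule: highest correlation wins


# Demo with 2-D features and an identity "model".
unvisited = {"a": (0.9, 0.1), "b": (0.1, 0.9)}
next_block = find_next_block((1.0, 0.0), unvisited, predict=list)
print(next_block)  # a
```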
The computer apparatus according to claim 12, wherein said computer readable instructions further cause said processor to perform the following steps:

The machine learning model is pre-trained such that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
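The claim does not state what the "set condition" is. A minimal sketch, assuming the condition is an upper bound on the Euclidean distance (the threshold value and function name are hypothetical):

```python
import math
from typing import Sequence


def meets_condition(predicted: Sequence[float], sample: Sequence[float],
                    max_distance: float = 0.1) -> bool:
    """Check whether the prediction is within max_distance of the sample."""
    return math.dist(predicted, sample) <= max_distance


print(meets_condition([0.0, 0.0], [0.03, 0.04]))  # True
print(meets_condition([0.0, 0.0], [3.0, 4.0]))    # False
```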
The computer apparatus according to claim 16, wherein said pre-training the machine learning model comprises:

Establishing a sample library, the information in the sample library comprising: a set of sample blocks, a sequence state of each sample block in the set of sample blocks in successive trainings, and state change sequences to be determined by the training; if the total number of sample blocks in the set of sample blocks is n, the number of state change sequences to be determined by the training is n-2, and the information in each state change sequence includes: the sample block currently participating in the training, a current sequence state of each sample block in the set of sample blocks, and a next sequence state of each sample block in the set of sample blocks;

The machine learning model is trained with each state change sequence in turn; after all n-2 state change sequences have participated in the training, the parameters in the machine learning model are saved.
The computer apparatus according to claim 17, wherein said training the machine learning model with the kth state change sequence comprises:

Inputting feature information of the kth sample block R_k in the set of sample blocks into the machine learning model, and acquiring feature prediction information O_k, output by the machine learning model, of the text block corresponding to the sample block R_k, where k∈[1, n-2];

Obtaining, according to the sequence state of each sample block in the set of sample blocks when the sample block R_k participates in training, the sample blocks whose reading order has not been determined, to obtain a set G*;

Performing a dot product operation on the feature information of each sample block in the set G* with O_k, respectively, to obtain a set V*;

Obtaining the sequence state of each sample block in the set G* when the (k+1)th sample block participates in training, to obtain a set Vπ;

Normalizing the set V* to obtain a set V**, and normalizing the set Vπ to obtain a set Vππ; constructing, according to the set V** and the set Vππ, a loss function corresponding to the participation of the sample block R_k in training; and updating the parameters in the machine learning model by an error back propagation (BP) algorithm based on the loss function.
The computer device according to claim 12, wherein the position information of the text block in the document picture comprises: an x coordinate of a center point of the text block in the document picture, and a y coordinate of the center point of the text block in the document picture; the layout information of the text block includes: a width of the text block, a height of the text block, a scale mean of all connected regions in the text block, and density information of the text block; and the machine learning model is a neural network model with a 6-dimensional input and a 6-dimensional output.
The computer device according to claim 19, wherein the neural network model comprises a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer, the first hidden layer and the second hidden layer being 12-dimensional and 20-dimensional hidden layers, respectively.
The computer device according to claim 12, wherein the identifying the text block included in the document picture comprises:

Performing binarization processing and direction correction processing on the document picture;

Performing layout analysis on the document picture that has undergone the binarization processing and the direction correction processing, to obtain the text blocks included in the document picture.
The computer apparatus according to claim 12, wherein said computer readable instructions further cause said processor to perform the following steps:

Text recognition is performed on each text block, and text information of the document picture is obtained according to the determined reading order.
One or more non-volatile storage media storing computer readable instructions, the computer readable instructions, when executed by one or more processors, causing the one or more processors to perform the following steps:

Identifying the text blocks contained in a document picture to construct a block set;

Determining a starting text block from the block set;

Performing a path finding operation on the starting text block according to the feature information of the starting text block to determine a first text block in the block set corresponding to the starting text block, the feature information of a text block including position information of the text block in the document picture and layout information of the text block;

Performing the path finding operation on the first text block according to the feature information of the first text block to determine a text block in the block set corresponding to the first text block; and so on until the execution order of the path finding operations corresponding to the text blocks in the block set can be uniquely determined;

Determining the execution order of the path finding operations corresponding to the text blocks in the block set, and obtaining a reading order of the text blocks in the document picture according to the execution order.
The storage medium according to claim 23, wherein said determining a starting text block from said set of blocks comprises:

A text block whose center point coordinates are located at one vertex of the document picture is selected from the set of blocks, and the text block is determined as the start text block.
The storage medium according to claim 23, wherein said determining a starting text block from said set of blocks comprises:

Establishing an XOY coordinate system with one vertex of the document picture as an origin, the positive direction of the x-axis of the XOY coordinate system points to the width direction of the document picture, and the positive direction of the y-axis points to the length direction of the document picture;

Obtaining, from the block set, a text block having a smallest x coordinate of the center point as the text block A;

Obtaining the text blocks whose center point has a y coordinate smaller than that of the text block A, to construct a text block set G′; and sequentially comparing each text block B in the text block set G′ with the text block A;

If the projections of the text block B and the text block A in the x-axis direction do not intersect, the text block B is deleted from the text block set G′; if the projections of the text block B and the text block A in the x-axis direction intersect, the text block A is updated to the text block B, and the text block B is deleted from the text block set G′;

Detecting whether the text block set G′ is empty after each text block comparison; if so, determining the current text block A as the starting text block; if not, and the text block A has been updated, updating the text block set G′ and comparing each text block in the updated text block set G′ with the current text block A; and so on until the text block set G′ is empty.
The storage medium according to claim 23, wherein the path finding operation comprises:

Learning the feature information of the text block by using a pre-trained machine learning model to obtain feature prediction information of the text block corresponding to the text block;

Calculating a correlation between the feature prediction information and the feature information of each text block in the block set on which the path finding operation has not been performed; and

A text block corresponding to the text block is determined according to the correlation calculated above.
The storage medium of claim 23, wherein the computer readable instructions further cause the processor to perform the following steps:

The machine learning model is pre-trained such that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
The storage medium of claim 27, wherein the pre-training the machine learning model comprises:

Establishing a sample library, the information in the sample library comprising: a set of sample blocks, a sequence state of each sample block in the set of sample blocks in successive trainings, and state change sequences to be determined by the training; if the total number of sample blocks in the set of sample blocks is n, the number of state change sequences to be determined by the training is n-2, and the information in each state change sequence includes: the sample block currently participating in the training, a current sequence state of each sample block in the set of sample blocks, and a next sequence state of each sample block in the set of sample blocks;

The machine learning model is trained with each state change sequence in turn; after all n-2 state change sequences have participated in the training, the parameters in the machine learning model are saved.
The storage medium according to claim 28, wherein said training the machine learning model with the kth state change sequence comprises:

Inputting feature information of the kth sample block R_k in the set of sample blocks into the machine learning model, and acquiring feature prediction information O_k, output by the machine learning model, of the text block corresponding to the sample block R_k, where k∈[1, n-2];

Obtaining, according to the sequence state of each sample block in the set of sample blocks when the sample block R_k participates in training, the sample blocks whose reading order has not been determined, to obtain a set G*;

Performing a dot product operation on the feature information of each sample block in the set G* with O_k, respectively, to obtain a set V*;

Obtaining the sequence state of each sample block in the set G* when the (k+1)th sample block participates in training, to obtain a set Vπ;

Normalizing the set V* to obtain a set V**, and normalizing the set Vπ to obtain a set Vππ; constructing, according to the set V** and the set Vππ, a loss function corresponding to the participation of the sample block R_k in training; and updating the parameters in the machine learning model by an error back propagation (BP) algorithm based on the loss function.
The storage medium according to claim 23, wherein the position information of the text block in the document picture comprises: an x coordinate of a center point of the text block in the document picture, and a y coordinate of the center point of the text block in the document picture; the layout information of the text block includes: a width of the text block, a height of the text block, a scale mean of all connected regions in the text block, and density information of the text block; and the machine learning model is a neural network model with a 6-dimensional input and a 6-dimensional output.