WO2018161764A1 - Document reading-order detection method, computer device, and storage medium - Google Patents

Document reading-order detection method, computer device, and storage medium

Info

Publication number
WO2018161764A1
Authority
WO
WIPO (PCT)
Prior art keywords
text block
block
text
sample
blocks
Prior art date
Application number
PCT/CN2018/075626
Other languages
French (fr)
Chinese (zh)
Inventor
朱传聪
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018161764A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, a computer device and a storage medium for detecting a reading order of a document.
  • OCR (Optical Character Recognition) refers to optically converting the text in a paper document into an image file consisting of a black-and-white dot matrix; recognition software then converts the text in the image into a text format for further processing by word-processing software.
  • Various embodiments provided in accordance with the present application provide a method, computer device, and storage medium for detecting a reading order of a document.
  • a method of detecting a reading order of a document comprising:
  • the computer device identifies a block of text contained in the document picture to construct a block set
  • the computer device determines a starting text block from the set of blocks
  • the computer device performs a routing operation on the starting text block according to the feature information of the starting text block, to determine a first text block corresponding to the starting text block in the block set;
  • the feature information of a text block includes position information of the text block in the document picture and layout information of the text block;
  • the computer device performs a routing operation on the first text block according to the feature information of the first text block, to determine a text block corresponding to the first text block in the block set; and so on, until the execution order of the routing operations corresponding to each text block in the block set can be uniquely determined; and
  • the computer device determines an execution order of the routing operations corresponding to the text blocks in the block set, and obtains a reading order of the text blocks in the document picture according to the execution order.
  • a computer device comprising a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, cause the processor to perform the following steps:
  • the feature information of the text block includes position information of the text block in the document picture and layout information of the text block;
  • One or more non-volatile storage media storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the feature information of the text block includes position information of the text block in the document picture and layout information of the text block;
  • FIG. 1 is a schematic diagram of an application environment of a solution of the present application in an embodiment
  • FIG. 2 is a schematic flowchart of a method for detecting a reading order of a document according to an embodiment
  • FIG. 3 is a schematic diagram of a text block included in a document picture of an embodiment
  • FIG. 4 is a schematic diagram of a neural network model of an embodiment
  • FIG. 5 is a schematic flow chart of training a neural network model according to a training sample according to an embodiment
  • FIG. 6 is a schematic structural diagram of an apparatus for detecting a reading order of a document according to an embodiment
  • FIG. 7 is a schematic structural diagram of an apparatus for detecting a reading order of a document according to another embodiment.
  • an application environment for implementing a method for detecting a reading order of a document in the embodiment of the present application is a smart terminal provided with an OCR system; the smart terminal includes at least a processor, a display module, a power interface, and a memory connected through a system bus, the memory including a non-volatile storage medium and an internal memory.
  • the smart terminal identifies and displays the text information contained in the document picture through the OCR system.
  • the display module can display the text information recognized by the OCR system; the power interface is used for connecting to an external power source, which supplies power to the battery of the smart terminal through the power interface; the non-volatile storage medium stores at least an operating system, an OCR system, a database, and computer readable instructions that, when executed, cause the processor to perform the method of detecting a reading order of a document.
  • the smart terminal may be a mobile phone, a tablet computer, or the like, or may be another device having the above structure. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation of the computer device to which the solution of the present application is applied.
  • a specific computer device may include more or fewer components than those shown in the figure, combine some components, or have a different component arrangement.
  • FIG. 2 is a schematic flowchart of a method for detecting a reading order of a document according to an embodiment; as shown in FIG. 2, the method for detecting a reading order of a document in the embodiment includes the following steps:
  • the document picture may be binarized to obtain a binarized document picture.
  • the value of each pixel is represented by 0 or 1.
  • the scale analysis and the layout analysis are performed to obtain all the text blocks contained in the document.
  • the scale analysis refers to finding the scale information of each character in the binarized document picture.
  • the scale is in pixels, and the value is the square root of the area of the rectangular area occupied by the characters.
  • Layout analysis refers to an algorithm in OCR that divides the content of a document image into a plurality of non-overlapping regions according to information such as paragraphs and pagination. This will result in all the text blocks contained in the document, as shown in Figure 3 or Figure 5.
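The binarization and scale definitions above can be illustrated with a minimal sketch. The function names, the threshold of 128, and the convention that dark (ink) pixels map to 1 are assumptions made for illustration; the text only states that each pixel becomes 0 or 1 and that the scale is the square root of the area of the character's rectangle.

```python
import math


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0..255 values) to 0/1 pixels.

    Assumption: pixels darker than `threshold` are treated as ink (1).
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def char_scale(width, height):
    """Scale of a character in pixels: the square root of the area of
    the rectangle it occupies."""
    return math.sqrt(width * height)
```

For instance, a character occupying a 4 x 9 pixel rectangle has a scale of 6 pixels under this definition.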
  • the step of pre-processing the document picture further includes the step of correcting the document picture. That is, if the initial state of the document image to be detected is deviated from the preset standard state, the document picture is corrected to conform to the standard state. For example, if it is detected that there is a tilt, upside down, etc. in the initial state of the document picture, the direction of the document picture needs to be corrected first.
  • a text block whose center point coordinates are located at a vertex of the document picture can be selected from the block set, and that text block is determined as the starting text block.
  • for example, a text block located at the left and top of the document picture is determined as the starting text block, such as the text block $R_1$ shown in FIG. 3, or the text block $R_1$ shown in FIG. 5.
  • the feature information of the text block includes location information of the text block in the document image and layout information of the text block.
  • the path finding operation on the text block is actually based on the feature information of the text block to obtain the feature prediction information of the corresponding next text block.
  • the routing operation on a text block includes: learning the feature information of the text block with a pre-trained machine learning model to obtain feature prediction information of the next text block corresponding to that text block; calculating the correlation between the feature prediction information and the feature information of each text block in the block set on which the routing operation has not yet been performed; and then determining the next text block corresponding to the text block according to the calculated correlations.
  • step S130 is a process of automatically routing through the text blocks included in the document starting from the starting text block, and each routing only needs to determine the next text block corresponding to the current text block.
  • for example, in the document picture shown in FIG. 3, if the current text block is $R_1$, this routing determines that the next text block of $R_1$ is $R_2$; routing is then performed again with $R_2$ as the current text block, giving $R_4$ as the next text block of $R_2$; and so on, until the routing operation is performed on $R_6$ and it is determined that the next text block of $R_6$ is $R_7$. Although no routing operation has been performed on $R_7$ and $R_8$ at this point, the execution order of the routing operations corresponding to each text block can already be uniquely determined.
  • the machine learning model is trained in advance with suitable training samples, so that it outputs more accurate prediction results; an accurate next text block can then be determined based on the correlation, which makes the method applicable to reading-order detection for various mixed document types.
  • the machine learning model may be a neural network model or a probabilistic model of other non-neural networks.
  • S140 Determine an execution sequence of the routing operation corresponding to the text block in the block set, and obtain a reading order of the text block in the document picture according to the execution sequence.
  • each text block and its corresponding next text block can be obtained.
  • all the texts can be obtained according to all the text blocks and the next text block corresponding to each text block.
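The routing loop of steps S130 and S140 can be sketched as follows. This is an illustrative sketch only: `predict` is a hypothetical stand-in for the trained machine learning model, and using the dot product as the correlation measure is an assumption, since the text does not fix the formula here. The loop stops once a single block remains, because its position then follows by exclusion.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (assumed correlation measure)."""
    return sum(a * b for a, b in zip(u, v))


def detect_reading_order(features, start, predict):
    """Return block names in reading order.

    features: dict mapping block name -> feature vector
    start:    name of the starting text block
    predict:  callable mapping a feature vector to the predicted
              feature vector of the next block (hypothetical model)
    """
    order = [start]
    remaining = set(features) - {start}
    current = start
    # Route until only one block is left; it is then placed by exclusion.
    while len(remaining) > 1:
        prediction = predict(features[current])
        # Pick the unvisited block most correlated with the prediction.
        current = max(sorted(remaining),
                      key=lambda name: dot(features[name], prediction))
        order.append(current)
        remaining.remove(current)
    order.extend(sorted(remaining))
    return order
```

With n blocks this performs n-2 routing operations, which matches the FIG. 3 walk-through in which routing stops after the block following $R_6$ is found.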
  • the machine learning model includes a plurality of parameters
  • the method for detecting a reading order of the document further includes the step of training the machine learning model, so that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
  • the Euclidean distance refers to the Euclidean metric, i.e. the spatial distance between two vectors of the same dimension.
  • the manner in which the machine learning model is trained may include the following process:
  • Samples refer to data that has been calibrated during machine learning, including input data and output data.
  • the training samples are a plurality of sample blocks that participate in the training of the machine learning model, and the reading order of the plurality of sample blocks is known.
  • $G$ denotes a set of sample blocks
  • $S$ denotes a set of sequential states of the sample blocks in successive trainings
  • $T$ denotes a sequence of state changes to be determined during training. If the total number of sample blocks in $G$ is $n$, then
  • $T = \{\{R_1, S_1, S_2\}, \{R_2, S_2, S_3\}, \ldots, \{R_{n-2}, S_{n-2}, S_{n-1}\}\}$;
  • each item in $T$ represents a sample block currently participating in training, the current set of sequential states of the sample blocks in $G$, and the next set of sequential states of the sample blocks in $G$ to be predicted.
  • $R_2$ indicates that the sample block currently participating in the training is $R_2$
  • $S_2$ represents the sequential state of each sample block in $G$ when $R_2$ participates in training
  • $S_3$ indicates the next sequential state of each sample block in $G$ to be predicted when $R_2$ participates in training.
  • since the last two remaining sample blocks can be determined directly by exclusion, they do not need training, so $T$ only needs to contain $n-2$ items.
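As a rough illustration of how the state change sequence $T$ could be built from a known reading order, consider the sketch below. The representation of a sequential state as a tuple of 0/1 flags (1 meaning the block has already been placed in the order, including the block currently in training) is an assumption; the text does not spell out the encoding.

```python
def build_state_sequences(block_names, reading_order):
    """Build the n-2 training items (current block, S_k, S_{k+1}).

    block_names:   all sample blocks in a fixed index order
    reading_order: the same blocks in their known reading order
    Assumption: a state is a tuple of 0/1 flags, one per block,
    where 1 marks a block that has already been ordered.
    """
    index = {name: i for i, name in enumerate(block_names)}
    visited = [0] * len(block_names)
    sequences = []
    for k in range(len(block_names) - 2):
        current, following = reading_order[k], reading_order[k + 1]
        visited[index[current]] = 1
        s_k = tuple(visited)
        s_next = list(visited)
        s_next[index[following]] = 1  # state after the next block is chosen
        sequences.append((current, s_k, tuple(s_next)))
    return sequences
```

Under this encoding, the number of blocks still flagged 0 in $S_k$ is $n-k$, and the next state of one item equals the current state of the following item.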
  • the machine learning model is trained using each state change sequence in $T$ in turn; after all the state change sequences in $T$ have participated in the training, the parameters in the machine learning model are saved.
  • training the parameters in the machine learning model according to the $k$-th sequence $\{R_k, S_k, S_{k+1}\}$ in $T$ may include the following steps 1 to 5:
  • Step 1: the feature information of the sample block $R_k$ is input into the machine learning model, and the feature prediction information $O_k$, $k \in [1, n-2]$, of the next text block of $R_k$ output by the machine learning model is obtained;
  • Step 2: obtain the sample blocks $R_i$ whose sequential state in $S_k$ is 0, to obtain a set $G^{*}$:
  • the dimension of the set $G^{*}$ is $n-k$
  • the loss function refers to an error obtained by machine learning calculation in the machine learning process, and the error can be measured using a plurality of functions, and the function is generally a convex function. That is, the loss function corresponding to the sample block R k participating in the training is constructed according to the Euclidean distance of V ** and V ⁇ .
  • the Euclidean distance, i.e. the Euclidean metric, is the spatial distance between two vectors of the same dimension.
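A Euclidean-distance loss of the kind described can be sketched as below. Because the exact normalization and the exact definition of the two compared sets are not recoverable from the text, this sketch assumes unit-length scaling as the normalization and treats both inputs as plain vectors of equal length; `routing_loss` and `normalize` are hypothetical names.

```python
import math


def euclidean_distance(u, v):
    """Spatial distance between two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def normalize(v):
    """Scale a vector to unit length (assumed normalization scheme)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def routing_loss(correlations, target):
    """Loss for one training item: distance between the normalized
    correlation vector and the normalized target vector."""
    return euclidean_distance(normalize(correlations), normalize(target))
```

The loss is zero when the two vectors point in the same direction and grows as they diverge, which is what makes it usable as a training signal.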
  • the BP (Error Back Propagation) algorithm is especially suitable for training multi-layer feedforward network models: during training the error accumulates at the output layer and is then propagated backward through the output layer to each feedforward network layer, thereby adjusting the parameters of each feedforward network layer.
  • the recognized text block is marked with a text box, and the feature information of each text block is expressed in the form of a feature vector: $R = \{x, y, w, h, s, d\}$
  • $R$ represents the feature vector of a text block, comprising 6 items of feature information
  • x represents an x coordinate of a center point of the text block
  • y represents a y coordinate of a center point of the text block
  • w represents a width of the text block
  • h represents a height of the text block
  • s represents the scale mean of all connected regions in the text block
  • d represents the density information of the text block.
  • a connected region is a region formed by connected pixels in the binarized image; connectivity between pixels can be defined by a 4-neighborhood or an 8-neighborhood algorithm. Taking the 8-neighborhood algorithm as an example: for the pixel at position (x, y), if one of its 8 adjacent points has the same pixel value as (x, y), the two are 8-neighborhood connected; all connected points are found recursively, and the collection of these points is a connected region.
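The 8-neighborhood connectivity described above can be sketched as a flood fill. This version is iterative (a queue instead of recursion) and only groups foreground pixels; treating the value 1 as foreground is an assumption for illustration.

```python
from collections import deque


def connected_regions(image, foreground=1):
    """Return the 8-connected regions of foreground pixels.

    image: list of rows of 0/1 pixel values (binarized picture)
    Each region is a list of (x, y) pixel coordinates.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != foreground or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            region = []
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx]
                                and image[ny][nx] == foreground):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            regions.append(region)
    return regions
```

Diagonal neighbors join the same region under 8-connectivity, whereas a 4-neighborhood variant would only inspect the four horizontal and vertical neighbors.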
  • W and H respectively represent functions of taking length and taking width
  • r i is a connected region i
  • K represents a total amount of connected regions included in a text block
  • p represents a pixel value of a pixel.
  • the corresponding feature information of the text block is further normalized, for example, a convention:
  • the manner in which a starting text block is determined from all of the text blocks may include:
  • an XOY coordinate system is established with the upper-left vertex of the document picture as the origin (refer to FIG. 3 and FIG. 5); the positive direction of the x-axis points along the width of the document picture, and the positive direction of the y-axis points along its length.
  • the text block whose center point has the smallest x coordinate is obtained from the block set as the text block A.
  • the text blocks whose center-point y coordinate is smaller than that of text block A are acquired to construct a text block set G'; each text block B in the set G' is compared with text block A in turn; if the projections of text block B and text block A in the x-axis direction do not intersect, text block B is deleted from the set G'; if the projections of text block B and text block A in the x-axis direction intersect, text block A is updated to text block B, and text block B is deleted from the set G'.
  • the method for determining the starting text block of this embodiment is applicable to various complicated documents and can accurately identify the starting text block.
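The selection of the starting text block can be sketched as follows. The `Block` type and function names are hypothetical; visiting candidates in descending order of center y and counting touching projections as intersecting are assumptions made to keep the sketch deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    x: float  # center x coordinate
    y: float  # center y coordinate
    w: float  # width
    h: float  # height


def x_projections_intersect(a, b):
    """True if the x-axis projections of two blocks overlap or touch."""
    return abs(a.x - b.x) <= (a.w + b.w) / 2


def select_start_block(blocks):
    """Pick the starting text block.

    Start from the block with the smallest center x (text block A),
    then walk the blocks lying above it (smaller center y); whenever a
    candidate shares x-axis extent with A, it becomes the new A.
    """
    a = min(blocks, key=lambda b: b.x)
    candidates = sorted((b for b in blocks if b.y < a.y),
                        key=lambda b: b.y, reverse=True)
    for b in candidates:
        if x_projections_intersect(a, b):
            a = b
    return a
```

In a layout with a full-width title above two columns, the leftmost column is chosen first and is then replaced by the title block, since the title lies above it and overlaps it horizontally.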
  • the machine learning model is selected as a neural network model.
  • the neural network model may include a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer.
  • the input layer is responsible for receiving the input and distributing it to the hidden layers (so called because the user cannot see these layers).
  • the hidden layers are responsible for the required calculations and pass their results to the output layer, where the user can see the final results.
  • the first hidden layer and the second hidden layer are 12-dimensional and 20-dimensional hidden layers, respectively.
  • the input feature vector is $R = \{r_j;\ j \in [0, 6)\}$
  • the output of the first hidden layer is K 1 :
  • the output of the second hidden layer is K 2 :
  • the output of the 6-dimensional output layer is O:
  • $a_{1i}$ and $b_{1i}$ are parameters corresponding to the first hidden layer
  • $k_{1i}$ is the $i$-th output of the first hidden layer
  • $a_{2m}$ and $b_{2m}$ are parameters corresponding to the second hidden layer
  • $k_{2m}$ is the $m$-th output of the second hidden layer
  • $a_{on}$ and $b_{on}$ are parameters corresponding to the 6-dimensional output layer, $o_n$ is the $n$-th output, and Sigmoid denotes the S-shaped nonlinear function.
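The 6-12-20-6 structure can be sketched as a small fully connected network. The layer formulas are not preserved in the text, so this sketch assumes each unit computes Sigmoid(weighted sum of inputs + bias), with the `a` parameters acting as weights and the `b` parameters as biases, and it uses random untrained parameters; all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """S-shaped nonlinear function."""
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in, n_out, rng):
    """Create one fully connected layer with random weights and biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def make_model(seed=0):
    """6-dimensional input, hidden layers of 12 and 20 units, 6-dimensional output."""
    rng = random.Random(seed)
    sizes = [6, 12, 20, 6]
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(model, features):
    """Propagate a 6-dimensional feature vector through every layer."""
    out = list(features)
    for weights, biases in model:
        out = [sigmoid(sum(w * v for w, v in zip(row, out)) + b)
               for row, b in zip(weights, biases)]
    return out
```

Calling `forward(make_model(), [x, y, w, h, s, d])` on a normalized feature vector yields six values in (0, 1), which would play the role of the feature prediction information once the parameters have been trained.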
  • the text blocks in FIG. 5 are used as sample blocks to train the neural network model; the sample blocks $R_1$, $R_2$, $R_3$, $R_4$ and $R_5$ can be expressed as:
  • $R_1 = \{x_1, y_1, w_1, h_1, s_1, d_1\}$;
  • $R_2 = \{x_2, y_2, w_2, h_2, s_2, d_2\}$;
  • $R_3 = \{x_3, y_3, w_3, h_3, s_3, d_3\}$;
  • $R_4 = \{x_4, y_4, w_4, h_4, s_4, d_4\}$;
  • $R_5 = \{x_5, y_5, w_5, h_5, s_5, d_5\}$;
  • the known reading order of $R_1$, $R_2$, $R_3$, $R_4$ and $R_5$ is $R_1 \rightarrow R_3 \rightarrow R_2 \rightarrow R_4 \rightarrow R_5$.
  • the training samples $R_1$, $R_2$, $R_3$, $R_4$, $R_5$ may also be described as a sequence of state changes:
  • $T = \{\{R_1, S_1, S_2\}, \{R_3, S_2, S_3\}, \{R_2, S_3, S_4\}\}$.
  • the neural network model is first trained using the sequence $\{R_1, S_1, S_2\}$, as follows:
  • the corresponding loss function of the sample block R 1 participating in the training can be constructed:
  • All parameters in the neural network model can be updated by the BP algorithm.
  • training then continues according to the above steps with the sequences $\{R_3, S_2, S_3\}$ and $\{R_2, S_3, S_4\}$ in turn, whereby the training of the neural network model can be completed.
  • a neural network model with stable performance can be obtained by selecting an appropriate training sample; the text block finding based on the trained neural network model can accurately obtain the next text block of the current text block, which is favorable for accurate detection.
  • the method for detecting the reading order of the document in the above embodiment of the present application can be applied to an automatic document analysis module in an OCR system, and the automatic document analysis module sorts the identified text blocks after identifying the text block included in the document image. Then, the reading order of the text block is output to the text recognition module, and after the text recognition is performed in the text recognition module, the final readable document is organized based on the already obtained reading order, thereby performing automatic analysis and storage.
  • the information processing performed when the automatic document analysis module sorts the text blocks includes:
  • the selection algorithm A ⁇ (R, S) is set, and the algorithm derives the state S of the next reading order according to the current text block R and the state S of the current reading order, which can be expressed as:
  • algorithm A can be divided into three parts:
  • ⁇ 1 is used to select the starting text block, and the starting text block is marked with R start .
  • first, the R whose center point is located at the leftmost side of the document picture is selected and denoted $R_l$; then, among the remaining R, those with $y(R) < y(R_l)$ are selected to construct a text block set G'.
  • preferably, the R in G' may also be sorted in descending order of y coordinate; each R in G' is then compared with $R_l$ in turn: if the projections of R and $R_l$ in the x-axis direction intersect, this R is marked as the new $R_l$ and deleted from G'; otherwise $R_l$ is not updated and this R is deleted from G'; the above operation is repeated until G' is empty, at which point $R_{start} = R_l$ is determined.
  • ⁇ 2 is used to derive the feature prediction information O i+1 according to the current text block R i to the next reading order state, which can be described as:
  • the fully connected neural network of the hidden layer has a structure as shown in Fig. 4, in which each circle represents a neuron.
  • the output K 1 of the first hidden layer is:
  • the output of the second hidden layer is:
  • the output of the 6-dimensional output layer is:
  • the current reading order state S is updated as follows to obtain the next reading order state:
  • the method for detecting the reading order of the document in the present application is illustrated below, taking the document picture shown in FIG. 5 as an example, in steps one to five:
  • Step one: binarization processing and direction correction processing are performed on the original document picture; layout analysis is then performed on the processed document picture to obtain all the text blocks included in the document.
  • the text blocks contained in the document are obtained as $R_1$, $R_2$, $R_3$, $R_4$ and $R_5$.
  • Step two: the starting text block is determined.
  • $R_{start}$ is thus initially assigned $R_3$.
  • Step three: automatic routing is performed starting from the starting text block $R_1$.
  • $V^{*} = \{R_2 \cdot O, R_3 \cdot O, R_4 \cdot O, R_5 \cdot O\}$;
  • R 3 is taken as the current text block.
  • Step four: according to the result of the automatic routing, the document reading order is $R_1 \rightarrow R_3 \rightarrow R_2 \rightarrow R_4 \rightarrow R_5$.
  • Step five: text recognition is performed on the text blocks in the order $R_1 \rightarrow R_3 \rightarrow R_2 \rightarrow R_4 \rightarrow R_5$ to obtain the readable text information corresponding to the document, which is then saved and output.
  • the text recognition of the text block includes steps of line segmentation and line recognition, and character recognition is performed in units of rows in sequence, thereby obtaining text information of the entire text block.
  • since the neural network algorithm has a large number of parameters, the trained neural network model can be compatible with various scenes and is more robust to the size, noise and pattern of the document picture.
  • the present application also provides an apparatus for detecting a reading order of a document, the apparatus being usable for performing the above-described method of detecting a reading order of a document.
  • the illustrated structure does not constitute a limitation on the device, which may include more or fewer parts than illustrated, combine some parts, or have a different arrangement of parts.
  • a computer device, the internal structure of which may be as shown in FIG. 2, includes an apparatus for detecting a reading order of a document; the apparatus includes the modules described below, each of which may be implemented in whole or in part by software, hardware, or a combination thereof.
  • FIG. 6 is a schematic structural diagram of an apparatus for detecting a reading order of a document according to an embodiment of the present invention. As shown in FIG. 6, the apparatus includes a block identifying module 610, a starting block selecting module 620, an automatic path finding module 630, and a sequence determining module 640, detailed as follows:
  • the block identification module 610 is configured to identify a text block included in a document picture, and construct a block set
  • the block identification module 610 may specifically include: a pre-processing sub-module for performing binarization processing and direction correction processing on the document picture; and a layout recognition sub-module for performing layout analysis on the document picture that has undergone the binarization processing and the direction correction processing, to obtain the text blocks included in the document.
  • the layout analysis refers to an algorithm for dividing the content in a document picture into a plurality of non-overlapping regions according to paragraphs, pagination, and the like in the OCR. This will result in all the text blocks contained in the document, as shown in Figure 3 or Figure 5.
  • the start block selection module 620 is configured to determine a starting text block from the block set.
  • the start block selection module 620 can be used to select, from the block set, a text block whose center point coordinates are located at a vertex of the document picture, and to determine that text block as the starting text block.
  • the start block selection module 620 can be configured to select, from all the text blocks, a text block whose center point coordinates are located on the left side and the top of the document picture (ie, the text block in the upper left corner), and determine the text block as The starting text block.
  • the starting block selection module 620 may also determine other text blocks as starting text blocks, depending on the document and the actual reading habits (e.g., documents formatted from right to left).
  • the automatic path finding module 630 is configured to perform a routing operation on the starting text block according to the feature information of the starting text block to determine a first text block in the block set corresponding to the starting text block.
  • the feature information of the text block includes location information of the text block in the document image and layout information of the text block; performing a routing operation on the first text block according to the feature information of the first text block to determine a text block corresponding to the first text block in the set of blocks; and so on until the execution order of the routing operation corresponding to each text block in the block set can be uniquely determined.
  • the automatic path finding module 630 performs a process of automatically routing through the text blocks included in a document starting from the starting text block, and each routing only needs to determine the next text block corresponding to the current text block.
  • for example, in the document picture shown in FIG. 3, if the current text block is $R_1$, this routing determines that the next text block of $R_1$ is $R_2$; routing is then performed again with $R_2$ as the current text block, giving $R_4$ as the next text block of $R_2$; and so on, until it is determined that the next text block of $R_6$ is $R_7$, at which point the execution order of the routing operations corresponding to each text block can be uniquely determined.
  • the sequence determining module 640 is configured to determine an execution order of the routing operations corresponding to the text blocks in the block set, and obtain a reading order of the text blocks in the document picture according to the execution order.
  • the sequence determining module 640 can obtain the reading order of the text blocks in the document picture shown in FIG. 3 as $R_1 \rightarrow R_2 \rightarrow R_4 \rightarrow R_5 \rightarrow R_3 \rightarrow R_6 \rightarrow R_7 \rightarrow R_8$.
  • the starting block selection module 620 is specifically configured to establish an XOY coordinate system with the upper-left vertex of the document picture as the origin, in which the positive direction of the x-axis points along the width of the document picture and the positive direction of the y-axis points along its length; and to obtain from the block set the text block whose center point has the smallest x coordinate as the text block A;
  • the text blocks whose center-point y coordinate is smaller than that of text block A are acquired to construct a text block set G', and each text block B in G' is compared with text block A in turn: if the projections of text block B and text block A in the x-axis direction do not intersect, text block B is deleted from the set G'; if the projections intersect, text block A is updated to text block B, and text block B is deleted from the set G'; after each comparison it is determined whether the set G' is empty; if yes, the current text block A is determined as the starting text block; if not, the remaining text blocks in G' are compared with the current text block A in the same way; and so on until the set G' is empty.
  • when text block A is updated, the set G' may also be updated, i.e., the text blocks whose center-point y coordinates are all smaller than the center-point y coordinate of the updated text block A are obtained to form a new set G'
  • by updating G' in this way, the time for selecting the starting text block can be further reduced.
  • the apparatus for detecting a reading order of a document further includes: a training module 650, configured to pre-train the machine learning model, so that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies the set condition.
  • a training module 650 configured to pre-train the machine learning model, so that the feature prediction information output by the machine learning model after the training and the corresponding The Euclidean distance of the sample information satisfies the set condition.
  • the training module 650 can include a sample library construction sub-module and a training sub-module.
  • T represents the sequence of state changes to be determined during the training; if the total number of sample blocks in G is n, then
  • $T = \{\{R_1, S_1, S_2\}, \{R_2, S_2, S_3\}, \ldots, \{R_{n-2}, S_{n-2}, S_{n-1}\}\}$;
  • the training sub-module is configured to sequentially train the parameters in the machine learning model by using each sequence in the T; and after all the sequences in the T participate in the training, save the parameters in the machine learning model.
  • the training sub-module is used to implement the following process when training parameters in the machine learning model according to the k-th sequence $\{R_k, S_k, S_{k+1}\}$ in T:
  • the set $V^{*}$ is normalized to obtain the set $V^{**}$
  • the set $V^{\pi}$ is normalized to obtain the set $V^{\pi\pi}$
  • the loss function corresponding to the sample block $R_k$ participating in training is constructed according to the set $V^{**}$ and the set $V^{\pi\pi}$, and the parameters in the machine learning model are updated by a BP algorithm based on the loss function, wherein the loss function is:
  • the machine learning model is a 6-dimensional input and 6-dimensional output neural network model.
  • the neural network model includes a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer, wherein the first hidden layer and the second hidden layer are 12-dimensional and 20-dimensional hidden layers, respectively;
  • the output of the 6-dimensional output layer is O:
  • $a_{1i}$ and $b_{1i}$ are parameters corresponding to the first hidden layer
  • $k_{1i}$ is the i-th output of the first hidden layer
  • $a_{2m}$ and $b_{2m}$ are parameters corresponding to the second hidden layer
  • $k_{2m}$ is the m-th output of the second hidden layer; $a_{on}$ and $b_{on}$ are parameters corresponding to the 6-dimensional output layer, $o_n$ is the n-th output, and Sigmoid denotes the S-shaped nonlinear function.
  • the apparatus for detecting a reading order of the document further includes: a text recognition module 660, configured to perform text recognition on each of the text blocks, and obtain text information of the document image according to the determined reading order.
  • the device for detecting the reading order of the document can identify all the text blocks included in the document picture and determine a starting text block from among them; routing then starts from the starting text block, and the pre-trained machine learning model determines which text block area to move to next, until the reading order of all text blocks is obtained.
  • the path finding can be compatible with various scenes, and has better robustness to the size, noise and pattern of the document picture, and can accurately identify various types of documents.
  • each functional module is merely an example, and the actual application may be considered according to requirements, for example, for the configuration requirements of the corresponding hardware or the convenience of implementation of the software.
  • the above-mentioned function assignment is completed by different function modules, that is, the internal structure of the device for detecting the reading order of the documents is divided into different functional modules to complete all or part of the functions described above.
  • Each function module can be implemented in the form of hardware or in the form of a software function module.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
  • the steps in the embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least some of the steps in the embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be executed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

A document reading-order detection method comprises: a computer device identifying text blocks in a document image and constructing a block set; determining a start text block from the block set; performing, according to feature information of the start text block, a path searching operation on the start text block to determine a first text block of the block set corresponding to the start text block, the feature information of a text block comprising position information of the text block in the document image and layout information of the text block; iteratively performing the above steps until an order of execution of the path searching operations respectively corresponding to the text blocks in the block set can be uniquely determined; and determining the order of execution of the path searching operations corresponding to the text blocks in the block set, and obtaining, according to the order of execution, a reading-order of the text blocks in the document image.

Description

Method, computer device, and storage medium for detecting the reading order of a document
This application claims priority to Chinese Patent Application No. 201710134711.1, entitled "Method and Apparatus for Detecting the Reading Order of Documents", filed with the Chinese Patent Office on March 8, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a method, a computer device, and a storage medium for detecting the reading order of a document.
Background
OCR (Optical Character Recognition) refers to a class of algorithms for recognizing document pictures. For printed characters, it optically converts the text in a paper document into a black-and-white dot-matrix image file, and recognition software then converts the text in the image into a text format that word-processing software can further edit and process.
In OCR technology, methods based on directed graphs, fixed rules, semantic analysis, and the like are commonly used to identify the reading order of a document. However, in complex environments or for complex document pictures, these methods have a high error rate in identifying the reading order, and their recognition performance is unstable.
Summary
Various embodiments provided in the present application provide a method, a computer device, and a storage medium for detecting the reading order of a document.
A method for detecting the reading order of a document, comprising:
identifying, by a computer device, the text blocks contained in a document picture, and constructing a block set;
determining, by the computer device, a starting text block from the block set;
performing, by the computer device, a routing operation on the starting text block according to feature information of the starting text block, to determine a first text block in the block set corresponding to the starting text block, the feature information of a text block including position information of the text block in the document picture and layout information of the text block;
performing, by the computer device, a routing operation on the first text block according to feature information of the first text block, to determine a text block in the block set corresponding to the first text block, and so on until the execution order of the routing operations corresponding to every text block in the block set can be uniquely determined; and
determining, by the computer device, the execution order of the routing operations corresponding to the text blocks in the block set, and obtaining the reading order of the text blocks in the document picture according to the execution order.
A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
identifying the text blocks contained in a document picture, and constructing a block set;
determining a starting text block from the block set;
performing a routing operation on the starting text block according to feature information of the starting text block, to determine a first text block in the block set corresponding to the starting text block, the feature information of a text block including position information of the text block in the document picture and layout information of the text block;
performing a routing operation on the first text block according to feature information of the first text block, to determine a text block in the block set corresponding to the first text block, and so on until the execution order of the routing operations corresponding to every text block in the block set can be uniquely determined; and
determining the execution order of the routing operations corresponding to the text blocks in the block set, and obtaining the reading order of the text blocks in the document picture according to the execution order.
One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
identifying the text blocks contained in a document picture, and constructing a block set;
determining a starting text block from the block set;
performing a routing operation on the starting text block according to feature information of the starting text block, to determine a first text block in the block set corresponding to the starting text block, the feature information of a text block including position information of the text block in the document picture and layout information of the text block;
performing a routing operation on the first text block according to feature information of the first text block, to determine a text block in the block set corresponding to the first text block, and so on until the execution order of the routing operations corresponding to every text block in the block set can be uniquely determined; and
determining the execution order of the routing operations corresponding to the text blocks in the block set, and obtaining the reading order of the text blocks in the document picture according to the execution order.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the application environment of the solution of the present application in one embodiment;
FIG. 2 is a schematic flowchart of a method for detecting the reading order of a document according to an embodiment;
FIG. 3 is a schematic diagram of the text blocks contained in a document picture according to an embodiment;
FIG. 4 is a schematic diagram of a neural network model according to an embodiment;
FIG. 5 is a schematic flowchart of training a neural network model with training samples according to an embodiment;
FIG. 6 is a schematic structural diagram of an apparatus for detecting the reading order of a document according to an embodiment; and
FIG. 7 is a schematic structural diagram of an apparatus for detecting the reading order of a document according to another embodiment.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present application and are not intended to limit it.
FIG. 1 is a schematic diagram of the application environment of the solution of the present application in one embodiment. The application environment for implementing the method for detecting the reading order of a document according to the embodiments of the present application is a smart terminal provided with an OCR system. The smart terminal further includes at least a processor, a display module, a power interface, and a memory connected through a system bus, and the memory includes a non-volatile storage medium and an internal memory. The smart terminal recognizes and displays the text information contained in a document picture through the OCR system. The display module can display the text information recognized by the OCR system. The power interface is used to connect to an external power source, which supplies power to the battery of the smart terminal through the power interface. The non-volatile storage medium stores at least an operating system, the OCR system, a database, and computer-readable instructions which, when executed, cause the processor to perform a method for detecting the reading order of a document. The smart terminal may be a mobile phone, a tablet computer, or the like, or another device having the above structure. A person skilled in the art will understand that the structure shown in FIG. 1 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
With reference to FIG. 1 and the above description of the application environment, embodiments of the method for detecting the reading order of a document are described below.
FIG. 2 is a schematic flowchart of a method for detecting the reading order of a document according to an embodiment. As shown in FIG. 2, the method in this embodiment includes the following steps:
S110: Identify the text blocks contained in a document picture, and construct a block set.
In this embodiment, the document picture may first be binarized to obtain a binarized document picture, in which the value of every pixel is 0 or 1. Scale analysis and layout analysis are then performed on the binarized document picture to obtain all the text blocks contained in the document. Scale analysis means finding the scale of each character in the binarized document picture; the scale is measured in pixels, and its value is the square root of the area of the rectangular region occupied by the character. Layout analysis is an OCR algorithm that divides the content of a document picture into multiple non-overlapping regions according to information such as paragraphs and page breaks. In this way all the text blocks contained in the document can be obtained, for example as shown in FIG. 3 or FIG. 5.
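As a rough illustration of the two quantities defined above, the following Python sketch binarizes a small grayscale grid and computes a character scale as the square root of the bounding-rectangle area. The function names, the fixed threshold of 128, and the convention that dark pixels map to 1 are assumptions made for this example; the source does not specify how binarization is performed.

```python
import math


def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 (dark, assumed ink) or 0 (light).

    The fixed threshold is an assumption for illustration only.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def char_scale(width_px, height_px):
    """Scale of a character: square root of its bounding-rectangle area, in pixels."""
    return math.sqrt(width_px * height_px)


# Tiny hypothetical 2x3 grayscale picture.
gray = [[250, 30, 240],
        [20, 25, 245]]
binary = binarize(gray)

# A character occupying a 16 x 9 pixel rectangle has area 144.
scale = char_scale(16, 9)
```

A production system would more likely use an adaptive threshold, but the definition of the scale itself follows the text directly.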
In another embodiment, preprocessing the document picture further includes a step of correcting the document picture. That is, if the initial state of the document picture to be detected deviates from a preset standard state, the document picture is corrected so that it conforms to the standard state. For example, if the document picture is detected to be tilted or upside down in its initial state, the orientation of the document picture must first be corrected.
S120: Determine a starting text block from all the text blocks (that is, from the block set).
People usually start reading a document from one of its corners (for example, the upper left corner). Based on this, in one embodiment, a text block whose center-point coordinates lie at a corner of the document picture may be selected from the block set and determined as the starting text block. For example, the text block located at the left and topmost position of the document picture is determined as the starting text block, such as the text block $R_1$ shown in FIG. 3 or the text block $R_1$ shown in FIG. 5.
It will be understood that in other embodiments, depending on the document and the actual reading habit (for example, a document typeset from right to left), another text block may be determined as the starting text block.
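One simple way to approximate "the text block at the left and topmost position" is to pick the block whose center point is closest to the upper left corner of the picture. The sketch below does exactly that; it is a simplified stand-in chosen for illustration, not the selection procedure of the source, and the `Block` type, its fields, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block: a name plus an axis-aligned bounding box."""
    name: str
    x: float  # left edge, origin at the upper left corner of the picture
    y: float  # top edge, y grows downward
    w: float
    h: float

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def pick_start_block(blocks):
    """Return the block whose center is nearest the upper left corner (0, 0)."""
    return min(blocks, key=lambda b: b.center[0] ** 2 + b.center[1] ** 2)


blocks = [
    Block("R2", 0, 80, 200, 100),
    Block("R1", 0, 0, 200, 60),
    Block("R3", 220, 0, 200, 180),
]
start = pick_start_block(blocks)
```

For a right-to-left document, the same idea would measure the distance to the upper right corner instead.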
S130: Start routing from the starting text block. Perform a routing operation on the starting text block according to its feature information, to determine the first text block in the block set corresponding to the starting text block; perform a routing operation on the first text block according to its feature information, to determine the text block in the block set corresponding to the first text block; and so on until the execution order of the routing operations corresponding to every text block in the block set can be uniquely determined.
The feature information of a text block includes position information of the text block in the document picture and layout information of the text block.
Performing a routing operation on a text block in effect derives, from the feature information of that text block, feature prediction information for its next text block. In one embodiment, the routing operation on a text block includes: learning the feature information of the text block with a pre-trained machine learning model to obtain feature prediction information of the text block that corresponds to it; computing the correlation between the feature prediction information and the feature information of each text block in the block set on which no routing operation has yet been performed; and then determining the text block corresponding to the text block according to the computed correlations.
In this embodiment, step S130 is a process of automatically routing through the text blocks contained in the document, beginning at the starting text block; each routing operation only needs to determine the next text block corresponding to the current text block. Take the document picture shown in FIG. 3 as an example. The current text block is $R_1$, and this routing operation determines that the next text block of $R_1$ is $R_2$. $R_2$ is then taken as the current text block and routed again, giving $R_4$ as the next text block of $R_2$; and so on, until the routing operation on $R_6$ is completed and the next text block corresponding to $R_6$ is determined to be $R_7$. Although no routing operation has been performed on $R_7$ and $R_8$ at this point, the next text block of $R_6$ is already known to be $R_7$, so the execution order of the routing operations for $R_7$ and $R_8$ is already uniquely determined (first $R_7$, then $R_8$). This automatic routing is more robust to the size and style of the document picture. Moreover, because the routing is based on the correlation between the positions and layout information of text blocks, it copes better with picture noise and with the influence of the recognition environment on the detection result, which helps ensure the accuracy of the detection result.
In this embodiment, training the machine learning model in advance with suitable training samples allows the model to output fairly accurate predictions, from which the correct next text block can be determined based on correlation. The approach is therefore applicable to reading-order detection for documents of various mixed types. The machine learning model may be a neural network model or a probabilistic model that is not a neural network.
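The routing loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: feature vectors are plain tuples, correlation is measured with a dot product, and `toy_predict` is a hypothetical stand-in for the trained model. None of these names come from the source.

```python
def dot(u, v):
    """Dot product of two equal-length vectors, used here as the correlation."""
    return sum(a * b for a, b in zip(u, v))


def route_once(current, unvisited, features, predict):
    """One routing operation: predict the next block's features from the
    current block, then pick the unvisited block that correlates best."""
    predicted = predict(features[current])
    # sorted() only makes tie-breaking deterministic in this sketch.
    return max(sorted(unvisited), key=lambda name: dot(features[name], predicted))


def detect_order(start, features, predict):
    """Route from the start block until the order is uniquely determined."""
    order = [start]
    unvisited = set(features) - {start}
    # With one block left, its position follows by elimination,
    # so only n - 2 routing operations are needed for n blocks.
    while len(unvisited) > 1:
        nxt = route_once(order[-1], unvisited, features, predict)
        order.append(nxt)
        unvisited.remove(nxt)
    order.extend(unvisited)
    return order


# Hypothetical one-hot features; the toy predictor rotates the vector right.
features = {
    "R1": (1, 0, 0, 0),
    "R3": (0, 1, 0, 0),
    "R2": (0, 0, 1, 0),
    "R4": (0, 0, 0, 1),
}


def toy_predict(f):
    return f[-1:] + f[:-1]


order = detect_order("R1", features, toy_predict)
```

In this toy setup only two routing operations run for four blocks, mirroring the observation in the text that the last block needs no routing of its own.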
S140: Determine the execution order of the routing operations corresponding to the text blocks in the block set, and obtain the reading order of the text blocks in the document picture according to the execution order.
The automatic routing in step S130 yields, for each text block, its corresponding next text block. When the automatic routing ends, the reading order of all the text blocks can be obtained from all the text blocks and their corresponding next text blocks. For example, after the automatic routing ends, the reading order of the text blocks in the document picture shown in FIG. 3 is obtained as $R_1 \to R_2 \to R_4 \to R_5 \to R_3 \to R_6 \to R_7 \to R_8$.
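To make the assembly step concrete, the sketch below rebuilds the FIG. 3 reading order from a table of "next block" results. The function name and the dictionary representation are assumptions for illustration; the block names and the resulting order are taken from the example in the text.

```python
def reading_order(start, next_block, all_blocks):
    """Follow next-block links from the start, then append any block that
    was never routed (its position follows by elimination)."""
    order = [start]
    # The second condition guards against an accidental cycle in the links.
    while order[-1] in next_block and next_block[order[-1]] not in order:
        order.append(next_block[order[-1]])
    remaining = [b for b in all_blocks if b not in order]
    return order + remaining


all_blocks = ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8"]
# Results of the six routing operations described for FIG. 3.
next_block = {
    "R1": "R2",
    "R2": "R4",
    "R4": "R5",
    "R5": "R3",
    "R3": "R6",
    "R6": "R7",
}
order = reading_order("R1", next_block, all_blocks)
```

Note that `R8` never appears in the table: it is placed last because it is the only block left.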
The method for detecting the reading order of a document according to the above embodiment first identifies all the text blocks contained in a document picture, determines a starting text block from among them, and starts routing from the starting text block. At each step, the position information of the text block in the document picture and the layout information of the text block decide which text block area to move to next, until the reading order of all the text blocks is obtained. The method is therefore compatible with a variety of scenarios and more robust to the size, noise, and style of the document picture, so it can accurately identify the document reading order for all kinds of document pictures.
In one embodiment, the machine learning model contains multiple parameters, and the method for detecting the reading order of a document further includes a step of training the machine learning model, so that the Euclidean distance between the feature prediction information output by the trained model and the corresponding sample information satisfies a set condition. The Euclidean distance is the Euclidean metric, which expresses the spatial distance between two vectors of the same dimension.
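For reference, the Euclidean distance between two equal-length vectors can be computed as in the short sketch below. The function name and the sample vectors are illustrative only.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical prediction and sample vectors.
distance = euclidean_distance((0, 0, 0), (1, 2, 2))
```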
In one embodiment, the machine learning model may be trained as follows:
First, training samples are obtained. A sample is data that has already been labeled for the machine learning process, including input data and output data. In this embodiment, the training samples are a number of sample blocks that participate in training the machine learning model, and the reading order of these sample blocks is known.
Then, a corresponding sample library M = {G, S, T} is built from the training samples, where G denotes the set of sample blocks, S denotes the set of order states of the sample blocks across successive training rounds, and T denotes the sequence of state changes to be determined during training. If the total number of sample blocks in G is n, then
$S = \{s_i;\ i \in [1, n],\ s_i \in [0, n]\}$;
$T = \{\{R_1, S_1, S_2\}, \{R_2, S_2, S_3\}, \ldots, \{R_{n-2}, S_{n-2}, S_{n-1}\}\}$;
If $s_i = 0$, the reading order of sample block $R_i$ has not been determined (that is, the position at which its routing operation is performed is undetermined); if $s_i > 0$, the reading order of $R_i$ has been determined and equals the value of $s_i$, written $S(R_i) = s_i$. The items in each sequence of T denote, respectively, the sample block currently participating in training, the set of current order states of every sample block in G, and the set of next order states of every sample block in G that is to be predicted. Taking the sequence $\{R_2, S_2, S_3\}$ as an example: $R_2$ indicates that the sample block currently participating in training is $R_2$; $S_2$ denotes the order state of each sample block in G when $R_2$ participates in training; and $S_3$ denotes the next order state of each sample block in G that is to be predicted when $R_2$ participates in training. Because the last two remaining sample blocks can be determined directly by elimination, they need no training, so T only needs to contain n-2 sequences.
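The sketch below shows one way to build the order states and the state-change sequences from a known reading order. It rests on an interpretation that the text does not spell out: sample blocks are indexed in reading order, so that $R_k$ is the k-th block read and $S_k$ marks exactly the first k blocks as determined. The function names and the dictionary representation are likewise assumptions.

```python
def state_at(order, k):
    """S_k: reading position for the first k blocks of `order`, 0 for the rest."""
    return {name: (i + 1 if i < k else 0) for i, name in enumerate(order)}


def build_sequences(order):
    """T: one (R_k, S_k, S_{k+1}) triple for k = 1 .. n-2."""
    n = len(order)
    return [
        (order[k - 1], state_at(order, k), state_at(order, k + 1))
        for k in range(1, n - 1)
    ]


# Hypothetical sample blocks, listed in their known reading order.
order = ["R1", "R2", "R3", "R4"]
T = build_sequences(order)
```

With four sample blocks this yields two sequences, matching the n-2 count stated above, since the last two blocks are resolved by elimination.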
Then, based on the sample library M = {G, S, T}, the machine learning model is trained with each state-change sequence in T in turn; once all the state-change sequences in T have participated in training, the parameters of the machine learning model are saved.
In one embodiment, training the parameters of the machine learning model according to the k-th sequence $\{R_k, S_k, S_{k+1}\}$ in T may include the following steps 1 to 5:
Step 1: Input the feature information of the sample block $R_k$ into the machine learning model, and obtain the feature prediction information $O_k$ of the next text block of $R_k$ output by the model, $k \in [1, n-2]$;
Step 2: Obtain the sample blocks $R_i$ whose order state in $S_k$ is 0, giving the set $G^{*}$:
$G^{*} = \{R_i;\ S_k(R_i) = 0\},\ i \in [1, n]$;
The dimension of the set $G^{*}$ is $n-k$.
Step 3: Take the dot product of each item in $G^{*}$ with $O_k$, giving the set $V^{*} = \{v_i = R_i \cdot O_k\}$;
Step 4: Obtain the order state in $S_{k+1}$ of each sample block $R_i$ in $G^{*}$, giving the set $V^{\pi} = \{v'_i = S_{k+1}(R_i)\}$; the dimension of $V^{\pi}$ equals the dimension of $G^{*}$.
Step 5: normalize V* to obtain
Figure PCTCN2018075626-appb-000001
and normalize V^π to obtain the set V^ππ = {v″_i = v′_i / sum(V^π)}. From V** and V^ππ, construct the loss function loss for the training step in which sample block R_k participates, and update the parameters of the machine learning model with the BP algorithm based on this loss. The loss function is:
Figure PCTCN2018075626-appb-000002
In this embodiment, the loss function measures the error computed during machine learning. The error can be measured with various functions, and the function is generally convex. Here, the loss function for the step in which sample block R_k participates in training is built from the Euclidean distance between V** and V^ππ. The Euclidean distance, or Euclidean metric, is the spatial distance between two multi-dimensional vectors. The loss obtained in each learning pass is used by the BP algorithm to adjust the parameters of the machine learning model; as the loss converges, the output accuracy of the model improves accordingly. BP is the error back-propagation algorithm (Error Back Propagation), which is particularly suited to training multi-layer feedforward network models: the error is accumulated at the output layer and then propagated backward through each feedforward layer, so that the parameters of every layer can be adjusted.
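Steps 2 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact normalization of V* and the exact form of the loss are given in figures that are not reproduced in the text, so the sketch assumes that V* is normalized by its sum (mirroring the stated normalization of V^π) and that the loss is the plain Euclidean distance. The name `step_loss` is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equally long feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(values):
    """Divide each value by the sum of all values (assumes a non-zero sum)."""
    total = sum(values)
    return [v / total for v in values]


def step_loss(blocks, o_k, s_k, s_next):
    """Loss for one training sequence (R_k, S_k, S_{k+1}).

    blocks: list of 6-dimensional feature vectors, one per sample block
    o_k:    model output for the current block R_k
    s_k:    current state, s_next: state to be predicted
    """
    pending = [i for i, s in enumerate(s_k) if s == 0]   # G*
    v_star = [dot(blocks[i], o_k) for i in pending]      # V*
    v_pi = [s_next[i] for i in pending]                  # V^pi
    v_star_norm = normalize(v_star)   # V** (sum normalization is an assumption)
    v_pi_norm = normalize(v_pi)       # V^pipi, as defined in step 5
    # Euclidean distance between V** and V^pipi (assumed form of the loss).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_star_norm, v_pi_norm)))


# Toy feature vectors: only the x feature is non-zero (illustrative values).
toy_blocks = [[0.0, 0, 0, 0, 0, 0], [1.0, 0, 0, 0, 0, 0], [2.0, 0, 0, 0, 0, 0],
              [0.5, 0, 0, 0, 0, 0], [0.5, 0, 0, 0, 0, 0]]
toy_loss = step_loss(toy_blocks, [1.0, 0, 0, 0, 0, 0],
                     (1, 0, 0, 0, 0), (1, 0, 2, 0, 0))
```

In an actual training loop this value would be differentiated with respect to the model parameters by back-propagation; the sketch only shows how the target and the prediction are compared.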
In one embodiment, so that the feature information of each text block can be learned accurately, every recognized text block is marked with a text box, and its feature information is expressed as a feature vector:
R = {x, y, w, h, s, d};
R is the feature vector of a text block and contains six features: x and y are the coordinates of the center point of the text block; w is its width; h is its height; s is the mean scale of all connected regions in the text block; and d is the density of the text block. A connected region is a region of a binarized image formed by pixels that are connected to one another. Pixel connectivity can be defined with a 4-neighborhood or an 8-neighborhood. Under 8-neighborhood connectivity, for example, the pixel at (x, y) is connected to any of its 8 neighbors that has the same pixel value; recursively collecting all connected points yields a set of points that forms one connected region.
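The 8-neighborhood search described above can be sketched as follows. The sketch makes two assumptions that are not in the text: the binarized image is a list of rows containing 0 and 1, and only foreground pixels (value 1) are grouped. It also uses an explicit stack in place of recursion, which gives the same regions without hitting Python's recursion limit on large images.

```python
def connected_regions(image):
    """Group foreground pixels of a binarized image into 8-connected regions.

    image: list of rows, each a list of 0/1 values (assumed layout).
    Returns a list of regions, each a list of (x, y) pixel coordinates.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Start a new region and grow it through its 8 neighbors.
            seen[y0][x0] = True
            stack, region = [(x0, y0)], []
            while stack:
                x, y = stack.pop()
                region.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if dx == 0 and dy == 0:
                            continue
                        if not (0 <= nx < width and 0 <= ny < height):
                            continue
                        if image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            regions.append(region)
    return regions


sample = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
found = connected_regions(sample)
```

In the sample image the two pixels in the top-left corner touch only diagonally, so they form one region under 8-neighborhood connectivity but would be two separate regions under 4-neighborhood connectivity.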
where
Figure PCTCN2018075626-appb-000003
Figure PCTCN2018075626-appb-000004
W and H are the functions that return the length and the width, respectively; r_i is connected region i; K is the total number of connected regions contained in the text block; and p is the pixel value of a pixel.
In one embodiment, after the text blocks contained in the document picture have been identified, the method further includes obtaining the feature vector R = {x, y, w, h, s, d} of each text block. To make the machine learning model insensitive to scale, the corresponding feature information of the text blocks is further normalized, for example under the convention:
w = 1.0; h = 1.0; max(p) = 1.0.
In one embodiment, a starting text block may be determined from all the text blocks as follows:
An XOY coordinate system is established with the top-left vertex of the document picture as the origin (see FIG. 3 and FIG. 5); the positive x-axis points along the width of the document picture and the positive y-axis points along its length. First, the text block whose center point has the smallest x coordinate is taken from the block set as text block A. Next, the text blocks whose center-point y coordinate is smaller than that of text block A are collected into a set G′, and each text block B in G′ is compared with text block A in turn. If the projections of B and A onto the x-axis do not intersect, B is deleted from G′. If the projections do intersect, text block A is updated to B, and B is deleted from G′. After each comparison, the set G′ is checked: if it is empty, the current text block A is the starting text block; if it is not empty and A has just been updated, G′ is rebuilt and each text block in the updated G′ is compared with the current A in the same way. This continues until G′ is empty. This way of determining the starting text block applies to a wide range of complex documents and identifies the starting text block accurately.
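The selection procedure above can be sketched as follows. The `Block` class, the overlap test on the horizontal extent [x - w/2, x + w/2], and the coordinates in the usage example are assumptions made for illustration; the application does not give concrete coordinates for FIG. 5.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    x: float  # center x
    y: float  # center y
    w: float  # width
    h: float  # height


def x_projections_intersect(a, b):
    """True if the projections of a and b onto the x-axis overlap."""
    return (a.x - a.w / 2) < (b.x + b.w / 2) and (b.x - b.w / 2) < (a.x + a.w / 2)


def select_start(blocks):
    """Return the starting text block following the procedure described above."""
    current = min(blocks, key=lambda b: b.x)           # text block A
    candidates = [b for b in blocks if b.y < current.y]  # set G'
    while candidates:
        b = candidates.pop(0)                            # B is removed either way
        if x_projections_intersect(b, current):
            current = b                                  # A is updated to B
            # Rebuild G' relative to the new A; every remaining candidate
            # lies strictly above it, so the loop always terminates.
            candidates = [c for c in blocks if c.y < current.y]
    return current


# Hypothetical layout: a wide title R1 above two columns (R3 left, R2 right).
layout = [
    Block("R2", 0.75, 0.30, 0.4, 0.3),
    Block("R1", 0.50, 0.10, 1.0, 0.1),
    Block("R3", 0.25, 0.50, 0.4, 0.3),
    Block("R4", 0.50, 0.80, 1.0, 0.1),
    Block("R5", 0.50, 0.95, 1.0, 0.1),
]
start = select_start(layout)
```

With this layout the search begins at R3 (leftmost center), discards R2 because its horizontal extent does not overlap R3, and then moves up to R1, which becomes the starting block.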
In one embodiment, the feature vector of each text block is written R = {r_1, r_2, r_3, r_4, r_5, r_6} = {x, y, w, h, s, d}, abbreviated R = {r_j; j ∈ [0, 6)}, where r_j is feature j of the sample block. A neural network model is chosen as the machine learning model. Correspondingly, as shown in FIG. 4, the neural network model may include a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer. In a neural network model, the input layer receives the input and distributes it to the hidden layers (so called because the user does not see them); the hidden layers perform the required computation and pass their results to the output layer, where the user can see the final result.
Preferably, the first and second hidden layers have 12 and 20 dimensions, respectively. When R = {r_j; j ∈ [0, 6)} is input to the neural network model, the output of the first hidden layer is K_1:
Figure PCTCN2018075626-appb-000005
The output of the second hidden layer is K_2:
Figure PCTCN2018075626-appb-000006
The output of the 6-dimensional output layer is O:
O = {o_n = sigmoid(Σ a_{on} k_{2m} + b_{on}); n ∈ [0, 6), m ∈ [0, 20)};
Here a_{1i} and b_{1i} are the parameters of the first hidden layer and k_{1i} is its i-th output; a_{2m} and b_{2m} are the parameters of the second hidden layer and k_{2m} is its m-th output; a_{on} and b_{on} are the parameters of the 6-dimensional output layer and o_n is its n-th output; sigmoid denotes the S-shaped nonlinear function.
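A forward pass through a 6-12-20-6 network of this shape can be sketched as follows. The formulas for K_1 and K_2 are given in figures that are not reproduced in the text, so the sketch assumes that both hidden layers use the same sigmoid form as the output layer and that each layer holds one weight per connection. The random initialization and all function names are illustrative only.

```python
import math
import random


def sigmoid(z):
    """S-shaped nonlinear function."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(sum_m a[n][m] * input[m] + b[n])."""
    return [
        sigmoid(sum(a * v for a, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (initialization scheme is an assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_network(rng):
    """6-dimensional input, hidden layers of 12 and 20, 6-dimensional output."""
    return [init_layer(6, 12, rng), init_layer(12, 20, rng), init_layer(20, 6, rng)]


def forward(network, r):
    """Map a feature vector R = {x, y, w, h, s, d} to the prediction O."""
    activation = r
    for weights, biases in network:
        activation = dense(activation, weights, biases)
    return activation


network = init_network(random.Random(0))
prediction = forward(network, [0.2, 0.3, 0.4, 0.1, 0.05, 0.5])
```

Because the output layer applies a sigmoid, every component of the prediction lies strictly between 0 and 1.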
To illustrate training of the neural network model described above, the text blocks in FIG. 5 are used as sample blocks. The sample blocks are R_1, R_2, R_3, R_4, and R_5, written as:
R_1 = {x_1, y_1, w_1, h_1, s_1, d_1};
R_2 = {x_2, y_2, w_2, h_2, s_2, d_2};
R_3 = {x_3, y_3, w_3, h_3, s_3, d_3};
R_4 = {x_4, y_4, w_4, h_4, s_4, d_4};
R_5 = {x_5, y_5, w_5, h_5, s_5, d_5};
and the correct reading order of R_1, R_2, R_3, R_4, and R_5 is known to be R_1 → R_3 → R_2 → R_4 → R_5.
From the training samples, the set of current sequential states of the sample blocks is defined as S = {s_i; i ∈ [1, 5], s_i ∈ [0, 5]}. Here s_i = 0 means that the position of text block R_i in the sequence of routing operations has not yet been determined (its reading order is undetermined), while s_i > 0 means that its position has been determined (its reading order is fixed) and equals the value of s_i, written S(R_i) = s_i. The reading states that the training samples pass through during training are therefore:
S_0 = (0, 0, 0, 0, 0);
S_1 = (1, 0, 0, 0, 0);
S_2 = (1, 0, 2, 0, 0);
S_3 = (1, 3, 2, 0, 0);
S_4 = (1, 3, 2, 4, 0);
S_5 = (1, 3, 2, 4, 5);
Further, the training samples R_1, R_2, R_3, R_4, and R_5 can be described by the following state sequences:
{R_1, S_1, S_2}, {R_3, S_2, S_3}, {R_2, S_3, S_4}, {R_4, S_4, S_5};
Because the sequence {R_4, S_4, S_5} can be determined directly, it needs no training, so in the sample library T = {{R_1, S_1, S_2}, {R_3, S_2, S_3}, {R_2, S_3, S_4}}. Based on this sample library, the neural network model is first trained with the sequence {R_1, S_1, S_2}, as follows:
R_1 is input to the neural network model, and the prediction information O_1 for the next reading state output by the model is obtained. Selecting the sample blocks whose value in S_1 is 0 gives the set G* = {R_2, R_3, R_4, R_5}. Taking the dot product of each item in G* with O_1 gives V* = {v_2, v_3, v_4, v_5}, which after normalization becomes
Figure PCTCN2018075626-appb-000007
The state values in S_2 of the items in G* give the set V^π:
V^π = {v′_2, v′_3, v′_4, v′_5} = {0, 2, 0, 0};
Normalization gives V^ππ = {v″_2, v″_3, v″_4, v″_5} = {0, 1, 0, 0}.
From the sets V** and V^ππ, the loss function for the step in which sample block R_1 participates in training can be constructed:
Figure PCTCN2018075626-appb-000008
All parameters of the neural network model can then be updated with the BP algorithm.
Training continues in the same way with the sequences {R_3, S_2, S_3} and {R_2, S_3, S_4}, which completes the training of the neural network model. In this embodiment, choosing appropriate training samples yields a neural network model with stable performance; routing text blocks with the trained model gives the next text block after the current one accurately, which helps detect the reading order in document pictures of all types.
The method for detecting the reading order of a document described in the above embodiments can be applied in the automatic document analysis module of an OCR system. After identifying the text blocks contained in a document picture, the automatic document analysis module sorts them and outputs their reading order to the text recognition module. After text recognition, the results are assembled into a final readable document according to that reading order, which can then be analyzed and stored automatically. Specifically, the information processing involved when the automatic document analysis module sorts the text blocks includes the following:
A selection algorithm A = Α(R, S) is defined. From the current text block R and the current reading-order state S, it derives the next reading-order state S, which can be expressed as:
Figure PCTCN2018075626-appb-000009
where S_0 = {s_i = 0; i ∈ [1, n]}, S_n = {s_i = i; i ∈ [1, n]}, and n is the total number of text blocks contained in the document picture.
Further, algorithm A can be divided into three parts:
1) R_start selector Ψ_1
Ψ_1 selects the starting text block, which is denoted R_start. Among all text blocks R, the one whose center point lies furthest to the left in the document picture is selected and denoted R_l. The remaining blocks are then evaluated relative to R_l: the text blocks with y(R) < y(R_l) form the set G′. Preferably, the blocks in G′ are also sorted in descending order of y coordinate. Each R in G′ is then compared with R_l in order: if the projections of R and R_l onto the x-axis intersect, this R becomes the new R_l and is deleted from G′; otherwise R_l is left unchanged and R is simply deleted from G′. This is repeated until G′ is empty, at which point R_start = R_l.
In a preferred embodiment, each time a new R has been marked as R_l and deleted from G′, if G′ is found not to be empty, G′ is updated (that is, all text blocks whose center-point y coordinate is smaller than that of the updated R_l are collected into a new G′). Updating G′ in this way can further reduce the time needed to select the starting text block.
2) Feature generator Ψ_2
Ψ_2 derives the feature prediction information O_{i+1} of the next reading-order state from the current text block R_i, which can be described as:
Figure PCTCN2018075626-appb-000010
As described above, each text block can be written R = {x, y, w, h, s, d}. Ψ_2 can correspondingly be a fully connected neural network with a 6-dimensional input, a 6-dimensional output, and two hidden layers of 12 and 20 dimensions; its structure is shown in FIG. 4, where each circle represents one neuron. For each sample block, written R = {r_i; i ∈ [0, 6)}, the output K_1 of the first hidden layer is:
Figure PCTCN2018075626-appb-000011
The output of the second hidden layer is:
Figure PCTCN2018075626-appb-000012
The output of the 6-dimensional output layer is:
O = {o_i = sigmoid(Σ a_{oi} k_{2j} + b_{oi}); i ∈ [0, 6), j ∈ [0, 20)}
where a and b are the parameters to be trained. O is the output of Ψ_2.
3) Feature synthesizer Ψ_3
After Ψ_2 has produced the feature prediction information for the next reading-order state, the current reading-order state S is updated as follows to obtain the next reading-order state:
I) Collect the text blocks whose value in the current reading-order state S is 0 to build the set G*,
G* = {R_i; S_k(R_i) = 0}; i ∈ [1, n];
II) For each R_i ∈ G*, compute v_i = R_i · O to obtain the set V* = {v_i = R_i · O};
III) Find the maximum value in V* and the text block it corresponds to, denoted R*;
IV) Update the current reading-order state S by setting S(R*) = max(S) + 1. This gives the next reading-order state, and hence the next text block. Repeating the procedure yields the order of all text blocks.
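Steps I) to IV) of the feature synthesizer amount to one state update, sketched below. The function name `next_state`, the 0-based indexing, and the toy feature vectors are assumptions made for the example.

```python
def dot(a, b):
    """Dot product of two equally long feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def next_state(blocks, state, prediction):
    """Apply steps I) to IV) once.

    blocks:     list of 6-dimensional feature vectors (index i is R_{i+1})
    state:      current reading-order state S, one entry per block
    prediction: output O of the feature generator for the current block
    Returns the 0-based index of R* and the updated state.
    """
    pending = [i for i, s in enumerate(state) if s == 0]        # I)  G*
    scores = {i: dot(blocks[i], prediction) for i in pending}   # II) V*
    best = max(pending, key=scores.get)                         # III) R*
    updated = list(state)
    updated[best] = max(state) + 1                              # IV) S(R*) = max(S) + 1
    return best, tuple(updated)


# Toy feature vectors: only the x feature is non-zero (illustrative values).
toy_blocks = [[0.0, 0, 0, 0, 0, 0], [1.0, 0, 0, 0, 0, 0], [2.0, 0, 0, 0, 0, 0],
              [0.5, 0, 0, 0, 0, 0], [0.5, 0, 0, 0, 0, 0]]
chosen, new_state = next_state(toy_blocks, (1, 0, 0, 0, 0), [1.0, 0, 0, 0, 0, 0])
```

In the toy example the third block has the largest dot product with the prediction, so the state moves from (1, 0, 0, 0, 0) to (1, 0, 2, 0, 0).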
Building on the above embodiments, the method of the present application for detecting the reading order of a document is illustrated below using the document picture shown in FIG. 5. It comprises steps one to five, described as follows:
Step one: the original document picture is binarized and direction-corrected, and layout analysis is then performed on the binarized and corrected picture to obtain all the text blocks in the document. As shown in FIG. 5, the text blocks obtained are R_1, R_2, R_3, R_4, and R_5.
Step two: determine the starting text block.
Among R_1, R_2, R_3, R_4, and R_5, the center point of R_3 has the leftmost x coordinate, so R_start is initially set to R_3.
All text blocks whose center-point y coordinate is smaller than that of R_3 are collected and sorted in ascending order of y coordinate, giving the set G′ = (R_2, R_1).
R_start is then updated in a loop. The projections of R_2 and R_3 onto the x-axis do not intersect, so R_2 is deleted from G′. The projections of R_1 and R_3 onto the x-axis do intersect, so R_start is updated to R_1 and R_1 is deleted from G′. Because G′ is now empty, it does not need to be updated (there is no need to collect the text blocks whose center-point y coordinate is smaller than that of R_1), and the loop ends. The text block currently assigned to R_start is R_1, so the starting text block of the document shown in FIG. 5 is R_1.
Step three: automatic routing begins from the starting text block R_1.
The current text block is R_1 = {x_1, y_1, w_1, h_1, s_1, d_1} and the current state is S_1 = (1, 0, 0, 0, 0). R_1 is input to the trained neural network model, and the prediction information output by the model is O = {o_1, o_2, o_3, o_4, o_5, o_6};
From the current state S_1 = (1, 0, 0, 0, 0), the set G* = {R_2, R_3, R_4, R_5} is obtained;
It follows that:
V* = {R_2·O, R_3·O, R_4·O, R_5·O};
R_i·O = x_i×o_1 + y_i×o_2 + w_i×o_3 + h_i×o_4 + s_i×o_5 + d_i×o_6;
The text block corresponding to the maximum value in V* is selected. In this example R_3·O is the largest, so the value for text block R_3 in the current reading-order state S_1 = (1, 0, 0, 0, 0) is updated to s_3 = 1 + 1 = 2. The next state is therefore S_2 = (1, 0, 2, 0, 0), and the next text block is R_3.
R_3 then becomes the current text block. In the same way, the next state for R_3 is S_3 = (1, 3, 2, 0, 0), so the next text block after R_3 is R_2. With R_2 as the current text block, the next state is S_4 = (1, 3, 2, 4, 0), so the next text block after R_2 is R_4. With R_4 as the current text block, the corresponding set G* contains only one text block (R_5), which is taken directly as the next text block, giving the state S_5 = (1, 3, 2, 4, 5). Automatic routing is then complete.
Step four: from the result of automatic routing, the reading order of the document is R_1 → R_3 → R_2 → R_4 → R_5.
Step five: text recognition is performed on the text blocks in the order R_1 → R_3 → R_2 → R_4 → R_5, giving the readable text information of the document, which is then saved and output for display.
Text recognition of a text block includes steps such as line segmentation and line recognition; characters are recognized line by line, which gives the text information of the whole text block.
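Steps three and four can be sketched end to end as a routing loop followed by a conversion from the final state to a reading order. The trained model is replaced here by a toy stand-in, `toy_predict`, that simply encodes the known successor of each block; this stand-in, the one-hot feature vectors, and the function names are all assumptions made so that the example is self-contained.

```python
def dot(a, b):
    """Dot product of two equally long feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def route(blocks, start, predict):
    """Run automatic routing from the starting block and return the final state."""
    n = len(blocks)
    state = [0] * n
    state[start] = 1
    current = start
    for position in range(2, n + 1):
        pending = [i for i in range(n) if state[i] == 0]   # G*
        if len(pending) == 1:
            nxt = pending[0]            # last block follows by elimination
        else:
            o = predict(blocks[current])
            nxt = max(pending, key=lambda i: dot(blocks[i], o))
        state[nxt] = position
        current = nxt
    return tuple(state)


def reading_order(state):
    """Convert a final state such as (1, 3, 2, 4, 5) to 1-based block numbers."""
    return [i + 1 for i in sorted(range(len(state)), key=lambda i: state[i])]


def one_hot(index, size=6):
    return [1.0 if j == index else 0.0 for j in range(size)]


# Toy stand-ins (NOT real features or a trained model): block i is a one-hot
# vector, and the predictor returns the one-hot vector of the desired successor.
toy_blocks = [one_hot(i) for i in range(5)]
successor = {0: 2, 2: 1, 1: 3, 3: 4}


def toy_predict(features):
    return one_hot(successor[features.index(1.0)])


final_state = route(toy_blocks, 0, toy_predict)
order = reading_order(final_state)
```

With the toy predictor the loop reproduces the state sequence of the worked example, ending in (1, 3, 2, 4, 5), which corresponds to the order R_1, R_3, R_2, R_4, R_5.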
Because a neural network has a large number of parameters, the method of the above embodiments, built on a trained neural network model, can accommodate many scenarios and is more robust to the size, noise, and style of document pictures.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of combined actions. Those skilled in the art will understand, however, that the present application is not limited by the order of actions described, because some steps may be performed in a different order or simultaneously. In addition, the above embodiments may be combined in any way to obtain further embodiments.
Based on the same idea as the method for detecting the reading order of a document in the above embodiments, the present application further provides an apparatus for detecting the reading order of a document, which can be used to perform that method. For ease of description, the structural diagram of the apparatus embodiment shows only the parts relevant to the embodiments of the present application. Those skilled in the art will understand that the illustrated structure does not limit the apparatus, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is further provided, whose internal structure may be as shown in FIG. 2. The computer device includes an apparatus for detecting the reading order of a document; the apparatus comprises several modules, each of which may be implemented wholly or partly in software, hardware, or a combination of the two.
FIG. 6 is a schematic structural diagram of an apparatus for detecting the reading order of a document according to an embodiment of the present application. As shown in FIG. 6, the apparatus of this embodiment includes a block identification module 610, a start block selection module 620, an automatic routing module 630, and an order determination module 640, described in detail as follows:
The block identification module 610 is configured to identify the text blocks contained in a document picture and construct a block set;
In one embodiment, the block identification module 610 may specifically include a preprocessing submodule, configured to perform binarization and direction correction on the document picture, and a layout identification submodule, configured to perform layout analysis on the binarized and direction-corrected document picture to obtain the text blocks contained in the document. Layout analysis is the OCR algorithm that divides the content of a document picture into multiple non-overlapping regions according to information such as paragraphs and pages. This gives all the text blocks contained in the document, for example as shown in FIG. 3 or FIG. 5.
The start block selection module 620 is configured to determine a starting text block from the block set.
People usually start reading a document from one of its corners. Accordingly, in one embodiment, the start block selection module 620 may be configured to select from the block set the text block whose center point lies at a corner of the document picture and to determine that text block as the starting text block. For example, the start block selection module 620 may select, from all the text blocks, the text block whose center point lies furthest to the left and top of the document picture (the top-left text block) and determine it as the starting text block, such as text block R_1 in FIG. 3 or text block R_1 in FIG. 5.
It will be understood that, in other embodiments, the start block selection module 620 may determine a different text block as the starting text block to suit different documents and actual reading habits (for example, documents typeset from right to left).
所述自动寻径模块630,用于根据该起始文本块的特征信息对该起始文本块执行寻径操作,以确定出所述块集合中与该起始文本块对应的第一文本块;文本块的特征信息包括该文本块在文档图片中的位置信息以及该文本块的版面布局信息;根据所述第一文本块的特征信息对该第一文本块执行寻径操作,以确定出所述块集合中与该第一文本块对应的文本块;并依此类推直到所述块集合中每一个文本块对应的寻径操作的执行顺序能够唯一确定。The automatic path finding module 630 is configured to perform a routing operation on the starting text block according to the feature information of the starting text block to determine a first text block in the block set corresponding to the starting text block. The feature information of the text block includes location information of the text block in the document image and layout information of the text block; performing a routing operation on the first text block according to the feature information of the first text block to determine a text block corresponding to the first text block in the set of blocks; and so on until the execution order of the routing operation corresponding to each text block in the block set can be uniquely determined.
In this embodiment, the automatic path finding module 630 performs automatic routing over the text blocks contained in the document, beginning with the starting text block, and each routing operation only needs to determine the text block that follows the current text block. For example, for the document picture shown in FIG. 3, the current text block is R_1, and this routing operation determines that the next text block after R_1 is R_2; R_2 is then taken as the current text block and routed again, which gives R_4 as the next text block after R_2; and so on, until the next text block after R_6 is determined to be R_7, at which point the execution order of the routing operations corresponding to every text block can be uniquely determined.
The order determining module 640 is configured to determine the execution order of the routing operations corresponding to the text blocks in the block set, and to obtain the reading order of the text blocks in the document picture according to that execution order.
For example, the order determining module 640 may obtain the reading order of the text blocks in the document picture shown in FIG. 3 as R_1→R_2→R_4→R_5→R_3→R_6→R_7→R_8.
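The routing loop described above can be illustrated with a minimal Python sketch. The function name `reading_order` and the callback `pick_next` are hypothetical and do not appear in the application; `pick_next` stands in for a single routing operation, however it is implemented. The sketch assumes that once only one text block remains, its position follows by elimination, which matches the FIG. 3 example where routing stops after R_7 is found.

```python
from typing import Callable, List


def reading_order(n_blocks: int, start: int,
                  pick_next: Callable[[int, List[int]], int]) -> List[int]:
    """Chain routing operations from the starting block (0-based indices).

    pick_next(current, remaining) is a hypothetical stand-in for one routing
    operation: it returns the index of the block that follows `current`,
    chosen from the blocks whose order is still undetermined.
    """
    order = [start]
    remaining = [i for i in range(n_blocks) if i != start]
    # With one block left, its place in the order is fixed by elimination.
    while len(remaining) > 1:
        nxt = pick_next(order[-1], remaining)
        if nxt not in remaining:
            raise ValueError("routing returned a block that is already ordered")
        order.append(nxt)
        remaining.remove(nxt)
    order.extend(remaining)
    return order


# Fixed successor table reproducing the FIG. 3 example (R_1 is index 0).
FIG3_NEXT = {0: 1, 1: 3, 3: 4, 4: 2, 2: 5, 5: 6}
fig3_order = reading_order(8, 0, lambda cur, rem: FIG3_NEXT[cur])
```

For eight text blocks the loop performs six routing operations; the last block is placed without a further operation.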
In one embodiment, the start block selection module 620 may specifically be configured to: establish an XOY coordinate system with the upper-left vertex of the document picture as the origin, where the positive x-axis points along the width of the document picture and the positive y-axis points along its length; obtain, from the block set, the text block whose center point has the smallest x coordinate, as text block A;
obtain the text blocks whose center-point y coordinate is smaller than that of text block A to construct a text block set G′, and compare each text block B in the set G′ with text block A in turn;
if the projections of text block B and text block A in the x-axis direction do not intersect, delete text block B from the set G′; if the projections of text block B and text block A in the x-axis direction intersect, update text block A to be text block B and delete text block B from the set G′; after each comparison, check whether the set G′ is empty; if so, determine the current text block A as the starting text block; if not, update the set G′ whenever text block A has been updated, and compare each text block in the updated set G′ with the current text block A as described above; and so on, until the set G′ is empty.
In one embodiment, each time text block A has been updated with a new text block B and text block B has been deleted from G′, if the set G′ is found to be non-empty, the set G′ is updated (that is, all text blocks whose center-point y coordinate is smaller than that of the updated text block A are collected to form a new set G′). Updating the set G′ in this way can further reduce the time needed to select the starting text block.
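The selection procedure above can be sketched in Python as follows. The `Block` type and the helper names are hypothetical; the sketch assumes each text block is an axis-aligned rectangle given by its center point, width and height, and that two x-axis projections intersect when their open intervals overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    x: float  # center-point x coordinate (origin at the upper-left vertex)
    y: float  # center-point y coordinate (grows downward)
    w: float  # width
    h: float  # height


def x_projections_intersect(a: Block, b: Block) -> bool:
    """True when the x-axis projections of the two blocks overlap."""
    return (a.x - a.w / 2) < (b.x + b.w / 2) and (b.x - b.w / 2) < (a.x + a.w / 2)


def select_start_block(blocks: List[Block]) -> Block:
    # Text block A: the block whose center point has the smallest x coordinate.
    a = min(blocks, key=lambda blk: blk.x)
    # Set G': blocks whose center point lies above that of A.
    candidates = [blk for blk in blocks if blk.y < a.y]
    while candidates:
        b = candidates.pop(0)          # compare B with A, then drop B from G'
        if x_projections_intersect(a, b):
            a = b                      # B is above A in the same column
            if candidates:             # G' not empty: rebuild it for the new A
                candidates = [blk for blk in blocks if blk.y < a.y]
    return a


# Two-column page: the lower-left block has the smallest center x.
left_top = Block(x=100, y=50, w=180, h=40)
left_bottom = Block(x=95, y=150, w=170, h=100)
right = Block(x=300, y=100, w=180, h=150)
start = select_start_block([left_bottom, right, left_top])
```

Each update moves A strictly upward, so the loop terminates; in the example the right-hand column is discarded and the upper-left block is returned.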
In one embodiment, as shown in FIG. 7, the apparatus for detecting the reading order of a document further includes a training module 650, configured to train the machine learning model in advance so that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
In one embodiment, the training module 650 may include a sample library construction submodule and a training submodule. The sample library construction submodule is configured to obtain training samples and establish a sample library M = {G, S, T}, where G denotes the set of sample blocks, S denotes the set of order states of the sample blocks across the successive training steps, and T denotes the sequence of state changes to be determined during training. If the total number of sample blocks in G is n, then:
S = {s_i; i ∈ [1, n], s_i ∈ [0, n]};
T = {{R_1, S_1, S_2}, {R_2, S_2, S_3}, ..., {R_{n-2}, S_{n-2}, S_{n-1}}};
Here s_i = 0 indicates that the reading order of sample block R_i has not been determined (that is, the order in which its routing operation is performed has not been determined), and s_i > 0 indicates that the reading order of sample block R_i has been determined (that is, the order in which its routing operation is performed has been determined), the reading order being the value of s_i, written S(R_i) = s_i. The items of each sequence in T denote, respectively, the sample block currently participating in training, the set of current order states of all sample blocks, and the set of next order states of all sample blocks that is to be predicted.
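As an illustration of the structure of T, the following Python sketch derives the n-2 state change sequences from a known reading order. The function name `build_sequences` is hypothetical, and the sketch rests on an assumption not spelled out in the text: in S_1 only the first block of the reading order carries a non-zero state, and each subsequent state assigns the next order value to exactly one more block.

```python
from typing import List, Tuple

Sequence = Tuple[int, List[int], List[int]]  # (R_k index, S_k, S_{k+1})


def build_sequences(order: List[int]) -> List[Sequence]:
    """Build the state change sequences for a ground-truth reading order.

    `order` lists 0-based block indices in reading order; states are stored
    per block index, with 0 meaning "order not yet determined".
    """
    n = len(order)
    state = [0] * n
    state[order[0]] = 1                  # assumed S_1: only the start block is ordered
    sequences: List[Sequence] = []
    for k in range(1, n - 1):            # k = 1 .. n-2
        current = order[k - 1]           # sample block taking part in step k
        next_state = list(state)
        next_state[order[k]] = k + 1     # the block read next receives order k+1
        sequences.append((current, list(state), next_state))
        state = next_state
    return sequences


example_T = build_sequences([0, 2, 1, 3])
```

For four sample blocks read in the order 0, 2, 1, 3 this yields two sequences, consistent with the count n-2.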
The training submodule is configured to train the parameters of the machine learning model using each sequence in T in turn, and to save the parameters of the machine learning model after all sequences in T have participated in training.
In one embodiment, when training the parameters of the machine learning model according to the k-th sequence {R_k, S_k, S_{k+1}} in T, the training submodule implements the following process:
input the feature information of sample block R_k into the machine learning model, and obtain the feature prediction information O_k, output by the machine learning model, of the next text block after R_k, where k ∈ [1, n-2];
obtain the sample blocks R_i whose order state in S_k is 0, to obtain the set G^*,
G^* = {R_i; S_k(R_i) = 0}, i ∈ [1, n];
compute the dot product of each item in the set G^* with O_k, to obtain the set V^* = {v_i = R_i · O_k};
obtain the order state in S_{k+1} corresponding to each item in the set G^*, to obtain the set V^π = {v′_i = S_{k+1}(R_i)};
normalize the set V^* to obtain the set V^**, and normalize the set V^π to obtain the set V^ππ; construct, from the sets V^** and V^ππ, the loss function corresponding to sample block R_k participating in training, and update the parameters of the machine learning model through the backpropagation (BP) algorithm based on that loss function, where the loss function is:
loss = |V^** − V^ππ|.
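The loss for one training step can be sketched in Python as below. The text does not specify the normalization or the norm, so two assumptions are made and labeled in the code: each set is normalized by dividing by the sum of its elements, and |·| is taken as the Euclidean norm. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Assumed normalization: divide by the sum (all zeros stay zeros)."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [v / total for v in values]


def step_loss(features: Sequence[Sequence[float]],
              states_k: Sequence[int],
              states_k1: Sequence[int],
              o_k: Sequence[float]) -> float:
    """loss = |V** - V^ππ| for the k-th sequence {R_k, S_k, S_{k+1}}."""
    # G*: blocks whose order state in S_k is 0.
    undecided = [i for i, s in enumerate(states_k) if s == 0]
    # V*: dot product of each undecided block's features with O_k.
    v_star = [sum(f * o for f, o in zip(features[i], o_k)) for i in undecided]
    # V^π: order state in S_{k+1} of each undecided block.
    v_pi = [float(states_k1[i]) for i in undecided]
    v_star_n = normalize(v_star)
    v_pi_n = normalize(v_pi)
    # Assumed norm: Euclidean distance between the two normalized sets.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_star_n, v_pi_n)))


toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss_uniform = step_loss(toy_features, [1, 0, 0], [1, 2, 0], [0.0, 1.0])
loss_perfect = step_loss(toy_features, [1, 0, 0], [1, 2, 0], [-1.0, 1.0])
```

Under the sum normalization, V^ππ becomes a one-hot vector marking the true next block, so the loss is zero exactly when the normalized dot products put all their weight on that block.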
In one embodiment, the block identification module 610 is further configured to obtain a feature vector R = {x, y, w, h, s, d} for each text block, where x denotes the x coordinate of the center point of the text block, y denotes the y coordinate of the center point of the text block, w denotes the width of the text block, h denotes the height of the text block, s denotes the mean scale of all connected regions in the text block, and d denotes the density information of the text block.
Correspondingly, the machine learning model is a neural network model with a 6-dimensional input and a 6-dimensional output. For example, the neural network model includes a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer and a second hidden layer, the first and second hidden layers being 12-dimensional and 20-dimensional hidden layers, respectively.
If the feature information of each text block is expressed as R = {r_j; j ∈ [0, 6)}, where r_j denotes feature j of the sample block, then the output K_1 of the first hidden layer and the output K_2 of the second hidden layer are, respectively:
Figure PCTCN2018075626-appb-000013
Figure PCTCN2018075626-appb-000014
The output O of the 6-dimensional output layer is:
O = {o_n = sigmoid(Σ a_on k_2m + b_on); n ∈ [0, 6), m ∈ [0, 20)};
where a_1i and b_1i are the parameters of the first hidden layer and k_1i is the i-th dimension of its output; a_2m and b_2m are the parameters of the second hidden layer and k_2m is the m-th dimension of its output; a_on and b_on are the parameters of the 6-dimensional output layer and o_n is the n-th dimension of its output; and sigmoid denotes the S-shaped nonlinear function.
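A forward pass through a network of this shape (6 → 12 → 20 → 6) can be sketched in plain Python. The hidden-layer formulas are given only as figures, so the sketch assumes every layer is an affine map followed by a sigmoid, and it uses a full weight matrix per layer; both choices, the random initialization and all names below are assumptions for illustration rather than the parameterization of the application.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def sigmoid(z: float) -> float:
    """Numerically stable S-shaped function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs: Sequence[float], layer: Layer) -> List[float]:
    """One layer: sigmoid(weights · inputs + bias) for each output unit."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(r: Sequence[float], params: Sequence[Layer]) -> List[float]:
    k1 = dense(r, params[0])     # first hidden layer, 12 dimensions
    k2 = dense(k1, params[1])    # second hidden layer, 20 dimensions
    return dense(k2, params[2])  # output layer O, 6 dimensions


rng = random.Random(0)
params = [init_layer(6, 12, rng), init_layer(12, 20, rng), init_layer(20, 6, rng)]
# Feature vector R = {x, y, w, h, s, d}; the values here are made up.
prediction = forward([0.1, 0.2, 0.3, 0.1, 0.05, 0.5], params)
```

The output has the same six dimensions as a text block's feature vector, which is what allows it to be compared with the features of the not-yet-ordered blocks.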
In one embodiment, the apparatus for detecting the reading order of a document further includes a text recognition module 660, configured to perform text recognition on each text block and to obtain the text information of the document picture according to the determined reading order.
With the apparatus for detecting the reading order of a document provided by the above embodiments, all text blocks contained in a document picture can be identified and a starting text block determined from among them; routing then begins from the starting text block, with the pre-trained machine learning model deciding which text block region to move to next, until the reading order of all text blocks is obtained. Because routing is performed according to the position information of each text block in the document picture and the layout information of the text block, the approach is compatible with a variety of scenarios, is more robust to the size, noise and style of the document picture, and can accurately identify the reading order of various kinds of document pictures.
It should be noted that, in the above apparatus embodiments, the information exchange between modules, the execution processes and the like are based on the same concept as the foregoing method embodiments of the present application and bring the same technical effects; for details, refer to the description of the method embodiments of the present application, which is not repeated here.
In addition, in the above apparatus embodiments, the logical division into functional modules is only an example. In practice, the above functions may be assigned to different functional modules as needed, for example to meet the configuration requirements of the corresponding hardware or for convenience of software implementation; that is, the internal structure of the apparatus for detecting the reading order of a document may be divided into different functional modules to complete all or part of the functions described above. Each functional module may be implemented in hardware or as a software functional module.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and sold or used as an independent product. When executed, the program may perform all or part of the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It should be understood that the steps in the embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to a memory, storage, database or other medium used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application and should not be construed as limiting the patent scope of the present application. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (30)

  1. A method for detecting the reading order of a document, comprising:
    identifying, by a computer device, text blocks contained in a document picture, and constructing a block set;
    determining, by the computer device, a starting text block from the block set;
    performing, by the computer device, a routing operation on the starting text block according to feature information of the starting text block, to determine a first text block in the block set that corresponds to the starting text block, wherein the feature information of a text block comprises at least position information of the text block in the document picture and layout information of the text block;
    performing, by the computer device, a routing operation on the first text block according to feature information of the first text block, to determine a text block in the block set that corresponds to the first text block; and so on, until the execution order of the routing operations corresponding to every text block in the block set can be uniquely determined; and
    obtaining, by the computer device, the reading order of the text blocks in the document picture according to the execution order.
  2. The method for detecting the reading order of a document according to claim 1, wherein the determining, by the computer device, a starting text block from the block set comprises:
    selecting, by the computer device, from the block set a text block whose center-point coordinates are located at a vertex of the document picture, and determining the text block as the starting text block.
  3. The method for detecting the reading order of a document according to claim 1, wherein the determining, by the computer device, a starting text block from the block set comprises:
    establishing, by the computer device, an XOY coordinate system with a vertex of the document picture as the origin, the positive x-axis of the XOY coordinate system pointing along the width of the document picture and the positive y-axis pointing along the length of the document picture;
    obtaining, by the computer device, from the block set the text block whose center point has the smallest x coordinate, as text block A;
    obtaining, by the computer device, the text blocks whose center-point y coordinate is smaller than that of the text block A to construct a text block set G′, and comparing each text block B in the text block set G′ with the text block A in turn;
    if the projections of the text block B and the text block A in the x-axis direction do not intersect, deleting, by the computer device, the text block B from the text block set G′; if the projections of the text block B and the text block A in the x-axis direction intersect, updating the text block A to be the text block B and deleting the text block B from the text block set G′; and
    checking, by the computer device, after each text block comparison whether the text block set G′ is empty; if so, determining the current text block A as the starting text block; if not, updating the text block set G′ when the text block A has been updated, and comparing each text block in the updated text block set G′ with the current text block A as described above; and so on, until the text block set G′ is empty.
  4. The method for detecting the reading order of a document according to claim 1, wherein the routing operation comprises:
    learning, by the computer device, the feature information of the text block through a pre-trained machine learning model, to obtain feature prediction information of the text block corresponding to the text block;
    calculating, by the computer device, the correlation between the feature prediction information and the feature information of each text block in the block set on which the routing operation has not been performed; and
    determining, by the computer device, the text block corresponding to the text block according to the calculated correlation.
  5. The method for detecting the reading order of a document according to claim 1, further comprising:
    training, by the computer device, a machine learning model in advance, so that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
  6. The method for detecting the reading order of a document according to claim 5, wherein the training, by the computer device, a machine learning model in advance comprises:
    establishing, by the computer device, a sample library, the information in the sample library comprising: a set of sample blocks, the order state of each sample block in the set of sample blocks in each successive training step, and the state change sequences to be determined by training; wherein, if the total number of sample blocks in the set of sample blocks is n, there are n-2 state change sequences to be determined by training, and the information in each state change sequence comprises: the sample block currently participating in training, the current order state of each sample block in the set of sample blocks, and the next order state of each sample block in the set of sample blocks; and
    training, by the computer device, the machine learning model using each state change sequence in turn, and saving the parameters of the machine learning model after all n-2 state change sequences have participated in training.
  7. The method for detecting the reading order of a document according to claim 6, wherein the training, by the computer device, the machine learning model using the k-th state change sequence comprises:
    inputting, by the computer device, the feature information of the k-th sample block R_k in the set of sample blocks into the machine learning model, and obtaining the feature prediction information O_k, output by the machine learning model, of the text block corresponding to the sample block R_k, where k ∈ [1, n-2];
    obtaining, by the computer device, according to the order state of each sample block in the set of sample blocks when the sample block R_k participates in training, the sample blocks whose reading order has not been determined, to obtain a set G^*;
    computing, by the computer device, the dot product of the feature information of each sample block in the set G^* with O_k, to obtain a set V^*;
    obtaining, by the computer device, the order state of each sample block in the set G^* when the (k+1)-th sample block participates in training, to obtain a set V^π; and
    normalizing, by the computer device, the set V^* to obtain a set V^**, and normalizing the set V^π to obtain a set V^ππ; constructing, from the set V^** and the set V^ππ, the loss function corresponding to the sample block R_k participating in training, and updating the parameters of the machine learning model through the error backpropagation (BP) algorithm based on the loss function.
  8. The method for detecting the reading order of a document according to claim 1, wherein:
    the position information of a text block in the document picture comprises: the x coordinate of the center point of the text block in the document picture, and the y coordinate of the center point of the text block in the document picture;
    the layout information of a text block comprises: the width of the text block, the height of the text block, the mean scale of all connected regions in the text block, and the density information of the text block; and
    the machine learning model is a neural network model with a 6-dimensional input and a 6-dimensional output.
  9. The method for detecting the reading order of a document according to claim 8, wherein the neural network model comprises a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer and a second hidden layer, the first hidden layer and the second hidden layer being 12-dimensional and 20-dimensional hidden layers, respectively.
  10. The method for detecting the reading order of a document according to claim 1, wherein the identifying, by the computer device, text blocks contained in a document picture comprises:
    performing, by the computer device, binarization processing and orientation correction processing on the document picture; and
    performing, by the computer device, layout analysis on the document picture that has undergone the binarization processing and the orientation correction processing, to obtain the text blocks contained in the document picture.
  11. The method for detecting the reading order of a document according to claim 1, further comprising:
    performing, by the computer device, text recognition on each text block, and obtaining the text information of the document picture according to the determined reading order.
  12. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    identifying text blocks contained in a document picture, and constructing a block set;
    determining a starting text block from the block set;
    performing a routing operation on the starting text block according to feature information of the starting text block, to determine a first text block in the block set that corresponds to the starting text block, wherein the feature information of a text block comprises position information of the text block in the document picture and layout information of the text block;
    performing a routing operation on the first text block according to feature information of the first text block, to determine a text block in the block set that corresponds to the first text block; and so on, until the execution order of the routing operations corresponding to every text block in the block set can be uniquely determined; and
    determining the execution order of the routing operations corresponding to the text blocks in the block set, and obtaining the reading order of the text blocks in the document picture according to the execution order.
  13. The computer device according to claim 12, wherein the determining a starting text block from the block set comprises:
    selecting from the block set a text block whose center-point coordinates are located at a vertex of the document picture, and determining the text block as the starting text block.
  14. The computer device according to claim 12, wherein determining a starting text block from the block set comprises:
    establishing an XOY coordinate system with one vertex of the document picture as the origin, the positive x-axis of the XOY coordinate system pointing along the width of the document picture and the positive y-axis pointing along the length of the document picture;
    obtaining, from the block set, the text block whose center point has the smallest x coordinate, as text block A;
    obtaining the text blocks whose center-point y coordinate is smaller than that of text block A to construct a text block set G′, and comparing each text block B in the text block set G′ with text block A in turn;
    if the projections of text block B and text block A in the x-axis direction do not intersect, deleting text block B from the text block set G′; if the projections of text block B and text block A in the x-axis direction intersect, updating text block A to be text block B and deleting text block B from the text block set G′;
    after each comparison, detecting whether the text block set G′ is empty; if so, determining the current text block A as the starting text block; if not, updating the text block set G′ whenever text block A has been updated, and performing the above comparison between each text block in the updated text block set G′ and the current text block A; and so on until the text block set G′ is empty.
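The start-block search of claim 14 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Block` class, the helper names, the list-order traversal of G′, and the reading of "updating G′" as rebuilding it from all blocks above the new A are assumptions made for the example. The origin is assumed to be the top-left vertex, so a smaller y means higher on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block: center point plus width and height."""
    cx: float
    cy: float
    w: float
    h: float


def x_projections_intersect(a: Block, b: Block) -> bool:
    """True if the x-axis projections of the two blocks overlap."""
    a_left, a_right = a.cx - a.w / 2, a.cx + a.w / 2
    b_left, b_right = b.cx - b.w / 2, b.cx + b.w / 2
    return max(a_left, b_left) < min(a_right, b_right)


def find_start_block(blocks: list) -> Block:
    # Text block A: the block whose center has the smallest x coordinate.
    a = min(blocks, key=lambda blk: blk.cx)
    # G': blocks whose center y coordinate is smaller than that of A.
    g = [blk for blk in blocks if blk.cy < a.cy]
    while g:
        b = g.pop(0)  # compare B with A, then delete B from G'
        if x_projections_intersect(a, b):
            a = b  # B becomes the new A
            if g:
                # Assumed meaning of "update G'": rebuild it against the new A.
                g = [blk for blk in blocks if blk.cy < a.cy]
    return a


title = Block(cx=20, cy=10, w=30, h=10)
body = Block(cx=18, cy=50, w=30, h=40)
right_column = Block(cx=70, cy=30, w=30, h=40)
start = find_start_block([body, right_column, title])
```

In this toy layout the search begins at `body` (smallest center x), discards `right_column` because its x projection does not overlap, and then moves up to `title`. The loop terminates because every update strictly decreases the y coordinate of A.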
  15. The computer device according to claim 12, wherein the path-finding operation comprises:
    learning the feature information of the text block through a pre-trained machine learning model, to obtain feature prediction information of the text block corresponding to said text block;
    calculating the correlation between the feature prediction information and the feature information of each text block in the block set on which no path-finding operation has been performed; and
    determining the text block corresponding to said text block according to the calculated correlations.
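A single path-finding operation as described in claim 15 might look like the sketch below. The claim does not fix the correlation measure; the dot product is used here only because claim 18 uses a dot product during training, and choosing the highest-scoring candidate is likewise an assumption. `model`, `unvisited`, and the function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def path_finding_step(current_features, unvisited, model):
    """Return the id of the block predicted to follow the current one.

    unvisited: dict mapping block id -> feature vector, restricted to
               blocks on which no path-finding operation has been performed.
    model:     callable mapping a feature vector to predicted features.
    """
    predicted = model(current_features)
    # Assumed selection rule: the candidate with the highest correlation wins.
    return max(unvisited, key=lambda block_id: dot(unvisited[block_id], predicted))


# Toy stand-in for the trained model: it always predicts the same vector.
toy_model = lambda features: [0.0, 1.0]
chosen = path_finding_step([1.0, 1.0], {"a": [1.0, 0.0], "b": [0.0, 2.0]}, toy_model)
```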
  16. The computer device according to claim 12, wherein the computer-readable instructions further cause the processor to perform the following step:
    pre-training the machine learning model, such that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
  17. The computer device according to claim 16, wherein pre-training the machine learning model comprises:
    establishing a sample library, the information in the sample library comprising: a set of sample blocks, the order state of each sample block in the set in each successive training round, and the state change sequences to be determined by training; wherein, if the total number of sample blocks in the set is n, there are n-2 state change sequences to be determined by training, and the information in each state change sequence comprises: the sample block currently participating in training, the current order state of each sample block in the set, and the next order state of each sample block in the set;
    training the machine learning model with each state change sequence in turn; and, after all n-2 state change sequences have participated in training, saving the parameters of the machine learning model.
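One way to picture the sample library of claim 17 is sketched below. The encoding is an assumption: each order state is a 0/1 list in which 1 means the block's reading order has already been determined, and the sample blocks are indexed in their ground-truth reading order. The claim does not say why there are n-2 sequences; a plausible reading is that the final transition has only one remaining candidate and so carries nothing to learn.

```python
def build_state_change_sequences(n):
    """Build the n-2 state change records for n sample blocks.

    Blocks are indexed 0..n-1 in ground-truth reading order (assumed).
    """
    sequences = []
    for k in range(1, n - 1):  # k = 1 .. n-2, as in claim 18
        current_state = [1 if i < k else 0 for i in range(n)]
        next_state = [1 if i < k + 1 else 0 for i in range(n)]
        sequences.append({
            "block": k - 1,              # index of sample block R_k
            "current_state": current_state,
            "next_state": next_state,
        })
    return sequences


library = build_state_change_sequences(4)
```

With n = 4 this yields two records: the first moves from `[1, 0, 0, 0]` to `[1, 1, 0, 0]`, the second from `[1, 1, 0, 0]` to `[1, 1, 1, 0]`.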
  18. The computer device according to claim 17, wherein training the machine learning model with the k-th state change sequence comprises:
    inputting the feature information of the k-th sample block R_k in the set of sample blocks into the machine learning model, and obtaining the feature prediction information O_k, output by the machine learning model, of the text block corresponding to the sample block R_k, where k ∈ [1, n-2];
    obtaining, according to the order state of each sample block in the set at the time the sample block R_k participates in training, the sample blocks whose reading order has not been determined, to obtain a set G*;
    performing a dot-product operation between the feature information of each sample block in the set G* and O_k, to obtain a set V*;
    obtaining the order state of each sample block in the set G* at the time the (k+1)-th sample block participates in training, to obtain a set Vπ;
    normalizing the set V* to obtain a set V**, and normalizing the set Vπ to obtain a set Vππ; constructing, from the set V** and the set Vππ, the loss function corresponding to the participation of the sample block R_k in training; and updating the parameters of the machine learning model through the error back-propagation (BP) algorithm based on the loss function.
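The loss construction in claim 18 can be illustrated with the sketch below. The claim names neither the normalization nor the loss. Softmax for V*, sum-normalization for Vπ, and cross-entropy between the two are assumptions chosen for the example, and the back-propagation update itself is omitted. All function names are hypothetical.

```python
import math


def softmax(values):
    """Assumed normalization for V*: a numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def step_loss(o_k, g_star_features, next_state):
    """Loss for one state change sequence.

    o_k:             feature prediction O_k output by the model for R_k.
    g_star_features: feature vectors of the blocks in G* (order undetermined).
    next_state:      order state of the G* blocks at step k+1 (the set Vπ),
                     assumed to be 1 for the block that comes next, else 0.
    """
    v_star = [sum(f * o for f, o in zip(feat, o_k)) for feat in g_star_features]
    v_norm = softmax(v_star)                        # V**
    total = sum(next_state)
    target = [s / total for s in next_state]        # Vππ
    # Assumed loss: cross-entropy between the target and V**.
    return -sum(t * math.log(p + 1e-12) for t, p in zip(target, v_norm))


features = [[5.0, 0.0], [0.0, 5.0]]
loss_when_right = step_loss([1.0, 0.0], features, [1, 0])
loss_when_wrong = step_loss([1.0, 0.0], features, [0, 1])
```

The prediction `[1.0, 0.0]` correlates with the first candidate, so the loss is small when the first candidate really is the next block and large otherwise.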
  19. The computer device according to claim 12, wherein the position information of a text block in the document picture comprises: the x coordinate of the center point of the text block in the document picture and the y coordinate of the center point of the text block in the document picture; the layout information of a text block comprises: the width of the text block, the height of the text block, the mean scale of all connected regions in the text block, and the density information of the text block; and the machine learning model is a neural network model with a 6-dimensional input and a 6-dimensional output.
  20. The computer device according to claim 19, wherein the neural network model comprises a 6-dimensional input layer, a 6-dimensional output layer, a first hidden layer, and a second hidden layer, the first hidden layer and the second hidden layer being 12-dimensional and 20-dimensional hidden layers, respectively.
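The network shape in claims 19 and 20 (6 inputs, hidden layers of 12 and 20 units, 6 outputs) can be sketched as a plain forward pass. The claims give only the layer sizes: the tanh activation, the linear output layer, the Gaussian initialization, and the order of the six features in the input vector are assumptions for illustration.

```python
import math
import random

LAYER_SIZES = [6, 12, 20, 6]  # input, first hidden, second hidden, output


def init_params(seed=0):
    """Randomly initialize one (weights, biases) pair per layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, features):
    """Map a 6-dimensional feature vector to 6-dimensional predicted features."""
    h = list(features)
    for layer, (weights, biases) in enumerate(params):
        h = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, biases)]
        if layer < len(params) - 1:
            h = [math.tanh(v) for v in h]  # assumed hidden activation
    return h


# Assumed feature order: center x, center y, width, height,
# mean connected-region scale, density.
params = init_params()
prediction = forward(params, [0.10, 0.20, 0.30, 0.05, 0.01, 0.50])
```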
  21. The computer device according to claim 12, wherein identifying the text blocks contained in the document picture comprises:
    performing binarization processing and direction correction processing on the document picture; and
    performing layout analysis on the binarized and direction-corrected document picture, to obtain the text blocks contained in the document picture.
  22. The computer device according to claim 12, wherein the computer-readable instructions further cause the processor to perform the following step:
    performing text recognition on each text block, and obtaining the text information of the document picture according to the determined reading order.
  23. One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    identifying the text blocks contained in a document picture and constructing a block set;
    determining a starting text block from the block set;
    performing a path-finding operation on the starting text block according to the feature information of the starting text block, to determine a first text block in the block set that corresponds to the starting text block, wherein the feature information of a text block comprises the position information of the text block in the document picture and the layout information of the text block;
    performing a path-finding operation on the first text block according to the feature information of the first text block, to determine the text block in the block set that corresponds to the first text block; and so on, until the execution order of the path-finding operations corresponding to every text block in the block set can be uniquely determined; and
    determining the execution order of the path-finding operations corresponding to the text blocks in the block set, and obtaining the reading order of the text blocks in the document picture according to the execution order.
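The overall loop of claim 23 can be sketched as repeated path-finding from the starting block. The function and parameter names are hypothetical, and `next_block` stands in for whatever path-finding operation is used. Stopping once a single block remains is one reading of "until the execution order can be uniquely determined".

```python
def reading_order(blocks, start_id, next_block):
    """Chain path-finding operations into a reading order.

    blocks:     dict mapping block id -> feature vector.
    start_id:   id of the starting text block.
    next_block: callable (current_features, unvisited) -> id of the next block,
                where unvisited is a dict of the blocks not yet ordered.
    """
    order = [start_id]
    unvisited = {i: f for i, f in blocks.items() if i != start_id}
    current = start_id
    # With one block left, the remaining order is already unique.
    while len(unvisited) > 1:
        current = next_block(blocks[current], unvisited)
        order.append(current)
        del unvisited[current]
    order.extend(unvisited)  # append the last remaining block, if any
    return order


# Toy path-finder for illustration only: pick the unvisited block with the
# smallest y coordinate (index 1 of the feature vector).
def topmost(current_features, unvisited):
    return min(unvisited, key=lambda block_id: unvisited[block_id][1])


toy_blocks = {"title": (0.0, 0.0), "para1": (0.0, 1.0), "para2": (0.0, 2.0)}
order = reading_order(toy_blocks, "title", topmost)
```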
  24. The storage medium according to claim 23, wherein determining a starting text block from the block set comprises:
    selecting, from the block set, a text block whose center-point coordinates are located at a vertex of the document picture, and determining that text block as the starting text block.
  25. The storage medium according to claim 23, wherein determining a starting text block from the block set comprises:
    establishing an XOY coordinate system with one vertex of the document picture as the origin, the positive x-axis of the XOY coordinate system pointing along the width of the document picture and the positive y-axis pointing along the length of the document picture;
    obtaining, from the block set, the text block whose center point has the smallest x coordinate, as text block A;
    obtaining the text blocks whose center-point y coordinate is smaller than that of text block A to construct a text block set G′, and comparing each text block B in the text block set G′ with text block A in turn;
    if the projections of text block B and text block A in the x-axis direction do not intersect, deleting text block B from the text block set G′; if the projections of text block B and text block A in the x-axis direction intersect, updating text block A to be text block B and deleting text block B from the text block set G′;
    after each comparison, detecting whether the text block set G′ is empty; if so, determining the current text block A as the starting text block; if not, updating the text block set G′ whenever text block A has been updated, and performing the above comparison between each text block in the updated text block set G′ and the current text block A; and so on until the text block set G′ is empty.
  26. The storage medium according to claim 23, wherein the path-finding operation comprises:
    learning the feature information of the text block through a pre-trained machine learning model, to obtain feature prediction information of the text block corresponding to said text block;
    calculating the correlation between the feature prediction information and the feature information of each text block in the block set on which no path-finding operation has been performed; and
    determining the text block corresponding to said text block according to the calculated correlations.
  27. The storage medium according to claim 23, wherein the computer-readable instructions further cause the processor to perform the following step:
    pre-training the machine learning model, such that the Euclidean distance between the feature prediction information output by the trained machine learning model and the corresponding sample information satisfies a set condition.
  28. The storage medium according to claim 27, wherein pre-training the machine learning model comprises:
    establishing a sample library, the information in the sample library comprising: a set of sample blocks, the order state of each sample block in the set in each successive training round, and the state change sequences to be determined by training; wherein, if the total number of sample blocks in the set is n, there are n-2 state change sequences to be determined by training, and the information in each state change sequence comprises: the sample block currently participating in training, the current order state of each sample block in the set, and the next order state of each sample block in the set;
    training the machine learning model with each state change sequence in turn; and, after all n-2 state change sequences have participated in training, saving the parameters of the machine learning model.
  29. The storage medium according to claim 28, wherein training the machine learning model with the k-th state change sequence comprises:
    inputting the feature information of the k-th sample block R_k in the set of sample blocks into the machine learning model, and obtaining the feature prediction information O_k, output by the machine learning model, of the text block corresponding to the sample block R_k, where k ∈ [1, n-2];
    obtaining, according to the order state of each sample block in the set at the time the sample block R_k participates in training, the sample blocks whose reading order has not been determined, to obtain a set G*;
    performing a dot-product operation between the feature information of each sample block in the set G* and O_k, to obtain a set V*;
    obtaining the order state of each sample block in the set G* at the time the (k+1)-th sample block participates in training, to obtain a set Vπ;
    normalizing the set V* to obtain a set V**, and normalizing the set Vπ to obtain a set Vππ; constructing, from the set V** and the set Vππ, the loss function corresponding to the participation of the sample block R_k in training; and updating the parameters of the machine learning model through the error back-propagation (BP) algorithm based on the loss function.
  30. The storage medium according to claim 23, wherein the position information of a text block in the document picture comprises: the x coordinate of the center point of the text block in the document picture and the y coordinate of the center point of the text block in the document picture; the layout information of a text block comprises: the width of the text block, the height of the text block, the mean scale of all connected regions in the text block, and the density information of the text block; and the machine learning model is a neural network model with a 6-dimensional input and a 6-dimensional output.
PCT/CN2018/075626 2017-03-08 2018-02-07 Document reading-order detection method, computer device, and storage medium WO2018161764A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710134711.1A CN108334805B (en) 2017-03-08 2017-03-08 Method and device for detecting document reading sequence
CN201710134711.1 2017-03-08

Publications (1)

Publication Number Publication Date
WO2018161764A1 (en) 2018-09-13

Family

ID=62923005

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075626 WO2018161764A1 (en) 2017-03-08 2018-02-07 Document reading-order detection method, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN108334805B (en)
WO (1) WO2018161764A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2569418A (en) * 2017-12-15 2019-06-19 Adobe Inc Using deep learning techniques to determine the contextual reading order in a document
CN111507267A (en) * 2020-04-17 2020-08-07 北京百度网讯科技有限公司 Document orientation detection method, device, equipment and storage medium
CN112966676A (en) * 2021-02-04 2021-06-15 北京易道博识科技有限公司 Document key information extraction method based on zero sample learning

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934229B (en) * 2019-03-28 2021-08-03 网易有道信息技术(北京)有限公司 Image processing method, device, medium and computing equipment
CN110059146B (en) * 2019-04-16 2021-04-02 珠海金山网络游戏科技有限公司 Data acquisition method, server, computing equipment and storage medium
CN111079641B (en) * 2019-12-13 2024-04-16 科大讯飞股份有限公司 Answer content identification method, related device and readable storage medium
CN114495147B (en) * 2022-01-25 2023-05-05 北京百度网讯科技有限公司 Identification method, device, equipment and storage medium
CN115641573B (en) * 2022-12-22 2023-07-14 苏州浪潮智能科技有限公司 Text ordering method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007027817A (en) * 2005-07-12 2007-02-01 Oki Data Corp Image reader
CN101866418A (en) * 2009-04-17 2010-10-20 株式会社理光 Method and equipment for determining file reading sequences
CN104268127A (en) * 2014-09-22 2015-01-07 同方知网(北京)技术有限公司 Method for analyzing reading order of electronic layout file

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1152973A (en) * 1997-08-07 1999-02-26 Ricoh Co Ltd Document reading system
US8325362B2 (en) * 2008-12-23 2012-12-04 Microsoft Corporation Choosing the next document
CN105512647A (en) * 2016-01-19 2016-04-20 同方知网(北京)技术有限公司 Method and device for intelligent layout division of scanned file on small-screen equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ISHITANI, YASUTO: "Document Transformation System from Papers to XML Data Based on Pivot XML Document Method", PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR '03), 31 December 2003 (2003-12-31), XP010656617 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2569418A (en) * 2017-12-15 2019-06-19 Adobe Inc Using deep learning techniques to determine the contextual reading order in a document
GB2569418B (en) * 2017-12-15 2020-10-07 Adobe Inc Using deep learning techniques to determine the contextual reading order in a document
CN111507267A (en) * 2020-04-17 2020-08-07 北京百度网讯科技有限公司 Document orientation detection method, device, equipment and storage medium
CN112966676A (en) * 2021-02-04 2021-06-15 北京易道博识科技有限公司 Document key information extraction method based on zero sample learning
CN112966676B (en) * 2021-02-04 2023-10-20 北京易道博识科技有限公司 Document key information extraction method based on zero sample learning

Also Published As

Publication number Publication date
CN108334805A (en) 2018-07-27
CN108334805B (en) 2020-04-03

Similar Documents

Publication Publication Date Title
WO2018161764A1 (en) Document reading-order detection method, computer device, and storage medium
US11514260B2 (en) Information recommendation method, computer device, and storage medium
WO2022213879A1 (en) Target object detection method and apparatus, and computer device and storage medium
US10846524B2 (en) Table layout determination using a machine learning system
CN108304835B (en) character detection method and device
WO2019238063A1 (en) Text detection and analysis method and apparatus, and device
WO2019100724A1 (en) Method and device for training multi-label classification model
WO2019232843A1 (en) Handwritten model training method and apparatus, handwritten image recognition method and apparatus, and device and medium
CN109993102B (en) Similar face retrieval method, device and storage medium
CN110276406B (en) Expression classification method, apparatus, computer device and storage medium
CN111950528B (en) Graph recognition model training method and device
CN110874618B (en) OCR template learning method and device based on small sample, electronic equipment and medium
CN110717366A (en) Text information identification method, device, equipment and storage medium
CN111401521B (en) Neural network model training method and device, and image recognition method and device
WO2023279847A1 (en) Cell position detection method and apparatus, and electronic device
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
US20220374473A1 (en) System for graph-based clustering of documents
US11288538B2 (en) Object functionality predication methods, computer device, and storage medium
CN111325237A (en) Image identification method based on attention interaction mechanism
KR102103511B1 (en) Code generating apparatus and method
CN116091836A (en) Multi-mode visual language understanding and positioning method, device, terminal and medium
CN114330514A (en) Data reconstruction method and system based on depth features and gradient information
CN113723367B (en) Answer determining method, question judging method and device and electronic equipment
CN112529025A (en) Data processing method and device
US11663761B2 (en) Hand-drawn diagram recognition using visual arrow-relation detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18763262

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18763262

Country of ref document: EP

Kind code of ref document: A1