CN111582153A - Method and device for determining document orientation

Info

Publication number: CN111582153A
Application number: CN202010377027.8A
Authority: CN
Inventors: 曲福; 庞敏辉; 韩光耀; 姜泽青
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-05-07
Filing date: 2020-05-07
Publication date: 2020-08-25
Anticipated expiration: 2040-05-07
Also published as: CN111582153B

Abstract

The application discloses a method and a device for determining document orientation, and relates to the technical field of computer vision. One embodiment of the method comprises: acquiring an image of a document to be detected that includes a table; inputting the image of the document to be detected into a trained key point detection model to obtain key point information of the table of the document to be detected, wherein the key point information represents the positions of at least two preset key points of the table that are associated with the document orientation; and determining the orientation of the document to be detected based on the key point information of the table. This implementation can detect the orientation of a document accurately and efficiently.

Description

Method and device for determining document orientation

Technical Field

The embodiments of the application relate to the technical field of computers, in particular to computer vision, and more particularly to a method and a device for determining document orientation.

Background

Image recognition technology distinguishes the objects in an observed image so that meaningful judgments can be made about them. In practice, it applies modern information processing techniques so that a computer simulates the human cognitive process.

In the document processing technology, it is often necessary to recognize a plurality of text images from a document image by an image recognition technology and perform subsequent processing such as document orientation correction, optical character recognition, and the like using the recognized text images.

Correcting the direction of a document image depends on accurate document orientation detection, and current document orientation detection techniques fall short in both accuracy and processing efficiency.

Disclosure of Invention

A method, an apparatus, an electronic device, and a computer-readable medium for determining an orientation of a document are provided.

According to a first aspect, there is provided a method of determining the orientation of a document, the method comprising: acquiring an image of a document to be detected that includes a table; inputting the image of the document to be detected into a trained key point detection model to obtain key point information of the table of the document to be detected, wherein the key point information represents the positions of at least two preset key points of the table that are associated with the document orientation; and determining the orientation of the document to be detected based on the key point information of the table of the document to be detected.

According to a second aspect, there is provided an apparatus for determining the orientation of a document, the apparatus comprising: an image acquisition module configured to acquire an image of a document to be detected that includes a table; an information acquisition module configured to input the image of the document to be detected into a trained key point detection model to obtain key point information of the table of the document to be detected, wherein the key point information represents the positions of at least two preset key points of the table that are associated with the document orientation; and an orientation determining module configured to determine the orientation of the document to be detected based on the key point information of the table of the document to be detected.

According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.

According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any one of the implementations of the first aspect.

According to the method and the device for determining document orientation, an image of a document to be detected that includes a table is first acquired. The image is then input into a trained key point detection model, and the key point information of the table output by the model is obtained. Finally, the orientation of the document to be detected is determined based on that key point information. The embodiments of the application thus detect the key points of the table with the trained key point detection model and determine the orientation of the whole document from the resulting key point information. Because the key points are located accurately, the orientation of the document can be detected accurately and efficiently.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is a flow diagram of one embodiment of a method for determining document orientation according to the present application;

FIG. 2 is a schematic diagram of preset key points in an application scenario according to the present application;

FIG. 3 is an exemplary flow chart for determining the orientation of a document to be detected according to the present application;

FIG. 4 is a flow diagram of yet another embodiment of a method for determining document orientation according to the present application;

FIG. 5 is a schematic block diagram of an embodiment of an apparatus for determining document orientation according to the present application;

FIG. 6 is a block diagram of an electronic device for implementing a method of determining document orientation according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

FIG. 1 illustrates a flow 100 of one embodiment of a method for determining document orientation according to the present application. The method comprises the following steps:

Step 101, acquiring an image of a document to be detected that includes a table.

In this embodiment, the execution body of the method for determining document orientation may obtain the image of the document to be detected, which includes a table, by shooting it in real time or by reading it from memory. The image is formed by capturing image data of (that is, photographing) the document to be detected. The table may be an empty table or a table filled with contents. The document to be detected is not limited to the table; it may also include text paragraphs, punctuation marks, characters, titles, and the like. The positions of the text paragraphs, punctuation marks, titles, and characters can be set according to the document layout requirements. For example, in an image of a financial form, the characters are clustered or dispersed within the table, and the title is located outside the table.

In this embodiment, to make it easier to extract information from the acquired image of the document to be detected, the image may optionally be preprocessed after it is acquired. For example, to make the document to be detected easier to identify in the image, the preprocessing may include denoising, that is, reducing the noise in the image. Denoising can effectively improve image quality, increase the signal-to-noise ratio, and better preserve the useful information carried by the original image.

Step 102, inputting the image of the document to be detected into the trained key point detection model, and obtaining the key point information of the table of the document to be detected output by the key point detection model.

The key point information represents the positions of at least two preset key points of the table in the document to be detected that are associated with the document orientation. Here, the at least two preset key points may be key points whose connecting line segment is parallel to, perpendicular to, or at a fixed angle to the orientation of the document, for example, two vertices in the table row direction or two vertices in the table column direction. The line connecting two vertices in the table row direction is consistent with the orientation of the document, and the line connecting two vertices in the table column direction is perpendicular to the orientation of the document.

In some optional implementations of this embodiment, the at least two preset key points associated with the document orientation include: at least two of the four vertices of the table in the document under test. In the optional implementation manner, at least two of the four vertices of the table are used as preset key points, and according to the principle that a line is generated by two points, at least one line representing the orientation of the table of the document to be detected is obtained by the at least two vertices, so that the orientation of the table of the document to be detected can be effectively determined.

In some optional implementations of this embodiment, as shown in FIG. 2, the at least two preset key points associated with the document orientation include: at least two of the four vertices (B1, B2, B3, B4) of the table in the document to be detected, and at least two end points (D1, D2) marking the two ends of the title of the table. In FIG. 2, the title of the table is "profit table", and the end points at its two ends are end point D1 and end point D2. In this optional implementation, at least two of the four vertices of the table are used as preset key points, and at least two end points at the two ends of the table title are added. Because the title also indicates the orientation of the table, this improves the accuracy of determining the orientation of the table and provides a reliable basis for detecting the orientation of the document to be detected.

The key point information may represent the positions of the preset key points in the image of the document to be detected. It may be, for example, the position coordinates of each preset key point, or it may be a heat map of the preset key points. The higher the heat value at a position coordinate in the heat map, the higher the confidence that the preset key point falls on that position coordinate.
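As a minimal sketch of how a heat map could be turned into a position coordinate and a confidence, the snippet below takes the cell with the highest heat value as the key point position. The function name `peak_of_heat_map` and the toy heat map are assumptions made for illustration; the source does not specify how coordinates are read from the heat map.

```python
from typing import List, Tuple


def peak_of_heat_map(heat_map: List[List[float]]) -> Tuple[int, int, float]:
    """Return (x, y, value) of the highest-valued cell in a 2-D heat map.

    Assumption: the heat map is a list of rows, so y indexes rows and
    x indexes columns, as in image coordinates.
    """
    best_x, best_y, best_value = 0, 0, float("-inf")
    for y, row in enumerate(heat_map):
        for x, value in enumerate(row):
            if value > best_value:
                best_x, best_y, best_value = x, y, value
    return best_x, best_y, best_value


# Toy 3x4 heat map for a single preset key point (values are made up).
heat_map = [
    [0.01, 0.02, 0.05, 0.01],
    [0.03, 0.10, 0.90, 0.08],
    [0.02, 0.04, 0.07, 0.01],
]
x, y, confidence = peak_of_heat_map(heat_map)
print(x, y, confidence)  # 2 1 0.9
```

Here the peak heat value doubles as the confidence of the position coordinate, which is one plausible reading of the description above rather than a stated requirement.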

In this embodiment, the key point detection model is a pre-trained model used to detect key points in the image of the document to be detected. After obtaining training samples that contain key point annotation information, the execution body trains an initial model with the training samples and obtains the trained key point detection model after multiple rounds of training, evaluation, and parameter adjustment. After the image of the document to be detected, which includes a table, is input into the key point detection model, the key point information of the table of the document to be detected can be obtained.

Specifically, the key point detection model training process may be as follows:

1) Images of documents that include tables are collected as training samples.

2) At least two preset key points associated with the document orientation are labeled in the table of each training sample to construct a data set. For example, when the at least two preset key points associated with the document orientation comprise two vertices of a table, the positions of those two vertices are labeled.

3) The key point detection model is constructed using a model structure such as a convolutional neural network, and is then trained with the constructed training samples. During training, the error of the key point detection model can be determined from the difference between the model's key point detection result on a training sample and the key point annotation information of that sample, and the parameters of the model are adjusted iteratively by error back propagation so that the error gradually decreases. Parameter adjustment stops when the error of the key point detection model converges to a certain range or the number of iterations reaches a preset threshold, which yields the trained key point detection model.
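The stopping rule in step 3) can be illustrated with a deliberately tiny stand-in. The sketch below does not build a convolutional neural network; it fits a one-variable linear model by gradient descent, purely to show the loop structure: compute the error against the labels, adjust the parameters from the error gradient, and stop when the error has converged or a preset iteration threshold is reached. The function name `train`, the learning rate, the tolerance, and the toy data are all assumptions.

```python
def train(samples, labels, lr=0.1, tol=1e-10, max_iters=10000):
    """Fit y = w * x + b by gradient descent on the mean squared error.

    Stops when the change in error falls below `tol` (convergence) or
    when `max_iters` iterations have run (preset iteration threshold).
    """
    w, b = 0.0, 0.0
    prev_error = float("inf")
    n = len(samples)
    error, iteration = prev_error, 0
    for iteration in range(1, max_iters + 1):
        preds = [w * x + b for x in samples]
        # Error: difference between predictions and labeled values.
        error = sum((p - y) ** 2 for p, y in zip(preds, labels)) / n
        if abs(prev_error - error) < tol:
            break  # error has converged
        # Gradients of the error with respect to each parameter.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, labels, samples)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        prev_error = error
    return w, b, error, iteration


# Toy data generated from y = 0.5 * x + 0.1 (made up for illustration).
samples = [0.0, 0.5, 1.0]
labels = [0.1, 0.35, 0.6]
w, b, final_error, iterations = train(samples, labels)
print(round(w, 2), round(b, 2), iterations < 10000)
```

In a real key point detector the parameters would be network weights and the gradients would come from back propagation through the network, but the two stopping conditions play the same role.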

Optionally, the preprocessing may include scaling the image of the document to be detected to a preset size before it is input into the trained key point detection model, so that the scaled image size matches the model. For example, if the trained key point detection model processes images of size a × b, the image of the document to be detected may be scaled to a × b. The scaled image is then input into the trained key point detection model for detection.
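The coordinate bookkeeping implied by this scaling can be sketched as follows. The snippet computes the scale factors for a model input of size a × b and, as an assumption not stated in the source, maps a key point detected in the scaled image back to the original image. The function names and the sizes are hypothetical.

```python
from typing import Tuple

Size = Tuple[int, int]          # (width, height)
Point = Tuple[float, float]     # (x, y)


def scale_factors(orig_size: Size, model_size: Size) -> Tuple[float, float]:
    """Per-axis factors that scale the original image to the model input size."""
    (orig_w, orig_h), (model_w, model_h) = orig_size, model_size
    return model_w / orig_w, model_h / orig_h


def to_original_coords(point: Point, factors: Tuple[float, float]) -> Point:
    """Map a point detected in the scaled image back to the original image."""
    sx, sy = factors
    return point[0] / sx, point[1] / sy


# Original image 1200 x 800, model input a x b = 300 x 200 (made-up sizes).
factors = scale_factors((1200, 800), (300, 200))
original_point = to_original_coords((75, 50), factors)
print(factors, original_point)  # (0.25, 0.25) (300.0, 200.0)
```

When the two factors are equal, the scaling preserves angles, so the orientation can be computed directly in the scaled image. When they differ, mapping the key points back first avoids distorting the measured direction.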

Step 103, determining the orientation of the document to be detected based on the key point information of the table of the document to be detected.

In this embodiment, since the key point information represents the positions of at least two preset key points associated with the document orientation of the table in the document to be detected, after the key point information of the table of the document to be detected is determined, the orientation of the table of the document to be detected can be determined, and further the orientation of the document to be detected can be easily determined.

When there are more than two preset key points associated with the document orientation, any two different preset key points among them can determine the orientation of the table of the document to be detected, so the orientation of the document to be detected can be determined from combinations of the preset key points. Here, the orientation of the document to be detected may be characterized by the direction of its text rows or the direction of its text columns.

Specifically, referring to FIG. 2, assume that the position coordinates of vertex B1 and vertex B2 among the four vertices (B1, B2, B3, B4) of the table in the document to be detected have been determined, where B1 and B2 are two vertices located in the row direction of the table. A direction vector from vertex B1 to vertex B2 may then be constructed from their position coordinates, and the direction indicated by this vector is the text row direction of the document. As another example, assume that the position coordinates of vertex B1 and vertex B3 among the four vertices have been determined, where B1 and B3 are two vertices located in the column direction of the table. A direction vector from vertex B1 to vertex B3 may then be constructed from their position coordinates, and the direction perpendicular to this vector is the text row direction of the document.
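A minimal sketch of these two constructions is given below. The coordinates are made up, image coordinates are assumed (x to the right, y downward), and B1 is assumed to lie above B3. The source only says that the row direction is perpendicular to the B1-to-B3 vector, so the choice of which of the two perpendicular directions to take is an assumption of this sketch.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Vector = Tuple[float, float]


def direction_vector(start: Point, end: Point) -> Vector:
    """Vector pointing from `start` to `end`."""
    return (end[0] - start[0], end[1] - start[1])


def row_from_column(column: Vector) -> Vector:
    """Rotate a column-direction vector by 90 degrees to get a row direction.

    Assumption: with y pointing down and the column vector pointing from
    the upper vertex to the lower one, (dx, dy) -> (dy, -dx) points from
    left to right along the rows.
    """
    return (column[1], -column[0])


def angle_deg(v: Vector) -> float:
    """Angle of a vector relative to the pixel row direction, in degrees."""
    return math.degrees(math.atan2(v[1], v[0]))


# Hypothetical vertex coordinates of a slightly rotated table.
B1, B2, B3 = (24, 36), (45, 28), (32, 57)

row_from_b1_b2 = direction_vector(B1, B2)                   # row pair
row_from_b1_b3 = row_from_column(direction_vector(B1, B3))  # column pair
print(row_from_b1_b2, row_from_b1_b3)  # (21, -8) (21, -8)
print(round(angle_deg(row_from_b1_b2), 2))
```

Both pairs yield the same row direction here because the made-up table is an exact rectangle. With real detections the two estimates would generally differ slightly.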

When the preset key point comprises at least two vertexes of the table and end points at two ends of the title of the table, a first candidate direction representing the text row direction can be determined through the vertexes of the table, a second candidate direction representing the text row direction can be determined according to a connecting line of the end points at the two ends of the title of the table, and a final detection result of the text row direction can be determined according to confidence degrees of the first candidate direction and the second candidate direction. Therefore, a plurality of candidate directions can be determined through a plurality of groups of preset key points, and the reliability of the document orientation detection result is improved.

The method for determining the orientation of the document includes the steps of firstly obtaining an image of a document to be detected including a form, secondly inputting the image of the document to be detected into a trained key point detection model, obtaining key point information of the form of the document to be detected output by the key point detection model, and finally determining the orientation of the document to be detected based on the key point information of the form of the document to be detected. Therefore, the embodiment of the application detects the key points of the table of the document to be detected through the trained key point detection model to obtain the key point information, determines the orientation of the whole document according to the key point information, is accurate in positioning, and can accurately and efficiently detect the orientation of the document.

In some alternative implementations of the embodiment, with continuing reference to fig. 3, fig. 3 shows an exemplary process 300 for determining an orientation of a document to be tested based on key point information of a table of the document to be tested, according to the present application, including the following steps:

Step 301, combining the preset key points into at least one point pair, and determining position information of the at least one point pair based on the key point information.

Specifically, referring to FIG. 2, title end point D1 combines with end point D2 as a pair of points, table vertex B1 combines with vertex B2 as a pair of points, table vertex B3 combines with vertex B4 as a pair of points, table vertex B1 combines with vertex B3 as a pair of points, and table vertex B2 combines with vertex B4 as a pair of points, each combination of point pairs being indicative of the true orientation of the table.

Each point pair is formed by combining two preset key points, and the position information of a point pair can be determined from the position information of the two preset key points it contains. Optionally, the key point information includes the position coordinates of the preset key points, and determining the position information of at least one point pair based on the key point information includes: determining the position coordinates of each point pair according to the position coordinates of the two preset key points in that point pair. For example, if the position coordinates of title end point D1 are (12, 10) and those of title end point D2 are (18, 15), the position of the point pair formed by D1 and D2 can be represented as (12, 10, 18, 15).
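The point-pair representation can be sketched in a few lines. The helper name `pair_position` and the table vertex coordinates are assumptions; only the D1 and D2 coordinates come from the example above, and the list of pairs follows the combinations named with reference to FIG. 2.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
PairPosition = Tuple[float, float, float, float]


def pair_position(keypoints: Dict[str, Point], a: str, b: str) -> PairPosition:
    """Concatenate the coordinates of two key points into one point-pair position."""
    return keypoints[a] + keypoints[b]


# D1 and D2 use the coordinates from the example; B1..B4 are made up.
keypoints = {
    "D1": (12, 10), "D2": (18, 15),
    "B1": (10, 20), "B2": (40, 20), "B3": (10, 60), "B4": (40, 60),
}

# The five combinations described with reference to FIG. 2.
PAIR_NAMES = [("D1", "D2"), ("B1", "B2"), ("B3", "B4"), ("B1", "B3"), ("B2", "B4")]
pair_positions = {f"{a}-{b}": pair_position(keypoints, a, b) for a, b in PAIR_NAMES}
print(pair_positions["D1-D2"])  # (12, 10, 18, 15)
```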

In some optional implementations of this embodiment, the key point information includes the position coordinates of the preset key points and the confidence of those position coordinates, and determining the position information of at least one point pair based on the key point information includes: determining the position coordinates and the corresponding confidence of each point pair according to the position coordinates and the corresponding confidences of the two preset key points in that point pair. In this optional implementation, because the key point information includes both the position coordinates of the preset key points and the confidences of those coordinates, several different bases are available for determining the orientation of the table of the document to be detected, which further improves the reliability of the point pair position information.

Step 302, determining the orientation of the document to be tested based on the position information of at least one point pair.

In this optional implementation, the key point information is associated with the document orientation, and the key points are combined into point pairs, so that the position information of the point pairs is associated with the document orientation, and the orientation of the document to be measured can be determined from the position information of at least one point pair. The direction of the straight line where the point pair is located relative to the document to be detected can be determined based on the position information of at least one point pair, and the orientation of the document to be detected is further determined according to the direction of the straight line where the point pair is located relative to the document to be detected.

In some optional implementations of this embodiment, the position information of a point pair includes the position coordinates of the point pair and the confidence of those position coordinates. The confidence of a point pair's position coordinates can be calculated from the confidences of the position coordinates of the key points in the pair, for example by taking their arithmetic mean or their minimum. For example, title end point D1 and end point D2 are combined into a point pair, the confidence of the position coordinates of D1 is 50%, and the confidence of the position coordinates of D2 is 30%. With the arithmetic mean method, the confidence of the position coordinates of the point pair is 1/2 × (50% + 30%) = 40%. With the minimum method, it is 30%, the lower of the two confidences.
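The two calculation methods named above can be written as one small function. The function name `pair_confidence` and the `method` parameter are hypothetical; the numbers reproduce the D1 and D2 example.

```python
def pair_confidence(conf_a: float, conf_b: float, method: str = "mean") -> float:
    """Confidence of a point pair from the confidences of its two key points.

    method="mean" takes the arithmetic mean; method="min" takes the lower value.
    """
    if method == "mean":
        return (conf_a + conf_b) / 2
    if method == "min":
        return min(conf_a, conf_b)
    raise ValueError(f"unknown method: {method!r}")


# D1 has confidence 50% and D2 has confidence 30%, as in the example.
mean_conf = pair_confidence(0.5, 0.3, method="mean")  # about 0.4
min_conf = pair_confidence(0.5, 0.3, method="min")    # 0.3
print(round(mean_conf, 2), min_conf)
```

The minimum is the more conservative choice, since a pair is then only as trustworthy as its weaker key point, while the mean lets one strong detection partly offset a weak one.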

In this optional implementation manner, the determining the orientation of the to-be-measured document based on the position information of the at least one point pair includes:

determining the orientation of the document to be detected according to the ranking of the confidences of the position coordinates of the point pairs and the position coordinates of the point pairs. In this optional implementation, ranking the confidences of the position coordinates identifies the point pairs whose position coordinates have higher confidence, which ensures that reliable point pairs are selected. Combining this with the position coordinates of those point pairs further improves the reliability of the orientation detection of the document to be detected.

Further, the method for determining the orientation of the document to be measured according to the ranking of the confidence degrees of the position coordinates of the point pairs and the position coordinates of the point pairs may adopt any one of the following implementation manners:

1) Rank the confidences of the position coordinates of the point pairs, and determine the orientation of the document to be detected according to the preset correspondence between the line connecting the two preset key points in the highest-confidence point pair and the orientation of the document to be detected. Note that the preset correspondence between the line connecting any two preset key points and the orientation of the document to be detected may be calibrated in advance. For example, the preset correspondence between the line connecting the two end points of the table title and the orientation of the document to be detected may be that the two are perpendicular.

In this implementation, selecting the point pair whose position coordinates have the highest confidence ensures that the most reliable point pair position information is used. This gives one way of applying the confidence ranking and secures the reliability of the orientation detection by relying on the single best point pair.
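A minimal sketch of implementation 1) follows. The preset correspondence is modeled here as an angle offset in degrees that is added to the angle of the pair's connecting line. This encoding, the offset values, the coordinates, and the confidences are all assumptions made for illustration; the offsets follow the convention described earlier for table vertices (row pairs parallel to the orientation, column pairs perpendicular to it).

```python
import math
from typing import Dict, List, Tuple

# (pair name, (x1, y1, x2, y2), confidence of the pair's position coordinates)
Pair = Tuple[str, Tuple[float, float, float, float], float]


def line_angle_deg(coords: Tuple[float, float, float, float]) -> float:
    """Angle of the line from (x1, y1) to (x2, y2), relative to pixel rows."""
    x1, y1, x2, y2 = coords
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def orientation_from_best_pair(pairs: List[Pair],
                               offsets_deg: Dict[str, float]) -> Tuple[str, float]:
    """Pick the highest-confidence pair and apply its calibrated offset."""
    name, coords, _confidence = max(pairs, key=lambda pair: pair[2])
    return name, line_angle_deg(coords) + offsets_deg[name]


# Hypothetical detections and calibration (not taken from the source).
pairs = [
    ("B1-B2", (24, 36, 45, 28), 0.62),  # row pair
    ("B1-B3", (24, 36, 32, 57), 0.81),  # column pair
]
offsets_deg = {"B1-B2": 0.0, "B1-B3": -90.0}

best_name, orientation_deg = orientation_from_best_pair(pairs, offsets_deg)
print(best_name, round(orientation_deg, 2))
```

In this toy data the column pair wins the ranking, and its offset converts the column direction into the same row direction that the B1-B2 line would have given.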

2) Rank the confidences of the position coordinates of all point pairs from high to low, and take the top preset number of point pairs to form a point pair set; calculate the orientation of the document to be detected from the position coordinates of each point pair in the set, obtaining candidate orientations in one-to-one correspondence with the point pairs; determine the direction angle of each candidate orientation, calculate the average of these direction angles, and take the direction represented by that average as the orientation of the document to be detected. Note that the direction angle of a candidate orientation may be measured relative to a reference element in the image of the document to be detected, such as the row or column direction of the pixels.

In this implementation, the direction represented by the average of the direction angles of the candidate orientations is taken as the orientation of the document to be detected. This gives another way of applying the confidence ranking: it draws on what the position information of several point pairs has in common, which secures the reliability of the orientation detection through agreement among point pairs.
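Implementation 2) can be sketched as below, again assuming that each pair's correspondence to the document orientation is encoded as a hypothetical angle offset in degrees. The function names, the value of k, and the toy data are assumptions. The average is the plain arithmetic mean of the direction angles, as the description states.

```python
import math
from typing import Dict, List, Tuple

# (pair name, (x1, y1, x2, y2), confidence of the pair's position coordinates)
Pair = Tuple[str, Tuple[float, float, float, float], float]


def line_angle_deg(coords: Tuple[float, float, float, float]) -> float:
    """Angle of the line from (x1, y1) to (x2, y2), relative to pixel rows."""
    x1, y1, x2, y2 = coords
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def top_k_pairs(pairs: List[Pair], k: int) -> List[Pair]:
    """Rank pairs by confidence from high to low and keep the first k."""
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)[:k]


def mean_orientation_deg(pairs: List[Pair],
                         offsets_deg: Dict[str, float], k: int) -> float:
    """Average the candidate direction angles of the k most confident pairs."""
    selected = top_k_pairs(pairs, k)
    angles = [line_angle_deg(coords) + offsets_deg[name]
              for name, coords, _confidence in selected]
    return sum(angles) / len(angles)


# Hypothetical detections; all offsets are zero to keep the arithmetic visible.
pairs = [
    ("P1", (0, 0, 10, 0), 0.9),   # candidate angle 0 degrees
    ("P2", (0, 0, 10, 10), 0.8),  # candidate angle 45 degrees
    ("P3", (0, 0, 0, 10), 0.2),   # low confidence, dropped when k = 2
]
offsets_deg = {"P1": 0.0, "P2": 0.0, "P3": 0.0}

result_deg = mean_orientation_deg(pairs, offsets_deg, k=2)
print(round(result_deg, 2))  # 22.5
```

One caveat with a plain arithmetic mean is that angles wrap around at ±180 degrees, so candidates on either side of that boundary would average to a misleading value. Averaging unit direction vectors instead is a common alternative, though it is not what the description specifies.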

The method for determining the orientation of the document to be detected provided by the optional implementation manner shown in fig. 3 is to combine the preset key points into at least one point pair on the basis of determining at least two preset key points associated with the orientation of the document, determine the position information of the at least one point pair based on the key point information, and determine the orientation of the document to be detected based on the position information of the at least one point pair.

In some optional implementations of this embodiment, the key point information includes the position coordinates of the preset key points. Referring further to FIG. 4, a flow 400 of yet another embodiment of a method for determining document orientation is shown. The flow 400 may include the following steps:

Step 401, acquiring an image of a document to be detected that includes a table.

Step 402, inputting the image of the document to be detected into the trained key point detection model, and obtaining the key point information of the table of the document to be detected output by the key point detection model.

Step 403, combining the preset key points into at least one point pair.

Step 404, respectively determining the position coordinates of each point pair based on the position coordinates of two preset key points in each point pair.

Specifically, the point pairs are formed by a combination of key points, and the positions of preset key points are also determined, and after the position coordinates of the preset key points are determined, the position coordinates of the respective point pairs can be determined. For example, the position coordinates of two preset key points (B1, B2) in a point pair of a table are B1(24, 36) and B2(45, 28), respectively, and the position coordinates of the current point pair can be represented as (24, 36, 45, 28).

Step 405, inputting the position coordinates of at least one point pair into a pre-trained document orientation prediction model, and obtaining the orientation of the to-be-detected document output by the orientation prediction model.

The orientation prediction model is obtained by acquiring position information of preset key points in a plurality of sample documents containing tables and training the position information in combination with orientation marking information of the sample documents.

The orientation prediction model is a pre-trained model used to detect the true orientation of the document to be detected. After the position coordinates of at least one point pair are input into the orientation prediction model, the true orientation of the document to be detected can be obtained. The orientation prediction model may be trained as follows. First, a training sample set composed of a plurality of sample documents is obtained, where each sample document is labeled in advance with the position information of its preset key points, and the position information of the preset key points corresponds to the true orientation of the sample document. Second, an initial model of the orientation prediction model is trained with the training samples in combination with the orientation annotation information of the sample documents, and the orientation prediction model is obtained after multiple rounds of training, evaluation, and parameter adjustment.

In the method for determining the orientation of the document provided in the embodiment shown in fig. 4, when the key point information includes the position coordinates of the preset key points, the position coordinates of each point pair are respectively determined based on the position coordinates of two preset key points in each point pair, and the position coordinates of at least one point pair are input into the pre-trained document orientation prediction model, so that the orientation of the document to be detected is obtained, and the reliability of the orientation detection of the document to be detected is improved.

A specific implementation of the method for determining document orientation according to the embodiment is described in detail below with reference to fig. 2:

The implementation is divided into three steps. In the first step, a key point detection model is trained. In the second step, the model is used to perform key point detection on an image of the document to be detected that includes a table, obtaining key point information. In the third step, the orientation of the document to be detected is determined from the key point information. Each step is described as follows:

In the first step, a large amount of document image data containing tables and table titles is collected, and the table and table title in each image are labeled with key points: the left and right end points of the table title (for example, D1 and D2 in fig. 2) and the four vertices of the table (for example, B1 to B4 in fig. 2) may be labeled to construct a data set. A key point detection model is then trained on this data set.
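A labeled sample of the kind described in the first step could be represented as in the following minimal sketch. The key point names D1, D2, and B1 to B4 come from fig. 2; the `AnnotatedSample` class, the file names, and the rule of discarding incompletely labeled images are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Title end points and table vertices named as in fig. 2.
KEYPOINT_NAMES = ("D1", "D2", "B1", "B2", "B3", "B4")


@dataclass
class AnnotatedSample:
    image_path: str
    keypoints: dict = field(default_factory=dict)  # name -> (x, y) in pixels

    def is_complete(self):
        """True when every preset key point has been labeled."""
        return all(name in self.keypoints for name in KEYPOINT_NAMES)


def build_dataset(samples):
    """Keep only fully labeled samples (an assumed filtering rule)."""
    return [s for s in samples if s.is_complete()]


full = AnnotatedSample("doc_001.png", {
    "D1": (120, 40), "D2": (480, 40),
    "B1": (60, 80), "B2": (540, 80), "B3": (60, 700), "B4": (540, 700),
})
partial = AnnotatedSample("doc_002.png", {"D1": (100, 30), "D2": (400, 30)})
dataset = build_dataset([full, partial])
```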

In the second step, the key point detection model is used to detect the key point information of an image of a document to be detected that includes a table and a table title, specifically:

An image of the document to be detected, including a table and a table title, is obtained and given the preprocessing that the key point detection model requires (for example, reducing the image to the size required by the model). The image is then input into the key point detection model, the model detection process is run, and the key point information output by the model is obtained. The key point information comprises the position coordinates of all preset key points and the confidence of each position coordinate.
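The shape of the key point information, and the bookkeeping that a size reduction implies, can be sketched as follows. This is an assumed arrangement, not a step stated in the embodiment: the 512-pixel target size, the `Keypoint` record, and the mapping of detected coordinates back to the original image scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Keypoint:
    name: str          # e.g. "D1" or "B3"
    x: float           # position coordinate, in pixels
    y: float
    confidence: float  # confidence of the position coordinate


def resize_scale(width, height, target=512):
    """Scale factor that shrinks the longer side to `target` (never enlarges)."""
    return min(1.0, target / max(width, height))


def to_original_coords(keypoints, scale):
    """Map key points detected on the reduced image back to the original image."""
    return [Keypoint(k.name, k.x / scale, k.y / scale, k.confidence)
            for k in keypoints]


scale = resize_scale(2048, 1024)
detected = [Keypoint("D1", 25.0, 50.0, 0.9)]  # stand-in for model output
restored = to_original_coords(detected, scale)
```

Because orientation depends only on directions between key points, a uniform scale does not change the result; mapping back matters only if the coordinates are reused on the full-size image.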

In the third step, the orientation of the document to be detected is determined according to the key point information output by the key point detection model, specifically:

The preset key points are combined into at least one point pair. For example, title end point D1 and end point D2 form one point pair, table vertices B1 and B2 form one point pair, B3 and B4 form one point pair, B1 and B3 form one point pair, and B2 and B4 form one point pair. Because the true orientation of the table of the document to be detected can be derived from each point pair, several alternative implementations exist, and any one of the following may be adopted:

1) Rank the point pairs by the confidence of their position coordinates, and derive the true orientation of the table of the document to be detected directly from the point pair with the highest confidence.

2) Rank all point pairs by the confidence of their position coordinates from high to low, take the top preset number of point pairs, and use the average of the table orientations they indicate as the table orientation of the document to be detected.

3) Fit the distribution relating each point pair to the true orientation of the corresponding table, and take the expected orientation as the table orientation of the document to be detected (see the implementation process shown in fig. 4).
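Option 1) can be sketched as follows. Several details are assumptions made for illustration rather than taken from the embodiment: B1 to B4 are taken to be the top-left, top-right, bottom-left, and bottom-right vertices of an upright table, the confidence of a point pair is taken as the lower of its two key point confidences, and angles are measured in image coordinates with the y axis pointing down.

```python
import math

# (start, end, angle in degrees of the connecting line in an upright document).
# The layout of B1..B4 is assumed; see the note above.
PAIRS = [("D1", "D2", 0.0), ("B1", "B2", 0.0), ("B3", "B4", 0.0),
         ("B1", "B3", 90.0), ("B2", "B4", 90.0)]


def pair_orientations(keypoints):
    """keypoints: name -> (x, y, confidence). Returns [(confidence, angle_deg)]."""
    results = []
    for a, b, upright in PAIRS:
        if a not in keypoints or b not in keypoints:
            continue  # skip pairs with a missing key point
        xa, ya, ca = keypoints[a]
        xb, yb, cb = keypoints[b]
        measured = math.degrees(math.atan2(yb - ya, xb - xa))
        # Rotation of the document relative to upright, in [0, 360).
        results.append((min(ca, cb), (measured - upright) % 360.0))
    return results


def orientation_best_pair(keypoints):
    """Orientation from the most confident pair; raises ValueError if none exist."""
    return max(pair_orientations(keypoints))[1]


upright_doc = {
    "D1": (10, 10, 0.90), "D2": (110, 10, 0.95),
    "B1": (10, 30, 0.80), "B2": (110, 30, 0.70),
    "B3": (10, 130, 0.60), "B4": (110, 130, 0.50),
}
print(orientation_best_pair(upright_doc))
```

For the upright sample, every point pair yields a rotation of 0 degrees, so the choice of pair only matters when the detections disagree.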

In summary, the above three steps complete document orientation detection based on a titled table.

It should be noted that, in some embodiments of the present application, the first step is not a necessary step. For example, in practice, the key point detection model may be trained on another terminal and then used directly to detect the positions of the preset key points in the image of the document to be detected. The above description of the first step does not constitute a necessary limitation on the specific implementation of the embodiments of the present application.

With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for determining an orientation of a document, which corresponds to the embodiment of the method shown in fig. 1, and which is particularly applicable in various electronic devices.

As shown in fig. 5, the apparatus 500 for determining the orientation of a document according to the present embodiment includes: an image acquisition module 501, an information acquisition module 502, and an orientation determination module 503. The image obtaining module 501 may be configured to obtain an image of a document to be tested, which includes a table. The information obtaining module 502 may be configured to input the image of the document to be detected into the trained key point detection model, and obtain the key point information of the table of the document to be detected output by the key point detection model, where the key point information represents positions of at least two preset key points associated with the document orientation in the table of the document to be detected. The orientation determining module 503 may be configured to determine the orientation of the document to be tested based on the key point information of the table of the document to be tested.
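The division of apparatus 500 into three modules can be sketched as a small pipeline. The class name `OrientationApparatus` and the use of plain callables for the modules are assumptions made for illustration; the stand-in functions in the usage lines are not real detectors.

```python
from typing import Any, Callable, Dict, Tuple

Keypoints = Dict[str, Tuple[float, float, float]]  # name -> (x, y, confidence)


class OrientationApparatus:
    """Chains the three modules described for apparatus 500."""

    def __init__(self,
                 acquire_image: Callable[[Any], Any],
                 detect_keypoints: Callable[[Any], Keypoints],
                 determine_orientation: Callable[[Keypoints], float]):
        self.acquire_image = acquire_image                  # module 501
        self.detect_keypoints = detect_keypoints            # module 502
        self.determine_orientation = determine_orientation  # module 503

    def run(self, source: Any) -> float:
        image = self.acquire_image(source)
        keypoints = self.detect_keypoints(image)
        return self.determine_orientation(keypoints)


# Usage with stand-in modules (hypothetical values, for illustration only).
apparatus = OrientationApparatus(
    acquire_image=lambda source: {"source": source},
    detect_keypoints=lambda image: {"D1": (0.0, 0.0, 0.9), "D2": (10.0, 0.0, 0.8)},
    determine_orientation=lambda kp: 0.0 if kp["D2"][0] > kp["D1"][0] else 180.0,
)
result = apparatus.run("doc.png")
```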

In the present embodiment, in the apparatus 500 for determining the orientation of a document: the specific processing of the image obtaining module 501, the information obtaining module 502, and the orientation determining module 503 and the technical effects thereof can refer to the related descriptions of step 101, step 102, and step 103 in the corresponding embodiment of fig. 1, which are not described herein again.

In some optional implementations of this embodiment, the orientation determining module 503 includes: a point pair combining submodule (not shown) and an orientation determining submodule (not shown). The point pair combining submodule may be configured to combine the preset key points into at least one point pair, and determine position information of the at least one point pair based on the key point information. The orientation determining submodule may be configured to determine the orientation of the document to be measured based on the position information of the at least one point pair.

In some optional implementations of this embodiment, the key point information includes: the position coordinates of the preset key points and the confidence of the position coordinates; the point pair combining submodule includes: a position determination unit (not shown in the figure). The position determining unit may be configured to determine the position coordinates and the corresponding confidence of each point pair from the position coordinates and the corresponding confidences of the two preset key points in that point pair. The orientation determination submodule includes: an orientation determination unit (not shown in the figure). The orientation determining unit may be configured to determine the orientation of the document to be measured according to the ranking of the confidences of the position coordinates of the point pairs and the position coordinates of the point pairs.

In some optional implementations of this embodiment, the orientation determining unit includes: an orientation determining subunit (not shown in the figure). The orientation determining subunit may be configured to rank the confidence levels of the position coordinates of the point pairs, and determine the orientation of the document to be detected according to a preset correspondence between a connection line of two preset key points in the point pair with the highest confidence level and the orientation of the document to be detected.

In some optional implementations of this embodiment, the orientation determining unit includes: a set forming subunit (not shown), an orientation calculation subunit (not shown), and an average orientation calculation subunit (not shown). The set forming subunit may be configured to sort all point pairs by the confidence of their position coordinates from high to low, and take the top preset number of point pairs to form a point pair set. The orientation calculating subunit may be configured to calculate the orientation of the document to be measured from the position coordinates of each point pair in the point pair set, obtaining candidate orientations in one-to-one correspondence with the point pairs. The average orientation calculating subunit may be configured to determine the direction angle of each candidate orientation, calculate the average value of the direction angles of the candidate orientations, and take the direction represented by that average value as the orientation of the document to be measured.
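The selection and averaging performed by these subunits can be sketched as follows. The embodiment only states that the direction angles are averaged; the circular mean used here is a substitute chosen for this sketch, because a plain arithmetic mean misbehaves when candidate angles straddle the 0/360 degree boundary (358 and 2 degrees would average to 180). The function names and sample values are hypothetical.

```python
import math


def top_k_pairs(candidates, k):
    """candidates: [(confidence, angle_deg)]; keep the k most confident."""
    return sorted(candidates, key=lambda c: c[0], reverse=True)[:k]


def mean_direction(angles_deg):
    """Circular mean of angles in degrees, with the result in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


# Hypothetical candidate orientations, one per point pair.
candidates = [(0.9, 358.0), (0.8, 2.0), (0.7, 4.0), (0.2, 90.0)]
top = top_k_pairs(candidates, 3)                    # drops the 0.2-confidence pair
orientation = mean_direction([angle for _, angle in top])
```

With these values the low-confidence outlier at 90 degrees is excluded and the averaged orientation lands slightly above 1 degree, close to upright.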

In some optional implementations of this embodiment, the key point information includes: the position coordinates of the preset key points; the point pair combining submodule includes: a point pair determination unit (not shown in the figure). The orientation determination submodule includes: an orientation prediction unit (not shown in the figure). The point pair determination unit may be configured to determine the position coordinates of each point pair based on the position coordinates of the two preset key points in that point pair. The orientation prediction unit may be configured to input the position coordinates of the at least one point pair into a pre-trained document orientation prediction model, and obtain the orientation of the document to be measured output by the orientation prediction model; the orientation prediction model is obtained by acquiring position information of preset key points in a plurality of sample documents containing tables and training on that position information in combination with orientation marking information of the sample documents.

In some optional implementations of the embodiment, the at least two preset key points associated with the document orientation include: at least two of the four vertices of the table in the document under test.

In some optional implementations of this embodiment, the at least two preset key points associated with the document orientation further include: at least two end points at both ends of a title characterizing a table in a document to be tested.

In the apparatus for determining the orientation of a document provided in the embodiment of the present application, first, the image obtaining module 501 obtains an image of a document to be detected including a table; second, the information obtaining module 502 inputs the image of the document to be detected into a trained key point detection model and obtains the key point information of the table of the document to be detected output by the key point detection model; finally, the orientation determining module 503 determines the orientation of the document to be detected based on the key point information of the table of the document to be detected. The embodiment of the present application thus detects the key points of the table of the document to be detected with the trained key point detection model to obtain the key point information, and determines the orientation of the whole document from that information; the positioning is accurate, and the orientation of the document can be detected accurately and efficiently.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 6 is a block diagram of an electronic device for the method for determining document orientation according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 6, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses 605 and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.

The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of determining an orientation of a document provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of determining document orientation provided herein.

The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method of determining document orientation in the embodiments of the present application (e.g., the image acquisition module 501, the information acquisition module 502, and the orientation determination module 503 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing, i.e., implementing the method of determining document orientation in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 602.

The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to use of the electronic device for the method of determining document orientation, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected over a network to the electronic device for the method of determining document orientation. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the method of determining an orientation of a document may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus 605 or other means, and are exemplified by the bus 605 in fig. 6.

The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus for the method of determining document orientation; examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method of determining document orientation, the method comprising:

acquiring an image of a document to be detected including a table;

inputting the image of the document to be detected into a trained key point detection model, and obtaining key point information of a table of the document to be detected, wherein the key point information is output by the key point detection model and represents positions of at least two preset key points of the table in the document to be detected, and the positions are related to the orientation of the document;

and determining the orientation of the document to be detected based on the key point information of the table of the document to be detected.

2. The method of claim 1, wherein the determining the orientation of the document to be tested based on the keypoint information of the table of the document to be tested comprises:

combining the preset key points into at least one point pair, and determining the position information of the at least one point pair based on the key point information;

and determining the orientation of the document to be tested based on the position information of the at least one point pair.

3. The method of claim 2, wherein the keypoint information comprises: the position coordinates of the preset key points and the confidence degrees of the position coordinates; and

the determining location information of the at least one point pair based on the keypoint information comprises:

respectively determining the position coordinates and the corresponding confidence degrees of the point pairs according to the position coordinates and the corresponding confidence degrees of the two preset key points in the point pairs;

the determining the orientation of the document to be tested based on the position information of the at least one point pair includes:

and determining the orientation of the document to be detected according to the sequencing of the confidence degrees of the position coordinates of the point pairs and the position coordinates of the point pairs.

4. The method of claim 3, wherein determining the orientation of the document to be tested according to the ranking of the confidence of the position coordinates of the point pairs and the position coordinates of the point pairs comprises:

and sequencing the confidence degrees of the position coordinates of the point pairs, and determining the orientation of the document to be detected according to a preset corresponding relation between the connecting line of two preset key points in the point pair with the highest confidence degree and the orientation of the document to be detected.

5. The method of claim 3, wherein determining the orientation of the document to be tested according to the ranking of the confidence of the position coordinates of the point pairs and the position coordinates of the point pairs comprises:

sorting all the point pairs by the confidence degrees of their position coordinates from high to low, and taking the top preset number of point pairs to form a point pair set;

respectively calculating the orientation of the document to be detected based on the position coordinates of each point pair in the point pair set to obtain at least one candidate orientation corresponding to each point pair one by one;

and determining the direction angle of each candidate orientation, calculating the average value of the direction angles of each candidate orientation, and taking the direction represented by the average value of the direction angles of each candidate orientation as the orientation of the document to be detected.

6. The method of claim 2, wherein the keypoint information comprises: the position coordinates of the preset key points; and

respectively determining the position coordinates of each point pair based on the position coordinates of two preset key points in each point pair;

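inputting the position coordinates of the at least one point pair into a pre-trained document orientation prediction model to obtain the orientation of the to-be-detected document output by the orientation prediction model; wherein the orientation prediction model is obtained by acquiring position information of preset key points in a plurality of sample documents containing tables and training the position information in combination with orientation marking information of the sample documents.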

7. The method according to one of claims 1 to 6, wherein the at least two preset key points associated with document orientation comprise: at least two of the four vertices of the table in the document to be tested.

8. The method of claim 7, wherein the at least two preset keypoints associated with document orientation further comprise:

at least two end points characterizing the two ends of the title of the table in the document to be tested.

9. An apparatus to determine an orientation of a document, the apparatus comprising:

an image acquisition module configured to acquire an image of a document to be tested including a table;

an information acquisition module configured to input the image of the document to be detected into a trained key point detection model, and obtain key point information of a table of the document to be detected output by the key point detection model, wherein the key point information represents positions of at least two preset key points of the table in the document to be detected, and the preset key points are associated with the orientation of the document;

an orientation determination module configured to determine an orientation of the document to be tested based on the key point information of the table of the document to be tested.

10. The apparatus of claim 9, wherein the orientation determining module further comprises:

a point pair combining submodule configured to combine the preset keypoints into at least one point pair, the position information of the at least one point pair being determined based on the keypoint information;

an orientation determination submodule configured to determine an orientation of the document to be measured based on the position information of the at least one point pair.

11. The apparatus of claim 10, wherein the keypoint information comprises: the position coordinates of the preset key points and the confidence degrees of the position coordinates;

the point pair combining submodule includes:

the position determining unit is configured to determine the position coordinates and the corresponding confidence degrees of the point pairs respectively according to the position coordinates and the corresponding confidence degrees of the two preset key points in the point pairs;

the orientation determination submodule includes:

an orientation determination unit configured to determine an orientation of the document to be measured according to the ranking of the confidence degrees of the position coordinates of the point pairs and the position coordinates of the point pairs.

12. The apparatus of claim 11, wherein the orientation determining unit comprises:

and the orientation determining subunit is configured to rank the confidence degrees of the position coordinates of the point pairs, and determine the orientation of the document to be detected according to a preset corresponding relationship between a connecting line of two preset key points in the point pair with the highest confidence degree and the orientation of the document to be detected.

13. The apparatus of claim 11, wherein the orientation determining unit comprises:

a set forming subunit configured to sort all the point pairs by the confidence degrees of their position coordinates from high to low, and take the top preset number of point pairs to form a point pair set;

the orientation calculation subunit is configured to calculate the orientation of the document to be detected respectively based on the position coordinates of each point pair in the point pair set to obtain at least one candidate orientation corresponding to each point pair one by one;

and the average orientation calculation subunit is configured to determine the direction angle of each candidate orientation, calculate the average value of the direction angles of each candidate orientation, and take the direction represented by the average value of the direction angles of each candidate orientation as the orientation of the document to be detected.

14. The apparatus of claim 10, wherein the keypoint information comprises: the position coordinates of the preset key points;

the point pair combining submodule includes:

a point pair determination unit configured to determine position coordinates of the respective point pairs based on position coordinates of two preset key points in the respective point pairs, respectively;

the orientation determination submodule includes:

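an orientation prediction unit configured to input the position coordinates of the at least one point pair into a pre-trained document orientation prediction model to obtain the orientation of the to-be-detected document output by the orientation prediction model; wherein the orientation prediction model is obtained by acquiring position information of preset key points in a plurality of sample documents containing tables and training the position information in combination with orientation marking information of the sample documents.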

15. The apparatus according to one of claims 9-14, wherein the at least two preset key points associated with document orientation comprise: at least two of the four vertices of the table in the document to be tested.

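16. The apparatus of claim 15, wherein the at least two preset keypoints associated with document orientation further comprise:

at least two end points characterizing the two ends of the title of the table in the document to be tested.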

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.