CN111582153B

CN111582153B - Method and device for determining orientation of document

Info

Publication number: CN111582153B
Application number: CN202010377027.8A
Authority: CN
Inventors: 曲福; 庞敏辉; 韩光耀; 姜泽青
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-05-07
Filing date: 2020-05-07
Publication date: 2023-06-30
Anticipated expiration: 2040-05-07
Also published as: CN111582153A

Abstract

The application discloses a method and a device for determining document orientation, and relates to the technical field of computer vision. One embodiment of the method comprises: acquiring an image of a document to be tested that includes a table; inputting the image into a trained key point detection model to obtain key point information of the table, output by the model, wherein the key point information represents the positions of at least two preset key points of the table that are associated with the orientation of the document; and determining the orientation of the document to be tested based on the key point information of the table. The embodiment can detect document orientation accurately and efficiently.

Description

Method and device for determining orientation of document

Technical Field

Embodiments of the present application relate to the field of computer technology, and in particular, to the field of computer vision, and more particularly, to a method and apparatus for determining orientation of a document.

Background

The main function of image recognition technology is to distinguish the objects in an observed image so that meaningful judgments can be made about them. In practice, this means applying modern information processing techniques so that a computer simulates the human cognitive process.

In document processing technology, it is often necessary to identify a plurality of text images from document images by an image recognition technology, and to use the identified text images for subsequent processing such as document direction correction, optical character recognition, and the like.

Direction correction of a document image relies on accurate detection of the document orientation, and conventional document orientation detection techniques fall short in both accuracy and processing efficiency.

Disclosure of Invention

A method, apparatus, electronic device, and computer-readable medium for determining an orientation of a document are provided.

According to a first aspect, there is provided a method of determining the orientation of a document, the method comprising: acquiring an image of a document to be tested comprising a form; inputting an image of a document to be tested into a trained key point detection model to obtain key point information of a form of the document to be tested, which is output by the key point detection model, wherein the key point information represents the positions of at least two preset key points, associated with the orientation of the document, of the form in the document to be tested; and determining the orientation of the document to be tested based on the key point information of the table of the document to be tested.

According to a second aspect, there is provided an apparatus for determining the orientation of a document, the apparatus comprising: an image acquisition module configured to acquire an image of a document to be tested including a table; an information acquisition module configured to input the image of the document to be tested into a trained key point detection model to obtain key point information of the table of the document to be tested, output by the key point detection model, wherein the key point information represents the positions of at least two preset key points of the table that are associated with the orientation of the document; and an orientation determining module configured to determine the orientation of the document to be tested based on the key point information of the table of the document to be tested.

According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described in any one of the implementations of the first aspect.

According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method as described in any implementation of the first aspect.

According to the method and the device for determining the orientation of the document, firstly, the image of the document to be tested including the form is obtained, secondly, the image of the document to be tested is input into the trained key point detection model, key point information of the form of the document to be tested, which is output by the key point detection model, is obtained, and finally, the orientation of the document to be tested is determined based on the key point information of the form of the document to be tested. Therefore, the embodiment of the application detects the key points of the form of the document to be detected through the trained key point detection model to obtain the key point information, and the key point information is used for determining the orientation of the whole document, so that the positioning is accurate, and the document orientation can be accurately and efficiently detected.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:

FIG. 1 is a flow chart of one embodiment of a method of determining an orientation of a document according to the present application;

FIG. 2 is a schematic diagram of preset key points in an application scenario according to the present application;

FIG. 3 is an exemplary flow chart for determining an orientation of a document under test according to the present application;

FIG. 4 is a flow chart of yet another embodiment of a method of determining an orientation of a document according to the present application;

FIG. 5 is a schematic structural view of an embodiment of an apparatus for determining the orientation of a document according to the present application;

FIG. 6 is a block diagram of an electronic device for implementing a method of determining document orientation in accordance with an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

FIG. 1 illustrates a flow 100 of one embodiment of a method of determining an orientation of a document according to the present application. The method for determining the orientation of the document comprises the following steps:

Step 101, acquiring an image of a document to be tested including a table.

In this embodiment, the execution body that runs the method for determining the orientation of a document may acquire the image of the document to be tested by shooting it in real time or by reading it from memory. The image is obtained by capturing image data of (i.e., photographing) the document to be tested. The document to be tested includes a table, which may be empty or filled with content such as text and symbols. The document is not limited to the table and may also contain, for example, text paragraphs, punctuation marks, characters, and titles. The positions of these elements follow the typesetting requirements of the document; for example, in an image of a financial statement, characters are clustered or scattered inside the table and the title is located outside the table.

In this embodiment, to make it easier to extract information from the acquired image, the image may optionally be preprocessed after it is acquired. For example, to make the document easier to recognize in the image, the preprocessing may include denoising, i.e., reducing the noise in the image. Denoising can effectively improve image quality, increase the signal-to-noise ratio, and better preserve the useful information carried by the original image.
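The text does not name a particular denoising method. As a minimal sketch, assuming a 3x3 median filter is acceptable and that the image is a grayscale grid stored as a list of rows (both are assumptions made only for illustration), the denoising step could look like this:

```python
from statistics import median

def median_denoise(image):
    """Apply a 3x3 median filter to a grayscale image given as a list of rows."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            # Collect the 3x3 neighbourhood, clipped at the image border.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            result[y][x] = median(window)
    return result

# Hypothetical 3x3 image with one bright noise pixel in the centre.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
clean = median_denoise(noisy)
```

The isolated bright pixel is replaced by the median of its neighbourhood, while uniform regions are left unchanged.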

Step 102, inputting the image of the document to be tested into the trained key point detection model to obtain the key point information of the table of the document to be tested, which is output by the key point detection model.

The key point information characterizes the positions of at least two preset key points of a table in the document to be tested that are associated with the document orientation. Here, such key points may be points whose connecting line segment is parallel to, perpendicular to, or at a fixed angle to the orientation of the document, for example two vertices along the row direction of the table or two vertices along the column direction of the table. The line connecting two vertices along the row direction is consistent with the direction of the document, and the line connecting two vertices along the column direction is perpendicular to it.

In some optional implementations of this embodiment, the at least two preset key points associated with the document orientation include: at least two of the four vertices of the table in the document to be tested. In this alternative implementation, at least two of the four vertices of the table serve as preset key points. Because two points determine a line, these vertices yield at least one line that represents the orientation of the table, so the orientation of the table can be determined effectively.

In some optional implementations of this embodiment, as shown in FIG. 2, the at least two preset key points associated with the document orientation include: at least two of the four vertices (B1, B2, B3, B4) of the table in the document to be tested, and at least two endpoints (D1, D2) marking the two ends of the title of the table. In FIG. 2, the title of the table is "profit table", and the endpoints at its two ends are endpoint D1 and endpoint D2. In this alternative implementation, at least two endpoints at the two ends of the table title are added to the preset key points alongside at least two of the four table vertices. Because the title also indicates the orientation of the table, this improves the accuracy with which the orientation of the table is determined and provides a reliable basis for detecting the orientation of the document to be tested.

The key point information may represent the positions of the preset key points in the image of the document to be tested, for example as the position coordinates of each preset key point. Alternatively, it may be a heat map of the preset key points: the higher the heat value at a position coordinate in the heat map, the higher the confidence that a preset key point falls on that position coordinate.

In this embodiment, the key point detection model is a pre-trained model used to detect key points in the image of the document to be tested. After the execution body acquires training samples containing key point annotation information, it trains an initial model on them; the trained key point detection model is obtained after multiple rounds of training, evaluation, and parameter adjustment of the initial model. Once the image of the document to be tested including the table is input into the key point detection model, the key point information of the table can be obtained.
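When the model outputs one heat map per preset key point, a position coordinate can be read off at the peak. The sketch below assumes a heat map stored as a list of rows and assumes that the peak value itself is used as the confidence; the function name and the sample values are hypothetical.

```python
def keypoint_from_heatmap(heatmap):
    """Return ((x, y), confidence) for the highest-valued cell of a 2D heat map."""
    best_value = float("-inf")
    best_xy = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_xy = (x, y)
    return best_xy, best_value

# Hypothetical 3x3 heat map for one preset key point (for example vertex B1).
heatmap_b1 = [
    [0.01, 0.02, 0.01],
    [0.05, 0.90, 0.10],
    [0.02, 0.08, 0.03],
]
position, confidence = keypoint_from_heatmap(heatmap_b1)
```

Here `position` is given as (column, row), matching the (x, y) convention used for position coordinates elsewhere in the description.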

Specifically, the keypoint detection model training process may be as follows:

1) Images of documents comprising tables are collected as training samples.

2) Annotate, in the table of each training sample, at least two preset key points associated with the document orientation, and construct a data set. For example, if the preset key points include two vertices of the table, the positions of those two vertices are annotated.

3) Construct the key point detection model using a model structure such as a convolutional neural network, and then train it on the constructed training samples. During training, the error of the model is determined from the difference between its key point detection results on the training samples and the key point annotations of those samples, and the parameters of the model are adjusted iteratively by error back propagation so that the error gradually decreases. Parameter adjustment stops when the error converges to within a certain range or the number of iterations reaches a preset threshold, which yields the trained key point detection model.
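The iterative adjustment and its two stopping conditions can be illustrated with a deliberately small stand-in. The sketch below is not a convolutional network: it assumes a two-parameter linear model with analytically computed gradients, and the sample data, learning rate, and thresholds are hypothetical. Only the loop structure (compute error, adjust parameters, stop on convergence or on an iteration limit) mirrors the description.

```python
def train(samples, learning_rate=0.05, error_tolerance=1e-6, max_iterations=10_000):
    """Fit y = w * x + b by gradient descent with the two stopping rules above."""
    w, b = 0.0, 0.0
    error = float("inf")
    iterations = 0
    while iterations < max_iterations:  # stop rule 2: iteration threshold
        # Error: mean squared difference between predictions and annotations.
        grad_w = grad_b = error = 0.0
        for x, y in samples:
            diff = (w * x + b) - y
            error += diff * diff / len(samples)
            grad_w += 2 * diff * x / len(samples)
            grad_b += 2 * diff / len(samples)
        if error < error_tolerance:  # stop rule 1: error has converged
            break
        # Adjust the parameters against the gradient so the error decreases.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        iterations += 1
    return w, b, error, iterations

# Hypothetical annotated samples generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, error, iterations = train(samples)
```

In a real key point detector the gradients would come from back propagation through the network layers, but the stopping logic would have the same shape.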

Optionally, the preprocessing of the image may include: before inputting the image of the document to be tested into the trained key point detection model, scaling the image to a preset size so that the scaled image fits the trained key point detection model. For example, if the trained key point detection model processes images of size a×b, the image of the document to be tested may be scaled to that size. The scaled image is then input into the trained key point detection model for detection.

Step 103, determining the orientation of the document to be tested based on the key point information of the table of the document to be tested.
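If scaling to a×b changes the aspect ratio, angles between detected points are distorted in the scaled image. One way to handle this, which the text does not state and is assumed here, is to map detected key points back to original-image coordinates before computing any direction. The function names and sizes below are hypothetical.

```python
def scale_factors(original_size, model_size):
    """Return (fx, fy), the horizontal and vertical scale applied to the image."""
    (orig_w, orig_h), (model_w, model_h) = original_size, model_size
    return model_w / orig_w, model_h / orig_h

def to_original(point, factors):
    """Map a point detected in the scaled image back to original-image coordinates."""
    (x, y), (fx, fy) = point, factors
    return x / fx, y / fy

# Hypothetical sizes: a 1200x800 photo scaled to a 300x400 model input.
factors = scale_factors(original_size=(1200, 800), model_size=(300, 400))
restored = to_original((150, 100), factors)
```

With these sizes the horizontal factor is 0.25 and the vertical factor is 0.5, so a point detected at (150, 100) corresponds to (600, 200) in the original image.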

In this embodiment, since the key point information characterizes the positions of at least two preset key points associated with the orientation of the document in the form of the document to be tested, after the key point information of the form of the document to be tested is determined, the orientation of the form of the document to be tested can be determined, and further, the orientation of the document to be tested can be easily determined.

When there are more than two preset key points associated with the document orientation, any two different preset key points among them can determine the orientation of the table, so the orientation of the document to be tested can be determined from various combinations of preset key points. Here, the orientation of the document to be tested can be characterized by the direction of its text lines or of its text columns.

Specifically, referring to FIG. 2, assume that the position coordinates of vertex B1 and vertex B2 among the four vertices (B1, B2, B3, B4) of the table in the document to be tested have been determined, where B1 and B2 are two vertices along the row direction of the table. A direction vector from vertex B1 to vertex B2 can then be constructed from their position coordinates, and the direction indicated by this vector is the text line direction of the document. As another example, assume that the position coordinates of vertex B1 and vertex B3 have been determined, where B1 and B3 are two vertices along the column direction of the table. A direction vector from vertex B1 to vertex B3 can then be constructed from their position coordinates, and the direction perpendicular to this vector is the text line direction of the document.
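The two constructions can be sketched as follows. The coordinates are hypothetical, image coordinates with y increasing downward are assumed, and the choice of which of the two perpendicular directions to take for the column case is also an assumption.

```python
import math

def row_direction_from_row_vertices(b1, b2):
    """Direction vector from B1 to B2; this is the text line direction."""
    return (b2[0] - b1[0], b2[1] - b1[1])

def row_direction_from_column_vertices(b1, b3):
    """Rotate the B1->B3 vector by 90 degrees to obtain the text line direction."""
    dx, dy = b3[0] - b1[0], b3[1] - b1[1]
    return (dy, -dx)

def angle_degrees(vector):
    """Angle of a vector relative to the pixel row direction, in degrees."""
    return math.degrees(math.atan2(vector[1], vector[0]))

# Hypothetical vertices of an upright table: B1 top-left, B2 top-right, B3 bottom-left.
b1, b2, b3 = (10, 20), (110, 20), (10, 80)
from_rows = row_direction_from_row_vertices(b1, b2)
from_columns = row_direction_from_column_vertices(b1, b3)
```

For an upright table both constructions point along the pixel rows, so their angles agree.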

When the preset key points include at least two vertexes of the table and end points at two ends of the title of the table, a first candidate direction of the text line direction can be determined through the vertexes of the table, then a second candidate direction representing the text line direction is determined according to the connecting line of the end points at the two ends of the title of the table, and a final text line direction detection result is determined according to the confidence degrees of the first candidate direction and the second candidate direction. Therefore, a plurality of candidate directions can be determined through a plurality of groups of preset key points, and reliability of the document orientation detection result is improved.

According to the method for determining the orientation of the document, firstly, the image of the document to be tested including the form is obtained, secondly, the image of the document to be tested is input into the trained key point detection model, the key point information of the form of the document to be tested, which is output by the key point detection model, is obtained, and finally, the orientation of the document to be tested is determined based on the key point information of the form of the document to be tested. Therefore, the embodiment of the application detects the key points of the form of the document to be detected through the trained key point detection model to obtain the key point information, and the key point information is used for determining the orientation of the whole document, so that the positioning is accurate, and the document orientation can be accurately and efficiently detected.

In some alternative implementations of the present embodiment, with continued reference to fig. 3, fig. 3 illustrates an exemplary process 300 for determining an orientation of a document under test according to the present application, where determining the orientation of the document under test based on key point information of a table of the document under test includes the following steps:

Step 301, combining preset key points into at least one point pair, and determining position information of the at least one point pair based on the key point information.

Specifically, referring to FIG. 2, the following point pairs can be formed: title endpoints D1 and D2; table vertices B1 and B2; table vertices B3 and B4; table vertices B1 and B3; and table vertices B2 and B4. Each of these point pairs can represent the true orientation of the table.

A point pair is formed by combining two preset key points, so its position information can be determined from the position information of those two key points. Optionally, the key point information includes the position coordinates of the preset key points, and determining the position information of at least one point pair based on the key point information includes: determining the position coordinates of each point pair from the position coordinates of the two preset key points in that pair. For example, if the position coordinates of title endpoint D1 are (12, 10) and those of title endpoint D2 are (18, 15), the position of the point pair formed by D1 and D2 may be expressed as (12, 10, 18, 15).

In some optional implementations of this embodiment, the key point information includes the position coordinates of the preset key points and the confidences of those position coordinates, and determining the position information of at least one point pair based on the key point information includes: determining the position coordinates and the corresponding confidence of each point pair from the position coordinates and the corresponding confidences of the two preset key points in that pair. In this alternative implementation, because the key point information includes both the position coordinates of the preset key points and their confidences, several different bases are available for determining the orientation of the table of the document to be tested, which further improves the reliability of the position information of the point pairs.
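A minimal sketch of building point pair positions, using the five pairings listed for FIG. 2 and the D1/D2 coordinates from the example. The coordinates for B1 to B4 and all names are hypothetical.

```python
# Pairings of preset key points, following the combinations listed for FIG. 2.
PAIR_NAMES = [("D1", "D2"), ("B1", "B2"), ("B3", "B4"), ("B1", "B3"), ("B2", "B4")]

def build_point_pairs(keypoints, pair_names=PAIR_NAMES):
    """Concatenate the (x, y) coordinates of both key points of each pair."""
    return {
        (first, second): (*keypoints[first], *keypoints[second])
        for first, second in pair_names
        if first in keypoints and second in keypoints  # skip undetected points
    }

keypoints = {
    "D1": (12, 10), "D2": (18, 15),   # title endpoints from the example
    "B1": (5, 20), "B2": (25, 20),    # hypothetical table vertices
    "B3": (5, 40), "B4": (25, 40),
}
pairs = build_point_pairs(keypoints)
```

Each value is a four-element tuple (x1, y1, x2, y2), so the D1/D2 pair becomes (12, 10, 18, 15) as in the text.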

Step 302, determining an orientation of the document to be tested based on the position information of at least one point pair.

In this optional implementation, the key point information is associated with the document orientation and the key points are combined into point pairs, so the position information of each point pair is also associated with the document orientation, and the orientation of the document to be tested can be determined from the position information of at least one point pair. Specifically, the direction of the straight line through a point pair relative to the document to be tested can be determined from the position information of the point pair, and the orientation of the document to be tested is then determined from that direction.

In some optional implementations of this embodiment, the position information of a point pair includes the position coordinates of the point pair and the confidence of those position coordinates. The confidence of a point pair can be calculated from the confidences of the position coordinates of the key points in the pair, for example by the arithmetic mean or by the minimum value method. For example, suppose title endpoints D1 and D2 are combined into a point pair, the confidence of the position coordinates of D1 is 50%, and the confidence of the position coordinates of D2 is 30%. By the arithmetic mean, the confidence of the point pair is (50% + 30%) / 2 = 40%. By the minimum value method, the confidence of the point pair is the lower of the two key point confidences, namely 30%.
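The two combination rules from the example can be written as one small function. The function name and the `method` parameter are hypothetical.

```python
def pair_confidence(conf_a, conf_b, method="mean"):
    """Combine the confidences of the two key points of a point pair."""
    if method == "mean":
        return (conf_a + conf_b) / 2   # arithmetic mean
    if method == "min":
        return min(conf_a, conf_b)     # minimum value method
    raise ValueError(f"unknown method: {method}")

# Confidences of title endpoints D1 and D2 from the example above.
mean_confidence = pair_confidence(0.50, 0.30, method="mean")
min_confidence = pair_confidence(0.50, 0.30, method="min")
```

The minimum value method is the more conservative of the two, since a single poorly localized key point lowers the confidence of the whole pair.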

In this optional implementation manner, determining the orientation of the document to be tested based on the position information of at least one point pair includes:

Determining the orientation of the document to be tested according to the ranking of the confidences of the position coordinates of the point pairs and the position coordinates of the point pairs. In this alternative implementation, ranking the confidences identifies the point pairs whose position coordinates are most trustworthy, which secures the reliability of the selected point pair positions. Combining these high-confidence pairs with their position coordinates then further improves the reliability of orientation detection for the document to be tested.

Further, determining the orientation of the document to be tested according to the ranking of the confidences of the position coordinates of the point pairs and the position coordinates of the point pairs may be implemented in either of the following ways:

1) Rank the confidences of the position coordinates of the point pairs, and determine the orientation of the document to be tested from the preset correspondence between the line connecting the two preset key points of the highest-confidence point pair and the orientation of the document to be tested. Note that the preset correspondence between the line connecting any two preset key points and the orientation of the document to be tested may be calibrated in advance; for example, the preset correspondence between the line connecting the two endpoints of the table title and the document to be tested is that the two are perpendicular in orientation.

This implementation selects the point pair whose position coordinates have the highest confidence, which ensures that the most reliable position information is used. It is one option for using the confidence ranking of the point pairs, and it secures the reliability of orientation detection by relying on the single best point pair.

2) Rank the confidences of the position coordinates of all point pairs in descending order and take a preset number of top-ranked point pairs to form a point pair set; calculate the orientation of the document to be tested from the position coordinates of each point pair in the set, obtaining candidate orientations in one-to-one correspondence with the point pairs; determine the direction angle of each candidate orientation, calculate the mean of these direction angles, and take the direction represented by the mean as the orientation of the document to be tested. Note that the direction angle of a candidate orientation may be measured relative to a reference element in the image of the document to be tested, such as the row or column direction of the pixels.

This implementation takes the direction represented by the mean direction angle of the candidate orientations as the orientation of the document to be tested. It is another option for using the confidence ranking of the point pairs; it draws on what the position information of several point pairs has in common, and it secures the reliability of orientation detection through the agreement among point pairs.
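Both implementations can be sketched side by side. The sketch assumes the orientation is expressed as the angle of the text line direction relative to the pixel rows, and models the preset correspondence of each pair as an angular `offset` (0 for pairs along the row direction, -90 for pairs along the column direction). The dictionary layout, the offsets, and all coordinates are hypothetical.

```python
import math

def pair_angle(position, offset_degrees=0.0):
    """Candidate text line angle of a point pair (x1, y1, x2, y2), in degrees."""
    x1, y1, x2, y2 = position
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) + offset_degrees

def orientation_from_best_pair(pairs):
    """Implementation 1: use only the pair with the highest confidence."""
    best = max(pairs, key=lambda pair: pair["confidence"])
    return pair_angle(best["position"], best["offset"])

def orientation_from_top_pairs(pairs, top_n):
    """Implementation 2: average the candidate angles of the top_n pairs."""
    ranked = sorted(pairs, key=lambda pair: pair["confidence"], reverse=True)
    angles = [pair_angle(p["position"], p["offset"]) for p in ranked[:top_n]]
    return sum(angles) / len(angles)

# Hypothetical point pairs of a slightly rotated document.
pairs = [
    {"position": (100, 50, 300, 54), "confidence": 0.4, "offset": 0.0},    # D1-D2
    {"position": (50, 100, 350, 103), "confidence": 0.9, "offset": 0.0},   # B1-B2
    {"position": (50, 100, 47, 400), "confidence": 0.8, "offset": -90.0},  # B1-B3
]
best_angle = orientation_from_best_pair(pairs)
mean_angle = orientation_from_top_pairs(pairs, top_n=2)
```

A plain arithmetic mean of angles, as described in the text, behaves well when the candidate angles are close together. If candidates can straddle the ±180° wrap-around, a circular mean would be needed instead.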

The method for determining the orientation of the document to be tested, which is provided by the alternative implementation manner shown in fig. 3, combines preset key points into at least one point pair on the basis of determining at least two preset key points related to the orientation of the document, determines the position information of the at least one point pair on the basis of the key point information, and determines the orientation of the document to be tested on the basis of the position information of the at least one point pair.

In some optional implementations of this embodiment, the key point information includes the position coordinates of the preset key points. Further, referring to FIG. 4, a flow 400 of yet another embodiment of the method of determining an orientation of a document is shown, which may include the following steps:

Step 401, acquiring an image of a document to be tested including a table.

Step 402, inputting the image of the document to be tested into the trained key point detection model to obtain the key point information of the form of the document to be tested output by the key point detection model.

Step 403, combining the preset key points into at least one point pair.

Step 404, determining the position coordinates of each point pair based on the position coordinates of two preset key points in each point pair.

Specifically, a point pair is formed by combining preset key points whose positions are known, so once the position coordinates of the preset key points are determined, the position coordinates of the point pair can be determined. For example, if the position coordinates of the two preset key points (B1, B2) in a point pair of a table are B1 (24, 36) and B2 (45, 28), the position coordinates of the point pair may be represented as (24, 36, 45, 28).

Step 405, inputting the position coordinates of at least one point pair into a pre-trained document orientation prediction model to obtain the orientation of the document to be tested output by the orientation prediction model.

The orientation prediction model is obtained by training on the position information of preset key points in a plurality of sample documents containing tables, combined with the orientation annotation information of the sample documents.

The orientation prediction model is a pre-trained model used to detect the true orientation of the document to be tested: when the position coordinates of at least one point pair are input into it, the true orientation of the document is obtained. The orientation prediction model is trained as follows. First, a training sample set composed of a plurality of sample documents is obtained, where each sample document is pre-annotated with the position information of its preset key points, and that position information corresponds to the true orientation of the sample document. Second, an initial model is trained on the training samples together with the orientation annotation information of the sample documents, and the orientation prediction model is obtained after multiple rounds of training, evaluation, and parameter adjustment.
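The text does not specify the structure of the orientation prediction model. Purely to illustrate the input and output shapes, the sketch below assumes a 1-nearest-neighbour classifier as a stand-in, with orientation labels in degrees. The labels, the training coordinates, and the function names are all hypothetical; only the query (24, 36, 45, 28) comes from the example above.

```python
def train_orientation_model(samples):
    """'Training' a 1-nearest-neighbour stand-in just stores the labelled samples."""
    return list(samples)

def predict_orientation(model, pair_position):
    """Return the orientation label of the closest stored point pair position."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, label = min(model, key=lambda sample: squared_distance(sample[0], pair_position))
    return label

# Hypothetical training set: (point pair position of B1-B2, annotated orientation).
training_samples = [
    ((24, 36, 45, 36), 0),
    ((36, 24, 36, 45), 90),
    ((45, 36, 24, 36), 180),
    ((36, 45, 36, 24), 270),
]
model = train_orientation_model(training_samples)
predicted = predict_orientation(model, (24, 36, 45, 28))
```

A learned model of any other kind would keep the same interface: point pair coordinates in, an orientation out.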

According to the method for determining the orientation of the document, when the key point information comprises the position coordinates of the preset key points, the position coordinates of each point pair are respectively determined based on the position coordinates of the two preset key points in each point pair, the position coordinates of at least one point pair are input into a pre-trained document orientation prediction model, the orientation of the document to be detected is obtained, and the reliability of the orientation detection of the document to be detected is improved.

The following describes in detail the implementation procedure of one specific implementation of the method for determining document orientation according to the present embodiment, with reference to fig. 2:

The implementation process of this specific implementation is divided into three steps. The first step trains a key point detection model. The second step uses the model to detect key points in the image of the document to be tested comprising the form, to obtain key point information. The third step determines the orientation of the document to be tested from the key point information. Each step is described below:

In the first step, a large number of varied document images containing forms and form titles are collected, and key points are marked on the form and its title in each image: the left and right endpoints of the title (for example, D1 and D2 in fig. 2) and the four vertices of the form (for example, B1 to B4 in fig. 2) may be marked, so as to construct a data set. A key point detection model is then trained on the data set.

In the second step, the key point detection model is used to detect key point information from an image of a document to be tested comprising a form and a title of the form, specifically:

An image of a document to be tested comprising a form and a title of the form is obtained, and the preprocessing required by the key point detection model (for example, scaling the image down to the size required by the key point detection model) is applied to the image. The image of the document to be tested is then input into the key point detection model and the model's detection process is run to obtain the key point information output by the key point detection model, where the key point information comprises the position coordinates of all preset key points and the confidence of those position coordinates.
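
As a small illustration of the preprocessing mentioned above, the sketch below shows one way the key point coordinates reported on the reduced image could be mapped back to the original image. It assumes a plain resize without padding and a key point format of `(x, y, confidence)`; the function names are hypothetical and the text does not prescribe this step.

```python
def resize_scale(orig_w, orig_h, model_w, model_h):
    """Per-axis scale factors for a plain (non-padded) resize to the model input size."""
    return model_w / orig_w, model_h / orig_h

def to_original_coords(keypoints, scale):
    """Map (x, y, confidence) key points from model-input pixels to original pixels."""
    sx, sy = scale
    return [(x / sx, y / sy, conf) for x, y, conf in keypoints]
```

Because orientation is derived from the direction of lines between key points, unequal scale factors on the two axes would distort those directions, which is why mapping back to original pixels (or resizing with a fixed aspect ratio) is a reasonable precaution.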

In the third step, the orientation of the document to be tested is determined from the key point information output by the key point detection model, specifically:

The preset key points are combined into at least one point pair: for example, endpoint D1 and endpoint D2 form one point pair, table vertices B1 and B2 form one point pair, table vertices B3 and B4 form one point pair, table vertices B1 and B3 form one point pair, and table vertices B2 and B4 form one point pair. The true orientation of the table of the document to be tested can be derived from each point pair, and several alternative implementations are possible here; any one of the following may be adopted:

1) Sort the confidences of the position coordinates of the point pairs, and derive the true orientation of the form of the document to be tested directly from the point pair with the highest confidence.

2) Sort the confidences of the position coordinates of all point pairs from largest to smallest, take the point pairs ranked in the top preset number of positions, and take the average of the table orientations derived from those point pairs as the table orientation of the document to be tested.

3) Fit the distribution of the table orientations derived from the point pairs, and take the expected orientation as the table orientation of the document to be tested (see the implementation process shown in fig. 4 for details).
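
The sketch below illustrates how an orientation could be derived from a single point pair and how alternative 1) could pick the most confident pair. Several things here are assumptions rather than details given in the text: the layout (B1 top-left, B2 top-right, B3 bottom-left, B4 bottom-right), the reference angles in `REFERENCE_ANGLE`, image coordinates with y pointing down, and the rule that a pair's confidence is the lower of its two key point confidences.

```python
import math

# Hypothetical reference angles (degrees) of each pair's connecting line
# in an upright document, under the assumed layout described above.
REFERENCE_ANGLE = {
    ("D1", "D2"): 0.0,   # title, left endpoint to right endpoint
    ("B1", "B2"): 0.0,   # top edge of the table
    ("B3", "B4"): 0.0,   # bottom edge of the table
    ("B1", "B3"): 90.0,  # left edge, pointing down
    ("B2", "B4"): 90.0,  # right edge, pointing down
}

def pair_orientation(points, pair):
    """Document rotation in degrees, in [0, 360), implied by one point pair.

    points maps a key point name to (x, y, confidence).
    """
    (x1, y1, _), (x2, y2, _) = points[pair[0]], points[pair[1]]
    observed = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return (observed - REFERENCE_ANGLE[pair]) % 360.0

def pair_confidence(points, pair):
    """Assumed rule: a pair is only as reliable as its weaker key point."""
    return min(points[pair[0]][2], points[pair[1]][2])

def orientation_top1(points):
    """Alternative 1): use only the point pair with the highest confidence."""
    best = max(REFERENCE_ANGLE, key=lambda p: pair_confidence(points, p))
    return pair_orientation(points, best)
```

Alternative 2) would apply `pair_orientation` to several high-confidence pairs and average the results, and alternative 3) would replace the hand-written reference angles with a fitted model.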

In summary, through the above three steps, document orientation detection based on the titled form is completed.

It should be noted that, in some embodiments of the present application, the first step is optional. For example, in practice the key point detection model may be trained elsewhere, in which case the trained key point detection model can be used directly to detect the positions of the preset key points in the image of the document to be tested. The above description of the first step does not constitute a necessary limitation on the specific implementation of the embodiments of the present application.

With further reference to fig. 5, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of an apparatus for determining an orientation of a document, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the apparatus is particularly applicable to various electronic devices.

As shown in fig. 5, the apparatus 500 for determining orientation of a document provided in this embodiment includes: an image acquisition module 501, an information acquisition module 502, and an orientation determination module 503. The image obtaining module 501 may be configured to obtain an image of a document to be tested including a table. The information obtaining module 502 may be configured to input an image of the document to be tested into the trained key point detection model, and obtain key point information of a form of the document to be tested output by the key point detection model, where the key point information characterizes positions of at least two preset key points associated with a document orientation of the form in the document to be tested. The orientation determining module 503 may be configured to determine the orientation of the document to be tested based on the key point information of the table of the document to be tested.
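
The division of the apparatus into three modules can be pictured as a simple pipeline. The sketch below is only an illustration of that structure: the class name `OrientationApparatus`, the callable-based fields, and the key point format are assumptions, and the actual image loading and key point detection model are left to whatever the caller plugs in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumed key point format: name -> (x, y, confidence)
KeyPoints = Dict[str, Tuple[float, float, float]]

@dataclass
class OrientationApparatus:
    acquire_image: Callable[[str], object]               # role of module 501
    detect_keypoints: Callable[[object], KeyPoints]      # role of module 502
    determine_orientation: Callable[[KeyPoints], float]  # role of module 503

    def run(self, source: str) -> float:
        """Chain the three modules: image -> key point information -> orientation."""
        image = self.acquire_image(source)
        keypoints = self.detect_keypoints(image)
        return self.determine_orientation(keypoints)
```

Keeping each module behind a plain callable makes it straightforward to swap in a different key point detection model or a different orientation rule without touching the other two stages.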

In this embodiment, in the apparatus 500 for determining the orientation of a document: the specific processing of the image obtaining module 501, the information obtaining module 502, the orientation determining module 503 and the technical effects thereof may refer to the descriptions related to step 101, step 102 and step 103 in the corresponding embodiment of fig. 1, and are not described herein.

In some optional implementations of this embodiment, the orientation determining module 503 includes: a point pair combination sub-module (not shown in the figure) and an orientation determination sub-module (not shown in the figure). The point pair combination sub-module may be configured to combine the preset key points into at least one point pair, and determine position information of the at least one point pair based on the key point information. The orientation determination sub-module may be configured to determine the orientation of the document to be tested based on the position information of the at least one point pair.

In some optional implementations of this embodiment, the key point information includes: the position coordinates of the preset key points and the confidence of the position coordinates. The point pair combination sub-module includes: a position determining unit (not shown in the figure). The position determining unit may be configured to determine the position coordinates and the corresponding confidence of each point pair from the position coordinates and the corresponding confidences of the two preset key points in that point pair. The orientation determination sub-module includes: an orientation determining unit (not shown in the figure). The orientation determining unit may be configured to determine the orientation of the document to be tested according to the ordering of the confidences of the position coordinates of the point pairs and the position coordinates of the point pairs.

In some optional implementations of this embodiment, the orientation determining unit includes: an orientation determining subunit (not shown in the figures). The orientation determining subunit may be configured to sort the confidence degrees of the position coordinates of the point pairs, and determine the orientation of the document to be tested according to a preset correspondence between the connection line of two preset key points in the point pair with the highest confidence degrees and the orientation of the document to be tested.

In some optional implementations of this embodiment, the orientation determining unit includes: a set forming subunit (not shown in the figure), an orientation calculating subunit (not shown in the figure), and an average orientation calculating subunit (not shown in the figure). The set forming subunit may be configured to sort the confidences of the position coordinates of all point pairs from largest to smallest, and take the point pairs ranked in the top preset number of positions to form a point pair set. The orientation calculating subunit may be configured to calculate the orientation of the document to be tested based on the position coordinates of each point pair in the point pair set, to obtain at least one candidate orientation in one-to-one correspondence with the point pairs. The average orientation calculating subunit may be configured to determine the direction angle of each candidate orientation, calculate the average value of the direction angles of the candidate orientations, and use the direction represented by that average value as the orientation of the document to be tested.
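
A minimal sketch of the top-ranked selection and averaging follows. The text only says that the average of the direction angles is taken; the code below substitutes a circular mean (averaging unit vectors) for a plain arithmetic mean, because angles near 0 and 360 degrees would otherwise average to a direction near 180 degrees. That substitution, the default `k=3`, and the function names are assumptions.

```python
import math

def top_k_pairs(candidates, k):
    """candidates: list of (angle_degrees, confidence); keep the k most confident."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]

def mean_direction(angles_deg):
    """Circular mean of direction angles, returned in degrees within [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def orientation_top_k(candidates, k=3):
    """Average the candidate orientations of the k most confident point pairs."""
    return mean_direction([angle for angle, _ in top_k_pairs(candidates, k)])
```

For example, candidate orientations of 358, 2, and 4 degrees yield a mean direction slightly above 1 degree with this approach, whereas their arithmetic mean would be about 121 degrees.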

In some optional implementations of this embodiment, the key point information includes: the position coordinates of the preset key points. The point pair combination sub-module includes: a point pair determining unit (not shown in the figure). The orientation determination sub-module includes: an orientation prediction unit (not shown in the figure). The point pair determining unit may be configured to determine the position coordinates of each point pair based on the position coordinates of the two preset key points in that point pair. The orientation prediction unit may be configured to input the position coordinates of the at least one point pair into a pre-trained document orientation prediction model, and obtain the orientation of the document to be tested output by the orientation prediction model. The orientation prediction model is obtained by acquiring position information of the preset key points in a plurality of sample documents containing tables and combining it with orientation marking information of the sample documents.

In some optional implementations of this embodiment, the at least two preset key points associated with the document orientation include: at least two of the four vertices of the table in the document under test.

In some optional implementations of this embodiment, the at least two preset key points associated with the document orientation further include: at least two endpoints characterizing the two ends of a title of the table in the document to be tested.

In the device for determining the orientation of the document provided in the embodiment of the present application, firstly, the image acquisition module 501 acquires an image of the document to be tested including a form, secondly, the information acquisition module 502 inputs the image of the document to be tested into the trained key point detection model, acquires the key point information of the form of the document to be tested output by the key point detection model, and finally, the orientation determination module 503 determines the orientation of the document to be tested based on the key point information of the form of the document to be tested. Therefore, the embodiment of the application detects the key points of the form of the document to be detected through the trained key point detection model to obtain the key point information, and the key point information is used for determining the orientation of the whole document, so that the positioning is accurate, and the document orientation can be accurately and efficiently detected.

According to embodiments of the present application, an electronic device and a readable storage medium are also provided.

Fig. 6 is a block diagram of an electronic device for the method of determining the orientation of a document according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses 605 and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories and multiple types of memory. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 601 is illustrated in fig. 6.

Memory 602 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of determining the orientation of a document provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of determining the orientation of a document provided by the present application.

The memory 602 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the image acquisition module 501, the information acquisition module 502, and the orientation determination module 503 shown in fig. 5) corresponding to the method of determining the orientation of a document in the embodiments of the present application. The processor 601 executes various functional applications of the server and data processing, i.e., implements the method of determining the orientation of a document in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 602.

The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and at least one application program required for a function; the storage data area may store data created according to the use of the electronic device for the method of determining the orientation of a document, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 may optionally include memory remotely located relative to the processor 601, which may be connected via a network to the electronic device for the method of determining the orientation of a document. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the method of determining an orientation of a document may further include: an input device 603 and an output device 604. The processor 601, memory 602, input devices 603 and output devices 604 may be connected by a bus 605 or otherwise, in fig. 6 by way of example by bus 605.

The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device for the method of determining the orientation of a document; examples of such input devices include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, and a joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims

1. A method of determining an orientation of a document, the method comprising:

acquiring an image of a document to be tested comprising a form;

inputting the image of the document to be tested into a trained key point detection model to obtain key point information of a form of the document to be tested, which is output by the key point detection model, wherein the key point information represents the positions of at least two preset key points, associated with the orientation of the document, of the form in the document to be tested;

determining the orientation of the document to be tested based on the key point information of the form of the document to be tested;

the determining the orientation of the document to be tested based on the key point information of the table of the document to be tested includes: combining the preset key points into at least one point pair, and determining the position information of the at least one point pair based on the key point information;

determining the orientation of the document to be tested based on the position information of the at least one point pair;

the key point information includes: the position coordinates of the preset key points; and

the determining the location information of the at least one point pair based on the keypoint information includes:

determining the position coordinates of each point pair based on the position coordinates of two preset key points in each point pair;

the determining the orientation of the document to be tested based on the position information of the at least one point pair comprises:

inputting the position coordinates of the at least one point pair into a pre-trained document orientation prediction model to obtain the orientation of the document to be tested, which is output by the document orientation prediction model;

the document orientation prediction model is obtained by acquiring position information of preset key points in a plurality of sample documents containing tables and combining orientation marking information of the sample documents.

2. The method of claim 1, wherein the keypoint information comprises: the position coordinates of the preset key points and the confidence of the position coordinates; and

determining the position coordinates and the corresponding confidence of each point pair from the position coordinates and the corresponding confidences of the two preset key points in that point pair;

and determining the orientation of the document to be tested according to the ordering of the confidence coefficient of the position coordinates of each point pair and the position coordinates of each point pair.

3. The method of claim 2, wherein the determining the orientation of the document under test based on the ordering of the confidence of the position coordinates of each of the pairs of points and the position coordinates of each of the pairs of points comprises:

and sequencing the confidence degrees of the position coordinates of the point pairs, and determining the orientation of the document to be tested according to the preset corresponding relation between the connecting line of two preset key points in the point pair with the highest confidence degrees and the orientation of the document to be tested.

4. The method of claim 2, wherein the determining the orientation of the document under test based on the ordering of the confidence of the position coordinates of each of the pairs of points and the position coordinates of each of the pairs of points comprises:

ordering the confidence degrees of the position coordinates of all the point pairs from big to small, and acquiring the point pairs of the preset position before ordering to form a point pair set;

calculating the orientation of the document to be tested based on the position coordinates of each point pair in the point pair set respectively to obtain at least one candidate orientation corresponding to each point pair one by one;

and determining the direction angle of each candidate orientation, calculating the average value of the direction angles of each candidate orientation, and taking the direction represented by the average value of the direction angles of each candidate orientation as the orientation of the document to be tested.

5. The method according to one of claims 1 to 4, wherein the at least two preset keypoints associated with document orientation comprise: at least two of the four vertices of the table in the document under test.

6. The method of claim 5, wherein the at least two preset keypoints associated with document orientation further comprise:

at least two endpoints characterizing the two ends of a title of the table in the document under test.

7. An apparatus for determining an orientation of a document, the apparatus comprising:

an image acquisition module configured to acquire an image of a document to be tested including a form;

the information acquisition module is configured to input the image of the document to be tested into a trained key point detection model, and acquire key point information of a form of the document to be tested, which is output by the key point detection model, wherein the key point information characterizes the positions of at least two preset key points, associated with the orientation of the document, of the form in the document to be tested;

an orientation determining module configured to determine an orientation of the document to be tested based on key point information of a table of the document to be tested; the orientation determining module further includes:

a point pair combining sub-module configured to combine the preset key points into at least one point pair, and determine location information of the at least one point pair based on the key point information;

an orientation determination sub-module configured to determine an orientation of the document under test based on the location information of the at least one point pair;

the key point information includes: the position coordinates of the preset key points;

the point pair combining sub-module comprises:

a point pair determining unit configured to determine position coordinates of each of the point pairs based on position coordinates of two preset key points of each of the point pairs, respectively;

the orientation determination submodule includes:

an orientation prediction unit configured to input the position coordinates of the at least one point pair into a pre-trained document orientation prediction model, and obtain the orientation of the document to be tested output by the document orientation prediction model;

8. The apparatus of claim 7, wherein the keypoint information comprises: the position coordinates of the preset key points and the confidence of the position coordinates;

the point pair combining sub-module comprises:

a position determining unit configured to determine a position coordinate and a corresponding confidence level of each of the point pairs from the position coordinates and the corresponding confidence levels of two preset key points in each of the point pairs, respectively;

the orientation determination submodule includes:

and the orientation determining unit is configured to determine the orientation of the document to be tested according to the ordering of the confidence of the position coordinates of each point pair and the position coordinates of each point pair.

9. The apparatus of claim 8, wherein the orientation determining unit comprises:

the orientation determining subunit is configured to sort the confidence degrees of the position coordinates of the point pairs, and determine the orientation of the document to be detected according to a preset corresponding relation between the connecting line of two preset key points in the point pair with the highest confidence degrees and the orientation of the document to be detected.

10. The apparatus of claim 8, wherein the orientation determining unit comprises:

the set forming subunit is configured to sort the confidence degrees of the position coordinates of all the point pairs from big to small, obtain the point pairs of the preset position before sorting, and form a point pair set;

the orientation calculation subunit is configured to calculate the orientation of the document to be tested based on the position coordinates of each point pair in the point pair set respectively to obtain at least one candidate orientation corresponding to each point pair one by one;

an average orientation calculation subunit configured to determine a direction angle of each of the candidate orientations, calculate an average value of the direction angles of each of the candidate orientations, and take a direction represented by the average value of the direction angles of each of the candidate orientations as an orientation of the document to be measured.

11. The apparatus of one of claims 7 to 10, wherein the at least two preset keypoints associated with document orientation comprise: at least two of the four vertices of the table in the document under test.

12. The apparatus of claim 11, wherein the at least two preset keypoints associated with document orientation further comprise:

at least two endpoints characterizing the two ends of a title of the table in the document under test.

13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.