CN112733599A - Document image processing method and device, storage medium and terminal equipment - Google Patents

Document image processing method and device, storage medium and terminal equipment

Info

Publication number
CN112733599A
Authority
CN
China
Prior art keywords
document
aspect ratio
document image
sample
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011412437.8A
Other languages
Chinese (zh)
Inventor
刘坚强
彭鑫
周代国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Original Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Pinecone Electronic Co Ltd, Xiaomi Technology Wuhan Co Ltd filed Critical Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202011412437.8A priority Critical patent/CN112733599A/en
Publication of CN112733599A publication Critical patent/CN112733599A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/414Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The disclosure relates to a document image processing method, a document image processing device, a storage medium and terminal equipment, wherein the method comprises the following steps: acquiring a document image of a target document; acquiring corner coordinates and image principal point coordinates of the document image; acquiring a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates; and inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document. That is to say, after the feature vector corresponding to the document image of the target document is obtained, the aspect ratio of the target document can be obtained through the pre-trained aspect ratio acquisition model; since the aspect ratio acquisition model does not limit the shape of the document image, the accuracy of the obtained aspect ratio of the target document is higher.

Description

Document image processing method and device, storage medium and terminal equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for processing a document image, a storage medium, and a terminal device.
Background
With the rapid development of smart phones and other portable photographing equipment, people can conveniently take higher-quality photos, which can be used to record wonderful moments in daily life and can also serve as digital copies of paper documents for recording or sharing important information. However, due to the influence of perspective distortion, the document region in a photo of a paper document taken with a mobile phone may appear as an irregular quadrangle, and in order to correct the document photo, the aspect ratio of the document needs to be acquired.
In the related art, a pinhole camera model can be used to model the imaging process of the camera, and a calculation formula for the aspect ratio of the photographed document is derived based on several constraints that a rectangle in the real scene satisfies in single-view geometry. However, since the calculation formula is derived for a document that is a regular rectangle, the accuracy of the aspect ratio obtained from the calculation formula is low when the photographed document is not a regular rectangle.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a document image processing method, apparatus, storage medium, and terminal device.
According to a first aspect of embodiments of the present disclosure, there is provided a document image processing method, the method including: acquiring a document image of a target document; acquiring corner coordinates and image principal point coordinates of the document image; acquiring a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates; and inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document.
Optionally, the acquiring the corner point coordinates and the image principal point coordinates of the document image includes: acquiring the size of the document image; extracting an edge position corresponding to the target document from the document image; determining the corner coordinates according to the edge positions and the size of the document image; and taking the central point of the document image as the image principal point coordinate.
Optionally, the obtaining a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates includes: carrying out perspective transformation on the corner point coordinates; and forming the feature vector by using the image principal point coordinates and the corner point coordinates after perspective transformation.
Optionally, the performing perspective transformation on the corner coordinates includes: acquiring a perspective transformation matrix; and the coordinates of the corner points are multiplied by the perspective transformation matrix.
Optionally, the aspect ratio obtaining model is trained by: training a target neural network model through a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: sample document images of a plurality of sample documents, and actual aspect ratios of a plurality of the sample documents.
Optionally, the training the target neural network model through the sample set to obtain the aspect ratio obtaining model includes: obtaining a sample feature vector corresponding to a sample document image of each sample document to obtain a plurality of sample feature vectors; and performing iterative training on the target neural network model through a preset loss function of the target neural network model according to the actual aspect ratios of the plurality of sample feature vectors and the plurality of sample documents to obtain the aspect ratio acquisition model.
Optionally, the iteratively training the target neural network model through a preset loss function of the target neural network model according to the actual aspect ratios of the plurality of sample feature vectors and the plurality of sample documents to obtain the aspect ratio obtaining model includes:
circularly executing the following steps: inputting a plurality of sample feature vectors into the target neural network model, obtaining a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is greater than a preset loss threshold value to obtain a new target neural network model; and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is less than or equal to the preset loss threshold value.
According to a second aspect of the embodiments of the present disclosure, there is provided a document image processing apparatus, the apparatus including: a document image acquisition module configured to acquire a document image of a target document; the coordinate acquisition module is configured to acquire corner coordinates and image principal point coordinates of the document image; the vector acquisition module is configured to acquire a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates; and the aspect ratio acquisition module is configured to input the feature vector to a pre-trained aspect ratio acquisition model and output the aspect ratio of the target document.
Optionally, the coordinate acquiring module includes: a size acquisition sub-module configured to acquire a size of the document image; the position extraction submodule is configured to extract an edge position corresponding to the target document in the document image; a corner coordinate obtaining sub-module configured to determine the corner coordinates according to the edge positions and the size of the document image; and the image principal point coordinate acquisition sub-module is configured to take the central point of the document image as the image principal point coordinate.
Optionally, the vector obtaining module includes: a transformation submodule configured to perform perspective transformation on the corner coordinates; and the vector composition submodule is configured to combine the image principal point coordinates and the corner point coordinates after perspective transformation into the feature vector.
Optionally, the transformation submodule is configured to: acquiring a perspective transformation matrix; and the coordinates of the corner points are multiplied by the perspective transformation matrix.
Optionally, the apparatus further comprises: the model acquisition module is configured to train a target neural network model through a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: sample document images of a plurality of sample documents, and actual aspect ratios of a plurality of the sample documents.
Optionally, the model obtaining module includes: the sample vector acquisition sub-module is configured to acquire a sample feature vector corresponding to a sample document image of each sample document to obtain a plurality of sample feature vectors; and the model obtaining submodule is configured to perform iterative training on the target neural network model through a preset loss function of the target neural network model according to the actual aspect ratio of the plurality of sample feature vectors and the plurality of sample documents to obtain the aspect ratio obtaining model.
Optionally, the model obtaining sub-module is configured to: circularly executing the following steps: inputting a plurality of sample feature vectors into the target neural network model, obtaining a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is greater than a preset loss threshold value to obtain a new target neural network model; and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is less than or equal to the preset loss threshold value.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the document image processing method provided by the first aspect of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a terminal device, including: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the document image processing method provided by the first aspect of the present disclosure.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: acquiring a document image of a target document; acquiring corner coordinates and image principal point coordinates of the document image; acquiring a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates; and inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document. That is to say, after the feature vector corresponding to the document image of the target document is obtained, the aspect ratio of the target document can be obtained through the aspect ratio acquisition model trained in advance; since the aspect ratio acquisition model does not limit the shape of the document image, the accuracy of the obtained aspect ratio of the target document is higher.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a document image processing method according to an exemplary embodiment;
FIG. 2 is a schematic illustration of a document image shown in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating another document image processing method in accordance with one illustrative embodiment;
FIG. 4 is a flow diagram illustrating a method of training an aspect ratio acquisition model in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of a document image processing apparatus according to an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating the structure of another document image processing apparatus according to an exemplary embodiment;
FIG. 7 is a block diagram illustrating a terminal device according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
First, an application scenario of the present disclosure will be explained. The present disclosure can be applied to a terminal device having a photographing function. Compared with a conventional scanner, generating a digitized document by photographing the document with the terminal device is easy to use and simple to operate; however, due to the presence of perspective deformation, a document that is in general a regular rectangle may become an irregular quadrangle in the photographed picture. Before subsequent image processing is performed on the photo, image correction needs to be performed first to recover the original shape of the document.
At present, the most accurate method for estimating the true aspect ratio of a document is based on the rectangular constraints in single-view geometry: a pinhole camera model is used to model the imaging process of the camera, and a calculation formula for the aspect ratio of the photographed document is derived from several constraints that a rectangle in the real scene satisfies in single-view geometry. However, the derivation assumes that the photographed document is a regular rectangle; therefore, for a photographed document whose shape is not a regular rectangle, the accuracy of the aspect ratio obtained from the calculation formula is low, and under some photographing conditions, for example when the photographing inclination angle is large, the calculation formula cannot compute the aspect ratio of the photographed document at all.
In order to solve the above problems, the present disclosure provides a document image processing method, an apparatus, a storage medium, and a terminal device, where after a feature vector corresponding to a document image of a target document is obtained, an aspect ratio of the target document may be obtained through a pre-trained aspect ratio model, and thus, since the aspect ratio model does not limit the shape of the document image, the accuracy of the obtained aspect ratio of the target document is higher.
The present disclosure is described below with reference to specific examples.
The present disclosure may be applied to terminal devices, which may include handheld devices, wearable devices, cameras, and the like, and the present disclosure is not limited thereto.
FIG. 1 is a flowchart illustrating a document image processing method according to an exemplary embodiment. As shown in fig. 1, the method includes:
s101, acquiring a document image of the target document.
The shape of the target document may be a regular shape such as a rectangle, or an irregular shape such as a circle or a triangle; the target document may be a handwritten note, a paper contract, an invoice, a business card, a poster, a book, and the like; the present disclosure does not limit the shape or the type of the target document. The document image may include an image corresponding to the target document and an image around the target document. FIG. 2 is a schematic view of a document image shown according to an exemplary embodiment; as shown in fig. 2, the white portion is the image corresponding to the target document, and the black area is the image around the target document.
In this step, the terminal may shoot the target document through a camera mounted on the terminal to obtain a document image of the target document, where the camera may be any type of camera, and the terminal may shoot the target document at any angle, and the present disclosure does not limit a shooting inclination angle, a shooting manner, and the like.
And S102, acquiring the corner point coordinates and the image principal point coordinates of the document image.
The corner point coordinates may be vertex coordinates corresponding to the target document in the document image, and the image principal point coordinates may be coordinates of a center point of the document image.
In this step, after the document image of the target document is obtained, the corner coordinates of the document image may be determined according to the position of the target document in the document image. For example, the distances from the four vertices of the target document to the edges of the document image may be obtained first, and the corner coordinates may be determined according to these distances. As shown in fig. 2, if the width of the document image is width and the height of the document image is height, the coordinates of the top left corner of the document image may be defined as (0, 0); if the top left corner of the target document is at a distance x1 from the left edge of the document image and a distance y1 from the upper edge of the document image, then the coordinates of the top left corner of the target document are (x1, y1). The other corner coordinates can be acquired in the same way. In addition, after the width and the height of the document image are acquired, the image principal point coordinates may be determined based on the width and the height.
S103, acquiring a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates.
In this step, after obtaining the corner point coordinates and the image principal point coordinates of the document image, the corner point coordinates and the image principal point coordinates may be combined to obtain the feature vector corresponding to the document image. Illustratively, if the image principal point coordinates are (x0, y0) and the corner point coordinates are (x1, y1), (x2, y2), (x3, y3) and (x4, y4), then the feature vector may be (x0, y0, x1, y1, x2, y2, x3, y3, x4, y4).
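For illustration, a minimal sketch of this composition step (the function name, variable names and corner ordering are assumptions made for illustration only, not mandated by the present disclosure):

```python
import numpy as np

def build_feature_vector(principal_point, corners):
    """Concatenate the image principal point and the four corner points
    into the 10-dimensional feature vector described above."""
    (x0, y0) = principal_point
    # corners is assumed to be an iterable of four (x, y) tuples,
    # e.g. top-left, top-right, bottom-right, bottom-left.
    feature = [x0, y0]
    for (x, y) in corners:
        feature.extend([x, y])
    return np.asarray(feature, dtype=np.float32)

# Example usage with a 1000 x 800 document image whose principal point
# is taken as the image center.
vec = build_feature_vector((500.0, 400.0),
                           [(120.0, 90.0), (910.0, 110.0),
                            (880.0, 720.0), (100.0, 700.0)])
print(vec.shape)  # (10,)
```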
And S104, inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document.
In this step, after the feature vector corresponding to the document image is obtained, the aspect ratio obtaining model trained in advance may be obtained, and the feature vector is input into the aspect ratio obtaining model, where the output of the aspect ratio obtaining model is the aspect ratio of the target document.
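A minimal inference sketch of this step (assuming, for illustration only, that the aspect ratio acquisition model is a PyTorch module that maps the 10-dimensional feature vector to a single value; the framework choice is an assumption, not part of the present disclosure):

```python
import torch

def predict_aspect_ratio(model, feature_vector):
    """Feed the feature vector into the pre-trained aspect ratio
    acquisition model and return the predicted aspect ratio."""
    model.eval()
    with torch.no_grad():
        inp = torch.as_tensor(feature_vector, dtype=torch.float32).unsqueeze(0)
        return float(model(inp))
```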
By adopting the method, after the feature vector corresponding to the document image of the target document is obtained, the aspect ratio of the target document can be obtained through the pre-trained aspect ratio acquisition model; since the aspect ratio acquisition model does not limit the shape of the document image, the accuracy of the obtained aspect ratio of the target document is higher.
FIG. 3 is a flow diagram illustrating another document image processing method according to an exemplary embodiment. As shown in fig. 3, the method includes:
s301, acquiring a document image of the target document.
The shape of the target document may be a regular shape such as a rectangle, or an irregular shape such as a circle or a triangle; the target document may be a handwritten note, a paper contract, an invoice, a business card, a poster, a book, and the like; the present disclosure does not limit the shape or the type of the target document. The document image may be an image including the target document and the surroundings of the target document.
S302, obtaining the size of the document image.
In this step, after the document image is acquired, the width and height of the document image may be acquired by a related art method.
S303, extracting the edge position corresponding to the target document from the document image.
In this step, edge detection may be performed on the document image, edges in the document image are extracted, and then all the extracted edges are fitted to obtain an edge position corresponding to the target document. As shown in fig. 2, the edge position is a position where an edge of a white area in the figure is located.
S304, determining the corner coordinates of the document image according to the edge position and the size of the document image.
In this step, after the edge position corresponding to the target document and the size of the document image are obtained, the corner point of the document image may be determined according to the edge position, and then the corner point coordinates of the document image may be determined according to the position of the corner point and the size of the document image. For example, the coordinates of the corner point may be determined according to a distance between the position of the corner point and the edge of the document image, and the specific implementation manner may refer to step S102, which is not described herein again.
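As one possible implementation of steps S302 to S304 (a sketch only: the use of OpenCV, Canny edge detection and quadrilateral contour fitting are assumptions made for illustration and are not mandated by the present disclosure):

```python
import cv2
import numpy as np

def detect_document_corners(document_image):
    """Return the four corner coordinates of the document region by
    extracting edges and fitting the document outline to a quadrilateral."""
    gray = cv2.cvtColor(document_image, cv2.COLOR_BGR2GRAY)   # assumes a BGR image
    edges = cv2.Canny(gray, 50, 150)                          # edge extraction
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Take the largest contour as the document outline and fit it
    # to a quadrilateral.
    largest = max(contours, key=cv2.contourArea)
    peri = cv2.arcLength(largest, True)
    quad = cv2.approxPolyDP(largest, 0.02 * peri, True)
    if len(quad) != 4:
        raise ValueError("document outline could not be fitted to a quadrilateral")
    return quad.reshape(4, 2).astype(np.float32)
```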
S305, taking the center point of the document image as the image principal point coordinate of the document image.
And S306, acquiring a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates.
In this step, after obtaining the corner point coordinates and the image principal point coordinates of the document image, the corner point coordinates and the image principal point coordinates may be combined to obtain the feature vector corresponding to the document image. Illustratively, if the image principal point coordinates are (x0, y0) and the corner point coordinates are (x1, y1), (x2, y2), (x3, y3) and (x4, y4), then the feature vector may be (x0, y0, x1, y1, x2, y2, x3, y3, x4, y4).
Considering that, when the document image is obtained by shooting the target document through the terminal, distortion may exist in the document image due to the influence of the shooting angle and the like, the corner coordinates obtained from such a document image are not accurate enough, so that the finally obtained aspect ratio of the target document is not accurate enough either. To correct the distorted document image, the corner coordinates may be subjected to perspective transformation after they are acquired.
In a possible implementation manner, a perspective transformation matrix may be obtained, and the corner point coordinates are left-multiplied by the perspective transformation matrix, thereby performing perspective transformation on the corner point coordinates to obtain the perspective-transformed corner point coordinates.
Wherein the perspective transformation matrix can be obtained by formula (1). [The formula image is not reproduced here; it composes the perspective transformation matrix H from the parameters listed below.]
In formula (1), H is the perspective transformation matrix, s is a uniform scaling factor, θ is the rotation angle with a value range of [-30, 30], t = [tx, ty] is a translation vector with tx and ty in the range [-50, 50], K is a triangular matrix satisfying normalization, V = [x, y] with x and y both in the range [-20, 20], and v is a scale factor whose value may be any positive integer.
In the case of obtaining the perspective transformation matrix according to formula (1), the parameters θ, tx, ty, x, y and v may take any value within their respective value ranges, which is not limited in the present disclosure.
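A minimal sketch of the left-multiplication step (the placeholder matrix H below is for illustration only; in practice H would be obtained according to formula (1)):

```python
import numpy as np

def perspective_transform_corners(H, corners):
    """Left-multiply each corner coordinate (in homogeneous form) by the
    perspective transformation matrix H, then de-homogenize."""
    transformed = []
    for (cx, cy) in corners:
        px, py, pw = H @ np.array([cx, cy, 1.0])
        transformed.append((px / pw, py / pw))
    return transformed

# Placeholder 3x3 perspective transformation matrix, for illustration only.
H = np.array([[0.95, -0.08, 12.0],
              [0.06,  0.97, -8.0],
              [1e-4,  2e-4,  1.0]])

corners = [(120.0, 90.0), (910.0, 110.0), (880.0, 720.0), (100.0, 700.0)]
print(perspective_transform_corners(H, corners))
```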
After perspective transformation is performed on the corner point coordinates to obtain the perspective-transformed corner point coordinates, the feature vector can be formed from the image principal point coordinates and the perspective-transformed corner point coordinates. Thus, the aspect ratio of the target document obtained according to the feature vector is more accurate. For example, if the perspective-transformed corner coordinates are (x11, y11), (x22, y22), (x33, y33) and (x44, y44), and the image principal point coordinates are (x0, y0), then the feature vector obtained by combining the image principal point coordinates and the corner point coordinates is (x0, y0, x11, y11, x22, y22, x33, y33, x44, y44).
S307, inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document.
By adopting the method, after a document image of a target document is obtained, the size of the document image can be obtained, the edge position of the target document can be extracted from the document image, and the corner point coordinates of the document image can be determined according to the size of the document image and the edge position. After the feature vector corresponding to the document image is obtained according to the corner point coordinates and the image principal point coordinates, the aspect ratio of the target document can be obtained through the pre-trained aspect ratio acquisition model; since the aspect ratio acquisition model does not limit the shape of the document image, the accuracy of the obtained aspect ratio of the target document is higher. In addition, in the case that the document image is distorted, perspective transformation can be performed on the corner coordinates of the document image to obtain more accurate corner coordinates, so that the accuracy of the aspect ratio of the target document can be further improved.
In order to ensure the accuracy of the aspect ratio obtaining model, the aspect ratio obtaining model may be trained in advance, and the following describes a training process of the aspect ratio obtaining model.
In a possible implementation manner, the target neural network model can be trained through a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: sample document images of a plurality of sample documents, and actual aspect ratios of the plurality of sample documents. FIG. 4 is a flow diagram illustrating a method of training an aspect ratio acquisition model, as shown in FIG. 4, in accordance with an exemplary embodiment, the method comprising:
s401, sample document images of a plurality of sample documents are obtained, and actual aspect ratios of the plurality of sample documents are obtained.
The sample document may be a document of any shape, and the shape of the sample document is not limited by the present disclosure.
In this step, the manner of obtaining the sample document images of the plurality of sample documents may refer to the manner of obtaining the document image of the target document in the embodiment shown in fig. 1 and fig. 3, and is not described herein again. After the sample document images of the plurality of sample documents are obtained, the corner coordinates of the sample document images may be obtained first, and the manner of obtaining the corner coordinates of the sample document images may refer to the manner of obtaining the corner coordinates of the document images in the embodiments shown in fig. 1 and fig. 3, which is not described herein again.
Further, after the coordinates of the corner points of the sample document image are obtained, for a sample document with a regular rectangular shape, the actual aspect ratio of the sample document may be obtained in a related art manner. Illustratively, the actual aspect ratio of the sample document may be obtained by the following formula:
[The formula image is not reproduced here; it expresses the actual aspect ratio of the sample document in terms of the camera intrinsic matrix and the corner coordinates of the sample document image.]
In this formula, the left-hand side is the actual aspect ratio of the sample document, A is the internal reference (intrinsic) matrix of the camera that captured the sample document, and m denotes the corner coordinates of the sample document image.
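Since the formula itself is not reproduced above, the following is only a sketch of the widely used single-view-geometry aspect ratio estimate for a rectangle imaged by a pinhole camera, following the standard whiteboard-rectification derivation (Zhang and He); the corner ordering and the construction of n2 and n3 below are assumptions taken from that derivation, not from the present disclosure:

```python
import numpy as np

def rectangle_aspect_ratio(corners, A):
    """Estimate width/height of a physical rectangle from its four image
    corners and the camera intrinsic matrix A. Assumed corner ordering:
    m1 top-left, m2 top-right, m3 bottom-left, m4 bottom-right."""
    m1, m2, m3, m4 = [np.array([x, y, 1.0]) for (x, y) in corners]
    k2 = np.dot(np.cross(m1, m4), m3) / np.dot(np.cross(m2, m4), m3)
    k3 = np.dot(np.cross(m1, m4), m2) / np.dot(np.cross(m3, m4), m2)
    n2 = k2 * m2 - m1                      # direction of the width edge
    n3 = k3 * m3 - m1                      # direction of the height edge
    A_inv = np.linalg.inv(A)
    M = A_inv.T @ A_inv
    ratio_sq = (n2 @ M @ n2) / (n3 @ M @ n3)
    return float(np.sqrt(ratio_sq))
```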
It should be noted that, for a sample document with an irregular shape, the actual aspect ratio of the sample document may be obtained manually.
S402, obtaining a sample feature vector corresponding to the sample document image of each sample document to obtain a plurality of sample feature vectors.
In this step, the manner of obtaining the sample feature vector corresponding to each sample document image may refer to the manner of obtaining the feature vector of the document image in the embodiment shown in fig. 1 and fig. 3, which is not described herein again.
S403, training the target neural network model according to the actual aspect ratio of the plurality of sample feature vectors and the plurality of sample documents to obtain the aspect ratio obtaining model.
The target neural network model may be an MLP (Multi-Layer Perceptron) neural network model, and may include a plurality of hidden layers. The larger the number of hidden layers, the higher the accuracy of the output aspect ratio, but the longer the time required to obtain the aspect ratio. Therefore, the number of hidden layers may be determined according to the type of the terminal: for a terminal with a higher real-time requirement and a lower accuracy requirement, fewer hidden layers may be set, for example, 2 hidden layers; for a terminal with a lower real-time requirement and a higher accuracy requirement, more hidden layers may be set, for example, 4 hidden layers. That is, the number of hidden layers may be determined by weighing real-time performance against accuracy.
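For illustration, a minimal sketch of such an MLP regressor (the use of PyTorch, the hidden layer width and the number of hidden layers are illustrative assumptions; only the 10-dimensional input follows from the feature vector described above):

```python
import torch
import torch.nn as nn

class AspectRatioMLP(nn.Module):
    """MLP that maps the 10-dimensional feature vector (principal point
    plus four corner points) to a single aspect ratio value."""
    def __init__(self, num_hidden_layers=2, hidden_dim=64):
        super().__init__()
        layers, in_dim = [], 10
        for _ in range(num_hidden_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, 1))   # output: predicted aspect ratio
        self.net = nn.Sequential(*layers)

    def forward(self, feature_vector):
        return self.net(feature_vector).squeeze(-1)
```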
In this step, after the plurality of sample feature vectors are obtained, the target neural network model may be iteratively trained through a preset loss function of the target neural network model according to the plurality of sample feature vectors and the actual aspect ratio of the plurality of sample documents, so as to obtain the aspect ratio obtaining model. The preset loss function may be a square loss function, or may be other types of loss functions, which is not limited in this disclosure. In the process of training the target neural network model, parameters of the target neural network model can be updated in a back propagation mode according to the loss value of the preset loss function, and the aspect ratio obtaining model is finally obtained.
In training the target neural network, the following steps may be performed cyclically: inputting a plurality of sample feature vectors into the target neural network model, obtaining a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is greater than a preset loss threshold value to obtain a new target neural network model; and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is less than or equal to the preset loss threshold value.
The actual aspect ratio of the sample document may be used as a label of a sample feature vector corresponding to the sample document, the sample feature vector may be input into the target neural network model during training of the target neural network model, the target neural network model may output a predicted aspect ratio of the sample document according to the sample feature vector, and the actual aspect ratio of the sample document may be obtained after the predicted aspect ratio is output. At the beginning of the training of the target neural network model, the error between the predicted aspect ratio of the sample document and the actual aspect ratio of the sample document may be relatively large, and ideally, the error is smaller and smaller during the training of the target neural network model. After obtaining the predicted aspect ratio and the actual aspect ratio of the sample document, a loss value of the preset loss function may be obtained, for example, if the preset loss function is a square loss function, the loss value may be obtained by the following formula:
L(S) = (R - MLP(S))²
wherein L(S) is the loss value, R is the actual aspect ratio of the sample document, and MLP(S) is the predicted aspect ratio output by the target neural network model for the sample feature vector S.
After the loss value of the preset loss function is obtained, the preset loss threshold value can be obtained, when the loss value is larger than the preset loss threshold value, the target neural network model does not meet the convergence condition, iterative training is required to be continued, and in this case, the parameters of the target neural network model can be adjusted according to the loss value to obtain a new target neural network model. Then, the plurality of sample feature vectors may be input into the new target neural network model, a new loss value of the preset loss function may be obtained in the same manner as described above, if the new loss value is still greater than the preset loss threshold, the parameters of the new target neural network model may be continuously adjusted in the same manner as described above, if the loss value is less than or equal to the preset loss threshold, the training of the target neural network may be stopped, and the target neural network model corresponding to the loss value may be used as the aspect ratio obtaining model.
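A minimal training-loop sketch following the above description (the optimizer choice, the learning rate and the concrete threshold value are assumptions for illustration; the square loss and the threshold-based stopping rule follow the description):

```python
import torch
import torch.nn as nn

def train_aspect_ratio_model(model, sample_vectors, actual_ratios,
                             loss_threshold=1e-3, lr=1e-3, max_epochs=10000):
    """Iteratively update the model until the squared loss falls to or
    below the preset loss threshold."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                       # square loss function
    features = torch.as_tensor(sample_vectors, dtype=torch.float32)
    targets = torch.as_tensor(actual_ratios, dtype=torch.float32)
    for _ in range(max_epochs):
        optimizer.zero_grad()
        predicted = model(features)              # predicted aspect ratios
        loss = loss_fn(predicted, targets)
        if loss.item() <= loss_threshold:        # convergence condition met
            break
        loss.backward()                          # back propagation
        optimizer.step()                         # update model parameters
    return model
```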
It should be noted that the plurality of sample feature vectors may be divided into a training set and a test set, for example, if the sample feature vectors include 100 sample feature vectors, 90 sample feature vectors may be used as the training set, and the remaining 10 sample feature vectors may be used as the test set. Training the target neural network model through the sample feature vectors in the training set to obtain the aspect ratio obtaining model, wherein one sample feature vector can be input each time, the target neural network model can output the predicted aspect ratio of the sample document according to the sample feature vectors, obtain the actual aspect ratio of the sample document after outputting the predicted aspect ratio, obtain the loss value of the preset loss function according to the predicted aspect ratio and the actual aspect ratio, and adjust the parameters of the target neural network model according to the loss value to obtain a new target neural network model; or a plurality of sample feature vectors may be input each time, the target neural network model may output predicted aspect ratios of the plurality of sample documents according to the plurality of sample feature vectors, and after the plurality of predicted aspect ratios are output, obtain a plurality of actual aspect ratios of the sample documents, obtain a loss value of the preset loss function according to an average value of the plurality of predicted aspect ratios and an average value of the plurality of actual aspect ratios, and adjust a parameter of the target neural network model according to the loss value to obtain a new target neural network model.
After the aspect ratio obtaining model is obtained, the aspect ratio obtaining model can be verified through the sample feature vectors in the test set to obtain an error of the aspect ratio obtaining model, and the smaller the error is, the higher the accuracy of the aspect ratio obtaining model is.
For example, the error of the aspect ratio acquisition model may be obtained by the following formula:
error_rate_n = |r_cal - r_tru| / r_tru
MER = (1/N) · Σ error_rate_n, with the sum taken over n = 1, …, N
where MER is the error of the aspect ratio acquisition model, N is the number of acquired aspect ratios, error_rate_n is the error of the aspect ratio for the nth acquisition, r_cal is the aspect ratio of the document obtained by the aspect ratio acquisition model, and r_tru is the actual aspect ratio of the document.
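A small sketch of this evaluation (assuming, as in the reconstruction above, that the per-acquisition error is the relative error between the model output and the actual aspect ratio):

```python
def mean_error_rate(predicted_ratios, actual_ratios):
    """Mean error rate (MER) of the aspect ratio acquisition model over
    N test samples, using the relative error per sample."""
    errors = [abs(r_cal - r_tru) / r_tru
              for r_cal, r_tru in zip(predicted_ratios, actual_ratios)]
    return sum(errors) / len(errors)

# Example: evaluating on a held-out test set.
print(mean_error_rate([1.42, 0.71, 1.33], [1.414, 0.707, 1.294]))
```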
By adopting the model training method, a model capable of obtaining the aspect ratio of the document is obtained through training, and compared with the mode in the prior art, the accuracy of the aspect ratio of the document obtained through the aspect ratio obtaining model is higher.
FIG. 5 is a schematic diagram illustrating the structure of a document image processing apparatus according to one exemplary embodiment. As shown in fig. 5, the apparatus includes:
a document image acquisition module 501 configured to acquire a document image of a target document;
a coordinate obtaining module 502 configured to obtain corner coordinates and image principal point coordinates of the document image;
a vector obtaining module 503, configured to obtain a feature vector corresponding to the document image according to the corner coordinates and the image principal point coordinates;
an aspect ratio obtaining module 504 configured to input the feature vector to a pre-trained aspect ratio obtaining model and output an aspect ratio of the target document.
Optionally, the coordinate obtaining module 502 includes:
a size acquisition sub-module configured to acquire a size of the document image;
the position extraction submodule is configured to extract an edge position corresponding to the target document in the document image;
a corner coordinate obtaining sub-module configured to determine the corner coordinates according to the edge position and the size of the document image;
and the image principal point coordinate acquisition sub-module is configured to take the central point of the document image as the image principal point coordinate.
Optionally, the vector obtaining module 503 includes:
a transformation submodule configured to perform perspective transformation on the corner coordinates;
and the vector composition submodule is configured to combine the image principal point coordinates and the corner point coordinates after perspective transformation into the feature vector.
Optionally, the transformation submodule is configured to:
acquiring a perspective transformation matrix;
the corner point coordinates are left-multiplied by the perspective transformation matrix.
Alternatively, FIG. 6 is a schematic diagram illustrating the structure of another document image processing apparatus according to an exemplary embodiment. As shown in fig. 6, the apparatus further includes:
a model obtaining module 505 configured to train a target neural network model through a sample set, resulting in the aspect ratio obtaining model; wherein the sample set comprises: sample document images of a plurality of sample documents, and actual aspect ratios of the plurality of sample documents.
Optionally, the model obtaining module 505 includes:
the sample vector acquisition sub-module is configured to acquire a sample feature vector corresponding to the sample document image of each sample document to obtain a plurality of sample feature vectors;
and the model obtaining submodule is configured to perform iterative training on the target neural network model through a preset loss function of the target neural network model according to the actual aspect ratios of the plurality of sample feature vectors and the plurality of sample documents to obtain the aspect ratio obtaining model.
Optionally, the model obtaining sub-module is configured to:
circularly executing the following steps: inputting a plurality of sample feature vectors into the target neural network model, obtaining a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is greater than a preset loss threshold value to obtain a new target neural network model;
and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is less than or equal to the preset loss threshold value.
By the aid of the device, after the feature vectors corresponding to the document images of the target documents are obtained, the aspect ratio of the target documents can be obtained through the pre-trained aspect ratio model, and therefore the accuracy of the obtained aspect ratio of the target documents is higher as the aspect ratio model does not limit the shapes of the document images.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the document image processing method provided by the present disclosure.
Fig. 7 is a block diagram illustrating a terminal device 700 according to an example embodiment. For example, the terminal device 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to fig. 7, the terminal device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls overall operation of the terminal device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or a portion of the steps of the document image processing method described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operations at the terminal device 700. Examples of such data include instructions for any application or method operating on terminal device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 706 provides power to the various components of the terminal device 700. Power components 706 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for terminal device 700.
The multimedia component 708 comprises a screen providing an output interface between said terminal device 700 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal device 700 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a Microphone (MIC) configured to receive an external audio signal when the terminal device 700 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 714 includes one or more sensors for providing various aspects of status assessment for the terminal device 700. For example, sensor component 714 can detect an open/closed state of terminal device 700, the relative positioning of components, such as a display and keypad of terminal device 700, sensor component 714 can also detect a change in the position of terminal device 700 or a component of terminal device 700, the presence or absence of user contact with terminal device 700, orientation or acceleration/deceleration of terminal device 700, and a change in the temperature of terminal device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the terminal device 700 and other devices. The terminal device 700 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic elements for performing the above-described document image processing method.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 704 comprising instructions, executable by the processor 720 of the terminal device 700 to perform the document image processing method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned document image processing method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of document image processing, the method comprising:
acquiring a document image of a target document;
acquiring corner coordinates and image principal point coordinates of the document image;
acquiring a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates;
and inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document.
2. The method of claim 1, wherein the obtaining corner point coordinates and image principal point coordinates of the document image comprises:
acquiring the size of the document image;
extracting an edge position corresponding to the target document from the document image;
determining the corner coordinates according to the edge positions and the size of the document image;
and taking the central point of the document image as the image principal point coordinate.
3. The method according to claim 1, wherein the obtaining the feature vector corresponding to the document image according to the corner point coordinates and the principal point coordinates comprises:
carrying out perspective transformation on the corner point coordinates;
and forming the feature vector by using the image principal point coordinates and the corner point coordinates after perspective transformation.
4. The method of claim 3, wherein the perspective transforming the corner coordinates comprises:
acquiring a perspective transformation matrix;
and the coordinates of the corner points are multiplied by the perspective transformation matrix.
5. The method of claim 1, wherein the aspect ratio acquisition model is trained by:
training a target neural network model through a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: sample document images of a plurality of sample documents, and actual aspect ratios of a plurality of the sample documents.
6. The method of claim 5, wherein training the target neural network model through a sample set to obtain the aspect ratio acquisition model comprises:
obtaining a sample feature vector corresponding to a sample document image of each sample document to obtain a plurality of sample feature vectors;
and performing iterative training on the target neural network model through a preset loss function of the target neural network model according to the actual aspect ratios of the plurality of sample feature vectors and the plurality of sample documents to obtain the aspect ratio acquisition model.
7. The method of claim 6, wherein iteratively training the target neural network model through a preset loss function of the target neural network model according to the plurality of sample feature vectors and the actual aspect ratio of the plurality of sample documents to obtain the aspect ratio obtaining model comprises:
circularly executing the following steps: inputting a plurality of sample feature vectors into the target neural network model, obtaining a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is greater than a preset loss threshold value to obtain a new target neural network model;
and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is less than or equal to the preset loss threshold value.
8. A document image processing apparatus, characterized in that the apparatus comprises:
a document image acquisition module configured to acquire a document image of a target document;
the coordinate acquisition module is configured to acquire corner coordinates and image principal point coordinates of the document image;
the vector acquisition module is configured to acquire a feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates;
and the aspect ratio acquisition module is configured to input the feature vector to a pre-trained aspect ratio acquisition model and output the aspect ratio of the target document.
9. A computer-readable storage medium, on which computer program instructions are stored, which program instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 7.
10. A terminal device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.
CN202011412437.8A 2020-12-04 2020-12-04 Document image processing method and device, storage medium and terminal equipment Pending CN112733599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011412437.8A CN112733599A (en) 2020-12-04 2020-12-04 Document image processing method and device, storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011412437.8A CN112733599A (en) 2020-12-04 2020-12-04 Document image processing method and device, storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN112733599A (en) 2021-04-30

Family

ID=75598160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011412437.8A Pending CN112733599A (en) 2020-12-04 2020-12-04 Document image processing method and device, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN112733599A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002027449A (en) * 2000-07-10 2002-01-25 Fujitsu Ltd Method and apparatus for identifying moving object
US20130177246A1 (en) * 2012-01-10 2013-07-11 Kevin W. Stokes Identification and Separation of Form and Feature Elements from Handwritten and Other User Supplied Elements
US10032073B1 (en) * 2015-06-25 2018-07-24 Evernote Corporation Detecting aspect ratios of document pages on smartphone photographs by learning camera view angles
CN106991649A (en) * 2016-01-20 2017-07-28 富士通株式会社 The method and apparatus that the file and picture captured to camera device is corrected
CN109690568A (en) * 2016-09-12 2019-04-26 华为技术有限公司 A kind of processing method and mobile device

Similar Documents

Publication Publication Date Title
US9674395B2 (en) Methods and apparatuses for generating photograph
US9959484B2 (en) Method and apparatus for generating image filter
WO2016011747A1 (en) Skin color adjustment method and device
CN108470322B (en) Method and device for processing face image and readable storage medium
CN107944367B (en) Face key point detection method and device
WO2018120238A1 (en) File processing device and method, and graphical user interface
WO2016192325A1 (en) Method and device for processing logo on video file
CN106557759B (en) Signpost information acquisition method and device
CN107967459B (en) Convolution processing method, convolution processing device and storage medium
US11825040B2 (en) Image shooting method and device, terminal, and storage medium
CN109325908B (en) Image processing method and device, electronic equipment and storage medium
CN109034150B (en) Image processing method and device
CN109784164A (en) Prospect recognition methods, device, electronic equipment and storage medium
CN112200040A (en) Occlusion image detection method, device and medium
CN110807769B (en) Image display control method and device
US9665925B2 (en) Method and terminal device for retargeting images
CN107730443B (en) Image processing method and device and user equipment
US11252341B2 (en) Method and device for shooting image, and storage medium
CN115760585A (en) Image correction method, image correction device, storage medium and electronic equipment
CN113315904B (en) Shooting method, shooting device and storage medium
CN112733599A (en) Document image processing method and device, storage medium and terminal equipment
CN114418865A (en) Image processing method, device, equipment and storage medium
CN111756985A (en) Image shooting method, device and storage medium
CN111986097B (en) Image processing method and device
CN112070681B (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination