CN112733599B - Document image processing method and device, storage medium and terminal equipment - Google Patents
- Publication number: CN112733599B (application CN202011412437.8A)
- Authority: CN (China)
- Legal status: Active (an assumption, not a legal conclusion; no legal analysis has been performed)
Classifications
- G06V30/414: Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text (G06V: image or video recognition or understanding; G06V30/41: analysis of document content)
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F18: pattern recognition; G06F18/21: design or setup of recognition systems)
- G06N3/045: Combinations of networks (G06N3: computing arrangements based on biological models; G06N3/04: neural network architecture)
- G06N3/08: Learning methods (G06N3/02: neural networks)
Abstract
The disclosure relates to a document image processing method and device, a storage medium and a terminal device. The method comprises the following steps: acquiring a document image of a target document; acquiring corner coordinates and image principal point coordinates of the document image; obtaining a feature vector corresponding to the document image from the corner coordinates and the principal point coordinates; and inputting the feature vector into a pre-trained aspect ratio acquisition model, which outputs the aspect ratio of the target document. In other words, after the feature vector corresponding to the document image of the target document is obtained, the aspect ratio of the target document can be produced by the pre-trained model; because the model does not limit the shape of the document in the image, the acquired aspect ratio is more accurate.
Description
Technical Field
The disclosure relates to the technical field of image processing, and in particular to a document image processing method and device, a storage medium, and a terminal device.
Background
With the rapid development of portable photographic devices such as smartphones, people can conveniently take high-quality photos. Such photos can record highlights of daily life, and can also serve as digital copies of paper documents for recording or sharing important information. However, because of perspective distortion, the document in a photo taken with a mobile phone may appear as an irregular quadrilateral; to correct such a document photo, the aspect ratio of the original document needs to be acquired.
In the related art, a pinhole camera model is used to model the imaging process of a camera, and a formula for the aspect ratio of a document photo is derived from several constraints that rectangles in a real scene satisfy in single-view geometry. However, because the formula assumes a regular rectangular document photo, its accuracy is low when the photographed document is an irregular quadrilateral.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a document image processing method, apparatus, storage medium, and terminal device.
According to a first aspect of embodiments of the present disclosure, there is provided a document image processing method, the method including: acquiring a document image of a target document; acquiring corner coordinates and image principal point coordinates of the document image; obtaining a feature vector corresponding to the document image from the corner coordinates and the principal point coordinates; and inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document.
Optionally, the acquiring the corner coordinates and the image principal point coordinates of the document image includes: acquiring the size of the document image; extracting the edge position corresponding to the target document from the document image; determining the corner coordinates according to the edge position and the size of the document image; and taking the center point of the document image as the image principal point coordinates.
Optionally, the obtaining the feature vector corresponding to the document image according to the corner point coordinates and the principal point coordinates includes: performing perspective transformation on the corner coordinates; and forming the feature vector by the image principal point coordinates and the corner point coordinates after perspective transformation.
Optionally, the performing perspective transformation on the corner coordinates includes: obtaining a perspective transformation matrix; and multiplying the corner coordinates by the perspective transformation matrix.
Optionally, the aspect ratio acquisition model is trained by: training a target neural network model through a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: a sample document image of a plurality of sample documents, and an actual aspect ratio of the plurality of sample documents.
Optionally, the training the target neural network model through the sample set, and obtaining the aspect ratio obtaining model includes: obtaining sample feature vectors corresponding to sample document images of each sample document to obtain a plurality of sample feature vectors; and carrying out iterative training on the target neural network model through a preset loss function of the target neural network model according to the sample feature vectors and the actual aspect ratios of the sample documents to obtain the aspect ratio acquisition model.
Optionally, the performing iterative training on the target neural network model through the preset loss function of the target neural network model according to the plurality of sample feature vectors and the actual aspect ratios of the plurality of sample documents, to obtain the aspect ratio acquisition model, includes:
The following steps are circularly executed: inputting a plurality of sample feature vectors into the target neural network model, acquiring a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is larger than a preset loss threshold value to obtain a new target neural network model; and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is smaller than or equal to the preset loss threshold value.
According to a second aspect of the embodiments of the present disclosure, there is provided a document image processing apparatus including: a document image acquisition module configured to acquire a document image of a target document; a coordinate acquisition module configured to acquire corner coordinates and image principal point coordinates of the document image; a vector acquisition module configured to obtain a feature vector corresponding to the document image from the corner coordinates and the image principal point coordinates; and an aspect ratio acquisition module configured to input the feature vector into a pre-trained aspect ratio acquisition model and output the aspect ratio of the target document.
Optionally, the coordinate acquisition module includes: a size acquisition sub-module configured to acquire the size of the document image; a position extraction sub-module configured to extract the edge position corresponding to the target document from the document image; a corner coordinate acquisition sub-module configured to determine the corner coordinates according to the edge position and the size of the document image; and an image principal point coordinate acquisition sub-module configured to take the center point of the document image as the image principal point coordinates.
Optionally, the vector acquisition module includes: a transformation submodule configured to perform perspective transformation on the corner coordinates; and a vector composing sub-module configured to compose the feature vector from the image principal point coordinates and the perspective transformed corner point coordinates.
Optionally, the transformation submodule is configured to: obtaining a perspective transformation matrix; and multiplying the corner coordinates by the perspective transformation matrix.
Optionally, the apparatus further comprises: the model acquisition module is configured to train the target neural network model through a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: a sample document image of a plurality of sample documents, and an actual aspect ratio of the plurality of sample documents.
Optionally, the model acquisition module includes: a sample vector obtaining sub-module configured to obtain sample feature vectors corresponding to sample document images of each sample document, so as to obtain a plurality of sample feature vectors; and the model acquisition sub-module is configured to perform iterative training on the target neural network model through a preset loss function of the target neural network model according to the sample feature vectors and the actual aspect ratios of the sample documents to obtain the aspect ratio acquisition model.
Optionally, the model acquisition sub-module is configured to: the following steps are circularly executed: inputting a plurality of sample feature vectors into the target neural network model, acquiring a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is larger than a preset loss threshold value to obtain a new target neural network model; and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is smaller than or equal to the preset loss threshold value.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the document image processing method provided by the first aspect of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a terminal device, including: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the document image processing method provided in the first aspect of the present disclosure.
The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects: a document image of a target document is acquired; corner coordinates and image principal point coordinates of the document image are acquired; a feature vector corresponding to the document image is obtained from the corner coordinates and the principal point coordinates; and the feature vector is input into a pre-trained aspect ratio acquisition model, which outputs the aspect ratio of the target document. In other words, after the feature vector corresponding to the document image of the target document is obtained, the aspect ratio of the target document can be produced by the pre-trained model; because the model does not limit the shape of the document in the image, the acquired aspect ratio is more accurate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flowchart illustrating a document image processing method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of a document image shown according to an example embodiment;
FIG. 3 is a flowchart illustrating another document image processing method according to an example embodiment;
FIG. 4 is a flowchart illustrating a training method for an aspect ratio acquisition model, according to an exemplary embodiment;
FIG. 5 is a schematic diagram showing a structure of a document image processing apparatus according to an exemplary embodiment;
FIG. 6 is a schematic diagram showing the structure of another document image processing apparatus according to an exemplary embodiment;
FIG. 7 is a block diagram of a terminal device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
First, an application scenario of the present disclosure will be described. The disclosure can be applied to a terminal device having a photographing function. Compared with a conventional scanner, photographing a document with such a terminal device to generate a digitized document is easy to use and simple to operate; however, because of perspective distortion, a regularly rectangular document may become an irregular quadrilateral in the photographed photo. When the photo is subsequently processed, the original shape of the document needs to be restored by image correction, and the key to this process is correctly estimating the true aspect ratio of the document.
At present, the most accurate method of estimating a document's true aspect ratio is based on rectangular constraints in single-view geometry: a pinhole camera model is used to model the imaging process of the camera, and a formula for the aspect ratio of the photographed document is derived from several constraints that a rectangle in a real scene satisfies in single-view geometry. However, the derivation assumes that the photographed document is a regular rectangle. For a photographed document that is an irregular quadrilateral, the accuracy of the aspect ratio obtained from the formula is therefore relatively low, and under some photographing conditions (for example, a large photographing tilt angle) the formula cannot compute the aspect ratio at all.
To solve the above problems, the present disclosure provides a document image processing method and apparatus, a storage medium, and a terminal device. After the feature vector corresponding to the document image of the target document is obtained, the aspect ratio of the target document can be produced by a pre-trained aspect ratio acquisition model; because the model does not limit the shape of the document in the image, the acquired aspect ratio is more accurate.
The present disclosure is described below in connection with specific embodiments.
The present disclosure may be applied to a terminal device, which may include a handheld device, a wearable device, a camera, etc., which is not limited by the present disclosure.
Fig. 1 is a flowchart illustrating a document image processing method according to an exemplary embodiment. As shown in fig. 1, the method includes:
S101, acquiring a document image of a target document.
The target document may have a regular shape, for example a rectangle, or an irregular shape, for example a circle or a triangle; it may be a handwritten note, a contract, an invoice, a business card, a poster, a book, and so on. The present disclosure does not limit the shape or type of the target document. The document image may include the image corresponding to the target document together with the image around the target document. FIG. 2 is a schematic view of a document image according to an exemplary embodiment; as shown in FIG. 2, the white portion is the image corresponding to the target document and the black region is the image around it.
In this step, the terminal may photograph the target document with its own camera to obtain the document image of the target document. The camera may be of any type, and the target document may be photographed at any angle; the present disclosure does not limit the photographing tilt angle, photographing manner, and so on.
S102, acquiring corner coordinates and image principal point coordinates of the document image.
The corner coordinates may be the coordinates of the vertices of the target document within the document image, and the image principal point coordinates may be the coordinates of the center point of the document image.
In this step, after the document image of the target document is acquired, the corner coordinates can be determined from the position of the target document in the document image. For example, the distances between the four vertices of the target document and the edges of the document image can be obtained first, and the corner coordinates determined from these distances. As shown in FIG. 2, let the width of the document image be width and its height be height, and define the coordinates of the upper-left corner of the document image as (0, 0). If the distance from the upper-left corner of the target document to the left edge of the document image is x1, and its distance to the upper edge is y1, then the coordinates of the upper-left corner of the target document are (x1, y1); the other corner coordinates can be obtained in the same way. In addition, once the width and height of the document image are known, the image principal point coordinates can be determined from them.
S103, obtaining the feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates.
In this step, after the corner coordinates and the image principal point coordinates of the document image are obtained, they can be combined into the feature vector corresponding to the document image. For example, if the image principal point coordinates are (x0, y0) and the corner coordinates are (x1, y1), (x2, y2), (x3, y3) and (x4, y4), the feature vector may be (x0, y0, x1, y1, x2, y2, x3, y3, x4, y4).
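The combination of the image principal point and the four corner coordinates described above can be sketched as follows. This is a minimal sketch: the function name and the fixed corner ordering are illustrative assumptions, not something the patent prescribes.

```python
import numpy as np

def document_feature_vector(image_size, corners):
    """Assemble the 10-dimensional feature vector of step S103.

    The image principal point is taken as the image centre (step S102), and
    the result is the concatenation (x0, y0, x1, y1, ..., x4, y4) described
    above. `corners` lists the four document vertices in a fixed order.
    """
    width, height = image_size
    x0, y0 = width / 2.0, height / 2.0          # image principal point
    feature = [x0, y0]
    for (x, y) in corners:                       # four corner coordinates
        feature.extend([float(x), float(y)])
    return np.asarray(feature, dtype=np.float64)

# example: a 640x480 document image with four detected corners
vec = document_feature_vector((640, 480),
                              [(50, 40), (600, 35), (610, 450), (45, 455)])
```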
S104, inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document.
In this step, after the feature vector corresponding to the document image is acquired, the aspect ratio acquisition model trained in advance may be acquired, and the feature vector is input into the aspect ratio acquisition model, and the output of the aspect ratio acquisition model is the aspect ratio of the target document.
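Step S104 can be illustrated with a small stand-in network. The patent does not fix the network topology, so the two-layer perceptron below, its layer sizes, and its random (untrained) weights are purely illustrative assumptions; a real system would load the pre-trained aspect ratio acquisition model instead.

```python
import numpy as np

rng = np.random.default_rng(0)

class AspectRatioModel:
    """Minimal two-layer perceptron standing in for the pre-trained
    aspect ratio acquisition model (illustrative topology only)."""

    def __init__(self, in_dim=10, hidden=32):
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, x):
        h = np.maximum(0.0, x @ self.w1 + self.b1)   # ReLU hidden layer
        out = h @ self.w2 + self.b2
        return np.abs(out)                            # aspect ratio is positive

# feed a feature vector (principal point + four corners) through the model
model = AspectRatioModel()
feature = np.array([320., 240., 50., 40., 600., 35., 610., 450., 45., 455.])
ratio = float(model.forward(feature)[0])
```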
With the above method, after the feature vector corresponding to the document image of the target document is obtained, the aspect ratio of the target document can be produced by the pre-trained aspect ratio acquisition model; because the model does not limit the shape of the document in the image, the acquired aspect ratio is more accurate.
FIG. 3 is a flowchart illustrating another document image processing method according to an exemplary embodiment. As shown in fig. 3, the method includes:
S301, acquiring a document image of a target document.
The target document may have a regular shape, for example a rectangle, or an irregular shape, for example a circle or a triangle; it may be a handwritten note, a contract, an invoice, a business card, a poster, a book, and so on. The present disclosure does not limit the shape or type of the target document. The document image may be an image containing the target document and its surroundings.
S302, acquiring the size of the document image.
In this step, after the document image is acquired, the width and height of the document image can be acquired by a related-art method.
S303, extracting the edge position corresponding to the target document from the document image.
In this step, edge detection may be performed on the document image, edges in the document image are extracted, and then, fitting is performed on all the extracted edges to obtain edge positions corresponding to the target document. As shown in fig. 2, the edge position is the position where the edge of the white area is located in the figure.
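The edge-extraction step above can be sketched as follows. For brevity, this stand-in skips real edge detection and line fitting: given a binary mask of the document region, it simply picks, for each image corner, the nearest document pixel as an approximate document vertex. The function name and the mask-based shortcut are illustrative assumptions, not the patent's actual algorithm, which detects edges and fits them.

```python
import numpy as np

def estimate_corners(mask):
    """Simplified stand-in for step S303: given a binary mask of the
    document region (1 = document pixel), return four approximate document
    vertices, one nearest to each corner of the image."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)        # (x, y) pairs
    image_corners = np.array([[0, 0], [w - 1, 0],
                              [w - 1, h - 1], [0, h - 1]], dtype=float)
    corners = []
    for c in image_corners:
        d = np.sum((pts - c) ** 2, axis=1)                # squared distances
        corners.append(tuple(pts[np.argmin(d)]))
    return corners

# demo: an axis-aligned rectangular document region inside a 120x100 image
mask = np.zeros((100, 120), dtype=np.uint8)
mask[20:80, 30:100] = 1
corners = estimate_corners(mask)
```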
S304, according to the edge position and the size of the document image, determining the corner coordinates of the document image.
In this step, after the edge position corresponding to the target document and the size of the document image are obtained, the corner point of the document image may be determined according to the edge position, and then the corner point coordinates of the document image may be determined according to the position of the corner point and the size of the document image. For example, the coordinates of the corner may be determined according to the distance between the position of the corner and the edge of the document image, and the specific implementation manner may refer to step S102, which is not described herein.
S305, taking the center point of the document image as the image principal point coordinate of the document image.
S306, obtaining the feature vector corresponding to the document image according to the corner point coordinates and the image principal point coordinates.
In this step, after the corner coordinates and the image principal point coordinates of the document image are obtained, they can be combined into the feature vector corresponding to the document image. For example, if the image principal point coordinates are (x0, y0) and the corner coordinates are (x1, y1), (x2, y2), (x3, y3) and (x4, y4), the feature vector may be (x0, y0, x1, y1, x2, y2, x3, y3, x4, y4).
When the document image is obtained by photographing the target document with the terminal, the image may be distorted by the photographing angle or other factors. In that case the acquired corner coordinates are not accurate enough, so the finally obtained aspect ratio of the target document is also not accurate enough. To correct the distorted document image, perspective transformation may be performed on the corner coordinates after they are acquired.
In one possible implementation, a perspective transformation matrix may be obtained, and the corner coordinates multiplied by the perspective transformation matrix, thereby performing the perspective transformation and obtaining the perspective-transformed corner coordinates.
Wherein the perspective transformation matrix can be obtained by the following formula:

$$H = H_S H_A H_P = \begin{bmatrix} sR & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} K & 0 \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} I & 0 \\ v^T & v \end{bmatrix} \quad (1)$$

wherein H is the perspective transformation matrix; s is a uniform scaling factor; R is the 2×2 rotation matrix of an angle θ, the value range of θ being [-30, 30]; t = [tx, ty]ᵀ is a translation vector, the value ranges of tx and ty being [-50, 50]; K is an upper-triangular matrix satisfying the normalization; v = [x, y]ᵀ, the value ranges of x and y being [-20, 20]; and the scalar v in the lower-right entry is a scale factor whose value can be any positive integer.

When the perspective transformation matrix is obtained by formula (1), the parameters θ, tx, ty, x, y and v may take any values within their respective ranges; this is not limited in the present disclosure.
After perspective transformation is performed on the corner coordinates to obtain the perspective-transformed corner coordinates, the image principal point coordinates and the perspective-transformed corner coordinates can be combined into the feature vector; the aspect ratio of the target document obtained from such a feature vector is more accurate. For example, if the perspective-transformed corner coordinates are (x11, y11), (x22, y22), (x33, y33) and (x44, y44) and the image principal point coordinates are (x0, y0), the feature vector obtained by combining them is (x0, y0, x11, y11, x22, y22, x33, y33, x44, y44).
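The two steps above (constructing a perspective transformation matrix from similarity, affine and projective components as in formula (1), then multiplying the corner coordinates by it) can be sketched as follows. The helper names and the sampled parameter values are illustrative assumptions; points are handled in homogeneous 2-D coordinates.

```python
import numpy as np

def make_homography(s, theta_deg, t, k_shear, v_vec, v_scale):
    """Build a perspective transformation matrix H = Hs @ Ha @ Hp from a
    similarity part (scale s, rotation theta, translation t), an affine
    part (upper-triangular K), and a projective part (v_vec, v_scale)."""
    th = np.deg2rad(theta_deg)
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    Hs = np.eye(3); Hs[:2, :2] = s * R; Hs[:2, 2] = t      # similarity part
    K = np.array([[1.0, k_shear], [0.0, 1.0]])             # upper-triangular
    Ha = np.eye(3); Ha[:2, :2] = K                         # affine part
    Hp = np.eye(3); Hp[2, :2] = v_vec; Hp[2, 2] = v_scale  # projective part
    return Hs @ Ha @ Hp

def transform_points(H, pts):
    """Multiply 2-D corner coordinates by H in homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                        # de-homogenize

corners = np.array([[50., 40.], [600., 35.], [610., 450.], [45., 455.]])
H = make_homography(1.0, 5.0, [10.0, -5.0], 0.02, [1e-4, 1e-4], 1.0)
warped = transform_points(H, corners)                      # transformed corners
```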
S307, inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document.
With the above method, after the document image of the target document is acquired, the size of the document image can be obtained, the edge position of the target document extracted from the document image, and the corner coordinates determined from the size and the edge position; after the feature vector corresponding to the document image is obtained from the corner coordinates and the image principal point coordinates, the aspect ratio of the target document can be produced by the pre-trained aspect ratio acquisition model. Because the model does not limit the shape of the document in the image, the acquired aspect ratio is more accurate. In addition, when the document image is distorted, perspective transformation of the corner coordinates yields more accurate corner coordinates, which further improves the accuracy of the aspect ratio of the target document.
In order to ensure accuracy of the aspect ratio acquisition model, the aspect ratio acquisition model may be trained in advance, and a training process of the aspect ratio acquisition model is described below.
In one possible implementation, the target neural network model may be trained by a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: a sample document image of a plurality of sample documents, and an actual aspect ratio of the plurality of sample documents. FIG. 4 is a flowchart illustrating a training method for an aspect ratio acquisition model, according to an exemplary embodiment, the method including:
S401, acquiring sample document images of a plurality of sample documents and the actual aspect ratio of the plurality of sample documents.
The sample document may be a document of any shape; the present disclosure does not limit the shape of the sample document.
In this step, the manner of acquiring the sample document images of the plurality of sample documents may refer to the manner of acquiring the document images of the target document in the embodiment shown in fig. 1 and 3, and will not be described here again. After the sample document images of the plurality of sample documents are acquired, the corner coordinates of the sample document images may be acquired first, and the mode of acquiring the corner coordinates of the sample document images may refer to the mode of acquiring the corner coordinates of the document images in the embodiments shown in fig. 1 and fig. 3, which is not described herein.
Further, after the corner coordinates of the sample document image are acquired, the actual aspect ratio of a sample document with a regular rectangular shape can be obtained through a related technology, for example by a formula computed from A, the internal reference matrix of the camera that captured the sample document, and M, the corner coordinates of the sample document image.
It should be noted that, for a sample document in the shape of an irregular rectangle, the actual aspect ratio may be obtained manually.
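The formula referenced above is rendered as an image in the original publication and is not reproduced in this text. A well-known closed form that matches the description, computing a rectangle's physical aspect ratio from its four projected corners M and the camera internal reference matrix A, is the derivation from Zhang and He's whiteboard-scanning work; the sketch below implements that derivation under the assumption that this is the "related technology" intended:

```python
import numpy as np

def aspect_ratio_from_corners(m1, m2, m3, m4, A):
    """Estimate the physical aspect ratio (width/height) of a rectangle
    from its four projected corners and the camera intrinsic matrix A.
    Corner order: m1 top-left, m2 top-right, m3 bottom-left, m4 bottom-right.
    Assumed equivalent of the patent's formula (whose image is omitted)."""
    m1, m2, m3, m4 = (np.append(np.asarray(m, float), 1.0)
                      for m in (m1, m2, m3, m4))   # homogeneous coordinates
    k2 = np.cross(m1, m4).dot(m3) / np.cross(m2, m4).dot(m3)
    k3 = np.cross(m1, m4).dot(m2) / np.cross(m3, m4).dot(m2)
    n2 = k2 * m2 - m1          # horizontal edge direction on the plane
    n3 = k3 * m3 - m1          # vertical edge direction on the plane
    Ainv = np.linalg.inv(A)
    return (np.sqrt(n2 @ Ainv.T @ Ainv @ n2)
            / np.sqrt(n3 @ Ainv.T @ Ainv @ n3))

A = np.array([[1000.0, 0.0, 200.0], [0.0, 1000.0, 150.0], [0.0, 0.0, 1.0]])
# A fronto-parallel 200x100 rectangle should give a ratio of 2.
r = aspect_ratio_from_corners((100, 100), (300, 100), (100, 200), (300, 200), A)
print(round(r, 6))  # 2.0
```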
S402, acquiring a sample feature vector corresponding to the sample document image of each sample document, so as to obtain a plurality of sample feature vectors.
In this step, the manner of acquiring the sample feature vector corresponding to each sample document image may refer to the manner of acquiring the feature vector of the document image in the embodiment shown in fig. 1 and fig. 3, which is not described herein again.
S403, training the target neural network model according to the plurality of sample feature vectors and the actual aspect ratios of the plurality of sample documents, to obtain the aspect ratio acquisition model.
The target neural network model may be an MLP (Multi-Layer Perceptron) neural network model, and may include a plurality of hidden layers. The greater the number of hidden layers, the higher the accuracy of the output aspect ratio, but the longer it takes to obtain the aspect ratio. Therefore, the number of hidden layers may be determined according to the type of the terminal: for a terminal with a higher real-time requirement and a lower accuracy requirement, fewer hidden layers may be set, for example, 2 hidden layers; for a terminal with a lower real-time requirement and a higher accuracy requirement, more hidden layers may be set, for example, 4 hidden layers. The present disclosure may determine the number of hidden layers comprehensively according to the real-time and accuracy requirements.
In this step, after the plurality of sample feature vectors are acquired, the target neural network model may be iteratively trained according to the actual aspect ratios of the plurality of sample feature vectors and the plurality of sample documents by a preset loss function of the target neural network model, to obtain the aspect ratio acquisition model. The preset loss function may be a square loss function, or may be another type of loss function, which is not limited in this disclosure. In the process of training the target neural network model, parameters of the target neural network model can be updated in a back propagation mode according to the loss value of the preset loss function, and finally the aspect ratio acquisition model is obtained.
In training the target neural network, the following steps may be performed in a loop: inputting a plurality of sample feature vectors into the target neural network model, acquiring a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is larger than a preset loss threshold value to obtain a new target neural network model; and taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is smaller than or equal to the preset loss threshold value.
Wherein the actual aspect ratio of a sample document may be used as the label of the sample feature vector corresponding to that sample document. In the process of training the target neural network model, the sample feature vector may be input into the target neural network model, the target neural network model may output a predicted aspect ratio of the sample document according to the sample feature vector, and the actual aspect ratio of the sample document may be obtained after the predicted aspect ratio is output. In the initial stage of training the target neural network model, the error between the predicted aspect ratio and the actual aspect ratio of the sample document may be relatively large; in an ideal case, the error becomes smaller and smaller during training. After the predicted aspect ratio and the actual aspect ratio of the sample document are obtained, the loss value of the preset loss function may be obtained. For example, if the preset loss function is the square loss function, the loss value may be obtained by the following formula:

L(S) = (R - MLP(S))²

where L(S) is the loss value, R is the actual aspect ratio, and MLP(S) is the predicted aspect ratio for the sample feature vector S.
After the loss value of the preset loss function is obtained, it can be compared with the preset loss threshold. If the loss value is greater than the preset loss threshold, the target neural network model does not yet meet the convergence condition and iterative training needs to continue; in this case, the parameters of the target neural network model can be adjusted according to the loss value to obtain a new target neural network model. Then, the plurality of sample feature vectors can be input into the new target neural network model, and a new loss value of the preset loss function is obtained in the same manner. If the new loss value is still greater than the preset loss threshold, the parameters of the new target neural network model can continue to be adjusted in the same manner; if the loss value is less than or equal to the preset loss threshold, training of the target neural network model is stopped, and the target neural network model corresponding to that loss value is taken as the aspect ratio acquisition model.
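The training loop above — compute the square loss, compare it against a preset loss threshold, update the parameters by back propagation, repeat — can be sketched with a toy one-hidden-layer MLP. The data, layer sizes, learning rate, and threshold below are illustrative choices of ours, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the sample feature vectors and actual aspect ratios.
X = rng.normal(size=(64, 10))                # 64 sample feature vectors
y = X @ rng.normal(size=10) * 0.1 + 1.4      # hypothetical actual aspect ratios

W1 = rng.normal(scale=0.1, size=(10, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=16);       b2 = 0.0
lr, loss_threshold = 0.1, 1e-3

for step in range(5000):
    h = np.tanh(X @ W1 + b1)                 # hidden layer activations
    pred = h @ W2 + b2                       # predicted aspect ratios
    loss = np.mean((y - pred) ** 2)          # preset (square) loss function
    if loss <= loss_threshold:               # convergence condition reached
        break
    # Back propagation: update parameters according to the loss value.
    g = 2 * (pred - y) / len(y)              # dLoss/dpred
    gW2 = h.T @ g; gb2 = g.sum()
    gh = np.outer(g, W2) * (1 - h ** 2)      # gradient through tanh
    gW1 = X.T @ gh; gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"stopped at step {step}, loss {loss:.5f}")
```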
It should be noted that the plurality of sample feature vectors may be divided into a training set and a test set. For example, if there are 100 sample feature vectors, 90 may be used as the training set and the remaining 10 as the test set. The target neural network model is trained through the sample feature vectors in the training set to obtain the aspect ratio acquisition model. One sample feature vector may be input at a time: the target neural network model outputs the predicted aspect ratio of the sample document according to that sample feature vector; after the predicted aspect ratio is output, the actual aspect ratio of the sample document is obtained, the loss value of the preset loss function is obtained according to the predicted aspect ratio and the actual aspect ratio, and the parameters of the target neural network model are adjusted according to the loss value to obtain a new target neural network model. Alternatively, a plurality of sample feature vectors may be input at a time: the target neural network model outputs the predicted aspect ratios of the plurality of sample documents according to the plurality of sample feature vectors; after the plurality of predicted aspect ratios are output, the actual aspect ratios of the plurality of sample documents are obtained, the loss value of the preset loss function is obtained according to the average value of the plurality of predicted aspect ratios and the average value of the plurality of actual aspect ratios, and the parameters of the target neural network model are adjusted according to the loss value to obtain a new target neural network model.
After the aspect ratio acquisition model is obtained, it can be verified through the sample feature vectors in the test set to obtain the error of the aspect ratio acquisition model; the smaller the error, the higher the accuracy of the aspect ratio acquisition model.
Illustratively, the error of the aspect ratio acquisition model can be obtained by the following formulas:

MER = (1/N) Σ error_rate_n, with error_rate_n = |r_cal - r_tru| / r_tru

where MER is the error of the aspect ratio acquisition model, N is the number of acquired aspect ratios, error_rate_n is the error of the aspect ratio acquired the n-th time, r_cal is the aspect ratio of the document obtained by the aspect ratio acquisition model, and r_tru is the actual aspect ratio of the document.
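Read as a mean relative error, the verification metric above can be sketched as follows. The exact formula is rendered as an image in the original publication, so the per-sample definition error_rate_n = |r_cal - r_tru| / r_tru is our reading of the symbol descriptions:

```python
def mean_error_rate(pairs):
    """Mean relative error of the model's aspect ratios over N
    (predicted, actual) pairs, averaging |r_cal - r_tru| / r_tru."""
    return sum(abs(r_cal - r_tru) / r_tru for r_cal, r_tru in pairs) / len(pairs)

# Hypothetical (predicted, actual) aspect ratio pairs from a test set.
pairs = [(1.42, 1.414), (0.70, 0.707), (1.30, 1.333)]
print(round(mean_error_rate(pairs), 4))
```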
By adopting the above model training method, a model capable of acquiring the aspect ratio of a document is obtained through training; compared with the related art, the aspect ratio of a document acquired through the aspect ratio acquisition model has higher accuracy.
Fig. 5 is a schematic diagram showing a structure of a document image processing apparatus according to an exemplary embodiment. As shown in fig. 5, the apparatus includes:
a document image acquisition module 501 configured to acquire a document image of a target document;
A coordinate acquiring module 502 configured to acquire corner coordinates and image principal point coordinates of the document image;
A vector obtaining module 503, configured to obtain a feature vector corresponding to the document image according to the corner coordinates and the principal point coordinates;
an aspect ratio acquisition module 504 configured to input the feature vector to a pre-trained aspect ratio acquisition model and output the aspect ratio of the target document.
Optionally, the coordinate acquisition module 502 includes:
a size acquisition sub-module configured to acquire a size of the document image;
A position extraction sub-module configured to extract an edge position corresponding to the target document in the document image;
the angular point coordinate acquisition submodule is configured to determine the angular point coordinate according to the edge position and the size of the document image;
An image principal point coordinate acquisition sub-module is configured to take a center point of the document image as the image principal point coordinate.
Optionally, the vector acquisition module 503 includes:
a transformation submodule configured to perform perspective transformation on the corner coordinates;
and a vector composing sub-module configured to compose the feature vector from the image principal point coordinates and the perspective transformed corner point coordinates.
Optionally, the transformation submodule is configured to:
obtaining a perspective transformation matrix;
the corner coordinates are multiplied by the perspective transformation matrix.
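The two steps of the transformation submodule — obtain a perspective transformation matrix and multiply the corner coordinates by it — amount to applying a 3x3 homography to the corner points in homogeneous coordinates. A minimal sketch (the matrix H below is a hypothetical example, not one estimated from a document image):

```python
import numpy as np

def apply_perspective(corners, H):
    """Multiply corner coordinates by a 3x3 perspective transformation
    matrix H, then de-homogenize. H is assumed given (e.g. estimated
    elsewhere from point correspondences)."""
    pts = np.hstack([np.asarray(corners, float), np.ones((len(corners), 1))])
    mapped = pts @ H.T                      # row vectors, so multiply by H^T
    return mapped[:, :2] / mapped[:, 2:3]   # divide out the homogeneous w

# A hypothetical matrix that shears and translates; w stays 1 here.
H = np.array([[1.0, 0.2, 10.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 1.0]])
corners = [(0, 0), (100, 0), (100, 50), (0, 50)]
print(apply_perspective(corners, H))
```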
Alternatively, fig. 6 is a schematic structural view of another document image processing apparatus shown according to an exemplary embodiment. As shown in fig. 6, the apparatus further includes:
A model acquisition module 505 configured to train a target neural network model through a sample set, resulting in the aspect ratio acquisition model; wherein the sample set comprises: a sample document image of the plurality of sample documents, and an actual aspect ratio of the plurality of sample documents.
Optionally, the model obtaining module 505 includes:
The sample vector acquisition submodule is configured to acquire sample feature vectors corresponding to sample document images of each sample document so as to acquire a plurality of sample feature vectors;
The model acquisition sub-module is configured to perform iterative training on the target neural network model through a preset loss function of the target neural network model according to a plurality of sample feature vectors and the actual aspect ratio of a plurality of sample documents, so as to obtain the aspect ratio acquisition model.
Optionally, the model acquisition submodule is configured to:
The following steps are circularly executed: inputting a plurality of sample feature vectors into the target neural network model, acquiring a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is larger than a preset loss threshold value to obtain a new target neural network model;
And taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is smaller than or equal to the preset loss threshold value.
By means of the above device, after the feature vector corresponding to the document image of the target document is acquired, the aspect ratio of the target document can be acquired through the pre-trained aspect ratio acquisition model. Because the aspect ratio acquisition model does not limit the shape of the document image, the accuracy of the acquired aspect ratio of the target document is higher.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method, and will not be described in detail herein.
The present disclosure also provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the document image processing method provided by the present disclosure.
Fig. 7 is a block diagram illustrating a terminal device 700 according to an exemplary embodiment. For example, the terminal device 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 7, a terminal device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls overall operation of the terminal device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or part of the steps of the document image processing method described above. Further, the processing component 702 can include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operation at the terminal device 700. Examples of such data include instructions for any application or method operating on the terminal device 700, contact data, phonebook data, messages, pictures, video, and the like. The memory 704 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 706 provides power to the various components of the terminal device 700. Power component 706 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for terminal device 700.
The multimedia component 708 comprises a screen providing an output interface between the terminal device 700 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal device 700 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a Microphone (MIC) configured to receive external audio signals when the terminal device 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 704 or transmitted via the communication component 716. In some embodiments, the audio component 710 further includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 714 includes one or more sensors for providing status assessment of various aspects for the terminal device 700. For example, the sensor assembly 714 may detect an on/off state of the terminal device 700, a relative positioning of the components, such as a display and keypad of the terminal device 700, a change in position of the terminal device 700 or a component of the terminal device 700, the presence or absence of a user's contact with the terminal device 700, an orientation or acceleration/deceleration of the terminal device 700, and a change in temperature of the terminal device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate communication between the terminal device 700 and other devices, either wired or wireless. The terminal device 700 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 716 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the document image processing methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as a memory 704 including instructions executable by the processor 720 of the terminal device 700 to perform the document image processing method described above. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In another exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned document image processing method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (8)
1. A document image processing method, the method comprising:
acquiring a document image of a target document;
Acquiring angular point coordinates and image principal point coordinates of the document image;
Acquiring a feature vector corresponding to the document image according to the corner coordinates and the principal point coordinates;
inputting the feature vector into a pre-trained aspect ratio acquisition model, and outputting the aspect ratio of the target document;
wherein, the obtaining the feature vector corresponding to the document image according to the angular point coordinates and the image principal point coordinates includes:
obtaining a perspective transformation matrix;
Multiplying the corner coordinates by the perspective transformation matrix, and performing perspective transformation on the corner coordinates to obtain perspective-transformed corner coordinates;
and forming the feature vector by the image principal point coordinates and the corner point coordinates after perspective transformation.
2. The method of claim 1, wherein the acquiring the corner coordinates and the image principal point coordinates of the document image comprises:
acquiring the size of the document image;
Extracting the edge position corresponding to the target document from the document image;
Determining the corner coordinates according to the edge positions and the sizes of the document images;
And taking the center point of the document image as the image principal point coordinate.
3. The method according to claim 1, wherein the aspect ratio acquisition model is trained by:
training a target neural network model through a sample set to obtain the aspect ratio acquisition model; wherein the sample set comprises: a sample document image of a plurality of sample documents, and an actual aspect ratio of the plurality of sample documents.
4. The method of claim 3, wherein training the target neural network model through the sample set to obtain the aspect ratio acquisition model comprises:
Obtaining sample feature vectors corresponding to sample document images of each sample document to obtain a plurality of sample feature vectors;
And carrying out iterative training on the target neural network model through a preset loss function of the target neural network model according to the sample feature vectors and the actual aspect ratios of the sample documents to obtain the aspect ratio acquisition model.
5. The method of claim 4, wherein iteratively training the target neural network model through a preset loss function of the target neural network model based on the plurality of sample feature vectors and the actual aspect ratios of the plurality of sample documents, the obtaining the aspect ratio acquisition model comprises:
The following steps are circularly executed: inputting a plurality of sample feature vectors into the target neural network model, acquiring a loss value of the preset loss function according to the actual aspect ratio of a plurality of sample documents, and updating parameters of the target neural network model according to the loss value under the condition that the loss value is larger than a preset loss threshold value to obtain a new target neural network model;
And taking the target neural network model corresponding to the loss value as the aspect ratio acquisition model until the loss value is smaller than or equal to the preset loss threshold value.
6. A document image processing apparatus, characterized by comprising:
A document image acquisition module configured to acquire a document image of a target document;
The coordinate acquisition module is configured to acquire angular point coordinates and image principal point coordinates of the document image;
the vector acquisition module is configured to acquire a feature vector corresponding to the document image according to the corner coordinates and the image principal point coordinates;
An aspect ratio acquisition module configured to input the feature vector to a pre-trained aspect ratio acquisition model, outputting an aspect ratio of the target document;
Wherein, the vector acquisition module includes:
A transformation submodule configured to obtain a perspective transformation matrix; multiply the corner coordinates by the perspective transformation matrix, and perform perspective transformation on the corner coordinates to obtain perspective-transformed corner coordinates;
and a vector composing sub-module configured to compose the feature vector from the image principal point coordinates and the perspective transformed corner point coordinates.
7. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1-5.
8. A terminal device, comprising:
a memory having a computer program stored thereon;
A processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011412437.8A CN112733599B (en) | 2020-12-04 | 2020-12-04 | Document image processing method and device, storage medium and terminal equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011412437.8A CN112733599B (en) | 2020-12-04 | 2020-12-04 | Document image processing method and device, storage medium and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112733599A CN112733599A (en) | 2021-04-30 |
CN112733599B true CN112733599B (en) | 2024-06-11 |
Family
ID=75598160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011412437.8A Active CN112733599B (en) | 2020-12-04 | 2020-12-04 | Document image processing method and device, storage medium and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112733599B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115482314B (en) * | 2021-05-27 | 2025-06-06 | 北京东方思鸿科技有限公司 | Document image self-annotation method, device, storage medium and electronic device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10032073B1 (en) * | 2015-06-25 | 2018-07-24 | Evernote Corporation | Detecting aspect ratios of document pages on smartphone photographs by learning camera view angles |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002027449A (en) * | 2000-07-10 | 2002-01-25 | Fujitsu Ltd | Moving object identification method and moving object identification device |
US9111140B2 (en) * | 2012-01-10 | 2015-08-18 | Dst Technologies, Inc. | Identification and separation of form and feature elements from handwritten and other user supplied elements |
RU2541353C2 (en) * | 2013-06-19 | 2015-02-10 | Общество с ограниченной ответственностью "Аби Девелопмент" | Automatic capture of document with given proportions |
CN106991649A (en) * | 2016-01-20 | 2017-07-28 | 富士通株式会社 | The method and apparatus that the file and picture captured to camera device is corrected |
US11012608B2 (en) * | 2016-09-12 | 2021-05-18 | Huawei Technologies Co., Ltd. | Processing method and mobile device |
US11159717B2 (en) * | 2019-04-18 | 2021-10-26 | eyecandylab Corporation | Systems and methods for real time screen display coordinate and shape detection |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10032073B1 (en) * | 2015-06-25 | 2018-07-24 | Evernote Corporation | Detecting aspect ratios of document pages on smartphone photographs by learning camera view angles |
Also Published As
Publication number | Publication date |
---|---|
CN112733599A (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3200125B1 (en) | Fingerprint template input method and device | |
US11825040B2 (en) | Image shooting method and device, terminal, and storage medium | |
US9959484B2 (en) | Method and apparatus for generating image filter | |
WO2018120238A1 (en) | File processing device and method, and graphical user interface | |
CN107944367B (en) | Face key point detection method and device | |
CN112200040B (en) | Occlusion image detection method, device and medium | |
CN106503682B (en) | Method and device for positioning key points in video data | |
CN104050645B (en) | Image processing method and device | |
CN109325908B (en) | Image processing method and device, electronic equipment and storage medium | |
US11252341B2 (en) | Method and device for shooting image, and storage medium | |
US9665925B2 (en) | Method and terminal device for retargeting images | |
CN112733599B (en) | Document image processing method and device, storage medium and terminal equipment | |
CN117616455A (en) | Multi-frame image alignment method, multi-frame image alignment device and storage medium | |
CN113865481B (en) | Object size measuring method, device and storage medium | |
CN114615520A (en) | Subtitle positioning method, subtitle positioning device, computer equipment and medium | |
CN107295229B (en) | The photographic method and device of mobile terminal | |
CN115760585A (en) | Image correction method, device, storage medium and electronic equipment | |
CN114418865A (en) | Image processing method, device, equipment and storage medium | |
CN111986097B (en) | Image processing method and device | |
CN111756985A (en) | Image shooting method, device and storage medium | |
CN111985280B (en) | Image processing method and device | |
CN112070681B (en) | Image processing method and device | |
CN114693707B (en) | Object contour template acquisition method, device, equipment and storage medium | |
KR101324809B1 (en) | Mobile terminal and controlling method thereof | |
CN109376588B (en) | A kind of face surveys luminous point choosing method, device and capture apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |