CN116883461B

CN116883461B - Method for acquiring clear document image and terminal device thereof

Info

Publication number: CN116883461B
Application number: CN202310567449.5A
Authority: CN
Inventors: Name withheld at the inventors' request
Original assignee: Zhuhai Yike Intelligent Technology Co ltd; Zhuhai Xinye Electronic Technology Co Ltd
Current assignee: Zhuhai Yike Intelligent Technology Co ltd; Zhuhai Xinye Electronic Technology Co Ltd
Priority date: 2023-05-18
Filing date: 2023-05-18
Publication date: 2024-03-01
Anticipated expiration: 2043-05-18
Also published as: CN116883461A

Abstract

The invention provides a method for acquiring a clear document image and a terminal device thereof. The method comprises: acquiring an original document image set to be processed; registering the original document image set by matching two or more document images acquired at different angles or under different focusing conditions, so that points corresponding to the same spatial position are matched one to one, and, once the matched key points are obtained, calculating the transformation relation among the images to obtain a registered document image set; and then extracting, position by position, the feature values of the same region on each image of the registered set and fusing the images according to these feature values to obtain a clear document image. By photographing several differently focused images of the same document content and fusing them according to how sharp each position of the document is in each image, the invention obtains an all-in-focus, clear scanned document image.

Description

Method for acquiring clear document image and terminal device thereof

Technical Field

The invention relates to the technical field of image processing, in particular to a method for acquiring a clear document image and a terminal device applying the method.

Background

With the spread of smart devices and of digital study and office work, converting paper documents into high-quality digital documents conveniently and quickly with a digital camera is becoming increasingly important. People now often scan and save document images with a mobile phone camera in daily life or at work. The typical workflow is to photograph the document with the phone and then apply image processing methods, such as contrast enhancement, to obtain a clearer document image. However, when the photographed document is relatively large, or when there is an angle between the camera and the plane of the document, part of the captured image may be out of focus and blurred. In this case, conventional image processing applied to a single image cannot restore sharpness to the blurred area, and deep-learning-based methods also struggle to achieve good results: on the one hand, when the text is severely blurred, a network cannot reliably infer and restore the text content from its features; on the other hand, deep learning requires substantial computing resources and running time, which makes it unsuitable for direct deployment on the portable smart devices used to photograph documents.

To work around this problem, several photographs usually have to be taken at different focus positions. Even so, reading the document remains inconvenient because the reader must switch back and forth among the photographs, and storing many photographs of one document wastes storage space.

Furthermore, when a user photographs and scans a paper document lying flat on a desk while seated, or photographs a blackboard or projection screen in a lecture room, the imaging plane of the camera is often not exactly parallel to the plane of the subject, so the document is locally out of focus and blurred, and characters in the defocused region are hard to recognize when the document is consulted later. In addition, when a document is scanned, the camera usually focuses on its center; on zooming in, the user then finds that the edges are blurred because of a slight angle between the imaging plane and the document plane or because of lens shake during the shot, which makes the document harder to read.

Disclosure of Invention

To address the problems that, when a photographed document is large, part of it is out of focus and blurred, and that keeping many photographs occupies a large amount of storage space, the invention provides a method for acquiring a clear document image and a terminal device thereof.

In order to solve the problems, the technical scheme adopted by the invention is as follows:

a method for acquiring a clear document image, comprising the steps of:

acquiring an original document image set to be processed;

registering the original document image set: two or more document images acquired at different angles or under different focusing conditions are matched so that points corresponding to the same spatial position in the original document image set are matched one to one, and after the matched key points are obtained, the transformation relation among the images of the original document image set is calculated to obtain a registered document image set;

after the registered document image set is obtained, extracting, position by position, the feature values of the same region on each image of the document image set, and performing image fusion according to the extracted feature values to obtain a clear document image.

In the method for acquiring a clear document image provided by the invention, the following may further be performed after the clear document image is obtained:

post-processing the document image: edge detection is performed on the fused document image and on the document image set before fusion; the detected document edges are fused using the registration relationship of the image set; the coordinates of the four vertices of the document in the fused image are calculated from the detected document edges; and the effective area of the scanned document is segmented from the document image and corrected, thereby obtaining a clear scanned document image.

According to the method for acquiring the clear document image provided by the invention, the acquisition of the document image set to be processed comprises the following steps:

selecting, from a storage unit, a plurality of images of the same document captured with different focus settings;

or directly acquiring, with the image acquisition unit, a plurality of document images while the focus distance is being changed;

wherein the image set used in one processing run is typically a set of images of the same document taken at different focus distances.

According to the method for acquiring clear document images provided by the invention, the document image set is registered, and the method comprises the following steps:

performing key point detection and feature description on each document image by using a feature extraction algorithm;

after key points and feature descriptions have been obtained for each document image in the set, measuring the distance between each pair of key-point descriptors with a matcher, and retaining correct matches with a ratio test to complete feature matching;

after the matched key points are obtained, further removing mismatched points with a random sample consensus (RANSAC) algorithm to obtain an initial perspective transformation matrix between the matched points;

and based on the matched key points and the obtained initial perspective transformation matrix, carrying out optimization by combining a nonlinear optimization algorithm to obtain a final transformation matrix.

In the method for acquiring a clear document image provided by the invention, when the transformation relation of the document image set is calculated, the following is further performed to reduce computation time:

the original document image set is reduced according to a preset proportion;

calculating a transformation matrix between the reduced image sets;

and performing corresponding inverse scaling operation on the transformation matrix to obtain the transformation matrix suitable for the original image size.

According to the method for acquiring the clear document image provided by the invention, the corresponding inverse scaling operation is performed on the transformation matrix, and the method comprises the following steps:

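setting the transformation matrix between the reduced first image and the reduced second image as formula (1):

$$H_s=\begin{bmatrix}h_{11}&h_{12}&h_{13}\\ h_{21}&h_{22}&h_{23}\\ h_{31}&h_{32}&h_{33}\end{bmatrix}\tag{1}$$

the transformation matrix $H$ between the first image and the second image after the corresponding inverse scaling operation is formula (2):

$$H=\begin{bmatrix}h_{11}&h_{12}&h_{13}/Scale\\ h_{21}&h_{22}&h_{23}/Scale\\ h_{31}\cdot Scale&h_{32}\cdot Scale&h_{33}\end{bmatrix}\tag{2}$$

wherein Scale is the scaling factor of the image;

registering the original image set according to the transformation matrix $H$.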
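As a minimal sketch, assuming a 3x3 homography stored as a NumPy array and Scale taken as the linear reduction factor (e.g. 1/4), the inverse scaling can be computed as $S^{-1} H_s S$ with $S = \mathrm{diag}(Scale, Scale, 1)$, because a point $x$ of the original image lies at $Sx$ in the reduced image. The function name `rescale_homography` is hypothetical.

```python
import numpy as np

def rescale_homography(H_small, scale):
    """Map a homography estimated on reduced images back to the original size.

    Assumes reduced coordinates are x_small = scale * x_original (e.g. scale = 0.25),
    so the original-size matrix is S^-1 @ H_small @ S with S = diag(scale, scale, 1).
    """
    S = np.diag([scale, scale, 1.0])
    return np.linalg.inv(S) @ np.asarray(H_small, dtype=float) @ S
```

Entrywise, this divides $h_{13}$ and $h_{23}$ by Scale, multiplies $h_{31}$ and $h_{32}$ by Scale, and leaves the remaining entries unchanged.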

According to the method for acquiring the clear document image, the image fusion is carried out according to the extracted characteristic values, and the method comprises the following steps:

performing pixel-level image fusion according to the extracted feature values;

calculating the feature value of each region with a sliding window;

and fusing the document image set into one clear image according to the feature values of the same region on the different images.

According to the method for acquiring the clear document image provided by the invention, the document region extraction and deformation correction are carried out on the fused document image, and the method comprises the following steps:

edge detection of a document area is carried out on the fused document image;

optionally, performing document-area edge detection on the document images before fusion, mapping each detected edge onto the fused document image according to the registration mapping of the image set, and fusing all the edge information by summation;

performing straight-line fitting on the detected edges;

calculating the four vertices of the document area from the fitted edge lines;

and calculating a transformation matrix from the four vertices and applying it to the fused document image to obtain a corrected document image.
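The last step can be illustrated with a small sketch, assuming the four vertices are ordered top-left, top-right, bottom-right, bottom-left and that the corrected page should be a `width` x `height` rectangle (both hypothetical parameters). It solves the standard 8-unknown linear system with $h_{33}=1$, the same quantity OpenCV's `cv2.getPerspectiveTransform` returns for four point pairs; the patent does not state which formulation it uses.

```python
import numpy as np

def perspective_from_corners(corners, width, height):
    """3x3 matrix mapping corners (TL, TR, BR, BL) onto a width x height rectangle."""
    targets = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    A, b = [], []
    for (x, y), (u, v) in zip(corners, targets):
        # u * (h31*x + h32*y + 1) = h11*x + h12*y + h13, and likewise for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # h33 fixed to 1
```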

A terminal apparatus for acquiring a clear document image, comprising:

a memory for storing image data and instructions executable by the processor;

a processor for processing data, executing instructions, and performing operations;

an image acquisition unit for acquiring the original scanned document images to be processed;

and an image output unit for displaying or printing the processed document image.

Therefore, compared with the prior art, the method provided by the invention fuses several differently focused images of a document to obtain an all-in-focus document image and combines this with a suitable edge detection algorithm and document correction algorithm, so that a standard, clear scanned document can be obtained that is convenient to store and review later. With a high-quality printer, the scanned document photograph can be used to print a clear document directly. In addition, the method requires far less computation than deep-learning methods, so it can be deployed easily and quickly on mobile terminal devices: the clear document image is synthesized locally on the device immediately after the camera captures the document images, which also avoids the information-security risks that transmitting document data over a network may bring.

Furthermore, the invention detects the document edge in several images (the fused image and the individual images before fusion) and combines the results, so that the document edge remains globally consistent, false edges in the background and inside the image are suppressed, and the accuracy and precision of edge detection are improved. This makes the subsequent straight-line fitting more reliable and allows the vertices to be located accurately.

The invention is described in further detail below with reference to the drawings and the detailed description.

Drawings

Fig. 1 is a first flowchart of an embodiment of a method of the present invention for acquiring a clear document image.

Fig. 2 is a second flowchart of an embodiment of the method of the present invention for acquiring a clear document image.

Fig. 3 is a schematic diagram of an embodiment of a terminal apparatus for acquiring a clear document image according to the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1 and 2, a method for acquiring a clear document image according to the present embodiment includes the following steps:

Step S1: acquiring the original document image set to be processed, specifically: selecting, from a storage unit, a plurality of images of the same document captured with different focus settings; or directly acquiring, with the image acquisition unit, a plurality of document images while the focus distance is being changed. The image set used in one processing run is usually a set of images of the same document taken at different focus distances; for example, a left-focused image, a center-focused image and a right-focused image of the same document can form one valid document image set to be processed. For best results, every part of the document should be in focus and sharp in at least one image of the set. Taking into account the resolution and depth of field of common image acquisition units and the image processing time, this embodiment fuses 2 to 5 images at a time.

Step S2: registering the original document image set, i.e., matching two or more document images acquired at different angles or under different focusing conditions so that points corresponding to the same spatial position in the set are matched one to one, and, after the matched key points are obtained, calculating the transformation relation among the images to obtain a registered document image set;

Step S3: after the registered document image set is obtained, extracting, position by position, the feature values of the same region on each image of the set, and performing image fusion according to the extracted feature values to obtain a clear document image.

In this embodiment, after the clear document image is obtained, the following post-processing may also be performed:

In this embodiment, calculating the four vertex coordinates of the document from the detected document edges comprises: performing edge detection on the fused document image and on the document image set before fusion; fusing the detected document edges using the registration relationship of the image set; fitting line segments to the detected document edges; screening and assembling the fitted line segments to obtain the edge lines of the effective document area; and calculating the four vertex coordinates from these edge lines;

in this embodiment, correcting the document image comprises: correcting the document image through a perspective transformation according to the four vertex coordinates to generate the final scanned document image.

In the above step S2, registering the document image set includes:

In this embodiment, when calculating the transformation relation of the document image set, the following is further performed to reduce computation time:

reducing the original document image set according to a preset proportion;

calculating a transformation matrix between the reduced image sets;

Specifically, the purpose of registering the image set in step S2 is to match two or more images acquired at different angles or under different focusing conditions so that points corresponding to the same spatial position in the image set are matched one to one, which allows the image information to be fused in the subsequent operations. The processing used in this embodiment comprises:

extracting features from each document image in the set, matching the features between images, and calculating the transformation relation among the images.

This embodiment uses a feature extraction algorithm to perform key-point detection and feature description on each image. Usable features include Harris corners (Harris), the scale-invariant feature transform (Scale-invariant Feature Transform, SIFT), speeded-up robust features (Speeded Up Robust Features, SURF), local binary patterns (Local Binary Patterns, LBP), histograms of oriented gradients (Histogram of Oriented Gradient, HOG), and oriented FAST and rotated BRIEF (Oriented Features from Accelerated Segment Test and Rotated Binary Robust Independent Elementary Features, ORB), among others.

After key points and feature descriptions have been obtained for each document image in the set, a matcher measures the distance between each pair of key-point descriptors, and a ratio test then retains the correct matches to complete feature matching. The matcher may be a brute-force matcher (Brute Force Matcher) or a FLANN-based nearest-neighbor matcher (Flann Based Matcher).
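A minimal sketch of brute-force matching followed by a ratio test, assuming float descriptors compared by Euclidean distance (binary descriptors such as ORB would use Hamming distance instead). The function name `ratio_test_match` and the default ratio of 0.75 are illustrative choices, not values given in the patent.

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Brute-force match descriptors of image A against image B with a ratio test.

    desc_a: (Na, D) array, desc_b: (Nb, D) array with Nb >= 2.
    Returns (index_in_a, index_in_b) pairs that pass the test.
    """
    desc_a = np.asarray(desc_a, dtype=np.float64)
    desc_b = np.asarray(desc_b, dtype=np.float64)
    # Pairwise Euclidean distances, shape (Na, Nb).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    # The two nearest neighbours in B for every descriptor in A.
    order = np.argsort(dists, axis=1)[:, :2]
    matches = []
    for i, (j1, j2) in enumerate(order):
        # Keep the match only if the best candidate is clearly better than the runner-up.
        if dists[i, j1] < ratio * dists[i, j2]:
            matches.append((i, int(j1)))
    return matches
```

Ambiguous descriptors (two near-equal candidates) are discarded, which is what removes many wrong matches on repetitive text.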

After the matched key points are obtained, the random sample consensus algorithm (RANSAC) is used to further remove mismatched points and obtain an initial perspective transformation matrix between the matched points.
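A compact RANSAC sketch, assuming NumPy and matched pixel coordinates as (N, 2) arrays: each iteration fits a homography to 4 random correspondences with the normalized direct linear transform and counts inliers by reprojection error, and the best model is refitted on its inliers. The iteration count, the 3-pixel threshold and all function names are illustrative assumptions; in practice OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold)` provides this step.

```python
import numpy as np

def _normalise(pts):
    # Hartley normalisation: centre the points, scale mean distance to sqrt(2).
    mean = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[s, 0, -s * mean[0]], [0, s, -s * mean[1]], [0, 0, 1.0]])
    return (pts - mean) * s, T

def fit_homography(src, dst):
    """Direct linear transform from >= 4 correspondences, dst ~ H @ src."""
    ns, Ts = _normalise(src)
    nd, Td = _normalise(dst)
    rows = []
    for (x, y), (u, v) in zip(ns, nd):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Null-space solution: right singular vector of the smallest singular value.
    Hn = np.linalg.svd(np.asarray(rows))[2][-1].reshape(3, 3)
    H = np.linalg.inv(Td) @ Hn @ Ts
    return H / H[2, 2]

def project(H, pts):
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=3.0, seed=0):
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(iters):
            sample = rng.choice(len(src), 4, replace=False)
            H = fit_homography(src[sample], dst[sample])
            if not np.all(np.isfinite(H)):
                continue  # degenerate (e.g. collinear) sample
            inliers = np.linalg.norm(project(H, src) - dst, axis=1) < thresh
            if inliers.sum() > best.sum():
                best = inliers
    # Refit on all inliers of the best model (assumes at least 4 were found).
    return fit_homography(src[best], dst[best]), best
```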

Because the lens distortion model is nonlinear, the matched key points and the initial perspective transformation matrix are further refined with the Levenberg-Marquardt nonlinear optimization algorithm to obtain the final transformation matrix.
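The refinement can be sketched as Levenberg-Marquardt on the reprojection error, assuming NumPy, a forward-difference Jacobian, and $h_{33}$ fixed to 1 so that 8 parameters remain. The names and the damping schedule are illustrative assumptions; OpenCV's `cv2.findHomography` documentation describes a similar Levenberg-Marquardt refinement of the robust estimate using the inliers.

```python
import numpy as np

def project(H, pts):
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def refine_homography_lm(H0, src, dst, iters=50, lam=1e-3):
    """Minimise the sum of squared reprojection errors over h11..h32 (h33 = 1)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    H0 = np.asarray(H0, dtype=float)
    p = (H0 / H0[2, 2]).ravel()[:8]

    def residual(q):
        return (project(np.append(q, 1.0).reshape(3, 3), src) - dst).ravel()

    r = residual(p)
    for _ in range(iters):
        # Forward-difference Jacobian of the residual vector, one column per parameter.
        J = np.empty((r.size, 8))
        for k in range(8):
            step = 1e-6 * max(1.0, abs(p[k]))
            q = p.copy()
            q[k] += step
            J[:, k] = (residual(q) - r) / step
        A, g = J.T @ J, J.T @ r
        # Damped normal equations with Marquardt scaling of the diagonal.
        delta = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        r_new = residual(p + delta)
        if r_new @ r_new < r @ r:
            p, r, lam = p + delta, r_new, lam * 0.1  # accept: behave more like Gauss-Newton
        else:
            lam *= 10.0                              # reject: behave more like gradient descent
        if np.linalg.norm(delta) < 1e-12:
            break
    return np.append(p, 1.0).reshape(3, 3)
```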

Other image-set registration methods may of course be used in this step, such as mutual information registration (Mutual Information), normalized mutual information registration (Normalized Mutual Information), entropy correlation coefficient registration (Entropy Correlation Coefficient), Horn-Schunck optical flow registration (Horn-Schunck), Lucas-Kanade optical flow registration (Lucas-Kanade), and deep-learning-based registration such as VoxelMorph.

In addition, with the rapid development of technology, modern portable smart devices such as mobile phones are equipped with higher-resolution image sensors, for example 12-megapixel and 50-megapixel cameras. Higher resolution captures more document detail, but image processing time increases rapidly with resolution. To reduce time consumption when calculating the transformation relation of the image set, the images can first be reduced, the transformation matrix between the reduced images calculated, and the corresponding inverse scaling applied to this matrix to obtain a transformation matrix for the original image size. This is implemented as follows:

scaling all images of the original set down by the same factor Scale; for example, if the images are reduced to 1/4 of their original width and height, Scale = 1/4;

calculating a transformation matrix between the reduced image sets;

inversely scaling the transformation matrix to obtain the transformation matrix at the original size. Let the transformation matrix between reduced image 1 and reduced image 2 be:

$$H_s=\begin{bmatrix}h_{11}&h_{12}&h_{13}\\ h_{21}&h_{22}&h_{23}\\ h_{31}&h_{32}&h_{33}\end{bmatrix}$$

Since a point $x$ of an original image lies at $Sx$ in the reduced image, with $S=\mathrm{diag}(Scale, Scale, 1)$, the transformation matrix between original image 1 and original image 2 is $H=S^{-1}H_sS$, that is:

$$H=\begin{bmatrix}h_{11}&h_{12}&h_{13}/Scale\\ h_{21}&h_{22}&h_{23}/Scale\\ h_{31}\cdot Scale&h_{32}\cdot Scale&h_{33}\end{bmatrix}$$

registering the original image set according to the new transformation matrix $H$. In addition, if the images were reduced too strongly, the transformation matrix $H$ can be further fine-tuned using the original image set.

Specifically, in step S3, after the registered document image set is obtained, the images are fused with an image fusion technique. Image fusion techniques are generally divided into pixel-level, feature-level and decision-level fusion. Pixel-level fusion generally preserves detail better, so this embodiment uses pixel-level image fusion: the feature values of the same region on each image are extracted position by position from the document image set, and the images are fused according to the extracted feature values to obtain a clear document image, as follows:

calculating the feature value of each region with a sliding window, where the feature value may be the local variance, the local image entropy, a common convolution feature (such as a Sobel or Laplacian response), and the like;

fusing the image set into one clear image according to the feature values of the same region on the different images, using methods such as feature-based weighted averaging or multi-band blending (Multiband Blender).

In addition, the registered image set can be fused with the help of a feature pyramid, which effectively reduces ringing artifacts and improves the quality of the fused image.
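For the local-variance option, a minimal sketch assuming grayscale NumPy images, an odd window size `k` (a hypothetical parameter) and edge replication at the borders. Window sums come from an integral image, so the cost per pixel does not grow with `k`.

```python
import numpy as np

def box_mean(img, k):
    """Mean over the k x k window (k odd) centred on each pixel, edges replicated."""
    r = k // 2
    padded = np.pad(img, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape
    s = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return s / (k * k)

def local_variance(img, k=7):
    """Sliding-window variance E[x^2] - E[x]^2, a simple sharpness feature."""
    img = np.asarray(img, dtype=np.float64)
    mean = box_mean(img, k)
    return np.maximum(box_mean(img * img, k) - mean * mean, 0.0)
```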
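A minimal sketch of both fusion options, assuming registered grayscale images of equal size as NumPy arrays. The feature here is the windowed energy of a 4-neighbour Laplacian, one of the convolution features listed above; the names `focus_measure` and `fuse`, the window size and the small epsilon are illustrative assumptions. With `hard=True` each pixel is taken from the image with the largest feature value; with `hard=False` the result is a feature-weighted average.

```python
import numpy as np

def focus_measure(img, k=7):
    """Local energy of a 4-neighbour Laplacian averaged over a k x k window (k odd)."""
    p = np.pad(np.asarray(img, float), 1, mode="edge")
    lap = 4 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:]
    e = np.pad(lap * lap, k // 2, mode="edge")
    h, w = lap.shape
    # Sliding-window mean via shifted slices (adequate for small k).
    return sum(e[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)) / (k * k)

def fuse(images, k=7, hard=True):
    stack = np.stack([np.asarray(im, float) for im in images])   # (N, H, W)
    fm = np.stack([focus_measure(im, k) for im in stack])        # (N, H, W)
    if hard:
        # Pixel-wise selection: take each pixel from the sharpest image.
        best = np.argmax(fm, axis=0)
        return np.take_along_axis(stack, best[None], axis=0)[0]
    # Soft variant: average weighted by the feature values.
    w = fm + 1e-12
    return (w * stack).sum(axis=0) / w.sum(axis=0)
```

Hard selection keeps text strokes crisp but can show seams where the sharpest image changes; the weighted average is smoother, which is one reason the text also mentions multi-band blending.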
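One common way to fuse with a pyramid is a Laplacian pyramid with max-magnitude selection of the detail coefficients; whether this matches the patent's "feature pyramid" exactly is an assumption. The sketch below simplifies the usual Gaussian filtering to 2x2 averaging and nearest-neighbour upsampling, assumes image sides divisible by `2**levels`, and uses hypothetical function names. Because each level stores `current - up(down(current))`, collapsing the pyramid reconstructs an image exactly.

```python
import numpy as np

def down(img):
    # 2x2 block average (assumes even height and width).
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def up(img):
    # Nearest-neighbour 2x upsampling.
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    pyr, cur = [], np.asarray(img, float)
    for _ in range(levels):
        small = down(cur)
        pyr.append(cur - up(small))  # band-pass detail at this scale
        cur = small
    pyr.append(cur)                  # coarsest low-pass residual
    return pyr

def pyramid_fuse(images, levels=3):
    pyrs = [laplacian_pyramid(im, levels) for im in images]
    fused = []
    for lvl in range(levels):
        bands = np.stack([p[lvl] for p in pyrs])
        # Keep the coefficient with the largest magnitude (strongest local detail).
        idx = np.argmax(np.abs(bands), axis=0)
        fused.append(np.take_along_axis(bands, idx[None], axis=0)[0])
    out = np.mean([p[levels] for p in pyrs], axis=0)  # average the coarse base
    for lvl in reversed(range(levels)):
        out = up(out) + fused[lvl]  # collapse: inverse of laplacian_pyramid
    return out
```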

To extend the functionality further, after the clear document image is obtained, a post-processing operation can be added (see Fig. 2): edge detection is performed on the fused document image and on the document image set before fusion, the four vertex coordinates of the document are calculated from the detected document edges, and the effective area of the scanned document is segmented from the image according to these coordinates and corrected to obtain a clear scanned document image. This is implemented as follows:

the edge detection is carried out on the fused document image and the document image set before fusion, and the method comprises the following steps:

using an edge detection model to detect the document edge in the fused document image, using the same model to detect the document edges in the document image set before fusion, and stacking all the document edge information using the known registration information to obtain a first document edge. In this embodiment, the edge detection model is trained with combined pixel-level and semantic information; the fused document image is input into the model for edge prediction, yielding a document edge probability map, which is taken as the first document edge. In the probability map, the value of each pixel is the probability that the pixel at the corresponding position of the document image being processed belongs to the document edge, in the range [0.0, 1.0].

Binarizing the first document edge to obtain a second document edge. Threshold binarization is used: the edge probability of every pixel in the probability map that constitutes the first document edge is traversed; if a pixel's probability is greater than or equal to the set threshold p, it is reassigned to 1, and if it is smaller than p, it is reassigned to 0.

Filtering the second document edge to obtain a third document edge. The second document edge may contain edge fragments in the background or inside the document area that belong not to the document border but to interference, so filtering is needed to improve accuracy. The filtering comprises: finding all connected edge components in the second document edge with a connected-component algorithm and calculating the area of each; components whose area is smaller than the set threshold are removed, and only those with an area larger than the threshold are retained. The result is the third document edge.

Thinning the third document edge to obtain a fourth document edge, which is used as the document edge. A thinning algorithm extracts the skeleton of the image; here it may be the Zhang-Suen thinning algorithm.
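The binarization and filtering steps can be sketched as follows, assuming the first document edge is a NumPy probability map. The default threshold `p=0.5`, the use of 8-connectivity, and keeping components whose area equals `min_area` are illustrative assumptions; the patent only specifies "a set threshold".

```python
import numpy as np
from collections import deque

def binarize(prob, p=0.5):
    """Threshold binarisation: 1 where the edge probability is >= p, else 0."""
    return (np.asarray(prob) >= p).astype(np.uint8)

def filter_small_components(binary, min_area):
    """Remove 8-connected components with fewer than min_area pixels."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    keep = np.zeros((h, w), dtype=np.uint8)
    current = 0
    for y in range(h):
        for x in range(w):
            if not binary[y, x] or labels[y, x]:
                continue
            current += 1                      # start a new component
            labels[y, x] = current
            queue, pixels = deque([(y, x)]), []
            while queue:                      # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
            if len(pixels) >= min_area:       # keep only large enough fragments
                for py, px in pixels:
                    keep[py, px] = 1
    return keep
```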
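A straightforward (unoptimized) sketch of Zhang-Suen thinning on a 0/1 NumPy array, written from the standard two-sub-iteration description; the function name is hypothetical. Neighbours are numbered P2 (north) clockwise to P9 (north-west); B counts foreground neighbours and A counts 0-to-1 transitions around the ring.

```python
import numpy as np

def zhang_suen_thin(binary):
    img = np.pad((np.asarray(binary) > 0).astype(np.uint8), 1)  # zero border
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y, x in zip(*np.nonzero(img)):
                p2, p3, p4, p5 = img[y - 1, x], img[y - 1, x + 1], img[y, x + 1], img[y + 1, x + 1]
                p6, p7, p8, p9 = img[y + 1, x], img[y + 1, x - 1], img[y, x - 1], img[y - 1, x - 1]
                ring = [p2, p3, p4, p5, p6, p7, p8, p9]
                b = sum(ring)                                       # foreground neighbours
                a = sum(ring[i] == 0 and ring[(i + 1) % 8] == 1 for i in range(8))
                if step == 0:   # remove south-east boundary and north-west corner points
                    cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:           # remove north-west boundary and south-east corner points
                    cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if 2 <= b <= 6 and a == 1 and cond:
                    to_delete.append((y, x))
            for y, x in to_delete:      # delete only after the whole pass
                img[y, x] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1]
```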

Line segments are then fitted to the document edge, screened and assembled to obtain the edge lines of the effective document area, and finally the 4 vertex coordinates are calculated from these edge lines. Alternatively, a deep learning method can be used in this step to identify the document area and obtain the four vertex coordinates of the document. The document image is then corrected by perspective transformation according to the four vertex coordinates to generate the final document image.

Specifically, straight-line detection on the detected edges yields a set of lines along the document border; the rectangular frames formed by any four lines in this set are then enumerated, and the vertices of the largest frame are taken as the vertex coordinates of the document in the document image.

A perspective transformation matrix is then calculated from the obtained vertex coordinates and used to perspective-correct the fused document image, yielding a final, distortion-corrected document image seen from a frontal viewpoint, which is convenient for the user to read and archive and improves the user experience.
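The line fitting can be sketched as a total-least-squares fit, assuming the edge pixels belonging to one border segment have already been grouped (that grouping step is not shown). Unlike fitting y = mx + b, this form also handles the near-vertical left and right edges of a document. The function name and the (a, b, c) representation are illustrative.

```python
import numpy as np

def fit_line(points):
    """Total-least-squares line through 2-D points.

    Returns (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value of the
    # centred points is the line normal (direction of least spread).
    _, _, vt = np.linalg.svd(pts - centroid)
    a, b = vt[-1]
    c = -(a * centroid[0] + b * centroid[1])
    return a, b, c
```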
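A brute-force sketch of this selection, assuming each detected line is given as (a, b, c) with a*x + b*y + c = 0 and that lines whose normal points mostly along y are the candidate top and bottom borders. Intersections use homogeneous cross products and the area uses the shoelace formula; the function names are hypothetical. The returned corners are in cyclic order but do not necessarily start at the top-left, so they may need reordering before perspective correction.

```python
import itertools
import numpy as np

def intersect(l1, l2):
    """Intersection point of two lines (a, b, c), or None if they are parallel."""
    p = np.cross(l1, l2)
    if abs(p[2]) < 1e-12:
        return None
    return p[:2] / p[2]

def polygon_area(pts):
    x, y = np.asarray(pts).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def largest_quad(lines):
    lines = [np.asarray(l, dtype=float) for l in lines]
    horiz = [l for l in lines if abs(l[1]) > abs(l[0])]   # normal mostly along y
    vert = [l for l in lines if abs(l[1]) <= abs(l[0])]
    best, best_area = None, 0.0
    for top, bottom in itertools.combinations(horiz, 2):
        for left, right in itertools.combinations(vert, 2):
            corners = [intersect(top, left), intersect(top, right),
                       intersect(bottom, right), intersect(bottom, left)]
            if any(c is None for c in corners):
                continue
            area = polygon_area(corners)
            if area > best_area:
                best, best_area = corners, area
    return best
```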
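A minimal sketch of the correction step, assuming a grayscale NumPy image (at least 2x2), corners ordered top-left, top-right, bottom-right, bottom-left, and a matrix H that maps photo coordinates to corrected-page coordinates (for example one computed from the four vertices). Taking the output size from the longer of each pair of opposite edges is a common heuristic, not something the patent specifies. Pixels are filled by inverse mapping with bilinear interpolation, and samples outside the image are clamped to the border. The names are hypothetical; `cv2.warpPerspective` does the same job in OpenCV.

```python
import numpy as np

def output_size(corners):
    """Width and height of the corrected page from corners (TL, TR, BR, BL)."""
    tl, tr, br, bl = (np.asarray(c, float) for c in corners)
    width = max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl))
    height = max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr))
    return int(round(width)), int(round(height))

def warp_perspective_bilinear(img, H, width, height):
    """Sample img at H^-1 applied to every output pixel, with bilinear interpolation."""
    img = np.asarray(img, float)
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.linalg.inv(H) @ np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy = pts[0] / pts[2], pts[1] / pts[2]
    # Clamp so that the 2x2 neighbourhood always lies inside the image.
    x0 = np.clip(np.floor(sx).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, img.shape[0] - 2)
    fx = np.clip(sx - x0, 0.0, 1.0)
    fy = np.clip(sy - y0, 0.0, 1.0)
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return (top * (1 - fy) + bottom * fy).reshape(height, width)
```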

Furthermore, detecting the document edge in several images and combining the results keeps the document edge globally consistent, suppresses false edges in the background and inside the image, and improves the accuracy and precision of edge detection, which makes the subsequent straight-line fitting more reliable and allows the vertices to be located accurately.

Further, the invention fits straight lines to the document edges to determine a set of lines, determines the four vertices of the document in the document image to be processed from this line set, and applies a perspective transformation to the fused document image using the four vertices to obtain the corrected document.

Terminal device embodiment for acquiring clear document image

Referring to fig. 3, a terminal apparatus for acquiring a clear document image provided in this embodiment includes:

a memory for storing image data and instructions executable by the processor;

an image output unit for displaying or printing the processed document image, which may also be a third-party output device such as an external display or printer;

the terminal may be a portable device with a standard operating system, such as a smartphone or a tablet computer.

In this embodiment, the document image set to be processed may be obtained by selecting several document images from local memory, by acquiring several document images directly from the image acquisition unit, or from a short video of the document or an Apple Live Photo. The image set is usually a set of images of the same document at different focus distances; for example, a left-focused image, a center-focused image and a right-focused image of the same document can form one valid document image set to be processed.

The memory in this embodiment is a digital electronic semiconductor device for storing program instructions and various data, and is generally divided into internal memory and external storage. Programs and data are usually kept in external storage; when a program is to be executed, its instructions and related data are loaded into internal memory for execution.

The processor in this embodiment is a microprocessor that interprets computer instructions and processes data, typically a central processing unit (Central Processing Unit); it may be a complex instruction set computing (CISC) microprocessor or a reduced instruction set computer (RISC) microprocessor.

The image acquisition unit in this embodiment is typically a camera, a video camera or a scanner, or an intelligent terminal device with a photographing function such as a smartphone or tablet computer. If the image acquisition unit is separate from the processing terminal, the image data must be transferred to the processing terminal over an additional data link.

The image output unit in this embodiment is mainly used for displaying the processing result in the form of an image, and may be a printer, a projector, a display screen, or the like. The output device may be a display integrated on the terminal or may be a third party output device connected by wired or wireless data transmission.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be regarded as within the scope of this description.

The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, but any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention are intended to be within the scope of the present invention as claimed.

Claims

1. A method for acquiring a clear document image, comprising the steps of:

acquiring an original document image set to be processed;

after the registered document image set is obtained, extracting, position by position, the feature values of the same region on each image of the document image set, and performing image fusion according to the extracted feature values to obtain a clear document image;

wherein after obtaining the clear document image, optionally performing a post-processing operation:

edge detection of a document area is carried out on the fused document image;

performing document-area edge detection on the document images before fusion, mapping each detected edge onto the fused document image according to the registration mapping of the image set, and fusing all the edge information by summation;

performing straight-line fitting on the detected edges;

2. The method according to claim 1, characterized in that:

the acquiring the document image set to be processed comprises the following steps:

3. The method of claim 1, wherein registering the set of document images comprises:

4. The method according to claim 1, characterized in that:

when calculating the transformation relation of the document image set, further performing, to reduce computation time:

the original document image set is reduced according to a preset proportion;

calculating a transformation matrix between the reduced image sets;

5. The method of claim 4, wherein said performing a corresponding inverse scaling operation on the transformation matrix comprises:

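$$H_s=\begin{bmatrix}h_{11}&h_{12}&h_{13}\\ h_{21}&h_{22}&h_{23}\\ h_{31}&h_{32}&h_{33}\end{bmatrix}\tag{1}$$

$$H=\begin{bmatrix}h_{11}&h_{12}&h_{13}/Scale\\ h_{21}&h_{22}&h_{23}/Scale\\ h_{31}\cdot Scale&h_{32}\cdot Scale&h_{33}\end{bmatrix}\tag{2}$$

wherein $H_s$ is the transformation matrix between the reduced first image and the reduced second image, $H$ is the transformation matrix after the corresponding inverse scaling operation, and Scale is the scaling factor of the image;

registering the original image set according to the transformation matrix $H$.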

6. The method of claim 1, wherein the image fusion based on the extracted feature values comprises: