CN113591846A - Image distortion coefficient extraction method, distortion correction method and system, and electronic device

Info

Publication number: CN113591846A
Application number: CN202110841216.0A
Authority: CN
Inventors: 吴哲楠; 安�晟; 田宝亮; 李霄鹏; 黄宇飞; 王岩
Original assignee: Zuoyebang Education Technology Beijing Co Ltd
Current assignee: Beijing Baige Feichi Technology Co ltd
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2021-11-02

Abstract

An image distortion coefficient extraction method, a distortion correction method and system, and an electronic device are provided. The image distortion coefficient extraction method comprises: extracting a distortion coefficient from an image to be corrected using a pre-trained artificial intelligence model, the artificial intelligence model being a U-Net network. The image distortion correction method comprises: extracting the distortion coefficient of the image to be corrected according to the image distortion coefficient extraction method; and performing distortion correction on the image to be corrected based on the distortion coefficient. The distortion detection method is simple and fast: a simple model trained by machine learning can quickly and accurately determine the degree of distortion present in the image, down to the pixel level. The distortion correction method of the invention uses an advanced deep learning algorithm and offers strong robustness, short processing time and good results.

Description

Image distortion coefficient extraction method, distortion correction method and system, and electronic device

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to an image distortion coefficient extraction method, a distortion correction method and system, and an electronic device and a computer-readable medium employing the same.

Background

In daily life and in the business of internet companies, document images frequently need to be processed, and key elements in them, such as characters and figures, are often distorted. Once these key elements are distorted, subsequent applications become difficult. Reconstructing the distorted image so that it is as flat as possible would greatly benefit both the user experience and subsequent business development. A traditional distortion correction method works as follows: the characters recognized in a captured image are divided into a plurality of connected domains by text-domain segmentation, each connected domain containing one Chinese character; the upper, lower, left and right boundaries of each connected domain are marked, and the connected domains are joined into text lines by means of those boundaries; a correction reference line is then detected from the first connected domain of each text line, and character distortion is corrected according to the vertical displacement of each character relative to the correction reference line. This method works well on documents consisting largely of text, but not on documents that mix pictures and text. In recent years a number of adaptive methods based on deep learning have also appeared. However, both the traditional methods and the early deep learning methods give poor results, lack robustness and are time-consuming, so they cannot meet the demands of large-scale use. How to develop a method and a system for correcting image distortion is therefore a problem to be solved.

Disclosure of Invention

In view of the above, the present invention aims to provide an image distortion coefficient extraction method, a distortion correction method and system, and an electronic device and a computer-readable medium employing the same, so as to at least partially solve at least one of the above technical problems.

In order to achieve the above object, as a first aspect of the present invention, there is provided an image distortion coefficient extraction method, comprising the following steps:

extracting a distortion coefficient from an image to be corrected using a pre-trained artificial intelligence model;

wherein the pre-trained artificial intelligence model is a U-Net network, the input of the U-Net network is the distorted image to be corrected together with the coordinate values of the position to be extracted, and the output is the offset in the x direction and the offset in the y direction that those coordinate values require for distortion recovery.

As a second aspect of the present invention, there is also provided an image distortion correction method including the steps of:

extracting the distortion coefficient of the image to be corrected according to the image distortion coefficient extraction method described above;

and performing distortion correction on the image to be corrected based on the distortion coefficient.

As a third aspect of the present invention, there is also provided an image distortion correction system including:

a pre-trained artificial intelligence model, the artificial intelligence model being a U-Net network;

a distortion coefficient extraction module, configured to extract the distortion coefficient from the image to be corrected using the artificial intelligence model according to the distortion coefficient extraction method described above;

and a distortion correction module, configured to perform distortion correction on the image to be corrected, using a deep learning model with an encoder-decoder structure, based on the distortion coefficient extracted by the distortion coefficient extraction module.

As a fourth aspect of the present invention, there is also provided an electronic device comprising a processor and a memory for storing a computer-executable program, the processor performing the method as described above when the computer-executable program is executed by the processor.

As a fifth aspect of the present invention, there is also provided a computer-readable medium storing a computer-executable program which, when executed, implements the method as described above.

Based on the above technical solution, the image distortion coefficient extraction method, the distortion correction method and the system of the present invention have at least one of the following advantages compared with the prior art:

the distortion detection method is simple and fast: a simple model trained by machine learning can quickly and accurately determine the degree of distortion present in the image, down to the pixel level;

the distortion correction method of the invention uses an advanced deep learning algorithm and offers strong robustness, short processing time and good results.

Drawings

FIG. 1 is a block flow diagram of an image distortion coefficient extraction method of the present invention;

FIG. 2 is a block flow diagram of an image distortion correction method of the present invention;

FIG. 3 is a schematic diagram of an image distortion correction system according to the present invention;

FIG. 4 is a schematic diagram of a U-Net network;

FIG. 5 is a schematic diagram of the electronic device of the present invention;

FIG. 6 is a schematic diagram of a computer readable medium of the present invention;

FIG. 7 is a block flow diagram of an image distortion correction method according to embodiment 2 of the present invention;

FIGS. 8-12 show actual image results obtained with the image distortion correction methods/systems of embodiments 1 and 2 of the present invention.

Detailed Description

The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.

The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different network and/or processing unit devices and/or microcontroller devices.

The invention addresses the problem of correcting distortion in document images, and exploits the characteristics of the U-Net model to solve it simply and quickly.

U-Net is a variant of the convolutional neural network whose structure, when drawn, resembles the letter U (see FIG. 4), hence the name U-Net. The U-Net neural network consists mainly of two parts: a contracting path and an expanding path. The contracting path captures the context information in the picture, and the symmetric expanding path precisely localizes the parts of the picture that need to be segmented. U-Net is particularly helpful for deep learning on images when few samples are available: it improves on the FCN (fully convolutional network), and small datasets can be supplemented by data augmentation during training.

Although U-Net is an improvement on FCN, it does not simply encode and decode the picture as FCN does. To localize accurately, U-Net takes the high-resolution features from the contracting path and combines them with the new feature map during up-sampling, so that important feature information from the preceding down-sampling stages is retained as far as possible. To make the network run more efficiently, the structure contains no fully connected layers; this greatly reduces the number of parameters to be trained, allows inference on inputs of varying size, and, thanks to the U-shaped structure, preserves the information in the picture well.

In the contracting path, every two 3 × 3 convolutional layers (unpadded convolutions) are followed by a 2 × 2 max pooling layer with a stride of 2, and each convolutional layer is followed by a ReLU activation function; together these down-sample the original picture. In addition, each down-sampling step doubles the number of feature channels.

In the expanding path, each up-sampling step (transposed convolution) consists of one 2 × 2 convolutional layer (again with a ReLU activation function) and two 3 × 3 convolutional layers, and each step concatenates the feature map from the corresponding level of the contracting path, cropped so that the shapes match.

The last layer of the network is a 1 × 1 convolutional layer, which converts the 64-channel feature vectors to the required number of classes (e.g., 2). In total, the U-Net network has 23 convolutional layers.
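The layer arithmetic described above can be checked with a short sketch. The function name `unet_trace`, the depth of 4 and the input side length of 572 pixels are assumptions chosen for illustration (they are not stated in this document); the sketch only traces feature-map sizes and counts convolutional layers, it does not build a network.

```python
def unet_trace(input_size=572, depth=4):
    """Trace the feature-map side length through a U-Net and count conv layers.

    Assumptions (not from the patent text): square input, `depth` pooling
    steps, unpadded 3x3 convolutions that each trim 2 pixels.
    Returns (output_size, conv_layer_count, crops_per_skip).
    """
    size, convs, skips = input_size, 0, []
    # Contracting path: two unpadded 3x3 convs, then a 2x2 max pool (stride 2).
    for _ in range(depth):
        size -= 4            # two 3x3 convs, 2 pixels lost per conv
        convs += 2
        skips.append(size)   # feature map kept for the expanding path
        if size % 2:
            raise ValueError("feature map side must be even before pooling")
        size //= 2
    # Bottom of the "U": two more 3x3 convs.
    size -= 4
    convs += 2
    # Expanding path: one 2x2 up-convolution, then two 3x3 convs per step.
    crops = []
    for skip in reversed(skips):
        size *= 2
        convs += 1
        crops.append(skip - size)  # pixels cropped from the skip feature map
        size -= 4
        convs += 2
    convs += 1  # final 1x1 convolution
    return size, convs, crops


print(unet_trace())
```

Under these assumptions the count comes to 8 + 2 + 12 + 1 = 23 convolutional layers, which agrees with the total given in the text, and the crop values show why the skip feature maps must be trimmed before they are combined.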

Building on these characteristics of U-Net, the invention borrows the pixel-by-pixel classification idea of semantic segmentation to regress the distortion coefficient pixel by pixel, thereby achieving pixel-level distortion correction with very good results.

As shown in FIG. 1, the specific scheme of the image distortion coefficient extraction method is as follows:

extracting the distortion coefficient from the image to be corrected using a pre-trained artificial intelligence model;

the pre-trained artificial intelligence model is a U-Net network, more preferably a U-Net network with a dilated (atrous, or "hole") convolution structure.

In a U-Net network with a dilated convolution structure, the dilated convolution enlarges the receptive field while preserving the resolution of the features.
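The receptive-field effect of dilated ("cavity" or "hole") convolution can be illustrated numerically. The following is a minimal one-dimensional sketch; the helper names and the dilation rates (1, 2, 4) are assumptions for illustration, since the document does not specify the rates it uses.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel: gaps of (dilation - 1) between taps."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convs given (kernel_size, dilation) pairs."""
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel(kernel_size, dilation) - 1
    return field


def dilated_conv1d(signal, kernel, dilation):
    """Unpadded 1-D dilated convolution (cross-correlation form)."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Three ordinary 3-tap layers versus three dilated ones (rates are hypothetical).
print(receptive_field([(3, 1), (3, 1), (3, 1)]))   # 7
print(receptive_field([(3, 1), (3, 2), (3, 4)]))   # 15
print(dilated_conv1d([0, 1, 2, 3, 4, 5, 6], [1, 1, 1], dilation=2))  # [6, 9, 12]
```

With the same number of weights and no pooling, the dilated stack sees roughly twice as many input positions, which is the sense in which the receptive field grows without the feature map being down-sampled.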

The pre-trained U-Net network is generated by the following method:

constructing a U-Net network structure for distortion correction; the input of the U-Net network structure is the distorted image to be corrected and the coordinate values (x1, y1) of the point to be examined, and the output is the offsets (Δx, Δy) in the x and y directions that those coordinate values require for distortion recovery;

and training the constructed U-Net network structure with a large number of samples until the network converges through regression.
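To make the meaning of the network output concrete, the sketch below shows how per-pixel offsets could be used once they are available. It assumes the convention that a point (x1, y1) in the distorted image belongs at (x1 + Δx, y1 + Δy) in the flat image; the document does not spell out the sign convention, and the function names and the nearest-pixel forward mapping are illustrative choices, not the patented implementation.

```python
def recover_point(x1, y1, offset):
    """Apply a predicted (dx, dy) offset to a distorted-image coordinate."""
    dx, dy = offset
    return x1 + dx, y1 + dy


def unwarp(image, offsets, fill=0):
    """Forward-map every distorted pixel to its flat position (nearest pixel).

    `image` is a list of rows; `offsets[y][x]` is the (dx, dy) pair assumed to
    have been predicted for pixel (x, y). Pixels that land outside the image
    are dropped, and unfilled positions keep `fill`.
    """
    height, width = len(image), len(image[0])
    flat = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            fx, fy = recover_point(x, y, offsets[y][x])
            fx, fy = round(fx), round(fy)
            if 0 <= fx < width and 0 <= fy < height:
                flat[fy][fx] = image[y][x]
    return flat


# A page shifted one pixel to the right is restored by offsets of (-1, 0).
distorted = [[0, 1, 2], [0, 4, 5]]
offsets = [[(-1, 0)] * 3 for _ in range(2)]
print(unwarp(distorted, offsets))  # [[1, 2, 0], [4, 5, 0]]
```

A production pipeline would more likely use inverse mapping with interpolation to avoid holes; the forward mapping here is kept only because it mirrors the "offset required for recovery" wording most directly.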

These samples require no manual labeling; all of them are self-generated.

A self-generated sample is derived from an original, completely flat and distortion-free document image. Various distortion transformations are applied to this image to simulate the distortions the model must handle, and the x-direction and y-direction offsets required for distortion recovery are recorded at the same time, yielding a pair consisting of the distorted image and the true flat image.
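The self-labelling idea can be sketched as follows: because the distortion is applied synthetically, the offsets needed to undo it are known exactly and can be stored as ground truth. The sinusoidal vertical displacement, the function name `make_sample` and the `amplitude` parameter are assumptions for illustration; the document does not describe the actual transformations at this level of detail.

```python
import math


def make_sample(flat, amplitude=1.0, fill=0):
    """Warp a flat image column-wise and record the recovery offsets.

    Each column x is shifted vertically by round(amplitude * sin(2*pi*x/width)),
    a hypothetical stand-in for a page-curl style distortion.
    Returns (warped_image, offsets), where offsets[y][x] = (dx, dy) moves the
    warped pixel at (x, y) back to its position in the flat image.
    """
    height, width = len(flat), len(flat[0])
    warped = [[fill] * width for _ in range(height)]
    offsets = [[(0, 0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy = round(amplitude * math.sin(2 * math.pi * x / width))
            wy = y + dy
            if 0 <= wy < height:
                warped[wy][x] = flat[y][x]
                offsets[wy][x] = (0, -dy)  # ground-truth offset, no manual labeling
    return warped, offsets


flat = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
warped, offsets = make_sample(flat)
print(warped)
print(offsets[1][1], offsets[0][3])
```

Each call yields one training pair (warped image, flat image) together with the recorded offsets, which is why no human annotation is needed.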

Wherein the method further comprises:

designing a unified distortion processing framework to apply distortion operations to the flat image.

Specifically, a unified image distortion processing framework is designed for several distortion types common in real scenes, such as page tilt, perspective transformation and book page turning, so that the distortion types to be generated can be configured simply and quickly through a configuration file. When handling a real scene, the frequency with which each distortion type occurs in that scene is first counted, and matching distortion data are generated, which improves recognition accuracy.
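One way to read the configuration-driven generation described above is that the observed frequency of each distortion type determines how many synthetic samples of that type are produced. The sketch below follows that reading; the dictionary keys, the counts and the function name `plan_samples` are hypothetical, and a real framework would presumably load the configuration from a file rather than define it inline.

```python
# Hypothetical configuration: how often each distortion type was observed
# in the real scene (counts, not taken from the document).
WARP_CONFIG = {"page_tilt": 50, "perspective": 30, "page_turn": 20}


def plan_samples(config, total):
    """Split `total` synthetic samples across distortion types by observed frequency."""
    observed = sum(config.values())
    if observed <= 0:
        raise ValueError("configuration must contain positive counts")
    counts = {name: total * count // observed for name, count in config.items()}
    # Integer division may leave a remainder; give it to the most frequent type.
    most_frequent = max(config, key=config.get)
    counts[most_frequent] += total - sum(counts.values())
    return counts


print(plan_samples(WARP_CONFIG, 1000))
# {'page_tilt': 500, 'perspective': 300, 'page_turn': 200}
```

Matching the training distribution to the field distribution in this way is a common heuristic; whether the patented framework allocates samples exactly proportionally is not stated.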

As shown in FIG. 2, the present invention also discloses a distortion correction method, comprising the following steps:

extracting the distortion coefficient of the image to be corrected according to the distortion coefficient extraction method described above;

Wherein the distortion correction uses a deep learning model with an encoder-decoder structure to reconstruct the image: the encoder extracts features from the input image, and the decoder restores a flat image from the extracted features.

Wherein the model is normalized using batch normalization and layer normalization.

Wherein the loss function of the deep learning model is calculated as follows:

based on the extracted distortion coefficient, the x-direction and y-direction offsets (grid point data) of the image output by the deep learning model are calculated, and the mean square error between these offsets and the true x-direction and y-direction offsets saved when the distorted image was generated is taken as the grid point loss function of the network; the mean square error between the image output by the model and the true flat image is calculated as a second loss function; and gradients are calculated from the two loss functions to update the whole deep learning model.
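The two loss terms can be written out in a few lines. The sketch below works on flattened lists of numbers; the function names and the weights `grid_weight` and `image_weight` are assumptions, because the document says only that gradients are calculated from both losses and does not say how the two are combined.

```python
def mse(predicted, target):
    """Mean square error between two equally long sequences."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def total_loss(pred_dx, pred_dy, true_dx, true_dy, pred_image, flat_image,
               grid_weight=1.0, image_weight=1.0):
    """Combine the grid point loss and the image loss.

    The weighted sum is an assumed way of combining the two terms.
    Returns (total, grid_loss, image_loss).
    """
    grid_loss = mse(pred_dx, true_dx) + mse(pred_dy, true_dy)
    image_loss = mse(pred_image, flat_image)
    return grid_weight * grid_loss + image_weight * image_loss, grid_loss, image_loss


total, grid, image = total_loss(
    pred_dx=[1, 2], pred_dy=[0, 0], true_dx=[1, 4], true_dy=[0, 0],
    pred_image=[10, 20], flat_image=[10, 22],
)
print(total, grid, image)  # 4.0 2.0 2.0
```

In a training framework the same expression would be built from differentiable tensors so that one backward pass produces gradients from both terms.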

The distortion coefficient may be extracted from the image to be corrected pixel by pixel, that is, every point of the image is traversed once; this takes longer but gives particularly high precision.

After distortion correction by the above method, Gaussian blur may be applied for local smoothing to eliminate locally abnormal distortion.
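A minimal sketch of the Gaussian smoothing step follows. The document does not say whether the blur is applied to the corrected image or to the offset field; this sketch assumes it is applied to one channel of the offset field (for example Δx), so that an isolated abnormal value is pulled towards its neighbours. The function names, `sigma`, `radius` and the border clamping are all illustrative assumptions.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_row(values, kernel):
    """Blur one row, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + j - radius, 0), last)] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def smooth_field(field, sigma=1.0, radius=2):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [smooth_row(row, kernel) for row in field]
    cols = [smooth_row(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A single abnormal offset in an otherwise zero field is strongly attenuated.
field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 10.0
print(round(smooth_field(field)[2][2], 3))
```

A uniform field passes through unchanged, so the smoothing affects only local irregularities, which matches the stated purpose of removing locally abnormal distortion.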

As shown in FIG. 3, the present invention also discloses an image distortion correction system, comprising:

a pre-trained artificial intelligence model, the artificial intelligence model being a U-Net network used to extract the distortion coefficient of the image to be corrected;

The artificial intelligence model is preferably a U-Net network with a dilated convolution structure; the dilated convolution enlarges the receptive field while preserving the resolution of the features.

The pre-trained U-Net network is generated by the following method:

The image distortion correction system further comprises a distortion processing framework module, which lets the user select, through a configuration file, the types of distortion to be generated, such as page tilt, perspective transformation and book page turning, and which counts the frequency of each distortion type in the real scene to determine the types of distortion data in the self-generated samples.

The distortion correction module reconstructs the image using a deep learning model with an encoder-decoder structure, in which the encoder extracts features from the input image and the decoder restores a flat image from the extracted features.

Wherein the deep learning model is normalized using batch normalization and layer normalization.

After the distortion correction model has completed distortion correction, Gaussian blur is applied for local smoothing to eliminate locally abnormal distortion.

In one embodiment, the method and device serve as a processing step/processing unit that supports the restoration of text page images: on top of other related processing of the text, the text image is further processed according to the degree of distortion of the text, so that images distorted to different degrees all yield a relatively flat text image after processing, which improves the user experience.

In one embodiment, a user wants to obtain, from a paper test paper, an electronic test paper with the handwriting removed; the paper test paper needs to be converted into electronic form so that the user can answer it again electronically or print it in batches. After the user inputs a text image of the test paper, the distortion processing method described above is applied alongside the related processing of the test paper text page image to be restored: a distortion coefficient is extracted for each pixel of the test paper image, and the processed text page image is distortion-corrected based on the obtained distortion coefficients to obtain the restored text page image. The related processing of the test paper text page image to be restored may include removing handwriting, watermarks, stains and the like.

During image input, the result is affected by the user's shooting angle, by the form of the test paper (not only single-page test papers but also customized test papers) and by how the paper is laid out (most likely not flat, so that the image does not look like an electronic page and may deviate from it considerably); moreover, no restriction is placed on the format of the uploaded image. Given these factors, obtaining an image that is as flat as possible requires handling the distortion of the image. To handle the different ways and degrees in which images are distorted, this embodiment performs distortion processing using the method described above, providing a flat, level image for text restoration. On top of the related processing of the test paper (such as handwriting removal), the test paper is thereby restored to a flat, high-quality, unanswered test paper, so that the user can conveniently answer it again or use it for other purposes.

The invention also discloses an electronic device comprising a processor and a memory for storing a computer executable program, wherein the processor performs the method as described above when the computer executable program is executed by the processor.

The electronic device may be embodied in the form of a general purpose computing device, for example. The number of the processors may be one, or may be multiple and work together. The invention also does not exclude that distributed processing is performed, i.e. the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a sum of a plurality of entity devices.

The memory stores a computer-executable program, typically machine-readable code, which can be executed by the processor to enable the electronic device to perform the method of the invention, or at least some of its steps.

The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).

Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for data exchange between the electronic device and an external device. The I/O interface may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and/or a memory storage device using any of a variety of bus architectures.

Elements or components not shown in the above examples may also be included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method of the present invention or at least a part of the steps of the method.

The present invention also discloses a computer readable medium having a computer executable program stored thereon, wherein the computer executable program, when executed, implements a method as described above.

The computer-readable medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. The readable medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Python, Java, C++ or the like and conventional procedural programming languages, such as the C language, assembly language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments. It should be noted that the following examples are only for illustrating the present invention and are not intended to limit the present invention.

Example 1

The image distortion correction system of embodiment 1, specifically comprising:

an artificial intelligence model; the artificial intelligence model is a U-Net network with a dilated convolution structure;

a training module, configured to train the artificial intelligence model so that it can extract a distortion coefficient; the training module trains the U-Net network by the following method:

constructing a U-Net network with a dilated convolution structure for distortion correction; the input of the U-Net network is the distorted image to be corrected and the coordinate values (x1, y1) of the point to be examined, and the output is the offsets (Δx, Δy) in the x and y directions that those coordinate values require for distortion recovery;

and training the constructed U-Net network with a large number of samples until the U-Net network converges through regression.

a distortion coefficient extraction module, configured to extract the distortion coefficient from the image to be corrected using the artificial intelligence model trained by the training module;

The large number of samples are all self-generated: starting from an original, completely flat document image, various distortion transformations are applied to it to simulate the distortions the model must handle, and the x-direction and y-direction offsets required for distortion recovery are recorded at the same time.

Wherein the loss function of the deep learning model used by the distortion correction module is calculated by the following method:

Example 2

The image distortion correction method of embodiment 2 includes 5 steps:

1. Flat image acquisition

The flat image is captured by an image sensor (including but not limited to CCD and CMOS sensors of various types); the capture device may be a digital camera, a mobile phone camera or a scanner.

2. Warped image data generation

To train the model, pairs consisting of a distorted input image and the true flat image must be generated.

First, a unified distortion processing framework is designed to apply distortion operations to the flat image.

For several distortion types common in real scenes, such as page tilt, perspective transformation and book page turning, a unified image distortion processing framework is designed, so that the distortion types to be generated in an experiment can be configured simply and quickly through a configuration file.

Second, the frequency of each distortion type in real scenes is counted, and corresponding distortion data are generated.

3. Image reconstruction based on encoder-decoder structure

A deep learning model based on an encoder-decoder structure is used: the encoder extracts features from the input image, the decoder restores a flat image from the extracted features, and the whole process is normalized using batch_normalization and layer_normalization.
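The difference between the two normalization schemes named above lies in the axis over which the mean and variance are computed. The sketch below shows this on a small batch of feature vectors; it omits the learned scale and shift parameters that practical layers include, and the function names and `eps` value are illustrative assumptions.

```python
def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (variance + eps) ** 0.5 for v in values]


def batch_norm(batch):
    """Normalize each feature across the samples in the batch (column-wise)."""
    columns = [_normalize(list(column)) for column in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch):
    """Normalize each sample across its own features (row-wise)."""
    return [_normalize(row) for row in batch]


batch = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # two samples, three features
print(batch_norm(batch))  # middle feature is identical in both samples, so it maps to 0
print(layer_norm(batch))  # each row is rescaled using its own statistics
```

Batch normalization depends on the other samples in the batch, whereas layer normalization gives the same result for a sample regardless of batch composition; how the two are arranged inside the encoder and decoder is not specified in the document.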

4. Calculating loss function (loss) and updating model

First, the x-direction and y-direction grid point data of the image output by the model are calculated, and the mean square error between these and the true x-direction and y-direction grid point data saved when the distorted image was generated is taken as the grid point loss of the network;

second, the mean square error loss between the image output by the model and the true flat image is calculated;

third, gradients are calculated from the two losses and the whole model is updated.

5. Subsequent business processing

The reconstructed image is passed on to subsequent business processing.

The reconstruction results for distorted images from real scenes obtained with embodiments 1 and 2 are shown in FIGS. 8-12, with the original image on the left and the image output by the model on the right. As FIGS. 8-12 show, the method of the present invention corrects text distortion effectively, whether the distortion is a page curl, a tilt or a perspective transformation, whether the text is sharp or blurred, whether or not pictures or frame patterns are present, and whether the document is a language or a mathematics test paper; the results are satisfactory.

While the foregoing embodiments describe the objects, technical solutions and advantages of the present invention in further detail, it should be understood that the present invention is not inherently tied to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; all modifications, changes and equivalents that come within the spirit and scope of the invention are intended to be covered.

Claims

1. A method for extracting an image distortion coefficient, characterized by comprising the following steps:

2. The method of claim 1, wherein the pre-trained artificial intelligence model is a U-Net network with a dilated convolution structure;

optionally, the pre-trained U-Net network is generated by the following method:

constructing a U-Net network for extracting the distortion coefficient;

training the constructed U-Net network using samples, wherein the samples do not require labeling.

3. The method of claim 2, wherein the samples are derived from an originally completely flat document image, to which various distortion transformations are applied to simulate the distortions the artificial intelligence model must handle, while the x-direction and y-direction offsets required for distortion recovery are recorded.

4. An image distortion correction method, comprising the steps of:

extracting the distortion coefficient of the image to be corrected according to the image distortion coefficient extraction method of any one of claims 1 to 3;

5. The method of claim 4, wherein the distortion correction is based on a deep learning model of an encoder-decoder structure to achieve image reconstruction, wherein the encoder extracts features from the input image and the decoder restores a flat image from the extracted features.

6. The method of claim 4, wherein the loss function of the deep learning model is calculated by:

7. The method according to claim 5, characterized in that the distortion coefficient of the image to be corrected is extracted pixel by pixel;

optionally, after distortion correction has been performed on the image to be corrected, Gaussian blur is applied for local smoothing to eliminate locally abnormal distortion.

8. An image distortion correction system, comprising:

a distortion coefficient extraction module, configured to extract a distortion coefficient from an image to be corrected using the artificial intelligence model according to the distortion coefficient extraction method of any one of claims 1 to 3;

9. An electronic device comprising a processor and a memory, the memory for storing a computer-executable program, characterized in that:

the computer executable program, when executed by the processor, performs the method of any of claims 1-7.

10. A computer-readable medium storing a computer-executable program, wherein the computer-executable program, when executed, implements the method of any of claims 1-7.