CN113723289B

CN113723289B - Image processing method, device, computer equipment and storage medium

Info

Publication number: CN113723289B
Application number: CN202111007568.2A
Authority: CN
Inventors: 李玖林; 喻红
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2024-03-08
Anticipated expiration: 2041-08-30
Also published as: CN113723289A

Abstract

The application relates to artificial intelligence, and provides an image processing method, an image processing device, computer equipment and a storage medium, wherein the image processing method comprises the following steps: acquiring a document image to be restored containing a deformed document; extracting the two-dimensional characteristic information of the document image to be restored, and generating a two-dimensional characteristic image of the document image to be restored based on the two-dimensional characteristic information of the document image to be restored; generating three-dimensional characteristic information of the document image to be restored based on the two-dimensional characteristic image; determining a target transformation relation according to the two-dimensional characteristic information and the three-dimensional characteristic information of the document image to be restored, wherein the target transformation relation comprises a transformation relation between the document image to be restored and an undeformed document image corresponding to the deformed document; and restoring the document image to be restored according to the target transformation relation.

Description

Image processing method, device, computer equipment and storage medium

Technical Field

The embodiment of the invention relates to an artificial intelligence technology, in particular to an image processing method, an image processing device, computer equipment and a storage medium.

Background

Paper documents are among the most important information carriers and record a great deal of important information. With the development of technology and the increasing pixel counts of cameras, people can conveniently and quickly digitize a document through mobile phone photography and image recognition, for example by converting a document picture into PDF, Excel or other formats and then storing it on a computer.

Storing the information in a document image with high quality and high accuracy places certain demands on the photographed document image. However, at the time of shooting, factors such as physical deformation of the document or overly strong or weak lighting may cause the document image shot by the mobile phone to show deformation and curl of varying degrees, which lowers the recognition accuracy of the information in the document image.

Disclosure of Invention

The embodiment of the invention provides an image processing method, an image processing device, computer equipment and a storage medium, which can improve the recognition precision of a distorted document image.

In order to solve the above technical problems, the embodiment of the invention adopts the following technical scheme: there is provided an image processing method, comprising the following steps:

acquiring a document image to be restored containing a deformed document;

extracting two-dimensional characteristic information of the document image to be restored according to a first model, and generating a two-dimensional characteristic image of the document image to be restored based on the two-dimensional characteristic information of the document image to be restored;

generating three-dimensional characteristic information of the document image to be restored based on a second model and the two-dimensional characteristic image;

the second model determines a target transformation relation according to the two-dimensional characteristic information and the three-dimensional characteristic information of the document image to be restored, wherein the target transformation relation comprises a transformation relation between the document image to be restored and an undeformed document image corresponding to the deformed document, and the transformation relation is used for representing information from a non-distorted state to a distorted state of the distorted document in the document image to be restored;

and the second model restores the document image to be restored according to the target transformation relation, wherein the first model and the second model form a stacked network model.

In some modes, before determining the target transformation relation according to the two-dimensional characteristic information and the three-dimensional characteristic information of the document image to be restored and restoring the document image to be restored according to the target transformation relation, the method further comprises: acquiring a target sample set, and constructing a first loss function, wherein each training sample in the target sample set is a first image containing a deformed document, and each first image corresponds to one second image containing the undeformed document; training a first preset network according to the target sample set and the first loss function; stopping training the first preset network when the value of the first loss function no longer decreases, so as to obtain a first model; the first loss function is used for representing an error between a prediction result of the first preset network on a third image and a fourth image, the third image is any one of the first images in the target sample set, and the fourth image is the second image corresponding to the third image.

In some aspects, after stopping training the first preset network when the value of the first loss function no longer decreases to obtain a first model, the method further includes: converting the document image to be restored into a two-dimensional characteristic image through the first model; the two-dimensional characteristic image is a sixty-four-channel image, and the characteristic information of the two-dimensional characteristic image comprises at least one of the following: text information contained in the document image to be restored, and texture information of the document image to be restored.

In some manners, after the converting the document image to be restored into the two-dimensional feature image by the first model, the method further includes: extracting a first feature vector of the two-dimensional feature image output by the first model; performing a down-convolution operation on the first feature vector through a second preset network to obtain a second feature vector set, wherein the second feature vector set comprises the second feature vectors obtained by each convolution layer in the down-convolution operation; performing an up-convolution operation on the target feature vector through the second preset network to obtain a third feature vector set; the target feature vector is the second feature vector obtained by the last convolution layer in the down-convolution operation; the third feature vector set comprises the third feature vectors obtained by each convolution layer in the up-convolution operation; the third feature vector set comprises a third feature vector which has a mapping relation with any one of the second feature vectors in the second feature vector set.

In some manners, after the performing, by the second preset network, a deconvolution operation on the target feature vector to obtain a third feature vector set, the method further includes: fusing the second feature vector in the second feature vector set with the third feature vector in the third feature vector set to obtain a fourth image; the fourth image is a three-dimensional characteristic image of the undeformed document image obtained by transforming the first image.

In some manners, after the fusing the second feature vector in the second feature vector set with the third feature vector in the third feature vector set to obtain the fourth image, the method further includes: according to the comparison result of the fourth image and the second image, adjusting parameters of the second preset network; stopping training the second preset network when the value of the second loss function of the second preset network is not reduced any more, so as to obtain a second model; wherein the second loss function is used to represent an error between the fourth image and the second image.

In some modes, the determining a target transformation relation according to the two-dimensional feature and the three-dimensional feature of the document image to be restored, and restoring the document image to be restored according to the target transformation relation includes: inputting the document image to be restored into the first model to obtain a two-dimensional characteristic image; and inputting the two-dimensional characteristic image into the second model to obtain the target document image after the document image to be restored is restored.

In order to solve the above technical problem, an embodiment of the present invention further provides an image processing apparatus, including:

the acquisition module is used for acquiring a document image to be restored containing the deformed document;

the feature extraction module is used for extracting the two-dimensional feature information of the document image to be restored according to the first model and generating a two-dimensional feature image of the document image to be restored based on the two-dimensional feature information of the document image to be restored;

the feature extraction module is further used for generating three-dimensional feature information of the document image to be restored based on a second model and the two-dimensional feature image;

the determining module is used for determining a target transformation relation according to the two-dimensional characteristic information and the three-dimensional characteristic information of the document image to be restored, wherein the target transformation relation comprises a transformation relation between the document image to be restored and an undeformed document image corresponding to the deformed document, and the transformation relation is used for representing information from an undistorted state to a distorted state of the distorted document in the document image to be restored;

and the recovery module is used for restoring the document image to be restored according to the target transformation relation through the second model, wherein the first model and the second model form a stacked network model.

In some aspects, the apparatus further comprises a training module; the acquisition module is further used for acquiring a target sample set and constructing a first loss function, each training sample in the target sample set is a first image containing deformed documents, and one first image corresponds to one second image containing undeformed documents; the training module is used for training a first preset network according to the target sample set and the first loss function; the training module is further configured to stop training the first preset network when the value of the first loss function is no longer reduced, so as to obtain a first model; the first loss function is used for representing an error between a prediction result of the first preset network on a third image and a fourth image, the third image is any one of the first images in the training sample set, and the fourth image is a second image corresponding to the third image.

In some modes, the feature extraction module is further configured to convert the document image to be restored into a two-dimensional feature image through the first model; the two-dimensional characteristic image is a sixty-four-channel image, and the characteristic information of the two-dimensional characteristic image comprises at least one of the following: text information contained in the document image to be restored, and texture information of the document image to be restored.

In some manners, the feature extraction module is further configured to extract a first feature vector of the two-dimensional feature image output by the first model; the training module is further configured to perform a down-convolution operation on the first feature vector through a second preset network to obtain a second feature vector set, where the second feature vector set includes the second feature vectors obtained by each convolution layer in the down-convolution operation; the training module is further configured to perform an up-convolution operation on the target feature vector through the second preset network to obtain a third feature vector set; the target feature vector is the second feature vector obtained by the last convolution layer in the down-convolution operation; the third feature vector set comprises the third feature vectors obtained by each convolution layer in the up-convolution operation; the third feature vector set comprises a third feature vector which has a mapping relation with any one of the second feature vectors in the second feature vector set.

In some modes, the training module is further configured to fuse a second feature vector in the second feature vector set with a third feature vector in the third feature vector set to obtain a fourth image; the fourth image is a three-dimensional characteristic image of the undeformed document image obtained by transforming the first image.

In some modes, the training module is further configured to adjust parameters of the second preset network according to a comparison result of the fourth image and the second image; the training module is further configured to stop training the second preset network when the value of the second loss function of the second preset network is no longer reduced, so as to obtain a second model; wherein the second loss function is used to represent an error between the fourth image and the second image.

In some modes, the recovery module is specifically configured to input the document image to be restored into the first model to obtain a two-dimensional feature image, and to input the two-dimensional characteristic image into the second model to obtain the target document image, that is, the restored version of the document image to be restored.

In order to solve the above technical problem, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor is caused to execute the steps of the image processing method.

To solve the above technical problem, embodiments of the present invention further provide a storage medium storing computer readable instructions, where the computer readable instructions when executed by one or more processors cause the one or more processors to perform the steps of the image processing method described above.

The embodiment of the invention has the following beneficial effects: after the document image to be restored containing the deformed document is obtained, two-dimensional features and three-dimensional features are extracted from the document image to be restored, a target transformation relation is determined according to the two-dimensional features and the three-dimensional features of the document image to be restored, and the document image to be restored is then restored according to the target transformation relation, so that the recognition precision and quality of the document information contained in the deformed document image are improved.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a first flow chart of an image processing method according to an embodiment of the present application;

FIG. 2 is a second flow chart of an image processing method according to an embodiment of the present application;

FIG. 3 is a third flow chart of an image processing method according to an embodiment of the present application;

FIG. 4 is a fourth flow chart of an image processing method according to an embodiment of the present application;

FIG. 5 is a fifth flow chart of an image processing method according to an embodiment of the present application;

FIG. 6 is a sixth flow chart of an image processing method according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a second preset network according to an embodiment of the present application;

FIG. 8 is a seventh flow chart of an image processing method according to an embodiment of the present application;

FIG. 9 is a schematic view of the basic structure of an image processing apparatus according to an embodiment of the present application;

FIG. 10 is a basic structural block diagram of a computer device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.

It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Paper documents are the most important information carriers in daily work and life in fields such as banking, securities, insurance, education and medical care, and record much valuable information. With the development of technology and the increasing pixel counts of cameras, people can digitize a document more conveniently and quickly by photographing it with a mobile phone, for example by converting a document picture into PDF, Excel or other formats and then storing it on a computer. When taking such photos, it is desirable to preserve the document image information with the best possible quality and accuracy. However, at the time of shooting, factors such as physical deformation of the document and uncontrollable light intensity may cause the photos taken by the mobile phone to be deformed and curled to different degrees.

Such raw images therefore make automatic information extraction and content analysis very difficult, and they are hard to digitize. Existing approaches to this problem include conventional methods and deep learning methods. Conventional methods typically rely on the geometry of the paper for restoration: the shape is first represented by hand-crafted parameters, the three-dimensional shape of the sheet is then calculated, and techniques such as parameter optimization are used to compute the flattened image from the distorted image and the calculated shape. A common disadvantage of these methods is that they are often computationally intensive and slow because of the optimization process and the manual adjustment of parameters.

To address the above problems, the application discloses an image processing method that works backwards from the three-dimensional shape change of the distorted paper to obtain a change function describing how the paper goes from a flat state to a distorted state, and applies the inverse of that change function to the distorted paper to obtain a flattened paper image, so that the recognition precision and quality of the distorted document image are greatly improved.

As shown in FIG. 1, the image processing method provided in this embodiment includes S201 to S204:

S201, obtaining a document image to be restored containing the deformed document.

The document image to be restored may be shot by a user with an electronic device, may already be stored on the electronic device, or may be downloaded from a network. The image content of the document image to be restored includes a distorted document, typically a paper document.

Note that, in the embodiments of the present application, terms such as distortion, deformation and folding describe a state of a document in which the document has a certain degree of undulation or deformation in three-dimensional space. The flat or flattened state of a document is distinguished from these states and indicates that the document has a lower degree of undulation and deformation in three-dimensional space.

S202, extracting the two-dimensional characteristic information of the document image to be restored, and generating the two-dimensional characteristic image of the document image to be restored based on the two-dimensional characteristic information of the document image to be restored.

S203, determining a target transformation relation according to the two-dimensional characteristic information and the three-dimensional characteristic information of the document image to be restored.

The target transformation relation comprises a transformation relation between the document image to be restored and an undeformed document image corresponding to the deformed document.

Illustratively, after the document image to be restored is acquired, two-dimensional feature information of the document image to be restored may be extracted, the two-dimensional feature information including at least one of: text information contained in the document image to be restored, color information contained in the document image to be restored, and texture information of the document image to be restored.

It is understood that the text information may include the region of the text included in the document image to be restored and information such as color, shape, outline, etc. of the text. The color information may include information such as color, brightness, etc. of each region in the document image to be restored. The texture information may include information such as the shape and outline of the distorted document in the document image to be restored.

The two-dimensional feature information represents information obtained by roughly extracting the information contained in the document image to be restored. After acquiring the two-dimensional feature information, the electronic device uses the current position information of each feature point in the document image to be restored, together with the two-dimensional feature information, to calculate the original position information of each feature point, that is, the position of the feature point before the document was deformed.

It can be understood that, since the distortion of the document is a change in three-dimensional space, if only the two-dimensional feature information of the document image to be restored is obtained, the change rule of the document in three-dimensional space cannot be determined, that is, the motion vector of each feature point in three-dimensional space cannot be determined. Therefore, after the two-dimensional feature information of the document image to be restored is acquired, three-dimensional feature information of the document image to be restored also needs to be acquired.

Illustratively, the three-dimensional feature information of the document image to be restored may be obtained from the two-dimensional feature image generated by extracting the two-dimensional feature information of the document image to be restored, or it may be obtained directly from the document image to be restored.

For example, after the three-dimensional feature information of the document image to be restored is obtained, the target transformation relationship from the undistorted state to the distorted state of the distorted document in the document image to be restored may be determined based on the two-dimensional feature information and the three-dimensional feature information of the document image to be restored.

S204, restoring the document image to be restored according to the target transformation relation.

For example, after obtaining the target transformation relationship from the undistorted state to the distorted state of the distorted document in the document image to be restored, the document image to be restored may be inversely transformed based on the target transformation relationship, so as to obtain the restored document image, that is, the document image in the undeformed state.

In this way, after the document image to be restored containing the deformed document is obtained, the two-dimensional features and the three-dimensional features are extracted from the document image to be restored, the target transformation relation is determined according to the two-dimensional features and the three-dimensional features of the document image to be restored, and the document image to be restored is then restored according to the target transformation relation, so that the recognition precision and quality of the document information contained in the deformed document image are improved.

In one possible implementation manner, the steps of extracting the two-dimensional feature information and the three-dimensional feature information of the document image to be restored, determining the target transformation relationship from them, and restoring the document image to be restored according to the target transformation relationship may all be performed by a stacked neural network model.
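The inverse transformation in S204 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the target transformation relation is available as a dense coordinate map that gives, for every pixel of the flat document, the position where that pixel ended up in the distorted image. The function name `restore_image`, the array layout of `forward_map` and the nearest-neighbour sampling are assumptions of this sketch, not details taken from the application.

```python
import numpy as np


def restore_image(distorted: np.ndarray, forward_map: np.ndarray) -> np.ndarray:
    """Rebuild the flat image by sampling the distorted image.

    distorted   : (H, W) or (H, W, C) array, the document image to be restored.
    forward_map : (H_out, W_out, 2) array (hypothetical layout); forward_map[y, x]
                  holds the (row, col) in the distorted image to which the flat
                  pixel (y, x) was moved by the distortion.
    """
    # Round to the nearest source pixel and clamp to the image bounds.
    rows = np.clip(np.rint(forward_map[..., 0]).astype(int), 0, distorted.shape[0] - 1)
    cols = np.clip(np.rint(forward_map[..., 1]).astype(int), 0, distorted.shape[1] - 1)
    # Looking up the distorted image at the mapped positions undoes the distortion.
    return distorted[rows, cols]


# Toy distortion: every column of a 4x4 "document" is shifted right by one (cyclically).
flat = np.arange(16).reshape(4, 4)
distorted = np.roll(flat, 1, axis=1)
ys, xs = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
forward_map = np.stack([ys, (xs + 1) % 4], axis=-1)
restored = restore_image(distorted, forward_map)
```

In this toy case `restored` equals `flat`. A practical implementation would more likely use bilinear interpolation than nearest-neighbour lookup.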

The stacked neural network model includes a first model and a second model, and the target document image in an undeformed state can be obtained by inputting the document image to be restored into the stacked neural network model.

Illustratively, before using the stacked neural network model, the stacked neural network model needs to be trained to properly process images containing warped documents.
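The stacking of the two models can be sketched as a simple composition, in which the output of the first model is the input of the second. The class name `StackedModel` and the two placeholder callables below are hypothetical stand-ins; the real first and second models are trained networks.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Array = np.ndarray


@dataclass
class StackedModel:
    """Hypothetical wrapper that chains the first model and the second model."""

    first_model: Callable[[Array], Array]   # image -> two-dimensional feature image
    second_model: Callable[[Array], Array]  # feature image -> restored image

    def restore(self, image: Array) -> Array:
        feature_image = self.first_model(image)
        return self.second_model(feature_image)


# Placeholder models (assumptions): the first copies the image into 64 channels,
# the second averages the channels back into a single image.
def to_64_channels(img: Array) -> Array:
    return np.repeat(img[..., None], 64, axis=-1)


def average_channels(features: Array) -> Array:
    return features.mean(axis=-1)


stacked = StackedModel(first_model=to_64_channels, second_model=average_channels)
output = stacked.restore(np.ones((8, 8)))
```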

Illustratively, as shown in FIG. 2, before the step S202, the image processing method provided in the embodiment of the present application may further include the following steps S202a1 to S202a3:

S202a1, acquiring a target sample set and constructing a first loss function.

Each training sample in the target sample set is a first image containing deformed documents, and one first image corresponds to one second image containing undeformed documents.

Illustratively, each sample in the target sample set includes a first image including a deformed document and a second image including an undeformed document corresponding to the first image.

For example, the first image including the deformed document may be a user photographed image, that is, the user first photographs an image of the document in an undeformed state, then performs operations such as folding and rubbing on the document to obtain a deformed document, and photographs the deformed document to obtain an image including the deformed document as a sample.

The first image may also be obtained by deforming the second image by the electronic device according to a preset deformation rule, for example, the first image may be obtained by affine transformation of the second image. In order to facilitate the acquisition of a large number of samples, this may be used as a sample acquisition means.
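Generating a first image from a second image by affine transformation can be sketched as follows. The helper names `affine_warp` and `make_training_pair`, the parameter ranges and the nearest-neighbour sampling are assumptions of this sketch; the application does not specify the preset deformation rule beyond naming affine transformation as an example.

```python
import numpy as np


def affine_warp(image: np.ndarray, matrix: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Warp a 2-D image so that output pixel p takes the value found at
    matrix @ p + offset in the input (nearest neighbour, zeros outside)."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()])          # shape (2, h*w)
    src = matrix @ coords + offset[:, None]               # source positions
    rows = np.rint(src[0]).astype(int)
    cols = np.rint(src[1]).astype(int)
    valid = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[rows[valid], cols[valid]]
    return out.reshape(h, w)


def make_training_pair(second_image: np.ndarray, rng: np.random.Generator):
    """Return (first_image, second_image): a deformed copy and its undeformed original."""
    matrix = np.eye(2) + rng.uniform(-0.1, 0.1, size=(2, 2))  # small shear/scale
    offset = rng.uniform(-2.0, 2.0, size=2)                   # small translation
    return affine_warp(second_image, matrix, offset), second_image


rng = np.random.default_rng(0)
second = np.arange(64, dtype=float).reshape(8, 8)
first, _ = make_training_pair(second, rng)
```

An affine map only produces global shear, scale and shift; folds and curls would need a richer deformation rule.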

S202a2, training a first preset network according to the target sample set and the first loss function.

The first preset network may be a residual network including a preset number of residual blocks, and the residual network is mainly used for extracting two-dimensional characteristic information of the image. Since the first preset network is used for solving the regression problem, the first loss function may be a regression loss function.

Illustratively, the regression loss function described above may include any of the following: mean square error (MSE), mean absolute error (MAE), quantile loss, and Huber loss.
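For reference, the four regression losses named above can be written out in a few lines. This is a sketch using their standard textbook definitions; the default quantile `q` and the Huber threshold `delta` are illustrative values, not values given in the application.

```python
import numpy as np


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean square error."""
    return float(np.mean((pred - target) ** 2))


def mae(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - target)))


def quantile_loss(pred: np.ndarray, target: np.ndarray, q: float = 0.5) -> float:
    """Quantile (pinball) loss; q = 0.5 gives half the mean absolute error."""
    diff = target - pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


def huber(pred: np.ndarray, target: np.ndarray, delta: float = 1.0) -> float:
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))


pred = np.array([0.0, 2.0])
target = np.array([1.0, 0.0])
losses = {
    "mse": mse(pred, target),
    "mae": mae(pred, target),
    "quantile": quantile_loss(pred, target),
    "huber": huber(pred, target),
}
```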

It should be noted that the weights of the last layer of the first preset network do not participate in training, and the last layer has 64 neurons; this reduces the number of trainable parameters, speeds up the training process and shortens the training time. The first preset network is mainly used for roughly extracting the two-dimensional features of the input image.

S202a3, stopping training the first preset network when the value of the first loss function no longer decreases, so as to obtain a first model.
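The idea of a last layer with 64 outputs whose weights are excluded from training can be sketched with a toy two-layer model. The per-pixel linear layers stand in for the residual network, and the names `params`, `trainable`, `forward` and `sgd_step` as well as the hidden width of 16 are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameters: a trainable body and a frozen last layer with 64 outputs.
params = {
    "body": rng.normal(size=(3, 16)),   # 3 input channels -> 16 hidden channels
    "last": rng.normal(size=(16, 64)),  # 16 hidden channels -> 64 output channels
}
trainable = {"body": True, "last": False}


def forward(image: np.ndarray, params: dict) -> np.ndarray:
    """Map an (H, W, 3) image to an (H, W, 64) feature image."""
    hidden = np.maximum(image @ params["body"], 0.0)  # per-pixel linear layer + ReLU
    return hidden @ params["last"]


def sgd_step(params: dict, grads: dict, lr: float = 0.01) -> dict:
    """Update only the parameters marked as trainable; frozen ones are kept as-is."""
    return {
        name: value - lr * grads[name] if trainable[name] else value
        for name, value in params.items()
    }


features = forward(np.ones((5, 7, 3)), params)
dummy_grads = {name: np.ones_like(value) for name, value in params.items()}
updated = sgd_step(params, dummy_grads)
```

Here `features` has 64 channels, and after the update step only `body` has changed.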

The first loss function is used for representing an error between a prediction result of the first preset network on a third image and a fourth image, the third image is any one of the first images in the training sample set, and the fourth image is a second image corresponding to the third image.

For example, in the process of training the first preset network, the predicted image generated by the first preset network is compared with the second image, and the comparison result is back-propagated through the first preset network; when the first loss function no longer decreases, the first preset network has become stable and its training is stopped.
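The stopping rule "stop when the loss no longer decreases" can be sketched as a plateau check wrapped around a training step. The function name `train_until_plateau` and the `patience`, `min_delta` and `max_steps` parameters are assumptions of this sketch; the application does not say how a plateau is detected.

```python
from typing import Callable, List


def train_until_plateau(
    step_fn: Callable[[], float],
    patience: int = 3,
    min_delta: float = 1e-4,
    max_steps: int = 1000,
) -> List[float]:
    """Call step_fn (one training step returning the loss) until the loss has
    failed to improve by at least min_delta for `patience` consecutive steps."""
    best = float("inf")
    stale = 0
    history: List[float] = []
    for _ in range(max_steps):
        loss = step_fn()
        history.append(loss)
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history


# Stand-in for real training: replay a fixed loss curve.
curve = iter([1.0, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3])
history = train_until_plateau(lambda: next(curve))
```

With this curve the loop stops after the sixth step, once three consecutive steps have failed to improve on 0.4, so the final value 0.3 is never consumed.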

Illustratively, after the first model is obtained, a two-dimensional feature image of the document image to be restored can be obtained through the first model.

As shown in FIG. 3, after the step S202a3, the image processing method provided in the embodiment of the present application may further include the following step S202a4:

S202a4, converting the document image to be restored into a two-dimensional characteristic image through the first model.

The two-dimensional characteristic image is a sixty-four-channel image, and the characteristic information of the two-dimensional characteristic image comprises at least one of the following: text information contained in the document image to be restored, and texture information of the document image to be restored.

It will be appreciated that, because the last layer of the first preset network contains 64 neurons, the image output by the first model is a two-dimensional characteristic image of sixty-four channels.

For example, after the first preset network is trained, the two-dimensional feature image output by the first model can be used for training the second preset network.

Illustratively, as shown in fig. 4, after the step S202a4, the image processing method provided in the embodiment of the present application may further include the following steps S202b1 to S202b3:

S202b1, extracting a first feature vector of the two-dimensional feature image output by the first model.

The first model does not deform the input image, and only extracts the two-dimensional features of the input image.

S202b2, performing a down-convolution operation on the first feature vector through a second preset network to obtain a second feature vector set.

The second feature vector set comprises the second feature vectors obtained by each convolution layer during the down-convolution operation.

S202b3, performing an up-convolution operation on the target feature vector through the second preset network to obtain a third feature vector set.

The target feature vector is the second feature vector obtained by the last convolution layer in the down-convolution operation; the third feature vector set comprises the third feature vectors obtained by each convolution layer during the up-convolution operation; and for any second feature vector in the second feature vector set, the third feature vector set contains a third feature vector that has a mapping relation with it.
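The two passes can be sketched as follows, reading S202b2 as a down-sampling pass and S202b3 as an up-sampling pass that starts from the last down-sampled output. Everything here is an assumption made for illustration: average pooling and value repetition stand in for the learned convolutions, three levels are used, and the first entry of the third set is taken to be a copy of the target feature vector. `down`, `up`, and `down_then_up` are hypothetical names.

```python
from typing import List, Tuple

Grid = List[List[float]]


def down(x: Grid) -> Grid:
    """Halve the resolution with 2x2 average pooling (even sizes assumed)."""
    return [
        [
            (x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
            for j in range(0, len(x[0]), 2)
        ]
        for i in range(0, len(x), 2)
    ]


def up(x: Grid) -> Grid:
    """Double the resolution by repeating each value in both directions."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def down_then_up(
    first: Grid, levels: int = 3
) -> Tuple[List[Grid], List[Grid], List[Tuple[Grid, Grid]]]:
    second: List[Grid] = []  # one entry per down-sampling layer
    x = first
    for _ in range(levels):
        x = down(x)
        second.append(x)
    target = second[-1]  # output of the last down-sampling layer
    third: List[Grid] = [[row[:] for row in target]]  # start from a copy
    for _ in range(levels - 1):
        third.append(up(third[-1]))
    # Pair each second feature with the third feature of the same resolution.
    pairs = list(zip(reversed(second), third))
    return second, third, pairs


def shape(g: Grid) -> Tuple[int, int]:
    return (len(g), len(g[0]))


first_feature = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
second_set, third_set, pairs = down_then_up(first_feature)
```

Pairing by resolution is one plausible reading of the "mapping relation" between the two sets; the source does not define it further.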

Illustratively, after the two-dimensional feature image output by the first model is acquired, the second preset network extracts three-dimensional feature information of the two-dimensional feature image, and the second loss function of the second preset network may be a likelihood loss function.

For example, after the second preset network performs the down-convolution and up-convolution operations, the data output by each convolution layer needs to be integrated. As shown in fig. 5, after step S202b3, the image processing method provided in the embodiment of the present application may further include the following step S202c1:

S202c1, fusing the second feature vector in the second feature vector set and the third feature vector in the third feature vector set to obtain a fourth image.

The fourth image is a three-dimensional characteristic image of the undeformed document image obtained by transforming the first image.
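One way to picture the fusion step is shown below. The source does not name the fusion operator, so this sketch assumes that matched pairs are added element-wise, resized to the finest resolution, and averaged. Feature vectors are plain lists here, and `upsample` and `fuse` are hypothetical names.

```python
from typing import List, Sequence

Vector = List[float]


def upsample(v: Sequence[float], length: int) -> Vector:
    """Nearest-neighbour resize of a vector to the requested length."""
    n = len(v)
    return [v[i * n // length] for i in range(length)]


def fuse(second: Sequence[Vector], third: Sequence[Vector]) -> Vector:
    """Fuse paired feature vectors into a single output vector.

    second[i] and third[i] are assumed to be a matched pair of equal length.
    """
    length = max(len(v) for v in second)
    fused = [0.0] * length
    for s, t in zip(second, third):
        pair_sum = [a + b for a, b in zip(s, t)]  # skip-connection style merge
        for i, value in enumerate(upsample(pair_sum, length)):
            fused[i] += value
    return [value / len(second) for value in fused]


# Made-up feature values at three resolutions.
second_set = [[1.0, 2.0, 3.0, 4.0], [1.5, 3.5], [2.5]]
third_set = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.5], [0.5]]
fourth = fuse(second_set, third_set)
```

Because coarse and fine pairs all contribute to every output position, the result mixes global and local information, which is the stated purpose of the fusion.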

The predicted image output by the second preset network is an image containing an undeformed document as predicted by the second preset network. In the training process, the second preset network compares the predicted image with the image containing the original undeformed document, adjusts model parameters according to the comparison result, and stops training when the second loss function no longer decreases.

Illustratively, as shown in fig. 6, after the step S202c1, the image processing method provided in the embodiment of the present application may further include the following steps S202c2 and S202c3:

S202c2, adjusting parameters of the second preset network according to the comparison result of the fourth image and the second image.

S202c3, stopping training the second preset network when the value of the second loss function of the second preset network no longer decreases, to obtain a second model.

Wherein the second loss function is used to represent an error between the fourth image and the second image.
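The description says the second loss function may be a likelihood loss but gives no formula. As one hedged possibility, the sketch below uses a Gaussian negative log-likelihood with a fixed standard deviation, which differs from mean squared error only by a scale and a constant. `gaussian_nll` and `sigma` are hypothetical names, and the pixel values are made up.

```python
import math
from typing import Sequence


def gaussian_nll(
    predicted: Sequence[float], target: Sequence[float], sigma: float = 1.0
) -> float:
    """Mean negative log-likelihood of target under N(predicted, sigma^2)."""
    if len(predicted) != len(target):
        raise ValueError("images must have the same number of pixels")
    const = 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    total = sum(
        const + (p - t) ** 2 / (2.0 * sigma ** 2)
        for p, t in zip(predicted, target)
    )
    return total / len(predicted)


fourth_image = [0.2, 0.8, 0.5, 0.1]  # flattened prediction (illustrative)
second_image = [0.0, 1.0, 0.5, 0.0]  # flattened ground truth (illustrative)
loss = gaussian_nll(fourth_image, second_image)
perfect = gaussian_nll(second_image, second_image)
```

The loss is smallest when the fourth image equals the second image, so driving it down pulls the prediction toward the undeformed document.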

For example, fig. 7 is a schematic structural diagram of the second preset network, where L1 to L3 are the down-convolution operations that the network performs on the input data and R1 to R3 are the up-convolution operations. R1 may be understood as a copy of L3; that is, after the network obtains the data of L3 through the down-convolution operations, it performs the up-convolution operations starting from that data. After the up-convolution operations, the network integrates the data output by each convolution layer, so that image processing takes into account both global information and detail information such as local texture.

For example, after the second model is obtained, the second model may be applied to the image output by the first model to predict a target document image containing an undeformed document.

For example, after the first model and the second model are obtained, in the image processing method based on the stacked network model, the step S204 may specifically include the following steps S204a1 and S204a2, as shown in fig. 8:

S204a1, inputting the document image to be restored into the first model to obtain a two-dimensional characteristic image.

S204a2, inputting the two-dimensional characteristic image into the second model to obtain the target document image after the document image to be restored is restored.

Illustratively, after the document image to be restored is taken as an input image of the stacked network model, the stacked network model outputs a target document image containing an undeformed document.
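Steps S204a1 and S204a2 amount to composing the two models. The sketch below shows only that wiring: `make_stacked_model` is a hypothetical helper, and the two stub functions are trivial placeholders for the trained first and second models.

```python
from typing import Callable, List

Image = List[List[float]]
Model = Callable[[Image], Image]


def make_stacked_model(first_model: Model, second_model: Model) -> Model:
    """Compose the two models so the output of the first feeds the second."""

    def stacked(image: Image) -> Image:
        feature_image = first_model(image)  # S204a1
        return second_model(feature_image)  # S204a2

    return stacked


# Trivial stand-ins so the sketch runs; real models would be trained networks.
def first_stub(img: Image) -> Image:
    return [[v * 2.0 for v in row] for row in img]


def second_stub(img: Image) -> Image:
    return [[v + 1.0 for v in row] for row in img]


restore = make_stacked_model(first_stub, second_stub)
restored = restore([[0.0, 1.0], [2.0, 3.0]])
```

Keeping the two models as separate callables mirrors the two-stage training described earlier, where the first model is fixed before the second is trained on its output.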

According to the image processing method provided by this embodiment, a stacked network model comprising a first model and a second model extracts two-dimensional characteristic information and three-dimensional characteristic information of an image containing a deformed document; the mapping relation between the deformed document and the undeformed document is determined based on the two-dimensional characteristic information and the three-dimensional characteristic information; and the image containing the deformed document is restored based on the mapping relation, so that an image containing the undeformed document is obtained. The stacked network model considers both global information and detail information such as local texture. Within the stacked network, the low-level and high-level network layers are connected and combined with each other, creating a channel for exchanging and fusing information. As a result, the network generalizes better, attends to the association between the whole and the local parts, and improves the accuracy and quality with which the model restores images containing deformed documents.
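Restoring an image from a mapping relation can be illustrated with a dense backward map. The source does not state how the mapping relation is represented, so this is an assumption: each output pixel stores the coordinate in the deformed image from which it should be sampled. `apply_backward_map` and `fill` are hypothetical names, and nearest-neighbour sampling is chosen for brevity.

```python
from typing import List, Tuple

Image = List[List[float]]
# For each output pixel: the (row, col) to sample in the deformed image.
BackwardMap = List[List[Tuple[float, float]]]


def apply_backward_map(
    deformed: Image, mapping: BackwardMap, fill: float = 0.0
) -> Image:
    """Build the restored image by sampling the deformed image."""
    height, width = len(deformed), len(deformed[0])
    restored: Image = []
    for map_row in mapping:
        out_row = []
        for src_r, src_c in map_row:
            r, c = int(round(src_r)), int(round(src_c))  # nearest neighbour
            inside = 0 <= r < height and 0 <= c < width
            out_row.append(deformed[r][c] if inside else fill)
        restored.append(out_row)
    return restored


# Toy deformation: each row of the document was shifted one pixel to the right.
deformed = [[0.0, 1.0, 2.0], [0.0, 3.0, 4.0]]
mapping = [[(r, c + 1) for c in range(3)] for r in range(2)]
restored = apply_backward_map(deformed, mapping)
```

Output pixels whose source coordinate falls outside the deformed image receive the `fill` value.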

The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

It should be noted that, in the image processing method provided in the embodiment of the present application, the execution subject may be an image processing apparatus, or a control module for executing the image processing method in the image processing apparatus. In the embodiment of the present application, an image processing apparatus provided in the embodiment of the present application will be described by taking an example in which the image processing apparatus executes an image processing method.

The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In the embodiments of the present application, each image processing method shown in the foregoing method drawings is described by way of example with reference to one drawing of the embodiments. In specific implementation, the image processing methods shown in the foregoing method drawings may also be implemented in combination with any other combinable drawing illustrated in the foregoing embodiments, which is not described again here.

Referring to fig. 9, fig. 9 is a schematic diagram illustrating a basic structure of an image processing apparatus according to the present embodiment.

As shown in fig. 9, an image processing apparatus includes: an obtaining module 801, configured to obtain a document image to be restored including a deformed document; the feature extraction module 802 is configured to extract two-dimensional feature information and three-dimensional feature information of the document image to be restored, and generate a two-dimensional feature image of the document image to be restored based on the two-dimensional feature information of the document image to be restored; the feature extraction module 802 is further configured to generate three-dimensional feature information of the document image to be restored based on the two-dimensional feature image; a determining module 803, configured to determine a target transformation relationship according to the two-dimensional feature information and the three-dimensional feature information of the document image to be restored, where the target transformation relationship includes a transformation relationship between the document image to be restored and an undeformed document image corresponding to the deformed document; and a restoring module 804, configured to restore the document image to be restored according to the target transformation relationship.
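The data flow between the four modules can be sketched as a small container of callables. This shows only the wiring described above; the class name, field names, and string-returning stubs are hypothetical, and a real apparatus would plug in the trained models.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageProcessingApparatus:
    obtain: Callable[[], Any]             # obtaining module 801
    extract_2d: Callable[[Any], Any]      # feature extraction module 802
    extract_3d: Callable[[Any], Any]      # feature extraction module 802
    determine: Callable[[Any, Any], Any]  # determining module 803
    restore: Callable[[Any, Any], Any]    # restoring module 804

    def run(self) -> Any:
        image = self.obtain()
        feat_2d = self.extract_2d(image)
        # Three-dimensional features are generated from the 2D feature image.
        feat_3d = self.extract_3d(feat_2d)
        transform = self.determine(feat_2d, feat_3d)
        return self.restore(image, transform)


# String-returning stubs make the order of the calls visible.
apparatus = ImageProcessingApparatus(
    obtain=lambda: "deformed",
    extract_2d=lambda img: f"2d({img})",
    extract_3d=lambda f2: f"3d({f2})",
    determine=lambda f2, f3: f"T[{f2},{f3}]",
    restore=lambda img, t: f"restore({img},{t})",
)
result = apparatus.run()
```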

In some aspects, the apparatus further comprises a training module 805; the obtaining module 801 is further configured to obtain a target sample set, and construct a first loss function, where each training sample in the target sample set is a first image including a deformed document, and one first image corresponds to one second image including an undeformed document; a training module 805 configured to train a first preset network according to the target sample set and the first loss function; the training module 805 is further configured to stop training the first preset network when the value of the first loss function is no longer reduced, to obtain a first model; the first loss function is used for representing an error between a prediction result of the first preset network on a third image and a fourth image, the third image is any one of the first images in the training sample set, and the fourth image is a second image corresponding to the third image.

In some manners, the feature extraction module 802 is further configured to convert, by the first model, the document image to be restored into a two-dimensional feature image; the two-dimensional characteristic image is a sixty-four-channel image, and the characteristic information of the two-dimensional characteristic image comprises at least one of the following: text information contained in the document image to be restored, and texture information of the document image to be restored.

In some manners, the feature extraction module 802 is further configured to extract a first feature vector of the two-dimensional feature image output by the first model; the training module 805 is further configured to perform a down-convolution operation on the first feature vector through a second preset network to obtain a second feature vector set, where the second feature vector set includes the second feature vectors obtained by each convolution layer during the down-convolution operation; the training module 805 is further configured to perform an up-convolution operation on the target feature vector through the second preset network to obtain a third feature vector set; the target feature vector is the second feature vector obtained by the last convolution layer in the down-convolution operation; the third feature vector set includes the third feature vectors obtained by each convolution layer during the up-convolution operation; and for any second feature vector in the second feature vector set, the third feature vector set contains a third feature vector that has a mapping relation with it.

In some manners, the training module 805 is further configured to fuse the second feature vector in the second feature vector set with the third feature vector in the third feature vector set to obtain a fourth image; the fourth image is a three-dimensional characteristic image of the undeformed document image obtained by transforming the first image.

In some manners, the training module 805 is further configured to adjust parameters of the second preset network according to a comparison result of the fourth image and the second image; the training module 805 is further configured to stop training the second preset network when the value of the second loss function of the second preset network no longer decreases, so as to obtain a second model; wherein the second loss function is used to represent an error between the fourth image and the second image.

In some manners, the restoring module 804 is specifically configured to input the document image to be restored into the first model to obtain a two-dimensional feature image; and inputting the two-dimensional characteristic image into the second model to obtain the target document image after the document image to be restored is restored.

The image processing device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a cell phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (ultra-mobile personal computer, UMPC), netbook or personal digital assistant (personal digital assistant, PDA), etc., and the non-mobile electronic device may be a server, network attached storage (Network Attached Storage, NAS), personal computer (personal computer, PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not limited in particular.

The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.

The image processing device provided by the embodiment of the application can be used for realizing the method embodiment of fig. 1 to 8. The respective processes implemented by the apparatus are not described here in detail to avoid repetition.

For the beneficial effects of the implementations in this embodiment, refer to the beneficial effects of the corresponding implementations in the foregoing method embodiments; to avoid repetition, they are not described again here.

According to the image processing device provided by the embodiment of the application, a stacked network model comprising a first model and a second model extracts two-dimensional characteristic information and three-dimensional characteristic information of an image containing a deformed document; the mapping relation between the deformed document and the undeformed document is determined based on the two-dimensional characteristic information and the three-dimensional characteristic information; and the image containing the deformed document is restored based on the mapping relation, so that an image containing the undeformed document is obtained. The stacked network model considers both global information and detail information such as local texture. Within the stacked network, the low-level and high-level network layers are connected and combined with each other, creating a channel for exchanging and fusing information. As a result, the network generalizes better, attends to the association between the whole and the local parts, and improves the accuracy and quality with which the model restores images containing deformed documents.

In order to solve the technical problems, the embodiment of the invention also provides computer equipment. Referring specifically to fig. 10, fig. 10 is a basic structural block diagram of a computer device according to the present embodiment.

Fig. 10 schematically shows the internal structure of the computer device. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer readable instructions; the database may store a control information sequence, and the computer readable instructions, when executed by the processor, may cause the processor to implement an image processing method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform an image processing method. The network interface of the computer device is used for connecting to and communicating with a terminal. It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of some of the structures associated with the present application and does not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

The processor in this embodiment is configured to execute the specific functions of the obtaining module 801, the feature extraction module 802, the determining module 803, and the restoring module 804 in fig. 9, and the memory stores the program code and the various types of data required for executing these modules. The network interface is used for data transmission with a user terminal or a server. The memory in this embodiment stores the program code and data necessary for executing all the sub-modules in the image processing apparatus, and the server can call this program code and data to execute the functions of all the sub-modules.

According to the computer equipment provided by this embodiment, a stacked network model comprising a first model and a second model extracts two-dimensional characteristic information and three-dimensional characteristic information of an image containing a deformed document; the mapping relation between the deformed document and the undeformed document is determined based on the two-dimensional characteristic information and the three-dimensional characteristic information; and the image containing the deformed document is restored based on the mapping relation, so that an image containing the undeformed document is obtained. The stacked network model considers both global information and detail information such as local texture. Within the stacked network, the low-level and high-level network layers are connected and combined with each other, creating a channel for exchanging and fusing information. As a result, the network generalizes better, attends to the association between the whole and the local parts, and improves the accuracy and quality with which the model restores images containing deformed documents.

The invention also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of any of the above-described embodiments of the image processing method.

Those skilled in the art will appreciate that all or part of the methods of the above-described embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may perform the steps of the method embodiments described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or may be a random access memory (Random Access Memory, RAM).

Those of skill in the art will appreciate that the various operations, methods, steps in the flow, actions, schemes, and alternatives discussed in the present application may be alternated, altered, combined, or eliminated. Further, other steps, means, or steps in a process having various operations, methods, or procedures discussed in this application may be alternated, altered, rearranged, split, combined, or eliminated. Further, steps, measures, schemes in the prior art with various operations, methods, flows disclosed in the present application may also be alternated, altered, rearranged, decomposed, combined, or deleted.

The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims

1. An image processing method, comprising:

acquiring a document image to be restored containing a deformed document;

2. The method according to claim 1, wherein before the restoring the document image to be restored according to the target transformation relationship, the method further comprises:

acquiring a target sample set, and constructing a first loss function, wherein each training sample in the target sample set is a first image containing deformed documents, and one first image corresponds to one second image containing undeformed documents;

training a first preset network according to the target sample set and the first loss function;

stopping training the first preset network when the value of the first loss function no longer decreases, so as to obtain a first model;

3. The method of claim 2, wherein the training of the first preset network is stopped when the value of the first loss function is no longer decreasing, and wherein after obtaining the first model, the method further comprises:

converting the document image to be restored into a two-dimensional characteristic image through the first model;

4. A method according to claim 3, wherein after said converting the document image to be restored into a two-dimensional feature image by the first model, the method further comprises:

extracting a first feature vector of the two-dimensional feature image output by the first model;

performing a down-convolution operation on the first feature vector through a second preset network to obtain a second feature vector set, wherein the second feature vector set comprises second feature vectors obtained by convolution of each layer in the down-convolution operation process;

performing an up-convolution operation on the target feature vector through the second preset network to obtain a third feature vector set;

5. The method of claim 4, wherein after performing the up-convolution operation on the target feature vector through the second preset network to obtain the third feature vector set, the method further comprises:

fusing the second feature vector in the second feature vector set with the third feature vector in the third feature vector set to obtain a fourth image;

6. The method of claim 5, wherein the fusing the second feature vector of the second set of feature vectors with the third feature vector of the third set of feature vectors results in a fourth image, the method further comprising:

according to the comparison result of the fourth image and the second image, adjusting parameters of the second preset network;

stopping training the second preset network when the value of the second loss function of the second preset network is not reduced any more, so as to obtain a second model;

7. The method of claim 6, wherein the restoring the document image to be restored according to the target transformation relationship comprises:

inputting the document image to be restored into the first model to obtain a two-dimensional characteristic image;

and inputting the two-dimensional characteristic image into the second model to obtain the target document image after the document image to be restored is restored.

8. An image processing apparatus, comprising:

9. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the image processing method according to any of claims 1 to 7.

10. A storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the image processing method of any of claims 1 to 7.