CN109377532B - Image processing method and device based on neural network - Google Patents
- Publication number: CN109377532B (application CN201811212227.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- model
- comparison result
- feature map
- detail
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an image processing method based on a neural network, comprising the following steps: performing a filling operation on a first image using a generative model to obtain a second image, wherein the first image is generated by size-compressing an original image and the size of the second image bears a specified proportion to that of the original image; performing supervised learning on the original image and the second image using a discriminant model to obtain a first comparison result, and calculating a feature map of the original image and a feature map of the second image using a detail comparison model to determine a second comparison result, wherein the second comparison result represents the difference between the two feature maps; and training the generative model and the discriminant model based on the first comparison result and the second comparison result. The method can restore images compressed at higher compression ratios, greatly improving the achievable compression ratio and reducing the bandwidth and storage requirements of image transmission services.
Description
Technical Field
The present invention relates to image processing, and more particularly, to a method and apparatus for processing an image based on a neural network.
Background
Image compression refers to techniques that represent the original pixel matrix, lossily or losslessly, with a smaller amount of data, so that it can be stored and transmitted in a more efficient format. Image data can be compressed because of redundancy in the data, for example: spatial redundancy due to correlation between adjacent pixels in an image, and spectral redundancy due to correlation between different color planes or spectral bands. The goal of data compression is to reduce the number of bits required to represent the data by removing these redundancies. Image compression is divided into lossy compression and lossless compression. Lossy compression omits some insensitive information during compression; although the original data cannot be completely restored, a higher compression ratio can be obtained. Lossless compression removes only redundant information, re-encoding the data according to its information entropy to obtain as high a compression ratio as possible; losslessly compressed data can be completely restored to the original data.
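As a concrete (non-limiting) illustration of the lossless/lossy distinction described above, the following Python sketch compresses a synthetic image both ways; the stdlib `zlib` codec stands in for a real image codec, and the 4-bit quantization is an illustrative choice, not a method from the patent:

```python
import zlib
import numpy as np

# A synthetic 8-bit grayscale "image" with strong spatial redundancy:
# every row is the same smooth gradient, so adjacent pixels correlate.
image = np.tile(np.arange(256, dtype=np.uint8), (256, 1))

raw = image.tobytes()
lossless = zlib.compress(raw, level=9)  # lossless: entropy coding only
restored = np.frombuffer(zlib.decompress(lossless), dtype=np.uint8).reshape(image.shape)

# Lossy: drop the low 4 bits (insensitive detail), then entropy-code.
# The original pixels can no longer be recovered exactly, but the
# quantized data has fewer distinct symbols and compresses further.
lossy = zlib.compress((image & 0xF0).tobytes(), level=9)

print("lossless ratio:", len(raw) / len(lossless))
print("lossy ratio   :", len(raw) / len(lossy))
print("lossless exact?", np.array_equal(restored, image))
```

The lossless round trip restores the pixel matrix bit-for-bit, while the lossy path trades exactness for a higher compression ratio.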
Image enlargement refers to increasing an image's size. Clearly, the information in the original image is not sufficient to determine all the image information after enlargement, so enlargement necessarily introduces distortion. Image magnification algorithms study how to enlarge the original image to a specified size with as little distortion as possible, using different interpolation methods. Common image magnification algorithms include bilinear interpolation, bicubic interpolation, and similar methods.
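The bilinear interpolation mentioned above can be sketched as follows. This is a minimal grayscale-only implementation for illustration; production resamplers handle color channels, border policies, and anti-aliasing more carefully:

```python
import numpy as np

def bilinear_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Enlarge a 2-D grayscale image by `factor` using bilinear interpolation."""
    h, w = img.shape
    # Map each output pixel back to fractional source coordinates.
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical blend weights, one per output row
    wx = (xs - x0)[None, :]  # horizontal blend weights, one per output column
    img = img.astype(float)
    # Blend the four surrounding source pixels of each output position.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

small = np.array([[0.0, 10.0], [20.0, 30.0]])
big = bilinear_upscale(small, 2)
print(big)  # 4x4 result; corner values 0 and 30 are preserved
```

Interpolation like this only blends existing pixels, which is why, as the patent argues later, a learned generative model can recover sharper detail at high magnification ratios than interpolation can.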
A neural network is an artificially designed network structure that is essentially a multi-layer perceptron. The perceptron is composed of a number of neurons, each of which receives an input signal from outside or from other nodes and derives an output signal through an activation function, resembling the signal transfer of neurons in the brain. The neurons are connected in layers to form a network structure. Unlike nerve cells, the signals of artificial neurons can be propagated backwards, and this feedback mechanism gives the perceptron its ability to learn. Beyond learning, a multi-layer perceptron can represent non-linear mappings, so neural networks can help solve relatively complex problems such as pattern recognition, automation, decision evaluation, and prediction. Application scenarios are divided into supervised learning, unsupervised learning, semi-supervised learning, and so on; different loss functions are defined to describe the objective, and the network's parameters are fitted to achieve it.
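For illustration, a minimal forward pass of such a multi-layer perceptron (random weights and a sigmoid activation; the layer sizes are arbitrary and not taken from the patent):

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes each neuron's summed input into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Forward pass of a multi-layer perceptron: each layer's neurons take
    the previous layer's outputs, form a weighted sum plus bias, and apply
    an activation function, mimicking signal transfer between neurons."""
    a = x
    for W, b in layers:
        a = sigmoid(a @ W + b)
    return a

rng = np.random.default_rng(42)
layers = [(rng.normal(size=(3, 4)), np.zeros(4)),   # input (3) -> hidden (4)
          (rng.normal(size=(4, 1)), np.zeros(1))]   # hidden (4) -> output (1)
y = mlp_forward(np.array([0.5, -0.2, 1.0]), layers)
print(y.shape, float(y[0]))
```

Training would propagate the loss backwards through these same layers to adjust `W` and `b`, which is the feedback mechanism the paragraph above describes.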
Disclosure of Invention
To address the problem that current image processing can restore only images with relatively low compression ratios, one aspect of the invention provides an image processing method based on a neural network, comprising: performing a filling operation on a first image using a generative model to obtain a second image, wherein the first image is generated by size-compressing an original image, and the size of the second image bears a specified proportion to that of the original image; performing supervised learning on the original image and the second image using a discriminant model to obtain a first comparison result, and calculating a feature map of the original image and a feature map of the second image using a detail comparison model to determine a second comparison result, wherein the second comparison result represents the difference between the feature map of the original image and the feature map of the second image; and training the generative model and the discriminant model based on the first comparison result and the second comparison result.
In one embodiment, the method further comprises: restoring a third image using the trained generative model, wherein the third image is a compressed image.
In one embodiment, the size of the second image is the same as the size of the original image.
In one embodiment, performing a padding operation on the first image using the generative model to obtain the second image comprises: a second image is obtained by performing a multi-layer convolution operation on the first image using the generative model.
In one embodiment, training the generative model and the discriminative model based on the first comparison result and the second comparison result comprises: training the generative model and the discriminant model using a back propagation method, taking the first comparison result and the second comparison result as model losses.
In one embodiment, the first comparison result is a positive error loss and the second comparison result is a detail loss.
In one embodiment, the loss of detail is a loss of detail for a critical area being labeled.
In one embodiment, the detail comparison model is constructed based on a feature map hidden layer of a detail comparison network.
In one embodiment, the generative model and/or the discriminant model is constructed based on a convolutional neural network.
Another aspect of the present invention provides an image processing apparatus based on a neural network, comprising: a memory for storing instructions; and a processor coupled to the memory, wherein the instructions, when executed by the processor, cause the apparatus to perform any of the methods described above.
In another aspect, the invention provides a computer-readable storage medium comprising instructions that, when executed, cause a processor of the computer to perform any of the methods described above.
The image processing method can restore images with higher compression ratios, greatly improving the achievable compression ratio and reducing the bandwidth and storage requirements of image transmission services. Compared with traditional interpolation-based magnification algorithms, generating images with a convolutional neural network yields clearer and more accurate results with better detail recovery, and can tolerate higher magnification ratios. Introducing feature-map convolution layers as an additional loss enhances the detail fidelity of the generative model. Model performance can be further enhanced through customized loss-function definitions. For example, by labeling the text areas of samples in a document-photocopy image database, the detail-loss penalty of the text area is increased, making the generative model more sensitive to text areas and highlighting the more important information as required.
Drawings
FIG. 1 is a flow diagram 100 of a neural network-based image processing method according to an embodiment of the present invention;
FIG. 2 is a flow diagram 200 of one embodiment of a neural network-based image processing method in accordance with the present invention;
fig. 3 is a schematic diagram of a neural network-based image processing apparatus 300 according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and systems according to various embodiments of the present disclosure. It should be noted that each block in the flowchart or block diagrams may represent a module, a segment, or a portion of code, which may comprise one or more executable instructions for implementing the logical function specified in the respective embodiment. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "including," "comprising," and the like, as used herein, are to be construed as open-ended terms, i.e., "including/including but not limited to," meaning that additional content may also be included. The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment," and so on.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. The connections between units in the drawings are for convenience of description only; they indicate that at least the units at both ends of a connection communicate with each other, and are not intended to imply that unconnected units cannot communicate.
Interpretation of terms
Positive-error loss: a quantitative measure describing whether the generated image appears true compared with the original image, measured, for example, by cross entropy.
Detail loss: a quantitative measure describing the difference between the details of two images.
Detail comparison network: a neural network that derives the detail loss from image details, i.e., a network formed from several convolutional layers of a deep neural network trained on a large-scale image classification dataset (e.g., the ImageNet dataset), such as a residual network (ResNet).
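Under the definitions above, the two losses can be sketched numerically. The function names and toy values below are illustrative, not taken from the patent; in practice the feature maps would come from the detail comparison network's hidden layers:

```python
import numpy as np

def cross_entropy_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    """Positive-error loss: cross entropy of the discriminator judging the
    original image as real ('yes') and the generated image as fake ('no')."""
    return -(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def detail_loss(feat_orig: np.ndarray, feat_gen: np.ndarray) -> float:
    """Detail loss: mean squared difference between the two feature maps
    produced by a fixed, pre-trained detail comparison network."""
    return float(np.mean((feat_orig - feat_gen) ** 2))

# Toy numbers: a confident, correct discriminator yields a small loss.
print(cross_entropy_loss(0.9, 0.1))   # about 0.211
print(detail_loss(np.ones((2, 4, 4)), np.zeros((2, 4, 4))))  # 1.0
```

The generative model is then trained to push both quantities down, while the discriminant model is trained to keep the positive-error loss discriminative.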
FIG. 1 shows a flow diagram 100 of a neural network-based image processing method according to an embodiment of the present invention.
Step S101: perform a filling operation on the first image using the generative model to obtain a second image, wherein the first image is generated by size-compressing an original image, and the size of the second image bears a specified proportion to the size of the original image. The original image referred to herein includes uncompressed images from scenarios such as a user's identification card, passport, or document photocopy.
Step S102: perform supervised learning on the original image and the second image using a discriminant model to obtain a first comparison result, and calculate a feature map of the original image and a feature map of the second image using a detail comparison model to determine a second comparison result, wherein the second comparison result represents the difference between the feature map of the original image and the feature map of the second image. In one embodiment, the detail comparison model may be built based on the feature map hidden layers of a detail comparison network. In another embodiment, the detail comparison model may be constructed based on a fully connected network and trained using a large-scale image classification dataset.
Step S103: train the generative model and the discriminant model based on the first comparison result and the second comparison result.
It should be understood that the generative model and/or discriminant model in the present invention are constructed based on a deep neural network. In one embodiment, the generative model and/or discriminant model is constructed based on a convolutional neural network. In another embodiment, the generative model and/or discriminant model are constructed based on a fully connected network.
FIG. 2 is a flow diagram 200 of one embodiment of a neural network-based image processing method in accordance with the present invention.
Step S201: a second image with the same size as the first image's original image is obtained by performing a filling operation on the first image using the generative model, wherein the first image is generated by size-compressing the original image (for example, compressing the length and width each by a factor of 2, preferably by a factor of 4, although higher factors are possible). The first image and its original image can be obtained from a corresponding image database. In one embodiment, the generative model is composed of a multi-layer deconvolutional neural network; the first image is input into the generative model, and the generative model outputs a second image with the same size as the original image. It should be appreciated that residual blocks, batch normalization, and similar techniques may be used in the generative model to improve its ability to generate images. It will also be appreciated that the first image may be generated by subjecting the original image to other suitable compression means (e.g., various lossy or lossless compression methods), and that the size of the second image may bear any suitable proportion to the size of the original image.
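A minimal sketch of the transposed (de-)convolution such a generative model uses to fill in a size-compressed image. This is single-channel with a fixed kernel for clarity; in the patent's generative model the kernels would be learned and stacked over multiple layers:

```python
import numpy as np

def conv_transpose2d(x: np.ndarray, kernel: np.ndarray, stride: int = 2) -> np.ndarray:
    """Minimal 2-D transposed convolution: each input pixel 'stamps' a
    scaled copy of the kernel onto a larger output grid, which is how a
    deconvolutional layer enlarges a size-compressed image."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h * stride + kh - stride, w * stride + kw - stride))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

small = np.arange(4.0).reshape(2, 2)   # stand-in for the "first image"
kernel = np.full((2, 2), 0.25)         # illustrative fixed kernel
big = conv_transpose2d(small, kernel, stride=2)
print(big.shape)  # (4, 4): doubled in each dimension
```

Stacking several such layers (with learned kernels, residual blocks, and batch normalization) yields the filling operation of step S201.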
Step S202: the discriminant model performs supervised learning on the original image and the second image to obtain a positive-error loss, and the feature map hidden layer of a detail comparison network is used to calculate the feature map of the original image and the feature map of the second image and thereby determine the detail loss, which represents the difference between the two feature maps. It will be appreciated that other suitable means may be used, instead of the feature map hidden layer of the detail comparison network, to compute the two feature maps. It should also be understood that the positive-error loss is only one possible comparison result obtained by supervised learning, and the detail loss is likewise only one possible comparison result for the difference between the two feature maps; other values may serve as the comparison results in their place.
In one embodiment, the discriminant model is composed of a multi-layer convolutional neural network together with other multi-layer deep neural networks. The original image and the second image (the sample) are input into the discriminant model, which outputs the model loss, i.e., the comparison result for the two images. The model loss is defined over the probability of judging whether an input sample is an original image: the cross entropy of the probability that an image generated by the generative model is judged "no" and the probability that the original image is judged "yes". The discriminant model trains the entire network by applying gradient descent and back propagation to this loss function. In one embodiment, the feature map of the original image and the feature map of the second image may be generated by an already-trained network (e.g., the convolutional layers of a residual convolutional neural network classification model trained on the ImageNet dataset), which makes the image details restored by the generative model more realistic. In another embodiment, the detail loss is the detail loss of labeled key regions: model performance is enhanced through a customized loss function, and regional feature maps can be additionally strengthened. By labeling an important region of a sample, the loss weight of feature-map differences in the corresponding region is increased, strengthening the generative model's attention to, and restoration of, that region. For example, the text regions of samples in a document-photocopy image database are labeled so that the detail-loss penalty of the text region is increased, making the generative model more sensitive to text regions and highlighting the more important information as required.
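The region-weighted detail loss described above can be sketched as follows. The mask, the weight value, and the function name are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def weighted_detail_loss(feat_orig, feat_gen, mask, region_weight=10.0):
    """Detail loss with labeled key regions: feature-map differences inside
    the mask (e.g. text areas of a document photocopy) are penalized more
    heavily, making the generative model more sensitive there.
    `region_weight` is an illustrative value."""
    weights = np.where(mask, region_weight, 1.0)
    return float(np.mean(weights * (feat_orig - feat_gen) ** 2))

feat_orig = np.ones((4, 4))
feat_gen = np.zeros((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # labeled "text" region of the sample

plain = weighted_detail_loss(feat_orig, feat_gen, np.zeros((4, 4), dtype=bool))
boosted = weighted_detail_loss(feat_orig, feat_gen, mask)
print(plain, boosted)  # the labeled region raises the penalty
```

The same difference inside the labeled region now costs the generative model more, which is the mechanism by which labeling shifts its attention toward the important areas.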
Step S203: train the generative model and the discriminant model according to the obtained positive-error loss and detail loss. It should be understood that the generative model and the discriminant model may also be trained with other comparison results, besides the positive-error loss and the detail loss, that characterize the difference between the original image and the second image.
In one embodiment, the resulting positive error loss and detail loss are taken as model losses, and the generative model and the discriminative model are trained using a back propagation method.
Step S204: the trained generative model is used to restore compressed images, which may be obtained from a suitable image database or in other suitable ways.
It should be understood that step S204 is optional; it is listed here only for cases where a user wishes to use the trained generative model to restore images with a larger compression ratio, or to verify the trained generative model's practical effect.
A generative model trained adversarially against the discriminant model can restore images with a large compression ratio (for example, 16x compression or higher) while maintaining good detail quality and high definition, ensuring the readability of the image and the complete restoration of its main information. This in turn facilitates transmitting and storing images as small, highly compressed data, saving bandwidth and storage space. The method suits scenarios with large volumes of image data and low requirements on image detail but certain requirements on information preservation.
Fig. 3 shows a schematic diagram of a neural network-based image processing apparatus 300 according to an embodiment of the present invention. The apparatus 300 may comprise: a memory 301 and a processor 302 coupled to the memory 301. The memory 301 is for storing instructions, and the processor 302 is configured to implement one or more of any of the steps of the methods described with respect to fig. 1 and 2 based on the instructions stored by the memory 301.
As shown in fig. 3, the apparatus 300 may further include a communication interface 303 for information interaction with other devices. The apparatus 300 may further comprise a bus 304, the memory 301, the processor 302 and the communication interface 303 communicating with each other via the bus 304.
The memory 301 may include volatile memory or non-volatile memory. The processor 302 may be a central processing unit (CPU), a microcontroller, an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, or one or more integrated circuits configured to implement an embodiment of the invention.
Alternatively, the neural network-based image processing method described above can be embodied by a computer program product, i.e., a tangible computer-readable storage medium. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for carrying out various aspects of the present disclosure. The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
It should be noted that the above-mentioned embodiments are only specific examples of the present invention, and obviously, the present invention is not limited to the above-mentioned embodiments, and many similar variations exist. All modifications which would occur to one skilled in the art and which are, therefore, directly derived or suggested from the disclosure herein are deemed to be within the scope of the present invention.
Claims (11)
1. An image processing method based on a neural network, comprising:
performing a filling operation on a first image by using a generating model to obtain a second image, wherein the first image is generated by performing size compression on an original image, and the size of the second image is in a specified proportion to that of the original image;
performing supervised learning on the original image and the second image by using a discriminant model to obtain a first comparison result, and calculating a feature map of the original image and a feature map of the second image by using a detail comparison model to further determine a second comparison result, wherein the second comparison result represents the difference between the feature map of the original image and the feature map of the second image;
training the generative model and the discriminative model based on the first comparison result and the second comparison result.
2. The method of claim 1, further comprising: restoring a third image by using the trained generative model, wherein the third image is a compressed image.
3. The method of claim 1, wherein the size of the second image is the same as the size of the original image.
4. The method of claim 1, wherein performing a padding operation on the first image using the generative model to obtain the second image comprises: the second image is obtained by performing a multi-layer convolution operation on the first image using the generative model.
5. The method of claim 1, wherein training the generative model and the discriminative model based on the first comparison result and the second comparison result comprises: training the generative model and the discriminant model using a back propagation method, taking the first comparison result and the second comparison result as model losses.
6. The method of claim 5, wherein the first comparison result is a positive-error loss and the second comparison result is a detail loss.
7. The method of claim 6, wherein the loss of detail is a loss of detail for a critical area being labeled.
8. The method of claim 1, wherein the detail comparison model is constructed based on a feature map hidden layer of a detail comparison network.
9. The method according to claim 1, characterized in that the generative model and/or the discriminant model is constructed based on a convolutional neural network.
10. An image processing apparatus based on a neural network, comprising:
a memory for storing instructions; and
a processor coupled to the memory, wherein the instructions, when executed by the processor, cause the apparatus to perform the method of any of claims 1-9.
11. A computer-readable storage medium comprising instructions that, when executed, cause a processor of a computer to perform the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811212227.7A CN109377532B (en) | 2018-10-18 | 2018-10-18 | Image processing method and device based on neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109377532A CN109377532A (en) | 2019-02-22 |
CN109377532B true CN109377532B (en) | 2023-01-31 |
Family
ID=65400132
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811212227.7A Active CN109377532B (en) | 2018-10-18 | 2018-10-18 | Image processing method and device based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109377532B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109920016B (en) * | 2019-03-18 | 2021-06-25 | 北京市商汤科技开发有限公司 | Image generation method and device, electronic equipment and storage medium |
CN110225350B (en) * | 2019-05-30 | 2021-03-23 | 西安电子科技大学 | Natural image compression method based on generation type countermeasure network |
CN112116083B (en) * | 2019-06-20 | 2024-03-08 | 地平线(上海)人工智能技术有限公司 | Neural network accelerator and detection method and device thereof |
CN112288032B (en) * | 2020-11-18 | 2022-01-14 | 上海依图网络科技有限公司 | Method and device for quantitative model training based on generation of confrontation network |
CN113222815A (en) * | 2021-04-26 | 2021-08-06 | 北京奇艺世纪科技有限公司 | Image adjusting method and device, electronic equipment and readable storage medium |
CN113362403A (en) * | 2021-07-20 | 2021-09-07 | 支付宝(杭州)信息技术有限公司 | Training method and device of image processing model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6720992B1 (en) * | 1999-08-04 | 2004-04-13 | Wolfgang Zahn | Method for improving the contrast reproduction of digitized images |
AU2015207873A1 (en) * | 2005-11-15 | 2015-08-20 | Bernadette Garner | Method for training neural networks |
CN107463989A (en) * | 2017-07-25 | 2017-12-12 | 福建帝视信息科技有限公司 | A kind of image based on deep learning goes compression artefacts method |
CN108509961A (en) * | 2017-02-27 | 2018-09-07 | 北京旷视科技有限公司 | Image processing method and device |
- 2018-10-18: CN application CN201811212227.7A, patent CN109377532B, status Active
Non-Patent Citations (1)
Title |
---|
Image super-resolution reconstruction based on deep feature learning; Hu Changsheng et al.; Acta Automatica Sinica; 2017-05-31; pp. 814-820 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109377532B (en) | Image processing method and device based on neural network | |
US20220138576A1 (en) | Neural network method and apparatus | |
US10891541B2 (en) | Devices, systems, and methods for feature encoding | |
CN111523668B (en) | Training method and device of data generation system based on differential privacy | |
CN112396645B (en) | Monocular image depth estimation method and system based on convolution residual learning | |
US20210326710A1 (en) | Neural network model compression | |
Chang | Neural reversible steganography with long short-term memory | |
CN114511576B (en) | Image segmentation method and system of scale self-adaptive feature enhanced deep neural network | |
CN111178039B (en) | Model training method and device, and text processing method and device | |
US20220164995A1 (en) | A method, an apparatus and a computer program product for video encoding and video decoding | |
CN115775350A (en) | Image enhancement method and device and computing equipment | |
Zhou et al. | MSAR‐DefogNet: Lightweight cloud removal network for high resolution remote sensing images based on multi scale convolution | |
Xia et al. | Combination of multi‐scale and residual learning in deep CNN for image denoising | |
CN110633735A (en) | Progressive depth convolution network image identification method and device based on wavelet transformation | |
US11948090B2 (en) | Method and apparatus for video coding | |
CN112802076A (en) | Reflection image generation model and training method of reflection removal model | |
CN116993987A (en) | Image semantic segmentation method and system based on lightweight neural network model | |
CN113554047A (en) | Training method of image processing model, image processing method and corresponding device | |
CN116363663A (en) | Image processing method, image recognition method and device | |
CN115620342A (en) | Cross-modal pedestrian re-identification method, system and computer | |
CN115408494A (en) | Text matching method integrating multi-head attention alignment | |
CN113989152A (en) | Image enhancement method, device, equipment and storage medium | |
CN116721315B (en) | Living body detection model training method, living body detection model training device, medium and electronic equipment | |
CN112329925B (en) | Model generation method, feature extraction method, device and electronic equipment | |
CN117911908B (en) | Enhancement processing method and system for aerial image of unmanned aerial vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |