CN111046893B - Image similarity determining method and device, image processing method and device - Google Patents


Info

Publication number
CN111046893B
CN111046893B (granted publication of application CN201811189157.8A)
Authority
CN
China
Prior art keywords
image data
similarity
loss function
channel
reconstructed image
Prior art date
Legal status
Active
Application number
CN201811189157.8A
Other languages
Chinese (zh)
Other versions
CN111046893A (en)
Inventor
周静 (Zhou Jing)
谭志明 (Tan Zhiming)
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority claimed from application CN201811189157.8A
Publication of CN111046893A
Application granted
Publication of CN111046893B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 — Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 — Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on distances to training or reference patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides an image similarity determining method and device, and an image processing method and device. The image similarity determining method comprises the following steps: generating two-channel gray-scale image data, wherein the two-channel gray-scale image data comprise gray-scale original image data in a first channel and gray-scale reconstructed image data in a second channel; calculating a first similarity of the two-channel gray-scale image data and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity; calculating a pixel-level second similarity of the original image data and the reconstructed image data and determining a second loss function according to the second similarity; and determining, from the first loss function and the second loss function, a loss function representing the similarity of the original image data and the reconstructed image data.

Description

Image similarity determining method and device, image processing method and device
Technical Field
The present invention relates to the field of image processing, and in particular, to a method and apparatus for determining image similarity, and a method and apparatus for processing images.
Background
In recent years, with the development of artificial neural networks, the field of image compression has gradually moved from hand-crafted linear transforms (for example, JPEG, from the Joint Photographic Experts Group) to artificial neural networks: the analysis transform is replaced by a learned encoder function and the synthesis transform by a learned decoder function, as in Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE). Recent research indicates that texture information of an image can be effectively captured by the feature maps of a deep convolutional network.
A GAN is used to generate images and comprises two networks: a generator network G and a discriminator network D. The generator G mainly learns the real image distribution so that the images it generates look real enough to fool the discriminator, while the discriminator D must judge whether a received image is real or generated. During training, the generator takes a noise variable z as input and outputs generated image data g(z; θ_g); the discriminator takes the original image and the generated image data g(z; θ_g) as input and outputs a confidence d(x; θ_d). The generator and the discriminator compete continuously until the two networks reach a dynamic balance: the images produced by the generator approach the real image distribution, and the discriminator can no longer distinguish real from fake images.
Conventional image quality evaluation methods can be classified as objective or subjective. During training, conventional image compression algorithms minimize a pixel-level loss metric (an objective evaluation, such as mean squared error, MSE), which yields a good peak signal-to-noise ratio; but because high-frequency components are lost (blurred), the image is perceptually unrealistic. To evaluate image quality more accurately, the perceptual quality of the image can be used as an additional loss metric (a subjective evaluation).
It should be noted that the foregoing description of the background art is only for the purpose of providing a clear and complete description of the technical solution of the present invention and is presented for the convenience of understanding by those skilled in the art. The above-described solutions are not considered to be known to the person skilled in the art simply because they are set forth in the background of the invention section.
Disclosure of Invention
In the existing subjective evaluation method, when the perceptual loss metric of an image is calculated, the original image and the reconstructed image are taken as the two inputs of a twin (Siamese) neural network; the feature vector of the original image and the feature vector of the reconstructed image are extracted separately by the twin network, a similarity function of the two feature vectors is computed (for example, a distance metric between the feature vectors), and this similarity function is taken as the perceptual loss metric of the image.
The inventors found that, in this method, the twin neural network performs feature extraction on the original image and on the reconstructed image independently, so computing the similarity function takes a long time. Moreover, because the features are extracted independently, the correlation between the original image and the reconstructed image is not captured; the semantic similarity between pixels at corresponding positions of the two images therefore cannot be reflected directly, and the computed similarity is not accurate enough.
Embodiments of the invention provide an image similarity determining method and device, and an image processing method and device, that solve the above problems in the prior art: the parameters of a neural network for image compression can be trained according to the determined loss function, so that the similarity between the original image data and the reconstructed image data increases and the quality of the reconstructed (compressed) image improves.
According to a first aspect of an embodiment of the present invention, there is provided an image similarity determination apparatus, wherein the apparatus includes:
a generation unit for generating two-channel gray image data, wherein the two-channel gray image data includes gray original image data of a first channel and gray reconstructed image data of a second channel;
a first calculation unit for calculating a first similarity of the two-channel gray image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
a second calculation unit for calculating a pixel-level second similarity of the original image data and the reconstructed image data, determining a second loss function according to the second similarity;
a determining unit for determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function.
According to a second aspect of the embodiments of the present invention, there is provided an image processing apparatus, wherein the apparatus includes:
an encoder for converting input original image data into feature vectors;
a decoder for converting the feature vector into reconstructed image data;
the image similarity determination device of the first aspect, which is configured to calculate a loss function of the original image data and the reconstructed image data;
and the processing unit is used for training the parameters of the encoder according to the loss function, generating new reconstructed image data by the encoder and the decoder according to the trained parameters, and recalculating the loss function of the new reconstructed image data and the original image data by the image similarity determining device until the recalculated loss function indicates that the similarity of the original image data and the new reconstructed image data is larger than or equal to a first threshold value.
According to a third aspect of the embodiment of the present invention, there is provided an image similarity determining method, wherein the method includes:
generating two-channel gray-scale image data, wherein the two-channel gray-scale image data comprise gray-scale original image data of a first channel and gray-scale reconstructed image data of a second channel;
calculating a first similarity of the two-channel gray image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
calculating a pixel level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
a loss function representing a similarity of the original image data and the reconstructed image data is determined from the first loss function and the second loss function.
The method has the advantage that the original image and the reconstructed image are treated as one two-channel image, the perceptual-level loss function of the two images is calculated from that two-channel image, and the final loss function is determined from the perceptual-level loss function together with the pixel-level loss function. This determination method saves computation time and improves the accuracy of the similarity calculation, solving the problems in the prior art; training the parameters of a neural network for image compression with this loss function raises the similarity between the original image data and the reconstructed image data, i.e., improves the quality of the reconstructed (compressed) image.
Specific embodiments of the invention are disclosed in detail below with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not limited in scope thereby. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
Many aspects of the invention can be better understood with reference to the following drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Corresponding parts in the drawings may be exaggerated or reduced in order to facilitate the illustration and description of some parts of the present invention. Elements and features described in one drawing or embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. Furthermore, in the drawings, like reference numerals designate corresponding parts throughout the several views, and may be used to designate corresponding parts as used in more than one embodiment.
In the drawings:
fig. 1 is a flowchart of an image similarity determination method in the present embodiment 1;
fig. 2 is a schematic diagram of the convolutional neural network structure in this embodiment 1;
FIG. 3 is a diagram showing an architecture of an image processing system in the embodiment 2;
fig. 4 is a flowchart of an image processing method in the present embodiment 2;
fig. 5 is a schematic diagram of a convolutional neural network corresponding to the decoder in embodiment 2;
fig. 6 is a schematic diagram of an image similarity determination apparatus in the present embodiment 3;
fig. 7 is a schematic diagram of an image processing apparatus in this embodiment 4.
Fig. 8 is a schematic diagram showing the hardware configuration of the electronic device in embodiment 5.
Detailed Description
The foregoing and other features of embodiments of the invention will be apparent from the following description, taken in conjunction with the accompanying drawings. These embodiments are merely illustrative and not limiting of the invention. In order to enable those skilled in the art to easily understand the principles and embodiments of the present invention, the embodiment of the present invention is described by taking a reconstructed image processed by image compression as an example, but it is understood that the embodiment of the present invention is not limited thereto, and reconstructed images based on other image processing are also included in the scope of the present invention.
The following describes specific embodiments of the present invention with reference to the drawings.
Example 1
The present embodiment 1 provides an image similarity determining method, fig. 1 is a flowchart of the method, and as shown in fig. 1, the method includes:
step 101, generating two-channel gray-scale image data, wherein the two-channel gray-scale image data comprise gray-scale original image data of a first channel and gray-scale reconstructed image data of a second channel;
step 102, calculating a first similarity of the two-channel gray image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
step 103, calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
step 104, determining a loss function representing the similarity of the original image data and the reconstructed image data according to the first loss function and the second loss function.
In order to better illustrate the above steps 101-104, a part of the features will be explained below.
In this embodiment, the loss function may also be called an evaluation function; it evaluates the degree of inconsistency (or consistency) between the reconstructed image and the original image and serves as the objective function optimized by the neural network. Training or optimizing the neural network is the process of minimizing this loss function: the smaller the loss function, the higher the similarity between the reconstructed image and the original image.
In the field of image processing, image compression, for example, can be regarded as a compromise between the code rate and the compression distortion, where the distortion can be regarded as the similarity (represented by the loss function) between the reconstructed (compressed) image and the original image: the higher the similarity, the higher the quality of the reconstructed image. When evaluating the quality of the reconstructed image, determining the loss function from both a subjective measure and an objective measure makes the evaluation more accurate and raises the confidence in the reconstructed image.
On the one hand, the subjective measure, i.e., the perceptual-level first loss function of this embodiment, can reflect the content loss and the style loss of the reconstructed image; that is, it can reflect the semantic (perceptual) similarity between the reconstructed image and the original image.
How the perceptual-level first loss function is calculated is described below in connection with steps 101-102.
In this embodiment, in step 101, the gray-scale original image and the gray-scale reconstructed image are combined to generate one two-channel gray-scale image: the gray-scale original image data and the gray-scale reconstructed image data are treated as the data of the two channels of that single image. The original image and the reconstructed image have the same size (X×Y, with X the number of pixels in one direction of the image and Y the number of pixels in the other).
For example, the gray-scale original image data can be represented as a two-dimensional single-channel matrix V1 whose element a_{y,x} is the gray value of the pixel at position (y, x), and the gray-scale reconstructed image data as a two-dimensional single-channel matrix V2 whose element b_{y,x} is the gray value of the pixel at position (y, x):

V1 = [a_{y,x}]_{Y×X},  V2 = [b_{y,x}]_{Y×X}

In this embodiment, the matrices V1 and V2 are combined to generate a two-channel gray-scale image, which can be expressed as a two-dimensional two-channel matrix V3:

V3 = [(a_{y,x}, b_{y,x})]_{Y×X}
in this embodiment, before step 101, the method may further include (optional, not illustrated):
step 100, converting the original image data into gray-scale original image data and the reconstructed image data into gray-scale reconstructed image data. The gray-scale conversion in step 100 may follow the prior art, for example a weighted-average method in which the three color components of each pixel are weighted and summed to obtain the gray value of that pixel; the details are not repeated here.
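A minimal sketch of the weighted-average gray-scale conversion of step 100. The ITU-R BT.601 luma weights used here are a common choice but are an assumption, since the embodiment does not fix particular weights:

```python
# Sketch of step 100's weighted-average gray-scale conversion. The weights
# (0.299, 0.587, 0.114) are the common BT.601 luma coefficients; the patent
# text does not prescribe specific weights, so these are an assumption.

def rgb_to_gray(pixel, weights=(0.299, 0.587, 0.114)):
    """Weighted sum of the three color components of one pixel."""
    r, g, b = pixel
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b

white_gray = rgb_to_gray((255, 255, 255))  # pure white stays at full scale
```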
In step 102, a convolutional neural network is used to calculate a first similarity of the two-channel gray image data, and fig. 2 is a schematic diagram of the convolutional neural network structure, and how to calculate the first similarity is described below with reference to fig. 2.
As shown in fig. 2, the convolutional neural network comprises, in its lower layers, a plurality of convolutional layers, rectified linear units (ReLU) and max-pooling layers, and, in its upper layer, a fully connected layer. The two-channel gray-scale image data generated in step 101, i.e., the matrix V3, is taken as the input of the network. After the first convolutional layer, the gray-scale original image data and the gray-scale reconstructed image data undergo a correlated weighted combination and mapping; that is, the two channels become related (associated) from the first convolution onward. After the lower-layer processing and the upper fully connected layer, the number of output neurons is 1, and the output indicates the first similarity S1, which may for example be a score between 0 and 1. The specific similarity algorithm for the two-channel image data and the network parameters (for example, a 3×3 convolution kernel) may follow the prior art and are not limited in this embodiment.
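A toy illustration (an assumption, not the actual network of fig. 2) of why the first convolution couples the two channels: each output value is a weighted combination of a neighborhood drawn from both the original and the reconstructed channel.

```python
# Hypothetical sketch: a single-output-channel "valid" 3x3 convolution over
# a Y-by-X-by-2 input. The inner loop sums over BOTH channels, so every
# output value already mixes gray-scale original and reconstructed data,
# as described for the first layer of the network in fig. 2.

def conv2d_two_channel(v3, kernel):
    """kernel is 3x3x2; returns a (Y-2)-by-(X-2) feature map."""
    h, w = len(v3), len(v3[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            s = 0.0
            for dy in range(3):
                for dx in range(3):
                    for c in range(2):  # sum over both channels
                        s += kernel[dy][dx][c] * v3[y + dy][x + dx][c]
            row.append(s)
        out.append(row)
    return out

# 4x4 two-channel input where both channels are all ones
v3 = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]
# averaging kernel over 3 * 3 * 2 = 18 input values
kernel = [[[1.0 / 18.0] * 2 for _ in range(3)] for _ in range(3)]
feature_map = conv2d_two_channel(v3, kernel)
```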
In this embodiment, in step 102, a perceptual-level loss function (the first loss function) is determined from the first similarity, and the first loss function is inversely proportional to the first similarity: the higher the first similarity, the smaller the first loss function, and the lower the first similarity, the larger the first loss function. When this loss function is used as the objective function for optimizing a neural network (for example, a GAN), a loss function that meets a predetermined condition indicates that the similarity between the obtained reconstructed image and the original image is greater than or equal to a first threshold.
In this embodiment, because the first similarity is calculated from the two-channel image data, the perceptual-level first loss function derived from it reflects the semantic (perceptual) correlation and similarity of the original and reconstructed images more directly; the perceptual similarity is computed in less time and with higher accuracy.
In one embodiment, the first loss function perceptual_loss is the negative of the logarithm of the first similarity, as in the following formula 1):

perceptual_loss = -log S1    (formula 1)

In this embodiment, the first similarity S1 may for example be a score between 0 and 1. The closer S1 is to 1 (the higher the similarity), the closer perceptual_loss is to 0: the original image and the reconstructed image are highly similar and the distortion of the reconstructed image is low. Conversely, the closer S1 is to 0, the larger perceptual_loss becomes: the similarity between the original image and the reconstructed image is low and the distortion of the reconstructed image is high.
In another embodiment, the first loss function is taken as the reciprocal of the first similarity, which is likewise inversely proportional to it: the higher the first similarity, the smaller the first loss function, and the lower the first similarity, the larger the first loss function.
The above two embodiments are merely two examples, and the present embodiment is not limited thereto, and various implementations in which the first loss function is inversely proportional to the first similarity are within the scope of the present embodiment.
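Formula 1) can be sketched directly; the behavior at the two extremes of S1 matches the description above:

```python
import math

# Sketch of formula 1): perceptual_loss = -log(S1), with S1 in (0, 1].
# As S1 approaches 1 the loss approaches 0; as S1 approaches 0 the loss
# grows without bound, penalizing perceptually dissimilar reconstructions.

def perceptual_loss(s1):
    return -math.log(s1)

low_distortion = perceptual_loss(0.99)   # near-identical images, small loss
high_distortion = perceptual_loss(0.01)  # dissimilar images, large loss
```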
On the other hand, the objective measure, i.e., the pixel-level second loss function of this embodiment, reflects the difference in pixel values between the original image and the reconstructed image, but cannot reflect the visual perception characteristics of the human eye.
How to calculate the pixel level second loss function is described below in connection with step 103.
In step 103, the pixel-level second similarity is, for example, the mean squared error or the multi-level (multi-scale) structural similarity, and this pixel-level second similarity is used as the second loss function.
For example, the mean squared error (MSE) of the original image data and the reconstructed image data is calculated and used as the second loss function pixel_loss, i.e., the mean of the squared Euclidean distances between the pixel values at each corresponding position of the original image data and the reconstructed image data, as in the following formula 2):

pixel_loss = (1/N) Σ_i ||x_i − x̂_i||²    (formula 2)

where N is the number of pixels, x_i is the pixel value of the reconstructed image at position i, and x̂_i is the pixel value of the original image at that position.
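A minimal sketch of formula 2), the mean squared error over corresponding pixel positions:

```python
# Sketch of formula 2): the pixel-level loss as the mean squared error over
# corresponding pixel positions of the original and reconstructed images.

def mse(original, reconstructed):
    total, n = 0.0, 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            total += (a - b) ** 2
            n += 1
    return total / n

# per-pixel differences -2, 2, 1, -4 -> (4 + 4 + 1 + 16) / 4
loss = mse([[10, 20], [30, 40]], [[12, 18], [29, 44]])
```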
for example, the multi-level structural similarity of the original image data and the reconstructed image data is calculated, specifically as shown in the following formula 3): the multi-level structural similarity Reference is made to the prior art for specific calculation procedures, and no further description is given here.
Where x represents the RGB values of the reconstructed image data,RGB values representing raw image data, wherein A larger value indicates a higher degree of similarity, i.e., a smaller distortion of the reconstructed image, whereas a lower degree of similarity indicates a larger distortion of the reconstructed image.
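For illustration only, a greatly simplified single-scale structural-similarity sketch computed from global image statistics; the multi-level MS-SSIM of formula 3) additionally uses local windows and multiple scales, and the constants C1 and C2 below follow a common convention for 8-bit images and are assumptions here:

```python
# Simplified single-scale structural similarity from GLOBAL image statistics.
# This is a sketch only: the embodiment's multi-level MS-SSIM additionally
# uses sliding windows and several scales (per the prior art it references).
# C1 and C2 use the common (0.01*L)^2 and (0.03*L)^2 choice for 8-bit images.

def ssim_global(x, y, dynamic_range=255):
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

identical = ssim_global([[10, 20], [30, 40]], [[10, 20], [30, 40]])
```

As the description says, a larger value indicates higher similarity: identical images score 1.0, and very different images score near 0.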
The above two examples are merely illustrative; the present embodiment is not limited to them, and any existing objective measure in image evaluation, i.e., any pixel-level similarity implementation, falls within the protection scope of this embodiment.
In the present embodiment, the execution order of steps 101 to 102 and step 103 is not limited.
In the present embodiment, in step 104, a first product of the first loss function and a first weight λ1 and a second product of the second loss function and a second weight λ2 are calculated; the sum of the first product and the second product is determined as the loss function, as in the following formula 4):

loss = λ1 · perceptual_loss + λ2 · pixel_loss    (formula 4)
in the present embodiment, the first weight λ 1 And a second weight lambda 2 Can be determined according to the requirement, lambda 1 And lambda is 2 The sum of (2) is 1, and this is not a limitation.
In the present embodiment, the loss function combines the objective pixel-level measure and the subjective perceptual-level measure; using this loss function as the optimization target of a neural network (such as a GAN) further improves the confidence in the reconstructed image.
Therefore, the original image and the reconstructed image are treated as one two-channel image, the perceptual-level loss function of the two images is calculated from that two-channel image, and the final loss function is determined from the perceptual-level loss function together with the pixel-level loss function. This determination method saves computation time, improves the accuracy of the similarity calculation, solves the problems in the prior art, and further improves the confidence in the reconstructed image.
Example 2
This embodiment provides a GAN-based image processing method in which an encoder is added before the GAN-based generator (the decoder): the original image passes through the encoder to produce a feature vector, which serves as the input of the decoder to obtain the reconstructed image.
FIG. 3 is a schematic diagram of the GAN-based image processing (compression) system architecture. As shown in FIG. 3, the GAN is trained first and its generator is used to initialize the decoder of the compression system. The original image x passes through the encoder f_θ to obtain the feature vector z, and the decoder g_φ produces a reconstructed image x̂ from z. The loss function of the original image data and the reconstructed image data is calculated to train the network parameters of the encoder: the parameters θ are updated, new reconstructed image data are generated, and the loss function is recalculated; this process repeats until the optimized loss function meets the predetermined condition, i.e., the similarity between the original image data and the reconstructed image data is greater than or equal to the first threshold — in other words, the reconstructed image is of high quality.
This embodiment 2 provides an image processing method that trains parameters of a neural network for image compression using the loss function in embodiment 1 so that the similarity between the original image data and the reconstructed image data becomes high, i.e., the quality of a reconstructed image (compressed image) is improved.
Fig. 4 is a flowchart of the image processing method, as shown in fig. 4, the method includes:
step 401, converting input original image data into feature vectors by using an encoder;
step 402, converting the feature vector into reconstructed image data by a decoder;
step 403, calculating a loss function of the original image data and the reconstructed image data;
step 404, training parameters of the encoder according to the loss function, generating new reconstructed image data, and recalculating the loss function of the new reconstructed image data and the original image data until the recalculated loss function indicates that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
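A toy sketch of the loop of steps 401-404, using a hypothetical one-parameter encoder and a plain MSE loss for brevity (the embodiment's encoder and decoder are convolutional networks and its loss is the combined loss of embodiment 1):

```python
# Toy sketch of steps 401-404. The one-parameter "encoder" and plain MSE
# loss are simplifications for illustration; in the embodiment the encoder
# and decoder are convolutional networks and the loss is the combined
# perceptual + pixel loss of embodiment 1. Decoder parameters stay fixed.

original = [1.0, 2.0, 3.0]

def encode(x, theta):            # step 401: original -> feature vector
    return [theta * v for v in x]

def decode(z):                   # step 402: feature vector -> reconstruction
    return list(z)

def loss_fn(x, x_hat):           # step 403: simplified to plain MSE here
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

theta, lr, threshold = 0.2, 0.05, 1e-6
for _ in range(1000):            # step 404: retrain encoder, recompute loss
    recon = decode(encode(original, theta))
    if loss_fn(original, recon) < threshold:
        break                    # similarity condition reached
    # analytic gradient of the toy MSE loss with respect to theta
    grad = sum(2 * (theta * v - v) * v for v in original) / len(original)
    theta -= lr * grad
```

Because the decoder here is the identity, the loop simply drives theta toward 1, at which point the reconstruction matches the original and the stopping condition of step 404 is met.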
In this embodiment, the decoder of step 402 may be implemented with a convolutional neural network. Fig. 5 is a schematic diagram of the convolutional network corresponding to the decoder: the feature vector is taken as input and passes through four transposed-convolution (deconvolution) layers (CONV1, CONV2, CONV3, CONV4) to obtain the reconstructed image. The network structure may follow the deep convolutional generative adversarial network (DCGAN) of the prior art, with the decoder corresponding to the DCGAN generator; "stride 2" in fig. 5 is a convolutional-layer parameter indicating the stride and is not described further here.
In this embodiment, the decoder may be pre-trained by the prior-art method using the adversarial loss function associated with the discriminator network to obtain the decoder parameters; for example, the adversarial loss function of formula 5) may be used to train the decoder parameters:

L(D, G) = E_x[log D(x)] + E_z[log(1 − D(g(z)))]    (formula 5)

Formula 5) consists of two terms: x denotes the original image, z denotes the noise input to the generator network G (the decoder), g(z) denotes an image generated by G, D(x) denotes the probability with which the discriminator network D judges the original image to be real, and D(g(z)) denotes the probability with which D judges the reconstructed image g(z) to be real.
Equation 5) can be read as follows. The neural-network training (decoder parameter training) process minimizes the loss, so the generation network G wants D(G(z)) to be as large as possible (the larger it is, the more realistic the generated image and the closer it is to the original); this makes L(D, G) smaller, hence min_G. The discrimination network D, in turn, wants D(x) to be large and D(G(z)) to be small (the more confident D is that the original image is real and the reconstructed image is not); this makes L(D, G) larger, hence max_D. During training, the discrimination network D may be updated with a stochastic gradient method: since a larger L(D, G) is desired, the gradient is added (gradient ascent) to update D (see equation 6)). In the second step the generation network G is trained: since a smaller L(D, G) is desired, the gradient is subtracted (gradient descent) to update G (see equation 7)). The two steps alternate throughout training. Equations 5) to 7) belong to the prior art, and the training process of the decoder may likewise refer to the prior art; it is not described in detail here. After this training process is completed, the parameters of the decoder are kept unchanged.
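The alternating min/max training described above can be made concrete with a toy computation of the adversarial value function. This is an illustration only, not the patent's implementation: the discriminator outputs below are assumed probabilities, and the expectations are replaced by batch means.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Batch estimate of L(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (D(x) high, D(G(z)) low) makes L(D, G) large,
# which is what max_D drives toward ...
strong_d = gan_value(d_real=np.array([0.9, 0.95]), d_fake=np.array([0.1, 0.05]))

# ... while a generator that fools D (D(G(z)) high) makes L(D, G) small,
# which is what min_G drives toward.
fooled_d = gan_value(d_real=np.array([0.9, 0.95]), d_fake=np.array([0.8, 0.9]))
```

So D is updated by gradient ascent on L(D, G) (equation 6)) and G by gradient descent (equation 7)), alternately.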
In this embodiment, the encoder in step 401 may also be implemented using a convolutional neural network, one that mirrors the network of fig. 5: the feature vector of the original image is extracted by forward convolution. The specific structure may refer to any prior-art convolutional neural network used to extract image feature vectors; this embodiment does not limit it. The initial parameters of the encoder are set similarly to the parameters of the discrimination network D described above, that is, the encoder parameters are initialized from the parameters of the discriminator associated with the decoder. In steps 403-404, the encoder parameter θ is optimized so that the loss function in step 403 satisfies a predetermined condition, indicating that the similarity between the reconstructed image and the original image has become high and the quality of the reconstructed image is higher.
In this embodiment, the loss function calculated in step 403 is the loss function calculated in step 104 of embodiment 1; the specific calculation is shown in equation 4) and is not repeated here.
In this embodiment, step 404 determines whether the loss function still has room for optimization. If so, the parameter θ of the encoder is retrained, steps 401-403 are executed again, a new reconstructed image is obtained, and a new loss function is calculated; training stops when the loss function obtained in step 403 satisfies a predetermined condition. At that point the similarity between the reconstructed image and the original image is greater than or equal to a first threshold. For example, the predetermined condition may be that the loss function is less than or equal to a second threshold; the smaller the loss function, the higher the quality of the reconstructed image. The reconstructed image corresponding to the loss function that satisfies the predetermined condition is obtained in step 404 and taken as the image processing (compression) result. The first threshold and the second threshold may be determined as needed; this embodiment is not limited thereto.
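The loop of steps 401-404 — optimize only the encoder while the decoder stays frozen, until the loss drops below the second threshold — can be sketched with a deliberately simplified toy model. Everything concrete here (the linear encoder/decoder, the dimensions, the learning rate, using only the pixel-level loss term) is an assumption for illustration; the patent uses convolutional networks and the combined loss of embodiment 1.

```python
import numpy as np

rng = np.random.default_rng(0)
W_dec, _ = np.linalg.qr(rng.normal(size=(8, 4)))  # frozen "decoder" (orthonormal columns)
x = W_dec @ rng.normal(size=4)                    # toy "original image" in the decoder's range
x /= np.linalg.norm(x)                            # normalised so the step size below is stable
theta = np.zeros((4, 8))                          # encoder parameters to be trained

lr, second_threshold = 0.05, 1e-3
for step in range(5000):
    z = theta @ x                         # step 401: encode into a feature vector
    x_rec = W_dec @ z                     # step 402: decode into a reconstruction
    residual = x - x_rec
    loss = np.mean(residual ** 2)         # step 403: pixel-level loss term only
    if loss <= second_threshold:          # step 404: predetermined condition met
        break
    grad = -2.0 / x.size * np.outer(W_dec.T @ residual, x)
    theta -= lr * grad                    # only the encoder is updated; W_dec is untouched
```

With these settings the loss decays geometrically, so the loop stops after a few hundred iterations.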
With the above embodiment, the parameters of the neural network used for image compression can be trained according to the loss function of embodiment 1, so that the similarity between the original image data and the reconstructed image data is increased, improving the quality of the reconstructed (compressed) image.
Example 3
This embodiment 3 also provides an image similarity determination apparatus. Since the principle by which this apparatus solves the problem is similar to that of embodiment 1, its specific implementation may refer to that of embodiment 1 and is not repeated here.
Fig. 6 is a schematic diagram of the image similarity determination apparatus, and as shown in fig. 6, the apparatus 600 includes:
a generating unit 601 for generating two-channel gray-scale image data, wherein the two-channel gray-scale image data includes gray-scale raw image data of a first channel and gray-scale reconstructed image data of a second channel;
a first calculating unit 602, configured to calculate a first similarity of the two-channel gray image data, and determine a first loss function according to the first similarity, where the first loss function is inversely proportional to the first similarity;
a second calculation unit 603 for calculating a pixel-level second similarity of the original image data and the reconstructed image data, determining a second loss function according to the second similarity;
a determining unit 604 for determining a loss function representing a similarity of the original image data and the reconstructed image data based on the first loss function and the second loss function.
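The generating unit's job — stacking the grayscale original image and the grayscale reconstruction into one two-channel array — can be sketched as follows. The BT.601 luminance weights used for the grayscale conversion are an assumption; the patent only requires that both images be converted to grayscale first.

```python
import numpy as np

def to_gray(rgb):
    """RGB (H x W x 3) to grayscale via assumed BT.601 luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def make_two_channel(original_rgb, reconstructed_rgb):
    """Channel 0: grayscale original; channel 1: grayscale reconstruction."""
    return np.stack([to_gray(original_rgb), to_gray(reconstructed_rgb)], axis=-1)

pair = make_two_channel(np.ones((32, 32, 3)), np.zeros((32, 32, 3)))  # H x W x 2
```

The resulting H x W x 2 array is what the first calculating unit feeds to its convolutional neural network.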
In this embodiment, the implementation manners of the generating unit 601, the first calculating unit 602, the second calculating unit 603, and the determining unit 604 may refer to steps 101 to 104 in embodiment 1, and are not described herein again.
In this embodiment, the first calculating unit 602 calculates the first similarity of the two-channel grayscale image data using a convolutional neural network, where the first similarity is greater than or equal to 0 and less than or equal to 1, and the first loss function is equal to the negative of the logarithm of the first similarity; for the specific implementation, reference may be made to embodiment 1.
In this embodiment, the pixel-level second similarity is a mean square error or a multi-level (multi-scale) structural similarity; reference may be made to embodiment 1 for the specific implementation.
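Of the two pixel-level options named here, the mean square error is the simpler; a minimal sketch follows (the multi-level structural similarity, the other option, is omitted for brevity):

```python
import numpy as np

def mse_loss(original, reconstructed):
    """Pixel-level second loss: mean squared error over all pixels."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.mean((original - reconstructed) ** 2)
```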
In the present embodiment, the determination unit 604 includes:
a product module (not shown) for calculating a first product of the first loss function and the first weight, and a second product of the second loss function and the second weight;
an addition module (not shown) for calculating a sum of the first product and the second product, the sum being determined as the loss function.
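The product and addition modules amount to a two-term weighted sum. The sketch below combines them with the negative-log first loss described above; the equal weights are an assumption (the patent leaves the weights free).

```python
import numpy as np

def combined_loss(first_similarity, second_loss, w1=0.5, w2=0.5):
    """Final loss: w1 * first loss + w2 * second loss, where the first loss
    is the negative log of the perception-level similarity in (0, 1]."""
    first_loss = -np.log(first_similarity)       # inversely related to similarity
    return w1 * first_loss + w2 * second_loss    # product modules, then addition
```

Higher perceptual similarity gives a smaller combined loss, as the determining unit intends.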
In this embodiment, the apparatus may optionally further include:
a conversion unit (not shown) for converting the original image data into grayscale original image data and converting the reconstructed image data into grayscale reconstructed image data.
In this way, the original image and the reconstructed image are treated as one two-channel image, the perception-level loss function of the two images is calculated from that two-channel image, and the final loss function is determined from the perception-level loss function and the pixel-level loss function. This determination method saves calculation time and improves the calculation precision of the similarity, solving the problems of the prior art, and can further improve the confidence of the reconstructed image.
Example 4
This embodiment 4 also provides an image processing apparatus. Since the principle by which this apparatus solves the problem is similar to that of embodiment 2, its specific implementation may refer to that of embodiment 2 and is not repeated here.
Fig. 7 is a schematic view of the image processing apparatus, and as shown in fig. 7, the apparatus 700 includes:
an encoder 701 for converting input original image data into feature vectors;
a decoder 702 for converting the feature vector into reconstructed image data;
image similarity determination means 703 for calculating a loss function of the original image data and the reconstructed image data;
a processing unit 704, configured to train parameters of the encoder 701 according to the loss function, generate new reconstructed image data according to the trained parameters by the encoder 701 and the decoder 702, and recalculate the loss function of the new reconstructed image data and the original image data by the image similarity determining device 703 until the recalculated loss function indicates that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
In this embodiment, the specific implementation of the encoder 701 and the decoder 702 may refer to embodiment 2, the implementation of the image similarity determining device 703 may refer to the image similarity determining device 600 in embodiment 3, and the specific implementation of the processing unit 704 may refer to embodiment 2, which is not described herein.
With the above embodiment, the parameters of the neural network used for image compression can be trained according to the loss function of embodiment 1, so that the similarity between the original image data and the reconstructed image data is increased, improving the quality of the reconstructed (compressed) image.
Example 5
An embodiment of the present invention also provides an electronic device including the image similarity determination apparatus described in embodiment 3 or including the image processing apparatus described in embodiment 4, the contents of which are incorporated herein. The electronic device may be, for example, a computer, server, workstation, laptop, smart phone, etc.; embodiments of the invention are not so limited.
Fig. 8 is a schematic diagram of hardware configuration of an electronic device according to an embodiment of the invention. As shown in fig. 8, an electronic device 800 may include: a processor (e.g., a central processing unit, CPU) 810 and a memory 820; the memory 820 is coupled to the central processor 810. Wherein the memory 820 may store various data; further, a program of information processing is stored and executed under the control of the processor 810.
In one embodiment, the functionality of the image similarity determination device 600 or the image processing device 700 may be integrated into the processor 810. Wherein the processor 810 may be configured to implement the image similarity determination method described in embodiment 1 or the image processing method described in embodiment 2.
In another embodiment, the image similarity determining apparatus 600 or the image processing apparatus 700 may be configured separately from the processor 810, for example, the image similarity determining apparatus 600 or the image processing apparatus 700 may be configured as a chip connected to the processor 810, and functions of the image similarity determining apparatus 600 or the image processing apparatus 700 may be implemented by control of the processor 810.
For example, the processor 810 may be configured to control: generating two-channel gray scale image data, wherein the two-channel gray scale image data comprises gray scale original image data of a first channel and gray scale reconstruction image data of a second channel; calculating a first similarity of the two-channel gray image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity; calculating a pixel level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity; a loss function representing a similarity of the original image data and the reconstructed image data is determined from the first loss function and the second loss function.
Alternatively, for example, the processor 810 may be configured to control: converting input original image data into a feature vector by an encoder; converting the feature vector into reconstructed image data by a decoder; calculating a loss function of the original image data and the reconstructed image data using the method of embodiment 1; and training parameters of the encoder according to the loss function, generating new reconstructed image data by the encoder and the decoder according to the trained parameters, and recalculating the loss function of the new reconstructed image data and the original image data until the recalculated loss function indicates that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
The specific implementation of the processor 810 may refer to embodiment 1 or 2, and will not be described herein.
Further, as shown in fig. 8, the electronic device 800 may further include: a transceiver unit 830, etc.; wherein, the functions of the above components are similar to the prior art, and are not repeated here. It is noted that the electronic device 800 need not include all of the components shown in fig. 8; in addition, the electronic device 800 may further include components not shown in fig. 8, to which reference is made to the related art.
The embodiment of the present invention also provides a computer-readable program, wherein when the program is executed in an image similarity determination apparatus, the program causes a computer to execute the image similarity determination method as in embodiment 1 above in the image similarity determination apparatus.
The embodiment of the present invention also provides a storage medium storing a computer-readable program, wherein the computer-readable program causes a computer to execute the image similarity determination method in embodiment 1 above in an image similarity determination apparatus.
The embodiment of the present invention also provides a computer-readable program, wherein when the program is executed in an image processing apparatus, the program causes a computer to execute the image processing method as in embodiment 2 above in the image processing apparatus.
The embodiment of the present invention also provides a storage medium storing a computer-readable program, wherein the computer-readable program causes a computer to execute the image processing method in embodiment 2 above in an image processing apparatus.
The method of image similarity determination in an image similarity determination device described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIGS. 6-8 and/or one or more combinations of the functional blocks may correspond to software modules or hardware modules of a computer program flow. These software modules may correspond to the individual steps shown in figs. 1 and 4, respectively. These hardware modules may be implemented, for example, by implementing the software modules in a Field Programmable Gate Array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in a memory of the image similarity determination device or may be stored in a memory card that is insertable into the image similarity determination device.
One or more of the functional block diagrams and/or one or more combinations of functional block diagrams described with respect to figs. 6-8 may be implemented as a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for use in performing the functions described herein. One or more of the functional block diagrams and/or one or more combinations of functional block diagrams described with respect to figs. 6-8 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
While the invention has been described in connection with specific embodiments, it will be apparent to those skilled in the art that the description is intended to be illustrative and not limiting in scope. Various modifications and alterations of this invention will occur to those skilled in the art in light of the spirit and principles of this invention, and such modifications and alterations are also within the scope of this invention.
With respect to implementations including the above examples, the following supplementary notes are also disclosed.
Supplementary note 1, an image similarity determination apparatus, wherein the apparatus includes:
a generation unit configured to generate two-channel grayscale image data, wherein the two-channel grayscale image data includes grayscale raw image data of a first channel and grayscale reconstructed image data of a second channel;
a first calculation unit configured to calculate a first similarity of the two-channel grayscale image data, determine a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
a second calculation unit configured to calculate a pixel-level second similarity of the original image data and the reconstructed image data, and determine a second loss function according to the second similarity;
A determining unit for determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function.
Supplementary note 2 the apparatus according to supplementary note 1, wherein the first calculation unit calculates a first similarity of the two-channel gray scale image using a convolutional neural network, wherein the first similarity is 0 or more and 1 or less.
Supplementary note 3 the apparatus according to supplementary note 2, wherein the first loss function is equal to the negative of the logarithm of the first similarity.
Supplementary note 4 the apparatus of supplementary note 1, wherein the pixel level second similarity is a mean square error or a multi-level structural similarity.
Supplementary note 5 the apparatus according to supplementary note 1, wherein the determining unit includes:
a product module for calculating a first product of the first loss function and the first weight and a second product of the second loss function and the second weight;
an addition module for calculating a sum of the first product and the second product, the sum being determined as the loss function.
Supplementary note 6 the apparatus according to supplementary note 1, wherein the apparatus further comprises:
And the conversion unit is used for converting the original image data into gray original image data and converting the reconstructed image data into gray reconstructed image data.
Supplementary note 7, an image processing apparatus, wherein the apparatus includes:
an encoder for converting input original image data into feature vectors;
a decoder for converting the feature vector into reconstructed image data;
the image similarity determination device of any one of supplementary notes 1 to 6, for calculating a loss function of the original image data and the reconstructed image data;
and the processing unit is used for training the parameters of the encoder according to the loss function, generating new reconstructed image data by the encoder and the decoder according to the trained parameters, and recalculating the loss function of the new reconstructed image data and the original image data by the image similarity determining device until the recalculated loss function indicates that the similarity of the original image data and the new reconstructed image data is larger than or equal to a first threshold value.
Supplementary note 8, an image similarity determination method, wherein the method comprises:
Generating two-channel gray scale image data, wherein the two-channel gray scale image data comprises gray scale original image data of a first channel and gray scale reconstruction image data of a second channel;
calculating first similarity of the two-channel gray image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function.
Supplementary note 9 the method according to supplementary note 8, wherein a convolutional neural network is used to calculate a first similarity of the two-channel gray scale image, where the first similarity is greater than or equal to 0 and less than or equal to 1.
Supplementary note 10, the method according to supplementary note 9, wherein the first loss function is equal to the negative of the logarithm of the first similarity.
Supplementary note 11, the method according to supplementary note 8, wherein the pixel level second similarity is a mean square error or a multi-level structural similarity.
The method of supplementary note 12, according to supplementary note 8, wherein determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function includes:
calculating a first product of the first loss function and the first weight, and a second product of the second loss function and the second weight;
a sum of the first product and the second product is calculated, and the sum is determined as the loss function.
Supplementary note 13, the method according to supplementary note 8, wherein the method further comprises:
and converting the original image data into gray original image data, and converting the reconstructed image data into gray reconstructed image data.

Claims (6)

1. An image similarity determination apparatus, wherein the apparatus comprises:
a generation unit configured to generate two-channel grayscale image data, wherein the two-channel grayscale image data includes grayscale raw image data of a first channel and grayscale reconstructed image data of a second channel;
a first calculation unit configured to calculate a first similarity of the two-channel grayscale image data, determine a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
A second calculation unit for calculating a pixel-level second similarity of the original image data and the reconstructed image data, determining a second loss function according to the pixel-level second similarity;
a determining unit configured to determine a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function;
the first calculation unit calculates a first similarity of the two-channel gray image data by using a convolutional neural network, wherein the first similarity is more than or equal to 0 and less than or equal to 1;
the first loss function is equal to the negative of the logarithm of the first similarity.
2. The apparatus of claim 1, wherein the pixel-level second similarity is a mean square error or a multi-level structural similarity.
3. The apparatus of claim 1, wherein the determining unit comprises:
a product module for calculating a first product of the first loss function and the first weight and a second product of the second loss function and the second weight;
an addition module for calculating a sum of the first product and the second product, the sum being determined as the loss function.
4. The apparatus of claim 1, wherein the apparatus further comprises:
and a conversion unit for converting the original image data into the grayscale original image data and converting the reconstructed image data into the grayscale reconstructed image data.
5. An image processing apparatus, wherein the apparatus comprises:
an encoder for converting input original image data into feature vectors;
a decoder for converting the feature vector into reconstructed image data;
the image similarity determination device of any one of claims 1 to 4 for calculating a loss function of the raw image data and the reconstructed image data;
and the processing unit is used for training the parameters of the encoder according to the loss function, generating new reconstructed image data by the encoder and the decoder according to the trained parameters, and recalculating the loss function of the new reconstructed image data and the original image data by the image similarity determining device until the recalculated loss function indicates that the similarity of the original image data and the new reconstructed image data is larger than or equal to a first threshold value.
6. An image similarity determination method, wherein the method comprises:
generating two-channel gray scale image data, wherein the two-channel gray scale image data comprises gray scale original image data of a first channel and gray scale reconstruction image data of a second channel;
calculating first similarity of the two-channel gray image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the pixel-level second similarity;
determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function;
calculating the first similarity of the two-channel grayscale image data by using a convolutional neural network, wherein the first similarity is greater than or equal to 0 and less than or equal to 1; and the first loss function is equal to the negative of the logarithm of the first similarity.
CN201811189157.8A 2018-10-12 2018-10-12 Image similarity determining method and device, image processing method and device Active CN111046893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811189157.8A CN111046893B (en) 2018-10-12 2018-10-12 Image similarity determining method and device, image processing method and device


Publications (2)

Publication Number Publication Date
CN111046893A CN111046893A (en) 2020-04-21
CN111046893B true CN111046893B (en) 2024-02-02

Family

ID=70229643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811189157.8A Active CN111046893B (en) 2018-10-12 2018-10-12 Image similarity determining method and device, image processing method and device

Country Status (1)

Country Link
CN (1) CN111046893B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012132B (en) * 2021-03-22 2023-08-25 平安科技(深圳)有限公司 Image similarity determination method and device, computing equipment and storage medium
CN118160300A (en) * 2021-11-12 2024-06-07 华为技术有限公司 Image processing method and device
WO2023155032A1 (en) * 2022-02-15 2023-08-24 华为技术有限公司 Image processing method and image processing apparatus
CN116883698B (en) * 2023-09-07 2023-12-26 腾讯科技(深圳)有限公司 Image comparison method and related device

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4924521A (en) * 1987-12-18 1990-05-08 International Business Machines Corporation Image processing system and method employing combined black and white and gray scale image data
US5363213A (en) * 1992-06-08 1994-11-08 Xerox Corporation Unquantized resolution conversion of bitmap images using error diffusion
US5790717A (en) * 1993-10-26 1998-08-04 Bell Communications Research Inc. Apparatus and method for predicting subjective quality of compressed images
WO2003041394A1 (en) * 2001-11-05 2003-05-15 Nikon Corporation Image compression apparatus, image compression program, and image compression method
CN101478697A (en) * 2009-01-20 2009-07-08 中国测绘科学研究院 Quality evaluation method for video lossy compression
CN101478693A (en) * 2008-12-31 2009-07-08 中国资源卫星应用中心 Method for evaluating star-loaded optical remote sensing image compression quality
JP2009296139A (en) * 2008-06-03 2009-12-17 Ricoh Co Ltd Image processor, image processing method and computer program

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004192307A (en) * 2002-12-11 2004-07-08 Seiko Epson Corp Similar image extraction system, similar image extraction method, and similar image extraction program
WO2004075556A1 (en) * 2003-02-19 2004-09-02 Ishikawajima-Harima Heavy Industries Co., Ltd. Image compression device, image compression method, image compression program, compression/encoding method, compression/encoding device, compression/encoding program, decoding method, decoding device, and decoding program
JP4750801B2 (en) * 2005-12-20 2011-08-17 富士通株式会社 Image discrimination device
KR101272258B1 (en) * 2006-07-04 2013-06-13 삼성전자주식회사 Apparatus and method for image compensation
US20090060363A1 (en) * 2007-08-30 2009-03-05 Himax Technologies Limited System and method for image compression
US9858892B2 (en) * 2014-03-28 2018-01-02 Change Healthcare Llc Method and computing device for identifying a pixel visibility loss condition
US10867416B2 (en) * 2017-03-10 2020-12-15 Adobe Inc. Harmonizing composite images using deep learning

Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4924521A (en) * 1987-12-18 1990-05-08 International Business Machines Corporation Image processing system and method employing combined black and white and gray scale image data
US5363213A (en) * 1992-06-08 1994-11-08 Xerox Corporation Unquantized resolution conversion of bitmap images using error diffusion
US5790717A (en) * 1993-10-26 1998-08-04 Bell Communications Research Inc. Apparatus and method for predicting subjective quality of compressed images
WO2003041394A1 (en) * 2001-11-05 2003-05-15 Nikon Corporation Image compression apparatus, image compression program, and image compression method
JP2009296139A (en) * 2008-06-03 2009-12-17 Ricoh Co Ltd Image processor, image processing method and computer program
CN101478693A (en) * 2008-12-31 2009-07-08 China Centre for Resources Satellite Data and Application Method for evaluating compression quality of spaceborne optical remote sensing images
CN101478697A (en) * 2009-01-20 2009-07-08 Chinese Academy of Surveying and Mapping Quality evaluation method for video lossy compression
CN101888469A (en) * 2009-05-13 2010-11-17 Fujitsu Ltd Image processing method and image processing device
CN102737230A (en) * 2012-05-25 2012-10-17 Huazhong University of Science and Technology Non-local means filtering method based on direction field estimation
CN103902553A (en) * 2012-12-25 2014-07-02 Peking University Founder Group Co Ltd Method and device for comparing videos
CN107548556A (en) * 2015-04-21 2018-01-05 Vid Scale Inc Video coding based on artistic intent
CN105100789A (en) * 2015-07-22 2015-11-25 Tianjin University of Science and Technology Method for evaluating video quality
WO2017215284A1 (en) * 2016-06-14 2017-12-21 Shandong University Gastrointestinal tumor microscopic hyperspectral image processing method based on convolutional neural network
CN106709958A (en) * 2016-12-03 2017-05-24 Zhejiang University Image quality evaluation method based on grayscale gradient and color histogram
CN107274379A (en) * 2017-05-09 2017-10-20 Wuhan University Image quality evaluation method and system
CN107633522A (en) * 2017-08-30 2018-01-26 Shandong University of Finance and Economics Brain image segmentation method and system based on a local-similarity active contour model
CN107784676A (en) * 2017-09-20 2018-03-09 Institute of Computing Technology, Chinese Academy of Sciences Compressed sensing measurement matrix optimization method and system based on autoencoder network
CN107507134A (en) * 2017-09-21 2017-12-22 Dalian University of Technology Super-resolution method based on convolutional neural networks
CN108121522A (en) * 2017-12-19 2018-06-05 Luoyang Institute of Electro-Optical Equipment, AVIC Pre-distortion-based anti-aliasing method for head-up display images using edge direction correlation
CN108062780A (en) * 2017-12-29 2018-05-22 Baidu Online Network Technology (Beijing) Co Ltd Image compression method and device
CN108090902A (en) * 2017-12-30 2018-05-29 Communication University of China No-reference image quality assessment method based on a multi-scale generative adversarial network
CN108537104A (en) * 2018-01-30 2018-09-14 Xidian University Compressed sensing network and reconstruction method based on full-image observation and perceptual loss
CN108346133A (en) * 2018-03-15 2018-07-31 Wuhan University Deep learning network training method for video satellite super-resolution reconstruction
CN108564580A (en) * 2018-04-23 2018-09-21 Zhu Miao Image quality evaluation method based on the human visual system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xichen Yang et al. "Image quality assessment via spatial structural analysis." Computers & Electrical Engineering, 2016, full text. *
Li Yuancheng et al. "A wavelet image compression method based on support vector machines." Journal of Beijing University of Aeronautics and Astronautics, 2006, full text. *

Also Published As

Publication number Publication date
CN111046893A (en) 2020-04-21

Similar Documents

Publication Publication Date Title
CN111046893B (en) Image similarity determining method and device, image processing method and device
CN109949255B (en) Image reconstruction method and device
KR102389173B1 (en) Training method, image processing method, device and storage medium for generative adversarial network
CN110689599B (en) 3D visual saliency prediction method based on a non-local enhanced generative adversarial network
CN109934300B (en) Model compression method, device, computer equipment and storage medium
Aranguren et al. Improving the segmentation of magnetic resonance brain images using the LSHADE optimization algorithm
CN112529863B (en) Method and device for measuring bone mineral density
CN113132723B (en) Image compression method and device
JP6945253B2 (en) Classification device, classification method, program, and information recording medium
CN110139102B (en) Method, device, equipment and storage medium for predicting video coding complexity
CN113850753B (en) Medical image information computing method, device, edge computing equipment and storage medium
CN111210382B (en) Image processing method, image processing device, computer equipment and storage medium
CN111079764A (en) Low-illumination license plate image recognition method and device based on deep learning
CN111047543A (en) Image enhancement method, device and storage medium
CN116030259B (en) Abdominal CT image multi-organ segmentation method and device and terminal equipment
CN115601751B (en) Fundus image semantic segmentation method based on domain generalization
He et al. A visual residual perception optimized network for blind image quality assessment
CN112529886A (en) Attention DenseUNet-based MRI glioma segmentation method
TWI792696B (en) Methods and apparatuses of contrastive learning for color constancy
CN114694074A (en) Method, device and storage medium for generating video by using image
CN113486925A (en) Model training method, fundus image generation method, model evaluation method and device
CN111798463B (en) Method for automatically segmenting multiple organs in head and neck CT image
Ruivo et al. Double-deep learning-based point cloud geometry coding with adaptive super-resolution
CN114565626A (en) Lung CT image segmentation algorithm based on improved PSPNet
CN114862739B (en) Intelligent medical image enhancement method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant