CN111046893A - Image similarity determining method and device, and image processing method and device - Google Patents


Info

Publication number
CN111046893A
Authority
CN
China
Prior art keywords
image data
similarity
loss function
channel
reconstructed image
Prior art date
Legal status
Granted
Application number
CN201811189157.8A
Other languages
Chinese (zh)
Other versions
CN111046893B (en)
Inventor
Zhou Jing (周静)
Tan Zhiming (谭志明)
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201811189157.8A
Publication of CN111046893A
Application granted
Publication of CN111046893B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Abstract

The embodiment of the invention provides an image similarity determining method and device and an image processing method and device, wherein the image similarity determining method comprises the following steps: generating two-channel gray-scale image data, wherein the two-channel gray-scale image data comprises gray-scale original image data of a first channel and gray-scale reconstructed image data of a second channel; calculating a first similarity of the two-channel gray-scale image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity; calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity; determining a loss function representing a similarity of the original image data and the reconstructed image data based on the first loss function and the second loss function.

Description

Image similarity determining method and device, and image processing method and device
Technical Field
The present invention relates to the field of image processing, and in particular, to a method and an apparatus for determining image similarity, and a method and an apparatus for image processing.
Background
In recent years, with the development of artificial neural networks, it has become increasingly common in the field of image compression to replace manually designed linear transforms (for example, those of the Joint Photographic Experts Group, JPEG) with artificial neural networks: a learned encoder function replaces the analysis transform, and a learned decoder function replaces the synthesis transform. Recent research shows that the texture information of images can be effectively captured through feature maps in deep convolutional networks, for example in the Generative Adversarial Network (GAN) and the Variational Auto-Encoder (VAE).
The GAN is mainly used for generating images. A GAN comprises two networks: a generator network G and a discriminator network D. The generator G learns the real image distribution so that the generated images look real enough to fool the discriminator, while the discriminator D must judge whether an image is real or generated. During training, a noise variable z is input into the generator, which outputs generated image data G(z; θ_g); the discriminator receives the original image and the generated image data G(z; θ_g), and its output is a confidence D(x; θ_d). The generator and the discriminator compete continuously until the two networks reach a dynamic equilibrium: the images produced by the generator approach the distribution of real images, and the discriminator can no longer distinguish real images from fake ones.
Traditional image quality evaluation methods can be divided into objective and subjective evaluation. In the training stage, a traditional image compression algorithm minimizes a pixel-level loss metric (an objective measure such as the mean squared error, MSE) and thereby obtains a good peak signal-to-noise ratio, but the loss (blurring) of high-frequency components makes the image perceptually unrealistic. To evaluate image quality more accurately, the perceptual quality of the image can be used as an additional loss metric (a subjective measure).
It should be noted that the above background description is only for the sake of clarity and complete description of the technical solutions of the present invention and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the invention.
Disclosure of Invention
In the conventional subjective evaluation method, when the perceptual loss metric of an image is calculated, the original image and the reconstructed image are used as the two inputs of a Siamese (twin) neural network. A feature vector of the original image and a feature vector of the reconstructed image are each extracted by the Siamese network, a similarity function of the two feature vectors is calculated (for example, a distance metric between the feature vectors), and that similarity function is used as the perceptual loss metric.
The inventors found that in this method the Siamese network performs feature extraction for the original image and the reconstructed image independently, so computing the similarity function takes a long time. Moreover, because the features are extracted independently, the correlation between the original image and the reconstructed image is not captured, the semantic similarity between pixels at corresponding positions of the two images cannot be reflected intuitively, and the calculated similarity is not accurate enough.
Embodiments of the present invention provide an image similarity determination method and apparatus, and an image processing method and apparatus, which solve the above problems in the prior art: the parameters of a neural network for image compression can be trained according to the loss function, so that the similarity between the original image data and the reconstructed image data becomes high and the quality of the reconstructed (compressed) image is improved.
According to a first aspect of embodiments of the present invention, there is provided an image similarity determination apparatus, wherein the apparatus includes:
a generation unit configured to generate two-channel grayscale image data, wherein the two-channel grayscale image data includes grayscale original image data of a first channel and grayscale reconstructed image data of a second channel;
a first calculating unit, configured to calculate a first similarity of the two-channel grayscale image data, and determine a first loss function according to the first similarity, where the first loss function is inversely proportional to the first similarity;
a second calculation unit for calculating a pixel-level second similarity between the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
a determining unit for determining a loss function representing a similarity of the original image data and the reconstructed image data based on the first loss function and the second loss function.
According to a second aspect of an embodiment of the present invention, there is provided an image processing apparatus, wherein the apparatus includes:
an encoder for converting input original image data into a feature vector;
a decoder for converting the feature vector into reconstructed image data;
an image similarity determination apparatus according to the first aspect, for calculating a loss function of the original image data and the reconstructed image data;
a training unit for training parameters of the encoder according to the loss function and generating new reconstructed image data; the image similarity determination apparatus recalculates the loss function of the new reconstructed image data and the original image data until the recalculated loss function indicates that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
According to a third aspect of the embodiments of the present invention, there is provided an image similarity determination method, wherein the method includes:
generating two-channel gray-scale image data, wherein the two-channel gray-scale image data comprises gray-scale original image data of a first channel and gray-scale reconstructed image data of a second channel;
calculating a first similarity of the two-channel gray-scale image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
determining a loss function representing a similarity of the original image data and the reconstructed image data based on the first loss function and the second loss function.
The embodiments have the beneficial effect that the original image and the reconstructed image are treated as one dual-channel image, the perceptual-level loss function of the two images is calculated from the dual-channel image, and the final loss function is determined from the perceptual-level and pixel-level loss functions. This determination method saves computation time and improves the accuracy of the similarity calculation, solves the problems in the prior art, and allows the parameters of a neural network for image compression to be trained according to the loss function, so that the similarity between the original image data and the reconstructed image data becomes high and the quality of the reconstructed (compressed) image is improved.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
Many aspects of the invention can be better understood with reference to the following drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. For convenience in illustrating and describing some parts of the present invention, corresponding parts may be enlarged or reduced in the drawings. Elements and features depicted in one drawing or one embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views, and may be used to designate corresponding parts for use in more than one embodiment.
In the drawings:
Fig. 1 is a flowchart of the image similarity determination method in Embodiment 1;
Fig. 2 is a schematic diagram of the convolutional neural network structure in Embodiment 1;
Fig. 3 is a schematic diagram of the image processing system in Embodiment 2;
Fig. 4 is a flowchart of the image processing method in Embodiment 2;
Fig. 5 is a schematic diagram of the convolutional neural network structure corresponding to the decoder in Embodiment 2;
Fig. 6 is a schematic diagram of the image similarity determination apparatus in Embodiment 3;
Fig. 7 is a schematic diagram of the image processing apparatus in Embodiment 4;
Fig. 8 is a schematic diagram of the hardware configuration of the electronic device in Embodiment 5.
Detailed Description
The foregoing and other features of embodiments of the present invention will become apparent from the following description, taken in conjunction with the accompanying drawings. These embodiments are merely exemplary and are not intended to limit the present invention. In order to make those skilled in the art easily understand the principle and the implementation manner of the present invention, the embodiment of the present invention is described by taking the reconstructed image processed by image compression as an example, but it is to be understood that the embodiment of the present invention is not limited thereto, and the reconstructed image processed by other image processing is also within the scope of the present invention.
The following describes a specific embodiment of the present invention with reference to the drawings.
Embodiment 1
This Embodiment 1 provides an image similarity determination method. Fig. 1 is a flowchart of the method; as shown in Fig. 1, the method includes:
step 101, generating dual-channel gray-scale image data, wherein the dual-channel gray-scale image data comprises gray-scale original image data of a first channel and gray-scale reconstructed image data of a second channel;
Step 102, calculating a first similarity of the two-channel grayscale image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
Step 103, calculating a pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
Step 104, determining a loss function representing the similarity between the original image data and the reconstructed image data according to the first loss function and the second loss function.
In order to better explain the above steps 101-104, the following description will first explain some features.
In this embodiment, the loss function may also be called an evaluation function: it evaluates the degree of inconsistency (or consistency) between the reconstructed image and the original image, and it is the objective function optimized by the neural network. The process of neural network training (optimization) is the process of minimizing the loss function; a smaller loss function indicates a higher similarity between the reconstructed image and the original image.
In this embodiment, in the field of image processing, image compression, for example, may be regarded as a trade-off between the code rate and the degree of compression distortion. The distortion can be regarded as the degree of similarity (represented by the loss function) between the reconstructed (compressed) image and the original image: the higher the similarity, the higher the quality of the reconstructed image. When evaluating the quality of the reconstructed image, this embodiment combines two factors, a subjective measure and an objective measure, to determine the loss function, so that the evaluation result is more accurate and the confidence of the reconstructed image is improved.
On the one hand, the subjective measure, i.e. the perceptual-level first loss function in this embodiment, can reflect the content loss and the style loss of the reconstructed image; the perceptual-level first loss function reflects the degree to which the reconstructed image and the original image are semantically (perceptually) similar.
How to calculate the first loss function of the sensing level is described below in conjunction with steps 101-102.
In this embodiment, in step 101, the grayscale original image and the grayscale reconstructed image are combined to generate a two-channel grayscale image, and the grayscale original image data and the grayscale reconstructed image data are respectively regarded as data of two channels of the one image. The original image and the reconstructed image have the same size (X × Y, X representing the number of pixels in the longitudinal direction of the image and Y representing the number of pixels in the width direction of the image).
For example, the grayscale original image data can be represented as a two-dimensional single-channel matrix V1, where a_{y,x} denotes the gray value of the pixel at position (y, x), and the grayscale reconstructed image data can be represented as a two-dimensional single-channel matrix V2, where b_{y,x} denotes the gray value of the pixel at position (y, x):

$$V1=\begin{pmatrix}a_{1,1}&\cdots&a_{1,X}\\\vdots&\ddots&\vdots\\a_{Y,1}&\cdots&a_{Y,X}\end{pmatrix},\qquad V2=\begin{pmatrix}b_{1,1}&\cdots&b_{1,X}\\\vdots&\ddots&\vdots\\b_{Y,1}&\cdots&b_{Y,X}\end{pmatrix}$$

In the present embodiment, the matrices V1 and V2 are combined (stacked along the channel dimension) to generate the two-channel grayscale image, which can be represented as the two-dimensional two-channel matrix $V3 = (V1, V2)$.
in this embodiment, before step 101, the method may further include (optionally, not shown):
Step 100, converting the original image data into grayscale original image data and converting the reconstructed image data into grayscale reconstructed image data. The grayscale conversion in step 100 may refer to the prior art; for example, using the weighted-average method, the three color components of each pixel are weighted and summed to obtain the gray value of that pixel. Details are not repeated here.
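As a concrete illustration of steps 100 and 101, the sketch below (an assumption for illustration only; the function names and the BT.601 luma weights are not specified by the patent) converts RGB originals and reconstructions to grayscale with a weighted-average method and stacks them into one two-channel array:

```python
import numpy as np

# ITU-R BT.601 luma weights, a common choice for the weighted-average method
GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb):
    """Weighted sum of the three color components of each pixel (step 100)."""
    return rgb @ GRAY_WEIGHTS  # shape (Y, X, 3) -> (Y, X)

def make_two_channel(original_rgb, reconstructed_rgb):
    """Stack grayscale original (channel 1) and grayscale reconstruction
    (channel 2) into one two-channel image V3 (step 101)."""
    v1 = to_grayscale(original_rgb)       # matrix V1
    v2 = to_grayscale(reconstructed_rgb)  # matrix V2
    return np.stack([v1, v2], axis=0)     # shape (2, Y, X)

# Toy 4x4 images with random pixel values in [0, 1]
rng = np.random.default_rng(0)
orig = rng.random((4, 4, 3))
recon = rng.random((4, 4, 3))
v3 = make_two_channel(orig, recon)
print(v3.shape)  # (2, 4, 4)
```

The stacked array is what step 102 feeds into the convolutional neural network.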
In step 102, a convolutional neural network is used to calculate the first similarity of the two-channel grayscale image data. Fig. 2 is a schematic diagram of the structure of this convolutional neural network; how the first similarity is calculated is described below with reference to Fig. 2.
As shown in Fig. 2, the convolutional neural network comprises several convolutional layers, rectified linear units (ReLU) and max-pooling layers at the bottom, and a fully connected layer at the top. The grayscale dual-channel image data generated in step 101, i.e. the matrix V3, is used as the input of the network. After the first convolution at the bottom of the network, the grayscale original image data and the grayscale reconstructed image data are associated (weighted and combined) and mapped; that is, after the first convolution the two channels are correlated together. After the bottom-layer processing and the fully connected layer at the top, the number of output neurons is 1, and the output represents the first similarity S1, which may for example be a score within 0 to 1. The specific algorithm for the first similarity of the two-channel image data and the parameters of the convolutional neural network may refer to the prior art, for example a convolution kernel of size 3 × 3; this embodiment is not limited thereto.
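To make the data flow concrete, the following numpy-only sketch (an illustration with random placeholder weights, not the patent's trained network; all names are assumptions) runs one 3 × 3 convolution over the two-channel input, so the two channels are weighted and combined in the very first layer, then applies ReLU, max pooling, and a sigmoid to obtain one output neuron S1 in (0, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
v3 = rng.random((2, 8, 8))                 # two-channel grayscale image
kernel = rng.standard_normal((2, 3, 3)) * 0.1  # placeholder 3x3 kernel

def conv2d_valid(x, k):
    """Valid 2-D convolution that sums over both channels, so the
    original and reconstructed channels are mixed at the first layer."""
    c, h, w = x.shape
    kh, kw = k.shape[1:]
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i+kh, j:j+kw] * k)
    return out

feat = np.maximum(conv2d_valid(v3, kernel), 0.0)  # convolution + ReLU
pooled = feat.max()                               # global max pooling
s1 = 1.0 / (1.0 + np.exp(-pooled))                # sigmoid -> single neuron
assert 0.0 < s1 < 1.0                             # S1 is a score in (0, 1)
```

A real implementation would stack several such layers and learn the weights; the sketch only shows why the first convolution already correlates the two channels.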
In the present embodiment, in step 102, a perceptual-level loss function (the first loss function) is determined according to the first similarity, and the first loss function is inversely proportional to the first similarity. That is, the higher the first similarity, the smaller the first loss function, and the lower the first similarity, the larger the first loss function. When this loss function is used as the objective function for optimizing a neural network (for example, a GAN) and a loss function satisfying a predetermined condition is obtained, the similarity between the resulting reconstructed image and the original image is equal to or greater than a first threshold.
In this embodiment, the perceptual-level first loss function is determined from the first similarity computed on the two-channel image data. It reflects the semantic (perceptual) correlation and similarity of the original and reconstructed images more intuitively, the perceptual-similarity computation takes less time, and the calculation accuracy is higher.
In one embodiment, the first loss function perceptual_loss is equal to the negative logarithm of the first similarity, as shown in Formula 1):

$$\text{perceptual}_{loss} = -\log S1 \qquad \text{Formula 1)}$$

In this embodiment, the first similarity S1 may for example be a score within 0 to 1. The closer S1 is to 1, the closer perceptual_loss is to 0, the higher the similarity between the original image and the reconstructed image, and the lower the distortion of the reconstructed image; conversely, the closer S1 is to 0, the larger perceptual_loss becomes, the lower the similarity between the original image and the reconstructed image, and the higher the distortion of the reconstructed image.
In another embodiment, the first loss function is equal to the reciprocal of the first similarity; that is, the first loss function is again inversely proportional to the first similarity: the higher the first similarity, the smaller the first loss function, and the lower the first similarity, the larger the first loss function.
The above two embodiments are merely two examples, and the present embodiment is not limited thereto, and various embodiments in which the first loss function is inversely proportional to the first similarity are within the protection scope of the present embodiment.
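The behavior of the Formula 1) variant can be sketched in a few lines (the function name is an assumption; only the negative-logarithm relation comes from the text above):

```python
import math

def perceptual_loss(s1):
    """First (perceptual-level) loss function of Formula 1): -log S1.
    s1 is the first similarity, a score in (0, 1]."""
    return -math.log(s1)

# The loss is inversely related to the similarity:
assert perceptual_loss(1.0) == 0.0            # identical images, no loss
assert perceptual_loss(0.9) < perceptual_loss(0.5)  # more similar, less loss
assert perceptual_loss(0.01) > 4.0            # very dissimilar, large loss
```

Any other monotonically decreasing function of S1, such as the reciprocal 1/S1, would satisfy the same "inversely proportional" property.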
On the other hand, the objective measure, i.e. the pixel-level second loss function in this embodiment, reflects the difference between the pixel values of the original image and the reconstructed image, but cannot reflect the visual perception characteristics of the human eye.
How to calculate the pixel-level second penalty function is described below in connection with step 103.
In step 103, the pixel-level second similarity is a mean square error or a multi-level structural similarity, and the pixel-level second similarity is used as the second loss function.
For example, the mean squared error (MSE) of the original image data and the reconstructed image data is calculated and used as the second loss function pixel_loss, i.e., the mean of the squared Euclidean distances between pixel values at corresponding positions of the original image data and the reconstructed image data, as shown in Formula 2):

$$\text{pixel}_{loss} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\hat{x}_i\right)^2 \qquad \text{Formula 2)}$$

where $x$ denotes the (RGB) pixel values of the reconstructed image data, $\hat{x}$ denotes the (RGB) pixel values of the original image data, and $N$ is the number of values compared.

For example, the multi-level (multi-scale) structural similarity of the original image data and the reconstructed image data may instead be calculated as the second similarity, as indicated in Formula 3); the specific calculation process of the multi-level structural similarity can refer to the prior art and is not described again here.

The larger the similarity, the smaller the distortion of the reconstructed image; the smaller the similarity, the larger the distortion of the reconstructed image.
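A minimal sketch of the MSE variant of Formula 2) (the function name is an assumption):

```python
import numpy as np

def pixel_loss_mse(original, reconstructed):
    """Second (pixel-level) loss of Formula 2): mean of the squared
    Euclidean distances between pixel values at corresponding positions."""
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)

orig = np.array([[10.0, 20.0], [30.0, 40.0]])
recon = np.array([[12.0, 18.0], [30.0, 44.0]])
print(pixel_loss_mse(orig, recon))  # (4 + 4 + 0 + 16) / 4 = 6.0
print(pixel_loss_mse(orig, orig))   # 0.0 for identical images
```

For color images the same mean is simply taken over all channels as well.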
The two embodiments described above are merely examples; this embodiment is not limited thereto, and any existing objective metric of image evaluation, i.e. any pixel-level similarity embodiment, is within the protection scope of this embodiment.
In the present embodiment, the execution sequence of steps 101-102 and 103 is not limited.
In the present embodiment, in step 104, a first product of the first loss function and a first weight λ1 and a second product of the second loss function and a second weight λ2 are calculated; the sum of the first product and the second product is determined as the loss function, as shown in Formula 4):

$$loss = \lambda_1\cdot \text{perceptual}_{loss} + \lambda_2\cdot \text{pixel}_{loss} \qquad \text{Formula 4)}$$
In the present embodiment, the first weight λ1 and the second weight λ2 can be determined as needed, for example such that the sum of λ1 and λ2 is 1; this embodiment is not limited thereto.
In this embodiment, the loss function includes both an objective pixel-level measure and a subjective perceptual-level measure; using this loss function as the optimization target of a neural network (such as a GAN) can further improve the confidence of the reconstructed image.
Therefore, in this embodiment the original image and the reconstructed image are treated as one dual-channel image, the perceptual-level loss function of the two images is calculated from the dual-channel image, and the final loss function is determined from the perceptual-level and pixel-level loss functions. This determination method saves computation time and improves the accuracy of the similarity calculation.
Embodiment 2
In this embodiment, an encoder is added in front of a GAN-based generator (decoder), and an original image passes through the encoder to obtain a feature vector, which is used as an input of the decoder to obtain a reconstructed image.
Fig. 3 is a schematic diagram of the architecture of the GAN-based image processing (compression) system. As shown in Fig. 3, the GAN is first trained, and its generator is used to initialize the decoder of the compression system. The original image x passes through the encoder f_θ to obtain a feature vector Z, and the decoder g_φ obtains a reconstructed image x̂ from the feature vector Z. The loss function of the original image data and the reconstructed image data is calculated to train the network parameters of the encoder: the parameter θ of the network is updated, new reconstructed image data is generated, and the loss function is recalculated. This process is repeated until the optimized loss function satisfies a predetermined condition, i.e. the similarity between the original image data and the reconstructed image data is greater than or equal to a first threshold, meaning the reconstructed image quality is high.
This embodiment 2 provides an image processing method, which trains parameters of a neural network for image compression by using the loss function in embodiment 1, so that the similarity between the original image data and the reconstructed image data becomes high, i.e., the quality of the reconstructed image (compressed image) is improved.
Fig. 4 is a flowchart of the image processing method, as shown in fig. 4, the method including:
step 401, converting input original image data into a feature vector by using an encoder;
step 402, converting the feature vector into reconstructed image data by using a decoder;
step 403, calculating a loss function of the original image data and the reconstructed image data;
step 404, training the parameters of the encoder according to the loss function, generating new reconstructed image data, and recalculating the loss functions of the new reconstructed image data and the original image data until the recalculated loss functions indicate that the similarity between the original image data and the new reconstructed image data is greater than or equal to the first threshold.
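The loop of steps 401-404 can be sketched with a deliberately tiny stand-in (everything here is an illustrative assumption: the "encoder" is a single scalar parameter θ, the pre-trained "decoder" is a fixed linear map, and the loss is plain MSE rather than the combined loss of Embodiment 1):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(100)                 # original "image" data (1-D toy)

def encoder(x, theta):              # step 401: x -> feature vector z
    return theta * x

def decoder(z):                     # step 402: z -> reconstruction (fixed,
    return 2.0 * z                  # i.e. the pre-trained generator)

theta, lr, threshold = 0.0, 0.1, 1e-6
for _ in range(1000):
    z = encoder(x, theta)
    x_hat = decoder(z)              # new reconstructed image data
    loss = np.mean((x - x_hat) ** 2)            # step 403
    if loss <= threshold:           # step 404: stop once similar enough
        break
    grad = np.mean(2 * (x_hat - x) * 2.0 * x)   # d(loss)/d(theta)
    theta -= lr * grad              # retrain the encoder parameter

print(theta)  # converges close to 0.5, so decoder(encoder(x)) ≈ x
```

With the real networks, the gradient step would be replaced by backpropagation through the encoder, and the stopping condition by the loss function of Formula 4) falling below the second threshold.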
In this embodiment, the decoder in step 402 may be implemented with a convolutional neural network. Fig. 5 is a schematic diagram of the convolutional neural network corresponding to the decoder; as shown in Fig. 5, the feature vector is used as input and the reconstructed image is obtained through four transposed-convolution (deconvolution) layers (CONV1, CONV2, CONV3 and CONV4). The network structure may refer to the deep convolutional GAN (DCGAN) of the prior art, and the decoder may refer to the generator of DCGAN; "stride 2" in Fig. 5 is a convolutional-layer parameter representing the step size, and details are not repeated here.
In this embodiment, the decoder may be trained in advance by a prior-art method, using an adversarial loss function together with the discriminator network, to obtain the parameters of the decoder. For example, the parameters of the decoder may be trained with the adversarial loss function of Formula 5):

$$\min_G \max_D L(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_z(z)}\big[\log\big(1-D(G(z))\big)\big] \qquad \text{Formula 5)}$$

In Formula 5), x denotes the original image, z denotes the noise input to the generator network G (the decoder), G(z) denotes the image generated by G, D(x) denotes the probability that the discriminator network D judges the original image to be real, and D(G(z)) denotes the probability that D judges the generated image G(z) to be real.
Formula 5) can be read as follows. Since neural network training (here, decoder parameter training) minimizes the loss with respect to G, the generator network G wants D(G(z)) to be as large as possible (larger means the generated image looks more real, i.e. more similar to the original), which makes L(D, G) smaller; hence the min over G. On the other hand, the discriminator network D wants D(x) to be large and D(G(z)) to be small (a large D(x) means the original image is judged real, a small D(G(z)) means the generated image is judged fake), which makes L(D, G) larger; hence the max over D. During training, a stochastic gradient algorithm can be used: in the first step, the discriminator D is trained and updated by gradient ascent so that L(D, G) becomes larger (see Formula 6)); in the second step, the generator G is trained and updated by gradient descent so that L(D, G) becomes smaller (see Formula 7)). The two steps are performed alternately throughout training. Formulas 5)-7) belong to the prior art, and the training process of the decoder can refer to the prior art and is not detailed here. After the training process is completed, the parameters of the decoder are kept unchanged.
$$\theta_d \leftarrow \theta_d + \alpha \, \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \Big[ \log D\big(x^{(i)}\big) + \log\big(1 - D\big(G\big(z^{(i)}\big)\big)\big) \Big] \qquad 6)$$
$$\theta_g \leftarrow \theta_g - \alpha \, \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\big(1 - D\big(G\big(z^{(i)}\big)\big)\big) \qquad 7)$$
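As an illustration, the adversarial loss of equation 5) can be evaluated numerically for given discriminator outputs. The following is a minimal sketch, not the patented training procedure itself; it assumes the discriminator outputs D(x) and D(G(z)) are already available as probabilities:

```python
import numpy as np

def gan_loss(d_real, d_fake):
    # L(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], as in equation 5).
    # d_real: discriminator outputs D(x) on original images.
    # d_fake: discriminator outputs D(G(z)) on reconstructed images.
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (D(x) near 1, D(G(z)) near 0) yields a larger
# L(D, G) than a confused one, matching the max_D objective.
confident = gan_loss(np.array([0.99]), np.array([0.01]))
confused = gan_loss(np.array([0.6]), np.array([0.4]))
```

A discriminator update would move its parameters to increase this value (gradient ascent, equation 6)), while a generator update would decrease it (gradient descent, equation 7)).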
In this embodiment, the encoder in step 401 may be implemented by a convolutional neural network, which is the inverse of the convolutional neural network corresponding to fig. 5 and extracts the feature vector of the original image by forward convolution. The specific structure of this convolutional neural network may be any known structure capable of extracting image feature vectors; the embodiment is not limited in this respect. The initial parameter setting of the encoder is similar to the parameters of the above discrimination network D; that is, the parameters of the encoder are initialized based on the parameters of the discriminator associated with the decoder. In steps 403 and 404, the parameter θ of the encoder is optimized such that the loss function in step 403 satisfies a predetermined condition, i.e., the similarity between the reconstructed image and the original image becomes higher and the quality of the reconstructed image improves.
In this embodiment, the loss function calculated in step 403 is the loss function calculated in step 104 of embodiment 1; for the specific calculation manner, refer to equation 4), which is not repeated here.
In this embodiment, in step 404, it is determined whether the loss function can still be optimized. If yes, the parameter θ of the encoder is retrained, the method returns to steps 401 to 403, a new reconstructed image is obtained, and a new loss function is calculated; the training stops when the loss function obtained in step 403 meets a predetermined condition. At this point, the similarity between the reconstructed image and the original image is greater than or equal to a first threshold. For example, the predetermined condition may be that the loss function is less than or equal to a second threshold; a smaller loss function indicates a reconstructed image of higher quality that is more similar to the original image. In step 404, the reconstructed image corresponding to the loss function satisfying the predetermined condition is obtained and used as the image processing (compression) result. The first threshold and the second threshold may be determined as needed; this embodiment is not limited in this respect.
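The stop-on-threshold loop of steps 401 to 404 can be sketched as follows. The quadratic stand-in loss and the learning rate are illustrative assumptions, not part of the patent, which uses the loss function of equation 4):

```python
def train_encoder(theta, lr=0.1, second_threshold=1e-3, max_steps=1000):
    """Re-optimize the encoder parameter theta until the loss meets the
    predetermined condition (here: loss <= second_threshold)."""
    loss = theta ** 2              # stand-in for the real reconstruction loss
    steps = 0
    while loss > second_threshold and steps < max_steps:
        grad = 2.0 * theta         # gradient of the stand-in loss
        theta -= lr * grad         # retrain (update) the encoder parameter
        loss = theta ** 2          # recompute the loss on the new reconstruction
        steps += 1
    return theta, loss
```

In the real system, each iteration would regenerate the reconstructed image from the updated encoder before recomputing the loss.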
With the above-described embodiment, the parameters of a neural network used for image compression can be trained according to the loss function in embodiment 1, so that the similarity between the original image data and the reconstructed image data becomes higher and the quality of the reconstructed (compressed) image is improved.
Example 3
Embodiment 3 further provides an image similarity determination apparatus. Since the principle by which the apparatus solves the problem is similar to the method of embodiment 1, for its specific implementation, reference may be made to the implementation of the method of embodiment 1, and repeated descriptions are omitted.
Fig. 6 is a schematic diagram of the image similarity determination apparatus, and as shown in fig. 6, the apparatus 600 includes:
a generating unit 601 configured to generate two-channel grayscale image data, wherein the two-channel grayscale image data includes grayscale original image data of a first channel and grayscale reconstructed image data of a second channel;
a first calculating unit 602, configured to calculate a first similarity of the two-channel grayscale image data, and determine a first loss function according to the first similarity, where the first loss function is inversely proportional to the first similarity;
a second calculating unit 603 configured to calculate a pixel-level second similarity between the original image data and the reconstructed image data, and determine a second loss function according to the second similarity;
a determining unit 604 for determining a loss function representing a similarity of the original image data and the reconstructed image data based on the first loss function and the second loss function.
In this embodiment, for the implementation manners of the generating unit 601, the first calculating unit 602, the second calculating unit 603, and the determining unit 604, reference may be made to steps 101 to 104 in embodiment 1, which are not described herein again.
In this embodiment, the first calculating unit 602 calculates a first similarity of the two-channel grayscale image data by using a convolutional neural network, wherein the first similarity is greater than or equal to 0 and less than or equal to 1, and the first loss function is equal to the negative of the logarithm of the first similarity; for details, refer to embodiment 1.
In this embodiment, the pixel-level second similarity is a mean square error or a multi-level structural similarity, and the specific implementation thereof can refer to embodiment 1.
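For the mean-square-error case, the pixel-level second loss admits a direct implementation; the following is a minimal sketch (the function name is illustrative):

```python
import numpy as np

def mse_loss(original, reconstructed):
    # Pixel-level second loss: mean squared error over all pixels of the
    # original image data and the reconstructed image data.
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))
```

Identical images give a loss of 0, and the loss grows with the pixel-wise difference.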
In the present embodiment, the determining unit 604 includes:
a product module (not shown) for calculating a first product of the first loss function and a first weight, and a second product of the second loss function and a second weight;
an adding module (not shown) for calculating a sum of the first product and the second product, the sum being determined as the loss function.
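Combining the two modules, the determining unit's computation reduces to a weighted sum. The sketch below also folds in the first loss as the negative logarithm of the first similarity described in embodiment 1; the default weights are illustrative assumptions:

```python
import math

def total_loss(first_similarity, second_loss, w1=1.0, w2=1.0):
    # First loss: negative logarithm of the first similarity (0 < s <= 1),
    # so a higher similarity gives a lower loss.
    first_loss = -math.log(first_similarity)
    # Determining unit: sum of the first product and the second product.
    return w1 * first_loss + w2 * second_loss
```

A perfectly similar pair (first similarity 1, second loss 0) yields a total loss of 0, and the loss rises as either similarity degrades.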
In this embodiment, the apparatus may optionally further include:
a conversion unit (not shown) for converting the original image data into grayscale original image data and converting the reconstructed image data into grayscale reconstructed image data.
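The conversion unit's work, followed by the generating unit's stacking, might look like the following sketch. The BT.601 grayscale weights are a common choice and an assumption here, since the embodiment does not fix the conversion formula:

```python
import numpy as np

def to_two_channel_gray(original_rgb, reconstructed_rgb):
    # Convert each H x W x 3 RGB image to grayscale, then stack the
    # grayscale original (first channel) and the grayscale reconstruction
    # (second channel) into one H x W x 2 two-channel image.
    weights = np.array([0.299, 0.587, 0.114])  # assumed BT.601 weights
    gray_orig = original_rgb.astype(np.float64) @ weights
    gray_recon = reconstructed_rgb.astype(np.float64) @ weights
    return np.stack([gray_orig, gray_recon], axis=-1)
```

The resulting two-channel array is what the first calculating unit would feed to its convolutional neural network.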
In this way, the original image and the reconstructed image are treated as one two-channel image, the perceptual-level loss function of the two images is calculated from the two-channel image, and the final loss function is determined from the perceptual-level loss function and the pixel-level loss function.
Example 4
Embodiment 4 further provides an image processing apparatus. Since the principle by which the apparatus solves the problem is similar to the method of embodiment 2, for its specific implementation, reference may be made to the implementation of the method of embodiment 2, and repeated descriptions are omitted.
Fig. 7 is a schematic view of the image processing apparatus, and as shown in fig. 7, the apparatus 700 includes:
an encoder 701 for converting input original image data into a feature vector;
a decoder 702 for converting the feature vector into reconstructed image data;
image similarity determining means 703 for calculating a loss function of the original image data and the reconstructed image data;
a processing unit 704, configured to train the parameters of the encoder 701 according to the loss function and to generate new reconstructed image data according to the trained parameters, wherein the image similarity determining apparatus 703 recalculates the loss function of the new reconstructed image data and the original image data until the recalculated loss function indicates that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
In this embodiment, the specific implementation of the encoder 701 and the decoder 702 may refer to embodiment 2, the implementation of the image similarity determining apparatus 703 may refer to the image similarity determining apparatus 600 in embodiment 3, and the specific implementation of the processing unit 704 may refer to embodiment 2, which is not described herein again.
With the above-described embodiment, the parameters of a neural network used for image compression can be trained according to the loss function in embodiment 1, so that the similarity between the original image data and the reconstructed image data becomes higher and the quality of the reconstructed (compressed) image is improved.
Example 5
An embodiment of the present invention further provides an electronic device including the image similarity determining apparatus described in embodiment 3 or the image processing apparatus described in embodiment 4, the contents of which are incorporated herein. The electronic device may be, for example, a computer, a server, a workstation, a laptop computer, a smartphone, or the like; embodiments of the invention are not limited thereto.
Fig. 8 is a schematic diagram of the hardware configuration of the electronic device according to the embodiment of the present invention. As shown in fig. 8, the electronic device 800 may include a processor (e.g., a central processing unit, CPU) 810 and a memory 820 coupled to the processor 810. The memory 820 may store various data, and further stores a program for information processing that is executed under the control of the processor 810.
In one embodiment, the functions of the image similarity determination apparatus 600 or the image processing apparatus 700 may be integrated into the processor 810. The processor 810 may be configured to implement the image similarity determination method according to embodiment 1 or the image processing method according to embodiment 2.
In another embodiment, the image similarity determining apparatus 600 or the image processing apparatus 700 may be configured separately from the processor 810, for example, the image similarity determining apparatus 600 or the image processing apparatus 700 may be configured as a chip connected to the processor 810, and the functions of the image similarity determining apparatus 600 or the image processing apparatus 700 are realized by the control of the processor 810.
For example, the processor 810 may be configured to control as follows: generating two-channel gray-scale image data, wherein the two-channel gray-scale image data comprises gray-scale original image data of a first channel and gray-scale reconstructed image data of a second channel; calculating a first similarity of the two-channel gray-scale image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity; calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity; determining a loss function representing a similarity of the original image data and the reconstructed image data based on the first loss function and the second loss function.
Alternatively, for example, the processor 810 may be configured to perform control as follows: converting input original image data into a feature vector by using an encoder; converting the feature vector into reconstructed image data by using a decoder; calculating a loss function of the original image data and the reconstructed image data by using the method of embodiment 1; training the parameters of the encoder according to the loss function and generating new reconstructed image data according to the trained parameters; and recalculating the loss function of the new reconstructed image data and the original image data until the recalculated loss function indicates that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
The specific implementation of the processor 810 can refer to embodiment 1 or 2, and is not described herein again.
Further, as shown in fig. 8, the electronic device 800 may further include a transmitting/receiving unit 830 and the like; the functions of these components are similar to those in the prior art and are not described in detail here. It is noted that the electronic device 800 does not necessarily include all of the components shown in fig. 8; furthermore, the electronic device 800 may also comprise components not shown in fig. 8, for which reference may be made to the prior art.
An embodiment of the present invention also provides a computer-readable program, wherein when the program is executed in an image similarity determination apparatus, the program causes a computer to execute the image similarity determination method as in embodiment 1 above in the image similarity determination apparatus.
An embodiment of the present invention further provides a storage medium storing a computer-readable program, where the computer-readable program enables a computer to execute the image similarity determination method in embodiment 1 above in an image similarity determination apparatus.
An embodiment of the present invention also provides a computer-readable program, wherein when the program is executed in an image processing apparatus, the program causes a computer to execute the image processing method as in embodiment 2 above in the image processing apparatus.
An embodiment of the present invention also provides a storage medium storing a computer-readable program, where the computer-readable program enables a computer to execute the image processing method in embodiment 2 above in an image processing apparatus.
The method for determining image similarity in an image similarity determination apparatus described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks and/or one or more combinations of the functional blocks illustrated in figs. 6-8 may correspond to individual software modules of a computer program flow or to individual hardware modules. These software modules may correspond to the steps shown in figs. 1 and 4, respectively. These hardware modules may be implemented, for example, by hard-wiring the software modules using a field-programmable gate array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the image similarity determination apparatus or in a memory card that is insertable into the image similarity determination apparatus.
One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to figs. 6-8 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein. They may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
While the invention has been described with reference to specific embodiments, it will be apparent to those skilled in the art that these descriptions are illustrative and not intended to limit the scope of the invention. Various modifications and alterations of this invention will become apparent to those skilled in the art based upon the spirit and principles of this invention, and such modifications and alterations are also within the scope of this invention.
With regard to the embodiments including the above embodiments, the following remarks are also disclosed.
Supplementary note 1, an image similarity determination apparatus, wherein the apparatus comprises:
a generating unit configured to generate two-channel grayscale image data, wherein the two-channel grayscale image data includes grayscale original image data of a first channel and grayscale reconstructed image data of a second channel;
a first calculating unit, configured to calculate a first similarity of the two-channel grayscale image data, and determine a first loss function according to the first similarity, where the first loss function is inversely proportional to the first similarity;
a second calculation unit for calculating a pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
a determining unit for determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function.
Supplementary note 2, the apparatus according to supplementary note 1, wherein the first calculation unit calculates a first similarity of the two-channel grayscale image data using a convolutional neural network, wherein the first similarity is greater than or equal to 0 and less than or equal to 1.
Supplementary note 3, the apparatus according to supplementary note 2, wherein the first loss function is equal to the negative of the logarithm of the first similarity.
Supplementary note 4, the apparatus according to supplementary note 1, wherein the pixel-level second similarity is a mean square error or a multi-level structural similarity.
Supplementary note 5, the apparatus according to supplementary note 1, wherein the determining unit includes:
a product module for calculating a first product of the first loss function and a first weight, and a second product of the second loss function and a second weight;
an addition module for calculating a sum of the first product and the second product, the sum being determined as the loss function.
Supplementary note 6, the apparatus according to supplementary note 1, wherein the apparatus further comprises:
and the conversion unit is used for converting the original image data into gray original image data and converting the reconstructed image data into gray reconstructed image data.
Supplementary note 7, an image processing apparatus, wherein the apparatus comprises:
an encoder for converting input original image data into a feature vector;
a decoder for converting the feature vectors into reconstructed image data;
the image similarity determination apparatus according to any one of supplementary notes 1 to 6, which is configured to calculate a loss function of the original image data and the reconstructed image data;
and the image similarity determining device recalculates the loss functions of the new reconstructed image data and the original image data until the recalculated loss functions indicate that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
Supplementary note 8, an image similarity determination method, wherein the method comprises:
generating two-channel grayscale image data, wherein the two-channel grayscale image data comprises grayscale raw image data of a first channel and grayscale reconstructed image data of a second channel;
calculating a first similarity of the two-channel gray-scale image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function.
Supplementary note 9, the method according to supplementary note 8, wherein a first similarity of the two-channel grayscale image is calculated using a convolutional neural network, wherein the first similarity is equal to or greater than 0 and equal to or less than 1.
Supplementary note 10, the method according to supplementary note 9, wherein the first loss function is equal to the negative of the logarithm of the first similarity.
Supplementary note 11, the method according to supplementary note 8, wherein the pixel-level second similarity is a mean square error or a multi-level structural similarity.
Supplementary note 12, the method according to supplementary note 8, wherein determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function comprises:
calculating a first product of the first loss function and the first weight, and a second product of the second loss function and the second weight;
calculating a sum of the first product and the second product, the sum being determined as the loss function.
Supplementary note 13, the method according to supplementary note 8, wherein the method further comprises:
and converting the original image data into gray original image data, and converting the reconstructed image data into gray reconstructed image data.

Claims (10)

1. An image similarity determination apparatus, wherein the apparatus comprises:
a generating unit configured to generate two-channel grayscale image data, wherein the two-channel grayscale image data includes grayscale original image data of a first channel and grayscale reconstructed image data of a second channel;
a first calculating unit, configured to calculate a first similarity of the two-channel grayscale image data, and determine a first loss function according to the first similarity, where the first loss function is inversely proportional to the first similarity;
a second calculation unit for calculating a pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
a determining unit for determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function.
2. The apparatus according to claim 1, wherein the first calculation unit calculates a first similarity of the two-channel grayscale image data using a convolutional neural network, wherein the first similarity is equal to or greater than 0 and equal to or less than 1.
3. The apparatus of claim 2, wherein the first loss function is equal to the negative of the logarithm of the first similarity.
4. The apparatus of claim 1, wherein the pixel-level second similarity is a mean square error or a multi-level structural similarity.
5. The apparatus of claim 1, wherein the determining unit comprises:
a product module for calculating a first product of the first loss function and a first weight, and a second product of the second loss function and a second weight;
an addition module for calculating a sum of the first product and the second product, the sum being determined as the loss function.
6. The apparatus of claim 1, wherein the apparatus further comprises:
and the conversion unit is used for converting the original image data into gray original image data and converting the reconstructed image data into gray reconstructed image data.
7. An image processing apparatus, wherein the apparatus comprises:
an encoder for converting input original image data into a feature vector;
a decoder for converting the feature vectors into reconstructed image data;
the image similarity determination apparatus of any one of claims 1 to 6, for calculating a loss function of the original image data and the reconstructed image data;
and the image similarity determining device recalculates the loss functions of the new reconstructed image data and the original image data until the recalculated loss functions indicate that the similarity between the original image data and the new reconstructed image data is greater than or equal to a first threshold.
8. An image similarity determination method, wherein the method comprises:
generating two-channel grayscale image data, wherein the two-channel grayscale image data comprises grayscale raw image data of a first channel and grayscale reconstructed image data of a second channel;
calculating a first similarity of the two-channel gray-scale image data, and determining a first loss function according to the first similarity, wherein the first loss function is inversely proportional to the first similarity;
calculating pixel-level second similarity of the original image data and the reconstructed image data, and determining a second loss function according to the second similarity;
determining a loss function representing a similarity of the original image data and the reconstructed image data from the first loss function and the second loss function.
9. The method of claim 8, wherein a first similarity of the two-channel grayscale image is calculated using a convolutional neural network, wherein the first similarity is greater than or equal to 0 and less than or equal to 1.
10. The method of claim 9, wherein the first loss function is equal to the negative of the logarithm of the first similarity.
CN201811189157.8A 2018-10-12 2018-10-12 Image similarity determining method and device, image processing method and device Active CN111046893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811189157.8A CN111046893B (en) 2018-10-12 2018-10-12 Image similarity determining method and device, image processing method and device

Publications (2)

Publication Number Publication Date
CN111046893A true CN111046893A (en) 2020-04-21
CN111046893B CN111046893B (en) 2024-02-02

Family

ID=70229643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811189157.8A Active CN111046893B (en) 2018-10-12 2018-10-12 Image similarity determining method and device, image processing method and device

Country Status (1)

Country Link
CN (1) CN111046893B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012132A (en) * 2021-03-22 2021-06-22 平安科技(深圳)有限公司 Image similarity determining method and device, computing equipment and storage medium
WO2023082162A1 (en) * 2021-11-12 2023-05-19 华为技术有限公司 Image processing method and apparatus
WO2023155032A1 (en) * 2022-02-15 2023-08-24 华为技术有限公司 Image processing method and image processing apparatus
CN116883698A (en) * 2023-09-07 2023-10-13 腾讯科技(深圳)有限公司 Image comparison method and related device

Citations (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4924521A (en) * 1987-12-18 1990-05-08 International Business Machines Corporation Image processing system and method employing combined black and white and gray scale image data
US5363213A (en) * 1992-06-08 1994-11-08 Xerox Corporation Unquantized resolution conversion of bitmap images using error diffusion
US5790717A (en) * 1993-10-26 1998-08-04 Bell Communications Research Inc. Apparatus and method for predicting subjective quality of compressed images
WO2003041394A1 (en) * 2001-11-05 2003-05-15 Nikon Corporation Image compression apparatus, image compression program, and image compression method
US20060092271A1 (en) * 2003-02-19 2006-05-04 Ishikawajima-Harima Heavy Industries Co., Ltd. Image compression device, image compression method, image compression program, compression encoding method, compression/encoding device,compression/encoding program,decoding method, deconding device, and decoding program
US20060115148A1 (en) * 2002-12-11 2006-06-01 Makoto Ouchi Similar image extraction device, similar image extraction method, and similar image extraction program
US20080122833A1 (en) * 2006-07-04 2008-05-29 Samsung Electronics Co., Ltd Image compensation apparatus and method
US20080253684A1 (en) * 2005-12-20 2008-10-16 Fujitsu Limited Image discernment apparatus
US20090060363A1 (en) * 2007-08-30 2009-03-05 Himax Technologies Limited System and method for image compression
CN101478697A (en) * 2009-01-20 2009-07-08 中国测绘科学研究院 Quality evaluation method for video lossy compression
CN101478693A (en) * 2008-12-31 2009-07-08 中国资源卫星应用中心 Method for evaluating star-loaded optical remote sensing image compression quality
JP2009296139A (en) * 2008-06-03 2009-12-17 Ricoh Co Ltd Image processor, image processing method and computer program
CN101888469A (en) * 2009-05-13 2010-11-17 富士通株式会社 Image processing method and image processing device
CN102737230A (en) * 2012-05-25 2012-10-17 华中科技大学 Non-local mean filtering method based on direction field estimation
CN103902553A (en) * 2012-12-25 2014-07-02 北大方正集团有限公司 Method and device for comparing video
US20150279323A1 (en) * 2014-03-28 2015-10-01 Mckesson Financial Holdings Method and computing device for identifying a pixel visibility loss condition
CN105100789A (en) * 2015-07-22 2015-11-25 天津科技大学 Method for evaluating video quality
CN106709958A (en) * 2016-12-03 2017-05-24 浙江大学 Gray scale gradient and color histogram-based image quality evaluation method
CN107274379A (en) * 2017-05-09 2017-10-20 武汉大学 A kind of image quality evaluating method and system
WO2017215284A1 (en) * 2016-06-14 2017-12-21 山东大学 Gastrointestinal tumor microscopic hyper-spectral image processing method based on convolutional neural network
CN107507134A (en) * 2017-09-21 2017-12-22 大连理工大学 Super-resolution method based on convolutional neural networks
CN107548556A (en) * 2015-04-21 2018-01-05 Vid拓展公司 Video coding based on artistic intent
CN107633522A (en) * 2017-08-30 2018-01-26 山东财经大学 Brain image dividing method and system based on local similarity movable contour model
CN107784676A (en) * 2017-09-20 2018-03-09 中国科学院计算技术研究所 Compressed sensing calculation matrix optimization method and system based on autocoder network
CN108062780A (en) * 2017-12-29 2018-05-22 百度在线网络技术(北京)有限公司 Method for compressing image and device
CN108090902A (en) * 2017-12-30 2018-05-29 中国传媒大学 A kind of non-reference picture assessment method for encoding quality based on multiple dimensioned generation confrontation network
CN108121522A (en) * 2017-12-19 2018-06-05 中国航空工业集团公司洛阳电光设备研究所 Pingxian picture antialiasing method after a kind of predistortion based on edge direction correlation
CN108346133A (en) * 2018-03-15 2018-07-31 武汉大学 A kind of deep learning network training method towards video satellite super-resolution rebuilding
US20180260668A1 (en) * 2017-03-10 2018-09-13 Adobe Systems Incorporated Harmonizing composite images using deep learning
CN108537104A (en) * 2018-01-30 2018-09-14 西安电子科技大学 Compressed sensing network based on full figure observation and perception loss reconstructing method
CN108564580A (en) * 2018-04-23 2018-09-21 朱苗 Image quality evaluating method based on human visual system

Patent Citations (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4924521A (en) * 1987-12-18 1990-05-08 International Business Machines Corporation Image processing system and method employing combined black and white and gray scale image data
US5363213A (en) * 1992-06-08 1994-11-08 Xerox Corporation Unquantized resolution conversion of bitmap images using error diffusion
US5790717A (en) * 1993-10-26 1998-08-04 Bell Communications Research Inc. Apparatus and method for predicting subjective quality of compressed images
WO2003041394A1 (en) * 2001-11-05 2003-05-15 Nikon Corporation Image compression apparatus, image compression program, and image compression method
US20060115148A1 (en) * 2002-12-11 2006-06-01 Makoto Ouchi Similar image extraction device, similar image extraction method, and similar image extraction program
US20060092271A1 (en) * 2003-02-19 2006-05-04 Ishikawajima-Harima Heavy Industries Co., Ltd. Image compression device, image compression method, image compression program, compression encoding method, compression/encoding device, compression/encoding program, decoding method, decoding device, and decoding program
US20080253684A1 (en) * 2005-12-20 2008-10-16 Fujitsu Limited Image discernment apparatus
US20080122833A1 (en) * 2006-07-04 2008-05-29 Samsung Electronics Co., Ltd Image compensation apparatus and method
US20090060363A1 (en) * 2007-08-30 2009-03-05 Himax Technologies Limited System and method for image compression
JP2009296139A (en) * 2008-06-03 2009-12-17 Ricoh Co Ltd Image processor, image processing method and computer program
CN101478693A (en) * 2008-12-31 2009-07-08 中国资源卫星应用中心 Method for evaluating compression quality of satellite-borne optical remote sensing images
CN101478697A (en) * 2009-01-20 2009-07-08 中国测绘科学研究院 Quality evaluation method for video lossy compression
CN101888469A (en) * 2009-05-13 2010-11-17 富士通株式会社 Image processing method and image processing device
CN102737230A (en) * 2012-05-25 2012-10-17 华中科技大学 Non-local mean filtering method based on direction field estimation
CN103902553A (en) * 2012-12-25 2014-07-02 北大方正集团有限公司 Method and device for comparing video
US20150279323A1 (en) * 2014-03-28 2015-10-01 Mckesson Financial Holdings Method and computing device for identifying a pixel visibility loss condition
CN107548556A (en) * 2015-04-21 2018-01-05 Vid拓展公司 Video coding based on artistic intent
CN105100789A (en) * 2015-07-22 2015-11-25 天津科技大学 Method for evaluating video quality
WO2017215284A1 (en) * 2016-06-14 2017-12-21 山东大学 Gastrointestinal tumor microscopic hyper-spectral image processing method based on convolutional neural network
CN106709958A (en) * 2016-12-03 2017-05-24 浙江大学 Gray scale gradient and color histogram-based image quality evaluation method
US20180260668A1 (en) * 2017-03-10 2018-09-13 Adobe Systems Incorporated Harmonizing composite images using deep learning
CN107274379A (en) * 2017-05-09 2017-10-20 武汉大学 An image quality evaluation method and system
CN107633522A (en) * 2017-08-30 2018-01-26 山东财经大学 Brain image segmentation method and system based on a local-similarity active contour model
CN107784676A (en) * 2017-09-20 2018-03-09 中国科学院计算技术研究所 Compressed sensing measurement matrix optimization method and system based on an autoencoder network
CN107507134A (en) * 2017-09-21 2017-12-22 大连理工大学 Super-resolution method based on convolutional neural networks
CN108121522A (en) * 2017-12-19 2018-06-05 中国航空工业集团公司洛阳电光设备研究所 Anti-aliasing method for post-predistortion on-screen images based on edge direction correlation
CN108062780A (en) * 2017-12-29 2018-05-22 百度在线网络技术(北京)有限公司 Method for compressing image and device
CN108090902A (en) * 2017-12-30 2018-05-29 中国传媒大学 A no-reference image quality assessment method based on a multi-scale generative adversarial network
CN108537104A (en) * 2018-01-30 2018-09-14 西安电子科技大学 Compressed sensing network and reconstruction method based on full-image observation and perceptual loss
CN108346133A (en) * 2018-03-15 2018-07-31 武汉大学 A deep learning network training method for video satellite super-resolution reconstruction
CN108564580A (en) * 2018-04-23 2018-09-21 朱苗 Image quality evaluation method based on the human visual system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XICHEN YANG et al.: "Image quality assessment via spatial structural analysis" *
LI YUANCHENG et al.: "A wavelet image compression method based on support vector machines" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012132A (en) * 2021-03-22 2021-06-22 平安科技(深圳)有限公司 Image similarity determining method and device, computing equipment and storage medium
CN113012132B (en) * 2021-03-22 2023-08-25 平安科技(深圳)有限公司 Image similarity determination method and device, computing equipment and storage medium
WO2023082162A1 (en) * 2021-11-12 2023-05-19 华为技术有限公司 Image processing method and apparatus
WO2023155032A1 (en) * 2022-02-15 2023-08-24 华为技术有限公司 Image processing method and image processing apparatus
CN116883698A (en) * 2023-09-07 2023-10-13 腾讯科技(深圳)有限公司 Image comparison method and related device
CN116883698B (en) * 2023-09-07 2023-12-26 腾讯科技(深圳)有限公司 Image comparison method and related device

Also Published As

Publication number Publication date
CN111046893B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN109949255B (en) Image reconstruction method and device
KR102389173B1 (en) Training method, image processing method, device and storage medium for generative adversarial network
CN111046893A (en) Image similarity determining method and device, and image processing method and device
CN111489364B (en) Medical image segmentation method based on lightweight full convolution neural network
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN110689599A (en) 3D visual saliency prediction method based on a non-local-enhanced generative adversarial network
CN112541864A (en) Image restoration method based on a multi-scale generative adversarial network model
CN111047543A (en) Image enhancement method, device and storage medium
CN112703532B (en) Image processing method, device, equipment and storage medium
JPWO2020066257A1 (en) Classification device, classification method, program, and information recording medium
CN113379707A (en) RGB-D significance detection method based on dynamic filtering decoupling convolution network
CN115526801A (en) Automatic color homogenization method and device for remote sensing images based on a conditional adversarial neural network
CN116030259A (en) Abdominal CT image multi-organ segmentation method and device and terminal equipment
CN107944497A (en) Image block similarity measurement method based on principal component analysis
TWI792696B (en) Methods and apparatuses of contrastive learning for color constancy
CN113221645B (en) Target model training method, face image generating method and related device
CN113066065B (en) No-reference image quality detection method, system, terminal and medium
CN110414593B (en) Image processing method and device, processor, electronic device and storage medium
CN111798463A (en) Method for automatically segmenting multiple organs in head and neck CT image
CN111814804A (en) Human body three-dimensional size information prediction method and device based on GA-BP-MC neural network
CN116645569A (en) Infrared image colorization method and system based on a generative adversarial network
CN116385281A (en) Remote sensing image denoising method based on a real noise model and a generative adversarial network
CN115223033A (en) Synthetic aperture sonar image target classification method and system
CN111179326B (en) Monocular depth estimation method, system, equipment and storage medium
CN112241740B (en) Feature extraction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant