CN117218149A

CN117218149A - Image reconstruction method and system based on self-coding neural network

Info

Publication number: CN117218149A
Application number: CN202311476097.9A
Authority: CN
Inventors: 韩剑
Original assignee: Nantong Dumo Information Technology Co ltd
Current assignee: Nantong Dumo Information Technology Co ltd
Priority date: 2023-11-08
Filing date: 2023-11-08
Publication date: 2023-12-12
Anticipated expiration: 2043-11-08
Also published as: CN117218149B

Abstract

The invention relates to the technical field of data processing, in particular to an image reconstruction method and system based on a self-coding neural network. The method comprises the following steps: training the self-coding convolutional neural network by adopting a joint loss function; acquiring an initial image, inputting the initial image into a self-coding convolutional neural network, and outputting a reconstructed image; respectively carrying out frequency domain transformation according to the initial image and the reconstructed image to obtain an initial frequency domain image and a reconstructed frequency domain image; calculating and acquiring weighted frequency domain loss according to the initial frequency domain image and the reconstructed frequency domain image; acquiring weighted space frequency loss and airspace loss; the weighted frequency domain loss, the weighted spatial frequency loss and the spatial domain loss are taken as the joint loss function. The method enables the self-coding neural network to pay more attention to the reconstruction precision of the high-frequency component, enables the network to learn the outline and detail information of the main body in the image more effectively, avoids the blurring of the reconstructed image, and improves the precision of the reconstructed image.

Description

Image reconstruction method and system based on self-coding neural network

Technical Field

The invention relates to the technical field of data processing, in particular to an image reconstruction method and system based on a self-coding neural network.

Background

In image anomaly detection, an automatic encoder is a common method that reconstructs an input image that may contain anomalies and outputs a clean image without anomalies. These automatic encoder-based methods typically calculate an anomaly score from the reconstruction error, the difference between the input image and the reconstructed image. On the other hand, many methods do not reconstruct with sufficient accuracy, resulting in a decrease in the accuracy of anomaly detection. One of the important reasons is that natural images contain many low frequency components, while high frequency components are few, and the commonly used mean square error loss does not consider the problem of extreme unbalance of the high and low frequency components, which in turn leads to distortion of the reconstructed image and poor definition.

Disclosure of Invention

In order to solve the problems, the invention provides an image reconstruction method and system based on a self-coding neural network, and the adopted technical scheme is as follows:

in a first aspect, an embodiment of the present invention provides an image reconstruction method based on a self-coding neural network, which specifically includes the following steps:

inputting an image to be built into a self-coding convolutional neural network to obtain a first reconstructed image; the self-coding convolutional neural network adopts a joint loss function;

the method for acquiring the joint loss function comprises the following steps:

acquiring an initial image, inputting the initial image into a self-coding convolutional neural network, and outputting a reconstructed image;

respectively carrying out frequency domain transformation on the initial image and the reconstructed image to obtain an initial frequency domain image and a reconstructed frequency domain image;

obtaining frequency domain error loss according to the difference of the initial frequency domain image and the reconstructed frequency domain image on a complex plane;

performing mathematical transformation to adjust the value range according to the spectrum values in the initial frequency domain image and the reconstructed frequency domain image, and performing inverse mapping of the value range to obtain spectrum weights;

acquiring a phase according to the initial frequency domain image and the reconstructed frequency domain image to obtain a phase difference loss;

acquiring a weighted frequency domain loss based on the frequency domain error loss, the phase difference loss, and the spectral weight;

constructing weighted space frequency loss and airspace loss according to the initial image and the reconstructed image;

the weighted frequency domain loss, the weighted spatial frequency loss and the spatial domain loss are taken as the joint loss function.

The method for obtaining the frequency domain error loss according to the difference of the initial frequency domain image and the reconstructed frequency domain image on the complex plane comprises the following steps:

；

representing the difference of said initial frequency domain image, said reconstructed frequency domain image on the complex plane,/o>Is the real part of the complex number, ">Is the imaginary part of the complex number, ">Representing the length and width of the initial frequency domain image, respectively representing +.>I.e. representing +.>Scaling value of the modulus at coordinates, +.>For mapping coefficients +.>Representing a frequency domain map->Spectral values at coordinates +.>Representing the initial frequency domain image->Pixel value at coordinates +_>Representing the reconstructed frequency domain image +.>Pixel values at coordinates.

The method for carrying out mathematical transformation to adjust the value range according to the spectrum values in the initial frequency domain image and the reconstructed frequency domain image and carrying out inverse mapping of the value range to obtain the spectrum weight comprises the following steps:

；

wherein the method comprises the steps ofRepresenting +_in frequency domain plot>The spectrum value of the coordinate after log transformation; />In the case of a logarithmic transformation,is the real part of the complex number, ">Is the imaginary part of the complex number; />Representing a frequency domain map->Spectral values at the coordinates; />Representing a frequency domain map->Spectral weights at; />The maximum spectrum value and the minimum spectrum value of the frequency domain diagram after log transformation are respectively represented.

The phase difference loss acquisition method comprises the following steps:

；

wherein the method comprises the steps ofFor loss of phase difference->Table initial frequency domain image +.>Phase at coordinates; />To represent the angle value corresponding to the circumference ratio; />Is the real part of the complex number, ">Is the imaginary part of the complex number; />Representing a frequency domain map->Spectral values at the coordinates; />Representing reconstructed frequency domain image +.>Phase at coordinates.

The method for acquiring the weighted frequency domain loss comprises the following steps:

；

representing the length, width, < > of the initial frequency domain image>Respectively representing +.in the original frequency domain image and the reconstructed frequency domain image>Scaling value of the modulus at coordinates, +.>Representing a frequency domain map->Spectral weights at;representing an initial frequency domain image +.>Phase at coordinates; />To represent the angle value corresponding to the circumference ratio; />Representing reconstructed frequency domain image +.>Phase at coordinates; />And (5) losing the weighted frequency domain.

The method for weighting the space frequency loss comprises the following steps:

for the reconstructed images, LOG operator filtering of three scales is carried out on each channel image, and finally, binary images of three scales of each channel are obtained;

acquiring weighted spatial frequency loss:

；

k is the number of channels in the reconstructed image,representing spatial frequency loss weight at the first scale, < ->Representing an image of the kth channel label image after being filtered by a first scale LOG operator; />Representing an image of the k channel network reconstructed image after being filtered by a first scale LOG operator; MSE represents the mean square error loss function.

The spatial frequency loss weight is obtained by the following method:

then, the spatial frequency loss weight at each scale is obtained:

；

i represents the LOG operator of the ith scale, n represents the scale number of the LOG operator,the Gaussian standard deviation representing the ith LOG operator; />Representing spatial frequency loss weights for the i-th scale.

The airspace loss function adopts a structural similarity loss function.

In a second aspect, another embodiment of the present invention provides an image reconstruction system based on a self-encoding neural network, including a memory, a processor, and a computer program stored in the memory and running on the processor, the processor implementing the steps of the method according to any one of the preceding claims when the computer program is executed.

Compared with the prior art, the invention has the beneficial effects that:

1. according to the invention, the image is converted into the spectrogram, and the weighted spectrum loss is provided, so that the self-coding neural network is more concerned with the reconstruction precision of the high-frequency component, therefore, a clear image can be reconstructed, the detection of fine texture anomalies and edge anomalies is easier, and the accuracy of anomaly detection is improved.

2. The invention can enable the network to learn the outline and detail information of the main body in the image more effectively through weighting the space frequency loss function.

3. The weighted spectrum loss adopts the L1 distance, the L1 distance cannot overestimate a large value such as an outlier, and the low-frequency components frequently contained in the image can not be considered more in the learning process by adopting the method.

4. The invention combines phase loss to measure the distance difference of complex planes between spectrograms, and can avoid the situation that the distances are the same or close, but the real parts and the imaginary parts of the spectrum complex numbers are very different.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present description, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a method flowchart of an embodiment of an image reconstruction method based on a self-encoding neural network according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art based on the embodiments of the present invention are within the scope of protection of the embodiments of the present invention.

An image reconstruction method based on a self-coding neural network, the framework of which is shown in fig. 1, comprises the following steps:

s001: and acquiring an image, constructing a self-coding convolutional neural network, inputting the image, and outputting a reconstructed image. The input image and the reconstructed image are subjected to frequency domain transformation, and the spatial domain image is converted into a frequency domain image.

S002: and calculating frequency domain error damage according to the difference of the spectrum images of the images before and after reconstruction on the complex plane. And carrying out logarithmic transformation according to the frequency spectrum values of the frequency domain images of the images before and after reconstruction to adjust the value range, and carrying out inverse mapping of the value range to obtain the frequency spectrum weight. And acquiring a phase according to the frequency domain images of the images before and after reconstruction, and performing angle processing on the phase to obtain phase difference damage. And acquiring weighted frequency domain loss based on the frequency domain error loss, the phase difference damage and the frequency domain weight.

S003: and constructing weighted spatial frequency loss and spatial domain image loss based on the input image and the reconstructed image.

S004: the self-encoding convolutional neural network is trained based on weighted frequency domain loss, weighted spatial frequency loss, and spatial domain loss.

various methods have been proposed for the problem of abnormality detection. The self-encoding neural network-based method is one of the common methods for solving this problem, and provides information for judging whether a pixel is normal or not.

The transmitted self-encoder method performs anomaly detection based on the mean square difference of the input image and the reconstructed image. Since the model learns normal features from the training images, we assume that when the reconstruction quality is particularly low, the missing regions will be normally reconstructed with high quality. The disadvantage of using the mean square error loss is that it causes the reconstruction of the high frequency components to become very blurred and there is a problem of increased residuals even if the image is normal. Therefore, the reconstruction errors of the normal region and the abnormal region cannot be distinguished, and sufficient accuracy of abnormality determination cannot be obtained.

The image to be built can be an RGB color channel image or a gray scale image, and if the image to be built is an RGB image, color space conversion is needed to be carried out, and the image to be built is converted into the gray scale image. The gray scale conversion is a well-known technique and will not be described in detail here.

Specific training details of the network model are as follows:

an image is input into the self-encoder, the image being a grayscale image.

The self-encoder is an encoder and a decoder structure, and the output label is an image when input. Reconstruction of the image can be achieved by means of a self-encoding structure.

The encoder realizes fitting, feature extraction and downsampling of input data, outputs feature graphs, wherein the number of experience of the feature graphs is 512, namely 512 convolution kernels representing the convolution of the last layer in the encoder, then inputs the feature graphs into a decoder, and the decoder performs fitting and upsampling to output original data.

The loss function adopted by the self-encoder is a combination of weighted frequency domain loss, spatial domain loss and weighted spatial frequency loss.

The joint loss function is as follows:

the weights of the three terms are respectively shown, and the empirical values are 0.5, 0.3 and 0.2.

The space domain loss has various types, such as mean square error, SSIM and the like, the scheme adopts an SSIM loss function, the SSIM is a measure relatively close to the image similarity perceived by human, and the adoption of the SSIM as the loss function can have better results than the method using the mean square error.

The method for acquiring the joint loss function comprises the following steps:

firstly, the image to be built is converted from a space domain to a frequency domain by using discrete Fourier transform.

Wherein the image size is M×N,Representing coordinates of the image pixels in the spatial domain, < +.>Is a frequency representation of the image f, < >>Is the coordinates of the spatial frequencies over the frequency spectrum.

In addition, the exponential portion of this equation can be rewritten with the Euler equation:

the following are provided:

from the equation, it can be seen that the fourier transform is a method of decomposing a function into a sum of a sine wave function and a cosine wave function of a frequency multiplied by a fundamental frequency.

in order to improve the accuracy of the reconstruction, the present invention is implemented by defining the loss function in the frequency domain. In general, natural images contain many low frequency components, while high frequency components are rare. Therefore, in order to improve the accuracy of high frequency component reconstruction, the scheme of the invention introduces a new loss function weighted frequency domain loss (WFL). WFL provides a clearer reconstructed image, which helps to improve the accuracy of anomaly detection.

The distance before and after the frequency domain reconstructed image is defined taking into account the loss function in the frequency domain.

An image that converts an image of a spatial domain into a frequency domain by fourier transform is mapped to a complex space. Therefore, it is necessary to define a distance considering the real and imaginary parts. Therefore, we calculate the difference in complex plane of the images before and after reconstruction. Then, an absolute value of the difference value considering the complex number is calculated, and an average value is defined as a distance between images in the frequency domain.

Wherein the method comprises the steps ofIs a function of the acquisition of the real part,/->Is to obtain the imaginary partIs a function of (2). />That is, the geometric meaning of the modulus at the u and v coordinates in the spectrogram is a point +.>Distance to origin. />Representing the difference in complex plane between the original image and the spectral image. />For mapping coefficients +.>Representing a frequency domain map->Spectral values at coordinates +.>Representing the initial frequency domain image->Pixel value at coordinates +_>Representing the reconstructed frequency domain image +.>The pixel value at the coordinates is 5000 as a result of the large range of the spectrum.

and considering a weighting equation according to the frequency value, and improving the accuracy of high-frequency component reconstruction. The weights should be set so that the higher the frequency, the higher the value is taken. This is expected to increase the gradient of the reconstruction error from the encoder high frequency component. When corresponding to each frequency component of the two-dimensional Fourier representationFrom the following componentsWhen expressed, the corresponding weight->The definition is as follows:

because of the large range of fourier spectra, log-log transformation is used to transform the range. In represents a logarithmic function based on e.

The addition of 1 to the true number is to avoid logarithmic processing of 0.

After log transformation, the bright points reflect the low-frequency information of the image, namely the smooth part of the image, and the energy is high because the smooth part occupies a higher proportion of the image; while the edge information of the image, i.e. the high frequency information, i.e. where the image abrupt changes relatively little, is low in energy. The higher the energy, the brighter it appears in the spectrogram, so the higher the spectral value of the low frequency component. The inverse mapping of the values is thus performed as a weight:

i.e. the maximum and minimum of the log transformed spectrogram.

since the distance in the complex plane is measured, there is a situation that the possible distances are the same or close, but the real part and the imaginary part of the spectrum complex number are greatly different, so the phase loss is adopted to avoid the situation.

Acquiring the phase:

finally, a phase difference loss function is obtained:

wherein the method comprises the steps ofFor loss of phase difference->Representing an initial frequency domain image +.>Phase at coordinates; />To represent the angle value corresponding to the circumference ratio; />Is the real part of the complex number, ">Is the imaginary part of the complex number; />Representing a frequency domain map->Spectral values at the coordinates; />Representing reconstructed frequency domain image +.>Phase at coordinates.

The phase difference loss function can ensure that the value of the frequency spectrum of the restored image is close to that of the original image to the greatest extent.Indicating phase loss, since the phase is an angle value, the denominator is +.>The decimal is obtained, the calculation is convenient, and the angle value corresponding to the circumference ratio is represented.

finally, a loss function WFL is obtained:

By means of the loss function, accurate image reconstruction can be performed on the high-frequency components of the original image. The weighted spectrum loss adopts an L1 distance, and the L1 distance does not overestimate a large value such as an outlier, and by adopting the method, low-frequency components frequently contained in the image can be not considered in the learning process. Meanwhile, the distance difference of complex planes between the spectrograms is measured by combining the phase loss, so that the situation that the distances are the same or close to each other but the real parts and the imaginary parts of the spectrum complex numbers are very different can be avoided.

To train the model using this loss function, a fourier representation is first obtained from the original image by a two-dimensional fourier transform. The original image is then taken as input to an automatic encoder, resulting in a reconstructed image. Furthermore, a fourier representation of the reconstructed image is also obtained by a two-dimensional fourier transform.

Learning using a loss function based on pixel loss (i.e., the mean square error between the pixel values of the original image and the reconstructed image) can result in blurring of the reconstructed image, which indicates the lack of high spatial frequency components. Since the blur display lacks information in the features extracted in the hidden layer to reproduce the original image, solving this problem is very important for feature extraction.

The high spatial frequency components are mainly composed of edges and detail textures, which are important features for tasks such as object detection and spatial matching. Thus, generating a clearer reconstructed image by compensating for the deficiencies of the components is important for feature extraction.

Is Gaussian standard deviation, which is a scale parameter, +.>The circumference ratio is expressed, and e is a mathematical constant. x and y represent coordinates of the image in x and y directions.

The laplace filter has a bandpass characteristic. The sub-bands passing through the filter vary proportionally; the smaller the scale used, the higher the spatial frequency of the transfer. Thus, using the laplacian filter bank, we can extract from the original imageAnd extracting features of each sub-band from the reconstructed image. An example of the output of a laplacian filter bank. The output reflects the spatial frequency because the smaller the scale used, the smaller the change in brightness extracted.Frequency response of laplacian filters=0.8, 1.6, and 3.2.

And (3) for the RGB reconstructed image output by the network, respectively carrying out LOG operator filtering of three scales on each channel, and finally, obtaining a binary image of three scales of each channel.

Then, the spatial frequency loss weight at each scale is obtained:

Obtaining space frequency loss:

k is the number of channels in the reconstructed image,representing spatial frequency loss weight at the first scale, < ->Representing the image of the kth channel label image after being filtered by the first scale LOG operator. />Representing the kth channel networkAnd the reconstructed image is filtered by a first scale LOG operator. MSE represents the mean square error loss function, and its formulation is not repeated here.

By the weighted space frequency loss function, the network can learn the outline and detail information of the main body in the image more effectively.

Further, acquiring a airspace loss function;

the space domain loss is various, such as mean square error and SSIM, and the scheme adopts an SSIM loss function, wherein the SSIM is a measure of image similarity relatively close to human perception, and the adoption of the SSIM as the loss function can obtain better results than the method using mean square error.

The network can effectively improve the reconstruction precision of the high-frequency component and the contour and detail information of the main body in the image based on the weighted frequency domain loss, the airspace loss and the weighted spatial frequency loss, so that the detection of fine texture abnormality and edge abnormality is easier, the quality of the reconstructed image is improved, and the abnormality detection accuracy is improved.

The method can also be applied to compression of image data, the image data is trained by a self-coding network, the characteristic image output by an encoder in the encoder network is extracted and used as a hidden layer vector of an original image to be stored in a database, the storage space can be reduced to a certain extent, and finally the reconstructed original image can be output by a decoder through the characteristic image.

Based on the same inventive concept as the method, the invention also provides an image reconstruction system based on the self-coding neural network, which comprises a processor and a memory, wherein the processor is used for executing a program of an image reconstruction method embodiment based on the self-coding neural network stored in the memory; since the embodiment of the image reconstruction method based on the self-coding neural network has been described in the above embodiment of the method, the description thereof will not be repeated here.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (modules, systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims

1. An image reconstruction method based on a self-coding neural network is characterized by comprising the following steps:

the method for acquiring the joint loss function comprises the following steps:

2. The image reconstruction method based on a self-coding neural network as claimed in claim 1, wherein the method for obtaining the frequency domain error loss according to the difference between the initial frequency domain image and the reconstructed frequency domain image on the complex plane comprises the following steps:

；

representing the difference of said initial frequency domain image, said reconstructed frequency domain image on the complex plane,/o>Is the real part of the complex number, ">Is the imaginary part of the complex number, ">Representing the length and width of the initial frequency domain image, respectively representing +.>I.e. representing +.>Scaling value of the modulus at coordinates, +.>For mapping coefficients +.>Representing a frequency domain map->Spectral values at coordinates +.>Representing the initial frequency domain image->Pixel value at coordinates +_>Representing the reconstructed frequency domain imagePixel values at coordinates.

3. The image reconstruction method based on a self-coding neural network as set forth in claim 1, wherein the method for performing mathematical transformation to adjust the value range according to the spectral values in the initial frequency domain image and the reconstructed frequency domain image and performing inverse mapping of the value range to obtain the spectral weight comprises the steps of:

；

wherein the method comprises the steps ofRepresenting +_in frequency domain plot>The spectrum value of the coordinate after log transformation; />For logarithmic transformation>Is the real part of the complex number, ">Is the imaginary part of the complex number; />Representing a frequency domain map->Spectral values at the coordinates; />Representing a frequency domain plotSpectral weights at; />The maximum spectrum value and the minimum spectrum value of the frequency domain diagram after log transformation are respectively represented.

4. The image reconstruction method based on a self-coding neural network as claimed in claim 1, wherein the phase difference loss acquisition method is as follows:

；

5. The image reconstruction method based on a self-coding neural network as claimed in claim 1, wherein the weighted frequency domain loss obtaining method is as follows:

；

representing the length, width, < > of the initial frequency domain image>Respectively representing +.in the original frequency domain image and the reconstructed frequency domain image>Scaling value of the modulus at coordinates, +.>Representing a frequency domain map->Spectral weights at; />Representing an initial frequency domain image +.>Phase at coordinates; />To represent the angle value corresponding to the circumference ratio; />Representing reconstructed frequency domain image +.>Phase at coordinates; />And (5) losing the weighted frequency domain.

6. The image reconstruction method based on a self-coding neural network as claimed in claim 1, wherein the method for weighting the spatial frequency loss is as follows:

acquiring weighted spatial frequency loss:

；

7. The image reconstruction method based on a self-coding neural network as claimed in claim 6, wherein the spatial frequency loss weight is obtained by:

then, the spatial frequency loss weight at each scale is obtained:

；

8. The method of image reconstruction based on a self-encoding neural network of claim 1, wherein the spatial domain loss function employs a structural similarity loss function.

9. An image reconstruction system based on a self-encoding neural network, comprising a processor and a memory, wherein the processor is configured to execute an image reconstruction method based on a self-encoding neural network as claimed in any one of claims 1 to 8 stored in the memory.