CN114966860A - Seismic data denoising method based on convolutional neural network - Google Patents
- Publication number: CN114966860A
- Application number: CN202210523847.2A
- Authority: CN (China)
- Prior art keywords: neural network, seismic data, image, convolution, network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01V—GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
- G01V1/00—Seismology; Seismic or acoustic prospecting or detecting
- G01V1/28—Processing seismic data, e.g. for interpretation or for event detection
- G01V1/36—Effecting static or dynamic corrections on records, e.g. correcting spread; Correlating seismic signals; Eliminating effects of unwanted energy
- G01V1/364—Seismic filtering
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention relates to a seismic data denoising method based on a convolutional neural network. An initial data set is first constructed, and a CycleGAN-based convolutional neural network is built whose backbone is the structure of the cycle-consistent generative adversarial network (CycleGAN). The backbone comprises a generator and a discriminator; a non-local neural network is connected between the convolutional layers of the generator as a residual branch, and the original fully connected output of the discriminator is replaced by a PatchGAN output. The network is then trained, and finally the trained network is applied to denoise the noisy seismic data. Experiments show that the method achieves a good denoising effect.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a seismic data denoising method based on a convolutional neural network.
Background
Coal, oil, and gas are important pillars of national economic development, and their exploitation depends on seismic exploration technology. Seismic exploration is a geophysical method: elastic waves generated by artificial excitation are reflected and refracted at different angles and velocities by subsurface strata of different impedances; receiving devices at the surface collect the returning acoustic information; and after filtering, stacking, migration, and similar processing, the propagation of the seismic waves underground is analyzed to infer the structure and position of the strata. In short, the method uses artificially excited elastic waves to locate mineral deposits and acquire engineering geological information.
With the development of exploration technology, the focus of oil and gas exploration has gradually shifted to remote areas with complex surface conditions. In these harsh acquisition environments, the amplitude of long-offset seismic signals attenuates severely during propagation, and the waves reflected from deep layers are almost submerged in noise. The acquired seismic data therefore suffer strong noise interference that obscures the corresponding geological structure and stratum-property information, reduces the effectiveness of the data, and ultimately undermines its reliability, making subsequent interpretation seriously difficult. Because the acquired data cannot be interpreted accurately, researchers rely on computer-based data processing to obtain high-quality geological section images: noise is eliminated through data processing so that the seismic data are restored to a form that geological interpreters can readily identify.
To meet the needs of eliminating seismic noise, improving data quality, and highlighting effective signals, many effective denoising algorithms already exist. Traditional attenuation algorithms mainly comprise transform-domain algorithms, spatial-domain algorithms, and hybrid denoising algorithms. The Fourier transform, which converts a time-domain signal into a frequency-domain signal, has been applied by many researchers in different fields. Its applications to seismic denoising include frequency-space domain denoising, frequency-wavenumber domain denoising, and wavelet-transform denoising based on the Fourier transform; these methods separate signal from noise according to the different characteristics of effective signal and noise in the seismic data, and then recombine the separated effective signal.
Classical denoising methods also include empirical mode decomposition, variational mode decomposition, and singular value decomposition, which build a model from the spatial distribution of the seismic data; their limitation is that the reconstruction scale of the effective signal is hard to control. With the rapid development of neural networks in recent years, more and more researchers have tried to apply neural network algorithms with autonomous learning capability to seismic denoising. The learning of a BP neural network, however, is unstable and prone to local minima. With improvements in network structure and innovations in gradient-update algorithms, network depth has increased, giving rise to the concept of deep learning. Deep learning models include convolutional neural networks, deep belief networks, and stacked auto-encoding networks. Thanks to weight sharing and its suitability for multi-dimensional data, the convolutional neural network is widely applied in the field of data denoising. In 2009, V. Jain and S. Seung used convolutional neural networks for natural image denoising, laying a foundation for attenuating seismic noise. In seismic denoising, a typical framework for removing random noise consists of an input layer, convolutional layers, activation layers, normalization layers, and an output layer; the model trains the network with noisy seismic data as input, the noise as output, and the mean square error as the loss function. Much valuable research has since applied convolutional neural networks to seismic data denoising.
In 2008, Jain et al. used a CNN for denoising for the first time, demonstrating that convolutional neural networks can directly learn an end-to-end nonlinear mapping from low-quality images to clean images, with good results. In 2019, Siwei Yu analyzed traditional seismic denoising methods and compared them with the deep learning approach: deep learning achieves adaptive denoising without accurate signal modeling or optimal parameter selection, and a convolutional neural network was used to demonstrate good denoising performance. Nevertheless, the denoising effect of existing methods on seismic data remains unsatisfactory.
Disclosure of Invention
Aiming at the problems in the prior art, the technical problem to be solved by the invention is that existing methods do not achieve an ideal denoising effect on seismic data.
In order to solve the technical problems, the invention adopts the following technical scheme: a seismic data denoising method based on a convolutional neural network comprises the following steps:
S1: construct an initial data set: the initial data set comprises synthetic seismic data and actual seismic data obtained by denoising an acquired seismic data volume; the synthetic and actual seismic data are each converted into a universal format, the converted data are visualized as images, and the images are saved in png format;

expand each saved png image to obtain a plurality of images;

add Gaussian white noise to all images obtained by the expansion, with a different noise density for each image;

crop each noisy image and the corresponding clean image into the same number of small patches, where each patch cropped from a noisy image serves as a training sample and the corresponding patch cropped from the clean image serves as its label; all training samples form the training set;
S2: construct a CycleGAN-based convolutional neural network, wherein the network takes the structure of the cycle-consistent generative adversarial network (CycleGAN) as its backbone, the backbone comprises a generator and a discriminator, and a non-local neural network is connected between the convolutional layers of the generator as a residual branch;

in the discriminator, replace the original fully connected output with a PatchGAN output;

S3: train the CycleGAN-based network to obtain the optimized network, as follows: initialize the network parameters; take a training sample as the network input; with a least-squares function as the loss function, compute the loss between the predicted value output by the network and the training-sample label; update the network parameters by back-propagating the loss; after the maximum number of training iterations is reached, the optimized network is obtained;

S4: convert the seismic data to be processed into a png image to be predicted by the method of S1, input the image into the optimized network, and output the denoised section.
As an improvement, the non-local neural network structure in S2 is specifically as follows:
The mathematical model of the non-local neural network is defined as:

y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j)   (1)

f(x_i, x_j) = e^(θ(x_i)^T ψ(x_j))   (2)

In formula (1), x_i and x_j are feature-map elements after convolution; y_i is the input of the residual link, with the same size as x_i; i and j are position indices of the two-dimensional data; f is an embedded Gaussian function used to measure the similarity between distant features; g is a nonlinear mapping; and C(x) is a normalization factor.

In formula (2), θ(x_i) = W_θ x_i and ψ(x_j) = W_ψ x_j are convolution mappings, each with a 1 × 1 kernel, and W_θ, W_ψ are weights learned during network training.

For a given i, (1/C(x)) Σ_j f(x_i, x_j) is the softmax computation along dimension j.
As an improvement, in the PatchGAN output scheme of S2, the last layer of the discriminator is replaced by a fully convolutional layer instead of a fully connected layer.
As an improvement, the generator structure in S2 is: the inter-layer structure of the generator consists of 2 sparse blocks, 4 convolutional layers, 4 non-local residual modules, and 4 deconvolution layers; the activation functions are LeakyReLU and Tanh, and an instance normalization layer is added between each convolutional layer and its activation function.
As an improvement, the discriminator structure in S2 is: the main inter-layer structure of the discriminator consists of 4 convolutional layers and 1 fully connected layer; the activation function is LeakyReLU, and an instance normalization layer is added between each convolutional layer and the activation function.
As an improvement, the instance normalization layer is computed as follows:

first, the mean of the feature map X within a channel is computed and defined as μ:

μ = (1/(H·W)) Σ_{h,w} X_{h,w}   (3)

then, the variance of the feature map X within the channel is computed and defined as σ²:

σ² = (1/(H·W)) Σ_{h,w} (X_{h,w} − μ)²   (4)

the normalized feature map is X₁ = (X − μ)/√σ²; with a scaling parameter λ and a translation parameter γ added, the normalized output is:

Y = λX₁ + γ   (5)
Compared with the prior art, the invention has at least the following advantages:

The method takes the structure of the cycle-consistent generative adversarial network as its backbone, comprising a generator and a discriminator, with a non-local neural network connected between the convolutional layers of the generator as a residual branch. A plain convolutional neural network extracts only local features, cannot cover the global context, and easily loses features as the network depth increases.

Non-local operations have two advantages: first, they allow interaction between any two positions in the two-dimensional data, regardless of the distance between them; second, the input size of a non-local operation can vary, so it can be combined with a convolutional neural network. In the denoising problem studied in the invention, the main texture features of a seismic image, such as the textures of rock strata and hydrocarbon reservoirs, are local rather than global. To avoid losing this texture information, the invention combines a non-local neural network with the CycleGAN structure. The discriminator uses a shallow convolutional neural network, with a PatchGAN output replacing the original fully connected output at its last layer. Experiments show that the method achieves a good denoising effect.
Drawings
Fig. 1 is a diagram of the residual structure.

Fig. 2 is a diagram of the attention structure.

Fig. 3 is a diagram of the non-local module structure.

Fig. 4 is a diagram of the generator architecture.

Fig. 5 is a diagram of the discriminator network.

Figs. 6a and 6b compare the denoising performance of CycleGAN under different training-set sizes, where fig. 6a compares the mean square error and fig. 6b the peak signal-to-noise ratio.

Figs. 7a and 7b compare the evaluation of training results on synthetic data samples, where fig. 7a compares the mean square error and fig. 7b the peak signal-to-noise ratio.

In the figures, NLN denotes the non-local neural network, Conv a convolutional layer, LeakyReLU the activation function, IN an instance normalization layer, and TConv a transposed convolution, whose function is to keep the output image the same size as the original input image.
Detailed Description
The present invention is described in further detail below.
A seismic data denoising method based on a convolutional neural network comprises the following steps:
S1: construct an initial data set: the initial data set comprises synthetic seismic data and actual seismic data obtained by denoising an acquired seismic data volume; the synthetic and actual seismic data are each converted into a universal format, the converted data are visualized as images, and the images are saved in png format. The synthetic seismic data are produced with software commonly used in the petroleum exploration industry; both the synthetic and the acquired data are stored in the industry's standard tape format, which is converted to a general format, i.e., the SEGY format is converted to the mat format in the matlab environment. Visualizing the converted synthetic and actual seismic data as images is prior art.
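As an illustration of the visualization step, the sketch below min-max scales a two-dimensional amplitude section (e.g., a matrix loaded from the converted mat file) to an 8-bit grayscale array ready to be written out as a png image. The scaling rule and the function name are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def amplitude_to_grayscale(section):
    """Min-max scale a 2-D seismic amplitude section to uint8 [0, 255]
    so it can be saved as a single-channel grayscale png image."""
    lo, hi = float(section.min()), float(section.max())
    if hi == lo:                      # flat section: map to mid-gray
        return np.full(section.shape, 127, dtype=np.uint8)
    scaled = (section - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)

# toy 2 x 3 "section" with amplitudes in [-1, 1]
demo = np.array([[-1.0, 0.0, 1.0],
                 [0.5, -0.5, 1.0]])
img = amplitude_to_grayscale(demo)
```

The resulting array can then be written with any image library; the subsequent expansion and cropping steps operate on such grayscale images.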
Expand each saved png image to obtain a plurality of images; translating, rotating, or scaling each png image to obtain multiple images is prior art.
Add Gaussian white noise to all images obtained by the expansion, with a different noise density for each image.
Crop each noisy image and the corresponding clean image into the same number of small patches, where each patch cropped from a noisy image serves as a training sample and the corresponding patch cropped from the clean image serves as its label; all training samples form the training set. The cropping uses an existing method: the cropping window slides over the whole single-channel two-dimensional grayscale image from left to right and from top to bottom, with a stride smaller than the window size. The set of cropped patches serves as the training samples.
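A minimal sketch of the noise-addition and sliding-window cropping steps described above; the window size, stride, and noise level below are illustrative values, not parameters fixed by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, sigma):
    """Add zero-mean Gaussian white noise; sigma sets the noise density."""
    return img + rng.normal(0.0, sigma, img.shape)

def crop_patches(img, win, stride):
    """Slide a win x win window left-to-right, top-to-bottom with a
    stride smaller than the window size, collecting overlapping patches."""
    h, w = img.shape
    patches = [img[top:top + win, left:left + win]
               for top in range(0, h - win + 1, stride)
               for left in range(0, w - win + 1, stride)]
    return np.stack(patches)

clean = rng.random((8, 8))                      # stand-in grayscale image
noisy = add_gaussian_noise(clean, sigma=0.1)
samples = crop_patches(noisy, win=4, stride=2)  # training samples
labels = crop_patches(clean, win=4, stride=2)   # matching labels
```

Because both images are cropped at the same window positions, the i-th noisy patch and the i-th clean patch stay aligned as sample and label.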
S2: construct a CycleGAN-based convolutional neural network, wherein the network takes the structure of the cycle-consistent generative adversarial network (CycleGAN) as its backbone, the backbone comprises a generator and a discriminator, and a non-local neural network is connected between the convolutional layers of the generator as a residual branch.
To make the discriminator output represent more features, the original fully connected output is replaced with a PatchGAN output in the discriminator. In the PatchGAN scheme, a fully convolutional layer replaces the fully connected layer at the last layer of the discriminator, and the full-convolution output is not a single real number but a two-dimensional matrix of window scores. Because a two-dimensional matrix carries more feature information than a single value, the discrimination error can be reduced. The PatchGAN output also makes the size of the output data equal to the size of the label data.
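The effect of the fully convolutional PatchGAN head on the output size can be sketched with the usual convolution size formula. The layer counts, padding, and input size below are illustrative (a common 70 × 70-style PatchGAN head), not the patent's exact discriminator:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of one convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

# four stride-2 4x4 convolutions, then a stride-1 4x4 convolution
# with one output channel producing the two-dimensional score map
size = 256
for _ in range(4):
    size = conv_out(size)               # 256 -> 128 -> 64 -> 32 -> 16
score_map = conv_out(size, stride=1)    # final full-conv layer: -> 15
```

Instead of a single real number, the discriminator thus emits a 15 × 15 matrix of patch scores, each judging one local region of the input.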
S3: train the CycleGAN-based network to obtain the optimized network, as follows: initialize the network parameters; take a training sample as the network input; with a least-squares function as the loss function, compute the loss between the predicted value output by the network and the training-sample label; update the network parameters by back-propagating the loss; after the maximum number of training iterations is reached, the optimized network is obtained;
S4: convert the seismic data to be processed into a png image to be predicted by the method of S1, input the image into the optimized network, and output the denoised section.
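Reading the loss of S3 as a least-squares (LSGAN-style) adversarial loss is an assumption of this sketch; under that reading, the adversarial terms computed on PatchGAN score maps look like:

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push scores for real patches
    toward 1 and scores for generated patches toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push scores for generated patches
    toward 1 so they pass as real."""
    return np.mean((d_fake - 1.0) ** 2)

# a perfectly confident discriminator on 2 x 2 score maps
d_real = np.ones((2, 2))
d_fake = np.zeros((2, 2))
```

In a full CycleGAN these adversarial terms would be combined with a cycle-consistency loss between the reconstructed and original images.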
Specifically, the non-local neural network structure in S2 is as follows.

The mathematical model of the non-local neural network is defined as:

y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j)   (1)

f(x_i, x_j) = e^(θ(x_i)^T ψ(x_j))   (2)

In formula (1), x_i and x_j are feature-map elements after convolution; y_i is the input of the residual link, with the same size as x_i; i and j are position indices of the two-dimensional data; f is an embedded Gaussian function used to measure the similarity between distant features; g is a nonlinear mapping, essentially a convolution with a 1 × 1 kernel; and C(x) is a normalization factor.

In formula (2), θ(x_i) = W_θ x_i and ψ(x_j) = W_ψ x_j are convolution mappings, each with a 1 × 1 kernel, and W_θ, W_ψ are weights learned during network training.

For a given i, (1/C(x)) Σ_j f(x_i, x_j) is the softmax computation along dimension j. The overall structure of the non-local module is essentially a combination of a residual network and an attention mechanism, two learning structures that are widely used across many research areas.
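A minimal numpy sketch of the embedded-Gaussian non-local operation defined by formulas (1) and (2), with the 1 × 1 convolutions modelled as plain matrix multiplications on a flattened feature map; the sizes and weight initialization are illustrative:

```python
import numpy as np

def softmax(a, axis):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_block(x, w_theta, w_psi, w_g, w_z):
    """Non-local operation on x of shape (HW, C), followed by the
    residual link Z = W_z y + x."""
    theta = x @ w_theta                     # theta(x_i) = W_theta x_i
    psi = x @ w_psi                         # psi(x_j)   = W_psi x_j
    g = x @ w_g                             # nonlinear mapping g
    attn = softmax(theta @ psi.T, axis=1)   # softmax along dimension j
    y = attn @ g                            # y_i = (1/C(x)) sum_j f(x_i,x_j) g(x_j)
    return y @ w_z + x                      # residual link

rng = np.random.default_rng(1)
hw, c, cp = 9, 4, 2                         # e.g. a 3x3 map with 4 channels
x = rng.normal(size=(hw, c))
z = nonlocal_block(x,
                   rng.normal(size=(c, cp)), rng.normal(size=(c, cp)),
                   rng.normal(size=(c, cp)), rng.normal(size=(cp, c)))
```

The softmax rows implement the normalization factor C(x), so each output position is a similarity-weighted mixture of features from every other position.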
The structure of the non-local neural network is introduced as follows:
the residual network is used for solving the phenomenon that the gradient disappears and the performance is degraded along with the increase of the network depth. During the process of network building, the representation form of the residual network is that branch connection appears on the main branch of the deep network, for example, the output of the first convolution layer is added with the output of the third convolution layer. The spacing of the residual connections is not particularly critical, which ensures that when there is a failure in one of the inter-layer trains in the main tributaries, the subsequent normal trains are not affected, and the residual network consequently facilitates an increase in the network depth.
Referring to fig. 1, the input feature X is convolved layer by layer along the main path; the result F(X) is added at the node to X arriving via the branch connection, and the sum serves as the input of the next layer.
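The addition in fig. 1 can be sketched in a few lines; F here stands for any main-path stack of layers (the zero function below merely illustrates the identity-preserving property):

```python
import numpy as np

def residual_block(x, f):
    """Residual connection: branch output F(x) plus identity input x."""
    return f(x) + x

# if the main path learns nothing useful (F(x) = 0), the block reduces
# to the identity, so stacking more blocks cannot degrade the signal path
x = np.array([1.0, 2.0, 3.0])
out = residual_block(x, lambda t: np.zeros_like(t))
```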
Extracting and selecting suitable features is very important for image processing applications. For images with complex backgrounds, however, feature extraction is a challenge, and for this reason an attention mechanism is applied.
In the invention, an attention module is embedded in the non-local neural network structure; the attention mechanism uses the current stage to guide the previous stage in learning feature information. The attention mechanism used here focuses on the roles of different branches within the same network, guides the image features of the previous stage, and is integrated into the CNN for image denoising. The attention mechanism is shown in fig. 2.
The invention combines an attention mechanism and a residual network to form the non-local convolutional neural network, which takes the correlation of global features into account during model training.
Referring to fig. 3, N is the batch size, H and W are the height and width of the two-dimensional data, and C is the number of channels; ⊗ denotes matrix multiplication and ⊕ denotes matrix addition. The non-local residual block lies within the dashed box. The residual link is expressed as:

Z = W_z y + x   (6)

where Z in formula (6) is the total output of the residual link and W_z is the weight of a 1 × 1 convolution kernel.
Assume the input is a 1 × 50 × 50 × 64 feature map x. A 1 × 1 convolution is applied in each of 3 branches, each outputting a 1 × 50 × 50 × 32 feature map. These feature maps are then reshaped and transposed, giving, from left to right, 2500 × 32, 32 × 2500, and 2500 × 32. Matrix multiplication of the first two yields a 2500 × 2500 output, on which softmax produces the weighted attention output. The attention output is matrix-multiplied with the convolution output of the third branch, giving a 2500 × 32 feature map, whose dimensions are then split back to 1 × 50 × 50 × 32. A convolution with a 1 × 1 kernel restores the feature map to the original size 1 × 50 × 50 × 64. Finally, the residual connection adds the original input element-wise to the restored feature map to give the output Z.
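The dimension walkthrough above can be verified with random weights; the 0.05 weight scale is only to keep the softmax numerically tame:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(1, 50, 50, 64))               # N x H x W x C input

w = [rng.normal(size=(64, 32)) * 0.05 for _ in range(3)]
theta, psi, g = (x.reshape(2500, 64) @ wi for wi in w)  # three 1x1 convs

scores = theta @ psi.T                             # 2500 x 2500
scores = np.exp(scores - scores.max(axis=1, keepdims=True))
attn = scores / scores.sum(axis=1, keepdims=True)  # softmax attention

y = (attn @ g).reshape(1, 50, 50, 32)              # weighted output, split back
w_z = rng.normal(size=(32, 64)) * 0.05
restored = (y.reshape(2500, 32) @ w_z).reshape(1, 50, 50, 64)
z = restored + x                                   # residual addition
```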
Considering the advantages of the non-local neural network, the invention combines it with the convolutional neural network, introducing the non-local operation into the generative adversarial network as the residual link of the generator to extract global information from the spatially ordered data and improve the denoising capability.
Specifically, referring to fig. 4, the generator structure in S2 is: the inter-layer structure of the generator consists of 4 convolutional layers (Conv), 4 non-local residual modules (Res Block, RB), and 4 deconvolution layers (TConv); the activation functions are LeakyReLU and Tanh, and an instance normalization layer (Instance Normalization) is added between each convolutional layer and its activation function.
Specifically, referring to fig. 5, the structure of the discriminator in S2 is as follows: the main interlayer structure of the discriminator consists of 5 convolutional layers with LeakyReLU as the activation function, and an instance normalization layer is added between each convolutional layer and its activation function.
Specifically, the instance normalization layer calculation process is as follows:
First, calculate the mean of the feature map X within each channel, defined as μ;
Then, calculate the variance of the feature map X within each channel, defined as σ²;
Next, normalize the feature map as X₁ = (X − μ)/√(σ² + ε), where ε is a small constant for numerical stability; if a scaling parameter λ and a translation parameter γ are added, the normalized output is:
Y = λX₁ + γ (5).
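The three steps above can be sketched in NumPy. The default λ = 1 and γ = 0 and the ε value are illustrative assumptions; in training they would be learned or tuned.

```python
import numpy as np

def instance_norm(x, lam=1.0, gamma=0.0, eps=1e-5):
    """Instance normalization of an (N, H, W, C) feature map.

    Mean and variance are computed per sample and per channel (over H and W),
    then the scaled and shifted output Y = lam * X1 + gamma is returned,
    where X1 = (X - mu) / sqrt(sigma^2 + eps).
    """
    mu = x.mean(axis=(1, 2), keepdims=True)     # per-channel mean
    var = x.var(axis=(1, 2), keepdims=True)     # per-channel variance
    x1 = (x - mu) / np.sqrt(var + eps)
    return lam * x1 + gamma

rng = np.random.default_rng(1)
x = rng.normal(3.0, 2.0, size=(1, 50, 50, 64))
y = instance_norm(x)
print(np.abs(y.mean(axis=(1, 2))).max() < 1e-6)   # per-channel mean ~ 0
```

Unlike batch normalization, the statistics here never mix different samples, which is why it suits the batch size of 1 used later in training.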
A common problem in deep-network training is that, as the weights are updated during interlayer propagation, the distribution of the input data undergoes covariate shift. If a feature map exhibits this covariate shift, the network parameters of the next layer must adapt to a new distribution during learning, which greatly reduces learning efficiency. Since the small-batch samples generated and cropped by the method differ greatly in image texture, instance normalization (IN) is chosen to normalize each feature map during the image denoising work.
In the invention, except for the convolutional layers inside the residual modules, the convolution and transposed-convolution kernels are set to 4 × 4 with a stride of 2. The first convolutional layer has 32 kernels; the number of kernels usually doubles (increasing by powers of 2) as convolutional layers are added, and the last convolutional layer outputs 256 feature maps. To keep the generator's final output equal in size and number to the initial input data, transposed convolution (also called deconvolution) is then applied: it restores the feature map to its pre-convolution size according to the given kernel parameters, but the original values are not recovered in the process.
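As a sketch of how the 4 × 4, stride-2 layers change the feature-map size: the 256 × 256 input size and the padding of 1 below are illustrative assumptions, not values stated in the text.

```python
# Feature-map sizes through the generator's encoder and decoder
# (4x4 kernels, stride 2, padding 1 assumed).

def conv_out(size, k=4, s=2, p=1):
    """Output spatial size of a strided convolution."""
    return (size - k + 2 * p) // s + 1

def tconv_out(size, k=4, s=2, p=1):
    """Output spatial size of the matching transposed convolution."""
    return (size - 1) * s - 2 * p + k

size, channels = 256, 32
for layer in range(4):                       # 4 encoder convolutions
    size = conv_out(size)
    print(f"conv{layer + 1}: {size}x{size}, {channels} kernels")
    channels *= 2                            # kernel count doubles: 32 -> 256
for layer in range(4):                       # 4 decoder transposed convolutions
    size = tconv_out(size)
print(f"decoder output: {size}x{size}")      # back to the input size
```

With these parameters each convolution halves the spatial size and each transposed convolution doubles it, which is what lets the decoder mirror the encoder exactly.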
Compared with the generator, the network structure of the discriminator is simplified: it comprises 5 convolutional layers, instance normalization layers, and activation functions. The convolution kernels are defined as 4 × 4 with a stride of 2, and the number of kernels increases exponentially from 32 to 256.
In the invention, Adam is used as the optimization algorithm for updating the weights. The learning rate requires repeated tuning: if set too small, it prolongs network training; if set too large, the optimal gradient may be skipped, making convergence difficult. The invention sets the learning rate to 0.002, and empirically sets the decay rate of the first-moment estimate to 0.5, the decay rate of the second-moment estimate to 0.999, the training batch size to 1, and the number of training epochs to 60.
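A single Adam update with the stated hyperparameters (learning rate 0.002, first-moment decay 0.5, second-moment decay 0.999) can be sketched as follows; the toy weight and gradient values are illustrative.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update with the hyperparameters used in the text."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m = v = np.zeros_like(w)
grad = np.array([0.5, -0.5])
w, m, v = adam_step(w, grad, m, v, t=1)
print(w)   # each weight moves by about lr against its gradient sign
```

The low first-moment decay of 0.5 (versus the common 0.9) is a standard choice in GAN training: it shortens the momentum memory so the generator and discriminator can react to each other's recent updates.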
Experiments and analyses
1. Model denoising experiment and analysis
For convenience, the method is abbreviated as CycleGAN.
To prevent extreme singular values in the data from distorting the contribution of the weight parameters, the data must be normalized before being input to the network. Normalization not only confines the data to a reasonable range but also accelerates network training, saving time. The data of the invention are grayscale images with pixels in the range 0 to 255; the normalization operation divides each pixel by 255 so that pixel values lie in the range 0 to 1.
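The normalization described above is a one-line operation, sketched here on a toy 2 × 2 image:

```python
import numpy as np

# Normalize an 8-bit grayscale image to [0, 1] by dividing each pixel by 255,
# as described in the text.
image = np.array([[0, 64], [128, 255]], dtype=np.uint8)
normalized = image.astype(np.float32) / 255.0
print(normalized.min(), normalized.max())   # 0.0 1.0
```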
2. Evaluation index of denoising effect
The invention generally verifies the denoising quality of the network model in two ways. The first is qualitative: visually observing the denoised image to judge how much noise remains and whether the texture in the image is clear. The second is quantitative: computing performance indices by mathematical analysis, so that the denoising effect can be described through numerical comparison and curves. The invention uses the root mean square error (M_se) and the peak signal-to-noise ratio (P_snr) to evaluate the denoising effect of the network, defined as:
M_se = (1/N)·||x_denoised − x_clean||₂²
P_snr = 10·lg(Max² / M_se)
In the formulas: ||·||₂ denotes the second (L2) norm; N is the total number of elements of the noise-free data matrix; Max is the maximum pixel value of the noise-free data. The root mean square error describes the squared difference between the denoised image data and the noise-free data; the smaller the difference, the better the denoising effect. The peak signal-to-noise ratio is likewise an objective criterion for evaluating the degree of denoising, described as the logarithm of the squared maximum signal value relative to the mean square error; the larger the P_snr, the better the denoising effect.
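A minimal sketch of the two indices as described above; `max_val` stands for the maximum pixel value of the noise-free data (1.0 here, for images normalized to [0, 1]).

```python
import numpy as np

def mse(denoised, clean):
    """Mean square error between the denoised and noise-free data."""
    return np.sum((denoised - clean) ** 2) / clean.size

def psnr(denoised, clean, max_val=1.0):
    """Peak signal-to-noise ratio in dB; larger means better denoising."""
    return 10.0 * np.log10(max_val ** 2 / mse(denoised, clean))

clean = np.zeros((4, 4))
denoised = clean + 0.1          # uniform error of 0.1 -> MSE = 0.01
print(mse(denoised, clean))     # 0.01
print(psnr(denoised, clean))    # 20.0 dB
```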
The constructed CycleGAN is verified on seismic data synthesized with Tesseral-7.2.8. The synthetic seismic data contains 100 traces with 150 sampling points per trace, and the data samples are expanded into sample sets of 200 and 400, respectively. Gaussian white noise with mean 0 and variance 0.2 is artificially added to the sample-set data as model input. Before comparing the denoising effects of different models, sample sets of different sizes are used to examine the influence of the number of training samples on the denoising performance of the CycleGAN: scheme one uses the 200-sample set as the training set; scheme two uses the 400-sample set. The performance comparison of the two schemes is shown in fig. 6a and 6b.
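The noise-addition step can be sketched as follows; the zero-valued clean section is a placeholder for real synthetic data, and note that NumPy's normal sampler takes the standard deviation, i.e. the square root of the variance 0.2.

```python
import numpy as np

# Add Gaussian white noise (mean 0, variance 0.2) to a clean seismic section
# of 100 traces x 150 sampling points, as in the experiment.
rng = np.random.default_rng(42)
clean = np.zeros((100, 150))                      # placeholder clean section
noise = rng.normal(0.0, np.sqrt(0.2), clean.shape)
noisy = clean + noise
print(round(noise.var(), 2))                      # close to 0.2
```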
Fig. 6a and 6b show that the mean square error of scheme two decays faster and reaches a lower error, and its peak signal-to-noise ratio is better than that of scheme one. As the number of epochs increases, the mean square errors of the two schemes gradually converge, which shows that the adversarial network can, to a certain extent, overcome the problem of a small number of samples; the mean square error of scheme one decays slowly and needs more epochs to reach a lower error. On balance, 400 samples are selected for the following denoising experiments.
3. Analysis of results of denoising experiments
To enhance the generalization capability of the network, the network model is trained with a sample set containing different signal-to-noise ratios, and the denoising performance indices during training are computed. The experimental analysis compares the denoising effect of the constructed network model with that of an ordinary generative adversarial network and a ResNet-18 network, both of which are deep learning algorithms.
As can be seen from the comparison of fig. 6a and fig. 6b, the mean square error of the CycleGAN training decays faster, with smaller fluctuation, higher stability, and smaller error; the peak signal-to-noise ratio of the data generated by the CycleGAN is higher, and its denoising effect is better. For a more intuitive display of the degree of denoising, the data visualization is shown in fig. 7a and 7b.
The denoising effects of the different deep learning algorithms can be compared visually in fig. 7a and fig. 7b. Compared with the other algorithms, the CycleGAN designed by the invention suppresses more noise, has clearer imaging resolution, and produces texture information that is easier to identify.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications should be covered by the claims of the present invention.
Claims (6)
1. A seismic data denoising method based on a convolutional neural network, characterized by comprising the following steps:
s1: constructing an initial data set: the initial data set comprises synthetic seismic data and actual seismic data obtained after the acquired seismic data volume is denoised, the formats of the synthetic seismic data and the actual seismic data are respectively converted into a universal format, the synthetic seismic data and the actual seismic data after the formats are converted are visualized to form an image, and the image is correspondingly stored into a png format image;
expanding each saved png format image to obtain a plurality of images;
adding Gaussian white noise to all the images obtained by the expansion, wherein the intensity of the Gaussian white noise added to different images differs;
respectively cutting each image added with the Gaussian white noise and the corresponding image not added with the Gaussian white noise to obtain the same number of small-size images, wherein each small-size image obtained by cutting the image added with the Gaussian white noise is used as a training sample, each small-size image obtained by cutting the image not added with the Gaussian white noise is used as a label of the corresponding training sample, and all the training samples form a training set;
s2: constructing a cycle-consistent generative adversarial convolutional neural network, with the cycle-consistent generative adversarial network structure as the backbone; the structure comprises a generator and a discriminator, and the non-local neural network is connected between the convolutional layers of the generator as a residual connection;
replacing the original fully-connected output mode with a PatchGAN output mode in the discriminator;
s3: training the cycle-consistent generative adversarial convolutional neural network to obtain the optimized network, the process being: initializing the parameters of the network; taking a training sample as the network input; using a least-squares function as the loss function to calculate the loss between the predicted value output by the network and the training sample label; updating the network parameters backwards according to the loss; and obtaining the optimized network after the maximum number of training iterations is reached;
s4: converting the seismic data to be processed into a png image to be predicted by the method in S1, inputting the png image into the optimized network, and outputting a denoised section.
2. The convolutional neural network-based seismic data denoising method of claim 1, wherein: the non-local neural network structure in S2 is specifically as follows:
the mathematical model of the non-local neural network is defined as follows:
y_i = (1/C(x)) Σ_j f(x_i, x_j)·g(x_j) (1)
in formula (1), x_i and x_j are both feature maps after convolution; y_i is the residual input, whose size is consistent with x_i; i and j are position indices of the two-dimensional data; f is an embedded Gaussian function used to measure the similarity between distant features; g is a non-linear mapping; C(x) is a normalization factor;
the embedded Gaussian similarity is
f(x_i, x_j) = e^(θ(x_i)^T ψ(x_j)) (2)
in formula (2), θ(x_i) = W_θ·x_i and ψ(x_j) = W_ψ·x_j are convolution mappings, each with a 1 × 1 convolution kernel, and W_θ, W_ψ are weights to be trained by the network;
3. The convolutional neural network-based seismic data denoising method of claim 1, wherein: in the PatchGAN output mode in S2, a fully convolutional layer is used in place of the fully-connected layer as the last layer of the discriminator.
4. The convolutional neural network-based seismic data denoising method of claim 3, wherein: the generator structure in S2 is as follows: the interlayer structure of the generator is composed of 2 sparse blocks, 4 convolutional layers, 4 non-local residual modules, and 4 deconvolution layers; the activation functions are LeakyReLU and Tanh, and an instance normalization layer is added between the convolutional layers and the activation functions.
5. The convolutional neural network-based seismic data denoising method of claim 4, wherein: the structure of the discriminator in S2 is as follows: the main interlayer structure of the discriminator consists of 4 convolutional layers and 1 fully-connected layer; the activation function is LeakyReLU, and an instance normalization layer is added between the convolutional layers and the activation function.
6. The convolutional neural network-based seismic data denoising method of claim 4 or 5, wherein: the instance normalization layer calculation process is as follows:
first, calculate the mean of the feature map X within each channel, defined as μ;
then, calculate the variance of the feature map X within each channel, defined as σ²;
next, normalize the feature map as X₁ = (X − μ)/√(σ² + ε); if a scaling parameter λ and a translation parameter γ are added, the normalized output is:
Y = λX₁ + γ (5).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210523847.2A CN114966860A (en) | 2022-05-13 | 2022-05-13 | Seismic data denoising method based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114966860A true CN114966860A (en) | 2022-08-30 |
Family
ID=82983681
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210523847.2A Pending CN114966860A (en) | 2022-05-13 | 2022-05-13 | Seismic data denoising method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114966860A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116250844A (en) * | 2023-03-03 | 2023-06-13 | 山东大学 | Electrocardiosignal noise reduction optimization method and system based on condition generation countermeasure network |
CN116250844B (en) * | 2023-03-03 | 2024-04-26 | 山东大学 | Electrocardiosignal noise reduction optimization method and system based on condition generation countermeasure network |
CN116148917A (en) * | 2023-04-13 | 2023-05-23 | 南京云创大数据科技股份有限公司 | Method and system for forecasting earthquake according to geomagnetic and earth sound data |
CN116148917B (en) * | 2023-04-13 | 2023-08-08 | 南京云创大数据科技股份有限公司 | Method and system for forecasting earthquake according to geomagnetic and earth sound data |
CN116681618A (en) * | 2023-06-13 | 2023-09-01 | 强联智创(北京)科技有限公司 | Image denoising method, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||