CN113068035B - Natural scene reconstruction method based on deep neural network - Google Patents


Info

Publication number: CN113068035B
Authority: CN (China)
Prior art keywords: picture, layer, output, data, stimulation
Prior art date: 2021-03-17
Legal status: Active (granted)
Application number: CN202110285684.4A
Other languages: Chinese (zh)
Other versions: CN113068035A (application publication)
Inventors: 余肇飞 (Yu Zhaofei), 张祎晨 (Zhang Yichen), 贾杉杉 (Jia Shanshan), 刘健 (Liu Jian)
Current Assignee: Zhejiang Lab
Original Assignee: Zhejiang Lab
Priority date: 2021-03-17
Filing date: 2021-03-17
Application filed by: Zhejiang Lab
Priority to: CN202110285684.4A
Publication of application CN113068035A: 2021-07-02
Application granted; publication of CN113068035B: 2023-07-14

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • H04N19/126Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/149Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a natural scene reconstruction method based on a deep neural network, comprising the following steps: S1, obtaining natural-picture stimulation data and the corresponding neural response data; S2, constructing a pulse-to-picture converter, a 3-layer fully connected neural network, comprising: S21, the first layer of neurons receives the pulse data of all ganglion cells as input, and the second layer is a hidden layer comprising a group of neurons that receives the output of the first layer as input; S22, the third layer is an output layer that receives the output of the second layer as input and applies an activation function, the number of third-layer output neurons being set to the number of pixels of the stimulus picture; S3, constructing a picture auto-encoder; S4, constructing a loss function from the outputs of S2 and S3 together with the stimulus picture; S5, reconstructing the stimulus picture from the neural response data with the trained model.

Description

Natural scene reconstruction method based on deep neural network
Technical Field
The invention relates to the technical field of visual encoding and decoding with neural networks, and in particular to a method, based on a deep neural network, for reconstructing natural-picture and dynamic-video stimuli from input neural signals.
Background
70%-80% of the information humans acquire comes from vision, and the visual system is an important component of the brain's nervous system: retinal neurons capture external visual information, which is transmitted to the lateral geniculate nucleus and further to the visual cortex, where visual perception is finally formed.
Existing computer vision algorithms have certain limitations, and compared with them the biological visual system has many unique advantages. Research on brain-like vision that draws on the visual mechanisms of the human brain may therefore be a breakthrough for the development of artificial intelligence and computer vision. An important research problem in brain-like vision is visual encoding and decoding. A novel decoding model can therefore be constructed that reconstructs a given visual natural picture or video from fine retinal nerve pulse signals or relatively coarse functional magnetic resonance data, restoring the corresponding natural image and dynamic video stimuli from neural signals.
Disclosure of Invention
In order to overcome the defects of the prior art and achieve the purpose of reconstructing the corresponding complex natural images and dynamic video stimuli from fine pulse signals or coarse human-brain functional magnetic resonance data, the invention adopts the following technical scheme:
a natural scene reconstruction method based on a deep neural network comprises the following steps:
s1, natural picture stimulation data and corresponding nerve response data are obtained;
s2, constructing a pulse-picture converter which is a 3-layer fully-connected neural network, and comprising the following steps of:
s21, the first layer of neurons receives pulse data of all ganglion cells as input, the number of the first layer of neurons is set to be the number of RGCs used, the second layer of neurons is a hidden layer, 512 neurons are included, and the output of the first layer of neurons is received as input, wherein the formula is as follows:
Figure BDA0002980355040000011
Figure BDA0002980355040000012
represents ReLU activation function, S is ganglion cell data, W 1 B is the weight between the first layer and the second layer 1 Is the firstBias of two layers, Y 1 Is the output of the second layer;
s22, the third layer is an output layer, the output of the second layer is received as input, and the output neuron number of the third layer is set as the stimulated picture pixel number according to the sigmoid function, and the formula is as follows:
O 1 =sigmoid(W 2 *Y 1 )+b 2 ) (2)
W 2 b is the connection weight between the second layer and the third layer 2 To bias, O 1 The output of the third layer is also the output of the pulse-picture converter;
s3, constructing an automatic encoder of pictures, namely a typical depth automatic encoder based on a Convolutional Neural Network (CNN), comprising the following steps:
s31, reducing the size of an input image by convolution and downsampling, wherein the size comprises four convolution layers, and the formula is as follows:
Figure BDA0002980355040000021
Figure BDA0002980355040000022
Figure BDA0002980355040000023
Figure BDA0002980355040000024
Wc 11 ,Wc 12 ,Wc 13 ,Wc 14 b is the convolution kernel of the four-layer convolution layer of the downsampling stage 11 ,b 12 ,b 13 ,b 14 For corresponding bias, Y 11 ,Y 12 ,Y 13 ,Y 14 Is the corresponding output;
s32, processing the image by adopting convolution and up-sampling, and recovering the texture of the down-sampled image while increasing the size of the down-sampled image, wherein compared with a down-sampling stage, the up-sampling stage further comprises four convolution layers, and the formula is as follows:
Figure BDA0002980355040000025
Figure BDA0002980355040000026
Figure BDA0002980355040000027
Figure BDA0002980355040000028
Wc 21 ,Wc 22 ,Wc 23 ,Wc 24 convolution kernel being a four-layer convolution layer of the upsampling stage, b 21 ,b 22 ,b 23 ,b 24 For corresponding bias, Y 21 ,Y 22 ,Y 23 ,O 2 Is the corresponding output;
s4, output O 1 、O 2 Constructing a loss function with the stimulation picture, and optimizing a reconstruction result output by the network;
s5, reconstructing a stimulation picture of the ganglion cells according to response data of the ganglion cells through the trained model.
Further, in S4 the outputs O_1, O_2 are compared with the stimulus picture I through loss function L_1 to optimize the output of the model:
L_1: Loss = λ_1‖O_1 - I‖ + λ_2‖O_2 - I‖    (5)
where ‖·‖ is the mean-square-error loss and λ_1, λ_2 are the weights of the two loss terms; the reconstructed picture is optimized by gradually decreasing the mean squared error, so that the model outputs O_1, O_2 each gradually match the stimulus picture I; the mean-square-error function is:
MSE = (1/n) Σ_{i=1}^{n} (O_i - I_i)^2
further, the S4 outputs O 1 、O 2 By loss function L compared with stimulus picture I 2 To make the output O of the model 1 、O 2 Respectively constructing two Loss functions with the stimulation pictures I, and alternately optimizing Loss 1 And Loss of 2 The formula is as follows:
L 2 :Loss 1 =λ 1 ‖O 1 -I‖,Loss 2 =λ 2 ‖O 2 -I‖ (6)
II is the mean square error loss, lambda 1 And lambda (lambda) 2 The weight lost by the two parts; finally, the optimized reconstructed picture result is obtained.
Further, in S4 the outputs are compared with the stimulus picture I through loss function L_3: only the final output O_2 of the model is used to construct a loss function with the stimulus picture I for optimization:
L_3: Loss = ‖O_2 - I‖    (7)
where ‖·‖ is the mean-square-error loss; this finally yields the optimized reconstructed picture.
Further, the input response data are spike firing rates or voxel responses; the output of the pulse-to-picture converter is the preliminarily decoded stimulus O_1, and the output of the picture-to-picture auto-encoder is the final reconstructed stimulus picture O_2; both outputs are compared with the stimulus picture I to optimize the output of the model.
Further, in S1 the receptive fields are computed from real retinal ganglion cell white-noise stimulation and impulse-response data, a linear encoding model is then constructed, and CIFAR-100 natural-picture stimulation data are input to generate simulated ganglion cell responses, comprising the following steps:
S11, from ganglion cell white-noise stimulation data and real response data, the receptive fields of the neurons are obtained by spike-triggered analysis; the salamander retinal data contain recordings from 90 ganglion cells, so 90 receptive fields are obtained, and a receptive-field module is generated by fitting each receptive field with a two-dimensional Gaussian according to the positions of the 90 receptive fields;
S12, the natural image whose response is to be simulated is converted into a 64 x 64 picture and pixel-normalized; according to the receptive-field modules of the 90 ganglion cells, the pixel values within each receptive field are accumulated to generate firing-rate-based response data.
Further, in S1 the ganglion cell stimulation data and the corresponding response data are acquired through real physiological recordings, the stimuli comprising static natural-image stimulation and dynamic video stimulation.
Further, S5 trains an end-to-end deep-neural-network natural-scene reconstruction model with real physiological data, which comprise static natural pictures or videos. When training with static natural pictures, the decoding model is trained from the stimulus pictures I, the impulse responses S of the neuron population, and the model output O; the impulse responses of the ganglion cell population to new stimuli are then fed into the model to reconstruct the natural stimulus pictures, demonstrating that the network can reconstruct natural image stimuli from impulse responses. When training with real physiological video data, new neuron-population impulse responses are input to the trained model to reconstruct the stimulus video frames.
Further, S5 uses simulation data, i.e. the simulated impulse responses of retinal ganglion cells to natural pictures from the CIFAR-100 dataset, to train a decoding model, and reconstructs the stimulus picture from the trained model and the responses of the neuron population. The network reconstructs complex natural image stimuli well from the simulated responses of the retinal ganglion cell population.
Further, in S5 functional magnetic resonance imaging is used to record real physiological response data from visual cortical areas V1, V2 and V3 while a person views handwritten digits; a decoding model is trained, and the stimulus picture is reconstructed well from the trained model and the responses of the effective voxels of the three brain regions. This shows that the network can reconstruct a stimulus image even from a signal as coarse as fMRI.
The invention has the advantages that:
the invention can decode the stimulation scene, such as complex static natural image and dynamic video image, according to the impulse response of the neuron population. The invention can reconstruct MNIST stimulation pictures according to the data recorded by human brain fMRI. And measuring the performance of the model, namely the similarity between the reconstructed picture and the real stimulation picture, by calculating average square error, peak signal-to-noise ratio and structural similarity index measure. The above effects can be achieved by the comprehensive decoding method, on one hand, a bridge of human brain vision and machine vision can be established, so that the mechanism of encoding and decoding of a human brain vision system is revealed; on the other hand, the model is considered to be applied to the development of retina prostheses, and the development of information technology and medical industry is promoted.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of a deep network decoding model architecture for end-to-end training in the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
As shown in fig. 1, the natural scene reconstruction method based on a deep neural network can decode and reconstruct stimuli from the responses of the retinal ganglion cell population to natural-scene stimulation, covering both static image stimuli and dynamic video stimuli; it can also reconstruct stimulus pictures from the responses of the human visual cortex to handwritten digits recorded by functional magnetic resonance imaging. In addition, when simulated impulse responses generated under natural-image stimulation are input to the trained model, the corresponding stimulus pictures are also reconstructed well.
Retinal simulation data are generated from a linear model and the receptive fields of the real ganglion cell population. The spatial receptive fields of 90 neurons are first derived from their impulse responses under white-noise stimulation of the real salamander retina, and each receptive field is then modeled by a two-dimensional Gaussian fit. A new stimulus picture is tiled over the receptive fields, the pixel values covered by each receptive field are accumulated, and the firing rate of each neuron is obtained by simulation.
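A minimal sketch of this linear simulation pipeline, with randomly placed Gaussian receptive fields standing in for the 90 fitted salamander fields and 64 x 64 pictures assumed:

```python
import numpy as np

def gaussian_rf(center_x, center_y, sigma, size=64):
    """2-D Gaussian receptive field on a size x size pixel grid, normalized to sum 1."""
    ys, xs = np.mgrid[0:size, 0:size]
    rf = np.exp(-((xs - center_x) ** 2 + (ys - center_y) ** 2) / (2 * sigma ** 2))
    return rf / rf.sum()

def simulate_rates(picture, rf_bank):
    """Linear encoding: each cell's firing rate is the RF-weighted sum of pixel values."""
    img = picture.astype(np.float32) / 255.0          # pixel normalization
    return np.array([np.sum(rf * img) for rf in rf_bank])

# 90 cells with random centers and widths stand in for the 90 fitted receptive fields
rng = np.random.default_rng(0)
rf_bank = [gaussian_rf(*rng.uniform(8, 56, 2), sigma=rng.uniform(2, 6))
           for _ in range(90)]
rates = simulate_rates(rng.integers(0, 256, (64, 64)), rf_bank)   # shape (90,)
```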
The real physiological data comprise static natural-image stimulation and video stimulation of the salamander retina, with the impulse responses of 90 ganglion cells recorded by multiple electrodes. Each static natural image is 64 x 64 pixels and each video frame is 90 x 90 pixels. They also comprise responses of the human visual cortex to MNIST handwritten-digit stimuli recorded by functional magnetic resonance, with handwritten images of 28 x 28 pixels.
As shown in fig. 2, the pulse-to-image decoder consists of two parts. The first part is a pulse-to-picture converter, which converts the neural signals into an image of the same size as the stimulus picture; a fully connected network is used in this part, and it already captures the information of the stimulus picture well. The one-dimensional vector output by the pulse-to-picture converter is then reshaped into a picture of the stimulus size. The second part is a picture-to-picture auto-encoder, a multi-layer CNN model that further reduces the noise of the generated picture. The whole model takes the neural response of the neuron population as input (for retinal pulse data the input is the spike firing rate; for functional magnetic resonance data it is the values of all voxels). Many model structures were explored in this embodiment, and it was found that a 3-layer network already reproduces the stimulus picture information well in the pulse-to-picture converter part. In the picture-to-picture auto-encoder, the downsampling part has four convolutional layers with kernel sizes (64,7,7), (128,5,5), (256,3,3), (256,3,3) and stride (2,2), and the layers of the upsampling part have kernel sizes (256,3,3), (128,3,3), (64,5,5), (3,7,7) and stride (1,1). Finally, a loss function is constructed from the output O_1 of the pulse-to-picture converter, the output O_2 of the picture-to-picture auto-encoder, and the real stimulus picture I, to optimize the reconstruction result output by the network. The forward information flow of the whole model is as follows:
The first part, the pulse-to-picture converter, consists of a three-layer fully connected network. The first layer has 90 neurons for the natural-image, video-stimulus and simulated-data models (90 neurons were recorded in the retinal data) and 3092 neurons for the fMRI model (3092 effective voxels are available in the fMRI data). The second layer has 512 neurons. The third layer has 64 x 64 neurons for static pictures, 90 x 90 for video stimuli, 28 x 28 for fMRI data, and 32 x 32 for simulated data. As activation functions, the second and third layers use ReLU and sigmoid, respectively.
Y_1 = f(W_1 * S + b_1)    (1)
O_1 = sigmoid(W_2 * Y_1 + b_2)    (2)
where f denotes the ReLU activation function, S is the neural response data of the ganglion cell population, W_1 is the weight between the first and second layers, b_1 is the bias of the second layer, and Y_1 is the output of the second layer; W_2 is the connection weight between the second and third layers, b_2 is the bias, and O_1, the output of the third layer, is also the output of the pulse-to-picture converter.
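A minimal PyTorch sketch of this converter (PyTorch and the class name SpikeToImageConverter are our illustrative choices; the dimensions follow the 90-512-64 x 64 configuration described above):

```python
import torch
import torch.nn as nn

class SpikeToImageConverter(nn.Module):
    """Three-layer fully connected pulse-to-picture converter (Eqs. 1-2)."""
    def __init__(self, n_cells=90, n_hidden=512, img_size=64):
        super().__init__()
        self.fc1 = nn.Linear(n_cells, n_hidden)              # W_1, b_1
        self.fc2 = nn.Linear(n_hidden, img_size * img_size)  # W_2, b_2
        self.img_size = img_size

    def forward(self, s):                     # s: (batch, n_cells) firing rates / voxels
        y1 = torch.relu(self.fc1(s))          # Eq. (1): Y_1 = ReLU(W_1 * S + b_1)
        o1 = torch.sigmoid(self.fc2(y1))      # Eq. (2): O_1 = sigmoid(W_2 * Y_1 + b_2)
        # reshape the 1-D output into a picture of the stimulus size
        return o1.view(-1, 1, self.img_size, self.img_size)
```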
The second part is the picture-to-picture auto-encoder, consisting of a downsampling convolutional part and an upsampling convolutional part, each comprising four convolutional layers:
Y_11 = f(Wc_11 * O_1 + b_11)
Y_12 = f(Wc_12 * Y_11 + b_12)
Y_13 = f(Wc_13 * Y_12 + b_13)
Y_14 = f(Wc_14 * Y_13 + b_14)    (3)
Y_21 = f(Wc_21 * Y_14 + b_21)
Y_22 = f(Wc_22 * Y_21 + b_22)
Y_23 = f(Wc_23 * Y_22 + b_23)
O_2 = f(Wc_24 * Y_23 + b_24)    (4)
where Wc_11, Wc_12, Wc_13, Wc_14 are the convolution kernels of the four convolutional layers of the downsampling stage, b_11, b_12, b_13, b_14 the corresponding biases, and Y_11, Y_12, Y_13, Y_14 the corresponding outputs; Wc_21, Wc_22, Wc_23, Wc_24 are the convolution kernels of the four convolutional layers of the upsampling stage, b_21, b_22, b_23, b_24 the corresponding biases, and Y_21, Y_22, Y_23, O_2 the corresponding outputs.
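A sketch of this auto-encoder in the same style, using the kernel sizes and strides listed above; the padding values, the nearest-neighbor upsampling operator, the ReLU/sigmoid activations, and the single-channel output (the patent lists (3,7,7) for the last kernel) are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageToImageAutoEncoder(nn.Module):
    """CNN auto-encoder: four stride-2 conv layers down, four upsample+conv layers up (Eqs. 3-4)."""
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        down = [(in_ch, 64, 7), (64, 128, 5), (128, 256, 3), (256, 256, 3)]
        self.down = nn.ModuleList(
            nn.Conv2d(i, o, k, stride=2, padding=k // 2) for i, o, k in down)
        up = [(256, 256, 3), (256, 128, 3), (128, 64, 5), (64, out_ch, 7)]
        self.up = nn.ModuleList(
            nn.Conv2d(i, o, k, stride=1, padding=k // 2) for i, o, k in up)

    def forward(self, x):                              # x: (batch, in_ch, 64, 64)
        for conv in self.down:                         # Eq. (3): 64 -> 32 -> 16 -> 8 -> 4
            x = torch.relu(conv(x))
        for i, conv in enumerate(self.up):             # Eq. (4): 4 -> 8 -> 16 -> 32 -> 64
            x = F.interpolate(x, scale_factor=2, mode="nearest")
            x = torch.sigmoid(conv(x)) if i == len(self.up) - 1 else torch.relu(conv(x))
        return x                                       # O_2: the denoised reconstruction
```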
Finally, to train the network, three loss functions L_1, L_2, L_3 are designed:
L_1: Loss = λ_1‖O_1 - I‖ + λ_2‖O_2 - I‖    (5)
L_2: Loss_1 = λ_1‖O_1 - I‖,  Loss_2 = λ_2‖O_2 - I‖    (6)
L_3: Loss = ‖O_2 - I‖    (7)
where ‖·‖ is the mean-square-error loss and λ_1, λ_2 are the weights of the two loss terms.
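These three losses map directly onto mean-squared-error terms; a minimal sketch (the default λ values are placeholders, not the patent's):

```python
import torch.nn as nn

mse = nn.MSELoss()

def loss_L1(o1, o2, target, lam1=1.0, lam2=1.0):
    """Eq. (5): joint loss over both outputs."""
    return lam1 * mse(o1, target) + lam2 * mse(o2, target)

def loss_L2(o1, o2, target, lam1=1.0, lam2=1.0):
    """Eq. (6): two losses, to be optimized alternately."""
    return lam1 * mse(o1, target), lam2 * mse(o2, target)

def loss_L3(o2, target):
    """Eq. (7): loss on the final output only."""
    return mse(o2, target)
```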
The network is optimized with the Adam algorithm so that the model output gradually matches the stimulus. After training, the network outputs the reconstructed stimulus picture.
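Putting the pieces together, one Adam update under loss L_1 might look as follows, reusing the sketches above (the learning rate is an assumption):

```python
import torch

converter = SpikeToImageConverter()
autoencoder = ImageToImageAutoEncoder()
optimizer = torch.optim.Adam(
    list(converter.parameters()) + list(autoencoder.parameters()), lr=1e-3)

def train_step(spikes, stimulus):
    """One optimization step: spikes (batch, 90) -> stimulus (batch, 1, 64, 64)."""
    optimizer.zero_grad()
    o1 = converter(spikes)            # preliminary decoded picture
    o2 = autoencoder(o1)              # denoised final reconstruction
    loss = loss_L1(o1, o2, stimulus)  # Eq. (5)
    loss.backward()
    optimizer.step()
    return loss.item()
```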
Embodiment one:
The deep neural network model is trained on real physiological data: the responses of the ganglion cell population recorded from the salamander retina under static natural-image stimulation. Natural image stimuli are reconstructed well from the trained model and the responses of the neuron population, showing that the network can reconstruct complex natural image stimuli from the impulse responses of the retinal ganglion cell population.
Embodiment two:
The deep neural network model is trained on real physiological data: the responses of the ganglion cell population recorded from the salamander retina under dynamic video stimulation. The stimulus video frames are reconstructed well from the trained model and the responses of the neuron population, showing that the network can reconstruct complex dynamic video stimuli from the impulse responses of the retinal ganglion cell population.
Embodiment III:
The deep neural network model is trained on simulation data: the simulated impulse responses of retinal ganglion cells to natural pictures from the CIFAR-100 dataset. The stimulus pictures are reconstructed well from the trained model and the responses of the neuron population, showing that the network can reconstruct complex natural image stimuli from the simulated responses of the retinal ganglion cell population.
Embodiment four:
The deep neural network model is trained on real physiological data: functional magnetic resonance imaging recordings of the responses of visual cortical areas V1, V2 and V3 while a person views handwritten digits. The stimulus pictures are reconstructed from the trained model and the responses of the effective voxels of the three brain regions, showing that the network can reconstruct a stimulus image even from a signal as coarse as fMRI.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the technical solutions of the embodiments of the present invention.

Claims (10)

1. The natural scene reconstruction method based on the deep neural network is characterized by comprising the following steps of:
s1, natural picture stimulation data and corresponding nerve response data are obtained;
s2, constructing a pulse-picture converter which is a 3-layer fully-connected neural network, and comprising the following steps of:
s21, the first layer of neurons receives pulse data of all ganglion cells as input, the number of the first layer of neurons is set to be the number of RGCs used, the second layer of neurons is a hidden layer, the hidden layer comprises a group of neurons, and the output of the first layer of neurons is received as input, and the formula is as follows:
Figure FDA0004187536430000011
Figure FDA0004187536430000012
represents ReLU activation function, S is ganglion cell data, W 1 B is the weight between the first layer and the second layer 1 Is that
Bias of the second layer, Y 1 Is the output of the second layer;
s22, the third layer is an output layer, the output of the second layer is received as input, and the output neuron number of the third layer is set as the stimulated picture pixel number according to the sigmoid function, and the formula is as follows:
O 1 =sigmoid(W 2 *Y 1 )+b 2 ) (2)
W 2 b is the connection weight between the second layer and the third layer 2 To bias, O 1 The output of the third layer is also the output of the pulse-picture converter;
s3, constructing an automatic encoder of a picture-picture, comprising the following steps:
s31, reducing the size of an input image by convolution and downsampling, wherein the size comprises four convolution layers, and the formula is as follows:
Figure FDA0004187536430000013
Figure FDA0004187536430000014
Figure FDA0004187536430000015
Figure FDA0004187536430000016
Wc 11 ,Wc 12 ,Wc 13 ,Wc 14 b is the convolution kernel of the four-layer convolution layer of the downsampling stage 11 ,b 12 ,b 13 ,b 14 For corresponding bias, Y 11 ,Y 12 ,Y 13 ,Y 14 Is the corresponding output;
s32, processing the image by adopting convolution and up-sampling, and recovering the texture of the down-sampled image while increasing the size of the down-sampled image, wherein compared with a down-sampling stage, the up-sampling stage further comprises four convolution layers, and the formula is as follows:
Figure FDA0004187536430000017
Figure FDA0004187536430000018
Figure FDA0004187536430000019
Figure FDA00041875364300000110
Wc 21 ,Wc 22 ,Wc 23 ,Wc 24 convolution kernel being a four-layer convolution layer of the upsampling stage, b 21 ,b 22 ,b 23 ,b 24 For corresponding bias, Y 21 ,Y 22 ,Y 23 ,O 2 Is the corresponding output;
s4, output O 1 、O 2 Constructing a loss function with the stimulation picture, and optimizing a reconstruction result output by the network;
s5, reconstructing a stimulation picture of the nerve response data according to the trained model.
2. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein in S4 the outputs O_1, O_2 are compared with the stimulus picture I through loss function L_1 to optimize the output of the model:
L_1: Loss = λ_1‖O_1 - I‖ + λ_2‖O_2 - I‖    (5)
where ‖·‖ is the mean-square-error loss and λ_1, λ_2 are the weights of the two loss terms; the reconstructed picture is optimized by gradually decreasing the mean squared error, so that the model outputs O_1, O_2 each gradually match the stimulus picture I; the mean-square-error function is:
MSE = (1/n) Σ_{i=1}^{n} (O_i - I_i)^2
3. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein in S4 the outputs O_1, O_2 are compared with the stimulus picture I through loss function L_2: two loss functions are constructed from O_1 and O_2 with the stimulus picture I respectively, and Loss_1 and Loss_2 are optimized alternately:
L_2: Loss_1 = λ_1‖O_1 - I‖,  Loss_2 = λ_2‖O_2 - I‖    (6)
where ‖·‖ is the mean-square-error loss and λ_1, λ_2 are the weights of the two loss terms; this finally yields the optimized reconstructed picture.
4. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein in S4 the outputs are compared with the stimulus picture I through loss function L_3: only the final output O_2 of the model is used to construct a loss function with the stimulus picture I for optimization:
L_3: Loss = ‖O_2 - I‖    (7)
where ‖·‖ is the mean-square-error loss; this finally yields the optimized reconstructed picture.
5. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein the input response data are spike firing rates or voxel responses, the output of the pulse-to-picture converter is the preliminarily decoded stimulus O_1, and the output of the picture-to-picture auto-encoder is the final reconstructed stimulus picture O_2; both outputs are compared with the stimulus picture I to optimize the output of the model.
6. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein in S1 the receptive fields are computed from real retinal ganglion cell white-noise stimulation and impulse-response data, a linear encoding model is then constructed, and CIFAR-100 natural-picture stimulation data are input to generate simulated ganglion cell responses, comprising the following steps:
S11, from ganglion cell white-noise stimulation data and real response data, the receptive fields of the neurons are obtained by spike-triggered analysis; ganglion cell data are recorded from the salamander retina, the corresponding receptive fields are obtained, and a receptive-field module is generated by fitting each receptive field with a two-dimensional Gaussian according to the positions of the receptive fields;
S12, the natural image whose response is to be simulated is converted into a picture and pixel-normalized; according to the receptive-field modules of the ganglion cells, the pixel values within each receptive field are accumulated to generate firing-rate-based response data.
7. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein in S1 the ganglion cell stimulation data and the corresponding response data are acquired through real physiological recordings, the stimuli comprising static natural-image stimulation and dynamic video stimulation.
8. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein S5 trains an end-to-end deep-neural-network natural-scene reconstruction model with real physiological data comprising static natural pictures or videos; when training with static natural pictures, the decoding model is trained from the stimulus pictures I, the impulse responses S of the neuron population, and the model output O, and the impulse responses of the ganglion cell population to new stimuli are then input to the model to reconstruct the natural stimulus pictures; when training with real physiological video data, new neuron-population impulse responses are input to the trained model to reconstruct the stimulus video frames.
9. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein S5 uses simulation data, i.e. the simulated impulse responses of retinal ganglion cells to natural pictures from the CIFAR-100 dataset, to train a decoding model, and reconstructs the stimulus picture from the trained model and the responses of the neuron population.
10. The natural scene reconstruction method based on a deep neural network according to claim 1, wherein in S5 functional magnetic resonance imaging is used to record real physiological response data from visual cortical areas V1, V2 and V3 while a person views handwritten digits, a decoding model is trained, and the stimulus picture is reconstructed from the trained model and the responses of the effective voxels of the three brain regions.
CN202110285684.4A (priority and filing date 2021-03-17): Natural scene reconstruction method based on deep neural network, Active, granted as CN113068035B

Priority Applications (1)

CN202110285684.4A (priority and filing date 2021-03-17): Natural scene reconstruction method based on deep neural network

Publications (2)

CN113068035A, published 2021-07-02 (application publication)
CN113068035B, published 2023-07-14 (grant)

Family

ID: 76561154

Family Applications (1)

CN202110285684.4A: Natural scene reconstruction method based on deep neural network, Active

Country Status (1)

CN: CN113068035B

Citations (2)

(* Cited by examiner, † Cited by third party)

CN108663678A * (priority 2018-01-29, published 2018-10-16), Northwest A&F University (西北农林科技大学): Multi-baseline InSAR phase unwrapping algorithm based on a mixed-integer optimization model
CN112329977A * (priority 2020-09-10, published 2021-02-05), State Grid Co., Ltd. (国家电网有限公司): Wind power prediction system for extreme scenes

Family Cites Families (1)

(* Cited by examiner, † Cited by third party)

US10803591B2 * (priority 2018-08-28, published 2020-10-13), International Business Machines Corporation: 3D segmentation with exponential logarithmic loss for highly unbalanced object sizes


Non-Patent Citations (2)

(* Cited by examiner, † Cited by third party)

Jing Tian et al., "Stochastic super-resolution image reconstruction", IEEE Transactions on Image Processing (full text) *
Li Xin et al., "Single remote-sensing image super-resolution reconstruction combined with deep learning" (结合深度学习的单幅遥感图像超分辨率重建), Wanfang (full text) *



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant