WO2021258959A1 - Image restoration method, apparatus, and electronic device - Google Patents
Image restoration method, apparatus, and electronic device
- Publication number
- WO2021258959A1 (PCT/CN2021/095778)
- Authority
- WO
- WIPO (PCT)
Classifications
- G06T5/70
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T3/4038—Scaling the whole image or part thereof for image mosaicing
- G06T3/4046—Scaling the whole image or part thereof using neural networks
- G06T5/60
- H04N19/30—Coding or decoding of digital video signals using hierarchical techniques, e.g. scalability
- H04N19/42—Coding or decoding characterised by implementation details or hardware specially adapted for video compression or decompression
- H04N19/423—characterised by memory arrangements
- H04N19/86—Pre-processing or post-processing involving reduction of coding artifacts, e.g. of blockiness
- G06T2207/10016—Video; Image sequence
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- Y02T10/40—Engine management systems
Definitions
- the present disclosure relates to the field of image processing technology, and in particular to an image restoration method, device and electronic equipment.
- an image restoration method including:
- the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
- the recursive network performs decompression-noise processing on the image to be processed according to the content of the previous frame image and outputs a second image, wherein the previous frame image is the frame preceding the image to be processed in the video to be processed;
- the decompressing noise processing on the image to be processed through the recursive network based on the content of the previous frame image, and outputting the second image includes:
- the first convolutional layer in the recursive network includes a first subconvolutional layer and a second subconvolutional layer
- the first feature concatenation layer includes a first sub-feature concatenation layer and a second sub-feature concatenation layer
- the first sampling layer includes a first down-sampling layer and a first up-sampling layer
- the decompressing noise processing on the image to be processed through the first convolutional layer, the first feature concatenation layer, and the first sampling layer cascaded in the recursive network to output a second image includes:
- the first feature map of the image to be processed, extracted by the third sub-convolutional layer in each second convolutional layer in the single-frame network, is received through the first sub-feature concatenation layer, and the second feature map, extracted from the previous frame image by the first sub-convolutional layer corresponding to each of the third sub-convolutional layers in the recursive network, is acquired through the first sub-feature concatenation layer;
- the concatenated feature maps are compressed through the first subconvolutional layer to obtain a compressed feature map.
- the compressed feature map serves as the second feature map extracted from the image to be processed through each of the first sub-convolutional layers.
- the first stitched feature map is processed by the second subconvolutional layer, and the second image is output.
- the single-frame network includes a cascaded second convolutional layer, a second sampling layer, and a second feature concatenation layer; the second convolutional layer includes a third sub-convolutional layer and a fourth sub-convolutional layer, and the second sampling layer includes a second down-sampling layer and a second up-sampling layer;
- the performing of decompression-noise processing on the image to be processed through the single-frame network to output a first image includes:
- the second stitched feature map is processed by the fourth subconvolutional layer, and the first image is output.
- before the image to be processed is input into the target denoising network, the method further includes:
- the training process of the target denoising network specifically includes:
- the network corresponding to when the first loss function is lower than the first preset threshold is used as the target denoising network.
- the determining of the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video includes:
- the L2 loss function is adopted
- f(x) represents the simulated denoising image
- y represents the true value image of the frame corresponding to the simulated denoising image in the true value video.
- the method further includes:
- the network corresponding to when the second loss function is lower than a second preset threshold is used as the target denoising network.
- embodiments of the present disclosure also provide an apparatus for image restoration, including:
- the input unit is configured to input the image to be processed into a target denoising network, where the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
- a first processing unit configured to perform decompression noise processing on the image to be processed through the single-frame network, and output a first image
- the second processing unit is configured to perform decompression-noise processing on the image to be processed through the recursive network according to the content of the previous frame image, and output a second image, wherein the previous frame image is the frame preceding the image to be processed in the video to be processed;
- the output unit is configured to perform a weighted summation of the first image and the second image, and output a denoising image for the image to be processed.
- an electronic device for image restoration including:
- the memory is used to store a program
- the processor is configured to execute the program in the memory and includes the following steps:
- the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
- the recursive network performs decompression-noise processing on the image to be processed according to the content of the previous frame image and outputs a second image, wherein the previous frame image is the frame preceding the image to be processed in the video to be processed;
- the embodiments of the present disclosure provide a computer-readable storage medium having computer instructions stored therein; when the stored computer instructions are executed by a processor, the image restoration method described above can be realized.
- FIG. 1 is a schematic structural diagram of a target denoising network provided by an embodiment of the disclosure
- FIG. 2 is a schematic diagram of one structure of a recursive network provided by an embodiment of the disclosure
- FIG. 3 is a schematic diagram of one structure of a recursive network provided by an embodiment of the disclosure.
- FIG. 4 is a schematic diagram of one structure of a single-frame network provided by an embodiment of the disclosure.
- FIG. 5 is a schematic structural diagram of a target denoising network provided by an embodiment of the disclosure.
- FIG. 6 is a method flowchart of an image restoration method provided by an embodiment of the disclosure.
- FIG. 7 is a flowchart of one of the methods of step S103 in an image restoration method provided by an embodiment of the present disclosure
- FIG. 8 is a method flowchart of step S102 in an image restoration method provided by an embodiment of the present disclosure
- FIG. 9 is a flowchart of a method before step S101 in an image restoration method provided by an embodiment of the disclosure.
- FIG. 10 is a method flowchart after step S404 in an image restoration method provided by an embodiment of the present disclosure.
- FIG. 11 is a schematic structural diagram of a device for image restoration provided by an embodiment of the disclosure.
- FIG. 12 is a schematic structural diagram of an electronic device for image restoration provided by an embodiment of the disclosure.
- Existing methods for removing compression noise from video mainly remove the noise during the video compression process: in the video compression encoding process, the noise generated by compression is reduced as much as possible, so that at the same compression level the video presents a higher quality. This processing cannot denoise a video that has already been compressed and damaged, so the video quality is poor.
- the embodiments of the present disclosure provide an image restoration method, device, and electronic equipment, which are used to remove noise in video compression and improve display quality.
- FIG. 1 is a schematic diagram of one structure of the target denoising network 1.
- the target denoising network 1 includes a single-frame network 20 and a recursive network 10.
- the recursive network 10 includes a cascaded first convolutional layer 101, a first feature concatenation layer 102 and a first sampling layer 103.
- each layer in the recursive network 10 may comprise multiple sub-layers; FIG. 3 is a schematic diagram of one such structure of the recursive network 10.
- the first convolutional layer 101 in the recursive network 10 includes a first sub-convolutional layer 1011 and a second sub-convolutional layer 1012.
- the first feature concatenation layer 102 includes a first sub-feature concatenation layer 1021 and a second sub-feature concatenation layer 1022.
- the first sampling layer 103 includes a first down-sampling layer 1031 and a first up-sampling layer 1032.
- the single-frame network 20 includes a cascaded second convolutional layer 201, a second sampling layer 202, and a second feature concatenation layer 203.
- the second convolutional layer 201 includes a third sub-convolutional layer 2011 and a fourth sub-convolutional layer 2014.
- the second sampling layer 202 includes a second down-sampling layer 2021 and a second up-sampling layer 2022.
- the network structure of the single-frame network 20 is roughly the same as that of the recursive network 10.
- the second convolutional layer 201 of the single-frame network 20 includes N third sub-convolutional layers 2011, and accordingly the first convolutional layer 101 of the recursive network 10 also includes N first sub-convolutional layers 1011, where N is an integer greater than 1.
- the position of each sub-convolutional layer in the single-frame network 20 and the recursive network 10 is also roughly the same.
- FIG. 5 is a schematic structural diagram of one configuration of the target denoising network 1, in which there are two first down-sampling layers 1031, two first up-sampling layers 1032, one second sub-convolutional layer 1012, one first sub-feature concatenation layer 1021, two second down-sampling layers 2021, two second up-sampling layers 2022, one fourth sub-convolutional layer 2014, and two second feature concatenation layers 203. The number of filters in each convolutional layer of the network is indicated by the numbers above the horizontal line in FIG. 5, such as 64 and 128.
- the convolution kernel of each convolutional layer can be 3×3 with a stride of 1, and the input of each convolutional layer is padded with a pad size of 1 to ensure that the input and output sizes of each convolutional layer are equal; a ReLU activation function can be applied to the output of each convolutional layer to perform a nonlinear operation.
- a convolutional layer with a stride of 2 and a 3×3 convolution kernel can be used to down-sample the spatial dimensions of the feature map by a factor of 2.
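As a quick check of these layer settings, the standard convolution output-size formula can be applied. This is a minimal sketch; the padding of 1 for the stride-2 down-sampling layer is an assumption, since the source only states the pad size for the stride-1 layers.

```python
def conv_out_size(n, k, s, p):
    """Output spatial size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# 3x3 kernel, stride 1, pad 1: input and output sizes stay equal.
assert conv_out_size(720, k=3, s=1, p=1) == 720
# 3x3 kernel, stride 2 (a down-sampling layer): the spatial size is halved.
print(conv_out_size(720, k=3, s=2, p=1))  # 360
```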
- the convolutional layer and the depth-to-space layer can be used to up-sample the spatial dimensions of the feature map by a factor of 2.
- the convolutional layer expands the feature dimension of the input feature map to 4 times its original size.
- the size of the convolution kernel is 3 ⁇ 3, and the step size is 1.
- the depth-to-space layer converts the expansion in the feature dimension of the feature map into an enlargement in the spatial dimensions.
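A minimal NumPy sketch of this depth-to-space rearrangement, assuming a channels-first layout; the exact ordering of channels inside each 2×2 block is an assumption.

```python
import numpy as np

def depth_to_space(x, r=2):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r)."""
    c, h, w = x.shape
    assert c % (r * r) == 0
    x = x.reshape(c // (r * r), r, r, h, w)  # split the expanded feature dim
    x = x.transpose(0, 3, 1, 4, 2)           # interleave: (C, H, r, W, r)
    return x.reshape(c // (r * r), h * r, w * r)

# A conv first expands 64 features to 256 (4x); depth-to-space then turns
# that 4x feature expansion into a 2x enlargement of each spatial dimension.
feat = np.random.rand(256, 45, 80).astype(np.float32)
up = depth_to_space(feat, r=2)
print(up.shape)  # (64, 90, 160)
```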
- the feature maps of different scales extracted by the single-frame network 20 are concatenated in the feature dimension with the feature maps of the corresponding scale in the recursive network 10, followed by a convolutional layer that compresses the feature dimension.
- the target denoising network 1 constituted by the single-frame network 20 and the recursive network 10 performs image restoration on any frame of the video to be processed; the specific processing of each layer in the target denoising network 1 will be described later and is not detailed here.
- FIG. 6 is a method flowchart of an image restoration method provided by an embodiment of the present disclosure. Specifically, the image restoration method includes:
- S101 Input the image to be processed into a target denoising network, where the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
- the video to be processed may be a compressed video.
- for example, the source video has a frame rate of 30 and a bit rate of 100M
- the video to be processed is the video obtained by compressing that source video into a video with a bit rate of 2M.
- the single-frame network and the recursive network in the target denoising network can have the same encoding and decoding structure, and both can be RNN network models.
- the target denoising network may be a trained network.
- the image to be processed is any frame of image in the video to be processed.
- S102 Perform decompression noise processing on the image to be processed through a single-frame network, and output the first image;
- S103 Perform decompression noise processing on the image to be processed through the recursive network according to the content of the previous frame image, and output a second image, where the previous frame image is the previous frame of the image to be processed in the video to be processed;
- the content of the previous frame image may be the semantic information of the previous frame image, and the semantic information may be a feature map extracted from the previous frame image through each convolutional layer in the recursive network.
- the recursive network can perform decompression noise processing on the image to be processed, thereby outputting the second image.
- the recursive network can perform decompression noise processing based on the connection between the previous and next frames in the video to be processed.
- the connection between the previous and next frames can be a motion connection, and the second image is then output. Since the second image is obtained after decompression-noise processing that uses the connection between the previous and next frame images, it has a better display effect.
- S104 Perform a weighted summation on the first image and the second image, and output a denoising image for the image to be processed.
- the first image and the second image may be weighted and summed, and the weighted-sum image is used as the denoising image for the image to be processed. Since the single-frame network denoises the image to be processed directly, its degree of denoising is stronger; in addition, because the recursive network exploits the relationship between the previous and next frames, it denoises the image to be processed while preserving the details of those frames, which helps to ensure the quality of the video.
- the weighted processing of the first image P1 and the second image P2 is a×P1+b×P2; accordingly, the image to be processed is denoised through the target denoising network.
- regarding the weighting coefficients of the first image P1 and the second image P2: for example, when a>b, the denoising of the image to be processed is stronger; when a<b, the details of the image to be processed are more consistent with the previous frame image and the display effect is better.
- those skilled in the art can set the weighting coefficients of the first image P1 and the second image P2 according to actual application requirements, which is not limited here.
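The weighted summation of the two branch outputs can be sketched as follows; the array shapes and the sample coefficient values are assumptions for illustration.

```python
import numpy as np

def fuse(p1, p2, a=0.5, b=0.5):
    """Denoising output: a * P1 (single-frame branch) + b * P2 (recursive branch)."""
    return a * p1 + b * p2

p1 = np.full((720, 1280, 3), 0.8)  # hypothetical first image
p2 = np.full((720, 1280, 3), 0.4)  # hypothetical second image
# a > b favors stronger denoising; a < b favors frame-to-frame consistency.
out = fuse(p1, p2, a=0.6, b=0.4)
print(round(float(out[0, 0, 0]), 2))  # 0.64
```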
- step S103, performing decompression-noise processing on the image to be processed through the recursive network according to the content of the previous frame image and outputting a second image, where the previous frame image is the frame preceding the image to be processed in the video to be processed, includes:
- performing decompression-noise processing on the image to be processed and outputting the second image.
- the first convolutional layer may be one or more layers
- the first feature concatenation layer may be one or more layers
- the first sampling layer may be one or more layers.
- the step of performing decompression-noise processing on the image to be processed through the cascaded first convolutional layer, first feature concatenation layer, and first sampling layer in the recursive network and outputting the second image includes:
- S201 Receive, through the first sub-feature concatenation layer, the first feature map of the image to be processed extracted by the third sub-convolutional layer in each second convolutional layer in the single-frame network, and obtain, through the first sub-feature concatenation layer, the second feature map extracted from the previous frame image by the first sub-convolutional layer corresponding to each third sub-convolutional layer in the recursive network;
- S202 Perform a series operation on the first feature map and the second feature map through the first sub-feature series layer to obtain a series feature map
- S203 Compress the concatenated feature maps through the first subconvolution layer to obtain a compressed feature map, where the compressed feature map is a second feature map extracted from the image to be processed through each first subconvolution layer;
- S204 Extract feature maps of multiple spatial sizes from the compressed feature map through the first down-sampling layer in the first sampling layer;
- S205 Determine, through the first up-sampling layer in the first sampling layer, feature maps of the same spatial sizes as the multiple spatial sizes;
- S206 Splice feature maps of the same spatial size in the feature dimension through the second sub-feature concatenation layer to obtain a first spliced feature map;
- S207 Process the first stitched feature map through the second sub-convolutional layer, and output a second image.
- the specific implementation of step S201 to step S207 is as follows:
- the first feature map of the image to be processed, extracted by the third sub-convolutional layer in each second convolutional layer in the single-frame network, is received through the first sub-feature concatenation layer, and the second feature map, extracted from the previous frame image by the first sub-convolutional layer corresponding to each third sub-convolutional layer, is obtained through the first sub-feature concatenation layer. There can be multiple third sub-convolutional layers in the single-frame network, and there can also be multiple first sub-convolutional layers in the recursive network.
- each third sub-convolutional layer can be paired with its corresponding first sub-convolutional layer, which extracts the second feature map from the previous frame image.
- any one of the multiple third subconvolutional layers can extract a corresponding feature map from the image to be processed.
- any one of the multiple first subconvolutional layers can extract the corresponding feature map from the previous frame image.
- the first feature map and the second feature map are concatenated through the first sub-feature concatenation layer to obtain a concatenated feature map, which includes the feature relationship between the previous and next frame images; then the concatenated feature map is compressed through the first sub-convolutional layer to obtain the compressed feature map.
- the compressed feature map may serve as the second feature map extracted from the image to be processed through each first sub-convolutional layer; for example, when the target denoising network denoises the frame after the image to be processed, the compressed feature map is the second feature map that the first sub-convolutional layer extracted from the image to be processed.
- there can be multiple first down-sampling layers; each down-sampling layer extracts a feature map of a corresponding spatial size, so the multiple down-sampling layers extract feature maps of different spatial sizes, and any two of the multiple spatial sizes are different.
- feature maps of two spatial sizes are two feature maps of different spatial sizes.
- feature maps of three spatial sizes are three feature maps of different spatial sizes. In this way, multiple first down-sampling layers are used to process the compressed feature map at different spatial sizes.
- through the first up-sampling layer, feature maps of the same spatial sizes as the multiple spatial sizes are determined; for example, feature maps of the same spatial sizes as the three spatial sizes are determined.
- the feature maps of the same spatial size are spliced in the feature dimension to obtain the first spliced feature map.
- the first spliced feature map is processed through the second sub-convolutional layer, so as to output the second image.
- the output of the second image is thus realized through the processing of the cascaded first convolutional layer, first feature concatenation layer, and first sampling layer in the recursive network.
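The recursive-branch flow above (concatenate the current-frame and previous-frame feature maps in the feature dimension, then compress) can be sketched in NumPy. The shapes and the 1×1-style compression weights are assumptions; the source only states that a convolutional layer compresses the feature dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# First feature map (current frame, from the single-frame branch) and second
# feature map (previous frame, from the recursive branch); shapes assumed.
f_current = rng.random((64, 90, 160))
f_previous = rng.random((64, 90, 160))

# Concatenate in the feature dimension (axis 0 in channels-first layout).
concat = np.concatenate([f_current, f_previous], axis=0)  # (128, 90, 160)

# Model the compressing convolution as a 1x1 conv, i.e. a matrix multiply
# over the feature dimension, taking 128 features back down to 64.
w = rng.random((64, 128)) * 0.1
compressed = np.einsum('oc,chw->ohw', w, concat)          # (64, 90, 160)
print(compressed.shape)
```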
- step S102, performing decompression-noise processing on the image to be processed through the single-frame network and outputting the first image, includes:
- S301 Extract the first feature map from the image to be processed through the third sub-convolutional layer in each second convolutional layer;
- S302 Extract feature maps of multiple spatial sizes from the first feature map through the second down-sampling layer;
- S303 Determine, through the second up-sampling layer, feature maps of the same spatial sizes as the multiple spatial sizes;
- S304 splicing feature maps of the same spatial size in feature dimensions through the second feature series layer to obtain a second splicing feature map
- S305 Process the second stitched feature map through the fourth sub-convolutional layer, and output the first image.
- the specific implementation of step S301 to step S305 is as follows:
- after each third sub-convolutional layer in the single-frame network extracts the first feature map, feature maps of multiple spatial sizes are extracted from the first feature map through the second down-sampling layers in the single-frame network. There can be multiple second down-sampling layers; each down-sampling layer extracts a feature map of a corresponding spatial size, so the multiple second down-sampling layers extract feature maps of different spatial sizes, and any two of the spatial sizes are different. Then, through the second up-sampling layer in the single-frame network, feature maps of the same spatial sizes as the multiple spatial sizes are determined.
- the second down-sampling layer extracts two feature maps of different spatial sizes from the first feature map
- the second up-sampling layer determines the feature maps of the same spatial size as the two different spatial sizes.
- the feature maps of the same spatial size are spliced in the feature dimension through the second feature concatenation layer in the single-frame network to obtain the second spliced feature map.
- the second stitched feature map is processed through the fourth sub-convolutional layer in the single-frame network, and the first image is output.
- before step S101, inputting the image to be processed into the target denoising network, the method further includes:
- S401 Acquire multiple sets of image frame sequences, and each set of image frame sequences includes multiple images
- S402 Encode multiple groups of image frame sequences into true-value video and simulation video, respectively, where each frame of the simulation image in the simulation video contains compression noise;
- S403 Input each frame of the simulation image in the simulation video into the denoising network to be trained, and output the simulation denoising image of the corresponding frame;
- S404 Determine the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true value image of the corresponding frame in the true value video;
- S405 Use the network corresponding to when the first loss function is lower than the first preset threshold as the target denoising network.
- the specific implementation of step S401 to step S405 is as follows:
- Each set of image frame sequences includes multiple images.
- the training set provided by the AIM competition hosted by ICCV-2019 is used as the training data.
- the training set includes a total of 240 sets of frame sequences.
- each frame sequence contains 181 clear images of 1280×720, and the target denoising network is trained on them.
- the training set is processed as follows.
- multiple sets of image frame sequences are respectively encoded into a true-value video and a simulation video, where each frame of simulation image in the simulation video contains compression noise; for example, ffmpeg is used.
- the above 240 sets of frame sequences are encoded into MP4-format video as the true-value video of the training set, where the encoding format is H.264, the frame rate is 25, and the bit rate is about 130M.
- ffmpeg is used to encode the above 240 sets of frame sequences in H.264, with a frame rate of 25 and a bit rate compressed to about 2M, to generate the simulation video containing compression noise and artifacts.
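The two encodes described above might be driven as follows. The input pattern, output paths, and exact flag set are assumptions (the source only fixes H.264, a frame rate of 25, and the two bit rates), and the commands are only constructed here, not executed.

```python
def h264_cmd(frames_pattern, out_path, bitrate):
    """Build a hypothetical ffmpeg command line for an H.264 encode at 25 fps."""
    return [
        "ffmpeg", "-framerate", "25", "-i", frames_pattern,
        "-c:v", "libx264", "-b:v", bitrate, out_path,
    ]

# True-value video: high bit rate, essentially artifact-free.
gt_cmd = h264_cmd("seq_%03d.png", "gt.mp4", "130M")
# Simulation video: bit rate compressed to ~2M, introducing compression noise.
sim_cmd = h264_cmd("seq_%03d.png", "noisy.mp4", "2M")
print(" ".join(sim_cmd))
```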
- each frame of the simulation image in the simulation video is input to the denoising network to be trained, and the simulation denoising image of the corresponding frame is output.
- According to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame, the first loss function for the denoising network to be trained is determined, and the network obtained when the first loss function falls below the first preset threshold is used as the target denoising network.
- those skilled in the art can set the specific value of the first preset threshold according to actual application needs, which is not limited here.
- In step S404, the first loss function for the denoising network to be trained is determined according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video. Depending on how the first prediction deviation compares with the set deviation value δ, there are two cases:
- In the first case, if the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video is less than or equal to δ, the L2 loss function is used.
- In the second case, if the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video is greater than δ, the L1 loss function is used.
- f(x) represents the simulated denoising image
- y represents the true value image of the frame corresponding to the simulated denoising image in the true value video.
- the set deviation value ⁇ may be 1.
- those skilled in the art can set the value of ⁇ according to actual applications, which is not limited here.
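The two cases above describe a Huber-style (smooth L1) loss: quadratic for small deviations, linear for large ones. A minimal per-element sketch, assuming the standard Huber scaling constants (the text only specifies which norm applies in each regime, not the exact coefficients):

```python
def huber_loss(f_x, y, delta=1.0):
    """Piecewise loss: L2 when |f(x) - y| <= delta, L1-like otherwise.
    The 0.5 factors make the two branches join continuously at |d| = delta."""
    d = abs(f_x - y)
    if d <= delta:
        return 0.5 * d * d            # L2 branch for small deviations
    return delta * (d - 0.5 * delta)  # L1 branch for large deviations
```

With δ = 1, small prediction errors are penalized quadratically (stable gradients near the optimum) while outliers are penalized only linearly, which keeps compression artifacts from dominating the gradient.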
- In some embodiments, after step S404 (determining the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video), the method further includes:
- S501 When the first loss function is lower than the first preset threshold, sharpen each frame of the true-value image in the true-value video to obtain an edge-enhanced true-value video;
- S502 Determine the second prediction deviation between the simulated denoising image and the corresponding frame image in the edge-enhanced true-value video, and determine the second loss function for the denoising network to be trained;
- S503 Use the network corresponding to when the second loss function is lower than the second preset threshold as the target denoising network.
- The details of steps S501 to S503 are as follows:
- In some embodiments, the Adam optimization algorithm can be used to optimize the network parameters of the denoising network to be trained; in the initial stage of training, the true-value videos and the corresponding simulation videos are used for training.
- Once the denoising network being trained can restore the image content fairly completely, each frame of the true-value video is sharpened and then used as the objective y in the loss function to continue training the denoising network.
- the second prediction deviation between the simulated denoising image and the corresponding frame image in the edge-enhanced true-value video is determined, and the second loss function for the denoising network to be trained is determined.
- The second loss function is calculated with the same formula as the first loss function to determine the second loss function for the denoising network to be trained, which will not be described in detail here.
- the network corresponding to when the second loss function is lower than the second preset threshold is used as the target denoising network.
- The ground-truth video is first edge-enhanced and then used to continue training the denoising network, which helps reduce the blur of the denoised image, effectively improves its sharpness, and better restores image details, thereby improving the quality of the reconstructed image.
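Sharpening each ground-truth frame can be sketched as a small convolution against an edge-enhancing kernel. The text does not specify the filter, so the classic 3×3 Laplacian-based sharpening kernel below is an illustrative assumption:

```python
# 3x3 sharpening kernel: identity plus a Laplacian edge term (entries sum to 1).
KERNEL = [[ 0, -1,  0],
          [-1,  5, -1],
          [ 0, -1,  0]]

def sharpen(img):
    """Convolve a 2D grayscale image (list of lists of floats) with KERNEL,
    replicating edge pixels at the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels are replicated.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out
```

Because the kernel entries sum to 1, flat regions pass through unchanged while edges are amplified — which is exactly why using the sharpened frames as the training objective y pushes the network toward crisper outputs.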
- In some embodiments, the input image frame sequence can be cut into blocks of size 256×256, so that a whole image is cut into 15 patches that form one batch.
- The network learning rate can be set to 10^(-4), with a learning-rate decay coefficient of 0.8.
- Each time it decays, the learning rate drops to 0.8 times its previous value, which improves the stability of network training.
- The number of epochs can be set to 100; when training reaches the last 10 of the 100 epochs, the model obtained at each epoch no longer changes significantly.
- various parameters for training the denoising network to be trained can also be set according to actual application needs, which are not limited here.
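The schedule described above (initial rate 1e-4, multiplicative decay 0.8, 100 epochs) can be sketched as follows; the text does not state how often the decay is applied, so decaying once per epoch is an assumption:

```python
def learning_rate(epoch, base_lr=1e-4, decay=0.8):
    """Learning rate after `epoch` decay steps: base_lr * decay**epoch
    (assumes one decay per epoch, which the text does not specify)."""
    return base_lr * decay ** epoch

# Learning rate over the 100 training epochs mentioned in the text.
schedule = [learning_rate(e) for e in range(100)]
```

By the final epochs the rate is vanishingly small, consistent with the observation that the model stops changing significantly over the last 10 epochs.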
- The problem-solving principle of the target denoising network used for image restoration is similar to that of the aforementioned image restoration method, so its implementation can refer to the implementation of that method; repeated content will not be described again.
- an embodiment of the present disclosure also provides a device for image restoration, including:
- the input unit 100 is configured to input the image to be processed into a target denoising network, where the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
- the first processing unit 200 is configured to perform decompression noise processing on the image to be processed through a single-frame network, and output a first image;
- the second processing unit 300 is configured to perform decompression noise processing on the image to be processed through the recursive network according to the content of the previous frame image, and output a second image, where the previous frame image is the frame preceding the image to be processed in the video to be processed;
- the output unit 400 is configured to perform a weighted summation of the first image and the second image, and output a denoised image for the image to be processed.
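The fusion performed by the output unit is a per-pixel weighted sum of the two network outputs. A minimal sketch follows; the weight values are determined elsewhere (e.g. learned during training), so the equal 0.5/0.5 split used as the default here is only an illustrative assumption:

```python
def fuse(first_img, second_img, w1=0.5, w2=0.5):
    """Weighted per-pixel sum of the single-frame output (first_img) and the
    recursive output (second_img), both flat lists of equal length."""
    return [w1 * a + w2 * b for a, b in zip(first_img, second_img)]
```

With equal weights the result is simply the average of the two images; shifting weight toward the recursive branch emphasizes temporal consistency, while the single-frame branch anchors per-frame detail.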
- The second processing unit 300 is specifically configured to:
- perform decompression noise processing on the image to be processed through the cascaded first convolutional layer, first feature concatenation layer, and first sampling layer in the recursive network, and output the second image.
- The second processing unit 300 is further configured to:
- receive, through the first sub-feature concatenation layer, the first feature map of the image to be processed extracted by the third sub-convolutional layer in each second convolutional layer of the single-frame network, and obtain, through the first sub-feature concatenation layer, the second feature map extracted from the previous frame image by the first sub-convolutional layer of the recursive network corresponding to each third sub-convolutional layer;
- concatenate the first feature map and the second feature map through the first sub-feature concatenation layer to obtain a concatenated feature map, and compress it through the first sub-convolutional layer;
- extract feature maps of multiple spatial sizes through the first down-sampling layer, determine feature maps of the same spatial size among the multiple spatial sizes through the first up-sampling layer, and splice feature maps of the same spatial size along the feature dimension through the second sub-feature concatenation layer to obtain a first spliced feature map;
- process the first spliced feature map through the second sub-convolutional layer, and output the second image.
- The first processing unit 200 is configured to:
- extract the first feature map of the image to be processed through each third sub-convolutional layer, extract feature maps of multiple spatial sizes from the first feature map through the second down-sampling layer, and determine feature maps of the same spatial size among the multiple spatial sizes through the second up-sampling layer;
- splice feature maps of the same spatial size along the feature dimension through the second feature concatenation layer to obtain a second spliced feature map, process the second spliced feature map through the fourth sub-convolutional layer, and output the first image.
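The down-sample / up-sample / concatenate-along-the-feature-dimension pattern described for both networks is the familiar encoder-decoder skip connection. A shape-only sketch under the assumption of 2× down-sampling per scale and a fixed channel count (the text specifies neither the number of scales nor the channel widths):

```python
def encoder_decoder_shapes(size=256, channels=64, scales=3):
    """Track (spatial size, channels) through down-sampling, then through
    up-sampling with skip concatenation along the feature dimension."""
    # Encoder: feature maps of multiple spatial sizes via 2x down-sampling.
    down = [(size // 2 ** i, channels) for i in range(scales)]
    up = []
    for s, c in reversed(down[:-1]):
        # Up-sample back to spatial size s, then splice with the encoder
        # feature map of the same spatial size: channel counts add up.
        up.append((s, c + channels))
    return down, up

down, up = encoder_decoder_shapes()
# down: [(256, 64), (128, 64), (64, 64)]
# up:   [(128, 128), (256, 128)]
```

The doubled channel count after each splice is why a trailing convolutional layer (the second or fourth sub-convolutional layer in the text) is needed to compress the spliced features back down before producing the output image.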
- In some embodiments, before the input unit 100 inputs the image to be processed into the target denoising network, the apparatus for image restoration further includes:
- a training unit, which is configured to:
- the network corresponding to when the first loss function is lower than the first preset threshold is used as the target denoising network.
- the training unit is used to:
- the L2 loss function is adopted
- f(x) represents the simulated denoising image
- y represents the true value image of the frame corresponding to the simulated denoising image in the true value video.
- the training unit is also used to:
- the network corresponding to when the second loss function is lower than the second preset threshold is used as the target denoising network.
- The problem-solving principle of the device for image restoration is similar to that of the aforementioned image restoration method, so the implementation of the device can refer to the implementation of that method; repeated content will not be described again.
- an embodiment of the present disclosure also provides an electronic device for image restoration, including:
- the memory 2 is used to store programs
- the processor 3 is configured to execute the program in the memory 2, and includes the following steps:
- input the image to be processed into a target denoising network, where the target denoising network includes a single-frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
- perform decompression noise processing on the image to be processed through the single-frame network, and output a first image;
- according to the content of the previous frame image, perform decompression noise processing on the image to be processed through the recursive network, and output a second image, where the previous frame image is the frame preceding the image to be processed in the video to be processed;
- perform a weighted summation of the first image and the second image, and output a denoised image for the image to be processed.
- the processor 3 may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and may implement or execute the The disclosed methods, steps and logic block diagrams.
- the general-purpose processor may be a microprocessor or any conventional processor or the like.
- the steps of the method for image restoration disclosed in the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
- The software module can be located in a mature storage medium in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
- the storage medium is located in the memory 2, and the processor 3 reads the information in the memory 2, and completes the steps of the signal processing flow in combination with its hardware.
- the processor 3 is configured to read a program in the memory 2 and execute any step of the above-mentioned image restoration method.
- The embodiments of the present disclosure also provide a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the image restoration method described above.
- The embodiments of the present disclosure provide a method, device, and electronic equipment for image restoration, in which any frame of the video to be processed is input into a target denoising network composed of a single-frame network and a recursive network.
- The single-frame network performs decompression noise processing on the image to be processed and outputs a first image.
- According to the content of the previous frame image, the recursive network performs decompression noise processing on the image to be processed and outputs a second image; the first image and the second image are then weighted and summed, and the denoised image for the current frame is output.
- In this way, any frame of the image to be processed in the video to be processed is combined with the previous frame image to remove compression noise, which improves the display quality.
- Since the relationship between adjacent frames is used throughout the decompression noise process, motion compensation between frames can be realized, thereby improving the video quality.
- the embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
- These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Claims (10)
- An image restoration method, comprising: inputting an image to be processed into a target denoising network, wherein the target denoising network includes a single-frame network and a recursive network, and the image to be processed is any frame in a video to be processed; performing decompression noise processing on the image to be processed through the single-frame network, and outputting a first image; performing decompression noise processing on the image to be processed through the recursive network according to the content of a previous frame image, and outputting a second image, wherein the previous frame image is the frame preceding the image to be processed in the video to be processed; and performing a weighted summation of the first image and the second image, and outputting a denoised image for the image to be processed.
- The method according to claim 1, wherein performing decompression noise processing on the image to be processed through the recursive network according to the content of the previous frame image and outputting the second image comprises: performing decompression noise processing on the image to be processed through the cascaded first convolutional layer, first feature concatenation layer, and first sampling layer in the recursive network, and outputting the second image.
- The method according to claim 2, wherein the first convolutional layer in the recursive network includes a first sub-convolutional layer and a second sub-convolutional layer, the first feature concatenation layer includes a first sub-feature concatenation layer and a second sub-feature concatenation layer, and the first sampling layer includes a first down-sampling layer and a first up-sampling layer; performing decompression noise processing on the image to be processed through the cascaded first convolutional layer, first feature concatenation layer, and first sampling layer in the recursive network and outputting the second image comprises: receiving, through the first sub-feature concatenation layer, the first feature map of the image to be processed extracted by the third sub-convolutional layer in each second convolutional layer of the single-frame network, and obtaining, through the first sub-feature concatenation layer, the second feature map extracted from the previous frame image by the first sub-convolutional layer in the recursive network corresponding to each third sub-convolutional layer; concatenating the first feature map and the second feature map through the first sub-feature concatenation layer to obtain a concatenated feature map; compressing the concatenated feature map through the first sub-convolutional layer to obtain a compressed feature map, the compressed feature map being the second feature map extracted from the image to be processed by each first sub-convolutional layer; extracting feature maps of multiple spatial sizes from the compressed feature map through the first down-sampling layer of the first sampling layer; determining, through the first up-sampling layer, feature maps of the same spatial size among the multiple spatial sizes; splicing feature maps of the same spatial size along the feature dimension through the second sub-feature concatenation layer to obtain a first spliced feature map; and processing the first spliced feature map through the second sub-convolutional layer to output the second image.
- The method according to claim 1, wherein the single-frame network includes a cascaded second convolutional layer, second sampling layer, and second feature concatenation layer; the second convolutional layer includes a third sub-convolutional layer and a fourth sub-convolutional layer, and the second sampling layer includes a second down-sampling layer and a second up-sampling layer; performing decompression noise processing on the image to be processed through the single-frame network and outputting the first image comprises: extracting the first feature map of the image to be processed through each third sub-convolutional layer; extracting feature maps of multiple spatial sizes from the first feature map through the second down-sampling layer; determining, through the second up-sampling layer, feature maps of the same spatial size among the multiple spatial sizes; splicing feature maps of the same spatial size along the feature dimension through the second feature concatenation layer to obtain a second spliced feature map; and processing the second spliced feature map through the fourth sub-convolutional layer to output the first image.
- The method according to any one of claims 1-4, wherein before inputting the image to be processed into the target denoising network, the method further comprises a training process for the target denoising network, specifically: obtaining multiple groups of image frame sequences, each group including multiple images; encoding the multiple groups of image frame sequences into true-value videos and simulation videos, respectively, wherein each frame of simulation image in the simulation video contains compression noise; inputting each frame of simulation image in the simulation video into a denoising network to be trained, and outputting a simulated denoising image of the corresponding frame; determining a first loss function for the denoising network to be trained according to a first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video; and using the network obtained when the first loss function is lower than a first preset threshold as the target denoising network.
- The method according to claim 5, wherein after determining the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video, the method further comprises: when the first loss function is lower than the first preset threshold, sharpening each frame of the true-value image in the true-value video to obtain an edge-enhanced true-value video; determining a second prediction deviation between the simulated denoising image and the corresponding frame image in the edge-enhanced true-value video, and determining a second loss function for the denoising network to be trained; and using the network obtained when the second loss function is lower than a second preset threshold as the target denoising network.
- An apparatus for image restoration, comprising: an input unit configured to input an image to be processed into a target denoising network, wherein the target denoising network includes a single-frame network and a recursive network, and the image to be processed is any frame in a video to be processed; a first processing unit configured to perform decompression noise processing on the image to be processed through the single-frame network and output a first image; a second processing unit configured to perform decompression noise processing on the image to be processed through the recursive network according to the content of a previous frame image and output a second image, wherein the previous frame image is the frame preceding the image to be processed in the video to be processed; and an output unit configured to perform a weighted summation of the first image and the second image and output a denoised image for the image to be processed.
- An electronic device for image restoration, comprising a memory and a processor, wherein the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps: inputting an image to be processed into a target denoising network, wherein the target denoising network includes a single-frame network and a recursive network, and the image to be processed is any frame in a video to be processed; performing decompression noise processing on the image to be processed through the single-frame network, and outputting a first image; performing decompression noise processing on the image to be processed through the recursive network according to the content of a previous frame image, and outputting a second image, wherein the previous frame image is the frame preceding the image to be processed in the video to be processed; and performing a weighted summation of the first image and the second image, and outputting a denoised image for the image to be processed.
- A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the image restoration method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/922,150 US20230177652A1 (en) | 2020-06-22 | 2021-05-25 | Image restoration method and apparatus, and electronic device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010574404.7 | 2020-06-22 | ||
- CN202010574404.7A CN111738952B (zh) | 2020-06-22 | 2020-06-22 | Image restoration method, apparatus and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021258959A1 true WO2021258959A1 (zh) | 2021-12-30 |
Family
ID=72650459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
- PCT/CN2021/095778 WO2021258959A1 (zh) | Image restoration method, apparatus and electronic device | 2020-06-22 | 2021-05-25 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230177652A1 (zh) |
CN (1) | CN111738952B (zh) |
WO (1) | WO2021258959A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN111738952B (zh) * | 2020-06-22 | 2023-10-10 | BOE Technology Group Co., Ltd. | Image restoration method, apparatus and electronic device |
- WO2022198381A1 (zh) * | 2021-03-22 | 2022-09-29 | BOE Technology Group Co., Ltd. | Image processing method and image processing apparatus |
- CN113344811A (zh) * | 2021-05-31 | 2021-09-03 | Southwest University | Weighted recursive denoising deep neural network and method based on multi-layer convolutional sparse coding |
- CN115546037A (zh) * | 2021-06-30 | 2022-12-30 | Beijing Zitiao Network Technology Co., Ltd. | Image processing method and apparatus, electronic device and storage medium |
- CN114860996B (zh) * | 2022-07-06 | 2022-10-04 | Beijing Zhongke Kaidi Software Co., Ltd. | Optical-disc-library-based electronic video archive restoration method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN106331433A (zh) * | 2016-08-25 | 2017-01-11 | Shanghai Jiao Tong University | Video denoising method based on deep recurrent neural network |
- CN107645621A (zh) * | 2016-07-20 | 2018-01-30 | Alibaba Group Holding Ltd. | Video processing method and device |
- US20190304068A1 (en) * | 2018-03-29 | 2019-10-03 | Pixar | Multi-scale architecture of denoising monte carlo renderings using neural networks |
- CN110852961A (zh) * | 2019-10-28 | 2020-02-28 | Beijing Moviebook Technology Co., Ltd. | Real-time video denoising method and system based on convolutional neural network |
- CN111738952A (zh) * | 2020-06-22 | 2020-10-02 | BOE Technology Group Co., Ltd. | Image restoration method, apparatus and electronic device |
- 2020-06-22: CN application filed — CN202010574404.7A, patent CN111738952B (status: active)
- 2021-05-25: US application filed — US17/922,150, publication US20230177652A1 (status: pending)
- 2021-05-25: PCT application filed — PCT/CN2021/095778, publication WO2021258959A1
Also Published As
Publication number | Publication date |
---|---|
CN111738952A (zh) | 2020-10-02 |
CN111738952B (zh) | 2023-10-10 |
US20230177652A1 (en) | 2023-06-08 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21828713; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 21828713; Country of ref document: EP; Kind code of ref document: A1
| 32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.09.2023)
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 21828713; Country of ref document: EP; Kind code of ref document: A1