WO2021258959A1 - Image restoration method, apparatus, and electronic device - Google Patents

Image restoration method, apparatus, and electronic device

Info

Publication number
WO2021258959A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
processed
network
layer
frame
Application number
PCT/CN2021/095778
Other languages
English (en)
French (fr)
Inventor
段然 (DUAN Ran)
朱丹 (ZHU Dan)
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Application filed by BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority to US 17/922,150 (published as US20230177652A1)
Publication of WO2021258959A1

Classifications

    • G06T5/70
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G06T5/60
    • H04N19/30 Coding/decoding of digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/423 Video compression or decompression implementations characterised by memory arrangements
    • H04N19/86 Pre-processing or post-processing for video compression involving reduction of coding artifacts, e.g. of blockiness
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y02T10/40 Engine management systems

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to an image restoration method, apparatus, and electronic device.
  • an image restoration method including:
  • the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
  • the recursive network performs decompression noise processing on the image to be processed to output a second image, wherein the previous frame image is the frame immediately preceding the image to be processed in the video to be processed;
  • the decompressing noise processing on the image to be processed through the recursive network based on the content of the previous frame image, and outputting the second image includes:
  • the first convolutional layer in the recursive network includes a first subconvolutional layer and a second subconvolutional layer
  • the first feature concatenation layer includes a first sub-feature concatenation layer and a second sub-feature concatenation layer
  • the first sampling layer includes a first down-sampling layer and a first up-sampling layer
  • the decompressing noise processing on the image to be processed through the first convolutional layer, the first feature concatenation layer, and the first sampling layer cascaded in the recursive network to output a second image includes:
  • the first feature map of the image to be processed, extracted by the third sub-convolutional layer in each second convolutional layer of the single-frame network, is received through the first sub-feature concatenation layer, and the second feature map, extracted from the previous frame image by the first sub-convolutional layer corresponding to each third sub-convolutional layer in the recursive network, is acquired through the first sub-feature concatenation layer;
  • the concatenated feature map is compressed through the first sub-convolutional layer to obtain a compressed feature map, where the compressed feature map serves as the second feature map extracted from the image to be processed by each first sub-convolutional layer.
  • the first stitched feature map is processed by the second subconvolutional layer, and the second image is output.
  • the single-frame network includes a cascaded second convolutional layer, a second sampling layer, and a second feature concatenation layer; the second convolutional layer includes a third sub-convolutional layer and a fourth sub-convolutional layer, and the second sampling layer includes a second down-sampling layer and a second up-sampling layer;
  • the performing decompression and noise processing on the to-be-processed image through the single-frame network to output a first image includes:
  • the second stitched feature map is processed by the fourth subconvolutional layer, and the first image is output.
  • before the image to be processed is input into the target denoising network, the method further includes:
  • the training process of the target denoising network specifically includes:
  • the network obtained when the first loss function falls below the first preset threshold is used as the target denoising network.
  • determining the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video includes:
  • if the first prediction deviation is less than or equal to a set deviation value δ, the L2 loss function is adopted; if it is greater than δ, the L1 loss function is adopted
  • f(x) represents the simulated denoising image
  • y represents the true value image of the frame corresponding to the simulated denoising image in the true value video.
  • the method further includes:
  • the network obtained when the second loss function falls below a second preset threshold is used as the target denoising network.
  • embodiments of the present disclosure also provide an apparatus for image restoration, including:
  • the input unit is configured to input the image to be processed into a target denoising network, where the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
  • a first processing unit configured to perform decompression noise processing on the image to be processed through the single-frame network, and output a first image
  • the second processing unit is configured to perform decompression noise processing on the image to be processed through the recursive network according to the content of the previous frame image, and output a second image, wherein the previous frame image is the frame preceding the image to be processed in the video to be processed;
  • the output unit is configured to perform a weighted summation of the first image and the second image, and output a denoising image for the image to be processed.
  • an electronic device for image restoration including:
  • the memory is used to store a program
  • the processor is configured to execute the program in the memory and includes the following steps:
  • the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
  • the recursive network performs decompression noise processing on the image to be processed to output a second image, wherein the previous frame image is the frame immediately preceding the image to be processed in the video to be processed;
  • the embodiments of the present disclosure provide a computer-readable storage medium having computer instructions stored therein; when the stored computer instructions are executed by a processor, the image restoration method described above can be realized.
  • FIG. 1 is a schematic structural diagram of a target denoising network provided by an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of one structure of a recursive network provided by an embodiment of the disclosure
  • FIG. 3 is a schematic diagram of one structure of a recursive network provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic diagram of one structure of a single-frame network provided by an embodiment of the disclosure.
  • FIG. 5 is a schematic structural diagram of a target denoising network provided by an embodiment of the disclosure.
  • FIG. 6 is a method flowchart of an image restoration method provided by an embodiment of the disclosure.
  • FIG. 7 is a flowchart of one implementation of step S103 in an image restoration method provided by an embodiment of the present disclosure
  • FIG. 8 is a method flowchart of step S102 in an image restoration method provided by an embodiment of the present disclosure
  • FIG. 9 is a flowchart of the method before step S101 in an image restoration method provided by an embodiment of the disclosure.
  • FIG. 10 is a method flowchart after step S404 in an image restoration method provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a device for image restoration provided by an embodiment of the disclosure.
  • FIG. 12 is a schematic structural diagram of an electronic device for image restoration provided by an embodiment of the disclosure.
  • Existing methods for removing compression noise from video mainly operate during the video compression process itself: during compression encoding, the noise generated by compression is reduced as much as possible, so that at a given compression level the video presents higher quality. Such processing cannot denoise a video that has already been compressed and damaged, so the resulting video quality is poor.
  • the embodiments of the present disclosure provide an image restoration method, device, and electronic equipment, which are used to remove noise in video compression and improve display quality.
  • Fig. 1 is a schematic diagram of one structure of the target denoising network 1.
  • the target denoising network 1 includes a single-frame network 20 and a recursive network 10.
  • the recursive network 10 includes a cascaded first convolutional layer 101, a first feature concatenation layer 102 and a first sampling layer 103.
  • there may be multiple instances of each layer in the recursive network 10; FIG. 3 is a schematic diagram of one of the structures of the recursive network 10.
  • the first convolutional layer 101 in the recursive network 10 includes a first sub-convolutional layer 1011 and a second sub-convolutional layer 1012.
  • the first feature concatenation layer 102 includes a first sub-feature concatenation layer 1021 and a second sub-feature concatenation layer 1022.
  • the first sampling layer 103 includes a first down-sampling layer 1031 and a first up-sampling layer 1032.
  • the single-frame network 20 includes a cascaded second convolutional layer 201, a second sampling layer 202, and a second feature concatenation layer 203.
  • the second convolutional layer 201 includes a third sub-convolutional layer 2011 and a fourth sub-convolutional layer 2014.
  • the second sampling layer 202 includes a second down-sampling layer 2021 and a second up-sampling layer 2022.
  • the network structure of the single-frame network 20 is roughly the same as that of the recursive network 10.
  • the second convolutional layer 201 of the single-frame network 20 includes N third sub-convolutional layers 2011, and accordingly, the first convolutional layer 101 of the recursive network 10 also includes N first sub-convolutional layers 1011, where N is an integer greater than 1.
  • the positions of the sub-convolutional layers in the single-frame network 20 roughly correspond to those in the recursive network 10.
  • FIG. 5 is a schematic structural diagram of one configuration of the target denoising network 1, in which there are two first down-sampling layers 1031, two first up-sampling layers 1032, one second sub-convolutional layer 1012, one first sub-feature concatenation layer 1021, two second down-sampling layers 2021, two second up-sampling layers 2022, one fourth sub-convolutional layer 2014, and two second feature concatenation layers 203. The number of filters in each convolutional layer of the network is shown by the numbers above the horizontal line in FIG. 5, such as 64 and 128.
  • the convolution kernel of each convolutional layer may be 3×3 with a stride of 1, and the input of each convolutional layer is padded with a pad size of 1 so that the input and output sizes of each convolutional layer are equal; a ReLU activation function may be used to perform a nonlinear operation on the output of each convolutional layer.
  • a convolutional layer with a stride of 2 and a 3×3 convolution kernel can be used to down-sample the spatial dimensions of the feature map by a factor of 2.
  • a convolutional layer followed by a depth-to-space layer can be used to up-sample the spatial dimensions of the feature map by a factor of 2: the convolutional layer, with a 3×3 kernel and a stride of 1, expands the feature dimension of the input feature map by a factor of 4, and the depth-to-space layer converts this expansion of the feature dimension into an enlargement in the spatial dimensions.
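The depth-to-space rearrangement described above can be sketched as follows. This is a minimal NumPy illustration only: the helper name, the (N, C, H, W) layout, and the channel ordering are my assumptions, not details quoted from the patent.

```python
import numpy as np

def depth_to_space(x, block=2):
    """Rearrange a (N, C*block^2, H, W) tensor into (N, C, H*block, W*block).

    This mirrors the step described above: a convolution first expands the
    feature dimension 4x (block=2), then this layer trades that depth for a
    2x enlargement in each spatial dimension.
    """
    n, c, h, w = x.shape
    assert c % (block * block) == 0
    c_out = c // (block * block)
    # Split the channel axis into (block, block, c_out); this ordering is one
    # possible convention (frameworks differ on it).
    x = x.reshape(n, block, block, c_out, h, w)
    x = x.transpose(0, 3, 4, 1, 5, 2)  # -> (N, C_out, H, block, W, block)
    return x.reshape(n, c_out, h * block, w * block)

# Toy input: 4 channels of 2x2 become 1 channel of 4x4.
x = np.arange(16, dtype=np.float32).reshape(1, 4, 2, 2)
y = depth_to_space(x, block=2)
```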
  • feature maps of different scales extracted by the single-frame network 20 are concatenated, in the feature dimension, with the feature maps of the corresponding scale in the recursive network 10, and a convolutional layer then compresses the feature dimension.
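The concatenation-then-compression step above can be sketched in NumPy as follows. The 64/128 channel counts echo the filter numbers mentioned for FIG. 5; the 1×1 convolution (written as an einsum over the channel axis) is one simple way to compress the feature dimension, not necessarily the exact layer used in the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same-scale feature maps from the single-frame branch and the recursive
# branch, in (N, C, H, W) layout; names are illustrative.
single_frame_feat = rng.standard_normal((1, 64, 32, 32))
recursive_feat = rng.standard_normal((1, 64, 32, 32))

# Concatenate in the feature (channel) dimension: 64 + 64 -> 128 channels.
concat = np.concatenate([single_frame_feat, recursive_feat], axis=1)

# A 1x1 convolution compressing the feature dimension back to 64 channels;
# its weight matrix is (C_out, C_in), applied independently at every pixel.
w = rng.standard_normal((64, 128)) * 0.01
compressed = np.einsum('oc,nchw->nohw', w, concat)
```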
  • the target denoising network 1, constituted by the single-frame network 20 and the recursive network 10, performs image restoration on any frame of the video to be processed; the specific processing performed by each layer of the target denoising network 1 will be described later and is not detailed here.
  • FIG. 6 is a method flowchart of an image restoration method provided by an embodiment of the present disclosure. Specifically, the image restoration method includes:
  • S101 Input the image to be processed into a target denoising network, where the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
  • the video to be processed may be a compressed video.
  • for example, the original video has a frame rate of 30 and a bit rate of 100M, and the video to be processed is obtained by compressing this source video to a bit rate of 2M.
  • the single-frame network and the recursive network in the target denoising network can have the same encoding and decoding structure, and both can be RNN network models.
  • the target denoising network may be a trained network.
  • the image to be processed is any frame of image in the video to be processed.
  • S102 Perform decompression noise processing on the image to be processed through a single-frame network, and output the first image;
  • S103 Perform decompression noise processing on the image to be processed through the recursive network according to the content of the previous frame image, and output a second image, where the previous frame image is the previous frame of the image to be processed in the video to be processed;
  • the content of the previous frame image may be the semantic information of the previous frame image, and the semantic information may be a feature map extracted from the previous frame image through each convolutional layer in the recursive network.
  • the recursive network can perform decompression noise processing on the image to be processed, thereby outputting the second image.
  • the recursive network can perform decompression noise processing based on the connection between the previous and next frames in the video to be processed.
  • the connection between the previous and next frames may be a motion relationship, based on which the second image is output. Since the second image is obtained after decompression noise processing that uses the connection between the previous and next frame images, the second image has a better display effect.
  • S104 Perform a weighted summation on the first image and the second image, and output a denoising image for the image to be processed.
  • the first image and the second image may be weighted and summed, and the weighted sum is used as the denoising image for the image to be processed. Since the single-frame network directly denoises the image to be processed, it achieves a stronger degree of denoising; in addition, by exploiting the relationship between the previous and next frames, the recursive network denoises the image to be processed while preserving the details of those frames, which helps to ensure the quality of the video.
  • the weighted summation of the first image P1 and the second image P2 is a×P1+b×P2; accordingly, the image to be processed is denoised through the target denoising network.
  • regarding the weighting coefficients of the first image P1 and the second image P2: for example, when a>b, the denoising of the image to be processed is stronger; when a<b, the details of the image to be processed are more consistent with those of the previous frame image and the display effect is better.
  • those skilled in the art can set the weighting coefficients of the first image P1 and the second image P2 according to actual application requirements, which is not limited here.
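The weighted summation a×P1+b×P2 can be sketched as follows (a toy NumPy illustration; the function name and the particular coefficient values are my assumptions):

```python
import numpy as np

def fuse_outputs(p1, p2, a=0.5, b=0.5):
    """Weighted sum of the single-frame output P1 and recursive output P2.

    As discussed above, a > b favors denoising strength, while a < b keeps
    the result more consistent with the previous frame's details.
    """
    return a * p1 + b * p2

p1 = np.full((2, 2), 0.8)   # toy "first image" from the single-frame network
p2 = np.full((2, 2), 0.4)   # toy "second image" from the recursive network
out = fuse_outputs(p1, p2, a=0.75, b=0.25)
```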
  • step S103, performing decompression noise processing on the image to be processed through the recursive network according to the content of the previous frame image and outputting a second image, where the previous frame image is the frame preceding the image to be processed in the video to be processed, includes:
  • decompression noise processing is performed on the image to be processed, and the second image is output.
  • the first convolutional layer may be one or more layers
  • the first feature concatenation layer may be one or more layers
  • the first sampling layer may be one or more layers.
  • the step of performing decompression noise processing on the image to be processed through the cascaded first convolutional layer, first feature concatenation layer, and first sampling layer in the recursive network and outputting the second image includes:
  • S201 Receive, through the first sub-feature concatenation layer, the first feature map of the image to be processed extracted by the third sub-convolution layer in each second convolution layer in the single-frame network, and obtain the recursion through the first sub-feature concatenation layer The second feature map extracted from the previous frame image by the first subconvolution layer corresponding to each third subconvolution layer in the network;
  • S202 Perform a series operation on the first feature map and the second feature map through the first sub-feature series layer to obtain a series feature map
  • S203 Compress the concatenated feature maps through the first subconvolution layer to obtain a compressed feature map, where the compressed feature map is a second feature map extracted from the image to be processed through each first subconvolution layer;
  • S204 Extract feature maps of multiple spatial sizes from the compressed feature map through the first down-sampling layer in the first sampling layer;
  • S205 Determine, through the first up-sampling layer in the first sampling layer, feature maps of the same spatial sizes as the multiple spatial sizes;
  • S206 splicing feature maps of the same spatial size in feature dimensions through the second sub-feature series layer to obtain a first splicing feature map
  • S207 Process the first stitched feature map through the second sub-convolutional layer, and output a second image.
  • the specific implementation of step S201 to step S207 is as follows:
  • the first feature map of the image to be processed, extracted by the third sub-convolutional layer in each second convolutional layer of the single-frame network, is received through the first sub-feature concatenation layer, and the second feature map, extracted from the previous frame image by the first sub-convolutional layer corresponding to each third sub-convolutional layer in the recursive network, is acquired through the first sub-feature concatenation layer. There may be multiple third sub-convolutional layers in the single-frame network and, correspondingly, multiple first sub-convolutional layers in the recursive network, and each first sub-convolutional layer corresponding to a third sub-convolutional layer extracts the second feature map from the previous frame image.
  • any one of the multiple third subconvolutional layers can extract a corresponding feature map from the image to be processed.
  • any one of the multiple first subconvolutional layers can extract the corresponding feature map from the previous frame image.
  • the first feature map and the second feature map are serially operated through the first sub-feature series layer to obtain a series feature map, the series feature map It includes the feature relationship between the previous and next frame images; then, the concatenated feature map is compressed through the first subconvolutional layer to obtain the compressed feature map.
  • the compressed feature map may serve as the second feature map extracted from the image to be processed by each first sub-convolutional layer; for example, when the target denoising network denoises the frame following the image to be processed, the compressed feature map is the second feature map that the first sub-convolutional layer extracted from the image to be processed.
  • there may be multiple first down-sampling layers; each down-sampling layer extracts a feature map of a corresponding spatial size, so the multiple down-sampling layers extract feature maps of different spatial sizes, any two of which differ.
  • the feature maps of two spatial sizes are two feature maps of different spatial sizes.
  • the feature maps of three spatial sizes are three feature maps of different spatial sizes. In this way, multiple first down-sampling layers are used to process the compressed feature maps in different spatial sizes.
  • through the first up-sampling layer, feature maps of the same spatial sizes as the multiple spatial sizes are determined; for example, feature maps of the same spatial sizes as the three spatial sizes are determined.
  • the feature maps of the same spatial size are spliced in the feature dimension to obtain the first spliced feature map.
  • the first stitched feature map is processed through the second sub-convolutional layer, so as to output the second image.
  • the output of the second image is realized through the processing between the first convolutional layer, the first feature concatenation layer, and the first sampling layer that are cascaded in the recursive network.
  • step S102 performing decompression and noise processing on the image to be processed through a single-frame network, and outputting the first image, includes:
  • S302 Extract feature maps of multiple spatial sizes from the first feature map through the second down-sampling layer
  • S304 splicing feature maps of the same spatial size in feature dimensions through the second feature series layer to obtain a second splicing feature map
  • S305 Process the second stitched feature map through the fourth sub-convolutional layer, and output the first image.
  • the specific implementation of step S301 to step S305 is as follows:
  • after each third sub-convolutional layer in the single-frame network extracts the first feature map, feature maps of multiple spatial sizes are extracted from the first feature map through the second down-sampling layers in the single-frame network; there may be multiple second down-sampling layers, each extracting a feature map of a corresponding spatial size, so the extracted feature maps have different spatial sizes, any two of which differ. Then, through the second up-sampling layer in the single-frame network, feature maps of the same spatial sizes as the multiple spatial sizes are determined.
  • the second down-sampling layer extracts two feature maps of different spatial sizes from the first feature map
  • the second up-sampling layer determines the feature maps of the same spatial size as the two different spatial sizes.
  • the feature maps of the same spatial size are spliced in the feature dimension through the second feature concatenation layer in the single-frame network to obtain the second spliced feature map.
  • the second stitched feature map is processed through the fourth sub-convolutional layer in the single-frame network, and the first image is output.
  • step S101 before inputting the image to be processed into the target denoising network, the method further includes:
  • S401 Acquire multiple sets of image frame sequences, and each set of image frame sequences includes multiple images
  • S402 Encode multiple groups of image frame sequences into true-value video and simulation video, respectively, where each frame of the simulation image in the simulation video contains compression noise;
  • S403 Input each frame of the simulation image in the simulation video into the denoising network to be trained, and output the simulation denoising image of the corresponding frame;
  • S404 Determine the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true value image of the corresponding frame in the true value video;
  • S405 Use the network obtained when the first loss function falls below the first preset threshold as the target denoising network.
  • the specific implementation of step S401 to step S405 is as follows:
  • Each set of image frame sequences includes multiple images.
  • the training set provided by the AIM competition hosted by ICCV-2019 is used as training data.
  • the training set includes a total of 240 sets of frame sequences.
  • each frame sequence contains 181 clear images of 1280×720, with which the target denoising network is trained.
  • the training set is processed as follows.
  • the multiple sets of image frame sequences are respectively encoded into true-value videos and simulation videos, where each frame of the simulation videos contains compression noise. For example, using ffmpeg, the above 240 sets of frame sequences are encoded into MP4 videos as the true-value videos of the training set, where the encoding format is H.264, the frame rate is 25, and the bit rate is about 130M.
  • Use ffmpeg to encode the above 240 groups of frame sequences in H.264, with a frame rate of 25 and a bit rate compressed to about 2M, to generate simulation video containing compression noise and artifacts.
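The two encodes above might be invoked roughly as follows. This sketch only builds the command strings: the input frame pattern and output file names are illustrative, and while `-framerate`, `-c:v libx264`, `-r`, and `-b:v` are standard ffmpeg options, the exact command lines are not quoted from the patent.

```python
frames_pattern = "frames/%03d.png"   # illustrative input image-sequence path

def ffmpeg_cmd(output, bitrate):
    # H.264 encode of an image sequence at frame rate 25 and a target bitrate.
    return (f"ffmpeg -framerate 25 -i {frames_pattern} "
            f"-c:v libx264 -r 25 -b:v {bitrate} {output}")

ground_truth_cmd = ffmpeg_cmd("gt.mp4", "130M")    # near-lossless reference
degraded_cmd = ffmpeg_cmd("degraded.mp4", "2M")    # compression-noise input
```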
  • each frame of the simulation image in the simulation video is input to the denoising network to be trained, and the simulation denoising image of the corresponding frame is output.
  • the first loss function for the denoising network to be trained is determined, and the network obtained when the first loss function falls below the first preset threshold is used as the target denoising network.
  • those skilled in the art can set the specific value of the first preset threshold according to actual application needs, which is not limited here.
  • in step S404, the first loss function for the denoising network to be trained is determined according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video; depending on how the first prediction deviation compares with the set deviation value δ, there are the following two cases:
  • the first case is that if the first prediction deviation between the simulated denoising image and the true value image of the corresponding frame in the true value video is less than or equal to ⁇ , the L2 loss function is used;
  • the second case is that if the first prediction deviation between the simulated denoising image and the true value image of the corresponding frame in the true value video is greater than ⁇ , the L1 loss function is used;
  • f(x) represents the simulated denoising image
  • y represents the true value image of the frame corresponding to the simulated denoising image in the true value video.
  • the set deviation value ⁇ may be 1.
  • those skilled in the art can set the value of ⁇ according to actual applications, which is not limited here.
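The L1/L2 formulas themselves appear only as image placeholders in this text; assuming the conventional reading — squared error inside the δ band, absolute error outside it — the switching loss can be sketched as:

```python
def huber_like_loss(fx, y, delta=1.0):
    """Piecewise loss used to train the denoising network (sketch).

    Uses the squared (L2) error when the prediction deviation is at most
    `delta`, and the absolute (L1) error when it is larger. The exact
    formulas in the source are image placeholders, so this standard
    Huber-style switch is an assumption, not the patented formula.
    """
    diff = fx - y
    if abs(diff) <= delta:
        return diff * diff   # L2 branch, small deviations
    return abs(diff)         # L1 branch, large deviations
```

In practice the same switch would be applied per pixel and averaged over the image; here a scalar suffices to show the branching at δ.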
  • After step S404 (determining the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video), the method further includes:
  • S501 When the first loss function is lower than the first preset threshold, sharpen each frame of the true-value image in the true-value video to obtain an edge-enhanced true-value video;
  • S502 Determine the second prediction deviation between the corresponding frame image in the simulated denoising image and the edge-enhanced true value video, and determine the second loss function for the denoising network to be trained;
  • S503 Use the network corresponding to when the second loss function is lower than the second preset threshold as the target denoising network.
  • The specific implementation of steps S501 to S503 is as follows:
  • the Adam optimization algorithm can be used to optimize the network parameters of the denoising network to be trained; in the initial stage of training, the true-value video and the corresponding simulation video are used for training.
  • when the first loss function falls below the first preset threshold, the partially trained denoising network can already restore the image content fairly completely; at this point, each frame of the image in the true-value video is sharpened and then used as the objective function y in the loss function, and training of the denoising network continues.
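The sharpening of ground-truth frames described above can be illustrated with a simple unsharp-mask pass; the 3×3 box blur and the `amount` parameter are illustrative assumptions, since the text does not specify the sharpening operator:

```python
def sharpen(img, amount=1.0):
    """Unsharp-mask sharpening of a grayscale image given as nested lists.

    The source only says each ground-truth frame is 'sharpened' before the
    second training stage; the 3x3 box blur and the unsharp-mask formula
    out = img + amount * (img - blur(img)) are illustrative assumptions.
    """
    h, w = len(img), len(img[0])

    def blur(i, j):
        # 3x3 box blur with edge clamping
        vals = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        return sum(vals) / 9.0

    return [[img[i][j] + amount * (img[i][j] - blur(i, j))
             for j in range(w)] for i in range(h)]
```

Applied to every frame of the true-value video, this boosts edges so that the second-stage loss pulls the network toward crisper reconstructions.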
  • the second prediction deviation between the simulated denoising image and the corresponding frame image in the edge-enhanced true-value video is determined, and the second loss function for the denoising network to be trained is determined.
  • the second loss function is determined using the same formula as the first loss function, which will not be described in detail here.
  • the network corresponding to when the second loss function is lower than the second preset threshold is used as the target denoising network.
  • because the ground-truth video is first edge-enhanced and the denoising network is then trained against it, the blur of the denoised image is reduced and its sharpness is effectively improved, so image details are restored better and the quality of the reconstructed image is raised.
  • to improve training efficiency, each input image frame sequence can be cut into blocks, with each image block sized 256×256, so that a whole 1280×720 image is cut into 15 patches, which form one batch.
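One way the 15-patch tiling of a 1280×720 frame could be laid out is a stride-256 grid with the last row and column shifted to the image border; the layout is not specified in the text, so this tiling is only an assumption:

```python
def patch_origins(width, height, patch=256):
    """Top-left corners of patch-sized crops covering the frame.

    Uses a stride equal to the patch size and shifts the final row/column
    so the image border is covered. The source does not state how the 15
    patches of a 1280x720 frame are laid out; this grid is illustrative.
    """
    def axis(n):
        xs = list(range(0, n - patch + 1, patch))
        if xs[-1] != n - patch:
            xs.append(n - patch)  # shifted last crop to reach the border
        return xs

    return [(x, y) for y in axis(height) for x in axis(width)]
```

For a 1280×720 frame this yields 5 columns × 3 rows = 15 origins, matching the batch size quoted above.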
  • the network learning rate can be set to 10^(-4) with a decay coefficient of 0.8: after each training epoch, the learning rate decays to 0.8 times its previous value, thereby improving the stability of network training.
  • the number of epochs can be set to 100; the network was trained for 100 epochs in total, and during the last 10 epochs the model obtained after each epoch no longer changed significantly.
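The schedule above (initial rate 10^(-4), multiplied by 0.8 after every epoch) is a plain exponential decay:

```python
def learning_rate(epoch, base_lr=1e-4, decay=0.8):
    """Learning rate at the start of the given epoch (0-indexed).

    Implements the schedule described in the text: 1e-4 initially,
    multiplied by 0.8 after each completed epoch.
    """
    return base_lr * decay ** epoch
```

By epoch 100 the rate has shrunk to roughly 2×10^(-14) of the initial value, which is consistent with the observation that the model stops changing noticeably over the last epochs.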
  • various parameters for training the denoising network to be trained can also be set according to actual application needs, which are not limited here.
  • the problem-solving principle of the target denoising network used for image restoration is similar to that of the aforementioned image restoration method, so the implementation of the target denoising network can refer to the implementation of that method; repeated content is not described again.
  • an embodiment of the present disclosure also provides a device for image restoration, including:
  • the input unit 100 is configured to input the image to be processed into a target denoising network, where the target denoising network includes a single frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
  • the first processing unit 200 is configured to perform decompression noise processing on the image to be processed through a single-frame network, and output a first image;
  • the second processing unit 300 is configured to perform decompression-noise processing on the image to be processed through the recursive network according to the content of the previous frame image, and output a second image, where the previous frame image is the frame preceding the image to be processed in the video to be processed;
  • the output unit 400 is configured to perform a weighted summation of the first image and the second image, and output a denoised image for the image to be processed.
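The fusion performed by the output unit 400 is a per-pixel weighted sum of the two branch outputs whose weights sum to 1 (a and b with a + b = 1 elsewhere in the description); a minimal sketch, where the nested-list images and the placeholder a = 0.5 are illustrative assumptions:

```python
def fuse(first_img, second_img, a=0.5):
    """Per-pixel weighted sum of the single-frame and recursive outputs.

    `a` weights the single-frame result and `1 - a` the recursive result
    (a + b = 1 per the description). The concrete value of `a` is
    application-dependent, so a = 0.5 here is only a placeholder.
    """
    b = 1.0 - a
    return [[a * p1 + b * p2 for p1, p2 in zip(r1, r2)]
            for r1, r2 in zip(first_img, second_img)]
```

A larger `a` favors stronger per-frame denoising; a larger `b` favors temporal consistency with the previous frame, mirroring the trade-off discussed in the method description.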
  • the second processing unit 300 is used for:
  • perform decompression-noise processing on the image to be processed through the cascaded first convolutional layer, first feature concatenation layer, and first sampling layer in the recursive network, and output the second image.
  • the second processing unit 300 is used for:
  • receive, through the first sub-feature concatenation layer, the first feature maps of the image to be processed extracted by the third sub-convolutional layers in the second convolutional layers of the single-frame network, and obtain, through the first sub-feature concatenation layer, the second feature maps extracted from the previous frame image by the first sub-convolutional layers of the recursive network corresponding to the third sub-convolutional layers;
  • concatenate the first feature maps and the second feature maps through the first sub-feature concatenation layer to obtain concatenated feature maps;
  • compress the concatenated feature maps through the first sub-convolutional layers to obtain compressed feature maps, the compressed feature maps being the second feature maps extracted from the image to be processed by the first sub-convolutional layers;
  • extract feature maps of multiple spatial sizes from the compressed feature maps through the first down-sampling layer in the first sampling layer;
  • determine, through the first up-sampling layer, feature maps matching each of the multiple spatial sizes;
  • stitch the feature maps of the same spatial size along the feature dimension through the second sub-feature concatenation layer to obtain a first stitched feature map;
  • process the first stitched feature map through the second sub-convolutional layer, and output the second image.
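The concatenate-then-compress step above can be sketched as follows; the nested-list feature maps and the caller-supplied 1×1 kernel are illustrative assumptions (a real implementation would use a deep-learning framework with learned weights):

```python
def concat_and_compress(feats_a, feats_b, weights):
    """Channel-wise concatenation followed by a 1x1 convolution (sketch).

    feats_a / feats_b: feature maps as [channel][row][col] nested lists,
    e.g. the single-frame branch's features and the previous-frame
    features held by the recursive branch. `weights[c_out][c_in]` is a
    1x1 kernel compressing the concatenated channels; here the weights
    are caller-supplied placeholders rather than learned parameters.
    """
    stacked = feats_a + feats_b  # concatenation along the feature axis
    h, w = len(stacked[0]), len(stacked[0][0])
    return [[[sum(wrow[c] * stacked[c][i][j] for c in range(len(stacked)))
              for j in range(w)] for i in range(h)]
            for wrow in weights]
```

The number of output rows in `weights` sets the compressed channel count, which is how the first sub-convolutional layer shrinks the doubled feature dimension back down after concatenation.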
  • the first processing unit 200 is used to:
  • extract the first feature map of the image to be processed through each third sub-convolutional layer;
  • extract feature maps of multiple spatial sizes from the first feature map through the second down-sampling layer;
  • determine, through the second up-sampling layer, feature maps matching each of the multiple spatial sizes;
  • stitch the feature maps of the same spatial size along the feature dimension through the second feature concatenation layer to obtain a second stitched feature map;
  • process the second stitched feature map through the fourth sub-convolutional layer, and output the first image.
  • before the input unit 100 inputs the image to be processed into the target denoising network, the apparatus for image restoration further includes:
  • a training unit, which is used to:
  • obtain multiple sets of image frame sequences, each set including multiple images;
  • encode the multiple sets of image frame sequences into a true-value video and a simulation video respectively, where each frame of the simulation image in the simulation video contains compression noise;
  • input each frame of the simulation image in the simulation video into the denoising network to be trained, and output the simulated denoising image of the corresponding frame;
  • determine the first loss function for the denoising network to be trained according to the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video;
  • take the network at the point where the first loss function falls below the first preset threshold as the target denoising network.
  • the training unit is used to:
  • if the first prediction deviation between the simulated denoising image and the true-value image of the corresponding frame in the true-value video is less than or equal to δ, the L2 loss function is adopted; if that deviation is greater than δ, the L1 loss function is adopted;
  • f(x) represents the simulated denoising image
  • y represents the true value image of the frame corresponding to the simulated denoising image in the true value video.
  • the training unit is also used to:
  • sharpen each frame of the true-value image in the true-value video when the first loss function is lower than the first preset threshold, to obtain an edge-enhanced true-value video;
  • determine the second prediction deviation between the simulated denoising image and the corresponding frame image in the edge-enhanced true-value video, and determine the second loss function for the denoising network to be trained;
  • take the network at the point where the second loss function falls below the second preset threshold as the target denoising network.
  • the problem-solving principle of the device for image restoration is similar to that of the aforementioned image restoration method, so the implementation of the device can refer to the implementation of that method; repeated content is not described in detail again.
  • an embodiment of the present disclosure provides an electronic device for image restoration, including a memory and a processor:
  • the memory 2 is used to store programs
  • the processor 3 is configured to execute the program in the memory 2, and includes the following steps:
  • input the image to be processed into a target denoising network, where the target denoising network includes a single-frame network and a recursive network, and the image to be processed is any frame in the video to be processed;
  • perform decompression-noise processing on the image to be processed through the single-frame network, and output a first image;
  • according to the content of the previous frame image, perform decompression-noise processing on the image to be processed through the recursive network, and output a second image, where the previous frame image is the frame preceding the image to be processed in the video to be processed;
  • perform a weighted summation of the first image and the second image, and output a denoised image for the image to be processed.
  • the processor 3 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present disclosure.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method for image restoration disclosed in the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory 2, and the processor 3 reads the information in the memory 2, and completes the steps of the signal processing flow in combination with its hardware.
  • the processor 3 is configured to read a program in the memory 2 and execute any step of the above-mentioned image restoration method.
  • the embodiments of the present disclosure also provide a computer-readable storage medium that stores computer instructions; when executed by a processor, the stored computer instructions can implement the steps of the image restoration method above.
  • the embodiments of the present disclosure provide a method, device, and electronic equipment for image restoration, wherein the method inputs any frame of the to-be-processed image in the to-be-processed video into a target denoising network composed of a single-frame network and a recursive network.
  • the single-frame network performs decompression noise processing on the image to be processed and outputs the first image.
  • according to the content of the previous frame image of the image to be processed in the video to be processed, the recursive network performs decompression-noise processing on the image to be processed and outputs the second image; the first image and the second image are then weighted and summed, and the denoised image for the current frame is output.
  • that is, any frame of the image to be processed in the video to be processed is denoised jointly with its previous frame image, so the compression noise in every frame of the video is removed and the display quality is improved.
  • moreover, because the relationship between adjacent frames is exploited throughout the decompression-noise process, motion compensation between frames can be achieved, thereby further improving video quality.
  • the embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

A method, an apparatus and an electronic device for image restoration, the method including: inputting an image to be processed into a target denoising network, where the target denoising network includes a single-frame network and a recursive network, and the image to be processed is any frame in a video to be processed (S101); performing decompression-noise processing on the image to be processed through the single-frame network, and outputting a first image (S102); performing decompression-noise processing on the image to be processed through the recursive network according to the content of a previous frame image, and outputting a second image, where the previous frame image is the frame preceding the image to be processed in the video to be processed (S103); and performing a weighted summation of the first image and the second image, and outputting a denoised image for the image to be processed (S104). The method is used to remove noise introduced by video compression and improve display quality.

Description

图像修复的方法、装置及电子设备
相关申请的交叉引用
本公开要求在2020年06月22日提交中国专利局、申请号为202010574404.7、申请名称为“一种图像修复的方法、装置及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及图像处理技术领域,特别涉及一种图像修复的方法、装置及电子设备。
背景技术
为了避免视频占用较大的存储空间,以及提高传输速度,常常需要将视频进行压缩,然而,压缩过程中难免产生各种噪声,进而影响显示效果。
发明内容
第一方面,本公开实施例提供了一种图像修复的方法,包括:
将待处理图像输入目标去噪网络,其中,所述目标去噪网络包括单帧网络和递归网络,所述待处理图像为待处理视频中的任一帧;
通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像;
根据先前帧图像的内容,通过所述递归网络对所述待处理图像进行去压缩噪声处理,输出第二图像,其中,所述先前帧图像为所述待处理图像在所述待处理视频中的前一帧图像;
将所述第一图像和第二图像进行加权求和,输出针对所述待处理图像的去噪图像。
在一种可能的实现方式中,所述根据先前帧图像的内容,通过所述递归 网络对所述待处理图像进行去压缩噪声处理,输出第二图像,包括:
通过所述递归网络中级联的第一卷积层、第一特征串联层、第一采样层对所述待处理图像进行去压缩噪声处理,输出第二图像。
在一种可能的实现方式中,所述递归网络中的所述第一卷积层包括第一子卷积层和第二子卷积层,所述第一特征串联层包括第一子特征串联层和第二子特征串联层,所述第一采样层包括第一下采样层和第一上采样层;
所述通过所述递归网络中级联的第一卷积层、第一特征串联层、第一采样层对所述待处理图像进行去压缩噪声处理,输出第二图像,包括:
通过所述第一子特征串联层接收由所述单帧网络中各第二卷积层中第三子卷积层所提取的所述待处理图像的第一特征图,以及通过所述第一子特征串联层获取所述递归网络中与各所述第三子卷积层对应的所述第一子卷积层从所述先前帧图像中所提取的第二特征图;
通过所述第一子特征串联层对所述第一特征图和所述第二特征图进行串联操作,获得串联特征图;
通过所述第一子卷积层对所述串联特征图进行压缩,获得压缩后的特征图,所述压缩后的特征图为通过各所述第一子卷积层从所述待处理图像中所提取的所述第二特征图;
通过所述第一采样层中的第一下采样层,从所述压缩后的特征图中提取多个空间尺寸的特征图;
通过所述第一上采样层,确定与所述多个空间尺寸中相同空间尺寸的特征图;
通过所述第二子特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第一拼接特征图;
通过所述第二子卷积层对所述第一拼接特征图进行处理,输出所述第二图像。
在一种可能的实现方式中,所述单帧网络包括级联的第二卷积层、第二采样层和第二特征串联层,所述第二卷积层包括第三子卷积层和第四子卷积 层,所述第二采样层包括第二下采样层和第二上采样层;
所述通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像,包括:
通过各所述第三子卷积层提取所述待处理图像的第一特征图;
通过所述第二下采样层,从所述第一特征图中提取多个空间尺寸的特征图;
通过所述第二上采样层,确定与所述多个空间尺寸中相同空间尺寸的特征图;
通过所述第二特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第二拼接特征图;
通过所述第四子卷积层对所述第二拼接特征图进行处理,输出所述第一图像。
在一种可能的实现方式中,在所述将待处理图像输入目标去噪网络之前,所述方法还包括:
所述目标去噪网络的训练过程,具体执行:
获取多组图像帧序列,每组图像帧序列包括多幅图像;
将所述多组图像帧序列分别编码成真值视频以及仿真视频,其中,所述仿真视频中的每帧仿真图像中包含有压缩噪声;
将所述仿真视频中每帧仿真图像输入待训练去噪网络,输出对应帧的仿真去噪图像;
根据所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差,确定针对所述待训练去噪网络的第一损失函数;
将所述第一损失函数低于第一预设阈值时所对应的网络,作为所述目标去噪网络。
在一种可能的实现方式中,所述根据所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差,确定针对所述待训练去噪网络的第一损失函数,包括:
若所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差小于或者等于δ时,则采用L2损失函数;
若所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差大于δ时,则采用L1损失函数;
所述L2损失函数对应的公式为:
Figure PCTCN2021095778-appb-000001
所述L1损失函数对应的公式为:
Figure PCTCN2021095778-appb-000002
其中,f(x)表示仿真去噪图像,y表示所述真值视频中与所述仿真去噪图像对应帧的真值图像。
在一种可能的实现方式中,在所述根据所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差,确定针对所述待训练去噪网络的第一损失函数之后,所述方法还包括:
在所述第一损失函数低于所述第一预设阈值时,对所述真值视频中每帧真值图像进行锐化处理,获得边缘增强后的真值视频;
确定所述仿真去噪图像与所述边缘增强后的真值视频中对应帧图像间的第二预测偏差,确定针对所述待训练去噪网络的第二损失函数;
将所述第二损失函数低于第二预设阈值时所对应的网络,作为所述目标去噪网络。
第二方面,本公开实施例还提供了一种用于图像修复的装置,包括:
输入单元,用于将待处理图像输入目标去噪网络,其中,所述目标去噪网络包括单帧网络和递归网络,所述待处理图像为待处理视频中的任一帧;
第一处理单元,用于通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像;
第二处理单元,用于根据先前帧图像的内容,通过所述递归网络对所述待处理图像进行去压缩噪声处理,输出第二图像,其中,所述先前帧图像为所述待处理图像在所述待处理视频中的前一帧图像;
输出单元,用于将所述第一图像和第二图像进行加权求和,输出针对所述待处理图像的去噪图像。
第三方面,本公开实施例提供了一种用于图像修复的电子设备,包括:
存储器和处理器;
其中,所述存储器用于存储程序;
所述处理器用于执行所述存储器中的程序,包括如下步骤:
将待处理图像输入目标去噪网络,其中,所述目标去噪网络包括单帧网络和递归网络,所述待处理图像为待处理视频中的任一帧;
通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像;
根据先前帧图像的内容,通过所述递归网络对所述待处理图像进行去压缩噪声处理,输出第二图像,其中,所述先前帧图像为所述待处理图像在所述待处理视频中的前一帧图像;
将所述第一图像和第二图像进行加权求和,输出针对所述待处理图像的去噪图像。
第四方面,本公开实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,存储的所述计算机指令被处理器执行时能够实现如上面所述的图像修复的方法。
附图说明
图1为本公开实施例提供的目标去噪网络的其中一种结构示意图;
图2为本公开实施例提供的递归网络的其中一种结构示意图;
图3为本公开实施例提供的递归网络的其中一种结构示意图;
图4为本公开实施例提供的单帧网络的其中一种结构示意图;
图5为本公开实施例提供的目标去噪网络的其中一种结构示意图;
图6为本公开实施例提供的一种图像修复方法的方法流程图;
图7为本公开实施例提供的一种图像修复方法中步骤S103的其中一种方 法流程图;
图8为本公开实施例提供的一种图像修复方法中步骤S102的方法流程图;
图9为本公开实施例提供的一种图像修复方法中在步骤S101之前的方法流程图;
图10为本公开实施例提供的一种图像修复方法中在步骤S404之后的方法流程图;
图11为本公开实施例提供的一种用于图像修复的装置的结构示意图;
图12为本公开实施例提供的一种用于图像修复的电子设备的结构示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例的附图,对本公开实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。并且在不冲突的情况下,本公开中的实施例及实施例中的特征可以相互组合。基于所描述的本公开的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。
现有针对视频去除压缩噪声的方法,主要是在视频压缩过程中对压缩噪声进行去除,具体是,在视频压缩编码过程中,尽可能地减少因为压缩所产生的噪声,其主要是在同等压缩程度下,令视频呈现更高的质量。整个处理过程无法实现对已经压缩受损的视频进行去噪,视频品质较差。
鉴于此,本公开实施例提供了一种图像修复的方法、装置及电子设备,用于去除视频压缩中的噪声,提高显示品质。
在介绍本公开实施例所提供技术方案之前,首先对本公开实施例中用于图像修复的目标去噪网络的具体结构进行简单的描述。
如图1所示为目标去噪网络1的其中一种结构示意图,具体来讲,该目 标去噪网络1包括单帧网络20和递归网络10。
其中,如图2所示为递归网络10的其中一种结构示意图,该递归网络10包括级联的第一卷积层101、第一特征串联层102和第一采样层103。在具体实施过程中,递归网络10中的各层结构可以是多个,如图3所示为递归网络10的其中一种结构示意图,具体来讲,该递归网络10中第一卷积层101包括第一子卷积层1011和第二子卷积层1012,第一特征串联层102包括第一子特征串联层1021和第二子特征串联层1022,第一采样层103包括第一下采样层1031和第一上采样层1032。如图4所示为单帧网络20包括级联的第二卷积层201、第二采样层202和第二特征串联层203,第二卷积层201包括第三子卷积层2011和第四子卷积层2014,第二采样层202包括第二下采样层2021和第二上采样层2022。
在具体实施过程中,单帧网络20和递归网络10的网络结构大体相同,比如,单帧网络20的第二卷积层201中包括N个第三子卷积层2011,则相应地,递归网络10的第一卷积层101中也包括N个第一子卷积层1011,其中,N为大于1的整数,此外,单帧网络20与递归网络10中各子卷积层设置的位置也大体相同。第一下采样层1031可以是多个,相应地,第一上采样层1032也可以是多个,比如,有两个第一下采样层1031,则相应地有两个第一上采样层1032。
在本公开实施例中,如图5所示为目标去噪网络1的其中一种结构示意图,具体来讲,第一下采样层1031为两个,第一上采样层1032包括的上采样层为两个,第二子卷积层1012为一个,第一子特征串联层1021为一个,第二下采样层2021为两个,第二上采样层2022为两个,第四子卷积层2014为一个,第二特征串联层203为两个时的其中一种结构示意图,其中,网络中各卷积层滤波器数量如图5中横线上方的数字所示,比如64、128。
在具体实施过程中,各卷积层的卷积核的尺寸均可以为3×3,步长stride为1,对各卷积层的输入做pad size为1的0填充,从而保证各卷积层的输入输出尺寸相等,在经各卷积层输出后可以使用relu激活函数对输出做非线性 运算。对各下采样层可以使用步长stride为2的卷积层对特征图的空间维度进行2倍下采样,卷积核尺寸为3×3。对各上采样层可以采用卷积层和深度到空间depth to space层对特征图的空间维度进行2倍上采样,首先,卷积层将输入的特征图的特征维度扩大为原来的4倍,卷积核尺寸为3×3,步长为1,然后,depth to space层将特征图的特征维度的扩张转换为空间维度上的放大。对于各特征串联层,主要是从单帧网络20中提取不同尺度的特征图与递归网络10对应尺度的特征图在特征维度上做串联操作,后接一个卷积层对特征维度进行压缩。
在具体实施过程中,通过单帧网络20和递归网络10所构成的目标去噪网络1对待处理视频中的任一帧待处理图像进行图像修复,该目标去噪网络1中各层具体的处理过程将在后续进行描述,在此不再详述。
如图6所示为本公开实施例提供的一种图像修复方法的方法流程图,具体来讲,该图像修复的方法包括:
S101:将待处理图像输入目标去噪网络,其中,目标去噪网络包括单帧网络和递归网络,待处理图像为待处理视频中的任一帧;
在具体实施过程中,待处理视频可以是经压缩后的视频,比如,原视频帧率为30,码率为100M,待处理视频为将其该源视频压缩为码率为2M的视频。目标去噪网络中的单帧网络和递归网络可以是同样的编码解码结构,二者都可以是RNN网络模型。在具体实施过程中,该目标去噪网络可以是经训练好的网络。待处理图像为待处理视频中的任意一帧图像。
S102:通过单帧网络对待处理图像进行去压缩噪声处理,输出第一图像;
S103:根据先前帧图像的内容,通过递归网络对待处理图像进行去压缩噪声处理,输出第二图像,其中,先前帧图像为待处理图像在待处理视频中的前一帧图像;
在具体实施过程中,先前帧图像的内容可以是先前帧图像的语义信息,该语义信息可以是通过递归网络中的各卷积层从先前帧图像中所提取出的特征图。根据先前帧图像的内容,递归网络可以对待处理图像进行去压缩噪声 处理,从而输出第二图像。也就是说,递归网络可以根据待处理视频中的前后帧间的联系来进行去压缩噪声处理,比如,前后帧间的联系可以是运动联系,进而输出第二图像。由于第二图像为利用前后帧图像之间的联系进行去压缩噪声处理之后所获得图像,所获得的第二图像显示效果更好。
S104:将第一图像和第二图像进行加权求和,输出针对待处理图像的去噪图像。
在具体实施过程中,在输出第一图像和第二图像之后,可以是将第一图像和第二图像进行加权求和,将加权求和后的图像作为针对待处理图像的去噪图像。由于单帧网络直接对待处理图像进行去噪,去噪程度更好。此外,由于基于前后帧间的关系,在保证前后帧图像细节的同时,通过递归网络对待处理图像进行了去噪处理,有利于保证视频品质。
在具体实施过程中,比如,对第一图像P1和第二图像P2的加权处理后为a×P1+b×P2,相应地,通过目标去噪网络对待处理图像进行去噪处理后的去噪图像P可以为a×P1+b×P2,a表示对第一图像P1的加权系数,b表示对第二图像P2的加权系数,a+b=1,在具体实施过程中,可以根据实际需要来设定第一图像P1和第二图像P2的加权系数,比如,a>b时,对待处理图像的去噪能力更强,再比如,a<b,待处理图像与先前帧图像之间的细节更连贯,显示效果更好。当然,本领域技术人员可以根据实际应用需要来设定第一图像P1和第二图像P2的加权系数,在此不做限定。
在本公开实施例中,步骤S103:根据先前帧图像的内容,通过递归网络对待处理图像进行去压缩噪声处理,输出第二图像,其中,先前帧图像为待处理图像在待处理视频中的前一帧图像,包括:
通过递归网络中级联的第一卷积层、第一特征串联层、第一采样层对待处理图像进行去压缩噪声处理,输出第二图像。
在具体实施过程中,第一卷积层可以是一层或多层,第一特征串联层可以是一层获得多层,第一采样层可以是一层或多层。通过递归网络中的级联的第一卷积层、第一特征串联层、第一采样层对待处理图像进行去压缩噪声 处理,由于递归网络充分考虑到了先前帧图像中的内容,从而提高了对视频去噪后的图像内容间的连贯性,提高了显示品质。
在本公开实施例中,如图7所示,步骤:通过递归网络中级联的第一卷积层、第一特征串联层、第一采样层对待处理图像进行去压缩噪声处理,输出第二图像,包括:
S201:通过第一子特征串联层接收由单帧网络中各第二卷积层中第三子卷积层所提取的待处理图像的第一特征图,以及通过第一子特征串联层获取递归网络中与各第三子卷积层对应的第一子卷积层从先前帧图像中所提取的第二特征图;
S202:通过第一子特征串联层对第一特征图和第二特征图进行串联操作,获得串联特征图;
S203:通过第一子卷积层对串联特征图进行压缩,获得压缩后的特征图,压缩后的特征图为通过各第一子卷积层从待处理图像中所提取的第二特征图;
S204:通过第一采样层中的第一下采样层,从压缩后的特征图中提取多个空间尺寸的特征图;
S205:通过第一上采样层,确定与多个空间尺寸中相同空间尺寸的特征图;
S206:通过第二子特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第一拼接特征图;
S207:通过第二子卷积层对第一拼接特征图进行处理,输出第二图像。
在具体实施过程中,步骤S201至步骤S207的具体实现过程如下:
首先,通过第一子特征串联层接收由单帧网络中各第二卷积层中第三子卷积层所提取的待处理图像的第一特征图,以及通过第一子特征串联层获取递归网络中与各第三子卷积层对应的第一子卷积层从先前帧图像中所提取的第二特征图,其中,单帧网络中的第三子卷积层可以是多个,递归网络中的第一子卷积层也可以是多个,在通过单帧网络其中的一个第三子卷积层从待处理图像中提取第一特征图时,可以通过递归网络中与该第三子卷积层对应 的第一子卷积层从先前帧图像中提取第二特征图。在具体实施过程中,多个第三子卷积层中的任一子卷积层都可以从待处理图像中提取出对应的特征图。相应地,多个第一子卷积层中的任一子卷积层都可以从先前帧图像中提取出对应的特征图。
在通过第一子特征串联层接收第一特征图以及第二特征图之后,通过第一子特征串联层对第一特征图和第二特征图进行串联操作,获得串联特征图,该串联特征图包括有前后帧图像间的特征关系;然后,通过第一子卷积层对该串联特征图进行压缩,获得压缩后的特征图,该压缩后的特征图可以为通过各第一子卷积层从待处理图像中所提取的第二特征图,比如,该压缩后的特征图可以是在目标去噪网络对待处理图像之后的后一帧图像进行去噪处理时,通过第一子卷积层从待处理图像中所提取的第二特征图。
然后,通过第一采样层中的第一下采样层,从压缩后的特征图中提取多个空间尺寸的特征图。其中,第一下采样层可以是多个,每个下采样层提取相应空间尺寸的特征图,多个下采样层分别提取不同空间尺寸的特征图,多个空间尺寸中的任意两个空间尺寸大小不同。比如,第一下采样层有两个,则可以从压缩后的特征图中提取两个空间尺寸的特征图,其中,两个空间尺寸的特征图也就是说两个不同空间尺寸的特征图,再比如,第一下采样层有三个,则可以从压缩后的特征图中提取三个空间尺寸的特征图,其中,三个空间尺寸的特征图也就是说三个不同空间尺寸的特征图。从而通过多个第一下采样层实现对压缩后的特征图在不同空间尺寸的处理。
然后,通过第一上采样层,确定与多个空间尺寸中相同空间尺寸的特征图,比如,通过第一上采样层,确定与三个空间尺寸中相同空间尺寸的特征图。在具体实施过程中,递归网络中的第一上采样层也可以是多个,通过每个第一上采样层可以确定与多个空间尺寸中相同空间尺寸的特征图。
然后,通过第二子特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第一拼接特征图。然后,通过第二子卷积层对第一拼接特征图进行处理,从而输出第二图像。从而通过递归网络中级联的第一卷积层、第 一特征串联层、第一采样层中各层间的处理,实现了对第二图像的输出。
在本公开实施例中,如图8所示,步骤S102:通过单帧网络对待处理图像进行去压缩噪声处理,输出第一图像,包括:
S301:通过各第三子卷积层提取待处理图像的第一特征图;
S302:通过第二下采样层,从第一特征图中提取多个空间尺寸的特征图;
S303:通过第二上采样层,确定与多个空间尺寸中相同空间尺寸的特征图;
S304:通过第二特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第二拼接特征图;
S305:通过第四子卷积层对第二拼接特征图进行处理,输出第一图像。
在具体实施过程中,步骤S301至步骤S305的具体实现过程如下:
首先,通过单帧网络中各第三子卷积层提取待处理图像的第一特征图,然后,通过单帧网络中的第二下采样层,从第一特征图中提取多个空间尺寸的特征图,其中,第二下采样层可以是多个,每个下采样层提取相应空间尺寸的特征图,多个第二下采样层分别提取不同空间尺寸的特征图,多个空间尺寸中的任意两个空间尺寸大小不同。然后,通过单帧网络中的第二上采样层,确定与多个空间尺寸中相同空间尺寸的特征图。比如,第二下采样层从第一特征图中提取了两个不同空间尺寸的特征图,则相应地,从第二上采样层确定与这两个不同空间尺寸中相同空间尺寸的特征图,然后,通过单帧网络中的第二特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第二拼接特征图。然后,通过单帧网络中的第四子卷积层对第二拼接特征图进行处理,输出第一图像。
在本公开实施例中,如图9所示,在步骤S101:将待处理图像输入目标去噪网络之前,方法还包括:
目标去噪网络的训练过程,具体执行:
S401:获取多组图像帧序列,每组图像帧序列包括多幅图像;
S402:将多组图像帧序列分别编码成真值视频以及仿真视频,其中,仿 真视频中的每帧仿真图像中包含有压缩噪声;
S403:将仿真视频中每帧仿真图像输入待训练去噪网络,输出对应帧的仿真去噪图像;
S404:根据仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差,确定针对待训练去噪网络的第一损失函数;
S405:将第一损失函数低于第一预设阈值时所对应的网络,作为目标去噪网络。
在具体实施过程中,步骤S401至步骤S405的具体实现过程如下:
首先,获取多组图像帧序列,每组图像帧序列包括多幅图像,比如,采用ICCV-2019主办的AIM竞赛提供的训练集作为训练数据,该训练集共包括240组帧序列,每组帧序列含有181幅1280×720的清晰图像,对该目标去噪网络进行训练。具体地,对训练集进行如下处理,首先,将多组图像帧序列分别编码成真值视频以及仿真视频,其中,该仿真视频中的每帧仿真图像中包含有压缩噪声,比如,使用ffmpeg将上述240组帧序列编码成MP4格式的视频作为训练集的真值视频,其中,编码格式为H.264,帧率为25,码率为130M左右。使用ffmpeg将上述240组帧序列进行H.264编码,帧率为25,码率压缩至2M左右,生成含有压缩噪声和伪影的仿真视频。然后,将仿真视频中每帧仿真图像输入待训练去噪网络,输出对应帧的仿真去噪图像。然后,根据仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差,确定针对待训练去噪网络的第一损失函数,然后,将第一损失函数低于第一预设阈值时所对应的网络,作为目标去噪网络。其中,本领域技术人员可以根据实际应用需要来设置第一预设阈值的具体数值,在此不做限定。
在本公开实施例中，步骤S404：根据仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差，确定针对待训练去噪网络的第一损失函数，根据第一预测偏差与设定的偏差值δ间的数值大小，可以有以下两种情况：
第一种情况为,若仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差小于或者等于δ时,则采用L2损失函数;
第二种情况为,若仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差大于δ时,则采用L1损失函数;
L2损失函数对应的公式为:
Figure PCTCN2021095778-appb-000003
L1损失函数对应的公式为:
Figure PCTCN2021095778-appb-000004
其中,f(x)表示仿真去噪图像,y表示真值视频中与仿真去噪图像对应帧的真值图像。
在具体实施过程中,设定的偏差值δ可以是1,当然,本领域技术人员可以根据实际应用来设置δ的数值大小,在此不做限定。
在本公开实施例中,如图10所示,在步骤S404:根据仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差,确定针对待训练去噪网络的第一损失函数之后,方法还包括:
S501:在第一损失函数低于第一预设阈值时,对真值视频中每帧真值图像进行锐化处理,获得边缘增强后的真值视频;
S502:确定仿真去噪图像与边缘增强后的真值视频中对应帧图像间的第二预测偏差,确定针对待训练去噪网络的第二损失函数;
S503:将第二损失函数低于第二预设阈值时所对应的网络,作为目标去噪网络。
在具体实施过程中,步骤S501至步骤S503的具体实现过程如下:
首先,在第一损失函数低于第一预设阈值时,对真值视频中每帧真值图像进行锐化处理,获得边缘增强后的真值视频,其中,可以使用Adam优化算法对待训练去噪网络中的网络参数进行优化,在训练初期使用真值视频和对应的仿真视频进行训练,当第一损失函数低于第一预设阈值时,经训练的待训练去噪网络能够较完整地恢复图像内容,此时对真值视频中每帧图像进行锐化处理,再作为损失函数中的目标函数y,继续对该去噪网络进行训练。具体地,确定该仿真去噪图像与边缘增强后的真值视频中对应帧图像间的第 二预测偏差,确定针对该待训练去噪网络的第二损失函数,具体地,仍采用与计算第一损失函数的相同公式来确定针对待训练去噪网络的第二损失函数,在此不再详述。然后,将第二损失函数低于第二预设阈值时所对应的网络,作为目标去噪网络。由于对待训练去噪网络的训练过程中,先对真值视频进行了增强,然后再对待训练去噪网络进行增强,从而有助于减轻去噪后图像的模糊程度,能有效提升去噪后图像的清晰度,更好地还原图像细节,从而提高了重建图像的质量。
在具体实施过程中,在对待训练去噪网络进行训练时,为了提高训练效率,可以是对输入的每组图像帧序列做裁块处理,每个图像块的尺寸为256×256,从而将整幅图像裁成15个patch作为一个batch。网络学习率可以为设置为10^(-4),学习率的衰减系数可以为0.8,每训练一个epoch,学习率衰减为原来的0.8倍,从而提高了网络训练的稳定性。此外,epoch可以设置为100,网络一共训练了100个epoch,当网络训练到最后10个epoch时,每个epoch得到的模型效果已不再有明显变化。当然,还可以根据实际应用需要来设置对待训练去噪网络进行训练的各个参数,在此不做限定。
在本公开实施例中,用于图像修复的目标去噪网络解决问题的原理与前述图像修复的方法相似,因此该目标去噪网络的实施可以参见前述图像修复的方法的实施,重复之处不再赘述。
基于同一发明构思,如图11所示,本公开实施例还提供了一种用于图像修复的装置,包括:
输入单元100,用于将待处理图像输入目标去噪网络,其中,目标去噪网络包括单帧网络和递归网络,待处理图像为待处理视频中的任一帧;
第一处理单元200,用于通过单帧网络对待处理图像进行去压缩噪声处理,输出第一图像;
第二处理单元300,用于根据先前帧图像的内容,通过递归网络对待处理图像进行去压缩噪声处理,输出第二图像,其中,先前帧图像为待处理图像在待处理视频中的前一帧图像;
输出单元400,用于将第一图像和第二图像进行加权求和,输出针对待处理图像的去噪图像。
在本公开实施例中,第二处理单元300用于:
通过递归网络中级联的第一卷积层、第一特征串联层、第一采样层对待处理图像进行去压缩噪声处理,输出第二图像。
在本公开实施例中,第二处理单元300用于:
通过第一子特征串联层接收由单帧网络中各第二卷积层中第三子卷积层所提取的待处理图像的第一特征图,以及通过第一子特征串联层获取递归网络中与各第三子卷积层对应的第一子卷积层从先前帧图像中所提取的第二特征图;
通过第一子特征串联层对第一特征图和第二特征图进行串联操作,获得串联特征图;
通过第一子卷积层对串联特征图进行压缩,获得压缩后的特征图,压缩后的特征图为通过各第一子卷积层从待处理图像中所提取的第二特征图;
通过第一采样层中的第一下采样层,从压缩后的特征图中提取多个空间尺寸的特征图;
通过第一上采样层,确定与多个空间尺寸中相同空间尺寸的特征图;
通过第二子特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第一拼接特征图;
通过第二子卷积层对第一拼接特征图进行处理,输出第二图像。在本公开实施例中,第一处理单元200用于:
通过各第三子卷积层提取待处理图像的第一特征图;
通过第二下采样层,从第一特征图中提取多个空间尺寸的特征图;
通过第二上采样层,确定与多个空间尺寸中相同空间尺寸的特征图;
通过第二特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第二拼接特征图;
通过第四子卷积层对第二拼接特征图进行处理,输出第一图像。在本公 开实施例中,在输入单元100将待处理图像输入目标去噪网络之前,用于图像修复的装置还包括:
训练单元,该训练单元用于:
获取多组图像帧序列,每组图像帧序列包括多幅图像;
将多组图像帧序列分别编码成真值视频以及仿真视频,其中,仿真视频中的每帧仿真图像中包含有压缩噪声;
将仿真视频中每帧仿真图像输入待训练去噪网络,输出对应帧的仿真去噪图像;
根据仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差,确定针对待训练去噪网络的第一损失函数;
将第一损失函数低于第一预设阈值时所对应的网络,作为目标去噪网络。
在本公开实施例中,训练单元用于:
若仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差小于或者等于δ时,则采用L2损失函数;
若仿真去噪图像与真值视频中对应帧真值图像间的第一预测偏差大于δ时,则采用L1损失函数;
L2损失函数对应的公式为:
Figure PCTCN2021095778-appb-000005
L1损失函数对应的公式为:
Figure PCTCN2021095778-appb-000006
其中,f(x)表示仿真去噪图像,y表示真值视频中与仿真去噪图像对应帧的真值图像。
在本公开实施例中,训练单元还用于:
在第一损失函数低于第一预设阈值时,对真值视频中每帧真值图像进行锐化处理,获得边缘增强后的真值视频;
确定仿真去噪图像与边缘增强后的真值视频中对应帧图像间的第二预测偏差,确定针对待训练去噪网络的第二损失函数;
将第二损失函数低于第二预设阈值时所对应的网络,作为目标去噪网络。
在本公开实施例中,用于图像修复的装置解决问题的原理与前述图像修复的方法相似,因此该用于图像修复的装置的实施可以参见前述图像修复的方法的实施,重复之处不再赘述。
基于同一发明构思,如图12所示,本公开实施例提供了一种用于图像修复的电子设备,包括:
存储器2和处理器3;
其中,存储器2用于存储程序;
处理器3用于执行存储器2中的程序,包括如下步骤:
将待处理图像输入目标去噪网络,其中,目标去噪网络包括单帧网络和递归网络,待处理图像为待处理视频中的任一帧;
通过单帧网络对待处理图像进行去压缩噪声处理,输出第一图像;
根据先前帧图像的内容,通过递归网络对待处理图像进行去压缩噪声处理,输出第二图像,其中,先前帧图像为待处理图像在待处理视频中的前一帧图像;
将第一图像和第二图像进行加权求和,输出针对待处理图像的去噪图像。
处理器3可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本公开实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本公开实施例所公开的图像修复的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器2,处理器3读取存储器2中的信息,结合其硬件完成信号处理流程的步骤。
具体地,处理器3用于读取存储器2中的程序,执行上述图像修复的方法的任一步骤。
基于同一发明构思,本公开实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有计算机指令,存储的计算机指令被处理器执行时能够实现如上面的图像修复的方法的步骤。
本公开实施例提供了一种图像修复的方法、装置及电子设备,其中,该方法将待处理视频中的任一帧待处理图像输入由单帧网络和递归网络构成的目标去噪网络,通过该单帧网络对该待处理图像进行去压缩噪声处理,输出第一图像,根据待处理图像在待处理视频中的先前帧图像的内容,通过递归网络对待处理图像进行去压缩噪声处理,输出第二图像,然后,将第一图像和第二图像进行加权求和,输出针对该当前帧图像的去噪图像。也就是说,对待处理视频中的任一帧待处理图像需要根据当前帧的待处理图像与先前帧图像来综合起来进行去压缩噪声的处理,从而实现了对待处理视频中的任一帧图像中的压缩噪声的去除,提高了显示品质,此外,由于整个去压缩噪声的过程中利用了前后帧图像之间的联系,从而能够实现对帧间的运动补偿,进而提高了视频品质。
本领域内的技术人员应明白,本公开的实施例可提供为方法、系统、或计算机程序产品。因此,本公开可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本公开是参照根据本公开的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设 备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本公开的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本公开范围的所有变更和修改。
显然,本领域的技术人员可以对本公开进行各种改动和变型而不脱离本公开的精神和范围。这样,倘若本公开的这些修改和变型属于本公开权利要求及其等同技术的范围之内,则本公开也意图包含这些改动和变型在内。

Claims (10)

  1. 一种图像修复的方法,其中,包括:
    将待处理图像输入目标去噪网络,其中,所述目标去噪网络包括单帧网络和递归网络,所述待处理图像为待处理视频中的任一帧;
    通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像;
    根据先前帧图像的内容,通过所述递归网络对所述待处理图像进行去压缩噪声处理,输出第二图像,其中,所述先前帧图像为所述待处理图像在所述待处理视频中的前一帧图像;
    将所述第一图像和第二图像进行加权求和,输出针对所述待处理图像的去噪图像。
  2. 如权利要求1所述的方法,其中,所述根据先前帧图像的内容,通过所述递归网络对所述待处理图像进行去压缩噪声处理,输出第二图像,包括:
    通过所述递归网络中级联的第一卷积层、第一特征串联层、第一采样层对所述待处理图像进行去压缩噪声处理,输出第二图像。
  3. 如权利要求2所述的方法,其中,所述递归网络中的所述第一卷积层包括第一子卷积层和第二子卷积层,所述第一特征串联层包括第一子特征串联层和第二子特征串联层,所述第一采样层包括第一下采样层和第一上采样层;
    所述通过所述递归网络中级联的第一卷积层、第一特征串联层、第一采样层对所述待处理图像进行去压缩噪声处理,输出第二图像,包括:
    通过所述第一子特征串联层接收由所述单帧网络中各第二卷积层中第三子卷积层所提取的所述待处理图像的第一特征图,以及通过所述第一子特征串联层获取所述递归网络中与各所述第三子卷积层对应的所述第一子卷积层从所述先前帧图像中所提取的第二特征图;
    通过所述第一子特征串联层对所述第一特征图和所述第二特征图进行串 联操作,获得串联特征图;
    通过所述第一子卷积层对所述串联特征图进行压缩,获得压缩后的特征图,所述压缩后的特征图为通过各所述第一子卷积层从所述待处理图像中所提取的所述第二特征图;
    通过所述第一采样层中的第一下采样层,从所述压缩后的特征图中提取多个空间尺寸的特征图;
    通过所述第一上采样层,确定与所述多个空间尺寸中相同空间尺寸的特征图;
    通过所述第二子特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第一拼接特征图;
    通过所述第二子卷积层对所述第一拼接特征图进行处理,输出所述第二图像。
  4. 如权利要求1所述的方法,其中,所述单帧网络包括级联的第二卷积层、第二采样层和第二特征串联层,所述第二卷积层包括第三子卷积层和第四子卷积层,所述第二采样层包括第二下采样层和第二上采样层;
    所述通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像,包括:
    通过各所述第三子卷积层提取所述待处理图像的第一特征图;
    通过所述第二下采样层,从所述第一特征图中提取多个空间尺寸的特征图;
    通过所述第二上采样层,确定与所述多个空间尺寸中相同空间尺寸的特征图;
    通过所述第二特征串联层对相同空间尺寸的特征图在特征维度上进行拼接,获得第二拼接特征图;
    通过所述第四子卷积层对所述第二拼接特征图进行处理,输出所述第一图像。
  5. 如权利要求1-4任一项所述的方法,其中,在所述将待处理图像输入 目标去噪网络之前,所述方法还包括:
    所述目标去噪网络的训练过程,具体执行:
    获取多组图像帧序列,每组图像帧序列包括多幅图像;
    将所述多组图像帧序列分别编码成真值视频以及仿真视频,其中,所述仿真视频中的每帧仿真图像中包含有压缩噪声;
    将所述仿真视频中每帧仿真图像输入待训练去噪网络,输出对应帧的仿真去噪图像;
    根据所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差,确定针对所述待训练去噪网络的第一损失函数;
    将所述第一损失函数低于第一预设阈值时所对应的网络,作为所述目标去噪网络。
  6. 如权利要求5所述的方法,其中,所述根据所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差,确定针对所述待训练去噪网络的第一损失函数,包括:
    若所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差小于或者等于δ时,则采用L2损失函数;
    若所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差大于δ时,则采用L1损失函数;
    所述L2损失函数对应的公式为:
    Figure PCTCN2021095778-appb-100001
    所述L1损失函数对应的公式为:
    Figure PCTCN2021095778-appb-100002
    其中,f(x)表示仿真去噪图像,y表示所述真值视频中与所述仿真去噪图像对应帧的真值图像。
  7. 如权利要求5所述的方法,其中,在所述根据所述仿真去噪图像与所述真值视频中对应帧真值图像间的第一预测偏差,确定针对所述待训练去噪网络的第一损失函数之后,所述方法还包括:
    在所述第一损失函数低于所述第一预设阈值时,对所述真值视频中每帧真值图像进行锐化处理,获得边缘增强后的真值视频;
    确定所述仿真去噪图像与所述边缘增强后的真值视频中对应帧图像间的第二预测偏差,确定针对所述待训练去噪网络的第二损失函数;
    将所述第二损失函数低于第二预设阈值时所对应的网络,作为所述目标去噪网络。
  8. 一种用于图像修复的装置,其中,包括:
    输入单元,用于将待处理图像输入目标去噪网络,其中,所述目标去噪网络包括单帧网络和递归网络,所述待处理图像为待处理视频中的任一帧;
    第一处理单元,用于通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像;
    第二处理单元,用于根据先前帧图像的内容,通过所述递归网络对所述待处理图像进行去压缩噪声处理,输出第二图像,其中,所述先前帧图像为所述待处理图像在所述待处理视频中的前一帧图像;
    输出单元,用于将所述第一图像和第二图像进行加权求和,输出针对所述待处理图像的去噪图像。
  9. 一种用于图像修复的电子设备,其中,包括:
    存储器和处理器;
    其中，所述存储器用于存储程序；
    所述处理器用于执行所述存储器中的程序,包括如下步骤:
    将待处理图像输入目标去噪网络,其中,所述目标去噪网络包括单帧网络和递归网络,所述待处理图像为待处理视频中的任一帧;
    通过所述单帧网络对所述待处理图像进行去压缩噪声处理,输出第一图像;
    根据先前帧图像的内容,通过所述递归网络对所述待处理图像进行去压缩噪声处理,输出第二图像,其中,所述先前帧图像为所述待处理图像在所述待处理视频中的前一帧图像;
    将所述第一图像和第二图像进行加权求和,输出针对所述待处理图像的去噪图像。
  10. 一种计算机可读存储介质,其中,所述计算机可读存储介质中存储有计算机指令,存储的所述计算机指令被处理器执行时能够实现如权利要求1至7任一项所述的图像修复的方法。
PCT/CN2021/095778 2020-06-22 2021-05-25 图像修复的方法、装置及电子设备 WO2021258959A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/922,150 US20230177652A1 (en) 2020-06-22 2021-05-25 Image restoration method and apparatus, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010574404.7 2020-06-22
CN202010574404.7A CN111738952B (zh) 2020-06-22 2020-06-22 一种图像修复的方法、装置及电子设备

Publications (1)

Publication Number Publication Date
WO2021258959A1 true WO2021258959A1 (zh) 2021-12-30

Family

ID=72650459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095778 WO2021258959A1 (zh) 2020-06-22 2021-05-25 图像修复的方法、装置及电子设备

Country Status (3)

Country Link
US (1) US20230177652A1 (zh)
CN (1) CN111738952B (zh)
WO (1) WO2021258959A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738952B (zh) * 2020-06-22 2023-10-10 京东方科技集团股份有限公司 一种图像修复的方法、装置及电子设备
WO2022198381A1 (zh) * 2021-03-22 2022-09-29 京东方科技集团股份有限公司 图像处理方法及图像处理装置
CN113344811A (zh) * 2021-05-31 2021-09-03 西南大学 多层卷积稀疏编码的加权递归去噪深度神经网络及方法
CN115546037A (zh) * 2021-06-30 2022-12-30 北京字跳网络技术有限公司 一种图像处理方法、装置、电子设备和存储介质
CN114860996B (zh) * 2022-07-06 2022-10-04 北京中科开迪软件有限公司 一种基于光盘库的电子视频档案修复方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331433A (zh) * 2016-08-25 2017-01-11 上海交通大学 基于深度递归神经网络的视频去噪方法
CN107645621A (zh) * 2016-07-20 2018-01-30 阿里巴巴集团控股有限公司 一种视频处理的方法和设备
US20190304068A1 (en) * 2018-03-29 2019-10-03 Pixar Multi-scale architecture of denoising monte carlo renderings using neural networks
CN110852961A (zh) * 2019-10-28 2020-02-28 北京影谱科技股份有限公司 一种基于卷积神经网络的实时视频去噪方法及系统
CN111738952A (zh) * 2020-06-22 2020-10-02 京东方科技集团股份有限公司 一种图像修复的方法、装置及电子设备

Also Published As

Publication number Publication date
CN111738952A (zh) 2020-10-02
CN111738952B (zh) 2023-10-10
US20230177652A1 (en) 2023-06-08

Similar Documents

Publication Publication Date Title
WO2021258959A1 (zh) Image restoration method and apparatus, and electronic device
CN107403415B (zh) Compressed depth map quality enhancement method and apparatus based on fully convolutional neural network
CN108664981B (zh) Salient image extraction method and apparatus
CN108596841B (zh) Method for implementing image super-resolution and deblurring in parallel
CN112801901A (zh) Image deblurring algorithm based on block-based multi-scale convolutional neural network
CN110189260B (zh) Image noise reduction method based on multi-scale parallel gated neural network
US20220414838A1 (en) Image dehazing method and system based on cyclegan
CN114073071A (zh) Video frame interpolation method and apparatus, and computer-readable storage medium
US20230368337A1 (en) Techniques for content synthesis using denoising diffusion models
CN113129212B (zh) Image super-resolution reconstruction method and apparatus, terminal device and storage medium
CN116681584A (zh) Multi-stage diffusion image super-resolution algorithm
CN109993701B (zh) Depth map super-resolution reconstruction method based on pyramid structure
US11783454B2 (en) Saliency map generation method and image processing system using the same
CN113705575B (zh) Image segmentation method, apparatus, device and storage medium
CN116777764A (zh) Cloud and haze removal method and system for optical remote sensing images based on diffusion model
Liu et al. Facial image inpainting using multi-level generative network
TWI768517B (zh) Image quality improvement method and image processing apparatus using the same
CN113744159A (zh) Remote sensing image dehazing method and apparatus, and electronic device
CN113096032A (zh) Non-uniform consistent blur removal method based on image region division
CN114173137A (zh) Video encoding method and apparatus, and electronic device
CN115272131B (zh) Image demoiréing system and method based on adaptive multispectral encoding
CN111861897A (zh) Image processing method and apparatus
CN115063304B (zh) Multi-size fusion pyramid neural network image dehazing method and system
KR102536654B1 (ko) Artificial intelligence-based image enhancement method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21828713

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21828713

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.09.2023)
