WO2023005699A1 - Video enhancement network training method, video enhancement method and apparatus - Google Patents

Video enhancement network training method, video enhancement method and apparatus - Download PDF

Info

Publication number
WO2023005699A1
WO2023005699A1, PCT/CN2022/106156
Authority
WO
WIPO (PCT)
Prior art keywords
video
layer
video frame
network
enhanced
Application number
PCT/CN2022/106156
Other languages
English (en)
French (fr)
Inventor
崔同兵
黄志杰
Original Assignee
广州安思创信息技术有限公司
Application filed by 广州安思创信息技术有限公司
Publication of WO2023005699A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the embodiments of the present application relate to the technical field of video processing, for example, to a video enhancement network training method, a video enhancement method and a device.
  • Video compression/encoding plays a vital role in reducing storage space and transmission bandwidth.
  • However, compression introduces various distortions, such as blocking artifacts and blur, into the compressed video, which seriously degrade the viewing experience. To improve compressed-video quality, neural networks are widely used.
  • To achieve satisfactory quality improvement, deeper and more complex networks are often used to extract image features, but such networks run slowly. For video enhancement tasks, running speed is also very important, and overly slow neural networks limit the application of image enhancement networks to video quality enhancement tasks.
  • In summary, the neural networks used for video enhancement in the related art cannot balance enhancement quality and running speed.
  • The embodiments of the present application provide a video enhancement network training method, a video enhancement method, an apparatus, an electronic device and a storage medium, so as to avoid the situation in the related art where the neural network used for video enhancement cannot balance enhancement quality and running speed.
  • In a first aspect, an embodiment of the present application provides a video enhancement network training method, including: acquiring a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame; constructing a video enhancement network; and training the video enhancement network using the first video frame and the second video frame.
  • The video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
  • In a second aspect, an embodiment of the present application provides a video enhancement method, including: acquiring video data to be enhanced, the video data to be enhanced including multiple video frames; inputting the video frames into a pre-trained video enhancement network to obtain enhanced video frames; and splicing the enhanced video frames into enhanced video data.
  • The video enhancement network is trained by the video enhancement network training method described in the first aspect.
  • In a third aspect, an embodiment of the present application provides a video enhancement network training apparatus, including:
  • a training data acquisition module, configured to acquire a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame;
  • a network construction module, configured to construct a video enhancement network;
  • a network training module, configured to train the video enhancement network using the first video frame and the second video frame;
  • wherein the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
  • In a fourth aspect, an embodiment of the present application provides a video enhancement apparatus, including:
  • a video data acquisition module, configured to acquire video data to be enhanced, the video data to be enhanced including multiple video frames;
  • a video enhancement module, configured to input the video frames into a pre-trained video enhancement network to obtain enhanced video frames;
  • a splicing module, configured to splice the enhanced video frames into enhanced video data;
  • wherein the video enhancement network is trained by the video enhancement network training method described in the first aspect.
  • In a fifth aspect, an embodiment of the present application provides an electronic device, including:
  • one or more processors;
  • a storage device configured to store one or more programs,
  • where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video enhancement network training method described in the first aspect of the present application, and/or the video enhancement method described in the second aspect.
  • In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video enhancement network training method described in the first aspect of the present application, and/or the video enhancement method described in the second aspect.
  • FIG. 1 is a flowchart of a video enhancement network training method provided by an embodiment of the present application.
  • FIG. 2A is a flowchart of a video enhancement network training method provided by another embodiment of the present application.
  • FIG. 2B is a schematic diagram of a dense residual subnetwork in an embodiment of the present application.
  • FIG. 2C is a schematic structural diagram of a video enhancement network according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a video enhancement method provided by an embodiment of the present application.
  • FIG. 4 is a structural block diagram of a video enhancement network training apparatus provided by an embodiment of the present application.
  • FIG. 5 is a structural block diagram of a video enhancement apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a flowchart of a video enhancement network training method provided by an embodiment of the present application. The embodiment is applicable to training a video enhancement network to enhance video. The method may be performed by the video enhancement network training apparatus of the embodiment of the present application; the apparatus may be implemented in hardware or software and integrated into the electronic device provided by the embodiment of the present application. As shown in FIG. 1, the video enhancement network training method of the embodiment of the present application may include the following steps:
  • The first video frame may be the video frame input into the video enhancement network during training, and the second video frame may be the video frame used as the label during training; that is, the second video frame may be the video frame obtained after the first video frame is enhanced.
  • In practical applications, video data consists of multiple video frames; before network transmission, the video data is encoded and compressed at the sending end and decoded when the receiving end receives it. Because the video data goes through encoding and decoding, the decoded video data is distorted to some degree, so multiple video frames can be extracted from the decoded video data as first video frames for training, while the undistorted frames in the video data before encoding and compression serve as second video frames. Alternatively, video frames obtained by manually enhancing the first video frames may be used as second video frames.
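  • As an illustrative sketch (not part of the patent), such distorted/clean training pairs could be assembled by decoding a pristine clip and a re-encoded copy of it side by side. The OpenCV-based decoding and the file names original.mp4 and compressed.mp4 below are assumptions for illustration:

```python
# Hedged sketch: pair decoded (distorted) frames with the original (clean)
# frames to form (first video frame, second video frame) training samples.
import cv2

def collect_training_pairs(original_path: str, compressed_path: str):
    pairs = []
    cap_orig = cv2.VideoCapture(original_path)    # undistorted second frames (labels)
    cap_comp = cv2.VideoCapture(compressed_path)  # decoded first frames (inputs)
    while True:
        ok_o, label = cap_orig.read()
        ok_c, distorted = cap_comp.read()
        if not (ok_o and ok_c):
            break
        pairs.append((distorted, label))
    cap_orig.release()
    cap_comp.release()
    return pairs
```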
  • The video enhancement network of the embodiment of the present application includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer. Each dense residual subnetwork includes a downsampling layer, an upsampling layer, and multiple convolutional layers located between the downsampling layer and the upsampling layer; the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
  • In one embodiment, the input layer and the output layer may be convolutional layers.
  • Each dense residual subnetwork contains a downsampling layer, so that all feature operations are performed at the downsampled resolution, reducing the complexity of the video enhancement network. In addition, the input of each convolutional layer in the dense residual subnetwork is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves feature propagation when the signal is sparse, avoids feature loss, and improves the restoration quality of video frames.
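  • The following PyTorch code illustrates one possible reading of such a dense residual subnetwork. It is a minimal sketch, not the patented implementation: the class name SDRB, the channel width, the number of convolutional layers, and the fixed 2x down/upsampling factor are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDRB(nn.Module):
    """Sketch of a dense residual subnetwork: bilinear downsampling, densely
    connected 3x3 convolutions, PixelShuffle upsampling, and a residual skip."""
    def __init__(self, channels: int = 32, num_convs: int = 4):
        super().__init__()
        # The i-th conv sees the concatenation of F_0 .. F_i, i.e. (i+1)*C channels.
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels * (i + 1), channels, 3, padding=1)
             for i in range(num_convs)]
        )
        self.expand = nn.Conv2d(channels, channels * 4, 3, padding=1)
        self.upsample = nn.PixelShuffle(2)  # (B, 4C, H, W) -> (B, C, 2H, 2W)

    def forward(self, f_in: torch.Tensor) -> torch.Tensor:
        # Downsampling layer: all dense-conv work happens at reduced resolution.
        feats = [F.interpolate(f_in, scale_factor=0.5, mode="bilinear",
                               align_corners=False)]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        up = self.upsample(self.expand(feats[-1]))  # back to input resolution
        return up + f_in                            # second adder (SUM2)
```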
  • After the first video frame is input to the input layer, it undergoes convolution to obtain a shallow feature map. The shallow feature map is input into the first dense residual subnetwork and downsampled to obtain a downsampled feature map, which then passes through multiple convolutional layers, where the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer. Finally, the video enhancement network outputs the enhanced video frame; a loss rate is computed from the enhanced video frame and the second video frame to adjust the parameters of the video enhancement network, until the network converges or the number of training iterations reaches a preset number, yielding a trained video enhancement network. The trained network outputs an enhanced video frame when a video frame to be enhanced is input.
  • The video enhancement network of the embodiment of the present application includes multiple dense residual subnetworks, each containing a downsampling layer, so all features are extracted at the downsampled resolution; this reduces the complexity of the video enhancement network and improves its running speed. Moreover, the input feature of each convolutional layer in a dense residual subnetwork is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves feature propagation when the signal is sparse, and allows high-quality video frames to be recovered. In other words, the video enhancement network of the embodiment of the present application balances enhancement quality and running speed.
  • FIG. 2A is a flowchart of a video enhancement network training method provided by another embodiment of the present application, which refines the foregoing embodiments. The video enhancement network training method may include the following steps:
  • Video data consists of multiple video frames; before network transmission, the video data is encoded and compressed at the sending end and decoded at the receiving end. Because of the encoding and decoding process, the decoded video data is distorted to some degree. Multiple video frames can be extracted from the decoded video data as first video frames for training, and the unencoded, uncompressed frames in the video data before encoding serve as second video frames. Alternatively, video frames obtained by manually enhancing the first video frames may be used as second video frames.
  • A dense residual subnetwork may be a network containing multiple convolutional layers, in which the input of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
  • For each dense residual subnetwork, multiple sequentially connected convolutional layers are constructed, where the output features of each convolutional layer are summed with the output features of all layers before it to form the input features of the next convolutional layer. A downsampling layer is connected before the first convolutional layer, an upsampling layer is connected after the last convolutional layer, and a second adder is connected after the upsampling layer; the second adder adds the output features of the upsampling layer and the input features of the downsampling layer to produce the output features of the dense residual subnetwork.
  • The downsampling layer may use bilinear interpolation with a sampling ratio α, where α = 2^(-n) and n is a positive integer. The convolution kernel size of each convolutional layer may be 3×3 with activation function ReLU(x) = max(0, x), and each convolutional layer can be expressed as F = σ(W * F_i + b) and F_{i+1} = [F_0, F_1, F_2, …, F_i, F], where σ(·) is the activation function, W and b are the weights and bias coefficients of the convolutional layer, and F_i is the feature obtained after convolution.
  • FIG. 2B shows a schematic diagram of a dense residual subnetwork. The input feature F_in passes through the downsampling layer to obtain a downsampled feature map F_0; F_0 passes through the first convolutional layer, which outputs feature map F_1. The downsampled feature map F_0 and feature map F_1 are concatenated as the input feature of the second convolutional layer, which outputs feature map F_2; then F_0, F_1 and F_2 are concatenated as the input feature of the third convolutional layer, and so on.
  • Splicing two or more feature maps means concatenating feature maps of the same spatial size along the channel dimension. For example, if feature map A is H×W×C_A and feature map B is H×W×C_B, the feature map obtained by splicing A and B is H×W×(C_A + C_B), where H is the height of the feature map, W is the width of the feature map, and C is the number of channels.
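  • A minimal PyTorch illustration of this channel-wise splicing (the shapes below are arbitrary example values, not from the patent):

```python
import torch

a = torch.randn(1, 16, 64, 64)      # feature map A: C_A = 16 channels
b = torch.randn(1, 32, 64, 64)      # feature map B: C_B = 32 channels
spliced = torch.cat([a, b], dim=1)  # concatenate along the channel dimension
print(spliced.shape)                # torch.Size([1, 48, 64, 64]) = C_A + C_B
```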
  • After the last convolutional layer outputs feature map F_d, F_d is upsampled to obtain an upsampled feature map of the same size as the input feature F_in. Finally, the upsampled feature map and the input feature map F_in are added by the second adder SUM2 to obtain the output feature F_out of the dense residual subnetwork, which serves as the input feature F_in of the next dense residual subnetwork. The second adder adds the pixel values of corresponding pixels in the input feature map F_in and the upsampled feature map.
  • In one embodiment, the upsampling layer rearranges the pixels of the output feature map of the last convolutional layer using a preset pixel rearrangement algorithm, obtaining an upsampled feature map of the same size as the input feature map of the downsampling layer. For example, the pixel shuffling (PixelShuffle) algorithm converts a low-resolution input of size H×W into a high-resolution output of size rH×rW through a sub-pixel operation, where r is the upsampling factor, that is, the magnification from low resolution to high resolution. In the embodiment of the present application, the upsampling layer uses PixelShuffle to turn the resulting feature maps with 2^n × C channels into a high-resolution feature map with C channels through periodic screening.
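  • As a hedged illustration of the sub-pixel step (assuming an upsampling factor r = 2), PyTorch's nn.PixelShuffle maps a (B, C·r², H, W) tensor to (B, C, rH, rW):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # C*r*r = 64 channels, so C = 16 when r = 2
y = nn.PixelShuffle(2)(x)       # periodic rearrangement of channel pixels
print(y.shape)                  # torch.Size([1, 16, 64, 64])
```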
  • After constructing multiple sequentially connected dense residual subnetworks, an input layer C_in is connected before the first dense residual subnetwork SDRB_1. For example, the input layer C_in may be a convolutional layer with a 3×3 kernel that performs a convolution on the input image to obtain shallow features F_in to feed into the first dense residual subnetwork SDRB_1.
  • An output layer C_out is connected after the last dense residual subnetwork SDRB_N. For example, the output layer C_out may be a convolutional layer with a 3×3 kernel that linearly transforms the output features of the last dense residual subnetwork SDRB_N to obtain a residual map.
  • A first adder SUM1 is connected after the output layer C_out of the video enhancement network. The inputs of SUM1 are the residual map output by C_out and the input image I of the input layer C_in; SUM1 adds the residual map and the pixel values of corresponding pixels in the input image I to output the enhanced video frame O.
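  • Putting these pieces together, the sketch below shows one possible end-to-end reading of the network, reusing the illustrative SDRB module above; the channel width, block count, and 3-channel input are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class VideoEnhancementNet(nn.Module):
    """Sketch: input conv C_in, N dense residual subnetworks SDRB_1..SDRB_N,
    output conv C_out producing a residual map, and a global skip (SUM1)."""
    def __init__(self, channels: int = 32, num_blocks: int = 4):
        super().__init__()
        self.c_in = nn.Conv2d(3, channels, 3, padding=1)    # input layer C_in
        self.blocks = nn.Sequential(*[SDRB(channels) for _ in range(num_blocks)])
        self.c_out = nn.Conv2d(channels, 3, 3, padding=1)   # output layer C_out

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feats = self.blocks(self.c_in(img))  # shallow features through the SDRBs
        residual = self.c_out(feats)         # residual map
        return img + residual                # first adder SUM1 -> enhanced frame O
```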
  • In the embodiment of the present application, the number of pixel bits B of the first video frame can be obtained, the pixel value corresponding to B is computed as the maximum pixel value of the first video frame, and the difference between the maximum pixel value and 1 is computed. For the pixel value of each pixel in the first video frame, the ratio of the pixel value to this difference is taken as the normalized pixel value; that is, the normalization formula is I_norm = I / ((1 << B) - 1), where B is the number of pixel bits of the first video frame. For example, when B = 8, 1 << B = 256. Normalizing the first video frame puts the features on a unified scale, which improves the convergence speed and accuracy of the video enhancement network during training.
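  • A one-function sketch of this normalization, assuming NumPy arrays of integer pixel values:

```python
import numpy as np

def normalize(frame: np.ndarray, bits: int = 8) -> np.ndarray:
    # Divide by (1 << B) - 1 (255 when B = 8) to map pixel values into [0, 1].
    return frame.astype(np.float32) / ((1 << bits) - 1)
```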
  • As shown in FIG. 2C, the normalized first video frame I is fed into the input layer to obtain the input feature F_in shown in FIG. 2B, which is then passed through the dense residual subnetworks SDRB_1 to SDRB_N in sequence. In each dense residual subnetwork, the input feature F_in is first downsampled by the downsampling layer and then transmitted through the subnetwork's convolutional layers in turn, where the input feature of each convolutional layer is the sum of the output features of all layers before it; the output of the last convolutional layer passes through the upsampling layer to produce the upsampled feature, which is added to F_in by the second adder SUM2 to produce the output feature F_out. This F_out serves as the input feature F_in of the next dense residual subnetwork; the output features of the last dense residual subnetwork SDRB_N are linearly transformed by the output layer C_out to obtain a residual map, and the first adder SUM1 adds the residual map and the pixel values of corresponding pixels in the input image I to output the enhanced video frame O.
  • In the embodiment of the present application, the loss function is the mean-squared-error loss L = Σ |Y - O|², where Y is the unencoded, uncompressed video frame (that is, the second video frame) and O is the video frame output by the video enhancement network. The training video size may be 32, training may use the Adam optimizer, and the initial learning rate may be set to 10^-4.
  • In practical applications, those skilled in the art may also use other loss functions to compute the loss rate; the embodiment of the present application does not limit how the loss rate is computed.
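  • A hedged training-step sketch matching this setup (summed MSE loss, Adam, initial learning rate 10^-4); the model reuses the illustrative sketches above, and the data handling is assumed rather than taken from the patent:

```python
import torch
import torch.nn as nn

model = VideoEnhancementNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss(reduction="sum")  # L = sum |Y - O|^2

def train_step(first_frame: torch.Tensor, second_frame: torch.Tensor) -> float:
    optimizer.zero_grad()
    enhanced = model(first_frame)             # O: network output
    loss = criterion(enhanced, second_frame)  # Y: label (second video frame)
    loss.backward()
    optimizer.step()
    return loss.item()
```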
  • Of course, the number of training iterations may also be counted, and iterative training of the video enhancement network stops when the count reaches a preset number, yielding a trained video enhancement network.
  • In another embodiment of the present application, the parameters of the video enhancement network may also be divided into multiple segments, so that each segment is trained and adjusted separately and trained parameters are inherited by untrained parameters, improving training speed.
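  • The patent gives no further detail on this segmented training; the weight-inheritance step might look like the following speculative sketch, where a trained block's weights seed an untrained block (it reuses the SDRB sketch above):

```python
# Speculative sketch only: copy trained parameters into an untrained segment.
# strict=False copies the matching keys and ignores the rest.
trained_block = SDRB(32)
# ... train trained_block ...
untrained_block = SDRB(32)
untrained_block.load_state_dict(trained_block.state_dict(), strict=False)
```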
  • The video enhancement network of the embodiment of the present application includes multiple dense residual subnetworks, each containing a downsampling layer, so all features are extracted at the downsampled resolution; this reduces the complexity of the video enhancement network and improves its running speed. Moreover, the input feature of each convolutional layer in a dense residual subnetwork is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves feature propagation when the signal is sparse, and allows high-quality video frames to be recovered. In other words, the video enhancement network of the embodiment of the present application balances enhancement quality and running speed.
  • FIG. 3 is a flowchart of a video enhancement method provided by an embodiment of the present application, applicable to enhancing decompressed video data. The method may be performed by the video enhancement apparatus of the embodiment of the present application, which may be implemented in hardware or software and integrated into the electronic device provided by the embodiment of the present application. The video enhancement method of the embodiment of the present application may include the following steps:
  • The video data to be enhanced consists of multiple video frames, and video enhancement may be image processing applied to the video frames in the video data. For example, video enhancement may include defogging, contrast enhancement, lossless magnification, stretch recovery, and other image processing capable of realizing high-definition video reconstruction.
  • In practical applications, after video data has been encoded and compressed, the video data obtained by decoding before playback exhibits distortions such as blocking artifacts and blur, so the decoded video data needs to be enhanced; the compressed video data can therefore be decoded to obtain the video data to be enhanced. Of course, the video data to be enhanced may also be other video data. In one example, in a live-streaming scenario, the video data recorded by the camera may be used as the video data to be enhanced, to remedy poor video quality caused by lighting, equipment and other factors; the embodiment of the present application does not limit how the video data to be enhanced is acquired.
  • The embodiment of the present application may pre-train the video enhancement network so that, given an input video frame, it outputs an enhanced video frame. For example, the video enhancement network may be trained by the video enhancement network training method provided in the foregoing embodiments; for the specific training process, reference may be made to those embodiments, and details are not repeated here.
  • After the video enhancement network outputs the enhanced video frames, they can be spliced into enhanced video data according to the playback order of the frames in the video data. In one example, the playback timestamp of each video frame in the video data may be recorded, and the enhanced video frames spliced according to their playback timestamps to obtain the enhanced video data, as sketched below.
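  • A minimal sketch of timestamp-ordered splicing, assuming OpenCV, BGR uint8 frames, and (timestamp, frame) pairs; the file name and frame rate are placeholders:

```python
import cv2

def splice(frames_with_ts, path="enhanced.mp4", fps=25.0):
    # Sort by playback timestamp, then write frames in order.
    frames = [f for _, f in sorted(frames_with_ts, key=lambda p: p[0])]
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for f in frames:
        writer.write(f)
    writer.release()
```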
  • In one embodiment, the video enhancement network may be embedded between the decoder and the player: each time the decoder decodes a video frame, the frame is fed into the video enhancement network, which outputs the enhanced frame to the player for real-time playback, so no splicing of the enhanced frames is needed.
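  • An illustrative per-frame pipeline for that decoder-to-player path, reusing the model sketch above; the file name, color handling and display loop are assumptions:

```python
import cv2
import numpy as np
import torch

cap = cv2.VideoCapture("compressed.mp4")
model.eval()
with torch.no_grad():
    while True:
        ok, frame = cap.read()  # decode one frame
        if not ok:
            break
        x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        y = model(x).clamp(0, 1)  # enhance the decoded frame
        out = (y[0].permute(1, 2, 0).numpy() * 255).astype(np.uint8)
        cv2.imshow("enhanced", out)  # play in real time; no splicing needed
        if cv2.waitKey(1) == 27:     # Esc stops playback
            break
cap.release()
cv2.destroyAllWindows()
```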
  • The embodiment of the present application acquires video data to be enhanced, inputs the video frames of the video data into a pre-trained video enhancement network to obtain enhanced video frames, and splices the enhanced video frames into enhanced video data. The video enhancement network used for enhancement includes multiple dense residual subnetworks, each containing a downsampling layer, and all features are extracted at the downsampled resolution, which reduces the complexity of the video enhancement network and improves its running speed. The input feature of each convolutional layer in a dense residual subnetwork is the sum of the output features of all layers before it, realizing feature reuse and improving feature propagation when the signal is sparse, so high-quality video frames can be restored; that is, the video enhancement network of the embodiment of the present application balances enhancement quality and running speed.
  • FIG. 4 is a structural block diagram of a video enhancement network training apparatus provided by an embodiment of the present application. As shown in FIG. 4, the video enhancement network training apparatus includes:
  • a training data acquisition module 401, configured to acquire a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame;
  • a network construction module 402, configured to construct a video enhancement network;
  • a network training module 403, configured to train the video enhancement network using the first video frame and the second video frame;
  • wherein the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
  • The video enhancement network training apparatus provided in the embodiment of the present application can execute the video enhancement network training method provided in the foregoing embodiments and has the corresponding functional modules and beneficial effects.
  • FIG. 5 is a structural block diagram of a video enhancement apparatus provided by an embodiment of the present application. As shown in FIG. 5, the video enhancement apparatus may include the following modules:
  • a video data acquisition module 501, configured to acquire video data to be enhanced, the video data to be enhanced including multiple video frames;
  • a video enhancement module 502, configured to input the video frames into a pre-trained video enhancement network to obtain enhanced video frames;
  • a splicing module 503, configured to splice the enhanced video frames into enhanced video data;
  • wherein the video enhancement network is trained by the video enhancement network training method described in the foregoing embodiments.
  • The video enhancement apparatus provided in the embodiment of the present application can execute the video enhancement method provided in the embodiments of the present application and has the corresponding functional modules and beneficial effects.
  • Referring to FIG. 6, the electronic device may include a processor 601, a storage device 602, a display screen 603 with a touch function, an input device 604, an output device 605 and a communication device 606. The number of processors 601 in the electronic device may be one or more; one processor 601 is taken as an example in FIG. 6. The processor 601, storage device 602, display screen 603, input device 604, output device 605 and communication device 606 of the electronic device may be connected via a bus or in other ways; connection via a bus is taken as an example in FIG. 6. The electronic device is configured to execute the video enhancement network training method and/or the video enhancement method provided in any embodiment of the present application.
  • The embodiment of the present application also provides a computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a device, the device can execute the video enhancement network training method and/or the video enhancement method described in the above method embodiments. The computer-readable storage medium may be a non-transitory computer-readable storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the present application disclose a video enhancement network training method, a video enhancement method and an apparatus. The video enhancement network training method includes: acquiring a first video frame and a second video frame for training; constructing a video enhancement network; and training the video enhancement network using the first video frame and the second video frame. The video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers between the upsampling layer and the downsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.

Description

Video enhancement network training method, video enhancement method and apparatus
This application claims priority to Chinese patent application No. 202110866688.1, filed with the Chinese Patent Office on July 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of video processing, for example, to a video enhancement network training method, a video enhancement method and an apparatus.
Background
With the development of video coding technology, video has become an important medium through which people obtain information, and video quality is crucial to the viewing experience. Because the amount of video data is large, video compression/encoding plays a vital role in reducing storage space and transmission bandwidth.
Video compression causes various distortions in the compressed video, such as blocking artifacts and blur, which seriously degrade the viewing experience. To improve compressed-video quality, neural networks are widely used. However, to achieve satisfactory quality improvement, deeper and more complex networks are often used to extract image features, but such networks run slowly; for video enhancement tasks, running speed is also very important, and overly slow neural networks limit the application of image enhancement networks to video quality enhancement tasks.
In summary, the neural networks used for video enhancement in the related art cannot balance enhancement quality and running speed.
Summary
The embodiments of the present application provide a video enhancement network training method, a video enhancement method, an apparatus, an electronic device and a storage medium, so as to avoid the situation in the related art where the neural network used for video enhancement cannot balance enhancement quality and running speed.
In a first aspect, an embodiment of the present application provides a video enhancement network training method, including:
acquiring a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame;
constructing a video enhancement network; and
training the video enhancement network using the first video frame and the second video frame;
wherein the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
In a second aspect, an embodiment of the present application provides a video enhancement method, including:
acquiring video data to be enhanced, the video data to be enhanced including multiple video frames;
inputting the video frames into a pre-trained video enhancement network to obtain enhanced video frames; and
splicing the enhanced video frames into enhanced video data;
wherein the video enhancement network is trained by the video enhancement network training method described in the first aspect.
In a third aspect, an embodiment of the present application provides a video enhancement network training apparatus, including:
a training data acquisition module, configured to acquire a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame;
a network construction module, configured to construct a video enhancement network; and
a network training module, configured to train the video enhancement network using the first video frame and the second video frame;
wherein the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
In a fourth aspect, an embodiment of the present application provides a video enhancement apparatus, including:
a video data acquisition module, configured to acquire video data to be enhanced, the video data to be enhanced including multiple video frames;
a video enhancement module, configured to input the video frames into a pre-trained video enhancement network to obtain enhanced video frames; and
a splicing module, configured to splice the enhanced video frames into enhanced video data;
wherein the video enhancement network is trained by the video enhancement network training method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:
one or more processors; and
a storage device configured to store one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video enhancement network training method described in the first aspect of the present application, and/or the video enhancement method described in the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video enhancement network training method described in the first aspect of the present application, and/or the video enhancement method described in the second aspect.
Brief Description of the Drawings
FIG. 1 is a flowchart of a video enhancement network training method provided by an embodiment of the present application;
FIG. 2A is a flowchart of a video enhancement network training method provided by another embodiment of the present application;
FIG. 2B is a schematic diagram of a dense residual subnetwork in an embodiment of the present application;
FIG. 2C is a schematic structural diagram of a video enhancement network according to an embodiment of the present application;
FIG. 3 is a flowchart of a video enhancement method provided by an embodiment of the present application;
FIG. 4 is a structural block diagram of a video enhancement network training apparatus provided by an embodiment of the present application;
FIG. 5 is a structural block diagram of a video enhancement apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
FIG. 1 is a flowchart of a video enhancement network training method provided by an embodiment of the present application. The embodiment is applicable to training a video enhancement network to enhance video. The method may be performed by the video enhancement network training apparatus of the embodiment of the present application, which may be implemented in hardware or software and integrated into the electronic device provided by the embodiment of the present application. As shown in FIG. 1, the video enhancement network training method of the embodiment of the present application may include the following steps:
S101: Acquire a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame.
For example, the first video frame may be the video frame input into the video enhancement network during training, and the second video frame may be the video frame used as the label during training; that is, the second video frame may be the video frame obtained after the first video frame is enhanced.
In practical applications, video data consists of multiple video frames; before network transmission, the video data is encoded and compressed at the sending end and decoded when the receiving end receives it. Because the video data goes through encoding and decoding, the decoded video data is distorted to some degree, so multiple video frames can be extracted from the decoded video data as first video frames for training, while the undistorted frames in the video data before encoding and compression serve as second video frames. Of course, video frames obtained by manually enhancing the first video frames may also be used as second video frames.
S102: Construct a video enhancement network.
For example, the video enhancement network of the embodiment of the present application includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer. Each dense residual subnetwork includes a downsampling layer, an upsampling layer, and multiple convolutional layers between the downsampling layer and the upsampling layer; the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer. In one embodiment, the input layer and the output layer may be convolutional layers. Each dense residual subnetwork contains a downsampling layer, so that all feature operations are performed at the downsampled resolution, reducing the complexity of the video enhancement network. In addition, the input of each convolutional layer in the dense residual subnetwork is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves feature propagation when the signal is sparse, avoids feature loss, and improves the restoration quality of video frames.
S103: Train the video enhancement network using the first video frame and the second video frame.
For example, after the first video frame is input into the input layer, it undergoes convolution to obtain a shallow feature map; the shallow feature map is input into the first dense residual subnetwork and downsampled to obtain a downsampled feature map, which then passes through multiple convolutional layers, where the input feature of each convolutional layer is the sum of the output features of all layers before it. Finally, the video enhancement network outputs the enhanced video frame; a loss rate is computed from the enhanced video frame and the second video frame to adjust the network's parameters, until the network converges or the number of training iterations reaches a preset number, yielding a trained video enhancement network that outputs an enhanced frame when a frame to be enhanced is input.
The video enhancement network of the embodiment of the present application includes multiple dense residual subnetworks, each containing a downsampling layer; all features are extracted at the downsampled resolution, which reduces the complexity of the video enhancement network and improves its running speed. Moreover, the input feature of each convolutional layer in a dense residual subnetwork is the sum of the output features of all layers before it, which realizes feature reuse, improves feature propagation when the signal is sparse, and allows high-quality video frames to be recovered; that is, the video enhancement network of the embodiment of the present application balances enhancement quality and running speed.
FIG. 2A is a flowchart of a video enhancement network training method provided by another embodiment of the present application, refined on the basis of the foregoing embodiments. As shown in FIG. 2A, the video enhancement network training method of the embodiment of the present application may include the following steps:
S201: Acquire a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame.
For example, video data consists of multiple video frames; before network transmission, the video data is encoded and compressed at the sending end and decoded when the receiving end receives it. Because the video data goes through encoding and decoding, the decoded video data is distorted to some degree; multiple video frames can be extracted from the decoded video data as first video frames for training, and the unencoded, uncompressed frames of the video data before encoding serve as second video frames. Of course, video frames obtained by manually enhancing the first video frames may also be used as second video frames.
S202: Construct multiple sequentially connected dense residual subnetworks.
A dense residual subnetwork may be a network containing multiple convolutional layers, in which the input of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
In the embodiment of the present application, for each dense residual subnetwork, multiple sequentially connected convolutional layers are constructed, where the output features of each convolutional layer are summed with the output features of all layers before it to form the input features of the next convolutional layer; a downsampling layer is connected before the first convolutional layer and an upsampling layer after the last convolutional layer, and a second adder is connected after the upsampling layer, the second adder adding the output features of the upsampling layer to the input features of the downsampling layer to produce the output features of the dense residual subnetwork.
The downsampling layer may use bilinear interpolation with a sampling ratio α, where α = 2^(-n) and n is a positive integer. The convolution kernel size of each convolutional layer may be 3×3 with activation function ReLU(x) = max(0, x), and each convolutional layer can be expressed as:
F = σ(W * F_i + b)
F_{i+1} = [F_0, F_1, F_2, …, F_i, F]
where σ(·) is the activation function, W and b are the weights and bias coefficients of the convolutional layer, and F_i is the feature obtained after convolution.
In one example, FIG. 2B shows a schematic diagram of a dense residual subnetwork. In FIG. 2B, the input feature F_in passes through the downsampling layer to obtain a downsampled feature map F_0; F_0 passes through the first convolutional layer, which outputs feature map F_1; the downsampled feature map F_0 and feature map F_1 are then concatenated as the input feature of the second convolutional layer, which outputs feature map F_2; feature maps F_0, F_1 and F_2 are concatenated as the input feature of the third convolutional layer, and so on. Splicing two or more feature maps means concatenating feature maps of the same spatial size along the channel dimension. For example, if feature map A is H×W×C_A and feature map B is H×W×C_B, the feature map obtained by splicing A and B is H×W×(C_A + C_B), where H is the height of the feature map, W is the width, and C is the number of channels.
After the last convolutional layer outputs feature map F_d, F_d is upsampled to obtain an upsampled feature map of the same size as the input feature F_in; finally, the upsampled feature map and the input feature map F_in are added by the second adder SUM2 to obtain the output feature F_out of the dense residual subnetwork, which serves as the input feature F_in of the next dense residual subnetwork. The second adder adds the pixel values of corresponding pixels in the input feature map F_in and the upsampled feature map.
In one embodiment, the upsampling layer rearranges the pixels of the output feature map of the last convolutional layer using a preset pixel rearrangement algorithm, obtaining an upsampled feature map of the same size as the input feature map of the downsampling layer. For example, the pixel shuffling (PixelShuffle) algorithm converts a low-resolution input image of size H×W into a high-resolution image of size rH×rW through a sub-pixel operation, where r is the upsampling factor, that is, the magnification from low resolution to high resolution. In the embodiment of the present application, the upsampling layer uses PixelShuffle to turn the resulting feature maps with 2^n × C channels into a high-resolution feature map with C channels through periodic screening.
S203: Connect the input layer before the first dense residual subnetwork.
As shown in FIG. 2C, after constructing multiple sequentially connected dense residual subnetworks SDRB_N, an input layer C_in is connected before the first dense residual subnetwork SDRB_1. For example, the input layer C_in may be a convolutional layer with a 3×3 kernel that performs a convolution on the input image to obtain shallow features F_in to feed into the first dense residual subnetwork SDRB_1.
S204: Connect the output layer after the last dense residual subnetwork to output a residual map.
As shown in FIG. 2C, after constructing multiple sequentially connected dense residual subnetworks SDRB_N, an output layer C_out is connected after the last dense residual subnetwork SDRB_N. For example, the output layer C_out may be a convolutional layer with a 3×3 kernel that linearly transforms the output features of the last dense residual subnetwork SDRB_N to obtain a residual map.
S205: Connect a first adder after the output layer, the first adder adding the pixel values of the residual map and the pixel values of the image input into the input layer to obtain the enhanced video frame.
As shown in FIG. 2C, a first adder SUM1 is connected after the output layer C_out of the video enhancement network. The inputs of SUM1 are the residual map output by C_out and the input image I of the input layer C_in; SUM1 adds the residual map and the pixel values of the corresponding pixels in the input image I to output the enhanced video frame O.
S206: Normalize the first video frame to obtain a normalized first video frame.
In the embodiment of the present application, the number of pixel bits of the first video frame can be obtained, the pixel value corresponding to the number of pixel bits is computed as the maximum pixel value of the first video frame, and the difference between the maximum pixel value and 1 is computed; for the pixel value of each pixel in the first video frame, the ratio of the pixel value to this difference is taken as the normalized pixel value of that pixel. For example, the normalization formula is:
I_norm = I / ((1 << B) - 1)
where B is the number of pixel bits of the first video frame; in one example, when B = 8, 1 << B = 256. Normalizing the first video frame puts the features on a unified scale, which improves the convergence speed and accuracy of the video enhancement network during training.
S207: Input the normalized first video frame into the input layer of the video enhancement network to output an enhanced video frame.
For example, as shown in FIG. 2C, the normalized first video frame I is fed into the input layer to obtain the input feature F_in shown in FIG. 2B, which is passed through the dense residual subnetworks SDRB_N in sequence. As shown in FIG. 2B, in each dense residual subnetwork SDRB_N, the input feature F_in is first downsampled by the downsampling layer and then transmitted through the subnetwork's convolutional layers in turn, where the input feature of each convolutional layer is the sum of the output features of all layers before it; the output of the last convolutional layer passes through the upsampling layer to produce the upsampled feature, which is added to the input feature F_in by the second adder SUM2 to output the subnetwork's output feature F_out. This F_out serves as the input feature F_in of the next dense residual subnetwork; the output features of the last dense residual subnetwork SDRB_N are linearly transformed by the output layer C_out to obtain a residual map, and the first adder SUM1 adds the residual map and the pixel values of corresponding pixels in the input image I to output the enhanced video frame O.
S208: Compute a loss rate using the enhanced video frame and the second video frame.
In the embodiment of the present application, the loss function is the mean-squared-error loss:
L = Σ |Y - O|²
where Y is the unencoded, uncompressed video frame, that is, the second video frame, and O is the video frame output by the video enhancement network. The training video size may be 32, training may use the Adam optimizer, and the initial learning rate may be set to 10^-4. Of course, in practical applications, those skilled in the art may also use other loss functions to compute the loss rate; the embodiment of the present application does not limit how the loss rate is computed.
S209: Adjust the parameters of the video enhancement network using the loss rate to obtain a trained video enhancement network.
In one embodiment, it may be determined whether the loss rate is less than a preset threshold; based on a determination that the loss rate is less than the preset threshold, training of the video enhancement network stops; based on a determination that the loss rate is greater than or equal to the preset threshold, the parameters of the video enhancement network are adjusted according to the loss rate and the process returns to S206 to continue iterative training. Of course, the number of training iterations may also be counted, and iterative training stops when the count reaches a preset number, yielding a trained video enhancement network.
In another embodiment of the present application, the parameters of the video enhancement network may also be divided into multiple segments, so that each segment is trained and adjusted separately and trained parameters are inherited by untrained parameters, improving training speed.
The video enhancement network of the embodiment of the present application includes multiple dense residual subnetworks, each containing a downsampling layer; all features are extracted at the downsampled resolution, which reduces the complexity of the video enhancement network and improves its running speed. Moreover, the input feature of each convolutional layer in a dense residual subnetwork is the sum of the output features of all layers before it, which realizes feature reuse, improves feature propagation when the signal is sparse, and allows high-quality video frames to be recovered; that is, the video enhancement network of the embodiment of the present application balances enhancement quality and running speed.
FIG. 3 is a flowchart of a video enhancement method provided by an embodiment of the present application, applicable to enhancing decompressed video data. The method may be performed by the video enhancement apparatus of the embodiment of the present application, which may be implemented in hardware or software and integrated into the electronic device provided by the embodiment of the present application. As shown in FIG. 3, the video enhancement method of the embodiment of the present application may include the following steps:
S301: Acquire video data to be enhanced, the video data to be enhanced including multiple video frames.
In the embodiment of the present application, the video data to be enhanced consists of multiple video frames, and video enhancement may be image processing applied to the video frames in the video data. For example, video enhancement may include defogging, contrast enhancement, lossless magnification, stretch recovery, and other image processing capable of realizing high-definition video reconstruction.
In practical applications, after video data is encoded and compressed, the video data obtained by decoding before playback exhibits distortions such as blocking artifacts and blur, so the decoded video data needs to be enhanced; the compressed video data can therefore be decoded to obtain the video data to be enhanced. Of course, the video data to be enhanced may also be other video data. In one example, in a live-streaming scenario, the video data recorded by the camera may be used as the video data to be enhanced, to remedy poor video quality caused by lighting, equipment and other factors; the embodiment of the present application does not limit how the video data to be enhanced is acquired.
S302: Input the video frames into a pre-trained video enhancement network to obtain enhanced video frames.
The embodiment of the present application may pre-train the video enhancement network so that, given an input video frame, it outputs an enhanced video frame. For example, the video enhancement network may be trained by the video enhancement network training method provided in the foregoing embodiments; for the specific training process, reference may be made to those embodiments, and details are not repeated here.
S303: Splice the enhanced video frames into enhanced video data.
After the video enhancement network outputs the enhanced video frames, they can be spliced into enhanced video data according to the playback order of the frames in the video data. In one example, the playback timestamp of each video frame in the video data may be recorded, and the enhanced frames spliced according to their playback timestamps to obtain the enhanced video data.
In one embodiment, the video enhancement network may be embedded between the decoder and the player: each time the decoder decodes a video frame, the frame is fed into the video enhancement network, which outputs the enhanced frame to the player for real-time playback, so no splicing of the enhanced frames is needed.
The embodiment of the present application acquires video data to be enhanced, inputs the video frames of the video data into a pre-trained video enhancement network to obtain enhanced video frames, and splices the enhanced video frames into enhanced video data. The video enhancement network used for enhancement includes multiple dense residual subnetworks, each containing a downsampling layer; all features are extracted at the downsampled resolution, which reduces the complexity of the video enhancement network and improves its running speed. The input feature of each convolutional layer in a dense residual subnetwork is the sum of the output features of all layers before it, realizing feature reuse and improving feature propagation when the signal is sparse, so high-quality video frames can be restored; that is, the video enhancement network of the embodiment of the present application balances enhancement quality and running speed.
FIG. 4 is a structural block diagram of a video enhancement network training apparatus provided by an embodiment of the present application. As shown in FIG. 4, the video enhancement network training apparatus of the embodiment of the present application includes:
a training data acquisition module 401, configured to acquire a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame;
a network construction module 402, configured to construct a video enhancement network; and
a network training module 403, configured to train the video enhancement network using the first video frame and the second video frame;
wherein the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
The video enhancement network training apparatus provided in the embodiment of the present application can execute the video enhancement network training method provided in the foregoing embodiments of the present application and has the corresponding functional modules and beneficial effects.
FIG. 5 is a structural block diagram of a video enhancement apparatus provided by an embodiment of the present application. As shown in FIG. 5, the video enhancement apparatus of the embodiment of the present application may include the following modules:
a video data acquisition module 501, configured to acquire video data to be enhanced, the video data to be enhanced including multiple video frames;
a video enhancement module 502, configured to input the video frames into a pre-trained video enhancement network to obtain enhanced video frames; and
a splicing module 503, configured to splice the enhanced video frames into enhanced video data;
wherein the video enhancement network is trained by the video enhancement network training method described in the foregoing embodiments.
The video enhancement apparatus provided in the embodiment of the present application can execute the video enhancement method provided in the embodiments of the present application and has the corresponding functional modules and beneficial effects.
Referring to FIG. 6, a schematic structural diagram of an electronic device in one example of the present application is shown. As shown in FIG. 6, the electronic device may include a processor 601, a storage device 602, a display screen 603 with a touch function, an input device 604, an output device 605 and a communication device 606. The number of processors 601 in the electronic device may be one or more; one processor 601 is taken as an example in FIG. 6. The processor 601, storage device 602, display screen 603, input device 604, output device 605 and communication device 606 of the electronic device may be connected via a bus or in other ways; connection via a bus is taken as an example in FIG. 6. The electronic device is configured to execute the video enhancement network training method and/or the video enhancement method provided in any embodiment of the present application.
The embodiments of the present application also provide a computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a device, the device can execute the video enhancement network training method and/or the video enhancement method described in the above method embodiments. The computer-readable storage medium may be a non-transitory computer-readable storage medium.
It should be noted that, for the apparatus, electronic device and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple; for relevant parts, reference may be made to the description of the method embodiments.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that the specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, the schematic representations of these terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials or characteristics may be combined in a suitable manner in any one or more embodiments or examples.

Claims (13)

  1. A video enhancement network training method, comprising:
    acquiring a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame;
    constructing a video enhancement network; and
    training the video enhancement network using the first video frame and the second video frame;
    wherein the video enhancement network comprises an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork comprises a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
  2. The method according to claim 1, wherein constructing the video enhancement network comprises:
    constructing a plurality of sequentially connected dense residual subnetworks;
    connecting the input layer before the first dense residual subnetwork;
    connecting the output layer after the last dense residual subnetwork to output a residual map; and
    connecting a first adder after the output layer, the first adder being used to add the pixel values of the residual map and the pixel values of the image input into the input layer to obtain an enhanced video frame.
  3. The method according to claim 2, wherein the input layer and the output layer are convolutional layers.
  4. The method according to claim 2, wherein constructing the plurality of sequentially connected dense residual subnetworks comprises:
    for each dense residual subnetwork, constructing a plurality of sequentially connected convolutional layers, wherein the output features of each convolutional layer are summed with the output features of all layers before that convolutional layer to serve as the input features of the next convolutional layer;
    connecting one downsampling layer before the first convolutional layer and one upsampling layer after the last convolutional layer; and
    connecting a second adder after the upsampling layer, the second adder being used to add the output features of the upsampling layer and the input features of the downsampling layer to serve as the output features of each dense residual subnetwork.
  5. The method according to claim 4, wherein the upsampling layer rearranges the pixels of the output feature map of the last convolutional layer using a preset pixel rearrangement algorithm to obtain an upsampled feature map of the same size as the input feature map of the downsampling layer.
  6. The method according to any one of claims 1-5, wherein training the video enhancement network using the first video frame and the second video frame comprises:
    normalizing the first video frame to obtain a normalized first video frame;
    inputting the normalized first video frame into the input layer of the video enhancement network to output an enhanced video frame;
    computing a loss rate using the enhanced video frame and the second video frame; and
    adjusting the parameters of the video enhancement network using the loss rate to obtain a trained video enhancement network.
  7. The method according to claim 6, wherein normalizing the first video frame to obtain the normalized first video frame comprises:
    obtaining the number of pixel bits of the first video frame;
    computing the pixel value corresponding to the number of pixel bits as the maximum pixel value of the first video frame;
    computing the difference between the maximum pixel value and 1; and
    for the pixel value of each pixel in the first video frame, computing the ratio of the pixel value to the difference as the normalized pixel value of that pixel.
  8. The method according to claim 6, wherein adjusting the parameters of the video enhancement network using the loss rate to obtain the trained video enhancement network comprises:
    determining whether the loss rate is less than a preset threshold;
    based on a determination that the loss rate is less than the preset threshold, stopping training of the video enhancement network; and
    based on a determination that the loss rate is greater than or equal to the preset threshold, adjusting the parameters of the video enhancement network according to the loss rate and returning to the step of normalizing the first video frame to obtain the normalized first video frame.
  9. A video enhancement method, comprising:
    acquiring video data to be enhanced, the video data to be enhanced comprising multiple video frames;
    inputting the video frames into a pre-trained video enhancement network to obtain enhanced video frames; and
    splicing the enhanced video frames into enhanced video data;
    wherein the video enhancement network is trained by the video enhancement network training method according to any one of claims 1-8.
  10. A video enhancement network training apparatus, comprising:
    a training data acquisition module, configured to acquire a first video frame and a second video frame for training, the second video frame being a video frame obtained by enhancing the first video frame;
    a network construction module, configured to construct a video enhancement network; and
    a network training module, configured to train the video enhancement network using the first video frame and the second video frame;
    wherein the video enhancement network comprises an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each dense residual subnetwork comprises a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
  11. A video enhancement apparatus, comprising:
    a video data acquisition module, configured to acquire video data to be enhanced, the video data to be enhanced comprising multiple video frames;
    a video enhancement module, configured to input the video frames into a pre-trained video enhancement network to obtain enhanced video frames; and
    a splicing module, configured to splice the enhanced video frames into enhanced video data;
    wherein the video enhancement network is trained by the video enhancement network training method according to any one of claims 1-8.
  12. An electronic device, comprising:
    one or more processors; and
    a storage device configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video enhancement network training method according to any one of claims 1-8, and/or the video enhancement method according to claim 9.
  13. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the video enhancement network training method according to any one of claims 1-8, and/or the video enhancement method according to claim 9.
PCT/CN2022/106156 2021-07-29 2022-07-18 Video enhancement network training method, video enhancement method and apparatus WO2023005699A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110866688.1A 2021-07-29 2021-07-29 Video enhancement network training method, video enhancement method and related apparatus
CN202110866688.1 2021-07-29

Publications (1)

Publication Number Publication Date
WO2023005699A1 (zh)

Family

ID=78089767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/106156 Video enhancement network training method, video enhancement method and apparatus

Country Status (2)

Country Link
CN (1) CN113538287B (zh)
WO (1) WO2023005699A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117590761A (zh) * 2023-12-29 2024-02-23 广东福临门世家智能家居有限公司 Door opening state detection method and system for smart home

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538287B (zh) * 2021-07-29 2024-03-29 广州安思创信息技术有限公司 Video enhancement network training method, video enhancement method and related apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724309A (zh) * 2019-03-19 2020-09-29 京东方科技集团股份有限公司 Image processing method and apparatus, neural network training method, storage medium
CN112288658A (zh) * 2020-11-23 2021-01-29 杭州师范大学 Underwater image enhancement method based on multi-residual joint learning
CN112419219A (zh) * 2020-11-25 2021-02-26 广州虎牙科技有限公司 Image enhancement model training method, image enhancement method, and related apparatus
CN112801904A (zh) * 2021-02-01 2021-05-14 武汉大学 Mixed-degradation image enhancement method based on a convolutional neural network
CN113538287A (zh) * 2021-07-29 2021-10-22 广州安思创信息技术有限公司 Video enhancement network training method, video enhancement method and related apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108235058B (zh) * 2018-01-12 2021-09-17 广州方硅信息技术有限公司 Video quality processing method, storage medium and terminal
CN109785252B (zh) * 2018-12-25 2023-03-24 山西大学 Night image enhancement method based on a multi-scale residual dense network
CN111080575B (zh) * 2019-11-22 2023-08-25 东南大学 Thalamus segmentation method based on a residual dense U-shaped network model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724309A (zh) * 2019-03-19 2020-09-29 京东方科技集团股份有限公司 Image processing method and apparatus, neural network training method, storage medium
CN112288658A (zh) * 2020-11-23 2021-01-29 杭州师范大学 Underwater image enhancement method based on multi-residual joint learning
CN112419219A (zh) * 2020-11-25 2021-02-26 广州虎牙科技有限公司 Image enhancement model training method, image enhancement method, and related apparatus
CN112801904A (zh) * 2021-02-01 2021-05-14 武汉大学 Mixed-degradation image enhancement method based on a convolutional neural network
CN113538287A (zh) * 2021-07-29 2021-10-22 广州安思创信息技术有限公司 Video enhancement network training method, video enhancement method and related apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117590761A (zh) * 2023-12-29 2024-02-23 广东福临门世家智能家居有限公司 Door opening state detection method and system for smart home
CN117590761B (zh) * 2023-12-29 2024-04-19 广东福临门世家智能家居有限公司 Door opening state detection method and system for smart home

Also Published As

Publication number Publication date
CN113538287A (zh) 2021-10-22
CN113538287B (zh) 2024-03-29

Similar Documents

Publication Publication Date Title
WO2023005699A1 (zh) Video enhancement network training method, video enhancement method and apparatus
CN113205456B (zh) Super-resolution reconstruction method for real-time video session services
WO2017084258A1 (zh) Real-time video noise reduction method in the encoding process, terminal, and non-volatile computer-readable storage medium
WO2021254139A1 (zh) Video processing method, device and storage medium
CN110798690A (zh) Video decoding method, loop filter model training method, apparatus and device
WO2023246923A1 (zh) Video encoding method, decoding method, electronic device and storage medium
CN110751597A (zh) Video super-resolution method based on coding-artifact restoration
KR20210018668A (ko) Image processing system and method for performing downsampling using a deep-learning neural network, and video streaming server system
KR20190117691A (ko) Method and device for reconstructing an HDR image
CN110827380A (zh) Image rendering method and apparatus, electronic device and computer-readable medium
CN111696039A (zh) Image processing method and apparatus, storage medium and electronic device
Ho et al. Down-sampling based video coding with degradation-aware restoration-reconstruction deep neural network
WO2023050720A1 (zh) Image processing method, image processing apparatus, and model training method
WO2022266955A1 (zh) Image decoding and processing method, apparatus and device
CN114173137A (zh) Video encoding method and apparatus, and electronic device
CN116797462A (zh) Real-time video super-resolution reconstruction method based on deep learning
CN113747242B (zh) Image processing method and apparatus, electronic device and storage medium
WO2022156688A1 (zh) Layered encoding and decoding method and apparatus
CN114240750A (zh) Video resolution enhancement method and apparatus, storage medium and electronic device
CN115967784A (zh) Image transmission processing system and method based on the MIPI CSI C-PHY protocol
CN115376188B (zh) Video call processing method and system, electronic device and storage medium
TWI822032B (zh) Video playback system, portable video playback device, and video enhancement method
CN117237259B (zh) Compressed video quality enhancement method and apparatus based on multimodal fusion
CN114205646B (zh) Data processing method and apparatus, electronic device and storage medium
US20240095878A1 (en) Method, electronic device, and computer program product for video processing

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22848304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE