WO2023005699A1 - Video enhancement network training method and device, and video enhancement method and device - Google Patents
Video enhancement network training method and device, and video enhancement method and device
- Publication number
- WO2023005699A1 (PCT/CN2022/106156)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- layer
- video frame
- network
- enhanced
- Prior art date
Links
- 238000012549 training Methods 0.000 title claims abstract description 67
- 238000000034 method Methods 0.000 title claims abstract description 62
- 238000005070 sampling Methods 0.000 claims abstract description 15
- 238000012545 processing Methods 0.000 claims description 10
- 238000004590 computer program Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000008707 rearrangement Effects 0.000 claims description 3
- 230000003416 augmentation Effects 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 9
- 230000005540 biological transmission Effects 0.000 description 8
- 230000006870 function Effects 0.000 description 6
- 238000013528 artificial neural network Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000006835 compression Effects 0.000 description 3
- 238000007906 compression Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 230000002708 enhancing effect Effects 0.000 description 3
- 230000004913 activation Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 238000011084 recovery Methods 0.000 description 2
- 238000010276 construction Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000000737 periodic effect Effects 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the embodiments of the present application relate to the technical field of video processing, for example, to a video enhancement network training method, a video enhancement method and a device.
- video compression/encoding reduces storage space and transmission bandwidth, and therefore plays a vital role in video storage and transmission.
- Video compression will cause various distortions such as block effect and blur in the compressed video, which seriously affects people's video viewing experience.
- neural networks are widely used in video quality improvement.
- more complex and deeper networks are often used to extract image features, but complex and deep neural networks run slowly, and video enhancement tasks also place high demands on network speed.
- slow neural networks limit the application of image enhancement networks to video quality enhancement tasks.
- the neural network used for video enhancement in the related art cannot balance the video enhancement quality and running speed.
- the embodiment of the present application provides a video enhancement network training method, video enhancement method, device, electronic equipment and storage medium, so as to avoid the situation in the related art where the neural network used for video enhancement cannot balance video enhancement quality and running speed.
- the embodiment of the present application provides a video enhancement network training method, including:
- the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each of the dense residual subnetworks includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
- the embodiment of the present application provides a video enhancement method, including:
- the video data to be enhanced includes multiple video frames
- the video enhancement network is trained by the video enhancement network training method described in the first aspect.
- the embodiment of the present application provides a video enhancement network training device, including:
- the training data acquisition module is configured to acquire the first video frame and the second video frame used for training, where the second video frame is a video frame obtained by enhancing the first video frame;
- a network construction module configured to construct a video enhancement network
- a network training module configured to train the video enhancement network using the first video frame and the second video frame
- the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each of the dense residual subnetworks includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
- the embodiment of the present application provides a video enhancement device, including:
- the to-be-enhanced video data acquisition module is configured to acquire video data to be enhanced, and the video data to be enhanced includes multiple video frames;
- the video enhancement module is configured to input the video frames into a pre-trained video enhancement network to obtain enhanced video frames;
- a splicing module configured to splice the enhanced video frames into enhanced video data
- the video enhancement network is trained by the video enhancement network training method described in the first aspect.
- an embodiment of the present application provides an electronic device, the electronic device comprising:
- one or more processors
- storage means configured to store one or more programs
- when the one or more programs are executed by the one or more processors, the one or more processors implement the video enhancement network training method described in the first aspect of the present application, and/or the video enhancement method described in the second aspect.
- the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the video enhancement network training method described in the first aspect of the present application, and/or the video enhancement method described in the second aspect, is implemented.
- Fig. 1 is a flow chart of the steps of a video enhancement network training method provided by an embodiment of the present application
- FIG. 2A is a flow chart of the steps of a video enhancement network training method provided by another embodiment of the present application.
- Fig. 2B is a schematic diagram of the dense residual subnetwork in the embodiment of the present application.
- FIG. 2C is a schematic structural diagram of a video enhancement network according to an embodiment of the present application.
- Fig. 3 is a flow chart of steps of a video enhancement method provided by an embodiment of the present application.
- Fig. 4 is a structural block diagram of a video enhancement network training device provided by an embodiment of the present application.
- Fig. 5 is a structural block diagram of a video enhancement device provided by an embodiment of the present application.
- Fig. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- Figure 1 is a flow chart of the steps of a video enhancement network training method provided by an embodiment of the present application.
- the embodiment of the present application is applicable to the situation where the video enhancement network is trained to enhance the video.
- the method may be performed by the video enhancement network training device of the embodiment of the present application; the video enhancement network training device may be implemented by hardware or software and integrated into the electronic device provided by the embodiment of the present application. As shown in Figure 1, the video enhancement network training method of the embodiment of the present application may include the following steps:
- the first video frame can be the video frame used to input the video enhancement network during training
- the second video frame can be the video frame used as the label during training; that is, the second video frame can be the video frame obtained after enhancing the first video frame.
- video data is composed of multiple video frames; the video data is encoded and compressed at the sending end before network transmission, and decoded when the receiving end receives the encoded and compressed video data. Since the video data is encoded and decoded, the decoded video data is distorted to a certain extent. Multiple video frames can therefore be extracted from the decoded video data as the first video frames for training, and the undistorted video frames in the video data before encoding and compression can be used as the second video frames. Certainly, the enhanced video frame obtained after artificially enhancing the first video frame may also be used as the second video frame.
- the video enhancement network of the embodiment of the present application includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer, and each dense residual subnetwork includes a downsampling layer, an upsampling layer, and a Multiple convolutional layers located between the downsampling layer and the upsampling layer, the input feature of each convolutional layer is the sum of the output features of all layers before the convolutional layer.
- the input and output layers may be convolutional layers.
- each dense residual sub-network contains a downsampling layer, so that all feature operations are performed at the downsampled resolution, which reduces the complexity of the video enhancement network.
- the input of each convolutional layer in the dense residual sub-network is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves the transmission capability of features when the signal is sparse, and avoids feature loss, thereby improving the recovery quality of video frames.
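The dense connectivity described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the patented network: plain elementwise scalings stand in for the 3×3 convolutions, and `dense_forward` is a name introduced here for illustration.

```python
import numpy as np

def dense_forward(x0, layers):
    """Dense connectivity: each layer's input is the sum of the
    output features of all layers before it (including the input x0)."""
    outputs = [x0]
    for layer in layers:
        x = sum(outputs)          # feature reuse across all earlier layers
        outputs.append(layer(x))
    return outputs[-1]

# Toy stand-ins for convolutional layers: elementwise scalings.
layers = [lambda x: 0.5 * x, lambda x: 0.1 * x]
out = dense_forward(np.ones((2, 2)), layers)
print(out[0, 0])  # second layer sees 1 + 0.5, so 0.1 * 1.5 = 0.15
```

Because every layer receives the accumulated earlier features, shallow features are never discarded, which is the feature-reuse property the text attributes to the dense residual sub-network.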
- after the first video frame is input to the input layer, it undergoes convolution processing to obtain a shallow feature map.
- the shallow feature map is input into the first dense residual sub-network and then down-sampled to obtain a down-sampled feature map.
- the input feature of each convolutional layer is the sum of the output features of all layers before the convolutional layer.
- the video enhancement network outputs the enhanced video frame, and the parameters of the video enhancement network are adjusted by calculating the loss rate between the enhanced video frame and the second video frame, until the video enhancement network converges or the number of training iterations reaches a preset number, so as to obtain a trained video enhancement network. The trained video enhancement network outputs an enhanced video frame when a video frame to be enhanced is input.
- the video enhancement network of the embodiment of the present application includes a plurality of dense residual sub-networks, and each dense residual sub-network includes a downsampling layer; all features are extracted at the downsampled resolution, which reduces the complexity of the video enhancement network and improves its running speed. The input feature of each convolutional layer in the dense residual sub-network is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves feature transmission when the signal is sparse, and allows high-quality video frames to be recovered. That is, the video enhancement network of the embodiment of the present application balances video enhancement quality and running speed.
- Fig. 2A is a flow chart of the steps of a video enhancement network training method provided by another embodiment of the present application.
- the embodiment of the present application is refined on the basis of the foregoing embodiments.
- the video enhancement network training method may include the following steps:
- video data is composed of multiple video frames; the video data is encoded and compressed by the sending end before network transmission, and decoded when the receiving end receives the encoded and compressed video data. Since the video data is encoded and decoded, the decoded video data is distorted to a certain extent. Multiple video frames can be extracted from the decoded video data as the first video frames for training, and the unencoded, uncompressed video frames in the video data before encoding can be used as the second video frames. Certainly, the enhanced video frame obtained after artificially enhancing the first video frame may also be used as the second video frame.
- the dense residual sub-network can be a network containing multiple convolutional layers.
- the input of each convolutional layer is the sum of the output features of all layers before the convolutional layer.
- in each dense residual sub-network, multiple sequentially connected convolutional layers are constructed, wherein the output features of each convolutional layer are summed with the output features of all layers before that convolutional layer
- a downsampling layer is connected before the first convolutional layer and an upsampling layer is connected after the last convolutional layer
- a second adder is connected after the upsampling layer
- the second adder is used to add the output features of the up-sampling layer and the input features of the down-sampling layer as the output features of the dense residual sub-network.
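The structure just described (downsample, densely connected layers, upsample, then the second adder) can be sketched end to end. This is a hedged sketch, not the actual network: 2×2 average pooling stands in for the bilinear downsampling layer, nearest-neighbour repetition stands in for the upsampling layer, and elementwise scalings stand in for the convolutional layers.

```python
import numpy as np

def downsample(x):
    """Stand-in for the downsampling layer: 2x2 average pooling."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Stand-in for the upsampling layer: nearest-neighbour 2x expansion."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def dense_residual_block(f_in, convs):
    """One dense residual sub-network: downsample, densely connected
    layers, upsample, then the second adder adds the block input back."""
    f0 = downsample(f_in)
    outputs = [f0]
    for conv in convs:
        outputs.append(conv(sum(outputs)))  # dense feature reuse
    up = upsample(outputs[-1])
    return up + f_in                        # second adder: skip connection

convs = [lambda x: 0.5 * x, lambda x: 0.25 * x]
f_out = dense_residual_block(np.ones((4, 4)), convs)
print(f_out[0, 0])  # 1 + 0.25 * (1 + 0.5) = 1.375
```

The output has the same spatial size as the input, so `f_out` can serve directly as the input feature of the next dense residual sub-network, matching the chained structure in the text.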
- the downsampling layer can be bilinear interpolation sampling
- the convolution kernel size of each convolution layer can be 3×3
- each convolutional layer computes F_i = σ(W_i · F + b_i), where σ(·) is the activation function, W_i and b_i are the weight and bias coefficients of the convolutional layer, F is its input feature, and F_i is the feature obtained after convolution.
- a schematic diagram of a dense residual sub-network is shown in FIG. 2B.
- the input feature F_in is passed through the downsampling layer to obtain a downsampled feature map F_0; the downsampled feature map F_0 is passed through the first convolutional layer, which outputs the feature map F_1; the downsampled feature map F_0 and the feature map F_1 are then concatenated as the input feature of the second convolutional layer, which outputs the feature map F_2; the feature maps F_0, F_1, and F_2 are concatenated as the input features of the third convolutional layer, and so on.
- the splicing of two or more feature maps may be the concatenation of feature maps with the same spatial size along the channel dimension.
- for example, if feature map A is H×W×C_A and feature map B is H×W×C_B, the feature map obtained by splicing feature map A and feature map B is H×W×(C_A+C_B), where H is the height of the feature map, W is the width of the feature map, and C is the number of channels.
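The channel-wise splicing above can be demonstrated directly with numpy, assuming a channels-last (H, W, C) layout, which is a convention chosen here for illustration:

```python
import numpy as np

# Feature maps with the same spatial size but different channel counts.
A = np.zeros((4, 4, 3))   # H x W x C_A
B = np.ones((4, 4, 5))    # H x W x C_B

# Channel-wise splicing yields H x W x (C_A + C_B).
spliced = np.concatenate([A, B], axis=-1)
print(spliced.shape)  # (4, 4, 8)
```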
- the feature map F_d is upsampled to obtain an upsampled feature map with the same size as the input feature F_in; finally, the upsampled feature map and the input feature map F_in pass through the second adder SUM2 to obtain the output feature F_out of the dense residual sub-network, and the output feature F_out is used as the input feature F_in of the next dense residual sub-network. The second adder adds the pixel values of corresponding pixel points in the input feature map F_in and the upsampled feature map.
- the upsampling layer performs pixel rearrangement on the output feature map of the last convolutional layer through a preset pixel rearrangement algorithm to obtain an upsampled feature map with the same size as the input feature map of the downsampling layer.
- the pixel shuffling (PixelShuffle) algorithm converts a low-resolution input image with a size of H×W into a high-resolution image of rH×rW through a sub-pixel operation, where r is the upsampling factor, that is, the magnification from low resolution to high resolution.
- the upsampling layer uses PixelShuffle to obtain, through periodic rearrangement of a feature map with r²×C channels, a high-resolution feature map with C channels.
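A minimal sketch of the PixelShuffle rearrangement follows; the channels-last layout and the function name are choices made here for illustration, not taken from the patent:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an (H, W, r*r*C) map into (r*H, r*W, C): each group of
    r*r channels at one spatial position becomes an r x r block of
    output pixels in the high-resolution feature map."""
    H, W, C_in = x.shape
    C = C_in // (r * r)
    x = x.reshape(H, W, r, r, C)
    x = x.transpose(0, 2, 1, 3, 4)   # interleave the r x r sub-pixels
    return x.reshape(H * r, W * r, C)

lr = np.arange(16, dtype=float).reshape(2, 2, 4)  # H = W = 2, r*r*C = 4
hr = pixel_shuffle(lr, r=2)
print(hr.shape)  # (4, 4, 1)
```

With r = 2, the four channels at position (0, 0) of the low-resolution map fill the 2×2 output block at the top-left corner, which is the sub-pixel operation described above.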
- an input layer C_in is connected before the first dense residual sub-network SDRB_1. The input layer C_in may be a convolutional layer with a 3×3 convolution kernel, so as to perform a convolution operation on the input image to obtain a shallow feature F_in to be input into the first dense residual sub-network SDRB_1.
- an output layer C_out is connected after the last dense residual sub-network SDRB_N. The output layer C_out may be a convolutional layer with a 3×3 convolution kernel, so as to linearly transform the output features of the last dense residual sub-network SDRB_N to obtain a residual map.
- the first adder SUM1 is connected after the output layer C_out of the video enhancement network; the inputs of the first adder SUM1 are the residual map output by the output layer C_out and the input image I of the input layer C_in. The first adder SUM1 adds the residual map output by the output layer C_out to the pixel values of the corresponding pixels in the input image I to output the enhanced video frame O.
- the number of pixel bits n of the first video frame can be obtained, the pixel value corresponding to the number of pixel bits (2^n) can be taken as the maximum pixel value of the first video frame, and the difference between the maximum pixel value and 1 can be calculated; for each pixel in the first video frame, the ratio of its pixel value to this difference is calculated as the normalized pixel value of the pixel. For example, the formula for normalization is: I = I_raw / (2^n − 1).
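The normalization step can be sketched as follows; `normalize_frame` and the `bits` parameter are names introduced here for illustration:

```python
import numpy as np

def normalize_frame(frame, bits=8):
    """Divide each pixel by (2**bits - 1), the maximum representable
    pixel value, mapping the frame into the range [0, 1]."""
    return frame.astype(np.float64) / (2 ** bits - 1)

frame = np.array([[0, 51, 255]], dtype=np.uint8)  # 8-bit pixels
norm = normalize_frame(frame)
print(norm)  # [[0.  0.2 1. ]]
```

For 8-bit video, 2^n − 1 = 255, so a full-intensity pixel maps to exactly 1.0.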
- the input feature F_in shown in Figure 2B is obtained after the normalized first video frame I is input into the input layer, and F_in is transmitted sequentially through the multiple dense residual sub-networks SDRB_1 to SDRB_N. Within each dense residual sub-network, F_in is first sampled by the downsampling layer and then transmitted sequentially through the convolutional layers; the input feature of each convolutional layer is the sum of the output features of all layers before it, and the output of the last convolutional layer passes through the upsampling layer to produce the upsampled feature.
- the output feature F_out is used as the input feature F_in of the next dense residual sub-network, and the output feature of the last dense residual sub-network SDRB_N is linearly transformed through the output layer C_out to obtain a residual map.
- the first adder SUM1 adds the residual map output by the output layer C_out to the pixel value of the corresponding pixel in the input image I to output the enhanced video frame O.
- the loss function is the mean square error loss function, as shown in the following formula: L = (1/N) Σᵢ (Yᵢ − Oᵢ)², where N is the number of pixels
- Y is the unencoded, uncompressed video frame, that is, the second video frame
- O is the video frame output by the video enhancement network
- the size of the training video can be 32
- the training can use the Adam optimizer
- the initial learning rate can be set to 10⁻⁴.
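The mean square error loss above can be computed in a few lines; the toy frame values below are illustrative, not from the patent:

```python
import numpy as np

def mse_loss(O, Y):
    """Mean squared error between the network output O and the label Y."""
    return np.mean((O - Y) ** 2)

O = np.array([0.5, 0.25, 1.0])   # enhanced frame from the network
Y = np.array([0.5, 0.75, 0.0])   # second (label) video frame
print(mse_loss(O, Y))  # (0^2 + 0.5^2 + 1^2) / 3 = 0.41666...
```

During training, this scalar would be minimized (here, with the Adam optimizer at the stated initial learning rate) by backpropagating through the network parameters.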
- those skilled in the art can also use other loss functions to calculate the loss rate, and the embodiment of the present application does not limit the way of calculating the loss rate.
- the number of iterative training can also be counted, and when the number reaches the preset number, the iterative training of the video enhancement network is stopped to obtain a trained video enhancement network.
- the parameters of the video enhancement network can also be divided into multiple sections, so as to train and adjust the parameters of each section separately and let untrained sections inherit the trained parameters, thereby improving the training speed.
- the video enhancement network of the embodiment of the present application includes a plurality of dense residual sub-networks, and each dense residual sub-network includes a downsampling layer; all features are extracted at the downsampled resolution, which reduces the complexity of the video enhancement network and improves its running speed. The input feature of each convolutional layer in the dense residual sub-network is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves feature transmission when the signal is sparse, and allows high-quality video frames to be recovered. That is, the video enhancement network of the embodiment of the present application balances video enhancement quality and running speed.
- Fig. 3 is a flow chart of the steps of a video enhancement method provided by the embodiment of the present application.
- the embodiment of the present application is applicable to the case of enhancing decompressed video data, and the method can be executed by the video enhancement device of the embodiment of the present application.
- the video enhancement device may be implemented by hardware or software, and integrated into the electronic device provided by the embodiment of the present application.
- the video enhancement method of the embodiment of the present application may include the following steps:
- the video data to be enhanced is composed of multiple video frames
- the video enhancement may be to perform image processing on the video frames in the video data.
- the video enhancement may be image processing including defogging, contrast enhancement, lossless magnification, stretch recovery, etc., capable of realizing high-definition video reconstruction.
- the video data obtained by decoding before playback exhibits distortions, such as blocking artifacts, blur, and other distortions, so the decoded video data needs to be enhanced; the compressed video data can therefore be decoded to obtain the video data to be enhanced.
- the video data to be enhanced can also be other video data.
- for example, in a live broadcast scene, the video data recorded by the camera can be used as the video data to be enhanced, so as to improve video whose quality is poor due to lighting, equipment, and the like; the embodiment of the present application does not limit the manner of acquiring the video data to be enhanced.
- the embodiment of the present application can pre-train the video enhancement network. After inputting a video frame, the video enhancement network can output the enhanced video frame.
- the video enhancement network can be trained by the video enhancement network training method provided in the foregoing embodiments; for the specific training process, reference may be made to the foregoing embodiments, and details are not repeated here.
- the enhanced video frames can be spliced into enhanced video data according to the playing sequence of the video frames in the video data.
- the playback time stamp of each video frame in the video data may be recorded, and each enhanced video frame may be spliced according to the playback time stamp to obtain enhanced video data.
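Splicing by recorded playback timestamps amounts to a sort. In this hypothetical sketch, the field names `pts` (playback timestamp) and `data` are illustrative, not from the patent:

```python
# Enhanced frames arriving out of order, each tagged with its
# recorded playback timestamp (in milliseconds, illustrative values).
frames = [
    {"pts": 66, "data": "frame3"},
    {"pts": 0,  "data": "frame1"},
    {"pts": 33, "data": "frame2"},
]

# Splice the enhanced frames into video order by playback timestamp.
enhanced_video = [f["data"] for f in sorted(frames, key=lambda f: f["pts"])]
print(enhanced_video)  # ['frame1', 'frame2', 'frame3']
```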
- the embodiment of the present application can embed the video enhancement network between the decoder and the player: each time the decoder decodes a video frame, the frame is input into the video enhancement network, and the video enhancement network outputs the enhanced video frame to the player for real-time playback, without splicing the enhanced video frames.
- video data to be enhanced is obtained, video frames of the video data are input into a pre-trained video enhancement network to obtain enhanced video frames, and the enhanced video frames are spliced into enhanced video data.
- the video enhancement network used for video enhancement includes multiple dense residual subnetworks, each of which includes a downsampling layer; all features are extracted at the downsampled resolution, which reduces the complexity of the video enhancement network and improves its running speed. The input feature of each convolutional layer in the dense residual sub-network is the sum of the output features of all layers before that convolutional layer, which realizes feature reuse, improves feature transmission when the signal is sparse, and allows high-quality video frames to be restored. That is, the video enhancement network of the embodiment of the present application balances video enhancement quality and running speed.
- Fig. 4 is a structural block diagram of a video enhancement network training device provided by the embodiment of the present application. As shown in Fig. 4, the video enhancement network training device of the embodiment of the present application includes:
- the training data acquisition module 401 is configured to obtain the first video frame and the second video frame used for training, where the second video frame is a video frame obtained by enhancing the first video frame;
- a network construction module 402 configured to construct a video enhancement network
- a network training module 403, configured to use the first video frame and the second video frame to train the video enhancement network
- the video enhancement network includes an input layer, an output layer, and a plurality of dense residual subnetworks between the input layer and the output layer; each of the dense residual subnetworks includes a downsampling layer, an upsampling layer, and a plurality of convolutional layers located between the downsampling layer and the upsampling layer, and the input feature of each convolutional layer is the sum of the output features of all layers before that convolutional layer.
- the video-enhanced network training device provided in the embodiment of the present application can execute the video-enhanced network training method provided in the foregoing embodiments of the present application, and has corresponding functional modules and beneficial effects for executing the method.
- Fig. 5 is a structural block diagram of a video enhancement device provided in the embodiment of the present application. As shown in Fig. 5, the video enhancement device in the embodiment of the present application may include the following modules:
- the to-be-enhanced video data acquisition module 501 is configured to acquire video data to be enhanced, and the video data to be enhanced includes multiple video frames;
- the video enhancement module 502 is configured to input the video frames into a pre-trained video enhancement network to obtain enhanced video frames;
- the splicing module 503 is configured to splice the enhanced video frames into enhanced video data
- the video enhancement network is trained by the video enhancement network training method described in the foregoing embodiments.
- the video enhancement device provided in the embodiment of the present application can execute the video enhancement method provided in the embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method.
- the electronic device may include: a processor 601 , a storage device 602 , a display screen 603 with a touch function, an input device 604 , an output device 605 and a communication device 606 .
- the number of processors 601 in the electronic device may be one or more, and one processor 601 is taken as an example in FIG. 6 .
- the processor 601 , storage device 602 , display screen 603 , input device 604 , output device 605 and communication device 606 of the electronic device may be connected via a bus or in other ways. In FIG. 6 , connection via a bus is taken as an example.
- the electronic device is configured to execute the video enhancement network training method provided in any embodiment of the present application, and/or the video enhancement method.
- the embodiment of the present application also provides a computer-readable storage medium, when the instructions in the storage medium are executed by the processor of the device, the device can execute the video enhancement network training method as described in the above method embodiment, and/or , a video enhancement method.
- the computer readable storage medium may be a non-transitory computer readable storage medium.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Embodiments of the present application relate to a video enhancement network training method and device, and a video enhancement method and device. The video enhancement network training method comprises: obtaining a first video frame and a second video frame for training; constructing a video enhancement network; and training the video enhancement network using the first video frame and the second video frame. The video enhancement network comprises an input layer, an output layer, and a plurality of dense residual sub-networks located between the input layer and the output layer. Each dense residual sub-network comprises a down-sampling layer, an up-sampling layer, and a plurality of convolution layers located between the up-sampling layer and the down-sampling layer. The input feature of each convolution layer is the sum of the output features of all layers preceding that convolution layer.
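The dense residual sub-network described in the abstract (a down-sampling layer, convolution layers whose input is the sum of the output features of all preceding layers, then an up-sampling layer) can be sketched as follows. This is a minimal NumPy illustration of the dense-sum connectivity only, not the patented implementation: the 1×1 channel-mixing `conv`, the average-pooling and nearest-neighbour resampling, and all names are simplifying assumptions.

```python
import numpy as np

def conv(x, w):
    # Hypothetical 1x1 convolution stand-in: mixes channels only.
    # x: (channels, H, W), w: (out_channels, in_channels)
    return np.einsum('chw,oc->ohw', x, w)

def dense_residual_subnet(x, weights):
    """One dense residual sub-network: down-sample, dense convs, up-sample.

    The input to each convolution layer is the SUM of the output
    features of all layers before it, as stated in the abstract.
    """
    c, h, w = x.shape
    # Down-sampling layer (2x2 average-pooling stand-in).
    down = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    outputs = [down]
    for wk in weights:
        inp = sum(outputs)                             # sum of ALL preceding outputs
        outputs.append(np.maximum(conv(inp, wk), 0))   # conv + ReLU activation
    # Up-sampling layer (nearest-neighbour stand-in) restores resolution.
    return outputs[-1].repeat(2, axis=1).repeat(2, axis=2)

frame = np.random.rand(4, 8, 8).astype(np.float32)
kernels = [np.random.rand(4, 4).astype(np.float32) * 0.1 for _ in range(3)]
enhanced = dense_residual_subnet(frame, kernels)
print(enhanced.shape)  # (4, 8, 8): spatial size is preserved end to end
```

Unlike DenseNet-style concatenation, each layer here receives the element-wise sum of all earlier outputs, which keeps the channel count constant through the sub-network.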
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110866688.1 | 2021-07-29 | ||
CN202110866688.1A CN113538287B (zh) | 2021-07-29 | 2021-07-29 | Video enhancement network training method, video enhancement method and related device
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023005699A1 true WO2023005699A1 (fr) | 2023-02-02 |
Family
ID=78089767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/106156 WO2023005699A1 (fr) | 2022-07-18 | Video enhancement network training method and device, and video enhancement method and device
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113538287B (fr) |
WO (1) | WO2023005699A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113538287B (zh) * | 2021-07-29 | 2024-03-29 | 广州安思创信息技术有限公司 | Video enhancement network training method, video enhancement method and related device
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111724309A (zh) * | 2019-03-19 | 2020-09-29 | 京东方科技集团股份有限公司 | Image processing method and device, neural network training method, and storage medium
CN112288658A (zh) * | 2020-11-23 | 2021-01-29 | 杭州师范大学 | Underwater image enhancement method based on multi-residual joint learning
CN112419219A (zh) * | 2020-11-25 | 2021-02-26 | 广州虎牙科技有限公司 | Image enhancement model training method, image enhancement method, and related devices
CN112801904A (zh) * | 2021-02-01 | 2021-05-14 | 武汉大学 | Mixed-degradation image enhancement method based on a convolutional neural network
CN113538287A (zh) * | 2021-07-29 | 2021-10-22 | 广州安思创信息技术有限公司 | Video enhancement network training method, video enhancement method and related device
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108235058B (zh) * | 2018-01-12 | 2021-09-17 | 广州方硅信息技术有限公司 | Video quality processing method, storage medium, and terminal
CN109785252B (zh) * | 2018-12-25 | 2023-03-24 | 山西大学 | Night image enhancement method based on multi-scale residual dense network
CN111080575B (zh) * | 2019-11-22 | 2023-08-25 | 东南大学 | Thalamus segmentation method based on a residual dense U-shaped network model
- 2021-07-29 CN CN202110866688.1A patent/CN113538287B/zh active Active
- 2022-07-18 WO PCT/CN2022/106156 patent/WO2023005699A1/fr unknown
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117204910A (zh) * | 2023-09-26 | 2023-12-12 | 北京长木谷医疗科技股份有限公司 | Automatic osteotomy method based on deep learning for real-time tracking of knee joint position
CN117590761A (zh) * | 2023-12-29 | 2024-02-23 | 广东福临门世家智能家居有限公司 | Door-opening state detection method and system for smart home
CN117590761B (zh) * | 2023-12-29 | 2024-04-19 | 广东福临门世家智能家居有限公司 | Door-opening state detection method and system for smart home
Also Published As
Publication number | Publication date |
---|---|
CN113538287A (zh) | 2021-10-22 |
CN113538287B (zh) | 2024-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023005699A1 (fr) | Video enhancement network training method and device, and video enhancement method and device | |
CN113205456B (zh) | Super-resolution reconstruction method for real-time video session services | |
WO2017084258A1 (fr) | Real-time video noise reduction method in an encoding process, terminal, and non-volatile computer-readable storage medium | |
WO2021254139A1 (fr) | Video processing method and device, and storage medium | |
CN110798690A (zh) | Video decoding method, loop filter model training method, apparatus and device | |
WO2023246923A1 (fr) | Video encoding method, video decoding method, electronic device, and storage medium | |
CN110751597A (zh) | Video super-resolution method based on coding artifact restoration | |
KR20210018668A (ko) | Image processing system and method for performing down-sampling using a deep learning neural network, and video streaming server system | |
KR20190117691A (ko) | Method and device for reconstructing an HDR image | |
CN111696039A (zh) | Image processing method and device, storage medium, and electronic device | |
Ho et al. | Down-sampling based video coding with degradation-aware restoration-reconstruction deep neural network | |
CN113747242B (zh) | Image processing method and device, electronic device, and storage medium | |
WO2023050720A1 (fr) | Image processing method, image processing apparatus, and model training method | |
WO2022266955A1 (fr) | Image decoding method and apparatus, image processing method and apparatus, and device | |
CN116797462A (zh) | Real-time video super-resolution reconstruction method based on deep learning | |
WO2022156688A1 (fr) | Layered encoding and decoding methods and apparatuses | |
CN114240750A (zh) | Video resolution enhancement method and device, storage medium, and electronic device | |
CN115967784A (zh) | Image transmission processing system and method based on the MIPI CSI C-PHY protocol | |
CN115376188B (zh) | Video call processing method and system, electronic device, and storage medium | |
Zhang et al. | An efficient depth map filtering based on spatial and texture features for 3D video coding | |
TWI822032B (zh) | Video playback system, portable video playback device, and video enhancement method | |
CN117237259B (zh) | Compressed video quality enhancement method and device based on multi-modal fusion | |
CN114205646B (zh) | Data processing method and device, electronic device, and storage medium | |
US20240095878A1 (en) | Method, electronic device, and computer program product for video processing | |
US11948275B2 (en) | Video bandwidth optimization within a video communications platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22848304; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |