WO2022111631A1 - Video transmission method, server, terminal, and video transmission system

Info

Publication number
WO2022111631A1
WO2022111631A1 PCT/CN2021/133497 CN2021133497W WO2022111631A1 WO 2022111631 A1 WO2022111631 A1 WO 2022111631A1 CN 2021133497 W CN2021133497 W CN 2021133497W WO 2022111631 A1 WO2022111631 A1 WO 2022111631A1
Authority
WO
WIPO (PCT)
Prior art keywords
resolution
super
target video
video
model
Prior art date
Application number
PCT/CN2021/133497
Other languages
French (fr)
Chinese (zh)
Inventor
鲁威
王祺
孙龙
林焕
胡康康
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662 Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities

Definitions

  • The present application relates to the technical field of video transmission, and in particular to a video transmission method, a server, a terminal, and a video transmission system.
  • The user terminal side (hereinafter, the terminal side) restores the resolution of a received video before displaying it.
  • Super-resolution is hereinafter abbreviated as SR.
  • Super-resolution is an important computer vision and image processing technique for restoring high-resolution pictures from low-resolution pictures.
  • Super-resolution technology based on deep learning is developing rapidly.
  • A super-resolution model is preset on the device side to perform super-resolution reconstruction on low-quality video and obtain high-resolution, high-quality video.
  • However, the preset model cannot cover all video scenes, so a large gap remains between the image quality of the restored super-resolution video and that of the original high-resolution video. In some scenes the video quality is even degraded.
  • An embodiment of the present application provides a video transmission method, applied to the field of video transmission, for reducing transmission bandwidth while ensuring video quality at the receiving end.
  • A first aspect of the embodiments of the present application provides a video transmission method, including: a server acquires a super-resolution model of a target video, where the super-resolution model is obtained through training based on the target video at a first resolution and the target video at a second resolution, the second resolution is smaller than the first resolution, and the magnification of the super-resolution model is the ratio between the numbers of pixels of the first resolution and the second resolution in the length direction or the width direction; the server sends the target video at a third resolution and the super-resolution model to a terminal, where the super-resolution model is used to perform super-resolution reconstruction on the target video at the third resolution at the magnification to obtain the target video at a fourth resolution.
  • In this method, the server sends the target video at the third resolution and the super-resolution model to the terminal, and the super-resolution model is obtained by training on the target video at different resolutions (the first resolution and the second resolution).
  • The terminal can perform super-resolution on the target video at the third resolution at the magnification of the super-resolution model to obtain the target video at the fourth resolution. In the prior art, the terminal performs super-resolution on all kinds of target videos with a single preset super-resolution model. Because the super-resolution model that the server sends to the terminal in this method is trained on the target video itself, it achieves a better super-resolution effect when restoring the target video, that is, the image quality of the restored target video at the fourth resolution is higher.
  • The target video is a single video or a set of multiple videos of the same type.
  • The target video can be an episode of a series or an independent video such as a news video or a movie, in which case the super-resolution model is trained independently for a single video. Alternatively, the target video can be multiple videos of the same type, for example a whole series, a season of a series, or videos in the same series from a personal blogger, that is, multiple videos related in content, characters, or performances. Such videos usually have a similar style and image quality, so a super-resolution model trained independently on them yields better video quality after super-resolution restoration.
  • The third resolution is equal to the second resolution; the fourth resolution is equal to the first resolution.
  • The target video sent by the server to the terminal is the lower-resolution target video (that is, at the second resolution) used for training the super-resolution model.
  • The super-resolution model is used to restore the target video at the second resolution to the target video at the first resolution.
  • In this case the super-resolution restoration works better, and the image quality of the restored target video is higher.
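The relationship between the four resolutions described above can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the source: W_i and H_i are the horizontal and vertical pixel counts of the i-th resolution, and s is the magnification of the super-resolution model.

```latex
% Magnification: pixel ratio of the first to the second resolution,
% taken in the length direction or in the width direction.
s = \frac{W_1}{W_2} \quad\text{or}\quad s = \frac{H_1}{H_2}, \qquad s > 1

% Reconstruction at the terminal: the third resolution is scaled by s.
(W_4,\, H_4) = (s\,W_3,\; s\,H_3)

% Special case (third = second), assuming both ratios above are equal:
(W_3, H_3) = (W_2, H_2) \;\Longrightarrow\; (W_4, H_4) = (W_1, H_1)
```

The special case corresponds to sending exactly the low-resolution video that the model was trained on, so that reconstruction returns the resolution of the original training target.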
  • The obtaining, by the server, of the super-resolution model of the target video includes: the server inputs the target video at the first resolution and the target video at the second resolution into a convolutional neural network model to obtain an overfitted super-resolution model.
  • The server can input the target video at different resolutions into the convolutional neural network model for training and obtain an overfitted super-resolution model.
  • The overfitted super-resolution model is not suitable for super-resolution of videos other than the target video, but it works extremely well for super-resolution of its training data set, that is, the target video.
  • The obtaining, by the server, of the super-resolution model of the target video includes: the server performs data cleaning on the video frames in the video at the first resolution and the video frames in the video at the second resolution to obtain the target video at the first resolution and the target video at the second resolution; the server trains on the target video at the first resolution and the target video at the second resolution to obtain the super-resolution model.
  • The server may also perform data cleaning on the video frames of the target video before training the super-resolution model.
  • The data cleaning includes removing video frames such as blurred frames and low-information frames. Using the cleaned video for model training can improve model performance.
  • The sending, by the server, of the target video at the third resolution and the super-resolution model to the terminal specifically includes: the server sends the structure of the super-resolution model to the terminal; the server sends a data packet to the terminal, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
  • The server can send the target video and the super-resolution model to the terminal either separately or simultaneously.
  • The super-resolution model can be sent in two parts: the structure of the super-resolution model and the weight parameters of the super-resolution model.
  • The server sends the structure of the super-resolution model to the terminal in advance, and then sends the weight parameters of the super-resolution model and the target video to the terminal. Further, when the server sends to the terminal, the target video is compressed into different data packets for transmission. The terminal then performs super-resolution on the target video.
  • The super-resolution models of different target videos can have the same model structure.
  • In that case, when the super-resolution model of each target video is sent to the terminal, only the corresponding weight parameters need to be sent, which reduces the amount of data transmitted.
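A data packet that carries weight parameters together with a chunk of compressed video could be sketched as follows. The byte layout here (the `SRPK` marker, the two length fields, 32-bit float weights) is purely hypothetical and chosen for illustration; the actual packet and header formats are the ones shown in the patent's figures, which are not reproduced in this text.

```python
import struct

MAGIC = b"SRPK"                   # hypothetical marker, not from the source
HEADER = struct.Struct("!4sII")   # magic, weight byte count, video byte count


def pack_packet(weights, video_chunk):
    """Serialize weight parameters and one compressed video chunk into bytes."""
    weight_bytes = struct.pack(f"!{len(weights)}f", *weights)
    header = HEADER.pack(MAGIC, len(weight_bytes), len(video_chunk))
    return header + weight_bytes + video_chunk


def unpack_packet(packet):
    """Inverse of pack_packet: return (weights, video_chunk)."""
    magic, n_weight, n_video = HEADER.unpack_from(packet, 0)
    if magic != MAGIC:
        raise ValueError("not a recognised packet")
    start = HEADER.size
    weight_bytes = packet[start:start + n_weight]
    weights = list(struct.unpack(f"!{n_weight // 4}f", weight_bytes))
    video_chunk = packet[start + n_weight:start + n_weight + n_video]
    return weights, video_chunk


# Values chosen to be exactly representable as 32-bit floats.
packet = pack_packet([0.5, -1.25, 2.0], b"\x00\x01\x02\x03")
weights, video_chunk = unpack_packet(packet)
print(len(packet), weights, video_chunk)
```

Because the model structure is sent once in advance, only the weight list changes from one target video to the next in this sketch.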
  • A second aspect of the embodiments of the present application provides a video transmission method, including: a terminal receives a super-resolution model of a target video and the target video at a third resolution sent by a server, where the super-resolution model is obtained through training based on the target video at a first resolution and the target video at a second resolution, the second resolution is smaller than the first resolution, and the magnification of the super-resolution model is the ratio between the numbers of pixels of the first resolution and the second resolution in the length direction or the width direction; the terminal performs super-resolution reconstruction on the target video at the third resolution at the magnification according to the super-resolution model, and obtains the target video at a fourth resolution.
  • The terminal receives the target video at the third resolution and the super-resolution model sent by the server. Because the super-resolution model is obtained by training on the target video at different resolutions (the first resolution and the second resolution), it achieves a better super-resolution effect when restoring the target video, that is, the image quality of the restored target video at the fourth resolution is higher.
  • the target video is a single video or a set of multiple videos of the same type.
  • the target video has multiple possible forms, which increases the flexibility of solution implementation.
  • the third resolution is equal to the second resolution; the fourth resolution is equal to the first resolution.
  • The target video at the third resolution obtained by the terminal is the lower-resolution target video (that is, at the second resolution) used for training the super-resolution model.
  • The super-resolution model is used to restore the target video at the second resolution to the target video at the first resolution.
  • In this case the super-resolution restoration works better, and the image quality of the restored target video is higher.
  • The super-resolution model includes: an overfitted super-resolution model obtained by inputting the target video at the first resolution and the target video at the second resolution into a convolutional neural network model.
  • Although the overfitted super-resolution model is not suitable for super-resolution of other, non-target videos, it has an excellent effect for super-resolution of the training data set, that is, the target video.
  • The super-resolution model includes: a super-resolution model obtained by training on the target video at the first resolution and the target video at the second resolution, where the target video at the first resolution and the target video at the second resolution are obtained by performing data cleaning on the video frames in the video at the first resolution and the video frames in the video at the second resolution.
  • The data set used for model training is obtained by data cleaning of the video frames of the target video, and the data cleaning includes removing video frames such as blurred frames and low-information frames, which improves the effect of the super-resolution model.
  • The receiving, by the terminal, of the super-resolution model of the target video and the target video at the third resolution sent by the server includes: the terminal receives the structure of the super-resolution model sent by the server; the terminal receives a data packet sent by the server, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
  • The terminal can receive the structure of the super-resolution model and the weight parameters of the super-resolution model separately.
  • The terminal first receives the structure of the super-resolution model sent by the server, and then obtains the weight parameters of the super-resolution model and the target video.
  • When the server sends to the terminal, the target video is compressed into different data packets for transmission.
  • The super-resolution models of different target videos can have the same model structure, so the super-resolution model corresponding to each target video can be obtained simply by receiving the corresponding weight parameters.
  • This reduces the amount of data transmitted.
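On the terminal side, sharing one model structure across target videos might look like the sketch below. The class name `TerminalModelStore`, its methods, the layer-shape tuples, and the video identifiers are all hypothetical; the source only states that the structure is received first and that per-video weight parameters follow.

```python
from math import prod


class TerminalModelStore:
    """Keeps one shared model structure and one weight set per target video."""

    def __init__(self):
        self.layer_shapes = None   # received once, in advance
        self.weights = {}          # video id -> flat list of weight parameters

    def receive_structure(self, layer_shapes):
        self.layer_shapes = [tuple(shape) for shape in layer_shapes]

    def expected_weight_count(self):
        if self.layer_shapes is None:
            raise RuntimeError("model structure has not been received yet")
        return sum(prod(shape) for shape in self.layer_shapes)

    def receive_weights(self, video_id, weights):
        # Only weights arrive per video; they must fit the stored structure.
        if len(weights) != self.expected_weight_count():
            raise ValueError("weight count does not match the stored structure")
        self.weights[video_id] = list(weights)


store = TerminalModelStore()
# Two convolution layers as (out_channels, in_channels, height, width); biases omitted.
store.receive_structure([(4, 1, 3, 3), (1, 4, 3, 3)])
store.receive_weights("series-1", [0.0] * 72)
store.receive_weights("news-7", [0.1] * 72)
print(store.expected_weight_count(), sorted(store.weights))
```

The validation step is a design choice of this sketch: rejecting a weight set that does not match the stored structure catches a mismatched model early instead of at reconstruction time.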
  • A third aspect of the embodiments of the present application provides a server, including: an acquisition module, configured to acquire a super-resolution model of a target video, where the super-resolution model is obtained through training based on the target video at a first resolution and the target video at a second resolution, the second resolution is smaller than the first resolution, and the magnification of the super-resolution model is the ratio between the numbers of pixels of the first resolution and the second resolution in the length direction or the width direction; and a sending module, configured to send the target video at a third resolution and the super-resolution model to a terminal, where the super-resolution model is used to perform super-resolution reconstruction on the target video at the third resolution at the magnification to obtain the target video at a fourth resolution.
  • the target video is a single video or a set of multiple videos of the same type.
  • the third resolution is equal to the second resolution; the fourth resolution is equal to the first resolution.
  • The acquisition module is specifically configured to: input the target video at the first resolution and the target video at the second resolution into a convolutional neural network model, and obtain the overfitted super-resolution model.
  • The acquisition module is specifically configured to: perform data cleaning on the video frames in the video at the first resolution and the video frames in the video at the second resolution to obtain the target video at the first resolution and the target video at the second resolution; and train on the target video at the first resolution and the target video at the second resolution to obtain the super-resolution model.
  • The sending module is specifically configured to: send the structure of the super-resolution model to the terminal; and send a data packet to the terminal, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
  • A fourth aspect of the embodiments of the present application provides a terminal, including: a receiving module, configured to receive a super-resolution model of a target video and the target video at a third resolution sent by a server, where the super-resolution model is obtained through training based on the target video at a first resolution and the target video at a second resolution, the second resolution is smaller than the first resolution, and the magnification of the super-resolution model is the ratio between the numbers of pixels of the first resolution and the second resolution in the length direction or the width direction; and a processing module, configured to perform super-resolution reconstruction on the target video at the third resolution at the magnification according to the super-resolution model, and obtain the target video at a fourth resolution.
  • the target video is a single video or a set of multiple videos of the same type.
  • the third resolution is equal to the second resolution; the fourth resolution is equal to the first resolution.
  • The super-resolution model includes: an overfitted super-resolution model obtained by inputting the target video at the first resolution and the target video at the second resolution into a convolutional neural network model.
  • The super-resolution model includes: a super-resolution model obtained by training on the target video at the first resolution and the target video at the second resolution, where the target video at the first resolution and the target video at the second resolution are obtained by performing data cleaning on the video frames in the video at the first resolution and the video frames in the video at the second resolution.
  • The receiving module is specifically configured to: receive the structure of the super-resolution model sent by the server; and receive a data packet sent by the server, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
  • A fifth aspect of the embodiments of the present application provides a server, including: one or more processors and a memory, where the memory stores computer-readable instructions, and the one or more processors read the computer-readable instructions to cause the server to implement the method according to the first aspect or any one of its possible implementations.
  • A sixth aspect of the embodiments of the present application provides a terminal, including: one or more processors and a memory, where the memory stores computer-readable instructions, and the one or more processors read the computer-readable instructions to cause the terminal to implement the method according to the second aspect or any one of its possible implementations.
  • A seventh aspect of the embodiments of the present application provides a video transmission system, including: the server described in the foregoing first aspect or any one of its possible implementations, and the terminal described in the foregoing second aspect or any one of its possible implementations.
  • An eighth aspect of the embodiments of the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to execute the method of the first aspect, the second aspect, or any one of their possible implementations.
  • A ninth aspect of the embodiments of the present application provides a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to execute the method of the first aspect, the second aspect, or any one of their possible implementations.
  • a tenth aspect of the embodiments of the present application provides a chip, including a processor.
  • the processor is configured to read and execute the computer program stored in the memory to perform the method in any possible implementation manner of any of the above aspects.
  • The chip includes a memory, and the memory is connected to the processor through a circuit or a wire.
  • the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used for receiving data and/or information to be processed, the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • For the technical effects brought by any one of the third, fourth, fifth, sixth, seventh, eighth, ninth, or tenth aspects, refer to the technical effects brought by the corresponding implementations in the first aspect above; they are not repeated here.
  • the embodiments of the present application have the following advantages:
  • the server sends the target video of the third resolution and the super-resolution model of the target video to the terminal.
  • The super-resolution model is trained on the target video at the first resolution (the higher resolution) and the target video at the second resolution (the lower resolution), so when it is used for super-resolution of the target video, the recovered video is closer to the original high-resolution video than with a general super-resolution model, and the image quality is improved.
  • FIG. 1 is a system architecture diagram of a video transmission method
  • FIG. 2 is a system architecture diagram of a video transmission method in an embodiment of the application
  • FIG. 3a is a schematic diagram of an embodiment of a video transmission method implemented on a server side in an embodiment of the present application
  • FIG. 3b is a schematic diagram of an embodiment of a video transmission method implemented on a terminal side in an embodiment of the present application
  • FIG. 3c is a schematic flowchart of implementing a video transmission method in an embodiment of the present application.
  • FIG. 4a is an interaction diagram of a video transmission method in an embodiment of the present application.
  • FIG. 4b is a schematic diagram of another embodiment of the video transmission method in the embodiment of the present application.
  • FIG. 5 is a schematic diagram of a data cleaning method in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of super-resolution model training in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a data block of a super-resolution model in a data packet in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a header of a data packet in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data packet in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a terminal decoding a data packet in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a terminal acquiring a super-resolution image in an embodiment of the application.
  • FIG. 12 is a schematic diagram of an embodiment of a server in an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an embodiment of a terminal in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of another embodiment of the server in the embodiment of the present application.
  • FIG. 15 is a schematic diagram of another embodiment of the terminal in the embodiment of the present application.
  • the embodiment of the present application provides a video transmission method, which is applied to the field of video transmission, and is used for reducing the transmission bandwidth while ensuring the video quality of the receiving end.
  • Overfitting means that, during model parameter fitting, because the training data contains sampling error, a complex model also takes the sampling error into account and fits it closely.
  • Concretely, the model performs well on the training set but poorly on the test set, and its generalization ability is weak.
  • The two main causes of overfitting are a small amount of training data and an overly complex model.
  • A model is generally used to predict unknown data (data not in the training set), on which an overfitted model performs poorly. Therefore, during training, the training set is usually enlarged and a model of appropriate complexity is chosen to avoid overfitting, so that the model fits the true rule well enough without fitting too much sampling error.
  • A model with strong generalization ability is a good model in the usual sense. In model training, overfitting is normally avoided and generalization is pursued so that the model applies to a wider range of data.
  • The video transmission method provided by the embodiments of the present application deliberately exploits the characteristics of an overfitted model: it trains the super-resolution model on a small amount of data (for example, a single video or multiple videos of the same type) and obtains an overfitted super-resolution model.
  • The overfitted super-resolution model is dedicated to super-resolution of its training data. Although it is not suitable for super-resolution of data outside the training set, it achieves a better effect on the training data, that is, the restored high-resolution video has higher quality.
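The trade-off described above can be illustrated with a deliberately tiny stand-in for a super-resolution model. This is not the convolutional neural network of the source: it is a two-parameter model fitted by least squares on one-dimensional "videos", and every name in it is hypothetical. It only demonstrates the behaviour being exploited, namely that a model fitted to one video restores that video very well and other videos badly.

```python
def downsample2(hr):
    """Halve the resolution by averaging adjacent sample pairs."""
    return [(hr[i] + hr[i + 1]) / 2 for i in range(0, len(hr), 2)]


def fit_offsets(lr, hr):
    """Least-squares fit of one offset per output phase (the mean residual)."""
    n = len(lr)
    a = sum(hr[2 * i] - lr[i] for i in range(n)) / n
    b = sum(hr[2 * i + 1] - lr[i] for i in range(n)) / n
    return a, b


def super_resolve(lr, offsets):
    """Produce two output samples per input sample using the fitted offsets."""
    a, b = offsets
    out = []
    for value in lr:
        out.extend((value + a, value + b))
    return out


def mean_abs_error(x, y):
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)


video_a = [0, 100] * 8      # the "target video" the model is fitted to
video_b = [100, 0] * 8      # a different video with the opposite texture
generic = (0.0, 0.0)        # stand-in for a preset, video-agnostic model

model_a = fit_offsets(downsample2(video_a), video_a)
err_own = mean_abs_error(super_resolve(downsample2(video_a), model_a), video_a)
err_generic = mean_abs_error(super_resolve(downsample2(video_a), generic), video_a)
err_other = mean_abs_error(super_resolve(downsample2(video_b), model_a), video_b)
print(err_own, err_generic, err_other)
```

On its own video the fitted model is exact, while on the other video it is worse than the generic model, which mirrors the statement that an overfitted model is unsuitable outside its training data.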
  • The resolution of an image refers to the amount of information stored in the image, that is, how many pixels there are in each inch of the image, usually expressed as "number of horizontal pixels × number of vertical pixels".
  • For example, an image resolution of 640×480 means that the number of horizontal pixels is 640 and the number of vertical pixels is 480.
  • P stands for progressive scanning. Among commonly used image resolutions, 360P refers to 480×360, 720P refers to 1280×720, 1080P refers to 1920×1080, and 4K refers to 3840×2160.
  • The magnification of the super-resolution model refers to the ratio of the number of pixels in the length direction of the image after super-resolution to that of the image before super-resolution, or the ratio of the numbers of pixels in the width direction of the two images.
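As a worked example of the magnification definition, the sketch below computes the per-axis pixel ratio for the resolution labels listed above. The function name `magnification` and the requirement that the horizontal and vertical ratios agree are assumptions of this sketch; the source defines the magnification using either direction.

```python
from fractions import Fraction

# Resolution labels from the text, as (horizontal pixels, vertical pixels).
RESOLUTIONS = {
    "360P": (480, 360),
    "720P": (1280, 720),
    "1080P": (1920, 1080),
    "4K": (3840, 2160),
}


def magnification(low, high):
    """Return the pixel ratio high/low, requiring both axes to agree."""
    (low_w, low_h), (high_w, high_h) = low, high
    ratio_w = Fraction(high_w, low_w)
    ratio_h = Fraction(high_h, low_h)
    if ratio_w != ratio_h:
        raise ValueError("horizontal and vertical ratios differ")
    return ratio_w


print(magnification(RESOLUTIONS["1080P"], RESOLUTIONS["4K"]))     # 2
print(magnification(RESOLUTIONS["720P"], RESOLUTIONS["1080P"]))   # 3/2
```

Note that 360P as listed (480×360) has a 4:3 aspect ratio while the other entries are 16:9, so under this sketch's strict check a 360P to 720P pair is rejected rather than assigned a single magnification.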
  • Super-resolution (SR) improves the resolution of an original image by means of hardware or software, obtaining a high-resolution (HR) image or image sequence from a low-resolution (LR) image or image sequence.
  • Low-resolution (LR) images and high-resolution (HR) images correspond to each other and are used as input for training deep learning models such as convolutional network models.
  • The HR images are the original images corresponding to the LR images.
  • The LR image is reconstructed by the super-resolution model to obtain a super-resolution (SR) image.
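Corresponding LR/HR training pairs could be assembled as in the following sketch. The source does not say how the lower-resolution video is produced; block averaging is an assumption made here purely for illustration, and `box_downsample` and `make_training_pairs` are hypothetical helper names.

```python
def box_downsample(frame, factor):
    """Average each factor x factor block of a grayscale frame (list of rows)."""
    height, width = len(frame), len(frame[0])
    if height % factor or width % factor:
        raise ValueError("frame size must be divisible by the factor")
    area = factor * factor
    return [
        [
            sum(frame[y + dy][x + dx]
                for dy in range(factor) for dx in range(factor)) / area
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


def make_training_pairs(hr_frames, factor):
    """Pair every HR frame with its downsampled LR counterpart."""
    return [(box_downsample(hr, factor), hr) for hr in hr_frames]


hr_frame = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 12, 12],
    [4, 4, 12, 12],
]
pairs = make_training_pairs([hr_frame], 2)
lr_frame, original = pairs[0]
print(lr_frame)
```

Each pair keeps the HR frame as the training target and the LR frame as the model input, matching the correspondence described above.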
  • Deep learning is a method in machine learning based on representational learning of data.
  • An observation (for example, an image) can be represented in a variety of ways, such as a vector of intensity values for each pixel, or more abstractly as a series of edges, regions of a particular shape, and so on. Some specific representations make it easier to learn tasks (for example, face recognition or facial expression recognition) from examples.
  • The benefit of deep learning is that it replaces handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
  • Convolutional neural network: taking the inner product of the image (the data in successive data windows) and a filter matrix (a set of fixed weights; because the weights of each neuron are fixed, it can be regarded as a constant filter), that is, multiplying element by element and then summing, is the so-called convolution operation, which is the source of the name convolutional neural network.
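The window-by-window inner product described above can be written out directly. This is a minimal sketch in plain Python with a hypothetical function name; as in most convolutional neural network implementations, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum per window."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kernel_h) for j in range(kernel_w))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
result = conv2d_valid(image, kernel)
print(result)   # each window yields top-left minus bottom-right
```

The same fixed kernel is applied to every data window, which is what the text means by a constant filter.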
  • Data cleaning is the process of re-examining and verifying data to remove duplicate information, correct existing errors, and provide data consistency. That is, to remove or repair the "dirty samples" that affect subsequent model training.
  • data cleaning includes removing blurred and less informative video frames.
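One way to sketch the removal of blurred and low-information frames is shown below. The two scores used here (variance of a Laplacian response as a sharpness measure, variance of pixel intensities as an information measure) and both thresholds are assumptions of this example; the source does not specify how such frames are detected.

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def laplacian_variance(frame):
    """Sharpness score: variance of a 4-neighbour Laplacian over interior pixels."""
    height, width = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return variance(responses)


def intensity_variance(frame):
    """Information score: variance of all pixel intensities in the frame."""
    return variance([pixel for row in frame for pixel in row])


def clean_frames(frames, blur_threshold, info_threshold):
    """Return the indices of frames that are neither low-information nor blurred."""
    kept = []
    for index, frame in enumerate(frames):
        if intensity_variance(frame) < info_threshold:
            continue   # low-information frame
        if laplacian_variance(frame) < blur_threshold:
            continue   # frame without sharp detail, treated as blurred
        kept.append(index)
    return kept


flat = [[128] * 6 for _ in range(6)]                                  # no content
sharp = [[255 * ((x + y) % 2) for x in range(6)] for y in range(6)]   # checkerboard
smooth = [[40 * x for x in range(6)] for _ in range(6)]               # stands in for blur
kept = clean_frames([flat, sharp, smooth], blur_threshold=1000.0, info_threshold=100.0)
print(kept)
```

Returning indices rather than frames lets the same selection be applied to the first-resolution and second-resolution versions of the video so that the training pairs stay aligned.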
  • Figure 1 is an example of a classic terminal video application scenario, including five parts: input video at the transmitter (1001), video encoding (1002), network transmission (1003), video decoding (1004), and output video (1005).
  • The input video (1001) at the sender is the input of the scenario and is the video saved on the cloud server.
  • Video encoding (1002) compresses the input video images and reduces the redundant information of the video to facilitate network transmission.
  • Network transmission (1003) transmits the video from the sender to the receiver.
  • Video decoding (1004) decodes the encoded video transmitted over the network and restores the video to its state before encoding.
  • The output video (1005) is the video output synthesized by the decoder.
  • the cloud sends high-resolution video (such as 4k resolution video) to the client through the network.
  • the bandwidth cost of the video platform is relatively high.
  • the video playback on the client side is prone to freeze problems.
  • Video platforms usually use smaller resolutions for compressed transmission to reduce the image transmission bandwidth to the user terminal.
  • the image received by the user terminal side needs to have its resolution restored before being displayed. Because high-frequency information is lost during compression and transmission, high-resolution video obtained by interpolating the low-resolution video with a traditional upsampling algorithm is prone to blurring or compression noise, and the image quality is poor.
  • Super-resolution technology is an important computer vision and image processing method to restore high-resolution images from low-resolution images.
  • Super-resolution technology based on deep learning is developing rapidly.
  • a super-resolution model is preset on the terminal side to perform super-resolution reconstruction on low-resolution videos to obtain high-resolution and high-quality videos.
  • the target video is a single video or a collection of multiple videos of the same type.
  • the target video may be an episode of a series, or an independent video such as a news video or a movie.
  • the super-resolution model can be independently trained for a single video.
  • the target video may also be multiple videos of the same type, that is, multiple videos related in content, characters or performances, such as a series or a season of a series, a seasonal broadcast, or videos from the same series by a personal blogger. Since these videos usually have similar style and quality, a super-resolution model trained independently on these videos recovers video of better quality.
  • the server and the terminal can reduce the transmission bandwidth and improve the image quality of the video restored on the terminal side by means of device-cloud collaboration.
  • Cloud side, that is, the server side: it uses corresponding video frames of the high-resolution video and the low-resolution video of an episode to generate a training data set, and overfits the super-resolution model on that data set. The low-resolution video of the episode and the over-fitted super-resolution neural network model corresponding to the episode are sent to the terminal side.
  • the embodiment of the present application uses a deep learning-based video transmission solution through device-cloud collaboration.
  • the first-resolution video (2001) and its corresponding second-resolution video (2002) are used as the data set of the super-resolution model training module (2003) for that video, where the second resolution is less than the first resolution.
  • the super-resolution model training module (2003) uses the data set to train a video-specific super-resolution model. Different videos have their own corresponding super-resolution models.
  • the video and its super-resolution model are encoded together in the video and model encoding module (2004) as data packets for transmission over the network (2005).
  • the video and model decoding module (2006) decodes the received data packets to obtain the second-resolution video (2007) of the episode and its corresponding super-resolution model (2008). The first-resolution video (2010) is then output (2009).
  • the video transmission method in the embodiment of the present application is introduced from the server side and the terminal side, respectively.
  • the video transmission method in this embodiment of the present application may include steps 3101 to 3103.
  • the server obtains the target video of the first resolution and the target video of the second resolution;
  • the server obtains two different resolution versions of the target video: the target video of the first resolution and the target video of the second resolution, where the second resolution (also called low resolution in this embodiment of the present application) is smaller than the first resolution (also called high resolution in this embodiment of the present application).
  • the target video is taken as an example of a single episode for introduction.
  • video platforms store multiple resolution versions of the same video. Common resolutions include: 360P, 720P, 1080P or 4K. As shown in Fig.
  • the server obtains the target video (3001) of the first resolution of a single episode (for example, the first episode of a certain series at a resolution of 4K) and the corresponding target video (3002) of the second resolution (for example, the first episode of the series at a resolution of 1080P) as the data set (3003) of the super-resolution model.
  • the video frame of the target video of the second resolution is used as the data (data) of the dataset, and the corresponding video frame of the target video of the first resolution is used as the label of the dataset.
  • the quality of the data set is very important for the training effect of the super-resolution model.
  • the data set used for training the super-resolution model can be a data set containing all video frames of the target video without data cleaning, or a data set from which some video frames have been deleted by data cleaning.
  • data cleaning (3004) is performed on the dataset (3003). The data cleaning includes removing blurred frames, filtering low-frequency information, etc., and yields a cleaned data dataset (3005) and a label dataset (3006).
  • the server performs training according to the target video of the first resolution and the target video of the second resolution to obtain a super-resolution model
  • the server performs overfitting training on the super-resolution model according to the target video of the first resolution and the target video of the second resolution, and obtains the over-fitted super-resolution model.
  • the magnification of the super-resolution model is the ratio of pixels in the width direction or the ratio of pixels in the length direction between the first resolution and the second resolution.
  • a single video corresponds to one super-resolution model.
  • a set of multiple videos of the same type corresponds to one super-resolution model.
  • the super-resolution models corresponding to the individual videos in a set of multiple videos of the same type have the same model structure and different model parameters.
  • the server builds a super-resolution neural network model, and the super-resolution model is trained (3007) based on the data dataset (3005) and the label dataset (3006) generated in step 3101.
  • a corresponding super-resolution model is trained for each episode. This step results in a super-resolution model (3008) for that single episode.
  • the server inputs the data set into the convolutional neural network model for training, and obtains an over-fitted super-resolution model.
  • the server sends the super-resolution model and the target video of the third resolution to the terminal;
  • the server sends the super-resolution model obtained in step 3102 and the target video of the third resolution to the terminal.
  • the third resolution may be the same as or different from the first resolution and the second resolution, and the specific value is not limited.
  • the third resolution may be 360P, 720P or 1080P, and it can be flexibly selected according to the network status. It can be understood that the target video of the third resolution contains all the video frames of the original video, and no data cleaning is required.
  • the encoding module (3010) encodes the super-resolution model (3008) and the target video (3009) of the third resolution into data packets and sends them to the network (3011) for transmission.
  • the super-resolution model is transmitted in binary file format.
  • the model structure of the super-resolution model is delivered to the terminal side in advance.
  • the server only needs to send the weight parameters of the super-resolution model corresponding to the video to the terminal, without sending the model structure again.
  • the tail of the binary file contains a hash value generated by the model.
  • the super-resolution model sent to the terminal is used to perform super-resolution on the target video of the third resolution, for example, to restore a video with a resolution of 1080P to a video with a resolution of 4K.
  • compared with directly sending the video of the fourth resolution, the network bandwidth occupied is lower.
  • because the model is obtained by training on the target video, its super-resolution recovery works better for the target video, and the video quality obtained by super-resolution recovery is higher.
  • the terminal receives the super-resolution model and the target video of the third resolution sent by the server;
  • the terminal receives the super-resolution model and the target video of the third resolution sent by the cloud-side server. As shown in Figure 3c, specifically, the terminal receives the data packets transmitted by the server over the network, and decodes the data packets through the decoding module (3012) to obtain the super-resolution model (3014) and the target video of the third resolution (3013).
  • the super-resolution model is obtained by the server performing overfitting training according to the target video of the first resolution and the target video of the second resolution.
  • the magnification of the super-resolution model is the ratio of pixels in the width direction or the ratio of pixels in the length direction between the first resolution and the second resolution.
  • the super-resolution model is used to magnify the resolution of the target video of the third resolution by the magnification. Exemplarily, if the first resolution is 4K and the second resolution is 1080P, the magnification is 2, that is, the number of pixels of the 1080P image is doubled in both the length direction and the width direction.
  • the decoding module (3012) hashes the super-resolution model (3014) using the same hash method as the cloud server to obtain a hash value, and uses that hash value for a consistency check. If the values are the same, the model is considered reliable; if they differ, an error message is sent to the cloud, requiring the server to resend the correct packet.
  • the terminal performs super-resolution processing on the target video of the third resolution according to the super-resolution model, and obtains the target video of the fourth resolution;
  • the terminal uses the over-fitted super-resolution model (3014) to perform video super-resolution processing (3015) on the target video (3013) of the third resolution, generates the target video of the fourth resolution (3016), and sends it to the terminal display device.
  • the target video of the fourth resolution is obtained by enlarging the target video of the third resolution by the magnification of the super-resolution model. Exemplarily, if the third resolution is 1080P and the magnification of the super-resolution model is 2 times, the resolution of the target video obtained by the super-resolution processing is 4K.
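The relationship between the four resolutions can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the common 16:9 pixel dimensions for each label; the `RESOLUTIONS` table and both function names are illustrative and do not come from the application.

```python
from fractions import Fraction

# Assumed 16:9 pixel dimensions (width, height) for the labels used in the text.
RESOLUTIONS = {
    "360P": (640, 360),
    "720P": (1280, 720),
    "1080P": (1920, 1080),
    "4K": (3840, 2160),
}


def magnification(first, second):
    """Ratio of pixels in the width (or length) direction between two resolutions."""
    w1, h1 = RESOLUTIONS[first]
    w2, h2 = RESOLUTIONS[second]
    scale_w = Fraction(w1, w2)
    scale_h = Fraction(h1, h2)
    if scale_w != scale_h:
        raise ValueError("width and height ratios differ")
    return scale_w


def fourth_resolution(third, scale):
    """Pixel dimensions after enlarging the third resolution by the model's magnification."""
    w3, h3 = RESOLUTIONS[third]
    return (int(w3 * scale), int(h3 * scale))


scale = magnification("4K", "1080P")        # model trained on 4K labels and 1080P data
print(scale, fourth_resolution("1080P", scale))
```

With a model trained on 4K and 1080P versions, sending 1080P yields 4K on the terminal; sending 720P with the same model would yield 2560 x 1440 under these assumed dimensions.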
  • the network bandwidth occupied by the terminal receiving the target video of the third resolution is relatively low.
  • since the terminal performs resolution magnification according to the over-fitted super-resolution model corresponding to the target video, the image quality of the resulting target video of the fourth resolution is higher.
  • the low-resolution video and the super-resolution model received by the terminal occupy little network bandwidth.
  • because the super-resolution model is obtained by overfitting training on the target video, the image quality obtained by super-resolution restoration of the target video is higher.
  • the server obtains the data set
  • the server needs to obtain the data set before training the super-resolution model.
  • the target video is any episode in a series of videos. The video frames of the target video of the second resolution for that episode and the corresponding video frames of the target video of the first resolution are extracted as the data set of the super-resolution model for that episode. The video frames of the low-resolution video are used as the data of the dataset, and the corresponding video frames of the high-resolution video are used as the labels of the dataset.
  • the cloud server invokes a third-party video encoding and decoding service (4002) to decode the stored video resources (4001), such as Huawei Video, iQiyi Video or Youku Video, to obtain video frames.
  • the frames of the decoded high-definition and low-definition videos correspond to each other one to one.
  • exemplarily, the f_1 frame in the video frames (4004) of the second resolution corresponds to the f_1 frame in the video frames (4003) of the first resolution.
  • data cleaning may be performed on the video frames (4003) of the first resolution and the video frames (4004) of the second resolution by removing blurred frames, filtering low-frequency information, etc.
  • the specific process of performing data cleaning on the data set is shown in Figure 5:
  • the target video of the first resolution (5001) is decoded to obtain video frames of the first resolution, and the target video of the second resolution (5002) is decoded to obtain video frames of the second resolution.
  • for each decoded frame, the resolution detection algorithm classifies the frame as a high-definition frame, a blurred frame, or a low-information frame. The data set obtained by eliminating the blurred frames and the low-information frames is the cleaned data set, in which the video frames of the first resolution of the target video are used as labels for training the super-resolution model, and the corresponding video frames of the second resolution are used as the data for the model.
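The cleaning step can be illustrated with a small sketch. The application does not say how blurred or low-information frames are detected, so the variance of a Laplacian response is used here purely as an assumed stand-in; `laplacian_variance`, `clean_pairs` and the `min_sharpness` threshold are hypothetical names. Frames are grayscale images given as nested lists, and the low-resolution frame and its high-resolution label are kept or dropped together so the frame-by-frame correspondence is preserved.

```python
from statistics import pvariance


def laplacian_variance(frame):
    """Variance of a 4-neighbour Laplacian over the interior pixels of a grayscale frame."""
    height, width = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def clean_pairs(lr_frames, hr_frames, min_sharpness):
    """Return (data, label) pairs whose high-resolution frame passes the sharpness test."""
    if len(lr_frames) != len(hr_frames):
        raise ValueError("each low-resolution frame needs a matching high-resolution frame")
    return [
        (lr, hr)
        for lr, hr in zip(lr_frames, hr_frames)
        if laplacian_variance(hr) >= min_sharpness
    ]


# A flat frame carries no detail; a checkerboard is full of edges.
flat = [[10] * 4 for _ in range(4)]
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]
pairs = clean_pairs(["lr_f1", "lr_f2"], [flat, checker], min_sharpness=100)
print(len(pairs))
```

A flat frame has zero Laplacian variance, so a low score also catches low-information frames, not only blurred ones; a production pipeline would tune the threshold on real footage.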
  • the server trains the super-resolution model
  • the video frames (6001) of the first resolution of the episode and the corresponding video frames (6002) of the second resolution generated in step 4101 are used as the data set of the super-resolution model.
  • each video frame of the first resolution has a corresponding video frame of the second resolution.
  • the server encodes the data packet
  • the encoding module (4006) encodes the super-resolution model (4005) of the episode generated in step 4102 and the video frames (4004) of the second resolution to obtain a data packet.
  • when the encoding module processes the super-resolution model, it generates a binary model file (6002-2) that contains only the weight parameters and not the model structure. The model is then hashed to obtain a hash value (6002-3), which is placed at the end of the binary model file. The header (6002-1) at the front of the model (6002) sets some parameters of the module.
  • the three small modules (6002-1, 6002-2, 6002-3) together make up the model (6002) in Figure 7.
  • the extension flag bit X of the data packet header is set to 1, so that the data packet can be extended with custom data.
  • the extension flag bit of the header (6001) is set, and the model data (6002) is placed between the header (6001) and the payload (6003); the payload (6003) is the video data. Because the model file is included in the first data packet sent from the cloud to the terminal, the second and subsequent data packets sent by the server to the terminal have the extension flag bit of the packet header set to 0 (as shown in Figure 9): there is no extension data after the header, and no model file between the header (6001) and the payload (6003), which helps ensure the efficiency and security of data transmission.
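The packet layout described above (header, optional model data, payload) can be sketched as follows. The application does not give field widths, so the layout here is an assumption made for illustration: a one-byte flags field whose `X_FLAG` bit marks an extension, a four-byte sequence number, and, when the flag is set, a four-byte length followed by the model bytes. `build_packet` and `parse_packet` are hypothetical helpers.

```python
import struct

X_FLAG = 0x10              # assumed bit position of the extension flag X
HEADER = "!BI"             # assumed header: flags byte + 32-bit sequence number


def build_packet(seq, payload, model=None):
    """Build one packet; model bytes are inserted between header and payload when given."""
    flags = X_FLAG if model is not None else 0
    packet = struct.pack(HEADER, flags, seq)
    if model is not None:
        packet += struct.pack("!I", len(model)) + model
    return packet + payload


def parse_packet(packet):
    """Return (sequence number, model bytes or None, payload)."""
    flags, seq = struct.unpack_from(HEADER, packet, 0)
    offset = struct.calcsize(HEADER)
    model = None
    if flags & X_FLAG:
        (length,) = struct.unpack_from("!I", packet, offset)
        offset += 4
        model = packet[offset:offset + length]
        offset += length
    return seq, model, packet[offset:]


first = build_packet(0, b"video-chunk-0", model=b"model-bytes")   # X = 1, carries the model
later = build_packet(1, b"video-chunk-1")                          # X = 0, video only
print(parse_packet(first), parse_packet(later))
```

Only the first packet pays the cost of the model; every later packet is header plus video payload.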
  • the server sends the low-resolution video and the super-resolution model to the terminal;
  • the server sends a data packet including the super-resolution model (4005) and the video frame (4004) of the second resolution to the terminal through network transmission, and the terminal receives the data packet.
  • the server can select target videos of different resolutions to send to the terminal.
  • the video of the second resolution is sent as an example for introduction.
  • the terminal decodes the data packet
  • the terminal receives the data packets delivered by the cloud side over the network, and decodes the data packets (4010) through the video frame and super-resolution model decoding module (4009) to obtain the super-resolution model (4011) and the target video of the second resolution (4012).
  • the decoding module needs to perform a consistency check on the model data.
  • the decoding module obtains the binary model data block (7001) from the data packet, and decodes the data block (7004) to obtain the binary model (7002) and the hash value (7003).
  • the binary model (7002) is hashed (7005) using the same hashing method as in the cloud to obtain a hash value (7006), and the hash value (7003) is checked against the hash value (7006). If they are the same, the model is considered reliable. If they differ, the terminal sends an error message to the cloud and asks the cloud to resend the data packet.
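The model block made of a header, a weights-only binary model and a trailing hash, together with the terminal-side consistency check, can be sketched as below. The application does not name the hash method or the header contents, so SHA-256 and a header holding only the weight length are assumptions; `pack_model` and `unpack_model` are hypothetical names.

```python
import hashlib
import struct

DIGEST_SIZE = hashlib.sha256().digest_size   # 32 bytes for SHA-256 (assumed hash method)


def pack_model(weights: bytes) -> bytes:
    """Server side: header (weight length) + weights + hash of the weights at the tail."""
    header = struct.pack("!I", len(weights))
    return header + weights + hashlib.sha256(weights).digest()


def unpack_model(block: bytes) -> bytes:
    """Terminal side: split the block, re-hash the weights and compare with the tail."""
    (length,) = struct.unpack_from("!I", block, 0)
    weights = block[4:4 + length]
    received = block[4 + length:4 + length + DIGEST_SIZE]
    if hashlib.sha256(weights).digest() != received:
        # In the described flow the terminal would report the error and
        # ask the cloud to resend the data packet.
        raise ValueError("model hash mismatch")
    return weights


block = pack_model(b"\x01\x02\x03\x04")
print(unpack_model(block))
```

A plain hash detects accidental corruption in transit; it does not authenticate the sender, which would need a keyed scheme that the application does not describe.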
  • the terminal super-divides low-resolution video and sends it for display;
  • the terminal uses the end-side inference engine (4013) to perform video super-resolution processing on the target video (4012) of the second resolution through the super-resolution model (4011) to obtain the target video of the first resolution (4014), and sends it to the end-side display module (4015) for display.
  • the super-resolution model can perform super-resolution reconstruction of lower-resolution video frames to obtain higher-resolution video frames.
  • the video transmission method provided by the embodiments of the present application can use a smaller bandwidth (for example, reduce the bandwidth by half) for transmission on the basis of keeping the video quality of the terminal side unchanged.
  • the method reduces the bandwidth cost of the video platform and increases the market competitiveness of the video platform.
  • a video with a resolution of 4K is about 2 GB (gigabytes) in size, whereas the transmitted video with a resolution of 1080P is about 450 MB (megabytes) and the super-resolution model is about 10 MB. It can be seen that the video transmission method provided by the embodiment of the present application can reduce the transmission bandwidth.
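The saving implied by these example sizes can be worked out directly. This is a small sketch assuming 1 GB = 1024 MB; `bandwidth_saving` is an illustrative helper, and the figures are the approximate sizes quoted above, not measurements.

```python
def bandwidth_saving(full_mb, low_mb, model_mb):
    """Fraction of traffic saved by sending low-resolution video plus model instead of the full video."""
    sent_mb = low_mb + model_mb
    return 1 - sent_mb / full_mb


# About 2 GB for 4K versus about 450 MB for 1080P plus about 10 MB for the model.
saving = bandwidth_saving(full_mb=2 * 1024, low_mb=450, model_mb=10)
print(f"{saving:.1%}")
```

Under these assumptions the terminal downloads about 460 MB instead of about 2048 MB, a saving of roughly 77.5%, and the model accounts for only a small share of what is sent.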
  • the super-resolution model sent by the server to the terminal is an over-fitted model obtained by training on the target video, so its super-resolution effect on the target video is better than that of a general-purpose super-resolution model, and the video quality obtained by super-resolution is higher.
  • FIG. 12 is a schematic diagram of an embodiment of the server in the embodiment of the present application.
  • the software or firmware includes, but is not limited to, computer program instructions or code, and can be executed by a hardware processor.
  • the hardware includes, but is not limited to, various types of integrated circuits, such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC).
  • the server includes:
  • the acquisition module 1201, used to acquire a super-resolution model of a target video, where the super-resolution model is obtained by training according to the target video of the first resolution and the target video of the second resolution, the second resolution is less than the first resolution, and the magnification of the super-resolution model is the ratio of pixels in the length direction or the ratio of pixels in the width direction between the first resolution and the second resolution;
  • the sending module 1202, configured to send the target video of the third resolution and the super-resolution model to the terminal, where the super-resolution model is used to perform super-resolution reconstruction on the target video of the third resolution at the magnification to obtain the target video of a fourth resolution.
  • the target video includes a single video or a set of multiple videos of the same type.
  • the third resolution is equal to the second resolution; the fourth resolution is equal to the first resolution.
  • the obtaining module 1201 is specifically configured to: input the target video of the first resolution and the target video of the second resolution into a convolutional neural network model, and obtain the over-fitted super-resolution model.
  • the obtaining module 1201 is specifically configured to: perform data cleaning on the video frames in the video of the first resolution and the video frames in the video of the second resolution to obtain the target video of the first resolution and the target video of the second resolution; and perform training according to the target video of the first resolution and the target video of the second resolution to obtain the super-resolution model.
  • the sending module 1202 is specifically configured to: send the structure of the super-resolution model to the terminal; and send a data packet to the terminal, the data packet including the weight parameters of the super-resolution model and the target video of the third resolution.
  • the server provided by the embodiment of the present application acquires the super-resolution model of the target video and the target video of the third resolution through the acquisition module, and sends them to the terminal through the sending module. Compared with super-resolution recovery based on a general super-resolution model, this can also improve the video quality.
  • FIG. 13 is a schematic diagram of an embodiment of the terminal in the embodiment of the present application.
  • the software or firmware includes, but is not limited to, computer program instructions or code, and can be executed by a hardware processor.
  • the hardware includes, but is not limited to, various types of integrated circuits, such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC).
  • the terminal includes: a receiving module 1301, configured to receive a super-resolution model of a target video and the target video of a third resolution sent by a server, where the super-resolution model is obtained by training according to the target video of the first resolution and the target video of the second resolution, the second resolution is smaller than the first resolution, and the magnification of the super-resolution model is the ratio of pixels in the length direction or in the width direction between the first resolution and the second resolution;
  • the processing module 1302 is configured to perform super-resolution reconstruction on the target video of the third resolution at the magnification according to the super-resolution model, and obtain the target video of the fourth resolution.
  • the target video includes a single video or a set of multiple videos of the same type.
  • the third resolution is equal to the second resolution; the fourth resolution is equal to the first resolution.
  • the super-resolution model includes: an over-fitted super-resolution model obtained by inputting the target video of the first resolution and the target video of the second resolution into a convolutional neural network model.
  • the super-resolution model includes: a super-resolution model obtained by training on the target video of the first resolution and the target video of the second resolution, where the target video of the first resolution and the target video of the second resolution are obtained by performing data cleaning on the video frames in the video of the first resolution and the video frames in the video of the second resolution.
  • the receiving module 1301 is specifically configured to: receive the structure of the super-resolution model sent by the server; and receive a data packet sent by the server, where the data packet includes the weight parameters of the super-resolution model and the target video of the third resolution.
  • the receiving module receives the super-resolution model of the target video and the target video of the third resolution sent by the server, so the bandwidth of video transmission is relatively low; and compared with the prior-art super-resolution recovery based on a general super-resolution model, performing super-resolution recovery based on this super-resolution model can improve the video quality.
  • FIG. 14 is a schematic diagram of another embodiment of the server in the embodiment of the present application.
  • the server in this embodiment of the present application may be a physical machine or a virtual machine running on abstract hardware resources. In an actual application scenario, it may be a server that provides various cloud services.
  • the device form is not limited.
  • the server 1400 provided in this embodiment may vary greatly due to different configurations or performance, and may include one or more processors 1401 and a memory 1402, where programs or data are stored in the memory 1402.
  • the memory 1402 may be volatile storage or non-volatile storage.
  • the processor 1401 is one or more central processing units (CPU), each of which can be a single-core CPU or a multi-core CPU.
  • the processor 1401 can communicate with the memory 1402 and execute, on the server 1400, a series of instructions in the memory 1402.
  • the server 1400 also includes one or more wired or wireless network interfaces 1403, such as Ethernet interfaces.
  • the server 1400 may also include one or more power supplies; one or more input/output interfaces, which may be used to connect a monitor, mouse, keyboard, touch screen device or sensing device etc., the input and output interfaces are optional components, which may or may not exist, and are not limited here.
  • FIG. 15 is a schematic diagram of another embodiment of the terminal in the embodiment of the present application.
  • the terminal 1500 provided in this embodiment may be various types of terminals with display functions, such as a mobile phone, a tablet computer, a desktop computer, a smart screen, or a wearable device, and the specific device form is not limited in this embodiment of the present application.
  • the terminal 1500 may vary greatly due to different configurations or performances, and may include one or more processors 1501 and a memory 1502 in which programs or data are stored.
  • the memory 1502 may be volatile storage or non-volatile storage.
  • the processor 1501 is one or more central processing units (CPU), each of which can be a single-core CPU or a multi-core CPU.
  • the processor 1501 can communicate with the memory 1502 and execute, on the terminal 1500, a series of instructions in the memory 1502.
  • the terminal 1500 also includes one or more wired or wireless network interfaces 1503, such as Ethernet interfaces.
  • the terminal 1500 may also include one or more power supplies; one or more input and output interfaces, which may be used to connect a display, a mouse, a keyboard, a touch screen device or a sensing device etc., the input and output interfaces are optional components, which may or may not exist, and are not limited here.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

Disclosed by embodiments of the present application is a video transmission method, applicable to the field of video transmission, used for reducing transmission bandwidth while ensuring video quality at a receiving terminal. The method of the embodiments of the present application comprises: a server obtaining a super-resolution model obtained by training a target video at a first resolution and said target video at a second resolution; sending said super-resolution model and the target video at a third resolution to a terminal; the terminal performing a super-resolution reconstruction of the target video at the third resolution on the basis of the super-resolution model; in comparison with super-resolution reconstruction by means of a generic super-resolution model, the video image quality is improved.

Description

Video transmission method, server, terminal and video transmission system
This application claims priority to Chinese Patent Application No. 202011373386.2, entitled "Video Transmission Method, Server, Terminal and Video Transmission System" and filed with the China National Intellectual Property Administration on November 30, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of video transmission, and in particular, to a video transmission method, a server, a terminal, and a video transmission system.
Background
With the development of electronic display technology, screen resolutions keep rising and image resources such as pictures and videos consume more and more traffic, which brings very high bandwidth costs. Reducing the bandwidth cost of video has become a top priority for video platforms. Compressing and transmitting video at a lower resolution is a commonly used bandwidth reduction technique that lowers the image transmission bandwidth from the video platform to the user terminal.
Because lower-resolution video cannot meet users' demand for high-quality viewing, the user terminal side (hereinafter, the terminal side) generally restores the resolution of the received video before displaying it. Super-resolution (SR) technology is an important computer vision and image processing technique for recovering high-resolution pictures from low-resolution pictures, and super-resolution based on deep learning is developing rapidly. In existing methods, a super-resolution model is preset on the terminal side and used to perform super-resolution reconstruction on low-quality video to obtain high-resolution, high-quality video.
Because different videos differ in content, style, and image quality, a preset model cannot cover every video scene. As a result, the image quality of the restored super-resolution video still falls well short of that of the original high-resolution video, and in some scenes the image quality is even degraded.
Summary of the Invention
The embodiments of the present application provide a video transmission method, applied to the field of video transmission, for reducing transmission bandwidth while ensuring video quality at the receiving end.
A first aspect of the embodiments of the present application provides a video transmission method, including: a server obtains a super-resolution model of a target video, where the super-resolution model is obtained by training on the target video at a first resolution and the target video at a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel count of the first resolution to that of the second resolution in the length direction or in the width direction; and the server sends the target video at a third resolution and the super-resolution model to a terminal, where the super-resolution model is used to perform super-resolution reconstruction on the target video at the third resolution at the magnification, to obtain the target video at a fourth resolution.
In the video transmission method provided by the embodiments of the present application, the server sends the target video at the third resolution and the super-resolution model to the terminal. The super-resolution model is trained on the target video at different resolutions (the first resolution and the second resolution), and the terminal can use the model to super-resolve the target video at the third resolution at the model's magnification, obtaining the target video at the fourth resolution. Because the super-resolution model sent by the server is trained on the target video itself, it achieves a better result when restoring the target video than the prior-art approach in which the terminal applies a single preset super-resolution model to all kinds of videos; that is, the restored target video at the fourth resolution has higher image quality.
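The relationship between the four resolutions and the magnification can be illustrated with a minimal Python sketch. The function names, the (width, height) tuple representation, and the requirement that the ratio be an integer and identical in both directions are assumptions made for illustration only; the application itself only states that the magnification is the pixel ratio in the length direction or in the width direction.

```python
from typing import Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels; hypothetical representation


def magnification(first: Resolution, second: Resolution) -> int:
    """Pixel ratio of the first (higher) to the second (lower) resolution.

    Assumption for this sketch: the ratio is an integer and the same in
    the width and height directions.
    """
    w1, h1 = first
    w2, h2 = second
    if w2 >= w1 or h2 >= h1:
        raise ValueError("second resolution must be lower than the first")
    if w1 % w2 or h1 % h2 or w1 // w2 != h1 // h2:
        raise ValueError("sketch only handles equal integer ratios")
    return w1 // w2


def fourth_resolution(third: Resolution, scale: int) -> Resolution:
    """Resolution obtained by super-resolving the third resolution at `scale`."""
    return (third[0] * scale, third[1] * scale)


# A model trained on 1920x1080 and 960x540 versions has magnification 2.
scale = magnification((1920, 1080), (960, 540))
# Applied to a 960x540 stream (third == second), it restores 1920x1080.
restored = fourth_resolution((960, 540), scale)
```

When the third resolution equals the second, the fourth resolution equals the first; applying the same model to a different third resolution simply scales that resolution by the same factor.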
In a possible implementation of the first aspect, the target video is a single video or a set of multiple videos of the same type. Specifically, the target video may be one independent video, such as an episode of a TV series, a news video, or a movie, in which case a super-resolution model is trained separately for that single video. Alternatively, the target video may be multiple videos of the same type, such as one series or one season of a serial or seasonal show, or videos in the same series by an individual content creator; that is, multiple videos related in content, characters, or performance serve together as the target video. Such videos usually share a similar style and image quality, so training a dedicated super-resolution model on them yields good restored image quality.
In a possible implementation of the first aspect, the third resolution is equal to the second resolution, and the fourth resolution is equal to the first resolution.
In the video transmission method provided by the embodiments of the present application, in a possible implementation, the target video that the server sends to the terminal is the lower-resolution (that is, second-resolution) target video used to train the super-resolution model, and the model is used to restore the target video at the second resolution to the target video at the first resolution. In this scenario, super-resolution restoration works better and the restored target video has better image quality.
In a possible implementation of the first aspect, the server obtaining the super-resolution model of the target video includes: the server inputs the target video at the first resolution and the target video at the second resolution into a convolutional neural network model to obtain the overfitted super-resolution model.
In the video transmission method provided by the embodiments of the present application, the server can input the target video at different resolutions into a convolutional neural network model for training to obtain an overfitted super-resolution model. Although an overfitted super-resolution model is not suitable for super-resolving other, non-target videos, it performs extremely well on the training data set, namely the target video.
In a possible implementation of the first aspect, the server obtaining the super-resolution model of the target video includes: the server performs data cleaning on video frames of a video at the first resolution and video frames of a video at the second resolution to obtain the target video at the first resolution and the target video at the second resolution; and the server trains on the target video at the first resolution and the target video at the second resolution to obtain the super-resolution model.
In the video transmission method provided by the embodiments of the present application, before training the super-resolution model the server may first perform data cleaning on the video frames of the target video. Data cleaning includes removing video frames such as blurred frames and low-information frames; training on the cleaned video can improve the model's performance.
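One way to picture the data cleaning step is the following Python sketch. The text above only says that blurred frames and low-information frames are removed; the specific criteria used here (variance of a 4-neighbour Laplacian as a sharpness score, histogram entropy as an information score), the thresholds, and all function names are assumptions chosen for illustration, not the method of the application.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # grayscale frame as rows of 0..255 values (assumed)


def laplacian_variance(frame: Frame) -> float:
    """Sharpness score: variance of a 4-neighbour Laplacian over interior pixels."""
    h, w = len(frame), len(frame[0])
    vals = [4 * frame[y][x] - frame[y - 1][x] - frame[y + 1][x]
            - frame[y][x - 1] - frame[y][x + 1]
            for y in range(1, h - 1) for x in range(1, w - 1)]
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def entropy_bits(frame: Frame) -> float:
    """Information score: Shannon entropy of the pixel-value histogram."""
    counts = Counter(v for row in frame for v in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def keep_indices(frames: List[Frame],
                 min_sharpness: float = 100.0,   # hypothetical threshold
                 min_entropy: float = 0.5) -> List[int]:  # hypothetical threshold
    """Indices of frames that are neither blurred nor low in information."""
    return [i for i, f in enumerate(frames)
            if laplacian_variance(f) >= min_sharpness
            and entropy_bits(f) >= min_entropy]


def clean_pairs(hr_frames: List[Frame],
                lr_frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Drop the same frame indices from both resolutions so pairs stay aligned."""
    keep = keep_indices(hr_frames)  # judged on the high-resolution frames
    return [hr_frames[i] for i in keep], [lr_frames[i] for i in keep]
```

Filtering both resolutions by the same indices keeps every remaining high-resolution frame paired with its low-resolution counterpart, which the subsequent training step relies on.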
In a possible implementation of the first aspect, the server sending the target video at the third resolution and the super-resolution model to the terminal specifically includes: the server sends the structure of the super-resolution model to the terminal; and the server sends a data packet to the terminal, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
In the video transmission method provided by the embodiments of the present application, the server can send the target video and the super-resolution model to the terminal in several ways: separately or together. Specifically, the super-resolution model can be sent in two parts, the structure of the model and the weight parameters of the model. In a possible implementation, the server sends the structure of the super-resolution model to the terminal in advance, and then sends the weight parameters of the model and the target video. Further, the server compresses the target video into multiple data packets for sending. Optionally, the weight parameters of the super-resolution model are sent to the terminal together with the first data packet of the target video, so that the terminal can start super-resolving the target video as soon as possible. In this implementation, the super-resolution models of different target videos can share the same model structure, so only the corresponding weight parameters need to be sent for each target video, which reduces the amount of data transmitted.
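The idea of attaching the weight parameters to the first data packet of the video can be sketched as follows. The header layout used here (a one-byte flag followed by two 32-bit lengths) and the function names are purely hypothetical; the packet format that the application describes with reference to its figures is not reproduced in this sketch.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical header: has-weights flag, weights length, video payload length.
HEADER = struct.Struct("!BII")


def build_packets(weights: bytes, video_chunks: List[bytes]) -> List[bytes]:
    """Server side: put the model weights only in the first packet."""
    packets = []
    for i, chunk in enumerate(video_chunks):
        w = weights if i == 0 else b""
        header = HEADER.pack(1 if w else 0, len(w), len(chunk))
        packets.append(header + w + chunk)
    return packets


def parse_packet(packet: bytes) -> Tuple[Optional[bytes], bytes]:
    """Terminal side: split a packet into (weights or None, video payload)."""
    flag, w_len, v_len = HEADER.unpack_from(packet)
    body = packet[HEADER.size:]
    weights = body[:w_len] if flag else None
    video = body[w_len:w_len + v_len]
    return weights, video


packets = build_packets(b"WEIGHTS", [b"segment-0", b"segment-1"])
first_weights, first_video = parse_packet(packets[0])
later_weights, later_video = parse_packet(packets[1])
```

Because the model structure is assumed to be already on the terminal, the first packet carries everything needed to instantiate the per-video model, and later packets carry video data only.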
A second aspect of the embodiments of the present application provides a video transmission method, including: a terminal receives, from a server, a super-resolution model of a target video and the target video at a third resolution, where the super-resolution model is obtained by training on the target video at a first resolution and the target video at a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel count of the first resolution to that of the second resolution in the length direction or in the width direction; and the terminal performs, according to the super-resolution model, super-resolution reconstruction on the target video at the third resolution at the magnification, to obtain the target video at a fourth resolution.
In the video transmission method provided by the embodiments of the present application, the terminal receives the target video at the third resolution and the super-resolution model sent by the server. Because the super-resolution model is trained on the target video at different resolutions (the first resolution and the second resolution), it achieves a better result when restoring the target video than the prior-art approach in which the terminal applies a single super-resolution model to all kinds of videos; that is, the restored target video at the fourth resolution has higher image quality.
In a possible implementation of the second aspect, the target video is a single video or a set of multiple videos of the same type.
In the video transmission method provided by the embodiments of the present application, the target video can take multiple forms, which increases the flexibility of implementing the solution.
In a possible implementation of the second aspect, the third resolution is equal to the second resolution, and the fourth resolution is equal to the first resolution.
In a typical implementation of the video transmission method provided by the embodiments of the present application, the target video at the third resolution obtained by the terminal is the lower-resolution (that is, second-resolution) target video used to train the super-resolution model, and the model is used to restore the target video at the second resolution to the target video at the first resolution. In this scenario, super-resolution restoration works better and the restored target video has better image quality.
In a possible implementation of the second aspect, the super-resolution model includes: an overfitted super-resolution model obtained by inputting the target video at the first resolution and the target video at the second resolution into a convolutional neural network model.
In the video transmission method provided by the embodiments of the present application, although an overfitted super-resolution model is not suitable for super-resolving other, non-target videos, it performs extremely well on the training data set, namely the target video.
In a possible implementation of the second aspect, the super-resolution model includes: a super-resolution model obtained by training on the target video at the first resolution and the target video at the second resolution, where the target video at the first resolution and the target video at the second resolution are obtained by performing data cleaning on video frames of a video at the first resolution and video frames of a video at the second resolution.
In the video transmission method provided by the embodiments of the present application, the data set used for model training is obtained by performing data cleaning on the video frames of the target video. Data cleaning includes removing video frames such as blurred frames and low-information frames, which can improve the performance of the super-resolution model.
In a possible implementation of the second aspect, the terminal receiving the super-resolution model of the target video and the target video at the third resolution sent by the server includes: the terminal receives the structure of the super-resolution model sent by the server; and the terminal receives a data packet sent by the server, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
In the video transmission method provided by the embodiments of the present application, the terminal can obtain the target video and the super-resolution model in several ways: separately or together. Specifically, the terminal can receive the structure of the super-resolution model and the weight parameters of the model separately. In a possible implementation, the terminal first receives the structure of the super-resolution model sent by the server, and then obtains the weight parameters of the model and the target video. Further, the server compresses the target video into multiple data packets for sending. Optionally, the weight parameters of the super-resolution model are sent to the terminal together with the first data packet of the target video, so that the terminal can start super-resolving the target video as soon as possible. In this implementation, the super-resolution models of different target videos can share the same model structure, so the terminal only needs to receive the corresponding weight parameters to obtain the super-resolution model for each target video, which reduces the amount of data transmitted.
A third aspect of the embodiments of the present application provides a server, including: an obtaining module, configured to obtain a super-resolution model of a target video, where the super-resolution model is obtained by training on the target video at a first resolution and the target video at a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel count of the first resolution to that of the second resolution in the length direction or in the width direction; and a sending module, configured to send the target video at a third resolution and the super-resolution model to a terminal, where the super-resolution model is used to perform super-resolution reconstruction on the target video at the third resolution at the magnification, to obtain the target video at a fourth resolution.
In a possible implementation of the third aspect, the target video is a single video or a set of multiple videos of the same type.
In a possible implementation of the third aspect, the third resolution is equal to the second resolution, and the fourth resolution is equal to the first resolution.
In a possible implementation of the third aspect, the obtaining module is specifically configured to: input the target video at the first resolution and the target video at the second resolution into a convolutional neural network model to obtain the overfitted super-resolution model.
In a possible implementation of the third aspect, the obtaining module is specifically configured to: perform data cleaning on video frames of a video at the first resolution and video frames of a video at the second resolution to obtain the target video at the first resolution and the target video at the second resolution; and train on the target video at the first resolution and the target video at the second resolution to obtain the super-resolution model.
In a possible implementation of the third aspect, the sending module is specifically configured to: send the structure of the super-resolution model to the terminal; and send a data packet to the terminal, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
A fourth aspect of the embodiments of the present application provides a terminal, including: a receiving module, configured to receive, from a server, a super-resolution model of a target video and the target video at a third resolution, where the super-resolution model is obtained by training on the target video at a first resolution and the target video at a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel count of the first resolution to that of the second resolution in the length direction or in the width direction; and a processing module, configured to perform, according to the super-resolution model, super-resolution reconstruction on the target video at the third resolution at the magnification, to obtain the target video at a fourth resolution.
In a possible implementation of the fourth aspect, the target video is a single video or a set of multiple videos of the same type.
In a possible implementation of the fourth aspect, the third resolution is equal to the second resolution, and the fourth resolution is equal to the first resolution.
In a possible implementation of the fourth aspect, the super-resolution model includes: an overfitted super-resolution model obtained by inputting the target video at the first resolution and the target video at the second resolution into a convolutional neural network model.
In a possible implementation of the fourth aspect, the super-resolution model includes: a super-resolution model obtained by training on the target video at the first resolution and the target video at the second resolution, where the target video at the first resolution and the target video at the second resolution are obtained by performing data cleaning on video frames of a video at the first resolution and video frames of a video at the second resolution.
In a possible implementation of the fourth aspect, the receiving module is specifically configured to: receive the structure of the super-resolution model sent by the server; and receive a data packet sent by the server, where the data packet includes the weight parameters of the super-resolution model and the target video at the third resolution.
A fifth aspect of the embodiments of the present application provides a server, including: one or more processors and a memory, where the memory stores computer-readable instructions, and the one or more processors read the computer-readable instructions to cause the terminal to implement the method according to the first aspect or any one of the possible implementations of the first aspect.
A sixth aspect of the embodiments of the present application provides a terminal, including: one or more processors and a memory, where the memory stores computer-readable instructions, and the one or more processors read the computer-readable instructions to cause the terminal to implement the method according to the second aspect or any one of the possible implementations of the second aspect.
A seventh aspect of the embodiments of the present application provides a video transmission system, including: the server according to the first aspect or any one of the possible implementations of the first aspect, and the terminal according to the second aspect or any one of the possible implementations of the second aspect.
An eighth aspect of the embodiments of the present application provides a computer program product containing instructions, which, when run on a computer, causes the computer to execute the method according to the first aspect, the second aspect, or any one of their possible implementations.
A ninth aspect of the embodiments of the present application provides a computer-readable storage medium including instructions, which, when run on a computer, cause the computer to execute the method according to the first aspect, the second aspect, or any one of their possible implementations.
A tenth aspect of the embodiments of the present application provides a chip including a processor. The processor is configured to read and execute a computer program stored in a memory, to perform the method in any possible implementation of any of the above aspects. Optionally, the chip further includes the memory, and the processor is connected to the memory through a circuit or a wire. Further optionally, the chip also includes a communication interface connected to the processor. The communication interface is used to receive data and/or information to be processed; the processor obtains the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input/output interface.
For the technical effects of any implementation of the third, fourth, fifth, sixth, seventh, eighth, ninth, or tenth aspect, refer to the technical effects of the corresponding implementation of the first aspect; details are not repeated here.
It can be seen from the above technical solutions that the embodiments of the present application have the following advantages:
The server sends the terminal the target video at the third resolution together with the super-resolution model of the target video. Because the super-resolution model is trained on the target video at the first (higher) resolution and the target video at the second (lower) resolution, the video it restores from the target video is closer to the original high-resolution video than one restored by a generic super-resolution model, and the image quality is improved.
Brief Description of the Drawings
FIG. 1 is a system architecture diagram of a video transmission method;
FIG. 2 is a system architecture diagram of the video transmission method in an embodiment of the present application;
FIG. 3a is a schematic diagram of an embodiment of the video transmission method implemented on the server side in an embodiment of the present application;
FIG. 3b is a schematic diagram of an embodiment of the video transmission method implemented on the terminal side in an embodiment of the present application;
FIG. 3c is a schematic flowchart of implementing the video transmission method in an embodiment of the present application;
FIG. 4a is an interaction diagram of the video transmission method in an embodiment of the present application;
FIG. 4b is a schematic diagram of another embodiment of the video transmission method in an embodiment of the present application;
FIG. 5 is a schematic diagram of a data cleaning method in an embodiment of the present application;
FIG. 6 is a schematic diagram of super-resolution model training in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a super-resolution model data block in a data packet in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of the header of a data packet in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data packet in an embodiment of the present application;
FIG. 10 is a schematic diagram of a terminal decoding a data packet in an embodiment of the present application;
FIG. 11 is a schematic diagram of a terminal obtaining a super-resolution image in an embodiment of the present application;
FIG. 12 is a schematic diagram of an embodiment of a server in an embodiment of the present application;
FIG. 13 is a schematic diagram of an embodiment of a terminal in an embodiment of the present application;
FIG. 14 is a schematic diagram of another embodiment of the server in an embodiment of the present application;
FIG. 15 is a schematic diagram of another embodiment of the terminal in an embodiment of the present application.
Detailed Description
The embodiments of this application provide a video transmission method, applied in the field of video transmission, that reduces transmission bandwidth while preserving video quality at the receiving end.
For ease of understanding, some technical terms used in the embodiments of this application are briefly introduced below:
Overfitting
Overfitting arises during model parameter fitting. Because the training data contains sampling error, a complex model fits the sampling error as well as the underlying pattern. The symptom is a model that performs well on the training set but poorly on the test set, that is, a model with weak generalization ability. The two main causes of overfitting are too little training data and an overly complex model.
Models are generally used to predict unseen data (data outside the training set), where an overfitted model performs poorly. Training therefore usually avoids overfitting by enlarging the training set, choosing a suitable model, and similar measures, so that the model fits the true underlying rule without fitting excessive sampling error. In general, a model with strong generalization ability is what is usually meant by a good model. Model training normally avoids overfitting and pursues generalization so that the model applies more widely.
The video transmission method provided by the embodiments of this application, however, deliberately exploits the properties of an overfitted model. A super-resolution model is trained on a small amount of data (for example, a single video or multiple videos of the same type) to obtain an overfitted super-resolution model that is dedicated to super-resolving its own training data. Although such a model is unsuitable for super-resolving data outside the training set, it achieves a good super-resolution result on the training data, that is, the restored high-resolution video has high image quality.
Image resolution: the amount of information stored in an image, namely the number of pixels per inch of the image, commonly expressed as "number of horizontal pixels × number of vertical pixels". An image resolution of 640×480 means 640 pixels horizontally and 480 pixels vertically. "P" stands for progressive scanning. Among common image resolutions, 360P refers to 480×360, 720P refers to 1280×720, 1080P refers to 1920×1080, and 4K refers to 3840×2160.
Understandably, the higher the resolution of an image, the more data it contains and the richer the detail it can show, but the more computer storage resources it requires.
The magnification of a super-resolution model is the ratio of the number of pixels along the length of the image after super-resolution to that of the image before super-resolution, or the ratio of the pixel counts of the two images along the width.
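The magnification definition can be illustrated with a short calculation. The following is a minimal sketch rather than part of the described method; the function name `magnification` and the (width, height) tuple convention are assumptions introduced for illustration.

```python
def magnification(sr_size, lr_size):
    """Return the scale factor between a super-resolved frame and its input.

    Both sizes are (width, height) tuples in pixels. The sketch assumes the
    same ratio applies in both directions and rejects inputs where it does not.
    """
    sr_w, sr_h = sr_size
    lr_w, lr_h = lr_size
    scale_w = sr_w / lr_w
    scale_h = sr_h / lr_h
    if scale_w != scale_h:
        raise ValueError("width and height ratios differ")
    return scale_w


# 1080P (1920x1080) super-resolved to 4K (3840x2160)
print(magnification((3840, 2160), (1920, 1080)))  # 2.0
```

With the resolutions listed above, going from 1080P to 4K gives a magnification of 2 in both directions.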
Super resolution (SR)
Super resolution, "SR" for short, increases the resolution of an original image by hardware or software means. The process of obtaining a high resolution (HR) image or image sequence from a low resolution (LR) image or image sequence is super-resolution reconstruction.
Low resolution (LR) images and high resolution (HR) images correspond to each other and are input into a deep learning model, such as a convolutional network model, for training. The HR image is the original image corresponding to the LR image. Performing super-resolution reconstruction on the LR image with the super-resolution model yields a super resolution (SR) image, and the process of obtaining the SR image from the LR image may be called super-resolution reconstruction or super-resolution restoration.
Deep learning: deep learning is a family of machine learning methods based on representation learning from data. An observation (for example, an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Some representations make it easier to learn a task (for example, face recognition or facial expression recognition) from examples. The benefit of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.
Convolutional neural network: taking the inner product (element-wise multiplication followed by summation) of an image (the data in successive data windows) and a filter matrix (a fixed set of weights; because the weights of each neuron are fixed, the neuron can be regarded as a constant filter) is the so-called convolution operation, from which the convolutional neural network takes its name.
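The window-by-window inner product described above can be sketched in plain Python. This is an illustrative sketch only; the function name `conv2d_valid` and the choice of stride 1 with no padding are assumptions. As is usual in convolutional networks, the filter is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows), stride 1, no padding.

    Each output value is the inner product of the kernel with one data
    window of the image: multiply element by element, then sum.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[-4, -4], [-4, -4]]
```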
Data cleaning: data cleaning is the process of re-examining and validating data in order to remove duplicate information, correct existing errors, and ensure data consistency. In other words, it removes or repairs the "dirty samples" that would affect subsequent model training. In this application, data cleaning includes removing blurred video frames and video frames that carry little information.
The embodiments of this application are described below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so termed are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprise" and "have", and any variants thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or modules is not necessarily limited to the steps or modules expressly listed, but may include other steps or modules that are not expressly listed or that are inherent to the process, method, product, or device. The naming or numbering of steps in this application does not mean that the steps of a method flow must be executed in the temporal or logical order indicated by the naming or numbering. The execution order of named or numbered steps may be changed according to the technical purpose to be achieved, provided that the same or a similar technical effect is achieved.
Referring to FIG. 1, a typical terminal video application scenario comprises five parts: sender input video (1001), video encoding (1002), network transmission (1003), video decoding (1004), and output video (1005). The sender input video (1001) is the input of the scenario and is a video already stored on the cloud server. Video encoding (1002) compresses the input video images and reduces redundant video information to facilitate network transmission. Network transmission (1003) delivers the video from the sender to the receiver. Video decoding (1004) decodes the encoded video received over the network and restores the video to its state before encoding. Output video (1005) outputs the video composed by the decoder.
When the cloud sends high-resolution video (such as 4K video) to the user end over the network, the bandwidth cost for the video platform is high, and when the user's network condition is poor, playback on the device side is prone to stuttering.
Video platforms usually compress and transmit video at a lower resolution to reduce the image transmission bandwidth to the user terminal. The images received on the user terminal side must have their resolution restored before being displayed. Because high-frequency information is lost during compressed transmission, a high-resolution video obtained on the device side by interpolating the low-resolution video with a traditional upsampling algorithm is prone to blurring or compression noise, and its image quality is poor.
Super-resolution technology is an important computer vision and image processing technique for restoring high-resolution images from low-resolution images, and super-resolution based on deep learning is developing rapidly. Existing methods preset a super-resolution model on the device side and use it to perform super-resolution reconstruction on low-resolution video, obtaining video of high resolution and high image quality.
However, different videos differ in content, style, and image quality, and a single preset model cannot cover all video scenes. As a result, the restored image quality still falls well short of the original high-resolution image quality, and in some scenes the image quality of the video is even degraded.
In the embodiments of this application, the target video is a single video or a collection of multiple videos of the same type. Specifically, the target video may be one episode of a series, or an independent video such as a news video or a film. In the embodiments of this application, a super-resolution model may be trained independently for a single video. Alternatively, the target video may be multiple videos of the same type, such as one series or one season of a serial drama, or videos in the same series by an individual blogger, that is, multiple videos that are related in content, characters, or performance. Because such videos usually share a similar style and image quality, training a dedicated super-resolution model on them yields better image quality in the video restored by super-resolution.
In the video transmission method provided by the embodiments of this application, the server and the terminal cooperate through device-cloud collaboration to reduce transmission bandwidth and improve the image quality of the video restored on the device side.
(1) Cloud side: the server side uses the video frames of the high-resolution version and the low-resolution version of one episode to generate a training dataset, performs overfitting training on a super-resolution neural network model, and generates an overfitted super-resolution model. It then delivers the low-resolution video of the episode, together with the overfitted super-resolution neural network model corresponding to that episode, to the device side.
(2) Device side: using the overfitted super-resolution neural network model delivered by the cloud side, the device reconstructs the low-resolution video of the episode into a video of high resolution and high image quality by super-resolution.
As shown in FIG. 2, the embodiments of this application provide a deep-learning-based video transmission scheme that uses device-cloud collaboration.
On the cloud server side, the first-resolution video (2001) and its corresponding second-resolution video (2002) serve as the dataset for the super-resolution model training module (2003) for that video, where the second resolution is lower than the first resolution. The super-resolution model training module (2003) uses the dataset to train a super-resolution model specific to that video, so each different video has its own corresponding super-resolution model. Optionally, the second-resolution video (2002) and its corresponding super-resolution model are encoded together into data packets by the video and model encoding module (2004) and transmitted over the network (2005).
On the device side, the video and model decoding module (2006) decodes the received data packets to obtain the second-resolution video (2007) of the episode and its corresponding super-resolution model (2008), and the super-resolution image quality reconstruction module (2009) outputs the first-resolution video (2010).
Referring to FIG. 3a and FIG. 3b, the video transmission method in the embodiments of this application is described below from the server side and from the terminal side, respectively.
Referring to FIG. 3a, the video transmission method in this embodiment of this application may include steps 3101 to 3103.
3101. The server obtains a target video of a first resolution and a target video of a second resolution;
The server obtains two versions of the target video at different resolutions: a target video of a first resolution and a target video of a second resolution, where the second resolution (also called the low resolution in the embodiments of this application) is lower than the first resolution (also called the high resolution in the embodiments of this application). The embodiments of this application are described using a single episode as the target video. A video platform usually stores multiple resolution versions of the same video; common resolutions include 360P, 720P, 1080P, and 4K. As shown in FIG. 3c, the server obtains the first-resolution target video (3001) of a single episode (for example, the 4K video of the first episode of a series) and its corresponding second-resolution target video (3002) (for example, the 1080P video of the first episode of that series) as the dataset (3003) for the super-resolution model. The video frames of the second-resolution target video serve as the data of the dataset for this episode, and the corresponding video frames of the first-resolution target video serve as the labels of the dataset for this episode. The quality of the dataset is very important to the result of super-resolution model training.
Optionally, the dataset used to train the super-resolution model may be a dataset that has not been cleaned and contains all video frames of the target video, or a dataset from which some video frames have been removed by data cleaning. Optionally, data cleaning (3004) is performed on the dataset (3003). Data cleaning includes removing blurred frames, filtering out low-frequency information, and so on, and yields a cleaned data dataset (3005) and a cleaned label dataset (3006).
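The pairing of low-resolution frames (data) with their corresponding high-resolution frames (labels) can be sketched as follows. This is a minimal illustration, not the implementation described in this application; the function name `build_dataset` and the use of strings to stand in for decoded frames are assumptions.

```python
def build_dataset(lr_frames, hr_frames):
    """Pair each low-resolution frame with its high-resolution counterpart.

    The two lists are assumed to be frame-aligned: frame i of the
    low-resolution video shows the same picture as frame i of the
    high-resolution video.
    """
    if len(lr_frames) != len(hr_frames):
        raise ValueError("the two versions must contain the same number of frames")
    return [{"data": lr, "label": hr} for lr, hr in zip(lr_frames, hr_frames)]


# Placeholder names stand in for decoded frames of one episode.
dataset = build_dataset(["lr_f1", "lr_f2"], ["hr_f1", "hr_f2"])
print(dataset[0])  # {'data': 'lr_f1', 'label': 'hr_f1'}
```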
3102. The server performs training based on the target video of the first resolution and the target video of the second resolution to obtain a super-resolution model;
The server performs overfitting training on the super-resolution model using the target video of the first resolution and the target video of the second resolution, and obtains an overfitted super-resolution model. The magnification of the super-resolution model is the pixel ratio between the first resolution and the second resolution in the width direction or in the length direction. Optionally, a single video corresponds to one super-resolution model. In other possible implementations, a collection of multiple videos of the same type corresponds to one super-resolution model. Optionally, the super-resolution models corresponding to the individual videos in a collection of multiple videos of the same type have the same model structure and different model parameters.
As shown in FIG. 3c, the server builds a super-resolution neural network model and performs model training (3007) of the super-resolution model based on the data dataset (3005) and the label dataset (3006) generated in step 3101. Optionally, a corresponding super-resolution model is trained for each episode. This step yields the super-resolution model (3008) for the single episode. Optionally, the server inputs the dataset into a convolutional neural network model for training and obtains the overfitted super-resolution model.
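The idea of fitting a model to one video only can be illustrated with a deliberately tiny example. The sketch below is not the convolutional network of this application: it assumes a four-weight linear 2× upscaler for one-dimensional rows, trained by plain gradient descent on a single HR/LR pair, and all names (`downsample`, `predict`, `mse`, `train`) are hypothetical.

```python
def downsample(hr_row):
    """Make a low-resolution row by averaging each pair of neighbours."""
    return [(hr_row[i] + hr_row[i + 1]) / 2 for i in range(0, len(hr_row), 2)]


def predict(weights, lr_row):
    """Upscale a row 2x: each LR pixel and its right neighbour yield two HR pixels."""
    w0, w1, w2, w3 = weights
    out = []
    for i, x in enumerate(lr_row):
        nxt = lr_row[min(i + 1, len(lr_row) - 1)]  # clamp at the right edge
        out.append(w0 * x + w1 * nxt)
        out.append(w2 * x + w3 * nxt)
    return out


def mse(weights, pairs):
    """Mean squared error between predictions and HR labels."""
    total, count = 0.0, 0
    for lr_row, hr_row in pairs:
        for p, t in zip(predict(weights, lr_row), hr_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def train(pairs, steps=200, rate=0.1):
    """Fit the four weights to the given pairs only (no held-out data)."""
    weights = [1.0, 0.0, 1.0, 0.0]  # start from nearest-neighbour upscaling
    for _ in range(steps):
        grads, count = [0.0] * 4, 0
        for lr_row, hr_row in pairs:
            pred = predict(weights, lr_row)
            for i, x in enumerate(lr_row):
                nxt = lr_row[min(i + 1, len(lr_row) - 1)]
                err_even = pred[2 * i] - hr_row[2 * i]
                err_odd = pred[2 * i + 1] - hr_row[2 * i + 1]
                grads[0] += 2 * err_even * x
                grads[1] += 2 * err_even * nxt
                grads[2] += 2 * err_odd * x
                grads[3] += 2 * err_odd * nxt
                count += 2
        weights = [w - rate * g / count for w, g in zip(weights, grads)]
    return weights


hr = [0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1.0]  # label (HR row), values in [0, 1]
pairs = [(downsample(hr), hr)]                 # data (LR row) paired with its label
fitted = train(pairs)
print(mse([1.0, 0.0, 1.0, 0.0], pairs), mse(fitted, pairs))
```

Because the weights are tuned against the same rows they will later restore, the error on that content drops below the generic starting point, which is the property the method relies on.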
3103. The server sends the super-resolution model and a target video of a third resolution to the terminal;
The server sends the super-resolution model obtained in step 3102 and the target video of the third resolution to the terminal. It should be noted that the third resolution may be the same as or different from the first resolution and the second resolution, and its specific value is not limited. For example, the third resolution may be 360P, 720P, or 1080P, and it can be chosen flexibly according to the network state. Understandably, the target video of the third resolution contains all video frames of the original video and does not need data cleaning.
Specifically, as shown in FIG. 3c, the encoding module (3010) encodes the super-resolution model (3008) and the target video of the third resolution (3009) into data packets and sends them to the network (3011) for transmission.
Optionally, the super-resolution model is transmitted as a binary file. Optionally, the model structure of the super-resolution model is deployed to the device side in advance, so that when a video is to be transmitted the server only needs to send the terminal the weight parameters of the super-resolution model corresponding to that video and does not need to send the model structure again. Optionally, to ensure the consistency of the model data received on the device side, the tail of the binary file contains a hash value generated from the model.
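Appending a hash to the tail of a weights-only binary file might look like the following sketch. The text does not name a hash function or a weight encoding, so SHA-256 and little-endian 32-bit floats are assumptions here, as is the function name `pack_model`.

```python
import hashlib
import struct


def pack_model(weights):
    """Serialise weights as little-endian float32 and append a digest at the tail.

    SHA-256 and float32 are illustrative choices, not taken from the source.
    """
    body = struct.pack(f"<{len(weights)}f", *weights)
    return body + hashlib.sha256(body).digest()


blob = pack_model([0.5, -1.0, 0.25])
print(len(blob))  # 44 bytes: 3 weights * 4 bytes + 32-byte digest
```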
The super-resolution model sent to the terminal is used to super-resolve the target video of the third resolution, for example, to restore a 1080P video to a 4K video.
In the data transmission method provided by the embodiments of this application, sending the terminal the target video of the third resolution together with the super-resolution model occupies less network bandwidth than directly sending a video of the fourth resolution. In addition, because the super-resolution model has been trained on the target video, it restores the target video more effectively, and the video obtained by super-resolution restoration has higher image quality.
Referring to FIG. 3b, the steps of the video transmission method provided by the embodiments of this application that are performed on the terminal side are described below.
3201. The terminal receives the super-resolution model and the target video of the third resolution sent by the server;
The terminal receives the super-resolution model and the target video of the third resolution sent by the cloud-side server. Specifically, as shown in FIG. 3c, the terminal receives the data packets delivered by the server over the network and decodes them with the decoding module (3012) to obtain the super-resolution model (3014) and the target video of the third resolution (3013).
The super-resolution model is obtained by the server through overfitting training on the target video of the first resolution and the target video of the second resolution. The magnification of the super-resolution model is the pixel ratio between the first resolution and the second resolution in the width direction or in the length direction. The super-resolution model is used to enlarge the resolution of the target video of the third resolution by that magnification. For example, if the first resolution is 4K and the second resolution is 1080P, the magnification is 2, that is, the number of pixels of a 1080P image is doubled in both the length direction and the width direction.
Optionally, the decoding module (3012) hashes the super-resolution model (3014) using the same hash method as the cloud server to obtain a hash value, and uses that value for a consistency check. If the two hash values are the same, the model is considered reliable. If they differ, the terminal sends an error message to the cloud and asks the server to resend the correct data packet.
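The consistency check on the terminal can be sketched as recomputing the hash over the received model bytes and comparing it with the value at the tail. SHA-256 is an assumed hash function (the text only requires that both sides use the same method), and `verify_model` is a hypothetical name.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def verify_model(blob):
    """Return True if the digest at the tail matches the model bytes before it."""
    body, tail = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    return hashlib.sha256(body).digest() == tail


received = b"weights" + hashlib.sha256(b"weights").digest()
print(verify_model(received))                   # True
print(verify_model(b"weightz" + received[7:]))  # False
```

A `False` result corresponds to the case in which the terminal reports an error and asks the server to resend the data packet.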
3202. The terminal performs super-resolution processing on the target video of the third resolution according to the super-resolution model to obtain a target video of a fourth resolution;
As shown in FIG. 3c, the terminal uses the overfitted super-resolution model (3014) to perform video super-resolution processing (3015) on the target video of the third resolution (3013), generates the target video of the fourth resolution (3016), and sends it to the terminal's display device. The target video of the fourth resolution is obtained by enlarging the target video of the third resolution by the magnification of the super-resolution model. For example, if the third resolution is 1080P and the magnification of the super-resolution model is 2, the target video obtained by super-resolution processing has a resolution of 4K.
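The frame-by-frame enlargement can be sketched with a stand-in for the model. In the sketch below, nearest-neighbour pixel repetition takes the place of the trained super-resolution model purely to show how the output size follows from the magnification; it is not the model of this application, and the names `upscale_nearest` and `super_resolve` are hypothetical.

```python
def upscale_nearest(frame, scale):
    """Stand-in for the super-resolution model: repeat each pixel `scale` times."""
    out = []
    for row in frame:
        wide = [px for px in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def super_resolve(video, model, scale):
    """Apply `model` to every frame of `video` (a list of frames)."""
    return [model(frame, scale) for frame in video]


video = [[[1, 2],
          [3, 4]]]  # one 2x2 frame
result = super_resolve(video, upscale_nearest, 2)
print(result[0])  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

A 2×2 frame becomes 4×4 at magnification 2, in the same way that a 1920×1080 frame becomes 3840×2160.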
Receiving the target video of the third resolution occupies little network bandwidth at the terminal. In addition, because the terminal enlarges the resolution using the overfitted super-resolution model corresponding to the target video, the resulting target video of the fourth resolution has high image quality.
Receiving the low-resolution video and the super-resolution model occupies little network bandwidth at the terminal. In addition, because the super-resolution model is obtained by overfitting training on the target video, the images obtained by super-resolution restoration of the target video have high image quality.
The video transmission method provided by the embodiments of this application is described below from the perspective of device-cloud collaboration, with reference to FIG. 4a.
4101. The server obtains a dataset;
Before training the super-resolution model, the server must first obtain a dataset. Optionally, the target video is any one episode of a video series. The video frames of the second-resolution target video for that episode, and the corresponding video frames of the first-resolution target video, are extracted as the dataset for the super-resolution model of that episode. The video frames of the low-resolution video serve as the data of the dataset for this episode, and the corresponding video frames of the high-resolution video serve as the labels of the dataset for this episode.
Optionally, as shown in FIG. 4b, in this step the cloud server invokes a third-party video codec service (4002) to decode stored video resources (4001), for example from Huawei Video, iQIYI, or Youku, into video frames. In this embodiment, only the case of a single video as the target video is described. For example, for the TV series 《隐秘的XX》, the 4K resource of the first episode serves as the target video of the first resolution (4003) and the 1080P resource of the first episode serves as the target video of the second resolution (4004). The first-resolution video frames and the second-resolution video frames of the same episode correspond to each other frame by frame.
Decoding with the video decoding service (4002) yields the second-resolution video frames (4004) of the target video and the corresponding first-resolution video frames (4003). The decoded high-definition and low-definition frames correspond to each other one to one. For example, frame f_1 of the second-resolution video frames (4004) corresponds to frame f_1 of the first-resolution video frames (4003).
Because the quality of the dataset is very important to the result of super-resolution model training, optionally, before the super-resolution model (4005) is trained in this step, the first-resolution video frames (4003) and the second-resolution video frames (4004) may be cleaned by removing blurred frames, filtering out low-frequency information, and so on. The specific data cleaning process is shown in FIG. 5: the target video of the first resolution (5001) is decoded into first-resolution video frames, and the target video of the second resolution (5002) is decoded into second-resolution video frames. A resolution detection algorithm classifies each decoded frame in the dataset as a high-definition frame, a blurred frame, or a low-information frame. The dataset that remains after the blurred frames and low-information frames are removed is the cleaned dataset, in which the first-resolution video frames of the target video serve as the labels for super-resolution model training and the corresponding second-resolution video frames serve as the data for the model of that episode.
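The cleaning step can be sketched as a filter over frame pairs. The text does not specify the resolution detection algorithm, so the sharpness measure below (mean absolute difference between horizontally adjacent pixels) and the threshold `min_sharpness` are assumptions chosen only for illustration. Under this measure, both blurred frames and flat, low-information frames score low and are dropped together.

```python
def sharpness(frame):
    """Mean absolute difference between horizontally adjacent pixels.

    `frame` is a list of rows of grey values in [0, 1]. This is an assumed
    stand-in for the unspecified detection algorithm.
    """
    diffs = [abs(row[j + 1] - row[j]) for row in frame for j in range(len(row) - 1)]
    if not diffs:
        return 0.0
    return sum(diffs) / len(diffs)


def clean(pairs, min_sharpness=0.05):
    """Keep only (lr_frame, hr_frame) pairs whose HR frame is sharp enough.

    Dropping a pair removes both the data frame and its label, so the
    remaining frames stay aligned.
    """
    return [(lr, hr) for lr, hr in pairs if sharpness(hr) >= min_sharpness]


sharp = [[0.0, 1.0], [1.0, 0.0]]
flat = [[0.5, 0.5], [0.5, 0.5]]
kept = clean([("lr_f1", sharp), ("lr_f2", flat)])
print(len(kept))  # 1
```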
4102、服务器训练超分模型;4102. The server trains the super-scoring model;
如图6所示,利用步骤4101生成的该集视频的第一分辨率的视频帧(6001)以及对应的第二分辨率视频的视频帧(6002)作为超分模型的数据集,每个第一分辨率的视频帧都有与之对应的第二分辨率的视频帧。利用该数据集对于超分神经网络模型进行过拟合的训练,得到与该集视频绑定的过拟合的超分模型(6003)。As shown in FIG. 6 , the video frames (6001) of the first resolution of the set of videos and the corresponding video frames (6002) of the second resolution videos generated in step 4101 are used as the data sets of the super-score model. A video frame of one resolution has a corresponding video frame of a second resolution. Using the data set to perform overfitting training on the super-resolution neural network model, an over-fitting super-resolution model bound to the set of videos is obtained (6003).
4103. The server encodes the data packet;
As shown in FIG. 4b, the encoding module (4006) encodes the super-resolution model (4005) of the episode generated in step 4102 together with the video frames of the second resolution (4004) to obtain a data packet.
Optionally, the encoding module processes the super-resolution model as shown in FIG. 7. It generates a binary model file, model (6002-2), that contains only the weight parameters and not the model structure. It hashes the binary model file and appends the resulting hash value (6002-3) to the end of the file. The header (6002-1) at the front of the model 6002 carries some parameters of the module. These three parts (6002-1, 6002-2, 6002-3) together make up the model (6002) in FIG. 7. As shown in FIG. 8, before the first data packet is transmitted, the extension flag bit X in the packet header is set to 1 so that the packet can carry custom extension data. As shown in FIG. 9, when the first data packet is sent, the extension flag bit of the header (6001) is set and the model data (6002) is placed between the header (6001) and the payload (6003); the payload (6003) contains the video data. Because the model file is already included in the first data packet sent from the cloud to the terminal side, the server sets the extension flag bit of the packet header to 0 in the second and subsequent data packets (as shown in FIG. 9). Those packets carry no extension data after the header, and no model file sits between the header (6001) and the payload (6003), which effectively ensures efficient and secure data transmission.
4104. The server sends the low-resolution video and the super-resolution model to the terminal;
As shown in FIG. 4b, the server sends the data packet containing the super-resolution model (4005) and the video frames of the second resolution (4004) to the terminal over the network, and the terminal receives the data packet. It can be understood that, depending on real-time network conditions or user requirements, the server may select a target video of a different resolution to send to the terminal and is not limited to the low-resolution target video (4004) used to train the super-resolution model. This embodiment is described using the example of sending the video of the second resolution.
4105. The terminal decodes the data packet;
The terminal receives the data packet delivered by the cloud over the network, and the video frame and super-resolution model decoding module (4009) decodes the data packet (4010) to obtain the super-resolution model (4011) and the target video of the second resolution (4012). To guard against the model data being tampered with or lost in the network, decoding the model requires a consistency check on the model data. As shown in FIG. 10, the decoding module obtains the binary model data block (7001) from the data packet and decodes it (7004) to obtain the binary model (7002) and a hash value (7003). It then hashes the binary model (7002) with the same hash method used by the cloud (7005) to obtain a hash value (7006) and compares the hash value (7003) with the hash value (7006). If they are identical, the model is considered reliable; if they differ, the terminal sends an error message to the cloud and requests that the data packet be resent.
4106. The terminal performs super-resolution on the low-resolution video and sends it for display;
Referring to FIG. 4b, the terminal uses the terminal-side inference engine (4013) to perform video super-resolution processing on the target video of the second resolution (4012) with the super-resolution model (4011), obtains the target video of the first resolution (4014), and sends it to the terminal-side display module (4015) for display. As shown in FIG. 11, the super-resolution model can reconstruct lower-resolution video frames into higher-resolution video frames.
The video transmission method provided in the embodiments of this application can transmit with less bandwidth (for example, half the bandwidth) while keeping the video quality on the terminal side unchanged. On the one hand, the method lowers the bandwidth cost of a video platform and makes the platform more competitive. For example, one episode of video at 4K resolution is about 2 GB (gigabytes), whereas the same episode at 1080p resolution is about 450 MB (megabytes) and the super-resolution model is about 10 MB, so the method reduces the transmission bandwidth. On the other hand, because the super-resolution model sent by the server to the terminal is an overfitted model trained on the target video, it performs better on the target video than a general-purpose super-resolution model, and the resulting video has higher picture quality.
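The arithmetic behind the example figures can be checked with a short sketch. The sizes come from the paragraph above; treating 1 GB as 1024 MB and the function name `bandwidth_saving` are assumptions made for illustration.

```python
def bandwidth_saving(full_mb: float, low_mb: float, model_mb: float) -> float:
    """Fraction of data saved by sending the low-resolution video plus the model."""
    sent_mb = low_mb + model_mb
    return 1 - sent_mb / full_mb


# 4K episode: about 2 GB; 1080p episode: about 450 MB; model: about 10 MB.
saving = bandwidth_saving(full_mb=2 * 1024, low_mb=450, model_mb=10)
```

With these figures, 460 MB is sent instead of 2048 MB, a saving of about 77.5%, so the model adds little overhead relative to the video itself.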
The video transmission method provided in this application is described above. The following describes a server that implements the method. Refer to FIG. 12, which is a schematic diagram of an embodiment of the server in the embodiments of this application.
One or more of the modules in FIG. 12 may be implemented in software, hardware, firmware, or a combination thereof. The software or firmware includes, but is not limited to, computer program instructions or code, and can be executed by a hardware processor. The hardware includes, but is not limited to, various types of integrated circuits, such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC).
The server includes:
an obtaining module 1201, configured to obtain a super-resolution model of a target video, where the super-resolution model is obtained by training on the target video of a first resolution and the target video of a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel counts of the first resolution and the second resolution in the length direction or in the width direction; and
a sending module 1202, configured to send the target video of a third resolution and the super-resolution model to a terminal, where the super-resolution model is used to perform super-resolution reconstruction on the target video of the third resolution at the magnification to obtain the target video of a fourth resolution.
Optionally, the target video includes a single video or a set of multiple videos of the same type.
Optionally, the third resolution is equal to the second resolution, and the fourth resolution is equal to the first resolution.
Optionally, the obtaining module 1201 is specifically configured to input the target video of the first resolution and the target video of the second resolution into a convolutional neural network model to obtain the overfitted super-resolution model.
Optionally, the obtaining module 1201 is specifically configured to: perform data cleaning on the video frames in the video of the first resolution and the video frames in the video of the second resolution to obtain the target video of the first resolution and the target video of the second resolution; and train on the target video of the first resolution and the target video of the second resolution to obtain the super-resolution model.
Optionally, the sending module 1202 is specifically configured to: send the structure of the super-resolution model to the terminal; and send a data packet to the terminal, where the data packet includes the weight parameters of the super-resolution model and the target video of the third resolution.
In the server provided in this embodiment of this application, the obtaining module obtains the super-resolution model of the target video and the target video of the third resolution, and the sending module sends them to the terminal, which enables low-bandwidth video transmission. In addition, super-resolution restoration based on this super-resolution model can achieve better video quality than restoration based on a general-purpose super-resolution model as in the prior art.
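The relationship between the four resolutions and the magnification can be sketched as follows. The sketch assumes an integer magnification that is the same in both directions, which the source does not require; the function names are hypothetical, and resolutions are given as (width, height) in pixels.

```python
def magnification(first: tuple[int, int], second: tuple[int, int]) -> int:
    """Ratio of pixel counts of the first to the second resolution along one axis."""
    (w1, h1), (w2, h2) = first, second
    if w1 % w2 or h1 % h2 or w1 // w2 != h1 // h2:
        # Assumption of this sketch: integer factor, equal in both directions.
        raise ValueError("resolutions do not differ by a single integer factor")
    return w1 // w2


def reconstructed_resolution(third: tuple[int, int], factor: int) -> tuple[int, int]:
    """Fourth resolution obtained by upscaling the third resolution by the factor."""
    return (third[0] * factor, third[1] * factor)


# Model trained on 4K (first) and 1080p (second) frames of the target video.
factor = magnification((3840, 2160), (1920, 1080))
# The server may send a different (third) resolution, for example 720p.
fourth = reconstructed_resolution((1280, 720), factor)
```

When the third resolution equals the second, the fourth resolution equals the first, which is the optional case described above.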
The following describes a terminal that implements the video transmission method. Refer to FIG. 13, which is a schematic diagram of an embodiment of the terminal in the embodiments of this application.
One or more of the modules in FIG. 13 may be implemented in software, hardware, firmware, or a combination thereof. The software or firmware includes, but is not limited to, computer program instructions or code, and can be executed by a hardware processor. The hardware includes, but is not limited to, various types of integrated circuits, such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC).
The terminal includes: a receiving module 1301, configured to receive, from a server, a super-resolution model of a target video and the target video of a third resolution, where the super-resolution model is obtained by training on the target video of a first resolution and the target video of a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel counts of the first resolution and the second resolution in the length direction or in the width direction; and
a processing module 1302, configured to perform, according to the super-resolution model, super-resolution reconstruction on the target video of the third resolution at the magnification to obtain the target video of a fourth resolution.
Optionally, the target video includes a single video or a set of multiple videos of the same type.
Optionally, the third resolution is equal to the second resolution, and the fourth resolution is equal to the first resolution.
Optionally, the super-resolution model includes an overfitted super-resolution model obtained by inputting the target video of the first resolution and the target video of the second resolution into a convolutional neural network model.
Optionally, the super-resolution model includes a super-resolution model obtained by training on the target video of the first resolution and the target video of the second resolution, where the target video of the first resolution and the target video of the second resolution are obtained by performing data cleaning on the video frames in the video of the first resolution and the video frames in the video of the second resolution.
Optionally, the receiving module 1301 is specifically configured to: receive the structure of the super-resolution model sent by the server; and receive a data packet sent by the server, where the data packet includes the weight parameters of the super-resolution model and the target video of the third resolution.
In the terminal provided in this embodiment of this application, the receiving module receives the super-resolution model of the target video and the target video of the third resolution sent by the server, so the bandwidth needed for video transmission is low. In addition, super-resolution restoration based on this super-resolution model can achieve better video quality than restoration based on a general-purpose super-resolution model as in the prior art.
Refer to FIG. 14, which is a schematic diagram of another embodiment of the server in the embodiments of this application.
The server in this embodiment of this application may be a physical machine or a virtual machine running on abstracted hardware resources. In a practical application scenario, it may be a server that provides various cloud services. The specific device form is not limited in this embodiment of this application.
The server 1400 provided in this embodiment may vary considerably with configuration or performance. It may include one or more processors 1401 and a memory 1402, and the memory 1402 stores programs or data.
The memory 1402 may be volatile storage or non-volatile storage. Optionally, the processor 1401 is one or more central processing units (CPUs), and each CPU may be a single-core CPU or a multi-core CPU. The processor 1401 can communicate with the memory 1402 and execute, on the server 1400, a series of instructions stored in the memory 1402.
The server 1400 further includes one or more wired or wireless network interfaces 1403, for example an Ethernet interface.
Optionally, although not shown in FIG. 14, the server 1400 may further include one or more power supplies and one or more input/output interfaces. The input/output interfaces may be used to connect a display, a mouse, a keyboard, a touchscreen device, a sensing device, or the like. The input/output interfaces are optional components that may or may not be present, which is not limited here.
For the procedure executed by the processor 1401 in the server 1400 in this embodiment, refer to the method procedure described in the foregoing method embodiments. Details are not repeated here.
Refer to FIG. 15, which is a schematic diagram of another embodiment of the terminal in the embodiments of this application.
The terminal 1500 provided in this embodiment may be any type of terminal with a display function, such as a mobile phone, a tablet computer, a desktop computer, a smart screen, or a wearable device. The specific device form is not limited in this embodiment of this application.
The terminal 1500 may vary considerably with configuration or performance. It may include one or more processors 1501 and a memory 1502, and the memory 1502 stores programs or data.
The memory 1502 may be volatile storage or non-volatile storage. Optionally, the processor 1501 is one or more central processing units (CPUs), and each CPU may be a single-core CPU or a multi-core CPU. The processor 1501 can communicate with the memory 1502 and execute, on the terminal 1500, a series of instructions stored in the memory 1502.
The terminal 1500 further includes one or more wired or wireless network interfaces 1503, for example an Ethernet interface.
Optionally, although not shown in FIG. 15, the terminal 1500 may further include one or more power supplies and one or more input/output interfaces. The input/output interfaces may be used to connect a display, a mouse, a keyboard, a touchscreen device, a sensing device, or the like. The input/output interfaces are optional components that may or may not be present, which is not limited here.
For the procedure executed by the processor 1501 in the terminal 1500 in this embodiment, refer to the method procedure described in the foregoing method embodiments. Details are not repeated here.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, the foregoing embodiments are intended only to illustrate the technical solutions of this application, not to limit them.

Claims (29)

  1. A video transmission method, comprising:
    obtaining, by a server, a super-resolution model of a target video, wherein the super-resolution model is obtained by training on the target video of a first resolution and the target video of a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel counts of the first resolution and the second resolution in the length direction or in the width direction; and
    sending, by the server to a terminal, the target video of a third resolution and the super-resolution model, wherein the super-resolution model is used to perform super-resolution reconstruction on the target video of the third resolution at the magnification to obtain the target video of a fourth resolution.
  2. The method according to claim 1, wherein the target video is a single video or a set of multiple videos of the same type.
  3. The method according to claim 1 or 2, wherein
    the third resolution is equal to the second resolution; and
    the fourth resolution is equal to the first resolution.
  4. The method according to any one of claims 1 to 3, wherein the obtaining, by the server, of the super-resolution model of the target video comprises:
    inputting, by the server, the target video of the first resolution and the target video of the second resolution into a convolutional neural network model to obtain the overfitted super-resolution model.
  5. The method according to any one of claims 1 to 3, wherein the obtaining, by the server, of the super-resolution model of the target video comprises:
    performing, by the server, data cleaning on video frames in a video of the first resolution and video frames in a video of the second resolution to obtain the target video of the first resolution and the target video of the second resolution; and
    training, by the server, on the target video of the first resolution and the target video of the second resolution to obtain the super-resolution model.
  6. The method according to any one of claims 1 to 5, wherein the sending, by the server to the terminal, of the target video of the third resolution and the super-resolution model specifically comprises:
    sending, by the server, the structure of the super-resolution model to the terminal; and
    sending, by the server, a data packet to the terminal, wherein the data packet comprises weight parameters of the super-resolution model and the target video of the third resolution.
  7. A video transmission method, comprising:
    receiving, by a terminal from a server, a super-resolution model of a target video and the target video of a third resolution, wherein the super-resolution model is obtained by training on the target video of a first resolution and the target video of a second resolution, the second resolution is lower than the first resolution, and the magnification of the super-resolution model is the ratio of the pixel counts of the first resolution and the second resolution in the length direction or in the width direction; and
    performing, by the terminal according to the super-resolution model, super-resolution reconstruction on the target video of the third resolution at the magnification to obtain the target video of a fourth resolution.
  8. The method according to claim 7, wherein the target video is a single video or a set of multiple videos of the same type.
  9. The method according to claim 7 or 8, wherein
    the third resolution is equal to the second resolution; and
    the fourth resolution is equal to the first resolution.
  10. The method according to any one of claims 7 to 9, wherein the super-resolution model comprises:
    an overfitted super-resolution model obtained by inputting the target video of the first resolution and the target video of the second resolution into a convolutional neural network model.
  11. The method according to any one of claims 7 to 10, wherein the super-resolution model comprises:
    a super-resolution model obtained by training on the target video of the first resolution and the target video of the second resolution, wherein the target video of the first resolution and the target video of the second resolution are obtained by performing data cleaning on video frames in a video of the first resolution and video frames in a video of the second resolution.
  12. The method according to any one of claims 7 to 11, wherein the receiving, by the terminal from the server, of the super-resolution model of the target video and the target video of the third resolution comprises:
    receiving, by the terminal, the structure of the super-resolution model sent by the server; and
    receiving, by the terminal, a data packet sent by the server, wherein the data packet comprises weight parameters of the super-resolution model and the target video of the third resolution.
  13. 一种服务器,其特征在于,包括:A server, characterized in that it includes:
    获取模块,用于获取目标视频的超分模型,所述超分模型根据第一分辨率的所述目标视频和第二分辨率的所述目标视频进行训练获取,所述第二分辨率小于所述第一分辨率,所述超分模型的放大倍率为第一分辨率与第二分辨率在长度方向上的像素之比或在宽度方向上的像素之比;The acquisition module is used to acquire the super-resolution model of the target video, and the super-resolution model is obtained by training according to the target video of the first resolution and the target video of the second resolution, and the second resolution is smaller than the target video. the first resolution, the magnification of the super-resolution model is the ratio of the pixels of the first resolution and the second resolution in the length direction or the ratio of the pixels in the width direction;
    发送模块,用于向终端发送第三分辨率的所述目标视频和所述超分模型,所述超分模型用于以所述放大倍率对所述第三分辨率的所述目标视频进行超分辨率重建以获取第四分辨率的所述目标视频。A sending module is configured to send the target video of the third resolution and the super-resolution model to the terminal, and the super-resolution model is used to perform super-resolution on the target video of the third resolution at the magnification ratio. Resolution reconstruction to obtain the target video at a fourth resolution.
  14. 根据权利要求13所述的服务器,其特征在于,所述目标视频为单个视频或多个同类型视频的集合。The server according to claim 13, wherein the target video is a single video or a collection of multiple videos of the same type.
  15. 根据权利要求13或14所述的服务器,其特征在于,The server according to claim 13 or 14, characterized in that:
    所述第三分辨率等于所述第二分辨率;the third resolution is equal to the second resolution;
    所述第四分辨率等于所述第一分辨率。The fourth resolution is equal to the first resolution.
  16. 根据权利要求13至15中任一项所述的服务器,其特征在于,所述获取模块具体用于:The server according to any one of claims 13 to 15, wherein the obtaining module is specifically configured to:
    将所述第一分辨率的目标视频和第二分辨率的所述目标视频输入卷积神经网络模型,获取过拟合的所述超分模型。Inputting the target video of the first resolution and the target video of the second resolution into a convolutional neural network model to obtain the over-fitted super-score model.
  17. 根据权利要求13至16中任一项所述的服务器,其特征在于,所述获取模块具体用于:The server according to any one of claims 13 to 16, wherein the obtaining module is specifically configured to:
    对所述第一分辨率的视频中的视频帧和所述第二分辨率的视频中的视频帧进行数据清洗,获取所述第一分辨率的所述目标视频和所述第二分辨率的所述目标视频;Data cleaning is performed on the video frames in the video of the first resolution and the video frames in the video of the second resolution, and the target video of the first resolution and the target video of the second resolution are obtained. the target video;
    对所述第一分辨率的所述目标视频和所述第二分辨率的所述目标视频进行训练,获取所述超分模型。The target video of the first resolution and the target video of the second resolution are trained to obtain the super-score model.
  18. The server according to any one of claims 13 to 17, wherein the sending module is specifically configured to:
    send a structure of the super-resolution model to the terminal;
    send a data packet to the terminal, wherein the data packet includes weight parameters of the super-resolution model and the target video of the third resolution.
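The server-side steps in claims 13 and 18 can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the function names, the JSON schema for the model structure, and the packet layout (a 4-byte length prefix, a JSON header carrying the weights, then the video bytes) are all assumptions made for the example. The claims define the magnification as the pixel ratio along the length or the width direction; requiring the two ratios to agree is an extra assumption of this sketch.

```python
import json
import struct
from fractions import Fraction


def magnification(first_res, second_res):
    """Return the upscaling factor between two (width, height) resolutions.

    The claims define the factor as the pixel ratio along the width or the
    length direction; requiring both ratios to agree is an assumption here.
    """
    (w1, h1), (w2, h2) = first_res, second_res
    if w2 >= w1 or h2 >= h1:
        raise ValueError("second resolution must be smaller than the first")
    ratio_w, ratio_h = Fraction(w1, w2), Fraction(h1, h2)
    if ratio_w != ratio_h:
        raise ValueError("width and height ratios differ")
    return ratio_w


def build_structure_message(scale, layers):
    """Describe the network layout only (hypothetical JSON schema, no weights)."""
    return json.dumps({"scale": float(scale), "layers": layers}).encode("utf-8")


def build_data_packet(weights, video_bytes):
    """Bundle the weights with the low-resolution video.

    Hypothetical layout: 4-byte big-endian header length, a JSON header
    holding the weights, then the encoded video bytes.
    """
    header = json.dumps({"weights": weights}).encode("utf-8")
    return struct.pack(">I", len(header)) + header + video_bytes


# Made-up example values: 1080p source, 540p transmitted video, toy weights.
scale = magnification((1920, 1080), (960, 540))
structure = build_structure_message(scale, [{"type": "conv", "kernel": 3}])
packet = build_data_packet({"conv0": [0.25, 0.5, 0.25]}, b"\x00\x01\x02\x03")
print(scale, len(structure), len(packet))
```

Keeping the structure message separate from the data packet mirrors the two sending steps of claim 18; the sketch does not model how or when each message is actually transmitted.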
  19. A terminal, comprising:
    a receiving module, configured to receive, from a server, a super-resolution model of a target video and the target video of a third resolution, wherein the super-resolution model is obtained by training on the target video of a first resolution and the target video of a second resolution, the second resolution is lower than the first resolution, and a magnification of the super-resolution model is the ratio of the pixels of the first resolution to the pixels of the second resolution in the length direction or in the width direction;
    a processing module, configured to perform, according to the super-resolution model, super-resolution reconstruction on the target video of the third resolution at the magnification, to obtain the target video of a fourth resolution.
  20. The terminal according to claim 19, wherein the target video is a single video or a set of multiple videos of the same type.
  21. The terminal according to claim 19 or 20, wherein:
    the third resolution is equal to the second resolution;
    the fourth resolution is equal to the first resolution.
  22. The terminal according to any one of claims 19 to 21, wherein the super-resolution model comprises:
    an over-fitted super-resolution model obtained by inputting the target video of the first resolution and the target video of the second resolution into a convolutional neural network model.
  23. The terminal according to any one of claims 19 to 22, wherein the super-resolution model comprises:
    a super-resolution model obtained by training on the target video of the first resolution and the target video of the second resolution, wherein the target video of the first resolution and the target video of the second resolution are obtained by performing data cleaning on video frames in a video of the first resolution and video frames in a video of the second resolution.
  24. The terminal according to any one of claims 19 to 23, wherein the receiving module is specifically configured to:
    receive a structure of the super-resolution model sent by the server;
    receive a data packet sent by the server, wherein the data packet includes weight parameters of the super-resolution model and the target video of the third resolution.
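The terminal-side flow of claims 19, 21 and 24 can be sketched in the same spirit. Everything below is illustrative: the packet layout (4-byte length prefix, JSON header with the weights, then raw bytes), the single 8-bit grayscale frame standing in for decoded video, and the `gain`/`bias` weights are hypothetical. In particular, nearest-neighbour upscaling is used only as a placeholder for the trained convolutional network, and an integer magnification is assumed.

```python
import json
import struct


def parse_packet(packet):
    """Split [4-byte big-endian header length][JSON header][video bytes]."""
    (header_len,) = struct.unpack(">I", packet[:4])
    header = json.loads(packet[4:4 + header_len].decode("utf-8"))
    return header["weights"], packet[4 + header_len:]


def to_rows(raw, width, height):
    """Treat raw bytes as one 8-bit grayscale frame (stand-in for decoding)."""
    if len(raw) != width * height:
        raise ValueError("payload size does not match the third resolution")
    return [list(raw[y * width:(y + 1) * width]) for y in range(height)]


def reconstruct(frame, scale, weights):
    """Placeholder for the trained model: nearest-neighbour upscaling
    followed by a gain/bias adjustment taken from the received weights."""
    gain, bias = weights["gain"], weights["bias"]

    def clamp(value):
        return max(0, min(255, round(value)))

    return [[clamp(gain * px + bias) for px in row for _ in range(scale)]
            for row in frame for _ in range(scale)]


def terminal_receive(structure, packet, third_res):
    """Unpack the weights and video, then upscale by the model's magnification."""
    weights, raw = parse_packet(packet)
    width, height = third_res
    return reconstruct(to_rows(raw, width, height), structure["scale"], weights)


# Made-up example: a 2x2 frame upscaled by a factor of 2.
header = json.dumps({"weights": {"gain": 1.0, "bias": 0.0}}).encode("utf-8")
packet = struct.pack(">I", len(header)) + header + bytes([10, 20, 30, 40])
output = terminal_receive({"scale": 2}, packet, (2, 2))
print(len(output[0]), len(output))
```

With the third resolution equal to the second resolution, as in claim 21, multiplying each dimension by the magnification yields the first resolution, which is why the output frame here is 4 by 4.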
  25. A server, comprising one or more processors and a memory, wherein:
    the memory stores computer-readable instructions;
    the one or more processors read the computer-readable instructions to cause the terminal to implement the method according to any one of claims 1 to 6.
  26. A terminal, comprising one or more processors and a memory, wherein:
    the memory stores computer-readable instructions;
    the one or more processors read the computer-readable instructions to cause the terminal to implement the method according to any one of claims 7 to 12.
  27. A video transmission system, comprising:
    the server according to any one of claims 1 to 6, and the terminal according to any one of claims 7 to 12.
  28. A computer program product, comprising computer-readable instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
  29. A computer-readable storage medium, comprising computer-readable instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
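The over-fitted super-resolution model recited in claims 16 and 22 can be illustrated with a toy example. One reading of "over-fitted" is that the model is trained on the very content it will later reconstruct, so generalization to other videos is not a goal. The sketch below follows that reading under heavy simplification: a 1-D signal stands in for video frames, a 3-tap linear filter per output phase stands in for the convolutional neural network, and the averaging downsampler, learning rate and step count are arbitrary choices for the example.

```python
def downsample(high):
    """Make the second-resolution signal by averaging neighbouring pairs."""
    return [(high[2 * i] + high[2 * i + 1]) / 2 for i in range(len(high) // 2)]


def taps_at(low, i):
    """Three-sample window around position i, clamped at the edges."""
    return (low[max(i - 1, 0)], low[i], low[min(i + 1, len(low) - 1)])


def predict(low, w):
    """Upscale 2x: each low-res sample yields two outputs, one per phase."""
    out = []
    for i in range(len(low)):
        taps = taps_at(low, i)
        for phase in range(2):
            out.append(sum(wk * t for wk, t in zip(w[phase], taps)))
    return out


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train(high, steps=500, lr=0.05):
    """Fit the filter to one signal only, i.e. deliberately over-fit it."""
    low = downsample(high)
    w = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # start as nearest-neighbour
    for _ in range(steps):
        grad = [[0.0] * 3 for _ in range(2)]
        for i in range(len(low)):
            taps = taps_at(low, i)
            for phase in range(2):
                pred = sum(wk * t for wk, t in zip(w[phase], taps))
                err = pred - high[2 * i + phase]
                for k in range(3):
                    grad[phase][k] += 2 * err * taps[k] / len(high)
        for phase in range(2):
            for k in range(3):
                w[phase][k] -= lr * grad[phase][k]
    return low, w


high = [0.0, 0.1, 0.4, 0.9, 1.0, 0.7, 0.3, 0.2]  # made-up "first resolution" signal
low, w = train(high)
before = mse(predict(low, [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]), high)
after = mse(predict(low, w), high)
print(before, after)
```

Because the filter is fitted and evaluated on the same signal, the error after training is lower than the nearest-neighbour starting point for this content; nothing is claimed about how it would behave on other signals.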
PCT/CN2021/133497 2020-11-30 2021-11-26 Video transmission method, server, terminal, and video transmission system WO2022111631A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011373386.2 2020-11-30
CN202011373386.2A CN114584805A (en) 2020-11-30 2020-11-30 Video transmission method, server, terminal and video transmission system

Publications (1)

Publication Number Publication Date
WO2022111631A1 (en) 2022-06-02

Family

ID=81753781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/133497 WO2022111631A1 (en) 2020-11-30 2021-11-26 Video transmission method, server, terminal, and video transmission system

Country Status (2)

Country Link
CN (1) CN114584805A (en)
WO (1) WO2022111631A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483587A (en) * 2023-06-21 2023-07-25 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation
CN116962799A (en) * 2023-07-24 2023-10-27 北京国际云转播科技有限公司 Live video data transmission method and device, anchor client and server

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416134A (en) * 2023-04-04 2023-07-11 阿里巴巴(中国)有限公司 Image super processing method, system, device, storage medium, and program product
CN116781912B (en) * 2023-08-17 2023-11-14 瀚博半导体(上海)有限公司 Video transmission method, device, computer equipment and computer readable storage medium
CN116886960A (en) * 2023-09-01 2023-10-13 深圳金三立视频科技股份有限公司 Video transmission method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167284A (en) * 2011-12-19 2013-06-19 中国电信股份有限公司 Video streaming transmission method and system based on picture super-resolution
CN105744357A (en) * 2016-02-29 2016-07-06 哈尔滨超凡视觉科技有限公司 Method for reducing network video bandwidth occupation based on online resolution improvement
US20170024855A1 (en) * 2015-07-26 2017-01-26 Macau University Of Science And Technology Single Image Super-Resolution Method Using Transform-Invariant Directional Total Variation with S1/2+L1/2-norm
CN106791927A (en) * 2016-12-23 2017-05-31 福建帝视信息科技有限公司 A kind of video source modeling and transmission method based on deep learning
CN110636289A (en) * 2019-09-27 2019-12-31 北京金山云网络技术有限公司 Image data transmission method, system, device, electronic equipment and storage medium
CN111757087A (en) * 2020-06-30 2020-10-09 北京金山云网络技术有限公司 VR video processing method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109525859B (en) * 2018-10-10 2021-01-15 腾讯科技(深圳)有限公司 Model training method, image sending method, image processing method and related device equipment
CN111147893B (en) * 2018-11-02 2021-10-22 华为技术有限公司 Video self-adaption method, related equipment and storage medium
US20200162789A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Method And Apparatus Of Collaborative Video Processing Through Learned Resolution Scaling
CN111510739B (en) * 2019-01-31 2022-04-29 华为技术有限公司 Video transmission method and device
CN110532871B (en) * 2019-07-24 2022-05-10 华为技术有限公司 Image processing method and device
CN111667410B (en) * 2020-06-10 2021-09-14 腾讯科技(深圳)有限公司 Image resolution improving method and device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483587A (en) * 2023-06-21 2023-07-25 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation
CN116483587B (en) * 2023-06-21 2023-09-08 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation
CN116962799A (en) * 2023-07-24 2023-10-27 北京国际云转播科技有限公司 Live video data transmission method and device, anchor client and server

Also Published As

Publication number Publication date
CN114584805A (en) 2022-06-03

Similar Documents

Publication Publication Date Title
WO2022111631A1 (en) Video transmission method, server, terminal, and video transmission system
US11234006B2 (en) Training end-to-end video processes
US20200162789A1 (en) Method And Apparatus Of Collaborative Video Processing Through Learned Resolution Scaling
US10887613B2 (en) Visual processing using sub-pixel convolutions
TWI624804B (en) A method and system for providing high resolution image through super-resolution reconstrucion
WO2019001108A1 (en) Video processing method and apparatus
CN110072119B (en) Content-aware video self-adaptive transmission method based on deep learning network
EA032859B1 (en) Tiered signal decoding and signal reconstruction
CN103167284A (en) Video streaming transmission method and system based on picture super-resolution
CN111586412B (en) High-definition video processing method, master device, slave device and chip system
WO2023061116A1 (en) Training method and apparatus for image processing network, computer device, and storage medium
WO2023000179A1 (en) Video super-resolution network, and video super-resolution, encoding and decoding processing method and device
CN114979672A (en) Video encoding method, decoding method, electronic device, and storage medium
CN111726623B (en) Method for improving reconstruction quality of spatial scalable coding video in packet loss network
WO2021057686A1 (en) Video decoding method and apparatus, video encoding method and apparatus, storage medium and electronic device
CN115665427A (en) Live broadcast data processing method and device and electronic equipment
CN114222127A (en) Video coding method, video decoding method and device
CN113727073A (en) Method and system for realizing vehicle-mounted video monitoring based on cloud computing
CN113747242A (en) Image processing method, image processing device, electronic equipment and storage medium
CN116523758B (en) End cloud combined super-resolution video reconstruction method and system based on key frames
CN110582022A (en) Video encoding and decoding method and device and storage medium
WO2022127565A1 (en) Video processing method and apparatus, and device
Barman et al. On the performance of video super-resolution algorithms for HTTP-based adaptive streaming applications
CN110636295B (en) Video encoding and decoding method and device, storage medium and electronic device
CN117956178A (en) Video encoding method and device, and video decoding method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21897131; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21897131; Country of ref document: EP; Kind code of ref document: A1)