WO2020232613A1 - Video processing method, system, mobile terminal, server and storage medium - Google Patents

Video processing method, system, mobile terminal, server and storage medium

Info

Publication number
WO2020232613A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
network
encoded image
layer
encoded
Prior art date
Application number
PCT/CN2019/087662
Other languages
English (en)
French (fr)
Inventor
欧勇盛
刘国栋
江国来
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Priority to PCT/CN2019/087662
Publication of WO2020232613A1


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Definitions

  • This application relates to the field of image processing, in particular to a video processing method, system, mobile terminal, server and storage medium.
  • Digital image compression coding is a very important technology, which is of great significance to the transmission and storage of digital images.
  • Traditional image coding algorithms operate on pixel values. Whether transform coding, predictive coding or another algorithm is used, compression is performed on the basis of pixel values. Although compression ratios keep improving, pixel-value coding finds it difficult to compress an image or video to a very small size. Moreover, security cannot be ignored: traditional image coding algorithms must rely on additional security mechanisms to protect the encoded image during transmission.
  • The main problem to be solved by this application is to provide a video processing method, system, mobile terminal, server and storage medium that can decode floating-point data into images, realize secure image transmission, and enrich the decoded images.
  • To solve the above technical problem, one technical solution adopted in this application is to provide a video processing method applied to a client. The method includes: receiving a first encoded image frame sent by a server; determining whether an image enrichment instruction is received; and, if the image enrichment instruction is received, adding random noise to the first encoded image frame to generate a second encoded image frame, where the first encoded image frame is floating-point data and the difference between the first encoded image frame and the second encoded image frame is within a preset range.
  • Another technical solution adopted in this application is to provide a video processing method applied to a server. The method includes: receiving an input image; and using a neural network-based coding network to process the input image to obtain a first encoded image frame, where the first encoded image frame is floating-point data, the coding network includes at least an input layer, each input layer includes at least two sub-input layers, and each sub-input layer receives the data of at least one channel of the input image.
  • Another technical solution adopted in this application is to provide a mobile terminal that includes a memory and a processor connected to each other, where the memory is used to store a computer program which, when executed by the processor, implements the above video processing method.
  • Another technical solution adopted in this application is to provide a server including a memory and a processor connected to each other, where the memory is used to store a computer program which, when executed by the processor, implements the above video processing method.
  • Another technical solution adopted in this application is to provide a video processing system that includes a server and a mobile terminal connected to each other.
  • The server is used to encode input images to obtain encoded image frames.
  • The mobile terminal is used to decode the encoded image frames to obtain decoded image frames, where the mobile terminal is the above-mentioned mobile terminal and the server is the above-mentioned server.
  • Another technical solution adopted in this application is to provide a computer storage medium used to store a computer program which, when executed by a processor, implements the above video processing method.
  • Through the above solutions, the beneficial effects of the present application are: the client receives the first encoded image frame sent by the server, where the first encoded image frame is floating-point data; the client determines whether an image enrichment instruction is received and, if so, adds random noise to the first encoded image frame to generate a second encoded image frame whose difference from the first encoded image frame is within a preset range. Floating-point data can thus be decoded into an image and, because the floating-point data is encoded based on semantics, it cannot be decoded if intercepted by a third party, which realizes secure image transmission. The decoded images can also be enriched, so that each time the user watches the video the same frame shows a slightly different picture, improving the freshness of viewing.
  • FIG. 1 is a schematic flowchart of a first embodiment of a video processing method provided by this application
  • Fig. 2 is a schematic flowchart of a second embodiment of a video processing method provided by the present application
  • FIG. 3 is a schematic flowchart of a third embodiment of a video processing method provided by this application.
  • FIG. 4 is a schematic flowchart of a fourth embodiment of a video processing method provided by the present application.
  • FIG. 5 is a schematic diagram of the structure of the codec network provided by this application.
  • FIG. 6 is a schematic diagram of a flow of generating a first encoded image frame in the encoding network corresponding to FIG. 5;
  • FIG. 7 is a schematic diagram of the flow of generating decoded image frames in the decoding network corresponding to FIG. 5;
  • FIG. 8 is a schematic diagram of another structure of the codec network provided by this application.
  • FIG. 9 is a schematic diagram of a flow of generating a first encoded image frame in the encoding network corresponding to FIG. 8;
  • FIG. 10 is a schematic diagram of the flow of generating decoded image frames in the decoding network corresponding to FIG. 8;
  • FIG. 11 is a schematic structural diagram of an embodiment of a mobile terminal provided by the present application.
  • FIG. 12 is a schematic structural diagram of an embodiment of a server provided by this application.
  • FIG. 13 is a schematic structural diagram of an embodiment of a video processing system provided by the present application.
  • FIG. 14 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • Fig. 1 is a schematic flowchart of a first embodiment of a video processing method provided by the present application.
  • the video processing method is applied to a client, and the method includes:
  • Step 11 Receive the first encoded image frame sent by the server.
  • the first encoded image frame is floating-point data.
  • the floating-point data is obtained after the server performs encoding processing on the input image.
  • The encoding is semantic (image-content) based: the semantics of the input image are extracted and encoded to obtain the first encoded image frame. Since the first encoded image frame is not produced by a pixel-value-based coding algorithm, a third party that intercepts it cannot decode it without the corresponding decoding network, which ensures the security of image transmission.
  • Step 12 Determine whether an image enrichment instruction is received.
  • After receiving the first encoded image frame, the client can determine whether it has received an image enrichment instruction entered by the user or set by default.
  • The image enrichment instruction instructs the client to process the first encoded image frame so that, compared with the input image, the decoded image contains some extra details or some details in the image are changed.
  • Step 13 If the image enrichment instruction is received, random noise is added to the first encoded image frame to generate a second encoded image frame.
  • The random noise is also floating-point data, with the same data length as the first encoded image frame. The client offers two modes: no noise and random noise. The user can choose one of the two modes, or random noise is added by default.
  • In the random-noise mode, the difference between the first encoded image frame and the second encoded image frame is kept within the preset range so that, when the two frames are decoded separately, the difference between the two resulting images stays within an allowable range: the content of the two images is roughly the same and only some details may differ, which avoids a large content difference between the decoded image and the original image. In this way, each time the user opens the same movie or TV series, the scenes look slightly different, which adds freshness to viewing.
  • For example, the input image includes grass and a child. After random noise is superimposed on the first encoded image frame and the result is decoded, the decoded image still includes the grass and the child, but the child has an extra hairpin on their head.
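  • As an illustration of this step, the following minimal Python sketch adds small random noise to a floating-point code vector and checks that the perturbation stays within a preset range. The function name, the noise scale and the range value are assumptions for illustration only; the patent does not specify them.

```python
import numpy as np

def enrich_code(first_encoded_frame: np.ndarray,
                noise_scale: float = 0.05,
                preset_range: float = 0.1,
                rng: np.random.Generator = None) -> np.ndarray:
    """Add random noise of the same length as the code; keep the difference bounded."""
    rng = rng or np.random.default_rng()
    noise = rng.uniform(-noise_scale, noise_scale, size=first_encoded_frame.shape)
    second_encoded_frame = first_encoded_frame + noise
    # The difference between the two encoded frames stays within the preset range.
    assert np.max(np.abs(second_encoded_frame - first_encoded_frame)) <= preset_range
    return second_encoded_frame

# Example: a 64-value floating-point code, as mentioned later in the description.
code = np.random.default_rng(0).standard_normal(64).astype(np.float32)
enriched = enrich_code(code)
```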
  • Different from the prior art, this embodiment provides a video processing method in which the client receives the first encoded image frame sent by the server and, after receiving the image enrichment instruction, processes the first encoded image frame to change some of the image's detail features. Floating-point data can be decoded into an image and, because the floating-point data is encoded based on semantics, it cannot be decoded even if intercepted by a third party, which realizes secure image transmission. The decoded image can also be enriched, so that every time a user watches the video they see a slightly different picture for the same frame, improving the freshness of viewing.
  • FIG. 2 is a schematic flowchart of a second embodiment of a video processing method provided by the present application.
  • the video processing method is applied to a client, and the method includes:
  • Step 201 Send a download request message to the server according to a preset time interval or a preset number of frames.
  • The client can send a download request message to the server to request that certain encoded image frames in the video be delivered to the client; requests can be spaced by a preset number of frames or a preset time interval. Specifically, the client needs to request from the server the encoded image frame corresponding to the first frame of the video, so that at least one subsequent image frame can be generated from it and the video can be played smoothly.
  • Step 202 Receive the first encoded image frame sent by the server.
  • Step 203 Determine whether an image enrichment instruction is received.
  • steps 202-203 are similar to steps 12-13 in the foregoing embodiment, and will not be repeated here.
  • Step 204 Use the scene change detection network to determine whether a scene change occurs.
  • The scene change detection network is a convolutional neural network used to detect whether a scene change occurs. It can use three-dimensional or two-dimensional convolution and is trained on a training set composed of various manually labeled images; its output layer is a single neuron that directly corresponds to whether a scene change has occurred.
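  • The patent does not give this network's dimensions; the PyTorch sketch below (layer sizes and the two-frame input are assumptions) only shows the general shape of such a detector: a small 2D CNN whose output layer is a single neuron indicating whether a scene change occurred.

```python
import torch
import torch.nn as nn

class SceneChangeDetector(nn.Module):
    """2D-convolutional scene change detector; the single output neuron gives P(scene change)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=3, padding=1),  # two stacked RGB frames as 6 channels
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, frame_pair: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(frame_pair))

detector = SceneChangeDetector()
pair = torch.randn(1, 6, 360, 640)   # previous frame + current frame, stacked along channels
change_prob = detector(pair)         # value in (0, 1)
```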
  • Step 205 If a scene change occurs, generate new random noise, and add the new random noise to the first encoded image frame to generate a second encoded image frame.
  • Step 206 If there is no scene change, continue to add current random noise to the first encoded image frame to generate a second encoded image frame.
  • After the client receives the first encoded image frame, if it is in the random-noise mode, scene change (transition) detection is performed. The first frame of each video (i.e., frame 0) is treated as a scene change. If the scene in the video has not changed, the random noise being added is kept unchanged and continues to be added to the first encoded image frame; if the scene has changed, new random noise is generated and superimposed on the first encoded image frame, so that the same random noise and therefore the same detail changes appear within one scene, while different detail changes appear after a scene change.
  • For example, the added random noise can change the characters' costumes, change details of the background scenery or environmental decorations, or change the color style, but it does not affect the main plot; when the user replays the same TV series or movie they can see different content, which keeps viewing fresh.
  • Step 207 Use a neural network-based decoding network to decode the second encoded image frame to obtain a decoded image frame.
  • After the second encoded image frame is obtained, a neural network-based decoding network is used to decode it in order to restore the floating-point data into image data.
  • Step 208 Use the image degradation removal network to process the decoded image frame to obtain the first image frame.
  • The input image may appear blurred after the encoding and decoding process, and the image degradation removal network can remove the blur and noise contained in the generated decoded image frame.
  • In a specific embodiment, the client can obtain multiple arbitrary images as original images, apply Gaussian blur or add noise to them to generate corresponding training images and build a training set, and then train an image deblurring network or an image super-resolution network on the training images, using a loss function to measure the loss between each original image and the output of the image degradation removal network and minimizing that loss until a satisfactory image degradation removal network model is obtained.
  • A test set can also be built to test whether the trained image degradation removal network model removes image degradation well.
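  • A minimal sketch of this training procedure follows. The restoration network architecture, the noise-based degradation and the MSE loss are assumptions, since the patent only states that degraded training images are generated and the loss against the originals is minimized.

```python
import torch
import torch.nn as nn

# 'restore_net' stands in for the image deblurring / super-resolution network;
# its architecture is not specified in the patent, so a placeholder CNN is used here.
restore_net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(32, 3, 3, padding=1))
optimizer = torch.optim.Adam(restore_net.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def degrade(original: torch.Tensor) -> torch.Tensor:
    """Build a training input by adding noise (Gaussian blur could be applied instead)."""
    return (original + 0.05 * torch.randn_like(original)).clamp(0, 1)

for step in range(100):                      # toy training loop
    original = torch.rand(8, 3, 64, 64)      # stand-in for patches taken from arbitrary images
    degraded = degrade(original)
    restored = restore_net(degraded)
    loss = loss_fn(restored, original)       # loss between original and network output
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```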
  • Step 209 Use a motion estimation network to estimate the first image frame to generate at least one second image frame.
  • The motion estimation network is a generative adversarial network (GAN).
  • The generative adversarial network includes a generator network and a discriminator network.
  • The generator network includes a two-dimensional convolutional layer and a three-dimensional deconvolutional layer: the two-dimensional convolutional layer extracts feature information from the first image frame, and the three-dimensional deconvolutional layer receives the feature information and generates at least one second image frame.
  • The discriminator network includes a three-dimensional convolutional layer and a fully connected layer, which are used to determine whether each generated second image frame is an image that meets the preset requirements.
  • An image that meets the preset requirements may be an image with relatively high similarity to the image frames that follow the first image frame in the video.
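  • The following PyTorch sketch illustrates the generator and discriminator shapes just described (2D convolution feeding a 3D deconvolution that expands one frame into α frames, and a 3D convolution plus fully connected layer for judging them). All layer sizes and α = 5 are assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """2D conv extracts features from the first image frame; 3D deconv expands them into alpha frames."""
    def __init__(self, alpha: int = 5):
        super().__init__()
        self.encode2d = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        # A kernel depth of alpha turns the single-frame feature volume into alpha frames.
        self.decode3d = nn.ConvTranspose3d(32, 3, kernel_size=(alpha, 3, 3), padding=(0, 1, 1))

    def forward(self, first_frame: torch.Tensor) -> torch.Tensor:  # (N, 3, H, W)
        feats = self.encode2d(first_frame).unsqueeze(2)            # (N, 32, 1, H, W)
        return torch.sigmoid(self.decode3d(feats))                 # (N, 3, alpha, H, W)

class Discriminator(nn.Module):
    """3D conv + fully connected layer: judges whether generated frames meet the preset requirement."""
    def __init__(self):
        super().__init__()
        self.conv3d = nn.Sequential(nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool3d(1))
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, frames: torch.Tensor) -> torch.Tensor:       # (N, 3, alpha, H, W)
        return self.fc(self.conv3d(frames))

gen, disc = Generator(), Discriminator()
predicted = gen(torch.randn(1, 3, 64, 64))       # five predicted frames
realism = disc(predicted)
```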
  • In a specific embodiment, the number of second image frames is defined as α. If the frame the client currently requests from the server is the i-th frame (i is a positive integer), then the next request can ask the server for the (i+α+1)-th frame; α can be 5, for example, and when α is 0 the client must request every frame of the video from the server. Using the motion estimation network therefore reduces the amount of transmitted information and further increases the security of information transmission.
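  • A small runnable illustration of the request schedule implied by this rule (using the example value α = 5 given above):

```python
def request_schedule(total_frames: int, alpha: int = 5):
    """Frame indices the client actually requests when it synthesizes alpha frames per request."""
    requested = []
    i = 0
    while i < total_frames:
        requested.append(i)      # frame i is fetched from the server
        i = i + alpha + 1        # frames i+1 .. i+alpha are generated by the motion estimation network
    return requested

print(request_schedule(20))      # [0, 6, 12, 18] -> only 4 of 20 frames are downloaded
```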
  • The operations of the server and the client are not performed at the same time.
  • The server encodes all frames of all video resources in advance and stores the encoding results together with the corresponding frame numbers.
  • When a client request arrives, the server sends the encoded image frames required by the client, and the client does not request every image frame: the client can use the motion estimation network to generate the next few images after the current frame, so it only needs to fetch one frame from the server every few frames.
  • Step 210 Send the first image frame and the second image frame to the video player for playback.
  • After the client uses the first image frame to generate at least one second image frame, the first image frame and the second image frame(s) can be sent to the video player in order to play the video.
  • Different from the prior art, this embodiment provides a video processing method in which the client receives the first encoded image frame sent by the server, decides whether to change the random noise added to the first encoded image frame by detecting whether the scene changes, generates the second encoded image frame, and uses the decoding network to decode the second encoded image frame to obtain the decoded image frame. The decoded image frame can be de-degraded to obtain the first image frame, and the motion estimation network then generates at least one second image frame from the first image frame, so the client does not need to request every frame of the video from the server. This reduces the number of data transmissions, further increases security, and allows the decoded image to be enriched and de-degraded to improve image quality.
  • FIG. 3 is a schematic flowchart of a third embodiment of a video processing method provided by the present application.
  • the video processing method is applied to a server, and the method includes:
  • Step 31 Receive the input image.
  • the input image can be a color image, and its color format can be RGB or YCrCb, where Y, Cr, and Cb are brightness, red difference, and blue difference, respectively.
  • Step 32 Use a neural network-based coding network to perform coding processing on the input image to obtain a first coded image frame.
  • the first encoded image frame is floating-point data, and the floating-point data has nothing to do with the pixel value.
  • The floating-point data can be regarded as a "style" of the image, while the real image content is learned as a distribution function into the parameters of the various layers of the network, so a higher compression rate can be achieved; specifically, a 1920*1080 image can be compressed into 64 floating-point values, which greatly improves the compression rate and reduces the bandwidth required for video transmission.
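  • A quick worked calculation of the compression figure quoted above (assuming an 8-bit-per-channel RGB frame and 32-bit floats, which the patent does not state explicitly):

```python
# 1920*1080 RGB frame versus a 64-value floating-point code.
raw_bytes = 1920 * 1080 * 3          # 6,220,800 bytes for the uncompressed frame
code_bytes = 64 * 4                  # 256 bytes for the 64-float code
print(raw_bytes / code_bytes)        # 24300.0 -> roughly a 24,000x reduction in size
```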
  • The neural network-based coding network includes at least an input layer; there can be multiple input layers so that multiple input images can be processed simultaneously when training the coding network model, and each input layer includes at least two sub-input layers.
  • Each sub-input layer receives the data of at least one channel of the input image; for example, for an input image in YCrCb format, one sub-input layer can receive the Y-channel data and the other sub-input layer can receive the Cr- and Cb-channel data.
  • Different from the prior art, this embodiment provides a video processing method in which the server receives the input image and encodes it with the coding network to obtain the first encoded image frame, so that a digital image can be encoded as floating-point data. Because the floating-point data is encoded based on semantics, it cannot be decoded even if intercepted by a third party, which realizes secure image transmission.
  • FIG. 4 is a schematic flowchart of a fourth embodiment of a video processing method provided by the present application.
  • the video processing method is applied to a server, and the method includes:
  • Step 41 Receive the input image.
  • Step 42 Use a neural network-based coding network to perform coding processing on the input image to obtain a first coded image frame.
  • The neural network-based coding network includes at least an input layer, at least one convolutional hidden layer, an encoding fully connected hidden layer and an encoding fully connected output layer, and each input layer includes at least two sub-input layers, which are used to receive the data of at least one channel of the input image.
  • the server encodes multiple video resources using an encoding network, and stores the encoding result and the corresponding frame number, so that when the client initiates a request, it can quickly find the encoding result corresponding to the frame number.
  • Step 43 Perform decoding processing on the first encoded image frame to obtain a decoded image frame.
  • After the server encodes the input image to obtain the first encoded image frame, the first encoded image frame may be decoded to obtain a decoded image frame.
  • Step 44 After receiving the video viewing request sent by the client, the neural network-based decoding network is sent to the client.
  • the neural network-based decoding network includes a decoding fully connected hidden layer, at least one deconvolution hidden layer, and an output layer.
  • The server can train on multiple first encoded image frames output by the encoding network to obtain a neural network-based decoding network and, when the client initiates a request, send this decoding network directly to the client. After the client sends a download request message asking the server for the first encoded image frame, the client can directly decode the first encoded image frame using the decoding network sent by the server to obtain the decoded image frame.
  • This approach, in which the server trains the decoding network, is suitable for special videos: if the networks for all special videos were trained on the client, they would occupy too many client resources, and users might rarely use a given decoding network, which would waste resources. The decoding network can therefore be trained on the server, and the client requests it only when needed; the server then sends the decoding network directly to the client, reducing the client's burden. For example, anime has a completely different distribution function from live-action drama, so the general codec network for anime and the general codec network for live-action drama cannot be the same one, and a general codec network should be trained separately for anime.
  • In a specific embodiment, the neural network-based codec network is shown in Figure 5.
  • This network is a variational autoencoder.
  • the YCrCb color space is used during training.
  • Both the coding network and the decoding network have two branches.
  • the input layer includes a first sub-input layer and a second sub-input layer.
  • the steps for the server to obtain the first encoded image frame can be specifically shown in Figure 6:
  • Step 61 Use the first sub-input layer to receive the data of the first channel in the input image.
  • The color format of the input image is luminance-red difference-blue difference (YCrCb): the first channel is the luminance channel Y, and the second channel comprises the red-difference and blue-difference channels Cr and Cb.
  • Step 62 Perform down-sampling processing on the data of the second channel in the input image, and input the down-sampled data into the second sub-input layer.
  • the image data of the red and blue difference channels CrCb in the input image is down-sampled by N times, where N is a positive integer.
  • Step 63 Use the convolutional hidden layers to perform convolution, activation, pooling, batch normalization or dropout regularization on the output data of the first sub-input layer and the second sub-input layer, respectively, to obtain first encoded image data and second encoded image data.
  • Each convolutional hidden layer can have five operations: convolution, activation, pooling, batch normalization, or dropout regularization, where the pooling and dropout regularization operations are optional.
  • The two branches differ in the number of convolutional hidden layers and in the number of convolution kernels per layer. Unlike a Siamese (twin) network, the two branches of the coding network do not share weights, and the branch carrying the luminance channel Y has more convolutional hidden layers.
  • The data output by the first and second sub-input layers are processed separately until the data produced on the two branches have the same resolution, at which point the per-branch operations stop; that is, the first encoded image data and the second encoded image data have the same resolution.
  • Step 64 Combine the first encoded image data and the second encoded image data to obtain third encoded image data.
  • For example, if the first encoded image data is 320×180×3 and the second encoded image data is 320×180×5, the third encoded image data obtained after merging is 320×180×8.
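  • The merge in Step 64 amounts to a channel-wise concatenation; the short snippet below reproduces the shapes of the example above (PyTorch uses channels-first layout, so 320×180×3 becomes (1, 3, 180, 320)):

```python
import torch

first_encoded = torch.randn(1, 3, 180, 320)
second_encoded = torch.randn(1, 5, 180, 320)
third_encoded = torch.cat([first_encoded, second_encoded], dim=1)
print(third_encoded.shape)           # torch.Size([1, 8, 180, 320]), i.e. 320x180x8
```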
  • Step 65 Use the convolutional hidden layer to perform convolution, activation, pooling, batch normalization or dropout regularization processing on the third encoded image data to obtain fourth encoded image data.
  • the convolutional hidden layer After merging the first coded image data and the second coded image data generated by the two branches, the convolutional hidden layer is used to perform various processing on the merged data, and finally the fourth coded image data is obtained.
  • Step 66 Perform flattening processing on the fourth encoded image data output by the convolution hidden layer to obtain fifth encoded image data.
  • the flattening process is used for dimensionality reduction, so that the dimension of the fifth encoded image data is smaller than the dimension of the fourth encoded image data.
  • Step 67 Use the encoding fully connected hidden layer to perform activation, batch normalization or dropout regularization processing on the fifth encoded image data to obtain sixth encoded image data.
  • Each encoding fully connected hidden layer can have three operations: activation, batch normalization, or dropout regularization, where the dropout regularization operation is optional.
  • Step 68 Use the coded fully connected output layer to process the sixth coded image data to obtain the first coded image frame.
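  • The layer counts are design choices (the description states below that they are not hard requirements); the PyTorch sketch that follows, with assumed sizes, only illustrates the overall two-branch structure of steps 61-68: a Y branch and a 4×-down-sampled CrCb branch that are merged, convolved further, flattened and passed through fully connected layers to produce the floating-point code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchEncoder(nn.Module):
    """Two-branch encoder sketch for YCrCb input (all sizes are assumptions)."""
    def __init__(self, code_dim: int = 64):
        super().__init__()
        # Y branch has more convolutional layers; CrCb branch has fewer (no shared weights).
        self.y_branch = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 3, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.crcb_branch = nn.Sequential(
            nn.Conv2d(2, 5, 3, padding=1), nn.ReLU())
        self.merged = nn.Sequential(
            nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
            nn.Flatten())
        self.fc_hidden = nn.LazyLinear(256)     # encoding fully connected hidden layer
        self.fc_out = nn.Linear(256, code_dim)  # encoding fully connected output layer

    def forward(self, y: torch.Tensor, crcb: torch.Tensor) -> torch.Tensor:
        crcb_small = F.avg_pool2d(crcb, kernel_size=4)       # step 62: 4x down-sampling
        a = self.y_branch(y)                                  # step 63, first branch  -> 320x180x3
        b = self.crcb_branch(crcb_small)                      # step 63, second branch -> 320x180x5
        merged = torch.cat([a, b], dim=1)                     # step 64: same resolution, concat channels
        x = self.merged(merged)                               # steps 65-66: convolve, then flatten
        return self.fc_out(torch.relu(self.fc_hidden(x)))     # steps 67-68: floating-point code

enc = TwoBranchEncoder()
y = torch.rand(1, 1, 720, 1280)        # luminance channel
crcb = torch.rand(1, 2, 720, 1280)     # chroma channels (down-sampled by 4 inside)
code = enc(y, crcb)                    # shape (1, 64)
```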
  • The number of neurons in the encoding fully connected output layer is less than the number of neurons in the encoding fully connected hidden layer, and the storage space it occupies is much smaller than the size of the input image. The encoding fully connected output layer also serves as the input layer of the neural network-based decoding network. The steps by which the server decodes the first encoded image frame to obtain the decoded image frame can be as shown in Figure 7:
  • Step 71 Receive the first coded image frame output by the coded fully connected output layer.
  • Step 72 Use the decoded fully connected hidden layer to process the first encoded image frame to obtain the first decoded image data.
  • Step 73 Set up deconvolution hidden layers in the two branches and, in each branch, use the deconvolution hidden layers to perform deconvolution, activation, unpooling, batch normalization or dropout regularization on the first decoded image data, obtaining two sets of second decoded image data.
  • Each deconvolution hidden layer can contain five operations: deconvolution, activation, unpooling, batch normalization, or dropout regularization, where the unpooling and dropout regularization operations are optional.
  • The two branches differ in the number of deconvolution hidden layers and in the number of deconvolution kernels per layer; they do not share weights, and the branch carrying the luminance channel Y has more deconvolution hidden layers.
  • Step 74 Use the output layer to process each second decoded image data to obtain the first decoded image frame and the second decoded image frame.
  • the output layer is a deconvolution output layer, and the number of deconvolution kernels corresponding to the luminance channel Y is 1, and the number of deconvolution kernels corresponding to the red and blue difference channels CrCb is 2.
  • Step 75 Perform up-sampling processing on the second decoded image frame to obtain a third decoded image frame.
  • Because the second sub-input layer receives down-sampled data, during synthesis the image output by the output layer corresponding to the red-difference and blue-difference channels CrCb is up-sampled, so that the data sizes of the luminance channel Y and the chroma channels CrCb match.
  • Step 76 Combine the first decoded image frame and the third decoded image frame to obtain a decoded image frame.
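  • Steps 75-76 amount to up-sampling the chroma output and recombining it with the luma output; a minimal sketch follows (the bilinear interpolation mode and the tensor sizes are assumptions):

```python
import torch
import torch.nn.functional as F

# Decoder outputs from the two branches: a full-resolution Y frame and a down-sampled CrCb frame.
y_decoded = torch.rand(1, 1, 720, 1280)       # first decoded image frame (luminance)
crcb_decoded = torch.rand(1, 2, 180, 320)     # second decoded image frame (chroma, 4x smaller)

# Step 75: up-sample the chroma output so its size matches the luminance output.
crcb_up = F.interpolate(crcb_decoded, size=(720, 1280), mode='bilinear', align_corners=False)

# Step 76: merge the two into the final YCrCb decoded image frame.
decoded_frame = torch.cat([y_decoded, crcb_up], dim=1)    # (1, 3, 720, 1280)
```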
  • Except for the output layers, the number of convolution kernels, the number of deconvolution kernels, the activation functions, the pooling and unpooling parameters, and the number of neurons in the hidden layers of the codec network are not hard requirements and can be designed as needed.
  • In a specific embodiment, a training set composed of various TV shows or movies is used to train the codec network.
  • When training in the YCrCb color space, the luminance-channel (Y) data of each frame is sent to the first branch of the coding network, while the red-difference channel Cr and the blue-difference channel Cb form a two-channel image that is down-sampled by a factor of 4 and sent to the second branch. The Y and CrCb data are used as the labels of the two branches of the decoding network respectively; the loss is computed for each branch and the two losses are added to give the final loss.
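  • A minimal sketch of that two-branch loss follows. The patent does not name the per-branch loss function, so mean squared error is an assumed choice here.

```python
import torch
import torch.nn.functional as F

def codec_loss(y_label, crcb_label, y_pred, crcb_pred):
    """Sum of the two branch losses: Y at full resolution, CrCb against a 4x-down-sampled label."""
    crcb_label_small = F.avg_pool2d(crcb_label, kernel_size=4)   # 4x down-sampled chroma label
    loss_y = F.mse_loss(y_pred, y_label)                         # loss of the luminance branch
    loss_crcb = F.mse_loss(crcb_pred, crcb_label_small)          # loss of the chroma branch
    return loss_y + loss_crcb                                    # final loss = sum of the two

y_label, crcb_label = torch.rand(1, 1, 720, 1280), torch.rand(1, 2, 720, 1280)
y_pred, crcb_pred = torch.rand(1, 1, 720, 1280), torch.rand(1, 2, 180, 320)
print(codec_loss(y_label, crcb_label, y_pred, crcb_pred))
```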
  • The codec network in this embodiment is a special-purpose high-quality codec network suitable for processing particular videos; specifically, it can be trained on a certain TV show or movie and is responsible for encoding and decoding that show or movie. For ordinary videos, training does not need to be done on the server side; instead, a general-purpose high-quality codec network is trained on the client side. The structure and training method of the general-purpose high-quality codec network are similar to those of the special-purpose one, the difference being that its number of hidden layers and number of convolution kernels are greater than or equal to those of the special-purpose network, so it can process most videos.
  • In another specific embodiment, the neural network-based coding network and the neural network-based decoding network are shown in Figure 8; together they constitute a neural network-based codec network.
  • This network is a variational autoencoder that uses the RGB color space during training; the encoding network has a single input and the decoding network has multiple outputs in order to support multi-resolution output.
  • the steps for the server to obtain the first encoded image frame may be specifically as shown in FIG. 9:
  • Step 91 Use the input layer to receive the input image.
  • the color format of the input image is red-green-blue.
  • Step 92 Use the convolutional hidden layers to perform convolution, activation, pooling, batch normalization or dropout regularization processing on the input image to obtain seventh encoded image data.
  • The number of convolutional hidden layers is at least 2.
  • Each convolutional hidden layer can have five operations: convolution, activation, pooling, batch normalization, or dropout regularization, where the pooling and dropout regularization operations are optional. That is, in a convolutional hidden layer, convolution kernels are applied to the output of the previous layer to extract the feature information of the input image, the convolved data is then pooled to down-sample it, and an activation function is applied to the pooled data to increase the nonlinearity of the coding network model.
  • Step 93 Perform flattening processing on the seventh encoded image data output by the convolutional hidden layer to obtain eighth encoded image data.
  • The flattening process is used for dimensionality reduction and expands the three-dimensional data into one dimension; that is, the dimension of the eighth encoded image data is smaller than that of the seventh encoded image data.
  • For example, if the input image is 1280×720×3, the number of convolution kernels is 5, the pooling operation performs 2× down-sampling and there are 2 convolutional hidden layers, then the data after the first convolution is 1280×720×5, after pooling 640×360×5, and after the activation function 640×360×5; after the second convolution the data is 640×360×10, after pooling 320×180×10, and after the activation function 320×180×10. After flattening, the output becomes one-dimensional data of length 320×180×10.
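  • The dimension walk-through above can be checked with a few lines of PyTorch (channels-first layout, so 1280×720×3 becomes (1, 3, 720, 1280); kernel size 3 with padding 1 is an assumption that keeps the spatial size unchanged):

```python
import torch
import torch.nn as nn

x = torch.rand(1, 3, 720, 1280)
conv1 = nn.Conv2d(3, 5, kernel_size=3, padding=1)     # 5 kernels
conv2 = nn.Conv2d(5, 10, kernel_size=3, padding=1)    # 10 kernels
pool = nn.MaxPool2d(2)                                # 2x down-sampling
relu = nn.ReLU()

x = relu(pool(conv1(x)))        # -> (1, 5, 360, 640)   i.e. 640x360x5
x = relu(pool(conv2(x)))        # -> (1, 10, 180, 320)  i.e. 320x180x10
x = torch.flatten(x, 1)         # -> (1, 576000) = 320*180*10 after flattening
print(x.shape)
```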
  • Step 94 Use the encoding fully connected hidden layer to perform activation, batch normalization or dropout regularization processing on the eighth encoded image data to obtain ninth encoded image data.
  • Each encoding fully connected hidden layer can have three operations: activation, batch normalization, or dropout regularization, where the dropout regularization operation is optional.
  • Step 95 Use the coded fully connected output layer to process the ninth coded image data to obtain the first coded image frame.
  • The number of neurons in the encoding fully connected output layer is less than the number of neurons in the encoding fully connected hidden layer, and the storage space it occupies is much smaller than the size of the input image. The encoding fully connected output layer also serves as the input layer of the neural network-based decoding network. The steps by which the server decodes the first encoded image frame to obtain the decoded image frames can be as shown in Figure 10:
  • Step 101 Receive the first coded image frame output by the coded fully connected output layer.
  • Step 102 Use the decoded fully connected hidden layer to process the first encoded image frame to obtain third decoded image data.
  • Step 103 Set up deconvolution hidden layers in at least two branches and, in each branch, use the deconvolution hidden layers to perform deconvolution, activation, unpooling, batch normalization or dropout regularization on the third decoded image data, obtaining at least two sets of fourth decoded image data.
  • Each deconvolution hidden layer can contain five operations: deconvolution, activation, unpooling, batch normalization, or dropout regularization, where the unpooling and dropout regularization operations are optional.
  • The three output branches in Figure 8 differ in the number of deconvolution hidden layers and in the number of deconvolution kernels per layer, and they do not share weights; the branch whose output layer produces the higher-resolution image has more deconvolution hidden layers. For example, the resolutions of the images produced by the output layers may be 1920*1080, 1280*720 and 640*360, respectively.
  • In a specific embodiment, a training set of various TV shows or movies can be used to train the codec network.
  • When training in the RGB color space, each frame is used as the input, and each frame is also linearly interpolated to the three resolutions 1920*1080, 1280*720 and 640*360, which are compared with the three output images of the decoding network to compute the loss.
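  • A minimal sketch of that multi-resolution loss follows; bilinear interpolation and mean squared error are assumed choices, since the patent only states that interpolated frames and the three decoder outputs are compared.

```python
import torch
import torch.nn.functional as F

def multires_loss(frame, out_1080, out_720, out_360):
    """Interpolate the frame to the three target resolutions and sum the per-branch losses."""
    targets = {
        (1080, 1920): out_1080,
        (720, 1280): out_720,
        (360, 640): out_360,
    }
    loss = 0.0
    for size, prediction in targets.items():
        label = F.interpolate(frame, size=size, mode='bilinear', align_corners=False)
        loss = loss + F.mse_loss(prediction, label)
    return loss

frame = torch.rand(1, 3, 1080, 1920)
print(multires_loss(frame,
                    torch.rand(1, 3, 1080, 1920),
                    torch.rand(1, 3, 720, 1280),
                    torch.rand(1, 3, 360, 640)))
```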
  • Step 104 Use the output layer to process each fourth decoded image data to obtain a corresponding decoded image frame.
  • The number of output layers is the same as the number of branches; the branches differ in the number of deconvolution hidden layers and the number of deconvolution kernels and do not share weights; the resolutions of any two decoded image frames are different, and the higher the resolution, the more deconvolution hidden layers its branch has.
  • Except for the output layers, the number of convolution kernels, the number of deconvolution kernels, the activation functions, the pooling and unpooling parameters, and the number of neurons in the hidden layers of the codec network are not hard requirements and can be designed as needed.
  • The codec network in this embodiment is a special-purpose multi-resolution codec network suitable for processing particular videos; specifically, it can be trained on a certain TV show or movie and is responsible for encoding and decoding that show or movie. For ordinary videos, training does not need to be done on the server side; instead, a general-purpose multi-resolution codec network is trained on the client side. Its structure and training method are similar to those of the special-purpose multi-resolution codec network, the difference being that its number of hidden layers and number of convolution kernels are greater than or equal to those of the special-purpose network, so it can process most videos.
  • For special-purpose codec networks (including the special-purpose multi-resolution codec network and the special-purpose high-quality codec network), the decoded images have higher definition, better quality and a shorter decoding time, but the user needs to explicitly download the special decoding network corresponding to a particular TV series or movie.
  • The special-purpose codec network can also add special effects to the video.
  • The special-effect function must be realized during training of the codec network: label images decorated with a certain special effect are used as the new label images during training, and a decoding network that generates that special effect can then be obtained. Fairly complex effects can be produced this way, for example converting live-action drama into animation or animation into live-action drama, turning ordinary footage into a blockbuster style, or adding a frozen effect or a computer-animation effect.
  • FIG. 11 is a schematic structural diagram of an embodiment of a mobile terminal provided by the present application.
  • the mobile terminal 110 includes a memory 111 and a processor 112 connected to each other.
  • The memory 111 is used to store a computer program which, when executed by the processor 112, implements the video processing method in the foregoing embodiments.
  • The mobile terminal 110 can train a general decoding network, an image degradation removal network, a motion estimation network, or a scene change detection network.
  • FIG. 12 is a schematic structural diagram of an embodiment of a server provided by the present application.
  • the server 120 includes a memory 121 and a processor 122 that are connected to each other.
  • The memory 121 is used to store a computer program which, when executed by the processor 122, implements the video processing method in the foregoing embodiments.
  • The server 120 can train the general encoding network, the special-purpose encoding network and the special-purpose decoding network.
  • The server 120 stores the special-purpose decoding network so that, when the mobile terminal requests a special video, the special-purpose decoding network can be sent to the mobile terminal, allowing the terminal to decode the special video and the user to watch it.
  • FIG. 13 is a schematic structural diagram of an embodiment of a video processing system provided by the present application.
  • the video processing system 130 includes a server 131 and a mobile terminal 132 connected to each other.
  • The server 131 is used to encode input images to obtain encoded image frames, and the mobile terminal 132 is used to decode the encoded image frames to obtain decoded image frames, where the server 131 is the server in the foregoing embodiments and the mobile terminal 132 is the mobile terminal in the foregoing embodiments.
  • The video processing system 130 is an image-content-based encoding and decoding system that can compress an image into a few floating-point values, greatly improving the compression rate and reducing the bandwidth required for video transmission; the encoded floating-point data is also highly secure and, even if intercepted, does not reveal the transmitted information.
  • FIG. 14 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • The computer storage medium 140 is used to store a computer program 141 which, when executed by a processor, implements the video processing method in the foregoing embodiments.
  • The storage medium 140 may be a server, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium capable of storing program code.
  • In the several embodiments provided in this application, it should be understood that the disclosed method and device may be implemented in other ways.
  • The device implementation described above is only illustrative; for example, the division into modules or units is only a logical functional division, and other divisions are possible in actual implementations: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of this embodiment.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application discloses a video processing method, system, mobile terminal, server and storage medium. The video processing method is applied to a client and includes: receiving a first encoded image frame sent by a server; determining whether an image enrichment instruction is received; and, if the image enrichment instruction is received, adding random noise to the first encoded image frame to generate a second encoded image frame, where the first encoded image frame is floating-point data and the difference between the first encoded image frame and the second encoded image frame is within a preset range. In this way, the application can decode floating-point data into images, realize secure image transmission, and enrich the decoded images.

Description

一种视频处理方法、系统、移动终端、服务器及存储介质 【技术领域】
本申请涉及图像处理领域,具体涉及一种视频处理方法、系统、移动终端、服务器及存储介质。
【背景技术】
数字图像压缩编码是一种非常重要的技术,对数字图像的传输和存储有着非常重要的意义。传统的图像编码算法是基于像素值的编码,无论是变换编码、预测编码还是其它编码算法均是在像素值的基础上进行压缩,虽然压缩程度逐渐升高,压缩效果越来越好,但基于像素值的编码很难把图像或视频的体积压缩到极小;而且对于传统图像编码算法来说,安全问题也不容忽视,传统图像编码算法需要开发各种保密机制,以保证图像编码后传输的安全性。
【发明内容】
本申请主要解决的问题是提供一种视频处理方法、系统、移动终端、服务器及存储介质,能够将浮点型数据解码为图像,实现图像的安全传输,并能够对解码出来的图像进行丰富。
为解决上述技术问题,本申请采用的技术方案是提供一种视频处理方法,该视频处理方法应用于客户端,该方法包括:接收服务器发送的第一编码图像帧;判断是否接收到图像丰富指令;若接收到图像丰富指令,则将随机噪声加入第一编码图像帧中,生成第二编码图像帧;其中,第一编码图像帧为浮点型数据,第一编码图像帧与第二编码图像帧之间的差值在预设范围以内。
为解决上述技术问题,本申请采用的另一技术方案是提供一种视频处理方法,该视频处理方法应用于服务器,该方法包括:接收输入图像;利用基于神经网络的编码网络对输入图像进行处理,得到第一编码图像帧;其中,第一编码图像帧为浮点型数据,基于神经网络的编码网络至少包括输入层,且每个输入层包括至少两个子输入层,子输入层用于接 收输入图像中至少一个通道的数据。
为解决上述技术问题,本申请采用的另一技术方案是提供一种移动终端,该移动终端包括互相连接的存储器和处理器,其中,存储器用于存储计算机程序,计算机程序在被处理器执行时,用于实现上述的视频处理方法。
为解决上述技术问题,本申请采用的另一技术方案是提供一种服务器,该服务器包括互相连接的存储器和处理器,其中,存储器用于存储计算机程序,计算机程序在被处理器执行时,用于实现上述的视频处理方法。
为解决上述技术问题,本申请采用的另一技术方案是提供一种服务器,该视频处理系统包括互相连接的服务器和移动终端,其中,服务器用于对输入图像进行编码处理,得到编码图像帧,移动终端用于对编码图像帧进行解码,得到解码图像帧,其中,移动终端为上述的移动终端,服务器为上述的服务器。
为解决上述技术问题,本申请采用的另一技术方案是提供一种服务器,该计算机存储介质用于存储计算机程序,计算机程序在被处理器执行时,用于实现上述的视频处理方法。
通过上述方案,本申请的有益效果是:客户端接收服务器发送的第一编码图像帧,该第一编码图像帧为浮点型数据;客户端判断是否接收到图像丰富指令,若接收到图像丰富指令,则将随机噪声加入第一编码图像帧中,生成与第二编码图像帧之间的差值在预设范围以内的第二编码图像帧,能够将浮点型数据解码为图像,且由于浮点型数据为基于语义进行编码得到,被第三方截获到也无法进行解码,实现图像的安全传输,并能够对解码出来的图像进行丰富,使得每次用户观看视频时,对于同一帧画面都会看到不同的画面,提高用户观看的新鲜感。
【附图说明】
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图 仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。其中:
图1是本申请提供的视频处理方法第一实施例的流程示意图;
图2是本申请提供的视频处理方法第二实施例的流程示意图;
图3是本申请提供的视频处理方法第三实施例的流程示意图;
图4是本申请提供的视频处理方法第四实施例的流程示意图;
图5是本申请提供的编解码网络的结构示意图;
图6是图5对应的编码网络中生成第一编码图像帧的流程示意图;
图7是图5对应的解码网络中生成解码图像帧的流程示意图;
图8是本申请提供的编解码网络的另一结构示意图;
图9是图8对应的编码网络中生成第一编码图像帧的流程示意图;
图10是图8对应的解码网络中生成解码图像帧的流程示意图;
图11是本申请提供的移动终端一实施例的结构示意图;
图12是本申请提供的服务器一实施例的结构示意图;
图13是本申请提供的视频处理系统一实施例的结构示意图;
图14是本申请提供的计算机存储介质一实施例的结构示意图。
【具体实施方式】
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性的劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
参阅图1,图1是本申请提供的视频处理方法第一实施例的流程示意图,该视频处理方法应用于客户端,该方法包括:
步骤11:接收服务器发送的第一编码图像帧。
该第一编码图像帧为浮点型数据,该浮点型数据为服务器对输入图像进行编码处理后得到,编码处理为基于语义(图像内容)的编码,提取输入图像中的语义,对其进行编码,得到第一编码图像帧,由于第一 编码图像帧不是利用基于像素值的编码算法得到,即使第一编码图像帧被第三方截获,在没有对应的解码网络的情况下,第三方无法对第一编码图像帧进行解码,从而保障图像传输的安全性。
步骤12:判断是否接收到图像丰富指令。
客户端可以在接收到第一编码图像帧之后,判断是否接收到用户输入的图像丰富指令或者默认设置的图像丰富指令,该图像丰富指令用于指示对第一编码图像帧进行处理,使得解码后的图像相比输入图像增加一些图像细节或者图像中部分细节改变。
步骤13:若接收到图像丰富指令,则将随机噪声加入第一编码图像帧中,生成第二编码图像帧。
该随机噪声也为浮点型数据,且数据长度和第一编码图像帧一致;客户端设置有不加入任何噪声和加入随机噪声两种模式,用户可以选择进入两种模式中的一种模式或者默认加入随机噪声。
在处于加入随机噪声模式时,第一编码图像帧与第二编码图像帧之间的差值在预设范围以内,以保证分别对第一编码图像帧和第二编码图像帧进行解码后,解码出来的两张图像的差别在可允许的范围内,两张图像的内容大致是相同的,仅在某些细节可能不同,避免解码出来的图像和原图像在内容上有很大差别;这样在用户观看同一部电影或电视剧时,每次打开看到的场景都会有些微不同,增加观看的新鲜感。
例如,输入图像包括草地和一个小孩子,在将随机噪声与第一编码图像帧叠加,再进行解码后,解码出来的图像包括草地和孩子,但是孩子的头上多了一个发卡。
区别于现有技术,本实施例提供了一种视频处理方法,客户端接收服务器发送的第一编码图像帧,并在接收到图像丰富指令后,对第一编码图像帧进行处理,改变图像的部分细节特征,能够将浮点型数据解码为图像,且由于浮点型数据为基于语义进行编码得到,被第三方截获到也无法进行解码,实现图像的安全传输,并能够对解码出来的图像进行丰富,使得每次用户观看视频时,对于同一帧画面都会看到不同的画面,提高用户观看的新鲜感。
参阅图2,图2是本申请提供的视频处理方法第二实施例的流程示意图,该视频处理方法应用于客户端,该方法包括:
步骤201:按照预设时间间隔或间隔预设帧数发送下载请求消息至服务器。
客户端可发送下载请求消息给客户端,以请求服务器将视频中某些编码图像帧下发至客户端,可间隔预设帧数或预设时间来向服务器请求;具体地,客户端需要向服务器请求下载视频中的第一帧对应的编码图像帧,以便根据第一帧对应的编码图像帧去生成接下来的至少一帧图像,顺利播放视频。
步骤202:接收服务器发送的第一编码图像帧。
步骤203:判断是否接收到图像丰富指令。
其中,步骤202-203与上述实施例中步骤12-13类似,在此不再赘述。
步骤204:利用场景转换检测网络判断是否发生场景改变。
该场景转换检测网络为卷积神经网络,其用于检测场景转换是否发生,可使用三维卷积或者二维卷积,利用手工标记的各种图像组成训练集进行训练,输出层为一个神经元直接对应是否产生了场景转换。
步骤205:若发生场景改变,则生成新的随机噪声,并将新的随机噪声加入第一编码图像帧中,生成第二编码图像帧。
步骤206:若未发生场景改变,则继续将当前随机噪声加入第一编码图像帧中,生成第二编码图像帧。
客户端在接收到第一编码图像帧后,若客户端处于加入随机噪声的模式,则执行转场检测,每个视频的首帧(即第0帧)属于发生转场的状态,若视频中图像场景未发生改变,则保持加入的随机噪声不变,继续向第一编码图像帧加入该随机噪声;若视频中图像场景发生改变,则可以生成新的随机噪声,并将新的随机噪声与第一编码图像帧进行叠加,使得在同一场景下可以使用相同的随机噪声处理,显现出相同的细节改变,而转场后有着不同的细节改变。
例如,添加的随机噪声可以使视频中人物的服饰改变、使背景风景、 环境装饰等细节改变或使颜色风格改变,但不影响主要剧情,用户重复播放同一部电视剧或电影时,可以观看到不同的内容,保持新鲜感。
步骤207:利用基于神经网络的解码网络对第二编码图像帧进行解码处理,得到解码图像帧。
在接收到第二编码图像帧之后,为了将浮点型数据恢复成图像数据,利用基于神经网络的解码网络来对第二编码图像帧进行解码。
步骤208:利用去除图像退化网络对解码图像帧进行处理,得到第一图像帧。
输入图像在经过编码和解码过程后可能会出现图像模糊,去除图像退化网络可对生成的解码图像帧中所包含的模糊和噪声进行去除。
在一具体的实施例中,客户端可获取多张任意图像作为原始图像;然后对原始图像进行高斯模糊处理或加噪处理,生成相应的训练图像,建立训练集;再利用图像模糊复原网络或图像超分辨率网络对训练集中的训练图像进行训练,利用损失函数来衡量原始图像和去除图像退化网络输出的图像之间的损失,最小化该损失直至训练出符合要求的去除图像退化网络模型。
进一步地,还可以建立测试集,以测试训练出来的去除图像退化网络模型是否去除图像退化的效果比较好。
步骤209:利用运动估计网络对第一图像帧进行估计,生成至少一张第二图像帧。
该运动估计网络为生成式对抗网络(GAN,Generative Adversarial Networks),生成式对抗网络包括生成网络和判别网络,生成网络包括二维卷积层和三维反卷积层,二维卷积层用于从第一图像帧中提取特征信息,三维反卷积层用于接收特征信息,生成至少一张第二图像帧,判别网络包括三维卷积层和全连接层,其用于判断生成的第二图像帧是否为符合预设要求的图像。
该符合预设要求的图像可以为与视频中位于第一图像帧之后的图像帧相似度比较高的图像,在一具体的实施例中,将第二图像帧的数量定义为α,如果当前客户端向服务器请求的帧数为第i(i为正整数)帧, 则下一次发送请求时,可向服务器请求第i+α+1帧,α的值可以为5,当α为0时,客户端需要向服务器请求视频中的每一帧;利用运动估计网络可减少传输信息的数量,进一步增加信息传输的安全性。
服务器端与客户端的操作不是同时进行的,服务器事先对所有视频资源中的所有帧进行编码,并将编码结果和对应的帧号存储起来,待客户端的请求到来时,根据客户端所需的图像帧发送编码后的图像帧给客户端,而且客户端并非对每个图像帧均请求,客户端可利用运动估计网络生成当前帧后的下几帧图像,因而客户端可以每隔几帧向服务器取一帧。
步骤210:将第一图像帧以及第二图像帧发送至视频播放器进行播放。
在客户端利用第一图像帧生成至少一张第二图像帧之后,可将第一图像帧以及与第二图像帧按顺序发送至视频播放器,以进行视频的播放。
区别于现有技术,本实施例提供了一种视频处理方法,客户端接收服务器发送的第一编码图像帧,通过检测场景是否改变来判断向第一编码图像帧中加入的随机噪声是否改变,生成第二编码图像帧,并利用解码网络对第二编码图像帧进行解码,得到解码图像帧,可对解码图像帧进行去退化处理,得到第一图像帧,再利用运动估计网络根据第一图像帧生成至少一张第二图像帧,以避免客户端需要向服务器请求视频中的每一帧,能够减少数据传输的次数,进一步地增加安全性,同时能够对解码出来的图像进行丰富和去退化处理,提高图像质量。
参阅图3,图3是本申请提供的视频处理方法第三实施例的流程示意图,该视频处理方法应用于服务器,该方法包括:
步骤31:接收输入图像。
该输入图像可以为彩色图像,其颜色格式可以为RGB或YCrCb,其中,Y、Cr和Cb分别为亮度、红色差和蓝色差。
步骤32:利用基于神经网络的编码网络对输入图像进行编码处理,得到第一编码图像帧。
第一编码图像帧为浮点型数据,且该浮点型数据与像素值无关,该浮点型数据可以看作图像的一种“样式”,而真正的图像内容被作为分布函数学习到了网络结构中的各层参数中,可实现较高压缩率;具体地,可将一幅1920*1080的图像压缩成64个浮点型数据,大大提升了压缩率,减少了传输视频所需的带宽。
基于神经网络的编码网络至少包括输入层,输入层的个数可以为多个,以方便在训练编码网络模型时,同时对多张输入图像进行处理,且每个输入层包括至少两个子输入层,子输入层用于接收输入图像中至少一个通道的数据;例如,对于YCrCb格式的输入图像,一个子输入层可接收输入图像中Y通道的数据,另一个子输入层可接收输入图像中Cr和Cb通道的数据。
区别于现有技术,本实施例提供了一种视频处理方法,服务器接收输入图像,并利用编码网络对输入图像进行编码处理,得到第一编码图像帧,能够数字图像编码为浮点型数据,且由于浮点型数据为基于语义进行编码得到,被第三方截获到也无法进行解码,实现图像的安全传输。
参阅图4,图4是本申请提供的视频处理方法第四实施例的流程示意图,该视频处理方法应用于服务器,该方法包括:
步骤41:接收输入图像。
步骤42:利用基于神经网络的编码网络对输入图像进行编码处理,得到第一编码图像帧。
基于神经网络的编码网络至少包括输入层、至少一个卷积隐藏层、编码全连接隐藏层以及编码全连接输出层,且每个输入层包括至少两个子输入层,子输入层用于接收输入图像中至少一个通道的数据。
在一具体的实施例中,服务器对多个视频资源利用编码网络进行编码,并将编码结果和对应的帧号存储起来,以便客户端发起请求时,快速找到与帧号对应的编码结果。
步骤43:对第一编码图像帧进行解码处理,得到解码图像帧。
在服务器对输入图像进行编码得到第一编码图像帧后,可对第一编码图像帧进行解码处理,从而得到解码图像帧。
步骤44:在接收到客户端发送的视频观看请求后,将基于神经网络的解码网络发送至客户端。
基于神经网络的解码网络包括解码全连接隐藏层、至少一个反卷积隐藏层以及输出层。服务器可对利用编码网络输出的多个第一编码图像帧进行训练,得到基于神经网络的解码网络,并在客户端发起请求时,将该基于神经网络的解码网络直接发送给客户端;在客户端向服务器发送下载请求消息,以向服务器请求下载第一编码图像帧之后,客户端可利用服务器发送的解码网络直接对第一编码图像帧进行解码,得到解码图像帧。
这种服务器来训练解码网络的方式适用于处理特殊视频,由于所有特殊视频均在客户端进行训练,将占用客户端过多的资源,而且用户还可能很少使用该解码网络,造成资源的浪费,因而可以在服务器中训练,仅在客户端需要的时候,才向服务器发起请求,服务器直接将该解码网络发送给客户端,减轻客户端的负担;例如,对于动漫来说,动漫有着与真人剧完全不同的分布函数,所以动漫的通用编解码网络与真人剧的通用编解码网络不能使用同一个,应针对动漫单独训练通用编解码网络。
在一具体的实施例中,基于神经网络的编解码网络如图5所示,该网络为变分自编码网络,在训练时使用YCrCb颜色空间,编码网络和解码网络均为具有两条支路的网络,输入层包括第一子输入层和第二子输入层,服务器得到第一编码图像帧的步骤具体可如图6所示:
步骤61:利用第一子输入层接收输入图像中第一通道的数据。
输入图像的颜色格式为亮度-红色差-蓝色差,第一通道为亮度通道Y,第二通道为红色差和蓝色差通道CrCb。
步骤62:对输入图像中第二通道的数据进行下采样处理,并将下采样后的数据输入第二子输入层。
对输入图像中红色差和蓝色差通道CrCb的图像数据进行N倍下采样,N为正整数。
步骤63:分别利用卷积隐藏层对第一子输入层和第二子输入层输出 的数据进行卷积、激活、池化、批标准化或丢弃正则化处理,得到第一编码图像数据和第二编码图像数据。
每个卷积隐藏层都可有卷积、激活、池化、批标准化或者丢弃正则化五种操作,且池化和丢弃正则化操作是可选项。两个支路的卷积隐藏层的数量以及卷积隐藏层中卷积核的数量不一致,与孪生网络不同的是该编码网络的两个支路不共享权重,且亮度通道Y所在的支路对应的卷积隐藏层的数量多一些。
可分别对第一子输入层和第二子输入层输出的数据进行处理,直至处理后生成的数据的分辨率相同,才停止在两个支路分别进行操作,即生成的第一编码图像数据和第二编码图像数据的分辨率相同。
步骤64:将第一编码图像数据和第二编码图像数据进行合并,得到第三编码图像数据。
例如,第一编码图像数据为320×180×3,第二编码图像数据为320×180×5,进行合并后,得到的第三编码图像数据为320×180×8。
步骤65:利用卷积隐藏层对第三编码图像数据进行卷积、激活、池化、批标准化或丢弃正则化处理,得到第四编码图像数据。
将两个支路生成的第一编码图像数据和第二编码图像数据合并在一起之后,再利用卷积隐藏层对合并后的数据进行各种处理,最终得到第四编码图像数据。
步骤66:对卷积隐藏层输出的第四编码图像数据进行扁平化处理,得到第五编码图像数据。
该扁平化处理用于降维,使得第五编码图像数据的维度小于第四编码图像数据的维度。
步骤67:利用编码全连接隐藏层对第五编码图像数据进行激活、批标准化或丢弃正则化处理,得到第六编码图像数据。
每个编码全连接隐藏层都可有激活、批标准化或者丢弃正则化三种操作,且丢弃正则化操作是可选操作。
步骤68:利用编码全连接输出层对第六编码图像数据进行处理,得到第一编码图像帧。
编码全连接输出层的神经元数量小于编码全连接隐藏层的神经元数量,并且其所占存储空间远小于输入图像的大小;编码全连接输出层也用作基于神经网络的解码网络的输入层,服务器对第一编码图像帧进行解码处理,得到解码图像帧的步骤具体可如图7所示:
步骤71:接收编码全连接输出层输出的第一编码图像帧。
步骤72:利用解码全连接隐藏层对第一编码图像帧进行处理,得到第一解码图像数据。
步骤73:在两个支路中分别设置反卷积隐藏层,分别利用每个支路中的反卷积隐藏层对第一解码图像数据进行反卷积、激活、上池化、批标准化或丢弃正则化处理,以得到两个第二解码图像数据。
每个反卷积隐藏层可包含反卷积、激活、上池化、批标准化或丢弃正则化五种操作,并且上池化和丢弃正则化操作是可选操作,两个支路的反卷积隐藏层的数量以及反卷积隐藏层中反卷积核的数量不一致,且不共享权重,亮度通道Y所在的支路对应的反卷积隐藏层的数量多一些。
步骤74:分别利用输出层对每个第二解码图像数据进行处理,得到第一解码图像帧和第二解码图像帧。
该输出层为反卷积输出层,且亮度通道Y对应的反卷积核的数量为1,红色差和蓝色差通道CrCb对应的反卷积核的数量为2。
步骤75:对第二解码图像帧进行上采样处理,得到第三解码图像帧。
由于第二子输入层接收的是下采样后的数据,在进行合成时,对红色差和蓝色差通道CrCb对应的输出层所输出的图像进行上采样,使得亮度通道Y以及红色差和蓝色差通道CrCb的数据大小保持一致。
步骤76:将第一解码图像帧与第三解码图像帧进行合并,以得到解码图像帧。
除输出层外,整个编解码网络中的卷积核的数量、反卷积核的数量、激活函数、池化参数、上池化参数以及隐藏层中神经元的数量并非硬性要求,可根据需要进行设计。
在一具体的实施例中,由各种电视剧或电影组成训练集对编解码网 络进行训练,在YCrCb颜色空间下进行训练时,将每一帧图像的亮度通道Y中的数据发送至编码网络的第一个支路,红色差通道Cr和绿色差通道Cb组成双通道图像后,将数据进行4倍下采样后发送至第二个支路,并且将它们分别作为解码网络的两条支路的标签,进行损失的计算,并将两条支路的损失相加,作为最终的损失。
如果要获得多种分辨率的图像,可利用图像插值算法获得所需的分辨率图像,该编解码网络解码出来的图像质量更优,颜色偏差更小。
本实施例中的编解码网络为特用高质量编解码网络,其适用于对一些特殊视频进行处理,具体地,可由某一电视剧或电影训练得到,负责对该电视剧或电影进行编码和解码;对于普通视频来说,可不用在服务器端进行训练,而是在客户端训练通用高质量编解码网络,通用高质量编解码网络的结构和训练方法与特用高质量编解码网络类似,区别在于在隐藏层数和卷积核的数量上大于或等于特用高质量编解码网络,该通用高质量编解码网络可对大多数视频进行处理。
在另一具体的实施例中,基于神经网络的编码网络和基于神经网络的解码网络如图8所示,基于神经网络的编码网络和基于神经网络的解码网络构成基于神经网络的编解码网络,该网络为变分自编码网络,在训练时使用RGB颜色空间,编码网络为一路输入,解码网络为多路输出,以支持多分辨率输出。服务器得到第一编码图像帧的步骤具体可如图9所示:
步骤91:利用输入层接收输入图像。
该输入图像的颜色格式为红色-绿色-蓝色。
步骤92:利用卷积隐藏层对输入图像进行卷积、激活、池化、批标准化或丢弃正则化处理,得到第七编码图像数据。
卷积隐藏层的数量至少为2个,每个卷积隐藏层都可有卷积、激活、池化、批标准化或者丢弃正则化五种操作,且池化和丢弃正则化操作是可选操作,即在卷积隐藏层需要对前一层输出的数据利用卷积核进行卷积操作以提取输入图像中的特征信息,然后对卷积后的数据进行池化,以对数据进行下采样,再利用激活函数对池化后的数据进行激活,以增 加编码网络模型的非线性。
步骤93:对卷积隐藏层输出的第七编码图像数据进行扁平化处理,得到第八编码图像数据。
该扁平化处理用于降维,将三维数据展开至一维,即第八编码图像数据的维度小于第七编码图像数据的维度。
例如,输入图像为1280×720×3,卷积核的数量为5,池化操作进行2倍的下采样,卷积隐藏层的数量为2,第一次卷积后数据为1280×720×5,经过池化后数据为640×360×5,利用激活函数处理后数据为640×360×5,第二次卷积后数据为640×360×10,经过池化后数据为320×180×10,利用激活函数处理后数据为320×180×10,经过扁平化处理后输出变成1维数据,其长度为320×180×10。
步骤94:利用编码全连接隐藏层对第八编码图像数据进行激活、批标准化或丢弃正则化处理,得到第九编码图像数据。
每个编码全连接隐藏层都可有激活、批标准化或者丢弃正则化三种操作,且丢弃正则化操作是可选操作。
步骤95:利用编码全连接输出层对第九编码图像数据进行处理,得到第一编码图像帧。
编码全连接输出层的神经元数量小于编码全连接隐藏层的神经元数量,并且其所占存储空间远小于输入图像的大小;编码全连接输出层也用作基于神经网络的解码网络的输入层,服务器对第一编码图像帧进行解码处理,得到解码图像帧的步骤具体可如图10所示:
步骤101:接收编码全连接输出层输出的第一编码图像帧。
步骤102:利用解码全连接隐藏层对第一编码图像帧进行处理,得到第三解码图像数据。
步骤103:在至少两个支路中分别设置反卷积隐藏层,分别利用每个支路中的反卷积隐藏层对第三解码图像数据进行反卷积、激活、上池化、批标准化或丢弃正则化处理,得到至少两个第四解码图像数据。
每个反卷积隐藏层可包含反卷积、激活、上池化、批标准化或丢弃正则化五种操作,并且上池化和丢弃正则化操作是可选操作,图8中三 个输出支路的反卷积隐藏层的数量以及反卷积隐藏层中反卷积核的数量不一致,且不共享权重,高分辨率输出层图像所在的支路对应的卷积隐藏层的数量多一些;例如,输出层输出的图像的分辨率可分别为1920*1080、1280*720以及640*360。
在一具体的实施例中,可由各种电视剧或电影组成训练集对编解码网络进行训练,在RGB颜色空间下进行训练时将每一帧图像作为输入,并将每一帧图像线性插值成1920*1080、1280*720、640*360三种分辨率,分别将它们与解码网络的三路输出图像进行损失的计算。
步骤104:分别利用输出层对每个第四解码图像数据进行处理,得到相应的解码图像帧。
输出层的数量与支路的数量相同,每个支路中的反卷积隐藏层的数量以及反卷积核的数量不同,且不共享权重,任意两个解码图像帧的分辨率不同,且分辨率越高其所在支路对应的反卷积隐藏层的数量越多。
除输出层外,整个编解码网络中卷积核的数量、反卷积核的数量、激活函数、池化参数、上池化参数以及隐藏层中神经元的数量并非硬性要求,可根据需要进行设计。
本实施例中的编解码网络为特用多分辨率编解码网络,其适用于对一些特殊视频进行处理,具体地,可由某一电视剧或电影训练得到,负责对该电视剧或电影进行编码和解码;对于普通视频来说,可不用在服务器端进行训练,而是在客户端训练通用多分辨率编解码网络,通用多分辨率编解码网络的结构和训练方法与特用多分辨率编解码网络类似,区别在于在隐藏层数和卷积核的数量上大于或等于特用多分辨率编解码网络,该通用多分辨率编解码网络可对大多数视频进行处理。
对于特用编解码网络(包括特用多分辨率编解码网络和特用高质量编解码网络)来说,特用编解码网络解码出来的图像清晰度较高,效果较好,解码时间短,但是用户需要额外点击下载对应某一电视剧或电影的特用解码网络。
特用编解码网络可以实现在视频中加入特效,特效功能需要在编解码网络的训练过程中实现,在训练时使用某一特效装饰后的标签图像作 为新的标签图像,去训练编解码网络就可得到生成该特效图像的解码网络,可以做比较复杂的特效,例如,可以完成真人剧转动漫的特效或动漫转真人剧的特效,普通画面转大片风格等,如冰封效果或计算机动画效果等。
此外,还可以根据视频的类型,来训练各种视频题材的通用编解码网络,例如:古装剧或现代剧等类型,这种通用编解码网络只使用所属类型的视频资源进行训练,也仅负责对该类型的视频资源进行编码和解码,其网络结构可以和上述实施例中的网络结构相同,在此不再赘述。
参阅图11,图11是本申请提供的移动终端一实施例的结构示意图,移动终端110包括互相连接的存储器111和处理器112,其中,存储器111用于存储计算机程序,计算机程序在被处理器112执行时,用于实现上述实施例中的视频处理方法。
在移动终端110可训练通用解码网络、去图像退化网络、运动估计网络或场景转换检测网络等。
参阅图12,图12是本申请提供的服务器一实施例的结构示意图,服务器120包括互相连接的存储器121和处理器122,其中,存储器121用于存储计算机程序,计算机程序在被处理器122执行时,用于实现上述实施例中的视频处理方法。
服务器120可训练好通用编码网络、特用编网络以及特用解码网络,服务器120存储有特用解码网络,以便在移动终端发起特用视频的请求时,将特用解码网络发给移动终端,方便移动终端对特用视频进行解码,使得用户能够观看特定视频。
参阅图13,图13是本申请提供的视频处理系统一实施例的结构示意图,视频处理系统130包括互相连接的服务器131和移动终端132,其中,服务器131用于对输入图像进行编码处理,得到编码图像帧,移动终端132用于对编码图像帧进行解码,得到解码图像帧,其中,服务器131为上述实施例中的服务器,移动终端132为上述实施例中的移动终端。
该视频处理系统130为一种基于图像内容的编解码系统,可以将一 幅图像压缩至若干个浮点型数据,大大提升了压缩率,减少了传输视频所需的带宽,并且编码形成的浮点型数据极具安全性,即使被截获也不会泄露所传输的信息。
参阅图14,图14是本申请提供的计算机存储介质一实施例的结构示意图,计算机存储介质140用于存储计算机程序141,计算机程序141在被处理器执行时,用于实现上述实施例中的视频处理方法。
其中,该存储介质140可以是服务器、U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
在本申请所提供的几个实施方式中,应该理解到,所揭露的方法以及设备,可以通过其它的方式实现。例如,以上所描述的设备实施方式仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施方式方案的目的。
另外,在本申请各个实施方式中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
以上仅为本申请的实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (17)

  1. 一种视频处理方法,其特征在于,应用于客户端,所述视频处理方法包括:
    接收服务器发送的第一编码图像帧;
    判断是否接收到图像丰富指令;
    若是,则将随机噪声加入所述第一编码图像帧中,生成第二编码图像帧;
    其中,所述第一编码图像帧为浮点型数据,所述第一编码图像帧与所述第二编码图像帧之间的差值在预设范围以内。
  2. 根据权利要求1所述的视频处理方法,其特征在于,所述将随机噪声加入所述第一编码图像帧中,生成第二编码图像帧的步骤,包括:
    利用场景转换检测网络判断是否发生场景改变;
    若是,则生成新的随机噪声,并将所述新的随机噪声加入所述第一编码图像帧中,生成所述第二编码图像帧;若否,则继续将当前随机噪声加入所述第一编码图像帧中,生成所述第二编码图像帧。
  3. The video processing method according to claim 1, wherein the method further comprises:
    decoding the second encoded image frame with a neural-network-based decoding network, to obtain a decoded image frame;
    processing the decoded image frame with an image-degradation-removal network, to obtain a first image frame;
    performing estimation on the first image frame with a motion estimation network, to generate at least one second image frame;
    sending the first image frame and the second image frame to a video player for playback.
  4. The video processing method according to claim 3, wherein before the step of receiving the first encoded image frame sent by the server, the method comprises:
    sending a download request message to the server at a preset time interval or at an interval of a preset number of frames.
  5. The video processing method according to claim 3, wherein the step of processing the decoded image frame with the image-degradation-removal network to obtain a first image frame comprises:
    acquiring a plurality of images as original images;
    performing Gaussian blurring or noise addition on the original images to generate corresponding training images and build a training set;
    training an image blur restoration network or an image super-resolution network on the training images in the training set.
  6. The video processing method according to claim 3, wherein
    the motion estimation network is a generative adversarial network, the generative adversarial network comprises a generation network and a discrimination network, the generation network comprises a two-dimensional convolution layer and a three-dimensional deconvolution layer, the two-dimensional convolution layer is configured to extract feature information from the first image frame, the three-dimensional deconvolution layer is configured to receive the feature information and generate at least one second image frame, and the discrimination network comprises a three-dimensional convolution layer and a fully connected layer and is configured to determine whether the generated second image frame is an image that meets preset requirements.
  7. A video processing method, applied to a server, wherein the video processing method comprises:
    receiving an input image;
    encoding the input image with a neural-network-based encoding network, to obtain the first encoded image frame;
    wherein the first encoded image frame is floating-point data, the neural-network-based encoding network comprises at least an input layer, each input layer comprises at least two sub-input layers, and the sub-input layers are configured to receive data of at least one channel of the input image.
  8. The video processing method according to claim 7, wherein
    the neural-network-based encoding network further comprises at least one convolution hidden layer, an encoding fully connected hidden layer, and an encoding fully connected output layer.
  9. The video processing method according to claim 8, wherein the method further comprises:
    decoding the first encoded image frame to obtain decoded image frames;
    after receiving a video viewing request sent by the client, sending a neural-network-based decoding network to the client;
    wherein the neural-network-based decoding network comprises a decoding fully connected hidden layer, at least one deconvolution hidden layer, and an output layer.
  10. The video processing method according to claim 9, wherein the input layer comprises a first sub-input layer and a second sub-input layer, and the step of encoding the input image with the neural-network-based encoding network to obtain the first encoded image frame comprises:
    receiving data of a first channel of the input image with the first sub-input layer;
    down-sampling data of a second channel of the input image, and feeding the down-sampled data into the second sub-input layer;
    performing convolution, activation, pooling, batch normalization, or dropout regularization on the data output by the first sub-input layer and the second sub-input layer respectively with the convolution hidden layers, to obtain first encoded image data and second encoded image data, wherein the first encoded image data and the second encoded image data have the same resolution;
    merging the first encoded image data and the second encoded image data to obtain third encoded image data;
    performing convolution, activation, pooling, batch normalization, or dropout regularization on the third encoded image data with the convolution hidden layer, to obtain fourth encoded image data;
    flattening the fourth encoded image data output by the convolution hidden layer, to obtain fifth encoded image data, wherein the fifth encoded image data has fewer dimensions than the fourth encoded image data;
    performing activation, batch normalization, or dropout regularization on the fifth encoded image data with the encoding fully connected hidden layer, to obtain sixth encoded image data;
    processing the sixth encoded image data with the encoding fully connected output layer, to obtain the first encoded image frame.
  11. The video processing method according to claim 10, wherein the step of decoding the first encoded image frame to obtain decoded image frames comprises:
    receiving the first encoded image frame output by the encoding fully connected output layer;
    processing the first encoded image frame with the decoding fully connected hidden layer, to obtain first decoded image data;
    providing a deconvolution hidden layer in each of two branches, and performing deconvolution, activation, un-pooling, batch normalization, or dropout regularization on the first decoded image data with the deconvolution hidden layer in each branch respectively, to obtain two second decoded image data;
    processing each of the second decoded image data with the output layer respectively, to obtain a first decoded image frame and a second decoded image frame;
    up-sampling the second decoded image frame to obtain a third decoded image frame;
    merging the first decoded image frame and the third decoded image frame to obtain the decoded image frame.
  12. The video processing method according to claim 10, wherein
    the color format of the input image is luminance-red difference-blue difference, the first input layer is the luminance channel, and the second input layer is the red-difference and blue-difference channels.
  13. The video processing method according to claim 9, wherein the step of decoding the first encoded image frame to obtain decoded image frames comprises:
    receiving the first encoded image frame output by the encoding fully connected output layer;
    processing the first encoded image frame with the decoding fully connected hidden layer, to obtain third decoded image data;
    providing a deconvolution hidden layer in each of at least two branches, and performing deconvolution, activation, un-pooling, batch normalization, or dropout regularization on the third decoded image data with the deconvolution hidden layer in each branch respectively, to obtain at least two fourth decoded image data;
    processing each of the fourth decoded image data with the output layer respectively, to obtain the corresponding decoded image frames;
    wherein the number of output layers is the same as the number of branches, the number of deconvolution hidden layers and the number of deconvolution kernels in each branch are different and the branches do not share weights, any two of the decoded image frames have different resolutions, and the higher the resolution, the larger the number of deconvolution hidden layers in the corresponding branch.
  14. A mobile terminal, comprising a memory and a processor connected to each other, wherein the memory is configured to store a computer program which, when executed by the processor, is used to implement the video processing method according to any one of claims 1-6.
  15. A server, comprising a memory and a processor connected to each other, wherein the memory is configured to store a computer program which, when executed by the processor, is used to implement the video processing method according to any one of claims 7-13.
  16. A video processing system, comprising a server and a mobile terminal connected to each other, wherein the server is configured to encode an input image to obtain encoded image frames, and the mobile terminal is configured to decode the encoded image frames to obtain decoded image frames, wherein the mobile terminal is the mobile terminal according to claim 14 and the server is the server according to claim 15.
  17. A computer storage medium for storing a computer program, wherein the computer program, when executed by a processor, is used to implement the video processing method according to any one of claims 1-13.
PCT/CN2019/087662 2019-05-20 2019-05-20 一种视频处理方法、系统、移动终端、服务器及存储介质 WO2020232613A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087662 WO2020232613A1 (zh) 2019-05-20 2019-05-20 一种视频处理方法、系统、移动终端、服务器及存储介质

Publications (1)

Publication Number Publication Date
WO2020232613A1 (zh)

Family

ID=73459256

Country Status (1)

Country Link
WO (1) WO2020232613A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101795344A (zh) * 2010-03-02 2010-08-04 北京大学 数字全息图像压缩、解码方法及系统、传输方法及系统
CN105740916A (zh) * 2016-03-15 2016-07-06 北京陌上花科技有限公司 图像特征编码方法及装置
CN107018422A (zh) * 2017-04-27 2017-08-04 四川大学 基于深度卷积神经网络的静止图像压缩方法
US20190058489A1 (en) * 2017-08-21 2019-02-21 Kabushiki Kaisha Toshiba Information processing apparatus, information processing method, and computer program product
CN107396124A (zh) * 2017-08-29 2017-11-24 南京大学 基于深度神经网络的视频压缩方法

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114066764A (zh) * 2021-11-23 2022-02-18 电子科技大学 基于距离加权色偏估计的沙尘退化图像增强方法及装置
CN114066764B (zh) * 2021-11-23 2023-05-09 电子科技大学 基于距离加权色偏估计的沙尘退化图像增强方法及装置
CN114529746A (zh) * 2022-04-02 2022-05-24 广西科技大学 基于低秩子空间一致性的图像聚类方法
CN114529746B (zh) * 2022-04-02 2024-04-12 广西科技大学 基于低秩子空间一致性的图像聚类方法
CN115984675A (zh) * 2022-12-01 2023-04-18 扬州万方科技股份有限公司 一种用于实现多路视频解码及ai智能分析的系统及方法
CN115984675B (zh) * 2022-12-01 2023-10-13 扬州万方科技股份有限公司 一种用于实现多路视频解码及ai智能分析的系统及方法
CN117829312A (zh) * 2023-12-29 2024-04-05 南京硅基智能科技有限公司 视频驱动数字人表情模型的生成方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929924

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19929924

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09-06-2022)
