WO2022021519A1 - Video decoding method, system and device and computer-readable storage medium - Google Patents

Video decoding method, system and device and computer-readable storage medium

Info

Publication number
WO2022021519A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
viewpoint
target
image
playback
Prior art date
Application number
PCT/CN2020/111448
Other languages
French (fr)
Chinese (zh)
Inventor
王荣刚
王振宇
高文
Original Assignee
北京大学深圳研究生院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学深圳研究生院
Publication of WO2022021519A1 publication Critical patent/WO2022021519A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47202 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N 21/643 Communication protocols

Definitions

  • the present application relates to the technical field of digital signal processing, and in particular, to a video decoding method, system, device, and computer-readable storage medium.
  • Free viewpoint applications allow viewers to watch videos in a range of continuous viewpoints.
  • the viewer can set the position and angle of the viewpoint, instead of being limited to a fixed camera angle.
  • Such an application often requires multiple cameras shooting simultaneously, generating videos from multiple viewpoints at the same time.
  • Virtual reality technology allows users to watch videos with a 180-degree or 360-degree field of view. Therefore, whether it is the multi-channel video in a free viewpoint application or the panoramic video in a virtual reality application, the amount of data is very large, which brings great challenges to video transmission.
  • the commonly used methods of encoding and decoding such videos are usually divided into two categories. One is the method with inter-frame dependence: although it can achieve higher compression efficiency, it must rely on other decoded images during decoding, so viewpoint switching is not smooth. The other is the method completely without inter-frame dependence: although it achieves smoother viewpoint switching, the compression efficiency is not ideal because there is no inter-frame dependence during compression.
  • the main purpose of this application is to provide a video decoding method, system, device and computer-readable storage medium, aiming to solve the technical problem that the existing encoding and decoding methods for multi-channel video or panoramic video can hardly balance compression efficiency and the smoothness of viewpoint switching.
  • the present application provides a video decoding method, the video decoding method includes:
  • receiving multiple background frame code streams corresponding to different viewpoints sent by the encoding end, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
  • when a playback viewpoint selection instruction is received, according to the viewpoint selection sequence determined by the playback viewpoint selection instruction, receiving the multi-channel image code streams corresponding to different viewpoints of the target video, and decoding the multi-channel image code streams of the different viewpoints based on the multiple reconstructed background frames to obtain the multi-channel target videos.
  • the present application also provides a video decoding system
  • the video decoding system includes:
  • the background code stream decoding module is used for receiving multiple background frame code streams corresponding to different viewpoints sent by the encoder, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
  • the image code stream decoding module is used for, when a playback viewpoint selection instruction is received, receiving the multi-channel image code streams corresponding to different viewpoints of the target video according to the viewpoint selection sequence determined by the playback viewpoint selection instruction, and decoding the multi-channel image code streams of the different viewpoints based on the multiple reconstructed background frames to obtain the multi-channel target videos.
  • the image code stream decoding module includes:
  • the viewpoint-by-viewpoint determination unit is configured to, when receiving a playback viewpoint selection instruction sent by the user, determine, one by one, the target playback viewpoint currently selected by the user based on the playback viewpoint selection instruction, and obtain the image code stream generated by the encoding end that corresponds to the target playback viewpoint;
  • the code-stream-by-code-stream decoding unit is configured to select, from the multiple reconstructed background frames, the target reconstructed background frame corresponding to the target playback viewpoint, and decode the image code stream corresponding to the target playback viewpoint based on the target reconstructed background frame, so as to obtain and play the target video corresponding to the target playback viewpoint.
  • the playback viewpoint selection instruction includes a first viewpoint selection instruction
  • the viewpoint-by-viewpoint determination unit is further configured to, when receiving a first viewpoint selection instruction sent by the user, acquire the first selected viewpoint in the first viewpoint selection instruction, determine the first video number of the first target video corresponding to the first selected viewpoint, determine the first image number of the video image at the initial position in the first target video, and receive, from the encoding end, the first image code stream corresponding to the first video number and the first image number;
  • the code-stream-by-code-stream decoding unit is further configured to, according to the first video number, determine the first reconstructed background frame corresponding to the first target video from the multiple reconstructed background frames, and decode the first image code stream based on the first reconstructed background frame.
  • the playback viewpoint selection instruction includes a viewpoint switching instruction
  • the viewpoint-by-viewpoint determination unit is further configured to, when receiving a viewpoint switching instruction sent by the user, acquire the target switching viewpoint in the viewpoint switching instruction, determine the second video number of the second target video corresponding to the target switching viewpoint, acquire the playback progress of the first target video, determine the second image number of the second video image corresponding to that playback progress in the second target video, and receive, from the encoding end, the second image code stream corresponding to the second video number and the second image number;
  • the viewpoint-by-viewpoint determination unit is further configured to, according to the second video number, determine the second reconstructed background frame corresponding to the second target video from the multiple reconstructed background frames, and decode the second image code stream based on the second reconstructed background frame.
  • the image code stream decoding module further includes:
  • the inter-frame prediction judgment unit is configured to, when decoding each image code stream, judge whether the image block corresponding to the image code stream uses the inter-frame prediction mode, wherein the image code stream is composed of the code streams of multiple image blocks;
  • a motion vector setting unit, configured to set the motion vector in the image block to 0 if the image block uses the inter-frame prediction mode;
  • a residual skip decoding unit, configured to use the reconstructed background frame corresponding to the image block that uses the inter-frame prediction mode as the reference frame, and decode the image code stream while skipping the motion vector residual information corresponding to that image block.
  • the background frame code stream is obtained by encoding a background frame obtained by background modeling according to a video image in the target video.
  • the methods for performing background modeling on the video image include a single-frame generation method, a median filtering method, and a mean filtering method.
  • the present application also provides a video decoding device
  • the video decoding device includes: a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the steps of the video decoding method described above.
  • the present application also provides a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the video decoding method described above are implemented.
  • the present application provides a video decoding method, system, device, and computer-readable storage medium.
  • This embodiment of the present application receives multiple background frame code streams corresponding to different viewpoints sent by the encoding end and decodes the multiple background frame code streams to obtain multiple reconstructed background frames;
  • when a playback viewpoint selection instruction is received, the multi-channel image code streams corresponding to different viewpoints of the target video are received according to the viewpoint selection sequence determined by the playback viewpoint selection instruction, and the multi-channel image code streams of the different viewpoints are decoded based on the multiple reconstructed background frames to obtain the multi-channel target videos.
  • In this way, the present application decodes the background frame code streams corresponding to the different viewpoints of the video to be played before decoding the image code streams, obtaining the reconstructed background frames, so that part of the operations necessary for decoding the image code streams is completed in advance; this reduces the processing burden of the device when decoding the image code streams and improves the overall video decoding efficiency. During decoding, the image code stream corresponding to each channel of target video is decoded with the reconstructed background frame as the only, independent reference. Since the decoding of each image code stream depends only on the reconstructed background frame and not on other decoded images, real-time free switching and smooth playback between the target videos corresponding to different viewpoints are achieved, while the video compression efficiency is higher than that of the method completely without inter-frame dependence. This solves the technical problem that the existing encoding and decoding methods for multi-channel video or panoramic video cannot balance compression efficiency and the smoothness of viewpoint switching.
  • FIG. 1 is a schematic structural diagram of a video decoding device of a hardware operating environment involved in a solution of an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a video decoding method of the present application
  • FIG. 3 is a system structure diagram in a specific embodiment of the video decoding method of the present application.
  • FIG. 4 is a schematic diagram of panorama image division in another specific embodiment of the video decoding method of the present application.
  • FIG. 5 is a schematic diagram of encoding/decoding and code stream storage in another specific embodiment of the video decoding method of the present application.
  • FIG. 1 is a schematic structural diagram of a video decoding device of a hardware operating environment involved in an embodiment of the present application.
  • the video decoding device in this embodiment of the present application may be a server, a PC, or a terminal device such as a smart phone or a tablet computer.
  • the video decoding device may include: a processor 1001 , such as a CPU, a communication bus 1002 , a user interface 1003 , a network interface 1004 , and a memory 1005 .
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (eg, a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory such as a disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
  • the terminal structure shown in FIG. 1 does not constitute a limitation on the video decoding device, which may include more or fewer components than shown, or combine some components, or have a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
  • the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to the client (client) and perform data communication with the client;
  • the processor 1001 can be used to call the computer-readable instructions stored in the memory 1005 and execute the video decoding method provided by the embodiments of the present application.
  • the present application provides a video decoding method in which, before the image code streams are decoded, the background frame code streams corresponding to the different viewpoints of the video to be played are decoded to obtain the reconstructed background frames, so that part of the operations necessary for decoding the image code streams is completed in advance; this reduces the processing burden of the device when decoding the image code streams and improves the overall video decoding efficiency. During decoding, the image code stream corresponding to each channel of target video is decoded with the reconstructed background frame as the only, independent reference; since the decoding of each image code stream depends only on the reconstructed background frame and not on other decoded images, real-time free switching and smooth playback between the target videos corresponding to different viewpoints can be achieved, while the video compression efficiency is higher than that of the method completely without inter-frame dependence. This solves the technical problem that the existing encoding and decoding methods for multi-channel video or panoramic video cannot balance compression efficiency and the smoothness of viewpoint switching.
  • FIG. 2 is a schematic flowchart of a first embodiment of a video decoding method.
  • the first embodiment of the present application provides a video decoding method, the video decoding method is applied to a decoding end, and the video decoding method includes the following steps:
  • Step S10: receiving multiple background frame code streams corresponding to different viewpoints sent by the encoding end, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
  • Step S20: when a playback viewpoint selection instruction is received, receiving, according to the viewpoint selection sequence determined by the playback viewpoint selection instruction, the multi-channel image code streams corresponding to different viewpoints of the target video, and decoding the multi-channel image code streams of the different viewpoints based on the multiple reconstructed background frames to obtain the multi-channel target videos.
  • the decoding end can acquire, in real time, the multi-channel image code streams output from the encoding end after compressing and encoding the multi-channel target videos, and each background frame code stream corresponding to each channel of the target video.
  • the decoding end decodes each background frame code stream according to the coding standard consistent with the encoding end to obtain the corresponding reconstructed background frame.
  • the decoding end uses the reconstructed background frame as a reference frame to decode the image code stream and output it in real time, so as to obtain the video image frames composing the multi-channel target video and play the multi-channel target video. It should be noted that the operation of decoding the background frame code stream by the decoding end is completed before decoding and playing the image code stream.
  • when the decoding end decodes the multiple image code streams and background frame code streams, it can start one decoder for each target video, in which case the background frame code stream and the image code stream of a given target video are decoded by the decoder of that channel; alternatively, only one decoder can be started, and the background frame code streams and image code streams of every channel of video are decoded with that single decoder.
  • in one specific embodiment, 16 cameras are deployed to shoot the video, each camera being one shooting viewpoint, together with 16 background generators and 16 encoders that generate 16 real-time code streams (each including a background frame code stream and an image code stream); the real-time code streams are stored on the HTTP server for the client to download.
  • the client receives the background frame code streams in the 16 real-time code streams and creates 16 decoders to decode the 16 background frame code streams respectively; during playback, the client selects in real time the video number to be played at the next moment and the frame number of the video frame within that video, and obtains the corresponding image code stream from the real-time code streams; finally, the decoder corresponding to that video number is called to decode it, and the decoded video image is passed to the playback module of the client for playback.
  • in the first step, each background generator collects 100 frames of images from the corresponding camera; in the second step, each background generator generates a background frame through median filtering, that is, the pixel value of each pixel on the background frame is obtained as follows: take the 100 pixel values of the corresponding pixel position in the 100 frames, sort them by value, and select the 50th pixel value as the pixel value of that pixel on the background frame.
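The median-filter background modeling described above can be sketched as follows; this is an illustrative example rather than code from the patent, and it assumes the 100 collected frames are available as NumPy arrays of identical shape.

```python
import numpy as np

def median_background(frames):
    """Build a background frame as the per-pixel median of the collected frames.

    frames: a list of N images (e.g. N = 100) of shape (H, W) or (H, W, C), same dtype.
    Returns one image of the same shape whose value at each pixel is the middle value
    of the N sorted samples at that pixel (the "50th of 100" described in the text).
    """
    stack = np.stack(frames, axis=0)          # shape (N, H, W[, C])
    stack.sort(axis=0)                        # sort the N samples at every pixel position
    return stack[(stack.shape[0] - 1) // 2]   # middle sample; index 49 for N = 100
```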
  • each encoder uses the AVS2 coding standard to perform I-frame (intra-frame prediction image) encoding on the background frame of the corresponding channel and output the code stream, and obtain the reconstructed background frame at the same time.
  • each encoder continues to obtain the to-be-encoded image from the corresponding camera.
  • each encoder encodes the acquired image to be encoded and outputs a code stream.
  • the encoding process uses the reconstructed background frame of the corresponding channel as the reference frame, and encodes the to-be-encoded image using the S-frame mode of the AVS2 coding standard (a uni-directionally inter-predicted picture that references the scene, i.e. background, picture).
  • the fourth step and the fifth step above are repeatedly performed, and the code stream of the video image is continuously generated and output.
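A minimal sketch of this per-channel real-time encoding loop is given below. The Avs2Encoder interface (encode_i_frame, encode_s_frame) and the camera/server objects are hypothetical placeholders for whatever AVS2 implementation and storage back end are actually used; they are not APIs defined by the patent.

```python
class ChannelEncoder:
    """One encoding pipeline per camera: background I-frame first, then S-frames."""

    def __init__(self, camera, avs2_encoder, server):
        self.camera = camera         # source of raw frames for this viewpoint
        self.encoder = avs2_encoder  # hypothetical AVS2 encoder wrapper
        self.server = server         # HTTP server / storage for the real-time code stream
        self.recon_bg = None

    def encode_background(self, background_frame):
        # Third step: encode the background frame as an I-frame (intra prediction only)
        # and keep the reconstructed background frame as the only future reference.
        bg_stream, self.recon_bg = self.encoder.encode_i_frame(background_frame)
        self.server.store_background(bg_stream)

    def encode_images(self):
        # Fourth and fifth steps, repeated: every new image is encoded as an S-frame
        # whose reference picture is the reconstructed background frame.
        while True:
            image = self.camera.next_frame()
            s_stream = self.encoder.encode_s_frame(image, reference=self.recon_bg)
            self.server.store_image(s_stream)
```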
  • the process of video stream transmission and decoding is as follows: in the first step, the background frame code stream corresponding to each channel of video is transmitted, one decoder is allocated for each channel of video, and the background frame code stream of the corresponding channel is decoded to obtain the reconstructed background frame corresponding to each channel of video; the decoding uses the I-frame decoding method of the AVS2 coding standard. In the second step, the reconstructed background frame of each channel of video is cached; the reconstructed background frame of each channel is buffered inside the corresponding decoder. In the third step, the video number K of the image code stream to be decoded and its frame number L within that video are obtained.
  • the specific acquisition method is: during playback, determine the currently playing video number K according to the currently playing viewpoint selected by the user, and calculate the frame number L in the corresponding video of the image that should be displayed at the current moment according to the current moment.
  • the fourth step is to request and transmit the L-th frame code stream of the K-th video from the http service.
  • the fifth step is to decode the L-th frame code stream of the K-th video by taking the reconstructed background frame of the K-th video as a reference frame to obtain a decoded image and provide it to the playback module for display.
  • the decoding process is carried out in the S-frame mode of the AVS2 coding standard.
  • the third to fifth steps above are repeatedly performed until the playback ends.
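The client-side loop of steps three to five above could look roughly like the sketch below. The decoder wrappers, the fetch_stream callable (e.g. an HTTP request to the server storing the real-time code streams), and the frame-rate based computation of the frame number are assumptions for illustration, not details fixed by the patent.

```python
import time

def playback_loop(decoders, recon_backgrounds, get_selected_viewpoint,
                  fetch_stream, frame_rate, player):
    """Fetch and decode, in real time, the frame of the currently selected viewpoint.

    decoders[k]            : decoder allocated to video channel k (hypothetical wrapper)
    recon_backgrounds[k]   : cached reconstructed background frame of channel k
    get_selected_viewpoint : callable returning the viewpoint index chosen by the user
    fetch_stream(k, l)     : callable returning the l-th frame code stream of video k
    """
    start = time.time()
    while True:
        k = get_selected_viewpoint()                    # video number K
        l = int((time.time() - start) * frame_rate)     # frame number L for this moment
        stream = fetch_stream(k, l)                     # request the L-th frame of video K
        image = decoders[k].decode_s_frame(stream, reference=recon_backgrounds[k])
        player.show(image)                              # hand over to the playback module
```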
  • in another specific embodiment, the original panoramic video is divided into regions as shown in FIG. 4: the image of the original panoramic video is divided into 9 regions, numbered 0 to 8, and the picture of each region constitutes one channel of video, giving 9 channels of video in total.
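A simple way to perform this division of each panoramic image into 9 regions is sketched below; equal-sized tiles and row-major numbering are assumptions here, since the patent only specifies that the picture is split into regions 0 to 8.

```python
import numpy as np

def split_panorama(image, rows=3, cols=3):
    """Split a panoramic frame into rows*cols regions, numbered 0..rows*cols-1
    in row-major order (assumed to match the 0-8 layout of FIG. 4)."""
    h, w = image.shape[:2]
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = image[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            tiles.append(tile)   # tile index r*cols + c is one channel of video
    return tiles
```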
  • the encoding and decoding process and the code stream storage method are shown in Figure 5.
  • the corresponding background frame is first generated for each channel of video, and the encoder first encodes the background frames of each channel of video in sequence to generate a code stream, and obtains the reconstructed background frame.
  • the decoder firstly decodes each background frame in the code stream one by one and buffers it.
  • in the playback process, at any time, the channel number that currently needs to be displayed is determined according to the playback focus selected by the user, the code stream corresponding to that frame is read, and the code stream of that frame is then decoded using the reconstructed background frame of the corresponding channel as the reference frame to obtain the decoded image.
  • the specific encoding process at the encoding end in this embodiment is as follows: the first step is to acquire the first 100 frames of images of each channel of video; the second step is to generate the background frame of each channel of video from its first 100 frames through mean filtering, that is, the pixel value of each pixel on the background frame is obtained as follows: take the 100 pixel values of the corresponding pixel position in the 100 frames, calculate the average of these 100 pixel values, and use it as the pixel value of that pixel on the background frame.
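For comparison with the median-filter variant sketched earlier, a per-pixel mean-filter background generator might look like this (illustrative only; rounding back to 8-bit samples is an assumption).

```python
import numpy as np

def mean_background(frames):
    """Background frame as the per-pixel average of the first N frames (e.g. N = 100)."""
    stack = np.stack(frames, axis=0).astype(np.float64)   # shape (N, H, W[, C])
    mean = stack.mean(axis=0)                              # per-pixel average over N frames
    return np.clip(np.round(mean), 0, 255).astype(np.uint8)
```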
  • the third step is to use the H.265 coding standard to perform I-frame coding on each background frame in sequence and output the code stream, and obtain the reconstructed background frame at the same time.
  • in the fourth step, the current to-be-encoded image is acquired according to the channel order and the time order.
  • the specific order is shown in Figure 5.
  • the time order takes priority: the first frame of all channels is encoded first, then the second frame of all channels, and so on.
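This time-first (frame-major) ordering can be expressed as a simple double loop; the encode_p_frame call below is a hypothetical stand-in for an H.265 encoder invocation, not a named API.

```python
def encode_all_channels(channels, encoders, recon_backgrounds, num_frames):
    """Encode frame t of every channel before moving on to frame t + 1."""
    for t in range(num_frames):                  # time order takes priority
        for k, channel in enumerate(channels):   # then iterate over the 9 channels
            image = channel.frame(t)
            stream = encoders[k].encode_p_frame(image, reference=recon_backgrounds[k])
            yield k, t, stream                   # code stream for (channel k, frame t)
```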
  • the fifth step is to encode the acquired image to be encoded and output the code stream.
  • the encoding process uses the reconstructed background frame of the corresponding channel as the reference frame, and encodes the to-be-encoded image using the P-frame encoding method of the H.265 coding standard.
  • the fourth step and the fifth step above are repeatedly performed, and the code stream of the video image is continuously generated and output.
  • the specific process of reading and decoding is as follows: the first step is to read and decode the background frame code stream corresponding to each channel of video to obtain the reconstructed background frame corresponding to each channel of video.
  • the decoding process uses the I-frame decoding method of the H.265 coding standard for decoding.
  • the second step is to cache the reconstructed background frames of each video.
  • the video number K where the image code stream to be decoded is located and the frame number L in the video are obtained.
  • the specific acquisition method is: during playback, determine the current corresponding video number K according to the current playback focus selected by the user, and calculate the frame number L in the corresponding video of the image that should be displayed at the current moment according to the current moment.
  • the fourth step is to read the L-th frame code stream of the K-th video.
  • the fifth step is to decode the L-th frame code stream of the K-th video by taking the reconstructed background frame of the K-th video as a reference frame to obtain a decoded image.
  • the decoding process uses the P-frame decoding method of the H.265 coding standard.
  • the third to fifth steps above are repeatedly performed until the playback ends.
  • in this way, the background frame code streams corresponding to the different viewpoints of the video to be played are decoded before the image code streams to obtain the reconstructed background frames, so that part of the operations necessary for decoding the image code streams is completed in advance; this reduces the processing burden of the device when decoding the image code streams and improves the overall video decoding efficiency.
  • during decoding, the image code stream corresponding to each channel of target video is decoded with the reconstructed background frame as the only, independent reference; since the decoding of each image code stream depends only on the reconstructed background frame and not on other decoded images, real-time free switching and smooth playback between the target videos corresponding to different viewpoints are achieved, while the video compression efficiency is higher than that of the method completely without inter-frame dependence, thereby solving the technical problem that the existing encoding and decoding methods for multi-channel video or panoramic video cannot balance compression efficiency and the smoothness of viewpoint switching.
  • step S20 includes:
  • Step a: when receiving a playback viewpoint selection instruction sent by the user, determining, one by one, the target playback viewpoint currently selected by the user based on the playback viewpoint selection instruction, and obtaining the image code stream generated by the encoding end that corresponds to the target playback viewpoint;
  • Step b: selecting, from the multiple reconstructed background frames, the target reconstructed background frame corresponding to the target playback viewpoint, and decoding the image code stream corresponding to the target playback viewpoint based on the target reconstructed background frame, so as to obtain and play the target video corresponding to the target playback viewpoint.
  • the decoding end can receive and decode the image code stream of the corresponding viewpoint according to the user's real-time selection of viewpoints. For example, if there are currently eight image code streams corresponding to eight different viewpoints, together with their reconstructed background frames, each image code stream can be played as one target video after decoding. If the user first selects the first viewpoint, the decoding end can directly use the reconstructed background frame of the first viewpoint, obtained earlier by decoding the background frame code stream, to decode the first image code stream and output it for playback, i.e. play the target video corresponding to the first viewpoint. If the user then switches to the second viewpoint, the decoding end can receive the second image code stream and directly select the reconstructed background frame corresponding to the second viewpoint at the corresponding position for subsequent decoding and playback. Before the entire video finishes playing, the user can switch the current playback viewpoint at any time, and the decoding end will receive the image code stream of the corresponding number at any time and decode it using the already cached reconstructed background frames, thereby realizing switched playback of the videos of different viewpoints.
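One possible client-side sketch of this switch-at-any-time behaviour is shown below. The decoder and stream-fetching helpers are the same kind of hypothetical placeholders used in the earlier playback sketch, and mapping the playback progress to a frame number via the frame rate is an assumption.

```python
class ViewpointPlayer:
    """Keeps playback progress while the user freely switches viewpoints."""

    def __init__(self, decoders, recon_backgrounds, frame_rate, fetch_stream, player):
        self.decoders = decoders
        self.backgrounds = recon_backgrounds
        self.frame_rate = frame_rate
        self.fetch_stream = fetch_stream      # (video_number, frame_number) -> code stream
        self.player = player
        self.current_viewpoint = 0
        self.progress = 0.0                   # seconds of playback so far

    def select_viewpoint(self, viewpoint):
        # Switching only changes the video number; the playback progress is kept,
        # so the next decoded frame of the new viewpoint continues from the same moment.
        self.current_viewpoint = viewpoint

    def render_next(self, dt):
        self.progress += dt
        k = self.current_viewpoint                   # video number after a possible switch
        l = int(self.progress * self.frame_rate)     # image number at the current progress
        stream = self.fetch_stream(k, l)
        image = self.decoders[k].decode(stream, reference=self.backgrounds[k])
        self.player.show(image)
```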
  • in real-time application scenarios, the image code stream is generated by the encoding end and sent directly to the decoding end; in the application scenarios of on-demand and local playback, the image code stream is generated by the encoding end in advance and then uploaded to a server or stored locally, and when the user needs to watch, the decoding end obtains it from the server or from a local file.
  • the playback viewpoint selection instruction includes a first viewpoint selection instruction, and step a includes:
  • Step c when receiving the first viewpoint selection instruction sent by the user, obtain the first selected viewpoint in the first viewpoint selection instruction, and determine the first video number of the first target video corresponding to the first selected viewpoint ;
  • Step d determine the first image number of the video image at the initial position in the first target video
  • Step e Receive a first image code stream corresponding to the first video number and the first image number from the encoding end.
  • Step b includes:
  • Step f: according to the first video number, determining, from the multiple reconstructed background frames, the first reconstructed background frame corresponding to the first target video, and decoding the first image code stream based on the first reconstructed background frame.
  • the playback viewpoint selection instruction includes a viewpoint switching instruction
  • Step a also includes:
  • Step g when receiving the viewpoint switching instruction sent by the user, acquiring the target switching viewpoint in the viewpoint switching instruction, and determining the second video number of the second target video corresponding to the target switching viewpoint;
  • Step h acquiring the playback progress of the first target video, and determining the second image number of the second video image corresponding to the playback progress in the second target video;
  • Step i receive the second image code stream corresponding to the second video number and the second image number from the encoding end;
  • Step b further includes:
  • Step j: according to the second video number, determining, from the multiple reconstructed background frames, the second reconstructed background frame corresponding to the second target video, and decoding the second image code stream based on the second reconstructed background frame.
  • the first-time viewpoint selection instruction is an instruction issued by the user when the user selects a viewpoint for the first time.
  • the viewpoint selected for the first time is the viewpoint currently selected for playback for the first time.
  • the first target video is a target video corresponding to the viewpoint selected for the first time.
  • the first video number is the belonging number of the first target video.
  • the first image number is the image number, among the multiple frames of video images composing the first target video, of the video image corresponding to a video playback progress of zero (the initial position).
  • the first image code stream is the compressed image code stream of the video image corresponding to the first video number.
  • the first reconstructed background frame is a reconstructed background frame corresponding to the viewpoint selected for the first time.
  • the viewpoint switching instruction is an instruction issued when the user intends to switch the currently selected viewpoint.
  • the target switching viewpoint is the new playback viewpoint currently selected by the user.
  • the second target video is a target video corresponding to the target switching viewpoint.
  • the second video number is the belonging number of the second target video.
  • the second image number is the image number, among the multiple frames of video images composing the second target video, of the video image corresponding to the current video playback progress.
  • the second image code stream is the compressed image code stream of the video image corresponding to the second video number.
  • the second reconstructed background frame is a reconstructed background frame corresponding to the target switching viewpoint.
  • the decoding end transmits or reads the background frame code stream corresponding to each channel of video, and decodes it to obtain the reconstructed background frame corresponding to each channel of video.
  • the decoding end uses the corresponding encoding standard to decode the background frame code stream.
  • the second step is to cache the reconstructed background frames of each video.
  • the video number K where the image code stream to be decoded is located and the frame number L in the video are obtained.
  • the specific acquisition method is: when playing, select the corresponding video number K according to the viewpoint or focus currently selected by the user, and at the same time, according to the current moment, obtain the frame number L of the corresponding image in the corresponding video; the fourth step is to transmit or read the L-th frame code stream of the K-th video.
  • the fifth step is to decode the L-th frame code stream of the K-th video by taking the reconstructed background frame of the K-th video as a reference frame to obtain a decoded image.
  • the decoding end performs this decoding using the corresponding coding standard.
  • the third to fifth steps above are repeatedly performed until the playback ends.
  • the step of decoding the multi-channel image code streams of different viewpoints based on a plurality of the reconstructed background frames in this embodiment includes:
  • Step k: when decoding each image code stream, if it is determined that an image block corresponding to the image code stream uses the inter-frame prediction mode, setting the motion vector in that image block to 0, wherein the image code stream is composed of the code streams of multiple image blocks;
  • Step l: using the reconstructed background frame corresponding to the image block that uses the inter-frame prediction mode as the reference frame, and decoding the image code stream while skipping the motion vector residual information corresponding to that image block.
  • the encoding process divides the image into image blocks one by one as coding units, and each image block can select either inter-frame prediction or intra-frame prediction.
  • when the inter-frame prediction mode is used, the motion vectors in the x and y directions both default to 0, so the encoder does not need to encode the motion vector residual information, thereby saving encoding overhead.
  • the encoding end determines, one by one, whether the image blocks in each video image use the inter-frame prediction mode; when it detects that the current image block uses the inter-frame prediction mode, it sets the motion vector of that image block to 0 and skips encoding the corresponding motion vector residual information. At the decoding end, the image code stream is composed of the code streams of multiple image blocks; if the encoding end has encoded with the default zero motion vector, then when the decoding end, while decoding the image code stream, detects that the current image block was encoded in the inter-frame prediction mode, it can default the motion vector of that image block to 0 and skip decoding the motion vector residual information.
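The block-level rule described above can be illustrated with the following simplified sketch; the block header fields and the parse_* / crop helpers are hypothetical placeholders for the entropy-decoding and reconstruction stages of a real codec, so this only shows the control flow.

```python
def decode_block(block_header, bitstream, recon_background):
    """Decode one image block of an image code stream.

    Inter-predicted blocks reference only the reconstructed background frame,
    with the motion vector fixed at (0, 0) and no motion vector residual parsed.
    """
    if block_header.mode == "inter":
        motion_vector = (0, 0)   # defaults to zero; the MV residual is skipped entirely
        prediction = recon_background.crop(block_header.x, block_header.y,
                                           block_header.w, block_header.h)
        residual = bitstream.parse_texture_residual(block_header)
        return prediction + residual   # background prediction plus texture residual
    # intra-predicted blocks are decoded as usual, without any reference frame
    return bitstream.parse_intra_block(block_header)
```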
  • the background frame code stream is obtained by encoding a background frame obtained by background modeling according to a video image in the target video.
  • the methods for performing background modeling on the video image include a single-frame generation method, a median filtering method, and a mean filtering method.
  • the single-frame generation method is to directly select one frame from the video of this channel, for example the first frame, the last frame or a frame in the middle, as the background frame;
  • the median filtering method is to select H frames from the video of this channel, sort the H pixel values at each pixel position, and select the pixel value at position H/2 (or H/2-1 or H/2+1) as the filter output value of that pixel, thereby obtaining the background frame;
  • the mean filtering method is to select H frames from the video and calculate the average of the H pixel values at each pixel position as the filter output value of that pixel, thereby obtaining the background frame.
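The three background modeling options could be gathered behind one helper, reusing the per-pixel logic of the earlier sketches; the H parameter and the method names below are illustrative, not terminology fixed by the patent.

```python
import numpy as np

def build_background(frames, method="median", H=None):
    """Generate a background frame from the first H frames of one channel of video.

    method: "single" (pick one frame), "median", or "mean"; names are illustrative.
    """
    H = H or len(frames)
    sample = np.stack(frames[:H], axis=0)
    if method == "single":
        return sample[H // 2]                    # e.g. the first, last, or a middle frame
    if method == "median":
        return np.sort(sample, axis=0)[H // 2]   # per-pixel value at position H/2
    if method == "mean":
        return np.round(sample.mean(axis=0)).astype(sample.dtype)
    raise ValueError(f"unknown background modeling method: {method}")
```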
  • the encoding end acquires the multiple target videos collected from different viewpoints, and generates the background frame corresponding to each target video, one by one, based on a preset background modeling method such as single-frame generation, median filtering or mean filtering;
  • the encoding end traverses all the target videos to obtain each background frame corresponding to each target video, and then encodes each background frame to obtain the background frame code stream and the reconstructed background frame generated during the encoding process.
  • the present application also provides a video decoding system, which includes:
  • the background code stream decoding module is used for receiving multiple background frame code streams corresponding to different viewpoints sent by the encoder, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
  • the image code stream decoding module is used for, when a playback viewpoint selection instruction is received, receiving the multi-channel image code streams corresponding to different viewpoints of the target video according to the viewpoint selection sequence determined by the playback viewpoint selection instruction, and decoding the multi-channel image code streams of the different viewpoints based on the multiple reconstructed background frames to obtain the multi-channel target videos.
  • the present application also provides a video decoding device.
  • the video decoding device includes a processor, a memory, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the steps of the video decoding method described above.
  • the present application also provides a computer-readable storage medium.
  • computer-readable instructions are stored on the computer-readable storage medium of the present application, and when executed by a processor, they implement the steps of the video decoding method described above.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solutions of the present application, in essence, or the parts thereof that contribute to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc), and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of the present application.

Abstract

Disclosed in the present application are a video decoding method, system and device and a computer-readable storage medium. In the video decoding method, before the image code streams are decoded, the background frame code streams corresponding to different viewpoints of the video to be played are first decoded to obtain reconstructed background frames, so that part of the operations necessary for decoding the image code streams is completed in advance, reducing the processing burden on the device when decoding the image code streams and improving the overall video decoding efficiency. During decoding, the image code stream corresponding to each target video is decoded with the reconstructed background frame as the only, independent reference; the decoding of each image code stream relies only on the reconstructed background frames and not on other decoded images, so real-time free switching and smooth playback of the target videos corresponding to different viewpoints are achieved. Meanwhile, compared with a mode completely without inter-frame dependence, the method has higher video compression efficiency and a simple system implementation.

Description

Video decoding method, system, device, and computer-readable storage medium
This application claims priority to the Chinese patent application with application number 202010748734.3, entitled "Video decoding method, system, device and computer-readable storage medium", filed with the China Patent Office on July 29, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of digital signal processing, and in particular, to a video decoding method, system, device, and computer-readable storage medium.
Background
Free viewpoint applications allow viewers to watch videos over a range of continuous viewpoints. The viewer can set the position and angle of the viewpoint instead of being limited to a fixed camera angle. Such an application often requires multiple cameras shooting simultaneously, generating videos from multiple viewpoints at the same time. Virtual reality technology allows users to watch videos with a 180-degree or 360-degree field of view. Therefore, whether it is the multi-channel video in a free viewpoint application or the panoramic video in a virtual reality application, the amount of data is very large, which brings great challenges to video transmission. The commonly used methods of encoding and decoding such videos are usually divided into two categories. One is the method with inter-frame dependence: although it can achieve higher compression efficiency, it must rely on other decoded images during decoding, so viewpoint switching is not smooth. The other is the method completely without inter-frame dependence: although it achieves smoother viewpoint switching, the compression efficiency is not ideal because there is no inter-frame dependence during compression. These situations reflect the problem that the existing encoding and decoding methods for multi-channel video or panoramic video can hardly balance compression efficiency and the smoothness of viewpoint switching.
Technical Solution
The main purpose of the present application is to provide a video decoding method, system, device and computer-readable storage medium, aiming to solve the technical problem that the existing encoding and decoding methods for multi-channel video or panoramic video can hardly balance compression efficiency and the smoothness of viewpoint switching.
To achieve the above object, the present application provides a video decoding method, which includes:
receiving multiple background frame code streams corresponding to different viewpoints sent by the encoding end, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
when a playback viewpoint selection instruction is received, receiving, according to the viewpoint selection sequence determined by the playback viewpoint selection instruction, the multi-channel image code streams corresponding to different viewpoints of the target video, and decoding the multi-channel image code streams of the different viewpoints based on the multiple reconstructed background frames to obtain the multi-channel target videos.
In addition, to achieve the above object, the present application also provides a video decoding system, which includes:
a background code stream decoding module, configured to receive multiple background frame code streams corresponding to different viewpoints sent by the encoding end, and decode the multiple background frame code streams to obtain multiple reconstructed background frames;
an image code stream decoding module, configured to, when a playback viewpoint selection instruction is received, receive, according to the viewpoint selection sequence determined by the playback viewpoint selection instruction, the multi-channel image code streams corresponding to different viewpoints of the target video, and decode the multi-channel image code streams of the different viewpoints based on the multiple reconstructed background frames to obtain the multi-channel target videos.
Optionally, the image code stream decoding module includes:
a viewpoint-by-viewpoint determination unit, configured to, when receiving a playback viewpoint selection instruction sent by the user, determine, one by one, the target playback viewpoint currently selected by the user based on the playback viewpoint selection instruction, and obtain the image code stream generated by the encoding end corresponding to the target playback viewpoint;
a code-stream-by-code-stream decoding unit, configured to select, from the multiple reconstructed background frames, the target reconstructed background frame corresponding to the target playback viewpoint, and decode the image code stream corresponding to the target playback viewpoint based on the target reconstructed background frame, so as to obtain and play the target video corresponding to the target playback viewpoint.
Optionally, the playback viewpoint selection instruction includes a first viewpoint selection instruction, and
the viewpoint-by-viewpoint determination unit is further configured to, when receiving a first viewpoint selection instruction sent by the user, acquire the first selected viewpoint in the first viewpoint selection instruction, and determine the first video number of the first target video corresponding to the first selected viewpoint;
determine the first image number of the video image at the initial position in the first target video;
receive, from the encoding end, the first image code stream corresponding to the first video number and the first image number;
the code-stream-by-code-stream decoding unit is further configured to, according to the first video number, determine the first reconstructed background frame corresponding to the first target video from the multiple reconstructed background frames, and decode the first image code stream based on the first reconstructed background frame.
Optionally, the playback viewpoint selection instruction includes a viewpoint switching instruction, and
the viewpoint-by-viewpoint determination unit is further configured to, when receiving a viewpoint switching instruction sent by the user, acquire the target switching viewpoint in the viewpoint switching instruction, and determine the second video number of the second target video corresponding to the target switching viewpoint;
acquire the playback progress of the first target video, and determine the second image number of the second video image corresponding to the playback progress in the second target video;
receive, from the encoding end, the second image code stream corresponding to the second video number and the second image number;
the viewpoint-by-viewpoint determination unit is further configured to, according to the second video number, determine the second reconstructed background frame corresponding to the second target video from the multiple reconstructed background frames, and decode the second image code stream based on the second reconstructed background frame.
可选地,所述图像码流解码模块还包括:Optionally, the image code stream decoding module further includes:
帧间预测判断单元,用于在对每一所述图像码流进行解码时,判断与所述图像码流对应的图像块是否使用帧间预测模式,其中,所述图像码流是由多个所述图像块的码流组成;The inter-frame prediction judgment unit is configured to judge whether the image block corresponding to the image code stream uses the inter-frame prediction mode when decoding each of the image code streams, wherein the image code stream is composed of multiple the code stream composition of the image block;
运动矢量设置单元,用于若所述图像块使用帧间预测模式,则将所述图像块中的运动矢量设置为0;a motion vector setting unit, configured to set the motion vector in the image block to 0 if the image block uses an inter prediction mode;
残差跳过解码单元,用于将使用帧间预测模式的图像块对应的重建背景帧作为参考帧,并按照跳过所述使用帧间预测模式的图像块对应的运动矢量残差信息的方式,对所述图像码流进行解码。A residual skip decoding unit, configured to use the reconstructed background frame corresponding to the image block using the inter-frame prediction mode as a reference frame, and skip the motion vector residual information corresponding to the image block using the inter-frame prediction mode. , and decode the image code stream.
可选地,所述背景帧码流为根据所述目标视频中的视频图像经过背景建模获取的背景帧进行编码所得。Optionally, the background frame code stream is obtained by encoding a background frame obtained by background modeling according to a video image in the target video.
可选地,所述对所述视频图像进行背景建模的方式包括单帧生成方式、中值滤波方式与均值滤波方式。Optionally, the method for performing background modeling on the video image includes a single-frame generation method, a median filtering method, and an average filtering method.
此外,为实现上述目的,本申请还提供一种视频解码设备,所述视频解码设备包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机可读指令,所述计算机可读指令被所述处理器执行时实现如上述的视频解码方法的步骤。In addition, in order to achieve the above object, the present application also provides a video decoding device, the video decoding device includes: a memory, a processor, and a computer-readable instruction stored on the memory and executable on the processor, The computer-readable instructions, when executed by the processor, implement the steps of the video decoding method as described above.
In addition, to achieve the above object, the present application further provides a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a processor, the steps of the video decoding method described above are implemented.
The present application provides a video decoding method, system, and device, and a computer-readable storage medium. In the embodiments of the present application, a plurality of background frame code streams corresponding to different viewpoints and sent by the encoding end are received and decoded into a plurality of reconstructed background frames; when a playback viewpoint selection instruction is received, image code streams of the target videos at different viewpoints are received in the viewpoint selection order determined by the instruction and are decoded based on the reconstructed background frames to obtain the multiple target videos. Because the background frame code streams of the different viewpoints are decoded into reconstructed background frames before any image code stream is decoded, part of the work required for image decoding is finished in advance, which lightens the processing load during image code stream decoding and improves overall decoding efficiency. Because each image code stream is decoded with the reconstructed background frame as its only reference, and never depends on other decoded pictures, the target videos of different viewpoints can be switched freely in real time and played smoothly, while compression efficiency remains higher than in schemes with no inter-frame dependency at all. This solves the technical problem that existing coding schemes for multi-channel or panoramic video cannot balance compression efficiency against smooth viewpoint switching.
Description of Drawings
FIG. 1 is a schematic structural diagram of the video decoding device in the hardware operating environment involved in the embodiments of the present application;
FIG. 2 is a schematic flowchart of the first embodiment of the video decoding method of the present application;
FIG. 3 is a system structure diagram of a specific embodiment of the video decoding method of the present application;
FIG. 4 is a schematic diagram of panoramic image division in another specific embodiment of the video decoding method of the present application;
FIG. 5 is a schematic diagram of encoding/decoding and code stream storage in another specific embodiment of the video decoding method of the present application.
Embodiments of the Present Invention
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.
如图1所示,图1是本申请实施例方案涉及的硬件运行环境的视频解码设备结构示意图。As shown in FIG. 1 , FIG. 1 is a schematic structural diagram of a video decoding device of a hardware operating environment involved in an embodiment of the present application.
本申请实施例视频解码设备可以是服务器、PC,也可以是智能手机、平板电脑等的终端设备。The video decoding device in this embodiment of the present application may be a server, a PC, or a terminal device such as a smart phone or a tablet computer.
As shown in FIG. 1, the video decoding device may include a processor 1001 such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (for example, a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a magnetic disk memory; optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art can understand that the terminal structure shown in FIG. 1 does not constitute a limitation on the video decoding device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
如图1所示,作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及计算机可读指令。As shown in FIG. 1 , the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
In the terminal shown in FIG. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 may be used to invoke the computer-readable instructions stored in the memory 1005 and execute the video decoding method provided by the embodiments of the present invention.
基于上述硬件结构,提出本申请视频解码方法的各个实施例。Based on the above hardware structure, various embodiments of the video decoding method of the present application are proposed.
Free-viewpoint applications allow viewers to watch video from a continuous range of viewpoints: the viewer can set the position and angle of the viewpoint instead of being limited to a single fixed camera angle. Such applications usually require multiple cameras shooting simultaneously and generating videos of multiple viewpoints at the same time. Virtual reality technology, in turn, lets users watch video with a 180-degree or 360-degree field of view. Whether it is the multi-channel video of a free-viewpoint application or the panoramic video of a virtual reality application, the amount of data is very large, which poses a great challenge to video transmission. The commonly used coding schemes for such video fall into two categories. Schemes with inter-frame dependency achieve high compression efficiency, but because decoding depends on other already decoded pictures, playback is not smooth when the viewpoint is switched. Schemes with no inter-frame dependency at all switch viewpoints smoothly, but their compression efficiency is poor precisely because no inter-frame dependency is exploited. These situations show that existing coding schemes for multi-channel or panoramic video struggle to balance compression efficiency and viewpoint-switching smoothness.
To solve the above problem, the present application provides a video decoding method in which, before any image code stream is decoded, the background frame code streams corresponding to the different viewpoints of the video to be played are decoded to obtain reconstructed background frames. Part of the work required for decoding the image code streams is thus finished in advance, which lightens the processing load of the device during image decoding and improves overall decoding efficiency. Each image code stream of each target video is then decoded with its reconstructed background frame as the only reference; since decoding depends solely on the reconstructed background frame and never on other decoded pictures, the target videos of different viewpoints can be switched freely in real time and played smoothly, while compression efficiency remains higher than in schemes with no inter-frame dependency at all. This solves the technical problem that existing coding schemes for multi-channel or panoramic video cannot balance compression efficiency against smooth viewpoint switching.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the video decoding method. The first embodiment of the present application provides a video decoding method applied to the decoding end, and the method includes the following steps:
步骤S10,接收编码端发送的不同视点对应的多个背景帧码流,并将多个所述背景帧码流进行解码得到多个重建背景帧;Step S10, receiving multiple background frame code streams corresponding to different viewpoints sent by the encoding end, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
Step S20: when a playback viewpoint selection instruction is received, receive, in the viewpoint selection order determined by the playback viewpoint selection instruction, the image code streams of the target videos at different viewpoints, and decode those image code streams based on the plurality of reconstructed background frames to obtain the multiple target videos.
In this embodiment, the decoding end can obtain, in real time, the image code streams output by the encoding end after compressing the multiple target videos, together with the background frame code stream of each target video. Using the same coding standard as the encoding end, the decoding end decodes each background frame code stream into the corresponding reconstructed background frame. It then decodes the image code streams with the reconstructed background frames as reference frames and outputs the result in real time, obtaining the video pictures that make up the target videos and playing them. Note that the background frame code streams are decoded before any image code stream is decoded and played. When decoding the image and background frame code streams, the decoding end may either start one decoder per target video, in which case the background frame code stream and image code stream of a given video are both decoded by that video's decoder, or start a single decoder that decodes the background frame code streams and image code streams of all videos.
As a specific embodiment, consider the practical application scenario of free-viewpoint live streaming shown in FIG. 3. In this embodiment, 16 cameras are deployed, each camera forming one shooting viewpoint. Sixteen background generators and sixteen encoders generate 16 real-time code streams (each containing a background frame code stream and an image code stream), which are stored on an HTTP server for clients to download. The client receives the background frame code streams of the 16 real-time streams and creates 16 decoders to decode them. During playback, the client selects in real time the video number to be played at the next moment and the frame's sequence number within that video stream, fetches the corresponding image code stream from the real-time streams, calls the decoder of that video number to decode it, and passes the decoded picture to the client's playback module for display.
The encoding process of this embodiment at the encoding end is as follows. First, each background generator collects 100 frames from its camera. Second, the background generator produces a background frame by median filtering: for every pixel position, the 100 pixel values of that position across the 100 frames are sorted by value, and the value ranked 50th is taken as the pixel value of the background frame at that position. Third, each encoder encodes the background frame of its channel as an I frame (intra-predicted picture) using the AVS2 coding standard, outputs the code stream, and obtains the reconstructed background frame. Fourth, each encoder continues to fetch pictures to be encoded from its camera. Fifth, each encoder encodes the fetched pictures and outputs the code stream, using the reconstructed background frame of its channel as the reference frame and coding each picture as an AVS2 S frame (a single-forward inter-predicted picture that references the scene picture). The fourth and fifth steps are repeated, continuously generating and outputting the code stream of the video pictures.
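For illustration only, the per-pixel median background generation described above can be sketched as follows, assuming the 100 collected frames are available as equally sized numpy arrays; the function name is illustrative and not part of the embodiment:

```python
import numpy as np

def median_background(frames):
    # frames: list of N frames (e.g. N = 100), each an array of shape (H, W, C).
    # For every pixel position, sort the N values and take the one ranked N/2
    # (the 50th of 100), exactly as described above; np.median would instead
    # average the two middle values when N is even.
    stack = np.sort(np.stack(frames, axis=0), axis=0)
    return stack[len(frames) // 2 - 1]
```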
At the decoding end of this embodiment, video stream transmission and decoding proceed as follows. First, the background frame code stream of each video is transmitted, a decoder is allocated to each video, and each decoder decodes the background frame code stream of its channel to obtain that video's reconstructed background frame; the decoding uses the I-frame decoding of the AVS2 standard. Second, the reconstructed background frame of each video is cached inside the corresponding decoder. Third, the video number K of the image code stream to be decoded and its frame number L within that video are obtained: during playback, the currently played video number K is determined from the viewpoint the user has selected, and the frame number L of the picture that should be displayed at the current moment is computed from the current time. Fourth, the L-th frame code stream of the K-th video is requested from the HTTP service and transmitted. Fifth, the L-th frame code stream of the K-th video is decoded with the reconstructed background frame of the K-th video as the reference frame, using the S-frame decoding of the AVS2 standard, and the decoded picture is provided to the playback module for display. The third to fifth steps are repeated until playback ends.
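A minimal sketch of this client-side loop is given below; the URL layout, decoder interface, and frame-rate assumption are illustrative only and not part of the described embodiment:

```python
import time
import requests  # any HTTP client would do; used here for the hypothetical server

FPS = 25  # assumed frame rate of the camera streams

def play(decoders, server_url, get_selected_view, render, start_time):
    # decoders[k] has already decoded and cached view k's reconstructed background frame.
    while True:  # loop until playback ends
        k = get_selected_view()                      # video number K from the UI
        l = int((time.time() - start_time) * FPS)    # frame number L from the clock
        data = requests.get(f"{server_url}/view/{k}/frame/{l}").content
        picture = decoders[k].decode(data)           # S-frame decode against view k's
                                                     # cached background frame
        render(picture)
```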
As another specific embodiment, consider the practical application scenario of virtual reality video coding and playback. The original panoramic video is divided into regions as shown in FIG. 4: each panoramic picture is split into nine regions numbered 0 to 8, and the pictures of each region form one video channel, giving nine channels in total. The coding process and code stream storage layout are shown in FIG. 5. During encoding, a background frame is first generated for each channel; the encoder encodes the background frames of all channels in order, producing their code streams and the reconstructed background frames. It then encodes the first frame of every channel one by one in order, then the second frame of every channel, and so on, generating the corresponding code streams. When any frame of the N-th channel is encoded, the reconstructed background frame of the N-th channel is used as the reference frame. During decoding, the decoder first decodes the background frames of all channels one by one and caches them. During playback, the channel whose picture currently needs to be displayed is determined at any time from the playback focus selected by the user; the code stream of that frame is read and decoded with the reconstructed background frame of the corresponding channel as the reference frame, yielding the decoded picture.
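For illustration, splitting each panoramic picture into the nine regions could look like the following sketch, assuming the picture is a numpy array whose height and width divide evenly by three; the names are illustrative:

```python
import numpy as np

def split_into_regions(panorama, rows=3, cols=3):
    # Returns regions 0..8 in raster order; the pictures of each region
    # form one of the nine video channels described above.
    h, w = panorama.shape[:2]
    rh, rw = h // rows, w // cols
    return [panorama[r * rh:(r + 1) * rh, c * rw:(c + 1) * rw]
            for r in range(rows) for c in range(cols)]
```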
The specific encoding process of this embodiment at the encoding end is as follows. First, the first 100 frames of each channel are acquired. Second, the background frame of each channel is generated from those 100 frames by mean filtering: for every pixel position, the average of the 100 pixel values at that position is computed and used as the pixel value of the background frame at that position. Third, the background frames of the channels are encoded in order as I frames using the H.265 coding standard, the code streams are output, and the reconstructed background frames are obtained. Fourth, the current picture to be encoded is selected in picture order and time order, as shown in FIG. 5: time order takes priority, so the first frame of every channel is encoded first, then the second frame of every channel, and so on; frames of the same time instant are encoded in picture order from channel 0 to channel 8. Fifth, the selected picture is encoded and the code stream is output, using the reconstructed background frame of the corresponding channel as the reference frame and the P-frame coding of the H.265 standard. The fourth and fifth steps are repeated, continuously generating and outputting the code stream of the video pictures.
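The interleaved encoding order (time first, then channel 0 to 8) can be sketched as below; the encoder object and its methods are placeholders, not an actual H.265 API:

```python
def encode_channels(encoders, frame_source, num_frames, num_channels=9):
    # encoders[n] was created with channel n's reconstructed background frame
    # as its only reference picture; the background frames themselves were
    # already I-frame coded in the previous step.
    bitstream = []
    for t in range(num_frames):              # time order has priority
        for n in range(num_channels):        # then picture order, channel 0..8
            picture = frame_source(n, t)     # frame t of channel n
            bitstream.append(encoders[n].encode_p_frame(picture))
    return bitstream
```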
At the decoding end of this embodiment, reading and decoding proceed as follows. First, the background frame code stream of each channel is read and decoded to obtain the reconstructed background frame of each channel, using the I-frame decoding of the H.265 standard. Second, the reconstructed background frame of each channel is cached. Third, the video number K of the image code stream to be decoded and its frame number L within that video are obtained: during playback, the current video number K is determined from the playback focus selected by the user, and the frame number L of the picture that should be displayed at the current moment is computed from the current time. Fourth, the L-th frame code stream of the K-th video is read. Fifth, the L-th frame code stream of the K-th video is decoded with the reconstructed background frame of the K-th video as the reference frame, using the P-frame decoding of the H.265 standard, yielding the decoded picture. The third to fifth steps are repeated until playback ends.
In this embodiment, the background frame code streams corresponding to the different viewpoints of the video to be played are decoded into reconstructed background frames before any image code stream is decoded, so part of the work required for image decoding is finished in advance; this lightens the processing load of the device during image code stream decoding and improves overall decoding efficiency. Each image code stream of each target video is decoded with its reconstructed background frame as the only reference; since decoding depends solely on the reconstructed background frame and never on other decoded pictures, the target videos of different viewpoints can be switched freely in real time and played smoothly, while compression efficiency remains higher than in schemes with no inter-frame dependency at all. This solves the technical problem that existing coding schemes for multi-channel or panoramic video cannot balance compression efficiency against smooth viewpoint switching.
进一步地,基于上述图2所示的第一实施例,提出本申请视频解码方法的第二实施例。在本实施例中,步骤S20包括:Further, based on the first embodiment shown in FIG. 2 above, a second embodiment of the video decoding method of the present application is proposed. In this embodiment, step S20 includes:
Step a: upon receiving a playback viewpoint selection instruction sent by the user, determine, one instruction at a time, the target playback viewpoint currently selected by the user, and obtain the image code stream generated by the encoding end for that target playback viewpoint;
Step b: select, from the plurality of reconstructed background frames, the target reconstructed background frame corresponding to the target playback viewpoint, and decode the image code stream of the target playback viewpoint based on that target reconstructed background frame, so as to obtain and play the target video of the target playback viewpoint.
In this embodiment, once the decoding end has decoded the background frame code streams of the different viewpoints and obtained the reconstructed background frame of each viewpoint, it can receive and decode the image code stream of whichever viewpoint the user selects in real time. Suppose there are eight image code streams corresponding to eight viewpoints, each with a reconstructed background frame; after decoding, each image code stream can be played as one target video. If the user first selects the first viewpoint, the decoding end directly uses the reconstructed background frame of the first viewpoint, obtained earlier from the background frame code stream, to decode and output the first image code stream and play the target video of the first viewpoint. If the user switches from the first viewpoint to the second of the eight viewpoints during playback, the decoding end receives the second image code stream and decodes it from the corresponding position onward using the reconstructed background frame of the second viewpoint, continuing playback seamlessly. Until the whole video finishes, the user may switch the playback viewpoint at any time; the decoding end then receives the image code stream of the corresponding channel and decodes it with the cached reconstructed background frame, so that playback switches between the videos of different viewpoints.
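A small sketch of this switching behaviour follows; because every frame references only its own viewpoint's cached background frame, a switch only changes which stream is fetched while the playback position carries over (all names are illustrative):

```python
class PlaybackState:
    def __init__(self, start_view=0):
        self.current_view = start_view   # K: the viewpoint being played
        self.frame_index = 0             # L: the playback position

    def switch_view(self, new_view):
        # e.g. switching from viewpoint 1 to viewpoint 2 mid-playback
        self.current_view = new_view

    def next_picture(self, fetch, decoders):
        k, l = self.current_view, self.frame_index
        data = fetch(k, l)               # image code stream of view k, frame l
        self.frame_index += 1
        return decoders[k].decode(data)  # decoded against view k's cached background
```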
It should be noted that practical applications include several different situations, such as live streaming, video on demand, and local playback. In the live streaming scenario, the image code stream is generated by the encoding end and sent directly to the decoding end; in the on-demand and local playback scenarios, the image code stream is generated by the encoding end in advance and uploaded to a server or stored locally, and the decoding end fetches it from the server or from the local file when the user wants to watch.
进一步地,在本实施例中,所述播放视点选择指令包括首次视点选择指令,步骤a包括:Further, in this embodiment, the playback viewpoint selection instruction includes a first viewpoint selection instruction, and step a includes:
步骤c,在接收到由用户发送的首次视点选择指令时,获取所述首次视点选择指令中的首次所选视点,并确定与所述首次所选视点对应的第一目标视频的第一视频编号;Step c, when receiving the first viewpoint selection instruction sent by the user, obtain the first selected viewpoint in the first viewpoint selection instruction, and determine the first video number of the first target video corresponding to the first selected viewpoint ;
步骤d,确定所述第一目标视频中位于初始位置的视频图像的第一图像编号;Step d, determine the first image number of the video image at the initial position in the first target video;
步骤e,从编码端接收与所述第一视频编号以及所述第一图像编号对应的第一图像码流。Step e: Receive a first image code stream corresponding to the first video number and the first image number from the encoding end.
步骤b包括:Step b includes:
Step f: determine, according to the first video number, the first reconstructed background frame corresponding to the first target video from the plurality of reconstructed background frames, and decode the first image code stream based on the first reconstructed background frame.
进一步地,所述播放视点选择指令包括视点切换指令,Further, the playback viewpoint selection instruction includes a viewpoint switching instruction,
步骤a还包括:Step a also includes:
步骤g,在接收到用户发送的视点切换指令时,获取所述视点切换指令中的目标切换视点,并确定与所述目标切换视点对应的第二目标视频的第二视频编号;Step g, when receiving the viewpoint switching instruction sent by the user, acquiring the target switching viewpoint in the viewpoint switching instruction, and determining the second video number of the second target video corresponding to the target switching viewpoint;
步骤h,获取所述第一目标视频的播放进度,并确定所述第二目标视频中与所述播放进度对应的第二视频图像的第二图像编号;Step h, acquiring the playback progress of the first target video, and determining the second image number of the second video image corresponding to the playback progress in the second target video;
步骤i,从编码端接收与所述第二视频编号以及所述第二图像编号对应的第二图像码流;Step i, receive the second image code stream corresponding to the second video number and the second image number from the encoding end;
Step b further includes:
Step j: determine, according to the second video number, the second reconstructed background frame corresponding to the second target video from the plurality of reconstructed background frames, and decode the second image code stream based on the second reconstructed background frame.
In this embodiment, the first viewpoint selection instruction is the instruction issued when the user selects a viewpoint for the first time, and the first selected viewpoint is the viewpoint chosen for playback at that moment. The first target video is the target video corresponding to the first selected viewpoint, and the first video number is the number of that video. The first image number is the number, among the video pictures that make up the first target video, of the picture corresponding to a playback progress of zero, and the first image code stream is the compressed code stream of that picture. The first reconstructed background frame is the reconstructed background frame corresponding to the first selected viewpoint. The viewpoint switching instruction is the instruction issued when the user wants to switch away from the currently selected viewpoint, and the target switching viewpoint is the new playback viewpoint the user has selected. The second target video is the target video corresponding to the target switching viewpoint, and the second video number is the number of that video. The second image number is the number, among the video pictures that make up the second target video, of the picture corresponding to the current playback progress, and the second image code stream is the compressed code stream of that picture. The second reconstructed background frame is the reconstructed background frame corresponding to the target switching viewpoint.
Specifically, the decoding end transmits or reads the background frame code stream of each video and decodes it to obtain each video's reconstructed background frame, using the coding standard that matches the way the encoding end coded the background frames. Second, the reconstructed background frame of each video is cached. Third, the video number K of the image code stream to be decoded and its frame number L within that video are obtained: during playback, K is selected according to the viewpoint or focus currently chosen by the user, and L, the frame number of the corresponding picture in that video, is derived from the current time. Fourth, the L-th frame code stream of the K-th video is transmitted or read. Fifth, the L-th frame code stream of the K-th video is decoded with the reconstructed background frame of the K-th video as the reference frame, using the standard that matches how the encoding end coded that picture, yielding the decoded picture. The third to fifth steps are repeated until playback ends. Note that in practice, before each frame of the image code stream is decoded, the viewpoint may need to be determined, the reconstructed background frame to be used selected, and the image number derived from the current playback progress, so that the corresponding image code stream frame can be received, decoded, and played.
Further, based on the first embodiment shown in FIG. 2, a third embodiment of the video decoding method of the present application is proposed. In this embodiment, the step of decoding the image code streams of the different viewpoints based on the plurality of reconstructed background frames includes:
Step k: when decoding each image code stream, determine that an image block corresponding to the image code stream uses the inter-frame prediction mode, and set the motion vector of that image block to 0, where the image code stream is composed of the code streams of a plurality of the image blocks;
Step l: take the reconstructed background frame corresponding to the image block that uses the inter-frame prediction mode as the reference frame, and decode the image code stream while skipping the motion vector residual information of that image block.
In this embodiment it should be noted that, at the encoding end, encoding splits each picture into image blocks that serve as coding units, and each image block may use either inter-frame or intra-frame prediction. When the inter-frame prediction mode is used, the motion vectors in the x and y directions both default to 0, so the encoding end need not encode motion vector residual information, which saves coding overhead. The encoding end checks, block by block, whether each image block of a video picture uses the inter-frame prediction mode, and when it does, sets the motion vector of that block to 0 and skips encoding the corresponding motion vector residual information. At the decoding end, the image code stream is composed of the code streams of these image blocks. If the encoding end used the default zero motion vector in this way, then when the decoding end detects that an image block was coded in the inter-frame prediction mode, it takes the block's motion vector to be 0 by default and skips decoding the motion vector residual information.
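A schematic sketch of this block-level rule follows; the syntax parsing, intra prediction, and residual reconstruction helpers are placeholders rather than parts of any real codec API:

```python
def reconstruct_block(block, background_frame, intra_predict, decode_residual):
    # block: parsed syntax for one coding unit, with position (x, y), size (w, h),
    # and a flag telling whether it was coded in inter-frame prediction mode.
    if block.is_inter:
        # No motion vector residual is present in the bitstream: the motion
        # vector is implicitly (0, 0) and the reference picture is always the
        # reconstructed background frame of this channel.
        prediction = background_frame[block.y:block.y + block.h,
                                      block.x:block.x + block.w]
    else:
        prediction = intra_predict(block)
    return prediction + decode_residual(block)  # add the coded prediction residual
```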
进一步地,在本实施例中,所述背景帧码流为根据所述目标视频中的视频图像经过背景建模获取的背景帧进行编码所得。Further, in this embodiment, the background frame code stream is obtained by encoding a background frame obtained by background modeling according to a video image in the target video.
进一步地,在本实施例中,所述对所述视频图像进行背景建模的方式包括单帧生成方式、中值滤波方式与均值滤波方式。Further, in this embodiment, the method for performing background modeling on the video image includes a single-frame generation method, a median filtering method, and an average filtering method.
In this embodiment, the single-frame generation method directly selects one frame of the video, for example the first frame, the last frame, or a middle frame, as the background frame. The median filtering method selects H frames of the video and, for every pixel position, sorts the H pixel values and takes the value at position H/2 (or H/2-1, or H/2+1) as the filtered output for that pixel, thereby obtaining the background frame. The mean filtering method selects H frames of the video and, for every pixel position, averages the H pixel values and uses the average as the filtered output for that pixel, thereby obtaining the background frame. The encoding end first acquires the target videos captured from the different viewpoints and generates one background frame per target video using the preset background modeling method, whether single-frame generation, median filtering, or mean filtering. After traversing all the target videos and obtaining each one's background frame, the encoding end encodes the background frames to produce the background frame code streams and, in the process, the reconstructed background frames.
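The three modeling options can be gathered into one helper, sketched below under the assumption that the H selected frames are numpy arrays of identical shape; the function and parameter names are illustrative:

```python
import numpy as np

def build_background(frames, method="median"):
    # frames: the H frames selected from one video channel.
    stack = np.stack(frames, axis=0)
    if method == "single":   # single-frame generation: pick one frame directly
        return stack[0]
    if method == "median":   # per-pixel median over the H frames
        return np.median(stack, axis=0).astype(stack.dtype)
    if method == "mean":     # per-pixel mean over the H frames
        return stack.mean(axis=0).astype(stack.dtype)
    raise ValueError(f"unknown background modeling method: {method}")
```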
本申请还提供一种视频解码系统,所述视频解码系统实现以下步骤:The application also provides a video decoding system, which implements the following steps:
背景码流解码模块,用于接收编码端发送的不同视点对应的多个背景帧码流,并将多个所述背景帧码流进行解码得到多个重建背景帧;The background code stream decoding module is used for receiving multiple background frame code streams corresponding to different viewpoints sent by the encoder, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
an image code stream decoding module, configured to, when a playback viewpoint selection instruction is received, receive, in the viewpoint selection order determined by the playback viewpoint selection instruction, the image code streams of the target videos at different viewpoints, and decode those image code streams based on the plurality of reconstructed background frames to obtain the multiple target videos.
本申请还提供一种视频解码设备。The present application also provides a video decoding device.
The video decoding device includes a processor, a memory, and computer-readable instructions stored in the memory and executable on the processor; when the computer-readable instructions are executed by the processor, the steps of the video decoding method described above are implemented.
其中,所述计算机可读指令被执行时所实现的方法可参照本申请视频解码方法的各个实施例,此处不再赘述。For the method implemented when the computer-readable instruction is executed, reference may be made to the various embodiments of the video decoding method of the present application, which will not be repeated here.
本申请还提供一种计算机可读存储介质。The present application also provides a computer-readable storage medium.
本申请计算机可读存储介质上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如上所述的视频解码方法的步骤。Computer-readable instructions are stored on the computer-readable storage medium of the present application, and when the computer-readable instructions are executed by a processor, implement the steps of the video decoding method described above.
其中,所述计算机可读指令被执行时所实现的方法可参照本申请视频解码方法各个实施例,此处不再赘述。For the method implemented when the computer-readable instruction is executed, reference may be made to the various embodiments of the video decoding method of the present application, which will not be repeated here.
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or system comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, method, article or system. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or system that includes the element.
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。The above are only the preferred embodiments of the present application, and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present application, or directly or indirectly applied in other related technical fields , are similarly included within the scope of patent protection of this application.

Claims (20)

  1. 一种视频解码方法,其中,所述视频解码方法包括:A video decoding method, wherein the video decoding method comprises:
    接收编码端发送的不同视点对应的多个背景帧码流,并将多个所述背景帧码流进行解码得到多个重建背景帧;receiving multiple background frame code streams corresponding to different viewpoints sent by the encoding end, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
    when a playback viewpoint selection instruction is received, receiving, in the viewpoint selection order determined by the playback viewpoint selection instruction, image code streams of the target videos at different viewpoints, and decoding the image code streams of the different viewpoints based on the plurality of reconstructed background frames to obtain multiple target videos.
  2. The video decoding method according to claim 1, wherein the step of, when a playback viewpoint selection instruction is received, receiving, in the viewpoint selection order determined by the playback viewpoint selection instruction, image code streams of the target videos at different viewpoints, and decoding the image code streams of the different viewpoints based on the plurality of reconstructed background frames to obtain multiple target videos comprises:
    在接收到由用户发送的播放视点选择指令时,基于所述播放视点选择指令逐一确定用户当前所选的目标播放视点,并获取由编码端生成的与所述目标播放视点对应的一路图像码流;When receiving the playback viewpoint selection instruction sent by the user, determine the target playback viewpoints currently selected by the user one by one based on the playback viewpoint selection instruction, and acquire an image code stream corresponding to the target playback viewpoint generated by the encoder ;
    从多个所述重建背景帧中选出与所述目标播放视点对应的目标重建背景帧,并基于所述目标重建背景帧对所述与所述目标播放视点对应的一路图像码流进行解码,以得到并播放与所述目标播放视点对应的一路目标视频。A target reconstructed background frame corresponding to the target playback viewpoint is selected from a plurality of the reconstructed background frames, and one image code stream corresponding to the target playback viewpoint is decoded based on the target reconstructed background frame, to obtain and play a target video corresponding to the target playback viewpoint.
  3. 如权利要求2所述的视频解码方法,其中,所述播放视点选择指令包括首次视点选择指令,The video decoding method according to claim 2, wherein the playback viewpoint selection instruction comprises a first viewpoint selection instruction,
    wherein the step of, upon receiving a playback viewpoint selection instruction sent by the user, determining, one instruction at a time, the target playback viewpoint currently selected by the user, and obtaining the image code stream generated by the encoding end for the target playback viewpoint comprises:
    在接收到由用户发送的首次视点选择指令时,获取所述首次视点选择指令中的首次所选视点,并确定与所述首次所选视点对应的第一目标视频的第一视频编号;When receiving the first viewpoint selection instruction sent by the user, acquiring the first selected viewpoint in the first viewpoint selection instruction, and determining the first video number of the first target video corresponding to the first selected viewpoint;
    确定所述第一目标视频中位于初始位置的视频图像的第一图像编号;determining the first image number of the video image at the initial position in the first target video;
    从编码端接收与所述第一视频编号以及所述第一图像编号对应的第一图像码流;Receive a first image code stream corresponding to the first video number and the first image number from the encoding end;
    wherein the step of selecting, from the plurality of reconstructed background frames, the target reconstructed background frame corresponding to the target playback viewpoint, and decoding the image code stream of the target playback viewpoint based on the target reconstructed background frame comprises:
    determining, according to the first video number, a first reconstructed background frame corresponding to the first target video from the plurality of reconstructed background frames, and decoding the first image code stream based on the first reconstructed background frame.
  4. 如权利要求3所述的视频解码方法,其中,所述播放视点选择指令包括视点切换指令,The video decoding method according to claim 3, wherein the playback viewpoint selection instruction comprises a viewpoint switching instruction,
    wherein the step of, upon receiving a playback viewpoint selection instruction sent by the user, determining, one instruction at a time, the target playback viewpoint currently selected by the user, and obtaining the image code stream generated by the encoding end for the target playback viewpoint comprises:
    在接收到用户发送的视点切换指令时,获取所述视点切换指令中的目标切换视点,并确定与所述目标切换视点对应的第二目标视频的第二视频编号;When receiving the viewpoint switching instruction sent by the user, acquiring the target switching viewpoint in the viewpoint switching instruction, and determining the second video number of the second target video corresponding to the target switching viewpoint;
    获取所述第一目标视频的播放进度,并确定所述第二目标视频中与所述播放进度对应的第二视频图像的第二图像编号;Acquire the playback progress of the first target video, and determine the second image number of the second video image corresponding to the playback progress in the second target video;
    从编码端接收与所述第二视频编号以及所述第二图像编号对应的第二图像码流;Receive a second image code stream corresponding to the second video number and the second image number from the encoding end;
    wherein the step of selecting, from the plurality of reconstructed background frames, the target reconstructed background frame corresponding to the target playback viewpoint, and decoding the image code stream of the target playback viewpoint based on the target reconstructed background frame comprises:
    determining, according to the second video number, a second reconstructed background frame corresponding to the second target video from the plurality of reconstructed background frames, and decoding the second image code stream based on the second reconstructed background frame.
  5. 如权利要求1所述的视频解码方法,其中,所述基于多个所述重建背景帧将所述不同视点的多路图像码流进行解码的步骤包括:The video decoding method according to claim 1, wherein the step of decoding the multi-channel image code streams of the different viewpoints based on a plurality of the reconstructed background frames comprises:
    在对每一所述图像码流进行解码时,确定与所述图像码流对应的图像块使用帧间预测模式,将所述图像块中的运动矢量设置为0,其中,所述图像码流是由多个所述图像块的码流组成;When decoding each of the image code streams, it is determined that the image block corresponding to the image code stream uses the inter prediction mode, and the motion vector in the image block is set to 0, wherein the image code stream is composed of a plurality of code streams of the image blocks;
    将使用帧间预测模式的图像块对应的重建背景帧作为参考帧,并按照跳过所述使用帧间预测模式的图像块对应的运动矢量残差信息的方式,对所述图像码流进行解码。Taking the reconstructed background frame corresponding to the image block using the inter-frame prediction mode as a reference frame, and decoding the image code stream in a manner of skipping the motion vector residual information corresponding to the image block using the inter-frame prediction mode .
  6. 如权利要求1所述的视频解码方法,其中,所述背景帧码流为根据所述目标视频中的视频图像经过背景建模获取的背景帧进行编码所得。The video decoding method according to claim 1, wherein the background frame code stream is obtained by encoding a background frame obtained by background modeling according to a video image in the target video.
  7. 如权利要求6所述的视频解码方法,其中,所述对所述视频图像进行背景建模的方式包括单帧生成方式、中值滤波方式与均值滤波方式。The video decoding method according to claim 6, wherein the method for performing background modeling on the video image includes a single-frame generation method, a median filtering method, and an average filtering method.
  8. 一种视频解码系统,其中,所述视频解码系统包括:A video decoding system, wherein the video decoding system comprises:
    背景码流解码模块,用于接收编码端发送的不同视点对应的多个背景帧码流,并将多个所述背景帧码流进行解码得到多个重建背景帧;The background code stream decoding module is used for receiving multiple background frame code streams corresponding to different viewpoints sent by the encoder, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames;
    an image code stream decoding module, configured to, when a playback viewpoint selection instruction is received, receive, in the viewpoint selection order determined by the playback viewpoint selection instruction, image code streams of the target videos at different viewpoints, and decode the image code streams of the different viewpoints based on the plurality of reconstructed background frames to obtain multiple target videos.
  9. 如权利要求8所述的视频解码系统,其中,所述图像码流解码模块包括:The video decoding system according to claim 8, wherein the image code stream decoding module comprises:
    a viewpoint-by-viewpoint determination unit, configured to, upon receiving a playback viewpoint selection instruction sent by the user, determine, one instruction at a time, the target playback viewpoint currently selected by the user, and obtain the image code stream generated by the encoding end for the target playback viewpoint;
    码流逐一解码单元,用于从多个所述重建背景帧中选出与所述目标播放视点对应的目标重建背景帧,并基于所述目标重建背景帧对所述与所述目标播放视点对应的一路图像码流进行解码,以得到并播放与所述目标播放视点对应的一路目标视频。A code stream decoding unit one by one, configured to select a target reconstructed background frame corresponding to the target playback viewpoint from a plurality of the reconstructed background frames, and based on the target reconstructed background frame, pair the target reconstruction background frame corresponding to the target playback viewpoint One channel of the image stream is decoded to obtain and play a channel of target video corresponding to the target playback viewpoint.
  10. 如权利要求8所述的视频解码系统,其中,所述播放视点选择指令包括首次视点选择指令,The video decoding system of claim 8, wherein the playback viewpoint selection instruction comprises a first viewpoint selection instruction,
    所述视点逐一确定单元还用于,在接收到由用户发送的首次视点选择指令时,获取所述首次视点选择指令中的首次所选视点,并确定与所述首次所选视点对应的第一目标视频的第一视频编号;The viewpoint-by-view determination unit is further configured to, when receiving a first-time viewpoint selection instruction sent by the user, acquire the first-time viewpoint selection instruction in the first-time viewpoint selection instruction, and determine the first viewpoint corresponding to the first-time selection viewpoint. The first video number of the target video;
    确定所述第一目标视频中位于初始位置的视频图像的第一图像编号;determining the first image number of the video image at the initial position in the first target video;
    从编码端接收与所述第一视频编号以及所述第一图像编号对应的第一图像码流;Receive a first image code stream corresponding to the first video number and the first image number from the encoding end;
    所述码流逐一解码单元还用于,根据所述第一视频编码,从多个所述重建背景帧中确定与所述第一目标视频对应的第一重建背景帧,并基于所述第一重建背景帧,对所述第一图像码流进行解码。The code stream one by one decoding unit is further configured to, according to the first video coding, determine a first reconstructed background frame corresponding to the first target video from a plurality of the reconstructed background frames, and based on the first reconstructed background frame The background frame is reconstructed, and the first image code stream is decoded.
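As a rough illustration of the first-selection path in claim 10 (helper names such as viewpoint_to_video_number and request_image_stream are assumptions, not the patent's interface): the first selected viewpoint maps to a video number, the initial playback position fixes the image number, and the pair identifies both the first image code stream to request and the reconstructed background frame to decode it against.

```python
def handle_first_viewpoint_selection(selected_viewpoint,
                                     viewpoint_to_video_number,
                                     request_image_stream,
                                     reconstructed_backgrounds):
    # First video number: the video captured from the first selected viewpoint.
    video_no = viewpoint_to_video_number[selected_viewpoint]
    # First image number: playback starts at the initial position of that video.
    image_no = 0
    # Fetch the first image code stream identified by (video number, image number).
    stream = request_image_stream(video_no, image_no)
    # Select the reconstructed background frame of the same video for decoding.
    background = reconstructed_backgrounds[video_no]
    return stream, background
```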
  11. The video decoding system according to claim 9, wherein the playback viewpoint selection instruction comprises a viewpoint switching instruction,
    the viewpoint-by-viewpoint determination unit is further configured to: upon receiving a viewpoint switching instruction sent by the user, acquire the target switching viewpoint in the viewpoint switching instruction, and determine a second video number of a second target video corresponding to the target switching viewpoint;
    acquire the playback progress of the first target video, and determine a second image number of a second video image corresponding to the playback progress in the second target video; and
    receive, from the encoding end, a second image code stream corresponding to the second video number and the second image number; and
    the viewpoint-by-viewpoint determination unit is further configured to determine, according to the second video number, a second reconstructed background frame corresponding to the second target video from the multiple reconstructed background frames, and decode the second image code stream based on the second reconstructed background frame.
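A hedged sketch of the switching path in claim 11: when the user switches viewpoints, the current playback progress of the first target video is translated into an image number in the second target video, so decoding of the new viewpoint resumes at the same moment rather than restarting. The helper names (frames_played expressing progress as a frame count, request_image_stream) are illustrative assumptions.

```python
def handle_viewpoint_switch(target_viewpoint,
                            frames_played,
                            viewpoint_to_video_number,
                            request_image_stream,
                            reconstructed_backgrounds):
    # Second video number: the video captured from the viewpoint switched to.
    second_video_no = viewpoint_to_video_number[target_viewpoint]
    # The playback progress of the first target video (here, the number of
    # frames already played) gives the second image number, so playback of the
    # new viewpoint continues from the same instant.
    second_image_no = frames_played
    stream = request_image_stream(second_video_no, second_image_no)
    background = reconstructed_backgrounds[second_video_no]
    return stream, background
```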
  12. The video decoding system according to claim 8, wherein the image code stream decoding module further comprises:
    a motion vector setting unit, configured to, when decoding each image code stream, determine that the image blocks corresponding to the image code stream use an inter-frame prediction mode and set the motion vectors of the image blocks to 0, wherein the image code stream is composed of the code streams of a plurality of the image blocks; and
    a residual-skipping decoding unit, configured to use the reconstructed background frames corresponding to the image blocks that use the inter-frame prediction mode as reference frames, and decode the image code stream while skipping the motion vector residual information corresponding to the image blocks that use the inter-frame prediction mode.
  13. The video decoding system according to claim 8, wherein the background frame code stream is obtained by encoding a background frame acquired by performing background modeling on the video images in the target video.
  14. The video decoding system according to claim 13, wherein the manner of performing background modeling on the video images includes a single-frame generation manner, a median filtering manner, and a mean filtering manner.
  15. A video decoding device, wherein the video decoding device comprises a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, and the computer-readable instructions, when executed by the processor, implement the following steps:
    receiving multiple background frame code streams corresponding to different viewpoints and sent by an encoding end, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames; and
    upon receiving a playback viewpoint selection instruction, receiving multiple image code streams of a target video corresponding to different viewpoints in the viewpoint selection order determined by the playback viewpoint selection instruction, and decoding the multiple image code streams of the different viewpoints based on the multiple reconstructed background frames, so as to obtain multiple target videos.
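Putting the two steps of claim 15 together, a minimal end-to-end sketch (all helper names are assumptions; decode_image_stream is the block-level routine sketched after claim 5): the device first decodes every per-viewpoint background frame code stream into reconstructed background frames, then, as the user selects viewpoints, decodes each incoming image code stream against the background frame of the matching video number.

```python
def run_decoder(background_streams, decode_background, playback_selections,
                request_image_stream, decode_image_stream, display):
    """background_streams: {video_no: background frame code stream}, one per viewpoint.
    playback_selections: (video_no, image_no) pairs in the user's selection order.
    All callables are assumed helpers supplied by the surrounding player."""
    # Step 1: decode every background frame code stream once, keeping one
    # reconstructed background frame per viewpoint/video number.
    reconstructed = {v: decode_background(s) for v, s in background_streams.items()}

    # Step 2: follow the viewpoint selection order; each selection identifies the
    # next image code stream, which is decoded against the matching background.
    for video_no, image_no in playback_selections:
        stream = request_image_stream(video_no, image_no)
        frame = decode_image_stream(stream, reconstructed[video_no])
        display(frame)
```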
  16. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps:
    receiving multiple background frame code streams corresponding to different viewpoints and sent by an encoding end, and decoding the multiple background frame code streams to obtain multiple reconstructed background frames; and
    upon receiving a playback viewpoint selection instruction, receiving multiple image code streams of a target video corresponding to different viewpoints in the viewpoint selection order determined by the playback viewpoint selection instruction, and decoding the multiple image code streams of the different viewpoints based on the multiple reconstructed background frames, so as to obtain multiple target videos.
  17. The computer-readable storage medium according to claim 16, wherein the step of, upon receiving a playback viewpoint selection instruction, receiving multiple image code streams of the target video corresponding to different viewpoints in the viewpoint selection order determined by the playback viewpoint selection instruction, and decoding the multiple image code streams of the different viewpoints based on the multiple reconstructed background frames to obtain multiple target videos comprises:
    upon receiving a playback viewpoint selection instruction sent by a user, determining, one by one based on the playback viewpoint selection instruction, the target playback viewpoint currently selected by the user, and acquiring one image code stream generated by the encoding end and corresponding to the target playback viewpoint; and
    selecting, from the multiple reconstructed background frames, a target reconstructed background frame corresponding to the target playback viewpoint, and decoding, based on the target reconstructed background frame, the image code stream corresponding to the target playback viewpoint, so as to obtain and play one target video corresponding to the target playback viewpoint.
  18. The computer-readable storage medium according to claim 17, wherein the playback viewpoint selection instruction comprises a first viewpoint selection instruction,
    the step of, upon receiving a playback viewpoint selection instruction sent by a user, determining, one by one based on the playback viewpoint selection instruction, the target playback viewpoint currently selected by the user, and acquiring one image code stream generated by the encoding end and corresponding to the target playback viewpoint comprises:
    upon receiving a first viewpoint selection instruction sent by the user, acquiring the first selected viewpoint in the first viewpoint selection instruction, and determining a first video number of a first target video corresponding to the first selected viewpoint;
    determining a first image number of a video image located at an initial position in the first target video; and
    receiving, from the encoding end, a first image code stream corresponding to the first video number and the first image number; and
    the step of selecting, from the multiple reconstructed background frames, a target reconstructed background frame corresponding to the target playback viewpoint, and decoding, based on the target reconstructed background frame, the image code stream corresponding to the target playback viewpoint comprises:
    determining, according to the first video number, a first reconstructed background frame corresponding to the first target video from the multiple reconstructed background frames, and decoding the first image code stream based on the first reconstructed background frame.
  19. The computer-readable storage medium according to claim 18, wherein the playback viewpoint selection instruction comprises a viewpoint switching instruction,
    the step of, upon receiving a playback viewpoint selection instruction sent by a user, determining, one by one based on the playback viewpoint selection instruction, the target playback viewpoint currently selected by the user, and acquiring one image code stream generated by the encoding end and corresponding to the target playback viewpoint comprises:
    upon receiving a viewpoint switching instruction sent by the user, acquiring the target switching viewpoint in the viewpoint switching instruction, and determining a second video number of a second target video corresponding to the target switching viewpoint;
    acquiring the playback progress of the first target video, and determining a second image number of a second video image corresponding to the playback progress in the second target video; and
    receiving, from the encoding end, a second image code stream corresponding to the second video number and the second image number; and
    the step of selecting, from the multiple reconstructed background frames, a target reconstructed background frame corresponding to the target playback viewpoint, and decoding, based on the target reconstructed background frame, the image code stream corresponding to the target playback viewpoint comprises:
    determining, according to the second video number, a second reconstructed background frame corresponding to the second target video from the multiple reconstructed background frames, and decoding the second image code stream based on the second reconstructed background frame.
  20. The computer-readable storage medium according to claim 16, wherein the step of decoding the multiple image code streams of the different viewpoints based on the multiple reconstructed background frames comprises:
    when decoding each image code stream, determining that the image blocks corresponding to the image code stream use an inter-frame prediction mode and setting the motion vectors of the image blocks to 0, wherein the image code stream is composed of the code streams of a plurality of the image blocks; and
    using the reconstructed background frames corresponding to the image blocks that use the inter-frame prediction mode as reference frames, and decoding the image code stream while skipping the motion vector residual information corresponding to the image blocks that use the inter-frame prediction mode.
PCT/CN2020/111448 2020-07-29 2020-08-26 Video decoding method, system and device and computer-readable storage medium WO2022021519A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010748734.3A CN111800653B (en) 2020-07-29 2020-07-29 Video decoding method, system, device and computer readable storage medium
CN202010748734.3 2020-07-29

Publications (1)

Publication Number Publication Date
WO2022021519A1 true WO2022021519A1 (en) 2022-02-03

Family

ID=72827443

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111448 WO2022021519A1 (en) 2020-07-29 2020-08-26 Video decoding method, system and device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111800653B (en)
WO (1) WO2022021519A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449348A (en) * 2020-11-04 2022-05-06 北京金山云网络技术有限公司 Panoramic video processing method and device
CN113114985B (en) * 2021-03-31 2022-07-26 联想(北京)有限公司 Information processing method and information processing device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102006475B (en) * 2010-11-18 2012-12-19 无锡中星微电子有限公司 Video coding and decoding device and method
CN102665077A (en) * 2012-05-03 2012-09-12 北京大学 Rapid and efficient encoding-transcoding method based on macro block classification
US9693055B2 (en) * 2012-12-28 2017-06-27 Electronics And Telecommunications Research Institute Video encoding and decoding method and apparatus using the same
CN103152570B (en) * 2013-03-01 2016-02-24 北京大学 A kind of video bit stream coding/decoding method and device
CN104980763B (en) * 2014-04-05 2020-01-17 浙江大学 Video code stream, video coding and decoding method and device
JP6946724B2 (en) * 2017-05-09 2021-10-06 ソニーグループ株式会社 Client device, client device processing method, server and server processing method
CN109495749A (en) * 2018-12-24 2019-03-19 上海国茂数字技术有限公司 A kind of coding and decoding video, search method and device
CN111372145B (en) * 2020-04-15 2021-07-27 烽火通信科技股份有限公司 Viewpoint switching method and system for multi-viewpoint video

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1624675B1 (en) * 2004-08-03 2020-04-22 Microsoft Technology Licensing, LLC A system and process for compressing and decompressing multiple, layered, video streams employing spatial and temporal encoding
CN101742319A (en) * 2010-01-15 2010-06-16 北京大学 Background modeling-based static camera video compression method and background modeling-based static camera video compression system
CN102006473A (en) * 2010-11-18 2011-04-06 无锡中星微电子有限公司 Video encoder and encoding method, and video decoder and decoding method
CN102413332A (en) * 2011-12-01 2012-04-11 武汉大学 Multi-viewpoint video coding method based on time-domain-enhanced viewpoint synthesis prediction
CN104703027A (en) * 2015-03-17 2015-06-10 华为技术有限公司 Decoding method and decoding device for video frame
JP2017092886A (en) * 2015-11-17 2017-05-25 日本電信電話株式会社 Video encoding method, video encoder and video encoding program
CN107396138A (en) * 2016-05-17 2017-11-24 华为技术有限公司 A kind of video coding-decoding method and equipment
CN111447503A (en) * 2020-04-26 2020-07-24 烽火通信科技股份有限公司 Viewpoint switching method, server and system for multi-viewpoint video

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150639A (en) * 2022-09-01 2022-10-04 北京蔚领时代科技有限公司 Weak network resisting method and device based on distributed encoder
CN115150639B (en) * 2022-09-01 2022-12-20 北京蔚领时代科技有限公司 Weak network resisting method and device based on distributed encoder

Also Published As

Publication number Publication date
CN111800653B (en) 2021-06-11
CN111800653A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
EP2134092B1 (en) Information processing apparatus and method, and program
US6989868B2 (en) Method of converting format of encoded video data and apparatus therefor
KR101377021B1 (en) Encoding device and method, decoding device and method, and transmission system
WO2022021519A1 (en) Video decoding method, system and device and computer-readable storage medium
US20070103558A1 (en) Multi-view video delivery
US20130022116A1 (en) Camera tap transcoder architecture with feed forward encode data
CN101690163A (en) Shutter time compensation
CN101002471A (en) Method and apparatus to encode image, and method and apparatus to decode image data
KR19990064087A (en) Video speech decoding apparatus, video speech coding apparatus and information transmission system
JP2010035133A (en) Moving image encoding apparatus and moving image encoding method
JP2020524450A (en) Transmission system for multi-channel video, control method thereof, multi-channel video reproduction method and device thereof
JP6541932B2 (en) Video system and method for displaying image data, computer program and encoding apparatus
JPH10262228A (en) Communication system, multi-point controller and video information display method
JP4767916B2 (en) Video encoded data converter
CN117579843B (en) Video coding processing method and electronic equipment
CN114513658B (en) Video loading method, device, equipment and medium
CN117291810B (en) Video frame processing method, device, equipment and storage medium
RU2628198C1 (en) Method for interchannel prediction and interchannel reconstruction for multichannel video made by devices with different vision angles
CN104702970A (en) Video data synchronization method, device and system
JP3205839B2 (en) Image coding device
JPH0294886A (en) Method and device for coding and decoding video signal
KR101609798B1 (en) moving picture replay device
JP2005159832A (en) Signal processor
JP2001045495A (en) Image compositing device
JP2000115779A (en) Image processing unit and moving image coding method applied to the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20946746

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20946746

Country of ref document: EP

Kind code of ref document: A1