WO2022100742A1

WO2022100742A1 - Video encoding and video playback method, apparatus and system

Info

Publication number: WO2022100742A1
Application number: PCT/CN2021/130745
Authority: WO
Inventors: 郑洛
Original assignee: 华为云计算技术有限公司
Priority date: 2020-11-16
Filing date: 2021-11-15
Publication date: 2022-05-19
Also published as: CN114513669A

Abstract

The present application relates to the technical field of video processing, and disclosed therein are a video encoding and video playback method, apparatus and system, which are applied to a surround playback scenario. The method comprises: acquiring a plurality of raw video streams obtained on the basis of video streams collected by a plurality of cameras in the same time period for the same spatial area, and generating at least one target video stream according to the plurality of raw video streams, the target video stream being a video stream obtained by selecting a specific number of image frames from among the raw video streams corresponding to each camera according to a set direction; and compressing the at least one target video stream. On the basis of the foregoing, in a video playback stage, the present application is beneficial for reducing the transmission code rate of a video stream.

Description

Video coding and video playback method, device and system

This application claims the priority of the Chinese patent application with the application number 202011282060.9 and the application title "Video coding and video playback method, device and system", which was submitted to the State Intellectual Property Office on November 16, 2020, the entire contents of which are incorporated by reference in this application.

technical field

The present application relates to the technical field of video processing, and in particular, to a video coding and video playback method, device and system.

Background technique

In the 5G scenario, audiences pursue a better video viewing experience and hope to see more details in the video, which leads to the demand for 360° viewing around the target object. Especially in sports competitions, concerts and other scenes with specific focus, audiences want to watch dynamic surround playback images at different times. Surround playback refers to playback of images collected at different angles for the same spatial area in a certain surround direction. For the terminal, displaying the picture of the surround playback to the user is to show the user the images collected by the cameras of each camera position according to the surround direction.

In the prior art, the video streams collected by cameras at different positions are independently compressed, so images collected by cameras at other positions cannot be referred to during decompression. In the surround playback scenario, since the continuous multi-frame images played by the terminal are from cameras at different positions, to decompress a certain frame of image, it is necessary to rely on other images of the image in the corresponding original video stream, which leads to the video stream The transmission bit rate is higher.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a video encoding and video playback method, device, and system, which help to reduce the transmission bit rate of a video stream, and can be applied to a surround playback scenario.

In order to achieve the above purpose, the application provides the following technical solutions:

In a first aspect, the present application provides a video encoding method. The method includes: first, acquiring multiple original video streams obtained based on video streams collected by multiple cameras for the same spatial area in the same time period. Second, at least one target video stream is generated according to the plurality of original video streams. The target video stream is a video stream obtained by selecting a certain number of frame images from the original video stream corresponding to each camera according to the set direction. Next, the at least one target video stream is compressed. In the technical solution, in the encoding stage, a certain number of frame images are selected from the original video stream corresponding to each camera according to the set direction to generate the target video stream, and then the target video stream is compressed. Based on this, in the surround playback stage, the video clips selected from the compressed target video stream can be directly transmitted to the terminal, so that the terminal can decompress the video clips to be played without relying on other images of each original video stream. Decoding helps to reduce the transmission bit rate of the video stream.

In a possible design, the time stamps corresponding to the images in the target video stream are consecutive. In this way, it is helpful for the terminal to realize real-time surround playback within the consecutive multiple time stamps, thereby increasing the smoothness of the video picture.

In a possible design, the multiple original video streams include a first original video stream, and the target video stream takes a first frame image in the first original video stream as a starting point. In this way, it is helpful for the terminal to realize surround playback from the first frame image in the original video stream.

In a possible design, the method further includes: generating and sending an index of the at least one target video stream. The index of the target video stream includes: the identifier of the camera corresponding to the target video stream and the category of the target video stream. The camera corresponding to the target video stream is the camera corresponding to the first frame image in the target video stream. Wherein, if the first frame of image is acquired by a camera, or an image acquired by a camera is obtained after processing, the first frame of image corresponds to the camera. The category of the video stream is used to characterize whether the video stream is the original video stream or the target video stream. In this way, when the number of cameras is large, it helps to save the storage space occupied by the index of the video stream.

In a possible design, the set direction is a surrounding direction of the plurality of cameras, and the surrounding direction includes a clockwise direction or a counterclockwise direction.

In a possible design, the method further includes: encapsulating the compressed at least one target video stream to obtain multiple segments. Then, the indices of the plurality of segments are generated and transmitted. In this way, during video playback, the terminal can acquire the video clips to be played based on the segment granularity, instead of having to acquire the to-be-played video clips based on the segment granularity or the video stream granularity, which helps to save transmission resources.

In a second aspect, the present application provides a video playback method, the method comprising: receiving a video clip to be played, the video clip to be played is selected from a first target video stream, and the first target video stream is obtained from a plurality of cameras according to a set direction A video stream obtained by selecting a certain number of frame images from the original video stream corresponding to each camera in . The multiple original video streams corresponding to the multiple cameras are obtained based on the video streams collected by the multiple cameras for the same spatial area in the same time period. Then, the video clip to be played is decompressed, and the decompressed video clip is played. Since the video clip to be played by the terminal is selected from the first target video stream, the terminal does not need to acquire images compressed based on different original video streams. In this way, it helps to reduce the transmission bit rate of the video stream.

In a possible design, the time stamps corresponding to the images in the first target video stream are consecutive. In this way, it is helpful for the terminal to realize real-time surround playback in multiple consecutive time stamps.

In a possible design, the multiple original video streams include a first original video stream, and the first target video stream takes a first frame image in the first original video stream as a starting point. In this way, it is helpful for the terminal to realize surround playback from the first frame image in the original video stream.

In a possible design, before receiving the to-be-played video clip, the method further includes: sending a request message, where the request message is used to request the to-be-played video clip.

In a possible design, the request message includes an index of the first target video stream. The index of the first target video stream is used to determine the first target video stream, and the index of the first target video stream includes the identifier of the camera corresponding to the first target video stream and the category of the first target video stream. The camera corresponding to the first target video stream is the camera corresponding to the first frame image in the first target video stream. Wherein, if the first frame of image is acquired by a camera, or an image acquired by a camera is obtained after processing, the first frame of image corresponds to the camera. The category of the video stream is used to characterize whether the video stream is the original video stream or the target video stream. This possible design provides a technical solution for the terminal to send the index of the first target video stream, thus it can be obtained that in this technical solution, the terminal determines the index of the first target video stream, which helps to save the calculation of the network device resource.

In a possible design, the method further includes: receiving an index of at least one target video stream. The at least one target video stream is generated according to the plurality of original video streams, and the at least one target video stream includes a first target video stream. The index of the at least one target video stream may be actively requested by the terminal from the network device, or may be actively pushed by the network device to the terminal.

In a possible design, the request message further includes an index of the target segment to which the video segment to be played belongs. In this case, receiving the video stream segment to be played includes: receiving the target segment. In this way, it helps to further reduce the transmission bit rate of the video stream.

In one possible design, the method further includes: receiving an index of a segment in the at least one target video stream. Wherein, the at least one target video stream is generated according to the plurality of original video streams, and the at least one target video stream includes a first target video stream. The index of the segment in the at least one target video stream may be actively requested by the terminal from the network device, or may be actively pushed by the network device to the terminal.

In a possible design, the method further includes: determining the wraparound direction of the video clip to be played and a timestamp corresponding to the start image of the video clip to be played; then, based on the wraparound direction of the video clip to be played and the video clip to be played The timestamp corresponding to the starting image of the , determines the index of the first target video stream.

In a possible design, the wrapping direction corresponding to the first target video stream is the same as the wrapping direction of the video clip to be played; and the first target video stream includes the previous image of the first image in the currently playing video stream Frame images, the timestamp corresponding to the first image is the same as the timestamp corresponding to the starting image.

In a third aspect, the present application provides a video playback method, the method comprising: determining a video stream segment to be played, wherein the to-be-played video segment is selected from a first target video stream, and the first target video stream is from a set direction from A video stream obtained by selecting a certain number of frame images from the original video stream corresponding to each camera in the multiple cameras. The multiple original video streams corresponding to the multiple cameras are obtained based on the video streams collected by the multiple cameras for the same spatial area in the same time period. Then, the to-be-played video stream segment is sent to the terminal.

In a possible design, the time stamps corresponding to the images in the first target video stream are consecutive.

In a possible design, the multiple original video streams include a first original video stream, and the first target video stream takes a first frame image in the first original video stream as a starting point.

In a possible design, before determining the to-be-played video stream segment, the method further includes: receiving a request message sent by the terminal, where the request message is used to request the to-be-played video segment.

In a possible design, the request message includes an index of the first target video stream, and the index of the first target video stream includes an identifier of a camera corresponding to the first target video stream and a category of the first target video stream. Wherein, the camera corresponding to the first target video stream is the camera corresponding to the first frame image in the first target video stream. The category of the video stream is used to characterize whether the video stream is the original video stream or the target video stream. The method further includes: determining the first target video stream based on the index of the first target video stream.

In a possible design, the method further includes: sending an index of at least one target video stream to the terminal; wherein the at least one target video stream is generated according to the multiple original video streams, and the at least one target video stream includes The first target video stream.

In a possible design, the request message further includes an index of the target segment to which the video segment to be played belongs. The method also includes determining the target segment from the first target video stream based on the index of the target segment. In this case, sending the video stream segment to be played to the terminal includes: sending the target segment to the terminal.

In a possible design, the method further includes: sending an index of a segment in the at least one target video stream to the terminal. Wherein, the at least one target video stream is generated according to the plurality of original video streams, and the at least one target video stream includes a first target video stream.

For explanations and beneficial effects of the relevant content in any video playback method provided in the third aspect, reference may be made to the corresponding description in the video playback method corresponding to the second aspect, which will not be repeated here.

It should be noted that the network device that executes the method provided by the first aspect may be the same as or different from the network device that executes the method provided by the third aspect.

In a fourth aspect, the present application provides a video encoding apparatus. The apparatus may be a chip or a network device.

In a possible design, the apparatus is used to perform any one of the methods provided in the first aspect above. The present application may divide the device into functional modules according to the method provided in the first aspect. For example, each function module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. Exemplarily, the present application may divide the apparatus into an acquisition unit, a generation unit, a compression unit, and the like according to functions. For descriptions of possible technical solutions and beneficial effects performed by each of the above-divided functional modules, reference may be made to the corresponding technical solutions in the first aspect, which will not be repeated here.

In another possible design, the apparatus includes: a processor for implementing any one of the methods described in the first aspect above. The apparatus may further include a memory, the memory is coupled to the processor, and the memory is used for storing a computer program. When the processor executes the computer program stored in the memory, any one of the methods described in the first aspect can be implemented. The apparatus may also include a communication interface for the device to communicate with other devices, for example, the communication port may be a transceiver, circuit, bus, module or other type of communication interface. The computer program in the memory in this application can be pre-stored or stored after being downloaded from the Internet when the device is used. This application does not uniquely limit the source of the computer program in the memory. The coupling in this embodiment of the present application is an indirect coupling or connection between units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between units or modules.

In a fifth aspect, the present application provides a video playback device.

In a possible design, the apparatus is used to perform any one of the methods provided in the second aspect or the third aspect. The present application may divide the device into functional modules according to the method provided in the second aspect or the third aspect. For example, each function module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.

Exemplarily, when the apparatus performs any one of the methods provided in the second aspect, the apparatus may be a terminal, and the present application may divide the apparatus into a receiving unit, a decompression unit, and a playing unit according to functions. For descriptions of possible technical solutions and beneficial effects performed by each of the above-divided functional modules, reference may be made to the corresponding technical solutions in the second aspect, which will not be repeated here.

When the apparatus executes any of the methods provided in the third aspect, the apparatus may be a chip or a network device, and the present application may divide the apparatus into a determining unit and a sending unit according to functions. For descriptions of possible technical solutions and beneficial effects performed by each of the above-divided functional modules, reference may be made to the corresponding technical solutions in the above-mentioned third aspect, which will not be repeated here.

In another possible design, the apparatus includes: a processor, configured to implement any one of the methods described in the second aspect or the third aspect. The apparatus may further include a memory coupled to the processor, where the memory is used to store a computer program, and when the processor executes the computer program stored in the memory, any one of the methods described in the second or third aspects above can be implemented. The apparatus may also include a communication interface for the device to communicate with other devices, for example, the communication port may be a transceiver, circuit, bus, module or other type of communication interface. The computer program in the memory in this application can be pre-stored or stored after being downloaded from the Internet when the device is used. This application does not uniquely limit the source of the computer program in the memory. The coupling in this embodiment of the present application is an indirect coupling or connection between units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between units or modules.

In a sixth aspect, the present application provides a computer readable storage medium, such as a computer non-transitory readable storage medium. A computer program (or instruction) is stored thereon, and when the computer program (or instruction) runs on the video encoding apparatus, the video encoding apparatus is made to execute any one of the methods provided in the first aspect.

In a seventh aspect, the present application provides a computer readable storage medium, such as a computer non-transitory readable storage medium. A computer program (or instruction) is stored thereon, and when the computer program (or instruction) runs on the video playback device, the video playback device is made to execute any one of the methods provided in the second aspect or the third aspect.

In an eighth aspect, the present application provides a computer program product that, when executed on a computer, enables any one of the methods provided in the first to third aspects to be executed.

In a ninth aspect, the present application provides a chip system, comprising: a processor, where the processor is configured to call and run a computer program stored in the memory from a memory, and execute any one of the methods provided in the first to third aspects.

In a tenth aspect, the present application provides a video system, including: a network device and a terminal. The network device may be configured to execute any of the methods provided in the foregoing first aspect, and the terminal may be configured to execute any of the foregoing methods provided in the second aspect.

In a possible design, the network device may also be used to execute any one of the methods provided in the third aspect. Alternatively, the video system further includes other network devices for executing any of the methods provided in the third aspect.

It can be understood that any of the above-provided video encoding devices, video playback devices, computer storage media, computer program products or video systems can be applied to the corresponding methods provided above. For the beneficial effects, reference may be made to the beneficial effects in the corresponding method, which will not be repeated here.

In this application, the names of the above-mentioned video encoding apparatus and video playback apparatus do not limit the devices or functional modules themselves. In actual implementation, these devices or functional modules may appear in other names. As long as the functions of each device or functional module are similar to those of the present application, they fall within the scope of the claims of the present application and their equivalents.

These and other aspects of the present application will be more clearly understood from the following description.

Description of drawings

FIG. 1 is a schematic diagram of a distribution manner of cameras according to an embodiment of the present application;

2 is a schematic diagram of the relationship between a video stream, slice, segment, I frame and P frame provided by an embodiment of the present application;

3 is a schematic diagram of video streams collected by multiple cameras under a surround playback scenario provided by the conventional technology;

Fig. 4 is the schematic diagram of the video stream of a kind of transmission that the conventional technology provides based on Fig. 3;

5A is a schematic structural diagram of a video system provided by an embodiment of the application;

5B is a schematic structural diagram of another video system provided by an embodiment of the present application;

5C is a schematic structural diagram of another video system provided by an embodiment of the application;

5D is a schematic structural diagram of another video system provided by an embodiment of the application;

FIG. 6 is a schematic diagram of a dual-focus application scenario provided by an embodiment of the present application;

7 is a schematic structural diagram of a computer device according to an embodiment of the present application;

8 is a schematic flowchart of a video encoding method provided by an embodiment of the present application;

9 is a schematic process diagram of a video encoding method provided by an embodiment of the present application;

10 is a schematic diagram of an encapsulation format provided by an embodiment of the present application;

11 is a schematic flowchart of a video playback method provided by an embodiment of the present application;

12 is a schematic diagram of a video stream transmitted between a distribution side system and a terminal and a played video stream provided by an embodiment of the application;

13 is a schematic diagram of another video stream transmitted between a distribution side system and a terminal and a played video stream provided by an embodiment of the present application;

14 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a video playback device according to an embodiment of the application;

FIG. 16 is a schematic structural diagram of another video playback apparatus provided by an embodiment of the present application.

Detailed ways

First, some terms and technologies involved in this application are explained for the convenience of readers:

1), surround playback

Surround playback refers to playback of images collected at different angles for the same spatial area in a certain surround direction. Surround playback includes still surround playback and dynamic surround playback. Still surround playback refers to playback of images collected from the same spatial area at different angles at the same moment in a certain surround direction. Dynamic surround playback refers to the playback of images collected from the same spatial area at different times and at different angles in a certain surround direction. The surround playback involved in this application refers to dynamic surround playback.

The surround direction refers to the clockwise or counterclockwise direction based on the angle at which the currently playing image is captured.

The space area refers to the area targeted for surround playback, usually an area with a specific focus, such as the space area where the basketball arena is located, or the space area where the stage of the concert is located.

On the one hand, surround playback requires video streams collected by multiple cameras distributed in a specific manner for the same spatial area at different angles, and one video stream includes consecutive multiple frames of images. Wherein, the field of view of each camera in the plurality of cameras has an overlapping area with the spatial area. The fields of view of different cameras may or may not have overlapping areas. This embodiment of the present application does not limit the distribution manner of the plurality of cameras. For example, the plurality of cameras may be distributed in an annular manner, as shown in a panel in FIG. 1 . Alternatively, the plurality of cameras may be distributed in a fan-shaped manner, as shown in b in FIG. 1 . Alternatively, the plurality of cameras may be distributed in a right-angle (ie, 90°) manner, as shown in panel c in FIG. 1 . Alternatively, the plurality of cameras may be distributed in a flat-angle (ie, 180°) manner (or a straight-line manner), as shown in d in FIG. 1 .

It should be noted that, in any distribution manner, the plurality of cameras may be distributed evenly or unevenly. In FIG. 1 , the multiple cameras are uniformly distributed as an example for description.

As an example, the surrounding direction is the clockwise direction based on the distribution of the plurality of cameras, taking the camera that captures the "currently playing image" (or the camera that captures the image used to obtain the "currently playing image") as the reference, or Counterclockwise. Surround playback is to sequentially play the images collected by the multiple cameras (or images obtained by processing based on the images collected by the multiple cameras) according to the surrounding direction.

On the other hand, surround playback requires camera synchronization technology to ensure that the multiple cameras capture video streams at the same time. For example, based on the camera synchronization technology, it is realized that the multiple cameras all collect images at a certain moment, and the time interval for subsequent acquisition of two adjacent frames of images is the same. Regarding the specific implementation of "based on the camera synchronization technology to ensure that the multiple cameras capture video streams at the same time", reference may be made to the prior art.

2), video stream, I frame, P frame, slice

Each frame of image in a video stream corresponds to a timestamp, which is used for aligning the image with images in other video streams, and for image transmission, etc. For example, assuming that there are 5 cameras deployed in the acquisition side system of the surrounding playback scene, and the video stream collected by each camera includes 100 frames of images, then according to the sequence of acquisition, the images in each video stream can correspond to timestamp 1- 100.

The images in the video stream may include I frames and P frames, and the I frames and P frames indicate the compression mode of the frame images. specific:

I-frames represent keyframes. During encoding, the I frame belongs to intra-frame compression encoding, that is, compression is performed based on the image of the current frame, and during decoding, the image of the current frame can be obtained by decompression based on the compressed data.

The P frame represents the difference frame between the image of the current frame and the image of the previous frame of the image of the current frame (specifically, it may be an I frame or a P frame). During encoding, the P frame belongs to inter-frame compression encoding, that is, compression is performed based on the difference between the image of this frame and the image of the previous frame of the image. When decoding, it is decompressed based on the data obtained after compression and the image of the previous frame. , to get the image of this frame.

A video stream may include multiple slices, and each slice includes one frame of image or consecutive multiple frames of images. Each fragment can be encoded and decoded independently and transmitted independently.

For example, the first frame of image in a slice is an I frame, and other images may be I frames or P frames. Typically, the first picture in a slice is an I-frame, and the other pictures are P-frames.

3), segment

According to a certain encapsulation format, the image is encapsulated to obtain segments. A segment may include one frame of image or consecutive multiple frames of images, and each frame of image may be an I frame or a P frame. A shard can contain multiple segments. Segments cannot be encoded and decoded independently, but they can be transmitted independently.

This embodiment of the present application does not limit the encapsulation manner, for example, the encapsulation manner may include chunk encapsulation or transport stream (transport stream, TS) encapsulation, and the like.

As shown in FIG. 2 , it is a schematic diagram of the relationship among a video stream, a slice, a segment, an I frame and a P frame according to an embodiment of the present application. Figure 2 shows that the video stream includes multiple slices, such as slice 1 and slice 2, etc. One slice includes 10 frames of images, and the first frame image in each slice is an I frame, and the other images are P frames ; A segment includes a frame of image as an example to illustrate.

4), other terms

In the embodiments of the present application, words such as "exemplary" or "for example" are used to represent examples, illustrations or illustrations. Any embodiments or designs described in the embodiments of the present application as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present the related concepts in a specific manner.

In the embodiments of the present application, the terms "first" and "second" are only used for description purposes, and cannot be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as "first" or "second" may expressly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.

In the surround playback scenario, the video clips played by the terminal are recombined from images collected by multiple cameras. In the traditional technology, in the encoding phase, the video streams collected by different cameras are independently compressed. Therefore, in the video playback phase, in order to enable the terminal to correctly decode the recombined video clips, in the encoding phase, each A video stream contains multiple groups of pictures (GOPs), wherein the first picture of each GOP is an I frame, and other pictures can be P frames. The more I frames, the higher the transmission bit rate.

For example, as shown in FIG. 3 , it is assumed that video streams 1-5 are acquired by cameras 1-5 deployed clockwise, and each video stream contains 100 frames of images. In the encoding stage, the encoder compresses the five video streams respectively. In the playback stage, it is assumed that the terminal starts playing from the first frame of video stream 1 (that is, normal playback), and when playing the fourth frame of video stream 1, the terminal determines that it needs to start clockwise playback under the user's instruction. , then during surround playback, the video clips played by the terminal sequentially include: the fifth frame in video stream 2, the sixth frame in video stream 3, the seventh frame in video stream 4, and the fifth frame in video stream 5. 8 frames of images, the 9th frame of images in video stream 1, the 10th frame of images in video stream 2... as shown in Figure 4.

Referring to FIG. 4 , during surround playback, the fifth frame image and the fourth frame image in the video clip played by the terminal are selected from different video streams. In the encoding stage, the encoder compresses the video stream collected by the same camera respectively, as shown in Figure 3. Therefore, during surround playback, when the terminal needs to correctly decode the fifth frame of image, it cannot refer to the video stream in the playback video stream. 4 frames of images, so, in the encoding stage, the 5th frame of images needs to be set as I frame. Similarly, other images that follow in the surround playback stage have the same problem. This causes the problem of high transmission bit rate.

Since the first frame image of each GOP is an I frame, and other images are P frames, if a GOP contains many images, the first frame image of the surround playback is a non-first frame image of a GOP. , the terminal needs to decode the first frame image of the GOP in sequence to obtain the first frame image of the surround playback, which will result in the inability to perform the surround playback in real time. Therefore, in order to realize real-time surround playback, in the traditional technology, the GOP is generally set very small. For example, a GOP includes one or two frames of images, which will cause a lot of I frames in a video stream, resulting in a high video transmission bit rate. The problem. For example, if a GOP includes 5 frames of images, then, when the first frame of images played around is the fifth frame of images in the GOP, the terminal needs to decode the first frame of images in the GOP to obtain the second frame of images. The second frame of image is decoded to obtain the third frame of image, and so on, the fifth frame of image is obtained by decoding the fourth frame of image, and the process takes a long time, so real-time surround playback cannot be realized.

To this end, the embodiments of the present application provide a video encoding method and a video playback method, which can be applied to a surround playback scenario. In the encoding stage, the network device records the original video stream corresponding to each camera in the surround playback scenario according to a set direction. Select a certain number of frame images to generate the target video stream, and then compress the target video stream. Based on this, in the surround playback stage, the network device transmits the video clips selected from the compressed target video stream to the terminal, so that the terminal can decode the video clips to be played, and does not need to transmit the video clips based on different original video streams to the terminal. Compressed image. Therefore, it helps to reduce the transmission bit rate of the video stream.

Further, since the target video stream is composed of images in multiple original video streams, the compression of the target video stream can be equivalent to that an image in one original video stream can refer to an image in another original video stream. to compress. Correspondingly, the terminal can decode an image in one original video stream to obtain an image in another original video stream. Compared with the traditional technology, the video stream transmitted during video playback does not need to include many I-frames, thereby helping to reduce the transmission bit rate of the video stream.

As shown in FIG. 5A , it is a schematic structural diagram of a video system 1 according to an embodiment of the present application. The video system 1 includes: an acquisition side system 10, an encoder 20, and one or more terminals (eg, a terminal 30 and a terminal 31).

The acquisition side system 10 includes a plurality of cameras. The multiple cameras are distributed and deployed in a certain spatial area based on a specific manner (as shown in FIG. 1 ), so as to collect video streams for the spatial area from different angles.

Each camera can be a fixed focus camera. The focal points of the multiple cameras may correspond to the same focus area, or may correspond to multiple focus areas. That is to say, the surround playback of the embodiments of the present application can be applied to a single-focus scene, and can also be applied to a multi-focus scene (eg, a bi-focus scene).

As shown in FIG. 6 , it is a schematic diagram of a dual-focus application scenario. Fig. 6 is a scene set for a basketball game. Multiple cameras are deployed around the basketball court. The multiple cameras are distributed in a ring shape. Each camera has a focal point. The focus of some cameras is in the first focus area, and the other cameras are in the first focus area. The focal point is in the second focal area, thus forming a bifocal scene. In one example, the focus of a part of the cameras is positioned to the first focus area of the basketball arena by manual focusing, and the focus of another part of the cameras is positioned to the second focus area of the basketball arena, and the camera synchronization technology is used to achieve the multiple The camera captures an image at a certain moment, and the time interval for subsequent capture of two adjacent frames of images is the same.

The encoder 20 is configured to execute the video encoding method provided by the embodiment of the present application. The specific implementation of the video encoding method provided by the embodiments of the present application may refer to the following, for example, refer to the video encoding method shown in FIG. 8 .

The terminal, which may also be referred to as a playback terminal, is used to decode and play video clips in the video stream. This embodiment of the present application does not limit the physical form of the terminal, for example, it may be a smart phone, a tablet computer, or the like.

The encoder 20 is a functional module, and its functions can be implemented by software, or by hardware, or by software combined with hardware.

In FIG. 5A , it is illustrated by taking an example that the acquisition-side system 10 and the encoder 20 are independently installed. In some embodiments, the acquisition-side system 10 may further include a control node connected to the above-mentioned multiple cameras, and the control node may control the multiple cameras. As shown in Figure 5B, the encoder 20 may be integrated in the control node. Of course, the encoder 20 can also be independent of the acquisition-side system 10. At this time, the multiple cameras can send the captured video streams to the control node, and the control node sends the video streams (or the processed video streams) to the encoder 20.

In some embodiments, the encoder 20 is further configured to communicate with the terminal, so as to provide the terminal with video segments in the encoded video stream, so as to realize video playback.

In other embodiments, as shown in FIG. 5C , the video system 1 may further include: a distribution side system 40 . The encoder 20 may be integrated in the distribution-side system 40 , or may be provided independently from the distribution-side system 40 . In FIG. 5C , the encoder 20 and the distribution-side system 40 are independently installed as an example for description.

Based on FIG. 5C , the encoder 20 can also be used to send the encoding result to the distribution-side system 40 . The distribution-side system 40 is configured to communicate with the terminal, so as to distribute the video clips in the encoded video stream to the terminal, so as to realize video playback.

Optionally, the distribution-side system 40 may be a content delivery network (content delivery network, CDN). The CDN may be any kind of CDN in the conventional technology, but is of course not limited to this. The function of CDN can be implemented by one server or by multiple servers. Optionally, the distribution-side system 40 may be one or more dedicated servers, that is, servers or server clusters specially set up to implement the video playback method provided by the embodiments of the present application. In general, using a CDN for video distribution can save costs compared to using a dedicated server.

If the distribution-side system 40 is a CDN, the video system may also include an origin station 50 . The source site 50 is the data source of the CDN. As shown in FIG. 5D , the encoder 20 may be integrated in the source station 50 . Of course, the encoder 20 may also be independent of the source station 50. In this case, the encoder 20 may send the encoding result of the video encoding method provided by the embodiment of the present application to the CDN via the source station 50. The source station 50 may be implemented by one server, or may be implemented by multiple servers together.

It should be noted that the above video systems shown in FIGS. 5A to 5D are all examples of the video systems applicable to the embodiments of the present application, and they are not applicable to the video systems to which the video encoding and video playback methods provided by the embodiments of the present application are applicable. constitute a limitation.

In terms of hardware implementation, the above-mentioned device for realizing the function of the encoder 20, the server for realizing the function of the distribution side system 40, and the terminal can all be realized by the computer device 70 as shown in FIG. 7 .

As shown in FIG. 7 , a computer device 70 may be used to implement the video encoding method or the video playing method provided by the embodiments of the present application. For example, when the computer device 70 is a device that implements the functions of the encoder 20, it is used to implement the video encoding method provided by the embodiment of the present application, and optionally, it is also used to implement the video playback method provided by the embodiment of the present application. For another example, when the computer device 70 is a server or a terminal that implements the functions of the distribution-side system 40, it is used to implement the video playback method provided by the embodiment of the present application.

The computer device 70 shown in FIG. 7 may include a processor 701, a memory 702, a communication interface 703, and a bus 704. The processor 701 , the memory 702 and the communication interface 703 may be connected through a bus 704 .

The processor 701 is the control center of the computer device 70, and may specifically be a general-purpose central processing unit (central processing unit, CPU), or other general-purpose processors, or the like. Wherein, the general-purpose processor may be a microprocessor or any conventional processor or the like.

As an example, processor 701 may include one or more CPUs, such as CPU 0 and CPU 1 shown in FIG. 7 .

Memory 702 may be read-only memory (ROM) or other type of static storage device that can store static information and instructions, random access memory (RAM) or other type of static storage device that can store information and instructions A dynamic storage device that can also be an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium, or other magnetic storage device, or can be used to carry or store instructions or data structures in the form of desired program code and any other medium that can be accessed by a computer, but is not limited thereto.

In one possible implementation, the memory 702 may be independent of the processor 701 . The memory 702 may be connected to the processor 701 through a bus 704 for storing data, instructions or program codes. When the processor 701 calls and executes the instructions or program codes stored in the memory 702, the video encoding method or the video playing method provided by the embodiments of the present application can be implemented.

In another possible implementation manner, the memory 702 can also be integrated with the processor 701 .

The communication interface 703 is used for connecting the computer device 70 with other devices through a communication network, and the communication network can be an Ethernet, a radio access network (RAN), a wireless local area network (wireless local area networks, WLAN) and the like. The communication interface 703 may include a receiving unit for receiving data, and a transmitting unit for transmitting data.

The bus 704 may be an industry standard architecture (industry standard architecture, ISA) bus, a peripheral component interconnect (peripheral component interconnect, PCI) bus, or an extended industry standard architecture (extended industry standard architecture, EISA) bus or the like. The bus can be divided into address bus, data bus, control bus and so on. For ease of presentation, only one thick line is used in FIG. 7, but it does not mean that there is only one bus or one type of bus.

It should be pointed out that the structure shown in FIG. 7 does not constitute a limitation on the computer device 70. In addition to the components shown in FIG. 7, the computer device 70 may include more or less components than those shown in the figure, or a combination of certain components may be included. some components, or a different arrangement of components. For example, when the computer device 70 is a terminal, the computer device 70 may further include a display screen, an audio input and output device, and the like, which is not limited in this embodiment of the present application.

Hereinafter, the video encoding method provided by the embodiments of the present application will be described with reference to the accompanying drawings. The method can be applied to any of the video systems provided above (such as the video system shown in any of Figures 5A-5D).

As shown in FIG. 8 , it is a schematic flowchart of a video coding method provided by an embodiment of the present application. This embodiment is described by taking the method applied to the video system shown in FIG. 5C as an example. For example, the multiple cameras in this embodiment may be some or all of the cameras in the acquisition-side system 10 in FIG. 5C , and the encoder and the distribution-side system in this embodiment may be the

encoders

20 and 20 in FIG. 5C , respectively. Distribution side system 40 .

The method shown in FIG. 8 may include the following S101-S108:

S101: Multiple cameras collect multiple video streams for the same spatial area in the same time period. For example, a camera captures a video stream for the spatial region during the time period.

Regarding the distribution manner of the plurality of cameras, the explanation of the spatial area and the like may refer to the above.

It can be understood that, after the camera is started, it can continuously collect images, and the same time period in S101 can be any time period in the process of the camera collecting images. For example, the camera can take the video stream captured at each cycle as a raw video stream. Wherein, the duration of the cycle is equal to the duration of the time period in S101.

S102: The multiple cameras send multiple original video streams to the encoder. For example, one camera corresponds to one raw video stream. The original video stream corresponding to one camera may be the video stream collected by the camera in S101, or the video stream obtained by the camera after processing the video stream collected in S101.

That is to say, in this embodiment of the present application, the video stream received by the encoder is defined as the original video stream.

S103: The encoder generates at least one target video stream according to the multiple original video streams. The target video stream is a video stream obtained by selecting a certain number of frame images from the original video stream corresponding to each camera according to the set direction.

Optionally, the set direction is a surrounding direction of the plurality of cameras, such as a clockwise surrounding direction or a counterclockwise surrounding direction.

Optionally, in the process of generating a target video stream, the number of frames of video streams selected from different original video streams may be the same or different.

Optionally, the time stamps corresponding to the images in the target video stream are consecutive. In this way, it is helpful for the terminal to realize real-time surround playback in multiple consecutive time stamps. The following specific examples are all described by taking the continuous time stamps corresponding to the images in the target video stream as an example for description. Here, a unified description is provided, and details are not repeated below.

Optionally, the multiple original video streams include a first original video stream, and the first original video stream may be any one of the multiple original video streams. The target video stream starts with the first frame image in the first original video stream. In this way, it is helpful for the terminal to realize surround playback from the first frame image in the original video stream.

Optionally, each original video stream and each target video stream include the same number of frames of images. In this way, combined with the above-mentioned optional implementation manners, it is helpful for the terminal to realize real-time surround playback from each frame in the original video stream.

Optionally, the number of target video streams generated by the encoder based on different original video streams is the same.

Optionally, S103 may include: for each original video stream in the multiple original video streams, the encoder generates a first-type video stream and a second-type video stream. Each of the original video streams, each of the first type of video streams, and each of the second type of video streams include the same number of frames of images. The first type of video stream corresponding to the first original video stream is, starting from the first frame image in the first original video stream, and according to the first surround direction of the plurality of cameras, from the original video stream collected by each camera. The video stream obtained by selecting the first number of frame images. The time stamps corresponding to the images in the first type of video stream are continuous. The second type of video stream corresponding to the first original video stream is, taking the first frame image in the first original video stream as a starting point, and selecting from the original video streams collected by each camera according to the second surround direction of the plurality of cameras. The video stream obtained from the second number of frame images. The time stamps corresponding to the images in the second type of video stream are continuous.

The first wraparound direction is opposite to the second wraparound direction. Optionally, the first surrounding direction is a clockwise direction, and the second surrounding direction is a counterclockwise direction.

The optional implementation manner is described by taking "the encoder generates the first type video stream and the second type video stream corresponding to each original video stream" as an example. Optionally, the encoder may not generate the second type of video stream or the first type of video stream. If the encoder generates the first type of video stream, the terminal may perform surround playback based on the first surround direction. If the encoder generates the second type of video stream, the terminal may perform surround playback based on the second surround direction.

The first numbers corresponding to different original video streams may or may not be equal. For the convenience of description, the following description is given by taking an example that the first numbers corresponding to each original video stream are equal to each other. The first quantity corresponding to an original video stream refers to the quantity of images selected from the original video stream each time in the process of generating the first type of video stream.

The second numbers corresponding to different original video streams may or may not be equal. For the convenience of description, the following description is given by taking an example that the second numbers corresponding to each original video stream are equal. The second quantity corresponding to an original video stream refers to the quantity of images selected from the original video stream each time in the process of generating the second type of video stream.

In addition, each of the first quantities and each of the second quantities may or may not be equal. For the convenience of description, the following description is given by taking an example that each of the first quantities and each of the second quantities are equal.

Both the first quantity and the second quantity may be values entered by the administrator. The first quantity and the second quantity can be updated. Both the first number and the second number may be an integer greater than or equal to 1.

As shown in Figure 9, it is assumed that the acquisition side system includes cameras 1-5 arranged in a clockwise direction, the original video streams corresponding to cameras 1-5 are original video streams 1-5 respectively, and each original video stream includes 100 frames of images , the encoder can generate the first type video stream i and the second type video stream i based on the original video stream i. Among them, 1≤i≤5, i is an integer. Assuming that the first wrapping direction is clockwise, the second wrapping direction is counterclockwise, and both the first and second numbers are 1, then:

The first type of video stream 1 takes the first frame image in the original video stream 1 as the starting point, and selects 1 from the

original video stream

1, 2, 3, 4, 5, 1, 2, 3, 4, 5... The video stream obtained after the frame image, the time stamps corresponding to the images in the video stream are continuous. Specifically, the images in the first type of video stream 1 are: the first frame image in the original video stream 1, the second frame image in the original video stream 2, the third frame image in the original video stream 3, the original video The 4th image in stream 4, the 5th image in original video stream 5, the 6th image in original video stream 1, the 7th image in original video stream 2...the 100th image in original video stream 5 frame image.

By analogy, the encoder can obtain the first type of video streams 2-5, as shown in FIG. 9 .

The second type of video stream 1 takes the first frame image in the original video stream 1 as the starting point, and selects 1 from the

original video stream

1, 5, 4, 3, 2, 1, 5, 4, 3, 2... The video stream obtained after the frame image, the time stamps corresponding to the images in the video stream are continuous. Specifically, the images in the second type of video stream 1 are sequentially: the first frame image in the original video stream 1, the second frame image in the original video stream 5, the third frame image in the original video stream 4, the original video The 4th image in stream 3, the 5th image in original video stream 2, the 6th image in original video stream 1, the 7th image in original video stream 5... The 100th image in original video stream 2 frame image.

By analogy, the encoder can obtain the second type of video streams 2-5, as shown in FIG. 9 .

As an example, a certain number of frames of images are selected from an original video stream, and the number can be considered as a rotation interval, that is, in the process of surround playback, one video stream switches to another video stream after playing a few frames. For example, based on the example of FIG. 9 , in the process of surround playback, one video stream is switched to another video stream after playing one frame of image.

S104: The encoder generates an index of each video stream in the multiple original video streams and the at least one target video stream.

The embodiments of the present application do not limit the specific implementation manner of the index of the video stream.

Optionally, the index of a video stream includes the identifier of the camera corresponding to the video stream and the category of the video stream. The camera corresponding to the original video stream refers to the camera that collects the original video stream, or the camera that collects the original video stream for obtaining the original video stream. The camera corresponding to the target video stream refers to the camera corresponding to the first frame image in the target video stream. Wherein, if the first frame of image is acquired by a camera or obtained by processing an image acquired by a camera, the first frame of image corresponds to the pair of cameras. The category of the video stream is used to characterize whether the video stream is the original video stream or the target video stream. In this way, when the number of cameras is large, it helps to save the storage space occupied by the index of the video stream.

Further optionally, if the target video stream includes a first-type video stream and a second-type video stream, the category of the video stream can be used to characterize whether the video stream is an original video stream, a first-type video stream or a second-type video stream. .

For example, suppose the index of a video stream is [A, B], where A represents the identification of the camera, B represents the category of the video stream, and the categories of the video stream are "00", "01" and "10", in order represents the original video stream, the first type of video stream or the second type of video stream; then, based on the example shown in Figure 9, the index of each video stream can be as shown in Table 1:

Table 1

Optionally, the encoder may sequentially number each video stream in the multiple original video streams and the multiple target videos, and use the number of each video stream as an index of the video stream. For example, based on the example shown in FIG. 9 , there are 5 original video streams, 5 first-type video streams and 5 second-type video streams, the indices of these 15 video streams may be 1-15 in sequence.

S105: The encoder determines I frames and P frames in the multiple original video streams and the at least one target video stream based on the number of frames of images included in one slice, and determines the I frames and P frames for each video after determining the I frames and P frames. stream is compressed.

The number of frames of an image included in a slice may be predefined, and may be updated after being predefined.

For example, assuming that a video stream includes 100 frames of images and a slice includes 10 frames of images, since the first frame of image in a slice is an I frame, other images can be either I frames or P frames. Therefore, in S105 , the encoder may determine the 1st, 11th, 21st, 31st, . . . 91 frames of images in the video stream as I frames, and other images as P frames.

S106: The encoder performs fragmentation processing on the compressed multiple original video streams and the compressed at least one target video stream obtained in S105, respectively, to obtain a plurality of fragments, and generates an index of each fragment.

Optionally, the slices corresponding to the same timestamp include the same number of frames of images. This facilitates the alignment of corresponding video streams at the segment boundary, thereby facilitating segment download alignment. The time stamp corresponding to the fragment may be represented by the time stamp corresponding to the image at a specific position included in the fragment, or by the position of the fragment in the video stream to which it belongs.

Optionally, the frames of the images included in each slice obtained in S106 are the same, so that the encoding is simple and fast.

This embodiment of the present application does not limit the specific implementation manner of the fragmented index. For example, the index of the slice may include the index of the slice in the compressed video stream to which it belongs. Taking a slice including 10 frames of images as an example, each video stream in Figure 9 can be divided into 10 slices after compression, and, for each compressed video stream, the first -10 frames of images constitute slice 1 of the video stream, 11-20 frames of images constitute slice 2 of this video stream... 91-100 frames of images constitute slice 10 of this video stream. The indices of the slices into which each compressed video stream is divided are in the order of 1-10.

It should be noted that, for the convenience and brevity of the drawings, FIG. 9 illustrates that the video stream includes slices. In fact, it should be understood that the compressed video stream includes slices.

In addition, it should be noted that since each fragment can be independently encoded, decoded and transmitted independently, the encoder can perform fragmentation processing on the video stream, and the video stream can be transmitted and decoded based on the fragmentation granularity during video playback, thereby Helps save transmission resources. Fragmentation of the video stream by the encoder is an optional step. If the encoder does not perform slice processing on the video stream, the encoder can set the first frame of image in each original video stream and the target video stream as an I frame, and can also enable the terminal to correctly decode the received video stream.

S107: The encoder sends the encoding result to the distribution-side system. The encoding result includes each compressed video stream and index information generated during the encoding process. The index information may include an index of each video stream and an index of each slice in each video stream.

Optionally, various application services (over the top, OTT) protocol communication may be used between the encoder and the distribution side system to provide users with various application services through the Internet, which is of course not limited to this.

S108: Based on the encoding result, the distribution-side system generates and stores the correspondence between "the identifier of the video content, the compressed video stream, and the index information".

A video stream obtained by compressing video streams collected from different angles in the same spatial area in the same time period corresponds to a segment of video content as a whole, and a segment of video content corresponds to an identifier (ie, an identifier of the video content). Different video content corresponds to the video stream obtained by compressing the video stream collected for the same spatial area in different time periods; or, for different video content, corresponding to the video stream collected in the same time period (or different Compressed video stream.

For example, based on the example shown in FIG. 9 , the corresponding relationship stored in S108 may include: “identification of video content, 15 compressed video streams (ie original video streams 1-5, first-type video streams 1-5 and The correspondence between the second type of video streams 1-5), the index of each video stream in the 15 video streams, and the index of each slice of each video stream in the 15 video streams”.

Optionally, the identifier of the video content may be a uniform resource locator (uniform resource locator, URL), which is used to represent the storage location of the encoding result corresponding to the video content in the distribution-side system. Subsequently, the terminal may request the video content from the distribution-side system based on the identifier. For example, based on the example shown in Figure 9, the URL can be used to represent "compressed original video streams 1-5, compressed first-type video streams 1-5, and compressed second-type video streams 1-5 , and the indices of these video streams, the indices of the slices in these video streams, etc." storage locations in the distribution-side system.

In the video encoding method provided by the embodiment of the present application, according to the set direction, a certain number of frame images are selected from the original video stream collected by each camera to obtain the target video stream, and the original video stream and the target video stream are respectively compressed. The encoder compresses the target video stream, which can be equivalent to that an image in one original video stream can be compressed with reference to an image in another original video stream. In this way, in the surround playback stage, the video clips provided by the distribution-side system to the terminal can be directly selected from the target video stream. For a terminal, an image in another original video stream can be obtained by decoding based on an image in one original video stream. Therefore, compared with the conventional technology, the video stream transmitted during video playback does not need to include many I-frames, thereby helping to reduce the transmission bit rate of the video stream.

For example, based on the examples shown in Figures 9 and 12 below, it can be known that when the first frame image of each fragment in the original video stream and the target video stream is set as I frame, and other frames are set as P frames, the terminal is The received video stream can be correctly decoded, and the specific analysis process can be referred to the description for FIG. 9 and FIG. 12 , and details are not repeated here.

In addition, the video coding method provided by the embodiment of the present application does not need to combine multiple video streams into one high-resolution picture, so the number of cameras and picture resolution are not limited. That is to say, after using the technical solution for video encoding, in the video playback stage, the transmission bit rate of the video stream can be reduced under the condition of ensuring the picture resolution.

In some embodiments, optionally, the encoder may not perform fragmentation on the compressed multiple original video streams and the compressed at least one target video stream. Based on this, the above S105-S106 can be replaced with the following steps 1-2:

Step 1: The encoder determines the first frame image in each original video stream and each target video stream as an I frame, and other images are determined as I frames or P frames (for example, other images are determined as P frames), to determine Each video stream after I-frame and P-frame is compressed.

Step 2: The encoder separately encapsulates each compressed original video stream and each compressed target video stream to obtain multiple segments, and generates an index of each segment.

Based on this, the index information in S107 and S108 may include: the index of each video stream and the index of each segment in each video stream.

That is to say, in this embodiment, the encoder may not perform fragmentation processing on the compressed video stream, but directly encapsulate it. This is a technical solution provided considering that segments can be transmitted independently.

In other embodiments, before executing S106, the encoder may encapsulate images in the compressed multiple original video streams and the compressed at least one target video stream obtained in S105. Based on this, S106 may include: the encoder performs fragmentation processing on the multiple original video streams and at least one target video stream after performing encapsulation.

Since segments can be transmitted independently, during video playback, the terminal can obtain video segments to be played from the distribution-side system based on the segment granularity, instead of obtaining the to-be-played video from the distribution-side system based on the segment granularity or video stream granularity fragments, thereby helping to save transmission resources.

The number of frames of an image included in a segment may be predefined, and may be updated after being predefined. A slice can consist of one or more segments.

Optionally, in order to realize single-frame rotation playback (that is, real-time surround playback), that is, in the process of playing the video stream, when the terminal is currently playing any frame of any video stream, the surround playback can be started, and the encoder can use The segmentation method encapsulates each frame of image in each of the compressed multiple original video streams and the compressed at least one target video stream into a segment respectively.

An example of an encapsulation format is shown in FIG. 10 . FIG. 10 illustrates an example of encapsulating images belonging to one slice based on the fmp4chunk method. Specifically, each frame of the image in the slice is encapsulated into an independent mdat, and encapsulated in a multi-moof header manner. Among them, the moof header includes styp, sidx and moof. In this way, each frame of image in the video stream can be transmitted independently as a segment. The first image in the slice is an I frame, and the other images are P frames.

Optionally, if the encoder encapsulates the images in the compressed multiple original video streams and the compressed at least one target video stream obtained in S105, the index information in S107 may also include the The index of each segment. Wherein, the embodiment of the present application does not limit the specific implementation manner of the segmented index. For example, the index of the segment may include the index of the segment in the shard to which it belongs. Taking a shard including 5 segments as an example, the indices of the segments in each shard may be 1-5.

That is to say, in this embodiment, the encoder can perform both fragmentation processing and encapsulation on the compressed video stream. Since the first frame image of each fragment is an I frame, and the video stream without fragmentation processing is usually only the first frame image is an I frame, therefore, compared with the technology of "no fragmentation processing, only encapsulation" The scheme, in the video playback stage, can reduce the transmission of redundant images to a certain extent.

Hereinafter, the video playback method provided by the embodiments of the present application will be described with reference to the accompanying drawings.

As shown in FIG. 11 , it is a schematic flowchart of a video playing method according to an embodiment of the present application. This embodiment is described by taking the method applied to the video system shown in FIG. 5C as an example. For example, the terminal in this embodiment may be the terminal 30 or the terminal 31 in FIG. 5C . The distribution-side system in this embodiment may be the distribution-side system 40 in 5C.

The method shown in FIG. 11 may include the following S201-S210:

S201: The terminal acquires the identifier of the video content to be played.

The video content to be played refers to the video content that the user desires to play, such as a football game or a concert. The video content to be played in S201 may be the video content in the above-mentioned S108. Based on the example shown in FIG. 9, the video content includes: 15 compressed video streams (that is, the original video streams 1-5, the first type of video Streams 1-5 and the second type of video streams 1-5), the index of each of the 15 video streams, and the index of each slice of each of the 15 video streams.

The embodiment of the present application does not limit the specific form of the identifier of the video content to be played. For example, the identifier of the video content to be played may be a URL of the video content to be played.

This embodiment of the present application does not specifically limit the manner of acquiring the identifier of the video content to be played. For example, the user can select the video content to be played by clicking a video link on the terminal screen, and the terminal receives the user's operation instruction and obtains the URL of the to-be-played video content based on the operation instruction. For another example, the distribution-side system may actively push URLs of one or more video contents to the terminal, where the one or more video contents include URLs of the video contents to be played.

S202: Based on the identifier of the video content to be played, the terminal obtains index information corresponding to the video content to be played from the distribution side system. The index information may include an index of a video stream corresponding to the video content to be played and an index of a segment in the video stream (eg, an index of each video stream and an index of each segment in each video stream).

Taking OTT-based communication between the terminal and the distribution-side system as an example, the terminal may send a request to the distribution-side system, where the request carries the identifier of the video content to be played, and is used to request index information corresponding to the video content to be played. The distribution-side system acquires and feeds back index information corresponding to the to-be-played video content to the terminal based on the identifier of the to-be-played video content. Of course, the distribution-side system can also actively push the index information corresponding to the video content to be played to the terminal.

S203: The terminal requests an initial video stream of the video content to be played from the distribution side system, decompresses the requested initial video stream, and plays the decompressed initial video stream.

The initial video stream can be predefined. Exemplarily, the initial video stream may be an original video stream corresponding to the to-be-played video content, which may be specified by the administrator or determined by the distribution-side system according to the size of the index.

Specifically, S203 may include: taking the transmission of video streams between the terminal and the distribution-side system based on fragmentation granularity as an example, the terminal requests the distribution-side system for a part of the fragments in the initial video stream of the video content to be played, and decompresses the requested To the fragment, play the decompressed fragment.

For example, the terminal sequentially requests the first fragment, the second fragment in the initial video stream of the video content to be played, ... until the terminal determines that the surround playback needs to be started. For example, until the terminal receives the first operation in S204, that is, in this example, before the terminal receives the first operation in S204, the initial video stream is played in sequence (ie, the normal playing stage). Wherein, the terminal receives the first operation in S204, indicating that the terminal has a requirement to start performing surround playback of the video content. For example, assuming that the initial video stream is original video stream 1, the terminal can sequentially request slice 1, slice 2, slice 3... in the compressed original video stream 1 from the distribution-side system, and decompress the requested and play the decompressed segments until the terminal executes S204, and stops acquiring the segments in the original video stream 1.

It should be noted that S203 is an optional step. If the technical solution of S203 is not performed, it can be understood that: the terminal directly plays the video stream corresponding to the video content to be played in a surround. Executing the technical solution of S203 can be understood as: the terminal first plays (or normally plays) the video stream corresponding to the video content to be played, and then plays the video stream corresponding to the video content to be played in a loop.

S204: The terminal receives the first operation, and in response to the first operation, the terminal acquires the wrapping direction of the video clip to be played, the time stamp corresponding to the start image of the video clip to be played, and the time stamp corresponding to the end image of the video clip to be played.

That is to say, the terminal starts to perform surround playback under the instruction of the user. Of course, the embodiment of the present application is not limited to this. For example, the terminal may start to perform surround playback when a preset image in a certain video stream is played.

The embodiments of the present application do not limit the specific implementation manner of the first operation. For example, the first operation may be a touch operation, a pressing operation, or a voice operation, or a combination of at least two operations described above. Here are some possible implementations:

In an implementation manner, taking the terminal as a terminal with a touch screen such as a smart phone and a tablet computer as an example, the terminal receives a user's touch operation on a rotating control on the touch screen. In response to the touch operation, the terminal determines the wrapping direction of the video clip to be played, the time stamp corresponding to the start image and the time stamp corresponding to the end image of the video clip to be played.

For example, when the touch operation is to slide the rotation control in the first direction (eg, to the left), it means that the wrapping direction of the video clip to be played is clockwise. When the touch operation is to slide the rotation control in the second direction (eg, rightward), it indicates that the wrapping direction of the video clip to be played is counterclockwise. That is, for touch operations in different directions of the same rotating control, different wrapping directions are indicated.

For another example, when the touch operation is a touch operation on the first rotation control, it means that the wrapping direction of the video clip to be played is clockwise. When the touch operation is a touch operation to the second rotation control, it indicates that the wrapping direction of the video clip to be played is a counterclockwise direction. That is, for touch operations of different rotating controls, different wrapping directions are indicated.

Optionally, the terminal determines the time stamp corresponding to the start image of the video clip to be played through the start touch time of the touch operation. For example, without considering the delay, the terminal may use the timestamp corresponding to the next frame of the currently playing image when receiving the touch operation as the timestamp corresponding to the start image of the video clip to be played. For another example, in the case of considering the delay, the terminal may use the timestamp corresponding to the next N frames of images of the currently playing image when receiving the touch operation as the timestamp corresponding to the start image of the video clip to be played; wherein, N is Integer greater than 1, N is a predefined value.

Optionally, the terminal determines the duration of the video clip to be played through the touch duration of the touch operation. Subsequently, the terminal may determine the time stamp corresponding to the end image of the to-be-played video clip based on the time-stamp corresponding to the start image of the to-be-played video clip and the duration of the to-be-played video clip.

In another implementation manner, taking the terminal as a remote controller, a set top box (STB), a smart phone, a platform computer, etc. with buttons as an example, the terminal receives the user's pressing operation on the rotary button. In response to the pressing operation, the terminal determines the wrapping direction of the video clip to be played, the time stamp corresponding to the start image and the time stamp corresponding to the end image of the video clip to be played. Similarly, for pressing operations of the same key in different directions, different wrapping directions are indicated; or, for pressing operations of different keys, different wrapping directions are indicated. Optionally, the terminal may determine the time stamp corresponding to the start image of the video clip to be played through the start pressing time of the pressing operation. Optionally, the terminal may determine the duration of the to-be-played video clip according to the pressing duration or the number of pressings of the pressing operation.

S205: The terminal determines the index of the video stream (ie the first target video stream) to which the video clip to be played belongs from the at least one target video stream based on the wrapping direction of the video clip to be played and the timestamp corresponding to the start image.

The wrapping direction corresponding to the first target video stream is the same as the wrapping direction of the video clip to be played. The first target video stream includes an image of the previous frame of the first image in the currently playing video stream, and the timestamp corresponding to the first image is the same as the timestamp corresponding to the start image of the video clip to be played.

For example, based on FIG. 9 , taking the time stamps corresponding to the images in each video stream as 1-100 in sequence, it is assumed that the terminal starts to play the first frame of the original video stream 1, and plays to the fourth frame ( When corresponding to time stamp 4), the first operation is received. In response to the first operation, the terminal determines that the wrapping direction of the video clip to be played is the clockwise direction, then the terminal may determine that the type of the first target video stream is the first type of video based on the wrapping direction of the video clip to be played is the clockwise direction. flow. In response to the first operation, without considering the delay, the terminal determines that the timestamp corresponding to the start image of the video clip to be played is timestamp 5 . Based on this, it can be known that the first image is the fifth frame image in the original video stream 1 . The first target video stream is the first type of video stream including the fourth frame image in the original video stream 1 , that is, the first type of video stream 3 .

S206: The terminal determines, based on the time stamp corresponding to the start image and the index of the first target video stream, the index of the first slice to which the start image belongs, and based on the time stamp corresponding to the end image and the first target video stream to determine the index of the second slice to which the starting image belongs.

For example, the terminal determines the index of the starting image based on the timestamp corresponding to the starting image and the index of the first target video stream; and determines the termination based on the timestamp corresponding to the ending image and the index of the first target video stream The index of the image. And, based on the index of the starting image, the index of the first slice to which the starting image belongs is determined; and based on the index of the ending image, the index of the second slice to which the ending image belongs is determined.

For another example, the terminal may directly determine the index of the first slice to which the starting image belongs based on the timestamp corresponding to the starting image and the index of the first target video stream. Correspondingly, the terminal may directly determine the index of the second slice to which the starting image belongs based on the timestamp corresponding to the termination image and the index of the first target video stream.

Based on the example in S205, taking the transmission of video streams between the distribution-side system and the terminal based on slice granularity as an example, since the start image is the fifth frame image in the first type of video stream 3, the end image is of the first type The 17th frame image in the video stream 3, therefore, the first slice and the second slice are slice 1 and slice 2 in the first type of video stream 3, respectively.

S207: The terminal sends a request message to the distribution-side system, where the request message includes the index of the first target video stream, the index of the first segment, and the index of the second segment.

S208: The distribution-side system searches for the video clip to be played based on the request message.

Specifically, the distribution-side system searches for the first target video stream from multiple original video streams and at least one target video stream based on the index of the first target video stream. Then, looking up the first slice from the first target video stream based on the index of the first slice; and looking up the second slice from the first target video stream based on the index of the second slice.

S209: The distribution-side system sends the found video clip to be played to the terminal.

Specifically, the distribution-side system sends the first fragment and the second fragment to the terminal.

S210: The terminal decompresses the video clip to be played, and plays the decompressed video clip to be played.

So far, surround playback is achieved.

Optionally, after the surround playback ends, the terminal may continue to request and play the images after the starting point in the original video stream using the next frame of image in the original video stream where the last frame of the image in the surround playback stage is located as the starting point.

Optionally, before performing S210, the method may further include: the terminal splicing the to-be-played video clips after the played video clips according to the time stamps corresponding to the received images in the to-be-played video clips. In S210, the terminal decompresses the video clip to be played based on the spliced video clip.

As shown in FIG. 12 , it is a schematic diagram of the video stream transmitted between the distribution-side system and the terminal and the video stream played by the terminal provided based on the example in S206 . Among them, the video streams transmitted between the distribution-side system and the terminal are in order: slice 1 in the compressed original video stream 1, slices 1-2 in the compressed first type video stream 3, and compressed original video stream 1.

Fragments

2, 3 in video stream 4...

For the terminal, perform the following steps:

After receiving the slice 1 in the compressed original video stream 1, the terminal decompresses the first to fourth frames of images in the compressed original video stream 1.

After receiving the segment 1 in the compressed first type video stream 3, the terminal splices the 5th to 10th frames in the compressed first type video stream 3 to the 1st to 4th frames in the original video stream 1 After the frame image, the 5th to 10th frame images in the compressed video stream 3 of the first type are decompressed. Since the fifth frame image in the first type video stream 3 is compressed based on the fourth frame image in the original video stream 1 during encoding, during decoding, the fifth frame image in the compressed first type video stream 3 The frame image can be decompressed based on the 4th frame image in the original video stream 1.

After receiving the segment 2 in the compressed first-type video stream 3, the terminal splices the 11th to 13th frames of the compressed first-type video stream 3 to the fifth frame in the first-type video stream 3. After -10 frames of images, the 11-13 frames of images in the compressed first-type video stream 3 are decompressed. Since the 11th frame image in the first type video stream 3 is an I frame and can be decoded independently, it is also possible to directly perform the 11-13th frame images in the compressed first type video stream 3 without splicing. unzip.

After receiving the slice 2 in the compressed original video stream 4, the terminal splices the 14th-20th frame images in the compressed original video stream 4 to the 11th-13th frame images in the first type video stream 3 After that, the 14th to 20th frame images in the compressed original video stream 4 are decompressed. Because during encoding, the 14th frame image in the original video stream 4 is encoded based on the 13th frame image in the original video stream 4, and the 13th frame image in the original video stream 4 is the same as that in the first type of video stream 3. The 13th frame image is the same, that is to say, the 14th frame image in the original video stream 4 is encoded based on the 13th frame image in the first type video stream 3 . Therefore, during decoding, the 14th frame image in the compressed original video stream 4 can be obtained by decompressing the 13th frame image in the first type of video stream 3 .

In the video playback method provided by the embodiment of the present application, the distribution-side system selects, at the request of the terminal, a video segment to be played currently required by the terminal from at least one target video stream. A certain number of frame images are selected from the original video stream collected by the camera to obtain the video stream. Since the video clip to be played by the terminal is selected from the first target video stream, the terminal does not need to acquire images compressed based on different original video streams. Therefore, it helps to reduce the transmission bit rate of the video stream.

Further, since in the encoding stage, the encoder compresses the target video stream, it can be equivalent that an image in one original video stream can be compressed with reference to an image in another original video stream. In the surround playback stage, the video clips provided by the distribution-side system to the terminal are directly selected from the target video stream. Therefore, the terminal can decode an image in one original video stream to obtain an image in another original video stream. Therefore, compared with the conventional technology, the video stream transmitted during video playback does not need to include many I-frames, thereby helping to reduce the transmission bit rate of the video stream. Specific examples thereof are the examples shown in FIGS. 9 and 12 .

In addition, since the above video coding method does not need to combine multiple video streams into one high-resolution picture, the number of cameras and picture resolution are not limited. That is to say, after video encoding is performed by using the above video encoding method, in the video playback stage, the transmission bit rate of the video stream can be reduced under the condition of ensuring the picture resolution.

In addition, in this technical solution, the determination of the index of the first target video stream, the index of the segment to which the starting image and the ending image of the video clip to be played belong, etc. are all performed on the terminal side, and the system on the distribution side does not perceive it. Therefore, this technology The improvement of the solution to the distribution-side system is small, so it can be applied to the traditional distribution-side system.

Alternatively, the above S204-S207 may also be performed by the distribution-side system. For example, the terminal sends information related to the first operation (such as the rotary key for the first operation, the operation duration of the first operation, etc.) to the distribution-side system, and the distribution-side system executes the first operation based on the information related to the first operation. process. At this time, it is also unnecessary to perform the above S202, that is, the terminal does not need to acquire various index information of the video stream. In this way, the processing pressure of the terminal is relieved.

In some embodiments, if S106 is replaced with: the compressed multiple original video streams and the compressed at least one target video stream are encapsulated, then:

The index information in S202 may also include an index of a segment in the slice, such as an index of each segment in each slice.

S206 may further include: the terminal determining the index of the segment to which the starting image belongs in the first slice, and determining the index of the segment to which the termination image belongs in the second slice. Based on the example in S205, the segment to which the start image of the video clip to be played and the segment to which the end image belongs are the 5th segment and the 17th segment in the first-type video stream 3, respectively.

The request message in S207 may further include: an index in the first slice of the segment to which the start image belongs, and an index in the second slice of the segment to which the end image belongs.

S208 may further include: the distribution-side system searches for the segment to which the start image belongs from the first segment based on the index of the segment to which the start image belongs, and retrieves the segment from the second segment based on the index of the segment to which the end image belongs. Find the segment to which the termination image belongs.

S209 may include: the distribution-side system sends to the terminal the segment to which the start image in the first segment belongs, the segment to which the end image in the second segment belongs, and other segments between the two segments part.

As shown in FIG. 13 , it is a schematic diagram of the video stream transmitted between the distribution side system and the terminal and the video stream played by the terminal provided based on the example in S205 and in combination with segmentation. The video streams transmitted between the distribution-side system and the terminal are in sequence: the 1st to 4th frames of the compressed original video stream 1, the 5th to 17th frames of the compressed first-type video stream 3, and the compressed The 18th, 19th... frame images in the original video stream 4 of .

For related descriptions such as splicing and decompression, reference may be made to the above description of FIG. 12 , which will not be repeated here.

Compared with the example of FIG. 12 , in the example shown in FIG. 13 , fewer images are transmitted between the distribution-side system and the terminal, that is, the images in one slice are encapsulated into multiple Transmitting video streams at segment granularity helps reduce the transmission of redundant images, thereby saving transmission resources. At the same time, the use of segmented granularity transmission is consistent with the use of segmented transmission during terminal playback, and the first to fourth frames in the original video stream 1, the 5th to 17th frames in the first type of video stream 3 are played in sequence, and The 18th, 19th... frame images in the original video stream 4.

In other embodiments, if in the video encoding stage, the encoder does not perform fragmentation processing on the compressed video stream, but directly encapsulates it, then:

The index information in S202 may include: an index of a video stream corresponding to the to-be-played video content and an index of a segment in the video stream (eg, an index of each video stream and an index of each segment in each video stream).

S206 can be replaced with: the terminal determines the index of the first segment to which the starting image belongs based on the timestamp corresponding to the starting image and the index of the first target video stream; and based on the timestamp corresponding to the ending image and the first target video stream The index of the target video stream, which determines the index of the second segment to which the termination image belongs.

The request message in S207 may include: the index of the first target video stream, the index of the first segment, and the index of the second segment.

S209 specifically includes: the distribution-side system sends the first segment, the second segment, and the segment between the two segments to the terminal.

The solutions provided by the embodiments of the present application have been introduced above mainly from the perspective of methods. In order to realize the above-mentioned functions, it includes corresponding hardware structures and/or software modules for executing each function. Those skilled in the art should easily realize that the present application can be implemented in hardware or a combination of hardware and computer software with the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.

In this embodiment of the present application, functional modules may be divided into a video encoding device (such as an encoder) and a video decoding device (such as a CDN system or a terminal) according to the foregoing method examples. For example, each functional module may be divided according to each function, or two One or more functions are integrated in one processing module. The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. It should be noted that, the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.

As shown in FIG. 14 , FIG. 14 shows a schematic structural diagram of a video encoding apparatus 140 provided by an embodiment of the present application. The video encoding apparatus 140 can be used to perform any of the video encoding methods provided above, for example, to perform the steps performed by the encoder in the video encoding method shown in FIG. 8 .

For example, the video encoding apparatus 140 may include: an acquisition unit 1401 , a generation unit 1402 and a compression unit 1403 .

The acquiring unit 1401 is configured to acquire multiple original video streams, wherein the multiple original video streams are obtained based on the video streams collected by multiple cameras for the same spatial region in the same time period. The generating unit 1402 is configured to generate at least one target video stream according to the multiple original video streams, wherein the target video stream is a video stream obtained by selecting a certain number of frame images from the original video streams corresponding to each camera according to the set direction . A compression unit 1403, configured to compress the at least one target video stream. For example, with reference to FIG. 8 , the acquiring unit 1401 may be configured to perform the receiving action corresponding to S102 , the generating unit 1402 may be configured to perform S103 , and the compressing unit 1403 may be configured to perform S105 .

Optionally, the time stamps corresponding to the images in the target video stream are consecutive.

Optionally, the multiple original video streams include a first original video stream, and the target video stream takes a first frame image in the first original video stream as a starting point.

Optionally, the video encoding apparatus 140 further includes: a sending unit 1404 and an encapsulating unit 1405 .

Optionally, the generating unit 1402 is further configured to generate an index of the at least one target video stream. The index of the target video stream includes: the identification of the camera corresponding to the target video stream and the category of the target video stream, the camera corresponding to the target video stream is the camera corresponding to the first frame image in the target video stream, and the category of the video stream is used to represent the video The stream is the original video stream or the destination video stream. The sending unit 1404 is configured to send the index of the at least one target video stream. For example, with reference to FIG. 8 , the generating unit 1402 can be used to execute S106, and the sending unit 1404 can be used to execute S107.

Optionally, the set direction is a surrounding direction of the plurality of cameras, and the surrounding direction includes a clockwise direction or a counterclockwise direction.

Optionally, the encapsulation unit 1405 is configured to encapsulate the compressed at least one target video stream to obtain multiple segments. The generating unit 1402 is further configured to generate indexes of the plurality of segments. The sending unit 1404 is configured to send the indices of the multiple segments.

For the specific description of the foregoing optional manners, reference may be made to the foregoing method embodiments, which will not be repeated here. In addition, the explanation of any of the video encoding apparatuses 140 provided above and the description of the beneficial effects may refer to the corresponding method embodiments above, which will not be repeated.

As an example, with reference to FIG. 7 , some or all of the functions implemented in the acquiring unit 1401 , the generating unit 1402 , the compression unit 1403 , and the encapsulating unit 1405 in the video encoding apparatus 140 may be executed by the processor 701 in FIG. 7 . Program code in memory 702 is implemented. The sending unit 1404 may be implemented by the sending unit in the communication interface 703 in FIG. 7 .

As shown in FIG. 15 , FIG. 15 shows a schematic structural diagram of a video playback apparatus 150 provided by an embodiment of the present application. The video playback apparatus 150 can be used to perform any of the video playback methods provided above, for example, to perform the steps performed by the terminal in the video playback method shown in FIG. 11 .

For example, the video playback device 150 may include: a receiving unit 1501, a decompressing unit 1502, and a playing unit 1503.

The receiving unit 1501 is configured to receive a video clip to be played, and the video clip to be played is selected from a first target video stream. The first target video stream is a video stream obtained by selecting a certain number of frame images from original video streams corresponding to each of the plurality of cameras according to a set direction. The multiple original video streams corresponding to the multiple cameras are obtained based on the video streams collected by the multiple cameras for the same spatial area in the same time period. The decompression unit 1502 is configured to decompress the video segment to be played. The playing unit 1503 is used to play the decompressed video segment. For example, referring to FIG. 11 , the receiving unit 150 can be used to perform the receiving action corresponding to S209 , the decompression unit 1502 can be used to perform the decompression step in S210 , and the playing unit 1503 can be used to perform the playing step in S210 .

Optionally, the time stamps corresponding to the images in the first target video stream are consecutive.

Optionally, the multiple original video streams include a first original video stream, and the first target video stream takes a first frame image in the first original video stream as a starting point.

Optionally, the video playback apparatus 150 further includes: a sending unit 1504 and a determining unit 1505 .

Optionally, the sending unit 1504 is configured to send a request message, where the request message includes an index of the first target video stream. The index of the first target video stream is used to determine the first target video stream. The index of the first target video stream includes an identifier of a camera corresponding to the first target video stream and a category of the first target video stream. The camera corresponding to the first target video stream is the camera corresponding to the first frame image in the first target video stream. The category of the video stream is used to characterize whether the video stream is the original video stream or the target video stream. For example, in conjunction with FIG. 11 , the sending unit 1504 can be used to perform S207.

Optionally, the request message further includes the index of the target segment to which the video segment to be played belongs. The receiving unit 1501 is specifically configured to: receive the target segment.

Optionally, the determining unit 1505 is configured to determine the wraparound direction of the video clip to be played and the timestamp corresponding to the start image of the video clip to be played; and, based on the wraparound direction of the video clip to be played and the start of the video clip to be played The timestamp corresponding to the image determines the index of the first target video stream.

Optionally, the wrapping direction corresponding to the first target video stream is the same as the wrapping direction of the video clip to be played. The first target video stream includes the image of the previous frame of the first image in the currently playing video stream, and the timestamp corresponding to the first image is the same as the timestamp corresponding to the starting image.

For the specific description of the foregoing optional manners, reference may be made to the foregoing method embodiments, which will not be repeated here. In addition, the explanation and description of the beneficial effects of any one of the video playback apparatuses 150 provided above may refer to the above-mentioned corresponding method embodiments, which will not be repeated.

As an example, with reference to FIG. 7 , some or all of the functions implemented in the decompression unit 1502 and the determination unit 1505 in the video playback device 150 may be implemented by the processor 701 in FIG. 7 executing the program code in the memory 702 in FIG. 7 . . The receiving unit 1501 may be implemented by the receiving unit in the communication interface 703 in FIG. 7 . The sending unit 1504 can be implemented by the sending unit in the communication interface 703 in FIG. 7 . The playback unit 1503 can be implemented by a display screen, an audio input and output device, etc. (not shown in FIG. 7 ).

As shown in FIG. 16 , FIG. 16 shows a schematic structural diagram of a video playback apparatus 160 provided by an embodiment of the present application. The video playback apparatus 160 can be used to perform any of the video playback methods provided above, for example, to perform the steps performed by the network device in the video playback method shown in FIG. 11 .

For example, the video playback apparatus 160 may include: a determining unit 1601 and a sending unit 1602 .

The determining unit 1601 is configured to determine the video stream segment to be played. Wherein, the video clip to be played is selected from the first target video stream, and the first target video stream is a video stream obtained by selecting a certain number of frame images from the original video streams corresponding to each camera in the plurality of cameras according to the set direction. The multiple original video streams corresponding to the multiple cameras are obtained based on the video streams collected by the multiple cameras for the same spatial area in the same time period. The sending unit 1602 is configured to send the to-be-played video stream segment to the terminal. For example, with reference to FIG. 11 , the determining unit 1601 can be used to perform S208, and the sending unit 1602 can be used to perform S209.

Optionally, the video playback apparatus 160 may further include: a receiving unit 1603 . The receiving unit 1603 is configured to receive a request message sent by the terminal, where the request message is used to request a video clip to be played.

Optionally, the request message includes an index of the first target video stream, and the index of the first target video stream includes an identifier of a camera corresponding to the first target video stream and a category of the first target video stream. Wherein, the camera corresponding to the first target video stream is the camera corresponding to the first frame image in the first target video stream. The category of the video stream is used to characterize whether the video stream is the original video stream or the target video stream. The determining unit 1601 is further configured to determine the first target video stream based on the index of the first target video stream.

Optionally, the sending unit 1602 is further configured to send the index of at least one target video stream to the terminal. The at least one target video stream is generated according to the plurality of original video streams, and the at least one target video stream includes a first target video stream.

Optionally, the request message further includes an index of the target segment to which the video segment to be played belongs. The determining unit 1601 is further configured to, based on the index of the target segment, determine the target segment from the first target video stream. In this case, the sending unit 1602 is specifically configured to send the target segment to the terminal.

Optionally, the sending unit 1602 is further configured to send the index of the segment in the at least one target video stream to the terminal. Wherein, the at least one target video stream is generated according to the plurality of original video streams, and the at least one target video stream includes a first target video stream.

For the specific description of the foregoing optional manners, reference may be made to the foregoing method embodiments, which will not be repeated here. In addition, for the explanation of any of the above-mentioned video playback apparatuses 160 and the description of the beneficial effects, reference may be made to the above-mentioned corresponding method embodiments, which will not be repeated.

As an example, with reference to FIG. 7 , part or all of the functions implemented by the determining unit 1601 in the video playback device 160 may be implemented by the processor 701 in FIG. 7 executing the program codes in the memory 702 in FIG. 7 . The sending unit 1602 may be implemented by the sending unit in the communication interface 703 in FIG. 7 . The receiving unit 1603 may be implemented by the receiving unit in the communication interface 703 in FIG. 7 .

The embodiment of the present application also provides a video system, including a network device and a terminal. A network device, used for: acquiring multiple original video streams, which are obtained based on video streams collected by multiple cameras for the same spatial area in the same time period; and generating at least one target video based on the multiple original video streams The target video stream is a video stream obtained by selecting a certain number of frame images from the original video stream corresponding to each camera according to the set direction; and, compressing at least one target video stream. The terminal is used for: receiving the video clip to be played from the network device, the video clip to be played is selected from the first target video stream in the compressed at least one target video stream; and decompressing the video clip to be played, and then playing the decompressed Video clips after.

Wherein, the network device here may be the encoder described above. Reference may be made to the above with regard to other functions performed by the encoder, for example with reference to the embodiment shown in Figure 8 above. The terminal may be the terminal described above. For other functions performed by the terminal, reference may be made to the above, for example, the embodiment shown in FIG. 11 above.

In an implementation manner, the network device may also be used to perform the steps performed by the distribution-side system in the embodiment shown in FIG. 11 above.

In another implementation manner, the video system may further include a distribution-side system, where the distribution-side system is used for sending the encoding result with the encoder, and distributing the to-be-played video segment to the terminal. For the specific implementation method, refer to the above, for example, refer to the embodiment shown in FIG. 11 above.

Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program runs on a computer, the computer is made to execute any one of the above-mentioned methods. Video encoding method or video playback method.

For the explanation and description of the beneficial effects of any fault handling system and computer-readable storage medium provided above, reference may be made to the above-mentioned corresponding embodiments, which will not be repeated here.

The embodiment of the present application also provides a chip. The chip integrates a control circuit and one or more ports for realizing the functions of the video encoding device 140 , the video playing device 150 or the video playing device 160 . Optionally, for the functions supported by the chip, reference may be made to the above, which will not be repeated here. Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments can be completed by instructing relevant hardware through a program. The described program can be stored in a computer-readable storage medium. The above-mentioned storage medium may be a read-only memory, a random access memory, or the like. The above-mentioned processing unit or processor can be a central processing unit, a general-purpose processor, a specific integrated circuit (application specific integrated circuit, ASIC), a microprocessor (digital signal processor, DSP), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof.

The embodiments of the present application also provide a computer program product containing instructions, when the instructions are run on a computer, the instructions cause the computer to execute any one of the methods in the foregoing embodiments. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. Computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from a website site, computer, server, or data center over a wire (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (eg infrared, wireless, microwave, etc.) means to transmit to another website site, computer, server or data center. Computer-readable storage media can be any available media that can be accessed by a computer or data storage devices including one or more servers, data centers, etc., that can be integrated with the media. Useful media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, SSDs), and the like.

It should be noted that the above-mentioned devices for storing computer instructions or computer programs provided in the embodiments of the present application, such as but not limited to, the above-mentioned memory, computer-readable storage medium, and communication chip, etc., are all non-transitory (non-transitory) .

Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that these measures cannot be combined to advantage. Although the application has been described in conjunction with specific features and embodiments thereof, various modifications and combinations may be made without departing from the spirit and scope of the application. Accordingly, this specification and drawings are merely exemplary illustrations of the application as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations or equivalents within the scope of this application.

Claims

A video coding method, characterized in that the method comprises:

Acquiring multiple original video streams, wherein the multiple original video streams are obtained based on video streams collected by multiple cameras for the same spatial area in the same time period;

generating at least one target video stream according to the plurality of original video streams, wherein the target video stream is a video stream obtained by selecting a certain number of frame images from the original video streams corresponding to each camera according to a set direction;

The at least one target video stream is compressed.
The method according to claim 1, wherein the time stamps corresponding to the images in the target video stream are continuous.
The method according to claim 1 or 2, wherein the multiple original video streams comprise a first original video stream, and the target video stream starts from a first frame image in the first original video stream .
The method according to any one of claims 1 to 3, wherein the method further comprises:

Generate and send the index of the at least one target video stream; wherein, the index of the target video stream includes: the identifier of the camera corresponding to the target video stream and the category of the target video stream, the target video stream corresponding The camera is the camera corresponding to the first frame image in the target video stream, and the category of the video stream is used to indicate whether the video stream is the original video stream or the target video stream.
The method according to any one of claims 1 to 4, wherein the set direction is a surrounding direction of the plurality of cameras, and the surrounding direction includes a clockwise direction or a counterclockwise direction.
The method according to any one of claims 1 to 5, wherein the method further comprises:

encapsulating the compressed at least one target video stream to obtain multiple segments;

An index of the plurality of segments is generated and sent.
A video playback method, characterized in that the method comprises:

Receive a video clip to be played, the video clip to be played is selected from a first target video stream, and the first target video stream is to select a certain number of original video streams corresponding to each camera in a plurality of cameras according to a set direction A video stream obtained from a frame image; the multiple original video streams corresponding to the multiple cameras are obtained based on the video streams collected by the multiple cameras for the same spatial area in the same time period;

Decompress the to-be-played video clip, and then play the decompressed video clip.
The method according to claim 7, wherein the time stamps corresponding to the images in the first target video stream are consecutive.
The method according to claim 7 or 8, wherein the plurality of original video streams comprise a first original video stream, and the first target video stream uses a first frame image in the first original video stream as a starting point.
The method according to any one of claims 7 to 9, wherein the method further comprises:

Send a request message, where the request message includes an index of the first target video stream, where the index of the first target video stream is used to determine the first target video stream; wherein, the index of the first target video stream Including the identification of the camera corresponding to the first target video stream and the category of the first target video stream; the camera corresponding to the first target video stream is corresponding to the first frame image in the first target video stream The category of camera, video stream is used to characterize whether the video stream is the original video stream or the target video stream.
The method according to claim 10, wherein the request message further includes an index of a target segment to which the video clip to be played belongs; and the receiving the video stream clip to be played comprises:

The target segment is received.
The method according to any one of claims 7 to 11, wherein the method further comprises:

determining the wraparound direction of the to-be-played video clip and the timestamp corresponding to the start image of the to-be-played video clip;

The index of the first target video stream is determined based on the wraparound direction of the to-be-played video clip and the timestamp corresponding to the start image of the to-be-played video clip.
The method of claim 12, wherein:

The wrapping direction corresponding to the first target video stream is the same as the wrapping direction of the to-be-played video clip;

The first target video stream includes an image of the previous frame of the first image in the currently playing video stream, and the timestamp corresponding to the first image is the same as the timestamp corresponding to the starting image.
A video encoding device, characterized in that the device comprises:

an acquisition unit, configured to acquire multiple original video streams, wherein the multiple original video streams are obtained based on the video streams collected by multiple cameras in the same space region in the same time period;

A generating unit, configured to generate at least one target video stream according to the plurality of original video streams, where the target video stream is a video stream obtained by selecting a certain number of frame images from the original video streams corresponding to each camera according to a set direction ;

a compression unit, configured to compress the at least one target video stream.
The apparatus according to claim 14, wherein the time stamps corresponding to the images in the target video stream are consecutive.
The apparatus according to claim 14 or 15, wherein the plurality of original video streams comprise a first original video stream, and the target video stream takes a first frame image in the first original video stream as a starting point .
The device according to any one of claims 14 to 16, characterized in that:

The generating unit is further configured to generate an index of the at least one target video stream; wherein, the index of the target video stream includes: an identifier of a camera corresponding to the target video stream and a category of the target video stream, and the index of the target video stream includes: The camera corresponding to the target video stream is the camera corresponding to the first frame image in the target video stream, and the category of the video stream is used to characterize whether the video stream is an original video stream or a target video stream;

The apparatus further includes: a sending unit configured to send an index of the at least one target video stream.
The device according to any one of claims 14 to 17, wherein the set direction is a surrounding direction of the plurality of cameras, and the surrounding direction includes a clockwise direction or a counterclockwise direction.
The device according to any one of claims 14 to 18, wherein the device further comprises: an encapsulating unit and a sending unit;

the encapsulation unit, configured to encapsulate the compressed at least one target video stream to obtain a plurality of segments;

The generating unit is further configured to generate indexes of the multiple segments;

The sending unit is configured to send the indices of the multiple segments.
A video playback device, characterized in that the device comprises:

A receiving unit, configured to receive a video clip to be played, the video clip to be played is selected from a first target video stream, and the first target video stream is an original video corresponding to each camera in the plurality of cameras according to a set direction A video stream obtained by selecting a certain number of frame images in the stream; the multiple original video streams corresponding to the multiple cameras are obtained based on the video streams collected by the multiple cameras in the same time period for the same spatial area;

a decompression unit, configured to decompress the to-be-played video clip;

The playback unit is used to play the decompressed video segment.
The apparatus according to claim 20, wherein the time stamps corresponding to the images in the first target video stream are consecutive.
The apparatus according to claim 20 or 21, wherein the plurality of original video streams comprise a first original video stream, and the first target video stream uses a first frame image in the first original video stream as a starting point.
The device according to any one of claims 20 to 22, wherein the device further comprises:

a sending unit, configured to send a request message, where the request message includes an index of the first target video stream, and the index of the first target video stream is used to determine the first target video stream; wherein the first target video stream is The index of the target video stream includes the identifier of the camera corresponding to the first target video stream and the category of the first target video stream; the camera corresponding to the first target video stream is the first target video stream in the first target video stream. The camera corresponding to a frame of image, and the category of the video stream is used to indicate whether the video stream is the original video stream or the target video stream.
The apparatus according to claim 23, wherein the request message further comprises an index of the target segment to which the video segment to be played belongs;

The receiving unit is specifically configured to: receive the target segment.
The device according to any one of claims 20 to 24, wherein the device further comprises:

a determining unit, configured to determine the wrapping direction of the video clip to be played and the time stamp corresponding to the start image of the video clip to be played; and, based on the wrapping direction of the video clip to be played and the video clip to be played The time stamp corresponding to the starting image of the first target video stream is determined.
The apparatus of claim 25, wherein:

The wrapping direction corresponding to the first target video stream is the same as the wrapping direction of the to-be-played video clip;

The first target video stream includes an image of the previous frame of the first image in the currently playing video stream, and the timestamp corresponding to the first image is the same as the timestamp corresponding to the starting image.
A video system, comprising: a network device and a terminal;

The network device is configured to: acquire multiple original video streams, the multiple original video streams are obtained based on video streams collected by multiple cameras for the same spatial area in the same time period; and based on the multiple original video streams , generating at least one target video stream, the target video stream is a video stream obtained by selecting a certain number of frame images from the original video stream corresponding to each camera according to the set direction; And, compressing the at least one target video stream;

The terminal is configured to: receive a to-be-played video clip from the network device, the to-be-played video clip is selected from a first target video stream in the compressed at least one target video stream; Play the video clip for decompression, and then play the decompressed video clip.
A video encoding device, comprising a memory and a processor, wherein the memory is used to store a computer program, and the processor is used to call the computer program to execute the method according to any one of claims 1 to 6 method.
A video playback device, characterized in that it comprises a memory and a processor, the memory is used to store a computer program, and the processor is used to call the computer program to execute any one of claims 7 to 13. method.