CN112738508A - Video coding method, video determining method, video processing method, server and VR terminal


Info

Publication number
CN112738508A
CN112738508A
Authority
CN
China
Prior art keywords
video
frames
paths
fov
frame
Prior art date
Legal status
Pending
Application number
CN202011624218.6A
Other languages
Chinese (zh)
Inventor
陈旻
宋利
杨榛
邢刚
冯亚楠
蔡卫勇
柳建龙
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202011624218.6A priority Critical patent/CN112738508A/en
Publication of CN112738508A publication Critical patent/CN112738508A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a video coding method, a video stream determining method, a video processing method, a server and a VR terminal, relates to the technical field of virtual reality, and aims to solve the problems that the existing video stream switching mode occupies a large amount of network bandwidth and has a fluctuating smooth-switching time. The encoding method comprises obtaining N paths of video streams corresponding to a target video, where N is a positive integer and N ≥ 2, the number of P frames in the initial GOPs of the N paths of video streams is different, and the number of P frames in the non-initial GOPs is the same. The embodiment of the invention can achieve a low smooth-switching delay with low playing resource occupation.

Description

Video coding method, video determining method, video processing method, server and VR terminal
Technical Field
The invention relates to the technical field of virtual reality, in particular to a video coding method, a video determining method, a video processing method, a server and a VR terminal.
Background
Although a 360° VR (Virtual Reality) streaming panoramic video covers a full 360° frame, the visible range of the human eye is only about 110°, so at any given moment most of the panoramic picture lies outside the visible range and is wasted. To save playback resources (bandwidth and decoding effort), VR FOV (Field of View) techniques have emerged. VR FOV is a scheme that encodes and transmits FOV streaming media in which the full panorama is carried at low quality and the visible area at high quality; it saves playing resources while affecting the user experience as little as possible.
At present, in order to increase the compression ratio, video streams based on H.265/HEVC coding use an "inter-frame prediction" mechanism: the video stream is composed of groups, each consisting of one key frame (I frame, which can be decoded independently) followed by a number of predicted frames (P frames, which cannot be decoded independently), where decoding a P frame requires the decoding result of the previous frame of video. Generally, an I frame occupies far more bytes than a P frame.
VR FOV requires a short smooth-switching (transition) time (MTP latency) when the view angle changes: the region that was rendered at high quality (in front of the head) is replaced by the region that was at low quality (behind the head), which must in turn become high quality. For the high-quality code streams of the front and rear view angles, this is essentially a switch between two discontinuous video streams. If the frame of the new view at the switching time point is a P frame, that frame cannot be decoded with reference to the last frame of the old view, because its preceding frame belongs to a different, discontinuous piece of video.
In the prior art, when handling the switch of the FOV high-quality video stream triggered by the user turning the head, the number of I frames is increased and the period between I frames in the video stream is shortened, in order to ensure that decoding proceeds normally across the switch and to reduce MTP latency.
However, increasing the number of I frames increases the volume of the H.265/HEVC-compressed video stream, which raises network bandwidth occupation, or lowers video quality at the same bitrate. Moreover, because head-turn events happen at random moments, the decoding side cannot accurately control the interval between the current P frame and its reference I frame, nor the time consumed by invalid decoding, which causes the MTP latency to fluctuate and grow.
Disclosure of Invention
The embodiments of the invention provide a video coding method, a video stream determining method, a video processing method, a server and a VR terminal, aiming to solve the problems of high network bandwidth occupation and fluctuating smooth-switching time in the conventional video stream switching mode.
In a first aspect, an embodiment of the present invention provides a video encoding method, including:
acquiring N paths of video streams corresponding to the target video, where N is a positive integer and N ≥ 2, the number of P frames in the initial GOPs differs among the N paths of video streams, and the number of P frames in the non-initial GOPs is the same.
Optionally, the obtaining N paths of video streams corresponding to the target video includes:
determining the number of P frames in the initial GOP corresponding to the N paths of video streams respectively;
determining the number of P frames in a non-initial GOP of N paths of video streams;
and coding the target video according to the number of P frames in the initial GOP corresponding to the N paths of video streams and the number of P frames in the non-initial GOP of the N paths of video streams to obtain the N paths of video streams of the target video.
Optionally, the determining the number of P frames in the starting GOP corresponding to each of the N paths of video streams includes:
and determining the number of P frames in the initial GOP corresponding to the N paths of video streams respectively according to the number of P frames in the non-initial GOP of the N paths of video streams, the stream sequence number of the N paths of video streams and the encoding path number of the target video.
In a second aspect, an embodiment of the present invention provides a method for determining a video stream, including:
when it is determined that the virtual reality (VR) terminal is switched from a first FOV to a second FOV, determining a first video stream from N paths of video streams according to attribute information of the N paths of video streams corresponding to a target video;
wherein the first FOV and the second FOV are field angles at which the VR terminal plays the target video; the attribute information includes the number of P frames in the starting GOPs and the number of P frames in the non-starting GOPs of the N paths of video streams; the number of P frames in the starting GOPs differs among the N paths of video streams, the number of P frames in the non-starting GOPs is the same, and N is a positive integer with N ≥ 2; among the N paths of video streams, the target P frame in the first video stream is the closest to its nearest I frame, the target P frame being the P frame corresponding to the second FOV at the switching moment.
Optionally, the step of determining a first video stream from the N video streams according to the attribute information of the N video streams corresponding to the target video includes:
calculating an L value according to the video playing time corresponding to the FOV switching time, the number of P frames in the initial GOP between the N paths of video streams and the number of P frames in the non-initial GOP; the L value is the number of P frames spaced between the nearest I frame and the P frame corresponding to the second FOV at the view angle switching moment in each path of video stream;
and taking the video stream with the minimum L value as the first video stream.
In a third aspect, an embodiment of the present invention further provides a video processing method, applied to a server, including:
receiving request information which is sent by a VR terminal and used for acquiring a first video stream;
sending the first video stream to the VR terminal according to the request information;
wherein the request information is sent when the VR terminal is switched from a first FOV to a second FOV, and the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of switching the view angle; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2.
In a fourth aspect, an embodiment of the present invention further provides a video processing method, applied to a VR terminal, including:
under the condition that the first FOV is determined to be switched to the second FOV, request information for acquiring the first video stream is sent to the server side; the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; and the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of view angle switching; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2;
receiving the first video stream sent by the server according to the request information;
and taking the first video stream as a playing source of the second FOV for decoding and playing.
In a fifth aspect, an embodiment of the present invention further provides a server, including: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; the processor is configured to read the program in the memory to implement the steps in the method according to the first aspect, the second aspect, or the third aspect.
In a sixth aspect, an embodiment of the present invention further provides a VR terminal, including: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; the processor is configured to read the program in the memory to implement the steps of the method according to the second aspect or the fourth aspect.
In a seventh aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the method according to the first aspect, the second aspect, the third aspect, or the fourth aspect.
In the embodiment of the invention, N paths of video streams corresponding to a target video are obtained by encoding, where N is a positive integer and N ≥ 2, the number of P frames in the initial GOPs differs among the N paths of video streams, and the number of P frames in the non-initial GOPs is the same. A larger number of P frames can therefore be set in the non-initial GOPs, which yields a larger I-frame interval, ensures that each video stream obtains a larger compression ratio, and saves network resources. Meanwhile, the VR terminal can select, from the N paths of video streams, the stream whose I frame is closest to the current target P frame, so that the time consumed by invalid decoding is effectively controlled and the smooth-switching time is reduced. Therefore, the scheme of the embodiment of the invention can achieve a low smooth-switching delay with low playing resource occupation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flowchart of a video encoding method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a 3-way encoded video stream provided by an embodiment of the present invention;
fig. 3 is a flowchart of a method for determining a video stream according to an embodiment of the present invention;
fig. 4 is a flowchart of a video processing method according to an embodiment of the present invention;
FIG. 5 is a second flowchart of a video processing method according to an embodiment of the present invention;
fig. 6 is a block diagram of a video encoding apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of a video stream determination apparatus according to an embodiment of the present invention;
FIG. 8 is a block diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 9 is a second block diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 10 is a schematic hardware structure diagram of a server according to an embodiment of the present invention;
fig. 11 is a second schematic diagram of the hardware structure of the server according to the embodiment of the present invention;
fig. 12 is a third schematic diagram of a hardware structure of a server according to an embodiment of the present invention;
fig. 13 is a schematic diagram of a hardware structure of a VR terminal according to an embodiment of the present invention;
fig. 14 is a second hardware structure diagram of the VR terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a video encoding method provided by an embodiment of the present invention, where the video encoding method is applied to a server, and as shown in fig. 1, the method includes the following steps:
Step 101, acquiring N paths of video streams corresponding to a target video, where N is a positive integer and N ≥ 2, the number of P frames in the initial GOPs differs among the N paths of video streams, and the number of P frames in the non-initial GOPs is the same;
Here, a GOP starts at an I frame and extends up to, but not including, the next I frame. A GOP is a group of consecutive pictures in the sequence. The first picture of a GOP must be an I frame, which ensures that the GOP can be decoded independently without reference to other pictures. In this embodiment, a GOP includes one I frame, which is an intra-coded frame carrying a complete picture, and a plurality of P frames, which are forward-predicted frames that record the changes relative to their reference frames. It can be appreciated that without the I frame, the P frames cannot be decoded.
As shown in fig. 2, the starting GOP is a GOP in which the first I frame of the target video is located, and GOPs subsequent to the starting GOP are all non-starting GOPs.
Illustratively, as shown in fig. 2, which is a schematic diagram of the 3 video streams corresponding to the target video, N is taken as 3 for the purpose of illustration. Among the 3 video streams, the starting GOP of the stream with sequence number 1 contains 3 P frames, the starting GOP of the stream with sequence number 2 contains 6 P frames, and the starting GOP of the stream with sequence number 3 contains 9 P frames, so the starting GOPs of the 3 encoded video streams are all different. The number of P frames in the non-starting GOPs of the 3 video streams is the same, namely 9 frames.
It should be noted that taking N as 3 is merely an example; in practical applications, the number N of encoding paths of the target video may be determined according to at least one of the storage space available for the target video, the GOP size, and the tolerable view-switching time.
By making the number of P frames in the starting GOP differ while keeping the number of P frames in the non-starting GOPs the same, the method encodes multiple video streams whose I-frame insertion points have unequal offsets on the time axis while each stream still has a large I-frame interval. This ensures that every video stream obtains a large compression ratio, which in turn saves network resources.
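For a rough illustration of this staggered layout, the sketch below builds the I/P frame pattern of each stream from the starting-GOP and non-starting-GOP P-frame counts used in fig. 2; the helper name frame_pattern and the 32-frame horizon are illustrative choices, not part of the patent.

```python
# Sketch: I/P frame pattern of each encoded stream (layout as in fig. 2).
def frame_pattern(start_gop_p: int, non_start_gop_p: int, total_frames: int) -> str:
    """Return a string of 'I'/'P' characters describing one stream's frame types."""
    pattern = "I" + "P" * start_gop_p            # starting GOP
    while len(pattern) < total_frames:
        pattern += "I" + "P" * non_start_gop_p   # every later GOP is identical
    return pattern[:total_frames]

# Three streams as in fig. 2: starting GOPs with 3, 6 and 9 P frames, non-starting GOPs with 9.
for seq, start_p in enumerate((3, 6, 9), start=1):
    print(f"stream {seq}: {frame_pattern(start_p, 9, 32)}")
# The I frames land at different time-axis positions across the streams,
# while the steady-state I-frame interval (10 frames) stays the same.
```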
In the above embodiment, because the number of P frames in the initial GOPs differs among the N video streams while the number of P frames in the non-initial GOPs is the same, each video stream can obtain a larger compression ratio, which saves network resources. In addition, the VR terminal can select from the N video streams the stream whose I frame is closest to the current target P frame, so that it can accurately control the interval between the target P frame and the corresponding reference I frame when decoding, thereby reducing the invalid decoding workload and the smooth-switching (transition) time when the user turns the head. Therefore, the scheme of this embodiment achieves a low smooth-switching delay with low playing resource occupation.
In one embodiment, the step 101 includes:
determining the number of P frames in the initial GOP corresponding to the N paths of video streams respectively;
determining the number of P frames in a non-initial GOP of N paths of video streams;
and coding the target video according to the number of P frames in the initial GOP corresponding to the N paths of video streams and the number of P frames in the non-initial GOP of the N paths of video streams to obtain the N paths of video streams of the target video.
Specifically, in practical applications, the number of P frames in the non-initial GOP of the N video streams may be determined according to the video compression ratio requirement and/or the video quality requirement. It will be appreciated that the greater the number of P frames in the non-starting GOP, the greater the I frame interval, and the greater the video stream compression ratio.
Illustratively, taking N as 3, the number of P frames in the starting GOP and the number of P frames in the non-starting GOP may be passed to an H.265/HEVC encoder, and the same input target video source may be encoded three times to obtain 3 high-definition H.265/HEVC compressed video streams; the I-frame intervals of the non-starting GOPs are the same across the 3 streams, but the time-axis positions of the I frames differ.
In an embodiment, the determining the number of P frames in a starting GOP corresponding to each of the N video streams includes:
and determining the number of P frames in the initial GOP corresponding to the N paths of video streams respectively according to the number of P frames in the non-initial GOP of the N paths of video streams, the stream sequence number of the N paths of video streams and the encoding path number of the target video.
Specifically, for each video stream of the N video streams, determining the number of P frames in a start GOP corresponding to each video stream of the N video streams includes:
according to the formula
M = ⌊(F × S) / N⌋,
determining the number of P frames in the starting GOP of the given video stream among the N paths of video streams; where M is the number of P frames in the starting GOP of that video stream; F is the number of P frames in the non-starting GOPs of the N paths of video streams; S is the stream sequence number of that video stream; and N is the number of encoding paths of the target video.
For example, when the number of H.265/HEVC high-quality video streams is specified as 3 and the number of P frames in the non-starting GOP is 9, then for the video stream with sequence number (stream number) 1,
M = ⌊(9 × 1) / 3⌋ = 3,
that is, the starting GOP of the video stream with sequence number 1 contains 3 P frames; for the video stream with sequence number 2,
M = ⌊(9 × 2) / 3⌋ = 6,
that is, the starting GOP of the video stream with sequence number 2 contains 6 P frames; and for the video stream with sequence number 3,
M = ⌊(9 × 3) / 3⌋ = 9,
that is, the starting GOP of the video stream with sequence number 3 contains 9 P frames.
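Assuming the formula above reads M = ⌊(F × S) / N⌋, the sketch below derives each stream's starting-GOP size and the timestamps at which I frames would have to be inserted; the function names are hypothetical, and the ffmpeg invocation in the closing comment is only one possible way to realize such a layout, not a step prescribed by the patent.

```python
# Sketch: per-stream starting-GOP sizes and forced I-frame timestamps.
def start_gop_p_frames(F: int, S: int, N: int) -> int:
    """M = floor(F * S / N): P frames in the starting GOP of stream S (1-based) out of N."""
    return (F * S) // N

def keyframe_times(F: int, S: int, N: int, fps: float, duration_s: float) -> list:
    """Timestamps (seconds) at which I frames must be inserted for stream S."""
    M = start_gop_p_frames(F, S, N)
    times, frame = [], 0
    while frame / fps < duration_s:
        times.append(round(frame / fps, 3))
        frame += (M if frame == 0 else F) + 1    # GOP length = 1 I frame + M (or F) P frames
    return times

F, N, fps = 9, 3, 30.0
for S in range(1, N + 1):
    print(f"stream {S}: M={start_gop_p_frames(F, S, N)}, I frames at {keyframe_times(F, S, N, fps, 1.5)}")
# One possible (assumed) way to enforce this layout with ffmpeg/libx265, per stream:
#   ffmpeg -i source.mp4 -c:v libx265 -force_key_frames "0,0.133,0.467,..." stream1.mp4
```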
Referring to fig. 3, fig. 3 is a flowchart of a video stream determination method provided by an embodiment of the present invention, where the video stream determination method is applied to a server or a VR terminal, as shown in fig. 3, and includes the following steps:
step 301, when it is determined that the virtual reality VR terminal is switched from a first FOV to a second FOV, determining a first video stream from N video streams according to attribute information of the N video streams corresponding to a target video;
wherein the first FOV and the second FOV are field angles at which the VR terminal plays the target video; the attribute information includes the number of P frames in the starting GOPs and the number of P frames in the non-starting GOPs of the N paths of video streams; the number of P frames in the starting GOPs differs among the N paths of video streams, the number of P frames in the non-starting GOPs is the same, and N is a positive integer with N ≥ 2; among the N paths of video streams, the target P frame in the first video stream is the closest to its nearest I frame, the target P frame being the P frame corresponding to the second FOV at the switching moment.
As an implementation manner, when a user wearing the VR terminal turns his head, the VR terminal switches the first field angle FOV to the second FOV, and according to the attribute information of the N paths of video streams, the VR terminal determines, from the N paths of video streams, a video stream (first video stream) where an I frame closest to a target P frame is located, where the target P frame is a P frame corresponding to the second FOV at the switching time, and then sends request information for acquiring the first video stream to the server, so as to acquire the first video stream from the server, and finally, performs decoding processing and playing. In this way, the interval between the target P frame and the corresponding reference I frame can be accurately controlled, thereby reducing the ineffective decoding workload and the smooth switching (transition) time during the head turning.
In this step, the attribute information of the N paths of video streams is obtained by the VR terminal and the server in advance, or is obtained by the VR terminal after performing dynamic interaction with the server for each target video.
As another implementation manner, when a user wearing the VR terminal turns his head, the VR terminal switches the first field angle (FOV) to the second FOV, the server side acquires the field-angle switching signal of the VR terminal and determines the first video stream from the N paths of video streams; the first video stream is then sent to the VR terminal for decoding and playing. In this way, the interval between the target P frame and the corresponding reference I frame can be accurately controlled, reducing the invalid decoding workload and the smooth-switching (transition) time when the user turns the head.
In the above embodiment, when it is determined that view switching occurs, a first video stream is determined from the N video streams according to their attribute information and is provided to the VR terminal for decoding and playing. Because the number of P frames in the starting GOPs differs among the N video streams while the number of P frames in the non-starting GOPs is the same, each video stream can obtain a larger compression ratio, saving network resources, and the VR terminal can accurately control the interval between the target P frame and the corresponding reference I frame when decoding, thereby reducing the invalid decoding workload and the smooth-switching (transition) time when the user turns the head.
In one embodiment, step 301 comprises:
calculating an L value according to the video playing time corresponding to the FOV switching time, the number of P frames in the initial GOP between the N paths of video streams and the number of P frames in the non-initial GOP; the L value is the number of P frames spaced between the nearest I frame and the P frame corresponding to the second FOV at the view angle switching moment in each path of video stream;
and taking the video stream with the minimum L value as the first video stream.
Specifically, calculating the L value according to the video playing duration corresponding to the FOV switching time, the number of P frames in the start GOP between the N paths of video streams, and the number of P frames in the non-start GOP includes:
according to the formula
L = (⌊T0 × J / 1000⌋ − Q − 1) mod (F + 1),
calculating the number of P frames spaced between the target P frame and the nearest I frame in each path of video stream;
taking the video stream with the minimum L value as the first video stream;
where L is the number of P frames between the target P frame and the nearest preceding I frame in the given video stream, the target P frame being the P frame corresponding to the second FOV at the switching time; T0 is the video playing time corresponding to the FOV switching time, in milliseconds; Q is the number of P frames in the starting GOP of that stream; F is the number of P frames in the non-starting GOPs; and J is the frame rate, in frames per second.
As shown in fig. 2, assume the VR terminal is currently playing the first FOV using video stream 1. When it is determined that the first FOV is switched to the second FOV, the number L of P frames spaced between the target P frame (black frame) and the reference I frame in each video stream is calculated from the video playing time T0 corresponding to the FOV switching time according to the above formula. As can be seen from fig. 2, the target P frame in video stream 2 is separated from its reference I frame by the fewest P frames, so video stream 2 is taken as the first video stream to replace video stream 1 that corresponded to the original first FOV, and is used as the playing source of the high-quality video. The terminal redirects to the server, sends the request information, pulls video stream 2, and finds in video stream 2 the I frame that must be referenced to decode the target P frame corresponding to playing time T0; starting from that I frame, it decodes frame by frame up to the current target P frame, discards the frames before the target P frame, and begins presentation and playback from the target P frame.
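A minimal sketch of this selection step follows, assuming the L formula as given above (with the obvious special case when the switch still falls inside the starting GOP) and 1-based stream sequence numbers; choose_stream and its arguments are illustrative names rather than the patent's.

```python
# Sketch: pick the stream whose reference I frame is closest to the target P frame.
def p_frames_since_last_i(t0_ms: float, Q: int, F: int, J: float) -> int:
    """L for one stream: P frames between the nearest preceding I frame and the
    frame shown at playback time t0_ms (Q, F, J as defined in the text)."""
    k = int(t0_ms * J / 1000)          # frame index at the switching moment
    if k <= Q:                         # still inside the starting GOP
        return k
    return (k - Q - 1) % (F + 1)       # offset inside the current non-starting GOP

def choose_stream(t0_ms: float, start_gop_p: list, F: int, J: float) -> int:
    """Return the 1-based sequence number of the stream with the smallest L value."""
    L = [p_frames_since_last_i(t0_ms, Q, F, J) for Q in start_gop_p]
    return L.index(min(L)) + 1

# Fig. 2 configuration: starting GOPs of 3/6/9 P frames, F = 9, 30 fps, switch at 333 ms.
print(choose_stream(333.0, [3, 6, 9], F=9, J=30.0))   # -> 2: stream 2's target P frame sits only 2 P frames after its I frame
```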
Referring to fig. 4, fig. 4 is a flowchart of a video processing method according to an embodiment of the present invention, and as shown in fig. 4, the video processing method applied to a server includes the following steps:
step 401, receiving request information sent by a VR terminal for acquiring a first video stream;
step 402, sending the first video stream to the VR terminal according to the request information;
wherein the request information is sent when the VR terminal is switched from a first FOV to a second FOV, and the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of switching the view angle; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2.
Wherein the request information includes a stream sequence number of the first video stream.
In this step, the number of P frames in the initial GOPs of the N paths of video streams is different while the number of P frames in the non-initial GOPs is the same, so the multiple encoded video streams have I-frame insertion points with unequal offsets on the time axis while each stream keeps a large I-frame interval; this guarantees that every video stream obtains a large compression ratio and saves network resources.
In this embodiment, when a user wearing the VR terminal turns his head, the VR terminal switches the first field angle (FOV) to the second FOV, determines from the N video streams, according to their attribute information, the first video stream in which the I frame is closest to the P frame corresponding to the second field angle at the view-switching time, and sends request information for acquiring the first video stream to the server; the server sends the first video stream to the VR terminal according to the request information, so that the VR terminal can decode and play it. This embodiment ensures that each video stream obtains a large compression ratio, saving network resources, while the interval between the target P frame and the corresponding reference I frame can be accurately controlled, reducing the invalid decoding workload and the smooth-switching (transition) time when the user turns the head.
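A minimal sketch of the server side under these assumptions: the streams are stored as pre-encoded files keyed by their sequence number, and the handler simply returns the requested one. The file names and handler signature are invented for illustration and are not defined by the patent.

```python
# Minimal server-side sketch: return the requested FOV video stream by its
# sequence number (storage layout and handler signature are assumptions).
STREAMS = {
    1: "target_video_stream1.hevc",
    2: "target_video_stream2.hevc",
    3: "target_video_stream3.hevc",
}

def handle_request(stream_seq: int) -> bytes:
    """Look up the pre-encoded stream file and return its bytes to the VR terminal."""
    with open(STREAMS[stream_seq], "rb") as f:
        return f.read()
```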
Referring to fig. 5, fig. 5 is a flowchart of a video processing method according to an embodiment of the present invention, and as shown in fig. 5, the video processing method applied to a VR terminal includes the following steps:
step 501, in the case that the first FOV is switched to the second FOV, sending request information for acquiring the first video stream to a server; the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; and the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of view angle switching; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2;
in this step, the number of P frames in the initial GOP of the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, and the generated multiple paths of video streams are encoded according to the principle that the offsets of the insertion points of the I frames are not equal on the time axis, but the video streams have larger I frame intervals, so that each path of video stream can be guaranteed to obtain a larger compression ratio, and the occupation of network resources is saved.
Specifically, the step of determining the first video stream from the N paths of video streams corresponding to the target video includes: determining a first video stream from the N paths of video streams according to the attribute information of the N paths of video streams; wherein the attribute information includes: the number of P frames in a starting GOP and the number of P frames in a non-starting GOP among the N paths of video streams.
The attribute information of the N paths of video streams is obtained by the VR terminal and the server in advance, or is obtained by the VR terminal after performing dynamic interaction with the server for each target video.
In this step, when the user wearing the VR terminal turns his head, the VR terminal switches the first field angle (FOV) to the second FOV, determines from the N video streams, according to their attribute information, the video stream in which the I frame is closest to the P frame corresponding to the second field angle at the view-switching time, and sends request information for acquiring that video stream to the server so as to pull the stream.
Wherein the request information includes a stream sequence number of the target video stream.
Step 502, receiving the first video stream sent by the server according to the request information;
step 503, using the first video stream as the playing source of the second FOV, and performing decoding playing.
In this embodiment, when a user wearing the VR terminal turns his head, the VR terminal switches the first field angle (FOV) to the second FOV, determines from the N video streams, according to their attribute information, the first video stream in which the I frame is closest to the P frame corresponding to the second field angle at the view-switching time, and sends request information for acquiring the first video stream to the server; the server sends the first video stream to the VR terminal according to the request information, so that the VR terminal can decode and play it. This embodiment ensures that each video stream obtains a large compression ratio, saving network resources, while the interval between the target P frame and the corresponding reference I frame can be accurately controlled, reducing the invalid decoding workload and the smooth-switching (transition) time when the user turns the head.
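The terminal-side behaviour described above can be sketched as follows, assuming the stream (and its starting-GOP size Q) has already been chosen as in the selection step; pull_stream and decode_frame stand for the player's own network and decoder hooks, not interfaces defined in the patent.

```python
# Sketch of the terminal-side step after the FOV switch: pull the chosen stream,
# decode from the reference I frame, and present only from the target P frame on.
from typing import Any, Callable, Iterable

def play_from_switch(t0_ms: float, seq: int, Q: int, F: int, J: float,
                     pull_stream: Callable[[int], Iterable[bytes]],
                     decode_frame: Callable[[bytes], Any]) -> list:
    """Return the decoded pictures from the target P frame onward."""
    target = int(t0_ms * J / 1000)                               # frame index at the switch
    ref_i = 0 if target <= Q else target - (target - Q - 1) % (F + 1)  # nearest preceding I frame
    shown = []
    for idx, frame in enumerate(pull_stream(seq)):               # request carries the stream sequence number
        if idx < ref_i:
            continue                                             # not needed as a decoding reference
        picture = decode_frame(frame)                            # decode frame by frame from the I frame
        if idx >= target:
            shown.append(picture)                                # earlier decoded pictures are discarded
    return shown
```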
According to this scheme, the encoding side uses multiple high-quality video streams with a larger I-frame interval, and on the decoding side, when the FOV high-quality video switches view angle, the high-quality stream whose I frame is nearest is pulled from the server, so a low FOV MTP latency can be achieved with lower playing-resource occupation (bandwidth and decoding power). Therefore, at the same bitrate the video quality can be improved; at the same video quality the playback bandwidth occupation can be reduced; and the MTP latency metric can be improved.
The embodiment of the invention also provides a video coding device. Referring to fig. 6, fig. 6 is a block diagram of a video encoding apparatus according to an embodiment of the present invention.
As shown in fig. 6, the video encoding apparatus 600 includes:
an obtaining module 601, configured to obtain N paths of video streams corresponding to a target video, where N is a positive integer and N is greater than or equal to 2, where the number of P frames in starting group of pictures (GOP) of the N paths of video streams is different, and the number of P frames in non-starting GOPs is the same;
optionally, the obtaining module 601 includes:
the first acquisition submodule is used for determining the number of P frames in the initial GOP corresponding to the N paths of video streams respectively;
the second acquisition submodule is used for determining the number of P frames in a non-initial GOP of the N paths of video streams;
and the third obtaining sub-module is used for coding the target video according to the number of the P frames in the initial GOP corresponding to the N paths of video streams and the number of the P frames in the non-initial GOP of the N paths of video streams, so as to obtain the N paths of video streams of the target video.
Optionally, the first obtaining sub-module includes:
and the first acquisition unit is used for determining the number of the P frames in the initial GOP corresponding to the N paths of video streams respectively according to the number of the P frames in the non-initial GOP of the N paths of video streams, the stream sequence number of the N paths of video streams and the encoding path number of the target video.
The apparatus 600 provided in the embodiment of the present invention may implement the above-described video encoding method, which has similar implementation principles and technical effects, and this embodiment is not described herein again.
The embodiment of the invention also provides a video stream determining device. Referring to fig. 7, fig. 7 is a block diagram of a video stream determination apparatus according to an embodiment of the present invention.
As shown in fig. 7, the video stream determining apparatus 700 includes:
a determining module 701, configured to determine, when it is determined that the virtual reality VR terminal is switched from the first FOV to the second FOV, a first video stream from N video streams according to attribute information of the N video streams corresponding to the target video;
wherein the first FOV and the second FOV are field angles at which the VR terminal plays the target video; the attribute information includes: the number of P frames in a starting GOP and the number of P frames in a non-starting GOP among the N paths of video streams; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2; and the target P frame in the first video stream is closest to the I frame, and the target P frame is a P frame corresponding to the second FOV at the switching moment.
Optionally, the determining module 701 includes:
the first determining submodule is used for calculating an L value according to the video playing time corresponding to the FOV switching time, the number of P frames in the initial GOP among the N paths of video streams and the number of P frames in the non-initial GOP; the L value is the number of P frames spaced between the nearest I frame and the P frame corresponding to the second FOV at the view angle switching moment in each path of video stream;
and the second determining submodule is used for taking the video stream with the minimum L value as the first video stream.
The apparatus 700 provided in the embodiment of the present invention may implement the above-described embodiment of the video stream determining method, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides a video processing device. Referring to fig. 8, fig. 8 is a structural diagram of a video processing apparatus according to an embodiment of the present invention.
As shown in fig. 8, the video processing apparatus 800, applied to a server, includes:
a first receiving module 801, configured to receive request information for acquiring a first video stream sent by a VR terminal;
a first sending module 802, configured to send the first video stream to the VR terminal according to the request information;
wherein the request information is sent when the VR terminal is switched from a first FOV to a second FOV, and the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of switching the view angle; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2.
The apparatus 800 provided in the embodiment of the present invention may execute the above-mentioned video processing method applied to the server side, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides a video processing device. Referring to fig. 9, fig. 9 is a second structural diagram of a video processing apparatus according to an embodiment of the present invention.
As shown in fig. 9, the video processing apparatus 900, applied to a VR terminal, includes:
a second sending module 901, configured to send, to the server, request information for acquiring the first video stream if it is determined that the first FOV is switched to the second FOV; the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; and the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of view angle switching; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2;
a second receiving module 902, configured to receive the first video stream sent by the server according to the request information;
a playing module 903, configured to use the first video stream as a playing source of the second FOV to perform decoding playing.
The apparatus 900 provided in the embodiment of the present invention may execute the above-mentioned embodiment of the video processing method applied to the VR terminal side, and the implementation principle and the technical effect are similar, which are not described herein again.
As shown in fig. 10, an embodiment of the present invention provides a server, including: a transceiver 1010, a memory 1020, a bus interface, a processor 1000, and a computer program stored on the memory 1020 and executable on the processor 1000; the processor 1000, which is used to read the program in the memory 1020, executes the following processes:
and acquiring N paths of video streams corresponding to the target video, wherein N is a positive integer and is more than or equal to 2, the number of P frames in initial GOPs between the N paths of video streams is different, and the number of P frames in non-initial GOPs is the same.
A transceiver 1010 for receiving and transmitting data under the control of the processor 1000.
Where in fig. 10, the bus architecture may include any number of interconnected buses and bridges, with various circuits being linked together, particularly one or more processors represented by processor 1000 and memory represented by memory 1020. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 1010 may be a number of elements including a transmitter and a transceiver providing a means for communicating with various other apparatus over a transmission medium. The processor 1000 is responsible for managing the bus architecture and general processing, and the memory 1020 may store data used by the processor 1000 in performing operations.
The processor 1000 is further configured to read the computer program and execute the following steps:
determining the number of P frames in the initial GOP corresponding to the N paths of video streams respectively;
determining the number of P frames in a non-initial GOP of N paths of video streams;
and coding the target video according to the number of P frames in the initial GOP corresponding to the N paths of video streams and the number of P frames in the non-initial GOP of the N paths of video streams to obtain the N paths of video streams of the target video.
The processor 1000 is further configured to read the computer program and execute the following steps:
and determining the number of P frames in the initial GOP corresponding to the N paths of video streams respectively according to the number of P frames in the non-initial GOP of the N paths of video streams, the stream sequence number of the N paths of video streams and the encoding path number of the target video.
The server provided in the embodiment of the present invention may execute the above-mentioned video encoding method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
As shown in fig. 11, an embodiment of the present invention provides a server, including: a transceiver 1110, a memory 1120, a bus interface, a processor 1100, and a computer program stored on the memory 1120 and executable on the processor 1100; the processor 1100, which reads the program in the memory 1120, performs the following processes:
when the virtual reality VR terminal is determined to be switched from a first FOV to a second FOV, determining a first video stream from N paths of video streams according to attribute information of the N paths of video streams corresponding to a target video;
wherein the first FOV and the second FOV are field angles at which the VR terminal plays the target video; the attribute information includes: the number of P frames in a starting GOP and the number of P frames in a non-starting GOP among the N paths of video streams; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2; and the target P frame in the first video stream is closest to the I frame, and the target P frame is a P frame corresponding to the second FOV at the switching moment.
A transceiver 1110 for receiving and transmitting data under the control of the processor 1100.
Where in fig. 11, the bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by processor 1100, and various circuits, represented by memory 1120, being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 1110 may be a number of elements including a transmitter and a transceiver providing a means for communicating with various other apparatus over a transmission medium. The processor 1100 is responsible for managing the bus architecture and general processing, and the memory 1120 may store data used by the processor 1100 in performing operations.
The processor 1100 is also adapted to read the computer program and perform the following steps:
calculating an L value according to the video playing time corresponding to the FOV switching time, the number of P frames in the initial GOP between the N paths of video streams and the number of P frames in the non-initial GOP; the L value is the number of P frames spaced between the nearest I frame and the P frame corresponding to the second FOV at the view angle switching moment in each path of video stream;
and taking the video stream with the minimum L value as the first video stream.
The server provided in the embodiment of the present invention may execute the above-mentioned embodiment of the video stream determining method, and the implementation principle and technical effect are similar, which are not described herein again.
As shown in fig. 12, an embodiment of the present invention provides a server, including: a transceiver 1210, a memory 1220, a bus interface, a processor 1200 and a computer program stored on the memory 1220 and executable on the processor 1200; a processor 1200 for reading the program in the memory 1220 and executing the following processes:
receiving request information which is sent by a VR terminal and used for acquiring a first video stream;
sending the first video stream to the VR terminal according to the request information;
wherein the request information is sent when the VR terminal is switched from a first FOV to a second FOV, and the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of switching the view angle; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2.
A transceiver 1210 for receiving and transmitting data under the control of the processor 1200.
Where in fig. 12, the bus architecture may include any number of interconnected buses and bridges, with various circuits of one or more processors represented by processor 1200 and memory represented by memory 1220 being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 1210 may be a number of elements, including a transmitter and a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 1200 is responsible for managing the bus architecture and general processing, and the memory 1220 may store data used by the processor 1200 in performing operations.
The server provided in the embodiment of the present invention may execute the above-mentioned video processing method applied to the server side, and the implementation principle and technical effect are similar, which are not described herein again.
As shown in fig. 13, an embodiment of the present invention provides a VR terminal, including: a transceiver 1310, a memory 1320, a bus interface, a processor 1300, and a computer program stored on the memory 1320 and executable on the processor 1300; a processor 1300, for reading the program in the memory 1320, for executing the following processes:
under the condition that the first FOV is determined to be switched to the second FOV, request information for acquiring the first video stream is sent to the server side; the first FOV and the second FOV are field angles of a target video played by the virtual reality VR terminal; the first video stream is determined from N paths of video streams corresponding to a target video; and the distance between a target P frame in the first video stream and an I frame is the shortest, and the target P frame is a P frame corresponding to the second FOV at the moment of view angle switching; the number of P frames in the initial GOP among the N paths of video streams is different, the number of P frames in the non-initial GOP is the same, N is a positive integer and is more than or equal to 2;
receiving the first video stream sent by the server according to the request information;
and decoding and playing the first video stream as the playback source of the second FOV.
The transceiver 1310 is configured to receive and transmit data under the control of the processor 1300.
In fig. 13, the bus architecture may include any number of interconnected buses and bridges that link together various circuits, including one or more processors represented by the processor 1300 and memory represented by the memory 1320. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 1310 may comprise a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 1300 is responsible for managing the bus architecture and general processing, and the memory 1320 may store data used by the processor 1300 in performing operations.
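As a rough, non-normative illustration of the terminal-side flow of fig. 13, the sketch below shows the request and playback sequence. The server and player objects, their method names, and the request fields are hypothetical; the embodiment does not prescribe a concrete signalling format.

```python
# Hypothetical sketch of the terminal-side flow of fig. 13. The `server` and
# `player` objects, their method names, and the request fields are assumptions.

def on_fov_switch(server, player, video_id, second_fov, switch_time):
    # 1. On switching from the first FOV to the second FOV, send request
    #    information for acquiring the first video stream to the server side.
    request_info = {"video_id": video_id, "fov": second_fov, "time": switch_time}
    first_video_stream = server.request_stream(request_info)

    # 2. Receive the first video stream returned for the request information,
    #    then use it as the playback source of the second FOV.
    player.set_source(first_video_stream)
    player.decode_and_play()
```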
As shown in fig. 14, an embodiment of the present invention provides a VR terminal, including: a transceiver 1410, a memory 1420, a bus interface, a processor 1400, and a computer program stored on the memory 1420 and executable on the processor 1400. The processor 1400 is configured to read the program in the memory 1420 and execute the following processes:
when it is determined that the virtual reality (VR) terminal switches from a first FOV to a second FOV, determining a first video stream from N paths of video streams corresponding to a target video according to attribute information of the N paths of video streams;
wherein the first FOV and the second FOV are fields of view at which the VR terminal plays the target video; the attribute information includes the number of P frames in the starting GOP and the number of P frames in a non-starting GOP of each of the N paths of video streams; the N paths of video streams differ in the number of P frames in their starting GOPs and have the same number of P frames in their non-starting GOPs, where N is a positive integer and N ≥ 2; and the target P frame in the first video stream is closest to an I frame, the target P frame being the P frame corresponding to the second FOV at the switching moment.
The transceiver 1410 is configured to receive and transmit data under the control of the processor 1400.
In fig. 14, the bus architecture may include any number of interconnected buses and bridges that link together various circuits, including one or more processors represented by the processor 1400 and memory represented by the memory 1420. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 1410 may comprise a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 1400 is responsible for managing the bus architecture and general processing, and the memory 1420 may store data used by the processor 1400 in performing operations.
The processor 1400 is further configured to read the computer program and execute the following steps:
calculating an L value for each path of video stream according to the video playing time corresponding to the FOV switching moment, the number of P frames in the starting GOP of each of the N paths of video streams, and the number of P frames in a non-starting GOP; the L value of a path is the number of P frames between the nearest preceding I frame and the P frame corresponding to the second FOV at the view-angle switching moment in that path;
and taking the video stream with the minimum L value as the first video stream.
The VR terminal provided in the embodiment of the present invention may execute the above-mentioned embodiment of the video stream determination method applied to the VR terminal, and the implementation principle and the technical effect are similar, which is not described herein again.
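The L-value selection performed by the processor 1400 can be summarized with a short sketch. The frame layout (frame 0 of every path is an I frame) and the mapping from playing time to a frame index via the frame rate are assumptions made for illustration; only the selection rule itself, choosing the path with the minimum L value, comes from the embodiment.

```python
# Sketch of the L-value selection under assumed frame layout: path i has a
# starting GOP of (start_p + 1) frames followed by identical non-starting GOPs
# of (non_start_p + 1) frames. L is the number of P frames between the frame
# shown at the switching moment and the nearest preceding I frame.

def l_value(frame_idx: int, start_p: int, non_start_p: int) -> int:
    if frame_idx <= start_p:                   # still inside the starting GOP
        return frame_idx
    offset = frame_idx - (start_p + 1)         # frames past the starting GOP
    return offset % (non_start_p + 1)          # position inside its non-starting GOP

def pick_first_stream(play_time_s: float, fps: float, start_p_per_path, non_start_p: int) -> int:
    """Return the index of the path whose target P frame is closest to an I frame."""
    frame_idx = int(play_time_s * fps)         # assumed time-to-frame mapping
    l_values = [l_value(frame_idx, s, non_start_p) for s in start_p_per_path]
    return l_values.index(min(l_values))

# Example: FOV switch at 1.3 s, 25 fps, non-starting GOPs of 8 frames (7 P frames).
best = pick_first_stream(1.3, 25.0, [1, 3, 5, 7], 7)
print("first video stream = path", best)      # path 3 has an I frame at that moment
```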
Those skilled in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a computer program instructing the relevant hardware, the computer program including instructions for performing some or all of the steps of the above methods; and the computer program may be stored in a readable storage medium, which may be any form of storage medium.
In addition, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above video processing method, the above video encoding method, or the above video stream determination method, and can achieve the same technical effects; to avoid repetition, details are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical division, and other divisions may be adopted in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the methods according to various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A video encoding method, comprising:
acquiring N paths of video streams corresponding to a target video, wherein N is a positive integer and N ≥ 2, the N paths of video streams differ in the number of P frames in their starting GOPs, and the N paths of video streams have the same number of P frames in their non-starting GOPs.
2. The video encoding method according to claim 1, wherein the acquiring N paths of video streams corresponding to the target video comprises:
determining the number of P frames in the starting GOP of each of the N paths of video streams;
determining the number of P frames in a non-starting GOP of the N paths of video streams;
and encoding the target video according to the number of P frames in the starting GOP of each of the N paths of video streams and the number of P frames in the non-starting GOP of the N paths of video streams, to obtain the N paths of video streams of the target video.
3. The video encoding method according to claim 2, wherein the determining the number of P frames in the starting GOP of each of the N paths of video streams comprises:
determining the number of P frames in the starting GOP of each of the N paths of video streams according to the number of P frames in the non-starting GOP of the N paths of video streams, the stream sequence number of each path, and the number of encoding paths of the target video (an illustrative sketch of one possible mapping follows the claims).
4. A method for video stream determination, comprising:
when it is determined that a virtual reality (VR) terminal switches from a first FOV to a second FOV, determining a first video stream from N paths of video streams corresponding to a target video according to attribute information of the N paths of video streams;
wherein the first FOV and the second FOV are fields of view at which the VR terminal plays the target video; the attribute information includes the number of P frames in the starting GOP and the number of P frames in a non-starting GOP of each of the N paths of video streams; the N paths of video streams differ in the number of P frames in their starting GOPs and have the same number of P frames in their non-starting GOPs, where N is a positive integer and N ≥ 2; and the target P frame in the first video stream is closest to an I frame, the target P frame being the P frame corresponding to the second FOV at the switching moment.
5. The method according to claim 4, wherein the determining a first video stream from the N paths of video streams according to the attribute information of the N paths of video streams corresponding to the target video comprises:
calculating an L value for each path of video stream according to the video playing time corresponding to the FOV switching moment, the number of P frames in the starting GOP of each of the N paths of video streams, and the number of P frames in a non-starting GOP; the L value of a path is the number of P frames between the nearest preceding I frame and the P frame corresponding to the second FOV at the view-angle switching moment in that path;
and taking the video stream with the minimum L value as the first video stream.
6. A video processing method, applied to a server side, comprising:
receiving request information which is sent by a VR terminal and used for acquiring a first video stream;
sending the first video stream to the VR terminal according to the request information;
wherein the request information is sent when the VR terminal switches from a first FOV to a second FOV, and the first FOV and the second FOV are fields of view at which the virtual reality (VR) terminal plays a target video; the first video stream is determined from N paths of video streams corresponding to the target video; among the N paths, the target P frame in the first video stream is closest to an I frame, the target P frame being the P frame corresponding to the second FOV at the view-angle switching moment; and the N paths of video streams differ in the number of P frames in their starting GOPs and have the same number of P frames in their non-starting GOPs, where N is a positive integer and N ≥ 2.
7. A video processing method, applied to a VR terminal, comprising:
sending, when it is determined that a first FOV is switched to a second FOV, request information for acquiring a first video stream to a server side, wherein the first FOV and the second FOV are fields of view at which a virtual reality (VR) terminal plays a target video; the first video stream is determined from N paths of video streams corresponding to the target video; among the N paths, the target P frame in the first video stream is closest to an I frame, the target P frame being the P frame corresponding to the second FOV at the view-angle switching moment; and the N paths of video streams differ in the number of P frames in their starting GOPs and have the same number of P frames in their non-starting GOPs, where N is a positive integer and N ≥ 2;
receiving the first video stream sent by the server according to the request information;
and decoding and playing the first video stream as the playback source of the second FOV.
8. A server, comprising: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; wherein the processor is configured to read the program in the memory to implement the steps of the video encoding method according to any one of claims 1 to 3, or the steps of the video stream determination method according to any one of claims 4 to 5, or the steps of the video processing method according to claim 6.
9. A VR terminal, comprising: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; wherein the processor is configured to read the program in the memory to implement the steps of the video stream determination method according to any one of claims 4 to 5, or the steps of the video processing method according to claim 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when being executed by a processor, implements the steps in the video encoding method according to any one of claims 1 to 3, or implements the steps in the video stream determination method according to any one of claims 4 to 5, or implements the steps in the video processing method according to any one of claims 6 to 7.
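Claim 3 does not fix the mapping from the non-starting GOP length, the stream sequence number, and the number of encoding paths to the starting-GOP length. The sketch below shows one possible, purely illustrative mapping that staggers the paths' I frames evenly; it should not be read as the claimed formula.

```python
# Purely illustrative assumption (the claim does not fix a formula): derive the
# number of P frames in each path's starting GOP from the number of P frames in
# a non-starting GOP, the path's stream sequence number i, and the number of
# encoding paths N, so that the N paths' I frames are evenly staggered.

def starting_gop_p_frames(i: int, n_paths: int, non_start_p: int) -> int:
    gop_len = non_start_p + 1                       # frames per non-starting GOP
    return ((i + 1) * gop_len) // n_paths - 1       # hypothetical staggering rule

N = 4
NON_START_P = 7
print([starting_gop_p_frames(i, N, NON_START_P) for i in range(N)])   # [1, 3, 5, 7]
# Each path's second I frame is then shifted by roughly gop_len / N frames
# relative to its neighbour, which is what makes the L-value selection of
# claims 4 and 5 effective.
```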
CN202011624218.6A 2020-12-31 2020-12-31 Video coding method, video determining method, video processing method, server and VR terminal Pending CN112738508A (en)

Priority Applications (1)

CN202011624218.6A | priority date 2020-12-31 | filing date 2020-12-31 | Video coding method, video determining method, video processing method, server and VR terminal

Publications (1)

CN112738508A | published 2021-04-30

Family ID: 75609610

Family Applications (1)

CN202011624218.6A | CN112738508A (pending)

Country Status (1)

CN | CN112738508A

Cited By (1)

Publication number | Priority date | Publication date | Assignee | Title
CN113691797A | 2021-08-27 | 2021-11-23 | MIGU Culture Technology Co., Ltd. | Video playing processing method, device, equipment and storage medium

Citations (4)

Publication number | Priority date | Publication date | Assignee | Title
CN102144390A | 2008-09-04 | 2011-08-03 | SK Telecom Co., Ltd. | Media transmission system and method
CN102187667A | 2008-08-26 | 2011-09-14 | CSIR | A method of switching from a first encoded video stream to a second encoded video stream
CN109218739A | 2017-07-06 | 2019-01-15 | Alibaba Group Holding Limited | View angle switch method, apparatus, equipment and the computer storage medium of video flowing
WO2019225788A1 | 2018-05-25 | 2019-11-28 | LINE Plus Corporation | Method and system for transmitting and playing video having dynamic bit rate by using plurality of channels


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 20210430)