CN112533005B - Interaction method and system for VR video slow live broadcast - Google Patents

Interaction method and system for VR video slow live broadcast

Info

Publication number
CN112533005B
Authority
CN
China
Prior art keywords
video
stream
live
slow
slow live
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011012713.1A
Other languages
Chinese (zh)
Other versions
CN112533005A (en)
Inventor
刘睿
李涌泉
胡勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AVIT Ltd
Original Assignee
AVIT Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AVIT Ltd filed Critical AVIT Ltd
Priority to CN202011012713.1A priority Critical patent/CN112533005B/en
Publication of CN112533005A publication Critical patent/CN112533005A/en
Application granted granted Critical
Publication of CN112533005B publication Critical patent/CN112533005B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • H04N21/4858End-user interface for client configuration for modifying screen layout parameters, e.g. fonts, size of the windows
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an interaction method and system for slow live broadcasting of VR video. The method comprises: obtaining a VR video slow live stream and broadcasting it after preprocessing; and receiving and responding to interaction requests sent by users through VR interaction devices during the live broadcast. Specifically, a user sends a request to magnify a specific region through the VR interaction device during the live broadcast; the information of that region is acquired and fusion-decoded, and a magnified rendering is displayed centered on the region's viewing angle. For the new broadcast form of slow live streaming, the invention lets viewers experience a 360-degree, all-around service scene through VR video, and introduces an interaction mechanism to counter the waning interest of long viewing sessions: viewers can select a viewing-angle region of interest and magnify it to inspect ultra-high-definition picture details. The magnification factor is adjustable, the information content is large, the sense of interaction is strong and the display effect is excellent, which markedly improves the user experience and increases viewer stickiness.

Description

Interaction method and system for VR video slow live broadcast
Technical Field
The invention belongs to the technical field of virtual reality, and particularly relates to an interaction method and system for VR video slow live broadcasting.
Background
Slow live broadcasting is unattended, continuous, round-the-clock real-time broadcasting, a concept distinct from the established live-broadcast ecosystem. Compared with the traditional format, which relies on shot switching and editing, presenter narration, audience interaction and similar elements, its most notable characteristic is the absence of interference: viewers watch a real, objective course of events from a third-party perspective. Slow live broadcasting offers companionship, a strong sense of immersion and curiosity about the unknown, which encourages viewers to maintain continuous attention on an event.
A VR video is a spherical video containing 360° × 180° omnidirectional viewing-angle information; it allows viewers to change the viewing angle while watching and to choose a region of interest. Using VR video as the medium for slow live broadcasting effectively extends the viewing-angle range of the live scene, so that nothing is out of view, and provides a stronger sense of immersion: viewers feel as if they were on site, which reinforces the viewing effect of the slow-live-broadcast service.
Since VR video covers omnidirectional viewing angles, higher resolution (8K and above) is required to ensure sharpness and immersion. Dividing the VR video into tiles and combining this with MPEG-DASH adaptive multi-rate streaming effectively addresses the network-transmission and terminal-decoding problems caused by the ultra-high resolution and bit rate of VR video. By dynamically transmitting the set of Tile streams appropriate to the current viewing-angle region, the user can browse the video interactively and select the part of interest at any time.
Existing VR slow live broadcasts only present a monotonous video scene, and viewers inevitably grow bored and lose interest over time. Adding timely picture-level interaction remedies this shortcoming, gives viewers a strong sense of participation and stimulates their long-term attention.
Disclosure of Invention
The following presents a simplified summary of embodiments of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that the following summary is not an exhaustive overview of the invention. It is not intended to determine the key or important part of the present invention, nor is it intended to limit the scope of the present invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
To address the problems in the prior art, this application provides an interaction method and system for slow live broadcasting of VR video. While watching a slow VR live broadcast, viewers can select and magnify a viewing region of interest and watch its high-definition details, which effectively improves user interactivity and participation in the slow-live-broadcast service scene.
According to one aspect of the application, an interaction method for slow live VR video is provided, which includes:
obtaining VR video slow live broadcast stream, and performing video live broadcast after preprocessing;
receiving and responding to an interaction request sent by a user through a VR interaction device during the live broadcast, which comprises: the user sends a request to magnify a specific region through the VR interaction device during the live broadcast; the information of that region is acquired and fusion-decoded, and a magnified rendering is displayed centered on the region's viewing angle. The VR interaction device may be, for example, a button on an all-in-one VR headset or a remote control.
In a preferred scheme, obtaining the VR video slow live stream and broadcasting it after preprocessing specifically comprises: obtaining the VR video slow live stream, performing video caching and video downsampling on it, and then performing video blocking and secondary video distribution on the cached and downsampled video.
The video caching stores the VR video slow live stream for a preset caching duration as two copies; the video downsampling reduces the resolution of one cached copy to form a basic full-view code stream; the video blocking splits the other cached copy into Tile video blocks and block-encodes them under the MPEG MCTS constraint; the secondary video distribution re-streams (push streaming) the Tile-segmented VR video slow live stream and the downsampled basic full-view code stream on the basis of a general stream-distribution protocol.
Responding to the interaction request comprises: acquiring, in real time, the position information of the specific region (the position and orientation of the region of interest) and the magnification request (magnification control information such as the magnification factor) sent by the user through the VR interaction device during the live broadcast; decoding the basic full-view code stream and forming a full-view texture object from the decoded image; decoding the tiled VR video slow live stream and using the decoded image as a texture object; and rendering with OpenGL, where the circular magnified region on the window is determined in the vertex shader stage, and in the fragment shader stage the full-view texture object decoded from the basic full-view code stream is mapped onto the sphere while the texture object decoded from the tiled VR video slow live stream is coordinate-transformed and then sampled and mapped into the circular magnified region.
The basic full-view code stream is not tiled; the high-definition VR video slow live stream is tiled according to a given algorithm and block-encoded under the MPEG MCTS constraint, with the blocks numbered linearly from left to right and top to bottom. The tiled VR video slow live stream (the ultra-high-definition Tile streams) and the downsampled basic full-view code stream are re-streamed on the basis of a general stream-distribution protocol (such as MPEG-DASH). The scheme therefore transmits the basic stream and the ultra-high-definition tiled stream simultaneously: when a specific region needs to be magnified, the corresponding video blocks of the tiled stream are selected and processed, which is simple and practical.
For the video cache, since slow live broadcasting is not sensitive to delay, the caching duration can be set to 5 to 10 minutes; the caching medium is memory or an SSD, and the cache is used for the subsequent video downsampling, blocking and secondary distribution.
The video downsampling reduces the data volume while keeping the visual field of the video unchanged; in this step the resolution of the cached VR video slow live stream (the ultra-high-definition stream, e.g. 64K) is reduced to form the basic full-view code stream (e.g. 8K).
For video blocking, the cached VR video slow live stream (the ultra-high-definition stream) is split into Tile video blocks and block-encoded under the MPEG MCTS constraint. To avoid frequently re-initializing the decoder of the user terminal device and to keep the resolution of the image after code-stream fusion fixed, the method divides the VR video slow live stream into a series of Tile video blocks of equal resolution.
Furthermore, the VR video slow live stream is divided into a series of square Tile video blocks of equal resolution with a 1:1 aspect ratio. The larger the Tile resolution, the lower the data utilization; a Tile resolution that is too small, however, reduces encoding performance and compression efficiency. Let the side length of a Tile video block be tileLength and the horizontal and vertical resolutions of the VR video slow live stream be pixelWidth and pixelHeight respectively; then the number of horizontal Tiles is colNum = pixelWidth/tileLength and the number of vertical Tiles is rowNum = pixelHeight/tileLength.
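As a rough illustration, the grid arithmetic above can be sketched in Python as follows (only tileLength, pixelWidth, pixelHeight, colNum and rowNum come from the text; the helper functions themselves are illustrative, not taken from the patent):

    def tile_grid(pixel_width: int, pixel_height: int, tile_length: int):
        col_num = pixel_width // tile_length    # number of Tiles per row
        row_num = pixel_height // tile_length   # number of Tiles per column
        return col_num, row_num

    def tile_order(row: int, col: int, col_num: int) -> int:
        # Linear Tile number, counted left to right, top to bottom.
        return row * col_num + col

    # With the 64K source used later in the description (61440x30720, tileLength 1280):
    col_num, row_num = tile_grid(61440, 30720, 1280)   # -> (48, 24), i.e. 1152 Tiles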
For secondary video distribution: the Tile video blocks of the VR video slow live stream and the downsampled basic full-view code stream are re-streamed on the basis of a general stream-distribution protocol (such as MPEG-DASH).
In the step where the user sends a magnification request for a specific region through the VR interaction device during the live broadcast, the region's information is acquired and fusion-decoded and a magnified rendering is displayed centered on the region's viewing angle, the region of interest circled by the user (for example, a perfect circle) is acquired in real time. The rendered sub-image used as the texture is the circumscribed square of the region of interest; since the texture image of a video frame is forcibly stretched to a 2:1 ratio under ERP projection, it must be manually stretched back to 1:1 during rendering. The original video tiles are preferably set to squares, because the ERP forced stretch and the manual stretch-back balance out.
Acquiring, in real time, the region of interest circled by the user through the VR interaction device during the live broadcast specifically comprises: acquiring in real time the position and orientation information of the circled region of interest. The VR interaction device may be a VR head-mounted display working in three-degree-of-freedom mode, with the head position coinciding with the origin of the spherical model coordinate system. The head only rotates and does not translate in this coordinate system; its orientation is described by a direction vector from the coordinate origin to the viewpoint. The head-mounted display usually provides, in real time through its interface, a 4×4 homogeneous matrix containing the head position and orientation information:
[Equation image: the 4×4 homogeneous head-pose matrix]
the top left 3x3 matrix is a rotation matrix R, and the head display current orientation ori can be calculated according to R:
[Equation image: calculation of the current orientation ori from the rotation matrix R]
the viewpoint is the intersection point of the ray and the spherical surface along the ori direction of the coordinate origin. Since the position of a point on the spherical surface projected onto the plane in the ERP projection is only related to the angle, ori can be used as the viewpoint coordinate and is written as (x, y, z).
The step in which the user sends a magnification request through the VR interaction device during the live broadcast, the region's information is acquired and fusion-decoded, and the magnified rendering is displayed centered on the region's viewing angle, further comprises: setting the side length of the rendered sub-image, calculating the coordinates of the center point of the sub-image of the region of interest from the viewpoint coordinates and the spherical model, calculating the numbers (the Tile sequence numbers) of all Tile video blocks overlapped by the sub-image in the ultra-high-definition video frame, and distributing and transmitting those Tile video blocks.
Setting the side length of the rendered sub-image proceeds as follows: let the FOV of the circular magnified region be α; at magnification 1, the side length of the rendered sub-image is α/360° × pixelWidth. As the magnification increases, fewer pixels of the rendered sub-image are actually needed, but for uniform processing its resolution is kept unchanged.
Calculating the sub-image center point coordinates of the region of interest comprises:
first, the viewpoint coordinates (x, y, z) are converted into spherical coordinates (r, Φ, θ):
[Equation image: conversion from the Cartesian viewpoint coordinates (x, y, z) to spherical coordinates (r, φ, θ)]
the spherical coordinates (r, φ, θ) are then converted into texture coordinates (u, v):
[Equation image: conversion from spherical coordinates (r, φ, θ) to texture coordinates (u, v)]
and finally, converting the texture coordinates into pixel coordinates to obtain pixel coordinates (s, t) of the center of the sub-image:
[Equation image: conversion from texture coordinates (u, v) to the pixel coordinates (s, t) of the sub-image center]
the pixel coordinate origin is defined as the upper left corner point of the image, the horizontal right direction is the direction of the s axis, and the vertical downward direction is the direction of the t axis.
Calculating the sequence numbers of all Tile video blocks overlapped by the sub-image in the ultra-high-definition video frame comprises the following steps:
setting the center coordinate of the square sub-image as P (sp, tp) and the side length as render Length in the current position of rendering the sub-image, calculating the coordinates of the four corners and on which Tile video block the sub-image falls.
The coordinates of the point A at the upper left corner are equal to the coordinates of the point P at the center minus half of the side length respectively:
s_A = s_P − renderLength/2,  t_A = t_P − renderLength/2
the calculation formulas of the line number row, the column number col and the sequence number tileOrderA of the Tile where the point A is located are as follows:
row = ⌊t_A / tileLength⌋,  col = ⌊s_A / tileLength⌋
tileOrderA = row × colNum + col
The sequence numbers of the Tile video blocks containing the corner points B, C and D are calculated analogously:
[Equation images: row, column and sequence-number formulas for the points B, C and D]
after calculating the Tile video blocks where the four corner points are located, the numbers of all overlapped Tile video blocks, namely Tile serial numbers, can be determined, the Tile video blocks are the Tile video blocks to be loaded, the first row takes the Tile video blocks with the serial numbers [ Tile order A, tile order B ], and the Tile serial numbers of each downward row can be obtained by simply adding rowNum to the first row until the points C and D are located.
Acquiring the information of the specific region, fusion-decoding it, and performing the magnified rendering centered on the region's viewing angle specifically comprises a code-stream fusion process, a video decoding process and a texture mapping process.
The code-stream fusion process comprises: the loaded Tile video blocks of the ultra-high-definition stream are stitched into a square for decoding, so that the VR interaction device needs only one decoder for the ultra-high-definition stream, which reduces the performance requirements on the terminal device.
Because the rendered sub-image can map onto the ultra-high-definition video code stream in several different ways, and to keep the resolution uniform during subsequent decoding (avoiding frequent decoder re-initialization), the number of Tile video blocks along each side of the rendered sub-image is taken as ⌈renderLength/tileLength⌉ + 1 during stitching; any vacant part (a single row, a single column, or both) is by convention filled with the first Tile video block.
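For illustration, this fixed-size stitching rule can be sketched like this (a hypothetical helper; the patent only states the ⌈renderLength/tileLength⌉ + 1 side count and the convention of padding with the first Tile):

    import math

    def pad_tile_mosaic(loaded, render_length: int, tile_length: int):
        # loaded: 2-D list (rows x columns) of the Tile blocks actually overlapped.
        side = math.ceil(render_length / tile_length) + 1   # e.g. ceil(6144/1280) + 1 = 6
        first = loaded[0][0]
        mosaic = [[first] * side for _ in range(side)]      # pre-fill with the first Tile
        for r, row in enumerate(loaded):                    # copy the real Tiles over it
            for c, tile in enumerate(row):
                mosaic[r][c] = tile
        return mosaic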
In addition, the code stream fusion process follows the general method of MPEG MCTS code stream fusion, namely the modification of header information and the fusion of code stream information.
The video decoding process comprises: the ultra-high-definition Tile code stream and the basic full-view code stream are decoded simultaneously, each with its own independent decoder. The fused ultra-high-definition Tile code stream is decoded and a texture object is created from the decoded image. The basic full-view code stream is decoded independently and the decoded full-view frame is used as a texture object.
The texture mapping process comprises: obtaining the information of the specific region currently to be magnified, and performing texture-coordinate transformation, fragment shading and related operations for the magnified region.
The information about the specific region to be magnified is obtained through the VR interaction device (for example, a button on an all-in-one headset or a remote control). It contains two attributes: b_Zoom indicates whether region magnification is active and takes the value 0 or 1; f_ZoomLevel is the magnification factor, a continuous value with a lower limit of 1 and an upper limit calculated as follows:
assuming that the FOV of the VR interaction device is β, the monocular resolution is width device x height device, the number of pixels in the horizontal direction is numDevice = α/β width device, and for the ultra-high definition stream, the horizontal field angle FOV corresponding to numDevice pixels is γ = numDevice/pixelWidth 360 °. The upper limit of the magnification is α/γ (α is the FOV of the circular magnified region).
The viewer can toggle the value of b_Zoom between 0 and 1 through the VR interaction device, and can increase or decrease the value of f_ZoomLevel when b_Zoom equals 1.
The calculation method of the transformation of the texture coordinates of the enlarged region is as follows:
[Equation image: texture-coordinate transformation of the magnified region]
the current texture coordinates (u _ cur, v _ cur) of the vertex of the magnified region are transformed into new texture coordinates (u, v), i.e. the coordinates of the sampled colors from the texture object. u _ cur and v _ cur are vertex current texture coordinates, and u0 and v0 are texture coordinates of the center of the square rendering sub-image in the texture object. u _ viewpoint and v _ viewpoint are texture coordinates of the viewpoint in the spherical vertex model.
The parameter normalized stretches the circular magnified region to the inscribed circle of the square sub-image at magnification 1; its value is set as:
[Equation image: definition of the normalized parameter]
where θ is half of the field angle of the circular magnified region, i.e. α/2;
the parameter f _ zoom level is magnification factor information in the magnification control, and the zooming effects of different sizes are realized.
The parameter factor is a correction factor that reduces image distortion caused by the variation in vertex density; it uses the trigonometric function cos θ, where θ is the latitude of the vertex, with the equator defined as latitude 0° and the poles as 90°. It alleviates the image distortion caused by the increasing density of spherical vertices near the poles.
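The exact transform is given only as an equation image, so the following sketch shows one plausible combination of the parameters just described (offset from the viewpoint, scaled by normalized and factor, divided by f_ZoomLevel, re-centered on the sub-image center). It is an assumed, illustrative form, not the patent's verbatim formula:

    def zoom_texcoord(u_cur: float, v_cur: float,
                      u_viewpoint: float, v_viewpoint: float,
                      u0: float, v0: float,
                      normalized: float, f_zoom_level: float, factor: float):
        # Assumed form: shrink the offset from the viewpoint by the zoom level,
        # stretch by `normalized`, correct with the latitude factor, recenter on (u0, v0).
        scale = normalized * factor / f_zoom_level
        u = u0 + (u_cur - u_viewpoint) * scale
        v = v0 + (v_cur - v_viewpoint) * scale
        return u, v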
The fragment shading process comprises: for the basic full-view code stream, sampling the texture object at the original coordinates for shading; for the magnified region, transforming the texture coordinates as described above to achieve the magnification effect, and then sampling the color from the texture object generated by the high-definition-layer Tiles according to those coordinates.
According to another aspect of the present application, there is provided a VR video slow live interactive system, including:
the video preprocessing unit is used for acquiring VR video slow live stream, and performing video live broadcast after preprocessing;
the video interaction computing unit is used for receiving a request of a user for amplifying a specific area sent by VR interactive equipment in the video live broadcasting process;
and the video decoding and rendering unit is used for acquiring the information of the specific area, performing fusion decoding on the information, and performing amplification type rendering display on the basis of the view angle center of the specific area.
The beneficial effects of the invention are: for the new broadcast form of slow live streaming, viewers experience a 360-degree, all-around service scene through VR video; to counter the waning interest of long viewing sessions, an interaction mechanism is introduced that lets viewers select a viewing-angle region of interest and magnify it to inspect ultra-high-definition picture details. The magnification factor is adjustable, the information content is large, the sense of interaction is strong and the display effect is excellent, which markedly improves the user experience and increases viewer stickiness.
Drawings
The invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like or similar parts throughout the figures. The accompanying drawings, which are incorporated in and form a part of this specification, illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further explain the principles and advantages of the invention. In the drawings:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram illustrating a rendering sub-image scale selection according to the present invention;
FIG. 3 is a schematic diagram of the ultra high definition code stream blocking method according to the present invention;
FIG. 4 is a schematic view of the viewpoint coordinate and subimage calculation of the present invention;
FIG. 5 is a schematic view of sub-image coordinates according to the present invention;
fig. 6a, 6b, 6c, and 6d are exemplary diagrams of sub-image position mapping according to the present invention, where fig. 6a is a sub-image covering 6 × 6 Tile video blocks, fig. 6b is a sub-image covering 6 × 5 Tile video blocks, fig. 6c is a sub-image covering 5 × 6 Tile video blocks, and fig. 6d is a sub-image covering 5 × 5 Tile video blocks;
FIG. 7 is a schematic diagram of code stream fusion filling according to the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings. Elements and features depicted in one drawing or one embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that the figures and description omit representation and description of components and processes that are not relevant to the present invention and that are known to those of ordinary skill in the art for the sake of clarity.
Example 1
This embodiment provides a VR video slow-live-broadcast interactive system comprising a video preprocessing unit, a video interaction computing unit and a video decoding and rendering unit. A VR panoramic ultra-high-definition camera serves as the ultra-high-definition live source and provides the VR video slow live stream (e.g. 64K). After caching and downsampling (e.g. to 8K), a transmission scheme of ultra-high-definition tiles plus a basic full view is adopted: under normal conditions the basic-resolution full-view live stream is continuously transmitted, decoded and rendered; when a user becomes interested in a region of the stream and wants to examine its picture details, the region of interest can be magnified through a headset peripheral (e.g. a button on an all-in-one headset or a remote control). At that moment, the several ultra-high-resolution tiles covering the region of interest are transmitted on demand, fusion-decoded, and a magnified rendering is displayed centered on the region's viewing angle. FIG. 1 is a flowchart of the interactive system of this embodiment.
The video preprocessing unit mainly completes the functions of video caching, video down-sampling, video blocking, video secondary distribution and the like.
Since slow live broadcasting is not sensitive to delay, the video cache can be set to 5-10 minutes; the caching medium is memory or an SSD, used for the subsequent downsampling, blocking and secondary distribution.
Video downsampling reduces the resolution of the cached ultra-high-definition code stream (e.g. 64K) to form a basic full-view code stream (e.g. 8K).
Video blocking splits the cached ultra-high-definition code stream into Tile blocks and block-encodes them under the MPEG MCTS constraint. To avoid frequently re-initializing the decoder of the user terminal device and to keep the resolution of the image after code-stream fusion fixed, the method divides the ultra-high-definition code stream into a series of Tile video blocks of equal resolution.
Because the region of interest magnified by the user is a perfect circle, the most efficient approach is to make the rendered sub-image used as the texture the circumscribed square of the magnified region. Since the texture image of a video frame is forcibly stretched to a 2:1 ratio under ERP projection, it must be manually stretched back to 1:1 during rendering. Referring to fig. 2, the original video tiles can simply be set to squares because the ERP forced stretch and the manual stretch-back balance out.
Referring to fig. 3, the ultra-high-definition code stream may be divided into square Tile blocks with a 1:1 aspect ratio. The larger the Tile resolution, the lower the data utilization; a Tile resolution that is too small reduces encoding performance and compression efficiency. Let the Tile side length be tileLength and the horizontal and vertical resolutions of the ultra-high-definition stream be pixelWidth and pixelHeight respectively; then the number of horizontal Tiles is colNum = pixelWidth/tileLength and the number of vertical Tiles is rowNum = pixelHeight/tileLength.
Secondary video distribution re-streams the tiled ultra-high-definition video Tile streams and the downsampled basic full-view code stream on the basis of a general stream-distribution protocol (such as MPEG-DASH).
And the video interaction computing unit comprises the functions of viewpoint position information acquisition, block selection, code stream loading and the like.
And the video decoding and rendering unit comprises the functions of code stream fusion, video decoding, texture mapping and the like.
Code-stream fusion stitches the loaded Tile video blocks of the ultra-high-definition stream into a square for decoding, so that the VR interaction device needs only one decoder for the ultra-high-definition stream, reducing the performance requirements on the terminal device.
Referring to the sub-image position-mapping diagrams of fig. 6a, 6b, 6c and 6d: because the rendered sub-image can map onto the ultra-high-definition video code stream in several different ways, and to keep the resolution uniform during subsequent decoding (avoiding frequent decoder re-initialization), the number of Tiles along each side of the rendered sub-image is taken as ⌈renderLength/tileLength⌉ + 1 during stitching, and any vacant part (a single row, a single column, or both) is by convention filled with the first Tile; see the code-stream fusion filling diagram of fig. 7.
The method of code stream fusion follows the general method of MPEG MCTS code stream fusion, namely the modification of header information and the fusion of code stream information.
Video decoding decodes the ultra-high-definition Tile code stream and the basic full-view code stream simultaneously, each with its own independent decoder. The fused ultra-high-definition Tile code stream is decoded and a texture object is created from the decoded image. The basic full-view code stream is decoded independently and the decoded full-view frame is used as a texture object.
Texture mapping comprises obtaining the current magnification control information, transforming the texture coordinates of the magnified region, fragment shading and related functions. In the fragment shading stage, for the basic full-view code stream, the texture object is sampled at the original coordinates for shading; for the magnified region, the texture coordinates are transformed as described above to achieve the magnification effect, and the color is then sampled from the texture object generated by the high-definition-layer Tiles according to those coordinates.
Example 2
The embodiment provides an interaction method for VR video slow live broadcast, which comprises the following processes:
obtaining a VR video slow live stream and caching it; the cached VR video slow live stream is kept as two copies, which are processed as follows: one copy is downsampled to form a basic full-view code stream, and the other copy is tiled and block-encoded; the basic full-view code stream and the tiled VR video slow live stream are then transmitted simultaneously;
a user sends a request to magnify a specific region through the VR interaction device during the live broadcast; the information of that region is acquired and fusion-decoded, and a magnified rendering is displayed centered on the region's viewing angle. Specifically, the position information of the region and the magnification request (e.g. the magnification factor) sent by the user through the VR interaction device during the live broadcast are acquired; the basic full-view code stream and the tiled VR video slow live stream are decoded (different decoders may decode them simultaneously): the basic full-view code stream is decoded and a full-view texture object is formed from the decoded image, and the tiled VR video slow live stream is decoded with the decoded image used as a texture object; rendering is done with OpenGL, the circular magnified region on the window is determined in the vertex shader stage, and in the fragment shader stage the full-view texture object decoded from the basic full-view code stream is mapped onto the sphere while the texture object decoded from the tiled VR video slow live stream is coordinate-transformed and then sampled and mapped into the circular magnified region.
Obtaining the VR video slow live stream and broadcasting it after preprocessing means, in brief, obtaining the stream, caching and downsampling it, and then performing video blocking and secondary video distribution on the cached and downsampled video. The video caching stores the stream for a preset caching duration as two copies; the downsampling reduces the resolution of one cached copy to form a basic full-view code stream; the video blocking splits the other cached copy into Tile video blocks and block-encodes them under the MPEG MCTS constraint; the secondary distribution re-streams the Tile-segmented VR video slow live stream and the downsampled basic full-view code stream on the basis of a general stream-distribution protocol.
Preferably, Tile video block segmentation of the cached VR video slow live stream specifically means dividing the stream into a series of square Tile video blocks of equal resolution.
Example 3
Taking an example of a 64K (61440 x 30720) VR video slow live broadcast source, the specific steps of this embodiment are:
the video preprocessing unit caches the 10-minute VR ultra-high definition stream into the memory.
And the video preprocessing unit carries out resolution downsampling on the VR ultra-high definition code stream (64K, 61440x30720) to form an 8K basic full view code stream.
The basic full-view code stream is not tiled. The 64K ultra-high-definition VR code stream, in ERP projection, is divided into 24 rows and 48 columns, giving 1152 blocks (Tiles) of identical resolution (1280×1280), which are block-encoded under the MPEG MCTS constraint. The blocks are numbered linearly from left to right and top to bottom.
The tiled ultra-high-definition video Tile streams and the downsampled basic full-view code stream are re-streamed on the basis of a general stream-distribution protocol (such as MPEG-DASH).
The video interaction calculating unit obtains the position information of the current viewpoint and calculates the coordinates (x, y, z) of the current viewpoint of the audience.
The video interaction computing unit sets the FOV of the circular magnified region to 36°, so the sub-image side length renderLength is 6144; the sub-image coordinates (s, t) are then calculated with the center-point method described above.
According to FIG. 5, the coordinates of the four corners, and the Tiles on which they fall, can be calculated from (s, t):
coordinates of the point A at the upper left corner are as follows:
s_A = s − 3072,  t_A = t − 3072
the row number and the column number of the Tile where the point A is located, and the serial number of the Tile are as follows:
row = ⌊t_A / 1280⌋,  col = ⌊s_A / 1280⌋
tileOrderA = row × 48 + col
Similarly, the sequence numbers of the Tiles containing the points B, C and D are calculated:
[Equation images: corresponding row, column and sequence-number formulas for the points B, C and D]
After the Tiles at the four corner points are calculated, the sequence numbers of all overlapped Tiles, i.e. the Tiles to be loaded, can be determined: the first row takes the Tiles with sequence numbers in [tileOrderA, tileOrderB], and each subsequent row is obtained by simply adding 48 to the previous one, down to the row containing the points C and D.
The resolution of the square rendered sub-image is 6144×6144; when the sub-image falls within the original picture, it overlaps at least 5 and at most 6 Tiles in both the horizontal and vertical directions. To keep the subsequent decoding and rendering consistent, the resolution of the fused code stream must stay constant, so the resolution of the image after each fusion and decoding is larger than that of the sub-image and contains it; only the sub-image portion is used for rendering. The side length of the Tile overlap region is therefore set to 6 Tiles.
The video decoding and rendering unit performs code-stream fusion and stitches the Tiles into a 6×6 arrangement for decoding. Since the overlapping Tiles do not necessarily fill a 6×6 grid, some Tile must be repeated to complete it. As shown in fig. 7, only 5×5 Tiles overlap, so the Tile numbered 1 can be repeated as the last row and last column. The fusion follows the general MPEG MCTS code-stream fusion method, namely modifying header information and fusing the code-stream information.
And simultaneously decoding the ultra-high definition Tile code stream and the basic full-view code stream, wherein the ultra-high definition Tile code stream and the basic full-view code stream are respectively decoded by independent decoders. And decoding the fused code stream by the ultra-high-definition Tile code stream, and establishing a texture object by the decoded image. And independently decoding the basic full view code stream, and taking the decoded full view frame as a texture object.
The current zoom control information is acquired: b_Zoom = 1 indicates that the zoom function is turned on. Assuming the FOV of the head-mounted display is 120° and the monocular resolution is 1920×1080, the number of horizontal pixels is 36°/120° × 1920 = 576; for the 64K high-definition-layer video, the horizontal field angle corresponding to 576 pixels is 576/61440 × 360° = 3.38°, so distortion-free zoom of up to 36°/3.38° = 10.65× can be supported.
And (3) performing texture coordinate transformation of the magnified region:
[Equation image: texture-coordinate transformation of the magnified region with the example parameters]
at this time:
normalized = 6144/7680/2/(θ/360°)
θ = 36°/2, f_ZoomLevel = 10.65, factor = cosθ.
Rendering is completed by OpenGL: according to the magnification control logic, the circular magnified region on the window is determined in the vertex shader stage; in the fragment shader stage the full-view texture object decoded from the base-layer VR video is mapped onto the sphere, and the texture object decoded from the fused high-definition-layer code stream is coordinate-transformed and then sampled and mapped into the circular magnified region.
The invention combines a basic stream with an ultra-high-definition tiled transmission stream to realize an interactive VR picture-magnification function; the algorithmic processing of the complete magnification workflow creatively solves the problem of picture interactivity in ultra-high-definition VR video live broadcasting. Although code-stream tiling, code-stream fusion and spherical rendering exist in the prior art, they only address on-demand VR video transmission and fused rendering; they do not include visual interaction with a picture region or a scheme for magnifying a picture in a live VR broadcast, whereas this patent realizes viewpoint-centered selective downloading of the ultra-high-definition code stream, the magnification algorithm and the rendering of the magnified region. In ultra-high-definition slow-live scenes such as exhibitions, competitions and cultural tourism, this picture interactivity helps improve the live experience and increases the value of the live scene.
In the foregoing description of specific embodiments of the invention, features described and/or illustrated with respect to one embodiment may be used in the same or similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
In addition, the method of the present invention is not limited to be performed in the time sequence described in the specification, and may be performed in other time sequences, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present invention.
While the present invention has been disclosed above by the description of specific embodiments thereof, it should be understood that all of the embodiments and examples described above are illustrative and not restrictive. Various modifications, improvements and equivalents of the invention may be devised by those skilled in the art within the spirit and scope of the appended claims. Such modifications, improvements and equivalents are also intended to be included within the scope of the present invention.

Claims (4)

1. An interaction method for VR video slow live broadcast is characterized in that: the method comprises the following steps:
obtaining VR video slow live broadcast stream, and performing video live broadcast after preprocessing;
receiving an interaction request sent by a user through VR interaction equipment in a video live broadcast process and responding; receiving an interaction request sent by a user through VR interaction equipment in a video live broadcast process and responding specifically comprises the following steps: a user sends a request for amplifying a specific area through VR interactive equipment in the live video process; acquiring the information of the specific area, performing fusion decoding, and performing amplified rendering display based on the view angle center of the specific area;
obtaining the VR video slow live stream and broadcasting it after preprocessing specifically comprises: obtaining the VR video slow live stream, performing video caching and video downsampling on it, and then performing video blocking and secondary video distribution on the cached and downsampled video;
the video caching stores the stream for a preset caching duration as two copies; the video downsampling reduces the resolution of one cached copy to form a basic full-view code stream; the video blocking splits the other cached copy into Tile video blocks and block-encodes them under the MPEG MCTS constraint; the secondary video distribution re-streams the Tile-segmented VR video slow live stream and the downsampled basic full-view code stream on the basis of a general stream-distribution protocol;
the step in which the user sends a magnification request for a specific region through the VR interaction device during the live broadcast, the information of that region is acquired and fusion-decoded, and a magnified rendering is displayed centered on the region's viewing angle specifically comprises: acquiring in real time the position information and the magnification request of the specific region sent by the user through the VR interaction device during the live broadcast; decoding the basic full-view code stream and forming a full-view texture object from the decoded image; decoding the tiled VR video slow live stream and using the decoded image as a texture object; and rendering with OpenGL, wherein the circular magnified region on the window is determined in the vertex shader stage, and in the fragment shader stage the full-view texture object decoded from the basic full-view code stream is mapped onto the sphere while the texture object decoded from the tiled VR video slow live stream is coordinate-transformed and then sampled and mapped into the circular magnified region.
2. The interaction method for VR video slow live broadcast of claim 1, wherein the Tile video block segmentation specifically comprises dividing the cached VR video slow live stream into a series of square Tile video blocks of equal resolution.
3. The utility model provides a VR video slow live interactive system which characterized in that: the method comprises the following steps:
the video preprocessing unit is used for acquiring VR video slow live stream, and performing video live broadcast after preprocessing;
the video interaction computing unit is used for receiving a request of a user for amplifying a specific area sent by VR interactive equipment in the video live broadcasting process;
the video decoding and rendering unit is used for acquiring the information of the specific area, performing fusion decoding on the information, and performing amplified rendering display on the basis of the visual angle center of the specific area;
the video pre-processing unit is used for executing: obtaining a VR video slow live stream, performing video caching and video down-sampling on the VR video slow live stream, and then performing video blocking and video secondary distribution on a video subjected to video caching and video down-sampling;
the video caching is implemented by presetting video caching time and storing the video caching time as two VR video slow live streams; the video downsampling is to reduce the resolution of a cached VR video slow live stream to form a basic full view code stream; the video blocking is to divide the other cached VR video slow live stream into Tile video blocks and to perform blocking coding based on MPEG MCTS constraint; the secondary distribution of the video is as follows: performing plug-flow transmission on the VR video slow live stream subjected to Tile video block segmentation and the base full view code stream subjected to down sampling based on a general stream distribution protocol;
the video interaction computing unit is used for executing: acquiring position information and an amplification request of a specific area sent by a user through VR interactive equipment in a live video broadcasting process in real time; the video decoding and rendering unit is used for executing: decoding the basic full-view code stream, and forming a full-view texture object by using the decoded image; decoding the partitioned VR video slow live stream, and taking the decoded image as a texture object; rendering is carried out by OpenGL, a circular amplification area on a window is confirmed in a vertex shader stage, a full-view texture object decoded by a basic full-view code stream is mapped to the global surface in a fragment shader stage, and sampling mapping is carried out to the circular amplification area after coordinate transformation is carried out on the texture object decoded by a blocked VR video slow live stream.
4. The VR video slow live interactive system of claim 3, wherein: the Tile segmentation of the cached VR video slow live stream is specifically to divide the VR video slow live stream into a series of square Tile video blocks with equal resolution.
CN202011012713.1A 2020-09-24 2020-09-24 Interaction method and system for VR video slow live broadcast Active CN112533005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011012713.1A CN112533005B (en) 2020-09-24 2020-09-24 Interaction method and system for VR video slow live broadcast

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011012713.1A CN112533005B (en) 2020-09-24 2020-09-24 Interaction method and system for VR video slow live broadcast

Publications (2)

Publication Number Publication Date
CN112533005A CN112533005A (en) 2021-03-19
CN112533005B true CN112533005B (en) 2022-10-04

Family

ID=74980322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011012713.1A Active CN112533005B (en) 2020-09-24 2020-09-24 Interaction method and system for VR video slow live broadcast

Country Status (1)

Country Link
CN (1) CN112533005B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113905256B (en) * 2021-12-10 2022-04-12 北京拙河科技有限公司 Video data processing method, device and system supporting interactive watching
CN114786037B (en) * 2022-03-17 2024-04-12 青岛虚拟现实研究院有限公司 VR projection-oriented adaptive coding compression method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105828090A (en) * 2016-03-22 2016-08-03 乐视网信息技术(北京)股份有限公司 Panorama live broadcasting method and device
US20170118540A1 (en) * 2014-06-27 2017-04-27 Koninklijke Kpn N.V. Determining A Region Of Interest On The Basis Of A HEVC-Tiled Video Stream
CN108696740A (en) * 2017-02-14 2018-10-23 深圳梦境视觉智能科技有限公司 A kind of live broadcasting method and equipment based on augmented reality
CN111491207A (en) * 2020-04-17 2020-08-04 北京三体云联科技有限公司 Video data processing method and device in live broadcast and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170118540A1 (en) * 2014-06-27 2017-04-27 Koninklijke Kpn N.V. Determining A Region Of Interest On The Basis Of A HEVC-Tiled Video Stream
CN105828090A (en) * 2016-03-22 2016-08-03 乐视网信息技术(北京)股份有限公司 Panorama live broadcasting method and device
CN108696740A (en) * 2017-02-14 2018-10-23 深圳梦境视觉智能科技有限公司 A kind of live broadcasting method and equipment based on augmented reality
CN111491207A (en) * 2020-04-17 2020-08-04 北京三体云联科技有限公司 Video data processing method and device in live broadcast and electronic equipment

Also Published As

Publication number Publication date
CN112533005A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN112204993B (en) Adaptive panoramic video streaming using overlapping partitioned segments
CN109983757B (en) View dependent operations during panoramic video playback
CN109983500B (en) Flat panel projection of reprojected panoramic video pictures for rendering by an application
JP2021103327A (en) Apparatus and method for providing and displaying content
US11483475B2 (en) Adaptive panoramic video streaming using composite pictures
US20080168512A1 (en) System and Method to Implement Interactive Video Streaming
CN112533005B (en) Interaction method and system for VR video slow live broadcast
CN110868625A (en) Video playing method and device, electronic equipment and storage medium
CN104980697A (en) Video transmission method for web camera
CN110663068B (en) Coordinate mapping for rendering panoramic scenes
CA3018600C (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
US11270413B2 (en) Playback apparatus and method, and generation apparatus and method
CN111667438A (en) Video reconstruction method, system, device and computer readable storage medium
CN112437286A (en) Method for transmitting panoramic original picture video in blocks
US20190379910A1 (en) Modified Pseudo-Cylindrical Mapping of Spherical Video Using Linear Interpolation of Empty Areas for Compression of Streamed Images
CN112130667A (en) Interaction method and system for ultra-high definition VR (virtual reality) video
CN109792490A (en) The improved pseudo- cylindrical projection of spherical video for stream picture compression
CN115866311A (en) Virtual screen peripheral atmosphere rendering method for intelligent glasses
CN117456145A (en) Video processing method, device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant