CN111064981B

CN111064981B - System and method for video streaming

Info

Publication number: CN111064981B
Application number: CN201811202640.5A
Authority: CN
Inventors: 冉瑞元; 张佳宁; 张道宁
Original assignee: Nolo Co ltd
Current assignee: Nolo Co ltd
Priority date: 2018-10-16
Filing date: 2018-10-16
Publication date: 2021-07-16
Anticipated expiration: 2038-10-16
Also published as: CN111064981A

Abstract

The invention discloses a system and a method for video streaming, wherein the method comprises the following steps: acquiring attitude data of VR equipment; obtaining predicted attitude data according to the obtained attitude data; sending the predicted attitude data to application platform software for picture rendering; and acquiring the rendered picture, and sending the rendered picture to VR equipment for display. The method can predict the attitude data more accurately, thereby reducing data jitter and delay.

Description

System and method for video streaming

Technical Field

The present invention relates to a system for video streaming; in particular to a system for video streaming for virtual reality applications; meanwhile, the method for realizing the attitude synchronization of the system belongs to the technical field of virtual reality.

Background

Video streaming (video streaming) is a video playing technology that compresses a series of video data and then transmits the compressed video data in segments to transmit video and audio on a network in real time for viewing. The video streaming technology is used by the once popular Quick Time Player, Real Player, etc. At present, with the deep development of industries such as network application, live application and the like, video streaming is applied more and more widely.

Virtual Reality (VR) is a Virtual environment generated by a modern high-technology means with a computer technology as a core, and a user obtains the same feeling as the real world through vision, hearing, touch and the like by means of special input/output devices. The virtual reality technology is a high-level man-machine interaction technology which comprehensively applies computer graphics, man-machine interface technology, sensor technology, artificial intelligence and the like, makes a vivid artificial simulation environment and can effectively simulate various perceptions of a human in a natural environment.

Head-mounted displays (head displays for short) are one of the core devices for virtual reality display, and are mainly classified into three types: the first type is PC head display, which needs to connect the head display with the PC by a data line, and the head display can process data on the PC and display pictures on the head display; the second type is an integrated head display, which is essentially a VR device with running storage capability and display capability, and does not need external equipment, so the integrated head display is also called as a VR all-in-one machine; the third type is a mobile head display, which is used by matching a VR box with a mobile terminal.

In the existing integrated head display and mobile head display, a user can only use a pre-installed application program of the head display, but cannot use an application program which can only be used on the PC head display, such as videos and applications running on a Steam VR application platform, and the application range of the integrated head display and the mobile head display is limited.

Disclosure of Invention

In view of the deficiencies of the prior art, the present invention provides a system for video streaming.

Another technical problem to be solved by the present invention is to provide a method for video streaming.

In order to achieve the purpose, the invention adopts the following technical scheme:

according to a first aspect of the embodiments of the present invention, a system for video streaming is provided, including a terminal, a VR device;

the terminal is provided with a server of application platform software and streaming software;

the VR equipment is used for sending attitude data to a server of the streaming software;

the streaming software server is used for obtaining predicted attitude data according to the attitude data sent by the VR equipment;

and the application platform software is used for rendering pictures according to the predicted attitude data.

Preferably, a streaming software client is installed on the VR device, and the VR device sends the attitude data to a streaming software server through the streaming software client;

and the server side of the streaming software acquires the rendered pictures and sends the rendered pictures to the client side of the streaming software, and the client side of the streaming software sends the pictures to the VR equipment for display.

Preferably, the server of the streaming software comprises a server driver, and the positioning prediction unit is located in the server driver and is configured to obtain predicted attitude data according to the attitude data sent by the VR device.

Preferably, the obtaining, by the positioning prediction unit, predicted posture data according to the posture data sent by the VR device includes:

acquiring a first time stamp and a second time stamp, wherein the first time stamp is the time when the server side of the streaming software receives the ith attitude data, and the second time stamp is the time when the server side of the streaming software receives the (i + 1) th attitude data;

acquiring data delay of the attitude data received by a server of the streaming software;

acquiring a third timestamp, wherein the third timestamp is the time when the application platform software samples from the server of the streaming software;

and obtaining the predicted attitude data of the third timestamp according to the attitude data of the first timestamp and the first timestamp, the attitude data of the second timestamp and the data delay.

According to a second aspect of the embodiments of the present invention, there is provided a method of video streaming, including the steps of:

acquiring attitude data of VR equipment;

obtaining predicted attitude data according to the obtained attitude data;

sending the predicted attitude data to application platform software for picture rendering;

and acquiring the rendered picture, and sending the rendered picture to VR equipment for display.

Preferably, the obtaining of the predicted attitude data according to the obtained attitude data includes the following steps:

acquiring a third timestamp, wherein the third timestamp is the time of sampling from a server of the streaming software by the application platform software;

Preferably, the data delay is obtained by using the following formula:

M＝T0+(t2-t1)+ΔT；

wherein M is the data delay; t1 is the time when the sensor acquires the attitude data; t2 is the time when the attitude data is sent to the server of the streaming software; Δ T is the network delay.

Preferably, the predicted attitude data of the third timestamp is obtained according to the attitude data of the first timestamp and the first timestamp, the attitude data of the second timestamp and the second timestamp, and the data delay, and the following formula is adopted:

wherein, V_jIs T_j' predicted attitude data at time; t is_iIs a first timestamp; v_iPose data that is a first timestamp; t is_i+1Is a second timestamp; v_i+1Pose data that is a second timestamp; t is_j' is a third timestamp; m is the data delay.

Preferably, the sending the predicted attitude data to the application platform software for image rendering includes the following steps:

sending the predicted attitude data to a data interface, and transmitting the predicted attitude data to VR application in the application platform software through the data interface;

and determining the picture content rendered by an application engine according to the predicted attitude data obtained by the VR application and the application logic, and rendering the picture.

According to a third aspect of the embodiments of the present invention, there is provided an apparatus for implementing video streaming, including a processor and a memory, the processor being configured to execute a program of video streaming stored in the memory to implement the method of video streaming as described above.

According to the video streaming method provided by the invention, the attitude data sent by VR equipment is received by a server of streaming software, the attitude data when the application platform software carries out image rendering is predicted according to the attitude data sent by the VR equipment, the image rendering is carried out according to the predicted attitude data, and the rendered image is sent to the VR equipment for image display. The method is characterized in that a server of streaming software is installed on a PC terminal, and the method ensures that a PC is responsible for running VR application and only picture display is required for VR equipment; therefore, the picture processing can be carried out through the hardware of the PC, satisfactory pictures can be obtained on the screen of the VR equipment, and the jitter and the delay of data can be reduced through accurate prediction of the attitude data, so that the picture quality is higher.

Drawings

FIG. 1 is a schematic diagram of a video streaming system according to the present invention;

FIG. 2 is a flow chart of a method for video streaming according to the present invention;

FIG. 3 is a flow chart of the positioning prediction unit predicting pose data provided by the present invention;

fig. 4 is a diagram illustrating data delay in streaming frames of a VR application to a VR device according to an embodiment of the present invention.

Detailed Description

The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.

As shown in fig. 1, the system for video streaming provided by the present invention includes a terminal and a VR device.

The terminal is provided with a server of application platform software and streaming software; in the embodiment provided by the present invention, the terminal is exemplified by a pc (personal computer), and may also be a terminal with data processing capability, such as a tablet computer, a smart television, and a smart phone. The application platform software installed on the PC is, illustratively, the stem VR platform software (corresponding APP on the smartphone). Of course, other application platforms such as a VIVEPORT platform, a HYPEREAL platform, an Ant-Vision VR application platform, a grand assistant, Tencent WEGAME, an OGP application platform, etc. may be used. Application engines UE4 (unknown Engine 4), U3D (Universal 3D), etc. used by VR applications in the SteamVR application platform software have already been integrated with the SDK provided by OpenVR, so that the screen of the application can be seen on the display of the PC. The server side of the streaming software installed on the terminal may be set as the a side of the nomoheme software, for example.

The service end of the streaming software comprises two parts, one part is a control interface, and the other part is a server driver. The server driver is preferably a dll file, but may be implemented in other forms, such as an SDK, API file, etc. When application platform software, such as SteamVR platform software, is started on a PC, the server driver is loaded accordingly.

The client installed with the streaming software on the VR device may be set as the B-side of the nomogram software, for example. The VR equipment is provided with various sensors, such as a nine-axis sensor, an inertial sensor and the like, and can sense attitude actions, namely pitching, rolling, yawing and the like. The VR equipment sends the attitude data to a streaming software server on the PC through a streaming software client; and sending the data to the application platform software through a server of streaming software on the PC, so that the application platform software renders a real-time picture. The VR equipment can be integrated VR equipment, then the client side of the series flow software is installed in the system of the integrated VR equipment, the picture is also displayed on the display screen of the integrated VR equipment, and the sensor is fixedly installed on the integrated VR equipment. The VR equipment can be mobile VR equipment, then the client of series flow software is installed in the smart mobile phone of the mobile VR equipment, the picture can be displayed on the smart mobile phone of the mobile VR equipment and also can be displayed on the display screen of the mobile VR equipment, and the sensor can be fixedly installed in the shell of the mobile VR equipment and also can be used as the sensor of the smart mobile phone installed in the mobile VR equipment.

The PC and the VR device are connected by a wired/wireless method, and when the PC and the VR device are connected by a wireless method, they preferably operate in a WLAN (wireless local area network) or 5G communication environment. Due to the characteristics of high speed, low delay and the like of 5G communication, the actual delay generated by the PC and the VR equipment under the 5G communication environment is basically negligible.

In order to have a good use experience, if data is sent to the application platform software without limitation, that is, each time a piece of data is received, the data is directly sent to the application platform software, since the frequency of each device is different (for example, the frequency of data transmission by the VR device is X, and the frequency of data acquisition by the application platform software is Y, X is not equal to Y), the delay is different, and finally, problems such as picture delay, picture jitter, and the like are caused. In order to solve the problem, data must be reasonably estimated, so that rendered pictures are more stable, smooth and fluent. Therefore, in the video streaming system provided by the invention, the terminal is provided with the positioning prediction unit, and the positioning prediction unit is arranged in the server driver of the streaming software server in a software form. The positioning prediction unit is used for predicting gesture data required by the application platform software for rendering pictures according to the gesture data of the VR equipment, and the application platform software renders real-time pictures according to the prediction data. The predicted attitude data is obtained through the positioning prediction unit, so that the attitude data of the application platform software at the next moment can be predicted more accurately, and the data jitter and delay are reduced. And rendering the image of the predicted attitude data at the next moment in the VR application, and transmitting the rendered image to VR equipment by the terminal through a client of streaming software in a UDP (user datagram protocol) mode through a server of the streaming software to be displayed. This process is described in detail later.

Udp (user data program) is a connectionless transport layer protocol in the open system interconnection reference model, and provides transaction-oriented simple unreliable information transfer service.

In the embodiment provided by the invention, the VR application in the application platform software uses an application engine (UE4, U3D, etc.), and integrates an SDK provided by a data interface, such as an SDK of OpenVR, so that a screen of the application can be seen on a display of a PC.

In order to stream the frames of the VR application to the VR device, the architecture shown in fig. 1 is used to implement the requirement, in the stream architecture shown in fig. 1, several core modules that need to be implemented are: the device comprises a server driver installed on a streaming software server of a terminal, VR equipment, a streaming software client installed on the VR equipment, and a positioning prediction unit. The VR equipment is used for acquiring attitude data and transmitting the data to the server driver; and the client and the server driver of the streaming software are used for data transmission and processing. And the positioning prediction unit is used for predicting the attitude data required by the application platform software for image rendering according to the attitude data sent by the VR equipment. The positioning prediction unit is positioned in a server driver of a streaming software server.

Fig. 2 is a flow chart of a method for video streaming provided by the present invention, and the following describes the whole process of video streaming in detail by way of example.

And S1, acquiring the posture data of the VR equipment.

The attitude data of the VR device is acquired by sensors mounted on the VR device, such as a nine-axis sensor, an inertial sensor, a six-axis sensor, a gyroscope, a magnetometer, and the like. Pose data for other parts of the user, such as the hand, is obtained by sensors mounted on the handle of the localization tracking device.

And transmitting the attitude data of the VR equipment to a client of streaming software installed on the VR equipment, and transmitting the attitude data to a server driver of a streaming software server installed on a terminal through the client of the streaming software through UDP (user Datagram protocol), wherein the server driver acquires the attitude data of the VR equipment.

And S2, obtaining the predicted attitude data according to the acquired attitude data.

In order to have good use experience, the attitude data sent by the VR equipment must be reasonably estimated, so that rendered pictures are more stable, smooth and fluent. Therefore, in the video streaming system provided by the invention, the server driver comprises the positioning prediction unit, and the positioning prediction unit can be arranged in the server driver of the streaming software server in a software form. The positioning prediction unit obtains predicted attitude data according to the acquired attitude data, and specifically comprises the following steps:

and S21, acquiring a first time stamp and a second time stamp, wherein the first time stamp is the time when the ith attitude data is received by the server of the streaming software, and the second time stamp is the time when the (i + 1) th attitude data is received by the server of the streaming software.

In the embodiment provided by the present invention, the positioning prediction unit obtains a first timestamp Ti (i is 1, 2 … … N, N is a positive integer, and N > -1), where the first timestamp Ti is obtained by signing the i-th posture data sent by the received VR device and the time when the posture data is received. The positioning prediction unit obtains a second time stamp T_i+1(i 1, 2 … … N, N being a positive integer, N > -1), a second timestamp T_i+1The time stamp is obtained by signing the received (i + 1) th posture data sent by the VR device and the time of receiving the posture data.

And S22, acquiring the data delay M of the streaming software when the server receives the attitude data.

When video streaming between different devices is realized, the frequency of data collected during rendering by application platform software is X Hz; the frequency at which the VR device transmits the pose data is Y hertz. The data delay M is the total delay from the generation of the action to the receipt of the gesture data by the server driver.

The data delay M can be obtained by:

M＝T0+(t2-t1)+ΔT

where T0 is the delay from the time the motion is generated to the time the sensor acquires the motion, in particular for a VR device worn on the user's head, the delay is the delay from the time the user's head is moved to the time the sensor acquires the head motion; t1 is the time when the sensor acquires the attitude data; t2 is the time when the attitude data is sent to the server of the streaming software; Δ T is the network delay.

As in fig. 4, is all the data delay involved in the action generation to server driver getting the data.

In the embodiment provided by the present invention, the data delay Δ T due to the network delay is fixed and only needs to be calculated once in the whole video streaming process. The process of acquiring data delay caused by network delay specifically comprises the following steps:

s221, at a first sending time t3, a server driver of the streaming software server sends request data to the VR device.

S222, at the first receiving time t4, the server driver of the streaming software receives the reply message sent by the VR device.

S223, obtaining the network delay according to the first receiving time and the first sending time.

The network delay adopts the following formula:

the network delay Δ T may be obtained by the time of the request and response between the server driver and the VR device.

And S23, acquiring a third timestamp, wherein the third timestamp is the time when the application platform software samples from the server side of the streaming software.

The frequency of data transmission of the VR equipment is X, the frequency of data acquisition of the application platform software is Y, and X is not equal to Y. The positioning prediction unit acquires ith attitude data and (i + 1) th attitude data which are sent to a streaming software server by VR equipment and corresponding first timestamp Ti and second timestamp T_i+1Followed by the acquisition of a third time stamp V_j', third time stamp V_j'is the time when the application platform software samples from the streaming software's server.

And S24, obtaining the predicted attitude data of the third timestamp according to the attitude data of the first timestamp and the first timestamp, the attitude data of the second timestamp and the second timestamp, and the data delay.

Obtaining predicted attitude data of a third timestamp according to the attitude data obtained by the first timestamp and the first timestamp, the attitude data obtained by the second timestamp and the second timestamp, and data delay, and adopting the following formula:

By the method, T can be accurately predicted_jAttitude data at time' to reduce data jitter and latency.

And S3, sending the predicted attitude data to the VR application for picture rendering.

Will predict the obtained T_jAnd the posture data at the moment is transmitted to application platform software for picture rendering, and then the rendered picture is transmitted to VR equipment for display.

The application platform software performs picture rendering according to the predicted attitude data, and sends rendered pictures to the VR equipment for picture display, and the method specifically comprises the following steps:

and S31, sending the predicted attitude data to a data interface, and transmitting the predicted attitude data to VR application in the application platform software through the data interface.

And transmitting the predicted attitude data acquired by a positioning prediction unit in a server driver of a streaming software server to a data interface, wherein VR application in SteamVR application platform software uses an application engine, an SDK provided by OpenVR data interface is integrated, and the OpenVR data interface transmits the attitude data to VR application.

And S32, determining the picture content rendered by the application engine according to the predicted attitude data obtained by the VR application and the application logic, and rendering the picture.

And transmitting the posture data and the application logic obtained according to the VR application to an application engine to obtain the exact rendering picture content, and rendering the picture. The application Engine is Unreal Engine 4, Universal 3D and the like.

Preferably, the control information obtained by the server driver of the streaming software server is also sent to the VR application for screen rendering. And sending the control information acquired by the streaming software server to a data interface OpenVR, and transmitting the control information to VR application through the data interface OpenVR. And the VR application also transmits the control information to an application engine according to the obtained control information so as to obtain the exact rendering picture content and render the picture.

In the embodiment provided by the invention, the data rendered by the application engine is stored in a video memory of a video card, for example, a video memory of an Nvidia video card, and notifies the VR application that the picture is rendered, the VR application notifies the data interface OpenVR, and the data interface OpenVR notifies the server driver of the streaming software server of the event that the rendering is completed.

And S4, acquiring the rendered picture, and sending the picture to VR equipment for display.

The method for acquiring the rendered picture and sending the rendered picture to VR equipment for display specifically comprises the following steps:

and S41, acquiring texture data corresponding to the rendered picture, and encoding a frame of picture into a plurality of data packets.

After the server driver of the streaming software server knows the rendered event of the picture, the corresponding texture data, namely the data of one frame of picture, is found in the video memory through the texture address transmitted by the data interface OpenVR, and the one frame of picture is encoded into a plurality of data packets.

In the embodiment provided by the invention, the special library for video coding and decoding provided by great is adopted, namely, the NvCodec library.

When initialization is performed, the encoding format and the picture format of the ncodec library are notified in advance. In the embodiment provided by the present invention, the data is encoded using H264. Regarding the picture FORMAT, using the NV _ ENC _ BUFFER _ FORMAT _ ABGR formatted picture, the NvCodec library will encode a frame of a picture into a plurality of small packets as required in the current frame.

And S42, sending the encoded data packets to VR equipment for decoding and displaying.

After the encoding is finished, a server driver of the streaming software server side sends a plurality of encoded data packets to a streaming software client installed on the VR device, the streaming software client is transmitted to the VR device, and the VR device decodes the received data packets after receiving a complete frame picture data to form a complete image on the VR device and displays the complete image.

The method and related hardware for displaying the picture by the VR device may use any method and hardware that are available and are not specifically required here.

In the embodiment of the present invention, the server side of the streaming software installed on the terminal may further obtain control information sent by the VR device, where the control information may be from the VR device, or from a controller cooperating with the VR device, or the like; and the server side of the streaming software sends the predicted attitude information to the application platform software for picture rendering, and simultaneously sends the control information to the application platform software for picture rendering.

In summary, the video streaming method provided by the present invention predicts the gesture data of the application platform software when performing the frame rendering according to the gesture data sent by the VR device by calculating the data delay of the gesture data received by the server of the streaming software, performs the frame rendering according to the predicted data, and sends the rendered frame to the VR device for frame display.

The embodiment of the present invention further provides an apparatus for implementing video streaming, where the apparatus includes a processor and a memory, and the processor is configured to execute a program for implementing video streaming stored in the memory to implement the method for implementing video streaming as described above. The memory herein stores one or more programs. Wherein the memory may comprise volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above. When executed by one or more processors, the one or more programs in the memory may implement some or all of the steps of the above-described method for video streaming in the above-described method embodiments.

The system and method for video streaming provided by the present invention are described in detail above. Any obvious modifications to the invention, which would occur to those skilled in the art, without departing from the true spirit of the invention, would constitute a violation of the patent rights of the invention and would carry a corresponding legal responsibility.

Claims

1. A video streaming system is characterized by comprising a terminal and a VR device; the terminal is provided with a server of application platform software and streaming software; the VR equipment is used for sending attitude data to a server of the streaming software;

the streaming software server is used for obtaining predicted attitude data according to the attitude data sent by the VR equipment; wherein obtaining predicted pose data comprises the sub-steps of: acquiring a first time stamp and a second time stamp, wherein the first time stamp is the time when the server side of the streaming software receives the ith attitude data, and the second time stamp is the time when the server side of the streaming software receives the (i + 1) th attitude data; acquiring data delay of the attitude data received by a server of the streaming software; acquiring a third timestamp, wherein the third timestamp is the time when the application platform software samples from the server of the streaming software; obtaining predicted attitude data of the third timestamp by adopting the following formula according to the attitude data of the first timestamp and the first timestamp, the attitude data of the second timestamp and the second timestamp, and data delay:

wherein, V'_jIs T'_jPredicted attitude data at time, T_iIs a first time stamp, V_iIs attitude data of a first time stamp, T_i+1Is the secondTime stamp, V_i+1Is pose data of a second time stamp, T'_jIs the third timestamp, M is the data delay;

the data delay is obtained by adopting the following formula:

M＝T0+(t2–t1)+ΔT；

wherein, T0 is a delay from the generation of the action to the acquisition of the action by the sensor, T1 is a time when the sensor acquires the attitude data, T2 is a time when the attitude data is sent to a server of the streaming software, and Δ T is a network delay;

2. The video streaming system of claim 1, wherein:

the VR equipment is provided with a client of streaming software, and sends attitude data to a server of the streaming software through the client of the streaming software;

3. The video streaming system of claim 2, wherein:

the server side of the streaming software comprises a server driver, and the positioning prediction unit is located in the server driver and used for obtaining predicted attitude data according to the attitude data sent by the VR equipment.

4. A method of video streaming, comprising the steps of:

acquiring attitude data of VR equipment;

obtaining predicted attitude data according to the obtained attitude data; wherein obtaining predicted pose data comprises the sub-steps of: acquiring a first time stamp and a second time stamp, wherein the first time stamp is the time when the server side of the streaming software receives the ith attitude data, and the second time stamp is the time when the server side of the streaming software receives the (i + 1) th attitude data; acquiring data delay of the attitude data received by a server of the streaming software; acquiring a third timestamp, wherein the third timestamp is the time of sampling from a server of the streaming software by the application platform software; obtaining predicted attitude data of the third timestamp by adopting the following formula according to the attitude data of the first timestamp and the first timestamp, the attitude data of the second timestamp and the second timestamp, and data delay:

wherein, V'_jIs T'_jPredicted attitude data at time, T_iIs a first time stamp, V_iIs attitude data of a first time stamp, T_i+1Is the second time stamp, V_i+1Is pose data of a second time stamp, T'_jIs the third timestamp, M is the data delay;

the data delay is obtained by adopting the following formula:

M＝T0+(t2–t1)+ΔT；

5. The method for video streaming according to claim 4, wherein the step of sending the predicted pose data to an application platform software for rendering comprises the steps of:

6. An apparatus for implementing video streaming, comprising a processor and a memory, the processor being configured to execute a program for video streaming stored in the memory to implement the method for video streaming according to claim 4 or 5.