CN113099204B - Remote live-action augmented reality method based on VR head-mounted display equipment - Google Patents

Remote live-action augmented reality method based on VR head-mounted display equipment

Info

Publication number
CN113099204B
Authority
CN
China
Prior art keywords
virtual
client
depth
scene
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110392566.3A
Other languages
Chinese (zh)
Other versions
CN113099204A (en)
Inventor
郝爱民
张素梅
李如意
郭日俊
李帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Inception Space Technology Co ltd
Original Assignee
Qingdao Research Institute Of Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Research Institute Of Beihang University filed Critical Qingdao Research Institute Of Beihang University
Priority to CN202110392566.3A
Publication of CN113099204A
Application granted
Publication of CN113099204B
Legal status: Active

Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N 13/00: Stereoscopic video systems; multi-view video systems; details thereof
    • H04N 13/20: Image signal generators
    • H04N 13/239: Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N 13/271: Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H04N 13/275: Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H04N 13/30: Image reproducers
    • H04N 13/332: Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H04N 13/344: Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays

Abstract

The application discloses a remote live-action augmented reality method based on a VR head-mounted display device, which comprises the following steps. Step S100: a camera device at the remote end captures live-action video of the remote scene in real time, and depth calculation is performed on the captured video to obtain its depth information. Step S200: a scene structure is constructed from the depth information obtained in step S100, and a virtual object is fused with the remote live-action video to obtain virtual-real fused image data. Step S300: the virtual-real fused image data is captured and sent to the client in real time over the network. The invention overcomes the defects of existing VR live-broadcast software, in which the user can only passively watch the pushed live picture, virtual models cannot be superimposed, image quality is low, immersion is weak, and the effective transmission distance is limited.

Description

Remote live-action augmented reality method based on VR head-mounted display equipment
Technical Field
The invention relates to a remote live-action augmented reality method based on VR head-mounted display equipment, and belongs to the field of virtual reality.
Background
With the popularization of virtual reality technology and 5G networks, the novel "VR + live broadcast" mode is gradually becoming a new media trend. Live broadcast based on VR technology can provide users with high-quality live pictures and immerse them in scenes full of realism, making the broadcast more real and stereoscopic, bringing strong visual impact and arousing the users' interest. In 2020, the survey team that re-measured the height of Mount Qomolangma (Everest) successfully reached the summit, and VR + 5G panoramic live broadcasting allowed ordinary people to enjoy the immersive experience of watching the stars above the clouds of the snow-covered plateau and viewing the sunrise over the sea of clouds, breaking the limitations of time and space. However, in the existing VR + live broadcast mode the user still stays in the traditional "you broadcast, I watch" stage and can only passively watch the pushed live picture. The Goggles FPV glasses released by DJI can assist the user in operating an unmanned aerial vehicle from a first-person view, bringing an immersive flight experience, but they are limited by the effective transmission distance. In addition, on current VR live broadcast platforms the image quality received by the user is not high, content is seriously homogenized, latency is high, and the sense of immersion is weak.
Disclosure of Invention
The invention provides a remote live-action augmented reality method based on a VR head-mounted display device, which overcomes the defects of existing VR live-broadcast software: the user can only passively watch the pushed live picture, virtual models cannot be superimposed, image quality is low, immersion is weak, and the effective transmission distance is limited.
The invention adopts the technical scheme that a remote live-action augmented reality method based on VR head-mounted display equipment comprises the following steps:
step S100: acquiring a remote live-action video in real time by adopting a camera device at a far end to obtain a remote live-action video of a far-end scene; performing depth calculation on the obtained remote live-action video to obtain depth information of the remote live-action video;
step S200: constructing a scene structure by using the depth information of the remote live-action video obtained in the step S100, and fusing the virtual object and the remote live-action video to obtain virtual-real fused image data;
step S300: intercepting the image data after the virtual-real fusion, and sending the image data after the virtual-real fusion to a client in real time through a network;
step S400: the client receives the virtual-real fused image data transmitted by the network, and displays and plays the image data by using VR head-mounted display equipment;
step S500: the action data of the user are acquired through the VR head-mounted display device, and then the action data are transmitted through a network to control the remote acquisition device to follow up in real time.
In step S100, an external camera robot is used as the camera device at the far end; it collects real-time video images from multiple angles and with a wide field of view. Depth calculation is then performed on the video images in software to obtain the video data and depth information of the captured real scene. The external camera robot comprises a binocular camera (a pair of cameras), a two-degree-of-freedom pan-tilt gimbal, a computing and communication module, and a trolley; the binocular camera is fixed on the gimbal, and the gimbal and the computing and communication module are fixed on the trolley. The computing and communication module comprises a computing module responsible for depth calculation, a rendering module for scene construction and virtual-real fusion, and a wireless communication module for wireless data transmission. Data is transmitted between the external camera robot and the client through a cloud server; the client consists of data receiving and processing equipment and a VR head-mounted display device, and the cloud server comprises a Web server and a signaling server.
Preferably, in the above remote live-action augmented reality method based on the VR head-mounted display device, the specific process of step S200 includes:
s201, constructing a 3D scene structure according to the depth information obtained in the step S100; combining the 3D scene structure with the collected 2D video image to render a 3D scene;
s202, importing the geometric virtual object generated by the computer into a 3D scene;
and S203, carrying out occlusion and shadow calculation on the imported geometric virtual object according to the depth information calculated by the calculation module, and realizing virtual-real fusion.
Preferably, in the above remote live-action augmented reality method based on the VR head-mounted display device, the specific process of step S300 includes:
s301, rendering the scene subjected to virtual-real fusion into a Back Buffer of OpenGL by a rendering module according to a specified frame rate;
s302, after the Back Buffer is updated, the rendering module acquires rendered image data from the Back Buffer;
s303, carrying out compression coding on the image data rendered in the step S302;
s304, the wireless communication module sends the coded image data to the client in real time through a high-speed network.
Preferably, in the above remote live-action augmented reality method based on the VR head-mounted display device, the specific process of step S400 includes:
s401, the client receives the transmitted video image data through a WebRTC point-to-point technology;
s402, the client decodes the image data after receiving the image data;
s403, converting the decoded image data into Texture, and using the Texture as a Texture map of a geometric patch in a scene;
and S404, projecting the video image to VR head-mounted display equipment for display.
Preferably, in the above remote live-action augmented reality method based on the VR head-mounted display device, the specific process of step S500 includes:
s501, the client tracks and collects head movements of a user through VR head-mounted display equipment to generate a rotation instruction;
s502, the client acquires instructions of the outward shooting robot, such as forward and backward movement, sent by a user through a handle;
s503, converting the instruction data into character strings, connecting the character strings through WebRTC and sending the character strings to the pickup robot in real time;
s504, after the external camera robot receives the instruction data, converting the character string into a numerical value; the external camera robot is respectively converted into actual instructions of a holder or a trolley according to the identifiers of different instructions;
and S505, the outer shooting robot controls the holder to rotate and the trolley to retreat or advance through actual instructions.
Preferably, in the above remote live-action augmented reality method based on the VR head-mounted display device, the specific process of step S100 includes:
step S101: while the trolley of the external camera robot is moving and the gimbal is rotating, the binocular camera collects the surrounding live-action video in real time and sends the collected left and right video images to the computing module;
step S102: the computing module rectifies the two video streams; after rectification, the corresponding pixels of the same reference point in the two images lie on the same row;
step S103: acquiring disparity maps of a left video image and a right video image by adopting an SGBM algorithm, and selecting one of the disparity maps for calculation;
step S104: filling disparity map holes: detecting the disparity map selected in step S103 to find hole areas, and then filling each hole with the mean of the nearby reliable disparity values;
step S105: converting the disparity map into a depth map using the formula Depth = (Fx × Baseline) / Disp, where Depth is the depth value, Fx is the normalized focal length, Baseline is the distance between the optical centers of the two cameras (the baseline distance), and Disp is the disparity value;
step S106: and traversing pixels of the disparity map to perform depth conversion, so as to obtain a depth map.
In step S201, camera calibration is performed using the collected binocular video images Image1 and Image2 and the calculated depth information to obtain the pose information of the binocular camera in the real scene, and this pose is assigned to the virtual binocular camera in the virtual scene; two geometric patches are added to the virtual scene as the display surfaces of the virtual cameras, and the binocular 2D video images Image1 and Image2 are displayed on these two patches respectively;
in step S202, a virtual object is imported into the virtual 3D scene, and the offset between the position information P1 at which the virtual object is to be placed in the real scene and the position information P2 of the binocular camera in the real scene is calculated; the position information P4 of the virtual object in the virtual space is then obtained from the position information P3 of the virtual camera by the formula P4 = P3 + P1 - P2.
Preferably, in step S203, the implementation manner of occlusion in the above-mentioned remote live-action augmented reality method based on the VR head-mounted display device is:
in step S100, depth calculation of the scene is performed from the video images collected by the real binocular camera, yielding a depth map D1; a depth map D2 is generated from the Z-buffer (ZBuff) data of the binocular camera of the virtual scene; the pixels of depth map D1 are traversed and compared with the corresponding pixels of depth map D2, and if the depth value of a pixel in D1 is smaller than that of the corresponding pixel in D2, the corresponding pixels of the video images Image1 and Image2 on the two display patches are offset in position so that they lie in front of the corresponding pixels of D2;
in step S203, the shadow is implemented as follows: the spatial position P5 corresponding to each pixel of the depth map D1 of the real binocular camera is recovered; the spatial coordinate system matrix M of the light is obtained from the position and rotation information of the light, and the spatial position P6 of the pixel in the light coordinate system is calculated by the formula P6 = M × P5; whether the pixel is occluded by the virtual object in the light coordinate system is then determined, and if it is occluded, a shadow is superimposed when the pixel is rendered.
Preferably, in the above VR head-mounted display device-based remote live-action augmented reality method, in step S304, the WebRTC-based point-to-point connection implementation process is as follows:
establishing connection between the client and the signaling server and between the external camera robot and the signaling server by using socket.IO, and acquiring an IP address and a corresponding communication port of the signaling server;
establishing point-to-point connection between the external camera robot and the client through a signaling server, wherein the specific process comprises the following steps:
the client creates a PeerConnection; a client creates a Data Channel and creates an Offer;
in the callback function of creating Offer, obtaining the Description information of Offer, setting the Description information as local Description information through an interface SetLocalDescription (), and then sending Offer to a signaling server; utilizing a regular expression to add the setting of the maximum code rate and the average code rate in the Description information; the signaling server forwards the Offer to the external camera robot;
after receiving the Offer, the external camera robot sends Answer information to the signaling server, and the signaling server forwards the Answer information to the client; the client collects local ICE Candidates and sends them to the signaling server, and the signaling server forwards them to the external camera robot;
the external camera robot receives the client's ICE Candidates, likewise collects local ICE Candidates and forwards them to the client through the signaling server; the client receives the far-end ICE Candidates and adds them to the PeerConnection; the connection is established;
and sending the encoded image data to the client through the established point-to-point connection.
The invention has the following advantages: it overcomes the defects of existing VR live-broadcast software, in which the user can only passively watch the pushed live picture, virtual models cannot be superimposed, image quality is low, immersion is weak and the effective transmission distance is limited. The proposed remote live-action augmented reality method based on a VR head-mounted display device can collect remote live-action data in real time, superimpose virtual digital models and transmit the result in real time, connect to a VR head-mounted display device for immersive viewing, and remotely control the video recording equipment to follow the user's head movements, greatly improving real-time performance, realism, immersion and interactivity.
Compared with an ordinary virtual reality live-broadcast scene, the invention uses professional shooting equipment and processes the images with the related algorithms, which better fits the way human eyes actually observe and improves both video clarity and viewer comfort. Deep learning and graphics algorithms are applied to intelligently recognize the remote real scene and superimpose virtual objects in time, achieving a real-time virtual-real fusion effect. Combined with 5G transmission technology, the image quality and fluency at the client are improved: the client generates the immersive scene from the ordinary 2D pictures it receives, so compared with a panoramic picture shot by a panoramic camera the transmitted data is reduced by 50% while the image quality is sharper. The head-eye tracking signals of the VR head-mounted display device control the multi-degree-of-freedom gimbal, realizing real-time interaction and enhancing the user's immersion through a first-person control mode, making the experience closer to reality. The invention can be widely applied to cultural exhibition, real-estate showcasing, medical rehabilitation, and remote observation and cooperative operation in high-risk industries, and has broad application prospects and high expected social and economic value.
Drawings
FIG. 1 is a schematic diagram of the working principle of the present invention;
FIG. 2 is a schematic structural diagram of the remote external camera robot of the present invention;
FIG. 3 is a process flow diagram of the method of the present invention.
Detailed Description
The technical features of the present invention will be further described with reference to the following embodiments.
Referring to FIG. 1, the invention provides a remote live-action augmented reality method based on a VR head-mounted display device. The hardware for implementing the method mainly comprises three parts, namely a remote end, a client and a cloud server, which are described in turn below.
The remote end uses an external camera robot to perform real-time acquisition of the live-action video, depth calculation, scene construction, virtual-real fusion and wireless transmission, i.e. to complete steps S100, S200 and S300 of FIG. 3. The structure of the external camera robot is shown schematically in FIG. 2; it consists of a binocular camera, a two-degree-of-freedom pan-tilt gimbal, a computing and communication module and a trolley. The binocular camera is fixed on the gimbal, and the gimbal and the computing and communication module are fixed on the trolley. The binocular camera simulates human eyes, and the two-degree-of-freedom gimbal can rotate by corresponding angles in the horizontal and vertical directions, so it is used to simulate the neck movements of a person; the trolley simulates walking actions such as moving forward, moving backward, turning left and turning right. The computing and communication module comprises a computing module, a rendering module and a wireless communication module, which are respectively responsible for depth calculation, scene construction and virtual-real fusion, and wireless transmission.
The client side is composed of data receiving and processing equipment and VR head-mounted display equipment. And the data receiving and processing equipment is responsible for receiving and processing the video data, converting the video data into data which can be recognized by the VR head-mounted display equipment, and correctly playing the data in the VR head-mounted display equipment. Meanwhile, the data receiving and processing equipment is also responsible for acquiring the head action data of the user through the VR head-mounted display equipment, and sending the head action data to a far end after processing.
The Web server and the signaling server are deployed on the cloud server at the same time. The Web server is mainly used for accessing a Web page when the client is a browser, and is not specifically described in this embodiment. The signaling server is responsible for establishing point-to-point communication between the remote end and the client.
Referring to fig. 3, the present embodiment provides a remote real-scene augmented reality method based on a VR head-mounted display device, where the method includes the following steps:
step S100: acquiring a remote live-action video in real time by adopting a camera device at a far end to obtain a remote live-action video of a far-end scene; performing depth calculation on the obtained remote live-action video to obtain depth information of the remote live-action video;
step S200: constructing a scene structure by using the depth information of the remote live-action video obtained in the step S100, and fusing the virtual object and the remote live-action video to obtain virtual-real fused image data;
step S300: intercepting the image data after the virtual-real fusion, and sending the image data after the virtual-real fusion to a client in real time through a network;
step S400: the client receives the image data after the virtual-real fusion transmitted by the network, and uses VR head-mounted display equipment to display and play;
step S500: the action data of the user are acquired through the VR head-mounted display device, and then are transmitted through a network, and the follow-up of the remote acquisition device is controlled in real time.
In the embodiment of the invention, video images are first collected in real time, from multiple angles and with a wide field of view, by the external camera robot, and depth calculation is performed on them in software to obtain the video data and depth information of the captured real scene. A scene structure is then constructed from the obtained depth information and combined with the collected 2D video images to render a complex 3D scene. Computer-generated geometric virtual objects are imported into the scene, and occlusion and shadow calculations are performed to achieve accurate virtual-real fusion.
Next, the virtual-real fused rendered image is read from the software's back buffer at a specified frame rate and compression-encoded to reduce the amount of data transmitted over the network; the encoded image data is then transmitted to the client in real time over a high-speed network such as a 5G network. Finally, the client receives the image data transmitted over the network, decodes it, converts the decoded video data into the required image format, and projects it onto the VR head-mounted display device for a highly immersive presentation.
In addition, the client tracks and collects the user's head movements through the VR head-mounted display device to generate rotation instruction data, and collects forward and backward movement commands that the user issues through a handheld controller. The user's instructions are transmitted to the remote robot in real time over the 5G high-speed network, controlling the gimbal to rotate accordingly and the robot to move forward or backward, so that the first-person control mode enhances the user's immersion and makes the experience closer to reality.
In an embodiment of the present invention, the step S100 specifically includes:
S101, while the trolley is moving or the gimbal is rotating, the binocular camera collects the surrounding live-action video in real time and sends the collected left and right video images to the computing module;
S102, the computing module rectifies the two video streams, including distortion correction and stereo epipolar rectification; after rectification, the corresponding pixels of the same reference point in the two images lie on the same row;
s103, acquiring disparity maps of the left video image and the right video image by adopting an SGBM algorithm, and calculating a left disparity map and a right disparity map;
S104, filling disparity map holes. Owing to occlusion or uneven illumination, some unreliable disparities appear in the disparity map, forming holes. In this step the disparity map is examined, hole areas are found, and each hole is filled with the mean of the nearby reliable disparity values;
S105, converting the disparity map into a depth map. The unit of disparity is pixels and the unit of depth is millimeters; the conversion formula is:
Depth=(Fx*Baseline)/Disp
wherein Depth represents a Depth value; fx represents the normalized focal length; baseline represents the distance between the optical centers of two cameras, and is called the Baseline distance; disp denotes a disparity value.
The pixels of the disparity map are then traversed and converted, yielding the depth map.
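The disparity-to-depth pipeline of steps S101 to S106 can be illustrated with a short OpenCV sketch. This is not the patent's implementation: the language, library, and all parameter values (number of disparities, block size, hole-filling window, focal length, baseline) are assumptions chosen only to make the steps concrete.

```python
import cv2
import numpy as np

def disparity_to_depth(left_bgr, right_bgr, fx_px=700.0, baseline_mm=60.0):
    """SGBM disparity, simple hole filling, then Depth = Fx * Baseline / Disp.
    fx_px and baseline_mm are placeholder calibration values."""
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

    # Semi-Global Block Matching; parameters are illustrative only.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=5, P1=8 * 3 * 5 ** 2,
                                 P2=32 * 3 * 5 ** 2, uniquenessRatio=10)
    disp = sgbm.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point (x16)

    # Hole filling: replace unreliable (non-positive) disparities with the
    # mean of valid disparities in a local neighbourhood.
    valid = disp > 0
    kernel = np.ones((9, 9), np.float32)
    sum_valid = cv2.filter2D(np.where(valid, disp, 0.0), -1, kernel)
    cnt_valid = cv2.filter2D(valid.astype(np.float32), -1, kernel)
    local_mean = np.divide(sum_valid, cnt_valid,
                           out=np.zeros_like(sum_valid), where=cnt_valid > 0)
    disp_filled = disp.copy()
    disp_filled[~valid] = local_mean[~valid]

    # Depth (mm) = Fx (px) * Baseline (mm) / Disparity (px)
    depth = np.zeros_like(disp_filled)
    good = disp_filled > 0
    depth[good] = fx_px * baseline_mm / disp_filled[good]
    return depth
```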
In the embodiment of the present invention, the step S200 specifically includes:
S201, the rendering module constructs a complex 3D scene from the collected 2D video images and the calculated depth information.
S202, computer-generated geometric virtual objects are imported into the scene.
S203, because the depth information has been calculated by the computing module, occlusion and shadowing can be applied to the imported geometric virtual objects, achieving accurate virtual-real fusion.
Specifically, in step S201, camera calibration is performed using the collected binocular video images Image1 and Image2 and the calculated depth information to obtain the pose of the binocular camera in the real scene, and this pose is assigned to the virtual binocular camera in the virtual scene. Two geometric patches are added to the virtual scene as the display surfaces of the virtual cameras, and the binocular 2D video images Image1 and Image2 are displayed on them.
Specifically, in step S202, a virtual object is imported into the virtual 3D scene, and an offset between the position information P1 of the virtual object to be placed in the real scene and the position information P2 of the binocular camera in the real scene is calculated. According to the position information P3 of the virtual camera, position information P4 of the virtual object in the virtual space is obtained through calculation, and the calculation formula is as follows:
P4=P3+P1-P2。
specifically, in step S203, the occlusion and the shadow are implemented as follows:
and (3) realizing shielding:
In step S100, depth calculation of the scene has already been performed from the video images captured by the real binocular camera, yielding a depth map D1; a depth map D2 can be generated from the Z-buffer (ZBuff) data of the binocular camera of the virtual scene (the virtual scene contains the two display patches and the imported geometric virtual objects). The pixels of depth map D1 are traversed and compared with the corresponding pixels of depth map D2; if the depth value of a pixel in D1 is smaller than that of the corresponding pixel in D2, the corresponding pixels of the video images Image1 and Image2 on the two display patches are offset in position so that they lie in front of the corresponding pixels of D2, thereby rendering the effect of the real scene occluding the virtual object.
Shadow realization:
The spatial position P5 corresponding to each pixel of the depth map can be recovered from the depth map D1 of the real binocular camera; the spatial coordinate system matrix M of the light is obtained from the position and rotation information of the light, and the spatial position P6 of the pixel in the light coordinate system is then calculated by the formula:
P6=M*P5。
Whether the pixel is occluded by the virtual object in the light coordinate system is then determined, and if it is occluded, a shadow is superimposed when the pixel is rendered.
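The two comparisons above can be sketched as follows. Note the simplifications: the patent shifts the video pixels on the display patches to realize occlusion, whereas the sketch only produces a per-pixel occlusion mask; the back-projection assumes a pinhole camera with made-up intrinsics (fx, fy, cx, cy), and M_light is assumed to be a 4x4 homogeneous transform.

```python
import numpy as np

def occlusion_mask(D1, D2):
    """True where the real scene (D1) is closer than the virtual scene (D2),
    i.e. where the real video should be drawn in front of the virtual object."""
    return D1 < D2

def to_light_space(D1, fx, fy, cx, cy, M_light):
    """Back-project every pixel of depth map D1 to a camera-space point P5
    (pinhole model, assumed intrinsics) and transform it into the light's
    coordinate system: P6 = M_light applied to P5."""
    h, w = D1.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    Z = D1
    X = (us - cx) * Z / fx
    Y = (vs - cy) * Z / fy
    P5 = np.stack([X, Y, Z, np.ones_like(Z)], axis=-1)  # (h, w, 4) homogeneous points
    P6 = P5 @ M_light.T                                  # apply M to every point
    return P6[..., :3]
```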
In an embodiment of the present invention, the step S300 specifically includes:
s301, rendering the virtual-real fused scene into a Back Buffer of OpenGL by a rendering module according to a specified frame rate (here, 60FPS is set);
and S302, after the Back Buffer updating is finished, the rendering module acquires rendered image data from the Back Buffer.
S303, in order to reduce the data volume of network transmission, compressing and encoding the image data;
s304, the wireless communication module sends the coded image data to the client in real time through a high-speed network such as a 5G network.
Specifically, in step S301, OpenGL (Open Graphics Library) is an application programming interface that can be used to develop interactive three-dimensional computer graphics applications. OpenGL uses a double-buffering technique, i.e. it has two buffers: a Front Buffer and a Back Buffer. The Front Buffer is the screen that is displayed; the Back Buffer resides in memory and is not visible. All drawing is done in the Back Buffer, and when drawing is complete the result is copied to the screen. Likewise, in the invention, the virtual-real fused scene is first rendered into the Back Buffer and then swapped to the screen.
60 FPS means that the picture is refreshed at 60 frames per second. Although in theory a higher refresh rate gives smoother pictures, an excessively high refresh rate has no practical benefit here and would increase the amount of data to be transmitted, lowering the achievable picture clarity. The invention renders and reads image data from the Back Buffer at 60 FPS, which satisfies the fluency requirement while keeping the image quality from degrading.
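A minimal sketch of reading the back buffer at a fixed frame rate is given below, using PyOpenGL for illustration only (the patent's rendering module is not tied to Python). It assumes a current OpenGL context and a render_scene callable provided elsewhere; frame pacing with time.sleep is a simplification.

```python
import time
from OpenGL.GL import (glReadBuffer, glReadPixels,
                       GL_BACK, GL_RGBA, GL_UNSIGNED_BYTE)

def grab_back_buffer(width, height):
    """Read the freshly rendered frame from the back buffer,
    before the buffer swap makes it visible on screen."""
    glReadBuffer(GL_BACK)
    return glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE)

def capture_loop(render_scene, width, height, fps=60):
    """Render, grab and hand off frames at a fixed frame rate (60 FPS here)."""
    frame_period = 1.0 / fps
    while True:
        t0 = time.time()
        render_scene()                       # draw the virtual-real fused scene
        frame = grab_back_buffer(width, height)
        yield frame                          # pass on to the encoder / sender
        dt = time.time() - t0
        if dt < frame_period:
            time.sleep(frame_period - dt)
```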
Specifically, in step S303, uncompressed image data is large and occupies considerable bandwidth, so the image data is compression-encoded to reduce the amount transmitted over the network. NVIDIA hardware encoding is adopted; it encodes on the GPU, offers high performance and good quality, and effectively improves transmission efficiency.
Specifically, in step S304, the WebRTC technology is used to establish point-to-point communication between the remote end and the client, and the encoded image data is then sent to the client in real time. WebRTC (Web Real-Time Communication) is a real-time communication technology that allows web applications or sites to establish peer-to-peer connections without an intermediary, enabling the transmission of video streams, audio streams or any other data, and supporting real-time voice or video conversation in the browser. In the invention, the point-to-point connection is established between two C++ applications rather than between browsers. The WebRTC-based point-to-point connection is set up as follows:
firstly, establishing connection between a client and a signaling server and connection between a remote end and the signaling server by using socket.IO, wherein the IP address and the corresponding communication port of the signaling server are required to be known in the process;
then, a point-to-point connection between the remote end and the client is established through a signaling server, and the specific flow is as follows:
the client creates a PeerConnection.
The client creates a Data Channel and creates an Offer. In the callback function for creating the Offer, the Description information of the Offer is obtained and set as the local description through the SetLocalDescription() interface, and the Offer is then sent to the signaling server. Note that the maximum and average bitrate settings must be added to the Description information using a regular expression, otherwise the bitrate will be too low during transmission and the video quality will suffer. The signaling server then forwards the Offer to the remote end.
After receiving the Offer, the remote end sends the Answer information to the signaling server, and the signaling server forwards the Answer information to the client.
The client collects the local ICE Candidate and sends it to the signaling server, which forwards it to the remote end.
The remote end receives the ICE Candidate of the client end, and also collects the local ICE Candidate and forwards the ICE Candidate to the client end through the signaling server.
The client receives the far-end ICE Candidate and adds it to the PeerConnection.
And establishing the connection.
And finally, sending the encoded image data to the client through the established point-to-point connection.
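The patent implements this flow with native WebRTC in C++ and a Socket.IO signaling server. Purely to illustrate the same offer/answer pattern on the client side, here is a sketch using the Python aiortc and python-socketio libraries; the signaling URL, the event names ("offer", "answer") and their payload format are assumptions, not part of the patent.

```python
import asyncio
import socketio
from aiortc import RTCPeerConnection, RTCSessionDescription

SIGNALING_URL = "http://signaling.example.com:3000"  # placeholder address

async def run_client():
    sio = socketio.AsyncClient()
    pc = RTCPeerConnection()
    channel = pc.createDataChannel("control")        # data channel for robot commands

    @sio.on("answer")
    async def on_answer(msg):
        # The remote end (camera robot) answered our offer via the signaling server.
        await pc.setRemoteDescription(
            RTCSessionDescription(sdp=msg["sdp"], type="answer"))

    await sio.connect(SIGNALING_URL)

    # Create the offer, set it as the local description, and send it to the
    # signaling server, which forwards it to the camera robot.
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    # The patent additionally rewrites the SDP text with a regular expression
    # to insert maximum / average bitrate settings before sending it.
    await sio.emit("offer", {"sdp": pc.localDescription.sdp,
                             "type": pc.localDescription.type})

    # aiortc gathers ICE candidates during setLocalDescription and embeds them
    # in the SDP, so the explicit candidate exchange described in the patent
    # is not shown separately here.
    await asyncio.sleep(3600)

# asyncio.run(run_client())
```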
In the embodiment of the present invention, the step S400 specifically includes:
s401, the client receives the transmitted video image data through a WebRTC point-to-point technology;
s402, decoding the received image data;
s403, converting the decoded image data into Texture, and using the Texture as a Texture map of a geometric patch in a scene;
s404, projecting the video image to VR head-mounted display equipment through a camera for displaying.
Specifically, in step S401, to receive the compressed video image data sent by the remote end over the established WebRTC point-to-point connection, a video rendering class VideoRenderer is created first; it is derived from WebRTC's VideoFrame handling, and a Video Sink is registered through the Video Track to connect the Video Engine. During operation, the video image data transmitted by the remote end is then received in the class's OnFrame() overridden interface.
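The patent's VideoRenderer/OnFrame() mechanism belongs to the native C++ WebRTC API. As a rough, non-authoritative analogue, the aiortc library exposes incoming frames through a "track" event and an awaitable recv() call, as sketched below; the hand-off to the texture path of step S403 is only indicated by a comment.

```python
import asyncio
from aiortc import RTCPeerConnection

pc = RTCPeerConnection()

@pc.on("track")
def on_track(track):
    # Plays the role of the patent's VideoRenderer/OnFrame(): consume decoded
    # frames as they arrive on the incoming WebRTC video track.
    if track.kind != "video":
        return

    async def consume():
        while True:
            frame = await track.recv()               # next decoded video frame
            img = frame.to_ndarray(format="bgr24")   # numpy image for texture upload
            # ...hand 'img' to the texture-update path described in step S403...

    asyncio.ensure_future(consume())
```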
Specifically, in step S402, the received video image data is compression-encoded at the far end, and therefore, corresponding decoding is also required. The hard decoding algorithm of NVIDIA is still adopted for decoding, and a decoded video image is obtained. The NVIDIA hard decoding algorithm is based on CUDA, and utilizes the parallel processing capability of a GPU to accelerate decoding. The algorithm is high in execution efficiency, and the quality of the decoded image meets the requirements of the invention.
Specifically, in step S403, the client software must ensure that the decoded video image data can be played back in real time. The implementation adopted by the invention is as follows: a geometric patch model is created in the scene and a default Texture is set as its texture map. After each frame of video image data is decoded, it is converted into the Texture's image data, which then replaces the image data of the default Texture, producing continuous playback. Compared with replacing the Texture object itself, only the memory occupied by the Texture's original image data needs to be overwritten and no replaced Texture memory has to be released repeatedly, which improves execution efficiency and reduces memory usage.
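In OpenGL terms, this "overwrite the existing texture instead of recreating it" strategy corresponds to allocating the texture once and updating it with a sub-image upload per frame. The sketch below uses PyOpenGL as an assumption-laden illustration (the patent does not name the API calls); frame size and RGB format are placeholders.

```python
from OpenGL.GL import (glGenTextures, glBindTexture, glTexImage2D, glTexSubImage2D,
                       glTexParameteri, GL_TEXTURE_2D, GL_RGB, GL_UNSIGNED_BYTE,
                       GL_TEXTURE_MIN_FILTER, GL_TEXTURE_MAG_FILTER, GL_LINEAR)

def create_texture(width, height):
    """Allocate the texture once; later frames only overwrite its contents."""
    tex = glGenTextures(1)
    glBindTexture(GL_TEXTURE_2D, tex)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, None)
    return tex

def update_texture(tex, frame_rgb, width, height):
    """Overwrite the existing texture memory with the newly decoded frame,
    avoiding repeated texture allocation and release (cf. step S403)."""
    glBindTexture(GL_TEXTURE_2D, tex)
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGB, GL_UNSIGNED_BYTE, frame_rgb)
```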
Specifically, in step S404, two head-mounted-display cameras are set up in the client software and aimed at the geometric patch model of step S403. The VR head-mounted display device is associated with these cameras in the software through SteamVR, so the images captured by the cameras can be projected directly onto the VR head-mounted display device, allowing the video to be watched in real time in the headset.
In the embodiment of the present invention, the step S500 specifically includes:
S501, the client tracks and collects the user's head movements through the VR head-mounted display device and generates rotation instructions;
S502, the client collects the forward and backward movement commands for the robot that the user issues through a handheld controller;
S503, the instruction data are converted into character strings and sent to the remote robot in real time over the WebRTC connection;
S504, after the remote robot receives the instruction data, it converts the character strings back into numerical values and, according to the identifiers of the different instructions, converts them into actual commands for the gimbal or the trolley;
S505, the commands are issued, controlling the gimbal to rotate or the trolley to move.
In this way, the remote robot is controlled in a first-person control mode, which enhances the user's immersion and makes the experience closer to reality.
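The string encoding and decoding of steps S503 and S504 can be sketched as below. The "ROT"/"MOV" identifiers and the comma-separated field layout are invented for illustration; the patent only states that commands are converted to strings carrying identifiers that distinguish gimbal rotation from trolley motion.

```python
def encode_rotation(yaw_deg: float, pitch_deg: float) -> str:
    """Client side: pack a gimbal rotation command into a string."""
    return f"ROT,{yaw_deg:.2f},{pitch_deg:.2f}"

def encode_motion(forward: float) -> str:
    """Client side: forward > 0 drives the trolley forward, < 0 backward."""
    return f"MOV,{forward:.2f}"

def decode_command(message: str):
    """Robot side: turn the received string back into numeric values and
    dispatch to the gimbal or the trolley based on the identifier."""
    parts = message.split(",")
    if parts[0] == "ROT":
        return ("gimbal", float(parts[1]), float(parts[2]))
    if parts[0] == "MOV":
        return ("trolley", float(parts[1]))
    raise ValueError(f"unknown command: {message}")

# e.g. channel.send(encode_rotation(15.0, -5.0)) on the client, and
# decode_command(msg) in the robot's data-channel message handler.
```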
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (6)

1. A remote live-action augmented reality method based on VR head-mounted display equipment is characterized in that: the method comprises the following steps:
step S100: acquiring a remote live-action video in real time by adopting a camera device at a far end to obtain a remote live-action video of a far-end scene; performing depth calculation on the obtained remote live-action video to obtain depth information of the remote live-action video;
step S200: constructing a scene structure by using the depth information of the remote live-action video obtained in the step S100, and fusing the virtual object and the remote live-action video to obtain virtual-real fused image data;
step S300: intercepting the image data after the virtual-real fusion, and sending the image data after the virtual-real fusion to a client in real time through a network;
step S400: the client receives the virtual-real fused image data transmitted by the network, and displays and plays the image data by using VR head-mounted display equipment;
step S500: acquiring action data of a user through VR head-mounted display equipment, and transmitting the action data through a network to control the remote acquisition equipment to follow up in real time;
in step S100, the far end uses an external camera robot as the camera device, and the external camera robot collects real-time video images from multiple angles and with a wide field of view; depth calculation is then performed on the video images in software to obtain the video data and depth information of the captured real scene; the external camera robot comprises a binocular camera (a pair of cameras), a two-degree-of-freedom pan-tilt gimbal, a computing and communication module and a trolley; the binocular camera is fixed on the gimbal, and the gimbal and the computing and communication module are fixed on the trolley; the computing and communication module comprises a computing module responsible for depth calculation, a rendering module for scene construction and virtual-real fusion, and a wireless communication module for wireless data transmission; data is transmitted between the external camera robot and the client through a cloud server; the client consists of data receiving and processing equipment and a VR head-mounted display device, and the cloud server comprises a Web server and a signaling server;
the specific process of step S100 includes:
step S101: while the trolley of the external camera robot is moving and the gimbal is rotating, the binocular camera collects the surrounding live-action video in real time and sends the collected left and right video images to the computing module;
step S102: the computing module rectifies the two video streams; after rectification, the corresponding pixels of the same reference point in the two images lie on the same row;
step S103: acquiring disparity maps of a left video image and a right video image by adopting an SGBM algorithm, and selecting one of the disparity maps for calculation;
step S104: filling disparity map holes: detecting the disparity map selected in step S103, finding hole areas, and filling each hole with the mean of the nearby reliable disparity values;
step S105: converting the disparity map into a depth map using the formula Depth = (Fx × Baseline) / Disp, where Depth is the depth value, Fx is the normalized focal length, Baseline is the distance between the optical centers of the two cameras (the baseline distance), and Disp is the disparity value;
step S106: traversing the pixels of the disparity map and performing the depth conversion to obtain the depth map;
the specific process of step S200 includes:
s201, constructing a 3D scene structure according to the depth information obtained in the step S100; combining a 3D scene structure with the acquired 2D video image to render a 3D scene;
s202, importing the geometric virtual object generated by the computer into a 3D scene;
s203, according to the depth information calculated by the calculation module, carrying out occlusion and shadow calculation on the imported geometric virtual object to realize virtual-real fusion;
in step S201, camera calibration is performed using the collected binocular video images Image1 and Image2 and the calculated depth information to obtain the pose information of the binocular camera in the real scene, and this pose is assigned to the virtual binocular camera in the virtual scene; two geometric patches are added to the virtual scene as the display surfaces of the virtual cameras, and the binocular 2D video images Image1 and Image2 are displayed on these two patches respectively;
in step S202, a virtual object is imported into the virtual 3D scene, and the offset between the position information P1 at which the virtual object is to be placed in the real scene and the position information P2 of the binocular camera in the real scene is calculated; the position information P4 of the virtual object in the virtual space is obtained from the position information P3 of the virtual camera by the formula P4 = P3 + P1 - P2.
2. The VR head-mounted display device based augmented reality method of claim 1, wherein: the specific process of step S300 includes:
s301, rendering the scene after the virtual-real fusion into a Back Buffer of OpenGL by a rendering module according to a specified frame rate;
s302, after the Back Buffer is updated, the rendering module acquires rendered image data from the Back Buffer;
s303, carrying out compression coding on the image data rendered in the step S302;
s304, the wireless communication module sends the coded image data to the client in real time through a high-speed network.
3. The VR head-mounted display device based tele-reality augmented reality method of claim 2, wherein: the specific process of step S400 includes:
s401, the client receives the transmitted video image data through a WebRTC point-to-point technology;
s402, the client decodes the image data after receiving the image data;
s403, converting the decoded image data into Texture, and using the Texture as a Texture map of a geometric patch in a scene;
and S404, projecting the video image to VR head-mounted display equipment for display.
4. The VR head-mounted display device based tele-reality augmented reality method of claim 1, wherein: the specific process of step S500 includes:
s501, the client tracks and collects head movements of a user through VR head-mounted display equipment to generate a rotation instruction;
S502, the client collects the forward and backward movement commands for the external camera robot that the user issues through a handheld controller;
S503, the instruction data are converted into character strings and sent to the external camera robot in real time over the WebRTC connection;
S504, after the external camera robot receives the instruction data, it converts the character strings back into numerical values and, according to the identifiers of the different instructions, converts them into actual commands for the gimbal or the trolley;
and S505, the external camera robot uses these actual commands to control the gimbal to rotate and the trolley to move backward or forward.
5. The VR head-mounted display device based augmented reality method of claim 1, wherein: in step S203, the occlusion is implemented as follows:
in step S100, depth calculation of the scene is performed from the video images collected by the real binocular camera, yielding a depth map D1; a depth map D2 is generated from the Z-buffer (ZBuff) data of the binocular camera of the virtual scene; the pixels of depth map D1 are traversed and compared with the corresponding pixels of depth map D2, and if the depth value of a pixel in D1 is smaller than that of the corresponding pixel in D2, the corresponding pixels of the video images Image1 and Image2 on the two display patches are offset in position so that they lie in front of the corresponding pixels of D2;
in step S203, the shadow is implemented as follows: the spatial position P5 corresponding to each pixel of the depth map D1 of the real binocular camera is recovered; the spatial coordinate system matrix M of the light is obtained from the position and rotation information of the light, and the spatial position P6 of the pixel in the light coordinate system is calculated by the formula P6 = M × P5; whether the pixel is occluded by the virtual object in the light coordinate system is then determined, and if it is occluded, a shadow is superimposed when the pixel is rendered.
6. The VR head-mounted display device based tele-reality augmented reality method of claim 2, wherein: in step S304, the WebRTC-based peer-to-peer connection implementation process is as follows:
establishing connection between the client and the signaling server and between the external camera robot and the signaling server by using socket.IO, and acquiring an IP address and a corresponding communication port of the signaling server;
the point-to-point connection between the external camera robot and the client is established through the signaling server, and the specific process is as follows:
the client creates a PeerConnection; a client creates a Data Channel and creates an Offer;
in the callback function of creating Offer, obtaining the Description information of Offer, setting the Description information as local Description information through an interface SetLocalDescription (), and then sending Offer to a signaling server; utilizing a regular expression to add the setting of the maximum code rate and the average code rate in the Description information; the signaling server forwards the Offer to the external camera robot;
after receiving the Offer, the external camera robot sends Answer information to the signaling server, and the signaling server forwards the Answer information to the client; the client collects local ICE Candidates and sends them to the signaling server, and the signaling server forwards them to the external camera robot;
the external camera robot receives the client's ICE Candidates, likewise collects local ICE Candidates and forwards them to the client through the signaling server; the client receives the far-end ICE Candidates and adds them to the PeerConnection; the connection is established;
and sending the encoded image data to the client through the established point-to-point connection.
CN202110392566.3A 2021-04-13 2021-04-13 Remote live-action augmented reality method based on VR head-mounted display equipment Active CN113099204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110392566.3A CN113099204B (en) 2021-04-13 2021-04-13 Remote live-action augmented reality method based on VR head-mounted display equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110392566.3A CN113099204B (en) 2021-04-13 2021-04-13 Remote live-action augmented reality method based on VR head-mounted display equipment

Publications (2)

Publication Number Publication Date
CN113099204A CN113099204A (en) 2021-07-09
CN113099204B true CN113099204B (en) 2022-12-13

Family

ID=76676454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110392566.3A Active CN113099204B (en) 2021-04-13 2021-04-13 Remote live-action augmented reality method based on VR head-mounted display equipment

Country Status (1)

Country Link
CN (1) CN113099204B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113625997B (en) * 2021-07-20 2024-03-12 青岛小鸟看看科技有限公司 XR software development architecture, application method and electronic equipment
CN113766267B (en) * 2021-09-15 2023-10-24 深圳市道通智能航空技术股份有限公司 Multi-path video live broadcast method, system, equipment and storage medium based on unmanned aerial vehicle
CN113949929A (en) * 2021-10-15 2022-01-18 保升(中国)科技实业有限公司 Video communication lifelike technology
CN114401414B (en) * 2021-12-27 2024-01-23 北京达佳互联信息技术有限公司 Information display method and system for immersive live broadcast and information pushing method
CN114047824A (en) * 2022-01-13 2022-02-15 北京悉见科技有限公司 Method for interaction of multiple terminal users in virtual space
TWI821876B (en) * 2022-01-21 2023-11-11 在地實驗文化事業有限公司 Mobile smart augmented reality live broadcast device
CN116938901A (en) * 2022-04-01 2023-10-24 中国移动通信有限公司研究院 Data transmission method, device, electronic equipment and readable storage medium
CN114844934A (en) * 2022-04-28 2022-08-02 北京北建大科技有限公司 Multi-person large-space VR interactive scene building method based on cloud rendering
CN117278731A (en) * 2023-11-21 2023-12-22 启迪数字科技(深圳)有限公司 Multi-video and three-dimensional scene fusion method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945564A (en) * 2012-10-16 2013-02-27 上海大学 True 3D modeling system and method based on video perspective type augmented reality
WO2018095278A1 (en) * 2016-11-24 2018-05-31 腾讯科技(深圳)有限公司 Aircraft information acquisition method, apparatus and device
CN110378990A (en) * 2019-07-03 2019-10-25 北京悉见科技有限公司 Augmented reality scene shows method, apparatus and storage medium
WO2020221311A1 (en) * 2019-04-30 2020-11-05 齐鲁工业大学 Wearable device-based mobile robot control system and control method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102510506B (en) * 2011-09-30 2014-04-16 北京航空航天大学 Virtual and real occlusion handling method based on binocular image and range information
CN102568026B (en) * 2011-12-12 2014-01-29 浙江大学 Three-dimensional enhancing realizing method for multi-viewpoint free stereo display
CN103868460B (en) * 2014-03-13 2016-10-05 桂林电子科技大学 Binocular stereo vision method for automatic measurement based on parallax optimized algorithm
CN105373224B (en) * 2015-10-22 2016-06-22 山东大学 A kind of mixed reality games system based on general fit calculation and method
DE102016200225B4 (en) * 2016-01-12 2017-10-19 Siemens Healthcare Gmbh Perspective showing a virtual scene component
CN106843456B (en) * 2016-08-16 2018-06-29 深圳超多维光电子有限公司 A kind of display methods, device and virtual reality device based on posture tracking
CN107835436B (en) * 2017-09-25 2019-07-26 北京航空航天大学 A kind of real-time virtual reality fusion live broadcast system and method based on WebGL
CN109062407A (en) * 2018-07-27 2018-12-21 江西省杜达菲科技有限责任公司 Remote mobile terminal three-dimensional display & control system and method based on VR technology
CN112419472B (en) * 2019-08-23 2022-09-30 南京理工大学 Augmented reality real-time shadow generation method based on virtual shadow map
CN112114667A (en) * 2020-08-26 2020-12-22 济南浪潮高新科技投资发展有限公司 AR display method and system based on binocular camera and VR equipment
CN112486606B (en) * 2020-11-19 2022-08-12 湖南麒麟信安科技股份有限公司 Cloud desktop display optimization method and system based on Android system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945564A (en) * 2012-10-16 2013-02-27 上海大学 True 3D modeling system and method based on video perspective type augmented reality
WO2018095278A1 (en) * 2016-11-24 2018-05-31 腾讯科技(深圳)有限公司 Aircraft information acquisition method, apparatus and device
WO2020221311A1 (en) * 2019-04-30 2020-11-05 齐鲁工业大学 Wearable device-based mobile robot control system and control method
CN110378990A (en) * 2019-07-03 2019-10-25 北京悉见科技有限公司 Augmented reality scene shows method, apparatus and storage medium

Also Published As

Publication number Publication date
CN113099204A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN113099204B (en) Remote live-action augmented reality method based on VR head-mounted display equipment
CN106101741B (en) Method and system for watching panoramic video on network video live broadcast platform
CN106157359B (en) Design method of virtual scene experience system
CN112533002A (en) Dynamic image fusion method and system for VR panoramic live broadcast
KR20150067194A (en) Controlled three-dimensional communication endpoint
CN114401414B (en) Information display method and system for immersive live broadcast and information pushing method
US20190222823A1 (en) Techniques for Capturing and Rendering Videos with Simulated Reality Systems and for Connecting Services with Service Providers
CN107277494A (en) three-dimensional display system and method
JP7217226B2 (en) Methods, devices and streams for encoding motion-compensated images in global rotation
You et al. Internet of Things (IoT) for seamless virtual reality space: Challenges and perspectives
US11967014B2 (en) 3D conversations in an artificial reality environment
US20150179218A1 (en) Novel transcoder and 3d video editor
CN107562185B (en) Light field display system based on head-mounted VR equipment and implementation method
CN113379831B (en) Augmented reality method based on binocular camera and humanoid robot
US11348252B1 (en) Method and apparatus for supporting augmented and/or virtual reality playback using tracked objects
US11727645B2 (en) Device and method for sharing an immersion in a virtual environment
JP6091850B2 (en) Telecommunications apparatus and telecommunications method
JP7011728B2 (en) Image data output device, content creation device, content playback device, image data output method, content creation method, and content playback method
Valli et al. Advances in spatially faithful (3d) telepresence
WO2022191070A1 (en) 3d object streaming method, device, and program
CN117456113B (en) Cloud offline rendering interactive application implementation method and system
US11688124B2 (en) Methods and apparatus rendering images using point clouds representing one or more objects
WO2022022548A1 (en) Free viewpoint video reconstruction and playing processing method, device, and storage medium
TWI817273B (en) Real-time multiview video conversion method and system
US20230115563A1 (en) Method for a telepresence system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240318

Address after: Room 102, 1st Floor, Building 1, Yongye Building, No. 166 Haier Road, Laoshan District, Qingdao City, Shandong Province, 266000

Patentee after: Qingdao Inception Space Technology Co.,Ltd.

Country or region after: China

Address before: 266000 No.393, Songling Road, Laoshan District, Qingdao, Shandong Province

Patentee before: QINGDAO RESEARCH INSTITUTE OF BEIHANG University

Country or region before: China