CN116866541A - Virtual-real combined real-time video interaction system and method


Info

Publication number
CN116866541A
Authority
CN
China
Prior art keywords
virtual
video
real
interactive
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310670578.7A
Other languages
Chinese (zh)
Inventor
李锐
文锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310670578.7A priority Critical patent/CN116866541A/en
Publication of CN116866541A publication Critical patent/CN116866541A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/275Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/366Image reproducers using viewer tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a virtual-real combined real-time video interaction system and method, relating to the technical field of video. Real-time pose detection is performed on an interactive object to obtain object pose information mapping the interactive object into a virtual scene; this information is combined with the scene data of the virtual scene to generate the video picture data that can be seen at the position and pose of the interactive object. A video data distributor then performs picture processing based on stereoscopic vision technology and displays the processed pictures on a display screen; accordingly, an interactive object wearing a stereoscopic vision receiver receives the corresponding stereoscopic perception picture through that receiver. The interactive object can therefore change the video picture content by changing its own pose, and can perceive the virtual scene through the video picture while still perceiving the surrounding real environment, which avoids the discomfort caused by the virtual-real contradiction of VR technology and improves the safety of video content interaction.

Description

Virtual-real combined real-time video interaction system and method
Technical Field
The application relates to the technical field of videos, and provides a virtual-real combined real-time video interaction system and method.
Background
Video is a form of multimedia that carries rich information. With the development of network technology, the ways of presenting video content have become increasingly varied, ranging from traditional video player technology to the Virtual Reality (VR) technology that is popular today. Three-dimensional (3D) stereoscopic video is one such video technology; compared with 2D video, 3D video presents content more intuitively and realistically and gives the viewer a stronger sense of being present in the scene.
However, conventional 3D stereoscopic video must be recorded in advance and can only present fixed content from the camera angles chosen by the director, so no interaction is possible. VR video, by contrast, can dynamically present pictures from different viewing angles and supports interaction with the video content, but it requires viewers to wear professional VR headset equipment that completely encloses the eyes. The visual perception it creates contradicts the viewer's other real-world perceptions, which easily causes 3D dizziness and strong discomfort, and the surrounding environment cannot be observed while the headset is worn, so certain safety hazards exist.
Therefore, how to improve the realism and safety of video content interaction and reduce discomfort during the interaction is an urgent problem to be solved.
Disclosure of Invention
The embodiments of the application provide a virtual-real combined real-time video interaction system and method, which are used to improve the realism and safety of video content interaction and to reduce discomfort during the interaction.
In one aspect, a real-time video interaction system combining virtual and real is provided, the system comprising:
the detection subsystem is used for carrying out real-time pose detection on the interactive object in a detectable area in the real space to obtain object pose information of the interactive object mapped to the virtual scene, and a space mapping relation exists between the detectable area and the virtual scene;
the rendering subsystem is used for rendering video pictures according to the object pose information based on the scene data of the virtual scene to obtain corresponding video picture data;
the video data distributor is used for carrying out picture processing on the video picture data based on a stereoscopic vision technology and distributing the processed video picture data to the display screen for display;
and the stereoscopic vision receiver, worn by the interactive object, which is used for performing stereoscopic vision receiving control on the two eyes of the interactive object in a visual receiving control mode corresponding to the stereoscopic vision technology, so that the two eyes of the interactive object receive different video pictures displayed on the display screen and form stereoscopic vision perception.
In a possible implementation manner, the system further comprises at least one interactive function device, wherein each interactive function device is in one-to-one correspondence with a virtual prop in the virtual scene and is used for providing an interactive function effect with the virtual scene;
the detection subsystem is also used for detecting the real-time pose of the interactive function equipment controlled by the interactive object to obtain the pose information of the virtual prop in the virtual scene, which corresponds to the interactive function equipment;
the rendering subsystem is further used for determining at least one virtual object interacted with the virtual prop from the virtual scene based on the prop pose information, and performing video picture rendering on the at least one virtual object based on the corresponding interaction function effect to obtain video picture data containing the interaction function effect.
In one possible implementation, the detection subsystem includes at least one motion capture camera, a wearable tracking device worn on the interactive object, and a motion capture analysis apparatus, the at least one motion capture camera positioned around the detectable region;
the at least one motion capture camera is used for shooting motion capture images comprising the wearable tracking device;
the motion capture analysis device is used for detecting the pose of the wearable tracking device in real time according to at least one motion capture image, and determining the pose information of the object based on the obtained pose information of the tracking device.
In one possible embodiment, the wearable tracking device includes a flexible wearing part, a rigid mounting part, and a plurality of trackers fixedly mounted on the rigid mounting part and worn on the interactive object through the flexible wearing part;
the plurality of trackers are arranged directionally, and the arrangement direction points along the facing direction of the interactive object;
the rendering subsystem is specifically configured to obtain corresponding video frame data according to the position and the facing direction indicated by the object pose information based on the scene data in the virtual scene.
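As a rough illustration of how a directional tracker arrangement yields both a position and a facing direction, the sketch below assumes two rigidly mounted markers whose "front" and "rear" roles are known; this particular layout and the function name are assumptions for illustration, not something specified by the patent.

```python
# A minimal sketch, under an assumed two-marker layout, of recovering the
# position and facing direction of the interactive object from the trackers.
import numpy as np

def head_pose_from_trackers(front_marker, rear_marker):
    """Return (position, facing_direction) from two rigidly mounted markers."""
    front = np.asarray(front_marker, dtype=float)
    rear = np.asarray(rear_marker, dtype=float)
    position = (front + rear) / 2.0      # mid-point taken as the head position
    facing = front - rear                # arrangement direction points toward facing
    facing /= np.linalg.norm(facing)     # unit facing-direction vector
    return position, facing

pos, facing = head_pose_from_trackers([0.1, 1.7, 0.05], [-0.1, 1.7, -0.05])
```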
In one possible implementation, the motion capture camera is an infrared camera, and the tracker is a reflective marker point coated with fluorescent material on the surface.
In one possible embodiment, the detection subsystem comprises a depth camera and a depth analysis device;
the depth camera is used for shooting a depth image of the interactive object, and each pixel point in the depth image represents the distance between the corresponding object and the depth camera;
the depth analysis device is used for extracting skeleton information of the interactive object based on the depth image and determining the object pose information according to the obtained skeleton information.
In a possible implementation manner, the system further comprises an operable terminal device, configured to provide an operable page for inputting object parameters of the interactive object, where the object parameters include a pupil distance parameter and a wearing offset parameter, and the wearing offset parameter is an offset value between a wearable tracking device and the interactive object;
the rendering subsystem is specifically configured to perform compensation processing on the tracking device pose information based on the wearing offset parameter to obtain the object pose information, to obtain a binocular pose information set of the interactive object based on the pupil distance parameter and the object pose information, and to perform video picture rendering for the binocular pose information set based on the scene data of the virtual scene, obtaining video picture data containing a video picture for each eye.
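A minimal sketch of the compensation and binocular-splitting step described above, assuming the wearing offset is a simple translational offset and the pupil distance is split symmetrically about the compensated head position; all function and parameter names are illustrative assumptions.

```python
# Sketch: offset compensation of the tracker pose, then split into left/right eye poses.
import numpy as np

def binocular_poses(tracker_pos, facing, up, wearing_offset, ipd):
    """Compensate the tracker pose and derive the binocular pose information set."""
    head = np.asarray(tracker_pos, float) + np.asarray(wearing_offset, float)
    right = np.cross(np.asarray(facing, float), np.asarray(up, float))
    right /= np.linalg.norm(right)       # unit vector pointing to the object's right
    left_eye = head - right * (ipd / 2.0)   # eyes lie ipd apart, centred on the head
    right_eye = head + right * (ipd / 2.0)
    return left_eye, right_eye

left, right = binocular_poses([0, 1.7, 0], [0, 0, -1], [0, 1, 0],
                              wearing_offset=[0, -0.10, 0.02], ipd=0.063)
```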
In a possible implementation manner, the virtual scene includes a virtual screen corresponding to the display screen;
the rendering subsystem is specifically configured to configure a corresponding virtual image acquisition device in the virtual scene based on the binocular pose information set, and to perform video picture rendering based on the picture data, acquired by the virtual image acquisition device, of the virtual scene projected onto the virtual screen, so as to obtain the corresponding video picture data.
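The per-eye virtual image acquisition device described above can be understood as an off-axis ("window") projection whose view frustum passes exactly through the virtual screen. Below is a minimal sketch of that standard construction, assuming the virtual screen is a plane given by three of its corner points; the function name and parameters are illustrative and not taken from the patent.

```python
# Sketch: asymmetric (off-axis) frustum for one eye looking through a planar virtual screen.
import numpy as np

def off_axis_frustum(eye, ll, lr, ul, near):
    """ll/lr/ul are the screen's lower-left, lower-right and upper-left corners."""
    eye, ll, lr, ul = (np.asarray(p, dtype=float) for p in (eye, ll, lr, ul))
    vr = lr - ll; vr /= np.linalg.norm(vr)            # screen right axis
    vu = ul - ll; vu /= np.linalg.norm(vu)            # screen up axis
    vn = np.cross(vr, vu); vn /= np.linalg.norm(vn)   # screen normal (towards the eye)

    va, vb, vc = ll - eye, lr - eye, ul - eye         # eye-to-corner vectors
    d = -np.dot(va, vn)                               # eye-to-screen distance
    left   = np.dot(vr, va) * near / d                # frustum extents on the near plane
    right  = np.dot(vr, vb) * near / d
    bottom = np.dot(vu, va) * near / d
    top    = np.dot(vu, vc) * near / d
    return left, right, bottom, top                   # e.g. arguments of a glFrustum-style call

# Example: left eye 2 m in front of a 4 m x 2.25 m screen standing on the floor.
print(off_axis_frustum(eye=[-0.03, 1.6, 2.0],
                       ll=[-2.0, 0.0, 0.0], lr=[2.0, 0.0, 0.0], ul=[-2.0, 2.25, 0.0],
                       near=0.1))
```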
In one possible implementation, the display screen is composed of a plurality of sub-screens, and the video data distributor includes a plurality of sub-distributors, each corresponding to at least one sub-screen;
each sub-distributor is used for carrying out picture processing based on the stereoscopic vision technology on the video picture data to be displayed of the corresponding sub-screen, and distributing the processed video picture data to the corresponding sub-screen for display.
In one possible implementation, the rendering subsystem includes a plurality of graphics renderers, and each graphics renderer corresponds to one sub-screen;
each graphics renderer is used for performing video picture rendering based on the scene data of the virtual scene and the relative positional relationship between the object pose information and the corresponding sub-screen, to obtain the video picture data to be displayed on the corresponding sub-screen.
In a possible implementation, the rendering subsystem further comprises a synchronization device;
the synchronization device is used for respectively sending synchronization signals to the plurality of graphic renderers so that the plurality of graphic renderers synchronously perform video picture rendering.
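As an illustration of this synchronization, the following minimal sketch uses a software barrier in place of the synchronization signal described above; the thread structure, renderer count and function names are assumptions for illustration only.

```python
# Sketch: one renderer per sub-screen, all starting each frame together.
import threading

NUM_RENDERERS = 3                          # assumed: one graphics renderer per sub-screen
frame_barrier = threading.Barrier(NUM_RENDERERS)

def render_frame(sub_screen_id, frame):
    pass                                   # placeholder for the per-sub-screen rendering work

def render_loop(sub_screen_id, frames=100):
    for frame in range(frames):
        frame_barrier.wait()               # acts like the synchronization signal for this frame
        render_frame(sub_screen_id, frame)

threads = [threading.Thread(target=render_loop, args=(i,)) for i in range(NUM_RENDERERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```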
In one possible implementation, the video data distributor is specifically configured to perform binocular-view cropping processing on the video picture data to obtain left-eye picture data and right-eye picture data respectively, and to distribute the left-eye picture data and the right-eye picture data to the display screen in sequence for display;
the stereoscopic vision receiver is specifically configured to block the right eye from receiving a picture when the display screen displays a left-eye picture, and to block the left eye from receiving a picture when the display screen displays a right-eye picture.
In one possible embodiment, the stereoscopic receiver is shutter glasses.
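A minimal sketch of this time-division (frame-sequential) scheme, in which left-eye and right-eye pictures alternate on the display while the shutter glasses block the opposite eye; the display and shutter interfaces are placeholders, and the refresh rate is an assumed example value.

```python
# Sketch: alternate left/right pictures and drive the shutter glasses in step.
import time

REFRESH_HZ = 120                     # assumed 120 Hz panel -> 60 pictures per second per eye

def set_shutter(open_eye):           # placeholder for the shutter-glasses control channel
    pass

def show(frame):                     # placeholder for pushing one picture to the display screen
    pass

def present(left_frames, right_frames):
    for left, right in zip(left_frames, right_frames):
        for eye, frame in (("left", left), ("right", right)):
            set_shutter(open_eye=eye)      # the other eye is blocked during this refresh
            show(frame)
            time.sleep(1.0 / REFRESH_HZ)   # hold for one display refresh
```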
In one aspect, a real-time video interaction method combining virtual and real is provided, which includes:
performing real-time pose detection on an interactive object positioned in a detectable region in real space to obtain object pose information of the interactive object mapped to a virtual scene, wherein a spatial mapping relationship exists between the detectable region and the virtual scene;
Based on the scene data of the virtual scene, performing video picture rendering aiming at the object pose information to obtain corresponding video picture data;
performing picture processing based on stereoscopic vision technology on the video picture data, and outputting the processed video picture data to a display screen for display;
and carrying out stereoscopic vision receiving control on the eyes of the interactive object by adopting a visual receiving control mode corresponding to the stereoscopic vision technology, so that the eyes of the interactive object receive different video pictures displayed by the display screen to form stereoscopic vision perception.
In one aspect, a computer device is provided comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods described above when the computer program is executed.
In one aspect, a computer storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of any of the methods described above.
In one aspect, a computer program product is provided that includes a computer program stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer readable storage medium, and the processor executes the computer program so that the computer device performs the steps of any of the methods described above.
In the embodiments of the application, real-time pose detection is performed on the interactive object to obtain object pose information mapping the interactive object into the virtual scene; this information is combined with the scene data of the virtual scene to generate the video picture data that can be seen at the position and pose of the interactive object; the video data distributor then performs picture processing based on stereoscopic vision technology and the result is displayed on the display screen, so that an interactive object wearing the stereoscopic vision receiver can receive the corresponding stereoscopic perception picture through that receiver. Compared with traditional stereoscopic video playing technology, the scheme of the embodiments of the application can render the corresponding video picture in real time according to the pose of the interactive object, so the interactive object can change the video picture content by changing its own pose; the video picture display is no longer limited to a fixed viewing angle, interaction with the video is realized, and the realism of video content interaction is improved. Moreover, the scheme still displays pictures through a display screen: the interactive object only needs to wear a corresponding stereoscopic vision receiver to form stereoscopic vision perception, rather than heavy equipment such as a VR headset, and while using the stereoscopic vision receiver the interactive object can perceive the virtual scene through the video picture and still perceive the surrounding real environment, which avoids the discomfort caused by the virtual-real contradiction of VR technology and improves the safety of video content interaction.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without inventive effort.
FIGS. 1a to 1b are schematic views of video presentation in the related art;
FIGS. 2a to 2b are schematic diagrams of an application scenario provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a system architecture of a virtual-real combined real-time video interactive system according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an interactive prop simulated by an interactive function device according to an embodiment of the present application;
FIGS. 5a to 5b are schematic diagrams of pictures of an interaction effect presented by the interaction function device according to the embodiment of the present application;
FIG. 6 is a schematic diagram of a system architecture of a detection subsystem implemented based on motion capture technology according to an embodiment of the present application;
FIGS. 7a to 7c are schematic diagrams related to a detection subsystem implemented based on a motion capture technology according to an embodiment of the present application;
FIG. 8 is a diagram illustrating another system architecture of a detection subsystem 301 according to an embodiment of the present application;
FIG. 9 is a schematic diagram of coordinate system establishment provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a virtual screen according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a conversion principle of a binocular pose information set, using a hat as an example of a wearable tracking device, according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a mapping relation of a virtual stereo camera according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a rendering subsystem according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a synchronization signal according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a time-division-based frame output timing according to an embodiment of the present application;
FIG. 16 is a flow chart of a real-time video interaction method combining virtual and real according to an embodiment of the present application;
FIG. 17 is a schematic diagram of an interaction flow provided in an embodiment of the present application;
FIG. 18 is a schematic diagram of a composition structure of a virtual-real combined real-time video interaction device according to an embodiment of the present application;
FIG. 19 is a schematic diagram of the composition structure of another virtual-real combined real-time video interactive apparatus to which the embodiment of the present application is applied.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. Embodiments of the application and features of the embodiments may be combined with one another arbitrarily without conflict. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.
It will be appreciated that the following detailed description involves data related to interactive objects and the collection of such data, for example motion capture images and depth images. When the embodiments of the application are applied to a specific product or technology, the relevant permissions or consents need to be obtained, and the collection, use and processing of the related data need to comply with the relevant laws, regulations and standards of the relevant country and region.
In order to facilitate understanding of the technical solution provided by the embodiments of the present application, some key terms used in the embodiments of the present application are explained here:
Light emitting diode (Light Emitting Diode, LED) display screen: LEDs are the tiny light-emitting bulbs visible on electronic devices, and an LED display screen uses a large number of LEDs to illuminate the screen. LEDs are low-power devices that provide high brightness and are widely used, for example in televisions, computer displays, and the high-resolution billboards or LED video walls found in shopping centers.
Video interaction: a mode of interacting with a video as it is played, in which the played video content changes correspondingly with the interaction process. The interaction may include a change in the pose of the interactive object: a pose change changes the viewing angle of the interactive object, so the video content switches to what can be viewed from the changed viewing angle. The interaction may also include the interactive object using an interactive prop to perform a certain action or issue a certain instruction. For example, if the interactive prop simulates a flashlight and the interactive object mimics the use of the flashlight, the area the flashlight points at is illuminated in the video picture; or if the interactive prop simulates a sword and the interactive object mimics cutting an object with the sword, the object touched by the sword in the video picture is cut accordingly.
Virtual-real combination: an effect in which the real environment and the virtual scene can be perceived simultaneously. In a virtual-real combined mode, the interactive object can interact with the virtual scene through the video picture while still genuinely perceiving every object in the real environment, so the interactive object can avoid obstacles in the real environment, and avoid being tripped by them, while interacting with the video, giving higher safety.
Pose information: the pose information involved in the embodiments of the application includes object pose information of the interactive object and prop pose information of the interactive prop. The object pose information includes the position information and posture information of the interactive object, where the posture information includes, but is not limited to, the orientation, viewing angle, gaze direction or limb motion of the interactive object; the object pose information can be mapped into the virtual scene, so that the corresponding video picture is generated according to the mapped object pose information. The prop pose information is similar: it expresses the position and posture of the prop, from which it can be judged which interactive effect the prop should produce, and the corresponding interactive effect is then added to the video picture.
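For illustration only, the pose information described above could be represented with data structures along the following lines; the field names are assumptions, not definitions from the patent.

```python
# Sketch: possible containers for object pose information and prop pose information.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class ObjectPose:
    position: Tuple[float, float, float]      # mapped position in the virtual scene
    orientation: Tuple[float, float, float]   # facing / viewing direction
    gaze: Optional[Tuple[float, float, float]] = None
    limbs: Dict[str, tuple] = field(default_factory=dict)  # optional limb/joint poses

@dataclass
class PropPose:
    prop_id: str                              # which virtual prop this device maps to
    position: Tuple[float, float, float]
    direction: Tuple[float, float, float]     # e.g. a flashlight's pointing direction
```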
Optical motion capture: an optical motion capture system mainly consists of optical marker points (markers), optical motion capture cameras, a POE (Power Over Ethernet) switch, a calibration frame and other devices. Marker points are attached to key parts of the moving object (such as the crotch, elbows and other joints of a human body); the optical motion capture cameras detect the marker points in real time from different angles and transmit the data in real time to a data processing device (such as a computer), which computes the spatial coordinates, orientation, motion trajectory and other pose information of the marker points according to an algorithm, thereby reconstructing the motion pose of the moving object.
Detectable region: a limited area within which the interactive object or interactive prop can be detected; it can generally be regarded as the range within which the interactive object can move while interacting with the video. The detectable region has a spatial mapping relationship with the virtual scene, so a position change of the interactive object in the detectable region can be mapped to a corresponding position change in the virtual scene. For example, the spatial mapping may be an equal mapping, i.e. a position change in the detectable region produces an equal position change in the virtual scene (if the interactive object moves 1 m to the left, it likewise moves 1 m to the left in the virtual scene). Alternatively, the mapping may be proportional, i.e. the position change in the detectable region is scaled when mapped into the virtual scene (moving 1 m to the left may correspond to moving 2 m, or 0.5 m, to the left in the virtual scene). The spatial mapping may also depend on the state of the interactive object itself; for example, the mapping ratio may increase with the moving speed of the interactive object.
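A minimal sketch, not taken from the patent, of the spatial mapping relation between the detectable region and the virtual scene; the function and parameter names are illustrative assumptions, and the equal, proportional and speed-dependent cases correspond to the examples above.

```python
# Sketch: map a tracked position in the detectable region to a virtual-scene position.
import numpy as np

def map_to_virtual(real_pos, real_origin, virtual_origin,
                   scale=1.0, speed=None, speed_gain=0.0):
    """scale=1.0 gives the equal mapping; scale=2.0 or 0.5 gives proportional
    mappings; a non-None speed adds a speed-dependent term to the ratio."""
    offset = np.asarray(real_pos, float) - np.asarray(real_origin, float)
    factor = scale + (speed_gain * speed if speed is not None else 0.0)
    return np.asarray(virtual_origin, float) + factor * offset

# Example: the interactive object moves 1 m to the left of the region centre.
print(map_to_virtual([-1.0, 0.0, 0.0], [0, 0, 0], [10, 0, 5], scale=2.0))
# -> the mapped position moves 2 m to the left of the virtual anchor point.
```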
Virtual scene: the scene used for presenting video pictures; the content shown in the video picture is the scene content of the virtual scene. The virtual scene may be entirely fictional, or it may be constructed from a real scene. As in a real scene, different scene content can be observed in the virtual scene when the pose of the interactive object's mapped counterpart in the virtual scene changes.
Stereoscopic vision technology: stereoscopic vision is an important topic in the field of computer vision, aimed at reconstructing the three-dimensional geometric information of a scene. Stereoscopic display exploits the parallax principle of human eyes: the left eye and the right eye observe the real world from slightly different positions, and the brain uses this difference between the two views to perceive the distance of objects. Therefore, to give video pictures a stereoscopic effect, a common method is to process the pictures with stereoscopic vision technology to obtain different left-eye and right-eye video pictures, which the brain combines into a stereoscopic impression.
Stereoscopic vision receiver: a device corresponding to the stereoscopic vision technology, used to control the left eye and the right eye so that each can receive a different video picture. A common stereoscopic vision receiver uses a time-division technique, so that the left eye and the right eye receive pictures at different moments; the two eyes thus receive different video pictures with parallax, and the brain combines the two pictures to form stereoscopic vision perception.
Interactive function device: a device worn or held by the interactive object, for example a hand-held sword or flashlight prop. It should be noted that the interactive function device does not need to actually provide the corresponding effect; for example, when the interactive effect is the lighting effect of a flashlight, the interactive function device does not need to be a physical flashlight, but can be an object whose appearance is similar to a flashlight and which can approximately simulate the pose changes of a flashlight in use. Each interactive function device has a corresponding virtual prop in the virtual scene; for example, a device simulating a flashlight corresponds to a virtual flashlight in the virtual scene, so that a lighting effect similar to that of a flashlight can be produced in the virtual scene.
Game engine: the core component of some pre-written editable computer game systems or interactive real-time graphics applications. These systems provide game designers with the tools required to write games, so that a game program can be written easily and quickly without starting from scratch. Most game engines support multiple operating platforms, such as Linux, Mac OS X and Windows. A game engine typically includes a rendering engine (i.e. the "renderer", covering two-dimensional and three-dimensional image engines), a physics engine, a collision detection system, an audio engine, a script engine, a computer animation engine, an artificial intelligence engine, a network engine and a scene management engine.
The following briefly describes the technical idea of the embodiment of the present application:
With the development of network technology, the forms in which video content is presented are increasingly rich. The content presentation of 3D video mainly takes the following forms:
(1) The traditional 3D stereoscopic video playing mode: see FIG. 1a, which shows a 3D movie being played in a movie theater; viewers need to wear special 3D glasses to form 3D stereoscopic perception from the content displayed on the screen.
(2) The VR video playing mode: as shown in FIG. 1b, the viewer needs to wear heavy VR head-mounted display equipment, which generally encloses the pictures seen by the human eyes on all sides. Although this mode allows interaction with the video content, during the interaction the visual perception of the interactive object is of the virtual environment presented in the VR video, while its other perceptions are of the actual surrounding environment, and the two contradict each other. For example, the video content may show the interactive object traversing a cave with a rough path while the actual environment is a VR experience store in a shopping mall; the environmental impressions conveyed by the two are completely different, so 3D dizziness easily occurs and the discomfort of current VR video interaction is strong. Moreover, because the eyes are fully enclosed, the interactive object cannot observe or sense the surrounding environment while wearing the VR head-mounted display and being immersed in the virtual world, so it is easy to collide with surrounding obstacles during the interaction, or even to be tripped by them, which poses certain safety hazards.
Based on the above, the embodiments of the application provide a virtual-real combined real-time video interaction system. Real-time pose detection is performed on the interactive object to obtain object pose information mapping the interactive object into the virtual scene; this information is combined with the scene data of the virtual scene to generate the video picture data that can be seen at the position and pose of the interactive object; the video data distributor then performs picture processing based on stereoscopic vision technology and the result is displayed on the display screen, so that an interactive object wearing the stereoscopic vision receiver can receive the corresponding stereoscopic perception picture through that receiver. Compared with traditional stereoscopic video playing technology, this scheme renders the corresponding video picture in real time according to the pose of the interactive object, so the interactive object can change the video picture content by changing its own pose; the video picture display is no longer limited to a fixed viewing angle, interaction with the video is realized, and the realism of video content interaction is improved. In addition, pictures are still displayed on a display screen: the interactive object only needs to wear a corresponding stereoscopic vision receiver to form stereoscopic vision perception, rather than heavy equipment such as a VR headset, and while using the stereoscopic vision receiver the interactive object can perceive the virtual scene through the video picture and still perceive the surrounding real environment, which avoids the discomfort caused by the virtual-real contradiction of VR technology and improves the safety of video content interaction.
In order to improve the experience, the display screen in the embodiments of the application is formed by combining multiple screens, and the graphics renderers correspond to the screens one to one: during video rendering, each graphics renderer renders the content to be displayed on its corresponding screen. In this way a wider range of the video scene can be displayed across the multiple screens, providing the interactive object with a better video interaction experience.
The following briefly describes application scenarios to which the technical solution of the embodiments of the application is applicable. It should be noted that the application scenarios described below are only used to illustrate the embodiments of the application and are not limiting. In specific implementation, the technical solution provided by the embodiments of the application can be flexibly applied according to actual needs.
The solution provided by the embodiments of the application can be applied to scenarios requiring video interaction, such as stereoscopic video playback or stereoscopic game scenarios. As shown in FIG. 2a or FIG. 2b, an application scenario provided for an embodiment of the present application may include an interactive object 201 and a real-time video interaction system 202, where the real-time video interaction system 202 is used to provide the interactive object 201 with an environment for real-time video interaction; the real-time video interaction system 202 may therefore also be referred to as a virtual studio or a virtual video interaction product.
In a specific implementation, when the interactive object 201 uses the video interaction function provided by the real-time video interaction system 202, the system performs real-time pose detection on the interactive object to obtain object pose information mapping the interactive object into the virtual scene, and then combines the scene data of the virtual scene to generate the video picture data that can be seen at the position and pose of the interactive object. The video data distributor performs picture processing based on stereoscopic vision technology and the result is displayed on the display screen; accordingly, the interactive object wearing the stereoscopic vision receiver can receive the corresponding stereoscopic perception picture through that receiver.
In addition, when the interactive object 201 interacts using an interactive function device included in the real-time video interaction system 202, the system performs real-time pose detection on the interactive function device as well as on the interactive object 201, so that the interactive function device participates in the rendering of the video picture according to its device pose information and the corresponding interactive effect is presented in the video picture.
In one possible implementation, referring to FIG. 2a, all of the functions of the real-time video interaction system 202 may be integrated into the same device, referred to as a video interaction device. For example, the video interaction device may be an LED video wall or a smart display device that integrates all the hardware and software required to implement the functions of the real-time video interaction system 202, apart from the devices worn by the interactive object 201 (such as the interactive function device and the stereoscopic vision receiver).
In this case, the video interaction device may include one or more processors, memory, sensor modules (e.g. pose detection sensors), interaction I/O interfaces, and the like. The memory of the video interaction device may also store the program instructions required by the virtual-real combined real-time video interaction method provided by the embodiments of the application; when executed by the processor, these program instructions can implement the virtual-real combined real-time video interaction process provided by the embodiments of the application, for example the pose detection process, the video picture rendering process and the picture processing process based on stereoscopic vision technology.
In one possible implementation, referring specifically to FIG. 2b, the real-time video interaction system 202 may comprise a front-end display device 2021 and a back-end data processing device 2022.
The front-end display device 2021 may provide the basic functions related to picture display as well as the functions for collecting relevant data about the interactive object 201 (for example, capturing motion capture images or depth images). For example, the front-end display device 2021 may be implemented as an LED TV wall or a smart display device.
In a specific application, the front-end display device 2021 may be a terminal device with a display function, for example any device capable of presenting video picture content, such as a mobile phone, a tablet computer (PAD), a notebook computer, a desktop computer, a smart television or a smart in-vehicle device; alternatively, the front-end display device 2021 may be a dedicated device for implementing the virtual-real combined real-time video interaction function provided by the embodiments of the application, which may for example be sold as an interactive product. On the front-end display device 2021, the virtual-real combined real-time video interaction function may be provided to the user in the form of an application (APP); for example, after the APP is installed on the front-end display device 2021 and combined with the hardware required by the process of the embodiments of the application (such as cameras or interactive props), the user can use the functions of the real-time video interaction system 202 provided by the embodiments of the application.
The background data processing device 2022 is configured to provide the data processing capabilities required by the virtual-real combined real-time video interaction process, such as the pose detection process, the video picture rendering process and the picture processing process based on stereoscopic vision technology. It may be, for example, a computer device with a certain computing capability, such as a host computer; alternatively, it may be a server, for example a background server corresponding to the APP mentioned above. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud services such as cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), big data and artificial intelligence platforms, which is not limited here.
In the embodiments of the present application, the front-end display device 2021 and the background data processing device 2022 may be directly or indirectly connected through one or more networks. The network may be a wired network or a wireless network, for example a mobile cellular network or a Wireless-Fidelity (WIFI) network, or another possible network, which is not limited by the embodiments of the present application.
In a possible implementation, each subsystem (for example, the detection subsystem, rendering subsystem, display subsystem, visual receiving subsystem, prop subsystem, etc.) included in the virtual-real combined real-time video interaction system provided by the embodiments of the application can be implemented with independent devices, so that a user can select suitable devices according to their own requirements when purchasing equipment.
In the following, the virtual-real combined real-time video interaction system provided by the exemplary embodiments of the present application is described with reference to the accompanying drawings and the application scenarios described above. It should be noted that the above application scenarios are only shown for the convenience of understanding the spirit and principle of the present application, and the embodiments of the present application are not limited in any way in this respect.
Referring to fig. 3, a schematic system architecture of a virtual-real combined real-time video interaction system according to an embodiment of the present application is shown, where the system includes a detection subsystem 301, a rendering subsystem 302, a video data distributor 303, a display screen 304 disposed around a detectable area a, and a stereoscopic receiver 305, where each portion is used to implement the following functions, respectively, to implement a real-time video interaction process.
In the embodiments of the present application, when an interactive object is located in the detectable area A in real space and uses the video interaction function provided by the embodiments of the present application, the detection subsystem 301 may be configured to perform real-time pose detection on the interactive object within the detectable area A to obtain its object pose information in real space; because a spatial mapping relationship exists between the detectable area A and the virtual scene, this object pose information in real space can be mapped into the virtual scene to obtain the object pose information of the interactive object in the virtual scene. The rendering subsystem 302 is configured to perform video picture rendering for the object pose information output by the detection subsystem 301 based on the scene data of the virtual scene, obtaining the video picture data corresponding to that pose information. The video data distributor 303 is configured to perform picture processing based on stereoscopic vision technology on the video picture data rendered by the rendering subsystem 302 and to distribute the processed video picture data to the display screen 304 for display. The stereoscopic vision receiver 305 is worn by the interactive object and is configured to perform stereoscopic vision receiving control on the two eyes of the interactive object, using the visual receiving control mode corresponding to the stereoscopic vision technology, so that the two eyes receive different video pictures displayed on the display screen and form stereoscopic vision perception.
Compared with the traditional stereoscopic video playing technology, the real-time video interaction system can render corresponding video pictures in real time according to the pose of the interaction object, so that the interaction object can change the content of the video pictures by changing the pose of the interaction object, the video picture display is not limited to a fixed visual angle any more, the interaction with the video is realized, and the reality of the interaction of the video content is improved. In addition, the real-time video interaction system still displays pictures through the display screen, the interaction objects can form stereoscopic vision perception only by wearing corresponding stereoscopic vision receivers, and heavy equipment similar to VR equipment is not needed to be worn, meanwhile, by using the stereoscopic vision receivers, the interaction objects can perceive virtual scenes through the video pictures and surrounding real environments, discomfort caused by paradox of virtual reality of VR technology is avoided, and safety in video content interaction is improved.
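The data flow between the subsystems of FIG. 3 can be summarised by the following minimal sketch; the callables are placeholders standing in for the detection subsystem 301, rendering subsystem 302, video data distributor 303 and display screen 304, and are not APIs defined by the patent.

```python
# Sketch: per-frame data flow of the real-time video interaction system.
def interaction_loop(detect, render, distribute, display, running):
    while running():
        object_pose = detect()                # detection subsystem 301: real-time pose detection
        frame_data = render(object_pose)      # rendering subsystem 302: render for this pose
        left, right = distribute(frame_data)  # video data distributor 303: stereoscopic processing
        display(left, right)                  # display screen 304 (+ stereoscopic receiver 305)
```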
In a possible implementation manner, the real-time video interaction system provided by the embodiment of the application further comprises an interaction function device, wherein the interaction function device is used for providing at least one interaction function effect with the virtual scene, and each interaction function effect corresponds to the virtual prop in the virtual scene one by one.
By way of example, the interactive function effects may include, but are not limited to, the following interactive function effects:
(1) Flashlight lighting effect: correspondingly, referring to FIG. 4, the corresponding virtual prop may be a virtual flashlight prop. When the interactive object uses the interactive function device to simulate the virtual flashlight prop, its operations on the interactive function device, such as rotating its direction or shaking it, are transferred equally to the virtual flashlight prop, so that the virtual flashlight prop in the virtual scene changes correspondingly, which in turn affects the interactive function effect presented in the video picture.
(2) Sword cutting effect: correspondingly, referring to FIG. 4, the corresponding virtual prop may be a virtual sword prop. When the interactive object uses the interactive function device to simulate the virtual sword prop, its operations on the interactive function device, such as a hand-held cutting motion, are transferred equally to the virtual sword prop, so that the virtual sword prop in the virtual scene performs the corresponding cutting action, which in turn affects the interactive function effect presented in the video picture. For example, when the virtual scene is a fruit-cutting scene, the presented interactive function effect may be that a fruit is cut by the virtual sword prop and the cut fruit shows the corresponding cutting effect; or when the virtual scene is a simulated combat scene, the presented interactive function effect may be that the virtual sword prop strikes an enemy and the struck enemy shows the corresponding injury effect.
(3) Gun shooting effect: correspondingly, referring to FIG. 4, the corresponding virtual prop may be a virtual gun prop. When the interactive object uses the interactive function device to simulate the virtual gun prop and performs a control operation on the interactive function device, for example pressing a firing button or a reload button, the control operation is transferred equally to the virtual gun prop, and the virtual gun prop in the virtual scene is controlled to perform the corresponding action, which in turn affects the interactive function effect presented in the video picture.
Of course, other possible interactive function effects may be included in practical applications, such as a fan blowing effect or a water-flow effect when using a watering can, which the embodiments of the present application do not limit; they are not enumerated here one by one.
It should be noted that, in the embodiments of the application, the interactive function device and the interactive function effect may correspond one to one, i.e. one interactive function device provides the user with one specific interactive function effect, and the user configures the interactive function device with the required effect according to their own needs. Alternatively, to reduce the user's cost of configuring interactive function devices, one interactive function device in the embodiments of the application may correspond to several interactive function effects; the corresponding effect can be configured according to the virtual scene, or selected by the user, at the time of actual use. Alternatively, the interactive function device provided by the embodiments of the application may be a device whose form can be changed: the user can change the form of the interactive function device according to the form of the virtual prop to be simulated. For example, when pose detection of the interactive function device is implemented with marker points (markers), the form of the interactive function device is essentially determined by the combination of several markers, so the markers can be designed to be movably or detachably connected, allowing the user to change the marker positions to suit the forms of different virtual props.
In the embodiments of the present application, when the real-time video interaction system further includes an interactive function device, the detection subsystem 301 is also used to perform real-time pose detection on the interactive function device controlled by the interactive object. Similar to the pose detection of the interactive object, the pose information of the interactive function device within the detectable area A in real space can be obtained and mapped into the virtual scene through the spatial mapping relationship, yielding the prop pose information of the corresponding virtual prop in the virtual scene. Similarly, the rendering subsystem 302 can determine, from the virtual scene, at least one virtual object that interacts with the virtual prop based on the prop pose information output by the detection subsystem 301, and perform video picture rendering on the at least one virtual object based on the corresponding interactive function effect, obtaining video picture data containing the interactive function effect.
Referring to FIG. 5a, a schematic diagram of an interactive function device simulating the lighting effect of a flashlight is shown. For convenience of description, the interactive function device in FIG. 5a is shown directly as a flashlight prop, but in an actual scene it need not be a real flashlight; it only needs to be an interactive prop capable of simulating the function of a flashlight. Referring to FIG. 5a, the virtual scene is a poorly lit cave scene, so the video picture presented on the display screen is a dark cave picture. The interactive object can choose to use the flashlight prop; during video interaction, the detection subsystem 301 detects pose changes of the flashlight prop, and based on the prop pose information output by the detection subsystem 301, such as the position and illumination direction of the flashlight prop, the rendering subsystem 302 can create a corresponding virtual flashlight prop in the virtual scene and keep its pose consistent with that of the flashlight prop held by the interactive object. The virtual flashlight prop can then simulate the emitted flashlight beam and illuminate the virtual scene, so the rendered video picture can include the lighting effect shown in FIG. 5a.
Referring to FIG. 5b, a schematic diagram of an interactive function device simulating the fruit-cutting effect of a sword is shown. Similarly, the interactive function device is shown directly as a sword prop, but in an actual scene it need not be a real sword; it only needs to be an interactive prop capable of simulating the function of a sword. Referring to FIG. 5b, the virtual scene is a fruit-cutting game scene in which many fruits fly towards the interactive object, and a virtual interactive object corresponding to the interactive object may exist in the scene, whose actions mirror those of the interactive object and which likewise holds the virtual sword prop. Correspondingly, the video picture presented on the display screen shows fruits flying towards the interactive object, and the interactive object can hold the sword prop to cut the approaching fruits. The detection subsystem 301 detects the pose of the sword prop in real time, and based on the prop pose information output by the detection subsystem 301, such as the position and pointing direction of the sword, the rendering subsystem 302 can map that pose onto the virtual sword prop so that the corresponding pose is presented in the virtual scene. The fruit that comes into contact with the virtual sword prop, and the contact position, can then be computed, so that the fruit is cut from the contact position, and the corresponding video picture can include the fruit-cutting effect shown in FIG. 5b.
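As a rough illustration of how the contact between the virtual sword prop and a fruit might be computed, the sketch below treats each fruit as a sphere and the blade as a line segment following the detected prop pose; the geometry, radius value and function names are assumptions for illustration, not the patent's method.

```python
# Sketch: which virtual fruits does the blade segment touch in this frame?
import numpy as np

def segment_sphere_hit(blade_start, blade_end, centre, radius):
    """True if the blade segment passes within `radius` of the fruit centre."""
    a, b, c = (np.asarray(p, float) for p in (blade_start, blade_end, centre))
    ab = b - a
    t = np.clip(np.dot(c - a, ab) / np.dot(ab, ab), 0.0, 1.0)  # closest point on the segment
    closest = a + t * ab
    return np.linalg.norm(c - closest) <= radius

def cut_fruits(blade_start, blade_end, fruit_centres, radius=0.1):
    """Return the indices of fruits intersected by the blade."""
    return [i for i, c in enumerate(fruit_centres)
            if segment_sphere_hit(blade_start, blade_end, c, radius)]

print(cut_fruits([0, 1, 0], [0, 2, 0], [[0.05, 1.5, 0.0], [1.0, 1.5, 0.0]]))
```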
Of course, in different virtual scenes, different virtual props may be used to present video frames with different interaction effects, and in actual scenes, the design may be performed according to actual situations, which is not limited by the embodiment of the present application.
The following description will be made for each of the above-described functional subsystems, respectively.
(I) Detection subsystem 301
In one possible implementation, the detection subsystem 301 may be implemented based on Motion capture (Mocap) technology, i.e., the detection subsystem may be a Motion capture system.
The motion capture technology sets trackers at key parts of a moving object (i.e., the interactive object or the interactive function device); the motion capture system captures the positions of the trackers, and three-dimensional space coordinate data are obtained after computer processing. In principle, conventional motion capture techniques can be classified into mechanical, acoustic, electromagnetic, active optical and passive optical types. Mechanical motion capture relies on mechanical devices to track and measure motion trajectories; a typical system consists of several joints and rigid links, and an angle sensor in each rotatable joint measures how the joint rotation angle changes. A common acoustic motion capture device consists of a transmitter, a receiver and a processing unit: the transmitter is a fixed ultrasonic generator, the receiver usually consists of three ultrasonic probes arranged in a triangle, and the system determines the position and orientation of the receiver by measuring the time or phase difference of sound waves travelling from the transmitter to the receiver. An electromagnetic motion capture system is a relatively common motion capture device, generally consisting of a transmitting source, receiving sensors and a data processing unit: the transmitting source generates an electromagnetic field distributed in space according to a known spatio-temporal rule, and the receiving sensors, attached to key positions of the moving object's body, move in the electromagnetic field with the performer's motion and are connected to the data processing unit by cable or wirelessly. Optical motion capture performs the capture task by tracking specific light points on the target; most common optical motion capture is based on the computer vision principle: in theory, as long as a point in space is visible to two cameras at the same moment, its spatial position at that moment can be determined from the two simultaneously captured images and the camera parameters, and when the cameras shoot continuously at a sufficiently high rate, the motion trajectory of the point can be obtained from the image sequence.
From the technical point of view, the essence of motion capture is to measure, track and record the motion trail of an object in a three-dimensional space. A typical motion capture system generally consists of the following components:
(1) The sensors, i.e. the trackers fixed at specific parts of the moving object, provide the motion capture system with the position information of the motion; the number of trackers is generally determined according to the capturing accuracy.
(2) The signal capturing device differs with the type of motion capture system and is responsible for capturing the position signal: for a mechanical system it is a circuit board that captures the electric signal, and for an optical motion capture system it may be a high-resolution infrared camera.
(3) The data processing equipment corrects and processes the data captured by the motion capture system and combines it with a three-dimensional model for subsequent work. This work is completed by data processing software or hardware; either way, it relies on the high-speed computing capability of the computer so that the three-dimensional model moves realistically and naturally.
The signal capturing device and the data processing device are connected through a data transmission device. In particular, a motion capture system requiring real-time effects must transmit a large amount of motion data from the signal capturing device to the computer system quickly and accurately for processing, and the data transmission device completes this work.
Based on the motion capture system described above, referring to fig. 6, a schematic system structure diagram of a detection subsystem implemented based on the motion capture technology according to an embodiment of the present application is shown. The detection subsystem 301 may include at least one motion capture camera 3011, a wearable tracking device 3012 worn on the interactive object, and a motion capture analysis device 3013, where the at least one motion capture camera 3011 is positioned around the detectable region A to capture images of the interactive object from different angles.
Each motion capture camera 3011 of the at least one motion capture camera 3011 captures images of the detectable region A to obtain motion capture images containing the wearable tracking device 3012 and sends them to the motion capture analysis device 3013. The motion capture analysis device 3013 performs real-time pose detection on the wearable tracking device 3012 according to the at least one input motion capture image to obtain the tracking device pose information of the wearable tracking device 3012. Since the wearable tracking device 3012 is worn on the interactive object and changes accordingly with the pose changes of the interactive object, the object pose information of the interactive object can be determined from the obtained tracking device pose information.
In practical application, the number of the motion capture cameras 3011 can be configured according to the shooting angle of the motion capture cameras 3011 and the angle to be detected in the detectable region, so as to cover the required detection angle and accurately detect the pose of the interaction object.
In a practical scenario, pose detection of the interactive object or the interactive function device is performed based on the trackers (i.e., the sensors of the motion capture system) included in the wearable tracking device 3012, so the relative positions of the trackers are generally required not to change; the wearable tracking device 3012 therefore generally needs to be implemented based on a rigid object.
Therefore, in the embodiment of the application, the wearable tracking device 3012 may include a flexible wearing part, a rigid mounting part and a plurality of trackers. The trackers are fixedly mounted on the rigid mounting part, where the rigid mounting part is generally understood as a part that does not deform easily, so that after the trackers are fixedly mounted, their relative positions do not easily change while the pose of the interactive object changes; pose tracking and calculation can therefore be performed reliably, improving the accuracy of the detected pose information. The trackers are worn on the interactive object through the flexible wearing part, which conforms better to the body part on which it is worn, so the comfort of the interactive object is also higher. In order to facilitate determining the facing direction of the interactive object, the trackers can be arranged directionally, with the arrangement direction pointing in the facing direction of the interactive object. Fixed mounting means that the position of a tracker does not change after it is mounted on the rigid mounting part, but the tracker and the rigid mounting part may still be connected detachably, for example by adhesion or by snap fixing, which is not limited in the embodiment of the present application.
In this way, the object pose information output by the detection subsystem 301 may specifically indicate the position and facing direction of the interactive object. Based on the scene data of the virtual scene, the rendering subsystem 302 obtains the corresponding video picture data according to the position and facing direction indicated by the object pose information; that is, the viewing direction the interactive object is expected to observe can be uniquely determined from its position and facing direction, so that the virtual scene picture of the corresponding viewing angle is presented to the interactive object.
Referring to fig. 7a, arrangement diagrams of several tracker layouts according to an embodiment of the present application are shown. When 3 trackers are included, they may be arranged as a triangle with the tip of the triangle pointing in the facing direction; similarly, when a greater number of trackers are included, as shown in diagrams (b) and (c) of fig. 7a, they may also be arranged as triangles whose tips always point in the facing direction.
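As a hedged illustration of how such a directed tracker layout could yield a pose, the sketch below takes the three tracker positions of a triangular arrangement and derives a position (the marker centroid) and a facing direction (from the base midpoint toward the triangle tip); the actual motion capture software may compute this differently.

```python
import numpy as np

def pose_from_triangle(tip, base_left, base_right):
    """Derive a position and facing direction from a 3-tracker triangular layout."""
    tip, base_left, base_right = (np.asarray(p, dtype=float) for p in (tip, base_left, base_right))
    position = (tip + base_left + base_right) / 3.0   # centroid of the trackers
    base_mid = (base_left + base_right) / 2.0
    facing = tip - base_mid                           # the triangle tip points in the facing direction
    facing /= np.linalg.norm(facing)
    return position, facing
```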
In specific applications, corresponding devices may be employed according to different implementation principles of the motion capture technology employed. In the embodiment of the application, an optical-based motion capture technology is taken as an example, and the motion capture camera can be specifically an infrared camera, and the tracker can be specifically a reflective marker with a fluorescent material smeared on the surface.
In the embodiment of the application, the wearable tracking device can be designed as various types of wearable devices; for example, according to the wearing position it can be divided into a head-worn tracking device, a chest-worn tracking device, a leg-worn tracking device and the like. Considering that the main perception mode of video interaction is visual perception, detecting the viewing angle is the more direct detection mode, and the viewing angle changes as the head moves; therefore, in order to detect the viewing-angle data of the interactive object more accurately, a head-worn tracking device, such as a hat, may preferably be adopted.
Referring to fig. 7b, a schematic diagram of a wearable tracking device is shown, taking a hat as an example. The flexible wearing part of the wearable tracking device may specifically be the crown of the hat, the rigid mounting part may specifically be the brim (visor) of the hat, and the trackers may be mounted on the brim.
Similarly, the pose detection principle for the interactive function device is the same: the trackers can be laid out according to the appearance of the interactive function device, so that the outline of the device can be determined as an aid, its three-dimensional spatial information can be tracked, the corresponding virtual prop can later be bound in the virtual scene, and the subsequent addition of interaction effects is assisted.
By way of example, where an interactive function device simulates a flashlight, an interactive handle or flashlight-like shaped object may be used to simulate the function of the flashlight.
The interactive handle can be a device for controlling the whole system and the video interaction functions and may be sold together with the video interactive system. In general, an interactive handle may be configured with a large number of trackers when it leaves the factory, so that various objects can be simulated; when simulating a flashlight, a flashlight mode can be selected, i.e. only the trackers required for the simulated flashlight are active and the remaining trackers can be turned off. Of course, the interactive handle may also be shipped without trackers, and the interactive object can configure the trackers itself, for example by adhering trackers to the interactive handle.
The flashlight-like object may be any object similar in shape to a flashlight, such as a cylindrical object. In an actual scene, the interactive object can configure the trackers itself, for example by sticking trackers to a cylindrical object, so that a similar object can be chosen freely and the interactive function device can be configured flexibly. Of course, to avoid pose detection failures caused by deformation, a rigid object should be chosen as far as possible as the flashlight-like object.
Referring to fig. 7c, when a microphone is used as the flashlight-like object, trackers may be disposed around the microphone so as to surround the object as much as possible, which facilitates detecting the object. Fig. 7c specifically shows trackers affixed at the bottom and in four orientations, with one pair of opposing orientations at a different height from the other pair. Of course, in an actual scene other position layouts may be adopted, as long as the outline of the object can be detected, which is not limited in the embodiment of the present application.
In one possible implementation, the detection subsystem 301 may be implemented based on depth image technology, i.e. the detection subsystem may be a depth detection system. Compared with the detection scheme based on the motion capture technology, a detection scheme based on the depth image technology has lower deployment difficulty and cost.
Referring to fig. 8, another system structure diagram of a detection subsystem 301 provided by an embodiment of the present application is shown, where the detection subsystem 301 may include a depth camera 3014 and a depth analysis device 3015. The depth camera 3014 is mainly used for shooting a depth image of an interactive object, the depth image is an image representing a distance, each pixel point in the depth image represents a distance between a corresponding object and the depth camera, and the depth analysis device 3015 is used for extracting skeleton information of the interactive object based on the depth image and determining pose information of the object according to the obtained skeleton information.
Specifically, the depth camera 3014 may be a single fixed depth camera disposed at the central position of the display screen; when the display screen is a 360° surround screen, a suitable position is selected for installation. The depth camera 3014 may be used to pick up the skeletal information of a person and obtain a spatial position approximating that of the eyes.
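A minimal sketch of how a head pixel found in the depth image could be back-projected to such a spatial position, assuming a pinhole camera model with known intrinsics fx, fy, cx, cy (the actual skeleton extraction performed by the depth analysis device is not shown here):

```python
def depth_pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project an image pixel (u, v) with depth in metres to camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)   # approximate head position, used like an eye position
```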
It should be noted that, in practical application, the two modes may be adopted at the same time to mutually verify the accuracy of pose information, or the two modes may be alternatively adopted.
In the embodiment of the application, the detectable region and the virtual scene have a spatial mapping relationship, and the spatial mapping relationship can be a one-to-one mapping relationship or a mapping relationship with a certain scaling ratio.
Taking the one-to-one mapping relationship as an example, positions in the detectable region correspond one-to-one to positions in the virtual scene. To facilitate the mapping between the detectable region and the virtual scene, a coordinate system can be created, and the coordinate origin of the detection subsystem needs to be determined first. For example, referring to fig. 9, a schematic diagram of creating the coordinate system is shown: when a ring screen is used, the centre of the ring screen may be set as the coordinate origin, so that the relationship between the coordinate origin of the detection subsystem and the display screen is determined; the directions of the x-axis and the y-axis also need to be determined, so that a virtual screen model consistent with the actual display screen can be created in the virtual scene for the subsequent viewing-angle calculation and video picture rendering process.
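The spatial mapping itself can be expressed very compactly; the sketch below assumes the coordinate origin is placed at the centre of the ring screen and applies an optional scaling ratio when mapping a detected real-space position into the virtual scene (scale = 1.0 gives the one-to-one case).

```python
import numpy as np

def map_to_virtual(real_position, screen_center, scale=1.0):
    """Map a position in the detectable region to the virtual scene.

    scale = 1.0 gives the one-to-one mapping; other values give a scaled mapping.
    """
    real_position = np.asarray(real_position, dtype=float)
    screen_center = np.asarray(screen_center, dtype=float)
    return (real_position - screen_center) * scale
```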
In the embodiment of the present application, the motion capture analysis device 3013 or the depth analysis device 3015 each include corresponding analysis software, so that when image data is input, pose information of an interactive object or an interactive function device can be obtained by the analysis software according to the image data and transmitted to the rendering subsystem 302 in real time. For example, for the motion capture software in the motion capture analysis device 3013, three-dimensional spatial information of the interactive object or the interactive function device may be calculated by capturing an image frame returned by the camera, and transmitted to the rendering subsystem 302 in real time.
(II) Rendering subsystem 302
The rendering subsystem 302 is used to implement a rendering process of video pictures, which may be implemented by a rendering engine, for example, rendering of video pictures by a game engine. As described above, the embodiments of the present application facilitate determining video pictures projected onto a virtual screen by creating a virtual screen in a virtual scene that coincides with a real display screen.
Referring to fig. 10, a schematic diagram of the virtual screen provided by an embodiment of the present application is shown. When a virtual screen consistent with the display screen exists in the virtual scene, different scene contents, i.e. the corresponding video pictures, can be observed through the virtual screen from different positions in the virtual scene; the rendering process is then essentially converted into judging, at a specific position, which content of the virtual scene is projected onto the virtual screen.
Specifically, when the rendering subsystem 302 receives the object pose information of the interactive object, a corresponding virtual image acquisition device (i.e., a virtual camera) can be created in the virtual scene based on the object pose information. The pose of the virtual camera is kept consistent with that of the interactive object, that is, the position of the virtual camera matches the position of the interactive object and the shooting angle of the virtual camera matches the eye viewing angle of the interactive object, while the viewfinder plane of the virtual camera lies on the virtual screen; the picture of the virtual scene captured by the virtual camera and projected onto the virtual screen is therefore the video picture that the interactive object can see.
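Conceptually, deciding what the interactive object sees amounts to projecting scene points through the eye position onto the virtual screen plane. The following ray-plane intersection is a simplified sketch that assumes a flat screen plane rather than the full ring screen; a rendering engine would normally express the same idea as an off-axis projection.

```python
import numpy as np

def project_to_screen(eye, scene_point, screen_point, screen_normal):
    """Intersect the ray from the eye through a scene point with the virtual screen plane.

    Returns the intersection point on the plane, or None if the ray is parallel
    to the screen or points away from it.
    """
    eye, scene_point = np.asarray(eye, float), np.asarray(scene_point, float)
    screen_point, screen_normal = np.asarray(screen_point, float), np.asarray(screen_normal, float)
    direction = scene_point - eye
    denom = float(np.dot(direction, screen_normal))
    if abs(denom) < 1e-9:
        return None
    t = float(np.dot(screen_point - eye, screen_normal)) / denom
    return eye + t * direction if t > 0 else None
```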
In an actual scene, in order to realize stereoscopic visual perception, the embodiment of the application adopts a binocular stereoscopic vision technology; that is, the obtained object pose information is converted into a binocular pose information set that contains the pose information of each of the two eyes of the interactive object.
Referring to fig. 11, a schematic diagram of the principle of converting to the binocular pose information set is shown, taking a hat as an example of the wearable tracking device. The triangle represents the position of the wearable tracking device, e.g. the average position of the trackers mounted on the visor of the hat, and the pointing direction of the triangle is the facing direction of the interactive object. There is a certain offset between the trackers and the eyes of the interactive object, i.e. the wearing offset value shown in fig. 11; this offset is fixed once the trackers are fixed to the visor and can therefore be preconfigured. After the tracking device pose information of the wearable tracking device is acquired, the binocular pose information set can be obtained based on the preconfigured object parameters of the interactive object.
The object parameters may include a wearing offset parameter and a pupil distance parameter. The wearing offset parameter is the offset value between the wearable tracking device and the interactive object, such as the wearing offset value shown in fig. 11, and the pupil distance parameter corresponds to the interpupillary distance. The object parameters may be configured in advance to adapt to different interactive objects and to compensate for the offset between the wearable tracking device and the eyes, thereby avoiding errors and improving the accuracy of picture rendering.
In practical applications, a default value may be configured for both the wearing offset parameter and the interpupillary distance parameter; for example, the interpupillary distance parameter may be a statistically obtained average interpupillary distance, and the wearing offset parameter may be determined based on the wearing type.
In another configuration manner, referring to fig. 3, the system provided in the embodiment of the present application may further include an operable terminal device 306, where the terminal device 306 may be configured to provide an operable page for inputting object parameters of an interactive object, and before the interactive object performs a video interaction process, a corresponding parameter value may be input according to a situation of the interactive object, so that relevant parameters of the current interactive object may be accurately obtained, so that subsequent pose information conversion is more accurate. In addition, it should be noted that the operable terminal apparatus 306 may be used for configuring other parameters of the entire system, in addition to the above-described parameters, and may also be used for configuring a virtual scene, for example, removing or resizing objects in the virtual scene.
In another configuration manner, the configuration may be performed through the interactive handle, and then the relevant configuration page may be presented on the display screen 304 through the operation of the interactive handle, so as to perform corresponding configuration for the corresponding configuration item.
Further, with continued reference to fig. 11, the rendering subsystem 302 may compensate the input tracking device pose information based on the wearing offset parameter to obtain the object pose information, and may obtain the pose information of each of the left and right eyes of the interactive object based on the pupil distance parameter and the obtained object pose information, thereby obtaining the above binocular pose information set. The rendering subsystem 302 can then perform video picture rendering for the binocular pose information set based on the scene data of the virtual scene, i.e. perform the rendering process described above for the pose information of each eye, to obtain video picture data containing the video pictures corresponding to the two eyes.
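A hedged sketch of this conversion is given below: the tracking-device pose is compensated by the wearing offset to approximate the point between the eyes, and the interpupillary distance (IPD) is then split symmetrically to the left and right. The offset direction, the up axis and the parameter names are illustrative assumptions, not values prescribed by the patent.

```python
import numpy as np

def binocular_poses(device_position, facing, wear_offset, ipd, up=(0.0, 0.0, 1.0)):
    """Derive left/right eye poses from the wearable tracking device pose.

    wear_offset and ipd are expressed in the same length unit as device_position.
    """
    device_position, facing, up = (np.asarray(v, dtype=float) for v in (device_position, facing, up))
    facing = facing / np.linalg.norm(facing)
    # Compensate the fixed offset between the visor-mounted trackers and the eyes.
    # The sign/axis of this compensation depends on where the trackers actually sit;
    # here they are assumed to lie in front of the eyes along the facing direction.
    eye_center = device_position - wear_offset * facing
    # Direction toward the interactive object's right (right-handed coordinates, z up assumed).
    right = np.cross(facing, up)
    right /= np.linalg.norm(right)
    left_eye = eye_center - right * (ipd / 2.0)
    right_eye = eye_center + right * (ipd / 2.0)
    return {"left": (left_eye, facing), "right": (right_eye, facing)}
```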
Specifically, after the binocular pose information set is obtained, referring to fig. 12, corresponding virtual image acquisition devices may be configured in the virtual scene based on the binocular pose information set; that is, corresponding virtual stereo cameras are created at the corresponding positions based on the pose information of the left and right eyes, one virtual camera corresponding to the left eye and the other to the right eye, and video picture rendering is performed based on the picture data of the virtual scene captured by the two cameras and projected onto the virtual screen, yielding the corresponding video picture data.
Meanwhile, when the interactive object uses the interactive function device, a virtual prop with the corresponding interaction function can be bound for it in the virtual scene, so that the corresponding interaction effect can be set during rendering. For example, when the interactive function device simulates a flashlight, a spotlight can be emitted from the position of the interactive function device, so that the content of the virtual picture reflects the change in reality: although only the interactive function device is held in the hand, a spotlight light source is emitted according to the current interaction effect and the position of the device in the virtual environment, simulating the effect of a flashlight.
In some scenes, for example when a large LED display wall is used as the display screen, the surrounding display screen is relatively large, and rendering or displaying the whole picture is too heavy a burden for a single processor. Therefore, in the embodiment of the present application, as shown in fig. 10, the whole display screen may be divided into a plurality of sub-screens, i.e. the display screen 304 consists of a plurality of sub-screens; correspondingly, the virtual screen is divided into the same number of sub-screens with the same division manner and ranges, and video picture rendering can then be performed separately for the different sub-screens.
In consideration of rendering performance, the embodiment of the application uses a plurality of graphics renderers, each corresponding one-to-one to a sub-screen. Each graphics renderer performs video picture rendering based on the relative positional relationship between the object pose information and its own sub-screen and on the scene data of the virtual scene, obtaining the video picture data to be displayed on that sub-screen; that is, for each graphics renderer, the interactive object is mapped to its position and viewing angle in the virtual scene, and the scene content projected onto the sub-screen corresponding to that renderer is rendered to obtain the video picture to be displayed on the sub-screen.
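The division of labour can be pictured as follows; the `SubScreen` structure and the render callable are hypothetical stand-ins for the engine-specific graphics renderer, shown only to make the one-renderer-per-sub-screen dispatch concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SubScreen:
    name: str
    extent: tuple     # geometric extent of this sub-screen on the virtual screen

def render_all_subscreens(renderers: List[Callable], subscreens: List[SubScreen],
                          binocular_pose_set: dict, scene: dict) -> Dict[str, object]:
    """One renderer per sub-screen: each renders only the scene content projected onto its own sub-screen."""
    frames = {}
    for render, subscreen in zip(renderers, subscreens):
        # `render` stands in for a graphics renderer; it receives the eye poses, the
        # scene data and its own sub-screen geometry, and returns that sub-screen's
        # video picture data.
        frames[subscreen.name] = render(scene, binocular_pose_set, subscreen)
    return frames
```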
Referring to fig. 13, a schematic diagram of a rendering subsystem according to an embodiment of the present application is shown. The description here takes a display screen divided into 4 sub-screens as an example, but in an actual scenario the number of sub-screens is not limited. Corresponding to the 4 sub-screens, the rendering subsystem 302 may include 4 graphics renderers, each of which renders the video picture to be displayed on one sub-screen.
Specifically, after the binocular pose information set of the interactive object is obtained, the virtual stereo cameras can be created at the corresponding positions, and for each sub-screen the corresponding graphics renderer performs picture rendering based on the left and right eye positions to obtain video picture data including the left-eye and right-eye pictures of that sub-screen.
It should be noted that, in practical application, the left-eye and right-eye pictures may be rendered separately, so that the video pictures to be displayed to the left eye and to the right eye are obtained as two pictures and both are delivered to the video data distributor during subsequent distribution; alternatively, the left-eye and right-eye pictures may be rendered simultaneously, that is, the stereo camera obtains the scene contents perceived by the left and right eyes respectively and then renders based on them, yielding one video picture that contains both the left-eye and right-eye pictures.
In the embodiment of the application, the plurality of graphics renderers may be implemented using a plurality of cluster nodes or a plurality of computer devices, and these cluster nodes or computer devices may be physical devices or virtual devices. For example, each graphics renderer may be a physical graphics card.
When there are multiple graphics renderers in the system, they need to render each frame of the video picture synchronously; otherwise the pictures finally displayed on the display screen are out of step, the so-called screen tearing phenomenon. To avoid the virtual cameras capturing inconsistent content on the individual virtual sub-screens, synchronization between the multiple graphics renderers must be ensured. In view of this, the system provided by the embodiment of the present application may further include a synchronization device configured to send synchronization signals to the plurality of graphics renderers respectively, so that they perform video picture rendering synchronously.
Specifically, synchronization of the rendering signals can be achieved through a synchronization card between the graphics renderers: each graphics renderer is linked to the synchronization card through a network cable, and the synchronization card sends the synchronization signal to each graphics renderer, so that all graphics renderers render the same frame of the video picture at the same time or approximately the same time.
In one possible implementation, the synchronization card may be a dedicated synchronization device, i.e. a dedicated device is provided separately from the plurality of graphics renderers for rendering synchronization.
In one possible implementation, the synchronization card may be implemented by one master graphics renderer; that is, besides rendering video pictures, this graphics renderer also performs the rendering-synchronization function and sends the synchronization signal to the remaining graphics renderers, which then synchronize on that signal.
Referring to fig. 14, a schematic diagram of the synchronization signal provided by an embodiment of the present application is shown. The synchronization signal includes rising edges and falling edges, and either the rising edge or the falling edge may be used as the trigger signal: when a rising edge or a falling edge is received, one round of video picture rendering is triggered, and one period lies between one rising edge and the next. The trigger signal can be sent precisely and in advance, and is synchronized to each graphics renderer through the network cable. The period or frequency of the synchronization signal may be set according to the rendering frame rate; for example, the rendering frame rate may be set to not less than 60 frames per second, so that a low frame rate does not cause visible stutter to the human eye. Typically, the synchronization error is on the order of microseconds, which satisfies picture synchronization at the millisecond level.
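As a rough software analogue of the synchronization card (the real system uses dedicated sync hardware and network cabling with microsecond-level accuracy; this sketch only illustrates the idea of one trigger per frame, and the address list and frame encoding are assumptions):

```python
import socket
import time

def run_sync_master(renderer_addrs, frame_rate=60):
    """Broadcast one sync tick per frame so all renderers start the same frame together."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    period = 1.0 / frame_rate
    frame_id = 0
    while True:
        for addr in renderer_addrs:
            # Each datagram plays the role of a "rising edge" of the sync signal.
            sock.sendto(frame_id.to_bytes(8, "big"), addr)
        frame_id += 1
        time.sleep(period)   # real sync hardware keeps far tighter timing than sleep()
```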
(III) display related subsystem (including video data distributor 303, display screen 304 and stereoscopic receiver 305)
In the embodiment of the present application, the display screen may be an LED display screen; of course, display screens based on other principles may also be used, for example an organic light-emitting diode (OLED) display screen or a liquid crystal display screen, which is not limited in the embodiment of the present application. Taking an LED display screen as an example, the video data distributor may be an LED processor configured to process the received video picture data and output the processed video picture data to the LED display screen for display.
Specifically, the rendering subsystem 302 may splice the rendered left-eye and right-eye pictures into one complete picture and output it to the LED processor; the LED processor then performs picture processing based on the stereoscopic vision technology, so that the stereoscopic vision receiver worn by the interactive object can adopt the corresponding visual receiving control manner and the interactive object obtains stereoscopic visual perception. The video data distributor is specifically used for cutting the video picture data according to the binocular viewing angles to obtain left-eye picture data and right-eye picture data and distributing them in turn to the display screen for display; the stereoscopic vision receiver is specifically used for blocking right-eye picture reception when the left-eye picture is displayed on the display screen and blocking left-eye picture reception when the right-eye picture is displayed.
In one possible embodiment, the stereoscopic vision technique may be a time-division-based stereoscopic vision technique, that is, left and right eyes display images at different times, and the left and right eyes of a person receive images at corresponding times.
Specifically, after receiving the complete picture data, the LED processor may cut the complete picture to obtain separate left-eye and right-eye picture data, multiply the frame rate according to the screen rate setting, for example from 60 frames to 120 frames, and insert the left-eye and right-eye picture data into the output sequence at alternating times. Referring to fig. 15, a schematic diagram of the time-division picture output timing is shown, using the example of outputting the left-eye picture first and then the right-eye picture: after the left-eye and right-eye pictures are obtained by cutting, the left-eye picture is output to the display screen as the first frame, the right-eye picture is output as the second frame, and so on.
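The frequency-doubled, time-division output order can be sketched as follows: a sequence of (left, right) picture pairs at 60 frames per second becomes an alternating sequence at 120 frames per second, matching the timing of fig. 15. This is only an illustrative sketch, not the LED processor's internal processing.

```python
def interleave_time_division(frame_pairs):
    """Turn [(L0, R0), (L1, R1), ...] at 60 fps into [L0, R0, L1, R1, ...] at 120 fps."""
    output = []
    for left_frame, right_frame in frame_pairs:
        output.append(left_frame)    # first frame of the pair: left-eye picture
        output.append(right_frame)   # second frame of the pair: right-eye picture
    return output
```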
The stereoscopic vision receiver can likewise adopt time-division visual receiving control, i.e. it controls the left and right eyes to receive the corresponding video pictures at different moments. For example, the stereoscopic vision receiver may be shutter glasses that open and close at the corresponding frequency based on the set frame rate, or that control the light transmission of the left and right lenses. If the picture frame rate is 60 (120 after frequency doubling), the frame rate of the shutter glasses can also be 120, so the shutters stay consistent with the screen and each eye obtains its pictures at 60 frames per second, achieving the stereoscopic perception effect. Referring to fig. 15, when the display screen displays the first frame, the left eye is allowed to receive the picture and the right eye is blocked; similarly, when the display screen displays the second frame, the right eye is allowed to receive the picture and the left eye is blocked.
In one possible implementation, the stereoscopic vision technique may also be based on the colour-separation (anaglyph) stereoscopic imaging technique: two images shot from different viewing angles are printed in the same picture in two different colours. Viewed with the naked eye the picture shows blurred ghost images; only through corresponding stereo glasses, such as red-blue glasses, is the stereoscopic effect seen. That is, the colours are filtered, the red image passes through the blue lens, and the different images seen by the two eyes overlap in the brain to produce the 3D stereoscopic effect.
In the embodiment of the application, the video data distributor includes a plurality of sub-distributors corresponding to the plurality of sub-screens of the display screen. Each sub-distributor corresponds to at least one sub-screen and is used for performing the stereoscopic-vision-based picture processing on the video picture data to be displayed on its sub-screen and distributing the processed video picture data to the corresponding sub-screen for display. Since each sub-distributor performs a process similar to that described above, a detailed description is omitted here.
Referring to fig. 16, based on the same inventive concept, the embodiment of the present application further provides a virtual-real combined real-time video interaction method; fig. 16 is a schematic flow diagram of the method, which can be applied to the virtual-real combined real-time video interactive system described above. The specific implementation flow is as follows:
Step 1601: Perform real-time pose detection on the interactive object located in the detectable region in the real space to obtain the object pose information of the interactive object mapped to the virtual scene, where a spatial mapping relationship exists between the detectable region and the virtual scene.
Step 1602: Perform video picture rendering according to the object pose information based on the scene data of the virtual scene to obtain the corresponding video picture data.
Step 1603: Perform picture processing based on the stereoscopic vision technology on the video picture data, and output the processed video picture data to the display screen for display.
Step 1604: Perform stereoscopic vision receiving control on the two eyes of the interactive object using the visual receiving control manner corresponding to the stereoscopic vision technology, so that the two eyes of the interactive object receive different video pictures displayed on the display screen and form stereoscopic visual perception.
The steps may be performed by the subsystems of the real-time video interactive system, so that reference may be made to the description of the corresponding parts, and thus, a detailed description is omitted.
The following describes the scheme of the embodiment of the present application through a specific interaction flow chart. Referring to fig. 17, an interaction flow diagram provided in an embodiment of the present application is shown.
S1: the interactive object wears a head tracking hat, shutter glasses, and a handheld interactive device.
S2: the interactive object enters a detectable area, namely an interactive experience area of the real-time video interactive system.
S3: the tracking equipment acquires the head coordinates of the person, namely the detection subsystem can detect the head tracking hat of the interactive object and determine the three-dimensional space information of the head tracking hat.
S4: the game engine renders left and right views at the viewing angle through the head coordinates.
S5: left and right eye pictures are encoded by the LED processor and projected onto the screen.
S6: shutter glasses control eyes to watch right and left eye pictures through shutters.
In summary, the embodiment of the application provides a method for calculating and simulating, in real time, the picture seen by human eyes by combining an LED screen with a motion capture device, and the method provides both interaction capability and stereoscopic vision capability. Specifically, the positioning information of the human eyes in space is obtained in real time through the passive tracking system; positioning only requires wearing a few simple marker points on the head, which discards the heavy headset of traditional VR. At the same time, the picture within the viewing angle is rendered in real time by the game engine so that the video picture is fed back promptly, and the left-eye and right-eye pictures rendered on the LED screen, viewed with shutter-type 3D glasses, achieve the effect of stereoscopic perception; the stereoscopic view follows the movement of the experiencer in real time, producing an immersive effect. Interactive prop content can also be provided so that the virtual picture can be interacted with in real time, and the LED screen can use a fully enclosed or semi-enclosed wall so that the experiencer can view the whole environment without dead angles. Because the real environment can be observed at the same time, the virtual content on the LED screen and the real content in the actual scene are combined, achieving a true combination of the virtual and the real. The head-mounted VR helmet is abandoned, the experiencer has a more convenient activity space, the experience can be completed by wearing only a hat and a pair of glasses, there is no constraint from any cables, and movement is freer.
Referring to fig. 18, based on the same technical concept, the embodiment of the application further provides a real-time video interaction device combining virtual and real. In one embodiment, the computer device may be a device shown in fig. 2a, or a terminal device or a background data processing device shown in fig. 2b, or any device shown in fig. 3 (a device corresponding to a detection subsystem, a device corresponding to a rendering subsystem, or a video data distributor, etc.), where the device includes a memory 1801, a communication module 1803, and one or more processors 1802 as shown in fig. 18.
A memory 1801 for storing computer programs for execution by the processor 1802. The memory 1801 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a program required for running an instant communication function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 1801 may be a volatile memory, such as a random-access memory (RAM); the memory 1801 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 1801 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1801 may also be a combination of the above memories.
The processor 1802 may include one or more central processing units (central processing unit, CPU) or digital processing units, or the like. A processor 1802, configured to implement the virtual-real combined real-time video interactive system when calling the computer program stored in the memory 1801.
The communication module 1803 is used for communicating with a terminal device and other servers.
The specific connection medium between the memory 1801, the communication module 1803 and the processor 1802 is not limited in the embodiments of the present application. In fig. 18, the memory 1801 and the processor 1802 are connected through the bus 1804, which is depicted with a bold line; the connection between the other components is merely illustrative and not limiting. The bus 1804 may be divided into an address bus, a data bus, a control bus, and so on. For ease of description, only one thick line is depicted in fig. 18, but this does not mean that there is only one bus or only one type of bus.
The memory 1801 includes a computer storage medium in which computer-executable instructions for implementing the virtual-real combined real-time video interactive system according to the embodiments of the present application are stored, and the processor 1802 is configured to execute these instructions to realize the system according to the embodiments.
In another embodiment, when the real-time video interaction device combining virtual and real is provided to the user in an integrated form, the structure thereof may also be as shown in fig. 19, including: communication component 1910, memory 1920, display unit 1930, camera 1940, sensor 1950, audio circuit 1960, bluetooth module 1970, processor 1980, and the like.
The communication component 1910 is configured to communicate with a server. In some embodiments, a wireless fidelity (WiFi) module may be included; WiFi is a short-range wireless transmission technology, and the computer device can help the user send and receive information through the WiFi module.
Memory 1920 may be used to store software programs and data. The processor 1980 executes various functions of the terminal device and data processing by executing software programs or data stored in the memory 1920. The memory 1920 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. The memory 1920 stores an operating system that enables the terminal device to function. The memory 1920 in the present application may store an operating system and various application programs, and may also store codes for executing the real-time video interaction method of virtual-real combination according to the embodiments of the present application.
The display unit 1930 may also be used to display information input by a user or information provided to the user and a graphical user interface (graphical user interface, GUI) of various menus of the terminal device. In particular, the display unit 1930 may include a display screen 1932 disposed on the front of the terminal device. The display screen 1932 may be configured in the form of a liquid crystal display, light emitting diodes, or the like. The display unit 1930 may be used to display various video screens or system configuration pages in the embodiment of the present application.
The display unit 1930 may also be used to receive input numeric or character information, generate signal inputs related to user settings and function control of the terminal device, and in particular, the display unit 1930 may include a touch screen 1931 provided on the front surface of the terminal device, and may collect touch operations on or near the user, such as clicking buttons, dragging scroll boxes, and the like.
The touch screen 1931 may cover the display screen 1932, or the touch screen 1931 may be integrated with the display screen 1932 to implement input and output functions of the terminal device, and after integration, the touch screen may be simply referred to as a touch screen. The display unit 1930 may display an application program and corresponding operation steps in the present application.
The camera 1940 may be used to capture still images, and a user may comment the image captured by the camera 1940 through an application. The number of cameras 1940 may be one or more. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to a processor 1980 for conversion into a digital image signal.
The terminal device may further comprise at least one sensor 1950, such as an acceleration sensor 1951, a distance sensor 1952, a fingerprint sensor 1953, a temperature sensor 1954. The terminal device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.
The audio circuitry 1960, speaker 1961, microphone 1962 may provide an audio interface between a user and a terminal device. The audio circuit 1960 may transmit the received electrical signal converted from audio data to the speaker 1961, and the electrical signal is converted into a sound signal by the speaker 1961 to be output. The terminal device may also be configured with a volume button for adjusting the volume of the sound signal. On the other hand, the microphone 1962 converts the collected sound signals into electrical signals, receives the electrical signals by the audio circuit 1960, converts the electrical signals into audio data, and outputs the audio data to the communication component 1910 for transmission to, for example, another terminal device, or outputs the audio data to the memory 1920 for further processing.
The bluetooth module 1970 is used for exchanging information with other bluetooth devices with bluetooth modules through bluetooth protocols.
Processor 1980 is a control center of the terminal device, connecting various parts of the entire terminal using various interfaces and lines, performing various functions of the terminal device and processing data by running or executing software programs stored in memory 1920, and invoking data stored in memory 1920. In some embodiments, processor 1980 may include one or more processing units; processor 1980 may also integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a baseband processor that primarily handles wireless communications. It will be appreciated that the baseband processor described above may not be integrated into processor 1980. Processor 1980 of the present application may run an operating system, applications, user interface displays and touch responses, and virtual-real combined real-time video interaction methods of embodiments of the present application. In addition, processor 1980 is coupled to a display unit 1930.
Based on the same inventive concept, the embodiments of the present application also provide a storage medium storing a computer program, which when run on a computer, causes the computer to perform the steps in the virtual-real combined real-time video interaction method according to the various exemplary embodiments of the present application described above in the present specification.
In some possible embodiments, aspects of the virtual-real combined real-time video interaction system provided by the present application may also be implemented as a form of a computer program product comprising a computer program for causing a computer device to perform the steps of the virtual-real combined real-time video interaction method according to the various exemplary embodiments of the application described herein when the program product is run on the computer device, e.g. the computer device may perform the steps of the embodiments.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may take the form of a portable compact disc read only memory (CD-ROM) and comprise a computer program and may run on a computer device. However, the program product of the present application is not limited thereto, and in the present application, the readable storage medium may be any tangible medium that can contain, or store the program, including a computer program, for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this is not required to either imply that the operations must be performed in that particular order or that all of the illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (15)

1. A virtual-real combined real-time video interactive system, comprising:
the detection subsystem is used for carrying out real-time pose detection on the interactive object in a detectable area in the real space to obtain object pose information of the interactive object mapped to the virtual scene, and a space mapping relation exists between the detectable area and the virtual scene;
the rendering subsystem is used for rendering video pictures according to the object pose information based on the scene data of the virtual scene to obtain corresponding video picture data;
the video data distributor is used for carrying out picture processing on the video picture data based on a stereoscopic vision technology and distributing the processed video picture data to the display screen for display;
and the stereoscopic vision receiver is worn on the interactive object and is used for performing stereoscopic vision receiving control on the eyes of the interactive object by adopting a visual receiving control mode corresponding to the stereoscopic vision technology so that the eyes of the interactive object receive different video pictures displayed on the display screen to form stereoscopic vision perception.
2. The system of claim 1, further comprising an interactive function device configured to provide at least one interactive function effect with the virtual scene, each interactive function effect corresponding one-to-one to a virtual prop in the virtual scene;
the detection subsystem is further configured to perform real-time pose detection on the interactive function device controlled by the interactive object, to obtain prop pose information of the virtual prop corresponding to the interactive function device in the virtual scene;
the rendering subsystem is further configured to determine, from the virtual scene and based on the prop pose information, at least one virtual object that interacts with the virtual prop, and to perform video picture rendering on the at least one virtual object based on the corresponding interaction function effect, to obtain video picture data containing the interaction function effect.
3. The system of claim 1 or 2, wherein the detection subsystem comprises at least one motion capture camera positioned around the detectable area, a wearable tracking device worn on the interactive object, and a motion capture analysis apparatus;
the at least one motion capture camera is configured to capture motion capture images containing the wearable tracking device;
the motion capture analysis apparatus is configured to perform real-time pose detection on the wearable tracking device according to the at least one motion capture image, and to determine the object pose information based on the obtained tracking device pose information.
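As a hedged illustration of the analysis step in claim 3, the sketch below recovers a single tracker's 3D position from its 2D detections in two calibrated motion-capture cameras by linear (DLT) triangulation. The 3x4 camera matrices are assumed to come from prior calibration; this is not the patent's specific algorithm.

```python
import numpy as np

def triangulate_marker(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one tracker from two camera views.
    P1, P2: 3x4 projection matrices from calibration; uv1, uv2: pixel coordinates."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                      # homogeneous solution of A X = 0
    return X[:3] / X[3]             # Euclidean 3D position of the tracker
```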
4. The system of claim 3, wherein the wearable tracking device comprises a flexible wearing part, a rigid mounting part, and a plurality of trackers fixedly mounted on the rigid mounting part, and is worn on the interactive object through the flexible wearing part;
the plurality of trackers are arranged directionally, the arrangement direction pointing toward the facing direction of the interactive object;
the rendering subsystem is specifically configured to perform video picture rendering according to the position and the facing direction indicated by the object pose information, based on the scene data of the virtual scene, to obtain corresponding video picture data.
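One plausible reading of the directed arrangement in claim 4, sketched under an assumed marker ordering: the object position is taken as the centroid of the tracked markers, and the facing direction as the normalized vector along the arrangement, from the rearmost marker to the foremost one. Names and conventions are illustrative only.

```python
import numpy as np

def pose_from_directed_markers(markers_front_to_back):
    """markers_front_to_back: (N, 3) marker positions ordered from front to back
    along the rigid mounting part."""
    pts = np.asarray(markers_front_to_back, dtype=float)
    position = pts.mean(axis=0)                 # centroid of the tracker cluster
    facing = pts[0] - pts[-1]                   # front marker minus back marker
    facing /= np.linalg.norm(facing)
    return position, facing
```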
5. The system of claim 4, wherein the motion capture camera is an infrared camera and each tracker is a reflective marker point coated with a fluorescent material.
6. The system of claim 1 or 2, wherein the detection subsystem comprises a depth camera and a depth analysis device;
the depth camera is configured to capture a depth image of the interactive object, each pixel in the depth image representing the distance between the corresponding photographed object and the depth camera;
the depth analysis device is configured to extract skeleton information of the interactive object based on the depth image, and to determine the object pose information according to the obtained skeleton information.
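For the geometry underlying claim 6, the following sketch back-projects a depth pixel into a 3D point in the depth camera's coordinate frame using assumed pinhole intrinsics; the skeleton extraction itself would be performed by whatever body-tracking method the depth analysis device employs and is not shown here.

```python
import numpy as np

def backproject_depth_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Convert a depth-image pixel (u, v) with depth 'depth_m' (metres) into a 3D
    point in the depth camera frame, assuming a standard pinhole camera model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Example with assumed intrinsics: the pixel at (320, 110) lying 2.4 m away.
point_3d = backproject_depth_pixel(320, 110, 2.4, fx=580.0, fy=580.0, cx=320.0, cy=240.0)
```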
7. The system of claim 1 or 2, further comprising an operable terminal device configured to provide an operable page for inputting object parameters of the interactive object, the object parameters comprising a pupil distance parameter and a wearing offset parameter, the wearing offset parameter being an offset value between a wearable tracking device and the interactive object;
the rendering subsystem is specifically configured to perform compensation processing on the tracking device pose information based on the wearing offset parameter to obtain the object pose information, to obtain a binocular pose information set of the interactive object based on the pupil distance parameter and the object pose information, and to perform video picture rendering for the binocular pose information set based on the scene data of the virtual scene, to obtain video picture data containing a video picture for each of the two eyes.
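A minimal sketch of the compensation described in claim 7, under assumed axis conventions (a right-handed frame with +y up and the facing direction as forward): the tracked pose is shifted by the configured wearing offset, and the two eye positions are obtained by moving half the pupil distance to either side along the head's lateral axis. Parameter names and values are assumptions.

```python
import numpy as np

def binocular_poses(tracker_pos, facing, up, wearing_offset, pupil_distance_m):
    head_pos = tracker_pos + wearing_offset            # compensate tracker-to-head offset
    lateral = np.cross(up, facing)                     # head's lateral ("right") axis
    lateral /= np.linalg.norm(lateral)
    half_ipd = 0.5 * pupil_distance_m
    left_eye = head_pos - half_ipd * lateral
    right_eye = head_pos + half_ipd * lateral
    return {"left": (left_eye, facing), "right": (right_eye, facing)}

eyes = binocular_poses(tracker_pos=np.array([0.0, 1.78, 0.0]),
                       facing=np.array([0.0, 0.0, 1.0]),
                       up=np.array([0.0, 1.0, 0.0]),
                       wearing_offset=np.array([0.0, -0.08, 0.0]),
                       pupil_distance_m=0.064)
```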
8. The system of claim 7, wherein the virtual scene includes a virtual screen corresponding to the display screen;
the rendering subsystem is specifically configured to configure a corresponding virtual image acquisition device in the virtual scene based on the binocular pose information set, and to perform video picture rendering based on the picture data, acquired by the virtual image acquisition device, of the virtual scene projected onto the virtual screen, to obtain corresponding video picture data.
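Projection-screen setups of the kind claim 8 describes are commonly rendered with an off-axis (generalized) perspective frustum computed from the eye position and the screen's corner points; the sketch below computes such frustum extents. The corner naming, near-plane value, and the assumption that the virtual screen is planar are illustrative, not taken from the patent.

```python
import numpy as np

def off_axis_frustum(eye, pa, pb, pc, near):
    """Frustum extents for an eye looking at a planar virtual screen.
    pa, pb, pc: lower-left, lower-right, upper-left screen corners (world coords)."""
    vr = pb - pa; vr /= np.linalg.norm(vr)             # screen right axis
    vu = pc - pa; vu /= np.linalg.norm(vu)             # screen up axis
    vn = np.cross(vr, vu); vn /= np.linalg.norm(vn)    # screen normal, toward the eye
    va, vb, vc = pa - eye, pb - eye, pc - eye
    d = -np.dot(va, vn)                                # perpendicular eye-to-screen distance
    left   = np.dot(vr, va) * near / d
    right  = np.dot(vr, vb) * near / d
    bottom = np.dot(vu, va) * near / d
    top    = np.dot(vu, vc) * near / d
    return left, right, bottom, top                    # usable in a glFrustum-style projection

# Example: a 4 m x 2.5 m virtual screen with its lower-left corner at the origin.
l, r, b, t = off_axis_frustum(eye=np.array([2.0, 1.2, 3.0]),
                              pa=np.array([0.0, 0.0, 0.0]),
                              pb=np.array([4.0, 0.0, 0.0]),
                              pc=np.array([0.0, 2.5, 0.0]),
                              near=0.1)
```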
9. The system of claim 1 or 2, wherein the display screen is composed of a plurality of sub-screens, and the video data distributor comprises a plurality of sub-distributors, each sub-distributor corresponding to at least one sub-screen;
each sub-distributor is configured to perform picture processing based on the stereoscopic vision technology on the video picture data to be displayed on the corresponding sub-screen, and to distribute the processed video picture data to the corresponding sub-screen for display.
10. The system of claim 9, wherein the rendering subsystem comprises a plurality of graphics renderers, each graphics renderer corresponding one-to-one to a sub-screen;
each graphics renderer is configured to perform video picture rendering based on the relative positional relationship between the object pose information and the corresponding sub-screen, and on the scene data of the virtual scene, to obtain the video picture data to be displayed on the corresponding sub-screen.
11. The system of claim 10, wherein the rendering subsystem further comprises a synchronization device;
the synchronization device is configured to send synchronization signals to the plurality of graphics renderers respectively, so that the plurality of graphics renderers perform video picture rendering synchronously.
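As a purely illustrative stand-in for the synchronization device in claim 11, the sketch below uses a shared barrier so that every per-sub-screen renderer starts each frame together; a real multi-screen wall would more likely rely on hardware genlock or swap-group synchronization. All names are assumptions.

```python
import threading

NUM_RENDERERS = 4
frame_barrier = threading.Barrier(NUM_RENDERERS)        # released once all renderers arrive

def renderer_loop(renderer_id, num_frames=3):
    for frame in range(num_frames):
        frame_barrier.wait()                            # per-frame "synchronization signal"
        print(f"renderer {renderer_id}: rendering frame {frame}")

threads = [threading.Thread(target=renderer_loop, args=(i,)) for i in range(NUM_RENDERERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```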
12. The system of claim 1 or 2, wherein,
the video data distributor is specifically configured to perform cropping processing based on binocular viewing angles on the video picture data to obtain left-eye picture data and right-eye picture data respectively, and to distribute the left-eye picture data and the right-eye picture data to the display screen in sequence for display;
the stereoscopic vision receiver is specifically configured to block the right eye from receiving a picture when the display screen displays a left-eye picture, and to block the left eye from receiving a picture when the display screen displays a right-eye picture.
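The sketch below shows the frame-sequential presentation loop implied by claim 12 (with shutter glasses as in claim 13): left-eye and right-eye pictures alternate on the display while the glasses open only the matching eye. The `display` and `shutter` objects and their methods are hypothetical placeholders, not a real API.

```python
def present_stereo(frames, display, shutter):
    """frames: iterable of (left_image, right_image) pairs; 'display' and 'shutter'
    are hypothetical interfaces standing in for the distributor output and glasses."""
    for left_img, right_img in frames:
        display.show(left_img)
        shutter.open_left()
        shutter.close_right()       # right eye blocked while the left-eye picture is shown
        display.wait_vsync()

        display.show(right_img)
        shutter.open_right()
        shutter.close_left()        # left eye blocked while the right-eye picture is shown
        display.wait_vsync()
```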
13. The system of claim 1 or 2, wherein the stereoscopic vision receiver is a pair of shutter glasses.
14. A virtual-real combined real-time video interaction method, characterized by comprising:
performing real-time pose detection on an interactive object located in a detectable area of real space, to obtain object pose information of the interactive object mapped into a virtual scene, wherein a spatial mapping relationship exists between the detectable area and the virtual scene;
performing video picture rendering for the object pose information based on scene data of the virtual scene, to obtain corresponding video picture data;
performing picture processing based on a stereoscopic vision technology on the video picture data, and outputting the processed video picture data to a display screen for display;
and performing stereoscopic visual reception control on the two eyes of the interactive object in a visual reception control mode corresponding to the stereoscopic vision technology, so that the two eyes of the interactive object receive different video pictures displayed on the display screen to form stereoscopic visual perception.
15. A computer program product comprising a computer program, characterized in that
the computer program, when executed by a processor, implements the steps of the method according to claim 14.
CN202310670578.7A (priority date 2023-06-07, filing date 2023-06-07): Virtual-real combined real-time video interaction system and method. Status: pending. Publication: CN116866541A (en).

Priority Applications (1)

CN202310670578.7A (priority date 2023-06-07, filing date 2023-06-07): Virtual-real combined real-time video interaction system and method

Applications Claiming Priority (1)

CN202310670578.7A (priority date 2023-06-07, filing date 2023-06-07): Virtual-real combined real-time video interaction system and method

Publications (1)

CN116866541A, published 2023-10-10

Family

ID=88231171

Family Applications (1)

CN202310670578.7A (en), priority date 2023-06-07, filing date 2023-06-07: Virtual-real combined real-time video interaction system and method

Country Status (1)

CN: CN116866541A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination