CN116860112B - Combined scene experience generation method, system and medium based on XR technology - Google Patents


Info

Publication number
CN116860112B
Authority
CN
China
Prior art keywords
scene
combined
sub
experience
user
Prior art date
Legal status
Active
Application number
CN202311028622.0A
Other languages
Chinese (zh)
Other versions
CN116860112A (en)
Inventor
蔡铁峰
Current Assignee
Shenzhen Vocational And Technical University
Original Assignee
Shenzhen Vocational And Technical University
Priority date
Filing date
Publication date
Application filed by Shenzhen Vocational And Technical University filed Critical Shenzhen Vocational And Technical University
Priority to CN202311028622.0A priority Critical patent/CN116860112B/en
Publication of CN116860112A publication Critical patent/CN116860112A/en
Application granted granted Critical
Publication of CN116860112B publication Critical patent/CN116860112B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/006: Mixed reality

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a combined scene experience generation method, system and medium based on XR technology. The experience generation method comprises the following steps: step S10: generating binocular stereoscopic experience pictures of all sub-scenes of the combined scene; step S20: performing occlusion calculation on the binocular stereoscopic experience pictures of the sub-scenes to synthesize them into a combined scene experience picture, and displaying the combined scene experience picture to the user. With the invention, even when the combined scene does not satisfy the condition that separation surfaces exist between all sub-scenes, and in particular when the combined scene consists of virtual scenes and a real scene, as long as occlusion consistency holds, the virtual sub-scenes do not need to generate and transmit depth images corresponding to their experience pictures when the user experience pictures are generated, and the combined scene experience can still be generated correctly for the user, so that the bandwidth for transmitting depth images is saved significantly.

Description

Combined scene experience generation method, system and medium based on XR technology
Technical Field
The invention relates to the technical field of virtual reality, and in particular to a combined scene experience generation method, system and medium based on extended reality (XR) technology.
Background
Virtual reality (VR), augmented reality (AR) and mixed reality (MR) technologies are compatible with one another and are collectively referred to as extended reality (XR) technology. Based on XR technology, a combined scene can be constructed by combining 2 or more sub-scenes, where a sub-scene can be a virtual scene, a digital twin scene or a real scene. A combined scene can break down the barriers between scenes and enable rich types of experience activities. For example, in a classroom, several individual teaching activity scenes or several collaborative teaching activity scenes can be combined to generate a teaching combined scene containing multiple scenes, and many combined scenes can be constructed by changing the types and number of the scenes included, mainly including: "study-practice" scenes, "demonstration-practical training" scenes, "cognition-practical training" scenes, practical-training help scenes, assessment and competition scenes, multi-group practical operation demonstrations, and so on.
A sub-scene of a combined scene can be experienced by a single user or collaboratively by multiple users. In a combined scene formed by combining a real scene with virtual sub-scenes, the real-scene experience picture is generated in real time by a real camera, the virtual-scene experience pictures are rendered by a computing system, and the complete experience picture of the combined scene is obtained by synthesizing the real-scene experience picture and the virtual sub-scene experience pictures. When a combined scene contains many virtual sub-scenes, a single computing device is likely unable to carry the rendering load of the combined scene. Enabled by high-performance wireless network technologies such as 5G and Wi-Fi 6, the storage, computing and rendering services required by XR mobile terminals (5G mobile phones, head-mounted displays, etc.) can be placed in the cloud. Therefore, based on cloud services such as cloud storage, cloud computing and cloud rendering, the computing, storage and rendering capabilities available to a single XR terminal are effectively unlimited: the virtual sub-scenes of a combined scene can be deployed and run on multiple servers of a server cluster, or on multiple graphics cards of a large server, each virtual sub-scene is rendered independently on a different server or graphics card to generate its experience picture, all sub-scene experience pictures are transmitted and gathered together, and the experience picture of the combined scene is obtained by image synthesis. Thus, based on cloud services, an immersive XR combined scene experience containing a large number of sub-scenes can be generated for users.
When the sub-scenes of a combined scene are rendered independently, the user experience pictures of the sub-scenes must be synthesized by occlusion calculation into the user's experience picture of the XR combined scene. In the patent application "New dimension space construction method, system and platform based on XR technology" (application number CN202210022608.9), corresponding depth images are generated while the user experience pictures of the sub-scenes are rendered, and occlusion calculation is performed on the user experience pictures of the sub-scenes according to the depth images. To avoid generating and transmitting depth images, the patent application "Ubiquitous training campus construction method, system and storage medium based on XR technology" (application number CN202210908795.0) requires in advance that separation surfaces exist between all sub-scenes, so that the occlusion relations between the experience pictures of the virtual scenes can be calculated without depth images from the positions of the scenes relative to the separation surfaces. However, many XR combined scenes required in practical applications cannot satisfy the condition that a separation surface exists between every pair of sub-scenes. For example, consider a combined scene composed of a real campus scene and several virtual training scenes deployed in squares, building halls and other spaces of the real campus: the virtual training scenes are often surrounded by buildings of the real campus scene, and no separation surface exists that can separate a virtual training scene from the real campus scene, so the construction method of "Ubiquitous training campus construction method, system and storage medium based on XR technology" (application number CN202210908795.0) cannot be applied.
Disclosure of Invention
The main objective of the present invention is to provide a method, a system and a medium for generating a combined scene experience based on XR technology, so that even when the combined scene does not satisfy the condition that a separation surface exists between all sub-scenes, some of the virtual sub-scenes need not generate or transmit depth images corresponding to their experience pictures when the sub-scenes of the combined scene generate experience pictures, and the occlusion synthesis of the experience pictures can still be completed correctly; in particular, when the combined scene consists of a real scene and one or more virtual sub-scenes, even if no separation surface exists between the virtual sub-scenes and the real scene, none of the virtual sub-scenes need generate or transmit depth images corresponding to their experience pictures, and the occlusion synthesis of the experience pictures can still be completed correctly.
In order to achieve the above purpose, the present invention provides a combined scene experience generating method based on XR technology, the method comprising the following steps:
step S10: generating binocular stereoscopic experience pictures of all sub-scenes of the combined scene, wherein the instance of any virtual sub-scene s_i obtains the pose of user p in its coordinate system in real time and, according to the real-time pose, the imaging interval of s_i and the pupil distance of user p in the scene space of s_i, generates the binocular stereoscopic experience picture of user p for scene s_i;
step S20: performing occlusion calculation on the binocular stereoscopic experience pictures of the sub-scenes to synthesize them into a combined scene experience picture, and displaying the combined scene experience picture to the user;
wherein, for any sub-scene s_j in the combined scene whose scene instance does not generate depth images, in step S20, when pixels of the experience picture of s_j need to be compared by depth value with the corresponding pixels of other components of the combined scene, the depth values of those pixels are characterized by depth-information characterization values computed from the display interval of s_j in the combined scene; in addition, any sub-scene of the combined scene may be a real scene or a virtual sub-scene, and the display interval of each sub-scene in the combined scene may be a convex interval or a non-convex interval.
As a further development of the invention, step S10 is preceded by step S00: setting the experience generation parameters of the combined scene, including the imaging interval of each virtual sub-scene, the pupil distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the coordinate system of the combined scene and the coordinate system of each virtual sub-scene, the display interval of each sub-scene in the combined scene, and whether the scene instance of each sub-scene generates corresponding depth images when rendering its experience pictures.
As a further improvement of the present invention, step S00 sets whether a virtual sub-scene needs to generate depth images in the following manner: let s_i be any virtual sub-scene in the combined scene; when s_i has occlusion consistency with respect to the other sub-scenes in the combined scene, the scene instance of s_i is set not to generate depth images; otherwise it is set to generate them.
As a further improvement of the present invention, step S20 is followed by step S30: the combined scene receives the interaction input of user p and judges whether the input is directed at a sub-scene; if the input is judged to be an interaction of user p with some sub-scene s_i, it is converted into an interaction input in the coordinate system of s_i, and s_i responds to the converted input.
As a further development of the invention, for any virtual sub-scene s_i: if the display interval of s_i in the combined scene set in step S00 is a convex interval, and this display interval does not intersect the display intervals of the other components of the combined scene, then s_i has occlusion consistency with respect to the other components of the combined scene, and in step S00 the scene instance of s_i is set not to generate depth images.
As a further development of the invention, for any virtual sub-scene s_i whose scene instance does not generate depth images, the depth values of s_i required for comparison with the other components of the combined scene are characterized by depth-information characterization values computed from the display interval of s_i in the combined scene, as follows: according to the pose of the user, compute in real time the depth image of the back faces of the boundary of the display interval of s_i in the combined scene (the faces oriented away from user p); for any pixel of the user experience picture of s_i generated in step S10 that images an object point, assign to that pixel the depth value of the corresponding pixel of the depth image of the back faces of the display-interval boundary.
As a further improvement of the present invention, the combined scene consists of 1 real scene s_0 and more than 1 virtual sub-scene. In step S00, the display intervals of all virtual sub-scenes in the combined scene are set to convex intervals that do not overlap one another, and the virtual sub-scene instances are set not to generate corresponding depth images when generating the experience picture of user p in step S10. In step S10, the XR terminal of user p generates the binocular stereoscopic experience picture of the real scene s_0 together with its corresponding depth images. In step S20, when the sub-scene experience pictures are synthesized by occlusion calculation into the combined scene experience picture, for any virtual sub-scene s_i: when calculating the occlusion relation between a pixel of s_i and the corresponding pixel of s_0 on the same view line of user p, the depth value of the s_i pixel is the depth-information characterization value computed from the display interval of s_i in the combined scene; and for s_i and any other virtual sub-scene s_j, the occlusion relation between any pixel of s_i and the corresponding pixel of s_j on the same view line of user p is obtained from the one-way occlusion relation between the display intervals of s_i and s_j in the combined scene.
As a further improvement of the invention, when a person in the real scene enters the display interval of any virtual sub-scene s_i in the combined scene, if the scene instance of s_i does not generate depth images, then the scene instance of s_i is set to generate the corresponding depth images while generating the experience picture of user p; when the person in the real scene leaves the display interval of s_i in the combined scene, s_i stops generating depth images.
To achieve the above object, the present invention further proposes an XR-technology-based combined scene experience generation system, the system comprising a memory, a processor, and an XR-technology-based combined scene experience generation program stored in the memory; when the program is executed by the processor, it performs the steps of the method described above.
To achieve the above object, the present invention also proposes a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it performs the steps of the XR-technology-based combined scene experience generation method described above.
The combined scene experience generation method, system and storage medium based on the XR technology have the beneficial effects that:
the invention adopts the technical scheme that the method comprises the following steps: step S10: generating binocular stereoscopic experience pictures of all sub-scenes of the combined scene, wherein all arbitrary virtual sub-scenes s i Acquiring the pose of the user p in the coordinate system in real time, and according to the real-time pose and s i Is used for the combination imaging interval of the user p and s i Pupil distance in scene space, generating user p to scene s i Binocular stereoscopic experience picture; step S20: the binocular stereoscopic experience pictures of all the sub-scenes are subjected to shielding calculation to be synthesized into a combined scene experience picture, and the combined scene experience picture is displayed for a user to see; wherein for any sub-scene s in the combined scene j If the scene instance does not generate a depth image, in said step S20, when S j When the pixel needs of the experience picture and the corresponding pixels of other components of the combined scene are compared with each other through the depth values, and the depth values of the pixels can be s j And in addition, any sub-scene of the combined scene is a real scene or a virtual sub-scene, and each sub-scene is a convex section or a non-convex section in the display section of the combined scene. The invention ensures that when the combined scene does not meet the condition that separation surfaces exist among all sub-scenes, especially when the combined scene consists of the virtual scene and the real scene, and the virtual sub-scene has shielding consistency, the depth image of the corresponding experience picture does not need to be generated and transmitted when the user experience picture is respectively generatedThe combined scene experience can be correctly generated for the user, so that the bandwidth for transmitting the depth image is remarkably saved.
Drawings
Fig. 1 is a flow chart of a combined scene experience generating method based on an XR technology.
Fig. 2 is a diagram illustrating an example of a combined scene of the present invention.
FIG. 3 is a schematic diagram illustrating the calculation of occlusion consistency and display interval depth information according to the present invention.
Fig. 4 is a schematic diagram of a system configuration according to a first embodiment of the present invention.
Fig. 5 is a schematic diagram of a system configuration according to a second embodiment of the present invention.
Fig. 6 is a schematic diagram of a system configuration according to a fourth embodiment of the present invention.
Fig. 7 is a schematic diagram of a system configuration according to a fifth embodiment of the present invention.
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the invention provides a combined scene experience generating method based on XR technology, which comprises the following steps:
step S10: generating binocular stereoscopic experience pictures of all sub-scenes of the combined scene, wherein the instance of any virtual sub-scene s_i obtains the pose of user p in its coordinate system in real time and, according to the real-time pose, the imaging interval of s_i and the pupil distance of user p in the scene space of s_i, generates the binocular stereoscopic experience picture of user p for scene s_i;
step S20: performing occlusion calculation on the binocular stereoscopic experience pictures of the sub-scenes to synthesize them into a combined scene experience picture, and displaying the combined scene experience picture to the user;
wherein, for any sub-scene s_j in the combined scene whose scene instance does not generate depth images, in step S20, when pixels of the experience picture of s_j need to be compared by depth value with the corresponding pixels of other components of the combined scene, the depth values of those pixels are characterized by depth-information characterization values computed from the display interval of s_j in the combined scene; in addition, any sub-scene of the combined scene may be a real scene or a virtual sub-scene, and the display interval of each sub-scene in the combined scene may be a convex interval or a non-convex interval.
In addition, step S10 is preceded by step S00: setting the experience generation parameters of the combined scene, including the imaging interval of each virtual sub-scene, the pupil distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the coordinate system of the combined scene and the coordinate system of each virtual sub-scene, the display interval of each sub-scene in the combined scene, and whether the scene instance of each sub-scene generates corresponding depth images when rendering its experience pictures.
The method used in step S00 to set whether a virtual sub-scene needs to generate depth images is as follows: let s_i be any virtual sub-scene in the combined scene; when s_i has occlusion consistency with respect to the other sub-scenes in the combined scene, the scene instance of s_i is set not to generate depth images; otherwise it is set to generate them.
Step S20 is followed by step S30, in which the combined scene receives the interaction input of user p and judges whether the input is directed at a sub-scene; if the input is judged to be an interaction of user p with some sub-scene s_i, it is converted into an interaction input in the coordinate system of s_i, and s_i responds to the converted input.
In particular, for any virtual sub-scene s_i: if the display interval of s_i in the combined scene set in step S00 is a convex interval, and this display interval does not intersect the display intervals of the other components of the combined scene, then s_i has occlusion consistency with respect to the other components of the combined scene, and in step S00 the scene instance of s_i is set not to generate depth images.
Further, for any virtual sub-scene s_i whose scene instance does not generate depth images, the depth values of s_i required for comparison with the other components of the combined scene are characterized by depth-information characterization values computed from the display interval of s_i in the combined scene, as follows: according to the pose of the user, compute in real time the depth image of the back faces of the boundary of the display interval of s_i in the combined scene (the faces oriented away from user p); for any pixel of the user experience picture of s_i generated in step S10 that images an object point, assign to that pixel the depth value of the corresponding pixel of the depth image of the back faces of the display-interval boundary.
The combined scene consists of 1 real scene s_0 and more than 1 virtual sub-scene. In step S00, the display intervals of all virtual sub-scenes in the combined scene are set to convex intervals that do not overlap one another, and the virtual sub-scene instances are set not to generate corresponding depth images when generating the experience picture of user p in step S10. In step S10, the XR terminal of user p generates the binocular stereoscopic experience picture of the real scene s_0 together with its corresponding depth images. In step S20, when the sub-scene experience pictures are synthesized by occlusion calculation into the combined scene experience picture, for any virtual sub-scene s_i: when calculating the occlusion relation between a pixel of s_i and the corresponding pixel of s_0 on the same view line of user p, the depth value of the s_i pixel is the depth-information characterization value computed from the display interval of s_i in the combined scene; and for s_i and any other virtual sub-scene s_j, the occlusion relation between any pixel of s_i and the corresponding pixel of s_j on the same view line of user p is obtained from the one-way occlusion relation between the display intervals of s_i and s_j in the combined scene.
Also, when a person in the real scene enters the display interval of any virtual sub-scene s_i in the combined scene, if the scene instance of s_i does not generate depth images, then the scene instance of s_i is set to generate the corresponding depth images while generating the experience picture of user p; when the person in the real scene leaves the display interval of s_i in the combined scene, s_i stops generating depth images.
A combined scene experience generating method based on XR technology according to the present invention will be described in detail with reference to fig. 2 to 7, and the first to fifth embodiments.
The following describes important technical terms related to the present invention.
Convex interval
Let Ω be a three-dimensional convex interval. It has the following property: let a and b be two points with a ∈ Ω and b ∈ Ω, the coordinates of point a being (x_a, y_a, z_a) and the coordinates of point b being (x_b, y_b, z_b). Any point c on the line segment connecting a and b, lying between a and b, can be expressed as c = λ·a + (1-λ)·b with 1 > λ > 0; then point c necessarily also belongs to Ω.
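For illustration, the following minimal Python sketch checks this property numerically for an axis-aligned box, one common concrete choice of convex interval; the box bounds, sample points and helper names are assumptions of the sketch, not part of the invention.

import numpy as np

def in_box(p, lo, hi):
    # Membership test for an axis-aligned box [lo, hi] used as a convex interval.
    return bool(np.all(p >= lo) and np.all(p <= hi))

def check_convexity_on_segment(lo, hi, a, b, samples=100):
    # Verify that every point c = lam*a + (1-lam)*b, 0 < lam < 1, stays inside the box
    # whenever its endpoints a and b are inside (the defining property of a convex interval).
    if not (in_box(a, lo, hi) and in_box(b, lo, hi)):
        return True  # the property only constrains segments whose endpoints lie in the interval
    for lam in np.linspace(0.0, 1.0, samples)[1:-1]:
        c = lam * a + (1.0 - lam) * b
        if not in_box(c, lo, hi):
            return False
    return True

lo, hi = np.array([0.0, 0.0, 0.0]), np.array([2.0, 1.0, 3.0])
a, b = np.array([0.5, 0.2, 2.9]), np.array([1.8, 0.9, 0.1])
print(check_convexity_on_segment(lo, hi, a, b))  # True: an axis-aligned box is convex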
Scene and scene instance
A scene defines the objects contained in a three-dimensional space, the object states, the running logic of the objects themselves, and the logic of interactions between objects. A scene instance is a program process or thread that runs in real time on computing resources such as processors, memory and graphics cards according to the scene definition; it computes the states of all objects in the scene in real time, renders pictures, and responds to user interactions. When a single scene is experienced by multiple users at the same time, if the computing resources available to a single scene instance cannot generate experience pictures for all users in real time, multiple scene instances of the scene must be created and assigned to the users; the object states in the scene are synchronized by establishing communication connections between the scene instances, and each scene instance generates experience pictures for its own users in real time, so that all users share the experienced scene.
XR scene imaging interval
A user p performs an immersive interactive experience in a combined scene, and s_i is one of the sub-scenes that make up the combined scene. When generating the combined scene experience, possibly only part of the scene content of s_i needs to be presented to user p inside the combined scene. In order to define the scene content of s_i that needs to be presented to user p in the combined scene, a three-dimensional interval is set, and when the experience picture of user p is generated only the content of s_i inside this three-dimensional interval is imaged; the three-dimensional interval set in this way is the XR scene imaging interval. The imaging interval can be infinite, or an interval of any three-dimensional shape such as a cube or a sphere.
Display interval of an XR scene in the combined scene
A combined scene contains multiple sub-scenes. So that the sub-scenes do not overlap (or overlap less) inside the combined scene, for any sub-scene s_i a three-dimensional interval Ω_i is set inside the combined scene for presenting the scene content of s_i, thereby defining where s_i appears in the combined scene. Once the rotation-translation-scaling relation between the coordinate system of the combined scene and the coordinate system of sub-scene s_i has been set, mapping the three-dimensional interval Ω_i into the coordinate system of s_i according to that relation yields a three-dimensional interval that can serve as the imaging interval of s_i; conversely, mapping the imaging interval of s_i into the combined scene according to the same rotation-translation-scaling relation yields the three-dimensional interval Ω_i, which serves as the display interval of s_i in the combined scene.
One-way occlusion and two-way occlusion
Let scenes s_i and s_j be any two sub-scenes of a combined scene. At any time t_k, if on every view line of user p on which occlusion occurs between s_i and s_j it is the scene content of s_i that occludes the scene content of s_j, then at time t_k, for user p, scenes s_i and s_j are one-way occluding, and the occlusion relation is that s_i one-way occludes s_j; conversely, if on every such view line it is the scene content of s_j that occludes the scene content of s_i, then at time t_k, for user p, scenes s_i and s_j are one-way occluding, and the occlusion relation is that s_j one-way occludes s_i. In contrast, if on the view lines of user p on which occlusion occurs between s_i and s_j, the scene content of s_i may occlude that of s_j and the scene content of s_j may also occlude that of s_i, then at time t_k, for user p, scenes s_i and s_j are two-way occluding. If, for every pose of user p in the combined scene at all times, scenes s_i and s_j are never two-way occluding, then we say that in the combined scene, scenes s_i and s_j are constantly one-way occluding. Two disjoint convex intervals are necessarily constantly one-way occluding; for details see paragraphs 115-116 of the patent application "Ubiquitous training campus construction method, system and storage medium based on XR technology" (application number CN202210908795.0).
The principles of the present invention, particularly occlusion consistency, are described below.
Fig. 2 gives an example of a combined scene consisting of 4 sub-scenes, on the user view line shown in fig. 2, the scenes a, d, c may be occluded, wherein if occlusion occurs between scene c and scene d, it is certain that scene d occludes scene c, but no separation plane exists between scene a and both scenes c and d, which is a bi-directional occlusion. Although scene c and scene d are bi-directional occlusion with scene a, even if the corresponding depth image is not generated when the experience picture is generated, there are other ways to obtain effective depth information of the experience picture that can accurately perform occlusion calculation with scene a, specifically as follows.
In fig. 3, scene A and scene B form a combined scene, and a view line of the user passes through the display interval of scene A and the display interval of scene B in the combined scene. a, b, c, d, e, f are 6 points on this view line: a and f lie in the display interval of scene A, and b, c, d, e lie in the display interval of scene B. It can be seen that the occlusion relations between point a and points b, c, d, e are completely consistent: the points in the display interval of scene B are all occluded by a. The occlusion relations between point f and b, c, d, e are likewise completely consistent: point f is occluded by the points in the display interval of scene B. Therefore, if the occlusion relation between point a and any one of b, c, d, e can be calculated, the occlusion relations between point a and all of b, c, d, e follow directly; similarly, if the occlusion relation between point f and any one of b, c, d, e can be calculated, the occlusion relations between point f and all of b, c, d, e follow directly. Taking diagram b of fig. 3 as an example, once the depth value of point b, which lies on the part of the boundary of the display interval of scene B facing the user's view line (the front faces, whose outward normal makes an angle greater than 90 degrees with the view line), has been computed, the occlusion relation between a (or f) and any point c of the display interval of scene B on the same view line can be determined simply by comparing depth values with point b. Similarly, as shown in diagram c of fig. 3, once the depth value of point e on the back faces of the boundary of the display interval of scene B has been computed, the occlusion relation between a (or f) and any point c of the display interval of scene B on the same view line can be determined by comparing depth values with point e. Let scene A be a sub-scene of a combined scene, let the display interval of scene A in the combined scene be Ω, and let point b be a point outside Ω. When user p is at pose W_k in the combined scene, if all points of Ω that occlude or are occluded by point b have the same occlusion relation with point b (all are occluded by b, or all occlude b), then we say: the display interval Ω of the combined scene has occlusion consistency with respect to point b at user pose W_k. If the display interval Ω has occlusion consistency with respect to point b for all poses of user p in the combined scene, then we say: the display interval Ω of the combined scene has occlusion consistency with respect to point b. Let scene B be a sub-scene of the combined scene whose display interval in the combined scene is Ω′; if Ω has occlusion consistency with respect to every point of Ω′, then Ω has occlusion consistency with respect to Ω′. If Ω is a convex interval, Ω′ is a three-dimensional interval of arbitrary shape, and Ω and Ω′ do not overlap, then Ω has occlusion consistency with respect to Ω′.
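The reasoning above can be made concrete with a small sketch. Assuming, purely for illustration, that the display interval is an axis-aligned box (any convex interval behaves the same way), the occlusion relation between a point outside the interval and every interval point on the same view line is decided by one comparison against the boundary hit depths:

import numpy as np

def ray_box_depths(origin, direction, lo, hi):
    # Near/far hit distances of a view ray with an axis-aligned box used as a
    # convex display interval; returns None if the ray misses the box.
    direction = direction / np.linalg.norm(direction)
    with np.errstate(divide="ignore"):
        inv = 1.0 / direction
    t1, t2 = (lo - origin) * inv, (hi - origin) * inv
    t_near = np.max(np.minimum(t1, t2))
    t_far = np.min(np.maximum(t1, t2))
    if t_near > t_far or t_far < 0:
        return None
    return max(t_near, 0.0), t_far

def occlusion_vs_interval(depth_a, origin, direction, lo, hi):
    # Occlusion relation between an external point a (at distance depth_a along the ray)
    # and all interval points on the same view line, decided by one boundary comparison.
    hit = ray_box_depths(origin, direction, lo, hi)
    if hit is None:
        return "no occlusion on this view line"
    t_near, t_far = hit
    if depth_a <= t_near:
        return "a occludes every interval point on the view line"
    if depth_a >= t_far:
        return "every interval point on the view line occludes a"
    return "a lies inside the interval: the consistency assumption is violated"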
First embodiment:
The combined scene of this embodiment is a virtual-real mixed combined scene consisting of 1 real scene s_0 and N virtual sub-scenes s_1, s_2, ..., s_N, where N is a natural number greater than 1. As shown in fig. 4, in this embodiment the combined scene experience generation system comprises: a real scene imaging module, a virtual scene depth information calculation module, an image synthesis module, a combined scene parameter setting module, an instance of the 1st virtual sub-scene, an instance of the 2nd virtual sub-scene, ..., and an instance of the N-th virtual sub-scene, where the instance of the 1st virtual sub-scene is an instance of virtual sub-scene s_1, the instance of the 2nd virtual sub-scene is an instance of virtual sub-scene s_2, and the instance of the N-th virtual sub-scene is an instance of virtual sub-scene s_N. For a video see-through MR headset, the real scene imaging module generates the user's binocular stereoscopic picture of the real scene and the corresponding depth images through optoelectronic devices such as cameras on the user's MR headset terminal; for an optical see-through MR headset, the real scene imaging module does not need to generate the binocular stereoscopic picture but does need to generate the binocular stereoscopic depth images through the optoelectronic devices such as cameras on the user's MR headset terminal (it can be considered that a binocular stereoscopic picture is still generated in which the color value of every pixel is 0, i.e. an all-black image). An instance of a virtual sub-scene receives the real-time pose and interaction information of the user sent by the user interface, responds to the received interaction information, renders in real time the user's binocular stereoscopic experience picture of the scene imaging interval under the real-time pose of the user, and sends it to the image synthesis module. The image synthesis module performs occlusion calculation between the pictures, according to the depth images corresponding to the stereoscopic picture of the real scene and either the depth images generated by the virtual sub-scene instances or the depth information of each virtual sub-scene in the combined scene computed by the virtual scene depth information calculation module, synthesizes the user's combined scene experience picture, and displays it to the user through the user interface. The user or the system can set parameters such as the imaging interval of each virtual sub-scene, the display interval of each virtual sub-scene in the combined scene, and the rotation, translation and scaling relation between the virtual sub-scene coordinate systems and the combined scene coordinate system through the combined scene parameter setting module. The real scene imaging module, the virtual scene depth information calculation module, the image synthesis module and the combined scene parameter setting module are all deployed on the user terminal, while the instances of the 1st, 2nd, ..., N-th virtual sub-scenes are deployed and run on cloud rendering servers.
The combined scene consists of 1 real scene s_0 and N virtual sub-scenes s_1, s_2, ..., s_N, where N is a natural number greater than 1, and a combined scene experience is generated for user p in real time; for simplicity, the real scene s_0 can share a coordinate system with the combined scene. As shown in fig. 1, the experience generation method of this embodiment comprises the following steps:
step S00: setting the experience generation parameters of the combined scene, including the imaging interval of each virtual sub-scene, the pupil distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the coordinate system of the combined scene and the coordinate system of each virtual sub-scene, the display interval of each sub-scene in the combined scene, and whether the scene instance of each sub-scene generates corresponding depth images when rendering its experience pictures;
Step S10: generating binocular stereoscopic experience pictures of all sub-scenes of the combined scene, wherein all arbitrary virtual sub-scenes s i The example obtains the pose of the user p in the coordinate system in real time, and according to the real-time pose and s i Is used for the combination imaging interval of the user p and s i Pupil distance in scene space, generating user p to scene s i Binocular stereoscopic vision body of (2)Checking pictures;
step S20: the binocular stereoscopic experience pictures of all the sub-scenes are subjected to shielding calculation to be synthesized into a combined scene experience picture, and the combined scene experience picture is displayed for a user to see;
step S30, the combined scene receives the interactive input of the user p, judges whether the interactive input is the interaction of the sub-scene, and judges that the interactive input is the user p to any sub-scene S if the interactive input is judged i Is then converted into a scene s i Interactive input in coordinate system s i Responding to the converted interactive input.
The specific implementation of each step is described in detail below:
the step S00 is specifically implemented as follows:
Set the combined scene parameters in the combined scene parameter setting module: set the imaging intervals of the virtual sub-scenes s_1, s_2, ..., s_N to Ω_1, Ω_2, ..., Ω_N respectively; set a three-dimensional interval Ω′_0 of the combined scene in which no virtual scene may be deployed, for example the interval occupied by walls, equipment and other objects of the real scene; set the pupil distances of user p in the virtual sub-scenes s_1, s_2, ..., s_N to d_1, d_2, ..., d_N respectively, together with the pupil distance of the user in the combined scene; set the display intervals of the virtual sub-scenes s_1, s_2, ..., s_N in the combined scene to Ω′_1, Ω′_2, ..., Ω′_N respectively; set the rotation-translation-scaling relation from each virtual sub-scene coordinate system to the combined scene coordinate system, where for any virtual sub-scene s_i (1 ≤ i ≤ N) the relation between the coordinate system of s_i and the combined scene coordinate system consists of rotation angles θ_i, β_i, α_i about the Z, X, Y axes applied in the order Z, X, Y, a translation, and a common scaling factor λ_i for the Z, X, Y axes; and set whether each virtual sub-scene instance needs to generate depth images, that is, whether corresponding depth images are generated at the same time as the binocular stereoscopic experience pictures are rendered.
These parameters are not completely independent of each other and the following constraints need to be satisfied: enabling the pupil distance used by the terminal of the user p to generate a real scene stereoscopic experience picture or a depth image to be d 0 (d 0 As much as possible the true interpupillary distance of user p, the user is in the combined scenePupil distance- >Pupil distance for arbitrary virtual sub-scene s i There is->Arbitrary virtual sub-scene s i Its imaging interval omega i Any point b of (2) 0 According to s i Coordinate system to combined scene->After coordinate transformation is carried out on the rotation translation scaling relation of the coordinate system, the obtained point is +.>Belonging to omega i 'A'; display interval omega 1 ′、Ω′ 2 、...、Ω′ N Are not overlapped with each other, omega 1 ′、Ω′ 2 、...、Ω′ N Are all equal to omega' 0 Non-overlapping; for arbitrary sub-virtual scene s i If it displays the interval omega i 'p.OMEGA' 0 、Ω 1 ′、Ω′ 2 、...、Ω′ N Middle-removing omega i If all other sections except' have shielding consistency, setting s i Does not need to generate corresponding depth images while rendering and generating binocular stereoscopic experience pictures, otherwise s is set i The scene instance of (2) requires that a corresponding depth image be generated while rendering to generate a binocular stereoscopic experience picture. The virtual sub-scene set of which the scene instance does not need to generate the depth image is Λ, and the virtual sub-scene set which needs to generate the depth image is +.>
Wherein, any display interval omega is judged i 'p.OMEGA' j Whether the method has shielding consistency is as follows: at Ω i 'and Ω' j If omega, without overlapping i ' convex interval is Ω i 'p.OMEGA' j Possessing shielding consistency, if omega i ' being a non-convex region, first obtaining a region Ω surrounding the non-convex region i ' minimum convex sectionIf->With omega' j Non-overlapping, then->For omega' j Has shielding consistency, thereby having omega i 'p.OMEGA' j And the occlusion consistency exists, otherwise, the occlusion consistency does not exist. Wherein a non-convex region Ω surrounding the non-convex region is obtained i ' minimum convex zone->The method of (2) can be manually determined or automatically determined by a system, and the automatic determination method of the system comprises the following steps: traversing omega i ' any pair of boundary points on the boundary, any point in all points between boundary points if not belonging to omega i ' all incorporate Ω i ' get->
The step S10 is specifically implemented as follows:
At any time t_j, let the pose of user p in the combined scene coordinate system be [W_j Q_j]. Each virtual sub-scene s_1, s_2, ..., s_N receives the real-time pose of user p from the user interface and computes the pose of the user in its own coordinate system according to the rotation-translation-scaling relation between its coordinate system and the combined scene coordinate system; for any virtual sub-scene s_i, the pose of user p in the coordinate system of s_i is obtained from the rotation-translation-scaling relation between the coordinate system of s_i and the combined scene coordinate system. Computing the pose of user p in a virtual sub-scene coordinate system from the rotation-translation-scaling relation between coordinate systems is well known to those skilled in the art and is not described in detail here. For any virtual sub-scene s_i, according to the pupil distance d_i of the user in s_i, the pose of the user in s_i and the imaging interval Ω_i, the scene content inside the imaging interval is imaged to generate an immersive binocular stereoscopic experience picture consisting of a left-eye image and a right-eye image. To identify which pixels image empty scene content, the value of pixels whose scene content is empty in the experience picture image can be set to a specific value τ_null (for example, blue), or a dedicated array can be used in which each element corresponds to a pixel of the image and takes only the value 0 or 1 to indicate whether the corresponding pixel images empty scene content; in this embodiment, the value of pixels whose scene content is empty in the experience picture images is set to the specific value τ_null.
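The pose mapping used here can be sketched as follows, assuming the Z-X-Y rotation order given above is intrinsic, the orientation is stored as a quaternion, and SciPy is available; the parameter values are hypothetical.

import numpy as np
from scipy.spatial.transform import Rotation as R

def sub_to_combined_rotation(theta, beta, alpha):
    # Rotation from a sub-scene coordinate system to the combined-scene coordinate system:
    # angles about Z, X, Y applied in the order Z, X, Y (intrinsic order assumed).
    return R.from_euler("ZXY", [theta, beta, alpha], degrees=True)

def user_pose_in_subscene(W_j, Q_j, theta, beta, alpha, T, lam):
    # Map the user pose [W_j, Q_j] (position, orientation quaternion) from the combined-scene
    # coordinate system into the sub-scene coordinate system, given the sub-scene -> combined-scene
    # rotation-translation-scaling parameters (theta, beta, alpha, T, lam).
    R_sc = sub_to_combined_rotation(theta, beta, alpha)
    W_sub = R_sc.inv().apply(np.asarray(W_j, dtype=float) - np.asarray(T, dtype=float)) / lam
    Q_sub = (R_sc.inv() * R.from_quat(Q_j)).as_quat()   # orientation expressed in sub-scene axes
    return W_sub, Q_sub

# Example: the pose of user p in virtual sub-scene s_i at time t_j (all values hypothetical)
W_sub, Q_sub = user_pose_in_subscene(
    W_j=[2.0, 1.6, -3.0], Q_j=[0, 0, 0, 1],
    theta=30.0, beta=0.0, alpha=10.0, T=[5.0, 0.0, 2.0], lam=0.25)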
For any scene s_k in the set of virtual sub-scenes that must generate corresponding depth images while rendering the binocular stereoscopic experience picture, in addition to the left-eye and right-eye images of the stereoscopic experience picture, the corresponding left-eye and right-eye depth images are also generated; the depth value of pixels whose imaged scene content is empty is set to a value δ, which may be infinity. Note in particular that the depth images generated by s_k, multiplied by λ_k, are the depth images of s_k in the combined scene coordinate system, where λ_k is the scaling factor from the coordinate system of s_k to the coordinate system of the combined scene. In this embodiment, the imaging module of the XR terminal acquires the real scene s_0 in real time and generates its left-eye and right-eye stereoscopic images and the corresponding depth images (for an optical see-through MR headset, it can be considered that left-eye and right-eye stereoscopic images whose pixels all have RGB color value [0 0 0], i.e. all-black images, are generated together with the corresponding depth images).
In the embodiment of the invention, the imaging field angle and resolution of all images are by default the same, so that pixels with the same image coordinates in different images lie on the same view line and can be compared for occlusion; if the imaging field angle and resolution differ between images, they can be made the same by interpolation or resampling, which is a common technique for those skilled in the art.
Each virtual sub-scene transmits the generated stereoscopic experience picture image to the image synthesis module in real time, and if depth images are synchronously generated, the depth images also need to be transmitted to the image synthesis module in real time.
The step S20 is specifically implemented as follows:
The image synthesis module receives the binocular stereoscopic images of the stereoscopic experience pictures generated by the real scene and by the virtual sub-scene instances contained in the combined scene; it also receives the depth images corresponding to the experience pictures of the real scene and of the virtual sub-scenes in the depth-generating set, while the virtual sub-scenes in the set Λ do not transmit depth images. For any scene in Λ whose display interval in the combined scene is Ω′_k, the depth information of Ω′_k is computed in real time and can be characterized by the surface depth values of the part of the boundary of Ω′_k facing user p (the front part), by the surface depth values of the part facing away from user p (the back part), or by any value between the front-part surface depth value and the back-part surface depth value. The depth information of Ω′_k can likewise be characterized by depth images: at any time t_j there is a left-eye depth image and a right-eye depth image for user p. When the front-part (facing user p) surface depth values are used, the computation is as follows: the virtual scene depth information calculation module reads the display interval Ω′_k of the scene in the combined scene and the pupil distance of the user in the combined scene; at any time t_j it reads from the user interface the pose [W_j Q_j] of user p in the combined scene coordinate system, empties the combined scene, keeps only the boundary surface of Ω′_k with its normals directed outward, and generates the left-eye and right-eye images of user p, which image only the front part of the surface of Ω′_k; the corresponding depth images are the characterization, and the depth value of pixels that do not image the surface is represented by a value δ. When the back-part (facing away from user p) surface depth values are used to characterize Ω′_k, the computation is as follows: the virtual scene depth information calculation module reads the display interval Ω′_k and the pupil distance of the user in the combined scene; at any time t_j it reads from the user interface the pose [W_j Q_j] of user p, empties the combined scene, keeps only the boundary surface of Ω′_k with its normals directed inward, and generates the left-eye and right-eye images of user p, which image only the back part of the surface of Ω′_k; the corresponding depth images are the characterization, and the depth value of pixels that do not image the surface is represented by a value δ. The value of each pixel of the characterization depth image may also be set arbitrarily between the corresponding front-part surface depth value and the corresponding back-part surface depth value.
For any sub-scene of the combined scene belonging to the set Λ, any pixel τ of its binocular stereoscopic experience picture image that images an object point takes its depth from the depth information of the display interval of that sub-scene in the combined scene; by comparing this depth value with the depth values of the pixels of the other components on the same user view line, the occlusion relation between the pixels is obtained. The depth information of the display interval, computed in real time according to the pose of the user, can be represented by the depth image obtained by imaging the front part or the back part of the display-interval boundary, or by any depth image whose values lie between the two. If any pixel τ of the experience picture images an object point, the depth value of the pixel of the display-interval depth image lying on the same user view line as τ is taken as the depth value of τ.
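A minimal sketch of the back-face variant of this characterization for one eye, assuming the display interval is an axis-aligned box and a simple pinhole camera with intrinsic matrix K; an actual implementation would instead render the display-interval boundary with inward-facing normals on the GPU, as described above.

import numpy as np

def backface_depth_image(eye_pos, cam_rot, K, width, height, lo, hi, delta=np.inf):
    # Per-pixel distance (along each view ray) to the far, back-face intersection with the
    # axis-aligned box [lo, hi] used as the display interval; delta marks rays that miss.
    # Whatever depth convention is used here must match the experience-picture depth images.
    us, vs = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    dirs = np.stack([(us - cx) / fx, (vs - cy) / fy, np.ones_like(us)], axis=-1)
    dirs = dirs @ cam_rot.T                      # rotate camera-frame rays into combined-scene axes
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = (lo - eye_pos) / dirs
        t2 = (hi - eye_pos) / dirs
    t_near = np.nanmax(np.minimum(t1, t2), axis=-1)
    t_far = np.nanmin(np.maximum(t1, t2), axis=-1)
    return np.where((t_near <= t_far) & (t_far > 0), t_far, delta)   # one image per eye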
The image synthesis module receives binocular stereoscopic vision experience picture images generated by real scenes and virtual sub-scenes contained in the combined scene in real time, anddepth images corresponding to virtual sub-scenes and real scene experience pictures in the set are read out in real time from a virtual scene depth information calculation module to the fact that all scenes in the lambda are combined in the combined scene ∈ ->After the depth information representing value of the display section, image synthesis can be performed. SynthesisThe specific method is as follows: at any time t j Received real scene s 0 The stereoscopic experience picture left and right eye depth image is +.>And->Corresponding depth image +.>And->For collectionsAny scene s k The left eye and the right eye of the received stereoscopic vision picture are +.>And->The corresponding depth image is +.>And->(s k In a combined scene->Depth image in coordinate system is +.>And->λ k For scene s k Coordinate system to combined scene->Scaling factor of the coordinate system), for arbitrary scene +.>The left eye and the right eye of the received stereoscopic vision picture are +.>And (3) withThe corresponding display interval depth information characterization image read from the virtual scene depth information calculation module is +.>And->Make the combined scene after synthesis ∈ ->The user p stereoscopic experience picture left eye image is +.>The right eye image is +.>The corresponding depth images are +.>Initial setting-> Traversing set +.>For any of the sub-scenes s k Traversing->All pixels, for any of themIf->Then:walk->All pixels, for any of themIf->Then:traversing all sub-scenes in the set Λ, for any sub-scene therein +.>Walk->All pixels, for any of which +.>If the pixel images the scene content, from +.>Find out the corresponding depth value/>When the depth value is less than + >When (I)>Shielding->The original pixel is specifically: when->And->ThenAnd->Walk->All pixels, for any of themIf->And->Then:note that: />The value of the Chinese is tau null The corresponding depth value cannot be taken +.>And->The depth value of which should be considered infinite. After the traversal is completed, the resulting imageAnd->Namely the user p is about the combined scene>Is a binocular stereoscopic experience picture. The binocular stereoscopic experience picture is displayed to the user p through the user interface.
The step S30 is specifically implemented as follows:
User p performs an interactive operation in the combined scene through the user interface, generating an interaction command a whose position parameter and attitude-angle parameter are expressed in the combined scene coordinate system. The user interface traverses the display intervals Ω′_1, Ω′_2, ..., Ω′_N of all virtual sub-scenes of the combined scene; if it judges that the position of the interaction lies in a display interval Ω′_k, where Ω′_k is the display interval of virtual sub-scene s_k, then, according to the rotation-translation-scaling relation between the coordinate system of s_k and the coordinate system of the combined scene, the position and attitude parameters of the interaction are transformed into pose parameters in the coordinate system of s_k, the transformed pose parameters are assigned to the pose-parameter components of the interaction command a, and the command a with transformed pose parameters is sent to virtual sub-scene s_k, whose instance responds to the interaction command.
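A sketch of the routing logic of step S30, with the display intervals approximated by axis-aligned boxes and the per-sub-scene pose transforms passed in as callables; both structures are assumptions of the sketch.

import numpy as np

def route_interaction(cmd_pos, cmd_att, display_intervals, subscene_transforms):
    # Find the virtual sub-scene whose display interval contains the interaction position,
    # transform the pose parameters into that sub-scene's coordinate system, and return the
    # target sub-scene index with the converted pose; None means no sub-scene is targeted.
    # display_intervals:   {k: (lo, hi)} axis-aligned boxes for Omega'_1..Omega'_N (assumption);
    # subscene_transforms: {k: callable mapping combined-scene (pos, att) -> sub-scene (pos, att)}.
    p = np.asarray(cmd_pos, dtype=float)
    for k, (lo, hi) in display_intervals.items():
        if np.all(p >= lo) and np.all(p <= hi):
            pos_k, att_k = subscene_transforms[k](cmd_pos, cmd_att)
            return k, pos_k, att_k        # send command a with converted pose to s_k
    return None                           # position not inside any Omega'_k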
In the embodiment of the invention, all the virtual sub-scene imaging intervals can be set to be convex intervals, so that all the virtual sub-scenes do not need to transmit depth images to the image synthesis module.
The beneficial effects of the embodiment are as follows:
This embodiment is suitable for virtual-real fused combined scenes and is particularly suitable for MR headsets. When a virtual sub-scene has occlusion consistency with respect to the display intervals of the other sub-scenes of the combined scene and with respect to the three-dimensional interval in which no virtual scene may be deployed, it does not need to transmit depth images; it only needs to transmit its stereoscopic experience picture to the terminal, and occlusion calculation and image synthesis with the other contents of the combined scene can still be performed correctly. In particular, when all virtual sub-scene imaging intervals are set to convex intervals, none of the virtual sub-scenes needs to transmit depth images to the user terminal, and an immersive experience of the virtual-real fused combined scene can still be generated for the user, which saves network bandwidth significantly.
Second embodiment:
On the basis of the first embodiment, the system is adjusted to obtain the second embodiment. As shown in fig. 5, the virtual scene depth information calculation module is moved to the cloud server, and a virtual scene image synthesis module is added, likewise deployed on the cloud server. The stereoscopic experience pictures generated by the virtual sub-scenes first undergo occlusion calculation in the virtual scene image synthesis module to obtain a composite image containing only the contents of all virtual sub-scenes; this composite image is transmitted to the virtual-real image synthesis module on the user terminal, and the virtual-real image synthesis module performs occlusion calculation between the composite image containing all virtual sub-scene contents and the stereoscopic experience picture of the real scene to obtain the final experience picture of the virtual-real fused combined scene, which is displayed to the user through the user interface. With reference to the image synthesis method of the first embodiment, the specific method for generating the composite image containing only all virtual sub-scene contents is as follows. At any time t_j, let the left- and right-eye composite images of user p's stereoscopic experience picture containing only all virtual sub-scene contents be I^L_v(t_j) and I^R_v(t_j), with corresponding depth images D^L_v(t_j) and D^R_v(t_j). They are initialized from one virtual sub-scene s_i: I^L_v(t_j) and I^R_v(t_j) are initially set to the left- and right-eye images of the user p stereoscopic experience picture generated by s_i; if the scene instance of s_i synchronously generates the corresponding depth images, D^L_v(t_j) and D^R_v(t_j) are also initially set from those depth images; if no corresponding depth images are generated synchronously, the display-interval depth information characterization values of s_i in the combined scene, computed by the virtual scene depth information calculation module, are used instead: traverse all pixels of the left-eye characterization image, and for any pixel whose value is τ_null set the corresponding pixel of D^L_v(t_j) to δ (δ can be arbitrarily large, tending to infinity), otherwise set it to the characterization value; D^R_v(t_j) is initialized from the right-eye characterization image in the same way. Occlusion-calculation image synthesis is then performed over all remaining virtual sub-scenes, both those in the set Λ and those outside it; the occlusion-calculation image synthesis process is the same as in the first embodiment and is not repeated here. After synthesis, the user p stereoscopic experience picture images containing only all virtual sub-scene contents, I^L_v(t_j) and I^R_v(t_j), together with the corresponding depth images D^L_v(t_j) and D^R_v(t_j), are transmitted to the virtual-real image synthesis module, where they are synthesized by occlusion calculation with the stereoscopic experience picture images of the real scene generated in real time and their corresponding depth images, yielding the user experience picture of the virtual-real fused combined scene.
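The δ / τ_null initialization of the cloud-side depth buffer can be sketched as below; the sentinel encoding, the NumPy representation and the names are assumptions introduced for illustration.

import numpy as np

TAU_NULL = -1.0   # assumed sentinel for "no scene content on this view line"
DELTA = 1.0e9     # delta: an arbitrarily large depth standing in for infinity

def init_depth_from_characterization(char_img):
    """Build the initial depth buffer of the virtual-only composite from a display-interval
    depth characterization image: tau_null pixels receive the very large depth DELTA so
    they never occlude anything, every other pixel keeps its characterization value."""
    char_img = np.asarray(char_img, dtype=np.float64)
    return np.where(char_img == TAU_NULL, DELTA, char_img)

The remaining virtual sub-scenes would then be folded in with the same depth test as in the first embodiment, and only the finished virtual-only composite and its depth buffer travel to the terminal.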
The beneficial effects are that:
Compared with the first embodiment, the occlusion-calculation image synthesis between the virtual sub-scenes is moved to the cloud, which reduces the amount of image-synthesis computation performed by the terminal; the cost is an additional image synthesis pass at the cloud, which may increase latency.
Third embodiment:
If a person in the real scene may enter the display interval of a virtual sub-scene of the combined scene, and the entered virtual sub-scene does not generate a corresponding depth image when generating its experience picture, then performing occlusion calculation with the depth information characterization value of that virtual sub-scene's display interval in the combined scene may distort the occlusion between the person and the displayed content of the virtual sub-scene. To solve this distortion problem, the first and second embodiments are improved as follows: when a person in the real scene enters the display interval of any virtual sub-scene s_i in the combined scene, if s_i does not generate depth images in step S10, s_i is set so that it must generate the corresponding depth images while generating the user p stereoscopic experience picture; when the person in the real scene leaves the display interval of s_i in the combined scene, s_i is reset so that it no longer generates the corresponding depth images while generating the user p stereoscopic experience picture, so that in step S10 s_i stops generating depth images. Whether a person in the real scene has entered the display interval Ω'_i of virtual sub-scene s_i in the combined scene is judged as follows: a computer vision algorithm calculates from the real-scene user experience picture either the position of the real person in the combined scene or the space occupied by the real person's figure in the combined scene; when the position of the real person in the combined scene lies inside Ω'_i, or the space occupied by the real person's figure in the combined scene intersects Ω'_i, the person in the real scene is judged to have entered the display interval Ω'_i of virtual sub-scene s_i in the combined scene.
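One way to realize this toggle is sketched below, under the simplifying assumptions that the display interval is modelled as an axis-aligned box and that the vision algorithm returns a point estimate of the person's combined-scene position; every class, field and function name here is hypothetical.

from dataclasses import dataclass

@dataclass
class SubSceneState:
    box_min: tuple               # min corner of the display interval Omega'_i, modelled as a box
    box_max: tuple               # max corner of Omega'_i
    configured_with_depth: bool  # step S00 setting: does the instance normally output depth images?
    generates_depth: bool = False

def person_in_interval(person_pos, box_min, box_max):
    """Point-in-box test on the person's estimated position in the combined scene."""
    return all(lo <= p <= hi for p, lo, hi in zip(person_pos, box_min, box_max))

def update_depth_generation(scene: SubSceneState, person_pos):
    """Third-embodiment toggle: a sub-scene that normally sends no depth images starts
    generating them while a real person is inside its display interval, and stops again
    once the person has left; scenes that always output depth are left untouched."""
    if scene.configured_with_depth:
        scene.generates_depth = True
    else:
        scene.generates_depth = person_in_interval(person_pos, scene.box_min, scene.box_max)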
The beneficial effects are that:
The occlusion calculation between the person and the displayed content of the virtual sub-scene is no longer distorted.
Fourth embodiment:
The combined scene is instead formed by combining only virtual sub-scenes and no longer contains a real scene. The system composition is shown in fig. 6; compared with the system diagram of the first embodiment in fig. 4, only the real scene imaging module is omitted. Compared with the experience generation method of the first embodiment, the experience generation method of this embodiment merely removes the computation related to the real scene and is otherwise the same, so step S20 is not repeated here.
The beneficial effects are that:
The embodiment can be adapted to virtual reality (VR) terminals.
Fifth embodiment:
The combined scene is likewise formed by combining only virtual sub-scenes and no longer contains a real scene. The system composition is shown in fig. 7; compared with the system diagram of the second embodiment in fig. 5, the real scene imaging module is omitted. The experience generation method of this embodiment is the same as that of the second embodiment except that the computation related to the real scene is removed, and it is not described again here.
The beneficial effects are that:
The embodiment can be adapted to VR terminals. Compared with the fourth embodiment, the occlusion-calculation image synthesis between the virtual sub-scenes is placed on the cloud, so in this embodiment the terminal performs no image-synthesis computation at all.
In addition, in order to achieve the above object, the present invention proposes a combined scene experience generation system based on XR technology, the system comprising a memory, a processor, and a combined scene experience generation program based on XR technology stored in the memory and executable on the processor, wherein the program, when executed by the processor, performs the steps of the combined scene experience generation method described in all the embodiments above.
To achieve the above object, the present invention further provides a computer readable storage medium on which a computer program is stored, wherein the computer program, when invoked by a processor, performs the steps of the combined scene experience generation method according to all the embodiments described above.
The invention provides a combined scene experience generation method, system and medium based on XR technology, wherein the experience generation method comprises the following steps. Step S10: generating a binocular stereoscopic experience picture of each sub-scene of the combined scene, wherein the scene instance of any virtual sub-scene s_i acquires the pose of user p in its coordinate system in real time and, according to the real-time pose, the imaging interval of s_i and the pupil distance of user p in the space of scene s_i, generates user p's binocular stereoscopic experience picture of scene s_i. Step S20: synthesizing the binocular stereoscopic experience pictures of all the sub-scenes into a combined scene experience picture by occlusion calculation, and displaying the combined scene experience picture for the user to view; wherein, for any sub-scene s_j in the combined scene whose scene instance does not generate a depth image, in step S20, when the pixels of the experience picture of s_j need to be compared by depth value with the corresponding pixels of the other components of the combined scene, the depth values of those pixels can be characterized by the depth information of the display interval of s_j in the combined scene; in addition, any sub-scene of the combined scene is a real scene or a virtual sub-scene, and the display interval of each sub-scene in the combined scene may be a convex interval or a non-convex interval. According to the invention, even when the combined scene does not satisfy the condition that a separating surface exists between the sub-scenes, and in particular when the combined scene consists of a virtual scene and a real scene, provided occlusion consistency holds, the virtual sub-scenes do not need to generate and transmit depth images for their experience pictures when the user experience picture is generated, and the combined scene experience can still be generated correctly for the user, so that the bandwidth for transmitting depth images is remarkably saved.
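Taken together, steps S10 and S20 amount to a simple per-frame loop. The sketch below strings the earlier helpers into such a loop for one eye; the object interfaces (image, depth, characterization, lam, generates_depth) are assumptions made for illustration, and the composite helper is the one sketched for the first embodiment.

def render_frame(real_scene, sub_scenes, user_pose, t_j):
    """One eye of one frame of the combined-scene experience (steps S10 and S20).

    real_scene : supplies image(t_j) and depth(t_j) for user p
    sub_scenes : each supplies image(t_j), plus depth(t_j) if its instance generates
                 depth images, otherwise characterization(t_j, user_pose) for its
                 display interval in the combined scene
    """
    img, depth = real_scene.image(t_j), real_scene.depth(t_j)       # initial composite
    for s in sub_scenes:
        if s.generates_depth:
            layer_depth, scale = s.depth(t_j), s.lam                # true depth, scaled by lambda
        else:
            layer_depth, scale = s.characterization(t_j, user_pose), 1.0
        img, depth = composite(img, depth, s.image(t_j), layer_depth, scale)
    return img                                                       # displayed to user p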
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise encompassed within the scope of protection of the present invention.

Claims (10)

1. A combined scene experience generation method based on extended reality (XR) technology, the method comprising the steps of:
Step S10: generating binocular stereoscopic experience pictures of all sub-scenes of the combined scene, wherein, for any virtual sub-scene s_i, its scene instance obtains the pose of user p in its coordinate system in real time and, according to the real-time pose, the imaging interval of s_i and the pupil distance of user p in the space of scene s_i, generates user p's binocular stereoscopic experience picture of scene s_i;
Step S20: synthesizing the binocular stereoscopic experience pictures of all the sub-scenes into a combined scene experience picture by occlusion calculation, and displaying the combined scene experience picture for the user to view;
wherein, for any sub-scene s_j in the combined scene whose scene instance does not transmit a depth image, in said step S20, when each pixel of the experience picture of s_j needs to be compared by depth value with the corresponding pixel of the other components of the combined scene to obtain the occlusion relation, the depth value of any pixel that images an object point can be characterized by the depth information of the display interval of s_j in the combined scene, the depth information of the display interval being obtained based on the real-time pose of user p in the combined scene; in addition, any sub-scene of the combined scene is a real scene or a virtual sub-scene, and the display interval of each sub-scene in the combined scene is a convex interval or a non-convex interval.
2. The generation method according to claim 1, characterized in that step S10 is preceded by a step S00: setting experience generation parameters of the combined scene, including the imaging interval of each virtual sub-scene, the pupil distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the coordinate system of the combined scene and the coordinate system of each virtual sub-scene, the display interval of each sub-scene in the combined scene, and whether the scene instance of each sub-scene generates a corresponding depth image when rendering its experience picture.
3. The generation method according to claim 2, wherein step S00 sets whether a virtual sub-scene needs to generate a depth image as follows: for any virtual sub-scene s_i in the combined scene, when s_i has occlusion consistency with respect to the other sub-scenes in the combined scene, s_i is set not to require generation of a depth image; otherwise, it is set to require generation of a depth image.
4. The generation method of claim 3, wherein step S20 is followed by a step S30: the combined scene accepts interactive input from user p and determines whether the interactive input is an interaction with a sub-scene; if it is determined that the interactive input is an interaction of user p with some sub-scene s_i, the interactive input is converted into interactive input in the coordinate system of scene s_i, and s_i responds to the converted interactive input.
5. The generation method of claim 4, wherein, for any virtual sub-scene s_i, if the display interval of s_i in the combined scene set in step S00 is a convex interval, then when the display interval of s_i in the combined scene does not intersect the display intervals of the other components of the combined scene, s_i has occlusion consistency with respect to the other components of the combined scene, and the scene instance of s_i is set in step S00 as not requiring generation of a depth image.
6. The generation method of claim 5, wherein, for any virtual sub-scene s_i whose scene instance does not transmit a depth image, if the display interval of s_i in the combined scene is Ω'_i, the depth information of Ω'_i may be characterized by the surface depth values of the part of Ω'_i facing user p, by the surface depth values of the part of Ω'_i facing away from user p, or by any value between the depth value of the facing surface and the depth value of the facing-away surface.
7. The generation method of claim 6, wherein the combined scene consists of one real scene s_0 and more than one virtual sub-scene; in step S00, the display intervals of all virtual sub-scenes in the combined scene are set to be convex intervals that do not overlap one another, and the virtual sub-scenes are set not to generate corresponding depth images when generating the experience picture of user p in step S10; in step S10, the XR terminal of user p generates the user p experience picture of the real scene s_0 and the corresponding depth images; in step S20, when the sub-scene experience pictures are synthesized into the combined scene experience picture by occlusion calculation, for any virtual sub-scene s_i whose scene instance does not transmit a depth image, when computing the occlusion relation between a pixel of s_i and the corresponding pixel of s_0 on the same view line of user p, if the pixel images an object point, its depth value is obtained by calculation from the depth information characterization value of the display interval of s_i in the combined scene; and, according to the one-way occlusion relation between the display intervals of s_i and any other virtual sub-scene s_j in the combined scene, the occlusion relation between any pixel of s_i and the corresponding pixel of s_j on the same view line of user p can be obtained.
8. The generation method according to any one of claims 1-7, wherein, when a person in the real scene enters the display interval of any virtual sub-scene s_i in the combined scene, if the scene instance of s_i does not generate a depth image, the scene instance of s_i is set to generate the corresponding depth image while generating the user p experience picture; when the person in the real scene leaves the display interval of s_i in the combined scene, s_i stops generating the depth image.
9. A combined scene experience generation system based on XR technology, comprising a memory, a processor, and a combined scene experience generation program based on XR technology stored in the memory and executable on the processor, wherein the program, when executed by the processor, performs the steps of the method of any one of claims 1 to 8.
10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when invoked by a processor, performs the steps of the combined scene experience generation method based on XR technology as claimed in any one of claims 1 to 8.