CN117315203A - XR combined scene experience picture generation method, system, terminal and medium - Google Patents

XR combined scene experience picture generation method, system, terminal and medium

Info

Publication number
CN117315203A
CN117315203A
Authority
CN
China
Prior art keywords
scene
sub
display
combined
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311205254.2A
Other languages
Chinese (zh)
Inventor
蔡铁峰 (Cai Tiefeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vocational And Technical University
Original Assignee
Shenzhen Vocational And Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vocational And Technical University filed Critical Shenzhen Vocational And Technical University
Priority to CN202311205254.2A
Publication of CN117315203A
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/006: Mixed reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00: Indexing scheme for image data processing or generation, in general
    • G06T2200/04: Indexing scheme for image data processing or generation, in general involving 3D image data

Abstract

The invention discloses a method, a system, a terminal and a medium for generating an XR combined scene experience picture.

Description

XR combined scene experience picture generation method, system, terminal and medium
Technical Field
The invention relates to the technical field of ubiquitous virtual reality, in particular to an XR combined scene experience picture generation method, system, terminal and medium.
Background
Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR) technologies, hereinafter collectively referred to as XR (Extended Reality), are of great value in vocational-education practical training, where they make otherwise invisible content visible and otherwise untouchable content touchable. Enabled by high-performance wireless network technologies such as 5G and Wi-Fi 6, the storage, computing and rendering services required by an XR mobile terminal (a 5G mobile phone, a head-mounted display, etc.) can be offloaded to the cloud. As a result, backed by cloud storage, cloud computing and cloud rendering services, the computing, storage and rendering capability available to a single XR terminal is effectively unlimited.
Based on XR technology, a combined scene can be constructed by combining two or more sub-scenes, each of which may be virtual or real. A combined scene breaks down the barriers between scenes and enables rich types of experience activities. For example, in a classroom, several single teaching-activity scenes or several collaborative teaching-activity scenes can be combined into a teaching combined scene containing multiple scenes, and many different combined scenes can be constructed by varying the types and number of scenes included, mainly: "in-study" scenes, "demonstration plus practical training" scenes, "cognition plus practical training" scenes, practical-training help scenes, assessment and competition scenes, multi-group practical-operation demonstrations, and the like.
In a combined scene formed by combining a real scene with virtual sub-scenes, the real-scene experience picture is generated in real time by a real camera, the virtual-scene experience pictures are rendered by a computing system, and the complete experience picture of the combined scene is obtained by synthesizing the real-scene experience picture with the virtual sub-scene experience pictures. When a combined scene contains many virtual sub-scenes, a single computing device may be unable to carry the rendering load of the whole combined scene. The virtual sub-scenes of the combined scene are therefore deployed and run on several servers of a server cluster, or on several graphics cards of one large server; each virtual sub-scene is rendered independently on a different server or graphics card to generate its experience picture, the sub-scene experience pictures are transmitted and gathered together, and the experience picture of the combined scene is obtained by image synthesis. In this way, based on cloud services, an immersive XR combined-scene experience containing a large number of sub-scenes can be generated for users.
When the sub-scenes of a combined scene are rendered independently, the user experience pictures of the individual sub-scenes must be synthesized into the user's experience picture of the XR combined scene through occlusion calculation. In the patent application "New dimension space construction method, system and platform based on XR technology" (application No. CN202210022608.9), a corresponding depth image is generated while rendering the user experience picture of each sub-scene, and occlusion calculation between the sub-scene experience pictures is performed according to the depth images. To avoid generating and transmitting depth images, the patent application "Ubiquitous training campus construction method, system and storage medium based on XR technology" (application No. CN202210908795.0) requires separation surfaces to exist between all sub-scenes in advance, so that the occlusion relation between the experience pictures of the virtual scenes can be calculated from the positions of the separation surfaces without any depth image. However, many XR combined scenes required in practice cannot guarantee a separation surface between every pair of sub-scenes. For example, in a combined scene composed of a real campus scene and several virtual training scenes deployed in a square or a building hall of the real campus, the virtual training scenes are often surrounded by buildings of the real campus scene, and no separation surface exists that can separate a virtual training scene from the real campus scene, so the construction method of CN202210908795.0 cannot be applied. The patent applications with application Nos. 2023110288353 and 2023110286220 give two XR combined-scene experience generation methods: for a combined scene formed by combining convex and non-convex scenes, the transmission of experience-picture depth information is effectively reduced by representing sub-interval experience pictures within the display interval of the combined scene, or by using the depth information of sub-intervals to compute unidirectionally occluded user-position intervals.
Because of power limitations, the effective range of the depth camera on existing XR mobile terminals does not exceed about 10 meters; depth information for content farther than 10 meters can still be obtained by binocular stereo measurement, but its accuracy is poor. In a combined scene formed by combining a real scene with virtual sub-scenes, when the real scene is not convex and the display intervals of the virtual sub-scenes exceed the effective range of the terminal depth camera, occlusion calculation between the real scene and the virtual sub-scenes based on depth information may contain large-area errors, because the depth information of the real-scene picture is not accurate enough. The two XR combined-scene experience generation methods disclosed in applications 2023110288353 and 2023110286220 effectively reduce the transmission of experience-picture depth information, but do not completely avoid obtaining the occlusion relation between picture pixels through depth-value comparison. An improved XR combined-scene experience generation method is therefore still needed.
Disclosure of Invention
The main purpose of the invention is to provide an XR combined-scene experience picture generation method, system, terminal and medium that, particularly for a large combined scene formed by combining a real scene with virtual sub-scenes, avoid direct comparison of low-accuracy depth information while still obtaining a correct object-occlusion effect in the combined-scene experience picture.
In order to achieve the above purpose, the present invention provides a method for generating an XR combined scene experience picture, the method comprising the following steps:
Step S10: generate the experience pictures of all sub-scenes of the combined scene, where each virtual sub-scene s_i acquires the pose of user p in its coordinate system in real time and, according to this real-time pose, the imaging interval of s_i and the interpupillary distance of user p in the scene space of s_i, generates user p's experience picture of s_i;
Step S20: synthesize the binocular stereoscopic experience pictures of the sub-scenes into the combined-scene experience picture through occlusion calculation. For any two pixels τ_i and τ_j that lie on the same user line of sight but belong to experience-picture images of different sub-scenes, if both τ_i and τ_j image non-sky scene content, the occlusion relation between τ_i and τ_j is calculated as follows: the occlusion relation between the display interval, in the combined scene, of the sub-scene to which τ_i belongs and the display interval, in the combined scene, of the sub-scene to which τ_j belongs is assigned to τ_i and τ_j as their occlusion relation;
wherein a single sub-scene may have several display sub-intervals in the combined scene, and any sub-scene of the combined scene may be a virtual scene or a real scene; for a combined-scene sub-scene s_i that has several display sub-intervals in the combined scene, when any of its pixels τ takes part in occlusion calculation with pixels of other combined-scene sub-scenes, the display sub-interval to which τ belongs must first be determined, and the occlusion relation between the pixels is then obtained from the occlusion relation between the display sub-interval to which τ belongs and the display interval or display sub-interval of the other combined-scene sub-scene.
As a further improvement of the invention, the combined scene comprises a real scene, and step S10 is preceded by step S00 of setting the combined-scene experience-picture generation parameters, which includes setting the imaging interval of each virtual sub-scene, the interpupillary distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the combined-scene coordinate system and each virtual sub-scene coordinate system, and the display interval of each virtual sub-scene in the combined scene, and setting several display sub-intervals of the real scene in the combined scene.
Wherein step S20 includes:
S201, identifying, for each pixel of the real-scene stereoscopic experience picture, the display sub-interval of the real scene in the combined scene to which it belongs;
S202, obtaining the occlusion relation, under the current user pose, between each display sub-interval of the real scene in the combined scene and the display interval of each virtual sub-scene;
S203, synthesizing the real-scene experience picture and the virtual sub-scene experience pictures according to the occlusion relations.
As a further improvement of the present invention, step S201 is implemented as follows: for any pixel τ belonging to the real-scene experience picture, calculate the set Λ of display sub-intervals of the real scene in the combined scene that are crossed by the user line of sight through τ; when the error range of the depth value of τ intersects the range of depth values that any display sub-interval Ω′_{0,k_i} in Λ occupies on that line of sight, τ belongs to the display sub-interval Ω′_{0,k_i}.
As another improvement of the present invention, step S00 further includes marking the object categories contained in each display sub-interval of the real scene in the combined scene, and step S201 is implemented as follows: perform semantic segmentation on the real-scene experience-picture image to obtain the category of the content imaged by each pixel; for any pixel τ belonging to the real-scene experience picture, calculate the set Λ of display sub-intervals of the real scene in the combined scene that are crossed by the user line of sight through τ; when the category of the content imaged by τ belongs to the object categories contained in any display sub-interval Ω′_{0,k_i} in Λ, τ belongs to the display sub-interval Ω′_{0,k_i}.
As another improvement of the present invention, step S00 further includes marking the object instances contained in each display sub-interval of the real scene in the combined scene, and step S201 is implemented as follows: perform instance segmentation on the real-scene experience-picture image to obtain the instance of the content imaged by each pixel; for any pixel τ belonging to the real-scene experience picture, calculate the set Λ of display sub-intervals of the real scene in the combined scene that are crossed by the user line of sight through τ; when the instance of the content imaged by τ belongs to the object instances contained in any display sub-interval Ω′_{0,k_i} in Λ, τ belongs to the display sub-interval Ω′_{0,k_i}.
As a further improvement of the present invention, step S00 further includes ensuring that the display interval set for each virtual sub-scene in the combined scene and each display sub-interval of the real scene are unidirectionally occluding, and in step S20 the occlusion relation between real-scene persons and virtual sub-scene content is handled either by matting the persons out of the real-scene experience picture and inpainting the image, or by obtaining the occlusion relation through depth-information comparison; step S203 includes:
s2031, synthesizing all virtual sub-scene experience pictures to obtain a virtual scene synthesis experience picture;
s2032, synthesizing the real scene experience picture and the virtual scene synthesis experience picture to obtain a combined scene experience picture.
To achieve the above object, the present invention also proposes an XR combined scene experience picture generation system comprising a memory, a processor and an XR combined scene experience picture generation program stored in the memory; when executed by the processor, the program performs the steps of the method described above.
In order to achieve the above objective, the present invention further proposes an XR combined scene experience picture generation terminal, which executes step S203 or step S2031 or step S2032 of the XR combined scene experience picture generation method as described above.
To achieve the above object, the present invention also proposes a computer readable storage medium having stored thereon a computer program which, when invoked by a processor, performs the steps of the XR combined scene experience picture generation method as described above.
Drawings
FIG. 1 is a flowchart of a method for generating an XR combined scene experience picture according to the present invention;
FIG. 2 is a diagram illustrating binocular stereoscopic vision according to the present invention;
FIG. 3 is an example of binocular stereo depth information error of the present invention;
FIG. 4 is an example of a real scene of the present invention;
fig. 5 is an example of a combined scene of virtual-real fusion according to the present invention;
FIG. 6 is a schematic diagram of a real scene divided into a plurality of display subintervals according to the present invention;
FIG. 7 is a system configuration diagram of a first embodiment of the present invention;
FIG. 8 is a schematic view of a real scene display subinterval of the present invention through which a line of sight of a pixel passes;
fig. 9 is a system configuration diagram of an eighth embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, the invention provides a method for generating an XR combined scene experience picture, which comprises the following steps:
Step S10: generate the experience pictures of all sub-scenes of the combined scene, where each virtual sub-scene s_i acquires the pose of user p in its coordinate system in real time and, according to this real-time pose, the imaging interval of s_i and the interpupillary distance of user p in the scene space of s_i, generates user p's experience picture of s_i;
Step S20: synthesize the binocular stereoscopic experience pictures of the sub-scenes into the combined-scene experience picture through occlusion calculation. For any two pixels τ_i and τ_j that lie on the same user line of sight but belong to experience-picture images of different sub-scenes, if both τ_i and τ_j image non-sky scene content, the occlusion relation between τ_i and τ_j is calculated as follows: the occlusion relation between the display interval, in the combined scene, of the sub-scene to which τ_i belongs and the display interval, in the combined scene, of the sub-scene to which τ_j belongs is assigned to τ_i and τ_j as their occlusion relation;
wherein a single sub-scene may have several display sub-intervals in the combined scene, and any sub-scene of the combined scene may be a virtual scene or a real scene; for a combined-scene sub-scene s_i that has several display sub-intervals in the combined scene, when any of its pixels τ takes part in occlusion calculation with pixels of other combined-scene sub-scenes, the display sub-interval to which τ belongs must first be determined, and the occlusion relation between the pixels is then obtained from the occlusion relation between the display sub-interval to which τ belongs and the display interval or display sub-interval of the other combined-scene sub-scene.
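The interval-level rule of step S20 can be illustrated with a minimal sketch (Python): for two sub-scenes whose display intervals do not overlap, every pair of co-located pixels simply inherits the interval-level occlusion relation, so no per-pixel depth comparison is needed. The image representation (H x W x 3 arrays) and the τ_null convention are illustrative assumptions, not part of the claimed method.

import numpy as np

def composite_two_subscenes(img_a, img_b, a_occludes_b, tau_null):
    """Composite the experience pictures of two sub-scenes seen on the same view.

    a_occludes_b is the interval-level occlusion relation between the display
    intervals of the two sub-scenes in the combined scene; tau_null marks
    pixels whose imaged scene content is empty (sky).
    """
    a_valid = np.any(img_a != tau_null, axis=-1)
    b_valid = np.any(img_b != tau_null, axis=-1)
    front, back, front_valid = (
        (img_a, img_b, a_valid) if a_occludes_b else (img_b, img_a, b_valid)
    )
    # the occluding sub-scene wins wherever it images content; elsewhere the
    # other sub-scene (or tau_null) shows through
    return np.where(front_valid[..., None], front, back)

The decision is made once per pair of display intervals and reused for every pixel, which is what removes the need for depth-value comparison between the pictures.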
Further, the combined scene includes a real scene, and step S10 is preceded by step S00, in which the combined-scene experience-picture generation parameters are set: the imaging interval of each virtual sub-scene, the interpupillary distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the combined-scene coordinate system and each virtual sub-scene coordinate system, the display interval of each virtual sub-scene in the combined scene, and several display sub-intervals of the real scene in the combined scene.
Wherein step S20 includes:
S201, identifying, for each pixel of the real-scene stereoscopic experience picture, the display sub-interval of the real scene in the combined scene to which it belongs;
S202, obtaining the occlusion relation, under the current user pose, between each display sub-interval of the real scene in the combined scene and the display interval of each virtual sub-scene;
S203, synthesizing the real-scene experience picture and the virtual sub-scene experience pictures according to the occlusion relations.
Further, step S201 is implemented as follows: for any pixel τ belonging to the real-scene experience picture, calculate the set Λ of display sub-intervals of the real scene in the combined scene that are crossed by the user line of sight through τ; when the error range of the depth value of τ intersects the range of depth values that any display sub-interval Ω′_{0,k_i} in Λ occupies on that line of sight, τ belongs to the display sub-interval Ω′_{0,k_i}.
Or, step S00 further includes marking the object categories contained in each display sub-interval of the real scene in the combined scene, and step S201 is implemented as follows: perform semantic segmentation on the real-scene experience-picture image to obtain the category of the content imaged by each pixel; for any pixel τ belonging to the real-scene experience picture, calculate the set Λ of display sub-intervals of the real scene in the combined scene that are crossed by the user line of sight through τ; when the category of the content imaged by τ belongs to the object categories contained in any display sub-interval Ω′_{0,k_i} in Λ, τ belongs to the display sub-interval Ω′_{0,k_i}.
Or, step S00 further includes marking the object instances contained in each display sub-interval of the real scene in the combined scene, and step S201 is implemented as follows: perform instance segmentation on the real-scene experience-picture image to obtain the instance of the content imaged by each pixel; for any pixel τ belonging to the real-scene experience picture, calculate the set Λ of display sub-intervals of the real scene in the combined scene that are crossed by the user line of sight through τ; when the instance of the content imaged by τ belongs to the object instances contained in any display sub-interval Ω′_{0,k_i} in Λ, τ belongs to the display sub-interval Ω′_{0,k_i}.
Further, step S00 further includes ensuring that the display interval set for each virtual sub-scene in the combined scene and each display sub-interval of the real scene are unidirectionally occluding, and in step S20 the occlusion relation between real-scene persons and virtual sub-scene content is handled either by matting the persons out of the real-scene experience picture and inpainting the image, or by obtaining the occlusion relation through depth-information comparison; step S203 includes:
s2031, synthesizing all virtual sub-scene experience pictures to obtain a virtual scene synthesis experience picture;
s2032, synthesizing the real scene experience picture and the virtual scene synthesis experience picture to obtain a combined scene experience picture.
The following describes important technical terms related to the present invention.
Convex interval
Let Ω be a three-dimensional convex interval. It has the following property: let a and b be two points with a, b ∈ Ω, with coordinates (x_a, y_a, z_a) and (x_b, y_b, z_b). Any point c on the line segment between a and b can be written as c = λ·a + (1 - λ)·b with 1 > λ > 0, and such a point c necessarily also belongs to Ω.
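A quick numerical illustration of this property, assuming an axis-aligned box as the convex interval (the box and the sample points are arbitrary choices):

import numpy as np

def convex_combination(a, b, lam):
    """Point c = lam*a + (1 - lam)*b on the segment between a and b, 0 < lam < 1."""
    return lam * np.asarray(a, float) + (1.0 - lam) * np.asarray(b, float)

def aabb_contains(p, box_min, box_max):
    return bool(np.all(p >= box_min) and np.all(p <= box_max))

# An axis-aligned box is a convex interval: every convex combination of two of
# its points stays inside it.
box_min, box_max = np.zeros(3), np.ones(3)
a, b = np.array([0.1, 0.9, 0.2]), np.array([0.8, 0.3, 0.7])
assert all(
    aabb_contains(convex_combination(a, b, lam), box_min, box_max)
    for lam in np.linspace(0.01, 0.99, 10)
)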
Scene and scene instance
A scene defines the objects contained in a three-dimensional space, the object states, the objects' own running logic, and the logic of interactions between objects. A scene instance is a program process or thread that runs in real time on computing resources such as a processor, memory and graphics card according to the scene definition; it computes the states of all objects in the scene in real time, renders pictures, and responds to user interaction. When a single scene is experienced by multiple users at the same time, and the computing resources available to a single scene instance cannot generate experience pictures for all users in real time, several scene instances must be created for the scene and distributed among the users; the object states in the scene are synchronized by establishing communication connections between the scene instances, and each scene instance generates experience pictures for its own users in real time, so that all users share the experienced scene.
User field of view and coordinate system
The user field of view is the spatial range visible to the user; it changes with the user's head and eye movement. In the embodiments of the invention, the user-field-of-view coordinate system is defined as follows: the midpoint of the line connecting the optical centers of the two eyes is the origin; the direction along this line from the left eye to the right eye is the positive x axis; the direction the face points in is the positive z axis; and the positive y axis is perpendicular to the x and z axes such that a left-handed coordinate system is formed.
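A minimal sketch of building this coordinate frame, assuming the eye optical centers and a face-forward vector are available from the head-mounted display's tracking (function and variable names are illustrative):

import numpy as np

def user_view_frame(left_eye, right_eye, face_forward):
    """Return (origin, axes) of the user-field-of-view frame described above.

    axes is a 3x3 matrix whose rows are the x, y, z axes expressed in the
    tracking coordinate system; the frame is left-handed (y = x cross z).
    """
    left_eye = np.asarray(left_eye, float)
    right_eye = np.asarray(right_eye, float)
    origin = 0.5 * (left_eye + right_eye)        # midpoint of the optical centers
    x_axis = right_eye - left_eye                # left eye -> right eye
    x_axis /= np.linalg.norm(x_axis)
    z_axis = np.asarray(face_forward, float)
    z_axis -= x_axis * np.dot(z_axis, x_axis)    # make z orthogonal to x
    z_axis /= np.linalg.norm(z_axis)
    y_axis = np.cross(x_axis, z_axis)            # left-handed: roughly "up"
    return origin, np.stack([x_axis, y_axis, z_axis])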
XR scene imaging interval
A user p performs an immersive interactive experience in a combined scene ŝ, and s_i is one of the sub-scenes making up ŝ. When generating the experience of ŝ, possibly only part of the content of s_i needs to be presented to user p inside ŝ. To delimit the content of s_i that needs to be presented to user p in ŝ, a three-dimensional interval is set such that only the content inside it is imaged when generating user p's experience picture of s_i; this three-dimensional interval is the XR scene imaging interval. The imaging interval may be infinite, or may be a three-dimensional interval of any shape, such as a cube or a sphere.
Display interval of an XR scene in the combined scene
The combined scene ŝ contains several sub-scenes. So that the sub-scenes do not overlap in ŝ, or overlap as little as possible, for any sub-scene s_i a three-dimensional interval Ω is set inside ŝ for presenting s_i; Ω defines the display interval of s_i in ŝ. Once the rotation-translation-scaling relation between the ŝ coordinate system and the s_i coordinate system has been set, mapping the three-dimensional interval Ω into sub-scene s_i according to this relation yields a three-dimensional interval Ω_i that can serve as the imaging interval of s_i; conversely, mapping the imaging interval Ω_i of s_i into the combined scene ŝ where the viewer p is located, according to the same rotation-translation-scaling relation, yields a three-dimensional interval that can serve as the display interval of s_i in the combined scene ŝ.
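A sketch of this mapping under the rotation-translation-scaling parameters introduced in the first embodiment below (angles θ_i, β_i, α_i about Z, X, Y, a translation, and a uniform scale λ_i); the composition order scale-rotate-translate is an assumption:

import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def subscene_to_combined(points, theta, beta, alpha, translation, lam):
    """Map points from the s_i coordinate system into the combined-scene system."""
    R = rot_y(alpha) @ rot_x(beta) @ rot_z(theta)   # rotations applied in Z, X, Y order
    pts = np.atleast_2d(np.asarray(points, float))
    return lam * (pts @ R.T) + np.asarray(translation, float)

# Mapping a sampling of the imaging interval of s_i (here the corners of a unit
# box) this way yields the corresponding display interval of s_i in the combined scene.
corners = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
display_pts = subscene_to_combined(corners, np.pi / 4, 0.0, 0.0, [10.0, 0.0, 5.0], 2.0)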
Unidirectional occlusion and bidirectional occlusion
Scenes s_i and s_j are two sub-scenes of the combined scene ŝ. At any time t_k, if on every line of sight of user p on which s_i and s_j occlude each other it is always the scene content of s_i that occludes the scene content of s_j, then at time t_k, for user p, scenes s_i and s_j are unidirectionally occluding, and the occlusion relation is that s_i unidirectionally occludes s_j. Conversely, if on every such line of sight it is always the scene content of s_j that occludes the scene content of s_i, then at time t_k, for user p, scenes s_i and s_j are unidirectionally occluding, and the occlusion relation is that s_j unidirectionally occludes s_i. By contrast, if on the lines of sight of user p on which s_i and s_j occlude each other, the scene content of s_i may occlude that of s_j and the scene content of s_j may also occlude that of s_i, then at time t_k, for user p, scenes s_i and s_j are bidirectionally occluding. If for every pose that user p can take in the combined scene ŝ the scenes s_i and s_j are never bidirectionally occluding, we say that in the combined scene ŝ, scenes s_i and s_j are unidirectionally occluding. Two disjoint convex intervals are necessarily unidirectionally occluding; for details see paragraphs 115-116 of the description of the patent application "Ubiquitous training campus construction method, system and storage medium based on XR technology" (CN202210908795.0).
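A minimal sketch of the separating-plane test that underlies this definition (the same test reappears in step S202 of the first embodiment): for two disjoint intervals separated by a plane, the interval on the user's side of the plane is the one that can occlude the other. The sign convention of the normal is an assumption.

import numpy as np

def front_interval(user_pos, plane_point, plane_normal):
    """Return 'A' if the user is on the side the normal points to (interval A's
    side), else 'B'; the returned interval unidirectionally occludes the other
    for the current user position."""
    s = np.dot(np.asarray(user_pos, float) - np.asarray(plane_point, float),
               np.asarray(plane_normal, float))
    return "A" if s > 0 else "B"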
The principle of the present invention is described below.
Binocular visual depth information accuracy analysis
Take Fig. 2 as an example: the line from an object point A to the optical center of the left-eye camera is perpendicular to the imaging plane of the left-eye camera, the imaging position on the left-eye camera is taken as the origin of the image coordinate system, the position at which object point A images on the imaging plane of the right-eye camera is the point u, and point u lies within the photosensitive area of image pixel τ of the right-eye camera.
As shown in Fig. 3, because image coordinates are discrete, the image coordinate of pixel τ corresponds to a point u′ on the imaging plane, and the line connecting the right-eye optical center O′ with u′ intersects the line OA at a point A′. Hence, from the binocular images generated by the left- and right-eye cameras, the depth of A is computed as |O′A′|, while the true depth of A is |O′A|, and the two deviate from each other. Consider left- and right-eye images at 4K resolution (4096 pixels horizontally) with a 120-degree field of view on an XR head-mounted display, where the distance |OO′| between the binocular cameras is close to the human interpupillary distance. Taking |OO′| as the average human interpupillary distance of 63 mm and a deviation of half a pixel as the imaging error range, the resulting range of possible depth values for the distance |O′A| is 37 m to 75 m, an extremely large error range. For details, see the accuracy analysis of binocular stereo vision in Section 5.2 of Machine Vision, edited by Zhang Anjun (ISBN 7-03-014717-0, 2005).
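The quoted numbers can be checked with a small pinhole-model calculation. The true distance |O′A| is not stated in the text; assuming roughly 50 m for illustration reproduces a range close to the quoted 37 m to 75 m:

import numpy as np

width_px = 4096            # horizontal resolution (4K)
fov_deg = 120.0            # horizontal field of view
baseline_m = 0.063         # |OO'|, average interpupillary distance
true_depth_m = 50.0        # assumed |O'A| (not given in the text)

focal_px = (width_px / 2) / np.tan(np.radians(fov_deg) / 2)   # focal length in pixels
disparity_px = focal_px * baseline_m / true_depth_m           # ideal disparity

for err in (+0.5, -0.5):                                      # half-pixel quantisation error
    measured = focal_px * baseline_m / (disparity_px + err)
    print(f"disparity error {err:+.1f} px -> depth {measured:5.1f} m")
# prints roughly 37 m and 75 m, matching the range quoted above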
Fig. 4 shows a real scene comprising buildings, trees and a square, which user P enters; the square is 100 meters long and 100 meters wide. In Fig. 5, virtual scenes A, B, C and D are deployed on the square of the real scene, and a three-dimensional display interval is delimited for each virtual scene to avoid overlap of virtual objects between the virtual scenes. A combined scene comprising the virtual scenes and the real scene is thus constructed. When user P uses an XR head-mounted display for an immersive experience of the combined scene, the virtual scenes can easily produce high-accuracy depth information when generating their experience pictures, but in the real-scene experience picture only high-accuracy short-range depth information (for example, within 10 meters) can be obtained through the depth camera in the XR head-mounted display, and the accuracy of long-range depth information is poor. If the occlusion relation between scene contents were judged directly by comparing depth information, the user experience of the combined scene of Fig. 5 might show virtual scene D occluding a real tree, or a real building occluding virtual scenes A and B, deviating seriously from the actual occlusion relations.
As shown in Fig. 6, the method provided by the invention divides the real scene into several three-dimensional intervals such that, on the lines of sight from the position of user P, every three-dimensional interval of the real scene and every virtual sub-scene in the combined scene are unidirectionally occluding. The XR head-mounted display acquires user P's experience picture of the real scene and correctly determines, by an appropriate method, the sub-interval to which each pixel belongs; from the unidirectional occlusion relation between that sub-interval and the display interval of each virtual scene in the combined scene, the correct occlusion relation between each pixel of the real-scene experience picture and the corresponding pixel of each virtual sub-scene experience picture is obtained, and the combined-scene experience picture is then synthesized. When determining the sub-interval of each real-scene pixel, the sub-interval to which the pixel belongs is determined from the pixel depth information and its error range; when this cannot be determined reliably from the depth information and its error range, the sub-interval can be further determined from visual information such as the category of the imaged object.
In the embodiments of the invention, the imaging field angles and resolutions of all images are by default the same, so that pixels with the same image coordinates in different images are the ones between which an occlusion relation holds; if the imaging field angle and resolution differ between images, they can be made the same by interpolation or sampling, which is a common technique for those skilled in the art.
First embodiment:
The combined scene ŝ of this embodiment is a virtual-real mixed combined scene consisting of one real scene s_0 and N virtual sub-scenes s_1, s_2, ..., s_N, where N is a natural number greater than 1. As shown in Fig. 7, the combined-scene experience generation system of this embodiment comprises: the user interface 10, the image synthesis module 20, the real-scene imaging module 30, the combined-scene parameter setting module 40, the interval identification module 50, the instance of the 1st virtual sub-scene (60-1), the instance of the 2nd virtual sub-scene (60-2), ..., and the instance of the Nth virtual sub-scene (60-N), where the instance (60-i) of the ith virtual sub-scene is an instance of the virtual sub-scene s_i. For a video see-through MR head-mounted display, the real-scene imaging module 30 generates, through optoelectronic devices such as the camera module on the user's MR head-mounted terminal, the user's binocular stereoscopic experience picture of the real scene together with the corresponding depth images; for an optical see-through MR head-mounted display, the real-scene imaging module 30 does not need to generate a binocular stereoscopic experience picture through the cameras on the user's MR head-mounted terminal, but does need to generate binocular stereoscopic depth images (the binocular stereoscopic picture can be regarded as generated with the color value of every pixel equal to 0, i.e. an all-black image). The user sends commands through the user interface 10 to the combined-scene parameter setting module 40 to set parameters such as the imaging interval of each sub-scene, the display interval of each sub-scene in the combined scene and the display sub-intervals of the real scene; the system can also send commands to the combined-scene parameter setting module 40 to set the combined-scene parameters, and the set parameters are sent to the instances of the virtual sub-scenes and to the interval identification module 50. Each virtual sub-scene instance receives the user pose and interaction information from the user interface 10 in real time, responds to the interaction, and generates the user's binocular stereoscopic experience picture according to its imaging interval, the interpupillary distance of the user in its coordinate system and the real-time pose. Each virtual sub-scene sends its binocular stereoscopic user experience picture to the image synthesis module 20. The image synthesis module 20 synthesizes, in real time, the real-scene stereoscopic imaging picture read from the real-scene imaging module 30 with the user experience pictures sent by the virtual sub-scenes; during synthesis it must obtain the display sub-interval to which each pixel of the real-scene experience picture belongs, and then synthesize the images according to the occlusion relations between the display intervals of the virtual sub-scenes and between the virtual sub-scene display intervals and the real-scene display sub-intervals. The image synthesis module 20 sends the synthesized image to the user interface 10 for display to the user.
In this embodiment of the invention, for simplicity, the real scene s_0 and the combined scene ŝ share a coordinate system. As shown in Fig. 1, the combined-scene experience generation method of this embodiment includes the following steps:
Step S00: set the combined-scene experience-picture generation parameters, including the imaging interval of each virtual sub-scene, the interpupillary distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the combined-scene coordinate system and each virtual sub-scene coordinate system, the display interval of each virtual sub-scene in the combined scene, and the display sub-intervals of the real scene in the combined scene;
Step S10: generate the experience pictures of all sub-scenes of the combined scene, where each virtual sub-scene s_i acquires the pose of user p in its coordinate system in real time and, according to this real-time pose, the imaging interval of s_i and the interpupillary distance of user p in the scene space of s_i, generates user p's experience picture of s_i;
Step S20: synthesize the binocular stereoscopic experience pictures of the sub-scenes into the combined-scene experience picture through occlusion calculation. For any two pixels τ_i and τ_j that lie on the same user line of sight but belong to experience-picture images of different sub-scenes, if both τ_i and τ_j image scene content, the occlusion relation between τ_i and τ_j is calculated as follows: the occlusion relation between the display interval, in the combined scene, of the sub-scene to which τ_i belongs and the display interval, in the combined scene, of the sub-scene to which τ_j belongs is assigned to τ_i and τ_j as their occlusion relation;
wherein a single sub-scene may have several display intervals in the combined scene, and any sub-scene of the combined scene may be a virtual scene or a real scene; for a sub-scene s_i that has several display intervals in the combined scene, when any of its pixels τ takes part in occlusion calculation with pixels of other sub-scenes, the display interval to which τ belongs must also be determined.
The specific implementation of step S00 is as follows:
In the combined-scene parameter setting module 40, the combined-scene parameters are set as follows. Set the imaging intervals of the virtual sub-scenes s_1, s_2, ..., s_N to Ω_1, Ω_2, ..., Ω_N respectively. Set a three-dimensional interval Ω′_0 of the combined scene in which no virtual scene may be deployed, for example the interval occupied by the walls, equipment and other objects of the real scene s_0; this interval is used to represent the real scene s_0 when judging the occlusion relation between the virtual sub-scenes and s_0, but does not actually restrict the imaging of s_0. Set the interpupillary distances of user p in the virtual sub-scenes s_1, s_2, ..., s_N to d_1, d_2, ..., d_N respectively, and the interpupillary distance of the user in the combined scene to d̂. Set the display intervals of the virtual sub-scenes s_1, s_2, ..., s_N in the combined scene ŝ to Ω′_1, Ω′_2, ..., Ω′_N respectively. Set the rotation-translation-scaling relation from each virtual sub-scene coordinate system to the combined-scene coordinate system: for any virtual sub-scene s_i (1 ≤ i ≤ N), the relation between the s_i coordinate system and the combined-scene coordinate system consists of rotations about the Z, X and Y axes by angles θ_i, β_i and α_i respectively, applied in the order Z, X, Y, translations along the X, Y and Z axes, and a scaling factor λ_i applied to the Z, X and Y axes.
These parameters are not completely independent of one another; the following constraints must be satisfied. Let the interpupillary distance used by user p's terminal to generate the real-scene stereoscopic experience picture or depth images be d_0, with d_0 as close as possible to the true interpupillary distance of user p; the interpupillary distance d̂ of the user in the combined scene ŝ and the interpupillary distance d_i in any virtual sub-scene s_i must be consistent with d_0 under the corresponding rotation-translation-scaling relations. For any virtual sub-scene s_i, any point b_0 of its imaging interval Ω_i, after coordinate transformation according to the rotation-translation-scaling relation from the s_i coordinate system to the ŝ coordinate system, yields a point that belongs to Ω′_i. The display intervals Ω′_1, Ω′_2, ..., Ω′_N do not overlap one another, and none of Ω′_1, Ω′_2, ..., Ω′_N overlaps Ω′_0.
Still further, for Ω′_0 set several display sub-intervals Ω′_{0,0}, Ω′_{0,1}, ..., Ω′_{0,m_0} that completely cover Ω′_0, such that none of the display sub-intervals intersects the display interval of any other sub-scene of the combined scene.
Still further, ensure that a separation plane exists between the display interval set for each virtual sub-scene and each display sub-interval of the real scene s_0, i.e. that they are always unidirectionally occluding.
Further, ensure that a separation plane exists between the display intervals set for the virtual sub-scenes, i.e. that they are always unidirectionally occluding.
The specific implementation of step S10 is as follows:
At any time t_j, the pose of user p in the combined-scene ŝ coordinate system is [W_j Q_j], where W_j is the position and Q_j the attitude angle. Each virtual sub-scene s_1, s_2, ..., s_N receives the real-time pose [W_j Q_j] of user p from the user interface and, according to the rotation-translation-scaling relation between its own coordinate system and the combined-scene coordinate system, computes the pose of the user in its own coordinate system; for any virtual scene s_i among s_1, s_2, ..., s_N, the pose of user p in the s_i coordinate system, computed from the rotation-translation-scaling relation between the s_i coordinate system and the ŝ coordinate system, is denoted [W_{j,i} Q_{j,i}]. Computing the pose of user p in a virtual sub-scene coordinate system from the rotation-translation-scaling relation between coordinate systems is well known to those skilled in the art and is not described in detail here. For any virtual scene s_i among s_1, s_2, ..., s_N, the scene content inside its imaging interval Ω_i is imaged according to the user's interpupillary distance d_i in s_i and the pose [W_{j,i} Q_{j,i}], generating an immersive binocular stereoscopic experience picture; the left- and right-eye images of the stereoscopic picture of s_i are denoted I^L_i and I^R_i. In I^L_i and I^R_i it must be possible to identify which pixels correspond to empty scene content, for example by setting the value of such pixels in the experience-picture image to a specific value τ_null (e.g. blue), or by using a dedicated array taking only the values 0 or 1 to record whether the scene content imaged by the corresponding pixel is empty; here the value of the pixels whose scene content is empty in the experience-picture images I^L_i and I^R_i is set to the specific value τ_null.
The left- and right-eye images of the stereoscopic experience picture of the real scene s_0 are I^L_0 and I^R_0, with corresponding depth images D^L_0 and D^R_0. Let the depth images generated by the depth camera be the initial D^L_0 and D^R_0, and let the maximum effective depth value of the depth camera be T_d. Construct two matrices G_L and G_R of size n_H × n_W, where n_H is the number of pixel rows and n_W the number of pixel columns of the real-scene stereoscopic experience-picture images I^L_0 and I^R_0. For any pixel (u, v) of I^L_0, if the depth value of the scene content it images does not exceed T_d, set G_L(u, v) = 0; otherwise set G_L(u, v) = 1. Similarly, record in G_R whether the scene content imaged by the corresponding pixel of I^R_0 exceeds T_d. If some pixels of I^L_0 and I^R_0 image scene content whose depth value exceeds T_d, the depth values of these pixels must be computed by binocular stereo vision and written into D^L_0 and D^R_0. Computing pixel depth values by binocular stereo vision is a common technique for those skilled in the art; for example, the paper "NeRF-Supervised Deep Stereo" by Fabio Tosi et al. gives a high-performance stereo depth estimation method [1], which is not described in detail here.
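A hedged sketch of this depth handling: pixels within the depth camera's effective range T_d keep the camera depth (G = 0), while pixels beyond it fall back to binocular-stereo depth (G = 1). Array names and the toy values are illustrative only.

import numpy as np

def fuse_depth(depth_cam, depth_stereo, t_d):
    """Combine depth-camera and stereo depth as described above.

    depth_cam, depth_stereo: H x W arrays in metres; t_d: maximum effective
    depth of the depth camera.  Returns the fused depth map and the matrix G
    (1 where the camera depth was unusable and stereo depth was used).
    """
    g = (depth_cam > t_d) | ~np.isfinite(depth_cam)
    fused = np.where(g, depth_stereo, depth_cam)
    return fused, g.astype(np.uint8)

# toy example: one pixel (55 m) lies beyond a 10 m depth-camera range
cam = np.array([[2.0, 8.0], [55.0, 3.5]])
stereo = np.array([[2.1, 8.2], [50.0, 3.6]])
fused, g = fuse_depth(cam, stereo, t_d=10.0)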
The specific implementation of step S20 is as follows:
The image synthesis module 20 synthesizes the images in the following way. At any time t_j, the left- and right-eye images of the stereoscopic experience picture received from the real-scene imaging module 30 are I^L_0 and I^R_0, with corresponding depth images D^L_0 and D^R_0; for any virtual sub-scene s_k, the left- and right-eye images of the received stereoscopic picture are I^L_k and I^R_k.
The implementation of this step further comprises the following sub-steps:
S201, identifying, for each pixel of the real-scene stereoscopic experience picture, the display sub-interval of the real scene in the combined scene to which it belongs;
S202, obtaining the occlusion relation, under the current user pose, between each display sub-interval of the real scene in the combined scene and the display interval of each virtual sub-scene;
S203, synthesizing the real-scene experience picture and the virtual sub-scene experience pictures according to the occlusion relations.
Step S201 is specifically implemented as follows:
Under the current pose [W_j Q_j] of user p, for any pixel τ of the left- or right-eye depth image D^L_0 or D^R_0 of the real-scene experience picture, the corresponding image point u on the display surface is obtained from the transformation between image coordinates and the display-surface coordinate system; the three-dimensional coordinates of the image point u in the user field of view are obtained from the transformation between the two-dimensional display-surface coordinate system and the three-dimensional user-field-of-view coordinate system; and the three-dimensional coordinates of u in the combined-scene coordinate system are further obtained from the transformation between the user-field-of-view coordinate system and the ŝ coordinate system. Starting from the optical center of the eye (the left-eye optical center if τ is a pixel of the left-eye image), connecting it to the image point u and extending the line gives a ray L; the display sub-intervals of the real scene in the combined scene ŝ crossed by the ray L are then computed, which is well known to those skilled in the art and not described in detail here. Let the set of real-scene display sub-intervals crossed by L be Λ = {Ω′_{0,k_0}, Ω′_{0,k_1}, ..., Ω′_{0,k_{M_0-1}}}, where M_0 is the number of real-scene display sub-intervals crossed by L.
Construct two matrices E_{0,0} and E_{0,1} of size n_H × n_W, where n_H and n_W are the numbers of pixel rows and columns of the real-scene stereoscopic experience-picture images I^L_0 and I^R_0. Initially all elements of E_{0,0} and E_{0,1} are set to -1.
When M_0 is not 0, take a pixel τ of the left-eye image with image coordinates (u, v) as an example; its depth value in the depth image D^L_0 is D^L_0(u, v). If G_L(u, v) is 0, the interval of possible true depth values of the object point corresponding to τ (its depth error range) is obtained from the depth-value accuracy of the depth camera. If G_L(u, v) is 1, the depth-value accuracy of τ is computed according to formulas (5.11) and (5.16) of the accuracy analysis of binocular stereo vision in Section 5.2 of Machine Vision, edited by Zhang Anjun (ISBN 7-03-014717-0, 2005), which again gives the interval of possible true depth values of the object point corresponding to τ. The depth-value intervals over which the ray L crosses each display sub-interval in Λ are computed respectively; this computation is well known to those skilled in the art and is not repeated here. If M_0 is 1 and the depth error range of τ intersects the depth-value interval of the single display sub-interval Ω′_{0,k_0} in Λ, set E_{0,0}(u, v) = k_0. If M_0 ≥ 2, traverse the display sub-intervals in Λ in turn; if the depth error range of τ intersects the depth-value interval over which L crosses Ω′_{0,k_i}, set E_{0,0}(u, v) = k_i. Similarly, the display sub-interval of the combined scene ŝ to which each pixel of the right-eye image belongs is identified from its depth-value interval and recorded in E_{0,1}.
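A sketch of this per-pixel identification, modelling the real-scene display sub-intervals as axis-aligned boxes (the patent does not prescribe a shape) and using the distance along the viewing ray as the depth; -1 is returned when no sub-interval matches, matching the initialisation of E_{0,0} and E_{0,1}.

import numpy as np

def ray_box_depth_range(origin, direction, box_min, box_max):
    """Depth range along the unit ray over which it crosses an axis-aligned box,
    or None if the ray misses the box."""
    d = np.asarray(direction, float)
    d = np.where(np.abs(d) < 1e-12, 1e-12, d)           # avoid division by zero
    t0 = (np.asarray(box_min, float) - origin) / d
    t1 = (np.asarray(box_max, float) - origin) / d
    t_near = max(float(np.max(np.minimum(t0, t1))), 0.0)
    t_far = float(np.min(np.maximum(t0, t1)))
    return (t_near, t_far) if t_far >= t_near else None

def identify_subinterval(origin, direction, depth_lo, depth_hi, subintervals):
    """Return the index k of a crossed display sub-interval whose depth range on
    the ray intersects the pixel's depth error range [depth_lo, depth_hi]."""
    for k, (bmin, bmax) in subintervals.items():
        rng = ray_box_depth_range(origin, direction, bmin, bmax)
        if rng and rng[0] <= depth_hi and depth_lo <= rng[1]:
            return k
    return -1

# usage: subintervals maps index k -> (box_min, box_max) in combined-scene coordinates
subs = {0: ([0, 0, 0], [20, 20, 20]), 1: ([40, 0, 0], [80, 20, 20])}
direction = np.array([1.0, 0.1, 0.1]) / np.linalg.norm([1.0, 0.1, 0.1])
k = identify_subinterval(np.zeros(3), direction, 45.0, 70.0, subs)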
The step S202 is specifically implemented as follows:
For any display sub-interval Ω′_{0,r} of the real scene in the combined scene, obtain a separation plane between Ω′_{0,r} and any virtual sub-scene display interval Ω′_i and compute the relation between the current position of the user and that separation plane; from this relation, the occlusion relation between Ω′_{0,r} and Ω′_i under the current user pose is obtained. In this way the occlusion relations between all display sub-intervals of the real scene in the combined scene and the display intervals of all virtual sub-scenes can be obtained. This calculation is described in detail in paragraph 0157 of the description of the patent application "Ubiquitous training campus construction method, system and storage medium based on XR technology" (application No. CN202210908795.0) and is not repeated here.
Construct a matrix E of size (m_0+1) × (N+1), where m_0+1 is the number of display sub-intervals of the real scene in the combined scene and N is the number of virtual sub-scenes contained in the combined scene ŝ. All elements of E are initially set to 0. Traverse the occlusion relations between all display sub-intervals of the real scene in the combined scene and the display intervals of all virtual sub-scenes; if, under the current user pose, the computed occlusion relation between a real-scene display sub-interval Ω′_{0,r} and a virtual sub-scene display interval Ω′_i is that Ω′_{0,r} occludes Ω′_i, set E(r, i) = 1.
Construct a matrix E′ of size (N+1) × (N+1), where N is the number of virtual sub-scenes contained in the combined scene ŝ; all elements of E′ are initialized to 0. According to paragraph 0157 of the description of the patent application CN202210908795.0, obtain the occlusion relations between the display intervals of all virtual sub-scenes in the combined scene ŝ. Traverse these occlusion relations: for any virtual sub-scenes s_i and s_j with display intervals Ω′_i and Ω′_j in the combined scene, if under the current user pose Ω′_i unidirectionally occludes Ω′_j, set E′(i, j) = 1, otherwise set E′(j, i) = 1.
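A sketch of assembling E and E′ for the simple case in which every display interval and display sub-interval is an axis-aligned box, so that a separating plane can be found along a coordinate axis; which side of that plane the user stands on decides the unidirectional occlusion. The box representation and helper names are assumptions.

import numpy as np

def separating_plane(box_a, box_b):
    """Axis-aligned separating plane between two disjoint boxes (min, max pairs).
    Returns (axis, offset, sign) for the plane {x[axis] = offset}, with sign
    chosen so that sign * (x[axis] - offset) > 0 on box_a's side."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    for axis in range(3):
        if a_max[axis] <= b_min[axis]:
            return axis, 0.5 * (a_max[axis] + b_min[axis]), -1.0
        if b_max[axis] <= a_min[axis]:
            return axis, 0.5 * (b_max[axis] + a_min[axis]), +1.0
    raise ValueError("boxes overlap: no axis-aligned separating plane")

def a_occludes_b(user_pos, box_a, box_b):
    """True if interval A unidirectionally occludes interval B for this user position."""
    axis, offset, sign = separating_plane(box_a, box_b)
    return sign * (user_pos[axis] - offset) > 0

def build_occlusion_matrices(user_pos, real_subintervals, virtual_displays):
    """E[r, i] = 1 when real display sub-interval r occludes virtual display
    interval i; E2[i, j] = 1 when virtual display interval i occludes j.
    Virtual sub-scenes are numbered from 1, as in the text."""
    m0, n = len(real_subintervals) - 1, len(virtual_displays)
    E = np.zeros((m0 + 1, n + 1), dtype=np.uint8)
    E2 = np.zeros((n + 1, n + 1), dtype=np.uint8)
    for r, rb in enumerate(real_subintervals):
        for i, vb in enumerate(virtual_displays, start=1):
            if a_occludes_b(user_pos, rb, vb):
                E[r, i] = 1
    for i, vi in enumerate(virtual_displays, start=1):
        for j, vj in enumerate(virtual_displays, start=1):
            if i != j and a_occludes_b(user_pos, vi, vj):
                E2[i, j] = 1
    return E, E2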
Step S203 is specifically implemented as follows:
Let the left- and right-eye images of the synthesized combined-scene binocular stereoscopic experience picture be Î^L and Î^R. Define two two-dimensional auxiliary matrices Q_L and Q_R: Q_L has the same numbers of rows and columns as Î^L and records, for each pixel of Î^L, from which virtual sub-scene, or from which display sub-interval of the real scene, the pixel value is taken; Q_R has the same numbers of rows and columns as Î^R and records the same information for Î^R.
The implementation of step S203 further specifically includes the following steps:
s2031, synthesizing all virtual sub-scene experience pictures to obtain a virtual scene synthesis experience picture;
s2032, synthesizing the real scene experience picture and the virtual scene synthesis experience picture to obtain a combined scene experience picture.
Step S2031 is specifically implemented as follows:
Let all elements of Q_L initially be 0, and let the left-eye image of the virtual-scene synthesis experience picture obtained by synthesizing all virtual sub-scene experience pictures be Ĩ^L. Initially set Ĩ^L = I^L_1 and, for every pixel (u, v) at which I^L_1(u, v) ≠ τ_null, i.e. at which there is imaging of non-sky scene content, set Q_L(u, v) = 1, where I^L_1 is the left-eye image of the experience picture of virtual sub-scene s_1. Then traverse all other virtual sub-scenes of the combined scene: for any virtual sub-scene s_i with experience-picture left-eye image I^L_i, traverse all its pixels; if I^L_i(u, v) ≠ τ_null, then when Q_L(u, v) is 0, set Ĩ^L(u, v) = I^L_i(u, v) and Q_L(u, v) = i; when Q_L(u, v) is not 0, the occlusion relation between the three-dimensional display interval, in the combined scene ŝ, of the scene recorded in Q_L(u, v) and the three-dimensional display interval of s_i must be retrieved from E′: query the element E′(Q_L(u, v), i); if its value is 1, do nothing; if its value is 0, set Ĩ^L(u, v) = I^L_i(u, v) and Q_L(u, v) = i. When the left-eye images of all virtual sub-scenes have been traversed pixel by pixel, the left-eye synthesized experience-picture image of the virtual scene has been obtained according to the occlusion relations. The right-eye synthesized experience-picture image of the virtual scene is obtained in the same way.
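A sketch of step S2031 for one eye, reusing the conventions above (τ_null marks empty pixels, E′ is the virtual-virtual occlusion matrix). The per-pixel Python loop is for clarity, not performance, and the data layout is an assumption.

import numpy as np

def composite_virtual(subscene_images, E2, tau_null):
    """subscene_images[i-1] is the experience-picture image of virtual sub-scene
    s_i.  Returns the synthesized virtual-scene image and the auxiliary matrix
    Q recording, per pixel, which sub-scene supplied the value (0 = none)."""
    h, w = subscene_images[0].shape[:2]
    out = np.empty_like(subscene_images[0])
    out[:] = tau_null
    Q = np.zeros((h, w), dtype=np.int32)
    for idx, img in enumerate(subscene_images, start=1):
        valid = np.any(img != tau_null, axis=-1)
        for u, v in zip(*np.nonzero(valid)):
            k = Q[u, v]
            # write the pixel if nothing is there yet, or if the sub-scene
            # already recorded there does not occlude s_idx (E'[k, idx] == 0)
            if k == 0 or E2[k, idx] == 0:
                out[u, v] = img[u, v]
                Q[u, v] = idx
    return out, Q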
Step S2032 is specifically implemented as follows:
The left-eye image Î^L of the combined-scene experience picture is initialized to the virtual-scene synthesis left-eye image obtained in step S2031. Traverse every pixel of the left-eye image I^L_0 of the stereoscopic experience picture of the real scene s_0. For any pixel (u, v): if Q_L(u, v) is not 0, E_{0,0}(u, v) is not -1, and E(E_{0,0}(u, v), Q_L(u, v)) has the value 0, i.e. the pixel images real-scene content lying in a display sub-interval of the combined scene ŝ that is occluded by the display interval of the virtual sub-scene recorded in Q_L(u, v), the pixel is not processed; otherwise set Î^L(u, v) = I^L_0(u, v). After all pixels have been traversed, Î^L is the left-eye image of the synthesized combined-scene experience picture. The right-eye image Î^R of the combined-scene experience picture is synthesized in the same way.
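A matching sketch of step S2032 for one eye: depth_sub plays the role of E_{0,0} (or E_{0,1}), Q and E are the matrices built earlier, and virtual_img is the output of the S2031 sketch. Again a plain per-pixel loop, with assumed data layouts.

def composite_with_real(real_img, depth_sub, virtual_img, Q, E):
    """Overlay the real-scene experience picture on the synthesized virtual image.

    A real pixel is skipped only when virtual content has been drawn there
    (Q != 0), its real-scene display sub-interval is known (depth_sub != -1)
    and that sub-interval does not occlude the virtual display interval
    recorded in Q (E[r, k] == 0); otherwise the real pixel wins.
    """
    out = virtual_img.copy()
    h, w = real_img.shape[:2]
    for u in range(h):
        for v in range(w):
            k, r = Q[u, v], depth_sub[u, v]
            if k != 0 and r != -1 and E[r, k] == 0:
                continue
            out[u, v] = real_img[u, v]
    return out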
Second embodiment:
in the first example, the value interval according to the pixel depth value cannot ensure that the real scene is correctly identified in the display sub-interval of the combined scene, especially when the value interval of the pixel depth value intersects with the display sub-intervals of the plurality of real scenes in the combined scene. The present embodiment improves on the first embodiment. The improvement is as follows.
In step S00, the method further comprises marking the object categories contained in each display subinterval Ω′_0,0, Ω′_0,1, ... of the real scene in the combined scene. The set of object categories contained in any display subinterval Ω′_0,i is denoted Γ_i.
The specific implementation of step S201 further includes the following steps:
Perform semantic segmentation on the left-eye and right-eye images of the real-scene stereoscopic experience picture to obtain the category to which the imaging content of each pixel belongs. Semantic segmentation of images is a technique well known to those skilled in the art; for example, Hu, Yubin et al. provide an effective neural-network-based image semantic segmentation method [2], which is not described in detail here. The category to which the imaging content of pixel (u, v) of the left-eye image belongs is denoted C_L(u, v), and the category to which the imaging content of pixel (u, v) of the right-eye image belongs is denoted C_R(u, v).
Take pixel (u, v) of the left-eye image as an example. Under the current pose [W_j Q_j] of user p, take the eye optical center as the starting point (for a pixel of the left-eye image, the left-eye optical center), connect it with the image point of (u, v) on the display surface, and extend the line to obtain a ray L. Compute the display subintervals of the real scene in the combined scene through which ray L passes; denote this set of real-scene display subintervals by Λ, where M_0 is the number of real-scene display subintervals traversed by L. Traverse all display subintervals in Λ; if there is a display subinterval Ω′_0,i in Λ satisfying C_L(u, v) ∈ Γ_i, set E_0,0(u, v) = i, where Γ_i is the set of object categories contained in the display subinterval Ω′_0,i. By traversing all pixels of the left-eye image of the real-scene stereoscopic experience picture in this way, the display subinterval of the combined scene to which each pixel belongs is obtained. Similarly, the display subinterval of the combined scene to which each pixel of the right-eye image of the real-scene stereoscopic experience picture belongs can be identified and recorded in E_0,1.
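A minimal sketch of this identification step follows; ray_intersects, the container names, and the way the ray is formed from the optical center and the pixel's image point are all assumptions made for illustration.

```python
import numpy as np

def identify_subinterval_by_category(category, eye_origin, pixel_point,
                                     subintervals, category_sets, ray_intersects):
    """Sketch of assigning a real-scene pixel to a display subinterval.

    category is C_L(u, v) from semantic segmentation, subintervals[k] is the
    k-th real-scene display subinterval in the combined scene, category_sets[k]
    is its object-category set Gamma_k, and ray_intersects(origin, direction,
    interval) is an assumed geometric predicate for the ray L.
    """
    direction = np.asarray(pixel_point, dtype=float) - np.asarray(eye_origin, dtype=float)
    for k, omega in enumerate(subintervals):
        if ray_intersects(eye_origin, direction, omega) and category in category_sets[k]:
            return k          # record E_{0,0}(u, v) = k
    return -1                 # no subinterval identified for this pixel
```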
The beneficial effects obtained through this improvement are: when the display subinterval of the real scene in the combined scene cannot be correctly identified from the value interval of the pixel depth value, the method of this embodiment can still identify it from the object category information of the imaging content of each pixel.
Third embodiment
The position of a person in the real scene within the combined scene changes continuously; the person may enter the display interval of a virtual sub-scene in the combined scene, and the occlusion relation between the real-scene person and the content of the virtual sub-scene then cannot be obtained from the occlusion relation between the display subinterval of the real scene and the virtual sub-scene. This embodiment improves on the second embodiment: step S201 further comprises, after performing semantic segmentation on the left-eye and right-eye images to obtain the category to which the imaging content of each pixel belongs, matting out the pixels whose category is person and then performing image inpainting on the left-eye and right-eye images. Image inpainting is a technique well known to those skilled in the art; for example, Xu Rui et al. give a video inpainting method, "Deep flow-guided video inpainting" [4], which is not described in detail here. In this way persons in the real-scene picture are removed and occlusion calculation between persons and the virtual sub-scenes is avoided. Other moving objects in the real scene can be eliminated by the same method.
The beneficial effects are as follows: erroneous occlusion display between persons and virtual sub-scenes in the real-scene experience picture is avoided.
Fourth embodiment
This embodiment calculates the occlusion relation between persons in the real scene and the content of the virtual sub-scenes, at the cost of generating and transmitting depth images for the virtual sub-scenes in the combined scene. It improves on the second embodiment as follows. In step S10, every virtual sub-scene s_i also generates a depth image corresponding to its experience picture. In step S201, after semantic segmentation of the left-eye and right-eye images yields the category to which the imaging content of each pixel belongs, the pixels whose imaging content is a person are marked. In step S2032, for the pixels of the left-eye and right-eye images marked as person, the depth values of the corresponding depth image of the real-scene experience picture are compared with the depth values of the corresponding pixels of the virtual scene synthesis experience picture image, yielding the occlusion relation between the imaging pixels of real-scene persons and the corresponding pixels of the virtual scene synthesis experience picture, and hence the pixel values of the synthesized combined scene experience picture image.
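For illustration, the depth comparison described above might look like the following sketch; the mask and depth arrays are assumed inputs.

```python
def resolve_person_occlusion(out, person_mask, real_img, real_depth,
                             virtual_img, virtual_depth):
    """Sketch of the per-pixel depth test for person pixels (names assumed).

    For pixels marked as person in the real-scene experience picture, the
    nearer of the real-scene content and the virtual scene synthesis content
    is kept in the combined scene experience picture.
    """
    h, w = person_mask.shape
    for u in range(h):
        for v in range(w):
            if not person_mask[u, v]:
                continue
            if real_depth[u, v] <= virtual_depth[u, v]:
                out[u, v] = real_img[u, v]      # person is in front of virtual content
            else:
                out[u, v] = virtual_img[u, v]   # virtual content occludes the person
    return out
```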
The beneficial effects are as follows: occlusion calculation is performed between persons in the real scene and the virtual scene content. However, the accuracy of the depth values of the real-scene experience picture cannot be guaranteed, so the occlusion calculation result may deviate considerably; moreover, the virtual sub-scenes need to generate and transmit depth images, which increases the occupation of network bandwidth.
Fifth embodiment
The fourth embodiment is modified as follows: in step S10, only the virtual sub-scenes in the set of sub-scenes required to generate depth images generate a depth image while generating the experience picture; in step S201, for each pixel of the real-scene experience picture whose imaging content is a person, it is determined from the pixel's depth value whether the imaged object point lies within the display interval of a virtual sub-scene in the combined scene, that is, whether the real-scene person has entered that virtual sub-scene; a virtual sub-scene that a real-scene person has entered is set to generate a depth image, otherwise it is set not to generate one.
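The selection rule of this embodiment can be sketched as follows; person_points, the interval container, and the contains predicate are assumptions used only to make the idea concrete.

```python
def select_depth_scenes(person_points, virtual_display_intervals, contains):
    """Sketch of deciding which virtual sub-scenes must generate depth images.

    person_points are 3-D points reconstructed from person pixels of the
    real-scene experience picture, virtual_display_intervals maps a virtual
    sub-scene index i to its display interval in the combined scene, and
    contains(interval, point) is an assumed membership test.
    """
    need_depth = set()
    for i, interval in virtual_display_intervals.items():
        if any(contains(interval, p) for p in person_points):
            need_depth.add(i)   # a real-scene person has entered s_i
    return need_depth
```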
The beneficial effects are as follows: compared with the fourth embodiment, the number of virtual sub-scenes required to generate and transmit depth images can be reduced.
Sixth embodiment
In the second and third embodiments, if the same object category exists in several display subintervals of the real scene, the display subinterval to which a pixel's imaging content belongs may not be identifiable from the category of the pixel's imaging content. This embodiment improves on the first and second embodiments; the specific improvements are as follows.
In step S00, the method further comprises marking the object instances contained in each display subinterval Ω′_0,0, Ω′_0,1, ... of the real scene in the combined scene. The set of object instances contained in any display subinterval Ω′_0,i is denoted Z_i.
The specific implementation of step S201 further includes the following steps:
Perform instance segmentation on the left-eye and right-eye images of the real-scene stereoscopic experience picture to obtain the instance to which the imaging content of each pixel belongs. Image instance segmentation is a technique well known to those skilled in the art; for example, Li, Ruihuang et al. give an effective image instance segmentation method [3], which is not described in detail here. The instance to which the imaging content of pixel (u, v) of the left-eye image belongs is denoted O_L(u, v), and the instance to which the imaging content of pixel (u, v) of the right-eye image belongs is denoted O_R(u, v).
Take pixel (u, v) of the left-eye image as an example. Under the current pose [W_j Q_j] of user p, take the eye optical center as the starting point (for a pixel of the left-eye image, the left-eye optical center), connect it with the image point of (u, v) on the display surface, and extend the line to obtain a ray L. Compute the display subintervals of the real scene in the combined scene through which ray L passes; denote this set of real-scene display subintervals by Λ, where M_0 is the number of real-scene display subintervals traversed by L. Traverse all display subintervals in Λ; if there is a display subinterval Ω′_0,i in Λ satisfying O_L(u, v) ∈ Z_i, set E_0,0(u, v) = i, where Z_i is the set of object instances contained in the display subinterval Ω′_0,i. By traversing all pixels of the left-eye image of the real-scene stereoscopic experience picture in this way, the display subinterval of the combined scene to which each pixel belongs is obtained. Similarly, the display subinterval of the combined scene to which each pixel of the right-eye image of the real-scene stereoscopic experience picture belongs can be identified and recorded in E_0,1. Because the display subintervals of the real scene in the combined scene are set so that any instance is guaranteed to exist in only one subinterval (a large object can also be divided into several instances so that a single instance lies in only a single subinterval), the display subinterval of the combined scene to which each pixel of the real-scene stereoscopic experience picture belongs can always be identified.
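The instance-based variant differs from the category-based sketch given earlier only in the membership test, as the following illustrative sketch shows; all names are again assumptions.

```python
import numpy as np

def identify_subinterval_by_instance(instance_id, eye_origin, pixel_point,
                                     subintervals, instance_sets, ray_intersects):
    """Sketch of the sixth embodiment's identification step.

    instance_id is O_L(u, v) from instance segmentation and instance_sets[k]
    is the instance set Z_k of the k-th real-scene display subinterval; since
    every instance is assigned to exactly one subinterval, the result is
    unambiguous whenever the ray test succeeds.
    """
    direction = np.asarray(pixel_point, dtype=float) - np.asarray(eye_origin, dtype=float)
    for k, omega in enumerate(subintervals):
        if ray_intersects(eye_origin, direction, omega) and instance_id in instance_sets[k]:
            return k          # record E_{0,0}(u, v) = k
    return -1
```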
Seventh embodiment
When the pose information of a person can be acquired through a wearable device with a positioning function, more accurate depth information of the person can be obtained from the person's position in the real scene. The sixth embodiment is modified as follows.
Step S201 further comprises: instance segmentation can mark the person ID corresponding to each person pixel, and the position in the combined scene of any person p_k can be obtained through the positioning function of the wearable device or by other positioning methods; from this position and the current pose [W_j Q_j] of user p, the depth value d_k of person p_k relative to user p can be obtained. When the depth value exceeds the effective acquisition range of the depth camera of the XR terminal, d_k is assigned to the pixels of the left-eye and right-eye images whose imaging content is person p_k. Step S2032 further comprises: for the pixels of the left-eye and right-eye images whose imaging content is a real-scene person, the depth value of the pixel is compared with the depth value of the corresponding pixel of the virtual scene experience picture, yielding the occlusion relation between the imaging pixels of the real-scene person and the corresponding pixels of the virtual scene experience picture.
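As a rough illustration of how d_k could be derived from the person's position and the user's pose, consider the sketch below; the rotation-matrix representation of the pose and the choice of +z as the viewing axis are assumptions.

```python
import numpy as np

def person_depth(person_position, user_position, user_rotation):
    """Sketch of computing the depth d_k of person p_k relative to user p.

    person_position and user_position are positions in the combined scene,
    user_rotation is a 3x3 rotation matrix for the user's current pose
    [W_j Q_j]; the depth is the forward component of the person's position
    expressed in the user's camera frame (assumed viewing axis: +z).
    """
    offset_world = np.asarray(person_position, dtype=float) - np.asarray(user_position, dtype=float)
    offset_cam = np.asarray(user_rotation).T @ offset_world   # world -> user frame
    return float(offset_cam[2])
```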
Eighth embodiment:
In the first to seventh embodiments, the experience picture images of all virtual sub-scenes must be transmitted to the terminal before image synthesis is performed. This embodiment improves on that: the experience picture images of all virtual sub-scenes are first synthesized at the cloud, and the synthesized image is then transmitted to the user terminal to be synthesized with the user's real-scene experience picture, yielding the final virtual-real fused combined scene experience picture.
As specifically shown in fig. 9, the image synthesis module of the system of the first embodiment is split into a virtual scene image synthesis module 70 and a virtual-real scene image synthesis module 20 in the system of this embodiment. The virtual scene image synthesis module 70 is deployed at the cloud, and the virtual-real scene image synthesis module 20 is deployed at the user terminal. The virtual scene image synthesis module 70 executes step S2031, and the virtual-real scene image synthesis module 20 executes step S2032.
The beneficial effects are as follows: the amount of image data transmitted to the user terminal by the cloud is reduced.
Ninth embodiment:
When the combined scene consists entirely of virtual sub-scenes and contains no real scene, and in particular when a virtual sub-scene's display region is non-convex and is not one-way occluded with the other virtual sub-scenes, a plurality of display subintervals can be constructed for that virtual sub-scene in the combined scene. This improves on the first embodiment as follows: in step S00, display subintervals of the virtual sub-scene in the combined scene are set such that each display subinterval is one-way occluded with the display intervals or display subintervals of the other virtual sub-scenes in the combined scene; in step S10, for a virtual sub-scene with display subintervals set, the display subinterval to which the imaging content of each pixel of the generated experience picture image belongs must be marked; in step S203, the occlusion relation between pixels is obtained from the occlusion relation between display intervals, between a display interval and a display subinterval, or between display subintervals of the virtual scenes, and image synthesis is then performed.
Tenth embodiment:
The experience picture generation methods of the first to ninth embodiments may be used to synthesize the experience pictures of only part of the sub-scenes in the combined scene; the resulting semi-finished experience picture can then be synthesized with the experience pictures of the other sub-scenes in the combined scene using the combined scene experience picture generation methods given in application numbers CN202210908795.0, 2023110288353, 2023110286220, and the like. Alternatively, part of the sub-scene experience pictures may be synthesized by those other combined scene experience picture generation methods, and the resulting semi-finished picture is then synthesized with the remaining sub-scene experience pictures using the experience picture generation methods of the first to ninth embodiments.
The XR combined scene experience picture generation method of the present invention has the following beneficial effects: the method obtains the pixel occlusion relation by identifying the display subinterval of the combined scene to which each pixel of the real-scene experience picture belongs, and derives the occlusion relation between pixels from the occlusion relations between that display subinterval and the display intervals or display subintervals of the other sub-scenes. This avoids obtaining occlusion relations by directly comparing depth information of poor accuracy, and thus better achieves correct occlusion of the scene contents in the combined scene experience picture.
To achieve the above object, the present invention also proposes an XR combined scene experience picture generation system, the system comprising a memory, a processor and an XR combined scene experience picture generation method program stored on the processor, the XR combined scene experience picture generation method program being executed by the processor to perform the steps of the method as described above.
In order to achieve the above objective, the present invention further proposes an XR combined scene experience picture generation terminal, which executes step S203 or step S2031 or step S2032 of the XR combined scene experience picture generation method as described above.
To achieve the above object, the present invention also proposes a computer readable storage medium having stored thereon a computer program which, when invoked by a processor, performs the steps of the XR combined scene experience picture generation method as described above.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
References
[1] Tosi, Fabio, et al. "NeRF-Supervised Deep Stereo." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
[2] Hu, Yubin, et al. "Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
[3] Li, Ruihuang, et al. "SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
[4] Xu, Rui, et al. "Deep Flow-Guided Video Inpainting." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 3723-3732.

Claims (10)

1. An augmented reality (XR) combined scene experience picture generation method, the method comprising:
step S10, generating experience pictures of all sub-scenes of the combined scene, wherein for any virtual sub-scene s_i, the pose of user p in the coordinate system of s_i is acquired in real time, and the experience picture of user p for s_i is generated according to the real-time pose, the imaging interval of s_i, and the pupil distance of user p in the scene space of s_i;
step S20, synthesizing the binocular stereoscopic experience pictures of the sub-scenes into a combined scene experience picture through occlusion calculation, wherein for any two pixels τ_i and τ_j that lie on a common user line of sight but belong to experience picture images of different sub-scenes, if both τ_i and τ_j contain imaging of scene content in a non-sky area, the occlusion relation between τ_i and τ_j is calculated as follows: the occlusion relation between the display interval, in the combined scene, of the sub-scene to which τ_i belongs and the display interval, in the combined scene, of the sub-scene to which τ_j belongs is assigned to τ_i and τ_j as their occlusion relation;
wherein a single sub-scene may have a plurality of display subintervals in the combined scene, and any sub-scene of the combined scene is a virtual scene or a real scene; for a combined scene sub-scene s_i having a plurality of display subintervals in said combined scene, when any pixel τ of its experience picture calculates the occlusion relation with pixels of other combined scene sub-scenes, the display subinterval to which τ belongs needs to be determined, and the occlusion relation between pixels is obtained from the occlusion relation between the display subinterval to which τ belongs and the display intervals or display subintervals of the other combined scene sub-scenes.
2. The generation method according to claim 1, wherein the combined scene includes a real scene, and the step S10 is preceded by a step S00 of setting combined scene experience picture generation parameters, including setting the imaging interval of each virtual sub-scene, the pupil distance of user p in each virtual sub-scene, the rotation-translation transformation relation between the combined scene coordinate system and each virtual sub-scene coordinate system, and the display interval of each virtual sub-scene in the combined scene, and setting a plurality of display subintervals of the real scene in the combined scene.
3. The generating method according to claim 2, wherein the step S20 includes:
S201, identifying the display subinterval of the combined scene to which each pixel of the real-scene stereoscopic experience picture belongs;
S202, obtaining, under the current user pose, the occlusion relation between each display subinterval of the real scene in the combined scene and the display interval of each virtual sub-scene;
S203, synthesizing the real-scene experience picture and the virtual sub-scene experience pictures according to the occlusion relations.
4. The generation method according to claim 3, wherein the step S201 is specifically implemented as: for any pixel τ belonging to the real-scene experience picture, calculating the set of display subintervals of the real scene in the combined scene that are passed through by the user line of sight on which τ lies; when the error range of the depth value of τ intersects the depth value range, along that line of sight, of any display subinterval Ω′_0,ki in the set, τ belongs to the display subinterval Ω′_0,ki.
5. The generation method according to claim 3, wherein the step S00 further comprises marking the object categories contained in each display subinterval of the real scene in the combined scene, and the step S201 is specifically implemented as: performing semantic segmentation on the real-scene experience picture image to obtain the category to which the imaging content of each pixel belongs; for any pixel τ belonging to the real-scene experience picture, calculating the set of display subintervals of the real scene in the combined scene that are passed through by the user line of sight on which τ lies; when the category to which the imaging content of τ belongs is among the object categories contained in any display subinterval Ω′_0,ki in the set, τ belongs to the display subinterval Ω′_0,ki.
6. The generation method according to claim 3, wherein the step S00 further comprises marking the object instances contained in each display subinterval of the real scene in the combined scene, and the step S201 is specifically implemented as: performing instance segmentation on the real-scene experience picture image to obtain the instance to which the imaging content of each pixel belongs; for any pixel τ belonging to the real-scene experience picture, calculating the set of display subintervals of the real scene in the combined scene that are passed through by the user line of sight on which τ lies; when the instance to which the imaging content of τ belongs is among the object instances contained in any display subinterval Ω′_0,ki in the set, τ belongs to the display subinterval Ω′_0,ki.
7. The generation method according to any one of claims 2 to 6, wherein the step S00 further comprises ensuring that the display interval set for each virtual sub-scene in the combined scene and each display subinterval of the real scene are respectively one-way occluded with each other; the step S20 handles the occlusion relation between real-scene persons and virtual sub-scene content either by matting the real-scene persons out of the real-scene experience picture and inpainting the image, or by obtaining the occlusion relation from a comparison of depth information; and the step S203 comprises:
S2031, synthesizing all virtual sub-scene experience pictures to obtain a virtual scene synthesis experience picture;
s2032, synthesizing the real scene experience picture and the virtual scene synthesis experience picture to obtain a combined scene experience picture.
8. An XR combined scene experience picture generation system, comprising a memory, a processor, and an XR combined scene experience picture generation method program stored on the processor, which when executed by the processor performs the steps of the method of any one of claims 1 to 7.
9. An XR combined scene experience picture generation terminal performing step S203 or step S2031 or step S2032 of the method according to any of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program which when invoked by a processor performs the steps of the method according to any one of claims 1 to 7.