WO2024147184A1

WO2024147184A1 - Virtual space display system, terminal device, and virtual space display program

Info

Publication number: WO2024147184A1
Application number: PCT/JP2023/000088
Authority: WO
Inventors: 広夢宮下; 美紀北端; 翔平松尾; 涼平西條
Original assignee: 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority date: 2023-01-05
Filing date: 2023-01-05
Publication date: 2024-07-11

Abstract

This virtual space display system includes a terminal device and an information processing device. The terminal device uses an image frame output by an imaging device to derive at least one of the following: a person region recognized as a person, an object region recognized as an object, and a designated region in which a predetermined region of the image frame is designated. It uses at least one of these regions to set a predetermined alpha value in the image frame. It then synthesizes a display image from the resulting real-space image frame and a virtual-space image frame received from the information processing device. The information processing device uses the real-space image frame received from the terminal device, together with the object region or the person region, to generate a virtual object. It adjusts the predetermined position and shape of the virtual object in the virtual space, and it draws the real-space image frame and the virtual object superimposed on the virtual space.

Description

VIRTUAL SPACE DISPLAY SYSTEM, TERMINAL DEVICE, AND VIRTUAL SPACE DISPLAY PROGRAM

The present invention relates to a virtual space display system, a terminal device, and a virtual space display program.

When users at different locations communicate with each other via video, a known approach makes conversation easier by constructing a virtual space on a computer (hereafter, "virtual space") and expressing the sense of distance and the lines of sight between the users. Specifically, a method has been proposed in which the position of each location is set within the virtual space. The way video and audio captured at one location are seen and heard is then edited or processed based on the position of the other location relative to that location.

For example, in conventional technology, the regions showing people or objects are extracted from the video captured at each location. The video is rotated based on the position and direction of the user's line of sight and pasted into the virtual space. A method is also known in which video of the virtual space is rendered by applying a perspective projection transformation that corresponds to the position of the display of the terminal device operated by the user (see Patent Document 1).

Patent Document 1: Patent No. 5798536

However, conventional technology has the problem that it is difficult to provide high-quality communication to users participating in virtual spaces.

For example, conventional technology extracts only the region of a person from a captured image and pastes it into a virtual space, making it appear as if the user is actually at that point in the virtual space. However, if subject extraction is inaccurate, image quality can degrade at the boundary between the person and other regions (e.g., graininess or blurring), which degrades the users' communication experience.

In addition, conventional technologies do not always take into account locations or users that participate in a virtual space without wearing a head-mounted display (hereinafter, "HMD"). Consider a user who participates in the virtual space through real-world video captured with a camera and microphone. Conventional technologies do not let this user interfere with a user immersed in the virtual space via an HMD, or with the avatar that represents that user in the virtual space. Here, "interfere" means to physically interact, for example by touching or handing over a virtual object. This makes high-quality communication difficult to provide.

The present invention has been made in view of the above circumstances. It aims to provide technology that realizes high-quality communication involving video and a virtual space in two ways: by suppressing the image-quality degradation caused by subject extraction, and by enabling interaction between users who participate from real space via video and users who participate in the virtual space.

In order to solve the above problems and achieve the object, the virtual space display system of the present invention displays a virtual space and includes a terminal device and an information processing device.

The terminal device has the following units:
a calculation unit that uses an image frame output from an imaging device to calculate at least one of a person area recognized as a person in the image frame, an object area recognized as an object, and a designated area in which a predetermined area is designated;
a setting unit that sets a predetermined alpha value in the image frame using at least one of the person area, the object area, and the designated area; and
a synthesis unit that synthesizes a display image based on a real-space image frame (the image frame in which the alpha value has been set) and a virtual-space image frame received from the information processing device.

The information processing device has the following units:
a generation unit that generates a virtual object using the real-space image frame received from the terminal device and the object area or the person area;
an adjustment unit that adjusts the predetermined position and shape of the virtual object in the virtual space; and
a drawing unit that draws the real-space image frame and the virtual object superimposed on the virtual space.

The present invention has the effect of enabling high-quality communication to be provided to users participating in a virtual space.

FIG. 1 is a diagram showing an overview of a virtual space display method according to the embodiment.
FIG. 2 is a diagram showing an overview of a virtual space according to the embodiment.
FIG. 3 is a diagram illustrating an example of an interaction in a virtual space according to the embodiment.
FIG. 4 is a diagram illustrating an example of functional blocks of the virtual space display system according to the embodiment.
FIG. 5 is a diagram illustrating an example of a device configuration of the virtual space display system according to the embodiment.
FIG. 6 is a diagram showing an overview of area information according to the embodiment.
FIG. 7 is a diagram showing an overview of setting the sum of alpha values for an image frame according to the embodiment.
FIG. 8 is a diagram illustrating an example of compositing a real-space image frame and a virtual-space image frame according to the embodiment.
FIG. 9 is a diagram showing an overview of overlaying a real-space image layer and a virtual-space image frame according to the embodiment.
FIG. 10 is a diagram showing an outline of a bone according to the embodiment.
FIG. 11 is a diagram illustrating an example of a flowchart of a processing procedure of the terminal device according to the embodiment.
FIG. 12 is a diagram illustrating an example of a flowchart of a processing procedure of the information processing device according to the embodiment.
FIG. 13 is a diagram showing an example of a computer on which the constituent devices of the virtual space display system according to the embodiment are realized.

Below, the form for carrying out the present invention (hereinafter, "embodiments") will be described with reference to the drawings. Note that each embodiment is not limited to the contents described below.

[1. Overall Overview]
The virtual space display system 1 of the present embodiment provides a video communication technology for users at remote locations, or users in a virtual space, who communicate with each other via video. Specifically, in communication that connects real video and a virtual space, the virtual space display system 1 provides high-quality communication by suppressing the degradation of video quality caused by extracting a subject from the real video. Furthermore, the virtual space display system 1 enables interference (physical interaction) with a user immersed in the virtual space and with the avatar that represents that user in the virtual space, improving the communication experience in the virtual space.

The outline of this embodiment will now be further explained using FIG. 1. FIG. 1 is a diagram showing an outline of a virtual space display method according to an embodiment.

The upper part of FIG. 1 shows a base that participates in communication from real space via video (hereafter, "real-space base 10R"). The lower part of FIG. 1 shows a base that participates in communication from the virtual space using an HMD or the like (hereafter, "virtual-space base 10V"). The terminal device 100 of the real-space base 10R and the information processing device 200 of the virtual-space base 10V are connected by a predetermined network N ((1) in FIG. 1).

At the real-space base 10R, the user 20R uses a display device 30R (e.g., a large-screen display) to communicate with other bases. An imaging device 31R (e.g., a camera) at the real-space base 10R outputs an image of the user 20R to the terminal device 100 ((2) in FIG. 1). Conversely, the display device 30R presents a predetermined image to the user 20R based on information output from the terminal device 100 ((3) in FIG. 1).

At the virtual-space base 10V, the user 20V wears the HMD 32 and is immersed in the virtual space, thereby communicating with other bases. A posture acquisition sensor 32a built into the HMD 32 (for example, a 6DoF sensor that acquires the position and inclination of a controller) acquires information on the position, movement, inclination, and so on of the user 20V's head. It outputs predetermined information to the information processing device 200 ((4-1) in FIG. 1). Meanwhile, a posture acquisition device 33 at the virtual-space base 10V acquires information on the position, movement, inclination, and so on of the user 20V's hands, and outputs predetermined information to the information processing device 200 ((4-2) in FIG. 1).

The information processing device 200 performs rendering in the virtual space based on the information about the posture of the user 20V output by the posture acquisition sensor 32a and the posture acquisition device 33 ((5) in FIG. 1). The virtual space display system 1 then provides communication and interaction via the virtual space between the user 20R at the real space base 10R and the user 20V at the virtual space base 10V.

In addition, the virtual space display system 1 can output tactile information, such as vibrations based on interference occurring in the virtual space ((6) in FIG. 1). This output goes through a tactile output device 33a built into the posture acquisition device 33 or the like, for example a vibration motor built into the controller, or a separately connected haptic device such as a haptic suit.

Next, the positional relationships of virtual objects and the like in the virtual space generated by the information processing device 200 and an overview of the virtual space as seen from the HMD will be described with reference to FIG. 2. FIG. 2 is a diagram showing an overview of the virtual space according to the embodiment.

As shown in FIG. 2, the virtual space 40 is displayed as a virtual space observed from the perspective of the avatar of the user 20V. The virtual space 40 is configured such that a virtual display device 30V is installed on the ground 50.

Here, the positional relationship of virtual objects in virtual space 40 will be explained using virtual space 41, which is an overhead view of virtual space 40. Virtual space 41 contains the virtual ground 50, on which an avatar 51, a virtual display device 30V, and a virtual imaging device 31V are placed. Specifically, the virtual display device 30V stands on the ground 50 of virtual space 41. The virtual imaging device 31V is placed behind the virtual display device 30V (in the depth direction) as seen from the avatar 51 of user 20V.
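The placement rule above puts virtual imaging device 31V behind virtual display device 30V, looking back through it at avatar 51. It can be sketched as simple vector arithmetic. This is a minimal Python sketch, not the embodiment's implementation. It assumes the camera lies on the straight line from the avatar through the display; the function names and the `distance_behind` parameter are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def place_virtual_camera(avatar_pos: Vec3, display_pos: Vec3, distance_behind: float) -> Vec3:
    """Place a camera on the avatar-to-display line, `distance_behind` units past the display."""
    direction = tuple(d - a for a, d in zip(avatar_pos, display_pos))
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        raise ValueError("avatar and display must not coincide")
    unit = tuple(c / length for c in direction)
    # Step further along the viewing direction so the display sits between avatar and camera.
    return tuple(p + u * distance_behind for p, u in zip(display_pos, unit))


def look_direction(camera_pos: Vec3, target_pos: Vec3) -> Vec3:
    """Unit vector from the camera toward the target (here, the avatar)."""
    delta = tuple(t - c for c, t in zip(camera_pos, target_pos))
    length = math.sqrt(sum(c * c for c in delta))
    if length == 0.0:
        raise ValueError("camera and target must not coincide")
    return tuple(c / length for c in delta)
```

Suppose the avatar is at the origin and the display is 2 units ahead on the z-axis. A `distance_behind` of 1 then puts the camera at z = 3, looking back along -z. This matches the mirror-like view described for FIG. 3.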

The virtual display device 30V displays a composite of two frames. The first is a real-space image frame: an image frame captured in real space for which the sum of the predetermined alpha values has been set. The second is a virtual-space image frame: an image frame acquired in the virtual space with the avatar or virtual objects composited into it.

The virtual camera 31V is a virtual object placed to indicate the angle of view of a camera shooting from that location. In this embodiment, the virtual camera 31V has an angle of view for photographing the avatar 51 from the front, through the virtual display device 30V, while the avatar looks at that display. The virtual camera 31V is positioned based on a 3D model of an actual camera or the like, and does not actually take pictures.

Here, an example of a virtual space in which a real-space base and a virtual-space base can participate will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of an interaction in a virtual space according to an embodiment.

In virtual space 42 in the upper part of FIG. 3, an avatar 51 is placed in the foreground and a virtual display device 30V in the background. Although not shown in virtual space 42, a virtual camera is placed behind the virtual display device 30V. The image of the virtual space taken by this camera is composited with a real-space image frame and displayed on the virtual display device 30V. Therefore, seen from the avatar 51, the communication partner (such as a user at the real-space base) and the user's own avatar 51 appear together as if in a large mirror.

In virtual space 43 in the middle of FIG. 3, avatar 51 is shown holding a virtual object, namely book 52, which is generated based on the physical book 52 at the real-space base. In this way, the virtual space display system 1 realizes interaction by replacing an object at the real-space base (book 52) with a virtual object in the virtual space (book 52).

Furthermore, in virtual space 44 in the lower part of FIG. 3, the real-space base and the virtual-space base are superimposed in the display image. Virtual space 44 shows user 20R at the real-space base "interfering," through the virtual space, with hamburger 53 held by avatar 51 at the virtual-space base. Here, hamburger 53, a virtual object, may be set so that avatar 51 can grasp or move it. It may also be set to move or deform in response to interference by user 20R at the real-space base. Special visual effects may also be added using CG (Computer Graphics) or the like.

In this way, the virtual space display system 1 provides communication and interaction between users at real space bases and users at virtual space bases via the virtual space.

Next, a series of processing steps of the virtual space display system 1 according to this embodiment will be described with reference to FIG. 4. FIG. 4 is a diagram showing an example of functional blocks of the virtual space display system 1 according to this embodiment. Each functional unit shown in FIG. 4 has the same function as the corresponding functional unit shown in FIG. 5, described later. A detailed explanation of each unit is therefore deferred, and this section gives only an overview of the processing.

In FIG. 4, the terminal device 100 at the real-space base 10R is connected to the information processing device 200 at the virtual-space base 10V via a predetermined network N. In other words, the area above the network N in FIG. 4 is the real-space base 10R, and the area below it is the virtual-space base 10V.

First, the image capturing device 31R captures an image of a user at the real-space base 10R, and outputs image frames based on the frame rate of the captured image to the terminal device 100 ((S1) in FIG. 4). The calculation unit 131 calculates the person area and object area from the output image frames. The calculation unit 131 may also calculate a designated area.

The setting unit 132 sets a predetermined alpha value for the image frame. The transmission unit 133 transmits the image frame for which the alpha value has been set (hereinafter, an image frame for which an alpha value has been set is referred to as a "real-space image frame"), the person region, the object region, and the designated region to the information processing device 200 via the network N ((S2) in FIG. 4).

The receiving unit 231 of the information processing device 200 receives the real-space image frame, person area, object area, and designated area transmitted by the transmitting unit 133 of the terminal device 100 ((S3) in FIG. 4). The updating unit 232 updates the image displayed on the virtual display device installed in the virtual space based on the real-space image frame, person area, object area, and designated area received by the receiving unit 231 ((S4) in FIG. 4).

The determination unit 233 of the information processing device 200 receives the image frame and object area from the receiving unit 231 ((S5) in FIG. 4). It determines whether the object area is an existing object area (e.g., an object area already included in a past image frame). If the object area is new, the first generation unit 2341 generates a virtual object that imitates the shape or pattern of the object the object area represents.

Next, the first adjustment unit 2351 moves or deforms the virtual object in either of two cases: it is determined that part of an avatar (e.g., a face or hand) or a bone has interfered with the virtual object, or the user has performed some action on the virtual object via the controller. Here, a bone (hereafter, "bone") is a virtual skeleton set in a virtual object in the virtual space that provides collision detection and the like.
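One way the determination unit 233 could decide whether an object area is "existing" is to compare it with object areas kept from past frames. The sketch below assumes intersection over union (IoU) with a 0.5 threshold as the similarity test. The source does not specify the criterion, so the metric, the threshold, and the helper names are all hypothetical.

```python
from typing import Dict, List, Optional

Mask = List[List[int]]  # binary image: 255 = object pixel, 0 = background


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            in_a, in_b = pa > 0, pb > 0
            inter += in_a and in_b
            union += in_a or in_b
    return inter / union if union else 0.0


def match_existing(new_area: Mask, known_areas: Dict[int, Mask], threshold: float = 0.5) -> Optional[int]:
    """Return the ID of the best-overlapping known object area, or None if the area looks new."""
    best_id: Optional[int] = None
    best_score = threshold
    for obj_id, area in known_areas.items():
        score = iou(new_area, area)
        if score >= best_score:
            best_id, best_score = obj_id, score
    return best_id
```

A `None` result would correspond to the "new object area" branch, in which the first generation unit 2341 creates a virtual object.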

The second generation unit 2342 of the information processing device 200 uses the image frame and person area received by the receiving unit 231 ((S6) in FIG. 4) to generate bones based on information such as the position, size, and shape of the person area. Next, if the second adjustment unit 2352 determines that the bones have interfered with another user's avatar or that the user has performed some action via the controller, it performs adjustments such as moving or deforming the target avatar. In addition, the second adjustment unit 2352 uses information regarding the position and tilt corresponding to the posture acquired by the posture acquisition sensor 32a or posture acquisition device 33 ((S7) in FIG. 4) to adjust the avatar's standing position and the position and tilt of the face and hands.
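The source states only that bones are generated from "information such as the position, size, and shape of the person area." As an illustration, the following Python sketch reduces a binary person area to its centroid and bounding-box size, which a renderer could use to position and scale a bone. `BoneProxy` and `bone_from_person_area` are hypothetical names. A real system would also map these 2D image values into 3D virtual-space coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoneProxy:
    """Hypothetical 2D stand-in for a bone: where the person is and how large they appear."""
    center_x: float
    center_y: float
    width: int
    height: int


def bone_from_person_area(person_area: List[List[int]]) -> Optional[BoneProxy]:
    xs, ys = [], []
    for y, row in enumerate(person_area):
        for x, value in enumerate(row):
            if value > 0:  # white pixel = person
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no person detected in this frame
    return BoneProxy(
        center_x=sum(xs) / len(xs),
        center_y=sum(ys) / len(ys),
        width=max(xs) - min(xs) + 1,
        height=max(ys) - min(ys) + 1,
    )
```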

The calculation unit 236 of the information processing device 200 takes a virtual-space image frame captured from the position of the virtual imaging device, with an angle of view centered on the avatar. From that frame it calculates the avatar area and the virtual object area. The transmission unit 237 transmits the virtual-space image frame, the avatar area, and the virtual object area to the terminal device 100 via the network N ((S8) in FIG. 4).
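The source does not say how the avatar area and virtual object area are computed. Suppose the renderer can output a per-pixel object-ID buffer alongside the virtual-space image frame; this is an assumption, not something the source states. The two areas then reduce to simple lookups, as in this hypothetical sketch, which produces binary images in the same 0/255 convention as FIG. 6.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]


def areas_from_id_buffer(
    id_buffer: List[List[int]], avatar_ids: Set[int], object_ids: Set[int]
) -> Tuple[Mask, Mask]:
    """Split a per-pixel ID buffer into an avatar area and a virtual object area (0/255 masks)."""
    avatar_area = [[255 if i in avatar_ids else 0 for i in row] for row in id_buffer]
    object_area = [[255 if i in object_ids else 0 for i in row] for row in id_buffer]
    return avatar_area, object_area
```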

The receiving unit 134 of the terminal device 100 receives the virtual-space image frame, the avatar area, and the virtual object area transmitted by the transmitting unit 237 of the information processing device 200 ((S9) in FIG. 4). The synthesis unit 135 synthesizes a display image for the display device 30R using the real-space image frame set by the setting unit 132 ((S10) in FIG. 4) and the virtual-space image frame received by the receiving unit 134 ((S11) in FIG. 4). Next, the output unit 136 outputs the display image synthesized by the synthesis unit 135 to the display device 30R ((S12) in FIG. 4).

Meanwhile, the drawing unit 238 of the information processing device 200 draws the virtual space as seen from the gaze position of the avatar of the user immersed in the virtual space. The output unit 239 outputs the virtual space drawn by the drawing unit 238 ((S13) in FIG. 4) to the HMD 32 ((S14) in FIG. 4). Furthermore, the output unit 239 outputs interference caused by contact with the avatar, a virtual object, or the like to the tactile output device 33a as tactile information such as vibrations ((S15) in FIG. 4).
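Turning "interference caused by contact" into a vibration needs a contact test and a mapping to intensity. The sketch below assumes that the bone (e.g., a hand) and the virtual object are each approximated by a sphere, and it maps penetration depth linearly to an amplitude between 0.0 and 1.0. The sphere approximation, the linear mapping, and the `max_depth` parameter are illustrative assumptions, not taken from the source.

```python
import math
from typing import Sequence


def contact_depth(center_a: Sequence[float], radius_a: float,
                  center_b: Sequence[float], radius_b: float) -> float:
    """Penetration depth of two spheres; 0.0 means no contact."""
    distance = math.dist(center_a, center_b)
    return max(0.0, radius_a + radius_b - distance)


def vibration_amplitude(depth: float, max_depth: float) -> float:
    """Map penetration depth linearly to a 0.0-1.0 vibration strength, clamped at 1.0."""
    if max_depth <= 0.0:
        raise ValueError("max_depth must be positive")
    return min(1.0, depth / max_depth)
```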

[2. System and Device Configuration]
The configuration of the virtual space display system 1 according to the present embodiment will now be described. FIG. 5 is a diagram showing an example of the device configuration of the virtual space display system 1 according to the embodiment.

2-1. Configuration of the virtual space display system
As shown in FIG. 5, the virtual space display system 1 displays a virtual space and includes a terminal device 100 and an information processing device 200. The terminal device 100 and the information processing device 200 are connected via a predetermined network N.

2-2. Configuration of terminal device
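Next, a configuration example of the terminal device 100 included in the virtual space display system 1 according to this embodiment will be described. As shown in FIG. 5, the terminal device 100 has a communication unit 110, a storage unit 120, and a control unit 130.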

(Communication unit 110)
The communication unit 110 is realized by a NIC (Network Interface Card) or the like, and controls communication via a telecommunication line such as a LAN (Local Area Network) or the Internet. The communication unit 110 is connected to the network N by wire or wirelessly as necessary, and can transmit and receive information in both directions.

In this embodiment, it is assumed that communication between the terminal device 100 and the information processing device 200 is performed via the communication unit 110. It is also assumed that communication with other devices (e.g., the display device 30R, the image capture device 31R, etc.) is performed via the communication unit 110.

(Storage unit 120)
The storage unit 120 stores the data and programs needed for the various processes of the control unit 130. The storage unit 120 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or optical disk. As shown in FIG. 5, the storage unit 120 has an image frame storage unit 121, an area information storage unit 122, and a composite image storage unit 123.

(Image frame storage unit 121)
The image frame storage unit 121 stores image frames of the real-space base that are output based on video captured by the imaging device 31R. The image frame storage unit 121 also stores real-space image frames for which the setting unit 132 has set the sum of alpha values.

The image frame storage unit 121 also stores virtual-space image frames transmitted by the transmission unit 237 of the information processing device 200, described later. The image frame storage unit 121 is not limited to the above, and may store other information as long as it falls within the category of image frames.

(Area information storage unit 122)
The area information storage unit 122 stores a person area 122a, an object area 122b, and a designated area 122c.

Here, an overview of the person area 122a, the object area 122b, and the designated area 122c will be described with reference to FIG. 6. FIG. 6 is a diagram showing an overview of area information according to an embodiment.

FIG. 6 shows an image frame 60, a person area 122a, an object area 122b, and a designated area 122c. The image frame 60 is an image frame captured by the imaging device that includes a user of the real-space base and the background. The person area 122a and the object area 122b are binary images in which people or objects are represented by white pixels and everything else by black pixels. The designated area 122c is an arbitrarily set area, and is a binary image in which the designated location is represented by white pixels and everything else by black pixels.

(Person area 122a)
The person area 122a is information on a pixel area having the characteristics of a "person" in an image frame, recognized as a person area by a predetermined method. For example, the image frame 60 in FIG. 6 includes user 20R holding a book 52 in his right hand. The person area 122a may be the area of that person (user 20R), calculated from image frame 60 by the calculation unit 131 described later (the white-pixel area of the person area 122a in FIG. 6).

(Object region 122b)
The object region 122b is information on a pixel region having characteristics of an "object" included in an image frame that is recognized as an object region by a predetermined method. For example, the object region 122b may be a region of an object (book 52) (a region of white pixels in the object region 122b in FIG. 6) calculated by a calculation unit 131 described later from an image frame 60 that includes a user 20R holding a book 52 in his right hand in FIG. 6.

(Designated area 122c)
The designated area 122c is information on a pixel area arbitrarily designated in an image frame by an operation of a user or the like. The designated area may be set by an input operation of the user or by input of a binary image. For example, the designated area 122c may be an area (the area of white pixels of the designated area 122c in FIG. 6) calculated by a calculation unit 131 described later from an area 54 designated in an image frame 60 including a user 20R of a real-space base 10R holding a book 52 in his right hand in FIG. 6.

(Composite image storage unit 123)
Returning to FIG. 5, the description continues. The composite image storage unit 123 stores the display image composited by the synthesis unit 135, described later.

(Control unit 130)
The control unit 130 has an internal memory that temporarily stores programs defining various processing procedures and the like, as well as processing data. It is realized by an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), or by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). As shown in FIG. 5, the control unit 130 has a calculation unit 131, a setting unit 132, a transmission unit 133, a reception unit 134, a synthesis unit 135, and an output unit 136.

(Calculation unit 131)
The calculation unit 131 uses an image frame output from the imaging device to calculate at least one of the following: a person area 122a recognized as a person in the image frame, an object area 122b recognized as an object, and a designated area 122c in which a predetermined area is designated. To calculate this area information, the calculation unit 131 can use methods such as:
real-time subject extraction from an arbitrary background using deep learning;
background separation;
person recognition;
segmentation based on depth information from a depth camera; or
object detection by feature matching.

For example, the calculation unit 131 can calculate an area recognized as a person (the area of white pixels in the person area 122a in FIG. 6) from the image frame 60 in FIG. 6 as the person area 122a.

For example, the calculation unit 131 can calculate the area recognized as book 52 (the area of white pixels in object area 122b in FIG. 6) as object area 122b from image frame 60 in FIG. 6. Note that the calculation unit 131 does not detect all objects in the video as object area 122b, but can detect objects that meet specific conditions, such as those held by user 20R in his/her hand or placed in a specific area.
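The condition "held by user 20R in his/her hand or placed in a specific area" could be checked on the binary images of FIG. 6. In this hypothetical sketch, an object counts as held if any of its pixels overlaps or is 4-adjacent to a person pixel. It counts as placed if at least half of its pixels fall inside a zone mask. Both rules and the 0.5 fraction are assumptions, not the source's method.

```python
from typing import List

Mask = List[List[int]]  # binary image: 255 = white, 0 = black


def touches(object_area: Mask, person_area: Mask) -> bool:
    """True if any object pixel overlaps or is 4-adjacent to a person pixel (e.g., held in a hand)."""
    height, width = len(object_area), len(object_area[0])
    for y in range(height):
        for x in range(width):
            if object_area[y][x] == 0:
                continue
            for dy, dx in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and person_area[ny][nx] > 0:
                    return True
    return False


def inside_zone(object_area: Mask, zone_area: Mask, min_fraction: float = 0.5) -> bool:
    """True if at least `min_fraction` of the object's pixels lie in the zone (e.g., a table top)."""
    total = inside = 0
    for row_o, row_z in zip(object_area, zone_area):
        for o, z in zip(row_o, row_z):
            if o > 0:
                total += 1
                inside += z > 0
    return total > 0 and inside / total >= min_fraction


def keep_object(object_area: Mask, person_area: Mask, zone_area: Mask) -> bool:
    """Keep an object as object area 122b only if it is held or placed in the zone."""
    return touches(object_area, person_area) or inside_zone(object_area, zone_area)
```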

For example, the calculation unit 131 may calculate the designated area 122c based on the above-mentioned method in accordance with preset calculation conditions. For example, the calculation unit 131 may calculate the designated area 122c as the area corresponding to the designated area 54 designated from near the center of the background room to the right side (the area of white pixels in the designated area 122c in FIG. 6), excluding the left side area included in the image frame 60 in FIG. 6 from the calculation target. Note that the designated area 122c may be calculated by the calculation unit 131, or may be specified by a user, etc.
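A designated area such as area 54 can be expressed as a binary image in the 0/255 convention of FIG. 6. The sketch below assumes the designated area is an axis-aligned rectangle given in pixel coordinates, whereas the source allows arbitrary shapes. `rect_mask` is a hypothetical helper.

```python
from typing import List


def rect_mask(height: int, width: int, x0: int, y0: int, x1: int, y1: int) -> List[List[int]]:
    """Binary mask that is white (255) inside the half-open rectangle [x0, x1) x [y0, y1)."""
    return [
        [255 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]
```

For a frame 4 pixels wide and 2 pixels tall, `rect_mask(2, 4, 2, 0, 4, 2)` marks the right half white and the left half black. This mirrors how area 54 excludes the left side of image frame 60.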

In addition, the calculation unit 131 may use any method other than the above-mentioned method for calculating the person area 122a, the object area 122b, and the designated area 122c, without any limitations.

(Setting unit 132)
Returning to FIG. 5, the description continues. The setting unit 132 sets a predetermined alpha value for the image frame using at least one of the person area 122a, the object area 122b, and the designated area 122c. In other words, the setting unit 132 sets the alpha value of the image frame to the sum of the person area 122a, the object area 122b, and the designated area 122c.

Specifically, when focusing on any pixel in the image frame, if any pixel in person area 122a, object area 122b, or designated area 122c at the same coordinates is white, setting unit 132 sets the relevant pixel to remain. Conversely, if all pixels in person area 122a, object area 122b, and designated area 122c at the same coordinates are black, setting unit 132 sets the relevant pixel to be transparent.
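The per-pixel rule above is a logical OR over the three areas. A pixel is kept if it is white in any area, and it is made transparent only if it is black in all of them. A minimal Python sketch follows; it assumes each area is a same-sized list of rows with 0 for black and 255 for white, and the helper name is hypothetical.

```python
from typing import List

Mask = List[List[int]]


def combine_areas(*areas: Mask) -> Mask:
    """Keep a pixel (255) if it is white in any area; otherwise make it transparent (0)."""
    if not areas:
        raise ValueError("at least one area is required")
    height, width = len(areas[0]), len(areas[0][0])
    return [
        [255 if any(area[y][x] > 0 for area in areas) else 0 for x in range(width)]
        for y in range(height)
    ]
```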

Here, the setting of the sum of the alpha values will be further explained using FIG. 7. FIG. 7 is a diagram showing an overview of the setting of the sum of the alpha values for an image frame according to an embodiment.

First, as a premise, when the alpha value ranges from 0 to 255, in the images showing each area, black pixels are "0" and white pixels are "255". Additionally, when the alpha value ranges from 0.0 to 1.0, in the images showing each area, black pixels are "0.0" and white pixels are "1.0".

When the alpha value ranges from 0 to 255, the setting unit 132 computes "0+0=0", "0+255=255", and "255+255=255" as the "process of setting the sum" described above; that is, the sum saturates at the maximum value 255. In effect, this process combines all the pixels that are to be kept (for example, white pixels with alpha 255). There are three cases:
a pixel that is black in every area stays black ("0+0=0");
a pixel that is white in any one area becomes white ("0+255=255"); and
a pixel that is white in several areas stays white ("255+255=255").
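Read this way, the "sum" is a saturating (clamped) addition of binary masks. Let $m_a$, $m_b$, $m_c$ denote the person, object, and designated areas at pixel $(x, y)$; these symbols are introduced here for illustration only. A worked form of the three cases is:

```latex
\alpha(x,y) = \min\bigl(255,\; m_a(x,y) + m_b(x,y) + m_c(x,y)\bigr),
\qquad m_a, m_b, m_c \in \{0, 255\}

\min(255,\, 0 + 0) = 0, \qquad
\min(255,\, 0 + 255) = 255, \qquad
\min(255,\, 255 + 255) = 255
```

Because every mask value is 0 or 255, the clamped sum equals $\max(m_a, m_b, m_c)$. On the 0.0 to 1.0 scale, replace 255 with 1.0.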

As a specific example, consider adding together the areas shown in the upper part of FIG. 7. The setting unit 132 sets the sum of the alpha values of the person area 122a and the object area 122b shown in the upper part of FIG. 7 to obtain person area + object area 122ab. Likewise, the setting unit 132 sets the sum of the alpha values of the object area 122b and the designated area 122c shown in the upper part of FIG. 7 to obtain object area + designated area 122bc.

Furthermore, the setting unit 132 sets the sum of the person area + object area 122ab and the object area + designated area 122bc shown in the middle of FIG. 7, and sets the person area + object area + designated area 122abc. Note that the order of the summation of the alpha values described above is not particularly limited and may be changed as appropriate.

The setting unit 132 applies the person area + object area + designated area 122abc to the image frame 60, and sets the real space image frame 60R in which the sum of the alpha values is set.

In this way, the setting unit 132 superimposes the white-pixel and black-pixel regions of the person area 122a, the object area 122b, and the designated area 122c to calculate, from the image frame 60, the real-space image frame 60R in which the sum of the alpha values is set. In the real-space image frame 60R shown in the lower part of FIG. 7, the left side of the background is therefore depicted with black pixels, indicating that it is transparent.

(Transmitting unit 133)
Returning to FIG. 5, the description continues. The transmitting unit 133 transmits the real space image frame in which a predetermined alpha value is set, the person area 122a, and the object area 122b to the information processing device 200. The transmitting unit 133 may also transmit the designated area 122c, or may instead transmit the image frame in which no alpha value is set together with the person area 122a, the object area 122b, and the designated area 122c. The transmitting unit 133 may also transmit audio information acquired by the image capturing device 31R.

(Receiving unit 134)
The receiving unit 134 receives the virtual space image frame, the avatar area 222a, and the virtual object area 222b calculated by the calculation unit 236 of the information processing device 200. Note that, for the above-mentioned virtual space image frame, the avatar area 222a, and the virtual object area 222b, the process of setting the sum of alpha values by the setting unit 132 may be similarly performed by the calculation unit 236 of the information processing device 200.

(Synthesizing unit 135)
The synthesis unit 135 synthesizes a display image based on a real space image frame, which is an image frame in which an alpha value is set, and a virtual space image frame received from the information processing device 200. Here, an example of the synthesis of a display image by the synthesis unit 135 will be described with reference to Fig. 8. Fig. 8 is a diagram showing an example of the synthesis of a real space image frame and a virtual space image frame according to an embodiment.

FIG. 8 shows a real space image frame 60R in which the sum of alpha values is set, and a virtual space image frame 60V in which the sum of alpha values is similarly set. Here, a person area 122a and an object area 122b are set in the real space image frame 60R. On the other hand, an avatar area 222a equivalent to the person area 122a of the real space image frame 60R, and a virtual object area 222b equivalent to the object area 122b are set in the virtual space image frame 60V.

The synthesis unit 135 synthesizes the real space image frame 60R and the virtual space image frame 60V described above to generate the display image 70RV. Specifically, as shown in the lower part of FIG. 8, the display image 70RV can be synthesized such that the virtual space image frame 60V fits within the transparent area of the real space image frame 60R.
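Filling the transparent area of 60R with 60V amounts to standard alpha compositing (the "over" operator) with the real space image frame on top. The sketch below assumes straight (non-premultiplied) 8-bit alpha and frames stored as lists of rows of tuples; `composite_over` is a hypothetical name, and the embodiment does not specify the blending formula.

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]
RGB = Tuple[int, int, int]


def composite_over(real_rgba: List[List[RGBA]],
                   virtual_rgb: List[List[RGB]]) -> List[List[RGB]]:
    """Place the real space frame (with alpha) over the virtual space frame.

    Alpha 255 keeps the real-space pixel, alpha 0 lets the virtual-space pixel
    show through, and intermediate values blend the two linearly.
    """
    out = []
    for real_row, virt_row in zip(real_rgba, virtual_rgb):
        row = []
        for (r, g, b, a), (vr, vg, vb) in zip(real_row, virt_row):
            w = a / 255.0
            row.append(tuple(round(w * fg + (1.0 - w) * bg)
                             for fg, bg in ((r, vr), (g, vg), (b, vb))))
        out.append(row)
    return out


real_60r = [[(200, 0, 0, 255), (200, 0, 0, 0)]]  # opaque person pixel, transparent background
virtual_60v = [[(0, 0, 100), (0, 0, 100)]]
display_70rv = composite_over(real_60r, virtual_60v)
print(display_70rv)  # [[(200, 0, 0), (0, 0, 100)]]
```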

Here, we will further explain the composition of displayed images (overlaying of layers) using Figure 9. Figure 9 is a diagram showing an overview of overlaying a real-space image layer and a virtual-space image frame according to an embodiment.

As shown in FIG. 9, the synthesis unit 135 composites the image layers so that they overlap. When the layers are superimposed, pixels of a lower layer tend to be hidden by pixels of the layers above it, except where those upper-layer pixels have an alpha value that makes them transparent. Conversely, pixels of an upper layer are less likely to be hidden by pixels of other layers and are more likely to be displayed.

In FIG. 9, the synthesis unit 135 superimposes the real space image frame 60R and the virtual space image frame 60V. Furthermore, the synthesis unit 135 superimposes on top of that the person region 122a and the object region 122b in the real space image frame 60R, and the avatar region 222a and the virtual object region 222b in the virtual space image frame 60V. The synthesis unit 135 then synthesizes the display image 70RV.

Each area layer at this point takes its pixels from the corresponding real space image frame 60R or virtual space image frame 60V. The synthesis unit 135 may also add the designated area as a layer, for example when no alpha value is set in the real space image frame 60R.

The order of the layers shown in FIG. 9 is not limited to that illustrated and may be changed arbitrarily. When depth information or the distance from the camera is stored for the person area 122a, the object area 122b, the avatar area 222a, or the virtual object area 222b, the synthesis unit 135 may automatically switch the layer order according to that depth or distance. In other words, the closer an area is to the imaging device or virtual camera, the more likely it is to be placed above other layers; conversely, the farther it is from the camera or virtual camera, the more likely it is to be hidden by other layers and the less likely it is to remain visible in the display image.
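The layer stacking of FIG. 9 and the optional depth-based reordering can be sketched as below. This is a hedged Python illustration: the `Layer` class, its `depth` field (distance from the real or virtual camera), and `composite_layers` are hypothetical, and the sketch simply repeats the "over" operation from the bottom layer to the top. When every layer has a depth, farther layers are drawn first so that nearer ones end up on top; otherwise the given order is kept.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RGBA = Tuple[int, int, int, int]
RGB = Tuple[int, int, int]


@dataclass
class Layer:
    name: str
    pixels: List[List[RGBA]]
    depth: Optional[float] = None  # distance from the real or virtual camera, if known


def over(top: RGBA, bottom: RGB) -> RGB:
    """Alpha-blend one pixel over another (straight 8-bit alpha)."""
    w = top[3] / 255.0
    return tuple(round(w * t + (1.0 - w) * b) for t, b in zip(top[:3], bottom))


def composite_layers(layers: List[Layer], base: List[List[RGB]],
                     use_depth: bool = False) -> List[List[RGB]]:
    """Draw layers onto an opaque base, first layer at the bottom.

    With use_depth=True and a depth on every layer, farther layers are drawn
    first so that nearer layers end up on top; otherwise the given order is kept.
    """
    ordered = list(layers)
    if use_depth and all(layer.depth is not None for layer in ordered):
        ordered.sort(key=lambda layer: layer.depth, reverse=True)
    result = [list(row) for row in base]
    for layer in ordered:
        for y, row in enumerate(layer.pixels):
            for x, px in enumerate(row):
                result[y][x] = over(px, result[y][x])
    return result


# Usage: a real-space hand (1.0 m away) and a virtual object (2.0 m away) on one pixel.
hand = Layer("person area 122a", [[(255, 0, 0, 255)]], depth=1.0)
vobj = Layer("virtual object area 222b", [[(0, 0, 255, 255)]], depth=2.0)
base = [[(0, 0, 0)]]
fixed_order = composite_layers([hand, vobj], base)               # [[(0, 0, 255)]]
by_depth = composite_layers([hand, vobj], base, use_depth=True)  # [[(255, 0, 0)]]
```

In the usage example, the hand is nearer to the camera than the virtual object, so depth ordering brings it to the top even though it was listed first.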

With this layer structure, when an object is passed from the virtual space base to the real space base, the virtual space display system 1 can display the virtual object in front (for example, in the top layer) even when the virtual object reaches into the real space base side of the screen. This allows the user to move and transform the object while looking at the screen.

(Output unit 136)
Returning to FIG. 5, the description continues. The output unit 136 outputs the display image synthesized by the synthesis unit 135 to the display device 30R. The information output by the output unit 136 is not particularly limited.

2-3. Configuration of information processing device
Next, a configuration example of the information processing device 200 included in the virtual space display system 1 according to this embodiment will be described. As shown in FIG.

(Communication unit 210)
The communication unit 210 is realized by a NIC or the like and controls communication via a telecommunication line such as a LAN or the Internet. The communication unit 210 is connected to a network N by wire or wirelessly as necessary and can transmit and receive information in both directions.

In this embodiment, it is assumed that communication between the information processing device 200 and the terminal device 100 is performed via the communication unit 210. It is also assumed that communication with other devices (e.g., the HMD 32, the posture acquisition device 33, etc.) is performed via the communication unit 210.

(Storage unit 220)
The storage unit 220 stores data and programs necessary for various processes by the control unit 230. The storage unit 220 is realized by a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 5, the storage unit 220 has an image frame storage unit 221, a region information storage unit 222, and a generation information storage unit 223.

(Image frame storage unit 221)
The image frame storage unit 221 stores virtual space image frames, which are image frames captured with an angle of view centered on an avatar in a virtual space. The image frame storage unit 221 can also store real space image frames received from the terminal device 100. The image frame storage unit 221 is not limited to the above and may store other information as long as it falls within the category of image frames.

(Area information storage unit 222)
The region information storage unit 222 stores the avatar region 222a and the virtual object region 222b calculated from the virtual space image frame. The region information storage unit 222 can also store the person region 122a, the object region 122b, and the designated region 122c received from the terminal device 100.

(Avatar area 222a)
Avatar region 222a is information relating to a pixel region having characteristics of an "avatar" included in an image frame that has been recognized as an avatar region using a predetermined method.

(Virtual object area 222b)
The virtual object region 222b is information relating to a pixel region having characteristics of a "virtual object" included in an image frame that has been recognized as a virtual object region by a predetermined method.

(Generation information storage unit 223)
The generation information storage unit 223 stores the virtual object 223a generated by the first generation unit 2341 and the bones 223b generated by the second generation unit 2342.

(Virtual object 223a)
The virtual object 223a is information about a virtual object that the first generation unit 2341, described later, generates by acquiring the pixels of the image frame corresponding to an object region, using them as a texture, and imitating the shape and pattern of the corresponding object. The virtual object 223a is a virtual object placed in the virtual space, and the same applies hereinafter.

(Bone 223b)
The bones 223b are information about bones generated by the second generation unit 2342 (described later) from real-space image frames. The bones 223b are virtual objects set in the person area 122a, and the same applies hereinafter.

(Control unit 230)
The control unit 230 has an internal memory for temporarily storing programs that define various processing procedures and processing data, and is realized by electronic circuits such as a CPU and an MPU, and integrated circuits such as an ASIC and an FPGA.

As shown in FIG. 5, the control unit 230 also includes a receiving unit 231, an updating unit 232, a determining unit 233, a generating unit 234, an adjusting unit 235, a calculating unit 236, a transmitting unit 237, a drawing unit 238, and an output unit 239.

(Receiving unit 231)
The receiving unit 231 receives the real space image frame in which an alpha value has been set, together with the person region 122a, the object region 122b, and the designated region 122c, transmitted by the transmitting unit 133 of the terminal device 100.

(Update unit 232)
The update unit 232 updates the video displayed on a virtual display device installed in the virtual space based on the real space image frame, the person area 122a, the object area 122b, and the designated area 122c received from the terminal device 100.

(Determination unit 233)
The determination unit 233 determines whether the object region 122b received from the terminal device 100 together with the real space image frame is an existing object region (for example, an object region that was included in a past image frame).

The determination unit 233 may use image features to determine whether the object region 122b is new or existing. For example, the determination unit 233 calculates features from the pixels corresponding to the object region 122b in the image frame and compares them with the features of each of a plurality of virtual objects 223a created in advance. The determination unit 233 can then determine that the object is similar to an existing one when its features satisfy a predetermined condition with respect to any of the previously created virtual objects 223a.
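The embodiment leaves both the features and the "predetermined condition" open. The sketch below makes one concrete, purely illustrative choice: a coarse, normalized RGB color histogram as the feature, cosine similarity as the comparison, and a hypothetical threshold of 0.9. The names `color_histogram`, `cosine_similarity`, and `find_existing` are not from the source. A return value of `None` corresponds to a new object region.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def color_histogram(pixels: Sequence[RGB], bins_per_channel: int = 4) -> List[float]:
    """Normalized coarse RGB histogram of the pixels inside an object region."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def find_existing(region_pixels: Sequence[RGB],
                  known: Dict[str, List[float]],
                  threshold: float = 0.9) -> Optional[str]:
    """Return the id of the most similar pre-created virtual object, or None if new."""
    feature = color_histogram(region_pixels)
    best_id, best_score = None, threshold
    for obj_id, obj_feature in known.items():
        score = cosine_similarity(feature, obj_feature)
        if score >= best_score:
            best_id, best_score = obj_id, score
    return best_id


known = {
    "book": color_histogram([(200, 30, 30)] * 10),  # hypothetical pre-created objects
    "pen": color_histogram([(20, 20, 200)] * 10),
}
print(find_existing([(210, 25, 35)] * 5, known))  # book
print(find_existing([(20, 200, 20)] * 5, known))  # None
```

A color histogram ignores shape, so a practical system would likely use richer features; what mirrors the text is the structure of the check: compute features, compare them with each stored object, and accept the best match that satisfies the condition.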

(Generation unit 234)
The generation unit 234 generates a virtual object by using the real space image frame received from the terminal device 100 and the object region 122b or the person region 122a. Specifically, the generation unit 234 has a first generation unit 2341 and a second generation unit 2342, which each execute the above-mentioned processing.

(First generation unit 2341)
The first generation unit 2341 (generation unit) generates, as a virtual object, a virtual object 223a that imitates the shape or pattern of the object included in the object area 122b, based on the determination result of the determination unit 233 and using the real-space image frame and the pixel area corresponding to the object area 122b included in that frame.

For example, when the determination unit 233 determines that an object included in the object region 122b is new, the first generation unit 2341 can obtain pixels of the image frame corresponding to the object region 122b, use them as texture, generate a virtual object 223a that mimics the shape and pattern of the object, and place it in the virtual space. For example, when a book is shown as the object region 122b as in the real-space image frame 60R in the upper left of Figure 8, a rectangular VR object is generated and the pixels corresponding to the object region 122b of the image frame are transferred to the texture.
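The book example can be sketched as cropping the bounding box of the object region out of the frame to use as the texture and sizing a rectangle (a quad) with the same aspect ratio. This Python sketch is illustrative only: `bounding_box` and `make_textured_quad` are hypothetical names, the alpha threshold of 128 is an assumption, and the real-world height `height_m` is an assumed parameter because the embodiment does not say how the object's scale is determined.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]


def bounding_box(mask: Mask, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """(x0, y0, x1, y1) of mask pixels with alpha >= threshold; x1 and y1 are exclusive."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, a in enumerate(row) if a >= threshold]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def make_textured_quad(frame, mask: Mask, height_m: float = 0.3):
    """Crop the object's pixels as a texture and build a rectangle with the same aspect ratio.

    height_m is an assumed real-world height for the object (here 0.3 m).
    Vertices are centered on the origin in the object's local coordinates.
    """
    box = bounding_box(mask)
    if box is None:
        return None
    x0, y0, x1, y1 = box
    texture = [row[x0:x1] for row in frame[y0:y1]]
    width_m = height_m * (x1 - x0) / (y1 - y0)
    hw, hh = width_m / 2, height_m / 2
    vertices = [(-hw, -hh, 0.0), (hw, -hh, 0.0), (hw, hh, 0.0), (-hw, hh, 0.0)]
    return {"texture": texture, "vertices": vertices}


# Usage with a hypothetical 3x4 frame whose object region covers a 2x2 block.
frame = [[(x, y, 0) for x in range(4)] for y in range(3)]
mask = [[0, 255, 255, 0],
        [0, 255, 255, 0],
        [0, 0, 0, 0]]
quad = make_textured_quad(frame, mask)
```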

The first generation unit 2341 may also use image features to generate the virtual object 223a. For example, if the determination unit 233 determines that the object is similar to one of a plurality of virtual objects 223a created in advance, the first generation unit 2341 may place that previously created virtual object 223a in the virtual space.

The first generation unit 2341 may also create the virtual object 223a at the real space base. In this case, the first generation unit 2341 may generate a 3D model of the target object, photographed by the user at the real space base, using a technique such as photogrammetry, and transmit the model to the virtual space base to generate the desired object.

The position at which virtual object 223a is placed in the virtual space may be consistent with the display image (the result of combining the image frame and the virtual space image frame). For example, as shown in display image 70RV in the lower part of Figure 8, a determination as to whether or not to generate virtual object 223a may be made only when a user at a real space base places an object (book) so that it extends into the virtual space, and the generated virtual object 223a may be placed at a corresponding position in the virtual space (near the avatar's left hand).
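One way to realize the condition "only when the object extends into the virtual space" is to check whether any object-region pixels fall where the real space image frame would otherwise be transparent, that is, where the virtual space shows through in the display image. The sketch below assumes a hypothetical `background_alpha` mask holding the real-space alpha without the object region, and returns the centroid of the overlapping pixels in image coordinates. The names `overlap_into_virtual` and `min_pixels` are illustrative, and converting the centroid into a virtual-space position (for example, by casting a ray from the virtual camera) is not shown.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]


def overlap_into_virtual(object_mask: Mask, background_alpha: Mask,
                         min_pixels: int = 1) -> Optional[Tuple[float, float]]:
    """Centroid (x, y) of object pixels that lie where the real-space frame is
    otherwise transparent (alpha 0), or None if fewer than min_pixels overlap."""
    points = [(x, y)
              for y, (obj_row, bg_row) in enumerate(zip(object_mask, background_alpha))
              for x, (o, bg) in enumerate(zip(obj_row, bg_row))
              if o > 0 and bg == 0]
    if len(points) < min_pixels:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return cx, cy


# The left half of the real-space frame is transparent (the virtual space shows through).
background = [[0, 0, 255, 255]]
book_reaching_in = [[0, 255, 255, 0]]
book_on_desk = [[0, 0, 255, 255]]
print(overlap_into_virtual(book_reaching_in, background))  # (1.0, 0.0) -> generate
print(overlap_into_virtual(book_on_desk, background))      # None -> do not generate
```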

(Second generation unit 2342)
Here, the explanation will be continued by returning to Fig. 5. The second generating unit 2342 (generating unit) generates bones 223b that set the posture of the user in the virtual space based on one or more combinations of the position, size, and shape of the person included in the person area 122a as a virtual object, using the real-space image frame and a pixel area corresponding to the person area 122a included in the real-space image frame.

For example, the second generation unit 2342 can receive the real-space image frame and the person area 122a from the receiving unit 231 and generate the bones 223b from the position, size, and shape of the person area 122a.
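The embodiment does not state how the bones are fitted. As a deliberately crude illustration of deriving joints from the position, size, and shape of a person mask, the sketch below places head, neck, and pelvis joints at fixed fractions of the mask height along its horizontal centroid and takes the leftmost and rightmost mask pixels as the hands. The proportions, joint names, and the function `simple_bones` are all hypothetical; a practical system would more likely use a dedicated pose estimator. The joints are in 2D image coordinates.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]
Point = Tuple[float, float]


def simple_bones(person_mask: Mask, threshold: int = 128) -> Optional[Dict[str, object]]:
    """Very rough 2D skeleton from a person mask (hypothetical proportions)."""
    pts = [(x, y) for y, row in enumerate(person_mask)
           for x, a in enumerate(row) if a >= threshold]
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    top, bottom = min(ys), max(ys)
    height = bottom - top
    cx = sum(xs) / len(xs)  # horizontal centroid of the person area
    joints: Dict[str, Point] = {
        "head": (cx, top + 0.10 * height),
        "neck": (cx, top + 0.20 * height),
        "pelvis": (cx, top + 0.55 * height),
        # Extremes of the mask stand in for the hands (topmost pixel on ties).
        "left_hand": min(pts, key=lambda p: (p[0], p[1])),
        "right_hand": max(pts, key=lambda p: (p[0], -p[1])),
    }
    bones = [("head", "neck"), ("neck", "pelvis"),
             ("neck", "left_hand"), ("neck", "right_hand")]
    return {"joints": joints, "bones": bones}


# A T-shaped figure with outstretched arms (255 = person pixel).
figure = [[0, 0, 255, 0, 0],
          [255, 255, 255, 255, 255],
          [0, 0, 255, 0, 0],
          [0, 0, 255, 0, 0],
          [0, 255, 0, 255, 0]]
skeleton = simple_bones(figure)
```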

Here, an overview of the bones 223b will be described with reference to FIG. 10. FIG. 10 is a diagram showing an overview of the bones 223b according to the embodiment. Ordinarily, bones in a virtual space correspond to the skeleton of an avatar; in this embodiment they are set in the avatars that exist in the virtual space in order to perform collision detection with objects in the virtual space.

On the other hand, the bones 223b generated by the second generator 2342 here virtually set the posture that the user at the real-space base would be in if he or she were in the virtual space. In this embodiment, the bones 223b are generated for the purpose of obtaining collision detection with virtual objects in the virtual space, like normal bones, or for allowing the user at the real-space base to touch virtual objects in the virtual space when they reach out their hands. For this reason, the bones 223b are not drawn, visualized, rendered, or the like, and are not represented as avatars.

The top part of Figure 10 shows an example of the generated bones 223b being drawn superimposed on the person area 122a, reflecting the posture of the body and the position and inclination of the hands. In the virtual space 45 in the bottom part of Figure 10, by applying bones 223b to user 20R at the real-space base, it becomes possible to interfere with the hamburger 53 held by avatar 51 in the virtual space.

(Adjustment unit 235)
Returning to FIG. 5, the description continues. The adjustment unit 235 adjusts a predetermined position and shape of a virtual object in the virtual space. Specifically, the adjustment unit 235 has a first adjustment unit 2351 and a second adjustment unit 2352, each of which executes the processing described below.

(First adjustment unit 2351)
The first adjustment unit 2351 (adjustment unit) adjusts the position or shape of the virtual object 223a when at least one of the avatar and bones 223b, which are virtual objects, interferes with the virtual object 223a in the virtual space. Specifically, the first adjustment unit 2351 detects interference with or operation of the virtual object 223a by the avatar or bones 223b, and performs adjustment to move or deform the virtual object 223a accordingly.

For example, in the virtual space 45 in the lower part of FIG. 10, the first adjustment unit 2351 makes the scene behave as if the user 20R at the real-space base were touching the hamburger 53 held by the avatar 51. At this time, the first adjustment unit 2351 determines that the hand of the person area in which the bones 223b are set has interfered with the hamburger 53, which is a virtual object 223a, and moves the hamburger 53 so that it follows the hand of the bones 223b.
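A common way to implement this kind of interference check is a sphere-versus-box test: approximate the hand joint of the bones 223b as a sphere and the virtual object 223a as an axis-aligned bounding box. The following Python sketch assumes that representation (the `Box` class, `sphere_hits_box`, `follow_hand`, and the 5 cm hand radius are hypothetical) and uses the simplest possible follow rule, snapping the object's center to the hand; a real system might instead keep the grab offset or use a physics engine.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned bounding box approximating a virtual object 223a."""
    center: Vec3
    half_size: Vec3


def sphere_hits_box(p: Vec3, radius: float, box: Box) -> bool:
    """True if a sphere (e.g. a hand joint) touches the box."""
    dist_sq = 0.0
    for pc, c, h in zip(p, box.center, box.half_size):
        closest = min(max(pc, c - h), c + h)  # clamp to the box on this axis
        dist_sq += (pc - closest) ** 2
    return dist_sq <= radius ** 2


def follow_hand(box: Box, hand_pos: Vec3, hand_radius: float = 0.05) -> bool:
    """If the hand touches the object, move the object to the hand (simplest follow rule)."""
    if sphere_hits_box(hand_pos, hand_radius, box):
        box.center = hand_pos
        return True
    return False


hamburger = Box(center=(0.0, 0.0, 0.0), half_size=(0.1, 0.1, 0.1))
touched = follow_hand(hamburger, (0.12, 0.0, 0.0))  # True: 2 cm from the surface
missed = follow_hand(hamburger, (1.0, 0.0, 0.0))    # False: the hand is far away
```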

(Second Adjustment Unit 2352)
The second adjustment unit 2352 (adjustment unit) adjusts the position or shape of the avatar when the bone 223b, which is a virtual object, interferes with the avatar in the virtual space. Specifically, when it is determined that the bone 223b interferes with the avatar or that the user has performed some action on the avatar via the controller, the second adjustment unit 2352 performs adjustment to move or deform the avatar.

For example, if a hamburger is placed in the virtual space, the second adjustment unit 2352 executes operations in which the user at the real-space base, whose person area 122a has the bones 223b set, and the avatar in the virtual space grab the hamburger and move it, or bring it close to the face and eat it. In addition, when the bones 223b interfere with the avatar, the second adjustment unit 2352 may move the position of the avatar, or may make adjustments based on actions such as swaying the avatar's hair when its head is stroked or pinching its clothes.
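Moving the avatar when the bones 223b interfere with it can be sketched as a push-out: treat the avatar body and the touching joint as spheres and move the avatar along the line between their centers just far enough to remove the overlap. This is an illustrative choice, not the embodiment's stated method; `push_out` and the sphere radii are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def push_out(avatar_pos: Vec3, avatar_radius: float,
             joint_pos: Vec3, joint_radius: float) -> Vec3:
    """Move the avatar away from a bone joint just far enough to stop overlapping."""
    offset = [a - j for a, j in zip(avatar_pos, joint_pos)]
    dist = math.sqrt(sum(c * c for c in offset))
    overlap = avatar_radius + joint_radius - dist
    if overlap <= 0:
        return avatar_pos  # no interference, no adjustment
    if dist == 0:
        direction = (1.0, 0.0, 0.0)  # centers coincide: pick an arbitrary direction
    else:
        direction = tuple(c / dist for c in offset)
    return tuple(a + overlap * u for a, u in zip(avatar_pos, direction))


moved = push_out((0.5, 0.0, 0.0), 0.3, (0.0, 0.0, 0.0), 0.3)      # about (0.6, 0.0, 0.0)
unchanged = push_out((2.0, 0.0, 0.0), 0.3, (0.0, 0.0, 0.0), 0.3)  # no overlap
```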

(Calculation unit 236)
Returning to FIG. 5, the description continues. The calculation unit 236 calculates the avatar area 222a and the virtual object area 222b from the virtual space image frame.

The virtual space image frame is the state of the virtual space as seen from the position of the virtual camera, and is assumed to be captured with an angle of view mainly centered on the avatar. The calculation unit 236 can calculate the avatar area 222a and the virtual object area 222b by using the image captured from the virtual camera position in the virtual space as the virtual space image frame.
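Because the virtual space is rendered by the system itself, one convenient (assumed) way to obtain these areas is an object-ID buffer, in which each pixel of the rendered frame records which object was drawn there. The sketch below converts such a buffer into 0/255 masks. The ID buffer, the ID values, and `region_masks_from_id_buffer` are hypothetical, since the embodiment does not say how the areas are calculated.

```python
from typing import Iterable, List

Mask = List[List[int]]


def region_masks_from_id_buffer(id_buffer: List[List[int]],
                                avatar_ids: Iterable[int],
                                object_ids: Iterable[int]):
    """Derive avatar area 222a and virtual object area 222b (0/255 masks)."""
    avatar_set, object_set = set(avatar_ids), set(object_ids)
    avatar_area: Mask = [[255 if i in avatar_set else 0 for i in row] for row in id_buffer]
    object_area: Mask = [[255 if i in object_set else 0 for i in row] for row in id_buffer]
    return avatar_area, object_area


# Hypothetical IDs: 0 = background, 1 = avatar body, 2 = avatar hair, 7 = hamburger.
ids = [[0, 1, 1, 7],
       [0, 2, 1, 7]]
avatar_222a, object_222b = region_masks_from_id_buffer(ids, {1, 2}, {7})
```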

(Transmitting unit 237)
The transmission unit 237 transmits the virtual space image frame, the avatar area 222a, and the virtual object area 222b to the terminal device 100 at the real space base. At this time, the virtual space image frame transmitted by the transmission unit 237 may have the sum of alpha values and a designated area set, similar to the real space base.

(Drawing unit 238)
The drawing unit 238 draws the real-space image frame and the virtual object by superimposing them on the virtual space. Specifically, the drawing unit 238 superimposes the real-space image frame and at least one of the avatar and the virtual object 223a, which are virtual objects, and draws them as one virtual display screen (virtual display image) in the virtual space. In other words, the drawing unit 238 may draw a virtual display image in which the real-space image frame of the real-space base and the virtual-space image frame of the virtual-space base are superimposed and displayed in the virtual space, as in the virtual space 42 shown in FIG. 3.

Furthermore, the drawing unit 238 draws, as a virtual display image, an image taken from the avatar's line of sight in the virtual space, with the angle of view centered on the position the user at the virtual space base wishes to see.

(Output unit 239)
The output unit 239 outputs information about virtual touch to present a virtual touch to the user when an avatar, which is a virtual object, interferes with at least one of the virtual object 223a or the bone 223b. Specifically, the output unit 239 outputs the interference caused by contact of the avatar or the virtual object 223a to the haptic output device 33a as haptic information such as vibration. As a result, the user 20V at the virtual space base can perceive haptic information such as vibration based on the interference caused in the virtual space via the haptic output device 33a built in the posture acquisition device 33 or the like.

The output unit 239 also outputs the virtual display image drawn by the drawing unit 238 to a display device (for example, an HMD) worn by the user 20V at the virtual space base. Because the HMD or similar device displays the virtual display image captured from the avatar's line of sight in the virtual space, the user 20V at the virtual space base obtains a field of view as if he or she were inside the virtual space.

Furthermore, when a user at the virtual space base is wearing a haptic device, the output unit 239 may provide feedback to the user of the interference by vibration depending on the position where the bone 223b is touched.
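The mapping from interference to vibration is not specified. As a hedged sketch, the function below routes a contact to an actuator based on which part was touched and scales the vibration amplitude with penetration depth, capping it at 1.0. The part names, actuator names, the 5 cm saturation depth, and `vibration_command` are all hypothetical.

```python
from typing import Dict, Optional, Tuple

DEFAULT_ACTUATORS = {"left_hand": "left_controller", "right_hand": "right_controller"}


def vibration_command(contact_part: str, penetration: float,
                      max_penetration: float = 0.05,
                      actuators: Optional[Dict[str, str]] = None) -> Optional[Tuple[str, float]]:
    """Map a contact to (actuator, amplitude in 0.0-1.0), or None if nothing to output."""
    table = actuators if actuators is not None else DEFAULT_ACTUATORS
    actuator = table.get(contact_part)
    if actuator is None or penetration <= 0:
        return None
    amplitude = min(penetration / max_penetration, 1.0)
    return actuator, amplitude


print(vibration_command("right_hand", 0.025))  # ('right_controller', 0.5)
print(vibration_command("right_hand", 0.2))    # ('right_controller', 1.0)
print(vibration_command("head", 0.01))         # None
```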

3. Modifications
Modified examples realized by the virtual space display system 1 according to this embodiment are described below. The virtual space display system 1 according to the embodiment described above provides a technology for realizing communication and interaction between a "real space base" and a "virtual space base" via a virtual space.

As a modified example, the virtual space display system 1 can also provide a technology that realizes communication and interaction between two "real space bases" via a virtual space. For example, the virtual space display system 1 according to the modified example can provide an interaction in which, when a user extends a hand on a display screen on which the frame images of the real space bases are superimposed, the hand cuts into the other person's screen. With regard to sharing objects, as mentioned above, interaction can be realized by creating a virtual 2D space on the screen.

For example, the left half of the display screen of the virtual space display system 1 is used as Person A's screen and the right half as Person B's screen, and a semi-transparent layer showing an overhead view of a tennis court is placed on top. Person A and Person B can then both use objects they are holding (such as a book or a pen) to interact with each other, for example by hitting a ball.
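The screen layout of this example can be sketched as follows: take the left half of Person A's frame and the right half of Person B's frame, then blend a semi-transparent overlay (the tennis court) on top with a constant opacity. The function `split_screen_with_overlay`, the 0.5 opacity, and the assumption that both frames and the overlay have the same size are illustrative choices, not details from the source.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Frame = List[List[RGB]]


def split_screen_with_overlay(frame_a: Frame, frame_b: Frame, overlay: Frame,
                              overlay_alpha: float = 0.5) -> Frame:
    """Left half from Person A, right half from Person B, then blend the overlay on top."""
    height, width = len(frame_a), len(frame_a[0])
    mid = width // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            base = frame_a[y][x] if x < mid else frame_b[y][x]
            row.append(tuple(round(overlay_alpha * o + (1.0 - overlay_alpha) * b)
                             for o, b in zip(overlay[y][x], base)))
        out.append(row)
    return out


person_a = [[(100, 0, 0)] * 2]
person_b = [[(0, 100, 0)] * 2]
court = [[(0, 0, 200)] * 2]  # stand-in for the tennis court overhead view
print(split_screen_with_overlay(person_a, person_b, court))  # [[(50, 0, 100), (0, 50, 100)]]
```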

4. Processing Procedure
The processing procedure of the virtual space display system 1 according to this embodiment is described below. The procedure is divided into a "processing procedure of the terminal device 100" and a "processing procedure of the information processing device 200", and a flowchart is described for each. The steps described below may be executed in a different order, and some processing may be omitted. The "processing procedure of the terminal device 100" and the "processing procedure of the information processing device 200" may also be implemented in an appropriate combination.

(Processing Procedure of Terminal Device 100)
First, a processing procedure of the terminal device 100 will be described with reference to Fig. 11. Fig. 11 is a diagram showing an example of a flowchart of a processing procedure of the terminal device 100 according to the embodiment.

The calculation unit 131 calculates a person area and an object area from an image frame (step S101). The calculation unit 131 may also calculate a designated area as necessary.

The setting unit 132 sets the sum of the alpha values of the person area, object area, and designated area in the image frame (step S102). The transmission unit 133 transmits the real-space image frame, the person area, and the object area (step S103). The transmission unit 133 may also transmit the designated area as necessary.

The information processing device 200 then performs predetermined processing based on the received real-space image frame, person area, and object area (step S104). This processing is not described here because it is explained in detail as steps S204 to S214 of the processing procedure of the information processing device 200, described later with reference to FIG. 12.

The receiving unit 134 receives the virtual space image frame, the avatar area, and the virtual object area from the information processing device 200 (step S105). The synthesis unit 135 synthesizes the display image using the real space image frame and the virtual space image frame (step S106). Then, the output unit 136 outputs the display image (step S107), and the process ends.
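Steps S101 to S107 can be summarized as a single per-frame function. In this hedged sketch each step is delegated to a caller-supplied function (all parameter names are hypothetical stand-ins for the units 131 to 136), and a `log` list records the step IDs so the order is easy to check; step S104 runs on the information processing device 200 and therefore appears only as a comment.

```python
from typing import Any, Callable, List


def terminal_step(frame: Any,
                  calculate: Callable, set_alpha: Callable, send: Callable,
                  receive: Callable, synthesize: Callable, output: Callable,
                  log: List[str]) -> Any:
    """One pass of steps S101-S107 for a single image frame."""
    person, obj, designated = calculate(frame)              # S101 (calculation unit 131)
    log.append("S101")
    real_frame = set_alpha(frame, person, obj, designated)  # S102 (setting unit 132)
    log.append("S102")
    send(real_frame, person, obj)                           # S103 (transmitting unit 133)
    log.append("S103")
    # S104 is executed on the information processing device 200.
    virtual_frame, avatar_area, vobj_area = receive()       # S105 (receiving unit 134)
    log.append("S105")
    display = synthesize(real_frame, virtual_frame)         # S106 (synthesis unit 135)
    log.append("S106")
    output(display)                                         # S107 (output unit 136)
    log.append("S107")
    return display


# Usage with trivial stand-ins for each unit.
steps: List[str] = []
sent = []
result = terminal_step(
    "frame",
    calculate=lambda f: ("person", "object", "designated"),
    set_alpha=lambda f, p, o, d: f + "+alpha",
    send=lambda *args: sent.append(args),
    receive=lambda: ("virtual", "avatar", "vobj"),
    synthesize=lambda r, v: (r, v),
    output=lambda d: None,
    log=steps,
)
```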

(Processing procedure of information processing device 200)
Next, a processing procedure of the information processing device 200 will be described with reference to Fig. 12. Fig. 12 is a diagram showing an example of a flowchart of a processing procedure of the information processing device 200 according to the embodiment.

The second adjustment unit 2352 adjusts the posture of the avatar using posture information output from an external device (for example, a posture acquisition sensor or posture acquisition device) (step S201). If the avatar interferes with the virtual object (Yes in step S202), the first adjustment unit 2351 performs adjustments such as moving or deforming the virtual object (step S203). If the avatar does not interfere with the virtual object (No in step S202), the processing of step S203 is skipped.

The receiving unit 231 receives the real-space image frame, the person area, and the object area transmitted by the transmitting unit 133 of the terminal device 100 (step S204). Next, the updating unit 232 updates the video of the virtual space using the real-space image frame (step S205).

If the determination unit 233 determines that the object region is new (Yes in step S206), the second generation unit 2342 generates bones from the real-space image frame and the person region and sets them in the person region (step S207). Furthermore, the first generation unit 2341 generates a virtual object from the real-space image frame and the object region (step S208).

On the other hand, if the determination unit 233 determines that the object region is not new, it skips the processing of steps S207 and S208 (No in step S206).

If the bone interferes with the avatar (Yes in step S209), the second adjustment unit 2352 performs adjustments such as moving or deforming the avatar (step S210). Next, if the bone interferes with a virtual object (Yes in step S211), the first adjustment unit 2351 performs adjustments such as moving or deforming the virtual object (step S212).

Then, the output unit 239 outputs information about the virtual touch based on the results of the adjustments made by the first adjustment unit 2351 and the second adjustment unit 2352 (step S213).

If the bone does not interfere with the avatar (No in step S209), the second adjustment unit 2352 does not perform any processing and skips steps S210 to S213. On the other hand, if the bone does not interfere with the virtual object (No in step S211), the first adjustment unit 2351 does not perform the processing in step S212 and skips it.

The calculation unit 236 calculates the avatar area and the virtual object area from the virtual space image frame (step S214). Next, the transmission unit 237 transmits the virtual space image frame, the avatar area, and the virtual object area to the terminal device 100 (step S215). The subsequent processing in the terminal device 100 is the same as that described with reference to FIG. 11 and is therefore omitted.

The drawing unit 238 draws a virtual display image to be displayed in the virtual space (step S216). Then, the output unit 239 outputs the virtual display image to a device worn by the user at the virtual space base (step S217), and the process ends.
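The branching of FIG. 12 as described in the text can be traced with the following sketch. Each step ID maps to a hypothetical callback in `hooks`; decision steps (S202, S206, S209, S211) return a boolean, and the function returns the list of executed steps. It follows the text literally, including that "No" in step S209 skips steps S210 to S213; `server_step` and the hook mechanism are illustrative assumptions.

```python
from typing import Callable, Dict, List


def server_step(hooks: Dict[str, Callable[[], object]]) -> List[str]:
    """Run one pass of the FIG. 12 procedure and return the executed step IDs.

    Missing hooks default to a no-op that returns False, so decision steps
    are treated as "No" unless their hook returns True.
    """
    trace: List[str] = []

    def run(step: str) -> bool:
        trace.append(step)
        return bool(hooks.get(step, lambda: False)())

    run("S201")          # adjust avatar posture from posture information
    if run("S202"):      # avatar interferes with a virtual object?
        run("S203")
    run("S204")          # receive real-space frame, person area, object area
    run("S205")          # update the video in the virtual space
    if run("S206"):      # object region is new?
        run("S207")      # generate bones
        run("S208")      # generate virtual object
    if run("S209"):      # bone interferes with the avatar?
        run("S210")
        if run("S211"):  # bone interferes with a virtual object?
            run("S212")
        run("S213")      # output virtual touch information
    # As described in the text, "No" at S209 skips S210 to S213.
    for step in ("S214", "S215", "S216", "S217"):
        run(step)
    return trace


quiet = server_step({})
contact = server_step({"S209": lambda: True, "S211": lambda: True})
```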

5. Effects
Conventional technology extracts only the areas of people from captured video and pastes them into the virtual space, making it appear as if the user is actually at that point in the virtual space. However, if the accuracy of subject extraction is poor, degradation of image quality occurs at the boundary between people and other areas, which can impair the quality of experience of communication between users.

In addition, conventional technologies do not always take into consideration locations or users who participate in a virtual space without wearing an HMD or similar device. For example, in conventional technologies, a user who participates in a virtual space via video from the real world using a camera and microphone is unable to interfere with users who are immersed in the virtual space using an HMD, or with avatars that represent them in the virtual space, making it difficult to provide high-quality communication.

The virtual space display system 1 according to this embodiment displays a virtual space. The calculation unit 131 of the terminal device 100 uses an image frame output from the imaging device to calculate at least one of a person area recognized as a person in the image frame, an object area recognized as an object, and a designated area in which a predetermined area is designated. The setting unit 132 of the terminal device 100 sets a predetermined alpha value in the image frame using at least one of the person area, object area, and designated area. The synthesis unit 135 of the terminal device 100 synthesizes a display image based on a real space image frame, which is an image frame in which an alpha value has been set, and a virtual space image frame received from the information processing device 200.

On the other hand, the generation unit 234 of the information processing device 200 generates a virtual object using the real-space image frame received from the terminal device 100 and the object area or person area. The adjustment unit 235 of the information processing device 200 adjusts a predetermined position and shape of the virtual object in the virtual space. The drawing unit 238 of the information processing device 200 is characterized by drawing the real-space image frame and the virtual object by superimposing them in the virtual space. Therefore, according to this embodiment, the following effects are achieved.

In this embodiment, the virtual space display system 1 suppresses quality degradation caused by the accuracy of subject extraction in communication involving video and virtual space by combining real-space locations and virtual-space locations on the video. As a result, the virtual space display system 1 provides the effect of displaying a virtual space that makes it easy for users to understand the relative positions of each other.

In addition, in conventional technology, when placing real-world images in a virtual space, a method was sometimes used in which the subject was extracted and only the person was drawn, while the non-person areas were made transparent. However, in shooting environments other than special environments (such as chromakey studios), the accuracy of subject extraction was not very high, and jagged edges near the boundaries could be noticeable, and flickering due to lighting flicker or sensor noise could occur.

The virtual space display system 1 therefore synthesizes the real space and the virtual space and displays them on a display device. Because it uses the real space background as is, the virtual space display system 1 can render the real space image displayed in the virtual space without introducing noise. Inaccuracies in subject extraction become noticeable only when a user at the real space base extends a hand into the virtual space, and by adjusting the size of the affected area it is possible to suppress deterioration in the quality of the image as a whole.

Furthermore, in conventional technologies, real-space bases and virtual-space bases are displayed side by side in an image by compositing CG onto live-action images. For example, motion capture and face tracking reference real-space images, depth information, or facial features and reflect them on objects in the virtual space, or construct virtual hands that operate objects. However, it can be difficult to achieve interaction between real-space bases and virtual-space bases.

In contrast, the virtual space display system 1 enables interaction between users. For example, it generates virtual objects from a real space base that can interact with avatars and virtual objects at a virtual space base.

Specifically, the virtual space display system 1 in this embodiment acquires a person area from an image of a real space base, sets virtual bones for it, and provides in the virtual space a virtual avatar that follows the user at the real space base. This allows the virtual space display system 1 to render the real-space image and the virtual space side by side. At the same time, a subject in the real-space image can cross the boundary and interfere with objects in the virtual space.
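As an illustration of bones derived from a person area, and of how an adjustment unit might respond when a bone interferes with a virtual object, the following sketch works in 2D image coordinates. It assumes a single hypothetical "spine" bone spanning the person area's bounding box and a circular virtual object. When the object overlaps the bone, it is pushed out along the shortest direction until it just touches. An actual system would presumably use a fuller skeleton in 3D. All function names here are hypothetical.

```python
import math

Point = tuple[float, float]
Bone = tuple[Point, Point]
Mask = list[list[bool]]


def person_bbox(mask: Mask) -> tuple[int, int, int, int]:
    """Bounding box (u0, v0, u1, v1) of the True pixels in a person mask."""
    rows = [v for v, row in enumerate(mask) if any(row)]
    cols = [u for row in mask for u, inside in enumerate(row) if inside]
    return min(cols), min(rows), max(cols), max(rows)


def spine_bone(mask: Mask) -> Bone:
    """Hypothetical one-bone skeleton: top-center to bottom-center of the person box,
    so the bone reflects the person's position and size."""
    u0, v0, u1, v1 = person_bbox(mask)
    cx = (u0 + u1) / 2
    return (cx, float(v0)), (cx, float(v1))


def closest_point(a: Point, b: Point, p: Point) -> Point:
    """Closest point to p on the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return (a[0] + t * dx, a[1] + t * dy)


def resolve_interference(bone: Bone, center: Point, radius: float) -> Point:
    """If a circular virtual object overlaps the bone, move it out along the shortest
    direction until it just touches the bone; otherwise leave it where it is."""
    q = closest_point(bone[0], bone[1], center)
    dx, dy = center[0] - q[0], center[1] - q[1]
    dist = math.hypot(dx, dy)
    if dist >= radius:
        return center
    if dist == 0:
        dx, dy, dist = 1.0, 0.0, 1.0  # center lies on the bone: pick an arbitrary direction
    return (q[0] + dx / dist * radius, q[1] + dy / dist * radius)
```

For a person mask occupying columns 1 to 3 and rows 1 to 4 of a 5 × 5 frame, the bone runs from (2, 1) to (2, 4). A radius-1 object centered at (2.5, 2) overlaps it and is moved to (3, 2). An object centered at (5, 2) is left untouched.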

Furthermore, the virtual space display system 1 in this embodiment makes high-quality communication possible even when, depending on the content of the communication, an object needs to be handed over from the real space to the virtual space. Specifically, the user at the real space base holds the object to be handed over up to the camera. The virtual space base then recognizes it and generates a similar virtual object, so that the object is handed over virtually.
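The handover step requires a virtual object that imitates the shape or pattern of the object held up to the camera. One way to sketch this is to copy the object area's pixels into an RGBA texture whose alpha channel keeps the object's silhouette. This is an assumed, simplified technique, a flat textured object rather than a reconstructed 3D model, and `TexturedObject` and `imitate_object` are hypothetical names.

```python
from dataclasses import dataclass

Pixel = tuple[int, int, int]          # RGB
TexelRGBA = tuple[int, int, int, int]


@dataclass
class TexturedObject:
    """Hypothetical flat virtual object carrying an RGBA texture."""
    width: int
    height: int
    rgba: list[list[TexelRGBA]]


def imitate_object(frame: list[list[Pixel]], mask: list[list[bool]]) -> TexturedObject:
    """Crop the object area's bounding box from the real-space frame.

    The RGB values reproduce the object's pattern; alpha 255 inside the mask and
    0 outside it reproduces the object's shape (silhouette)."""
    rows = [v for v, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("object area is empty")
    cols = [u for row in mask for u, inside in enumerate(row) if inside]
    u0, u1, v0, v1 = min(cols), max(cols), min(rows), max(rows)
    texture = []
    for v in range(v0, v1 + 1):
        texture_row = []
        for u in range(u0, u1 + 1):
            r, g, b = frame[v][u]
            texture_row.append((r, g, b, 255 if mask[v][u] else 0))
        texture.append(texture_row)
    return TexturedObject(u1 - u0 + 1, v1 - v0 + 1, texture)
```

In practice the resulting texture would be mapped onto a quad or other mesh in the virtual space. Cropping to the bounding box keeps the texture small, and the transparent texels outside the mask keep the object from carrying background pixels from the real space base.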

6. Hardware Configuration
Each component of each device shown in the figure is a functional concept and does not necessarily have to be physically configured as shown. In other words, the specific form of distribution and integration of each device is not limited to that shown in the figure. All or part of each device can be functionally or physically distributed and integrated in any unit according to various loads, usage conditions, and so on. Furthermore, all or any part of each processing function performed by each device can be realized by a CPU and a program interpreted and executed by the CPU, or as hardware using wired logic.

Furthermore, among the various processes described in this embodiment, all or part of the processes described as being performed automatically can also be performed manually using known methods. In addition, the information including the processing procedures, control procedures, specific names, various data, and parameters shown in the drawings can be changed as desired unless otherwise specified.

[Program]
In one embodiment, the various devices constituting the terminal device 100 and the information processing device 200 can be implemented by installing the above-mentioned virtual space display program on a desired computer as package software or online software. For example, executing the virtual space display program causes an information processing device to function as the various devices constituting the terminal device 100 and the information processing device 200. The information processing device here includes desktop and notebook personal computers. It also includes mobile communication terminals such as smartphones and mobile phones, and slate terminals such as PDAs (Personal Digital Assistants).

FIG. 13 is a diagram showing an example of a computer in which the constituent devices of the virtual space display system 1 according to this embodiment are realized. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these components is connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example. The video adapter 1060 is connected to a display 1130, for example.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the programs that define the processes of the various devices that make up the terminal device 100 and the information processing device 200 are implemented as program modules 1093 in which computer-executable code is written. The program modules 1093 are stored, for example, in the hard disk drive 1090. For example, the program modules 1093 for executing processes similar to the functional configurations of the various devices that make up the terminal device 100 and the information processing device 200 are stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

Furthermore, the setting data used in the processing of the above-mentioned embodiment is stored as program data 1094, for example, in memory 1010 or hard disk drive 1090. Then, the CPU 1020 reads out the program module 1093 or program data 1094 stored in memory 1010 or hard disk drive 1090 into RAM 1012 as necessary, and executes the processing of the above-mentioned embodiment.

The program module 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and program data 1094 may be stored in another computer connected via a network (such as a LAN or a WAN (Wide Area Network)). The program module 1093 and program data 1094 may then be read by the CPU 1020 from the other computer via the network interface 1070.

7. Other
Although the present embodiment has been described above, the present embodiment is not limited by the description and drawings that form a part of the disclosure. In other words, other embodiments, examples, operation techniques, etc. made by those skilled in the art based on the present embodiment are all included in the scope of the present embodiment.

10R: Real space base
10V: Virtual space base
20R, 20V: User
30R, 30V: Display device
31R, 31V: Image capture device
32: HMD
32a: Attitude acquisition sensor
33: Attitude acquisition device
33a: Tactile output device
40, 41, 42, 43, 44, 45: Virtual space
50: Ground
51: Avatar
52: Book
53: Hamburger
54: Area
60: Image frame
60R: Real space image frame
60V: Virtual space image frame
70RV: Display image
100: Terminal device
110: Communication unit
120: Storage unit
121: Image frame storage unit
122: Area information storage unit
122a: Person area
122b: Object area
122c: Designated area
122ab: Person area + object area
122bc: Object area + designated area
122abc: Person area + object area + designated area
123: Composite image storage unit
130: Control unit
131: Calculation unit
132: Setting unit
133: Transmission unit
134: Receiving unit
135: Synthesis unit
136: Output unit
200: Information processing device
210: Communication unit
220: Storage unit
221: Image frame storage unit
222: Area information storage unit
222a: Avatar area
222b: Virtual object area
223: Generation information storage unit
223a: Virtual object
223b: Bone
230: Control unit
231: Receiving unit
232: Update unit
233: Determination unit
234: Generation unit
2341: First generation unit
2342: Second generation unit
235: Adjustment unit
2351: First adjustment unit
2352: Second adjustment unit
236: Calculation unit
237: Transmission unit
238: Drawing unit
239: Output unit
1000: Computer
1010: Memory
1011: ROM
1012: RAM
1020: CPU
1030: Hard disk drive interface
1040: Disk drive interface
1050: Serial port interface
1060: Video adapter
1070: Network interface
1080: Bus
1090: Hard disk drive
1091: OS
1092: Application program
1093: Program module
1094: Program data
1100: Disk drive
1110: Mouse
1120: Keyboard

Claims

1. A virtual space display system for displaying a virtual space, comprising a terminal device and an information processing device, wherein
the terminal device comprises:
a calculation unit that calculates, using an image frame output from an image capture device, at least one of a person area recognized as a person in the image frame, an object area recognized as an object, and a designated area that designates a predetermined area;
a setting unit that sets a predetermined alpha value in the image frame by using at least one of the person area, the object area, and the designated area; and
a synthesis unit that synthesizes a display image based on a real-space image frame, which is the image frame in which the alpha value has been set, and a virtual-space image frame received from the information processing device; and
the information processing device comprises:
a generation unit that generates a virtual object by using the real-space image frame received from the terminal device and the object area or the person area;
an adjustment unit that adjusts a predetermined position and a predetermined shape of the virtual object in the virtual space; and
a drawing unit that draws the real-space image frame and the virtual object in the virtual space by superimposing them on each other.

2. The virtual space display system according to claim 1, wherein the generation unit:
uses the real-space image frame and a pixel area corresponding to the object area included in the real-space image frame to generate, as the virtual object, a virtual object that imitates a shape or a pattern of an object included in the object area; and
uses the real-space image frame and a pixel area corresponding to the person area included in the real-space image frame to generate, as the virtual object, bones for setting a posture of a user in the virtual space based on one or more combinations of a position, a size, and a shape of the person included in the person area.

3. The virtual space display system according to claim 1, wherein the adjustment unit:
adjusts a position or a shape of the virtual object when at least one of an avatar and the bone, which are virtual objects, interferes with the virtual object in the virtual space; and
adjusts a position or a shape of the avatar when the bone, which is a virtual object, and the avatar interfere with each other in the virtual space.

4. The virtual space display system according to claim 1, further comprising a display unit configured to display a display image of the virtual space, wherein
the drawing unit superimposes the real-space image frame and at least one of an avatar and a virtual object, which are virtual objects, and draws the superimposed real-space image frame as one virtual display screen in the virtual space.

5. The virtual space display system according to claim 1, further comprising:
a display unit configured to display a display image of the virtual space; and
an output unit that outputs information about a virtual touch in order to present the virtual touch to a user when an avatar, which is a virtual object, interferes with at least one of a virtual object and a bone.

6. A terminal device comprising:
a calculation unit that calculates, using an image frame output from an image capture device, at least one of a person area recognized as a person in the image frame, an object area recognized as an object, and a designated area that designates a predetermined area;
a setting unit that sets a predetermined alpha value in the image frame by using at least one of the person area, the object area, and the designated area; and
a synthesis unit that synthesizes a display image based on a real-space image frame, which is the image frame in which the alpha value has been set, and a virtual-space image frame received from an information processing device.

7. A virtual space display program for causing a computer to function as the terminal device according to claim 6.