WO2024034396A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2024034396A1
Authority
WO
WIPO (PCT)
Prior art keywords
avatar
dimensional space
photographer
information processing
generation unit
Application number
PCT/JP2023/027306
Other languages
French (fr)
Japanese (ja)
Inventor
龍正 小池
光 高鳥
吉弘 田村
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Application filed by Sony Group Corporation (ソニーグループ株式会社)
Publication of WO2024034396A1 publication Critical patent/WO2024034396A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working

Definitions

  • the present technology relates to an information processing device, an information processing method, and a program for communicating between users in remote locations.
  • Patent Document 1 discloses a technology for promoting communication such as natural conversation by presenting not only normal voices such as conversation but also additional information corresponding to the context (state, situation) of the content and the like.
  • the user on the trip carries a mobile terminal or the like and travels around the destination while taking pictures of the scenery, and enjoys conversations with people at home while sharing the captured images.
  • the present technology was developed in view of these problems, and aims to provide an experience that makes it feel like you are walking around the shooting location with the photographer, even though you are in a remote location.
  • The information processing device according to the present technology includes an avatar generation unit that generates an avatar of the photographer by reflecting physical information of the photographer, a three-dimensional space generation unit that generates information of a three-dimensional space from the captured image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a display image generation unit that generates, as a display image, an image from a viewing position set in the three-dimensional space. As a result, the person at home (viewer) can feel as if he or she is walking around the shooting location together with the photographer.
  • In the information processing method according to the present technology, an information processing apparatus executes a process of generating an avatar of the photographer by reflecting the physical information of the photographer, a process of generating information of a three-dimensional space from the captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
  • The program according to the present technology causes a computer device to execute a function of generating an avatar of the photographer by reflecting the physical information of the photographer, a function of generating information of a three-dimensional space from the captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
  • Such an information processing method and program can also provide the same effect as the information processing device according to the present technology described above.
  • FIG. 1 is a block diagram illustrating a configuration example of an information processing system according to a first embodiment of the present technology.
  • FIG. 2 is an explanatory diagram showing the appearance of a distributor terminal and an HM device.
  • FIG. 3 is a diagram showing an example of an image presented to a distributor.
  • FIG. 4 is a diagram showing an example of an image presented to a viewer.
  • FIG. 5 is a diagram showing another example of an image presented to a viewer.
  • FIG. 6 is a block diagram of a computer device.
  • FIG. 7 is a diagram showing the flow of processing executed by each device of the information processing system.
  • FIG. 8 is a flowchart of an example of processing performed in the HM device.
  • FIG. 9 is a flowchart illustrating an example of processing executed at the distributor terminal.
  • FIG. 11 is a block diagram showing a configuration example of an information processing system according to a second embodiment.
  • FIG. 12 is a diagram for explaining localization of a sound image according to the position of an avatar.
  • FIG. 13 is an explanatory diagram of an example in which the volume of sound is changed depending on the distance from the avatar.
  • FIG. 14 is a flowchart of an example of a process executed at the distributor terminal in the second embodiment.
  • FIG. 15 is a flowchart of an example of a process executed at the viewer terminal in the second embodiment.
  • FIG. 16 is a block diagram showing a configuration example of an information processing system in a third embodiment.
  • FIG. 18 is an explanatory diagram of the correspondence between the orientation of the distributor and the orientation of the avatar.
  • FIG. 19 is a diagram showing an example of an avatar.
  • FIG. 20 is a diagram showing another example of an avatar.
  • the information processing system 1 is a system for facilitating smooth communication between users located at separate locations. Further, the information processing system 1 is also a system used by one user to view a video shot while moving while the other user is at home or the like.
  • a user who shoots a video while moving will be referred to as a "distributor”, and a user who views the video shot by the distributor will be referred to as a "viewer”.
  • the information processing system 1 is a system that provides the viewer with an experience as if they were moving along with the broadcaster through the filming location.
  • the information processing system 1 includes a distributor terminal 2 and an HM (Head-mount) device 3 as a distributor-side device, and further includes a viewer terminal 4 as a viewer-side device (see FIG. 1).
  • the distributor terminal 2 is, for example, a smartphone or the like, and is a device that the distributor can hold in his hand and shoot video while moving.
  • the distributor terminal 2 is equipped with a display unit that can display images from the viewer side and a microphone that can collect the distributor's voice and environmental sounds.
  • the HM device 3 is connected to the distributor terminal 2 for wired or wireless communication, and is configured with a speaker for reproducing audio and environmental sounds from the viewer side.
  • the distributor terminal 2 and the HM device 3 may be realized by a single mobile terminal device by consolidating the functions of both in a smartphone.
  • The viewer terminal 4 is a stationary device, and includes a display unit on which images shot by the distributor terminal 2 are displayed, a speaker for reproducing the distributor's voice and environmental sounds, and a microphone for inputting the viewer's voice and the like.
  • the viewer terminal 4 may include a controller that performs various operations.
  • As the controller, various devices such as a keyboard and a mouse can be considered.
  • the viewer terminal 4 may be configured as a system in which some or all of the computer device, microphone, speaker, and operation device are independently provided.
  • the distributor terminal 2 includes an input section 5, a physical information acquisition section 6, an output section 7, and a communication section 8.
  • the HM device 3 also includes an IMU (Inertial Measurement Unit) 9, an output section 10, and a communication section 11.
  • The input unit 5 of the distributor terminal 2 includes a microphone for inputting voice and environmental sounds, a camera unit equipped with an image sensor for capturing images, a touch panel for inputting the distributor's operations, various button operators, and the like.
  • the input unit 5 outputs “sound data” such as audio and environmental sounds, and “video data” such as still images and moving images captured by the image sensor to the communication unit 8.
  • In the following description, sound data is simply described as "sound", and video data is simply described as "video".
  • the physical information acquisition unit 6 acquires physical information of the distributor based on acceleration information and angular velocity information as sensing data by the IMU 9 included in the HM device 3.
  • the distributor's physical information includes, for example, the distributor's posture and movements. Specifically, this is information that specifies the distributor's face direction, line of sight direction, movement speed (walking speed), movement method (walking, cycling, etc.), posture, gestures, etc.
  • These pieces of information may be acquired not only from the IMU 9 included in the HM device 3 but also from a camera that photographs the distributor.
  • information specifying the line of sight, gestures, posture, etc. of the distributor may be obtained by analyzing an image captured by a camera included in the HM device 3.
  • the physical information acquisition unit 6 outputs the physical information acquired from the IMU data to the communication unit 8.
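  • As a rough, non-authoritative illustration of how such physical information might be derived from the IMU data, the following Python sketch estimates a coarse movement state and a face yaw angle from acceleration and angular-velocity samples; the thresholds, axis convention, and returned field names are assumptions for illustration, not values taken from this publication.

```python
import numpy as np

def estimate_physical_info(accel, gyro, dt):
    """Hypothetical sketch: derive coarse physical information from IMU samples.

    accel, gyro: (N, 3) arrays of acceleration [m/s^2] and angular velocity [rad/s]
    (sensor axes assumed x-right, y-forward, z-up). dt: sampling interval [s].
    """
    accel = np.asarray(accel, dtype=float)
    gyro = np.asarray(gyro, dtype=float)

    # Vertical oscillation of acceleration as a crude proxy for gait intensity.
    vertical = accel[:, 2] - accel[:, 2].mean()
    intensity = float(np.sqrt(np.mean(vertical ** 2)))

    if intensity < 0.3:          # illustrative thresholds only
        movement = "standing"
    elif intensity < 1.5:
        movement = "walking"
    else:
        movement = "running"

    # Integrate the yaw rate to track how the HM device (and face) has turned.
    face_yaw = float(np.sum(gyro[:, 2]) * dt)

    return {"movement": movement, "gait_intensity": intensity, "face_yaw": face_yaw}
```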
  • the output unit 7 outputs the sound and video on the viewer side received from the viewer terminal 4.
  • The output unit 7 in this example includes, for example, a display unit that outputs video from the viewer side, and the sound from the viewer side is output from a speaker serving as the output unit 10 included in the HM device 3.
  • An example of an image displayed on the display unit serving as the output unit 7 is shown in FIG. 3.
  • In the main area of the screen, an image captured by the camera of the viewer terminal 4 is displayed in a large size; that is, an image of the viewer sitting on a chair and facing the camera is displayed.
  • In the corner area Ar2, an image captured by the camera unit of the distributor terminal 2 is displayed.
  • the video displayed in the corner area Ar2 may include an avatar AT, which will be described later.
  • the communication unit 8 receives IMU data from the HM device 3 and supplies it to the physical information acquisition unit 6. Further, the communication unit 8 supplies the video data received from the viewer terminal 4 to the output unit 7 and transmits the sound data received from the viewer terminal 4 to the HM device 3. Furthermore, the communication unit 8 transmits the audio data and video data supplied from the input unit 5 and the physical information supplied from the physical information acquisition unit 6 to the viewer terminal 4.
  • the IMU 9 of the HM device 3 acquires IMU data (acceleration data, angular velocity data, etc.) used to obtain physical information about the distributor by being equipped with an acceleration sensor, an angular velocity sensor, etc., and outputs it to the communication unit 11.
  • The output unit 10 is configured with a speaker, and performs audio output by reproducing the sound data received from the distributor terminal 2.
  • the communication unit 11 transmits IMU data supplied from the IMU 9 to the distributor terminal 2 and receives sound data from the distributor terminal 2.
  • the viewer terminal 4 includes an input section 12, an avatar generation section 13, a three-dimensional space generation section 14, a display video generation section 15, an output section 16, and a communication section 17.
  • the input unit 12 includes a microphone, a camera, various operators, and the like.
  • the input section 12 outputs audio data and video data to the communication section 17.
  • the avatar generation unit 13 generates an avatar AT for the distributor based on the physical information received from the distributor terminal 2.
  • the avatar AT is, for example, a three-dimensional object.
  • the avatar AT generated by the avatar generation unit 13 may be based on a photographed image of the broadcaster, or may be based on a specific character selected by the broadcaster or the viewer. .
  • At least a part of the avatar AT may be transparent or semi-transparent, or the avatar AT may be a primitive shape that is just recognizable as a human figure.
  • the three-dimensional object as the avatar AT generated by the avatar generation unit 13 is supplied to the three-dimensional space generation unit 14 as an avatar model.
  • the avatar generation unit 13 reflects physical information about the distributor on the avatar AT.
  • physical information such as the distributor's face direction, line of sight direction, joint shape, etc. is reflected in the posture of the avatar AT.
  • the distributor's gestures, walking or stopping state, walking speed, etc. as physical information are reflected as the movement of the avatar AT. Note that the orientation of the broadcaster's body is taken into consideration during placement.
  • the avatar generation unit 13 may provide the three-dimensional space generation unit 14 with information such as the posture and orientation that the avatar AT should take as additional information of the avatar model.
  • the avatar generation unit 13 may add the distributor's movements as an animation to the avatar AT.
  • the animation given to the avatar AT regarding the distributor may be determined based on the distributor's moving speed or movement mode (such as walking or cycling).
  • For example, if the distributor is walking, a walking animation may be added, and if the distributor is running, a running animation may be added. Further, if it is estimated that the distributor is moving on a bicycle, the avatar AT may be provided with a cycling animation.
  • the avatar generation unit 13 may add an animation to make the avatar AT perform the gesture movement.
  • the avatar generation unit 13 performs processing to reflect physical information on the avatar AT multiple times during communication between the distributor terminal 2 and the viewer terminal 4. For example, physical information may be reflected on the avatar AT periodically, such as once every second, or physical information may be reflected on the avatar AT when a change occurs in the broadcaster's posture, direction, movement, etc. It's okay.
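  • As one possible way to realize the animation selection described above, the following sketch maps an estimated movement mode and speed to an animation clip name; the clip names and speed thresholds are illustrative assumptions rather than values from the publication.

```python
def select_avatar_animation(movement_mode: str, speed_mps: float) -> str:
    """Illustrative mapping from the distributor's estimated movement to an
    avatar animation clip; clip names and thresholds are assumptions."""
    if movement_mode == "cycling":
        return "anim_cycling"
    if speed_mps > 2.5:        # roughly faster than a brisk walk
        return "anim_running"
    if speed_mps > 0.2:
        return "anim_walking"
    return "anim_idle"


# Example: a distributor estimated to be walking at 1.2 m/s.
print(select_avatar_animation("walking", 1.2))  # -> anim_walking
```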
  • The three-dimensional space generation unit 14 performs image recognition (three-dimensional estimation processing) on the video (photographed image) received from the distributor terminal 2 to identify image areas such as the floor (ground) and walls, and generates a three-dimensional space.
  • For example, a three-dimensional object is generated by performing image recognition on various subjects shown in the video, and by pasting an image cut out from the video as a texture on each surface of each three-dimensional object, a three-dimensional space based on the captured image can be generated.
  • a three-dimensional space may be realized by pasting a photographed image on the inner surface of a sphere.
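  • As a minimal sketch of the sphere-based fallback mentioned above (not of the image-recognition pipeline itself), the following function maps a viewing direction to a pixel of a captured frame that is assumed to be an equirectangular image pasted on the inner surface of a sphere; the projection and axis convention are assumptions.

```python
import numpy as np

def sphere_texture_lookup(direction, frame_width, frame_height):
    """Sketch: map a 3D viewing direction (y-up) to pixel coordinates of a
    captured frame pasted on the inner surface of a sphere, assuming an
    equirectangular projection. Illustrative fallback, not the patented method."""
    x, y, z = np.asarray(direction, dtype=float) / np.linalg.norm(direction)
    lon = np.arctan2(x, z)                      # -pi..pi around the vertical axis
    lat = np.arcsin(np.clip(y, -1.0, 1.0))      # -pi/2..pi/2
    u = (lon / (2.0 * np.pi) + 0.5) * (frame_width - 1)
    v = (0.5 - lat / np.pi) * (frame_height - 1)
    return int(round(u)), int(round(v))
```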
  • the three-dimensional space generation unit 14 arranges the viewing position and the avatar AT in the generated three-dimensional space.
  • the viewing position is the position of the viewer's viewpoint in the generated three-dimensional space, and can be regarded as the position of the virtual camera.
  • the position of the camera that captured the video used to generate the three-dimensional space and the position of the virtual camera set after generating the three-dimensional space may be the same or different positions.
  • the three-dimensional space generation unit 14 specifies a position suitable for placing the avatar AT as a "placeable position" based on the result of the three-dimensional estimation process in the recognized three-dimensional space.
  • the three-dimensional space generation unit 14 specifies a position where spatial consistency can be ensured even if the avatar AT is placed as a position where the avatar AT can be placed. In other words, if the ground facing vertically upward is not detected, it is determined that the possible placement position cannot be specified.
  • the placeable position may be a placeable area where the avatar AT can be placed.
  • The three-dimensional space generation unit 14 first specifies a position where the avatar AT can be placed within the angle of view when the virtual camera is set at the viewing position. That is, the three-dimensional space generation unit 14 determines whether or not there is a position where the avatar AT can be placed in the video presented to the viewer.
  • the three-dimensional space generation unit 14 places the avatar AT generated by the avatar generation unit 13 at a position where it can be placed.
  • the captured image with the avatar AT placed is displayed on a display unit such as a monitor as the output unit 16 of the viewer terminal 4.
  • FIG. 4 shows an example of an image displayed on the output unit 16 of the viewer terminal 4.
  • an avatar AT as a three-dimensional object with an image of the distributor pasted as a texture is placed at the left end of the road.
  • the roadway area is not a position where it can be placed. That is, only the area of the sidewalk may be specified as the position where it can be placed.
  • If the detected vertically upward facing surface is the top surface of an object such as a suitcase, it may be determined that the position is not a placeable position. In other words, when the ground is detected, a region where no object exists on the ground is identified as a placeable position.
  • positions where other passersby or animals are present are not positions that can be placed.
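  • The placement logic above could be sketched as follows: a candidate point counts as placeable only if it lies on an upward-facing ground region, falls inside the virtual camera's angle of view, and is not too close to detected objects or passersby. The `camera.contains` interface and the clearance radius are assumptions introduced for illustration.

```python
import numpy as np

def find_placeable_position(ground_points, occupied_points, camera, min_clearance=0.6):
    """Sketch of identifying a placeable position for the avatar AT.

    ground_points: iterable of 3D points on detected upward-facing ground.
    occupied_points: (M, 3) array of points occupied by objects or passersby.
    camera: hypothetical virtual-camera object with a contains(point) method.
    """
    occupied = np.asarray(occupied_points, dtype=float).reshape(-1, 3)
    for p in ground_points:
        p = np.asarray(p, dtype=float)
        if not camera.contains(p):                 # outside the angle of view
            continue
        if occupied.size and np.min(np.linalg.norm(occupied - p, axis=1)) < min_clearance:
            continue                               # too close to a suitcase, passerby, etc.
        return p                                   # first spatially consistent spot
    return None                                    # no placeable position in view
```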
  • the three-dimensional space generation unit 14 determines not to arrange the avatar AT within the field of view of the virtual camera when a position that can be placed within the field of view of the virtual camera cannot be specified.
  • In this case, two approaches are conceivable. One is to arrange the avatar AT outside the field of view of the virtual camera. Thereby, the viewer who views the video based on the angle of view of the virtual camera does not have to view the avatar AT in an unnatural state.
  • The other is to display the avatar AT at a specific position on the display section serving as the output section 16 of the viewer terminal 4. At this time, in order to make it clear that the avatar AT is not placed in the three-dimensional space, the avatar AT may be displayed after securing an area above the display area where, for example, the sky is likely to be displayed.
  • Note that the three-dimensional space generation unit 14 may decide not to arrange the avatar AT within the viewing angle of the virtual camera based on conditions other than those described above. For example, if it can be inferred that the viewer wants to see the scenery, or if the viewer performs an operation to hide the avatar AT, it may be decided not to place the avatar AT within the viewing angle of the virtual camera.
  • the three-dimensional space generation unit 14 may immediately place the avatar AT outside the angle of view, or may add animation to the avatar AT so that the avatar AT moves outside the angle of view.
  • In this way, the avatar AT naturally moves out of the field of view. By adding such an animation, a high sense of realism can be provided to the viewer.
  • FIG. 5 is an example in which a main area Ar3 where the video shot by the distributor terminal 2 is displayed and a sub area Ar4 where only the avatar AT is displayed are displayed on the output unit 16.
  • the sub-area Ar4 is an area surrounded by a rectangular frame.
  • the three-dimensional space generation unit 14 arranges the avatar AT at a position where the avatar AT can be placed so that the direction of the distributor's body at the shooting location matches the direction of the body of the avatar AT in the three-dimensional space.
  • The avatar AT generated by the avatar generation unit 13 is generated so that the direction of its face, the direction of its line of sight, and so on match those of the distributor. Therefore, when the three-dimensional space generation unit 14 arranges the avatar AT so that the body direction of the avatar AT matches the body direction of the distributor in real space, the directions of the avatar's face and line of sight also match those of the distributor in real space. Therefore, the posture and various orientations of the distributor in real space can be reproduced in the three-dimensional space without any discomfort.
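  • A minimal sketch of this orientation matching, assuming the distributor's body yaw (rotation about the vertical axis) has been estimated from the physical information and the avatar model exposes a hypothetical `set_pose` interface:

```python
import numpy as np

def place_avatar_with_orientation(avatar, placeable_position, distributor_yaw):
    """Sketch: place the avatar AT at a placeable position and align its body
    orientation (forward vector) with the distributor's real-space yaw [rad]."""
    forward = np.array([np.sin(distributor_yaw), 0.0, np.cos(distributor_yaw)])
    avatar.set_pose(position=np.asarray(placeable_position, dtype=float),
                    forward=forward)
```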
  • the three-dimensional space generation unit 14 outputs a model of a three-dimensional space in which the avatar AT is arranged and the viewing position is set (hereinafter referred to as a “3D (Dimension) space model” as appropriate) to the display video generation unit 15.
  • the display video generation unit 15 performs rendering processing to generate a viewing video from a virtual camera set at a viewing position, using a 3D space model regarding a three-dimensional space in which the avatar AT is placed. In other words, the display video generation unit 15 obtains a two-dimensional image by capturing a three-dimensional space using a virtual camera. Thereby, the display image generation section 15 outputs a two-dimensional rendered image to the output section 16.
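  • For reference, rendering from the viewing position can be pictured as placing a virtual camera and building its view matrix; a conventional look-at construction such as the one below (a generic graphics technique, not something specified in this publication) would typically precede rasterizing the 3D space model into the two-dimensional display video.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Sketch: view matrix of a virtual camera at `eye` (the viewing position)
    looking toward `target`, as used by a typical rasterizer."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    f = target - eye
    f /= np.linalg.norm(f)
    s = np.cross(f, up)
    s /= np.linalg.norm(s)
    u = np.cross(s, f)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view
```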
  • the output unit 16 outputs the sound and video received from the distributor terminal 2. Specifically, the output unit 16 outputs the sound data received from the distributor terminal 2 from a speaker or the like serving as the output unit 16. Further, the output unit 16 performs a process of displaying the rendered image supplied from the display image generation unit 15 on a display unit serving as the output unit 16.
  • The communication unit 17 receives sound data, video data, and physical information from the distributor terminal 2, supplies the sound data to the output unit 16, supplies the video data to the three-dimensional space generation unit 14, and supplies the physical information to the avatar generation unit 13. Furthermore, the communication unit 17 transmits the audio data and video data acquired by the viewer terminal 4 to the distributor terminal 2.
  • The CPU (Central Processing Unit) 71 of each computer device executes various processes according to a program stored in the ROM (Read Only Memory) 72 or in a nonvolatile memory section 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or a program loaded from the storage unit 79 into the RAM (Random Access Memory) 73.
  • the RAM 73 also appropriately stores data necessary for the CPU 71 to execute various processes.
  • the CPU 71, ROM 72, RAM 73, and nonvolatile memory section 74 are interconnected via a bus 83.
  • An input/output interface 75 is also connected to this bus 83.
  • the input/output interface 75 is connected to an input section 76 consisting of an operator or an operating device.
  • various operators and operating devices such as a keyboard, a mouse, a key, a dial, a touch panel, a touch pad, and a remote controller are assumed.
  • a user's operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
  • A display section 77 consisting of an LCD (Liquid Crystal Display) or an organic EL panel, and an audio output section 78 consisting of a speaker or the like are connected to the input/output interface 75 either integrally or separately.
  • the display unit 77 is a display unit that performs various displays, and is configured by, for example, a display device provided on the front of the distributor terminal 2 as a computer device, a separate display device connected to the housing, or the like.
  • The display unit 77 displays various images, moving images (videos), etc. on the display screen based on instructions from the CPU 71. Further, the display unit 77 displays various operation menus, icons, messages, etc., that is, performs display as a GUI (Graphical User Interface), based on instructions from the CPU 71.
  • the input/output interface 75 may be connected to a storage section 79 made up of a hard disk, solid-state memory, etc., and a communication section 80 made up of a modem or the like.
  • the communication unit 80 performs communication processing via a transmission path such as the Internet, and communicates with various devices by wired or wireless communication, bus communication, or the like.
  • a drive 81 is also connected to the input/output interface 75 as required, and a removable storage medium 82 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is appropriately installed.
  • the drive 81 can read data files such as image files and various computer programs from the removable storage medium 82 .
  • the read data file is stored in the storage section 79, and images and sounds included in the data file are outputted on the display section 77 and the audio output section 78. Further, computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as necessary.
  • software for the processing of this embodiment can be installed via network communication by the communication unit 80 or the removable storage medium 82.
  • the software may be stored in advance in the ROM 72, storage unit 79, or the like.
  • By the CPU 71 performing processing operations based on various programs, the necessary information processing and communication processing are executed in the communication section 8 of the distributor terminal 2, the communication section 11 of the HM device 3, and the communication section 17 of the viewer terminal 4.
  • The computer devices constituting the distributor terminal 2, the HM device 3, and the viewer terminal 4 are not limited to a single computer device as shown in FIG. 6, and may be configured as a plurality of computer devices.
  • the plurality of computer devices may be systemized using a LAN or the like, or may be located at remote locations via a VPN using the Internet or the like.
  • the plurality of computer devices may include computer devices as a server group (cloud) that can be used by a cloud computing service.
  • FIG. 7 shows an example of the flow of processing executed in each of the distributor terminal 2, the HM device 3, and the viewer terminal 4. Note that the execution order of the processes shown in FIG. 7 is merely an example; the order of some processes may be changed, and some processes may be executed in parallel.
  • the CPU 71 of the HM device 3 acquires IMU data from the IMU 9 in step S101, and transmits the IMU data to the distributor terminal 2 in step S102.
  • The CPU 71 of the distributor terminal 2 executes the process of acquiring the audio data and video data on the distributor side from the microphone and image sensor in step S201, and then executes the process of receiving the IMU data transmitted from the HM device 3 in step S202.
  • In step S203, the CPU 71 of the distributor terminal 2 acquires the distributor's physical information based on the received IMU data.
  • In step S204, the CPU 71 of the distributor terminal 2 transmits the audio data and video data of the distributor, and the physical information estimated about the distributor, to the viewer terminal 4.
  • The CPU 71 of the viewer terminal 4 executes the process of acquiring the audio data and video data on the viewer side from the microphone and camera in step S301, and then receives the audio data, video data, and physical information transmitted from the distributor terminal 2 in step S302.
  • In step S303, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the physical information.
  • the avatar AT generated at this time may be provided with an animation based on physical information as described above.
  • In step S304, the CPU 71 of the viewer terminal 4 generates a three-dimensional space based on the video data captured by the distributor. For example, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation by performing image recognition processing on the video, and generates a three-dimensional model for each subject.
  • In step S305, the CPU 71 of the viewer terminal 4 sets the viewing position in the generated three-dimensional space.
  • In step S306, the CPU 71 of the viewer terminal 4 identifies a placeable position where the avatar AT can be placed based on the result of the three-dimensional estimation, and in the subsequent step S307 places the avatar AT at the placeable position.
  • In step S308, the CPU 71 of the viewer terminal 4 performs rendering processing to generate a viewing video based on the 3D space model of the three-dimensional space in which the avatar AT is placed and the viewing position.
  • In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an audio signal from the sound data received from the distributor terminal 2.
  • In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering process in step S308.
  • In step S311, the CPU 71 of the viewer terminal 4 transmits the audio data and video data on the viewer side acquired in step S301 to the distributor terminal 2.
  • the CPU 71 of the distributor terminal 2 receives the audio data and video data in step S205, and transmits only the audio data to the HM device 3 in step S206.
  • the CPU 71 of the HM device 3 receives the audio data on the viewer side in step S103, and generates and reproduces an audio signal from the audio data in step S104.
  • the CPU 71 of the distributor terminal 2 outputs the remaining video data in step S207.
  • In this way, playback processing based on the sound data and video data acquired on the distributor side is performed in the viewer's environment, and playback processing based on the sound data and video data acquired on the viewer side is performed in the distributor's environment.
  • FIG. 8 is an example of a process executed by the CPU 71 of the HM device 3. Note that processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S111, the CPU 71 of the HM device 3 determines whether it is time to acquire IMU data. If it is determined that it is time to acquire the IMU data, the CPU 71 of the HM device 3 acquires the IMU data in step S101.
  • The IMU 9 of the HM device 3 outputs sensing data, for example, every few milliseconds, and the CPU 71 may acquire these sensing data all at once every several hundred milliseconds or every few seconds, or may acquire the sensing data every time it is output.
  • In step S102, the CPU 71 of the HM device 3 causes the communication unit 11 to execute a process of transmitting the acquired IMU data to the distributor terminal 2.
  • If it is determined in step S111 that it is not the time to acquire IMU data, the CPU 71 of the HM device 3 proceeds to the process in step S112 without executing the processes in step S101 and step S102.
  • In step S112, the CPU 71 of the HM device 3 determines whether or not the sound data acquired on the viewer side has been received. If it is determined that no sound data has been received, the CPU 71 of the HM device 3 returns to the process of step S111.
  • If it is determined that the sound data has been received, the CPU 71 of the HM device 3 reproduces the sound by generating an audio signal from the received sound data and supplying it to the speaker in step S104.
  • After the process in step S104, the CPU 71 of the HM device 3 returns to the process in step S111. That is, the CPU 71 of the HM device 3 repeatedly executes the determination process in step S111 and the determination process in step S112, and executes the corresponding process when a condition is satisfied.
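  • The HM device loop of FIG. 8 can be pictured with the following sketch, in which `imu`, `comm`, and `audio_out` are hypothetical stand-ins for the IMU 9, the communication unit 11, and the speaker of the output unit 10; the batching interval and sleep value are assumptions.

```python
import time

def hm_device_loop(imu, comm, audio_out, batch_interval_s=0.5):
    """Sketch of the repeated determinations in steps S111/S112 of FIG. 8."""
    pending, last_sent = [], time.monotonic()
    while True:
        pending.append(imu.read())                    # gather a sensing sample
        now = time.monotonic()
        if now - last_sent >= batch_interval_s:       # S111: time to acquire/transmit?
            comm.send_imu_data(pending)               # S101/S102: batch transmission
            pending, last_sent = [], now
        sound = comm.poll_viewer_sound()              # S112: viewer-side sound received?
        if sound is not None:
            audio_out.play(sound)                     # S104: reproduce the audio
        time.sleep(0.005)                             # pacing; value is illustrative
```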
  • FIG. 9 is an example of a process executed by the CPU 71 of the distributor terminal 2. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S211, the CPU 71 of the distributor terminal 2 determines whether or not IMU data has been received from the HM device 3.
  • If it is determined that the IMU data has been received, the CPU 71 of the distributor terminal 2 acquires physical information about the distributor based on the received IMU data in step S203, acquires the audio data and video data on the distributor side from the microphone, camera unit, etc. in step S201, and transmits the audio data, video data, and physical information on the distributor side to the viewer terminal 4 in step S204.
  • If it is determined that the IMU data has not been received, the CPU 71 of the distributor terminal 2 proceeds to step S212 without executing the processes of steps S203, S201, and S204.
  • In this example, when IMU data is acquired, the IMU data is transmitted to the viewer terminal 4 together with sound data and video data, but the acquisition and transmission of IMU data, sound data, and video data may be performed independently. That is, in a certain transmission process only IMU data may be transmitted, and in another transmission process only video data may be transmitted.
  • In step S212, the CPU 71 of the distributor terminal 2 determines whether or not the audio data and video data from the viewer side have been received. If it is determined that the audio data and video data have not been received, the CPU 71 of the distributor terminal 2 returns to the process of step S211.
  • If it is determined that they have been received, the CPU 71 of the distributor terminal 2 performs a process of transmitting the sound data to the HM device 3 in step S206.
  • the process of step S104 shown in FIG. 8 is executed in the HM device 3, so that the distributor can listen to the sound acquired on the viewer side.
  • In step S207, the CPU 71 of the distributor terminal 2 outputs the video data acquired on the viewer side. This allows the distributor to view the video shot by the viewer.
  • After the process in step S207, the CPU 71 of the distributor terminal 2 returns to the process in step S211. That is, the CPU 71 of the distributor terminal 2 repeatedly executes the determination process in step S211 and the determination process in step S212, and executes the corresponding process when a condition is satisfied.
  • Note that, in step S212, it may be determined whether at least one of the audio data and the video data has been received. In this case, if it is determined that the audio data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S206, and if it is determined that the video data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S207.
  • FIG. 10 is an example of processing executed by the CPU 71 of the viewer terminal 4. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S321, the CPU 71 of the viewer terminal 4 determines whether or not the audio data, video data, and physical information from the distributor have been received.
  • If it is determined that they have not been received, the CPU 71 of the viewer terminal 4 proceeds to the process of step S301.
  • If it is determined that they have been received, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the received physical information about the distributor in step S303.
  • In step S304, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation on the received video data from the distributor and generates a three-dimensional space.
  • In step S305, the CPU 71 of the viewer terminal 4 sets the viewing position at a predetermined position in the generated three-dimensional space.
  • In step S306, the CPU 71 of the viewer terminal 4 identifies placeable positions in the generated three-dimensional space.
  • In step S307, the CPU 71 of the viewer terminal 4 places the avatar AT at a placeable position.
  • In step S308, the CPU 71 of the viewer terminal 4 generates a rendered video based on the view from the viewing position by performing rendering processing.
  • In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an audio signal from the sound data received from the distributor terminal 2.
  • In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering process in step S308.
  • The processes from step S303 to step S310 are a series of processes performed in response to receiving information from the distributor terminal 2.
  • The CPU 71 of the viewer terminal 4 acquires the audio data and video data on the viewer side from the microphone, camera, etc. in step S301, and transmits the audio data and video data on the viewer side to the distributor terminal 2 in step S311.
  • In this way, the CPU 71 of the viewer terminal 4 transmits the audio data and video data on the viewer side to the distributor terminal 2 by executing the processes of steps S301 and S311, and executes each process from step S303 to step S310 as appropriate depending on the result of the determination in step S321.
  • The information processing system 1A according to the second embodiment differs from the information processing system 1 according to the first embodiment in that the position of the avatar AT can be changed and sound output is performed according to the position of the avatar AT.
  • FIG. 11 shows an example of the configuration of the information processing system 1A. Note that the same components as the information processing system 1 shown in FIG. 1 are designated by the same reference numerals, and description thereof will be omitted as appropriate.
  • the information processing system 1A includes a distributor terminal 2A and an HM device 3 as devices on the distributor side, and a viewer terminal 4A as a device on the viewer side.
  • the configuration of the HM device 3 is the same as that of the first embodiment, so a description thereof will be omitted.
  • the distributor terminal 2A includes an input section 5, a physical information acquisition section 6, an output section 7, and a communication section 8, as in the first embodiment.
  • the input unit 5 outputs first change information to the communication unit 8 in response to the distributor's operation to change the position of the avatar AT. That is, the first change information is change information regarding the position of the avatar AT.
  • the first change information is transmitted to the viewer terminal 4A via the communication unit 8.
  • The viewer terminal 4A includes an input section 12, an avatar generation section 13, a three-dimensional space generation section 14, a display video generation section 15, an output section 16, a communication section 17, an avatar position control section 18, and an acoustic signal generation section 19.
  • the first change information received from the distributor terminal 2A is provided to the avatar position control unit 18 via the communication unit 17.
  • the instruction to change the position of the avatar AT can also be made by the viewer's operation.
  • the viewer's operation for changing the position of the avatar AT is also supplied to the avatar position control unit 18 via the input unit 12 as first change information.
  • the avatar position control unit 18 changes the position of the avatar AT based on the first change information supplied via the communication unit 17 or via the input unit 12.
  • the changed position of avatar AT is supplied to the three-dimensional space generation unit 14 as avatar AT placement information.
  • the three-dimensional space generation unit 14 performs three-dimensional estimation and generates a three-dimensional space by performing image recognition using the video data supplied from the distributor terminal 2A via the communication unit 17.
  • the three-dimensional space generation unit 14 specifies a position where the avatar model supplied from the avatar generation unit 13 can be placed, and places the avatar AT.
  • the three-dimensional space generation unit 14 then adjusts the position of the placed avatar AT based on the placement information supplied from the avatar position control unit 18.
  • the three-dimensional space generation unit 14 may adjust the position of the changed avatar AT so that spatial consistency is ensured. For example, if the new position of the avatar AT based on the placement information is not a positionable position, the position closest to the new position may be determined as the adjusted position.
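  • A minimal sketch of that adjustment, assuming the placeable positions are available as 3D points: if the requested position is not placeable, snap to the nearest placeable position.

```python
import numpy as np

def adjust_avatar_position(requested, placeable_positions):
    """Sketch: return the requested position if it is already a placeable
    position, otherwise the placeable position closest to it."""
    requested = np.asarray(requested, dtype=float)
    placeable = np.asarray(placeable_positions, dtype=float).reshape(-1, 3)
    for p in placeable:
        if np.allclose(requested, p):
            return requested                     # already spatially consistent
    distances = np.linalg.norm(placeable - requested, axis=1)
    return placeable[int(np.argmin(distances))]  # nearest placeable position
```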
  • At this time, the avatar generation unit 13 may add a walking animation, a running animation, or the like to the avatar AT.
  • the 3D space model in which each part is arranged by the 3D space generation unit 14 is supplied not only to the display image generation unit 15 but also to the audio signal generation unit 19.
  • the acoustic signal generation unit 19 performs processing to generate an acoustic signal in which a sound image of the sound data acquired at the distributor terminal 2A is localized to each object arranged in a three-dimensional space.
  • the generated acoustic signal is input to the output unit 16 as rendered sound.
  • the sound data related to the voice uttered is sent from the distributor terminal 2A to the viewer terminal 4A.
  • The acoustic signal generation unit 19 receives the sound data related to the uttered voice via the communication unit 17, identifies the position of the avatar AT in the 3D space model generated by the three-dimensional space generation unit 14, and localizes the sound image of the uttered voice at the placement position of the avatar AT according to the positional relationship between that position and the viewing position.
  • For example, when the avatar AT is located to the left of the viewing position, the sound output from the left speaker 16L serving as the output unit 16 is made louder than that from the right speaker 16R, so that the sound image of the distributor's uttered voice is localized on the left side.
  • By playing back the acoustic signal obtained in this way in stereo from the speakers serving as the output unit 16, the viewer can experience an environment in which the voice uttered by the distributor is naturally heard from the position where the avatar AT is placed.
  • The system is not limited to stereo playback, and may be configured to perform multi-channel playback such as 5.1ch so that the voice uttered by the distributor can be heard three-dimensionally.
  • the audio signal generation unit 19 can make the viewer perceive a sense of depth by generating an audio signal with delay, reverberation, etc. added.
  • the audio signal generation unit 19 may adjust the volume of the audio signal depending on the distance between the avatar AT and the viewing position. That is, the sound signal may be generated such that the closer the viewing position is to the avatar AT, the higher the volume.
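  • Putting the two ideas together (localization toward the avatar and distance-dependent volume), a simple stereo sketch might look like the following; the panning law, gain curve, and coordinate convention are illustrative assumptions rather than the acoustic processing defined in this publication.

```python
import numpy as np

def localize_distributor_voice(mono, avatar_pos, viewing_pos, viewing_forward):
    """Sketch: attenuate the distributor's voice with distance and pan it toward
    the avatar so the sound image is perceived at the avatar's position."""
    mono = np.asarray(mono, dtype=float)
    offset = np.asarray(avatar_pos, dtype=float) - np.asarray(viewing_pos, dtype=float)
    distance = max(float(np.linalg.norm(offset)), 1e-6)
    gain = 1.0 / max(distance, 1.0)              # louder when the avatar is closer

    forward = np.asarray(viewing_forward, dtype=float)
    right = np.array([forward[2], 0.0, -forward[0]])       # horizontal right vector (y-up)
    right = right / (np.linalg.norm(right) + 1e-9)
    pan = float(np.clip(np.dot(offset, right) / distance, -1.0, 1.0))  # -1 left .. +1 right

    left_ch = mono * gain * (1.0 - pan) * 0.5
    right_ch = mono * gain * (1.0 + pan) * 0.5
    return np.stack([left_ch, right_ch], axis=0)  # simple 2-channel (stereo) signal
```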
  • If the avatar AT is located outside the field of view from the viewing position, the sound image of the uttered voice is localized outside the field of view, thereby allowing the viewer to perceive that the distributor is located outside the field of view.
  • The sound to be localized at the placement position of the avatar AT includes not only the voice uttered by the distributor but also any sound generated by the distributor, such as the sound of the distributor clapping his or her hands.
  • FIG. 14 shows an example of the process executed by the distributor terminal 2A in this embodiment.
  • FIG. 15 shows an example of the process executed by the viewer terminal 4A. Note that the processing executed by the HM device 3 is the same as that shown in FIG. 8 described in the first embodiment, so the description thereof will be omitted.
  • In step S211 of FIG. 14, the CPU 71 of the distributor terminal 2A determines whether or not IMU data has been received, and when it is determined that IMU data has been received, executes the processes of steps S203, S201, and S204 so that the physical information based on the IMU data, the sound data, and the video data are transmitted to the viewer terminal 4A.
  • the CPU 71 of the distributor terminal 2A proceeds to step S221, and determines whether or not the first change information has been input.
  • the first change information is operation information for changing the position of the avatar AT.
  • If it is determined that the first change information has been input, the CPU 71 of the distributor terminal 2A transmits the first change information to the viewer terminal 4A in step S222. As a result, the first change information is transmitted to the viewer terminal 4A via the communication unit 8.
  • In step S212, the CPU 71 of the distributor terminal 2A determines whether or not audio data and video data have been received from the viewer terminal 4A, and if so, executes the corresponding processing in steps S206 and S207.
  • In step S321, the CPU 71 of the viewer terminal 4A determines whether at least part of the audio data, video data, and physical information has been received from the distributor terminal 2A.
  • If it is determined that it has been received, the CPU 71 of the viewer terminal 4A generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
  • the CPU 71 of the viewer terminal 4A specifies a placement possible position based on the three-dimensional estimation result in step S306, and places the avatar AT at the placement possible position in step S307.
  • In step S331, the CPU 71 of the viewer terminal 4A determines whether or not the first change information has been received. In this determination process, it is determined that the first change information has been received not only when the first change information is received from the distributor terminal 2A but also when the first change information is received from the input unit 12.
  • If it is determined that the first change information has been received, the CPU 71 of the viewer terminal 4A performs a process of changing the placement position of the avatar AT in step S332, and proceeds to the process of step S308. At this time, it may be determined whether or not the changed placement position is a placeable position, and if it is not a placeable position, a process may be performed to adjust the new placement position of the avatar AT.
  • On the other hand, if it is determined in step S331 that the first change information has not been received, the CPU 71 of the viewer terminal 4A proceeds to the process in step S308 without executing the process in step S332. Description of each process after step S308 will be omitted.
  • Note that the CPU 71 of the viewer terminal 4A may perform the determination process in step S331 without arranging the avatar AT in step S307. Specifically, if it is determined in step S331 that the first change information has not been received, processing is performed in step S332 to place the avatar AT at a placeable position, and if it is determined that the first change information has been received, the avatar AT may be placed in step S332 after taking the received first change information into account.
  • the information processing system 1B in the third embodiment differs from the previous examples in that the viewing position can be changed by the viewer.
  • An example of the configuration of the information processing system 1B will be described with reference to FIG. 16. Note that the same components as the information processing system 1 shown in FIG. 1 are designated by the same reference numerals, and description thereof will be omitted as appropriate.
  • the information processing system 1B includes a distributor terminal 2, an HM device 3, and a viewer terminal 4B.
  • the configurations of the distributor terminal 2 and the HM device 3 are the same as those in the first embodiment, so a description thereof will be omitted.
  • the viewer terminal 4B includes an input section 12, an avatar generation section 13, a three-dimensional space generation section 14, a display video generation section 15, an output section 16, and a communication section 17.
  • the input unit 12 receives an operation to change the viewing position by the viewer, and supplies information about the change operation to the three-dimensional space generation unit 14 as second change information.
  • the three-dimensional space generation unit 14 sets the viewing position, that is, the position of the virtual camera, in the three-dimensional space generated by three-dimensional estimation, taking into account the second change information.
  • the viewer can move the position of the virtual camera by his or her own operations.
  • the viewer can feel as if they are moving freely in the three-dimensional space to some extent, and can have an experience as if they were moving around the filming location with the broadcaster.
  • the position of the virtual camera may be set within a predetermined range centered on the position of the camera unit at the time of shooting. Therefore, it is possible to provide the viewer with a video in which spatial consistency is ensured to some extent.
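  • A sketch of that constraint, assuming the allowed range is a sphere of radius `max_offset_m` around the camera position at the time of shooting (the radius value is an assumption for illustration):

```python
import numpy as np

def constrain_viewing_position(requested, capture_position, max_offset_m=1.5):
    """Sketch: clamp the viewer-requested viewing position into a predetermined
    range centered on the shooting camera position, to keep rough spatial
    consistency of the generated three-dimensional space."""
    requested = np.asarray(requested, dtype=float)
    capture_position = np.asarray(capture_position, dtype=float)
    offset = requested - capture_position
    distance = float(np.linalg.norm(offset))
    if distance <= max_offset_m:
        return requested
    return capture_position + offset * (max_offset_m / distance)
```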
  • FIG. 17 shows an example of processing executed by the viewer terminal 4B in this embodiment. Note that the same steps as those in the first embodiment are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S321, the CPU 71 of the viewer terminal 4B determines whether or not the audio data, video data, and physical information from the distributor have been received.
  • If it is determined that they have been received, the CPU 71 of the viewer terminal 4B generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
  • the CPU 71 of the viewer terminal 4B determines whether or not the second change information has been received in step S341. If it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B performs a process of changing the viewing position in step S342.
  • the viewing position can be changed according to the viewer's operations, allowing the viewer to experience the feeling of moving around freely in a three-dimensional space.
  • Note that the CPU 71 of the viewer terminal 4B may execute the process of step S341 without executing the process of step S305. In this case, when it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B sets the viewing position in step S342 in consideration of the second change information, and when it is determined that the second change information has not been received, the viewing position may be set in step S342 in the same manner as in the previous embodiments.
  • In this case, the orientation of the avatar AT in the three-dimensional space does not strictly match the orientation of the distributor in real space, but the two orientations can be said to be in agreement in that both the body and the face are directed toward the object X that is being gazed at.
  • In other words, rather than exactly matching the orientation of the distributor, the orientation of the avatar AT may be adjusted in order to reproduce the situation where the avatar AT is directly facing the object X.
  • the viewer terminal 4 (4A) identifies the object X that the distributor is gazing at based on the distributor's physical information. Thereby, the avatar AT can be arranged so that its body or face faces the direction in which the object X exists.
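  • A minimal sketch of that arrangement: compute the yaw that turns the avatar toward the identified object X and reuse it as the avatar's forward direction (same axis convention as the earlier placement sketch; the helper name is hypothetical).

```python
import numpy as np

def yaw_toward_object(avatar_position, object_position):
    """Sketch: yaw angle (about the vertical y-axis) that makes the avatar's
    body or face point toward the gazed object X."""
    d = np.asarray(object_position, dtype=float) - np.asarray(avatar_position, dtype=float)
    return float(np.arctan2(d[0], d[2]))
```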
  • The avatar AT may take any form. For example, as explained in each of the above examples, it may be a three-dimensional object with an image of the distributor pasted as a texture, or, as shown in FIG. 19, it may be a three-dimensional object with a texture of some kind of character (a giant panda in FIG. 19).
  • the avatar AT may be a three-dimensional object having only an outline, as shown in FIG. 20, so that the scenery is not hidden by the avatar AT as much as possible. This allows the viewer to enjoy the photographed scenery even more.
  • In each of the above examples, the three-dimensional space generation unit 14 generates the three-dimensional space by performing image recognition processing on the photographed images, but the three-dimensional space may be generated using other methods. For example, information on 3D objects at the shooting location may be acquired from an external server device that provides a map service, the captured image and the 3D objects may be aligned, and then partial images cut out from the captured image may be pasted onto the 3D objects as textures to generate the three-dimensional space.
  • As described above, the viewer terminals 4 (4A, 4B) as information processing devices include an avatar generation unit 13 that generates an avatar AT of the photographer by reflecting the physical information of the photographer (the above-mentioned distributor), a three-dimensional space generation unit 14 that generates information on a three-dimensional space (for example, a 3D space model) from a photographed image and arranges the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a display video generation unit 15 that generates, as a display video, a video from a viewing position set in the three-dimensional space.
  • For example, when the photographer distributes video while photographing himself or herself, the viewer viewing the video sees a video that appears to face the photographer and move backwards in the photographer's direction of movement. This makes it difficult to feel as if one is walking around the shooting location with the photographer. Furthermore, when the photographer performs distribution while photographing the moving direction, the photographer is not reflected in the angle of view, and again the viewer does not feel as if he or she is walking around the shooting location together. Therefore, the viewer terminals 4 (4A, 4B) as information processing devices in the present technology generate an avatar AT that reflects the physical information of the photographer, place it in the three-dimensional space generated from the photographed image, and present the resulting video to the viewer.
  • the orientation of the placed avatar AT corresponds to the orientation of the photographer.
  • the avatar AT is arranged so that the orientation of the photographer matches the orientation of the avatar AT. This allows the viewer to feel as if he or she were walking around the shooting location together with the photographer. Therefore, the viewer can, for example, obtain a sense of unity as if traveling together while staying at home, and feelings of alienation can be reduced.
  • since the photographer's physical information, for example the photographer's posture and gestures, is reflected in the avatar AT, the viewer can naturally understand when to call out to the photographer, which contributes to smooth communication.
  • the viewer terminal 4 (4A, 4B) has been described as including the avatar generation unit 13, the three-dimensional space generation unit 14, and the display video generation unit 15, but other configurations are possible.
  • the distributor terminal 2 (2A) of the information processing system 1 (1A, 1B) may include each of these units, or an information processing device serving as a server device included in the information processing system 1 may include them. That is, the above-described configuration may be realized as a form of cloud computing. Note that the orientation of the avatar AT and the orientation of the photographer do not need to match completely.
  • the orientation of the photographer and the orientation of the avatar AT may be made to match only approximately. Even in this mode, the viewer can feel as if he or she were traveling together with the photographer.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the arrangement so that the body orientation of the photographer (distributor) and the body orientation of the avatar AT match. Thereby, the moving direction of the photographer can be made to match the moving direction of the avatar AT, and the avatar AT can be placed in the photographed video without causing any discomfort.
  • matching of orientations may also mean facing in the direction of a specific object.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the arrangement so that the orientation of the photographer's face and the orientation of the avatar AT's face match (see FIG. 18). Thereby, the object that the photographer is looking at can be matched with the object lying ahead of the direction of the avatar AT's face. Therefore, the viewer can appropriately understand the object in which the photographer has shown interest, and smooth communication can be achieved.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may also perform the arrangement so that the direction of the photographer's line of sight and the direction of the avatar AT's line of sight match. This allows the viewer to grasp more accurately the object that the photographer is looking at, and facilitates smooth communication. That is, if the viewer knows the object that the photographer is gazing at, the viewer can start a conversation about that object, and since the photographer is gazing at it, the conversation can be expected to develop.
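As one possible reading of "arranging the avatar so that it faces the gazed object X", the short sketch below computes a yaw angle for the avatar from its placement position and the position of the identified object X; the coordinate convention (Y up, yaw measured from +Z) is an assumption made for illustration only.

```python
import math

def yaw_towards(avatar_pos, target_pos):
    """Yaw angle (radians) that makes an avatar at avatar_pos face target_pos on the ground plane."""
    dx = target_pos[0] - avatar_pos[0]
    dz = target_pos[2] - avatar_pos[2]
    return math.atan2(dx, dz)   # 0 rad = facing +Z in this convention

# Example: the avatar stands at the origin and the gazed object X is to its front-right.
avatar_position = (0.0, 0.0, 0.0)
object_x_position = (3.0, 0.0, 3.0)
print(math.degrees(yaw_towards(avatar_position, object_x_position)))  # 45.0
```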
  • the viewing position may be set at a position different from the position of the camera unit of the distributor terminal 2 (2A). That is, the viewing position can be set at an arbitrary position in the generated three-dimensional space that differs from the camera position at the time of photographing. As a result, the viewer can freely move the viewing position according to his or her own will, which gives the viewer the feeling of freely moving around the photographer (distributor). Therefore, a stronger sense of being together with the photographer can be obtained.
  • the viewing position set by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the displayed video. By freely moving the viewing position according to their own will, viewers can feel that they are freely moving around the photographer (distributor).
  • the physical information received by the viewer terminals 4 (4A, 4B) may include the moving speed of the photographer (distributor), and the avatar generation unit 13 of the viewer terminals 4 (4A, 4B) may generate the avatar AT by reflecting that moving speed. Thereby, an avatar AT appropriate to the movement in the video can be displayed. Specifically, when the background in the video is moving quickly, a running avatar AT is displayed, and when the background movement has stopped, a standing avatar AT is displayed.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may specify, based on the result of image recognition for the captured image, a position in the three-dimensional space where the avatar AT can be placed as a placeable position, and may place the avatar AT at that placeable position. This allows the avatar AT to be placed in a natural position. That is, it is possible to provide the viewer with a video in which spatial consistency is ensured.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) need not place the avatar AT within the viewing angle of the viewing position when a placeable position cannot be specified. For example, if there is no space in the three-dimensional space suitable for arranging the avatar AT within the viewing angle from the viewing position, the avatar AT is not placed within the viewing angle of the viewing position, that is, at a position visible to the viewer. Thereby, the viewer is prevented from seeing an avatar AT placed in an unnatural position.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may determine that a placeable position cannot be specified when there is no surface facing vertically upward within the viewing angle from the viewing position. When no appropriate surface such as the ground exists, not arranging the avatar AT within the angle of view prevents the avatar AT from being placed unnaturally.
  • when the three-dimensional space generation unit 14 has determined not to arrange the avatar AT within the viewing angle from the viewing position, the display video generation unit 15 of the viewer terminal 4 (4A, 4B) may display the avatar AT at a predetermined position (for example, the upper right corner) on the display screen.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may place the avatar AT outside the viewing angle of the viewing position when a placeable position cannot be specified.
  • by arranging the avatar AT outside the viewing angle, the avatar AT need not be presented to the viewer in an unnatural position.
  • if a sound image such as the voice emitted by the photographer (distributor) is localized at the position of the avatar AT placed outside the field of view, the viewer can naturally perceive that the photographer's avatar AT is at a position that the viewer cannot see.
  • the avatar AT generated at the viewer terminals 4 (4A, 4B) may be based on an image of the photographer (distributor). Thereby, the viewer can accept the photographer's avatar AT without feeling uncomfortable.
  • the viewer terminals 4 (4A, 4B) may include an acoustic signal generation unit 19 that generates an acoustic signal in which the sound image of the sound emitted by the photographer (distributor) is localized at the position of the avatar AT in the three-dimensional space. As a result, for example, the voice uttered by the photographer is heard from the position of the avatar AT, so that the viewer can comfortably accept that the photographer is present at the position of the avatar AT.
  • the acoustic signal generation unit 19 of the viewer terminal 4 (4A, 4B) may generate the acoustic signal so that its volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar AT. Thereby, an appropriate acoustic signal can be generated according to the distance from the avatar AT, and the viewer can perceive the voice of the photographer (distributor) without feeling any discomfort.
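A minimal sketch of such a distance-dependent volume, assuming a simple inverse-distance attenuation law (the actual acoustic signal generation unit 19 is not specified to this level of detail):

```python
import math

def gain_for_distance(viewing_pos, avatar_pos, reference_distance=1.0, min_gain=0.05):
    """Attenuate the distributor's voice with distance from the viewing position (inverse-distance law)."""
    d = math.dist(viewing_pos, avatar_pos)
    gain = reference_distance / max(d, reference_distance)
    return max(gain, min_gain)

for d in (0.5, 2.0, 8.0):
    print(d, round(gain_for_distance((0, 0, 0), (d, 0, 0)), 3))
```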
  • the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the displayed video. Thereby, the viewer can move away from or approach the photographer (distributor) in the three-dimensional space.
  • the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may also be changeable by an operation of the photographer (distributor). This allows the photographer (distributor) to move away from or approach the viewer in the three-dimensional space. A distance between the photographer and the viewer that changes from time to time rather than remaining constant is preferable, as it gives the viewer the feeling of actually walking through the shooting location.
  • the avatar generation unit 13 of the viewer terminal 4 (4A, 4B) may perform the process of reflecting the physical information on the avatar AT multiple times along the time direction. This makes it possible to reflect the behavior of the photographer (distributor) on the avatar AT periodically.
  • the information processing method in the embodiments is a method in which an information processing device executes a process of generating an avatar AT of the photographer by reflecting the physical information of the photographer (distributor), a process of generating three-dimensional space information from the photographed image and arranging the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a process of generating a video from a viewing position set in the three-dimensional space as a display video.
  • the program in the embodiments is a program that causes a computer device to execute a function of generating an avatar AT of the photographer by reflecting the physical information of the photographer (distributor), a function of generating three-dimensional space information from the photographed image and arranging the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a function of generating a video from a viewing position set in the three-dimensional space as a display video. That is, the computer device is caused to execute at least a part of each process shown in FIGS. 7, 10, 15, and 17. Such an information processing method and program can also provide the same operations and effects as those of the viewer terminals 4 (4A, 4B) according to the above-described embodiments and modifications.
  • <This technology>
(1) An information processing device comprising: an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer; a three-dimensional space generation unit that generates three-dimensional space information from a photographed image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing; and a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
(2) The information processing device according to (1) above, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's body and the orientation of the avatar's body match.
(7) The information processing device according to any one of (1) to (6) above, wherein the physical information includes a moving speed of the photographer, and the avatar generation unit generates the avatar by reflecting the moving speed.
(8) The information processing device according to any one of (1) to (7) above, wherein the three-dimensional space generation unit specifies, based on the result of image recognition for the captured image, a position in the three-dimensional space where the avatar can be placed, and places the avatar at that placeable position.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An information processing device according to the present technology comprises: an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer; a three-dimensional space generation unit that generates information about a three-dimensional space from a photographed image and arranges the avatar in the three-dimensional space in accordance with the orientation of the photographer at the time of photographing; and a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.

Description

Information processing device, information processing method, and program
The present technology relates to an information processing device, an information processing method, and a program for communicating between users in remote locations.
There is a desire for smooth communication between users in remote locations. To meet this desire, Patent Document 1 below discloses a technology that promotes communication such as natural conversation by presenting to the user not only ordinary voice such as conversation but also additional information corresponding to the context (state, situation) of its content.
Japanese Patent Application Publication No. 2021-071632
Sometimes a user who is at a travel destination and a user who stays at home communicate with each other so that the stay-at-home user, who cannot leave the house because of illness or old age, can feel as if he or she were traveling with someone else.
In such a case, the user at the travel destination carries a mobile terminal or the like and walks around while photographing the scenery, and enjoys conversation with the stay-at-home user while sharing the captured images.
However, when sharing video in which the scenery is shot from the photographer's point of view, the photographer never appears within the angle of view, so it is difficult to share the feeling of traveling together with the photographer.
Also, when the photographer shares a self-shot video that captures the scenery behind him or her within the angle of view, the moving direction of the photographer and the moving direction experienced by the stay-at-home user are opposite, so it is again difficult to share the feeling of traveling together with the photographer.
The present technology has been made in view of such problems, and aims to provide an experience of moving around the shooting location together with the photographer while being in a remote location.
An information processing device according to the present technology includes an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer, a three-dimensional space generation unit that generates three-dimensional space information from a photographed image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
This allows the person at home (the viewer) to feel as if he or she were moving around the shooting location together with the photographer.
An information processing method according to the present technology is an information processing method in which an information processing device executes a process of generating an avatar of a photographer by reflecting physical information of the photographer, a process of generating three-dimensional space information from a photographed image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
A program according to the present technology is a program that causes a computer device to execute a function of generating an avatar of a photographer by reflecting physical information of the photographer, a function of generating three-dimensional space information from a photographed image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
Such an information processing method and program also provide the same effects as the information processing device according to the present technology described above.
FIG. 1 is a block diagram showing a configuration example of an information processing system according to a first embodiment of the present technology.
FIG. 2 is an explanatory diagram showing the appearance of a distributor terminal and an HM device.
FIG. 3 is a diagram showing an example of an image presented to a distributor.
FIG. 4 is a diagram showing an example of an image presented to a viewer.
FIG. 5 is a diagram showing another example of an image presented to a distributor.
FIG. 6 is a block diagram of a computer device.
FIG. 7 is a diagram showing the flow of processing executed by each device of the information processing system.
FIG. 8 is a flowchart of an example of processing executed in the HM device.
FIG. 9 is a flowchart of an example of processing executed at the distributor terminal.
FIG. 10 is a flowchart of an example of processing executed at the viewer terminal.
FIG. 11 is a block diagram showing a configuration example of an information processing system according to a second embodiment.
FIG. 12 is a diagram for explaining localization of a sound image according to the position of an avatar.
FIG. 13 is an explanatory diagram of an example in which the volume of sound is changed according to the distance from the avatar.
FIG. 14 is a flowchart of an example of processing executed at the distributor terminal in the second embodiment.
FIG. 15 is a flowchart of an example of processing executed at the viewer terminal in the second embodiment.
FIG. 16 is a block diagram showing a configuration example of an information processing system according to a third embodiment.
FIG. 17 is a flowchart of an example of processing executed at the viewer terminal in the third embodiment.
FIG. 18 is an explanatory diagram of matching between the orientation of the distributor and the orientation of the avatar.
FIG. 19 is a diagram showing an example of an avatar.
FIG. 20 is a diagram showing another example of an avatar.
Hereinafter, embodiments of the present technology will be described in the following order with reference to the accompanying drawings.
<1. First embodiment>
<1-1. Information processing system configuration>
<1-2. Computer equipment>
<1-3. Processing flow>
<2. Second embodiment>
<3. Third embodiment>
<4. Modified example>
<5. Summary>
<6. This technology>
<1. First embodiment>
<1-1. Information processing system configuration>
A configuration example of the information processing system 1 in the first embodiment will be described with reference to the attached drawings.
The information processing system 1 is a system for facilitating smooth communication between users located in separate places. The information processing system 1 is also used so that one user can view, while at home or the like, video that another user shoots while moving.
In the following description, the user (photographer) who shoots video while moving is referred to as the "distributor", and the user who views the video shot by the distributor is referred to as the "viewer".
The information processing system 1 provides the viewer with an experience as if the viewer were moving through the shooting location together with the distributor.
The information processing system 1 includes a distributor terminal 2 and an HM (Head-mount) device 3 as distributor-side devices, and further includes a viewer terminal 4 as a viewer-side device (see FIG. 1).
An example of the distributor terminal 2 and the HM device 3 is shown in FIG. 2.
The distributor terminal 2 is, for example, a smartphone, and is a device with which the distributor can shoot video while holding it in hand and moving.
The distributor terminal 2 includes a display unit that can display video from the viewer side and a microphone that can pick up the distributor's voice and environmental sounds.
The HM device 3 is connected to the distributor terminal 2 so that wired or wireless communication is possible, and includes a speaker for reproducing the viewer-side voice and environmental sounds.
Note that the distributor terminal 2 and the HM device 3 may be realized as a single mobile terminal device by integrating the functions of both into a smartphone.
An example of the viewer terminal 4 is shown in FIG. 2.
The viewer terminal 4 is a stationary device, and includes a display unit on which the video shot by the distributor terminal 2 is displayed, a speaker for reproducing the distributor-side voice and environmental sounds, and a microphone for inputting the viewer's voice and the like.
The viewer terminal 4 may also include a controller for performing various operations. Various devices such as a keyboard and a mouse are conceivable as the controller.
Note that the viewer terminal 4 may be configured as a system in which some or all of a computer device, a microphone, a speaker, and an operation device are provided independently.
As shown in FIG. 1, the distributor terminal 2 includes an input unit 5, a physical information acquisition unit 6, an output unit 7, and a communication unit 8.
The HM device 3 includes an IMU (Inertial Measurement Unit) 9, an output unit 10, and a communication unit 11.
The input unit 5 of the distributor terminal 2 includes a microphone to which voice and environmental sounds are input, a camera unit equipped with an image sensor that captures images, and a touch panel and various button operators to which the distributor's operations are input.
The input unit 5 outputs "sound data" such as voice and environmental sounds and "video data" such as still images and moving images captured by the image sensor to the communication unit 8. In the figures, sound data is simply denoted as "sound" and video data as "video".
The physical information acquisition unit 6 obtains the distributor's physical information based on acceleration information and angular velocity information obtained as sensing data from the IMU 9 of the HM device 3.
Here, the distributor's physical information means, for example, the distributor's posture and movements. Specifically, it is information that specifies the direction of the distributor's face, the direction of the line of sight, the moving speed (walking speed), the means of movement (walking, cycling, etc.), the posture, gestures, and so on.
This information may be acquired not only from the IMU 9 of the HM device 3 but also from a camera or the like that photographs the distributor. For example, information specifying the distributor's line of sight, gestures, posture, and the like may be obtained by analyzing images captured by a camera of the HM device 3.
The physical information acquisition unit 6 outputs the physical information obtained from the IMU data to the communication unit 8.
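Purely as an illustration of deriving coarse physical information from IMU data, the sketch below dead-reckons a forward speed and a heading from acceleration and angular-velocity samples; real systems would fuse the sensor data far more carefully, and the field names and thresholds are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ImuSample:
    dt: float        # seconds since the previous sample
    accel: tuple     # (ax, ay, az) linear acceleration in m/s^2, gravity already removed
    gyro_y: float    # angular velocity around the vertical axis in rad/s

def estimate_body_info(samples, initial_speed=0.0, initial_heading=0.0):
    """Very rough dead-reckoning of the distributor's forward speed and heading from IMU samples."""
    speed, heading = initial_speed, initial_heading
    for s in samples:
        speed += s.accel[2] * s.dt          # assume +Z of the sensor roughly points forward
        heading += s.gyro_y * s.dt
    state = "walking" if speed > 0.2 else "standing"
    return {"speed_m_s": speed, "heading_rad": heading, "state": state}

samples = [ImuSample(dt=0.1, accel=(0.0, 0.0, 0.5), gyro_y=0.05) for _ in range(10)]
print(estimate_body_info(samples))
```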
The output unit 7 outputs the viewer-side sound and video received from the viewer terminal 4. The output unit 7 in this example includes, for example, a display unit that outputs the viewer-side video, while the viewer-side sound is output from a speaker serving as the output unit 10 of the HM device 3.
An example of an image displayed on the display unit serving as the output unit 7 is shown in FIG. 3.
As shown in the figure, the video captured by the camera of the viewer terminal 4 is displayed at a large size in the main area Ar1 of the display unit serving as the output unit 7. That is, an image of the viewer sitting on a chair and facing the camera is displayed.
In the upper right corner area Ar2 of the main area Ar1 of the display unit, the video captured by the camera unit of the distributor terminal 2 is displayed. Note that the video displayed in the corner area Ar2 may include an avatar AT, which will be described later.
The communication unit 8 receives IMU data from the HM device 3 and supplies it to the physical information acquisition unit 6. The communication unit 8 also supplies video data received from the viewer terminal 4 to the output unit 7 and transmits sound data received from the viewer terminal 4 to the HM device 3. Furthermore, the communication unit 8 transmits the sound data and video data supplied from the input unit 5 and the physical information supplied from the physical information acquisition unit 6 to the viewer terminal 4.
The IMU 9 of the HM device 3, which includes an acceleration sensor, an angular velocity sensor, and the like, acquires IMU data (acceleration data, angular velocity data, etc.) used to obtain physical information about the distributor and outputs it to the communication unit 11.
The output unit 10 includes a speaker and performs acoustic output by reproducing the acoustic data received from the distributor terminal 2 as sound data.
The communication unit 11 transmits the IMU data supplied from the IMU 9 to the distributor terminal 2 and receives sound data from the distributor terminal 2.
As shown in FIG. 1, the viewer terminal 4 includes an input unit 12, an avatar generation unit 13, a three-dimensional space generation unit 14, a display video generation unit 15, an output unit 16, and a communication unit 17.
The input unit 12 includes a microphone, a camera, various operators, and the like. The input unit 12 outputs sound data and video data to the communication unit 17.
The avatar generation unit 13 generates an avatar AT of the distributor based on the physical information received from the distributor terminal 2. The avatar AT is, for example, a three-dimensional object.
The avatar AT generated by the avatar generation unit 13 may be based on a photographed image of the distributor, or may be based on a specific character selected by the distributor or the viewer.
At least a part of the avatar AT may be transparent or semi-transparent, or the avatar AT may be primitive to the extent that it can still be recognized as a person.
The three-dimensional object serving as the avatar AT generated by the avatar generation unit 13 is supplied to the three-dimensional space generation unit 14 as an avatar model.
Note that the avatar generation unit 13 reflects the physical information about the distributor on the avatar AT.
Specifically, the direction of the distributor's face, the direction of the line of sight, the shapes of the joints, and the like are reflected in the posture of the avatar AT as physical information. In addition, the distributor's gestures, walking or stopped state, walking speed, and the like are reflected as the movements of the avatar AT. The orientation of the distributor's body is taken into account at the time of placement.
By generating an avatar AT that reflects the distributor's posture and movements in this way, the viewer can be made to perceive the distributor's situation and so on.
Instead of reflecting the physical information directly on the avatar AT, the avatar generation unit 13 may provide the three-dimensional space generation unit 14 with information such as the posture and orientation that the avatar AT should take, as additional information of the avatar model.
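A minimal sketch of "reflecting physical information on the avatar model", assuming a hypothetical AvatarState structure and dictionary keys; body orientation is intentionally left out because, as noted above, it is handled at placement time.

```python
from dataclasses import dataclass

@dataclass
class AvatarState:
    face_yaw: float = 0.0   # direction of the face (radians)
    gaze_yaw: float = 0.0   # direction of the line of sight (radians)
    pose: str = "standing"
    gesture: str = None

def reflect_physical_info(avatar, physical_info):
    """Copy received physical information onto the avatar model; body orientation is applied at placement."""
    avatar.face_yaw = physical_info.get("face_yaw", avatar.face_yaw)
    avatar.gaze_yaw = physical_info.get("gaze_yaw", avatar.gaze_yaw)
    avatar.pose = physical_info.get("pose", avatar.pose)
    avatar.gesture = physical_info.get("gesture")
    return avatar

print(reflect_physical_info(AvatarState(), {"face_yaw": 0.3, "pose": "walking", "gesture": "wave"}))
```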
The avatar generation unit 13 may give the distributor's movements to the avatar AT as animations.
The animation given to the avatar AT of the distributor may be determined according to the distributor's moving speed and mode of movement (walking, cycling, etc.).
For example, a walking animation may be given when the distributor is walking, and a running animation may be given when the distributor is running.
If it is estimated that the distributor is moving by bicycle, the avatar AT may be given an animation of riding a bicycle.
Furthermore, when a gesture by the distributor is received as physical information, the avatar generation unit 13 may animate the avatar AT so that it performs that gesture.
These various kinds of information are transmitted from the distributor terminal 2 as physical information, as described above.
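The choice of animation from the moving speed could look roughly like the following sketch; the speed thresholds and animation names are illustrative assumptions, not values taken from the disclosure.

```python
def select_animation(speed_m_per_s):
    """Pick an avatar animation from the distributor's movement speed (thresholds are illustrative)."""
    if speed_m_per_s < 0.2:
        return "standing"
    if speed_m_per_s < 2.5:
        return "walking"
    if speed_m_per_s < 6.0:
        return "running"
    return "cycling"        # speeds above running pace are treated as riding a bicycle here

for v in (0.0, 1.3, 3.5, 8.0):
    print(v, "->", select_animation(v))
```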
The avatar generation unit 13 performs the process of reflecting the physical information on the avatar AT multiple times while the distributor terminal 2 and the viewer terminal 4 are communicating. For example, the physical information may be reflected on the avatar AT periodically, such as once per second, or it may be reflected on the avatar AT whenever a change occurs in the distributor's posture, orientation, movement, or the like.
The three-dimensional space generation unit 14 generates a three-dimensional space by performing image recognition (three-dimensional estimation processing) on the video (photographed images) received from the distributor terminal 2 and identifying image areas such as the floor (ground) and walls.
Several methods of generating the three-dimensional space are conceivable. For example, three-dimensional objects can be generated by performing image recognition on the various subjects appearing in the video, and a three-dimensional space based on the captured images can be generated by pasting images cut out from the video onto the faces of the respective three-dimensional objects as textures.
A three-dimensional space may also be realized by pasting the photographed image onto the inner surface of a sphere.
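For the sphere-based variant, a captured equirectangular image pasted on the inner surface of a sphere can be sampled by mapping a viewing direction to texture coordinates. The sketch below shows only that mapping, under the usual equirectangular convention (an assumption here, not a detail given in this document).

```python
import math

def direction_to_uv(direction):
    """Map a unit viewing direction to (u, v) texture coordinates of an image pasted
    on the inside of a sphere (equirectangular mapping)."""
    x, y, z = direction
    u = 0.5 + math.atan2(x, z) / (2.0 * math.pi)            # longitude
    v = 0.5 - math.asin(max(-1.0, min(1.0, y))) / math.pi   # latitude
    return u, v

print(direction_to_uv((0.0, 0.0, 1.0)))   # straight ahead -> image centre
print(direction_to_uv((0.0, 1.0, 0.0)))   # straight up    -> top of the image
```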
The three-dimensional space generation unit 14 arranges the viewing position and the avatar AT in the generated three-dimensional space.
The viewing position is the position that serves as the viewer's viewpoint in the generated three-dimensional space, and can be regarded as the position of a virtual camera.
The position of the camera that captured the video used to generate the three-dimensional space and the position of the virtual camera set after the three-dimensional space has been generated may be the same position or different positions.
The three-dimensional space generation unit 14 specifies a position suitable for placing the avatar AT as a "placeable position" based on the result of the three-dimensional estimation processing in the recognized three-dimensional space.
A placeable position is specified by detecting ground, a foothold, or the like facing vertically upward and treating that area as a placeable position. That is, the three-dimensional space generation unit 14 specifies, as a placeable position, a position at which spatial consistency can be ensured even if the avatar AT is placed there.
In other words, when no ground or similar surface facing vertically upward is detected, it is determined that a placeable position cannot be specified.
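A minimal sketch of this "placeable position" test, assuming that the three-dimensional estimation yields candidate surfaces with centers and unit normals; the angular tolerances are illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Surface:
    center: tuple   # (x, y, z) in the generated three-dimensional space
    normal: tuple   # unit normal vector of the surface

def faces_up(surface, tolerance_deg=10.0):
    """A surface counts as 'ground-like' if its normal points close to vertically upward (+Y)."""
    ny = surface.normal[1]
    return math.degrees(math.acos(max(-1.0, min(1.0, ny)))) <= tolerance_deg

def in_view(surface, view_pos, view_dir, half_fov_deg=45.0):
    """Rough visibility test: is the surface center inside the viewing angle from the viewing position?"""
    to_s = tuple(s - v for s, v in zip(surface.center, view_pos))
    norm = math.sqrt(sum(c * c for c in to_s)) or 1.0
    cos_angle = sum(a * b for a, b in zip(to_s, view_dir)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))) <= half_fov_deg

def find_placeable(surfaces, view_pos, view_dir):
    """Return the first upward-facing surface inside the viewing angle, or None if there is none."""
    for s in surfaces:
        if faces_up(s) and in_view(s, view_pos, view_dir):
            return s.center
    return None     # no placeable position: the avatar is not placed within the viewing angle

ground = Surface(center=(0.0, 0.0, 4.0), normal=(0.0, 1.0, 0.0))
wall = Surface(center=(1.0, 1.5, 4.0), normal=(0.0, 0.0, -1.0))
print(find_placeable([wall, ground], view_pos=(0.0, 1.6, 0.0), view_dir=(0.0, 0.0, 1.0)))
```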
Note that the placeable position may instead be a placeable area, that is, an area in which the avatar AT can be placed.
At this time, the three-dimensional space generation unit 14 first specifies a placeable position within the angle of view obtained when the virtual camera is set at the viewing position. That is, the three-dimensional space generation unit 14 determines whether or not there is a position where the avatar AT can be placed within the video presented to the viewer.
The three-dimensional space generation unit 14 places the avatar AT generated by the avatar generation unit 13 at the placeable position.
When the avatar AT is placed within the angle of view of the virtual camera set at the viewing position, the captured image with the avatar AT placed in it is displayed on a display unit such as a monitor serving as the output unit 16 of the viewer terminal 4.
An example of an image displayed on the output unit 16 of the viewer terminal 4 is shown in FIG. 4.
As shown in the figure, in the displayed image of the output unit 16, an avatar AT, which is a three-dimensional object with an image of the distributor pasted on it as a texture, is placed at the left edge of the road.
Note that further conditions may be taken into consideration when the three-dimensional space generation unit 14 specifies placeable positions.
For example, even when ground or the like is detected, a position where a person cannot actually stand, or where it would be inappropriate for a person to stand, may be determined not to be a placeable position.
For example, on a road that has both a roadway and a sidewalk, the roadway area may be determined not to be a placeable position. That is, only the sidewalk area may be specified as a placeable position.
Even if a surface facing vertically upward is detected, it may be determined not to be a placeable position if that surface is a water surface such as the surface of a lake.
Alternatively, if the detected surface facing vertically upward is the top surface of an object such as a suitcase, it may be determined not to be a placeable position.
In other words, when ground is detected, an area of that ground on which no object is present is specified as a placeable position.
Positions where other passersby, animals, or the like are present may also be determined not to be placeable positions.
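The additional exclusion conditions above can be thought of as a filter over candidate ground regions, as in the following sketch; the region dictionary keys and labels are hypothetical and stand in for whatever the image recognition actually produces.

```python
def is_placeable(region):
    """Apply the exclusion conditions above to a candidate ground region.
    `region` is a dict assumed to come from image recognition; keys and labels are illustrative."""
    if not region.get("faces_up", False):
        return False                      # must be a surface facing vertically upward
    if region.get("label") in {"roadway", "water"}:
        return False                      # carriageways and water surfaces are excluded
    if region.get("is_object_top", False):
        return False                      # e.g. the top of a suitcase is not a standing position
    if region.get("occupied", False):
        return False                      # passers-by or animals already occupy the spot
    return True

candidates = [
    {"label": "sidewalk", "faces_up": True},
    {"label": "roadway", "faces_up": True},
    {"label": "water", "faces_up": True},
    {"label": "sidewalk", "faces_up": True, "occupied": True},
]
print([is_placeable(c) for c in candidates])   # [True, False, False, False]
```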
When a placeable position cannot be specified within the angle of view of the virtual camera, the three-dimensional space generation unit 14 decides not to place the avatar AT within the angle of view of the virtual camera.
Several kinds of processing are conceivable when it is decided not to place the avatar AT within the angle of view of the virtual camera.
One is to place the avatar AT outside the angle of view of the virtual camera.
This keeps the viewer, who watches the video based on the angle of view of the virtual camera, from seeing the avatar AT in an unnatural state.
Another is to display the avatar AT at a specific position on the display unit serving as the output unit 16 of the viewer terminal 4. In this case, so that it is clear that this is not an avatar AT placed in the three-dimensional space, the avatar AT may be displayed in an area reserved for it in, for example, the upper part of the display unit where the sky and the like tend to appear.
Note that the three-dimensional space generation unit 14 may decide not to place the avatar AT within the angle of view of the virtual camera based on conditions other than those described above, for example when it can be estimated that the viewer wishes to look at the scenery, or when the viewer performs an operation to hide the avatar AT.
In that case, the three-dimensional space generation unit 14 may immediately place the avatar AT outside the angle of view, or may give the avatar AT an animation of moving out of the angle of view so that it leaves the angle of view naturally. Giving such an animation can provide the viewer with a strong sense of presence.
FIG. 5 shows an example in which a main area Ar3, in which the video shot by the distributor terminal 2 is displayed, and a sub area Ar4, in which only the avatar AT is displayed, are displayed on the output unit 16.
To indicate that this is not an avatar AT placed at a predetermined position in the three-dimensional space, the sub area Ar4 is an area surrounded by a rectangular frame.
This allows the viewer to grasp the distributor's state through the avatar AT without feeling uncomfortable.
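One way to express the fallback behaviour described above (place the avatar in the space when a placeable position exists, otherwise draw it in a reserved screen region such as a corner or an upper area) is sketched below; the overlay geometry is an arbitrary assumption.

```python
def decide_avatar_presentation(placeable_position, screen_size=(1920, 1080)):
    """Decide how the avatar is shown when a placeable position may or may not exist (illustrative)."""
    if placeable_position is not None:
        return {"mode": "in_space", "position": placeable_position}
    # Fallback: draw the avatar in a reserved screen region (here, a box in the upper right corner).
    margin, box = 20, (320, 320)
    overlay = (screen_size[0] - box[0] - margin, margin, box[0], box[1])   # (x, y, width, height)
    return {"mode": "screen_overlay", "rect": overlay}

print(decide_avatar_presentation((1.0, 0.0, 4.0)))
print(decide_avatar_presentation(None))
```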
The three-dimensional space generation unit 14 places the avatar AT at a placeable position so that the orientation of the distributor's body at the shooting location matches the orientation of the body of the avatar AT in the three-dimensional space.
Note that the avatar AT generated by the avatar generation unit 13 is generated so that its face direction, line-of-sight direction, and so on match the distributor's face direction and line-of-sight direction.
Therefore, by having the three-dimensional space generation unit 14 place the avatar AT so that its body orientation matches the distributor's body orientation in the real space, the face direction and line-of-sight direction of the avatar AT also match those of the distributor in the real space. The distributor's posture and various orientations in the real space can thus be reproduced in the three-dimensional space without any discomfort.
The three-dimensional space generation unit 14 outputs a model of the three-dimensional space in which the avatar AT has been placed and the viewing position has been set (hereinafter referred to as a "3D (Dimension) space model" as appropriate) to the display video generation unit 15.
The display video generation unit 15 performs rendering processing for generating the viewing video from the virtual camera set at the viewing position, using the 3D space model of the three-dimensional space in which the avatar AT is placed. In other words, the display video generation unit 15 obtains a two-dimensional image by imaging the three-dimensional space with the virtual camera.
The display video generation unit 15 thereby outputs a two-dimensional rendered video to the output unit 16.
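The rendering step amounts to projecting points of the 3D space model through a virtual camera placed at the viewing position. The sketch below shows a yaw-only pinhole projection as a stand-in for the actual renderer, which is not specified in this document.

```python
import math

def world_to_pixel(point, cam_pos, cam_yaw, focal_px=800.0, image_size=(1280, 720)):
    """Project a point of the 3D space model into the 2D display image rendered from the
    virtual camera placed at the viewing position (yaw-only camera, pinhole model)."""
    # Translate into the camera frame, then rotate by -yaw around the vertical axis.
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    cos_y, sin_y = math.cos(-cam_yaw), math.sin(-cam_yaw)
    xc = cos_y * x + sin_y * z
    zc = -sin_y * x + cos_y * z
    if zc <= 0:
        return None                                  # behind the virtual camera
    u = focal_px * xc / zc + image_size[0] / 2
    v = -focal_px * y / zc + image_size[1] / 2
    return int(u), int(v)

# Example: the avatar's head at 1.6 m height, 5 m in front of a camera at eye height.
print(world_to_pixel((0.0, 1.6, 5.0), cam_pos=(0.0, 1.5, 0.0), cam_yaw=0.0))
```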
The output unit 16 outputs the sound and video received from the distributor terminal 2.
Specifically, the output unit 16 outputs the sound data received from the distributor terminal 2 from a speaker or the like serving as the output unit 16. The output unit 16 also performs processing for displaying the rendered video supplied from the display video generation unit 15 on a display unit serving as the output unit 16.
The communication unit 17 receives sound data, video data, and physical information from the distributor terminal 2, supplies the sound data to the output unit 16, supplies the video data to the three-dimensional space generation unit 14, and supplies the physical information to the avatar generation unit 13. The communication unit 17 also transmits the sound data and video data acquired at the viewer terminal 4 to the distributor terminal 2.
<1-2. Computer equipment>
The distributor terminal 2, the HM device 3, and the viewer terminal 4 described above each realize their functions by a computer device executing a predetermined program.
A functional block diagram of the computer device is shown in FIG. 6.
Note that each computer device does not need to have all of the components shown below, and may have only some of them.
As shown in FIG. 6, the CPU (Central Processing Unit) 71 of each computer device executes various kinds of processing according to a program stored in a ROM (Read Only Memory) 72 or a nonvolatile memory unit 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or a program loaded from a storage unit 79 into a RAM (Random Access Memory) 73. The RAM 73 also stores, as appropriate, data necessary for the CPU 71 to execute the various kinds of processing.
The CPU 71, the ROM 72, the RAM 73, and the nonvolatile memory unit 74 are interconnected via a bus 83. An input/output interface 75 is also connected to this bus 83.
An input unit 76 made up of operators and operating devices is connected to the input/output interface 75.
For example, various operators and operating devices such as a keyboard, a mouse, keys, a dial, a touch panel, a touch pad, and a remote controller are assumed as the input unit 76.
A user operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
A display unit 77 made up of an LCD (Liquid Crystal Display), an organic EL panel, or the like and an audio output unit 78 made up of a speaker or the like are also connected to the input/output interface 75, either integrally or as separate units.
The display unit 77 is a display unit that performs various kinds of display, and is configured by, for example, a display device provided on the front of the distributor terminal 2 as a computer device, or a separate display device connected to the housing.
The display unit 77 displays images, moving images (video), and the like on its display screen based on instructions from the CPU 71. The display unit 77 also displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface), based on instructions from the CPU 71.
A storage unit 79 made up of a hard disk, solid-state memory, or the like and a communication unit 80 made up of a modem or the like may also be connected to the input/output interface 75.
The communication unit 80 performs communication processing via a transmission path such as the Internet, and communicates with various devices by wired or wireless communication, bus communication, and the like.
A drive 81 is also connected to the input/output interface 75 as necessary, and a removable storage medium 82 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory is mounted as appropriate.
The drive 81 can read data files such as image files and various computer programs from the removable storage medium 82. Read data files are stored in the storage unit 79, and images and sounds contained in the data files are output on the display unit 77 and the audio output unit 78. Computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as necessary.
In this computer device, software for the processing of the present embodiment can be installed, for example, via network communication by the communication unit 80 or via the removable storage medium 82. Alternatively, the software may be stored in advance in the ROM 72, the storage unit 79, or the like.
The CPU 71 performing processing operations based on the various programs causes the necessary information processing and communication processing to be executed in the communication unit 8 of the distributor terminal 2, the communication unit 11 of the HM device 3, and the communication unit 17 of the viewer terminal 4.
Note that the computer device constituting the distributor terminal 2, the HM device 3, or the viewer terminal 4 is not limited to a single computer device as shown in FIG. 6, and may be configured as a system of a plurality of computer devices. The plurality of computer devices may be systematized by a LAN or the like, or may be located in remote places and connected by a VPN or the like using the Internet. The plurality of computer devices may include computer devices serving as a server group (cloud) usable through a cloud computing service.
<1-3. Processing flow>
FIG. 7 shows an example of the flow of processing executed in each of the distributor terminal 2, the HM device 3, and the viewer terminal 4. Note that the execution order of the processes shown in FIG. 7 is merely an example; the order of some processes may be swapped, and some processes may be executed in parallel.
The CPU 71 of the HM device 3 acquires IMU data from the IMU 9 in step S101, and transmits the IMU data to the distributor terminal 2 in step S102.
Meanwhile, the CPU 71 of the distributor terminal 2 executes, in step S201, a process of acquiring the distributor-side sound data and video data from the RAM 73 included in the microphone, image sensor, and the like, and then executes, in step S202, a process of receiving the IMU data transmitted from the HM device 3.
In the following step S203, the CPU 71 of the distributor terminal 2 acquires the distributor's physical information based on the received IMU data.
In step S204, the CPU 71 of the distributor terminal 2 transmits the distributor-side sound data and video data and the physical information estimated for the distributor to the viewer terminal 4.
The CPU 71 of the viewer terminal 4 executes, in step S301, a process of acquiring the viewer-side sound data and video data from the RAM 73 included in the microphone, camera, and the like, and then executes, in step S302, a process of receiving the sound data, video data, and physical information transmitted from the distributor terminal 2.
In step S303, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the physical information. The avatar AT generated at this time may be given an animation based on the physical information, as described above.
In step S304, the CPU 71 of the viewer terminal 4 generates a three-dimensional space based on the video data captured on the distributor side. For example, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation by performing image recognition processing on the video, and generates a three-dimensional model for each subject.
In step S305, the CPU 71 of the viewer terminal 4 sets a viewing position in the generated three-dimensional space.
In step S306, the CPU 71 of the viewer terminal 4 identifies, based on the result of the three-dimensional estimation, a placeable position at which the avatar AT can be placed, and in the subsequent step S307, places the avatar AT at the placeable position.
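By way of a purely illustrative sketch (not part of the disclosed configuration), identifying placeable positions from the three-dimensional estimation result could proceed as follows, assuming the estimation yields surface points with normal vectors; the criterion of selecting surfaces that face vertically upward is the one mentioned later in this description, and the function and parameter names are assumptions.

    import numpy as np

    def find_placeable_positions(points, normals, up=(0.0, 1.0, 0.0), cos_threshold=0.9):
        # points: (N, 3) surface points obtained from the three-dimensional estimation
        # normals: (N, 3) unit normals for those points
        # A point is treated as placeable when its normal is close to vertical,
        # i.e. the surface faces upward (ground, floor, and similar surfaces).
        up = np.asarray(up, dtype=float)
        cos_angles = normals @ up
        return points[cos_angles >= cos_threshold]

    # Hypothetical usage: only the upward-facing point remains as a candidate.
    points = np.array([[0.0, 0.0, 2.0], [1.0, 1.5, 2.0]])
    normals = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
    print(find_placeable_positions(points, normals))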
In step S308, the CPU 71 of the viewer terminal 4 performs rendering processing to generate a viewing video based on the 3D space model of the three-dimensional space in which the avatar AT is placed and on the viewing position.
In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an acoustic signal from the sound data received from the distributor terminal 2.
In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering processing of step S308.
In step S311, the CPU 71 of the viewer terminal 4 transmits the viewer-side sound data and video data acquired in step S301 to the distributor terminal 2.
The CPU 71 of the distributor terminal 2 receives the sound data and video data in step S205, and transmits only the sound data to the HM device 3 in step S206.
In response, the CPU 71 of the HM device 3 receives the viewer-side sound data in step S103, and generates and reproduces an acoustic signal from the sound data in step S104.
In step S207, the CPU 71 of the distributor terminal 2 outputs the remaining video data.
In this way, playback processing based on the sound data and video data acquired on the distributor side is performed in the viewer-side environment, and playback processing based on the sound data and video data acquired on the viewer side is performed on the distributor side.
The processing executed by each device to realize the processing flow shown in FIG. 7 will now be described.
FIG. 8 shows an example of the processing executed by the CPU 71 of the HM device 3. Note that processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
In step S111, the CPU 71 of the HM device 3 determines whether or not it is the timing to acquire IMU data. If it is determined that it is the timing to acquire IMU data, the CPU 71 of the HM device 3 acquires the IMU data in step S101.
The IMU 9 of the HM device 3 outputs sensing data every few milliseconds, for example. The CPU 71 may acquire these pieces of sensing data in batches every several hundred milliseconds or every few seconds, or may acquire the sensing data each time it is output.
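For illustration only, and not as a disclosed implementation, the batched acquisition described above could be pictured as buffering the per-millisecond samples and flushing them at a coarser interval; the class name and interval value below are assumptions.

    import time

    class ImuBatcher:
        def __init__(self, flush_interval_sec=0.5):
            self.flush_interval_sec = flush_interval_sec  # e.g. several hundred msec
            self.buffer = []
            self.last_flush = time.monotonic()

        def push(self, sample):
            # Called each time the IMU outputs sensing data (every few msec).
            self.buffer.append(sample)

        def pop_batch_if_due(self):
            # Returns the accumulated samples when the acquisition timing has arrived
            # (corresponding to the judgment of step S111); otherwise returns None.
            now = time.monotonic()
            if now - self.last_flush < self.flush_interval_sec:
                return None
            batch, self.buffer = self.buffer, []
            self.last_flush = now
            return batch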
In step S102, the CPU 71 of the HM device 3 causes the communication unit 11 to execute a process of transmitting the acquired IMU data to the distributor terminal 2.
On the other hand, if it is determined in step S111 that it is not the timing to acquire IMU data, the CPU 71 of the HM device 3 proceeds to the process of step S112 without executing the processes of steps S101 and S102.
In step S112, the CPU 71 of the HM device 3 determines whether or not the sound data acquired on the viewer side has been received.
If it is determined that no sound data has been received, the CPU 71 of the HM device 3 returns to the process of step S111.
On the other hand, if it is determined that sound data has been received, the CPU 71 of the HM device 3 performs acoustic reproduction in step S104 by generating an acoustic signal from the received sound data and supplying it to the speaker.
After the process of step S104, the CPU 71 of the HM device 3 returns to the process of step S111. That is, the CPU 71 of the HM device 3 repeatedly executes the determination process of step S111 and the determination process of step S112, and executes the corresponding process when a condition is satisfied.
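The repeated judgments of steps S111 and S112 can be pictured, purely as an illustrative sketch, as a simple polling loop like the one below; the callback names are hypothetical, since the actual implementation is not disclosed at this level of detail.

    def hm_device_loop(imu_due, read_imu, send_imu, recv_viewer_sound, play_sound):
        # imu_due(): True when it is the timing to acquire IMU data (step S111)
        # read_imu(): acquires IMU data (step S101)
        # send_imu(data): transmits IMU data to the distributor terminal (step S102)
        # recv_viewer_sound(): returns viewer-side sound data if received, else None (step S112)
        # play_sound(data): generates and reproduces an acoustic signal (step S104)
        while True:
            if imu_due():
                send_imu(read_imu())        # steps S101 and S102
            sound = recv_viewer_sound()     # step S112
            if sound is not None:
                play_sound(sound)           # step S104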
FIG. 9 shows an example of the processing executed by the CPU 71 of the distributor terminal 2. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
In step S211, the CPU 71 of the distributor terminal 2 determines whether or not IMU data has been received from the HM device 3.
If it is determined that IMU data has been received, the CPU 71 of the distributor terminal 2 acquires physical information about the distributor based on the received IMU data in step S203, acquires the distributor-side sound data and video data from the microphone, camera unit, and the like in step S201, and transmits the distributor-side sound data, video data, and physical information to the viewer terminal 4 in step S204.
On the other hand, if it is determined that IMU data has not been received, the CPU 71 of the distributor terminal 2 proceeds to step S212 without executing the processes of steps S203, S201, and S204.
Note that the example shown in FIG. 9 shows the IMU data being transmitted to the viewer terminal 4 together with the sound data and video data when the IMU data is acquired; however, the acquisition and transmission of the IMU data, the sound data, and the video data may each be performed independently. That is, only IMU data may be transmitted in a certain transmission process, and only video data may be transmitted in another transmission process.
In step S212, the CPU 71 of the distributor terminal 2 determines whether or not the viewer-side sound data and video data have been received.
If it is determined that the sound data and video data have not been received, the CPU 71 of the distributor terminal 2 returns to the process of step S211.
On the other hand, if it is determined that they have been received, the CPU 71 of the distributor terminal 2 performs a process of transmitting the sound data to the HM device 3 in step S206. In response to this, the process of step S104 shown in FIG. 8 is executed in the HM device 3, so that the distributor can listen to the sound acquired on the viewer side.
In step S207, the CPU 71 of the distributor terminal 2 outputs the video data acquired on the viewer side. This allows the distributor to view the video shot on the viewer side.
After the process of step S207, the CPU 71 of the distributor terminal 2 returns to the process of step S211. That is, the CPU 71 of the distributor terminal 2 repeatedly executes the determination process of step S211 and the determination process of step S212, and executes the corresponding process when a condition is satisfied.
Note that in the determination process of step S212, it may be determined whether or not at least one of the sound data and the video data has been received.
In this case, if it is determined that the sound data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S206, and if it is determined that the video data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S207.
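Similarly, a non-authoritative sketch of the distributor terminal loop of FIG. 9, including the variant of step S212 just described, might look like the following; every function name here is an assumption.

    def distributor_terminal_loop(recv_imu, estimate_body_info, capture_av, send_to_viewer,
                                  recv_viewer_sound, recv_viewer_video,
                                  send_sound_to_hm, show_video):
        while True:
            imu = recv_imu()                                  # step S211
            if imu is not None:
                body_info = estimate_body_info(imu)           # step S203
                sound, video = capture_av()                   # step S201
                send_to_viewer(sound, video, body_info)       # step S204
            sound = recv_viewer_sound()                       # step S212 (sound branch)
            if sound is not None:
                send_sound_to_hm(sound)                       # step S206
            video = recv_viewer_video()                       # step S212 (video branch)
            if video is not None:
                show_video(video)                             # step S207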
FIG. 10 shows an example of the processing executed by the CPU 71 of the viewer terminal 4. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
In step S321, the CPU 71 of the viewer terminal 4 determines whether or not the distributor-side sound data, video data, and physical information have been received.
If it is determined that they have not been received, the CPU 71 of the viewer terminal 4 proceeds to the process of step S301.
On the other hand, if it is determined that they have been received, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the received physical information about the distributor in step S303.
In step S304, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation on the received distributor-side video data and generates a three-dimensional space.
In step S305, the CPU 71 of the viewer terminal 4 sets the viewing position at a predetermined position in the generated three-dimensional space.
In step S306, the CPU 71 of the viewer terminal 4 identifies placeable positions in the generated three-dimensional space.
In step S307, the CPU 71 of the viewer terminal 4 places the avatar AT at a placeable position.
In step S308, the CPU 71 of the viewer terminal 4 generates a rendered video based on the view from the viewing position by performing rendering processing.
In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an acoustic signal from the sound data received from the distributor terminal 2.
In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering processing of step S308.
The processes from step S303 to step S310 are a series of processes performed in response to receiving information from the distributor terminal 2.
Subsequently, the CPU 71 of the viewer terminal 4 acquires the viewer-side sound data and video data from the microphone, camera, and the like in step S301, and transmits the viewer-side sound data and video data to the distributor terminal 2 in step S311.
As shown in FIG. 10, the CPU 71 of the viewer terminal 4 transmits the viewer-side sound data and video data to the distributor terminal 2 by executing the processes of steps S301 and S311, and executes the processes of steps S303 to S310 as appropriate depending on the result of the determination process of step S321.
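To summarize the viewer-side flow of FIG. 10 in code form, a hedged sketch is given below; the helper functions merely stand in for steps S301 to S311 and are not the actual interfaces of the disclosed system.

    def viewer_terminal_loop(recv_from_distributor, generate_avatar, build_3d_space,
                             set_viewing_position, find_placeable, place_avatar,
                             render, play_sound, show, capture_av, send_to_distributor):
        while True:
            received = recv_from_distributor()                      # step S321
            if received is not None:
                sound, video, body_info = received
                avatar = generate_avatar(body_info)                 # step S303
                space = build_3d_space(video)                       # step S304
                viewing_position = set_viewing_position(space)      # step S305
                position = find_placeable(space)                    # step S306
                place_avatar(space, avatar, position)               # step S307
                frame = render(space, viewing_position)             # step S308
                play_sound(sound)                                   # step S309
                show(frame)                                         # step S310
            viewer_sound, viewer_video = capture_av()               # step S301
            send_to_distributor(viewer_sound, viewer_video)         # step S311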
<2. Second embodiment>
The information processing system 1A according to the second embodiment differs from the information processing system 1 according to the first embodiment in that the position of the avatar AT can be changed and in that sound output is performed according to the position of the avatar AT.
FIG. 11 shows an example of the configuration of the information processing system 1A. Note that components similar to those of the information processing system 1 shown in FIG. 1 are given the same reference numerals, and description thereof will be omitted as appropriate.
The information processing system 1A includes a distributor terminal 2A and an HM device 3 as distributor-side devices, and a viewer terminal 4A as a viewer-side device. The configuration of the HM device 3 is the same as that of the first embodiment, so description thereof will be omitted.
The distributor terminal 2A includes an input unit 5, a physical information acquisition unit 6, an output unit 7, and a communication unit 8, as in the first embodiment.
The input unit 5 outputs first change information to the communication unit 8 in response to the distributor's operation performed to change the position of the avatar AT. That is, the first change information is, for example, change information regarding the position of the avatar AT.
The first change information is transmitted to the viewer terminal 4A via the communication unit 8.
Similarly to the first embodiment, the viewer terminal 4A includes an input unit 12, an avatar generation unit 13, a three-dimensional space generation unit 14, a display video generation unit 15, an output unit 16, and a communication unit 17, and further includes an avatar position control unit 18 and an acoustic signal generation unit 19.
The first change information received from the distributor terminal 2A is provided to the avatar position control unit 18 via the communication unit 17.
Furthermore, an instruction to change the position of the avatar AT can also be given by the viewer's operation. The viewer's operation for changing the position of the avatar AT is likewise supplied, as first change information, to the avatar position control unit 18 via the input unit 12.
The avatar position control unit 18 changes the position of the avatar AT based on the first change information supplied via the communication unit 17 or the input unit 12. The changed position of the avatar AT is supplied to the three-dimensional space generation unit 14 as placement information of the avatar AT.
The three-dimensional space generation unit 14 performs three-dimensional estimation by performing image recognition using the video data supplied from the distributor terminal 2A via the communication unit 17, and generates a three-dimensional space.
The three-dimensional space generation unit 14 identifies a placeable position at which to place the avatar model supplied from the avatar generation unit 13, and places the avatar AT. The three-dimensional space generation unit 14 then adjusts the position of the placed avatar AT based on the placement information supplied from the avatar position control unit 18.
At this time, the three-dimensional space generation unit 14 may adjust the changed position of the avatar AT so that spatial consistency is ensured. For example, if the new position of the avatar AT based on the placement information is not a placeable position, the placeable position closest to the new position may be determined as the adjusted position.
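A minimal sketch of the adjustment described above, assuming the placeable positions are available as a list of 3D points (an assumption about the data representation, not a disclosed detail), could be:

    import math

    def adjust_to_placeable(requested_position, placeable_positions):
        # If the requested position (from the first change information) is already placeable,
        # keep it; otherwise snap to the closest placeable position to preserve spatial consistency.
        if requested_position in placeable_positions:
            return requested_position
        return min(placeable_positions,
                   key=lambda p: math.dist(p, requested_position))

    print(adjust_to_placeable((0.2, 0.0, 1.0), [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]))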
Note that when the avatar AT is moved from the current position to a new position, the avatar generation unit 13 may add a walking animation, a running animation, or the like.
The 3D space model in which the respective elements have been arranged by the three-dimensional space generation unit 14 is supplied not only to the display video generation unit 15 but also to the acoustic signal generation unit 19.
The acoustic signal generation unit 19 performs processing to generate an acoustic signal in which the sound image of the sound data acquired at the distributor terminal 2A is localized to each object arranged in the three-dimensional space. The generated acoustic signal is input to the output unit 16 as rendered sound.
For example, when the distributor's uttered voice is input to the distributor terminal 2A via the input unit 5, the sound data of the uttered voice is sent from the distributor terminal 2A to the viewer terminal 4A.
The acoustic signal generation unit 19 receives the sound data of the uttered voice via the communication unit 17, identifies the position of the avatar AT in the 3D space model generated by the three-dimensional space generation unit 14, and localizes the sound image of the uttered voice at the placement position of the avatar AT according to the positional relationship with the viewing position.
For example, as shown in FIG. 12, when the avatar AT is positioned toward the left side of the field of view from the viewing position, the sound image of the distributor's uttered voice is localized on the left side by making the output from the left speaker 16L, serving as the output unit 16, louder than that from the right speaker 16R.
By reproducing the acoustic signal obtained in this way in stereo from the speakers serving as the output unit 16, the viewer can experience an environment in which the distributor's uttered voice is naturally heard from the position where the avatar AT is placed.
Note that the configuration is not limited to stereo reproduction; multi-channel reproduction such as 5.1ch may be performed so that the distributor's uttered voice is heard three-dimensionally.
Furthermore, when the position of the avatar AT changes in the depth direction, the acoustic signal generation unit 19 can make the viewer perceive a sense of depth by generating an acoustic signal to which delay, reverberation, and the like are added.
Moreover, the acoustic signal generation unit 19 may adjust the volume of the acoustic signal according to the distance between the avatar AT and the viewing position. That is, the acoustic signal may be generated such that the closer the avatar AT is to the viewing position, the higher the volume.
Specifically, as shown in FIG. 13, when the distance between the avatar AT and the viewing position is greater than in FIG. 12, the volume of the sound output from the left speaker 16L and the right speaker 16R serving as the output unit 16 is made smaller than in FIG. 12. This allows the viewer to experience the depth of the reproduced sound.
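The stereo localization and distance-dependent volume described with reference to FIGS. 12 and 13 can be pictured, purely as an illustrative sketch with assumed coordinate conventions and parameter values, as constant-power panning combined with a distance attenuation factor:

    import math

    def localize_to_avatar(avatar_x, avatar_z, listener_x=0.0, listener_z=0.0, ref_distance=1.0):
        # avatar_x: lateral offset of the avatar AT from the centre of the field of view
        #           (negative = left), avatar_z: depth from the viewing position.
        dx = avatar_x - listener_x
        dz = avatar_z - listener_z
        distance = max(math.hypot(dx, dz), 1e-6)
        # Pan from -1.0 (fully left) to +1.0 (fully right), constant-power law.
        pan = max(-1.0, min(1.0, dx / distance))
        angle = (pan + 1.0) * math.pi / 4.0
        left_gain, right_gain = math.cos(angle), math.sin(angle)
        # Volume falls off with distance, so a farther avatar AT sounds quieter (cf. FIG. 13).
        attenuation = ref_distance / max(distance, ref_distance)
        return left_gain * attenuation, right_gain * attenuation

    print(localize_to_avatar(-1.0, 2.0))  # avatar on the left: left gain > right gain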
Note that when the avatar AT is located outside the field of view from the viewing position, localizing the sound image of the uttered voice outside the field of view allows the viewer to perceive that the distributor is located outside the field of view.
Note also that the sounds to be localized at the placement position of the avatar AT desirably include not only the distributor's uttered voice but also sounds in general generated by the distributor, such as when the distributor claps his or her hands.
FIG. 14 shows an example of the processing executed by the distributor terminal 2A in this embodiment, and FIG. 15 shows an example of the processing executed by the viewer terminal 4A. Note that the processing executed by the HM device 3 is the same as that shown in FIG. 8 described in the first embodiment, so description thereof will be omitted.
In each figure, processes similar to those in the first embodiment are given the same step numbers, and description thereof will be omitted as appropriate.
In step S211 of FIG. 14, the CPU 71 of the distributor terminal 2A determines whether or not IMU data has been received, and if it is determined that IMU data has been received, executes the processes of steps S203, S201, and S204 to transmit the physical information based on the IMU data, the sound data, and the video data to the viewer terminal 4A.
On the other hand, if it is determined that IMU data has not been received, the CPU 71 of the distributor terminal 2A proceeds to step S221 and determines whether or not first change information has been input. As described above, the first change information is, for example, operation information for changing the position of the avatar AT.
If it is determined that first change information has been input, the CPU 71 of the distributor terminal 2A transmits the first change information to the viewer terminal 4A in step S222. As a result, the first change information is transmitted to the viewer terminal 4A via the communication unit 8.
After completing the processes of steps S221 and S222, or after determining in step S221 that first change information has not been input, the process proceeds to step S212.
In step S212, the CPU 71 of the distributor terminal 2A determines whether or not sound data and video data have been received from the viewer terminal 4A, and if they have been received, executes the corresponding processes in steps S206 and S207.
Next, FIG. 15 will be described.
In step S321, the CPU 71 of the viewer terminal 4A determines whether or not at least part of the sound data, video data, and physical information has been received from the distributor terminal 2A.
If it is determined that they have been received, the CPU 71 of the viewer terminal 4A generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
Subsequently, the CPU 71 of the viewer terminal 4A identifies a placeable position based on the three-dimensional estimation result in step S306, and places the avatar AT at the placeable position in step S307.
In step S331, the CPU 71 of the viewer terminal 4A determines whether or not first change information has been received. In this determination process, it is determined that the first change information has been received not only when it is received from the distributor terminal 2A but also when it is received from the input unit 12.
If it is determined that the first change information has been received, the CPU 71 of the viewer terminal 4A performs a process of changing the placement position of the avatar AT in step S332, and proceeds to the process of step S308. At this time, it may be determined whether or not the changed placement position is a placeable position, and if it is not, a process of adjusting the new placement position of the avatar AT may be performed.
Note that if it is determined in step S331 that the first change information has not been received, the CPU 71 of the viewer terminal 4A proceeds to the process of step S308 without executing the process of step S332.
Description of the processes from step S308 onward will be omitted.
Note that the CPU 71 of the viewer terminal 4A may perform the determination process of step S331 without placing the avatar AT in step S307. Specifically, if it is determined in step S331 that the first change information has not been received, the avatar AT may be placed at a placeable position in step S332, and if it is determined that the first change information has been received, the avatar AT may be placed in step S332 taking the received first change information into account.
<3. Third embodiment>
The information processing system 1B according to the third embodiment differs from the previous examples in that the viewer can change the viewing position.
An example of the configuration of the information processing system 1B will be described with reference to FIG. 16. Note that components similar to those of the information processing system 1 shown in FIG. 1 are given the same reference numerals, and description thereof will be omitted as appropriate.
The information processing system 1B includes a distributor terminal 2, an HM device 3, and a viewer terminal 4B. The configurations of the distributor terminal 2 and the HM device 3 are the same as those of the first embodiment, so description thereof will be omitted.
The viewer terminal 4B includes an input unit 12, an avatar generation unit 13, a three-dimensional space generation unit 14, a display video generation unit 15, an output unit 16, and a communication unit 17.
The input unit 12 receives a viewing position change operation by the viewer, and supplies information on the change operation to the three-dimensional space generation unit 14 as second change information.
The three-dimensional space generation unit 14 sets the viewing position, that is, the position of the virtual camera, in the three-dimensional space generated by the three-dimensional estimation, taking the second change information into account.
By making the position of the virtual camera settable at an arbitrary position in this way, the viewer can move the position of the virtual camera by his or her own operations. As a result, the viewer can feel as if he or she is moving around the three-dimensional space with a certain degree of freedom, and can have an experience as if moving around the shooting location together with the distributor.
Note that as the deviation between the position of the virtual camera and the position of the camera unit at the time of shooting becomes larger, the sense of incongruity regarding the textures pasted onto the objects in the three-dimensional space becomes greater. A specific example is a case where the virtual camera is placed behind an object onto whose front a video shot from the front has been pasted as a texture.
Therefore, the position of the virtual camera may be settable within a predetermined range centered on the position of the camera unit at the time of shooting. This makes it possible to provide the viewer with a video in which spatial consistency is ensured to a certain extent.
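A hedged sketch of constraining the virtual camera within a predetermined range of the shooting camera position is shown below; the radius value is an assumption for illustration, not a disclosed parameter.

    import math

    def clamp_viewing_position(requested, shooting_camera, max_radius=1.5):
        # Keep the virtual camera within max_radius of the camera unit position at the
        # time of shooting so that texture artifacts remain acceptable.
        offset = [r - c for r, c in zip(requested, shooting_camera)]
        distance = math.sqrt(sum(o * o for o in offset))
        if distance <= max_radius:
            return tuple(requested)
        scale = max_radius / distance
        return tuple(c + o * scale for c, o in zip(shooting_camera, offset))

    print(clamp_viewing_position((3.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # pulled back to (1.5, 0.0, 0.0)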
FIG. 17 shows an example of the processing executed by the viewer terminal 4B in this embodiment.
Note that processes similar to those in the first embodiment are given the same step numbers, and description thereof will be omitted as appropriate.
In step S321, the CPU 71 of the viewer terminal 4B determines whether or not the distributor-side sound data, video data, and physical information have been received.
If it is determined that they have been received, the CPU 71 of the viewer terminal 4B generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
Subsequently, in step S341, the CPU 71 of the viewer terminal 4B determines whether or not second change information has been received. If it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B performs a process of changing the viewing position in step S342.
As a result, the viewing position can be changed according to the viewer's operation, and the viewer can obtain an experience of moving around the three-dimensional space with a certain degree of freedom.
Note that the CPU 71 of the viewer terminal 4B may execute the process of step S341 without executing the process of step S305. In that case, when it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B sets the viewing position in step S342 taking the second change information into account, and when it is determined that the second change information has not been received, the CPU 71 determines the viewing position in step S342 as in the previous embodiments.
<4. Modified examples>
In the examples described above, the orientation of the distributor in real space and the orientation of the avatar AT are made to match. The orientation here refers to the orientation of the body, the orientation of the face, or the direction of the line of sight; however, the orientation of the distributor and that of the avatar AT may also be matched in terms of the orientation with respect to the object at which the distributor is gazing.
Specifically, as shown in FIG. 18, consider a situation in which the distributor is directing his or her body or face toward an object X in the three-dimensional space.
At this time, the orientation of the avatar AT in the three-dimensional space does not match the orientation of the distributor in real space; however, in the sense that both direct their body or face toward the object X being gazed at, the two orientations can be said to match. In this way, the orientation of the avatar AT may be matched to the orientation of the distributor in the sense of reproducing the situation of directly facing the object X.
To do this, the viewer terminal 4 (4A) identifies the object X at which the distributor is gazing based on the distributor's physical information. The avatar AT can then be placed so that its body or face faces the direction in which the object X exists.
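As an illustrative sketch only, turning the avatar AT toward the identified object X amounts to computing a yaw angle from the avatar position to the object position in the three-dimensional space; the coordinate convention (x to the right, z forward) is an assumption.

    import math

    def yaw_toward(avatar_position, object_position):
        # Returns the yaw angle (rotation about the vertical axis, in radians) that makes
        # the avatar AT face the object X identified from the distributor's physical information.
        dx = object_position[0] - avatar_position[0]
        dz = object_position[2] - avatar_position[2]
        return math.atan2(dx, dz)

    print(math.degrees(yaw_toward((0.0, 0.0, 0.0), (1.0, 0.0, 1.0))))  # 45 degrees to the right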
The avatar AT may take any form. For example, as described in each of the above examples, it may be a three-dimensional object onto which an image of the distributor is pasted as a texture, or, as shown in FIG. 19, it may be a three-dimensional object modeled on some character (a giant panda in FIG. 19).
Alternatively, the avatar AT may be a three-dimensional object having only an outline, as shown in FIG. 20, so that the scenery is hidden by the avatar AT as little as possible. This allows the viewer to enjoy the photographed scenery all the more.
Although the three-dimensional space generation unit 14 generates the three-dimensional space by performing image recognition processing using the captured images, the three-dimensional space may be generated using other methods.
For example, information on three-dimensional objects at the shooting location may be acquired from an external server device that provides a map service, the captured image may be aligned with those three-dimensional objects, and a three-dimensional space may then be generated by pasting partial images cut out from the captured image onto the three-dimensional objects as textures.
<5. Summary>
As described in each of the above examples, the viewer terminal 4 (4A, 4B) as an information processing device includes an avatar generation unit 13 that generates an avatar AT of the photographer (the above-mentioned distributor) by reflecting the physical information of the photographer, a three-dimensional space generation unit 14 that generates information on a three-dimensional space (for example, a 3D space model) from a captured image and places the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a display video generation unit 15 that generates, as a display video, a video from a viewing position set in the three-dimensional space.
When the photographer (distributor) distributes video while taking a selfie while walking, the viewer viewing the video sees a video as if facing the photographer and backing away in the direction of the photographer's movement. This makes it difficult to feel as if the viewer is walking through the shooting location together with the photographer.
Furthermore, when the photographer performs distribution while shooting in the direction of movement, the photographer does not appear within the field of view, so again the viewer does not feel as if walking through the shooting location together with the photographer.
Therefore, the viewer terminal 4 (4A, 4B) as the information processing device of the present technology generates an avatar AT reflecting the physical information of the photographer, places it in the three-dimensional space generated from the captured image, and presents the resulting video to the viewer. The orientation of the placed avatar AT corresponds to the orientation of the photographer. For example, the avatar AT is placed so that the orientation of the photographer matches the orientation of the avatar AT.
This allows the viewer to feel as if they are walking around the filming location together with the photographer.
Therefore, for example, it is possible to obtain a sense of unity as if traveling together while staying at home, eliminating a feeling of alienation. In addition, since the photographer's physical information is reflected in the avatar AT, the photographer's posture, gestures, and the like are reflected in the avatar AT, so that the viewer can naturally grasp the timing for speaking to the photographer, which contributes to smooth communication.
Note that in each of the above examples, the viewer terminal 4 (4A, 4B) includes the avatar generation unit 13, the three-dimensional space generation unit 14, and the display video generation unit 15; however, other configurations are also possible.
For example, the distributor terminal 2 (2A) of the information processing system 1 (1A, 1B) may include these units, or an information processing device serving as a server device included in the information processing system 1 may include these units. That is, the above-described configuration may be realized in the form of cloud computing.
Note that the orientation of the avatar AT and the orientation of the photographer do not need to match completely. For example, the photographer's orientation may be snapped to one of discrete directions obtained by dividing 360 degrees into four or eight equal parts, and the orientation of the avatar AT may be set to that snapped direction, so that the photographer's orientation and the avatar AT's orientation substantially match. Even in this mode, the viewer can feel as if he or she is acting together with the photographer at the travel destination.
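A minimal sketch of snapping the photographer's orientation to discrete directions, as mentioned above, is given below for illustration; the eight-direction default is just one of the options named in the description.

    def snap_orientation(yaw_degrees, divisions=8):
        # Quantize a yaw angle to the nearest of `divisions` equally spaced directions
        # (e.g. four or eight), and use the snapped value as the orientation of the avatar AT.
        step = 360.0 / divisions
        return (round(yaw_degrees / step) * step) % 360.0

    print(snap_orientation(37.0))       # -> 45.0 with eight divisions
    print(snap_orientation(37.0, 4))    # -> 0.0 with four divisions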
As described with reference to FIGS. 4 and 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the placement so that the orientation of the photographer's (distributor's) body and the orientation of the avatar AT's body match.
This makes it possible to match the moving direction of the photographer with the moving direction of the avatar AT, and to place the avatar AT in the captured video without causing any sense of incongruity.
Note that, as shown in FIG. 18, matching of orientations may refer to facing the direction of a specific object.
As described with reference to FIGS. 4 and 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the placement so that the orientation of the photographer's (distributor's) face and the orientation of the avatar AT's face match.
This makes it possible to match the object that the photographer is looking at with the object that lies ahead in the direction of the avatar AT's face. Accordingly, the viewer can appropriately grasp the object in which the photographer has shown interest, which enables smooth communication.
As described with reference to FIGS. 4 and 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the placement so that the direction of the photographer's (distributor's) line of sight and the direction of the avatar AT's line of sight match.
This allows the viewer to grasp more accurately the object that the photographer is looking at, which facilitates smooth communication. That is, if the object that the photographer is gazing at is known, a conversation about that object can be held, and precisely because it is the object being gazed at, the conversation can be expected to develop.
As described in the third embodiment with reference to FIG. 16 and the like, the viewing position set by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be settable at a position different from the shooting viewpoint at the time of shooting (that is, the position of the camera unit of the distributor terminal 2 (2A) at the time of shooting).
That is, the viewing position can be set at an arbitrary position in the generated three-dimensional space that is different from the camera position at the time of shooting. As a result, the viewer can obtain the feeling of freely moving around the photographer (distributor) by freely moving the viewing position at his or her own will. Accordingly, it is possible to obtain a stronger sense of acting together with the photographer.
As described in the third embodiment with reference to FIG. 16 and the like, the viewing position set by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the display video.
By freely moving the viewing position at his or her own will, the viewer can obtain the feeling of freely moving around the photographer (distributor).
As described above, the physical information received by the viewer terminal 4 (4A, 4B) includes the movement speed of the photographer (distributor), and the avatar generation unit 13 of the viewer terminal 4 (4A, 4B) may generate the avatar AT by reflecting the movement speed.
This makes it possible to display an avatar AT appropriate for the movement of the video. Specifically, a running avatar AT is displayed when the background in the video is moving fast, and a standing avatar AT is displayed when the background has stopped moving.
As described with reference to FIG. 4 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may identify, as a placeable position, a position in the three-dimensional space at which the avatar AT can be placed based on the result of image recognition performed on the captured image, and place the avatar AT at the placeable position.
This makes it possible to place the avatar AT at a natural position. That is, it is possible to provide the viewer with a video in which spatial consistency is ensured.
As described with reference to FIG. 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) need not place the avatar AT within the field of view of the viewing position when a placeable position cannot be identified.
For example, when there is no space in the three-dimensional space suitable for placing the avatar AT, particularly within the field of view from the viewing position, the avatar AT is not placed within the field of view of the viewing position, that is, at a position visible to the viewer.
This makes it possible to prevent the viewer from seeing an avatar AT placed at an unnatural position.
As described with reference to FIG. 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may determine that a placeable position cannot be identified when there is no surface facing vertically upward within the field of view from the viewing position.
When no appropriate space such as the ground exists, not placing the avatar AT within the field of view makes it possible to prevent the avatar AT from being placed unnaturally.
As described with reference to FIG. 5 and the like, the display video generation unit 15 of the viewer terminal 4 (4A, 4B) may display the avatar AT at a predetermined position on the display screen (for example, the upper right corner) when the three-dimensional space generation unit 14 determines not to place the avatar AT within the field of view of the viewing position.
By displaying the avatar AT of the photographer (distributor) in a separate frame or the like, for example, at a corner of the display screen, the viewer can check the state of the photographer while any sense of incongruity regarding how the avatar AT is placed is eliminated.
As described above, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may place the avatar AT outside the field of view of the viewing position when a placeable position cannot be identified.
By placing the avatar AT outside the field of view, an avatar AT placed at an unnatural position is not presented to the viewer. In addition, by localizing the sound image of the voice or other sounds emitted by the photographer (distributor) at the position of the avatar AT placed outside the field of view, the viewer can naturally perceive that the photographer's avatar AT is placed at a position that cannot be seen.
As described with reference to FIG. 4 and the like, the avatar AT generated at the viewer terminal 4 (4A, 4B) may be generated based on an image obtained by photographing the photographer (distributor).
This allows the viewer to accept the photographer's avatar AT without a sense of incongruity.
As described with reference to FIGS. 11, 12, 13 and the like, the viewer terminal 4 (4A, 4B) may include an acoustic signal generation unit 19 that generates an acoustic signal in which the sound image of the sound generated from the photographer (distributor) is localized at the position of the avatar AT in the three-dimensional space.
As a result, for example, the voice uttered by the photographer is heard from the position of the avatar AT, so the viewer can accept, without a sense of incongruity, that the photographer is present at the position of the avatar AT.
As described with reference to FIGS. 11, 12, 13 and the like, the acoustic signal generation unit 19 of the viewer terminal 4 (4A, 4B) may generate the acoustic signal so that the volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar AT.
This makes it possible to generate an appropriate acoustic signal according to the distance to the avatar AT, so that the viewer can perceive the voice of the photographer (distributor) without a sense of incongruity.
As described with reference to FIG. 11 and the like, the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the display video.
This allows the viewer to move away from or approach the photographer (distributor) in the three-dimensional space.
As described with reference to FIG. 11 and the like, the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the photographer (distributor).
This allows the photographer (distributor) to move away from or approach the viewer in the three-dimensional space.
In this way, the distance between the photographer and the viewer is not constant but changes intermittently, which is preferable because it gives the viewer the feeling of actually walking through the shooting location.
As described with reference to FIG. 1 and the like, the avatar generation unit 13 of the viewer terminal 4 (4A, 4B) may perform the process of reflecting the physical information on the avatar AT multiple times in the time direction.
This makes it possible, for example, to periodically reflect the behavior of the photographer (distributor) on the avatar AT.
The information processing method according to the embodiment is a method in which an information processing device executes: a process of generating an avatar AT of the photographer (distributor) by reflecting the physical information of the photographer; a process of generating information on a three-dimensional space from a captured image and placing the avatar AT in the three-dimensional space so that the orientation of the photographer at the time of shooting and the orientation of the avatar AT match; and a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
The program according to the embodiment causes a computer device to execute a function of generating an avatar AT of the photographer by reflecting the physical information of the photographer (distributor), a function of generating three-dimensional space information from a captured image and arranging the avatar AT in the three-dimensional space so that the orientation of the photographer at the time of shooting and the orientation of the avatar AT match, and a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
That is, the program causes a computer device to execute at least a part of each process shown in FIGS. 7, 10, 15, and 17.
Such an information processing method and program can also provide the same operations and effects as the viewer terminal 4 (4A, 4B) of the embodiments and modifications described above.
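To make the three steps above concrete, here is a minimal end-to-end sketch in Python. It is a non-normative outline: the helpers `generate_avatar`, `estimate_3d_space`, `place_avatar`, and `render_view` are stubs introduced for illustration, and their names and signatures are assumptions rather than anything defined in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BodyInfo:
    orientation: float        # photographer's orientation at shooting time (degrees)
    speed: float = 0.0        # moving speed of the photographer

@dataclass
class Avatar:
    orientation: float = 0.0

@dataclass
class Space3D:
    source_image: object = None
    avatar: Optional[Avatar] = None

def generate_avatar(body_info: BodyInfo) -> Avatar:
    # Step 1: generate the avatar, reflecting the photographer's physical information.
    return Avatar(orientation=body_info.orientation)

def estimate_3d_space(captured_image) -> Space3D:
    # Step 2a: placeholder for generating three-dimensional space information
    # from the captured image (e.g. by depth estimation or structure from motion).
    return Space3D(source_image=captured_image)

def place_avatar(space: Space3D, avatar: Avatar, orientation: float) -> None:
    # Step 2b: arrange the avatar so its orientation matches the photographer's.
    avatar.orientation = orientation
    space.avatar = avatar

def render_view(space: Space3D, viewing_position) -> dict:
    # Step 3: placeholder renderer returning a description of the display frame.
    return {"viewing_position": viewing_position, "avatar": space.avatar}

def generate_display_video_frame(captured_image, body_info: BodyInfo, viewing_position):
    """One frame of the pipeline: the three processes executed in order."""
    avatar = generate_avatar(body_info)
    space = estimate_3d_space(captured_image)
    place_avatar(space, avatar, body_info.orientation)
    return render_view(space, viewing_position)
```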
Note that the effects described in this specification are merely examples and are not limiting, and other effects may also be obtained.
In addition, the examples described above may be combined in any manner, and the various operations and effects described above can still be obtained even when such combinations are used.
<6. The present technology>
(1)
An information processing device comprising:
an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer;
a three-dimensional space generation unit that generates three-dimensional space information from a captured image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
(2)
The information processing device according to (1) above, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's body and the orientation of the avatar's body match.
(3)
The information processing device according to (1) or (2) above, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's face and the orientation of the avatar's face match.
(4)
The information processing device according to (1) or (2) above, wherein the three-dimensional space generation unit performs the arrangement so that the direction of the photographer's line of sight and the direction of the avatar's line of sight match.
(5)
The information processing device according to any one of (1) to (4) above, wherein the viewing position can be set to a position different from the shooting viewpoint at the time of shooting.
(6)
The information processing device according to (5) above, wherein the viewing position can be changed by an operation of a viewer viewing the display video.
(7)
The information processing device according to any one of (1) to (6) above, wherein the physical information includes a moving speed of the photographer, and the avatar generation unit generates the avatar by reflecting the moving speed.
(8)
The information processing device according to any one of (1) to (7) above, wherein the three-dimensional space generation unit specifies, as a placeable position, a position in the three-dimensional space where the avatar can be placed on the basis of a result of image recognition on the captured image, and places the avatar at the placeable position (a sketch of this check follows this list).
(9)
The information processing device according to (8) above, wherein the three-dimensional space generation unit does not place the avatar within the angle of view of the viewing position when the placeable position cannot be specified.
(10)
The information processing device according to (9) above, wherein the three-dimensional space generation unit determines that the placeable position cannot be specified when no vertically upward-facing surface exists within the angle of view.
(11)
The information processing device according to (9) or (10) above, wherein the display video generation unit displays the avatar at a predetermined position on the display screen when the three-dimensional space generation unit determines that the avatar is not to be placed within the angle of view.
(12)
The information processing device according to (9) or (10) above, wherein the three-dimensional space generation unit places the avatar outside the angle of view of the viewing position when the placeable position cannot be specified.
(13)
The information processing device according to any one of (1) to (12) above, wherein the avatar is generated on the basis of an image obtained by photographing the photographer.
(14)
The information processing device according to any one of (1) to (13) above, further comprising an audio signal generation unit that generates an acoustic signal in which a sound image of a sound produced by the photographer is localized at the position of the avatar in the three-dimensional space.
(15)
The information processing device according to (14) above, wherein the audio signal generation unit generates the acoustic signal so that its volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar.
(16)
The information processing device according to (15) above, wherein the position of the avatar can be changed by an operation of a viewer viewing the display video.
(17)
The information processing device according to (15) or (16) above, wherein the position of the avatar can be changed by an operation of the photographer.
(18)
The information processing device according to any one of (1) to (17) above, wherein the avatar generation unit performs the process of reflecting the physical information on the avatar multiple times in the time direction.
(19)
An information processing method in which an information processing device executes:
a process of generating an avatar of a photographer by reflecting physical information of the photographer;
a process of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
(20)
A program that causes a computer device to execute:
a function of generating an avatar of a photographer by reflecting physical information of the photographer;
a function of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
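As noted in item (8) above, items (8) to (10) describe identifying a placeable position from image recognition and refusing in-view placement when no vertically upward-facing surface is found within the angle of view. The following Python sketch shows one way such a check could look; the `Surface` representation, the normal-vector threshold, and the name `find_placeable_position` are assumptions introduced for this example and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Surface:
    center: Vec3        # a representative point on the recognized surface
    normal: Vec3        # unit normal vector of the surface
    in_view: bool       # whether the surface lies within the angle of view

def _points_up(normal: Vec3, threshold: float = 0.9) -> bool:
    # A surface is treated as "vertically upward-facing" when its unit normal
    # is close enough to the world up vector (0, 1, 0).
    return normal[1] >= threshold

def find_placeable_position(surfaces: Iterable[Surface]) -> Optional[Vec3]:
    """Return a placeable position for the avatar, or None if no
    upward-facing surface exists within the angle of view."""
    for surface in surfaces:
        if surface.in_view and _points_up(surface.normal):
            return surface.center
    return None  # the caller then avoids placing the avatar within the view

# Example: a wall (normal sideways) and a floor patch (normal up).
surfaces = [
    Surface(center=(0.0, 1.0, 2.0), normal=(1.0, 0.0, 0.0), in_view=True),
    Surface(center=(0.5, 0.0, 3.0), normal=(0.0, 1.0, 0.0), in_view=True),
]
print(find_placeable_position(surfaces))  # -> (0.5, 0.0, 3.0)
```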
4, 4A, 4B Viewer terminal
13 Avatar generation unit
14 Three-dimensional space generation unit
15 Display video generation unit
19 Audio signal generation unit
71 CPU
AT Avatar

Claims (20)

  1.  An information processing device comprising:
      an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer;
      a three-dimensional space generation unit that generates three-dimensional space information from a captured image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
      a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
  2.  The information processing device according to claim 1, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's body and the orientation of the avatar's body match.
  3.  The information processing device according to claim 1, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's face and the orientation of the avatar's face match.
  4.  The information processing device according to claim 1, wherein the three-dimensional space generation unit performs the arrangement so that the direction of the photographer's line of sight and the direction of the avatar's line of sight match.
  5.  The information processing device according to claim 1, wherein the viewing position can be set to a position different from the shooting viewpoint at the time of shooting.
  6.  The information processing device according to claim 5, wherein the viewing position can be changed by an operation of a viewer viewing the display video.
  7.  The information processing device according to claim 1, wherein the physical information includes a moving speed of the photographer, and the avatar generation unit generates the avatar by reflecting the moving speed.
  8.  The information processing device according to claim 1, wherein the three-dimensional space generation unit specifies, as a placeable position, a position in the three-dimensional space where the avatar can be placed on the basis of a result of image recognition on the captured image, and places the avatar at the placeable position.
  9.  The information processing device according to claim 8, wherein the three-dimensional space generation unit does not place the avatar within the angle of view of the viewing position when the placeable position cannot be specified.
  10.  The information processing device according to claim 9, wherein the three-dimensional space generation unit determines that the placeable position cannot be specified when no vertically upward-facing surface exists within the angle of view.
  11.  The information processing device according to claim 9, wherein the display video generation unit displays the avatar at a predetermined position on the display screen when the three-dimensional space generation unit determines that the avatar is not to be placed within the angle of view.
  12.  The information processing device according to claim 9, wherein the three-dimensional space generation unit places the avatar outside the angle of view of the viewing position when the placeable position cannot be specified.
  13.  The information processing device according to claim 1, wherein the avatar is generated on the basis of an image obtained by photographing the photographer.
  14.  The information processing device according to claim 1, further comprising an audio signal generation unit that generates an acoustic signal in which a sound image of a sound produced by the photographer is localized at the position of the avatar in the three-dimensional space.
  15.  The information processing device according to claim 14, wherein the audio signal generation unit generates the acoustic signal so that its volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar.
  16.  The information processing device according to claim 15, wherein the position of the avatar can be changed by an operation of a viewer viewing the display video.
  17.  The information processing device according to claim 15, wherein the position of the avatar can be changed by an operation of the photographer.
  18.  The information processing device according to claim 1, wherein the avatar generation unit performs the process of reflecting the physical information on the avatar multiple times in the time direction.
  19.  An information processing method in which an information processing device executes:
      a process of generating an avatar of a photographer by reflecting physical information of the photographer;
      a process of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
      a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
  20.  A program that causes a computer device to execute:
      a function of generating an avatar of a photographer by reflecting physical information of the photographer;
      a function of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
      a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
PCT/JP2023/027306 2022-08-10 2023-07-26 Information processing device, information processing method, and program WO2024034396A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-128046 2022-08-10
JP2022128046 2022-08-10

Publications (1)

Publication Number Publication Date
WO2024034396A1 true WO2024034396A1 (en) 2024-02-15

Family

ID=89851574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/027306 WO2024034396A1 (en) 2022-08-10 2023-07-26 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2024034396A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013047962A (en) * 2012-10-01 2013-03-07 Olympus Imaging Corp Reproduction display device, reproduction display program, reproduction display method, and image processing server
JP2017076998A (en) * 2016-11-22 2017-04-20 オリンパス株式会社 Image processing device, image processing method, and program
JP2020087277A (en) * 2018-11-30 2020-06-04 株式会社ドワンゴ Video synthesizer, method for synthesizing video, and video synthesizing program


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23852372

Country of ref document: EP

Kind code of ref document: A1