WO2024034396A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2024034396A1
Authority
WO
WIPO (PCT)
Prior art keywords
avatar
dimensional space
photographer
information processing
generation unit
Application number
PCT/JP2023/027306
Other languages
French (fr)
Japanese (ja)
Inventor
龍正 小池
光 高鳥
吉弘 田村
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Application filed by Sony Group Corporation (ソニーグループ株式会社)
Publication of WO2024034396A1 publication Critical patent/WO2024034396A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working

Definitions

  • the present technology relates to an information processing device, an information processing method, and a program for communicating between users in remote locations.
  • Patent Document 1 discloses a technology for promoting communication such as natural conversation by presenting not only normal voices such as conversation but also additional information corresponding to the context (state, situation) of the content and the like.
  • the user on the trip carries a mobile terminal or the like and travels around the destination while taking pictures of the scenery, and enjoys conversations with people at home while sharing the captured images.
  • the present technology was developed in view of these problems, and aims to provide an experience that makes it feel like you are walking around the shooting location with the photographer, even though you are in a remote location.
  • The information processing device according to the present technology includes an avatar generation unit that generates an avatar of the photographer by reflecting physical information of the photographer, a three-dimensional space generation unit that generates information of a three-dimensional space from the captured image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a display image generation unit that generates, as a display image, an image from a viewing position set in the three-dimensional space. As a result, the person at home (viewer) can feel as if he or she is walking around the shooting location together with the photographer.
  • In the information processing method according to the present technology, an information processing apparatus executes a process of generating an avatar of the photographer by reflecting the physical information of the photographer, a process of generating information of a three-dimensional space from the captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
  • The program according to the present technology causes a computer device to execute a function of generating an avatar of the photographer by reflecting the physical information of the photographer, a function of generating information of a three-dimensional space from the captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
  • Such an information processing method and program can also provide the same effect as the information processing device according to the present technology described above.
  • FIG. 1 is a block diagram illustrating a configuration example of an information processing system according to a first embodiment of the present technology.
  • FIG. 2 is an explanatory diagram showing the appearance of a distributor terminal and an HM device.
  • FIG. 3 is a diagram showing an example of an image presented to a distributor.
  • FIG. 4 is a diagram showing an example of an image presented to a viewer.
  • FIG. 5 is a diagram showing another example of an image presented to a viewer.
  • FIG. 6 is a block diagram of a computer device.
  • FIG. 7 is a diagram showing the flow of processing executed by each device of the information processing system.
  • FIG. 8 is a flowchart of an example of processing performed in the HM device.
  • FIG. 9 is a flowchart illustrating an example of processing executed at the distributor terminal.
  • FIG. 11 is a block diagram showing a configuration example of an information processing system according to a second embodiment.
  • FIG. 12 is a diagram for explaining localization of a sound image according to the position of an avatar.
  • FIG. 13 is an explanatory diagram of an example in which the volume of sound is changed depending on the distance from the avatar.
  • FIG. 14 is a flowchart of an example of a process executed at the distributor terminal in the second embodiment.
  • FIG. 15 is a flowchart of an example of a process executed at the viewer terminal in the second embodiment.
  • FIG. 16 is a block diagram showing a configuration example of an information processing system in a third embodiment.
  • FIG. 18 is an explanatory diagram of the correspondence between the orientation of the distributor and the orientation of the avatar.
  • FIG. 19 is a diagram showing an example of an avatar.
  • FIG. 20 is a diagram showing another example of an avatar.
  • the information processing system 1 is a system for facilitating smooth communication between users located at separate locations. Further, the information processing system 1 is also a system used by one user to view a video shot while moving while the other user is at home or the like.
  • a user who shoots a video while moving will be referred to as a "distributor”, and a user who views the video shot by the distributor will be referred to as a "viewer”.
  • the information processing system 1 is a system that provides the viewer with an experience as if they were moving along with the broadcaster through the filming location.
  • the information processing system 1 includes a distributor terminal 2 and an HM (Head-mount) device 3 as a distributor-side device, and further includes a viewer terminal 4 as a viewer-side device (see FIG. 1).
  • the distributor terminal 2 is, for example, a smartphone or the like, and is a device that the distributor can hold in his hand and shoot video while moving.
  • the distributor terminal 2 is equipped with a display unit that can display images from the viewer side and a microphone that can collect the distributor's voice and environmental sounds.
  • the HM device 3 is connected to the distributor terminal 2 for wired or wireless communication, and is configured with a speaker for reproducing audio and environmental sounds from the viewer side.
  • the distributor terminal 2 and the HM device 3 may be realized by a single mobile terminal device by consolidating the functions of both in a smartphone.
  • The viewer terminal 4 is a stationary device, and includes a display unit on which images shot by the distributor terminal 2 are displayed, a speaker for reproducing the distributor's voice and environmental sounds, and a microphone for inputting the viewer's voice and the like.
  • the viewer terminal 4 may include a controller that performs various operations.
  • As the controller, various devices such as a keyboard and a mouse can be considered.
  • the viewer terminal 4 may be configured as a system in which some or all of the computer device, microphone, speaker, and operation device are independently provided.
  • the distributor terminal 2 includes an input section 5, a physical information acquisition section 6, an output section 7, and a communication section 8.
  • the HM device 3 also includes an IMU (Inertial Measurement Unit) 9, an output section 10, and a communication section 11.
  • The input unit 5 of the distributor terminal 2 includes a microphone for inputting voice and environmental sounds, a camera unit equipped with an image sensor for capturing images, a touch panel for inputting the distributor's operations, various button operators, and the like.
  • the input unit 5 outputs “sound data” such as audio and environmental sounds, and “video data” such as still images and moving images captured by the image sensor to the communication unit 8.
  • In the following description, sound data is simply described as "sound", and video data is simply described as "video".
  • the physical information acquisition unit 6 acquires physical information of the distributor based on acceleration information and angular velocity information as sensing data by the IMU 9 included in the HM device 3.
  • the distributor's physical information includes, for example, the distributor's posture and movements. Specifically, this is information that specifies the distributor's face direction, line of sight direction, movement speed (walking speed), movement method (walking, cycling, etc.), posture, gestures, etc.
  • These pieces of information may be acquired not only from the IMU 9 included in the HM device 3 but also from a camera that photographs the distributor.
  • information specifying the line of sight, gestures, posture, etc. of the distributor may be obtained by analyzing an image captured by a camera included in the HM device 3.
  • the physical information acquisition unit 6 outputs the physical information acquired from the IMU data to the communication unit 8.
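  • As a rough, non-authoritative illustration of how such physical information might be derived from the IMU data, the following Python sketch estimates a coarse movement state and a face yaw angle from acceleration and angular-velocity samples; the thresholds, axis convention, and returned field names are assumptions for illustration, not values taken from this publication.

```python
import numpy as np

def estimate_physical_info(accel, gyro, dt):
    """Hypothetical sketch: derive coarse physical information from IMU samples.

    accel, gyro: (N, 3) arrays of acceleration [m/s^2] and angular velocity [rad/s]
    (sensor axes assumed x-right, y-forward, z-up). dt: sampling interval [s].
    """
    accel = np.asarray(accel, dtype=float)
    gyro = np.asarray(gyro, dtype=float)

    # Vertical oscillation of acceleration as a crude proxy for gait intensity.
    vertical = accel[:, 2] - accel[:, 2].mean()
    intensity = float(np.sqrt(np.mean(vertical ** 2)))

    if intensity < 0.3:          # illustrative thresholds only
        movement = "standing"
    elif intensity < 1.5:
        movement = "walking"
    else:
        movement = "running"

    # Integrate the yaw rate to track how the HM device (and face) has turned.
    face_yaw = float(np.sum(gyro[:, 2]) * dt)

    return {"movement": movement, "gait_intensity": intensity, "face_yaw": face_yaw}
```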
  • the output unit 7 outputs the sound and video on the viewer side received from the viewer terminal 4.
  • The output unit 7 in this example includes, for example, a display unit that outputs video from the viewer side, and the sound from the viewer side is output from a speaker serving as the output unit 10 included in the HM device 3.
  • An example of an image displayed on the display unit serving as the output unit 7 is shown in FIG. 3.
  • In the main area of the screen, an image captured by the camera of the viewer terminal 4 is displayed in a large size; that is, an image of the viewer sitting on a chair and facing the camera is displayed.
  • In the corner area Ar2, an image captured by the camera unit of the distributor terminal 2 is displayed.
  • the video displayed in the corner area Ar2 may include an avatar AT, which will be described later.
  • the communication unit 8 receives IMU data from the HM device 3 and supplies it to the physical information acquisition unit 6. Further, the communication unit 8 supplies the video data received from the viewer terminal 4 to the output unit 7 and transmits the sound data received from the viewer terminal 4 to the HM device 3. Furthermore, the communication unit 8 transmits the audio data and video data supplied from the input unit 5 and the physical information supplied from the physical information acquisition unit 6 to the viewer terminal 4.
  • the IMU 9 of the HM device 3 acquires IMU data (acceleration data, angular velocity data, etc.) used to obtain physical information about the distributor by being equipped with an acceleration sensor, an angular velocity sensor, etc., and outputs it to the communication unit 11.
  • The output unit 10 is configured with a speaker, and performs audio output by reproducing the sound data received from the distributor terminal 2.
  • the communication unit 11 transmits IMU data supplied from the IMU 9 to the distributor terminal 2 and receives sound data from the distributor terminal 2.
  • the viewer terminal 4 includes an input section 12, an avatar generation section 13, a three-dimensional space generation section 14, a display video generation section 15, an output section 16, and a communication section 17.
  • the input unit 12 includes a microphone, a camera, various operators, and the like.
  • the input section 12 outputs audio data and video data to the communication section 17.
  • the avatar generation unit 13 generates an avatar AT for the distributor based on the physical information received from the distributor terminal 2.
  • the avatar AT is, for example, a three-dimensional object.
  • the avatar AT generated by the avatar generation unit 13 may be based on a photographed image of the broadcaster, or may be based on a specific character selected by the broadcaster or the viewer. .
  • At least a part of the avatar AT may be transparent or semi-transparent, or the avatar AT may be a primitive shape that is just recognizable as a human figure.
  • the three-dimensional object as the avatar AT generated by the avatar generation unit 13 is supplied to the three-dimensional space generation unit 14 as an avatar model.
  • the avatar generation unit 13 reflects physical information about the distributor on the avatar AT.
  • physical information such as the distributor's face direction, line of sight direction, joint shape, etc. is reflected in the posture of the avatar AT.
  • the distributor's gestures, walking or stopping state, walking speed, etc. as physical information are reflected as the movement of the avatar AT. Note that the orientation of the broadcaster's body is taken into consideration during placement.
  • the avatar generation unit 13 may provide the three-dimensional space generation unit 14 with information such as the posture and orientation that the avatar AT should take as additional information of the avatar model.
  • the avatar generation unit 13 may add the distributor's movements as an animation to the avatar AT.
  • the animation given to the avatar AT regarding the distributor may be determined based on the distributor's moving speed or movement mode (such as walking or cycling).
  • For example, if the distributor is walking, a walking animation may be added, and if the distributor is running, a running animation may be added. Further, if it is estimated that the distributor is moving on a bicycle, the avatar AT may be provided with a cycling animation.
  • the avatar generation unit 13 may add an animation to make the avatar AT perform the gesture movement.
  • the avatar generation unit 13 performs processing to reflect physical information on the avatar AT multiple times during communication between the distributor terminal 2 and the viewer terminal 4. For example, physical information may be reflected on the avatar AT periodically, such as once every second, or physical information may be reflected on the avatar AT when a change occurs in the broadcaster's posture, direction, movement, etc. It's okay.
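  • As one possible way to realize the animation selection described above, the following sketch maps an estimated movement mode and speed to an animation clip name; the clip names and speed thresholds are illustrative assumptions rather than values from the publication.

```python
def select_avatar_animation(movement_mode: str, speed_mps: float) -> str:
    """Illustrative mapping from the distributor's estimated movement to an
    avatar animation clip; clip names and thresholds are assumptions."""
    if movement_mode == "cycling":
        return "anim_cycling"
    if speed_mps > 2.5:        # roughly faster than a brisk walk
        return "anim_running"
    if speed_mps > 0.2:
        return "anim_walking"
    return "anim_idle"


# Example: a distributor estimated to be walking at 1.2 m/s.
print(select_avatar_animation("walking", 1.2))  # -> anim_walking
```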
  • The three-dimensional space generation unit 14 performs image recognition (three-dimensional estimation processing) on the video (photographed image) received from the distributor terminal 2 to identify image areas such as the floor (ground) and walls, and generates a three-dimensional space.
  • For example, a three-dimensional object is generated by performing image recognition on various subjects shown in the video, and by pasting an image cut out from the video as a texture on each surface of each three-dimensional object, a three-dimensional space based on the captured image can be generated.
  • a three-dimensional space may be realized by pasting a photographed image on the inner surface of a sphere.
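  • As a minimal sketch of the sphere-based fallback mentioned above (not of the image-recognition pipeline itself), the following function maps a viewing direction to a pixel of a captured frame that is assumed to be an equirectangular image pasted on the inner surface of a sphere; the projection and axis convention are assumptions.

```python
import numpy as np

def sphere_texture_lookup(direction, frame_width, frame_height):
    """Sketch: map a 3D viewing direction (y-up) to pixel coordinates of a
    captured frame pasted on the inner surface of a sphere, assuming an
    equirectangular projection. Illustrative fallback, not the patented method."""
    x, y, z = np.asarray(direction, dtype=float) / np.linalg.norm(direction)
    lon = np.arctan2(x, z)                      # -pi..pi around the vertical axis
    lat = np.arcsin(np.clip(y, -1.0, 1.0))      # -pi/2..pi/2
    u = (lon / (2.0 * np.pi) + 0.5) * (frame_width - 1)
    v = (0.5 - lat / np.pi) * (frame_height - 1)
    return int(round(u)), int(round(v))
```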
  • the three-dimensional space generation unit 14 arranges the viewing position and the avatar AT in the generated three-dimensional space.
  • the viewing position is the position of the viewer's viewpoint in the generated three-dimensional space, and can be regarded as the position of the virtual camera.
  • the position of the camera that captured the video used to generate the three-dimensional space and the position of the virtual camera set after generating the three-dimensional space may be the same or different positions.
  • the three-dimensional space generation unit 14 specifies a position suitable for placing the avatar AT as a "placeable position" based on the result of the three-dimensional estimation process in the recognized three-dimensional space.
  • the three-dimensional space generation unit 14 specifies a position where spatial consistency can be ensured even if the avatar AT is placed as a position where the avatar AT can be placed. In other words, if the ground facing vertically upward is not detected, it is determined that the possible placement position cannot be specified.
  • the placeable position may be a placeable area where the avatar AT can be placed.
  • The three-dimensional space generation unit 14 first specifies a position where the avatar AT can be placed within the angle of view when the virtual camera is set at the viewing position. That is, the three-dimensional space generation unit 14 determines whether or not there is a position where the avatar AT can be placed in the video presented to the viewer.
  • the three-dimensional space generation unit 14 places the avatar AT generated by the avatar generation unit 13 at a position where it can be placed.
  • the captured image with the avatar AT placed is displayed on a display unit such as a monitor as the output unit 16 of the viewer terminal 4.
  • FIG. 4 shows an example of an image displayed on the output unit 16 of the viewer terminal 4.
  • an avatar AT as a three-dimensional object with an image of the distributor pasted as a texture is placed at the left end of the road.
  • the roadway area is not a position where it can be placed. That is, only the area of the sidewalk may be specified as the position where it can be placed.
  • If the detected vertically upward facing surface is the top surface of an object such as a suitcase, it may be determined that the position is not a placeable position. In other words, when the ground is detected, a region where no object exists on the ground is identified as a placeable position.
  • positions where other passersby or animals are present are not positions that can be placed.
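  • The placement logic above could be sketched as follows: a candidate point counts as placeable only if it lies on an upward-facing ground region, falls inside the virtual camera's angle of view, and is not too close to detected objects or passersby. The `camera.contains` interface and the clearance radius are assumptions introduced for illustration.

```python
import numpy as np

def find_placeable_position(ground_points, occupied_points, camera, min_clearance=0.6):
    """Sketch of identifying a placeable position for the avatar AT.

    ground_points: iterable of 3D points on detected upward-facing ground.
    occupied_points: (M, 3) array of points occupied by objects or passersby.
    camera: hypothetical virtual-camera object with a contains(point) method.
    """
    occupied = np.asarray(occupied_points, dtype=float).reshape(-1, 3)
    for p in ground_points:
        p = np.asarray(p, dtype=float)
        if not camera.contains(p):                 # outside the angle of view
            continue
        if occupied.size and np.min(np.linalg.norm(occupied - p, axis=1)) < min_clearance:
            continue                               # too close to a suitcase, passerby, etc.
        return p                                   # first spatially consistent spot
    return None                                    # no placeable position in view
```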
  • the three-dimensional space generation unit 14 determines not to arrange the avatar AT within the field of view of the virtual camera when a position that can be placed within the field of view of the virtual camera cannot be specified.
  • In this case, two approaches are conceivable. One is to arrange the avatar AT outside the field of view of the virtual camera. Thereby, the viewer who views the video based on the angle of view of the virtual camera does not have to view the avatar AT in an unnatural state.
  • The other is to display the avatar AT at a specific position on the display section serving as the output section 16 of the viewer terminal 4. At this time, in order to make it clear that the avatar AT is not placed in the three-dimensional space, the avatar AT may be displayed after securing an area above the display area where, for example, the sky is likely to be displayed.
  • Note that the three-dimensional space generation unit 14 may decide not to arrange the avatar AT within the viewing angle of the virtual camera based on conditions other than those described above. For example, if it can be inferred that the viewer wants to see the scenery, or if the viewer performs an operation to hide the avatar AT, it may be decided not to place the avatar AT within the viewing angle of the virtual camera.
  • the three-dimensional space generation unit 14 may immediately place the avatar AT outside the angle of view, or may add animation to the avatar AT so that the avatar AT moves outside the angle of view.
  • In this way, the avatar AT naturally moves out of the field of view. By adding such an animation, a high sense of realism can be provided to the viewer.
  • FIG. 5 is an example in which a main area Ar3 where the video shot by the distributor terminal 2 is displayed and a sub area Ar4 where only the avatar AT is displayed are displayed on the output unit 16.
  • the sub-area Ar4 is an area surrounded by a rectangular frame.
  • the three-dimensional space generation unit 14 arranges the avatar AT at a position where the avatar AT can be placed so that the direction of the distributor's body at the shooting location matches the direction of the body of the avatar AT in the three-dimensional space.
  • The avatar AT generated by the avatar generation unit 13 is generated so that the direction of its face, the direction of its line of sight, and so on match those of the distributor. Therefore, when the three-dimensional space generation unit 14 arranges the avatar AT so that the body direction of the avatar AT matches the body direction of the distributor in real space, the directions of the avatar's face and line of sight also match those of the distributor in real space. Therefore, the posture and various orientations of the distributor in real space can be reproduced in the three-dimensional space without any discomfort.
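  • A minimal sketch of this orientation matching, assuming the distributor's body yaw (rotation about the vertical axis) has been estimated from the physical information and the avatar model exposes a hypothetical `set_pose` interface:

```python
import numpy as np

def place_avatar_with_orientation(avatar, placeable_position, distributor_yaw):
    """Sketch: place the avatar AT at a placeable position and align its body
    orientation (forward vector) with the distributor's real-space yaw [rad]."""
    forward = np.array([np.sin(distributor_yaw), 0.0, np.cos(distributor_yaw)])
    avatar.set_pose(position=np.asarray(placeable_position, dtype=float),
                    forward=forward)
```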
  • the three-dimensional space generation unit 14 outputs a model of a three-dimensional space in which the avatar AT is arranged and the viewing position is set (hereinafter referred to as a “3D (Dimension) space model” as appropriate) to the display video generation unit 15.
  • the display video generation unit 15 performs rendering processing to generate a viewing video from a virtual camera set at a viewing position, using a 3D space model regarding a three-dimensional space in which the avatar AT is placed. In other words, the display video generation unit 15 obtains a two-dimensional image by capturing a three-dimensional space using a virtual camera. Thereby, the display image generation section 15 outputs a two-dimensional rendered image to the output section 16.
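  • For reference, rendering from the viewing position can be pictured as placing a virtual camera and building its view matrix; a conventional look-at construction such as the one below (a generic graphics technique, not something specified in this publication) would typically precede rasterizing the 3D space model into the two-dimensional display video.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Sketch: view matrix of a virtual camera at `eye` (the viewing position)
    looking toward `target`, as used by a typical rasterizer."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    f = target - eye
    f /= np.linalg.norm(f)
    s = np.cross(f, up)
    s /= np.linalg.norm(s)
    u = np.cross(s, f)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view
```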
  • the output unit 16 outputs the sound and video received from the distributor terminal 2. Specifically, the output unit 16 outputs the sound data received from the distributor terminal 2 from a speaker or the like serving as the output unit 16. Further, the output unit 16 performs a process of displaying the rendered image supplied from the display image generation unit 15 on a display unit serving as the output unit 16.
  • The communication unit 17 receives sound data, video data, and physical information from the distributor terminal 2, supplies the sound data to the output unit 16, supplies the video data to the three-dimensional space generation unit 14, and supplies the physical information to the avatar generation unit 13. Furthermore, the communication unit 17 transmits the audio data and video data acquired by the viewer terminal 4 to the distributor terminal 2.
  • The CPU (Central Processing Unit) 71 of each computer device executes various processes according to a program stored in the ROM (Read Only Memory) 72 or in a nonvolatile memory section 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or a program loaded from the storage unit 79 into the RAM (Random Access Memory) 73.
  • the RAM 73 also appropriately stores data necessary for the CPU 71 to execute various processes.
  • the CPU 71, ROM 72, RAM 73, and nonvolatile memory section 74 are interconnected via a bus 83.
  • An input/output interface 75 is also connected to this bus 83.
  • the input/output interface 75 is connected to an input section 76 consisting of an operator or an operating device.
  • various operators and operating devices such as a keyboard, a mouse, a key, a dial, a touch panel, a touch pad, and a remote controller are assumed.
  • a user's operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
  • A display section 77 consisting of an LCD (Liquid Crystal Display) or an organic EL panel, and an audio output section 78 consisting of a speaker or the like are connected to the input/output interface 75 either integrally or separately.
  • the display unit 77 is a display unit that performs various displays, and is configured by, for example, a display device provided on the front of the distributor terminal 2 as a computer device, a separate display device connected to the housing, or the like.
  • The display unit 77 displays various images, moving images (videos), etc. on the display screen based on instructions from the CPU 71. Further, the display unit 77 displays various operation menus, icons, messages, etc., that is, performs display as a GUI (Graphical User Interface), based on instructions from the CPU 71.
  • the input/output interface 75 may be connected to a storage section 79 made up of a hard disk, solid-state memory, etc., and a communication section 80 made up of a modem or the like.
  • the communication unit 80 performs communication processing via a transmission path such as the Internet, and communicates with various devices by wired or wireless communication, bus communication, or the like.
  • a drive 81 is also connected to the input/output interface 75 as required, and a removable storage medium 82 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is appropriately installed.
  • the drive 81 can read data files such as image files and various computer programs from the removable storage medium 82 .
  • the read data file is stored in the storage section 79, and images and sounds included in the data file are outputted on the display section 77 and the audio output section 78. Further, computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as necessary.
  • software for the processing of this embodiment can be installed via network communication by the communication unit 80 or the removable storage medium 82.
  • the software may be stored in advance in the ROM 72, storage unit 79, or the like.
  • By the CPU 71 performing processing operations based on various programs, the necessary information processing and communication processing are executed in the communication section 8 of the distributor terminal 2, the communication section 11 of the HM device 3, and the communication section 17 of the viewer terminal 4.
  • The computer devices constituting the distributor terminal 2, the HM device 3, and the viewer terminal 4 are not limited to a single computer device as shown in FIG. 6, and may be configured as a plurality of computer devices.
  • the plurality of computer devices may be systemized using a LAN or the like, or may be located at remote locations via a VPN using the Internet or the like.
  • the plurality of computer devices may include computer devices as a server group (cloud) that can be used by a cloud computing service.
  • FIG. 7 shows an example of the flow of processing executed in each of the distributor terminal 2, the HM device 3, and the viewer terminal 4. Note that the execution order of the processes shown in FIG. 7 is merely an example; the order of some processes may be changed, and some processes may be executed in parallel.
  • the CPU 71 of the HM device 3 acquires IMU data from the IMU 9 in step S101, and transmits the IMU data to the distributor terminal 2 in step S102.
  • The CPU 71 of the distributor terminal 2 executes the process of acquiring the audio data and video data on the distributor side from the microphone and image sensor in step S201, and then executes the process of receiving the IMU data transmitted from the HM device 3 in step S202.
  • In step S203, the CPU 71 of the distributor terminal 2 acquires the distributor's physical information based on the received IMU data.
  • In step S204, the CPU 71 of the distributor terminal 2 transmits the audio data and video data of the distributor, and the physical information estimated about the distributor, to the viewer terminal 4.
  • The CPU 71 of the viewer terminal 4 executes the process of acquiring the audio data and video data on the viewer side from the microphone and camera in step S301, and then receives the audio data, video data, and physical information transmitted from the distributor terminal 2 in step S302.
  • In step S303, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the physical information.
  • the avatar AT generated at this time may be provided with an animation based on physical information as described above.
  • In step S304, the CPU 71 of the viewer terminal 4 generates a three-dimensional space based on the video data captured by the distributor. For example, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation by performing image recognition processing on the video, and generates a three-dimensional model for each subject.
  • In step S305, the CPU 71 of the viewer terminal 4 sets the viewing position in the generated three-dimensional space.
  • In step S306, the CPU 71 of the viewer terminal 4 identifies a placeable position where the avatar AT can be placed based on the result of the three-dimensional estimation, and in the subsequent step S307 places the avatar AT at the placeable position.
  • In step S308, the CPU 71 of the viewer terminal 4 performs rendering processing to generate a viewing video based on the 3D space model of the three-dimensional space in which the avatar AT is placed and the viewing position.
  • In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an audio signal from the sound data received from the distributor terminal 2.
  • In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering process in step S308.
  • In step S311, the CPU 71 of the viewer terminal 4 transmits the audio data and video data on the viewer side acquired in step S301 to the distributor terminal 2.
  • the CPU 71 of the distributor terminal 2 receives the audio data and video data in step S205, and transmits only the audio data to the HM device 3 in step S206.
  • the CPU 71 of the HM device 3 receives the audio data on the viewer side in step S103, and generates and reproduces an audio signal from the audio data in step S104.
  • the CPU 71 of the distributor terminal 2 outputs the remaining video data in step S207.
  • In this way, playback processing based on the sound data and video data acquired on the distributor side is performed in the viewer's environment, and playback processing based on the sound data and video data acquired on the viewer side is performed in the distributor's environment.
  • FIG. 8 is an example of a process executed by the CPU 71 of the HM device 3. Note that processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S111, the CPU 71 of the HM device 3 determines whether it is time to acquire IMU data. If it is determined that it is time to acquire the IMU data, the CPU 71 of the HM device 3 acquires the IMU data in step S101.
  • The IMU 9 of the HM device 3 outputs sensing data, for example, every few milliseconds, and the CPU 71 may acquire these sensing data all at once every several hundred milliseconds or every few seconds, or may acquire the sensing data every time it is output.
  • In step S102, the CPU 71 of the HM device 3 causes the communication unit 11 to execute a process of transmitting the acquired IMU data to the distributor terminal 2.
  • If it is determined in step S111 that it is not the time to acquire IMU data, the CPU 71 of the HM device 3 proceeds to the process in step S112 without executing the processes in step S101 and step S102.
  • In step S112, the CPU 71 of the HM device 3 determines whether or not the sound data acquired on the viewer side has been received. If it is determined that no sound data has been received, the CPU 71 of the HM device 3 returns to the process of step S111.
  • If it is determined that the sound data has been received, the CPU 71 of the HM device 3 reproduces the sound by generating an audio signal from the received sound data and supplying it to the speaker in step S104.
  • After the process in step S104, the CPU 71 of the HM device 3 returns to the process in step S111. That is, the CPU 71 of the HM device 3 repeatedly executes the determination process in step S111 and the determination process in step S112, and executes the corresponding process when a condition is satisfied.
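  • The HM device loop of FIG. 8 can be pictured with the following sketch, in which `imu`, `comm`, and `audio_out` are hypothetical stand-ins for the IMU 9, the communication unit 11, and the speaker of the output unit 10; the batching interval and sleep value are assumptions.

```python
import time

def hm_device_loop(imu, comm, audio_out, batch_interval_s=0.5):
    """Sketch of the repeated determinations in steps S111/S112 of FIG. 8."""
    pending, last_sent = [], time.monotonic()
    while True:
        pending.append(imu.read())                    # gather a sensing sample
        now = time.monotonic()
        if now - last_sent >= batch_interval_s:       # S111: time to acquire/transmit?
            comm.send_imu_data(pending)               # S101/S102: batch transmission
            pending, last_sent = [], now
        sound = comm.poll_viewer_sound()              # S112: viewer-side sound received?
        if sound is not None:
            audio_out.play(sound)                     # S104: reproduce the audio
        time.sleep(0.005)                             # pacing; value is illustrative
```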
  • FIG. 9 is an example of a process executed by the CPU 71 of the distributor terminal 2. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S211, the CPU 71 of the distributor terminal 2 determines whether or not IMU data has been received from the HM device 3.
  • If it is determined that the IMU data has been received, the CPU 71 of the distributor terminal 2 acquires physical information about the distributor based on the received IMU data in step S203, acquires the audio data and video data on the distributor side from the microphone, camera unit, etc. in step S201, and transmits the audio data, video data, and physical information on the distributor side to the viewer terminal 4 in step S204.
  • If it is determined that the IMU data has not been received, the CPU 71 of the distributor terminal 2 proceeds to step S212 without executing the processes of steps S203, S201, and S204.
  • In this example, when IMU data is acquired, the IMU data is transmitted to the viewer terminal 4 together with sound data and video data, but the acquisition and transmission of IMU data, sound data, and video data may be performed independently. That is, in a certain transmission process only IMU data may be transmitted, and in another transmission process only video data may be transmitted.
  • In step S212, the CPU 71 of the distributor terminal 2 determines whether or not the audio data and video data from the viewer side have been received. If it is determined that the audio data and video data have not been received, the CPU 71 of the distributor terminal 2 returns to the process of step S211.
  • If it is determined that they have been received, the CPU 71 of the distributor terminal 2 performs a process of transmitting the sound data to the HM device 3 in step S206.
  • the process of step S104 shown in FIG. 8 is executed in the HM device 3, so that the distributor can listen to the sound acquired on the viewer side.
  • In step S207, the CPU 71 of the distributor terminal 2 outputs the video data acquired on the viewer side. This allows the distributor to view the video shot by the viewer.
  • After the process in step S207, the CPU 71 of the distributor terminal 2 returns to the process in step S211. That is, the CPU 71 of the distributor terminal 2 repeatedly executes the determination process in step S211 and the determination process in step S212, and executes the corresponding process when a condition is satisfied.
  • Note that, in step S212, it may be determined whether at least one of the audio data and the video data has been received. In this case, if it is determined that the audio data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S206, and if it is determined that the video data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S207.
  • FIG. 10 is an example of processing executed by the CPU 71 of the viewer terminal 4. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S321, the CPU 71 of the viewer terminal 4 determines whether or not the audio data, video data, and physical information from the distributor have been received.
  • If it is determined that they have not been received, the CPU 71 of the viewer terminal 4 proceeds to the process of step S301.
  • If it is determined that they have been received, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the received physical information about the distributor in step S303.
  • In step S304, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation on the received video data from the distributor and generates a three-dimensional space.
  • In step S305, the CPU 71 of the viewer terminal 4 sets the viewing position at a predetermined position in the generated three-dimensional space.
  • In step S306, the CPU 71 of the viewer terminal 4 identifies placeable positions in the generated three-dimensional space.
  • In step S307, the CPU 71 of the viewer terminal 4 places the avatar AT at a placeable position.
  • In step S308, the CPU 71 of the viewer terminal 4 generates a rendered video based on the view from the viewing position by performing rendering processing.
  • In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an audio signal from the sound data received from the distributor terminal 2.
  • In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering process in step S308.
  • The processes from step S303 to step S310 are a series of processes performed in response to receiving information from the distributor terminal 2.
  • The CPU 71 of the viewer terminal 4 acquires the audio data and video data on the viewer side from the microphone, camera, etc. in step S301, and transmits the audio data and video data on the viewer side to the distributor terminal 2 in step S311.
  • In this way, the CPU 71 of the viewer terminal 4 transmits the audio data and video data on the viewer side to the distributor terminal 2 by executing the processes of steps S301 and S311, and executes each process from step S303 to step S310 as appropriate depending on the result of the determination in step S321.
  • The information processing system 1A according to the second embodiment differs from the information processing system 1 according to the first embodiment in that the position of the avatar AT can be changed and sound output is performed according to the position of the avatar AT.
  • FIG. 11 shows an example of the configuration of the information processing system 1A. Note that the same components as the information processing system 1 shown in FIG. 1 are designated by the same reference numerals, and description thereof will be omitted as appropriate.
  • the information processing system 1A includes a distributor terminal 2A and an HM device 3 as devices on the distributor side, and a viewer terminal 4A as a device on the viewer side.
  • the configuration of the HM device 3 is the same as that of the first embodiment, so a description thereof will be omitted.
  • the distributor terminal 2A includes an input section 5, a physical information acquisition section 6, an output section 7, and a communication section 8, as in the first embodiment.
  • the input unit 5 outputs first change information to the communication unit 8 in response to the distributor's operation to change the position of the avatar AT. That is, the first change information is change information regarding the position of the avatar AT.
  • the first change information is transmitted to the viewer terminal 4A via the communication unit 8.
  • The viewer terminal 4A includes an input section 12, an avatar generation section 13, a three-dimensional space generation section 14, a display video generation section 15, an output section 16, a communication section 17, an avatar position control section 18, and an acoustic signal generation section 19.
  • the first change information received from the distributor terminal 2A is provided to the avatar position control unit 18 via the communication unit 17.
  • the instruction to change the position of the avatar AT can also be made by the viewer's operation.
  • the viewer's operation for changing the position of the avatar AT is also supplied to the avatar position control unit 18 via the input unit 12 as first change information.
  • the avatar position control unit 18 changes the position of the avatar AT based on the first change information supplied via the communication unit 17 or via the input unit 12.
  • the changed position of avatar AT is supplied to the three-dimensional space generation unit 14 as avatar AT placement information.
  • the three-dimensional space generation unit 14 performs three-dimensional estimation and generates a three-dimensional space by performing image recognition using the video data supplied from the distributor terminal 2A via the communication unit 17.
  • the three-dimensional space generation unit 14 specifies a position where the avatar model supplied from the avatar generation unit 13 can be placed, and places the avatar AT.
  • the three-dimensional space generation unit 14 then adjusts the position of the placed avatar AT based on the placement information supplied from the avatar position control unit 18.
  • the three-dimensional space generation unit 14 may adjust the position of the changed avatar AT so that spatial consistency is ensured. For example, if the new position of the avatar AT based on the placement information is not a positionable position, the position closest to the new position may be determined as the adjusted position.
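  • A minimal sketch of that adjustment, assuming the placeable positions are available as 3D points: if the requested position is not placeable, snap to the nearest placeable position.

```python
import numpy as np

def adjust_avatar_position(requested, placeable_positions):
    """Sketch: return the requested position if it is already a placeable
    position, otherwise the placeable position closest to it."""
    requested = np.asarray(requested, dtype=float)
    placeable = np.asarray(placeable_positions, dtype=float).reshape(-1, 3)
    for p in placeable:
        if np.allclose(requested, p):
            return requested                     # already spatially consistent
    distances = np.linalg.norm(placeable - requested, axis=1)
    return placeable[int(np.argmin(distances))]  # nearest placeable position
```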
  • At this time, the avatar generation unit 13 may add a walking animation, a running animation, or the like to the avatar AT.
  • the 3D space model in which each part is arranged by the 3D space generation unit 14 is supplied not only to the display image generation unit 15 but also to the audio signal generation unit 19.
  • the acoustic signal generation unit 19 performs processing to generate an acoustic signal in which a sound image of the sound data acquired at the distributor terminal 2A is localized to each object arranged in a three-dimensional space.
  • the generated acoustic signal is input to the output unit 16 as rendered sound.
  • the sound data related to the voice uttered is sent from the distributor terminal 2A to the viewer terminal 4A.
  • The acoustic signal generation unit 19 receives the sound data related to the uttered voice via the communication unit 17, identifies the position of the avatar AT in the 3D space model generated by the three-dimensional space generation unit 14, and localizes the sound image of the uttered voice at the placement position of the avatar AT according to the positional relationship between that position and the viewing position.
  • For example, when the avatar AT is located to the left of the viewing position, the sound output from the left speaker 16L serving as the output unit 16 is made louder than that from the right speaker 16R, so that the sound image of the distributor's uttered voice is localized on the left side.
  • By playing back the acoustic signal obtained in this way in stereo from the speakers serving as the output unit 16, the viewer can experience an environment in which the voice uttered by the distributor is naturally heard from the position where the avatar AT is placed.
  • The system is not limited to stereo playback, and may be configured to perform multi-channel playback such as 5.1ch so that the voice uttered by the distributor can be heard three-dimensionally.
  • the audio signal generation unit 19 can make the viewer perceive a sense of depth by generating an audio signal with delay, reverberation, etc. added.
  • the audio signal generation unit 19 may adjust the volume of the audio signal depending on the distance between the avatar AT and the viewing position. That is, the sound signal may be generated such that the closer the viewing position is to the avatar AT, the higher the volume.
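  • Putting the two ideas together (localization toward the avatar and distance-dependent volume), a simple stereo sketch might look like the following; the panning law, gain curve, and coordinate convention are illustrative assumptions rather than the acoustic processing defined in this publication.

```python
import numpy as np

def localize_distributor_voice(mono, avatar_pos, viewing_pos, viewing_forward):
    """Sketch: attenuate the distributor's voice with distance and pan it toward
    the avatar so the sound image is perceived at the avatar's position."""
    mono = np.asarray(mono, dtype=float)
    offset = np.asarray(avatar_pos, dtype=float) - np.asarray(viewing_pos, dtype=float)
    distance = max(float(np.linalg.norm(offset)), 1e-6)
    gain = 1.0 / max(distance, 1.0)              # louder when the avatar is closer

    forward = np.asarray(viewing_forward, dtype=float)
    right = np.array([forward[2], 0.0, -forward[0]])       # horizontal right vector (y-up)
    right = right / (np.linalg.norm(right) + 1e-9)
    pan = float(np.clip(np.dot(offset, right) / distance, -1.0, 1.0))  # -1 left .. +1 right

    left_ch = mono * gain * (1.0 - pan) * 0.5
    right_ch = mono * gain * (1.0 + pan) * 0.5
    return np.stack([left_ch, right_ch], axis=0)  # simple 2-channel (stereo) signal
```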
  • If the avatar AT is located outside the field of view from the viewing position, the sound image of the uttered voice is localized outside the field of view, thereby allowing the viewer to perceive that the distributor is located outside the field of view.
  • The sound to be localized at the placement position of the avatar AT includes not only the voice uttered by the distributor but also any sound generated by the distributor, such as the sound of the distributor clapping his or her hands.
  • FIG. 14 shows an example of the process executed by the distributor terminal 2A in this embodiment.
  • FIG. 15 shows an example of the process executed by the viewer terminal 4A. Note that the processing executed by the HM device 3 is the same as that shown in FIG. 8 described in the first embodiment, so the description thereof will be omitted.
  • In step S211 of FIG. 14, the CPU 71 of the distributor terminal 2A determines whether or not IMU data has been received, and when it is determined that IMU data has been received, executes the processes of steps S203, S201, and S204 so that the physical information based on the IMU data, the sound data, and the video data are transmitted to the viewer terminal 4A.
  • the CPU 71 of the distributor terminal 2A proceeds to step S221, and determines whether or not the first change information has been input.
  • the first change information is operation information for changing the position of the avatar AT.
  • If it is determined that the first change information has been input, the CPU 71 of the distributor terminal 2A transmits the first change information to the viewer terminal 4A in step S222. As a result, the first change information is transmitted to the viewer terminal 4A via the communication unit 8.
  • In step S212, the CPU 71 of the distributor terminal 2A determines whether or not audio data and video data have been received from the viewer terminal 4A, and if so, executes the corresponding processing in steps S206 and S207.
  • In step S321, the CPU 71 of the viewer terminal 4A determines whether at least part of the audio data, video data, and physical information has been received from the distributor terminal 2A.
  • If it is determined that it has been received, the CPU 71 of the viewer terminal 4A generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
  • the CPU 71 of the viewer terminal 4A specifies a placement possible position based on the three-dimensional estimation result in step S306, and places the avatar AT at the placement possible position in step S307.
  • In step S331, the CPU 71 of the viewer terminal 4A determines whether or not the first change information has been received. In this determination process, it is determined that the first change information has been received not only when the first change information is received from the distributor terminal 2A but also when the first change information is received from the input unit 12.
  • If it is determined that the first change information has been received, the CPU 71 of the viewer terminal 4A performs a process of changing the placement position of the avatar AT in step S332, and proceeds to the process of step S308. At this time, it may be determined whether or not the changed placement position is a placeable position, and if it is not a placeable position, a process may be performed to adjust the new placement position of the avatar AT.
  • On the other hand, if it is determined in step S331 that the first change information has not been received, the CPU 71 of the viewer terminal 4A proceeds to the process in step S308 without executing the process in step S332. Description of each process after step S308 will be omitted.
  • Note that the CPU 71 of the viewer terminal 4A may perform the determination process in step S331 without arranging the avatar AT in step S307. Specifically, if it is determined in step S331 that the first change information has not been received, processing is performed in step S332 to place the avatar AT at a placeable position, and if it is determined that the first change information has been received, the avatar AT may be placed in step S332 after taking the received first change information into account.
  • the information processing system 1B in the third embodiment differs from the previous examples in that the viewing position can be changed by the viewer.
  • An example of the configuration of the information processing system 1B will be described with reference to FIG. 16. Note that the same components as the information processing system 1 shown in FIG. 1 are designated by the same reference numerals, and description thereof will be omitted as appropriate.
  • the information processing system 1B includes a distributor terminal 2, an HM device 3, and a viewer terminal 4B.
  • the configurations of the distributor terminal 2 and the HM device 3 are the same as those in the first embodiment, so a description thereof will be omitted.
  • the viewer terminal 4B includes an input section 12, an avatar generation section 13, a three-dimensional space generation section 14, a display video generation section 15, an output section 16, and a communication section 17.
  • the input unit 12 receives an operation to change the viewing position by the viewer, and supplies information about the change operation to the three-dimensional space generation unit 14 as second change information.
  • the three-dimensional space generation unit 14 sets the viewing position, that is, the position of the virtual camera, in the three-dimensional space generated by three-dimensional estimation, taking into account the second change information.
  • the viewer can move the position of the virtual camera by his or her own operations.
  • the viewer can feel as if they are moving freely in the three-dimensional space to some extent, and can have an experience as if they were moving around the filming location with the broadcaster.
  • the position of the virtual camera may be set within a predetermined range centered on the position of the camera unit at the time of shooting. Therefore, it is possible to provide the viewer with a video in which spatial consistency is ensured to some extent.
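  • A sketch of that constraint, assuming the allowed range is a sphere of radius `max_offset_m` around the camera position at the time of shooting (the radius value is an assumption for illustration):

```python
import numpy as np

def constrain_viewing_position(requested, capture_position, max_offset_m=1.5):
    """Sketch: clamp the viewer-requested viewing position into a predetermined
    range centered on the shooting camera position, to keep rough spatial
    consistency of the generated three-dimensional space."""
    requested = np.asarray(requested, dtype=float)
    capture_position = np.asarray(capture_position, dtype=float)
    offset = requested - capture_position
    distance = float(np.linalg.norm(offset))
    if distance <= max_offset_m:
        return requested
    return capture_position + offset * (max_offset_m / distance)
```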
  • FIG. 17 shows an example of processing executed by the viewer terminal 4B in this embodiment. Note that the same steps as those in the first embodiment are given the same step numbers, and description thereof will be omitted as appropriate.
  • In step S321, the CPU 71 of the viewer terminal 4B determines whether or not the audio data, video data, and physical information from the distributor have been received.
  • If it is determined that they have been received, the CPU 71 of the viewer terminal 4B generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
  • the CPU 71 of the viewer terminal 4B determines whether or not the second change information has been received in step S341. If it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B performs a process of changing the viewing position in step S342.
  • the viewing position can be changed according to the viewer's operations, allowing the viewer to experience the feeling of moving around freely in a three-dimensional space.
  • Note that the CPU 71 of the viewer terminal 4B may execute the process of step S341 without executing the process of step S305. In this case, when it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B sets the viewing position in step S342 in consideration of the second change information, and when it is determined that the second change information has not been received, the viewing position may be set in step S342 in the same manner as in the previous embodiments.
  • In this case, the orientation of the avatar AT in the three-dimensional space does not strictly match the orientation of the distributor in real space, but the two orientations can be said to be in agreement in that both the body and the face are directed toward the object X that is being gazed at.
  • In other words, rather than exactly matching the orientation of the distributor, the orientation of the avatar AT may be adjusted in order to reproduce the situation where the avatar AT is directly facing the object X.
  • the viewer terminal 4 (4A) identifies the object X that the distributor is gazing at based on the distributor's physical information. Thereby, the avatar AT can be arranged so that its body or face faces the direction in which the object X exists.
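  • A minimal sketch of that arrangement: compute the yaw that turns the avatar toward the identified object X and reuse it as the avatar's forward direction (same axis convention as the earlier placement sketch; the helper name is hypothetical).

```python
import numpy as np

def yaw_toward_object(avatar_position, object_position):
    """Sketch: yaw angle (about the vertical y-axis) that makes the avatar's
    body or face point toward the gazed object X."""
    d = np.asarray(object_position, dtype=float) - np.asarray(avatar_position, dtype=float)
    return float(np.arctan2(d[0], d[2]))
```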
  • The avatar AT may take any form. For example, as explained in each of the above examples, it may be a three-dimensional object with an image of the distributor pasted as a texture, or, as shown in FIG. 19, it may be a three-dimensional object with a texture of some kind of character (a giant panda in FIG. 19).
  • the avatar AT may be a three-dimensional object having only an outline, as shown in FIG. 20, so that the scenery is not hidden by the avatar AT as much as possible. This allows the viewer to enjoy the photographed scenery even more.
  • In each of the above examples, the three-dimensional space generation unit 14 generates the three-dimensional space by performing image recognition processing on the photographed images, but the three-dimensional space may be generated using other methods. For example, information on 3D objects at the shooting location may be acquired from an external server device that provides a map service, the captured image and the 3D objects may be aligned, and then partial images cut out from the captured image may be pasted onto the 3D objects as textures to generate the three-dimensional space.
  • As described above, the viewer terminals 4 (4A, 4B) as information processing devices include an avatar generation unit 13 that generates an avatar AT of the photographer by reflecting the physical information of the photographer (the above-mentioned distributor), a three-dimensional space generation unit 14 that generates information on a three-dimensional space (for example, a 3D space model) from a photographed image and arranges the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a display video generation unit 15 that generates, as a display video, a video from a viewing position set in the three-dimensional space.
  • For example, when the photographer distributes video while photographing himself or herself, the viewer viewing the video sees a video that appears to face the photographer and move backwards in the photographer's direction of movement. This makes it difficult to feel as if one is walking around the shooting location with the photographer. Furthermore, when the photographer performs distribution while photographing the moving direction, the photographer is not reflected in the angle of view, and again the viewer does not feel as if he or she is walking around the shooting location together. Therefore, the viewer terminals 4 (4A, 4B) as information processing devices in the present technology generate an avatar AT that reflects the physical information of the photographer, place it in the three-dimensional space generated from the photographed image, and present the resulting video to the viewer.
  • the orientation of the placed avatar AT corresponds to the orientation of the photographer.
  • the avatar AT is arranged so that the orientation of the photographer matches the orientation of the avatar AT. This allows the viewer to feel as if he or she were walking around the shooting location together with the photographer. Therefore, the viewer can, for example, obtain a sense of unity as if traveling together while staying at home, and feelings of alienation can be reduced.
  • since the photographer's physical information, for example the photographer's posture and gestures, is reflected in the avatar AT, the viewer can naturally understand when to call out to the photographer, which contributes to smooth communication.
  • the viewer terminal 4 (4A, 4B) has been described as including the avatar generation unit 13, the three-dimensional space generation unit 14, and the display video generation unit 15, but other configurations are possible.
  • the distributor terminal 2 (2A) of the information processing system 1 (1A, 1B) may include each of these units, or an information processing device serving as a server device included in the information processing system 1 may include them. That is, the above-described configuration may be realized as a form of cloud computing. Note that the orientation of the avatar AT and the orientation of the photographer do not need to match completely.
  • the orientation of the photographer and the orientation of the avatar AT may be made to match only approximately. Even in this mode, the viewer can feel as if he or she were traveling together with the photographer.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the arrangement so that the body orientation of the photographer (distributor) and the body orientation of the avatar AT match. Thereby, the moving direction of the photographer can be made to match the moving direction of the avatar AT, and the avatar AT can be placed in the photographed video without causing any discomfort.
  • matching of orientations may also mean facing in the direction of a specific object.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the arrangement so that the orientation of the photographer's face and the orientation of the avatar AT's face match (see FIG. 18). Thereby, the object that the photographer is looking at can be matched with the object lying ahead of the direction of the avatar AT's face. Therefore, the viewer can appropriately understand the object in which the photographer has shown interest, and smooth communication can be achieved.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may also perform the arrangement so that the direction of the photographer's line of sight and the direction of the avatar AT's line of sight match. This allows the viewer to grasp more accurately the object that the photographer is looking at, and facilitates smooth communication. That is, if the viewer knows the object that the photographer is gazing at, the viewer can start a conversation about that object, and since the photographer is gazing at it, the conversation can be expected to develop.
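As one possible reading of "arranging the avatar so that it faces the gazed object X", the short sketch below computes a yaw angle for the avatar from its placement position and the position of the identified object X; the coordinate convention (Y up, yaw measured from +Z) is an assumption made for illustration only.

```python
import math

def yaw_towards(avatar_pos, target_pos):
    """Yaw angle (radians) that makes an avatar at avatar_pos face target_pos on the ground plane."""
    dx = target_pos[0] - avatar_pos[0]
    dz = target_pos[2] - avatar_pos[2]
    return math.atan2(dx, dz)   # 0 rad = facing +Z in this convention

# Example: the avatar stands at the origin and the gazed object X is to its front-right.
avatar_position = (0.0, 0.0, 0.0)
object_x_position = (3.0, 0.0, 3.0)
print(math.degrees(yaw_towards(avatar_position, object_x_position)))  # 45.0
```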
  • the viewing position may be set at a position different from the position of the camera unit of the distributor terminal 2 (2A). That is, the viewing position can be set at an arbitrary position in the generated three-dimensional space that differs from the camera position at the time of photographing. As a result, the viewer can freely move the viewing position according to his or her own will, which gives the viewer the feeling of freely moving around the photographer (distributor). Therefore, a stronger sense of being together with the photographer can be obtained.
  • the viewing position set by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the displayed video. By freely moving the viewing position according to their own will, viewers can feel that they are freely moving around the photographer (distributor).
  • the physical information received by the viewer terminals 4 (4A, 4B) may include the moving speed of the photographer (distributor), and the avatar generation unit 13 of the viewer terminals 4 (4A, 4B) may generate the avatar AT by reflecting that moving speed. Thereby, an avatar AT appropriate to the movement in the video can be displayed. Specifically, when the background in the video is moving quickly, a running avatar AT is displayed, and when the background movement has stopped, a standing avatar AT is displayed.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may specify, based on the result of image recognition for the captured image, a position in the three-dimensional space where the avatar AT can be placed as a placeable position, and may place the avatar AT at that placeable position. This allows the avatar AT to be placed in a natural position. That is, it is possible to provide the viewer with a video in which spatial consistency is ensured.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) need not place the avatar AT within the viewing angle of the viewing position when a placeable position cannot be specified. For example, if there is no space in the three-dimensional space suitable for arranging the avatar AT within the viewing angle from the viewing position, the avatar AT is not placed within the viewing angle of the viewing position, that is, at a position visible to the viewer. Thereby, the viewer is prevented from seeing an avatar AT placed in an unnatural position.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may determine that a placeable position cannot be specified when there is no surface facing vertically upward within the viewing angle from the viewing position. When no appropriate surface such as the ground exists, not arranging the avatar AT within the angle of view prevents the avatar AT from being placed unnaturally.
  • when the three-dimensional space generation unit 14 has determined not to arrange the avatar AT within the viewing angle from the viewing position, the display video generation unit 15 of the viewer terminal 4 (4A, 4B) may display the avatar AT at a predetermined position (for example, the upper right corner) on the display screen.
  • the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may place the avatar AT outside the viewing angle of the viewing position when a placeable position cannot be specified.
  • by arranging the avatar AT outside the viewing angle, the avatar AT need not be presented to the viewer in an unnatural position.
  • if a sound image such as the voice emitted by the photographer (distributor) is localized at the position of the avatar AT placed outside the field of view, the viewer can naturally perceive that the photographer's avatar AT is at a position that the viewer cannot see.
  • the avatar AT generated at the viewer terminals 4 (4A, 4B) may be based on an image of the photographer (distributor). Thereby, the viewer can accept the photographer's avatar AT without feeling uncomfortable.
  • the viewer terminals 4 (4A, 4B) may include an acoustic signal generation unit 19 that generates an acoustic signal in which the sound image of the sound emitted by the photographer (distributor) is localized at the position of the avatar AT in the three-dimensional space. As a result, for example, the voice uttered by the photographer is heard from the position of the avatar AT, so that the viewer can comfortably accept that the photographer is present at the position of the avatar AT.
  • the acoustic signal generation unit 19 of the viewer terminal 4 (4A, 4B) may generate the acoustic signal so that its volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar AT. Thereby, an appropriate acoustic signal can be generated according to the distance from the avatar AT, and the viewer can perceive the voice of the photographer (distributor) without feeling any discomfort.
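A minimal sketch of such a distance-dependent volume, assuming a simple inverse-distance attenuation law (the actual acoustic signal generation unit 19 is not specified to this level of detail):

```python
import math

def gain_for_distance(viewing_pos, avatar_pos, reference_distance=1.0, min_gain=0.05):
    """Attenuate the distributor's voice with distance from the viewing position (inverse-distance law)."""
    d = math.dist(viewing_pos, avatar_pos)
    gain = reference_distance / max(d, reference_distance)
    return max(gain, min_gain)

for d in (0.5, 2.0, 8.0):
    print(d, round(gain_for_distance((0, 0, 0), (d, 0, 0)), 3))
```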
  • the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the displayed video. Thereby, the viewer can move away from or approach the photographer (distributor) in the three-dimensional space.
  • the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may also be changeable by an operation of the photographer (distributor). This allows the photographer (distributor) to move away from or approach the viewer in the three-dimensional space. A distance between the photographer and the viewer that changes from time to time rather than remaining constant is preferable, as it gives the viewer the feeling of actually walking through the shooting location.
  • the avatar generation unit 13 of the viewer terminal 4 (4A, 4B) may perform the process of reflecting the physical information on the avatar AT multiple times along the time direction. This makes it possible to reflect the behavior of the photographer (distributor) on the avatar AT periodically.
  • the information processing method in the embodiments is a method in which an information processing device executes a process of generating an avatar AT of the photographer by reflecting the physical information of the photographer (distributor), a process of generating three-dimensional space information from the photographed image and arranging the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a process of generating a video from a viewing position set in the three-dimensional space as a display video.
  • the program in the embodiments is a program that causes a computer device to execute a function of generating an avatar AT of the photographer by reflecting the physical information of the photographer (distributor), a function of generating three-dimensional space information from the photographed image and arranging the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a function of generating a video from a viewing position set in the three-dimensional space as a display video. That is, the computer device is caused to execute at least a part of each process shown in FIGS. 7, 10, 15, and 17. Such an information processing method and program can also provide the same operations and effects as those of the viewer terminals 4 (4A, 4B) according to the above-described embodiments and modifications.
  • <This technology>
(1) An information processing device comprising: an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer; a three-dimensional space generation unit that generates three-dimensional space information from a photographed image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing; and a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
(2) The information processing device according to (1) above, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's body and the orientation of the avatar's body match.
(7) The information processing device according to any one of (1) to (6) above, wherein the physical information includes a moving speed of the photographer, and the avatar generation unit generates the avatar by reflecting the moving speed.
(8) The information processing device according to any one of (1) to (7) above, wherein the three-dimensional space generation unit specifies, based on the result of image recognition for the captured image, a position in the three-dimensional space where the avatar can be placed, and places the avatar at that placeable position.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An information processing device according to the present technology comprises: an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer; a three-dimensional space generation unit that generates information about a three-dimensional space from a photographed image and arranges the avatar in the three-dimensional space in accordance with the orientation of the photographer at the time of photographing; and a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.

Description

Information processing device, information processing method, and program
The present technology relates to an information processing device, an information processing method, and a program for communicating between users in remote locations.
There is a desire for smooth communication between users in remote locations. To meet this desire, Patent Document 1 below discloses a technology that promotes communication such as natural conversation by presenting to the user not only ordinary voice such as conversation but also additional information corresponding to the context (state, situation) of its content.
Japanese Patent Application Publication No. 2021-071632
Sometimes a user who is at a travel destination and a user who stays at home communicate with each other so that the stay-at-home user, who cannot leave the house because of illness or old age, can feel as if he or she were traveling with someone else.
In such a case, the user at the travel destination carries a mobile terminal or the like and walks around while photographing the scenery, and enjoys conversation with the stay-at-home user while sharing the captured images.
However, when sharing video in which the scenery is shot from the photographer's point of view, the photographer never appears within the angle of view, so it is difficult to share the feeling of traveling together with the photographer.
Also, when the photographer shares a self-shot video that captures the scenery behind him or her within the angle of view, the moving direction of the photographer and the moving direction experienced by the stay-at-home user are opposite, so it is again difficult to share the feeling of traveling together with the photographer.
The present technology has been made in view of such problems, and aims to provide an experience of moving around the shooting location together with the photographer while being in a remote location.
An information processing device according to the present technology includes an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer, a three-dimensional space generation unit that generates three-dimensional space information from a photographed image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
This allows the person at home (the viewer) to feel as if he or she were moving around the shooting location together with the photographer.
An information processing method according to the present technology is an information processing method in which an information processing device executes a process of generating an avatar of a photographer by reflecting physical information of the photographer, a process of generating three-dimensional space information from a photographed image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
A program according to the present technology is a program that causes a computer device to execute a function of generating an avatar of a photographer by reflecting physical information of the photographer, a function of generating three-dimensional space information from a photographed image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of photographing, and a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
Such an information processing method and program also provide the same effects as the information processing device according to the present technology described above.
FIG. 1 is a block diagram showing a configuration example of an information processing system according to a first embodiment of the present technology.
FIG. 2 is an explanatory diagram showing the appearance of a distributor terminal and an HM device.
FIG. 3 is a diagram showing an example of an image presented to a distributor.
FIG. 4 is a diagram showing an example of an image presented to a viewer.
FIG. 5 is a diagram showing another example of an image presented to a distributor.
FIG. 6 is a block diagram of a computer device.
FIG. 7 is a diagram showing the flow of processing executed by each device of the information processing system.
FIG. 8 is a flowchart of an example of processing executed in the HM device.
FIG. 9 is a flowchart of an example of processing executed at the distributor terminal.
FIG. 10 is a flowchart of an example of processing executed at the viewer terminal.
FIG. 11 is a block diagram showing a configuration example of an information processing system according to a second embodiment.
FIG. 12 is a diagram for explaining localization of a sound image according to the position of an avatar.
FIG. 13 is an explanatory diagram of an example in which the volume of sound is changed according to the distance from the avatar.
FIG. 14 is a flowchart of an example of processing executed at the distributor terminal in the second embodiment.
FIG. 15 is a flowchart of an example of processing executed at the viewer terminal in the second embodiment.
FIG. 16 is a block diagram showing a configuration example of an information processing system according to a third embodiment.
FIG. 17 is a flowchart of an example of processing executed at the viewer terminal in the third embodiment.
FIG. 18 is an explanatory diagram of matching between the orientation of the distributor and the orientation of the avatar.
FIG. 19 is a diagram showing an example of an avatar.
FIG. 20 is a diagram showing another example of an avatar.
Hereinafter, embodiments of the present technology will be described in the following order with reference to the accompanying drawings.
<1. First embodiment>
<1-1. Information processing system configuration>
<1-2. Computer equipment>
<1-3. Processing flow>
<2. Second embodiment>
<3. Third embodiment>
<4. Modified example>
<5. Summary>
<6. This technology>
<1. First embodiment>
<1-1. Information processing system configuration>
A configuration example of the information processing system 1 in the first embodiment will be described with reference to the attached drawings.
The information processing system 1 is a system for facilitating smooth communication between users located in separate places. The information processing system 1 is also used so that one user can view, while at home or the like, video that another user shoots while moving.
In the following description, the user (photographer) who shoots video while moving is referred to as the "distributor", and the user who views the video shot by the distributor is referred to as the "viewer".
The information processing system 1 provides the viewer with an experience as if the viewer were moving through the shooting location together with the distributor.
The information processing system 1 includes a distributor terminal 2 and an HM (Head-mount) device 3 as distributor-side devices, and further includes a viewer terminal 4 as a viewer-side device (see FIG. 1).
An example of the distributor terminal 2 and the HM device 3 is shown in FIG. 2.
The distributor terminal 2 is, for example, a smartphone, and is a device with which the distributor can shoot video while holding it in hand and moving.
The distributor terminal 2 includes a display unit that can display video from the viewer side and a microphone that can pick up the distributor's voice and environmental sounds.
The HM device 3 is connected to the distributor terminal 2 so that wired or wireless communication is possible, and includes a speaker for reproducing the viewer-side voice and environmental sounds.
Note that the distributor terminal 2 and the HM device 3 may be realized as a single mobile terminal device by integrating the functions of both into a smartphone.
An example of the viewer terminal 4 is shown in FIG. 2.
The viewer terminal 4 is a stationary device, and includes a display unit on which the video shot by the distributor terminal 2 is displayed, a speaker for reproducing the distributor-side voice and environmental sounds, and a microphone for inputting the viewer's voice and the like.
The viewer terminal 4 may also include a controller for performing various operations. Various devices such as a keyboard and a mouse are conceivable as the controller.
Note that the viewer terminal 4 may be configured as a system in which some or all of a computer device, a microphone, a speaker, and an operation device are provided independently.
As shown in FIG. 1, the distributor terminal 2 includes an input unit 5, a physical information acquisition unit 6, an output unit 7, and a communication unit 8.
The HM device 3 includes an IMU (Inertial Measurement Unit) 9, an output unit 10, and a communication unit 11.
The input unit 5 of the distributor terminal 2 includes a microphone to which voice and environmental sounds are input, a camera unit equipped with an image sensor that captures images, and a touch panel and various button operators to which the distributor's operations are input.
The input unit 5 outputs "sound data" such as voice and environmental sounds and "video data" such as still images and moving images captured by the image sensor to the communication unit 8. In the figures, sound data is simply denoted as "sound" and video data as "video".
The physical information acquisition unit 6 obtains the distributor's physical information based on acceleration information and angular velocity information obtained as sensing data from the IMU 9 of the HM device 3.
Here, the distributor's physical information means, for example, the distributor's posture and movements. Specifically, it is information that specifies the direction of the distributor's face, the direction of the line of sight, the moving speed (walking speed), the means of movement (walking, cycling, etc.), the posture, gestures, and so on.
This information may be acquired not only from the IMU 9 of the HM device 3 but also from a camera or the like that photographs the distributor. For example, information specifying the distributor's line of sight, gestures, posture, and the like may be obtained by analyzing images captured by a camera of the HM device 3.
The physical information acquisition unit 6 outputs the physical information obtained from the IMU data to the communication unit 8.
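Purely as an illustration of deriving coarse physical information from IMU data, the sketch below dead-reckons a forward speed and a heading from acceleration and angular-velocity samples; real systems would fuse the sensor data far more carefully, and the field names and thresholds are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ImuSample:
    dt: float        # seconds since the previous sample
    accel: tuple     # (ax, ay, az) linear acceleration in m/s^2, gravity already removed
    gyro_y: float    # angular velocity around the vertical axis in rad/s

def estimate_body_info(samples, initial_speed=0.0, initial_heading=0.0):
    """Very rough dead-reckoning of the distributor's forward speed and heading from IMU samples."""
    speed, heading = initial_speed, initial_heading
    for s in samples:
        speed += s.accel[2] * s.dt          # assume +Z of the sensor roughly points forward
        heading += s.gyro_y * s.dt
    state = "walking" if speed > 0.2 else "standing"
    return {"speed_m_s": speed, "heading_rad": heading, "state": state}

samples = [ImuSample(dt=0.1, accel=(0.0, 0.0, 0.5), gyro_y=0.05) for _ in range(10)]
print(estimate_body_info(samples))
```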
The output unit 7 outputs the viewer-side sound and video received from the viewer terminal 4. The output unit 7 in this example includes, for example, a display unit that outputs the viewer-side video, while the viewer-side sound is output from a speaker serving as the output unit 10 of the HM device 3.
An example of an image displayed on the display unit serving as the output unit 7 is shown in FIG. 3.
As shown in the figure, the video captured by the camera of the viewer terminal 4 is displayed at a large size in the main area Ar1 of the display unit serving as the output unit 7. That is, an image of the viewer sitting on a chair and facing the camera is displayed.
In the upper right corner area Ar2 of the main area Ar1 of the display unit, the video captured by the camera unit of the distributor terminal 2 is displayed. Note that the video displayed in the corner area Ar2 may include an avatar AT, which will be described later.
The communication unit 8 receives IMU data from the HM device 3 and supplies it to the physical information acquisition unit 6. The communication unit 8 also supplies video data received from the viewer terminal 4 to the output unit 7 and transmits sound data received from the viewer terminal 4 to the HM device 3. Furthermore, the communication unit 8 transmits the sound data and video data supplied from the input unit 5 and the physical information supplied from the physical information acquisition unit 6 to the viewer terminal 4.
The IMU 9 of the HM device 3, which includes an acceleration sensor, an angular velocity sensor, and the like, acquires IMU data (acceleration data, angular velocity data, etc.) used to obtain physical information about the distributor and outputs it to the communication unit 11.
The output unit 10 includes a speaker and performs acoustic output by reproducing the acoustic data received from the distributor terminal 2 as sound data.
The communication unit 11 transmits the IMU data supplied from the IMU 9 to the distributor terminal 2 and receives sound data from the distributor terminal 2.
As shown in FIG. 1, the viewer terminal 4 includes an input unit 12, an avatar generation unit 13, a three-dimensional space generation unit 14, a display video generation unit 15, an output unit 16, and a communication unit 17.
The input unit 12 includes a microphone, a camera, various operators, and the like. The input unit 12 outputs sound data and video data to the communication unit 17.
The avatar generation unit 13 generates an avatar AT of the distributor based on the physical information received from the distributor terminal 2. The avatar AT is, for example, a three-dimensional object.
The avatar AT generated by the avatar generation unit 13 may be based on a photographed image of the distributor, or may be based on a specific character selected by the distributor or the viewer.
At least a part of the avatar AT may be transparent or semi-transparent, or the avatar AT may be primitive to the extent that it can still be recognized as a person.
The three-dimensional object serving as the avatar AT generated by the avatar generation unit 13 is supplied to the three-dimensional space generation unit 14 as an avatar model.
Note that the avatar generation unit 13 reflects the physical information about the distributor on the avatar AT.
Specifically, the direction of the distributor's face, the direction of the line of sight, the shapes of the joints, and the like are reflected in the posture of the avatar AT as physical information. In addition, the distributor's gestures, walking or stopped state, walking speed, and the like are reflected as the movements of the avatar AT. The orientation of the distributor's body is taken into account at the time of placement.
By generating an avatar AT that reflects the distributor's posture and movements in this way, the viewer can be made to perceive the distributor's situation and so on.
Instead of reflecting the physical information directly on the avatar AT, the avatar generation unit 13 may provide the three-dimensional space generation unit 14 with information such as the posture and orientation that the avatar AT should take, as additional information of the avatar model.
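A minimal sketch of "reflecting physical information on the avatar model", assuming a hypothetical AvatarState structure and dictionary keys; body orientation is intentionally left out because, as noted above, it is handled at placement time.

```python
from dataclasses import dataclass

@dataclass
class AvatarState:
    face_yaw: float = 0.0   # direction of the face (radians)
    gaze_yaw: float = 0.0   # direction of the line of sight (radians)
    pose: str = "standing"
    gesture: str = None

def reflect_physical_info(avatar, physical_info):
    """Copy received physical information onto the avatar model; body orientation is applied at placement."""
    avatar.face_yaw = physical_info.get("face_yaw", avatar.face_yaw)
    avatar.gaze_yaw = physical_info.get("gaze_yaw", avatar.gaze_yaw)
    avatar.pose = physical_info.get("pose", avatar.pose)
    avatar.gesture = physical_info.get("gesture")
    return avatar

print(reflect_physical_info(AvatarState(), {"face_yaw": 0.3, "pose": "walking", "gesture": "wave"}))
```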
The avatar generation unit 13 may give the distributor's movements to the avatar AT as animations.
The animation given to the avatar AT of the distributor may be determined according to the distributor's moving speed and mode of movement (walking, cycling, etc.).
For example, a walking animation may be given when the distributor is walking, and a running animation may be given when the distributor is running.
If it is estimated that the distributor is moving by bicycle, the avatar AT may be given an animation of riding a bicycle.
Furthermore, when a gesture by the distributor is received as physical information, the avatar generation unit 13 may animate the avatar AT so that it performs that gesture.
These various kinds of information are transmitted from the distributor terminal 2 as physical information, as described above.
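The choice of animation from the moving speed could look roughly like the following sketch; the speed thresholds and animation names are illustrative assumptions, not values taken from the disclosure.

```python
def select_animation(speed_m_per_s):
    """Pick an avatar animation from the distributor's movement speed (thresholds are illustrative)."""
    if speed_m_per_s < 0.2:
        return "standing"
    if speed_m_per_s < 2.5:
        return "walking"
    if speed_m_per_s < 6.0:
        return "running"
    return "cycling"        # speeds above running pace are treated as riding a bicycle here

for v in (0.0, 1.3, 3.5, 8.0):
    print(v, "->", select_animation(v))
```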
The avatar generation unit 13 performs the process of reflecting the physical information on the avatar AT multiple times while the distributor terminal 2 and the viewer terminal 4 are communicating. For example, the physical information may be reflected on the avatar AT periodically, such as once per second, or it may be reflected on the avatar AT whenever a change occurs in the distributor's posture, orientation, movement, or the like.
The three-dimensional space generation unit 14 generates a three-dimensional space by performing image recognition (three-dimensional estimation processing) on the video (photographed images) received from the distributor terminal 2 and identifying image areas such as the floor (ground) and walls.
Several methods of generating the three-dimensional space are conceivable. For example, three-dimensional objects can be generated by performing image recognition on the various subjects appearing in the video, and a three-dimensional space based on the captured images can be generated by pasting images cut out from the video onto the faces of the respective three-dimensional objects as textures.
A three-dimensional space may also be realized by pasting the photographed image onto the inner surface of a sphere.
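For the sphere-based variant, a captured equirectangular image pasted on the inner surface of a sphere can be sampled by mapping a viewing direction to texture coordinates. The sketch below shows only that mapping, under the usual equirectangular convention (an assumption here, not a detail given in this document).

```python
import math

def direction_to_uv(direction):
    """Map a unit viewing direction to (u, v) texture coordinates of an image pasted
    on the inside of a sphere (equirectangular mapping)."""
    x, y, z = direction
    u = 0.5 + math.atan2(x, z) / (2.0 * math.pi)            # longitude
    v = 0.5 - math.asin(max(-1.0, min(1.0, y))) / math.pi   # latitude
    return u, v

print(direction_to_uv((0.0, 0.0, 1.0)))   # straight ahead -> image centre
print(direction_to_uv((0.0, 1.0, 0.0)))   # straight up    -> top of the image
```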
The three-dimensional space generation unit 14 arranges the viewing position and the avatar AT in the generated three-dimensional space.
The viewing position is the position that serves as the viewer's viewpoint in the generated three-dimensional space, and can be regarded as the position of a virtual camera.
The position of the camera that captured the video used to generate the three-dimensional space and the position of the virtual camera set after the three-dimensional space has been generated may be the same position or different positions.
The three-dimensional space generation unit 14 specifies a position suitable for placing the avatar AT as a "placeable position" based on the result of the three-dimensional estimation processing in the recognized three-dimensional space.
A placeable position is specified by detecting ground, a foothold, or the like facing vertically upward and treating that area as a placeable position. That is, the three-dimensional space generation unit 14 specifies, as a placeable position, a position at which spatial consistency can be ensured even if the avatar AT is placed there.
In other words, when no ground or similar surface facing vertically upward is detected, it is determined that a placeable position cannot be specified.
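A minimal sketch of this "placeable position" test, assuming that the three-dimensional estimation yields candidate surfaces with centers and unit normals; the angular tolerances are illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Surface:
    center: tuple   # (x, y, z) in the generated three-dimensional space
    normal: tuple   # unit normal vector of the surface

def faces_up(surface, tolerance_deg=10.0):
    """A surface counts as 'ground-like' if its normal points close to vertically upward (+Y)."""
    ny = surface.normal[1]
    return math.degrees(math.acos(max(-1.0, min(1.0, ny)))) <= tolerance_deg

def in_view(surface, view_pos, view_dir, half_fov_deg=45.0):
    """Rough visibility test: is the surface center inside the viewing angle from the viewing position?"""
    to_s = tuple(s - v for s, v in zip(surface.center, view_pos))
    norm = math.sqrt(sum(c * c for c in to_s)) or 1.0
    cos_angle = sum(a * b for a, b in zip(to_s, view_dir)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))) <= half_fov_deg

def find_placeable(surfaces, view_pos, view_dir):
    """Return the first upward-facing surface inside the viewing angle, or None if there is none."""
    for s in surfaces:
        if faces_up(s) and in_view(s, view_pos, view_dir):
            return s.center
    return None     # no placeable position: the avatar is not placed within the viewing angle

ground = Surface(center=(0.0, 0.0, 4.0), normal=(0.0, 1.0, 0.0))
wall = Surface(center=(1.0, 1.5, 4.0), normal=(0.0, 0.0, -1.0))
print(find_placeable([wall, ground], view_pos=(0.0, 1.6, 0.0), view_dir=(0.0, 0.0, 1.0)))
```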
Note that the placeable position may instead be a placeable area, that is, an area in which the avatar AT can be placed.
At this time, the three-dimensional space generation unit 14 first specifies a placeable position within the angle of view obtained when the virtual camera is set at the viewing position. That is, the three-dimensional space generation unit 14 determines whether or not there is a position where the avatar AT can be placed within the video presented to the viewer.
The three-dimensional space generation unit 14 places the avatar AT generated by the avatar generation unit 13 at the placeable position.
When the avatar AT is placed within the angle of view of the virtual camera set at the viewing position, the captured image with the avatar AT placed in it is displayed on a display unit such as a monitor serving as the output unit 16 of the viewer terminal 4.
An example of an image displayed on the output unit 16 of the viewer terminal 4 is shown in FIG. 4.
As shown in the figure, in the displayed image of the output unit 16, an avatar AT, which is a three-dimensional object with an image of the distributor pasted on it as a texture, is placed at the left edge of the road.
Note that further conditions may be taken into consideration when the three-dimensional space generation unit 14 specifies placeable positions.
For example, even when ground or the like is detected, a position where a person cannot actually stand, or where it would be inappropriate for a person to stand, may be determined not to be a placeable position.
For example, on a road that has both a roadway and a sidewalk, the roadway area may be determined not to be a placeable position. That is, only the sidewalk area may be specified as a placeable position.
Even if a surface facing vertically upward is detected, it may be determined not to be a placeable position if that surface is a water surface such as the surface of a lake.
Alternatively, if the detected surface facing vertically upward is the top surface of an object such as a suitcase, it may be determined not to be a placeable position.
In other words, when ground is detected, an area of that ground on which no object is present is specified as a placeable position.
Positions where other passersby, animals, or the like are present may also be determined not to be placeable positions.
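The additional exclusion conditions above can be thought of as a filter over candidate ground regions, as in the following sketch; the region dictionary keys and labels are hypothetical and stand in for whatever the image recognition actually produces.

```python
def is_placeable(region):
    """Apply the exclusion conditions above to a candidate ground region.
    `region` is a dict assumed to come from image recognition; keys and labels are illustrative."""
    if not region.get("faces_up", False):
        return False                      # must be a surface facing vertically upward
    if region.get("label") in {"roadway", "water"}:
        return False                      # carriageways and water surfaces are excluded
    if region.get("is_object_top", False):
        return False                      # e.g. the top of a suitcase is not a standing position
    if region.get("occupied", False):
        return False                      # passers-by or animals already occupy the spot
    return True

candidates = [
    {"label": "sidewalk", "faces_up": True},
    {"label": "roadway", "faces_up": True},
    {"label": "water", "faces_up": True},
    {"label": "sidewalk", "faces_up": True, "occupied": True},
]
print([is_placeable(c) for c in candidates])   # [True, False, False, False]
```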
When a placeable position cannot be specified within the angle of view of the virtual camera, the three-dimensional space generation unit 14 decides not to place the avatar AT within the angle of view of the virtual camera.
Several kinds of processing are conceivable when it is decided not to place the avatar AT within the angle of view of the virtual camera.
One is to place the avatar AT outside the angle of view of the virtual camera.
This keeps the viewer, who watches the video based on the angle of view of the virtual camera, from seeing the avatar AT in an unnatural state.
Another is to display the avatar AT at a specific position on the display unit serving as the output unit 16 of the viewer terminal 4. In this case, so that it is clear that this is not an avatar AT placed in the three-dimensional space, the avatar AT may be displayed in an area reserved for it in, for example, the upper part of the display unit where the sky and the like tend to appear.
Note that the three-dimensional space generation unit 14 may decide not to place the avatar AT within the angle of view of the virtual camera based on conditions other than those described above, for example when it can be estimated that the viewer wishes to look at the scenery, or when the viewer performs an operation to hide the avatar AT.
In that case, the three-dimensional space generation unit 14 may immediately place the avatar AT outside the angle of view, or may give the avatar AT an animation of moving out of the angle of view so that it leaves the angle of view naturally. Giving such an animation can provide the viewer with a strong sense of presence.
FIG. 5 shows an example in which a main area Ar3, in which the video shot by the distributor terminal 2 is displayed, and a sub area Ar4, in which only the avatar AT is displayed, are displayed on the output unit 16.
To indicate that this is not an avatar AT placed at a predetermined position in the three-dimensional space, the sub area Ar4 is an area surrounded by a rectangular frame.
This allows the viewer to grasp the distributor's state through the avatar AT without feeling uncomfortable.
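One way to express the fallback behaviour described above (place the avatar in the space when a placeable position exists, otherwise draw it in a reserved screen region such as a corner or an upper area) is sketched below; the overlay geometry is an arbitrary assumption.

```python
def decide_avatar_presentation(placeable_position, screen_size=(1920, 1080)):
    """Decide how the avatar is shown when a placeable position may or may not exist (illustrative)."""
    if placeable_position is not None:
        return {"mode": "in_space", "position": placeable_position}
    # Fallback: draw the avatar in a reserved screen region (here, a box in the upper right corner).
    margin, box = 20, (320, 320)
    overlay = (screen_size[0] - box[0] - margin, margin, box[0], box[1])   # (x, y, width, height)
    return {"mode": "screen_overlay", "rect": overlay}

print(decide_avatar_presentation((1.0, 0.0, 4.0)))
print(decide_avatar_presentation(None))
```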
The three-dimensional space generation unit 14 places the avatar AT at a placeable position so that the orientation of the distributor's body at the shooting location matches the orientation of the body of the avatar AT in the three-dimensional space.
Note that the avatar AT generated by the avatar generation unit 13 is generated so that its face direction, line-of-sight direction, and so on match the distributor's face direction and line-of-sight direction.
Therefore, by having the three-dimensional space generation unit 14 place the avatar AT so that its body orientation matches the distributor's body orientation in the real space, the face direction and line-of-sight direction of the avatar AT also match those of the distributor in the real space. The distributor's posture and various orientations in the real space can thus be reproduced in the three-dimensional space without any discomfort.
The three-dimensional space generation unit 14 outputs a model of the three-dimensional space in which the avatar AT has been placed and the viewing position has been set (hereinafter referred to as a "3D (Dimension) space model" as appropriate) to the display video generation unit 15.
The display video generation unit 15 performs rendering processing for generating the viewing video from the virtual camera set at the viewing position, using the 3D space model of the three-dimensional space in which the avatar AT is placed. In other words, the display video generation unit 15 obtains a two-dimensional image by imaging the three-dimensional space with the virtual camera.
The display video generation unit 15 thereby outputs a two-dimensional rendered video to the output unit 16.
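The rendering step amounts to projecting points of the 3D space model through a virtual camera placed at the viewing position. The sketch below shows a yaw-only pinhole projection as a stand-in for the actual renderer, which is not specified in this document.

```python
import math

def world_to_pixel(point, cam_pos, cam_yaw, focal_px=800.0, image_size=(1280, 720)):
    """Project a point of the 3D space model into the 2D display image rendered from the
    virtual camera placed at the viewing position (yaw-only camera, pinhole model)."""
    # Translate into the camera frame, then rotate by -yaw around the vertical axis.
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    cos_y, sin_y = math.cos(-cam_yaw), math.sin(-cam_yaw)
    xc = cos_y * x + sin_y * z
    zc = -sin_y * x + cos_y * z
    if zc <= 0:
        return None                                  # behind the virtual camera
    u = focal_px * xc / zc + image_size[0] / 2
    v = -focal_px * y / zc + image_size[1] / 2
    return int(u), int(v)

# Example: the avatar's head at 1.6 m height, 5 m in front of a camera at eye height.
print(world_to_pixel((0.0, 1.6, 5.0), cam_pos=(0.0, 1.5, 0.0), cam_yaw=0.0))
```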
The output unit 16 outputs the sound and video received from the distributor terminal 2.
Specifically, the output unit 16 outputs the sound data received from the distributor terminal 2 from a speaker or the like serving as the output unit 16. The output unit 16 also performs processing for displaying the rendered video supplied from the display video generation unit 15 on a display unit serving as the output unit 16.
The communication unit 17 receives sound data, video data, and physical information from the distributor terminal 2, supplies the sound data to the output unit 16, supplies the video data to the three-dimensional space generation unit 14, and supplies the physical information to the avatar generation unit 13. The communication unit 17 also transmits the sound data and video data acquired at the viewer terminal 4 to the distributor terminal 2.
<1-2. Computer equipment>
The distributor terminal 2, the HM device 3, and the viewer terminal 4 described above each realize their functions by a computer device executing a predetermined program.
A functional block diagram of the computer device is shown in FIG. 6.
Note that each computer device does not need to have all of the components shown below, and may have only some of them.
As shown in FIG. 6, the CPU (Central Processing Unit) 71 of each computer device executes various kinds of processing according to a program stored in a ROM (Read Only Memory) 72 or a nonvolatile memory unit 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or a program loaded from a storage unit 79 into a RAM (Random Access Memory) 73. The RAM 73 also stores, as appropriate, data necessary for the CPU 71 to execute the various kinds of processing.
The CPU 71, the ROM 72, the RAM 73, and the nonvolatile memory unit 74 are interconnected via a bus 83. An input/output interface 75 is also connected to this bus 83.
An input unit 76 made up of operators and operating devices is connected to the input/output interface 75.
For example, various operators and operating devices such as a keyboard, a mouse, keys, a dial, a touch panel, a touch pad, and a remote controller are assumed as the input unit 76.
A user operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
A display unit 77 made up of an LCD (Liquid Crystal Display), an organic EL panel, or the like and an audio output unit 78 made up of a speaker or the like are also connected to the input/output interface 75, either integrally or as separate units.
The display unit 77 is a display unit that performs various kinds of display, and is configured by, for example, a display device provided on the front of the distributor terminal 2 as a computer device, or a separate display device connected to the housing.
The display unit 77 displays images, moving images (video), and the like on its display screen based on instructions from the CPU 71. The display unit 77 also displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface), based on instructions from the CPU 71.
A storage unit 79 made up of a hard disk, solid-state memory, or the like and a communication unit 80 made up of a modem or the like may also be connected to the input/output interface 75.
The communication unit 80 performs communication processing via a transmission path such as the Internet, and communicates with various devices by wired or wireless communication, bus communication, and the like.
A drive 81 is also connected to the input/output interface 75 as necessary, and a removable storage medium 82 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory is mounted as appropriate.
The drive 81 can read data files such as image files and various computer programs from the removable storage medium 82. Read data files are stored in the storage unit 79, and images and sounds contained in the data files are output on the display unit 77 and the audio output unit 78. Computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as necessary.
In this computer device, software for the processing of the present embodiment can be installed, for example, via network communication by the communication unit 80 or via the removable storage medium 82. Alternatively, the software may be stored in advance in the ROM 72, the storage unit 79, or the like.
The CPU 71 performing processing operations based on the various programs causes the necessary information processing and communication processing to be executed in the communication unit 8 of the distributor terminal 2, the communication unit 11 of the HM device 3, and the communication unit 17 of the viewer terminal 4.
Note that the computer device constituting the distributor terminal 2, the HM device 3, or the viewer terminal 4 is not limited to a single computer device as shown in FIG. 6, and may be configured as a system of a plurality of computer devices. The plurality of computer devices may be systematized by a LAN or the like, or may be located in remote places and connected by a VPN or the like using the Internet. The plurality of computer devices may include computer devices serving as a server group (cloud) usable through a cloud computing service.
<1-3. Processing flow>
FIG. 7 shows an example of the flow of processing executed in each of the distributor terminal 2, the HM device 3, and the viewer terminal 4. Note that the execution order of the processes shown in FIG. 7 is merely an example; the order of some processes may be swapped, and some processes may be executed in parallel.
The CPU 71 of the HM device 3 acquires IMU data from the IMU 9 in step S101, and transmits the IMU data to the distributor terminal 2 in step S102.
Meanwhile, the CPU 71 of the distributor terminal 2 executes, in step S201, a process of acquiring the distributor-side sound data and video data from the RAM 73 included in the microphone, image sensor, and the like, and then executes, in step S202, a process of receiving the IMU data transmitted from the HM device 3.
In the following step S203, the CPU 71 of the distributor terminal 2 acquires the distributor's physical information based on the received IMU data.
In step S204, the CPU 71 of the distributor terminal 2 transmits the distributor-side sound data and video data and the physical information estimated for the distributor to the viewer terminal 4.
The CPU 71 of the viewer terminal 4 executes, in step S301, a process of acquiring the viewer-side sound data and video data from the RAM 73 included in the microphone, camera, and the like, and then executes, in step S302, a process of receiving the sound data, video data, and physical information transmitted from the distributor terminal 2.
In step S303, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the physical information. The avatar AT generated at this time may be given an animation based on the physical information, as described above.
In step S304, the CPU 71 of the viewer terminal 4 generates a three-dimensional space based on the video data captured on the distributor side. For example, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation by performing image recognition processing on the video, and generates a three-dimensional model for each subject.
In step S305, the CPU 71 of the viewer terminal 4 sets a viewing position in the generated three-dimensional space.
In step S306, the CPU 71 of the viewer terminal 4 identifies, based on the result of the three-dimensional estimation, a placeable position at which the avatar AT can be placed, and in the subsequent step S307, places the avatar AT at the placeable position.
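By way of a purely illustrative sketch (not part of the disclosed configuration), identifying placeable positions from the three-dimensional estimation result could proceed as follows, assuming the estimation yields surface points with normal vectors; the criterion of selecting surfaces that face vertically upward is the one mentioned later in this description, and the function and parameter names are assumptions.

    import numpy as np

    def find_placeable_positions(points, normals, up=(0.0, 1.0, 0.0), cos_threshold=0.9):
        # points: (N, 3) surface points obtained from the three-dimensional estimation
        # normals: (N, 3) unit normals for those points
        # A point is treated as placeable when its normal is close to vertical,
        # i.e. the surface faces upward (ground, floor, and similar surfaces).
        up = np.asarray(up, dtype=float)
        cos_angles = normals @ up
        return points[cos_angles >= cos_threshold]

    # Hypothetical usage: only the upward-facing point remains as a candidate.
    points = np.array([[0.0, 0.0, 2.0], [1.0, 1.5, 2.0]])
    normals = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
    print(find_placeable_positions(points, normals))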
In step S308, the CPU 71 of the viewer terminal 4 performs rendering processing to generate a viewing video based on the 3D space model of the three-dimensional space in which the avatar AT is placed and on the viewing position.
In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an acoustic signal from the sound data received from the distributor terminal 2.
In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering processing of step S308.
In step S311, the CPU 71 of the viewer terminal 4 transmits the viewer-side sound data and video data acquired in step S301 to the distributor terminal 2.
The CPU 71 of the distributor terminal 2 receives the sound data and video data in step S205, and transmits only the sound data to the HM device 3 in step S206.
In response, the CPU 71 of the HM device 3 receives the viewer-side sound data in step S103, and generates and reproduces an acoustic signal from the sound data in step S104.
In step S207, the CPU 71 of the distributor terminal 2 outputs the remaining video data.
In this way, playback processing based on the sound data and video data acquired on the distributor side is performed in the viewer-side environment, and playback processing based on the sound data and video data acquired on the viewer side is performed on the distributor side.
The processing executed by each device to realize the processing flow shown in FIG. 7 will now be described.
FIG. 8 shows an example of the processing executed by the CPU 71 of the HM device 3. Note that processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
In step S111, the CPU 71 of the HM device 3 determines whether or not it is the timing to acquire IMU data. If it is determined that it is the timing to acquire IMU data, the CPU 71 of the HM device 3 acquires the IMU data in step S101.
The IMU 9 of the HM device 3 outputs sensing data every few milliseconds, for example. The CPU 71 may acquire these pieces of sensing data in batches every several hundred milliseconds or every few seconds, or may acquire the sensing data each time it is output.
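For illustration only, and not as a disclosed implementation, the batched acquisition described above could be pictured as buffering the per-millisecond samples and flushing them at a coarser interval; the class name and interval value below are assumptions.

    import time

    class ImuBatcher:
        def __init__(self, flush_interval_sec=0.5):
            self.flush_interval_sec = flush_interval_sec  # e.g. several hundred msec
            self.buffer = []
            self.last_flush = time.monotonic()

        def push(self, sample):
            # Called each time the IMU outputs sensing data (every few msec).
            self.buffer.append(sample)

        def pop_batch_if_due(self):
            # Returns the accumulated samples when the acquisition timing has arrived
            # (corresponding to the judgment of step S111); otherwise returns None.
            now = time.monotonic()
            if now - self.last_flush < self.flush_interval_sec:
                return None
            batch, self.buffer = self.buffer, []
            self.last_flush = now
            return batch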
In step S102, the CPU 71 of the HM device 3 causes the communication unit 11 to execute a process of transmitting the acquired IMU data to the distributor terminal 2.
On the other hand, if it is determined in step S111 that it is not the timing to acquire IMU data, the CPU 71 of the HM device 3 proceeds to the process of step S112 without executing the processes of steps S101 and S102.
In step S112, the CPU 71 of the HM device 3 determines whether or not the sound data acquired on the viewer side has been received.
If it is determined that no sound data has been received, the CPU 71 of the HM device 3 returns to the process of step S111.
On the other hand, if it is determined that sound data has been received, the CPU 71 of the HM device 3 performs acoustic reproduction in step S104 by generating an acoustic signal from the received sound data and supplying it to the speaker.
After the process of step S104, the CPU 71 of the HM device 3 returns to the process of step S111. That is, the CPU 71 of the HM device 3 repeatedly executes the determination process of step S111 and the determination process of step S112, and executes the corresponding process when a condition is satisfied.
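The repeated judgments of steps S111 and S112 can be pictured, purely as an illustrative sketch, as a simple polling loop like the one below; the callback names are hypothetical, since the actual implementation is not disclosed at this level of detail.

    def hm_device_loop(imu_due, read_imu, send_imu, recv_viewer_sound, play_sound):
        # imu_due(): True when it is the timing to acquire IMU data (step S111)
        # read_imu(): acquires IMU data (step S101)
        # send_imu(data): transmits IMU data to the distributor terminal (step S102)
        # recv_viewer_sound(): returns viewer-side sound data if received, else None (step S112)
        # play_sound(data): generates and reproduces an acoustic signal (step S104)
        while True:
            if imu_due():
                send_imu(read_imu())        # steps S101 and S102
            sound = recv_viewer_sound()     # step S112
            if sound is not None:
                play_sound(sound)           # step S104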
FIG. 9 shows an example of the processing executed by the CPU 71 of the distributor terminal 2. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
In step S211, the CPU 71 of the distributor terminal 2 determines whether or not IMU data has been received from the HM device 3.
If it is determined that IMU data has been received, the CPU 71 of the distributor terminal 2 acquires physical information about the distributor based on the received IMU data in step S203, acquires the distributor-side sound data and video data from the microphone, camera unit, and the like in step S201, and transmits the distributor-side sound data, video data, and physical information to the viewer terminal 4 in step S204.
On the other hand, if it is determined that IMU data has not been received, the CPU 71 of the distributor terminal 2 proceeds to step S212 without executing the processes of steps S203, S201, and S204.
Note that the example shown in FIG. 9 shows the IMU data being transmitted to the viewer terminal 4 together with the sound data and video data when the IMU data is acquired; however, the acquisition and transmission of the IMU data, the sound data, and the video data may each be performed independently. That is, only IMU data may be transmitted in a certain transmission process, and only video data may be transmitted in another transmission process.
In step S212, the CPU 71 of the distributor terminal 2 determines whether or not the viewer-side sound data and video data have been received.
If it is determined that the sound data and video data have not been received, the CPU 71 of the distributor terminal 2 returns to the process of step S211.
On the other hand, if it is determined that they have been received, the CPU 71 of the distributor terminal 2 performs a process of transmitting the sound data to the HM device 3 in step S206. In response to this, the process of step S104 shown in FIG. 8 is executed in the HM device 3, so that the distributor can listen to the sound acquired on the viewer side.
In step S207, the CPU 71 of the distributor terminal 2 outputs the video data acquired on the viewer side. This allows the distributor to view the video shot on the viewer side.
After the process of step S207, the CPU 71 of the distributor terminal 2 returns to the process of step S211. That is, the CPU 71 of the distributor terminal 2 repeatedly executes the determination process of step S211 and the determination process of step S212, and executes the corresponding process when a condition is satisfied.
Note that in the determination process of step S212, it may be determined whether or not at least one of the sound data and the video data has been received.
In this case, if it is determined that the sound data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S206, and if it is determined that the video data has been received, the CPU 71 of the distributor terminal 2 executes the process of step S207.
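Similarly, a non-authoritative sketch of the distributor terminal loop of FIG. 9, including the variant of step S212 just described, might look like the following; every function name here is an assumption.

    def distributor_terminal_loop(recv_imu, estimate_body_info, capture_av, send_to_viewer,
                                  recv_viewer_sound, recv_viewer_video,
                                  send_sound_to_hm, show_video):
        while True:
            imu = recv_imu()                                  # step S211
            if imu is not None:
                body_info = estimate_body_info(imu)           # step S203
                sound, video = capture_av()                   # step S201
                send_to_viewer(sound, video, body_info)       # step S204
            sound = recv_viewer_sound()                       # step S212 (sound branch)
            if sound is not None:
                send_sound_to_hm(sound)                       # step S206
            video = recv_viewer_video()                       # step S212 (video branch)
            if video is not None:
                show_video(video)                             # step S207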
FIG. 10 shows an example of the processing executed by the CPU 71 of the viewer terminal 4. Processes similar to those shown in FIG. 7 are given the same step numbers, and description thereof will be omitted as appropriate.
In step S321, the CPU 71 of the viewer terminal 4 determines whether or not the distributor-side sound data, video data, and physical information have been received.
If it is determined that they have not been received, the CPU 71 of the viewer terminal 4 proceeds to the process of step S301.
On the other hand, if it is determined that they have been received, the CPU 71 of the viewer terminal 4 generates an avatar AT based on the received physical information about the distributor in step S303.
In step S304, the CPU 71 of the viewer terminal 4 performs three-dimensional estimation on the received distributor-side video data and generates a three-dimensional space.
In step S305, the CPU 71 of the viewer terminal 4 sets the viewing position at a predetermined position in the generated three-dimensional space.
In step S306, the CPU 71 of the viewer terminal 4 identifies placeable positions in the generated three-dimensional space.
In step S307, the CPU 71 of the viewer terminal 4 places the avatar AT at a placeable position.
In step S308, the CPU 71 of the viewer terminal 4 generates a rendered video based on the view from the viewing position by performing rendering processing.
In step S309, the CPU 71 of the viewer terminal 4 generates and reproduces an acoustic signal from the sound data received from the distributor terminal 2.
In step S310, the CPU 71 of the viewer terminal 4 outputs the rendered video generated by the rendering processing of step S308.
The processes from step S303 to step S310 are a series of processes performed in response to receiving information from the distributor terminal 2.
Subsequently, the CPU 71 of the viewer terminal 4 acquires the viewer-side sound data and video data from the microphone, camera, and the like in step S301, and transmits the viewer-side sound data and video data to the distributor terminal 2 in step S311.
As shown in FIG. 10, the CPU 71 of the viewer terminal 4 transmits the viewer-side sound data and video data to the distributor terminal 2 by executing the processes of steps S301 and S311, and executes the processes of steps S303 to S310 as appropriate depending on the result of the determination process of step S321.
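To summarize the viewer-side flow of FIG. 10 in code form, a hedged sketch is given below; the helper functions merely stand in for steps S301 to S311 and are not the actual interfaces of the disclosed system.

    def viewer_terminal_loop(recv_from_distributor, generate_avatar, build_3d_space,
                             set_viewing_position, find_placeable, place_avatar,
                             render, play_sound, show, capture_av, send_to_distributor):
        while True:
            received = recv_from_distributor()                      # step S321
            if received is not None:
                sound, video, body_info = received
                avatar = generate_avatar(body_info)                 # step S303
                space = build_3d_space(video)                       # step S304
                viewing_position = set_viewing_position(space)      # step S305
                position = find_placeable(space)                    # step S306
                place_avatar(space, avatar, position)               # step S307
                frame = render(space, viewing_position)             # step S308
                play_sound(sound)                                   # step S309
                show(frame)                                         # step S310
            viewer_sound, viewer_video = capture_av()               # step S301
            send_to_distributor(viewer_sound, viewer_video)         # step S311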
<2. Second embodiment>
The information processing system 1A according to the second embodiment differs from the information processing system 1 according to the first embodiment in that the position of the avatar AT can be changed and in that sound output is performed according to the position of the avatar AT.
FIG. 11 shows an example of the configuration of the information processing system 1A. Note that components similar to those of the information processing system 1 shown in FIG. 1 are given the same reference numerals, and description thereof will be omitted as appropriate.
The information processing system 1A includes a distributor terminal 2A and an HM device 3 as distributor-side devices, and a viewer terminal 4A as a viewer-side device. The configuration of the HM device 3 is the same as that of the first embodiment, so description thereof will be omitted.
The distributor terminal 2A includes an input unit 5, a physical information acquisition unit 6, an output unit 7, and a communication unit 8, as in the first embodiment.
The input unit 5 outputs first change information to the communication unit 8 in response to the distributor's operation performed to change the position of the avatar AT. That is, the first change information is, for example, change information regarding the position of the avatar AT.
The first change information is transmitted to the viewer terminal 4A via the communication unit 8.
Similarly to the first embodiment, the viewer terminal 4A includes an input unit 12, an avatar generation unit 13, a three-dimensional space generation unit 14, a display video generation unit 15, an output unit 16, and a communication unit 17, and further includes an avatar position control unit 18 and an acoustic signal generation unit 19.
The first change information received from the distributor terminal 2A is provided to the avatar position control unit 18 via the communication unit 17.
Furthermore, an instruction to change the position of the avatar AT can also be given by the viewer's operation. The viewer's operation for changing the position of the avatar AT is likewise supplied, as first change information, to the avatar position control unit 18 via the input unit 12.
The avatar position control unit 18 changes the position of the avatar AT based on the first change information supplied via the communication unit 17 or the input unit 12. The changed position of the avatar AT is supplied to the three-dimensional space generation unit 14 as placement information of the avatar AT.
The three-dimensional space generation unit 14 performs three-dimensional estimation by performing image recognition using the video data supplied from the distributor terminal 2A via the communication unit 17, and generates a three-dimensional space.
The three-dimensional space generation unit 14 identifies a placeable position at which to place the avatar model supplied from the avatar generation unit 13, and places the avatar AT. The three-dimensional space generation unit 14 then adjusts the position of the placed avatar AT based on the placement information supplied from the avatar position control unit 18.
At this time, the three-dimensional space generation unit 14 may adjust the changed position of the avatar AT so that spatial consistency is ensured. For example, if the new position of the avatar AT based on the placement information is not a placeable position, the placeable position closest to the new position may be determined as the adjusted position.
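A minimal sketch of the adjustment described above, assuming the placeable positions are available as a list of 3D points (an assumption about the data representation, not a disclosed detail), could be:

    import math

    def adjust_to_placeable(requested_position, placeable_positions):
        # If the requested position (from the first change information) is already placeable,
        # keep it; otherwise snap to the closest placeable position to preserve spatial consistency.
        if requested_position in placeable_positions:
            return requested_position
        return min(placeable_positions,
                   key=lambda p: math.dist(p, requested_position))

    print(adjust_to_placeable((0.2, 0.0, 1.0), [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]))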
Note that when the avatar AT is moved from the current position to a new position, the avatar generation unit 13 may add a walking animation, a running animation, or the like.
The 3D space model in which the respective elements have been arranged by the three-dimensional space generation unit 14 is supplied not only to the display video generation unit 15 but also to the acoustic signal generation unit 19.
The acoustic signal generation unit 19 performs processing to generate an acoustic signal in which the sound image of the sound data acquired at the distributor terminal 2A is localized to each object arranged in the three-dimensional space. The generated acoustic signal is input to the output unit 16 as rendered sound.
For example, when the distributor's uttered voice is input to the distributor terminal 2A via the input unit 5, the sound data of the uttered voice is sent from the distributor terminal 2A to the viewer terminal 4A.
The acoustic signal generation unit 19 receives the sound data of the uttered voice via the communication unit 17, identifies the position of the avatar AT in the 3D space model generated by the three-dimensional space generation unit 14, and localizes the sound image of the uttered voice at the placement position of the avatar AT according to the positional relationship with the viewing position.
For example, as shown in FIG. 12, when the avatar AT is positioned toward the left side of the field of view from the viewing position, the sound image of the distributor's uttered voice is localized on the left side by making the output from the left speaker 16L, serving as the output unit 16, louder than that from the right speaker 16R.
By reproducing the acoustic signal obtained in this way in stereo from the speakers serving as the output unit 16, the viewer can experience an environment in which the distributor's uttered voice is naturally heard from the position where the avatar AT is placed.
Note that the configuration is not limited to stereo reproduction; multi-channel reproduction such as 5.1ch may be performed so that the distributor's uttered voice is heard three-dimensionally.
Furthermore, when the position of the avatar AT changes in the depth direction, the acoustic signal generation unit 19 can make the viewer perceive a sense of depth by generating an acoustic signal to which delay, reverberation, and the like are added.
Moreover, the acoustic signal generation unit 19 may adjust the volume of the acoustic signal according to the distance between the avatar AT and the viewing position. That is, the acoustic signal may be generated such that the closer the avatar AT is to the viewing position, the higher the volume.
Specifically, as shown in FIG. 13, when the distance between the avatar AT and the viewing position is greater than in FIG. 12, the volume of the sound output from the left speaker 16L and the right speaker 16R serving as the output unit 16 is made smaller than in FIG. 12. This allows the viewer to experience the depth of the reproduced sound.
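The stereo localization and distance-dependent volume described with reference to FIGS. 12 and 13 can be pictured, purely as an illustrative sketch with assumed coordinate conventions and parameter values, as constant-power panning combined with a distance attenuation factor:

    import math

    def localize_to_avatar(avatar_x, avatar_z, listener_x=0.0, listener_z=0.0, ref_distance=1.0):
        # avatar_x: lateral offset of the avatar AT from the centre of the field of view
        #           (negative = left), avatar_z: depth from the viewing position.
        dx = avatar_x - listener_x
        dz = avatar_z - listener_z
        distance = max(math.hypot(dx, dz), 1e-6)
        # Pan from -1.0 (fully left) to +1.0 (fully right), constant-power law.
        pan = max(-1.0, min(1.0, dx / distance))
        angle = (pan + 1.0) * math.pi / 4.0
        left_gain, right_gain = math.cos(angle), math.sin(angle)
        # Volume falls off with distance, so a farther avatar AT sounds quieter (cf. FIG. 13).
        attenuation = ref_distance / max(distance, ref_distance)
        return left_gain * attenuation, right_gain * attenuation

    print(localize_to_avatar(-1.0, 2.0))  # avatar on the left: left gain > right gain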
Note that when the avatar AT is located outside the field of view from the viewing position, localizing the sound image of the uttered voice outside the field of view allows the viewer to perceive that the distributor is located outside the field of view.
Note also that the sounds to be localized at the placement position of the avatar AT desirably include not only the distributor's uttered voice but also sounds in general generated by the distributor, such as when the distributor claps his or her hands.
FIG. 14 shows an example of the processing executed by the distributor terminal 2A in this embodiment, and FIG. 15 shows an example of the processing executed by the viewer terminal 4A. Note that the processing executed by the HM device 3 is the same as that shown in FIG. 8 described in the first embodiment, so description thereof will be omitted.
In each figure, processes similar to those in the first embodiment are given the same step numbers, and description thereof will be omitted as appropriate.
In step S211 of FIG. 14, the CPU 71 of the distributor terminal 2A determines whether or not IMU data has been received, and if it is determined that IMU data has been received, executes the processes of steps S203, S201, and S204 to transmit the physical information based on the IMU data, the sound data, and the video data to the viewer terminal 4A.
On the other hand, if it is determined that IMU data has not been received, the CPU 71 of the distributor terminal 2A proceeds to step S221 and determines whether or not first change information has been input. As described above, the first change information is, for example, operation information for changing the position of the avatar AT.
If it is determined that first change information has been input, the CPU 71 of the distributor terminal 2A transmits the first change information to the viewer terminal 4A in step S222. As a result, the first change information is transmitted to the viewer terminal 4A via the communication unit 8.
After completing the processes of steps S221 and S222, or after determining in step S221 that first change information has not been input, the process proceeds to step S212.
In step S212, the CPU 71 of the distributor terminal 2A determines whether or not sound data and video data have been received from the viewer terminal 4A, and if they have been received, executes the corresponding processes in steps S206 and S207.
Next, FIG. 15 will be described.
In step S321, the CPU 71 of the viewer terminal 4A determines whether or not at least part of the sound data, video data, and physical information has been received from the distributor terminal 2A.
If it is determined that they have been received, the CPU 71 of the viewer terminal 4A generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
Subsequently, the CPU 71 of the viewer terminal 4A identifies a placeable position based on the three-dimensional estimation result in step S306, and places the avatar AT at the placeable position in step S307.
In step S331, the CPU 71 of the viewer terminal 4A determines whether or not first change information has been received. In this determination process, it is determined that the first change information has been received not only when it is received from the distributor terminal 2A but also when it is received from the input unit 12.
If it is determined that the first change information has been received, the CPU 71 of the viewer terminal 4A performs a process of changing the placement position of the avatar AT in step S332, and proceeds to the process of step S308. At this time, it may be determined whether or not the changed placement position is a placeable position, and if it is not, a process of adjusting the new placement position of the avatar AT may be performed.
Note that if it is determined in step S331 that the first change information has not been received, the CPU 71 of the viewer terminal 4A proceeds to the process of step S308 without executing the process of step S332.
Description of the processes from step S308 onward will be omitted.
Note that the CPU 71 of the viewer terminal 4A may perform the determination process of step S331 without placing the avatar AT in step S307. Specifically, if it is determined in step S331 that the first change information has not been received, the avatar AT may be placed at a placeable position in step S332, and if it is determined that the first change information has been received, the avatar AT may be placed in step S332 taking the received first change information into account.
<3. Third embodiment>
The information processing system 1B according to the third embodiment differs from the previous examples in that the viewer can change the viewing position.
An example of the configuration of the information processing system 1B will be described with reference to FIG. 16. Note that components similar to those of the information processing system 1 shown in FIG. 1 are given the same reference numerals, and description thereof will be omitted as appropriate.
The information processing system 1B includes a distributor terminal 2, an HM device 3, and a viewer terminal 4B. The configurations of the distributor terminal 2 and the HM device 3 are the same as those of the first embodiment, so description thereof will be omitted.
The viewer terminal 4B includes an input unit 12, an avatar generation unit 13, a three-dimensional space generation unit 14, a display video generation unit 15, an output unit 16, and a communication unit 17.
The input unit 12 receives a viewing position change operation by the viewer, and supplies information on the change operation to the three-dimensional space generation unit 14 as second change information.
The three-dimensional space generation unit 14 sets the viewing position, that is, the position of the virtual camera, in the three-dimensional space generated by the three-dimensional estimation, taking the second change information into account.
By making the position of the virtual camera settable at an arbitrary position in this way, the viewer can move the position of the virtual camera by his or her own operations. As a result, the viewer can feel as if he or she is moving around the three-dimensional space with a certain degree of freedom, and can have an experience as if moving around the shooting location together with the distributor.
Note that as the deviation between the position of the virtual camera and the position of the camera unit at the time of shooting becomes larger, the sense of incongruity regarding the textures pasted onto the objects in the three-dimensional space becomes greater. A specific example is a case where the virtual camera is placed behind an object onto whose front a video shot from the front has been pasted as a texture.
Therefore, the position of the virtual camera may be settable within a predetermined range centered on the position of the camera unit at the time of shooting. This makes it possible to provide the viewer with a video in which spatial consistency is ensured to a certain extent.
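A hedged sketch of constraining the virtual camera within a predetermined range of the shooting camera position is shown below; the radius value is an assumption for illustration, not a disclosed parameter.

    import math

    def clamp_viewing_position(requested, shooting_camera, max_radius=1.5):
        # Keep the virtual camera within max_radius of the camera unit position at the
        # time of shooting so that texture artifacts remain acceptable.
        offset = [r - c for r, c in zip(requested, shooting_camera)]
        distance = math.sqrt(sum(o * o for o in offset))
        if distance <= max_radius:
            return tuple(requested)
        scale = max_radius / distance
        return tuple(c + o * scale for c, o in zip(shooting_camera, offset))

    print(clamp_viewing_position((3.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # pulled back to (1.5, 0.0, 0.0)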
FIG. 17 shows an example of the processing executed by the viewer terminal 4B in this embodiment.
Note that processes similar to those in the first embodiment are given the same step numbers, and description thereof will be omitted as appropriate.
In step S321, the CPU 71 of the viewer terminal 4B determines whether or not the distributor-side sound data, video data, and physical information have been received.
If it is determined that they have been received, the CPU 71 of the viewer terminal 4B generates an avatar AT reflecting the physical information in step S303, generates a three-dimensional space in step S304, and sets a viewing position in step S305.
Subsequently, in step S341, the CPU 71 of the viewer terminal 4B determines whether or not second change information has been received. If it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B performs a process of changing the viewing position in step S342.
As a result, the viewing position can be changed according to the viewer's operation, and the viewer can obtain an experience of moving around the three-dimensional space with a certain degree of freedom.
Note that the CPU 71 of the viewer terminal 4B may execute the process of step S341 without executing the process of step S305. In that case, when it is determined that the second change information has been received, the CPU 71 of the viewer terminal 4B sets the viewing position in step S342 taking the second change information into account, and when it is determined that the second change information has not been received, the CPU 71 determines the viewing position in step S342 as in the previous embodiments.
<4. Modified examples>
In the examples described above, the orientation of the distributor in real space and the orientation of the avatar AT are made to match. The orientation here refers to the orientation of the body, the orientation of the face, or the direction of the line of sight; however, the orientation of the distributor and that of the avatar AT may also be matched in terms of the orientation with respect to the object at which the distributor is gazing.
Specifically, as shown in FIG. 18, consider a situation in which the distributor is directing his or her body or face toward an object X in the three-dimensional space.
At this time, the orientation of the avatar AT in the three-dimensional space does not match the orientation of the distributor in real space; however, in the sense that both direct their body or face toward the object X being gazed at, the two orientations can be said to match. In this way, the orientation of the avatar AT may be matched to the orientation of the distributor in the sense of reproducing the situation of directly facing the object X.
To do this, the viewer terminal 4 (4A) identifies the object X at which the distributor is gazing based on the distributor's physical information. The avatar AT can then be placed so that its body or face faces the direction in which the object X exists.
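As an illustrative sketch only, turning the avatar AT toward the identified object X amounts to computing a yaw angle from the avatar position to the object position in the three-dimensional space; the coordinate convention (x to the right, z forward) is an assumption.

    import math

    def yaw_toward(avatar_position, object_position):
        # Returns the yaw angle (rotation about the vertical axis, in radians) that makes
        # the avatar AT face the object X identified from the distributor's physical information.
        dx = object_position[0] - avatar_position[0]
        dz = object_position[2] - avatar_position[2]
        return math.atan2(dx, dz)

    print(math.degrees(yaw_toward((0.0, 0.0, 0.0), (1.0, 0.0, 1.0))))  # 45 degrees to the right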
The avatar AT may take any form. For example, as described in each of the above examples, it may be a three-dimensional object onto which an image of the distributor is pasted as a texture, or, as shown in FIG. 19, it may be a three-dimensional object modeled on some character (a giant panda in FIG. 19).
Alternatively, the avatar AT may be a three-dimensional object having only an outline, as shown in FIG. 20, so that the scenery is hidden by the avatar AT as little as possible. This allows the viewer to enjoy the photographed scenery all the more.
Although the three-dimensional space generation unit 14 generates the three-dimensional space by performing image recognition processing using the captured images, the three-dimensional space may be generated using other methods.
For example, information on three-dimensional objects at the shooting location may be acquired from an external server device that provides a map service, the captured image may be aligned with those three-dimensional objects, and a three-dimensional space may then be generated by pasting partial images cut out from the captured image onto the three-dimensional objects as textures.
<5. Summary>
As described in each of the above examples, the viewer terminal 4 (4A, 4B) as an information processing device includes an avatar generation unit 13 that generates an avatar AT of the photographer (the above-mentioned distributor) by reflecting the physical information of the photographer, a three-dimensional space generation unit 14 that generates information on a three-dimensional space (for example, a 3D space model) from a captured image and places the avatar AT in the three-dimensional space according to the orientation of the photographer at the time of shooting, and a display video generation unit 15 that generates, as a display video, a video from a viewing position set in the three-dimensional space.
When the photographer (distributor) distributes video while taking a selfie while walking, the viewer viewing the video sees a video as if facing the photographer and backing away in the direction of the photographer's movement. This makes it difficult to feel as if the viewer is walking through the shooting location together with the photographer.
Furthermore, when the photographer performs distribution while shooting in the direction of movement, the photographer does not appear within the field of view, so again the viewer does not feel as if walking through the shooting location together with the photographer.
Therefore, the viewer terminal 4 (4A, 4B) as the information processing device of the present technology generates an avatar AT reflecting the physical information of the photographer, places it in the three-dimensional space generated from the captured image, and presents the resulting video to the viewer. The orientation of the placed avatar AT corresponds to the orientation of the photographer. For example, the avatar AT is placed so that the orientation of the photographer matches the orientation of the avatar AT.
This allows the viewer to feel as if they are walking around the filming location together with the photographer.
Therefore, for example, it is possible to obtain a sense of unity as if traveling together while staying at home, eliminating a feeling of alienation. In addition, since the photographer's physical information is reflected in the avatar AT, the photographer's posture, gestures, and the like are reflected in the avatar AT, so that the viewer can naturally grasp the timing for speaking to the photographer, which contributes to smooth communication.
Note that in each of the above examples, the viewer terminal 4 (4A, 4B) includes the avatar generation unit 13, the three-dimensional space generation unit 14, and the display video generation unit 15; however, other configurations are also possible.
For example, the distributor terminal 2 (2A) of the information processing system 1 (1A, 1B) may include these units, or an information processing device serving as a server device included in the information processing system 1 may include these units. That is, the above-described configuration may be realized in the form of cloud computing.
Note that the orientation of the avatar AT and the orientation of the photographer do not need to match completely. For example, the photographer's orientation may be snapped to one of discrete directions obtained by dividing 360 degrees into four or eight equal parts, and the orientation of the avatar AT may be set to that snapped direction, so that the photographer's orientation and the avatar AT's orientation substantially match. Even in this mode, the viewer can feel as if he or she is acting together with the photographer at the travel destination.
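A minimal sketch of snapping the photographer's orientation to discrete directions, as mentioned above, is given below for illustration; the eight-direction default is just one of the options named in the description.

    def snap_orientation(yaw_degrees, divisions=8):
        # Quantize a yaw angle to the nearest of `divisions` equally spaced directions
        # (e.g. four or eight), and use the snapped value as the orientation of the avatar AT.
        step = 360.0 / divisions
        return (round(yaw_degrees / step) * step) % 360.0

    print(snap_orientation(37.0))       # -> 45.0 with eight divisions
    print(snap_orientation(37.0, 4))    # -> 0.0 with four divisions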
As described with reference to FIGS. 4 and 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the placement so that the orientation of the photographer's (distributor's) body and the orientation of the avatar AT's body match.
This makes it possible to match the moving direction of the photographer with the moving direction of the avatar AT, and to place the avatar AT in the captured video without causing any sense of incongruity.
Note that, as shown in FIG. 18, matching of orientations may refer to facing the direction of a specific object.
As described with reference to FIGS. 4 and 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the placement so that the orientation of the photographer's (distributor's) face and the orientation of the avatar AT's face match.
This makes it possible to match the object that the photographer is looking at with the object that lies ahead in the direction of the avatar AT's face. Accordingly, the viewer can appropriately grasp the object in which the photographer has shown interest, which enables smooth communication.
As described with reference to FIGS. 4 and 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may perform the placement so that the direction of the photographer's (distributor's) line of sight and the direction of the avatar AT's line of sight match.
This allows the viewer to grasp more accurately the object that the photographer is looking at, which facilitates smooth communication. That is, if the object that the photographer is gazing at is known, a conversation about that object can be held, and precisely because it is the object being gazed at, the conversation can be expected to develop.
As described in the third embodiment with reference to FIG. 16 and the like, the viewing position set by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be settable at a position different from the shooting viewpoint at the time of shooting (that is, the position of the camera unit of the distributor terminal 2 (2A) at the time of shooting).
That is, the viewing position can be set at an arbitrary position in the generated three-dimensional space that is different from the camera position at the time of shooting. As a result, the viewer can obtain the feeling of freely moving around the photographer (distributor) by freely moving the viewing position at his or her own will. Accordingly, it is possible to obtain a stronger sense of acting together with the photographer.
As described in the third embodiment with reference to FIG. 16 and the like, the viewing position set by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the display video.
By freely moving the viewing position at his or her own will, the viewer can obtain the feeling of freely moving around the photographer (distributor).
As described above, the physical information received by the viewer terminal 4 (4A, 4B) includes the movement speed of the photographer (distributor), and the avatar generation unit 13 of the viewer terminal 4 (4A, 4B) may generate the avatar AT by reflecting the movement speed.
This makes it possible to display an avatar AT appropriate for the movement of the video. Specifically, a running avatar AT is displayed when the background in the video is moving fast, and a standing avatar AT is displayed when the background has stopped moving.
As described with reference to FIG. 4 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may identify, as a placeable position, a position in the three-dimensional space at which the avatar AT can be placed based on the result of image recognition performed on the captured image, and place the avatar AT at the placeable position.
This makes it possible to place the avatar AT at a natural position. That is, it is possible to provide the viewer with a video in which spatial consistency is ensured.
As described with reference to FIG. 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) need not place the avatar AT within the field of view of the viewing position when a placeable position cannot be identified.
For example, when there is no space in the three-dimensional space suitable for placing the avatar AT, particularly within the field of view from the viewing position, the avatar AT is not placed within the field of view of the viewing position, that is, at a position visible to the viewer.
This makes it possible to prevent the viewer from seeing an avatar AT placed at an unnatural position.
As described with reference to FIG. 5 and the like, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may determine that a placeable position cannot be identified when there is no surface facing vertically upward within the field of view from the viewing position.
When no appropriate space such as the ground exists, not placing the avatar AT within the field of view makes it possible to prevent the avatar AT from being placed unnaturally.
As described with reference to FIG. 5 and the like, the display video generation unit 15 of the viewer terminal 4 (4A, 4B) may display the avatar AT at a predetermined position on the display screen (for example, the upper right corner) when the three-dimensional space generation unit 14 determines not to place the avatar AT within the field of view of the viewing position.
By displaying the avatar AT of the photographer (distributor) in a separate frame or the like, for example, at a corner of the display screen, the viewer can check the state of the photographer while any sense of incongruity regarding how the avatar AT is placed is eliminated.
As described above, the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may place the avatar AT outside the field of view of the viewing position when a placeable position cannot be identified.
By placing the avatar AT outside the field of view, an avatar AT placed at an unnatural position is not presented to the viewer. In addition, by localizing the sound image of the voice or other sounds emitted by the photographer (distributor) at the position of the avatar AT placed outside the field of view, the viewer can naturally perceive that the photographer's avatar AT is placed at a position that cannot be seen.
As described with reference to FIG. 4 and the like, the avatar AT generated at the viewer terminal 4 (4A, 4B) may be generated based on an image obtained by photographing the photographer (distributor).
This allows the viewer to accept the photographer's avatar AT without a sense of incongruity.
As described with reference to FIGS. 11, 12, 13 and the like, the viewer terminal 4 (4A, 4B) may include an acoustic signal generation unit 19 that generates an acoustic signal in which the sound image of the sound generated from the photographer (distributor) is localized at the position of the avatar AT in the three-dimensional space.
As a result, for example, the voice uttered by the photographer is heard from the position of the avatar AT, so the viewer can accept, without a sense of incongruity, that the photographer is present at the position of the avatar AT.
As described with reference to FIGS. 11, 12, 13 and the like, the acoustic signal generation unit 19 of the viewer terminal 4 (4A, 4B) may generate the acoustic signal so that the volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar AT.
This makes it possible to generate an appropriate acoustic signal according to the distance to the avatar AT, so that the viewer can perceive the voice of the photographer (distributor) without a sense of incongruity.
As described with reference to FIG. 11 and the like, the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the viewer viewing the display video.
This allows the viewer to move away from or approach the photographer (distributor) in the three-dimensional space.
As described with reference to FIG. 11 and the like, the position of the avatar AT placed by the three-dimensional space generation unit 14 of the viewer terminal 4 (4A, 4B) may be changeable by an operation of the photographer (distributor).
This allows the photographer (distributor) to move away from or approach the viewer in the three-dimensional space.
In this way, the distance between the photographer and the viewer is not constant but changes intermittently, which is preferable because it gives the viewer the feeling of actually walking through the shooting location.
As described with reference to FIG. 1 and the like, the avatar generation unit 13 of the viewer terminal 4 (4A, 4B) may perform the process of reflecting the physical information on the avatar AT multiple times in the time direction.
This makes it possible, for example, to periodically reflect the behavior of the photographer (distributor) on the avatar AT.
The information processing method according to the embodiment is a method in which an information processing device executes: a process of generating an avatar AT of the photographer (distributor) by reflecting the physical information of the photographer; a process of generating information on a three-dimensional space from a captured image and placing the avatar AT in the three-dimensional space so that the orientation of the photographer at the time of shooting and the orientation of the avatar AT match; and a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
The program according to the embodiment causes a computer device to execute a function of generating an avatar AT of the photographer by reflecting the physical information of the photographer (distributor), a function of generating three-dimensional space information from a captured image and arranging the avatar AT in the three-dimensional space so that the orientation of the photographer at the time of shooting and the orientation of the avatar AT match, and a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
That is, the program causes a computer device to execute at least a part of each process shown in FIGS. 7, 10, 15, and 17.
Such an information processing method and program can also provide the same operations and effects as the viewer terminal 4 (4A, 4B) of the embodiments and modifications described above.
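To make the three steps above concrete, here is a minimal end-to-end sketch in Python. It is a non-normative outline: the helpers `generate_avatar`, `estimate_3d_space`, `place_avatar`, and `render_view` are stubs introduced for illustration, and their names and signatures are assumptions rather than anything defined in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BodyInfo:
    orientation: float        # photographer's orientation at shooting time (degrees)
    speed: float = 0.0        # moving speed of the photographer

@dataclass
class Avatar:
    orientation: float = 0.0

@dataclass
class Space3D:
    source_image: object = None
    avatar: Optional[Avatar] = None

def generate_avatar(body_info: BodyInfo) -> Avatar:
    # Step 1: generate the avatar, reflecting the photographer's physical information.
    return Avatar(orientation=body_info.orientation)

def estimate_3d_space(captured_image) -> Space3D:
    # Step 2a: placeholder for generating three-dimensional space information
    # from the captured image (e.g. by depth estimation or structure from motion).
    return Space3D(source_image=captured_image)

def place_avatar(space: Space3D, avatar: Avatar, orientation: float) -> None:
    # Step 2b: arrange the avatar so its orientation matches the photographer's.
    avatar.orientation = orientation
    space.avatar = avatar

def render_view(space: Space3D, viewing_position) -> dict:
    # Step 3: placeholder renderer returning a description of the display frame.
    return {"viewing_position": viewing_position, "avatar": space.avatar}

def generate_display_video_frame(captured_image, body_info: BodyInfo, viewing_position):
    """One frame of the pipeline: the three processes executed in order."""
    avatar = generate_avatar(body_info)
    space = estimate_3d_space(captured_image)
    place_avatar(space, avatar, body_info.orientation)
    return render_view(space, viewing_position)
```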
Note that the effects described in this specification are merely examples and are not limiting, and other effects may also be obtained.
In addition, the examples described above may be combined in any manner, and the various operations and effects described above can still be obtained even when such combinations are used.
<6. The present technology>
(1)
An information processing device comprising:
an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer;
a three-dimensional space generation unit that generates three-dimensional space information from a captured image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
(2)
The information processing device according to (1) above, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's body and the orientation of the avatar's body match.
(3)
The information processing device according to (1) or (2) above, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's face and the orientation of the avatar's face match.
(4)
The information processing device according to (1) or (2) above, wherein the three-dimensional space generation unit performs the arrangement so that the direction of the photographer's line of sight and the direction of the avatar's line of sight match.
(5)
The information processing device according to any one of (1) to (4) above, wherein the viewing position can be set to a position different from the shooting viewpoint at the time of shooting.
(6)
The information processing device according to (5) above, wherein the viewing position can be changed by an operation of a viewer viewing the display video.
(7)
The information processing device according to any one of (1) to (6) above, wherein the physical information includes a moving speed of the photographer, and the avatar generation unit generates the avatar by reflecting the moving speed.
(8)
The information processing device according to any one of (1) to (7) above, wherein the three-dimensional space generation unit specifies, as a placeable position, a position in the three-dimensional space where the avatar can be placed on the basis of a result of image recognition on the captured image, and places the avatar at the placeable position (a sketch of this check follows this list).
(9)
The information processing device according to (8) above, wherein the three-dimensional space generation unit does not place the avatar within the angle of view of the viewing position when the placeable position cannot be specified.
(10)
The information processing device according to (9) above, wherein the three-dimensional space generation unit determines that the placeable position cannot be specified when no vertically upward-facing surface exists within the angle of view.
(11)
The information processing device according to (9) or (10) above, wherein the display video generation unit displays the avatar at a predetermined position on the display screen when the three-dimensional space generation unit determines that the avatar is not to be placed within the angle of view.
(12)
The information processing device according to (9) or (10) above, wherein the three-dimensional space generation unit places the avatar outside the angle of view of the viewing position when the placeable position cannot be specified.
(13)
The information processing device according to any one of (1) to (12) above, wherein the avatar is generated on the basis of an image obtained by photographing the photographer.
(14)
The information processing device according to any one of (1) to (13) above, further comprising an audio signal generation unit that generates an acoustic signal in which a sound image of a sound produced by the photographer is localized at the position of the avatar in the three-dimensional space.
(15)
The information processing device according to (14) above, wherein the audio signal generation unit generates the acoustic signal so that its volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar.
(16)
The information processing device according to (15) above, wherein the position of the avatar can be changed by an operation of a viewer viewing the display video.
(17)
The information processing device according to (15) or (16) above, wherein the position of the avatar can be changed by an operation of the photographer.
(18)
The information processing device according to any one of (1) to (17) above, wherein the avatar generation unit performs the process of reflecting the physical information on the avatar multiple times in the time direction.
(19)
An information processing method in which an information processing device executes:
a process of generating an avatar of a photographer by reflecting physical information of the photographer;
a process of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
(20)
A program that causes a computer device to execute:
a function of generating an avatar of a photographer by reflecting physical information of the photographer;
a function of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
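As noted in item (8) above, items (8) to (10) describe identifying a placeable position from image recognition and refusing in-view placement when no vertically upward-facing surface is found within the angle of view. The following Python sketch shows one way such a check could look; the `Surface` representation, the normal-vector threshold, and the name `find_placeable_position` are assumptions introduced for this example and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Surface:
    center: Vec3        # a representative point on the recognized surface
    normal: Vec3        # unit normal vector of the surface
    in_view: bool       # whether the surface lies within the angle of view

def _points_up(normal: Vec3, threshold: float = 0.9) -> bool:
    # A surface is treated as "vertically upward-facing" when its unit normal
    # is close enough to the world up vector (0, 1, 0).
    return normal[1] >= threshold

def find_placeable_position(surfaces: Iterable[Surface]) -> Optional[Vec3]:
    """Return a placeable position for the avatar, or None if no
    upward-facing surface exists within the angle of view."""
    for surface in surfaces:
        if surface.in_view and _points_up(surface.normal):
            return surface.center
    return None  # the caller then avoids placing the avatar within the view

# Example: a wall (normal sideways) and a floor patch (normal up).
surfaces = [
    Surface(center=(0.0, 1.0, 2.0), normal=(1.0, 0.0, 0.0), in_view=True),
    Surface(center=(0.5, 0.0, 3.0), normal=(0.0, 1.0, 0.0), in_view=True),
]
print(find_placeable_position(surfaces))  # -> (0.5, 0.0, 3.0)
```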
4, 4A, 4B Viewer terminal
13 Avatar generation unit
14 Three-dimensional space generation unit
15 Display video generation unit
19 Audio signal generation unit
71 CPU
AT Avatar

Claims (20)

  1.  An information processing device comprising:
      an avatar generation unit that generates an avatar of a photographer by reflecting physical information of the photographer;
      a three-dimensional space generation unit that generates three-dimensional space information from a captured image and arranges the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
      a display video generation unit that generates, as a display video, a video from a viewing position set in the three-dimensional space.
  2.  The information processing device according to claim 1, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's body and the orientation of the avatar's body match.
  3.  The information processing device according to claim 1, wherein the three-dimensional space generation unit performs the arrangement so that the orientation of the photographer's face and the orientation of the avatar's face match.
  4.  The information processing device according to claim 1, wherein the three-dimensional space generation unit performs the arrangement so that the direction of the photographer's line of sight and the direction of the avatar's line of sight match.
  5.  The information processing device according to claim 1, wherein the viewing position can be set to a position different from the shooting viewpoint at the time of shooting.
  6.  The information processing device according to claim 5, wherein the viewing position can be changed by an operation of a viewer viewing the display video.
  7.  The information processing device according to claim 1, wherein the physical information includes a moving speed of the photographer, and the avatar generation unit generates the avatar by reflecting the moving speed.
  8.  The information processing device according to claim 1, wherein the three-dimensional space generation unit specifies, as a placeable position, a position in the three-dimensional space where the avatar can be placed on the basis of a result of image recognition on the captured image, and places the avatar at the placeable position.
  9.  The information processing device according to claim 8, wherein the three-dimensional space generation unit does not place the avatar within the angle of view of the viewing position when the placeable position cannot be specified.
  10.  The information processing device according to claim 9, wherein the three-dimensional space generation unit determines that the placeable position cannot be specified when no vertically upward-facing surface exists within the angle of view.
  11.  The information processing device according to claim 9, wherein the display video generation unit displays the avatar at a predetermined position on the display screen when the three-dimensional space generation unit determines that the avatar is not to be placed within the angle of view.
  12.  The information processing device according to claim 9, wherein the three-dimensional space generation unit places the avatar outside the angle of view of the viewing position when the placeable position cannot be specified.
  13.  The information processing device according to claim 1, wherein the avatar is generated on the basis of an image obtained by photographing the photographer.
  14.  The information processing device according to claim 1, further comprising an audio signal generation unit that generates an acoustic signal in which a sound image of a sound produced by the photographer is localized at the position of the avatar in the three-dimensional space.
  15.  The information processing device according to claim 14, wherein the audio signal generation unit generates the acoustic signal so that its volume corresponds to the distance between the viewing position set in the three-dimensional space and the position of the avatar.
  16.  The information processing device according to claim 15, wherein the position of the avatar can be changed by an operation of a viewer viewing the display video.
  17.  The information processing device according to claim 15, wherein the position of the avatar can be changed by an operation of the photographer.
  18.  The information processing device according to claim 1, wherein the avatar generation unit performs the process of reflecting the physical information on the avatar multiple times in the time direction.
  19.  An information processing method in which an information processing device executes:
      a process of generating an avatar of a photographer by reflecting physical information of the photographer;
      a process of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
      a process of generating, as a display video, a video from a viewing position set in the three-dimensional space.
  20.  A program that causes a computer device to execute:
      a function of generating an avatar of a photographer by reflecting physical information of the photographer;
      a function of generating three-dimensional space information from a captured image and arranging the avatar in the three-dimensional space according to the orientation of the photographer at the time of shooting; and
      a function of generating, as a display video, a video from a viewing position set in the three-dimensional space.
PCT/JP2023/027306 2022-08-10 2023-07-26 Information processing device, information processing method, and program WO2024034396A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-128046 2022-08-10
JP2022128046 2022-08-10

Publications (1)

Publication Number Publication Date
WO2024034396A1 true WO2024034396A1 (en) 2024-02-15

Family

ID=89851574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/027306 WO2024034396A1 (en) 2022-08-10 2023-07-26 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2024034396A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013047962A (en) * 2012-10-01 2013-03-07 Olympus Imaging Corp Reproduction display device, reproduction display program, reproduction display method, and image processing server
JP2017076998A (en) * 2016-11-22 2017-04-20 オリンパス株式会社 Image processing device, image processing method, and program
JP2020087277A (en) * 2018-11-30 2020-06-04 株式会社ドワンゴ Video synthesizer, method for synthesizing video, and video synthesizing program


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23852372

Country of ref document: EP

Kind code of ref document: A1