WO2022102446A1 - Information processing device, method, and system, and data generation method - Google Patents

Information processing device, method, and system, and data generation method

Info

Publication number
WO2022102446A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
event
data
sound
information processing
Prior art date
Application number
PCT/JP2021/040217
Other languages
English (en)
Japanese (ja)
Inventor
正宏 高橋
幸子 西出
実祐樹 白川
崇基 津田
Original Assignee
ソニーグループ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニーグループ株式会社 filed Critical ソニーグループ株式会社
Publication of WO2022102446A1

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50 Controlling the output signals based on the game progress
    • A63F13/53 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
    • A63F13/54 Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/65 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor automatically by game devices or servers from real world data, e.g. measurement in live racing competition
    • A63G MERRY-GO-ROUNDS; SWINGS; ROCKING-HORSES; CHUTES; SWITCHBACKS; SIMILAR DEVICES FOR PUBLIC AMUSEMENT
    • A63G31/00 Amusement arrangements
    • A63G31/16 Amusement arrangements creating illusions of travel
    • A63J DEVICES FOR THEATRES, CIRCUSES, OR THE LIKE; CONJURING APPLIANCES OR THE LIKE
    • A63J99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/16 Sound input; Sound output
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/91 Television signal processing therefor
    • H04N5/92 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N5/93 Regeneration of the television signal or of selected parts thereof

Definitions

  • This technology relates to an information processing device, an information processing method, an information processing system, and a data generation method capable of providing augmented reality content.
  • Patent Document 1 describes a system that provides content in real time according to the movement of the user's body.
  • In this system, the timing of a predetermined state in the user's movement is predicted, and the content is played back at the predicted timing. This makes it possible to reproduce, for example, virtual footstep content at an appropriate timing in accordance with the user's walking (paragraphs [0016], [0020], [0028], and [0036] of the specification of Patent Document 1, FIG. 1, FIG. 4, etc.).
  • As in Patent Document 1, methods of augmenting the sense of reality by reproducing sound have been developed. While users can individually experience augmented reality (AR) with such sounds, it may be difficult to convey the content of the experience to others. There is therefore a demand for a technique capable of sharing an augmented reality experience by sound.
  • The purpose of the present technology is to provide an information processing device, an information processing method, an information processing system, and a data generation method capable of sharing an augmented reality experience by sound.
  • The information processing apparatus includes an acquisition unit and a generation unit.
  • The acquisition unit acquires event information in which audiovisual data including sound data presented at the event is recorded along a timeline for each event that occurs while the user is experiencing augmented reality by sound.
  • Based on the event information, the generation unit generates reproduction data for reproducing the audiovisual data with reference to the timeline.
  • In this information processing apparatus, event information that records the events that occurred during the augmented reality experience by sound is acquired.
  • In the event information, audiovisual data including the sound data presented at each event is recorded along the timeline.
  • Based on this information, reproduction data for reproducing the audiovisual data on the timeline is generated. This makes it possible to reproduce the user's experience and to share the augmented reality experience by sound.
  • the audiovisual data may include visual data related to the event.
  • the visual data may be data that visually represents the content of the sound data.
  • the visual data may be data that visually represents the content of the sound or environmental sound picked up when the event occurs.
  • the event information may include the detection timing of the trigger that causes the event in the timeline.
  • the event information may include the presentation timing of the audiovisual data at the event.
  • the generation unit may set the timing at which the audiovisual data is reproduced in the reproduction data based on the presentation timing of the audiovisual data.
  • the presentation timing of the audiovisual data may be recorded as a time relative to the detection timing of the trigger.
  • the event information may include the presentation position of the audiovisual data in the event.
  • the generation unit may set a position in which the audiovisual data is reproduced in the reproduction data based on the presentation position of the audiovisual data.
  • the audiovisual data may be data generated based on the situation information representing the situation when the event occurs, or data selected based on the situation information.
  • the status information may include information about at least one of the user's surrounding environment, the user's status, and the user's operation content.
  • the acquisition unit may acquire the event information regarding an event targeting the first user.
  • The generation unit may generate, as the reproduction data, shared data that is presented to a second user different from the first user so that the event targeting the first user can be viewed.
  • The generation unit may generate the shared data when at least one of the relative distance between the first user and the second user and the affiliations of the first user and the second user satisfies a predetermined condition.
  • The generation unit may adjust at least one of the reproduction volume and the reproduction direction of the sound data in the shared data, based on the relative positional relationship between the presentation position of the sound data in the event targeting the first user and the position of the second user.
  • the generation unit may generate reproduction content that reproduces the event along the timeline as the reproduction data.
  • the acquisition unit may acquire an experience image taken when the user experiences augmented reality with the sound.
  • the generation unit may generate the reproduction content by adding an effect based on the audiovisual data to the experience image along the timeline.
  • the experience image may be an image taken by at least one of a camera carried by a person who has experienced augmented reality with the sound including the user, or a camera arranged around the experience person.
  • The generation unit may adjust at least one of the reproduction volume and the reproduction direction of the sound data in the reproduction content, based on the relative positional relationship between the presentation position of the sound data in the event and the shooting position of the experience image.
  • The information processing method is an information processing method executed by a computer system, and includes acquiring event information in which audiovisual data including the sound data presented at the event is recorded along a timeline for each event that occurs while the user is experiencing augmented reality by sound, and generating, based on the event information, reproduction data for reproducing the audiovisual data with reference to the timeline.
  • the information processing system has a recording unit, an acquisition unit, and a generation unit.
  • the recording unit generates event information by recording audiovisual data including sound data presented at the event along a timeline for each event that occurs while the user is experiencing augmented reality by sound.
  • the acquisition unit acquires the event information.
  • Based on the event information, the generation unit generates reproduction data for reproducing the audiovisual data with reference to the timeline.
  • The data generation method is a data generation method executed by a computer system, and includes acquiring event information in which audiovisual data including the sound data presented at the event is recorded along a timeline for each event that occurs while the user is experiencing augmented reality by sound, and generating, based on the event information, reproduction data for reproducing the audiovisual data with reference to the timeline. A rough sketch of this flow is given below.
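As a rough, non-authoritative illustration of the flow summarized above (all type and function names here are hypothetical and not taken from the disclosure), event records carry timeline-referenced timings, and reproduction data is derived from them with reference to the same timeline:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical types sketching the described flow: event information recorded
# along a timeline is acquired, and reproduction data referencing the same
# timeline is generated from it.

@dataclass
class AudiovisualData:
    sound_id: str                      # reference into a sound library
    visual_id: Optional[str] = None    # optional related visual data

@dataclass
class EventRecord:
    trigger_time: float                # trigger detection timing on the timeline [s]
    presentation_offset: float         # presentation timing relative to the trigger [s]
    presentation_position: Optional[Tuple[float, float, float]]  # where the data is localized
    av_data: AudiovisualData

@dataclass
class PlaybackInstruction:
    start_time: float                  # when to reproduce, on the same timeline
    position: Optional[Tuple[float, float, float]]
    av_data: AudiovisualData

def generate_reproduction_data(events: List[EventRecord]) -> List[PlaybackInstruction]:
    """Turn recorded event information into timeline-referenced playback data."""
    return [
        PlaybackInstruction(
            start_time=e.trigger_time + e.presentation_offset,
            position=e.presentation_position,
            av_data=e.av_data,
        )
        for e in events
    ]
```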
  • FIG. 1 is a schematic diagram showing a configuration example of an information processing system according to a first embodiment of the present technology.
  • The information processing system 100 is a system that provides an AR experience by sound (sound AR) to a plurality of users 1.
  • the information processing system 100 includes a server device 20 and at least one terminal device 30.
  • the server device 20 is a device that manages the entire AR experience by sound.
  • the terminal device 30 is a device carried and used by each user 1.
  • the server device 20 and each terminal device 30 are communicably connected via the network 50.
  • As the network 50, for example, a connection line using the Internet, a private network, or the like is used.
  • FIG. 2 is a schematic diagram showing an outline of AR by sound.
  • AR by sound is augmented reality (AR) that allows user 1 to perceive virtual objects and interactions by superimposing sounds such as voices and sound effects (AR sound 3) on the real world.
  • AR by sound is provided, for example, as an attraction at a theme park or at various hands-on events.
  • The user 1 who participates in the attraction moves around the venue carrying a terminal device 30 to which the earphone 2 is attached, as shown in FIG. 2, for example.
  • As the earphone 2, for example, an open-type earphone configured so that external sound can be heard is used.
  • Alternatively, a closed-type earphone capable of capturing external sound may be used. This makes it possible to superimpose the reproduced AR sound 3 on the sound of the real world and present it.
  • A playback device such as headphones may also be used.
  • In AR by sound, not only the AR sound 3 but also an AR visual may be reproduced.
  • The AR visual is, for example, visual information such as a virtual character superimposed on a real-world image or a visual effect added to a real-world image.
  • For example, a real-world image taken by a camera mounted on the terminal device 30 is displayed on the display of the terminal device 30.
  • The AR visual is superimposed on this real-world image.
  • As the AR visual, information related to the AR sound 3 (for example, a character) or information representing the contents of the AR sound 3 (for example, an effect representing an onomatopoeia) may be displayed.
  • the AR visual does not necessarily have to be related to the AR sound 3.
  • the position information and actions of the user 1 are appropriately detected.
  • the position information of the user 1 is detected using Bluetooth (registered trademark), a beacon, GPS, or the like, and the behavior of the user 1 is detected using a motion sensor such as an IMU (inertial measurement unit).
  • the experience of the user 1 is produced by playing back the AR sound 3 in the real world by using the position information, the action, and the like of the user 1 detected in this way as a trigger. Therefore, the content experienced by the user 1 is different for each user 1.
  • a virtual footstep 3a and a virtual voice 3b are reproduced as the AR sound 3.
  • AR sounds are schematically illustrated using musical notes.
  • a virtual footstep 3a (sound effect) is presented in accordance with the movement operation of the user 1.
  • a virtual footstep 3a is reproduced from the earphone 2 in accordance with the landing timing.
  • This makes it possible to perceive the sensation of walking on a virtual ground such as a snow field or a puddle, or to perceive the sensation of walking as a virtual character.
  • a virtual voice 3b localized at a predetermined position in the real world is presented.
  • For example, the virtual voice 3b localized on the bench 51 installed in the real world is reproduced.
  • The virtual voice 3b is reproduced so that it can be heard from behind the user 1, for example.
  • The volume of the virtual voice 3b is adjusted so as to become louder as the user 1 approaches the bench 51. This makes it possible to experience the sensation as if an invisible character exists on the bench 51.
  • the content of AR by sound is not limited, and it is possible to design arbitrary content using AR sound such as voice, sound effect, and BGM.
  • a session is a group of events that occur from the start to the end of a series of scenarios.
  • the period from the start to the end of the scenario may be referred to as a session.
  • At least one event occurs while the scenario is in progress, triggered by the position information, action, and the like of the user 1.
  • AR sound 3 will be presented in time with this event.
  • AR visuals may be presented according to the event. That is, it can be said that the event is a series of processes that occur during the session and add data such as AR sound 3 and AR visual to the real world.
  • a session is also a set of events that includes at least one event.
  • one event occurs every time the foot of the user 1 lands, and the virtual footstep 3a is reproduced.
  • the event will be generated as many times as the number of times the user 1 performs the landing operation, and one session will include many events.
  • one event is generated triggered by the action of the user 1, and the virtual voice 3b is reproduced.
  • an event that presents the next voice or the like according to the scenario can be started.
  • a plurality of events designed according to the scenario are sequentially generated according to the user's behavior and the like, and the scenario progresses.
  • a set of events that occur sequentially is a session.
  • the user 1 may participate in the attraction alone or in a group.
  • The attraction may also be accompanied by a guide. It can be said that the user 1 is a person who experiences AR by sound (a participant in the attraction).
  • the guide is a staff member who is in charge of the progress of the scenario while sharing the experience in the same group as the user 1.
  • a group is an organization that includes a plurality of users 1 who share an experience in the same session. If a guide is attached, the group consisting of user 1 and the guide is a group.
  • For example, when a certain user 1 triggers an event, an AR sound 3 (for example, a virtual footstep) is reproduced.
  • The AR sound 3 is played back through the earphone 2 and is presented only to the user 1 who performed the triggering operation.
  • For this reason, other users 1 who participate in the same session do not perceive the content experienced by the user 1 who generated the event.
  • Thus, in AR by sound, it is conceivable that the experiences of individual users 1 are not shared even among users 1 who are participating in the same session.
  • FIG. 3 is a schematic diagram showing an event information generation flow.
  • the event information 10 is information that records the contents of the event E that occurs during the session.
  • the event information 10 is used to execute a process of presenting an event E generated for each user 1 so that another user 1 can view the event E.
  • the generation flow of the event information 10 will be described with reference to FIG.
  • AR data 11 is presented for each event E.
  • the AR data 11 includes auditory data (data of AR sound 3) superimposed on the real world.
  • the AR data 11 includes visual data related to the event E (data of the AR visual 4).
  • the AR data 11 corresponds to audiovisual data.
  • FIG. 3 schematically illustrates the sound library 13 and the visual library 14 as libraries of AR data 11 presented during the session.
  • the sound library 13 is a database in which the data of the AR sound 3 created in advance is stored.
  • The AR sound 3 is a general term for sounds generated in the virtual space in AR by sound.
  • Hereinafter, the data of the AR sound 3 may be simply referred to as the AR sound 3.
  • The data of the AR sound 3 corresponds to the sound data.
  • the AR sound 3 includes, for example, sound effects, narration, BGM, and the like.
  • a sound effect is, for example, a sound that is played in response to a trigger detected during a session.
  • the virtual footsteps described with reference to FIG. 2 are examples of sound effects.
  • Narration is, for example, voice in a language used for the progress of a scenario.
  • the virtual voice described with reference to FIG. 2 is an example of narration.
  • BGM is, for example, a sound that is reproduced regardless of a trigger. The BGM may also be changed according to a trigger. Music data, environmental sounds, and the like are used as BGM.
  • the visual library 14 is a database in which the data of the AR visual 4 produced in advance is stored.
  • AR visual 4 is a general term for visual information virtually added to an object or sound, and includes image data and effect data accompanied by visual effects.
  • Hereinafter, the data of the AR visual 4 may be simply referred to as the AR visual 4.
  • the data of AR visual 4 corresponds to visual data.
  • the AR visual 4 is presented so that it can be viewed in real time through the camera.
  • the AR visual 4 includes, for example, graphic data of a virtual character, text data such as utterances and onomatopoeia, effect data imitating a natural phenomenon, digital effect data, and the like.
  • the graphic data of the virtual character is, for example, 3D model data representing the virtual character or two-dimensional image data.
  • Text data such as utterances and onomatopoeia are data (fonts, animations, etc.) that display texts that express utterance contents and texts that express sound effects by onomatopoeia, for example.
  • the data of the effect imitating a natural phenomenon is image data expressing a natural phenomenon such as rain, wind, thunder, snow, petals, and fallen leaves.
  • the digital effect data is, for example, data that specifies image processing. For example, data that specifies a visual effect such as blurring an image or vibrating the entire screen is used.
  • the AR sound 3 and the AR visual 4 are stored in association with each other.
  • the AR sound 3 representing a virtual footstep is stored in association with the AR visual 4 representing the onomatopoeia of the footstep as text.
  • the AR visual 4 of the virtual character and the AR sound 3 representing the sound thereof are stored in association with each other.
  • the AR visual 4 includes data that visually represents the contents of the AR sound 3.
  • FIG. 3 schematically shows the video library 15 and the event library 16, respectively.
  • the video library 15 and the event library 16 are databases in which data generated during the session are stored, and are sequentially constructed as the session progresses.
  • the video library 15 is a database in which the captured video 5 is stored.
  • the captured image 5 is an image captured when the user 1 experiences AR by sound. That is, the captured image 5 is captured while the session is in progress (during the actual experience of the user).
  • the captured image 5 is an image taken by a camera (hereinafter referred to as a portable camera) carried by a person who has experienced AR by sound including the user 1.
  • the portable camera is provided in, for example, the terminal device 30 carried by each user 1.
  • The captured image 5 may also be an image taken by a camera (hereinafter referred to as a fixed point camera) arranged around the experiencer.
  • the fixed point camera may be fixed like a streetlight camera or may be movable.
  • the captured image 5 may include a still image in addition to the moving image. Further, the user 1 shown in the captured image 5 can be identified by the position information of the user 1, subject recognition, and the like. In the present embodiment, the captured image 5 corresponds to an experience image.
  • the event library 16 is a database in which event information 10 is stored.
  • the event information 10 is information in which the contents of each event that occurs during the session are recorded along the timeline.
  • the event information 10 is generated by each terminal device 30. Then, the event information 10 transmitted from each terminal device 30 is stored in the event library 16.
  • AR sound 3 is presented when an event occurs.
  • In the event information 10, the AR sound 3 presented at the event in this way is recorded.
  • In addition, the AR visual 4 related to the AR sound 3 is recorded in the event information 10. Note that this AR visual 4 does not have to be the one actually presented during the event.
  • Time information representing a time on the timeline 6 is added to the event information 10.
  • In FIG. 3, the timeline 6 is schematically illustrated with a thick black arrow.
  • The timeline 6 is, for example, a time axis within one session.
  • For example, a timeline 6 in which the start time of the session is set to zero is used.
  • Alternatively, the standard time of the area where the attraction is held may be used.
  • The timeline 6 is used as a reference when describing times in a session.
  • the time information includes information indicating the timing at which the event E has occurred in the timeline 6.
  • the detection timing of the trigger according to the action of the user 1 is recorded as the timing at which the event E occurs. That is, the event information 10 includes the detection timing of the trigger that causes the event E on the timeline 6. This makes it possible to share each event E at an appropriate timing.
  • the time information includes information indicating the presentation timing for presenting the AR data 11 (AR sound 3 and AR visual 4) on the timeline 6. That is, the event information 10 includes the presentation timing of the AR data 11 in the event E.
  • The presentation timing of the AR data 11 may be, for example, the same timing as the occurrence of the event E. Further, when the AR data 11 is presented after a certain waiting time after the event E occurs, the waiting time may be recorded as the time information. In this way, the presentation timing of the AR data 11 may be recorded as a time relative to the detection timing of the trigger 8. The presentation timing of the AR data 11 may therefore be the time at which the AR data 11 was actually presented or the scheduled time at which the AR data 11 is to be presented. This makes it possible to freely set the timing at which the AR data 11 is reproduced.
  • In this way, the event information 10 is information in which the AR data 11 including the AR sound 3 presented at the event E is recorded along the timeline for each event E generated while the user 1 is experiencing AR by sound.
  • Note that recording data in the present disclosure includes both storing the target data itself and storing information that specifies the target data. Therefore, the event information 10 may store the data of the AR sound 3 or the AR visual 4 as it is, or may store information (an ID number or the like) that specifies the AR sound 3 or the AR visual 4.
  • In FIG. 3, each piece of position information 7 is schematically illustrated using an icon representing a place.
  • the position information 7 includes the location (experience position) of the user 1 when the event E occurs.
  • the experience position of the user 1 is calculated from, for example, the output of the position sensor mounted on the terminal device 30 and the captured image 5.
  • the position information 7 includes a presentation position for presenting the AR data 11 (AR sound 3 and AR visual 4) at the event E.
  • the presentation position of the AR data 11 differs depending on the type of event and the like. For example, in the event E that presents a virtual footstep, the presentation position of the AR data 11 coincides with the experience position of the user 1. Further, for example, in the event E in which the virtual voice is localized and presented in the real world, the presentation position of the AR data 11 is set in advance.
  • Note that the presentation positions of the AR sound 3 and the AR visual 4 are not always the same.
  • In addition, the event information 10 may include an ID that identifies the user targeted by the event, information indicating the user's affiliation, and the like. An example of such a record is sketched below.
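Purely as an illustration, and with hypothetical field names and values that are not taken from the disclosure, a single recorded event E of the "virtual footstep" kind might look like this:

```python
# Hypothetical event-information record; all field names and values are
# illustrative only. Times are seconds on the session timeline 6.
footstep_event = {
    "event_id": "E1",
    "event_type": "virtual_footstep",
    "user_id": "user_1a",                     # user targeted by the event
    "group_id": "group_A",                    # affiliation of that user
    "trigger_time": 12.4,                     # detection timing of the trigger 8
    "experience_position": (3.0, 0.0, 5.2),   # where the user was (x, y, z) [m]
    "ar_sound": {
        "sound_id": "footstep_snow",          # reference into the sound library 13
        "offset": 0.0,                        # presentation timing relative to the trigger
        "position": (3.0, 0.0, 5.2),          # here it coincides with the experience position
    },
    "ar_visual": {
        "visual_id": "onomatopoeia_crunch",   # reference into the visual library 14
        "offset": 0.0,
        "position": (3.0, 0.0, 5.2),
    },
}
```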
  • Referring to FIG. 3, consider a state in which one session is progressing along the timeline 6. While the session is in progress, the presence or absence of the trigger 8 is constantly monitored for each user 1 (terminal device 30) participating in the session. When the trigger 8 (event E) is detected, the process of presenting the AR data 11 corresponding to the event E is executed. Specifically, according to the detected trigger 8, the AR sound 3 and the AR visual 4 are provided from the sound library 13 and the visual library 14, respectively, and are reproduced at appropriate timings. This makes it possible to provide an AR experience by sound according to the event E.
  • the AR sound 3 and the AR visual 4 corresponding to the event E are recorded along the timeline 6 as the event information 10.
  • the experience position of the user 1, the presentation position of the AR data 11, the presentation timing of the AR data 11, and the like are added to the event information 10.
  • the captured image 5 captured by the portable camera or the fixed point camera is recorded in the video library 15 together with the position information 7 including the photographing position and the photographing direction.
  • The captured image 5 captured by the portable camera is an image that subjectively captures the experience of the user 1, and the captured image 5 captured by the fixed point camera is an image that objectively captures the experience of the user 1. That is, the experience of the user 1 during the session is recorded in the video library as objectively and subjectively captured images 5.
  • In this way, the event information 10 of the events E that the user 1 experiences while experiencing AR by sound and the captured images 5 that capture the state of that experience are recorded in parallel.
  • FIG. 4 is a block diagram showing a functional configuration example of the information processing system 100 shown in FIG.
  • FIG. 4A is a block diagram showing a configuration example of the server device 20.
  • FIG. 4B is a block diagram showing a configuration example of the terminal device 30.
  • the server device 20 collects and manages the data output from the plurality of terminal devices 30, and outputs the data to each terminal device 30 as needed. That is, the server device 20 functions as a data server. As shown in FIG. 4A, the server device 20 includes a communication unit 21, a storage unit 22, and a server control unit 23.
  • the communication unit 21 is a communication module that performs network communication with other devices via the network 50.
  • The communication unit 21 has, for example, a data transmission function for transmitting data generated by the server device 20 and a data reception function for receiving data transmitted from another device (terminal device 30 or the like) via the network 50.
  • the specific configuration of the communication unit 21 is not limited, and various communication modules compatible with wired LAN, wireless LAN, optical communication, and the like may be used.
  • the storage unit 22 is a non-volatile storage device.
  • a recording medium using a solid-state element such as an SSD (Solid State Drive) or a magnetic recording medium such as an HDD (Hard Disk Drive) is used.
  • the type of recording medium used as the storage unit 22 is not limited, and for example, any recording medium for recording data non-temporarily may be used.
  • the server control program is stored in the storage unit 22.
  • the server control program is, for example, a program that controls the operation of the entire server device 20. Further, as shown in FIG. 4A, the above-mentioned sound library 13, visual library 14, video library 15, and event library 16 are stored in the storage unit 22.
  • the server control unit 23 controls the operation of the server device 20.
  • the server control unit 23 has a hardware configuration necessary for a computer such as a CPU and a memory (RAM, ROM). Various processes are executed by the CPU loading the server control program stored in the storage unit 22 into the RAM and executing the program. In the present embodiment, the server control unit 23 executes the server control program to realize the server data management unit 24 as a functional block.
  • The server data management unit 24 manages the data handled by the server device 20. For example, the server data management unit 24 acquires the data transmitted from the terminal devices 30 and stores it appropriately in each library in the storage unit 22. In the present embodiment, the event information 10 generated by each terminal device 30 is stored in the event library 16, and the captured video 5 captured by the terminal device 30 (portable camera) is stored in the video library 15. Further, the server data management unit 24 reads necessary data by referring to each library in the storage unit 22 in response to a command or the like transmitted from a terminal device 30, and transmits the read data to that terminal device 30. In the present embodiment, the event information 10 generated by each terminal device 30 is shared among the terminal devices 30 via the server data management unit 24, and this sharing of the event information 10 is performed in real time. A rough sketch of this sharing mechanism is given below.
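As a rough sketch only (the class and method names are assumptions, not the disclosure's design), the real-time sharing role of the server data management unit 24 can be pictured as a per-session store that each terminal polls for events it has not yet seen:

```python
from collections import defaultdict
from typing import Dict, List

class EventLibrary:
    """Hypothetical server-side event library: stores event records per session
    and lets each terminal fetch only the events added since its last sync."""

    def __init__(self) -> None:
        self._events: Dict[str, List[dict]] = defaultdict(list)

    def store(self, session_id: str, event_record: dict) -> None:
        # Called when a terminal device uploads event information 10.
        self._events[session_id].append(event_record)

    def fetch_since(self, session_id: str, last_index: int) -> List[dict]:
        # Called by a terminal device to synchronize its local event library.
        return self._events[session_id][last_index:]

# Usage sketch: terminal A uploads an event, terminal B synchronizes.
library = EventLibrary()
library.store("session_1", {"event_id": "E1", "user_id": "user_1a", "trigger_time": 12.4})
new_events = library.fetch_since("session_1", last_index=0)
```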
  • the terminal device 30 shown in FIG. 4B is a portable playback device capable of presenting audio and video to the user 1.
  • the terminal device 30 for example, a smartphone, a portable music player, or the like is used.
  • the terminal device 30 includes an audio output unit 31, an image display unit 32, a photographing unit 33, a sensor unit 34, a communication unit 35, a storage unit 36, and a terminal control unit 40. Have.
  • the audio output unit 31 outputs an audio signal for driving the speaker mounted on the earphone 2.
  • an audio amplifier that outputs an audio signal by wire, a communication module that outputs an audio signal wirelessly, or the like is used as the audio output unit 31.
  • the specific configuration of the audio output unit 31 is not limited.
  • an audio signal is generated and output to the earphone 2 based on audio data such as AR sound 3.
  • voice such as AR sound 3 is reproduced from the earphone 2, and the user 1 can experience auditory AR by sound.
  • the image display unit 32 is a display for displaying an image.
  • a display module including a liquid crystal display, an organic EL display, or the like is used as the image display unit 32.
  • the AR visual 4 or the like is superimposed on the video (shooting video 5) shot by the shooting unit 33, which will be described later, and output to the image display unit 32.
  • the user 1 can experience the visual AR by the image.
  • the photographing unit 33 is a camera provided in the terminal device 30, and photographs a real-world view seen from the terminal device 30.
  • a digital camera including an image sensor such as a CMOS (Complementary Metal-Oxide Semiconductor) sensor or a CCD (Charge Coupled Device) sensor is used as the photographing unit 33.
  • the photographing unit 33 functions as a portable camera carried by the user 1 described above. Further, the video shot by the shooting unit 33 is appropriately stored in the video library 15 as the shooting video 5.
  • the sensor unit 34 is a sensor module composed of a plurality of sensors provided in the terminal device 30.
  • The sensor unit 34 includes various sensors such as a motion sensor and a position sensor.
  • the motion sensor is a sensor that detects the operating state of the terminal device 30.
  • a 9-axis sensor including a 3-axis acceleration sensor, a 3-axis gyro sensor, and a 3-axis compass sensor is used.
  • the position sensor is a sensor that detects the current position of the terminal device 30 based on a signal from the outside.
  • a positioning module that receives radio waves from a GPS (Global Positioning System) satellite and detects the current position is used.
  • a beacon module using a predetermined radio wave may be used as a position sensor.
  • The specific configuration of the sensor unit 34 is not limited, and a temperature sensor, an illuminance sensor, or the like may also be provided.
  • the communication unit 35 is a communication module that performs network communication with other devices via the network 50.
  • the communication unit 35 has, for example, a data transmission function for transmitting data generated by the terminal device 30, and a data reception function for receiving data transmitted from the server device 20 via the network 50.
  • the specific configuration of the communication unit 35 is not limited, and various communication modules compatible with wired LAN, wireless LAN, optical communication, and the like may be used.
  • the storage unit 36 is a non-volatile storage device.
  • a recording medium using a solid-state element such as an SSD or a magnetic recording medium such as an HDD is used.
  • the type of recording medium used as the storage unit 36 is not limited, and for example, any recording medium for recording data non-temporarily may be used.
  • the storage unit 36 stores a terminal control program that controls the operation of the entire terminal device 30.
  • the storage unit 36 stores a library of AR data 11 used in AR by sound.
  • the library of AR data 11 is constructed by appropriately downloading all or part of the data contained in the sound library 13 and the visual library 14 stored in the storage unit 22 of the server device 20, for example.
  • the libraries of AR data 11 (sound library 13 and visual library 14) may be installed in the storage unit 36 in advance.
  • the terminal control unit 40 controls the operation of the terminal device 30.
  • the terminal control unit 40 has a hardware configuration necessary for a computer such as a CPU and a memory (RAM, ROM). Various processes are executed by the CPU loading the terminal control program stored in the storage unit 36 into the RAM and executing the program.
  • the terminal control unit 40 corresponds to the information processing device according to the present embodiment.
  • As the terminal control unit 40, a device such as a PLD (Programmable Logic Device) such as an FPGA (Field Programmable Gate Array), or another device such as an ASIC (Application Specific Integrated Circuit), may be used. Further, a processor such as a GPU (Graphics Processing Unit) may be used as the terminal control unit 40.
  • The CPU of the terminal control unit 40 executes the program (control program) according to the present embodiment, whereby the terminal data management unit 41, the trigger detection unit 42, the shared event determination unit 43, and the live content generation unit 44 are realized as functional blocks. The information processing method and the data generation method according to the present embodiment are then executed by these functional blocks.
  • In order to realize each functional block, dedicated hardware such as an IC (integrated circuit) may be used as appropriate.
  • the terminal data management unit 41 manages the data handled by the terminal device 30. For example, the terminal data management unit 41 acquires the data transmitted from the server device 20 and stores it in the storage unit 36. Alternatively, the terminal data management unit 41 outputs the data transmitted from the server device 20 to another functional block of the terminal device 30. Further, the terminal data management unit 41 appropriately transmits the data generated by the terminal device 30 and the data stored in the storage unit 36 to the server device 20. In addition, reading of the data stored in the storage unit 36 and the like are appropriately executed.
  • the terminal data management unit 41 generates event information 10 regarding the event E for the terminal user.
  • the terminal user is a user 1 who uses the terminal device 30.
  • the terminal user corresponds to the second user.
  • When an event E targeting the terminal user occurs, event information 10 related to the event E is generated.
  • In the event information 10, the type and occurrence timing of the event E, the type and presentation timing of the AR data (AR sound / AR visual) presented at the event E, and the like are recorded along the timeline 6.
  • The event information 10 of the terminal user generated by the terminal data management unit 41 is transmitted to the server device 20 or output to the live content generation unit 44 described later.
  • In this way, the terminal data management unit 41 provided in each terminal device 30 generates the event information 10 by recording, along the timeline 6, the AR data 11 including the AR sound 3 presented at each event E that occurs while the user 1 is experiencing AR by sound.
  • In the present embodiment, the recording unit is realized by the terminal data management unit 41 provided in each terminal device 30.
  • the terminal data management unit 41 acquires the event information 10 regarding the event E for another user.
  • the other user is a user 1 who uses another terminal device 30.
  • the other user corresponds to the first user.
  • In the other terminal device 30, when an event E targeting the other user occurs, event information 10 related to the event E is generated and transmitted to the server device 20.
  • The terminal data management unit 41 acquires the event information 10 of the other user that the other terminal device 30 has transmitted to the server device 20.
  • Similarly, the event information 10 of the terminal user transmitted to the server device 20 is transmitted to the other terminal devices 30.
  • In this way, the event information 10 (event library 16) is shared among the terminal devices 30.
  • In the present embodiment, the terminal data management unit 41 functions as the acquisition unit.
  • the trigger detection unit 42 detects the trigger 8 of the event E targeting the user 1 (terminal user).
  • As the trigger 8, for example, an action of the user 1 is set. In this case, it is determined, based on the output of the motion sensor of the sensor unit 34, whether or not the user 1 has performed an action (a walking motion, a predetermined gesture, or the like) that serves as the trigger 8. Further, the trigger 8 may be set according to the position of the user 1. In this case, it is determined, based on the output of the position sensor of the sensor unit 34, whether or not the user 1 has entered a predetermined area. A trigger 8 may also be set according to elapsed time. In addition, a plurality of triggers 8 can be combined and used as the trigger 8 for one event E. A minimal sketch of this kind of trigger detection is given below.
  • When the trigger 8 is detected, the event E corresponding to the trigger 8 is executed. That is, the detection timing of the trigger 8 is the occurrence timing of the event E.
  • When a plurality of triggers 8 are combined, the detection timing of the last detected trigger 8 is the occurrence timing of the event E.
  • Note that the event information 10 described above is typically generated immediately when the trigger 8 is detected and is shared with the other terminal devices 30.
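The following is a minimal sketch of such trigger evaluation, assuming hypothetical sensor inputs, thresholds, and trigger definitions (none of these names or values come from the disclosure):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SensorSnapshot:
    position: tuple        # (x, y) from the position sensor [m]
    acceleration: tuple    # (ax, ay, az) from the motion sensor [m/s^2]
    elapsed: float         # seconds since the session started

# A trigger 8 is modeled here as a predicate over the latest sensor snapshot.
Trigger = Callable[[SensorSnapshot], bool]

def landing_trigger(s: SensorSnapshot) -> bool:
    # Crude landing heuristic: a strong vertical acceleration spike.
    return abs(s.acceleration[2]) > 15.0

def area_trigger(s: SensorSnapshot, center=(10.0, 4.0), radius=2.0) -> bool:
    # Fires when the user has entered a predetermined circular area.
    dx, dy = s.position[0] - center[0], s.position[1] - center[1]
    return (dx * dx + dy * dy) ** 0.5 <= radius

def elapsed_trigger(s: SensorSnapshot, after_s: float = 120.0) -> bool:
    # Fires once a certain amount of session time has passed.
    return s.elapsed >= after_s

def combined_trigger(triggers: List[Trigger]) -> Trigger:
    # Several triggers combined for one event E: the event fires only when all
    # conditions hold, i.e. when the last remaining condition becomes true.
    return lambda s: all(t(s) for t in triggers)
```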
  • the shared event determination unit 43 determines whether or not a shared event has occurred.
  • the shared event is an event E shared among a plurality of users 1.
  • In a shared event, the AR data 11 is presented so that the experience content of the event E is shared among a plurality of users 1 in real time.
  • Specifically, it is determined whether or not an event E generated for another user who is experiencing AR by sound together with the terminal user corresponds to a shared event.
  • In this determination process, for example, conditions related to the position and affiliation of each user 1, the type of the event E, and the like are evaluated. This point will be described in detail later.
  • the live content generation unit 44 generates AR live content by sound.
  • the live content is content provided in real time to the user 1 who experiences AR by sound.
  • the live content generation unit 44 generates playback data for reproducing the AR data 11 with reference to the timeline 6.
  • As the reproduction data, for example, data designating reproduction parameters such as the type of the AR data 11 (AR sound 3 and AR visual 4), the presentation timing, and the presentation position is generated.
  • The live content generation unit 44 generates live content of an individual event E (hereinafter referred to as an individual event) for the terminal user who uses the terminal device 30.
  • In this case, reproduction data for reproducing the AR data 11 corresponding to the individual event is generated.
  • This playback data is data that allows the terminal user to experience the individual event.
  • The live content generation unit 44 also generates playback data based on the event information 10 of another user. Specifically, as the reproduction data, shared data is generated that presents the event E targeting the other user to the terminal user, who is different from the other user, so that the event E can be viewed.
  • That is, the event information 10 of the other user is acquired, and reproduction data (shared data) for reproducing the AR data 11 corresponding to the event E targeting the other user along the timeline 6 is generated.
  • By reproducing the AR data 11 according to the shared data, it is possible to make the terminal user experience the event E experienced by the other user along the same timeline 6. This makes it possible to share the experience of AR by sound among a plurality of users 1.
  • The process of generating shared data is executed for an event that the shared event determination unit 43 has determined to be a shared event. That is, when an individual event for another user becomes a shared event from the viewpoint of the terminal user, the individual event of the other user is presented so as to be viewable.
  • the live content generation unit 44 corresponds to the generation unit.
  • FIG. 5 is a flowchart showing an example of a process of sharing an event among a plurality of users 1.
  • the process shown in FIG. 5 is a loop process that is continuously executed in the terminal device 30 while the session is in progress.
  • the event library 16 is synchronized in the terminal device 30 (step 101).
  • This process is a process for sharing event information regarding the event E that has occurred in the other terminal device 30.
  • the terminal data management unit 41 acquires the event information 10 generated by the other terminal device 30 via the server device 20.
  • the shared event determination unit 43 determines whether or not a shared event has occurred (step 102). Here, based on the event information 10 acquired in step 101, it is determined whether or not the event E newly generated in the other terminal device 30 corresponds to the shared event.
  • In the shared event determination process, for example, it is determined whether or not the relative distance between the other user and the terminal user satisfies a predetermined condition. Specifically, when the relative distance is equal to or less than a threshold value, the event to be determined is determined to be a shared event. This makes it possible to share an event E that has occurred for another user who is sufficiently close.
  • The threshold value for the relative distance may be set appropriately according to the type of the event E and the like. For example, for an event E that presents a virtual voice, the threshold value is set to about 5 m to 10 m.
  • Alternatively, in the shared event determination process, it is determined whether or not the affiliations of the other user and the terminal user satisfy a predetermined condition. Specifically, when the users belong to the same group, the event to be determined is determined to be a shared event. This makes it possible to share an event E that has occurred for a user belonging to the same group.
  • When an event is determined to be a shared event, the shared data generation process is executed (step 103 in FIG. 5).
  • In other words, the shared data is generated when at least one of the relative distance between the other user and the terminal user and the affiliations of the other user and the terminal user satisfies a predetermined condition.
  • When it is determined that the newly generated event E is a shared event (Yes in step 102), the live content generation unit 44 generates shared data for sharing the event E (step 103). In this case, the AR data 11 is presented, according to the shared data, to the terminal user who uses the terminal device 30. This makes it possible for the terminal user to experience an event experienced by another user. If it is determined that the newly generated event E is not a shared event (No in step 102), step 101 is executed again and the event library 16 is synchronized again. A minimal sketch of this determination is given below.
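As an illustration only (the group handling and the per-type thresholds are assumptions, apart from the 5 m to 10 m example given above), the determination of step 102 could be sketched as:

```python
import math

# Hypothetical per-event-type distance thresholds [m]; the value for a virtual
# voice follows the example in the text, the others are made up.
DISTANCE_THRESHOLDS = {
    "virtual_voice": 10.0,
    "virtual_footstep": 5.0,
}

def is_shared_event(event: dict, terminal_user: dict) -> bool:
    """Return True if an event E targeting another user should be shared
    with the terminal user (step 102 in FIG. 5)."""
    # Condition 1: the users belong to the same group.
    if event.get("group_id") == terminal_user.get("group_id"):
        return True
    # Condition 2: the relative distance is at or below a per-type threshold.
    ex, ey = event["experience_position"][:2]
    tx, ty = terminal_user["position"][:2]
    distance = math.hypot(ex - tx, ey - ty)
    threshold = DISTANCE_THRESHOLDS.get(event["event_type"], 5.0)
    return distance <= threshold
```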
  • FIG. 6 is a schematic diagram showing an example of a shared event shared using shared data.
  • FIG. 6 schematically illustrates users 1a and 1b participating in a session to experience AR by sound.
  • In this example, an AR sound 3 representing a virtual footstep is presented according to the landing motion of each participant.
  • An AR visual 4 (onomatopoeia, musical notes, etc.) corresponding to the footstep may also be presented.
  • The event E is generated as many times as the user 1 performs the landing operation.
  • Here, the process of generating shared data in the terminal device 30b used by the user 1b will be mainly described, with the user 1b on the left side of the figure as the terminal user.
  • In this case, the user 1a on the right side of the figure is the other user as viewed from the user 1b, and the terminal device 30a used by the user 1a is the other terminal device 30.
  • the live content generation unit 44 sets the timing at which the AR data 11 is reproduced in the shared data, based on the presentation timing of the AR data 11.
  • That is, shared data for reproducing the AR data 11 at the presentation timing of the AR data 11 is generated.
  • For example, shared data for reproducing, for the user 1b, the AR sound 3 that was reproduced for the user 1a is generated.
  • If the event information 10 arrives with some delay, the live content generation unit 44 generates shared data instructing that the AR sound 3 be played at the earliest possible timing after the event information 10 is acquired. Even in such a case, it is possible to present the AR sound 3 corresponding to the operation of the user 1a to the user 1b.
  • Alternatively, the terminal device 30b may generate shared data instructing that the AR sound 3 be reproduced at the timing at which the landing of the user 1a is predicted. This makes it possible to present the AR sound 3 according to the operation of the user 1a without delay. A sketch of this timing selection follows.
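A minimal sketch of choosing the playback time in the shared data, assuming times are seconds on the session timeline (the function and parameter names are hypothetical):

```python
from typing import Optional

def shared_playback_time(presentation_time: float, now: float,
                         predicted_time: Optional[float] = None) -> float:
    """Choose when to reproduce the AR sound 3 on the other user's terminal.

    presentation_time: presentation timing recorded in the event information 10
    now:               time on the timeline when the event information is acquired
    predicted_time:    optional predicted timing (e.g. a predicted landing)
    """
    if predicted_time is not None:
        # Reproduce at the predicted timing so the sound is presented without delay.
        return predicted_time
    # Otherwise use the recorded timing, or the earliest possible timing
    # if that moment has already passed.
    return max(presentation_time, now)
```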
  • the live content generation unit 44 sets the position where the AR data 11 is reproduced in the shared data based on the presentation position of the AR data 11.
  • the position where the AR data 11 is reproduced is a position in the real space where the AR data 11 is localized.
  • the presentation position of the AR data 11 specified in the event information 10 is set to the position where the AR data 11 is reproduced.
  • In the example of FIG. 6, shared data is generated for the user 1b so that the AR sound 3 can be heard from the feet of the user 1a.
  • Specifically, the reproduction volume and the reproduction direction of the AR sound 3 are adjusted so that the AR sound 3 can be heard from the feet of the user 1a.
  • For example, the reproduction direction is adjusted so that the AR sound 3 is heard from the direction in which the user 1a is seen from the user 1b.
  • Note that only one of the reproduction volume and the reproduction direction may be adjusted.
  • In FIG. 6, the AR sound 3 is played so as to be heard from the lower left of the user 1b or, depending on the positional relationship, from the lower right of the user 1b. That is, the AR sound 3 reproduced according to the shared data is heard from a fixed position in the real world (the feet of the user 1a) regardless of the position of the user 1b.
  • In this way, at least one of the reproduction volume and the reproduction direction of the AR sound 3 in the shared data is adjusted based on the relative positional relationship between the presentation position of the AR sound 3 in the event targeting the other user (user 1a) and the position of the terminal user (user 1b). A sketch of such an adjustment is given below.
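The adjustment described above can be sketched roughly as follows, assuming simple inverse-distance attenuation and a horizontal-plane direction (the attenuation model and all names here are assumptions, not the patent's method):

```python
import math
from typing import Tuple

def spatialize(presentation_pos: Tuple[float, float],
               listener_pos: Tuple[float, float],
               listener_heading_rad: float,
               base_volume: float = 1.0) -> Tuple[float, float]:
    """Return (volume, direction) for reproducing an AR sound 3 for a listener.

    presentation_pos:     where the sound is localized (e.g. the feet of user 1a)
    listener_pos:         position of the terminal user (e.g. user 1b)
    listener_heading_rad: direction the listener is facing, in radians
    The returned direction is the angle of the sound source relative to the
    listener's heading; the volume falls off with distance.
    """
    dx = presentation_pos[0] - listener_pos[0]
    dy = presentation_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dy)
    volume = base_volume / max(distance, 1.0)          # crude attenuation
    direction = math.atan2(dy, dx) - listener_heading_rad
    # Normalize to (-pi, pi] so, for example, a source behind the listener is ~pi.
    direction = (direction + math.pi) % (2 * math.pi) - math.pi
    return volume, direction
```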
  • events E1 and E2 occur with the operation of user 1a.
  • the AR data 11 associated with the events E1 and E2 is presented viewably not only to the user 1a but also to the user 1b.
  • the AR data 11 associated with the events E3 and E4 is presented viewably not only to the user 1b but also to the user 1a.
  • In this way, in a group AR experience, the AR sound 3 and the AR visual 4 experienced by another user within a specific distance can be localized and experienced in a pseudo manner in the space of the terminal user. As a result, it becomes possible to share the experience content of AR by sound in real time and to provide excellent entertainment.
  • FIGS. 7A to 7C and FIGS. 8A to 8C are schematic views showing an example of event information associated with a shared event.
  • Here, the event E generated for the user 1a is a shared event shared with the user 1b. That is, the user 1a generates the trigger 8, and the user 1b, who is in the same group, objectively experiences it.
  • event information 10 regarding a shared event is schematically illustrated. In the following, the operation of the system at each event and an example of a possible scenario will be described.
  • In the example shown in FIG. 7A, the AR sound 3 is presented to the user 1a in response to the trigger 8.
  • Examples of the type of the trigger 8 include an action (gesture) of the user 1, entry into an area, the passage of time, and the like.
  • In the terminal device 30 of the user 1a, the AR sound 3 corresponding to the trigger 8 is selected from the sound library 13 and immediately reproduced.
  • At this time, the type of the AR sound 3, its presentation timing, and the like are recorded in the event information 10.
  • Note that the event information 10 shown in FIG. 7A does not record information regarding the presentation position of the AR sound 3.
  • In the terminal device 30 of the user 1b, this event information 10 is acquired, and the AR sound 3 recorded therein is immediately reproduced. This makes it possible for the user 1b to experience the event E experienced by the user 1a at almost the same timing.
  • As an example of the scenario assumed in FIG. 7A, there is a scenario in which a virtual footstep is reproduced.
  • For example, when the user 1 enters a specific area, the BGM or the like is switched and the scenario is started.
  • This scenario (session) ends after a certain period of time (for example, 2 minutes) and switches to the next scenario.
  • While the scenario is in progress, the AR sound 3 is played so that a virtual footstep can be heard in accordance with each step.
  • At this time, the virtual footsteps corresponding to the steps of the user 1a are reproduced so that the user 1b can also hear them.
  • the AR sound 3 is presented according to the position of the user 1a.
  • When the trigger 8 is detected, the position of the user 1a at that time (experience position 25a) is detected.
  • The AR sound 3 corresponding to the event E is selected from the sound library 13. Then, the AR sound 3 is reproduced for the user 1a according to the relative position between the position of the sound source to be reproduced (presentation position 26s) and the position of the user 1a (experience position 25a).
  • the presentation position 26s of the AR sound 3 is recorded in the event information 10.
  • On the user 1b side, the event information 10 is acquired, and the AR sound 3 is reproduced for the user 1b according to the relative position between the presentation position 26s of the AR sound 3 recorded therein and the position of the user 1b (experience position 25b). This makes it possible to present the AR sound 3 to both the users 1a and 1b so that it can be heard from the presentation position 26s in the real world.
  • An example of the scenario assumed in FIG. 7B is a scenario in which a different sound effect is reproduced at each position.
  • For example, when the user 1a jumps at a certain position, a specific sound effect (AR sound 3) is reproduced, and when the user 1a jumps at a different position, another sound effect is played.
  • the sound effect corresponding to the jump of the user 1a is reproduced so as to be heard by the user 1b.
  • The presentation position 26s of the sound effect is set to a predetermined position in the real world (for example, at the feet of the user 1a), and the sound effect is reproduced for the users 1a and 1b so as to be heard from the presentation position 26s.
  • In FIG. 7C, the AR sound 3 and the AR visual 4 are presented according to the position of the user 1a. Further, it is assumed that the user 1a is photographing the scenery of the real world with the portable camera (shooting unit 33) of the terminal device 30. For example, in the terminal device 30 of the user 1a, when the event E occurs, the experience position 25a of the user 1a is detected. Further, the AR sound 3 corresponding to the event E is selected from the sound library 13. Then, the AR sound 3 is reproduced for the user 1a according to the relative position between the presentation position 26s of the AR sound 3 and the experience position 25a of the user 1a.
  • Further, the AR visual 4 corresponding to the AR sound 3 is selected from the visual library 14. Then, the AR visual 4 is displayed to the user 1a according to the relative position between the position where the AR visual 4 should be displayed (presentation position 26v) and the experience position 25a of the user 1a (or the shooting position 27a of the captured video 5a). Specifically, the AR visual 4 is presented superimposed on the captured video 5a. Therefore, when the user 1a looks through the camera during the actual experience, both the AR sound 3 and the AR visual 4 are presented so that their relative positions to the user 1a are reproduced, with the AR visual 4 superimposed on the captured video 5a.
  • In the event information 10, in addition to the information of the AR sound 3 (type, presentation timing, presentation position 26s), the information of the AR visual 4 (type, presentation timing, presentation position 26v) is recorded. Further, the captured video 5a captured by the user 1a is recorded in the video library 15 together with the information on the shooting position 27a.
  • On the user 1b side, the event information 10 is acquired, and the AR sound 3 is reproduced for the user 1b according to the relative position between the presentation position 26s of the AR sound 3 recorded therein and the position of the user 1b (experience position 25b).
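  • The description above superimposes the AR visual 4 so that its relative position to the user (or the shooting position) is reproduced, but no projection method is specified; a minimal pinhole-camera sketch in Python, with assumed coordinate conventions and intrinsics, might look like this.

```python
import numpy as np

def project_to_image(presentation_pos_world, camera_pos_world, world_to_camera_rot,
                     focal_px=800.0, image_size=(1280, 720)):
    """Hypothetical sketch: find the pixel at which AR visual 4 should be drawn.

    presentation_pos_world: 3D presentation position 26v in the real-world frame.
    camera_pos_world / world_to_camera_rot: shooting position 27 and a 3x3 rotation
    matrix (world frame -> camera frame). A simple pinhole model is assumed.
    Returns (u, v) pixel coordinates, or None if the point lies behind the camera.
    """
    p_world = np.asarray(presentation_pos_world, dtype=float)
    p_cam = world_to_camera_rot @ (p_world - np.asarray(camera_pos_world, dtype=float))
    if p_cam[2] <= 0:          # behind the camera: not visible in the captured video 5
        return None
    u = image_size[0] / 2 + focal_px * p_cam[0] / p_cam[2]
    v = image_size[1] / 2 + focal_px * p_cam[1] / p_cam[2]
    return u, v

# Example: a presentation position 4 m in front of the camera, slightly to the right.
print(project_to_image([0.5, 0.0, 4.0], [0.0, 0.0, 0.0], np.eye(3)))
```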
  • An example of the scenario assumed in FIG. 7C is a scenario in which a real-world object speaks.
  • the stone statue begins to talk.
  • the sound spoken by the stone statue is reproduced as AR sound 3.
  • The AR visual 4 representing the facial expression of the stone statue is superimposed and displayed on the captured image 5.
  • an animation such as the movement of the mouth of the stone statue according to the dialogue is played.
  • the terminal device 30 of the user 1a executes a process of superimposing the AR visual 4 on the basis of the presentation position 26v.
  • The AR sound 3 is reproduced so that the user 1b can hear the dialogue from the direction of the stone statue.
  • the user 1b can also experience the event in which the stone statue generated by the user 1a speaks at the same time.
  • In FIG. 8A, the AR sound 3 and the AR visual 4 are presented according to the position of the user 1a, as in FIG. 7C. Further, it is assumed that not only the user 1a but also the user 1b is photographing the scenery of the real world with the portable camera of the terminal device 30.
  • the images captured by the user 1a and the user 1b will be referred to as captured images 5a and 5b, and the positions where the captured images 5a and 5b have been captured will be referred to as shooting positions 27a and 27b.
  • The AR sound 3 corresponding to the event E is selected from the sound library 13. Then, the AR sound 3 is reproduced for the user 1a according to the relative position between the presentation position 26s of the AR sound 3 and the experience position 25a of the user 1a. Further, the AR visual 4 corresponding to the AR sound 3 is selected from the visual library 14. Then, the AR visual 4 is superimposed and displayed on the captured video 5a so that the relative position between the presentation position 26v of the AR visual 4 and the experience position 25a of the user 1a (or the shooting position 27a of the captured video 5a) is reproduced. At this time, the event information 10 records the information of the AR sound 3 (type, presentation timing, presentation position 26s) and the information of the AR visual 4 (type, presentation timing, presentation position 26v).
  • On the user 1b side, the event information 10 is acquired, and the AR sound 3 is reproduced for the user 1b according to the relative position between the presentation position 26s of the AR sound 3 recorded therein and the experience position 25b of the user 1b. Further, the AR visual 4 recorded in the event information 10 is superimposed and displayed on the captured video 5b so that the relative position between the presentation position 26v and the experience position 25b of the user 1b (or the shooting position 27b of the captured video 5b) is reproduced.
  • the captured video 5a captured by the user 1a is recorded in the video library 15 together with the information of the captured position 27a.
  • the captured video 5b captured by the user 1b is recorded in the video library 15 together with the information on the captured position 27b.
  • An example of the scenario assumed in FIG. 8A is a scenario that expresses an interaction with an object in the real world.
  • the sound of the ground noise (AR sound 3) reverberates.
  • the voice (AR sound 3) at which the stone statue begins to suffer is reproduced.
  • The AR visual 4 showing how the stone statue is starting to collapse is superimposed on the captured video 5a (and the captured video 5b).
  • Further, the captured video 5b is superimposed with the AR visual 4 showing how the user 1a is illuminated and is releasing energy. In this way, it is also possible to present an AR visual 4 that is visible to only one user 1.
  • the users 1a and 1b are photographing the captured images 5a and 5b, respectively. Further, in FIG. 8B, the photographed image 5c is photographed from the photographing position 27c by the fixed point camera 46 (fixed camera). The captured image 5c captured by the fixed point camera 46 is recorded in the image library 15 together with the information on the photographing position 27c.
  • the captured image 5c captured by the fixed point camera 46 is displayed in real time on a monitor or the like provided at a venue different from the place where the user 1a and the user 1b are present.
  • the AR sound 3 and the AR visual 4 reproduce the relative position of the fixed point camera 46 with respect to the shooting position 27c and are superimposed on the shot image 5c.
  • the event E experienced by the users 1a and 1b can be viewed by a large number of users 1 who are not present.
  • An example of the scenario assumed in FIG. 8B is a scenario that the users 1a and 1b advance jointly.
  • AR sound 3 representing the sound of thunder and the sound of squall starting to fall is played.
  • the AR visual 4 representing the squall is superimposed on the captured images 5a and 5b captured by the users 1a and 1b.
  • the AR visual 4 showing how the light from the users 1a and 1b heads toward the sky is superimposed on the captured image 5c of the fixed point camera 46 after the users 1a and 1b jump.
  • the AR visual 4 showing the appearance of squall falling over the entire area is superimposed on the captured image 5c.
  • In FIG. 8C, AR data 11 corresponding to the situation at the time when the event E occurs is presented.
  • the AR data 11 including the AR sound 3 and the AR visual 4 is presented, and the captured images 5a to 5c are captured.
  • As the AR data 11, data generated based on the situation information representing the situation when the event E occurs, or data selected based on the situation information, is used. Therefore, even if the event E is designed according to the same scenario, the presented AR data 11 changes depending on the situation in which it occurs. This makes it possible to provide an experience that suits the actual environment and to improve entertainment.
  • When the AR data 11 used for the event E is set, the situation information is acquired by the terminal device 30 that has detected the event E.
  • the status information includes information about at least one of the user 1's surrounding environment, the user 1's status, and the user 1's operation content.
  • The information regarding the surrounding environment of the user 1 is, for example, information such as the weather, time, season, temperature, and the location where the event E is held (outdoor/indoor). For example, information on the weather at the current position is read via the network.
  • The information regarding the status of the user 1 is information indicating the status of each user 1, such as the items possessed by the user 1, the role of the user 1 in the session, and the height, weight, physique, and gender of the user 1. This information is acquired by referring to, for example, the information of the user 1 input before the session is started.
  • the information regarding the operation content of the user 1 is information indicating the type and scale (momentum) of the operation of the user 1, the facial expression of the user 1, the loudness of the voice, and the like. This information is acquired, for example, based on the output of the motion sensor mounted on the terminal device 30, an image taken by the user, or the like.
  • the specific content of the situation information and the method of acquiring the situation information are not limited. For example, information regarding the progress of the scenario may be used as the status information.
  • The AR data 11 reproduced by the shared data is selected based on the situation information. That is, when a shared event occurs, the AR data 11 to be reproduced is selected according to the weather at that time, the status of the user 1, and the like. As described above, in the present embodiment, it is possible to change the contents of the AR sound 3 and the AR visual 4, and to enable or disable their reproduction (selection of audible/inaudible), according to the situation information. This makes it possible to design an event E or the like in which the reproduction conditions of the AR data 11 are set in detail.
  • item information 28 and weather information 29 are used as situation information.
  • Item information 28 and weather information 29 are schematically illustrated by a heart mark and a sun mark, respectively.
  • the AR sound 3 and the AR visual 4 are acquired from the sound library 13 and the visual library 14, respectively.
  • AR data 11 for sunny weather, cloudy weather, rainy weather, etc. is stored in the library in advance, and is appropriately selected according to the weather information.
  • a new AR data 11 may be generated by applying an effect for each weather to the basic AR data 11.
  • whether or not to reproduce the AR sound 3 and the AR visual 4 is selected based on the item information 28.
  • the user 1a possesses the item.
  • the AR sound 3 is played by both the user 1a and the user 1b.
  • The AR visual 4 is displayed only to the user 1a who possesses the item, and is not displayed to the user 1b.
  • the AR visual 4 may be displayed only to the user 1b who does not have the item.
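  • To make these selection rules concrete, the following Python sketch shows one way AR data could be chosen from the weather information 29 and the item information 28, in the spirit of the FIG. 8C example; the library contents and the rules themselves are assumptions.

```python
# Hypothetical per-weather entries; the real libraries 13 and 14 are databases.
SOUND_LIBRARY = {"sunny": "bright_bell", "cloudy": "muffled_bell", "rainy": "bell_in_rain"}
VISUAL_LIBRARY = {"sunny": "sparkle_fx", "cloudy": "haze_fx", "rainy": "rain_fx"}

def select_ar_data(weather: str, user_has_item: bool) -> dict:
    """Select AR sound 3 and AR visual 4 from situation information.

    Assumed rules mirroring the FIG. 8C example: the sound is reproduced for
    everyone, while the visual is shown only to the user who possesses the item.
    """
    sound_id = SOUND_LIBRARY.get(weather, SOUND_LIBRARY["sunny"])
    visual_id = VISUAL_LIBRARY.get(weather, VISUAL_LIBRARY["sunny"])
    return {
        "sound": sound_id,                               # audible to users 1a and 1b
        "visual": visual_id if user_has_item else None,  # item holder only
    }

print(select_ar_data("rainy", user_has_item=True))
print(select_ar_data("rainy", user_has_item=False))
```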
  • An example of the scenario assumed in FIG. 8C is a scenario that progresses according to the item and the weather.
  • In this scenario, the user 1a possesses an item (a bell).
  • At a certain point, the bell starts to ring on its own.
  • the sound of the bell is the AR sound 3 presented to the user 1a and the user 1b.
  • The AR visual 4 showing the appearance of the user 1a bleeding from the head is superimposed on the captured video 5b. Since this AR visual 4 is not displayed to the user 1a, the user 1a cannot know what is happening to himself or herself.
  • The sound of the bell is gradually lowered, and the AR visual 4 corresponding to the sound is superimposed on the captured images 5a and 5b.
  • This AR visual 4 is selected based on the weather information 29 and the like. For example, if the time when the event E occurs is daytime, AR visual 4 expressing the night world through the camera is selected. Further, for example, if the time when the event E occurs is summer, the AR visual 4 expressing the appearance of snow starting to fall through the camera is selected.
  • the terminal control unit 40 acquires the event information 10 that records the event E that occurred during the experience of AR by sound.
  • AR data 11 including the AR sound 3 presented in each event E is recorded along the timeline 6.
  • shared data for reproducing the AR data 11 on the timeline 6 is generated.
  • FIG. 9 is a schematic diagram showing an example of an event given as a comparative example.
  • two users 1a and 1b are presented with an AR sound 3 representing a virtual footstep.
  • the AR sound 3 presented to the user 1a is not played by the user 1b
  • the AR sound 3 presented to the user 1b is not played by the user 1a.
  • Although each user 1 can hear a virtual footstep according to his/her walking motion, he/she does not know what kind of footstep the other user 1 is listening to. That is, the AR experience by sound is closed for each user 1 and is not mutually shared.
  • the event E generated while experiencing the AR by sound is recorded as the event information 10.
  • In the event information 10, the AR data 11 presented at the event E is recorded along the timeline 6 for each user 1 who experiences AR. Therefore, by using the event information 10, it is possible to monitor, on the same timeline, information such as what kind of event E is occurring among the plurality of users 1. As a result, it is possible to present the AR data 11 of the event E to the user 1 who wants to share the event E at substantially the same time, and it is possible to share the experience content of AR by sound in real time.
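  • The contrast with the comparative example can be illustrated with a short Python sketch: events from every user in a group are written onto one common timeline, so the recorded AR data can be handed to the other group members at almost the same time. All class and field names are hypothetical.

```python
from collections import defaultdict

class EventLibrary:
    """Hypothetical per-group event store keyed to a common timeline."""

    def __init__(self):
        self._events = defaultdict(list)   # group_id -> list of event dicts

    def record(self, group_id, user_id, timeline_time_s, ar_data):
        event = {"user": user_id, "t": timeline_time_s, "ar_data": ar_data}
        self._events[group_id].append(event)
        return event

    def shared_data_for(self, group_id, event, other_user_id):
        """Return what another user in the group needs to reproduce the event."""
        if event["user"] == other_user_id:
            return None                    # the originating user already experienced it
        return {"t": event["t"], "ar_data": event["ar_data"]}

library = EventLibrary()
e = library.record("group1", "user_1a", 42.0, {"sound": "virtual_footstep"})
print(library.shared_data_for("group1", e, "user_1b"))
```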
  • FIG. 10 is a schematic diagram showing a configuration example of the information processing system according to the second embodiment.
  • the information processing system 200 is a system that provides the experience content of AR by sound as reproduced content based on the event information generated during the experience.
  • the information processing system 200 includes a server device 220, at least one terminal device 230, and at least one reproduction device 240.
  • the server device 220, the terminal device 230, and the playback device 240 are communicably connected via the network 50.
  • the information processing system 200 has a configuration in which the reproduction device 240 is added to the information processing system 100 shown in FIG.
  • the server device 220 is a device that manages the entire AR experience by sound and generates reproduced contents.
  • the configuration of the server device 220 will be described with reference to FIG.
  • the terminal device 230 is a device carried and used by each user 1.
  • the terminal device 230 is configured in the same manner as the terminal device 30 shown in FIG. 1, for example.
  • the reproduction device 240 is a device used by the user 1 to reproduce the reproduced content, for example, after the AR experience is completed.
  • the user 1 can request the reproduced content via a predetermined GUI (Graphical User Interface) or the like displayed on the playback device 240.
  • As the reproduction device 240, for example, a smartphone, a PC, or the like owned by the user 1 is used. Further, the terminal device 230 may be used as it is as the reproduction device 240.
  • FIG. 11 is a schematic diagram showing an example of AR reproduction content by sound.
  • the reproduction content 60 is data configured to reproduce the event E along the timeline.
  • the reproduced content 60 is typically generated as video data including audio.
  • a video obtained by processing the shot video 5 shot during the experience of AR by sound based on the event information 10 is generated as the reproduction content 60.
  • the reproduced content 60 may be configured as audio data or image data.
  • the user 1 who has experienced AR by sound accesses the server device 220 via the playback device 240 and requests the generation of the reproduced content 60. Therefore, when the reproduction content 60 is generated, it is assumed that the event library 16 in which the event information 10 is recorded has already been constructed in the server device 220. Further, the generated reproduced content 60 is used for looking back (re-experience) at a later date of the user 1. Alternatively, the reproduced content 60 is shared from the user 1 to a third party, and is used as a medium for sharing the AR experience by sound.
  • FIG. 11 schematically shows a photographed image 5 captured by the user 1a and the user 1b who experienced the AR by sound.
  • the captured image 5 is an image taken when a certain event E occurs.
  • the reproduced content 60 is generated by superimposing the AR sound 3 presented at the event E and the AR visual 4 corresponding to the AR sound 3 on the captured image 5.
  • the timing of presenting each AR data 11 is set with reference to the timeline 6 used when recording the event information 10. This makes it possible to faithfully reproduce the content actually experienced by the user 1.
  • a sound effect representing the moving sound of the object 4a (insect feather sound in FIG. 6) is presented at the event E.
  • the captured image 5 is superposed with an AR visual 4 such as an image of the object 4a corresponding to the sound effect and an onomatopoeia representing the sound effect.
  • the narration (AR sound 3) spoken by the character 4b is presented at the event E.
  • the captured image 5 is superposed with the image of the character 4b performing the utterance operation and the AR visual 4 such as the text 4c representing the content of the narration.
  • the AR visual 4 may be presented at the time of the experience or may not be presented at the time of the experience.
  • the reproduced content 60 is the content that presents the AR sound 3 presented at the plurality of events E and the AR visual 4 corresponding to the AR sound 3. This makes it possible to restore the session corresponding to one scenario, and it is possible to relive the session.
  • The re-experience refers to a session in which the recorded session is played back later and experienced in a simulated manner, or to a part of such a session.
  • the re-experience includes both a subjective experience of experiencing the session subjectively and an objective experience of experiencing the session objectively.
  • the video library 15 is a database of the captured video 5 which is the base of the moving image at the time of re-experience. In the re-experience session, the captured image 5 will be used like a recorded image.
  • FIG. 12 is a block diagram showing a functional configuration example of the server device 220 shown in FIG.
  • the server device 220 collects and manages data (event information 10 and captured video 5) output from a plurality of terminal devices 230, and generates a library thereof (event library 16 and video library 15). Further, the server device 220 generates the reproduction content 60 in response to the request command transmitted from the reproduction device 240.
  • the server device 220 includes a communication unit 221, a storage unit 222, and a server control unit 223.
  • the communication unit 221 and the storage unit 222 are configured in the same manner as the communication unit 21 and the storage unit 22 of the server device 20 shown in FIG. 1, for example.
  • the server control unit 223 controls the operation of the server device 220.
  • the server control unit 223 has a hardware configuration necessary for a computer such as a CPU and a memory (RAM, ROM). Various processes are executed by the CPU loading and executing the server control program stored in the storage unit 222 into the RAM.
  • the server control unit 223 corresponds to the information processing device according to the present embodiment.
  • The CPU of the server control unit 223 executes the program (server control program) according to the present embodiment, whereby the server data management unit 224 and the reproduction content generation unit 225 are realized as functional blocks. Then, the information processing method and the data generation method according to the present embodiment are executed by these functional blocks.
  • dedicated hardware such as an IC (integrated circuit) may be appropriately used.
  • The server data management unit 224 manages the data handled by the server device 220. For example, while the AR attraction by sound is being executed, the server data management unit 224 acquires the event information 10 and the captured video 5 transmitted from the terminal device 230, and stores them in the event library 16 and the video library 15 in the storage unit 222, respectively. Further, when the AR attraction by sound is completed, the server data management unit 224 acquires the request command transmitted from the reproduction device 240 and outputs it to the reproduction content generation unit 225. When the reproduction content 60 is generated, the event information 10 and the captured video 5 requested by the reproduction content generation unit 225 are read from the respective libraries and output to the reproduction content generation unit 225. In the present embodiment, the server data management unit 224 functions as an acquisition unit.
  • The reproduction content generation unit 225 generates the reproduction content 60 based on the event information 10. As described with reference to FIG. 11, the reproduction content 60 is reproduction data for reproducing the AR data 11 with reference to the timeline 6. Note that the reproduction content 60 is different from the shared data described in the first embodiment, and is reproduction data that is generated after the user 1 finishes the AR experience by sound and that reproduces the event E experienced by the user 1. In this way, the reproduction content generation unit 225 generates, as the reproduction data, the reproduction content 60 that reproduces the event E along the timeline 6. In the present embodiment, the reproduction content generation unit 225 corresponds to the generation unit.
  • the reproduction content generation unit 225 generates the reproduction content 60 by adding the effect of the AR data 11 to the captured image 5 along the timeline 6.
  • the process of adding the effect by the AR data 11 means the processing process of the captured image 5 performed by using the AR data 11.
  • the process of superimposing the AR sound 3 as the sound of the captured image 5 is also a process of adding an effect.
  • the AR visual 4 is image data
  • a process of superimposing the image data on the captured image 5 is executed.
  • the AR visual 4 is data for designating a digital effect
  • a process of applying the designated digital effect to the captured image 5 is executed.
  • a depth image or the like may be acquired together with the captured image 5, and the AR visual 4 may be superimposed by identifying and measuring the target real object by using subject recognition or depth sensing. This makes it possible to generate the reproduced content 60 in which the AR visual 4 is naturally superimposed.
  • the timing at which these AR data 11 are presented (reproduced) is set along the timeline 6. This point will be described in detail later.
  • FIG. 13 is a flowchart showing an example of a method of generating the reproduced content 60.
  • the process shown in FIG. 13 is, for example, a process executed by the server device 220 in response to a request command transmitted from the playback device 240. Alternatively, the process shown in FIG. 13 may be automatically executed when one session ends.
  • the reproduction content generation unit 225 sets the target (reproduction target) of the reproduction content 60 (step 201).
  • the reproduction target is a session or event for which the reproduction content 60 is generated.
  • the user 1 sets a session, a time zone, or the like for generating the reproduced content 60 via a GUI screen or the like displayed on the playback device 240.
  • the information set here is transmitted to the server device 220 together with a request command requesting the generation of the reproduced content 60.
  • the request command is acquired by the server data management unit 224 and output to the reproduction content generation unit 225.
  • a session set by the user is set as it is as a session to be reproduced.
  • the event E included in the time zone set by the user may be set as the reproduction target.
  • the server data management unit 224 refers to the event library 16 and acquires the event information 10 necessary for generating the reproduced content 60 (step 202). Specifically, the event information 10 of the event E (session) set as the reproduction target is read from the event library 16. At this time, the captured video 5 in which the event E to be reproduced is captured is read from the video library 15. The video library 15 may record a plurality of shot videos 5 shot at the same timing. In this case, the necessary captured image 5 is appropriately read. The event information 10 and the captured video 5 are output to the reproduction content generation unit 225.
  • the reproduction content generation unit 225 generates the reproduction content 60 based on the event information 10 to be reproduced and the captured image 5 (step 203). Specifically, a process of adding the AR data 11 (AR sound 3 and AR visual 4) recorded in the event information 10 to the corresponding captured image 5 is executed. When there are a plurality of captured images 5, AR data 11 is added to each captured image 5.
  • the generated reproduction content 60 is transmitted to the reproduction device 240 by the server data management unit 224. As a result, the user 1 can view the desired reproduced content 60.
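  • The flow of steps 201 to 203 can be summarized in a short Python sketch; the library lookups and the compositing call are placeholders rather than the actual server implementation.

```python
def generate_reproduced_content(request, event_library, video_library, compositor):
    """Hypothetical sketch of steps 201 to 203 in FIG. 13.

    request: dict with the session (or time range) the user selected via the GUI.
    compositor: callable that adds the recorded AR data 11 to one captured video.
    """
    # Step 201: set the reproduction target from the request command.
    target_session = request["session_id"]

    # Step 202: read the event information 10 and the captured videos 5.
    events = event_library.get(target_session, [])
    videos = video_library.get(target_session, [])

    # Step 203: add the recorded AR sound 3 / AR visual 4 to each captured video.
    return [compositor(video, events) for video in videos]

# Minimal stand-ins so the sketch runs end to end.
event_lib = {"session_A": [{"t": 12.4, "sound": "footstep", "visual": "onomatopoeia"}]}
video_lib = {"session_A": ["captured_video_5a", "captured_video_5c"]}
stub_compositor = lambda video, events: f"{video} + {len(events)} AR effect(s)"
print(generate_reproduced_content({"session_id": "session_A"},
                                  event_lib, video_lib, stub_compositor))
```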
  • the process of generating the reproduced content 60 will be described in detail.
  • FIG. 14 is a schematic diagram showing an example of event information 10 used for generating the reproduced content 60.
  • the event information 10 and the captured video 5 used for generating the reproduced content 60, and the situation when they are recorded are schematically illustrated.
  • In the event information 10, the event E generated during the actual experience and the captured video 5 captured during the actual experience are stored together with time codes on the same timeline 6.
  • the event information 10 and the captured video 5 shown in FIG. 14 are, for example, information recorded during the actual experience shown in FIG. 8C.
  • In the event information 10, the information of the AR sound 3 (type, presentation timing, presentation position 26s) and the information of the AR visual 4 (type, presentation timing, presentation position 26v) are recorded.
  • These AR data 11 are data selected or generated according to, for example, the weather information 29 and the item information 28 (situation information), and this situation information is also recorded in the event information 10.
  • the photographed images 5a and 5b photographed by the users 1a and 1b and the photographed image 5c photographed by the fixed point camera 46 are recorded.
  • the reproduction content generation unit 225 sets the timing at which the AR data 11 is reproduced in the reproduction content 60 based on the presentation timing of the AR data 11.
  • the reproduction content 60 for reproducing the AR data 11 is generated at the presentation timing of the AR data 11. That is, in the captured video 5 that is the reproduced content 60, the timing for reproducing the AR sound 3 and the timing for superimposing the AR visual 4 are set to the presentation timing recorded in the event information 10.
  • the presentation timing of the AR sound 3 is the timing at which the AR sound 3 is actually presented in the actual experience.
  • the presentation timing of the AR visual 4 is a timing set based on the presentation timing of the AR sound 3. Therefore, the presentation timing of the AR visual 4 may coincide with the presentation timing of the AR sound 3, or may be set before or after that.
  • the AR sound 3 is a sound effect representing a footstep
  • the AR visual 4 is an image representing an onomatopoeia of the footstep.
  • the onomatopoeia presentation timing is set to coincide with the footstep presentation timing. That is, the AR visual 4 is presented in synchronization with the AR sound 3. Therefore, in the reproduced content 60, the onomatopoeia is presented at the same time as the footsteps.
  • the presentation timing is synchronized.
  • the AR visual 4 representing the interaction with the AR sound 3 for example, when displaying an image holding an umbrella corresponding to the rain sound
  • the presentation timing is synchronized.
  • the AR sound 3 is a sound effect representing a rain sound and the AR visual 4 is an image representing a rain cloud.
  • the presentation timing of the rain cloud is set before the presentation timing of the rain sound. This makes it possible to express the appearance of rain clouds before it starts to rain.
  • the AR sound 3 is a sound effect representing thunder and the AR visual 4 is an image (or effect) representing thunder.
  • the presentation timing of the thunder image and the effect is set after the presentation timing of the thunder. This makes it possible to express the appearance of thunder occurring after thunder. In this way, by setting the presentation timing of the AR visual 4 to be shifted from the presentation timing of the AR sound 3, it is possible to exert various effects in the reproduced content 60.
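  • A small sketch of how the reproduction timing of the AR visual 4 might be derived from the presentation timing of the AR sound 3, covering the synchronized, earlier, and later cases described above; the offset values themselves are illustrative assumptions.

```python
# Illustrative offsets (seconds) of each AR visual relative to its AR sound.
VISUAL_OFFSETS = {
    "footstep_onomatopoeia": 0.0,   # synchronized with the footstep sound
    "rain_cloud": -3.0,             # clouds appear before the rain sound starts
    "lightning_flash": 0.5,         # the flash effect follows the thunder sound
}

def visual_presentation_time(sound_time_s: float, visual_id: str) -> float:
    """Timeline time at which the AR visual 4 is superimposed in the reproduced content."""
    return sound_time_s + VISUAL_OFFSETS.get(visual_id, 0.0)

for visual_id in ("footstep_onomatopoeia", "rain_cloud", "lightning_flash"):
    print(visual_id, visual_presentation_time(30.0, visual_id))
```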
  • the reproduced content 60 (re-experience moving image) is generated based on at least one shot image 5. That is, the reproduction content 60 is generated by superimposing the AR data 11 on all or a part of the captured image 5 in which the event E to be reproduced is captured.
  • the reproduced content 60 is composed of captured images 5a to 5c as materials.
  • When the reproduction content 60 for the user 1a is generated, the captured video 5a taken by the user 1a himself/herself is used for a completely subjective moving image.
  • For an objective moving image in which the user 1a himself/herself appears, the captured video 5b captured by the user 1b is used, and for a third-person objective moving image, the captured video 5c captured by the fixed point camera 46 is used.
  • the reproduced content 60 is configured by appropriately combining the scenes of the captured images 5a to 5c. This makes it possible to provide the reproduced content 60 in which the content experienced by the user 1 is recorded from various viewpoints.
  • the presentation position of the AR data 11 in the reproduced content 60 will be described.
  • The positions for reproducing the AR data 11 are arranged relative to the position of the camera that shot the scene (shooting position 27), and the AR data 11 is superimposed on the captured video 5 of each camera.
  • the AR sound 3 is superimposed so as to be reproduced from the direction in which the presentation position 26s of the AR sound 3 is present when viewed from the shooting position 27 of the shot image 5.
  • the playback volume is set according to the distance from the shooting position 27 to the AR sound presentation position 26s.
  • In this way, at least one of the reproduction volume and the reproduction direction of the AR sound 3 in the reproduced content 60 is adjusted based on the relative positional relationship between the presentation position 26s of the AR sound 3 in the event E and the shooting position 27 of the captured video 5. This makes it possible to provide the reproduced content 60 that reproduces the position and orientation of the AR sound 3 presented in the actual experience.
  • the AR visual 4 is superimposed on the presentation position 26v of the AR visual 4 as seen from the shooting position 27 of the shot image 5.
  • The posture, size, and the like of the AR visual 4 are appropriately adjusted according to the direction and distance from the shooting position 27 to the presentation position 26v of the AR visual 4.
  • In this way, at least one of the posture and the size of the AR visual 4 in the reproduced content 60 is adjusted based on the relative positional relationship between the presentation position 26v of the AR visual 4 in the event E and the shooting position 27 of the captured video 5. This makes it possible to provide the reproduced content 60 that reproduces the position and orientation of the AR visual 4 presented in the actual experience.
  • In the process of generating the reproduced content 60, it is possible to change the content of an external variable such as the weather information 29, as long as doing so does not affect the progress of the scenario. For example, it is possible to change content experienced in rainy weather into an experience in fine weather, or to change content experienced in summer into an experience in winter.
  • As a result, the selection of the AR data 11 and the like changes, and it is possible to add an effect different from that at the time of the actual experience.
  • the status of user 1 may be changed. By changing the parameters related to the selection of the AR data 11 in this way, it is possible to expand and reproduce the contents of the actual experience.
  • The case where the user 1 participates in the attraction alone (individual case) and the case where a group of users participates (group case) can both be considered.
  • the experience of one user 1 is recorded as the event information 10, and the reproduced content 60 that reproduces the experience content of the user 1 is generated.
  • the experiences of the plurality of users 1 are recorded as the event information 10, and the reproduced contents 60 in which the experiences of each user are reproduced as a series of experiences are generated.
  • the contents actually experienced by the user 1 and the examples of the reproduced contents 60 will be described separately for individual cases and group cases.
  • FIG. 15 is a schematic diagram showing the flow of attractions in which the user participates alone.
  • the user 1 who participates in the attraction borrows the terminal device 230 in which the application for providing AR by sound is installed.
  • the user 1 receives an explanation (introduction) about the attraction from the staff 9, and the experience starts.
  • the user 1 during the experience wears the earphone 2 connected to the terminal device 230.
  • the user 1 will experience a session (event E) according to various scenarios while moving according to the guidance.
  • the event information 10 regarding the user 1 is accumulated in the server device 220.
  • the terminal device 230 is returned to Staff 9.
  • QR code 48 is presented.
  • This QR code 48 is used as a link to a site or application that provides the reproduced content 60.
  • the user 1 can acquire the reproduced content that reproduces his / her own experience by accessing the link destination of the QR code 48 using his / her own smartphone or the like.
  • FIG. 16 is a schematic diagram showing an example of a session that a user experiences alone. FIGS. 16A to 16D schematically show an example of a session experienced by one user 1, using illustrations.
  • AR sound 3 such as narration (NA) corresponding to an area (spot) and sound effect (SE) such as spatial sound is presented through the terminal device 230.
  • the AR data 11 presented during the session is only the AR sound 3, and the AR visual 4 is not used.
  • the user 1 experiences the sensation as if something is there, although it cannot be confirmed with the naked eye.
  • the state of the experience is photographed by the fixed point camera 46.
  • the session reproduction content 60 shown in FIG. 16A is created based on the captured image 5 captured by the fixed point camera 46.
  • the AR visual 4 such as the character 4b speaking the narration is superimposed on the captured image 5.
  • the user 1 can know the identity of the character 4b that was not known during the actual experience. Further, an AR visual 4 such as an onomatopoeia or a musical note representing a sound effect is superimposed on the captured image 5. This makes it possible to visually produce sound effects and provide a rich re-experience.
  • an AR sound 3 in which a sound (sound source) moves around the user 1 is presented.
  • the AR data 11 presented during the session is only the AR sound 3, and the AR visual 4 is not used.
  • the user 1 experiences the sensation as if something that cannot be seen with the naked eye is moving around him.
  • the state of the experience is photographed by the fixed point camera 46.
  • the session reproduction content 60 shown in FIG. 16B is created based on the captured image 5 captured by the fixed point camera 46.
  • the AR visual 4 such as the object 4a as the sound source is superimposed on the captured image 5.
  • the position of the sound source may be adjusted with reference to the shooting position 27 of the fixed point camera 46, or the position when presented to the user 1 may be used as it is.
  • an AR sound 3 such as a narration that guides the user 1 is presented.
  • a scenario of searching for a character 4b as a sound source proceeds by guidance by sound.
  • the voice of the narration becomes louder.
  • a narration or the like indicating that the direction is different is reproduced.
  • When the user 1 shoots the direction from which the narration can be heard with the terminal device 230 (portable camera), the character 4b that utters the narration is presented superimposed on the captured image 5.
  • The session reproduction content 60 shown in FIG. 16C is created based on the captured image 5 captured by the user 1 with the terminal device 230.
  • the character 4b that utters the narration is superimposed on the captured image 5.
  • a text representing the content of the narration may be added as a new AR visual 4.
  • the AR sound 3 corresponding to the operation of the user 1 is presented.
  • the AR sound 3 representing a virtual footstep is reproduced according to the landing motion of the user 1.
  • the state of this experience is photographed by the fixed point camera 46.
  • the session reproduction content 60 shown in FIG. 16D is created based on the captured image 5 captured by the fixed point camera 46.
  • the AR visual 4 representing the onomatopoeia of the footsteps, the footprints, and the like is superimposed on the captured image 5.
  • the footsteps experienced by the user 1 can be reproduced in both the sound and the image, and the user 1 can easily share his / her own experience with a third party.
  • FIG. 17 is a schematic diagram showing an example of a session that the user 1 experiences in a group.
  • FIGS. 17A to 17D schematically show an example of a session in which two users 1a and 1b experience as a group, using illustrations.
  • the sessions shown in FIGS. 17A to 17D are the same as the sessions shown in FIGS. 16A to 16D.
  • When participating in the attraction as a group, a terminal device 230 is rented out to each of the users 1a and 1b in the group.
  • shared data is generated for the shared event of each user 1a and 1b, and the experience of each user 1a and 1b is shared. It should be noted that this technique can be applied even when the process of sharing the experience is not performed.
  • After the experience is over, the QR code 48 is presented in exchange for the return of the terminal device 230. At this time, each of the users 1a and 1b in the group is presented with an individual QR code 48.
  • narration and sound effects according to the area are presented as AR sound 3.
  • AR Visual 4 will not be presented during this session.
  • Users 1a and 1b experience the sensation as if there is a sound source that cannot be seen with the naked eye.
  • the AR sound 3 is reproduced so that the position of the sound source in the real world is the same for both the user 1a and the user 1b.
  • the state of the experience is photographed by the fixed point camera 46.
  • the session reproduction content 60 shown in FIG. 17A is created based on the captured image 5 captured by the fixed point camera 46.
  • the AR visual 4 such as the character 4b uttering the narration and the onomatopoeia representing the sound effect is superimposed on the captured image 5. This makes it possible for the user 1a and the user 1b to relive the content experienced by the two people with one reproduced content 60.
  • an AR sound 3 in which a sound (sound source) moves around the users 1a and 1b is presented.
  • AR Visual 4 will not be presented during this session.
  • users 1a and 1b experience the sensation as if there is a sound source moving around them.
  • the state of the experience is photographed by the fixed point camera 46.
  • the session reproduction content 60 shown in FIG. 17B is created based on the captured image 5 captured by the fixed point camera 46.
  • the AR sound 3 and the AR visual 4 such as the object 4a as the sound source thereof are superimposed on the captured image 5.
  • the users 1a and 1b can look back on their experiences while visually confirming the movement of the sound source.
  • an AR sound 3 such as a narration that guides each user 1a and 1b is presented.
  • When the users 1a and 1b photograph, with their terminal devices 230 (portable cameras), the direction from which the narration can be heard, the character 4b that utters the narration is presented superimposed on the respective captured images 5.
  • Since the shooting positions of the users 1a and 1b are different from each other, the character 4b shot from different directions is superimposed on the respective captured images 5.
  • the session reproduction content 60 shown in FIG. 17C is created based on the captured video 5 captured by the users 1a and 1b with the respective terminal devices 230. Therefore, the reproduced content 60 provided to the users 1a and 1b is an image obtained by shooting the character 4b from the shooting position selected by each. In this way, the users 1a and 1b can acquire the original recorded video, respectively.
  • AR sound 3 representing virtual footsteps corresponding to the actions of users 1a and 1b is presented.
  • the experiences of the users 1a and 1b are shared, and virtual footsteps matching each other's landing movements are reproduced from the respective landing points.
  • the state of this experience is photographed by the fixed point camera 46.
  • the session reproduction content 60 shown in FIG. 17D is created based on the captured image 5 captured by the fixed point camera 46.
  • all the footsteps reproduced in accordance with the steps of the users 1a and 1b and the AR visual 4 corresponding to them are superimposed on the captured image 5.
  • FIG. 18 is a schematic diagram for explaining the reproduced content 60 based on the captured video 5 captured by the users 1.
  • 18A to 18C schematically show the flow until the users 1a and 1b view the reproduced content 60 of the session in which the two users participated.
  • the reproduced content 60 is generated based on the captured image 5 in which the user 1a captures the user 1b during the session.
  • AR sound 3 (narration, sound effect, BGM, etc.) localized near the bench 51 arranged in the real world is presented to each user 1a and 1b.
  • The user 1b estimates the position of the sound source of the AR sound 3. At this time, the AR sound 3 is presented to the user 1b sitting on the bench 51 so as to be heard close to his/her ear.
  • the user 1b is surprised to hear a voice or the like in his / her ear, and informs the user 1a to that effect.
  • the user 1a considers that there is something invisible in the vicinity of the bench 51, and photographs the state of the user 1b sitting on the bench 51.
  • the AR visual 4 and the like are not superimposed on the captured image 5 (through image) that the user 1a visually recognizes during shooting.
  • FIG. 18C shows how the users 1a and 1b confirm the reproduced content 60 after the end of the session.
  • The reproduced content 60 is a video in which the AR sound 3 heard by the user 1b and the AR visual 4 corresponding to the AR sound 3 are superimposed on the photographed image 5 taken by the user 1a in FIG. 18B.
  • By viewing the reproduced content 60, each of the users 1a and 1b can confirm for the first time the identity of the virtual character 4b next to the user 1b, the identity of the sound effect heard around the user 1b, and the like. In this way, it is possible to provide the reproduced content 60 that later supplements the content experienced by the users 1 in the group during the session.
  • Note that the character 4b may be superimposed and presented on the through image at the time of shooting. In this case, an interaction between the character 4b and the user 1 may be realized during shooting.
  • FIG. 19 is a schematic diagram showing an example of generating the reproduced content 60 according to the context.
  • Context means, for example, the status and flow of a session.
  • a session in which the character 4b, which is invisible to the naked eye, guides the user 1 by narration is photographed (left side of FIG. 19). Further, in the reproduced content 60 of the session (on the right side of FIG. 19), effects according to the facial expression, reaction, and the like of the user 1 are superimposed. At this time, for example, when the facial expression or reaction of the user 1 expresses joy, the AR visual 4 expressing the context that the user 1 is pleased is superimposed.
  • the context is determined by executing a process of identifying emotions based on the captured image 5, the sound at that time, and the like.
  • As the situation information, the weather information, the item information, and the like described above may be used.
  • situation information indicating the situation of the session is extracted from the captured image 5 and the environmental sound.
  • Binaural recording or the like may be used when collecting environmental sounds. In binaural recording, for example, the voice is recorded at the ear of the user 1. As a result, it is possible to record data that accurately expresses the position and the like of the sound source by reflecting the acoustic effect and the like on the head of the user 1.
  • the information set in advance by the user 1 or the like may be used as the status information indicating the status of the user 1, or the information indicating the progress status of the scenario (for example, branching of the scenario) is used as the status information. You may. Further, the information or the like that identifies each user 1 in the captured image 5 may be used as the situation information. In this case, for example, the user 1 shown in the captured image 5 is specified by using an arbitrary personal authentication technique or the like. In this way, it is possible to customize the AR data 11 in detail by specifying each user 1.
  • a process of selecting an AR sound 3 or an AR visual 4 suitable for the context, or changing the AR sound 3 or the AR visual 4 to a shape suitable for the context is executed.
  • a process for automatically generating the AR sound 3 or the AR visual 4 in a form suitable for the context may be executed. These processes may be executed when the event information 10 is generated, or may be executed when the reproduced content 60 is generated.
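  • One way such context-dependent selection might look in code, here choosing an effect from a detected facial expression or reaction, is sketched below; the emotion labels and effect names are purely illustrative.

```python
# Illustrative mapping from a detected context (e.g. the user 1's facial expression
# or reaction, identified from the captured video and sound) to AR data to superimpose.
CONTEXT_EFFECTS = {
    "joy":      {"visual": "sparkle_celebration", "sound": "cheerful_jingle"},
    "surprise": {"visual": "exclamation_mark",    "sound": "sting"},
    "neutral":  {"visual": None,                  "sound": None},
}

def effects_for_context(detected_emotion: str) -> dict:
    """Return the AR data suited to the given context (assumed labels)."""
    return CONTEXT_EFFECTS.get(detected_emotion, CONTEXT_EFFECTS["neutral"])

print(effects_for_context("joy"))
```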
  • variations of AR data 11 (AR sound 3 and AR visual 4) and expected change processing for each variation will be described.
  • the change process described below is a process of changing at least one of the AR sound 3 and the AR visual 4.
  • “Kodama” and “echo” will be described as examples of spatial acoustics.
  • a change process is executed that changes the timing at which the kodama returns depending on the location.
  • a change process is performed that changes the echo sound or effect according to the size of the space.
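  • As a rough illustration of these two change processes, the delay of the returning kodama can be tied to the distance to a reflecting surface, and the echo strength to the size of the space; the constants below are physical approximations and illustrative values, not figures from the patent.

```python
SPEED_OF_SOUND_M_S = 343.0   # approximate speed of sound in air

def kodama_delay_s(distance_to_surface_m: float) -> float:
    """Round-trip delay of the returning 'kodama' for a given location."""
    return 2.0 * distance_to_surface_m / SPEED_OF_SOUND_M_S

def echo_params(space_volume_m3: float) -> dict:
    """Illustrative change process: a larger space gets a longer, stronger echo."""
    reverb_time_s = min(4.0, 0.3 + 0.002 * space_volume_m3)
    wet_mix = min(0.8, 0.2 + space_volume_m3 / 5000.0)
    return {"reverb_time_s": round(reverb_time_s, 2), "wet_mix": round(wet_mix, 2)}

print(kodama_delay_s(50.0))   # a surface 50 m away returns the kodama after ~0.29 s
print(echo_params(1200.0))    # a large hall gets a long reverb and a higher wet mix
```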
  • FIG. 20 is a schematic diagram showing an example of generating the reproduced content 60 using the sound collection data.
  • the sound collection data is, for example, data representing the sound and environmental sound collected when an event occurs, and is collected by using, for example, the above-mentioned method such as binaural recording.
  • the sound pick-up data may be picked up by using stereo recording or monaural recording using the terminal device 230.
  • a narration explaining the scenery of the real world (the scenery of the waterfall in the figure) is reproduced as AR sound 3 (left side of FIG. 20).
  • the reproduced content 60 of the session (on the right side of FIG. 20) is generated based on the captured image 5 in which the user 1 himself captures the scenery.
  • the text of the narration and the character 4b that utters the narration are superimposed on the reproduced content 60 as an AR visual 4.
  • the reproduced content 60 is presented with an AR visual 4 representing a sound (sound collection data) emitted by the user 1 at the time of shooting.
  • the voice "seen” emitted by the user 1 is expressed by the text 4c surrounded by the balloon.
  • As the AR visual 4, data that visually represents the content of the sound or the environmental sound (sound collection data) collected when the event occurred may be used.
  • voice processing such as sound detection and voice recognition is used.
  • human voice is detected and recorded by sound detection.
  • the recorded content is extracted as text by voice recognition, and the visual including the text is automatically generated.
  • visuals representing the environment may be automatically generated from environmental sounds other than human voice.
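  • A minimal sketch of this pipeline, with the detection and recognition steps stubbed out (a real system would use an actual voice activity detector and speech recognizer), might look as follows.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextVisual:
    """Hypothetical AR visual 4 showing collected speech as a speech balloon."""
    text: str
    timeline_time_s: float
    style: str = "speech_balloon"

def detect_voice(sound_clip: bytes) -> bool:
    # Stub for sound detection (voice activity detection) on the collected data.
    return len(sound_clip) > 0

def recognize_speech(sound_clip: bytes) -> str:
    # Stub for voice recognition; a real system would call an ASR engine here.
    return "seen"

def visual_from_collected_sound(sound_clip: bytes,
                                timeline_time_s: float) -> Optional[TextVisual]:
    """Generate a text visual from the sound collected when the event occurred."""
    if not detect_voice(sound_clip):
        return None
    return TextVisual(text=recognize_speech(sound_clip),
                      timeline_time_s=timeline_time_s)

print(visual_from_collected_sound(b"\x00\x01", timeline_time_s=75.2))
```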
  • the server control unit 223 acquires the event information 10 that records the event E that occurred during the experience of AR by sound.
  • AR data 11 including the AR sound 3 presented in each event E is recorded along the timeline 6.
  • the reproduction content 60 that reproduces the AR data 11 on the timeline 6 is generated. This makes it possible to reproduce the experience content of the user 1 and share the augmented reality experience by sound.
  • The reproduced content 60, in which related effects such as AR visuals are superimposed on the captured image 5 in conjunction with the AR sound (AR sound 3), is generated.
  • the AR sound 3 that the user 1 listened to during the experience is presented again, and the AR visual 4 according to the audio content and the situation of the user 1 is superimposed on the video being experienced.
  • the AR sound 3 and the AR visual 4 experienced by the user 1 are managed by the timeline 6, they can be reproduced at any time.
  • the reproduced content 60 by adding the AR visual 4 and the like, it is possible to produce a rich re-experience, and it is possible to exhibit excellent entertainment.
  • the experience value is greatly improved by covering not only the actual experience but also the re-experience.
  • In the above, the information processing system 100 that generates shared data for sharing the AR session by sound in real time and the information processing system 200 that generates the reproduced content that reproduces the AR session by sound after the experience have been described separately. The present technology is not limited to this, and a system capable of generating both shared data and reproduced content as reproduction data may be constructed.
  • In the above, the case where the information processing method according to the present technology is executed by a computer such as a terminal device or a server device operated by the user has been described.
  • the information processing method and the data generation method according to the present technology may be executed by these computers and other computers capable of communicating via a network or the like.
  • the information processing method and the data generation method according to the present technology can be executed not only in a computer system composed of a single computer but also in a computer system in which a plurality of computers operate in conjunction with each other.
  • the system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether or not all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and one device in which a plurality of modules are housed in one housing are both systems.
  • the execution of the information processing method and the data generation method according to the present technology by the computer system is performed, for example, when the acquisition of event information, the generation of reproduced data, etc. are executed by a single computer, or by a computer in which each process is different. Includes both when executed. Further, the execution of each process by a predetermined computer includes having another computer execute a part or all of the process and acquiring the result.
  • the information processing method and program related to this technology can be applied to the configuration of cloud computing in which one function is shared by a plurality of devices via a network and processed jointly.
  • the present technology can also adopt the following configurations.
  • An acquisition unit that acquires event information in which audiovisual data including sound data presented at the event is recorded along the timeline for each event that occurs while the user is experiencing augmented reality by sound.
  • and a generation unit that generates, based on the event information, reproduction data for reproducing the audiovisual data; an information processing device including the acquisition unit and the generation unit.
  • the audiovisual data is an information processing device including visual data related to the event.
  • the visual data is an information processing device that visually represents the contents of the sound data.
  • the visual data is an information processing device that visually represents the content of the sound or environmental sound picked up when the event occurs.
  • the event information is an information processing device including a detection timing of a trigger that causes the event in the timeline.
  • the event information includes the presentation timing of the audiovisual data in the event.
  • the generation unit is an information processing device that sets the timing at which the audiovisual data is reproduced in the reproduction data based on the presentation timing of the audiovisual data.
  • The information processing apparatus according to (6), wherein the presentation timing of the audiovisual data is recorded as a time relative to the detection timing of the trigger.
  • the information processing apparatus according to any one of (1) to (7).
  • the event information includes the presentation position of the audiovisual data in the event.
  • the generation unit is an information processing device that sets a position in which the audiovisual data is reproduced in the reproduction data based on the presentation position of the audiovisual data.
  • the information processing apparatus according to any one of (1) to (8).
  • the audiovisual data is an information processing device that is data generated based on situation information representing a situation when the event occurs, or data selected based on the situation information.
  • the status information is an information processing device including information on at least one of the user's surrounding environment, the user's status, and the user's operation content.
  • the acquisition unit acquires the event information regarding the event targeting the first user, and obtains the event information.
  • the generation unit generates information for generating shared data as the reproduction data, which is presented to a second user different from the first user so that an event targeting the first user can be viewed.
  • Processing equipment (12) The information processing apparatus according to (11).
  • the generation unit is the shared data when the relative distance between the first user and the second user, or at least one of the affiliations of the first user and the second user satisfies a predetermined condition.
  • An information processing device that produces.
  • the generation unit reproduces the sound data in the shared data based on the relative positional relationship between the presentation position of the sound data in the event targeting the first user and the position of the second user.
  • An information processing device that adjusts at least one of the volume and playback direction.
  • the information processing apparatus is an information processing device that generates reproduction content that reproduces the event along the timeline as the reproduction data.
  • the information processing apparatus according to (14). The acquisition unit acquires an experience image taken when the user experiences augmented reality with the sound.
  • the generation unit is an information processing device that generates the reproduction content by adding an effect of the audiovisual data to the experience image along the timeline.
  • the experience image is an information processing device that is an image taken by at least one of a camera carried by a person who has experienced augmented reality with the sound including the user, or a camera arranged around the experience person.
  • the information processing apparatus according to any one of (15) and (16).
  • the generation unit adjusts at least one of the reproduction volume or reproduction direction of the sound data in the reproduced content based on the relative positional relationship between the presentation position of the sound data in the event and the shooting position of the experience image.
  • Information processing equipment For each event that occurs while the user is experiencing augmented reality by sound, the event information in which audiovisual data including the sound data presented at the event is recorded along the timeline is acquired. An information processing method in which a computer system executes to generate reproduction data for reproducing the audiovisual data based on the event information based on the timeline. (19) A recording unit that records audiovisual data including sound data presented at the event along a timeline and generates event information for each event that occurs while the user is experiencing augmented reality by sound.
  • the acquisition unit that acquires the event information and An information processing system including a generation unit for generating reproduction data for reproducing the audiovisual data based on the event information. (20) For each event that occurs while the user is experiencing augmented reality by sound, the event information in which audiovisual data including the sound data presented at the event is recorded along the timeline is acquired. A data generation method in which a computer system executes to generate reproduction data for reproducing the audiovisual data based on the event information based on the timeline.
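As a concrete illustration of configurations (1) and (5) to (8) above, the following Python sketch shows one possible way to represent event information recorded along a timeline and to derive reproduction data from it. It is a minimal sketch only: the class, function, and field names (EventRecord, AudiovisualItem, build_reproduction_schedule, offset_s, and so on) are hypothetical assumptions for illustration and are not taken from the specification.

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AudiovisualItem:
    """A piece of audiovisual data (sound or visual) presented during an event."""
    name: str                                   # e.g. "wave_sound" (hypothetical label)
    kind: str                                   # "sound" or "visual"
    offset_s: float                             # presentation timing, relative to the trigger detection
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # presentation position in the AR space


@dataclass
class EventRecord:
    """Event information: the trigger detection timing on the timeline plus the AV data it caused."""
    trigger_time_s: float                       # detection timing of the trigger on the timeline
    items: List[AudiovisualItem] = field(default_factory=list)


def build_reproduction_schedule(events: List[EventRecord]):
    """Generate reproduction data: an absolute playback time for every audiovisual item."""
    schedule = []
    for event in events:
        for item in event.items:
            schedule.append({
                "time_s": event.trigger_time_s + item.offset_s,  # relative timing -> absolute timeline position
                "item": item,
            })
    schedule.sort(key=lambda entry: entry["time_s"])
    return schedule

For example, an event whose trigger is detected at 12.0 s and whose sound has a relative presentation timing of 0.5 s would be reproduced at 12.5 s on the timeline.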
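Configurations (13) and (17) adjust at least one of the reproduction volume or reproduction direction of sound data based on the relative positional relationship between the presentation position of the sound and the position of the second user or the shooting position of the experience image. The sketch below assumes a simple inverse-distance attenuation model and a 2D azimuth; both the model and the function name are illustrative assumptions, not the claimed method.

import math


def adjust_playback(sound_pos, listener_pos, base_volume=1.0, min_distance=1.0):
    """Return (volume, azimuth_deg) for reproducing a sound presented at sound_pos
    as heard from listener_pos; both positions are (x, y) tuples in metres."""
    dx = sound_pos[0] - listener_pos[0]
    dy = sound_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dy)

    # Inverse-distance falloff, clamped so that very close sounds are not amplified.
    volume = base_volume * min(1.0, min_distance / max(distance, 1e-6))

    # Reproduction direction as an azimuth: 0 degrees straight ahead (+y), positive to the right.
    azimuth_deg = math.degrees(math.atan2(dx, dy))
    return volume, azimuth_deg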
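Configuration (12) generates the shared data only when the relative distance between the first user and the second user, or their affiliations, satisfies a predetermined condition. One hypothetical form of such a check is sketched below; the 10-metre threshold and the representation of "affiliation" as a group label are assumptions for illustration only.

def should_share_event(first_user, second_user, max_distance_m=10.0):
    """first_user and second_user are dicts with 'position' ((x, y) in metres) and 'group' keys."""
    dx = first_user["position"][0] - second_user["position"][0]
    dy = first_user["position"][1] - second_user["position"][1]
    within_range = (dx * dx + dy * dy) ** 0.5 <= max_distance_m
    same_group = first_user["group"] == second_user["group"]
    # Share the event when at least one of the two conditions is satisfied.
    return within_range or same_group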
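Configurations (14) to (16) generate reproduction content by adding effects of the audiovisual data to an experience image along the timeline. Assuming the schedule structure produced by the first sketch above, the following hypothetical helper selects the items that fall within a recorded experience video and expresses them as overlay metadata (actual audio/video compositing is outside the scope of this sketch).

def attach_effects_to_video(schedule, video_start_time_s, video_duration_s):
    """Keep only the scheduled audiovisual items that fall inside the experience video,
    re-expressed as offsets from the start of that video."""
    overlays = []
    for entry in schedule:
        t = entry["time_s"] - video_start_time_s
        if 0.0 <= t <= video_duration_s:
            overlays.append({"video_offset_s": t, "item": entry["item"]})
    return overlays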
  • E, E1 to E4 ... Event
  • 1, 1a, 1b ... User
  • 3 ... AR sound
  • 4 ... AR visual
  • 5a to 5c ... Shot video
  • 6 ... Timeline
  • 7 ... Position information
  • 8 ... Trigger
  • 10 ... Event information
  • 11 ... AR data
  • 20, 220 ... Server device
  • 23, 223 ... Server control unit
  • 24, 224 ... Server data management unit
  • 225 ... Reproduction content generation unit
  • 30, 30a, 30b, 230 ... Terminal device
  • 40 ... Terminal control unit
  • 41 ... Terminal data management unit
  • 44 ... Live content generation unit
  • 60 ... Reproduction content
  • 100, 200 ... Information processing system

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Optics & Photonics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An information processing device includes an acquisition unit and a generation unit. For each event that occurs while a user is experiencing augmented reality through sound, the acquisition unit acquires event information in which audiovisual data, including sound data presented during the event, is recorded along a timeline. Based on the event information, the generation unit generates reproduction data for reproducing the audiovisual data along the timeline.
PCT/JP2021/040217 2020-11-12 2021-11-01 Dispositif, procédé et système de traitement d'informations et procédé de génération de données WO2022102446A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020188554A JP2023181567A (ja) 2020-11-12 2020-11-12 情報処理装置、情報処理方法、情報処理システム、及びデータ生成方法
JP2020-188554 2020-11-12

Publications (1)

Publication Number Publication Date
WO2022102446A1 true WO2022102446A1 (fr) 2022-05-19

Family

ID=81602190

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/040217 WO2022102446A1 (fr) 2020-11-12 2021-11-01 Dispositif, procédé et système de traitement d'informations et procédé de génération de données

Country Status (2)

Country Link
JP (1) JP2023181567A (fr)
WO (1) WO2022102446A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014187559A (ja) * 2013-03-25 2014-10-02 Yasuaki Iwai 仮想現実提示システム、仮想現実提示方法
WO2017099213A1 (fr) * 2015-12-11 2017-06-15 ヤマハ発動機株式会社 Dispositif de présentation d'onomatopée correspondant à des résultats d'évaluation d'un environnement alentour
JP2018063589A (ja) * 2016-10-13 2018-04-19 キヤノンマーケティングジャパン株式会社 情報処理装置、情報処理システム、その制御方法及びプログラム
WO2018186178A1 (fr) * 2017-04-04 2018-10-11 ソニー株式会社 Dispositif de traitement d'informations, procédé de traitement d'informations, et programme
JP2019209392A (ja) * 2018-05-31 2019-12-12 国立大学法人名古屋大学 力覚視覚化装置、ロボットおよび力覚視覚化プログラム

Also Published As

Publication number Publication date
JP2023181567A (ja) 2023-12-25

Similar Documents

Publication Publication Date Title
US10819969B2 (en) Method and apparatus for generating media presentation content with environmentally modified audio components
US11039264B2 (en) Method of providing to user 3D sound in virtual environment
US11977670B2 (en) Mixed reality system for context-aware virtual object rendering
US10908682B2 (en) Editing cuts in virtual reality
US10657727B2 (en) Production and packaging of entertainment data for virtual reality
KR102609668B1 (ko) 가상, 증강, 및 혼합 현실
JP6959943B2 (ja) 3d音声ポジショニングを用いて仮想現実又は拡張現実のプレゼンテーションを生成するための方法及び装置
JP2021082310A (ja) 拡張現実および仮想現実のためのシステムおよび方法
US20150302651A1 (en) System and method for augmented or virtual reality entertainment experience
US20160110922A1 (en) Method and system for enhancing communication by using augmented reality
US20210375052A1 (en) Information processor, information processing method, and program
JPWO2018139117A1 (ja) 情報処理装置、情報処理方法およびそのプログラム
JP2009088729A (ja) 合成画像出力装置および合成画像出力処理プログラム
JP2009086785A (ja) 合成画像出力装置および合成画像出力処理プログラム
US20240203058A1 (en) System and method for performance in a virtual reality environment
KR102200239B1 (ko) 실시간 cg 영상 방송 서비스 시스템
WO2022102446A1 (fr) Dispositif, procédé et système de traitement d'informations et procédé de génération de données
EP4080907A1 (fr) Dispositif de traitement d'informations et procédé de traitement d'informations
EP4306192A1 (fr) Dispositif de traitement d'information, terminal de traitement d'information, procédé de traitement d'information et programme
US11726551B1 (en) Presenting content based on activity
WO2023281820A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations et support de stockage
Harju Exploring narrative possibilities of audio augmented reality with six degrees of freedom
Fujimura et al. PARAOKE alpha: a new application development of multiplex-hidden display technique for music entertainment system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21891693

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21891693

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP