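WO2022224586A1 - Information processing device, information processing method, program, and information recording medium - Google Patents
Information processing device, information processing method, program, and information recording medium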
- Publication number
- WO2022224586A1 (PCT/JP2022/008277)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information processing
- orientation
- virtual
- user
- processing device
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
Definitions
- The present invention relates to an information processing device, an information processing method, a program, and an information recording medium for estimating the orientation of a user's face in the real world and outputting information according to that orientation.
- The sound source selection device disclosed in Patent Document 1 comprises headphones; virtual sound source providing means for providing, to a listener wearing the headphones, a plurality of virtual sound sources localized via the headphones; and virtual sound source selection means for selecting one virtual sound source from the plurality of virtual sound sources.
- The virtual sound source providing means comprises localized sound source arrangement pattern storage means for storing a plurality of localized sound source arrangement patterns of the plurality of virtual sound sources to be provided to the listener; arrangement pattern selection means for selecting a desired pattern from the plurality of localized sound source arrangement patterns according to the listener's selection action; mixing means for providing the plurality of virtual sound sources according to the localized sound source arrangement pattern; a head movement detection sensor mounted on the headphones and detecting movement of the listener's head; and head motion determination means for determining the motion of the head based on the output of the head movement detection sensor.
- the arrangement pattern selection means selects another localized sound source arrangement pattern from the localized sound source arrangement pattern storage means
- The front camera is sometimes called the in-camera or front-facing camera.
- a rear camera sometimes called a rear camera
- a head movement detection sensor included in headphones is used to detect movement of the user's head.
- Audio equipment such as headphones and earphones used with smartphones and tablets increasingly offers noise-canceling and external audio capture functions, but most of this equipment does not have a head movement detection sensor.
- The present invention is intended to solve the above problem, and relates to an information processing apparatus, an information processing method, a program, and an information recording medium for estimating the orientation of a user's face in the real world and outputting information according to that orientation.
- An information processing apparatus has a camera; detects a first orientation of the information processing device in a first coordinate system fixed in the real world; if the image captured by the camera contains a face image of the user, estimates, from the captured image and the face image, a second orientation of the user's face in a second coordinate system fixed to the information processing device; calculates a third orientation of the user's face in the first coordinate system from the detected first orientation and the estimated second orientation; and outputs information corresponding to the calculated third orientation.
- An information processing device, an information processing method, a program, and an information recording medium can be provided for estimating the orientation of a user's face in the real world and outputting information according to that orientation.
- FIG. 1 is an explanatory diagram showing a schematic configuration of an information processing device according to an embodiment of the present invention.
- FIG. 2 is a flow chart showing control of an information processing method executed by the information processing apparatus according to the embodiment of the present invention.
- FIGS. 3 to 18 are drawing-substitute photographs showing display examples by the information processing apparatus according to the embodiment of the present invention; each example is shown once in grayscale and once in monochrome binary.
- FIGS. 19 and 20 are drawing-substitute photographs showing, in grayscale and in monochrome binary respectively, a display example of a stage in a virtual concert venue by the information processing apparatus according to the embodiment of the present invention.
- A drawing-substitute photograph showing, in monochrome binary, a display example of a virtual room in which a plurality of displays are arranged by the information processing apparatus according to the embodiment of the present invention.
- FIGS. 23 and 24 are drawing-substitute photographs showing, in grayscale and in monochrome binary respectively, a display example of a virtual room in which a plurality of moving image contents are arranged by the information processing apparatus according to the embodiment of the present invention.
- Six further drawing-substitute photographs, beginning with FIGS. 25 and 26, show alternately in grayscale and in monochrome binary display examples when attention is paid to one moving image content in the virtual room by the information processing apparatus according to the embodiment of the present invention.
- An explanatory diagram showing a schematic configuration of an information processing device that processes an object of interest according to an embodiment of the present invention.
- FIG. 1 is an explanatory diagram showing a schematic configuration of an information processing device according to an embodiment of the present invention. An outline will be described below with reference to this figure.
- As shown in this figure, the information processing apparatus 101 has a camera 151, a detection unit 111, an estimation unit 112, a calculation unit 113, and an output unit 114. Audio equipment 152, a display screen 153, and the like can be employed as output destinations of the information.
- the information processing apparatus 101 is typically realized by executing a program on a portable computer such as a smart phone or a tablet.
- the computer is connected to various output devices and input devices, and exchanges information with these devices.
- A program to be run on the computer can be distributed and sold by a server to which the computer is communicatively connected. It can also be recorded on a non-transitory information recording medium such as a CD-ROM (Compact Disk Read Only Memory), flash memory, or EEPROM (Electrically Erasable Programmable ROM), and that medium can then be distributed and sold.
- The program is installed on a non-transitory information recording medium of the computer, such as a hard disk, solid state drive, flash memory, or EEPROM. The computer then realizes the information processing apparatus according to the present embodiment.
- CPU (Central Processing Unit)
- RAM (Random Access Memory)
- OS (Operating System)
- Various information required in the process of program execution can be temporarily recorded in the RAM.
- the computer has a GPU (Graphics Processing Unit) for performing various image processing calculations at high speed.
- By using the GPU together with a library such as TensorFlow, learning functions and classification functions for various artificial intelligence processing can be used under the control of the CPU.
- Instead of implementing the information processing apparatus 101 of the present embodiment with a computer on which software is installed, it can also be configured using a dedicated electronic circuit.
- a portable camera, a portable electronic game device, or the like can be used as the information processing device 101 .
- the program can also be used as material for generating wiring diagrams, timing charts, etc. of electronic circuits.
- In that case, an electronic circuit that satisfies the specifications defined in the program is configured by an FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the electronic circuit functions as a dedicated device that performs the functions defined in the program, thereby realizing the information processing apparatus of this embodiment.
- the information processing apparatus 101 will be described below assuming that it is implemented by a computer executing a program.
- the information processing apparatus 101 can be connected wirelessly or by wire to audio equipment 152 such as headphones, earphones, neck speakers, bone conduction speakers, hearing aids, etc., as information output destinations.
- These audio devices 152 desirably have an external audio capture function.
- the detection unit 111 detects the first orientation of the information processing device 101 in the first coordinate system fixed to the real world.
- The orientation (first orientation) of the information processing device 101 in the first coordinate system fixed in the real world can be detected via a geomagnetic sensor, an inertial sensor for detecting gravity, an acceleration sensor, a gyro sensor, or the like included in the information processing device 101.
- The position (first position) of the information processing device 101 in the first coordinate system can also be detected by a geolocation detection function using GPS, Wi-Fi access points, Bluetooth beacons, or the like.
- If the image captured by the camera 151 contains a face image of the user, the estimation unit 112 estimates, from the captured image and the face image, the second orientation of the user's face in the second coordinate system fixed to the information processing device 101.
- The information processing apparatus 101 extracts the face image in the captured image by image recognition, recognizes characteristic parts such as the eyes, nose, and mouth, and, based on the face image, estimates the orientation of the user's face relative to the information processing apparatus 101 (second orientation). A general face tracking technique can be applied to this process.
- the position (second position) of the user's face relative to the information processing device 101 may be further estimated based on the position and size of the face image in the captured image.
- The calculation unit 113 calculates a third orientation of the user's face in the real world (first coordinate system) from the detected first orientation and the estimated second orientation.
- The transformation of directions between the first coordinate system and the second coordinate system can be uniquely defined based on the first orientation. Further, when the first position is detected, the transformation of coordinate values between the first coordinate system and the second coordinate system can be uniquely determined based on the first orientation and the first position.
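The composition of the first and second orientations can be sketched as a product of rotation matrices. This is a minimal illustration, not the patented implementation: the helper names (`rot_z`, `matmul`, `third_orientation`, `yaw_of`) are hypothetical, only rotation about the vertical axis is shown, and any axis flip needed because a front camera faces the user is assumed to be already folded into the second orientation.

```python
import math

def rot_z(yaw):
    """3x3 rotation matrix for a rotation of `yaw` radians about the vertical (z) axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def third_orientation(r_world_device, r_device_face):
    """Face orientation in the world frame: device-in-world composed with face-in-device."""
    return matmul(r_world_device, r_device_face)

def yaw_of(r):
    """Heading (radians) of the rotated x axis projected on the horizontal plane."""
    return math.atan2(r[1][0], r[0][0])

# Assumed example: device turned 30 degrees in the world, face turned 15 degrees
# relative to the device.
r_world_face = third_orientation(rot_z(math.radians(30)), rot_z(math.radians(15)))
face_yaw_deg = math.degrees(yaw_of(r_world_face))
```

With a full 3D orientation, the same product applies to general rotation matrices; the yaw-only case is used here to keep the sketch short.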
- the output unit 114 outputs information corresponding to the calculated third orientation.
- As the output destination, the audio equipment 152 worn by the user or the display screen 153 can be adopted.
- As the information to be output, audio information can be employed that is obtained by placing one or more virtual sound sources in the real world and mixing them while changing the intensity, tone, phase, etc. of the waveform associated with each virtual sound source according to the third orientation.
- The amplification factors are set so that, while the ratio based on the angle difference for each virtual sound source is maintained, the average sound pressure does not change significantly when the face is assumed to make one full rotation.
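As an illustration of an amplification factor that depends on the angle difference, the following sketch computes the difference between the face heading and each virtual sound source's heading and maps it to a multiplier. The cosine-shaped falloff, the `floor` value, and the source names are assumptions chosen for the example; the text does not specify the actual curve.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two headings in radians (result in 0..pi)."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def multiplier(diff, floor=0.1):
    """Hypothetical falloff: 1.0 straight ahead, `floor` when the source is directly behind."""
    return floor + (1.0 - floor) * (1.0 + math.cos(diff)) / 2.0

face_yaw = math.radians(45)  # heading component of the third orientation
sources = {"piano": math.radians(45), "drums": math.radians(225)}
gains = {name: multiplier(angle_diff(face_yaw, yaw)) for name, yaw in sources.items()}
```

Multiplying every gain by a common scale factor afterwards keeps the ratios between sources unchanged, which is how an overall level constraint could be met without altering the angle-based emphasis.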
- Alternatively, the virtual direction associated with a virtual sound source may be calculated from the virtual position where the virtual sound source is located in the first coordinate system and the first orientation and first position detected by the detection unit 111, after which the same processing as above is performed.
- The third position can be obtained by coordinate-transforming the face position relative to the information processing device 101 (second position), obtained by face tracking, from the second coordinate system fixed to the information processing device 101 into the first coordinate system.
- The orientation of the user's face may be displayed on the display screen 153 like a compass. If a display mode is adopted in which the direction of the compass "needle" changes as the user turns the face, the user can confirm that the present embodiment is operating properly as long as the display screen 153 is within the user's field of vision.
- direct sound and reverb sound may be generated based on the waveform of the virtual sound source, and the mixing ratio of the two may be changed according to the angular difference. If the angle difference is small, the user can be made to feel that the virtual sound source is being heard loudly from the front side by increasing the ratio of the direct sound. This is called echo correction.
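The echo correction described above can be sketched as a function that returns the direct and reverb shares for a given angle difference. The linear mapping and the 0.5 to 1.0 range for the direct share are assumptions made for illustration only.

```python
import math

def direct_reverb_mix(diff, max_diff=math.pi):
    """Return (direct_ratio, reverb_ratio); the direct share grows as the angle difference shrinks."""
    clamped = min(max(diff, 0.0), max_diff)
    closeness = 1.0 - clamped / max_diff
    direct = 0.5 + 0.5 * closeness  # hypothetical range: 0.5 (behind) to 1.0 (straight ahead)
    return direct, 1.0 - direct
```

A source straight ahead is then rendered entirely as direct sound, while a source behind the user is rendered as an even blend of direct and reverb sound.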
- The central frequency range of the virtual sound source in front may be obtained, and that range may be attenuated by an equalizer for the virtual sound sources in other directions, reducing frequency overlap so that the virtual sound source in front stands out for the user. This is called center range correction.
- For the virtual sound source in front, saturation that strengthens the overtone components may be added to make the sound more brilliant, so that the virtual sound source in front stands out for the user. This is called saturation correction.
- When the camera 151 of the information processing apparatus 101 is a so-called front camera, its photographing direction matches the display direction of the display screen 153 and faces the direction in which the user is assumed to be positioned.
- the user's face should be captured by the camera 151 .
- correction may be used as an average default value.
- dramatic correction may be made to emphasize the virtual sound source in front.
- The user's face image and hand image may be recognized in the captured image, and the position of the user's face (for example, the center of the face) and the position of the user's hand (for example, the tip of the little finger) estimated. The intensity of the dramatic correction can then be changed according to the distance (closeness) between the two, which makes it easy to respond to a listening gesture.
- the position of the user's hand image in the photographed image (for example, the position of the tip of the little finger)
- the representative point of the photographed image (for example, the center position of the photographed image, the center position of the face image, etc.)
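One way to turn the face-to-hand distance into a correction intensity is a clamped linear ramp. The sketch below assumes positions in normalized image coordinates and uses hypothetical `near` and `far` thresholds; neither the coordinate convention nor the thresholds come from the text.

```python
import math

def correction_intensity(face_xy, hand_xy, near=0.1, far=0.5):
    """Map the face-to-hand distance to an intensity in 0..1 (1.0 when the hand is close)."""
    d = math.dist(face_xy, hand_xy)
    t = (far - d) / (far - near)
    return min(1.0, max(0.0, t))
```

For example, a hand detected at the face position gives full intensity, and a hand half an image width away gives none.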
- the output of the virtual sound source has directivity linked to this.
- the external sound enters the user's ear as it is, such as a speaker
- The environmental sound and the virtual sound are mixed consistently according to the direction of the user's face and provided to the user, making it possible to provide audio augmented reality.
- FIG. 2 is a flow chart showing control of an information processing method executed by the information processing apparatus according to the embodiment of the present invention. Description will be made below with reference to this figure. It should be noted that each step of the following processing can be omitted as appropriate depending on the mode of application.
- The information processing device 101 detects a first orientation (or first position) of the information processing device 101 in the real world (first coordinate system) via a geomagnetic sensor, a gyro sensor, an acceleration sensor, etc. (step S202).
- Information processing device 101 then repeats the following process for each of the virtual sound sources (step S207).
- the information processing device 101 acquires the virtual direction of the virtual sound source in the first coordinate system (step S208).
- This virtual orientation may be determined in advance, or calculated based on the virtual position of the virtual sound source in the first coordinate system and the first position of the information processing device 101 (or the third position of the user's face).
- the amplification factor may be further corrected according to the distance (closeness) between the virtual position of the virtual sound source and the first position (or third position). That is, the smaller the distance, the larger the amplification factor, and the like.
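The calculation of a virtual direction from positions, and a distance-dependent gain, can be sketched as follows. The two-dimensional coordinates, the inverse-distance law, and the `reference` and `floor` parameters are assumptions for illustration; the text only states that a smaller distance may give a larger amplification factor.

```python
import math

def virtual_direction(listener_xy, source_xy):
    """Heading (radians) from the listener to the source in the world frame, and the distance."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    return math.atan2(dy, dx), math.hypot(dx, dy)

def distance_gain(distance, reference=10.0, floor=1.0):
    """Hypothetical inverse-distance law: gain 1.0 at `reference` meters, larger when closer."""
    return reference / max(distance, floor)
```

Here `listener_xy` stands for the first position of the device (or the third position of the face) and `source_xy` for the virtual position of the virtual sound source.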
- the information processing device 101 further corrects the new parameters for reproduction of all virtual sound sources based on their mutual relationships (step S211).
- This correction includes, for example, center range correction for emphasizing the virtual sound source on the front side compared to other virtual sound sources, and power correction for maintaining the force of the entire virtual sound source as it is.
- After steps S212 to S214, the process returns to step S202.
- the result of detection of the orientation of the user's face and the result of detection of the tip of the little finger are displayed in a window. Until the user gets accustomed to the operation, he or she can check and practice gestures while holding a position where the camera 151 captures the user's face by looking at the detection results.
- By tapping or sliding the on/off button to the left of the play button, the window can be closed as shown in Figures 5 and 6.
- the window can be displayed again by tapping or sliding the same on/off button again.
- musical instrument icons are arranged in a circle. This represents the orientation of the virtual sound source part placed in the virtual space.
- the musical instruments are arranged at equal intervals, but they do not necessarily have to be evenly spaced and circular, and can be arranged arbitrarily.
- At the center of the circle is the avatar of the operating user, and the direction of the white arrow indicates the direction of the user's face.
- the musical instrument icon at the tip of the white arrow corresponds to the virtual sound source positioned in front of the user.
- the white arrow is pointing in a default direction (for example, upward), and if the user changes the direction of the face or moves the position of the smartphone, the direction of the white arrow changes accordingly.
- Tapping on the avatar resets the direction of the white arrow and the placement (distance) of the instrument.
- Two sliders are lined up at the bottom of the screen 153 .
- the upper slider represents the distance to the musical instruments arranged in a circle in the virtual space, and the distance can be changed by moving the slider. In the arrangement shown in Figures 3 and 4 the distance is 20 meters, in Figures 9 and 10 it is 10 meters and in Figures 11 and 12 it is 30 meters. The distance from the avatar to the musical instrument shown on screen 153 also changes according to this distance.
- the lower slider is linked to the degree of focusing, that is, the angle of the sector.
- the degree of focusing can be changed by gestures, but it can also be adjusted by moving the slider directly.
- the master volume (the default value of the mixer gain) for each instrument.
- The information processing apparatus 101 first calculates the amplification factor used for mixing by multiplying the master volume by a multiplier corresponding to the angle difference, and then performs correction so that the overall power becomes substantially constant.
- the boost mode is set.
- In boost mode, when adjusting the amplification factors to keep the overall power constant, the instrument in front can be emphasized by doubling the strength of the virtual sound source in front.
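The combination of master volume, angle-based multiplier, boost, and power correction can be sketched as below. This is one possible reading, with assumptions stated plainly: power is taken as the sum of squared gains, the target power is that of the master volumes alone, and the boost doubling is applied before the normalization step. The function name `mix_gains` and its parameters are hypothetical.

```python
import math

def mix_gains(master, multipliers, front_index=None, boost=False):
    """Per-source gains rescaled so total power matches that of the master volumes."""
    gains = [m * k for m, k in zip(master, multipliers)]
    if boost and front_index is not None:
        gains[front_index] *= 2.0  # boost mode: double the front source (assumed pre-normalization)
    target_power = sum(m * m for m in master)
    power = sum(g * g for g in gains)
    if power == 0.0:
        return gains  # nothing audible; avoid dividing by zero
    scale = math.sqrt(target_power / power)
    return [g * scale for g in gains]

# Four sources with equal master volume; index 0 is in front of the user.
normal = mix_gains([1.0] * 4, [1.0, 0.5, 0.25, 0.5])
boosted = mix_gains([1.0] * 4, [1.0, 0.5, 0.25, 0.5], front_index=0, boost=True)
```

Because all gains share one scale factor, the ratio between the front source and the others is set by the multipliers (and the boost), while the total power stays the same in both modes.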
- Figures 17 and 18 are examples of output when the same functions as the above smartphone are implemented on a tablet.
- an augmented reality image is displayed overlaid with a video of a virtual person playing a virtual musical instrument in an uninhabited park captured by a rear camera.
- the present embodiment can also be provided for virtual reality instead of augmented reality.
- FIGS. 19 and 20 provide the user with a virtual reality as if players of virtual musical instruments were arranged in a circle on the stage of a virtual concert venue and the user was placed in the center.
- a virtual object is created by composing an image of playing a musical instrument.
- a performance sound of a musical instrument is associated with each virtual object as a virtual sound source, and the virtual sound source is mixed and output in the same manner as in the above embodiment.
- the user can have the experience of being the conductor of a virtual concert.
- Using a listening gesture, the user can select, from among the avatars of a plurality of performers, the avatar facing the user (i.e., the avatar positioned in front of the user) as the target of attention. That avatar is identified as the object of interest.
- the virtual object displayed in the center of the screen 153 becomes the target object.
- the center of the screen is displayed.
- Alternatively, the object of interest may be the virtual object displayed in the direction in which the face is directed, rather than the virtual object displayed in the center of the screen. That is, the sounding object whose angle difference between the virtual direction associated with its virtual sound source and the third direction is the smallest and is equal to or less than the threshold angle is specified as the object of interest.
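The selection rule (smallest angle difference, subject to a threshold) can be sketched as follows. The dictionary layout, the object names, and the 20 degree default threshold are assumptions for illustration.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two headings in radians (result in 0..pi)."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def pick_object_of_interest(face_yaw, objects, threshold=math.radians(20)):
    """objects maps a name to its virtual heading; returns the best match or None."""
    if not objects:
        return None
    name = min(objects, key=lambda n: angle_diff(face_yaw, objects[n]))
    return name if angle_diff(face_yaw, objects[name]) <= threshold else None

objects = {"piano": 0.0, "violin": math.radians(90), "drums": math.radians(180)}
```

Returning `None` when no object lies within the threshold means that facing an empty direction does not select anything.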
- the user may be able to change the position and orientation of the object of interest.
- When the screen 153 is configured as a touch screen and a tracing operation is performed on it, the object of interest may be moved along a locus of the same shape, obtained by translating the locus of the tracing operation.
- Since the target object has already been specified, it is not necessary to touch the target object itself displayed on the screen 153. The tracing operation can be performed on a part of the screen 153 other than the place where the target object is displayed, so the position of the target object can be changed without hiding it with the finger.
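Moving the object along a translated copy of the touch trace amounts to adding a fixed offset to every touch point. A minimal sketch in screen coordinates, with a hypothetical function name and sample points:

```python
def move_along_trace(start_xy, trace):
    """Return the object's path: the touch trace translated so that it starts at start_xy."""
    if not trace:
        return [start_xy]
    ox = start_xy[0] - trace[0][0]
    oy = start_xy[1] - trace[0][1]
    return [(x + ox, y + oy) for x, y in trace]

# The finger traces an L shape near the screen corner; the object at (100, 100)
# follows the same shape without being touched.
path = move_along_trace((100, 100), [(10, 10), (20, 10), (20, 30)])
```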
- the virtual video played on each virtual display together with the sound functioning as the virtual sound source corresponds to the virtual object.
- It is also possible for the user to view and compare more than 10 virtual moving images in order. That is, the virtual moving images can be exchanged on a virtual display arranged at a position invisible to the user in the virtual space.
- the user can view the virtual moving images in order by rotating his or her body in the real space while holding the information processing device 101 .
- the virtual display may be rotated around the user in the virtual space.
- When the user turns the face toward one of the virtual moving images displayed on the screen 153 and makes a gesture such as listening, or keeps the face turned toward it for the duration of the threshold time, that virtual moving image can be specified as the target object.
- FIGS. 25 and 26 show how the virtual video drawn in the center of the screen in FIGS. 23 and 24 is identified as the object of interest and enlarged in the center of the screen, with the video and audio of the target object being played.
- the user can cancel the identification as the object of interest by making a gesture of spreading out his/her hand and bringing it closer to the camera 151 of the information processing device 101, by tapping the touch screen for a short period of time, or the like.
- The virtual moving images surrounding the user in the virtual space may be rotated around the user so that the virtual moving image whose identification has been canceled is positioned where the user's head is facing. That is, (the virtual orientations of) the virtual objects placed in the virtual space are rotated around the virtual starting point in the virtual space so that the virtual orientation of the virtual object whose identification has been canceled matches the calculated third orientation.
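This realignment can be sketched as adding one common offset to every virtual heading. The sketch handles headings only (rotation about the vertical axis), and the names `wrap` and `realign` are hypothetical.

```python
import math

def wrap(angle):
    """Wrap an angle to the range [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def realign(objects, released, face_yaw):
    """Rotate every virtual heading by the same offset so `released` ends up at face_yaw."""
    offset = face_yaw - objects[released]
    return {name: wrap(yaw + offset) for name, yaw in objects.items()}

objs = {"a": 0.0, "b": math.radians(90)}
out = realign(objs, "b", math.radians(30))
```

Because every object receives the same offset, the relative arrangement of the virtual objects around the user is preserved.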
- the information processing apparatus 101 has a specifying unit 301 and a canceling unit 302 in addition to the configuration disclosed in FIG.
- the identifying unit 301 and the canceling unit 302 acquire various kinds of information from the detecting unit 111, the estimating unit 112, and the calculating unit 113, and control the output unit 114 accordingly.
- Each sounding object can be, for example, a virtual object in the above embodiment, which corresponds to an avatar of a performer playing a virtual musical instrument or a virtual display playing back a virtual moving image.
- Each sounding object is associated with a virtual sound source.
- the virtual sound source corresponds to the performance sound output by the virtual musical instrument or the sound reproduced together with the virtual moving image.
- the appearance of the virtual world displayed on the screen 153 changes accordingly.
- the screen 153 of the information processing device 101 functions as a "window" for looking into the virtual space.
- the cancellation unit 302 of the information processing device 101 determines whether or not the cancellation condition is satisfied, and performs processing accordingly.
- a specific condition is a condition for specifying one of a plurality of sounding objects as an attention object by the user
- a cancellation condition is a condition for canceling identification as an attention object.
- a gesture of listening closely, or continuing to face a specific sounding object for a predetermined period of time or longer is adopted as the specific condition.
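The condition of continuing to face one sounding object for a predetermined period can be sketched as a small dwell timer. The class name, the 2 second default, and the convention that the caller supplies timestamps are assumptions for illustration.

```python
class DwellDetector:
    """Reports True once the same object has been faced for at least `hold_seconds`."""

    def __init__(self, hold_seconds=2.0):
        self.hold_seconds = hold_seconds
        self.current = None  # object currently being faced, or None
        self.since = None    # timestamp at which `current` was first faced

    def update(self, facing, now):
        """facing: name of the faced object (or None); now: timestamp in seconds."""
        if facing != self.current:
            # The faced object changed, so the timer restarts.
            self.current, self.since = facing, now
            return False
        return facing is not None and now - self.since >= self.hold_seconds
```

Calling `update` once per processed frame, with the object currently nearest to the face direction, yields True when the specific condition is met.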
- As the cancellation condition, a gesture of bringing the hand closer to the camera 151, a tap on the touch screen that constitutes the screen 153, and the like are employed, but other conditions can also be employed.
- When the identifying unit 301 determines that the specific condition is satisfied, it identifies the sounding object having the smallest angular difference between the virtual direction associated with its virtual sound source and the calculated third direction as the object of interest selected by the user.
- the output unit 114 mixes the virtual sound source with an intensity corresponding to the angle difference between the virtual direction associated with the virtual sound source and the calculated third direction.
- The output unit 114 outputs the virtual sound source associated with the specified object of interest with priority over the other virtual sound sources.
- the performance sound of the performer's virtual musical instrument corresponding to the object of interest and the sound accompanying the virtual moving image are output with priority over other sounds.
- priority includes, for example, setting the amplification factor of the virtual sound source of the target object to a predetermined constant and setting the amplification factor of the other virtual sound sources to zero (mute) or a small value.
- the virtual moving image corresponding to the object of interest is displayed in a predetermined size in the center of the screen for highlighting.
- the cancellation unit 302 cancels the identification as the object of interest when the cancellation condition is satisfied.
- priority output of the virtual sound source and highlighting on the screen 153 are ended, and the output method described first is adopted.
- the position of the virtual object placed in the virtual space can also be rotated around the viewpoint in the virtual space. That is, the information processing apparatus 101 rotates the virtual orientation of the sounding object placed in the virtual space around the viewpoint position, based on a gesture recognized from the user's hand image included in the captured image or on a touch operation on the screen.
- the position and orientation of the avatar can be edited by pinching the performer's avatar, or by touching the screen 153 with a plurality of fingers and rotating the avatar. That is, while the object of interest is identified, the information processing apparatus 101 can change the position or orientation of the object of interest in the virtual space, based on a gesture recognized from the user's hand image included in the captured image or on a touch operation on the screen.
- the information processing apparatus has a camera and comprises: a detection unit that detects a first orientation of the information processing device in a first coordinate system fixed in the real world; an estimation unit that, if the captured image taken by the camera contains a face image of the user, estimates from the captured image and the face image a second orientation of the user's face in a second coordinate system fixed to the information processing device; a calculation unit that calculates a third orientation of the user's face in the first coordinate system from the detected first orientation and the estimated second orientation; and an output unit that outputs information corresponding to the calculated third orientation.
- the information processing device is wirelessly or wiredly connected to the audio equipment worn by the user,
- the output unit can be configured to output the information to the audio equipment.
- the audio device can be configured to be headphones, earphones, neck speakers, bone conduction speakers, or hearing aids capable of capturing ambient sounds.
- the output unit can output, as the information, a sound obtained by mixing the virtual sound source with an intensity corresponding to the angle difference between the virtual orientation associated with the virtual sound source and the calculated third orientation.
- the virtual orientation can be configured to be predetermined.
- the information processing device displays video information corresponding to the detected first position and first orientation on a screen whose display direction is the same as the shooting direction of the camera,
- the waveform of the virtual sound source may be corrected according to the size of the face image.
- if the captured image includes the face image of the user and a hand image of the user, the waveform of the virtual sound source can be corrected according to the distance between the user's face and the user's hand in the second coordinate system.
- the waveform of the virtual sound source can be corrected according to the distance between the representative point of the captured image and the hand image.
- the virtual sound source is associated with a sounding object placed in the virtual space;
- the information processing device displays, on a screen whose display direction is the same as the shooting direction of the camera, the state of the virtual space in which the sounding object is arranged, as observed from the viewpoint position and line-of-sight direction corresponding to the detected first position and first orientation.
- instead of outputting information according to the calculated third orientation, the output unit outputs the virtual sound source associated with the identified object of interest in preference to the other virtual sound sources, and the information processing device displays the identified object of interest on the screen with more emphasis than the other sounding objects;
- the identification as the target object can be canceled when the cancellation condition is satisfied.
- the sounding object is a video that is played back with audio
- the information processing device displays the object of interest at a predetermined position in the screen at a predetermined magnification;
- the output unit outputs a sound mixed by applying a predetermined amplification factor to the virtual sound source associated with the object of interest and muting the other virtual sound sources,
- the information processing device rotates the virtual orientations of the sounding objects placed in the virtual space around the viewpoint position so that the virtual orientation of the sounding object whose identification as the object of interest has been canceled matches the calculated third orientation.
- the information processing device can be configured to rotate the virtual orientation of a sounding object arranged in the virtual space around the viewpoint position, based on a gesture recognized from the hand image of the user included in the captured image or on a touch operation on the screen.
- the sounding object is an avatar that emits a sound
- the information processing device can be configured to change the position or orientation of the object of interest in the virtual space, based on a gesture recognized from the hand image of the user included in the captured image or on a touch operation on the screen.
- an information processing device having a camera: detects a first orientation of the information processing device in a first coordinate system fixed in the real world; if the captured image taken by the camera contains a face image of the user, estimates from the captured image and the face image a second orientation of the user's face in a second coordinate system fixed to the information processing device; calculates a third orientation of the user's face in the first coordinate system from the detected first orientation and the estimated second orientation; and outputs information according to the calculated third orientation.
- the program may be recorded on a non-transitory computer-readable information recording medium, distributed, and sold. It can also be distributed and sold through a transitory transmission medium such as a computer communication network.
- a non-transitory computer-readable information recording medium is configured to record the above program.
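
The orientation and mixing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation. It assumes that every orientation is reduced to a single yaw angle in degrees, that the second orientation is already expressed so that it adds to the first, and that the gain follows a raised-cosine law. The function names `wrap_deg`, `third_orientation`, `mix_gains`, `identify_attention`, and `priority_gains` are hypothetical, and the source does not specify the actual gain curve.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def third_orientation(first_yaw, second_yaw):
    """Face yaw in the world frame (assumed yaw-only composition).

    first_yaw:  device yaw in the real-world (first) coordinate system.
    second_yaw: face yaw in the device-fixed (second) coordinate system.
    """
    return wrap_deg(first_yaw + second_yaw)


def angle_diff(virtual_yaw, face_yaw):
    """Absolute angular difference in degrees, in the range [0, 180]."""
    return abs(wrap_deg(virtual_yaw - face_yaw))


def mix_gains(sources, face_yaw):
    """Gain per virtual sound source, decreasing with angular difference.

    The raised-cosine curve is an assumption made for illustration:
    gain is 1 when the user faces the source and 0 when facing away.
    """
    return {
        name: 0.5 * (1.0 + math.cos(math.radians(angle_diff(yaw, face_yaw))))
        for name, yaw in sources.items()
    }


def identify_attention(sources, face_yaw):
    """Pick the sounding object with the smallest angular difference."""
    return min(sources, key=lambda name: angle_diff(sources[name], face_yaw))


def priority_gains(sources, attention, focus_gain=1.0, other_gain=0.0):
    """Priority output: constant gain for the object of interest, mute the rest."""
    return {
        name: focus_gain if name == attention else other_gain
        for name in sources
    }


# Hypothetical scene: virtual yaw (degrees) of each sounding object.
sources = {"guitar": -30.0, "piano": 40.0, "video": 170.0}
face_yaw = third_orientation(first_yaw=20.0, second_yaw=15.0)
normal_mix = mix_gains(sources, face_yaw)
attention = identify_attention(sources, face_yaw)
focused_mix = priority_gains(sources, attention)
```

With these hypothetical values the user faces closest to the piano, so `normal_mix` weights it most heavily, and once the specific condition is met `focused_mix` mutes everything else. When the cancellation condition is satisfied, the device would return to `mix_gains`. A full implementation would work with 3D rotations rather than a single yaw angle.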
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- User Interface Of Digital Computer (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023516316A JP7611614B2 (ja) | 2021-04-20 | 2022-02-28 | Information processing device, information processing method, program, and information recording medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021-070745 | 2021-04-20 | | |
| JP2021070745 | 2021-04-20 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022224586A1 (ja) | 2022-10-27 |
Family
ID=83722776
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/008277 Ceased WO2022224586A1 (ja) | 2021-04-20 | 2022-02-28 | Information processing device, information processing method, program, and information recording medium |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7611614B2 |
| WO (1) | WO2022224586A1 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2008092193A (ja) * | 2006-09-29 | 2008-04-17 | Japan Science & Technology Agency | Sound source selection device |
| JP2017092732A (ja) * | 2015-11-11 | 2017-05-25 | 株式会社国際電気通信基礎技術研究所 | Hearing support system and hearing support device |
| WO2019026597A1 (ja) * | 2017-07-31 | 2019-02-07 | ソニー株式会社 | Information processing device, information processing method, and program |
| JP2019126033A (ja) * | 2018-01-18 | 2019-07-25 | 株式会社電通ライブ | Audio information providing system, audio information providing device, and program |
| US20190335288A1 (en) * | 2014-12-23 | 2019-10-31 | Ray Latypov | Method of Providing to User 3D Sound in Virtual Environment |
| WO2020184021A1 (ja) * | 2019-03-12 | 2020-09-17 | ソニー株式会社 | Information processing device, information processing method, and program |
- 2022
  - 2022-02-28 WO PCT/JP2022/008277 patent/WO2022224586A1/ja not_active Ceased
  - 2022-02-28 JP JP2023516316A patent/JP7611614B2/ja active Active
Also Published As
| Publication number | Publication date |
|---|---|
| JP7611614B2 (ja) | 2025-01-10 |
| JPWO2022224586A1 | 2022-10-27 |
Similar Documents
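| Publication | Publication Date | Title |
|---|---|---|
| CN108769562B (zh) | | Method and apparatus for generating special-effect video |
| CN109379643B (zh) | | Video synthesis method, apparatus, terminal, and storage medium |
| CN110688082B (zh) | | Method, apparatus, device, and storage medium for determining volume adjustment ratio information |
| JP6932206B2 (ja) | | Apparatus and associated methods for presentation of spatial audio |
| US9754621B2 (en) | | Appending information to an audio recording |
| EP3236346A1 (en) | | An apparatus and associated methods |
| CN109192218B (zh) | | Audio processing method and apparatus |
| AU2014200042B2 (en) | | Method and apparatus for controlling contents in electronic device |
| US12231866B2 (en) | | Apparatus and associated methods for capture of spatial audio |
| CN109346111B (zh) | | Data processing method, apparatus, terminal, and storage medium |
| JP2013250838A (ja) | | Information processing program, information processing device, information processing system, and information processing method |
| EP4113517A1 (en) | | Method and apparatus for processing videos |
| US20190335292A1 (en) | | An Apparatus and Associated Methods |
| JP2020520576A5 | | |
| CN111276122A (zh) | | Audio generation method and apparatus, and storage medium |
| CN111492342A (zh) | | Audio scene processing |
| CN110600034B (zh) | | Singing voice generation method, apparatus, device, and storage medium |
| CN113963707A (zh) | | Audio processing method, apparatus, device, and storage medium |
| CN107087208B (zh) | | Panoramic video playback method, system, and storage device |
| CN113766275A (zh) | | Video editing method, apparatus, terminal, and storage medium |
| JP5929535B2 (ja) | | Effect control device, effect control method, and program |
| JP7611614B2 (ja) | | Information processing device, information processing method, program, and information recording medium |
| CN110136752B (zh) | | Audio processing method, apparatus, terminal, and computer-readable storage medium |
| US10200606B2 (en) | | Image processing apparatus and control method of the same |
| US11647350B2 (en) | | Audio processing |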
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22791368; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023516316; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22791368; Country of ref document: EP; Kind code of ref document: A1 |