WO2023249073A1 - Information processing apparatus, display device, information processing method, and program - Google Patents


Info

Publication number
WO2023249073A1
Authority
WO
WIPO (PCT)
Prior art keywords
display device
sound source
display
information
sound
Legal status
Ceased
Application number
PCT/JP2023/023086
Inventor
晴輝 西村
愛実 田畑
Current Assignee
Pixie Dust Technologies Inc
Sumitomo Pharma Co Ltd
Original Assignee
Sumitomo Pharmaceuticals Co Ltd
Pixie Dust Technologies Inc
Sumitomo Pharma Co Ltd
Application filed by Sumitomo Pharmaceuticals Co Ltd, Pixie Dust Technologies Inc, and Sumitomo Pharma Co Ltd
Priority to JP2024529066A (JPWO2023249073A1)
Publication of WO2023249073A1
Ceased (current legal status)


Classifications

    • G PHYSICS
      • G02 OPTICS
        • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
          • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
            • G02B27/02 Viewing or reading apparatus
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
            • G06F3/16 Sound input; Sound output
      • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
        • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
          • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
            • G09G5/36 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
              • G09G5/38 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory with means for controlling the display position
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
              • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The present disclosure relates to an information processing apparatus, a display device, an information processing method, and a program.
  • The attitude (orientation or inclination) of an HMD can be measured using an IMU (Inertial Measurement Unit) that includes a gyro sensor, an acceleration sensor, a geomagnetic sensor, and the like.
  • However, IMU measurements are subject to drift, and the resulting errors accumulate over time.
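The sketch below illustrates why drift errors accumulate. It integrates a simulated gyro yaw rate that carries a small constant bias while the head is actually still. The bias value, the 100 Hz sample rate, and the helper name `integrate_yaw` are hypothetical assumptions for illustration. Real IMU error models also include noise and temperature-dependent terms.

```python
def integrate_yaw(rates_dps, dt_s):
    """Integrate yaw angular velocity (deg/s) into a yaw angle (deg) by the rectangle rule."""
    yaw = 0.0
    history = []
    for rate in rates_dps:
        yaw += rate * dt_s
        history.append(yaw)
    return history

# Hypothetical scenario: the true yaw rate is 0 deg/s (head held still),
# but the gyro reports a constant bias of 0.05 deg/s.
BIAS_DPS = 0.05
DT_S = 0.01            # 100 Hz sampling (assumed)
N = 60 * 100           # 60 seconds of samples

measured = [0.0 + BIAS_DPS] * N
yaw = integrate_yaw(measured, DT_S)
print(f"apparent yaw after 10 s: {yaw[10 * 100 - 1]:.2f} deg")
print(f"apparent yaw after 60 s: {yaw[-1]:.2f} deg")
```

In this example the apparent heading grows linearly with time: about 0.5 degrees after 10 s and 3 degrees after 60 s, even though the head never moved.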
  • Patent Document 1 discloses a technique for correcting drift of an IMU sensor.
  • In that technique, a calibration offset is generated by comparing the 3D physical position calculated by the HMD itself with position data generated by another HMD using an optical sensor and an IMU.
  • The technique therefore requires an external device equipped with another sensor, that is, another HMD equipped with an optical sensor and an IMU sensor.
  • An object of the present disclosure is to provide a technique for suppressing the adverse effects caused by measurement errors in the posture of a display device worn by a user.
  • An information processing apparatus includes: data acquisition means for acquiring, from a sensor included in a display device that can be worn on a user's head, sensor data indicating a change in the posture of the display device; display control means for displaying, based on the acquired sensor data and information indicating the direction of a sound source, information regarding a sound emitted from the sound source at a display position within the display of the display device that corresponds to the direction of the sound source relative to the display device; and correction means for correcting a deviation in the display position of the information displayed by the display control means when a predetermined condition regarding the sensor data acquired by the data acquisition means is satisfied.
  • FIG. 1 is a diagram showing an example of the configuration of a display device according to the present embodiment.
  • FIG. 2 is a diagram schematically showing a glass-type display device, which is an example of the display device shown in FIG. 1.
  • FIG. 3 is a diagram illustrating an example of a change in a user's orientation over time when there is no drift of an IMU sensor.
  • FIG. 4 is a diagram showing an example of a screen displayed on a display when there is no drift of an IMU sensor.
  • FIG. 5 is a diagram illustrating an example of changes over time in the user's orientation and the reference direction of the IMU sensor when there is a drift of the IMU sensor.
  • FIG. 6 is a diagram showing an example of a screen displayed on a display when there is a drift of an IMU sensor.
  • FIG. 7 is a diagram illustrating an example of temporal changes in the orientation of the user and the reference direction of the IMU sensor when the drift of the IMU sensor is corrected at time tx.
  • FIG. 8 is a diagram showing an example of a screen displayed on a display when the drift of the IMU sensor is corrected at time tx.
  • FIG. 9 is a diagram showing the data structure of a sound source database according to the present embodiment.
  • FIG. 10 is a flowchart of the audio processing of the present embodiment.
  • FIG. 11 is a diagram for explaining sound collection by a microphone.
  • A flowchart illustrating a first example of updating the reference direction in the audio processing of the present embodiment.
  • A diagram showing an example of a display on the display device.
  • A diagram for explaining how the user sees the image.
  • A diagram illustrating a configuration example of an information processing system according to Modification 1.
  • A diagram illustrating the appearance of a multi-microphone device according to Modification 1.
  • A diagram showing an example of a screen displayed on a display when there is a drift of an IMU sensor.
  • A flowchart of the audio processing in Modification 1.
  • A coordinate system based on the position and orientation of the microphone set described later (the microphone coordinate system) may be used.
  • The microphone coordinate system has its origin at the position of the microphone set (for example, the center of gravity of the display device or multi-microphone device that includes the microphone set), and its x-axis and y-axis are perpendicular to each other at the origin.
  • The x+ direction is defined as the front of the microphone set.
  • The x− direction is defined as the rear of the microphone set.
  • The y+ direction is defined as the left of the microphone set.
  • The y− direction is defined as the right of the microphone set.
  • A direction in a specific coordinate system means the direction relative to the origin of that coordinate system. If the microphone set is provided in the display device, the microphone coordinate system depends on the coordinate system of the display device. If instead the microphone set is separate from the display device (e.g., provided in a multi-microphone device), the microphone coordinate system is independent of the coordinate system of the display device.
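With the axis convention above (x+ front, y+ left), an azimuth measured from the x+ axis turns toward the left as it increases. The sketch below shows that convention. It assumes angles in degrees, measured counterclockwise from x+ when viewed from above. The text implies this but does not state it. The helper names are hypothetical.

```python
import math

def azimuth_to_vector(azimuth_deg):
    """Unit vector (x, y) in the microphone coordinate system.

    x+ is the front of the microphone set and y+ is its left, so a
    positive azimuth turns from the front toward the left.
    """
    rad = math.radians(azimuth_deg)
    return (math.cos(rad), math.sin(rad))

def describe(azimuth_deg):
    """Coarse label for a direction: 'front', 'rear', 'left', or 'right'."""
    x, y = azimuth_to_vector(azimuth_deg)
    if abs(x) >= abs(y):
        return "front" if x > 0 else "rear"
    return "left" if y > 0 else "right"

print(describe(0), describe(90), describe(-90), describe(180))
```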
  • FIG. 1 is a diagram showing an example of the configuration of a display device according to this embodiment.
  • FIG. 2 is a diagram schematically showing a glass-type display device, which is an example of the display device shown in FIG. 1.
  • When the display device 1 is configured to be mountable on the user's head, the display device 1 may be a glass-type display device, a head-mounted display, a wearable device, or smart glasses.
  • The display device 1 may be an optical see-through glass-type display device, but its format is not limited thereto.
  • The display device 1 may instead be a video see-through glass-type display device. That is, the display device 1 may include a camera.
  • In that case, the display device 1 may display, on the display 102 described later, a composite image obtained by combining the text image generated based on voice recognition with the image captured by the camera.
  • The captured image is an image captured in the user's front direction and may include an image of the speaker.
  • The display device 1 may also be, for example, a smartphone, a personal computer, or a tablet terminal that performs AR (Augmented Reality) display by combining a text image generated based on voice recognition with an image captured by a camera.
  • The display device 1 includes a controller 10, a plurality of microphones 101, a display 102, and an IMU sensor 103. That is, the plurality of microphones 101, the display 102, and the IMU sensor 103 are configured as one unit. In the following description, the plurality of microphones 101 may be referred to as a "microphone set."
  • The controller 10 is an information processing device that controls the display device 1.
  • The controller 10 is connected to the microphones 101, the display 102, and the IMU sensor 103 by wire or wirelessly.
  • The controller 10 includes a storage device 11, a processor 12, an input/output interface 13, and a communication interface 14.
  • The data stored in the storage device 11 includes, for example, databases referenced in information processing and data obtained by executing information processing (that is, the execution results of information processing).
  • The processor 12 is configured to implement the functions of the controller 10 by running a program stored in the storage device 11.
  • The processor 12 is an example of a computer. For example, by running a program stored in the storage device 11, the processor 12 implements a function of presenting an image representing text that corresponds to the speech sound collected by the microphones 101 (hereinafter referred to as a "text image"). The image is presented at a predetermined position on the display 102.
  • The display device 1 may include dedicated hardware such as an ASIC or FPGA, and at least part of the processing of the processor 12 described in this embodiment may be executed by that dedicated hardware.
  • The input/output interface 13 acquires at least one of the following: the audio signals collected by the microphones 101; user instructions input from an input device connected to the controller 10; and sensor data acquired from the IMU sensor 103 (measurement results of the IMU sensor 103).
  • The input device is, for example, the microphones 101, the IMU sensor 103, a drive button, a keyboard, a pointing device, a touch panel, a remote controller, a switch, or a combination thereof.
  • The input/output interface 13 is configured to output information to an output device connected to the controller 10.
  • The output device is, for example, the display 102.
  • The microphones 101 collect, for example, sounds around the display device 1.
  • The sounds collected by the microphones 101 include, for example, at least one of the following: sounds spoken by a person, and sounds of the environment in which the display device 1 is used (hereinafter referred to as "environmental sounds").
  • The microphones 101 are arranged so as to maintain a predetermined positional relationship with one another.
  • The display 102 presents (e.g., displays) an image under the control of the controller 10.
  • The display 102 may be implemented in any manner as long as it can present an image to the user.
  • The display 102 can be realized, for example, by any of the following:
    • an HOE (holographic optical element) or DOE (diffractive optical element) using an optical element (for example, a light guide plate);
    • a liquid crystal display;
    • a retinal projection display;
    • an LED (light emitting diode) display;
    • an organic EL (electroluminescence) display;
    • a laser display;
    • a display that uses optical elements (for example, lenses, mirrors, diffraction gratings, liquid crystals, MEMS mirrors, or HOEs) to guide light emitted from a light emitter.
    In particular, a retinal projection display lets even people with amblyopia observe images easily. This makes it easier for a person with both hearing loss and amblyopia to recognize the direction of arrival of speech.
  • The IMU sensor 103 outputs sensor data indicating a change in the attitude (orientation or inclination) of the display device 1. For example, the IMU sensor 103 measures the three-dimensional inertial motion of the display device 1 and transmits sensor data indicating the measurement results to the controller 10. The controller 10 estimates the attitude of the display device 1 based on the sensor data received from the IMU sensor 103.
  • For example, the IMU sensor 103 includes an acceleration sensor and a gyro sensor, and measures accelerations along three orthogonal axes and angular velocities around those three axes.
  • However, the configuration of the IMU sensor 103 is not limited to this.
  • The IMU sensor 103 may further include a 3-axis geomagnetic sensor, or may include a gyro sensor without an acceleration sensor.
  • The microphone set includes microphones 101-1 to 101-5.
  • The microphone 101-1 is placed on the right temple 21.
  • The microphone 101-2 is placed on the right endpiece 22.
  • The microphone 101-4 is placed on the left endpiece 24.
  • The microphone 101-5 is placed on the left temple 25.
  • The number and arrangement of the microphones 101 included in the microphone set of the display device 1 are not limited to the example in FIG. 2.
  • The controller 10 is placed, for example, inside the right temple 21.
  • The arrangement of the controller 10 is not limited to the example shown in FIG. 2; for example, the controller 10 may be configured separately from the display device 1.
  • FIG. 3 is a diagram illustrating an example of a change in the user's orientation over time when there is no drift of the IMU sensor.
  • FIG. 4 is a diagram showing an example of a screen displayed on the display when there is no drift of the IMU sensor.
  • FIG. 5 is a diagram illustrating an example of changes over time in the user's orientation and the reference direction of the IMU sensor when there is a drift of the IMU sensor.
  • FIG. 6 is a diagram showing an example of a screen displayed on the display when there is a drift of the IMU sensor.
  • FIG. 7 is a diagram illustrating an example of temporal changes in the orientation of the user and the reference direction of the IMU sensor when the drift of the IMU sensor is corrected at time tx.
  • FIG. 8 is a diagram showing an example of a screen displayed on the display when the drift of the IMU sensor is corrected at time tx.
  • The user US10 moves his head between times t0 and t2, and the microphones 101 mounted on the display device 1 move with it.
  • The direction of the sound source is estimated relative to the microphones 101. Therefore, even if the sound source is completely stationary, the estimated direction of the speaker SP11 varies with the movement of the head of the user US10.
  • The controller 10 sets the reference direction RD12 to the front direction of the user US10 at that time (i.e., the front direction of the display device 1).
  • Using the sensor data from the IMU sensor 103, the controller 10 can calculate how much the local coordinate systems of the display device 1 and the microphones 101 are rotated relative to the reference coordinate system at time ti. These local coordinate systems are the microphone coordinate system and the coordinate system based on the position and orientation of the display device 1 (hereinafter, the "device coordinate system").
  • The controller 10 can convert the direction (angle) of the sound source in the microphone coordinate system at time ti into the direction (angle) of the sound source in the reference coordinate system. The conversion is based on the correspondence between the reference direction RD12 and the reference coordinate system, and on the posture UO13(ti) of the user US10 at time ti. The controller 10 can thereby derive the direction of the sound source in the reference coordinate system regardless of the orientation of the head of the user US10.
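In the horizontal plane, this conversion can be sketched as adding the device's estimated yaw to the source azimuth measured in the microphone coordinate system. The yaw is measured from the reference direction. The sketch makes several assumptions that the patent does not state:

- the microphone set is integrated with the display device, so the microphone and device coordinate systems coincide;
- angles are in degrees, positive toward the left (y+);
- the function names are hypothetical.

The patent also gives no explicit formula for this conversion.

```python
def wrap_deg(angle):
    """Wrap an angle to the half-open interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def mic_to_reference(theta_mic_deg, device_yaw_deg):
    """Source azimuth in the microphone frame -> azimuth in the reference frame.

    device_yaw_deg is the display device's estimated yaw relative to the
    reference direction (positive = turned toward the left, i.e. +y).
    """
    return wrap_deg(theta_mic_deg + device_yaw_deg)

def reference_to_mic(theta_ref_deg, device_yaw_deg):
    """Inverse conversion: where a reference-frame source appears to the device."""
    return wrap_deg(theta_ref_deg - device_yaw_deg)

# A speaker fixed at 0 deg in the reference frame while the user turns the head:
# the microphone-frame angle changes, but the reference-frame angle stays at 0.
for yaw in (0.0, 20.0, -35.0):
    seen = reference_to_mic(0.0, yaw)
    print(yaw, seen, mic_to_reference(seen, yaw))
```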
  • The controller 10 determines that the sound source directions corresponding to the audio signals received by the microphones 101 from time t0 to time t2 are the same in the reference coordinate system.
  • The controller 10 therefore identifies "Hello", "I'm", and "Taro", the contents of the audio signals received by the microphones 101 from time t0 to t2, as the utterances of a specific sound source (speaker SP11).
  • For each time ti, the controller 10 generates an image containing an icon IC15 representing the identified sound source (speaker SP11) and a text image TI16 representing the content of the sound (utterance) emitted from that sound source. Both are placed at positions corresponding to the estimated direction of the sound source at time ti and the posture UO13(ti) of the user US10. The controller 10 sequentially displays the generated images on the display 102. As a result, information regarding the sound emitted from the sound source is displayed within the display 102 of the display device 1 at a display position corresponding to the direction of the sound source relative to the display device 1.
  • By looking at such images, the user US10 can easily understand what was said by the speaker in which direction from the user's point of view (in other words, what kind of sound was emitted by the sound source located in which direction). Displaying the icon IC15 in the image generated by the controller 10 is not essential; the controller 10 may generate an image that includes the text image TI16 but not the icon IC15. The same applies to the subsequent examples.
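One simple way to turn the source direction into a horizontal display position is a linear angle-to-pixel mapping, sketched below. It assumes a display with a known horizontal field of view. The 1280-pixel width, the 40-degree field of view, and the function name `display_x` are hypothetical; the patent does not specify a projection. Sources outside the field of view are clamped to the nearest edge and flagged.

```python
def display_x(source_ref_deg, device_yaw_deg, width_px=1280, hfov_deg=40.0):
    """Horizontal pixel position for information about a sound source.

    source_ref_deg: source direction in the reference frame (left positive).
    device_yaw_deg: estimated device yaw relative to the reference direction.
    Returns (x_px, in_view); x grows to the right, so a source on the left
    of the wearer is drawn near x = 0.
    """
    # Direction of the source relative to the display device, wrapped to [-180, 180).
    rel = (source_ref_deg - device_yaw_deg + 180.0) % 360.0 - 180.0
    half = hfov_deg / 2.0
    clamped = max(-half, min(half, rel))
    x = (half - clamped) / hfov_deg * (width_px - 1)
    return round(x), abs(rel) <= half

# A speaker straight ahead is drawn near the center; after the wearer turns
# 10 degrees to the left, the same speaker is drawn to the right of center.
print(display_x(0.0, 0.0), display_x(0.0, 10.0))
```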
  • The controller 10 estimates the orientation UO13(t1) of the user US10 based on the reference direction RD12(1) rather than the reference direction RD12(0). The estimated orientation UO13(t1) therefore includes a drift error, caused by the drift of the IMU sensor 103, that corresponds to the difference between the reference direction RD12(1) and the reference direction RD12(0). Consequently, the sound source direction of the speaker SP11 in the reference coordinate system at time t1, derived by the coordinate system transformation, also includes an error.
  • The reference direction RD12(2) at time t2 deviates even further from the reference direction RD12(0).
  • The controller 10 estimates the orientation UO13(t2) of the user US10 based on the reference direction RD12(2) rather than the reference direction RD12(0). The estimated orientation UO13(t2) therefore includes a drift error corresponding to the difference between the reference direction RD12(2) and the reference direction RD12(0). Consequently, the sound source direction of the speaker SP11 in the reference coordinate system at time t2, derived by the coordinate system transformation, also includes an error.
  • The controller 10 determines whether the sound sources that emitted the sounds corresponding to the audio signals are the same based on their sound source directions in the reference coordinate system. The controller 10 may therefore determine that the sound sources for the audio signals received by the microphones 101 from time t0 to t1 are the same. However, it may also determine that the sound source for the audio signal received at time t2 is a different sound source.
  • In that case, the controller 10 generates an image in which information on this second sound source is placed at a position according to the estimated direction of the second sound source and the posture UO13(t2) of the user US10. Such an image may make the user think that a new speaker appeared at time t2. In reality, however, only one speaker SP11 exists, so viewing the image may confuse the user or make the user uncomfortable.
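The sketch below reproduces this failure mode numerically. It assumes a hypothetical same-source tolerance of 10 degrees and a drift that grows by a fixed 6 degrees per time step. Neither value comes from the patent, which does not state how closely two directions must agree to be treated as one source.

```python
TOLERANCE_DEG = 10.0       # hypothetical same-source tolerance
DRIFT_PER_STEP_DEG = 6.0   # hypothetical drift accumulated per time step

def angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

TRUE_DIRECTION_DEG = 0.0   # speaker SP11 never moves in the reference frame
labels = []
anchor = None              # direction registered for the first detected source
for step in range(3):      # t0, t1, t2
    # A drifted heading estimate shifts the converted direction by the drift amount.
    estimate = TRUE_DIRECTION_DEG + DRIFT_PER_STEP_DEG * step
    if anchor is None:
        anchor = estimate
    labels.append("SP11" if angle_diff(estimate, anchor) <= TOLERANCE_DEG else "new source")
print(labels)
```

The error at t1 (6 degrees) still falls within the tolerance, but the error at t2 (12 degrees) does not. As a result, the single stationary speaker is split into two sources.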
  • To address this, the controller 10 of this embodiment updates the reference direction (that is, updates the correspondence between the reference direction and the reference coordinate system) when a predetermined update condition regarding the sensor data acquired from the IMU sensor 103 is satisfied.
  • This corrects the drift error in the estimated posture of the display device 1. Accordingly, the deviation in the display position of information regarding the sound emitted from the sound source is corrected, and the same sound source is less likely to be misidentified as different sound sources.
  • As in FIGS. 3 and 5, it is assumed that the user US10 wearing the display device 1 faces the speaker SP11 from time t0 to t2.
  • The controller 10 determines that the update condition is satisfied when it detects, based on the sensor data acquired from the IMU sensor 103, that the user has performed a predetermined gesture (for example, a nodding gesture or a head-tilting gesture). In response to this determination, the controller 10 updates (resets) the reference direction.
  • The controller 10 updates the reference direction RD12a(x) at time tx to the reference direction RD12b(0), which corresponds to the front direction of the user US10 (that is, the front direction of the display device 1) at time tx.
  • The reference direction RD12b(0) matches the reference direction RD12a(0). Due to the drift of the IMU sensor 103, the reference direction RD12b(1) at time t2 deviates from the reference direction RD12b(0).
  • However, the error between the reference direction RD12b(1) and the reference direction RD12b(0) is smaller than the error at time t2 without a reset (the error between the reference direction RD12(2) and the reference direction RD12(0) in FIG. 5).
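A simple hypothetical model shows why the reset helps. Suppose drift grows linearly with time, and the reset is made while the user faces the speaker. The reset then removes the error accumulated so far, and the residual error at t2 is only what accumulates between tx and t2. The drift rate, the choice of tx = 1.5 between t1 = 1 and t2 = 2, and the function name are illustrative assumptions, not values from the patent.

```python
DRIFT_DEG_PER_UNIT = 6.0   # hypothetical drift rate (deg per time unit)

def drift_error(t, reset_time=None):
    """Heading error (deg) accumulated since the reference direction was last set.

    The reference direction is set at t = 0 and, optionally, reset at reset_time.
    Assumes the reset is made while the user faces the speaker, so it removes
    all error accumulated before it.
    """
    since = 0.0
    if reset_time is not None and t >= reset_time:
        since = reset_time
    return DRIFT_DEG_PER_UNIT * (t - since)

# t0 = 0, t1 = 1, tx = 1.5, t2 = 2 (illustrative times)
for t in (0.0, 1.0, 1.5, 2.0):
    print(f"t={t}: no reset {drift_error(t):4.1f} deg, "
          f"reset at tx {drift_error(t, reset_time=1.5):4.1f} deg")
```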
  • The controller 10 therefore determines that the sound sources corresponding to the audio signals received by the microphones 101 from time t0 to t2 are the same. As a result, the contents of those audio signals, "Hello", "I'm", and "Taro", are all identified as the utterances of a specific sound source (speaker SP11).
  • For each time ti, the controller 10 generates an image containing the icon IC15 representing the identified sound source (speaker SP11) and the text image TI16 representing the content of the sound (utterance) emitted from that sound source. Both are placed at positions according to the estimated direction of the sound source at time ti and the orientation UO13(ti) of the user US10. The controller 10 sequentially displays the generated images on the display 102. By looking at such images, the user US10 can easily understand what was said by the speaker in which direction from the user's point of view (in other words, what kind of sound was emitted by the sound source located in which direction).
  • The sound source database stores sound source information.
  • The sound source information is information regarding a sound source (typically, a speaker) around the microphones 101 that has been identified by the controller 10.
  • The sound source database includes an "ID" field, a "name" field, an "icon" field, and a "direction" field, which are associated with one another.
  • The "ID" field stores a sound source ID.
  • The sound source ID is information that identifies a sound source.
  • When the controller 10 detects a new sound source, it issues a new sound source ID and assigns it to that sound source.
  • The "name" field stores sound source name information.
  • The sound source name information is information regarding the name of the sound source.
  • The controller 10 may determine the sound source name information automatically or may set it according to a user instruction.
  • The controller 10 may assign an initial sound source name to a newly detected sound source according to a predetermined rule or randomly.
  • The icon information is information regarding the icon of the sound source.
  • The icon information may include the icon image itself (e.g., one of a set of preset icon images, or a photo or drawing provided by the user), or information that identifies the format of the icon (e.g., color, texture, optical effects, or shape).
  • The controller 10 may determine the icon information automatically or may set it according to a user instruction.
  • The controller 10 may assign an initial icon to a newly detected sound source according to a predetermined rule or randomly. However, if sound source icons are not displayed in the image presented to the user, the icon information can be omitted from the sound source information.
  • The "direction" field stores sound source direction information.
  • The sound source direction information is information regarding the direction of the sound source relative to the microphones 101.
  • The direction of the sound source is expressed as an angle measured from an axis whose 0-degree direction is a predetermined direction in the reference coordinate system.
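The sketch below is a minimal in-memory model of the sound source database, with one record per sound source holding the ID, name, icon, and direction fields. This excerpt does not specify how a newly estimated direction is matched to an existing record. The following are therefore all hypothetical:

- the nearest-match rule with a 10-degree tolerance;
- the numbered default names and icon keys, which stand in for the "predetermined rule";
- the class and method names.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class SoundSource:
    """One record of the sound source database."""
    source_id: int        # "ID" field
    name: str             # "name" field
    icon: str             # "icon" field (here just an icon key)
    direction_deg: float  # "direction" field: angle from the 0-degree axis of the reference frame

def angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)

@dataclass
class SoundSourceDB:
    tolerance_deg: float = 10.0   # hypothetical same-source tolerance
    sources: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), init=False, repr=False)

    def find_or_register(self, direction_deg):
        """Return the closest known source within tolerance, or register a new one."""
        candidates = [s for s in self.sources
                      if angle_diff(direction_deg, s.direction_deg) <= self.tolerance_deg]
        if candidates:
            return min(candidates, key=lambda s: angle_diff(direction_deg, s.direction_deg))
        new_id = next(self._ids)   # issue a new sound source ID
        source = SoundSource(new_id, f"Speaker {new_id}", f"icon-{new_id}", direction_deg)
        self.sources.append(source)
        return source

db = SoundSourceDB()
a = db.find_or_register(2.0)
b = db.find_or_register(-3.0)   # within 10 degrees of the first source
c = db.find_or_register(45.0)   # far away, so a new source is registered
print(a.source_id, b.source_id, c.source_id)
```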
  • The audio processing shown in FIG. 10 starts after the display device 1 is powered on and its initial settings are completed.
  • However, the start timing of the processing shown in FIG. 10 is not limited to this.
  • The processing shown in FIG. 10 may be executed repeatedly, for example at a predetermined period, so that the user of the display device 1 can view images updated in real time.
  • The controller 10 acquires audio signals via the microphones 101 (S110). Specifically, the microphones 101-1 to 101-5 included in the microphone set each collect the speech sound emitted by the speaker. The microphones 101-1 to 101-5 collect speech sounds that arrive via the plurality of paths shown in FIG. 11 and convert the collected speech sounds into audio signals.
  • After step S110, the controller 10 performs direction-of-arrival estimation (S111).
  • The storage device 11 stores a direction-of-arrival estimation model.
  • The direction-of-arrival estimation model describes information for specifying the correlation between the spatial information included in the audio signals and the direction of arrival of the speech sound.
  • The microphone set, which is integrated with the display device 1, estimates the direction of arrival of the speech sound emitted by the speaker PR3 to be a direction shifted by an angle A2 to the left of the x-axis.
  • The microphone set estimates the direction of arrival of the speech sound emitted by the speaker PR4 to be a direction shifted by an angle A3 to the left of the x-axis.
  • The microphone set estimates the direction of arrival of the speech sound emitted by the speaker PR5 to be a direction shifted by an angle A1 to the right of the x-axis.
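The patent estimates the direction of arrival with a stored estimation model over five microphones. The sketch below uses a much simpler, classical technique instead: time-difference-of-arrival estimation between one pair of microphones by cross-correlation. It is a stand-in for illustration, not the patent's model. It assumes:

- a far-field source;
- a left microphone on y+ and a right microphone on y−, spaced d apart;
- a speed of sound of 343 m/s;
- hypothetical function names.

The returned angle follows the microphone coordinate system convention, positive toward the left (y+). Only physically possible lags (at most d / c seconds) are searched.

```python
import math
import random

SPEED_OF_SOUND_M_S = 343.0   # approx. speed of sound in air at 20 degrees C

def best_lag(left, right, max_lag):
    """Lag (samples) maximizing sum(right[n + lag] * left[n]); positive = right lags left."""
    best, best_score = 0, float("-inf")
    n = len(left)                     # assumes len(right) == len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            score += right[i + lag] * left[i]
        if score > best_score:
            best, best_score = lag, score
    return best

def estimate_azimuth(left, right, mic_distance_m, fs_hz):
    """Far-field azimuth in degrees, positive toward the left microphone (y+)."""
    max_lag = math.ceil(mic_distance_m / SPEED_OF_SOUND_M_S * fs_hz)
    lag = best_lag(left, right, max_lag)
    s = SPEED_OF_SOUND_M_S * (lag / fs_hz) / mic_distance_m
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))

# Synthetic check: white noise that reaches the left microphone 3 samples earlier.
random.seed(0)
fs, d, delay = 16000, 0.15, 3
src = [random.gauss(0.0, 1.0) for _ in range(2000)]
left = src
right = [0.0] * delay + src[:-delay]
print(round(estimate_azimuth(left, right, d, fs), 1))
```

A learned model such as the patent's can exploit all five microphones and reflections around the head. The pairwise method above resolves only the left/right component of the direction.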
  • The controller 10 may determine that a nodding gesture has occurred when a pitch angle index based on the sensor data acquired from the IMU sensor 103 is equal to or greater than a pitch threshold.
  • The pitch angle index may be, for example, one of the following:
    • the absolute value of the pitch angle of the estimated orientation of the display device 1 at a single point in time;
    • a statistic of the pitch angles at multiple consecutive points in time (e.g., the mean, median, maximum, minimum, mode, variance, or standard deviation).
  • The controller 10 may determine that a head-tilting gesture has occurred when a roll angle index based on the sensor data acquired from the IMU sensor 103 is equal to or greater than a roll threshold.
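The threshold tests above can be sketched over a short window of estimated pitch and roll angles. The following choices are hypothetical:

- the 15-degree thresholds;
- the window contents;
- checking for a nod before a tilt;
- the helper names.

The sketch offers three of the index options the text lists: the absolute value at the latest point, the maximum absolute value over the window, and the standard deviation.

```python
import statistics

PITCH_THRESHOLD_DEG = 15.0   # hypothetical pitch threshold
ROLL_THRESHOLD_DEG = 15.0    # hypothetical roll threshold

def angle_index(history_deg, method="latest_abs"):
    """Compute an angle index from a window of pitch or roll estimates (deg)."""
    if method == "latest_abs":
        return abs(history_deg[-1])
    if method == "max_abs":
        return max(abs(a) for a in history_deg)
    if method == "stdev":
        return statistics.stdev(history_deg)   # needs at least two samples
    raise ValueError(f"unknown method: {method}")

def detect_gesture(pitch_history_deg, roll_history_deg, method="max_abs"):
    """Return 'nod', 'tilt', or None by comparing the indices with the thresholds."""
    if angle_index(pitch_history_deg, method) >= PITCH_THRESHOLD_DEG:
        return "nod"
    if angle_index(roll_history_deg, method) >= ROLL_THRESHOLD_DEG:
        return "tilt"
    return None

pitch = [0.0, -8.0, -20.0, -9.0, 0.0]   # head dips down and comes back
roll = [0.0, 1.0, 0.5, -0.5, 0.0]
print(detect_gesture(pitch, roll))
```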
  • the controller 10 sets the condition for updating the reference direction to be that the user has performed a shaking motion in a specific direction, such as a nodding gesture or a tilting gesture.
  • a specific direction such as a nodding gesture or a tilting gesture.
  • the reference direction update conditions are not limited to this.
  • the controller 10 may determine that the user has pressed a predetermined switch included in the display device 1 as a condition for updating the reference direction. In this case, when the user notices that an error has occurred in posture estimation, he or she can face forward (with his or her face directly facing the other party) and press a predetermined switch to move toward the reference direction. can be reset to correct errors.
  • the controller 10 sets one of the following as the new (updated) reference direction (for example, the reference direction in which the azimuth angle indicating the attitude of the display device 1 is 0 degrees): the front direction of the display device 1; a weighted average of the current (pre-update) reference direction and the front direction of the display device 1; or a value obtained by correcting the current (pre-update) reference direction so that it moves closer to the front direction of the display device 1.
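Each of the three candidates for the new reference direction can be expressed as moving the current reference azimuth toward the front azimuth by a weight between 0 and 1. The sketch below assumes azimuths in degrees. It blends along the shorter arc, so that, for example, 350° and 10° average to 0° rather than 180°. The function names and the weighting scheme are illustrative assumptions.

```python
def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def blend_azimuth(current_ref_deg: float, front_deg: float, weight: float) -> float:
    """Move the reference azimuth toward the front azimuth along the shorter arc.

    weight = 1.0      -> snap to the front direction (first option)
    0 < weight < 1    -> weighted average / partial correction (other options)
    weight = 0.0      -> keep the current reference direction
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    delta = wrap_deg(front_deg - current_ref_deg)  # signed shortest difference
    return wrap_deg(current_ref_deg + weight * delta)


print(blend_azimuth(350.0, 10.0, 0.5))  # 0.0, not 180.0
print(blend_azimuth(30.0, 0.0, 1.0))    # 0.0
```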
  • after step S202, the controller 10 ends the process of FIG. 14. If the predetermined gesture has not occurred in step S201, the controller 10 skips updating the reference direction (S202) and ends the process of FIG. 14.
  • the controller 10 resets the reference direction such that the larger the pitch angle index is, the closer the updated reference direction is to the front direction of the display device 1 compared with the pre-update reference direction.
  • the controller 10 resets the reference direction such that the larger the roll angle index described above is, the closer the updated reference direction is to the front direction of the display device 1 compared with the pre-update reference direction.
  • the controller 10 resets the reference direction so that the updated reference direction matches the front direction of the display device 1 when the pitch angle index described above exceeds the first pitch threshold.
  • when the pitch angle index is between the first pitch threshold and the second pitch threshold, the controller 10 resets the reference direction so that the updated reference direction lies between the pre-update reference direction and the front direction of the display device 1.
  • the second pitch threshold is smaller than the first pitch threshold.
  • the controller 10 does not reset the reference direction when the pitch angle index is less than the second pitch threshold.
  • the first pitch threshold and the second pitch threshold can be determined using the same technique as the pitch threshold in the first example of updating the reference direction (S1131).
  • the controller 10 resets the reference direction so that the updated reference direction matches the front direction of the display device 1 when the roll angle index described above exceeds the first roll threshold.
  • when the roll angle index is between the first roll threshold and the second roll threshold, the controller 10 resets the reference direction so that the updated reference direction lies between the pre-update reference direction and the front direction of the display device 1.
  • the second roll threshold is smaller than the first roll threshold.
  • the controller 10 does not reset the reference direction when the roll angle index is less than the second roll threshold.
  • the first roll threshold and the second roll threshold can be determined using the same technique as the roll threshold in the first example of updating the reference direction (S1131).
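The two-threshold behavior described above can be sketched as a single function. It applies equally to the pitch and roll indices. The text only says that the updated direction lies between the old reference direction and the front direction. The linear ramp between the second and first thresholds used here is an assumption, as are the names and the use of azimuths in degrees.

```python
def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def graded_reset(current_ref_deg: float, front_deg: float, index: float,
                 first_threshold: float, second_threshold: float) -> float:
    """Return the reference azimuth after a nod (pitch index) or tilt (roll index)."""
    if not second_threshold < first_threshold:
        raise ValueError("the second threshold must be smaller than the first")
    if index < second_threshold:
        return current_ref_deg                 # slight motion: keep the reference
    if index > first_threshold:
        weight = 1.0                           # clear gesture: match the front
    else:
        # Assumed linear ramp: a larger index moves the reference closer to the front.
        weight = (index - second_threshold) / (first_threshold - second_threshold)
    return wrap_deg(current_ref_deg + weight * wrap_deg(front_deg - current_ref_deg))


for idx in (5.0, 20.0, 40.0):
    print(idx, graded_reset(20.0, 0.0, idx, first_threshold=30.0, second_threshold=10.0))
# prints: 5.0 20.0, then 20.0 10.0, then 40.0 0.0
```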
  • next, the controller 10 executes coordinate system transformation (S1132). Specifically, based on the measurement result obtained in step S1130 and the posture estimation result of the display device 1, the controller 10 converts the target direction estimation result obtained in step S111 (the sound source direction in the microphone coordinate system) into a sound source direction in the reference coordinate system.
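In the simplest case, the coordinate system transformation (S1132) reduces to adding the device's estimated azimuth to the arrival direction measured in the microphone coordinate system. The sketch below rests on several assumptions. The microphone set is rigidly fixed to the display device 1, with its x-axis along the device front. Both angles are azimuths in degrees, measured positive to the right. Pitch and roll are ignored. A full implementation would instead rotate a 3-D direction vector by the estimated attitude.

```python
def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def mic_to_reference(doa_mic_deg: float, device_yaw_deg: float) -> float:
    """Convert an arrival direction in the microphone frame to the reference frame.

    Assumes a yaw-only attitude and a microphone x-axis aligned with the device front.
    """
    return wrap_deg(device_yaw_deg + doa_mic_deg)


# A speaker 30 degrees right of the device, while the device faces 15 degrees right
# of the reference direction, lies 45 degrees right of the reference direction.
print(mic_to_reference(30.0, 15.0))  # 45.0
```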
  • the controller 10 executes a match determination (S1133). Specifically, the controller 10 determines whether the sound source in the target direction is the same as an already identified sound source. As an example, the controller 10 compares the target direction, converted into a sound source direction in the reference coordinate system, with the sound source direction information (FIG. 9) of the identified sound sources. When the controller 10 determines that the converted target direction matches the sound source direction information of any identified sound source, it treats the (identified) sound source having that matching sound source direction information as the sound source in the target direction (a matching sound source).
  • when the controller 10 determines that the converted target direction does not match the sound source direction information of any identified sound source, it detects that a new sound source exists in the target direction.
  • a match between the converted target direction and the sound source direction information includes at least the case in which the converted target direction coincides with the direction indicated by the sound source direction information. It may further include the case in which the difference or ratio between the converted target direction and the indicated direction is within a permissible range.
  • if no match is found in step S1133, the controller 10 assigns a new sound source ID (S1134). Specifically, the controller 10 assigns a new sound source ID to information regarding the sound emitted from the sound source in the target direction (for example, a speech recognition result). The controller 10 also adds a record corresponding to this new sound source ID to the sound source database (FIG. 9).
  • if a match is found in step S1133, the controller 10 assigns the matching sound source ID (S1135). Specifically, the controller 10 attaches the sound source ID identifying the matching sound source to information regarding the sound emitted from the sound source in the target direction (for example, a speech recognition result).
  • after step S1134 or step S1135, the controller 10 ends the process of FIG. 13.
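The match determination and ID assignment (S1133 to S1135) can be sketched with an in-memory table that stands in for the sound source database of FIG. 9. The class name, the nearest-direction rule, and the 10° permissible range are hypothetical. The text only requires that a match be declared when the directions coincide or differ within a permissible range.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def angular_diff(a_deg: float, b_deg: float) -> float:
    """Absolute shortest difference between two azimuths, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


@dataclass
class SoundSourceTable:
    """Hypothetical in-memory stand-in for the sound source database (FIG. 9)."""
    tolerance_deg: float = 10.0                  # assumed permissible range
    directions: Dict[int, float] = field(default_factory=dict)
    next_id: int = 1

    def assign_id(self, target_ref_deg: float) -> int:
        """Return the sound source ID for a target direction in the reference frame."""
        # S1133: find the identified source whose stored direction is closest.
        best_id: Optional[int] = None
        best_diff = float("inf")
        for source_id, stored_deg in self.directions.items():
            diff = angular_diff(target_ref_deg, stored_deg)
            if diff < best_diff:
                best_id, best_diff = source_id, diff
        if best_id is not None and best_diff <= self.tolerance_deg:
            return best_id                       # S1135: matching sound source ID
        new_id = self.next_id                    # S1134: new sound source ID
        self.next_id += 1
        self.directions[new_id] = target_ref_deg  # add a record for the new source
        return new_id


table = SoundSourceTable()
print([table.assign_id(d) for d in (30.0, -40.0, 35.0, 178.0, -176.0)])  # [1, 2, 1, 3, 3]
```

Comparing azimuths along the shorter arc matters at the ±180° seam. In the example, 178° and -176° are only 6° apart and map to the same source.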
  • after step S112, the controller 10 executes audio signal extraction (S113).
  • a beamforming model is stored in the storage device 11.
  • the beamforming model describes information for specifying the correlation between a predetermined direction and parameters for forming directivity with a beam in that direction.
  • forming directivity is a process of amplifying or attenuating sound in a specific direction of arrival.
  • the controller 10 calculates parameters for forming a directivity with a beam in the direction of arrival by inputting the direction of arrival estimated in S111 into the beamforming model.
  • the controller 10 inputs the calculated angle A1 into the beamforming model and calculates parameters for forming a directivity having a beam in a direction shifted by the angle A1 to the right from the x-axis.
  • the controller 10 inputs the calculated angle A2 into the beamforming model and calculates parameters for forming a directivity having a beam in a direction shifted by the angle A2 to the left from the x-axis.
  • the controller 10 inputs the calculated angle A3 into the beamforming model and calculates parameters for forming a directivity having a beam in a direction shifted by the angle A3 to the left from the x-axis.
  • the controller 10 amplifies or attenuates the audio signals acquired from the microphones 101-1 to 101-5 using the parameters calculated for the angle A1.
  • the controller 10 extracts, from the acquired audio signals, an audio signal for the speech sound coming from the sound source in the direction corresponding to the angle A1, by synthesizing the amplified or attenuated audio signals.
  • the controller 10 amplifies or attenuates the audio signals acquired from the microphones 101-1 to 101-5 using the parameters calculated for the angle A2.
  • the controller 10 extracts, from the acquired audio signal, an audio signal for the speech sound coming from the sound source in the direction corresponding to the angle A2, by synthesizing the amplified or attenuated audio signals.
  • the controller 10 amplifies or attenuates the audio signals acquired from the microphones 101-1 to 101-5 using the parameters calculated for the angle A3.
  • the controller 10 extracts, from the acquired audio signal, an audio signal for the speech sound coming from the sound source in the direction corresponding to the angle A3, by synthesizing the amplified or attenuated audio signals.
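The text leaves the beamforming model unspecified. As one concrete stand-in, a classic delay-and-sum beamformer derives the per-microphone parameters (here, integer sample delays) from the steering angle, then averages the aligned channels. Everything specific in this sketch is an assumption: the microphone coordinates, sampling rate, speed of sound, counterclockwise azimuth convention, and integer-sample rounding.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed (air at roughly 20 degrees C)


def steering_delays(mic_xy_m: Sequence[Tuple[float, float]], azimuth_deg: float,
                    fs_hz: float) -> List[int]:
    """Beamforming parameters for one direction: an integer sample delay per microphone.

    A far-field wave from unit direction u reaches a microphone at position p
    earlier by (p . u) / c than it reaches the origin. Azimuth is measured
    counterclockwise from the +x axis (an assumed convention).
    """
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    return [round((x * ux + y * uy) / SPEED_OF_SOUND_M_S * fs_hz) for x, y in mic_xy_m]


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Delay each channel by its steering delay, then average (out-of-range samples = 0)."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for i in range(n):
            j = i - d  # undo the early (or late) arrival at this microphone
            if 0 <= j < n:
                out[i] += samples[j]
    return [v / len(channels) for v in out]
```

Steering toward the true direction re-aligns the five channels and reproduces the source signal. Steering elsewhere averages misaligned copies, which lowers the output energy. Practical systems usually apply fractional delays or frequency-domain weights rather than whole-sample shifts.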
  • the storage device 11 stores a speech recognition model.
  • the speech recognition model describes information for specifying the correlation between a speech signal and text for the speech signal.
  • the speech recognition model is, for example, a trained model generated by machine learning.
  • the voice recognition model may be stored in an external device (for example, a cloud server) that the controller 10 can access via a network (for example, the Internet).
  • the controller 10 determines the text corresponding to the input voice signal by inputting the voice signal extracted in step S113 to the voice recognition model.
  • the controller 10 may select the speech recognition engine based on the identification result of the sound source corresponding to the speech signal.
  • the controller 10 determines the text corresponding to each input audio signal by inputting the audio signals extracted for the angles A1 to A3 into the speech recognition model.
  • after step S114, the controller 10 executes text image generation (S115). Specifically, the controller 10 generates a text image representing the text obtained by the speech recognition processing in step S114.
  • after step S115, the controller 10 determines the display mode (S116). Specifically, the controller 10 determines how the display image including the text image generated in step S115 is to be displayed on the display 102.
  • after step S116, the controller 10 executes image display (S117). Specifically, the processor 12 displays the display image on the display 102 according to the display mode determined in step S116.
  • FIG. 15 is a diagram illustrating a display example on a display device.
  • FIG. 16 is a diagram for explaining the appearance in the user's field of view.
  • the controller 10 determines the display position of the text image on the display unit of the display device 1 based on at least the direction of the sound source in the reference coordinate system and the posture of the user (that is, the measurement result by the IMU sensor 103).
  • the horizontal display position of the text image will be explained.
  • the images of the speakers P2 to P4 drawn with broken lines in FIG. 15 represent the real images seen by the user P1 through the display 102.
  • the text images T1 to T3 depicted in FIG. 15 represent images displayed on the display 102 and seen by the user P1, and do not exist in real space.
  • the image positions of the field of view viewed through the display 102-1 and the field of view viewed through the display 102-2 differ from each other depending on parallax.
  • the controller 10 determines, as the display position of a text image, a position that corresponds to the direction of the sound source that emitted the sound related to the text image and to the posture of the user. More specifically, the controller 10 sets the display position of the text image T1, which corresponds to the sound (speech of the speaker P2) arriving from the direction of the angle A1 relative to the display device 1, to a position at which T1 is seen in the direction corresponding to the angle A1 from the viewpoint of the user P1.
  • the controller 10 sets the display position of the text image T2, which corresponds to the sound (speech of the speaker P3) arriving from the direction of the angle A2 relative to the display device 1, to a position at which T2 is seen in the direction corresponding to the angle A2 from the viewpoint of the user P1.
  • the controller 10 sets the display position of the text image T3, which corresponds to the sound (speech of the speaker P4) arriving from the direction of the angle A3 relative to the display device 1, to a position at which T3 is seen in the direction corresponding to the angle A3 from the viewpoint of the user P1.
  • angles A1 to A3 here represent azimuth angles.
  • text images T1 to T3 are displayed on the display 102 at display positions that correspond to the direction of each sound source in the reference coordinate system and the posture of the user.
  • the text image T1 representing the content of the statement by the speaker P2 is presented to the user P1 of the display device 1 together with the image of the speaker P2 that is visible through the display 102.
  • a text image T2 representing the content of the speech by the speaker P3 is presented to the user P1 together with an image of the speaker P3 that is visible through the display 102.
  • a text image T3 representing the content of the speech by the speaker P4 is presented to the user P1 together with an image of the speaker P4 that is visible through the display 102.
  • the horizontal display position of the text image displayed on the display 102 is determined according to the estimation result of the direction of the sound source in the reference coordinate system.
  • the display 102 is configured so that the image of the speaker and the text image of the content of the statement appear in the same direction as viewed from the user P1.
  • accordingly, when the posture of the user changes, the display position of the text image is changed.
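The horizontal placement rule can be sketched in two steps. First, compute the source azimuth relative to the device's current heading; then project that angle onto the display. The sketch assumes a simple pinhole model with a known horizontal field of view and azimuths in degrees, positive to the right. It models a single display, ignoring the parallax between the displays 102-1 and 102-2. The function name and parameters are hypothetical.

```python
import math
from typing import Optional


def horizontal_position(source_ref_deg: float, device_yaw_deg: float,
                        display_width_px: int, hfov_deg: float) -> Optional[float]:
    """Horizontal pixel coordinate where a sound source appears, or None if outside the view.

    Azimuths are in the reference coordinate system, in degrees, positive to the right.
    """
    # Direction of the source relative to where the display device currently faces.
    rel_deg = (source_ref_deg - device_yaw_deg + 180.0) % 360.0 - 180.0
    if abs(rel_deg) > hfov_deg / 2:
        return None  # the source is outside the see-through field of view
    focal_px = (display_width_px / 2) / math.tan(math.radians(hfov_deg / 2))
    return display_width_px / 2 + focal_px * math.tan(math.radians(rel_deg))


print(horizontal_position(10.0, 10.0, 1280, 40.0))  # 640.0 (straight ahead: center)
print(horizontal_position(60.0, 10.0, 1280, 40.0))  # None (50 degrees right, out of view)
```

Because the relative angle subtracts the device yaw, turning the head to the right moves the text image to the left on the display, which keeps it over the speaker.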
  • the controller 10 may estimate the attitude of the display device 1 based on the acquired sensor data. Based on the estimated attitude and information indicating the direction of the sound source, the controller 10 may display information regarding the sound emitted from the sound source at a display position within the display 102 of the display device 1 that corresponds to the direction of the sound source relative to the display device 1. Because the display position of the information is thereby linked to the attitude of the display device 1, the user can more easily grasp the relationship between the display position of the information and the direction of the sound source.
  • the controller 10 may correct the shift in the display position of the information regarding the sound by correcting the drift error of the estimated posture. Thereby, it is possible to suppress the adverse effects of drift errors on the display position of information regarding sound.
  • the controller 10 may correct the drift error of the estimated attitude by updating the reference direction of the azimuth of the estimated attitude so that it approaches the front direction of the display device 1 at the time a predetermined condition is satisfied.
  • because the updated reference direction approaches the front direction of the display device 1 at the time the predetermined condition was satisfied, the adverse effects of drift errors on the display position of information regarding the sound can be effectively suppressed.
  • the predetermined condition may include a condition that the roll angle index according to the acquired sensor data is equal to or greater than a roll threshold.
  • the controller 10 may update the reference direction so that the reference direction coincides with the front direction when the pitch angle index exceeds the first pitch threshold. Thereby, when the user performs a motion such as nodding his head, the reference direction can be reset to match the front of the user at the time when the user performed the motion.
  • when the pitch angle index is between the first pitch threshold and the second pitch threshold, the controller 10 may update the reference direction so that the updated reference direction lies between the pre-update reference direction and the front direction. When the pitch angle index is less than the second pitch threshold, the controller 10 may leave the reference direction unchanged.
  • thereby, when the user nods, the reference direction can be reset closer to the user's front at the time of the motion, while a slight nod leaves the reference direction unchanged, so the update frequency of the reference direction can be optimized.
  • the controller 10 may present information prompting the user wearing the display device 1 to perform a predetermined head motion, and may determine the pitch threshold based on sensor data acquired after presenting the information. This makes it possible to set a pitch threshold that reflects the characteristics of the user's pitch angle during that head motion, so that occurrences of the motion can be detected with higher accuracy.
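The text does not say how the pitch threshold is derived from the data recorded after the prompt. One plausible rule, shown here purely as an assumption, takes the peak pitch magnitude of each prompted nod. It then sets the threshold to a fraction of the median of those peaks, so that this user's typical nod clears it with some margin. The function name and the default ratio of 0.7 are hypothetical.

```python
import statistics
from typing import Sequence


def calibrate_pitch_threshold(nod_windows: Sequence[Sequence[float]],
                              ratio: float = 0.7) -> float:
    """Derive a pitch threshold from pitch samples (deg) recorded during prompted nods.

    Each window holds the pitch angles recorded for one prompted nod. The threshold
    is `ratio` times the median peak magnitude; both the rule and the ratio are
    illustrative assumptions.
    """
    if not nod_windows or any(len(w) == 0 for w in nod_windows):
        raise ValueError("need at least one non-empty recorded nod")
    peaks = [max(abs(a) for a in window) for window in nod_windows]
    return ratio * statistics.median(peaks)


recorded = [[0.0, -10.0, -22.0, -5.0], [1.0, -18.0, -3.0], [0.0, -25.0, -12.0]]
print(calibrate_pitch_threshold(recorded))  # about 15.4 (0.7 x median peak of 22.0)
```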
  • the controller 10 may acquire information indicating the direction of the sound source relative to a microphone set including the plurality of microphones 101 of the display device 1. Based on that information and the estimated attitude of the display device 1, the controller 10 may identify the direction of the sound source in the reference coordinate system.
  • the controller 10 may display information regarding the sound emitted from the sound source at a display position within the display 102 of the display device 1 that corresponds to the direction of the sound source relative to the display device 1.
  • the information regarding the sound emitted from the sound source may include text obtained by performing speech recognition on the sound picked up by the microphone set. Thereby, the user can understand the content of the utterance of the speaker serving as the sound source by looking at the displayed information.
  • the display device 1 may be a glass-type display device, and the display 102 is placed within the visual field of the user wearing the display device 1. This makes it easier for the user to understand the displayed information.
  • Modification 1 is an example in which a multi-microphone device separate from a display device includes a microphone set.
  • the information processing system 200 shown in FIG. 17 is configured to acquire audio using the multi-microphone device 30, and to display a text image corresponding to the acquired audio on the display device 2 in a manner that allows the direction of arrival of the audio to be identified.
  • the display device 2 takes, for example, at least one of the following forms: a glass-type display device, a head-mounted display, a PC, or a tablet terminal.
  • the information processing system 200 includes a display device 2 and a multi-microphone device 30.
  • the display device 2 includes a controller 10, a display 102, and an IMU sensor 103.
  • Communication between the multi-microphone device 30 and the display device 2 is realized, for example, by a USB connection, a Bluetooth (registered trademark) connection, or a connection via a network such as Wi-Fi or a mobile network.
  • FIG. 18 is a diagram showing the appearance of a multi-microphone device according to modification 1.
  • the multi-microphone device 30 includes a microphone set including a plurality of microphones 31.
  • the multi-microphone device 30 includes five microphones 31-1 to 31-5 (hereinafter simply referred to as microphones 31 unless they need to be distinguished).
  • the multi-microphone device 30 generates audio signals by receiving (collecting) sound emitted from a sound source using the microphones 31-1 to 31-5.
  • the multi-microphone device 30 estimates the direction of arrival of the sound (that is, the direction of the sound source) in the microphone coordinate system.
  • the multi-mic device 30 performs beamforming processing.
  • the multi-microphone device 30 can be equipped with functions for executing some or all of the audio signal acquisition (S110), direction of arrival estimation (S111), and audio signal extraction (S113) steps of the audio signal processing shown in FIG.
  • the multi-microphone device 30 can include a processor, a storage device, and a communication interface or input/output interface for performing these processes.
  • the microphone 31 collects sounds around the multi-microphone device 30, for example.
  • the sounds collected by the microphone 31 include, for example, at least one of the following: speech sounds made by a person, or sounds of the environment in which the multi-microphone device 30 is used.
  • a mark 31a indicating the reference direction of the multi-microphone device 30 (for example, the front, that is, the x+ direction, although another predetermined direction may be used) is attached to the surface of the housing of the multi-microphone device 30. Thereby, the user can easily recognize the orientation of the multi-microphone device 30 from visual information. Note that the means for recognizing the orientation of the multi-microphone device 30 is not limited to this.
  • the mark 31a may be integrated with the housing of the multi-microphone device 30.
  • FIG. 19 is a diagram showing an example of a screen displayed on the display when there is a drift of the IMU sensor.
  • the display device 2 of Modification 1 can also display the same UI (User Interface) screen as the display device 1 of this embodiment.
  • the multi-microphone device 30 does not move in conjunction with the user's posture, so the correspondence between the microphone coordinate system and the reference coordinate system remains constant unless the multi-microphone device 30 is moved. Therefore, the controller 10 identifies and maintains the attitude of the microphone set in the reference coordinate system at a certain point in time.
  • the controller 10 identifies the direction of the sound source in the reference coordinate system based on the direction of the sound source relative to the microphone set (the sound arrival direction) and the attitude of the microphone set in the reference coordinate system.
  • based on the sound source direction in the reference coordinate system and the estimated attitude of the display device 2, the controller 10 displays information regarding the sound emitted from the sound source at a display position in the display 102 that corresponds to the direction of the sound source relative to the display device 2.
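In Modification 1, the two steps above can be kept separate, which makes the role of drift explicit. In this sketch (hypothetical names; azimuths in degrees, positive to the right), the yaw of the multi-microphone device 30 in the reference coordinate system is captured once. The source direction therefore does not depend on the IMU sensor 103. Only the conversion to a direction relative to the display device 2 uses the drift-prone attitude estimate.

```python
from dataclasses import dataclass


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


@dataclass(frozen=True)
class StationaryMicSet:
    """Yaw of the multi-microphone device 30 in the reference frame, captured once."""
    yaw_ref_deg: float

    def source_direction_ref(self, doa_mic_deg: float) -> float:
        """Sound source direction in the reference frame; no IMU data is involved."""
        return wrap_deg(self.yaw_ref_deg + doa_mic_deg)


def direction_relative_to_display(source_ref_deg: float, display_yaw_deg: float) -> float:
    """Source direction relative to the display device 2, using its IMU-based yaw estimate."""
    return wrap_deg(source_ref_deg - display_yaw_deg)


mics = StationaryMicSet(yaw_ref_deg=90.0)
source = mics.source_direction_ref(-30.0)           # 60.0 in the reference frame
print(direction_relative_to_display(source, 45.0))  # 15.0: slightly right of the user's front
```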
  • the drift of the IMU sensor 103 does not affect the estimation result of the direction of the sound source in the reference coordinate system, so a situation in which the same speaker is identified as different sound sources does not occur.
  • however, drift of the IMU sensor 103 can still make the display position of information regarding the sound emitted from the sound source inappropriate.
  • for example, the controller 10 sequentially generates images in which an icon IC15 representing the identified first sound source and a text image TI16 representing the content of the sound (utterance) emitted from that sound source (speaker SP11) at time ti are placed at positions according to the estimated direction of the sound source and the estimated posture UO13(ti) of the user US10. However, if the drift error becomes large, the estimate of the user's posture UO13(ti) becomes inaccurate. The icon IC15 and the text image TI16 are then displayed away from the position PO17 at which they would be displayed if the user's posture were estimated accurately (that is, the position where the actual sound source exists in the user's field of view).
  • in that case, information regarding the sound emitted from the sound source is placed at a position corresponding to a direction far from the actual direction of the sound source, which risks confusing the user or causing discomfort when the user looks at the image.
  • the controller 10 of Modification 1 therefore updates the reference direction (that is, updates the correspondence between the reference direction and the reference coordinate system) when a predetermined update condition is satisfied. This suppresses the error in estimating the user's posture that accumulated drift would otherwise cause.
  • the display position of information regarding the sound emitted from the sound source can be optimized.
  • FIG. 20 is a flowchart of audio processing according to modification 1.
  • the audio processing shown in FIG. 20 is started after the display device 2 and multi-microphone device 30 are powered on and the initial settings are completed.
  • the start timing of the process shown in FIG. 20 is not limited to this.
  • the process shown in FIG. 20 may be repeatedly executed, for example, at a predetermined period, so that the user of the display device 2 can view images that are updated in real time.
  • the processor included in the multi-microphone device 30 estimates the direction of arrival of the speech sound collected by the microphones 31-1 to 31-5 (that is, the direction of the source of the speech sound relative to the multi-microphone device 30). It does so by inputting the audio signals received from the microphones 31-1 to 31-5 into the arrival direction estimation model.
  • the processor expresses the direction of arrival of the speech sound, for example, as the angle from an axis in the microphone coordinate system whose 0-degree direction is a predetermined direction determined based on the microphones 31-1 to 31-5 (in Modification 1, the front (x+ direction) of the multi-microphone device 30).
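The arrival direction estimation model is not described further. For intuition only, the sketch below swaps in a textbook two-microphone method. It finds the inter-microphone lag by cross-correlation, then converts the lag into an angle from the front axis with the far-field relation sin θ = c·τ/d. The microphone pair, its spacing, the sampling rate, and the speed of sound are assumptions. A real five-microphone device would combine several pairs or use a learned model.

```python
import math
from typing import Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed


def best_lag(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag in samples that maximizes sum(ref[i] * other[i + lag]).

    A positive lag means `other` receives the sound later than `ref`.
    """
    n = len(ref)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[i] * other[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best


def doa_from_pair(ref: Sequence[float], other: Sequence[float],
                  spacing_m: float, fs_hz: float) -> float:
    """Angle in degrees from the front axis, positive toward the `ref` microphone's side.

    Assumes the two microphones lie on a line perpendicular to the front axis and a
    far-field source; a single pair cannot tell front from back.
    """
    max_lag = math.ceil(spacing_m / SPEED_OF_SOUND_M_S * fs_hz)
    lag = best_lag(ref, other, max_lag)
    ratio = SPEED_OF_SOUND_M_S * lag / (fs_hz * spacing_m)  # sin(theta) = c * tau / d
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))
```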
  • the processor included in the multi-microphone device 30 inputs the estimated direction of arrival into the beamforming model to calculate parameters for forming a directivity with a beam in the direction of arrival.
  • the controller 10 executes the process shown in FIG. 13. First, the controller 10 acquires measurement results (S1130) and updates the reference direction (S1131), as in the present embodiment.
  • after step S112, the controller 10 executes speech recognition processing (S114), text image generation (S115), display mode determination (S116), and image display (S117), as in the present embodiment.
  • Each step of the above information processing can be executed by any of the display device 1, display device 2, controller 10, and multi-microphone device 30.
  • the controller 10 of Modification 1 may acquire a multi-channel audio signal generated by the multi-microphone device 30, and estimate the direction of arrival (S131) and extract the audio signal (S132).
  • a plurality of display devices 1 or 2 may be connected to one controller 10.
  • the display mode of information may be configured to be changeable for each display device 1 or display device 2.
  • a user's instruction is input from the input device of the controller 10, but the invention is not limited to this.
  • a user's instruction may be input from an operation unit included in the display device 1 or the display device 2.
  • the display device 1 may display information regarding the sound emitted from the sound source at a display position corresponding to the reference direction.
  • if the user is facing to the left of the reference direction, the information is displayed on the right side of the display, and if the user is facing to the right of the reference direction, the information is displayed on the left side of the display. If the direction of the sound source relative to the user matches the reference direction, information regarding the sound emitted from the sound source is displayed at the position where the sound source exists as viewed from the user.
  • when the controller 10 detects a predetermined head motion, it resets the reference direction to match the front direction of the display device 1. This eliminates the deviation between the direction of the sound source relative to the user and the reference direction, and accordingly corrects the deviation in the information display position.
  • Reference signs: 1: Display device, 2: Display device, 10: Controller, 11: Storage device, 12: Processor, 13: Input/output interface, 14: Communication interface, 21: Right temple, 22: Right endpiece, 23: Bridge, 24: Left endpiece, 25: Left temple, 26: Rim, 30: Multi-mic device, 31: Microphone, 101: Microphone, 102: Display, 103: IMU sensor, 200: Information processing system

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Optics & Photonics (AREA)
  • User Interface Of Digital Computer (AREA)
PCT/JP2023/023086 2022-06-23 2023-06-22 Information processing apparatus, display device, information processing method, and program Ceased WO2023249073A1 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2024529066A JPWO2023249073A1 2022-06-23 2023-06-22

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-100752 2022-06-23
JP2022100752 2022-06-23

Publications (1)

Publication Number Publication Date
WO2023249073A1 true WO2023249073A1 (ja) 2023-12-28

Family

ID=89380071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/023086 Ceased WO2023249073A1 (ja) 2022-06-23 2023-06-22 Information processing apparatus, display device, information processing method, and program

Country Status (2)

Country Link
JP (1) JPWO2023249073A1
WO (1) WO2023249073A1

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011257342A (ja) * 2010-06-11 2011-12-22 Nsk Ltd Head tracking device and head tracking method
JP2012059121A (ja) * 2010-09-10 2012-03-22 Softbank Mobile Corp Glasses-type display device
US20170277257A1 (en) * 2016-03-23 2017-09-28 Jeffrey Ota Gaze-based sound selection
WO2021230180A1 (ja) * 2020-05-11 2021-11-18 Pixie Dust Technologies Inc Information processing apparatus, display device, presentation method, and program


Also Published As

Publication number Publication date
JPWO2023249073A1 2023-12-28

Similar Documents

Publication Publication Date Title
US11755122B2 (en) Hand gesture-based emojis
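CN114761909B (zh) Head-mounted device and method for adjusting rendering of virtual content items
CN111630477B (zh) Device for providing augmented reality service and operating method thereof
US9401050B2 (en) Recalibration of a flexible mixed reality device
CN108170279B (zh) Eye-movement and head-movement interaction method for a head-mounted display device
US20170277257A1 (en) Gaze-based sound selection
US20190212828A1 (en) Object enhancement in artificial reality via a near eye display interface
CN110646938A (zh) Near-eye display system
US20210303258A1 (en) Information processing device, information processing method, and recording medium
US20200097246A1 (en) Systems and methods configured to provide gaze-based audio in interactive experiences
US11234092B2 (en) Remote inference of sound frequencies for determination of head-related transfer functions for a user of a headset
KR20170100641A (ko) Virtual representations of real-world objects
WO2017213070A1 (ja) Information processing apparatus and method, and recording medium
JP2015536514A (ja) Direct hologram manipulation using IMU
WO2019155840A1 (ja) Information processing apparatus, information processing method, and program
WO2017051595A1 (ja) Information processing apparatus, information processing method, and program
US11670157B2 (en) Augmented reality system
KR20190053001A (ko) Movable electronic device and operating method thereof
JP2018067115A (ja) Program, tracking method, and tracking device
JP2018055589A (ja) Program, object tracking method, and display device
WO2020031486A1 (ja) Information processing apparatus, information processing method, program, and information processing system
JP7820732B2 (ja) Information processing apparatus, display device, presentation method, and program
WO2023249073A1 (ja) Information processing apparatus, display device, information processing method, and program
JP2024027122A (ja) Information processing apparatus, information processing method, and program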
US20230120092A1 (en) Information processing device and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23827255

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2024529066

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23827255

Country of ref document: EP

Kind code of ref document: A1