WO2022044342A1 - Head-mounted display and voice processing method therefor - Google Patents

Head-mounted display and voice processing method therefor

Info

Publication number
WO2022044342A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
real space
voice
sound source
head
Prior art date
Application number
PCT/JP2020/032962
Other languages
French (fr)
Japanese (ja)
Inventor
Hitoshi Akiyama
Osamu Kawamae
Tamotsu Ito
Original Assignee
Maxell, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Maxell, Ltd.
Priority to PCT/JP2020/032962
Publication of WO2022044342A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00: Details of transducers, loudspeakers or microphones
    • H04R1/20: Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40: Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones

Definitions

  • The present invention relates to a head-mounted display and a voice processing method thereof.
  • HMD: Head Mounted Display
  • VR: Virtual Reality
  • AR: Augmented Reality
  • In a game using VR objects, Patent Document 1 discloses a configuration in which, to improve the entertainment value of the virtual space, a spherical object called a sound reflection object is defined, and the user's voice input to the microphone is processed as if reflection by that sound reflection object had occurred and is then output from the headphones.
  • A conventional general HMD of the video transmission type (video see-through type), in which an image of the real space is captured by an imaging means (camera), displays the captured real-space image and an AR object superimposed on the display screen of the HMD. At that time, the sound provided to the user (the acoustic space, including sound effects) is the real-space sound acquired by the voice input means (microphone) mounted on the HMD, output as it is regardless of whether an AR object is present. Therefore, the sound heard by the user does not reflect the influence of the AR object the user visually recognizes, which may give the user a sense of discomfort depending on how the AR object is arranged.
  • In this regard, one could assume the presence of the AR object and simulate its effect on the acoustic characteristics with a computational sound-field method such as the Finite Difference Time Domain (FDTD) method; however, such a simulation has a large computational load and a long processing time, making real-time processing difficult.
  • Patent Document 1 describes a method of using a non-transmissive HMD, arranging a VR object in a virtual space, and changing the acoustic effect according to the presence of a sound reflection object.
  • However, the influence on the sound generated by a real-space sound source caused by arranging VR objects or AR objects other than the sound reflection object is not taken into consideration, so the discomfort given to the user is not eliminated.
  • Moreover, since Patent Document 1 stipulates that the sound source to be processed exists at the center of the sound reflection object, the method is difficult to apply when the positional relationship between the sound source position and the AR object changes.
  • An object of the present invention is to provide a head-mounted display, and a voice processing method therefor, that arranges an AR object in the real space, processes sound generated in the real space in real time in consideration of the arrangement of the AR object, and presents the result to the user.
  • To this end, the head-mounted display of the present invention includes, for video processing, an imaging unit that captures an image of the real space, an AR object generation unit that generates an AR object, an AR object superimposing unit that superimposes the generated AR object on the captured real-space image, and a display unit that displays the superimposed real-space image and AR object.
  • Further, for voice processing, it includes a voice input unit that inputs sound emitted in the real space, a voice data processing unit that processes the voice data of at least one of the sound from the real space and the sound emitted by the AR object in accordance with the positional relationship between an object existing in the real space and the AR object displayed superimposed on it by the AR object superimposing unit, taking into account the acoustic characteristics of the surrounding object or surrounding AR object, and an audio output unit that outputs the real-space sound or the AR object's sound processed by the voice data processing unit.
  • Schematic diagrams showing an example of the voice processing handled in the present invention.
  • Schematic diagrams showing another example of the voice processing handled in the present invention.
  • Diagram showing the functional block configuration of the HMD in Example 1.
  • Diagram showing an example of the notation of acoustic characteristics.
  • Diagram showing an example of video/audio processing when an AR object is displayed superimposed.
  • Diagram showing an example of AR object superimposed display, and flowchart showing the video processing procedure for superimposed display.
  • Flowchart showing the voice processing procedure for superimposed display.
  • FIGS. 1A and 1B are schematic views showing an example of voice processing handled in the present invention.
  • the user 10 is wearing a head-mounted display (HMD) 1 and is listening to a voice emitted from the mouth 301 of the person 30.
  • the HMD1 is attached to the head of the user 10, and the view of the user 10 is covered with the display screen of the HMD1.
  • a camera 70 is arranged in the center of the front of the HMD 1 to capture a real-space image in the front direction of the user 10.
  • the HMD1 is a video transmission type HMD, and the captured image is displayed on the display screen of the HMD1.
  • a microphone for acquiring real-space sound is arranged in the vicinity of the camera 70.
  • the sound acquired by the microphone is processed as described later as necessary, and is output from the headphone 80 to the user 10.
  • The HMD 1 and the headphone 80 are electrically connected; they may also be structurally integrated.
  • FIG. 1A shows a state in which there is no AR object on the line segment connecting the user 10 and the mouth 301 of the person 30.
  • The user 10 can visually recognize the person 30 in front because the image taken by the camera 70 of the HMD 1 is displayed. Further, the voice emitted from the mouth 301 of the person 30 is acquired by the microphone of the HMD 1 and output from the headphone 80 as it is, without processing, so the user 10 can hear the voice of the person 30 without discomfort.
  • FIG. 1B shows a state in which the AR object 35 is arranged on the line segment connecting the user 10 and the mouth 301 of the person 30.
  • The AR object 35 here is assumed to be an acoustic obstacle that attenuates the transmission of sound, such as a partition screen (tsuitate).
  • Although the person 30 can still be captured by the camera 70 of the HMD 1, the person 30 is hidden behind the AR object 35 and is not displayed on the display screen of the HMD 1, so the user 10 cannot visually recognize the person 30.
  • the sound emitted from the mouth 301 of the person 30 can be acquired by the microphone of the HMD 1 and output from the headphones 80.
  • If, as in FIG. 1A, the acquired voice data were output from the headphone 80 as it is, without processing, the voice of the now-invisible person 30 would be heard unaffected by the AR object 35 (acoustic obstacle). As a result, the user 10 would feel a sense of discomfort.
  • Therefore, in the present invention, the sound emitted from the mouth 301 of the person 30 is processed as if the AR object 35 actually existed in the real space, taking into account the acoustic characteristics of the AR object 35. Specifically, the voice data is attenuated according to the shape and material of the AR object 35 and then output. As a result, the discomfort the voice gives to the user 10 can be reduced.
  • FIGS. 1C and 1D are schematic views showing other examples of voice processing handled in the present invention.
  • the user 10 wearing the HMD 1 is located behind the person 30 and is in a state of listening to the voice emitted from the mouth 301.
  • FIG. 1C shows a state in which there is no AR object on a straight line connecting the user 10 and the mouth 301 of the person 30.
  • FIG. 1D shows a state in which the AR object 35' is arranged on the extension of the line connecting the user 10 and the mouth 301 of the person 30. The AR object 35' here is assumed to be an acoustic reflector that reflects sound, for example a mirror-like surface. In this state, the voice emitted from the mouth 301 of the person 30 is reflected by the AR object 35' and reaches the user 10 (indicated by a chain line in the figure). Therefore, if, as in FIG. 1C, the voice data acquired by the microphone were output from the headphone 80 as it is, without processing, the user 10 would feel a sense of discomfort.
  • Therefore, the sound emitted from the mouth 301 of the person 30 is processed in consideration of the acoustic characteristics (reflection characteristics) of the AR object 35'. Specifically, processing that amplifies the voice data is performed according to the shape and material of the AR object 35', and the result is output. As a result, the discomfort the voice gives to the user 10 can be reduced.
  • In this way, in the present invention, the voice data is processed and output in consideration of the acoustic characteristics (attenuation characteristics / reflection characteristics) of the AR object. In the following, the case where the AR object is an acoustic obstacle (attenuation characteristics), as shown in FIG. 1B, will mainly be described.
  • FIG. 2A is a diagram showing a hardware configuration of a head-mounted display (HMD) 1.
  • the HMD 1 includes a main control unit 2, a system bus 3, a storage unit 4, a sensor unit 5, a communication processing unit 6, a video processing unit 7, an audio processing unit 8, and an operation input unit 9.
  • the main control unit 2 is a microprocessor unit that controls the entire HMD 1 according to a predetermined operation program.
  • the system bus 3 is a data communication path for transmitting and receiving various commands and data between the main control unit 2 and each constituent block in the HMD 1.
  • The storage unit 4 has a program unit 41 that stores programs for controlling the operation of the HMD 1, a various-data unit 42 that stores various data such as operation setting values, detection values from the sensor unit, objects including contents, and library information downloaded from a library, and a rewritable program function unit 43 used, for example, as a work area for various program operations.
  • the storage unit 4 can store an operation program downloaded from the network, various data created by the operation program, and the like. In addition, it is possible to store contents such as moving images, still images, and sounds downloaded from the network. In addition, it is possible to store data such as moving images and still images taken by using the shooting function of the camera.
  • the storage unit 4 needs to hold the stored information even when the HMD 1 is not supplied with power from the outside. Therefore, for example, a device such as a semiconductor element memory such as a flash ROM or SSD (Solid State Drive), a magnetic disk drive such as an HDD (Hard Disc Drive), or the like is used.
  • Each operation program stored in the storage unit 4 can be updated and expanded in function by a download process from each server device on the network.
  • the sensor unit 5 is a group of various sensors for detecting the state of the HMD1.
  • the sensor group includes a GPS (Global Positioning System) receiving unit 51, a geomagnetic sensor unit 52, a three-dimensional sensor unit 53, an acceleration sensor unit 54, and a gyro sensor unit 55. These sensor groups detect the position, tilt, direction, movement, etc. of the HMD1. In addition to this, an illuminance sensor, an altitude sensor, a proximity sensor, and the like may be further provided.
  • The three-dimensional sensor unit 53 measures the distance to each point of an object. As the 3D sensor (3D scanner), any method may be used, such as a non-contact light (lattice pattern) projection method or a non-contact laser light-section method.
  • the communication processing unit 6 has a LAN (Local Area Network) communication unit 61 and a telephone network communication unit 62.
  • the LAN communication unit 61 is connected to a network such as the Internet via an access point or the like, and transmits / receives data to / from each network server device on the network.
  • the connection with the access point or the like may be made by a wireless connection such as Wi-Fi (registered trademark).
  • The telephone network communication unit 62 performs telephone communication (calls) and data transmission/reception by wireless communication with a base station or the like of a mobile telephone communication network. Communication with the base station or the like may use the W-CDMA (Wideband Code Division Multiple Access) (registered trademark) method, the GSM (Global System for Mobile communications) method, the LTE (Long Term Evolution) method, or another communication method.
  • the LAN communication unit 61 and the telephone network communication unit 62 each include a coding circuit, a decoding circuit, an antenna, and the like. Further, the communication processing unit 6 may further include other communication units such as an infrared communication unit and a Bluetooth (registered trademark) communication unit.
  • the video processing unit 7 has an imaging unit 71, a display unit 72, and a video calculation unit 73.
  • The image pickup unit 71 is the camera 70 (FIGS. 1A to 1D); it converts the light input from the lens into an electric signal using an imaging device such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, thereby acquiring video data of the external scene or object.
  • the display unit 72 provides image data (AR object) and video data (camera-captured video) to the user 10 wearing the HMD 1.
  • the display unit 72 is a display device such as a backlit type liquid crystal display or a self-luminous organic EL display.
  • the video calculation unit 73 performs arithmetic processing related to video, such as a process of generating an augmented reality (AR object) image by a computer and a process of superimposing a real space image acquired by the imaging unit 71 and an AR object.
  • the voice processing unit 8 has a voice input unit 81, a voice output unit 82, and a voice calculation unit 83.
  • the voice input unit 81 is a microphone that converts real space sounds, user voices, and the like into voice data and inputs them.
  • the audio output unit 82 is a headphone 80 (FIGS. 1A to 1D) that outputs audio information and the like necessary for the user.
  • the voice calculation unit 83 performs calculation processing related to voice, such as detection of the position of a sound source and processing of voice data based on the acoustic characteristics of an AR object (acoustic changes due to material, thickness, cross-sectional area, etc.).
  • the operation input unit 9 receives an input of an operation instruction to the HMD1.
  • the operation input unit 9 is composed of operation keys and the like in which button switches and the like are arranged, but may further include other operation devices.
  • the communication processing unit 6 may be used to operate the HMD 1 by using a separate mobile terminal device connected by wired communication or wireless communication.
  • the HMD1 may be operated by analyzing the captured image of the image pickup unit 71 of the image processing unit 7 and performing an operation such as a gesture.
  • The hardware configuration example of the HMD 1 shown in FIG. 2A includes many components that are not essential to this embodiment; the effect of this embodiment is not impaired even by a configuration that does not include them.
  • FIG. 2B is a diagram showing a functional block configuration of the head-mounted display (HMD) 1.
  • the 3D sensor information acquisition unit 12 acquires information from the 3D sensor unit 53 of the sensor unit 5.
  • the information from the three-dimensional sensor unit 53 includes distance information from the HMD 1 to each point of the object in the real space.
  • the 3D data processing unit 13 grasps the shape of the object based on the information acquired by the 3D sensor information acquisition unit 12 (distance information to each point of the object).
  • the three-dimensional data storage unit 14 stores the three-dimensional data in the real space obtained by the three-dimensional data processing unit 13 in various data units 42 of the storage unit 4. In this embodiment, it is premised that the three-dimensional data in the real space is acquired in advance and stored by the three-dimensional data storage unit 14.
  • the shooting data acquisition unit 15 captures the real space by the imaging unit 71 (camera 70) of the video processing unit 7 and acquires the shooting data. With this function, it is possible to provide a real space image to a user 10 who wears a video transmissive HMD1 that cannot directly see the real space.
  • the AR object information storage unit 16 stores information about the AR object in various data units 42 of the storage unit 4.
  • The information about the AR object includes information about the image of the AR object (shape, size, arrangement, etc.) and incidental information about the virtual elements that make up the AR object (hereinafter, components), such as the material, thickness, and cross-sectional area of each component. Further, the AR object information and the AR object incidental information can be acquired from an external server connected to the Internet via the LAN communication unit 61 of the communication processing unit 6.
  • the AR object generation unit 17 generates an AR object by using the video calculation unit 73 of the video processing unit 7 based on the information about the image of the AR object stored in the AR object information storage unit 16.
  • The display specifications of the generated AR object, such as its shape, size, and arrangement, are determined according to the shape information of the real space stored in the three-dimensional data storage unit 14.
  • The AR object superimposing unit 18 superimposes the real-space image acquired by the shooting data acquisition unit 15 and the AR object generated by the AR object generation unit 17, using the video calculation unit 73 of the video processing unit 7. Specifically, it performs the superimposition with occlusion processing, in which the part of the real-space image located in front of the AR object is displayed while the part located behind the AR object is not displayed.
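  • As a concrete illustration of this occlusion processing, the following is a minimal sketch (not taken from the patent) of a per-pixel depth comparison between the real-space image and the rendered AR object; all function and variable names are hypothetical.

```python
import numpy as np

def composite_with_occlusion(real_rgb, real_depth, ar_rgb, ar_depth, ar_mask):
    """Superimpose a rendered AR object on the captured real-space image.

    real_rgb:   HxWx3 camera image of the real space
    real_depth: HxW   per-pixel distance measured by the 3D sensor (meters)
    ar_rgb:     HxWx3 rendered image of the AR object
    ar_depth:   HxW   per-pixel depth of the rendered AR object (meters)
    ar_mask:    HxW   boolean, True where the AR object covers a pixel
    """
    # The AR object is drawn only where it exists and is closer than the
    # real-space surface; real-space pixels in front of it stay visible.
    ar_in_front = ar_mask & (ar_depth < real_depth)
    out = real_rgb.copy()
    out[ar_in_front] = ar_rgb[ar_in_front]
    return out
```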
  • the display data output unit 19 displays the real space image superimposed by the AR object overlay unit 18 and the image of the AR object on the display screen of the HMD1.
  • The acoustic characteristic data storage unit 21 stores acoustic characteristic data (attenuation characteristics / reflection characteristics) patterned with parameters such as material, thickness, and cross-sectional area for each component of the AR object.
  • The acoustic characteristic data storage unit 21 also stores acoustic characteristic data for materials other than those of the adopted AR object components. Further, the acoustic characteristics related to the materials of the AR object components can also be acquired from an external server connected to the Internet via the LAN communication unit 61 of the communication processing unit 6.
  • The voice data acquisition unit 22 acquires the sound in the real space through the voice input unit 81 (microphone) of the voice processing unit 8.
  • the sound source position specifying unit 23 detects the position of the sound source.
  • Using the voice calculation unit 83 of the voice processing unit 8, it identifies the position of the sound source from the real-space arrangement information stored in the three-dimensional data storage unit 14 and the real-space image acquired by the shooting data acquisition unit 15.
  • The AR object shielding determination unit 24 uses the positional relationship between the sound source position identified by the sound source position specifying unit 23 and the display position of the AR object output by the display data output unit 19 to determine whether the display position of the AR object lies on the line segment connecting the sound source position of the voice data and the position of the user 10 (that is, the position of the HMD 1). In other words, it determines whether, from the viewpoint of the user 10 (HMD 1), the sound source position of the voice data is shielded by the AR object.
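  • A minimal sketch of this shielding determination, assuming the AR object is approximated by an axis-aligned bounding box and all positions are expressed in a common real-space coordinate system (the names and the bounding-box simplification are assumptions, not from the patent):

```python
import numpy as np

def is_source_shielded(hmd_pos, source_pos, box_min, box_max):
    """True if the segment from the HMD to the sound source intersects the
    AR object's axis-aligned bounding box (slab method)."""
    p0 = np.asarray(hmd_pos, dtype=float)
    p1 = np.asarray(source_pos, dtype=float)
    d = p1 - p0
    t_min, t_max = 0.0, 1.0          # clamp to the segment, not the full line
    for axis in range(3):
        if abs(d[axis]) < 1e-9:      # segment parallel to this pair of slabs
            if not (box_min[axis] <= p0[axis] <= box_max[axis]):
                return False
        else:
            t1 = (box_min[axis] - p0[axis]) / d[axis]
            t2 = (box_max[axis] - p0[axis]) / d[axis]
            t_min = max(t_min, min(t1, t2))
            t_max = min(t_max, max(t1, t2))
            if t_min > t_max:
                return False
    return True
```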
  • The voice data processing unit 25 analyzes and processes the voice data acquired by the voice data acquisition unit 22 according to the determination result of the AR object shielding determination unit 24. Specifically, when it is determined that the sound source position of the voice data is shielded by the AR object, the corresponding acoustic characteristics are selected from the per-material acoustic characteristics stored in the acoustic characteristic data storage unit 21, based on the information on the AR object's components. The voice data is then analyzed and processed by the voice calculation unit 83 of the voice processing unit 8 using the selected acoustic characteristics.
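  • Processing the captured voice with a selected, patterned impulse response essentially amounts to a convolution; the following hedged sketch shows one way this could look with array-based audio data (names are hypothetical):

```python
import numpy as np

def apply_acoustic_characteristic(voice, impulse_response, gain=1.0):
    """Apply the acoustic characteristic of the shielding AR object, given as a
    patterned impulse response, to the real-space voice samples.

    voice:            1-D array of microphone samples
    impulse_response: 1-D array, e.g. the selected "response value" pattern
    gain:             optional overall attenuation (<1) or amplification (>1)
    """
    processed = np.convolve(voice, impulse_response, mode="full")[: len(voice)]
    return gain * processed
```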
  • The voice data output unit 26 outputs the voice processed by the voice data processing unit 25 toward the user 10 from the voice output unit 82 (headphone 80) of the voice processing unit 8.
  • FIG. 3 is a diagram showing an example of notation for acoustic characteristics, and shows two examples (a) and (b) having different components.
  • (a) is an example of a low-pass type characteristic, and (b) is an example of a band-pass type characteristic; in each case, the change in amplitude (response characteristic) is shown with time or frequency on the horizontal axis.
  • The acoustic characteristics of an actual object can be measured by a stationary analysis method or by a non-stationary analysis method such as an impulse response, and stored in the acoustic characteristic data storage unit 21.
  • In this embodiment, impulse responses corresponding to the incidental information, such as the material, thickness, and cross-sectional area of the AR object's components, are stored in patterned form.
  • The impulse response is the time variation of the output amplitude waveform when an impulse (a signal of very short duration) is input.
  • FIG. 4 is a table describing the impulse response to the components of the AR object. This table is stored in the acoustic characteristic data storage unit 21.
  • The items of the table 450 are a number 451 that identifies the AR object component, the material 452 of the component, the thickness 453 of the component, and the cross-sectional area 454 of the component; for each combination of these parameters, the acoustic characteristic (impulse response) 455 of the component is described in patterned form.
  • the parameters of the material 452 are distinguished by metal, glass, plastic and the like.
  • the thickness 453 is distinguished by two patterns of thick / thin, and the cross-sectional area 454 is distinguished by two patterns of large / small.
  • The impulse response 455 for each parameter combination is given as a pattern, such as response value 1 to response value 4.
  • FIG. 5 is a diagram schematically showing an impulse response waveform, and the response values 1 to 4 in FIG. 4 are shown as an example.
  • the impulse response is a time change of the output amplitude waveform when an impulse is input, and changes depending on the combination of parameters such as the material, thickness, and cross-sectional area of the component.
  • the response value 1 is a case where the material is metal, the thickness is thick, and the cross-sectional area is large, and the response waveform has a large amplitude attenuation.
  • the response value 2 is a case where the material is metal, the thickness is thick, and the cross-sectional area is small, and the response waveform has a small amplitude attenuation.
  • the response value 3 and the response value 4 are cases where the material is metal and the thickness is thin, and the difference in frequency characteristics appears as the difference in the shape of the response waveform. In this way, the impulse responses to various conditions of the components are patterned and expressed. Impulse response values for materials other than metal can be patterned in the same way.
  • The corresponding impulse response value is selected by referring to the table according to the material, thickness, and cross-sectional area of the AR object's components. If the AR object information storage unit 16 does not hold information on the components of the AR object (material, thickness, cross-sectional area, etc.), these can also be estimated from the information on the image of the AR object (shape, size, surface texture, etc.). Because the impulse response values are patterned in this way, the arithmetic processing can be simplified and sped up.
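  • In code, such a patterned table could be a simple dictionary lookup keyed by the component parameters; the entries below are placeholders in the spirit of FIG. 4, not values from the patent:

```python
import numpy as np

# Hypothetical patterned table: (material, thickness class, cross-sectional-area
# class) -> impulse response pattern ("response value").  Numbers are illustrative.
RESPONSE_TABLE = {
    ("metal", "thick", "large"): np.array([0.20, 0.10, 0.05]),  # response value 1
    ("metal", "thick", "small"): np.array([0.45, 0.25, 0.10]),  # response value 2
    ("metal", "thin",  "large"): np.array([0.60, 0.20, 0.05]),  # response value 3
    ("metal", "thin",  "small"): np.array([0.70, 0.30, 0.10]),  # response value 4
}

def select_impulse_response(material, thickness, cross_section,
                            default=np.array([1.0])):
    """Look up the patterned impulse response for an AR object component,
    falling back to a pass-through response for unknown combinations."""
    return RESPONSE_TABLE.get((material, thickness, cross_section), default)
```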
  • The table in which the acoustic characteristics of the components are patterned and the impulse response values may be stored in the various-data unit 42 of the storage unit 4 of the HMD 1; alternatively, when the material of the AR object is known, only the necessary impulse response values may be downloaded and used from an external server connected via the LAN communication unit 61 of the communication processing unit 6.
  • Although the impulse response value is adopted here as the acoustic characteristic used in this embodiment, a filter characteristic that defines the frequency response (for example, a low-pass filter), as shown in FIG. 3, can also be adopted.
  • The acoustic characteristics used in this embodiment have been described assuming the state of FIG. 1B, focusing on the attenuation characteristics when the voice passes through the AR object 35. Furthermore, to handle the state of FIG. 1D, more realistic voice processing becomes possible by also considering the reflection characteristics when the voice is reflected by the AR object 35'. In that case, the distance between the AR object and the sound source in the real space may be added as a parameter to the table of FIG. 4.
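  • For reference, a frequency-response-based characteristic could be as simple as a one-pole low-pass filter applied to the shielded voice; this is a hedged sketch of that alternative, with an assumed sample rate:

```python
import numpy as np

def one_pole_lowpass(voice, cutoff_hz, sample_rate=48000):
    """Attenuate high frequencies of the shielded voice with a one-pole
    low-pass filter, as an alternative to a stored impulse response."""
    # Smoothing coefficient derived from the cutoff frequency.
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)
    out = np.empty(len(voice), dtype=float)
    acc = 0.0
    for i, x in enumerate(voice):
        acc += alpha * (x - acc)     # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        out[i] = acc
    return out
```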
  • Example of superimposed display by the HMD: FIGS. 6A to 6C are diagrams showing an example of video/audio processing when an AR object is displayed superimposed. First, the video processing will be described.
  • FIG. 6A is a real-space landscape seen in front of the user, and is photographed by the image pickup unit 71 (camera 70) of the HMD1.
  • The scene contains the background 31 of the room and a person 32 sitting in a corner of the room; the mouth 321 of the person 32 is the sound source of the voice the person emits.
  • FIG. 6B is an AR object generated by the AR object generation unit 17 of the HMD1.
  • Here, an AR object 36 representing an automobile is shown as an example of an AR object.
  • Image information of various AR objects is stored in the AR object information storage unit 16, and is selected from the image information.
  • FIG. 6C is an image in which the captured real-space image of FIG. 6A (the image of the room background 31 and the image of the person 32) and the AR object 36 of FIG. 6B are superimposed; it is displayed on the display screen (display unit 72) by the display data output unit 19 of the HMD 1. The user 10 wearing the HMD 1 can visually recognize this superimposed image via the HMD 1.
  • a real space image is captured by the image pickup unit 71, the image data is processed, and the image data is superimposed and displayed on the display unit 72 together with the AR object 36.
  • At that time, the AR object superimposing unit 18 takes into account the state of the real space (arrangement and size of objects, etc.) stored in the three-dimensional data storage unit 14, and processes the AR object so that its size and arrangement do not conflict with that state. That is, it performs processing in which the real-space image not shielded by the AR object 36 (part of the background 31) is displayed, and the real-space image data shielded by the AR object 36 (part of the background 31 and the person 32) is not displayed.
  • the AR object 36 (automobile) is present in front of the background 31 and the person 32, and the person 32 is hidden behind the AR object 36 (automobile).
  • The person 32 is hidden behind the AR object 36 and cannot be seen, but the outline of the person 32 is shown by a broken line 32' in order to indicate the position.
  • In the state of FIG. 6A, the sound is collected by the voice input unit 81 (microphone) and output as it is from the voice output unit 82 (headphone 80).
  • the user 10 wearing the HMD 1 does not feel any discomfort when listening to the voice.
  • the sound source position is specified by the sound source position specifying unit 23 to be the mouth 321 of the person 32.
  • the AR object shielding determination unit 24 determines whether or not the sound source position of the audio data is shielded by the AR object as seen from the user 10.
  • In this case, the voice data processing unit 25 processes the voice data into a slightly muffled voice, and the processed voice is output from the voice output unit 82 (headphone 80).
  • As a result, the user 10 perceives the voice emitted by the person 32' as coming from behind the AR object, and can intuitively grasp that it is the voice of the person 32' hidden behind the AR object 36 (automobile).
  • For this processing, the voice data processing unit 25 uses the acoustic characteristics stored in the acoustic characteristic data storage unit 21 that correspond to the components of the AR object 36 (automobile) (for example, material: metal, thickness: thin, cross-sectional area: large), that is, the impulse response value 3 (FIGS. 4 and 5).
  • FIGS. 7A and 7B are diagrams showing other examples of video / audio processing at the time of superimposed display of AR objects.
  • Here, a case is shown in which a person rides in an AR object (automobile), and the processing of the voice data from that person differs depending on the open/closed state of the window.
  • FIG. 7A shows a state in which the AR object 37 and the person 33 in the real space are superimposed and displayed.
  • The arrangement and size of the AR object 37 are adjusted so that the person 33 appears to be inside the AR object 37 (automobile). Further, the window 371 of the AR object 37 is closed, and the person 33 is displayed semi-transparently through the window 371.
  • the voice emitted by the person 33 is processed according to the acoustic characteristics of the AR object 37 and presented to the user 10 wearing the HMD 1. Since the window 371 whose material is glass exists as a component of the AR object on the line segment connecting the mouth 331 of the person 33 and the user 10, the impulse response value of glass is used for the acoustic characteristics.
  • FIG. 7B shows a state in which the passenger-seat window 371 of the AR object 37 is open. Since no component of the AR object exists on the line segment connecting the mouth 331 of the person 33 and the user 10, the voice emitted by the person 33 is output to the user 10 without any processing, as in the state of FIG. 6A. In this way, by making the voice presented to the user 10 differ according to a change of the AR object (the open/closed state of the window), the voice heard by the user 10 becomes more realistic.
  • opening and closing of the window was explained as a change of the AR object, but it can be applied to any form as long as it is a change of the AR object. Needless to say, for example, it can be applied to the open / closed state of a door in a house displayed as an AR object, the attachment / detachment state of a mask attached to a person's mouth, and the like.
  • In the above description, the AR object is stationary, but the above voice processing can of course also be applied when the AR object moves.
  • In that case, the voice heard by the user 10 may be changed over time according to the position of the AR object passing in front of the sound source.
  • FIG. 8 is a flowchart showing a video processing procedure for superimposing and displaying a real space video and an AR object.
  • FIG. 8 shows a case where the real space image of FIG. 6A and the AR object of FIG. 6B are superimposed and displayed as superimposed as shown in FIG. 6C. Further, the following processing is executed by each functional block shown in FIG. 2B.
  • the shooting data acquisition unit 15 acquires the shooting data in the real space (S412).
  • the AR object generation unit 17 analyzes the information regarding the image of the acquired AR object 36 and generates the AR object 36 (S414).
  • When generating the AR object, the arrangement and size of the AR object 36 to be displayed are determined by obtaining the real-space information stored in the three-dimensional data storage unit 14 (three-dimensional data of the real space acquired in advance) and considering the shape of the real space.
  • The AR object superimposing unit 18 superimposes the shooting data (real-space image) acquired in the shooting data acquisition process (S412) and the AR object 36 generated in the AR object generation process (S414) (S415). At that time, occlusion processing is performed so that image portions hidden behind others are not displayed, according to the positional relationship in the depth direction, as seen from the user 10 (HMD 1), between the real-space image and the AR object.
  • the display data output unit 19 outputs the display data to which the AR object superimposition processing (S415) has been performed to the display screen of the HMD 1 (S416).
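  • The video flow of S412 to S416 can be summarized as the following sketch, reusing the occlusion helper from the earlier example; every method name on the hypothetical `hmd` object is illustrative, not an API defined by the patent:

```python
def video_processing_step(hmd):
    """One pass of the video procedure of FIG. 8 (S412-S416), schematically."""
    real_rgb, real_depth = hmd.acquire_shooting_data()              # S412
    ar_info = hmd.ar_object_information()                           # AR object image info
    space_3d = hmd.three_dimensional_data()                         # pre-acquired 3D data
    ar_rgb, ar_depth, ar_mask = hmd.generate_ar_object(ar_info, space_3d)   # S414
    frame = composite_with_occlusion(real_rgb, real_depth,
                                     ar_rgb, ar_depth, ar_mask)     # S415
    hmd.display(frame)                                              # S416
```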
  • FIG. 9 is a flowchart showing a procedure for processing audio data at the time of superimposed display. This is a process of converting the voice emitted by the person 32 in the real space of FIG. 6A into voice data that can be heard by the user 10 with the AR object 36 of FIG. 6C superimposed, and outputting the voice data.
  • Incidental information: material, thickness, cross-sectional area, etc. of the components.
  • the voice data acquisition unit 22 determines whether or not the voice input unit 81 has a voice input (S433). As a result of the determination, if there is a voice input, the process proceeds to S434, and the voice data acquisition unit 22 acquires the input voice data. If there is no voice input, wait until there is voice input.
  • the sound source position specifying unit 23 specifies the sound source position (the position of the mouth 321 of the person 32) (S435).
  • The position of the mouth 321 of the person 32, which is the sound source position, can be identified by analyzing the real-space arrangement information stored in the three-dimensional data storage unit 14 and the real-space image acquired by the shooting data acquisition unit 15.
  • the AR object shielding determination unit 24 determines whether or not the sound source position is shielded by the AR object. That is, it is determined whether or not the AR object 36 exists on the line segment connecting the sound source (mouth 321 of the person 32) and the user 10 wearing the HMD 1 (S436).
  • If the sound source is shielded, the component of the AR object 36 that shields the sound source is specified, and the acoustic characteristic data corresponding to that component is selected from the acoustic characteristic data stored in the acoustic characteristic data storage unit 21 (S437). In the case of FIG. 6C, the impulse response value 3, corresponding to the material being metal, the thickness being thin, and the cross-sectional area being large, is adopted as the acoustic characteristic data of the AR object 36.
  • the voice data processing unit 25 processes voice data according to the acoustic characteristics (response value 3) selected in S437 (S438).
  • The voice data output unit 26 outputs the voice data from the voice output unit 82 (headphone 80) (S439).
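  • Combining the earlier sketches, the audio flow of S433 to S439 could be orchestrated roughly as follows; all method names on the hypothetical `hmd` object are illustrative:

```python
def audio_processing_step(hmd):
    """One pass of the audio procedure of FIG. 9 (S433-S439), schematically."""
    voice = hmd.wait_for_voice_input()                           # S433-S434
    source_pos = hmd.identify_sound_source_position()            # S435
    shielded, component = hmd.shielding_check(source_pos)        # S436 (segment test above)
    if shielded:
        ir = select_impulse_response(component.material,         # S437 (table lookup above)
                                     component.thickness,
                                     component.cross_section)
        voice = apply_acoustic_characteristic(voice, ir)         # S438 (convolution above)
    hmd.output_audio(voice)                                      # S439
```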
  • The above description assumed the state of FIG. 1B, in which the sound source position 301 is shielded by the AR object 35. To also handle the state of FIG. 1D, it may be determined whether the sound is reflected by the AR object 35' located in front of the sound source position 301, and the voice data processing (S438) may then be performed using the acoustic characteristics (reflection characteristics) of the AR object 35'.
  • In the above description, sounds emitted in the real space were taken up as the sound source, but sounds emitted by the AR objects (automobiles) 36 and 37 themselves (for example, engine sound and horn sound) can also be processed in the same way.
  • In that case, in the voice data output processing (S439), the voice data from the real space and the voice data generated by the AR object are superimposed and output.
  • the video processing and the audio processing in the superimposed display are performed by the video calculation unit 73 and the audio calculation unit 83, which are hardware elements, but it goes without saying that these can also be realized by software processing.
  • As described above, in this embodiment, superimposed display of the AR object and audio output that takes the arrangement of the AR object into consideration are realized, and the user can view the video and hear the audio without discomfort. Further, since patterned acoustic characteristic data is used in the voice data processing by the voice calculation unit, the processing time can be shortened and the processing can be performed in real time.
  • In the first embodiment, the arrangement and size of the AR object were determined so that there was no contradiction with the arrangement of the real space.
  • In the second embodiment, conversely, the arrangement and size of the AR object are determined first, and the image of the real space is edited and superimposed accordingly.
  • the basic configuration of the HMD in the second embodiment is the same as the configuration described in the first embodiment, but the function for editing the image in the real space is added to the functional block.
  • FIG. 10 is a diagram showing a functional block configuration of the HMD.
  • a shooting data editing unit 27 and a sound source position moving unit 28 are added to the functional block configuration of FIG. 2B.
  • the shooting data editing unit 27 edits the shooting data and edits the size and arrangement of the shooting target. That is, the object to be photographed is edited according to the AR object generated by the AR object generation unit 17, and both are superimposed and displayed by the AR object superimposition unit 18.
  • the sound source position moving unit 28 processes the audio data so as to match the moved sound source position when the sound source moves.
  • the sound source position can be arbitrarily set by adjusting the volume difference between the left and right headphones, the arrival time difference between the left and right headphones, and the like.
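  • A hedged sketch of such repositioning, using a simple constant-power level difference plus an approximate arrival-time difference between the left and right channels (the geometry constants and names are assumptions):

```python
import numpy as np

def pan_to_azimuth(voice, azimuth_deg, sample_rate=48000,
                   head_radius_m=0.09, speed_of_sound=343.0):
    """Place a mono voice at a horizontal angle (0 = straight ahead,
    positive = to the right) by applying level and time differences."""
    az = np.radians(azimuth_deg)
    left_gain = np.cos((az + np.pi / 2) / 2)     # constant-power pan law
    right_gain = np.sin((az + np.pi / 2) / 2)
    # Approximate interaural time difference; the far ear gets a delayed copy.
    itd = int(abs(head_radius_m * np.sin(az)) / speed_of_sound * sample_rate)
    delayed = np.concatenate([np.zeros(itd), voice])[: len(voice)]
    if azimuth_deg >= 0:                         # source on the right
        left, right = left_gain * delayed, right_gain * voice
    else:                                        # source on the left
        left, right = left_gain * voice, right_gain * delayed
    return np.stack([left, right])               # 2 x N stereo signal
```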
  • FIGS. 11A and 11B are diagrams showing an example of AR object superimposed display. This is a case where the image of the person 34, which is the shooting data in the real space, is edited and superimposed on the AR object 37 (automobile) to display the state in which the person 34 is on the AR object 37 (automobile).
  • FIG. 11A shows a case where the person 34 and the AR object 37 are superimposed without any image processing.
  • the person 34 shown by the broken line exists in the front portion (engine portion) of the AR object 37 (automobile), and is in a state that cannot be realized in reality.
  • In the first embodiment, the arrangement of the AR object 37 (automobile) would be changed to realize the state in which the person 34 is on board; in this embodiment, instead, the arrangement of the person 34 is changed to realize the state of the person riding in the automobile.
  • FIG. 11B shows a case where the image of the person 34 is edited and superimposed on the AR object 37.
  • the position of the person 34 is moved in the direction of the arrow 39 and placed at the position of the window 371 of the AR object 37.
  • The person 34' after placement is made to look translucent through the window 371, and the size of the person 34' is also adjusted according to the size of the AR object 37.
  • In this way, the shooting data editing unit 27 edits the shooting data (size and arrangement) of the person 34 existing in the real space, realizing an image that does not look unnatural even when superimposed on the AR object 37.
  • the position of the person 34 is moved in the direction of the arrow 39 and displayed as the person 34'.
  • The sound source position of the voice emitted from the person 34 thus changes from the mouth 341 before the movement to the mouth 341' after the movement.
  • the sound source position moving unit 28 processes the voice data so that the voice of the person 34 is emitted from the moved sound source position 341'.
  • As a result, the voice of the person 34' can be provided without making the user feel any unnaturalness.
  • FIG. 12 is a flowchart showing a video processing procedure at the time of superimposed display in this embodiment. This is based on the flowchart of the first embodiment (FIG. 8), and steps S421 to S422 for editing a real space image are added. Here, only the differences from the flowchart of FIG. 8 will be described.
  • In the added step S422, the shooting data is edited by the shooting data editing unit 27.
  • That is, the shooting data of the person 34 is edited so that the person 34 is placed at the position of the window 371 of the AR object 37, and the size of the person 34 is also adjusted according to the size of the AR object 37. After that, the result is output as display data in S416.
  • FIG. 13 is a flowchart showing a procedure for processing audio data at the time of superimposed display. This is based on the flowchart of the first embodiment (FIG. 9), and steps S441 to S443 of voice processing accompanying the movement of the sound source position are added. Here, only the differences from the flowchart of FIG. 9 will be described.
  • After the sound source position (for example, the mouth 341 of the person 34 in FIG. 11A) is specified by the sound source position specifying unit 23 in S435, it is determined in the newly added S441 whether or not the shooting data has been edited by the shooting data editing unit 27. That is, it is determined whether or not the shooting data editing process of S422 in the video processing flowchart of FIG. 12 has been executed.
  • If the shooting data has been edited, the process proceeds to the added S442; if the shooting data has not been edited, the process proceeds to S436.
  • In S442, it is determined whether or not it is necessary to move the sound source position along with the editing of the shooting data. For example, for the editing of the shooting data from FIG. 11A to FIG. 11B, it is determined whether or not the sound source position (the mouth 341 of the person 34) needs to be moved. If the sound source position needs to be moved, the process proceeds to S443; if not, the process proceeds to S436.
  • In S443, the sound source position moving unit 28 processes the voice data so as to match the moved sound source position.
  • Then, in S436, the AR object shielding determination unit 24 determines whether or not the sound source position is shielded by the AR object.
  • The sound source position used at this time is the post-movement sound source position if the sound source position was moved in S443.
  • Thereafter, the voice data is processed and output according to the acoustic characteristics of the shielding AR object.
  • As a result, the voice emitted from the mouth 341 of the person 34 in FIG. 11A is output as if it were emitted from the mouth 341' in FIG. 11B, with the effect of the shielding by the window 371 added.
  • According to the second embodiment, even if the relationship between the real-space image and the arrangement and size of the AR object is unnatural, the unnaturalness can be resolved and the result presented by editing the real-space image or processing the real-space sound.
  • Further, in the second embodiment, by editing the video of the real space, it is possible to show a scene that cannot be realized in reality. For example, it is possible to realistically display a virtual state in which a person in the real space is riding in a car floating in the air as an AR object. It would be difficult for a person to take such a posture in the real space, but it can easily be realized by editing the image of the real space.
  • FIG. 14 is a diagram showing a state in which the user 10 wears the HMD 1 of this embodiment.
  • Two cameras 711 and 712 are mounted on the left and right ends as the imaging unit 71 of the HMD 1; by displaying the real-space images taken by the two cameras as a stereo image, the user 10 can intuitively recognize the depth direction of the real space.
  • the internal configuration of the HMD 1 and the video / audio processing are the same as in the case of the first embodiment, and duplicate description will be omitted.
  • As with the cameras, by mounting two microphones on the left and right ends of the HMD 1 as the voice input unit 81, the sound of the real space can be acquired as stereo sound with an expanded sound field. Then, by outputting the stereo sound from the voice output unit 82 (left and right headphones 821, 822) of the HMD 1, it is possible to provide the user 10 with sound having a sense of depth.
  • In this way, the user 10 wearing the HMD 1 can intuitively recognize the depth direction of the real-space image and listen to the real-space sound as stereo sound with an expanded sound field.
  • In the above embodiments, the case of a single sound source was assumed; in the fourth embodiment, the case where a plurality of sound sources exist will be described.
  • the basic configuration of the HMD in the fourth embodiment is the same as the configuration described in the first embodiment, but the function for separating a plurality of sound sources is added to the functional block.
  • FIG. 15 is a diagram showing a functional block configuration of the HMD.
  • a sound source separation unit 29 is added to the functional block configuration of FIG. 2B.
  • the sound source separation unit 29 analyzes the audio data acquired by the audio data acquisition unit 22, and performs a process of separating each sound source into individual sound sources.
  • As a general method for separating sound sources, there is a method in which a plurality of microphones are installed externally and the phase differences, sound pressure differences, and the like of the audio data at the individual positions of the plurality of microphones are detected, so that the position of each sound source is obtained accurately.
  • In this embodiment, two microphones are mounted on the left and right ends of the HMD 1, and the direction of a sound source is detected from the phase difference and volume difference with which the sound emitted by that source reaches the left and right microphones, allowing simple sound source separation. Further, the position of each sound source can be specified and separated using the real-space arrangement information stored in the three-dimensional data storage unit 14 and the real-space image acquired by the shooting data acquisition unit 15.
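  • A minimal sketch of such simple separation by direction, estimating the arrival-time difference between the two HMD microphones from the peak of their cross-correlation (the microphone spacing and names are assumptions):

```python
import numpy as np

def estimate_source_azimuth(left_mic, right_mic, sample_rate=48000,
                            mic_spacing_m=0.18, speed_of_sound=343.0):
    """Estimate the horizontal direction of a single dominant sound source
    from the time difference of arrival at the left and right microphones."""
    corr = np.correlate(left_mic, right_mic, mode="full")
    lag = int(np.argmax(corr)) - (len(right_mic) - 1)   # >0: left channel lags
    tdoa = lag / sample_rate
    # Far-field approximation: sin(theta) = c * tdoa / microphone spacing.
    sin_theta = np.clip(speed_of_sound * tdoa / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))      # 0 = front, + = right
```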
  • FIG. 16 is a flowchart showing a procedure for processing voice data. This is based on the flowchart of the first embodiment (FIG. 9), and steps S445 and S446 accompanying the sound source separation are added. Here, only the differences from the flowchart of FIG. 9 will be described.
  • a plurality of sound sources are individually separated by the sound source separating unit 29 in the newly added S445. Then, in the next S436 or later, audio data processing is performed on the separated individual sound sources based on the acoustic characteristics of the components of the AR object to be shielded.
  • At this time, threshold values may be set for volume (sound pressure), frequency, and the like, so that the number of sound sources subjected to the voice data processing of S436 to S438 can be narrowed down, for example by excluding sound sources whose volume is below the threshold or whose frequency exceeds the threshold. In S446, it is determined whether or not the processing has been completed for all the separated sound sources, and when it has, the voice data is output in S439.
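  • Such narrowing can be a simple filter over the separated sources before the per-source processing, as in the hedged sketch below (the field names and threshold values are illustrative, not from the patent):

```python
def select_sources_to_process(sources, min_rms=0.01, max_dominant_hz=8000.0):
    """Keep only the separated sound sources that warrant the S436-S438
    processing, skipping very quiet or very high-frequency sources.
    Each source is a dict such as {"rms": 0.2, "dominant_hz": 300.0, ...}."""
    return [s for s in sources
            if s["rms"] >= min_rms and s["dominant_hz"] <= max_dominant_hz]
```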
  • In addition, the voice output can also be performed with sound from the AR object (for example, the engine sound or horn sound of the car) superimposed.
  • According to the fourth embodiment, even when a plurality of sound sources exist, it is possible to output sound for each sound source in consideration of the arrangement of the AR objects.
  • FIG. 17 is a diagram showing the overall configuration of the HMD system.
  • the HMD 1 worn by the user 10 is connected to the wireless router 65 and the network network 66 via the LAN communication unit 61 of the communication unit 6.
  • a plurality of servers 67 to 69 are connected to the network network 66.
  • The first server 67 stores and manages information about the AR object, including information about the image of the AR object (shape, size, arrangement, etc.) and incidental information about the components of the AR object (material, thickness, cross-sectional area, etc. of each component). By using the first server 67, it is possible to reduce the load on the various-data unit 42 of the storage unit 4 required by the AR object information storage unit 16.
  • The second server 68 stores and manages acoustic characteristic data patterned with the parameters of material, thickness, and cross-sectional area for each material.
  • The materials stored and managed by the second server 68 include not only the materials of the AR object's components; acoustic characteristic data for materials other than those of the AR object's components is also held.
  • The third server 69 is a server that performs, at high speed, the arithmetic processing otherwise performed by the video calculation unit 73 of the video processing unit 7 and the voice calculation unit 83 of the voice processing unit 8. By using this third server, it is possible to reduce the load on the AR object generation unit 17, the AR object superimposing unit 18, the sound source position specifying unit 23, and the voice data processing unit 25.
  • Here, all the servers 67 to 69 are used in order to reduce the load on the HMD 1, but the servers may of course be selected and used as needed. These servers may also be aggregated rather than kept independent of one another.
  • the configuration for realizing the technique of the present invention is not limited to the above-mentioned Examples, and can be applied to various modified examples.
  • For example, as the acoustic characteristics, attenuation characteristics or reflection characteristics are used depending on the positional relationship.
  • In the above embodiments, a video transmission type HMD is assumed, but the terminal used by the user does not have to take the form of an HMD.
  • For example, the AR object may be displayed using the camera and display of a smartphone, and the sound processed by the acoustic processing shown in the present embodiments may be output to headphones worn by the user.
  • Each embodiment may be realized by hardware, for example by designing part or all of it as an integrated circuit, or by software, with a microprocessor unit or the like interpreting and executing programs that realize the respective functions. Hardware and software may also be used together.
  • The software may be stored in the program unit 41 or the like of the HMD in advance at the time of product shipment, or may be acquired from various servers on the Internet after the product is shipped. Software provided on a memory card, an optical disc, or the like may also be used.
  • control lines and information lines shown in the figure indicate what is considered necessary for explanation, and do not necessarily indicate all the control lines and information lines on the product. In practice, it can be considered that almost all configurations are interconnected.
  • 1 ... HMD (head-mounted display), 2 ... Main control unit, 4 ... Storage unit, 5 ... Sensor unit, 6 ... Communication processing unit, 7 ... Video processing unit, 8 ... Audio processing unit, 9 ... Operation input unit, 10 ... User, 12 ... 3D sensor information acquisition unit, 13 ... 3D data processing unit, 14 ... 3D data storage unit, 15 ... Shooting data acquisition unit, 16 ... AR object information storage unit, 17 ... AR object generation unit, 18 ... AR object superimposing unit, 19 ... Display data output unit, 21 ... Acoustic characteristic data storage unit, 22 ... Audio data acquisition unit, 23 ... Sound source position specifying unit, 24 ... AR object shielding determination unit, 25 ... Audio data processing unit, 26 ... Audio data output unit, 27 ... Shooting data editing unit, 28 ... Sound source position moving unit, 29 ... Sound source separation unit, 32 to 34 ... Person, 35 to 37 ... AR object, 42 ... Various data unit, 53 ... Three-dimensional sensor unit, 67 to 69 ... Server, 70 ... Camera, 71 ... Imaging unit, 72 ... Display unit, 73 ... Video calculation unit, 80 ... Headphones, 81 ... Audio input unit, 82 ... Audio output unit, 83 ... Audio calculation unit.

Abstract

This HMD displays a video of a real space captured by an imaging unit 71 and an AR object generated by an AR object generating unit 17 superimposed on each other on a display unit 72. Correspondingly, voice generated in the real space or voice generated by the AR object is output from a voice output unit 82. At this time, a voice data processing unit 25, in accordance with the positional relationship between an object present in the real space and an AR object disposed in superimposition therewith, subjects voice data of at least one of the voice from the real space and the voice generated by the AR object to processing/treatment, taking the acoustic characteristics of a surrounding object or a surrounding AR object into consideration. An acoustic characteristics data saving unit 21 saves the acoustic characteristics of the AR object in patterned form using the shape of each constituent element as a parameter. In this way, it is possible to perform in real-time a realistic display of an AR object and voice output taking the arrangement of the AR object into consideration.

Description

Head-mounted display and voice processing method therefor
The present invention relates to a head-mounted display and a voice processing method therefor.
In recent years, head-mounted displays (hereinafter, HMD: Head Mounted Display) have been put into practical use. An HMD displays a virtual reality (VR) image (VR object) created by a computer or the like on a glasses-type display screen, allowing the user to experience the feeling of being in a virtual world. Furthermore, it is also possible to superimpose a computer-generated augmented reality (AR) image (AR object) on an image of the real space and display the AR object as if it existed in the real space.
In a game using such VR objects, Patent Document 1 defines a spherical object called a sound reflection object in order to improve the entertainment quality of the virtual space, and discloses a configuration in which user voice input to a microphone is processed as if reflection by the sound reflection object occurred and is then output from headphones.
Patent Document 1: Japanese Unexamined Patent Publication No. 2018-11193
In a conventional general HMD of the video transmission type (video see-through type), in which an image of the real space is captured by an imaging means (camera), the captured real-space image and an AR object are superimposed and displayed on the display screen of the HMD. At that time, as the sound provided to the user (an acoustic space including sound effects), the sound of the real space acquired by a voice input means (microphone) mounted on the HMD is output as it is, regardless of the presence or absence of the AR object. Therefore, the voice heard by the user does not reflect the influence of the AR object the user visually recognizes, and depending on the arrangement of the AR object, the user may feel a sense of discomfort.
In this regard, it is also possible to assume the existence of an AR object and use a technique such as the finite difference time domain (FDTD) method, in which the influence of the AR object on the acoustic characteristics is simulated by a computer as a sound field simulation. However, such a simulation imposes a large computational load and a long processing time, and is difficult to perform in real time.
Patent Document 1 describes a method of using a non-transmissive HMD, arranging VR objects in a virtual space, and changing the sound effect according to the presence of a sound reflection object. However, the influence on sound generated by a real-space sound source caused by arranging VR objects or AR objects other than the sound reflection object is not considered, so the sense of discomfort given to the user is not eliminated. Further, since Patent Document 1 stipulates that the sound source to be processed exists at the center of the sound reflection object, it is difficult to apply when the positional relationship between the sound source position and the AR object changes.
An object of the present invention is to provide a head-mounted display, and a voice processing method therefor, that arranges an AR object in the real space, processes sound generated in the real space in real time in consideration of the arrangement of the AR object, and presents it to the user.
In order to solve the above problems, the head-mounted display of the present invention includes, for video processing, an imaging unit that captures an image of the real space, an AR object generation unit that generates an AR object, an AR object superimposition unit that superimposes the captured real-space image and the generated AR object, and a display unit that displays the superimposed real-space image and AR object. For audio processing, it includes an audio input unit that inputs sound emitted in the real space; an audio data processing unit that processes at least one of the audio data from the real space and the audio data emitted by the AR object, taking into account the acoustic characteristics of surrounding objects or surrounding AR objects, in accordance with the positional relationship between an object existing in the real space and the AR object displayed superimposed on it by the AR object superimposition unit; and an audio output unit that outputs the sound from the real space or the sound of the AR object processed by the audio data processing unit.
According to the present invention, a realistic display of an AR object and audio output that takes the arrangement of the AR object into consideration can be realized in real time.
Brief description of the drawings:
Schematic diagram showing an example of the voice processing handled in the present invention.
Schematic diagram showing an example of the voice processing handled in the present invention.
Schematic diagram showing another example of the voice processing handled in the present invention.
Schematic diagram showing another example of the voice processing handled in the present invention.
Diagram showing the hardware configuration of the HMD in Example 1.
Diagram showing the functional block configuration of the HMD in Example 1.
Diagram showing an example of notation for acoustic characteristics.
Table describing the impulse responses of the components of an AR object.
Diagram schematically showing impulse response waveforms.
Diagram showing an example of video / audio processing at the time of AR object superimposed display.
Diagram showing an example of video / audio processing at the time of AR object superimposed display.
Diagram showing an example of video / audio processing at the time of AR object superimposed display.
Diagram showing another example of video / audio processing at the time of AR object superimposed display.
Diagram showing another example of video / audio processing at the time of AR object superimposed display.
Flowchart showing the video processing procedure at the time of AR object superimposed display.
Flowchart showing the voice processing procedure at the time of AR object superimposed display.
Diagram showing the functional block configuration of the HMD in Example 2.
Diagram showing an example of AR object superimposed display.
Diagram showing an example of AR object superimposed display.
Flowchart showing the video processing procedure at the time of superimposed display.
Flowchart showing the voice processing procedure at the time of superimposed display.
Diagram showing a state in which the user wears the HMD of Example 3.
Diagram showing the functional block configuration of the HMD in Example 4.
Flowchart showing the voice processing procedure at the time of superimposed display.
Diagram showing the overall configuration of the HMD system in Example 5.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIGS. 1A and 1B are schematic diagrams showing an example of the voice processing handled in the present invention. They show a state in which a user 10 wears a head-mounted display (HMD) 1 and listens to the voice emitted from the mouth 301 of a person 30. The HMD 1 is mounted on the head of the user 10, and the field of view of the user 10 is covered by the display screen of the HMD 1. A camera 70 is arranged at the front center of the HMD 1 and captures an image of the real space in the front direction of the user 10. The HMD 1 is a video transmission type HMD, and the captured image is displayed on its display screen.
Further, although not shown, a microphone for acquiring sound of the real space is arranged in the vicinity of the camera 70. The sound acquired by the microphone is processed as described later, if necessary, and is output from the headphones 80 to the user 10. The HMD 1 and the headphones 80 are electrically connected, but they may be structurally integrated.
FIG. 1A shows a state in which there is no AR object on the line segment connecting the user 10 and the mouth 301 of the person 30. In this state, by displaying the image captured by the camera 70 of the HMD 1, the user 10 can visually recognize the person 30 in front. Further, the voice emitted from the mouth 301 of the person 30 is acquired by the microphone of the HMD 1 and output from the headphones 80 without processing, so the user 10 can hear it as the voice of the person 30 without any sense of discomfort.
FIG. 1B shows a state in which an AR object 35 is arranged on the line segment connecting the user 10 and the mouth 301 of the person 30. The AR object 35 here is assumed to be an acoustic obstacle that attenuates the transmission of sound, such as a partition screen. In this state, even if the camera 70 of the HMD 1 can capture the person 30, the person 30 is hidden behind the AR object 35, is therefore not displayed on the display screen of the HMD 1, and cannot be visually recognized by the user 10. On the other hand, the voice emitted from the mouth 301 of the person 30 can still be acquired by the microphone of the HMD 1 and output from the headphones 80. However, if the acquired audio data is output from the headphones 80 without processing, as in FIG. 1A, the voice of the invisible person 30 is heard unaffected by the AR object 35 (acoustic obstacle), and the user 10 feels a sense of discomfort.
Therefore, in this embodiment, when the AR object 35 that does not exist in the real space is displayed, the voice emitted from the mouth 301 of the person 30 is processed, taking the acoustic characteristics of the AR object 35 into account, as if the AR object 35 existed in the real space. Specifically, processing that attenuates the audio data is applied according to the shape and material of the AR object 35 before output. As a result, the sense of discomfort in the sound given to the user 10 can be reduced.
FIGS. 1C and 1D are schematic diagrams showing another example of the voice processing handled in the present invention. In this example, the user 10 wearing the HMD 1 is located behind the person 30 and is listening to the voice emitted from the mouth 301.
FIG. 1C shows a state in which there is no AR object on the straight line connecting the user 10 and the mouth 301 of the person 30. By acquiring the voice emitted from the mouth 301 of the person 30 with the microphone of the HMD 1 and outputting it from the headphones 80 without processing, the user 10 can hear it as the voice of the person 30 without discomfort. In this case, however, since the user 10 is located behind the person 30, the audible volume is smaller than in the case of FIG. 1A.
FIG. 1D shows a state in which an AR object 35' is arranged on the extension of the line connecting the user 10 and the mouth 301 of the person 30. The AR object 35' here is assumed to be an acoustic reflector that reflects the transmission of sound, such as a mirror surface. In this state, part of the voice emitted from the mouth 301 of the person 30 is reflected by the AR object 35' and reaches the user 10 (indicated by the chain line in the figure). Therefore, as in FIG. 1C, if the audio data acquired by the microphone is output from the headphones 80 without processing, the user 10 feels a sense of discomfort.
Therefore, in this embodiment, when the AR object 35' that does not exist in the real space is displayed, the voice emitted from the mouth 301 of the person 30 is processed, taking the acoustic characteristics (reflection characteristics) of the AR object 35' into account. Specifically, processing that amplifies the audio data is applied according to the shape and material of the AR object 35' before output. As a result, the sense of discomfort in the sound given to the user 10 can be reduced.
As described above, in this embodiment, when an AR object that does not exist in the real space is displayed, the audio data is processed and output in consideration of the acoustic characteristics (attenuation characteristics / reflection characteristics) of the AR object. In the following, the description focuses on the case where the AR object is an acoustic obstacle (attenuation characteristics), as in FIG. 1B.

Hereinafter, the configuration and operation of the head-mounted display (HMD) of this embodiment will be described.
[HMD hardware configuration]

FIG. 2A is a diagram showing the hardware configuration of the head-mounted display (HMD) 1. The HMD 1 includes a main control unit 2, a system bus 3, a storage unit 4, a sensor unit 5, a communication processing unit 6, a video processing unit 7, an audio processing unit 8, and an operation input unit 9.
The main control unit 2 is a microprocessor unit that controls the entire HMD 1 according to a predetermined operation program. The system bus 3 is a data communication path for transmitting and receiving various commands and data between the main control unit 2 and each constituent block in the HMD 1.
The storage unit 4 has a program unit 41 that stores programs for controlling the operation of the HMD 1, a various data unit 42 that stores various data such as operation setting values, detection values from the sensor unit, objects including content, and library information downloaded from a library, and a rewritable program function unit 43 such as a work area used by various program operations.
The storage unit 4 can store operation programs downloaded from the network and various data created by those programs. It can also store content such as moving images, still images, and audio downloaded from the network, as well as data such as moving images and still images captured using the shooting function of the camera. The storage unit 4 needs to retain the stored information even when no external power is supplied to the HMD 1. Therefore, devices such as semiconductor memories such as flash ROM and SSD (Solid State Drive), or magnetic disk drives such as HDD (Hard Disc Drive), are used. Each operation program stored in the storage unit 4 can be updated and functionally extended by download processing from a server device on the network.
The sensor unit 5 is a group of sensors for detecting the state of the HMD 1. The sensor group includes a GPS (Global Positioning System) receiving unit 51, a geomagnetic sensor unit 52, a three-dimensional sensor unit 53, an acceleration sensor unit 54, and a gyro sensor unit 55. These sensors detect the position, tilt, direction, movement, and so on of the HMD 1. In addition, an illuminance sensor, an altitude sensor, a proximity sensor, and the like may be further provided.
Among these, the three-dimensional sensor unit 53 measures the distance to each point of an object. There are various types of three-dimensional sensors (3D scanners), for example a non-contact light (grid pattern) projection method, a non-contact laser light-section method, and a non-contact phase difference (phase shift) method using a laser. Any of these methods may be used in this embodiment.
The communication processing unit 6 has a LAN (Local Area Network) communication unit 61 and a telephone network communication unit 62. The LAN communication unit 61 is connected to a network such as the Internet via an access point or the like, and transmits and receives data to and from each network server device on the network. The connection with the access point or the like may be made by a wireless connection such as Wi-Fi (registered trademark).
The telephone network communication unit 62 performs telephone communication (calls) and data transmission and reception by wireless communication with a base station or the like of a mobile telephone communication network. Communication with the base station or the like may be performed by the W-CDMA (Wideband Code Division Multiple Access) (registered trademark) method, the GSM (Global System for Mobile communications) method, the LTE (Long Term Evolution) method, or other communication methods.
The LAN communication unit 61 and the telephone network communication unit 62 each include an encoding circuit, a decoding circuit, an antenna, and the like. The communication processing unit 6 may further include other communication units such as an infrared communication unit and a Bluetooth (registered trademark) communication unit.
The video processing unit 7 has an imaging unit 71, a display unit 72, and a video calculation unit 73. The imaging unit 71 is the camera 70 (FIGS. 1A to 1D), which acquires video data of the outside scene and of objects by converting the light input through a lens into electric signals using an electronic device such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor.
The display unit 72 presents image data (AR objects) and video data (camera-captured video) to the user 10 wearing the HMD 1. Since the HMD 1 is a video transmission type that does not allow the real space to be seen through directly, the display unit 72 is a display device such as a backlit liquid crystal display or a self-luminous organic EL display.
The video calculation unit 73 performs arithmetic processing related to video, such as generating computer-generated augmented reality (AR object) images and superimposing the real-space video acquired by the imaging unit 71 and the AR object.
The audio processing unit 8 has an audio input unit 81, an audio output unit 82, and an audio calculation unit 83. The audio input unit 81 is a microphone that converts real-space sounds, the user's voice, and the like into audio data and inputs them. The audio output unit 82 is the headphones 80 (FIGS. 1A to 1D) that output audio information and the like necessary for the user.
The audio calculation unit 83 performs arithmetic processing related to audio, such as detecting the position of a sound source and processing audio data based on the acoustic characteristics of an AR object (acoustic changes due to its material, thickness, cross-sectional area, etc.).
The operation input unit 9 accepts input of operation instructions to the HMD 1. The operation input unit 9 is composed of operation keys such as an array of button switches, but may further include other operation devices. The HMD 1 may also be operated using a separate mobile terminal device connected by wired or wireless communication via the communication processing unit 6. Furthermore, the HMD 1 may be operated by analyzing the video captured by the imaging unit 71 of the video processing unit 7 and recognizing actions such as gestures.
The hardware configuration example of the HMD 1 shown in FIG. 2A includes many components that are not essential to this embodiment, but the effects of this embodiment are not impaired even by a configuration that lacks them.
[HMD functional block configuration]

FIG. 2B is a diagram showing the functional block configuration of the head-mounted display (HMD) 1. Each functional block included in the control function unit 11 is realized mainly by the main control unit 2 in FIG. 2A together with the program unit 41 and the program function unit 43 of the storage unit 4. The function of each block will be described below.
The 3D sensor information acquisition unit 12 acquires information from the three-dimensional sensor unit 53 of the sensor unit 5. The information from the three-dimensional sensor unit 53 includes distance information from the HMD 1 to each point of an object in the real space.
The 3D data processing unit 13 grasps the shape of the object based on the information acquired by the 3D sensor information acquisition unit 12 (distance information to each point of the object). The 3D data storage unit 14 stores the real-space three-dimensional data obtained by the 3D data processing unit 13 in the various data unit 42 of the storage unit 4. This embodiment assumes that the three-dimensional data of the real space is acquired in advance and stored by the 3D data storage unit 14.
The shooting data acquisition unit 15 captures the real space with the imaging unit 71 (camera 70) of the video processing unit 7 and acquires the shooting data. This function provides the real-space video to the user 10 wearing the video transmission type HMD 1, who cannot directly see the real space.
The AR object information storage unit 16 stores information about AR objects in the various data unit 42 of the storage unit 4. The information about an AR object includes information about the image of the AR object (shape, size, arrangement, etc.) and incidental information about the virtual elements constituting the AR object (hereinafter, components), such as the material, thickness, and cross-sectional area of each component. The AR object information and its incidental information can also be acquired from an external server connected to the Internet via the LAN communication unit 61 of the communication processing unit 6.
The AR object generation unit 17 generates an AR object using the video calculation unit 73 of the video processing unit 7, based on the information about the image of the AR object stored in the AR object information storage unit 16. Display specifications such as the shape, size, and arrangement of the AR object to be generated are determined according to the real-space shape information stored in the 3D data storage unit 14.
The AR object superimposition unit 18 superimposes the real-space video acquired by the shooting data acquisition unit 15 and the AR object generated by the AR object generation unit 17, using the video calculation unit 73 of the video processing unit 7. Specifically, it performs the superimposition with occlusion processing, such as displaying the real-space video located in front of the AR object while not displaying the real-space video located behind the AR object.
The display data output unit 19 displays, on the display screen of the HMD 1, the real-space video and the AR object image superimposed by the AR object superimposition unit 18.
The acoustic characteristic data storage unit 21 stores, for each component of an AR object, acoustic characteristic data related to acoustic characteristics (attenuation characteristics / reflection characteristics), with material, thickness, cross-sectional area, and the like as parameters. Of course, the acoustic characteristic data storage unit 21 also stores acoustic characteristic data for materials other than the components of the AR objects to be used. Furthermore, acoustic characteristics related to the materials of the components of an AR object can also be acquired from an external server connected to the Internet via the LAN communication unit 61 of the communication processing unit 6.
The audio data acquisition unit 22 acquires real-space sound with the audio input unit 81 (microphone) of the audio processing unit 8. The sound source position identification unit 23 detects the position of the sound source; it identifies the sound source position using the audio calculation unit 83 of the audio processing unit 8, based on the real-space arrangement information stored in the 3D data storage unit 14 and the real-space video acquired by the shooting data acquisition unit 15.
The AR object shielding determination unit 24 uses the positional relationship between the sound source position of the audio data identified by the sound source position identification unit 23 and the display position of the AR object displayed by the display data output unit 19 to determine whether the display position of the AR object lies on the line segment connecting the sound source position and the position of the user 10 (that is, the position of the HMD 1). In other words, it determines whether the sound source position of the audio data is shielded by the AR object as seen from the user 10 (HMD 1).
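As a rough illustration only (the patent does not specify any particular geometric test), the following Python sketch checks whether the line segment from the HMD to the sound source passes through an axis-aligned bounding box standing in for the displayed AR object; all names and coordinates are hypothetical.

```python
import numpy as np

def segment_intersects_aabb(p0, p1, box_min, box_max):
    """Slab test: does the segment from p0 to p1 pass through the axis-aligned box?"""
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    box_min, box_max = np.asarray(box_min, dtype=float), np.asarray(box_max, dtype=float)
    d = p1 - p0
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        if abs(d[axis]) < 1e-9:              # segment parallel to this pair of slabs
            if p0[axis] < box_min[axis] or p0[axis] > box_max[axis]:
                return False
        else:
            t0 = (box_min[axis] - p0[axis]) / d[axis]
            t1 = (box_max[axis] - p0[axis]) / d[axis]
            t0, t1 = min(t0, t1), max(t0, t1)
            t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
            if t_enter > t_exit:
                return False
    return True

# Example: is the person's mouth hidden behind the AR object as seen from the HMD?
hmd_pos = [0.0, 1.6, 0.0]      # user (HMD) position
mouth_pos = [0.0, 1.2, 3.0]    # sound source position (person's mouth)
occluded = segment_intersects_aabb(hmd_pos, mouth_pos,
                                   box_min=[-1.0, 0.0, 1.0],
                                   box_max=[1.0, 1.5, 2.0])
print(occluded)  # True: the box between the two points blocks the line of sight
```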
The audio data processing unit 25 analyzes and processes the audio data acquired by the audio data acquisition unit 22 according to the determination result of the AR object shielding determination unit 24. Specifically, when it is determined that the sound source position of the audio data is shielded by an AR object, the acoustic characteristic corresponding to the information on the AR object's components is selected from the per-material acoustic characteristics stored in the acoustic characteristic data storage unit 21. The audio data is then analyzed and processed by the audio calculation unit 83 of the audio processing unit 8 using the selected acoustic characteristic.
The audio data output unit 26 outputs the sound processed by the audio data processing unit 25 from the audio output unit 82 (headphones 80) of the audio processing unit 8 toward the user 10.
[Acoustic characteristics of AR objects]

Next, the acoustic characteristics of the components of an AR object, stored in the acoustic characteristic data storage unit 21, will be described.
FIG. 3 is a diagram showing an example of notation for acoustic characteristics, showing two examples (a) and (b) with different components. (a) is an example of a low-pass characteristic and (b) is an example of a band-pass characteristic; the horizontal axis represents time or frequency, and the change in amplitude (response characteristic) is plotted. When a real object corresponding to the AR object exists, its acoustic characteristics can be measured by a steady-state analysis method or a non-steady-state analysis method such as impulse response measurement, and stored in the acoustic characteristic data storage unit 21. In this embodiment, however, in order to simplify the arithmetic processing, impulse responses corresponding to the incidental information such as the material, thickness, and cross-sectional area of the components of the AR object are stored in patterned form. An impulse response is the time variation of the output amplitude waveform when an impulse (a signal of very short duration) is input.
FIG. 4 is a table describing the impulse responses of the components of an AR object. This table is stored in the acoustic characteristic data storage unit 21. The items of the table 450 describe, in patterned form, the acoustic characteristic (impulse response) 455 of each component for the following parameters: a number 451 identifying the component of the AR object, the material 452 of the component, the thickness 453 of the component, and the cross-sectional area 454 of the component. The material 452 parameter distinguishes metal, glass, plastic, and so on. The thickness 453 is distinguished by two patterns, thick / thin, and the cross-sectional area 454 by two patterns, large / small. The impulse response 455 for each parameter combination is given in patterned form as response value 1 to response value 4 and so on.
FIG. 5 is a diagram schematically showing impulse response waveforms, using the response values 1 to 4 of FIG. 4 as examples. An impulse response is the time variation of the output amplitude waveform when an impulse is input, and it changes with the combination of parameters such as the material, thickness, and cross-sectional area of the component.
For example, response value 1 corresponds to a metal component that is thick and has a large cross-sectional area, and its response waveform shows a large amplitude attenuation. In contrast, response value 2 corresponds to a metal component that is thick and has a small cross-sectional area, and its response waveform shows a small amplitude attenuation. Response values 3 and 4 correspond to thin metal components, and the difference in their frequency characteristics appears as a difference in the shape of the response waveforms. In this way, the impulse responses for various component conditions are expressed in patterned form. Impulse response values for materials other than metal can be patterned in the same way.
In this embodiment, the corresponding impulse response value is selected by referring to the table according to the material, thickness, and cross-sectional area of the component of the AR object. If the AR object information storage unit 16 does not hold information on the components of the AR object (material, thickness, cross-sectional area, etc.), these can also be estimated from information about the image of the AR object (shape, size, surface texture, etc.). Since the impulse response values are patterned in this way, the arithmetic processing can be simplified and sped up.
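As a minimal sketch, one possible in-memory form of such a patterned table is a dictionary keyed by the component parameters, as below; the key names and the impulse-response arrays are illustrative placeholders, not values from the patent.

```python
import numpy as np

# Patterned acoustic characteristics keyed by (material, thickness, cross-section),
# mirroring the parameters of table 450; the arrays are illustrative placeholders.
IMPULSE_RESPONSES = {
    ("metal", "thick", "large"): np.array([0.30, 0.10, 0.02]),        # cf. response value 1
    ("metal", "thick", "small"): np.array([0.70, 0.30, 0.10]),        # cf. response value 2
    ("metal", "thin", "large"):  np.array([0.60, 0.40, 0.20, 0.05]),  # cf. response value 3
    ("metal", "thin", "small"):  np.array([0.80, 0.50, 0.25, 0.10]),  # cf. response value 4
}

def select_impulse_response(material, thickness, cross_section):
    """Look up the patterned impulse response for an AR object component."""
    return IMPULSE_RESPONSES[(material, thickness, cross_section)]

# Example: a thin, large-area metal panel such as the automobile AR object's body.
ir = select_impulse_response("metal", "thin", "large")
```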
The table in which the acoustic characteristics of components are patterned, together with the impulse response values, may be stored in the various data unit 42 of the storage unit 4 of the HMD 1; alternatively, once the material of the AR object is known, only the necessary impulse response values may be downloaded and used from an external server connected via the LAN communication unit 61 of the communication processing unit 6.
Although impulse response values are adopted as the acoustic characteristics used in this embodiment, a filter characteristic that defines the frequency response (for example, a low-pass filter), as shown in FIG. 3, can also be adopted.
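For reference, a filter-characteristic variant might look like the following hedged sketch, which attenuates the high-frequency content of the captured audio with a low-pass filter instead of convolving with an impulse response; the cutoff frequency and filter order are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import butter, lfilter

def apply_lowpass(audio, fs, cutoff_hz=1000.0, order=4):
    """Attenuate high frequencies, a simple stand-in for an obstructing AR object."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return lfilter(b, a, audio)

fs = 48_000                              # sampling rate of the microphone input
mic_audio = np.random.randn(fs) * 0.1    # placeholder 1 s of captured audio
muffled = apply_lowpass(mic_audio, fs)
```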
The acoustic characteristics used in this embodiment have been described assuming the state of FIG. 1B, focusing on the attenuation characteristics when sound passes through the AR object 35. Furthermore, to handle the state of FIG. 1D, more realistic audio processing becomes possible by also considering the reflection characteristics when sound is reflected by the AR object 35'. In this case, the distance between the AR object and the real-space sound source, and similar quantities, may be added as parameters to the table of FIG. 4.
[Examples of superimposed display by the HMD]

FIGS. 6A to 6C are diagrams showing an example of video / audio processing at the time of AR object superimposed display. The video processing will be described first.
FIG. 6A shows the real-space scene seen in front of the user, captured by the imaging unit 71 (camera 70) of the HMD 1. In the real space of this example, there are the background 31 of a room and a person 32 sitting in a corner of the room (with the mouth 321 as the sound source of the person 32).
FIG. 6B shows an AR object generated by the AR object generation unit 17 of the HMD 1. In this example, an AR object 36 of an automobile is shown. Image information of various AR objects is stored in the AR object information storage unit 16, and this one has been selected from among them.
FIG. 6C is a video in which the captured real-space video of FIG. 6A (the room background 31 and the person 32) and the AR object 36 of FIG. 6B are superimposed, and it is displayed on the display screen (display unit 72) by the display data output unit 19 of the HMD 1. The user 10 wearing the HMD 1 can view such a superimposed video through the HMD 1.
In the video transmission type HMD of this embodiment, a real-space video is captured by the imaging unit 71, and the video data is processed and displayed superimposed with the AR object 36 on the display unit 72. At that time, the AR object superimposition unit 18 takes into account the state of the real space stored in the 3D data storage unit 14 (the arrangement and size of objects, etc.) and processes the display so that the size and arrangement of the AR object do not contradict that state. That is, the real-space video not occluded by the AR object 36 (part of the background 31) is displayed, and the real-space video occluded by the AR object 36 (part of the background 31 and the person 32) is not displayed.
In the case of FIG. 6C, the AR object 36 (automobile) is present in front of the background 31 and the person 32, and the person 32 is displayed as hidden behind the AR object 36 (automobile). In this example, the person 32 is hidden behind the AR object 36 and cannot be seen, but the outline of the person 32 is indicated by the broken line 32' to show the position.
Next, the audio processing will be described. In the state of FIG. 6A, when the person 32 emits a voice from the mouth 321, the sound is collected by the audio input unit 81 (microphone) and output as it is from the audio output unit 82 (headphones 80). The user 10 wearing the HMD 1 hears the voice without any discomfort. In this case, the sound source position is identified by the sound source position identification unit 23 as the mouth 321 of the person 32.
On the other hand, in the state of FIG. 6C, if the voice of the person 32', who is hidden by the AR object 36 (automobile) and cannot be seen, is output at the same volume as in FIG. 6A, the user 10 cannot tell which sound source it corresponds to, and a sense of discomfort arises. Therefore, in this embodiment, when the sound source position is in a positional relationship in which it is shielded by the AR object 36, the audio data is processed and output based on the acoustic characteristics of the components of the AR object 36.
To this end, the AR object shielding determination unit 24 determines whether the sound source position of the audio data is shielded by the AR object as seen from the user 10. When it is determined that the sound source position is shielded, the audio data processing unit 25 processes the audio data into a slightly muffled sound, and the processed sound is output from the audio output unit 82 (headphones 80). As a result, the user 10 perceives the voice of the person 32' as coming from behind something, and can intuitively grasp that the voice comes from the person 32' behind the AR object 36 (automobile).
The audio data processing unit 25 can process the audio data into more realistic audio by using impulse response value 3 (FIGS. 4 and 5), which is the acoustic characteristic stored in the acoustic characteristic data storage unit 21 that corresponds to the components of the AR object (automobile) (for example, material: metal, thickness: thin, cross-sectional area: large).
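Conceptually, applying a selected response value amounts to convolving the captured audio with the patterned impulse response. A minimal hedged sketch, with placeholder signal data and hypothetical names, is shown below.

```python
import numpy as np

def process_with_impulse_response(audio, impulse_response):
    """Convolve microphone audio with the impulse response selected from the table."""
    out = np.convolve(audio, impulse_response, mode="full")[: len(audio)]
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out       # simple guard against clipping

fs = 48_000
mic_audio = np.random.randn(fs) * 0.1              # placeholder 1 s microphone frame
response_value_3 = np.array([0.6, 0.4, 0.2, 0.05])  # placeholder patterned response
processed = process_with_impulse_response(mic_audio, response_value_3)
```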
FIGS. 7A and 7B are diagrams showing another example of video / audio processing at the time of AR object superimposed display. Here, assuming a case where a person rides in the AR object (automobile), the processing of the audio data from the person is made to differ according to the open / closed state of a window.
FIG. 7A shows a state in which the AR object 37 and a real-space person 33 are superimposed and displayed. The arrangement and size of the AR object 37 are adjusted so that the person 33 appears to be riding in the AR object 37 (automobile). The window 371 of the AR object 37 is closed, and the person 33 is displayed semi-transparently through the window 371.
In this state, the voice emitted by the person 33 is processed according to the acoustic characteristics of the AR object 37 and presented to the user 10 wearing the HMD 1. Since the window 371, whose material is glass, exists as a component of the AR object on the line segment connecting the mouth 331 of the person 33 and the user 10, the impulse response value for glass is used as the acoustic characteristic.
FIG. 7B shows a state in which the passenger-seat window 371 of the AR object 37 is open. Since no component of the AR object exists on the line segment connecting the mouth 331 of the person 33 and the user 10, the voice emitted by the person 33 is output to the user 10 without any processing, as in the state of FIG. 6A. In this way, by varying the sound presented to the user 10 according to the change of the AR object (the open / closed state of the window), the sound heard by the user 10 becomes more realistic.
In this example, the opening and closing of a window has been described as the change of the AR object, but any change of an AR object can be handled in the same way. Needless to say, the method can also be applied, for example, to the open / closed state of a door in a house displayed as an AR object, or to whether a mask worn over a person's mouth is on or off.
In the above example the AR object is stationary, but it goes without saying that the above audio processing can also be applied when the AR object moves. In that case, the sound heard by the user 10 should change over time according to the position of the AR object passing in front of the sound source.
[Video / audio processing procedure for superimposed display]

FIG. 8 is a flowchart showing the video processing procedure for superimposing and displaying a real-space video and an AR object. Here, the case where the real-space video of FIG. 6A and the AR object of FIG. 6B are superimposed and displayed as in FIG. 6C will be described. The following processing is executed by the functional blocks shown in FIG. 2B.
When the superimposed display processing is started (S411), the shooting data acquisition unit 15 acquires the shooting data of the real space (S412).
Information about the image of the AR object 36 to be displayed (shape, size, arrangement, etc.) is obtained from among the AR objects stored in the AR object information storage unit 16 (S413).
The AR object generation unit 17 analyzes the obtained information about the image of the AR object 36 and generates the AR object 36 (S414). In generating the AR object, the real-space information stored in the 3D data storage unit 14 (the three-dimensional data of the real space acquired in advance) is obtained, and the arrangement, size, and so on of the AR object 36 to be displayed are determined in consideration of the shape of the real space.
The AR object superimposition unit 18 superimposes the shooting data (real-space video) acquired in the shooting data acquisition process (S412) and the AR object 36 generated in the AR object generation process (S414) (S415). At that time, occlusion processing is performed so that video (image) portions hidden behind are not displayed, according to the positional relationship in the depth direction between the real-space video seen from the user 10 (HMD 1) and the AR object.
The display data output unit 19 outputs the display data on which the AR object superimposition processing (S415) has been performed to the display screen of the HMD 1 (S416).
This completes the video processing that superimposes the real-space video (the background 31 and the person 32 of FIG. 6A) and the AR object 36 (FIG. 6B) and displays them as in FIG. 6C (S417).
FIG. 9 is a flowchart showing the processing procedure for the audio data at the time of superimposed display. This is the processing that converts the voice emitted by the person 32 in the real space of FIG. 6A into the audio data heard by the user 10 in the state where the AR object 36 of FIG. 6C is superimposed, and outputs it.
When the audio data processing is started (S431), the incidental information about the components of the AR object 36 stored in the AR object information storage unit 16 (material, thickness, cross-sectional area, etc. of each component) is obtained (S432).
The audio data acquisition unit 22 determines whether there is an audio input to the audio input unit 81 (S433). If there is an audio input, the process proceeds to S434 and the audio data acquisition unit 22 acquires the input audio data. If there is no audio input, the process waits until there is one.
The sound source position identification unit 23 identifies the sound source position (the position of the mouth 321 of the person 32) (S435). The position of the mouth 321 of the person 32, which is the sound source, can be identified by analyzing the real-space arrangement information stored in the 3D data storage unit 14 and the real-space video acquired by the shooting data acquisition unit 15.
The AR object shielding determination unit 24 determines whether the sound source position is shielded by the AR object, that is, whether the AR object 36 lies on the line segment connecting the sound source (the mouth 321 of the person 32) and the user 10 wearing the HMD 1 (S436).
If the determination is that the sound source position is shielded by the AR object, the process proceeds to S437; if it is not shielded, the process proceeds to S439.
In S437, the components of the AR object 36 shielding the sound source are identified, and the acoustic characteristic data corresponding to those components is selected from the acoustic characteristic data stored in the acoustic characteristic data storage unit 21. In the case of FIG. 6C, impulse response value 3, which corresponds to a metal material, a thin thickness, and a large cross-sectional area, is adopted as the acoustic characteristic data of the AR object 36.
 音声データ加工部25は、S437で選択した音響特性(応答値3)に応じて音声データの加工処理を行う(S438)。 The voice data processing unit 25 processes voice data according to the acoustic characteristics (response value 3) selected in S437 (S438).
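 One way to realize S437 and S438 is sketched below, under the assumption that the patterned acoustic characteristic data is held as short impulse responses keyed by (material, thickness class, cross-sectional-area class); the table contents and function names are illustrative and not taken from this disclosure.

```python
import numpy as np

# Hypothetical patterned acoustic characteristic data, as kept in the storage unit 21
IMPULSE_RESPONSES = {
    ("metal", "thin",  "large"): np.array([0.30, 0.12, 0.05, 0.02]),   # "response value 3"
    ("metal", "thick", "large"): np.array([0.15, 0.06, 0.02, 0.01]),
    ("glass", "thin",  "small"): np.array([0.55, 0.20, 0.08, 0.03]),
}

def process_shielded_audio(samples, material, thickness, area):
    """S437/S438: select the impulse response of the shielding component and apply it."""
    ir = IMPULSE_RESPONSES[(material, thickness, area)]
    return np.convolve(samples, ir)[: len(samples)]     # attenuated, slightly smeared voice

voice = np.random.randn(48000)                           # placeholder: 1 s of voice at 48 kHz
heard = process_shielded_audio(voice, "metal", "thin", "large")
```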
 In S439, the audio data output unit 26 outputs the audio data from the audio output unit 82 (headphones 80).
 With the above, the process of converting real-space voice into audio data as heard by the user 10 wearing the HMD 1 in the superimposed display of FIG. 6C is completed (S440).
 In the AR object shielding determination process (S436) above, the state of FIG. 1B was assumed and it was determined whether the sound source position 301 is shielded by the AR object 35. To handle the state of FIG. 1D as well, it suffices to determine whether the voice is reflected by the AR object 35' located in front of the sound source position 301, and to perform the audio data processing (S438) according to the acoustic characteristic (reflection characteristic) of the AR object 35'.
 In the above description, only voices emitted in the real space (the voices of the persons 32 and 33) were taken up as sound sources, but sounds emitted by the AR objects (automobiles) 36 and 37, such as engine sound or a horn, can be handled as well. In that case, in the audio data output processing (S439), the real-space audio data and the audio data generated from the AR objects are superimposed and output.
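 A minimal sketch of that superimposition step follows, assuming both signals share the same sample rate and length; the gain value is an arbitrary example.

```python
import numpy as np

def mix_outputs(real_space_audio, ar_object_audio, ar_gain=0.8):
    """S439 with an AR sound source: sum the two streams and keep the result in range."""
    mixed = real_space_audio + ar_gain * ar_object_audio
    return np.clip(mixed, -1.0, 1.0)                    # avoid clipping distortion on output
```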
 Also, in the audio data processing (S438), when an AR object that is a sound source is shielded or reflected by a real-space object, or when an AR object that is a sound source is shielded or reflected by another AR object, the audio data of that AR object is, needless to say, processed in the same way as described above, based on the acoustic characteristic data of the shielding or reflecting object.
 In the above embodiment, the video processing and the audio processing for the superimposed display are performed by the video calculation unit 73 and the audio calculation unit 83, which are hardware elements, but needless to say these can also be realized by software processing.
 According to the first embodiment, the superimposed display of AR objects and audio output that takes the arrangement of the AR objects into account are realized, so the user can view the video and hear the audio without a sense of discomfort. In addition, since the audio data processing by the audio calculation unit uses patterned acoustic characteristic data, the processing time is short and the processing can be performed in real time.
 In the first embodiment, when AR objects were superimposed and displayed, the arrangement and size of the AR objects were determined so that no contradiction arose with the real-space arrangement. In the second embodiment, by contrast, the arrangement and size of the AR object are determined first, and the real-space video is edited to match them before being superimposed.
 The basic configuration of the HMD in the second embodiment is the same as that described in the first embodiment, but functions for editing the real-space video are added to the functional blocks.
 FIG. 10 is a diagram showing the functional block configuration of the HMD. A shooting data editing unit 27 and a sound source position moving unit 28 are added to the functional block configuration of FIG. 2B.
 The shooting data editing unit 27 edits the shooting data, adjusting the size and arrangement of the shot object. That is, the shot object is edited to match the AR object generated by the AR object generation unit 17, and the AR object superimposition unit 18 superimposes and displays the two.
 When the video edited by the shooting data editing unit 27 involves moving a sound source, the sound source position moving unit 28 processes the audio data so that it matches the moved sound source position. The perceived sound source position can be set arbitrarily by adjusting, for example, the volume difference and the arrival time difference between the left and right headphones.
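 To make that relocation concrete, the sketch below derives a per-ear gain and delay from a desired horizontal source angle and applies them to a mono voice signal; the equal-power gain law and the 0.66 ms maximum interaural delay are common rules of thumb used here as assumptions, not values specified in this disclosure.

```python
import numpy as np

SAMPLE_RATE = 48000
MAX_ITD_S   = 0.00066          # ~0.66 ms maximum interaural delay, an assumed constant

def place_source(mono, azimuth_deg):
    """Return (left, right) signals that sound as if the source sits at azimuth_deg
    (0 = straight ahead, +90 = fully to the right), using volume and delay differences."""
    pan = np.sin(np.radians(azimuth_deg))              # -1 (left) .. +1 (right)
    gain_l = np.sqrt(0.5 * (1.0 - pan))                # equal-power panning
    gain_r = np.sqrt(0.5 * (1.0 + pan))
    delay = int(round(abs(pan) * MAX_ITD_S * SAMPLE_RATE))
    left, right = gain_l * mono, gain_r * mono
    if pan > 0:                                        # source on the right: left ear hears it later
        left = np.concatenate([np.zeros(delay), left])[: len(mono)]
    elif pan < 0:
        right = np.concatenate([np.zeros(delay), right])[: len(mono)]
    return left, right

# Shift the voice of the person toward the direction of the new (edited) position
voice = np.random.randn(SAMPLE_RATE)                    # placeholder 1 s mono signal
left, right = place_source(voice, azimuth_deg=30.0)
```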
 FIGS. 11A and 11B are diagrams showing an example of superimposed AR object display. Here the video of the person 34, which is real-space shooting data, is edited and superimposed on the AR object 37 (automobile) to display a state in which the person 34 is riding in the AR object 37 (automobile).
 FIG. 11A shows the case where the person 34 and the AR object 37 are superimposed without any video editing. In this case, the person 34 indicated by the broken line sits in the front portion (engine compartment) of the AR object 37 (automobile), a state that cannot occur in reality. In the first embodiment (FIG. 7), the arrangement of the AR object 37 (automobile) was changed to realize the state in which the person 34 is riding in it; in this embodiment, the arrangement of the person 34 is changed instead.
 FIG. 11B shows the case where the video of the person 34 is edited and then superimposed on the AR object 37. In this example, the position of the person 34 is moved in the direction of the arrow 39 and placed at the position of the window 371 of the AR object 37. The relocated person 34' is rendered so as to appear semi-transparent through the window 371, and the size of the person 34' is also adjusted to match the size of the AR object 37. In this way, in FIG. 11B, the shooting data editing unit 27 edits the shooting data (size and arrangement) of the person 34 existing in the real space, realizing a video that looks in no way unnatural even when superimposed on the AR object 37.
 In the video processing of FIG. 11B, the position of the person 34 is moved in the direction of the arrow 39 and displayed as the person 34'. At that time, the sound source position of the voice emitted by the person 34 changes from the mouth 341 before the move to the mouth 341' after the move. Accordingly, the sound source position moving unit 28 processes the audio data so that the voice of the person 34 appears to be emitted from the moved sound source position 341'. In this way, the voice of the person 34' can also be presented without any sense of unnaturalness.
 FIG. 12 is a flowchart showing the video processing procedure for superimposed display in this embodiment. It is based on the flowchart of the first embodiment (FIG. 8), with steps S421 to S422 for editing the real-space video added. Only the differences from the flowchart of FIG. 8 are described here.
 After the AR object is superimposed on the shooting data in S415, the newly added S421 determines whether there is anything unnatural in the positional relationship between the shooting data and the AR object in the superimposed display. For example, in the superimposed display of FIG. 11A, the real-space person 34 sits in the front portion (engine compartment) of the AR object 37 (automobile), where a person cannot actually be, which is an unnatural state.
 If the determination finds the state unnatural, the process proceeds to S422 and the shooting data is edited. If it is not unnatural, the process proceeds to S416 and the display data is output as it is.
 In S422, the shooting data editing unit 27 edits the shooting data. In the case of FIG. 11B, for example, the shooting data of the person 34 is edited so that the person 34 is placed at the position of the window 371 of the AR object 37, and the size of the person 34 is also adjusted to match the size of the AR object 37. The result is then output as display data in S416.
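 As a rough illustration of such an edit, the following sketch cuts the person's region out of the camera frame, rescales it, and blends it semi-transparently over the window region of the rendered AR object; the use of OpenCV, the fixed alpha value, and the rectangle coordinates are assumptions made for the example.

```python
import numpy as np
import cv2

def place_person_in_window(frame, person_box, window_box, ar_layer, alpha=0.6):
    """Move and rescale the person region, then blend it translucently into the AR window."""
    x0, y0, x1, y1 = person_box                       # where the person was shot (pixels)
    wx0, wy0, wx1, wy1 = window_box                   # where the AR window 371 is drawn
    person = frame[y0:y1, x0:x1]
    person = cv2.resize(person, (wx1 - wx0, wy1 - wy0))
    out = ar_layer.copy()
    blended = alpha * person + (1.0 - alpha) * out[wy0:wy1, wx0:wx1]
    out[wy0:wy1, wx0:wx1] = blended.astype(out.dtype)
    return out
```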
 FIG. 13 is a flowchart showing the procedure for processing audio data during superimposed display. It is based on the flowchart of the first embodiment (FIG. 9), with steps S441 to S443 for audio processing accompanying a sound source move added. Only the differences from the flowchart of FIG. 9 are described here.
 After the sound source position (for example, the mouth 341 of the person 34 in FIG. 11A) is specified by the sound source position specifying unit 23 in S435, the newly added S441 determines whether the shooting data has been edited by the shooting data editing unit 27, that is, whether the shooting data editing process S422 in the video processing flowchart of FIG. 12 was executed.
 If the shooting data was edited, the process proceeds to the added S442; if it was not edited, the process proceeds to S436.
 In S442, it is determined whether the editing of the shooting data requires moving the sound source position. For example, in editing the shooting data from FIG. 11A to FIG. 11B, it is determined whether the sound source position (the mouth 341 of the person 34) needs to be moved. If the sound source position needs to be moved, the process proceeds to S443; if not, it proceeds to S436.
 In S443, the sound source position moving unit 28 processes the audio data so that it matches the moved sound source position.
 Then, in S436, the AR object shielding determination unit 24 determines whether the sound source position is shielded by an AR object. Naturally, if the sound source position was moved in S443, the position used here is the position after the move.
 From S437 onward, the audio data is processed and output according to the acoustic characteristics of the shielding AR object.
 As a result, the voice emitted from the mouth 341 of the person 34 in FIG. 11A is output as if it were emitted from the mouth 341' in FIG. 11B, with the shielding effect of the window 371 taken into account.
 According to the second embodiment, even when the relationship between the real-space video and the arrangement or size of the AR object is unnatural, that unnaturalness can be resolved in the presentation by editing the real-space video and processing the real-space audio.
 Furthermore, in the second embodiment, editing the real-space video also makes it possible to display scenes that could not be realized in reality. For example, a virtual state in which a real-space person is riding in an automobile floating in the air as an AR object can be displayed realistically. It would be difficult for a person to take such a posture in the real space, but it can easily be realized by editing the real-space video.
 In the third embodiment, a case where the HMD is equipped with two cameras will be described.
 FIG. 14 is a diagram showing a state in which the user 10 wears the HMD 1 of this embodiment. Two cameras 711 and 712 are mounted at the left and right ends as the imaging unit 71 of the HMD 1. By displaying the real-space video shot with the two cameras as stereo video, the user 10 can intuitively perceive depth in the real space. The internal configuration of the HMD 1 and the video and audio processing are the same as in the first embodiment, and duplicate description is omitted.
 Although not shown in FIG. 14, the audio input unit 81 (microphone) can likewise be realized as two microphones mounted at the left and right ends of the HMD 1, so that real-space audio is captured as stereo audio with a wide sound field. Then, by outputting the stereo audio from the audio output unit 82 (left and right headphones 821 and 822) of the HMD 1, audio with a sense of depth can be provided to the user 10.
 According to the third embodiment, the user 10 wearing the HMD 1 can intuitively perceive the depth of the real-space video and hear the real-space audio as stereo audio with a wide sound field.
 The preceding embodiments assumed a single sound source; the fourth embodiment describes the case where multiple sound sources exist. The basic configuration of the HMD in the fourth embodiment is the same as that described in the first embodiment, but a function for separating multiple sound sources is added to the functional blocks.
 FIG. 15 is a diagram showing the functional block configuration of the HMD. A sound source separation unit 29 is added to the functional block configuration of FIG. 2B.
 The sound source separation unit 29 analyzes the audio data acquired by the audio data acquisition unit 22 and separates it into individual sound sources. A common approach to sound source separation is to install multiple external microphones and determine the source positions accurately by detecting the phase differences, sound pressure differences, and the like of the audio captured at each microphone position. In this embodiment, by contrast, two microphones are mounted at the left and right ends of the HMD 1 as described in the third embodiment, and the direction of each sound source is detected in a simplified way from the phase difference and the volume difference with which its sound reaches the left and right microphones. The source positions can also be identified and separated using the real-space arrangement information stored in the three-dimensional data storage unit 14 and the real-space video acquired by the shooting data acquisition unit 15.
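 The direction estimate from the two head-mounted microphones can be sketched as below, where the inter-microphone delay found by cross-correlation is converted to an arrival angle; the 0.18 m microphone spacing and the plane-wave assumption are illustrative choices, not values given in this disclosure.

```python
import numpy as np

SAMPLE_RATE = 48000
MIC_SPACING_M = 0.18           # assumed distance between the left and right microphones
SPEED_OF_SOUND = 343.0

def estimate_azimuth(left, right):
    """Estimate the horizontal direction of a single source from the L/R time difference."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)          # positive: left channel lags the right
    tdoa = lag / SAMPLE_RATE                          # left-minus-right arrival time difference
    sin_az = np.clip(tdoa * SPEED_OF_SOUND / MIC_SPACING_M, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_az))              # 0 = front, positive = source on the right

def level_difference_db(left, right):
    """Volume difference, usable as a secondary cue for left/right discrimination."""
    rms_l = np.sqrt(np.mean(left ** 2)) + 1e-12
    rms_r = np.sqrt(np.mean(right ** 2)) + 1e-12
    return 20.0 * np.log10(rms_l / rms_r)
```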
 FIG. 16 is a flowchart showing the procedure for processing the audio data. It is based on the flowchart of the first embodiment (FIG. 9), with steps S445 and S446 for sound source separation added. Only the differences from the flowchart of FIG. 9 are described here.
 After the sound source positions are specified by the sound source position specifying unit 23 in S435, the newly added S445 has the sound source separation unit 29 separate the multiple sound sources into individual sources. Then, from S436 onward, the audio data of each separated source is processed based on the acoustic characteristics of the component of the AR object that shields it.
 When multiple sound sources exist, thresholds may be set on, for example, volume (sound pressure) and frequency, and the audio data processing of S436 to S438 may be skipped for sources quieter than the volume threshold or higher in frequency than the frequency threshold, thereby narrowing down the number of sources subject to audio processing. In S446, it is determined whether processing has finished for all separated sources, and if so, the audio data is output in S439.
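 A minimal sketch of such narrowing down follows, assuming each separated source is available as its own signal and using RMS level and dominant frequency as the two criteria; the threshold values are arbitrary examples.

```python
import numpy as np

SAMPLE_RATE = 48000
MIN_RMS = 0.01                 # sources quieter than this are left unprocessed (assumed value)
MAX_DOMINANT_HZ = 8000.0       # sources dominated by higher frequencies are skipped (assumed value)

def needs_processing(source):
    rms = np.sqrt(np.mean(source ** 2))
    spectrum = np.abs(np.fft.rfft(source))
    dominant_hz = np.argmax(spectrum) * SAMPLE_RATE / len(source)
    return rms >= MIN_RMS and dominant_hz <= MAX_DOMINANT_HZ

def process_sources(sources, process_one):
    """S445/S446 loop: apply S436-S438 (process_one) only to sources that pass the thresholds."""
    return [process_one(s) if needs_processing(s) else s for s in sources]
```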
 Needless to say, in this embodiment as well, audio from AR objects (for example, an automobile's engine sound or horn) can be superimposed and output in the audio data output processing (S439).
 According to the fourth embodiment, even when multiple sound sources exist, audio can be output for each individual source with the arrangement of the AR objects taken into account.
 In the fifth embodiment, an HMD system that connects to external servers to obtain the information needed for processing will be described.
 FIG. 17 is a diagram showing the overall configuration of the HMD system. The HMD 1 worn by the user 10 is connected to a wireless router 65 and a network 66 via the LAN communication unit 61 of the communication unit 6. A plurality of servers 67 to 69 are connected to the network 66.
 The first server 67 stores and manages information on AR objects: information on the AR object images (shape, size, arrangement, and so on) and incidental information on the AR object components (material, thickness, cross-sectional area, and so on of each component). Using the first server 67 reduces the load on the various data units 42 of the storage unit 4 required by the AR object information storage unit 16.
 The second server 68 stores and manages acoustic characteristic data patterned for each material, with the material, thickness, and cross-sectional area as parameters. The data stored and managed by the second server 68 includes acoustic characteristic data not only for the materials of AR object components but also for materials other than those of AR object components. Using the second server reduces the load on the various data units 42 of the storage unit 4 required by the acoustic characteristic data storage unit 21.
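 To give a flavour of how the HMD might query such a server, the sketch below fetches patterned acoustic characteristic data over HTTP; the endpoint URL, the query parameters, and the JSON layout are entirely hypothetical, since this disclosure does not define the server interface.

```python
import requests

ACOUSTIC_SERVER = "http://server68.example/acoustic"     # hypothetical endpoint

def fetch_impulse_response(material, thickness_mm, cross_section_m2):
    """Ask the second server 68 for the patterned impulse response matching the parameters."""
    resp = requests.get(ACOUSTIC_SERVER, params={
        "material": material,
        "thickness_mm": thickness_mm,
        "cross_section_m2": cross_section_m2,
    }, timeout=1.0)
    resp.raise_for_status()
    return resp.json()["impulse_response"]               # hypothetical response field
```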
 The third server 69 is a server that performs, at high speed, the arithmetic processing carried out by the video calculation unit 73 of the video processing unit 7, the audio calculation unit 83 of the audio processing unit 8, and so on. Using this third server reduces the load on the AR object generation unit 17, the AR object superimposition unit 18, the sound source position specifying unit 23, and the audio data processing unit 25.
 In this embodiment, all of the servers 67 to 69 are used to reduce the load on the HMD 1, but needless to say the servers can be selected and used as needed. These servers can also be consolidated rather than kept separate.
 The embodiments of the present invention have been described above using the first to fifth examples, but the configuration that realizes the technique of the present invention is not limited to these examples and can be applied to various modifications. For example, part of the configuration of one example can be replaced with the configuration of another example, and the configuration of another example can be added to the configuration of one example. That is, the technique is applicable whenever an AR object is superimposed and displayed on real-space video by the HMD and at least one of a real-space object and an AR object serves as a sound source. The audio data from the sound source is then processed and output taking into account the acoustic characteristics of surrounding objects or surrounding AR objects (using attenuation characteristics or reflection characteristics depending on the positional relationship).
 In the description of each example, a video see-through HMD was assumed, but the terminal used by the user does not have to take the form of an HMD. For example, an AR object may be displayed using a smartphone's camera and display, and the audio subjected to the acoustic processing shown in these examples may be output to headphones worn by the user.
 Some or all of the functions described in the examples may be realized in hardware, for example by designing them as integrated circuits. They may also be realized in software, with a microprocessor unit or the like interpreting and executing programs that implement the respective functions. Hardware and software may be used together. The software may be stored in advance in the program unit 41 or the like of the HMD at the time of product shipment, or may be obtained after shipment from various servers on the Internet, or from software provided on a memory card, an optical disc, or the like.
 The control lines and information lines shown in the figures are those considered necessary for the explanation and do not necessarily represent all the control lines and information lines in the product. In practice, almost all of the components may be considered to be interconnected.
 1 ... head-mounted display (HMD), 2 ... main control unit, 4 ... storage unit, 5 ... sensor unit, 6 ... communication processing unit, 7 ... video processing unit, 8 ... audio processing unit, 9 ... operation input unit, 10 ... user, 12 ... 3D sensor information acquisition unit, 13 ... 3D data processing unit, 14 ... 3D data storage unit, 15 ... shooting data acquisition unit, 16 ... AR object information storage unit, 17 ... AR object generation unit, 18 ... AR object superimposition unit, 19 ... display data output unit, 21 ... acoustic characteristic data storage unit, 22 ... audio data acquisition unit, 23 ... sound source position specifying unit, 24 ... AR object shielding determination unit, 25 ... audio data processing unit, 26 ... audio data output unit, 27 ... shooting data editing unit, 28 ... sound source position moving unit, 29 ... sound source separation unit, 32 to 34 ... persons, 35 to 37 ... AR objects, 42 ... various data units, 53 ... 3D sensor unit, 67 to 69 ... servers, 70 ... camera, 71 ... imaging unit, 72 ... display unit, 73 ... video calculation unit, 80 ... headphones, 81 ... audio input unit, 82 ... audio output unit, 83 ... audio calculation unit.

Claims (11)

  1.  A head-mounted display that a user wears on the head and that displays an augmented reality object (hereinafter, AR object), comprising:
     an imaging unit that shoots video of a real space;
     an AR object generation unit that generates an AR object;
     an AR object superimposition unit that superimposes the shot real-space video and the generated AR object;
     a display unit that displays the superimposed real-space video and AR object;
     an audio input unit that inputs voice emitted in the real space;
     an audio data processing unit that processes the audio data of at least one of the voice from the real space and the sound emitted by the AR object, taking into account the acoustic characteristics of a surrounding object or a surrounding AR object, according to the positional relationship between an object existing in the real space and the AR object arranged by the AR object superimposition unit and displayed superimposed on it; and
     an audio output unit that outputs the voice from the real space or the sound of the AR object processed by the audio data processing unit.
  2.  The head-mounted display according to claim 1, comprising:
     an AR object shielding determination unit that determines whether, as seen from the head-mounted display, a sound source position in the real space is shielded by an AR object arranged by the AR object superimposition unit,
     wherein, when the AR object shielding determination unit determines that the sound source position in the real space is shielded, the audio data processing unit processes the real-space audio data taking into account the acoustic characteristics of the shielding AR object.
  3.  The head-mounted display according to claim 2, comprising:
     an acoustic characteristic data storage unit that stores, for each component of an AR object, patterned acoustic characteristic data relating to its acoustic characteristics with its shape as a parameter,
     wherein the audio data processing unit selects the acoustic characteristic data of the relevant AR object from the acoustic characteristic data storage unit and processes the real-space audio data.
  4.  The head-mounted display according to claim 1, comprising:
     a shooting data editing unit that edits the size and arrangement of the shot real-space video to match the generated AR object; and
     a sound source position moving unit that, when the real-space video edited by the shooting data editing unit involves moving a sound source position, processes the real-space audio data so that it matches the moved sound source position.
  5.  The head-mounted display according to claim 1,
     wherein the head-mounted display has a plurality of cameras as the imaging unit,
     the display unit displays the real-space video shot by the plurality of cameras as stereo video,
     the head-mounted display has a plurality of microphones as the audio input unit, and
     the audio output unit outputs the real-space audio input to the plurality of microphones as stereo audio.
  6.  The head-mounted display according to claim 1,
     wherein the head-mounted display has a plurality of microphones as the audio input unit,
     the head-mounted display comprises a sound source separation unit that uses the plurality of microphones to specify the respective sound source positions of a plurality of voices emitted in the real space, and
     the audio data processing unit processes the audio data of the plurality of voices according to the respective specified sound source positions.
  7.  The head-mounted display according to claim 1,
     wherein the head-mounted display connects to at least one external server via a communication unit and
     obtains, from the external server, information on the AR object to be generated by the AR object generation unit,
     or obtains, from the external server, the acoustic characteristic data of the AR object used by the audio data processing unit to process the audio data,
     or uses the external server to perform at least one of the processes of the AR object generation unit, the AR object superimposition unit, and the audio data processing unit.
  8.  An audio processing method for displaying an augmented reality object (hereinafter, AR object) on a head-mounted display worn on a user's head, comprising, in conjunction with a video processing step of superimposing a generated AR object on shot real-space video and displaying the result:
     an audio input step of inputting voice emitted in the real space;
     an audio data processing step of processing the audio data of at least one of the voice from the real space and the sound emitted by the AR object, taking into account the acoustic characteristics of a surrounding object or a surrounding AR object, according to the positional relationship between an object existing in the real space and the AR object arranged by the video processing step and displayed superimposed on it; and
     an audio output step of outputting the processed voice from the real space or the processed sound of the AR object.
  9.  The audio processing method according to claim 8, comprising:
     an AR object shielding determination step of determining whether, as seen from the head-mounted display, a sound source position in the real space is shielded by an AR object arranged by the video processing step,
     wherein, when the AR object shielding determination step determines that the sound source position in the real space is shielded, the audio data processing step processes the real-space audio data taking into account the acoustic characteristics of the shielding AR object.
  10.  The audio processing method according to claim 9,
     wherein acoustic characteristic data relating to acoustic characteristics is patterned and stored in advance for each component of an AR object with its shape as a parameter, and
     in the audio data processing step, the acoustic characteristic data of the relevant AR object is selected from the stored acoustic characteristic data and the real-space audio data is processed.
  11.  The audio processing method according to claim 8, comprising:
     a sound source position moving step of, when the size and arrangement of the shot real-space video are edited in the video processing step to match the generated AR object and the edited real-space video involves moving a sound source position, processing the real-space audio data so that it matches the moved sound source position.
PCT/JP2020/032962 2020-08-31 2020-08-31 Head-mounted display and voice processing method therefor WO2022044342A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/032962 WO2022044342A1 (en) 2020-08-31 2020-08-31 Head-mounted display and voice processing method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/032962 WO2022044342A1 (en) 2020-08-31 2020-08-31 Head-mounted display and voice processing method therefor

Publications (1)

Publication Number Publication Date
WO2022044342A1 true WO2022044342A1 (en) 2022-03-03

Family

ID=80354993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/032962 WO2022044342A1 (en) 2020-08-31 2020-08-31 Head-mounted display and voice processing method therefor

Country Status (1)

Country Link
WO (1) WO2022044342A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009116690A (en) * 2007-11-07 2009-05-28 Canon Inc Image processing apparatus and image processing method
JP2017182460A (en) * 2016-03-30 2017-10-05 セイコーエプソン株式会社 Head-mounted type display device, method for controlling head-mounted type display device, and computer program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20951591

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20951591

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP