WO2021191651A1 - Sound data processing device and sound data processing method - Google Patents

Sound data processing device and sound data processing method

Info

Publication number
WO2021191651A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
sound data
occupant
data processing
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2020/000323
Other languages
English (en)
French (fr)
Japanese (ja)
Other versions
WO2021191651A8 (ja
Inventor
大久保翔太
井上裕史
岡本雅紀
西山乗
河西純
寺口剛仁
志小田雄宇
陳放歌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renault SAS
Nissan Motor Co Ltd
Original Assignee
Renault SAS
Nissan Motor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renault SAS, Nissan Motor Co Ltd
Priority to US17/907,037 (US12444424B2)
Priority to JP2022509740A (JP7456490B2)
Priority to PCT/IB2020/000323 (WO2021191651A1)
Priority to CN202080098932.8A (CN115315374B)
Priority to EP20927218.6A (EP4129766A4)
Publication of WO2021191651A1
Publication of WO2021191651A8
Anticipated expiration
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00 Acoustics not otherwise provided for
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/18 Methods or devices for transmitting, conducting or directing sound
    • G10K11/26 Sound-focusing or directing, e.g. scanning
    • G10K11/34 Sound-focusing or directing, e.g. scanning using electrical steering of transducer arrays, e.g. beam steering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers
    • H04R3/12 Circuits for transducers for distributing signals to two or more loudspeakers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/01 Aspects of volume control, not necessarily automatic, in sound systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20 Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/21 Direction finding using differential microphone array [DMA]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 General applications
    • H04R2499/13 Acoustic transducers and sound field adaptation in vehicles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers
    • H04R3/005 Circuits for transducers for combining the signals of two or more microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/027 Spatial or constructional arrangements of microphones, e.g. in dummy heads
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments

Definitions

  • The present invention relates to a sound data processing device and a sound data processing method.
  • There is known a surrounding situation notification device that collects ambient sounds outside the vehicle and reproduces the sound information obtained by the sound collection as localized sounds inside the vehicle (Patent Document 1).
  • This surrounding situation notification device determines the attention direction, which is the direction around the vehicle in which the driver should be particularly careful. The device then reproduces the sound localized in the attention direction so that it is emphasized more than the sounds localized in other directions around the vehicle.
  • The problem to be solved by the present invention is to provide a sound data processing device and a sound data processing method that make it easier for a vehicle occupant to hear a specific sound inside the vehicle.
  • The present invention solves the above problem by acquiring first sound data, which is data of a sound whose sound image is localized in the interior of the vehicle; identifying an attention object, which is the object to which the occupant pays attention; generating second sound data, which is data of a sound whose sound image is localized and in which the sound related to the attention object is emphasized compared with the first sound data; and outputting the second sound data to an output device that outputs sound to the occupant.
  • According to the present invention, the occupant of the vehicle can easily hear a specific sound inside the vehicle.
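  • The generation of the second sound data can be pictured with a minimal sketch. It assumes that the first sound data has already been separated into per-source two-channel (left, right) sample streams; that separation step, the function name `emphasize`, the parameters `boost` and `cut`, and the gain values are illustrative assumptions and are not taken from this document. One gain is applied to both channels of a source so that its left/right level ratio, one of the cues for sound image localization, stays unchanged.

```python
from typing import Dict, List, Tuple

Frame = Tuple[float, float]  # one (left, right) sample pair


def emphasize(
    sources: Dict[str, List[Frame]],
    attention_id: str,
    boost: float = 2.0,  # hypothetical gain for the attention-related sound
    cut: float = 0.5,    # hypothetical gain for all other sounds
) -> List[Frame]:
    """Mix per-source stereo streams while emphasizing one source.

    A single gain is applied to both channels of each source, so every
    source keeps its own left/right balance in the mix.
    """
    if attention_id not in sources:
        raise KeyError(f"unknown source: {attention_id}")
    length = max(len(frames) for frames in sources.values())
    mixed = [[0.0, 0.0] for _ in range(length)]
    for source_id, frames in sources.items():
        gain = boost if source_id == attention_id else cut
        for i, (left, right) in enumerate(frames):
            mixed[i][0] += gain * left
            mixed[i][1] += gain * right
    return [(left, right) for left, right in mixed]


# Hypothetical first sound data: two sources, two frames each.
first_sound_data = {
    "navigation_voice": [(0.2, 0.1), (0.4, 0.2)],
    "music": [(0.1, 0.3), (0.1, 0.3)],
}
second_sound_data = emphasize(first_sound_data, "navigation_voice")
```

  • In this sketch the navigation voice is doubled and the music is halved before mixing; a real system would also need to guard against clipping.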
  • FIG. 1 is a block diagram showing an example of a sound output system including the sound data processing device according to the first embodiment.
  • FIG. 2 is a block diagram showing each function included in the control device shown in FIG. 1.
  • FIG. 3 is an example of the position information of the sound source in the interior of the vehicle.
  • FIG. 4 is a diagram for explaining a method of identifying an attention object and a sound source corresponding to the attention object by using the position information of the sound source.
  • FIG. 5 is an example of vehicle interior space information.
  • FIG. 6 is a diagram for explaining a method of identifying an attention object and a sound source corresponding to the attention object by using the vehicle interior space information.
  • FIG. 7 is a flowchart showing a process executed by the sound data processing device.
  • FIG. 8 is a subroutine of step S5 shown in FIG. 7.
  • FIG. 9 is a subroutine of step S6 shown in FIG. 7.
  • FIG. 10 is an example of a scene in which an occupant wearing a head-mounted display interacts with an icon.
  • FIG. 11 is an example of candidates for the attention object presented to the occupant in the scene shown in FIG. 10.
  • FIG. 12 is the subroutine of step S5 shown in FIG. 7, which is the subroutine according to the second embodiment.
  • FIG. 1 is a block diagram showing an example of a sound output system 100 including a sound data processing device 5 according to the first embodiment.
  • The sound output system 100 includes a sound collecting device 1, an imaging device 2, a database 3, an output device 4, and a sound data processing device 5. These devices are connected by a CAN (Controller Area Network) or another in-vehicle LAN so that they can exchange information with each other. The devices are not limited to an in-vehicle LAN such as CAN and may be connected by another wired or wireless LAN.
  • the sound output system 100 is a system that outputs sound to a person in a vehicle.
  • the sound output by the sound output system 100 will be described later.
  • the vehicle is also equipped with a voice dialogue system, a notification system, a warning system, a car audio system, and the like. Further, in the following description, for convenience, the occupant of the vehicle is also simply referred to as an occupant.
  • the voice dialogue system is a system for interacting with the occupants using voice recognition technology and voice synthesis technology.
  • the notification system is a system for notifying the occupants of information about the equipment mounted on the vehicle by a notification sound.
  • the warning system is a system for warning the occupants of the predicted danger to the vehicle with a warning sound.
  • the car audio system is, for example, a system for playing music recorded on a recording medium by connecting to a recording medium on which music or the like is recorded.
  • the sound data processing device 5, which will be described later, is connected to the system mounted on these vehicles via a predetermined network.
  • the position of the seat on which the occupant sits is not particularly limited. Further, the number of occupants is not particularly limited, and the sound output system 100 outputs sound to one or a plurality of occupants.
  • the sound collecting device 1 is provided in the interior of the vehicle and collects the sound heard by the occupant in the interior of the vehicle.
  • the sound picked up by the sound collecting device 1 is mainly a sound having a sound source in the interior of the vehicle.
  • The sound picked up by the sound collecting device 1 includes, for example, a dialogue between occupants, a dialogue between the voice dialogue system and an occupant, voice guidance by the voice dialogue system, a notification sound from the notification system, a warning sound from the warning system, and audio from the car audio system.
  • The sound picked up by the sound collecting device 1 may include a sound having a sound source outside the vehicle (for example, an engine sound of another vehicle). In the following description, "the interior of the vehicle" may be read as "the vehicle interior," and "the exterior of the vehicle" may be read as "the vehicle exterior."
  • the sound collecting device 1 collects the sound whose sound image is localized in the interior of the vehicle.
  • A sound whose sound image is localized is a sound from which a listener can judge the direction of the sound source and the distance to the sound source.
  • In other words, when a human hears a sound whose sound image is localized at a predetermined position relative to the listener, the listener perceives the sound as if the sound source were at that position and the sound were emitted from there.
  • One technique for collecting such a sound is binaural recording, in which the sound that reaches the human eardrum is recorded.
  • Examples of the sound collecting device 1 include a binaural microphone, but its form is not particularly limited.
  • For example, when the sound collecting device 1 is of the earphone type, it is attached to each of the occupant's left and right ears.
  • In the earphone type, each earphone is provided with a microphone, so the sound captured by each of the occupant's left and right ears can be collected.
  • The sound collecting device 1 may also be of a headphone type that is worn on the occupant's head.
  • When the sound collecting device 1 is of the dummy head type, it is provided at a place corresponding to the position of the occupant's head when seated.
  • An example of a place corresponding to the head of an occupant is the vicinity of a headrest.
  • a dummy head is a recorder in the shape of a human head.
  • a microphone is provided in the ear portion of the dummy head, and the sound can be picked up as if it was captured by the left and right ears of the occupant.
  • Because a sound whose sound image is localized lets a human judge the direction of the sound source and the distance to it, the perceived direction and distance change with the positional relationship between the listener and the sound source, even when the sound source outputs the same sound. Therefore, in the present embodiment, the vehicle is provided with as many sound collecting devices 1 as there are seats, and each sound collecting device 1 is provided at the position of a seat. As a result, sound data containing the direction of the sound source and the distance to it as perceived by each occupant can be acquired regardless of where the sound sources are and how many there are.
  • A sound collecting device 1 is provided in each seat.
  • For example, suppose that speakers are provided at the front and on the left and right sides of the vehicle interior and that music is played in the vehicle interior.
  • In this case, if the front speaker is closer to the occupant sitting in the driver's seat (the front right seat) than the left and right speakers are, that occupant perceives the source of the sound arriving from the front as closer than the sources of the sound arriving from the left and right.
  • The sound collecting device 1 provided in the driver's seat can collect the sound as it reaches the eardrums of the occupant sitting in the driver's seat.
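  • The dependence on the positional relationship can be illustrated with a short geometric sketch. The coordinates, the axis convention, and the function name `direction_and_distance` are hypothetical and chosen only for illustration; the document does not specify a coordinate system. The sketch shows that one speaker lies in a different direction and at a different distance for two different seats.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in metres; +y is the travel direction


def direction_and_distance(head: Point, source: Point) -> Tuple[float, float]:
    """Return (azimuth in degrees, distance in metres) of source seen from head.

    Azimuth is 0 straight ahead (+y) and positive toward the right (+x).
    """
    dx = source[0] - head[0]
    dy = source[1] - head[1]
    azimuth = math.degrees(math.atan2(dx, dy))
    return azimuth, math.hypot(dx, dy)


# Hypothetical layout: one front speaker and two occupants facing forward.
front_speaker = (0.0, 1.5)
driver_head = (0.4, 0.5)       # front right seat
rear_left_head = (-0.4, -0.5)  # rear left seat

driver_view = direction_and_distance(driver_head, front_speaker)
rear_view = direction_and_distance(rear_left_head, front_speaker)
```

  • With these placeholder values the speaker is to the front left of the driver and to the front right of the rear left occupant, and it is nearer to the driver, which is why one sound collecting device per seat is needed to capture what each occupant actually hears.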
  • the sound collecting device 1 converts the collected sound into a predetermined sound signal, and outputs the converted sound signal as sound data to the sound data processing device 5.
  • the sound data processing device 5 executes data processing of the picked-up sound.
  • the sound data output from the sound collecting device 1 to the sound data processing device 5 includes information that allows the occupant to determine the direction of the sound source and the distance to the sound source.
  • sound data is output from each sound collecting device 1 to the sound data processing device 5.
  • The sound data processing device 5 can determine which seat's sound collecting device 1 each piece of sound data came from.
  • the image pickup device 2 images the interior of the vehicle.
  • the captured image captured by the imaging device 2 is output to the sound data processing device 5.
  • Examples of the image pickup device 2 include a camera provided with a CCD element.
  • the type of image captured by the image pickup device 2 is not limited, and the image pickup device 2 may have a function of capturing at least one of a still image and a moving image.
  • the image pickup device 2 is provided at a position in the vehicle interior where the occupant can be imaged, and images the state of the occupant.
  • the location where the image pickup device 2 is provided and the number of image pickup devices 2 are not particularly limited.
  • the image pickup device 2 may be provided for each seat, or may be provided at a position overlooking the entire room.
  • Database 3 stores the position information of the sound source in the interior of the vehicle and the space information in the vehicle regarding the sound source in the interior of the vehicle.
  • the position information of the sound source in the vehicle interior is also simply referred to as the position information of the sound source.
  • vehicle interior space information regarding the sound source in the vehicle interior is also simply referred to as vehicle interior space information.
  • Sound sources in the vehicle interior include speakers and humans (occupants).
  • the position information of the sound source indicates the installation position of the speaker or the position of the occupant's head while seated in the seat. A specific example of the position information of the sound source will be described later.
  • the vehicle interior space information is information used to associate an object in the vehicle interior to which the occupant pays attention with a sound source in the vehicle interior. Specific examples of vehicle interior space information will be described later.
  • the database 3 outputs the position information of the sound source and the space information in the vehicle to the sound data processing device 5 in response to the access from the sound data processing device 5.
  • Sound data is input to the output device 4 from the sound data processing device 5.
  • the output device 4 generates a reproduced sound based on the sound data and outputs the reproduced sound as stereophonic sound.
  • For example, when the sound data output from the sound data processing device 5 to the output device 4 includes a stereo recording signal, the output device 4 outputs the reproduced sound using the stereo method.
  • In this case, examples of the output device 4 include a speaker.
  • The installation location and the number of output devices 4 are not particularly limited.
  • As many output devices 4 are provided in the vehicle interior as are needed to output the reproduced sound as stereophonic sound. Each output device 4 is provided at a predetermined position in the vehicle interior so that the reproduced sound can be output as stereophonic sound.
  • For example, an output device 4 is provided for each seat in order to give each occupant a different stereophonic sound. As a result, the sound can be reproduced as if it were captured by the left and right ears of each occupant.
  • The output device 4 may be a device other than a speaker.
  • For example, when the sound data output from the sound data processing device 5 to the output device 4 includes a binaural recording signal, the output device 4 outputs the reproduced sound using the binaural method.
  • In this case, examples of the output device 4 include earphones that can be attached to both ears and headphones that can be worn on the head.
  • For example, an output device 4 is attached to or worn by each occupant. As a result, the sound can be reproduced as if it were captured by the left and right ears of each occupant.
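  • The pairing of signal type and output device described above can be summarized in a small sketch. The enum `SignalType`, the function `select_output`, and the returned labels are hypothetical names introduced only for illustration.

```python
from enum import Enum


class SignalType(Enum):
    STEREO = "stereo"      # stereo recording signal
    BINAURAL = "binaural"  # binaural recording signal


def select_output(signal_type: SignalType) -> str:
    """Return the kind of output device suited to the recorded signal type."""
    if signal_type is SignalType.STEREO:
        return "speaker"
    if signal_type is SignalType.BINAURAL:
        return "earphones_or_headphones"
    raise ValueError(f"unsupported signal type: {signal_type}")
```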
  • The sound data processing device 5 is a computer equipped with hardware and software, and consists of a ROM (Read Only Memory) that stores a program, a CPU (Central Processing Unit) that executes the program stored in the ROM, and a RAM (Random Access Memory) that functions as an accessible storage device.
  • As the operating circuit, an MPU (Micro Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like can be used in place of or together with the CPU.
  • The control device 50 shown in FIG. 1 corresponds to the CPU, and the storage device 51 shown in FIG. 1 corresponds to the ROM and the RAM.
  • the sound data processing device 5 is provided in the vehicle as a module.
  • FIG. 2 is a block diagram showing each function included in the control device 50 shown in FIG. 1.
  • The functions included in the control device 50 will be described with reference to FIG. 2.
  • The control device 50 includes a sound data acquisition unit 150, an attention object identification unit 160, a sound data processing unit 170, and a sound data output unit 180. These blocks realize the functions described later by means of software stored in the ROM.
  • The sound data acquisition unit 150 acquires sound data from the sound collecting device 1. If sound data can be acquired from a system other than the sound output system 100, the sound data acquisition unit 150 acquires the sound data from that system. Examples of systems other than the sound output system 100 include a voice dialogue system, a notification system, a warning system, and a car audio system. In the following description, for convenience, the sound data acquired by the sound data acquisition unit 150 is also referred to as the first sound data. Further, the following description takes processing for one occupant as an example; when there are a plurality of occupants, that is, a plurality of pieces of first sound data, each piece of first sound data is processed as described below.
  • the sound data output from the sound collecting device 1 includes information that allows the occupant to determine the direction of the sound source and the distance to the sound source.
  • The sound data acquisition unit 150 uses the position information of the sound source stored in the database 3 to identify the position of the sound source for each of the one or more types of sound heard by the occupant. For example, when the first sound data includes the voice of another occupant, the sound data acquisition unit 150 determines that the sound source is an occupant and identifies the position of that occupant by referring to the position information of the sound source.
  • Further, for example, when the sound data acquisition unit 150 acquires sound data from the voice dialogue system or the like, it determines that the sound source is a speaker and identifies the position of the speaker by referring to the position information of the sound source. At this time, the sound data acquisition unit 150 analyzes the first sound data and identifies, as the sound source, the speaker that the occupant perceives as closest among all the speakers installed in the vehicle interior.
  • The attention object identification unit 160 identifies the attention object, which is the object to which the occupant of the vehicle pays attention. The attention object identification unit 160 also identifies the sound source corresponding to the attention object.
  • the object is a device or a human being in the interior of the vehicle.
  • The attention object identification unit 160 has a motion recognition unit 161, a line-of-sight recognition unit 162, and an utterance content recognition unit 163 as functional blocks for determining whether the occupant is paying attention to an object. These blocks recognize the behavior of the occupant. The attention object identification unit 160 also has a sound source identification unit 164 as a functional block for identifying the attention object and the sound source of the sound related to the attention object.
  • the motion recognition unit 161 recognizes the motion of the occupant based on the captured image captured by the image pickup device 2. For example, the motion recognition unit 161 recognizes the gesture of the occupant by executing image processing for analyzing the state of the occupant's hand on the captured image. Further, when the occupant's gesture is pointing, the motion recognition unit 161 recognizes the position indicated by the finger or the direction indicated by the finger. In the following description, for convenience, the position indicated by the finger is also referred to as the indicated position, and the direction indicated by the finger is also referred to as the indicated direction.
  • For example, the motion recognition unit 161 extracts feature points of the hand from the portion of the captured image in which the occupant's hand appears. The motion recognition unit 161 then determines whether the occupant's gesture is pointing by comparing the extracted feature points with feature points stored in a storage medium. For example, when the number of extracted feature points that match the stored feature points is equal to or greater than a predetermined number, the motion recognition unit 161 determines that the occupant's gesture is pointing.
  • The storage medium is, for example, a hard disk (HDD) or a ROM.
  • Conversely, when the number of matching feature points is less than the predetermined number, the motion recognition unit 161 determines that the occupant's gesture is something other than pointing.
  • The predetermined number is a threshold for determining whether the occupant's gesture is pointing, and is set in advance.
  • The above determination method is an example, and the motion recognition unit 161 can determine whether the occupant's gesture is pointing by using any technique known at the time of filing of the present application.
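  • The threshold comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: feature points are treated as normalized 2D coordinates, two points "match" when they lie within a tolerance `tol`, and the names `count_matches` and `is_pointing` are hypothetical. The document does not specify how feature points are represented or matched.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed: normalized (x, y) feature point


def count_matches(extracted: List[Point], template: List[Point], tol: float) -> int:
    """Count stored template points that have an extracted point within tol."""
    return sum(
        1
        for t in template
        if any(math.dist(t, e) <= tol for e in extracted)
    )


def is_pointing(
    extracted: List[Point],
    template: List[Point],
    min_matches: int,   # the predetermined number (threshold)
    tol: float = 0.05,  # hypothetical matching tolerance
) -> bool:
    """Pointing if the number of matching feature points reaches the threshold."""
    return count_matches(extracted, template, tol) >= min_matches


# Hypothetical stored template for a pointing hand and two observed hands.
pointing_template = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (0.3, 0.1)]
hand_a = [(0.01, 0.0), (0.11, 0.01), (0.19, 0.0), (0.31, 0.0), (0.6, 0.6)]
hand_b = [(0.5, 0.5), (0.0, 0.0)]
```

  • Here `hand_a` matches four of the five template points and `hand_b` matches only one, so with `min_matches=4` only `hand_a` is judged to be pointing.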
  • the line-of-sight recognition unit 162 recognizes the line of sight of the occupant based on the captured image captured by the image pickup device 2. For example, the line-of-sight recognition unit 162 recognizes the line-of-sight direction of the occupant by executing image processing for analyzing the state of the occupant's eyes on the captured image. Further, the line-of-sight recognition unit 162 recognizes the position where the occupant is gazing or the direction in which the occupant is gazing when the occupant is gazing.
  • The gaze position is a predetermined position in the interior of the vehicle, and the gaze direction is a predetermined direction in the interior of the vehicle. In the following description, for convenience, the position at which the occupant gazes is also referred to as the gaze position, and the direction in which the occupant gazes is also referred to as the gaze direction.
  • the line-of-sight recognition unit 162 continuously monitors the portion of the captured image captured by the image pickup device 2 in which the eyes of the occupant are reflected.
  • the line-of-sight recognition unit 162 determines that the occupant is gazing, for example, when the line-of-sight of the occupant does not move for a certain period of time or more and points in the same direction.
  • the line-of-sight recognition unit 162 determines that the occupant is not gazing when the line of sight of the occupant moves within a certain period of time.
  • The certain period of time is a threshold for determining whether the occupant is gazing, and is set in advance.
  • The above determination method is an example, and the line-of-sight recognition unit 162 can determine whether the occupant is gazing by using any technique known at the time of filing of the present application.
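  • The dwell-time check described above can be sketched as follows. The representation of a line-of-sight sample as (timestamp, yaw, pitch), the drift tolerance `max_drift`, and the function name `is_gazing` are assumptions made for illustration; the document only states that the line of sight must stay in the same direction for a certain period of time.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (timestamp in s, yaw in deg, pitch in deg)


def is_gazing(samples: List[Sample], min_duration: float, max_drift: float) -> bool:
    """True if the latest samples stay within max_drift degrees of the most
    recent direction for at least min_duration seconds."""
    if not samples:
        return False
    t_end, yaw_end, pitch_end = samples[-1]
    start = t_end
    # Walk backwards until the line of sight leaves the tolerance window.
    for t, yaw, pitch in reversed(samples):
        if abs(yaw - yaw_end) > max_drift or abs(pitch - pitch_end) > max_drift:
            break
        start = t
    return t_end - start >= min_duration


# Hypothetical observations: a steady line of sight and a moving one.
steady = [(0.0, 10.0, -5.0), (0.5, 10.5, -5.0), (1.0, 9.8, -4.9), (1.5, 10.1, -5.0)]
moving = [(0.0, -30.0, 0.0), (0.5, -10.0, 0.0), (1.0, 10.0, 0.0), (1.5, 10.0, 0.0)]
```

  • With `min_duration=1.0` and `max_drift=2.0`, the steady sequence counts as gazing because the direction holds for 1.5 s, while the moving sequence does not because the direction has held for only 0.5 s.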
  • the utterance content recognition unit 163 acquires the voice of the occupant from the device that collects the voice in the vehicle interior, and recognizes the utterance content based on the voice of the occupant.
  • the device that collects the voice of the occupant may be a sound collecting device 1 or a sound collecting device different from the sound collecting device 1.
  • the utterance content recognition unit 163 recognizes the utterance content of the occupant by executing a voice recognition process for recognizing the occupant's voice with respect to the sound data corresponding to the occupant's voice.
  • the utterance content recognition unit 163 can recognize the utterance content of the occupant by using the voice recognition technology known at the time of filing the application of the present application.
  • The attention object identification unit 160 determines whether the occupant is paying attention to an object by using at least one of the results obtained by the motion recognition unit 161, the line-of-sight recognition unit 162, and the utterance content recognition unit 163.
  • The attention object identification unit 160 may make this determination by applying a priority order, weighting, or the like to the results of the blocks.
  • For example, when the motion recognition unit 161 determines that the occupant's gesture is pointing, the attention object identification unit 160 determines that the occupant is paying attention to an object. Further, for example, when the line-of-sight recognition unit 162 determines that the occupant is gazing, the attention object identification unit 160 determines that the occupant is paying attention to an object. Further, for example, when the utterance content recognition unit 163 determines that the voice of the occupant contains a specific keyword or a specific key phrase, the attention object identification unit 160 determines that the occupant is paying attention to an object.
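  • One way to picture the weighting mentioned above is a weighted sum of the three recognition results compared against a threshold. The weights, the threshold values, and the names `WEIGHTS` and `is_paying_attention` are hypothetical; the document states only that a priority order or weighting may be used.

```python
from typing import Dict

# Hypothetical weights for the results of units 161, 162, and 163.
WEIGHTS: Dict[str, float] = {"gesture": 0.5, "gaze": 0.3, "utterance": 0.2}


def is_paying_attention(results: Dict[str, bool], threshold: float = 0.2) -> bool:
    """Combine per-recognizer boolean results into one decision by weighted sum.

    With the default threshold any single positive result is sufficient;
    a higher threshold requires agreement between several recognizers.
    """
    score = sum(WEIGHTS[name] for name, detected in results.items() if detected)
    return score >= threshold
```

  • For example, `is_paying_attention({"gesture": False, "gaze": False, "utterance": True})` is true with the default threshold, whereas a stricter `threshold=0.6` would require, say, both a pointing gesture and a gaze.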
  • the specific keyword or specific key phrase is a keyword or key phrase for determining whether or not the occupant is paying attention to the object, and is predetermined.
  • Specific keywords include, for example, words related to devices installed in the vehicle, such as "navigation voice". Specific key phrases include, for example, phrases expressing a desire, such as "let me hear X" or "want to see Y".
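  • A minimal sketch of keyword and key-phrase spotting on already recognized text is shown below. The keyword and phrases come from the examples above; the matching by substring and regular expression, the English-language patterns, and the names `KEYWORDS`, `KEY_PHRASES`, and `find_attention_cue` are assumptions made for illustration.

```python
import re
from typing import Optional

KEYWORDS = ("navigation voice",)
KEY_PHRASES = (
    re.compile(r"let me hear (?P<target>.+)", re.IGNORECASE),
    re.compile(r"want to see (?P<target>.+)", re.IGNORECASE),
)


def find_attention_cue(utterance: str) -> Optional[str]:
    """Return the matched keyword, or the X/Y part of a key phrase, or None."""
    text = utterance.strip().lower()
    for keyword in KEYWORDS:
        if keyword in text:
            return keyword
    for pattern in KEY_PHRASES:
        match = pattern.search(text)
        if match:
            # Drop trailing punctuation from the captured target.
            return match.group("target").rstrip(".!? ")
    return None
```

  • For instance, `find_attention_cue("Let me hear the music.")` yields `"the music"`, and an utterance without any cue yields `None`.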
  • The sound source identification unit 164 identifies the attention object and the sound source corresponding to the attention object based on at least one of the results obtained by the motion recognition unit 161, the line-of-sight recognition unit 162, and the utterance content recognition unit 163, together with the position information of the sound source or the vehicle interior space information stored in the database 3.
  • This identification may apply a priority order or weighting to the results of the blocks.
  • For example, the sound source identification unit 164 identifies the attention object and the sound source corresponding to the attention object based on the position or direction indicated by the occupant and the position information of the sound source or the vehicle interior space information. Further, the sound source identification unit 164 identifies the attention object and the sound source corresponding to the attention object based on the occupant's gaze position or gaze direction and the position information of the sound source or the vehicle interior space information. Further, the sound source identification unit 164 identifies the attention object and the sound source corresponding to the attention object based on the utterance content of the occupant and the position information of the sound source or the vehicle interior space information.
  • FIG. 3 is an example of the position information of the sound source in the vehicle interior stored in the database 3.
  • FIG. 3 shows a plan view showing the interior of the vehicle V.
  • Vehicle V has two seats in the front and two seats in the rear.
  • the traveling direction of the vehicle V is the upper side of the drawing.
  • P 11 to P 15 indicate the positions where the speakers are arranged.
  • P 22 to P 25 indicate the positions of the occupants' heads while they are seated in the seats.
  • P 22 to P 25 are superimposed on the seats.
  • D indicates a display embedded in the instrument panel.
  • the display D displays a menu screen by the navigation system, guidance information to the destination, and the like.
  • the navigation system is a system included in the voice dialogue system.
  • FIG. 4 is a diagram for explaining a method of identifying an attention object and a sound source corresponding to the attention object by using the position information of the sound source.
  • P 11 to P 15, P 22, P 23, and P 25 shown in FIG. 4 correspond to P 11 to P 15, P 22, P 23, and P 25 shown in FIG. 3.
  • U 1 indicates an occupant seated in the driver's seat. The occupant U 1 is facing to the left with respect to the traveling direction of the vehicle V.
  • the line of sight of the occupant U 1 is indicated by a dotted arrow.
  • the occupant U 1 points to the left side with respect to the traveling direction of the vehicle V.
  • FIG. 4 shows the pointing direction of the occupant U 1 by solid arrows.
  • In the scene of FIG. 4, it is assumed that the vehicle V is stopped or parked at a predetermined place, or is traveling automatically or autonomously by a so-called automatic cruise function, so that the driving of the vehicle V is not affected even if the occupant U 1 faces the left side with respect to the traveling direction.
  • In FIG. 4, the occupant U 2 is seated in the passenger seat of the vehicle V, and the occupant U 1 and the occupant U 2 are talking with each other.
  • The sound source identification unit 164 compares the indicated position of the occupant U 1 with the position of each sound source (positions P 11 to P 15, position P 22, and position P 25). When the sound source identification unit 164 determines that the indicated position of the occupant U 1 is near the position P 22, it identifies the occupant U 2 as the attention object. At this time, the sound source identification unit 164 identifies the sound that the occupant U 1 is trying to listen to as the voice of the occupant U 2. Further, since the sound that the occupant U 1 wants to listen to attentively is emitted by the occupant U 2, the sound source identification unit 164 identifies the occupant U 2 as the sound source corresponding to the attention object.
  • The sound source identification unit 164 can also identify the attention object and the sound source corresponding to the attention object by replacing the indicated position with the indicated direction, the gaze position, or the gaze direction and then applying the same method as the identification method using the indicated position.
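  • The comparison between the indicated position and the sound source positions can be sketched as a nearest-neighbor search. The coordinates, the distance limit `max_distance`, and the function name `identify_sound_source` are hypothetical; the source gives only the position labels and the notion of being "near" a position.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical plan-view coordinates in metres. The source defines only the
# labels (P 11 to P 15 for speakers, P 22 and P 25 for occupants' heads).
SOUND_SOURCES: Dict[str, Point] = {
    "P11": (-0.7, 1.2),
    "P12": (0.7, 1.2),
    "P13": (-0.7, -0.8),
    "P14": (0.7, -0.8),
    "P15": (0.0, 1.4),
    "P22": (-0.4, 0.5),
    "P25": (0.4, -0.5),
}


def identify_sound_source(
    indicated: Point,
    sources: Dict[str, Point] = SOUND_SOURCES,
    max_distance: float = 0.3,
) -> Optional[str]:
    """Return the label of the sound source nearest to the indicated position.

    None is returned when no source lies within max_distance, i.e. when the
    indicated position is not "near" any registered sound source.
    """
    label, position = min(
        sources.items(), key=lambda item: math.dist(indicated, item[1])
    )
    if math.dist(indicated, position) <= max_distance:
        return label
    return None
```

  • For example, `identify_sound_source((-0.45, 0.55))` selects the entry labeled `P22` under these assumed coordinates. A gaze position could be passed in the same way; handling a direction instead of a position would require a ray test, which this sketch omits.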
  • FIG. 5 is an example of vehicle interior space information stored in the database 3.
  • FIG. 5 shows a plan view showing the interior of the vehicle V as in FIGS. 3 and 4.
  • R 1 indicates a region related to the notification sound.
  • the area related to the notification sound includes, for example, a speedometer, a fuel gauge, a water temperature gauge, an odometer, etc. located in front of the driver's seat.
  • The area related to the notification sound may also include the center console between the driver's seat and the passenger seat, the shift lever, the operation unit of the air conditioner, and a storage space located in front of the passenger seat, a so-called dashboard.
  • Region R 1 is associated with the speakers located at the positions P 11 to P 15 in FIG. 3.
  • R 2 shows an area related to the voice dialogue between the navigation system and the occupant.
  • the area related to voice dialogue includes a display for displaying a menu screen or the like of a navigation system.
  • R 2 corresponds to the display D shown in FIG. 3. Region R 2 is associated with the speakers located at the positions P 11 to P 15 in FIG. 3.
  • R 3 indicates a region related to the utterances of the occupants. The area related to occupant utterances includes the seats in which the occupants sit. Region R 3 is associated with the occupants seated at P 22 to P 25 in FIG. 3.
  • FIG. 6 is a diagram for explaining a method of identifying an attention object and a sound source corresponding to the attention object by using the vehicle interior space information.
  • R 1 to R 3 shown in FIG. 6 correspond to R 1 to R 3 shown in FIG. 5.
  • U 1 indicates an occupant seated in the driver's seat.
  • The occupant U 1 is looking at the display D (see FIG. 3).
  • The line of sight of the occupant U 1 is indicated by a dotted arrow. As in the scene shown in the example of FIG. 4, it is assumed in the scene shown in the example of FIG. 6 that the driving of the vehicle V is not affected even if the occupant U 1 looks at the display D.
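  • The sound source identification unit 164 compares the gaze position of the occupant U 1 with each region (region R 1 to region R 3).
  • When the sound source identification unit 164 determines that the gaze position of the occupant U 1 is near the region R 2, it identifies the display D as the attention object.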
  • Based on the correspondence between the region R 2 and the speakers, the sound source identification unit 164 identifies the sound that the occupant U 1 wants to listen to attentively as the sound output from the speakers.
  • Further, since the sound that the occupant U 1 wants to listen to attentively is output from these speakers, the sound source identification unit 164 identifies these speakers as the sound source corresponding to the attention object.
  • The identified speakers are the plurality of speakers arranged at the positions P 11 to P 15 in FIG. 3.
  • The sound source identification unit 164 identifies, among the plurality of speakers, the speaker that the occupant perceives as closest as the sound source.
  • By analyzing the first sound data, the sound source identification unit 164 identifies, among the plurality of speakers, the speaker that the occupant U 1 perceives as closest and the position of that speaker.
  • For example, as a result of analyzing the first sound data, the sound source identification unit 164 identifies that the speaker the occupant U 1 perceives as closest is the speaker located at the position P 14 shown in FIG. 3.
  • the sound source that the occupant feels closest to is specified. Since the sound output from the sound source is emphasized by the sound data processing unit 170 described later, the emphasized sound can be effectively transmitted to the occupant.
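  • The identification using the vehicle interior space information can be sketched as a lookup from a gaze position to a region and from the region to its associated sound source. The rectangular region bounds, the text labels, and the names `Region` and `identify_by_gaze` are assumptions made for illustration; the source defines only the regions R 1 to R 3 and their associations.

```python
from typing import List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    name: str              # region label used in the description
    x_min: float           # hypothetical plan-view bounds in metres
    y_min: float
    x_max: float
    y_max: float
    attention_object: str  # what the occupant is taken to be attending to
    sound_source: str      # sound source associated with the region


# Hypothetical, non-overlapping rectangles standing in for FIG. 5.
REGIONS: List[Region] = [
    Region("R1", 0.3, 1.0, 0.9, 1.5, "meters", "speakers at P 11 to P 15"),
    Region("R2", -0.2, 1.0, 0.2, 1.5, "display D", "speakers at P 11 to P 15"),
    Region("R3", -0.9, -1.0, 0.9, 0.9, "seated occupant", "seated occupant"),
]


def identify_by_gaze(gaze: Tuple[float, float]) -> Optional[Region]:
    """Return the region containing the gaze position, or None if none does."""
    x, y = gaze
    for region in REGIONS:
        if region.x_min <= x <= region.x_max and region.y_min <= y <= region.y_max:
            return region
    return None
```

  • With these assumed bounds, a gaze position of `(0.0, 1.2)` falls in R 2, so the display D becomes the attention object and the speakers become the corresponding sound source. Selecting the single speaker that the occupant perceives as closest would additionally require analysis of the first sound data, which is not modeled here.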
  • The sound data processing unit 170 executes, on the sound data collected by the sound collecting device 1, a process of emphasizing a specific sound over other sounds, and generates sound data whose sound image is localized in the vehicle interior.
  • The sound data generated by the sound data processing unit 170 is also referred to as second sound data.
  • The sound data processing unit 170 generates the second sound data in which the sound related to the attention object is emphasized as compared with the first sound data acquired by the sound data acquisition unit 150.
  • The attention object is an object to which the occupant pays attention and which is identified by the attention object identification unit 160.
  • In the second sound data, compared with the first sound data, the volume or intensity of the sound related to the attention object is larger relative to the volume or intensity of other sounds.
  • The sound related to the attention object is a sound output from the attention object, a sound that is associated with the attention object and output from a related object different from the attention object, or both of these sounds.
  • In other words, the sound related to the attention object includes at least one of the sound output from the attention object and the sound output from the related object.
  • the sound data processing unit 170 targets the voice emitted by another occupant for the emphasis processing.
  • the object of attention is the display, and the sound source corresponding to the object of attention is the speaker.
  • the object associated with the attention object and outputting the sound is recognized as the related object.
  • the sound data processing unit 170 recognizes the speaker as a related object, and the sound data processing unit 170 targets the output sound from the speaker for emphasis processing.
  • the sound source corresponding to the object of attention is the specific occupant.
  • the specified object is recognized as a related object.
  • the sound data processing unit 170 recognizes a plurality of occupants other than a specific occupant, that is, other occupants as related objects. Then, the sound data processing unit 170 targets not only the voice of a specific occupant but also the voice of another occupant for the emphasis processing.
  • the sound data processing unit 170 has a type determination unit 171 and a sound signal processing unit 172.
  • the type determination unit 171 determines whether or not the type of the sound related to the attention object to be emphasized is a type that can be controlled via a system different from the sound output system 100.
  • Examples of the system different from the sound output system 100 include a voice dialogue system, a notification system, a warning system, a car audio system, and the like. Targets controlled by these systems include, for example, volume and sound intensity.
  • the type determination unit 171 determines the type of the sound related to the attention object as a type that can be controlled via the system. In other words, the type determination unit 171 determines that sound data related to the attention object can be acquired from a system different from the sound output system 100.
  • The sound signal processing unit 172 acquires the sound data related to the attention object from the corresponding system, and generates the second sound data by executing a process of superimposing the acquired data on the first sound data.
  • The sound data related to the attention object is also referred to as third sound data.
  • When generating the second sound data, the sound signal processing unit 172 may apply a process of increasing the volume or a process of increasing the sound intensity to the acquired third sound data and then superimpose it on the first sound data.
  • The above method of emphasizing the sound related to the attention object is an example, and the sound signal processing unit 172 can emphasize the sound related to the attention object over other sounds by using a sound emphasis process known at the time of filing of the present application.
  • For example, the sound signal processing unit 172 may execute a process of increasing the volume of the sound related to the attention object relative to other sounds. In this case, the sound signal processing unit 172 uses the volume-adjusted sound data as the second sound data.
  • the type determination unit 171 determines that the type of the sound related to the attention object cannot be controlled via the system. In other words, the type determination unit 171 determines that sound data relating to the object of interest cannot be acquired from the predetermined system.
  • In this case, the sound signal processing unit 172 extracts the sound data related to the attention object from the first sound data, and generates the second sound data by executing the emphasis process on the extracted sound data.
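  • The two ways of generating the second sound data can be sketched on plain sample lists. This is a simplified, single-channel illustration: the gain value, the function names, and the assumption that the extracted component is already available are not from the source, and the extraction itself (separating the attended sound from the first sound data) is outside the scope of the sketch.

```python
from typing import List, Optional, Sequence


def superimpose(first: Sequence[float], third: Sequence[float],
                gain: float = 2.0) -> List[float]:
    """Controllable type: amplify the third sound data obtained from another
    system and add it, sample by sample, to the first sound data."""
    if len(first) != len(third):
        raise ValueError("sound data must have the same number of samples")
    return [f + gain * t for f, t in zip(first, third)]


def emphasize_extracted(first: Sequence[float], extracted: Sequence[float],
                        gain: float = 2.0) -> List[float]:
    """Non-controllable type: boost a component that was extracted from the
    first sound data, leaving the remaining sounds unchanged."""
    if len(first) != len(extracted):
        raise ValueError("sound data must have the same number of samples")
    # (first - extracted) + gain * extracted
    return [f + (gain - 1.0) * e for f, e in zip(first, extracted)]


def generate_second_sound_data(first: Sequence[float],
                               third_from_system: Optional[Sequence[float]],
                               extracted: Sequence[float],
                               gain: float = 2.0) -> List[float]:
    """Choose the emphasis path according to the type determination."""
    if third_from_system is not None:
        return superimpose(first, third_from_system, gain)
    return emphasize_extracted(first, extracted, gain)
```

  • Binaurally recorded data would carry a left and a right channel; applying the same gain to both channels is one way to keep the interaural cues intact, although the source does not prescribe this.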
  • the sound data output unit 180 outputs the second sound data generated by the sound data processing unit 170 to the output device 4.
  • FIG. 7 is a flowchart showing a process executed by the sound data processing device 5 according to the present embodiment.
  • In step S1, the sound data processing device 5 acquires the first sound data from the sound collecting device 1.
  • The first sound data includes information that allows the occupant to determine the direction of the sound source and the distance to the sound source.
  • In step S2, the sound data processing device 5 acquires a captured image of the vehicle interior from the imaging device 2.
  • In step S3, the sound data processing device 5 recognizes the behavior of the occupant based on the first sound data acquired in step S1 or the captured image acquired in step S2. For example, the sound data processing device 5 determines whether or not the occupant is pointing based on the captured image. When the sound data processing device 5 determines that the occupant is pointing, it identifies the indicated position or indicated direction based on the captured image. The sound data processing device 5 may also determine whether or not the occupant is gazing based on the captured image and, when it determines that the occupant is gazing, identify the gaze position or gaze direction of the occupant.
  • The sound data processing device 5 may also recognize the utterance content of the occupant based on the first sound data. By performing one or more of these processes, the sound data processing device 5 recognizes the behavior of the occupant.
  • The processing in steps S1 to S3 described above continues to be performed during step S5 and the subsequent steps described later.
  • In step S4, the sound data processing device 5 determines whether or not the occupant is paying attention to an object based on the occupant's behavior recognized in step S3. Taking pointing as an example, when it is determined in step S3 that the occupant is pointing, the sound data processing device 5 determines that the occupant is paying attention to an object. In this case, the process proceeds to step S5.
  • On the other hand, when the sound data processing device 5 determines in step S3 that the occupant is not pointing, it determines that the occupant is not paying attention to an object. In this case, the process returns to step S1.
  • The above determination method is an example, and the sound data processing device 5 can determine whether or not the occupant is paying attention to an object based on a combination of the other determination results obtained in step S3.
  • If it is determined in step S4 that the occupant is paying attention to an object, the process proceeds to step S5.
  • In step S5, the process proceeds to the subroutine shown in FIG. 8, and the sound data processing device 5 performs processing such as identifying the attention object.
  • FIG. 8 is a subroutine of step S5 shown in FIG. 7.
  • In step S51, the sound data processing device 5 acquires the position information of the sound source in the vehicle interior from the database 3.
  • Examples of the position information of the sound source include a plan view showing the interior of the vehicle as shown in FIG. 3.
  • In step S52, the sound data processing device 5 acquires the vehicle interior space information from the database 3.
  • Examples of the vehicle interior space information include a plan view showing the interior of the vehicle as shown in FIG. 5.
  • The position information of the sound source and the vehicle interior space information may be any information that represents the interior of the vehicle, and their form is not limited to a plan view.
  • In step S53, the sound data processing device 5 identifies the attention object, which is an object to which the occupant pays attention, based on the position information of the sound source acquired in step S51 or the vehicle interior space information acquired in step S52.
  • For example, when the occupant seated in the driver's seat points toward the passenger seat, the sound data processing device 5 identifies the occupant seated in the passenger seat as the attention object based on the indicated position or indicated direction and the position information of the sound source. Further, the sound data processing device 5 identifies this occupant as the sound source corresponding to the attention object.
  • Further, for example, the sound data processing device 5 identifies the display as the attention object based on the occupant's gaze position or gaze direction and the vehicle interior space information. When the sound source associated with the region R 2 is a speaker as shown in FIG. 6, the sound data processing device 5 identifies the speaker associated with the attention object as the corresponding sound source.
  • When the process in step S53 is completed, the process proceeds to step S6 shown in FIG. 7. In step S6, the second sound data generation process and the like are performed.
  • FIG. 9 is a subroutine of step S6 shown in FIG. 7.
  • In step S61, the sound data processing device 5 determines whether or not the sound source corresponding to the attention object identified in step S53 shown in FIG. 8 is of a type that can be controlled via a system different from the sound output system 100. For example, when the sound data processing device 5 can acquire the third sound data, which is the sound data related to the attention object, from a system different from the sound output system 100, it determines that the sound source corresponding to the attention object is of a type that can be controlled via another system. Sounds corresponding to this type include, for example, voices programmed by a voice dialogue system, notification sounds set by a notification system, warning sounds set by a warning system, and audio sounds set by a car audio system.
  • On the other hand, when the sound data processing device 5 cannot acquire the third sound data from a system different from the sound output system 100, it determines that the sound source corresponding to the attention object is of a type that cannot be controlled via another system. Sounds corresponding to this type include, for example, the voice of an occupant.
  • In step S62, the sound data processing device 5 executes an emphasis process that emphasizes the sound related to the attention object over other sounds, according to the determination result in step S61.
  • For example, when the sound source is of a type that can be controlled via another system, the sound data processing device 5 acquires the third sound data from the voice dialogue system and superimposes the third sound data on the first sound data acquired in step S1.
  • When the sound source is of a type that cannot be controlled via another system, the sound data processing device 5 extracts the third sound data from the first sound data acquired in step S1 and executes the emphasis process on the extracted third sound data.
  • In step S63, the sound data processing device 5 generates the second sound data in which the sound related to the attention object is emphasized, based on the execution result in step S62.
  • When the process in step S63 is completed, the process proceeds to step S7 shown in FIG. 7.
  • In step S7, the sound data processing device 5 outputs the second sound data generated in step S6 to the output device 4.
  • This step indicates that the output of the second sound data from the sound data processing device 5 to the output device 4 has started.
  • In step S8, the sound data processing device 5 determines whether or not the occupant's attention has moved away from the attention object.
  • When the sound data processing device 5 determines, from the behavior recognition result obtained in step S3, that the occupant's attention is not directed to the attention object identified in step S5, it determines that the occupant's attention has moved away from the attention object. In this case, the process proceeds to step S9.
  • In step S9, the sound data processing device 5 stops the process of generating the second sound data and ends the process shown in the flowchart of FIG. 7.
  • Based on the occupant's indicated position and the position information of the sound source, the sound data processing device 5 determines that the occupant's attention has moved away from the attention object when there is no attention object at or near the indicated position.
  • The above determination method is an example.
  • For example, a gesture for determining that attention has moved away from the attention object may be set in advance, and when the sound data processing device 5 recognizes that the occupant has performed the gesture, it may determine that the occupant's attention has moved away from the attention object.
  • When the sound data processing device 5 determines that the occupant's attention is directed to the attention object identified in step S5, it determines that the occupant's attention has not moved away from the attention object. In this case, the process proceeds to step S10.
  • Based on the occupant's indicated position and the position information of the sound source, the sound data processing device 5 determines that the occupant's attention has not moved away from the attention object when there is an attention object at or near the indicated position.
  • The above determination method is an example. For example, when a gesture for determining that attention has moved away from the attention object has been set in advance and the sound data processing device 5 recognizes that the occupant has not performed the gesture, it may determine that the occupant's attention has not moved away from the attention object.
  • In step S10, the sound data processing device 5 determines whether or not the sound related to the attention object is being output. For example, if the sound data processing device 5 cannot confirm any output from the sound source corresponding to the attention object for a predetermined time, it determines that the sound related to the attention object is not being output. In this case, the process proceeds to step S9. In step S9, the sound data processing device 5 stops the process of generating the second sound data and ends the process shown in the flowchart of FIG. 7.
  • The predetermined time is a preset time used to determine whether or not the sound related to the attention object is being output.
  • When the sound data processing device 5 can confirm output from the sound source corresponding to the attention object within the predetermined time, it determines that the sound related to the attention object is being output. In this case, the process returns to step S8.
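  • The loop formed by steps S8 to S10 can be sketched as follows. The two callbacks, the iteration limit, and the returned strings are hypothetical stand-ins for the behavior recognition result and the timeout check; the source does not specify how these checks are implemented.

```python
from typing import Callable


def run_emphasis_loop(
    attention_on_object: Callable[[], bool],
    sound_output_within_timeout: Callable[[], bool],
    max_iterations: int = 1000,
) -> str:
    """Keep the emphasis active until one of the stop conditions holds.

    attention_on_object models the check in step S8, and
    sound_output_within_timeout models the check in step S10.
    """
    for _ in range(max_iterations):
        # Step S8: has the occupant's attention moved away?
        if not attention_on_object():
            # Step S9: stop generating the second sound data.
            return "stopped: attention moved away from the attention object"
        # Step S10: is the sound related to the attention object still output?
        if not sound_output_within_timeout():
            # Step S9: stop generating the second sound data.
            return "stopped: no sound from the attention object"
        # Otherwise the process returns to step S8.
    return "still running"
```

  • The iteration limit exists only so that the sketch terminates; an actual device would presumably run the loop for as long as the second sound data is being output.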
  • The sound data processing device 5 includes a sound data acquisition unit 150 that acquires first sound data, which is sound data whose sound image is localized in the vehicle interior; an attention object identification unit 160 that identifies an attention object, which is an object to which the occupant pays attention; a sound data processing unit 170 that generates second sound data, which is sound data in which the sound related to the attention object is emphasized as compared with the first sound data; and a sound data output unit 180 that outputs the second sound data to the output device 4.
  • The attention object identification unit 160 acquires a captured image of the occupant from the imaging device 2, recognizes the indicated position or indicated direction of the occupant based on the acquired image, acquires the position information of the sound source or the vehicle interior space information from the database 3, and identifies the attention object based on the recognized indicated position or indicated direction and the position information of the sound source or the vehicle interior space information.
  • The occupant can convey the attention object to the sound data processing device 5 by gesture, which is an intuitive and efficient method.
  • The sound data processing device 5 can accurately identify the attention object.
  • The attention object identification unit 160 recognizes the gaze position or gaze direction of the occupant based on the captured image acquired from the imaging device 2, acquires the position information of the sound source or the vehicle interior space information from the database 3, and identifies the attention object based on the recognized gaze position or gaze direction and the position information of the sound source or the vehicle interior space information.
  • The occupant can convey the attention object to the sound data processing device 5 by line of sight, which is an intuitive and efficient method.
  • The sound data processing device 5 can accurately identify the attention object.
  • The attention object identification unit 160 acquires the voice of the occupant from the sound collecting device 1 or another sound collecting device, recognizes the utterance content of the occupant based on the acquired voice, and identifies the attention object based on the recognized utterance content.
  • The occupant can convey the attention object to the sound data processing device 5 by utterance, which is an intuitive and efficient method.
  • The sound data processing device 5 can accurately identify the attention object.
  • the sound related to the object to be watched is the sound output from the object to be watched.
  • the emphasized sound arrives from the direction in which the occupant pays attention, so that the occupant can easily hear the sound he / she wants to pay attention to.
  • the sound related to the attention object is a sound associated with the attention object and output from a related object different from the attention object.
  • the object to be watched is the voice guidance of the navigation system
  • The voice guidance corresponding to the information displayed on the display is emphasized, so that even when the occupant directs attention to an object that does not itself output sound, the sound related to that object becomes easier to hear.
  • the sounds related to the attention object are the sounds output from the attention object and the sounds output from the related objects.
  • the sound associated with the subject is emphasized. Even if the occupant does not pay attention, the emphasized sound arrives from the object related to the object to which the occupant pays attention. It is possible to provide a sound output system 100 having excellent convenience for the occupant.
  • When the sound data processing unit 170 can acquire the third sound data, which is the sound data related to the attention object, from a system different from the sound output system 100, it generates the second sound data by executing a process of superimposing the third sound data on the first sound data.
  • For example, when an object whose volume and sound intensity can be directly controlled, such as voice guidance by the navigation system, is the target of the emphasis processing, the sound that the occupant wants to hear attentively can be emphasized by a simple method.
  • When the sound data processing unit 170 cannot acquire the third sound data from a system different from the sound output system 100, it generates the second sound data by executing the emphasis process on the third sound data included in the first sound data. For example, even for an object whose volume and sound intensity cannot be directly controlled, such as the voice of an occupant, only the sound of that object can be emphasized. Regardless of the target of the emphasis processing, the sound that the occupant wants to hear attentively can be emphasized.
  • the sound data acquisition unit 150 acquires the first sound data from the sound collecting device 1 that binaurally records the sound generated in the interior of the vehicle.
  • The first sound data includes information that allows the occupant to determine the direction of the sound source and the distance to the sound source. After the process of emphasizing the sound related to the attention object is performed, sound whose sound image is localized can be conveyed to the occupant without performing sound image localization processing on that sound. The complicated sound image localization processing can be omitted, and the calculation load of the sound data acquisition unit 150 can be reduced.
  • the sound source in the vehicle interior and its position can be easily specified from the information that allows the occupant to determine the position of the sound source and the distance to the sound source. Furthermore, it is possible to reproduce the sound as if it were captured by the left and right ears of the occupant.
  • the attention object identification unit 160 determines whether or not the occupant is paying attention to the attention object after identifying the attention object.
  • When it is determined that the attention of the occupant is not directed to the attention object, the sound data processing unit 170 stops the generation of the second sound data.
  • the sound data processing device 5 according to the second embodiment will be described.
  • the sound collecting device 1 and the output device 4 shown in FIG. 1 are provided in the head-mounted display type device, and the attention object and the sound source corresponding to the attention object are icons called avatars.
  • the sound data processing device 5 has the same configuration as that of the first embodiment described above, except that a part of the functions provided in the sound data processing device 5 is different from the above-described first embodiment. Therefore, for the same configuration as that of the first embodiment, the description in the first embodiment is incorporated.
  • a head-mounted display type device is used.
  • the head-mounted display type device is equipped with AR (Augmented Reality) technology.
  • An icon also called an avatar
  • the occupant wearing this device can see the icon (also called an avatar) through the display and can interact with the icon.
  • a device is also simply referred to as a head-mounted display.
  • the object includes an icon presented to the occupant through a head-mounted display, in addition to a device or a human being in the vehicle interior.
  • the sound collecting device 1 and the output device 4 are integrally provided as a head-mounted display as in the present embodiment, for example, the sound collecting device 1 and the output device 4 include headphones capable of binaural recording.
  • a dialogue with a human being outside the vehicle can be mentioned.
  • the remote location may be outside the vehicle and is not particularly limited.
  • an icon corresponding to a person in a remote place is displayed at a position corresponding to the passenger seat.
  • the sound of a human being at a remote location is output from the headphones.
  • FIG. 10 is an example of a scene in which an occupant wearing a head-mounted display interacts with an icon.
  • FIG. 10 corresponds to the scene of FIG. 4 used in the description of the first embodiment.
  • the occupant U 1 is wearing the head-mounted display (HD).
  • FIG. 10 shows the gaze direction of the occupant U 1 with a dotted arrow.
  • the sound data processing device 5 has a function of presenting candidates for the attention object to the occupant via the head-mounted display and allowing the occupant to select one of the candidates in the process of identifying the attention object.
  • FIG. 11 is an example of candidates for the attention object presented to the occupant in the scene shown in FIG. In FIG. 11, I indicates an icon corresponding to a human being in a remote place.
  • the sound data processing device 5 acquires a captured image corresponding to the field of view of the occupant from an imaging device mounted on the head-mounted display. Then, as shown in the example of FIG. 11, the sound data processing device 5 superimposes P 12 and P 22, which indicate the positions of the sound sources, on the captured image in which the passenger seat is captured. In this way, the sound data processing device 5 presents the position of each sound source to the occupant in a manner in which the occupant can identify the sound source.
  • the sound data processing device 5 determines whether or not there are a plurality of candidates for the attention object on the screen visually recognized by the occupant. Further, when the sound data processing device 5 determines that there are a plurality of candidates for the attention object, it determines whether or not the candidates fall into a plurality of categories.
  • the categories are classified into, for example, occupants or icons, speakers, and the like. Further, the categories may be classified according to whether or not they can be controlled via a system other than the sound output system 100.
  • When the sound data processing device 5 determines that the candidates for the attention object fall into a plurality of categories, the sound data processing device 5 requests the occupant to select the attention object. The sound data processing device 5 identifies the one candidate selected by the occupant as the attention object. Note that P 12 and P 22 shown in FIG. 11 correspond to P 12 and P 22 shown in FIG.
  • FIG. 12 shows the subroutine of step S5 shown in FIG. 7 according to the present embodiment. FIG. 12 is also a diagram for explaining the method of identifying the attention object executed by the sound data processing device 5 according to the present embodiment.
  • the same processing as that of the subroutine of step S5 shown in FIG. 7 according to the first embodiment is designated by the same reference numerals as those in FIG. 7, and the description thereof is incorporated.
  • In step S151, the sound data processing device 5 presents candidates for the attention object.
  • the sound data processing device 5 presents a plurality of candidates for the attention object in the manner shown in the example of FIG.
  • In step S152, the sound data processing device 5 determines whether or not there are a plurality of candidates for the attention object presented in step S151. If it is determined that there are a plurality of candidates, the process proceeds to step S153; otherwise, the process proceeds to step S54.
  • If it is determined in step S152 that there are a plurality of candidates for the attention object, the process proceeds to step S153.
  • In step S153, the sound data processing device 5 determines whether or not the candidates for the attention object fall into a plurality of categories. If it is determined that there are a plurality of categories, the process proceeds to step S154; otherwise, the process proceeds to step S54.
  • If it is determined in step S153 that there are a plurality of categories of candidates for the attention object, the process proceeds to step S154.
  • In step S154, the sound data processing device 5 receives the selection signal from the occupant. For example, the occupant selects one candidate from the plurality of candidates for the attention object by making a gesture such as pointing.
  • When the process in step S154 is completed, the process proceeds to step S54, and the attention object is identified.
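The branching in steps S151 to S154 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation described in the application: the `Candidate` type, the category strings, the function name `select_attention_candidate`, and the `request_selection` callback are all hypothetical. The function returns the occupant's choice when a choice is requested in step S154, and `None` when the flow goes straight to step S54 without asking.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical attention-object candidate shown on the display."""
    label: str      # e.g. "P12" or "P22" (labels borrowed from FIG. 11)
    category: str   # assumed category name, e.g. "occupant_or_icon", "speaker"


def select_attention_candidate(
    candidates: Sequence[Candidate],
    request_selection: Callable[[Sequence[Candidate]], Candidate],
) -> Optional[Candidate]:
    """Return the occupant's choice, or None if no choice was requested."""
    # S151: the candidates are assumed to have been presented already.

    # S152: is there more than one candidate?
    if len(candidates) <= 1:
        return None  # go straight to S54

    # S153: do the candidates span more than one category?
    categories = {c.category for c in candidates}
    if len(categories) <= 1:
        return None  # go straight to S54

    # S154: ask the occupant (for example via a pointing gesture).
    chosen = request_selection(candidates)
    if chosen not in candidates:
        raise ValueError("selection must be one of the presented candidates")
    return chosen


if __name__ == "__main__":
    p12 = Candidate("P12", "occupant_or_icon")  # category is an assumption
    p22 = Candidate("P22", "speaker")           # category is an assumption
    # The lambda stands in for the occupant pointing at the second candidate.
    print(select_attention_candidate([p12, p22], lambda cs: cs[1]))
```

Returning `None` rather than a default candidate keeps the sketch neutral about how step S54 identifies the attention object when the occupant is not asked, since that logic belongs to the first embodiment.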
  • the sound data processing device 5 is applied to a head-mounted display type device equipped with AR technology.
  • the occupant can be asked to select the attention object, and the sound that the occupant wants to listen to attentively can be accurately emphasized and output.
  • an object that outputs sound but does not physically exist, such as an icon, can be included in the attention objects.
  • the target of the sound that the occupant wants to hear carefully can be expanded.
  • a method using the position information of the sound source or the vehicle interior space information has been described as an example, but at least one of the position information of the sound source and the vehicle interior space information may be used to identify the attention object.
  • the attention object may be identified using only the position information of the sound source, or using only the vehicle interior space information.
  • the method of specifying the attention object by using the vehicle interior space information may be used.
  • some of the functions of the sound data processing device 5 may use the functions of the head-mounted display type device.
  • the sound data processing device 5 may use these devices to acquire information about the movement, line of sight, and voice of the occupant. Then, the sound data processing device 5 may perform occupant motion recognition, occupant line-of-sight recognition, or occupant utterance recognition using the acquired information.
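The statement above that at least one of the sound source position information and the vehicle interior space information may be used to identify the attention object can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the text does not specify a matching rule, so the sketch swaps in a simple one (pick the known position closest in angle to the occupant's gaze, within a tolerance), and the names `identify_attention_object`, `angular_offset_deg`, `tolerance_deg`, and the 2D coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # assumed 2D cabin coordinates


def angular_offset_deg(origin: Point, gaze_deg: float, target: Point) -> float:
    """Absolute angle between the gaze direction and the direction to target."""
    bearing = math.degrees(
        math.atan2(target[1] - origin[1], target[0] - origin[0])
    )
    # Wrap the difference into [-180, 180) before taking the magnitude.
    return abs((bearing - gaze_deg + 180.0) % 360.0 - 180.0)


def identify_attention_object(
    origin: Point,
    gaze_deg: float,
    sound_sources: Optional[Dict[str, Point]] = None,
    cabin_objects: Optional[Dict[str, Point]] = None,
    tolerance_deg: float = 15.0,
) -> Optional[str]:
    """Pick the named position closest to the gaze direction, if any."""
    if sound_sources is None and cabin_objects is None:
        raise ValueError("at least one information source is required")

    # Use whichever information is available; either one alone is enough.
    known: Dict[str, Point] = {}
    known.update(cabin_objects or {})
    known.update(sound_sources or {})

    best_name: Optional[str] = None
    best_offset = tolerance_deg
    for name, position in known.items():
        offset = angular_offset_deg(origin, gaze_deg, position)
        if offset <= best_offset:
            best_name, best_offset = name, offset
    return best_name
```

Called with only `sound_sources`, only `cabin_objects`, or both, the function behaves the same way, which mirrors the point that either kind of information alone may suffice.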

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Otolaryngology (AREA)
  • Signal Processing (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
  • Circuit For Audible Band Transducer (AREA)
PCT/IB2020/000323 2020-03-25 2020-03-25 音データ処理装置および音データ処理方法 Ceased WO2021191651A1 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US17/907,037 US12444424B2 (en) 2020-03-25 2020-03-25 Sound data processing device and sound data processing method
JP2022509740A JP7456490B2 (ja) 2020-03-25 2020-03-25 音データ処理装置および音データ処理方法
PCT/IB2020/000323 WO2021191651A1 (ja) 2020-03-25 2020-03-25 音データ処理装置および音データ処理方法
CN202080098932.8A CN115315374B (zh) 2020-03-25 2020-03-25 声音数据处理装置和声音数据处理方法
EP20927218.6A EP4129766A4 (en) 2020-03-25 2020-03-25 SOUND DATA PROCESSING DEVICE AND SOUND DATA PROCESSING METHOD

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2020/000323 WO2021191651A1 (ja) 2020-03-25 2020-03-25 音データ処理装置および音データ処理方法

Publications (2)

Publication Number Publication Date
WO2021191651A1 true WO2021191651A1 (ja) 2021-09-30
WO2021191651A8 WO2021191651A8 (ja) 2022-06-09

Family

ID=77891584

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2020/000323 Ceased WO2021191651A1 (ja) 2020-03-25 2020-03-25 音データ処理装置および音データ処理方法

Country Status (5)

Country Link
US (1) US12444424B2
EP (1) EP4129766A4
JP (1) JP7456490B2
CN (1) CN115315374B
WO (1) WO2021191651A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114194128A (zh) * 2021-12-02 2022-03-18 广州小鹏汽车科技有限公司 车辆的音量控制方法、车辆和存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112655000B (zh) * 2020-04-30 2022-10-25 华为技术有限公司 车内用户定位方法、车载交互方法、车载装置及车辆

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005316704A (ja) 2004-04-28 2005-11-10 Sony Corp 周囲状況通知装置、周囲状況通知方法
JP2013034122A (ja) * 2011-08-02 2013-02-14 Denso Corp 車両用立体音響装置
JP2015071320A (ja) * 2013-10-01 2015-04-16 アルパイン株式会社 会話支援装置、会話支援方法及び会話支援プログラム
JP2019068237A (ja) * 2017-09-29 2019-04-25 株式会社デンソーテン 会話支援装置、会話支援システムおよび会話支援方法

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008042390A (ja) * 2006-08-03 2008-02-21 National Univ Corp Shizuoka Univ 車内会話支援システム
US8818800B2 (en) * 2011-07-29 2014-08-26 2236008 Ontario Inc. Off-axis audio suppressions in an automobile cabin
JP2014181015A (ja) * 2013-03-21 2014-09-29 Toyota Motor Corp 車室内会話支援装置
WO2016118656A1 (en) * 2015-01-21 2016-07-28 Harman International Industries, Incorporated Techniques for amplifying sound based on directions of interest
KR20180102871A (ko) * 2017-03-08 2018-09-18 엘지전자 주식회사 이동단말기 및 이동단말기의 차량 제어 방법
WO2020027061A1 (ja) * 2018-08-02 2020-02-06 日本電信電話株式会社 会話サポートシステム、その方法、およびプログラム
US11638130B2 (en) * 2018-12-21 2023-04-25 Qualcomm Incorporated Rendering of sounds associated with selected target objects external to a device
US10755691B1 (en) * 2019-05-21 2020-08-25 Ford Global Technologies, Llc Systems and methods for acoustic control of a vehicle's interior
US11109152B2 (en) * 2019-10-28 2021-08-31 Ambarella International Lp Optimize the audio capture during conference call in cars
US11127265B1 (en) * 2019-10-28 2021-09-21 Amazon Technologies, Inc. Directed audio emission systems and methods for electric vehicles
US11089428B2 (en) * 2019-12-13 2021-08-10 Qualcomm Incorporated Selecting audio streams based on motion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4129766A4

Also Published As

Publication number Publication date
US12444424B2 (en) 2025-10-14
CN115315374A (zh) 2022-11-08
US20230121586A1 (en) 2023-04-20
CN115315374B (zh) 2025-08-26
EP4129766A4 (en) 2023-04-19
JP7456490B2 (ja) 2024-03-27
WO2021191651A8 (ja) 2022-06-09
JPWO2021191651A1 2021-09-30
EP4129766A1 (en) 2023-02-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20927218; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022509740; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020927218; Country of ref document: EP; Effective date: 20221025)
WWW Wipo information: withdrawn in national office (Ref document number: 2020927218; Country of ref document: EP)
WWG Wipo information: grant in national office (Ref document number: 202080098932.8; Country of ref document: CN)
WWG Wipo information: grant in national office (Ref document number: 17907037; Country of ref document: US)