WO2021028758A1 - Acoustic device and method of operating the same - Google Patents
Acoustic device and method of operating the same
- Publication number
- WO2021028758A1 (PCT application PCT/IB2020/057125)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- voice
- unit
- function
- feature amount
- Prior art date
Classifications
- G06N20/00—Machine learning
- G06N3/09—Supervised learning (under G06N3/02—Neural networks; G06N3/08—Learning methods)
- G10K11/1781—Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions
- G10K11/17821—characterised by the analysis of the input signals only
- G10K11/17827—Desired external signals, e.g. pass-through audio such as music or speech
- G10K11/1785—Methods, e.g. algorithms; Devices
- G10K11/17855—Methods, e.g. algorithms; Devices for improving speed or power requirements
- G10K11/1787—General system configurations
- G10K11/17873—General system configurations using a reference signal without an error signal, e.g. pure feedforward
- G10L15/063—Training of speech recognition systems
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/18—Artificial neural networks; Connectionist approaches
- G10L21/0232—Noise filtering; Processing in the frequency domain
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
- G10L25/24—Speech or voice analysis, the extracted parameters being the cepstrum
- G10L25/30—Speech or voice analysis using neural networks
- G10K2210/1081—Earphones, e.g. for telephones, ear protectors or headsets
- G10K2210/3024—Expert systems, e.g. artificial intelligence
- G10K2210/3025—Determination of spectrum characteristics, e.g. FFT
- G10K2210/3038—Neural networks
- G10L2015/223—Execution procedure of a spoken command
- G10L2021/02087—Noise filtering, the noise being separate speech, e.g. cocktail party
Definitions
- One aspect of the present invention relates to an audio device and a method of operating the same.
- One aspect of the present invention relates to an information processing system and an information processing method.
- In voice recognition, for example, when a user of an information terminal such as a smartphone speaks, the information terminal can execute a command included in the utterance.
- Patent Document 1 discloses a headset capable of canceling noise contained in a voice signal.
- However, the information terminal may recognize the utterance of a person other than the user, causing the information terminal to perform an operation not intended by the user.
- One aspect of the present invention is to provide an audio device capable of suppressing malfunction of an information terminal.
- One aspect of the present invention is to provide an acoustic device capable of canceling noise.
- One aspect of the present invention is to provide an audio device capable of enabling an information terminal to perform highly accurate voice recognition.
- One aspect of the present invention is to provide a novel acoustic device.
- One aspect of the present invention is to provide an information processing system in which malfunctions are suppressed.
- One aspect of the present invention is to provide an information processing system capable of canceling noise.
- One aspect of the present invention is to provide an information processing system capable of performing highly accurate speech recognition.
- One aspect of the present invention is to provide a novel information processing system.
- One aspect of the present invention is to provide an operation method of an audio device capable of suppressing a malfunction of an information terminal.
- One aspect of the present invention is to provide a method of operating an acoustic device capable of canceling noise.
- One aspect of the present invention is to provide an operation method of an audio device capable of enabling an information terminal to perform highly accurate voice recognition.
- One aspect of the present invention is to provide a novel method of operating an audio device.
- One aspect of the present invention is to provide an information processing method in which malfunctions are suppressed.
- One aspect of the present invention is to provide an information processing method capable of canceling noise.
- One aspect of the present invention is to provide an information processing method capable of performing highly accurate speech recognition.
- One aspect of the present invention is to provide a novel information processing method.
- One aspect of the present invention is an acoustic device including a sound detection unit, a sound separation unit, a sound determination unit, and a processing unit. The sound detection unit has a function of detecting a first sound, and the sound separation unit has a function of separating the first sound into a second sound and a third sound.
- The sound determination unit has a function of registering a feature amount of a voice, and a function of determining, using a machine learning model, whether or not the feature amount of the second sound is registered.
- The processing unit has a function of analyzing, in the case where the feature amount of the second sound is registered, a command included in the second sound and generating a signal representing the content of the command, and a function of generating a fourth sound by performing processing on the third sound for canceling the third sound.
- The learning of the machine learning model may be performed by supervised learning in which the voice is used as learning data and a label indicating whether or not to register the voice is used as teacher data.
- the machine learning model may be a neural network model.
- the fourth sound may be a sound having a phase opposite to that of the third sound.
- One aspect of the present invention is a method of operating an acoustic device in which a first sound is detected; the first sound is separated into a second sound and a third sound; whether or not the feature amount of the second sound is registered is determined using a machine learning model; in the case where the feature amount of the second sound is registered, a command included in the second sound is analyzed and a signal representing the content of the command is generated; and a fourth sound is generated by performing processing on the third sound for canceling the third sound.
- the learning of the machine learning model may be performed using supervised learning in which the voice is used as learning data and the label indicating whether or not to register is used as teacher data.
- the machine learning model may be a neural network model.
- the fourth sound may be a sound having a phase opposite to that of the third sound.
- According to one aspect of the present invention, an acoustic device capable of suppressing malfunction of an information terminal can be provided.
- According to one aspect of the present invention, an acoustic device capable of canceling noise can be provided.
- According to one aspect of the present invention, an acoustic device that enables an information terminal to perform highly accurate voice recognition can be provided.
- According to one aspect of the present invention, a novel acoustic device can be provided.
- According to one aspect of the present invention, an information processing system in which malfunctions are suppressed can be provided.
- According to one aspect of the present invention, an information processing system capable of canceling noise can be provided.
- According to one aspect of the present invention, an information processing system capable of performing highly accurate voice recognition can be provided.
- According to one aspect of the present invention, a novel information processing system can be provided.
- According to one aspect of the present invention, a method of operating an acoustic device capable of suppressing malfunction of an information terminal can be provided.
- According to one aspect of the present invention, a novel method of operating an acoustic device can be provided.
- According to one aspect of the present invention, an information processing method in which malfunctions are suppressed can be provided.
- According to one aspect of the present invention, an information processing method capable of canceling noise can be provided.
- According to one aspect of the present invention, an information processing method capable of performing highly accurate speech recognition can be provided.
- According to one aspect of the present invention, a novel information processing method can be provided.
- FIG. 1A is a block diagram showing a configuration example of an acoustic device.
- FIGS. 1B1 and 1B2 are diagrams showing specific examples of the acoustic device.
- FIGS. 2A and 2B are schematic views showing an example of a method of operating the acoustic device.
- FIG. 3 is a flowchart showing an example of a method of operating the acoustic device.
- FIGS. 4A to 4C are schematic views showing an example of a method of operating the acoustic device.
- FIGS. 5A and 5B are schematic views showing an example of a method of operating the acoustic device.
- FIG. 6 is a flowchart showing an example of a method of operating the acoustic device.
- FIGS. 7A and 7B are schematic views showing an example of a method of operating the acoustic device.
- FIG. 8 is a flowchart showing an example of a method of operating the acoustic device.
- FIG. 9 is a schematic view showing an example of a method of operating the acoustic device.
- FIG. 10 is a flowchart showing an example of a method of operating the acoustic device.
- FIG. 11 is a schematic view showing an example of a method of operating the acoustic device.
- the audio device of one aspect of the present invention and the operation method thereof will be described.
- an information processing system including an acoustic device according to one aspect of the present invention and an information processing method using the information processing system will be described.
- the audio device of one aspect of the present invention can be, for example, earphones or headphones.
- the audio device of one aspect of the present invention includes a sound detection unit, a sound separation unit, a sound determination unit, a processing unit, a transmission / reception unit, and a sound output unit.
- the sound detection unit can be configured to include, for example, a microphone.
- the sound output unit may be configured to include, for example, a speaker.
- the audio device of one aspect of the present invention is electrically connected to an information terminal such as a smartphone.
- the audio device of one aspect of the present invention and the information terminal may be connected by wire, or may be wirelessly connected by Bluetooth (registered trademark), Wi-Fi (registered trademark), or the like.
- The information processing system of one aspect of the present invention is configured by the acoustic device of one aspect of the present invention and the information terminal.
- In the acoustic device of one aspect of the present invention, the feature amount (voiceprint) of a voice is registered in advance.
- the feature amount of the voice of the user of the audio device of one aspect of the present invention is registered.
- the feature amount of the voice can be, for example, the frequency characteristic of the voice.
- For example, the feature amount can be a frequency characteristic obtained by performing a Fourier transform on voice data, that is, data representing the voice.
- As the feature amount of the voice, for example, mel-frequency cepstral coefficients (MFCC) can be used.
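The MFCC computation mentioned above can be sketched in plain NumPy. This is a simplified illustration, not the patent's implementation: the frame length, filter count, coefficient count, and windowing are arbitrary choices, and practical systems typically use a tuned library implementation.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (simplified)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        if center > left:
            fb[i - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fb[i - 1, center:right] = (right - np.arange(center, right)) / (right - center)
    return fb

def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    """MFCC of one frame: window -> |FFT|^2 -> mel filterbank -> log -> DCT-II."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    log_e = np.log(energies + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e

sr = 16000
t = np.arange(512) / sr
frame = np.sin(2 * np.pi * 440.0 * t)  # a 440 Hz test tone in place of real voice
coeffs = mfcc(frame, sr)
print(coeffs.shape)  # (13,)
```

In a voiceprint system, vectors like `coeffs` (stacked over many frames) would serve as the feature amount fed to the sound determination unit.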
- the sound separation unit separates the sound into a voice and a sound other than the voice.
- Sounds other than the voice are, for example, environmental sounds, that is, noise.
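The patent does not specify how the separation into voice and non-voice is performed. As a purely illustrative stand-in, a fixed frequency band (300 Hz to 3400 Hz, a common telephony voice band) can be split off from the rest of the signal; the band limits and signal names below are assumptions.

```python
import numpy as np

def separate_voice(first_sound, sr, lo=300.0, hi=3400.0):
    """Crudely split a signal into a 'voice' band and everything else.

    The two outputs sum back to the input, mirroring the patent's split of
    the first sound into a second sound (voice) and a third sound (other).
    """
    spec = np.fft.rfft(first_sound)
    freqs = np.fft.rfftfreq(len(first_sound), d=1.0 / sr)
    mask = (freqs >= lo) & (freqs <= hi)
    voice = np.fft.irfft(np.where(mask, spec, 0), n=len(first_sound))
    other = first_sound - voice
    return voice, other

sr = 16000
t = np.arange(2048) / sr
speech_like = np.sin(2 * np.pi * 1000.0 * t)  # tone inside the voice band
rumble = np.sin(2 * np.pi * 60.0 * t)         # low-frequency noise
voice, other = separate_voice(speech_like + rumble, sr)
```

Real separation units would use something far more capable (e.g., a learned source-separation model), but the interface, one input and two complementary outputs, is the same.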
- The sound determination unit extracts the feature amount from the voice separated by the sound separation unit and determines whether or not the extracted feature amount is registered. If it is registered, the processing unit analyzes the command included in the voice and generates a command signal, which is a signal representing the content of the command. The command can be analyzed by using language processing such as morphological analysis. The generated command signal is output to the transmission / reception unit.
- If the feature amount is not registered, the command signal is not generated.
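As a toy stand-in for the command-analysis step (the patent mentions language processing such as morphological analysis, which is not reproduced here), a keyword match over an assumed command vocabulary might look like this; the command names and vocabulary are illustrative assumptions.

```python
# Hypothetical command vocabulary: keyword sets mapped to command signals.
COMMANDS = {
    ("change", "song"): "NEXT_TRACK",
    ("volume", "up"): "VOLUME_UP",
    ("volume", "down"): "VOLUME_DOWN",
}

def analyze_command(utterance):
    """Return a command-signal string if the utterance matches, else None.

    A real system would apply morphological analysis / speech recognition
    first; here the utterance is assumed to already be text.
    """
    words = set(utterance.lower().split())
    for keywords, signal in COMMANDS.items():
        if words.issuperset(keywords):
            return signal
    return None

print(analyze_command("please change the song"))  # NEXT_TRACK
print(analyze_command("hello there"))             # None
```

The `None` case corresponds to the situation above where no command signal is generated.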
- The processing unit also performs processing for canceling the sound other than the voice separated by the sound separation unit. For example, the processing unit generates a sound having a phase opposite to that of the sound other than the voice.
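The opposite-phase idea can be shown directly: in the idealized case, the generated fourth sound is the noise with its sign inverted, so the two sum to silence at the listener's ear. The signal names and values below are illustrative.

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr
third_sound = 0.2 * np.sin(2 * np.pi * 200.0 * t)  # stand-in for noise
fourth_sound = -third_sound                         # opposite phase

# In this idealized model the two waves interfere destructively.
residual = third_sound + fourth_sound
print(np.max(np.abs(residual)))  # 0.0
```

Real active noise control must additionally compensate for the acoustic path delay and frequency response, which is where the analysis of input signals in the classifications above comes in.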
- the transmission / reception unit synthesizes the sound processed by the processing unit and the sound emitted by the information terminal, and outputs the sound to the sound output unit.
- the sound emitted by the information terminal can be, for example, the music when the information terminal is playing music.
- the sound output to the sound output unit is emitted to the outside of the acoustic device of one aspect of the present invention.
- the user of the audio apparatus according to one aspect of the present invention can hear the synthesized sound of the sound detected by the sound detection unit and the sound output by the sound output unit.
- In addition to the sound emitted by the information terminal, the sound output by the sound output unit can include, for example, a sound having a phase opposite to that of the noise contained in the sound detected by the sound detection unit.
- the user of the audio equipment of one aspect of the present invention can hear, for example, the noise-canceled sound.
- When the processing unit generates a command signal and outputs it to the transmission / reception unit, that is, when the feature amount of the voice separated by the sound separation unit is registered, the transmission / reception unit outputs the command signal to the information terminal.
- the information terminal executes the instruction represented by the instruction signal. For example, when the information terminal is playing music and the command signal represents a command to "change the type of song", the song played by the information terminal can be changed to a designated one.
- the above is an example of the operation method of the acoustic device of one aspect of the present invention.
- Since the processing unit generates the command signal only when the feature amount of the voice separated by the sound separation unit is registered, malfunction of the information terminal can be suppressed as compared with the case where the command signal is generated regardless of whether or not the feature amount is registered.
- For example, when the feature amount of the voice of the user of the information terminal is registered in the acoustic device of one aspect of the present invention, the information terminal can be prevented from performing an operation unintended by the user in response to a voice other than that of the user.
- The registration of the voice feature amount, and the determination of whether or not the feature amount of the voice input to the sound determination unit is registered, can be performed using, for example, a machine learning model. A neural network model is preferably used as the machine learning model because inference can then be performed with high accuracy.
- As the neural network model, for example, a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network) can be used.
- a learning method of the machine learning model for example, supervised learning can be used.
- the feature amount of voice can be used as learning data, and the label indicating whether or not to register can be used as teacher data.
- learning can be performed in two stages, a first learning and a second learning. That is, after the first learning is performed, the second learning can be performed as additional learning.
- In the first learning, a label indicating "do not register" is given as teacher data to all the learning data.
- In the second learning, a label indicating "register" is given as teacher data to all the learning data. That is, the feature amount of the voice can be registered by the second learning.
- In the second learning, the feature amount of the voice of the user of the audio device of one aspect of the present invention is used as the learning data.
- As the learning data, it is preferable to use the feature amounts of voices uttered by the same person in a variety of ways, without bias. Further, it is preferable to inflate the number of learning data by changing parameters such as pitch of the voice data acquired as learning data. In this way, inference using the learning result, that is, determination of whether or not the feature amount of a voice input to the sound determination unit is registered, can be performed with high accuracy.
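One hypothetical way to inflate the data by changing the pitch parameter is naive resampling, sketched below. Note that this also changes the clip length, whereas practical augmentation would usually use a duration-preserving pitch shift; the signal and factors here are illustrative:

```python
import numpy as np

def pitch_shift(wave, factor):
    """Crude pitch change by resampling: factor > 1 raises the pitch
    (and shortens the clip). Illustration only; real augmentation
    pipelines typically preserve duration."""
    n = len(wave)
    idx = np.arange(0, n, factor)
    return np.interp(idx, np.arange(n), wave)

sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 220.0 * t)  # stand-in for a recorded utterance

# Three variants of the same utterance at slightly different pitches.
augmented = [pitch_shift(voice, f) for f in (0.9, 1.0, 1.1)]
```

Each variant would then be passed through the same feature extraction before being added to the learning data.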
- the first learning can be performed, for example, before shipping the acoustic device of one aspect of the present invention.
- the second learning can be performed, for example, after the acoustic device of one aspect of the present invention is shipped.
- the second learning can be performed by, for example, the user of the audio device of one aspect of the present invention.
- In this way, the user can register the feature amount of his or her own voice.
- The sound determination unit can determine whether or not the feature amount of the voice separated by the sound separation unit is registered. Specifically, when a voice is input to the sound determination unit, the sound determination unit can infer, based on the learning result, whether or not the feature amount of the input voice is registered.
- the information terminal electrically connected to the acoustic device of one aspect of the present invention can perform high-precision voice recognition.
- FIG. 1A is a diagram showing a configuration example of an audio device 10 which is an audio device of one aspect of the present invention.
- FIG. 1A shows the sound 21, the information terminal 22, and the ear 23 in addition to the audio device 10 for the purpose of explaining the functions and the like of the audio device 10.
- the information terminal 22 can be, for example, a smartphone.
- the information terminal 22 can be a portable electronic device such as a tablet terminal, a laptop PC, or a portable (take-out) game machine.
- the information terminal 22 may be an electronic device other than the portable electronic device.
- the sound device 10 includes a sound detection unit 11, a sound separation unit 12, a sound determination unit 13, a storage unit 14, a processing unit 15, a transmission / reception unit 16, and a sound output unit 17.
- the transmission / reception unit 16 is electrically connected to the information terminal 22.
- the audio device 10 and the information terminal 22 may be connected by wire, or may be wirelessly connected by Bluetooth (registered trademark), Wi-Fi (registered trademark), or the like. It can be said that the information processing system of one aspect of the present invention is configured by the sound device 10 and the information terminal 22.
- FIG. 1A the arrows indicate the flow of data, signals, and the like.
- the flow shown in FIG. 1A is an example, and is not limited to the flow shown in FIG. 1A. The same applies to other figures.
- the sound detection unit 11 has a function of detecting sound. For example, it has a function of detecting a sound 21 including a human voice.
- the sound detection unit 11 may be configured to include, for example, a microphone.
- the sound separation unit 12 has a function of separating the sound detected by the sound detection unit 11 for each characteristic.
- When the sound detection unit 11 detects a sound 21 including a human voice, the sound separation unit 12 has a function of separating the sound 21 into a voice and a sound other than the voice.
- Sounds other than voice are, for example, environmental sounds such as noise.
- the sound separation unit 12 has a function of separating, for example, the sound detected by the sound detection unit 11 based on the frequency of the sound.
- Human voice is mainly composed of frequency components of 0.2 kHz or more and 4 kHz or less. Therefore, for example, by separating the sound detected by the sound detection unit 11 into a sound having frequencies of 0.2 kHz or more and 4 kHz or less and a sound having other frequencies, the voice can be separated from the other sounds. The center frequency of human voice is said to be around 1 kHz.
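As a sketch of this frequency-based separation: the patent does not specify the filtering method, so an ideal FFT mask over the 0.2 kHz to 4 kHz band is used here purely for illustration:

```python
import numpy as np

def separate_by_band(signal, sr, low=200.0, high=4000.0):
    """Split a signal into an in-band part (voice candidate) and the
    remainder using an ideal FFT mask. Sketch only; a product would
    use a proper filter bank or a source-separation model."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    mask = (freqs >= low) & (freqs <= high)
    voice = np.fft.irfft(spectrum * mask, n=len(signal))
    other = np.fft.irfft(spectrum * ~mask, n=len(signal))
    return voice, other

sr = 16000
t = np.arange(sr) / sr
# A 1 kHz "voice" component mixed with a 6 kHz out-of-band component.
mixture = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
voice, other = separate_by_band(mixture, sr)
```

By construction the two outputs sum back to the original mixture, which mirrors the idea that the sound 21 is split into a voice part and everything else.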
- The sound determination unit 13 can register the extracted features; for example, a voiceprint can be registered. It can thus be said that the sound determination unit 13 has a function of registering the feature amount of a sound.
- the registration result can be stored in the storage unit 14.
- the sound determination unit 13 has a function of determining whether or not the extracted feature amount is registered.
- the feature amount can be registered and the above determination can be performed using, for example, a machine learning model.
- As the machine learning model, using a neural network model is preferable, for example, because inference can be performed with high accuracy.
- As the neural network model, for example, a CNN or an RNN can be used.
- As the learning method of the machine learning model, for example, supervised learning can be used.
- The processing unit 15 has a function of performing processing based on the determination result of the sound determination unit 13. For example, when the sound separation unit 12 outputs a voice, the command signal can be generated only when the feature amount of that voice is registered.
- the transmission / reception unit 16 has a function of synthesizing the sound processed by the processing unit 15 and the sound emitted by the information terminal 22.
- the sound emitted by the information terminal 22 can be, for example, the music when the information terminal 22 is playing music.
- The command signal is generated only when, for example, the feature amount of the voice separated by the sound separation unit 12 is registered.
- Thus, malfunction of the information terminal 22 can be suppressed as compared with the case where the command signal is generated regardless of whether the feature amount is registered.
- When the feature amount of the voice of the user of the information terminal 22 is registered in the audio device 10, operations unintended by the user of the information terminal 22, triggered by voices other than the user's, can be suppressed.
- FIGS. 2A and 2B are diagrams showing an example of a method of registering sound feature amounts in the case where the sound determination unit 13 has a function of determining, using a machine learning model, whether or not a sound feature amount is registered. Specifically, they show an example of a registration method using supervised learning.
- the sound determination unit 13 extracts the feature amount of the sound data 31.
- For example, the frequency characteristic of the sound represented by the sound data 31 is used as the feature amount.
- For example, the frequency characteristic obtained by performing a Fourier transform on the sound data 31 can be used as the feature amount.
- MFCC (Mel-Frequency Cepstrum Coefficients) can also be used as the feature amount.
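As one hypothetical realization of such a frequency-characteristic feature, the sketch below summarizes a clip by its average log power in equal-width frequency bands; this is a crude stand-in, not true MFCCs, and the band count is arbitrary:

```python
import numpy as np

def frequency_feature(wave, n_bands=16):
    """Summarize a clip by the average log power in n_bands equal-width
    frequency bands of its spectrum: a stand-in for the 'frequency
    characteristic' feature; real systems would more likely use MFCCs."""
    power = np.abs(np.fft.rfft(wave)) ** 2
    bands = np.array_split(power, n_bands)
    return np.log1p(np.array([b.mean() for b in bands]))

sr = 8000
t = np.arange(sr) / sr
low_voice = np.sin(2 * np.pi * 150 * t)    # energy in a low band
high_voice = np.sin(2 * np.pi * 3000 * t)  # energy in a high band

f_low = frequency_feature(low_voice)
f_high = frequency_feature(high_voice)
```

Two voices with different spectral content yield feature vectors whose peaks sit in different bands, which is what a downstream determination model would exploit.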
- For example, the voices of a plurality of people are used as the learning data.
- In this way, inference using the learning result described later, that is, determination of whether or not the feature amount of a sound input to the sound determination unit 13 is registered, can be performed with high accuracy.
- The data representing the extracted feature amount, together with the label 42 indicating "register", is input to the generator 30 into which the learning result 33 has been loaded.
- The generator 30 performs learning using the data representing the feature amount extracted from the sound data 41 as learning data and the label 42 as teacher data, and outputs the learning result 43.
- the learning result 43 can be stored in the storage unit 14.
- the learning result 43 can be a weighting coefficient.
- In FIGS. 2A and 2B, a label indicating "register" is denoted by "registration ○", and a label indicating "do not register" is denoted by "registration ×".
- The sound data 41, which is the learning data, is, for example, the voice of the user of the audio device 10.
- As the voice, it is preferable to perform learning using the feature amounts of voices uttered by the same person in a variety of ways, without bias.
- It is also preferable to inflate the number of sound data 41 used for learning by changing parameters such as pitch of the voice data acquired as the sound data 41.
- The sound determination unit 13 can learn the feature amounts of sounds that are not to be registered as learning data, as shown in FIG. 2A, and then learn the feature amounts of sounds to be registered as learning data, as shown in FIG. 2B. That is, learning can be performed in two stages, a first learning and a second learning: after the first learning shown in FIG. 2A is performed, the second learning shown in FIG. 2B can be performed as additional learning.
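The two-stage procedure can be sketched with a toy logistic-regression classifier: the first stage trains on pre-shipment voices all labeled "do not register", and the second stage continues from those weights (additional learning) on the user's voice labeled "register". The clusters, learning rate, and step counts are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def train(X, y, w, b, lr=0.5, steps=300):
    """One stage of logistic-regression training. Calling it again on
    new data continues from the previous weights (additional learning)."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - lr * (X.T @ (p - y)) / len(y)
        b = b - lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(1)
others = rng.normal(-1.0, 0.2, size=(30, 2))  # pre-shipment voice features
user = rng.normal(1.0, 0.2, size=(10, 2))     # the user's voice features

# First learning (e.g. before shipment): every sample labeled "do not register".
w, b = train(others, np.zeros(30), np.zeros(2), 0.0)

# Second learning (e.g. after shipment): the user's voice labeled "register".
w, b = train(user, np.ones(10), w, b)

def registered(feature):
    return 1.0 / (1.0 + np.exp(-(feature @ w + b))) > 0.5
```

After the second stage the model accepts features near the user's cluster and still rejects the pre-shipment voices; a production system would guard more carefully against forgetting the first stage.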
- the first learning can be performed, for example, before the sound device 10 is shipped.
- the second learning can be performed, for example, after the sound device 10 is shipped.
- The second learning can be performed by, for example, the user of the audio device 10. In this way, in the audio device 10, the user can register the feature amount of a sound by himself or herself.
- The sound determination unit 13 can determine, for example, whether or not the feature amount of the sound separated by the sound separation unit 12 is registered. Specifically, when a sound is input to the sound determination unit 13, the sound determination unit 13 can infer, based on the learning result 43, whether or not the feature amount of the input sound is registered.
- FIG. 3 is a flowchart showing an example of an operation method when the audio device 10 is used.
- FIGS. 4A to 4C and FIGS. 5A and 5B are schematic views illustrating the details of each step shown in FIG. 3. The following description assumes that a sound feature amount has already been registered by the method shown in FIGS. 2A and 2B.
- The sound separation unit 12 separates the detected sound for each characteristic. For example, when the sound detection unit 11 detects a sound including a human voice, the sound separation unit 12 separates the detected sound into a voice and a sound other than the voice (step S02). As described above, sounds other than voice are, for example, environmental sounds such as noise.
- A specific example of step S02 is shown in FIG. 4A.
- the sound separation unit 12 has a function of separating, for example, the sound detected by the sound detection unit 11 based on the frequency of the sound.
- FIG. 4A shows an example in which the sound 21 detected by the sound detection unit 11 and input to the sound separation unit 12 is separated into a sound 21a and a sound 21b based on the frequency.
- the sound may be separated into a sound having a frequency of 0.5 kHz or more and 2 kHz or less and a sound having a frequency other than that.
- the frequency for sound separation may be changed according to the type of sound detected by the sound detection unit 11. For example, when the sound detection unit 11 detects a sound including a female voice, a sound having a higher frequency than when the sound including a male voice is detected may be separated as voice.
- By changing the frequency used for sound separation according to the type of sound detected by the sound detection unit 11, the sound detected by the sound detection unit 11 can be separated into voice and other sounds with high accuracy.
- The processing unit 15 performs processing for canceling the sound 21b, the sound other than the voice separated by the sound separation unit 12 (step S06). For example, as shown in FIG. 5A, the sound 21b is input to the processing unit 15, and the sound 26, whose phase is inverted from that of the sound 21b, is output.
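Assuming an idealized, zero-latency system, the cancellation in step S06 can be sketched as follows; a real noise-canceling path would additionally have to model delay and the acoustic transfer path:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
noise = 0.3 * np.sin(2 * np.pi * 120 * t)  # stand-in for the sound 21b

anti_noise = -noise            # phase-inverted output (the sound 26)
residual = noise + anti_noise  # what would reach the listener

print(np.max(np.abs(residual)))  # → 0.0
```

With perfectly aligned signals the inverted copy cancels the noise exactly; in practice the residual is only attenuated, not zero.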
- When the processing unit 15 generates the command signal 25 and outputs it to the transmission/reception unit 16, that is, when the feature amount of the sound 21a, the voice separated by the sound separation unit 12, is registered, the transmission/reception unit 16 outputs the command signal 25 to the information terminal 22 (steps S08 and S09).
- A specific example of steps S07 to S09 is shown in FIG. 5B. FIG. 5B shows an example in which the sound 26, obtained by inverting the phase of the sound 21b, the command signal 25 representing a command to "change the type of song", and the sound 27 emitted by the information terminal 22 are input to the transmission/reception unit 16.
- the sound 26 and the sound 27 are combined by the transmission / reception unit 16 and output to the sound output unit 17.
- the sound input to the sound output unit 17 is emitted to the outside of the sound device 10.
- Thus, the user of the audio device 10 hears, with the ear 23, the combination of the sound 21 detected by the sound detection unit 11 and the sounds 26 and 27 output by the sound output unit 17.
- FIG. 6 is a flowchart showing an example of an operation method when the audio device 10 is used, and is a modification of the operation method shown in FIG.
- the operation method shown in FIG. 6 is different from the operation method shown in FIG. 3 in that step S05 is replaced with step S05a and step S09 is replaced with step S09a.
- FIG. 8 is a flowchart showing an example of an operation method when the audio device 10 is used, and is a modification of the operation method shown in FIG.
- the operation method shown in FIG. 8 is different from the operation method shown in FIG. 3 in that step S06a is performed instead of step S06 when the feature amount extracted from the sound 21a is not registered (step S04).
- FIG. 9 is a schematic diagram illustrating the details of step S06a.
- the processing unit 15 may perform a process of reducing the volume of the sound 21a.
- FIG. 10 is a flowchart showing an example of an operation method when the audio device 10 is used, and is a modification of the operation method shown in FIG.
- the operation method shown in FIG. 10 is different from the operation method shown in FIG. 8 in that step S06a is replaced with step S06b.
- FIG. 11 is a schematic diagram illustrating the details of step S06b.
- the processing unit 15 performs a process of reducing the volume of the sound 21a, which is a voice, and canceling the sound 21b, which is a sound other than the voice.
- the sound 21a and the sound 21b are input to the processing unit 15.
- The processing unit 15 inverts the phase of the sound 21a and reduces its amplitude, and also inverts the phase of the sound 21b.
- the sound processed by the processing unit 15 is output as the sound 26.
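Assuming ideally aligned, sample-synchronous signals, the combined effect of step S06b can be sketched as follows; the signal names mirror the reference numerals, and the attenuation value is an arbitrary illustrative choice:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 440 * t)       # unregistered voice (sound 21a)
noise = 0.5 * np.sin(2 * np.pi * 60 * t)  # sound other than voice (sound 21b)

# The sound 26: the voice inverted and attenuated, the noise fully inverted.
attenuation = 0.3
sound26 = -attenuation * voice - noise

# What reaches the ear: the external sound 21 plus the device's output.
heard = (voice + noise) + sound26
# heard equals (1 - attenuation) * voice: the noise is cancelled and the
# unregistered voice arrives at reduced volume.
```

This matches the intent of step S06b: the listener still hears the unregistered voice, but quieter, while the other sound is cancelled.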
- the information terminal 22 can perform highly accurate voice recognition.
Description
FIGS. 2A and 2B are schematic diagrams showing an example of an operation method of the audio device.
FIG. 3 is a flowchart showing an example of an operation method of the audio device.
FIGS. 4A to 4C are schematic diagrams showing an example of an operation method of the audio device.
FIGS. 5A and 5B are schematic diagrams showing an example of an operation method of the audio device.
FIG. 6 is a flowchart showing an example of an operation method of the audio device.
FIGS. 7A and 7B are schematic diagrams showing an example of an operation method of the audio device.
FIG. 8 is a flowchart showing an example of an operation method of the audio device.
FIG. 9 is a schematic diagram showing an example of an operation method of the audio device.
FIG. 10 is a flowchart showing an example of an operation method of the audio device.
FIG. 11 is a schematic diagram showing an example of an operation method of the audio device.
In this embodiment, an audio device of one aspect of the present invention and an operation method thereof will be described. An information processing system including the audio device of one aspect of the present invention and an information processing method using the information processing system will also be described.
The audio device of one aspect of the present invention can be, for example, an earphone or headphones. The audio device of one aspect of the present invention includes a sound detection unit, a sound separation unit, a sound determination unit, a processing unit, a transmission/reception unit, and a sound output unit. Here, the sound detection unit can be configured to include, for example, a microphone, and the sound output unit can be configured to include, for example, a speaker.
An example of an operation method of the audio device 10 will be described below. FIGS. 2A and 2B are diagrams showing an example of a method of registering sound feature amounts in the case where the sound determination unit 13 has a function of determining, using a machine learning model, whether or not a sound feature amount is registered. Specifically, they show an example of a method of registering sound feature amounts using supervised learning.
Claims (8)
- 1. An audio device comprising a sound detection unit, a sound separation unit, a sound determination unit, and a processing unit, wherein the sound detection unit has a function of detecting a first sound; the sound separation unit has a function of separating the first sound into a second sound and a third sound; the sound determination unit has a function of registering a feature amount of a sound; the sound determination unit has a function of determining, using a machine learning model, whether or not a feature amount of the second sound is the registered one; the processing unit has a function of, when the feature amount of the second sound is the registered one, analyzing a command included in the second sound and generating a signal representing the content of the command; and the processing unit has a function of generating a fourth sound by performing processing for canceling the third sound on the third sound.
- 2. The audio device according to claim 1, wherein the learning of the machine learning model is performed using supervised learning in which voices are used as learning data and labels indicating whether or not to perform registration are used as teacher data.
- 3. The audio device according to claim 1 or 2, wherein the machine learning model is a neural network model.
- 4. The audio device according to any one of claims 1 to 3, wherein the fourth sound is a sound whose phase is inverted with respect to the third sound.
- 5. An operation method of an audio device, comprising: detecting a first sound; separating the first sound into a second sound and a third sound; determining, using a machine learning model, whether or not a feature amount of the second sound is registered; when the feature amount of the second sound is registered, analyzing a command included in the second sound and generating a signal representing the content of the command; and generating a fourth sound by performing processing for canceling the third sound on the third sound.
- 6. The operation method of an audio device according to claim 5, wherein the learning of the machine learning model is performed using supervised learning in which voices are used as learning data and labels indicating whether or not to perform registration are used as teacher data.
- 7. The operation method of an audio device according to claim 5 or 6, wherein the machine learning model is a neural network model.
- 8. The operation method of an audio device according to any one of claims 5 to 7, wherein the fourth sound is a sound whose phase is inverted with respect to the third sound.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080056225.2A CN114207708A (zh) | 2019-08-09 | 2020-07-29 | 音响装置及其工作方法 |
JP2021539690A JPWO2021028758A1 (ja) | 2019-08-09 | 2020-07-29 | |
KR1020227007021A KR20220044530A (ko) | 2019-08-09 | 2020-07-29 | 음향 장치 및 그 동작 방법 |
US17/630,090 US20220366928A1 (en) | 2019-08-09 | 2020-07-29 | Audio device and operation method thereof |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019147368 | 2019-08-09 | ||
JP2019-147368 | 2019-08-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021028758A1 true WO2021028758A1 (ja) | 2021-02-18 |
Family
ID=74570241
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2020/057125 WO2021028758A1 (ja) | 2019-08-09 | 2020-07-29 | 音響装置、及びその動作方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20220366928A1 (ja) |
JP (1) | JPWO2021028758A1 (ja) |
KR (1) | KR20220044530A (ja) |
CN (1) | CN114207708A (ja) |
WO (1) | WO2021028758A1 (ja) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014191823A (ja) * | 2013-03-26 | 2014-10-06 | Tata Consultancy Services Ltd | 生体認証および自己学習アルゴリズムを用いた個人用アカウント識別子の有効化方法およびシステム。 |
JP2016075740A (ja) * | 2014-10-03 | 2016-05-12 | 日本電気株式会社 | 音声処理装置、音声処理方法、およびプログラム |
JP2018107577A (ja) * | 2016-12-26 | 2018-07-05 | ヤマハ株式会社 | 音響装置 |
JP2019036174A (ja) * | 2017-08-17 | 2019-03-07 | ヤフー株式会社 | 制御装置、入出力装置、制御方法、および制御プログラム |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102287182B1 (ko) | 2014-02-03 | 2021-08-05 | 코핀 코포레이션 | 음성 커맨드에 대한 스마트 블루투스 헤드셋 |
US10360926B2 (en) * | 2014-07-10 | 2019-07-23 | Analog Devices Global Unlimited Company | Low-complexity voice activity detection |
US10657949B2 (en) * | 2015-05-29 | 2020-05-19 | Sound United, LLC | System and method for integrating a home media system and other home systems |
US11006162B2 (en) * | 2015-08-31 | 2021-05-11 | Orcam Technologies Ltd. | Systems and methods for analyzing information collected by wearable systems |
US9858927B2 (en) * | 2016-02-12 | 2018-01-02 | Amazon Technologies, Inc | Processing spoken commands to control distributed audio outputs |
US10373612B2 (en) * | 2016-03-21 | 2019-08-06 | Amazon Technologies, Inc. | Anchored speech detection and speech recognition |
CN110506452B (zh) * | 2017-02-07 | 2021-12-03 | 路创技术有限责任公司 | 基于音频的负载控制系统 |
US11100384B2 (en) * | 2017-02-14 | 2021-08-24 | Microsoft Technology Licensing, Llc | Intelligent device user interactions |
US10431217B2 (en) * | 2017-02-15 | 2019-10-01 | Amazon Technologies, Inc. | Audio playback device that dynamically switches between receiving audio data from a soft access point and receiving audio data from a local access point |
JP6991041B2 (ja) * | 2017-11-21 | 2022-01-12 | ヤフー株式会社 | 生成装置、生成方法、および生成プログラム |
US11120794B2 (en) * | 2019-05-03 | 2021-09-14 | Sonos, Inc. | Voice assistant persistence across multiple network microphone devices |
-
2020
- 2020-07-29 WO PCT/IB2020/057125 patent/WO2021028758A1/ja active Application Filing
- 2020-07-29 CN CN202080056225.2A patent/CN114207708A/zh active Pending
- 2020-07-29 KR KR1020227007021A patent/KR20220044530A/ko unknown
- 2020-07-29 US US17/630,090 patent/US20220366928A1/en active Pending
- 2020-07-29 JP JP2021539690A patent/JPWO2021028758A1/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
JPWO2021028758A1 (ja) | 2021-02-18 |
CN114207708A (zh) | 2022-03-18 |
KR20220044530A (ko) | 2022-04-08 |
US20220366928A1 (en) | 2022-11-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20853263 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021539690 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20227007021 Country of ref document: KR Kind code of ref document: A |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20853263 Country of ref document: EP Kind code of ref document: A1 |