WO2011055410A1 - Speech recognition apparatus - Google Patents
Speech recognition apparatus
- Publication number
- WO2011055410A1 (PCT/JP2009/005905)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- unit
- voice
- source direction
- determination
- Prior art date
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S3/00—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
- G01S3/80—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
Definitions
- the present invention relates to a speech recognition apparatus.
- This speech recognition apparatus estimates the sound source direction of the speech by determining whether or not the sound pressure and time of the input speech exceed a predetermined threshold, and sets the directivity direction of the microphone array. Then, the voice from the directivity direction is emphasized and voice recognition is performed.
- the voice recognition device of Patent Document 1 may recognize sounds other than the voice emitted by the speaker (for example, noise such as a door closing sound).
- in this case, the directivity direction of the microphone array is set toward the source of such a sound, and voice recognition may not be performed with high accuracy.
- the present invention has been made in view of the above problems, and an object of the present invention is to provide a speech recognition apparatus capable of performing speech recognition with high accuracy.
- a speech recognition apparatus according to the present invention includes: a determination unit that determines whether a sound input to a speech input unit including a plurality of microphones contains a frequency of 1000 Hz or higher at a predetermined intensity or higher; a sound source direction estimation unit that, when the determination by the determination unit is true, estimates the sound source direction of the sound containing the frequency of 1000 Hz or higher; and a speech recognition unit that determines whether the sound arriving from the estimated sound source direction matches a pre-registered speech model.
- Block diagram of the speech recognition apparatus according to the first embodiment of the present invention
- Schematic diagram showing an example arrangement of the microphone array in the speech recognition apparatus
- Flowchart showing an example of the processing flow of the speech recognition apparatus
- Schematic diagram showing an example of the frequency distributions of input noise, a human voice, and the sound of clapping hands
- Diagram showing the time change of the sound of clapping the palms
- Diagram showing an example of a database of operation commands
- Diagram showing another example of the database of operation commands
- Block diagram of Modification 1 of the speech recognition apparatus according to the first embodiment of the present invention
- Flowchart showing an example of the processing flow of the speech recognition apparatus
- Diagram showing an example of the time change of the sound of clapping the palms twice
- the speech recognition apparatus 10 detects a sound generated by a speaker's action (hereinafter referred to as a "cue sound"), sets the directivity direction of the microphone array included in the speech input unit to the sound source direction of the cue sound, recognizes the speaker's voice, and controls an electronic device such as a television receiver.
- the cue sound includes, for example, the sound of striking body parts together such as clapping the palms, a finger snap, or the sound of tapping an object with a body part such as a finger or hand.
- in the present embodiment, the sound of clapping the palms is used as the cue sound.
- FIG. 1 is a block diagram of the speech recognition apparatus according to the first embodiment.
- the speech recognition apparatus 10 includes a speech input unit 50, a storage unit 12, a determination unit 13, a sound source direction estimation unit 14, a directivity control unit 15, a speech recognition unit 16, a device control unit 17, and a display unit 18.
- the voice input unit 50 includes one or a plurality of sets of microphone arrays.
- the voice input unit 50 includes one microphone array 11.
- the voice input unit 50 receives sound from outside the voice recognition device, such as a person's voice, and stores it in the storage unit 12 as sound data.
- the storage unit 12 stores a speech model necessary for the speech recognition unit 16 (to be described later) to recognize speech in addition to the sound data.
- the determination unit 13 determines whether the sound data stored in the storage unit 12 includes sound data that satisfies a predetermined condition described later.
- the sound source direction estimator 14 estimates the sound source direction of the sound data (the direction of the cue sound) according to the determination result by the determination unit 13.
- the directivity control unit 15 sets the directivity direction of the microphone array 11 to the sound source direction estimated by the sound source direction estimation unit 14.
- the directivity control unit 15 outputs a recognition start command to the voice recognition unit 16 after the setting of the directivity direction of the microphone array 11 is completed.
- the voice recognition unit 16 receives a recognition start command from the directivity control unit 15.
- the voice recognition unit 16 recognizes a speaker's voice from sound data obtained by using the microphone array 11 in which the directivity direction is set by the directivity control unit 15 and determines an operation command to the electronic device.
- the device control unit 17 gives a command corresponding to the voice recognized by the voice recognition unit 16 to an electronic device (not shown) to be operated.
- the display unit 18 notifies the speaker that the voice recognition unit 16 is receiving voice.
- the voice recognition device 10 can be incorporated in an electronic device to be operated or connected to the outside of the electronic device, for example.
- the electronic device to be operated is the television receiver 20, but is not limited thereto.
- the present invention can be applied to an electronic device that exhibits performance by receiving an operation from a speaker during use, such as a personal computer, a video recorder, an air conditioner, and an in-vehicle device.
- the determination unit 13, the sound source direction estimation unit 14, the directivity control unit 15, the speech recognition unit 16, and the device control unit 17 can be realized by a central processing unit (CPU) executing a program stored in a computer-readable memory.
- the storage unit 12 may be provided inside the voice recognition device 10 or may be provided outside the voice recognition device 10.
- FIG. 2 is a schematic diagram illustrating an arrangement example of the microphone array 11 in the speech recognition apparatus 10.
- the microphone array 11 includes the two microphones 21 and 22, but may include three or more.
- the microphone array 11 can be provided, for example, on the upper portion of the casing 29 of the television receiver 20 in parallel with the upper side of the casing 29.
- the microphones 21 and 22 can convert the input sound into an electric signal.
- the directivity direction of the microphones 21 and 22 can be set toward the position from which a speaker normally views the television receiver 20.
- FIG. 4 is a schematic diagram showing an example of the frequency distribution of noise, human voice, and clapping sounds input to the microphones 21 and 22.
- the horizontal axis represents frequency (from 0 Hz to 8000 Hz), and the vertical axis represents sound intensity.
- the intensity of noise shows a substantially uniform distribution at frequencies from 0 Hz to 8000 Hz.
- the intensity of human speech shows a value larger than noise at frequencies from 0 Hz to 1000 Hz, but shows a distribution similar to noise at frequencies above 2000 Hz.
- the intensity of the sound of clapping hands shows a large value in the frequency range from 1000 Hz to 8000 Hz compared to noise and human voice.
- FIG. 5 is a diagram showing an example of the time change of the 4000 Hz frequency component of the sound of clapping the palms.
- the horizontal axis represents time in seconds, and the vertical axis represents sound intensity.
- the speech recognition apparatus 10 detects such a palm-clapping sound made by the speaker as a cue sound, and sets the directivity direction of the microphone array 11.
- FIG. 3 is a flowchart showing an example of a processing flow of the speech recognition apparatus 10.
- the processing flow starts from a state in which the setting of the directivity direction of the microphone array 11 is cancelled.
- the voice recognition device 10 receives sound and voice from the speaker using the microphones 21 and 22 (S101).
- the sound converted into the electrical signal by the microphones 21 and 22 is stored in the storage unit 12 for a certain period of time as sound data arranged for each frequency (S102).
- the time for storing the sound data may be set in advance or may be arbitrarily set by the speaker.
- the storage unit 12 stores the sound from time 0 (s) to T (s) in FIG. 5. In FIG. 5, there is a peak whose intensity exceeds a predetermined threshold between time 0 (s) and T (s).
- the determination unit 13 determines whether or not the directivity direction of the microphone array 11 is set (S103).
- the determination unit 13 searches the sound data stored in the storage unit 12 for sound data of a predetermined frequency, and determines whether the intensity of that sound data is equal to or greater than a predetermined threshold (hereinafter, the predetermined intensity threshold). In this way, it determines whether a cue sound has been detected (S104).
- the predetermined intensity threshold may be set in advance according to the cue sound, or may be set arbitrarily by the speaker.
- for example, the determination unit 13 can determine whether the sound data having a frequency of 4000 Hz exceeds the predetermined intensity threshold once during the fixed time from 0 (s) to T (s).
- by setting the predetermined intensity threshold to a value larger than the intensity of noise or a human voice and smaller than the intensity of the sound of clapping hands, it is possible to distinguish, for example, between noise and the clapping sound, or between a person's voice and the clapping sound.
- one or more frequencies may be used by the determination unit 13 to determine whether a sound is the cue sound.
- for example, the determination unit 13 may make the determination using a frequency of 4000 Hz, or using a plurality of frequencies such as 3000 Hz and 5000 Hz. When a plurality of frequencies is used, it determines whether the intensities at all the frequencies used are equal to or greater than the predetermined intensity threshold.
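The threshold check described above can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the function name, the magnitude normalization, and the threshold value are chosen for the example; the patent only requires comparing the intensity at one or more frequencies of 1000 Hz or higher against a predetermined intensity threshold.

```python
import numpy as np

def detect_cue(signal, fs, freqs_hz=(4000,), threshold=1.0):
    """Return True if the intensity at every inspected frequency meets the
    threshold -- a sketch of the determination in step S104.
    (Name, normalization, and threshold are illustrative assumptions.)"""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)  # per-bin magnitude
    bin_hz = fs / len(signal)                             # frequency resolution
    return all(spectrum[int(round(f / bin_hz))] >= threshold for f in freqs_hz)
```

With a suitable threshold, a strong 4000 Hz component (as in a clap) passes the check while a low-frequency tone does not.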
- when the determination of the determination unit 13 in step S104 is no, the determination unit 13 outputs a new sound data storage start signal to the storage unit 12.
- the process returns to step S101, and the storage unit 12 temporarily stores new sound data.
- the sound source direction estimation unit 14 estimates the sound source direction of the sound that exceeds the predetermined intensity threshold from the sound data stored in the storage unit 12 (S105).
- for the estimation, a known method can be used, such as calculating the arrival time difference of the sounds input to the microphone array 11 (microphones 21 and 22), or a beamformer method.
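A minimal sketch of the arrival-time-difference approach mentioned above, for a two-microphone far-field model: the time lag is found as the peak of the cross-correlation and converted to an angle. The function name, parameter names, and the far-field geometry are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def estimate_angle(left, right, fs, mic_distance_m, c=343.0):
    """Estimate the source angle (degrees from broadside) from the arrival-time
    difference between two microphone signals, via the cross-correlation peak."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)   # delay in samples (left later -> lag > 0)
    tau = lag / fs                                  # arrival-time difference in seconds
    sin_theta = np.clip(tau * c / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

The sign convention here is that a positive angle means the sound reaches the right microphone first; a real device would calibrate this against the microphone layout of FIG. 2.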
- the directivity control unit 15 outputs a control signal to the microphone array 11 and sets the directivity direction of the microphone array 11 to the sound source direction (the direction of the cue sound) estimated by the sound source direction estimation unit 14 (S106).
- by setting the directivity direction, the microphone array 11 receives sound from the set direction with emphasis.
- for the directivity control, a fixed type represented by a delay-and-sum array or an adaptive type represented by a Griffith-Jim array can be used.
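The fixed delay-and-sum type mentioned above can be sketched as follows, assuming integer-sample steering delays (real systems use fractional delays and per-channel weights; those details and the function name are illustrative assumptions):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Fixed delay-and-sum beamformer: advance each channel by its integer
    steering delay (in samples) so the look-direction wavefront lines up,
    then average the aligned channels."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    aligned = [np.asarray(ch, dtype=float)[d:d + n] for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

Signals arriving from the look direction add coherently, while sound from other directions is misaligned and partially cancels, which is what "emphasizes sound from the set direction" amounts to.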
- the directivity control unit 15 outputs a notification start signal to the display unit 18 after the setting of the directivity direction of the microphone array 11 is completed.
- the display unit 18 receives a notification start signal from the directivity control unit 15 and notifies the speaker that the voice recognition unit 16 is receiving voice.
- the display unit 18 may be, for example, an LED, and the speaker may be notified by turning on the LED; alternatively, the notification may be shown on a display.
- the directivity control unit 15 outputs a new sound data storage start signal to the storage unit 12 after the setting of the directivity direction of the microphone array 11 to the direction of the cue sound is completed.
- the storage unit 12 receives a storage start signal from the directivity control unit 15, and starts storing the sound input to the microphone array 11 again.
- the determination unit 13 further determines whether or not a cue sound is detected in the same manner as in step S104 (S107).
- if the determination by the determination unit 13 in S107 is yes, the process returns to step S105.
- the speech recognition unit 16 performs speech recognition using the sound data stored in the storage unit 12 (S108).
- the voice recognition unit 16 extracts a voice model that matches the sound data stored in the storage unit 12, and determines an operation command corresponding to the voice model (S109).
- FIG. 6 is an example of a database of operation instructions stored in the storage unit 12.
- FIG. 7 is another example of a database of operation instructions stored in the storage unit 12.
- the database includes a speech model of input speech and an operation command corresponding to the speech model.
- the speech model may be a language other than Japanese, such as English, as well as Japanese.
- for example, when the speaker utters "NHK", the speech recognition unit 16 searches the storage unit 12 for a speech model that matches the speech "NHK" and determines the operation command to the electronic device body, "set channel 1", corresponding to that speech model (FIG. 6).
- similarly, when the speaker utters "weather report", the voice recognition unit 16 searches the storage unit 12 for a voice model that matches the voice "weather report" and determines the information-presenting operation command "display today's weather forecast" corresponding to that voice model (FIG. 7).
- for convenience, the speech models shown in FIGS. 6 and 7 are written as words rather than phonetic symbols.
- a plurality of speech models may be associated with one operation command. For example, as shown in FIG. 6, the operation command "set channel 1" may be associated with both "channel one" and "nhk".
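The command database of FIGS. 6 and 7 can be sketched as a plain mapping from matched speech model to operation command. The flat-dictionary structure and the helper name are illustrative assumptions; only the example entries come from the text.

```python
# Several speech models may share one operation command, as in FIG. 6.
COMMANDS = {
    "channel one": "set channel 1",
    "nhk": "set channel 1",
    "weather report": "display today's weather forecast",
}

def find_command(recognized_speech):
    """Return the operation command for a matched speech model, or None
    when no model matches (the "no" branch of step S110)."""
    return COMMANDS.get(recognized_speech)
```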
- when the electronic device is the television receiver 20, the recognition accuracy of the voice recognition unit 16 can be further increased by providing the television receiver 20 with a function for muting the output sound from its speaker while the voice recognition unit 16 accepts voice, or with an echo canceling function.
- the voice recognition unit 16 determines whether or not an operation command has been determined (S110).
- when the determination of the voice recognition unit 16 in S110 is yes, the voice recognition unit 16 outputs an operation signal to the device control unit 17.
- the device control unit 17 receives an operation signal from the voice recognition unit 16, gives an operation command determined by the voice recognition unit 16 to the electronic device, and controls the electronic device (S111).
- the directivity control unit 15 outputs a control signal to the microphone array 11 and cancels the setting of the directivity direction of the microphone array 11 (S112).
- when the determination of the voice recognition unit 16 in S110 is no, the voice recognition unit 16 outputs a new sound data storage start signal to the storage unit 12.
- the process returns to step S101, and the storage unit 12 starts storing the sound input to the microphone array 11 again.
- the same microphone array 11 is used for the estimation of the sound source direction and the speech recognition, but the present invention is not limited to this.
- two or more microphones independent of the microphone array 11 may be used for estimating the sound source direction, and the microphone array 11 may be used for speech recognition.
- in the above, the speech recognition apparatus that recognizes the speaker's voice by setting the directivity direction of the microphone array 11 after the determination unit 13 detects the cue sound has been described, but the present invention is not limited to this.
- (Modification 1) For example, the speaker may emit a cue sound and then input speech within a fixed time.
- FIG. 8 is a block diagram of Modification Example 1 of the speech recognition apparatus according to the first embodiment of the present invention.
- the directivity control unit 15 is replaced with the extraction unit 150.
- the extraction unit 150 emphasizes and extracts sound data from the sound source direction estimated by the sound source direction estimation unit 14 from the storage unit 12.
- Such a processing flow (not shown) is as follows.
- first, the sound data of the cue sound and the sound data of the voice uttered by the speaker to operate the electronic device (hereinafter referred to as the operation voice) are stored together in the storage unit 12.
- the determination unit 13 detects the cue sound.
- the sound source direction estimation unit 14 estimates the sound source direction of the cue sound.
- the extraction unit 150 extracts, with emphasis, the sound data arriving from the sound source direction of the cue sound estimated by the sound source direction estimation unit 14 from the sound data stored in the storage unit 12. For example, the extraction unit 150 may shift the sound data of the microphones 21 and 22 stored in the storage unit 12 by the time difference calculated from the sound source direction of the cue sound, bringing the channels in phase, so that the sound data from the sound source direction of the cue sound is emphasized.
- the voice recognition unit 16 performs voice recognition.
- the voice recognition unit 16 determines an operation command.
- the device control unit 17 gives an operation command to the electronic device and controls the electronic device.
- the microphones 21 and 22 are preferably omnidirectional microphones.
- the voice recognition device 10 according to the first modification may be configured as follows.
- first, the sound data of the operation voice and the sound data of the cue sound are stored together in the storage unit 12.
- the determination unit 13 detects the cue sound.
- the sound source direction estimation unit 14 estimates the sound source direction of the cue sound.
- the extraction unit 150 searches the sound data stored in the storage unit 12 for the sound data stored before the sound data of the cue sound, and extracts it with emphasis on the sound arriving from the sound source direction estimated by the sound source direction estimation unit 14.
- the voice recognition unit 16 performs voice recognition.
- the voice recognition unit 16 determines an operation command.
- the device control unit 17 gives an operation command to the electronic device and controls the electronic device.
- thus, the voice recognition device can recognize the voice not only when the speaker utters the operation voice after the cue sound, but also when the cue sound is made after the operation voice, so convenience can be improved.
- FIG. 9 is a block diagram of a speech recognition apparatus according to the second embodiment of the present invention.
- the speech recognition apparatus 100 differs from the speech recognition apparatus 10 according to the first embodiment in that it includes a determination unit 113 in place of the determination unit 13. The determination unit 113 will be described later.
- the speech recognition apparatus 100 uses, as a cue sound (hereinafter "cue sound 1"), a sound based on the custom of clapping the palms twice, which humans perform when drawing another person's attention.
- furthermore, the voice recognition device 100 can cancel the once-set directivity direction of the microphone array 11 and start accepting a new sound by using the sound of clapping the palms three times as cue sound 2.
- the determination content of the determination unit 113 is different from that of the voice recognition device 10.
- FIG. 10 is a flowchart showing an example of the processing flow of the speech recognition apparatus 100.
- the processing flow of the speech recognition apparatus 100 differs from that of the speech recognition apparatus 10 in that step S800 is inserted between step S107 and step S108, and in the processing content of steps S104 and S107.
- FIG. 11 is a diagram showing an example of a change in time of a sound of hitting the palm twice.
- the determination unit 113 searches the sound data stored in the storage unit 12 for sound data having a frequency of 1000 Hz or higher, and determines whether the intensity of that sound data has exceeded the predetermined intensity threshold twice within a predetermined time (S104).
- for example, the determination unit 113 determines whether the sound data having a frequency of 4000 Hz has exceeded the predetermined intensity threshold twice during the fixed time from 0 (s) to T (s).
- when the intensity of the sound data having a frequency of 1000 Hz or higher exceeds the predetermined intensity threshold twice within the predetermined time, the determination unit 113 can determine that cue sound 1 is present.
- the processing in step S107 is the same.
- by using as the cue sound a sound based on the custom of clapping the palms twice, which humans perform when drawing another person's attention, the cue can be distinguished more reliably from sudden noise such as the sound of a door closing, so voice recognition can be performed with high accuracy.
- in step S800, the determination unit 113 searches the sound data stored in the storage unit 12 for sound data having a frequency of 1000 Hz or higher, and determines whether the intensity of that sound data has exceeded the predetermined intensity threshold three times within a predetermined time (S800).
- when the intensity of the sound data having a frequency of 1000 Hz or higher exceeds the predetermined intensity threshold three times within the predetermined time, the determination unit 113 can determine that cue sound 2 is present.
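Counting how many times the intensity at the inspected frequency rises to the threshold distinguishes cue sound 1 (two claps) from cue sound 2 (three claps). The sketch below counts rising crossings over a stored window; the patent does not specify this exact algorithm, so the function and its details are illustrative assumptions.

```python
def count_threshold_crossings(intensities, threshold):
    """Count how many times the intensity rises from below the threshold to
    the threshold or above within the stored window -- a sketch of telling
    cue sound 1 (two crossings) from cue sound 2 (three crossings)."""
    count, above = 0, False
    for value in intensities:
        if value >= threshold and not above:
            count += 1  # a new rising crossing
        above = value >= threshold
    return count
```

A result of 2 would take the S104 "yes" branch, while 3 would take the S800 "yes" branch and cancel the directivity setting.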
- if the determination of the determination unit 113 in step S800 is yes, the process proceeds to step S112, and the directivity control unit 15 outputs a control signal to the microphone array 11 to cancel the setting of the directivity direction of the microphone array 11.
- the determination unit 113 outputs a new sound data storage start signal to the storage unit 12.
- the process returns to step S101, and the storage unit 12 temporarily stores new sound data.
- if the determination by the determination unit 113 in step S800 is no, the process proceeds to step S108.
- the speech recognition apparatus 100 can set the direction of the microphone array and cancel the setting by changing the number of times the speaker claps his hand.
- the cue sound 2 need not be three claps of the palms; any number of claps other than two may be used.
Abstract
Description
11 Microphone array
12 Storage unit
13, 113 Determination unit
14 Sound source direction estimation unit
15 Directivity control unit
16 Speech recognition unit
17 Device control unit
18 Display unit
20 Television receiver
21, 22 Microphone
29 Housing
50 Speech input unit
150 Extraction unit
Claims (3)
- A speech recognition apparatus comprising: a determination unit that determines whether a sound input to a speech input unit including a plurality of microphones contains a frequency of 1000 Hz or higher at a predetermined intensity or higher; a sound source direction estimation unit that, when the determination by the determination unit is true, estimates the sound source direction of the sound containing the frequency of 1000 Hz or higher; and a speech recognition unit that determines whether a sound arriving from the estimated sound source direction matches a pre-registered speech model.
- The speech recognition apparatus according to claim 1, wherein the determination unit further determines whether the sound containing the frequency of 1000 Hz or higher has been detected twice within a predetermined time.
- A speech recognition apparatus comprising: a determination unit that determines whether a sound input to a speech input unit including a plurality of microphones contains, at a predetermined intensity or higher, a cue sound generated by a speaker's action; a sound source direction estimation unit that, when the determination by the determination unit is true, estimates the sound source direction of the sound containing the cue sound; and a speech recognition unit that determines whether a sound arriving from the estimated sound source direction matches a pre-registered speech model.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011539182A JP5622744B2 (ja) | 2009-11-06 | 2009-11-06 | 音声認識装置 |
CN200980161199.3A CN102483918B (zh) | 2009-11-06 | 2009-11-06 | 声音识别装置 |
PCT/JP2009/005905 WO2011055410A1 (ja) | 2009-11-06 | 2009-11-06 | 音声認識装置 |
US13/430,264 US8762145B2 (en) | 2009-11-06 | 2012-03-26 | Voice recognition apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2009/005905 WO2011055410A1 (ja) | 2009-11-06 | 2009-11-06 | 音声認識装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/430,264 Continuation US8762145B2 (en) | 2009-11-06 | 2012-03-26 | Voice recognition apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011055410A1 true WO2011055410A1 (ja) | 2011-05-12 |
Family
ID=43969656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2009/005905 WO2011055410A1 (ja) | 2009-11-06 | 2009-11-06 | 音声認識装置 |
Country Status (4)
Country | Link |
---|---|
US (1) | US8762145B2 (ja) |
JP (1) | JP5622744B2 (ja) |
CN (1) | CN102483918B (ja) |
WO (1) | WO2011055410A1 (ja) |
Families Citing this family (177)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US9301073B2 (en) * | 2012-06-08 | 2016-03-29 | Apple Inc. | Systems and methods for determining the condition of multiple microphones |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
JP5367134B1 (ja) * | 2012-07-19 | 2013-12-11 | Nittobo Acoustic Engineering Co., Ltd. | Noise identification device and noise identification method |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
CN113470641B (zh) | 2013-02-07 | 2023-12-15 | Voice trigger for a digital assistant
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US10748529B1 (en) * | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
CN105378838A (zh) * | 2013-05-13 | 2016-03-02 | Thomson Licensing | Method, apparatus and system for isolating microphone audio
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
KR101772152B1 (ko) | 2013-06-09 | 2017-08-28 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
CN105453026A (zh) | 2013-08-06 | 2016-03-30 | Auto-activating smart responses based on activities from remote devices
JP5996603B2 (ja) * | 2013-10-31 | 2016-09-21 | Sharp Corporation | Server, utterance control method, utterance device, utterance system, and program |
WO2015072816A1 (ko) * | 2013-11-18 | 2015-05-21 | Samsung Electronics Co., Ltd. | Display apparatus and control method |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
CN104214890A (zh) * | 2014-01-20 | 2014-12-17 | Midea Group Co., Ltd. | Method for controlling air supply of an air conditioner by voice, and air conditioner |
CN103994541B (zh) * | 2014-04-21 | 2017-01-04 | Midea Group Co., Ltd. | Air direction switching method and system based on voice control |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
WO2015184186A1 (en) | 2014-05-30 | 2015-12-03 | Apple Inc. | Multi-command single utterance input method |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
CN104267618B (zh) * | 2014-07-31 | 2017-06-13 | Guangdong Midea Refrigeration Equipment Co., Ltd. | Voice control method and system based on infrared positioning |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
CN105864952B (zh) * | 2015-01-19 | 2019-06-21 | TCL Air Conditioner (Zhongshan) Co., Ltd. | Air conditioner and control method of the air conditioner |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10200824B2 (en) | 2015-05-27 | 2019-02-05 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10740384B2 (en) | 2015-09-08 | 2020-08-11 | Apple Inc. | Intelligent automated assistant for media search and playback |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10956666B2 (en) | 2015-11-09 | 2021-03-23 | Apple Inc. | Unconventional virtual assistant interactions |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
CN105788599B (zh) * | 2016-04-14 | 2019-08-06 | Beijing Xiaomi Mobile Software Co., Ltd. | Voice processing method, router, and intelligent voice control system |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179309B1 (en) | 2016-06-09 | 2018-04-23 | Apple Inc | Intelligent automated assistant in a home environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
CN106328128A (zh) * | 2016-08-16 | 2017-01-11 | Chengdu Heping Technology Co., Ltd. | Teaching system and method based on speech recognition technology |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | USER INTERFACE FOR CORRECTING RECOGNITION ERRORS |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
DK201770429A1 (en) | 2017-05-12 | 2018-12-14 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770411A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | MULTI-MODAL INTERFACES |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | FAR-FIELD EXTENSION FOR DIGITAL ASSISTANT SERVICES |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US20180336892A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Detecting a trigger of a digital assistant |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
CN108419168A (zh) * | 2018-01-19 | 2018-08-17 | Guangdong Genius Technology Co., Ltd. | Directional sound pickup method and apparatus for a sound pickup device, sound pickup device, and storage medium |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10847176B2 (en) * | 2018-03-12 | 2020-11-24 | Amazon Technologies, Inc. | Detection of TV state using sub-audible signal |
US10560737B2 (en) | 2018-03-12 | 2020-02-11 | Amazon Technologies, Inc. | Voice-controlled multimedia device |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
DK179822B1 (da) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
KR20200043075A (ko) * | 2018-10-17 | 2020-04-27 | Samsung Electronics Co., Ltd. | Electronic device, control method thereof, and sound output control system of the electronic device |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
CN109640112B (zh) * | 2019-01-15 | 2021-11-23 | Guangzhou Huya Information Technology Co., Ltd. | Video processing method, apparatus, device, and storage medium |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
DK201970510A1 (en) | 2019-05-31 | 2021-02-11 | Apple Inc | Voice identification in digital assistant systems |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | USER ACTIVITY SHORTCUT SUGGESTIONS |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11468890B2 (en) | 2019-06-01 | 2022-10-11 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
WO2021056255A1 (en) | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11183193B1 (en) | 2020-05-11 | 2021-11-23 | Apple Inc. | Digital assistant hardware abstraction |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
CN111609515A (zh) * | 2020-05-26 | 2020-09-01 | Gree Electric Appliances, Inc. of Zhuhai | Method for removing a honeycomb from an air conditioner outdoor unit, and air conditioner outdoor unit |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
CN112770224B (zh) * | 2020-12-30 | 2022-07-05 | Shanghai Quectel Wireless Solutions Co., Ltd. | In-vehicle sound source acquisition system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3069036U (ja) * | 1999-10-28 | 2000-05-30 | 株式会社タイメクス | Hand-clap detection device |
JP2002247569A (ja) * | 2001-02-20 | 2002-08-30 | Nippon Syst Design Kk | Full-length mirror using an endless frame memory |
JP2007121579A (ja) * | 2005-10-26 | 2007-05-17 | Matsushita Electric Works Ltd | Operating device |
JP2007221300A (ja) * | 2006-02-15 | 2007-08-30 | Fujitsu Ltd | Robot and robot control method |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6219645B1 (en) * | 1999-12-02 | 2001-04-17 | Lucent Technologies, Inc. | Enhanced automatic speech recognition using multiple directional microphones |
US6449593B1 (en) * | 2000-01-13 | 2002-09-10 | Nokia Mobile Phones Ltd. | Method and system for tracking human speakers |
GB2364121B (en) * | 2000-06-30 | 2004-11-24 | Mitel Corp | Method and apparatus for locating a talker |
US6820056B1 (en) * | 2000-11-21 | 2004-11-16 | International Business Machines Corporation | Recognizing non-verbal sound commands in an interactive computer controlled speech word recognition display system |
JP3771812B2 (ja) * | 2001-05-28 | 2006-04-26 | International Business Machines Corporation | Robot and control method thereof |
DE10133126A1 (de) * | 2001-07-07 | 2003-01-16 | Philips Corp Intellectual Pty | Direction-sensitive audio recording system with display of the recording area and/or interference source |
JP3940662B2 (ja) * | 2001-11-22 | 2007-07-04 | Kabushiki Kaisha Toshiba | Acoustic signal processing method, acoustic signal processing device, and speech recognition device |
JP4195267B2 (ja) * | 2002-03-14 | 2008-12-10 | International Business Machines Corporation | Speech recognition device, speech recognition method therefor, and program |
US7418392B1 (en) * | 2003-09-25 | 2008-08-26 | Sensory, Inc. | System and method for controlling the operation of a device by voice commands |
KR100754384B1 (ko) * | 2003-10-13 | 2007-08-31 | Samsung Electronics Co., Ltd. | Noise-robust speaker position estimation method and apparatus, and camera control system using the same |
JP4516527B2 (ja) * | 2003-11-12 | 2010-08-04 | Honda Motor Co., Ltd. | Speech recognition device |
US8271200B2 (en) * | 2003-12-31 | 2012-09-18 | Sieracki Jeffrey M | System and method for acoustic signature extraction, detection, discrimination, and localization |
DE102004049347A1 (de) * | 2004-10-08 | 2006-04-20 | Micronas Gmbh | Circuit arrangement and method for audio signals containing speech |
WO2006059806A1 (ja) * | 2004-12-03 | 2006-06-08 | Honda Motor Co., Ltd. | Speech recognition device |
JP4729927B2 (ja) * | 2005-01-11 | 2011-07-20 | Sony Corporation | Voice detection device, automatic imaging device, and voice detection method |
JP4247195B2 (ja) | 2005-03-23 | 2009-04-02 | Kabushiki Kaisha Toshiba | Acoustic signal processing device, acoustic signal processing method, acoustic signal processing program, and recording medium recording the acoustic signal processing program |
US8103504B2 (en) | 2006-08-28 | 2012-01-24 | Victor Company Of Japan, Limited | Electronic appliance and voice signal processing method for use in the same |
CN100527185C (zh) * | 2006-08-28 | 2009-08-12 | Victor Company Of Japan, Limited | Electronic appliance and audio signal processing method for use therein |
JP4234746B2 (ja) | 2006-09-25 | 2009-03-04 | Kabushiki Kaisha Toshiba | Acoustic signal processing device, acoustic signal processing method, and acoustic signal processing program |
US8321214B2 (en) * | 2008-06-02 | 2012-11-27 | Qualcomm Incorporated | Systems, methods, and apparatus for multichannel signal amplitude balancing |
JP2010072507A (ja) | 2008-09-22 | 2010-04-02 | Toshiba Corp | Speech recognition search device and speech recognition search method |
JP5646146B2 (ja) | 2009-03-18 | 2014-12-24 | Kabushiki Kaisha Toshiba | Voice input device, speech recognition system, and speech recognition method |
JP5771002B2 (ja) | 2010-12-22 | 2015-08-26 | Kabushiki Kaisha Toshiba | Speech recognition device, speech recognition method, and television receiver equipped with the speech recognition device |
2009
- 2009-11-06 WO PCT/JP2009/005905 patent/WO2011055410A1/ja active Application Filing
- 2009-11-06 JP JP2011539182A patent/JP5622744B2/ja not_active Expired - Fee Related
- 2009-11-06 CN CN200980161199.3A patent/CN102483918B/zh not_active Expired - Fee Related
2012
- 2012-03-26 US US13/430,264 patent/US8762145B2/en not_active Expired - Fee Related
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9154848B2 (en) | 2011-03-01 | 2015-10-06 | Kabushiki Kaisha Toshiba | Television apparatus and a remote operation apparatus |
JP2015535952A (ja) * | 2012-09-29 | 2015-12-17 | Shenzhen Prtek Co., Ltd. | Voice control system and method for a multimedia device, and computer storage medium |
US9955210B2 (en) | 2012-09-29 | 2018-04-24 | Shenzhen Prtek Co. Ltd. | Multimedia device voice control system and method, and computer storage medium |
JP2015064580A (ja) * | 2013-09-24 | 2015-04-09 | Power Voice Co., Ltd. | Encoding apparatus and method for encoding a sound code, and decoding apparatus and method for decoding the sound code |
US9515748B2 (en) | 2013-09-24 | 2016-12-06 | Powervoice Co., Ltd. | Encoding apparatus and method for encoding sound code, decoding apparatus and method for decoding the sound code |
JP2017539187A (ja) * | 2015-10-28 | 2017-12-28 | Xiaomi Inc. | Voice control method and apparatus for a smart device, program, recording medium, control device, and smart device |
JP2021076874A (ja) * | 2021-02-17 | 2021-05-20 | Nippon Telegraph And Telephone Corporation | Speaker direction enhancement device, speaker direction enhancement method, and program |
JP7111206B2 (ja) | 2021-02-17 | 2022-08-02 | Nippon Telegraph And Telephone Corporation | Speaker direction enhancement device, speaker direction enhancement method, and program |
Also Published As
Publication number | Publication date |
---|---|
US20120245932A1 (en) | 2012-09-27 |
CN102483918B (zh) | 2014-08-20 |
JP5622744B2 (ja) | 2014-11-12 |
JPWO2011055410A1 (ja) | 2013-03-21 |
CN102483918A (zh) | 2012-05-30 |
US8762145B2 (en) | 2014-06-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5622744B2 (ja) | Speech recognition device | |
CN109599124B (zh) | Audio data processing method, apparatus, and storage medium | |
CN110785808B (zh) | Audio device with wake-up word detection | |
US11158333B2 (en) | Multi-stream target-speech detection and channel fusion | |
TWI840587B (zh) | Multimodal user interface |
US7885818B2 (en) | Controlling an apparatus based on speech | |
JP6450139B2 (ja) | Speech recognition device, speech recognition method, and speech recognition program |
DK2603018T3 (da) | Hearing device with voice activity detection and method for operating a hearing device |
CN110428806B (zh) | Electronic device, method, and medium for voice-interaction wake-up based on microphone signals |
CN110097875B (zh) | Electronic device, method, and medium for voice-interaction wake-up based on microphone signals |
CN110223711B (zh) | Electronic device, method, and medium for voice-interaction wake-up based on microphone signals |
US10109294B1 (en) | Adaptive echo cancellation | |
US8767987B2 (en) | Ear contact pressure wave hearing aid switch | |
GB2608710A (en) | Speaker identification | |
WO2022151657A1 (zh) | Noise reduction method and apparatus, audio device, and computer-readable storage medium |
US11290802B1 (en) | Voice detection using hearable devices | |
US20070198268A1 (en) | Method for controlling a speech dialog system and speech dialog system | |
US20210201928A1 (en) | Integrated speech enhancement for voice trigger application | |
KR20230084154A (ko) | User voice activity detection using a dynamic classifier |
JP2007264132A (ja) | Voice detection device and method |
JP4635683B2 (ja) | Speech recognition device and method |
US20240079007A1 (en) | System and method for detecting a wakeup command for a voice assistant | |
JP7429107B2 (ja) | Speech translation device, speech translation method, and program therefor |
JP2009025518A (ja) | Voice dialogue device |
JP2008225001A (ja) | Speech recognition device, speech recognition method, and speech recognition program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 200980161199.3; Country of ref document: CN |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09851067; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: 2011539182; Country of ref document: JP |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 09851067; Country of ref document: EP; Kind code of ref document: A1 |