CN113628613A - Two-stage user-customizable wake word detection - Google Patents
Two-stage user-customizable wake word detection
- Publication number
- CN113628613A
- Authority
- CN
- China
- Prior art keywords
- model
- training
- detected utterance
- determining
- utterance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/451—Execution arrangements for user interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Electrically Operated Instructional Devices (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063020984P | 2020-05-06 | 2020-05-06 | |
US63/020,984 | 2020-05-06 | | |
US17/032,653 US11783818B2 (en) | 2020-05-06 | 2020-09-25 | Two stage user customizable wake word detection |
US17/032,653 | 2020-09-25 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113628613A (zh) | 2021-11-09 |
Family ID: 78232057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110467675.7A, Pending, CN113628613A (zh) | Two-stage user-customizable wake word detection | 2020-05-06 | 2021-04-28 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113628613A (zh) |
DE (1) | DE102021111594A1 (de) |
2021
- 2021-04-28 CN: CN202110467675.7A, published as CN113628613A (zh), active, Pending
- 2021-05-05 DE: DE102021111594.9A, published as DE102021111594A1 (de), active, Pending
Also Published As
Publication number | Publication date |
---|---|
DE102021111594A1 (de) | 2021-11-11 |
DE102021111594A9 (de) | 2022-01-27 |
Similar Documents
Publication | Title |
---|---|
US20230082944A1 (en) | Techniques for language independent wake-up word detection |
CN110364143B (zh) | Voice wake-up method and apparatus, and intelligent electronic device thereof |
CN107481718B (zh) | Speech recognition method and apparatus, storage medium, and electronic device |
US9775113B2 (en) | Voice wakeup detecting device with digital microphone and associated method |
US10304440B1 (en) | Keyword spotting using multi-task configuration |
US10485049B1 (en) | Wireless device connection handover |
CN105765650B (zh) | Speech recognizer with multi-directional decoding |
JP3674990B2 (ja) | Speech recognition dialogue device and speech recognition dialogue processing method |
CN109346075A (zh) | Method and system for recognizing user speech through human body vibration to control an electronic device |
CN105704298A (zh) | Voice wake-up detection device and method |
US20140201639A1 (en) | Audio user interface apparatus and method |
CN109272991B (zh) | Voice interaction method, apparatus, device, and computer-readable storage medium |
US10477294B1 (en) | Multi-device audio capture |
US11308946B2 (en) | Methods and apparatus for ASR with embedded noise reduction |
WO2023029615A1 (zh) | Voice wake-up method, apparatus, device, storage medium, and program product |
US11064281B1 (en) | Sending and receiving wireless data |
CN101350196A (zh) | Task-dependent speaker identity verification system-on-chip and verification method thereof |
US11783818B2 (en) | Two stage user customizable wake word detection |
CN113628613A (zh) | Two-stage user-customizable wake word detection |
Adnene et al. | Design and implementation of an automatic speech recognition based voice control system |
US20210210109A1 (en) | Adaptive decoder for highly compressed grapheme model |
JP3846500B2 (ja) | Speech recognition dialogue device and speech recognition dialogue processing method |
US20230386458A1 (en) | Pre-wakeword speech processing |
US20240079004A1 (en) | System and method for receiving a voice command |
US12027156B2 (en) | Noise robust representations for keyword spotting systems |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |