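CN112639965A - Speech recognition method and apparatus in environment including plurality of apparatuses - Google Patents
Speech recognition method and apparatus in environment including plurality of apparatuses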
- Publication number
- CN112639965A (application CN201980055917.2A)
- Authority
- CN
- China
- Prior art keywords
- speaker
- speech recognition
- speech
- recognition
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
Concept ID | Name | Type | Query Match | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 62 |
230000005236 | sound signal | Effects | 0.000 | claims, abstract, description | 40 |
230000008859 | change | Effects | 0.000 | claims, description | 9 |
230000004044 | response | Effects | 0.000 | claims, description | 7 |
238000013473 | artificial intelligence | Methods | 0.000 | abstract, description | 27 |
238000010801 | machine learning | Methods | 0.000 | abstract, description | 6 |
238000013135 | deep learning | Methods | 0.000 | abstract, description | 5 |
238000010586 | diagram | Methods | 0.000 | description | 31 |
230000006870 | function | Effects | 0.000 | description | 16 |
238000004590 | computer program | Methods | 0.000 | description | 15 |
238000012545 | processing | Methods | 0.000 | description | 15 |
238000003860 | storage | Methods | 0.000 | description | 13 |
238000004891 | communication | Methods | 0.000 | description | 11 |
238000012549 | training | Methods | 0.000 | description | 11 |
238000013528 | artificial neural network | Methods | 0.000 | description | 8 |
239000000284 | extract | Substances | 0.000 | description | 7 |
238000013507 | mapping | Methods | 0.000 | description | 5 |
239000013598 | vector | Substances | 0.000 | description | 5 |
238000009826 | distribution | Methods | 0.000 | description | 4 |
230000008569 | process | Effects | 0.000 | description | 4 |
230000003044 | adaptive effect | Effects | 0.000 | description | 3 |
230000003213 | activating effect | Effects | 0.000 | description | 2 |
230000005540 | biological transmission | Effects | 0.000 | description | 2 |
238000013527 | convolutional neural network | Methods | 0.000 | description | 2 |
238000005516 | engineering process | Methods | 0.000 | description | 2 |
238000007726 | management method | Methods | 0.000 | description | 2 |
238000003825 | pressing | Methods | 0.000 | description | 2 |
230000000007 | visual effect | Effects | 0.000 | description | 2 |
238000005406 | washing | Methods | 0.000 | description | 2 |
230000002457 | bidirectional effect | Effects | 0.000 | description | 1 |
230000015572 | biosynthetic process | Effects | 0.000 | description | 1 |
238000010276 | construction | Methods | 0.000 | description | 1 |
230000007423 | decrease | Effects | 0.000 | description | 1 |
238000010295 | mobile communication | Methods | 0.000 | description | 1 |
238000012544 | monitoring process | Methods | 0.000 | description | 1 |
238000003058 | natural language processing | Methods | 0.000 | description | 1 |
238000005457 | optimization | Methods | 0.000 | description | 1 |
230000000306 | recurrent effect | Effects | 0.000 | description | 1 |
230000002787 | reinforcement | Effects | 0.000 | description | 1 |
238000001228 | spectrum | Methods | 0.000 | description | 1 |
238000003786 | synthesis reaction | Methods | 0.000 | description | 1 |
230000009466 | transformation | Effects | 0.000 | description | 1 |
238000013519 | translation | Methods | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/12—Score normalisation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2018-0127696 | 2018-10-24 | ||
KR20180127696 | 2018-10-24 | ||
KR1020190110772A KR20200047311A (ko) | 2018-10-24 | 2019-09-06 | Speech recognition method and apparatus in environment including plurality of apparatuses |
KR10-2019-0110772 | 2019-09-06 | ||
PCT/KR2019/013903 WO2020085769A1 (fr) | 2018-10-24 | 2019-10-22 | Speech recognition method and apparatus in environment including plurality of apparatuses |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112639965A (zh) | 2021-04-09 |
Family
ID=70733911
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201980055917.2A Pending CN112639965A (zh) | 2018-10-24 | 2019-10-22 | Speech recognition method and apparatus in environment including plurality of apparatuses |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3797414A4 (fr) |
KR (1) | KR20200047311A (fr) |
CN (1) | CN112639965A (fr) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11915697B2 (en) | 2020-11-11 | 2024-02-27 | Samsung Electronics Co., Ltd. | Electronic device, system and control method thereof |
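KR20220099831A (ko) * | 2021-01-07 | 2022-07-14 | Samsung Electronics Co., Ltd. | Electronic device and method for processing user utterance in electronic device |
KR20240092249A (ko) * | 2022-12-14 | 2024-06-24 | Samsung Electronics Co., Ltd. | Electronic device and operating method thereof |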
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130073293A1 (en) * | 2011-09-20 | 2013-03-21 | Lg Electronics Inc. | Electronic device and method for controlling the same |
US10026399B2 (en) * | 2015-09-11 | 2018-07-17 | Amazon Technologies, Inc. | Arbitration between voice-enabled devices |
WO2018067528A1 (fr) * | 2016-10-03 | 2018-04-12 | Google Llc | Device leadership negotiation among voice interface devices |
US10559309B2 (en) * | 2016-12-22 | 2020-02-11 | Google Llc | Collaborative voice controlled devices |
2019
- 2019-09-06: KR application KR1020190110772A (publication KR20200047311A), status unknown
- 2019-10-22: EP application EP19874900.4A (publication EP3797414A4), not active, withdrawn
- 2019-10-22: CN application CN201980055917.2A (publication CN112639965A), active, pending
Also Published As
Publication number | Publication date |
---|---|
EP3797414A1 (fr) | 2021-03-31 |
EP3797414A4 (fr) | 2021-08-25 |
KR20200047311A (ko) | 2020-05-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10607597B2 (en) | Speech signal recognition system and method | |
US11687319B2 (en) | Speech recognition method and apparatus with activation word based on operating environment of the apparatus | |
CN110288987B (zh) | System for processing sound data and method of controlling the system | |
US9443527B1 (en) | Speech recognition capability generation and control | |
US20200135212A1 (en) | Speech recognition method and apparatus in environment including plurality of apparatuses | |
US11380326B2 (en) | Method and apparatus for performing speech recognition with wake on voice (WoV) | |
TWI619114B (zh) | Method and system of environment-sensitive automatic speech recognition | |
CN110310623B (zh) | Sample generation method, model training method, apparatus, medium, and electronic device | |
JP7173758B2 (ja) | Personalized speech recognition method, and user terminal and server performing the same | |
CN112074900B (zh) | Audio analysis for natural language processing | |
EP3533052B1 (fr) | Speech recognition method and apparatus | |
CN112639965A (zh) | Speech recognition method and apparatus in environment including plurality of apparatuses | |
KR102531654B1 (ko) | Voice input authentication device and method thereof | |
CN112384974B (zh) | Electronic device and method for providing or obtaining data for training the electronic device | |
CN114762038A (zh) | Automatic turn description in multi-turn dialogue | |
CN111145735B (zh) | Electronic device and operating method thereof | |
US11830501B2 (en) | Electronic device and operation method for performing speech recognition | |
KR20200051462A (ko) | Electronic apparatus and operating method thereof | |
KR102677052B1 (ko) | System and method for providing voice assistant service | |
US10803868B2 (en) | Sound output system and voice processing method | |
US20230126305A1 (en) | Method of identifying target device based on reception of utterance and electronic device therefor | |
CN116686046A (zh) | Electronic device and control method thereof | |
US20240212681A1 (en) | Voice recognition device having barge-in function and method thereof | |
KR20200021400A (ko) | Electronic device performing speech recognition and operating method thereof | |
CN116635933A (zh) | Electronic device including personalized text-to-speech module and control method thereof | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2021-04-09 |