CN116364062A - Speech recognition method, device, and vehicle - Google Patents
Speech recognition method, device, and vehicle
- Publication number
- CN116364062A (application CN202310618669.6A)
- Authority
- CN
- China
- Prior art keywords
- training
- preset
- acoustic
- vector
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Name | Domain | Sections | Count | Query match |
---|---|---|---|---|
method | Methods | title, claims, abstract, description | 47 | 0.000 |
vector | Substances | claims, abstract, description | 225 | 0.000 |
training | Methods | claims, abstract, description | 144 | 0.000 |
extraction | Methods | claims, abstract, description | 24 | 0.000 |
matrix material | Substances | claims, abstract, description | 19 | 0.000 |
marker | Substances | claims, description | 43 | 0.000 |
processing | Methods | claims, description | 16 | 0.000 |
analysis | Methods | claims, description | 5 | 0.000 |
sampling | Methods | claims, description | 5 | 0.000 |
engineering process | Methods | description | 13 | 0.000 |
sound signal | Effects | description | 9 | 0.000 |
computer program | Methods | description | 7 | 0.000 |
diagram | Methods | description | 6 | 0.000 |
process | Effects | description | 6 | 0.000 |
interaction | Effects | description | 5 | 0.000 |
detection method | Methods | description | 4 | 0.000 |
effects | Effects | description | 4 | 0.000 |
monitoring process | Methods | description | 4 | 0.000 |
response | Effects | description | 4 | 0.000 |
optical effect | Effects | description | 3 | 0.000 |
persistent effect | Effects | description | 3 | 0.000 |
calculation method | Methods | description | 2 | 0.000 |
functions | Effects | description | 2 | 0.000 |
accumulation | Methods | description | 1 | 0.000 |
alteration | Effects | description | 1 | 0.000 |
beneficial effect | Effects | description | 1 | 0.000 |
delayed effect | Effects | description | 1 | 0.000 |
design | Methods | description | 1 | 0.000 |
dual-layer | Substances | description | 1 | 0.000 |
filtration | Methods | description | 1 | 0.000 |
framing | Methods | description | 1 | 0.000 |
improvement | Effects | description | 1 | 0.000 |
modification | Methods | description | 1 | 0.000 |
modification | Effects | description | 1 | 0.000 |
semiconductor | Substances | description | 1 | 0.000 |
static effect | Effects | description | 1 | 0.000 |
temporal effect | Effects | description | 1 | 0.000 |
testing method | Methods | description | 1 | 0.000 |
transient effect | Effects | description | 1 | 0.000 |
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R16/00—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
- B60R16/02—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
- B60R16/037—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
- B60R16/0373—Voice control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
- B60W40/08—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W50/08—Interaction between the driver and the control system
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
- B60W40/08—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
- B60W2040/089—Driver voice
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2540/00—Input parameters relating to occupants
- B60W2540/21—Voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Automation & Control Theory (AREA)
- Mechanical Engineering (AREA)
- Transportation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310618669.6A CN116364062B (zh) | 2023-05-30 | 2023-05-30 | Speech recognition method, device, and vehicle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310618669.6A CN116364062B (zh) | 2023-05-30 | 2023-05-30 | Speech recognition method, device, and vehicle |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116364062A (zh) | 2023-06-30 |
CN116364062B (zh) | 2023-08-25 |
Family
ID=86922514
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310618669.6A (Active) CN116364062B (zh) | 2023-05-30 | 2023-05-30 | Speech recognition method, device, and vehicle |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116364062B (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
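CN117496972A (zh) * | 2023-12-29 | 2024-02-02 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Audio recognition method, audio recognition device, vehicle, and computer equipment |
CN117524199A (zh) * | 2024-01-04 | 2024-02-06 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Speech recognition method, device, and vehicle |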
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150340034A1 (en) * | 2014-05-22 | 2015-11-26 | Google Inc. | Recognizing speech using neural networks |
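JP2015219480A (ja) * | 2014-05-21 | 2015-12-07 | Nippon Telegraph and Telephone Corporation | Dialogue situation feature calculation device, sentence-end symbol estimation device, methods thereof, and program |
CN108766418A (zh) * | 2018-05-24 | 2018-11-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice endpoint recognition method, device, and equipment |
WO2019156162A1 (ja) * | 2018-02-08 | 2019-08-15 | Nippon Telegraph and Telephone Corporation | Target utterance estimation model learning device, target utterance determination device, target utterance estimation model learning method, target utterance determination method, and program |
US20190362022A1 (en) * | 2018-05-25 | 2019-11-28 | Risto Haukioja | Audio file labeling process for building datasets at scale |
CN110827795A (zh) * | 2018-08-07 | 2020-02-21 | Alibaba Group Holding Ltd. | Voice input end determination method, device, equipment, system, and storage medium |
US20200117996A1 (en) * | 2017-06-06 | 2020-04-16 | Google Llc | Unified Endpointer Using Multitask and Multidomain Learning |
WO2020214269A1 (en) * | 2019-04-16 | 2020-10-22 | Google Llc | Joint endpointing and automatic speech recognition |
WO2022134894A1 (zh) * | 2020-12-23 | 2022-06-30 | Tencent Technology (Shenzhen) Co., Ltd. | Speech recognition method, device, computer equipment, and storage medium |
CN115910046A (zh) * | 2022-10-31 | 2023-04-04 | iFlytek Co., Ltd. | Speech recognition method, device, electronic equipment, and storage medium |
CN115910043A (zh) * | 2023-01-10 | 2023-04-04 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Speech recognition method, device, and vehicle |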
2023
- 2023-05-30: CN application CN202310618669.6A, patent CN116364062B (zh), status Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
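JP2015219480A (ja) * | 2014-05-21 | 2015-12-07 | Nippon Telegraph and Telephone Corporation | Dialogue situation feature calculation device, sentence-end symbol estimation device, methods thereof, and program |
US20150340034A1 (en) * | 2014-05-22 | 2015-11-26 | Google Inc. | Recognizing speech using neural networks |
US20200117996A1 (en) * | 2017-06-06 | 2020-04-16 | Google Llc | Unified Endpointer Using Multitask and Multidomain Learning |
WO2019156162A1 (ja) * | 2018-02-08 | 2019-08-15 | Nippon Telegraph and Telephone Corporation | Target utterance estimation model learning device, target utterance determination device, target utterance estimation model learning method, target utterance determination method, and program |
CN108766418A (zh) * | 2018-05-24 | 2018-11-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice endpoint recognition method, device, and equipment |
US20190362022A1 (en) * | 2018-05-25 | 2019-11-28 | Risto Haukioja | Audio file labeling process for building datasets at scale |
CN110827795A (zh) * | 2018-08-07 | 2020-02-21 | Alibaba Group Holding Ltd. | Voice input end determination method, device, equipment, system, and storage medium |
WO2020214269A1 (en) * | 2019-04-16 | 2020-10-22 | Google Llc | Joint endpointing and automatic speech recognition |
WO2022134894A1 (zh) * | 2020-12-23 | 2022-06-30 | Tencent Technology (Shenzhen) Co., Ltd. | Speech recognition method, device, computer equipment, and storage medium |
CN115910046A (zh) * | 2022-10-31 | 2023-04-04 | iFlytek Co., Ltd. | Speech recognition method, device, electronic equipment, and storage medium |
CN115910043A (zh) * | 2023-01-10 | 2023-04-04 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Speech recognition method, device, and vehicle |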
Non-Patent Citations (2)
Title |
---|
SHUBHAM TOSHNIWAL: "Multilingual Speech Recognition with a Single End-to-End Model", 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, pages 4904 - 4908 * |
LI Aizhen: "Detection of Chinese interrogative sentences based on acoustic feature salience", China Sciencepaper, pages 826 - 829 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
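CN117496972A (zh) * | 2023-12-29 | 2024-02-02 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Audio recognition method, audio recognition device, vehicle, and computer equipment |
CN117496972B (zh) * | 2023-12-29 | 2024-04-16 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Audio recognition method, audio recognition device, vehicle, and computer equipment |
CN117524199A (zh) * | 2024-01-04 | 2024-02-06 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Speech recognition method, device, and vehicle |
CN117524199B (zh) * | 2024-01-04 | 2024-04-16 | Guangzhou Xiaopeng Motors Technology Co., Ltd. | Speech recognition method, device, and vehicle |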
Also Published As
Publication number | Publication date |
---|---|
CN116364062B (zh) | 2023-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
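CN116364062B (zh) | Speech recognition method, device, and vehicle | |
US10937448B2 (en) | Voice activity detection method and apparatus | |
CN108962227B (zh) | Speech start point and end point detection method, device, computer equipment, and storage medium | |
CN110097870B (zh) | Speech processing method, device, equipment, and storage medium | |
CN115910043B (zh) | Speech recognition method, device, and vehicle | |
CN108710704B (zh) | Dialogue state determination method, device, electronic equipment, and storage medium | |
US10431201B1 (en) | Analyzing messages with typographic errors due to phonemic spellings using text-to-speech and speech-to-text algorithms | |
CN115862600B (zh) | Speech recognition method, device, and vehicle | |
CN115910044B (zh) | Speech recognition method, device, and vehicle | |
JP7544989B2 (ja) | Lookup-table recurrent language model | |
US12014725B2 (en) | Large-scale language model data selection for rare-word speech recognition | |
US20200365144A1 (en) | Method and apparatus for speech recognition | |
CN113611316A (zh) | Human-computer interaction method, device, equipment, and storage medium | |
CN117043856A (zh) | Efficient streaming non-recurrent on-device end-to-end model | |
CN113160854A (zh) | Voice interaction system, related method, device, and equipment | |
CN112397053B (zh) | Speech recognition method, device, electronic equipment, and readable storage medium | |
WO2019107170A1 (ja) | Urgency estimation device, urgency estimation method, and program | |
CN113823258A (zh) | Speech processing method and device | |
JP7098587B2 (ja) | Information processing device, keyword detection device, information processing method, and program | |
US20230343332A1 (en) | Joint Segmenting and Automatic Speech Recognition | |
US12073825B2 (en) | Method and apparatus for speech recognition | |
CN117524199B (zh) | Speech recognition method, device, and vehicle | |
CN114048714A (zh) | Inverse text normalization method and device | |
CN114203180A (zh) | Meeting minutes generation method, device, electronic equipment, and storage medium | |
CN116312485B (zh) | Speech recognition method, device, and vehicle |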
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20230928 Address after: Room 201, 2nd Floor, Experimental Building, No. 16 Luohu South Street, High tech Zone, Zhaoqing City, Guangdong Province, 526238 Patentee after: Zhaoqing Xiaopeng Intelligent Manufacturing Research Institute Co.,Ltd. Address before: 510000 No.8 Songgang street, Cencun, Tianhe District, Guangzhou City, Guangdong Province Patentee before: GUANGZHOU XIAOPENG MOTORS TECHNOLOGY Co.,Ltd. |
TR01 | Transfer of patent right | ||
Effective date of registration: 20250102 Address after: Room 201, 2nd Floor, Experimental Building, No. 16 Luohu South Street, High tech Zone, Zhaoqing City, Guangdong Province, China 526200 Patentee after: Zhaoqing Xiaopeng Intelligent Manufacturing Research Institute Co.,Ltd. Country or region after: China Patentee after: GUANGZHOU XIAOPENG MOTORS TECHNOLOGY Co.,Ltd. Address before: Room 201, 2nd Floor, Experimental Building, No. 16 Luohu South Street, High tech Zone, Zhaoqing City, Guangdong Province, 526238 Patentee before: Zhaoqing Xiaopeng Intelligent Manufacturing Research Institute Co.,Ltd. Country or region before: China |