WO2023211369A3 - Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device - Google Patents
Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device
- Publication number
- WO2023211369A3 (application PCT/SG2023/050236)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech recognition
- target
- recognition model
- named entity
- generation method
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
The present disclosure relates to a speech recognition model generation method and apparatus, a speech recognition method and apparatus, a medium, and a device. The speech recognition model generation method comprises: obtaining a target named entity word list comprising a plurality of named entity words; screening preset text data on the basis of the named entity words in the target named entity word list to obtain target text data containing the named entity words; performing speech synthesis processing on the target text data to determine target audio data; determining target training data on the basis of the target audio data; and retraining a pre-trained speech recognition model on the basis of initial training data and the target training data to obtain a target speech recognition model, the initial training data being the audio data used in training to obtain the pre-trained speech recognition model. The target speech recognition model obtained by this generation method can improve the recognition accuracy of named entity words.
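The data-preparation steps in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the disclosure: the substring-based screening rule, the `Sample` type, the `fake_tts` placeholder synthesizer, and all function names are assumptions made for illustration, and the retraining step itself is left out because the abstract does not specify the model or how the two data sets are mixed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    """One training example (hypothetical type): audio plus its transcript."""
    audio: bytes
    text: str


def screen_text(sentences: Sequence[str], entity_words: Sequence[str]) -> List[str]:
    """Keep sentences containing at least one named entity word.

    Assumption: plain substring matching; the abstract does not say how
    the screening is performed.
    """
    return [s for s in sentences if any(w in s for w in entity_words)]


def fake_tts(text: str) -> bytes:
    """Placeholder synthesizer; a real system would return synthesized speech."""
    return text.encode("utf-8")


def build_target_training_data(
    texts: Sequence[str], synthesize: Callable[[str], bytes] = fake_tts
) -> List[Sample]:
    """Synthesize audio for each screened sentence and pair it with its text."""
    return [Sample(audio=synthesize(t), text=t) for t in texts]


def retraining_set(initial: Sequence[Sample], target: Sequence[Sample]) -> List[Sample]:
    """Combine the initial training data with the synthesized target data."""
    return list(initial) + list(target)


# Hypothetical inputs, for illustration only.
entity_words = ["Singapore", "Beijing"]
preset_text = [
    "I flew to Singapore last week.",
    "The weather is nice.",
    "Beijing is crowded.",
]

target_text = screen_text(preset_text, entity_words)
target_data = build_target_training_data(target_text)
initial_data = [Sample(audio=b"\x00\x01", text="hello world")]
combined = retraining_set(initial_data, target_data)
print(len(target_text), len(combined))
```

In this sketch the pre-trained model would then be trained again on `combined`, so that it sees the synthesized named-entity utterances alongside its original data.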
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210441630.7A CN114765025A (en) | 2022-04-25 | 2022-04-25 | Method for generating and recognizing speech recognition model, device, medium and equipment |
CN202210441630.7 | 2022-04-25 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2023211369A2 (en) | 2023-11-02 |
WO2023211369A3 (en) | 2024-03-21 |
Family
ID=82364996
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2023/050236 WO2023211369A2 (en) | 2022-04-25 | 2023-04-06 | Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114765025A (en) |
WO (1) | WO2023211369A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117174084B (en) * | 2023-11-02 | 2024-05-31 | 摩尔线程智能科技(北京)有限责任公司 | Training data construction method and device, electronic equipment and storage medium |
CN117935787B (en) * | 2024-03-22 | 2024-05-31 | 摩尔线程智能科技(北京)有限责任公司 | Data screening and labeling method and device, electronic equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109346064A (en) * | 2018-12-13 | 2019-02-15 | 苏州思必驰信息科技有限公司 | Training method and system for end-to-end speech identification model |
CN110675864A (en) * | 2019-09-12 | 2020-01-10 | 上海依图信息技术有限公司 | Voice recognition method and device |
CN110827791A (en) * | 2019-09-09 | 2020-02-21 | 西北大学 | Edge-device-oriented speech recognition-synthesis combined modeling method |
US20200357388A1 (en) * | 2019-05-10 | 2020-11-12 | Google Llc | Using Context Information With End-to-End Models for Speech Recognition |
US20210304769A1 (en) * | 2020-03-31 | 2021-09-30 | Microsoft Technology Licensing, Llc | Generating and using text-to-speech data for speech recognition models |
CN113470626A (en) * | 2021-06-30 | 2021-10-01 | 北京有竹居网络技术有限公司 | Training method, device and equipment of voice recognition model |
CN113782013A (en) * | 2021-09-15 | 2021-12-10 | 北京百度网讯科技有限公司 | Method, apparatus, storage medium, and program product for speech recognition and model training |
US20220115000A1 (en) * | 2020-10-12 | 2022-04-14 | Google Llc | On-device personalization of speech synthesis for training of speech recognition model(s) |
- 2022-04-25: CN application CN202210441630.7A (CN114765025A), status: active, pending
- 2023-04-06: WO application PCT/SG2023/050236 (WO2023211369A2), status: unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023211369A2 (en) | 2023-11-02 |
CN114765025A (en) | 2022-07-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023211369A3 (en) | Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device | |
Fan et al. | TTS synthesis with bidirectional LSTM based recurrent neural networks | |
CN101246685B (en) | Pronunciation quality evaluation method of computer auxiliary language learning system | |
WO2021134520A1 (en) | Voice conversion method, voice conversion training method, intelligent device and storage medium | |
CN1763843A (en) | Pronunciation quality evaluating method for language learning machine | |
CN104123933A (en) | Self-adaptive non-parallel training based voice conversion method | |
CN107767881B (en) | Method and device for acquiring satisfaction degree of voice information | |
JP7393585B2 (en) | WaveNet self-training for text-to-speech | |
Maqsood et al. | An efficient mispronunciation detection system using discriminative acoustic phonetic features for Arabic consonants. | |
Vít et al. | On the analysis of training data for WaveNet-based speech synthesis | |
Sinclair et al. | A semi-markov model for speech segmentation with an utterance-break prior | |
Karjigi et al. | Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling | |
Lee et al. | Analysis of auto-aligned and auto-segmented oral discourse by speakers with aphasia: A preliminary study on the acoustic parameter of duration | |
CN112767961B (en) | Accent correction method based on cloud computing | |
Wei et al. | Predicting articulatory movement from text using deep architecture with stacked bottleneck features | |
Shahriar et al. | Identification of Spoken Language using Machine Learning Approach | |
CN115312029B (en) | Voice translation method and system based on voice depth characterization mapping | |
Budiman et al. | Multi Speaker Speech Synthesis System for Indonesian Language | |
Rahman et al. | Development of isolated speech recognition system for Bangla words | |
Dahan et al. | Automatic Arabic pronunciation scoring for language instruction | |
Huang et al. | A mispronunciation detection method of confusing vowel pair for Chinese students | |
Docasal et al. | Enhancing Voice Cloning Quality through Data Selection and Alignment-based Metrics | |
CN114283788A (en) | Pronunciation evaluation method, pronunciation evaluation system training method, pronunciation evaluation device and pronunciation evaluation device | |
Karmacharya | Design of Keyword Spotting System Based on Segmental Time Warping of Quantized Features | |
Anand et al. | Unsupervised pronunciation assessment analysis using utterance level alignment distance with self-supervised representations |