WO2023211369A3 - Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device - Google Patents

Info

Publication number
WO2023211369A3
Authority
WO
WIPO (PCT)
Prior art keywords
speech recognition
target
recognition model
named entity
generation method
Prior art date
Application number
PCT/SG2023/050236
Other languages
French (fr)
Chinese (zh)
Other versions
WO2023211369A2 (en)
Inventor
马娆
吴璟成
马泽君
Original Assignee
脸萌有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-04-25
Filing date
2023-04-06
Publication date
2024-03-21
Application filed by 脸萌有限公司
Publication of WO2023211369A2
Publication of WO2023211369A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present disclosure relates to a speech recognition model generation method and apparatus, a speech recognition method and apparatus, a medium, and a device. The speech recognition model generation method comprises: obtaining a target named entity word list comprising a plurality of named entity words; screening preset text data on the basis of the named entity words in the target named entity word list to obtain target text data containing the named entity words; performing speech synthesis on the target text data to determine target audio data; determining target training data on the basis of the target audio data; and retraining a pre-trained speech recognition model on the basis of initial training data and the target training data to obtain a target speech recognition model, the initial training data being the audio data used to train the pre-trained speech recognition model. The target speech recognition model obtained by this generation method can improve the recognition accuracy of named entity words.
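The data-preparation steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Sample`, `screen_text`, `build_target_training_data`, and `fake_tts` are hypothetical, the screening is assumed to be a plain case-sensitive substring match, and the speech synthesis step is replaced by a placeholder that returns bytes. The retraining step itself is omitted because the abstract does not specify the model or training procedure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sample:
    """One training example: a transcript, its audio, and its origin."""
    text: str
    audio: bytes
    synthetic: bool


def screen_text(sentences: Iterable[str], entity_words: Iterable[str]) -> List[str]:
    """Keep only sentences containing at least one named entity word.

    Assumption: screening is a case-sensitive substring match.
    """
    words = [w for w in entity_words if w]
    return [s for s in sentences if any(w in s for w in words)]


def build_target_training_data(
    sentences: Iterable[str], synthesize: Callable[[str], bytes]
) -> List[Sample]:
    """Pair each screened sentence with audio produced by a TTS callable."""
    return [Sample(text=s, audio=synthesize(s), synthetic=True) for s in sentences]


def fake_tts(text: str) -> bytes:
    """Placeholder for a real speech synthesis engine (hypothetical)."""
    return text.encode("utf-8")


# Hypothetical inputs, for illustration only.
entity_words = ["Singapore", "WIPO"]
preset_text = [
    "The office is in Singapore.",
    "Turn the volume down.",
    "WIPO published the application.",
]
initial_data = [Sample("hello world", b"...", synthetic=False)]

target_text = screen_text(preset_text, entity_words)
target_data = build_target_training_data(target_text, fake_tts)

# The pre-trained model would be retrained on both sets together.
retraining_set = initial_data + target_data
```

Combining the original audio with the synthesized entity-rich audio follows the abstract's requirement that retraining use both the initial training data and the target training data.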
PCT/SG2023/050236 2022-04-25 2023-04-06 Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device WO2023211369A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210441630.7A CN114765025A (en) 2022-04-25 2022-04-25 Method for generating and recognizing speech recognition model, device, medium and equipment
CN202210441630.7 2022-04-25

Publications (2)

Publication Number Publication Date
WO2023211369A2 (en) 2023-11-02
WO2023211369A3 (en) 2024-03-21

Family

ID=82364996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2023/050236 WO2023211369A2 (en) 2022-04-25 2023-04-06 Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device

Country Status (2)

Country Link
CN (1) CN114765025A (en)
WO (1) WO2023211369A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117174084B (en) * 2023-11-02 2024-05-31 摩尔线程智能科技(北京)有限责任公司 Training data construction method and device, electronic equipment and storage medium
CN117935787B (en) * 2024-03-22 2024-05-31 摩尔线程智能科技(北京)有限责任公司 Data screening and labeling method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109346064A (en) * 2018-12-13 2019-02-15 苏州思必驰信息科技有限公司 Training method and system for end-to-end speech identification model
CN110675864A (en) * 2019-09-12 2020-01-10 上海依图信息技术有限公司 Voice recognition method and device
CN110827791A (en) * 2019-09-09 2020-02-21 西北大学 Edge-device-oriented speech recognition-synthesis combined modeling method
US20200357388A1 (en) * 2019-05-10 2020-11-12 Google Llc Using Context Information With End-to-End Models for Speech Recognition
US20210304769A1 (en) * 2020-03-31 2021-09-30 Microsoft Technology Licensing, Llc Generating and using text-to-speech data for speech recognition models
CN113470626A (en) * 2021-06-30 2021-10-01 北京有竹居网络技术有限公司 Training method, device and equipment of voice recognition model
CN113782013A (en) * 2021-09-15 2021-12-10 北京百度网讯科技有限公司 Method, apparatus, storage medium, and program product for speech recognition and model training
US20220115000A1 (en) * 2020-10-12 2022-04-14 Google Llc On-device personalization of speech synthesis for training of speech recognition model(s)

Also Published As

Publication number Publication date
WO2023211369A2 (en) 2023-11-02
CN114765025A (en) 2022-07-19

Similar Documents

Publication Publication Date Title
WO2023211369A3 (en) Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device
Fan et al. TTS synthesis with bidirectional LSTM based recurrent neural networks
CN101246685B (en) Pronunciation quality evaluation method of computer auxiliary language learning system
WO2021134520A1 (en) Voice conversion method, voice conversion training method, intelligent device and storage medium
CN1763843A (en) Pronunciation quality evaluating method for language learning machine
CN104123933A (en) Self-adaptive non-parallel training based voice conversion method
CN107767881B (en) Method and device for acquiring satisfaction degree of voice information
JP7393585B2 (en) WaveNet self-training for text-to-speech
Maqsood et al. An efficient mispronunciation detection system using discriminative acoustic phonetic features for Arabic consonants.
Vít et al. On the analysis of training data for WaveNet-based speech synthesis
Sinclair et al. A semi-Markov model for speech segmentation with an utterance-break prior
Karjigi et al. Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling
Lee et al. Analysis of auto-aligned and auto-segmented oral discourse by speakers with aphasia: A preliminary study on the acoustic parameter of duration
CN112767961B (en) Accent correction method based on cloud computing
Wei et al. Predicting articulatory movement from text using deep architecture with stacked bottleneck features
Shahriar et al. Identification of Spoken Language using Machine Learning Approach
CN115312029B (en) Voice translation method and system based on voice depth characterization mapping
Budiman et al. Multi Speaker Speech Synthesis System for Indonesian Language
Rahman et al. Development of isolated speech recognition system for Bangla words
Dahan et al. Automatic Arabic pronunciation scoring for language instruction
Huang et al. A mispronunciation detection method of confusing vowel pair for Chinese students
Docasal et al. Enhancing Voice Cloning Quality through Data Selection and Alignment-based Metrics
CN114283788A (en) Pronunciation evaluation method, pronunciation evaluation system training method, pronunciation evaluation device and pronunciation evaluation device
Karmacharya Design of Keyword Spotting System Based on Segmental Time Warping of Quantized Features
Anand et al. Unsupervised pronunciation assessment analysis using utterance level alignment distance with self-supervised representations