CN110033758B - Voice wake-up implementation method based on small training set optimization decoding network - Google Patents


Info

Publication number
CN110033758B
CN110033758B
Authority
CN
China
Prior art keywords
awakening, word, network, frame, wake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910334792.9A
Other languages
Chinese (zh)
Other versions
CN110033758A (en)
Inventor
赵升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Shuixiang Electronic Technology Co ltd
Original Assignee
Wuhan Shuixiang Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Shuixiang Electronic Technology Co ltd filed Critical Wuhan Shuixiang Electronic Technology Co ltd
Priority to CN201910334792.9A priority Critical patent/CN110033758B/en
Publication of CN110033758A publication Critical patent/CN110033758A/en
Application granted granted Critical
Publication of CN110033758B publication Critical patent/CN110033758B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/06: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 15/08: Speech classification or search
    • G10L 15/083: Recognition networks
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/183: Using context dependencies, e.g. language models
    • G10L 15/187: Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G10L 15/26: Speech-to-text systems
    • G10L 2015/025: Phonemes, fenemes or fenones being the recognition units

Abstract

The invention discloses a voice wake-up implementation method based on a small training set and an optimized decoding network, comprising the following steps: S1, extract the intrinsic features of speech to obtain feature vectors that clearly distinguish wake words from non-wake words; S2, combine the feature vectors to obtain a feature-phoneme alignment file, select a time window according to the distribution of the wake-word phonemes, and classify the feature-to-phoneme mapping to obtain labeled acoustic data; S3, train a frame-by-frame posterior probability model on the labeled acoustic data; S4, use the resulting acoustic probability model to build a phoneme-level posterior-probability confidence calculation network; S5, reconfirm the wake word with a reconfirmation network. Through a simple model-training strategy, an optimized decoding network, and similar steps, the invention makes it easy to implement voice wake-up and related functions on common processors such as ARM and DSP.

Description

Voice wake-up implementation method based on small training set optimization decoding network
Technical Field
The invention relates to a voice wake-up implementation method based on a small training set and an optimized decoding network. Through optimization, the post-positioned decoding network reduces the offline false wake-up rate without increasing algorithm complexity.
Background
Speech is the most convenient means of human communication, and making machines understand speech and carry out related operations according to human instructions has long been a goal; speech recognition technology thus came into being. Speech recognition is currently an important means of human-computer interaction, and voice wake-up is an important entrance to it. Under normal conditions an intelligent voice device is in standby and does not respond to outside sound. Only after it is woken by a wake word does the system begin to process and analyze the input speech and give feedback, which greatly reduces the misrecognition rate of speech recognition.
Specifically, voice wake-up means that a system retrieves a preset wake-word instruction from a continuous speech stream in order to start speech recognition; it belongs to keyword detection in continuous speech. To achieve a good detection effect, current keyword detection models are trained on large-scale data and their algorithms are complex to implement, which is a major obstacle where data resources are scarce and for end-to-end deployment of wake-up.
To make the wake-word function easy to deploy on mobile terminals, a voice wake-up implementation method based on a small training sample set, with a simple strategy and fast operation, is urgently needed.
The existing solution combines a Gaussian mixture model (GMM) with a hidden Markov model (HMM). First, the original speech data is represented by more compact vectors, i.e. speech feature vectors; the spatial distribution of the feature vectors is then assumed to be Gaussian, so Gaussians with different means and variances over the feature space can be trained from the existing mass of data. This model supplies the observation probabilities required by the subsequent HMM, which maps features to the word or phoneme space through data training to form a decoding network. During recognition, speech enters the decoding network through feature extraction, and the decoder uses a dynamic-programming Viterbi beam search to find and confirm results in the decoding network.
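For orientation, the dynamic-programming Viterbi search mentioned above can be sketched in a few lines. This is a generic log-domain Viterbi over a toy two-state HMM, not code from the patent; the transition and emission values are illustrative assumptions.

```python
import numpy as np

def viterbi(log_trans, log_emit, log_init):
    """Log-domain Viterbi: most likely state path for one utterance.

    log_trans: (S, S) log transition probabilities
    log_emit:  (T, S) frame-by-frame log emission scores
    log_init:  (S,)   log initial-state probabilities
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers for path recovery
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (S, S): prev state -> next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta.max())
```

A full recognizer would run this search over the whole decoding network with beam pruning, which is exactly the cost the patent tries to avoid.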
This overall approach to voice wake-up is really a method for large-vocabulary speech recognition: a large training corpus is needed to achieve a good wake-up rate with few false wake-ups, and the subsequent dynamic search consumes enormous computation time when searching the decoding network globally, despite pruning. The trained model is large, the algorithm is complex to implement, and operation efficiency is low. Using such a model in mobile terminals such as home appliances, smart speakers, and vehicles demands high computing power and many hardware resources. Occupying more resources and computation time invisibly raises product cost, which limits the use of wake words on mobile terminals and prevents intelligent voice devices from being widely applied.
Therefore, the invention provides a decoder optimized on a small training set, which simplifies the algorithm and improves operation efficiency while achieving the same wake-up rate, so that voice wake-up can be conveniently ported.
Disclosure of Invention
The technical problem the invention aims to solve is that the existing overall approach to voice wake-up is a large-vocabulary speech recognition method that needs a large training corpus to achieve good wake-up with few false wake-ups. The invention provides a voice wake-up implementation method that is based on a small training set and optimizes the decoder, simplifying the algorithm and improving operation efficiency at the same wake-up rate, so that voice wake-up can be conveniently ported.
In order to solve the technical problems, the invention provides the following technical scheme:
a voice awakening implementation method based on a small training set optimization decoding network is characterized by comprising the following steps:
S1: extract the intrinsic features of speech
According to stationarity and correlation analysis of the wake-word corpus, a time window is designed to obtain framed feature signals; the time-window design involves the window length, the window shape, the amplitude at each point, and the weight between adjacent frame energies, thereby obtaining feature vectors that clearly distinguish wake words from non-wake words;
S2: combine the feature vectors to obtain the feature-phoneme alignment file
A time window is selected according to the distribution of the wake-word phonemes, and the feature-to-phoneme mapping is classified to obtain labeled acoustic data. The alignment between features and phonemes is obtained mainly with a context-dependent triphone model, and the per-phoneme lengths counted from the corpus are used to maximize the utilization of all wake-word phonemes within a fixed time window.
S3: train a frame-by-frame posterior probability model on the labeled acoustic data
The labeled acoustic data are fed into a neural network trained by forward and backward propagation with a cross-entropy loss function to train the acoustic model, yielding a frame-by-frame acoustic probability model of the wake word;
S4: obtain a phoneme-level posterior-probability confidence calculation network from the acoustic probability model
The confidence of the valid wake-word classes is calculated from the frame-by-frame posterior probabilities of the wake word and used to recognize it; for each frame of output, the N most probable class candidates are retained, finally forming the network search space for confirming the wake word. The wake-word confidence in this step is a dynamic acoustic confidence: the window length of the selected time-domain framing window serves as the confidence window length, the confidence window slides in time, per class, over the posterior probability matrix output by the neural network, and the probabilities of each valid class within the window are superposed according to weights. The weights are obtained from the phoneme entropy of each class counted in the wake-word corpus in step S2. The wake word is recognized against the dynamic acoustic confidence threshold obtained from testing. To guarantee the false wake-up index, a recognition that may be a false wake-up must enter the wake-word confirmation network, which confirms whether it is a wake word and thus ensures the reliability of the result.
S5: reconfirmation network for the wake word
The wake word is confirmed according to the maximum-entropy principle. First, the wake-word phonemes contained at each time point are retained, and time points with no wake-word phoneme are set to zero; if a state jump occurs in the middle, the phonemes at that time point are likewise set to zero. The reliability of the wake word is then confirmed from the information entropy of all valid phonemes.
In addition, in the data-preparation stage of the deep neural network, acoustic model training can also be realized by aligning features with syllables, and the wake-up function can then be completed through the steps above.
The invention has the following beneficial effects. Through a simple model-training strategy, an optimized decoding network, and similar measures, the invention makes it easy to implement voice wake-up and related functions on common processors such as ARM and DSP:
according to the method, the time domain time window is designed according to the stationarity and the relevance of the linguistic data of the awakening words, so that the difference between the feature vectors is effectively improved, a feature model does not need to be learned through a large amount of linguistic data, and the model volume is reduced;
Second, when classifying labels, the time-window length that maximizes effective phoneme utilization is counted and used as the length for feature classification to prepare the files required for training; a triphone model is selected for the feature-phoneme alignment file; and the dynamic acoustic confidence is computed and designed, which effectively improves the recognition network's adaptability to unknown speech and gives a good phoneme-recognition result even under noise. The information entropy of each wake-word phoneme, counted in advance, is used as the weight for calculating the dynamic acoustic confidence;
Third, the invention confirms wake words according to the maximum-entropy principle, providing a maximum-entropy matching algorithm with forbidden state hopping for judging the reliability of temporal modeling. This effectively improves the reliability of the recognition network, reduces false wake-ups, simplifies the algorithm strategy, and offers an effective way to port a front-end intelligent voice module with few hardware resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of the present invention;
fig. 2 is a flow chart of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below in conjunction with the accompanying drawings; it should be understood that they are described for illustration and explanation only, not to limit the invention.
Examples
A voice wake-up implementation method based on a small training set and an optimized decoding network comprises the following steps:
S1: extract the intrinsic features of speech
According to stationarity and correlation analysis of the wake-word corpus, a time window is designed to obtain framed feature signals; the time-window design involves the window length, the window shape, the amplitude at each point, and the weight between adjacent frame energies, thereby obtaining feature vectors that clearly distinguish wake words from non-wake words;
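The framing step can be illustrated as follows. This is a minimal sketch, not the patent's feature extractor: the 25 ms window, 10 ms shift, and Hamming shape are common defaults used here as assumptions, whereas the patent tunes the window length, shape, point amplitudes, and adjacent-frame energy weights to the wake word.

```python
import numpy as np

def frame_signal(x, sr, win_ms=25.0, hop_ms=10.0):
    """Split waveform x into overlapping frames and apply a Hamming window.

    sr is the sample rate in Hz. The window shapes the amplitude of each
    point inside the frame; the hop controls the overlap between frames.
    """
    win = int(sr * win_ms / 1000)            # samples per frame
    hop = int(sr * hop_ms / 1000)            # samples between frame starts
    n_frames = 1 + max(0, (len(x) - win) // hop)
    w = np.hamming(win)                      # per-point amplitude weighting
    frames = np.stack([x[i * hop : i * hop + win] * w
                       for i in range(n_frames)])
    return frames                            # shape (n_frames, win)
```

Each row of the returned matrix would then be turned into a feature vector (e.g. spectral features) before alignment and training.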
S2: combine the feature vectors to obtain the feature-phoneme alignment file
A time window is selected according to the distribution of the wake-word phonemes, and the feature-to-phoneme mapping is classified to obtain labeled acoustic data. The alignment between features and phonemes is obtained mainly with a context-dependent triphone model, and the per-phoneme lengths counted from the corpus are used to maximize the utilization of all wake-word phonemes within a fixed time window.
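The idea of choosing a fixed window that maximizes phoneme utilization can be sketched as below. The helper `best_window` and its inputs are hypothetical illustrations; the patent derives per-phoneme lengths from corpus statistics.

```python
def best_window(phoneme_ms, candidates):
    """Pick the candidate window length (in ms) that maximizes utilization
    of the wake word's phonemes.

    phoneme_ms: mean duration of each wake-word phoneme, from corpus counts.
    candidates: window lengths to consider; at least one must be long
    enough to hold every phoneme once (an assumption of this sketch).
    Utilization = phoneme time / window time, so among windows that fit
    all phonemes, the one wasting the least time wins.
    """
    total = sum(phoneme_ms)                       # time needed for all phonemes
    feasible = [w for w in candidates if w >= total]
    return min(feasible, key=lambda w: w - total)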
S3: train a frame-by-frame posterior probability model on the labeled acoustic data
The labeled acoustic data are fed into a neural network trained by forward and backward propagation with a cross-entropy loss function to train the acoustic model, yielding a frame-by-frame acoustic probability model of the wake word;
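As a stand-in for the neural acoustic model, a single-layer softmax classifier trained with the cross-entropy loss shows the frame-by-frame posterior idea. This is an illustrative NumPy sketch, not the patent's network; the function name and all hyperparameters are assumptions.

```python
import numpy as np

def train_frame_model(X, y, n_classes, lr=0.5, epochs=200, seed=0):
    """Train a softmax classifier on labeled acoustic frames with the
    cross-entropy loss; returns a function giving frame-by-frame
    posteriors P(class | frame)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.01, (X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                       # one-hot frame labels
    for _ in range(epochs):
        z = X @ W + b                              # forward pass
        z -= z.max(axis=1, keepdims=True)          # numerical stability
        p = np.exp(z); p /= p.sum(axis=1, keepdims=True)
        g = (p - Y) / len(X)                       # cross-entropy gradient
        W -= lr * X.T @ g                          # backward pass / update
        b -= lr * g.sum(axis=0)
    def posteriors(F):
        z = F @ W + b
        z -= z.max(axis=1, keepdims=True)
        p = np.exp(z)
        return p / p.sum(axis=1, keepdims=True)    # (T, n_classes) matrix
    return posteriors
```

The returned posterior matrix is exactly the object the confidence window of step S4 slides over.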
S4: obtain a phoneme-level posterior-probability confidence calculation network from the acoustic probability model
The confidence of the valid wake-word classes is calculated from the frame-by-frame posterior probabilities of the wake word and used to recognize it; for each frame of output, the N most probable class candidates are retained, finally forming the network search space for confirming the wake word. The wake-word confidence in this step is a dynamic acoustic confidence: the window length of the selected time-domain framing window serves as the confidence window length, the confidence window slides in time, per class, over the posterior probability matrix output by the neural network, and the probabilities of each valid class within the window are superposed according to weights. The weights are obtained from the phoneme entropy of each class counted in the wake-word corpus in step S2. The wake word is recognized against the dynamic acoustic confidence threshold obtained from testing. To guarantee the false wake-up index, a recognition that may be a false wake-up must enter the wake-word confirmation network, which confirms whether it is a wake word and thus ensures the reliability of the result.
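The sliding-window dynamic acoustic confidence described above can be sketched as follows. The function is an illustrative reading of the step, with the per-class weight vector standing in for the phoneme-entropy weights counted in step S2.

```python
import numpy as np

def dynamic_confidence(post, weights, win):
    """Slide a window of length win over the (T, C) posterior matrix and,
    per class, superpose the in-window probabilities weighted by the class
    weight; return the best window confidence over the utterance.
    The wake word fires when this exceeds a threshold tuned by testing."""
    T, C = post.shape
    best = 0.0
    for s in range(T - win + 1):
        seg = post[s:s + win]                        # (win, C) slice
        score = float((seg.mean(axis=0) * weights).sum())
        best = max(best, score)
    return best
```

Comparing `best` against the tested threshold reproduces the recognize-then-confirm control flow of the step.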
S5: reconfirmation network for the wake word
The wake word is confirmed according to the maximum-entropy principle. First, the wake-word phonemes contained at each time point are retained, and time points with no wake-word phoneme are set to zero; if a state jump occurs in the middle, the phonemes at that time point are likewise set to zero. The reliability of the wake word is then confirmed from the information entropy of all valid phonemes.
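The reconfirmation step can be sketched as below: frames with no wake-word phoneme, or with a backward state jump, are dropped (set to zero), and the normalized information entropy of the surviving phonemes measures reliability. This is an illustrative interpretation, not the patent's exact algorithm; it assumes a wake word with at least two distinct phonemes.

```python
import math

def reconfirm(frame_phones, wake_phones):
    """Maximum-entropy reconfirmation sketch.

    frame_phones: best phoneme per frame; wake_phones: wake-word phoneme
    sequence in order. Frames whose phoneme is not in the wake word, or
    that jump backwards in the wake-word order (forbidden state hopping),
    are discarded. A genuine wake word spreads mass over all its phonemes,
    so high normalized entropy means a reliable detection.
    """
    order = {p: i for i, p in enumerate(wake_phones)}
    kept, last = [], -1
    for p in frame_phones:
        if p not in order:          # no wake-word phoneme: set to zero
            continue
        if order[p] < last:         # backward state jump: set to zero
            continue
        last = order[p]
        kept.append(p)
    if not kept:
        return 0.0
    n = len(kept)
    counts = {p: kept.count(p) for p in set(kept)}
    H = -sum(c / n * math.log2(c / n) for c in counts.values())
    return H / math.log2(len(wake_phones))  # 1.0 = all phonemes evenly present
```

A score near 1.0 confirms the wake word; a frame stream dominated by one phoneme or by noise scores near 0.0.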
In addition, in the data-preparation stage of the deep neural network, acoustic model training can also be realized by aligning features with syllables, and the wake-up function can then be completed through the steps above.
Through a simple model-training strategy, an optimized decoding network, and similar steps, the invention makes it easy to implement voice wake-up and related functions on common processors such as ARM and DSP.
Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the described technical solutions or substitute equivalents for some of their features without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (5)

1. A voice wake-up implementation method based on a small training set and an optimized decoding network, characterized by comprising the following steps:
S1: extract the intrinsic features of speech
According to stationarity and correlation analysis of the wake-word corpus, a time window is designed to obtain framed feature signals, thereby obtaining feature vectors that clearly distinguish wake words from non-wake words;
S2: combine the feature vectors to obtain the feature-phoneme alignment file
A time window is selected according to the distribution of the wake-word phonemes, and the feature-to-phoneme mapping is classified to obtain labeled acoustic data;
S3: train a frame-by-frame posterior probability model on the labeled acoustic data
The labeled acoustic data are fed into a neural network trained by forward and backward propagation with a cross-entropy loss function to train the acoustic model, yielding a frame-by-frame acoustic probability model of the wake word;
S4: obtain a phoneme-level posterior-probability confidence calculation network from the acoustic probability model
The confidence of the valid wake-word classes is calculated from the frame-by-frame posterior probabilities of the wake word and used to recognize it; for each frame of output, the N most probable class candidates are retained, finally forming the network search space for confirming the wake word;
the wake-word confidence is a dynamic acoustic confidence: the window length of the selected time-domain framing window serves as the confidence window length, the confidence window slides in time, per class, over the posterior probability matrix output by the neural network, and the probabilities of each valid class within the window are superposed according to weights;
S5: reconfirmation network for the wake word
The wake word is confirmed according to the maximum-entropy principle: first, the wake-word phonemes contained at each time point are retained, time points with no wake-word phoneme are set to zero, and if a state jump occurs in the middle the phonemes at that time point are likewise set to zero; the reliability of the wake word is then confirmed from the information entropy of all valid phonemes.
2. The method of claim 1, wherein the time-window design in S1 involves the window length, the window shape, the amplitude at each point, and the weight between adjacent frame energies.
3. The method of claim 1, wherein the alignment between features and phonemes in S2 is obtained mainly with a context-dependent triphone model, and the per-phoneme lengths counted from the corpus are used to maximize the utilization of all wake-word phonemes within a fixed time window.
4. The voice wake-up implementation method based on a small training set and an optimized decoding network as claimed in claim 1, wherein the weights in S4 are obtained from the phoneme entropy of each class in the wake-word corpus.
5. The voice wake-up implementation method based on a small training set and an optimized decoding network as claimed in claim 1 or 4, wherein in S5 the wake word is recognized according to the dynamic acoustic confidence threshold obtained from testing; if the result may be a false wake-up, the wake-word confirmation network must be entered to confirm whether it is a wake word, thereby ensuring the reliability of the result.
CN201910334792.9A 2019-04-24 2019-04-24 Voice wake-up implementation method based on small training set optimization decoding network Expired - Fee Related CN110033758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910334792.9A CN110033758B (en) 2019-04-24 2019-04-24 Voice wake-up implementation method based on small training set optimization decoding network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910334792.9A CN110033758B (en) 2019-04-24 2019-04-24 Voice wake-up implementation method based on small training set optimization decoding network

Publications (2)

Publication Number Publication Date
CN110033758A CN110033758A (en) 2019-07-19
CN110033758B true CN110033758B (en) 2021-09-24

Family

ID=67240130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910334792.9A Expired - Fee Related CN110033758B (en) 2019-04-24 2019-04-24 Voice wake-up implementation method based on small training set optimization decoding network

Country Status (1)

Country Link
CN (1) CN110033758B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364143B (en) * 2019-08-14 2022-01-28 腾讯科技(深圳)有限公司 Voice awakening method and device and intelligent electronic equipment
CN110473536B (en) * 2019-08-20 2021-10-15 北京声智科技有限公司 Awakening method and device and intelligent device
CN110415699B (en) * 2019-08-30 2021-10-26 北京声智科技有限公司 Voice wake-up judgment method and device and electronic equipment
CN110610707B (en) * 2019-09-20 2022-04-22 科大讯飞股份有限公司 Voice keyword recognition method and device, electronic equipment and storage medium
CN110838289B (en) * 2019-11-14 2023-08-11 腾讯科技(深圳)有限公司 Wake-up word detection method, device, equipment and medium based on artificial intelligence
CN111210830B (en) * 2020-04-20 2020-08-11 深圳市友杰智新科技有限公司 Voice awakening method and device based on pinyin and computer equipment
CN112259108A (en) * 2020-09-27 2021-01-22 科大讯飞股份有限公司 Engine response time analysis method, electronic device and storage medium
CN112951211B (en) * 2021-04-22 2022-10-18 中国科学院声学研究所 Voice awakening method and device
CN113470646B (en) * 2021-06-30 2023-10-20 北京有竹居网络技术有限公司 Voice awakening method, device and equipment
CN113450771B (en) * 2021-07-15 2022-09-27 维沃移动通信有限公司 Awakening method, model training method and device
CN113763960B (en) * 2021-11-09 2022-04-26 深圳市友杰智新科技有限公司 Post-processing method and device for model output and computer equipment
CN114783438B (en) * 2022-06-17 2022-09-27 深圳市友杰智新科技有限公司 Adaptive decoding method, apparatus, computer device and storage medium
CN115862604B (en) * 2022-11-24 2024-02-20 镁佳(北京)科技有限公司 Voice awakening model training and voice awakening method and device and computer equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102376305A (en) * 2011-11-29 2012-03-14 安徽科大讯飞信息科技股份有限公司 Speech recognition method and system
US8229744B2 (en) * 2003-08-26 2012-07-24 Nuance Communications, Inc. Class detection scheme and time mediated averaging of class dependent models
CN102999161A (en) * 2012-11-13 2013-03-27 安徽科大讯飞信息科技股份有限公司 Implementation method and application of voice awakening module
US9672815B2 (en) * 2012-07-20 2017-06-06 Interactive Intelligence Group, Inc. Method and system for real-time keyword spotting for speech analytics
CN107123417A (en) * 2017-05-16 2017-09-01 上海交通大学 Optimization method and system are waken up based on the customized voice that distinctive is trained
US20170301347A1 (en) * 2016-04-13 2017-10-19 Malaspina Labs (Barbados), Inc. Phonotactic-Based Speech Recognition & Re-synthesis
CN107331384A (en) * 2017-06-12 2017-11-07 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN108281137A (en) * 2017-01-03 2018-07-13 中国科学院声学研究所 A kind of universal phonetic under whole tone element frame wakes up recognition methods and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8229744B2 (en) * 2003-08-26 2012-07-24 Nuance Communications, Inc. Class detection scheme and time mediated averaging of class dependent models
CN102376305A (en) * 2011-11-29 2012-03-14 安徽科大讯飞信息科技股份有限公司 Speech recognition method and system
US9672815B2 (en) * 2012-07-20 2017-06-06 Interactive Intelligence Group, Inc. Method and system for real-time keyword spotting for speech analytics
CN102999161A (en) * 2012-11-13 2013-03-27 安徽科大讯飞信息科技股份有限公司 Implementation method and application of voice awakening module
US20170301347A1 (en) * 2016-04-13 2017-10-19 Malaspina Labs (Barbados), Inc. Phonotactic-Based Speech Recognition & Re-synthesis
CN108281137A (en) * 2017-01-03 2018-07-13 中国科学院声学研究所 A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN107123417A (en) * 2017-05-16 2017-09-01 上海交通大学 Optimization method and system are waken up based on the customized voice that distinctive is trained
CN107331384A (en) * 2017-06-12 2017-11-07 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Neural speech recognition: Continuous phoneme decoding using spatiotemporal representations of human cortical activity; David A Moses et al; J Neural Eng; 20160803; Vol. 13 (No. 5); full text *
Research on Speech Recognition Technology in Low-Resource Environments; 舒帆 (Shu Fan); China Master's Theses Full-text Database, Information Science and Technology; 20180615 (No. 06); full text *

Also Published As

Publication number Publication date
CN110033758A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN110033758B (en) Voice wake-up implementation method based on small training set optimization decoding network
CN108305616B (en) Audio scene recognition method and device based on long-time and short-time feature extraction
US11062699B2 (en) Speech recognition with trained GMM-HMM and LSTM models
CN106098059B (en) Customizable voice awakening method and system
CN110364171B (en) Voice recognition method, voice recognition system and storage medium
CN101930735B (en) Speech emotion recognition equipment and speech emotion recognition method
CN105529028A (en) Voice analytical method and apparatus
CN109036467B (en) TF-LSTM-based CFFD extraction method, voice emotion recognition method and system
CN103077708B (en) Method for improving rejection capability of speech recognition system
KR20140082157A (en) Apparatus for speech recognition using multiple acoustic model and method thereof
CN107403619A (en) A kind of sound control method and system applied to bicycle environment
CN110211595B (en) Speaker clustering system based on deep learning
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
CN107871499A (en) Audio recognition method, system, computer equipment and computer-readable recording medium
CN111161726B (en) Intelligent voice interaction method, device, medium and system
KR101065188B1 (en) Apparatus and method for speaker adaptation by evolutional learning, and speech recognition system using thereof
CN112071308A (en) Awakening word training method based on speech synthesis data enhancement
CN114596844A (en) Acoustic model training method, voice recognition method and related equipment
CN108831447A (en) Audio recognition method, device and storage medium based on HMM and PNN
CN113609264B (en) Data query method and device for power system nodes
Kermanshahi et al. Transfer learning for end-to-end ASR to deal with low-resource problem in persian language
CN114187914A (en) Voice recognition method and system
CN111599339B (en) Speech splicing synthesis method, system, equipment and medium with high naturalness
CN111833852B (en) Acoustic model training method and device and computer readable storage medium
CN102237082B (en) Self-adaption method of speech recognition system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210924