CN103117060A - Modeling approach and modeling system of acoustic model used in speech recognition - Google Patents

Modeling approach and modeling system of acoustic model used in speech recognition

Info

Publication number
CN103117060A
Authority
CN
China
Prior art keywords
phonetic feature
modeling
training
training data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100200107A
Other languages
Chinese (zh)
Other versions
CN103117060B (en)
Inventor
颜永红
肖业鸣
潘接林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201310020010.7A priority Critical patent/CN103117060B/en
Publication of CN103117060A publication Critical patent/CN103117060A/en
Application granted granted Critical
Publication of CN103117060B publication Critical patent/CN103117060B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention relates to a modeling method and a modeling system for an acoustic model used in speech recognition. The modeling method includes the steps of: S1, training an initial model, in which the modeling unit is the tri-phone state obtained after phoneme decision tree clustering and the model provides the state transition probabilities; S2, obtaining frame-level state information by using the initial model to perform forced alignment of the speech features of the training data against the tri-phone states; S3, pre-training a deep neural network to obtain initial weights for each hidden layer; S4, training the initialized network with the error back-propagation algorithm based on the obtained frame-level state information and updating the weights. In this modeling method, the context-dependent tri-phone state serves as the modeling unit, the model is built on a deep neural network, the weights of each hidden layer of the network are initialized with restricted Boltzmann machines, and the weights are subsequently updated by the error back-propagation algorithm. The pre-training therefore effectively reduces the risk that the network falls into a local extremum, and the modeling accuracy of the acoustic model is greatly improved.

Description

Modeling method and modeling system for an acoustic model used in speech recognition
Technical field
The present invention relates to the field of speech recognition, and in particular to a modeling method and a modeling system for an acoustic model used in speech recognition.
Background technology
The mainstream framework for speech recognition today is based on statistical models. A typical speech recognition system, as shown in Fig. 1, comprises a voice acquisition and front-end processing module, a feature extraction module, an acoustic model module, a language model module, and a decoder module. The basic procedure is as follows: speech collected by the acquisition device is passed through front-end processing and feature extraction; the extracted feature sequence (e.g. MFCC or PLP) is scored by the acoustic model to obtain its observation probability, which is then sent to the decoder together with the language model probability to obtain the most likely text sequence. The acoustic model is conventionally built on the hidden Markov framework, with a Gaussian mixture model modeling the probability distribution of the speech features. The Gaussian mixture model makes some inappropriate assumptions about the speech features and their distribution, such as assuming that adjacent speech features are linearly independent and that the observation probability follows a mixture-of-Gaussians distribution. In addition, when the Gaussian mixture model parameters are trained, the objective function maximizes the likelihood of the observed features, whereas decoding uses the maximum a posteriori criterion, so the two probability criteria are inconsistent. The modeling accuracy of the traditional acoustic model is therefore limited, and recognition performance suffers.
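As context for the feature-extraction stage described above, the following minimal sketch (assuming the librosa library and a hypothetical 16 kHz WAV file) shows how an MFCC feature sequence of the kind fed to the acoustic model can be computed; the file name and parameter values are illustrative, not taken from the patent.

```python
# Minimal sketch of front-end MFCC extraction, assuming librosa is available.
# File name and parameter values are illustrative only.
import librosa
import numpy as np

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Return a (frames x coefficients) MFCC matrix for one utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    # 25 ms windows with a 10 ms hop, a common front-end configuration.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=int(0.025 * sr),
                                hop_length=int(0.010 * sr))
    return mfcc.T  # shape: (num_frames, n_mfcc)

features = extract_mfcc("utterance.wav")  # hypothetical input file
print(features.shape)
```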
Summary of the invention
In view of the above problems, embodiments of the present invention provide a modeling method and a modeling system for an acoustic model used in speech recognition.
In a first aspect, an embodiment of the present invention provides a modeling method for an acoustic model used in speech recognition. The method comprises: training a hidden Markov model-Gaussian mixture model (HMM-GMM) with training data, wherein the modeling unit of the HMM-GMM is the tri-phone state obtained after the speech features of the training data are clustered by a phoneme decision tree, and the HMM-GMM obtains the state transition probabilities of the tri-phone states through the expectation maximization (EM) algorithm; performing, based on the HMM-GMM, forced alignment of the speech features of the training data against the tri-phone states to obtain frame-level state information of the speech features; pre-training a deep neural network serving as the acoustic model to obtain parameters for initializing the weights of each hidden layer of the deep network; and training the deep neural network with the error back-propagation algorithm based on the tri-phone states of the training-data speech features, updating the weights of each of its hidden layers.
Preferably, performing forced alignment of the tri-phone states of the training-data speech features based on the HMM-GMM to obtain the frame-level state information of the speech features is specifically: based on the HMM-GMM, associating each speech feature frame of the training data with its most probable tri-phone state to obtain the frame-level state information of the speech features.
Preferably, pre-training the deep neural network serving as the acoustic model to obtain the parameters for initializing the weights of each hidden layer of the deep network is specifically: training restricted Boltzmann machines layer by layer to convergence on the training data, and initializing the weights of each hidden layer of the deep network with the resulting parameters.
In a second aspect, an embodiment of the present invention provides a modeling system for an acoustic model used in speech recognition. The modeling system comprises: a first module, configured to train a hidden Markov model-Gaussian mixture model (HMM-GMM) with training data, wherein the modeling unit of the HMM-GMM is the tri-phone state obtained after the speech features of the training data are clustered by a phoneme decision tree, and the HMM-GMM obtains the state transition probabilities of the tri-phone states through the expectation maximization (EM) algorithm; a second module, configured to perform, based on the HMM-GMM, forced alignment of the speech features of the training data against the tri-phone states to obtain frame-level state information of the speech features; a third module, configured to pre-train a deep neural network serving as the acoustic model to obtain parameters for initializing the weights of each hidden layer of the deep network; and a fourth module, configured to train the deep neural network with the error back-propagation algorithm based on the tri-phone states of the training-data speech features and to update the weights of each of its hidden layers.
Preferably, the second module performs the forced alignment as follows: based on the HMM-GMM, it associates each speech feature frame of the training data with its most probable tri-phone state to obtain the frame-level state information of the speech features.
Preferably, the third module performs the pre-training as follows: it trains restricted Boltzmann machines layer by layer to convergence on the training data and initializes the weights of each hidden layer of the deep network with the resulting parameters.
Embodiments of the present invention use the tri-phone state as the modeling unit and build the model on a deep neural network; the weights of each hidden layer of the network are initialized with the restricted Boltzmann machine algorithm and subsequently updated by the back-propagation algorithm. The pre-training effectively reduces the risk that the network falls into a local extremum during training, and further improves the modeling accuracy of the acoustic model.
Description of drawings
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic diagram of an existing speech recognition system;
Fig. 2 is a block diagram of the context-dependent deep neural network speech recognition system of an embodiment of the present invention;
Fig. 3 is a schematic diagram of the modeling method for an acoustic model used in speech recognition according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the modeling system for an acoustic model used in speech recognition according to an embodiment of the present invention.
Embodiment
The technical solutions of the embodiments of the present invention are described in further detail below with reference to the drawings and embodiments.
Considering that the Gaussian mixture model has to make incorrect assumptions about the speech features and their probability distribution, embodiments of the present invention use a context-dependent deep neural network in place of the Gaussian mixture model for acoustic modeling. The deep neural network comprises multiple hidden layers, and its modeling unit is the context-dependent tri-phone state obtained after phoneme decision tree clustering. The overall block diagram of the system is shown in Fig. 2.
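To make the architecture concrete, the following is a minimal sketch, in PyTorch (which the patent does not specify), of a feed-forward deep network whose output layer is a softmax over decision-tree-clustered tri-phone states. The layer sizes, the 11-frame context window, and the number of clustered states are illustrative assumptions, not values from the patent.

```python
# Sketch of a context-dependent DNN acoustic model, assuming PyTorch.
# Layer sizes, context window and senone count are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM = 39          # e.g. 13 MFCCs + deltas + delta-deltas (assumption)
CONTEXT = 11           # frames of acoustic context stacked at the input
NUM_SENONES = 3000     # decision-tree-clustered tri-phone states (assumption)

class DeepAcousticModel(nn.Module):
    def __init__(self, hidden=1024, num_hidden_layers=5):
        super().__init__()
        layers, in_dim = [], FEAT_DIM * CONTEXT
        for _ in range(num_hidden_layers):
            layers += [nn.Linear(in_dim, hidden), nn.Sigmoid()]
            in_dim = hidden
        self.hidden_layers = nn.Sequential(*layers)
        # Output layer: one unit per clustered tri-phone state.
        self.output = nn.Linear(in_dim, NUM_SENONES)

    def forward(self, x):
        return self.output(self.hidden_layers(x))  # logits; softmax applied in the loss

model = DeepAcousticModel()
posteriors = torch.softmax(model(torch.randn(4, FEAT_DIM * CONTEXT)), dim=-1)
```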
The deep neural network is trained with the minimum cross-entropy criterion as the objective function. Because the network has multiple hidden layers, its error function has many local extrema, and the network therefore easily falls into a local extremum and converges prematurely during training. To address this problem, the field of neural computing has proposed initializing the weight parameters through pre-training of the neural network and then training the network parameters with the traditional error back-propagation algorithm. The pre-training algorithm uses the restricted Boltzmann machine, a bipartite graphical model comprising one visible layer and one hidden layer, with no connections between units within the same layer and dense connections between units of different layers. This model defines the joint distribution of the visible- and hidden-layer variables through an energy function, as follows:
p(v, h) = exp(-E(v, h)) / Z, where Z = Σ_{v,h} exp(-E(v, h)) is the normalization constant
where v is the visible-layer variable, h is the hidden-layer variable, E(v, h) is the energy function, and p(v, h) is their joint distribution probability. Training maximizes the likelihood p(v) of the observed features, and the weight update formula is as follows:
Δw_ij = <v_i h_j>_data - <v_i h_j>_model
w_ij(t+1) = w_ij(t) + Δw_ij
where w_ij is the connection weight between visible unit i and hidden unit j, t is the iteration index, and < > denotes the average of the quantity inside the brackets (under the data distribution and under the model distribution, respectively).
By training restricted Boltzmann machines layer by layer and using their parameters to initialize the deep neural network, the initial weights are placed at a reasonable starting point in weight space, which to some extent reduces the risk that the network training falls into a local extremum. At the same time, the tri-phone states obtained after phoneme decision tree clustering are used as the teacher signal of the neural network; since they encode the contextual relations of the phonemes, the acoustic model becomes more fine-grained and accurate.
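The weight-update rule above is commonly approximated with one step of Gibbs sampling (contrastive divergence, CD-1). The numpy sketch below is an illustrative implementation of that update for a binary-binary RBM; it is not code from the patent, and it omits biases, momentum, and the Gaussian-Bernoulli variant typically used for real-valued speech features.

```python
# Sketch of a CD-1 update for one restricted Boltzmann machine layer (numpy).
# Implements  dW_ij = <v_i h_j>_data - <v_i h_j>_model  with a single Gibbs
# step as the "model" estimate; biases are omitted for brevity.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v_data, lr=0.01):
    """One contrastive-divergence step; v_data has shape (batch, n_visible)."""
    # Positive phase: hidden activations driven by the data.
    h_prob = sigmoid(v_data @ W)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Negative phase: reconstruct the visible layer, then the hidden layer.
    v_recon = sigmoid(h_sample @ W.T)
    h_recon = sigmoid(v_recon @ W)
    # <v_i h_j>_data - <v_i h_j>_model, averaged over the mini-batch.
    grad = (v_data.T @ h_prob - v_recon.T @ h_recon) / len(v_data)
    return W + lr * grad

W = 0.01 * rng.standard_normal((39, 256))          # illustrative layer size
batch = (rng.random((32, 39)) > 0.5).astype(float)  # stand-in binary "features"
W = cd1_update(W, batch)
```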
Fig. 3 is a schematic diagram of the modeling method for an acoustic model used in speech recognition according to an embodiment of the present invention. The method comprises: Step 1, establishing the initial model. Specifically, a hidden Markov model-Gaussian mixture model (HMM-GMM) is trained with the training data; the modeling unit of this HMM-GMM is the tri-phone state obtained after the speech features of the training data are clustered by a phoneme decision tree, and the HMM-GMM obtains the state transition probabilities of the tri-phone states through the expectation maximization (EM) algorithm.
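As an illustration of step 1 only, the sketch below fits a small GMM-HMM with the hmmlearn library. It is a toy stand-in: the patent's initial model uses phoneme-decision-tree-clustered tri-phone states trained over a full speech corpus, which hmmlearn does not provide out of the box, and the random data here merely makes the snippet self-contained.

```python
# Toy sketch of step 1: fitting a GMM-HMM to feature sequences with hmmlearn.
# The state inventory is a stand-in for decision-tree-clustered tri-phone
# states; all sizes and the random data are illustrative.
import numpy as np
from hmmlearn import hmm

n_states = 8      # stand-in for the clustered tri-phone state inventory
feat_dim = 39

# Feature matrices of several utterances, concatenated, plus their lengths.
utterances = [np.random.randn(np.random.randint(80, 120), feat_dim)
              for _ in range(10)]
X = np.concatenate(utterances)
lengths = [len(u) for u in utterances]

# EM (Baum-Welch) training estimates the transition probabilities and the
# Gaussian mixture parameters of every state at the same time.
gmm_hmm = hmm.GMMHMM(n_components=n_states, n_mix=4,
                     covariance_type="diag", n_iter=20)
gmm_hmm.fit(X, lengths)
print(gmm_hmm.transmat_.shape)   # (n_states, n_states) transition matrix
```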
Step 2, obtaining the frame-level state information of the speech features of the training data. Specifically, based on the HMM-GMM, forced alignment of the speech features of the training data against the tri-phone states is performed to obtain the frame-level state information of the speech features.
Preferably, this is done as follows: based on the HMM-GMM, each speech feature frame of the training data is associated with its most probable tri-phone state, which yields the frame-level state information of the speech features.
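A sketch of step 2 follows: given the trained GMM-HMM, each feature frame is mapped to its most probable state to produce the frame-level targets for the neural network. Here hmmlearn's Viterbi decoding is used as a stand-in for true forced alignment, which would additionally constrain the state path to the reference transcript; the variables come from the step-1 sketch above.

```python
# Sketch of step 2: frame-level state labels from the trained GMM-HMM.
# decode() returns the most probable (Viterbi) state path; real forced
# alignment also constrains this path to the utterance's transcript.
frame_labels = []
for utt in utterances:                      # from the step-1 sketch above
    _, states = gmm_hmm.decode(utt, algorithm="viterbi")
    frame_labels.append(states)             # one state index per frame

print(frame_labels[0][:10])                 # e.g. the first ten state ids
```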
Step 3, initializing the hidden-layer weights of the deep neural network. Specifically, the deep neural network serving as the acoustic model is pre-trained to obtain parameters for initializing the weights of each of its hidden layers.
Step 4, updating the hidden-layer weights of the deep neural network. Specifically, based on the tri-phone states of the training-data speech features, the deep neural network is trained with the error back-propagation algorithm and the weights of each of its hidden layers are updated.
Preferably, the pre-training of step 3 is performed as follows: restricted Boltzmann machines are trained layer by layer to convergence on the training data, and the resulting parameters are used to initialize the weights of each hidden layer of the deep network.
It should be noted that the hidden Markov model-Gaussian mixture model, written here as HMM-GMM, may also be written as HMM/GMM.
The pre-training in step 3 can be regarded as unsupervised training, while the training in step 4 can be regarded as supervised training.
In addition, the pre-training of step 3 and the alignment of step 2 can be carried out simultaneously.
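A minimal sketch of the supervised fine-tuning of step 4 follows, again in PyTorch as an assumption: the pre-initialized network is trained with the cross-entropy criterion against the frame-level state labels using error back-propagation. The batch size, learning rate, epoch count and the random stand-in data are illustrative; `model`, FEAT_DIM, CONTEXT and NUM_SENONES come from the earlier network sketch.

```python
# Sketch of step 4: supervised fine-tuning of the pre-trained DNN with
# error back-propagation and the cross-entropy criterion (PyTorch).
# `model`, FEAT_DIM, CONTEXT and NUM_SENONES are from the earlier sketch;
# the random tensors stand in for real frames and their state labels.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                   # minimum cross-entropy criterion
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(5):                              # illustrative epoch count
    feats = torch.randn(256, FEAT_DIM * CONTEXT)    # stand-in mini-batch of frames
    labels = torch.randint(0, NUM_SENONES, (256,))  # frame-level tri-phone states
    optimizer.zero_grad()
    loss = criterion(model(feats), labels)          # softmax + cross-entropy
    loss.backward()                                 # error back-propagation
    optimizer.step()                                # update the hidden-layer weights
```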
When the model is used as the acoustic model for speech recognition, the posterior probabilities that the deep neural network produces for the speech features are converted into likelihoods through the Bayes formula and sent to the decoder; the text sequence obtained after decoding is the recognized spoken content. The recognition quality can be assessed from the difference between the recognized content and the actual original speech, which in turn allows the performance of the deep neural network as the acoustic model in the speech recognition system to be evaluated; where necessary, the network can be retrained, and the state transition probabilities in the HMM-GMM can even be redesigned.
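The conversion described above can be written, up to a constant, as log p(x|s) = log p(s|x) - log p(s). A minimal numpy sketch follows; the state priors are assumed to be estimated from the frame-level alignment counts of the training data, and the random arrays are stand-ins for real network outputs.

```python
# Sketch of converting DNN state posteriors into scaled likelihoods for the
# decoder:  log p(x|s) = log p(s|x) - log p(s) + const.  State priors are
# assumed to come from frame-level alignment counts of the training data.
import numpy as np

def posteriors_to_loglikelihoods(posteriors, state_priors, floor=1e-10):
    """posteriors: (frames, states) softmax outputs; state_priors: (states,)."""
    return np.log(posteriors + floor) - np.log(state_priors + floor)

posts = np.random.dirichlet(np.ones(3000), size=100)   # stand-in DNN outputs
priors = np.random.dirichlet(np.ones(3000))            # stand-in prior counts
loglikes = posteriors_to_loglikelihoods(posts, priors)  # passed on to the decoder
```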
Fig. 4 is a schematic diagram of the modeling system for an acoustic model used in speech recognition according to an embodiment of the present invention. The modeling system comprises: a first module, configured to train a hidden Markov model-Gaussian mixture model (HMM-GMM) with training data, wherein the modeling unit of the HMM-GMM is the tri-phone state obtained after the speech features of the training data are clustered by a phoneme decision tree, and the HMM-GMM obtains the state transition probabilities of the tri-phone states through the expectation maximization (EM) algorithm; a second module, configured to perform, based on the HMM-GMM, forced alignment of the speech features of the training data against the tri-phone states to obtain frame-level state information of the speech features; a third module, configured to pre-train a deep neural network serving as the acoustic model to obtain parameters for initializing the weights of each hidden layer of the deep network; and a fourth module, configured to train the deep neural network with the error back-propagation algorithm based on the tri-phone states of the training-data speech features and to update the weights of each of its hidden layers.
Preferably, the second module performs the forced alignment as follows: based on the HMM-GMM, it associates each speech feature frame of the training data with its most probable tri-phone state to obtain the frame-level state information of the speech features.
Preferably, the third module performs the pre-training as follows: it trains restricted Boltzmann machines layer by layer to convergence on the training data and initializes the weights of each hidden layer of the deep network with the resulting parameters.
Embodiments of the present invention use a deep neural network in place of the Gaussian mixture model for acoustic modeling and use context-dependent tri-phone states as the modeling units. Unlike the Gaussian mixture model, which has to make special assumptions about the speech features and their distribution, the deep neural network directly provides the posterior probabilities of the speech features. The tri-phone states fully capture the contextual dependence of the language and make the modeling unit more fine-grained, while the multiple hidden layers more closely resemble the workings of the human speech perception system, which benefits the extraction of high-order feature information. Embodiments of the present invention initialize the weights of each hidden layer of the network with the restricted Boltzmann machine algorithm and subsequently update these weights by the back-propagation algorithm; the pre-training effectively reduces the risk that the network falls into a local extremum during training, and further improves the modeling accuracy of the acoustic model.
Those skilled in the art will further appreciate that the exemplary modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
It should be noted that the above are only preferred embodiments of the present invention and are not intended to limit its scope; persons with the relevant professional knowledge can implement the present invention through the above embodiments, and any variation, modification or improvement made within the spirit and principles of the present invention is covered by the claims of the present invention. That is, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention; although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently substituted without departing from their spirit and scope.

Claims (6)

1. A modeling method for an acoustic model used in speech recognition, characterized in that the method comprises:
training a hidden Markov model-Gaussian mixture model (HMM-GMM) with training data, wherein the modeling unit of the HMM-GMM is the tri-phone state obtained after the speech features of the training data are clustered by a phoneme decision tree, and the HMM-GMM is trained by the expectation maximization (EM) algorithm, which also yields the state transition probabilities of the tri-phone states;
performing, based on the HMM-GMM, forced alignment of the speech features of the training data to obtain frame-level tri-phone state information of the speech features;
pre-training a deep neural network serving as the acoustic model to obtain parameters for initializing the weights of each hidden layer of the deep network;
training the deep neural network with the error back-propagation algorithm based on the frame-level state information of the training-data speech features, and updating the weights of each of its hidden layers.
2. The modeling method of claim 1, characterized in that performing forced alignment of the tri-phone states of the training-data speech features based on the HMM-GMM to obtain the frame-level state information of the speech features is specifically: based on the HMM-GMM, associating each speech feature frame of the training data with its most probable tri-phone state to obtain the frame-level state information of the speech features.
3. The modeling method of claim 1, characterized in that pre-training the deep neural network serving as the acoustic model to obtain the parameters for initializing the weights of each hidden layer of the deep network is specifically: training restricted Boltzmann machines layer by layer to convergence on the training data, and initializing the weights of each hidden layer of the deep network with the resulting parameters.
4. A modeling system for an acoustic model used in speech recognition, characterized in that the modeling system comprises:
a first module, configured to train a hidden Markov model-Gaussian mixture model (HMM-GMM) with training data, wherein the modeling unit of the HMM-GMM is the tri-phone state obtained after the speech features of the training data are clustered by a phoneme decision tree, and the HMM-GMM obtains the state transition probabilities of the tri-phone states through the expectation maximization (EM) algorithm;
a second module, configured to perform, based on the HMM-GMM, forced alignment of the speech features of the training data to obtain frame-level tri-phone state information of the speech features;
a third module, configured to pre-train a deep neural network serving as the acoustic model to obtain parameters for initializing the weights of each hidden layer of the deep network;
a fourth module, configured to train the deep neural network with the error back-propagation algorithm based on the frame-level state information of the training-data speech features and to update the weights of each of its hidden layers.
5. The modeling system of claim 4, characterized in that the second module performs the forced alignment as follows: based on the HMM-GMM, it associates each speech feature frame of the training data with its most probable tri-phone state to obtain the frame-level state information of the speech features.
6. The modeling system of claim 4, characterized in that the third module performs the pre-training as follows: it trains restricted Boltzmann machines layer by layer to convergence on the training data and initializes the weights of each hidden layer of the deep network with the resulting parameters.
CN201310020010.7A 2013-01-18 2013-01-18 Modeling method and modeling system for an acoustic model used in speech recognition Expired - Fee Related CN103117060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310020010.7A CN103117060B (en) 2013-01-18 2013-01-18 Modeling method and modeling system for an acoustic model used in speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310020010.7A CN103117060B (en) 2013-01-18 2013-01-18 Modeling method and modeling system for an acoustic model used in speech recognition

Publications (2)

Publication Number Publication Date
CN103117060A true CN103117060A (en) 2013-05-22
CN103117060B CN103117060B (en) 2015-10-28

Family

ID=48415418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310020010.7A Expired - Fee Related CN103117060B (en) 2013-01-18 2013-01-18 Modeling method and modeling system for an acoustic model used in speech recognition

Country Status (1)

Country Link
CN (1) CN103117060B (en)

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345656A (en) * 2013-07-17 2013-10-09 中国科学院自动化研究所 Method and device for data identification based on multitask deep neural network
CN103514879A (en) * 2013-09-18 2014-01-15 广东欧珀移动通信有限公司 Local voice recognition method based on BP neural network
CN103680496A (en) * 2013-12-19 2014-03-26 百度在线网络技术(北京)有限公司 Deep-neural-network-based acoustic model training method, hosts and system
CN103839211A (en) * 2014-03-23 2014-06-04 合肥新涛信息科技有限公司 Medical history transferring system based on voice recognition
CN103839546A (en) * 2014-03-26 2014-06-04 合肥新涛信息科技有限公司 Voice recognition system based on Yangze river and Huai river language family
CN104036774A (en) * 2014-06-20 2014-09-10 国家计算机网络与信息安全管理中心 Method and system for recognizing Tibetan dialects
CN104347066A (en) * 2013-08-09 2015-02-11 盛乐信息技术(上海)有限公司 Deep neural network-based baby cry identification method and system
CN104376842A (en) * 2013-08-12 2015-02-25 清华大学 Neural network language model training method and device and voice recognition method
CN104575497A (en) * 2013-10-28 2015-04-29 中国科学院声学研究所 Method for building acoustic model and speech decoding method based on acoustic model
CN105229676A (en) * 2013-05-23 2016-01-06 国立研究开发法人情报通信研究机构 The learning device of the learning method of deep-neural-network and learning device and category independently sub-network
CN105654955A (en) * 2016-03-18 2016-06-08 华为技术有限公司 Voice recognition method and device
CN105745700A (en) * 2013-11-27 2016-07-06 国立研究开发法人情报通信研究机构 Statistical-acoustic-model adaptation method, acoustic-model learning method suitable for statistical-acoustic-model adaptation, storage medium in which parameters for building deep neural network are stored, and computer program for adapting statistical acoustic model
CN105761720A (en) * 2016-04-19 2016-07-13 北京地平线机器人技术研发有限公司 Interaction system based on voice attribute classification, and method thereof
CN105874530A (en) * 2013-10-30 2016-08-17 格林伊登美国控股有限责任公司 Predicting recognition quality of a phrase in automatic speech recognition systems
CN105960672A (en) * 2014-09-09 2016-09-21 微软技术许可有限责任公司 Variable-component deep neural network for robust speech recognition
CN106023995A (en) * 2015-08-20 2016-10-12 漳州凯邦电子有限公司 Voice recognition method and wearable voice control device using the method
CN106157953A (en) * 2015-04-16 2016-11-23 科大讯飞股份有限公司 continuous speech recognition method and system
CN106297773A (en) * 2015-05-29 2017-01-04 中国科学院声学研究所 A kind of neutral net acoustic training model method
CN106611599A (en) * 2015-10-21 2017-05-03 展讯通信(上海)有限公司 Voice recognition method and device based on artificial neural network and electronic equipment
CN106652999A (en) * 2015-10-29 2017-05-10 三星Sds株式会社 System and method for voice recognition
CN106782504A (en) * 2016-12-29 2017-05-31 百度在线网络技术(北京)有限公司 Audio recognition method and device
CN106782511A (en) * 2016-12-22 2017-05-31 太原理工大学 Amendment linear depth autoencoder network audio recognition method
CN106816147A (en) * 2017-01-25 2017-06-09 上海交通大学 Speech recognition system based on binary neural network acoustic model
WO2017114201A1 (en) * 2015-12-31 2017-07-06 阿里巴巴集团控股有限公司 Method and device for executing setting operation
CN107112006A (en) * 2014-10-02 2017-08-29 微软技术许可有限责任公司 Speech processes based on neutral net
CN107680582A (en) * 2017-07-28 2018-02-09 平安科技(深圳)有限公司 Acoustic training model method, audio recognition method, device, equipment and medium
CN108111335A (en) * 2017-12-04 2018-06-01 华中科技大学 A kind of method and system dispatched and link virtual network function
CN108109615A (en) * 2017-12-21 2018-06-01 内蒙古工业大学 A kind of construction and application method of the Mongol acoustic model based on DNN
CN108346423A (en) * 2017-01-23 2018-07-31 北京搜狗科技发展有限公司 The treating method and apparatus of phonetic synthesis model
CN108428448A (en) * 2017-02-13 2018-08-21 芋头科技(杭州)有限公司 A kind of sound end detecting method and audio recognition method
CN108648747A (en) * 2018-03-21 2018-10-12 清华大学 Language recognition system
CN109215637A (en) * 2017-06-30 2019-01-15 三星Sds株式会社 Audio recognition method
CN109326277A (en) * 2018-12-05 2019-02-12 四川长虹电器股份有限公司 Semi-supervised phoneme forces alignment model method for building up and system
CN109545201A (en) * 2018-12-15 2019-03-29 中国人民解放军战略支援部队信息工程大学 The construction method of acoustic model based on the analysis of deep layer hybrid cytokine
CN109741735A (en) * 2017-10-30 2019-05-10 阿里巴巴集团控股有限公司 The acquisition methods and device of a kind of modeling method, acoustic model
CN109975762A (en) * 2017-12-28 2019-07-05 中国科学院声学研究所 A kind of underwater sound source localization method
CN110070855A (en) * 2018-01-23 2019-07-30 中国科学院声学研究所 A kind of speech recognition system and method based on migration neural network acoustic model
US10452995B2 (en) 2015-06-29 2019-10-22 Microsoft Technology Licensing, Llc Machine learning classification on hardware accelerators with stacked memory
CN110459216A (en) * 2019-08-14 2019-11-15 桂林电子科技大学 A kind of dining room brushing card device and application method with speech recognition
US10540588B2 (en) 2015-06-29 2020-01-21 Microsoft Technology Licensing, Llc Deep neural network processing on hardware accelerators with stacked memory
US10606651B2 (en) 2015-04-17 2020-03-31 Microsoft Technology Licensing, Llc Free form expression accelerator with thread length-based thread assignment to clustered soft processor cores that share a functional circuit
CN112259089A (en) * 2019-07-04 2021-01-22 阿里巴巴集团控股有限公司 Voice recognition method and device
CN113450786A (en) * 2020-03-25 2021-09-28 阿里巴巴集团控股有限公司 Network model obtaining method, information processing method, device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5317673A (en) * 1992-06-22 1994-05-31 Sri International Method and apparatus for context-dependent estimation of multiple probability distributions of phonetic classes with multilayer perceptrons in a speech recognition system
CN1427368A (en) * 2001-12-19 2003-07-02 中国科学院自动化研究所 Palm computer non specific human speech sound distinguishing method
CN1588536A (en) * 2004-09-29 2005-03-02 上海交通大学 State structure regulating method in sound identification
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
CN102411931A (en) * 2010-09-15 2012-04-11 微软公司 Deep belief network for large vocabulary continuous speech recognition
CN102693723A (en) * 2012-04-01 2012-09-26 北京安慧音通科技有限责任公司 Method and device for recognizing speaker-independent isolated word based on subspace

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
IBRAHIM M. M. EL-EMARY, MOHAMED FEZARI AND HAMZA ATTOUI: "Hidden Markov model/Gaussian mixture models (HMM/GMM) based voice command system: A way to improve the control of remotely operated robot arm TR45", 《SCIENTIFIC RESEARCH AND ESSAYS》 *
POONAM BANSAL, ANUJ KANT, SUMIT KUMAR, AKASH SHARDA, SHITIJ GUPT: "IMPROVED HYBRID MODEL OF HMM/GMM FOR SPEECH RECOGNITION", 《INTERNATIONAL CONFERENCE "INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS" INFOS 2008, VARNA, BULGARIA, JUNE-JULY 2008》 *
倪崇嘉, 刘文举, 徐波: "A study of prosody-dependent Mandarin speech recognition systems", 《计算机应用研究》 *
黄浩, 李兵虎, 吾守尔·斯拉木: "Decision-tree-based acoustic context modeling in discriminative model combination", 《自动化学报》 *

Cited By (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105229676A (en) * 2013-05-23 2016-01-06 国立研究开发法人情报通信研究机构 The learning device of the learning method of deep-neural-network and learning device and category independently sub-network
US9691020B2 (en) 2013-05-23 2017-06-27 National Institute Of Information And Communications Technology Deep neural network learning method and apparatus, and category-independent sub-network learning apparatus
CN105229676B (en) * 2013-05-23 2018-11-23 国立研究开发法人情报通信研究机构 The learning method and learning device of deep-neural-network
CN103345656B (en) * 2013-07-17 2016-01-20 中国科学院自动化研究所 A kind of data identification method based on multitask deep neural network and device
CN103345656A (en) * 2013-07-17 2013-10-09 中国科学院自动化研究所 Method and device for data identification based on multitask deep neural network
CN104347066A (en) * 2013-08-09 2015-02-11 盛乐信息技术(上海)有限公司 Deep neural network-based baby cry identification method and system
CN104347066B (en) * 2013-08-09 2019-11-12 上海掌门科技有限公司 Recognition method for baby cry and system based on deep-neural-network
CN104376842A (en) * 2013-08-12 2015-02-25 清华大学 Neural network language model training method and device and voice recognition method
CN103514879A (en) * 2013-09-18 2014-01-15 广东欧珀移动通信有限公司 Local voice recognition method based on BP neural network
CN104575497A (en) * 2013-10-28 2015-04-29 中国科学院声学研究所 Method for building acoustic model and speech decoding method based on acoustic model
CN104575497B (en) * 2013-10-28 2017-10-03 中国科学院声学研究所 A kind of acoustic model method for building up and the tone decoding method based on the model
US10319366B2 (en) 2013-10-30 2019-06-11 Genesys Telecommunications Laboratories, Inc. Predicting recognition quality of a phrase in automatic speech recognition systems
CN105874530A (en) * 2013-10-30 2016-08-17 格林伊登美国控股有限责任公司 Predicting recognition quality of a phrase in automatic speech recognition systems
CN105874530B (en) * 2013-10-30 2020-03-03 格林伊登美国控股有限责任公司 Predicting phrase recognition quality in an automatic speech recognition system
CN105745700A (en) * 2013-11-27 2016-07-06 国立研究开发法人情报通信研究机构 Statistical-acoustic-model adaptation method, acoustic-model learning method suitable for statistical-acoustic-model adaptation, storage medium in which parameters for building deep neural network are stored, and computer program for adapting statistical acoustic model
CN105745700B (en) * 2013-11-27 2019-11-01 国立研究开发法人情报通信研究机构 The adaptive approach and learning method of statistical acoustics model, recording medium
CN103680496B (en) * 2013-12-19 2016-08-10 百度在线网络技术(北京)有限公司 Acoustic training model method based on deep-neural-network, main frame and system
CN103680496A (en) * 2013-12-19 2014-03-26 百度在线网络技术(北京)有限公司 Deep-neural-network-based acoustic model training method, hosts and system
CN103839211A (en) * 2014-03-23 2014-06-04 合肥新涛信息科技有限公司 Medical history transferring system based on voice recognition
CN103839546A (en) * 2014-03-26 2014-06-04 合肥新涛信息科技有限公司 Voice recognition system based on Yangze river and Huai river language family
CN104036774A (en) * 2014-06-20 2014-09-10 国家计算机网络与信息安全管理中心 Method and system for recognizing Tibetan dialects
CN105960672A (en) * 2014-09-09 2016-09-21 微软技术许可有限责任公司 Variable-component deep neural network for robust speech recognition
CN105960672B (en) * 2014-09-09 2019-11-26 微软技术许可有限责任公司 Variable component deep neural network for Robust speech recognition
CN107112006A (en) * 2014-10-02 2017-08-29 微软技术许可有限责任公司 Speech processes based on neutral net
CN106157953B (en) * 2015-04-16 2020-02-07 科大讯飞股份有限公司 Continuous speech recognition method and system
CN106157953A (en) * 2015-04-16 2016-11-23 科大讯飞股份有限公司 continuous speech recognition method and system
US10606651B2 (en) 2015-04-17 2020-03-31 Microsoft Technology Licensing, Llc Free form expression accelerator with thread length-based thread assignment to clustered soft processor cores that share a functional circuit
CN106297773B (en) * 2015-05-29 2019-11-19 中国科学院声学研究所 A kind of neural network acoustic training model method
CN106297773A (en) * 2015-05-29 2017-01-04 中国科学院声学研究所 A kind of neutral net acoustic training model method
US10452995B2 (en) 2015-06-29 2019-10-22 Microsoft Technology Licensing, Llc Machine learning classification on hardware accelerators with stacked memory
US10540588B2 (en) 2015-06-29 2020-01-21 Microsoft Technology Licensing, Llc Deep neural network processing on hardware accelerators with stacked memory
CN106023995A (en) * 2015-08-20 2016-10-12 漳州凯邦电子有限公司 Voice recognition method and wearable voice control device using the method
CN106611599A (en) * 2015-10-21 2017-05-03 展讯通信(上海)有限公司 Voice recognition method and device based on artificial neural network and electronic equipment
CN106652999A (en) * 2015-10-29 2017-05-10 三星Sds株式会社 System and method for voice recognition
CN106940998A (en) * 2015-12-31 2017-07-11 阿里巴巴集团控股有限公司 A kind of execution method and device of setting operation
WO2017114201A1 (en) * 2015-12-31 2017-07-06 阿里巴巴集团控股有限公司 Method and device for executing setting operation
CN105654955A (en) * 2016-03-18 2016-06-08 华为技术有限公司 Voice recognition method and device
CN105654955B (en) * 2016-03-18 2019-11-12 华为技术有限公司 Audio recognition method and device
CN105761720B (en) * 2016-04-19 2020-01-07 北京地平线机器人技术研发有限公司 Interactive system and method based on voice attribute classification
CN105761720A (en) * 2016-04-19 2016-07-13 北京地平线机器人技术研发有限公司 Interaction system based on voice attribute classification, and method thereof
CN106782511A (en) * 2016-12-22 2017-05-31 太原理工大学 Amendment linear depth autoencoder network audio recognition method
CN106782504A (en) * 2016-12-29 2017-05-31 百度在线网络技术(北京)有限公司 Audio recognition method and device
CN108346423A (en) * 2017-01-23 2018-07-31 北京搜狗科技发展有限公司 The treating method and apparatus of phonetic synthesis model
CN106816147A (en) * 2017-01-25 2017-06-09 上海交通大学 Speech recognition system based on binary neural network acoustic model
CN108428448A (en) * 2017-02-13 2018-08-21 芋头科技(杭州)有限公司 A kind of sound end detecting method and audio recognition method
CN109215637A (en) * 2017-06-30 2019-01-15 三星Sds株式会社 Audio recognition method
CN109215637B (en) * 2017-06-30 2023-09-01 三星Sds株式会社 speech recognition method
WO2019019252A1 (en) * 2017-07-28 2019-01-31 平安科技(深圳)有限公司 Acoustic model training method, speech recognition method and apparatus, device and medium
CN107680582B (en) * 2017-07-28 2021-03-26 平安科技(深圳)有限公司 Acoustic model training method, voice recognition method, device, equipment and medium
CN107680582A (en) * 2017-07-28 2018-02-09 平安科技(深圳)有限公司 Acoustic training model method, audio recognition method, device, equipment and medium
US11030998B2 (en) 2017-07-28 2021-06-08 Ping An Technology (Shenzhen) Co., Ltd. Acoustic model training method, speech recognition method, apparatus, device and medium
CN109741735A (en) * 2017-10-30 2019-05-10 阿里巴巴集团控股有限公司 The acquisition methods and device of a kind of modeling method, acoustic model
CN109741735B (en) * 2017-10-30 2023-09-01 阿里巴巴集团控股有限公司 Modeling method, acoustic model acquisition method and acoustic model acquisition device
CN108111335B (en) * 2017-12-04 2019-07-23 华中科技大学 A kind of method and system of scheduling and link virtual network function
CN108111335A (en) * 2017-12-04 2018-06-01 华中科技大学 A kind of method and system dispatched and link virtual network function
CN108109615A (en) * 2017-12-21 2018-06-01 内蒙古工业大学 A kind of construction and application method of the Mongol acoustic model based on DNN
CN109975762A (en) * 2017-12-28 2019-07-05 中国科学院声学研究所 A kind of underwater sound source localization method
CN109975762B (en) * 2017-12-28 2021-05-18 中国科学院声学研究所 Underwater sound source positioning method
CN110070855B (en) * 2018-01-23 2021-07-23 中国科学院声学研究所 Voice recognition system and method based on migrating neural network acoustic model
CN110070855A (en) * 2018-01-23 2019-07-30 中国科学院声学研究所 A kind of speech recognition system and method based on migration neural network acoustic model
CN108648747B (en) * 2018-03-21 2020-06-02 清华大学 Language identification system
CN108648747A (en) * 2018-03-21 2018-10-12 清华大学 Language recognition system
CN109326277A (en) * 2018-12-05 2019-02-12 四川长虹电器股份有限公司 Semi-supervised phoneme forces alignment model method for building up and system
CN109326277B (en) * 2018-12-05 2022-02-08 四川长虹电器股份有限公司 Semi-supervised phoneme forced alignment model establishing method and system
CN109545201B (en) * 2018-12-15 2023-06-06 中国人民解放军战略支援部队信息工程大学 Construction method of acoustic model based on deep mixing factor analysis
CN109545201A (en) * 2018-12-15 2019-03-29 中国人民解放军战略支援部队信息工程大学 The construction method of acoustic model based on the analysis of deep layer hybrid cytokine
CN112259089A (en) * 2019-07-04 2021-01-22 阿里巴巴集团控股有限公司 Voice recognition method and device
CN110459216A (en) * 2019-08-14 2019-11-15 桂林电子科技大学 A kind of dining room brushing card device and application method with speech recognition
CN113450786A (en) * 2020-03-25 2021-09-28 阿里巴巴集团控股有限公司 Network model obtaining method, information processing method, device and electronic equipment

Also Published As

Publication number Publication date
CN103117060B (en) 2015-10-28

Similar Documents

Publication Publication Date Title
CN103117060A (en) Modeling approach and modeling system of acoustic model used in speech recognition
CN112509564B (en) End-to-end voice recognition method based on connection time sequence classification and self-attention mechanism
CN108172218B (en) Voice modeling method and device
KR101415534B1 (en) Multi-stage speech recognition apparatus and method
CN104681036B (en) A kind of detecting system and method for language audio
CN103400577B (en) The acoustic model method for building up of multilingual speech recognition and device
CN103065620B (en) Method with which text input by user is received on mobile phone or webpage and synthetized to personalized voice in real time
US20160284347A1 (en) Processing audio waveforms
US10714076B2 (en) Initialization of CTC speech recognition with standard HMM
CN108281137A (en) A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN104036774A (en) Method and system for recognizing Tibetan dialects
WO2019019252A1 (en) Acoustic model training method, speech recognition method and apparatus, device and medium
CN104575497B (en) A kind of acoustic model method for building up and the tone decoding method based on the model
CN109887484A (en) A kind of speech recognition based on paired-associate learning and phoneme synthesizing method and device
CN107767861A (en) voice awakening method, system and intelligent terminal
CN107093422B (en) Voice recognition method and voice recognition system
CN111445898B (en) Language identification method and device, electronic equipment and storage medium
CN102810311B (en) Speaker estimation method and speaker estimation equipment
CN111833845A (en) Multi-language speech recognition model training method, device, equipment and storage medium
CN111091809B (en) Regional accent recognition method and device based on depth feature fusion
WO2017177484A1 (en) Voice recognition-based decoding method and device
CN102945673A (en) Continuous speech recognition method with speech command range changed dynamically
Kapralova et al. A big data approach to acoustic model training corpus selection
Ferrer et al. Spoken language recognition based on senone posteriors.
CN102521402B (en) Text filtering system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20151028