CN106057196B - Vehicle voice data parsing and recognition method - Google Patents

Vehicle voice data parsing and recognition method

Info

Publication number
CN106057196B
CN106057196B (application CN201610534783.0A)
Authority
CN
China
Prior art keywords
model
state
parameter
vector
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610534783.0A
Other languages
Chinese (zh)
Other versions
CN106057196A (en)
Inventor
谢欣霖
陈波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhida Science And Technology Co Ltd
Original Assignee
Chengdu Zhida Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhida Science And Technology Co Ltd
Priority to CN201610534783.0A
Publication of CN106057196A
Application granted
Publication of CN106057196B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention provides a vehicle voice data parsing and recognition method. The method comprises: reading in the speech to be recognized, obtaining an observation sequence after front-end processing, computing the conditional probability of the observation sequence under the model of each entry, and determining the recognized entry according to the conditional probability. The proposed method requires no labeled training sample set in an offline dictionary, depends little on hand-written rules, improves recognition accuracy, and adapts to the continually updated demands of an in-vehicle system.

Description

Vehicle voice data parsing and recognition method
Technical field
The present invention relates to speech recognition, and in particular to a vehicle voice data parsing and recognition method.
Background technique
Technologies such as cloud computing, big data and data mining are pushing the information services industry to develop faster and better; among them, information services that incorporate natural language understanding can guide people to the information and services they need more accurately and efficiently. Speech, as the most natural mode of human-machine interaction, will gradually become one of the most important of the many interaction modes. In the automotive field in particular, natural language understanding technology makes it possible to build a highly practical intelligent information service system that offers a more humane style of interaction and more convenient, accurate voice commands and navigation, a promotion with broad prospects for the driving experience. Existing in-vehicle speech recognition, however, performs semantic inference by learning from a large labeled training sample set in a sizeable offline dictionary. It depends heavily on hand-written rules, is inflexible, cannot adapt to the continual changes of the in-vehicle system, and its precision and accuracy are low.
Summary of the invention
To solve the above problems of the prior art, the invention proposes a vehicle voice data parsing and recognition method, comprising:
reading in the speech to be recognized, obtaining an observation sequence after front-end processing, computing the conditional probability of the observation sequence under the model of each entry, and determining the recognized entry according to the conditional probability.
Preferably, before computing the conditional probability of the observation sequence under the model of each entry, the method further includes:
estimating the speech primitives corresponding to the characteristic parameter sequence to be recognized, the primitives including words, syllables, initials and finals, so that the characteristic parameter sequence is converted into recognition units. Building a model comprises:
(1) randomly choosing initial parameter values and initializing an HMM λ;
(2) segmenting the observation sequence by state, the result of the segmentation being the set of observation frames corresponding to each state;
(3) dividing the observation vectors belonging to each state into M clusters with a segmental clustering algorithm, M being the number of Gaussian mixtures and each cluster supplying the parameters of one single-Gaussian component of the mixture density, then estimating the following parameters:
c_jk = (number of vectors in cluster k of state j) / (total number of vectors belonging to state j);
μ_jk = sample mean of the vectors in cluster k of state j;
U_jk = sample covariance matrix of the vectors in cluster k of state j;
an updated HMM λ' is obtained from these parameters;
(4) comparing λ' with the initial model λ: if the difference between the models exceeds a preset threshold, replacing λ with λ' and repeating steps (2) and (3); if the difference is below the threshold, determining that the model has converged and saving the model.
This iteration continually corrects the initial parameter values throughout model training.
Compared with the prior art, the present invention has the following advantages:
the proposed vehicle voice data parsing and recognition method requires no labeled training sample set in an offline dictionary, depends little on hand-written rules, improves recognition accuracy, and adapts to the continually updated demands of the in-vehicle system.
Detailed description of the invention
Fig. 1 is a flowchart of the vehicle voice data parsing and recognition method according to an embodiment of the present invention.
Specific embodiment
A detailed description of one or more embodiments of the invention is given below together with the drawings illustrating its principles. The invention is described in connection with such embodiments, but is not limited to any embodiment; its scope is limited only by the claims, and it covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. They are provided for exemplary purposes, and the invention can be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a vehicle voice data parsing and recognition method. Fig. 1 is a flowchart of the method according to an embodiment of the invention.
The in-vehicle system of the invention is composed of a recognition module and a semantic-inference classification module. A word segmenter is built by effective machine learning on a training corpus, part-of-speech tagging is performed with a CRF, and semantic inference is then carried out. Proper nouns are classified to make their storage and organization, and the voice commands that use them, more convenient.
The recognition procedure of the recognition module can be described as follows: read in the speech to be recognized; after front-end processing, match the resulting observation sequence X against every entry, that is, compute the conditional probability of X under each entry's model; the entry whose model yields the maximum probability is the recognition result. Before this recognition can be performed, model training must first be completed.
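The matching step above, scoring the observation sequence X under every entry's model and taking the entry with maximum probability, can be sketched with the standard forward algorithm. This is an illustrative reconstruction rather than the patent's own code; the model layout (initial probabilities `pi`, transition matrix `A`, emission function `emit`) and the function names are assumptions.

```python
def forward_prob(obs, pi, A, emit):
    """P(obs | model) via the forward algorithm.
    pi: initial state probabilities, A: transition matrix,
    emit(j, o): probability/density of observation o in state j."""
    alpha = [pi[j] * emit(j, obs[0]) for j in range(len(pi))]
    for o in obs[1:]:
        # propagate through the transitions, then apply the emission term
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(alpha))) * emit(j, o)
                 for j in range(len(alpha))]
    return sum(alpha)

def recognize(obs, entry_models):
    """Return the entry whose model gives the highest P(obs | lambda)."""
    return max(entry_models, key=lambda e: forward_prob(obs, *entry_models[e]))
```

For continuous models, `emit(j, o)` would evaluate the Gaussian-mixture density b_j(o) described later in the text.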
The speech primitives corresponding to the characteristic parameter sequence to be recognized are estimated from a probabilistic standpoint; the primitives include words, syllables, initials and finals, so that the characteristic parameter sequence is converted into recognition units. Building a model comprises:
1. Randomly choose initial parameter values and initialize an HMM λ.
2. Segment the observation sequence by state, so that each state corresponds to a set of observation frames.
3. Divide the observation vectors belonging to each state into M clusters with a segmental K-means algorithm, where M is the number of Gaussian mixtures and each cluster supplies the parameters of one single-Gaussian component of the mixture density; then estimate the following parameters:
c_jk = (number of vectors in cluster k of state j) / (total number of vectors belonging to state j)
μ_jk = sample mean of the vectors in cluster k of state j
U_jk = sample covariance matrix of the vectors in cluster k of state j
An updated HMM λ' is obtained from these parameters.
4. Compare λ' with the initial model λ. If the difference between the models exceeds a preset threshold, replace λ with λ' and repeat steps 2 and 3; if the difference is below the threshold, the model has converged and is saved.
This iteration continually corrects the initial parameter values throughout model training.
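Step 3 above, splitting one state's observation frames into M clusters and estimating a weight, mean and variance for each, might look as follows for scalar frames. This is a toy 1-D stand-in for the segmental K-means step under stated simplifications (scalar observations, a deterministic initialization); `estimate_state_mixture` is a hypothetical name.

```python
def estimate_state_mixture(frames, M, iters=10):
    """Cluster one state's frames into M groups with a simple 1-D K-means,
    then estimate per-cluster weight c_k, mean mu_k and variance."""
    s = sorted(frames)
    # deterministic initialization: spread centers across the sorted frames
    centers = [s[(len(s) - 1) * k // max(M - 1, 1)] for k in range(M)]
    for _ in range(iters):
        clusters = [[] for _ in range(M)]
        for f in frames:
            nearest = min(range(M), key=lambda k: abs(f - centers[k]))
            clusters[nearest].append(f)
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    weights = [len(c) / len(frames) for c in clusters]       # c_jk
    means = centers                                          # mu_jk
    variances = [sum((f - means[k]) ** 2 for f in c) / len(c) if c else 0.0
                 for k, c in enumerate(clusters)]            # 1-D stand-in for U_jk
    return weights, means, variances
```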
In the training stage of the model, MFCC characteristic parameters are used directly as observations: one MFCC vector is one observation. The parameters of the observation probability density function b_j(o) are then computed:

b_j(o) = Σ_{m=1}^{M} c_jm N(o; μ_jm, U_jm),  1 ≤ j ≤ N

where o is the observation, c_jm is the m-th mixture coefficient of state j, M is the number of mixtures, and N(o; μ_jm, U_jm) is the Gaussian density defined by the mean vector μ_jm and covariance matrix U_jm.
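Once the mixture parameters of a state are estimated, the density b_j(o) above is just a weighted sum of Gaussian densities. A minimal sketch for scalar observations (the patent uses MFCC vectors with covariance matrices; scalars with variances keep the example short):

```python
import math

def gaussian(o, mu, var):
    """Single 1-D Gaussian density N(o; mu, var)."""
    return math.exp(-(o - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def b_j(o, c, mu, var):
    """Mixture density b_j(o) = sum_m c_m * N(o; mu_m, var_m)."""
    return sum(cm * gaussian(o, m, v) for cm, m, v in zip(c, mu, var))
```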
The model training process is as follows:
(1) All training characteristic parameters are segmented into the individual states.
(2) The characteristic parameters owned by each state are assigned to the single-Gaussian components of the mixture model, and the model is re-estimated accordingly, where γ_t(j, k) denotes the probability of occupying mixture component k of state j at time t, L is the sample size, and C_t is the scale factor at time t.
(3) Convergence is tested: training terminates if the model has converged; otherwise iteration continues.
For the classification of proper nouns, the training sample set and the test set are first obtained from the database. The classifier is trained on the training sample set after preprocessing and text representation; in the assessment stage the classifier is tested on the test set. After preprocessing, each proper noun is segmented and converted into a vector of morphemes. The term frequency and inverse document frequency of each morpheme are counted over the training samples, and from them the normalized term frequency and inverse-document-frequency ratio of each morpheme with respect to the predefined classes is computed as the weight w_i(d) of word d for the corresponding class i,
where N is the total number of proper nouns and n_i is the number of proper nouns containing term i. During testing, the sums of the weights of the proper noun to be processed toward each class are computed, and the final classification result is output.
Here ZY is the proper noun to be classified, M is the number of classes, and zy_j is the j-th morpheme of ZY.
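The weighting just described, normalized term frequency times an inverse-document-frequency factor, can be sketched as follows. Since the exact formula survives only as an image in the source, the standard TF-IDF form with log(N/n_i) used here is an assumption, and `morpheme_weights` is a hypothetical helper:

```python
import math

def morpheme_weights(doc_morphemes, all_docs):
    """Weight each morpheme of one proper noun: normalized term frequency
    times inverse document frequency log(N / n_i), where N is the number
    of proper nouns and n_i the number containing the morpheme."""
    N = len(all_docs)
    weights = {}
    for m in set(doc_morphemes):
        tf = doc_morphemes.count(m) / len(doc_morphemes)   # normalized frequency
        n_i = sum(1 for d in all_docs if m in d)           # document frequency
        weights[m] = tf * math.log(N / n_i)
    return weights
```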
Every voice command in the database carries four items: the intent, the original text of the voice command, the segmentation information of the voice command, and the part-of-speech tagging information of the voice command. From these, the segmentation training file, the segmentation test file, the POS-tagging training file and the POS-tagging test file are generated. The invention first performs segmentation and POS tagging of the voice commands. For repeatable errors, the mistakes that are found are added to the program's custom dictionary so that they can be corrected in batch.
Before segmentation, the segmentation problem is converted into a sequence labeling problem: each character is labeled as the beginning of a word, the middle of a word, the end of a word, or a word formed by a single character. The features to be learned are then defined with preset templates. Each line of the template file represents one template; in the macro %x[row, col] of each template, row is the row offset relative to the current position and col is the column index. Feature words are generated from the training file according to the templates defined in the template file.
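The %x[row, col] macro can be expanded the way CRF++-style templates are: row is an offset relative to the current token and col selects a column of the training file. A small sketch; the `_B` padding token returned for out-of-range rows is an assumption borrowed from common CRF++ usage:

```python
import re

def expand_template(template, rows, t):
    """Expand one CRF++-style template, e.g. 'U01:%x[-1,0]/%x[0,0]',
    at position t of a token table (one row per token, columns = fields)."""
    def repl(m):
        r, c = int(m.group(1)), int(m.group(2))
        i = t + r  # row offset is relative to the current position
        return rows[i][c] if 0 <= i < len(rows) else "_B"
    return re.sub(r"%x\[(-?\d+),(\d+)\]", repl, template)
```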
The segmentation training file is used for conditional random field learning, yielding a segmentation system; the POS-tagging training file is used for conditional random field learning, yielding a POS-tagging system. Testing with the segmentation test file gives the precision of segmentation; testing with the POS-tagging test file gives the precision of the tagging system.
After segmentation, the segmentation results are transformed appropriately so that the POS-tagging system can subsequently tag them. For each voice command, the segmentation system outputs B1 segmentation results, and the POS-tagging system obtained by training produces B2 tagging results for each input, so each voice command finally yields B1*B2 recognition results. The B best recognition results are selected from these, and the standard segmentation and POS-tagging information is written back to the database. A recognition result is judged correct when both its segmentation and its POS-tagging result are fully consistent with the standard.
In generating the B1 segmentation results and B2 tagging results, the probability of each segmentation result is extracted and stored in the array p1, and the probability of each tagging result is stored in the array p2. For the B1*B2 recognition results of each voice command, the generation probability is computed from these two arrays as:

p[i] = p2[i] * p1[i / B2],  i = 0, 1, 2, ..., B1*B2 - 1

where the division is integer division. The generation probabilities p[i] of the B1*B2 recognition results are sorted, and the B results with the highest probability are output.
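The combination p[i] = p2[i] * p1[i / B2] and the top-B selection can be sketched directly; the integer division recovers which of the B1 segmentations recognition result i came from. `rank_candidates` is a hypothetical name:

```python
def rank_candidates(p1, p2, B):
    """p1: B1 segmentation probabilities; p2: B1*B2 tagging probabilities
    (B2 taggings per segmentation). Computes p[i] = p2[i] * p1[i // B2]
    and returns the indices of the top-B candidates."""
    B2 = len(p2) // len(p1)
    p = [p2[i] * p1[i // B2] for i in range(len(p2))]
    # stable descending sort, keep the B highest-probability indices
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:B]
```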
After training is complete, the output module outputs the segmentation and POS tagging of a given voice command. For an input voice command, the dictionary is searched for proper nouns; if one is found, it is replaced by the corresponding special symbol and the command is written to file 1, while the command with the found proper noun marked in brackets is written to file 2. File 1 is converted into the input format of the conditional random field segmentation test module and written to file 3. File 3 is segmented and the segmentation result is saved in file 4. The segmentation result is converted into the input format of the POS-tagging system and written to file 5. For each voice command, the B1 segmentation results are generated and their probabilities are saved in p1. The voice commands in file 5 are POS-tagged and the results saved in file 6. At this point every voice command has B1*B2 recognition results in total, of which the B best must be output; the B final results are written to file 7, and file 7 is converted into the final specified output format.
Semantic understanding in the present invention uses a method based on statistical learning. The semantic classes in the in-vehicle system include functions such as navigation routes, traffic condition answers, making phone calls, air-conditioning adjustment, weather voice commands and radio. Some semantics also require parameters: making a phone call, for instance, requires knowing the specific telephone number to dial. The semantic inference problem of the invention can thus be converted into assigning the intent of the input text to predefined intent classes: the type of intent of the voice command is inferred first, and if the intent requires further parameters, the corresponding parameters are then located in the voice command.
In the POS-tagging problem, the following conditional probability is modeled:

p(s_1 ... s_m | x_1 ... x_m)

where x_1 ... x_m are the individual words of the input voice command and s_1 ... s_m ∈ S ranges over all possible part-of-speech combinations. A voice command composed of x_1 ... x_m admits k^m tagging combinations, with k = |S|, so a probability distribution over these k^m tagging results is established.
This yields the log-linear form

p(s_1 ... s_m | x_1 ... x_m) = exp(w · Φ(x_1 ... x_m, s_1 ... s_m)) / Σ_{s'} exp(w · Φ(x_1 ... x_m, s'_1 ... s'_m))

where Φ is a finite feature vector over the predefined word set X and tag set Y. Through extensive training on pairs s_1 ... s_m and x_1 ... x_m, the parameter vector w is obtained, and with it p(s_1 ... s_m | x_1 ... x_m). After training, the state sequence s_1 ... s_m for an input x_1 ... x_m is found by solving:

arg max_{s ∈ S^m} w · Φ(x_1 ... x_m, s_1 ... s_m)
The features used may be the words obtained by segmenting the voice command, the corresponding parts of speech, or a combination of the two. After a series of features has been chosen, features are added and their weights adjusted according to the training samples.
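For tiny m, the distribution over the k^m tagging combinations and the arg-max decoding can be computed by brute force, which makes the construction above concrete (real taggers use Viterbi decoding instead; the log-linear form exp(w · Φ)/Z assumed here is the standard CRF one, since the source's own formula is an image):

```python
import itertools
import math

def crf_distribution(x, tags, score):
    """Brute-force p(s | x) = exp(score(x, s)) / Z over all |tags|^m
    tag sequences, with score(x, s) = w . Phi(x, s) supplied by the caller."""
    seqs = list(itertools.product(tags, repeat=len(x)))
    expw = [math.exp(score(x, s)) for s in seqs]
    Z = sum(expw)  # normalization over all k^m tagging combinations
    return {s: e / Z for s, e in zip(seqs, expw)}

def decode(x, tags, score):
    """argmax_s p(s | x): the recognition step performed after training."""
    return max(itertools.product(tags, repeat=len(x)), key=lambda s: score(x, s))
```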
In the text classification problem for semantic recognition, given training samples (x_i, y_i), i = 1, ..., n, over the word set X and tag set Y, where n is the total number of samples, an optimization problem for the weight vector w is set up.
At test time, a point x is assigned to the positive class when w^T x > 0.
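The test-time rule w^T x > 0 can be exercised with any linear learner. Because the training optimization itself survives only as an image in the source, the sketch below substitutes a plain perceptron for that step and keeps only the decision rule stated in the text:

```python
def train_linear(samples, labels, epochs=20, lr=0.1):
    """Toy linear trainer for the intent-classification step: learns w so
    that sign(w . x) matches the label. A perceptron stand-in for the
    optimization problem referenced above; labels are in {-1, +1}."""
    dim = len(samples[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # update only on misclassified (or boundary) points
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def classify(w, x):
    """Decision rule from the text: positive class iff w^T x > 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
```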
In conclusion the invention proposes a kind of vehicle voice datas to parse recognition methods, do not need in offline dictionary Training sample set is marked, it is small to the dependence of rule, accuracy of identification is improved, the demand that onboard system is constantly updated is adapted to.
Obviously, those skilled in the art should appreciate that the modules or steps of the invention described above can be realized with a general-purpose computing system; they can be concentrated in a single computing system or distributed over a network formed by multiple computing systems, and can optionally be realized with program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. The invention is therefore not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the invention are only used to exemplify or explain the principles of the invention and do not limit it. Any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention shall be included in its protection scope. Furthermore, the appended claims are intended to cover all variations and modifications falling within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (1)

1. A vehicle voice data parsing and recognition method, characterized by comprising:
reading in the speech to be recognized, obtaining an observation sequence after front-end processing, computing the conditional probability of the observation sequence under the model of each entry, and determining the recognized entry according to the conditional probability;
before said computing of the conditional probability of the observation sequence under the model of each entry, further comprising:
estimating the speech primitives corresponding to the characteristic parameter sequence to be recognized, the primitives including words, syllables, initials and finals, so that the characteristic parameter sequence is converted into recognition units; wherein building a model comprises:
(1) randomly choosing initial parameter values and initializing an HMM λ;
(2) segmenting the observation sequence by state, so that each state corresponds to a set of observation frames;
(3) dividing the observation vectors belonging to each state into M clusters with a segmental clustering algorithm, M being the number of Gaussian mixtures and each cluster supplying the parameters of one single-Gaussian component of the mixture density, then estimating:
c_jk = (number of vectors in cluster k of state j) / (total number of vectors belonging to state j);
μ_jk = sample mean of the vectors in cluster k of state j;
U_jk = sample covariance matrix of the vectors in cluster k of state j;
and obtaining an updated HMM λ' from these parameters;
(4) comparing λ' with the initial model λ: if the difference between the models exceeds a preset threshold, replacing λ with λ' and repeating steps (2) and (3); if the difference is below the threshold, determining that the model has converged and saving the model;
whereby this iteration continually corrects the initial parameter values throughout model training.
CN201610534783.0A 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method Expired - Fee Related CN106057196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610534783.0A CN106057196B (en) 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610534783.0A CN106057196B (en) 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method

Publications (2)

Publication Number Publication Date
CN106057196A CN106057196A (en) 2016-10-26
CN106057196B true CN106057196B (en) 2019-06-11

Family

ID=57184974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610534783.0A Expired - Fee Related CN106057196B (en) 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method

Country Status (1)

Country Link
CN (1) CN106057196B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971721A (en) * 2017-03-29 2017-07-21 沃航(武汉)科技有限公司 A kind of accent speech recognition system based on embedded mobile device
CN108986811B (en) * 2018-08-31 2021-05-28 北京新能源汽车股份有限公司 Voice recognition detection method, device and equipment
CN111353292B (en) * 2020-02-26 2023-06-16 支付宝(杭州)信息技术有限公司 Analysis method and device for user operation instruction

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1490786A (en) * 2002-10-17 2004-04-21 中国科学院声学研究所 Phonetic recognition confidence evaluating method, system and dictation device therewith
CN101930735A (en) * 2009-06-23 2010-12-29 富士通株式会社 Speech emotion recognition equipment and speech emotion recognition method
CN101980336B (en) * 2010-10-18 2012-01-11 福州星网视易信息系统有限公司 Hidden Markov model-based vehicle sound identification method
CN103065626A (en) * 2012-12-20 2013-04-24 中国科学院声学研究所 Automatic grading method and automatic grading equipment for read questions in test of spoken English
CN103810998A (en) * 2013-12-05 2014-05-21 中国农业大学 Method for off-line speech recognition based on mobile terminal device and achieving method
CN105390133A (en) * 2015-10-09 2016-03-09 西北师范大学 Tibetan TTVS system realization method


Also Published As

Publication number Publication date
CN106057196A (en) 2016-10-26

Similar Documents

Publication Publication Date Title
CN106407333B (en) Spoken language query identification method and device based on artificial intelligence
CN110210029A (en) Speech text error correction method, system, equipment and medium based on vertical field
CN108763510A (en) Intension recognizing method, device, equipment and storage medium
CN108108351A (en) A kind of text sentiment classification method based on deep learning built-up pattern
CN105205124B (en) A kind of semi-supervised text sentiment classification method based on random character subspace
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN111046670B (en) Entity and relationship combined extraction method based on drug case legal documents
CN104199965A (en) Semantic information retrieval method
CN106294344A (en) Video retrieval method and device
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
CN103678271B (en) A kind of text correction method and subscriber equipment
CN112016313B (en) Spoken language element recognition method and device and warning analysis system
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN106057196B (en) Vehicle voice data parsing and recognition method
CN106528776A (en) Text classification method and device
CN110046264A (en) A kind of automatic classification method towards mobile phone document
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN106202045B (en) Special audio recognition method based on car networking
CN114154570A (en) Sample screening method and system and neural network model training method
CN108681532B (en) Sentiment analysis method for Chinese microblog
CN110097096A (en) A kind of file classification method based on TF-IDF matrix and capsule network
CN112417132A (en) New intention recognition method for screening negative samples by utilizing predicate guest information
CN106203520B (en) SAR image classification method based on depth Method Using Relevance Vector Machine
CN112489689B (en) Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190611

Termination date: 20210708