CN106057196B - Vehicle voice data parsing and recognition method - Google Patents

Vehicle voice data parsing and recognition method

Info

Publication number
CN106057196B
CN106057196B (application CN201610534783.0A)
Authority
CN
China
Prior art keywords
model
state
parameter
vector
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610534783.0A
Other languages
Chinese (zh)
Other versions
CN106057196A (en)
Inventor
谢欣霖
陈波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhida Science And Technology Co Ltd
Original Assignee
Chengdu Zhida Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhida Science And Technology Co Ltd
Priority to CN201610534783.0A
Publication of CN106057196A
Application granted
Publication of CN106057196B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention provides a vehicle voice data parsing and recognition method. The method comprises: reading in the speech to be recognized, obtaining an observation sequence after front-end processing, computing the conditional probability of the observation sequence under the model of each entry, and determining the recognized entry according to the conditional probability. The proposed method requires no labeled training sample set in an offline dictionary, depends little on hand-written rules, improves recognition accuracy, and adapts to the continually updated demands of an in-vehicle system.

Description

Vehicle voice data parsing and recognition method
Technical field
The present invention relates to speech recognition, and in particular to a vehicle voice data parsing and recognition method.
Background technique
Technologies such as cloud computing, big data and data mining are pushing the information services industry to develop faster and better; among them, information services that incorporate natural language understanding can guide people to the information and services they need more accurately and efficiently. Speech, as the most natural mode of human-machine interaction, will gradually become one of the most important of the many interaction modes. In the automotive field in particular, natural language understanding technology makes it possible to build a highly practical intelligent information service system that offers a more humane style of interaction and more convenient, accurate voice commands and navigation, a promotion with broad prospects for the driving experience. Existing in-vehicle speech recognition, however, performs semantic inference by learning from a large labeled training sample set in a sizeable offline dictionary. It depends heavily on hand-written rules, is inflexible, cannot adapt to the continual changes of the in-vehicle system, and its precision and accuracy are low.
Summary of the invention
To solve the above problems of the prior art, the invention proposes a vehicle voice data parsing and recognition method, comprising:
reading in the speech to be recognized, obtaining an observation sequence after front-end processing, computing the conditional probability of the observation sequence under the model of each entry, and determining the recognized entry according to the conditional probability.
Preferably, before computing the conditional probability of the observation sequence under the model of each entry, the method further includes:
estimating the speech primitives corresponding to the characteristic parameter sequence to be recognized, the primitives including words, syllables, initials and finals, so that the characteristic parameter sequence is converted into recognition units. Building a model comprises:
(1) randomly choosing initial parameter values and initializing an HMM λ;
(2) segmenting the observation sequence by state, the result of the segmentation being the set of observation frames corresponding to each state;
(3) dividing the observation vectors belonging to each state into M clusters with a segmental clustering algorithm, M being the number of Gaussian mixtures and each cluster supplying the parameters of one single-Gaussian component of the mixture density, then estimating the following parameters:
c_jk = (number of vectors in cluster k of state j) / (total number of vectors belonging to state j);
μ_jk = sample mean of the vectors in cluster k of state j;
U_jk = sample covariance matrix of the vectors in cluster k of state j;
an updated HMM λ' is obtained from these parameters;
(4) comparing λ' with the initial model λ: if the difference between the models exceeds a preset threshold, replacing λ with λ' and repeating steps (2) and (3); if the difference is below the threshold, determining that the model has converged and saving the model.
This iteration continually corrects the initial parameter values throughout model training.
Compared with the prior art, the present invention has the following advantages:
the proposed vehicle voice data parsing and recognition method requires no labeled training sample set in an offline dictionary, depends little on hand-written rules, improves recognition accuracy, and adapts to the continually updated demands of the in-vehicle system.
Detailed description of the invention
Fig. 1 is a flowchart of the vehicle voice data parsing and recognition method according to an embodiment of the present invention.
Specific embodiment
A detailed description of one or more embodiments of the invention is given below together with the drawings illustrating its principles. The invention is described in connection with such embodiments, but is not limited to any embodiment; its scope is limited only by the claims, and it covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. They are provided for exemplary purposes, and the invention can be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a vehicle voice data parsing and recognition method. Fig. 1 is a flowchart of the method according to an embodiment of the invention.
The in-vehicle system of the invention is composed of a recognition module and a semantic-inference classification module. A word segmenter is built by effective machine learning on a training corpus, part-of-speech tagging is performed with a CRF, and semantic inference is then carried out. Proper nouns are classified to make their storage and organization, and the voice commands that use them, more convenient.
The recognition procedure of the recognition module can be described as follows: read in the speech to be recognized; after front-end processing, match the resulting observation sequence X against every entry, that is, compute the conditional probability of X under each entry's model; the entry whose model yields the maximum probability is the recognition result. Before this recognition can be performed, model training must first be completed.
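The matching step above, scoring the observation sequence X under every entry's model and taking the entry with maximum probability, can be sketched with the standard forward algorithm. This is an illustrative reconstruction rather than the patent's own code; the model layout (initial probabilities `pi`, transition matrix `A`, emission function `emit`) and the function names are assumptions.

```python
def forward_prob(obs, pi, A, emit):
    """P(obs | model) via the forward algorithm.
    pi: initial state probabilities, A: transition matrix,
    emit(j, o): probability/density of observation o in state j."""
    alpha = [pi[j] * emit(j, obs[0]) for j in range(len(pi))]
    for o in obs[1:]:
        # propagate through the transitions, then apply the emission term
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(alpha))) * emit(j, o)
                 for j in range(len(alpha))]
    return sum(alpha)

def recognize(obs, entry_models):
    """Return the entry whose model gives the highest P(obs | lambda)."""
    return max(entry_models, key=lambda e: forward_prob(obs, *entry_models[e]))
```

For continuous models, `emit(j, o)` would evaluate the Gaussian-mixture density b_j(o) described later in the text.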
The speech primitives corresponding to the characteristic parameter sequence to be recognized are estimated from a probabilistic standpoint; the primitives include words, syllables, initials and finals, so that the characteristic parameter sequence is converted into recognition units. Building a model comprises:
1. Randomly choose initial parameter values and initialize an HMM λ.
2. Segment the observation sequence by state, so that each state corresponds to a set of observation frames.
3. Divide the observation vectors belonging to each state into M clusters with a segmental K-means algorithm, where M is the number of Gaussian mixtures and each cluster supplies the parameters of one single-Gaussian component of the mixture density; then estimate the following parameters:
c_jk = (number of vectors in cluster k of state j) / (total number of vectors belonging to state j)
μ_jk = sample mean of the vectors in cluster k of state j
U_jk = sample covariance matrix of the vectors in cluster k of state j
An updated HMM λ' is obtained from these parameters.
4. Compare λ' with the initial model λ. If the difference between the models exceeds a preset threshold, replace λ with λ' and repeat steps 2 and 3; if the difference is below the threshold, the model has converged and is saved.
This iteration continually corrects the initial parameter values throughout model training.
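Step 3 above, splitting one state's observation frames into M clusters and estimating a weight, mean and variance for each, might look as follows for scalar frames. This is a toy 1-D stand-in for the segmental K-means step under stated simplifications (scalar observations, a deterministic initialization); `estimate_state_mixture` is a hypothetical name.

```python
def estimate_state_mixture(frames, M, iters=10):
    """Cluster one state's frames into M groups with a simple 1-D K-means,
    then estimate per-cluster weight c_k, mean mu_k and variance."""
    s = sorted(frames)
    # deterministic initialization: spread centers across the sorted frames
    centers = [s[(len(s) - 1) * k // max(M - 1, 1)] for k in range(M)]
    for _ in range(iters):
        clusters = [[] for _ in range(M)]
        for f in frames:
            nearest = min(range(M), key=lambda k: abs(f - centers[k]))
            clusters[nearest].append(f)
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    weights = [len(c) / len(frames) for c in clusters]       # c_jk
    means = centers                                          # mu_jk
    variances = [sum((f - means[k]) ** 2 for f in c) / len(c) if c else 0.0
                 for k, c in enumerate(clusters)]            # 1-D stand-in for U_jk
    return weights, means, variances
```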
In the training stage of the model, MFCC characteristic parameters are used directly as observations: one MFCC vector is one observation. The parameters of the observation probability density function b_j(o) are then computed:

b_j(o) = Σ_{m=1}^{M} c_jm N(o; μ_jm, U_jm),  1 ≤ j ≤ N

where o is the observation, c_jm is the m-th mixture coefficient of state j, M is the number of mixtures, and N(o; μ_jm, U_jm) is the Gaussian density defined by the mean vector μ_jm and covariance matrix U_jm.
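Once the mixture parameters of a state are estimated, the density b_j(o) above is just a weighted sum of Gaussian densities. A minimal sketch for scalar observations (the patent uses MFCC vectors with covariance matrices; scalars with variances keep the example short):

```python
import math

def gaussian(o, mu, var):
    """Single 1-D Gaussian density N(o; mu, var)."""
    return math.exp(-(o - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def b_j(o, c, mu, var):
    """Mixture density b_j(o) = sum_m c_m * N(o; mu_m, var_m)."""
    return sum(cm * gaussian(o, m, v) for cm, m, v in zip(c, mu, var))
```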
The model training process is as follows:
(1) All training characteristic parameters are segmented into the individual states.
(2) The characteristic parameters owned by each state are assigned to the single-Gaussian components of the mixture model, and the model is re-estimated accordingly, where γ_t(j, k) denotes the probability of occupying mixture component k of state j at time t, L is the sample size, and C_t is the scale factor at time t.
(3) Convergence is tested: training terminates if the model has converged; otherwise iteration continues.
For the classification of proper nouns, the training sample set and the test set are first obtained from the database. The classifier is trained on the training sample set after preprocessing and text representation; in the assessment stage the classifier is tested on the test set. After preprocessing, each proper noun is segmented and converted into a vector of morphemes. The term frequency and inverse document frequency of each morpheme are counted over the training samples, and from them the normalized term frequency and inverse-document-frequency ratio of each morpheme with respect to the predefined classes is computed as the weight w_i(d) of word d for the corresponding class i,
where N is the total number of proper nouns and n_i is the number of proper nouns containing term i. During testing, the sums of the weights of the proper noun to be processed toward each class are computed, and the final classification result is output.
Here ZY is the proper noun to be classified, M is the number of classes, and zy_j is the j-th morpheme of ZY.
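The weighting just described, normalized term frequency times an inverse-document-frequency factor, can be sketched as follows. Since the exact formula survives only as an image in the source, the standard TF-IDF form with log(N/n_i) used here is an assumption, and `morpheme_weights` is a hypothetical helper:

```python
import math

def morpheme_weights(doc_morphemes, all_docs):
    """Weight each morpheme of one proper noun: normalized term frequency
    times inverse document frequency log(N / n_i), where N is the number
    of proper nouns and n_i the number containing the morpheme."""
    N = len(all_docs)
    weights = {}
    for m in set(doc_morphemes):
        tf = doc_morphemes.count(m) / len(doc_morphemes)   # normalized frequency
        n_i = sum(1 for d in all_docs if m in d)           # document frequency
        weights[m] = tf * math.log(N / n_i)
    return weights
```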
Every voice command in the database carries four items: the intent, the original text of the voice command, the segmentation information of the voice command, and the part-of-speech tagging information of the voice command. From these, the segmentation training file, the segmentation test file, the POS-tagging training file and the POS-tagging test file are generated. The invention first performs segmentation and POS tagging of the voice commands. For repeatable errors, the mistakes that are found are added to the program's custom dictionary so that they can be corrected in batch.
Before segmentation, the segmentation problem is converted into a sequence labeling problem: each character is labeled as the beginning of a word, the middle of a word, the end of a word, or a word formed by a single character. The features to be learned are then defined with preset templates. Each line of the template file represents one template; in the macro %x[row, col] of each template, row is the row offset relative to the current position and col is the column index. Feature words are generated from the training file according to the templates defined in the template file.
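The %x[row, col] macro can be expanded the way CRF++-style templates are: row is an offset relative to the current token and col selects a column of the training file. A small sketch; the `_B` padding token returned for out-of-range rows is an assumption borrowed from common CRF++ usage:

```python
import re

def expand_template(template, rows, t):
    """Expand one CRF++-style template, e.g. 'U01:%x[-1,0]/%x[0,0]',
    at position t of a token table (one row per token, columns = fields)."""
    def repl(m):
        r, c = int(m.group(1)), int(m.group(2))
        i = t + r  # row offset is relative to the current position
        return rows[i][c] if 0 <= i < len(rows) else "_B"
    return re.sub(r"%x\[(-?\d+),(\d+)\]", repl, template)
```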
The segmentation training file is used for conditional random field learning, yielding a segmentation system; the POS-tagging training file is used for conditional random field learning, yielding a POS-tagging system. Testing with the segmentation test file gives the precision of segmentation; testing with the POS-tagging test file gives the precision of the tagging system.
After segmentation, the segmentation results are transformed appropriately so that the POS-tagging system can subsequently tag them. For each voice command, the segmentation system outputs B1 segmentation results, and the POS-tagging system obtained by training produces B2 tagging results for each input, so each voice command finally yields B1*B2 recognition results. The B best recognition results are selected from these, and the standard segmentation and POS-tagging information is written back to the database. A recognition result is judged correct when both its segmentation and its POS-tagging result are fully consistent with the standard.
In generating the B1 segmentation results and B2 tagging results, the probability of each segmentation result is extracted and stored in the array p1, and the probability of each tagging result is stored in the array p2. For the B1*B2 recognition results of each voice command, the generation probability is computed from these two arrays as:

p[i] = p2[i] * p1[i / B2],  i = 0, 1, 2, ..., B1*B2 - 1

where the division is integer division. The generation probabilities p[i] of the B1*B2 recognition results are sorted, and the B results with the highest probability are output.
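The combination p[i] = p2[i] * p1[i / B2] and the top-B selection can be sketched directly; the integer division recovers which of the B1 segmentations recognition result i came from. `rank_candidates` is a hypothetical name:

```python
def rank_candidates(p1, p2, B):
    """p1: B1 segmentation probabilities; p2: B1*B2 tagging probabilities
    (B2 taggings per segmentation). Computes p[i] = p2[i] * p1[i // B2]
    and returns the indices of the top-B candidates."""
    B2 = len(p2) // len(p1)
    p = [p2[i] * p1[i // B2] for i in range(len(p2))]
    # stable descending sort, keep the B highest-probability indices
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:B]
```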
After training is complete, the output module outputs the segmentation and POS tagging of a given voice command. For an input voice command, the dictionary is searched for proper nouns; if one is found, it is replaced by the corresponding special symbol and the command is written to file 1, while the command with the found proper noun marked in brackets is written to file 2. File 1 is converted into the input format of the conditional random field segmentation test module and written to file 3. File 3 is segmented and the segmentation result is saved in file 4. The segmentation result is converted into the input format of the POS-tagging system and written to file 5. For each voice command, the B1 segmentation results are generated and their probabilities are saved in p1. The voice commands in file 5 are POS-tagged and the results saved in file 6. At this point every voice command has B1*B2 recognition results in total, of which the B best must be output; the B final results are written to file 7, and file 7 is converted into the final specified output format.
Semantic understanding in the present invention uses a method based on statistical learning. The semantic classes in the in-vehicle system include functions such as navigation routes, traffic condition answers, making phone calls, air-conditioning adjustment, weather voice commands and radio. Some semantics also require parameters: making a phone call, for instance, requires knowing the specific telephone number to dial. The semantic inference problem of the invention can thus be converted into assigning the intent of the input text to predefined intent classes: the type of intent of the voice command is inferred first, and if the intent requires further parameters, the corresponding parameters are then located in the voice command.
In the POS-tagging problem, the following conditional probability is modeled:

p(s_1 ... s_m | x_1 ... x_m)

where x_1 ... x_m are the individual words of the input voice command and s_1 ... s_m ∈ S ranges over all possible part-of-speech combinations. A voice command composed of x_1 ... x_m admits k^m tagging combinations, with k = |S|, so a probability distribution over these k^m tagging results is established.
This yields the log-linear form

p(s_1 ... s_m | x_1 ... x_m) = exp(w · Φ(x_1 ... x_m, s_1 ... s_m)) / Σ_{s'} exp(w · Φ(x_1 ... x_m, s'_1 ... s'_m))

where Φ is a finite feature vector over the predefined word set X and tag set Y. Through extensive training on pairs s_1 ... s_m and x_1 ... x_m, the parameter vector w is obtained, and with it p(s_1 ... s_m | x_1 ... x_m). After training, the state sequence s_1 ... s_m for an input x_1 ... x_m is found by solving:

arg max_{s ∈ S^m} w · Φ(x_1 ... x_m, s_1 ... s_m)
The features used may be the words obtained by segmenting the voice command, the corresponding parts of speech, or a combination of the two. After a series of features has been chosen, features are added and their weights adjusted according to the training samples.
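For tiny m, the distribution over the k^m tagging combinations and the arg-max decoding can be computed by brute force, which makes the construction above concrete (real taggers use Viterbi decoding instead; the log-linear form exp(w · Φ)/Z assumed here is the standard CRF one, since the source's own formula is an image):

```python
import itertools
import math

def crf_distribution(x, tags, score):
    """Brute-force p(s | x) = exp(score(x, s)) / Z over all |tags|^m
    tag sequences, with score(x, s) = w . Phi(x, s) supplied by the caller."""
    seqs = list(itertools.product(tags, repeat=len(x)))
    expw = [math.exp(score(x, s)) for s in seqs]
    Z = sum(expw)  # normalization over all k^m tagging combinations
    return {s: e / Z for s, e in zip(seqs, expw)}

def decode(x, tags, score):
    """argmax_s p(s | x): the recognition step performed after training."""
    return max(itertools.product(tags, repeat=len(x)), key=lambda s: score(x, s))
```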
In the text classification problem for semantic recognition, given training samples (x_i, y_i), i = 1, ..., n, over the word set X and tag set Y, where n is the total number of samples, an optimization problem for the weight vector w is set up.
At test time, a point x is assigned to the positive class when w^T x > 0.
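The test-time rule w^T x > 0 can be exercised with any linear learner. Because the training optimization itself survives only as an image in the source, the sketch below substitutes a plain perceptron for that step and keeps only the decision rule stated in the text:

```python
def train_linear(samples, labels, epochs=20, lr=0.1):
    """Toy linear trainer for the intent-classification step: learns w so
    that sign(w . x) matches the label. A perceptron stand-in for the
    optimization problem referenced above; labels are in {-1, +1}."""
    dim = len(samples[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # update only on misclassified (or boundary) points
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def classify(w, x):
    """Decision rule from the text: positive class iff w^T x > 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
```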
In conclusion the invention proposes a kind of vehicle voice datas to parse recognition methods, do not need in offline dictionary Training sample set is marked, it is small to the dependence of rule, accuracy of identification is improved, the demand that onboard system is constantly updated is adapted to.
Obviously, those skilled in the art should appreciate that the modules or steps of the invention described above can be realized with a general-purpose computing system; they can be concentrated in a single computing system or distributed over a network formed by multiple computing systems, and can optionally be realized with program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. The invention is therefore not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the invention are only used to exemplify or explain the principles of the invention and do not limit it. Any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention shall be included in its protection scope. Furthermore, the appended claims are intended to cover all variations and modifications falling within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (1)

1. A vehicle voice data parsing and recognition method, characterized by comprising:
reading in the speech to be recognized, obtaining an observation sequence after front-end processing, computing the conditional probability of the observation sequence under the model of each entry, and determining the recognized entry according to the conditional probability;
before said computing of the conditional probability of the observation sequence under the model of each entry, further comprising:
estimating the speech primitives corresponding to the characteristic parameter sequence to be recognized, the primitives including words, syllables, initials and finals, so that the characteristic parameter sequence is converted into recognition units; wherein building a model comprises:
(1) randomly choosing initial parameter values and initializing an HMM λ;
(2) segmenting the observation sequence by state, so that each state corresponds to a set of observation frames;
(3) dividing the observation vectors belonging to each state into M clusters with a segmental clustering algorithm, M being the number of Gaussian mixtures and each cluster supplying the parameters of one single-Gaussian component of the mixture density, then estimating:
c_jk = (number of vectors in cluster k of state j) / (total number of vectors belonging to state j);
μ_jk = sample mean of the vectors in cluster k of state j;
U_jk = sample covariance matrix of the vectors in cluster k of state j;
and obtaining an updated HMM λ' from these parameters;
(4) comparing λ' with the initial model λ: if the difference between the models exceeds a preset threshold, replacing λ with λ' and repeating steps (2) and (3); if the difference is below the threshold, determining that the model has converged and saving the model;
whereby this iteration continually corrects the initial parameter values throughout model training.
CN201610534783.0A 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method Expired - Fee Related CN106057196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610534783.0A CN106057196B (en) 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610534783.0A CN106057196B (en) 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method

Publications (2)

Publication Number Publication Date
CN106057196A CN106057196A (en) 2016-10-26
CN106057196B true CN106057196B (en) 2019-06-11

Family

ID=57184974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610534783.0A Expired - Fee Related CN106057196B (en) 2016-07-08 2016-07-08 Vehicle voice data parsing and recognition method

Country Status (1)

Country Link
CN (1) CN106057196B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971721A (en) * 2017-03-29 2017-07-21 沃航(武汉)科技有限公司 A kind of accent speech recognition system based on embedded mobile device
CN108986811B (en) * 2018-08-31 2021-05-28 北京新能源汽车股份有限公司 Voice recognition detection method, device and equipment
CN111353292B (en) * 2020-02-26 2023-06-16 支付宝(杭州)信息技术有限公司 Analysis method and device for user operation instruction

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1490786A (en) * 2002-10-17 2004-04-21 中国科学院声学研究所 Phonetic recognition confidence evaluating method, system and dictation device therewith
CN101930735A (en) * 2009-06-23 2010-12-29 富士通株式会社 Speech emotion recognition equipment and speech emotion recognition method
CN101980336B (en) * 2010-10-18 2012-01-11 福州星网视易信息系统有限公司 Hidden Markov model-based vehicle sound identification method
CN103065626A (en) * 2012-12-20 2013-04-24 中国科学院声学研究所 Automatic grading method and automatic grading equipment for read questions in test of spoken English
CN103810998A (en) * 2013-12-05 2014-05-21 中国农业大学 Method for off-line speech recognition based on mobile terminal device and achieving method
CN105390133A (en) * 2015-10-09 2016-03-09 西北师范大学 Tibetan TTVS system realization method


Also Published As

Publication number Publication date
CN106057196A (en) 2016-10-26

Similar Documents

Publication Publication Date Title
CN106407333B (en) Spoken language query identification method and device based on artificial intelligence
CN110210029A (en) Speech text error correction method, system, equipment and medium based on vertical field
CN108763510A (en) Intension recognizing method, device, equipment and storage medium
CN108108351A (en) A kind of text sentiment classification method based on deep learning built-up pattern
CN105205124B (en) A kind of semi-supervised text sentiment classification method based on random character subspace
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN111046670B (en) Entity and relationship combined extraction method based on drug case legal documents
CN104199965A (en) Semantic information retrieval method
CN106294344A (en) Video retrieval method and device
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
CN103678271B (en) A kind of text correction method and subscriber equipment
CN112016313B (en) Spoken language element recognition method and device and warning analysis system
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN106057196B (en) Vehicle voice data parsing and recognition method
CN106528776A (en) Text classification method and device
CN110046264A (en) A kind of automatic classification method towards mobile phone document
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN106202045B (en) Special audio recognition method based on car networking
CN114154570A (en) Sample screening method and system and neural network model training method
CN108681532B (en) Sentiment analysis method for Chinese microblog
CN110097096A (en) A kind of file classification method based on TF-IDF matrix and capsule network
CN112417132A (en) New intention recognition method for screening negative samples by utilizing predicate guest information
CN106203520B (en) SAR image classification method based on depth Method Using Relevance Vector Machine
CN112489689B (en) Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190611

Termination date: 20210708