CN106128454A - Voice signal matching process based on car networking - Google Patents


Info

Publication number
CN106128454A
CN106128454A (application CN201610534864.0A)
Authority
CN
China
Prior art keywords
tagging
voice command
pos
participle
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610534864.0A
Other languages
Chinese (zh)
Inventor
谢欣霖
陈波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhida Science And Technology Co Ltd
Original Assignee
Chengdu Zhida Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhida Science And Technology Co Ltd filed Critical Chengdu Zhida Science And Technology Co Ltd
Priority to CN201610534864.0A priority Critical patent/CN106128454A/en
Publication of CN106128454A publication Critical patent/CN106128454A/en
Pending legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/14 — Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 — Hidden Markov Models [HMMs]
    • G10L15/144 — Training of HMMs

Abstract

The invention provides a voice signal matching method based on the Internet of Vehicles (car networking). The method learns a corpus to build a word segmenter, then performs part-of-speech (POS) tagging and semantic inference on the segmented words. During inference, the type of the voice command's intent is determined first; if the intent requires further parameters, the corresponding parameters are parsed out of the voice command. The method needs no labelled training sample set in an offline dictionary, depends little on hand-written rules, improves recognition accuracy, and adapts to the continual updating of in-vehicle systems.

Description

Voice signal matching process based on car networking
Technical field
The present invention relates to speech recognition, and in particular to a voice signal matching method based on the Internet of Vehicles (car networking).
Background technology
Technologies such as cloud computing, big data, and data mining are driving faster and better development of the information-services industry. Information services that incorporate natural-language understanding help people obtain the information and services they need more accurately and efficiently. Speech, as a near-ideal mode of human-machine interaction, is becoming increasingly important among the many interaction modes. In the automotive field in particular, natural-language understanding can be used to build highly practical intelligent information services that offer a more human and more convenient interaction mode with accurate voice commands and navigation, a broad prospect for improving the driving experience. Existing in-vehicle speech recognition, however, relies on learning from a large labelled training sample set in a large offline dictionary to perform semantic inference; it depends heavily and inflexibly on hand-written rules, cannot adapt to the continual changes of in-vehicle systems, and its precision and accuracy are low.
Summary of the invention
To solve the problems of the prior art described above, the present invention proposes a voice signal matching method based on the Internet of Vehicles, comprising:
learning a corpus to build a word segmenter, then performing POS tagging and semantic inference on the segmented words; during inference, first inferring the type of the voice command's intent and, if the intent requires further parameters, parsing the corresponding parameters from the voice command.
Preferably, performing POS tagging on the segmented words further comprises:
The POS-tagging problem models the following conditional probability:
p(s_1…s_m | x_1…x_m), where x_1…x_m are the individual words of the input voice command and s_1…s_m ∈ S ranges over all possible POS combinations. A voice command composed of x_1…x_m admits k^m possible POS-tagging combinations, with k = |S|. The probability distribution over these k^m tagging results is established as:
p(s_1…s_m | x_1…x_m) = ∏_{i=1}^{m} p(s_i | s_{i-1}, x_1…x_m)
The factors are then expressed through a finite feature vector over the predefined word set X and tag set Y; extensive training on pairs s_1…s_m and x_1…x_m yields the parameter vector w, which finally gives p(s_1…s_m | x_1…x_m);
After training, the states s_1…s_m for an input x_1…x_m are obtained by solving:
argmax_{s_1…s_m} p(s_1…s_m | x_1…x_m)
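The patent does not name an algorithm for this argmax, but under the first-order factorization above it can be solved exactly by dynamic programming (Viterbi). A minimal sketch follows; the function and argument names are illustrative, and `log_p` stands in for whatever trained model supplies log p(s_i | s_{i-1}, x_1…x_m).

```python
def viterbi_decode(words, tags, log_p):
    """Find the tag sequence s_1..s_m maximizing
    prod_i p(s_i | s_{i-1}, x_1..x_m) by dynamic programming.
    `log_p(prev_tag, tag, i, words)` returns log p(s_i=tag | s_{i-1}=prev_tag, x);
    prev_tag is None at position 0."""
    m = len(words)
    # best[i][t] = best log-probability of a tag sequence for words[:i+1] ending in t
    best = [{t: log_p(None, t, 0, words) for t in tags}]
    back = [{}]
    for i in range(1, m):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + log_p(s, t, i, words))
            best[i][t] = best[i - 1][prev] + log_p(prev, t, i, words)
            back[i][t] = prev
    # backtrack from the best final tag
    last = max(tags, key=lambda t: best[m - 1][t])
    seq = [last]
    for i in range(m - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))
```

Because the factorization conditions each tag only on its predecessor, the search over k^m combinations reduces to O(m·k²) work.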
The features used are the words after segmentation of the voice command, their corresponding POS tags, or combinations of the two; once a set of features has been chosen, features are added and weights adjusted according to the training samples;
For the text-classification problem of semantic recognition, given training samples (x_i, y_i), i = 1, …, n, from word set X and tag set Y, with n the total number of samples, the following optimization problem is set up:
min_w wᵀw/2 + C Σ_{i=1}^{n} [max(1 − y_i wᵀx_i, 0) + log(1 + e^{−y_i wᵀx_i})]
At test time, the classification result for an input x is given by whether wᵀx > 0.
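The objective above combines a hinge term with a logistic term under L2 regularization. One straightforward way to minimize it is plain (sub)gradient descent; the sketch below assumes this reading of the formula, and the learning rate and epoch count are illustrative, not from the patent.

```python
import numpy as np

def train_linear(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize w'w/2 + C * sum_i [max(1 - y_i w'x_i, 0) + log(1 + exp(-y_i w'x_i))]
    by (sub)gradient descent; labels y_i are in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # subgradient of the hinge term: -y_i x_i wherever the margin is below 1
        hinge_g = -(X * y[:, None])[margins < 1].sum(axis=0)
        # gradient of the logistic term: -y_i x_i * sigmoid(-margin)
        logit_g = -(X * (y * (1 / (1 + np.exp(margins))))[:, None]).sum(axis=0)
        w -= lr * (w + C * (hinge_g + logit_g))
    return w

def predict(w, x):
    """Classification rule from the text: positive class when w'x > 0."""
    return 1 if w @ x > 0 else -1
```

On linearly separable data the learned w quickly separates the two classes; the regularization term w'w/2 keeps the weights bounded.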
Compared with the prior art, the present invention has the following advantages:
it proposes a voice signal matching method based on the Internet of Vehicles that needs no labelled training sample set in an offline dictionary, depends little on rules, improves recognition accuracy, and adapts to the continual updating of in-vehicle systems.
Brief description of the drawings
Fig. 1 is a flow chart of the voice signal matching method based on the Internet of Vehicles according to an embodiment of the present invention.
Detailed description of the invention
A detailed description of one or more embodiments of the invention is provided below together with the accompanying drawing that illustrates the principles of the invention. The invention is described in connection with such embodiments but is not limited to any embodiment; its scope is limited only by the claims, and it covers many alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. They are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a voice signal matching method based on the Internet of Vehicles. Fig. 1 is a flow chart of the method according to an embodiment of the present invention.
The in-vehicle system of the present invention consists of a recognition module and a semantic-inference classification module. A machine-learning method learns the corpus effectively to build the word segmenter; a conditional random field (CRF) is used for POS tagging; semantic inference is then performed. Proper nouns are classified to simplify their storage and organization together with the voice commands.
The recognition process of the recognition module can be described as follows: the speech to be recognized is read in and, after front-end processing, the resulting observation sequence X is matched against all entries, i.e. the conditional probability is computed; the entry whose model yields the maximum probability is the recognition result. Before this recognition can be performed, training of the learning model must be completed.
The speech primitives corresponding to the feature-parameter sequence to be recognized are estimated from a probabilistic standpoint; these primitives include words, syllables, initials, and finals, so that the feature-parameter sequence is converted into recognition units. Building the model comprises the following steps:
1. Randomly choose initial parameter values and initialize the HMM model λ.
2. Cut the observation sequence into segments, one per state; the result of the cutting is the set of observation frames corresponding to each state.
3. Use the segmental K-means algorithm to divide the observation-vector set belonging to each state into M clusters, where M is the Gaussian mixture order; each cluster corresponds to one single-Gaussian component of the Gaussian-mixture probability density. Then estimate the following parameters:
c_jk = (number of vectors in the k-th cluster of state j) / (number of vectors belonging to state j)
μ_jk = sample mean of the vectors in the k-th cluster of state j
U_jk = sample covariance matrix of the vectors in the k-th cluster of state j
An updated HMM model λ′ is obtained from these parameters.
4. Compare the model λ′ with the initial model λ. If the model difference exceeds a preset threshold, replace λ with λ′ and repeat steps 2 and 3; if the difference is below the threshold, the model is judged to have converged and is saved.
Through this iteration, the initial parameter values are continually corrected over the course of the whole model training.
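Step 3's per-state clustering and single-Gaussian fitting can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function name, the fixed number of K-means iterations, and the small diagonal term added for numerical stability are all assumptions.

```python
import numpy as np

def init_state_gmms(frames_per_state, M, seed=0):
    """For each HMM state, split its observation vectors into M clusters
    (K-means) and fit one Gaussian per cluster, giving initial mixture
    weights c_jk, means mu_jk, and covariances U_jk (step 3 of the text)."""
    rng = np.random.default_rng(seed)
    gmms = []
    for obs in frames_per_state:            # obs: (T_j, d) frames cut to state j
        d = obs.shape[1]
        centers = obs[rng.choice(len(obs), M, replace=False)]
        for _ in range(10):                 # a few K-means iterations
            labels = np.argmin(((obs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            centers = np.array([obs[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(M)])
        c = np.array([(labels == k).mean() for k in range(M)])   # c_jk
        mu = centers                                             # mu_jk
        U = np.array([np.cov(obs[labels == k].T) + 1e-6 * np.eye(d)
                      if (labels == k).sum() > 1 else np.eye(d)
                      for k in range(M)])                        # U_jk
        gmms.append((c, mu, U))
    return gmms
```

The mixture weights are the cluster occupancy fractions, so they sum to one per state, matching the c_jk definition above.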
In the training stage of the model, MFCC feature parameters are used directly as observations: one MFCC vector is one observation. Next, the observation probability density function b_j(o) is computed:
b_j(o) = Σ_m c_jm N(o, μ_jm, U_jm), 1 ≤ j ≤ N
where o is the observation, c_jm is the m-th mixture coefficient of state j, and N(·, μ_jm, U_jm) is the (elliptically symmetric) Gaussian density function determined by the mean vector μ_jm and covariance matrix U_jm.
The model training process is as follows:
(1) Slice all training feature parameters into the individual states.
(2) Assign the feature parameters owned by each state to the single-Gaussian components of its mixture model, and revise the model as follows:
c′_jk = Σ_{t=1}^{L} γ_t(j,k) / Σ_{t=1}^{L} Σ_{m=1}^{M} γ_t(j,m)
μ′_jk = Σ_{t=1}^{L} γ_t(j,k) o_t / Σ_{t=1}^{L} γ_t(j,k)
U′_jk = Σ_{t=1}^{L} γ_t(j,k) (o_t − μ_jk)(o_t − μ_jk)ᵀ / Σ_{t=1}^{L} γ_t(j,k)
where γ_t(j,k) is the occupation probability of the k-th mixture component of state j at time t, L is the sample size, and C_t is the scale factor at time t.
(3) Check for convergence: if converged, terminate; otherwise continue iterating.
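The three re-estimation updates can be sketched directly. This assumes the occupancies γ_t(j,k) are already available from the forward-backward pass (the patent does not show how they are computed), and the covariance update is read as the usual weighted outer product:

```python
import numpy as np

def reestimate_mixture(gamma, O):
    """Re-estimate one state's mixture parameters from occupancies
    gamma[t, k] (probability that frame t came from mixture k of this
    state) and frames O[t, :], following the c'_jk / mu'_jk / U'_jk
    updates in the text."""
    T, M = gamma.shape
    d = O.shape[1]
    occ = gamma.sum(axis=0)                 # sum_t gamma_t(j, k)
    c = occ / gamma.sum()                   # c'_jk
    mu = (gamma.T @ O) / occ[:, None]       # mu'_jk
    U = np.empty((M, d, d))
    for k in range(M):
        diff = O - mu[k]
        U[k] = (gamma[:, k, None] * diff).T @ diff / occ[k]   # U'_jk
    return c, mu, U
```

With a single component and unit occupancies this reduces to the ordinary sample mean and (biased) sample covariance, which is a quick sanity check on the formulas.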
For proper-noun classification, the training sample set and test set are first obtained from the database. The training set is preprocessed and converted into a text representation, and a classifier is trained on it; in the evaluation stage the classifier is tested on the test set. After preprocessing, the training samples are segmented and each proper noun is converted into a vector of morphemes. The training samples are used to count each morpheme's term frequency and inverse frequency, from which each morpheme's regularized term-frequency and inverse-frequency ratio with respect to the predefined classes is computed, serving as the weight w_i(d) of the word d for class i.
w_i(d) = log(N/n_i + 0.1) / √(Σ_{i=1}^{N} [log(N/n_i + 0.1)]²)
where N is the total number of proper nouns and n_i is the number of proper nouns containing entry i. At test time, the sum of the weights for each class is computed for the proper noun to be processed, and the final classification result is given by:
n(ZY) = max_{i ∈ [1, M]} Σ_{j=1}^{N} w_i(ZY_j)
where ZY is the proper noun to be classified, M is the number of classes, and ZY_j is the j-th morpheme of ZY.
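A minimal sketch of this weight-and-vote classifier follows. The exact normalization in the patent's w_i(d) formula is ambiguous in the source, so an unnormalized IDF-style score log(N/n_i + 0.1) is used here; since any per-document normalizer is constant across classes, it does not change the argmax. Function names and the sample data are illustrative.

```python
import math
from collections import defaultdict

def train_weights(samples):
    """Build per-class morpheme weights from (morphemes, class_id) samples,
    scoring each morpheme by log(N/n_i + 0.1), one plausible reading of the
    patent's w_i(d)."""
    N = len(samples)
    df = defaultdict(int)                       # n_i: samples containing morpheme i
    classes = set()
    per_class = defaultdict(lambda: defaultdict(float))
    for morphemes, cls in samples:
        classes.add(cls)
        for m in set(morphemes):
            df[m] += 1
    for morphemes, cls in samples:
        for m in set(morphemes):
            per_class[cls][m] += math.log(N / df[m] + 0.1)
    return per_class, classes

def classify(per_class, classes, morphemes):
    """n(ZY): pick the class maximizing the summed weights of ZY's morphemes."""
    return max(classes, key=lambda c: sum(per_class[c].get(m, 0.0) for m in morphemes))
```

A proper noun is thus assigned to whichever class accumulated the largest total weight over its morphemes during training.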
The database stores four items for each voice command: the intent, the original text of the command, the command's segmentation information, and the command's POS-tagging information. From these, a segmentation training file, a segmentation test file, a POS-tagging training file, and a POS-tagging test file are generated. The present invention first performs segmentation and POS tagging on the voice command. Repeated errors, once found, can be corrected in batch by adding them to the program's custom dictionary.
Before segmentation, the segmentation problem is converted into a sequence-labelling problem: each character is tagged as the beginning of a word, the middle of a word, the end of a word, or a single-character word. The features to be learned are defined by preset templates; each line of the template file is one template, in whose macro [row, col] the row denotes a relative row number and col an absolute column number. The training file generates feature words according to the templates defined in the template file.
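The begin/middle/end/single conversion described above (commonly written B/M/E/S) can be sketched as a small helper; the function name and tag letters are illustrative conventions, not taken from the patent.

```python
def words_to_bmes(words):
    """Convert a segmented sentence (list of words) into per-character
    sequence labels: B = beginning of a word, M = middle, E = end,
    S = single-character word."""
    chars, tags = [], []
    for w in words:
        if len(w) == 1:
            chars.append(w)
            tags.append("S")
        else:
            chars.extend(w)
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return chars, tags
```

Each training sentence thus becomes a character sequence paired with a label sequence, which is exactly the input format a sequence labeller such as a CRF expects.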
The segmentation training file is used for conditional-random-field learning, yielding a segmentation system; the POS-tagging training file is likewise used for conditional-random-field learning, yielding a POS-tagging system. The segmentation test file is used to measure segmentation precision, and the POS-tagging test file to measure the precision of the POS-tagging system.
After segmentation finishes, its results must be suitably converted so that the POS-tagging system can tag them. For each voice command, the segmentation system outputs B1 segmentation results. The trained POS-tagging system then tags the file; each of the B1 segmentations yields B2 POS-tagging outputs, so every voice command finally produces B1×B2 recognition results, from which the best B are selected and written back to the database's standard segmentation and POS-tagging information. A recognition result is judged correct when both the segmentation and the POS tagging agree completely.
When the B1 segmentation results and B2 POS-tagging results are generated, the probability of each segmentation result is extracted and saved in array p1, and the probability of each POS-tagging result is extracted and saved in array p2. The generation probability of each of the B1×B2 recognition results of a voice command is then computed from p1 and p2 as:
p[i] = p2[i] × p1[⌊i/B2⌋], i = 0, 1, 2, …, B1×B2 − 1
The generation probabilities p[i] of these B1×B2 recognition results are sorted, and the B results with the highest probability are output.
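The combination and ranking step can be sketched directly from the formula, assuming p1 holds the B1 segmentation probabilities and p2 the B1×B2 tagging probabilities in segmentation-major order; the function name and return convention (indices of the top B) are illustrative.

```python
def top_b_results(p1, p2, B):
    """Combine segmentation probabilities p1 (length B1) with tagging
    probabilities p2 (length B1*B2, B2 per segmentation) via
    p[i] = p2[i] * p1[i // B2], and return indices of the B best results."""
    B2 = len(p2) // len(p1)
    p = [p2[i] * p1[i // B2] for i in range(len(p2))]
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:B]
```

Integer division i // B2 maps each of the B1×B2 joint results back to the segmentation it came from, so each joint score is the product of its segmentation and tagging probabilities.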
After training and learning are completed, the output module outputs the segmentation and POS-tagging results of a given voice command. For an input voice command, the dictionary is searched for proper nouns; each proper noun found is replaced by its corresponding special symbol and written to file 1, while the proper noun, marked with brackets, is written to file 2. File 1 is converted into the input format of the conditional-random-field segmentation test module and written to file 3. File 3 is segmented and the segmentation result is saved in file 4. The segmentation result is converted into the input format of the POS-tagging system and written to file 5. For the B1 segmentation results produced for each voice command, the segmentation probabilities are saved in p1. The voice commands in file 5 are POS-tagged and the result is saved in file 6. Altogether B1×B2 recognition results are obtained for each voice command, of which the B best must be output; the B final results are written to file 7, which is then converted into the finally specified output format.
In the present invention, semantic understanding uses a method based on statistical learning. The semantic classes in the in-vehicle system include functions such as navigation routes, road conditions, answering and placing calls, air-conditioning adjustment, weather voice commands, and radio. Some of the semantics also carry parameters: placing a call, for example, requires knowing the concrete telephone number to dial. The semantic-inference problem of the present invention can thus be converted into assigning the intent of the input text to a predefined intent class: first the type of the voice command's intent is inferred, and if the intent requires further parameters, the corresponding parameters are parsed from the voice command.
The POS-tagging problem models the following conditional probability:
p(s_1…s_m | x_1…x_m), where x_1…x_m are the individual words of the input voice command and s_1…s_m ∈ S ranges over all possible POS combinations. A voice command composed of x_1…x_m admits k^m possible POS-tagging combinations, with k = |S|. The probability distribution over these k^m tagging results is established as:
p(s_1…s_m | x_1…x_m) = ∏_{i=1}^{m} p(s_i | s_{i-1}, x_1…x_m)
The factors are then expressed through a finite feature vector over the predefined word set X and tag set Y; extensive training on pairs s_1…s_m and x_1…x_m yields the parameter vector w, which finally gives p(s_1…s_m | x_1…x_m).
After training, the states s_1…s_m for an input x_1…x_m are obtained by solving:
argmax_{s_1…s_m} p(s_1…s_m | x_1…x_m)
The features used are the words after segmentation of the voice command, their corresponding POS tags, or the combination of the two. After choosing a set of features, features are added and weights adjusted according to the training samples.
For the text-classification problem of semantic recognition, given training samples (x_i, y_i), i = 1, …, n, from word set X and tag set Y, with n the total number of samples, the following optimization problem is set up:
min_w wᵀw/2 + C Σ_{i=1}^{n} [max(1 − y_i wᵀx_i, 0) + log(1 + e^{−y_i wᵀx_i})]
At test time, the classification result for an input x is given by whether wᵀx > 0.
In summary, the present invention proposes a voice signal matching method based on the Internet of Vehicles that needs no labelled training sample set in an offline dictionary, depends little on rules, improves recognition accuracy, and adapts to the continual updating of in-vehicle systems.
Obviously, those skilled in the art should appreciate that the modules or steps of the present invention described above can be implemented with a general-purpose computing system: they may be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they may be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. The present invention is therefore not restricted to any particular combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are used only to exemplify or explain the principles of the invention and are not to be construed as limiting it. Any modification, equivalent substitution, improvement, and the like made without departing from the spirit and scope of the invention shall therefore be included within its protection scope. Furthermore, the appended claims are intended to cover all changes and modifications falling within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (2)

1. A voice signal matching method based on the Internet of Vehicles (car networking), characterized by comprising:
learning a corpus to build a word segmenter, then performing POS tagging and semantic inference on the segmented words; during inference, first inferring the type of the voice command's intent and, if the intent requires further parameters, parsing the corresponding parameters from the voice command.
2. The method according to claim 1, characterized in that performing POS tagging on the segmented words further comprises:
modelling the following conditional probability for the POS-tagging problem:
p(s_1…s_m | x_1…x_m), where x_1…x_m are the individual words of the input voice command and s_1…s_m ∈ S ranges over all possible POS combinations; a voice command composed of x_1…x_m admits k^m possible POS-tagging combinations, with k = |S|; the probability distribution over these k^m tagging results is established as:
p(s_1…s_m | x_1…x_m) = ∏_{i=1}^{m} p(s_i | s_{i-1}, x_1…x_m)
then expressing the factors through a finite feature vector over the predefined word set X and tag set Y, obtaining the parameter vector w through extensive training on s_1…s_m and x_1…x_m, and finally obtaining p(s_1…s_m | x_1…x_m);
after training, solving for the states s_1…s_m of an input x_1…x_m, i.e. solving:
argmax_{s_1…s_m} p(s_1…s_m | x_1…x_m)
the features used being the words after segmentation of the voice command, their corresponding POS tags, or combinations of the two; after a set of features is chosen, adding features and adjusting weights according to the training samples;
for the text-classification problem of semantic recognition, given training samples (x_i, y_i), i = 1, …, n, from word set X and tag set Y, with n the total number of samples, setting up the following optimization problem:
min_w wᵀw/2 + C Σ_{i=1}^{n} [max(1 − y_i wᵀx_i, 0) + log(1 + e^{−y_i wᵀx_i})]
and at test time giving the classification result for an input x by whether wᵀx > 0.
CN201610534864.0A 2016-07-08 2016-07-08 Voice signal matching process based on car networking Pending CN106128454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610534864.0A CN106128454A (en) 2016-07-08 2016-07-08 Voice signal matching process based on car networking


Publications (1)

Publication Number Publication Date
CN106128454A true CN106128454A (en) 2016-11-16

Family

ID=57283136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610534864.0A Pending CN106128454A (en) 2016-07-08 2016-07-08 Voice signal matching process based on car networking

Country Status (1)

Country Link
CN (1) CN106128454A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000322088A (en) * 1999-05-14 2000-11-24 Hitachi Ltd Speech recognition microphone and speech recognition system and speech recognition method
CN104160392A (en) * 2012-03-07 2014-11-19 三菱电机株式会社 Device, method, and program for estimating meaning of word
CN103971675A (en) * 2013-01-29 2014-08-06 腾讯科技(深圳)有限公司 Automatic voice recognizing method and system
CN103971686A (en) * 2013-01-30 2014-08-06 腾讯科技(深圳)有限公司 Method and system for automatically recognizing voice
CN105389303A (en) * 2015-10-27 2016-03-09 北京信息科技大学 Automatic heterogenous corpus fusion method

Non-Patent Citations (1)

Title
Zha Daode: "Design and Implementation of a Driver-Assistance Information System", China Masters' Theses Full-text Database, Information Science and Technology series *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112581958A (en) * 2020-12-07 2021-03-30 中国南方电网有限责任公司 Short voice intelligent navigation method applied to electric power field
CN112581958B (en) * 2020-12-07 2024-04-09 中国南方电网有限责任公司 Short voice intelligent navigation method applied to electric power field

Similar Documents

Publication Publication Date Title
CN105244029B (en) Voice recognition post-processing method and system
EP2727103B1 (en) Speech recognition using variable-length context
CN104978587B (en) A kind of Entity recognition cooperative learning algorithm based on Doctype
CN105205124B (en) A kind of semi-supervised text sentiment classification method based on random character subspace
CN106557462A (en) Name entity recognition method and system
CN110517693B (en) Speech recognition method, speech recognition device, electronic equipment and computer-readable storage medium
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN103678271B (en) A kind of text correction method and subscriber equipment
CN101562012B (en) Method and system for graded measurement of voice
CN104462066A (en) Method and device for labeling semantic role
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
CN103544309A (en) Splitting method for search string of Chinese vertical search
CN103854643A (en) Method and apparatus for speech synthesis
CN104616029A (en) Data classification method and device
CN113495900A (en) Method and device for acquiring structured query language sentences based on natural language
CN106529525A (en) Chinese and Japanese handwritten character recognition method
CN104750779A (en) Chinese multi-class word identification method based on conditional random field
CN106202045A (en) Special audio recognition method based on car networking
CN106057196A (en) Vehicular voice data analysis identification method
CN110232128A (en) Topic file classification method and device
CN104881399A (en) Event identification method and system based on probability soft logic PSL
CN108681532A (en) A kind of sentiment analysis method towards Chinese microblogging
CN112530402B (en) Speech synthesis method, speech synthesis device and intelligent equipment
CN106203520B (en) SAR image classification method based on depth Method Using Relevance Vector Machine
CN106128454A (en) Voice signal matching process based on car networking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161116
