CN106128454A - Voice signal matching process based on car networking - Google Patents


Info

Publication number
CN106128454A
CN106128454A (application CN201610534864.0A)
Authority
CN
China
Prior art keywords
tagging
voice command
pos
participle
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610534864.0A
Other languages
Chinese (zh)
Inventor
谢欣霖
陈波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhida Science And Technology Co Ltd
Original Assignee
Chengdu Zhida Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhida Science And Technology Co Ltd filed Critical Chengdu Zhida Science And Technology Co Ltd
Priority to CN201610534864.0A priority Critical patent/CN106128454A/en
Publication of CN106128454A publication Critical patent/CN106128454A/en
Pending legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/14 — Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 — Hidden Markov Models [HMMs]
    • G10L15/144 — Training of HMMs

Abstract

The invention provides a voice signal matching method based on the Internet of Vehicles (car networking). The method learns a corpus to build a word segmenter, then performs part-of-speech (POS) tagging and semantic inference on the segmented words. During inference, the type of the voice command's intent is determined first; if the intent requires further parameters, the corresponding parameters are parsed out of the voice command. The method needs no labelled training sample set in an offline dictionary, depends little on hand-written rules, improves recognition accuracy, and adapts to the continual updating of in-vehicle systems.

Description

Voice signal matching process based on car networking
Technical field
The present invention relates to speech recognition, and in particular to a voice signal matching method based on the Internet of Vehicles (car networking).
Background technology
Technologies such as cloud computing, big data, and data mining are driving faster and better development of the information-services industry. Information services that incorporate natural-language understanding help people obtain the information and services they need more accurately and efficiently. Speech, as a near-ideal mode of human-machine interaction, is becoming increasingly important among the many interaction modes. In the automotive field in particular, natural-language understanding can be used to build highly practical intelligent information services that offer a more human and more convenient interaction mode with accurate voice commands and navigation, a broad prospect for improving the driving experience. Existing in-vehicle speech recognition, however, relies on learning from a large labelled training sample set in a large offline dictionary to perform semantic inference; it depends heavily and inflexibly on hand-written rules, cannot adapt to the continual changes of in-vehicle systems, and its precision and accuracy are low.
Summary of the invention
To solve the problems of the prior art described above, the present invention proposes a voice signal matching method based on the Internet of Vehicles, comprising:
learning a corpus to build a word segmenter, then performing POS tagging and semantic inference on the segmented words; during inference, first inferring the type of the voice command's intent and, if the intent requires further parameters, parsing the corresponding parameters from the voice command.
Preferably, performing POS tagging on the segmented words further comprises:
The POS-tagging problem models the following conditional probability:
p(s_1…s_m | x_1…x_m), where x_1…x_m are the individual words of the input voice command and s_1…s_m ∈ S ranges over all possible POS combinations. A voice command composed of x_1…x_m admits k^m possible POS-tagging combinations, with k = |S|. The probability distribution over these k^m tagging results is established as:
p(s_1…s_m | x_1…x_m) = ∏_{i=1}^{m} p(s_i | s_{i-1}, x_1…x_m)
The factors are then expressed through a finite feature vector over the predefined word set X and tag set Y; extensive training on pairs s_1…s_m and x_1…x_m yields the parameter vector w, which finally gives p(s_1…s_m | x_1…x_m);
After training, the states s_1…s_m for an input x_1…x_m are obtained by solving:
argmax_{s_1…s_m} p(s_1…s_m | x_1…x_m)
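The patent does not name an algorithm for this argmax, but under the first-order factorization above it can be solved exactly by dynamic programming (Viterbi). A minimal sketch follows; the function and argument names are illustrative, and `log_p` stands in for whatever trained model supplies log p(s_i | s_{i-1}, x_1…x_m).

```python
def viterbi_decode(words, tags, log_p):
    """Find the tag sequence s_1..s_m maximizing
    prod_i p(s_i | s_{i-1}, x_1..x_m) by dynamic programming.
    `log_p(prev_tag, tag, i, words)` returns log p(s_i=tag | s_{i-1}=prev_tag, x);
    prev_tag is None at position 0."""
    m = len(words)
    # best[i][t] = best log-probability of a tag sequence for words[:i+1] ending in t
    best = [{t: log_p(None, t, 0, words) for t in tags}]
    back = [{}]
    for i in range(1, m):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + log_p(s, t, i, words))
            best[i][t] = best[i - 1][prev] + log_p(prev, t, i, words)
            back[i][t] = prev
    # backtrack from the best final tag
    last = max(tags, key=lambda t: best[m - 1][t])
    seq = [last]
    for i in range(m - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))
```

Because the factorization conditions each tag only on its predecessor, the search over k^m combinations reduces to O(m·k²) work.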
The features used are the words after segmentation of the voice command, their corresponding POS tags, or combinations of the two; once a set of features has been chosen, features are added and weights adjusted according to the training samples;
For the text-classification problem of semantic recognition, given training samples (x_i, y_i), i = 1, …, n, from word set X and tag set Y, with n the total number of samples, the following optimization problem is set up:
min_w wᵀw/2 + C Σ_{i=1}^{n} [max(1 − y_i wᵀx_i, 0) + log(1 + e^{−y_i wᵀx_i})]
At test time, the classification result for an input x is given by whether wᵀx > 0.
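The objective above combines a hinge term with a logistic term under L2 regularization. One straightforward way to minimize it is plain (sub)gradient descent; the sketch below assumes this reading of the formula, and the learning rate and epoch count are illustrative, not from the patent.

```python
import numpy as np

def train_linear(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize w'w/2 + C * sum_i [max(1 - y_i w'x_i, 0) + log(1 + exp(-y_i w'x_i))]
    by (sub)gradient descent; labels y_i are in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # subgradient of the hinge term: -y_i x_i wherever the margin is below 1
        hinge_g = -(X * y[:, None])[margins < 1].sum(axis=0)
        # gradient of the logistic term: -y_i x_i * sigmoid(-margin)
        logit_g = -(X * (y * (1 / (1 + np.exp(margins))))[:, None]).sum(axis=0)
        w -= lr * (w + C * (hinge_g + logit_g))
    return w

def predict(w, x):
    """Classification rule from the text: positive class when w'x > 0."""
    return 1 if w @ x > 0 else -1
```

On linearly separable data the learned w quickly separates the two classes; the regularization term w'w/2 keeps the weights bounded.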
Compared with the prior art, the present invention has the following advantages:
it proposes a voice signal matching method based on the Internet of Vehicles that needs no labelled training sample set in an offline dictionary, depends little on rules, improves recognition accuracy, and adapts to the continual updating of in-vehicle systems.
Brief description of the drawings
Fig. 1 is a flow chart of the voice signal matching method based on the Internet of Vehicles according to an embodiment of the present invention.
Detailed description of the invention
A detailed description of one or more embodiments of the invention is provided below together with the accompanying drawing that illustrates the principles of the invention. The invention is described in connection with such embodiments but is not limited to any embodiment; its scope is limited only by the claims, and it covers many alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. They are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a voice signal matching method based on the Internet of Vehicles. Fig. 1 is a flow chart of the method according to an embodiment of the present invention.
The in-vehicle system of the present invention consists of a recognition module and a semantic-inference classification module. A machine-learning method learns the corpus effectively to build the word segmenter; a conditional random field (CRF) is used for POS tagging; semantic inference is then performed. Proper nouns are classified to simplify their storage and organization together with the voice commands.
The recognition process of the recognition module can be described as follows: the speech to be recognized is read in and, after front-end processing, the resulting observation sequence X is matched against all entries, i.e. the conditional probability is computed; the entry whose model yields the maximum probability is the recognition result. Before this recognition can be performed, training of the learning model must be completed.
The speech primitives corresponding to the feature-parameter sequence to be recognized are estimated from a probabilistic standpoint; these primitives include words, syllables, initials, and finals, so that the feature-parameter sequence is converted into recognition units. Building the model comprises the following steps:
1. Randomly choose initial parameter values and initialize the HMM model λ.
2. Cut the observation sequence into segments, one per state; the result of the cutting is the set of observation frames corresponding to each state.
3. Use the segmental K-means algorithm to divide the observation-vector set belonging to each state into M clusters, where M is the Gaussian mixture order; each cluster corresponds to one single-Gaussian component of the Gaussian-mixture probability density. Then estimate the following parameters:
c_jk = (number of vectors in the k-th cluster of state j) / (number of vectors belonging to state j)
μ_jk = sample mean of the vectors in the k-th cluster of state j
U_jk = sample covariance matrix of the vectors in the k-th cluster of state j
An updated HMM model λ′ is obtained from these parameters.
4. Compare the model λ′ with the initial model λ. If the model difference exceeds a preset threshold, replace λ with λ′ and repeat steps 2 and 3; if the difference is below the threshold, the model is judged to have converged and is saved.
Through this iteration, the initial parameter values are continually corrected over the course of the whole model training.
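Step 3's per-state clustering and single-Gaussian fitting can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function name, the fixed number of K-means iterations, and the small diagonal term added for numerical stability are all assumptions.

```python
import numpy as np

def init_state_gmms(frames_per_state, M, seed=0):
    """For each HMM state, split its observation vectors into M clusters
    (K-means) and fit one Gaussian per cluster, giving initial mixture
    weights c_jk, means mu_jk, and covariances U_jk (step 3 of the text)."""
    rng = np.random.default_rng(seed)
    gmms = []
    for obs in frames_per_state:            # obs: (T_j, d) frames cut to state j
        d = obs.shape[1]
        centers = obs[rng.choice(len(obs), M, replace=False)]
        for _ in range(10):                 # a few K-means iterations
            labels = np.argmin(((obs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            centers = np.array([obs[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(M)])
        c = np.array([(labels == k).mean() for k in range(M)])   # c_jk
        mu = centers                                             # mu_jk
        U = np.array([np.cov(obs[labels == k].T) + 1e-6 * np.eye(d)
                      if (labels == k).sum() > 1 else np.eye(d)
                      for k in range(M)])                        # U_jk
        gmms.append((c, mu, U))
    return gmms
```

The mixture weights are the cluster occupancy fractions, so they sum to one per state, matching the c_jk definition above.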
In the training stage of the model, MFCC feature parameters are used directly as observations: one MFCC vector is one observation. Next, the observation probability density function b_j(o) is computed:
b_j(o) = Σ_m c_jm N(o, μ_jm, U_jm), 1 ≤ j ≤ N
where o is the observation, c_jm is the m-th mixture coefficient of state j, and N(·, μ_jm, U_jm) is the (elliptically symmetric) Gaussian density function determined by the mean vector μ_jm and covariance matrix U_jm.
The model training process is as follows:
(1) Slice all training feature parameters into the individual states.
(2) Assign the feature parameters owned by each state to the single-Gaussian components of its mixture model, and revise the model as follows:
c′_jk = Σ_{t=1}^{L} γ_t(j,k) / Σ_{t=1}^{L} Σ_{m=1}^{M} γ_t(j,m)
μ′_jk = Σ_{t=1}^{L} γ_t(j,k) o_t / Σ_{t=1}^{L} γ_t(j,k)
U′_jk = Σ_{t=1}^{L} γ_t(j,k) (o_t − μ_jk)(o_t − μ_jk)ᵀ / Σ_{t=1}^{L} γ_t(j,k)
where γ_t(j,k) is the occupation probability of the k-th mixture component of state j at time t, L is the sample size, and C_t is the scale factor at time t.
(3) Check for convergence: if converged, terminate; otherwise continue iterating.
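The three re-estimation updates can be sketched directly. This assumes the occupancies γ_t(j,k) are already available from the forward-backward pass (the patent does not show how they are computed), and the covariance update is read as the usual weighted outer product:

```python
import numpy as np

def reestimate_mixture(gamma, O):
    """Re-estimate one state's mixture parameters from occupancies
    gamma[t, k] (probability that frame t came from mixture k of this
    state) and frames O[t, :], following the c'_jk / mu'_jk / U'_jk
    updates in the text."""
    T, M = gamma.shape
    d = O.shape[1]
    occ = gamma.sum(axis=0)                 # sum_t gamma_t(j, k)
    c = occ / gamma.sum()                   # c'_jk
    mu = (gamma.T @ O) / occ[:, None]       # mu'_jk
    U = np.empty((M, d, d))
    for k in range(M):
        diff = O - mu[k]
        U[k] = (gamma[:, k, None] * diff).T @ diff / occ[k]   # U'_jk
    return c, mu, U
```

With a single component and unit occupancies this reduces to the ordinary sample mean and (biased) sample covariance, which is a quick sanity check on the formulas.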
For proper-noun classification, the training sample set and test set are first obtained from the database. The training set is preprocessed and converted into a text representation, and a classifier is trained on it; in the evaluation stage the classifier is tested on the test set. After preprocessing, the training samples are segmented and each proper noun is converted into a vector of morphemes. The training samples are used to count each morpheme's term frequency and inverse frequency, from which each morpheme's regularized term-frequency and inverse-frequency ratio with respect to the predefined classes is computed, serving as the weight w_i(d) of the word d for class i.
w_i(d) = log(N/n_i + 0.1) / √(Σ_{i=1}^{N} [log(N/n_i + 0.1)]²)
where N is the total number of proper nouns and n_i is the number of proper nouns containing entry i. At test time, the sum of the weights for each class is computed for the proper noun to be processed, and the final classification result is given by:
n(ZY) = max_{i ∈ [1, M]} Σ_{j=1}^{N} w_i(ZY_j)
where ZY is the proper noun to be classified, M is the number of classes, and ZY_j is the j-th morpheme of ZY.
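A minimal sketch of this weight-and-vote classifier follows. The exact normalization in the patent's w_i(d) formula is ambiguous in the source, so an unnormalized IDF-style score log(N/n_i + 0.1) is used here; since any per-document normalizer is constant across classes, it does not change the argmax. Function names and the sample data are illustrative.

```python
import math
from collections import defaultdict

def train_weights(samples):
    """Build per-class morpheme weights from (morphemes, class_id) samples,
    scoring each morpheme by log(N/n_i + 0.1), one plausible reading of the
    patent's w_i(d)."""
    N = len(samples)
    df = defaultdict(int)                       # n_i: samples containing morpheme i
    classes = set()
    per_class = defaultdict(lambda: defaultdict(float))
    for morphemes, cls in samples:
        classes.add(cls)
        for m in set(morphemes):
            df[m] += 1
    for morphemes, cls in samples:
        for m in set(morphemes):
            per_class[cls][m] += math.log(N / df[m] + 0.1)
    return per_class, classes

def classify(per_class, classes, morphemes):
    """n(ZY): pick the class maximizing the summed weights of ZY's morphemes."""
    return max(classes, key=lambda c: sum(per_class[c].get(m, 0.0) for m in morphemes))
```

A proper noun is thus assigned to whichever class accumulated the largest total weight over its morphemes during training.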
The database stores four items for each voice command: the intent, the original text of the command, the command's segmentation information, and the command's POS-tagging information. From these, a segmentation training file, a segmentation test file, a POS-tagging training file, and a POS-tagging test file are generated. The present invention first performs segmentation and POS tagging on the voice command. Repeated errors, once found, can be corrected in batch by adding them to the program's custom dictionary.
Before segmentation, the segmentation problem is converted into a sequence-labelling problem: each character is tagged as the beginning of a word, the middle of a word, the end of a word, or a single-character word. The features to be learned are defined by preset templates; each line of the template file is one template, in whose macro [row, col] the row denotes a relative row number and col an absolute column number. The training file generates feature words according to the templates defined in the template file.
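The begin/middle/end/single conversion described above (commonly written B/M/E/S) can be sketched as a small helper; the function name and tag letters are illustrative conventions, not taken from the patent.

```python
def words_to_bmes(words):
    """Convert a segmented sentence (list of words) into per-character
    sequence labels: B = beginning of a word, M = middle, E = end,
    S = single-character word."""
    chars, tags = [], []
    for w in words:
        if len(w) == 1:
            chars.append(w)
            tags.append("S")
        else:
            chars.extend(w)
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return chars, tags
```

Each training sentence thus becomes a character sequence paired with a label sequence, which is exactly the input format a sequence labeller such as a CRF expects.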
The segmentation training file is used for conditional-random-field learning, yielding a segmentation system; the POS-tagging training file is likewise used for conditional-random-field learning, yielding a POS-tagging system. The segmentation test file is used to measure segmentation precision, and the POS-tagging test file to measure the precision of the POS-tagging system.
After segmentation finishes, its results must be suitably converted so that the POS-tagging system can tag them. For each voice command, the segmentation system outputs B1 segmentation results. The trained POS-tagging system then tags the file; each of the B1 segmentations yields B2 POS-tagging outputs, so every voice command finally produces B1×B2 recognition results, from which the best B are selected and written back to the database's standard segmentation and POS-tagging information. A recognition result is judged correct when both the segmentation and the POS tagging agree completely.
When the B1 segmentation results and B2 POS-tagging results are generated, the probability of each segmentation result is extracted and saved in array p1, and the probability of each POS-tagging result is extracted and saved in array p2. The generation probability of each of the B1×B2 recognition results of a voice command is then computed from p1 and p2 as:
p[i] = p2[i] × p1[⌊i/B2⌋], i = 0, 1, 2, …, B1×B2 − 1
The generation probabilities p[i] of these B1×B2 recognition results are sorted, and the B results with the highest probability are output.
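The combination and ranking step can be sketched directly from the formula, assuming p1 holds the B1 segmentation probabilities and p2 the B1×B2 tagging probabilities in segmentation-major order; the function name and return convention (indices of the top B) are illustrative.

```python
def top_b_results(p1, p2, B):
    """Combine segmentation probabilities p1 (length B1) with tagging
    probabilities p2 (length B1*B2, B2 per segmentation) via
    p[i] = p2[i] * p1[i // B2], and return indices of the B best results."""
    B2 = len(p2) // len(p1)
    p = [p2[i] * p1[i // B2] for i in range(len(p2))]
    return sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:B]
```

Integer division i // B2 maps each of the B1×B2 joint results back to the segmentation it came from, so each joint score is the product of its segmentation and tagging probabilities.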
After training and learning are completed, the output module outputs the segmentation and POS-tagging results of a given voice command. For an input voice command, the dictionary is searched for proper nouns; each proper noun found is replaced by its corresponding special symbol and written to file 1, while the proper noun, marked with brackets, is written to file 2. File 1 is converted into the input format of the conditional-random-field segmentation test module and written to file 3. File 3 is segmented and the segmentation result is saved in file 4. The segmentation result is converted into the input format of the POS-tagging system and written to file 5. For the B1 segmentation results produced for each voice command, the segmentation probabilities are saved in p1. The voice commands in file 5 are POS-tagged and the result is saved in file 6. Altogether B1×B2 recognition results are obtained for each voice command, of which the B best must be output; the B final results are written to file 7, which is then converted into the finally specified output format.
In the present invention, semantic understanding uses a method based on statistical learning. The semantic classes in the in-vehicle system include functions such as navigation routes, road conditions, answering and placing calls, air-conditioning adjustment, weather voice commands, and radio. Some of the semantics also carry parameters: placing a call, for example, requires knowing the concrete telephone number to dial. The semantic-inference problem of the present invention can thus be converted into assigning the intent of the input text to a predefined intent class: first the type of the voice command's intent is inferred, and if the intent requires further parameters, the corresponding parameters are parsed from the voice command.
The POS-tagging problem models the following conditional probability:
p(s_1…s_m | x_1…x_m), where x_1…x_m are the individual words of the input voice command and s_1…s_m ∈ S ranges over all possible POS combinations. A voice command composed of x_1…x_m admits k^m possible POS-tagging combinations, with k = |S|. The probability distribution over these k^m tagging results is established as:
p(s_1…s_m | x_1…x_m) = ∏_{i=1}^{m} p(s_i | s_{i-1}, x_1…x_m)
The factors are then expressed through a finite feature vector over the predefined word set X and tag set Y; extensive training on pairs s_1…s_m and x_1…x_m yields the parameter vector w, which finally gives p(s_1…s_m | x_1…x_m).
After training, the states s_1…s_m for an input x_1…x_m are obtained by solving:
argmax_{s_1…s_m} p(s_1…s_m | x_1…x_m)
The features used are the words after segmentation of the voice command, their corresponding POS tags, or the combination of the two. After choosing a set of features, features are added and weights adjusted according to the training samples.
For the text-classification problem of semantic recognition, given training samples (x_i, y_i), i = 1, …, n, from word set X and tag set Y, with n the total number of samples, the following optimization problem is set up:
min_w wᵀw/2 + C Σ_{i=1}^{n} [max(1 − y_i wᵀx_i, 0) + log(1 + e^{−y_i wᵀx_i})]
At test time, the classification result for an input x is given by whether wᵀx > 0.
In summary, the present invention proposes a voice signal matching method based on the Internet of Vehicles that needs no labelled training sample set in an offline dictionary, depends little on rules, improves recognition accuracy, and adapts to the continual updating of in-vehicle systems.
Obviously, those skilled in the art should appreciate that the modules or steps of the present invention described above can be implemented with a general-purpose computing system: they may be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they may be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. The present invention is therefore not restricted to any particular combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are used only to exemplify or explain the principles of the invention and are not to be construed as limiting it. Any modification, equivalent substitution, improvement, and the like made without departing from the spirit and scope of the invention shall therefore be included within its protection scope. Furthermore, the appended claims are intended to cover all changes and modifications falling within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (2)

1. A voice signal matching method based on the Internet of Vehicles (car networking), characterized by comprising:
learning a corpus to build a word segmenter, then performing POS tagging and semantic inference on the segmented words; during inference, first inferring the type of the voice command's intent and, if the intent requires further parameters, parsing the corresponding parameters from the voice command.
2. The method according to claim 1, characterized in that performing POS tagging on the segmented words further comprises:
modelling the following conditional probability for the POS-tagging problem:
p(s_1…s_m | x_1…x_m), where x_1…x_m are the individual words of the input voice command and s_1…s_m ∈ S ranges over all possible POS combinations; a voice command composed of x_1…x_m admits k^m possible POS-tagging combinations, with k = |S|; the probability distribution over these k^m tagging results is established as:
p(s_1…s_m | x_1…x_m) = ∏_{i=1}^{m} p(s_i | s_{i-1}, x_1…x_m)
then expressing the factors through a finite feature vector over the predefined word set X and tag set Y, obtaining the parameter vector w through extensive training on s_1…s_m and x_1…x_m, and finally obtaining p(s_1…s_m | x_1…x_m);
after training, solving for the states s_1…s_m of an input x_1…x_m, i.e. solving:
argmax_{s_1…s_m} p(s_1…s_m | x_1…x_m)
the features used being the words after segmentation of the voice command, their corresponding POS tags, or combinations of the two; after a set of features is chosen, adding features and adjusting weights according to the training samples;
for the text-classification problem of semantic recognition, given training samples (x_i, y_i), i = 1, …, n, from word set X and tag set Y, with n the total number of samples, setting up the following optimization problem:
min_w wᵀw/2 + C Σ_{i=1}^{n} [max(1 − y_i wᵀx_i, 0) + log(1 + e^{−y_i wᵀx_i})]
and at test time giving the classification result for an input x by whether wᵀx > 0.
CN201610534864.0A 2016-07-08 2016-07-08 Voice signal matching process based on car networking Pending CN106128454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610534864.0A CN106128454A (en) 2016-07-08 2016-07-08 Voice signal matching process based on car networking


Publications (1)

Publication Number Publication Date
CN106128454A true CN106128454A (en) 2016-11-16

Family

ID=57283136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610534864.0A Pending CN106128454A (en) 2016-07-08 2016-07-08 Voice signal matching process based on car networking

Country Status (1)

Country Link
CN (1) CN106128454A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000322088A (en) * 1999-05-14 2000-11-24 Hitachi Ltd Speech recognition microphone and speech recognition system and speech recognition method
CN104160392A (en) * 2012-03-07 2014-11-19 三菱电机株式会社 Device, method, and program for estimating meaning of word
CN103971675A (en) * 2013-01-29 2014-08-06 腾讯科技(深圳)有限公司 Automatic voice recognizing method and system
CN103971686A (en) * 2013-01-30 2014-08-06 腾讯科技(深圳)有限公司 Method and system for automatically recognizing voice
CN105389303A (en) * 2015-10-27 2016-03-09 北京信息科技大学 Automatic heterogenous corpus fusion method

Non-Patent Citations (1)

Title
Zha Daode: "Design and Implementation of a Driver-Assistance Information System", China Masters' Theses Full-text Database, Information Science and Technology series *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112581958A (en) * 2020-12-07 2021-03-30 中国南方电网有限责任公司 Short voice intelligent navigation method applied to electric power field
CN112581958B (en) * 2020-12-07 2024-04-09 中国南方电网有限责任公司 Short voice intelligent navigation method applied to electric power field

Similar Documents

Publication Publication Date Title
CN105244029B (en) Voice recognition post-processing method and system
EP2727103B1 (en) Speech recognition using variable-length context
CN104978587B (en) A kind of Entity recognition cooperative learning algorithm based on Doctype
CN105205124B (en) A kind of semi-supervised text sentiment classification method based on random character subspace
CN106557462A (en) Name entity recognition method and system
CN110517693B (en) Speech recognition method, speech recognition device, electronic equipment and computer-readable storage medium
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN103678271B (en) A kind of text correction method and subscriber equipment
CN101562012B (en) Method and system for graded measurement of voice
CN104462066A (en) Method and device for labeling semantic role
CN106340297A (en) Speech recognition method and system based on cloud computing and confidence calculation
CN103544309A (en) Splitting method for search string of Chinese vertical search
CN103854643A (en) Method and apparatus for speech synthesis
CN104616029A (en) Data classification method and device
CN113495900A (en) Method and device for acquiring structured query language sentences based on natural language
CN106529525A (en) Chinese and Japanese handwritten character recognition method
CN104750779A (en) Chinese multi-class word identification method based on conditional random field
CN106202045A (en) Special audio recognition method based on car networking
CN106057196A (en) Vehicular voice data analysis identification method
CN110232128A (en) Topic file classification method and device
CN104881399A (en) Event identification method and system based on probability soft logic PSL
CN108681532A (en) A kind of sentiment analysis method towards Chinese microblogging
CN112530402B (en) Speech synthesis method, speech synthesis device and intelligent equipment
CN106203520B (en) SAR image classification method based on depth Method Using Relevance Vector Machine
CN106128454A (en) Voice signal matching process based on car networking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161116
