CN103810998A - Method for off-line speech recognition based on mobile terminal device and achieving method - Google Patents


Info

Publication number
CN103810998A
CN103810998A
Authority
CN
China
Prior art keywords
character string
field
mobile terminal
terminal device
dictionary
Prior art date
Legal status
Granted
Application number
CN201310652535.2A
Other languages
Chinese (zh)
Other versions
CN103810998B (en)
Inventor
李林
徐礼奎
呼延正勇
方帅
张晓东
叶思菁
姚晓闯
刘哲
Current Assignee
China Agricultural University
Original Assignee
China Agricultural University
Priority date
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN201310652535.2A
Publication of CN103810998A
Application granted
Publication of CN103810998B
Expired - Fee Related
Anticipated expiration


Classifications

  • Machine Translation (AREA)

Abstract

The invention provides a method for offline speech recognition based on a mobile terminal device. The method comprises: obtaining a speech signal and extracting the speech feature vectors corresponding to the speech signal; matching the speech feature vectors against an acoustic model preset in the mobile terminal device to obtain the language character strings corresponding to the speech feature vectors; matching the language character strings against a language model and a dictionary preset in the mobile terminal device to obtain the matched text data corresponding to the speech feature vectors; and calculating the output probabilities of the speech feature vectors in the acoustic model and, based on the maximum of these output probabilities, obtaining the matched text data of the corresponding speech feature vector as the final recognition result of the speech signal. By using the acoustic model, language model, and dictionary preset in the mobile terminal device, speech signals of a specific field are matched and converted into text data to obtain the final recognition result, finally achieving offline speech recognition.

Description

Offline speech recognition method based on mobile terminal device and implementation method
Technical field
The present invention relates to the field of speech recognition on mobile terminals, and in particular provides an offline speech recognition method based on a mobile terminal device and an implementation method for offline speech recognition based on a mobile terminal device.
Background technology
A field data collection program based on a mobile terminal is a built-in application that runs on intelligent mobile equipment (tablets, smartphones, portable computers, etc.) and provides computer-technology support for field survey work. To simplify the acquisition of field data, shorten the data collection cycle, and improve the standardization and efficiency of data entry and management, many field data collection programs now exist and are widely applied in sectors such as agriculture, forestry, meteorology, geology, entomology, and ecology.
Research on the construction and application of field data collection programs began in the 1990s. Current field acquisition systems generally record data through keyboard input, but the keyboard of a smartphone is small and a person's fingers are comparatively large, so wrong keys are often pressed during input; moreover, both hands are occupied while entering data. Data entry is therefore inefficient, which has hindered the wider adoption of field data collection systems. Applying speech recognition technology is a powerful means of breaking the efficiency limits of conventional keyboard input.
Speech recognition is an interdisciplinary technology involving signal processing, pattern recognition, probability and information theory, the mechanisms of speech production and hearing, and artificial intelligence. Its goal is to convert the lexical content of human speech into computer-readable input, thereby achieving more natural human-machine interaction. At present, mainstream speech recognition software is based on cloud processing over the Internet: the client captures the voice, the server performs the recognition, and the result is returned to the client. The advantages of this approach are that it exploits the powerful speech processing capability of the server, saves client storage for the language model, acoustic model, and dictionary, and can recognize a large general-purpose vocabulary. However, it cannot recognize the uncommon vocabulary of a specific industry, and it requires network access; when network conditions are poor, processing speed cannot be guaranteed. It is therefore unsuitable for field acquisition systems used where environmental conditions are poor, and an offline speech recognition technology to support field acquisition systems is both important and urgent.
Summary of the invention
(1) Technical problem to be solved
The object of the invention is to provide an offline speech recognition method based on a mobile terminal device, thereby achieving speech recognition while offline.
(2) Technical scheme
To solve the above technical problem, the invention provides an offline speech recognition method based on a mobile terminal device, comprising:
obtaining a speech signal and extracting the speech feature vectors corresponding to said speech signal;
matching said speech feature vectors based on an acoustic model preset in said mobile terminal device to obtain the language character strings corresponding to said speech feature vectors; and matching said language character strings based on a language model and a dictionary preset in said mobile terminal device to obtain the matched text data corresponding to said speech feature vectors;
calculating the output probabilities of said speech feature vectors in said acoustic model and, based on the maximum of said output probabilities, obtaining the matched text data of the corresponding speech feature vector as the final recognition result of said speech signal.
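As an illustration of the above flow, a minimal sketch in Java follows. All type and method names (FeatureExtractor, AcousticModel, LanguageModel, Hypothesis, and their methods) are hypothetical placeholders invented for illustration; the patent does not publish an API.

import java.util.List;

// Hypothetical interfaces standing in for the preset models; not a published API.
interface Hypothesis { String phoneticString(); double outputProbability(); }
interface AcousticModel { List<Hypothesis> match(double[][] features); }
interface Dictionary { }
interface LanguageModel { String match(String phonetic, Dictionary dict); }
interface FeatureExtractor { double[][] extract(double[] signal); }

public class OfflineRecognizer {
    private final FeatureExtractor extractor;
    private final AcousticModel acousticModel;   // preset in the mobile terminal device
    private final LanguageModel languageModel;   // preset in the mobile terminal device
    private final Dictionary dictionary;         // preset in the mobile terminal device

    public OfflineRecognizer(FeatureExtractor fe, AcousticModel am,
                             LanguageModel lm, Dictionary d) {
        this.extractor = fe; this.acousticModel = am;
        this.languageModel = lm; this.dictionary = d;
    }

    /** Converts a speech signal into text, keeping the candidate with the
     *  maximum output probability under the acoustic model. */
    public String recognize(double[] signal) {
        double[][] features = extractor.extract(signal);      // speech feature vectors
        String best = null;
        double bestProb = Double.NEGATIVE_INFINITY;
        for (Hypothesis h : acousticModel.match(features)) {  // candidate language character strings
            String text = languageModel.match(h.phoneticString(), dictionary);
            if (text != null && h.outputProbability() > bestProb) {
                bestProb = h.outputProbability();
                best = text;                                  // matched text data
            }
        }
        return best;                                          // final recognition result
    }
}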
The speech recognition method based on the mobile terminal device further comprises: performing word segmentation on said final recognition result.
Specifically, performing word segmentation on said final recognition result comprises:
S501, setting n to the number of Chinese characters in the longest entry of a segmentation dictionary; wherein the matched text data corresponding to said final recognition result is a Chinese character string;
S502, taking the first n characters of said Chinese character string as the matching field and searching said segmentation dictionary;
if said segmentation dictionary contains a word corresponding to said matching field, the match succeeds; said matching field is split out as a word, stored into another character string newString, and separated from other words by a blank character;
if no word corresponding to said matching field can be found in said segmentation dictionary, the match fails and step S503 is entered;
S503, decrementing n to n-1, removing the last Chinese character from the matching field used in step S502 to form a new matching field, and searching said segmentation dictionary; if said segmentation dictionary contains a word corresponding to the new matching field, the match succeeds and said new matching field is split out as a word and stored in character string newString;
if the match fails, step S503 is repeated until said new matching field is matched successfully;
S504, repeating steps S502-S503 until all characters of said Chinese character string have been matched, completing the segmentation of said Chinese character string.
The speech recognition method based on the mobile terminal device further comprises: displaying said final recognition result in an interface two-dimensional table.
Specifically, displaying said final recognition result in the interface two-dimensional table comprises:
S601, determining the fields that said interface two-dimensional table needs to collect, and storing these collection fields in the character string array KeyWordString;
S602, dividing the segmented character string into multiple fields with the split function, using the blank character as delimiter, and storing them in the character string array InputString;
S603, taking a field out of the character string array InputString and comparing it item by item with the fields in KeyWordString; if there is a match, storing the subscript i of this field in array InputString into array PointKeyWord; if there is no match, performing no operation; wherein 1 ≤ i ≤ n, n is the number of fields in character string array InputString, and i and n are positive integers;
S604, taking the next field out of InputString and comparing it item by item with the fields in KeyWordString; if the match succeeds, storing the subscript i+1 of this field in InputString into PointKeyWord and setting array element ValueString[i] to empty; if there is no match, setting the value of ValueString[i] to this field;
S605, repeating steps S603 and S604 until all fields in InputString have been matched;
S606, storing the matching results in a HashMap as key-value pairs, comparing the keys of the key-value pairs with the headers of the two-dimensional table, and storing the values of the key-value pairs into the interface two-dimensional table.
Specifically, the speech recognition method based on the mobile terminal device matches said speech feature vectors with the Viterbi algorithm.
Specifically, the speech recognition method based on the mobile terminal device matches said language character strings with the N-gram algorithm.
To solve the above technical problem, the invention also provides an implementation method for offline speech recognition based on a mobile terminal device, comprising:
collecting a project vocabulary;
training acoustic model data and language model data with an HMM model based on said project vocabulary;
establishing an acoustic model based on the trained acoustic model data and a language model based on the trained language model data, and creating a dictionary with a text editor;
storing said acoustic model, language model, and dictionary in said mobile terminal device.
Wherein said acoustic model data is trained with an HMM parameter optimization algorithm based on the segmental K-means algorithm.
Wherein said language model data is trained with the N-gram algorithm.
(3) Beneficial effects
Unlike the background art, the basic principle of the invention is to convert the speech signal into text data to obtain the final recognition result. The main implementation uses the acoustic model, language model, and dictionary preset in the mobile terminal device to match speech signals of a specific field, finally achieving offline speech recognition. Further, when collecting the information data of a specific field, the invention completes the collection without manual input, greatly improving collection efficiency and reducing the cost of data acquisition.
Description of the drawings
Fig. 1 is a schematic flowchart of the implementation method for offline speech recognition based on a mobile terminal device in Embodiment 1;
Fig. 2 is a flowchart of HMM parameter training based on the segmental K-means algorithm in the embodiment of Fig. 1;
Fig. 3 is an overall flow diagram of the offline speech recognition method based on a mobile terminal device of the invention;
Fig. 4 is a schematic flowchart of the offline speech recognition method based on a mobile terminal device in Embodiment 2;
Fig. 5 is a schematic flowchart of Chinese word segmentation in the embodiment of Fig. 4;
Fig. 6 is a schematic flowchart of echoing the segmented result to the system interface two-dimensional table in the embodiment of Fig. 4;
Fig. 7 is a recorded voice waveform of the realization system for offline speech recognition based on a mobile terminal device of Embodiment 3;
Fig. 8 is an acoustic analysis diagram of the realization system of Embodiment 3;
Fig. 9 shows the analysis.conf configuration file of the realization system of Embodiment 3;
Fig. 10 shows the HMM prototype file of the realization system of Embodiment 3;
Fig. 11 is the HMM training process diagram of the realization system of Embodiment 3;
Fig. 12 is the HTK tool architecture diagram of the realization system of Embodiment 3;
Fig. 13 is the HTK speech processing flowchart of the realization system of Embodiment 3.
Detailed description of the embodiments
To make the purpose, content, and advantages of the invention clearer, the specific embodiments of the invention are described in further detail below in conjunction with the drawings and examples. The following examples illustrate the invention but do not limit its scope.
Embodiment 1
This embodiment provides an implementation method for offline speech recognition based on a mobile terminal device. The method begins at step 101: collecting a project vocabulary, where the project vocabulary consists of the professional terms, everyday expressions, and phrases of a specific field.
In step 102, acoustic model data and language model data are trained based on said project vocabulary using the hidden Markov model (HMM), which is based on probability statistics. The acoustic model data is trained with an HMM parameter optimization algorithm based on the segmental K-means algorithm; the language model data is trained with the N-gram algorithm.
The N-gram algorithm is based on the assumption that the appearance of the N-th word depends only on the preceding N-1 words and is unrelated to any other word, so the probability of a whole sentence is the product of the probabilities of its words. These probabilities can be obtained by directly counting how often the N words occur together in the corpus. The binary Bi-gram and the ternary Tri-gram are the most commonly used. The formula is: P(w_n | w_1, w_2, …, w_{n-1}) = C(w_1, w_2, …, w_n) / C(w_1, w_2, …, w_{n-1}). By the law of large numbers, given that the preceding N-1 words have occurred, the probability of the N-th word is the frequency with which the N words occur together divided by the frequency with which the N-1 words occur together. Taking the word 推广前景 ("promotion prospect") as an example: given that 推, 广, 前 have occurred, the probability that 景 occurs is P(景 | 推, 广, 前) = C(推, 广, 前, 景) / C(推, 广, 前). Training speech data with the N-gram algorithm means training, from context, the probability of each word in the dictionary occurring given the other words, and storing the training results in the language model.
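A count-based sketch of this estimation for the bigram case is shown below, in plain Java and assuming a pre-tokenized corpus; it illustrates the counting formula above, not the lmtool implementation.

import java.util.HashMap;
import java.util.Map;

public class BigramModel {
    private final Map<String, Integer> unigram = new HashMap<>(); // C(w)
    private final Map<String, Integer> bigram = new HashMap<>();  // C(w_prev, w_next)

    public void train(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            unigram.merge(tokens[i], 1, Integer::sum);
            if (i + 1 < tokens.length) {
                bigram.merge(tokens[i] + " " + tokens[i + 1], 1, Integer::sum);
            }
        }
    }

    /** P(next | prev) = C(prev, next) / C(prev). */
    public double probability(String prev, String next) {
        int cPrev = unigram.getOrDefault(prev, 0);
        if (cPrev == 0) return 0.0;
        return bigram.getOrDefault(prev + " " + next, 0) / (double) cPrev;
    }

    /** The probability of a whole sentence is the product of the word probabilities. */
    public double sentenceProbability(String[] tokens) {
        double p = 1.0;
        for (int i = 1; i < tokens.length; i++) {
            p *= probability(tokens[i - 1], tokens[i]);
        }
        return p;
    }
}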
The quality of the HMM model parameters directly affects the recognition result. Of the three parameters of the HMM model M = {A, B, π}, the state transition probability matrix A and the initial state probability set π have little influence on the recognition rate and are usually set to uniformly distributed or non-zero random values. The initial value of parameter B is harder to set than A and π, and is also the most important. For the shortcomings of the classic Baum-Welch algorithm, the present invention summarizes three improved algorithms; the acoustic model data and language model data can be trained with any of these three improved HMM training algorithms. The three improved algorithms are introduced below.
First: the HMM parameter optimization algorithm based on the segmental K-means algorithm
Referring to Fig. 2, in step 201, the initial values of the HMM model parameters are preset; the initial values can be obtained by the equal-division state method or by experience.
The maximum iteration count I and the convergence threshold ζ are preset.
In step 202, the input training speech data is segmented into states with the Viterbi algorithm.
In step 203, parameter B of the model is re-estimated with the segmental K-means algorithm, in two cases:
Discrete system:
b_ji = (number of occurrences of speech frames with label i in state S_j) / (total number of speech frames in state S_j).
Continuous system:
The probability density function of each state is the superposition of M normal distribution functions, with
mixture coefficient ω_ji = (number of class-i speech frames in state j) / (number of speech frames in state j);
sample mean μ_ji = the sample mean of class i in state j;
sample covariance matrix υ_ji = the sample covariance matrix of class i in state j.
The model parameters M* are computed from the above parameters.
In step 204, with M* as the initial value, the HMM parameters are re-estimated with the Baum-Welch algorithm.
In step 205, return to step 202 until the iteration count exceeds I or the convergence condition is met.
As shown in Fig. 2, the segmental K-means algorithm is based on the maximum likelihood criterion over the optimal state sequence; it greatly accelerates the convergence of the model and can also provide some additional information during training.
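For the discrete case, the re-estimation of step 203 reduces to frequency counting over the Viterbi state segmentation. A minimal Java sketch, assuming the per-frame state assignments and codebook labels have already been computed (the input arrays are assumed, not HTK's data structures):

public class SegmentalKMeansStep {
    /**
     * Re-estimates the emission matrix b[j][i] by counting.
     * stateOfFrame[t]  = state assigned to frame t by Viterbi segmentation
     * symbolOfFrame[t] = observation symbol (codebook label) of frame t
     */
    public static double[][] reestimateB(int[] stateOfFrame, int[] symbolOfFrame,
                                         int numStates, int numSymbols) {
        double[][] b = new double[numStates][numSymbols];
        int[] framesInState = new int[numStates];
        for (int t = 0; t < stateOfFrame.length; t++) {
            b[stateOfFrame[t]][symbolOfFrame[t]] += 1.0; // count frames with label i in state j
            framesInState[stateOfFrame[t]]++;            // count all frames in state j
        }
        for (int j = 0; j < numStates; j++) {
            for (int i = 0; i < numSymbols; i++) {
                if (framesInState[j] > 0) b[j][i] /= framesInState[j]; // b_ji = ratio of counts
            }
        }
        return b;
    }
}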
Second: the improved HMM parameter training algorithm based on the genetic algorithm
(1) Preset the initial values of the HMM model parameters; the initial values can be set uniformly or obtained by experience;
(2) Preset the maximum number of generations I and the convergence threshold ζ;
(3) Select the best genes: in each generation, choose the best genes in a certain proportion in order of fitness from high to low, where a higher fitness corresponds to a higher selection ratio; the fitness is computed as f(λ) = Σ_{k=1}^{N} ln(P(O_k | λ)), summed over the N training observation sequences O_k;
(4) Crossover and mutation: crossover exchanges corresponding sections of two parents to produce offspring, equivalent to searching a local region; mutation adds, deletes, or changes parts of a parent to produce variants, letting the offspring jump out of the current local search region and avoiding premature convergence to a local optimum;
(5) Update the model parameters;
(6) If the number of generations exceeds I or the convergence condition is met, training is complete; otherwise return to (3).
Third: the improved HMM parameter training algorithm based on the relaxation algorithm.
(1) Preset the initial values of the HMM model parameters; the initial values can be set uniformly or obtained by experience;
(2) Preset the maximum iteration count I and the convergence threshold ζ;
(3) Let T_m = T_0 × f(m) with f(m) = K^m, where m takes the values 0, 1, 2, …, I and K < 1;
(4) Generate an instance of N × M independent normal random variables X with mean EX = 0 and variance DX = T_m;
(5) Obtain the parameters b_ij with the classic Baum-Welch algorithm and let b*_ij = b_ij + x (x being the corresponding value of X above), 1 ≤ i ≤ N, 1 ≤ j ≤ M; if b*_ij is negative, set it to zero and normalize;
(6) If m > I or the convergence condition is met, training is complete; otherwise return to (3).
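Steps (3)-(5) amount to perturbing the Baum-Welch estimate of B with Gaussian noise whose variance follows the temperature schedule T_m = T_0 × K^m, clamping negatives and renormalizing. A toy Java sketch under exactly those assumptions (not the authors' training code):

import java.util.Random;

public class AnnealedPerturbation {
    public static double[][] perturb(double[][] b, double t0, double k, int m, Random rnd) {
        double tm = t0 * Math.pow(k, m);      // temperature schedule, K < 1
        double sigma = Math.sqrt(tm);         // standard deviation for variance Tm
        int rows = b.length, cols = b[0].length;
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < cols; j++) {
                double v = b[i][j] + rnd.nextGaussian() * sigma; // b*_ij = b_ij + x
                out[i][j] = Math.max(v, 0.0);                    // negative values set to zero
                rowSum += out[i][j];
            }
            for (int j = 0; j < cols; j++) {                     // renormalize the row
                if (rowSum > 0) out[i][j] /= rowSum;
            }
        }
        return out;
    }
}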
In this embodiment, however, only the first of the three optimization algorithms is used, namely the HMM parameter optimization algorithm based on the segmental K-means algorithm. This algorithm mainly accelerates model training and clusters similar syllables, shortening the decoding search space during recognition. The latter two improved algorithms mainly improve precision; since our offline speech recognition trains mainly on the vocabulary of a specific industry application, the vocabulary in the model is small and the likelihood of confusion between different words is very low, so these two algorithms have little effect. The first optimization algorithm is therefore chosen.
In step 103, the acoustic model is established from the trained acoustic model data, the language model is established from the trained language model data, and the dictionary is created with a text editor. The dictionary and language model are constructed as follows: a text file is created manually with a text editor according to the needs of the application system, the Chinese characters of the field information to be collected and the corresponding pinyin are written into the file, and the lmtool utility then generates the segmentation dictionary and language model automatically.
In step 104, said acoustic model, language model, and dictionary are stored in said mobile terminal device.
Through the above process, the acoustic model, language model, and dictionary are established for the professional terms, everyday expressions, and phrases of a specific field. The acoustic model matches the speech feature vectors, and the language model and dictionary match the character strings produced by acoustic model matching. Offline speech recognition for the specific field can thus be achieved through the above acoustic model, language model, and dictionary: when collecting the information data of the specific field, the collection is completed without manual input, greatly improving collection efficiency, reducing the cost of data acquisition, and solving the technical problem mentioned in the background art.
Embodiment 2
Referring to Fig. 3 and Fig. 4, this embodiment provides an offline speech recognition method based on a mobile terminal device. The method performs speech recognition with the acoustic model, language model, and dictionary established in Embodiment 1. It begins at step 401: obtaining a speech signal and extracting the speech feature vectors corresponding to said speech signal.
In step 402, said speech feature vectors are matched based on the acoustic model preset in said mobile terminal device to obtain the language character strings corresponding to said speech feature vectors; and said language character strings are matched based on the language model and dictionary preset in said mobile terminal device to obtain the matched text data corresponding to said speech feature vectors. Specifically, said speech feature vectors are matched with the Viterbi algorithm of the HMM model, and said language character strings are matched with the N-gram algorithm.
In step 403, the output probabilities of said speech feature vectors in said acoustic model are calculated and, based on the maximum of said output probabilities, the matched text data of the corresponding speech feature vector is obtained as the final recognition result of said speech signal.
As the foregoing description shows, the basic principle of this embodiment is to convert the speech signal into text data to obtain the final recognition result. The main implementation uses the acoustic model, language model, and dictionary preset in the mobile terminal device to match speech signals of a specific field, finally achieving offline speech recognition. Further, when collecting the information data of a specific field, this embodiment completes the collection without manual input, greatly improving collection efficiency and reducing the cost of data acquisition.
Referring to Fig. 5 and Fig. 6, to help the user further understand the processed final recognition result, the speech recognition method of this embodiment further comprises, after obtaining said final recognition result:
performing word segmentation on said final recognition result;
and displaying the segmented result in the interface two-dimensional table.
After the above steps, the segmented final result is displayed accurately at the position of the corresponding collection field in the system interface two-dimensional table, and the user can directly check the collected speech signal in written form. In this way, the user completes a whole record of collected data with a single input, instead of each input completing the acquisition of only one field, which greatly improves data acquisition efficiency and saves the time and labor costs of field information acquisition.
Fig. 5 is a schematic flowchart of performing word segmentation on the final recognition result. In this embodiment, the explanation takes as an example matched text data that is a Chinese character string. Specifically, performing word segmentation on said final recognition result comprises (see the sketch after these steps):
S501, setting n to the number of Chinese characters in the longest entry of a segmentation dictionary; wherein the matched text data corresponding to said final recognition result is a Chinese character string. The segmentation dictionary is constructed as follows: a text file is created manually with a text editor according to the needs of the application system, the Chinese characters of the field information to be collected and the corresponding pinyin are written into the file, and the lmtool utility then generates the segmentation dictionary automatically.
S502, taking the first n characters of said Chinese character string as the matching field and searching said segmentation dictionary.
If said segmentation dictionary contains a word corresponding to said matching field, the match succeeds; said matching field is split out as a word, stored into another character string newString, and separated from other words by a blank character.
If no word corresponding to said matching field can be found in said segmentation dictionary, the match fails and step S503 is entered.
S503, decrementing n to n-1, removing the last Chinese character from the matching field used in step S502 to form a new matching field, and searching said segmentation dictionary; if said segmentation dictionary contains a word corresponding to the new matching field, the match succeeds and said new matching field is split out as a word and stored in character string newString.
If the match fails, step S503 is repeated until said new matching field is matched successfully.
S504, repeating steps S502-S503 until all characters of said Chinese character string have been matched, completing the segmentation of said Chinese character string.
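Steps S501-S504 describe forward maximum matching. A compact Java sketch follows, with an in-memory set standing in for the generated segmentation dictionary; the dictionary contents in the example are illustrative only.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MaxMatchSegmenter {
    /** Forward maximum matching: take up to maxLen characters, shrink until
     *  the dictionary contains the field, emit it, and continue. */
    public static String segment(String text, Set<String> dict, int maxLen) {
        StringBuilder newString = new StringBuilder();
        int pos = 0;
        while (pos < text.length()) {
            int n = Math.min(maxLen, text.length() - pos);
            // Remove the last character and retry until a dictionary word is found (step S503);
            // an unmatched single character falls through as a one-character word.
            while (n > 1 && !dict.contains(text.substring(pos, pos + n))) {
                n--;
            }
            newString.append(text, pos, pos + n).append(' '); // blank character as separator
            pos += n;
        }
        return newString.toString().trim();
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("推广", "前景", "推广前景"));
        System.out.println(segment("推广前景", dict, 4)); // prints: 推广前景
    }
}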
Referring to Fig. 6, which is a schematic flowchart of echoing the result to the system interface two-dimensional table after segmenting the Chinese character string, displaying said final recognition result (i.e., the segmented result) in the interface two-dimensional table specifically comprises (a sketch follows these steps):
S601, determining the fields that said interface two-dimensional table needs to collect, and storing these collection fields in the character string array KeyWordString; the collection fields are the fields the interface two-dimensional table needs to display, including but not limited to all fields produced by the above segmentation.
S602, dividing the segmented character string into multiple fields with the split function, using the blank character as delimiter, and storing them in the character string array InputString.
S603, taking a field out of the character string array InputString and comparing it item by item with the fields in KeyWordString; if there is a match, storing the subscript i of this field in array InputString into array PointKeyWord; if there is no match, performing no operation; wherein 1 ≤ i ≤ n, n is the number of fields in character string array InputString, and i and n are positive integers.
S604, taking the next field out of InputString and comparing it item by item with the fields in KeyWordString; if the match succeeds, storing the subscript i+1 of this field in InputString into PointKeyWord and setting array element ValueString[i] to empty; if there is no match, setting the value of ValueString[i] to this field.
S605, repeating steps S603 and S604 until all fields in InputString have been matched.
S606, storing the matching results in a HashMap as key-value pairs, comparing the keys of the key-value pairs with the headers of the two-dimensional table, and storing the values of the key-value pairs into the interface two-dimensional table.
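A Java sketch of the keyword-value pairing of steps S601-S606 is given below; the PointKeyWord and ValueString bookkeeping is folded into a single HashMap, the example field names are invented for illustration, and the actual table rendering (Android view code) is omitted.

import java.util.HashMap;
import java.util.Map;

public class EchoToTable {
    /** Pairs each table-header keyword in the segmented string with the field
     *  that follows it (empty if the next field is itself a keyword). */
    public static Map<String, String> match(String segmented, String[] keyWordString) {
        String[] inputString = segmented.split(" "); // blank character as delimiter
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < inputString.length; i++) {
            for (String key : keyWordString) {
                if (key.equals(inputString[i])) {
                    String value = "";
                    if (i + 1 < inputString.length && !isKeyword(inputString[i + 1], keyWordString)) {
                        value = inputString[i + 1]; // non-keyword field becomes the value
                    }
                    result.put(key, value); // key-value pair compared against table headers
                }
            }
        }
        return result;
    }

    private static boolean isKeyword(String s, String[] keys) {
        for (String k : keys) if (k.equals(s)) return true;
        return false;
    }

    public static void main(String[] args) {
        String[] keys = {"地块名称", "面积"};               // hypothetical table headers
        System.out.println(match("地块名称 北坡 面积 五亩", keys));
    }
}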
Embodiment 3
Referring to Figs. 7-13, the realization system for offline speech recognition on a mobile terminal device of this embodiment is based on the Android platform, and the database used is SQLite, a free small mobile client database currently in wide use. The system is divided into four modules: an offline speech recognition module, a word segmentation module, an echo module that displays the segmentation result in the page two-dimensional table, and a data processing module. The offline speech recognition module completes the training of the speech model data (comprising the acoustic model data and language model data) and the creation of the language model, acoustic model, and dictionary; the basic principle of offline speech recognition is to convert the speech signal into a text signal. The word segmentation module converts the character string recognized by the offline speech recognition module into individual phrases by string matching and joins them into one character string separated by uniform delimiters such as commas or spaces. The echo module displays the segmented character string accurately in the two-dimensional table by matching against the table headers. The data processing module operates on the data in the two-dimensional table and synchronizes the database data on the mobile terminal device with the data on the server. The specific implementation of each module is introduced below.
1. Offline speech recognition module
In the speech recognition module of this system, only the first of the three optimization algorithms described in Embodiment 1, the HMM parameter optimization algorithm based on the segmental K-means algorithm, is used to train the acoustic model data. This algorithm mainly accelerates model training and clusters similar syllables, shortening the decoding search space during recognition. The latter two improved algorithms mainly improve precision; since our offline speech recognition trains mainly on the vocabulary of a specific industry application, the vocabulary in the model is small and the likelihood of confusion between different words is very low, so these two algorithms have little effect. The first optimization algorithm is therefore chosen.
Data preparation, data training, speech recognition, and result analysis for offline speech recognition are all completed with the HTK toolkit. The software architecture of HTK is shown in Fig. 12 and the HTK speech processing flow in Fig. 13:
1.1. Set up the folders for storing the material required for speech recognition.
Create a data folder for storing the training and test data. Under data, create two subdirectories, data/train and data/test; under train, create two further subdirectories, data/train/sig (for storing the recorded training speech data) and data/train/mfcc (for storing the MFCC parameters transformed from the training data); data/test stores the test data. Create a model folder for storing the model files of the recognition system. Create a def folder for storing the language model and dictionary.
1.2. Create the training set.
In this stage, the HTK tool HSLab is used to record the speech signals of the project vocabulary and then to write a label for each speech signal, an associated text describing the content of the speech. The DOS command that completes this work with this tool is: HSLab name.sig, where name is the pinyin of the specific vocabulary item.
1.2.1. Recording
Press the Rec button to start recording the speech signal and press Stop to stop; the default sampling frequency is 16 kHz. Fig. 7 shows a recorded waveform of the digit 5.
1.2.2. Marking the signal
First press the Mark button, then select the region to be labelled. After marking the region, press Labelas, input the label name, and press the Enter key. For each signal, three consecutive regions need to be marked: the starting pause, labelled sil; the recorded word, labelled with the project vocabulary name; and the ending pause, labelled sil. These three regions may not overlap, even if the gap between them is very small. After the three marks are completed, press the Save button and the label file name.lab is created. The label file of the digit 5 reads:
4171250 9229375 sil
9229375 15043750 name
15043750 20430625 sil
where the numbers represent the start and end sampling points of each label. Such a file can be modified manually, for example to adjust the start or end point of a label.
1.3. Acoustic analysis
Speech recognition tools cannot process waveform speech directly; the waveform must be represented in a more compact and effective form. This requires acoustic analysis with the HCopy tool, as shown in Fig. 8. The DOS command is HCopy -A -D -C analysis.conf -S targetlist.txt, where -A displays the command line, -D displays the configuration settings, -C specifies the configuration file, and -S specifies the script file of source and target paths. analysis.conf is the parameter configuration file for feature extraction (comments follow #) and sets the acoustic coefficient extraction parameters. We use MFCCs as the feature extraction parameters; the full parameter set comprises: 12 MFCC coefficients [c1, ..., c12] (because NUMCEPS = 12); 1 MFCC coefficient c0, proportional to the total energy of the frame (suffix _0 in TARGETKIND); 13 Delta coefficients derived from [c0, c1, ..., c12] (suffix _D in TARGETKIND); and 13 Acceleration coefficients (suffix _A in TARGETKIND). targetlist.txt specifies the name and location of each wave file to be processed and of the corresponding target coefficient file. The analysis.conf configuration is shown in Fig. 9.
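Since the file itself appears only as Fig. 9, the following is a plausible reconstruction of analysis.conf from the parameters described above; NUMCEPS and the TARGETKIND suffixes come from the text, while the remaining values are typical HTK settings assumed for illustration:

# analysis.conf - acoustic coefficient extraction parameters (reconstruction)
TARGETKIND = MFCC_0_D_A   # 12 MFCCs plus c0, with Delta and Acceleration coefficients
NUMCEPS = 12              # cepstral coefficients c1..c12
WINDOWSIZE = 250000.0     # 25 ms analysis window (HTK time units of 100 ns)
TARGETRATE = 100000.0     # 10 ms frame shift
USEHAMMING = T            # apply a Hamming window
PREEMCOEF = 0.97          # pre-emphasis coefficient
NUMCHANS = 26             # number of filterbank channels
CEPLIFTER = 22            # cepstral liftering parameter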
1.4. HMM prototype definition
Create the file hmm_yi.hmm and save it under htk/model/proto. The content of hmm_yi.hmm is shown in Fig. 10; the HMM models of the other vocabulary items have identical content, except that yi in ~h "yi" is changed to the corresponding vocabulary pinyin. The section ~h "yi" <BeginHMM> ... <EndHMM> encapsulates the description of the HMM model.
1.5. HMM training
The complete HMM training, shown in Fig. 11, comprises two parts: initialization and training.
1.5.1. Initialization
Use the command line below to initialize the HMM model with the Viterbi algorithm: HInit -A -D -S trainlist.txt -M model/hmm1 -H model/proto/hmmfile -l label nameofhmm, where nameofhmm is the name of the HMM model to be initialized; trainlist.txt gives the list of .mfcc files; -l label indicates which labelled segment of the training set to use; and model/hmm1 is the output directory (which must be created in advance) for the initialized HMM model description. This process is repeated for each model.
1.5.2. Training
The HTK tool HRest performs one re-estimation iteration at a time to estimate the optimum values of the HMM model parameters. The command is: HRest -A -D -S trainlist.txt -M model/hmmi -H model/hmmi-1/hmmfile -l label nameofhmm, where nameofhmm is the name of the HMM model to be trained; hmmfile is the description file of the HMM model named nameofhmm; trainlist.txt gives the complete list of the .mfcc files of the training set (stored in data/train/mfcc/); -l label indicates the label used in the training data; and model/hmmi is the output directory, with i the current iteration number. For each HMM model to be trained, this process is repeated many times. Stopping condition: after each HRest iteration, a change measure indicating convergence is displayed on screen; once this measure no longer decreases (in absolute value), the process should stop, and the corresponding training results are then gathered under the hmm_result folder. The other vocabulary items are trained similarly.
1.6. Task definition
Each file relevant to the task should be stored in the dedicated def/ directory.
1.6.1. Set up the grammar rules
Create the grammar rule file gram.txt (under the def folder) with the content:
/* Task grammar */
$WORD=YI|ER|...|SIL;
({SIL}[$WORD]{SIL})
Enclosing SIL in braces {} means it can occur zero or more times before or after the word (allowing a long pause, or no pause at all). Enclosing $WORD in brackets [] means zero or one occurrence (if no word is spoken, only a pause may be recognized).
1.6.2. Set up the dictionary file
Create dict.txt (under the def folder) with the content:
YI [1] yi
ER [2] er
SIL [sil] sil
1.7. Build the task network
The task grammar (described in gram.txt) is compiled with the HParse tool to generate the task network. The DOS command is:
HParse def/gram.txt def/net.slf
To make sure the grammar contains no mistakes, it can be tested with the HSGen tool. The DOS command is:
HSGen -s def/net.slf def/dict.txt
1.8. Speech recognition
Speech recognition is carried out with the HVite tool in HTK. Using the previously prepared dictionary, the grammar structure file, and the trained acoustic models, HVite outputs the corresponding statement for the speech data according to the recognition probability, and the output results are analyzed with the HResults tool of HTK.
1.9. Result analysis
Result analysis mainly assesses the accuracy and speed of the speech recognition and is an important part of speech recognition. The result analysis tool in HTK is HResults; it performs a pairwise comparison between the recognition test results and the completed HMM label files and outputs the corresponding correctness and accuracy (accuracy additionally takes insertion errors into account).
2. Chinese word segmentation
The Chinese word segmentation of the system adopts the maximum matching approach of string matching to segment the character string produced by speech recognition. According to the vocabulary in the data dictionary, the recognized character string is matched against the terms in the data dictionary; after successful matching, the recognition result becomes a character string composed of a group of words separated by spaces. This is set forth in detail in Embodiment 2 and is not repeated here.
3. Echoing the segmentation result to the interface
In the character string echo process, the system first divides the segmented character string with the split function, using the space as delimiter, and stores the parts in a character string array. These are then compared with the header fields of the two-dimensional table: the header data that matches successfully is stored as the key of a key-value pair and the corresponding value data as its value. Finally, the values of the key-value pairs are inserted into the two-dimensional table one by one by key lookup. This is set forth in detail in Embodiment 2 and is not repeated here.
4. Data processing
The data processing of this embodiment first stores the data collected in the system interface two-dimensional table into the local SQLite database of the mobile terminal; when the mobile terminal is networked, all data not yet synchronized from the local database can be synchronized to the server database. The system can either synchronize each record immediately as it is collected or synchronize all collected data together. At synchronization time, the system packages the data to be synchronized, converts it into an XML file, and transfers the XML file to the Web server through an HTTP connection over the TCP/IP protocol; after the server parses the XML file, the data is updated in the server database.
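A Java sketch of this synchronization step follows; the XML schema and server URL are illustrative assumptions, not values from the patent.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class DataSync {
    /** Wraps locally stored records in a simple XML document and posts it
     *  to the server over HTTP; returns the HTTP response code. */
    public static int upload(Iterable<Map<String, String>> records, String serverUrl)
            throws Exception {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?><records>");
        for (Map<String, String> row : records) {
            xml.append("<record>");
            for (Map.Entry<String, String> e : row.entrySet()) {
                xml.append('<').append(e.getKey()).append('>')
                   .append(e.getValue())
                   .append("</").append(e.getKey()).append('>');
            }
            xml.append("</record>");
        }
        xml.append("</records>");

        HttpURLConnection conn = (HttpURLConnection) new URL(serverUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(xml.toString().getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode(); // server parses the XML and updates its database
    }
}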
The foregoing is only embodiments of the invention and does not thereby limit the scope of its claims; every equivalent structure or equivalent process transformation made using the contents of the description and drawings of the invention, or any direct or indirect use in other related technical fields, is likewise included within the patent protection scope of the invention.

Claims (10)

1. An offline speech recognition method based on a mobile terminal device, characterized by comprising:
obtaining a speech signal and extracting the speech feature vectors corresponding to said speech signal;
matching said speech feature vectors based on an acoustic model preset in said mobile terminal device to obtain the language character strings corresponding to said speech feature vectors; and matching said language character strings based on a language model and a dictionary preset in said mobile terminal device to obtain the matched text data corresponding to said speech feature vectors;
calculating the output probabilities of said speech feature vectors in said acoustic model and, based on the maximum of said output probabilities, obtaining the matched text data of the corresponding speech feature vector as the final recognition result of said speech signal.
2. The offline speech recognition method according to claim 1, characterized by further comprising: performing Chinese word segmentation on said final recognition result.
3. The offline speech recognition method according to claim 2, characterized in that performing word segmentation on said final recognition result comprises:
S501, setting n to the number of Chinese characters in the longest entry of a segmentation dictionary; wherein the matched text data corresponding to said final recognition result is a Chinese character string;
S502, taking the first n characters of said Chinese character string as the matching field and searching said segmentation dictionary;
if said segmentation dictionary contains a word corresponding to said matching field, the match succeeds; said matching field is split out as a word, stored into another character string newString, and separated from other words by a blank character;
if no word corresponding to said matching field can be found in said segmentation dictionary, the match fails and step S503 is entered;
S503, decrementing n to n-1, removing the last Chinese character from the matching field used in step S502 to form a new matching field, and searching said segmentation dictionary; if said segmentation dictionary contains a word corresponding to the new matching field, the match succeeds and said new matching field is split out as a word and stored in character string newString;
if the match fails, repeating step S503 until said new matching field is matched successfully;
S504, repeating steps S502-S503 until all characters of said Chinese character string have been matched, completing the segmentation of said Chinese character string.
4. The offline speech recognition method according to claim 2, characterized by further comprising: displaying said final recognition result in an interface two-dimensional table.
5. The offline speech recognition method according to claim 4, characterized in that displaying said final recognition result in the interface two-dimensional table comprises:
S601, determining the fields that said interface two-dimensional table needs to collect, and storing these collection fields in the character string array KeyWordString;
S602, dividing the segmented character string into multiple fields with the split function, using the blank character as delimiter, and storing them in the character string array InputString;
S603, taking a field out of the character string array InputString and comparing it item by item with the fields in KeyWordString; if there is a match, storing the subscript i of this field in array InputString into array PointKeyWord; if there is no match, performing no operation; wherein 0 ≤ i ≤ n-1, n is the number of fields in character string array InputString, and i is an integer;
S604, taking the next field out of InputString and comparing it item by item with the fields in KeyWordString; if the match succeeds, storing the subscript i+1 of this field in InputString into PointKeyWord and setting array element ValueString[i] to empty; if there is no match, setting the value of ValueString[i] to this field;
S605, repeating steps S603 and S604 until all fields in InputString have been matched;
S606, storing the matching results in a HashMap as key-value pairs, comparing the keys of the key-value pairs with the headers of the two-dimensional table, and storing the values of the key-value pairs into the interface two-dimensional table.
6. The offline speech recognition method according to claim 1, characterized in that said speech feature vectors are matched with the Viterbi algorithm.
7. The offline speech recognition method according to claim 1, characterized in that said language character strings are matched with the N-gram algorithm.
8. An implementation method for offline speech recognition based on a mobile terminal device, characterized by comprising:
collecting a project vocabulary;
training acoustic model data and language model data with an HMM model based on said project vocabulary;
establishing an acoustic model based on the trained acoustic model data and a language model based on the trained language model data, and creating a dictionary with a text editor;
storing said acoustic model, language model, and dictionary in said mobile terminal device.
9. The implementation method according to claim 8, characterized in that said acoustic model data is trained with an HMM parameter optimization algorithm based on the segmental K-means algorithm.
10. The implementation method according to claim 8, characterized in that said language model data is trained with the N-gram algorithm.
CN201310652535.2A 2013-12-05 2013-12-05 Offline speech recognition method based on mobile terminal device and implementation method Expired - Fee Related CN103810998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310652535.2A CN103810998B (en) 2013-12-05 2013-12-05 Offline speech recognition method based on mobile terminal device and implementation method


Publications (2)

Publication Number Publication Date
CN103810998A true CN103810998A (en) 2014-05-21
CN103810998B CN103810998B (en) 2016-07-06

Family

ID=50707677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310652535.2A Expired - Fee Related CN103810998B (en) 2013-12-05 2013-12-05 Offline speech recognition method based on mobile terminal device and implementation method

Country Status (1)

Country Link
CN (1) CN103810998B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1293428A (en) * 2000-11-10 2001-05-02 清华大学 Information check method based on speed recognition
US20070033044A1 (en) * 2005-08-03 2007-02-08 Texas Instruments, Incorporated System and method for creating generalized tied-mixture hidden Markov models for automatic speech recognition
CN102298927A (en) * 2010-06-25 2011-12-28 财团法人工业技术研究院 voice identifying system and method capable of adjusting use space of internal memory
CN102446428A (en) * 2010-09-27 2012-05-09 北京紫光优蓝机器人技术有限公司 Robot-based interactive learning system and interaction method thereof
CN102063900A (en) * 2010-11-26 2011-05-18 北京交通大学 Speech recognition method and system for overcoming confusing pronunciation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
He Guobin et al., "Research on a probabilistic Chinese word segmentation algorithm based on maximum matching", Computer Engineering (《计算机工程》), vol. 36, no. 5, 31 March 2010 (2010-03-31), pages 173-175 *
Ni Chongjia et al., "Research progress of large-vocabulary continuous speech recognition systems for Chinese", Journal of Chinese Information Processing (《中文信息学报》), vol. 23, no. 1, 31 January 2009 (2009-01-31) *
Liu Mingkuan et al., "Syllable confusion dictionaries and their application in Chinese accent adaptation", Acta Acustica (《声学学报》), vol. 27, no. 1, 31 January 2002 (2002-01-31), pages 53-58 *
Xu Likui et al., "Comparison of improved HMM training algorithms for speech recognition", Computer CD Software and Applications (《计算机光盘软件与应用》), no. 23, 31 December 2012 (2012-12-31), pages 30-32 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104754364A (en) * 2015-03-30 2015-07-01 合一信息技术(北京)有限公司 Video advertisement voice interaction system and method
US10170122B2 (en) 2016-02-22 2019-01-01 Asustek Computer Inc. Speech recognition method, electronic device and speech recognition system
CN106057196B (en) * 2016-07-08 2019-06-11 成都之达科技有限公司 Vehicle voice data parses recognition methods
CN106057196A (en) * 2016-07-08 2016-10-26 成都之达科技有限公司 Vehicular voice data analysis identification method
CN106356054A (en) * 2016-11-23 2017-01-25 广西大学 Method and system for collecting information of agricultural products based on voice recognition
CN107145509A (en) * 2017-03-28 2017-09-08 深圳市元征科技股份有限公司 A kind of information search method and its equipment
CN107145509B (en) * 2017-03-28 2020-11-13 深圳市元征科技股份有限公司 Information searching method and equipment thereof
WO2019079962A1 (en) * 2017-10-24 2019-05-02 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for speech recognition with decoupling awakening phrase
CN110809796A (en) * 2017-10-24 2020-02-18 北京嘀嘀无限科技发展有限公司 Speech recognition system and method with decoupled wake phrases
US10789946B2 (en) 2017-10-24 2020-09-29 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for speech recognition with decoupling awakening phrase
CN109726554A (en) * 2017-10-30 2019-05-07 武汉安天信息技术有限责任公司 A kind of detection method of rogue program, device and related application
CN109726554B (en) * 2017-10-30 2021-05-18 武汉安天信息技术有限责任公司 Malicious program detection method and device
CN111369966A (en) * 2018-12-06 2020-07-03 阿里巴巴集团控股有限公司 Method and device for personalized speech synthesis
CN109817226A (en) * 2019-03-29 2019-05-28 四川虹美智能科技有限公司 A kind of offline audio recognition method and device
CN110111774A (en) * 2019-05-13 2019-08-09 广西电网有限责任公司南宁供电局 Robot voice recognition methods and device
CN110930985A (en) * 2019-12-05 2020-03-27 携程计算机技术(上海)有限公司 Telephone speech recognition model, method, system, device and medium
CN110930985B (en) * 2019-12-05 2024-02-06 携程计算机技术(上海)有限公司 Telephone voice recognition model, method, system, equipment and medium
CN111785275A (en) * 2020-06-30 2020-10-16 北京捷通华声科技股份有限公司 Voice recognition method and device
CN112581954A (en) * 2020-12-01 2021-03-30 杭州九阳小家电有限公司 High-matching voice interaction method and intelligent equipment
CN112581954B (en) * 2020-12-01 2023-08-04 杭州九阳小家电有限公司 High-matching voice interaction method and intelligent device
WO2022134025A1 (en) * 2020-12-25 2022-06-30 京东方科技集团股份有限公司 Offline speech recognition method and apparatus, electronic device and readable storage medium
CN113539268A (en) * 2021-01-29 2021-10-22 南京迪港科技有限责任公司 End-to-end voice-to-text rare word optimization method

Also Published As

Publication number Publication date
CN103810998B (en) 2016-07-06

Similar Documents

Publication Publication Date Title
CN103810998B (en) Offline speech recognition method based on mobile terminal device and implementation method
CN110717031B (en) Intelligent conference summary generation method and system
CN110364171B (en) Voice recognition method, voice recognition system and storage medium
CN107818164A (en) A kind of intelligent answer method and its system
CN109063159B (en) Entity relation extraction method based on neural network
CN110717018A (en) Industrial equipment fault maintenance question-answering system based on knowledge graph
CN109460459B (en) Log learning-based dialogue system automatic optimization method
CN114116994A (en) Welcome robot dialogue method
CN103677729A (en) Voice input method and system
CN110019741B (en) Question-answering system answer matching method, device, equipment and readable storage medium
CN109920415A (en) Man-machine interrogation method, apparatus, equipment and storage medium based on speech recognition
CN111292751B (en) Semantic analysis method and device, voice interaction method and device, and electronic equipment
CN109377981B (en) Phoneme alignment method and device
CN103309926A (en) Chinese and English-named entity identification method and system based on conditional random field (CRF)
CN102176310A (en) Speech recognition system with huge vocabulary
CN109949799B (en) Semantic parsing method and system
CN110910283A (en) Method, device, equipment and storage medium for generating legal document
CN112925945A (en) Conference summary generation method, device, equipment and storage medium
CN102236639A (en) System and method for updating language model
CN111695358B (en) Method and device for generating word vector, computer storage medium and electronic equipment
CN110119510A (en) A kind of Relation extraction method and device based on transmitting dependence and structural auxiliary word
CN102999533A (en) Textspeak identification method and system
CN110196963A (en) Model generation, the method for semantics recognition, system, equipment and storage medium
CN111192572A (en) Semantic recognition method, device and system
CN113312922A (en) Improved chapter-level triple information extraction method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160706

Termination date: 20161205