CN1282069A - On-palm computer speech identification core software package - Google Patents
- Publication number: CN1282069A
- Application number: CN99111131A
- Authority: CN (China)
- Prior art keywords: software package, model, recognition, palm computer, identification
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The palmtop computer speech recognition kernel software package is a speech recognition application programming interface (API) that runs in a palmtop computer environment and performs speaker-dependent, limited-vocabulary, isolated-word recognition. Speech recognition belongs to the field of pattern recognition. For each isolated word, the invention builds a speaker-dependent continuous-density hidden Markov model. The package interface covers starting and ending recognition, training/recognition, model management, menu management, and recognition parameter configuration. The package uses an endpoint detection algorithm based on time-domain energy, extracts LPC cepstrum feature parameters, recognizes with the Viterbi search algorithm, and uses a neural network to make the recognition/rejection decision.
Description
The present invention is a complete solution for a speaker-dependent, isolated-word speech recognition kernel software package in a palmtop computer environment. It covers the design of the overall framework, the classification of the interfaces, and the implementation algorithms. Speech recognition is a typical pattern recognition problem.
Speech recognition has developed rapidly in recent years. Notable progress has been made on modeling, training, search, robustness, and adaptation, and a great deal of practical experience and theory has accumulated. The modeling method used in mainstream algorithms is the continuous-density hidden Markov model, which is based on statistical regularities and processes large amounts of data; both training and recognition involve extensive, complex computation and consume a great deal of storage and computing resources. Compared with a personal computer, a palmtop computer is slow (roughly equivalent to a 386-class machine), has little memory (typically only 8 MB), and has a built-in recording device with a very low signal-to-noise ratio. Practical use also requires that the number of training samples be small. Developing practical speech recognition software in such an environment is therefore difficult.
The object of the invention is to provide, under limited resources and taking into account that a palmtop computer is normally used by a single owner, a complete solution for a speaker-dependent, isolated-word speech recognition kernel software package for palmtop computers, so that speech recognition can be added to palmtop applications very easily.
The technical essentials of the invention are shown in Fig. 1: the package consists of a bottom layer, a transition layer, and an interface layer. The bottom layer preprocesses the speech signal produced by the recording device, detects endpoints, extracts features, trains models, and makes the recognition/rejection decision. The interface layer provides the interface functions that applications need. The transition layer connects the core implementation in the bottom layer to the interface layer.
The invention adopts an object-oriented approach. All APIs are encapsulated in one interface class; an application only needs to create an instance of this interface and can then call any interface function through the interface pointer. To make applications more flexible, the invention predefines a notification class, IVCmdNotifySink, all of whose functions are pure virtual. The application must implement a class derived from the notification class, which tells the recognition core what to do when it detects a speech event (such as speech beginning, speech ending, or a command being recognized). It is defined as follows:

    class IVCmdNotifySink {
    public:
        virtual void UtteranceBegin(void) = 0;              /* Sound onset detected */
        virtual void UtteranceEnd(void) = 0;                /* Sound ended */
        virtual void VUMeter(int Volume) = 0;               /* Volume (from 0 to 100) */
        virtual void CmdStart(void) = 0;                    /* Speech recognition started */
        virtual void CmdNotUnderstand(void) = 0;            /* Not understood; rejected */
        virtual void CmdRecognize(SRecogResult result) = 0; /* Recognition result */
    };
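A minimal sketch of an application-side class derived from the notification class may make the callback contract more concrete. The source names `SRecogResult` but does not define it, so the struct below is a hypothetical stand-in, and `LoggingSink` and its members are illustrative names only. The virtual destructor is an addition for safe deletion through the base class and is not part of the original declaration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in: the source names SRecogResult but does not define it.
struct SRecogResult {
    std::vector<std::string> candidates;  // assumed layout: best candidate first
};

// Notification interface with the six callbacks listed in the text.
class IVCmdNotifySink {
public:
    virtual ~IVCmdNotifySink() {}  // added here; not in the original declaration
    virtual void UtteranceBegin(void) = 0;
    virtual void UtteranceEnd(void) = 0;
    virtual void VUMeter(int Volume) = 0;
    virtual void CmdStart(void) = 0;
    virtual void CmdNotUnderstand(void) = 0;
    virtual void CmdRecognize(SRecogResult result) = 0;
};

// Illustrative sink that simply records what the recognition core reported.
class LoggingSink : public IVCmdNotifySink {
public:
    std::vector<std::string> log;
    int lastVolume = 0;

    void UtteranceBegin(void) override { log.push_back("begin"); }
    void UtteranceEnd(void) override { log.push_back("end"); }
    void VUMeter(int Volume) override {
        assert(Volume >= 0 && Volume <= 100);  // range stated in the text
        lastVolume = Volume;
    }
    void CmdStart(void) override { log.push_back("start"); }
    void CmdNotUnderstand(void) override { log.push_back("rejected"); }
    void CmdRecognize(SRecogResult result) override {
        std::string top = result.candidates.empty() ? std::string()
                                                    : result.candidates[0];
        log.push_back("recognized:" + top);
    }
};
```

A real application would replace the logging with its own behavior, for example updating a volume indicator in `VUMeter` or executing the recognized command in `CmdRecognize`.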
The relationship between the application and the recognition core of the software package can be seen in Fig. 1.
The invention uses the concept of a "menu", which is simply a set of commands. Commands can be added and trained at any time. Fig. 2 shows the command training interface. Each command is trained with three samples, and to help ensure correct recognition results the invention provides playback of the recorded training samples. By function, the APIs provided by the invention fall into the following five classes:
1. Start/end recognition: these interfaces register the notification class and shut down the whole recognition core. There are two functions. Register is the registration function that should be called when the software package starts; it performs some initialization. End is called before the application exits; it frees memory and saves the relevant models, parameters, and other settings;
2. Training/recognition: controls the start and pause of recognition, and the training process;
3. Model management: includes adding and deleting a model, querying whether a given model exists, retrieving a model by index, and so on;
4. Menu management: includes creating a voice menu, selecting which menu is currently in use, adding command items to or deleting them from the current menu, and querying and setting the current state of a command item;
5. Recognition parameter configuration: higher accuracy can be reached if the recognizer is adjusted to the application environment, for example to background noise or speaking rate. The recognition core provides this flexibility to accommodate different application needs, including whether rejection is enabled and how the endpoint detection algorithm is configured. The parameter settings are intuitive, and the application developer does not need much speech recognition expertise.
The bottom-layer core processing of the invention, shown in Fig. 1, comprises recording, preprocessing, endpoint detection, feature extraction, the training algorithm, and the recognition/rejection decision. Each is described below:
1. Recording: the palmtop's built-in recording device is used; the speech signal is sampled at 8 kHz with 16 bits per sample.
2. Preprocessing: each frame has 392 samples, and adjacent frames overlap by half a frame. The pre-emphasis formula is: $x_1[n] = x[n] - 0.97\,x[n-1]$.
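The preprocessing step can be sketched as follows. This is only an illustration under stated assumptions: the frame overlap is taken to be half a frame (a hop of 196 samples), the first sample is passed through unchanged because the text does not say how n = 0 is handled, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pre-emphasis x1[n] = x[n] - 0.97 * x[n-1].
// Assumption: the first sample is copied as-is (the text does not cover n = 0).
std::vector<double> preEmphasize(const std::vector<short>& x, double k = 0.97) {
    std::vector<double> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) {
        y[n] = (n == 0) ? x[0] : x[n] - k * x[n - 1];
    }
    return y;
}

// Split a signal into frames of frameLen samples, advancing by hop samples.
// frameLen = 392 with hop = 196 gives the assumed half-frame overlap.
// Trailing samples that do not fill a whole frame are dropped.
std::vector<std::vector<double> > splitFrames(const std::vector<double>& s,
                                               std::size_t frameLen = 392,
                                               std::size_t hop = 196) {
    assert(frameLen > 0 && hop > 0);
    std::vector<std::vector<double> > frames;
    for (std::size_t start = 0; start + frameLen <= s.size(); start += hop) {
        frames.push_back(std::vector<double>(s.begin() + start,
                                             s.begin() + start + frameLen));
    }
    return frames;
}
```

At the 8 kHz sampling rate given for recording, a 392-sample frame spans 49 ms.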
3. Endpoint detection: based on time-domain energy, where the energy of each frame is the sum of squares of its samples after preprocessing. The procedure, shown in Fig. 3, has five steps: estimate the background noise and compute the upper and lower thresholds, mark the start point, confirm the start, mark the end point, and confirm the end.
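One way to picture the five-step endpoint detector is as a small state machine over frame energies. The sketch below is not the patented algorithm: the thresholds are assumed to be fixed multiples of a noise estimate taken from the first few frames, the confirmation rule is assumed to be a run of consecutive frames, and all numeric defaults and names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Frame energy: sum of squared samples after preprocessing.
double frameEnergy(const std::vector<double>& frame) {
    double e = 0.0;
    for (std::size_t i = 0; i < frame.size(); ++i) e += frame[i] * frame[i];
    return e;
}

struct Endpoints {
    bool found;         // true once both start and end were confirmed
    std::size_t begin;  // index of the first speech frame
    std::size_t end;    // index one past the last speech frame
};

// All numeric parameters are illustrative assumptions, not values from the text.
Endpoints detectEndpoints(const std::vector<double>& energy,
                          std::size_t noiseFrames = 5,
                          double upperFactor = 4.0,
                          double lowerFactor = 2.0,
                          std::size_t confirmFrames = 3) {
    assert(noiseFrames > 0 && confirmFrames >= 2);
    Endpoints r = {false, 0, 0};
    if (energy.size() <= noiseFrames) return r;

    // Step 1: estimate background noise and derive upper/lower thresholds.
    double noise = 0.0;
    for (std::size_t i = 0; i < noiseFrames; ++i) noise += energy[i];
    noise /= noiseFrames;
    const double upper = noise * upperFactor;
    const double lower = noise * lowerFactor;

    enum State { Silence, MaybeSpeech, Speech, MaybeSilence } state = Silence;
    std::size_t mark = 0, run = 0;
    for (std::size_t t = noiseFrames; t < energy.size(); ++t) {
        switch (state) {
        case Silence:       // Step 2: mark a tentative start point
            if (energy[t] > upper) { state = MaybeSpeech; mark = t; run = 1; }
            break;
        case MaybeSpeech:   // Step 3: confirm the start, or discard it
            if (energy[t] > upper) {
                if (++run >= confirmFrames) { state = Speech; r.begin = mark; }
            } else {
                state = Silence;
            }
            break;
        case Speech:        // Step 4: mark a tentative end point
            if (energy[t] < lower) { state = MaybeSilence; mark = t; run = 1; }
            break;
        case MaybeSilence:  // Step 5: confirm the end, or resume speech
            if (energy[t] < lower) {
                if (++run >= confirmFrames) { r.found = true; r.end = mark; return r; }
            } else {
                state = Speech;
            }
            break;
        }
    }
    return r;
}
```

The noise estimate here is computed once for simplicity; a detector that re-estimates the noise floor while running would need to update `upper` and `lower` inside the loop.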
4. Feature extraction: the feature parameters are LPC cepstra. The pre-emphasized samples are first passed through a Hamming window, and the autocorrelation coefficients are then computed. The standard Levinson-Durbin recursion yields the 12th-order LPC parameters, from which the 12th-order LPC cepstrum parameters are derived recursively; finally, first- and second-order differences are taken. The feature vector consists of the normalized time-domain energy with its first- and second-order differences, plus the 12th-order LPC cepstrum parameters with their first- and second-order differences, for 39 dimensions in total. The feature parameters are scaled up by a fixed factor and stored as 16-bit integers, so that the subsequent training and recognition can be implemented in fixed point. The LPC cepstrum is computed as follows:
1) Hamming window function:
2) Autocorrelation coefficients: let the speech signal after preprocessing and windowing be $s_1[n]$; the autocorrelation coefficients are then:
The normalized autocorrelation coefficients are:
where $l = 0, 1, 2, \ldots, P$.
3) Levinson-Durbin recursion for the 12th-order LPC parameters:
where $P = 12$ is the prediction order, and the final $a_j^{(P)}$, $1 \le j \le P$, are the LPC prediction coefficients.
4) Cepstrum parameters:
where $h_i$ are the cepstrum coefficients and $a_n$ are the LPC coefficients.
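Because the formulas themselves did not survive in this text, the sketch below uses the common textbook forms of each step rather than the patent's exact expressions: a Hamming window with coefficients 0.54 and 0.46, autocorrelation normalized by its zero-lag value, the Levinson-Durbin recursion for a predictor of the form s[n] ≈ Σ a[k]·s[n-k], and the usual LPC-to-cepstrum recursion. The sign convention, the floating-point arithmetic (the text describes a fixed-point implementation), and all function names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Apply a Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)) in place.
void applyHamming(std::vector<double>& frame) {
    const std::size_t N = frame.size();
    assert(N > 1);
    const double twoPi = 6.283185307179586;
    for (std::size_t n = 0; n < N; ++n)
        frame[n] *= 0.54 - 0.46 * std::cos(twoPi * n / (N - 1));
}

// Normalized autocorrelation r[l] = R[l] / R[0] for l = 0..P.
std::vector<double> normalizedAutocorr(const std::vector<double>& s, std::size_t P) {
    assert(s.size() > P);
    std::vector<double> r(P + 1, 0.0);
    for (std::size_t l = 0; l <= P; ++l)
        for (std::size_t n = 0; n + l < s.size(); ++n)
            r[l] += s[n] * s[n + l];
    const double r0 = r[0];
    if (r0 > 0.0)
        for (std::size_t l = 0; l <= P; ++l) r[l] /= r0;
    return r;
}

// Levinson-Durbin recursion. Returns a[1..P] (a[0] is unused) for the
// predictor s[n] ~ sum_k a[k] * s[n-k].
std::vector<double> levinsonDurbin(const std::vector<double>& r, std::size_t P) {
    assert(r.size() > P);
    std::vector<double> a(P + 1, 0.0), prev(P + 1, 0.0);
    double err = r[0];  // prediction error energy
    for (std::size_t i = 1; i <= P && err > 0.0; ++i) {
        double acc = r[i];
        for (std::size_t j = 1; j < i; ++j) acc -= prev[j] * r[i - j];
        const double k = acc / err;  // reflection coefficient of order i
        a[i] = k;
        for (std::size_t j = 1; j < i; ++j) a[j] = prev[j] - k * prev[i - j];
        err *= (1.0 - k * k);
        prev = a;
    }
    return a;
}

// LPC-to-cepstrum recursion: c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k].
std::vector<double> lpcToCepstrum(const std::vector<double>& a, std::size_t P) {
    assert(a.size() > P);
    std::vector<double> c(P + 1, 0.0);
    for (std::size_t n = 1; n <= P; ++n) {
        c[n] = a[n];
        for (std::size_t k = 1; k < n; ++k)
            c[n] += (static_cast<double>(k) / n) * c[k] * a[n - k];
    }
    return c;
}
```

For one frame, the calls would run in the order `applyHamming`, `normalizedAutocorr(frame, 12)`, `levinsonDurbin(r, 12)`, `lpcToCepstrum(a, 12)`; the difference features and the scaling to 16-bit integers are not shown.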
5. Training: three training samples are recorded for each command, and an 8-state, left-to-right, continuous-density hidden Markov model is built as shown in Fig. 4. The model is initialized from a uniform segmentation; the training samples are then segmented with the Viterbi algorithm using the current model parameters, and the model parameters are re-estimated. This is iterated several times, for example 5 times, to obtain the final model parameters.
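The uniform-segmentation initialization can be illustrated with a short sketch. It covers only the first step of training: each frame of a training sample is assigned to one of the 8 states in equal-length, left-to-right runs, and a per-state mean is accumulated. The Viterbi re-segmentation and full re-estimation are not shown, and the function names and the choice of a plain mean as the initial statistic are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assign each of numFrames frames to one of numStates states in equal-length,
// left-to-right runs. Returns the state index of every frame.
std::vector<std::size_t> uniformSegmentation(std::size_t numFrames,
                                             std::size_t numStates = 8) {
    assert(numStates > 0 && numFrames >= numStates);
    std::vector<std::size_t> state(numFrames);
    for (std::size_t t = 0; t < numFrames; ++t)
        state[t] = t * numStates / numFrames;
    return state;
}

// Per-state mean of a feature sequence under a given segmentation. An initial
// Gaussian mean for each state could be set from such a statistic.
std::vector<std::vector<double> > stateMeans(
        const std::vector<std::vector<double> >& feats,
        const std::vector<std::size_t>& state,
        std::size_t numStates = 8) {
    assert(!feats.empty() && feats.size() == state.size());
    const std::size_t dim = feats[0].size();
    std::vector<std::vector<double> > mean(numStates, std::vector<double>(dim, 0.0));
    std::vector<std::size_t> count(numStates, 0);
    for (std::size_t t = 0; t < feats.size(); ++t) {
        assert(state[t] < numStates && feats[t].size() == dim);
        for (std::size_t d = 0; d < dim; ++d) mean[state[t]][d] += feats[t][d];
        ++count[state[t]];
    }
    for (std::size_t s = 0; s < numStates; ++s)
        if (count[s] > 0)
            for (std::size_t d = 0; d < dim; ++d) mean[s][d] /= count[s];
    return mean;
}
```

In the iterative phase described above, the uniform state assignment would be replaced on each pass by the Viterbi alignment against the current model before the statistics are recomputed.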
6. Recognition/rejection decision: the standard Viterbi search algorithm computes the log-likelihood score of the unknown sample against each model; three indicator parameters are then extracted for the best-scoring model and fed into the neural network shown in Fig. 5, which makes the recognition/rejection decision. The detailed procedure is as follows, where N is the total number of models in the application:
(1) For the unknown pattern X, compute the log-likelihood scores of the N models with the standard Viterbi search algorithm, and normalize them by frame length;
(2) Sort the N scores from high to low; without loss of generality, denote them $S_1, \ldots, S_N$;
(3) For the top-ranked model, compute its three indicators $x_1$, $x_2$, $x_3$:
where $\mathrm{Mean}_1$ is the mean of the frame-length-normalized scores of the training samples of model 1, and $\alpha$ is a constant between 0.5 and 0.9. Indicator 3 is the confidence of model 1; the sum in braces is the mean score of those models whose scores are close to that of model 1, with the degree of closeness determined by the constant $\alpha$;
(4) Make the recognition/rejection decision from the three indicators:
$y_0 = x_1 W_{01} + x_2 W_{02} + x_3 W_{03}$
$y_1 = x_1 W_{11} + x_2 W_{12} + x_3 W_{13}$
where the $W$ coefficients are the trained network weights.
If $y_0 > y_1$, the sample is recognized as model 1, and the several top-scoring candidate results are also given. If $y_0 \le y_1$, the sample is rejected.
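The final decision step can be sketched as a two-output linear layer followed by a comparison. The three indicators are taken as given inputs here because their defining formulas are not reproduced in this text. The limit of 5 returned candidates follows the example application described for Fig. 6; the struct, the function names, and the zero-based weight indexing (`W[0][0]` standing for W_01) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Decision {
    bool accepted;                        // true: recognized, false: rejected
    std::vector<std::size_t> candidates;  // model indices, best score first
};

// Rank model indices by frame-normalized score, highest first.
std::vector<std::size_t> rankModels(const std::vector<double>& score) {
    std::vector<std::size_t> order(score.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&score](std::size_t a, std::size_t b) {
                         return score[a] > score[b];
                     });
    return order;
}

// y0 = sum_i x[i]*W[0][i], y1 = sum_i x[i]*W[1][i]; accept if and only if y0 > y1.
// On acceptance, up to maxCandidates top-scoring models are returned.
Decision decide(const std::vector<double>& score,
                const double x[3],
                const double W[2][3],
                std::size_t maxCandidates = 5) {
    assert(!score.empty());
    double y0 = 0.0, y1 = 0.0;
    for (int i = 0; i < 3; ++i) {
        y0 += x[i] * W[0][i];
        y1 += x[i] * W[1][i];
    }
    Decision d;
    d.accepted = y0 > y1;
    if (d.accepted) {
        std::vector<std::size_t> order = rankModels(score);
        if (order.size() > maxCandidates) order.resize(maxCandidates);
        d.candidates = order;
    }
    return d;
}
```

Comparing two weighted sums rather than thresholding a single score is what lets the trained weights balance the three indicators against one another.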
7. Network weight learning algorithm: the initial network weights are chosen at random and then adjusted according to a supervised learning algorithm: $\Delta W_i = a(t)\,[x_i(t) - W_i(t)]$, where $x_i(t)$ is the input sample at time $t$, the adjustment coefficient is $a(t) = 0.1\,(1.0 - t/M)$, and $M$ is the number of training iterations.
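The update rule translates almost directly into code. The sketch below applies one step of ΔW_i = a(t)[x_i(t) - W_i(t)] to a single weight vector. How the supervision signal selects which output unit's weights are updated is not described in this text, so that choice is left to the caller; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adjustment coefficient a(t) = 0.1 * (1.0 - t / M), decaying linearly to zero.
double learningRate(std::size_t t, std::size_t M) {
    assert(M > 0 && t <= M);
    return 0.1 * (1.0 - static_cast<double>(t) / static_cast<double>(M));
}

// One update step: W[i] += a(t) * (x[i] - W[i]) for every component i.
void updateWeights(std::vector<double>& W, const std::vector<double>& x,
                   std::size_t t, std::size_t M) {
    assert(W.size() == x.size());
    const double a = learningRate(t, M);
    for (std::size_t i = 0; i < W.size(); ++i) W[i] += a * (x[i] - W[i]);
}
```

Each step moves the weights a fraction a(t) of the way toward the current input, and the step size shrinks to zero as t approaches M.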
The advantages of the invention are:
1. The vocabulary can reach up to 200 words, few system resources are used, and recognition accuracy on ordinary personal names exceeds 95%, independent of accent, dialect, and language. The fixed-point implementation greatly speeds up feature extraction and recognition, so processing runs in real time on a palmtop platform and reaches a practical level;
2. The framework of the software package is well designed and its interface functions are complete, meeting the needs of typical palmtop applications such as voice navigation, business card management, and voice dialing. The interface functions are also easy to use, and the developer does not need much speech recognition expertise;
3. The background noise is estimated dynamically, and both the detected start point and the detected end point go through a confirmation step, so the endpoint detection algorithm adapts to changing environments and is highly accurate and robust;
4. When the neural network makes the recognition/rejection decision, there is no need to decide empirically how much weight each factor should carry; the network adjusts itself and weights the factors jointly, avoiding a simple threshold strategy with its hard 0/1 decision. The decision indicators are all percentages, so each indicator is on the same scale and the dynamic range is reduced, which gives the method some generality. The indicators are sensibly defined: words with similar pronunciation are taken into account, and as much useful information as possible is exploited.
Description of drawings:
Fig. 1 shows the overall framework of the software package and the relationship between the application and the kernel software package.
Fig. 2 shows the command training interface.
Fig. 3 shows the endpoint detection algorithm flow.
Fig. 4 shows the hidden Markov model topology of each command.
Fig. 5 shows the topology of the recognition/rejection decision neural network.
Fig. 6 shows the interface of a palmtop application, developed on top of the software package, that recognizes personal names and place names.
Embodiment:
Adding speech recognition to a palmtop application with the invention is very simple; only the dynamic library VcmdPpcApi.dll and the header file SpeechApi.h are needed.
The invention adopts an object-oriented approach, and all interfaces are collected in the dynamic library VcmdPpcApi.dll. To use this library, it must first be registered in the system.
Any application source file that calls the interface functions, or uses the structures and constants predefined in SpeechApi.h, should include this header file.
A practical program is generally developed with the invention in the following steps:
1. Implement a class derived from the notification class IVCmdNotifySink and define an instance of it, so that its address can be passed in when the Register interface function is called to initialize the whole recognition core.
2. Create an interface instance to obtain the interface pointer.
3. Call the Register interface function to register the notification class; the recognition core starts up.
4. Call the various APIs as needed. Typically a menu is created first, the current menu is selected, and command items are added; training or recognition can then proceed.
5. Before the program exits, call the End interface to save the software package's parameters and settings.
Fig. 6 shows a palmtop application written with the invention that recognizes place names and personal names; it exercises almost all of the interface functions. The user can freely add, delete, train, and recognize menu commands. Up to 5 candidate recognition results are given, displayed from highest to lowest model score. Fig. 2 shows the command training dialog box. All commands are displayed in a list box, and double-clicking a command trains it.
Claims (7)
1. A palmtop computer speech recognition kernel software package consisting of a bottom layer, a transition layer, and an API interface layer, characterized in that: in the bottom layer, the computer's built-in recording device is linked to a preprocessing module, which is linked to endpoint detection and then to a feature extraction module, which in turn is connected to the recognition/rejection module or the training module, the results of which are delivered to the application program interface through a class; the application program interface calls the following five classes of interfaces provided by the API interface layer: start/end recognition, training/recognition, model management, menu management, and recognition parameter configuration; and the transition layer, formed by the recognition kernel, connects the bottom-layer implementation to the API interface layer.
2. The palmtop computer speech recognition kernel software package according to claim 1, characterized in that the endpoint detection algorithm uses time-domain energy, and the detection process comprises the following five steps: estimating the background noise and computing the upper and lower thresholds, marking the start point, confirming the start, marking the end point, and confirming the end.
3. The palmtop computer speech recognition kernel software package according to claim 1, characterized in that the feature parameters comprise the normalized time-domain energy with its first- and second-order differences and the 12th-order LPC cepstrum parameters with their first- and second-order differences, 39 dimensions in total.
4. The palmtop computer speech recognition kernel software package according to claim 1, characterized in that the training module trains a left-to-right continuous-density hidden Markov model for each isolated word: the model is initialized from a uniform segmentation, and then, repeatedly, the training samples are segmented with the Viterbi algorithm using the current model parameters and the model parameters are re-estimated.
5. The palmtop computer speech recognition kernel software package according to claim 1, characterized in that, during recognition, the Viterbi search algorithm computes the log-likelihood score of the sample to be recognized against each model, and the recognition/rejection decision is made on the top-scoring model; when the sample is accepted, the software package also provides multiple candidates.
6. The palmtop computer speech recognition kernel software package according to claim 5, characterized in that, for the recognition/rejection decision, several indicator parameters are computed for the top-scoring model and then evaluated jointly by a neural network; when the sample is accepted, it is identified as the highest-scoring model; when it is rejected, it is considered to be an out-of-set pattern.
7. The palmtop computer speech recognition kernel software package according to claim 6, characterized in that the neural network is a Kohonen self-organizing network whose initial weights are chosen at random and then adjusted according to a supervised learning algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN99111131A CN1282069A (en) | 1999-07-27 | 1999-07-27 | On-palm computer speech identification core software package |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN99111131A CN1282069A (en) | 1999-07-27 | 1999-07-27 | On-palm computer speech identification core software package |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1282069A true CN1282069A (en) | 2001-01-31 |
Family
ID=5274900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN99111131A Pending CN1282069A (en) | 1999-07-27 | 1999-07-27 | On-palm computer speech identification core software package |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1282069A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100397387C (en) * | 2002-11-28 | 2008-06-25 | 新加坡科技研究局 | Summarizing digital audio data |
CN1331092C (en) * | 2004-05-17 | 2007-08-08 | 中国科学院半导体研究所 | Special purpose neural net computer system for pattern recognition and application method |
WO2007134494A1 (en) * | 2006-05-16 | 2007-11-29 | Zhongwei Huang | A computer auxiliary method suitable for multi-languages pronunciation learning system for deaf-mute |
CN102254558A (en) * | 2011-07-01 | 2011-11-23 | 重庆邮电大学 | Control method of intelligent wheel chair voice recognition based on end point detection |
CN102254558B (en) * | 2011-07-01 | 2012-10-03 | 重庆邮电大学 | Control method of intelligent wheel chair voice recognition based on end point detection |
CN102881099A (en) * | 2012-09-25 | 2013-01-16 | 北京声迅电子股份有限公司 | Antitheft alarming method and device applied to automatic teller machine (ATM) |
CN103020048A (en) * | 2013-01-08 | 2013-04-03 | 深圳大学 | Method and system for language translation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C06 | Publication | ||
PB01 | Publication | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |