CN1282069A - On-palm computer speech identification core software package - Google Patents

Publication number
CN1282069A
Authority
CN
China
Prior art keywords
software package
model
recognition
palm computer
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN99111131A
Other languages
Chinese (zh)
Inventor
邓勇刚
徐波
黄泰翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN99111131A priority Critical patent/CN1282069A/en
Publication of CN1282069A publication Critical patent/CN1282069A/en
Pending legal-status Critical Current

Abstract

The palmtop computer speech recognition kernel software package is a speech recognition application programming interface (API) that runs in a palmtop computer environment and performs speaker-dependent, limited-vocabulary, isolated-word recognition. Speech recognition belongs to the field of pattern recognition. For isolated words, the invention builds speaker-dependent continuous-density hidden Markov models. The package interface covers starting and ending recognition, training/recognition, model management, menu management, and recognition parameter configuration. The package uses an endpoint detection algorithm based on time-domain energy, extracts LPC cepstrum features, recognizes with the Viterbi search algorithm, and uses a neural network to make the recognition/rejection decision.

Description

The on-palm computer speech identification core software package
The present invention is a complete solution for a speaker-dependent, isolated-word speech recognition kernel software package in a palmtop computer environment. It comprises the design of the overall framework, the classification of the interfaces, and the implementation algorithms. Speech recognition is a typical pattern recognition problem.
Speech recognition has developed rapidly in recent years, with notable progress on modeling, training, search, robustness, and adaptation, and a large body of practical experience and theory has accumulated. Mainstream algorithms model speech with statistically based continuous-density hidden Markov models, which process large amounts of data; both training and recognition involve many complex computations and consume considerable storage and computing resources. Compared with a personal computer, a palmtop computer is slow (roughly equivalent to a 386-class machine), has little memory (typically only 8 MB), and has a built-in recording device with a very low signal-to-noise ratio. Practical use also requires that the number of training samples stay small. Developing practical speech recognition software in such an environment is therefore difficult.
The object of the invention is to provide, under limited resources and given that a palmtop computer is normally used by a single person, a complete solution for a speaker-dependent, isolated-word speech recognition kernel software package, so that speech recognition can easily be added to palmtop computer applications.
The technical essentials of the invention are shown in Fig. 1: the package comprises a bottom layer, a transition layer, and an interface layer. The bottom layer preprocesses the speech signal produced by the recording device, detects endpoints, extracts features, trains models, and makes the recognition/rejection decision. The interface layer provides the interface functions that applications need. The transition layer connects the bottom-layer core implementation to the interface layer.
The invention is object-oriented: all API functions are encapsulated in an interface class, and an application only needs to create an instance of this interface and use the interface pointer to call any interface function. To make applications more flexible, the invention predefines a notification class, IVCmdNotifySink, all of whose functions are pure virtual. The application must implement a class derived from the notification class to tell the recognition core what to do when it detects a speech event (such as the beginning or end of speech, or the recognition of a command). The notification class is defined as follows:
    class IVCmdNotifySink {
    public:
        virtual void UtteranceBegin(void) = 0;               /* sound onset detected */
        virtual void UtteranceEnd(void) = 0;                 /* sound ended */
        virtual void VUMeter(int Volume) = 0;                /* volume (from 0 to 100) */
        virtual void CmdStart(void) = 0;                     /* recognition of the speech begins */
        virtual void CmdNotUnderstand(void) = 0;             /* not understood; rejected */
        virtual void CmdRecognize(SRecogResult result) = 0;  /* recognition result */
    };
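As an illustration of how an application might implement the notification class, the following sketch derives a simple logging sink from IVCmdNotifySink. The interface is repeated so that the example is self-contained. The text does not list the fields of SRecogResult, so the struct below (a command string and a score), the virtual destructor, and the class name LoggingSink are assumptions made for this example only.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: the text names SRecogResult but not its fields.
struct SRecogResult {
    std::string command;  // assumed: text of the recognized command
    int score;            // assumed: score of the winning model
};

// Notification interface as given in the description
// (the virtual destructor is an addition for safe polymorphic use).
class IVCmdNotifySink {
public:
    virtual ~IVCmdNotifySink() {}
    virtual void UtteranceBegin(void) = 0;
    virtual void UtteranceEnd(void) = 0;
    virtual void VUMeter(int Volume) = 0;
    virtual void CmdStart(void) = 0;
    virtual void CmdNotUnderstand(void) = 0;
    virtual void CmdRecognize(SRecogResult result) = 0;
};

// Example application-side sink: records each event as a line of text.
class LoggingSink : public IVCmdNotifySink {
public:
    std::vector<std::string> log;
    int lastVolume = 0;

    void UtteranceBegin(void) override { log.push_back("begin"); }
    void UtteranceEnd(void) override { log.push_back("end"); }
    void VUMeter(int Volume) override { lastVolume = Volume; }
    void CmdStart(void) override { log.push_back("start"); }
    void CmdNotUnderstand(void) override { log.push_back("rejected"); }
    void CmdRecognize(SRecogResult result) override {
        log.push_back("recognized:" + result.command);
    }
};
```

A real application would replace the logging with its own actions, for example updating a volume meter in VUMeter or executing the selected command in CmdRecognize.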
The relationship among the application, the software package, and the recognition core can be seen in Fig. 1.
The invention uses the concept of a "menu", which is simply a set of commands. Commands can be added and trained at any time. Fig. 2 shows the command training interface; each command is trained with three samples, and to help ensure correct recognition the invention provides playback of the recorded training samples. By function, the API provided by the invention falls into the following five classes:
1. Start/end recognition: these interfaces register the notification class and shut down the recognition core. There are two functions. Register should be called when the package starts; it performs initialization. End is called before the application exits; it releases memory and saves the models, parameters, and other settings;
2. Training/recognition: controls the start and pause of recognition, and the training process;
3. Model management: adding or deleting a model, querying whether a model exists, retrieving a model by index, and so on;
4. Menu management: creating a voice menu, selecting which menu set is currently in use, adding command items to or deleting them from the current menu set, and querying and setting the current state of a command item;
5. Recognition parameter configuration: adjusting the recognizer to the application environment (background noise, speaking rate, and so on) can yield higher accuracy. The recognition core provides this flexibility to accommodate different application needs, including enabling or disabling rejection and configuring the endpoint detection algorithm. The parameter settings are intuitive, and the application developer does not need much specialist knowledge of speech recognition.
As shown in Fig. 1, the bottom-layer core processing comprises recording, preprocessing, endpoint detection, feature extraction, training, and the recognition/rejection decision. Each is described below:
1. Recording: the palmtop computer's built-in recording device is used; the speech signal is sampled at 8 kHz with 16 bits per sample.
2. Preprocessing: each frame has 392 samples, and adjacent frames overlap by half a frame. The pre-emphasis formula is: $x_1[n] = x[n] - 0.97\,x[n-1]$.
3. Endpoint detection: based on time-domain energy; the energy of a frame is the sum of squares of its preprocessed samples. As shown in Fig. 3, the procedure has five steps: estimate the background noise and compute the upper and lower bounds, mark the start, confirm the start, mark the end point, and confirm the end.
4. Feature extraction: the features are LPC cepstrum parameters. The pre-emphasized samples are first passed through a Hamming window, and the autocorrelation coefficients are then computed. The standard Levinson-Durbin recursion yields the 12th-order LPC parameters, from which the 12th-order LPC cepstrum parameters are derived iteratively; finally, first- and second-order differences are taken. The feature vector comprises the normalized time-domain energy with its first- and second-order differences and the 12 LPC cepstrum parameters with their first- and second-order differences, 39 dimensions in all. The features are scaled by a fixed factor and represented as 16-bit integers, so that the subsequent training and recognition can be implemented in fixed point. The LPC cepstrum is computed as follows:
1) Hamming window function:
2) Autocorrelation coefficients: let the preprocessed and windowed speech signal be $s_1[n]$; the autocorrelation coefficients are then:
Figure 99111131000721
The normalized autocorrelation coefficients are: $r[l] = R[l] / R[0]$, where $l = 0, 1, 2, \ldots, P$.
3) Levinson-Durbin iteration for the 12th-order LPC parameters:
$E_0 = r[0]$
$k_i = \left( r[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} \, r[i-j] \right) / E_{i-1}, \quad 1 \le i \le P$
$\alpha_i^{(i)} = k_i$
$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i \, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$
$E_i = (1 - k_i^2) \, E_{i-1}$
where $P = 12$ is the prediction order, and the final values $a_j = \alpha_j^{(P)}$, $1 \le j \le P$, are the LPC prediction coefficients.
4) Cepstrum parameters:
$h_1 = a_1$
$h_n = a_n + \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) a_k \, h_{n-k}, \quad 1 < n \le P$
where $h_i$ are the cepstrum coefficients and $a_n$ are the LPC coefficients.
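The chain from windowed samples to LPC cepstrum can be sketched in floating point as follows (the package itself is described as fixed point). Two parts are assumptions, because the corresponding formulas are not reproduced in the text: the window is the standard Hamming window $0.54 - 0.46\cos(2\pi n/(N-1))$, and $R[l]$ is the usual short-time sum of $s_1[n]\,s_1[n+l]$. The function names are invented for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: standard Hamming window, applied in place.
void applyHamming(std::vector<double>& frame) {
    const double pi = 3.14159265358979323846;
    const std::size_t N = frame.size();
    if (N < 2) return;
    for (std::size_t n = 0; n < N; ++n)
        frame[n] *= 0.54 - 0.46 * std::cos(2.0 * pi * n / (N - 1));
}

// Normalized autocorrelation r[l] = R[l] / R[0], l = 0..P.
// Assumption: R[l] = sum over n of s[n] * s[n + l].
std::vector<double> normalizedAutocorr(const std::vector<double>& s, std::size_t P) {
    std::vector<double> r(P + 1, 0.0);
    for (std::size_t l = 0; l <= P; ++l)
        for (std::size_t n = 0; n + l < s.size(); ++n)
            r[l] += s[n] * s[n + l];
    if (r[0] > 0.0) {
        const double r0 = r[0];
        for (double& v : r) v /= r0;
    }
    return r;
}

// Levinson-Durbin recursion; returns a[1..P] (a[0] is unused).
std::vector<double> levinsonDurbin(const std::vector<double>& r, std::size_t P) {
    std::vector<double> a(P + 1, 0.0), prev(P + 1, 0.0);
    double E = r[0];
    for (std::size_t i = 1; i <= P; ++i) {
        if (E <= 0.0) break;  // degenerate frame (e.g. silence): stop early
        double acc = r[i];
        for (std::size_t j = 1; j < i; ++j) acc -= prev[j] * r[i - j];
        const double k = acc / E;                     // k_i
        a[i] = k;                                     // alpha_i^(i) = k_i
        for (std::size_t j = 1; j < i; ++j)
            a[j] = prev[j] - k * prev[i - j];         // alpha_j^(i)
        E *= (1.0 - k * k);                           // E_i
        prev = a;
    }
    return a;
}

// LPC cepstrum: h[1] = a[1]; h[n] = a[n] + sum_{k=1}^{n-1} (1 - k/n) a[k] h[n-k].
std::vector<double> lpcCepstrum(const std::vector<double>& a, std::size_t P) {
    std::vector<double> h(P + 1, 0.0);
    for (std::size_t n = 1; n <= P; ++n) {
        h[n] = a[n];
        for (std::size_t k = 1; k < n; ++k)
            h[n] += (1.0 - static_cast<double>(k) / n) * a[k] * h[n - k];
    }
    return h;
}
```

A quick sanity check is a first-order process with $r[l] = \rho^l$: the recursion should give $a_1 = \rho$ with all higher coefficients zero, and the cepstrum should follow $h_n = \rho^n / n$.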
5. Training: three training samples are recorded for each command, and an 8-state, left-to-right, continuous-density hidden Markov model is built as shown in Fig. 4. The model is initialized from a uniform segmentation of the samples. The training samples are then segmented with the Viterbi algorithm using the current model parameters, and the model parameters are re-estimated. This is iterated several times, for example 5 times, to obtain the final model parameters.
6. Recognition/rejection decision: the standard Viterbi search algorithm computes the log-likelihood score of the unknown sample against each model. Three index parameters are then extracted for the best-scoring model and fed to the neural network shown in Fig. 5, which makes the recognition/rejection decision. The detailed procedure is as follows, where N is the total number of models in the application:
(1) compute the log-likelihood scores of the unknown pattern X against the N models with the standard Viterbi search algorithm, and normalize them by frame length;
(2) sort the N scores from high to low; without loss of generality, let the sorted scores be $S_1, \ldots, S_N$;
(3) for the top-ranked model, compute three indices $x_1$, $x_2$, $x_3$:
$x_1 = S_1 / \mathrm{mean}_1$
$x_2 = S_2 / S_1$
$x_3 = \left( S_1 - \frac{1}{M} \sum_{k \ne 1,\; S_k \ge \alpha S_1} S_k \right) / S_1$
where $\mathrm{mean}_1$ is the average frame-length-normalized score of the training samples of model 1, and $\alpha$ is a constant between 0.5 and 0.9. Index 3 is a confidence measure for model 1: the averaged sum inside the parentheses is the mean score of those models whose scores are close to that of model 1, with the degree of closeness set by the constant $\alpha$;
(4) make the recognition/rejection decision from the three indices:
$y_0 = x_1 W_{01} + x_2 W_{02} + x_3 W_{03}$
$y_1 = x_1 W_{11} + x_2 W_{12} + x_3 W_{13}$
where the W coefficients are the trained network weights.
If $y_0 > y_1$, the sample is recognized as model 1, and the several top-scoring candidates are also returned. If $y_0 \le y_1$, the sample is rejected.
7. Network weight learning algorithm: the initial network weights are chosen at random and adjusted with a supervised learning algorithm: $\Delta W_i = a(t)\,[x_i(t) - W_i(t)]$, where $x_i(t)$ is the input sample at time t, the adjustment coefficient is $a(t) = 0.1 \cdot (1.0 - t/M)$, and M is the number of training iterations.
The advantages of the invention are:
1. The vocabulary can reach 200 words with low use of system resources. Recognition accuracy for ordinary personal names exceeds 95% and is independent of accent, dialect, and language. The fixed-point implementation greatly speeds up feature extraction and recognition, so processing runs in real time on the palmtop computer platform and reaches a practical level;
2. The package framework is well designed and the interface functions are complete, meeting the needs of typical palmtop computer applications such as voice navigation, business-card management, and voice dialing. The interface functions are also easy to use; the developer does not need much specialist knowledge of speech recognition;
3. Background noise is estimated dynamically, and both the detected start point and the detected end point go through a confirmation step, so the endpoint detection algorithm adapts to changing environments and is accurate and robust;
4. When the neural network makes the recognition/rejection decision, there is no need to decide empirically how much weight each factor should carry; the network adjusts automatically and weights the factors jointly. This avoids a simple threshold strategy. The indices used for the 0/1 decision are all ratios, so they are compared on the same scale, which reduces the dynamic range and gives the approach some generality. The indices are sensibly defined: similar-sounding words are taken into account, and as much useful information as possible is used.
Description of the drawings:
Fig. 1 shows the overall framework of the software package and the relationship between the application and the kernel software package.
Fig. 2 shows the command training interface.
Fig. 3 shows the flow of the endpoint detection algorithm.
Fig. 4 shows the hidden Markov model topology used for each command.
Fig. 5 shows the topology of the recognition/rejection decision neural network.
Fig. 6 shows the interface of a palmtop computer application, developed on top of the software package, that recognizes personal names and place names.
Embodiment:
Adding speech recognition to a palmtop computer application with the invention is simple: only the dynamic library VcmdPpcApi.dll and the header file SpeechApi.h are needed.
The invention is object-oriented, and all interfaces are contained in the dynamic library VcmdPpcApi.dll. To use the library, it must first be registered with the system.
Any application source file that calls the interface functions, or that uses the structures and constants predefined in SpeechApi.h, should include this header file.
Developing an application with the invention generally follows these steps:
1. Implement a class derived from the notification class IVCmdNotifySink and define an instance of it, so that its address can be passed in when the Register interface function is called to initialize the recognition core.
2. Create an interface instance to obtain the interface pointer.
3. Call the Register interface function to register the notification class; the recognition core starts.
4. Call the other API functions as needed. Typically a menu is created first, the current menu set is selected, and command items are added; training or recognition can then proceed.
5. Before the program exits, call the End interface to save the package's parameters and settings.
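The order of the calls in these steps can be sketched with a stand-in for the package. Only the names IVCmdNotifySink, Register, and End come from the text; their real signatures are not given, so MockVoiceCmd, its SimulateRecognition helper, the fields of SRecogResult, and AppSink are all hypothetical and exist only to make the example self-contained.

```cpp
#include <string>

// Hypothetical result type: the text does not list the fields of SRecogResult.
struct SRecogResult {
    std::string command;
};

// Notification interface from the description (virtual destructor added).
class IVCmdNotifySink {
public:
    virtual ~IVCmdNotifySink() {}
    virtual void UtteranceBegin(void) = 0;
    virtual void UtteranceEnd(void) = 0;
    virtual void VUMeter(int Volume) = 0;
    virtual void CmdStart(void) = 0;
    virtual void CmdNotUnderstand(void) = 0;
    virtual void CmdRecognize(SRecogResult result) = 0;
};

// Step 1: the application's notification class.
class AppSink : public IVCmdNotifySink {
public:
    std::string last;  // most recently recognized command
    void UtteranceBegin(void) override {}
    void UtteranceEnd(void) override {}
    void VUMeter(int) override {}
    void CmdStart(void) override {}
    void CmdNotUnderstand(void) override { last.clear(); }
    void CmdRecognize(SRecogResult result) override { last = result.command; }
};

// Hypothetical stand-in for the package's interface class. The real class
// lives in VcmdPpcApi.dll and is declared in SpeechApi.h.
class MockVoiceCmd {
public:
    // Step 3: register the notification object (assumed signature).
    bool Register(IVCmdNotifySink* sink) {
        sink_ = sink;
        return sink_ != nullptr;
    }
    // Not a real API: simulates the core reporting a recognized command.
    void SimulateRecognition(const std::string& cmd) {
        if (sink_ != nullptr) {
            sink_->CmdStart();
            sink_->CmdRecognize(SRecogResult{cmd});
        }
    }
    // Step 5: shut down; the real End also saves models and settings.
    void End() { sink_ = nullptr; }

private:
    IVCmdNotifySink* sink_ = nullptr;
};
```

An application would create an AppSink, create the interface instance, call Register with the sink's address, use the menu and training/recognition functions, and call End before exiting; the sink must outlive the registration.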
Fig. 6 shows a palmtop computer application, written with the invention, that recognizes place names and personal names; it uses almost all of the interface functions. The user can freely add, delete, train, and recognize menu commands. Recognition returns up to 5 candidates, displayed in descending order of model score. Fig. 2 shows the command training dialog box. All commands are shown in a list box; double-clicking a command trains it.

Claims (7)

1. A palmtop computer speech recognition core software package composed of a bottom layer, a transition layer, and an API interface layer, characterized in that: in the bottom layer, the computer's built-in recording device is linked to a preprocessing module, which is linked to endpoint detection and then to a feature extraction module, which in turn is connected to the recognition/rejection module or the training module, whose results are linked to the application interface through classes; the application calls the following five classes of interfaces provided by the API interface layer: start/end recognition, training/recognition, model management, menu management, and recognition parameter configuration; and the transition layer, formed by the recognition kernel, connects the bottom-layer implementation to the API interface layer.
2. The palmtop computer speech recognition core software package according to claim 1, characterized in that the endpoint detection algorithm uses time-domain energy, and detection comprises the following five steps: estimating the background noise and computing the upper and lower bounds, marking the start, confirming the start, marking the end point, and confirming the end.
3. The palmtop computer speech recognition core software package according to claim 1, characterized in that the features are the normalized time-domain energy with its first- and second-order differences and the 12th-order LPC cepstrum parameters with their first- and second-order differences, 39 dimensions in all.
4. The palmtop computer speech recognition core software package according to claim 1, characterized in that the training module trains a left-to-right continuous-density hidden Markov model for each isolated word; the model is initialized from a uniform segmentation, and then, repeatedly, the training samples are segmented with the Viterbi algorithm using the current model parameters and the model parameters are re-estimated.
5. The palmtop computer speech recognition core software package according to claim 1, characterized in that, during recognition, the Viterbi search algorithm computes the log-likelihood score of the sample to be recognized against each model, and the recognition/rejection decision is made on the top-scoring model; when the sample is recognized, the package also returns multiple candidates.
6. The palmtop computer speech recognition core software package according to claim 5, characterized in that, for the recognition/rejection decision, several index parameters are computed for the top-scoring model and then jointly evaluated by a neural network; when the sample is recognized, the unknown sample is identified as the top-scoring model; when it is rejected, the unknown sample is treated as an out-of-set pattern.
7. The palmtop computer speech recognition core software package according to claim 6, characterized in that the neural network is a Kohonen self-organizing network, the initial network weights are chosen at random, and the weights are adjusted with a supervised learning algorithm.
CN99111131A 1999-07-27 1999-07-27 On-palm computer speech identification core software package Pending CN1282069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN99111131A CN1282069A (en) 1999-07-27 1999-07-27 On-palm computer speech identification core software package

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN99111131A CN1282069A (en) 1999-07-27 1999-07-27 On-palm computer speech identification core software package

Publications (1)

Publication Number Publication Date
CN1282069A true CN1282069A (en) 2001-01-31

Family

ID=5274900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN99111131A Pending CN1282069A (en) 1999-07-27 1999-07-27 On-palm computer speech identification core software package

Country Status (1)

Country Link
CN (1) CN1282069A (en)

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication