CN102034474B - Method for identifying all languages by voice and inputting individual characters by voice - Google Patents

Method for identifying all languages by voice and inputting individual characters by voice

Info

Publication number
CN102034474B
CN102034474B CN2009101771072A CN200910177107A
Authority
CN
China
Prior art keywords
tone
individual character
sentence
unknown single
unknown
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009101771072A
Other languages
Chinese (zh)
Other versions
CN102034474A (en)
Inventor
黎自奋
李台珍
黎世聪
黎世宏
廖丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN2009101771072A priority Critical patent/CN102034474B/en
Publication of CN102034474A publication Critical patent/CN102034474A/en
Application granted granted Critical
Publication of CN102034474B publication Critical patent/CN102034474B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a method for identifying all languages by voice and inputting individual characters by voice. A group of m unknown (or known) distinct monosyllables represents m classes of dissimilar sounds. Each common character is pronounced once, its pronunciation is converted into a linear predictive coding cepstrum matrix, and the character is assigned to one of the m classes by a Bayesian or distance classifier. After a user pronounces a character, the same classifier finds the F unknown monosyllables most similar to that pronunciation among the m unknown monosyllables. All characters in the F classes represented by those F similar monosyllables are then arranged by similarity to the pronounced character and by its letters (or stroke count), so the desired character can be found quickly after the user speaks. The advantages of the invention are that the method is simple, no samples are required, no typing is required, anyone can use it, characters with nonstandard or mistaken pronunciation can still be input, the speed is high, and the accuracy rate is high.

Description

Method for recognizing all languages by voice and inputting individual characters by voice
Technical field
Mandarin Chinese has 408 base monosyllables; with the four tones of standard pronunciation the number multiplies, and modern speech-recognition methods cannot reliably distinguish 408 × 4 monosyllables; English has even more. The present invention partitions the common characters into about m (≈500) classes of similar sounds, each class of similar-sounding characters being represented by one unknown monosyllable. When the user pronounces a character, the invention uses Bayesian classification to find, among the m unknown monosyllables, the several unknown monosyllables most similar to that pronunciation, and then searches for the desired character or sentence within the classes of characters those similar monosyllables represent.
The present invention uses 12 elastic frames (windows) of equal length, without filters and without overlap, to convert the sound wave of a monosyllable of any duration into a 12 × 12 linear predictive coding cepstrum (LPCC) matrix.
The present invention includes a Bayesian comparison method that finds, for the speaker, the F unknown monosyllables among the m unknown monosyllables most similar to the desired character's pronunciation. Because recognition searches only for the F most similar monosyllables within a fixed set of m classes, the most important function of the invention is to recognize characters and sentences very quickly and to input large numbers of characters. Different pronunciations of the same character, or pronunciations in different languages, are placed in different classes, so a character or sentence can be recognized under any pronunciation or language, and the invention needs no samples.
Background technology
With modern typing input of Chinese and English characters, the correct pronunciation (or phonetic notation) of every character must be known and typing must be practiced, which is very inconvenient, so not everyone can input text. Voice recognition and voice input of text are the trend and the target of future development: no typing is required, and characters can be input even with inaccurate pronunciation, accents, or other languages.
When a monosyllable is uttered, its pronunciation is represented by a sound wave. The sound wave is a system that varies nonlinearly over time, and the wave of a monosyllable carries a dynamic characteristic that itself changes continuously and nonlinearly over time. Utterances of the same monosyllable share the same sequence of dynamic characteristics, stretched and compressed nonlinearly in time: the identical characteristics appear in the same order, but not at the same times. It is therefore very difficult to align the identical dynamic characteristics of two utterances of the same monosyllable to the same time positions, and because there are especially many similar monosyllables, recognition is all the more difficult.
A computerized speech-recognition system must first extract the speech-relevant information from the sound wave, i.e. the dynamic characteristics, filtering out noise unrelated to speech and discarding properties such as the speaker's timbre, tone, and the psychological, physiological, and emotional state at the time of speaking, which are irrelevant to recognition. It must then align the shared characteristics of identical monosyllables to identical time positions. This series of characteristics is represented by an equal-length series of feature vectors, called the feature model of the monosyllable. Current speech-recognition systems produce feature models of uniform size only through overly complicated and time-consuming procedures, because the shared characteristics of identical monosyllables are difficult to align to the same time positions; this is especially true of English, which makes comparison and recognition difficult.
A general speech-recognition method performs three basic tasks: feature extraction; feature normalization (making the feature models the same size, with the shared characteristics of identical monosyllables aligned to the same time positions); and recognition of the unknown monosyllable. Common features of a continuous sound wave include energy, zero-crossing counts, extreme counts, formants, the linear predictive coding cepstrum (LPCC), and the Mel-frequency cepstrum (MFCC); of these, LPCC and MFCC are the most effective and the most widely used. The LPCC is the most reliable, stable, and accurate speech feature for representing a voiced sound: it models the sound wave with a linear regression, estimates the regression coefficients by least squares, and converts the estimates to the cepstrum, which yields the LPCC. The MFCC instead converts the sound wave to the frequency domain with the Fourier transform and models the auditory system on the Mel frequency scale. According to S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 28, No. 4, 1980, MFCC features give a higher character-recognition rate than LPCC features when the dynamic time warping (DTW) method is used. But in repeated speech-recognition experiments (including my earlier inventions), using Bayesian classification, the LPCC recognition rate is higher than that of MFCC features, and it saves time.
As for recognition methods, many exist: dynamic time warping (DTW), vector quantization (VQ), and the hidden Markov model (HMM). When identical pronunciations vary over time, DTW compares while shifting shared characteristics toward the same time positions; its recognition rate can be good, but shifting features into alignment is very difficult and warping takes too long, so it is impractical. Vector quantization, when recognizing a large number of monosyllables, is both inaccurate and time-consuming. The more recent HMM method recognizes fairly well, but the method is complicated, too many unknown parameters must be estimated, and computing the estimates and recognizing are time-consuming. T. F. Li, "Speech recognition of mandarin monosyllables", Pattern Recognition, Vol. 36, 2003, uses Bayesian classification on the same database, compressing series of LPCC vectors of various lengths into feature models of identical size; the recognition results are better than those of the HMM method of Y. K. Chen, C. Y. Liu, G. H. Chiang and M. T. Lin, "The recognition of mandarin monosyllables based on the discrete hidden Markov model", Proceedings of the Telecommunication Symposium, Taiwan, 1990. But the compression process is complicated and time-consuming, it is difficult to compress the shared characteristics of identical monosyllables to identical time positions, and similar monosyllables are hard to distinguish. As for voice input of text, no practical method yet exists, because current computer speech recognition is not yet good enough.
Against these shortcomings, the speech-recognition method of the present invention starts from first principles: the sound wave carries speech characteristics that vary nonlinearly over time, and from this a set of feature-extraction procedures is derived naturally. The sound wave of a monosyllable (a Chinese or English character) is first normalized and then converted into a feature model of uniform size sufficient to represent that monosyllable, such that identical monosyllables carry the same characteristics at identical time positions in their feature models. The invention needs no unknown parameters or thresholds tuned by hand or by experiment. With a simplified Bayesian classifier, the feature model of a character's pronunciation is compared directly with the feature models of the m unknown monosyllables (representing the m classes of dissimilar sounds), with no further compression, warping, or search for matching characteristics. The method therefore performs feature extraction, feature normalization, and recognition quickly, and finds the desired character quickly and correctly.
Summary of the invention
To overcome the defects of the prior art described above, the present invention provides a method for recognizing all languages by voice and inputting characters by voice that is simple to use, needs no samples, needs no phonetic notation, needs no typing, can be used by anyone, and has a high accuracy rate.
The object of the invention is a method for recognizing all languages by voice and inputting characters by voice, comprising the following steps:
(1) A character is an English, Chinese, or other written word, and its pronunciation is a monosyllable. The method has m unknown (or known) monosyllables and a database of common characters; each unknown monosyllable has samples.
(2) A pre-processor deletes sampled points of the sound wave that carry no speech, i.e. noise.
(3) A method normalizes a monosyllable's sound wave and extracts its features: the wave is normalized with E elastic frames and converted into an E × P linear predictive coding cepstrum (LPCC) feature matrix of uniform size.
(4) The mean and variance of each unknown monosyllable's LPCC samples are computed; an E × P matrix of sample means and variances represents the unknown monosyllable, and each unknown monosyllable represents one class of similar-sounding common characters, m classes in all.
(5) A speaker with standard pronunciation pronounces each common character once; if the user's pronunciation is nonstandard, or in another dialect or language, the user pronounces instead.
(6) Each common character's sound wave is normalized and its features extracted, i.e. converted into an E × P LPCC matrix.
(7) A simplified Bayesian classifier compares the E × P LPCC matrix of a common character with the E × P matrices of sample means and variances of each unknown monosyllable, finds by Bayesian distance (similarity) the unknown monosyllable most similar to the character's pronunciation, and places the character into the class of common characters represented by that most similar unknown monosyllable.
(8) The user pronounces the desired character; the monosyllable is converted into an E × P LPCC matrix.
(9) The simplified Bayesian classifier compares the E × P LPCC matrix of the character the user wants with the E × P matrices of sample means and variances of each unknown monosyllable, and finds by Bayesian distance (similarity) the F unknown monosyllables most similar to the desired pronunciation.
(10) Within the F classes of common characters represented by the F most similar unknown monosyllables, the absolute distance (similarity) between the E × P LPCC matrix of every common character and that of the desired character is computed, and all characters of the F classes are sorted; after sorting, the desired character should appear near the front. Alternatively, after sorting, the characters are divided into equal-size sections, each section ordered by letter (or stroke count); that is, the characters of the F classes are arranged in a matrix by absolute distance (similarity) to the desired character and by letter (or stroke count), so that after pronouncing, the user searches the matrix top to bottom by the letters (or stroke count) of the desired character and finds it easily.
(11) A method recognizes sentences and names.
(12) A technique corrects characters, sentences, and names that fail to be recognized, inputs characters that fail to be input, and adds new characters.
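As a rough illustration of step (10), the sketch below ranks the candidate characters of the F classes by summed absolute LPCC distance and then re-orders equal-size sections by stroke count. This is a minimal sketch under our own assumptions; the function names, the section count, and the candidate format `(character, lpcc_matrix, strokes)` are ours, not the patent's.

```python
def abs_distance(X, Y):
    """Summed absolute element-wise distance between two E x P LPCC matrices."""
    return sum(abs(x - y) for x_row, y_row in zip(X, Y)
               for x, y in zip(x_row, y_row))

def rank_candidates(X, candidates, sections=3):
    """candidates: list of (character, lpcc_matrix, strokes).  Sort by absolute
    distance to the input matrix X, then re-order each equal-size section by
    stroke count, as step (10) describes.  Returns the characters in order."""
    by_dist = sorted(candidates, key=lambda c: abs_distance(X, c[1]))
    size = max(1, len(by_dist) // sections)
    arranged = []
    for start in range(0, len(by_dist), size):
        arranged.extend(sorted(by_dist[start:start + size], key=lambda c: c[2]))
    return [c[0] for c in arranged]
```

With `sections` equal to the number of candidates, the ordering is purely by acoustic distance; with `sections=1`, purely by stroke count within the F classes.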
In step (2), sampled points or noise without speech are deleted as follows: within each small time interval, compute the sum of distances between adjacent sampled points; if this sum is less than the corresponding sum for typical noise, delete the interval.
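The deletion rule of step (2) can be sketched as follows, assuming a fixed window length and an illustrative noise threshold (both values are ours; the patent does not fix them):

```python
def remove_silence(samples, window=240, noise_total=2000):
    """Delete each window whose summed adjacent-sample distances fall below
    a typical-noise total, per the step-(2) rule.  The window length and the
    threshold are illustrative values, not the patent's."""
    kept = []
    for start in range(0, len(samples), window):
        seg = samples[start:start + window]
        total = sum(abs(seg[i + 1] - seg[i]) for i in range(len(seg) - 1))
        if total >= noise_total:      # speech-bearing window: keep it
            kept.extend(seg)
    return kept

# A flat (silent) stretch followed by an oscillating (voiced) stretch:
silence = [0] * 240
voiced = [(-1) ** i * 500 for i in range(240)]
out = remove_silence(silence + voiced)   # only the voiced stretch survives
```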
Step (3) normalizes a monosyllable's sound wave and extracts a feature matrix of uniform size, as follows:
(a) The sampled points of the monosyllable's wave are divided equally: to approximate the nonlinearly varying wave closely with linear regression models, the total length of the wave is divided into E equal intervals, each forming one elastic frame. The E equal-length elastic frames have no filters and do not overlap; they stretch or shrink freely to cover the full wave, unlike fixed-length Hamming windows.
(b) Within each frame, a regression model that is linear in time estimates the sound wave, which varies nonlinearly in time.
(c) Durbin's recursion is applied:

R(i) = Σ_{n=0}^{N−i} S(n)S(n+i), i ≥ 0
E_0 = R(0)
k_i = [R(i) − Σ_{j=1}^{i−1} a_j^{(i−1)} R(i−j)] / E_{i−1}
a_i^{(i)} = k_i
a_j^{(i)} = a_j^{(i−1)} − k_i a_{i−j}^{(i−1)}, 1 ≤ j ≤ i−1
E_i = (1 − k_i²) E_{i−1}
a_j = a_j^{(P)}, 1 ≤ j ≤ P

to obtain the least-squares estimates a_j, 1 ≤ j ≤ P, of the regression coefficients, called the linear predictive coding (LPC) vector, and then

a′_i = a_i + Σ_{j=1}^{i−1} (j/i) a_{i−j} a′_j, 1 ≤ i ≤ P
a′_i = Σ_{j=i−P}^{i−1} (j/i) a_{i−j} a′_j, P < i

converts the LPC vector into the more stable linear predictive coding cepstrum (LPCC) vector a′_i, 1 ≤ i ≤ P.
(d) The monosyllable is represented by E LPCC vectors, i.e. an E × P LPCC matrix.
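Steps (a)-(d) can be sketched in Python: the waveform is split into E equal elastic frames, and each frame yields a P-dimensional LPCC vector via Durbin's recursion followed by the LPC-to-cepstrum conversion above. This is a minimal sketch; the defaults E = 12 and P = 12 follow the text, while the function names are our assumptions.

```python
def lpcc_frame(s, P=12):
    """Order-P LPC of one frame via Durbin's recursion, then the
    LPC -> cepstrum conversion; returns [a'_1, ..., a'_P]."""
    N = len(s) - 1
    # Autocorrelation R(i) = sum_n S(n)S(n+i)
    R = [sum(s[n] * s[n + i] for n in range(N - i + 1)) for i in range(P + 1)]
    err = R[0]                        # E_0 = R(0)
    a = [0.0] * (P + 1)
    for i in range(1, P + 1):
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k                      # a_i^(i) = k_i
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err = (1 - k * k) * err       # E_i = (1 - k_i^2) E_{i-1}
    # Cepstrum: a'_i = a_i + sum_{j<i} (j/i) a_{i-j} a'_j, 1 <= i <= P
    c = [0.0] * (P + 1)
    for i in range(1, P + 1):
        c[i] = a[i] + sum((j / i) * a[i - j] * c[j] for j in range(1, i))
    return c[1:]

def lpcc_matrix(samples, E=12, P=12):
    """Split the wave into E equal elastic frames (leftover samples at the
    end are dropped) and return the E x P LPCC matrix."""
    L = len(samples) // E
    return [lpcc_frame(samples[f * L:(f + 1) * L], P) for f in range(E)]
```

Because the frame length L adapts to the total wave length, the same code handles fast (short) and slow (long) utterances, which is the point of the elastic frames.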
Step (7) uses a simplified Bayesian classifier to compare the E × P LPCC matrix of a common character with the E × P matrices of sample means and variances of each unknown monosyllable and find the most similar unknown monosyllable, as follows:
(a) The pronunciation of a common character is represented by an E × P LPCC matrix X = {X_jl}, j = 1, …, E, l = 1, …, P. For fast recognition, the E × P values {X_jl} are assumed to be E × P independent random variables with normal distributions. When the pronunciation of the common character is compared with an unknown monosyllable c_i, i = 1, …, m (m being the total number of unknown monosyllables), the mean and variance (μ_ijl, σ²_ijl) of {X_jl} are estimated by the sample mean and variance of that unknown monosyllable, so the conditional density of X is

f(x | c_i) = [Π_jl 1/(√(2π) σ_ijl)] exp[−½ Σ_jl ((x_jl − μ_ijl)/σ_ijl)²]

where X = {X_jl} is the LPCC of the common character's pronunciation and (μ_ijl, σ²_ijl) is estimated by the sample mean and variance of the unknown monosyllable c_i.
(b) The simplified Bayesian classifier seeks, among the m unknown monosyllables, an unknown monosyllable c_i matching the pronunciation X of the common character; the similarity of c_i to X is given by f(x | c_i) above.
(c) For fast recognition, take the logarithm of the conditional density f(x | c_i) in (b) and drop the constants that need not be computed, obtaining the Bayesian distance, i.e. the similarity (the smaller the Bayesian distance, the greater the similarity) — the Bayesian classifier:

l(c_i) = Σ_jl ln(σ_ijl) + ½ Σ_jl ((x_jl − μ_ijl)/σ_ijl)²

(d) For each unknown monosyllable c_i, i = 1, …, m, compute the Bayesian distance l(c_i) of formula (c).
(e) Among the m unknown monosyllables, select the one, c′_i, whose Bayesian distance l(c′_i) to the common character's pronunciation X is smallest (similarity greatest); it is judged the unknown monosyllable most similar to the common character.
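A minimal sketch of the simplified Bayesian distance of step (c) and the selection of steps (d)-(e), generalized to return the F nearest classes as step (9) requires. The function names and the class format `(mean_matrix, sigma_matrix)` (sigma holding sample standard deviations) are our assumptions:

```python
import math

def bayes_distance(X, mu, sigma):
    """Simplified Bayesian distance l(c) of LPCC matrix X to a class whose
    sample means are mu and sample standard deviations are sigma:
    l(c) = sum_jl ln(sigma_jl) + 0.5 * sum_jl ((x_jl - mu_jl) / sigma_jl)**2."""
    d = 0.0
    for x_row, m_row, s_row in zip(X, mu, sigma):
        for x, m, s in zip(x_row, m_row, s_row):
            d += math.log(s) + 0.5 * ((x - m) / s) ** 2
    return d

def top_f_classes(X, classes, F=3):
    """Indices of the F classes (each a (mean_matrix, sigma_matrix) pair)
    with the smallest Bayesian distance, i.e. the greatest similarity."""
    order = sorted(range(len(classes)),
                   key=lambda i: bayes_distance(X, classes[i][0], classes[i][1]))
    return order[:F]
```

Dropping the constant terms of the log-density leaves only the two sums above, which is why the classifier is cheap enough to evaluate against all m classes per utterance.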
Step (11) recognizes sentences and names, as follows:
(a) To recognize a speaker's sentence or name, a sentence-and-name database is built first; every character in each sentence or name is drawn from the database of common characters.
(b) The sentence or name is cut into D monosyllables: for each unit interval, the sum of distances between adjacent sampled points is computed; if it is too small, the interval is noise or silence. When too many consecutive intervals carry no speech signal (more than the gap between two syllables of an English word), the run is entirely noise or silence and marks the boundary between two characters, where a cut is made. Each of the D monosyllables is converted into an E × P LPCC matrix. For each monosyllable, the Bayesian classifier selects the F most similar unknown monosyllables among the m classes, so the sentence or name is represented by D × F most similar unknown monosyllables. All characters in the F classes represented by each monosyllable's F most similar unknown monosyllables are arranged in a column by absolute distance (similarity) to the pronounced monosyllable; the D columns of characters should contain the sentence or name, with the desired characters near the front.
(c) If a sentence or name selected from the database for comparison has the same length as the speaker's (D characters), the D known characters of the compared sentence or name are matched in order against the characters of the D columns of similar unknown monosyllables, checking whether each column's characters contain the corresponding known character. If every column contains its known character, D characters are recognized correctly, and the compared sentence or name is the speaker's sentence or name.
(d) If the compared sentence or name in the database has D − 1 or D + 1 characters, or the number of characters correctly recognized in (c) is not D, the invention screens with a 3-column window: the i-th known character of the compared sentence or name (in the database) is matched against the characters of the F similar unknown monosyllables in the three neighboring columns (i − 1, i, i + 1) of the D columns. The number of known characters of the compared sentence or name found in the D columns is counted and divided by D to give the probability of that compared sentence or name; the sentence or name with the highest probability in the database is taken as the speaker's sentence or name.
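The 3-column-window screening of step (d) can be sketched as follows; here `candidate_cols` stands for the D columns of candidate characters, and the score (hits divided by the sentence length) mirrors the "probability" of the step. Names and data layout are our assumptions:

```python
def sentence_score(candidate_cols, sentence):
    """Fraction of the sentence's characters found among the candidate
    characters of their own column or the neighbouring columns (the
    3-column window i-1, i, i+1 of step (d))."""
    hits = 0
    for i, ch in enumerate(sentence):
        window = []
        for col in range(max(0, i - 1), min(len(candidate_cols), i + 2)):
            window.extend(candidate_cols[col])
        if ch in window:
            hits += 1
    return hits / len(sentence)

def best_sentence(candidate_cols, database):
    """Database sentence with the highest score, per step (d)."""
    return max(database, key=lambda s: sentence_score(candidate_cols, s))
```

The window tolerates sentences one character longer or shorter than D, since a character displaced by one position still falls inside its neighbour's window.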
Step (12) provides a technique that corrects characters, sentences, and names that fail to be recognized, inputs characters that fail to be input, and adds new characters:
(a) If the user cannot find the desired character because of nonstandard pronunciation, mispronunciation, or pronunciation in another language, then the character is not in the F classes of common characters represented by the F most similar unknown monosyllables; it must be in some other class of the database, or not in any of the m classes at all. After the user pronounces the desired character, the most similar unknown monosyllable is found by Bayesian distance, and the desired character is placed into the class of common characters represented by that monosyllable. Thereafter, when the user makes the same pronunciation, the desired character will appear in the classes represented by the F most similar unknown monosyllables.
(b) If the desired character is in none of the m classes, the user pronounces it, and the new character is added to the class of common characters represented by the most similar unknown monosyllable.
(c) A character can be pronounced in several different ways — with standard pronunciation, nonstandard pronunciation, wrong pronunciation, or in another language — and the character is then placed into the classes of the corresponding different unknown monosyllables. The user may pronounce the same character in any of these ways, so anyone can use the invention to recognize speech and input characters by voice. The method is simple, needs no samples, needs no phonetic notation, needs no typing, can be used by anyone, is fast, has a high accuracy rate, and works with Mandarin, Amoy, English, and various other languages.
(d) If a sentence or name cannot be recognized character by character, the sentence is spoken again and divided into D characters; the Bayesian classifier assigns each character to the class of its most similar unknown monosyllable, and the sentence or name is recognized again until successful.
(e) The invention only moves characters that fail to be recognized or input into the classes of their most similar unknown monosyllables; it never changes the characteristics (means and variances) of any unknown monosyllable, so the recognition method is stable. Because recognition finds only the F most similar unknown monosyllables among a fixed small number m (≈500) of classes, the invention can recognize characters and sentences of various languages very quickly and input large numbers of characters, which again is its most important function.
Compared with the prior art, the method for recognizing all languages by voice and inputting characters by voice provided by the invention has the following beneficial effects:
(1) The most important purpose of the invention is to recognize a large number of characters quickly and input text by speech: no phonetic notation is needed, no typing is needed, anyone can use it, characters pronounced nonstandardly or wrongly can still be input, the speed is high, the accuracy rate is high, and Mandarin, Amoy, English, and various other languages all work.
(2) The invention provides a method of normalizing a monosyllable's sound wave and extracting its features. It uses E equal elastic frames, non-overlapping and without filters, which adjust freely to the length of the monosyllable's wave to cover the entire wave; it converts the series of dynamic characteristics varying nonlinearly in time within the wave into a feature model of uniform size, and the feature models of identical monosyllables carry the same characteristics at identical time positions. Recognition can thus proceed in time, achieving real-time computer recognition.
(3) The invention provides a simple and effective Bayesian method of recognizing unknown monosyllables: the probability of misrecognition is minimized, computation is small, recognition is fast, and discrimination is high.
(4) The invention provides a method of extracting monosyllable features. A monosyllable's sound wave carries a dynamic characteristic that varies nonlinearly in time; the invention estimates the nonlinearly varying wave with regression models linear in time, producing the least-squares estimates of the regression coefficients (the LPC vector).
(5) The invention uses every speech-carrying sampled point of the sound wave. With the relatively few E = 12 equal elastic frames, without filters and without overlap, it covers all sampled points. A monosyllable's wave is never deleted for being too short, nor are sampled points deleted or compressed because it is too long. As long as the human ear can distinguish the monosyllable, the invention can extract its features. The method thus uses every speech-carrying sampled point and extracts as much speech information as possible. Because the E = 12 elastic frames do not overlap, the number of frames is small, greatly reducing the time for feature extraction and for computing the LPCC.
(6) The recognition method can recognize monosyllables spoken too fast or too slowly. When speech is too fast, a monosyllable's wave is very short; the elastic frames of the invention shrink so that the same number E of equal frames still covers the short wave, producing E LPCC vectors. As long as a human can distinguish the short sound, these E LPCC vectors effectively represent its feature model. A monosyllable spoken too slowly produces a longer wave; the elastic frames stretch, and the same number E of LPCC vectors effectively represents the long sound.
(7) The invention provides a technique that corrects the pronunciation of a character or sentence, finds the class containing the character, recognizes the character or sentence successfully, and inputs characters.
(8) Recognition finds the F most similar unknown monosyllables only within the fixed m classes, so the recognition method is stable. A group of common characters with sounds similar to the desired character is found very quickly and arranged in a matrix by similarity to the desired character and by letter (or stroke count); the user easily finds the character in the matrix by that similarity and by its letters (or stroke count).
(9) Besides recognizing characters and sentences very quickly and inputting large numbers of characters — the most important function of the invention — different pronunciations of the same character, or pronunciations in different languages, are placed in different classes, so characters and sentences can be recognized under any pronunciation or language.
(10) The invention needs no samples.
Brief description of drawings
Fig. 1 is a schematic diagram of the executive routine of the invention, showing the establishment of m databases of everyday characters with different similar sounds, i.e., m classes of everyday-character groups with different similar sounds;
Fig. 2 is a schematic flowchart of recognizing characters and sentences and inputting characters by pronunciation;
Figs. 3-9 are schematic diagrams of inputting part of this specification and recognizing Chinese and English sentences with the present invention, executed in Visual Basic software.
Wherein, the reference numerals are described as follows:
1: m different unknown single-tones, each with samples; 10: digitize the sound wave;
20: remove noise; 30: normalize the sound wave with E elastic frames;
40: compute the linear predictive coding cepstrum (LPCC) vectors by least squares;
50: compute the sample mean and variance of each unknown single-tone; each unknown single-tone represents one class, m classes in total;
60: pronounce each everyday character once and convert the sound into an LPCC matrix;
70: assign each everyday character to one of the m classes by the Bayesian or distance classification method;
80: m databases of everyday characters with different similar sounds;
85: build, from the characters, the database of the sentences and names to be recognized;
2: pronounce a desired character;
40: compute the LPCC vectors by least squares;
84: among the m unknown single-tones, find the F unknown single-tones most similar to the pronounced character by Bayesian classification;
86: to recognize a sentence, first cut it into D single-tones and find the F most similar unknown single-tones for each; the sentence is represented by a D x F matrix of unknown single-tones;
90: arrange all characters in the F classes represented by the F most similar unknown single-tones in a matrix by similarity to the desired character and by letter (or stroke count);
100: the desired character should come first, or is easily found in the matrix by its letters (or stroke count);
110: for each single-tone, arrange all characters in the F classes represented by its F most similar unknown single-tones in a column by similarity to that single-tone; the D columns of characters of F similar unknown single-tones can contain the sentence;
120: screen all sentences and names in the database, matching each known character against the characters of the F similar unknown single-tones within a 3-column window;
130: find the most probable sentence or name in the sentence and name database.
Embodiment
The present invention is further described below with reference to embodiments. The embodiments serve only to illustrate the technical solution of the present invention and do not limit it.
The executive routine of the invention is described with Figs. 1 and 2. Fig. 1 shows the establishment of m databases of everyday characters with different similar sounds, i.e., m character groups with different similar sounds. Fig. 2 shows the executive routine by which a user recognizes characters and sentences and inputs characters.
There are m different unknown single-tones, each with samples. The continuous sound wave of an unknown single-tone sample is first converted into digitized signal points 10, and noise or silence is removed 20. The method of the present invention computes, in each small time interval, the sum of the distances between every two consecutive signal points and compares it with the corresponding sum for typical noise or silence; if the former is smaller, the interval contains no speech and is deleted. After deletion, a sequence of signal points of the unknown single-tone remains. The sound wave is first normalized and its features are then extracted: all signal points of the unknown single-tone are divided into E equal intervals, each interval forming one frame, so a single-tone has E equal-length frames 30. The frames have no filter and do not overlap; according to the length of all signal points of the single-tone, the length of the E frames adjusts freely to contain all signal points. The frame is therefore called an elastic frame: its length stretches freely, but the E elastic frames are equally long. This is unlike a Hamming window, which has a filter, half overlap, and a fixed length that cannot adjust freely to the wavelength. A single-tone sound wave changes nonlinearly with time, and the dynamic speech features it carries also change nonlinearly with time. Because the frames do not overlap, the present invention uses fewer elastic frames (E = 12) to contain the whole single-tone sound wave. Since a signal point can be estimated from the preceding signal points, a regression model that changes linearly with time closely approximates the nonlinearly changing sound wave, with the unknown regression coefficients estimated by least squares. Each frame yields one set of least-squares estimates, called the linear predictive coding (LPC) vector, which is then converted into the more stable linear predictive coding cepstrum (LPCC) 40. The sound wave of an unknown single-tone sample thus carries a time sequence of nonlinearly changing dynamic speech features, which the present invention converts internally into E equal-sized LPCC vectors (an E x P LPCC matrix). The mean and variance of the LPCC samples of each unknown single-tone are then computed; each unknown single-tone is represented by its sample mean and variance matrices, and each unknown single-tone represents one class of characters with similar sounds, m classes in total 50. Each everyday character is pronounced once; if the user's pronunciation is inaccurate, accented, dialectal, or in a different language, the user pronounces it. The everyday character is converted into an LPCC matrix 60. The present invention compares, by Bayesian classification, the LPCC of the character with the means of all m unknown single-tones, divides by the variance of each unknown single-tone to compute the Bayesian distance, and places the character in the class of the unknown single-tone with the smallest Bayesian distance; that is, the most similar unknown single-tone among the m unknown single-tones is found by Bayesian classification, and the character is assigned to the everyday-character group of the class that this most similar unknown single-tone represents 70. Characters with similar sounds are all placed in the same class, and all everyday characters are divided into the m classes, giving m databases of everyday characters with different similar sounds 80. Each database may hold different languages, and the same character pronounced differently or in different languages is placed in different classes (databases). From the characters, the database of the sentences and names to be recognized 85 is built.
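The elastic-frame normalization described above can be sketched minimally as follows. This is only an illustration under stated assumptions, not the patented implementation; the function name `elastic_frames` and the NumPy array representation are ours:

```python
import numpy as np

def elastic_frames(samples, E=12):
    """Divide all signal points of a single-tone into E equal,
    non-overlapping frames; the frame length adapts to the waveform
    length, so the frames are 'elastic'."""
    edges = np.linspace(0, len(samples), E + 1, dtype=int)  # E+1 boundaries
    return [samples[edges[i]:edges[i + 1]] for i in range(E)]
```

A short single-tone yields short frames and a long single-tone long frames, but always the same count E, so every single-tone maps to a fixed-size E x P LPCC matrix.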
Fig. 2 is a schematic flowchart of the method of recognizing characters, sentences, and names and inputting characters. The user first pronounces a desired character 2. The single-tone sound wave is digitized into signal points 10 and the noise is removed 20; E elastic frames normalize the sound wave and extract features: all speech signal points of the single-tone are divided into E equal intervals, each interval forming one elastic frame 30. Each single-tone has E elastic frames in total, with no filter and no overlap, stretching freely to contain all signal points. In each frame, because a signal point can be estimated from the preceding signal points, the regression coefficients are estimated by least squares. The set of least-squares estimates produced in each frame, called the linear predictive coding (LPC) vector, has a normal distribution; the LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC) vector 40. The present invention compares, by Bayesian classification, the LPCC of the character with the means of all m unknown single-tones and divides by the variances to compute the Bayesian distances; among the m unknown single-tones it finds the F most similar, i.e., the F unknown single-tones with the F smallest Bayesian distances to the LPCC of the character 84. Among the everyday characters of the m databases with different similar sounds 80, all everyday characters of the F classes represented by the F most similar unknown single-tones are found and arranged by the absolute distance (similarity) between their LPCC and the LPCC of the desired character; the desired character should come first, or the list is further divided into segments, with the characters in each segment arranged by letter (or stroke count). That is, all characters of the F classes are arranged in a matrix 90 by similarity to the desired character and by the letters (or stroke count) of the characters, and the desired character 100 is easily found in the matrix of the F classes by its letters (or stroke count). To recognize a sentence or name, the sentence is first cut into D single-tones and the F most similar unknown single-tones of each single-tone are found; the sentence or name is represented by a D x F matrix of unknown single-tones 86. For each single-tone, all characters in the F classes represented by its F most similar unknown single-tones are arranged in a column by similarity; the D columns of characters of F similar unknown single-tones can contain the sentence or name 110. In the sentence and name database 85, all sentences are screened by matching each known character against the characters of the F similar unknown single-tones within a 3-column window 120, and the most probable sentence or name is found in the database 130. The present invention is detailed below:
(1) After the user pronounces a desired character, the input single-tone sound wave is converted into a series of digitized signal sampled points. The sampled points containing no speech are then deleted. The present invention provides two methods: the first computes the variance of the signal points in a small time interval; the second computes the sum of the distances between adjacent signal points in the interval. In theory, the first method is better, because a signal-point variance larger than the variance of noise or silence indicates that speech is present. In the single-tone recognition experiments of the present invention, however, the two methods give the same recognition rate, and the second is faster.
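The second (faster) method can be sketched as follows. The segment length and the noise threshold here are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def remove_silence(samples, seg_len=256, noise_total=10.0):
    """Keep only the segments whose summed adjacent-point distance
    exceeds the typical total for noise/silence (noise_total);
    segments at or below the threshold carry no speech and are dropped."""
    segs = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    kept = [s for s in segs if np.abs(np.diff(s)).sum() > noise_total]
    return np.concatenate(kept) if kept else samples[:0]
```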
(2) After the signal points without speech are deleted, the remaining signal points represent all signal points of a single-tone. The sound wave is first normalized and its features extracted: all signal points are divided into E equal intervals, each interval forming one frame. The single-tone has E equal-length elastic frames in total, with no filter and no overlap, stretching freely to contain all signal points. The signal points in an elastic frame change nonlinearly with time and are hard to represent with a mathematical model. J. Markhoul, in "Linear Prediction: A tutorial review", Proceedings of IEEE, Vol. 63, No. 4, 1975, showed that a signal point has a linear relation with the preceding signal points, so a regression model that changes linearly with time can estimate the nonlinearly changing signal points. A signal point S(n) can be estimated from the preceding signal points; its estimate S'(n) is given by the following regression model:
S'(n) = Σ_{k=1}^{P} a_k S(n-k),  n ≥ 0    (1)
In (1), a_k, k = 1, ..., P, are the estimates of the unknown regression coefficients, and P is the number of preceding signal points used. The least-squares estimates are computed with the recursive formula of Durbin, given in L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1993; this set of estimates is called the linear predictive coding (LPC) vector. The method of computing the LPC vector of the signal points in a frame is detailed as follows:
Let E_1 denote the sum of squared differences between the signal points S(n) and their estimates S'(n):
E_1 = Σ_{n=0}^{N} [S(n) - Σ_{k=1}^{P} a_k S(n-k)]²    (2)
The regression coefficients are chosen to minimize the total sum of squares E_1. Taking the partial derivative of (2) with respect to each unknown regression coefficient a_i, i = 1, ..., P, and setting it to 0 gives P normal equations:
Σ_{k=1}^{P} a_k Σ_n S(n-k) S(n-i) = Σ_n S(n) S(n-i),  1 ≤ i ≤ P    (3)
Expanding (2) and substituting (3) gives the minimum total squared error E_P:
E_P = Σ_n S²(n) - Σ_{k=1}^{P} a_k Σ_n S(n) S(n-k)    (4)
Equations (3) and (4) convert into
Σ_{k=1}^{P} a_k R(i-k) = R(i),  1 ≤ i ≤ P    (5)
E_P = R(0) - Σ_{k=1}^{P} a_k R(k)    (6)
In (5) and (6), N denotes the number of signal points in the frame, and
R(i) = Σ_{n=0}^{N-i} S(n) S(n+i),  i ≥ 0    (7)
The LPC vector is computed quickly with the recursive formula of Durbin as follows:
E_0 = R(0)    (8)
k_i = [R(i) - Σ_{j=1}^{i-1} a_j^{(i-1)} R(i-j)] / E_{i-1}    (9)
a_i^{(i)} = k_i    (10)
a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)},  1 ≤ j ≤ i-1    (11)
E_i = (1 - k_i²) E_{i-1}    (12)
Cycling through (8)-(12) for i = 1, ..., P yields the least-squares estimates of the regression coefficients a_j, j = 1, ..., P (the linear predictive coding (LPC) vector):
a_j = a_j^{(P)},  1 ≤ j ≤ P    (13)
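Durbin's recursion (8)-(13) can be written compactly as below; this is a textbook sketch under our own naming, with the autocorrelation R(i) of (7) computed directly from the frame:

```python
import numpy as np

def lpc_durbin(frame, P=12):
    """LPC coefficients a_1..a_P of one frame by Durbin's recursion."""
    N = len(frame)
    # eq (7): autocorrelation R(0)..R(P)
    R = np.array([np.dot(frame[:N - i], frame[i:]) for i in range(P + 1)])
    a = np.zeros(P + 1)          # a[j] holds a_j; a[0] is unused
    E = R[0]                     # eq (8)
    for i in range(1, P + 1):
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / E   # eq (9)
        # eqs (10) and (11): the right side uses the old a_j^{(i-1)} values
        a[1:i], a[i] = a[1:i] - k * a[i - 1:0:-1], k
        E *= 1.0 - k * k         # eq (12)
    return a[1:]                 # LPC vector, eq (13)
```

The simultaneous assignment updates a_j^{(i)} from a_j^{(i-1)} before a_i is overwritten, matching the recursion.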
The LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC) vector a'_j, j = 1, ..., P, by the formulas
a'_i = a_i + Σ_{j=1}^{i-1} (j/i) a_{i-j} a'_j,  1 ≤ i ≤ P    (14)
a'_i = Σ_{j=i-P}^{i-1} (j/i) a_{i-j} a'_j,  P < i    (15)
Each elastic frame produces one linear predictive coding cepstrum (LPCC) vector (a'_1, ..., a'_P). The voice identification method of the present invention uses P = 12, because the later LPCC coefficients are almost 0. A single-tone is thus represented by E LPCC vectors; that is, the features of a single-tone are represented by an E x P LPCC matrix.
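Since the method keeps only P cepstral coefficients (the same P as the LPC order), only formula (14) is needed; a minimal sketch under our own naming:

```python
def lpc_to_lpcc(a):
    """Convert the LPC vector a = [a_1, ..., a_P] into the LPCC vector
    [a'_1, ..., a'_P] using eq (14)."""
    P = len(a)
    c = []
    for i in range(1, P + 1):
        # eq (14): a'_i = a_i + sum_{j=1}^{i-1} (j/i) * a_{i-j} * a'_j
        c.append(a[i - 1] + sum(j / i * a[i - j - 1] * c[j - 1]
                                for j in range(1, i)))
    return c
```

For example, a'_1 = a_1 and a'_2 = a_2 + a_1²/2.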
(3) All samples of an unknown single-tone are converted into E LPCC vectors, and the mean and variance of the LPCC samples are computed. The E x P mean and variance matrices of the LPCC samples represent the unknown single-tone. Each unknown single-tone represents one class, m classes in total. Each everyday character is pronounced once; if the user's pronunciation is nonstandard or wrong, or uses another dialect or language, the user pronounces the everyday character. The emitted sound is converted by formulas (8)-(15) into E LPCC vectors. By Bayesian classification, the LPCC of the character is compared with the means of all m unknown single-tones and divided by the variances to compute the Bayesian distances, and the character is placed in the class of the unknown single-tone with the smallest Bayesian distance. That is, the most similar unknown single-tone among the m unknown single-tones is found by Bayesian classification, and the everyday character is assigned to the character group of the class that this most similar unknown single-tone represents. Characters with similar sounds are all placed in the same class, all everyday characters are divided into the m classes, and there are m databases of everyday characters 80 in total. Each database may hold different languages; the same character pronounced differently or in different languages is placed in different classes (databases). From the characters, the database of the sentences and names to be recognized 85 is built.
(4) In Fig. 2, the user pronounces a desired character 2, and the single-tone is converted by formulas (8)-(15) into an E x P LPCC matrix. Let X = {X_jl}, j = 1, ..., E, l = 1, ..., P, denote the LPCC matrix of this single-tone. To compare it quickly with an unknown single-tone c_i, i = 1, ..., m (m denotes the total number of unknown single-tones), the E x P values {X_jl} are assumed to be independent and normally distributed with means and variances (μ_ijl, σ_ijl²), estimated by the sample mean and variance of the unknown single-tone c_i. Let f(x | c_i) denote the conditional density function of X. Following the decision theory in T. F. Li, "Speech recognition of mandarin monosyllables", Pattern Recognition, Vol. 36, 2003, the Bayesian classification is as follows: suppose there are m unknown single-tones c_i in total, and let θ_i, i = 1, ..., m, denote the prior probability that unknown single-tone c_i appears, with Σ_{i=1}^{m} θ_i = 1. Let d denote a decision rule that selects, among the m classes, the unknown single-tone most similar to the desired character. Define a simple loss function, i.e., the misclassification probability of d: if the decision rule d selects a wrong unknown single-tone c_i, d(x) ≠ c_i, the loss is L(c_i, d(x)) = 1; if d selects the right unknown single-tone c_i, d(x) = c_i, there is no loss, L(c_i, d(x)) = 0. The identification method is as follows: let Γ_i, i = 1, ..., m, denote the region of values x of the matrix X that belong to the class of unknown single-tone c_i; that is, when X is in Γ_i, d declares that the character belongs to unknown single-tone c_i, i.e., the most similar unknown single-tone of the character is c_i. The average misclassification probability of d is
R(τ, d) = Σ_{i=1}^{m} θ_i ∫ L(c_i, d(x)) f(x | c_i) dx = Σ_{i=1}^{m} θ_i ∫_{Γ_i^c} f(x | c_i) dx    (16)
In (16), τ = (θ_1, ..., θ_m), and Γ_i^c is the region outside Γ_i. Let D denote all identification methods for the voice, i.e., all ways of partitioning the range of X into the regions of the m unknown single-tones. In D, an identification method d_τ is sought that minimizes the average misclassification probability (16), denoted R(τ, d_τ):
R(τ, d_τ) = min_{d ∈ D} R(τ, d)    (17)
The identification method d_τ satisfying (17) is called the Bayesian classification with respect to the prior probabilities τ. It can be expressed as follows:
d_τ(x) = c_i  if  θ_i f(x | c_i) > θ_j f(x | c_j)    (18)
In (18), j = 1, ..., m, j ≠ i; that is, x belongs to the region of unknown single-tone c_i, Γ_i = {x | θ_i f(x | c_i) > θ_j f(x | c_j) for all j ≠ i}. When all unknown single-tones appear with equal probability, the Bayesian classification is the same as the maximum likelihood method.
(5) When the Bayesian classification (18) selects an unknown single-tone, the conditional density functions of X are first computed for all classes, f(x | c_i), i = 1, ..., m:
f(x | c_i) = [Π_{jl} 1/(√(2π) σ_ijl)] e^{-(1/2) Σ_{jl} ((x_jl - μ_ijl)/σ_ijl)²}    (19)
In (19), i = 1, ..., m (m = the total number of unknown single-tones). For convenience of computation, the logarithm of (19) is taken and the constant is dropped, giving the Bayesian distance (similarity)
l(c_i) = Σ_{jl} ln(σ_ijl) + (1/2) Σ_{jl} ((x_jl - μ_ijl)/σ_ijl)²,  i = 1, ..., m.    (20)
The Bayesian classification (18) thus becomes: compute the value l(c_i) of (20) for each unknown single-tone c_i; l(c_i) is also called the similarity, or Bayesian distance, between the desired character x and unknown single-tone c_i. In (20), x = {x_jl}, j = 1, ..., E, l = 1, ..., P, are the LPCC values of the desired character x, and {μ_ijl, σ_ijl²} are estimated by the sample mean and variance of unknown single-tone c_i.
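The classifier of (20) reduces to a few lines. This sketch assumes the class means and standard deviations are stored as E x P NumPy arrays (our representation, not the patent's):

```python
import numpy as np

def bayes_distance(x, mu, sigma):
    """Eq (20): Bayesian distance between the LPCC matrix x of a pronounced
    character and a class with sample mean mu and standard deviation sigma."""
    return np.sum(np.log(sigma)) + 0.5 * np.sum(((x - mu) / sigma) ** 2)

def most_similar(x, means, sigmas, F=1):
    """Indices of the F unknown single-tones with the smallest distance."""
    d = np.array([bayes_distance(x, m, s) for m, s in zip(means, sigmas)])
    return [int(i) for i in np.argsort(d)[:F]]
```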
(6) After the user pronounces a desired character, the present invention finds, by the Bayesian distance (20), the F most similar unknown single-tones among the m unknown single-tones, i.e., selects the F unknown single-tones with the shortest Bayesian distances (20). The user's desired character is then sought among the everyday characters of the F classes represented by the F most similar unknown single-tones, as follows: if the F classes of similar-sounding everyday characters hold N characters in total, the absolute distance (similarity) between the E x P LPCC matrix of the desired character and the E x P LPCC matrix of each of the N everyday characters is computed, and the characters are arranged from the smallest distance (greatest similarity) to the largest. Because individual everyday characters carry no variance, the Bayesian distance (20) cannot be used to arrange the N similar-sounding everyday characters. The desired character should come first; alternatively, the N arranged everyday characters are divided into several equal segments, with the characters in each segment arranged by letter (or stroke count), so the N everyday characters are arranged into a matrix by similarity and by letter (or stroke count). After the user pronounces, the desired character is sought from top to bottom in the matrix of everyday characters by similarity and by letter (or stroke count), and the user's desired character 100 is easily found.
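The arrangement by absolute distance can be sketched as follows; the character names and data here are illustrative, and the further segmentation by letter or stroke count is omitted:

```python
import numpy as np

def rank_by_distance(x, candidates):
    """candidates: list of (character, lpcc_matrix) pairs from the F classes.
    Sort by total absolute distance to the desired character's LPCC matrix x;
    eq (20) cannot be used here because single characters carry no variance."""
    return sorted(candidates, key=lambda c: float(np.abs(x - c[1]).sum()))
```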
(7) To recognize a sentence or name, the sentence is first cut into D single-tones, and the F most similar unknown single-tones of each single-tone are found; the sentence is represented by a D x F matrix of unknown single-tones 86. For each single-tone, all characters in the F classes represented by its F most similar unknown single-tones are arranged in a column by similarity to that single-tone; the D columns of characters of F similar unknown single-tones can contain the sentence 110. In the sentence and name database 85, all sentences are screened by matching each known character against the characters of the F similar unknown single-tones within a 3-column window 120, and the most probable sentence or name is found in the database 130. The details are as follows:
(a) To recognize a speaker's sentences or names, a sentence and name database is first built, in which every sentence or name is composed of the needed characters.
(b) The sentence or name is cut into D single-tones. In each unit time interval, the sum of the distances between adjacent signal points is computed; if it is too small, the interval is noise or silence. When too many consecutive unit intervals without speech signal accumulate (more than the time between two syllables of an English word), the whole stretch is noise or silence and should be the boundary between two characters, where the sentence is cut. The sentence is cut into D single-tones in total, and each single-tone is converted into an E x P LPCC matrix. For each single-tone, the F most similar unknown single-tones among the m classes are selected by the Bayesian classification (20); the sentence or name is represented by the D x F most similar unknown single-tones. For each single-tone, all characters in the F classes represented by its F most similar unknown single-tones are arranged in a column by absolute distance (similarity) to the single-tone; the D columns of characters of F similar unknown single-tones can contain the sentence or name 110.
(c) If a sentence or name selected from the database for comparison has the same length as the speaker's sentence or name (D characters), the D known characters of the compared sentence or name are matched in order against the D columns of characters of F similar unknown single-tones, checking whether each of the D columns contains the corresponding known character. If the characters of every column of similar unknown single-tones contain the corresponding known character of the compared sentence or name, i.e., all D characters are recognized correctly, then the compared sentence or name is the speaker's sentence or name.
(d) If the compared sentence or name in the database has D-1 or D+1 characters, or the number of correctly recognized characters in (c) is not D, the present invention screens with a 3-column window. For the i-th known character of a compared sentence or name (in the database), the characters of the F similar unknown single-tones in the three columns i-1, i, i+1 of the D columns are matched against it. The number of known characters of the compared sentence or name matched by the D columns is counted and divided by the total number of characters to obtain the probability of the compared sentence or name; the sentence or name with the greatest probability in the database is taken as the sentence or name pronounced by the speaker.
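The 3-column screening can be sketched as follows; `similar_columns` holds, for each of the D cut single-tones, the characters of the F classes most similar to it (all names are our own, and the exact window semantics are our reading of the patent):

```python
def sentence_probability(known_chars, similar_columns):
    """Match each known character of a database sentence against the
    characters in columns i-1, i, i+1 (a 3-column window) and return
    the fraction of known characters matched."""
    D = len(similar_columns)
    hits = 0
    for i, ch in enumerate(known_chars):
        window = set()
        for col in range(max(0, i - 1), min(D, i + 2)):
            window.update(similar_columns[col])
        hits += ch in window
    return hits / len(known_chars)
```

The database sentence with the greatest returned probability would then be taken as the speaker's sentence.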
(8) The present invention provides techniques for correcting characters and sentences or names that fail to be recognized, inputting characters that fail to be input, and adding new characters:
(a) If the desired character cannot be found, the pronunciation may be inaccurate or wrong, or in another language; the character then lies not in the everyday-character sub-databases of the F classes represented by the F similar unknown single-tones but in the sub-database of another class, or in none of the m databases. The present invention provides a remedy: after the user pronounces the desired character, the most similar unknown single-tone is found by the Bayesian distance (20), and the desired character is placed into the everyday-character sub-database of the class this most similar single-tone represents. When the user later pronounces the same single-tone, the desired character will appear in the classes represented by the F most similar unknown single-tones.
(b) If the desired character is not in any of the m classes, then after the desired character is pronounced, the new character is added to the everyday-character sub-database of the class represented by the most similar unknown single-tone.
(c) If a sentence or name cannot be recognized successfully (character by character), the sentence is spoken again; the present invention divides it into D characters and assigns each character, by Bayesian classification, to the class of its most similar unknown single-tone, and recognizing the sentence again can then succeed.
(d) A character may be pronounced in several different ways, e.g., with standard pronunciation, nonstandard pronunciation, wrong pronunciation, or in another language; the present invention places the character in the classes of the corresponding different unknown single-tones. The user may then pronounce the same character in any of these ways, so anyone can use the present invention to recognize characters and sentences and to input characters.
(9) To confirm that the present invention can input characters by voice, the inventors collected 3755 everyday characters (with many repeated pronunciations; no funds were available to build a complete everyday-character database). These 3755 everyday characters were pronounced and assigned, by the method of the invention, into the 659 classes represented by m = 659 unknown single-tones. Characters were then input with the method of the invention: more than ninety percent could be recognized and input by voice, and the remaining characters could also be recognized and input after correction by the eighth method of the present invention. The Chinese part of this specification, except for the mathematical formulas, was entirely recognized and input with the voice identification and input of the present invention. Figs. 3 and 4 are schematic diagrams of inputting part of this specification with the voice identification and input of the present invention, executed in Visual Basic software. The present invention also collected 659 Chinese characters and 155 English words, assigned them by Bayesian classification into the classes of m = 388 unknown single-tones, and recognized 561 Chinese sentences and 70 English sentences with the present invention; after correction by the eighth method, all were recognized successfully. Figs. 5-9 are schematic diagrams of recognizing Chinese and English sentences with the present invention.
It should be stated that the foregoing summary and embodiments are intended to demonstrate the practical application of the technical solutions provided by the present invention and should not be construed as limiting the protection scope of the present invention. Those skilled in the art may make various modifications, equivalent replacements, or improvements within the spirit and principle of the present invention. The protection scope of the present invention is subject to the appended claims.

Claims (6)

1. A method for recognizing all languages by voice and inputting individual characters by voice, characterized in that said method comprises the following steps:
(1) an individual character is an English, Chinese, or other-language word; the pronunciation of one character is one single-tone; the method has m unknown or known single-tones and a database of commonly used characters, and each unknown single-tone has samples;
(2) a front processor deletes the signal points of the sound wave that carry no speech, i.e. noise;
(3) a step of normalizing the sound wave of a single-tone and extracting its features: the sound wave is normalized with E elastic frames and converted into an equal-sized E x P matrix of linear predictive coding cepstrum (LPCC) features;
(4) the mean and variance of the LPCC samples of each unknown single-tone are computed; one E x P matrix of sample means and variances represents one unknown single-tone, and each unknown single-tone represents one class of commonly used characters with similar sounds, giving m classes in total;
(5) a speaker with standard pronunciation pronounces each commonly used character once; if the user's pronunciation is non-standard, or the user speaks another dialect or language, the user pronounces the characters instead;
(6) a step of normalizing the sound wave of each commonly used character and extracting its features: the sound wave is normalized and converted into an E x P matrix of linear predictive coding cepstrum;
(7) a step of simplified Bayesian classification: the E x P LPCC matrix of a commonly used character is compared with the E x P sample mean and variance matrices of each unknown single-tone; the Bayes distance is used to find the unknown single-tone most similar to the pronunciation of the character, and the character is placed into the class of commonly used characters represented by that most similar unknown single-tone;
(8) the user pronounces the desired character, and this single-tone is converted into an E x P matrix of linear predictive coding cepstrum;
(9) with the step of simplified Bayesian classification, the E x P LPCC matrix of the character the user wants is compared with the E x P sample mean and variance matrices of each unknown single-tone, and the Bayes distance is used to find the F unknown single-tones most similar to the desired character;
(10) within the F classes of commonly used characters represented by the F most similar unknown single-tones, the distance between the E x P LPCC matrix of every character and the E x P LPCC matrix of the desired character is computed, and all characters of the F classes are sorted by this distance; after sorting, the character the user wants should appear at the front; alternatively, after sorting, the characters are divided into several equal segments and each segment is ordered by alphabet or stroke count, so that all characters of the F classes are arranged in a matrix by distance to the desired character and by alphabet or stroke count; after the user pronounces, the desired character is looked up in this matrix from top to bottom by its alphabet or stroke count, and it is easy to find the desired character in the matrix;
(11) a step of recognizing sentences and names;
(12) a step of correcting characters, sentences and names that were not recognized successfully, inputting characters that failed to be input, and adding new characters.
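Step (10) of claim 1 can be sketched in code. The sketch below is illustrative only: the band count `n_bands` and the use of stroke counts as the secondary key are assumed values, not fixed by the patent, and `distance_by_word` stands for any distance between a candidate character's LPCC matrix and that of the spoken character.

```python
def rank_candidates(distance_by_word, strokes_by_word, n_bands=3):
    """Arrange all words of the F matched classes (claim 1, step 10):
    sort by distance to the spoken word, split into roughly equal
    bands, then order each band by stroke count (or alphabet for
    English) so the user can scan the list top to bottom."""
    # Primary key: distance between candidate and spoken word.
    words = sorted(distance_by_word, key=distance_by_word.get)
    band = max(1, len(words) // n_bands)
    out = []
    # Secondary key inside each equal band: stroke count.
    for i in range(0, len(words), band):
        out.extend(sorted(words[i:i + band], key=strokes_by_word.get))
    return out
```

Because the primary sort puts the best matches in the first band, the desired character normally sits near the top even after the per-band stroke-count reordering.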
2. The method for recognizing all languages by voice and inputting individual characters by voice according to claim 1, characterized in that deleting the signal points of the sound wave that carry no speech in said step (2) is done by computing, within each small time period, the sum of the distances between adjacent signal points and comparing it with the usual sum of adjacent-point distances of noise; if the former is less than the latter, that time period is deleted.
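The silence/noise deletion rule of claim 2 can be sketched as follows. This is an illustrative sketch only: the frame length and the default threshold (half the median frame activity) are assumptions made here, not values given by the patent, which only requires comparing the per-period sum of adjacent-point distances against a noise level.

```python
import numpy as np

def remove_silence(signal, frame_len=240, noise_level=None):
    """Drop time periods whose summed adjacent-point distance is
    below a noise threshold, keeping only voiced segments
    (claim 2).  frame_len and the default threshold are
    illustrative choices."""
    signal = np.asarray(signal, dtype=float)
    diffs = np.abs(np.diff(signal))          # |S(n+1) - S(n)|
    n_frames = len(signal) // frame_len
    # Sum of adjacent-point distances inside each period.
    frame_sums = [diffs[i * frame_len:(i + 1) * frame_len - 1].sum()
                  for i in range(n_frames)]
    if noise_level is None:
        # Crude assumed noise estimate: half the median activity.
        noise_level = 0.5 * float(np.median(frame_sums))
    kept = [signal[i * frame_len:(i + 1) * frame_len]
            for i in range(n_frames) if frame_sums[i] >= noise_level]
    return np.concatenate(kept) if kept else signal[:0]
```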
3. The method for recognizing all languages by voice and inputting individual characters by voice according to claim 1, characterized in that said step (3) comprises normalizing the sound wave of a single-tone and extracting an equal-sized feature matrix, with the following steps:
(a) a step of equally partitioning the signal points of the sound wave of a single-tone: in order to closely estimate the nonlinearly varying sound wave with a regression model of linear change, the total length of the sound wave is divided into E equal time periods; each period forms one elastic frame; the E equal-length elastic frames of a single-tone have no filter and do not overlap; they stretch freely to cover the sound wave over its whole length, unlike the fixed-length Hamming windows;
(b) within each frame, a regression model that is linear in time is used to estimate the sound wave, which varies nonlinearly in time;
(c) the linear predictive coding vector is computed with Durbin's recursive formula, as follows:
a signal point S(n) can be estimated from the preceding signal points; its estimate S'(n) is expressed by the regression model:
S'(n) = \sum_{k=1}^{P} a_k S(n-k), \quad n \ge 0 \qquad (1)
In formula (1), a_k, k=1,...,P, are the unknown regression coefficients to be estimated, and P is the number of preceding signal points used; this set of estimates is called the linear predictive coding vector. The LPC vector of the signal points in a frame is obtained as follows:
let E_1 denote the sum of squared differences between the signal points S(n) and their estimates S'(n):
E_1 = \sum_{n=0}^{N} \left[ S(n) - \sum_{k=1}^{P} a_k S(n-k) \right]^2 \qquad (2)
The regression coefficients are chosen to minimize the total sum of squares E_1: taking the partial derivative of formula (2) with respect to each unknown regression coefficient a_i, i=1,...,P, and setting it to 0 yields P normal equations:
\sum_{k=1}^{P} a_k \sum_{n} S(n-k) S(n-i) = \sum_{n} S(n) S(n-i), \quad 1 \le i \le P \qquad (3)
Expanding formula (2) and substituting formula (3) gives the minimum total squared error E_P:
E_P = \sum_{n} S^2(n) - \sum_{k=1}^{P} a_k \sum_{n} S(n) S(n-k) \qquad (4)
Formulas (3) and (4) can be converted into
\sum_{k=1}^{P} a_k R(i-k) = R(i), \quad 1 \le i \le P \qquad (5)
E_P = R(0) - \sum_{k=1}^{P} a_k R(k) \qquad (6)
In formulas (5) and (6), N denotes the number of signal points in the frame, and
R(i) = \sum_{n=0}^{N-i} S(n) S(n+i), \quad i \ge 0 \qquad (7)
The linear predictive coding vector is computed quickly with Durbin's recursive formulas as follows:
R(i) = \sum_{n=0}^{N-i} S(n) S(n+i), \quad i \ge 0 \qquad (7)
E_0 = R(0) \qquad (8)
k_i = \left[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \right] / E_{i-1} \qquad (9)
a_i^{(i)} = k_i \qquad (10)
a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \qquad (11)
E_i = (1 - k_i^2) E_{i-1} \qquad (12)
Cycling through formulas (8)-(12) yields the least-squares estimates of the regression coefficients a_j, j=1,...,P; the linear predictive coding vector is:
a_j = a_j^{(P)}, \quad 1 \le j \le P \qquad (13)
Formulas (14) and (15) then convert the LPC vector into the more stable linear predictive coding cepstrum vector a'_i, 1 ≤ i ≤ P:
a'_i = a_i + \sum_{j=1}^{i-1} \left( \frac{j}{i} \right) a_{i-j} a'_j, \quad 1 \le i \le P \qquad (14)
a'_i = \sum_{j=i-P}^{i-1} \left( \frac{j}{i} \right) a_{i-j} a'_j, \quad P < i \qquad (15)
(d) a single-tone is represented by E linear predictive coding cepstrum vectors.
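The feature extraction of claim 3, equations (7) through (14), can be sketched as follows. This is an illustrative sketch under stated assumptions: the frame count `E_frames=12` and LPC order `P=12` are example values chosen here, and only the first P cepstral coefficients (formula (14)) are kept, as in an E x P LPCC matrix.

```python
import numpy as np

def lpcc(frame, P=12):
    """One LPC-cepstrum vector for one elastic frame, following
    equations (7)-(14): autocorrelation, Durbin's recursion for the
    LPC coefficients, then the cepstral recursion."""
    s = np.asarray(frame, dtype=float)
    N = len(s)
    # (7): autocorrelation R(0..P).
    R = np.array([np.dot(s[:N - i], s[i:]) for i in range(P + 1)])
    # (8)-(12): Durbin's recursion; a[1..P] are the coefficients.
    a = np.zeros(P + 1)
    E = R[0]
    for i in range(1, P + 1):
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / E   # (9)
        a_new = a.copy()
        a_new[i] = k                                      # (10)
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]           # (11)
        a = a_new
        E = (1 - k * k) * E                               # (12)
    # (14): LPC -> cepstrum, first P coefficients.
    c = np.zeros(P + 1)
    for i in range(1, P + 1):
        c[i] = a[i] + sum((j / i) * a[i - j] * c[j] for j in range(1, i))
    return c[1:]

def lpcc_matrix(signal, E_frames=12, P=12):
    """E x P LPCC matrix: split the waveform into E equal elastic
    frames (no overlap, no Hamming window, per claim 3(a)) and take
    one LPCC vector per frame.  E_frames=12, P=12 are assumptions."""
    frames = np.array_split(np.asarray(signal, dtype=float), E_frames)
    return np.vstack([lpcc(f, P) for f in frames])
```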
4. The method for recognizing all languages by voice and inputting individual characters by voice according to claim 1, characterized in that said step (7) comprises a step of simplified Bayesian classification, which compares the E x P LPCC matrix of a commonly used character with the E x P sample mean and variance matrices of each unknown single-tone and finds the most similar unknown single-tone, as follows:
(a) the feature of the pronunciation of a commonly used character is expressed by an E x P linear predictive coding cepstrum (LPCC) matrix X={X_jl}, j=1,...,E, l=1,...,P; for fast recognition, the E x P values {X_jl} are assumed to be E x P independent random variables with normal distributions; when the pronunciation of the character is compared with an unknown single-tone c_i, i=1,...,m, among the m unknown single-tones (m being the total number of unknown single-tones), the mean and variance of {X_jl} are
\mu_{ijl} = E(X_{jl}), \qquad \sigma_{ijl}^2 = \mathrm{Var}(X_{jl})
which are estimated by the sample mean and variance of that unknown single-tone; the conditional density function of X is then
f(x \mid c_i) = \left[ \prod_{jl} \frac{1}{\sqrt{2\pi}\,\sigma_{ijl}} \right] e^{-\frac{1}{2} \sum_{jl} \left( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \right)^2}
where X={X_jl} is the LPCC matrix of the pronunciation of the character, and the unknown means and variances (\mu_{ijl}, \sigma_{ijl}) are estimated by the sample means and variances of the unknown single-tone c_i;
(b) the step of simplified Bayesian classification finds, among the m unknown single-tones, the unknown single-tone c_i most similar to the pronunciation X of the character; the similarity of an unknown single-tone c_i to the pronunciation X is expressed by f(x|c_i) in the formula
f(x \mid c_i) = \left[ \prod_{jl} \frac{1}{\sqrt{2\pi}\,\sigma_{ijl}} \right] e^{-\frac{1}{2} \sum_{jl} \left( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \right)^2} ;
(c) for fast recognition, the conditional density function f(x|c_i) in (b) is simplified by taking its logarithm and deleting the constants that need not be computed, giving the Bayes distance
l(c_i) = \sum_{jl} \ln(\sigma_{ijl}) + \frac{1}{2} \sum_{jl} \left( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \right)^2 ;
(d) for each unknown single-tone c_i, i=1,...,m, the value of the Bayes distance l(c_i) in formula (c) is computed;
(e) among the m unknown single-tones, the unknown single-tone c'_i with the smallest Bayes distance l(c'_i) to the pronunciation X of the character is selected and judged to be the unknown single-tone most similar to the character.
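The Bayes distance of claim 4(c) and the selection of the F most similar unknown single-tones can be sketched as follows. This is a minimal sketch: `means` and `sigmas` stand for the per-class E x P sample means and standard deviations, and the matrix shape is arbitrary.

```python
import numpy as np

def bayes_distance(x, mu, sigma):
    """Simplified Bayes distance of claim 4(c):
    l(c_i) = sum ln(sigma_ijl) + 0.5 * sum ((x_jl - mu_ijl)/sigma_ijl)^2,
    where x, mu, sigma are E x P matrices (the input LPCC matrix and
    the class sample mean / standard deviation)."""
    z = (x - mu) / sigma
    return np.sum(np.log(sigma)) + 0.5 * np.sum(z * z)

def top_f_classes(x, means, sigmas, F=3):
    """Indices of the F unknown single-tones with the smallest Bayes
    distance to the input LPCC matrix x (claim 1, step 9)."""
    d = np.array([bayes_distance(x, m, s) for m, s in zip(means, sigmas)])
    return np.argsort(d)[:F]
```

Because the constant factors of the density are dropped, only sums of logarithms and squared standardized residuals are computed, which is the speed advantage the claim describes.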
5. The method for recognizing all languages by voice and inputting individual characters by voice according to claim 1, characterized in that said step (11) comprises a step of recognizing sentences and names, as follows:
(a) to recognize a speaker's sentence or name, a database of sentences and names is first established, in which every character of each sentence or name is taken from the database of commonly used characters;
(b) the sentence or name is cut into D single-tones: for each unit time period, the sum of the distances between adjacent signal points is computed; if it is too small, that period is noise or silence; when too many consecutive unit periods carry no speech signal, longer than the gap between two syllables of an English word, the stretch is entirely noise or silence and marks the boundary between two characters, where the sentence is cut; the sentence is thus cut into D single-tones, and each single-tone is converted into an E x P LPCC matrix; for each single-tone, the F most similar unknown single-tones are selected from the m classes with the step of Bayesian classification, so that a sentence or name is represented by D x F most similar unknown single-tones; all characters in the F classes represented by the F most similar unknown single-tones of one single-tone are arranged in a row by distance to the desired character; the D rows of characters of F similar unknown single-tones should contain the sentence or name, with the desired characters of the sentence or name at the front;
(c) if a sentence or name selected from the database for comparison has the same length as the speaker's sentence or name, the D known characters of the comparison sentence or name are matched in order against the characters of the D rows of similar unknown single-tones; if the characters of every row contain the corresponding known character of the comparison sentence or name, D characters are recognized correctly, and the comparison sentence or name is the speaker's sentence or name;
(d) if a comparison sentence or name in the database has D-1 or D+1 characters, or the number of correctly recognized characters in (c) is not D, the present invention screens with a window of 3 rows: known character i of the comparison sentence or name is matched against the characters of rows i-1, i and i+1 of the F similar unknown single-tones; the number of known characters of the comparison sentence or name found in the D rows is counted and divided by D to obtain the probability of the comparison sentence or name; the sentence or name with the largest probability in the database is selected as the speaker's sentence or name.
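The 3-row-window screening of claim 5(d) can be sketched as follows. This is an illustrative sketch: `rows` stands for the D candidate rows (each row holding the characters of the F similar unknown single-tones for one cut single-tone), and sentences are represented simply as lists of strings.

```python
def sentence_score(candidate, rows):
    """Claim 5(d): known character i of the candidate sentence
    matches if it appears in row i-1, i, or i+1 of the D candidate
    rows; the score is (matched characters) / D."""
    D = len(rows)
    hits = 0
    for i, word in enumerate(candidate):
        window = []
        for r in range(max(0, i - 1), min(D, i + 2)):
            window.extend(rows[r])
        if word in window:
            hits += 1
    return hits / max(D, 1)

def best_sentence(database, rows):
    """Select the database sentence or name with the largest
    probability as the speaker's sentence or name."""
    return max(database, key=lambda s: sentence_score(s, rows))
```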
6. The method for recognizing all languages by voice and inputting individual characters by voice according to claim 1, characterized in that said step (12) comprises a step of correcting characters, sentences and names that were not recognized successfully, inputting characters that failed to be input, and adding new characters, as follows:
(a) if the user cannot find the desired character because the pronunciation is non-standard, wrong, or in another language, the character is not among the commonly used characters of the F classes represented by the F most similar unknown single-tones; it must lie in the commonly used sub-database of another class, or not in the database of m classes at all; after the user pronounces the desired character, the most similar unknown single-tone is found with the Bayes distance, and the desired character is placed into the commonly used sub-database of the class represented by that most similar unknown single-tone; when the user later pronounces the same single-tone, the desired character will appear in the classes represented by the F most similar unknown single-tones;
(b) if the desired character is not in any of the m classes, the user pronounces the desired character, and this new character is added to the commonly used sub-database of the class of the most similar unknown single-tone;
(c) a character may be pronounced with its standard sound, a non-standard sound, a wrong sound, or several different sounds in other languages; the character is then placed into the classes of the different unknown single-tones respectively, and the user may pronounce the same character with any of these sounds;
(d) if a sentence or name cannot be recognized successfully, the sentence is spoken character by character; the present invention divides the sentence or name into D characters, assigns each character with the step of Bayesian classification into the class of its most similar unknown single-tone, and then recognizes the sentence or name again, which will succeed;
(e) the present invention only relocates a character that failed to be recognized or input into the class of its most similar unknown single-tone, without changing the mean and variance of the features of any unknown single-tone; and the F most similar unknown single-tones are always selected from the fixed small set of m=500 classes of unknown single-tones.
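The correction step of claim 6(a) can be sketched as follows. This is an illustrative sketch: `class_words` stands for the per-class sub-databases of commonly used characters, `means`/`sigmas` for the fixed per-class sample statistics, and the Bayes distance is the one defined in claim 4(c); as the claim requires, no class mean or variance is changed.

```python
import numpy as np

def place_word(word, x, class_words, means, sigmas):
    """Claim 6(a): after the user speaks a word that was not found,
    compute the Bayes distance from its LPCC matrix x to every
    unknown single-tone and append the word to the class of the most
    similar one.  Class means and variances stay unchanged."""
    d = [np.sum(np.log(s)) + 0.5 * np.sum(((x - m) / s) ** 2)
         for m, s in zip(means, sigmas)]
    best = int(np.argmin(d))
    if word not in class_words[best]:
        class_words[best].append(word)
    return best
```

On the next utterance of the same sound, the word is found inside the F nearest classes, which is exactly the self-correcting behavior the claim describes.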
CN2009101771072A 2009-09-25 2009-09-25 Method for identifying all languages by voice and inputting individual characters by voice Expired - Fee Related CN102034474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101771072A CN102034474B (en) 2009-09-25 2009-09-25 Method for identifying all languages by voice and inputting individual characters by voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101771072A CN102034474B (en) 2009-09-25 2009-09-25 Method for identifying all languages by voice and inputting individual characters by voice

Publications (2)

Publication Number Publication Date
CN102034474A CN102034474A (en) 2011-04-27
CN102034474B true CN102034474B (en) 2012-11-07

Family

ID=43887279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101771072A Expired - Fee Related CN102034474B (en) 2009-09-25 2009-09-25 Method for identifying all languages by voice and inputting individual characters by voice

Country Status (1)

Country Link
CN (1) CN102034474B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035195B (en) * 2021-03-04 2022-09-23 江西台德智慧科技有限公司 Artificial intelligence voice interaction terminal equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1293428A (en) * 2000-11-10 2001-05-02 清华大学 Information check method based on speed recognition
CN101246686A (en) * 2007-02-15 2008-08-20 黎自奋 Method and device for identifying analog national language single tone by continuous quadratic Bayes classification method
CN101281746A (en) * 2008-03-17 2008-10-08 黎自奋 Method for identifying national language single tone and sentence with a hundred percent identification rate

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1293428A (en) * 2000-11-10 2001-05-02 清华大学 Information check method based on speed recognition
CN101246686A (en) * 2007-02-15 2008-08-20 黎自奋 Method and device for identifying analog national language single tone by continuous quadratic Bayes classification method
CN101281746A (en) * 2008-03-17 2008-10-08 黎自奋 Method for identifying national language single tone and sentence with a hundred percent identification rate

Also Published As

Publication number Publication date
CN102034474A (en) 2011-04-27

Similar Documents

Publication Publication Date Title
TWI396184B (en) A method for speech recognition on all languages and for inputing words using speech recognition
CN101136199B (en) Voice data processing method and equipment
Sisman et al. Group sparse representation with wavenet vocoder adaptation for spectrum and prosody conversion
Wu et al. Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
EP4018437B1 (en) Optimizing a keyword spotting system
US5615299A (en) Speech recognition using dynamic features
Shahin Speaker identification in emotional talking environments based on CSPHMM2s
Shaikh Naziya et al. Speech recognition system—a review
CN114023300A (en) Chinese speech synthesis method based on diffusion probability model
CN114495969A (en) Voice recognition method integrating voice enhancement
US20050015251A1 (en) High-order entropy error functions for neural classifiers
JP2001166789A (en) Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end
CN117043857A (en) Method, apparatus and computer program product for English pronunciation assessment
CN102034474B (en) Method for identifying all languages by voice and inputting individual characters by voice
CN101246686A (en) Method and device for identifying analog national language single tone by continuous quadratic Bayes classification method
Barman et al. State of the art review of speech recognition using genetic algorithm
CN101281746A (en) Method for identifying national language single tone and sentence with a hundred percent identification rate
KR20230094826A (en) Method and apparatus for extracting speaker embedding considering phonemic and context information
Kurian et al. Automated Transcription System for MalayalamLanguage
Ananthakrishna et al. Effect of time-domain windowing on isolated speech recognition system performance
CN102479507B (en) Method capable of recognizing any language sentences
Phuong et al. Development of high-performance and large-scale vietnamese automatic speech recognition systems
Khalifa et al. Statistical modeling for speech recognition
Wu et al. Statistical voice conversion with quasi-periodic wavenet vocoder
Minematsu et al. The acoustic universal structure in speech and its correlation to para-linguistic information in speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121107

Termination date: 20140925

EXPY Termination of patent right or utility model