CN1114438A - Chinese word pronunciation inputting system for computer - Google Patents

Chinese word pronunciation inputting system for computer

Info

Publication number
CN1114438A
Authority
CN
China
Prior art keywords
syllable
speech
compound vowel
chinese
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 94107804
Other languages
Chinese (zh)
Inventor
王骏发
许志兴
吴宗宪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN 94107804 priority Critical patent/CN1114438A/en
Publication of CN1114438A publication Critical patent/CN1114438A/en
Pending legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

A Chinese word speech input method for computers whose main steps are syllable cutting, speech recognition, word matching, recognition of initials and finals, generation of candidate syllables, conversion of speech to characters, and display and selection.

Description

Computer Chinese word speech input method
The present invention relates to a word input method, in particular a method of entering words into a computer by speech. It takes the word as the recognition unit: for words of more than two characters only the final (the vowel part) of each syllable is recognized, while two-character and single-character words are recognized by whole syllable. This improves both the recognition rate and the recognition speed.
Replacing keyboard input of Chinese with speech input is not only the best choice for users who cannot use a keyboard input method, it also makes computers more widely usable. Existing speech input methods mostly recognize one character at a time and suffer from slow recognition, a high error rate, a complicated user interface, and the need for the user to undergo long formal pronunciation training first.
In view of these problems, the object of the present invention is to provide a word speech input method that recognizes the final of each syllable of a word.
The problem that existing speech input faces is the 38 confusable sound groups in Chinese speech recognition. These groups arise for the following reason. In Mandarin, each syllable can be divided into an initial and a final. In general the recognition rate of finals can be made high, because the waveform and the speech feature parameters show that a final is normally a stable periodic signal of relatively long duration. The recognition rate of initials, however, is far lower than that of finals, because (1) most initials are quite short, so general syllable-cutting algorithms cannot isolate them accurately; and (2) most Mandarin speakers have had no formal pronunciation training, and many of them ignore altogether the differences between retroflex and similar sounds such as ㄗ, ㄔ, ㄘ, and ㄙ.
For these reasons, the method of the present invention starts from an observation about how Chinese characters and their phonetic notation combine into words: most words of more than two characters can be told apart not only by the 408 syllables but, among words of the same length, equally well by the combination of their finals. For example:
Republic of China (ㄓㄨㄥ ㄏㄨㄚ ㄇㄧㄣ ㄍㄨㄛ) and Taiwan University (ㄊㄞ ㄨㄢ ㄉㄚ ㄒㄩㄝ): these two words can be distinguished from their finals alone, "ㄨㄥ ㄨㄚ ㄧㄣ ㄨㄛ" and "ㄞ ㄨㄢ ㄚ ㄩㄝ". Such combinations of finals readily reach a high recognition rate in speech recognition, and recognition is also faster, because the method only needs to recognize 38 finals rather than 408 whole syllables. In addition, the 80,000-entry dictionary of the present invention contains 13,000 three-character words and 11,388 four-character words. If every combination of finals could form a word, the average number of three-character and four-character words sharing the same combination of finals would be about 0.23 ($= 13000/38^3$) and 0.005 ($= 11388/38^4$) respectively. In other words, very few words share the same combination of finals; otherwise the method would not be feasible. If the method were applied to two-character words, however, it would cause a combinatorial explosion, because far too many two-character words share the same combination of finals. For two-character words the present invention therefore also recognizes the initials.
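The density estimate above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function name `words_per_combination` is ours, and the word counts are the ones quoted in the text.

```python
NUM_FINALS = 38  # number of finals the method recognizes


def words_per_combination(word_count, word_length, num_finals=NUM_FINALS):
    """Average number of dictionary words per possible combination of finals,
    assuming every combination of finals could form a word."""
    return word_count / num_finals ** word_length


three_char = words_per_combination(13000, 3)   # 13000 / 38**3
four_char = words_per_combination(11388, 4)    # 11388 / 38**4
print(f"three-character: {three_char:.4f}, four-character: {four_char:.4f}")
```

Both averages are well below one word per combination, which is the property the method relies on for words of three or more characters.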
The object of the present invention is therefore to provide a method of entering Chinese words into a computer by speech. It establishes a system architecture comprising syllable cutting, speech recognition, word matching, and a user interface; once speech has passed through this processing flow it is recognized quickly and correctly and entered into the computer.
The computer Chinese word speech input method of the present invention is characterized in that its main steps comprise:
Syllable cutting: splitting each syllable into its initial and final according to the difference in period stability between the speech signals of initials and finals, dividing them into a sequence of speech frames, and computing the speech feature parameters of each frame;
Speech recognition: recognizing the frames with a final Bayesian network and a syllable Bayesian network and producing candidate syllables;
Word matching: converting the candidate syllables into candidate words on the basis of a dictionary organized as a linked-list data structure; and
User interface (data conversion): converting the data corresponding to the candidate words into displayable electrical signals and, by means of the computer screen and keyboard, letting the user choose the correct candidate word on the screen.
Through these steps, syllable cutting first splits each syllable of the user's speech signal into an initial and a final, speech recognition then produces candidate syllables, the word matching step matches candidate words in the linked-list dictionary according to the features of the candidate syllables, and the user finally selects the correct input word from the candidate words through the display and operation of the user interface.
In the syllable cutting step, syllables are cut according to the difference in period stability between initials and finals, and the result is divided into frames.
The speech recognition step has two modes, one for single-character and two-character words and one for words of more than two characters, chosen according to the length of the word: single-character and two-character words are recognized with the whole syllable as the unit by the syllable Bayesian network, while words of more than two characters are recognized by their finals with the final Bayesian network, which produces the candidate syllables.
The dictionary used in the word matching step is stored as a linked-list data structure. A search starts from an index formed by combining the final numbers of the first two syllables of the word, and the dictionary is divided into two parts: two-character words and words of more than two characters.
The detailed principles, system architecture, flow, and effects of the present invention can be fully understood from the following description with reference to the accompanying drawings.
Brief description of the drawings:
Fig. 1: schematic of the steps of the method of the present invention.
Fig. 2: flow chart of the method of the present invention.
Fig. 3: schematic of the search window used to find the stable final region in syllable cutting.
Fig. 4: flow chart of syllable cutting.
Fig. 5: structure of the Bayesian network of the speech recognition system.
Fig. 6: dictionary structure of the word matching system.
Fig. 7: example spectrogram of the initial and final regions of the Mandarin syllable ㄅㄚ.
As shown in Figs. 1 and 2, the word speech input method of the present invention consists of four steps: syllable cutting A, speech recognition B, word matching C, and user interface D. The workflow is as follows. After the speech signal enters the computer through a microphone, syllable cutting A first splits each syllable into its initial and final. Because the recognition mode in speech recognition step B depends on the length of the word, the final of each input syllable is first recognized by the final Bayesian network 21, while the speech features of the initial are retained. Once all syllables of the word have been entered and the number of characters is known, the system decides whether the initials must also be recognized. Initials and finals are both recognized with Bayesian networks serving as reference samples, but initial recognition uses the whole syllable as the recognition unit, whereas final recognition uses only the final itself. Because the initial is recognized after the final, only the initial reference samples that correspond to the top five recognized finals need to be compared, which speeds up recognition. For two-character words and longer words alike, the top five recognized syllables or final numbers (the count can be adjusted by the user) are output as candidate syllables.
After the candidate syllables are produced, word matching C converts sound into text on the basis of dictionary 31, which is designed for fast searching. Once the candidate syllables have been matched to candidate words, user interface D displays the candidate words on the screen ten per page, and the user selects the desired word for output with the number keys.
The function and working principle of each main step are described in detail below.
I. Syllable cutting:
A final is, acoustically, a signal with a stable period, while most initials lack this property. The syllable cutting step therefore separates the initial from the final by judging whether the signal period is stable. The procedure is as follows:
1. Search for the start of the stable region (the start of the final).
As shown in Fig. 3, a search window 25.6 ms long is set at the start of the speech, and the positions of the prominent peaks in the window are found. The intervals between adjacent peaks, marked $F_1, F_2, F_3, \ldots, F_N$ in Fig. 3, are the periods. Because speech is a nonstationary signal, the period is not a fixed value, so the average period is computed first:
$$\bar{F} = \frac{1}{N}\sum_{i=1}^{N} F_i$$
The following condition is then used to judge whether the periods in the window are stable:
$$\frac{|F_i - \bar{F}|}{\bar{F}} < \varepsilon = 0.0005$$
2. If every period in the window satisfies the condition, the window is a stable final region, and the position of its first peak is the start of the final. Otherwise the stable region has not yet been found: the search window is moved 10 ms later and step 1 is repeated until a stable region is found.
3. Once the start of the final has been found, the span from the start of the speech signal to the start of the final is the initial region.
This procedure is shown in Fig. 4.
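The search for the stable region can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say how "prominent" peaks are picked, so the peak picker below (local maxima above half the window maximum) and the requirement of at least two periods per window are our assumptions, as are the names `find_peaks` and `find_final_start`.

```python
def find_peaks(samples, start, end, rel_threshold=0.5):
    """Indices of prominent local maxima inside samples[start:end].
    Assumption: a peak is a local maximum at least rel_threshold times
    the largest absolute value in the window."""
    window = samples[start:end]
    if not window:
        return []
    floor = rel_threshold * max(abs(v) for v in window)
    peaks = []
    for i in range(max(start, 1), min(end, len(samples) - 1)):
        is_local_max = samples[i] > samples[i - 1] and samples[i] >= samples[i + 1]
        if is_local_max and samples[i] >= floor:
            peaks.append(i)
    return peaks


def find_final_start(samples, sample_rate, window_ms=25.6, hop_ms=10.0, eps=0.0005):
    """Return the sample index where the final (stable region) starts, or None."""
    window = round(window_ms * sample_rate / 1000)
    hop = round(hop_ms * sample_rate / 1000)
    start = 0
    while start + window <= len(samples):
        peaks = find_peaks(samples, start, start + window)
        periods = [b - a for a, b in zip(peaks, peaks[1:])]  # F_1 .. F_N
        if len(periods) >= 2:
            mean = sum(periods) / len(periods)               # average period
            if all(abs(f - mean) / mean < eps for f in periods):
                return peaks[0]                              # first peak of stable window
        start += hop                                         # slide the window by 10 ms
    return None
```

Everything before the returned index would then be treated as the initial region. Note that with periods measured in whole samples, a threshold of 0.0005 in practice requires all periods in the window to be equal.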
II. Speech recognition:
As shown in Fig. 2, speech recognition step B comprises a final Bayesian network 21 dedicated to recognizing finals, a syllable Bayesian network 22 that can perform initial recognition, and an online single-syllable training network 23. Its principle and operation are as follows.
After the initial and the final have been separated, each is divided into a sequence of speech frames, and the speech feature parameters of each frame are computed. The number of frames is determined experimentally and is set to 10 in our system. These frames are later either recognized into candidate syllables or used to train the Bayesian networks.
Notes on frames:
1. Definition of a "frame", using a sound waveform as an example:
In speech recognition a sound waveform is conventionally cut into a number of small windows for analysis. Each window contains a fixed number of sample points (typically 300) and is called a frame; adjacent frames may overlap.
2. Take the Mandarin syllable ㄆㄚ, whose initial is ㄆ and whose final is ㄚ. To obtain the frames of the initial and the final, the syllable cutting algorithm first separates the waveforms of ㄆ and ㄚ from the waveform of ㄆㄚ, and the waveforms of the initial ㄆ and the final ㄚ are then each divided into frames.
3. The feature parameter we use is called the "cepstrum". It represents all the sample points of the waveform in a frame (say 300 points) with only 12 values. Regarded as a vector, these 12 values are called the feature vector, and each value is called a feature parameter. Reducing the 300 points to 12 values is called feature extraction; because the procedure is rather complicated and well known, it is not described further.
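The frame definition in item 1 can be illustrated with a short sketch. The text fixes the frame size (300 samples) and, in this system, the number of frames (10), but does not say how the frame positions are chosen; spacing the frames evenly over the segment so that they overlap when necessary is our assumption, and `split_frames` is a hypothetical helper name.

```python
def split_frames(samples, frame_size=300, num_frames=10):
    """Cut a waveform segment into num_frames fixed-size frames.
    Assumption: frame starts are spread evenly from the first sample to the
    last position where a full frame still fits, so frames may overlap."""
    if len(samples) < frame_size:
        raise ValueError("segment is shorter than one frame")
    if num_frames == 1:
        return [samples[:frame_size]]
    span = len(samples) - frame_size
    starts = [round(i * span / (num_frames - 1)) for i in range(num_frames)]
    return [samples[s:s + frame_size] for s in starts]


# Usage: a 1200-sample segment gives ten 300-sample frames starting every 100 samples.
frames = split_frames(list(range(1200)))
```

Each frame would then be reduced to a 12-value cepstral feature vector, a step that is not shown here.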
The Bayesian network is a network model whose theoretical basis is Bayes' theorem. Structurally (Fig. 5) it has four layers: the input layer, the Gaussian layer, the mixture layer, and the induction layer.
The input layer receives the feature parameters of the speech frames to be recognized.
The Gaussian layer is formed from the statistical distribution of the training samples. Because the samples may be widely scattered, a fixed number of nodes with Gaussian distributions is used to model them; how the means and variances of these distributions are obtained is explained later.
The mixture layer is a mixture of Gaussian probability distributions. As noted above, the samples may be widely scattered, so a frame of a reference sample may need a mixture of many distributions to model its true behavior. A mixture distribution is a weighted sum of different Gaussian distributions, so the structure of the Bayesian network must include a weight between each mixture-layer node and each Gaussian-layer node. These weights are also obtained from the distribution of the training samples.
The induction layer simply converts the mixture probability of the mixture layer into a distance (distortion) for output and has no other purpose here. Distance means the degree of difference between the feature parameters of two frames. If each frame is represented by a vector, the distance is the distance between the two vectors $V_1 = (a_1, a_2, a_3, \ldots, a_{12})$ and $V_2 = (b_1, b_2, b_3, \ldots, b_{12})$:
[Formula image A9410780400101: distance between $V_1$ and $V_2$]
Training the Bayesian network and using it for speech recognition are described in turn below.
(1) Training the Bayesian network
Training a Bayesian network mainly means obtaining the Gaussian distribution of each node in the Gaussian layer (that is, its mean and variance) and the weights between the mixture layer and the Gaussian layer. The simplest way to obtain the distributions of the Gaussian layer is by classification. We use the K-MEANS classification algorithm, which works as follows.
Suppose there are $N$ training samples (frames) to be divided into $K$ classes.
1. Set $K$ center points $\mu_K$ at random, initialize the cumulative distance $D_0$, and let $m = 1$.
2. Classify all samples against the $K$ centers. Classification is based on the distance from a sample to each center: the class of the nearest center is the class of the sample. Each sample thus has a distance $d_j$, $1 \le j \le N$, to the center of its class.
3. Add up all $d_j$ to obtain a new cumulative distance $D_m$, and use the condition below to judge whether classification is finished:
$$D_m = \sum_{j=1}^{N} d_j$$
$$\frac{|D_{m-1} - D_m|}{D_m} < \varepsilon = 0.00005$$
4. If the condition does not hold, compute a new center point $\mu_K$ for each class, replace the old $\mu_K$ with it, let $m = m + 1$, and return to step 2 to classify all samples against the new centers. Otherwise classification is complete.
K-MEANS classification algorithm: suppose we have the feature parameters of $N$ frames (that is, $N$ feature vectors) and want to divide them into $K$ classes, $K < N$. Each class has a center point, which is itself a feature vector, and these $K$ center points are used to represent the feature vectors of the original $N$ frames.
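The classification loop in steps 1 to 4 can be sketched as below. This is a minimal sketch under stated assumptions: the text does not specify the distance measure or how the random centers are drawn, so Euclidean distance and initial centers sampled from the training data are our choices, and the `seed` and `max_iter` parameters are added only to make the example reproducible and bounded.

```python
import math
import random


def kmeans(samples, k, eps=0.00005, max_iter=100, seed=0):
    """Cluster feature vectors into k classes; stop when the relative change
    of the cumulative distance D_m falls below eps."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(samples, k)]   # step 1: random centers
    prev_total = math.inf                                  # D_0
    labels = [0] * len(samples)
    for _ in range(max_iter):
        total = 0.0
        for j, x in enumerate(samples):                    # step 2: nearest center
            dists = [math.dist(x, c) for c in centers]
            labels[j] = dists.index(min(dists))
            total += dists[labels[j]]                      # step 3: accumulate d_j
        if total == 0.0 or abs(prev_total - total) / total < eps:
            break                                          # classification finished
        for c in range(k):                                 # step 4: new class centers
            members = [x for x, lab in zip(samples, labels) if lab == c]
            if members:                                    # keep old center if class is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        prev_total = total
    return centers, labels
```

The returned labels are what the later steps need: they give the class of every training frame, from which class variances and the mixture weights can be computed.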
After classification is complete we have not only the center point (mean) of each class but can also compute the variance of each class, because the class of every sample is now known. The class of each sample is also used to compute the weights, as follows.
If each training utterance has 10 frames, the mixture layer has 10 nodes. Suppose 15 utterances are used for training; there are then 150 frames in total, 15 for each frame position from the first to the tenth. The weights between the mixture layer and the Gaussian layer are obtained by counting how many of these frames fall into each class and dividing by the number of training utterances. For example, the first node of the mixture layer represents the mixture of the first frames of all utterances. If the first frames of the 15 utterances are distributed over 4 Gaussian-layer nodes in the numbers 5, 7, 4, and 0, the four weights between the first mixture-layer node and the Gaussian-layer nodes are 5/15, 7/15, 4/15, and 0/15 respectively.
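The counting rule for the weights can be written out as a short sketch. The function name and the input layout (one list of class labels per utterance, one label per frame position) are our assumptions; the usage example uses illustrative counts of 5, 6, 4, and 0, chosen so that they add up to the 15 utterances.

```python
def mixture_weights(labels, num_classes):
    """labels[u][j] is the Gaussian-layer class of frame j of utterance u.
    Returns weights[j][m]: the fraction of utterances whose frame j fell in class m."""
    num_utterances = len(labels)
    num_frames = len(labels[0])
    counts = [[0] * num_classes for _ in range(num_frames)]
    for utterance in labels:
        for j, cls in enumerate(utterance):
            counts[j][cls] += 1
    return [[c / num_utterances for c in row] for row in counts]


# Usage: 15 one-frame utterances whose first frames fall into classes 0, 1, 2
# a total of 5, 6 and 4 times (class 3 is never used).
example_labels = [[0]] * 5 + [[1]] * 6 + [[2]] * 4
weights = mixture_weights(example_labels, num_classes=4)
```

Because every utterance contributes exactly one frame to each frame position, the weights of each mixture-layer node sum to 1.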
Mandarin has 38 finals and 408 syllables, so we built 408 Bayesian networks in total (the 38 final networks are included among them, as shown in Table 1).
(2) Recognition with the Bayesian network:
For ease of explanation, concrete numbers are used below to show how the trained Bayesian networks are used for recognition. Suppose the input speech to be recognized has 10 frames (the mixture layer must then also have 10 nodes). We first compute, for each frame, the mixture probability output by the corresponding mixture-layer node in each network. Take the 38 final networks as an example (they include a null final numbered 0, such as the final of the syllable ㄓ). Let $P_j^i(X_K)$ be the mixture probability output for the $K$-th frame by the $j$-th mixture-layer node of the $i$-th final network:
$$P_j^i(X_K) = \sum_{m=1}^{M} \frac{W_{jm}^i}{\sqrt{2\pi\sigma_{im}^2}} \exp\left[-\frac{1}{2}\,\frac{(\mu_{im} - X_K)^2}{\sigma_{im}^2}\right]$$
where $M$ is the number of Gaussian-layer nodes; $W_{jm}^i$ is the weight between the $j$-th mixture-layer node and the $m$-th Gaussian-layer node of the $i$-th network; $\mu_{im}$ is the mean vector of the $m$-th Gaussian-layer node in the $i$-th network; and $\sigma_{im}$ is the variance vector of the $m$-th Gaussian-layer node in the $i$-th network. The mixture probability is then converted into a distance:
$$d_j^i(X_K) = -\log P_j^i(X_K)$$
To find the correct network we use linear matching:
$$D_i = \sum_{j=1}^{S} d_j^i(X_j)$$
where $S$ is the number of mixture-layer nodes. The correct network $i^*$ is the one with the smallest $D_i$:
$$i^* = \arg\min_{1 \le i \le 38} D_i$$
For fault tolerance we do not select only the single best network but take the top five networks as the candidate syllables for word matching. In other words, $i^*$ (formula image A9410780400141) and the four networks with the next-smallest $D_i$ are output as the candidate syllables.
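The scoring and top-five selection can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the text gives the formula in scalar-like notation, so treating each Gaussian node as having a diagonal covariance (a product of one-dimensional Gaussians over the feature dimensions) is our assumption, as are the dictionary layout of a network (`weights`, `means`, `variances`) and all function names.

```python
import math


def mixture_prob(x, node_weights, means, variances):
    """P_j^i(x): weighted sum of Gaussians for one mixture-layer node.
    Assumption: each Gaussian has a diagonal covariance."""
    total = 0.0
    for w, mu, var in zip(node_weights, means, variances):
        if w == 0.0:
            continue
        log_g = 0.0
        for xd, md, vd in zip(x, mu, var):
            log_g += -0.5 * math.log(2 * math.pi * vd) - 0.5 * (md - xd) ** 2 / vd
        total += w * math.exp(log_g)
    return total


def network_distance(frames, net):
    """D_i: linear matching, frame j is scored by mixture-layer node j."""
    distance = 0.0
    for j, x in enumerate(frames):
        p = mixture_prob(x, net["weights"][j], net["means"], net["variances"])
        distance += -math.log(max(p, 1e-300))  # d = -log P, guarded against log(0)
    return distance


def top_candidates(frames, networks, n_best=5):
    """Names of the n_best networks with the smallest D_i, best first."""
    scored = sorted((network_distance(frames, net), name)
                    for name, net in networks.items())
    return [name for _, name in scored[:n_best]]
```

A network here is a dictionary with one list of weights per mixture-layer node plus the means and variances of its Gaussian-layer nodes; `n_best` corresponds to the adjustable "top five" setting described in the text.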
III. Word matching:
The word matching step (C) converts the candidate syllables produced by speech recognition (B) into Chinese characters for output, mainly on the basis of dictionary 31, which is built for converting syllables into text. Because two-character words and longer words are recognized in different ways, the dictionary 31 of the present invention is divided into two main parts. As shown in Fig. 6, the upper half is the structure for matching two-character words and the lower half is the structure for matching words of more than two characters. In both cases, however, the search starts from an index formed by combining the final numbers of the first two syllables of the word (see Table 1).
To speed up searching and to make it easy to add and delete entries, dictionary 31 of the word matching system (C) is stored as a linked list. Because computer memory is limited, only the linked-list indexes and contents are kept in memory; the actual dictionary (the text) is kept on disk. The data structure of dictionary 31 (Fig. 6) consists of the following nodes:
1. Two-syllable index (TWOVOWEL INDEX) 301: the position of a node represents one combination of two finals and is computed from the numbers of the two finals. For example, the position of the two-syllable index for the finals ㄥ and ㄨㄥ is 12 × 38 + 32. The node contains:
(1) a pointer to the word node (WORDNODE) 304 of the final of the third character;
(2) a pointer to the two-syllable index (TWOWORDNODE) 301 of two-character words.
The functions and contents of the other nodes are as follows:
2. The word node (WORDNODE) mainly records how finals connect to one another. It contains:
(1) the number of the final this node represents;
(2) a pointer to the word node (WORDNODE) 304 of a final that can follow this final;
(3) a pointer to the next word node (WORDNODE) 304 in the same row;
(4) if the path to this node forms a word, a pointer to a word index node (WORDINDEXNODE) 305.
3. The word index node (WORDINDEXNODE) 305 indicates the position of a word in the disk file. It contains:
(1) the position of the word in the disk file;
(2) a pointer to the next word index node (WORDINDEXNODE) 305 with the same combination of finals.
4. The two-syllable index (TWOVOWEL INDEX) 301 is used to search the linked list of two-character words and records initial data. It contains:
(1) the numbers of the initials and tones of the two-character word;
(2) a pointer to the next two-syllable index (TWOVOWEL INDEX) 301 with the same combination of finals;
(3) a pointer to the two-syllable index (TWOVOWEL INDEX) 301 of this two-character word.
5. The two-syllable index (TWOVOWEL INDEX) 301 has the same structure as the word index node (WORDINDEXNODE) 305, but points to the position of a two-character word in the disk file.
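The part of the dictionary that serves words of more than two characters can be sketched as a sibling-linked tree. This is a simplified illustration, not the patent's actual layout: the `Lexicon` class and its method names are hypothetical, the `follow` and `next` attributes stand in for the FOLLOW and NEXT pointers of a word node, a plain Python list replaces the chain of word index nodes, the words are stored directly instead of as disk-file positions, and the final numbers in the usage example other than 12 and 32 are made up.

```python
NUM_FINALS = 38


def two_final_index(first, second):
    """Position in the two-syllable index, e.g. 12 * 38 + 32 for finals 12 and 32."""
    return first * NUM_FINALS + second


class WordNode:
    def __init__(self, final):
        self.final = final    # number of the final this node represents
        self.follow = None    # first node for the final of the next character
        self.next = None      # next node in the same row (alternative final)
        self.words = []       # stands in for the chain of word index nodes


class Lexicon:
    def __init__(self):
        self.heads = [None] * (NUM_FINALS * NUM_FINALS)

    @staticmethod
    def _find(head, final):
        node = head
        while node is not None and node.final != final:
            node = node.next              # walk along the row
        return node

    def add(self, finals, word):
        if len(finals) < 3:
            raise ValueError("this sketch only stores words of 3+ characters")
        slot = two_final_index(finals[0], finals[1])
        head, parent = self.heads[slot], None
        for final in finals[2:]:
            node = self._find(head, final)
            if node is None:              # insert a new node at the front of the row
                node = WordNode(final)
                node.next = head
                if parent is None:
                    self.heads[slot] = node
                else:
                    parent.follow = node
            parent, head = node, node.follow
        parent.words.append(word)

    def lookup(self, finals):
        head = self.heads[two_final_index(finals[0], finals[1])]
        node = None
        for final in finals[2:]:
            node = self._find(head, final)
            if node is None:
                return []
            head = node.follow            # descend to the next character
        return list(node.words) if node else []


# Usage with made-up final numbers for the third and fourth characters.
lexicon = Lexicon()
lexicon.add([12, 32, 1, 20], "word A")
lexicon.add([12, 32, 1, 30], "word B")
```

Looking up `[12, 32, 1, 30]` first jumps to slot 12 × 38 + 32, follows the row until it finds final 1, descends, and follows the next row until it finds final 30.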
With reference to Fig. 6, the operation of dictionary 31 is illustrated by searching for the word "Success Building" (ㄔㄥ ㄍㄨㄥ ㄉㄚ ㄌㄡ):
First, the combination of the finals of the first two characters gives the position in the two-syllable index (TWOVOWEL INDEX) 301. The contents of that entry give the first pointer to a word node (WORDNODE) 304. The final of this node is ㄚ, the same as the final of the third character, so the search for the next character continues from the FOLLOW pointer of this node. The final of the next word node (WORDNODE) 304 is ㄩㄝ, which differs from the final ㄡ of the fourth character, so the search continues from the NEXT pointer of this node. The final of the next word node (WORDNODE) 304 is ㄡ, which matches the final of the fourth character. The combination of finals of "Success Building" has thus been found, and all that remains is to find the position of the word in the disk file through the word index node (WORDINDEXNODE) 305.
IV. User interface system:
In the embodiment, the invention is implemented on an IBM PC/AT personal computer as a memory-resident program, so any other existing software can still be run. Compatibility was tested with several software packages currently on the market, including PEII, HE, DBase, and Turbo C, all of which work normally with the system. In other words, the embodiment and the other application software reside in memory at the same time: the user can enter data with the existing keyboard and can also enter Chinese words directly through the microphone with the present invention. The design of the user interface is briefly described below.
After the user speaks a Mandarin syllable into the microphone, the system detects it automatically, recognizes the syllable, and shows a ">" symbol at the bottom of the screen to tell the user that the character just read has been recognized into candidate syllables. The user can then read the next character. A user who realizes that the last character was misread can delete it with the "DEL" key on the keyboard. Once all the characters of the word have been read, the system converts the candidate syllables into candidate words. The candidate words are displayed at the bottom of the screen ten per page for the user to select with the number keys; the ↑ and ↓ arrow keys page to another group of ten candidate words. If none of the candidate syllables can be matched to a suitable candidate word, a screen is displayed on which the user can correct the syllables (phonetic symbols) by hand, and when the correction is finished the system performs word matching again. In many cases, however, this screen does not appear, because when the first attempt fails to match a candidate word the system automatically increases the number of candidate syllables and matches a second time, and a word that could not be matched the first time usually appears the second time. To make the system more convenient, this embodiment also provides online word addition. Many Chinese words, such as personal names and the names of companies, firms, and organizations, are not fixed and cannot all be put into the dictionary in advance, so this function lets the user add and delete entries easily.
The method of the invention has been tested. The equipment used comprised one IBM PC/AT personal computer, one microphone, and one interface card (an analog-to-digital conversion card, used to convert sound into digital signals that the computer can accept). The test subjects comprised one American woman (she had lived in Taiwan for only about half a year, but she could read the phonetic symbols ㄣ ㄇ ㄈ ..., so her experimental results were included), two children, and five adult men and women. The experiments took place in a laboratory, which was at times a noisy environment, with music from a stereo and other people chatting. Each subject first spent 20 minutes reading the 408 Mandarin syllables into the computer once, to train the reference samples of the syllable Bayesian network. Each subject was then asked to read 50 words generated at random by the computer, ranging in length from two to five characters (the appendix lists the words tested by the American woman), to test the recognition results. Table 2 shows the recognition results of these three groups. For example, Table 2 shows the following for the two children: together they tested 100 words, including 20 two-character words (10 read by each child). Of these 20 words, the numbers correctly recognized within the top five were 3 in first place, 4 in second place, 3 in third place, 2 in fourth place, and 4 in fifth place; in other words, only four fell outside the top five. In the American woman's results, few words were ranked first, but the correct word could nevertheless generally be found within the top five. This is understandable, because her pronunciation was indeed not very clear, and it does little harm: even when the correct word is not ranked first, a friendly user interface lets the user easily pick it from the top ten candidate words with the number keys on the keyboard.
Table 3 presents the statistics in another way: for words of each length, it gives the ratio of the number of words correctly recognized within the top five to the total number of words tested. One conclusion can be drawn from the table: the more characters a word has, the higher its recognition rate. There are two main reasons: 1. the recognition rate for finals is indeed much higher than that for whole syllables; 2. the more characters a word has, the fewer words share the same combination of finals.
The present invention has the following effects:
In summary, the computer Chinese word speech input method of the invention uses fast syllable segmentation and speech recognition steps together with Bayesian networks, and recognizes two-character words separately from words of more than two characters, so that the recognition rate of syllables and the speed of speech recognition are greatly improved. In addition, because the word-matching step uses a dictionary structure that can be searched quickly, the whole recognition system nearly meets the requirement of real-time operation, and it genuinely improves on earlier recognition methods that used the single Chinese character as the recognition unit.
Table 1
No.      1 2 3 4 5 6 7 8 9 10 11
Initial
No.      12 13 14 15 16 17 18 19 20 21
Initial
No.      0 1 2 3 4 5 6 7 8 9 10
Final    ㄚ
No.      11 12 13 14 15 16 17 18 19 20 21
Final    ㄦ ㄧㄚ ㄧㄝ ㄧㄞ ㄧㄠ ㄧㄡ ㄧㄞ ㄧㄣ
No.      22 23 24 25 26 27 28 29 30 31 32
Final    ㄧㄤ ㄧㄥ ㄨㄚ ㄨㄛ ㄨㄞ ㄨㄟ ㄨㄢ ㄨㄣ ㄨㄤ ㄨㄥ
No.      33 34 35 36 37
Final    ㄩㄝ ㄩㄢ ㄩㄣ ㄩㄥ
Table 2
                Two-character words     Three-character words
Rank             1    2    3    4    5     1    2    3    4    5
Children (2)     3    4    3    2    4     6    4    2    2    4
Adults (5)       5    7   12    9    7    20   10    5   10    0
American (1)     0    2    1    1    1     3    1    3    1    0

                Four-character words    Five-character words
Rank             1    2    3    4    5     1    2    3    4    5
Children (2)    22    8    4    2    0    16    3    0    0    0
Adults (5)      45   32   12    9    0    48    2    0    0    0
American (1)     7    5    0    5    1     9    0    0    0    0
Table 3
                Two-character   Three-character   Four-character   Five-character
Children (2)    16/20           18/20             30/40            19/20
Adults (5)      40/50           45/50             98/100           50/50
American (1)    5/10            8/10              18/20            9/10
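As a worked check of how Table 3 relates to Table 2, the following Python sketch sums the rank counts for the five adult speakers and prints the resulting top-five ratios. The numbers are copied from the adult rows of Tables 2 and 3; the variable and function names are illustrative assumptions.

```python
from fractions import Fraction

# Counts of words recognized at ranks 1 to 5 by the five adult speakers,
# keyed by word length in characters (adult rows of Table 2).
adult_rank_counts = {
    2: [5, 7, 12, 9, 7],
    3: [20, 10, 5, 10, 0],
    4: [45, 32, 12, 9, 0],
    5: [48, 2, 0, 0, 0],
}

# Number of words tested per word length (denominators in Table 3).
adult_totals = {2: 50, 3: 50, 4: 100, 5: 50}


def top_five_rate(rank_counts, total):
    """Fraction of tested words whose correct answer ranked in the top five."""
    return Fraction(sum(rank_counts), total)


for length in sorted(adult_rank_counts):
    hits = sum(adult_rank_counts[length])
    rate = float(top_five_rate(adult_rank_counts[length], adult_totals[length]))
    print(f"{length}-character words: {hits}/{adult_totals[length]} = {rate:.0%}")
```

For the adult group the top-five rate rises from 80% for two-character words to 100% for five-character words, which is the trend the description attributes to longer words sharing fewer final combinations.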

Claims (4)

1. A computer Chinese word speech input method, characterized in that its main steps comprise:
Syllable segmentation: dividing each syllable into its initial and final according to the difference in periodic stability between the speech signals of initials and finals, forming a sequence of speech frames, and determining the speech feature parameters of each frame;
Speech recognition: recognizing the frames by means of a final Bayesian network and a syllable Bayesian network, and producing candidate syllables;
Word matching: using a dictionary organized as a serial (list) data structure as the basis for converting the candidate syllables into candidate words; and
User interface (data conversion): converting the data corresponding to the candidate words into displayable electrical signals and, by means of the computer screen and keyboard, allowing the user to choose the correct candidate word on the screen;
wherein, through the steps of syllable segmentation, speech recognition, word matching, and user interface, the syllables of the speech signal input by the user are first segmented into initials and finals by syllable segmentation, candidate syllables are then produced by speech recognition, candidate words are then matched from the dictionary of the serial (list) data structure by the word-matching step according to the features of the candidate syllables, and the user finally selects the correct input word from the candidate words through the display and operation of the user interface.
2. The computer Chinese word speech input method as claimed in claim 1, characterized in that, in the syllable segmentation step, syllables are segmented according to the difference in periodic stability between initials and finals, and are distinguished frame by frame.
3. The computer Chinese word speech input method as claimed in claim 1, characterized in that the speech recognition step is divided into two modes, one for single characters and two-character words and one for words of more than two characters: the length of the word is first identified from the final features; single characters and two-character words are then recognized in units of whole syllables by the syllable Bayesian network, while words of more than two characters are recognized by their finals with the final Bayesian network, thereby producing the candidate syllables.
4. The computer Chinese word speech input method as claimed in claim 1, characterized in that the dictionary used in the word-matching step adopts a serial (list) data structure, in which the search begins with the combination of the final numbers of the first two syllables of a word as the index, and the dictionary is divided into two main parts: two-character words and words of more than two characters.
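The dictionary search of claim 4 can be pictured with a short Python sketch of the part of the dictionary that holds words of more than two characters. It is a minimal illustration under stated assumptions, not the patented data structure: a hash map of lists stands in for the serial structure, the names `build_index` and `lookup` are hypothetical, and the words and final numbers below are placeholders rather than values taken from Table 1.

```python
from collections import defaultdict


def build_index(lexicon):
    """Group words by the final numbers of their first two syllables.
    lexicon maps each word to a tuple of final numbers, one per syllable."""
    index = defaultdict(list)
    for word, finals in lexicon.items():
        index[(finals[0], finals[1])].append((word, finals))
    return index


def lookup(index, candidate_finals):
    """Return words consistent with the candidate finals at every position.
    candidate_finals holds one list of candidate final numbers per syllable."""
    results = []
    for f1 in candidate_finals[0]:
        for f2 in candidate_finals[1]:
            # Only the bucket keyed by the first two finals is scanned.
            for word, finals in index.get((f1, f2), []):
                if len(finals) == len(candidate_finals) and all(
                    f in cands for f, cands in zip(finals, candidate_finals)
                ):
                    results.append(word)
    return results


# Placeholder three-syllable entries with arbitrary final numbers.
lexicon = {"w1": (3, 7, 12), "w2": (3, 7, 5), "w3": (4, 7, 12)}
index = build_index(lexicon)
print(lookup(index, [[3], [7, 9], [12, 1]]))
```

Keying on the first two finals means that only a small bucket of entries is compared against the remaining candidate finals, which is one plausible reading of why the description calls the dictionary search fast.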
CN 94107804 1994-06-30 1994-06-30 Chinese word pronunciation inputting system for computer Pending CN1114438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 94107804 CN1114438A (en) 1994-06-30 1994-06-30 Chinese word pronunciation inputting system for computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 94107804 CN1114438A (en) 1994-06-30 1994-06-30 Chinese word pronunciation inputting system for computer

Publications (1)

Publication Number Publication Date
CN1114438A true CN1114438A (en) 1996-01-03

Family

ID=5033226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 94107804 Pending CN1114438A (en) 1994-06-30 1994-06-30 Chinese word pronunciation inputting system for computer

Country Status (1)

Country Link
CN (1) CN1114438A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063282B (en) * 2009-11-18 2014-08-13 上海果壳电子有限公司 Chinese speech input system and method

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C01 Deemed withdrawal of patent application (patent law 1993)
WD01 Invention patent application deemed withdrawn after publication