CN101067780B

CN101067780B - Character inputting system and method for intelligent equipment

Info

Publication number: CN101067780B
Application number: CN2007101124124A
Authority: CN
Inventors: 张会鹏
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Shenzhen Shiji Guangsu Information Technology Co Ltd
Priority date: 2007-06-21
Filing date: 2007-06-21
Publication date: 2010-06-02
Anticipated expiration: 2027-06-21
Also published as: CN101067780A

Abstract

The invention discloses a character input system for an intelligent device, comprising: a speech receiving module for receiving speech; a speech parameter library for storing the correspondence between speech and pinyin; a conversion module for converting the speech signal received by the speech receiving module into the corresponding pinyin; and a character generation module for generating characters from the pinyin produced by the conversion module. The invention also discloses a character input method for an intelligent device, which stores the correspondence between speech and pinyin in advance, converts received speech into the corresponding pinyin according to that correspondence, and then generates characters from the converted pinyin.

Description

Character input system and method for an intelligent device

Technical field

The present invention relates to text processing technology for intelligent devices, and in particular to a character input system and a character input method for an intelligent device.

Background art

Computer users usually rely on Chinese input software to enter Chinese characters into an intelligent device. Chinese input software is a utility that runs on the computer's operating system and converts codes typed on the keyboard, or media data supplied by non-keyboard means, into Chinese characters. It can be divided into keyboard input software and non-keyboard input software.

Keyboard-based Chinese input software is currently the most mature and the most widely used. It enters Chinese characters through the keyboard according to a particular encoding rule.

English has only 26 letters, which map directly onto 26 keys of the keyboard, so English needs no input software: the letters can simply be typed. Chinese has tens of thousands of characters, none of which corresponds to a key. To enter Chinese characters into a computer, the characters must therefore be encoded and the codes bound to keys on the keyboard; the user then types the code of a character, and the code is converted into that character.

Several hundred Chinese character encoding schemes exist, and dozens of them have been run on computers. As a pictographic script, Chinese expresses each character jointly through its sound, shape, and meaning, so nearly all encoding methods associate sound, shape, or meaning with specific keys and then combine those keys differently for different characters to complete the input.

Non-keyboard Chinese input software includes handwriting input software, optical character recognition (OCR) input software, speech input software, and so on.

Handwriting input software recognizes handwritten Chinese in a pen-based environment. It suits the Chinese habit of writing with a pen: the user writes on a handwriting tablet in the usual way, and the computer recognizes and displays the characters. Handwriting input software requires a matching hardware tablet; characters are written on the tablet with a pen (any kind of hard-tipped pen will do), which is convenient and fast and gives a fairly low character error rate. Characters can also be drawn with the mouse in a designated area and then converted by the handwriting software, but this requires very skilled mouse control.

OCR input software requires the manuscript to be converted into an image by a scanner first, after which the image is converted into text. This method therefore needs a scanner. The better the print quality of the original, the higher the recognition accuracy, so printed material such as books and magazines generally works best. If the paper of the original is thin, the images and text on its reverse side may show through during scanning and interfere with the final recognition result.

Speech input converts the operator's speech into Chinese characters by computer speech recognition, and is therefore also called voice-controlled input. The user speaks into a microphone connected to the computer; a speech recognition system analyzes the speech and identifies the characters or phrases; the recognized characters are shown in an editing area and then passed, through a "send" function, into the other document being edited on the computer.

The benefit of speech input is that typing by hand is no longer needed and both hands are freed; anyone who can pronounce a character can enter it, which makes the method simple and fast.

However, existing speech input methods mainly store in the computer, in advance, the correspondence between sound signals and Chinese characters. When speech is entered, it is converted into a sound signal, that signal is compared with the sound signals already stored in the computer, and the corresponding character is selected and entered into the computer. Chinese has a very large number of characters, more than 80,000, each with its own sound signal, and pronunciation varies greatly from person to person. Converting speech directly into Chinese characters is therefore difficult, the character error rate is very high, and input accuracy suffers considerably.

Summary of the invention

In view of this, the main object of the present invention is to provide a character input system for an intelligent device that both increases input speed and reduces the difficulty of converting speech into characters, thereby improving the accuracy of character input.

Another object of the present invention is to provide a character input method for an intelligent device that likewise increases input speed, reduces the difficulty of converting speech into characters, and improves the accuracy of character input.

To achieve the above objects, the main technical solution of the present invention is as follows:

A character input system for an intelligent device, the system comprising:

A speech receiving module, for receiving speech;

A speech parameter library, for storing the correspondence between speech and pinyin;

A speech type discrimination module, for storing voice commands in advance and for judging whether the speech received by the speech receiving module is a stored voice command; if so, the voice command is sent to the character generation module; otherwise, the speech is sent to the conversion module;

A conversion module, for converting the received speech signal into the corresponding pinyin according to the correspondence stored in the speech parameter library;

A character generation module, for generating candidate characters from the pinyin produced by the conversion module, and for selecting the characters finally input from the candidate characters according to the voice command supplied by the speech type discrimination module.

Preferably, the correspondence between speech and pinyin is a correspondence between phonetic elements and syllables, and the character input system further comprises:

A speech bank, for recording speech sequences;

A syllable establishing module, for establishing the syllable corresponding to each phonetic element of each speech sequence recorded in the speech bank, and for storing the correspondence between each phonetic element and its syllable in the speech parameter library.

Preferably, the system further comprises:

A training probability parameter module, for statistically generating the training probability parameters of each syllable from the speech sequences in the speech bank, their phonetic elements, and the syllables corresponding to those phonetic elements, and for storing the training probability parameters in the speech parameter library.

Preferably, the conversion module specifically comprises:

A decomposition module, for decomposing the speech signal into at least one phonetic element;

A candidate pinyin generation module, for selecting, starting from the first phonetic element obtained by the decomposition, one syllable in turn from the syllables corresponding to each phonetic element, to form candidate pinyin strings;

A probability-of-occurrence calculation module, for calculating the probability of occurrence of each candidate pinyin string according to the training probability parameters;

A selection unit, for selecting the candidate pinyin string with the highest probability of occurrence as the pinyin converted from the speech signal; or for selecting and outputting more than one candidate pinyin string with relatively high probability of occurrence, and determining the pinyin finally converted from the speech signal according to an externally input selection instruction.

Preferably, the conversion module specifically comprises:

A candidate pinyin generation module, for looking up in turn, starting from the first phonetic element obtained by the decomposition, all syllables corresponding to each phonetic element, to form candidate pinyin for phrases or single characters;

A probability-of-occurrence calculation module, for calculating the probability of occurrence of each candidate pinyin according to the training probability parameters;

A selection unit, for outputting the candidate pinyin in order of probability of occurrence, and determining the pinyin finally converted from the speech signal according to an externally input selection instruction.

Preferably, the character generation module specifically comprises:

A candidate character generation module, for generating, from the pinyin produced by the conversion module, a candidate character list containing at least one candidate character;

A result generation module, for outputting the generated candidate character list and detecting whether a voice command has been received from the speech type discrimination module; when a voice command is received, the module selects characters from the candidate character list according to that command and outputs the selected characters;

Accordingly, when the speech type discrimination module judges that the received speech is a stored voice command, it sends the voice command to the result generation module.

Preferably, the result generation module specifically comprises a voice command matching module, for storing the correspondence between voice commands and candidate positions in the candidate character list, and for matching a received voice command against the candidate positions in the candidate character list according to that correspondence; when the match succeeds, the candidate character at the matched position is selected as the characters finally input by the character input system.

Preferably, the result generation module is connected to an external keyboard to receive keyboard instructions;

The result generation module further comprises a physical-contact instruction matching module, for storing the correspondence between physical-contact instructions and candidate positions in the candidate character list, and for matching a received keyboard instruction against the candidate positions in the candidate character list according to that correspondence; when the match succeeds, the candidate character at the matched position is selected as the characters finally input by the character input system.

A character input method for an intelligent device, in which the correspondence between speech and pinyin, and voice commands, are stored in advance; the method further comprises:

A. receiving speech;

B. judging whether the received speech is a stored voice command; if so, performing step C; otherwise, converting the received speech into the corresponding pinyin according to the stored correspondence between speech and pinyin;

C. generating candidate characters from the converted pinyin, receiving and recognizing a voice command, and selecting the characters finally input from the candidate characters according to the voice command.

Preferably, the correspondence between speech and pinyin is a correspondence between phonetic elements and syllables;

The correspondence between speech and pinyin is stored in advance as follows:

Recording speech sequences and storing each recorded speech sequence in a speech bank;

Establishing the syllable corresponding to each phonetic element of each speech sequence in the speech bank;

Storing the correspondence between each phonetic element and its syllable.

Preferably, the method further comprises: statistically generating the training probability parameters of each syllable from the speech sequences, phonetic elements, and corresponding syllables in the speech bank;

In step B, the speech is converted into pinyin as follows:

B1. decomposing the speech into at least one phonetic element, and looking up all syllables corresponding to each phonetic element;

B2. starting from the first phonetic element, selecting one syllable in turn from the syllables corresponding to each phonetic element, to form candidate pinyin strings;

B3. calculating the probability of occurrence of each candidate pinyin string according to the training probability parameters;

B4. selecting the candidate pinyin string with the highest probability of occurrence as the pinyin converted from the speech signal; or selecting and outputting more than one candidate pinyin string with relatively high probability of occurrence, and determining the pinyin finally converted from the speech signal according to an externally input selection instruction.

Alternatively, steps B2 to B4 are:

B2. starting from the first phonetic element obtained by the decomposition, combining in turn the syllables corresponding to each phonetic element into candidate pinyin for phrases or single characters;

B3. calculating the probability of occurrence of each candidate pinyin according to the training probability parameters;

B4. outputting the candidate pinyin in order of probability of occurrence, and determining the pinyin finally converted from the speech signal according to the user's selection instruction.
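The candidate generation and selection steps can be sketched in Python as follows. This is only an illustrative sketch: the element identifiers (`e1`, `e2`, `e3`), the syllable alternatives, and the scores in `TOY_SCORE` are hypothetical values that do not come from the patent, and the scoring function stands in for the probability of occurrence computed from the training probability parameters.

```python
from itertools import product

def candidate_pinyin_strings(elements, syllables_for):
    """Form every candidate string by picking one syllable per phonetic element."""
    options = [syllables_for[e] for e in elements]
    return ["'".join(choice) for choice in product(*options)]

def top_n(candidates, score, n=1):
    """Return the n candidates with the highest score, best first."""
    return sorted(candidates, key=score, reverse=True)[:n]

# Hypothetical lookup result: each element may map to several syllables.
SYLLABLES_FOR = {"e1": ["zhong", "zong"], "e2": ["guo"], "e3": ["ren", "reng"]}
# Hypothetical probabilities of occurrence (placeholder for step B3).
TOY_SCORE = {"zhong'guo'ren": 0.6, "zong'guo'ren": 0.3}

candidates = candidate_pinyin_strings(["e1", "e2", "e3"], SYLLABLES_FOR)
best = top_n(candidates, lambda s: TOY_SCORE.get(s, 0.0), n=1)
shortlist = top_n(candidates, lambda s: TOY_SCORE.get(s, 0.0), n=2)
```

With `n=1` the sketch corresponds to taking the single most probable string; with a larger `n` it corresponds to outputting several high-probability strings and leaving the final choice to a selection instruction.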

Preferably, the training probability parameters of a syllable comprise an initial probability parameter, a transition probability parameter, and an emission probability parameter, wherein:

The initial probability parameter is generated as M/N, where M is the number of times a given syllable appears at the head of the pinyin string corresponding to a speech sequence, and N is the total number of speech sequences recorded in the speech bank;

The transition probability parameter is generated as O/P, where O is the number of times two syllables co-occur in the speech bank, and P is the total number of times the first of the two syllables occurs in the speech bank;

The emission probability parameter is generated as Q/R, where Q is the total number of times the phonetic element corresponding to a given syllable occurs in the speech bank, and R is the total number of times that syllable occurs in the speech bank;

Step B3 is specifically: multiplying together the initial probability parameter, the transition probability parameters, and the emission probability parameters of the syllables in a candidate pinyin string; the resulting value is the probability of occurrence of that candidate pinyin string.
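The three ratios and the product in step B3 can be illustrated with a small Python sketch. It rests on several assumptions that are not stated in the text: the recorded sequences are represented as lists of (phonetic element, syllable) pairs, "co-occur" is read as one syllable immediately following the other, and the product is taken in the manner of a hidden Markov model (one initial term, one transition term per adjacent pair, one emission term per position). The toy data and element identifiers are hypothetical.

```python
from collections import Counter

# Hypothetical annotated recordings: (phonetic_element, syllable) pairs.
SPEECH_BANK = [
    [("e1", "wo"), ("e2", "men"), ("e3", "shi")],
    [("e1", "wo"), ("e3", "shi")],
    [("e4", "shi"), ("e2", "men")],
]

def train(bank):
    """Estimate initial (M/N), transition (O/P) and emission (Q/R) parameters."""
    n_sequences = len(bank)                                   # N
    first = Counter(seq[0][1] for seq in bank)                # M per syllable
    total = Counter(s for seq in bank for _, s in seq)        # P and R
    pairs = Counter((a[1], b[1]) for seq in bank
                    for a, b in zip(seq, seq[1:]))            # O per pair
    emitted = Counter((s, e) for seq in bank for e, s in seq) # Q per pair
    initial = {s: m / n_sequences for s, m in first.items()}
    transition = {(a, b): o / total[a] for (a, b), o in pairs.items()}
    emission = {(s, e): q / total[s] for (s, e), q in emitted.items()}
    return initial, transition, emission

def string_probability(elements, syllables, params):
    """Multiply the parameters along one candidate pinyin string."""
    initial, transition, emission = params
    p = initial.get(syllables[0], 0.0)
    for prev, cur in zip(syllables, syllables[1:]):
        p *= transition.get((prev, cur), 0.0)
    for element, syllable in zip(elements, syllables):
        p *= emission.get((syllable, element), 0.0)
    return p

params = train(SPEECH_BANK)
p = string_probability(["e1", "e2", "e3"], ["wo", "men", "shi"], params)
```

Unseen syllables or pairs receive probability zero in this sketch; a practical system would likely smooth the estimates, which the text does not discuss.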

Preferably, generating candidate characters from the converted pinyin in step C is: generating and displaying, from the converted pinyin, a candidate character list containing at least one candidate character;

Selecting the characters finally input in step C is: C1. selecting characters from the candidate character list according to the voice command, and inputting the selected characters into the intelligent device.

Preferably, the method further comprises: storing in advance the correspondence between voice commands and candidate positions in the candidate character list;

Step C1 is specifically: matching the received voice command against the candidate positions in the candidate character list according to that correspondence; when the match succeeds, the candidate character at the matched position is taken as the characters finally input.

Preferably, the method also comprises: storing in advance the correspondence between keyboard instructions and candidate positions in the candidate character list;

The method further comprises: after a keyboard instruction is detected, matching the detected keyboard instruction against the candidate positions in the candidate character list according to that correspondence; when the match succeeds, the candidate character at the matched position is taken as the characters finally input.

The present invention first converts the speech signal into pinyin and then processes the pinyin to generate characters. Compared with existing keyboard input methods, the invention is therefore simple and efficient to use, increases the speed of character input, and so improves working efficiency. Compared with existing speech input methods, the invention converts speech into pinyin first and pinyin into characters second, so the intelligent device stores the correspondence between speech and pinyin; because there are far fewer pinyin than Chinese characters, the amount of speech that must be stored and recognized is greatly reduced. Besides being simple and efficient, the invention therefore also reduces the difficulty of converting speech directly into characters, Chinese characters in particular, and improves the accuracy of character input.

The invention further specifies the correspondence between speech and pinyin as a correspondence between phonetic elements and syllables. Chinese has only 403 syllables, far fewer than the number of pinyin strings, so the amount of stored speech can be reduced further, making character input simpler and faster.

The invention also provides a speech bank in which speech can be recorded in advance. Training probability parameters of the syllables are generated from the recorded speech and used to make a further selection among the pinyin converted from the speech, so that the pinyin with the highest probability is converted into Chinese characters. The invention can thus avoid, to the greatest extent, the low input accuracy caused by the many and non-standard pronunciations of Chinese characters, and further improves the accuracy of Chinese character input.

In addition, when converting pinyin into characters, the invention first generates candidate characters and then lets the user select the characters to be input by a voice command or a physical-contact instruction (for example a keyboard instruction or a touch-screen instruction), which further simplifies the input procedure. The user is free to select characters by voice, by physical contact, or by a combination of the two, and so has greater flexibility during character input.

Description of drawings

Fig. 1 is a schematic structural diagram of the character input system of the present invention;

Fig. 2 is a schematic structural diagram of the conversion module in the character input system of the present invention;

Fig. 3 is a schematic diagram of a display of candidate characters generated by the character input system of the present invention;

Fig. 4 is a structural diagram of the candidate character generation module of the character input system of the present invention;

Fig. 5 is a schematic structural diagram of the result generation module of the character input system of the present invention;

Fig. 6 is a flowchart of the character input method for an intelligent device of the present invention;

Fig. 7 is a schematic diagram of two candidate pinyin strings with relatively high probability of occurrence;

Fig. 8 is a schematic diagram of outputting the pinyin corresponding to phrases or single characters in order of probability of occurrence;

Fig. 9 is a schematic diagram of the candidate word list for an example pinyin string;

Fig. 10 is a schematic diagram of the candidate word list of Fig. 9 after simplification;

Fig. 11 is a schematic diagram of the display of the candidate word list generated in Fig. 10.

Detailed description

The present invention is described in further detail below through specific embodiments and the accompanying drawings.

The core idea of the invention is: store the correspondence between speech and pinyin in advance; when characters are to be entered by speech, first receive the speech signal, convert the received speech signal into the corresponding pinyin according to the stored correspondence between speech and pinyin, and then generate characters from the resulting pinyin.

The intelligent device of the invention may be any device with intelligent information processing capability, for example a computer, a smartphone, or a palmtop computer. The invention is described here using a computer as the example.

The characters of the invention may be Chinese characters, in which case the pinyin is Hanyu Pinyin; they may also be the characters of another script whose pronunciation is based on phonetic spelling, for example Korean, in which case the pinyin is the phonetic spelling of that script. The embodiments here use Chinese and Hanyu Pinyin as the example.

Fig. 1 is a schematic structural diagram of the character input system of the present invention. Referring to Fig. 1, the character input system mainly comprises:

A speech receiving module 101, connected to a microphone external to the computer, for example the microphone of a headset attached to the computer, for receiving speech signals. The speech receiving module 101 may use existing speech reception technology: the user speaks the Chinese characters to be entered into the microphone, and the speech receiving module 101 receives the speech signal and converts it to digital form.

A speech parameter library 102, for storing the correspondence between speech and pinyin. The correspondence may be between phonetic elements and syllables, or between a specific utterance and a specific pinyin string. A phonetic element is the pronunciation of a single Chinese character.

A conversion module 103, which may be connected directly to the speech receiving module 101 and to the speech parameter library 102, for converting the speech signal received by the speech receiving module 101 into the corresponding pinyin according to the correspondence stored in the speech parameter library 102.

A character generation module 104, for generating characters from the pinyin produced by the conversion module 103, and further for passing the generated characters to the display device and/or storage device of the intelligent device for display and/or storage.

The character input system of the invention may further comprise:

A speech bank 105, for recording speech sequences.

A syllable establishing module 106, for establishing the syllable corresponding to each phonetic element of each speech sequence recorded in the speech bank 105, and for storing the correspondence between each phonetic element and its syllable in the speech parameter library 102.

The invention can use the speech bank 105 and the syllable establishing module 106 to set up the correspondence between phonetic elements and syllables.
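One way to picture how modules 105 and 106 populate the library 102 is the Python sketch below. It assumes, purely for illustration, that each recorded sequence has already been annotated as (phonetic element, syllable) pairs; the element identifiers and the annotations are hypothetical, and the text does not say how the annotation itself is produced.

```python
from collections import defaultdict

def build_correspondence(speech_bank):
    """Collect, for every phonetic element, the syllables it was labelled with."""
    table = defaultdict(list)
    for sequence in speech_bank:
        for element, syllable in sequence:
            if syllable not in table[element]:
                table[element].append(syllable)
    return dict(table)

# Hypothetical recordings; "e3" was labelled with two different syllables.
speech_bank = [
    [("e1", "wo"), ("e2", "men")],
    [("e1", "wo"), ("e3", "shi")],
    [("e3", "si")],
]
parameter_library = build_correspondence(speech_bank)
```

Keeping a list of syllables per element, rather than a single syllable, leaves room for the later step that looks up all syllables corresponding to each phonetic element.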

To improve the accuracy with which the character input system recognizes input speech, the invention may also generate training probability parameters of the syllables from the speech recorded in the speech bank 105, use those parameters to make a further selection among the pinyin converted from the speech, and convert the speech into the corresponding pinyin string. To this end, the character input system further comprises:

A training probability parameter module 107, for statistically generating the training probability parameters of each syllable from the speech sequences, phonetic elements, and corresponding syllables in the speech bank 105, and for storing the training probability parameters in the speech parameter library 102.

Fig. 2 is a schematic structural diagram of the conversion module 103 in the character input system of the present invention. Referring to Fig. 2, the conversion module 103 comprises:

A decomposition module 201, for decomposing the received speech signal into at least one phonetic element.

A candidate pinyin generation module 202, for selecting, starting from the first phonetic element obtained by the decomposition, one syllable in turn from the syllables corresponding to each phonetic element, to form candidate pinyin strings.

A probability-of-occurrence calculation module 203, for calculating the probability of occurrence of each candidate pinyin string according to the training probability parameters.

A selection unit 204, for selecting the candidate pinyin string with the highest probability of occurrence as the pinyin converted from the speech signal; or for selecting and outputting more than one candidate pinyin string with relatively high probability of occurrence, and determining the pinyin finally converted from the speech signal according to an externally input selection instruction.

In another embodiment, the modules in the conversion module 103 may instead have the following functions:

The character generation module 104 specifically comprises:

A candidate character generation module 108, for generating, from the pinyin string produced by the conversion module 103, a candidate character list containing at least one candidate character.

A result generation module 109, for displaying the generated candidate character list and detecting whether an externally input selection instruction has been received; when a selection instruction is received, the module selects characters from the candidate character list according to that instruction and inputs the selected characters into the intelligent device.

For example: the speech for "中国人" ("Chinese person") is input through the microphone; the speech receiving module 101 receives it and forwards it to the conversion module 103, which converts it into the pinyin string "zhong'guo'ren"; the pinyin string is passed to the character generation module 104, where the candidate character generation module 108 generates candidate characters, as shown in Fig. 3; the user then inputs a selection instruction, and the result generation module 109 selects the first candidate according to that instruction, completing the input.

Fig. 4 is a structural diagram of the candidate character generation module 108 of the character input system of the present invention. Referring to Fig. 4, the candidate character generation module 108 specifically comprises:

A candidate word generation module 401, for generating candidate words from the pinyin string produced by the conversion module 103.

A whole-sentence generation module 402, for generating candidate whole sentences from the candidate words using a whole-sentence generation algorithm.
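As a rough illustration of what module 401 might do, the Python sketch below looks up every contiguous run of syllables of a pinyin string in a small lexicon. The lexicon contents, the function name, and the ordering rule (longer matches first) are all assumptions made for this example; the whole-sentence generation algorithm of module 402 is not specified in the text and is not attempted here.

```python
# Hypothetical lexicon: syllable tuples mapped to words or characters.
LEXICON = {
    ("zhong", "guo", "ren"): ["中国人"],
    ("zhong", "guo"): ["中国"],
    ("zhong",): ["中", "种"],
    ("guo",): ["国", "果"],
    ("ren",): ["人", "仁"],
}

def candidate_words(pinyin_string, lexicon=LEXICON):
    """Return (start, length, word) for every lexicon match, longest spans first."""
    syllables = tuple(pinyin_string.split("'"))
    found = []
    for length in range(len(syllables), 0, -1):
        for start in range(len(syllables) - length + 1):
            span = syllables[start:start + length]
            for word in lexicon.get(span, []):
                found.append((start, length, word))
    return found

words = candidate_words("zhong'guo'ren")
```

Recording the start position and length of each match keeps enough information for a later stage to assemble candidates that cover the whole string.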

The selection instruction input to the result generation module 109 may be a voice command or a physical-contact instruction. A physical-contact instruction may be a keyboard instruction, a touch-screen instruction, or any other instruction produced by physical contact; the keyboard instruction is used as the example here.

In one embodiment, in order to receive voice commands, a speech type discrimination module 110 may further be placed between the speech receiving module 101 and the conversion module 103. The speech receiving module 101 first passes the received speech signal to the speech type discrimination module 110, which stores voice commands in advance and judges whether the speech signal received by the speech receiving module 101 is one of them. If it is, the signal is classified as a voice command and sent to the result generation module 109; otherwise, the speech signal is sent to the conversion module 103.
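The routing performed by module 110 can be sketched as follows. In this sketch a received signal is represented by a plain string and matching is exact set membership, which is a simplifying assumption: real signals would need some similarity comparison that the text does not describe. The class name, callback parameters, and example commands are hypothetical.

```python
class SpeechTypeDiscriminator:
    """Route stored voice commands one way and all other speech another way."""

    def __init__(self, stored_commands, to_result_generator, to_converter):
        self.stored_commands = set(stored_commands)
        self.to_result_generator = to_result_generator  # stands in for module 109
        self.to_converter = to_converter                # stands in for module 103

    def handle(self, speech):
        if speech in self.stored_commands:
            self.to_result_generator(speech)
            return "command"
        self.to_converter(speech)
        return "speech"

# Usage: collect what each downstream module would receive.
received_commands, received_speech = [], []
discriminator = SpeechTypeDiscriminator(
    {"first", "second"}, received_commands.append, received_speech.append
)
kind_a = discriminator.handle("first")
kind_b = discriminator.handle("zhong guo ren")
```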

In order to receive keyboard instructions, the result generation module 109 needs to be connected to the keyboard of the intelligent device.

The selection instruction may be input by keyboard only, by voice only, or by both keyboard and voice, as the user chooses.

Fig. 5 is a structural diagram of the result generation module 109 of the character input system of the present invention. Referring to Fig. 5, the result generation module 109 further comprises:

Detection module 501: detects the type of the input instruction. A voice command is passed to the voice command matching module 502; a keyboard command is passed to the physical contact command matching module 503.

Voice command matching module 502: stores the correspondence between voice commands and candidate character positions in the candidate character list and, according to this correspondence, matches the received voice command against the candidate character positions in the list. If the match succeeds, the candidate character at the matched position is selected as the final input of the character input system.

Physical contact command matching module 503: stores the correspondence between keyboard commands and candidate character positions in the candidate character list and, according to this correspondence, matches the received keyboard command against the candidate character positions in the list. If the match succeeds, the candidate character at the matched position is selected as the final input of the character input system.

Fig. 5 shows the structure of the result generation module 109 when the selection instruction may be either a voice command or a keyboard command. When the character input system accepts selection instructions by voice only, the result generation module 109 may comprise only the voice command matching module 502; when it accepts selection instructions by keyboard only, the result generation module 109 may comprise only the physical contact command matching module 503.
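The division of labour among modules 501, 502 and 503 can be illustrated with a minimal Python sketch. The command tables, the `kind` strings and the function name `select_candidate` are assumptions made for illustration only; the description does not specify how the correspondences are stored.

```python
from typing import List, Optional

# Hypothetical stored correspondences: command -> zero-based candidate position.
VOICE_COMMANDS = {"one": 0, "two": 1, "three": 2}      # role of module 502
KEYBOARD_COMMANDS = {"1": 0, "2": 1, "3": 2}           # role of module 503


def select_candidate(kind: str, command: str, candidates: List[str]) -> Optional[str]:
    """Detect the instruction type (role of module 501), then match it to a position."""
    if kind == "voice":
        table = VOICE_COMMANDS
    elif kind == "keyboard":
        table = KEYBOARD_COMMANDS
    else:
        return None  # unknown instruction type
    position = table.get(command)
    if position is None or position >= len(candidates):
        return None  # no successful match, so nothing is input
    return candidates[position]
```

A system that accepts only one kind of selection instruction would simply keep one of the two tables.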

Fig. 6 is a flowchart of the character input method for a smart device according to the present invention. Referring to Fig. 6, the method comprises:

Step 601: store the correspondence between speech and pinyin in advance.

The correspondence may be stored in the speech parameter library 102. It may be a correspondence between speech elements and syllables, or between specific speech and specific pinyin. For example, the speech for "我" ("I") corresponds to the syllable "wo", the speech for "们" to the syllable "men", and the speech for "是" ("are") to the syllable "shi"; "我", "们" and "是" are each speech elements. Alternatively, the specific speech "我们是" ("we are") and the pinyin "wo'men'shi" may be stored as a correspondence. Both the speech and the pinyin may be stored as digital signals that the smart device can recognize.

Step 602: receive a speech signal. Specifically, speech may be received from a voice input device of the smart device, such as a microphone, and converted into a digital signal that the smart device can process.

Step 603: convert the received speech signal into the corresponding pinyin according to the stored correspondence between speech and pinyin. For example, when the speech signal for "我" is received, the pinyin "wo" corresponding to that signal is looked up in the stored correspondence.

Step 604: generate text from the converted pinyin. For example, the pinyin "wo" is converted into the character "我"; an existing pinyin input method may be used for this conversion.

In the present invention, a hidden Markov model (HMM) is used to convert speech into pinyin. The HMM is an important statistical natural-language model that is widely used in fields such as speech recognition and pinyin-to-character conversion. It is essentially a probabilistic function of a Markov process.

In a hidden Markov model, the observed events are random functions of the state. The model is therefore a doubly stochastic process: the state transition process cannot be observed, that is, it is hidden, while the stochastic process of observable events is a random function of the hidden state transition process. It can be described formally as a five-tuple HMM = <S, O, A, B, π>. Its operation can be summarized as follows. First, existing data are used for statistical training: for example, statistics are collected over a speech library and its corresponding pinyin library to obtain the parametric relationship between the speech library and pinyin strings, that is, the parameter library. Then, when new speech arrives, the information in the parameter library is used to determine the pinyin string closest to that speech, that is, the one with the highest probability, and this string is taken as the pinyin result for the speech.
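In conventional HMM notation, the five-tuple and the choice of the most probable pinyin string can be written as below. This is a sketch under the usual textbook reading, which the description does not spell out: S is taken to be the set of hidden states (syllables), O the set of observations (speech elements), A the transition probabilities, B the emission probabilities and π the initial probabilities; the sequence symbols s_{1:T} and o_{1:T} are introduced here for illustration.

```latex
\mathrm{HMM} = \langle S, O, A, B, \pi \rangle, \qquad
\hat{s}_{1:T} = \arg\max_{s_{1:T}} \;
  \pi(s_1)\, B(o_1 \mid s_1) \prod_{t=2}^{T} A(s_t \mid s_{t-1})\, B(o_t \mid s_t)
```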

The following describes how the present invention uses the hidden Markov model to convert speech into pinyin.

The correspondence between speech and pinyin is stored by means of speech training, specifically:

Step 701: record speech sequences and store each recorded speech sequence in the speech library.

For example, a large number of speech sequences are recorded; these may be different sentences or articles read aloud by people.

Step 702: establish the syllable corresponding to each speech element of each speech sequence in the speech library, and store the correspondence between each speech element and its syllable.

For example, the speech sequence "我们都是平凡人" ("we are all ordinary people") read aloud by one person is decomposed into the speech elements "我", "们", "都", "是", "平", "凡", "人", and the corresponding syllables "wo", "men", "dou", "shi", "ping", "fan", "ren" are then established for these speech elements. The same sentence read aloud by another person is likewise decomposed into speech elements, for which the same syllables "wo", "men", "dou", "shi", "ping", "fan", "ren" are established. In this way a single syllable can correspond to speech elements in many different accents, so that recognition is not affected by the accent of the person entering speech, and speech training improves the accuracy of speech recognition.

The present invention may further generate, by statistics over the speech sequences, speech elements and corresponding syllables in the speech library, the training probability parameters of each syllable.

The training probability parameters of a syllable comprise an initial probability parameter, a transition probability parameter and an emission probability parameter.

The initial probability parameter is the probability that a given syllable appears at the head of the pinyin string corresponding to a speech sequence. It may be generated by the formula M/N, where M is the number of times the syllable appears at the head of the pinyin string of a speech sequence, and N is the total number of speech sequences recorded in the speech library.

The transition probability parameter is the probability that a given syllable co-occurs with another syllable, where co-occurrence means that the two syllables appear together in a fixed order; for example, the syllables "wo" and "men" commonly co-occur as "wo'men". It is generated by the formula O/P, where O is the number of times the two syllables co-occur in the speech library and P is the total number of times the first of the two syllables appears in the speech library.

The emission probability parameter is the probability that a given syllable co-occurs with a given speech. For example, because of differences in accent, the speech for "我" may be pronounced as the syllable "wo", "e" or "huo", so the speech for "我" may co-occur with "wo", "e" or "huo". It is generated by the formula Q/R, where Q is the total number of times the speech element corresponding to a particular syllable occurs in the speech library and R is the total number of times that syllable occurs in the speech library.
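The three ratios M/N, O/P and Q/R can be estimated by simple counting. The Python sketch below assumes a speech library represented as a list of utterances, each a list of (speech element label, syllable) pairs; that representation, the function name `train` and the reading of Q as the count of a (syllable, speech element) pair are illustrative assumptions, not details given in the description.

```python
from collections import Counter


def train(corpus):
    """Estimate initial, transition and emission parameters by counting.

    corpus: list of utterances; each utterance is a list of
    (speech_element, syllable) pairs (a hypothetical representation).
    """
    head, pair, syllable_total, emitted = Counter(), Counter(), Counter(), Counter()
    for utterance in corpus:
        syllables = [syllable for _, syllable in utterance]
        if not syllables:
            continue
        head[syllables[0]] += 1                      # M: syllable at the head
        for element, syllable in utterance:
            syllable_total[syllable] += 1            # P and R: syllable totals
            emitted[(syllable, element)] += 1        # Q: syllable with this element
        for first, second in zip(syllables, syllables[1:]):
            pair[(first, second)] += 1               # O: ordered co-occurrence
    n = len(corpus)                                  # N: number of speech sequences
    initial = {s: m / n for s, m in head.items()}
    transition = {(a, b): o / syllable_total[a] for (a, b), o in pair.items()}
    emission = {(s, e): q / syllable_total[s] for (s, e), q in emitted.items()}
    return initial, transition, emission
```

With two utterances that both start with "wo" but continue with "men" and "shi" respectively, the initial probability of "wo" is 1.0 and the transition probability from "wo" to "men" is 0.5.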

Using the hidden Markov model, the conversion of speech into pinyin in step 603 proceeds as follows:

Step 6031: decompose the speech into at least one speech element and look up all syllables corresponding to each speech element.

For example, the input speech "我们都是平凡人" ("we are all ordinary people") is decomposed into seven speech elements, "我", "们", "都", "是", "平", "凡", "人", and the corresponding pinyin syllables are looked up in the previously stored correspondence between speech and pinyin, for example:

"我" corresponds to the syllable "wo".

"们" corresponds to the syllables "men" and "meng".

"都" corresponds to the syllable "dou".

"是" corresponds to the syllables "shi" and "si".

"平" corresponds to the syllable "ping".

"凡" corresponds to the syllable "fan".

"人" corresponds to the syllable "ren".

Step 6032: starting from the first speech element, select one syllable in turn from the syllables corresponding to each speech element to form candidate pinyin strings.

For example, the candidate pinyin strings for the above speech "we are all ordinary people" are:

1. "wo'men'dou'shi'ping'fan'ren".

2. "wo'men'dou'si'ping'fan'ren".

3. "wo'meng'dou'shi'ping'fan'ren".

4. "wo'meng'dou'si'ping'fan'ren".
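Forming every candidate string amounts to taking the Cartesian product of the per-element syllable lists. A minimal Python sketch follows; the function name `candidate_strings` is hypothetical, and the apostrophe separator follows the notation of the examples above.

```python
from itertools import product


def candidate_strings(syllable_options):
    """Pick one syllable per speech element, in order, and join each combination."""
    return ["'".join(combination) for combination in product(*syllable_options)]


# Syllables found in step 6031 for the seven speech elements of the example.
options = [["wo"], ["men", "meng"], ["dou"], ["shi", "si"], ["ping"], ["fan"], ["ren"]]
for number, string in enumerate(candidate_strings(options), start=1):
    print(number, string)
```

Because `itertools.product` varies the last list fastest, the four strings come out in the same order as the numbered list above.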

Step 6033: calculate the probability of occurrence of each candidate pinyin string according to the training probability parameters. Specifically, the initial probability parameter, transition probability parameters and emission probability parameters of the syllables in the candidate pinyin string are multiplied together, and the resulting value is the probability of occurrence of that candidate pinyin string.

Step 6034: select the candidate pinyin string with the highest probability of occurrence as the pinyin converted from the speech signal.

For example, if calculation shows that the pinyin string "wo'men'dou'shi'ping'fan'ren" has the highest probability of occurrence, this pinyin string may be selected as the converted pinyin.
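The product in step 6033 and the selection in step 6034 can be sketched in Python as follows. The dictionary layouts (`initial[syllable]`, `transition[(previous, current)]`, `emission[(syllable, element)]`), the function names and the treatment of unseen events as probability zero are assumptions for illustration.

```python
def string_probability(elements, syllables, initial, transition, emission):
    """Multiply initial, transition and emission parameters along one candidate."""
    probability = initial.get(syllables[0], 0.0)
    probability *= emission.get((syllables[0], elements[0]), 0.0)
    for i in range(1, len(syllables)):
        probability *= transition.get((syllables[i - 1], syllables[i]), 0.0)
        probability *= emission.get((syllables[i], elements[i]), 0.0)
    return probability


def best_string(elements, candidates, initial, transition, emission):
    """Return the candidate syllable sequence with the highest probability."""
    return max(
        candidates,
        key=lambda c: string_probability(elements, c, initial, transition, emission),
    )
```

Scoring every candidate separately mirrors the steps as written; when each speech element has many possible syllables, a dynamic-programming search such as the Viterbi algorithm reaches the same maximum without enumerating all combinations.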

Alternatively, one or more candidate pinyin strings with relatively high probability of occurrence may be displayed to the user, and the candidate pinyin string chosen by the user is taken as the pinyin converted from the speech signal.

For example, if the pinyin strings "wo'men'dou'shi'ping'fan'ren" and "wo'men'dou'si'ping'fan'ren" are the two with relatively high probability of occurrence, both may be selected as converted pinyin. The two candidate pinyin strings can then be displayed to the user as shown in Fig. 7, each preceded by a label, and the user selects by label. If the user selects 1, the pinyin string "wo'men'dou'shi'ping'fan'ren" is taken as the pinyin converted from the speech signal.

In addition, steps 6032 to 6034 may be replaced by the following alternative steps 6032', 6033' and 6034'.

Step 6032': starting from the first speech element obtained by the decomposition, combine the syllables corresponding to the speech elements in turn into candidate pinyin for phrases or single characters.

For example, the candidate pinyin for the above speech "we are all ordinary people" are:

The first two speech elements form the candidate phrase pinyin "wo'men" and "wo'meng";

The third and fourth speech elements form the candidate phrase pinyin "dou'shi" and "dou'si";

The last three speech elements form the candidate phrase pinyin "ping'fan'ren".

Step 6033': calculate the probability of occurrence of each candidate pinyin according to the training probability parameters. Specifically, the initial probability parameter, transition probability parameters and emission probability parameters of the syllables in the candidate pinyin are multiplied together, and the resulting value is the probability of occurrence of that candidate pinyin.

Step 6034': output the candidate pinyin in turn according to probability of occurrence, and determine the final converted pinyin of the speech signal according to the user's selection instructions.

For example, Fig. 8 is a schematic diagram of outputting the pinyin of phrases or single characters in turn according to probability of occurrence. As shown in the first step 801 of Fig. 8, "1: wo'men" and "2: wo'meng" may be output in order for the user to choose from; if the user selects 1, the following phrase is displayed according to probability of occurrence. As shown in the second step 802 of Fig. 8, "1: dou'shi" and "2: dou'si" may be output in order for a further choice; if the user selects 1, the following phrase is displayed. As shown in the third step 803 of Fig. 8, "ping'fan'ren" may be displayed; this last pinyin may be selected by the user or selected by system default. Finally, "wo'men'dou'shi'ping'fan'ren" is taken as the pinyin converted from the speech signal.
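The phrase-by-phrase selection of Fig. 8 can be sketched in Python as a loop over ranked phrase candidates. The `choose` callback stands in for the user's selection instruction, and the probabilities in the usage example are made-up illustrative values, not figures from the description.

```python
def choose_by_phrase(phrase_candidates, choose):
    """Build the final pinyin string one phrase at a time.

    phrase_candidates: list of steps; each step is a list of
    (pinyin, probability) pairs for one phrase.
    choose: callback that receives the ranked pinyin options of a step
    and returns the zero-based index the user picked.
    """
    chosen = []
    for options in phrase_candidates:
        ranked = sorted(options, key=lambda option: option[1], reverse=True)
        pinyin_options = [pinyin for pinyin, _ in ranked]
        # A single remaining option is taken by system default.
        index = 0 if len(pinyin_options) == 1 else choose(pinyin_options)
        chosen.append(pinyin_options[index])
    return "'".join(chosen)


steps = [
    [("wo'meng", 0.1), ("wo'men", 0.9)],
    [("dou'shi", 0.7), ("dou'si", 0.3)],
    [("ping'fan'ren", 1.0)],
]
print(choose_by_phrase(steps, lambda options: 0))  # user always picks label 1
```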

Of course, in the above process it is also possible to display in turn all syllables (that is, the pinyin of single characters) corresponding to each speech element and have the user select the syllable of each speech element in turn, thereby determining the final converted pinyin of the speech signal.

After the pinyin string has been obtained, step 604 is used to generate text. Step 604 may specifically comprise:

Step 6041: generate, from the converted pinyin, a candidate character list containing at least one candidate character, and display the candidate character list on the smart device.

Step 6042: detect whether a selection instruction has been input to the smart device. If a selection instruction is detected, execute step 6043; otherwise repeat step 6042.

Step 6043: select text from the candidate character list according to the selection instruction, and input the selected text to the smart device.

The candidate character list of step 6041 may contain candidate words and may also contain a candidate whole sentence. The generation method comprises the following two parts:

1. Generation of candidate words. The present invention requires a mapping table from pinyin strings to candidate word sequences, that is, a pinyin dictionary. In this dictionary the candidate words for each pinyin string are sorted by word frequency in descending order. Generating candidate words is straightforward: the pinyin string is looked up in the pinyin dictionary and, once a matching pinyin string is found, its first n candidate words are output, where n is the number of candidate words that the input method's output interface can display.

2. Generation of the whole sentence. To support whole-sentence input, the present invention predicts the whole sentence by maximum probability. The pinyin string input by the user admits multiple combinations of candidate words. All candidate words occurring in the pinyin string are found first, and the combination of these candidate words with the highest probability is then taken as the final whole-sentence result.
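The candidate word lookup described in the first part above can be sketched in Python. The dictionary layout, the function name `candidate_words` and the word frequencies used in the example are hypothetical; the sketch sorts defensively even though the description assumes the dictionary is already ordered by word frequency.

```python
def candidate_words(pinyin, dictionary, n):
    """Return the n most frequent candidate words stored for a pinyin string.

    dictionary: maps a pinyin string to a list of (word, frequency) pairs.
    """
    entries = sorted(dictionary.get(pinyin, []), key=lambda entry: entry[1], reverse=True)
    return [word for word, _ in entries[:n]]


# Made-up frequencies, for illustration only.
pinyin_dictionary = {"wo": [("窝", 10), ("我", 500), ("握", 5)]}
print(candidate_words("wo", pinyin_dictionary, 2))
```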

Fig. 9 is a schematic diagram of the candidate word list for the pinyin string "wo'men'dou'shi'ping'fan'ren". As shown in Fig. 9, each arc corresponds to one or more candidate words, sorted from top to bottom in descending order of word frequency. Each arc carries word frequency information, which is not marked in the figure; it is the word frequency of the most frequent word among all candidate words for the pinyin string of that arc. Because only one candidate whole sentence is offered to the user, only the most frequent word on each arc matters; words ranked second or lower by word frequency, such as "nest", "door" and "fighter", cannot appear in the final candidate whole-sentence result.

Fig. 10 is a schematic diagram of the candidate word list of Fig. 9 after simplification. As shown in Fig. 10, a shortest-path algorithm between two points, such as Dijkstra's algorithm or the Viterbi algorithm, is used to obtain the path with the highest probability, shown as the dashed path in Fig. 10. This path is the word combination. The path with the highest probability is displayed as the final whole-sentence prediction in the first position of the candidate word window. The candidate character list window is shown in Fig. 11: there is only one whole-sentence candidate, "we are all ordinary people" at the first candidate position, and all entries from the second candidate position onward are candidate words.
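The search for the highest-probability path through the simplified word lattice can be sketched as a dynamic program over syllable positions, in the spirit of the Viterbi algorithm. The sketch assumes that each arc carries only its most frequent word and a probability greater than zero, and that the probability of a path is the product of its arc probabilities; the function name `best_sentence` and all numeric values in the example are illustrative assumptions.

```python
import math


def best_sentence(syllables, arcs):
    """Return the word sequence of the most probable path, or None if no path exists.

    arcs: maps a tuple of consecutive syllables to (best_word, probability),
    with probability > 0 (hypothetical representation of Fig. 10).
    """
    n = len(syllables)
    # best[i] holds (log probability, words) for the best cover of syllables[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            arc = arcs.get(tuple(syllables[start:end]))
            if arc is None or best[start][0] == -math.inf:
                continue
            word, probability = arc
            score = best[start][0] + math.log(probability)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1] if best[n][0] > -math.inf else None


arcs = {
    ("wo",): ("我", 0.05), ("men",): ("们", 0.01), ("wo", "men"): ("我们", 0.02),
    ("dou",): ("都", 0.03), ("shi",): ("是", 0.04), ("dou", "shi"): ("都是", 0.01),
}
print(best_sentence(["wo", "men", "dou", "shi"], arcs))
```

Working in log probabilities turns the product into a sum, which is why shortest-path algorithms apply once the sign is flipped.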

After the candidate character list is generated, the user must select the final input from it. In the present invention the final input can be determined in two ways: keyboard selection and voice selection. That is, in step 6042 the selection instruction may be a keyboard command or a voice command.

When the user inputs the selection instruction by keyboard, the method requires that the correspondence between keyboard commands and candidate character positions in the candidate character list be stored in advance. Step 6043 is then specifically: after a keyboard command is detected, match the detected keyboard command against the candidate character positions in the candidate character list according to this correspondence; if the match succeeds, take the candidate character at the matched position as the finally input text.

When the user inputs a voice command through the microphone, the method requires that the voice commands, and the correspondence between voice commands and candidate character positions in the candidate character list, be stored in advance. Each selection instruction is represented by the speech of a single character; that is, a correspondence from speech to selection instruction is established. For example, the speech for "1" selects the first candidate character, the speech for "up" selects the previous page of candidate characters, and the speech for "down" selects the next page of candidate characters. Users may also modify the voice commands as needed and represent the above operations with different voice commands; for example, a user may define some rarely used speech as voice commands, which greatly reduces conflicts between voice commands and speech input.

In addition, after step 602 and before step 603, the method further comprises: judging whether the received speech is a voice command stored in advance. If so, the received voice command is matched against the candidate character positions in the candidate character list according to the correspondence between voice commands and candidate character positions; if the match succeeds, the candidate character at the matched position is taken as the finally input text. If not, step 603 is executed.
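The check inserted between steps 602 and 603 can be sketched in Python as a lookup in a user-editable command table before falling back to conversion. The table contents, the action tuples and the names `route_speech` and `COMMANDS` are assumptions for illustration; string labels stand in for the stored voice signals.

```python
# Hypothetical command table: spoken label -> action. Users could redefine the keys.
COMMANDS = {"1": ("select", 0), "up": ("page", -1), "down": ("page", 1)}


def route_speech(label, commands, convert):
    """Treat stored voice commands as selection instructions; convert anything else.

    convert: callback implementing the speech-to-pinyin conversion of step 603.
    """
    if label in commands:
        return ("command", commands[label])
    return ("pinyin", convert(label))
```

Choosing rarely spoken labels as command keys keeps ordinary speech input from being mistaken for a command, which is the conflict the description aims to reduce.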

The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims

1. A character input system for a smart device, characterized in that the system comprises:

a speech receiving module, used for receiving speech;

2. The character input system according to claim 1, characterized in that the correspondence between speech and pinyin is a correspondence between speech elements and syllables; and the character input system further comprises:

a speech library, used for recording speech sequences;

3. The character input system according to claim 2, characterized in that the system further comprises:

4. The character input system according to claim 3, characterized in that the conversion module specifically comprises:

5. The character input system according to claim 3, characterized in that the conversion module specifically comprises:

6. The character input system according to claim 1, characterized in that the character generation module specifically comprises:

7. The character input system according to claim 6, characterized in that the result generation module specifically comprises: a voice command matching module, used for storing the correspondence between voice commands and candidate character positions in the candidate character list and, according to this correspondence, matching the received voice command against the candidate character positions in the candidate character list; when the match succeeds, the candidate character at the matched position is selected as the final input of the character input system.

8. The character input system according to claim 6, characterized in that the result generation module is connected to an external keyboard to receive keyboard commands;

9. A character input method for a smart device, characterized in that the correspondence between speech and pinyin, and voice commands, are stored in advance; the method further comprises:

A. receiving speech;

10. The character input method according to claim 9, characterized in that the correspondence between speech and pinyin is a correspondence between speech elements and syllables;

11. The character input method according to claim 10, characterized in that the method further comprises: generating, by statistics over the speech sequences, speech elements and corresponding syllables in the speech library, the training probability parameters of each syllable;

12. The character input method according to claim 10, characterized in that the method further comprises: generating, by statistics over the speech sequences, speech elements and corresponding syllables in the speech library, the training probability parameters of each syllable;

13. The character input method according to claim 11, characterized in that the training probability parameters of a syllable comprise an initial probability parameter, a transition probability parameter and an emission probability parameter; wherein,

14. The character input method according to claim 9, characterized in that generating candidate characters from the converted pinyin in step C is: generating and displaying, from the converted pinyin, a candidate character list containing at least one candidate character;

selecting the finally input text in step C is:

C1. selecting text from the candidate character list according to the voice command, and inputting the selected text to the smart device.

15. The character input method according to claim 14, characterized in that

the method further comprises: storing in advance the correspondence between voice commands and candidate character positions in the candidate character list;

16. The character input method according to claim 14, characterized in that the method further comprises: storing in advance the correspondence between keyboard commands and candidate character positions in the candidate character list;