Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a text input system for an intelligent device that not only increases input speed, but also reduces the difficulty of converting speech into text and improves the accuracy of text input.
Another objective of the present invention is to provide a text input method for an intelligent device with the same benefits: faster input, easier speech-to-text conversion, and more accurate text input.
To achieve these objectives, the main technical scheme of the invention is as follows:
a text input system for an intelligent device, the system comprising:
a voice receiving module for receiving voice;
a voice parameter library for storing the correspondence between voice and pinyin;
a voice type judging module for pre-storing voice commands and judging whether the voice received by the voice receiving module is a stored voice command; if so, the voice command is sent to the character generation module; otherwise, the voice is sent to the conversion module;
a conversion module for converting the received voice signal into the corresponding pinyin according to the correspondence stored in the voice parameter library;
a character generation module for generating candidate characters according to the pinyin converted by the conversion module, and for selecting the finally input characters from the candidate characters according to the voice command forwarded by the voice type judging module.
Preferably, the correspondence between voice and pinyin is the correspondence between voice elements and syllables; and the text input system further comprises:
a voice library for recording voice sequences;
a syllable establishing module for establishing the syllable corresponding to each voice element of each voice sequence recorded in the voice library, and storing the correspondence between each voice element and its corresponding syllable into the voice parameter library.
Preferably, the system further comprises:
a training probability parameter module for statistically generating the training probability parameters of all syllables according to the voice sequences, voice elements, and corresponding syllables in the voice library, and storing the training probability parameters into the voice parameter library.
Preferably, the conversion module specifically includes:
a decomposition module for decomposing the voice signal into at least one voice element;
a candidate pinyin generation module for selecting, starting from the first decomposed voice element, one syllable from the syllables corresponding to each voice element in sequence to form a candidate pinyin string;
an occurrence probability calculation module for calculating the occurrence probability of each candidate pinyin string according to the training probability parameters;
a selection unit for selecting the candidate pinyin string with the highest occurrence probability as the pinyin converted from the voice signal, or for outputting more than one candidate pinyin string with relatively high occurrence probability and determining the finally converted pinyin according to an externally input selection instruction.
Preferably, the conversion module specifically includes:
a decomposition module for decomposing the voice signal into at least one voice element;
a candidate pinyin generation module for sequentially looking up, starting from the first decomposed voice element, all syllables corresponding to each voice element to form candidate pinyins of phrases or single characters;
an occurrence probability calculation module for calculating the occurrence probability of each candidate pinyin according to the training probability parameters;
a selection unit for outputting the candidate pinyins in order of occurrence probability and determining the finally converted pinyin of the voice signal according to an externally input selection instruction.
Preferably, the character generation module specifically includes:
a candidate character generation module for generating a candidate character list comprising at least one candidate character according to the pinyin converted by the conversion module;
a result generation module for outputting the generated candidate character list, detecting whether a voice instruction from the voice type judging module is received, and, when a voice instruction is received, selecting characters from the candidate character list according to that instruction and outputting the selected characters;
correspondingly, the voice type judging module sends the voice command to the result generation module when it judges that the received voice is a stored voice command.
Preferably, the result generation module specifically includes: a voice instruction matching module for storing the matching relationship between voice instructions and candidate character positions in the candidate character list, matching a received voice instruction against the candidate character positions according to that relationship, and, when a match is found, selecting the candidate character at the matched position as the character finally input by the text input system.
Preferably, the result generation module is connected to an external keyboard to receive keyboard instructions;
the result generation module further comprises: a physical contact instruction matching module for storing the matching relationship between physical contact instructions and candidate character positions in the candidate character list, matching a received keyboard instruction against the candidate character positions according to that relationship, and, when a match is found, selecting the candidate character at the matched position as the character finally input by the text input system.
A text input method for an intelligent device pre-stores the correspondence between voice and pinyin, together with voice instructions; the method further comprises the following steps:
A. receiving voice;
B. judging whether the received voice is a stored voice instruction; if so, executing step C; otherwise, converting the received voice into the corresponding pinyin according to the stored correspondence between voice and pinyin;
C. generating candidate characters according to the converted pinyin, receiving and recognizing a voice instruction, and selecting the finally input characters from the candidate characters according to the voice instruction.
Preferably, the correspondence between voice and pinyin is the correspondence between voice elements and syllables;
the specific method of pre-storing this correspondence comprises:
recording voice sequences and storing the recorded voice sequences in a voice library;
establishing the syllable corresponding to each voice element of each voice sequence in the voice library;
storing the correspondence between each voice element and its corresponding syllable.
Preferably, the method further comprises: statistically generating the training probability parameters of each syllable according to the voice sequences, voice elements, and corresponding syllables in the voice library;
the method of converting voice into pinyin in step B comprises:
B1. decomposing the voice into at least one voice element, and looking up all syllables corresponding to each voice element;
B2. starting from the first voice element, selecting one syllable from the syllables corresponding to each voice element in sequence to form a candidate pinyin string;
B3. calculating the occurrence probability of each candidate pinyin string according to the training probability parameters;
B4. selecting the candidate pinyin string with the highest occurrence probability as the pinyin converted from the voice signal; or outputting more than one candidate pinyin string with relatively high occurrence probability and determining the finally converted pinyin according to an externally input selection instruction.
Preferably, the method further comprises: statistically generating the training probability parameters of each syllable according to the voice sequences, voice elements, and corresponding syllables in the voice library;
the method of converting voice into pinyin in step B comprises:
B1. decomposing the voice into at least one voice element, and looking up all syllables corresponding to each voice element;
B2. starting from the first decomposed voice element, sequentially combining the syllables corresponding to each voice element into candidate pinyins of phrases or single characters;
B3. calculating the occurrence probability of each candidate pinyin according to the training probability parameters;
B4. outputting the candidate pinyins in order of occurrence probability, and determining the finally converted pinyin of the voice signal according to the user's selection instruction.
Preferably, the training probability parameters of the syllables comprise an initial probability parameter, a transition probability parameter, and an emission probability parameter; wherein
the initial probability parameter is generated as M/N, where M is the number of times the specific syllable appears at the head of a pinyin string corresponding to a voice sequence, and N is the total number of voice sequences recorded in the voice library;
the transition probability parameter is generated as O/P, where O is the number of co-occurrences of the two syllables in the voice library, and P is the total number of occurrences of the first of the two syllables in the voice library;
the emission probability parameter is generated as Q/R, where Q is the total number of occurrences of the voice element corresponding to a specific syllable in the voice library, and R is the total number of occurrences of that specific syllable in the voice library;
step B3 specifically comprises: multiplying the initial probability parameter, the transition probability parameters, and the emission probability parameters of the syllables in the candidate pinyin string to obtain the occurrence probability of the candidate pinyin string.
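As a minimal sketch of the multiplication described in step B3 (the parameter tables `initial_p`, `transition_p`, and `emission_p`, the voice-element names, and all probability values below are hypothetical illustrations, not the invention's actual data):

```python
# Score a candidate pinyin string as in step B3: the initial probability
# of the first syllable, times the emission probability of each syllable
# given its voice element, times the transition probability between each
# adjacent pair of syllables.

def string_probability(syllables, elements, initial_p, transition_p, emission_p):
    p = initial_p.get(syllables[0], 0.0)
    for syl, elem in zip(syllables, elements):
        p *= emission_p.get((elem, syl), 0.0)
    for prev, cur in zip(syllables, syllables[1:]):
        p *= transition_p.get((prev, cur), 0.0)
    return p

# Hypothetical parameters for a two-syllable example "wo'men".
initial_p = {"wo": 0.4}
transition_p = {("wo", "men"): 0.5}
emission_p = {("v_wo", "wo"): 0.9, ("v_men", "men"): 0.8}

prob = string_probability(["wo", "men"], ["v_wo", "v_men"],
                          initial_p, transition_p, emission_p)
# 0.4 * 0.9 * 0.8 * 0.5 = 0.144
```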
Preferably, generating candidate characters according to the converted pinyin in step C is: generating and displaying a candidate character list including at least one candidate character according to the converted pinyin;
and selecting the finally input characters in step C is: C1. selecting characters from the candidate character list according to the voice command, and inputting the selected characters into the intelligent device.
Preferably, the method further comprises: pre-storing the matching relationship between voice commands and candidate character positions in the candidate character list;
step C1 specifically comprises: matching the received voice command against the candidate character positions in the candidate character list according to that matching relationship, and, when a match is found, taking the candidate character at the matched position as the finally input character.
Preferably, the method further comprises: pre-storing the matching relationship between keyboard instructions and candidate character positions in the candidate character list;
the method further comprises: after a keyboard instruction is detected, matching it against the candidate character positions in the candidate character list according to that matching relationship, and, when a match is found, taking the candidate character at the matched position as the finally input character.
The invention first converts the voice signal into pinyin, then processes the pinyin and converts it into characters. Compared with existing keyboard input methods, input is therefore simple and quick, which raises the speed of text input and thus working efficiency. Compared with existing voice input methods, converting voice into pinyin and then pinyin into characters means the intelligent device only needs to store the correspondence between voice and pinyin; since there are far fewer pinyins than Chinese characters, the number of voices that must be stored and recognized is greatly reduced.
The invention further specifies the correspondence between voice and pinyin as the correspondence between voice elements and syllables. Chinese has only 403 syllables, far fewer than the number of possible pinyin strings, so the number of stored voices can be reduced further and text input becomes simpler and quicker still.
The invention also provides a voice library in which voice can be pre-recorded. Training probability parameters of the syllables are generated from the recorded voice, the pinyin converted from the voice is re-selected using these parameters, and the pinyin with the highest probability is converted into Chinese characters. This largely avoids the low input accuracy caused by the many pronunciations of Chinese characters and by non-standard pronunciation, further improving the accuracy of Chinese character input.
In addition, in converting pinyin into characters, the method first generates candidate characters and then selects the character to be input using a voice instruction or a physical contact instruction (such as a keyboard instruction or a touch-screen instruction), which further simplifies the input operation. The user is also free to select characters by voice, by direct physical contact, or by a combination of the two, giving greater flexibility in the text input process.
Detailed Description
The invention is explained in more detail below with reference to specific embodiments and the drawings.
The core idea of the invention is as follows: the correspondence between voice and pinyin is pre-stored; when inputting text by voice, a voice signal is first received, the received voice signal is converted into the corresponding pinyin according to the stored correspondence between voice and pinyin, and characters are then generated from the converted pinyin.
The intelligent device of the invention can be any device with intelligent information processing capability, such as a computer, a smart phone, or a palmtop computer. The invention is described here taking a computer as an example.
The characters can be Chinese characters, with the pinyin being Chinese pinyin; they can also be other characters whose pronunciation is based on a pinyin-like system, such as Korean, in which case the pinyin is the pinyin of those characters. The embodiments of the present invention take Chinese characters and Chinese pinyin as examples.
Fig. 1 is a schematic structural diagram of the text input system according to the present invention. Referring to Fig. 1, the text input system mainly includes:
a voice receiving module 101, connected to an external microphone of the computer (for example, a headset with a microphone) and configured to receive a voice signal. The voice receiving module 101 can use existing voice receiving technology: the user speaks the Chinese characters into the microphone, and the voice receiving module 101 receives the signal and completes its digital conversion;
a voice parameter library 102 for storing the correspondence between voice and pinyin. The correspondence can be between voice elements and syllables, or between a specific voice and a specific pinyin. A voice element is the pronunciation of a single Chinese character;
a conversion module 103, which may be directly connected to the voice receiving module 101 and the voice parameter library 102, configured to convert the voice signal received by the voice receiving module 101 into the corresponding pinyin according to the correspondence stored in the voice parameter library 102;
a character generation module 104, configured to generate characters according to the pinyin converted by the conversion module 103, and to input the generated characters to a display device and/or a storage device of the intelligent device for display and/or storage.
The text input system of the present invention may further comprise:
a voice library 105 for recording voice sequences;
a syllable establishing module 106, configured to establish the syllable corresponding to each voice element of each voice sequence recorded in the voice library 105, and to store the correspondence between each voice element and its corresponding syllable in the voice parameter library 102.
The invention can use the voice library 105 and the syllable establishing module 106 to set up the correspondence between voice elements and syllables.
To improve the recognition accuracy of the input speech, the invention can also generate training probability parameters for the syllables from the speech recorded in the voice library 105; the pinyin converted from the speech is then re-selected and recognized using these parameters and converted into the corresponding pinyin string. To this end, the text input system of the present invention further comprises:
a training probability parameter module 107, configured to statistically generate the training probability parameters of each syllable according to the voice sequences, voice elements, and corresponding syllables in the voice library 105, and to store them in the voice parameter library 102.
Fig. 2 is a schematic structural diagram of the conversion module 103 in the text input system according to the present invention. Referring to Fig. 2, the conversion module 103 includes:
a decomposition module 201, configured to decompose the received voice signal into at least one voice element;
a candidate pinyin generation module 202, configured to select, starting from the first decomposed voice element, one syllable from the syllables corresponding to each voice element in sequence to form a candidate pinyin string;
an occurrence probability calculation module 203, configured to calculate the occurrence probability of each candidate pinyin string according to the training probability parameters;
a selecting unit 204, configured to select the candidate pinyin string with the highest occurrence probability as the pinyin converted from the voice signal, or to output more than one candidate pinyin string with relatively high occurrence probability and determine the finally converted pinyin according to an externally input selection instruction.
As another embodiment, the modules in the conversion module 103 may instead have the following functions:
the decomposition module decomposes the voice signal into at least one voice element;
the candidate pinyin generation module sequentially looks up, starting from the first decomposed voice element, all syllables corresponding to each voice element to form candidate pinyins of phrases or single characters;
the occurrence probability calculation module calculates the occurrence probability of each candidate pinyin according to the training probability parameters;
the selection unit outputs the candidate pinyins in order of occurrence probability and determines the finally converted pinyin of the voice signal according to an externally input selection instruction.
The character generation module 104 specifically includes:
a candidate character generation module 108, configured to generate a candidate character list including at least one candidate character according to the pinyin string converted by the conversion module 103;
a result generation module 109, configured to display the generated candidate character list, detect whether an externally input selection instruction is received, and, when one is received, select characters from the candidate character list according to it and input the selected characters into the intelligent device.
For example: the voice "Chinese people" is input from the microphone and received by the voice receiving module 101, then passed to the conversion module 103, which converts it into the pinyin string "zhong'guo'ren". The pinyin string is input into the character generation module 104, and the candidate character generation module 108 generates candidate characters, as shown in Fig. 3. The user then inputs a selection instruction, and the result generation module 109 selects the first candidate according to that instruction, completing the input.
Fig. 4 is a block diagram of the candidate character generation module 108 of the text input system according to the present invention. Referring to Fig. 4, the candidate character generation module 108 specifically includes:
a candidate word generating module 401, configured to generate candidate words according to the pinyin string converted by the conversion module 103;
a complete sentence generating module 402, configured to generate candidate complete sentences from the candidate words using a sentence generation algorithm.
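The candidate word generating module 401 can be pictured as a lookup from a converted pinyin string into a pinyin-to-character lexicon. The sketch below is illustrative only: the lexicon contents are invented, and the complete-sentence step is reduced to simple concatenation of matched segments:

```python
# Minimal sketch: generate a candidate character list from a converted
# pinyin string by looking it up in a small lexicon. The lexicon and its
# entries are hypothetical, not the invention's actual data.

LEXICON = {
    "zhong'guo'ren": ["中国人"],
    "zhong'guo": ["中国"],
    "ren": ["人", "仁"],
}

def generate_candidates(pinyin_string):
    # Exact-match candidates first; otherwise join candidates for the
    # longest matching prefix with candidates for the remainder.
    if pinyin_string in LEXICON:
        return list(LEXICON[pinyin_string])
    syllables = pinyin_string.split("'")
    for split in range(len(syllables) - 1, 0, -1):
        head = "'".join(syllables[:split])
        tail = "'".join(syllables[split:])
        if head in LEXICON and tail in LEXICON:
            return [h + t for h in LEXICON[head] for t in LEXICON[tail]]
    return []
```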
The selection instruction input to the result generation module 109 may be a voice instruction or a physical contact instruction; the physical contact instruction may be a keyboard instruction, a touch instruction on a touch screen, or another instruction generated by physical contact. A keyboard instruction is taken as the example here.
As one embodiment, to receive voice instructions, a voice type distinguishing module 110 may be added between the voice receiving module 101 and the conversion module 103. The voice receiving module 101 inputs the received voice signal to the voice type distinguishing module 110, which pre-stores voice instructions and judges whether the received voice signal is a stored voice instruction. If so, the signal is of the voice instruction type and is sent to the result generation module 109; otherwise, the voice signal is sent to the conversion module 103.
To receive keyboard commands, the result generation module 109 is connected to the keyboard of the intelligent device.
The selection instruction may be input through the keyboard only, through voice only, or through both, at the user's choice.
Fig. 5 is a schematic structural diagram of the result generation module 109 of the text input system according to the present invention. Referring to Fig. 5, the result generation module 109 further includes:
a detection module 501, used to detect the type of an input instruction: a voice instruction is routed to the voice instruction matching module 502, and a keyboard instruction to the physical contact instruction matching module 503;
a voice instruction matching module 502, configured to store the matching relationship between voice instructions and candidate character positions in the candidate character list, match a received voice instruction against the candidate character positions according to that relationship, and, when a match is found, select the candidate character at the matched position as the character finally input by the text input system;
a physical contact instruction matching module 503, configured to store the matching relationship between keyboard instructions and candidate character positions in the candidate character list, match a received keyboard instruction against the candidate character positions according to that relationship, and, when a match is found, select the candidate character at the matched position as the character finally input by the text input system.
Fig. 5 shows the structure of the result generation module 109 when the selection command can be either a voice command or a keyboard command. When the text input system accepts selection instructions only by voice, the result generation module 109 may include only the voice instruction matching module 502; when it accepts selection instructions only from the keyboard, it may include only the physical contact instruction matching module 503.
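The detection-and-matching flow of Fig. 5 can be sketched as a simple dispatch: the instruction type routes to the appropriate matching table, which maps an instruction to a position in the candidate list. The instruction names, key assignments, and candidate characters below are illustrative assumptions:

```python
# Sketch of the result generation module: a detection step routes an
# instruction to the voice or keyboard matching table, and a matched
# position selects a character from the candidate list.

VOICE_MATCHES = {"first": 0, "second": 1, "third": 2}   # hypothetical voice commands
KEYBOARD_MATCHES = {"1": 0, "2": 1, "3": 2}             # hypothetical key presses

def select_character(candidates, instruction, kind):
    table = VOICE_MATCHES if kind == "voice" else KEYBOARD_MATCHES
    position = table.get(instruction)
    if position is None or position >= len(candidates):
        return None  # no correct match: nothing is selected
    return candidates[position]

candidates = ["中国人", "中国仁", "种果人"]
```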
Fig. 6 is a flowchart of the text input method of the intelligent device according to the present invention. Referring to Fig. 6, the method includes:
step 601, pre-storing the correspondence between voice and pinyin.
The correspondence may be stored in the voice parameter library 102, and may be between voice elements and syllables, or between a specific voice and a specific pinyin. For example: the syllable corresponding to the voice "我" (I) is "wo", that of "们" (plural suffix) is "men", and that of "是" (is) is "shi"; "我", "们", and "是" are all voice elements. Alternatively, the specific voice "我们是" (we are) and the pinyin "wo'men'shi" can be stored as one correspondence. Voice and pinyin are stored as digital signals recognizable by the intelligent device.
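The correspondence of step 601 could be held as a simple mapping from each voice element to its possible syllables. This is a sketch only: the dictionary keys stand in for digitized voice signals, which in practice would be acoustic feature data rather than character strings:

```python
# Hypothetical correspondence table: each voice element (named here by
# the character it pronounces) maps to one or more candidate syllables.
CORRESPONDENCE = {
    "我": ["wo"],
    "们": ["men", "meng"],
    "是": ["shi", "si"],
}

# A specific phrase can also be stored as a whole, as the text notes.
PHRASE_CORRESPONDENCE = {"我们是": "wo'men'shi"}

def syllables_for(element):
    return CORRESPONDENCE.get(element, [])
```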
Step 602, receiving a voice signal. Specifically, the voice may be received from a voice input device of the intelligent device, such as a microphone, and converted into a digital signal the device can process.
Step 603, converting the received voice signal into the corresponding pinyin according to the stored correspondence between voice and pinyin. For example, when the voice signal for "我" (I) is received, the corresponding pinyin "wo" is looked up in the stored correspondence.
Step 604, generating characters according to the converted pinyin. For example, the pinyin "wo" is converted into the character "我"; this conversion can be performed with an existing pinyin input method.
In the invention, the conversion from speech to pinyin is implemented with a hidden Markov model (HMM). The HMM is an important statistical natural language model widely used in fields such as speech recognition and phonetic-to-character conversion; it is essentially a probability function of a Markov process.
In a hidden Markov model, the observed events are random functions of the states. The model is thus a double stochastic process: the state transition process is not observable (it is hidden), while the observable events form a random function of that hidden state transition process. It can be formally described as a five-tuple HMM = <S, O, A, B, π>. The processing procedure can be briefly described as follows: first, a statistical method is used to learn from existing data, for example by running statistics over the voice library and its corresponding pinyin annotations to obtain the parameter relationship between voices and pinyin strings, i.e. the parameter library. Then, when a new voice arrives, the information in the parameter library is used to determine the pinyin string that is closest to the voice, i.e. has the highest probability, and that pinyin string is taken as the result corresponding to the voice.
The specific method of converting speech into pinyin using the hidden Markov model of the present invention is described below.
The method of storing the correspondence between voice and pinyin through voice training comprises:
step 701, recording voice sequences and storing them in the voice library.
For example, a large number of voice sequences are recorded; these may be sentences or articles read aloud by different people.
Step 702, establishing the syllable corresponding to each voice element of each voice sequence in the voice library, and storing the correspondence between each voice element and its corresponding syllable.
For example, a voice sequence "我们都是平凡人" ("we are all ordinary people") read by one person is decomposed into the seven voice elements "我", "们", "都", "是", "平", "凡", and "人", and the corresponding syllables "wo", "men", "dou", "shi", "ping", "fan", and "ren" are established for each element. The same sequence read by another person is likewise decomposed into voice elements, for which the same syllables "wo", "men", "dou", "shi", "ping", "fan", and "ren" are established. In this way, through voice training, the same syllable can correspond to voice elements of many different accents, so the influence of the speaker's accent is avoided and the accuracy of speech recognition is improved.
Then, the invention can also further generate the training probability parameter of each syllable by statistics according to the voice sequence, the voice elements and the corresponding syllables in the voice library.
The training probability parameters of the syllables comprise an initial probability parameter, a transition probability parameter and a transmission probability parameter.
The initial probability parameter is the probability that a syllable appears in the pinyin header corresponding to the voice sequence, and can be according to the formula: M/N is generated, wherein M is the number of times a specific syllable appears in the head of the pinyin string corresponding to a voice sequence, and N is the total number of all the voice sequences recorded in the voice library.
The transition probability parameter is the probability that a syllable and another syllable co-appear, i.e. the two syllables appear simultaneously in front and back order, for example: the two syllables "wo" and "men" will usually co-appear as "wo' men"; the transition probability parameter is according to the formula: and generating O/P, wherein O is the co-occurrence number of the two syllables in the voice library, and P is the total number of the first syllables in the two syllables established in the voice library.
The emission probability parameter is the probability that a syllable co-appears with a speech element. For example, because accents differ, the speech for "I" may be pronounced as the speech represented by the syllable "wo", "e" or "huo", so the speech for "I" may co-appear with "wo", "e" or "huo". The emission probability parameter is generated according to the formula Q/R, where Q is the number of occurrences of the speech element together with a particular syllable in the voice library, and R is the total number of occurrences of that particular syllable in the voice library.
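The three training statistics above can be sketched as follows. This is a minimal illustration, assuming a toy corpus in which each voice sequence is a list of (speech element, syllable) pairs; the element names and corpus contents are invented for the demonstration and are not from the invention itself.

```python
from collections import Counter

# Toy corpus: each entry is one recorded voice sequence, given as a list of
# (speech_element, syllable) pairs. The element names are illustrative only.
corpus = [
    [("wo#spk1", "wo"), ("men#spk1", "men"), ("dou#spk1", "dou")],
    [("wo#spk2", "wo"), ("meng#spk2", "men"), ("dou#spk2", "dou")],
    [("dou#spk1", "dou"), ("shi#spk1", "shi")],
]

N = len(corpus)  # total number of voice sequences in the voice library

# Initial probability M/N: M = times a syllable heads a pinyin string.
head_counts = Counter(seq[0][1] for seq in corpus)
initial_prob = {syl: m / N for syl, m in head_counts.items()}

# Transition probability O/P for each ordered syllable pair:
# O = co-occurrences of the pair, P = occurrences of the first syllable.
pair_counts = Counter()
first_counts = Counter()
for seq in corpus:
    syllables = [syl for _, syl in seq]
    for a, b in zip(syllables, syllables[1:]):
        pair_counts[(a, b)] += 1
    first_counts.update(syllables)
transition_prob = {pair: o / first_counts[pair[0]]
                   for pair, o in pair_counts.items()}

# Emission probability Q/R: Q = co-occurrences of element and syllable,
# R = total occurrences of that syllable.
emit_counts = Counter((elem, syl) for seq in corpus for elem, syl in seq)
syl_counts = Counter(syl for seq in corpus for _, syl in seq)
emission_prob = {(elem, syl): q / syl_counts[syl]
                 for (elem, syl), q in emit_counts.items()}
```

On this toy corpus, "wo" heads two of the three sequences, so its initial probability is 2/3, and "wo" is always followed by "men", so that transition probability is 1.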
Using a hidden Markov model, the method for converting speech into pinyin in step 603 is as follows:
Step 6031, decompose the speech into at least one speech element, and find all the syllables corresponding to each speech element.
For example, the input speech "we are all ordinary people" is decomposed into seven speech elements, one per character, and the corresponding pinyin syllables are looked up in the prestored correspondence between speech and pinyin, for example:
The first speech element corresponds to the syllable "wo".
The second speech element corresponds to the syllables "men" and "meng".
The third speech element corresponds to the syllable "dou".
The fourth speech element corresponds to the syllables "shi" and "si".
The fifth speech element corresponds to the syllable "ping".
The sixth speech element corresponds to the syllable "fan".
The seventh speech element corresponds to the syllable "ren".
Step 6032, starting from the first speech element, select one syllable from the syllables corresponding to each speech element in turn to form a candidate pinyin string.
For example, the candidate pinyin strings corresponding to the above speech "we are all ordinary people" are:
1. "wo'men'dou'shi'ping'fan'ren".
2. "wo'men'dou'si'ping'fan'ren".
3. "wo'meng'dou'shi'ping'fan'ren".
4. "wo'meng'dou'si'ping'fan'ren".
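Step 6032 can be sketched as taking the Cartesian product of the per-element syllable candidates. This is a minimal illustration of the enumeration; the candidate lists are taken from the seven-element example above.

```python
from itertools import product

# Candidate syllables for each of the seven speech elements in the example
# "we are all ordinary people", in sentence order.
candidates = [["wo"], ["men", "meng"], ["dou"], ["shi", "si"],
              ["ping"], ["fan"], ["ren"]]

def candidate_pinyin_strings(candidates):
    """Step 6032: pick one syllable per element, in order, joined by "'"."""
    return ["'".join(combo) for combo in product(*candidates)]

strings = candidate_pinyin_strings(candidates)
```

With two choices each for the second and fourth elements, this yields exactly the four candidate pinyin strings listed above.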
Step 6033, calculate the occurrence probability of each candidate pinyin string according to the training probability parameters. Specifically, the initial probability parameter, the transition probability parameters and the emission probability parameters of the syllables in the candidate pinyin string are multiplied, and the resulting value is the occurrence probability of the candidate pinyin string.
Step 6034, select the candidate pinyin string with the highest occurrence probability as the pinyin converted from the voice signal.
For example, if by calculation the occurrence probability of the pinyin string "wo'men'dou'shi'ping'fan'ren" is the highest, this pinyin string can be selected as the converted pinyin.
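Steps 6033 and 6034 can be sketched as follows. This is a minimal scoring illustration under assumed toy parameter tables; in practice the values would come from the training statistics described earlier, and the constant emission value used here is purely for brevity.

```python
# Illustrative HMM parameters; all numbers are invented for the demo.
initial_prob = {"wo": 0.4}
transition_prob = {("wo", "men"): 0.6, ("wo", "meng"): 0.1,
                   ("men", "dou"): 0.5, ("meng", "dou"): 0.2,
                   ("dou", "shi"): 0.5, ("dou", "si"): 0.1,
                   ("shi", "ping"): 0.3, ("si", "ping"): 0.3,
                   ("ping", "fan"): 0.7, ("fan", "ren"): 0.6}
# Emission probability of each observed speech element given a syllable;
# a constant here so the example stays short.
emission_prob = lambda element, syllable: 0.5

def string_probability(pinyin_string, elements):
    """Step 6033: multiply initial, transition and emission parameters."""
    syllables = pinyin_string.split("'")
    p = initial_prob.get(syllables[0], 0.0)
    for prev, cur in zip(syllables, syllables[1:]):
        p *= transition_prob.get((prev, cur), 0.0)
    for element, syllable in zip(elements, syllables):
        p *= emission_prob(element, syllable)
    return p

candidates = ["wo'men'dou'shi'ping'fan'ren", "wo'men'dou'si'ping'fan'ren",
              "wo'meng'dou'shi'ping'fan'ren", "wo'meng'dou'si'ping'fan'ren"]
elements = ["e1", "e2", "e3", "e4", "e5", "e6", "e7"]

# Step 6034: keep the candidate with the highest occurrence probability.
best = max(candidates, key=lambda s: string_probability(s, elements))
```

With these toy numbers the "men"/"shi" transitions dominate, so the first candidate wins.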
Alternatively, more than one candidate pinyin string with a relatively high occurrence probability may be selected and displayed to the user; the user then selects among them, and the candidate pinyin string selected by the user is used as the pinyin converted from the voice signal.
For example, if the occurrence probabilities of the pinyin strings "wo'men'dou'shi'ping'fan'ren" and "wo'men'dou'si'ping'fan'ren" are the two highest, these two pinyin strings may be selected as the converted pinyins. The two candidate pinyin strings are then output and displayed to the user, as shown in fig. 7; each candidate pinyin string is preceded by a label, and the user selects a candidate pinyin string by its label. If the user selects 1, the pinyin string "wo'men'dou'shi'ping'fan'ren" is taken as the pinyin converted from the voice signal.
In addition, steps 6032 to 6034 may be replaced by the following alternatives, namely steps 6032', 6033' and 6034'.
Step 6032', starting from the first decomposed speech element, the syllables corresponding to the speech elements are sequentially combined into candidate pinyins of phrases or single characters. For example, the candidate pinyins corresponding to the above speech "we are all ordinary people" are:
the first two speech elements constitute the candidate pinyins of the phrases "wo'men" and "wo'meng";
the third and fourth speech elements constitute the candidate pinyins of the phrases "dou'shi" and "dou'si";
the last three speech elements constitute the candidate pinyin of the phrase "ping'fan'ren".
Step 6033', calculate the occurrence probability of each candidate pinyin according to the training probability parameters. Specifically, the initial probability parameter, the transition probability parameters and the emission probability parameters of the syllables in the candidate pinyin are multiplied, and the resulting value is the occurrence probability of the candidate pinyin.
Step 6034', output the candidate pinyins in order of occurrence probability, and determine the final converted pinyin of the voice signal according to the user's selection instructions.
For example, fig. 8 is a schematic diagram of sequentially outputting the pinyins corresponding to phrases or single characters according to their occurrence probabilities. As shown in the first step 801 of fig. 8, "1: wo'men" and "2: wo'meng" are displayed for the user to select; if the user selects 1, the subsequent phrases are further displayed according to their occurrence probabilities. As shown in the second step 802 of fig. 8, "1: dou'shi" and "2: dou'si" are displayed for the user to select; if the user selects 1, the subsequent phrase is further displayed. As shown in the third step 803 of fig. 8, "ping'fan'ren" is displayed, and this last pinyin may be selected by the user or by the system by default. Finally, "wo'men'dou'shi'ping'fan'ren" is used as the converted pinyin of the voice signal.
Of course, in the above process, all the syllables (i.e. the pinyins of single characters) corresponding to each speech element may also be displayed in sequence, and the user selects the syllable of each speech element in turn, thereby determining the pinyin finally converted from the voice signal.
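The stepwise selection of steps 6032' to 6034' can be sketched as follows. This is a minimal illustration under assumed phrase probabilities; the numbers and the simulated "user" callback are invented for the demo.

```python
# Candidate phrases per step, ranked by (assumed) occurrence probability,
# mirroring steps 801-803 of fig. 8.
phrase_steps = [
    {"wo'men": 0.6, "wo'meng": 0.1},   # step 801
    {"dou'shi": 0.5, "dou'si": 0.1},   # step 802
    {"ping'fan'ren": 0.7},             # step 803
]

def convert_stepwise(phrase_steps, choose):
    """Show ranked candidates at each step; `choose` returns the picked label."""
    chosen = []
    for step in phrase_steps:
        ranked = sorted(step, key=step.get, reverse=True)
        label = choose(ranked)          # e.g. the user keys in "1"
        chosen.append(ranked[int(label) - 1])
    return "'".join(chosen)

# Simulate a user who always selects candidate 1 at every step.
result = convert_stepwise(phrase_steps, lambda ranked: "1")
```

Selecting label 1 at each of the three steps reproduces the final pinyin string of the example.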
After the pinyin string is obtained, text is generated using step 604. Step 604 may specifically include:
step 6041, generate a candidate word list including at least one candidate word according to the converted pinyin, and display the candidate word list on the intelligent device.
Step 6042, detect whether a selection instruction is input on the intelligent device; if a selection instruction is detected, execute step 6043; otherwise, repeat this step 6042.
Step 6043, select a character from the candidate character list according to the selection instruction, and input the selected character into the intelligent device.
The candidate text list in step 6041 may contain candidate words or candidate whole sentences. The two specific generation methods are as follows:
and I, generating a candidate word. The invention needs to set a mapping table from the pinyin string to the candidate word sequence, namely a pinyin dictionary. The candidate words corresponding to each pinyin string in the pinyin dictionary are sorted according to the word frequency of the candidate words from large to small, the method for generating the candidate words is simple, namely the candidate words are searched in the pinyin dictionary according to the pinyin string, after the matched pinyin string is found, the first n candidate words corresponding to the pinyin string are output, and n is the number of the candidate words which can be displayed on the input method output interface.
Second, generating whole sentences. To realize whole-sentence input, the invention predicts the whole sentence by the maximum probability method: a pinyin string input by a user can be covered by many combinations of candidate words. First, all candidate words appearing in the pinyin string are found, and then the combination of candidate words with the maximum probability is taken as the final whole-sentence generation result.
Fig. 9 is a schematic diagram of the candidate word list for the pinyin string "wo'men'dou'shi'ping'fan'ren". As shown in fig. 9, each arc corresponds to one or more candidate words, sorted from top to bottom by word frequency from high to low. Each arc also carries word frequency information (not marked in the figure), namely the word frequency of the most frequent word among all the candidate words corresponding to that pinyin substring. Because only one candidate whole sentence is provided to the user, only the most frequent word on each arc is effective; that is, words ranked second or later by frequency, such as "nest", "gate", "fighter" and the like, cannot appear in the final candidate whole-sentence result.
Fig. 10 is a schematic diagram of the candidate word list simplified from fig. 9. As shown in fig. 10, the path with the highest probability is obtained by a shortest-path algorithm between two points, such as the Dijkstra algorithm or the Viterbi algorithm; this path, drawn dashed in fig. 10, corresponds to one word combination scheme and is displayed as the final whole-sentence prediction result at the first position of the candidate word window. The candidate word list window is shown in fig. 11: only one whole-sentence candidate result, "we are all ordinary people", is shown at the first candidate position, and the candidate word results follow from the second candidate position onward.
After the candidate character list is generated, the user needs to select one item from it as the final input result. In the present invention, the final input result can be determined in two ways: keyboard selection and voice selection. That is, in step 6042, the selection instruction may be a keyboard instruction or a voice instruction.
When the user inputs a selection instruction through the keyboard, the method pre-stores the matching relation between keyboard instructions and candidate character positions in the candidate character list. Step 6043 then specifically comprises: after a keyboard instruction is detected, matching the detected keyboard instruction against the candidate character positions according to the stored matching relation, and, if the match succeeds, taking the candidate character at the matched candidate character position as the finally input character.
When the user inputs a voice instruction through a microphone, the method pre-stores the voice instructions and their matching relation with the candidate character positions in the candidate character list; each selection instruction is represented by the speech of one character, establishing a correspondence from speech to selection instruction. For example, the speech "1" corresponds to selecting the first candidate character, the speech "up" corresponds to selecting a candidate character on the previous page, and the speech "down" corresponds to selecting a candidate character on the next page. The user can also modify the voice instructions as needed and represent the operations by different voice instructions; for example, the user can define some unusual sounds as voice instructions, which greatly reduces conflicts between voice instructions and ordinary voice input.
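The voice-command selection can be sketched as follows. This is a minimal illustration; the command set, page size and candidate words are assumptions for the demo, not part of the invention's stored data.

```python
# Pre-stored voice instructions mapped to candidate positions on a page;
# "up"/"down" page through the candidate character list.
PAGE_SIZE = 5
voice_commands = {"1": 0, "2": 1, "3": 2, "4": 3, "5": 4}

def select_by_voice(candidates, page, spoken):
    """Return ("char", character) for a pick, or ("page", index) for paging."""
    if spoken == "up":
        return ("page", max(page - 1, 0))
    if spoken == "down":
        return ("page", page + 1)
    index = page * PAGE_SIZE + voice_commands[spoken]
    return ("char", candidates[index])

candidates = ["we", "nest", "gate", "fight", "ridge", "others"]
kind, value = select_by_voice(candidates, 0, "1")
```

Speaking "1" on the first page picks the first candidate; speaking "down" advances to the next page, after which "1" picks the sixth candidate overall.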
Further, after step 602 and before step 603, the method further comprises: judging whether the received voice is a pre-stored voice instruction; if so, matching the received voice instruction against the candidate character positions in the candidate character list according to the pre-stored matching relation, and, if the match succeeds, taking the candidate character at the matched candidate character position as the finally input character; if not, executing step 603.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.