CN103106061A - Voice input method and device - Google Patents

Voice input method and device

Info

Publication number: CN103106061A
Application number: CN2013100699755A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Prior art keywords: recognition result, voice information, speech, result
Other languages: Chinese (zh)
Inventors: 张然 (Zhang Ran), 邵颖 (Shao Ying), 王力劭 (Wang Lishao)
Current assignee: BEIJING VCYBER TECHNOLOGY Co Ltd
Original assignee: BEIJING VCYBER TECHNOLOGY Co Ltd
Priority date / filing date: 2013-03-05
Publication date: 2013-05-15
Application filed by BEIJING VCYBER TECHNOLOGY Co Ltd

Abstract

Embodiments of the invention provide a voice input method and device, relating to the field of speech signal processing. The technical solution comprises: performing speech recognition on initial voice information input by a user and displaying the resulting initial recognition result; receiving secondary voice information input by the user after the initial voice information; determining whether the secondary voice information indicates a modification; and, if it does, modifying the initial recognition result according to the secondary voice information and displaying the modified result. The solution can be applied to user terminals such as computers and mobile phones.

Description

Voice input method and device
Technical field
The present invention relates to the field of speech signal processing, and in particular to a voice input method and device.
Background art
In recent years, with the development of speech recognition technology, users can control mobile devices by voice commands and can also edit and input text by voice. A system performs speech recognition on the voice signal input by the user and displays the recognition result, thereby realizing text editing and input.
However, when the voice signal input by the user contains homophones or is disturbed by noise, all or part of the recognition result may be wrong. The user then has to delete the erroneous part manually and re-enter it, which makes the operation cumbersome.
Summary of the invention
Embodiments of the invention provide a voice input method and device that can simplify the user's operation.
In one aspect, a voice input method is provided, comprising: performing speech recognition on initial voice information input by a user to obtain and display an initial recognition result; receiving secondary voice information input by the user after the initial voice information; determining whether the secondary voice information indicates a modification; and, if a modification is indicated, modifying the initial recognition result according to the secondary voice information and displaying the modified result.
In another aspect, a voice input device is provided, comprising:
a first display unit, configured to perform speech recognition on initial voice information input by a user and to obtain and display an initial recognition result;
a voice receiving unit, configured to receive secondary voice information input by the user after the initial voice information;
an indication confirmation unit, configured to determine whether the secondary voice information indicates a modification; and
a modification display unit, configured to, if a modification is indicated, modify the initial recognition result according to the secondary voice information and display the modified result.
With the voice input method and device provided by the embodiments of the invention, when the secondary voice information that the user inputs after the initial voice information indicates a modification, the initial recognition result can be modified and displayed directly according to that secondary voice information, thereby realizing voice input. The technical solution solves the prior-art problem that the user has to delete the erroneous part manually and re-enter it, which makes the operation cumbersome, and can improve the efficiency of voice input.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the voice input method provided by Embodiment 1 of the invention;
Fig. 2 is a flowchart of the voice input method provided by Embodiment 2 of the invention;
Fig. 3 is a first schematic diagram of the voice input method provided by Embodiment 2 of the invention;
Fig. 4 is a second schematic diagram of the voice input method provided by Embodiment 2 of the invention;
Fig. 5 is a first schematic structural diagram of the voice input device provided by Embodiment 3 of the invention;
Fig. 6 is a first schematic structural diagram of the indication confirmation unit of the voice input device shown in Fig. 5;
Fig. 7 is a second schematic structural diagram of the indication confirmation unit of the voice input device shown in Fig. 5;
Fig. 8 is a second schematic structural diagram of the voice input device provided by Embodiment 3 of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
The embodiments of the invention provide a voice input method and device that can solve the problem that voice input in the prior art is cumbersome.
Embodiment 1:
As shown in Fig. 1, the voice input method provided by this embodiment of the invention comprises:
Step 101: perform speech recognition on the initial voice information input by the user, and obtain and display an initial recognition result.
In this embodiment, when the user needs to input text by voice, the user can press the start button on the voice input device so that the device receives the user's voice information through a microphone. When the initial voice information input by the user is first received, speech recognition can be performed on it to obtain the initial recognition result. To give the voice input method provided by this embodiment a wide range of application, so that it can recognize user voice information from different domains and with different accents, step 101 may use a speaker-independent speech recognition technique to recognize and parse the initial voice information input by the user and obtain the initial recognition result.
In this embodiment, step 101 may display the initial recognition result in a normal state; for the user's convenience, it may also display the result in a to-be-confirmed state, which is not limited here. Displaying the initial recognition result in the to-be-confirmed state may be done by showing it under a floating overlay or by making it blink; showing it under a floating overlay is similar to highlighting it and is not described in detail here.
In this embodiment, when the initial recognition result is displayed in the to-be-confirmed state, the user can modify the text in that state. To avoid erroneous corrections by the voice input device when the user actually needs to input homophones, if there is no new voice input within a preset time after the initial voice information, the text in the to-be-confirmed state is marked as confirmed, for example by removing the overlay or stopping the blinking.
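As an illustration of the to-be-confirmed state described above, the following minimal sketch (in Python, with hypothetical names and an arbitrary timeout) holds a result as editable and marks it confirmed once the preset time elapses without new voice input; it is a sketch under those assumptions, not the patent's implementation.

```python
import time

# Minimal sketch (hypothetical names, arbitrary 3 s timeout): hold the initial
# recognition result in a "to be confirmed" state and promote it to "confirmed"
# if no new voice input arrives within the preset time.
class PendingResult:
    def __init__(self, text, timeout_s=3.0):
        self.text = text
        self.confirmed = False
        self._deadline = time.monotonic() + timeout_s

    def on_new_voice_input(self):
        """New audio arrived within the preset time: the result stays editable."""
        return not self.confirmed

    def poll(self):
        """Mark the result confirmed once the preset time has elapsed,
        e.g. remove the floating overlay or stop the blinking."""
        if not self.confirmed and time.monotonic() >= self._deadline:
            self.confirmed = True
        return self.confirmed
```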
Step 102: receive the secondary voice information that the user inputs after the initial voice information.
In this embodiment, after the voice input device displays the initial recognition result in step 101, if the user needs to modify some or all of the text in the initial recognition result, or needs to continue inputting other text, the user can press the start button on the voice input device again so that the device receives the user's secondary voice information through the microphone.
Step 103: determine whether the secondary voice information indicates a modification.
In this embodiment, after the voice input device receives the secondary voice information input by the user in step 102, it first needs to determine in step 103 whether the secondary voice information is input in order to modify the initial recognition result or in order to continue inputting other text.
Specifically, determining in step 103 whether the secondary voice information indicates a modification may comprise: performing an audio comparison between the secondary voice information and the initial voice information to obtain a similarity value, and determining, according to the relation between the similarity value and a preset threshold, whether the secondary voice information indicates a modification. The audio comparison between the secondary voice information and the initial voice information may be implemented by extracting audio feature parameters. The process of extracting the audio feature parameters may comprise: first compressing the initial voice information and the secondary voice information respectively by a wavelet transform to obtain initial compressed speech and secondary compressed speech, where the wavelet transform is preferably the Haar wavelet transform but may also be another method, which is not limited here; then extracting the audio feature parameters of the initial compressed speech and the secondary compressed speech respectively on a frame-by-frame basis to obtain initial audio parameters and secondary audio parameters, where the audio feature parameters are preferably the spectral centroid, the root mean square, Mel-frequency cepstral coefficients and the like; and finally performing a Euclidean distance calculation between the initial audio parameters and the secondary audio parameters to obtain a similarity distance and determining the similarity value from the similarity distance. Alternatively, the audio of the initial voice information and of the secondary voice information may be converted onto the same time axis and the audio comparison realized by pattern recognition; the audio comparison between the secondary voice information and the initial voice information may also be realized in other ways, which are not described one by one here.
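The audio-comparison route can be illustrated with the following hedged sketch: it uses a one-level Haar-style approximation for compression, per-frame RMS and spectral-centroid features instead of a full MFCC front end, and a length-truncating alignment instead of dynamic time warping, so it is a simplified stand-in for the procedure above rather than the embodiment itself; the function names and the threshold are illustrative assumptions.

```python
import numpy as np

# Simplified stand-in for the audio comparison above (not the embodiment itself):
# Haar-style compression, per-frame RMS and spectral-centroid features, and a
# Euclidean distance mapped to a similarity value in (0, 1].
def haar_approx(x):
    x = np.asarray(x, dtype=float)
    x = x[: len(x) // 2 * 2]
    return (x[0::2] + x[1::2]) / np.sqrt(2)          # Haar low-pass (approximation) branch

def frame_features(x, frame_len=256, hop=128):
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        spec = np.abs(np.fft.rfft(frame))
        centroid = np.sum(np.arange(len(spec)) * spec) / (np.sum(spec) + 1e-12)
        feats.append([rms, centroid])
    return np.asarray(feats)

def similarity_value(initial_audio, secondary_audio):
    fa = frame_features(haar_approx(initial_audio))
    fb = frame_features(haar_approx(secondary_audio))
    n = min(len(fa), len(fb))                        # crude alignment; DTW would be better
    dist = np.linalg.norm(fa[:n] - fb[:n]) / max(n, 1)
    return 1.0 / (1.0 + dist)

def indicates_modification(secondary_audio, initial_audio, threshold=0.6):
    # High similarity to the initial utterance suggests the user is repeating
    # a fragment in order to correct it (the threshold is illustrative).
    return similarity_value(initial_audio, secondary_audio) >= threshold
```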
Determining in step 103 whether the secondary voice information indicates a modification may also comprise: first performing semantic analysis on the secondary voice information to obtain an analysis result, and then determining from the analysis result whether the secondary voice information indicates a modification. The semantic analysis of the secondary voice information may, for example, check whether it contains phrases such as "replace ... with ..." or "add ... at position ..."; the semantic analysis may also be performed in other ways, which are not described one by one here.
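A minimal sketch of the semantic-analysis route follows. The English trigger phrases, the pattern list and the returned triple are illustrative assumptions; the embodiment only names example phrases of the form "replace ... with ..." and "add ... at position ...".

```python
import re

# Illustrative edit-intent patterns (English stand-ins for the example phrases
# named above); a real system would cover many more phrasings and languages.
EDIT_PATTERNS = [
    (re.compile(r"replace (?P<old>.+?) with (?P<new>.+)", re.IGNORECASE), "replace"),
    (re.compile(r"add (?P<new>.+?) after (?P<old>.+)", re.IGNORECASE), "insert"),
]

def analyze(secondary_text):
    """Return (action, anchor_text, target_text), or None if no edit intent is found."""
    for pattern, action in EDIT_PATTERNS:
        m = pattern.search(secondary_text)
        if m:
            return action, m.group("old").strip(), m.group("new").strip()
    return None

# Example: analyze("replace imput with input") -> ("replace", "imput", "input")
```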
In this embodiment, whether the secondary voice information indicates a modification is determined by audio comparison or by semantic analysis; the voice input device may select either of these methods according to the user's needs, or combine them, which is convenient for the user. When the user needs to modify text that has already been input, the modification can be made either by repeating the part that needs to be corrected or by inputting voice that carries modification semantics (for example "replace x with y", or "add y after x"), without the user having to perform manual operations such as deletion; this is convenient for the user and can improve the efficiency of voice input.
Step 104: if a modification is indicated, modify the initial recognition result according to the secondary voice information and display the modified result.
In this embodiment, if step 103 determines whether a modification is indicated by audio comparison, modifying the initial recognition result according to the secondary voice information in step 104 may comprise: first performing speech recognition on the secondary voice information to obtain at least one secondary recognition result; then obtaining a target recognition result from the at least one secondary recognition result; and finally modifying the initial recognition result according to the target recognition result and displaying the modified result. If step 103 determines whether a modification is indicated by semantic analysis, modifying the initial recognition result according to the secondary voice information in step 104 may comprise: first obtaining a modification position and target voice information from the analysis result (for example, the part of the secondary voice information after "replace with" may serve as the target voice information); performing speech recognition on the target voice information to obtain at least one secondary recognition result; obtaining a target recognition result from the at least one secondary recognition result; and modifying the initial recognition result according to the target recognition result and the modification position and displaying the modified result. Obtaining the target recognition result from the at least one secondary recognition result may be done according to the usage frequency of the at least one secondary recognition result, or according to the degree of association between the at least one secondary recognition result and the initial recognition result.
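The selection of a target recognition result from several candidates can be sketched as below; the frequency table and the character-overlap association measure are placeholders for the usage-frequency and degree-of-association criteria mentioned above, not the embodiment's actual scoring, and all names are hypothetical.

```python
# Placeholder ranking for the two criteria named above (usage frequency and
# degree of association with the initial recognition result). The scoring
# functions are toys; a real system would use corpus statistics or a language model.
def pick_target(candidates, initial_result, freq_table=None, weight=0.5):
    freq_table = freq_table or {}

    def frequency(c):
        return freq_table.get(c, 0)

    def association(c):
        # Toy measure: character overlap with the already-displayed initial result.
        return len(set(c) & set(initial_result)) / max(len(set(c)), 1)

    return max(candidates, key=lambda c: weight * frequency(c) + (1 - weight) * association(c))

# Example:
# pick_target(["impute", "input"], "voice method", freq_table={"input": 120, "impute": 3}) -> "input"
```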
In this embodiment, modifying the initial recognition result in step 104 may comprise: first modifying the initial recognition result to obtain a modified recognition result; then automatically deleting the displayed initial recognition result; and finally displaying the modified recognition result at the position where the initial recognition result was displayed. Modifying the initial recognition result may comprise: first determining the modification position in the initial recognition result, and then modifying the initial recognition result at that position.
Preferably, when the indication is to replace all or part of the initial recognition result, modifying the initial recognition result in step 104 may also comprise: first automatically deleting the part to be replaced in the initial recognition result, and then inserting the replacement part at the position corresponding to the part to be replaced and displaying it. When the indication is to add content to the initial recognition result, modifying the initial recognition result in step 104 may display the corresponding result after adding it to the initial recognition result.
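A string-level sketch of applying the correction (deleting the part to be replaced and inserting the target at the corresponding position) is given below; display handling, overlays and position bookkeeping are omitted, and the function and parameter names are hypothetical.

```python
# Hypothetical string-level helper: delete the part to be replaced and insert
# the target recognition result at the corresponding position. Display handling
# (overlay removal, re-rendering) is omitted.
def apply_edit(initial_result, action, anchor_text, target_text):
    pos = initial_result.find(anchor_text)
    if pos < 0:
        return initial_result                          # anchor not found: leave the text unchanged
    if action == "replace":
        return initial_result[:pos] + target_text + initial_result[pos + len(anchor_text):]
    if action == "insert":                             # add after the anchor text
        end = pos + len(anchor_text)
        return initial_result[:end] + target_text + initial_result[end:]
    return initial_result

# Example: apply_edit("voice imput method", "replace", "imput", "input") -> "voice input method"
```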
With the voice input method provided by this embodiment of the invention, when the secondary voice information that the user inputs after the initial voice information indicates a modification, the initial recognition result can be modified and displayed directly according to that secondary voice information, thereby realizing voice input. The technical solution solves the prior-art problem that the user has to delete the erroneous part manually and re-enter it, which makes the operation cumbersome, and can improve the efficiency of voice input.
Embodiment 2:
As shown in Fig. 2, the voice input method provided by this embodiment of the invention is similar to the method shown in Fig. 1; the difference is that, if step 103 determines that the secondary voice information does not indicate a modification, the method provided by this embodiment further comprises:
Step 105: perform speech recognition on the secondary voice information to obtain a secondary recognition result.
In this embodiment, if step 103 determines that the secondary voice information does not indicate a modification, the user needs to continue inputting after the initial voice information, so speech recognition can be performed directly on the secondary voice information to obtain the secondary recognition result. To give the voice input method provided by this embodiment a wide range of application, so that it can recognize user voice information from different domains and with different accents, step 105 may likewise use a speaker-independent speech recognition technique to recognize and parse the secondary voice information and obtain the secondary recognition result.
Step 106: display the secondary recognition result after the initial recognition result.
In this embodiment, after the secondary recognition result is obtained in step 105, it can be displayed directly after the initial recognition result.
To help those skilled in the art understand the technical solution provided by the embodiments of the invention, an example is described in which the user wants to input the phrase "tan xi feng yun duo bian huan" ("sigh at the ever-changing wind and cloud") by voice. Suppose the initial recognition result renders the final two characters ("bian huan") with the wrong homophone, meaning "transform" rather than "changeable", and is displayed in the to-be-confirmed state under a floating overlay. Because these two characters are wrong and the result is still to be confirmed, the user can, within the preset time, input the audio "bian huan". The voice input device compares this audio with the audio of "tan xi feng yun duo bian huan" and determines that the input audio "bian huan" indicates a modification of the initial recognition result. The device then performs speech recognition on "bian huan" and obtains at least one secondary recognition result, for example homophone candidates meaning "transform", "changeable", "slow down", "trouble on the frontier" and so on. From the degree of association between these candidates and the rest of the initial recognition result ("sigh at the wind and cloud"), the candidate meaning "changeable" is determined to be the target recognition result. The device then automatically deletes the wrong characters corresponding to the audio "bian huan" from the initial recognition result, leaving the remaining text displayed under the floating overlay, inserts the target recognition result at the corresponding position so that the correct phrase is displayed under the floating overlay, and marks the initial recognition result as confirmed, as shown in Fig. 3. In particular, if the user does not input any audio within the preset time, the initial recognition result displayed under the floating overlay is marked as confirmed, as shown in Fig. 4, so that when the user inputs voice again after the preset time, the input continues after the initial recognition result; this avoids erroneous corrections by the voice input device when the user needs to input homophones. If the voice the user inputs again does not indicate a modification of the initial recognition result, the input likewise continues after the initial recognition result.
With the voice input method provided by this embodiment of the invention, when the secondary voice information that the user inputs after the initial voice information indicates a modification, the initial recognition result can be modified and displayed directly according to that secondary voice information, thereby realizing voice input. The technical solution solves the prior-art problem that the user has to delete the erroneous part manually and re-enter it, which makes the operation cumbersome, and can improve the efficiency of voice input.
Embodiment 3:
As shown in Fig. 5, the voice input device provided by this embodiment of the invention comprises:
a first display unit 501, configured to perform speech recognition on the initial voice information input by the user and to obtain and display an initial recognition result;
a voice receiving unit 502, configured to receive the secondary voice information input by the user after the initial voice information;
an indication confirmation unit 503, configured to determine whether the secondary voice information indicates a modification; and
a modification display unit 504, configured to, if a modification is indicated, modify the initial recognition result according to the secondary voice information and display the modified result.
In this embodiment, the process of realizing voice input with the first display unit 501, the voice receiving unit 502, the indication confirmation unit 503 and the modification display unit 504 is similar to the process provided in Embodiment 1 and is not described again here.
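A hedged structural sketch of how the four units could cooperate is given below; the recognizer, the modification check, the correction routine and the display are injected callables, so the sketch stays independent of any concrete speech engine and is not the embodiment's actual implementation.

```python
# Structural sketch only: four cooperating roles behind two entry points.
# The injected callables stand in for the first display unit's recognizer,
# the indication confirmation unit, the modification display unit and the screen.
class VoiceInputDevice:
    def __init__(self, recognize, indicates_modification, apply_correction, display):
        self.recognize = recognize                      # audio -> text
        self.indicates_modification = indicates_modification  # (secondary, initial_audio, initial_text) -> bool
        self.apply_correction = apply_correction        # (initial_text, secondary_audio) -> corrected text
        self.display = display                          # text -> None
        self.initial_audio = None
        self.initial_result = None

    def on_initial_voice(self, audio):
        """First display unit: recognize the initial voice information and show it."""
        self.initial_audio = audio
        self.initial_result = self.recognize(audio)
        self.display(self.initial_result)

    def on_secondary_voice(self, audio):
        """Voice receiving unit: route the secondary voice information."""
        if self.indicates_modification(audio, self.initial_audio, self.initial_result):
            self.initial_result = self.apply_correction(self.initial_result, audio)
        else:                                           # no modification indicated: append (Embodiment 2)
            self.initial_result += self.recognize(audio)
        self.display(self.initial_result)
```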
Further, as shown in Fig. 6, the indication confirmation unit 503 of this embodiment comprises:
an audio comparison module 5031, configured to perform an audio comparison between the secondary voice information and the initial voice information to obtain a similarity value; and
a first confirmation module 5032, configured to determine, according to the relation between the similarity value and a preset threshold, whether the secondary voice information indicates a modification.
As shown in Fig. 7, the indication confirmation unit 503 may also comprise:
a semantic analysis module 5033, configured to perform semantic analysis on the secondary voice information to obtain an analysis result; and
a second confirmation module 5034, configured to determine from the analysis result whether the secondary voice information indicates a modification.
In this embodiment, the indication confirmation unit 503 may include only the audio comparison module 5031 and the first confirmation module 5032, as shown in Fig. 6; it may include only the semantic analysis module 5033 and the second confirmation module 5034, as shown in Fig. 7; or it may include the audio comparison module 5031 and the first confirmation module 5032 as well as the semantic analysis module 5033 and the second confirmation module 5034, which is not described further here.
In this embodiment, when the indication confirmation unit 503 comprises the audio comparison module 5031 and the first confirmation module 5032, the audio comparison module 5031 may comprise: an audio compression submodule, configured to compress the initial voice information and the secondary voice information respectively to obtain initial compressed speech and secondary compressed speech; a parameter extraction submodule, configured to extract the audio feature parameters of the initial compressed speech and the secondary compressed speech respectively to obtain initial audio parameters and secondary audio parameters; a distance operation submodule, configured to perform a Euclidean distance operation on the initial audio parameters and the secondary audio parameters to obtain a similarity distance; and a similarity obtaining submodule, configured to determine the similarity value from the similarity distance. In this case the modification display unit may comprise: a first recognition module, configured to perform speech recognition on the secondary voice information to obtain at least one secondary recognition result; a first result obtaining module, configured to obtain a target recognition result from the at least one secondary recognition result; and a first modification module, configured to modify the initial recognition result according to the target recognition result and display the modified result.
In this embodiment, when the indication confirmation unit 503 comprises the semantic analysis module 5033 and the second confirmation module 5034, the modification display unit may comprise: a position obtaining module, configured to obtain a modification position and target voice information from the analysis result; a second recognition module, configured to perform speech recognition on the target voice information to obtain at least one secondary recognition result; a second result obtaining module, configured to obtain a target recognition result from the at least one secondary recognition result; and a second modification module, configured to modify the initial recognition result according to the target recognition result and the modification position and display the modified result.
In this embodiment, the first or second result obtaining module may comprise a frequency obtaining submodule or an association obtaining submodule. The frequency obtaining submodule is configured to obtain the target recognition result according to the usage frequency of the at least one secondary recognition result; the association obtaining submodule is configured to obtain the target recognition result according to the degree of association between the at least one secondary recognition result and the initial recognition result.
Further, as shown in Fig. 8, if the indication confirmation unit determines that no modification is indicated, the voice input device provided by this embodiment may further comprise:
a recognition unit 505, configured to perform speech recognition on the secondary voice information to obtain a secondary recognition result; and
a second display unit 506, configured to display the secondary recognition result after the initial recognition result.
In this embodiment, the process of realizing voice input with the recognition unit 505 and the second display unit 506 is similar to that provided in Embodiment 2 and is not described again here.
With the voice input device provided by this embodiment of the invention, when the secondary voice information that the user inputs after the initial voice information indicates a modification, the initial recognition result can be modified and displayed directly according to that secondary voice information, thereby realizing voice input. The technical solution solves the prior-art problem that the user has to delete the erroneous part manually and re-enter it, which makes the operation cumbersome, and can improve the efficiency of voice input.
The voice input method and device provided by the embodiments of the invention can be applied to user terminals such as computers and mobile phones.
The above is only a specific embodiment of the invention, but the protection scope of the invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the invention shall fall within the protection scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.

Claims (18)

1. A voice input method, characterized by comprising:
performing speech recognition on initial voice information input by a user to obtain and display an initial recognition result;
receiving secondary voice information input by the user after the initial voice information;
determining whether the secondary voice information indicates a modification; and
if a modification is indicated, modifying the initial recognition result according to the secondary voice information and displaying the modified result.
2. The voice input method according to claim 1, characterized in that determining whether the secondary voice information indicates a modification comprises:
performing an audio comparison between the secondary voice information and the initial voice information to obtain a similarity value; and
determining, according to the relation between the similarity value and a preset threshold, whether the secondary voice information indicates a modification.
3. The voice input method according to claim 2, characterized in that the step of performing an audio comparison between the secondary voice information and the initial voice information to obtain a similarity value comprises:
compressing the initial voice information and the secondary voice information respectively to obtain initial compressed speech and secondary compressed speech;
extracting audio feature parameters of the initial compressed speech and the secondary compressed speech respectively to obtain initial audio parameters and secondary audio parameters;
performing a Euclidean distance operation on the initial audio parameters and the secondary audio parameters to obtain a similarity distance; and
determining the similarity value from the similarity distance.
4. The voice input method according to claim 2, characterized in that modifying the initial recognition result according to the secondary voice information and displaying the modified result comprises:
performing speech recognition on the secondary voice information to obtain at least one secondary recognition result;
obtaining a target recognition result from the at least one secondary recognition result; and
modifying the initial recognition result according to the target recognition result and displaying the modified result.
5. The voice input method according to claim 1, characterized in that determining whether the secondary voice information indicates a modification comprises:
performing semantic analysis on the secondary voice information to obtain an analysis result; and
determining from the analysis result whether the secondary voice information indicates a modification.
6. The voice input method according to claim 5, characterized in that modifying the initial recognition result according to the secondary voice information and displaying the modified result comprises:
obtaining a modification position and target voice information from the analysis result;
performing speech recognition on the target voice information to obtain at least one secondary recognition result;
obtaining a target recognition result from the at least one secondary recognition result; and
modifying the initial recognition result according to the target recognition result and the modification position and displaying the modified result.
7. The voice input method according to claim 4 or 6, characterized in that obtaining a target recognition result from the at least one secondary recognition result comprises:
obtaining the target recognition result according to the usage frequency of the at least one secondary recognition result; or
obtaining the target recognition result according to the degree of association between the at least one secondary recognition result and the initial recognition result.
8. The voice input method according to claim 1, characterized in that obtaining and displaying an initial recognition result comprises:
obtaining the initial recognition result and displaying it under a floating overlay; or
obtaining the initial recognition result and displaying it in a blinking manner.
9. The voice input method according to claim 1, characterized in that, if no modification is indicated, the method further comprises:
performing speech recognition on the secondary voice information to obtain a secondary recognition result; and
displaying the secondary recognition result after the initial recognition result.
10. A voice input device, characterized by comprising:
a first display unit, configured to perform speech recognition on initial voice information input by a user and to obtain and display an initial recognition result;
a voice receiving unit, configured to receive secondary voice information input by the user after the initial voice information;
an indication confirmation unit, configured to determine whether the secondary voice information indicates a modification; and
a modification display unit, configured to, if a modification is indicated, modify the initial recognition result according to the secondary voice information and display the modified result.
11. The voice input device according to claim 10, characterized in that the indication confirmation unit comprises:
an audio comparison module, configured to perform an audio comparison between the secondary voice information and the initial voice information to obtain a similarity value; and
a first confirmation module, configured to determine, according to the relation between the similarity value and a preset threshold, whether the secondary voice information indicates a modification.
12. The voice input device according to claim 11, characterized in that the audio comparison module comprises:
an audio compression submodule, configured to compress the initial voice information and the secondary voice information respectively to obtain initial compressed speech and secondary compressed speech;
a parameter extraction submodule, configured to extract audio feature parameters of the initial compressed speech and the secondary compressed speech respectively to obtain initial audio parameters and secondary audio parameters;
a distance operation submodule, configured to perform a Euclidean distance operation on the initial audio parameters and the secondary audio parameters to obtain a similarity distance; and
a similarity obtaining submodule, configured to determine the similarity value from the similarity distance.
13. The voice input device according to claim 11, characterized in that the modification display unit comprises:
a first recognition module, configured to perform speech recognition on the secondary voice information to obtain at least one secondary recognition result;
a first result obtaining module, configured to obtain a target recognition result from the at least one secondary recognition result; and
a first modification module, configured to modify the initial recognition result according to the target recognition result and display the modified result.
14. The voice input device according to claim 10, characterized in that the indication confirmation unit comprises:
a semantic analysis module, configured to perform semantic analysis on the secondary voice information to obtain an analysis result; and
a second confirmation module, configured to determine from the analysis result whether the secondary voice information indicates a modification.
15. The voice input device according to claim 14, characterized in that the modification display unit comprises:
a position obtaining module, configured to obtain a modification position and target voice information from the analysis result;
a second recognition module, configured to perform speech recognition on the target voice information to obtain at least one secondary recognition result;
a second result obtaining module, configured to obtain a target recognition result from the at least one secondary recognition result; and
a second modification module, configured to modify the initial recognition result according to the target recognition result and the modification position and display the modified result.
16. The voice input device according to claim 13 or 15, characterized in that the first or second result obtaining module comprises a frequency obtaining submodule or an association obtaining submodule;
the frequency obtaining submodule is configured to obtain the target recognition result according to the usage frequency of the at least one secondary recognition result; and
the association obtaining submodule is configured to obtain the target recognition result according to the degree of association between the at least one secondary recognition result and the initial recognition result.
17. The voice input device according to claim 10, characterized in that obtaining and displaying an initial recognition result comprises:
obtaining the initial recognition result and displaying it under a floating overlay; or
obtaining the initial recognition result and displaying it in a blinking manner.
18. The voice input device according to claim 10, characterized in that, if no modification is indicated, the device further comprises:
a recognition unit, configured to perform speech recognition on the secondary voice information to obtain a secondary recognition result; and
a second display unit, configured to display the secondary recognition result after the initial recognition result.
CN2013100699755A 2013-03-05 2013-03-05 Voice input method and device Pending CN103106061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013100699755A CN103106061A (en) 2013-03-05 2013-03-05 Voice input method and device

Publications (1)

Publication Number Publication Date
CN103106061A (en) 2013-05-15

Family

ID=48313953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013100699755A Pending CN103106061A (en) 2013-03-05 2013-03-05 Voice input method and device

Country Status (1)

Country Link
CN (1) CN103106061A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494299A (en) * 2002-10-30 2004-05-05 英华达(上海)电子有限公司 Device and method for converting speech sound input into characters on handset
CN1629934A (en) * 2004-02-06 2005-06-22 刘新斌 Building and using method of virtual speech keyboard for interactive control
JP2010257065A (en) * 2009-04-22 2010-11-11 Sanyo Electric Co Ltd Input device
CN101807399A (en) * 2010-02-02 2010-08-18 华为终端有限公司 Voice recognition method and device
CN102682763A (en) * 2011-03-10 2012-09-19 北京三星通信技术研究有限公司 Method, device and terminal for correcting named entity vocabularies in voice input text

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105810188A (en) * 2014-12-30 2016-07-27 联想(北京)有限公司 Information processing method and electronic equipment
CN105810188B (en) * 2014-12-30 2020-02-21 联想(北京)有限公司 Information processing method and electronic equipment
CN105869632A (en) * 2015-01-22 2016-08-17 北京三星通信技术研究有限公司 Speech recognition-based text revision method and device
CN105070297B (en) * 2015-07-16 2018-10-23 宁波大学 A kind of MP3 audio compressions history detection method
CN105070297A (en) * 2015-07-16 2015-11-18 宁波大学 MP3 audio compression history detection method
CN107068144A (en) * 2016-01-08 2017-08-18 王道平 It is easy to the method for manual amendment's word in a kind of speech recognition
CN106406807A (en) * 2016-09-19 2017-02-15 北京云知声信息技术有限公司 A method and a device for voice correction of characters
CN106601254A (en) * 2016-12-08 2017-04-26 广州神马移动信息科技有限公司 Information inputting method, information inputting device and calculation equipment
US10796699B2 (en) 2016-12-08 2020-10-06 Guangzhou Shenma Mobile Information Technology Co., Ltd. Method, apparatus, and computing device for revision of speech recognition results
CN106648531A (en) * 2016-12-21 2017-05-10 惠州Tcl移动通信有限公司 Method and system for automatically matching different audio parameters based on mobile terminal
CN110603901A (en) * 2017-05-08 2019-12-20 昕诺飞控股有限公司 Voice control
CN110603901B (en) * 2017-05-08 2022-01-25 昕诺飞控股有限公司 Method and control system for controlling utility using speech recognition
CN107480118A (en) * 2017-08-16 2017-12-15 科大讯飞股份有限公司 Method for editing text and device
CN109994105A (en) * 2017-12-29 2019-07-09 宝马股份公司 Data inputting method, device, system, vehicle and readable storage medium storing program for executing
CN112331194A (en) * 2019-07-31 2021-02-05 北京搜狗科技发展有限公司 Input method and device and electronic equipment
CN113177114A (en) * 2021-05-28 2021-07-27 重庆电子工程职业学院 Natural language semantic understanding method based on deep learning
CN113611284A (en) * 2021-08-06 2021-11-05 工银科技有限公司 Voice library construction method, recognition method, construction system and recognition system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130515