CN103366742B - Pronunciation inputting method and system - Google Patents

Pronunciation inputting method and system Download PDF

Info

Publication number
CN103366742B
CN103366742B CN201210101302.9A CN201210101302A
Authority
CN
China
Prior art keywords
candidate
word
syllable
text
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210101302.9A
Other languages
Chinese (zh)
Other versions
CN103366742A (en)
Inventor
李曜
许东星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI GEAK ELECTRONICS Co.,Ltd.
Original Assignee
SHANGHAI GUOKE ELECTRONIC CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI GUOKE ELECTRONIC CO Ltd filed Critical SHANGHAI GUOKE ELECTRONIC CO Ltd
Priority to CN201210101302.9A priority Critical patent/CN103366742B/en
Publication of CN103366742A publication Critical patent/CN103366742A/en
Application granted granted Critical
Publication of CN103366742B publication Critical patent/CN103366742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Abstract

The present invention relates to a voice input method and system. The method includes: while recording, continuously segmenting the input speech into speech fragments and generating the text of each speech fragment; and displaying the text of each speech fragment in turn and correcting the text of each speech fragment in turn according to the user's selection. The present invention can automatically segment the speech recognition result and return it segment by segment for secondary confirmation by the user, and the user can modify and confirm the returned text while recording.

Description

Pronunciation inputting method and system
Technical field
The invention belongs to the field of speech recognition and, more particularly, relates to a voice input method and system.
Background art
With the progress of speech recognition technology and the rise of cloud computing, it has become a trend on mobile terminals to input by voice, have a cloud server transcribe the speech into text, and return the text to the mobile terminal. Because of the size limitations of mobile terminals, entering text directly through a physical or virtual keyboard has never been entirely satisfactory, and it can be expected that voice input will replace key-press input in more and more situations.
However, the fact that speech recognition accuracy cannot reach 100% hinders voice input from completely replacing key-press input. In practice, because of the complexity of real pronunciation under the many conditions encountered in daily life, the accuracy of speech recognition can never reach 100%; in noisy environments in particular, the recognition result will inevitably contain various errors. In other words, the result of speech recognition always requires a secondary confirmation step. An existing voice input scheme works as follows: after the record button is pressed, the mobile terminal pops up an interface indicating that recording is in progress, as shown in Figure 1; the user then speaks, and after the user finishes, the recognized text is displayed in a text input box 21 on an interface as shown in Figure 2. If the text in the text input box 21 contains recognition errors, the user brings up the keyboard 22, modifies the text, and confirms and saves it. In this voice input scheme, however, the user cannot edit the recognition result at all during recording; only after the entire voice input has been spoken in one pass can the user correct the errors in the returned text one by one, confirm and save it, and then use the confirmed text in subsequent applications such as sending a short message, sending an e-mail, or keeping a note. For the user, this confirmation process is generally rather cumbersome and not friendly enough.
Summary of the invention
The purpose of the present invention is to provide a voice input method and system that can automatically segment and recognize the input speech, so that the user can correct the recognized text segment by segment while recording.
To solve the above problems, the present invention provides a voice input method, including:
while recording, continuously segmenting the input speech into speech fragments and generating the text of each speech fragment; and
displaying the text of each speech fragment in turn, and correcting the text of each speech fragment in turn according to the user's selection.
Further, in the above method, the input speech is continuously segmented into speech fragments and the text of each speech fragment is generated by a cloud server.
Further, in the above method, the input speech is continuously segmented into speech fragments by a voice activity detection algorithm.
Further, in the above method, the step of correcting the text of each speech fragment in turn according to the user's selection includes:
the user selecting, in the text of each speech fragment, the content that needs to be modified;
generating candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content;
correcting the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user.
Further, in the above method, the step of correcting the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user includes:
when the user selects a candidate word, replacing the corresponding word in the content with the selected candidate word;
when the user selects the syllable, generating candidate words corresponding to the syllable, and selecting the correct candidate word from the candidate words of the syllable to replace the corresponding word in the content;
when the user selects a candidate syllable, generating candidate words corresponding to the candidate syllable, and selecting the correct candidate word from the candidate words of the candidate syllable to replace the corresponding word in the content;
when the generated candidate words and candidate syllables contain no correct result, calling an input method to modify the text.
Further, in the above method, before the step of continuously segmenting the input speech into speech fragments and generating the text of each speech fragment while recording, the method further includes: monitoring the noise of the recording environment during recording to obtain a signal-to-noise ratio.
Further, in the above method, the step of generating candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content includes:
when the signal-to-noise ratio is greater than a predetermined threshold, reducing the number of candidate words and candidate syllables;
when the signal-to-noise ratio is less than the predetermined threshold, increasing the number of candidate words and candidate syllables.
According to another aspect of the present invention, a voice input system is provided, including:
a segmentation module, for continuously segmenting the input speech into speech fragments and generating the text of each speech fragment while recording; and
a correction module, for displaying the text of each speech fragment in turn and correcting the text of each speech fragment in turn according to the user's selection.
Further, in the above system, the segmentation module is located on a cloud server.
Further, in the above system, the segmentation module continuously segments the input speech into speech fragments by a voice activity detection algorithm.
Further, in the above system, the correction module includes:
a selection unit, for obtaining the content that the user selects to be modified in the text of each speech fragment;
a candidate unit, for generating candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content;
a modification unit, for correcting the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user.
Further, in the above system, the modification unit is configured to: when the user selects a candidate word, replace the corresponding word in the content with the selected candidate word; when the user selects the syllable, generate candidate words corresponding to the syllable, and select the correct candidate word from the candidate words of the syllable to replace the corresponding word in the content; when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable, and select the correct candidate word from the candidate words of the candidate syllable to replace the corresponding word in the content; and when the generated candidate words and candidate syllables contain no correct result, call an input method to modify the text.
Further, the above system further includes a noise monitoring unit, for monitoring the noise of the recording environment during recording to obtain a signal-to-noise ratio.
Further, in the above system, the candidate unit is configured to reduce the number of candidate words and candidate syllables when the signal-to-noise ratio is greater than a predetermined threshold, and to increase the number of candidate words and candidate syllables when the signal-to-noise ratio is less than the predetermined threshold.
Compared with the prior art, the present invention continuously segments the input speech into speech fragments and generates the text of each speech fragment while recording, displays the text of each speech fragment in turn, and corrects the text of each speech fragment in turn according to the user's selection. It can therefore automatically segment the speech recognition result and return it segment by segment for secondary confirmation by the user, and the user can modify and confirm the returned text while recording.
In addition, the user selects the content that needs to be modified in the text of each speech fragment; candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content are then generated; and the text in the speech fragment is corrected according to the candidate word, the syllable, and the candidate syllable selected by the user. This makes it convenient for the user to quickly select the correct word to correct the content in the text.
In addition, the noise of the recording environment is monitored during recording to obtain a signal-to-noise ratio; when the signal-to-noise ratio is greater than a predetermined threshold, the number of candidate words and candidate syllables is reduced, and when the signal-to-noise ratio is less than the predetermined threshold, the number of candidate words and candidate syllables is increased, so that the number of candidate results can be adjusted according to different signal-to-noise ratios.
Description of the drawings
Fig. 1 is a schematic diagram of the recording interface of an existing voice input scheme;
Fig. 2 is a schematic diagram of the recognized-text display and modification interface of an existing voice input scheme;
Fig. 3 is a flowchart of the voice input method of Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the recording, recognized-text display, and modification interface of Embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of the interface of Embodiment 1 of the present invention for displaying and modifying the recognized text in turn;
Fig. 6 is a flowchart of the voice input method of Embodiment 2 of the present invention;
Fig. 7 is a schematic diagram of the noise monitoring interface of Embodiment 2 of the present invention;
Fig. 8 is a functional block diagram of the voice input system of Embodiment 3 of the present invention.
Detailed description of the embodiments
In order to make the above objectives, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below in conjunction with the accompanying drawings and specific embodiments.
Embodiment 1
As shown in Figures 3 to 5, the present invention provides a voice input method, including:
Step S11: while recording, continuously segment the input speech into speech fragments and generate the text of each speech fragment. Specifically, the present invention can automatically segment the speech recognition result and return it segment by segment for secondary confirmation by the user. A cloud server may continuously segment the input speech into speech fragments and generate the text of each speech fragment, the input speech being continuously segmented into speech fragments by a speech endpoint detection algorithm. Speech endpoint detection accurately determines the starting point and ending point of speech within a signal segment that contains speech, so as to distinguish speech from non-speech signals, and is an important part of speech processing technology. For example, when the user inputs speech continuously, the cloud server may use an endpoint detection algorithm to cut the effective speech into pieces according to the pauses in the user's speaking rhythm, convert each piece into text in turn, and return the text to the display interface of the mobile terminal shown in Figure 4, in which the recording interface and the recognition result display interface are integrated on the same screen;
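For illustration only (this sketch is editorial and not part of the patented method), the following minimal Python example shows one way the energy-based endpoint detection described in step S11 could cut a recorded stream into speech fragments at the pauses in the user's speech. The frame length, energy threshold, and minimum pause length are assumed values chosen for the example, not values specified by the patent.

```python
import numpy as np

FRAME_MS = 30            # frame length in milliseconds (assumed value)
ENERGY_THRESHOLD = 1e-3  # frame-energy level separating speech from silence (assumed value)
MIN_PAUSE_FRAMES = 10    # a pause of this many silent frames ends a fragment (assumed value)

def frame_energy(frame: np.ndarray) -> float:
    """Mean squared amplitude of one audio frame."""
    return float(np.mean(frame ** 2))

def segment_speech(samples: np.ndarray, sample_rate: int = 16000):
    """Cut a stream of samples into speech fragments at pauses.

    Returns a list of (start_sample, end_sample) pairs, one per fragment,
    mimicking the endpoint-detection-based segmentation of step S11.
    """
    frame_len = sample_rate * FRAME_MS // 1000
    fragments = []
    start = None      # start of the fragment currently being collected
    silent_run = 0    # number of consecutive silent frames seen so far

    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if frame_energy(frame) > ENERGY_THRESHOLD:
            if start is None:
                start = i          # speech onset: open a new fragment
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= MIN_PAUSE_FRAMES:
                # the pause is long enough: close the fragment where the pause began
                fragments.append((start, i - (silent_run - 1) * frame_len))
                start, silent_run = None, 0

    if start is not None:          # flush a fragment still open at the end of the stream
        fragments.append((start, len(samples)))
    return fragments

if __name__ == "__main__":
    rate = 16000
    t = np.arange(rate * 2) / rate
    utterance = 0.1 * np.sin(2 * np.pi * 220 * t)   # stand-in for two seconds of speech
    pause = np.zeros(rate)                          # a one-second pause between utterances
    audio = np.concatenate([utterance, pause, utterance])
    print(segment_speech(audio, rate))              # expect two (start, end) pairs
```

In the patented scheme, each fragment found this way would be sent for transcription and returned as text while recording continues.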
Step S12: display the text of each speech fragment in turn;
Step S13: correct the text of each speech fragment in turn according to the user's selection. Specifically, in the present invention the user can modify and confirm the returned text while recording. It should be noted that in the interaction scheme of the present invention, not all text recognition results are displayed at once; only the text recognition result of the current fragment is shown on the interface of Figure 5, and only after the user has modified and confirmed recognition result 1 of speech fragment 1 is the next recognition result 2 displayed. The advantage of this presentation scheme is that a limited number of results are shown in turn on the limited screen, allowing the user to concentrate on the current recognition result and improving the efficiency of modifying the text. This step may specifically include:
Step S131: the user selects, in the text of each speech fragment, the content that needs to be modified. Specifically, when the user needs to change some of the words in the text recognition result, the user can click the specific word in the text recognition result;
Step S132: generate candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content. Specifically, when the user clicks a specific word in the recognition result that needs to be changed, several candidate words corresponding to that word can be popped up, together with the syllable corresponding to the word and several candidate syllables. In this way the speech recognition result can be effectively combined with an input method, providing the user with multiple candidates to choose from; degrading the recognition result from the word to the syllable widens the range of possible hits, so that the user does not need to type a string of letters but can find the desired word among the candidates;
Step S133: correct the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user. Specifically, when the user modifies and confirms the returned recognition result, two commands, "Cancel" and "Confirm", may be provided as shown in Figure 5, for quickly deleting and saving the text recognition result respectively. This step may further include:
Step S1331: when the user selects a candidate word, replace the corresponding word in the content with the selected candidate word. Specifically, if the correct word is present among the candidate words, the user can simply click that candidate word to replace the originally misrecognized word;
Step S1332: when the user selects the syllable, generate candidate words corresponding to the syllable, and select the correct candidate word from the candidate words of the syllable to replace the corresponding word in the content. Specifically, if the correct word is not among the candidate words, the user can click the correct syllable and then select the intended word from the candidate words provided for that syllable;
Step S1333: when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable, and select the correct candidate word from the candidate words of the candidate syllable to replace the corresponding word in the content. Specifically, if the correct word is not among the candidate words corresponding to the correct syllable, the user can click a candidate syllable and then select the intended word from the candidate words provided for that candidate syllable;
Step S1334: when the generated candidate words and candidate syllables contain no correct result, an input method can be called to modify the text.
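For illustration only, and not as the claimed implementation, the following Python sketch walks through the four cases of step S133 in order: a candidate word is chosen directly (S1331), the syllable is chosen and a word is picked from its candidates (S1332), a candidate syllable is chosen and a word is picked from its candidates (S1333), or, with no correct candidate available, an input-method edit is used (S1334). The example data, including the pinyin syllables and candidate lists, are invented for the illustration.

```python
from dataclasses import dataclass

@dataclass
class WordCandidates:
    """Candidates popped up for one recognized word, as generated in step S132."""
    word: str                  # the word as recognized
    candidate_words: list      # alternative words for the same position
    syllable: str              # syllable (pinyin) of the recognized word
    syllable_words: list       # candidate words generated for that syllable (S1332)
    candidate_syllables: dict  # candidate syllable -> candidate words for that syllable (S1333)

def correct_word(cands: WordCandidates, choice: str, pick=None, fallback_edit=None) -> str:
    """Apply the user's choice according to sub-steps S1331 to S1334.

    `choice` is what the user tapped (a candidate word, the syllable, or a
    candidate syllable); `pick` stands in for the user picking one word from
    a candidate list, and `fallback_edit` for the input-method edit of S1334.
    """
    pick = pick or (lambda words: words[0])
    if choice in cands.candidate_words:
        return choice                                   # S1331: direct replacement
    if choice == cands.syllable and cands.syllable_words:
        return pick(cands.syllable_words)               # S1332: word chosen via the syllable
    if cands.candidate_syllables.get(choice):
        return pick(cands.candidate_syllables[choice])  # S1333: word chosen via a candidate syllable
    # S1334: nothing correct among the candidates, fall back to the input method
    return fallback_edit() if fallback_edit else cands.word

if __name__ == "__main__":
    c = WordCandidates(
        word="事宴", candidate_words=["试验", "实验"],
        syllable="shi yan", syllable_words=["试验", "实验", "事宴"],
        candidate_syllables={"shi jian": ["事件", "世间"]},
    )
    print(correct_word(c, "实验"))                                  # S1331
    print(correct_word(c, "shi yan"))                               # S1332
    print(correct_word(c, "shi jian"))                              # S1333
    print(correct_word(c, "xx", fallback_edit=lambda: "手动输入"))  # S1334
```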
In the present invention, the recording interface and the result display interface can be shown simultaneously on the interface of the mobile terminal, so that the user can see the returned text results while recording and can modify them in real time. That is, the user can input a continuous stretch of speech, modify and confirm the returned text results without closing the recording, and then continue recording; the user can also record another person's speech while simultaneously correcting and confirming the returned recognition results.
Embodiment 2
As shown in Figures 6 and 7, the present invention provides another voice input method. The difference between this embodiment and Embodiment 1 is that a step of monitoring the noise of the recording environment during recording to obtain a signal-to-noise ratio is added, so that the number of candidate results can be adjusted according to different signal-to-noise ratios and the user can be prompted in very noisy environments that are not suitable for voice input. This embodiment may specifically include:
Step S21: monitor the noise of the recording environment during recording to obtain a signal-to-noise ratio. Specifically, this step can automatically detect the signal-to-noise ratio of the input speech and feed it back on the interactive interface, so as to prompt the user in very noisy environments that are not suitable for voice input; in the subsequent step S242 the number of candidate results can also be adjusted according to different signal-to-noise ratios. Noise has a great influence on speech recognition: when the noise of the recording environment is strong, the accuracy of speech recognition drops sharply and the number of words the user needs to change increases greatly. A noise monitoring function can therefore be added in this embodiment. Based on the results of endpoint detection, the speech-segment energy and the silence-segment energy corresponding to each recognition result are calculated separately (the silence-segment energy is equivalent to the energy of the noise), so as to estimate the signal-to-noise ratio of that stretch of speech. The degree to which ambient noise pollutes the recording is displayed on an interface with a recording volume bar 71 and a noise ratio bar 72 as shown in Figure 7, and when the ambient noise exceeds a certain threshold, the user can be prompted with "The current noise is too high; keyboard input is recommended";
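For illustration only, the following sketch shows how the signal-to-noise ratio of one recognition result could be estimated from the speech-segment and silence-segment energies produced by endpoint detection, as described in step S21, and how the number of candidates could then be adjusted as in step S242. The 10 dB threshold and the candidate counts of 3 and 8 are assumptions made for the example; the patent only requires a predetermined threshold.

```python
import math

SNR_THRESHOLD_DB = 10.0                  # assumed threshold; the patent only says "predetermined"
FEW_CANDIDATES, MANY_CANDIDATES = 3, 8   # assumed candidate counts for clean vs. noisy speech

def estimate_snr_db(speech_energy: float, silence_energy: float) -> float:
    """Estimate the SNR of one recognition result from endpoint-detection output.

    The silence-segment energy is treated as the noise energy, as in step S21;
    a small floor avoids division by zero and log of zero.
    """
    eps = 1e-12
    return 10.0 * math.log10(max(speech_energy, eps) / max(silence_energy, eps))

def candidate_count(snr_db: float) -> int:
    """Fewer candidates when the speech is clean, more when it is noisy (steps S2421/S2422)."""
    return FEW_CANDIDATES if snr_db > SNR_THRESHOLD_DB else MANY_CANDIDATES

def noise_prompt(snr_db: float) -> str:
    """Prompt shown when ambient noise exceeds the threshold, as on the Figure 7 interface."""
    if snr_db < SNR_THRESHOLD_DB:
        return "The current noise is too high; keyboard input is recommended"
    return ""

if __name__ == "__main__":
    snr = estimate_snr_db(speech_energy=4e-3, silence_energy=1e-3)  # about 6 dB
    print(round(snr, 1), candidate_count(snr), noise_prompt(snr))
```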
Step S22: while recording, continuously segment the input speech into speech fragments and generate the text of each speech fragment. Specifically, a cloud server continuously segments the input speech into speech fragments and generates the text of each speech fragment, the input speech being continuously segmented into speech fragments by a voice activity detection algorithm;
Step S23: display the text of each speech fragment in turn;
Step S24: correct the text of each speech fragment in turn according to the user's selection. This step may specifically include:
Step S241: the user selects, in the text of each speech fragment, the content that needs to be modified;
Step S242: generate candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content, which makes it convenient for the user to quickly select the correct word to correct the content in the text. This step may further include:
Step S2421: when the signal-to-noise ratio is greater than a predetermined threshold, reduce the number of candidate words and candidate syllables. Specifically, a high signal-to-noise ratio indicates that the speech is only slightly polluted by noise and the accuracy of the recognition result is high, so the number of candidate results can be reduced appropriately;
Step S2422: when the signal-to-noise ratio is less than the predetermined threshold, increase the number of candidate words and candidate syllables. Specifically, a low signal-to-noise ratio indicates that the speech is heavily polluted by noise, so the probability of errors in the recognition result increases greatly and the number of candidate results needs to be increased so that the user can select the correct word among them;
Step S243: correct the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user. This step may further include:
Step S2431: when the user selects a candidate word, replace the corresponding word in the content with the selected candidate word;
Step S2432: when the user selects the syllable, generate candidate words corresponding to the syllable, and select the correct candidate word from the candidate words of the syllable to replace the corresponding word in the content;
Step S2433: when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable, and select the correct candidate word from the candidate words of the candidate syllable to replace the corresponding word in the content;
Step S2434: when the generated candidate words and candidate syllables contain no correct result, an input method can be called to modify the text.
This embodiment integrates multiple speech technologies, such as noise monitoring, endpoint detection, and continuous speech recognition, into a single interaction flow, allowing the user to fully experience the convenience of voice input and improving the user experience when voice input is mixed with key-press operations.
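As a purely illustrative sketch of the per-fragment interaction flow of this embodiment (not the patented implementation), the loop below ties together noise monitoring, segmentation, recognition, and segment-by-segment confirmation. The `recognize`, `estimate_snr`, and `confirm` callables are assumed stand-ins for the cloud transcription service, the noise monitoring of step S21, and the user's modify-and-confirm interaction, respectively.

```python
from typing import Callable, Iterable, List

def voice_input_pipeline(
    fragments: Iterable[list],
    recognize: Callable[[list], str],
    estimate_snr: Callable[[list], float],
    confirm: Callable[[str, int], str],
    snr_threshold_db: float = 10.0,       # assumed value for the predetermined threshold
) -> List[str]:
    """Process speech fragments one at a time, in the spirit of Embodiment 2."""
    confirmed: List[str] = []
    for fragment in fragments:
        snr = estimate_snr(fragment)
        # fewer candidates when clean, more when noisy (steps S2421/S2422)
        n_candidates = 3 if snr > snr_threshold_db else 8
        text = recognize(fragment)
        # only the current fragment's text is shown; the user edits and confirms it
        confirmed.append(confirm(text, n_candidates))
    return confirmed

if __name__ == "__main__":
    fake_fragments = [[0.1] * 480, [0.2] * 480]   # invented stand-ins for two speech fragments
    result = voice_input_pipeline(
        fake_fragments,
        recognize=lambda f: f"text for a fragment of {len(f)} samples",
        estimate_snr=lambda f: 15.0,
        confirm=lambda text, n: text,             # auto-confirm in this toy example
    )
    print(result)
```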
Embodiment 3
As shown in Figure 8, the present invention also provides a voice input system, including a segmentation module 41, a correction module 42, and a noise monitoring unit 43.
The segmentation module 41 is used to continuously segment the input speech into speech fragments and generate the text of each speech fragment while recording. Specifically, the segmentation module 41 is located on a cloud server and continuously segments the input speech into speech fragments by a speech endpoint detection algorithm; this module can automatically segment the speech recognition result and return it segment by segment for secondary confirmation by the user.
The correction module 42 is used to display the text of each speech fragment in turn and correct the text of each speech fragment in turn according to the user's selection. Specifically, this module allows the user to modify and confirm the returned text while recording. It should be noted that in the interaction scheme of the present invention, not all text recognition results are displayed at once; only the text recognition result of the current fragment is shown on the display interface, and the next recognition result is displayed only after the user has modified and confirmed the recognition result of that speech fragment. The advantage of this presentation scheme is that a limited number of results are shown in turn on the limited screen, allowing the user to concentrate on the current recognition result and improving the efficiency of modifying the text. The correction module 42 may further include a selection unit 421, a candidate unit 422, and a modification unit 423.
The selection unit 421 is used to obtain the content that the user selects to be modified in the text of each speech fragment.
The candidate unit 422 is used to generate candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content. Specifically, when the user clicks a specific word in the recognition result that needs to be changed, several candidate words corresponding to that word can be popped up, together with the syllable corresponding to the word and several candidate syllables. In this way the speech recognition result can be effectively combined with an input method, providing the user with multiple candidates to choose from; degrading the recognition result from the word to the syllable widens the range of possible hits, so that the user does not need to type a string of letters but can find the desired word among the candidates. In addition, the candidate unit 422 can also be used to reduce the number of candidate words and candidate syllables when the signal-to-noise ratio is greater than a predetermined threshold, since a high signal-to-noise ratio indicates that the speech is only slightly polluted by noise and the accuracy of the recognition result is high, so the number of candidate results can be reduced appropriately; and to increase the number of candidate words and candidate syllables when the signal-to-noise ratio is less than the predetermined threshold, since a low signal-to-noise ratio indicates that the speech is heavily polluted by noise, so the probability of errors in the recognition result increases greatly and the number of candidate results needs to be increased so that the user can select the correct word among them.
The modification unit 423 is used to correct the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user. Specifically, the modification unit 423 is used to: when the user selects a candidate word, replace the corresponding word in the content with the selected candidate word; when the user selects the syllable, generate candidate words corresponding to the syllable, and select the correct candidate word from the candidate words of the syllable to replace the corresponding word in the content; when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable, and select the correct candidate word from the candidate words of the candidate syllable to replace the corresponding word in the content; and when the generated candidate words and candidate syllables contain no correct result, call an input method to modify the text.
The noise monitoring unit 43 is used to monitor the noise of the recording environment during recording to obtain a signal-to-noise ratio, so that the number of candidate results can be adjusted according to different signal-to-noise ratios and the user can be prompted in very noisy environments that are not suitable for voice input.
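Purely as an editorial illustration of how the module structure of Figure 8 might map onto software components (all class and method names below are assumptions, not taken from the patent), the reference numerals could correspond to classes as follows:

```python
class SegmentationModule:          # 41: located on the cloud server
    def segment_and_recognize(self, audio_stream):
        """Cut the input speech at pauses and yield (fragment, text) pairs."""
        raise NotImplementedError

class SelectionUnit:               # 421: obtains the content the user selected for modification
    def selected_content(self, fragment_text):
        raise NotImplementedError

class CandidateUnit:               # 422: candidate words, syllable, and candidate syllables
    def candidates_for(self, word, snr_db):
        raise NotImplementedError

class ModificationUnit:            # 423: applies the user's choice (sub-steps S1331 to S1334)
    def apply(self, fragment_text, choice):
        raise NotImplementedError

class CorrectionModule:            # 42: shows fragments one at a time and corrects them
    def __init__(self):
        self.selection = SelectionUnit()
        self.candidates = CandidateUnit()
        self.modification = ModificationUnit()

class NoiseMonitoringUnit:         # 43: estimates the signal-to-noise ratio during recording
    def snr_db(self, fragment):
        raise NotImplementedError
```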
In the present invention, the input speech is continuously segmented into speech fragments and the text of each speech fragment is generated while recording; the text of each speech fragment is displayed in turn and corrected in turn according to the user's selection. The speech recognition result can thus be automatically segmented and returned segment by segment for secondary confirmation by the user, and the user can modify and confirm the returned text while recording.
In addition, the user selects the content that needs to be modified in the text of each speech fragment; candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content are then generated; and the text in the speech fragment is corrected according to the candidate word, the syllable, and the candidate syllable selected by the user. This makes it convenient for the user to quickly select the correct word to correct the content in the text.
In addition, the noise of the recording environment is monitored during recording to obtain a signal-to-noise ratio; when the signal-to-noise ratio is greater than a predetermined threshold, the number of candidate words and candidate syllables is reduced, and when the signal-to-noise ratio is less than the predetermined threshold, the number of candidate words and candidate syllables is increased, so that the number of candidate results can be adjusted according to different signal-to-noise ratios.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that the embodiments have in common, reference may be made to one another. Since the system disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively simple, and the relevant parts may refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of the present invention.
Obviously, those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the present invention. If these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims (12)

1. A voice input method, characterized by including:
while recording, continuously segmenting the input speech into speech fragments and generating the text of each speech fragment;
and displaying the text of each speech fragment in turn, and correcting the text of each speech fragment in turn according to the user's selection, including:
the user selecting, in the text of each speech fragment, the content that needs to be modified;
generating candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content;
correcting the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user.
2. The voice input method according to claim 1, characterized in that the input speech is continuously segmented into speech fragments and the text of each speech fragment is generated by a cloud server.
3. The voice input method according to claim 1, characterized in that the input speech is continuously segmented into speech fragments by a voice activity detection algorithm.
4. The voice input method according to claim 1, characterized in that the step of correcting the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user includes:
when the user selects a candidate word, replacing the corresponding word in the content with the selected candidate word;
when the user selects the syllable, generating candidate words corresponding to the syllable, and selecting the correct candidate word from the candidate words of the syllable to replace the corresponding word in the content;
when the user selects a candidate syllable, generating candidate words corresponding to the candidate syllable, and selecting the correct candidate word from the candidate words of the candidate syllable to replace the corresponding word in the content;
when the generated candidate words and candidate syllables contain no correct result, calling an input method to modify the text.
5. The voice input method according to claim 4, characterized in that before the step of continuously segmenting the input speech into speech fragments and generating the text of each speech fragment while recording, the method further includes:
monitoring the noise of the recording environment during recording to obtain a signal-to-noise ratio.
6. The voice input method according to claim 5, characterized in that the step of generating candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content includes:
when the signal-to-noise ratio is greater than a predetermined threshold, reducing the number of candidate words and candidate syllables;
when the signal-to-noise ratio is less than the predetermined threshold, increasing the number of candidate words and candidate syllables.
7. A voice input system, characterized by including:
a segmentation module, for continuously segmenting the input speech into speech fragments and generating the text of each speech fragment while recording;
and a correction module, for displaying the text of each speech fragment in turn and correcting the text of each speech fragment in turn according to the user's selection, wherein
the correction module includes:
a selection unit, for obtaining the content that the user selects to be modified in the text of each speech fragment;
a candidate unit, for generating candidate words corresponding to each word in the content, the syllable of each word in the content, and candidate syllables corresponding to each word in the content;
a modification unit, for correcting the text in the speech fragment according to the candidate word, the syllable, and the candidate syllable selected by the user.
8. The voice input system according to claim 7, characterized in that the segmentation module is located on a cloud server.
9. The voice input system according to claim 7, characterized in that the segmentation module continuously segments the input speech into speech fragments by a speech endpoint detection algorithm.
10. The voice input system according to claim 7, characterized in that
the modification unit is configured to: when the user selects a candidate word, replace the corresponding word in the content with the selected candidate word;
when the user selects the syllable, generate candidate words corresponding to the syllable, and select the correct candidate word from the candidate words of the syllable to replace the corresponding word in the content;
when the user selects a candidate syllable, generate candidate words corresponding to the candidate syllable, and select the correct candidate word from the candidate words of the candidate syllable to replace the corresponding word in the content;
and when the generated candidate words and candidate syllables contain no correct result, call an input method to modify the text.
11. The voice input system according to claim 10, characterized by further including a noise monitoring unit, for monitoring the noise of the recording environment during recording to obtain a signal-to-noise ratio.
12. The voice input system according to claim 11, characterized in that
the candidate unit is configured to reduce the number of candidate words and candidate syllables when the signal-to-noise ratio is greater than a predetermined threshold;
and to increase the number of candidate words and candidate syllables when the signal-to-noise ratio is less than the predetermined threshold.
CN201210101302.9A 2012-03-31 2012-03-31 Pronunciation inputting method and system Active CN103366742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210101302.9A CN103366742B (en) 2012-03-31 2012-03-31 Pronunciation inputting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210101302.9A CN103366742B (en) 2012-03-31 2012-03-31 Pronunciation inputting method and system

Publications (2)

Publication Number Publication Date
CN103366742A CN103366742A (en) 2013-10-23
CN103366742B true CN103366742B (en) 2018-07-31

Family

ID=49367943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210101302.9A Active CN103366742B (en) 2012-03-31 2012-03-31 Pronunciation inputting method and system

Country Status (1)

Country Link
CN (1) CN103366742B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103871401B (en) 2012-12-10 2016-12-28 联想(北京)有限公司 A kind of method of speech recognition and electronic equipment
CN103559880B (en) * 2013-11-08 2015-12-30 百度在线网络技术(北京)有限公司 Voice entry system and method
CN105469801B (en) * 2014-09-11 2019-07-12 阿里巴巴集团控股有限公司 A kind of method and device thereof for repairing input voice
CN105206267B (en) * 2015-09-09 2019-04-02 中国科学院计算技术研究所 A kind of the speech recognition errors modification method and system of fusion uncertainty feedback
CN105630959B (en) * 2015-12-24 2020-02-21 联想(北京)有限公司 Text information display method and electronic equipment
CN106331893B (en) * 2016-08-31 2019-09-03 科大讯飞股份有限公司 Real-time caption presentation method and system
CN106603381B (en) * 2016-11-24 2020-06-02 北京小米移动软件有限公司 Method and device for processing chat information
CN107068145B (en) * 2016-12-30 2019-02-15 中南大学 Speech evaluating method and system
CN106710597B (en) * 2017-01-04 2020-12-11 广东小天才科技有限公司 Voice data recording method and device
CN107230478A (en) * 2017-05-03 2017-10-03 上海斐讯数据通信技术有限公司 A kind of voice information processing method and system
CN107679032A (en) * 2017-09-04 2018-02-09 百度在线网络技术(北京)有限公司 Voice changes error correction method and device
CN109471537A (en) * 2017-09-08 2019-03-15 腾讯科技(深圳)有限公司 Pronunciation inputting method, device, computer equipment and storage medium
CN107644646B (en) * 2017-09-27 2021-02-02 北京搜狗科技发展有限公司 Voice processing method and device for voice processing
CN108039173B (en) * 2017-12-20 2021-02-26 深圳安泰创新科技股份有限公司 Voice information input method, mobile terminal, system and readable storage medium
CN108320747A (en) * 2018-02-08 2018-07-24 广东美的厨房电器制造有限公司 Appliances equipment control method, equipment, terminal and computer readable storage medium
CN108737634B (en) * 2018-02-26 2020-03-27 珠海市魅族科技有限公司 Voice input method and device, computer device and computer readable storage medium
CN109739425B (en) * 2018-04-19 2020-02-18 北京字节跳动网络技术有限公司 Virtual keyboard, voice input method and device and electronic equipment
CN108600773B (en) * 2018-04-25 2021-08-10 腾讯科技(深圳)有限公司 Subtitle data pushing method, subtitle display method, device, equipment and medium
CN108632465A (en) * 2018-04-27 2018-10-09 维沃移动通信有限公司 A kind of method and mobile terminal of voice input
CN110347996B (en) * 2019-07-15 2023-06-20 北京百度网讯科技有限公司 Text modification method and device, electronic equipment and storage medium
CN110491370A (en) * 2019-07-15 2019-11-22 北京大米科技有限公司 A kind of voice stream recognition method, device, storage medium and server
CN110600039B (en) * 2019-09-27 2022-05-20 百度在线网络技术(北京)有限公司 Method and device for determining speaker attribute, electronic equipment and readable storage medium
CN111326144B (en) * 2020-02-28 2023-03-03 网易(杭州)网络有限公司 Voice data processing method, device, medium and computing equipment
CN112151072A (en) * 2020-08-21 2020-12-29 北京搜狗科技发展有限公司 Voice processing method, apparatus and medium
CN117251556A (en) * 2023-11-17 2023-12-19 北京遥领医疗科技有限公司 Patient screening system and method in registration queue

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1181574A (en) * 1996-10-31 1998-05-13 微软公司 Method and system for selecting recognized words when correcting recognized speech
CN1901041A (en) * 2005-07-22 2007-01-24 康佳集团股份有限公司 Voice dictionary forming method and voice identifying system and its method
US7310602B2 (en) * 2004-09-27 2007-12-18 Kabushiki Kaisha Equos Research Navigation apparatus
CN101131636A (en) * 2006-08-18 2008-02-27 李颖 On-line voice or Pinyin input method
CN101593076A (en) * 2008-05-28 2009-12-02 Lg电子株式会社 Portable terminal and the method that is used to revise its text
CN102122506A (en) * 2011-03-08 2011-07-13 天脉聚源(北京)传媒科技有限公司 Method for recognizing voice
CN102215233A (en) * 2011-06-07 2011-10-12 盛乐信息技术(上海)有限公司 Information system client and information publishing and acquisition methods
CN102299934A (en) * 2010-06-23 2011-12-28 上海博路信息技术有限公司 Voice input method based on cloud mode and voice recognition
CN102779511A (en) * 2011-05-12 2012-11-14 Nhn株式会社 Speech recognition system and method based on word-level candidate generation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000259178A (en) * 1999-03-08 2000-09-22 Fujitsu Ten Ltd Speech recognition device
JP4509361B2 (en) * 2000-11-16 2010-07-21 株式会社東芝 Speech recognition apparatus, recognition result correction method, and recording medium
JP5364412B2 (en) * 2009-03-26 2013-12-11 富士通テン株式会社 Search device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1181574A (en) * 1996-10-31 1998-05-13 微软公司 Method and system for selecting recognized words when correcting recognized speech
US7310602B2 (en) * 2004-09-27 2007-12-18 Kabushiki Kaisha Equos Research Navigation apparatus
CN1901041A (en) * 2005-07-22 2007-01-24 康佳集团股份有限公司 Voice dictionary forming method and voice identifying system and its method
CN101131636A (en) * 2006-08-18 2008-02-27 李颖 On-line voice or Pinyin input method
CN101593076A (en) * 2008-05-28 2009-12-02 Lg电子株式会社 Portable terminal and the method that is used to revise its text
CN102299934A (en) * 2010-06-23 2011-12-28 上海博路信息技术有限公司 Voice input method based on cloud mode and voice recognition
CN102122506A (en) * 2011-03-08 2011-07-13 天脉聚源(北京)传媒科技有限公司 Method for recognizing voice
CN102779511A (en) * 2011-05-12 2012-11-14 Nhn株式会社 Speech recognition system and method based on word-level candidate generation
CN102215233A (en) * 2011-06-07 2011-10-12 盛乐信息技术(上海)有限公司 Information system client and information publishing and acquisition methods

Also Published As

Publication number Publication date
CN103366742A (en) 2013-10-23

Similar Documents

Publication Publication Date Title
CN103366742B (en) Pronunciation inputting method and system
US11894014B2 (en) Audio-visual speech separation
US11368581B2 (en) Semiautomated relay method and apparatus
US11627221B2 (en) Semiautomated relay method and apparatus
US10748523B2 (en) Semiautomated relay method and apparatus
US10878721B2 (en) Semiautomated relay method and apparatus
US10037313B2 (en) Automatic smoothed captioning of non-speech sounds from audio
US9390725B2 (en) Systems and methods for noise reduction using speech recognition and speech synthesis
US20240127798A1 (en) Training speech recognition systems using word sequences
US20140074467A1 (en) Speaker Separation in Diarization
US20060229873A1 (en) Methods and apparatus for adapting output speech in accordance with context of communication
US10581625B1 (en) Automatically altering the audio of an object during video conferences
JP2018124425A (en) Voice dialog device and voice dialog method
US20230005480A1 (en) Voice Filtering Other Speakers From Calls And Audio Messages
CN109994129B (en) Speech processing system, method and device
JP2011186143A (en) Speech synthesizer, speech synthesis method for learning user's behavior, and program
JP2012163692A (en) Voice signal processing system, voice signal processing method, and voice signal processing method program
Sodoyer et al. A study of lip movements during spontaneous dialog and its application to voice activity detection
US20220059094A1 (en) Transcription of audio
Eyben et al. Audiovisual vocal outburst classification in noisy acoustic conditions
Bleaman et al. Medium-shifting and intraspeaker variation in conversational interviews
JP6260138B2 (en) COMMUNICATION PROCESSING DEVICE, COMMUNICATION PROCESSING METHOD, AND COMMUNICATION PROCESSING PROGRAM
KR101501705B1 (en) Apparatus and method for generating document using speech data and computer-readable recording medium
US20230298612A1 (en) Microphone Array Configuration Invariant, Streaming, Multichannel Neural Enhancement Frontend for Automatic Speech Recognition
CN113763921A (en) Method and apparatus for correcting text

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
ASS Succession or assignment of patent right

Owner name: SHANGHAI GUOKE ELECTRONIC CO., LTD.

Free format text: FORMER OWNER: SHENGYUE INFORMATION TECHNOLOGY (SHANGHAI) CO., LTD.

Effective date: 20140919

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20140919

Address after: 201203, room 1, building 380, 108 Yin Yin Road, Shanghai, Pudong New Area

Applicant after: Shanghai Guoke Electronic Co., Ltd.

Address before: 201203 Shanghai City, Pudong New Area Shanghai City, Guo Shou Jing Road, Zhangjiang hi tech Park No. 356 building 3 Room 102

Applicant before: Shengle Information Technology (Shanghai) Co., Ltd.

EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 "change of name, title or address"

Address after: Room 127, building 3, 356 GuoShouJing Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201204

Patentee after: SHANGHAI GEAK ELECTRONICS Co.,Ltd.

Address before: Room 108, building 1, 380 Yinbei Road, Pudong New Area, Shanghai 201203

Patentee before: Shanghai Nutshell Electronics Co.,Ltd.

CP03 "change of name, title or address"