CN105469801A - Input speech restoring method and device - Google Patents

Input speech restoring method and device

Info

Publication number
CN105469801A
Authority
CN
China
Prior art keywords
voice
fields
repaired
input
voice fields
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410462543.5A
Other languages
Chinese (zh)
Other versions
CN105469801B (en)
Inventor
陈紫微
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201410462543.5A priority Critical patent/CN105469801B/en
Publication of CN105469801A publication Critical patent/CN105469801A/en
Application granted granted Critical
Publication of CN105469801B publication Critical patent/CN105469801B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for restoring input speech. The method comprises: recognizing the speech fields in received input speech according to a preset speech recognition base, and determining whether speech fields to be restored exist in the input speech; if speech fields to be restored exist in the input speech, obtaining corrective speech fields matching the speech fields to be restored from the preset speech recognition base; and replacing the speech fields to be restored in the input speech with the corrective speech fields to obtain the corrected input speech. The method and device restore the input speech and ensure its completeness.

Description

A method and device for repairing input voice
Technical field
The present application relates to the technical field of voice input, and in particular to a method and a device for repairing input voice.
Background
With the development of Internet technology, voice technology has become widely used as a convenient and direct means of communication. For example, a user can carry out instant messaging by voice or publish voice messages (such as voice microblogs). In voice instant messaging, the user inputs voice through a terminal device, and the voice data is transmitted over the Internet to achieve instant communication.
Therefore, in voice instant messaging, the integrity of the voice input by the user through the terminal device strongly affects the quality of the communication. With common voice instant messaging products such as WeChat, Laiwang and Yixin, one or more fields at the end of the voice are easily lost when the user inputs voice through the terminal device. In addition, because of environmental noise, other fields of the voice input through the terminal device are also easily lost. Such missing fields impair the information integrity of the output voice, make the meaning of the whole sentence unclear, and degrade the voice instant messaging. The user usually has to input the voice again, and this repeated operation harms the user experience and costs additional time.
Therefore, how to repair input voice and ensure its integrity has become a technical problem that urgently needs to be solved.
Summary of the invention
In view of this, the present application provides a method and a device for repairing input voice, which repair the input voice and ensure its integrity.
The present application provides a method for repairing input voice, comprising:
identifying the voice fields in the received input voice according to a preset speech recognition library, and determining whether a voice field to be repaired exists in the input voice;
if a voice field to be repaired exists in the input voice, obtaining, from the preset speech recognition library, a correction voice field that matches the voice field to be repaired; and
replacing the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice.
The present application also provides a device for repairing input voice, comprising:
a retrieval module, configured to identify the voice fields in the received input voice according to a preset speech recognition library and to determine whether a voice field to be repaired exists in the input voice;
a repair module, configured to obtain, from the preset speech recognition library, a correction voice field that matches the voice field to be repaired when a voice field to be repaired exists in the input voice; and
a replacement module, configured to replace the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice.
As can be seen from the above technical solution, the present application identifies the voice fields in the received input voice to determine whether a voice field to be repaired exists in the input voice. According to the speech recognition library, it obtains the correction voice field that matches the voice field to be repaired. The voice field to be repaired in the input voice is then replaced with the correction voice field to obtain the repaired input voice. The present application thus identifies and corrects the input voice, ensures its integrity, and improves the user experience.
Brief description of the drawings
Fig. 1 is a schematic diagram of the communication between the server and the terminal device of the present application;
Fig. 2 is a flow chart of the method for repairing input voice of the present application;
Fig. 3 is a structural diagram of the device for repairing input voice of the present application;
Fig. 4 is a structural diagram of an embodiment of the present application;
Fig. 5 is a structural diagram of the replacement module in the device for repairing input voice of the present application;
Fig. 6 is a schematic diagram of the user interface of the terminal device of the present application.
Detailed description
The present application identifies the voice fields in the received input voice to determine whether a voice field to be repaired exists in the input voice. According to the speech recognition library, it obtains the correction voice field that matches the voice field to be repaired. The voice field to be repaired in the input voice is then replaced with the correction voice field to obtain the repaired input voice. The present application thus identifies and corrects the input voice, ensures its integrity, and improves the user experience.
Specific implementations of the present application are further described below with reference to the accompanying drawings.
Referring to Fig. 1, the present application provides a method for repairing input voice, which is applied to a server 11 that performs audio parsing. The user inputs voice through a terminal device 12, and the terminal device 12 is connected to the server 11 through a network (which may be wired, wireless, or a combination of the two). The terminal device 12 is typically a mobile phone, a tablet computer, a smart wearable device, a PC, or the like. The terminal device 12 sends the user's input voice to the server 11 through the network, and the server 11 executes the method for repairing input voice provided by the present application to repair the user's input voice. The server 11 sends the repaired input voice to the terminal device 12, and the user selects, through the terminal device 12, whether to send the original input voice or the repaired input voice for communication. In other embodiments, the server 11 may also send the repaired input voice directly to other terminal devices.
Specifically, the terminal device 12 may use application (APP) software to send the user's input voice to the server 11 and to receive the repaired input voice sent by the server 11. Through the interface provided by the APP software, the user selects whether to send the original input voice or the repaired input voice for communication.
Referring to Fig. 2, the method 2 of the present application comprises:
S1: Identify the voice fields in the received input voice according to a preset speech recognition library, and determine whether a voice field to be repaired exists in the input voice.
In one specific implementation of the present application, the server 11 receives the input voice sent by the user through the terminal device 12. Before identifying the voice fields in the received input voice, the server 11 first performs front-end processing on the original input voice to partially eliminate the influence of noise and of different speakers, so that the processed signal better reflects the essential characteristics of the speech. The most commonly used front-end processing steps are end-point detection and speech enhancement. End-point detection distinguishes the speech periods from the non-speech periods in the voice signal and accurately determines the starting point of the speech. After end-point detection, subsequent processing needs to be applied only to the speech signal. Speech enhancement eliminates the influence of environmental noise on the speech.
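The end-point detection step described above can be illustrated with a very simple energy-based detector. This is only a sketch under stated assumptions: the patent does not specify a detection algorithm, and the function name `detect_endpoints`, the frame length, and the energy threshold are all hypothetical choices made for illustration.

```python
def detect_endpoints(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of the speech region, or None.

    Assumption: a frame counts as speech when its mean energy exceeds
    `threshold`. Real systems use more robust features than raw energy.
    """
    energies = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(x * x for x in frame) / len(frame))
    # Indices of frames classified as speech
    active = [k for k, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    start = active[0] * frame_len
    end = min((active[-1] + 1) * frame_len, len(samples))
    return start, end


# Silence, a burst of "speech", then silence again (synthetic samples)
signal = [0.0] * 8 + [0.9, -0.8, 0.7, -0.9] * 3 + [0.0] * 8
span = detect_endpoints(signal)
speech_only = signal[span[0]:span[1]] if span else []
```

Subsequent processing would then operate only on `speech_only`, which matches the statement that later steps need to handle only the speech signal.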
After front-end processing of the input voice, each voice field in the processed input voice is identified according to the preset speech recognition library to determine whether a voice field to be repaired exists in the input voice. Because one or more fields are easily lost at the end of the input voice, the present application may also identify only the voice fields at the end of the input voice to determine whether a voice field to be repaired exists.
In another specific implementation of the present application, after front-end processing, the input voice is split into at least one sentence according to preset splitting rules. Specifically, the splitting rules include at least one of speech rate, interval, and key voice fields. For example, when a key voice field such as "but", "then" or "and" occurs in a segment of input voice, the segment is split at that field. Alternatively, when an interval longer than the interval threshold occurs in a segment of input voice, the segment is split at that interval. Alternatively, when clearly different speech rates occur in a segment of input voice, the segment is split accordingly. Of course, any two or all three of the above splitting rules may also be applied together.
For example, the user's input voice is "I first go shopping today // then go to the park // finally went to have a meal // very tired // get along well you * #", where // represents an interval of 2 ms and the interval threshold is 1 ms. The preset splitting rules are interval and key voice fields. Splitting yields 5 sentences: "I first go shopping today", "then go to the park", "finally went to have a meal", "very tired", and "get along well you * #", where * and # are fuzzy pronunciations.
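The interval rule and the key-voice-field rule can be sketched as follows. This is a minimal illustration, assuming that each recognized field comes with the length of the pause that precedes it; the names `split_sentences` and `KEY_FIELDS`, the `(text, gap_before_ms)` representation, and the English tokens are hypothetical and not taken from the patent.

```python
# Hypothetical set of key voice fields that start a new sentence
KEY_FIELDS = {"but", "then", "and", "finally"}


def split_sentences(fields, gap_threshold_ms=1.0, key_fields=KEY_FIELDS):
    """Split a stream of (text, gap_before_ms) pairs into sentences.

    A new sentence starts when the pause before a field exceeds the
    interval threshold, or when the field is a key voice field.
    """
    sentences, current = [], []
    for text, gap_before_ms in fields:
        boundary = gap_before_ms > gap_threshold_ms or text in key_fields
        if boundary and current:
            sentences.append(current)
            current = []
        current.append(text)
    if current:
        sentences.append(current)
    return sentences


stream = [("I", 0), ("first", 0), ("go", 0), ("shopping", 0), ("today", 0),
          ("then", 2), ("go", 0), ("to", 0), ("the", 0), ("park", 0),
          ("very", 2), ("tired", 0)]
result = split_sentences(stream)
```

A speech-rate rule could be added in the same way by extending the `boundary` condition, provided a per-field rate estimate is available.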
After the input voice is split, field cutting is performed on each resulting sentence, and each sentence is cut into multiple voice fields. Specifically, a voice field is a character, a word, or a morpheme. For example, "get along well you * #" is cut into "no", "with", "you", "*", "#".
The present application splits the input voice and performs speech recognition on each split part separately. This greatly reduces the computation required by the speech recognition algorithm, so that the server 11 uses less memory and fewer CPU resources.
In addition, the speech recognition library stored in advance in the server 11 can yield a correction voice field for a voice field to be repaired through the repair model of a recognition algorithm. The speech recognition library uses the stored voice fields to be repaired as a search index. If a queried voice field hits a voice field stored in the search index, the queried voice field is a voice field to be repaired. The speech recognition library is built by manual collection or machine collection according to the pronunciation characteristics of different languages, for example the MIT Media Lab Speech Dataset, Pitch and Voicing Estimates for Aurora 2, Congressional speech data, Mandarin Speech Frame Data, and so on.
It should be noted that the voice fields to be repaired and the correction voice fields in the speech recognition library are not in a one-to-one relationship. Depending on the recognition algorithm, different voice fields to be repaired may correspond to the same correction voice field, and the same voice field to be repaired may correspond to several different correction voice fields. Speech recognition usually applies several recognition algorithms together to improve recognition accuracy. Common speech recognition algorithms include the HMM speech recognition model and algorithm, the BMM speech recognition model and algorithm, and so on.
For example, the speech recognition library stores the voice field to be repaired "*", for which a recognition algorithm can obtain the corresponding correction voice field "say", and the voice field to be repaired "#", for which a recognition algorithm can obtain the corresponding correction voice field "". Of course, for the voice field to be repaired "*", another recognition algorithm may obtain a different correction voice field "institute", and for the voice field to be repaired "#", another recognition algorithm may obtain a different correction voice field "". The speech recognition library uses the voice fields to be repaired "*" and "#" as the search index.
After "get along well you * #" is cut into "no", "with", "you", "*", "#", the fields are identified according to the speech recognition library. Among the queried voice fields "no", "with", "you", "*", "#", the fields "*" and "#" hit the voice fields "*" and "#" in the search index, which shows that the queried voice fields "*" and "#" are voice fields to be repaired.
S2: If a voice field to be repaired exists in the input voice, obtain, from the preset speech recognition library, the correction voice field that matches the voice field to be repaired.
In one specific implementation of the present application, the search index of the speech recognition library is queried. If a voice field of the input voice hits the search index, that voice field is a voice field to be repaired. Different recognition algorithms are then used to identify the voice field to be repaired against the speech recognition library and to obtain its corresponding correction voice field.
For example, among the queried voice fields "no", "with", "you", "*", "#" of the input voice "get along well you * #", the voice fields "*" and "#" are to be repaired. The correction voice fields "say" and "", which correspond to the voice fields to be repaired "*" and "#" respectively, are then identified from the speech recognition library.
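The index lookup of S1 and the retrieval of correction voice fields in S2 can be sketched with a plain dictionary. This is only an illustration under stated assumptions: the dictionary layout, the recognizer names, the placeholder corrections `fix_1` and `fix_2`, and the function names are hypothetical; the patent does not describe how the library is stored.

```python
# Hypothetical library: field to be repaired -> {recognizer name: correction}.
# The keys of the dictionary play the role of the search index.
LIBRARY = {
    "*": {"recognizer_a": "say", "recognizer_b": "institute"},
    "#": {"recognizer_a": "fix_1", "recognizer_b": "fix_2"},
}


def find_fields_to_repair(fields, library, tail_only=0):
    """Return positions of fields that hit the search index.

    If tail_only > 0, only the last `tail_only` fields are checked,
    mirroring the option of inspecting only the end of the input voice.
    """
    start = max(len(fields) - tail_only, 0) if tail_only else 0
    return [i for i in range(start, len(fields)) if fields[i] in library]


def corrections_for(field, library):
    """Return the distinct corrections proposed by the recognizers, in order."""
    seen = []
    for correction in library.get(field, {}).values():
        if correction not in seen:
            seen.append(correction)
    return seen


fields = ["no", "with", "you", "*", "#"]
hits = find_fields_to_repair(fields, LIBRARY)
proposals = {fields[i]: corrections_for(fields[i], LIBRARY) for i in hits}
```

Because one field may map to several corrections, `proposals` holds a list per field, which reflects the non-one-to-one relationship described in the text.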
It may happen that none of the speech recognition algorithms can obtain a correction voice field for the voice field to be repaired from the preset speech recognition library. In that case, although a voice field to be repaired exists in the input voice, no matching correction voice field can be obtained from the speech recognition library, and the repair of the input voice may be abandoned.
In another specific implementation of the present application, none of the speech recognition algorithms can obtain a correction voice field for the voice field to be repaired from the preset speech recognition library. In that case, although a voice field to be repaired exists in the input voice and no matching correction voice field can be obtained with the speech recognition algorithms, a voice field is selected as the correction voice field according to a preset fuzzy-sound best-word filling algorithm. The fuzzy-sound best-word filling algorithm uses an existing fuzzy control table lookup or a fuzzy calculation formula. Its principle is to find, in the speech recognition library, a voice field that is close to the queried voice field to be repaired, and to use the correction voice field of that close voice field as the correction voice field of the queried voice field to be repaired.
For example, none of the speech recognition algorithms can obtain correction voice fields for the voice fields to be repaired "*" and "#" from the preset speech recognition library. Although the voice fields to be repaired "*" and "#" exist in the input voice, no matching correction voice fields can be obtained with the speech recognition algorithms. The fuzzy-sound best-word filling algorithm is then used to find, in the speech recognition library, the voice fields "*'" and "#'" that are close to the queried voice fields to be repaired "*" and "#", and their correction voice fields are used as the correction voice fields of the queried voice fields. The speech recognition library stores "say" and "" as the correction voice fields of "*'" and "#'", so "say" and "" are selected as the correction voice fields of the voice fields to be repaired "*" and "#".
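The fallback principle above, namely reusing the correction of the closest stored voice field, can be sketched as a nearest-neighbour search. Note that this swaps in a plain Euclidean distance over made-up feature vectors; it is not the fuzzy control table or fuzzy calculation formula mentioned in the text, whose details the patent does not give. The feature vectors, the `max_distance` cut-off, and all names are hypothetical.

```python
import math

# Hypothetical library: stored field -> (acoustic feature vector, correction)
LIBRARY = {
    "*'": ((0.9, 0.1), "say"),
    "#'": ((0.1, 0.8), "fix_1"),
}


def closest_correction(features, library, max_distance=0.5):
    """Return the correction of the stored field nearest to `features`.

    Returns None when nothing is close enough, in which case the repair
    of this field would be abandoned.
    """
    best_key, best_dist = None, float("inf")
    for key, (stored, _correction) in library.items():
        dist = math.dist(features, stored)  # Euclidean distance
        if dist < best_dist:
            best_key, best_dist = key, dist
    if best_key is None or best_dist > max_distance:
        return None
    return library[best_key][1]


star_fix = closest_correction((0.85, 0.15), LIBRARY)   # near "*'"
hash_fix = closest_correction((0.15, 0.75), LIBRARY)   # near "#'"
no_fix = closest_correction((5.0, 5.0), LIBRARY)       # far from everything
```

The distance cut-off is a design choice: without it, an unrelated sound would still receive the correction of whichever stored field happens to be least far away.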
In yet another specific implementation of the present application, the speech recognition algorithms obtain at least two correction voice fields for the voice field to be repaired from the preset speech recognition library.
For example, the voice fields "*" and "#" in the input voice "get along well you * #" are to be repaired. Different speech recognition algorithms then obtain, from the preset speech recognition library, the correction voice fields "say" and "" for the voice fields to be repaired "*" and "#", as well as the alternative correction voice fields "institute" and "" for the same voice fields "*" and "#".
S3: Replace the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice.
In one specific implementation of the present application, if the speech recognition algorithms obtain exactly one correction voice field for the voice field to be repaired from the preset speech recognition library, the voice field to be repaired in the input voice is replaced with that correction voice field to obtain the repaired input voice.
For example, in the input voice "get along well you * #", the correction voice fields of the voice fields to be repaired "*" and "#" are "say" and "". The voice fields to be repaired "*" and "#" are replaced with the corresponding correction voice fields "say" and "". The input voice "get along well you * #" is thereby repaired into "getting along well, you have said".
In another specific implementation of the present application, if the speech recognition algorithms obtain at least two correction voice fields for the voice field to be repaired from the preset speech recognition library, the voice field to be repaired in the input voice is replaced with each correction voice field in turn, which yields several repaired input voices. A statement smoothness assessment is performed on each repaired input voice, and the final repaired input voice is determined according to the result of the assessment.
The statement smoothness assessment, that is, an evaluation of how fluent a sentence is, uses rules preset according to the language features of voice input, such as sentence-final word features, adversative features, and conjunction features.
For example, in the input voice "get along well you * #", the correction voice fields of the voice fields to be repaired "*" and "#" are "say" and "", or "institute" and "". The voice fields to be repaired "*" and "#" are replaced with the corresponding correction voice fields "say" and "", or "institute" and "". The input voice "get along well you * #" is thereby repaired into "getting along well, you have said" and "get along well you place". The statement smoothness assessment is performed on both repaired sentences, and "getting along well, you have said" is obtained as the final repaired input voice.
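Generating several repaired candidates and ranking them can be sketched as follows. This is a hedged illustration: the scoring rules (`ending_words`, `common_pairs`) are invented stand-ins for the preset language-feature rules, the placeholder corrections `end_a` and `end_b` are hypothetical, and enumerating every combination with `itertools.product` is an assumption, since the patent pairs the corrections per recognition algorithm.

```python
from itertools import product


def build_candidates(fields, corrections):
    """Replace every field to be repaired with each of its corrections."""
    slots = [i for i, f in enumerate(fields) if f in corrections]
    options = [corrections[fields[i]] for i in slots]
    candidates = []
    for combo in product(*options):
        repaired = list(fields)
        for i, replacement in zip(slots, combo):
            repaired[i] = replacement
        candidates.append(repaired)
    return candidates


def smoothness_score(fields, ending_words, common_pairs):
    """Toy rule-based score: sentence-final word plus known adjacent pairs."""
    score = 1 if fields and fields[-1] in ending_words else 0
    score += sum(1 for pair in zip(fields, fields[1:]) if pair in common_pairs)
    return score


def best_repairs(fields, corrections, ending_words, common_pairs, top_n=1):
    """Return the top_n candidates ranked by descending smoothness score."""
    ranked = sorted(build_candidates(fields, corrections),
                    key=lambda c: smoothness_score(c, ending_words, common_pairs),
                    reverse=True)
    return ranked[:top_n]


fields = ["no", "with", "you", "*", "#"]
corrections = {"*": ["say", "institute"], "#": ["end_a", "end_b"]}
best = best_repairs(fields, corrections,
                    ending_words={"end_a"},
                    common_pairs={("you", "say"), ("say", "end_a")})
```

Passing a larger `top_n` (for example 3) returns several highly ranked repairs, which would let the user choose among them on the terminal device.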
The server 11 sends the final repaired input voice to the terminal device 12, and the user selects, through the terminal device 12, whether to send the original input voice or the repaired input voice for communication. Of course, the server 11 may also select, from all repaired input voices, several (for example, three) that rank highest in the statement smoothness assessment and send them to the terminal device 12, and the user then selects whether to send the original input voice or any one of the repaired input voices for communication. Specifically, the server 11 also sends the repaired voice fields to the user as a reference for choosing between the original input voice and the repaired input voice.
The present application identifies the voice fields in the received input voice to determine whether a voice field to be repaired exists in the input voice, and obtains the correction voice field that matches the voice field to be repaired. The voice field to be repaired in the input voice is then replaced with the correction voice field to obtain the repaired input voice. The present application thus identifies and corrects the input voice, ensures its integrity, and improves the user experience.
Corresponding to the above method, the present application also provides a device for repairing input voice, which is applied to the server 11 that performs audio parsing. The server 11 generally includes a CPU, an input/output module, a memory, and other hardware modules. Referring to Fig. 3, the device 3 of the present application logically comprises:
a retrieval module 31, configured to identify the voice fields in the received input voice according to a preset speech recognition library and to determine whether a voice field to be repaired exists in the input voice;
a repair module 32, configured to obtain, from the preset speech recognition library, a correction voice field that matches the voice field to be repaired when a voice field to be repaired exists in the input voice; and
a replacement module 33, configured to replace the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice.
The retrieval module 31 determines whether a voice field to be repaired exists in the input voice, the repair module 32 obtains the correction voice field that matches the voice field to be repaired, and the replacement module 33 replaces the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice. The present application thus identifies and corrects the input voice, ensures its integrity, and improves the user experience.
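The logical division into three modules can be sketched as three small classes composed by a device object. This is a minimal sketch, assuming a dictionary-backed library that maps each voice field to be repaired to a single correction (or to `None` when no correction can be obtained); the class names, method names, and the placeholder correction `end_a` are hypothetical.

```python
class RetrievalModule:
    """Finds positions of fields that hit the search index."""

    def __init__(self, index):
        self.index = set(index)

    def find(self, fields):
        return [i for i, f in enumerate(fields) if f in self.index]


class RepairModule:
    """Looks up the correction for a field to be repaired."""

    def __init__(self, corrections):
        self.corrections = dict(corrections)

    def correct(self, field):
        return self.corrections.get(field)


class ReplacementModule:
    """Substitutes corrections into a copy of the input fields."""

    def replace(self, fields, fixes):
        repaired = list(fields)
        for i, fix in fixes.items():
            repaired[i] = fix
        return repaired


class RepairDevice:
    def __init__(self, corrections):
        self.retrieval = RetrievalModule(corrections)  # dict keys form the index
        self.repair = RepairModule(corrections)
        self.replacement = ReplacementModule()

    def process(self, fields):
        fixes = {}
        for i in self.retrieval.find(fields):
            fix = self.repair.correct(fields[i])
            if fix is None:
                return list(fields)  # no correction available: abandon repair
            fixes[i] = fix
        return self.replacement.replace(fields, fixes)


device = RepairDevice({"*": "say", "#": "end_a"})
repaired = device.process(["no", "with", "you", "*", "#"])
```

Keeping the modules separate means the lookup strategy in `RepairModule` could be swapped (for example for a multi-recognizer or fuzzy variant) without touching retrieval or replacement.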
To obtain a better speech recognition result, referring to Fig. 4, the device of the present application further comprises a processing module 34, configured to perform front-end processing on the original input voice before the voice fields in the received input voice are identified, so as to partially eliminate the influence of noise and of different speakers and make the processed signal better reflect the essential characteristics of the speech. The most commonly used front-end processing steps are end-point detection and speech enhancement. End-point detection distinguishes the speech periods from the non-speech periods in the voice signal and accurately determines the starting point of the speech. After end-point detection, subsequent processing needs to be applied only to the speech signal. Speech enhancement eliminates the influence of environmental noise on the speech.
After front-end processing of the input voice, the retrieval module 31 identifies each voice field in the processed input voice according to the preset speech recognition library to determine whether a voice field to be repaired exists in the input voice. Because one or more fields are easily lost at the end of the input voice, the retrieval module 31 may also identify only the voice fields at the end of the input voice to determine whether a voice field to be repaired exists.
The retrieval module 31 splits the input voice after front-end processing into at least one sentence according to preset splitting rules. Specifically, the splitting rules include at least one of speech rate, interval, and key voice fields. For example, when a key voice field such as "but", "then" or "and" occurs in a segment of input voice, the segment is split at that field. Alternatively, when an interval longer than the interval threshold occurs in a segment of input voice, the segment is split at that interval. Alternatively, when clearly different speech rates occur in a segment of input voice, the segment is split accordingly. Of course, any two or all three of the above splitting rules may also be applied together.
After the retrieval module 31 splits the input voice, it performs field cutting on each resulting sentence and cuts each sentence into multiple voice fields. Specifically, a voice field is a character, a word, or a morpheme.
For example, the user's input voice is "I first go shopping today // then go to the park // finally went to have a meal // very tired // get along well you * #", where // represents an interval of 2 ms and the interval threshold is 1 ms. The preset splitting rules are interval and key voice fields. After the retrieval module 31 splits the input voice, 5 sentences are obtained: "I first go shopping today", "then go to the park", "finally went to have a meal", "very tired", and "get along well you * #", where * and # are fuzzy pronunciations. The retrieval module 31 performs field cutting on each of these 5 sentences; for example, "get along well you * #" is cut into "no", "with", "you", "*", "#".
The retrieval module 31 splits the input voice and performs speech recognition on each split part separately. This greatly reduces the computation required by the speech recognition algorithm, so that the server 11 uses less memory and fewer CPU resources.
In addition, the speech recognition library stored in advance in the server 11 can yield a correction voice field for a voice field to be repaired through the repair model of a recognition algorithm. The speech recognition library uses the stored voice fields to be repaired as a search index. If a queried voice field hits a voice field stored in the search index, the queried voice field is a voice field to be repaired. The speech recognition library is built by manual collection or machine collection according to the pronunciation characteristics of different languages, for example the MIT Media Lab Speech Dataset, Pitch and Voicing Estimates for Aurora 2, Congressional speech data, Mandarin Speech Frame Data, and so on.
It should be noted that the voice fields to be repaired and the correction voice fields in the speech recognition library are not in a one-to-one relationship. Depending on the recognition algorithm, different voice fields to be repaired may correspond to the same correction voice field, and the same voice field to be repaired may correspond to several different correction voice fields. Speech recognition usually applies several recognition algorithms together to improve recognition accuracy. Common speech recognition algorithms include the HMM speech recognition model and algorithm, the BMM speech recognition model and algorithm, and so on.
For example, the speech recognition library stores the voice field to be repaired "*", for which a recognition algorithm can obtain the corresponding correction voice field "say", and the voice field to be repaired "#", for which a recognition algorithm can obtain the corresponding correction voice field "". Of course, for the voice field to be repaired "*", another recognition algorithm may obtain a different correction voice field "institute", and for the voice field to be repaired "#", another recognition algorithm may obtain a different correction voice field "". The speech recognition library uses the voice fields to be repaired "*" and "#" as the search index.
In the application one specific implementation, if described retrieval module 31 determines to there is voice fields to be repaired, described reparation module 32 uses different recognizers to identify described voice fields to be repaired from described speech recognition library, obtains the correction voice fields that this voice fields to be repaired is corresponding.This correction voice fields is replaced the voice fields to be repaired in described input voice by described replacement module 33, thus obtains the input voice after repairing.
Such as, the voice fields " no " of inquiry in input voice " get along well you * # ", " with ", there is voice fields " * ", " # " to be repaired in " you ", " * ", " # ", then from described speech recognition library, identifies that the correction voice fields corresponding respectively with voice fields to be repaired " * ", " # " " is said ", " ".Voice fields " * " to be repaired, " # " replaced to corresponding correction voice fields " is said ", " ".Input voice " get along well you * # " reparation becomes " getting along well, you have said ".
As described in reparation module 32 from preset speech recognition library adopt various speech recognition algorithm all cannot obtain correction voice fields for this voice fields to be repaired.Like this, although input voice exist voice fields to be repaired, when speech recognition algorithm cannot be utilized to obtain according to speech recognition library the correction voice fields matched with described voice fields to be repaired, can abandon repairing described input voice.
In another specific implementation of the application, the retrieval module 31 determines that a voice field to be repaired exists, but the repair module 32 cannot obtain a correction voice field for it from the preset speech recognition library with any of the speech recognition algorithms. In that case, a voice field is selected as the correction voice field according to a preset fuzzy-sound preferred-word filling algorithm. This algorithm uses an existing fuzzy control table lookup or a fuzzy calculation formula. Its principle is to find, in the speech recognition library, a voice field close to the queried voice field to be repaired, and to use the correction voice field stored for that close voice field as the correction voice field for the queried one.
For example, none of the speech recognition algorithms obtains a correction voice field for the voice fields to be repaired "*" and "#" from the preset speech recognition library. The fuzzy-sound preferred-word filling algorithm is then used to find, in the speech recognition library, the voice fields "*'" and "#'" that are close to the queried voice fields "*" and "#". The library stores the correction voice fields "said" and "" for "*'" and "#'", so "said" and "" are selected as the correction voice fields for "*" and "#".
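The closest-entry fallback can be sketched as follows. The application names a fuzzy control table or a fuzzy calculation formula without giving either, so this sketch swaps in string similarity from Python's standard `difflib` module as a stand-in. The phonetic keys `shuo` and `le`, the placeholder `CORR`, and the 0.6 cutoff are all assumptions.

```python
import difflib

# Hypothetical library keyed by an assumed phonetic spelling of each field.
LIBRARY = {"shuo": "said", "le": "CORR"}  # "CORR" is a placeholder value


def fuzzy_correction(unclear, cutoff=0.6):
    """Return the correction stored for the library key closest to `unclear`.

    Returns None when no key is similar enough, in which case the caller
    would abandon the repair.
    """
    if unclear in LIBRARY:  # exact hit, no fallback needed
        return LIBRARY[unclear]
    close = difflib.get_close_matches(unclear, list(LIBRARY), n=1, cutoff=cutoff)
    return LIBRARY[close[0]] if close else None
```

With these assumed keys, the unclear spelling `suo` resolves to the entry stored under `shuo`, which plays the role of the close voice field "*'" in the example above.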
In yet another specific implementation of the application, if the retrieval module 31 determines that a voice field to be repaired exists, the repair module 32 applies the various speech recognition algorithms and obtains at least two correction voice fields for that voice field from the preset speech recognition library.
Referring to Fig. 5, the replacement module 33 comprises a repair replacement unit 331 and a fluency assessment unit 332. The repair replacement unit 331 substitutes each correction voice field in turn for the voice field to be repaired in the input voice, obtaining multiple repaired input voices. The fluency assessment unit 332 performs a sentence fluency assessment on each repaired input voice and determines the final repaired input voice from the results of that assessment.
The sentence fluency assessment applies rules preset according to the language features of voice input, such as sentence-final word features, adversative features, and conjunction features.
For example, the input voice "get along well you * #" contains the voice fields to be repaired "*" and "#". Different speech recognition algorithms are applied to the preset speech recognition library, yielding the correction voice fields "said" and "" for "*" and "#", and alternatively the correction voice fields "institute" and "". Replacing "*" and "#" with each set repairs the input voice into "getting along well, you have said" and "get along well you place". Sentence fluency assessment of these two results selects "getting along well, you have said" as the final repaired input voice.
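The candidate replacement and fluency ranking described above can be sketched as follows. The two toy rules, the sets `SENTENCE_FINAL_WORDS` and `FLUENT_PAIRS`, and the placeholders `PARTICLE_A` and `PARTICLE_B` are assumptions for illustration. The application only states that the rules reflect language features such as sentence-final words, adversatives, and conjunctions.

```python
# Toy rule data, assumed for illustration only.
SENTENCE_FINAL_WORDS = {"PARTICLE_A"}
FLUENT_PAIRS = {("you", "said")}


def apply_candidates(fields, candidate_sets):
    """Build one repaired field list per candidate set of corrections."""
    return [[cand.get(f, f) for f in fields] for cand in candidate_sets]


def ends_with_final_word(fields):
    """Rule: the last field is an acceptable sentence-final word."""
    return bool(fields) and fields[-1] in SENTENCE_FINAL_WORDS


def has_fluent_pair(fields):
    """Rule: at least one adjacent pair of fields is known to read well."""
    return any(pair in FLUENT_PAIRS for pair in zip(fields, fields[1:]))


RULES = [ends_with_final_word, has_fluent_pair]


def fluency(fields):
    """Score a repaired sentence as the number of rules it satisfies."""
    return sum(rule(fields) for rule in RULES)


def rank_repairs(fields, candidate_sets):
    """Return all repaired variants, most fluent first."""
    return sorted(apply_candidates(fields, candidate_sets), key=fluency, reverse=True)
```

The first element of the ranked list plays the role of the final repaired input voice. Keeping the full ranking also makes it straightforward to offer several of the best variants instead of only one.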
The server 11 sends the final repaired input voice to the terminal device 12, and the terminal device 12 chooses whether to send the original input voice or the repaired input voice for communication. The server 11 may also select, from all repaired input voices, the several (for example, three) that rank highest in the sentence fluency assessment and send them to the terminal device 12, so that the user chooses to send either the original input voice or any one of the repaired input voices. Specifically, the server 11 also sends the repaired voice fields to the user for reference when making this choice.
The application identifies the voice fields in the received input voice to determine whether it contains a voice field to be repaired, obtains the correction voice field matching that voice field, and substitutes it for the voice field to be repaired in the input voice to obtain the repaired input voice. The application thus identifies and repairs the input voice, which ensures the integrity of the input voice and improves the user experience.
The principle of the application is further illustrated below with a specific implementation.
The user inputs the voice "I first go shopping // get along well you * #" through the terminal device 12. Endpoint detection and speech enhancement are first performed on the input voice. The processed input voice is then split into two sentences, "I first go shopping" and "get along well you * #", where * and # are fuzzy pronunciations. Each sentence is segmented into fields: "I first go shopping" into "I", "first", "go", "shopping", and "get along well you * #" into "no", "with", "you", "*", "#". Each of these voice fields is then identified against the preset speech recognition library of the application. The fields "*" and "#" hit the retrieval index of the speech recognition library and are confirmed as voice fields to be repaired.
Once the voice fields to be repaired "*" and "#" are determined to exist, different speech recognition algorithms are applied to the preset speech recognition library, yielding the correction voice fields "said" and "" for "*" and "#", and alternatively the correction voice fields "institute" and "". Replacing "*" and "#" with each set repairs the input voice "get along well you * #" into "getting along well, you have said" and "get along well you place". Sentence fluency assessment of these two results selects "getting along well, you have said" as the final repaired input voice. The server 11 sends the final repaired input voice to the terminal device 12. Referring to Fig. 6, the terminal device 12 receives the repaired input voice together with the repaired voice fields, and the user, with reference to the repaired voice fields, chooses to send either the original input voice or the repaired input voice for communication.
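The front end of the walkthrough above (sentence splitting, field segmentation, and index lookup) can be sketched on a text transcript. This is a simplification under stated assumptions: the application works on audio and splits by speech rate, interval, or key voice fields, whereas this sketch treats "//" as the sentence boundary and whitespace as the field boundary. `REPAIR_INDEX` is a hypothetical stand-in for the library index.

```python
# Hypothetical retrieval index: the voice fields flagged as to-be-repaired.
REPAIR_INDEX = {"*", "#"}


def split_sentences(transcript):
    """Split a transcript into sentences at the assumed '//' boundary marker."""
    return [part.strip() for part in transcript.split("//") if part.strip()]


def segment(sentence):
    """Cut one sentence into fields, assuming whitespace-delimited fields."""
    return sentence.split()


def find_fields_to_repair(fields):
    """Return the fields that hit the retrieval index, in order."""
    return [field for field in fields if field in REPAIR_INDEX]


def analyse(transcript):
    """Return (fields, fields_to_repair) for every sentence in the transcript."""
    report = []
    for sentence in split_sentences(transcript):
        fields = segment(sentence)
        report.append((fields, find_fields_to_repair(fields)))
    return report
```

Calling `analyse("I first go shopping // no with you * #")` reports no fields to repair in the first sentence and the fields "*" and "#" in the second, matching the walkthrough.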
The application thus identifies and repairs the input voice, which ensures the integrity of the input voice and improves the user experience.
The foregoing is merely a preferred embodiment of the application and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the application shall fall within its scope of protection.

Claims (10)

1. A method for repairing input voice, characterized by comprising:
identifying the voice fields in the received input voice according to a preset speech recognition library, and determining whether a voice field to be repaired exists in the input voice;
if a voice field to be repaired exists in the input voice, obtaining from the preset speech recognition library a correction voice field matching the voice field to be repaired; and
replacing the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice.
2. The method according to claim 1, characterized in that identifying the voice fields in the received input voice according to the preset speech recognition library comprises:
splitting the received input voice into at least one sentence according to a preset splitting rule;
identifying the voice fields in each sentence according to the preset speech recognition library.
3. The method according to claim 2, characterized in that the splitting rule comprises at least one of speech rate, interval, and key voice fields.
4. The method according to claim 3, characterized in that, if a voice field to be repaired exists in the input voice, obtaining from the preset speech recognition library the correction voice field matching the voice field to be repaired further comprises:
if no correction voice field matching the voice field to be repaired can be obtained from the preset speech recognition library, selecting a voice field as the correction voice field according to a preset fuzzy-sound preferred-word filling algorithm.
5. The method according to claim 4, characterized in that the correction voice fields matching the voice field to be repaired are at least two correction voice fields;
replacing the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice comprises:
replacing the voice field to be repaired in the input voice with each correction voice field respectively, to obtain multiple repaired input voices;
performing a sentence fluency assessment on each repaired input voice, and determining the final repaired input voice according to the result of the sentence fluency assessment.
6. A device for repairing input voice, characterized by comprising:
a retrieval module, configured to identify the voice fields in the received input voice according to a preset speech recognition library and to determine whether a voice field to be repaired exists in the input voice;
a repair module, configured to obtain from the preset speech recognition library, when a voice field to be repaired exists in the input voice, a correction voice field matching the voice field to be repaired; and
a replacement module, configured to replace the voice field to be repaired in the input voice with the correction voice field to obtain the repaired input voice.
7. The device according to claim 6, characterized in that identifying, in the retrieval module, the voice fields in the received input voice according to the preset speech recognition library comprises:
splitting the received input voice into at least one sentence according to a preset splitting rule;
identifying the voice fields in each sentence according to the preset speech recognition library.
8. The device according to claim 7, characterized in that the splitting rule comprises at least one of speech rate, interval, and key voice fields.
9. The device according to claim 8, characterized in that the repair module is further configured to select a voice field as the correction voice field according to a preset fuzzy-sound preferred-word filling algorithm when no correction voice field matching the voice field to be repaired can be obtained from the preset speech recognition library.
10. The device according to claim 9, characterized in that the correction voice fields matching the voice field to be repaired are at least two correction voice fields;
the replacement module comprises:
a repair replacement unit, configured to replace the voice field to be repaired in the input voice with each correction voice field respectively, to obtain multiple repaired input voices;
a fluency assessment unit, configured to perform a sentence fluency assessment on each repaired input voice and to determine the final repaired input voice according to the result of the sentence fluency assessment.
CN201410462543.5A 2014-09-11 2014-09-11 A kind of method and device thereof for repairing input voice Active CN105469801B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410462543.5A CN105469801B (en) 2014-09-11 2014-09-11 A kind of method and device thereof for repairing input voice

Publications (2)

Publication Number Publication Date
CN105469801A true CN105469801A (en) 2016-04-06
CN105469801B CN105469801B (en) 2019-07-12

Family

ID=55607428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410462543.5A Active CN105469801B (en) 2014-09-11 2014-09-11 A kind of method and device thereof for repairing input voice

Country Status (1)

Country Link
CN (1) CN105469801B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101175119A (en) * 2007-09-30 2008-05-07 中兴通讯股份有限公司 Abnormal voice data processing method and apparatus
CN101178705A (en) * 2007-12-13 2008-05-14 中国电信股份有限公司 Free-running speech comprehend method and man-machine interactive intelligent system
CN101567189A (en) * 2008-04-22 2009-10-28 株式会社Ntt都科摩 Device, method and system for correcting voice recognition result
US20090326938A1 (en) * 2008-05-28 2009-12-31 Nokia Corporation Multiword text correction
CN101894565A (en) * 2009-05-19 2010-11-24 华为技术有限公司 Voice signal restoration method and device
CN101923854A (en) * 2010-08-31 2010-12-22 中国科学院计算技术研究所 Interactive speech recognition system and method
CN103000176A (en) * 2012-12-28 2013-03-27 安徽科大讯飞信息科技股份有限公司 Speech recognition method and system
CN103021412A (en) * 2012-12-28 2013-04-03 安徽科大讯飞信息科技股份有限公司 Voice recognition method and system
CN103207769A (en) * 2012-01-16 2013-07-17 联想(北京)有限公司 Method and user equipment for voice amending
CN103366742A (en) * 2012-03-31 2013-10-23 盛乐信息技术(上海)有限公司 Voice input method and system
CN103700386A (en) * 2013-12-16 2014-04-02 联想(北京)有限公司 Information processing method and electronic equipment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107393544A (en) * 2017-06-19 2017-11-24 维沃移动通信有限公司 A kind of voice signal restoration method and mobile terminal
CN107393544B (en) * 2017-06-19 2019-03-05 维沃移动通信有限公司 A kind of voice signal restoration method and mobile terminal
CN109102824A (en) * 2018-07-06 2018-12-28 北京比特智学科技有限公司 Voice error correction method and device based on human-computer interaction
CN109003619A (en) * 2018-07-24 2018-12-14 Oppo(重庆)智能科技有限公司 Voice data generation method and relevant apparatus
CN110415679A (en) * 2019-07-25 2019-11-05 北京百度网讯科技有限公司 Voice error correction method, device, equipment and storage medium
CN110415679B (en) * 2019-07-25 2021-12-17 北京百度网讯科技有限公司 Voice error correction method, device, equipment and storage medium
US11328708B2 (en) 2019-07-25 2022-05-10 Beijing Baidu Netcom Science And Technology Co., Ltd. Speech error-correction method, device and storage medium
CN111986668A (en) * 2020-08-20 2020-11-24 深圳市一本电子有限公司 AI voice intelligent control Internet of things method using vehicle-mounted charger
CN111986668B (en) * 2020-08-20 2021-05-11 深圳市一本电子有限公司 AI voice intelligent control Internet of things method using vehicle-mounted charger

Also Published As

Publication number Publication date
CN105469801B (en) 2019-07-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211105

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou, Zhejiang

Patentee after: Alibaba (China) Network Technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: ALIBABA GROUP HOLDING Ltd.
