CN109145276A

CN109145276A - A pinyin-based method for correcting text after speech-to-text conversion

Info

Publication number: CN109145276A
Application number: CN201810922512.1A
Authority: CN
Inventors: 吕韶
Original assignee: Hangzhou Zhiyu Network Technology Co Ltd
Current assignee: Hangzhou Zhiyu Network Technology Co Ltd
Priority date: 2018-08-14
Filing date: 2018-08-14
Publication date: 2019-01-04

Abstract

The invention discloses a pinyin-based method for correcting text after speech-to-text conversion. Text produced by speech recognition first undergoes a preliminary check; when content-relevant information has not been recognized, the corresponding text is replaced on the basis of a pinyin similarity calculation over the recognized pinyin, thereby correcting the speech output so as to obtain accurate semantics. The pinyin similarity judgment of the invention corrects quickly and outputs speech text with high accuracy; it is easy to implement, and the accuracy and service quality of speech recognition can be significantly ensured.

Description

A pinyin-based method for correcting text after speech-to-text conversion

Technical field

The present invention relates to the field of artificial-intelligence speech recognition, and in particular to a pinyin-based method for correcting text after speech-to-text conversion.

Background art

Over the past 20 years, speech recognition technology has made marked progress and has begun to move from the laboratory to the market. It is gradually entering fields such as industry, household appliances, communications, automotive electronics, medical care, home services, and consumer electronics. As the technology keeps improving, robots based on speech recognition have begun to emerge. However, because of inherent factors such as each speaker's place of birth and pronunciation habits, and transient factors such as signal interference and poor network conditions, the accuracy of speech recognition in actual use is far below the 97% advertised by vendors. Actual recognition accuracy greatly affects the business and work that rely on speech recognition for subsequent operations, so in practice a great deal of manpower and time must be spent handling the trouble caused by inaccurate recognition, and the corresponding economic losses must be borne.

Existing techniques mainly focus on tuning and improving speech recognition itself, upgrading the recognition algorithms to reach higher recognition capability. Little attention is paid to secondary processing and correction after recognition, and the approaches that do exist only correct homophones. In many cases, however, the problem is not insufficient recognition capability: given that standard Mandarin can already be recognized almost perfectly, recognition deviations are caused by differences in people's pronunciation, environmental interference, and the like. These problems are hard to overcome by improving recognition capability alone, or the room for improvement is very limited. Although homophone correction can make up for some of the errors, most errors are non-homophones arising from various complex causes, so what the market needs more is a correction method with some fuzzy-processing capability.

Summary of the invention

In view of the deficiencies of the prior art, the present invention discloses a pinyin-based method for correcting text after speech-to-text conversion. The method yields high speech-recognition accuracy. The specific technical solution is as follows:

A pinyin-based method for correcting text after speech-to-text conversion, characterized in that the method comprises the following steps:

S1: segment the Chinese text obtained from speech recognition using a Chinese word segmentation algorithm or tool, obtaining multiple words;

S2: look up, in the database, the keywords under the application scenario of this speech segment that are relevant to the words obtained in S1, and match the words obtained in S1 against those keywords; the database comprises submodules for multiple application scenarios, each submodule storing multiple keywords relevant to its scenario; for each scenario, set the number of keywords that must be matched; if the matching requirement is met, no correction is needed and the text is output directly; otherwise, proceed to S3;

S3: for each word obtained in S1 and each keyword in the database under the scenario, compute the edit-distance dissimilarity D_i between the pinyin of the i-th Chinese character of the word and the pinyin of the i-th Chinese character of the keyword; the edit-distance dissimilarity of the pinyin is the minimum number of single-character insertions, deletions, or substitutions needed to make the two pinyin strings identical; the edit-distance dissimilarity of each word is D = ∑ D_i; set a threshold k; when the smallest edit-distance dissimilarity D_min of a word obtained in S1 over all keywords satisfies D_min ≤ k, replace that word with the corresponding keyword in the database;

S4: output the text after replacement, which completes the text correction.
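The edit distance defined in S3 is the standard Levenshtein distance applied to pinyin strings. A minimal sketch follows, assuming the pinyin of each character is already available as a plain string; the function names are illustrative, and the handling of words with different character counts is an assumption, since the text does not specify it.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b (Levenshtein)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def word_distance(word: Sequence[str], keyword: Sequence[str]) -> int:
    """D = sum of D_i over characters in the same position.

    Assumption: only words with the same number of characters are compared.
    """
    if len(word) != len(keyword):
        raise ValueError("this sketch only compares words of equal length")
    return sum(edit_distance(p, q) for p, q in zip(word, keyword))


print(edit_distance("ang", "ia"))                       # 3
print(word_distance(["bei", "tao"], ["pei", "tao"]))    # 1
```

A word is then replaced by the keyword with the smallest D whenever that minimum does not exceed the threshold k.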

Further, when looking up keywords in the database in S2, only the nouns obtained from the segmentation in S1 are matched.

Further, the database comprises submodules for multiple application scenarios, each submodule storing multiple keywords relevant to its scenario.

Further, the edit-distance dissimilarity D_i of the pinyin of the i-th Chinese character in S3 is the sum of the edit-distance dissimilarities of the initial, the final, and the tone, i.e. D_i = d1 + d2 + d3, where the dissimilarities d1 and d2 of the initial and the final are defined as in S3, and the tone dissimilarity d3 is defined as 0 if the tones are the same and 1 if they differ.

Further, the weights of the edit-distance dissimilarities of the initial, the final, and the tone are w1, w2, and w3 respectively, so that D_i = w1*d1 + w2*d2 + w3*d3, with w1 ≥ w2 ≥ w3.
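For reference, the quantities defined above can be collected into one set of formulas. This is only a restatement in LaTeX notation of the definitions already given; the unweighted form D_i = d1 + d2 + d3 corresponds to w1 = w2 = w3 = 1.

```latex
D_i = w_1 d_1 + w_2 d_2 + w_3 d_3, \qquad w_1 \ge w_2 \ge w_3,
\qquad
d_3 =
\begin{cases}
0, & \text{tones are the same} \\
1, & \text{tones differ}
\end{cases}

D = \sum_i D_i, \qquad
D_{\min} = \min_{\text{keywords}} D, \qquad
\text{replace the word if } D_{\min} \le k .
```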

Beneficial effects of the present invention are as follows:

Pinyin, which the present invention uses, is the foundation of the Chinese language and the semantic carrier closest to the spoken language, so semantic loss during recognition and conversion is minimized. Correcting on the basis of pinyin is also more reasonable and more efficient. The method is easy to implement and flexible, and the accuracy and service quality of speech recognition can be significantly ensured.

Brief description of the drawings

Fig. 1 is a flow chart of the pinyin-based method for correcting text after speech-to-text conversion.

Detailed description of the embodiments

The present invention is described in detail below with reference to the drawing and preferred embodiments, from which its objects and effects will become clearer. It should be understood that the specific embodiments described here only serve to explain the invention and are not intended to limit it.

As shown in Fig. 1, the pinyin-based method of the invention for correcting text after speech-to-text conversion specifically comprises the following steps:

Step 1: segment the Chinese text obtained from speech recognition using a Chinese word segmentation algorithm or tool, obtaining multiple words;

Step 2: look up the keywords in the database under the application scenario of this speech segment and match the words obtained in Step 1 against them; the database comprises submodules for multiple application scenarios, each submodule storing multiple noun keywords relevant to its scenario; for each scenario, set the number of keywords that must be matched; if the matching requirement is met, no correction is needed and the text is output directly; otherwise, proceed to Step 3;

Step 3: correct the Chinese text according to the pinyin edit distance.

This step is the core of the invention and is divided into the following sub-steps.

1) for each word obtained in Step 1 and each keyword in the database under the scenario, compute the edit-distance dissimilarity D_i between the pinyin of the i-th Chinese character of the word and that of the i-th Chinese character of the keyword; the edit-distance dissimilarity of the pinyin is the minimum number of single-character insertions, deletions, or substitutions needed to make the two pinyin strings identical; the dissimilarity is computed separately for the initial, the final, and the tone of the corresponding characters, the tone dissimilarity being defined as 0 if the tones are the same and 1 if they differ, giving d1, d2, and d3; at the same time, according to the rules of pronunciation, weight coefficients w1, w2, and w3 are set for the dissimilarities of the initial, the final, and the tone respectively, so that D_i = d1*w1 + d2*w2 + d3*w3;

2) compute the edit-distance dissimilarity of each single word, D = ∑ D_i; according to the set threshold k, when the smallest edit-distance dissimilarity D_min of a word obtained in Step 1 over all keywords satisfies D_min ≤ k, replace that word with the corresponding keyword in the database;

3) repeat sub-steps 1 and 2 until all words obtained in Step 1 have been computed and processed;

Step 4: output the text after replacement, which completes the text correction.

The invention is described in detail below using two representative embodiments set in a real-estate scenario. Here the database contains the keywords "price", "position", and "mating" (pei4 tao4, i.e. supporting facilities); the threshold is k = 5, and the weights of the initial, the final, and the tone are 2:1:0.5. For ease of explanation, in the following embodiments matching a single keyword is regarded as a successful match.

To improve matching efficiency and reduce the amount of computation, when the words obtained in S1 are matched against the keywords in the database under the application scenario of the speech segment, only nouns are matched.

Embodiment 1:

Correct text: "How is the house price?"

Recognized text: "House price swollen sample?" (here "swollen sample" is a misrecognition of "how")

Step 1: segment the Chinese text obtained from speech recognition using a Chinese word segmentation algorithm or tool, obtaining multiple words; here the recognized text is split into "house", "price", and "swollen sample", of which the nouns are "house" and "price";

Step 2: look up the keywords in the database under the real-estate scenario and match the nouns among the words obtained in Step 1 against them; the keyword "price" is identified, so the result is output directly: the required key information has already been obtained, and "swollen sample", as secondary information, does not affect the semantic judgment, so the process can end without correction.
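The keyword check of Step 2 can be sketched as a lookup in a per-scenario keyword set. The dictionary layout, the scenario name `real_estate`, and the function name below are assumptions made for illustration, and the English labels stand in for the original Chinese words.

```python
# Hypothetical scenario database: scenario name -> set of noun keywords.
SCENARIO_KEYWORDS = {
    "real_estate": {"price", "position", "mating"},
}


def needs_correction(nouns, scenario, required_matches=1):
    """Return (needs_correction, matched_keywords) for the given nouns.

    Correction is skipped when at least `required_matches` nouns are
    already keywords of the scenario.
    """
    keywords = SCENARIO_KEYWORDS[scenario]
    matched = [word for word in nouns if word in keywords]
    return len(matched) < required_matches, matched


# Embodiment 1: "price" is matched, so the text is output unchanged.
print(needs_correction(["house", "price"], "real_estate"))
# (False, ['price'])

# A noun list with no keyword hit would instead go on to Step 3.
print(needs_correction(["house", "quilt cover"], "real_estate"))
# (True, [])
```

Keeping the required number of matches as a per-call parameter reflects the statement that this number is set per scenario.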

Embodiment 2:

Correct text: "How is the "mating" (pei4 tao4, supporting facilities) of your house here?"

Recognized text: "How is the "quilt cover" (bei4 tao4) of your house here?"

Step 1: segment the Chinese text obtained from speech recognition using a Chinese word segmentation algorithm or tool, obtaining multiple words; in this example the recognized text can be split into "you", "here", "house", "quilt cover", and "how";

Step 2: look up the relevant keywords in the database under the real-estate scenario and match the nouns among the words obtained in Step 1, "house" and "quilt cover", against the keywords "price", "position", and "mating" under that scenario; neither word is matched, so proceed to Step 3;

Step 3: correct the Chinese text according to the pinyin edit distance;

1) for each word obtained in Step 1 and each keyword in the database under the scenario, compute the edit-distance dissimilarity D_i between the pinyin of the i-th Chinese character of the word and that of the i-th Chinese character of the keyword;

In this example, when word a is "house" and keyword b is "price", compare their first characters, fang2 and jia4. For the initials, "f" only needs to be replaced by "j", so the initial dissimilarity is d1 = 1. The finals are "ang" and "ia": "ng" must be removed and "i" added to make the two finals identical, so the final dissimilarity is d2 = 3. One tone is the second tone and the other the fourth tone, so the tone dissimilarity is d3 = 1. The pinyin edit-distance dissimilarity of the first character is therefore D1 = 2*1 + 1*3 + 0.5*1 = 5.5;

The second character is computed in the same way, giving D2 = 2*1 + 1*1 + 0.5*1 = 3.5.

Finally, the edit-distance dissimilarity of "house" with respect to the keyword "price" is D = D1 + D2 = 9;

By a similar calculation,

the edit-distance dissimilarity of "house" with respect to the keyword "position" is D = 10;

the edit-distance dissimilarity of "house" with respect to the keyword "mating" is D = 13;

D_min = 9 for the word "house", which is greater than k, so no replacement is made;
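The "house" versus "price" calculation above can be checked with a short sketch of the weighted per-character dissimilarity. The first-character syllables (fang2, jia4) and the weights 2:1:0.5 come from the embodiment; the second-character syllables and the use of 5 for a neutral tone are assumptions chosen to be consistent with D2 = 3.5, and the `Syllable` type is introduced only for illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class Syllable(NamedTuple):
    initial: str
    final: str
    tone: int


WEIGHTS = (2.0, 1.0, 0.5)  # w1 (initial), w2 (final), w3 (tone), as in the embodiment


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two short strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def char_distance(a: Syllable, b: Syllable,
                  weights: Tuple[float, float, float] = WEIGHTS) -> float:
    """D_i = w1*d1 + w2*d2 + w3*d3 for one pair of characters."""
    w1, w2, w3 = weights
    d1 = edit_distance(a.initial, b.initial)
    d2 = edit_distance(a.final, b.final)
    d3 = 0 if a.tone == b.tone else 1
    return w1 * d1 + w2 * d2 + w3 * d3


def word_distance(word: Sequence[Syllable], keyword: Sequence[Syllable]) -> float:
    """D = sum of D_i over characters in the same position."""
    return sum(char_distance(a, b) for a, b in zip(word, keyword))


# Second syllables ("z", "i", 5) and ("g", "e", 2) are assumptions, see lead-in.
house = [Syllable("f", "ang", 2), Syllable("z", "i", 5)]
price = [Syllable("j", "ia", 4), Syllable("g", "e", 2)]

print(char_distance(house[0], price[0]))  # 5.5
print(char_distance(house[1], price[1]))  # 3.5
print(word_distance(house, price))        # 9.0
```

With k = 5, a distance of 9 is above the threshold, which agrees with the conclusion that "house" is not replaced by "price".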

3) repeat sub-steps 1 and 2 until all words given by Step 1 have been computed:

the edit-distance dissimilarity of "quilt cover" with respect to the keyword "mating" is D = 2;

the edit-distance dissimilarity of "quilt cover" with respect to the keyword "price" is D = 10;

the edit-distance dissimilarity of "quilt cover" with respect to the keyword "position" is D = 8;

Therefore, for the word "quilt cover" the smallest edit-distance dissimilarity is 2, i.e. D_min = 2, which is smaller than k; the word and keyword corresponding to D_min are "quilt cover" and "mating" respectively, so "quilt cover" is replaced with "mating";

4) output the processed text: the recognized text "How is the "quilt cover" of your house here?" is finally corrected to "How is the "mating" of your house here?";
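The selection and replacement in this embodiment can be sketched as follows. The distance table simply copies the D values quoted above rather than recomputing them, the English labels stand in for the original Chinese words, and the function name `correct_words` is an illustrative assumption.

```python
K = 5  # threshold k from the embodiment

# D values quoted in Embodiment 2: word -> {keyword: D}.
DISTANCES = {
    "house": {"price": 9, "position": 10, "mating": 13},
    "quilt cover": {"mating": 2, "price": 10, "position": 8},
}


def correct_words(words, distances, k):
    """Replace each word by its closest keyword when D_min <= k."""
    corrected = []
    for word in words:
        candidates = distances.get(word)  # None for words that are not compared
        if candidates:
            keyword, d_min = min(candidates.items(), key=lambda item: item[1])
            if d_min <= k:
                word = keyword
        corrected.append(word)
    return corrected


words = ["you", "here", "house", "quilt cover", "how"]
print(correct_words(words, DISTANCES, K))
# ['you', 'here', 'house', 'mating', 'how']
```

Only "quilt cover" falls under the threshold, so it is the only word replaced; "house" keeps its original form because its smallest distance, 9, exceeds k.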

Those of ordinary skill in the art will understand that the foregoing is merely a preferred embodiment of the invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art may still modify the technical solutions described in those examples or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or the like made within the spirit and principle of the invention shall fall within its scope of protection.

Claims

1. A pinyin-based method for correcting text after speech-to-text conversion, characterized in that the method comprises the following steps:

S1: segment the Chinese text obtained from speech recognition using a Chinese word segmentation algorithm or tool, obtaining multiple words;

S2: look up, in the database, the keywords under the application scenario of this speech segment that are relevant to the words obtained in S1, and match the words obtained in S1 against those keywords; the database comprises submodules for multiple application scenarios, each submodule storing multiple keywords relevant to its scenario; for each scenario, set the number of keywords that must be matched; if the matching requirement is met, no correction is needed and the text is output directly; otherwise, proceed to S3;

S3: for each word obtained in S1 and each keyword in the database under the scenario, compute the edit-distance dissimilarity D_i between the pinyin of the i-th Chinese character of the word and the pinyin of the i-th Chinese character of the keyword; the edit-distance dissimilarity of the pinyin is the minimum number of single-character insertions, deletions, or substitutions needed to make the two pinyin strings identical; the edit-distance dissimilarity of each word is D = ∑ D_i; set a threshold k; when the smallest edit-distance dissimilarity D_min of a word obtained in S1 over all keywords satisfies D_min ≤ k, replace that word with the corresponding keyword in the database;

S4: output the text after replacement, which completes the text correction.

2. The pinyin-based method for correcting text after speech-to-text conversion according to claim 1, characterized in that, when looking up keywords in the database in S2, only the nouns obtained from the segmentation in S1 are matched.

3. The pinyin-based method for correcting text after speech-to-text conversion according to claim 1, characterized in that the database comprises submodules for multiple application scenarios, each submodule storing multiple keywords relevant to its scenario.

4. The pinyin-based method for correcting text after speech-to-text conversion according to claim 1, characterized in that the edit-distance dissimilarity D_i of the pinyin of the i-th Chinese character in S3 is the sum of the edit-distance dissimilarities of the initial, the final, and the tone, i.e. D_i = d1 + d2 + d3, where the dissimilarities d1 and d2 of the initial and the final are defined as in S3, and the tone dissimilarity d3 is defined as 0 if the tones are the same and 1 if they differ.

5. The pinyin-based method for correcting text after speech-to-text conversion according to claim 4, characterized in that the weights of the edit-distance dissimilarities of the initial, the final, and the tone are w1, w2, and w3 respectively, so that D_i = w1*d1 + w2*d2 + w3*d3, with w1 ≥ w2 ≥ w3.