CN106250364A - Text correction method and device - Google Patents

Text correction method and device

Info

Publication number
CN106250364A
Authority
CN
China
Prior art keywords
word
participle
correct
centering
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610573610.XA
Other languages
Chinese (zh)
Inventor
刘江
胡加学
金泽蒙
赵乾
于振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201610573610.XA priority Critical patent/CN106250364A/en
Publication of CN106250364A publication Critical patent/CN106250364A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments provide a text correction method and device. The method includes: obtaining text data to be corrected; obtaining a correct word, the correct word being used to replace the erroneous words corresponding to it in the text data; and finding and replacing those erroneous words in the text data according to the correct word. In the present invention, when errors are found in a text, the user does not need to supply any erroneous word and only needs to input the correct word; the system then automatically searches for each corresponding erroneous word according to the correct word. For example, the user only needs to input the correct word "dark reddish purple", without pointing out whether the corresponding erroneous word is "dark reddish purple" or "fall", and each corresponding erroneous word is found automatically from the correct word. Because the user only provides the correct word and does not have to point out the erroneous words one by one, correction efficiency is greatly improved, and erroneous words that a manual search might miss are no longer omitted, which improves correction accuracy.

Description

Text correction method and device
Technical field
The present invention relates to the field of information processing, and in particular to a text correction method and device.
Background
Traditionally, people input text by typing. As technology has developed, new ways of inputting (or generating) text have emerged, for example converting speech into text through speech recognition, or converting the text in a picture into text through OCR. Whether text is typed in the traditional way or input in one of these new ways, one problem remains: the constant emergence of new words (such as Internet slang) puts considerable strain on the original dictionary of the input or recognition system. The many homophones, synonyms, and similar-looking words that these new words produce seriously reduce input accuracy, so the input text often contains erroneous words. For example, when a user dictates the Internet slang word "dark reddish purple" (which means "so"), conversion to text may misrecognize it as "dark reddish purple", "fall purple", or "fall son".
In the prior art, when a check finds erroneous words, the usual remedy is for the user to move the cursor to the erroneous word and retype the correct word over it, or to have software search for and replace a given erroneous word throughout the text. While implementing the present invention, however, the inventors found that these correction approaches are inefficient because the user must point out the erroneous words one by one. Take the word "dark reddish purple" mentioned above: when the user finds it misrecognized as "dark reddish purple", one full-text search and replace is needed; when the user then finds it misrecognized as "fall purple", another full-text search and replace is needed; and when the user finds it misrecognized as "fall son", yet another is needed. In other words, the user may need at least three full-text search-and-replace passes to correct the various erroneous forms of this one word. Moreover, because the erroneous words must be identified manually, the accuracy of the prior art is relatively low: the text may contain other erroneous forms of "dark reddish purple" that the user does not notice while checking, so they are missed.
Summary of the invention
The present invention provides a text correction method and device to improve the efficiency and accuracy of text correction.
According to a first aspect of the embodiments of the present invention, a text correction method is provided, the method including:
Obtaining text data to be corrected;
Obtaining a correct word, the correct word being used to replace the erroneous words corresponding to it in the text data;
Finding and replacing the erroneous words in the text data according to the correct word.
Optionally, finding and replacing the erroneous words in the text data according to the correct word includes:
Segmenting the text data into a plurality of segmented words;
Forming a word pair from the correct word and each segmented word;
Extracting the similarity between the correct word and the segmented word in each word pair, the similarity including glyph similarity, semantic similarity, and acoustic similarity;
Obtaining, from the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
Determining the target word pairs according to the probability of each word pair and a preset algorithm;
Replacing the segmented word of each target word pair in the text data with the correct word.
Optionally, after the text data is segmented and before the correct word is formed into word pairs with each segmented word, the method further includes:
Combining each two adjacent single characters obtained by segmentation into one segmented word.
Optionally, extracting the glyph similarity between the correct word and the segmented word in each word pair includes:
If the correct word and the segmented word of the current word pair have the same number of characters, converting each character of both words into its four-corner code, and taking as the glyph similarity the average, over the corresponding character positions, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
If the correct word and the segmented word of the current word pair have different numbers of characters, taking the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm, as the glyph similarity.
Optionally, extracting the semantic similarity between the correct word and the segmented word in each word pair includes:
Vectorizing the correct word and the segmented word of the current word pair separately to obtain their word vectors;
Taking the distance between the word vectors of the correct word and the segmented word as the semantic similarity.
Optionally, extracting the acoustic similarity between the correct word and the segmented word in each word pair includes:
Determining the minimum edit distance path between the correct word and the segmented word of the current word pair in a pinyin character conversion table;
Obtaining the pinyin character conversion distance between the correct word and the segmented word from the conversion distance of each pinyin character on the minimum edit distance path;
Obtaining the acoustic distance between the correct word and the segmented word from that pinyin character conversion distance, and taking the acoustic distance as the acoustic similarity.
Optionally, determining the target word pairs according to the probability of each word pair and the preset algorithm includes:
Comparing the probability of each word pair with a preset threshold;
Determining each word pair whose probability is greater than the preset threshold to be a target word pair.
Optionally, determining the target word pairs according to the probability of each word pair and the preset algorithm includes:
Sorting the word pairs by probability in descending order;
Determining a preset number of top-ranked word pairs to be target word pairs.
Optionally, determining the target word pairs according to the probability of each word pair and the preset algorithm includes:
Looking up the correct word and the segmented word of the current word pair separately in a preset vocabulary, the preset vocabulary storing known correspondences between correct words and erroneous words;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair is identical to the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as the erroneous word is identical to the correct word of the current word pair, determining that the current word pair is a target word pair;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as the erroneous word also differs from the correct word of the current word pair, determining that the current word pair is not a target word pair;
If only one of the two lookups matches, that is, only the erroneous word found using the correct word of the current word pair is identical to the segmented word, or only the correct word found using the segmented word as the erroneous word is identical to the correct word of the current word pair, asking the user and determining from the user's instruction whether the current word pair is a target word pair.
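The three-way decision of the vocabulary-based option can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `Verdict` and `classify_pair`, the use of two dictionaries to hold the preset vocabulary, and the vocabulary entries themselves are all assumptions made for the example.

```python
from enum import Enum


class Verdict(Enum):
    TARGET = "target"
    NOT_TARGET = "not_target"
    ASK_USER = "ask_user"


def classify_pair(correct, segmented, correct_to_errors, error_to_correct):
    """Apply both vocabulary lookups to one (correct word, segmented word) pair."""
    # Lookup 1: is the segmented word stored as an erroneous form of the correct word?
    forward = segmented in correct_to_errors.get(correct, set())
    # Lookup 2: taken as an erroneous word, does the segmented word map back to the correct word?
    backward = error_to_correct.get(segmented) == correct
    if forward and backward:
        return Verdict.TARGET
    if not forward and not backward:
        return Verdict.NOT_TARGET
    # Exactly one lookup matched: defer to the user.
    return Verdict.ASK_USER


# Hypothetical vocabulary contents, for illustration only.
correct_to_errors = {"U.S.": {"did not have"}, "their": {"thier"}}
error_to_correct = {"did not have": "U.S.", "thier": "there"}

print(classify_pair("U.S.", "did not have", correct_to_errors, error_to_correct))
print(classify_pair("U.S.", "go", correct_to_errors, error_to_correct))
print(classify_pair("their", "thier", correct_to_errors, error_to_correct))
```

The third call shows the case where the two lookups disagree, which is the case the text routes to the user.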
According to a second aspect of the embodiments of the present invention, a text correction device is provided, the device including:
A text obtaining module, configured to obtain text data to be corrected;
A correct word obtaining module, configured to obtain a correct word, the correct word being used to replace the erroneous words corresponding to it in the text data;
A replacement module, configured to find and replace the erroneous words in the text data according to the correct word.
Optionally, the replacement module includes:
A segmentation submodule, configured to segment the text data into a plurality of segmented words;
A word pair generation submodule, configured to form a word pair from the correct word and each segmented word;
A similarity extraction submodule, configured to extract the similarity between the correct word and the segmented word in each word pair, the similarity including glyph similarity, semantic similarity, and acoustic similarity;
A probability obtaining submodule, configured to obtain, from the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
A target word pair determining submodule, configured to determine the target word pairs according to the probability of each word pair and a preset algorithm;
A replacement submodule, configured to replace the segmented word of each target word pair in the text data with the correct word.
Optionally, the replacement module further includes:
A single-character combination submodule, configured to combine each two adjacent single characters obtained by segmentation into one segmented word.
Optionally, when extracting the glyph similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
If the correct word and the segmented word of the current word pair have the same number of characters, convert each character of both words into its four-corner code, and take as the glyph similarity the average, over the corresponding character positions, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
If the correct word and the segmented word of the current word pair have different numbers of characters, take the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm, as the glyph similarity.
Optionally, when extracting the semantic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
Vectorize the correct word and the segmented word of the current word pair separately to obtain their word vectors; and take the distance between the word vectors of the correct word and the segmented word as the semantic similarity.
Optionally, when extracting the acoustic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
Determine the minimum edit distance path between the correct word and the segmented word of the current word pair in a pinyin character conversion table; obtain the pinyin character conversion distance between the correct word and the segmented word from the conversion distance of each pinyin character on the minimum edit distance path; and obtain the acoustic distance between the correct word and the segmented word from that pinyin character conversion distance, taking the acoustic distance as the acoustic similarity.
Optionally, the probability obtaining submodule is configured to:
Compare the probability of each word pair with a preset threshold; and determine each word pair whose probability is greater than the preset threshold to be a target word pair.
Optionally, the probability obtaining submodule is configured to:
Sort the word pairs by probability in descending order; and determine a preset number of top-ranked word pairs to be target word pairs.
Optionally, the probability obtaining submodule is configured to:
Look up the correct word and the segmented word of the current word pair separately in a preset vocabulary, the preset vocabulary storing known correspondences between correct words and erroneous words;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair is identical to the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as the erroneous word is identical to the correct word of the current word pair, determine that the current word pair is a target word pair;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as the erroneous word also differs from the correct word of the current word pair, determine that the current word pair is not a target word pair;
If only one of the two lookups matches, that is, only the erroneous word found using the correct word of the current word pair is identical to the segmented word, or only the correct word found using the segmented word as the erroneous word is identical to the correct word of the current word pair, ask the user and determine from the user's instruction whether the current word pair is a target word pair.
The technical solutions provided by the embodiments of the present invention can have the following beneficial effects:
In the present invention, when errors are found in a text, the user does not need to supply any erroneous word and only needs to input the correct word; the system then automatically searches for each corresponding erroneous word according to the correct word. For example, when the user finds that the word "dark reddish purple" has been miswritten in the text as "dark reddish purple", "fall", and so on, the user only needs to input the correct word "dark reddish purple", without pointing out whether the corresponding erroneous word is "dark reddish purple" or "fall", let alone the position of each erroneous word. The system automatically finds each corresponding erroneous word according to the correct word and replaces it with the correct word, which completes the text correction. Because the user only provides the correct word and does not have to point out the erroneous words one by one, correction efficiency is greatly improved, and erroneous words that a manual search might miss are no longer omitted, which improves correction accuracy.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present invention.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a flowchart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 2 is a flowchart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 3 is a flowchart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 4 is a flowchart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 5 is a schematic diagram of a minimum edit distance path according to an exemplary embodiment of the present invention;
Fig. 6 is a flowchart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 7 is a flowchart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 8 is a flowchart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 9 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention;
Fig. 10 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention;
Fig. 11 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.
Fig. 1 is a flowchart of a text correction method according to an exemplary embodiment of the present invention. The method can be used on mobile terminals such as mobile phones and on devices such as PCs and servers.
Referring to Fig. 1, the method may include:
Step S101, obtain text data to be corrected.
The text data to be corrected can be chosen according to the user's needs. This embodiment does not limit its source: it may be, for example, text entered manually by the user, text data obtained by speech recognition, or text data obtained by OCR (Optical Character Recognition).
Step S102, obtain a correct word, the correct word being used to replace the erroneous words corresponding to it in the text data.
In this embodiment, when a text error is found, the user only needs to input the correct word and does not need to point out which erroneous words correspond to it or where they are.
Step S103, find and replace the erroneous words in the text data according to the correct word.
This embodiment does not limit how exactly the erroneous words are found and replaced according to the correct word; an example is given below with reference to Fig. 2:
Referring to Fig. 2, in this embodiment or some other embodiments of the present invention, finding and replacing the erroneous words in the text data according to the correct word, that is, step S103, may include:
Step S201, segment the text data into a plurality of segmented words.
The segmentation method may be, for example, one based on conditional random fields; this embodiment does not limit it.
For example, suppose the text data to be corrected is "I think go did not have" and the segmentation result is "I", "think", "go", "did not have", in which "did not have" is the erroneous word and needs to be corrected to "U.S.".
In addition, to avoid missing words during segmentation, this embodiment may also combine each two adjacent single characters obtained by segmentation into one segmented word, that is, combine each single character in turn with the single character that follows it. For example, the segmentation result above contains the consecutive single characters "I", "think", and "go"; combining them gives the segmented words "I think" and "think go".
Step S202, form a word pair from the correct word and each segmented word.
For example, the correct word "U.S." in the example above can form the following word pairs with the segmented words obtained: "U.S.-I", "U.S.-think", "U.S.-go", "U.S.-did not have", "U.S.-I think", "U.S.-think go".
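Combining adjacent single characters and forming word pairs can be sketched as below. The function names are hypothetical, and the tokens are placeholders: "a", "b", "c" stand for three consecutive single-character words and "xy" for a two-character word, since the original characters of the example are not reproduced in this text.

```python
def combine_adjacent_singles(tokens):
    """Join each single-character token with the single-character token right after it."""
    combined = []
    for prev, nxt in zip(tokens, tokens[1:]):
        if len(prev) == 1 and len(nxt) == 1:
            combined.append(prev + nxt)
    return combined


def build_word_pairs(correct_word, tokens):
    """Pair the correct word with every segmented word, including the combined ones."""
    candidates = list(tokens) + combine_adjacent_singles(tokens)
    return [(correct_word, candidate) for candidate in candidates]


# Placeholder segmentation result: three single characters followed by one two-character word.
tokens = ["a", "b", "c", "xy"]
for pair in build_word_pairs("XY", tokens):
    print(pair)
```

With these placeholders the sketch yields six pairs, four from the segmentation itself and two from the combined characters, which mirrors the six pairs listed in the example.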
Step S203, extract the similarity between the correct word and the segmented word in each word pair, the similarity including glyph similarity, semantic similarity, and acoustic similarity.
This embodiment does not limit how these three similarities are extracted. Those skilled in the art may design the extraction themselves for different needs and scenarios, and any such design usable here falls within the spirit and scope of protection of the present invention.
Step S204, obtain, from the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word.
The decision model can be built in advance. For example, a large amount of text data is collected beforehand, the erroneous words in it are found manually, and the correct word for each erroneous word is given. After the text data is segmented, each correct word is paired with the segmented words, and each word pair is manually labeled as a target word pair or not, that is, as a true "correct word-erroneous word" pair or not. The labels can be 0 and 1: a word pair that is a true "correct word-erroneous word" pair is labeled 1, and any other pair is labeled 0. Next, the similarities between the two words of each pair are extracted, namely the glyph similarity, semantic similarity, and acoustic similarity. Finally, the similarities and labels are used as training data to train the decision model. During training, the similarities of each word pair are the model input and the label of each word pair is the model output; the model parameters are updated, and the decision model is obtained once the updates finish.
When the decision model is used, the similarities between the two words of each word pair are fed to it as input, and it outputs the probability that the pair is a true "correct word-erroneous word" pair.
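The text does not say what kind of model the decision model is. Purely as an illustration, the sketch below assumes a logistic regression trained by gradient descent on three similarity features, each assumed to be scaled to the range 0 to 1; the function names, hyperparameters, and training rows are all made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability that a word pair with these similarities is a target word pair."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(samples, labels, epochs=2000, lr=0.5):
    """Fit the weights by batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        bias -= lr * grad_b / len(samples)
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
    return weights, bias


# Made-up training rows: (glyph, semantic, acoustic) similarity with labels 1 or 0.
samples = [(0.9, 0.2, 0.95), (0.5, 0.1, 0.9), (0.1, 0.3, 0.1), (0.0, 0.2, 0.2)]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)

print(predict(weights, bias, (0.8, 0.2, 0.9)))  # pair resembling the rows labeled 1
print(predict(weights, bias, (0.1, 0.2, 0.1)))  # pair resembling the rows labeled 0
```

Any classifier that maps the three similarities to a probability could take the place of this one; the choice here only keeps the example dependency-free.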
Step S205, determine the target word pairs according to the probability of each word pair and a preset algorithm.
Once the probability that each word pair is a target word pair has been obtained, the true target word pairs can be selected according to the preset algorithm. This embodiment does not limit the specifics of the preset algorithm; those skilled in the art may design it themselves for different needs and scenarios, and any such design usable here falls within the spirit and scope of protection of the present invention.
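Two of the optional selection rules given in the summary, keeping pairs above a preset threshold and keeping a preset number of top-ranked pairs, can be sketched as follows. The function names and the probabilities are hypothetical and only illustrate the two rules.

```python
def select_by_threshold(pair_probs, threshold):
    """Keep every word pair whose probability is greater than the threshold."""
    return [pair for pair, prob in pair_probs if prob > threshold]


def select_top_n(pair_probs, n):
    """Sort word pairs by probability, highest first, and keep the first n."""
    ranked = sorted(pair_probs, key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n]]


# Made-up probabilities for three word pairs.
pair_probs = [
    (("U.S.", "did not have"), 0.93),
    (("U.S.", "go"), 0.08),
    (("U.S.", "think go"), 0.41),
]
print(select_by_threshold(pair_probs, 0.5))
print(select_top_n(pair_probs, 2))
```

The threshold rule can return no pairs at all, while the top-n rule always returns up to n pairs regardless of how low their probabilities are.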
Step S206, replace the segmented word of each target word pair in the text data with the correct word.
For example, if the correct word is "U.S." and the target word pair is "U.S.-did not have", "U.S." is used to replace "did not have" throughout the text data, which completes the correction.
In this embodiment, when errors are found in a text, the user does not need to supply any erroneous word and only needs to input the correct word; the system then automatically searches for each corresponding erroneous word according to the correct word. For example, when the user finds that the word "dark reddish purple" has been miswritten in the text as "dark reddish purple", "fall", and so on, the user only needs to input the correct word "dark reddish purple", without pointing out whether the corresponding erroneous word is "dark reddish purple" or "fall", let alone the position of each erroneous word. The system automatically finds each corresponding erroneous word according to the correct word and replaces it with the correct word, which completes the text correction. Because the user only provides the correct word and does not have to point out the erroneous words one by one, correction efficiency is greatly improved, and erroneous words that a manual search might miss are no longer omitted, which improves correction accuracy.
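The full-text replacement can be sketched with plain string substitution. The function name and the English sample text are hypothetical; replacing longer erroneous words first is a design choice of this sketch, not something the text prescribes.

```python
def apply_correction(text, correct_word, target_pairs):
    """Replace the erroneous word of every target word pair with the correct word."""
    # Longer erroneous words go first so a shorter one cannot break up a longer match.
    erroneous = sorted({segmented for _, segmented in target_pairs}, key=len, reverse=True)
    for word in erroneous:
        text = text.replace(word, correct_word)
    return text


# Hypothetical text with two different erroneous forms of the same correct word.
target_pairs = [("United States", "Untied States"), ("United States", "Untied Sates")]
text = "We flew to the Untied States; the Untied Sates are large."
print(apply_correction(text, "United States", target_pairs))
```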
How the similarity between the correct word and the segmented word in each word pair is extracted, that is, step S203, is illustrated further below.
In this embodiment or some other embodiments of the present invention, extracting the glyph similarity between the correct word and the segmented word in each word pair may specifically include:
If the correct word and the segmented word of the current word pair have the same number of characters, each character of both words is converted into its four-corner code, and the average, over the corresponding character positions, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits is taken as the glyph similarity.
The calculation is shown in formula (1):
$$T = \frac{1}{n}\left(\sum_{i=1}^{n} \frac{l_i}{L_i}\right) \tag{1}$$
where $T$ is the glyph similarity of the two words in the word pair, $n$ is the number of characters in each word of the pair, $l_i$ is the number of identical four-corner code digits for the $i$-th characters of the two words, and $L_i$ is the total number of four-corner code digits for the $i$-th character of the two words (usually 4).
For example, the glyph similarity of the word pair "to go-think go" is calculated as follows:
The four-corner code of "to" is 2722.
The four-corner code of "think" is 4633.
For the 1st character, "to" versus "think", the total number of four-corner code digits is 4 and no digits are identical, so $l_1/L_1 = 0/4 = 0$. The 2nd character is "go" in both words, so $l_2/L_2 = 4/4 = 1$. Formula (1) therefore gives a glyph similarity of $(0 + 1)/2 = 0.5$ for this word pair.
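Formula (1) and the worked example can be checked with a short sketch. The function takes the four-corner codes directly, compares digits position by position (an assumption about what counts as an identical digit), and uses a placeholder code for the shared second character, whose value does not affect the result because it is the same in both words.

```python
def glyph_similarity(codes_a, codes_b):
    """Formula (1): mean over characters of (identical code digits / total code digits)."""
    if len(codes_a) != len(codes_b):
        raise ValueError("formula (1) applies only to words with the same number of characters")
    ratios = []
    for code_a, code_b in zip(codes_a, codes_b):
        identical = sum(1 for da, db in zip(code_a, code_b) if da == db)
        ratios.append(identical / len(code_a))
    return sum(ratios) / len(ratios)


# Placeholder code for the second character, which is identical in both words.
SHARED = "4073"

# First characters use the codes from the example: 2722 and 4633.
print(glyph_similarity(["2722", SHARED], ["4633", SHARED]))  # 0.5
```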
If the correct word and the segmented word of the current word pair have different numbers of characters, the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm, can be taken as the glyph similarity. Existing techniques can be used for the implementation, so the details are not repeated here.
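The minimum edit distance itself is a standard dynamic programming computation. A minimal sketch follows, assuming unit cost for each insertion, deletion, and substitution; the text does not state the costs, and the function name is hypothetical.

```python
def min_edit_distance(a, b):
    """Levenshtein distance between sequences a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            substitution = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + substitution))
        prev = curr
    return prev[-1]


print(min_edit_distance("kitten", "sitting"))  # 3
print(min_edit_distance("abc", "ab"))          # 1
```

Note that a distance grows as words become less alike, so a larger value here means less similar glyphs.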
Referring to Fig. 3, in this embodiment or some other embodiments of the present invention, extracting the semantic similarity between the correct word and the segmented word of each word pair may specifically include:
Step S301: vectorize the correct word and the segmented word of the current word pair, respectively, to obtain word vectors.
Step S302: take the distance between the word vector of the correct word and the word vector of the segmented word as the semantic similarity.
As an example, a method such as Word2Vec may be used to vectorize each word of the word pair. Once the word vector of each word is obtained, the distance between the two word vectors may be the cosine distance, the Euclidean distance, or the like; the calculation is the same as in the prior art and is not described in detail here.
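The two distances named in step S302 can be sketched as follows, assuming the word vectors are already available as plain lists of floats. Training or loading a Word2Vec model is outside this sketch, and the function names are hypothetical.

```python
import math


def cosine_distance(vec_a, vec_b):
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(vec_a, vec_b):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(vec_a, vec_b)))
```

Cosine distance ignores vector length and compares direction only, whereas Euclidean distance is sensitive to both; which one suits the decision model better depends on how the word vectors were trained.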
Referring to Fig. 4, in this embodiment or some other embodiments of the present invention, extracting the acoustic similarity between the correct word and the segmented word of each word pair may specifically include:
Step S401: determine the minimum edit distance path of the correct word and the segmented word of the current word pair in the pinyin character conversion distance table.
Step S402: obtain the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters on the minimum edit distance path.
Step S403: obtain the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, and take the acoustic distance as the acoustic similarity.
The acoustic similarity is the similarity of two words in pronunciation and is represented by the acoustic distance between the two words: the closer the acoustic distance, the higher the acoustic similarity. The acoustic distance can be calculated from the pinyin character conversion distance of the two words, that is, from the conversion distances between pairs of pinyin characters listed in a pinyin character conversion distance table (also called a pinyin character conversion confusion matrix). Table 1 shows part of such a confusion matrix; the first row and the first column list the pinyin characters that are converted into one another, and the cell at their intersection is the conversion distance.
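Table 1
| | a | ai | an | ang | ao | b | c | ch | d | e | ei | en | eng |
| a | - | 0.67 | 0.65 | 0.72 | 0.6 | 1 | 1 | 1 | 1 | 0.6 | 0.893 | 0.88 | 0.927 |
| ai | 0.67 | - | 0.7 | 0.95 | 0.928 | 1 | 1 | 1 | 1 | 0.914 | 0.763 | 0.866 | 0.928 |
| an | 0.654 | 0.699 | - | 0.6 | 0.938 | 1 | 1 | 1 | 1 | 0.954 | 0.944 | 0.67 | 0.832 |
| ang | 0.716 | 0.95 | 0.6 | - | 0.793 | 1 | 1 | 1 | 1 | 0.972 | 0.971 | 0.877 | 0.737 |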
The acoustic distance of two words is calculated from their pinyin character conversion distance, for example as shown in formula (2):
$$D_{acou}(a_1, a_2) = \frac{1}{1 + D_{edit}(a_1, a_2)} \tag{2}$$
where $D_{acou}(a_1, a_2)$ is the acoustic distance of the two words and $D_{edit}(a_1, a_2)$ is their pinyin character conversion distance. To obtain $D_{edit}(a_1, a_2)$, a dynamic programming method is used to search the pinyin character conversion distance table for the minimum edit distance path of the two words; fusing the pinyin character conversion distances of the pinyin characters on this path yields $D_{edit}(a_1, a_2)$. The fusion may be, for example, averaging, simple accumulation, or weighted accumulation.
For example, the pinyin character conversion distance between the two words "report a case to the security authorities" (pinyin "bao an") and "standby dish" (pinyin "bei cai") is calculated as follows:
1) Convert each word into pinyin:
"report a case to the security authorities" -> bao an
"standby dish" -> bei cai
2) Look up the pinyin character conversion confusion matrix (that is, the pinyin character conversion distance table) to obtain the conversion distance between each pair of pinyin characters, as shown in Table 2:
Table 2
| | b | ao | an |
| b | 0 | 1 | 1 |
| ei | 1 | 0.976 | 0.944 |
| c | 1 | 1 | 1 |
| ai | 1 | 0.928 | 0.699 |
3) Use a dynamic programming method to calculate the pinyin character conversion distance of the two words.
In the calculation, a dynamic programming method searches the pinyin character conversion distance table for the minimum edit distance path, and the values on this path are fused to obtain the pinyin character conversion distance of the two words. As shown in Fig. 5, the shaded region is the minimum edit distance path; simply accumulating the pinyin character conversion distances on this path gives the pinyin character conversion distance of the two words: 0 + 0 + 0.976 + 1 + 0.699 = 2.675.
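The worked example can be reproduced with the sketch below. The exact recurrence is not given in this description, so the code assumes a monotone path from the top-left cell to the bottom-right cell of Table 2 that may step right, down, or diagonally, and it assumes fusion by simple accumulation. Under these assumptions the minimum path cost for Table 2 is 2.675, matching the total above. The names `min_path_cost`, `acoustic_distance`, and `TABLE_2` are hypothetical.

```python
def min_path_cost(cost):
    """Cheapest accumulated cost of a path from cost[0][0] to cost[-1][-1].

    Assumption: each step moves one cell right, down, or diagonally, and
    the values of all visited cells are summed (simple accumulation).
    """
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                best_previous = 0.0
            else:
                best_previous = min(
                    acc[i - 1][j] if i > 0 else inf,
                    acc[i][j - 1] if j > 0 else inf,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            acc[i][j] = cost[i][j] + best_previous
    return acc[-1][-1]


def acoustic_distance(d_edit):
    """Formula (2): D_acou = 1 / (1 + D_edit)."""
    return 1.0 / (1.0 + d_edit)


# Table 2: rows are b, ei, c, ai ("bei cai"); columns are b, ao, an ("bao an").
TABLE_2 = [
    [0.0, 1.0, 1.0],
    [1.0, 0.976, 0.944],
    [1.0, 1.0, 1.0],
    [1.0, 0.928, 0.699],
]

d_edit = min_path_cost(TABLE_2)
d_acou = acoustic_distance(d_edit)
```

Substituting the result into formula (2) gives 1 / (1 + 2.675), roughly 0.272, as the acoustic distance of the two words.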
In addition, step S205, that is, determining the target word pairs according to the probability of each word pair and a preset algorithm, can be implemented in several ways, which are illustrated below with reference to Figs. 6 to 8:
Referring to Fig. 6, in this embodiment or some other embodiments of the present invention, determining the target word pairs according to the probability of each word pair and the preset algorithm may include:
Step S601: compare the probability of each word pair with a preset threshold.
Step S602: determine the word pairs whose probability is greater than the preset threshold to be target word pairs.
Alternatively, referring to Fig. 7, in this embodiment or some other embodiments of the present invention, determining the target word pairs according to the probability of each word pair and the preset algorithm may include:
Step S701: sort the word pairs in descending order of probability.
Step S702: determine a preset number of top-ranked word pairs to be target word pairs.
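The selection rules of Fig. 6 and Fig. 7 can be sketched as follows, assuming the probabilities are held in a dictionary keyed by word pair. The function names and the sample probabilities are hypothetical and serve only as an illustration.

```python
def select_by_threshold(pair_probabilities, threshold):
    """Steps S601-S602: keep pairs whose probability exceeds the threshold."""
    return [pair for pair, p in pair_probabilities.items() if p > threshold]


def select_top_n(pair_probabilities, n):
    """Steps S701-S702: keep the n pairs with the highest probability."""
    ranked = sorted(pair_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n]]


# Hypothetical probabilities for (correct word, segmented word) pairs.
sample = {("w", "x"): 0.91, ("w", "y"): 0.40, ("w", "z"): 0.75}
above_threshold = select_by_threshold(sample, 0.5)
top_one = select_top_n(sample, 1)
```

The threshold rule lets the number of target word pairs vary with the text, while the top-n rule fixes it in advance.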
Alternatively, referring to Fig. 8, in this embodiment or some other embodiments of the present invention, determining the target word pairs according to the probability of each word pair and the preset algorithm may include:
Step S801: look up the correct word and the segmented word of the current word pair, respectively, in a preset lexicon, wherein the preset lexicon stores the correct correspondences between correct words and erroneous words.
The preset lexicon stores error-prone correct words together with their corresponding erroneous words, for example the pairs glossed as "U.S."-"do not had" and "U.S."-"often cross". The lexicon can be built in advance by domain experts.
Step S802: if the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, determine that the current word pair is a target word pair.
Step S803: if the erroneous word found in the preset lexicon using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, determine that the current word pair is not a target word pair.
Step S804: if only one of the two matches occurs, that is, only the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, or only the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, query the user and determine whether the current word pair is a target word pair according to the user's instruction. If the user confirms, the current word pair is determined to be a target word pair; if the user does not confirm, the current word pair is determined not to be a target word pair.
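The three outcomes of steps S802 to S804 can be sketched as a small decision function. The storage layout of the preset lexicon is not specified, so the sketch assumes two separate lookup tables, one from each correct word to its known erroneous words and one from each erroneous word to its correct word; these tables, the function name, and the English sample entries are all hypothetical.

```python
def check_with_lexicon(correct_word, segmented_word,
                       errors_by_correct, correct_by_error):
    """Return "target", "not_target" or "ask_user" for one word pair."""
    # Lookup using the correct word: is the segmented word a known error?
    forward_hit = segmented_word in errors_by_correct.get(correct_word, set())
    # Lookup using the segmented word as an erroneous word.
    reverse_hit = correct_by_error.get(segmented_word) == correct_word
    if forward_hit and reverse_hit:
        return "target"        # step S802
    if not forward_hit and not reverse_hit:
        return "not_target"    # step S803
    return "ask_user"          # step S804: only one lookup matched


# Hypothetical lexicon in which one entry exists in only one direction.
errors_by_correct = {"their": {"thier", "there"}}
correct_by_error = {"thier": "their"}
```

In the "ask_user" case the final decision is taken from the user's confirmation, as described in step S804.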
It should be noted that the three approaches of Figs. 6 to 8 can also be used in combination, either two at a time or all three together; this embodiment places no limitation on which approaches are combined or on the order in which they are applied. For example, the word pairs whose probability is greater than the threshold may be selected first and then sorted by probability, with a preset number of top-ranked word pairs determined to be target word pairs. As another example, the word pairs may first be sorted by probability and a preset number of top-ranked word pairs selected, after which the preset lexicon is used to screen them. As yet another example, the word pairs whose probability is greater than the threshold may be selected first, and the preset lexicon then used for a second screening; and so on.
The following are device embodiments of the present invention, which can be used to carry out the method embodiments of the present invention. For details not disclosed in the device embodiments, refer to the method embodiments.
Fig. 9 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention. The device can be used in mobile terminals such as mobile phones and in equipment such as PCs and servers.
Referring to Fig. 9, the device may include:
a text acquisition module 901, configured to obtain text data to be corrected;
a correct word acquisition module 902, configured to obtain a correct word, the correct word being used to replace the erroneous words corresponding to the correct word in the text data;
a replacement module 903, configured to find and replace the erroneous words in the text data according to the correct word.
Referring to Fig. 10, in this embodiment or some other embodiments of the present invention, the replacement module may include:
a word segmentation submodule 1001, configured to segment the text data into a plurality of segmented words;
a word pair generation submodule 1002, configured to form a word pair from the correct word and each segmented word;
a similarity extraction submodule 1003, configured to extract the similarity between the correct word and the segmented word of each word pair, the similarity including font similarity, semantic similarity, and acoustic similarity;
a probability obtaining submodule 1004, configured to obtain, according to the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
a target word pair determination submodule 1005, configured to determine the target word pairs according to the probability of each word pair and a preset algorithm;
a replacement submodule 1006, configured to replace, in the text data, the segmented word of each target word pair with the correct word.
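The way submodules 1002 to 1006 fit together can be sketched as a single function. This is only an illustration under several assumptions: the text is already segmented into a list of tokens, the similarity extraction and the decision model are passed in as callables, and target word pairs are chosen with the threshold rule. The names `replace_errors` and `toy_similarity`, and the English sample tokens, are hypothetical stand-ins.

```python
def replace_errors(tokens, correct_word, extract_similarity,
                   predict_probability, threshold):
    """Pair the correct word with every token and replace likely errors."""
    output = []
    for token in tokens:
        features = extract_similarity(correct_word, token)   # submodule 1003
        probability = predict_probability(features)          # submodule 1004
        is_target = probability > threshold                  # submodule 1005
        output.append(correct_word if is_target else token)  # submodule 1006
    return output


def toy_similarity(word_a, word_b):
    """Stand-in feature: share of positions holding the same character."""
    same = sum(1 for x, y in zip(word_a, word_b) if x == y)
    return same / max(len(word_a), len(word_b))


# Toy run: the identity function stands in for the preset decision model.
corrected = replace_errors(["I", "lik", "tea"], "like",
                           toy_similarity, lambda feature: feature, 0.5)
```

In the device described above, the toy similarity would be replaced by the font, semantic, and acoustic similarities, and the identity function by the preset decision model.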
Referring to Fig. 11, in this embodiment or some other embodiments of the present invention, the replacement module may further include:
a single-character combination submodule 1101, configured to combine two adjacent single characters obtained after word segmentation into one segmented word.
In this embodiment or some other embodiments of the present invention, when extracting the font similarity between the correct word and the segmented word of each word pair, the similarity extraction submodule may specifically be configured to:
if the correct word and the segmented word of the current word pair contain the same number of characters, convert each character of the correct word and of the segmented word into its four-corner code, compute for each pair of corresponding characters the ratio of the number of identical four-corner code digits to the total number of four-corner code digits, and take the average of these ratios as the font similarity;
if the correct word and the segmented word of the current word pair contain different numbers of characters, take the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm, as the font similarity.
In this embodiment or some other embodiments of the present invention, when extracting the semantic similarity between the correct word and the segmented word of each word pair, the similarity extraction submodule may specifically be configured to:
vectorize the correct word and the segmented word of the current word pair, respectively, to obtain word vectors; and take the distance between the word vector of the correct word and the word vector of the segmented word as the semantic similarity.
In this embodiment or some other embodiments of the present invention, when extracting the acoustic similarity between the correct word and the segmented word of each word pair, the similarity extraction submodule may specifically be configured to:
determine the minimum edit distance path of the correct word and the segmented word of the current word pair in the pinyin character conversion distance table; obtain the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters on the minimum edit distance path; and obtain the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, taking the acoustic distance as the acoustic similarity.
In this embodiment or some other embodiments of the present invention, the probability obtaining submodule may specifically be configured to:
compare the probability of each word pair with a preset threshold; and determine the word pairs whose probability is greater than the preset threshold to be target word pairs.
In this embodiment or some other embodiments of the present invention, the probability obtaining submodule may specifically be configured to:
sort the word pairs in descending order of probability; and determine a preset number of top-ranked word pairs to be target word pairs.
In this embodiment or some other embodiments of the present invention, the probability obtaining submodule may specifically be configured to:
look up the correct word and the segmented word of the current word pair, respectively, in a preset lexicon, wherein the preset lexicon stores the correct correspondences between correct words and erroneous words;
if the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, determine that the current word pair is a target word pair;
if the erroneous word found in the preset lexicon using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, determine that the current word pair is not a target word pair;
if only the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, or only the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, query the user and determine whether the current word pair is a target word pair according to the user's instruction.
In this embodiment, when a user finds errors in a text, the user does not need to provide any erroneous word and only needs to enter the correct word; the system then automatically searches for each corresponding erroneous word according to the correct word. For example, when the user finds that the word glossed as "dark reddish purple" has been mistakenly written in the text as several different erroneous words, such as "fall", the user only needs to enter the correct word "dark reddish purple". The user does not need to specify which erroneous words correspond to it, let alone the position of each erroneous word. The system automatically finds each corresponding erroneous word according to the correct word and replaces the erroneous words so determined with the correct word, thereby completing the text correction. Because the user only has to provide the correct word and does not have to point out the erroneous words one by one, correction efficiency is greatly improved; omissions that might result from searching for erroneous words manually are also avoided, which improves the accuracy of the correction.
With regard to the device in the above embodiments, the specific manner in which each unit and module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the appended claims.
It should be understood that the present invention is not limited to the precise structure described above and illustrated in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (18)

1. A text correction method, characterized in that the method comprises:
obtaining text data to be corrected;
obtaining a correct word, the correct word being used to replace the erroneous words corresponding to the correct word in the text data;
finding and replacing the erroneous words in the text data according to the correct word.
2. The method according to claim 1, characterized in that finding and replacing the erroneous words in the text data according to the correct word comprises:
segmenting the text data into a plurality of segmented words;
forming a word pair from the correct word and each segmented word;
extracting the similarity between the correct word and the segmented word of each word pair, the similarity comprising font similarity, semantic similarity, and acoustic similarity;
obtaining, according to the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
determining the target word pairs according to the probability of each word pair and a preset algorithm;
replacing, in the text data, the segmented word of each target word pair with the correct word.
3. The method according to claim 2, characterized in that, after the text data is segmented and before the correct word and each segmented word are formed into word pairs, the method further comprises:
combining two adjacent single characters obtained after word segmentation into one segmented word.
4. The method according to claim 2, characterized in that extracting the font similarity between the correct word and the segmented word of each word pair comprises:
if the correct word and the segmented word of the current word pair contain the same number of characters, converting each character of the correct word and of the segmented word into its four-corner code, and taking as the font similarity the average, over corresponding characters of the correct word and the segmented word, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
if the correct word and the segmented word of the current word pair contain different numbers of characters, taking the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm, as the font similarity.
5. The method according to claim 2, characterized in that extracting the semantic similarity between the correct word and the segmented word of each word pair comprises:
vectorizing the correct word and the segmented word of the current word pair, respectively, to obtain word vectors;
taking the distance between the word vector of the correct word and the word vector of the segmented word as the semantic similarity.
6. The method according to claim 2, characterized in that extracting the acoustic similarity between the correct word and the segmented word of each word pair comprises:
determining the minimum edit distance path of the correct word and the segmented word of the current word pair in a pinyin character conversion distance table;
obtaining the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters on the minimum edit distance path;
obtaining the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, and taking the acoustic distance as the acoustic similarity.
7. The method according to claim 2, characterized in that determining the target word pairs according to the probability of each word pair and the preset algorithm comprises:
comparing the probability of each word pair with a preset threshold;
determining the word pairs whose probability is greater than the preset threshold to be target word pairs.
8. The method according to claim 2, characterized in that determining the target word pairs according to the probability of each word pair and the preset algorithm comprises:
sorting the word pairs in descending order of probability;
determining a preset number of top-ranked word pairs to be target word pairs.
9. The method according to claim 2, characterized in that determining the target word pairs according to the probability of each word pair and the preset algorithm comprises:
looking up the correct word and the segmented word of the current word pair, respectively, in a preset lexicon, wherein the preset lexicon stores the correct correspondences between correct words and erroneous words;
if the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, determining that the current word pair is a target word pair;
if the erroneous word found in the preset lexicon using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, determining that the current word pair is not a target word pair;
if only the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, or only the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, querying the user and determining whether the current word pair is a target word pair according to the user's instruction.
10. A text correction device, characterized in that the device comprises:
a text acquisition module, configured to obtain text data to be corrected;
a correct word acquisition module, configured to obtain a correct word, the correct word being used to replace the erroneous words corresponding to the correct word in the text data;
a replacement module, configured to find and replace the erroneous words in the text data according to the correct word.
11. The device according to claim 10, characterized in that the replacement module comprises:
a word segmentation submodule, configured to segment the text data into a plurality of segmented words;
a word pair generation submodule, configured to form a word pair from the correct word and each segmented word;
a similarity extraction submodule, configured to extract the similarity between the correct word and the segmented word of each word pair, the similarity comprising font similarity, semantic similarity, and acoustic similarity;
a probability obtaining submodule, configured to obtain, according to the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
a target word pair determination submodule, configured to determine the target word pairs according to the probability of each word pair and a preset algorithm;
a replacement submodule, configured to replace, in the text data, the segmented word of each target word pair with the correct word.
12. The device according to claim 11, characterized in that the replacement module further comprises:
a single-character combination submodule, configured to combine two adjacent single characters obtained after word segmentation into one segmented word.
13. The device according to claim 11, characterized in that, when extracting the font similarity between the correct word and the segmented word of each word pair, the similarity extraction submodule is configured to:
if the correct word and the segmented word of the current word pair contain the same number of characters, convert each character of the correct word and of the segmented word into its four-corner code, and take as the font similarity the average, over corresponding characters of the correct word and the segmented word, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
if the correct word and the segmented word of the current word pair contain different numbers of characters, take the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm, as the font similarity.
14. The device according to claim 11, characterized in that, when extracting the semantic similarity between the correct word and the segmented word of each word pair, the similarity extraction submodule is configured to:
vectorize the correct word and the segmented word of the current word pair, respectively, to obtain word vectors; and take the distance between the word vector of the correct word and the word vector of the segmented word as the semantic similarity.
15. The device according to claim 11, characterized in that, when extracting the acoustic similarity between the correct word and the segmented word of each word pair, the similarity extraction submodule is configured to:
determine the minimum edit distance path of the correct word and the segmented word of the current word pair in a pinyin character conversion distance table; obtain the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters on the minimum edit distance path; and obtain the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, taking the acoustic distance as the acoustic similarity.
16. The device according to claim 11, characterized in that the probability obtaining submodule is configured to:
compare the probability of each word pair with a preset threshold; and determine the word pairs whose probability is greater than the preset threshold to be target word pairs.
17. The device according to claim 11, characterized in that the probability obtaining submodule is configured to:
sort the word pairs in descending order of probability; and determine a preset number of top-ranked word pairs to be target word pairs.
18. The device according to claim 11, characterized in that the probability obtaining submodule is configured to:
look up the correct word and the segmented word of the current word pair, respectively, in a preset lexicon, wherein the preset lexicon stores the correct correspondences between correct words and erroneous words;
if the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, determine that the current word pair is a target word pair;
if the erroneous word found in the preset lexicon using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, determine that the current word pair is not a target word pair;
if only the erroneous word found in the preset lexicon using the correct word of the current word pair is identical to the segmented word of the current word pair, or only the correct word found in the preset lexicon using the segmented word of the current word pair as an erroneous word is identical to the correct word of the current word pair, query the user and determine whether the current word pair is a target word pair according to the user's instruction.
CN201610573610.XA 2016-07-20 2016-07-20 A kind of text modification method and device Pending CN106250364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610573610.XA CN106250364A (en) 2016-07-20 2016-07-20 A kind of text modification method and device

Publications (1)

Publication Number Publication Date
CN106250364A true CN106250364A (en) 2016-12-21

Family

ID=57613760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610573610.XA Pending CN106250364A (en) 2016-07-20 2016-07-20 A kind of text modification method and device

Country Status (1)

Country Link
CN (1) CN106250364A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874241A (en) * 2016-12-23 2017-06-20 《中国医药科学》杂志社有限公司 A kind of intelligent manuscript editing system
CN107168941A (en) * 2017-05-12 2017-09-15 掌阅科技股份有限公司 Content of text modification method, electronic equipment, computer-readable storage medium
CN107291698A (en) * 2017-06-30 2017-10-24 广东欧珀移动通信有限公司 Information revision method, device, storage medium and electronic equipment
CN107633250A (en) * 2017-09-11 2018-01-26 畅捷通信息技术股份有限公司 A kind of Text region error correction method, error correction system and computer installation
CN108804414A (en) * 2018-05-04 2018-11-13 科沃斯商用机器人有限公司 Text modification method, device, smart machine and readable storage medium storing program for executing
CN108874174A (en) * 2018-05-29 2018-11-23 腾讯科技(深圳)有限公司 A kind of text error correction method, device and relevant device
CN109085932A (en) * 2018-08-17 2018-12-25 科大讯飞股份有限公司 A kind of candidate entry method of adjustment, device, equipment and readable storage medium storing program for executing
CN109308295A (en) * 2018-09-26 2019-02-05 南京邮电大学 A kind of privacy exposure method of real-time of data-oriented publication
CN109657115A (en) * 2018-10-18 2019-04-19 平安科技(深圳)有限公司 Crawl data self-repair method, device, equipment and medium
CN109814734A (en) * 2019-01-15 2019-05-28 上海趣虫科技有限公司 A kind of method and processing terminal of the input of the amendment Chinese phonetic alphabet
CN110737757A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN110851484A (en) * 2019-11-13 2020-02-28 北京香侬慧语科技有限责任公司 Method and device for obtaining multi-index question answers
WO2020052060A1 (en) * 2018-09-14 2020-03-19 北京字节跳动网络技术有限公司 Method and apparatus for generating correction statement
CN111191441A (en) * 2020-01-06 2020-05-22 广东博智林机器人有限公司 Text error correction method, device and storage medium
CN111291552A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and system for correcting text content
US10929710B2 (en) 2019-05-21 2021-02-23 Advanced New Technologies Co., Ltd. Methods and devices for quantifying text similarity
CN112396049A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Text error correction method and device, computer equipment and storage medium
CN112530405A (en) * 2019-09-18 2021-03-19 北京声智科技有限公司 End-to-end speech synthesis error correction method, system and device
WO2021129410A1 (en) * 2019-12-23 2021-07-01 华为技术有限公司 Method and device for text processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070033037A1 (en) * 2005-08-05 2007-02-08 Microsoft Corporation Redictation of misrecognized words using a list of alternatives
CN103207769A (en) * 2012-01-16 2013-07-17 联想(北京)有限公司 Method and user equipment for voice amending
CN103399907A (en) * 2013-07-31 2013-11-20 深圳市华傲数据技术有限公司 Method and device for calculating similarity of Chinese character strings on the basis of edit distance
CN103903618A (en) * 2012-12-28 2014-07-02 联想(北京)有限公司 Voice input method and electronic device
CN105244029A (en) * 2015-08-28 2016-01-13 科大讯飞股份有限公司 Voice recognition post-processing method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070033037A1 (en) * 2005-08-05 2007-02-08 Microsoft Corporation Redictation of misrecognized words using a list of alternatives
CN103207769A (en) * 2012-01-16 2013-07-17 联想(北京)有限公司 Method and user equipment for voice amending
CN103903618A (en) * 2012-12-28 2014-07-02 联想(北京)有限公司 Voice input method and electronic device
CN103399907A (en) * 2013-07-31 2013-11-20 深圳市华傲数据技术有限公司 Method and device for calculating similarity of Chinese character strings on the basis of edit distance
CN105244029A (en) * 2015-08-28 2016-01-13 科大讯飞股份有限公司 Voice recognition post-processing method and system

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874241A (en) * 2016-12-23 2017-06-20 《中国医药科学》杂志社有限公司 A kind of intelligent manuscript editing system
CN107168941A (en) * 2017-05-12 2017-09-15 掌阅科技股份有限公司 Content of text modification method, electronic equipment, computer-readable storage medium
CN107291698A (en) * 2017-06-30 2017-10-24 广东欧珀移动通信有限公司 Information revision method, device, storage medium and electronic equipment
CN107633250A (en) * 2017-09-11 2018-01-26 畅捷通信息技术股份有限公司 A kind of Text region error correction method, error correction system and computer installation
CN107633250B (en) * 2017-09-11 2023-04-18 畅捷通信息技术股份有限公司 Character recognition error correction method, error correction system and computer device
CN108804414A (en) * 2018-05-04 2018-11-13 科沃斯商用机器人有限公司 Text modification method, device, smart machine and readable storage medium storing program for executing
CN108874174A (en) * 2018-05-29 2018-11-23 腾讯科技(深圳)有限公司 A kind of text error correction method, device and relevant device
CN110737757A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN110737757B (en) * 2018-07-03 2022-07-05 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN109085932A (en) * 2018-08-17 2018-12-25 科大讯飞股份有限公司 A kind of candidate entry method of adjustment, device, equipment and readable storage medium storing program for executing
WO2020052060A1 (en) * 2018-09-14 2020-03-19 北京字节跳动网络技术有限公司 Method and apparatus for generating correction statement
US11531814B2 (en) 2018-09-14 2022-12-20 Beijing Bytedance Network Technology Co., Ltd. Method and device for generating modified statement
CN109308295A (en) * 2018-09-26 2019-02-05 南京邮电大学 A kind of privacy exposure method of real-time of data-oriented publication
CN109657115B (en) * 2018-10-18 2023-04-14 平安科技(深圳)有限公司 Crawling data self-repairing method, device, equipment and medium
CN109657115A (en) * 2018-10-18 2019-04-19 平安科技(深圳)有限公司 Crawl data self-repair method, device, equipment and medium
CN109814734B (en) * 2019-01-15 2022-04-15 上海趣虫科技有限公司 Method for correcting Chinese pinyin input and processing terminal
CN109814734A (en) * 2019-01-15 2019-05-28 上海趣虫科技有限公司 A kind of method and processing terminal of the input of the amendment Chinese phonetic alphabet
US10929710B2 (en) 2019-05-21 2021-02-23 Advanced New Technologies Co., Ltd. Methods and devices for quantifying text similarity
CN113723466A (en) * 2019-05-21 2021-11-30 创新先进技术有限公司 Text similarity quantification method, equipment and system
US11210553B2 (en) 2019-05-21 2021-12-28 Advanced New Technologies Co., Ltd. Methods and devices for quantifying text similarity
CN113723466B (en) * 2019-05-21 2024-03-08 创新先进技术有限公司 Text similarity quantification method, device and system
CN112530405A (en) * 2019-09-18 2021-03-19 北京声智科技有限公司 End-to-end speech synthesis error correction method, system and device
CN110851484A (en) * 2019-11-13 2020-02-28 北京香侬慧语科技有限责任公司 Method and device for obtaining multi-index question answers
WO2021129410A1 (en) * 2019-12-23 2021-07-01 华为技术有限公司 Method and device for text processing
CN111191441A (en) * 2020-01-06 2020-05-22 广东博智林机器人有限公司 Text error correction method, device and storage medium
CN111291552A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and system for correcting text content
CN112396049A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Text error correction method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106250364A (en) A kind of text modification method and device
EP3660733B1 (en) Method and system for information extraction from document images using conversational interface and database querying
US10699696B2 (en) Method and apparatus for correcting speech recognition error based on artificial intelligence, and storage medium
CN102982021B (en) For eliminating the method for the ambiguity of the multiple pronunciations in language conversion
CN102156551B (en) Method and system for correcting error of word input
CN106570180B (en) Voice search method and device based on artificial intelligence
CN111625635A (en) Question-answer processing method, language model training method, device, equipment and storage medium
CN107832229A (en) A kind of system testing case automatic generating method based on NLP
EP3846069A1 (en) Pre-training method for sentiment analysis model, and electronic device
CN102393850B (en) A kind of Chinese character pattern cognition similarity determines method
US20230206661A1 (en) Device and method for automatically generating domain-specific image caption by using semantic ontology
CN111160013B (en) Text error correction method and device
CN105808197A (en) Information processing method and electronic device
CN105190645A (en) Leveraging previous instances of handwriting for handwriting beautification and other applications
CN109977398A (en) A kind of speech recognition text error correction method of specific area
CN107526721B (en) Ambiguity elimination method and device for comment vocabularies of e-commerce products
CN113157852A (en) Voice processing method, system, electronic equipment and storage medium
CN112101032A (en) Named entity identification and error correction method based on self-distillation
CN104572632B (en) A kind of method in the translation direction for determining the vocabulary with proper name translation
CN104281275A (en) Method and device for inputting English
CN105243053A (en) Method and apparatus for extracting key sentence of document
Li et al. Wspeller: Robust word segmentation for enhancing chinese spelling check
WO2023225335A1 (en) Performing computer vision tasks by generating sequences of tokens
CN110929514A (en) Text proofreading method and device, computer readable storage medium and electronic equipment
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161221