CN106250364A - Text correction method and device - Google Patents
- Publication number
- CN106250364A (application CN201610573610.XA)
- Authority
- CN
- China
- Prior art keywords
- word
- participle
- correct
- centering
- pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Embodiments of the invention provide a text correction method and device. The method includes: obtaining text data to be corrected; obtaining a correct word, which is used to replace the erroneous words in the text data that correspond to it; and finding and replacing those erroneous words in the text data according to the correct word. When errors are found in a text, the user does not need to supply any erroneous word. The user only inputs the correct word, and the system automatically searches for each corresponding erroneous word. For example, the user only needs to input the correct word "dark reddish purple", without indicating whether the corresponding erroneous word is "dark reddish purple" or "fall", and each corresponding erroneous word is found automatically. Because the user only provides the correct word and does not have to point out the erroneous words one by one, correction efficiency is greatly improved. Erroneous words that a manual search might miss are also no longer overlooked, which improves correction accuracy.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a text correction method and device.
Background
Traditionally, people input text by typing. As technology has developed, new ways of inputting (or generating) text have appeared, such as converting speech into text with speech recognition, or converting the text in a picture into text with OCR. Both traditional typing and these newer input methods face the same problem: the continual emergence of new words (such as Internet slang) strains the original dictionary of the input or recognition system. The many homophones, synonyms and similar words that these new words give rise to seriously reduce input accuracy, so the resulting text often contains erroneous words. For example, when a user speaks the Internet slang word "dark reddish purple" (meaning "so"), it may be misrecognized as "dark reddish purple", "fall purple" or "fall son" when it is converted into text.
When an erroneous word is found during checking, the usual prior-art approach is for the user to move the cursor to the erroneous word and re-enter the correct word in its place, or to have software search for a given erroneous word throughout the text and replace it automatically. In the course of making the present invention, however, the inventor found that these approaches are inefficient because the user must point out the erroneous words one by one. Take the word "dark reddish purple" mentioned above as an example. When the user finds that it was misrecognized as "dark reddish purple", a full-text search and replace is needed. When the user then finds that it was also misrecognized as "dark reddish purple", another full-text search and replace is needed, and when the user finds that it was misrecognized as "fall son", yet another is needed. In other words, the user may need at least three full-text search-and-replace passes to correct the various erroneous forms of "dark reddish purple". In addition, because the erroneous words must be identified manually, the accuracy of the prior art is low: the text may contain other erroneous forms of "dark reddish purple" that the user did not notice while checking, and these are missed.
Summary of the invention
The present invention provides a text correction method and device to improve the efficiency and accuracy of text correction.
According to a first aspect of the embodiments of the present invention, a text correction method is provided, the method including:
obtaining text data to be corrected;
obtaining a correct word, the correct word being used to replace the erroneous words in the text data that correspond to the correct word;
finding and replacing the erroneous words in the text data according to the correct word.
Optionally, finding and replacing the erroneous words in the text data according to the correct word includes:
segmenting the text data into multiple segmented words;
forming a word pair from the correct word and each segmented word;
extracting the similarity between the correct word and the segmented word in each word pair, the similarity including glyph similarity, semantic similarity and acoustic similarity;
obtaining, from the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
determining the target word pairs according to the probability of each word pair and a preset algorithm;
replacing the segmented word of each target word pair in the text data with the correct word.
Optionally, after the text data is segmented and before the correct word is paired with each segmented word, the method further includes:
combining each two adjacent single characters obtained after segmentation into one segmented word.
Optionally, extracting the glyph similarity between the correct word and the segmented word in each word pair includes:
if the correct word and the segmented word in the current word pair have the same number of characters, converting each character of the correct word and of the segmented word into its four-corner code, and taking as the glyph similarity the average, over corresponding characters, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
if the correct word and the segmented word in the current word pair have different numbers of characters, taking as the glyph similarity the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm.
Optionally, extracting the semantic similarity between the correct word and the segmented word in each word pair includes:
vectorizing the correct word and the segmented word in the current word pair to obtain their word vectors;
taking the distance between the word vectors of the correct word and the segmented word as the semantic similarity.
Optionally, extracting the acoustic similarity between the correct word and the segmented word in each word pair includes:
determining the minimum edit distance path between the correct word and the segmented word of the current word pair in a pinyin character conversion table;
obtaining the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters along the minimum edit distance path;
obtaining the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, and taking the acoustic distance as the acoustic similarity.
Optionally, determining the target word pairs according to the probability of each word pair and the preset algorithm includes:
comparing the probability of each word pair with a preset threshold;
determining the word pairs whose probability is greater than the preset threshold to be target word pairs.
Optionally, determining the target word pairs according to the probability of each word pair and the preset algorithm includes:
sorting the word pairs by probability in descending order;
determining a preset number of top-ranked word pairs to be target word pairs.
Optionally, determining the target word pairs according to the probability of each word pair and the preset algorithm includes:
looking up the correct word and the segmented word of the current word pair separately in a preset vocabulary, the preset vocabulary storing known correspondences between correct words and erroneous words;
if an erroneous word found in the preset vocabulary using the correct word of the current word pair is identical to the segmented word of the current word pair, and a correct word found in the preset vocabulary using the segmented word of the current word pair as the erroneous word is identical to the correct word of the current word pair, determining that the current word pair is a target word pair;
if the erroneous words found in the preset vocabulary using the correct word of the current word pair differ from the segmented word of the current word pair, and the correct words found in the preset vocabulary using the segmented word of the current word pair as the erroneous word also differ from the correct word of the current word pair, determining that the current word pair is not a target word pair;
if only one of the two lookups matches, that is, only the erroneous word found using the correct word is identical to the segmented word of the current word pair, or only the correct word found using the segmented word as the erroneous word is identical to the correct word of the current word pair, asking the user, and determining whether the current word pair is a target word pair according to the user's instruction.
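The three optional selection strategies above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the placeholder words ("XY", "XZ" and so on), the probabilities and the two lookup dictionaries are all assumptions introduced here.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (correct word, candidate word taken from the text)


def select_by_threshold(probs: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Keep every word pair whose probability exceeds the preset threshold."""
    return [pair for pair, p in probs.items() if p > threshold]


def select_top_n(probs: Dict[Pair, float], n: int) -> List[Pair]:
    """Sort word pairs by probability in descending order and keep the first n."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n]]


def check_with_vocabulary(
    pair: Pair,
    correct_to_wrong: Dict[str, Set[str]],
    wrong_to_correct: Dict[str, Set[str]],
) -> Optional[bool]:
    """Return True or False when both lookups agree; return None when only
    one lookup matches, meaning the user should be asked."""
    correct, candidate = pair
    forward = candidate in correct_to_wrong.get(correct, set())
    backward = correct in wrong_to_correct.get(candidate, set())
    if forward and backward:
        return True
    if not forward and not backward:
        return False
    return None


# Hypothetical probabilities and vocabulary entries, for illustration only.
probs = {("XY", "XZ"): 0.92, ("XY", "Q"): 0.10, ("XY", "WY"): 0.65}
correct_to_wrong = {"XY": {"XZ", "WY"}}
wrong_to_correct = {"XZ": {"XY"}}

above_threshold = select_by_threshold(probs, 0.5)
best_pair = select_top_n(probs, 1)
```

With this data, the pair ("XY", "WY") matches only the forward lookup, so `check_with_vocabulary` returns `None` and the decision would be left to the user.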
According to a second aspect of the embodiments of the present invention, a text correction device is provided, the device including:
a text acquisition module, configured to obtain text data to be corrected;
a correct word acquisition module, configured to obtain a correct word, the correct word being used to replace the erroneous words in the text data that correspond to the correct word;
a replacement module, configured to find and replace the erroneous words in the text data according to the correct word.
Optionally, the replacement module includes:
a segmentation submodule, configured to segment the text data into multiple segmented words;
a word pair generation submodule, configured to form a word pair from the correct word and each segmented word;
a similarity extraction submodule, configured to extract the similarity between the correct word and the segmented word in each word pair, the similarity including glyph similarity, semantic similarity and acoustic similarity;
a probability obtaining submodule, configured to obtain, from the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
a target word pair determining submodule, configured to determine the target word pairs according to the probability of each word pair and a preset algorithm;
a replacement submodule, configured to replace the segmented word of each target word pair in the text data with the correct word.
Optionally, the replacement module further includes:
a single-character combination submodule, configured to combine each two adjacent single characters obtained after segmentation into one segmented word.
Optionally, when extracting the glyph similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
if the correct word and the segmented word in the current word pair have the same number of characters, convert each character of the correct word and of the segmented word into its four-corner code, and take as the glyph similarity the average, over corresponding characters, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
if the correct word and the segmented word in the current word pair have different numbers of characters, take as the glyph similarity the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm.
Optionally, when extracting the semantic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
vectorize the correct word and the segmented word in the current word pair to obtain their word vectors, and take the distance between the word vectors of the correct word and the segmented word as the semantic similarity.
Optionally, when extracting the acoustic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
determine the minimum edit distance path between the correct word and the segmented word of the current word pair in a pinyin character conversion table; obtain the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters along the minimum edit distance path; and obtain the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, taking the acoustic distance as the acoustic similarity.
Optionally, the probability obtaining submodule is configured to:
compare the probability of each word pair with a preset threshold, and determine the word pairs whose probability is greater than the preset threshold to be target word pairs.
Optionally, the probability obtaining submodule is configured to:
sort the word pairs by probability in descending order, and determine a preset number of top-ranked word pairs to be target word pairs.
Optionally, the probability obtaining submodule is configured to:
look up the correct word and the segmented word of the current word pair separately in a preset vocabulary, the preset vocabulary storing known correspondences between correct words and erroneous words;
if an erroneous word found in the preset vocabulary using the correct word of the current word pair is identical to the segmented word of the current word pair, and a correct word found in the preset vocabulary using the segmented word of the current word pair as the erroneous word is identical to the correct word of the current word pair, determine that the current word pair is a target word pair;
if the erroneous words found in the preset vocabulary using the correct word of the current word pair differ from the segmented word of the current word pair, and the correct words found in the preset vocabulary using the segmented word of the current word pair as the erroneous word also differ from the correct word of the current word pair, determine that the current word pair is not a target word pair;
if only one of the two lookups matches, that is, only the erroneous word found using the correct word is identical to the segmented word of the current word pair, or only the correct word found using the segmented word as the erroneous word is identical to the correct word of the current word pair, ask the user, and determine whether the current word pair is a target word pair according to the user's instruction.
The technical solutions provided by the embodiments of the present invention may have the following beneficial effects:
In the present invention, when errors are found in a text, the user does not need to supply any erroneous word. The user only inputs the correct word, and the system automatically searches for each corresponding erroneous word. For example, when the user finds that the word "dark reddish purple" has been mistakenly written as "dark reddish purple", "fall" and so on, the user only needs to input the correct word, i.e. "dark reddish purple". The user does not need to indicate whether the corresponding erroneous word is "dark reddish purple" or "fall", let alone the position of each erroneous word. The system automatically finds each corresponding erroneous word according to the correct word and replaces it with the correct word, which completes the text correction. Because the user only provides the correct word and does not have to point out the erroneous words one by one, correction efficiency is greatly improved. Erroneous words that a manual search might miss are also no longer overlooked, which improves correction accuracy.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present invention.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a flow chart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 2 is a flow chart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 3 is a flow chart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 4 is a flow chart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 5 is a schematic diagram of a minimum edit distance path according to an exemplary embodiment of the present invention;
Fig. 6 is a flow chart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 7 is a flow chart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 8 is a flow chart of a text correction method according to an exemplary embodiment of the present invention;
Fig. 9 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention;
Fig. 10 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention;
Fig. 11 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention.
Detailed description of the invention
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.
Fig. 1 is a flow chart of a text correction method according to an exemplary embodiment of the present invention. The method can be used on mobile terminals such as mobile phones and on devices such as PCs and servers.
Referring to Fig. 1, the method may include:
Step S101: obtain text data to be corrected.
The text data to be corrected can be determined according to the user's needs. This embodiment does not limit its source: it may be, for example, text entered manually by the user, text data obtained by speech recognition, or text data obtained by OCR (Optical Character Recognition).
Step S102: obtain a correct word, which is used to replace the erroneous words in the text data that correspond to it.
In this embodiment, when a text error is found, the user only needs to input the correct word and does not need to indicate which erroneous words correspond to it or where they are.
Step S103: find and replace the erroneous words in the text data according to the correct word.
This embodiment does not limit how the erroneous words are found and replaced according to the correct word. An example is described below with reference to Fig. 2.
Referring to Fig. 2, in this embodiment or some other embodiments of the present invention, finding and replacing the erroneous words in the text data according to the correct word, i.e. step S103, may include:
Step S201: segment the text data into multiple segmented words.
The segmentation method may be, for example, one based on conditional random fields; this embodiment does not limit it.
For example, if the text data to be corrected is "I thought not", the segmentation result is "I thought not", in which "not having" is the erroneous word and needs to be corrected to "U.S.".
In addition, to avoid missing words during segmentation, this embodiment may also combine adjacent single characters obtained after segmentation into a segmented word, that is, join each single character with the single character that follows it. For example, the segmentation result above contains the consecutive single characters "I", "think" and "go"; combining them gives the segmented words "I think" and "think".
Step S202: form a word pair from the correct word and each segmented word.
In the example above, the correct word "U.S." forms the following word pairs with the segmented words obtained: "U.S.-I", "U.S.-think", "U.S.-go", "U.S.-do not had", "U.S.-I think", "U.S.-think".
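Combining adjacent single characters and forming the word pairs can be sketched as below. This is only an illustration under stated assumptions: the segmenter output is represented by placeholder tokens ("A", "B", "C", "DE"), the correct word by "XY", and the function names are hypothetical rather than taken from the patent.

```python
from typing import List, Tuple


def combine_adjacent_singles(tokens: List[str]) -> List[str]:
    """Join each single-character token with the single-character token
    that immediately follows it."""
    combined = []
    for left, right in zip(tokens, tokens[1:]):
        if len(left) == 1 and len(right) == 1:
            combined.append(left + right)
    return combined


def build_word_pairs(correct: str, tokens: List[str]) -> List[Tuple[str, str]]:
    """Pair the correct word with every segmented word, including the
    words created by combining adjacent single characters."""
    candidates = tokens + combine_adjacent_singles(tokens)
    return [(correct, candidate) for candidate in candidates]


# Placeholder segmentation result: three single characters and one two-character word.
tokens = ["A", "B", "C", "DE"]
pairs = build_word_pairs("XY", tokens)
```

As in the example above, four segmented words plus two combined words give six word pairs.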
Step S203: extract the similarity between the correct word and the segmented word in each word pair, the similarity including glyph similarity, semantic similarity and acoustic similarity.
This embodiment does not limit how these three similarities are extracted. Those skilled in the art may design the extraction according to different needs and scenarios, and any such design that can be used here falls within the spirit and protection scope of the present invention.
Step S204: obtain, from the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word.
The decision model can be built in advance. For example, a large amount of text data can be collected beforehand, the erroneous words in it found manually, and the correct word for each erroneous word provided. After each correct word is paired with the segmented words obtained by segmenting the text data, every word pair is labeled manually as a target word pair or not, that is, as a real "correct word-erroneous word" pair or not. For labeling, 0 and 1 can be used as the label feature: a word pair that is a real "correct word-erroneous word" pair is labeled 1, and any other pair is labeled 0. Next, the similarities of the two words in each word pair are extracted, namely the glyph similarity, semantic similarity and acoustic similarity. Finally, the similarities and the label features are used as training data to train the decision model. During training, the similarities of each word pair are the model input and the label feature of each word pair is the model output; the model parameters are updated, and the decision model is obtained when the updates finish.
When the decision model is used, the similarities of the two words in each word pair are given to the model as input, and the model outputs the probability that the word pair is a real "correct word-erroneous word" pair.
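The text does not say which kind of model the decision model is. Purely as an assumption, the sketch below uses logistic regression trained by stochastic gradient descent, taking three similarity features as input and the 0/1 label as output. The training data is invented for illustration, and every feature is assumed to be scaled so that larger values mean "more similar", which would require converting the distance-based measures first.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that a word pair is a real correct word-erroneous word pair."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(
    features: List[Sequence[float]],
    labels: List[int],
    epochs: int = 2000,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Invented training data: (glyph, semantic, acoustic) similarity per word pair.
features = [(0.9, 0.8, 0.9), (0.7, 0.9, 0.8), (0.1, 0.2, 0.1), (0.2, 0.1, 0.3)]
labels = [1, 1, 0, 0]  # 1 = real correct word-erroneous word pair, 0 = not

weights, bias = train(features, labels)
p_similar = predict(weights, bias, (0.85, 0.8, 0.9))
p_dissimilar = predict(weights, bias, (0.15, 0.1, 0.2))
```

Any classifier that outputs a probability could take the place of logistic regression here; the choice is only for brevity.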
Step S205: determine the target word pairs according to the probability of each word pair and a preset algorithm.
Once the probability that each word pair is a target word pair has been obtained, the real target word pairs can be selected according to the preset algorithm. This embodiment does not limit the content of the preset algorithm. Those skilled in the art may design it according to different needs and scenarios, and any such design that can be used here falls within the spirit and protection scope of the present invention.
Step S206: replace the segmented word of each target word pair in the text data with the correct word.
For example, if the correct word is "U.S." and the target word pair is "U.S.-do not had", then "do not had" can be replaced with "U.S." throughout the text data, which completes the correction.
In this embodiment, when errors are found in a text, the user does not need to supply any erroneous word. The user only inputs the correct word, and the system automatically searches for each corresponding erroneous word. For example, when the user finds that the word "dark reddish purple" has been mistakenly written as "dark reddish purple", "fall" and so on, the user only needs to input the correct word, i.e. "dark reddish purple". The user does not need to indicate whether the corresponding erroneous word is "dark reddish purple" or "fall", let alone the position of each erroneous word. The system automatically finds each corresponding erroneous word according to the correct word and replaces it with the correct word, which completes the text correction. Because the user only provides the correct word and does not have to point out the erroneous words one by one, correction efficiency is greatly improved. Erroneous words that a manual search might miss are also no longer overlooked, which improves correction accuracy.
How the similarity between the correct word and the segmented word in each word pair is extracted, i.e. step S203, is further illustrated below.
In this embodiment or some other embodiments of the present invention, extracting the glyph similarity between the correct word and the segmented word in each word pair may specifically include:
If the correct word and the segmented word in the current word pair have the same number of characters, each character of the correct word and of the segmented word is converted into its four-corner code, and the glyph similarity is the average, over corresponding characters, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits.
The calculation is shown in formula (1):
$$T = \frac{1}{n}\sum_{i=1}^{n}\frac{l_i}{L_i} \qquad (1)$$
where $T$ is the glyph similarity of the two words in the word pair, $n$ is the number of characters in each word of the word pair, $l_i$ is the number of identical four-corner code digits for the $i$-th characters of the two words, and $L_i$ is the total number of four-corner code digits for the $i$-th character (usually 4).
For example, the glyph similarity of the word pair "to going-thinking" is calculated as follows:
The four-corner code of "to" is 2722.
The four-corner code of "think" is 4633.
For the 1st characters, "to" and "think", the total number of four-corner code digits is 4 but no digits are identical, so $l_1/L_1 = 0/4 = 0$. The 2nd characters, "go" and "go", are identical, so $l_2/L_2 = 4/4 = 1$. Formula (1) therefore gives a glyph similarity of $(0 + 1)/2 = 0.5$ for this word pair.
If the correct word and the segmented word in the current word pair have different numbers of characters, the minimum edit distance between the correct word and the segmented word, obtained with a dynamic programming algorithm, can be used as the glyph similarity. Existing techniques can be used for the implementation, so the details are omitted here.
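Both branches of the glyph similarity can be sketched as below, under some assumptions. Four-corner codes are passed in as strings because no lookup table is given in the text; "identical code digits" is interpreted as a position-by-position comparison; and the code "0000" stands in for the shared second character of the example, whose real code is not stated. The function names are hypothetical.

```python
from typing import List


def four_corner_similarity(codes_a: List[str], codes_b: List[str]) -> float:
    """Formula (1): average over characters of
    (identical code digits at the same position) / (total code digits)."""
    ratios = []
    for code_a, code_b in zip(codes_a, codes_b):
        identical = sum(1 for da, db in zip(code_a, code_b) if da == db)
        ratios.append(identical / len(code_a))
    return sum(ratios) / len(ratios)


def min_edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                   # delete ch_a
                cur[j - 1] + 1,                # insert ch_b
                prev[j - 1] + (ch_a != ch_b),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


# First characters coded 2722 and 4633 (from the example); the second
# character is shared, so a placeholder code is used for it in both words.
similarity = four_corner_similarity(["2722", "0000"], ["4633", "0000"])
distance = min_edit_distance("kitten", "sitting")
```

Here `similarity` reproduces the 0.5 of the worked example. Note that the edit distance branch yields a distance, where smaller means more alike, so a consumer of this feature would need to account for the different scale.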
Referring to Fig. 3, in this embodiment or some other embodiments of the present invention, extracting the semantic similarity between the correct word and the segmented word in each word pair may specifically include:
Step S301: vectorize the correct word and the segmented word in the current word pair to obtain their word vectors.
Step S302: take the distance between the word vectors of the correct word and the segmented word as the semantic similarity.
As an example, a method such as Word2Vec can be used to vectorize each word in the word pair. Once the word vector of each word is obtained, the distance between the two word vectors can be the cosine distance, the Euclidean distance, and so on. The calculation is the same as in the prior art and is not described in detail here.
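The two distances named above can be computed as in the following sketch. The word vectors would come from a trained embedding model such as Word2Vec; here they are replaced by short made-up vectors, and the function names are illustrative.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Made-up 2-dimensional "word vectors", for illustration only.
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
straight_line = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Cosine distance ignores vector length and compares direction only, which is why the two parallel vectors are at distance close to zero even though their lengths differ.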
Referring to Fig. 4, in this embodiment or some other embodiments of the present invention, extracting the acoustic similarity between the correct word and the segmented word in each word pair may specifically include:
Step S401: determine the minimum edit distance path between the correct word and the segmented word of the current word pair in a pinyin character conversion table.
Step S402: obtain the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters along the minimum edit distance path.
Step S403: obtain the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, and take the acoustic distance as the acoustic similarity.
Described acoustics similarity refers to that two words, in enunciative similarity, use the acoustics distance of two words to represent, two
The acoustics distance of word is the nearest, then acoustics similarity is the highest.Distance can be changed by the pinyin character of two words to calculate, i.e. root
Come according to the conversion distance of two pinyin character in pinyin character conversion distance table (or perhaps pinyin character conversion confusion matrix)
Calculate.Table 1 is part pinyin character conversion confusion matrix, and wherein, the first row and first is classified as the pinyin character of mutually conversion, two
Character intersection is conversion distance.
Table 1
| a | ai | an | ang | ao | b | c | ch | d | e | ei | en | eng |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | ‐ | 0.67 | 0.65 | 0.72 | 0.6 | 1 | 1 | 1 | 1 | 0.6 | 0.893 | 0.88 | 0.927 |
ai | 0.67 | ‐ | 0.7 | 0.95 | 0.928 | 1 | 1 | 1 | 1 | 0.914 | 0.763 | 0.866 | 0.928 |
an | 0.654 | 0.699 | ‐ | 0.6 | 0.938 | 1 | 1 | 1 | 1 | 0.954 | 0.944 | 0.67 | 0.832 |
ang | 0.716 | 0.95 | 0.6 | ‐ | 0.793 | 1 | 1 | 1 | 1 | 0.972 | 0.971 | 0.877 | 0.737 |
The acoustic distance of two words is calculated from their pinyin character conversion distance; the specific calculation can be as shown in formula (2):
Here, D_acou(a1, a2) is the acoustic distance of the two words and D_edit(a1, a2) is the pinyin character conversion distance of the two words. D_edit(a1, a2) can be obtained by using dynamic programming to find the minimum edit distance path of the two words in the pinyin character conversion distance table and then fusing the pinyin character conversion distances of the pinyin characters on this path. The fusion method can be, for example, averaging, simple accumulation, or weighted accumulation.
For example, the pinyin character conversion distance of the two words "report a case to the security authorities" and "standby dish" is calculated as follows:
1) Convert each word into pinyin:
"Report a case to the security authorities" -> bao an
"Standby dish" -> bei cai
2) Look up the pinyin character conversion confusion matrix (i.e. the pinyin character conversion distance table) to obtain the conversion distance between each pair of pinyin characters, as shown in Table 2:
Table 2
| b | ao | an |
---|---|---|---|
b | 0 | 1 | 1 |
ei | 1 | 0.976 | 0.944 |
c | 1 | 1 | 1 |
ai | 1 | 0.928 | 0.699 |
3) Use dynamic programming to calculate the pinyin character conversion distance of the two words.
In the specific calculation, dynamic programming is used to search the pinyin character conversion distance table for the minimum edit distance path; fusing the values on this path gives the pinyin character conversion distance of the two words. As shown in Fig. 5, the shaded area is the minimum edit distance path. Simply accumulating the pinyin character conversion distances on this path gives the pinyin character conversion distance of the two words, i.e. 0+0+0.976+1+0.699=2.675.
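The path search in step 3) can be sketched as below. The text does not spell out the recurrence, so this sketch assumes an alignment in which the path starts at the top-left cell, ends at the bottom-right cell, moves one step down, right, or diagonally, and fuses the values by simple accumulation. Under that assumption it reproduces the 2.675 of the worked example; the function name and the list-of-lists layout are illustrative.

```python
def min_path_cost(cost):
    """Smallest accumulated cost of a path from the top-left cell to the
    bottom-right cell, moving down, right, or diagonally at each step."""
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                best_prev = 0.0  # the path starts here with nothing accumulated
            else:
                best_prev = min(
                    acc[i - 1][j] if i > 0 else inf,                # from above
                    acc[i][j - 1] if j > 0 else inf,                # from the left
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,  # diagonal
                )
            acc[i][j] = cost[i][j] + best_prev
    return acc[-1][-1]


# Table 2: rows are b, ei, c, ai (bei cai); columns are b, ao, an (bao an).
table2 = [
    [0, 1, 1],
    [1, 0.976, 0.944],
    [1, 1, 1],
    [1, 0.928, 0.699],
]
print(round(min_path_cost(table2), 3))  # 2.675
```

Averaging or weighted accumulation, the other fusion methods mentioned in the text, would need the path itself rather than only its total; that would require storing back-pointers, which this sketch omits.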
In addition, step S205, i.e. determining the target word pairs according to the probability of each word pair and a preset algorithm, can be implemented in several ways, which are illustrated below with Figs. 6 to 8:
Referring to Fig. 6, in this embodiment or some other embodiments of the present invention, determining the target word pairs according to the probability of each word pair and the preset algorithm may include:
Step S601: compare the probability of each word pair with a preset threshold.
Step S602: determine the word pairs whose probability is greater than the preset threshold as target word pairs.
Alternatively, referring to Fig. 7, in this embodiment or some other embodiments of the present invention, determining the target word pairs according to the probability of each word pair and the preset algorithm may include:
Step S701: sort the word pairs in descending order of probability.
Step S702: determine a preset number of top-ranked word pairs as target word pairs.
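The selection rules of Fig. 6 and Fig. 7 can be sketched as two small functions. The representation of a scored word pair as a `((correct_word, segmented_word), probability)` tuple, the function names, and the sample probabilities are assumptions made for this illustration.

```python
def select_by_threshold(scored_pairs, threshold):
    """Fig. 6: keep the word pairs whose probability exceeds the threshold."""
    return [pair for pair, prob in scored_pairs if prob > threshold]


def select_top_n(scored_pairs, n):
    """Fig. 7: sort by probability, largest first, and keep the first n pairs."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n]]


# Made-up probabilities for three candidate word pairs.
scored = [(("w", "x"), 0.91), (("w", "y"), 0.35), (("w", "z"), 0.78)]
print(select_by_threshold(scored, 0.5))  # [('w', 'x'), ('w', 'z')]
print(select_top_n(scored, 1))           # [('w', 'x')]
```

A threshold lets the number of replacements vary with the text, while a fixed top-N caps it regardless of how confident the model is.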
Alternatively, referring to Fig. 8, in this embodiment or some other embodiments of the present invention, determining the target word pairs according to the probability of each word pair and the preset algorithm may include:
Step S801: look up the correct word and the segmented word of the current word pair separately in a preset vocabulary, where the preset vocabulary stores the correspondence between correct words and erroneous words.
The preset vocabulary stores error-prone correct words and their corresponding erroneous words, such as "U.S.-do not had", "U.S. State-often cross", etc. The vocabulary can be built in advance by domain experts.
Step S802: if the erroneous word found in the preset vocabulary using the correct word of the current word pair is the same as the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, the current word pair is determined to be a target word pair.
Step S803: if the erroneous word found in the preset vocabulary using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, the current word pair is determined not to be a target word pair.
Step S804: if only one of the two lookups matches, that is, only the erroneous word found using the correct word of the current word pair is the same as the segmented word of the current word pair, or only the correct word found using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, the user is asked, and whether the current word pair is a target word pair is determined according to the user's instruction. If the user confirms, the current word pair is determined to be a target word pair; if the user does not confirm, the current word pair is determined not to be a target word pair.
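The three outcomes of steps S802 to S804 can be sketched as a single decision function. The text does not describe how the preset vocabulary is stored, so this sketch assumes two lookup tables, one keyed by correct word and one keyed by erroneous word, plus an `ask_user` callback; all of these names and the sample entries are hypothetical.

```python
def is_target_pair(correct, segmented, by_correct, by_erroneous, ask_user):
    """Decide whether (correct, segmented) is a target word pair.

    by_correct:   dict mapping a correct word to a set of erroneous words
    by_erroneous: dict mapping an erroneous word to a set of correct words
    ask_user:     callable(correct, segmented) returning True on confirmation
    """
    forward_hit = segmented in by_correct.get(correct, set())
    reverse_hit = correct in by_erroneous.get(segmented, set())
    if forward_hit and reverse_hit:
        return True   # S802: both lookups agree
    if not forward_hit and not reverse_hit:
        return False  # S803: neither lookup matches
    return bool(ask_user(correct, segmented))  # S804: only one lookup matches


# Made-up vocabulary entries for illustration.
by_correct = {"good": {"goood", "gud"}}
by_erroneous = {"goood": {"good"}}
always_yes = lambda correct, segmented: True

print(is_target_pair("good", "goood", by_correct, by_erroneous, always_yes))  # True
print(is_target_pair("good", "bad", by_correct, by_erroneous, always_yes))    # False
```

In the second call the callback is never consulted, because neither lookup matches; it is only reached for a pair such as ("good", "gud"), which appears in one table but not the other.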
It should be noted that the three approaches of Figs. 6 to 8 can also be used in combination, either two at a time or all three together; this embodiment does not limit which approaches are combined or the order in which they are applied. For example, the word pairs whose probability is greater than the threshold can be filtered out first and then sorted by probability, with a preset number of top-ranked word pairs determined as target word pairs. As another example, the word pairs can first be sorted by probability and a preset number of top-ranked word pairs selected, after which the preset vocabulary is used to screen them. As yet another example, the word pairs whose probability is greater than the threshold can be filtered out first, after which the preset vocabulary is used for a second screening; and so on.
The following are device embodiments of the present invention, which may be used to carry out the method embodiments of the present invention. For details not disclosed in the device embodiments, refer to the method embodiments of the present invention.
Fig. 9 is a schematic diagram of a text correction device according to an exemplary embodiment of the present invention. The device can be used in mobile terminals such as mobile phones and in equipment such as PCs and servers.
Referring to Fig. 9, the device may include:
Text acquisition module 901, configured to obtain text data to be corrected;
Correct word acquisition module 902, configured to obtain a correct word, the correct word being used to replace the erroneous words corresponding to the correct word in the text data;
Replacement module 903, configured to find and replace the erroneous words in the text data according to the correct word.
Referring to Fig. 10, in this embodiment or some other embodiments of the present invention, the replacement module may include:
Word segmentation submodule 1001, configured to perform word segmentation on the text data so as to split the text data into a plurality of segmented words;
Word pair generation submodule 1002, configured to form a word pair from the correct word and each segmented word;
Similarity extraction submodule 1003, configured to extract the similarity between the correct word and the segmented word in each word pair, the similarity including font similarity, semantic similarity and acoustic similarity;
Probability obtaining submodule 1004, configured to obtain, according to the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
Target word pair determining submodule 1005, configured to determine the target word pairs according to the probability of each word pair and a preset algorithm;
Replacement submodule 1006, configured to use the correct word to replace the segmented words of the target word pairs in the text data.
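The flow through submodules 1002 to 1006 can be sketched end to end as below. This is only an outline under stated assumptions: the text is assumed to be already segmented, `pair_probability` is a hypothetical stand-in for similarity extraction plus the preset decision model, target pairs are chosen with a simple threshold, and the toy scoring function is made up for the example.

```python
def correct_text(tokens, correct_word, pair_probability, threshold=0.5):
    """Pair the correct word with every segmented word, score each pair,
    and replace the segmented words of the pairs scoring above the threshold."""
    corrected = []
    for token in tokens:
        is_target = (token != correct_word
                     and pair_probability(correct_word, token) > threshold)
        corrected.append(correct_word if is_target else token)
    # Tokens are joined without separators, as for unspaced Chinese text.
    return "".join(corrected)


def toy_probability(correct, token):
    """Made-up scorer: share of positions holding the same character."""
    if len(correct) != len(token):
        return 0.0
    same = sum(1 for a, b in zip(correct, token) if a == b)
    return same / len(correct)


print(correct_text(["ab", "cx", "cd"], "cd", toy_probability, threshold=0.4))
# abcdcd
```

Here "cx" is replaced because half of its characters match "cd" and 0.5 exceeds the threshold of 0.4, while "ab" is left untouched.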
Referring to Fig. 11, in this embodiment or some other embodiments of the present invention, the replacement module may further include:
Single-character combination submodule 1101, configured to combine two adjacent single characters obtained after word segmentation into one segmented word.
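The single-character combination can be sketched as follows. The text does not say whether the combined word replaces the two single characters or is added as an extra candidate, nor how runs of three or more single characters are handled; this sketch assumes every adjacent pair of single-character tokens yields one extra candidate. The function name is illustrative.

```python
def merge_adjacent_singles(tokens):
    """Return the two-character words formed by joining each pair of
    adjacent single-character tokens, in order of appearance."""
    merged = []
    for left, right in zip(tokens, tokens[1:]):
        if len(left) == 1 and len(right) == 1:
            merged.append(left + right)
    return merged


print(merge_adjacent_singles(["ab", "c", "d", "e", "fg"]))  # ['cd', 'de']
```

Joining single characters matters because a misspelled word is often unknown to the segmenter and is split into isolated characters, so it would otherwise never be paired with the correct word as a whole.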
In this embodiment or some other embodiments of the present invention, when extracting the font similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule may specifically be configured to:
If the correct word and the segmented word in the current word pair have the same number of characters, convert each character of the correct word and of the segmented word into its four-corner code, and use as the font similarity the average, over corresponding characters of the correct word and the segmented word, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
If the correct word and the segmented word in the current word pair have different numbers of characters, use as the font similarity the minimum edit distance between the correct word and the segmented word obtained with a dynamic programming algorithm.
In this embodiment or some other embodiments of the present invention, when extracting the semantic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule may specifically be configured to:
Vectorize the correct word and the segmented word of the current word pair separately to obtain their word vectors; and use the distance between the word vector of the correct word and that of the segmented word as the semantic similarity.
In this embodiment or some other embodiments of the present invention, when extracting the acoustic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule may specifically be configured to:
Determine the minimum edit distance path of the correct word and the segmented word of the current word pair in the pinyin character conversion distance table; obtain the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters on the minimum edit distance path; and obtain the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, using the acoustic distance as the acoustic similarity.
In this embodiment or some other embodiments of the present invention, the probability obtaining submodule may specifically be configured to:
Compare the probability of each word pair with a preset threshold; and determine the word pairs whose probability is greater than the preset threshold as target word pairs.
In this embodiment or some other embodiments of the present invention, the probability obtaining submodule may specifically be configured to:
Sort the word pairs in descending order of probability; and determine a preset number of top-ranked word pairs as target word pairs.
In this embodiment or some other embodiments of the present invention, the probability obtaining submodule may specifically be configured to:
Look up the correct word and the segmented word of the current word pair separately in a preset vocabulary, where the preset vocabulary stores the correspondence between correct words and erroneous words;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair is the same as the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, determine that the current word pair is a target word pair;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, determine that the current word pair is not a target word pair;
If only the erroneous word found in the preset vocabulary using the correct word of the current word pair is the same as the segmented word of the current word pair, or only the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, ask the user and determine whether the current word pair is a target word pair according to the user's instruction.
In this embodiment, when textual errors are found in a text, the user does not need to provide any erroneous word and only needs to input the correct word; the system then automatically searches for each corresponding erroneous word according to the correct word. For example, when the user finds that the word "dark reddish purple" has been wrongly written in the text as "dark reddish purple", "fall", etc., the user only needs to input the correct word, i.e. "dark reddish purple", without pointing out that the corresponding erroneous words are "dark reddish purple" or "fall", let alone the position of each erroneous word. The system automatically finds each corresponding erroneous word according to the correct word and automatically replaces the erroneous words so determined with the correct word, thereby completing the text correction. Because the user only needs to provide the correct word and does not need to point out the erroneous words one by one, correction efficiency is greatly improved; omissions of erroneous words that may occur when the user searches manually are also avoided, which improves the accuracy of the correction.
As for the device in the above embodiments, the specific manner in which each unit and module performs its operations has been described in detail in the related method embodiments and will not be elaborated here.
Those skilled in the art, after considering the specification and practicing the invention disclosed herein, will readily conceive of other embodiments of the present invention. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed by the present invention. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the appended claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
Claims (18)
1. A text modification method, characterized in that the method comprises:
Obtaining text data to be corrected;
Obtaining a correct word, the correct word being used to replace the erroneous words corresponding to the correct word in the text data;
Finding and replacing the erroneous words in the text data according to the correct word.
2. The method according to claim 1, characterized in that finding and replacing the erroneous words in the text data according to the correct word comprises:
Performing word segmentation on the text data so as to split the text data into a plurality of segmented words;
Forming a word pair from the correct word and each segmented word;
Extracting the similarity between the correct word and the segmented word in each word pair, the similarity including font similarity, semantic similarity and acoustic similarity;
Obtaining, according to the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
Determining the target word pairs according to the probability of each word pair and a preset algorithm;
Using the correct word to replace the segmented words of the target word pairs in the text data.
3. The method according to claim 2, characterized in that after performing word segmentation on the text data and before forming a word pair from the correct word and each segmented word, the method further comprises:
Combining two adjacent single characters obtained after word segmentation into one segmented word.
4. The method according to claim 2, characterized in that extracting the font similarity between the correct word and the segmented word in each word pair comprises:
If the correct word and the segmented word in the current word pair have the same number of characters, converting each character of the correct word and of the segmented word into its four-corner code, and using as the font similarity the average, over corresponding characters of the correct word and the segmented word, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
If the correct word and the segmented word in the current word pair have different numbers of characters, using as the font similarity the minimum edit distance between the correct word and the segmented word obtained with a dynamic programming algorithm.
5. The method according to claim 2, characterized in that extracting the semantic similarity between the correct word and the segmented word in each word pair comprises:
Vectorizing the correct word and the segmented word of the current word pair separately to obtain their word vectors;
Using the distance between the word vector of the correct word and that of the segmented word as the semantic similarity.
6. The method according to claim 2, characterized in that extracting the acoustic similarity between the correct word and the segmented word in each word pair comprises:
Determining the minimum edit distance path of the correct word and the segmented word of the current word pair in the pinyin character conversion distance table;
Obtaining the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters on the minimum edit distance path;
Obtaining the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, and using the acoustic distance as the acoustic similarity.
7. The method according to claim 2, characterized in that determining the target word pairs according to the probability of each word pair and the preset algorithm comprises:
Comparing the probability of each word pair with a preset threshold;
Determining the word pairs whose probability is greater than the preset threshold as target word pairs.
8. The method according to claim 2, characterized in that determining the target word pairs according to the probability of each word pair and the preset algorithm comprises:
Sorting the word pairs in descending order of probability;
Determining a preset number of top-ranked word pairs as target word pairs.
9. The method according to claim 2, characterized in that determining the target word pairs according to the probability of each word pair and the preset algorithm comprises:
Looking up the correct word and the segmented word of the current word pair separately in a preset vocabulary, where the preset vocabulary stores the correspondence between correct words and erroneous words;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair is the same as the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, determining that the current word pair is a target word pair;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, determining that the current word pair is not a target word pair;
If only the erroneous word found in the preset vocabulary using the correct word of the current word pair is the same as the segmented word of the current word pair, or only the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, asking the user and determining whether the current word pair is a target word pair according to the user's instruction.
10. A text correction device, characterized in that the device comprises:
A text acquisition module, configured to obtain text data to be corrected;
A correct word acquisition module, configured to obtain a correct word, the correct word being used to replace the erroneous words corresponding to the correct word in the text data;
A replacement module, configured to find and replace the erroneous words in the text data according to the correct word.
11. The device according to claim 10, characterized in that the replacement module comprises:
A word segmentation submodule, configured to perform word segmentation on the text data so as to split the text data into a plurality of segmented words;
A word pair generation submodule, configured to form a word pair from the correct word and each segmented word;
A similarity extraction submodule, configured to extract the similarity between the correct word and the segmented word in each word pair, the similarity including font similarity, semantic similarity and acoustic similarity;
A probability obtaining submodule, configured to obtain, according to the similarity of each word pair and a preset decision model, the probability that each word pair is a target word pair, a target word pair being a word pair whose segmented word is an erroneous word corresponding to the correct word;
A target word pair determining submodule, configured to determine the target word pairs according to the probability of each word pair and a preset algorithm;
A replacement submodule, configured to use the correct word to replace the segmented words of the target word pairs in the text data.
12. The device according to claim 11, characterized in that the replacement module further comprises:
A single-character combination submodule, configured to combine two adjacent single characters obtained after word segmentation into one segmented word.
13. The device according to claim 11, characterized in that when extracting the font similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
If the correct word and the segmented word in the current word pair have the same number of characters, convert each character of the correct word and of the segmented word into its four-corner code, and use as the font similarity the average, over corresponding characters of the correct word and the segmented word, of the ratio of the number of identical four-corner code digits to the total number of four-corner code digits;
If the correct word and the segmented word in the current word pair have different numbers of characters, use as the font similarity the minimum edit distance between the correct word and the segmented word obtained with a dynamic programming algorithm.
14. The device according to claim 11, characterized in that when extracting the semantic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
Vectorize the correct word and the segmented word of the current word pair separately to obtain their word vectors; and use the distance between the word vector of the correct word and that of the segmented word as the semantic similarity.
15. The device according to claim 11, characterized in that when extracting the acoustic similarity between the correct word and the segmented word in each word pair, the similarity extraction submodule is configured to:
Determine the minimum edit distance path of the correct word and the segmented word of the current word pair in the pinyin character conversion distance table; obtain the pinyin character conversion distance between the correct word and the segmented word from the pinyin character conversion distances of the pinyin characters on the minimum edit distance path; and obtain the acoustic distance between the correct word and the segmented word from their pinyin character conversion distance, using the acoustic distance as the acoustic similarity.
16. The device according to claim 11, characterized in that the probability obtaining submodule is configured to:
Compare the probability of each word pair with a preset threshold; and determine the word pairs whose probability is greater than the preset threshold as target word pairs.
17. The device according to claim 11, characterized in that the probability obtaining submodule is configured to:
Sort the word pairs in descending order of probability; and determine a preset number of top-ranked word pairs as target word pairs.
18. The device according to claim 11, characterized in that the probability obtaining submodule is configured to:
Look up the correct word and the segmented word of the current word pair separately in a preset vocabulary, where the preset vocabulary stores the correspondence between correct words and erroneous words;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair is the same as the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, determine that the current word pair is a target word pair;
If the erroneous word found in the preset vocabulary using the correct word of the current word pair differs from the segmented word of the current word pair, and the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word also differs from the correct word of the current word pair, determine that the current word pair is not a target word pair;
If only the erroneous word found in the preset vocabulary using the correct word of the current word pair is the same as the segmented word of the current word pair, or only the correct word found in the preset vocabulary using the segmented word of the current word pair as an erroneous word is the same as the correct word of the current word pair, ask the user and determine whether the current word pair is a target word pair according to the user's instruction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610573610.XA CN106250364A (en) | 2016-07-20 | 2016-07-20 | A kind of text modification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610573610.XA CN106250364A (en) | 2016-07-20 | 2016-07-20 | A kind of text modification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106250364A true CN106250364A (en) | 2016-12-21 |
Family
ID=57613760
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610573610.XA Pending CN106250364A (en) | 2016-07-20 | 2016-07-20 | A kind of text modification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250364A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070033037A1 (en) * | 2005-08-05 | 2007-02-08 | Microsoft Corporation | Redictation of misrecognized words using a list of alternatives |
CN103207769A (en) * | 2012-01-16 | 2013-07-17 | 联想(北京)有限公司 | Method and user equipment for voice amending |
CN103399907A (en) * | 2013-07-31 | 2013-11-20 | 深圳市华傲数据技术有限公司 | Method and device for calculating similarity of Chinese character strings on the basis of edit distance |
CN103903618A (en) * | 2012-12-28 | 2014-07-02 | 联想(北京)有限公司 | Voice input method and electronic device |
CN105244029A (en) * | 2015-08-28 | 2016-01-13 | 科大讯飞股份有限公司 | Voice recognition post-processing method and system |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874241A (en) * | 2016-12-23 | 2017-06-20 | 《中国医药科学》杂志社有限公司 | A kind of intelligent manuscript editing system |
CN107168941A (en) * | 2017-05-12 | 2017-09-15 | 掌阅科技股份有限公司 | Content of text modification method, electronic equipment, computer-readable storage medium |
CN107291698A (en) * | 2017-06-30 | 2017-10-24 | 广东欧珀移动通信有限公司 | Information revision method, device, storage medium and electronic equipment |
CN107633250A (en) * | 2017-09-11 | 2018-01-26 | 畅捷通信息技术股份有限公司 | Character recognition error correction method, error correction system and computer device |
CN107633250B (en) * | 2017-09-11 | 2023-04-18 | 畅捷通信息技术股份有限公司 | Character recognition error correction method, error correction system and computer device |
CN108804414A (en) * | 2018-05-04 | 2018-11-13 | 科沃斯商用机器人有限公司 | Text modification method, device, smart device and readable storage medium |
CN108874174A (en) * | 2018-05-29 | 2018-11-23 | 腾讯科技(深圳)有限公司 | A kind of text error correction method, device and relevant device |
CN110737757A (en) * | 2018-07-03 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
CN110737757B (en) * | 2018-07-03 | 2022-07-05 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
CN109085932A (en) * | 2018-08-17 | 2018-12-25 | 科大讯飞股份有限公司 | Candidate entry adjustment method, device, equipment and readable storage medium |
WO2020052060A1 (en) * | 2018-09-14 | 2020-03-19 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating correction statement |
US11531814B2 (en) | 2018-09-14 | 2022-12-20 | Beijing Bytedance Network Technology Co., Ltd. | Method and device for generating modified statement |
CN109308295A (en) * | 2018-09-26 | 2019-02-05 | 南京邮电大学 | A kind of privacy exposure method of real-time of data-oriented publication |
CN109657115B (en) * | 2018-10-18 | 2023-04-14 | 平安科技(深圳)有限公司 | Crawling data self-repairing method, device, equipment and medium |
CN109657115A (en) * | 2018-10-18 | 2019-04-19 | 平安科技(深圳)有限公司 | Crawling data self-repairing method, device, equipment and medium |
CN109814734B (en) * | 2019-01-15 | 2022-04-15 | 上海趣虫科技有限公司 | Method for correcting Chinese pinyin input and processing terminal |
CN109814734A (en) * | 2019-01-15 | 2019-05-28 | 上海趣虫科技有限公司 | Method for correcting Chinese pinyin input and processing terminal |
US10929710B2 (en) | 2019-05-21 | 2021-02-23 | Advanced New Technologies Co., Ltd. | Methods and devices for quantifying text similarity |
CN113723466A (en) * | 2019-05-21 | 2021-11-30 | 创新先进技术有限公司 | Text similarity quantification method, equipment and system |
US11210553B2 (en) | 2019-05-21 | 2021-12-28 | Advanced New Technologies Co., Ltd. | Methods and devices for quantifying text similarity |
CN113723466B (en) * | 2019-05-21 | 2024-03-08 | 创新先进技术有限公司 | Text similarity quantification method, device and system |
CN112530405A (en) * | 2019-09-18 | 2021-03-19 | 北京声智科技有限公司 | End-to-end speech synthesis error correction method, system and device |
CN110851484A (en) * | 2019-11-13 | 2020-02-28 | 北京香侬慧语科技有限责任公司 | Method and device for obtaining multi-index question answers |
WO2021129410A1 (en) * | 2019-12-23 | 2021-07-01 | 华为技术有限公司 | Method and device for text processing |
CN111191441A (en) * | 2020-01-06 | 2020-05-22 | 广东博智林机器人有限公司 | Text error correction method, device and storage medium |
CN111291552A (en) * | 2020-05-09 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and system for correcting text content |
CN112396049A (en) * | 2020-11-19 | 2021-02-23 | 平安普惠企业管理有限公司 | Text error correction method and device, computer equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161221 |