CN108052499A - Text error correction method, device and computer-readable medium based on artificial intelligence - Google Patents
- Publication number: CN108052499A (application CN201711159880.7A)
- Authority: CN (China)
- Prior art keywords: segment, target fragment, original segments, frequency, original
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking (under G06F40/20 Natural language analysis, G06F40/279 Recognition of textual entities)
- G06F40/30: Semantic analysis (under G06F40/00 Handling natural language data)
- G06F16/951: Indexing; web crawling techniques (under G06F16/95 Retrieval from the web)
Abstract
The present invention provides an artificial-intelligence-based text error correction method, apparatus and computer-readable medium. The method includes: obtaining the corrected target fragment in the corrected text and the original fragment in the original text to which the target fragment corresponds, the target fragment having been selected from multiple candidate fragments of the original fragment by a pre-trained fragment scoring model during error correction of the original text; obtaining the user's feedback on the result returned on the basis of the corrected text; performing incremental training of the fragment scoring model according to the target fragment, the original fragment and the feedback; and correcting subsequent original texts on the basis of the trained fragment scoring model. When the trained fragment scoring model is used for text error correction, the technical solution of the present invention can effectively improve the accuracy of the correction.
Description
【Technical field】
The present invention relates to the field of computer application technology, and in particular to an artificial-intelligence-based text error correction method, apparatus and computer-readable medium.
【Background technology】
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. As a branch of computer science, it attempts to understand the essence of intelligence and to produce a new kind of intelligent machine capable of responding in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems.
With the development of science and technology, human-computer interaction appears in more and more scenarios, which can greatly improve the user experience. For example, in a search scenario, a user enters a query, and the search server searches according to the text of the query, obtains the corresponding search results and returns them to the user. Likewise, in scenarios where a smart device provides online consulting or shopping-guide services, the device receives text entered by the user and responds on the basis of that text. In all of these scenarios, the text entered by the user may contain errors, so after the text is received it must be corrected in order to understand the user's needs more accurately. In the currently available technology, text error correction is performed by training a network model in advance and using the pre-trained network model to correct the text.
However, in the prior art, once the network model has been trained it remains fixed; after a period of time it may no longer be able to correct text accurately, so the accuracy of text error correction deteriorates.
【Summary of the invention】
The present invention provides an artificial-intelligence-based text error correction method, apparatus and computer-readable medium for improving the accuracy of text error correction.
The present invention provides an artificial-intelligence-based text error correction method, the method including:
obtaining the corrected target fragment in the corrected text and the original fragment in the original text to which the target fragment corresponds, the target fragment having been selected from multiple candidate fragments of the original fragment by a pre-trained fragment scoring model during error correction of the original text;
obtaining the user's feedback on the result returned on the basis of the corrected text;
performing incremental training of the fragment scoring model according to the target fragment, the original fragment and the feedback; and
correcting subsequent original texts on the basis of the trained fragment scoring model.
Further optionally, in the method described above, performing incremental training of the fragment scoring model according to the target fragment, the original fragment and the feedback specifically includes:
obtaining the relative feature information between the target fragment and the original fragment;
determining the ideal score of the target fragment according to the feedback; and
training the fragment scoring model according to the relative feature information and the ideal score of the target fragment.
Further optionally, in the method described above, obtaining the relative feature information between the target fragment and the original fragment includes at least one of the following:
obtaining the relative quality features between the target fragment and the original fragment;
obtaining the relative historical-behavior features between the target fragment and the original fragment; and
obtaining the semantic-similarity features between the target fragment and the original fragment.
Further optionally, in the method described above, obtaining the relative quality features between the target fragment and the original fragment specifically includes:
obtaining the frequency with which the original fragment occurs in a corpus, and the frequency with which the combination of the original fragment and its context fragment in the original text occurs in the corpus;
obtaining the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus; and
from these frequencies, obtaining the ratio of the frequency of the target fragment to that of the original fragment and the ratio of the frequency of the target-fragment-plus-context combination to that of the original-fragment-plus-context combination, and/or the difference between the frequency of the target fragment and that of the original fragment and the difference between the frequency of the target-fragment-plus-context combination and that of the original-fragment-plus-context combination.
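The patent gives no code for these features; the following is a minimal sketch of how the four corpus frequencies could be combined into the ratio and difference features described above. The counts passed in are hypothetical placeholders, and the epsilon guard for unseen fragments is an implementation assumption, not part of the claim.

```python
def relative_quality_features(freq_orig, freq_orig_ctx, freq_tgt, freq_tgt_ctx):
    """Combine corpus frequencies of the original/target fragments (alone and
    combined with their context fragment) into ratio and difference features.
    A small epsilon guards against division by zero for unseen fragments."""
    eps = 1e-9
    return {
        "frag_freq_ratio": freq_tgt / (freq_orig + eps),
        "ctx_freq_ratio": freq_tgt_ctx / (freq_orig_ctx + eps),
        "frag_freq_diff": freq_tgt - freq_orig,
        "ctx_freq_diff": freq_tgt_ctx - freq_orig_ctx,
    }

# Illustrative counts: the target fragment is far more frequent than the
# original both alone and together with its context fragment.
feats = relative_quality_features(freq_orig=20, freq_orig_ctx=2,
                                  freq_tgt=200, freq_tgt_ctx=80)
```

A large ratio or positive difference suggests the target fragment is the more natural wording in this corpus.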
Further optionally, in the method described above, obtaining the relative historical-behavior features between the target fragment and the original fragment specifically includes:
obtaining the first modification frequency with which the original fragment is modified into the target fragment in the phrase table (PT);
obtaining the second modification frequency with which the combination of the original fragment and the context fragment is modified into the combination of the target fragment and the context fragment in the PT; and
obtaining a frequency ratio and/or a frequency difference from the first modification frequency and the second modification frequency, the frequency ratio being equal to the second modification frequency divided by the first modification frequency, and the frequency difference being equal to the second modification frequency minus the first modification frequency.
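A hedged sketch of the ratio and difference over phrase-table modification counts defined above. The dictionary stand-in for the PT and the English fragment strings are illustrative only; a real phrase table would hold Chinese fragments and much larger counts.

```python
def historical_behavior_features(pt_counts, orig, tgt, orig_ctx, tgt_ctx):
    """pt_counts maps (source, replacement) pairs to how often that rewrite
    was applied historically (a toy stand-in for a phrase table, PT)."""
    first = pt_counts.get((orig, tgt), 0)            # fragment-level rewrites
    second = pt_counts.get((orig_ctx, tgt_ctx), 0)   # fragment+context rewrites
    eps = 1e-9
    return {"mod_freq_ratio": second / (first + eps),   # second / first
            "mod_freq_diff": second - first}            # second - first

pt = {("recieve", "receive"): 50, ("to recieve", "to receive"): 20}
feats = historical_behavior_features(pt, "recieve", "receive",
                                     "to recieve", "to receive")
```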
Further optionally, in the method described above, obtaining the semantic-similarity features between the target fragment and the original fragment specifically includes:
obtaining the semantic similarity between the target fragment and the original fragment; and/or
obtaining the semantic similarity between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment.
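The patent does not specify how semantic similarity is computed; a common choice, assumed here, is cosine similarity over fragment embeddings. The two-dimensional toy vectors below are illustrative; in practice they would come from a trained word- or sentence-vector model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy embeddings for an original fragment and a candidate target fragment.
emb = {"recieve": [0.85, 0.2], "receive": [0.9, 0.1]}
sim = cosine(emb["receive"], emb["recieve"])
```

The same function applies unchanged to embeddings of the fragment-plus-context combinations.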
Further optionally, in the method described above, obtaining the relative feature information between the target fragment and the original fragment further includes at least one of the following:
obtaining the proper-noun features of the original fragment and of the target fragment respectively according to a preset proper-noun dictionary; and
obtaining the pinyin edit-distance feature between the target fragment and the original fragment.
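The pinyin edit-distance feature can be computed with the classic Levenshtein distance over the pinyin renderings of the two fragments. A minimal sketch; the pinyin strings are illustrative, and obtaining pinyin from Chinese text would need a separate conversion step not shown here.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions),
    applied here to pinyin strings of the two fragments."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# e.g. 'zhang' vs 'zhan' differ by a single trailing letter
distance = edit_distance("zhang", "zhan")
```

A small distance indicates the two fragments sound alike, which is strong evidence for a homophone-style typing error.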
Further optionally, in the method described above, determining the ideal score of the target fragment according to the feedback specifically includes:
inferring from the feedback whether the user accepts the replacement of the original fragment by the target fragment in the corrected text;
if the user is inferred to accept it, setting the ideal score of the target fragment to 1; otherwise, if the user is inferred not to accept it, setting the ideal score of the target fragment to 0.
Further optionally, in the method described above, training the fragment scoring model according to the relative feature information and the ideal score of the target fragment specifically includes:
inputting the relative feature information into the fragment scoring model to obtain the predicted score of the fragment scoring model;
comparing the predicted score with the ideal score;
if the predicted score is smaller than the ideal score, adjusting the parameters of the fragment scoring model so that the predicted score it outputs changes in the increasing direction; and
if the predicted score is larger than the ideal score, adjusting the parameters of the fragment scoring model so that the predicted score it outputs changes in the decreasing direction.
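The adjustment rule above, raising the predicted score when it is below the ideal score and lowering it when above, is exactly what one stochastic-gradient step on a logistic-regression scorer does. A hedged sketch under that assumption; the feature vector and learning rate are illustrative, not values from the patent.

```python
import math

def predict(w, b, x):
    """Logistic-regression score in (0, 1) for feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, b, x, ideal, lr=0.1):
    """One incremental update: if the prediction is below the ideal score,
    the gradient pushes the output up, and vice versa."""
    p = predict(w, b, x)
    g = p - ideal                                # gradient of log loss w.r.t. z
    w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    b = b - lr * g
    return w, b

w, b = [0.0, 0.0], 0.0
x, ideal = [1.0, 2.0], 1.0                       # user accepted -> ideal score 1
before = predict(w, b, x)
w, b = sgd_step(w, b, x, ideal)
after = predict(w, b, x)                         # score moves toward the ideal
```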
The present invention provides an artificial-intelligence-based text error correction apparatus, the apparatus including:
a fragment acquisition module for obtaining the corrected target fragment in the corrected text and the original fragment in the original text to which the target fragment corresponds, the target fragment having been selected from multiple candidate fragments of the original fragment by a pre-trained fragment scoring model during error correction of the original text;
a feedback acquisition module for obtaining the user's feedback on the result returned on the basis of the corrected text;
an incremental training module for performing incremental training of the fragment scoring model according to the target fragment, the original fragment and the feedback; and
an error correction module for correcting subsequent original texts on the basis of the trained fragment scoring model.
Further optionally, in the apparatus described above, the incremental training module specifically includes:
a relative-feature acquisition unit for obtaining the relative feature information between the target fragment and the original fragment;
a determination unit for determining the ideal score of the target fragment according to the feedback; and
a training unit for training the fragment scoring model according to the relative feature information and the ideal score of the target fragment.
Further optionally, in the apparatus described above, the relative-feature acquisition unit performs at least one of the following operations:
obtaining the relative quality features between the target fragment and the original fragment;
obtaining the relative historical-behavior features between the target fragment and the original fragment; and
obtaining the semantic-similarity features between the target fragment and the original fragment.
Further optionally, in the apparatus described above, the relative-feature acquisition unit is specifically configured to:
obtain the frequency with which the original fragment occurs in a corpus, and the frequency with which the combination of the original fragment and its context fragment in the original text occurs in the corpus;
obtain the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus; and
from these frequencies, obtain the ratio of the frequency of the target fragment to that of the original fragment and the ratio of the frequency of the target-fragment-plus-context combination to that of the original-fragment-plus-context combination, and/or the difference between the frequency of the target fragment and that of the original fragment and the difference between the frequency of the target-fragment-plus-context combination and that of the original-fragment-plus-context combination.
Further optionally, in the apparatus described above, the relative-feature acquisition unit is specifically configured to:
obtain the first modification frequency with which the original fragment is modified into the target fragment in the phrase table (PT);
obtain the second modification frequency with which the combination of the original fragment and the context fragment is modified into the combination of the target fragment and the context fragment in the PT; and
obtain a frequency ratio and/or a frequency difference from the first modification frequency and the second modification frequency, the frequency ratio being equal to the second modification frequency divided by the first modification frequency, and the frequency difference being equal to the second modification frequency minus the first modification frequency.
Further optionally, in the apparatus described above, the relative-feature acquisition unit is specifically configured to:
obtain the semantic similarity between the target fragment and the original fragment; and/or
obtain the semantic similarity between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment.
Further optionally, in the apparatus described above, the relative-feature acquisition unit is further configured to perform at least one of the following:
obtaining the proper-noun features of the original fragment and of the target fragment respectively according to a preset proper-noun dictionary; and
obtaining the pinyin edit-distance feature between the target fragment and the original fragment.
Further optionally, in the apparatus described above, the determination unit is specifically configured to:
infer from the feedback whether the user accepts the replacement of the original fragment by the target fragment in the corrected text;
if the user is inferred to accept it, set the ideal score of the target fragment to 1; otherwise, if the user is inferred not to accept it, set the ideal score of the target fragment to 0.
Further optionally, in the apparatus described above, the training unit is specifically configured to:
input the relative feature information into the fragment scoring model to obtain the predicted score of the fragment scoring model;
compare the predicted score with the ideal score;
if the predicted score is smaller than the ideal score, adjust the parameters of the fragment scoring model so that the predicted score it outputs changes in the increasing direction; and
if the predicted score is larger than the ideal score, adjust the parameters of the fragment scoring model so that the predicted score it outputs changes in the decreasing direction.
The present invention further provides a computer device, the device including:
one or more processors; and
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the artificial-intelligence-based text error correction method described above.
The present invention further provides a computer-readable medium on which a computer program is stored, the program implementing the artificial-intelligence-based text error correction method described above when executed by a processor.
In the artificial-intelligence-based text error correction method, apparatus and computer-readable medium of the present invention, the corrected target fragment in the corrected text and the original fragment in the original text to which it corresponds are obtained; the target fragment is selected from multiple candidate fragments of the original fragment by the pre-trained fragment scoring model during error correction of the original text; the user's feedback on the result returned on the basis of the corrected text is obtained; the fragment scoring model is incrementally trained according to the target fragment, the original fragment and the feedback; and subsequent original texts are corrected on the basis of the trained fragment scoring model. By incrementally training the fragment scoring model on the target fragment, the original fragment and the feedback, the technical solution of the present invention improves the predictive accuracy of the fragment scoring model, and using the trained model for text error correction can effectively improve the accuracy of the correction. For example, applying the technical solution of the present invention to long-text editing helps to improve the production quality of long texts and thus the user experience.
【Description of the drawings】
Fig. 1 is a flow chart of embodiment one of the artificial-intelligence-based text error correction method of the present invention.
Fig. 2 is a flow chart of embodiment two of the artificial-intelligence-based text error correction method of the present invention.
Fig. 3 is a flow chart of embodiment one of the artificial-intelligence-based long-text error correction method of the present invention.
Fig. 4 is a schematic diagram of a search interface of the present embodiment.
Fig. 5 is a flow chart of embodiment two of the artificial-intelligence-based long-text error correction method of the present invention.
Fig. 6 is an example of the mapping table of confusable pronunciations provided in this embodiment.
Fig. 7 is a schematic diagram of an error correction result of the artificial-intelligence-based long-text error correction method of this embodiment.
Fig. 8 is a structural diagram of embodiment one of the artificial-intelligence-based text error correction apparatus of the present invention.
Fig. 9 is a structural diagram of embodiment two of the artificial-intelligence-based text error correction apparatus of the present invention.
Fig. 10 is a structural diagram of the computer device embodiment of the present invention.
Fig. 11 is an example diagram of a computer device provided by the present invention.
【Specific embodiments】
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of embodiment one of the artificial-intelligence-based text error correction method of the present invention. As shown in Fig. 1, the method of this embodiment may specifically include the following steps:
100. Obtain the corrected target fragment in the corrected text and the original fragment in the original text to which it corresponds; the target fragment is selected from multiple candidate fragments of the original fragment by the pre-trained fragment scoring model during error correction of the original text.
The executing entity of the method of this embodiment is an artificial-intelligence-based text error correction apparatus, which may be an independent electronic entity for correcting text. The text of this embodiment may be a short text such as a query, or a long text in a text editing system; a long text is usually longer than a query and may be a rather long sentence. That is, the artificial-intelligence-based text error correction method of this embodiment can be applied in search scenarios as well as in various scenarios involving long-text editing.
In the text correction process of this embodiment, the original text must be corrected. Specifically, the original text is first segmented into multiple tokens; the segmentation strategy may follow any strategy in the related art and is not limited here. A window of preset size is then applied to the original text and slid from front to back to select each original fragment. The preset window size in this embodiment may be set to one token, two tokens or three tokens; thus an original fragment of this embodiment may consist of a single token or of a combination of consecutive tokens.
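The sliding-window fragment selection above can be sketched as follows. The sketch enumerates windows of one to three tokens as the embodiment describes; the English tokens are illustrative stand-ins for the output of a Chinese word segmenter.

```python
def original_fragments(tokens, max_window=3):
    """Slide windows of 1..max_window consecutive tokens over the segmented
    text, front to back, yielding every candidate original fragment."""
    frags = []
    for size in range(1, max_window + 1):
        for i in range(len(tokens) - size + 1):
            frags.append(tuple(tokens[i:i + size]))
    return frags

tokens = ["the", "quick", "fox"]
frags = original_fragments(tokens, max_window=2)
# three 1-token windows plus two 2-token windows
```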
After each original fragment in the original text has been obtained in this way, multiple candidate fragments that can replace each original fragment are obtained. They may be obtained from a pre-computed phrase table (Phrase Table, PT) as the replacement fragments corresponding to the original fragment, or recalled as candidate fragments whose pronunciation is the same as or similar to that of the original fragment. Each candidate fragment is then scored with the fragment scoring model, and the target fragment that replaces the original fragment is further selected from the multiple candidates according to their scores. For example, a short query may contain only one original fragment, in which case the highest-scoring candidate can be taken as the target fragment. For a longer text containing two or more original fragments, the highest-scoring candidate can be taken as the corresponding target fragment for each original fragment; alternatively, for a given original fragment, taking factors such as contextual coherence into account, the target fragment may also be taken as the second- or next-highest-scoring candidate among the top-N highest-scoring candidates, which is not limited here. Whichever way the target fragment is obtained, the fragment scoring model's scores for the candidate fragments are required. In this embodiment, the scoring of candidate fragments by the fragment scoring model is therefore a crucial link in text error correction: if the model scores the candidates inaccurately, the accuracy of text error correction will be poor.
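The score-then-select step can be sketched as below. The candidate strings and scores are illustrative; the sketch simply keeps the top-N ranked candidates and returns the best one, leaving any context-coherence re-ranking among the top-N to the caller, as the embodiment allows.

```python
def pick_target(candidates, score_fn, top_n=3):
    """Score every candidate fragment with the fragment scoring model
    (score_fn) and keep the top-N; the final target may be the best or,
    when context coherence argues for it, another of the top-N."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[0], ranked[:top_n]

# Illustrative model scores for three candidate fragments.
scores = {"receive": 0.9, "recieve": 0.2, "reprieve": 0.4}
best, top = pick_target(list(scores), scores.get)
```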
In this embodiment, after the corrected text has been obtained by correcting the original text as above, the corrected target fragment in the corrected text and the original fragment in the original text to which it corresponds can be obtained.
101. Obtain the user's feedback on the result returned on the basis of the corrected text.
In this embodiment, the form and content of the result returned to the user on the basis of the corrected text may differ from scenario to scenario. For example, in a search scenario, the result returned to the user on the basis of the corrected text may be the search results retrieved with the corrected text. In long-text editing, the result may take the form of the user agreeing or disagreeing with a modification. Other scenarios may take other forms, which are not enumerated here. Whatever form the result based on the corrected text takes, the user's feedback can be obtained. For example, in a search scenario, after the search results based on the corrected text have been returned, a user who agrees with the corrected results can click directly through to read them, while a user who disagrees can ignore the results and search again. As another example, in a long-text editing scenario, after the original text entered by the user has been corrected, a prompt can be given at the corrected position, and the user can click agree or disagree according to the true situation at that position. Thus, in any scenario, the user's feedback on the result returned on the basis of the corrected text can be obtained.
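Mapping such raw feedback events to the accept/reject label (the ideal score of 1 or 0 described earlier) could look like the sketch below. The event names are hypothetical; a real system would define its own event taxonomy per scenario.

```python
def infer_acceptance(event):
    """Map a raw feedback event to whether the user accepted the correction:
    1 (ideal score 1), 0 (ideal score 0), or None when inconclusive."""
    accepted = {"clicked_result", "agreed_edit"}     # user embraced the fix
    rejected = {"searched_again", "rejected_edit"}   # user worked around it
    if event in accepted:
        return 1
    if event in rejected:
        return 0
    return None  # inconclusive feedback: skip this training sample

label = infer_acceptance("clicked_result")
```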
102. Perform incremental training of the fragment scoring model according to the target fragment, the original fragment and the feedback.
The incremental training of this embodiment may be an online learning process; that is, after each correction, online learning is performed on the fragment scoring model directly from the correction result so as to improve the predictive accuracy of the fragment scoring model.
Alternatively, the incremental training of this embodiment may be performed offline: at regular intervals, all the correction data of the interval are collected and used to incrementally train the fragment scoring model, so as to improve its predictive accuracy.
During the incremental training of this embodiment, the fragment scoring model is incrementally trained according to the target fragment, the original fragment and the feedback.
103. Correct subsequent original texts on the basis of the trained fragment scoring model.
On the basis of the fragment scoring model after the incremental training above, the accuracy of correcting subsequent original texts can be higher.
In practical applications, a plain GBRank model structure cannot be trained incrementally. In this embodiment, in order to improve the accuracy of the fragment scoring model, the model is trained incrementally; to this end, the fragment scoring model of this embodiment may apply a logistic-regression function on top of the GBRank model so as to support incremental training. For example, at training time the GBRank model is trained first; after the tree model has been obtained, a logistic regression is trained on top of it on the same training data, yielding the fragment scoring model of this embodiment.
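One way to read this two-stage design: the GBRank tree ensemble is frozen and supplies per-tree outputs as features, and the logistic-regression layer on top is the only part touched by incremental training. The sketch below illustrates that structure with hand-made stand-in trees; it is an interpretation of the text, not the patent's actual model.

```python
import math

def tree_ensemble(x):
    """Stand-in for the frozen GBRank tree model: returns per-tree scores.
    These trees are NOT retrained during incremental learning."""
    return [1.0 if x[0] > 0.5 else -1.0, 0.5 * x[1]]

def lr_on_trees(w, b, x):
    """Logistic-regression layer over the tree outputs; only w and b
    would be updated by the incremental (SGD-style) training."""
    t = tree_ensemble(x)
    z = sum(wi * ti for wi, ti in zip(w, t)) + b
    return 1.0 / (1.0 + math.exp(-z))

score = lr_on_trees([0.8, 0.3], 0.0, [0.9, 2.0])
```

Because the trees stay fixed, each feedback sample only needs one cheap gradient step on `w` and `b`, which is what makes frequent online updates practical.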
In the artificial-intelligence-based text error correction method of this embodiment, the corrected target fragment in the corrected text and the original fragment in the original text to which it corresponds are obtained; the target fragment is selected from multiple candidate fragments of the original fragment by the pre-trained fragment scoring model during error correction of the original text; the user's feedback on the result returned on the basis of the corrected text is obtained; the fragment scoring model is incrementally trained according to the target fragment, the original fragment and the feedback; and subsequent original texts are corrected on the basis of the trained fragment scoring model. By incrementally training the fragment scoring model on the target fragment, the original fragment and the feedback, the technical solution of this embodiment improves the predictive accuracy of the fragment scoring model, and using the trained model for text error correction can effectively improve the accuracy of the correction. For example, applying the technical solution of this embodiment to long-text editing helps to improve the production quality of long texts and thus the user experience.
Fig. 2 is the flow chart of the text error correction method embodiment two based on artificial intelligence of the present invention.As shown in Fig. 2, this
The text error correction method based on artificial intelligence of embodiment is on the basis of the technical solution of above-mentioned embodiment illustrated in fig. 1, into one
Step introduces technical scheme in further detail.As shown in Fig. 2, the text error correction side based on artificial intelligence of the present embodiment
Method specifically may include steps of:
200th, target fragment and the target fragment corresponding original segments in original text of error correction in corrected text are obtained;
Target fragment is when carrying out correction process to original text based on segment scoring model trained in advance, from multiple times of original segments
It is selected in selected episode;
201. Obtain the user's feedback information on the target result fed back based on the corrected text;
For the implementation of step 200 and step 201, reference may be made to step 100 and step 101 of the embodiment shown in Fig. 1 above; details are not described herein.
202. Obtain the relative characteristic information between the target fragment and the original segment;
For example, step 202 may specifically include at least one of the following:
First, obtain the relative quality features between the target fragment and the original segment;
This step may specifically include the following steps:
(a1) Obtain the frequency with which the original segment occurs in the corpus, and the frequency with which the combination of the original segment and its context segments in the original text occurs in the corpus;
Step (a1) is the specific way of obtaining the quality features of the original segment. Since the corrected text has been obtained in this embodiment, the application field of this embodiment can be determined, and the quality features of the original segment are specifically obtained from the corpus of that application field.
In this embodiment, a context segment of the original segment is a segment immediately before or after the original segment in the original text. For example, when the original segment consists of 1 word, its context segments may include the 1 or 2 words before it and the 1 or 2 words after it. If the original segment consists of 2 words, its context segments may include the 1 word before it and the 1 word after it in the original text. If the original segment consists of 3 words, its context segments may only include the 1 word before it and the 1 word after it in the original text. Alternatively, considering that segments containing more words occur in the original text with lower probability, this embodiment may also stipulate that no context segments are taken when the original segment already contains 3 or more words. That is, when context segments of the original segment are taken, the combinations of the original segment and its context segments comprise three combinations in total: the preceding segment plus the original segment, the original segment plus the following segment, and the preceding segment plus the original segment plus the following segment. When obtaining the quality features of the original segment, it is necessary to obtain the frequency with which each of the original segment, the preceding segment plus the original segment, the original segment plus the following segment, and the preceding segment plus the original segment plus the following segment occurs in the corpus.
Further optionally, when the original segment does not take context segments, the quality features of the original segment may include only the frequency with which the original segment occurs in the corpus.
(b1) Obtain the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context segments occurs in the corpus;
Correspondingly, this step is the way of obtaining the quality features of the target fragment; the specific acquisition is the same as in step (a1) above, and details are not described herein.
Furthermore, considering data alignment, the target fragment, as a replacement of the original segment, is treated consistently with the original segment: if the original segment takes no context segments in step (a1), the target fragment in step (b1) correspondingly takes no context segments either. When context segments do need to be taken but the original segment is at the beginning or end of a sentence of the original text, the corresponding empty context segment can be represented by a default sentence-beginning feature or sentence-end feature to ensure data alignment.
(c1) According to the frequency with which the original segment occurs in the corpus, the frequency with which the combination of the original segment and the context segments occurs in the corpus, the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context segments occurs in the corpus, obtain: the ratio of the frequency of the target fragment to that of the original segment in the corpus and the ratio of the frequency of the combination of the target fragment and the context segments to that of the combination of the original segment and the context segments in the corpus; and/or the difference between the frequency of the target fragment and that of the original segment in the corpus and the difference between the frequency of the combination of the target fragment and the context segments and that of the combination of the original segment and the context segments in the corpus.
Step (c1) is the specific way of obtaining the relative quality features between the target fragment and the original segment. Specifically, by obtaining the frequency ratios and/or frequency differences described above, the compatibility of the target fragment with the context segments can be captured: if the ratio of the frequency of the target fragment to that of the original segment in the corpus is large, but the ratio of the frequency of the combination of the target fragment and the context segments to that of the combination of the original segment and the context segments in the corpus is very small, it indicates that the target fragment fits the context segments poorly and is not suitable for replacing the original segment, and vice versa.
Similarly, if the frequency difference between the target fragment and the original segment in the corpus is small, i.e. their usage probabilities are close, but the frequency difference between the combination of the target fragment and the context segments and the combination of the original segment and the context segments in the corpus is very large, it indicates that the combination of the target fragment and the context segments is used much more frequently in the corpus than the combination of the original segment and the context segments; the target fragment can then be considered to fit the context segments very well, and the target fragment may be used to replace the original segment, and vice versa.
In addition, when the original segment does not take context segments, the corresponding relative quality features include only: the frequency ratio and/or frequency difference between the target fragment and the original segment in the corpus, obtained from the frequency with which the original segment occurs in the corpus and the frequency with which the target fragment occurs in the corpus. Compared with taking context segments, the obtained features are not rich enough; therefore, in this embodiment, it is preferable to obtain the context segments.
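The relative quality features above can be sketched as follows; this is an illustrative sketch under assumed names, with a small epsilon added to guard division by zero (the patent does not specify how a zero frequency is handled).

```python
def relative_quality_features(f_orig, f_target, f_orig_ctx=None, f_target_ctx=None):
    """Frequency ratios and differences between a target fragment and the
    original segment it would replace. The *_ctx arguments are the corpus
    frequencies of the segment-plus-context combinations; they are omitted
    when no context segments are taken (segments of 3+ words)."""
    eps = 1e-9
    feats = {
        "seg_freq_ratio": f_target / (f_orig + eps),
        "seg_freq_diff": f_target - f_orig,
    }
    if f_orig_ctx is not None and f_target_ctx is not None:
        feats["ctx_freq_ratio"] = f_target_ctx / (f_orig_ctx + eps)
        feats["ctx_freq_diff"] = f_target_ctx - f_orig_ctx
    return feats
```

A target whose segment-level ratio is high but whose context-level ratio is tiny is, per the discussion above, a poor fit for the context and a weak replacement candidate.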
Second, obtain the relative historical behavior features between the target fragment and the original segment;
This step may specifically include the following steps:
(a2) Obtain the first modification frequency with which the original segment is revised into the target fragment in the PT table;
(b2) Obtain the second modification frequency with which the combination of the original segment and the context segments is revised into the combination of the target fragment and the context segments in the PT table;
(c2) According to the first modification frequency and the second modification frequency, obtain a frequency ratio and/or a frequency difference, where the frequency ratio equals the second modification frequency divided by the first modification frequency, and the frequency difference equals the second modification frequency minus the first modification frequency.
In addition, it should be noted that if the original segment contains 3 words and no context segments are taken, the above steps (a2)-(c2) cannot be used to obtain the relative historical behavior features between the target fragment and the original segment; the relative historical behavior features may then be directly set to an empty or default feature symbol. Of course, since the features obtained when context segments are taken are richer, this embodiment preferably takes context segments and uses the above steps (a2)-(c2) to obtain the relative historical behavior features between the target fragment and the original segment.
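Steps (a2)-(c2) can be sketched as follows, assuming one plausible in-memory layout for the PT table, keyed by (original, replacement) pairs; the layout and names are illustrative, not defined by the patent.

```python
def historical_behavior_features(pt_table, orig, target, orig_ctx=None, target_ctx=None):
    """Relative historical behavior features from a PT table stored as
    {(from_segment, to_segment): modification_frequency}."""
    f1 = pt_table.get((orig, target), 0)          # first modification frequency
    if orig_ctx is None or target_ctx is None or f1 == 0:
        # no context taken (3+ words) or no history: default feature
        return {"pt_ratio": 0.0, "pt_diff": 0}
    f2 = pt_table.get((orig_ctx, target_ctx), 0)  # second modification frequency
    return {"pt_ratio": f2 / f1, "pt_diff": f2 - f1}
```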
Third, obtain the semantic similarity features between the target fragment and the original segment.
Similarly, obtaining the semantic similarity features between the target fragment and the original segment in this embodiment may include: obtaining the semantic similarity between the target fragment and the original segment; and/or obtaining the semantic similarity between the combination of the target fragment and the context segments and the combination of the original segment and the context segments.
In this embodiment, a default dictionary may be used to obtain the word vector of the target fragment and the word vector of the original segment, and the cosine distance between the two word vectors is then calculated as the semantic similarity between the target fragment and the original segment. Correspondingly, if the original segment in this embodiment contains 3 or more words, only the semantic similarity between the target fragment and the original segment is taken as the semantic similarity feature. If the original segment contains fewer than 3 words, the context segments of the original segment also need to be taken, and the semantic similarity between the combination of the target fragment and the context segments and the combination of the original segment and the context segments also needs to be obtained. Similarly, the word vector of the combination of the target fragment and the context segments and the word vector of the combination of the original segment and the context segments are obtained, and the cosine distance between the word vectors is calculated as the semantic similarity feature between the two combinations. Correspondingly, the combinations comprise three forms in total: the preceding segment plus the segment, the segment plus the following segment, and the preceding segment plus the segment plus the following segment. In this case, the semantic similarity features between the target fragment and the original segment are formed by splicing together: the semantic similarity between the target fragment and the original segment; the semantic similarity between the combination of the target fragment and the preceding segment and the combination of the original segment and the preceding segment; the semantic similarity between the combination of the target fragment and the following segment and the combination of the original segment and the following segment; and the semantic similarity between the combination of the preceding segment, the target fragment and the following segment and the combination of the preceding segment, the original segment and the following segment.
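The similarity computation above can be sketched as follows. This is an illustrative sketch: the patent does not fix how a multi-word combination is embedded, so averaging the word vectors is an assumption, and what the text calls "cosine distance" is implemented here as the usual cosine similarity.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def segment_vector(words, embeddings, dim):
    """Embed a (possibly multi-word) segment by averaging the word
    vectors found in the default dictionary (a simplifying assumption)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(component) / len(vecs) for component in zip(*vecs)]
```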
That is, in this embodiment, for the richness of the features and the scoring accuracy of the segment scoring model, the relative characteristic information preferably includes the relative quality features, the relative historical behavior features and the semantic similarity features at the same time. In order to further enrich the content of the relative characteristic information, obtaining the relative characteristic information between the target fragment and the original segment in this embodiment may also include at least one of the following: obtaining the proper-noun features of the original segment and the target fragment respectively according to a default proper-noun dictionary; and obtaining the pinyin edit distance feature between the target fragment and the original segment.
Specifically, the proper-noun feature of the target fragment identifies whether the target fragment is a proper noun. For example, whether a target fragment is a proper noun is judged according to the proper-noun dictionary; if it is, the corresponding proper-noun feature is set to 1, otherwise it is set to 0. Correspondingly, if the target fragment is a proper noun, the probability that it replaces the original segment is higher; if it is not a proper noun, the probability that it replaces the original segment is lower. Similarly, the proper-noun feature of the original segment can also be set according to the proper-noun dictionary; details are not described herein. In addition, it should be noted that in practice the probability that the original segment and the target fragment are both proper nouns is very small.
In addition, the pronunciation edit distance between the target fragment and the original segment is specifically the number of letters in the pinyin that need to be adjusted to edit the pronunciation of the target fragment into the pronunciation of the original segment. Correspondingly, the larger the pronunciation edit distance between the target fragment and the original segment, the smaller the probability that the target fragment replaces the original segment; the smaller the pronunciation edit distance, the larger the probability that the target fragment replaces the original segment.
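The pinyin edit distance described above is the standard Levenshtein distance applied to the two segments' romanized pronunciation strings; a sketch (conversion of Chinese text to pinyin is assumed to happen elsewhere):

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, here the pinyin
    spellings of the target fragment and the original segment.
    Single-row dynamic programming."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]
```

Identical pronunciations give distance 0, which per the text corresponds to the highest replacement probability.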
203. Determine the ideal score of the target fragment according to the feedback information;
With reference to the description of step 101 above, it can be seen that whatever form the target result fed back to the user based on the corrected text takes, the user's feedback information can be obtained, and that feedback information ultimately amounts to agreeing or disagreeing with the corrected text. Therefore, in this embodiment, it can first be inferred from the feedback information whether the user accepts the replacement of the original segment by the target fragment in the corrected text. If the user is inferred to accept it, the replacement of the original segment by the target fragment is considered correct, and the ideal score of the target fragment is set to 1; otherwise, if the user is inferred not to accept it, the replacement of the original segment by the target fragment is considered incorrect, and the ideal score of the target fragment is set to 0.
204. Train the segment scoring model according to the obtained relative characteristic information and the ideal score of the target fragment;
Steps 202-204 of this embodiment are a specific implementation of step 102 of the embodiment shown in Fig. 1 above, "performing incremental training on the segment scoring model according to the target fragment, the original segment and the feedback information".
The training of this embodiment is incremental training. It can be performed once after each error correction, similar to online training, or it can be performed offline once per fixed time period, gathering all the text error correction data within that period. Whichever way is adopted, the existing trained segment scoring model is learned again, so as to improve the precision of its subsequent predictions. During training, all the relative characteristic information obtained above can be input into the segment scoring model to obtain the predicted score of the segment scoring model; the relationship between the predicted score and the ideal score is then obtained. If the predicted score is less than the ideal score, the parameters of the segment scoring model are adjusted so that the predicted score output by the segment scoring model changes in the increasing direction; if the predicted score is greater than the ideal score, the parameters are adjusted so that the predicted score changes in the decreasing direction. The adjustment in this embodiment is only a single fine-tuning; it suffices to ensure that the predicted score output by the segment scoring model changes in the increasing or decreasing direction.
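The single fine-tuning step described above can be sketched as follows. This is a minimal illustrative sketch: a linear scorer stands in for the actual segment scoring model (a GBRank model is mentioned later), and the learning rate and all names are assumptions.

```python
def finetune_step(weights, features, ideal_score, lr=0.01):
    """One incremental update: if the predicted score is below the ideal
    score (1 for an accepted replacement, 0 for a rejected one), nudge the
    prediction upward; otherwise nudge it downward. Only a single small
    adjustment is made, matching the 'single fine-tuning' above."""
    pred = sum(w * f for w, f in zip(weights, features))
    direction = 1.0 if pred < ideal_score else -1.0
    return [w + lr * direction * f for w, f in zip(weights, features)]
```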
Further optionally, in this embodiment, instead of inputting all the relative characteristic information obtained above into the segment scoring model to obtain its predicted score, the score that the segment scoring model gave the target fragment at the time of error correction may be directly acquired.
205. Perform correction processing on subsequent original text based on the trained segment scoring model.
In the artificial-intelligence-based text error correction method of this embodiment, with the above technical solution, performing incremental training on the segment scoring model according to the target fragment, the original segment and the feedback information can improve the prediction accuracy of the segment scoring model, so that text error correction performed with the trained segment scoring model achieves a higher error correction accuracy rate. For example, when the technical solution of this embodiment is applied to long-text editing, it can help improve the content production quality of long texts and enhance the user experience.
The artificial-intelligence-based text error correction methods of the embodiments shown in Fig. 1 and Fig. 2 above are applicable not only to the correction of short texts such as query searches, but also to the correction of long texts. The following embodiment introduces a long-text error correction scenario to which the technical solution of this embodiment is applied.
Fig. 3 is a flow chart of embodiment one of the artificial-intelligence-based long-text error correction method of the present invention. As shown in Fig. 3, the artificial-intelligence-based long-text error correction method of this embodiment may specifically include the following steps:
300. When there is an original segment in the long text that is not a proper noun, perform PT segment recall on the original segment needing error correction according to a PT table preset for the field of the long text, and obtain a candidate segment set of the original segment, the candidate segment set including multiple candidate segments;
The long text of this embodiment can be any long-text information edited by the user whose length exceeds that of a common query, for example a sentence in an article abstract or in an article. Using the technical solution of this embodiment, error correction can be performed on each sentence in an article, thereby achieving error correction of the entire article.
Similarly, in this embodiment, when correcting a long text, the long text must first be segmented into multiple words. The segmentation strategy may follow that of the related art and is not limited here. The original segments of this embodiment can each be formed by a single word or by a combination of consecutive words; refer to the description of the above embodiments for details, which are not repeated here. After the multiple original segments in the long text are obtained, it is judged whether each original segment is a proper noun. For example, whether each original segment in the long text is a proper noun can be judged according to a default proper-noun dictionary: if every original segment is a proper noun, it is determined that there is no original segment needing error correction in the long text; otherwise, if there is an original segment that is not a proper noun, it is determined that there is an original segment needing error correction in the long text. The proper-noun dictionary of this embodiment can be a database including all the proper nouns of the field of the long text, generated in advance by counting data in that field and extracting the proper nouns.
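The proper-noun check that gates step 300 can be sketched as follows; the dictionary contents and names are illustrative assumptions.

```python
PROPER_NOUNS = {"Tsinghua University", "Baidu"}  # illustrative field dictionary

def segments_needing_correction(segments, proper_nouns=PROPER_NOUNS):
    """Only original segments that are NOT in the proper-noun dictionary
    are candidates for error correction; if this returns an empty list,
    the long text needs no correction."""
    return [s for s in segments if s not in proper_nouns]
```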
Through the above judgment, if there is a non-proper-noun segment in the long text, PT segment recall is performed on the original segment needing error correction according to the PT table preset for the field of the long text, and the multiple recalled candidate segments are integrated into a candidate segment set.
In this embodiment, before step 300, the PT table for the field of the long text can also be preset, which may specifically include at least one of the following ways:
First, statistics are made on the big data of users actively changing search terms in the field of the long text, and the change frequency from an original segment to a replacement segment is obtained. The original segment, the replacement segment and the change frequency from the original segment to the replacement segment are stored in the PT table;
For example, a user successively inputs the misspelled query "Qinghua Da" and then "Tsinghua University", from which the changes "Qinghua -> Tsinghua" and "Qinghua Da -> Tsinghua University" can be collected. Since a user who notices that the previous input was wrong will actively modify the search term to the correct one, it can be known from this behavior that the later, modified search term is the correct one. For example, through statistics over a preset period, it can be learned that the change frequency of "Qinghua -> Tsinghua" is 100 times, and the change frequency of "Qinghua Da -> Tsinghua University" is 70 times.
Second, according to the segment alignment mapping between a search term input by the user in the field of the long text and the titles of the search results returned by the search server, the change frequency from an original segment to a replacement segment is obtained. The original segment, the replacement segment and the change frequency from the original segment to the replacement segment are stored in the PT table. For example, Fig. 4 is a schematic diagram of a search interface of this embodiment. As shown in Fig. 4, the user inputs the search term "Qinghua Da", but the search results of the search server include both "Tsinghua University" and "Qinghua Da". In this way, for each search result whose title includes "Tsinghua University", one change "Qinghua Da -> Tsinghua University" can be recorded; for each search result whose title includes "Qinghua Da", one change "Qinghua Da -> Qinghua Da" can be recorded. If 30 results are retrieved in total, of which 28 titles are about Tsinghua University and 2 titles are about "Qinghua Da", the change frequency of "Qinghua Da -> Tsinghua University" is considered to be 28 times, and the change frequency of "Qinghua Da -> Qinghua Da" is considered to be 2 times.
Third, according to the data alignment mapping between a search term input by the user in the field of the long text and the user's feedback on the search server's active error correction, the change frequency from an original segment to a replacement segment is obtained. The original segment, the replacement segment and the change frequency from the original segment to the replacement segment are stored in the PT table. Unlike the second way above, in this way the replacement segment must be determined according to the user's feedback. For example, the user inputs the search term "Qinghua Da", and the search results of the search server include both "Tsinghua University" and "Qinghua Da". Each time the user clicks a search result whose title includes "Tsinghua University", one change "Qinghua Da -> Tsinghua University" is counted; each time the user clicks a search result whose title includes "Qinghua Da", one change "Qinghua Da -> Qinghua Da" is counted.
In the way of the above embodiment, the PT table of this embodiment can be gathered and counted over a preset period. The PT table may be generated by any one of the above three ways, or by a combination of any two or all three of them. According to the above embodiment, it can be seen that the PT table of this embodiment records multiple groups of original segments, replacement segments and corresponding change frequencies; for example, each group of data may adopt the storage form "original segment -> replacement segment, change frequency". The same original segment may correspond to multiple replacement segments, and the change frequency corresponding to each replacement segment may differ. When performing PT segment recall on the original segment needing error correction according to the PT table, all the replacement segments corresponding to the original segment and the change frequency corresponding to each replacement segment can be obtained from the PT table; the top n replacement segments with the largest change frequencies are then taken from the multiple replacement segments as the candidate segments corresponding to the original segment, and the multiple candidate segments form a candidate segment set.
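The recall step above can be sketched as follows, assuming one plausible in-memory layout for the "original segment -> replacement segment, change frequency" records; the layout, names and default n are illustrative assumptions.

```python
def pt_recall(pt_table, original, n=5):
    """Recall the top-n candidate replacements for an original segment
    from a PT table stored as {original: {replacement: change_frequency}},
    ranked by change frequency in descending order."""
    candidates = pt_table.get(original, {})
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [segment for segment, _freq in ranked[:n]]
```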
301. Using the pre-trained segment scoring model, score each candidate segment in the candidate segment set respectively;
In this embodiment, a segment scoring model can be trained in advance to score each candidate segment in the candidate segment set. In this embodiment, for the same original segment, a candidate segment with a high score has a higher probability of being used to correct the original segment in the long text than a candidate segment with a low score. However, when correcting the long text, factors such as the fluency of the original segment with its context also need to be considered, so the original segment may not necessarily be replaced by the highest-scoring candidate segment in the finally obtained corrected text. The segment scoring model of this embodiment may adopt a GBRank network model.
For example, step 301 may specifically include the following steps:
(a3) Obtain the quality features of the original segment in the field of the long text and the quality features of each candidate segment in the candidate segment set in the field of the long text;
For example, obtaining the quality features of the original segment in the field of the long text may specifically include: obtaining the frequency with which the original segment occurs in the corpus of the field of the long text, and the frequency with which the combination of the original segment and the context segments occurs in that corpus.
Correspondingly, obtaining the quality features of each candidate segment in the candidate segment set in the field of the long text specifically includes: obtaining the frequency with which each candidate segment in the candidate segment set occurs in the corpus, and the frequency with which the combination of each candidate segment and the context segments occurs in the corpus.
In this embodiment, a context segment of the original segment is a segment immediately before or after the original segment in the long text; for details, refer to the related description of the embodiment shown in Fig. 2 above, which is not repeated here. Alternatively, considering that segments containing more words occur in the long text with lower probability, this embodiment may also stipulate that no context segments are taken when the original segment already contains 3 or more words. When context segments of the original segment are taken, obtaining the quality features of the original segment requires obtaining the frequency with which each of the original segment, the preceding segment plus the original segment, the original segment plus the following segment, and the preceding segment plus the original segment plus the following segment occurs in the corpus. The quality features of each candidate segment are obtained correspondingly and similarly; details are not described herein.
(b3) According to the quality features of the original segment in the field of the long text and the quality features of each candidate segment in the field of the long text, obtain the relative quality features between each candidate segment and the original segment;
For example, step (b3) may specifically include: according to the frequency with which the original segment occurs in the corpus, the frequency with which the combination of the original segment and the context segments occurs in the corpus, the frequency with which each candidate segment occurs in the corpus, and the frequency with which the combination of each candidate segment and the context segments occurs in the corpus, obtaining the ratio of the frequency of each candidate segment to that of the original segment in the corpus and the ratio of the frequency of the combination of each candidate segment and the context segments to that of the combination of the original segment and the context segments in the corpus, and/or the difference between the frequency of each candidate segment and that of the original segment in the corpus and the difference between the frequency of the combination of each candidate segment and the context segments and that of the combination of the original segment and the context segments in the corpus.
Specifically, by obtaining these frequency ratios and/or frequency differences, the compatibility of a candidate segment with the context segments can be captured: if the ratio of the frequency of a candidate segment to that of the original segment in the corpus is large, but the ratio of the frequency of the combination of the candidate segment and the context segments to that of the combination of the original segment and the context segments in the corpus is very small, it indicates that the candidate segment fits the context segments poorly and is not suitable for replacing the original segment, and vice versa.
Similarly, if candidate segment and the frequency difference that original segments occur in corpus are smaller, i.e., using probability difference not
It is more, but the combination of candidate segment and context segment, the combination with original segments and context segment occur in corpus
Frequency difference it is very big, illustrate the combination of candidate segment and context segment, than beginning segment and context segment combination pre-
Expect, it may be considered that candidate segment has very strong compatibility with context segment, candidate to may be employed using more frequently in storehouse
Segment replaces original segments, and vice versa.
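The relative quality features described above can be sketched as follows, assuming a simple dictionary mapping segment strings to corpus counts; all function and variable names here are illustrative, not from the patent, and `eps` is an assumed smoothing constant for zero counts.

```python
def relative_quality_features(counts, candidate, original, context=None, eps=1.0):
    """Frequency ratio and frequency difference features between a candidate
    segment and the original segment, optionally also for their combinations
    with the (left, right) context segments."""
    f_cand = counts.get(candidate, 0)
    f_orig = counts.get(original, 0)
    feats = {
        "seg_ratio": (f_cand + eps) / (f_orig + eps),  # candidate vs original
        "seg_diff": f_cand - f_orig,
    }
    if context is not None:
        left, right = context
        f_cand_ctx = counts.get(left + candidate + right, 0)
        f_orig_ctx = counts.get(left + original + right, 0)
        feats["ctx_ratio"] = (f_cand_ctx + eps) / (f_orig_ctx + eps)
        feats["ctx_diff"] = f_cand_ctx - f_orig_ctx
    return feats
```

Per the reasoning above, a candidate with a high `seg_ratio` but a very low `ctx_ratio` fits the context poorly and should not replace the original segment.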
It should be noted that if the original segment already contains 3 or more tokens, its context segments may not be taken. In that case, the frequency ratio and/or frequency difference between each candidate segment and the original segment in the corpus can be obtained solely from the frequencies with which the original segment and each candidate segment occur in the corpus, and used as the relative quality features of each candidate segment and the original segment. Compared with taking context segments as described above, the features obtained this way are less rich; therefore, in this embodiment, it is preferable to obtain the context segments.

In addition, it should be noted that when context segments are needed but the original segment is at the beginning or end of a sentence of the long text, the corresponding empty context segment can be represented by a default sentence-start or sentence-end feature, to ensure data alignment.
(c3) obtaining the relative historical behavior features of replacing the original segment with each candidate segment;

Since the PT table records historical modification information, the historical behavior features of this embodiment can be features related to the modification frequencies in the PT table. For example, step (c3) may specifically include the following steps:

(a4) obtaining the first modification frequency with which the original segment is modified into each candidate segment in the PT table;

(b4) obtaining the second modification frequency with which the combination of the original segment and the context segments is modified into the combination of each candidate segment and the context segments in the PT table;

(c4) according to the first modification frequency and the second modification frequency, obtaining a frequency ratio and/or a frequency difference, where the frequency ratio equals the second modification frequency divided by the first modification frequency, and the frequency difference equals the second modification frequency minus the first modification frequency.

In addition, it should be noted that if the original segment contains 3 or more tokens and the context segments are not taken, the relative historical behavior feature can be set to an empty or default feature symbol.
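Steps (a4)-(c4) can be sketched as below, assuming the PT table is available as a mapping from (source, target) pairs to modification counts; the representation and the `"<NONE>"` default symbol are assumptions for illustration.

```python
def historical_behavior_features(pt_counts, candidate, original, context=None):
    """First/second modification frequencies from a phrase-table-like
    mapping (source, target) -> count, plus the derived frequency ratio
    (second / first) and frequency difference (second - first).
    Falls back to a default feature symbol when no context is taken."""
    if context is None:
        return {"hist": "<NONE>"}          # default feature symbol
    first = pt_counts.get((original, candidate), 0)
    left, right = context
    second = pt_counts.get(
        (left + original + right, left + candidate + right), 0)
    ratio = second / first if first else 0.0
    return {"first": first, "second": second,
            "ratio": ratio, "diff": second - first}
```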
(d3) obtaining the semantic similarity features between each candidate segment and the original segment;

In this embodiment, a preset dictionary may be used to obtain the word vector of each candidate segment and the word vector of the original segment, and the cosine distance between the two word vectors is then calculated as the semantic similarity between the candidate segment and the original segment. Accordingly, if the original segment in this embodiment contains 3 or more tokens, only the semantic similarity between each candidate segment and the original segment is taken as the semantic similarity feature. If the original segment contains fewer than 3 tokens, the context segments of the original segment also need to be taken; in that case, the semantic similarity between the combination of each candidate segment with the context segments and the combination of the original segment with the context segments must also be obtained. Similarly, the word vectors of the combination of each candidate segment with the context segments and of the combination of the original segment with the context segments are obtained, and the cosine distance between the word vectors is calculated as the semantic similarity feature between the two combinations. Accordingly, the combinations of the original segment with context segments include three cases: the original segment plus the following segment, the original segment plus the preceding segment, and the original segment plus both the preceding and the following segment. The semantic similarity features of a candidate segment and the original segment then include: the semantic similarity between the candidate segment and the original segment; the semantic similarity between the combination of the candidate segment with the preceding segment and the combination of the original segment with the preceding segment; the semantic similarity between the combination of the candidate segment with the following segment and the combination of the original segment with the following segment; and the semantic similarity between the combination formed by splicing the preceding segment, the candidate segment and the following segment and the combination formed by splicing the preceding segment, the original segment and the following segment.
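The cosine-distance computation used for all of these semantic similarity features can be sketched as a small standalone function over plain word-vector lists (how the vectors are looked up from the preset dictionary is left out):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two word vectors, used here as the
    semantic similarity between a candidate segment and the original
    segment (or between their combinations with context segments)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```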
In addition, for the acquisition of the above relative quality features, relative historical behavior features and semantic similarity features of each candidate segment and the original segment, reference may also be made, respectively, to the acquisition of the relative quality features, relative historical behavior features and semantic similarity features of the target segment and the original segment in the embodiment shown in Fig. 2 above.
(e3) obtaining the score of each candidate segment according to the relative quality features of each candidate segment and the original segment, the relative historical behavior features of each candidate segment and the original segment, the semantic similarity features of each candidate segment and the original segment, and the segment scoring model.

The relative quality features, relative historical behavior features and semantic similarity features of each candidate segment and the original segment obtained in the above steps are input into the pre-trained segment scoring model, which can then predict the score of the candidate segment.

For example, during training of the segment scoring model, training original segments and training replacement segments can be collected as positive and negative examples: if the replacement is correct, the corresponding score is 1 and the training sample is a positive example; if the replacement is wrong, the corresponding score is 0 and the training sample is a negative example. The ratio of positive to negative examples in the training data is greater than 1, e.g., 5:1 or 4:1. Before training, initial values are set for the parameters of the segment scoring model; the training data is then input in sequence, and if the score predicted by the segment scoring model is inconsistent with the known score, the parameters of the segment scoring model are adjusted so that the predicted result converges toward the known result. In this way, the segment scoring model is trained continuously with tens of millions of pieces of training data until its predictions are consistent with the known results; the parameters of the segment scoring model are then determined, and thus the segment scoring model itself, and training is finished. The more training data used during training, the more accurate the trained segment scoring model, and the more accurate its subsequent predictions of candidate segment scores. Under this scheme, the predicted score lies between 0 and 1. In practical applications, the segment scoring model can also be set to output scores in other numerical ranges, e.g., between 0 and 100; the principle is similar and is not repeated here.
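The patent does not fix the model family for the segment scoring model; as one concrete stand-in, the iterative parameter adjustment described above (predict, compare with known score, adjust toward agreement) matches gradient training of a logistic regression, sketched minimally here with assumed feature vectors and labels:

```python
import math

def train_segment_scorer(samples, lr=0.1, epochs=200):
    """Minimal stand-in for the segment scoring model: logistic
    regression over (feature_vector, label) pairs, where label 1
    marks a correct replacement (positive example) and 0 a wrong
    one (negative example). Returns a scoring function in [0, 1]."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))     # predicted score
            g = p - y                          # log-loss gradient
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g

    def score(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))
    return score
```

As in the text, the positive examples outnumber the negatives (e.g., 4:1), and training stops once predictions agree with the known labels.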
Still further optionally, before scoring each candidate segment, the following steps may also be included: obtaining the proper-noun feature of each candidate segment according to a preset proper-noun dictionary; and/or obtaining the pinyin edit distance feature between each candidate segment and the original segment.

Specifically, the proper-noun feature of each candidate segment identifies whether the candidate segment is a proper noun. For example, whether a candidate segment is a proper noun is judged according to the proper-noun dictionary; if it is, the corresponding proper-noun feature is 1, otherwise the corresponding proper-noun feature is 0. Accordingly, if the candidate segment is a proper noun, the segment scoring model outputs a higher score for it; if it is not a proper noun, the corresponding output score is lower. The pinyin edit distance between a candidate segment and the original segment is specifically the number of letters in the pinyin that need to be adjusted to edit the pronunciation of the candidate segment into the pronunciation of the original segment. Accordingly, the larger the pinyin edit distance between a candidate segment and the original segment, the smaller the probability that the candidate segment should replace the original segment, and the smaller the score the segment scoring model outputs for the candidate segment; conversely, the smaller the pinyin edit distance between a candidate segment and the original segment, the larger the probability that the candidate segment should replace the original segment, and the larger the score the segment scoring model outputs for it.

Based on the above principles, step (e1) can correspondingly include: obtaining the score of each candidate segment according to the relative quality features, relative historical behavior features and semantic similarity features of each candidate segment and the original segment together with the segment scoring model, combined with the proper-noun feature of each candidate segment and the pinyin edit distance feature between each candidate segment and the original segment. Correspondingly, when training the segment scoring model, the proper-noun features of the training replacement segments and the pinyin edit distance features between the training original segments and the training replacement segments in the training data also need to be obtained, and the segment scoring model is trained together with the features described above.
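The pinyin edit distance feature described above — the number of letter adjustments needed to turn one pronunciation into the other — is a standard Levenshtein distance over pinyin strings, sketched here:

```python
def pinyin_edit_distance(p1, p2):
    """Levenshtein distance between two pinyin strings: the number of
    letter insertions, deletions and substitutions needed to edit one
    pronunciation into the other."""
    m, n = len(p1), len(p2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p1[i - 1] == p2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]
```

A candidate whose pinyin distance to the original segment is small (e.g., "zhonghua" vs. "zhongh") is phonetically plausible and should score higher.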
302. According to the score of each candidate segment, obtaining, by decoding, the target segment corresponding to each original segment from the candidate segment set of each original segment of the long text that needs correction, so as to obtain the corrected text of the long text.

Finally, based on the score of each candidate segment, the target segment of each original segment is obtained from the candidate segment set of each original segment that needs correction, yielding the corrected text of the long text. For example, the candidate segment with the highest score can be taken directly as the target segment. If the candidate segment with the second-highest score fits the context in the long text better, that candidate segment can instead be taken as the target segment in the corrected text. The corrected text can also be obtained in other ways.
For example, after segment recall is performed for each of the different original segments in the long text, each original segment obtains multiple candidate segment results, so there are many possible combinations of the candidate segments corresponding to the different original segments, forming a segment candidate network. For instance, if a long text includes original segments A, B and C, where original segment A corresponds to candidate segments 1, 2 and 3, original segment B corresponds to candidate segments 4, 5 and 6, and original segment C corresponds to candidate segments 7, 8 and 9, then the candidate segments of each original segment may all be used to replace it: candidate segment 1 can be combined with candidate segment 4, 5 or 6, candidate segment 2 can likewise be combined with candidate segment 4, 5 or 6, and candidate segment 3 can likewise be combined with candidate segment 4, 5 or 6, forming the segment candidate network. A decoding algorithm can then be applied to obtain the optimal candidate segment corresponding to each original segment from the segment candidate network, yielding the optimal corrected text. Decoding algorithms can include, but are not limited to, the Viterbi algorithm (viterbi), beam search, and greedy search.
Alternatively, step 302 may specifically include the following steps: for each original segment, obtaining at least two preselected segments corresponding to the original segment from its candidate segment set according to the scores of the candidate segments in the set; then obtaining, by decoding, the target segment corresponding to each original segment from the at least two preselected segments corresponding to each original segment of the long text that needs correction, so as to obtain the corrected text of the long text.

Specifically, if each original segment corresponds to many candidate segments, the candidate segments with the higher scores can be taken as preselected segments in descending order of score; the target segment corresponding to each original segment is then obtained by decoding from the at least two preselected segments corresponding to each original segment of the long text that needs correction, yielding the corrected text of the long text.
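The decoding step over the segment candidate network can be sketched as a Viterbi pass. The per-segment `unary` score comes from the segment scoring model; the pairwise `transition` score between adjacent choices is an assumption here (the patent only names Viterbi, beam search and greedy search as options, without fixing how adjacent candidates are scored jointly):

```python
def viterbi_decode(candidates, unary, transition):
    """Viterbi decoding over the segment candidate network.
    `candidates[i]` lists the candidate segments for original segment i,
    `unary(seg)` is that segment's model score, and `transition(prev, cur)`
    scores how well adjacent choices combine. Returns the best sequence."""
    # best[s] = (accumulated score, best path ending in s)
    best = {s: (unary(s), [s]) for s in candidates[0]}
    for layer in candidates[1:]:
        nxt = {}
        for cur in layer:
            score, path = max(
                (ps + transition(p, cur) + unary(cur), p_path + [cur])
                for p, (ps, p_path) in best.items()
            )
            nxt[cur] = (score, path)
        best = nxt
    return max(best.values())[1]
```

With the A/B/C example above, the network layers would be `[["1","2","3"], ["4","5","6"], ["7","8","9"]]`, and the decoder picks one candidate per layer maximizing the total score.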
The artificial-intelligence-based long text error correction method of this embodiment can correct erroneous segments in a long text and effectively improve the editing quality of the long text. The technical solution of this embodiment is proposed for the long text error correction scenario; it can be applied to correction behavior in text scenarios and can output correction results quickly and effectively, with high correction efficiency, thereby helping to improve the content production quality of long texts and enhancing the user experience.
Fig. 5 is a flowchart of embodiment two of the artificial-intelligence-based long text error correction method of the present invention. As shown in Fig. 5, the method of this embodiment, on the basis of the technical solution of the embodiment shown in Fig. 3 above, further adds edit distance (ED) segment recall for the original segments that need correction, and describes the technical solution of the present invention in detail. As shown in Fig. 5, the artificial-intelligence-based long text error correction method of this embodiment may specifically include the following steps:

400. Judging, according to a proper-noun dictionary, whether each original segment in the long text is a proper noun; if so, performing step 401; otherwise, performing step 402;

401. Determining that the original segment included in the long text is a proper noun, in which case the original segment does not need correction, and ending;

402. Determining that the long text contains original segments that are not proper nouns, and determining that the non-proper-noun original segments in the long text need correction; performing step 403;

403. Performing PT segment recall on the original segments that need correction according to the PT table preset for the field of the long text, obtaining the candidate segment set of each original segment, the candidate segment set including multiple candidate segments; performing step 404;

The specific implementation of steps 400-403 may refer to the description of the embodiment shown in Fig. 3 above and is not repeated here.
404. Obtaining the frequency with which the original segment occurs in the corpus corresponding to the field of the long text, the frequency with which the combination of the original segment and the context segments occurs in the corpus, the modification frequency of the original segment in the PT table, the modification frequency of the combination of the original segment and the context segments in the PT table, and the semantic similarity between the original segment and the context segments; performing step 405;

Similarly, for the combination of the original segment and the context segments in this embodiment, reference may be made to the related description of the embodiment shown in Fig. 1 above, which is not repeated here. The frequency with which the original segment occurs in the corpus corresponding to the field of the long text can be obtained by counting the occurrences of the original segment in the corpus. The modification frequency of the original segment in the PT table can be the total number of times the original segment is replaced in the PT table by segments other than itself, e.g., the total number of times "青华" is replaced, such as "青华" being replaced by "清华" (Tsinghua) and "青华" being replaced by "青花" (blue-and-white). The modification frequency of the combination of the original segment and the context segments in the PT table can likewise be the total number of times that combination is replaced in the PT table by segments other than itself, e.g., the total number of times "青华大" is replaced by "清华大" (as in "Tsinghua University") and by all other replacement segments.

The semantic similarity between the original segment and the context segments in this embodiment can specifically be obtained by taking the word vector of the original segment and the word vector of the context segments, and calculating the cosine similarity between the word vector of the original segment and the word vector of the context segments, where the word vector of the context segments is the word vector of the combination of the preceding segment and the following segment. Alternatively, in this embodiment, the semantic similarity between the original segment and all other segments of the long text besides the original segment can be used in place of the semantic similarity between the original segment and the context segments, forming a new alternative.
405. Obtaining the confidence of the original segment according to the frequency with which the original segment occurs in the corpus corresponding to the field of the long text, the frequency with which the combination of the original segment and the context segments in the long text occurs in the corpus, the modification frequency of the original segment in the PT table, the modification frequency of the combination of the original segment and the context segments in the PT table, the semantic similarity between the original segment and the context segments, and a preset language fluency scoring model; performing step 406;
For example, in this embodiment step 405 specifically includes the following two implementations:

In the first implementation, confidence is judged using a confidence scoring model, which may specifically include the following steps:

(a5) predicting the fluency of the original segment according to the frequency with which the original segment occurs in the corpus corresponding to the field of the long text, the frequency with which the combination of the original segment and the context segments in the long text occurs in the corpus, and the language fluency scoring model;

The language fluency scoring model of this embodiment is used to score the fluency of the original segments in the long text. Given the frequency with which the original segment occurs in the corpus corresponding to the field of the long text and the frequency with which the combination of the original segment and the context segments in the long text occurs in the corpus, the language fluency scoring model can predict the fluency of the original segment. For example, the fluency score can lie between 0 and 1, where a larger value means more fluent and a smaller value means less fluent. Alternatively, other numerical ranges, such as 0-100, can be used to represent the fluency ordering.

The language fluency scoring model of this embodiment can also be obtained through training in advance. For example, several pieces of training data are collected in advance, each corresponding to a training long text and including the frequency with which a training original segment in the training long text occurs in the corpus, the frequency with which the combination of the training original segment and the training context segments in the training long text occurs in the corpus, and the known fluency of the training original segment. The collected training data can include positive examples with a known fluency of 1 and negative examples with a known fluency of 0. The ratio of positive to negative examples can be greater than 1, preferably 5:1 or 4:1. Before training, initial values are set for the parameters of the language fluency scoring model. During training, each piece of training data is input into the language fluency scoring model in turn; the model predicts a fluency for the training data, and it is then judged whether the predicted fluency is consistent with the known fluency. If not, the parameters of the language fluency scoring model are adjusted so that the predicted fluency converges toward the known fluency. In this way, the language fluency scoring model is trained continuously with tens of millions of pieces of training data until the predicted fluency is consistent with the known fluency; the parameters of the language fluency scoring model, and thus the model itself, are then determined, and training of the language fluency scoring model is finished.
(b5) obtaining the confidence of the original segment according to the fluency of the original segment, the modification frequency of the original segment in the PT table, the modification frequency of the combination of the original segment and the context segments in the PT table, and the semantic similarity between the original segment and the context segments, combined with a pre-trained confidence scoring model;

Similarly, in this embodiment a confidence scoring model is also trained in advance, and is used to obtain the confidence of the original segment. The confidence can be set between 0 and 1 in this embodiment: a larger confidence value means higher confidence, and a smaller confidence value means lower confidence. In practical applications, the confidence can also be set to other numerical ranges, such as between 0 and 100. In use, the fluency of the original segment, the modification frequency of the original segment in the PT table, the modification frequency of the combination of the original segment and the context segments in the PT table, and the semantic similarity between the original segment and the context segments are input into the trained confidence scoring model, which can output the confidence of the original segment.

Similarly, the confidence scoring model of this embodiment can also be obtained through training in advance. For example, several pieces of training data are collected in advance, each including the fluency of a training original segment, the modification frequency of the training original segment in the PT table, the modification frequency of the combination of the training original segment and the training context segments in the PT table, the semantic similarity between the training original segment and the training context segments, and the known confidence corresponding to the training original segment; each parameter is obtained in the same way as described in the above embodiment. The collected training data can include positive examples with a known confidence of 1 and negative examples with a known confidence of 0. The ratio of positive to negative examples can be greater than 1, preferably 5:1 or 4:1. Before training, initial values are set for the parameters of the confidence scoring model. During training, each piece of training data is input into the confidence scoring model in turn; the model predicts a confidence for the training data, and it is then judged whether the predicted confidence is consistent with the known confidence. If not, the parameters of the confidence scoring model are adjusted so that the predicted confidence converges toward the known confidence. In this way, the confidence scoring model is trained continuously with tens of millions of pieces of training data until the predicted confidence is consistent with the known confidence; the parameters of the confidence scoring model, and thus the model itself, are then determined, and training of the confidence scoring model is finished.

It should also be noted that, for the training and prediction of all the models involved in this embodiment, the feature data input into the models can first be normalized in advance; the manner of normalization is not limited.
In the second implementation, confidence is judged using thresholds, which may specifically include the following steps:

(a6) predicting the fluency of the original segment according to the frequency with which the original segment occurs in the corpus corresponding to the field of the long text, the frequency with which the combination of the original segment and the context segments in the long text occurs in the corpus, and the language fluency scoring model;

The implementation of step (a6) is identical to that of step (a5) above; for details, refer to the description of step (a5), which is not repeated here.

(b6) judging, respectively, whether the fluency of the original segment is greater than a preset fluency threshold, whether the modification frequency of the original segment in the PT table and the modification frequency of the combination of the original segment and the context segments in the PT table are both greater than a preset frequency threshold, and whether the semantic similarity between the original segment and the context segments is greater than a preset similarity threshold; if so, setting the confidence of the original segment to be greater than a preset confidence threshold; otherwise, setting the confidence of the original segment to be less than or equal to the preset confidence threshold.

In this embodiment, corresponding thresholds, such as the fluency threshold, the frequency threshold and the similarity threshold, are preset for the fluency of the original segment, the modification frequency of the original segment in the PT table, the modification frequency of the combination of the original segment and the context segments in the PT table, and the semantic similarity between the original segment and the context segments. Each parameter is then judged against its corresponding threshold. If every parameter is greater than its corresponding threshold, the confidence can be considered large; the confidence can be set to be greater than the preset confidence threshold, and it can then be determined that the original segment does not need ED recall. Otherwise, if even one parameter is not greater than its corresponding threshold, the confidence can be considered small; the confidence can be set to be less than the preset confidence threshold, and it can then be determined that the original segment needs ED recall. The confidence threshold of this embodiment can be preset to an appropriate value according to practical experience.
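Step (b6) can be sketched as a pure threshold check; all threshold values here are illustrative placeholders, since the text says they are to be set from practical experience:

```python
def confidence_by_thresholds(fluency, seg_mod_freq, ctx_mod_freq, sem_sim,
                             fluency_th=0.5, freq_th=5, sim_th=0.5,
                             conf_th=0.5):
    """Every parameter must exceed its threshold for the original
    segment to be trusted (confidence above the confidence threshold,
    no ED recall); otherwise the segment is flagged for ED recall.
    Returns (confidence, needs_ed_recall)."""
    trusted = (fluency > fluency_th
               and seg_mod_freq > freq_th
               and ctx_mod_freq > freq_th
               and sem_sim > sim_th)
    # Place the confidence strictly above or below the threshold.
    confidence = conf_th * 1.5 if trusted else conf_th * 0.5
    needs_ed_recall = confidence <= conf_th
    return confidence, needs_ed_recall
```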
406. Judging whether the confidence of the original segment is greater than the preset confidence threshold; if so, determining that the original segment does not need ED segment recall, and performing step 408; otherwise, performing step 407;

407. Determining that the original segment needs ED segment recall; performing ED segment recall on the original segment according to its pronunciation, using the corpus of the field of the long text and/or the input prompt information provided by a pinyin input method for the original segment, and appending the recalled candidate segments to the candidate segment set; performing step 408;
The ED recall of this embodiment recalls candidate segments from the phonetic annotation string of the original segment, i.e., its pinyin, by the method of mixed deletion of initials and finals. The candidate segments recalled can come from the corpus: high-frequency segments are taken according to the pinyin of the original segment, annotated with pinyin after mixed deletion of initials and finals, and an inverted index is built by pinyin. For example, for "中华" ("China"), whose pinyin is "zhonghua", the recall is expanded by performing partial deletions of initials and finals for indexing, generating key-value pairs such as {"zhonghua", "zhhua", "onghua", "zhongua", "zhongh"} --> {"中华"}. Corresponding candidate segments are then recalled from the corpus according to "zhonghua", "zhhua", "onghua", "zhongua" and "zhongh". Since the pinyin "zhonghua" is complete, it easily recalls the corresponding candidate segment, while "zhhua", "onghua", "zhongua" and "zhongh" can recall candidate segments of the corresponding pinyin by supplementing initials or finals. Therefore, the candidate segments recalled by ED have pronunciations identical or similar to that of the original segment.
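The key expansion in the "中华" example can be sketched as below, assuming the pinyin is already split into per-character (initial, final) pairs; this is a simplified reading of the example (one initial or one final deleted at a time), not a full specification of the patent's deletion scheme:

```python
def ed_recall_keys(syllables):
    """Generate inverted-index keys for ED recall from per-character
    pinyin split into (initial, final) pairs: the full pinyin plus
    every variant with one initial or one final deleted."""
    full = "".join(i + f for i, f in syllables)
    keys = {full}
    for k in range(len(syllables)):
        # variant with the initial of syllable k deleted
        keys.add("".join(("" if j == k else i) + f
                         for j, (i, f) in enumerate(syllables)))
        # variant with the final of syllable k deleted
        keys.add("".join(i + ("" if j == k else f)
                         for j, (i, f) in enumerate(syllables)))
    return keys
```

For "zhonghua" = [("zh", "ong"), ("h", "ua")], this reproduces exactly the key set shown in the example above.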
In addition, the candidate segments recalled by ED in this embodiment may also come from the recall results of a pinyin input method; specifically, the input prompt information provided by the pinyin input method for the original segment can be used. According to users' common typing habits, recall is performed in the initial-final order of the current word, e.g. "zhonghua", "zhongh", "zhhua", to obtain the pinyin input method candidate word list. In practical applications, fuzzy sounds can also be introduced to expand the recall results. Fig. 6 is an exemplary diagram of the fuzzy sound mapping table provided in this embodiment; as shown in Fig. 6, it gives some of the fuzzy sounds. When recalling candidate segments according to the pinyin input method, the fuzzy sounds shown in Fig. 6 can be combined to expand the recall results.
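Purely as an illustration, the mixed initial-and-final deletion described above can be sketched as follows. This is a reconstruction under assumptions, not the embodiment's implementation: the initial inventory, the toy data, and the normalization of the key written "zhong h" in the text to "zhongh" are all assumptions of this sketch.

```python
from collections import defaultdict

# Mandarin pinyin initials; multi-letter initials must be matched first.
INITIALS = sorted(["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
                   "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s",
                   "y", "w"], key=len, reverse=True)

def split_syllable(syl):
    """Split one pinyin syllable into (initial, final); the initial may be empty."""
    for ini in INITIALS:
        if syl.startswith(ini):
            return ini, syl[len(ini):]
    return "", syl

def deletion_keys(syllables):
    """The full pinyin plus every variant with one initial or one final deleted."""
    keys = {"".join(syllables)}
    for i, syl in enumerate(syllables):
        ini, fin = split_syllable(syl)
        if ini:  # delete the initial of this syllable
            keys.add("".join(syllables[:i] + [fin] + syllables[i + 1:]))
        if fin:  # delete the final of this syllable
            keys.add("".join(syllables[:i] + [ini] + syllables[i + 1:]))
    return keys

def build_inverted_index(segment_pinyin):
    """Map every deletion key of each high-frequency segment back to the segment."""
    index = defaultdict(set)
    for segment, syllables in segment_pinyin.items():
        for key in deletion_keys(syllables):
            index[key].add(segment)
    return index

index = build_inverted_index({"中华": ["zhong", "hua"]})
print(sorted(index))
# → ['onghua', 'zhhua', 'zhongh', 'zhonghua', 'zhongua']
```

At recall time, the same deletion keys are generated from the original segment's pinyin and looked up in the index, so a segment whose pinyin differs by one initial or final is still reachable.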
408. Using the pre-trained segment scoring model, score each candidate segment in the candidate segment set respectively; perform step 409.
409. According to the scores of the candidate segments in the candidate segment set, obtain at least two preselected segments corresponding to the original segment from the candidate segment set; perform step 410.
410. By decoding, obtain the target fragment corresponding to each original segment from the at least two preselected segments corresponding to each original segment of the long text that needs error correction, so as to obtain the corrected text of the long text; perform step 411.
For the specific implementation of steps 408-410, reference may be made to the related description of the embodiment shown in Fig. 3 above, which is not repeated here.
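The "decoding" of step 410 can be pictured as a Viterbi-style search over the preselected segments: one candidate is chosen per original segment so that the combined score of the per-segment model scores and an adjacent-segment compatibility score is maximal. This is a minimal sketch under assumptions — the embodiment's actual scoring and decoding follow the Fig. 3 embodiment, and the `bigram` compatibility function here is a stand-in:

```python
def decode(preselected, bigram):
    """preselected: per position, a list of (candidate, model_score) pairs.
    bigram(prev, cur): compatibility score of adjacent choices.
    Returns the jointly best candidate sequence."""
    # best[i][cand] = (best path score ending in cand at position i, backpointer)
    best = [{c: (s, None) for c, s in preselected[0]}]
    for i in range(1, len(preselected)):
        layer = {}
        for cand, s in preselected[i]:
            prev = max(best[i - 1],
                       key=lambda p: best[i - 1][p][0] + bigram(p, cand))
            layer[cand] = (best[i - 1][prev][0] + bigram(prev, cand) + s, prev)
        best.append(layer)
    # backtrack from the best final candidate
    cand = max(best[-1], key=lambda c: best[-1][c][0])
    path = [cand]
    for i in range(len(best) - 1, 0, -1):
        cand = best[i][cand][1]
        path.append(cand)
    return path[::-1]
```

A joint decode of this kind can override a locally best candidate when the neighboring choice fits better, which is why the method keeps at least two preselected segments per position instead of committing to the top-scored one.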
411. Perform error-correction intervention on the corrected segments in the corrected text, determine the final corrected text, and end.
For example, in this embodiment, the error-correction intervention performed on the corrected segments in the corrected text specifically includes at least one of the following:
judging whether the target fragment in the corrected text and the corresponding original segment hit an error-correction pair in a preset blacklist, and if hit, restoring the target fragment to the original segment; and
judging whether the target fragment in the corrected text and the corresponding original segment are synonyms, and if so, restoring the target fragment to the original segment.
The blacklist in this embodiment can be collected from mistaken corrections made previously. For example, after an original segment has been corrected to a certain target fragment, if the user restores the target fragment back to the original segment according to the correction result, a mistaken correction can be determined. The target fragment and the original segment can then be collected to form an error-correction pair. In practical applications, several such error-correction pairs can be collected to form the blacklist, and the corrected segments in the corrected text are intervened on according to the blacklist: for example, it is detected whether the corrected target fragment and the original segment are an error-correction pair; if so, the target fragment is restored to the original segment; otherwise the corrected text is retained.
In addition, long-text error correction mainly corrects erroneous information and does not need to correct synonyms. In this embodiment, a synonym table can also be stored in advance, storing each word segment and its corresponding synonym segments. Whether the corrected target fragment and the corresponding original segment are synonyms is then detected according to the synonym table; if they are, the target fragment is restored to the original segment; otherwise the corrected text is retained.
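The two interventions of step 411 can be sketched together under assumed data shapes: a blacklist of (original segment, target fragment) error-correction pairs and a synonym table both cause the target fragment to be rolled back to the original segment. The toy entries below are illustrative only:

```python
# Known mistaken corrections, collected from users restoring a correction.
BLACKLIST = {("中华", "中花")}            # (original segment, target fragment)
# Synonym table: each word segment mapped to its synonym segments.
SYNONYMS = {"快速": {"迅速"}, "迅速": {"快速"}}

def intervene(original, target):
    """Return the segment that should appear in the final corrected text."""
    if (original, target) in BLACKLIST:
        return original                       # hit a blacklisted error-correction pair
    if target in SYNONYMS.get(original, ()):  # synonyms need no "correction"
        return original
    return target                             # otherwise retain the correction
```

Running `intervene` over every corrected segment of the corrected text yields the final corrected text of step 411.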
Fig. 7 is a schematic diagram of an error-correction result of the artificial-intelligence-based long-text error correction method of this embodiment. As shown, after the artificial-intelligence-based long-text error correction method of this embodiment is used to correct the long text "this Shi Fugan's is faster and better", the available corrected text is "this master does faster and better", which shows that the technical solution of this embodiment can perform error correction on long text with high quality.
The artificial-intelligence-based long-text error correction method of this embodiment can correct the erroneous segments in long text and effectively improve the editing quality of long text. The technical solution of this embodiment is proposed for the long-text error correction scene; it is applicable to error-correction behavior in the text scene, can output error-correction results quickly and effectively with high error-correction efficiency, and can help improve the content production quality of long text and enhance the user experience. Moreover, the technical solution of this embodiment can also continue to perform erroneous-segment replacement intervention to further optimize the error-correction result.
The embodiments shown in Fig. 3 and Fig. 5 above are the long-text error correction scene to which the text error correction scheme of the present invention is applied. In practical applications, the embodiments shown in Fig. 3 and Fig. 5 can be used after the embodiments shown in Fig. 1 and Fig. 2 above, so that the segment scoring model is incrementally trained according to the feedback information of the corrected text and the target fragment and original segment in the corrected text, thereby further improving the precision of the prediction score of the segment scoring model.
Fig. 8 is a structural diagram of embodiment one of the artificial-intelligence-based text error correction device of the present invention. As shown in Fig. 8, the artificial-intelligence-based text error correction device of this embodiment may specifically include:
a segment data obtaining module 10, configured to obtain the target fragment corrected in the corrected text and the original segment corresponding to the target fragment in the original text, the target fragment being selected from multiple candidate segments of the original segment when correction processing is performed on the original text based on the pre-trained segment scoring model;
a feedback information acquisition module 11, configured to obtain the user's feedback information on the objective result fed back based on the corrected text;
an incremental training module 12, configured to perform incremental training on the segment scoring model according to the target fragment and original segment obtained by the segment data obtaining module 10 and the feedback information obtained by the feedback information acquisition module 11;
a correction module 13, configured to perform correction processing on subsequent original text based on the segment scoring model trained by the incremental training module 12.
By using the above modules, the artificial-intelligence-based text error correction device of this embodiment realizes the artificial-intelligence-based text correction processing; its realization principle and technical effect are the same as those of the above related method embodiments, and for details reference may be made to the description of the above related method embodiments, which is not repeated here.
Fig. 9 is a structural diagram of embodiment two of the artificial-intelligence-based text error correction device of the present invention. As shown in Fig. 9, on the basis of the technical solution of the embodiment shown in Fig. 8, the artificial-intelligence-based text error correction device of this embodiment may further include the following technical solutions.
As shown in Fig. 9, in the artificial-intelligence-based text error correction device of this embodiment, the incremental training module 12 specifically includes:
a relative characteristic information acquisition unit 121, configured to obtain the relative characteristic information between the target fragment and the original segment obtained by the segment data obtaining module 10;
a determination unit 122, configured to determine the ideal score of the target fragment according to the feedback information obtained by the feedback information acquisition module 11;
a training unit 123, configured to train the segment scoring model according to the relative characteristic information obtained by the relative characteristic information acquisition unit 121 and the ideal score of the target fragment determined by the determination unit 122.
Still further optionally, in the artificial-intelligence-based text error correction device of this embodiment, the relative characteristic information acquisition unit 121 is configured to perform at least one of the following operations:
obtaining the relative quality features between the target fragment and the original segment obtained by the segment data obtaining module 10;
obtaining the relative historical behavior features between the target fragment and the original segment obtained by the segment data obtaining module 10; and
obtaining the semantic similarity features between the target fragment and the original segment obtained by the segment data obtaining module 10.
Still further optionally, the relative characteristic information acquisition unit 121 is specifically configured to:
obtain the frequency with which the original segment obtained by the segment data obtaining module 10 occurs in the corpus, and the frequency with which the combination of the original segment and the context segment in the original text occurs together in the corpus;
obtain the frequency with which the target fragment obtained by the segment data obtaining module 10 occurs in the corpus, and the frequency with which the combination of the target fragment and the context segment occurs together in the corpus;
according to the frequency with which the original segment occurs in the corpus, the frequency with which the combination of the original segment and the context segment occurs together in the corpus, the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context segment occurs in the corpus, obtain the frequency ratio of the target fragment to the original segment in the corpus and the frequency ratio of the combination of the target fragment and the context segment to the combination of the original segment and the context segment in the corpus, and/or the frequency difference between the target fragment and the original segment in the corpus and the frequency difference between the combination of the target fragment and the context segment and the combination of the original segment and the context segment in the corpus.
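The relative quality features described above reduce to ratios and differences of four corpus frequencies. A minimal sketch under assumptions — the direction of each ratio and difference (target relative to original) and the smoothing constant are choices of this sketch, not stated by the embodiment:

```python
def relative_quality_features(f_orig, f_orig_ctx, f_tgt, f_tgt_ctx, eps=1e-9):
    """f_orig / f_tgt: corpus frequency of the original segment / target fragment alone;
    f_orig_ctx / f_tgt_ctx: frequency of the segment combined with its context segment."""
    return {
        "seg_freq_ratio": f_tgt / (f_orig + eps),       # target vs. original, alone
        "ctx_freq_ratio": f_tgt_ctx / (f_orig_ctx + eps),  # target vs. original, in context
        "seg_freq_diff": f_tgt - f_orig,
        "ctx_freq_diff": f_tgt_ctx - f_orig_ctx,
    }
```

Intuitively, a target fragment that is much more frequent than the original segment, especially together with the surrounding context, is better supported by the corpus.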
Still further optionally, the relative characteristic information acquisition unit 121 is specifically configured to:
obtain, from the PT table, the first modification frequency with which the original segment obtained by the segment data obtaining module 10 is modified into the target fragment obtained by the segment data obtaining module 10;
obtain, from the PT table, the second modification frequency with which the combination of the original segment and the context segment obtained by the segment data obtaining module 10 is modified into the combination of the target fragment and the context segment;
according to the first modification frequency and the second modification frequency, obtain a frequency ratio and/or a frequency difference, where the frequency ratio is equal to the second modification frequency divided by the first modification frequency, and the frequency difference is equal to the second modification frequency minus the first modification frequency.
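The two formulas above (ratio = second frequency / first frequency, difference = second minus first) can be sketched directly; the PT table is assumed here to be a mapping from (before, after) modification pairs to counts, and the zero-count guard is an assumption of this sketch:

```python
def historical_behavior_features(pt_table, orig, tgt, orig_ctx, tgt_ctx):
    """pt_table: {(before, after): how often users made that modification}."""
    first = pt_table.get((orig, tgt), 0)           # segment-level modification frequency
    second = pt_table.get((orig_ctx, tgt_ctx), 0)  # segment-with-context modification frequency
    ratio = second / first if first else 0.0       # guard: ratio undefined when first == 0
    return {"modify_freq_ratio": ratio,            # second / first, as defined above
            "modify_freq_diff": second - first}    # second - first, as defined above
```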
Still further optionally, the relative characteristic information acquisition unit 121 is specifically configured to:
obtain the semantic similarity between the target fragment and the original segment obtained by the segment data obtaining module 10; and/or
obtain the semantic similarity between the combination of the target fragment and the context segment and the combination of the original segment and the context segment obtained by the segment data obtaining module 10.
Still further optionally, the relative characteristic information acquisition unit 121 is additionally configured to perform at least one of the following:
according to a preset proper-noun dictionary, obtain the proper-noun features of the original segment and the target fragment respectively; and
obtain the pinyin edit distance feature between the target fragment and the original segment.
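The pinyin edit distance feature can be sketched as the Levenshtein distance between the pinyin strings of the target fragment and the original segment; the pinyin conversion itself is assumed to exist upstream, so plain strings are taken as input here:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]

print(edit_distance("zhonghua", "zhongguo"))  # → 2
```

A small distance between the pinyin strings means the two segments sound alike, which supports a correction hypothesis such as a homophone typo.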
Still further optionally, in the artificial-intelligence-based text error correction device of this embodiment, the determination unit 122 is specifically configured to:
according to the feedback information obtained by the feedback information acquisition module 11, infer whether the user accepts replacing the original segment with the target fragment in the corrected text;
if it is inferred that the user accepts, set the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept, set the ideal score of the target fragment to 0.
Still further optionally, in the artificial-intelligence-based text error correction device of this embodiment, the training unit 123 is specifically configured to:
input the relative characteristic information into the segment scoring model to obtain the prediction score of the segment scoring model;
obtain the magnitude relationship between the prediction score and the ideal score;
if the prediction score is less than the ideal score, adjust the parameters of the segment scoring model so that the prediction score output by the segment scoring model changes in an increasing direction;
if the prediction score is greater than the ideal score, adjust the parameters of the segment scoring model so that the prediction score output by the segment scoring model changes in a decreasing direction.
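The two adjustment directions above fall out naturally from a gradient step on a scoring model. A minimal sketch assuming a linear model with a sigmoid output — the model form, features, and learning rate are assumptions of this sketch, not the embodiment's model: with ideal score 1 (accepted) or 0 (rejected), the factor (ideal - prediction) is positive exactly when the prediction is below the ideal score, raising the output, and negative when it is above, lowering it.

```python
import math

def predict(weights, features):
    """Prediction score in (0, 1): sigmoid of a weighted feature sum."""
    return 1.0 / (1.0 + math.exp(-sum(w * f for w, f in zip(weights, features))))

def incremental_step(weights, features, ideal, lr=0.1):
    """One incremental training update from a single feedback sample."""
    pred = predict(weights, features)
    # (ideal - pred) > 0 when pred < ideal: the update increases the score;
    # (ideal - pred) < 0 when pred > ideal: the update decreases it.
    return [w + lr * (ideal - pred) * f for w, f in zip(weights, features)]
```

Because each user feedback yields one (features, ideal score) pair, updates of this kind can be applied sample by sample, which is what makes the training incremental.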
By using the above modules, the artificial-intelligence-based text error correction device of this embodiment realizes the artificial-intelligence-based text correction processing; its realization principle and technical effect are the same as those of the above related method embodiments, and for details reference may be made to the description of the above related method embodiments, which is not repeated here.
Fig. 10 is a structural diagram of the computer equipment embodiment of the present invention. As shown in Fig. 10, the computer equipment of this embodiment includes: one or more processors 30 and a memory 40, the memory 40 being configured to store one or more programs; when the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 implement the information processing method of the embodiments shown in Fig. 1 to Fig. 7 above. The embodiment shown in Fig. 10 takes the inclusion of multiple processors 30 as an example.
For example, Fig. 11 is an exemplary diagram of a computer device provided by the present invention. Fig. 11 shows a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention. The computer device 12a shown in Fig. 11 is only an example and should not impose any restriction on the function and scope of use of the embodiments of the present invention.
As shown in Fig. 11, the computer device 12a takes the form of a general-purpose computing device. The components of the computer device 12a may include but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a connecting different system components (including the system memory 28a and the processors 16a).
The bus 18a represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include but are not limited to the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12a typically includes a variety of computer-system-readable media. These media may be any usable media that can be accessed by the computer device 12a, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28a may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30a and/or cache memory 32a. The computer device 12a may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34a may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 11, commonly referred to as a "hard disk drive"). Although not shown in Fig. 11, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18a through one or more data media interfaces. The system memory 28a may include at least one program product having a set of (for example, at least one) program modules, and these program modules are configured to perform the functions of the embodiments of Figs. 1-9 of the present invention described above.
A program/utility 40a having a set of (at least one) program modules 42a may be stored, for example, in the system memory 28a; such program modules 42a include but are not limited to an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 42a generally perform the functions and/or methods of the embodiments of Figs. 1-9 of the present invention described above.
The computer device 12a may also communicate with one or more external devices 14a (such as a keyboard, a pointing device, a display 24a, etc.), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12a to communicate with one or more other computing devices. Such communication may be carried out through input/output (I/O) interfaces 22a. Moreover, the computer device 12a may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20a. As shown in the figure, the network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 16a executes various functional applications and data processing by running the programs stored in the system memory 28a, for example, implementing the artificial-intelligence-based text error correction method shown in the above embodiments.
The present invention also provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the artificial-intelligence-based text error correction method shown in the above embodiments is realized.
The computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a of the embodiment shown in Fig. 11 above.
With the development of technology, the propagation path of computer programs is no longer limited to tangible media; they can also be downloaded directly from a network or obtained in other ways. Therefore, the computer-readable medium in this embodiment may include not only tangible media but also intangible media.
The computer-readable medium of this embodiment may employ any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof; the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be realized in other ways. For example, the apparatus embodiments described above are merely exemplary; for example, the division of the units is only a division of logical functions, and there may be other dividing modes in actual implementation.
The units illustrated as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may be physically present individually, or two or more units may be integrated in one unit. The above integrated unit may be realized in the form of hardware or in the form of hardware plus a software functional unit.
The above integrated unit realized in the form of a software functional unit may be stored in a computer-readable storage medium. The above software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
The foregoing is merely illustrative of the preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principle of the present invention shall be included within the scope of protection of the invention.
Claims (20)
1. An artificial-intelligence-based text error correction method, characterized in that the method includes:
obtaining the target fragment corrected in a corrected text and the original segment corresponding to the target fragment in an original text, wherein the target fragment is selected from multiple candidate segments of the original segment when correction processing is performed on the original text based on a pre-trained segment scoring model;
obtaining a user's feedback information on the objective result fed back based on the corrected text;
performing incremental training on the segment scoring model according to the target fragment, the original segment, and the feedback information;
performing correction processing on subsequent original text based on the trained segment scoring model.
2. The method according to claim 1, characterized in that performing incremental training on the segment scoring model according to the target fragment, the original segment, and the feedback information specifically includes:
obtaining the relative characteristic information between the target fragment and the original segment;
determining the ideal score of the target fragment according to the feedback information;
training the segment scoring model according to the relative characteristic information and the ideal score of the target fragment.
3. The method according to claim 2, characterized in that obtaining the relative characteristic information between the target fragment and the original segment includes at least one of the following:
obtaining the relative quality features between the target fragment and the original segment;
obtaining the relative historical behavior features between the target fragment and the original segment; and
obtaining the semantic similarity features between the target fragment and the original segment.
4. The method according to claim 3, characterized in that obtaining the relative quality features between the target fragment and the original segment specifically includes:
obtaining the frequency with which the original segment occurs in a corpus, and the frequency with which the combination of the original segment and the context segment in the original text occurs together in the corpus;
obtaining the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context segment occurs together in the corpus;
according to the frequency with which the original segment occurs in the corpus, the frequency with which the combination of the original segment and the context segment occurs together in the corpus, the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context segment occurs in the corpus, obtaining the frequency ratio of the target fragment to the original segment in the corpus and the frequency ratio of the combination of the target fragment and the context segment to the combination of the original segment and the context segment in the corpus, and/or the frequency difference between the target fragment and the original segment in the corpus and the frequency difference between the combination of the target fragment and the context segment and the combination of the original segment and the context segment in the corpus.
5. The method according to claim 3, characterized in that obtaining the relative historical behavior features between the target fragment and the original segment specifically includes:
obtaining the first modification frequency with which the original segment in a PT table is modified into the target fragment;
obtaining the second modification frequency with which the combination of the original segment and the context segment in the PT table is modified into the combination of the target fragment and the context segment;
according to the first modification frequency and the second modification frequency, obtaining a frequency ratio and/or a frequency difference, the frequency ratio being equal to the second modification frequency divided by the first modification frequency, and the frequency difference being equal to the second modification frequency minus the first modification frequency.
6. The method according to claim 3, characterized in that obtaining the semantic similarity features between the target fragment and the original segment specifically includes:
obtaining the semantic similarity between the target fragment and the original segment; and/or
obtaining the semantic similarity between the combination of the target fragment and the context segment and the combination of the original segment and the context segment.
7. The method according to any one of claims 3-6, characterized in that obtaining the relative characteristic information between the target fragment and the original segment further includes at least one of the following:
according to a preset proper-noun dictionary, obtaining the proper-noun features of the original segment and the target fragment respectively; and
obtaining the pinyin edit distance feature between the target fragment and the original segment.
8. The method according to claim 2, characterized in that determining the ideal score of the target fragment according to the feedback information specifically includes:
according to the feedback information, inferring whether the user accepts replacing the original segment with the target fragment in the corrected text;
if it is inferred that the user accepts, setting the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept, setting the ideal score of the target fragment to 0.
9. The method according to claim 2, wherein training the segment scoring model according to the relative feature information and the ideal score of the target fragment specifically comprises:
inputting the relative feature information into the segment scoring model to obtain a predicted score from the segment scoring model;
obtaining the magnitude relationship between the predicted score and the ideal score;
if the predicted score is less than the ideal score, adjusting the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in an increasing direction; and
if the predicted score is greater than the ideal score, adjusting the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in a decreasing direction.
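Claim 9 constrains only the direction of the parameter update. One concrete realization (an assumption, not the claimed model) is a gradient step on a logistic scoring model: a predicted score below the ideal score moves the weights so that the score rises, and vice versa:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    # Predicted score of the (assumed logistic) segment scoring model.
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def incremental_update(weights, features, ideal, lr=0.1):
    # One gradient step: grad > 0 means the prediction is above the ideal
    # score, so the weights move to decrease it; grad < 0 increases it.
    pred = predict(weights, features)
    grad = pred - ideal
    return [w - lr * grad * x for w, x in zip(weights, features)]
```

Repeating this step over user-feedback samples is one way to carry out the incremental training described in the claims.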
10. An artificial-intelligence-based text error correction device, wherein the device comprises:
a segment information acquisition module, configured to acquire a target fragment used for error correction in a corrected text and the original segment corresponding to the target fragment in an original text, wherein the target fragment is selected from a plurality of candidate segments of the original segment when error correction is performed on the original text based on a pre-trained segment scoring model;
a feedback information acquisition module, configured to acquire feedback information of a user on a target result fed back based on the corrected text;
an incremental training module, configured to perform incremental training on the segment scoring model according to the target fragment, the original segment and the feedback information; and
an error correction module, configured to perform error correction on a subsequent original text based on the trained segment scoring model.
11. The device according to claim 10, wherein the incremental training module specifically comprises:
a relative characteristic information acquisition unit, configured to acquire the relative feature information between the target fragment and the original segment;
a determination unit, configured to determine the ideal score of the target fragment according to the feedback information; and
a training unit, configured to train the segment scoring model according to the relative feature information and the ideal score of the target fragment.
12. The device according to claim 11, wherein the relative characteristic information acquisition unit is configured to perform at least one of the following operations:
acquiring a relative quality feature between the target fragment and the original segment;
acquiring a relative historical-behavior feature between the target fragment and the original segment; and
acquiring a semantic similarity feature between the target fragment and the original segment.
13. The device according to claim 12, wherein the relative characteristic information acquisition unit is specifically configured to:
acquire the frequency at which the original segment occurs in a corpus and the frequency at which the combination of the original segment and the context segment in the original text occurs in the corpus;
acquire the frequency at which the target fragment occurs in the corpus and the frequency at which the combination of the target fragment and the context segment occurs in the corpus; and
according to the four frequencies acquired above, obtain the frequency ratio of the target fragment to the original segment in the corpus together with the frequency ratio of the combination of the target fragment and the context segment to the combination of the original segment and the context segment in the corpus, and/or the frequency difference between the target fragment and the original segment in the corpus together with the frequency difference between the combination of the target fragment and the context segment and the combination of the original segment and the context segment in the corpus.
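A non-claim sketch of the frequency features of claim 13, assuming the corpus is a list of sentences and that "combination with the context segment" means simple concatenation (both are assumptions; the claim fixes neither):

```python
def freq(corpus, fragment):
    # Total number of occurrences of a fragment across corpus sentences.
    return sum(sentence.count(fragment) for sentence in corpus)

def relative_quality_features(corpus, original, target, context):
    # Frequency ratios and differences of the target fragment vs. the
    # original segment, alone and combined with the context segment.
    f_orig, f_tgt = freq(corpus, original), freq(corpus, target)
    f_orig_ctx = freq(corpus, original + context)
    f_tgt_ctx = freq(corpus, target + context)
    eps = 1e-9  # avoid division by zero for unseen fragments
    return {
        "ratio": f_tgt / (f_orig + eps),
        "ratio_ctx": f_tgt_ctx / (f_orig_ctx + eps),
        "diff": f_tgt - f_orig,
        "diff_ctx": f_tgt_ctx - f_orig_ctx,
    }
```

Intuitively, a ratio well above 1 (especially with the context attached) is evidence that the target fragment is the more conventional wording.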
14. The device according to claim 12, wherein the relative characteristic information acquisition unit is specifically configured to:
acquire a first modification frequency at which the original segment recorded in a PT table is modified into the target fragment;
acquire a second modification frequency at which the combination of the original segment and the context segment recorded in the PT table is modified into the combination of the target fragment and the context segment; and
obtain a frequency ratio and/or a frequency difference according to the first modification frequency and the second modification frequency, the frequency ratio being equal to the second modification frequency divided by the first modification frequency, and the frequency difference being equal to the second modification frequency minus the first modification frequency.
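A sketch of the claim 14 features, with the PT table modeled as a dictionary from (source, replacement) pairs to modification counts. The dictionary structure is hypothetical; the ratio and difference follow the definitions in the claim:

```python
def pt_features(pt_table, original, target, context):
    # pt_table maps (source, replacement) pairs to how often that
    # modification was made (a hypothetical phrase-table-like structure).
    first = pt_table.get((original, target), 0)
    second = pt_table.get((original + context, target + context), 0)
    ratio = second / first if first else 0.0  # claim 14: second / first
    diff = second - first                     # claim 14: second - first
    return ratio, diff
```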
15. The device according to claim 12, wherein the relative characteristic information acquisition unit is specifically configured to:
acquire the semantic similarity between the target fragment and the original segment; and/or
acquire the semantic similarity between the combination of the target fragment and the context segment and the combination of the original segment and the context segment.
16. The device according to any one of claims 12-15, wherein the relative characteristic information acquisition unit is further configured to perform at least one of the following:
acquiring, according to a preset proper-noun dictionary, the proper-noun features of the original segment and the target fragment respectively; and
acquiring the pinyin edit distance feature between the target fragment and the original segment.
17. The device according to claim 11, wherein the determination unit is specifically configured to:
infer, according to the feedback information, whether the user accepts the replacement of the original segment with the target fragment in the corrected text; and
if it is inferred that the user accepts, set the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept, set the ideal score of the target fragment to 0.
18. The device according to claim 11, wherein the training unit is specifically configured to:
input the relative feature information into the segment scoring model to obtain a predicted score from the segment scoring model;
obtain the magnitude relationship between the predicted score and the ideal score;
if the predicted score is less than the ideal score, adjust the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in an increasing direction; and
if the predicted score is greater than the ideal score, adjust the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in a decreasing direction.
19. A computer device, wherein the device comprises:
one or more processors; and
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-9.
20. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711159880.7A CN108052499B (en) | 2017-11-20 | 2017-11-20 | Text error correction method and device based on artificial intelligence and computer readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711159880.7A CN108052499B (en) | 2017-11-20 | 2017-11-20 | Text error correction method and device based on artificial intelligence and computer readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108052499A (en) | 2018-05-18 |
CN108052499B CN108052499B (en) | 2021-06-11 |
Family
ID=62118964
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711159880.7A Active CN108052499B (en) | 2017-11-20 | 2017-11-20 | Text error correction method and device based on artificial intelligence and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108052499B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108831212A (en) * | 2018-06-28 | 2018-11-16 | 深圳语易教育科技有限公司 | A kind of oral English teaching auxiliary device and method |
CN109032375A (en) * | 2018-06-29 | 2018-12-18 | 北京百度网讯科技有限公司 | Candidate text sort method, device, equipment and storage medium |
CN109376362A (en) * | 2018-11-30 | 2019-02-22 | 武汉斗鱼网络科技有限公司 | A kind of the determination method and relevant device of corrected text |
CN109766538A (en) * | 2018-11-21 | 2019-05-17 | 北京捷通华声科技股份有限公司 | A kind of text error correction method, device, electronic equipment and storage medium |
CN110399607A (en) * | 2019-06-04 | 2019-11-01 | 深思考人工智能机器人科技(北京)有限公司 | A kind of conversational system text error correction system and method based on phonetic |
CN111160013A (en) * | 2019-12-30 | 2020-05-15 | 北京百度网讯科技有限公司 | Text error correction method and device |
CN111339755A (en) * | 2018-11-30 | 2020-06-26 | 中国移动通信集团浙江有限公司 | Automatic error correction method and device for office data |
CN111832288A (en) * | 2020-07-27 | 2020-10-27 | 网易有道信息技术(北京)有限公司 | Text correction method and device, electronic equipment and storage medium |
CN112541342A (en) * | 2020-12-08 | 2021-03-23 | 北京百度网讯科技有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN112733529A (en) * | 2019-10-28 | 2021-04-30 | 阿里巴巴集团控股有限公司 | Text error correction method and device |
CN113159035A (en) * | 2021-05-10 | 2021-07-23 | 北京世纪好未来教育科技有限公司 | Image processing method, device, equipment and storage medium |
CN114328798A (en) * | 2021-11-09 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Processing method, device, equipment, storage medium and program product for searching text |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003083858A1 (en) * | 2002-03-28 | 2003-10-09 | Koninklijke Philips Electronics N.V. | Time domain watermarking of multimedia signals |
US20030228007A1 (en) * | 2002-06-10 | 2003-12-11 | Fujitsu Limited | Caller identifying method, program, and apparatus and recording medium |
EP1593049A1 (en) * | 2003-02-11 | 2005-11-09 | Telstra Corporation Limited | System for predicting speech recognition accuracy and development for a dialog system |
CN101866336A (en) * | 2009-04-14 | 2010-10-20 | 华为技术有限公司 | Methods, devices and systems for obtaining evaluation unit and establishing syntactic path dictionary |
CN104915264A (en) * | 2015-05-29 | 2015-09-16 | 北京搜狗科技发展有限公司 | Input error-correction method and device |
CN105068661A (en) * | 2015-09-07 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | Man-machine interaction method and system based on artificial intelligence |
CN105374356A (en) * | 2014-08-29 | 2016-03-02 | 株式会社理光 | Speech recognition method, speech assessment method, speech recognition system, and speech assessment system |
CN106528597A (en) * | 2016-09-23 | 2017-03-22 | 百度在线网络技术(北京)有限公司 | POI (Point Of Interest) labeling method and device |
CN106598939A (en) * | 2016-10-21 | 2017-04-26 | 北京三快在线科技有限公司 | Method and device for text error correction, server and storage medium |
CN107133209A (en) * | 2017-03-29 | 2017-09-05 | 北京百度网讯科技有限公司 | Comment generation method and device, equipment and computer-readable recording medium based on artificial intelligence |
CN107239446A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | A kind of intelligent relation extraction method based on neural network and attention mechanism |
CN107357775A (en) * | 2017-06-05 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence |
- 2017-11-20: CN application CN201711159880.7A filed; granted as patent CN108052499B (status: Active)
Non-Patent Citations (2)
Title |
---|
MEHDI BEN LAZREG: "Vector representation of non-standard spelling using dynamic time warping and a denoising autoencoder", 2017 IEEE Congress on Evolutionary Computation |
LIN Xinjian et al.: "A reversible text watermarking algorithm based on error-correcting codes", Computer Applications and Software |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108831212A (en) * | 2018-06-28 | 2018-11-16 | 深圳语易教育科技有限公司 | A kind of oral English teaching auxiliary device and method |
CN109032375B (en) * | 2018-06-29 | 2022-07-19 | 北京百度网讯科技有限公司 | Candidate text sorting method, device, equipment and storage medium |
CN109032375A (en) * | 2018-06-29 | 2018-12-18 | 北京百度网讯科技有限公司 | Candidate text sort method, device, equipment and storage medium |
CN109766538A (en) * | 2018-11-21 | 2019-05-17 | 北京捷通华声科技股份有限公司 | A kind of text error correction method, device, electronic equipment and storage medium |
CN109766538B (en) * | 2018-11-21 | 2023-12-15 | 北京捷通华声科技股份有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN109376362A (en) * | 2018-11-30 | 2019-02-22 | 武汉斗鱼网络科技有限公司 | A kind of the determination method and relevant device of corrected text |
CN111339755A (en) * | 2018-11-30 | 2020-06-26 | 中国移动通信集团浙江有限公司 | Automatic error correction method and device for office data |
CN110399607A (en) * | 2019-06-04 | 2019-11-01 | 深思考人工智能机器人科技(北京)有限公司 | A kind of conversational system text error correction system and method based on phonetic |
CN110399607B (en) * | 2019-06-04 | 2023-04-07 | 深思考人工智能机器人科技(北京)有限公司 | Pinyin-based dialog system text error correction system and method |
CN112733529B (en) * | 2019-10-28 | 2023-09-29 | 阿里巴巴集团控股有限公司 | Text error correction method and device |
CN112733529A (en) * | 2019-10-28 | 2021-04-30 | 阿里巴巴集团控股有限公司 | Text error correction method and device |
CN111160013A (en) * | 2019-12-30 | 2020-05-15 | 北京百度网讯科技有限公司 | Text error correction method and device |
CN111160013B (en) * | 2019-12-30 | 2023-11-24 | 北京百度网讯科技有限公司 | Text error correction method and device |
CN111832288B (en) * | 2020-07-27 | 2023-09-29 | 网易有道信息技术(北京)有限公司 | Text correction method and device, electronic equipment and storage medium |
CN111832288A (en) * | 2020-07-27 | 2020-10-27 | 网易有道信息技术(北京)有限公司 | Text correction method and device, electronic equipment and storage medium |
CN112541342A (en) * | 2020-12-08 | 2021-03-23 | 北京百度网讯科技有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN113159035B (en) * | 2021-05-10 | 2022-06-07 | 北京世纪好未来教育科技有限公司 | Image processing method, device, equipment and storage medium |
CN113159035A (en) * | 2021-05-10 | 2021-07-23 | 北京世纪好未来教育科技有限公司 | Image processing method, device, equipment and storage medium |
CN114328798A (en) * | 2021-11-09 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Processing method, device, equipment, storage medium and program product for searching text |
CN114328798B (en) * | 2021-11-09 | 2024-02-23 | 腾讯科技(深圳)有限公司 | Processing method, device, equipment, storage medium and program product for searching text |
Also Published As
Publication number | Publication date |
---|---|
CN108052499B (en) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108052499A (en) | Text error correction method, device and computer-readable medium based on artificial intelligence | |
CN108091328A (en) | Speech recognition error correction method, device and readable medium based on artificial intelligence | |
KR102577514B1 (en) | Method, apparatus for text generation, device and storage medium | |
EP3230896B1 (en) | Localization complexity of arbitrary language assets and resources | |
CN108108349A (en) | Long text error correction method, device and computer-readable medium based on artificial intelligence | |
CN109344413B (en) | Translation processing method, translation processing device, computer equipment and computer readable storage medium | |
CN106534548B (en) | Voice error correction method and device | |
US11210470B2 (en) | Automatic text segmentation based on relevant context | |
CN107678561A (en) | Phonetic entry error correction method and device based on artificial intelligence | |
CN107330011A (en) | The recognition methods of the name entity of many strategy fusions and device | |
CN109753636A (en) | Machine processing and text error correction method and device calculate equipment and storage medium | |
CN110750959A (en) | Text information processing method, model training method and related device | |
CN106537370A (en) | Method and system for robust tagging of named entities in the presence of source or translation errors | |
CN107832299A (en) | Rewriting processing method, device and the computer-readable recording medium of title based on artificial intelligence | |
CN105068997B (en) | The construction method and device of parallel corpora | |
CN110110327A (en) | A kind of text marking method and apparatus based on confrontation study | |
CN109032375A (en) | Candidate text sort method, device, equipment and storage medium | |
CN110347790B (en) | Text duplicate checking method, device and equipment based on attention mechanism and storage medium | |
CN103488627B (en) | Full piece patent document interpretation method and translation system | |
US11593557B2 (en) | Domain-specific grammar correction system, server and method for academic text | |
CN109918627A (en) | Document creation method, device, electronic equipment and storage medium | |
CN110008309A (en) | A kind of short phrase picking method and device | |
CN109710922A (en) | Text recognition method, device, computer equipment and storage medium | |
CN109033073B (en) | Text inclusion recognition method and device based on vocabulary dependency triple | |
Qin et al. | Learning latent semantic annotations for grounding natural language to structured data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||