CN108052499A - Text error correction method, device and computer-readable medium based on artificial intelligence - Google Patents

Text error correction method, device and computer-readable medium based on artificial intelligence

Info

Publication number
CN108052499A
CN108052499A CN201711159880.7A CN201711159880A
Authority
CN
China
Prior art keywords
segment
target fragment
original segments
frequency
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711159880.7A
Other languages
Chinese (zh)
Other versions
CN108052499B (en)
Inventor
肖求根
詹金波
郑利群
邓卓彬
何径舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201711159880.7A
Publication of CN108052499A
Application granted
Publication of CN108052499B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Abstract

The present invention provides a text error correction method, device and computer-readable medium based on artificial intelligence. The method includes: obtaining a target fragment corrected in a corrected text and the original fragment in the original text corresponding to the target fragment, the target fragment having been selected from multiple candidate fragments of the original fragment when the original text was corrected on the basis of a pre-trained fragment scoring model; obtaining the user's feedback information on the result returned on the basis of the corrected text; performing incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information; and correcting subsequent original texts on the basis of the trained fragment scoring model. When text correction is performed with the incrementally trained fragment scoring model, the technical solution of the present invention can effectively improve the correction accuracy of the text.

Description

Text error correction method, device and computer-readable medium based on artificial intelligence
【Technical field】
The present invention relates to the field of computer application technology, and in particular to a text error correction method, device and computer-readable medium based on artificial intelligence.
【Background technology】
Artificial intelligence (AI) is a new technological science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine capable of reacting in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems.
With the development of science and technology, human-computer interaction appears in more and more scenarios, which can greatly improve the user experience. For example, in a search scenario, the user searches by entering a query, and the search server searches according to the text of the query entered by the user, obtains the corresponding search results and returns them to the user. In other scenarios in which a smart device provides online consulting or shopping-guide services, the smart device also receives text entered by the user and gives feedback based on that text. In all of these scenarios, the text entered by the user may contain errors, so after the text is received it needs to be corrected in order to understand the user's need more accurately. In order to correct text effectively, the prior art trains a network model in advance and corrects the text on the basis of that pre-trained network model.
However, in the prior art, once the network model has been trained it remains fixed; after a period of time it may no longer be able to correct text accurately, so the accuracy of text correction becomes poor.
【The content of the invention】
The present invention provides a text error correction method, device and computer-readable medium based on artificial intelligence, which are used to improve the accuracy of text correction.
The present invention provides a text error correction method based on artificial intelligence, the method including:
obtaining a target fragment corrected in a corrected text and the original fragment in the original text corresponding to the target fragment; the target fragment is selected from multiple candidate fragments of the original fragment when the original text is corrected on the basis of a pre-trained fragment scoring model;
obtaining the user's feedback information on the result returned on the basis of the corrected text;
performing incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information;
correcting subsequent original texts on the basis of the trained fragment scoring model.
Further optionally, in the method described above, performing incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information specifically includes:
obtaining relative feature information between the target fragment and the original fragment;
determining the ideal score of the target fragment according to the feedback information;
training the fragment scoring model according to the relative feature information and the ideal score of the target fragment.
Further optionally, in the method described above, obtaining the relative feature information between the target fragment and the original fragment includes at least one of the following:
obtaining a relative quality feature between the target fragment and the original fragment;
obtaining a relative historical-behavior feature between the target fragment and the original fragment; and
obtaining a semantic similarity feature between the target fragment and the original fragment.
Further optionally, in the method described above, obtaining the relative quality feature between the target fragment and the original fragment specifically includes:
obtaining the frequency with which the original fragment occurs in a corpus, and the frequency with which the combination of the original fragment and its context fragment in the original text occurs in the corpus;
obtaining the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus;
according to the frequency with which the original fragment occurs in the corpus, the frequency with which the combination of the original fragment and the context fragment occurs in the corpus, the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus, obtaining the ratio between the frequency of the target fragment and the frequency of the original fragment in the corpus and the ratio between the frequency of the combination of the target fragment and the context fragment and the frequency of the combination of the original fragment and the context fragment in the corpus, and/or the difference between the frequency of the target fragment and that of the original fragment in the corpus and the difference between the frequency of the combination of the target fragment and the context fragment and that of the combination of the original fragment and the context fragment in the corpus.
Further optionally, in the method described above, obtaining the relative historical-behavior feature between the target fragment and the original fragment specifically includes:
obtaining the first modification frequency with which the original fragment is changed into the target fragment in a phrase table (PT);
obtaining the second modification frequency with which the combination of the original fragment and the context fragment is changed into the combination of the target fragment and the context fragment in the PT table;
obtaining a frequency ratio and/or a frequency difference according to the first modification frequency and the second modification frequency, where the frequency ratio is equal to the second modification frequency divided by the first modification frequency, and the frequency difference is equal to the second modification frequency minus the first modification frequency.
Further optionally, in the method described above, obtaining the semantic similarity feature between the target fragment and the original fragment specifically includes:
obtaining the semantic similarity between the target fragment and the original fragment; and/or
obtaining the semantic similarity between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment.
Further optionally, in the method described above, obtaining the relative feature information between the target fragment and the original fragment further includes at least one of the following:
obtaining proper-noun features of the original fragment and of the target fragment respectively according to a preset proper-noun dictionary; and
obtaining a pinyin edit-distance feature between the target fragment and the original fragment.
Further optionally, in the method described above, determining the ideal score of the target fragment according to the feedback information specifically includes:
inferring, according to the feedback information, whether the user accepts replacing the original fragment with the target fragment in the corrected text;
if it is inferred that the user accepts, setting the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept, setting the ideal score of the target fragment to 0.
Further optionally, in the method described above, training the fragment scoring model according to the relative feature information and the ideal score of the target fragment specifically includes:
inputting the relative feature information into the fragment scoring model to obtain the predicted score of the fragment scoring model;
obtaining the relationship in magnitude between the predicted score and the ideal score;
if the predicted score is smaller than the ideal score, adjusting the parameters of the fragment scoring model so that the predicted score output by the fragment scoring model changes in the increasing direction;
if the predicted score is larger than the ideal score, adjusting the parameters of the fragment scoring model so that the predicted score output by the fragment scoring model changes in the decreasing direction.
The present invention further provides a text error correction device based on artificial intelligence, the device including:
a fragment information obtaining module, configured to obtain a target fragment corrected in a corrected text and the original fragment in the original text corresponding to the target fragment; the target fragment is selected from multiple candidate fragments of the original fragment when the original text is corrected on the basis of a pre-trained fragment scoring model;
a feedback information obtaining module, configured to obtain the user's feedback information on the result returned on the basis of the corrected text;
an incremental training module, configured to perform incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information;
a correction module, configured to correct subsequent original texts on the basis of the trained fragment scoring model.
Further optionally, in the device described above, the incremental training module specifically includes:
a relative feature information obtaining unit, configured to obtain relative feature information between the target fragment and the original fragment;
a determining unit, configured to determine the ideal score of the target fragment according to the feedback information;
a training unit, configured to train the fragment scoring model according to the relative feature information and the ideal score of the target fragment.
Further optionally, in the device described above, the relative feature information obtaining unit is configured to perform at least one of the following operations:
obtaining a relative quality feature between the target fragment and the original fragment;
obtaining a relative historical-behavior feature between the target fragment and the original fragment; and
obtaining a semantic similarity feature between the target fragment and the original fragment.
Further optionally, in the device described above, the relative feature information obtaining unit is specifically configured to:
obtain the frequency with which the original fragment occurs in a corpus, and the frequency with which the combination of the original fragment and its context fragment in the original text occurs in the corpus;
obtain the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus;
according to the frequency with which the original fragment occurs in the corpus, the frequency with which the combination of the original fragment and the context fragment occurs in the corpus, the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus, obtain the ratio between the frequency of the target fragment and the frequency of the original fragment in the corpus and the ratio between the frequency of the combination of the target fragment and the context fragment and the frequency of the combination of the original fragment and the context fragment in the corpus, and/or the difference between the frequency of the target fragment and that of the original fragment in the corpus and the difference between the frequency of the combination of the target fragment and the context fragment and that of the combination of the original fragment and the context fragment in the corpus.
Further optionally, in the device described above, the relative feature information obtaining unit is specifically configured to:
obtain the first modification frequency with which the original fragment is changed into the target fragment in a phrase table (PT);
obtain the second modification frequency with which the combination of the original fragment and the context fragment is changed into the combination of the target fragment and the context fragment in the PT table;
obtain a frequency ratio and/or a frequency difference according to the first modification frequency and the second modification frequency, where the frequency ratio is equal to the second modification frequency divided by the first modification frequency, and the frequency difference is equal to the second modification frequency minus the first modification frequency.
Further optionally, in the device described above, the relative feature information obtaining unit is specifically configured to:
obtain the semantic similarity between the target fragment and the original fragment; and/or
obtain the semantic similarity between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment.
Further optionally, in the device described above, the relative feature information obtaining unit is further configured to perform at least one of the following:
obtain proper-noun features of the original fragment and of the target fragment respectively according to a preset proper-noun dictionary; and
obtain a pinyin edit-distance feature between the target fragment and the original fragment.
Further optionally, in the device described above, the determining unit is specifically configured to:
infer, according to the feedback information, whether the user accepts replacing the original fragment with the target fragment in the corrected text;
if it is inferred that the user accepts, set the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept, set the ideal score of the target fragment to 0.
Further optionally, in the device described above, the training unit is specifically configured to:
input the relative feature information into the fragment scoring model to obtain the predicted score of the fragment scoring model;
obtain the relationship in magnitude between the predicted score and the ideal score;
if the predicted score is smaller than the ideal score, adjust the parameters of the fragment scoring model so that the predicted score output by the fragment scoring model changes in the increasing direction;
if the predicted score is larger than the ideal score, adjust the parameters of the fragment scoring model so that the predicted score output by the fragment scoring model changes in the decreasing direction.
The present invention further provides a computer device, the device including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the text error correction method based on artificial intelligence as described above.
The present invention further provides a computer-readable medium on which a computer program is stored, and the program, when executed by a processor, implements the text error correction method based on artificial intelligence as described above.
According to the text error correction method, device and computer-readable medium based on artificial intelligence of the present invention, a target fragment corrected in a corrected text and the original fragment in the original text corresponding to the target fragment are obtained, where the target fragment is selected from multiple candidate fragments of the original fragment when the original text is corrected on the basis of a pre-trained fragment scoring model; the user's feedback information on the result returned on the basis of the corrected text is obtained; incremental training is performed on the fragment scoring model according to the target fragment, the original fragment and the feedback information; and subsequent original texts are corrected on the basis of the trained fragment scoring model. By performing incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information, the technical solution of the present invention improves the prediction accuracy of the fragment scoring model, so that when the trained fragment scoring model is used for text correction, the correction accuracy of the text is effectively improved. For example, when the technical solution of the present invention is applied to long-text editing, it helps to improve the quality of content produced as long text and improves the user experience.
【Description of the drawings】
Fig. 1 is a flow chart of embodiment one of the text error correction method based on artificial intelligence of the present invention.
Fig. 2 is a flow chart of embodiment two of the text error correction method based on artificial intelligence of the present invention.
Fig. 3 is a flow chart of embodiment one of the long-text error correction method based on artificial intelligence of the present invention.
Fig. 4 is a schematic diagram of a search interface in this embodiment.
Fig. 5 is a flow chart of embodiment two of the long-text error correction method based on artificial intelligence of the present invention.
Fig. 6 is an example of the confusable-pronunciation mapping table provided in this embodiment.
Fig. 7 is a schematic diagram of a correction result of the long-text error correction method based on artificial intelligence of this embodiment.
Fig. 8 is a structural diagram of embodiment one of the text error correction device based on artificial intelligence of the present invention.
Fig. 9 is a structural diagram of embodiment two of the text error correction device based on artificial intelligence of the present invention.
Fig. 10 is a structural diagram of an embodiment of the computer device of the present invention.
Fig. 11 is an example diagram of a computer device provided by the present invention.
【Specific embodiment】
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of embodiment one of the text error correction method based on artificial intelligence of the present invention. As shown in Fig. 1, the text error correction method based on artificial intelligence of this embodiment may specifically include the following steps:
100. Obtain a target fragment corrected in a corrected text and the original fragment in the original text corresponding to the target fragment; the target fragment is selected from multiple candidate fragments of the original fragment when the original text is corrected on the basis of a pre-trained fragment scoring model.
The executing body of the text error correction method based on artificial intelligence of this embodiment is a text error correction device based on artificial intelligence, which may be an independent electronic entity used to correct text. The text of this embodiment may be a short text such as a query, or a long text in a text editing system; the length of a long text is usually greater than that of a query and may be a relatively long sentence. That is, the text error correction method based on artificial intelligence of this embodiment can be applied in a search scenario, and can also be used in various scenarios that involve long-text editing.
In the text correction process based on artificial intelligence of this embodiment, the original text needs to be corrected. Specifically, the original text is first segmented into words, yielding multiple word segments; the segmentation strategy may follow that of the related art and is not limited here. A window of a preset size is then set and slid over the original text from front to back, selecting each original fragment in turn. The size of the preset window in this embodiment may be set to one, two or three word segments. Therefore, an original fragment of this embodiment may be formed by a single word segment or by a combination of consecutive word segments.
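To make the sliding-window step just described concrete, the following Python sketch enumerates original fragments of one to three consecutive word segments from a segmented sentence. The function and variable names, and the assumption that a word segmenter has already produced the token list, are illustrative only and are not defined by this embodiment.

# Minimal sketch of the sliding-window fragment extraction described above.
# The token list is assumed to come from some word segmenter; all names are illustrative.
def extract_original_fragments(tokens, max_window=3):
    """Enumerate original fragments of 1..max_window consecutive word segments."""
    fragments = []
    for size in range(1, max_window + 1):
        for start in range(len(tokens) - size + 1):
            fragments.append((start, tokens[start:start + size]))
    return fragments

tokens = ["qing", "hua", "da", "xue"]   # assumed segmenter output
for start, fragment in extract_original_fragments(tokens):
    print(start, "".join(fragment))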
After each original fragment in the original text is obtained in the above way, multiple candidate fragments that could replace each original fragment are obtained. The candidates may be recalled on the basis of a pre-counted phrase table (Phrase Table; PT) that maps an original fragment to its replacement fragments, or on the basis of candidate fragments whose pronunciation is identical or similar to that of the original fragment. Each candidate fragment is then scored with the fragment scoring model, and the target fragment used to replace the original fragment is further obtained from the multiple candidate fragments according to their scores. For example, a short query may contain only one original fragment, in which case the highest-scoring candidate fragment may be taken as the target fragment. For longer text containing two or more original fragments, the highest-scoring candidate fragment may be taken as the target fragment of each original fragment; alternatively, for a given original fragment, in consideration of factors such as coherence with the context, the highest- or second-highest-scoring candidate among the top N candidates may be taken as the target fragment, which is not limited here. Whichever way the target fragment is obtained, the scores given to the candidate fragments by the fragment scoring model are required. Therefore, in this embodiment, the scoring of candidate fragments by the fragment scoring model is a crucial link in text correction; if the scoring accuracy of the fragment scoring model for candidate fragments is poor, the accuracy of text correction will be poor.
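A possible shape of this recall-and-score selection is sketched below; the PT-table layout, the scoring interface and the top-N cut-off are assumptions made for illustration rather than details fixed by this embodiment.

# Hypothetical sketch: recall candidates for one original fragment from a PT table,
# score them with a fragment scoring model, and pick the best-scoring candidate.
def select_target_fragment(original, context, pt_table, score_fn, top_n=5):
    candidates = sorted(pt_table.get(original, {}).items(),
                        key=lambda kv: kv[1], reverse=True)[:top_n]
    if not candidates:
        return original                    # nothing recalled, keep the fragment unchanged
    scored = [(score_fn(original, cand, context), cand) for cand, _ in candidates]
    best_score, best_candidate = max(scored)
    return best_candidate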
In this embodiment, after the corrected text is obtained by correcting the original text with the above method, the target fragment corrected in the corrected text and the original fragment in the original text corresponding to the target fragment can be obtained.
101. Obtain the user's feedback information on the result returned on the basis of the corrected text.
In this embodiment, the form and content of the result returned to the user on the basis of the corrected text may differ from scenario to scenario. For example, in a search scenario, the result returned to the user on the basis of the corrected text may be the search results obtained with the corrected text. In long-text editing, the result returned to the user on the basis of the corrected text may take the form of accepting or rejecting the modification. Other forms may exist in other scenarios and are not enumerated here. Whatever form the result returned to the user takes, the user's feedback information can be obtained. For example, in a search scenario, after the search results are returned to the user on the basis of the corrected text, if the user agrees with the corrected search results, the user can directly click and read them; if the user does not agree with the corrected search results, the user may ignore them and search again. As another example, in a long-text editing scenario, after the original text entered by the user is corrected, a prompt can be given to the user at the corrected position, and the user can click agree or disagree according to the actual situation at that position. Therefore, in any scenario, the user's feedback information on the result returned on the basis of the corrected text can be obtained.
102. Perform incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information.
The incremental training of this embodiment may be an online learning process; that is, after each correction, online learning is performed on the fragment scoring model directly according to the correction result, so as to improve the prediction accuracy of the fragment scoring model.
Alternatively, the incremental training of this embodiment may be performed offline: at regular intervals, all correction data within the time period are collected, and incremental training is performed on the fragment scoring model with these correction data, so as to improve the prediction accuracy of the fragment scoring model.
During the incremental training of this embodiment, incremental training needs to be performed on the fragment scoring model according to the target fragment, the original fragment and the feedback information.
103. Correct subsequent original texts on the basis of the trained fragment scoring model.
Based on the fragment scoring model after the above incremental training, subsequent original texts can be corrected with higher accuracy.
In practical applications, a plain GBRank model structure cannot be trained incrementally. In this embodiment, in order to improve the accuracy of the fragment scoring model, incremental training is performed on the fragment scoring model; to support this, the fragment scoring model of this embodiment may combine a logistic regression function with the GBRank model. For example, during training, a GBRank model is first trained to obtain a tree model; on this basis, a logistic regression stage is trained on the same training data, yielding the fragment scoring model of this embodiment.
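The two-stage scorer suggested above, a fixed GBRank-style tree stage with a logistic-regression stage on top that can be updated incrementally, might be organised roughly as follows; the class, its parameters and the update rule are a sketch under these assumptions, not the patented implementation.

import math

class TreePlusLogisticScorer:
    # A pre-trained tree model (kept fixed) produces a raw score; a small
    # logistic-regression layer on top of it is updated incrementally.
    def __init__(self, tree_model, w=1.0, b=0.0, lr=0.05):
        self.tree_model = tree_model
        self.w, self.b, self.lr = w, b, lr

    def score(self, features):
        raw = self.tree_model(features)                        # GBRank-style raw score
        return 1.0 / (1.0 + math.exp(-(self.w * raw + self.b)))

    def incremental_update(self, features, ideal):
        # One online gradient step on log loss, nudging the prediction toward
        # the ideal score (1 if the user accepted the replacement, else 0).
        raw = self.tree_model(features)
        grad = self.score(features) - ideal
        self.w -= self.lr * grad * raw
        self.b -= self.lr * grad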
In the text error correction method based on artificial intelligence of this embodiment, a target fragment corrected in a corrected text and the original fragment in the original text corresponding to the target fragment are obtained, where the target fragment is selected from multiple candidate fragments of the original fragment when the original text is corrected on the basis of a pre-trained fragment scoring model; the user's feedback information on the result returned on the basis of the corrected text is obtained; incremental training is performed on the fragment scoring model according to the target fragment, the original fragment and the feedback information; and subsequent original texts are corrected on the basis of the trained fragment scoring model. By performing incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information, the technical solution of this embodiment improves the prediction accuracy of the fragment scoring model, so that when the trained fragment scoring model is used for text correction, the correction accuracy of the text is effectively improved. For example, when the technical solution of this embodiment is applied to long-text editing, it helps to improve the quality of content produced as long text and improves the user experience.
Fig. 2 is a flow chart of embodiment two of the text error correction method based on artificial intelligence of the present invention. As shown in Fig. 2, the text error correction method based on artificial intelligence of this embodiment introduces the technical solution of the present invention in further detail on the basis of the technical solution of the embodiment shown in Fig. 1, and may specifically include the following steps:
200. Obtain a target fragment corrected in a corrected text and the original fragment in the original text corresponding to the target fragment; the target fragment is selected from multiple candidate fragments of the original fragment when the original text is corrected on the basis of a pre-trained fragment scoring model.
201. Obtain the user's feedback information on the result returned on the basis of the corrected text.
For the implementation of step 200 and step 201, reference may be made to step 100 and step 101 of the embodiment shown in Fig. 1, which are not repeated here.
202. Obtain relative feature information between the target fragment and the original fragment.
For example, step 202 may specifically include at least one of the following:
First, obtain the relative quality feature between the target fragment and the original fragment.
This step may specifically include the following steps:
(a1) Obtain the frequency with which the original fragment occurs in a corpus, and the frequency with which the combination of the original fragment and its context fragment in the original text occurs in the corpus.
Step (a1) is the specific way of obtaining the quality feature of the original fragment. Since the corrected text has already been obtained in this embodiment, the application field of this embodiment can be determined; specifically, the quality feature of the original fragment is obtained from the corpus of that application field.
In this embodiment, the context fragment of an original fragment is the fragment immediately before or after the original fragment in the original text. For example, when the original fragment consists of one word segment, its context fragment may include the one or two word segments before it and the one or two word segments after it. If the original fragment consists of two word segments, its context fragment may include the one word segment before it and the one word segment after it in the original text. If the original fragment consists of three word segments, its context fragment may include only the one word segment before it and the one word segment after it. Alternatively, considering that fragments containing more word segments occur in the original text with lower probability, this embodiment may stipulate that no context fragment is taken when the original fragment contains three or more word segments. That is, when the context fragment of an original fragment is taken, the combinations of the original fragment with its context are: the original fragment plus the preceding fragment, the original fragment plus the following fragment, and the preceding fragment plus the original fragment plus the following fragment, three combinations in total. When obtaining the quality feature of the original fragment, the frequencies with which the original fragment, the original fragment plus the preceding fragment, the original fragment plus the following fragment, and the preceding fragment plus the original fragment plus the following fragment each occur in the corpus need to be obtained.
Further optionally, when no context fragment is taken for the original fragment, the quality feature of the original fragment may include only the frequency with which the original fragment occurs in the corpus.
(b1) Obtain the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus.
Correspondingly, step (b1) is the way of obtaining the quality feature of the target fragment; the specific way of obtaining it is the same as in step (a1) above and is not repeated here.
Furthermore, considering data alignment, the target fragment, as a replacement of the original fragment, is treated in the same way as the original fragment: if no context fragment is taken for the original fragment in step (a1), then correspondingly no context fragment is taken for the target fragment in step (b1). When a context fragment needs to be taken and the original fragment is at the beginning or end of a sentence of the original text, the corresponding empty context fragment may be represented by a preset sentence-start or sentence-end feature, so as to ensure data alignment.
(c1) According to the frequency with which the original fragment occurs in the corpus, the frequency with which the combination of the original fragment and the context fragment occurs in the corpus, the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context fragment occurs in the corpus, obtain the ratio between the frequency of the target fragment and the frequency of the original fragment in the corpus and the ratio between the frequency of the combination of the target fragment and the context fragment and the frequency of the combination of the original fragment and the context fragment in the corpus, and/or the difference between the frequency of the target fragment and that of the original fragment in the corpus and the difference between the frequency of the combination of the target fragment and the context fragment and that of the combination of the original fragment and the context fragment in the corpus.
Step (c1) is the specific way of obtaining the relative quality feature between the target fragment and the original fragment. Specifically, by obtaining the frequency ratio between the target fragment and the original fragment in the corpus, the frequency ratio between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment, and/or the corresponding frequency differences, the compatibility of the target fragment with the context fragment can be reflected. If the frequency ratio between the target fragment and the original fragment in the corpus is large, but the frequency ratio between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment is very small, it indicates that the compatibility between the target fragment and the context fragment is poor and that the target fragment is not suitable for replacing the original fragment, and vice versa.
Similarly, if the frequency difference between the target fragment and the original fragment in the corpus is small, i.e. their usage probabilities are similar, but the frequency difference between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment in the corpus is large, it indicates that the combination of the target fragment and the context fragment is used much more frequently in the corpus than the combination of the original fragment and the context fragment; the target fragment can then be considered to have strong compatibility with the context fragment, and the target fragment may be used to replace the original fragment, and vice versa.
In addition, when no context fragment is taken for the original fragment, the corresponding relative quality feature includes only the frequency ratio and/or the frequency difference between the target fragment and the original fragment in the corpus, obtained according to the frequency with which each of them occurs in the corpus. Compared with taking the context fragment as described above, the obtained features are not rich enough; therefore, in this embodiment, it is preferable to take the context fragment.
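A sketch of the relative quality feature computed from corpus counts is given below; the flat fragment-string count table and the smoothing of zero denominators are illustrative assumptions.

# Hypothetical sketch: frequency ratios and differences between the target fragment
# and the original fragment, alone and combined with the context fragments.
def relative_quality_features(counts, original, target, left="", right=""):
    def freq(*parts):
        return counts.get("".join(parts), 0)    # corpus frequency of a fragment string

    feats = {
        "ratio_fragment": freq(target) / max(freq(original), 1),
        "diff_fragment": freq(target) - freq(original),
    }
    if left or right:
        feats["ratio_with_context"] = (freq(left, target, right)
                                       / max(freq(left, original, right), 1))
        feats["diff_with_context"] = (freq(left, target, right)
                                      - freq(left, original, right))
    return feats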
Second, obtain the relative historical-behavior feature between the target fragment and the original fragment.
This step may specifically include the following steps:
(a2) Obtain the first modification frequency with which the original fragment is changed into the target fragment in the PT table.
(b2) Obtain the second modification frequency with which the combination of the original fragment and the context fragment is changed into the combination of the target fragment and the context fragment in the PT table.
(c2) According to the first modification frequency and the second modification frequency, obtain a frequency ratio and/or a frequency difference, where the frequency ratio is equal to the second modification frequency divided by the first modification frequency, and the frequency difference is equal to the second modification frequency minus the first modification frequency.
In addition, it should be noted that if the original fragment contains three word segments and no context fragment is taken, the above steps (a2)-(c2) cannot be used to obtain the relative historical-behavior feature between the target fragment and the original fragment; in that case the relative historical-behavior feature may be directly set to an empty or preset default feature value. Since taking the context fragment makes the feature content richer, this embodiment preferably takes the context fragment and uses the above steps (a2)-(c2) to obtain the relative historical-behavior feature between the target fragment and the original fragment.
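The relative historical-behavior feature read from the PT table might be computed as in the sketch below; the (source, replacement) -> count layout of the table is an assumption chosen only to keep the example short.

# Hypothetical sketch: modification-frequency ratio and difference from a PT table.
def historical_behavior_features(pt_counts, original, target, left="", right=""):
    first = pt_counts.get((original, target), 0)        # fragment changed on its own
    second = pt_counts.get((left + original + right,
                            left + target + right), 0)  # changed together with its context
    return {
        "pt_ratio": second / max(first, 1),   # second modification frequency / first
        "pt_diff": second - first,            # second modification frequency - first
    }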
Third, obtain the semantic similarity feature between the target fragment and the original fragment.
Similarly, obtaining the semantic similarity feature between the target fragment and the original fragment in this embodiment may include: obtaining the semantic similarity between the target fragment and the original fragment; and/or obtaining the semantic similarity between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment.
In this embodiment, a preset dictionary may be used to obtain the word vector of the target fragment and the word vector of the original fragment, and the cosine distance between the two word vectors is then calculated as the semantic similarity between the candidate fragment and the original fragment. Correspondingly, if the original fragment contains three or more word segments, only the semantic similarity between the target fragment and the original fragment is taken as the semantic similarity feature. If the original fragment contains fewer than three word segments, the context fragment of the original fragment also needs to be taken, and the semantic similarity between the combination of the target fragment and the context fragment and the combination of the original fragment and the context fragment also needs to be obtained. Similarly, the word vector of the combination of the target fragment and the context fragment and the word vector of the combination of the original fragment and the context fragment are obtained, and the cosine distance between them is calculated as the semantic similarity feature between the two combinations. Correspondingly, the combinations of a fragment with its context include the fragment plus the preceding fragment, the fragment plus the following fragment, and the preceding fragment plus the fragment plus the following fragment, three combinations in total. In that case the semantic similarity feature between the candidate fragment and the original fragment is formed by concatenating: the semantic similarity between the target fragment and the original fragment; the semantic similarity between the candidate fragment combined with the preceding fragment and the original fragment combined with the preceding fragment; the semantic similarity between the candidate fragment combined with the following fragment and the original fragment combined with the following fragment; and the semantic similarity between the preceding fragment plus the candidate fragment plus the following fragment and the preceding fragment plus the original fragment plus the following fragment.
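The cosine computation described above could look like the following sketch, where embed is assumed to map a fragment (or a concatenation of fragments) to a vector, for example by averaging word vectors from the preset dictionary.

import numpy as np

# Hypothetical sketch of the semantic similarity features.
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def semantic_similarity_features(embed, original, target, left="", right=""):
    feats = {"sim_fragment": cosine_similarity(embed(target), embed(original))}
    if left or right:
        feats["sim_with_context"] = cosine_similarity(embed(left + target + right),
                                                      embed(left + original + right))
    return feats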
That is, in this embodiment, in order to enrich the features and improve the scoring accuracy of the fragment scoring model, the relative feature information preferably includes the relative quality feature, the relative historical-behavior feature and the semantic similarity feature at the same time. In order to further enrich the content of the relative feature information, obtaining the relative feature information between the target fragment and the original fragment in this embodiment may further include at least one of the following: obtaining proper-noun features of the original fragment and of the target fragment respectively according to a preset proper-noun dictionary; and obtaining a pinyin edit-distance feature between the target fragment and the original fragment.
Specifically, the proper-noun feature of the target fragment is used to indicate whether the target fragment is a proper noun. For example, whether a target fragment is a proper noun is judged according to the proper-noun dictionary; if it is, the corresponding proper-noun feature is 1, otherwise it is 0. Correspondingly, if the target fragment is a proper noun, the probability that it replaces the original fragment is higher; if it is not a proper noun, the probability that it replaces the original fragment is lower. Similarly, the proper-noun feature of the original fragment can be set according to the proper-noun dictionary, which is not repeated here. In addition, it should be noted that in practice the probability that the original fragment and the target fragment are both proper nouns is very small.
In addition, the pinyin edit distance between the target fragment and the original fragment is specifically the number of letters that need to be adjusted to edit the pinyin of the target fragment into the pinyin of the original fragment. Correspondingly, the larger the pinyin edit distance between the target fragment and the original fragment, the smaller the probability that the target fragment is used to replace the original fragment; the smaller the pinyin edit distance, the larger the probability that the target fragment is used to replace the original fragment.
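The proper-noun flags and the pinyin edit distance might be computed as sketched below; the proper-noun set and the fragment-to-pinyin converter are assumed to be available, and the plain Levenshtein distance stands in for the edit distance.

# Hypothetical sketch of the proper-noun and pinyin edit-distance features.
def edit_distance(a, b):
    # Plain Levenshtein distance over two pinyin strings.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def lexicon_and_pinyin_features(original, target, proper_nouns, to_pinyin):
    return {
        "original_is_proper_noun": int(original in proper_nouns),
        "target_is_proper_noun": int(target in proper_nouns),
        "pinyin_edit_distance": edit_distance(to_pinyin(original), to_pinyin(target)),
    }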
203. Determine the ideal score of the target fragment according to the feedback information.
With reference to the description of step 101 above, it can be seen that whatever form the result returned to the user on the basis of the corrected text takes, the user's feedback information can be obtained, and the user's feedback ultimately amounts to accepting or rejecting the corrected text. Therefore, in this embodiment, it can first be inferred from the feedback information whether the user accepts replacing the original fragment with the target fragment in the corrected text. If it is inferred that the user accepts, the replacement of the original fragment by the target fragment is considered correct, and the ideal score of the target fragment is set to 1; otherwise, if it is inferred that the user does not accept, the replacement of the original fragment by the target fragment is considered incorrect, and the ideal score of the target fragment is set to 0.
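The inference of the ideal score from the feedback information might look like the sketch below; the particular feedback fields (an explicit accept button in long-text editing, clicks and re-issued queries in search) are assumptions about what the feedback could contain, not fields defined by this embodiment.

# Hypothetical sketch: map inferred user acceptance to the ideal score (1 or 0).
def ideal_score(feedback):
    if feedback.get("explicit_accept") is not None:       # long-text editing scenario
        accepted = feedback["explicit_accept"]
    else:                                                  # search scenario
        accepted = (feedback.get("clicked_result", False)
                    and not feedback.get("reissued_query", False))
    return 1.0 if accepted else 0.0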
204. Train the fragment scoring model according to the relative feature information obtained above and the ideal score of the target fragment.
Steps 202-204 of this embodiment are a specific implementation of step 102 of the embodiment shown in Fig. 1, "performing incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information".
The training of this embodiment is incremental training. Online training of this kind may be performed once after each correction, or offline training may be performed at regular intervals, collecting all text correction data within the time period. In either case, the already trained fragment scoring model is learned again, so as to improve the precision of its subsequent predictions. During training, all of the relative feature information obtained above may be input into the fragment scoring model to obtain the predicted score of the fragment scoring model; the relationship in magnitude between the predicted score and the ideal score is obtained; if the predicted score is smaller than the ideal score, the parameters of the fragment scoring model are adjusted so that the predicted score output by the fragment scoring model changes in the increasing direction; if the predicted score is larger than the ideal score, the parameters of the fragment scoring model are adjusted so that the predicted score output by the fragment scoring model changes in the decreasing direction. Each adjustment in this embodiment is only a small fine-tuning, as long as it ensures that the predicted score output by the fragment scoring model moves in the increasing or decreasing direction.
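One pass of incremental training, whether run online after every correction or offline over a batch of collected correction records, could then be organised as in the sketch below; the record layout (including an "accepted" flag) and the featurize helper tie together the earlier sketches and are likewise assumptions for illustration.

# Hypothetical sketch of one incremental-training pass over correction records.
def incremental_training_pass(records, featurize, scorer):
    for rec in records:
        feats = featurize(rec["original"], rec["target"], rec.get("context", ""))
        ideal = 1.0 if rec["accepted"] else 0.0   # 1 if the user accepted the replacement
        scorer.incremental_update(feats, ideal)   # nudge the prediction toward the ideal score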
Further optionally, in this embodiment, instead of inputting all of the relative feature information obtained above into the fragment scoring model to obtain the predicted score, the score that the fragment scoring model gave to this target fragment at correction time may be obtained directly.
205. Correct subsequent original texts on the basis of the trained fragment scoring model.
With the above technical solution, the text error correction method based on artificial intelligence of this embodiment performs incremental training on the fragment scoring model according to the target fragment, the original fragment and the feedback information, which improves the prediction accuracy of the fragment scoring model; when the trained fragment scoring model is used for text correction, the correction accuracy of the text is effectively improved. For example, when the technical solution of this embodiment is applied to long-text editing, it helps to improve the quality of content produced as long text and improves the user experience.
The text error correction methods based on artificial intelligence of the embodiments shown in Fig. 1 and Fig. 2 above are applicable not only to the correction of short texts such as search queries but also to the correction of long texts. The following embodiments describe scenarios in which the technical solution of these embodiments is applied to long-text correction.
Fig. 3 is a flow chart of embodiment one of the long-text error correction method based on artificial intelligence of the present invention. As shown in Fig. 3, the long-text error correction method based on artificial intelligence of this embodiment may specifically include the following steps:
300. When there is an original fragment in the long text that is not a proper noun, perform PT fragment recall on the original fragment that needs correction according to a PT table preset for the field of the long text, and obtain the candidate fragment set of the original fragment, the candidate fragment set including multiple candidate fragments.
The long text of this embodiment may be any long text edited by the user whose length is greater than that of a common query, for example the abstract of an article or a sentence in an article. With the technical solution of this embodiment, long-text correction can be performed on each sentence in an article, thereby realizing the correction of the whole article.
Similarly, in this embodiment, when correcting a long text, the long text first needs to be segmented into words to obtain multiple word segments; the segmentation strategy may follow that of the related art and is not limited here. The original fragments of this embodiment may each be formed by a single word segment or by a combination of consecutive word segments; see the description of the above embodiments for details, which is not repeated here. After the multiple original fragments in the long text are obtained, it is judged whether each original fragment is a proper noun. For example, whether each original fragment in the long text is a proper noun may be judged according to a preset proper-noun dictionary; if every fragment is a proper noun, it is determined that there is no original fragment in the long text that needs correction; otherwise, if there is an original fragment that is not a proper noun, it is determined that there is an original fragment in the long text that needs correction. The proper-noun dictionary of this embodiment may be a database generated in advance by counting data in the field of the long text and extracting proper nouns, and it includes all proper nouns of that field.
Through the above judgment, if there is a non-proper-noun fragment in the long text, PT fragment recall is performed on the original fragment that needs correction according to the PT table preset for the field of the long text, and the multiple recalled candidate fragments are collected into the candidate fragment set.
In the present embodiment, before the step 300, the PT tables of the long article this field can also be pre-set, such as specifically It can include following at least one mode:
First, the big data that search term behavior is actively changed according to user in long article this field counts, and obtains original segments and arrives Replace the change frequency of segment.By original segments, segment and original segments are replaced to the change frequency for replacing segment, are stored in PT In table;
Such as:User continuously inputs " blue or green Hua Da ", " Tsinghua University ", can collect " blue or green China->The change of Tsing-Hua University "; " blue or green Hua Da->The change of Tsinghua University ";Since user is in input process, if an input error before finding, can actively repair It is correct to change search term, according to the behavior of user, it is known that after the search term of modification once be correct.For example, By the statistics of preset period of time, can learn " blue or green China->The change frequency of Tsing-Hua University " is 100 times, " blue or green Hua Da->Tsing-Hua University The change frequency of university " is 70 times.
2nd, the title for the search result searched for according to search term input by user in long article this field and search server Between segment alignment mapping, obtain original segments to replacement segment the change frequency.By original segments, replace segment and original Beginning segment is stored in the change frequency for replacing segment in PT tables;For example, Fig. 4 is a kind of search interface schematic diagram of the present embodiment. As shown in figure 4, the search term of certain input of user is " blue or green Hua Da ", still, the search result of search server includes " Tsing-Hua University University ", and including " blue or green Hua Da ".In this way, including Tsinghua University for the title of search result, blue or green Hua Da can be recorded Learn->The change of Tsinghua University " 1 time;Include blue or green Hua Da for the title of search result, can record blue or green Hua Da-> The change of blue or green Hua Da " 1 time.If searching 30 altogether as a result, wherein 28 titles are Tsinghua University, 2 titles are to close In blue or green Hua Da, then it is assumed that " blue or green Hua Da->The change frequency of Tsinghua University " is 28 times, " blue or green Hua Da->Blue or green Hua Da " The change frequency be 2 times.
Third, the change count from an original segment to a replacement segment is obtained from the alignment between search terms entered by users in the domain of the long text and the user feedback on the search server's active correction; the original segment, the replacement segment and the change count are stored in the PT table. Unlike the second way, here the replacement segment is determined from user feedback. For example, the user enters the search term "Qinghua Da", and the search results include both "Tsinghua University" and "Qinghua Da". Each time the user clicks a result whose title contains "Tsinghua University", one change "Qinghua Da -> Tsinghua University" is counted; each time the user clicks a result whose title contains "Qinghua Da", one change "Qinghua Da -> Qinghua Da" is counted.
Following the above, the PT table of the present embodiment may be collected and counted over a preset period. It may be generated by any one of the three ways above, or by any combination of two or all three of them. As described above, the PT table records multiple groups of original segments, replacement segments and the corresponding change counts; each group may be stored, for example, in the form "original segment -> replacement segment, change count". One original segment may correspond to multiple replacement segments, each with its own change count. When PT-segment recall is performed on an original segment that needs correction, all replacement segments corresponding to that original segment are fetched from the PT table together with their change counts, the TOP n replacement segments with the largest change counts are taken as the candidate segments of that original segment, and the candidate segments form a candidate segment set.
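The following is a minimal sketch of such a PT-table lookup, assuming the table is held as an in-memory mapping from an original segment to its replacement segments and change counts (the helper names and counts are assumptions for illustration):

    from collections import defaultdict

    pt_table = defaultdict(dict)   # original segment -> {replacement segment: change count}

    def record_change(original, replacement, count=1):
        # Accumulate one observed change "original -> replacement" from any of the
        # three collection ways (query rewrites, title alignment, click feedback).
        pt_table[original][replacement] = pt_table[original].get(replacement, 0) + count

    def pt_recall(original, top_n=5):
        # Return the TOP n replacement segments with the largest change counts.
        ranked = sorted(pt_table.get(original, {}).items(), key=lambda kv: kv[1], reverse=True)
        return [segment for segment, _ in ranked[:top_n]]

    record_change("Qinghua Da", "Tsinghua University", 70)   # assumed counts
    record_change("Qinghua Da", "Qinghua Da", 2)
    candidates = pt_recall("Qinghua Da", top_n=3)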
301. Score each candidate segment in the candidate segment set with the segment scoring model trained in advance;

In the present embodiment, a segment scoring model may be trained in advance to score each candidate segment in the candidate segment set. For the same original segment, a candidate segment with a higher score is more likely to be used to correct that original segment in the long text than a candidate segment with a lower score. However, when the long text is corrected, factors such as how smoothly the original segment fits its context must also be considered, so the candidate segment with the highest score does not necessarily replace the original segment in the final corrected text. The segment scoring model of the present embodiment may adopt a GBRank network model.

For example, step 301 may specifically include the following steps:
(a3) Obtain the quality features of the original segment in the domain of the long text and the quality features of each candidate segment of the candidate segment set in the domain of the long text;

For example, obtaining the quality features of the original segment in the domain of the long text may specifically include: obtaining the frequency with which the original segment occurs in the corpus of the domain of the long text and the frequency with which the combination of the original segment and its context segment occurs in that corpus.

Correspondingly, obtaining the quality features of each candidate segment of the candidate segment set in the domain of the long text specifically includes: obtaining the frequency with which each candidate segment occurs in the corpus and the frequency with which the combination of each candidate segment and the context segment occurs in the corpus.
In the present embodiment, the context segment of an original segment is the segment immediately before or after the original segment in the long text; see the related description of the embodiment of Fig. 2, which is not repeated here. Alternatively, considering that a segment containing many words occurs in a long text with low probability, the present embodiment may stipulate that if an original segment already contains 3 or more words, its context segment is not taken. When the context segment is taken, obtaining the quality features of the original segment requires obtaining the frequencies in the corpus of the original segment itself, of the combination of the original segment and the preceding segment, of the combination of the original segment and the following segment, and of the combination of the preceding segment, the original segment and the following segment. The quality features of each candidate segment are obtained in the same way and are not repeated here.

(b3) Obtain the relative quality features of each candidate segment with respect to the original segment according to the quality features of the original segment and of each candidate segment in the domain of the long text;
For example, step (b3) may specifically include: according to the frequency with which the original segment occurs in the corpus, the frequency with which the combination of the original segment and the context segment occurs in the corpus, the frequency with which each candidate segment occurs in the corpus, and the frequency with which the combination of each candidate segment and the context segment occurs in the corpus, obtaining (i) the ratio of the frequency of each candidate segment to that of the original segment and the ratio of the frequency of the combination of each candidate segment and the context segment to that of the combination of the original segment and the context segment, and/or (ii) the difference between the frequency of each candidate segment and that of the original segment and the difference between the frequency of the combination of each candidate segment and the context segment and that of the combination of the original segment and the context segment.
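A minimal sketch of computing these relative quality features is given below; the function and field names are assumptions, and the input frequencies are assumed to have been counted from the corpus beforehand:

    def relative_quality_features(freq_orig, freq_orig_ctx, freq_cand, freq_cand_ctx):
        # All inputs are corpus frequencies; the *_ctx values are for the segment
        # combined with its context segment.  A small epsilon avoids division by zero.
        eps = 1e-9
        return {
            "segment_freq_ratio": freq_cand / (freq_orig + eps),
            "segment_freq_diff": freq_cand - freq_orig,
            "context_freq_ratio": freq_cand_ctx / (freq_orig_ctx + eps),
            "context_freq_diff": freq_cand_ctx - freq_orig_ctx,
        }

    quality = relative_quality_features(freq_orig=120, freq_orig_ctx=3,
                                        freq_cand=900, freq_cand_ctx=450)   # assumed counts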
Specifically, these frequency ratios and frequency differences reflect how well a candidate segment fits the context segment. If the frequency of a candidate segment relative to the original segment in the corpus is large, but the frequency of the combination of that candidate segment and the context segment relative to the combination of the original segment and the context segment is very small, the candidate segment fits the context poorly and is not suitable for replacing the original segment, and vice versa.

Similarly, if the difference between the frequencies of the candidate segment and the original segment in the corpus is small, i.e. their usage probabilities are close, but the combination of the candidate segment and the context segment occurs far more frequently in the corpus than the combination of the original segment and the context segment, then the candidate segment fits the context segment very well and may be used to replace the original segment, and vice versa.

It should be noted that if the original segment already contains 3 or more words and no context segment is taken, only the frequency of the original segment and the frequency of each candidate segment in the corpus are used, and the frequency ratio and/or frequency difference between each candidate segment and the original segment serve as the relative quality features. Compared with the case where the context segment is taken, the features obtained in this way are less rich; therefore, in the present embodiment, it is preferable to obtain the context segment.

It should also be noted that when the context segment is to be taken but the original segment lies at the beginning or end of a sentence of the long text, the missing context segment may be represented by a preset sentence-start feature or sentence-end feature so as to keep the data aligned.
(c3) Obtain the relative historical-behavior features of replacing the original segment with each candidate segment;

Since the PT table records historical modification information, the historical-behavior features of the present embodiment may be features related to the change counts in the PT table. For example, step (c3) may specifically include the following steps:

(a4) Obtain the first modification count, i.e. the number of times the original segment is modified into each candidate segment in the PT table;

(b4) Obtain the second modification count, i.e. the number of times the combination of the original segment and the context segment is modified into the combination of each candidate segment and the context segment in the PT table;

(c4) Obtain a count ratio and/or a count difference from the first and second modification counts, where the count ratio equals the second modification count divided by the first modification count, and the count difference equals the second modification count minus the first modification count.
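A minimal sketch of step (c4) follows; the function and field names are assumptions, and the two counts are assumed to have been read from the PT table as in steps (a4) and (b4):

    def historical_behavior_features(first_count, second_count):
        # first_count: times the original segment alone was modified into the candidate;
        # second_count: times the original segment plus context was modified into the
        # candidate plus context.  Ratio and difference follow step (c4) above.
        eps = 1e-9
        return {
            "modify_count_ratio": second_count / (first_count + eps),
            "modify_count_diff": second_count - first_count,
        }

    history = historical_behavior_features(first_count=70, second_count=28)   # assumed counts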
It should also be noted that if the original segment contains 3 or more words and no context segment is taken, the relative historical-behavior feature may be set to empty or to a preset default feature symbol.

(d3) Obtain the semantic similarity features of each candidate segment and the original segment;
In the present embodiment, a preset dictionary may be used to obtain the word vector of each candidate segment and the word vector of the original segment, and the cosine distance between the two word vectors is then computed as the semantic similarity of the candidate segment and the original segment. Accordingly, if the original segment contains 3 or more words, only the semantic similarity of each candidate segment and the original segment is taken as the semantic similarity feature. If the original segment contains fewer than 3 words, the context segment of the original segment is also taken, and the semantic similarity between the combination of each candidate segment and the context segment and the combination of the original segment and the context segment must also be obtained: the word vectors of the two combinations are obtained, and the cosine distance between them serves as the semantic similarity of the combination of the candidate segment and the context segment with respect to the combination of the original segment and the context segment. Here the combinations of the original segment with its context cover three cases: the original segment plus the preceding segment, the original segment plus the following segment, and the preceding segment plus the original segment plus the following segment. Correspondingly, the semantic similarity features of a candidate segment and the original segment then include: the semantic similarity of the candidate segment and the original segment; the semantic similarity of the combination of the candidate segment and the preceding segment with the combination of the original segment and the preceding segment; the semantic similarity of the combination of the candidate segment and the following segment with the combination of the original segment and the following segment; and the semantic similarity of the candidate segment spliced with both the preceding and following segments with the original segment spliced with both the preceding and following segments.
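A minimal sketch of the cosine computation is given below; the toy embeddings and dictionary are assumptions, since the patent only requires that word vectors be looked up from a preset dictionary:

    import math

    def cosine_similarity(vec_a, vec_b):
        # Cosine of the angle between two word vectors; 0.0 when either vector is zero.
        dot = sum(a * b for a, b in zip(vec_a, vec_b))
        norm_a = math.sqrt(sum(a * a for a in vec_a))
        norm_b = math.sqrt(sum(b * b for b in vec_b))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    word_vectors = {                                   # assumed toy embeddings
        "Tsinghua University": [0.9, 0.1, 0.3],
        "Qinghua Da": [0.8, 0.2, 0.4],
    }
    similarity = cosine_similarity(word_vectors["Qinghua Da"],
                                   word_vectors["Tsinghua University"])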
In addition, the acquisition of the relative quality features, relative historical-behavior features and semantic similarity features of each candidate segment and the original segment may also refer to the acquisition of the corresponding features of the target fragment and the original segment described in the embodiment of Fig. 2 above.

(e3) Obtain the score of each candidate segment according to the relative quality features of each candidate segment and the original segment, the relative historical-behavior features of each candidate segment and the original segment, the semantic similarity features of each candidate segment and the original segment, and the segment scoring model.

That is, the relative quality features, relative historical-behavior features and semantic similarity features of each candidate segment and the original segment obtained in the above steps are input into the segment scoring model trained in advance, and the segment scoring model predicts the score of the candidate segment.
For example, when the segment scoring model is trained, training original segments and training replacement segments may be collected as positive and negative examples: if the replacement is correct, the corresponding score is 1 and the training sample is a positive example; if the replacement is wrong, the corresponding score is 0 and the training sample is a negative example. The ratio of positive to negative examples in the training data is greater than 1, e.g. 5:1 or 4:1. Before training, initial values are set for the parameters of the segment scoring model; the training samples are then input in turn, and whenever the score predicted by the segment scoring model is inconsistent with the known score, the parameters are adjusted so that the prediction approaches the known result. In this way, the segment scoring model is trained continuously with tens of millions of training samples until its predictions agree with the known results; the parameters of the segment scoring model are then determined, the model itself is thereby determined, and training is finished. The more training data are used, the more accurate the trained segment scoring model is and the more accurate its subsequent predictions of candidate-segment scores are. With this scheme the predicted score lies between 0 and 1; in practical applications the segment scoring model may also be set to output scores in other numerical ranges, such as 0 to 100, with a similar principle, which is not repeated here.
Still optionally, before each candidate segment is scored, the following steps may also be included: obtaining the proper-noun feature of each candidate segment according to a preset proper-noun dictionary; and/or obtaining the pinyin edit-distance feature of each candidate segment and the original segment.

Specifically, the proper-noun feature of a candidate segment identifies whether the candidate segment is a proper noun. For example, whether a candidate segment is a proper noun is judged according to the proper-noun dictionary: if it is, the proper-noun feature is 1, otherwise 0. Accordingly, if a candidate segment is a proper noun, the score output by the segment scoring model for it is higher; if not, the output score is lower. The pinyin edit distance between a candidate segment and the original segment is the number of letters in the pinyin that must be adjusted to turn the pronunciation of the candidate segment into that of the original segment. The larger this edit distance, the smaller the probability that the candidate segment should replace the original segment and the smaller the score output by the segment scoring model; conversely, the smaller the edit distance, the larger the probability of replacement and the larger the output score.
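A minimal sketch of the pinyin edit-distance computation is given below, as a standard Levenshtein distance over the two pinyin strings (the pinyin strings themselves are assumed to be supplied by an upstream phonetic annotation step):

    def pinyin_edit_distance(a, b):
        # Classic Levenshtein distance over two pinyin strings.
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                current.append(min(previous[j] + 1,                 # deletion
                                   current[j - 1] + 1,              # insertion
                                   previous[j - 1] + (ca != cb)))   # substitution
            previous = current
        return previous[-1]

    same = pinyin_edit_distance("qinghua", "qinghua")    # 0: identical pronunciation
    close = pinyin_edit_distance("qinhua", "qinghua")    # 1: one letter must be adjusted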
Based on the above principle, step (e3) may accordingly include: obtaining the score of each candidate segment according to the relative quality features, relative historical-behavior features and semantic similarity features of each candidate segment and the original segment, together with the segment scoring model, in combination with the proper-noun feature of each candidate segment and the pinyin edit-distance feature of each candidate segment and the original segment. Correspondingly, when the segment scoring model is trained, the proper-noun features of the training replacement segments and the pinyin edit-distance features between the training original segments and the training replacement segments also need to be obtained, and the model is trained together with the features mentioned before.
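A minimal sketch of assembling the above features into one vector and letting a trained model score the candidate follows; the feature ordering, the model interface (a predict() method) and the stand-in model are all assumptions made only to show the data flow, not the patent's own API:

    class ConstantScoringModel:
        # Stand-in for the trained segment scoring model (e.g. a GBRank-style ranker);
        # any object exposing predict(rows) -> list of scores would do here.
        def predict(self, rows):
            return [0.5 for _ in rows]

    def score_candidate(model, quality, history, similarity, is_proper_noun, pinyin_distance):
        feature_vector = [
            quality["segment_freq_ratio"], quality["segment_freq_diff"],
            quality["context_freq_ratio"], quality["context_freq_diff"],
            history["modify_count_ratio"], history["modify_count_diff"],
            similarity,
            1.0 if is_proper_noun else 0.0,
            float(pinyin_distance),
        ]
        return model.predict([feature_vector])[0]

    mark = score_candidate(ConstantScoringModel(),
                           quality={"segment_freq_ratio": 7.5, "segment_freq_diff": 780,
                                    "context_freq_ratio": 150.0, "context_freq_diff": 447},
                           history={"modify_count_ratio": 0.4, "modify_count_diff": -42},
                           similarity=0.93, is_proper_noun=True, pinyin_distance=1)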
302. According to the score of each candidate segment, obtain, by decoding, the target fragment corresponding to each original segment from the candidate segment set of each original segment of the long text that needs correction, so as to obtain the corrected text of the long text.

Finally, based on the scores of the candidate segments, the target fragment of each original segment is obtained from the candidate segment set of each original segment that needs correction, which yields the corrected text of the long text. For example, the candidate segment with the highest score may be taken directly as the target fragment. If a candidate segment with a slightly lower score fits the context of the long text better, that candidate segment may instead be taken as the target fragment in the corrected text. The corrected text may also be obtained in other ways.
For example, after segment recall has been performed for the different original segments of the long text, each original segment has several candidate segments, so there are many possible combinations of candidate segments across the original segments, which form a segment candidate network. If a long text contains original segments A, B and C, where the candidate segments of A are 1, 2 and 3, those of B are 4, 5 and 6, and those of C are 7, 8 and 9, then each candidate segment may be used to replace its original segment: candidate segment 1 may be combined with 4, 5 or 6, candidate segment 2 may likewise be combined with 4, 5 or 6, and candidate segment 3 may likewise be combined with 4, 5 or 6, forming the segment candidate network. A decoding algorithm may then be used to obtain the optimal candidate segment of each original segment from the segment candidate network, yielding the optimal corrected text. Decoding algorithms include, but are not limited to, the Viterbi algorithm, beam search and greedy search.
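A minimal beam-search sketch over such a candidate network is given below; the transition bonus between adjacent choices is a hypothetical fluency term (the patent names Viterbi, beam search and greedy search as options but does not fix the scoring of transitions), and the toy scores are assumptions:

    def beam_search(candidates_per_segment, marks, bigram_bonus, beam_width=2):
        # candidates_per_segment: one candidate list per original segment (the lattice).
        # marks[c]: model score of candidate c.
        # bigram_bonus(a, b): assumed fluency bonus for choosing b right after a.
        beams = [([], 0.0)]
        for candidates in candidates_per_segment:
            expanded = []
            for path, score in beams:
                for cand in candidates:
                    bonus = bigram_bonus(path[-1], cand) if path else 0.0
                    expanded.append((path + [cand], score + marks[cand] + bonus))
            beams = sorted(expanded, key=lambda item: item[1], reverse=True)[:beam_width]
        return beams[0][0]   # best combination: one chosen candidate per original segment

    lattice = [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"]]   # the toy network above
    marks = {c: 0.5 for row in lattice for c in row}                # assumed equal scores
    best_path = beam_search(lattice, marks, bigram_bonus=lambda a, b: 0.0)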
Alternatively, step 302 may specifically include the following steps: for each original segment, according to the scores of the candidate segments in its candidate segment set, obtain at least two preselected segments corresponding to that original segment from the candidate segment set; then, by decoding, obtain the target fragment corresponding to each original segment from the at least two preselected segments of each original segment of the long text that needs correction, so as to obtain the corrected text of the long text.

Specifically, if an original segment has many candidate segments, the higher-scoring candidate segments may be taken as preselected segments in descending order of score, and the target fragment of each original segment is then obtained by decoding from the at least two preselected segments of each original segment that needs correction, yielding the corrected text of the long text.

The long-text correction method based on artificial intelligence of the present embodiment can correct wrong segments in a long text and effectively improve the editing quality of the long text. The technical solution of the present embodiment is proposed for the long-text correction scenario; it is applicable to correction in text scenarios, can output correction results quickly and effectively with high efficiency, and can help improve the content production quality of long texts and the user experience.
Fig. 5 is a flowchart of embodiment two of the long-text correction method based on artificial intelligence of the present invention. As shown in Fig. 5, on the basis of the technical solution of the embodiment of Fig. 3, the present embodiment further adds edit-distance (ED) segment recall for the original segments that need correction and describes the technical solution of the present invention in more detail. As shown in Fig. 5, the long-text correction method based on artificial intelligence of the present embodiment may specifically include the following steps:
400. Judge, according to the proper-noun dictionary, whether each original segment in the long text is a proper noun; if so, perform step 401; otherwise, perform step 402;

401. Determine that the original segments of the long text are proper nouns and need no correction; end;

402. Determine that the long text contains original segments that are not proper nouns and that the non-proper-noun original segments of the long text need correction; perform step 403;

403. According to the PT table preset for the domain of the long text, perform PT-segment recall on the original segments that need correction to obtain the candidate segment set of each original segment, the candidate segment set including multiple candidate segments; perform step 404;

The implementation of steps 400-403 may refer to the description of the embodiment of Fig. 3 and is not repeated here.
404. Obtain the frequency with which the original segment occurs in the corpus of the domain of the long text, the frequency with which the combination of the original segment and the context segment occurs in the corpus, the change count of the original segment in the PT table, the change count of the combination of the original segment and the context segment in the PT table, and the semantic similarity of the original segment and the context segment; perform step 405;

Similarly, the combination of the original segment and the context segment of the present embodiment may refer to the related description of the embodiment of Fig. 1 and is not repeated here. The frequency with which the original segment occurs in the corpus of the domain of the long text may be obtained by counting its occurrences in the corpus. The change count of the original segment in the PT table may be the total number of times that original segment is replaced in the PT table by segments other than itself; for example, for the misspelled segment "Qinghua" it is the total count of all its replacements, such as being replaced by "Tsinghua" or by other segments. The change count of the combination of the original segment and the context segment in the PT table may likewise be the total number of times that combination is replaced by other segments; for example, for "Qinghua Da" it is the total count of replacements such as "Tsinghua University" and all other replacement segments other than "Qinghua Da" itself.

The semantic similarity of the original segment and the context segment in the present embodiment may be obtained by obtaining the word vector of the original segment and the word vector of the context segment and computing the cosine similarity between them, where the word vector of the context segment is that of the combination of the preceding segment and the following segment. Alternatively, in the present embodiment, the semantic similarity between the original segment and all the other segments of the long text may be used in place of the semantic similarity between the original segment and the context segment, forming a new alternative scheme.
405. Obtain the confidence of the original segment according to the frequency with which the original segment occurs in the corpus of the domain of the long text, the frequency with which the combination of the original segment and the context segment of the long text occurs in the corpus, the change count of the original segment in the PT table, the change count of the combination of the original segment and the context segment in the PT table, the semantic similarity of the original segment and the context segment, and a preset language fluency scoring model; perform step 406;

For example, in the present embodiment step 405 specifically includes the following two implementations:

In the first implementation, the confidence is judged with a confidence scoring model, which may specifically include the following steps:

(a5) Predict the fluency of the original segment according to the frequency with which the original segment occurs in the corpus of the domain of the long text, the frequency with which the combination of the original segment and the context segment of the long text occurs in the corpus, and the language fluency scoring model;

The language fluency scoring model of the present embodiment scores the fluency of the original segment in the long text. Given the frequency with which the original segment occurs in the corpus of the domain of the long text and the frequency with which the combination of the original segment and the context segment of the long text occurs in the corpus, the language fluency scoring model can predict the fluency of the original segment. The fluency score may, for example, lie between 0 and 1, a larger value meaning more fluent and a smaller value less fluent; other numerical ranges, such as 0 to 100, may also be used to represent the fluency ranking.

The language fluency scoring model of the present embodiment may also be obtained by training in advance. For example, several training samples are collected in advance, each corresponding to one training long text and including the frequency with which a training original segment of that training long text occurs in the corpus, the frequency with which the combination of the training original segment and its training context segment occurs in the corpus, and the known fluency of that training original segment. The collected training data may include positive examples with a known fluency of 1 and negative examples with a known fluency of 0, and the ratio of positive to negative examples may be greater than 1, preferably 5:1 or 4:1. Before training, initial values are set for the parameters of the language fluency scoring model; during training, each training sample is input into the model in turn, the model predicts a fluency for that sample, and if the predicted fluency is inconsistent with the known fluency, the parameters of the model are adjusted so that the two converge. In this way the language fluency scoring model is trained continuously with tens of millions of training samples until the predicted fluency agrees with the known fluency; the parameters of the model are then determined, the model itself is thereby determined, and training of the language fluency scoring model is finished.
(b5) Obtain the confidence of the original segment according to the fluency of the original segment, the change count of the original segment in the PT table, the change count of the combination of the original segment and the context segment in the PT table, and the semantic similarity of the original segment and the context segment, in combination with a confidence scoring model trained in advance;

Similarly, in the present embodiment a confidence scoring model is trained in advance to obtain the confidence of the original segment. The confidence may be set between 0 and 1, a larger value meaning higher confidence and a smaller value lower confidence; in practical applications the confidence may also be set in other numerical ranges, such as 0 to 100. In use, the fluency of the original segment, the change count of the original segment in the PT table, the change count of the combination of the original segment and the context segment in the PT table, and the semantic similarity of the original segment and the context segment are input into the trained confidence scoring model, which outputs the confidence of the original segment.

Similarly, the confidence scoring model of the present embodiment may also be obtained by training in advance. For example, several training samples are collected in advance, each including the fluency of a training original segment, the change count of the training original segment in the PT table, the change count of the combination of the training original segment and the training context segment in the PT table, the semantic similarity of the training original segment and the training context segment, and the known confidence corresponding to the training original segment; each parameter is obtained as described in the foregoing embodiments. The collected training data may include positive examples with a known confidence of 1 and negative examples with a known confidence of 0, and the ratio of positive to negative examples may be greater than 1, preferably 5:1 or 4:1. Before training, initial values are set for the parameters of the confidence scoring model; during training, each training sample is input into the model in turn, the model predicts a confidence, and if the predicted confidence is inconsistent with the known confidence, the parameters are adjusted so that the two converge. In this way the confidence scoring model is trained continuously with tens of millions of training samples until the predicted confidence agrees with the known confidence; the parameters of the model are then determined, the model itself is thereby determined, and training of the confidence scoring model is finished.

It should also be noted that, for the training and prediction of all models involved in the present embodiment, the feature data input into the models may first be normalized in advance; the manner of normalization is not limited.
In the second implementation, the confidence is judged with thresholds, which may specifically include the following steps:

(a6) Predict the fluency of the original segment according to the frequency with which the original segment occurs in the corpus of the domain of the long text, the frequency with which the combination of the original segment and the context segment of the long text occurs in the corpus, and the language fluency scoring model;

Step (a6) is implemented in the same way as step (a5); see the description of step (a5), which is not repeated here.

(b6) Judge whether the fluency of the original segment exceeds a preset fluency threshold, whether the change count of the original segment in the PT table and the change count of the combination of the original segment and the context segment in the PT table both exceed a preset count threshold, and whether the semantic similarity of the original segment and the context segment exceeds a preset similarity threshold; if all conditions hold, set the confidence of the original segment to a value greater than the preset confidence threshold; otherwise, set it to a value less than or equal to the preset confidence threshold.
In the present embodiment, corresponding thresholds, namely a fluency threshold, a count threshold and a similarity threshold, are preset for the fluency of the original segment, the change counts of the original segment and of the combination of the original segment and the context segment in the PT table, and the semantic similarity of the original segment and the context segment, respectively. Each quantity is then compared with its threshold. If every quantity exceeds its threshold, the confidence may be considered high: the confidence is set to exceed the preset confidence threshold, and it may be determined that the original segment does not need ED recall. If even one quantity does not exceed its threshold, the confidence may be considered low: the confidence is set below the preset confidence threshold, and it may be determined that the original segment needs ED recall. The confidence threshold of the present embodiment may be preset to a suitable value according to practical experience.
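A minimal sketch of this threshold variant follows; the threshold values and parameter names are assumptions, since the patent only requires that suitable thresholds be preset from experience:

    def is_confident(fluency, segment_change_count, context_change_count, context_similarity,
                     fluency_threshold=0.6, count_threshold=10, similarity_threshold=0.7):
        # The original segment is judged confident only when every quantity exceeds
        # its preset threshold; the concrete threshold values here are assumptions.
        return (fluency > fluency_threshold
                and segment_change_count > count_threshold
                and context_change_count > count_threshold
                and context_similarity > similarity_threshold)

    needs_ed_recall = not is_confident(fluency=0.45, segment_change_count=3,
                                       context_change_count=1, context_similarity=0.8)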
406. Judge whether the confidence of the original segment exceeds the preset confidence threshold; if not, perform step 407; otherwise, determine that the original segment does not need ED-segment recall and perform step 408;

407. Determine that the original segment needs ED-segment recall; perform ED-segment recall on the original segment according to its pronunciation, using the corpus of the domain of the long text and/or the input suggestions that a pinyin input method provides for the original segment, and append the recalled candidate segments to the candidate segment set; perform step 408;
ED recall in the present embodiment recalls candidate segments from the pinyin (phonetic notation) string of the original segment by the method of mixed deletion of initials and finals. The candidate segments recalled in this way may come from the corpus: high-frequency segments are taken from the corpus according to the pinyin of the original segment, annotated with pinyin, and indexed invertedly by pinyin, with deletions of initials and finals applied to expand the recall. For example, for the segment "中华" ("China"), whose pinyin is "zhonghua", partial deletions of initials and finals are indexed, and the generated key-value entries may be {"zhonghua", "zhhua", "onghua", "zhongua", "zhongh"} --> {"中华"}. Candidate segments are then recalled from the corpus according to "zhonghua", "zhhua", "onghua", "zhongua" and "zhongh". Since "zhonghua" is the complete pinyin, it easily recalls the corresponding candidate segment, while "zhhua", "onghua", "zhongua" and "zhongh" recall candidate segments of the corresponding pinyin by completing the missing initial or final. Therefore, the candidate segments recalled by ED have pronunciations that are the same as or similar to that of the original segment.
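A minimal sketch of building such a deletion-expanded pinyin index is given below; the syllable split into (initial, final) pairs is assumed to come from an upstream phonetic annotation step, and the helper names are assumptions:

    def deletion_keys(syllables):
        # syllables: list of (initial, final) pairs, e.g. [("zh", "ong"), ("h", "ua")]
        # for "zhonghua".  Generates the full pinyin plus every variant with a single
        # initial or a single final deleted.
        full = "".join(initial + final for initial, final in syllables)
        keys = {full}
        for idx in range(len(syllables)):
            keys.add("".join(("" if k == idx else i) + f for k, (i, f) in enumerate(syllables)))
            keys.add("".join(i + ("" if k == idx else f) for k, (i, f) in enumerate(syllables)))
        return keys

    inverted_index = {}                                      # pinyin key -> corpus segments
    for key in deletion_keys([("zh", "ong"), ("h", "ua")]):  # zhonghua, zhhua, onghua, zhongua, zhongh
        inverted_index.setdefault(key, set()).add("中华")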
In addition, the candidate segments of ED recall in the present embodiment may also come from the recall results of a pinyin input method; specifically, the input suggestions provided by the pinyin input method for the original segment may be used. Following users' common typing habits, recall is performed with the initial-final sequences of the current word, e.g. "zhonghua", "zhongh" and "zhhua", to obtain the candidate word list of the pinyin input method. In practical applications, fuzzy sounds may also be introduced to enlarge the recall results. Fig. 6 is an example of the fuzzy-sound mapping table provided in the present embodiment and shows some of the fuzzy sounds; when candidate segments are recalled through the pinyin input method, the fuzzy sounds shown in Fig. 6 may be used to expand the recall results.
408. Score each candidate segment in the candidate segment set with the segment scoring model trained in advance; perform step 409;

409. According to the scores of the candidate segments in the candidate segment set, obtain at least two preselected segments corresponding to the original segment from the candidate segment set; perform step 410;

410. By decoding, obtain the target fragment corresponding to each original segment from the at least two preselected segments of each original segment of the long text that needs correction, so as to obtain the corrected text of the long text; perform step 411;

The specific implementation of steps 408-410 may refer to the related description of the embodiment of Fig. 3 and is not repeated here.

411. Perform correction intervention on the corrected segments of the corrected text and determine the final corrected text; end.
For example, in the present embodiment, performing correction intervention on the corrected segments of the corrected text specifically includes at least one of the following:

judging whether a target fragment in the corrected text and its corresponding original segment hit a correction pair in a preset blacklist, and if so, restoring the target fragment to the original segment; and

judging whether a target fragment in the corrected text and its corresponding original segment are synonyms, and if so, restoring the target fragment to the original segment.
The blacklist in the present embodiment may be collected from previous wrong corrections. For example, if after an original segment has been corrected to a certain target fragment the user restores the target fragment to the original segment, it can be determined that the correction was wrong; the target fragment and the original segment are then collected as a wrong-correction pair. In practical applications, a number of such wrong-correction pairs may be collected to form the blacklist, and the corrected segments of the corrected text are intervened according to the blacklist: if a corrected target fragment and its original segment form one of the blacklisted correction pairs, the target fragment is restored to the original segment; otherwise the corrected text is kept.

In addition, long-text correction mainly corrects wrong information and should not replace synonyms. In the present embodiment, a synonym table may be stored in advance, recording each word segment and its synonym segments. Whether a corrected target fragment and its corresponding original segment are synonyms is then checked against the synonym table; if they are, the target fragment is restored to the original segment, otherwise the corrected text is kept.
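A minimal sketch of the two intervention checks is given below; the blacklist pair, the synonym entries and the helper name are assumptions used only for illustration:

    blacklist = {("segment_b", "segment_a")}      # assumed pair collected from a past wrong correction
    synonym_table = {"quick": {"fast", "rapid"}}  # assumed synonym table entries

    def intervene(target_fragment, original_segment):
        # Restore the original segment when the correction hits the blacklist or when
        # the two segments are synonyms; otherwise keep the corrected target fragment.
        if (target_fragment, original_segment) in blacklist:
            return original_segment
        if (target_fragment in synonym_table.get(original_segment, set())
                or original_segment in synonym_table.get(target_fragment, set())):
            return original_segment
        return target_fragment

    final_segment = intervene("fast", "quick")    # synonyms, so the original "quick" is kept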
Fig. 7 is a schematic diagram of a correction result of the long-text correction method based on artificial intelligence of the present embodiment. For example, after the method of the present embodiment is applied to a long text in which the word for "master" is miswritten, the corrected text reads "this master does the work faster and better", which shows that the technical solution of the present embodiment can correct long texts with high quality.

The long-text correction method based on artificial intelligence of the present embodiment can correct wrong segments in a long text and effectively improve the editing quality of the long text. The technical solution is proposed for the long-text correction scenario, is applicable to correction in text scenarios, and can output correction results quickly and effectively with high efficiency, helping to improve the content production quality of long texts and the user experience. In addition, the technical solution of the present embodiment can further intervene in the replacement of wrong segments and further optimize the correction result.

The embodiments of Fig. 3 and Fig. 5 above are the long-text correction scenarios to which the text correction scheme of the present invention is applied. In practical applications, the embodiments of Fig. 3 and Fig. 5 may be used after the embodiments of Fig. 1 and Fig. 2, so that the segment scoring model is incrementally trained according to the feedback information on the corrected text and the target fragments and original segments of the corrected text, thereby further improving the accuracy of the scores predicted by the segment scoring model.
Fig. 8 is a structural diagram of embodiment one of the text correction device based on artificial intelligence of the present invention. As shown in Fig. 8, the text correction device based on artificial intelligence of the present embodiment may specifically include:

a segment data obtaining module 10, configured to obtain the target fragment corrected in the corrected text and the original segment corresponding to the target fragment in the original text, the target fragment being selected from multiple candidate segments of the original segment when the original text is corrected based on the segment scoring model trained in advance;

a feedback information acquisition module 11, configured to obtain the feedback information of the user on the target result fed back based on the corrected text;

an incremental training module 12, configured to incrementally train the segment scoring model according to the target fragment and original segment obtained by the segment data obtaining module 10 and the feedback information obtained by the feedback information acquisition module 11; and

a correction module 13, configured to correct subsequent original text based on the segment scoring model trained by the incremental training module 12. A structural sketch of these modules is given below.
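As a minimal structural sketch of modules 10-13 of Fig. 8 wired together (the class and method names and the data flow are assumptions made only for illustration, not the patent's own API):

    class TextCorrectionDevice:
        def __init__(self, fragment_data_module, feedback_info_module,
                     incremental_training_module, correction_module):
            self.fragment_data_module = fragment_data_module                # module 10
            self.feedback_info_module = feedback_info_module                # module 11
            self.incremental_training_module = incremental_training_module  # module 12
            self.correction_module = correction_module                      # module 13

        def update_and_correct(self, corrected_text, original_text, next_original_text):
            # Obtain fragments and feedback, incrementally train the scoring model,
            # then correct the next original text with the updated model.
            target, original = self.fragment_data_module.get_fragments(corrected_text, original_text)
            feedback = self.feedback_info_module.get_feedback(corrected_text)
            model = self.incremental_training_module.train(target, original, feedback)
            return self.correction_module.correct(next_original_text, model)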
The text correction device based on artificial intelligence of the present embodiment uses the above modules to realize text correction based on artificial intelligence; its realization principle and technical effect are the same as those of the related method embodiments above, to whose description reference may be made and which is not repeated here.

Fig. 9 is a structural diagram of embodiment two of the text correction device based on artificial intelligence of the present invention. As shown in Fig. 9, on the basis of the technical solution of the embodiment of Fig. 8, the text correction device based on artificial intelligence of the present embodiment may further include the following technical solution.

As shown in Fig. 9, in the text correction device based on artificial intelligence of the present embodiment, the incremental training module 12 specifically includes:
a relative feature information acquisition unit 121, configured to obtain the relative feature information between the target fragment and the original segment obtained by the segment data obtaining module 10;

a determination unit 122, configured to determine the ideal score of the target fragment according to the feedback information obtained by the feedback information acquisition module 11; and

a training unit 123, configured to train the segment scoring model according to the relative feature information obtained by the relative feature information acquisition unit 121 and the ideal score of the target fragment determined by the determination unit 122.

Still optionally, in the text correction device based on artificial intelligence of the present embodiment, the relative feature information acquisition unit 121 is configured to perform at least one of the following operations:

obtaining the relative quality features between the target fragment and the original segment obtained by the segment data obtaining module 10;

obtaining the relative historical-behavior features between the target fragment and the original segment obtained by the segment data obtaining module 10; and

obtaining the semantic similarity features between the target fragment and the original segment obtained by the segment data obtaining module 10.
Still optionally, the relative feature information acquisition unit 121 is specifically configured to:

obtain the frequency with which the original segment obtained by the segment data obtaining module 10 occurs in the corpus and the frequency with which the combination of the original segment and its context segment in the original text occurs in the corpus;

obtain the frequency with which the target fragment obtained by the segment data obtaining module 10 occurs in the corpus and the frequency with which the combination of the target fragment and the context segment occurs in the corpus; and

according to the frequency with which the original segment occurs in the corpus, the frequency with which the combination of the original segment and the context segment occurs in the corpus, the frequency with which the target fragment occurs in the corpus, and the frequency with which the combination of the target fragment and the context segment occurs in the corpus, obtain the ratio of the frequency of the target fragment to that of the original segment and the ratio of the frequency of the combination of the target fragment and the context segment to that of the combination of the original segment and the context segment, and/or the difference between the frequency of the target fragment and that of the original segment and the difference between the frequency of the combination of the target fragment and the context segment and that of the combination of the original segment and the context segment.
Still optionally, the relative feature information acquisition unit 121 is specifically configured to:

obtain the first modification count with which the original segment obtained by the segment data obtaining module 10 is modified in the PT table into the target fragment obtained by the segment data obtaining module 10;

obtain the second modification count with which the combination of the original segment and the context segment obtained by the segment data obtaining module 10 is modified in the PT table into the combination of the target fragment and the context segment; and

obtain a count ratio and/or a count difference from the first and second modification counts, where the count ratio equals the second modification count divided by the first modification count and the count difference equals the second modification count minus the first modification count.

Still optionally, the relative feature information acquisition unit 121 is specifically configured to:

obtain the semantic similarity of the target fragment and the original segment obtained by the segment data obtaining module 10; and/or

obtain the semantic similarity of the combination of the target fragment and the context segment with the combination of the original segment and the context segment obtained by the segment data obtaining module 10.
Still optionally, the relative feature information acquisition unit 121 is further configured to perform at least one of the following:

obtaining the proper-noun features of the original segment and the target fragment, respectively, according to a preset proper-noun dictionary; and

obtaining the pinyin edit-distance feature of the target fragment and the original segment.

Still optionally, in the text correction device based on artificial intelligence of the present embodiment, the determination unit 122 is specifically configured to:

infer, according to the feedback information obtained by the feedback information acquisition module 11, whether the user accepts the replacement of the original segment by the target fragment in the corrected text; and

if it is inferred that the user accepts, set the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept, set the ideal score of the target fragment to 0.
Still optionally, in the text correction device based on artificial intelligence of the present embodiment, the training unit 123 is specifically configured to:

input the relative feature information into the segment scoring model to obtain the predicted score of the segment scoring model;

obtain the magnitude relationship between the predicted score and the ideal score;

if the predicted score is less than the ideal score, adjust the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in the increasing direction; and

if the predicted score is greater than the ideal score, adjust the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in the decreasing direction.

The text correction device based on artificial intelligence of the present embodiment uses the above modules to realize text correction based on artificial intelligence; its realization principle and technical effect are the same as those of the related method embodiments above, to whose description reference may be made and which is not repeated here.
Fig. 10 is a structural diagram of an embodiment of the computer device of the present invention. As shown in Fig. 10, the computer device of the present embodiment includes one or more processors 30 and a memory 40, the memory 40 being configured to store one or more programs which, when executed by the one or more processors 30, cause the one or more processors 30 to implement the methods of the embodiments of Figs. 1-7 above. The embodiment of Fig. 10 is illustrated with multiple processors 30 as an example.

For example, Fig. 11 is an exemplary diagram of a computer device provided by the present invention, showing a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention. The computer device 12a shown in Fig. 11 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 11, the computer device 12a takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors 16a, a system memory 28a, and a bus 18a connecting different system components (including the system memory 28a and the processors 16a).

The bus 18a represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

The computer device 12a typically comprises a variety of computer-system-readable media. These media may be any available media accessible by the computer device 12a, including volatile and non-volatile media and removable and non-removable media.
System storage 28a can include the computer system readable media of form of volatile memory, such as deposit at random Access to memory (RAM) 30a and/or cache memory 32a.Computer equipment 12a may further include it is other it is removable/ Immovable, volatile/non-volatile computer system storage medium.Only as an example, storage system 34a can be used for reading Write immovable, non-volatile magnetic media (Figure 11 is not shown, is commonly referred to as " hard disk drive ").Although do not show in Figure 11 Go out, can provide for moving the disc driver of non-volatile magnetic disk (such as " floppy disk ") read-write and to removable The CD drive of anonvolatile optical disk (such as CD-ROM, DVD-ROM or other optical mediums) read-write.In these cases, Each driver can be connected by one or more data media interfaces with bus 18a.System storage 28a can include At least one program product, the program product have one group of (for example, at least one) program module, these program modules are configured To perform the function of the above-mentioned each embodiments of Fig. 1-Fig. 9 of the present invention.
A program/utility 40a having a set of (at least one) program modules 42a may be stored, for example, in the system memory 28a. Such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 42a generally perform the functions and/or methods of the embodiments of Figures 1 to 9 of the present invention described above.
The computer device 12a may also communicate with one or more external devices 14a (such as a keyboard, a pointing device, a display 24a, and the like), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card, a modem, and the like) that enables the computer device 12a to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22a. In addition, the computer device 12a may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20a. As shown in the figure, the network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 16a executes various functional applications and data processing by running the programs stored in the system memory 28a, for example, implementing the artificial-intelligence-based text error correction method shown in the above embodiments.
The present invention also provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the artificial-intelligence-based text error correction method shown in the above embodiments is implemented.
The computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a of the embodiment shown in Figure 11 above.
With the development of technology, the propagation channels of computer programs are no longer limited to tangible media; programs can also be downloaded directly from a network or obtained in other ways. Therefore, the computer-readable medium in this embodiment may include not only tangible media but also intangible media.
The computer-readable medium of this embodiment may employ any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical functional division, and there may be other division methods in actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (20)

1. A text error correction method based on artificial intelligence, characterized in that the method comprises:
obtaining a target fragment of error correction in a corrected text and an original segment corresponding to the target fragment in an original text, wherein the target fragment is selected from a plurality of candidate segments of the original segment when the original text is subjected to error correction processing based on a pre-trained segment scoring model;
obtaining feedback information of a user on a target result fed back based on the corrected text;
performing incremental training on the segment scoring model according to the target fragment, the original segment, and the feedback information; and
performing error correction processing on a subsequent original text based on the trained segment scoring model.
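The following is a minimal, self-contained sketch of the feedback loop described in claim 1: score candidate segments with a segment scoring model, pick the best one as the target fragment, and then incrementally update the model from user feedback. The linear model, the feature function, and the example Chinese segments are illustrative assumptions, not the patented implementation.

```python
# Hedged sketch of claim 1: correction with a segment scoring model plus
# incremental training from user feedback. All names here are hypothetical.
from typing import Dict, List

class SegmentScoringModel:
    def __init__(self, weights: Dict[str, float], lr: float = 0.1):
        self.weights = weights
        self.lr = lr

    def score(self, features: Dict[str, float]) -> float:
        # Linear score over relative features between original segment and candidate.
        return sum(self.weights.get(k, 0.0) * v for k, v in features.items())

    def incremental_update(self, features: Dict[str, float], ideal: float) -> None:
        # Nudge the predicted score toward the ideal score (1 = accepted, 0 = rejected).
        error = ideal - self.score(features)
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.lr * error * v

def relative_features(original: str, candidate: str) -> Dict[str, float]:
    # Placeholder feature extractor; the patent's relative features
    # (frequency ratios, PT-table statistics, semantic similarity) would go here.
    return {"same_length": float(len(original) == len(candidate)), "bias": 1.0}

model = SegmentScoringModel({"same_length": 0.5, "bias": 0.1})
original_segment = "平果手机"
candidates: List[str] = ["苹果手机", "平果手机"]
target_fragment = max(candidates, key=lambda c: model.score(relative_features(original_segment, c)))
user_accepted = True  # inferred from the user's feedback on the corrected result
model.incremental_update(relative_features(original_segment, target_fragment),
                         1.0 if user_accepted else 0.0)
```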
2. The method according to claim 1, characterized in that performing incremental training on the segment scoring model according to the target fragment, the original segment, and the feedback information specifically comprises:
obtaining relative feature information between the target fragment and the original segment;
determining an ideal score of the target fragment according to the feedback information; and
training the segment scoring model according to the relative feature information and the ideal score of the target fragment.
3. The method according to claim 2, characterized in that obtaining the relative feature information between the target fragment and the original segment comprises at least one of the following:
obtaining a relative quality feature between the target fragment and the original segment;
obtaining a relative historical behavior feature between the target fragment and the original segment; and
obtaining a semantic similarity feature between the target fragment and the original segment.
4. The method according to claim 3, characterized in that obtaining the relative quality feature between the target fragment and the original segment specifically comprises:
obtaining a frequency at which the original segment occurs in a corpus and a frequency at which the combination of the original segment and a context segment in the original text occurs together in the corpus;
obtaining a frequency at which the target fragment occurs in the corpus and a frequency at which the combination of the target fragment and the context segment occurs together in the corpus; and
according to the frequency at which the original segment occurs in the corpus, the frequency at which the combination of the original segment and the context segment occurs together in the corpus, the frequency at which the target fragment occurs in the corpus, and the frequency at which the combination of the target fragment and the context segment occurs in the corpus, obtaining a ratio of the frequency of the target fragment to that of the original segment in the corpus and a ratio of the frequency of the combination of the target fragment and the context segment to that of the combination of the original segment and the context segment in the corpus, and/or a difference between the frequency of the target fragment and that of the original segment in the corpus and a difference between the frequency of the combination of the target fragment and the context segment and that of the combination of the original segment and the context segment in the corpus.
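The relative quality feature of claim 4 reduces to frequency ratios and differences over corpus counts. The sketch below shows one way such ratios and differences could be computed; the toy n-gram counts and the simple string concatenation for "segment + context" are illustrative assumptions, not the patent's corpus statistics.

```python
# Hedged sketch of claim 4: frequency ratios and differences between the target
# fragment and the original segment, alone and combined with the context segment.
from collections import Counter

def frequency_features(corpus_ngrams: Counter, original: str, target: str, context: str):
    f_orig = corpus_ngrams[original]
    f_orig_ctx = corpus_ngrams[original + context]  # original segment + context segment
    f_tgt = corpus_ngrams[target]
    f_tgt_ctx = corpus_ngrams[target + context]      # target fragment + context segment

    eps = 1e-9  # avoid division by zero for unseen segments
    return {
        "freq_ratio": f_tgt / (f_orig + eps),
        "freq_ratio_with_context": f_tgt_ctx / (f_orig_ctx + eps),
        "freq_diff": f_tgt - f_orig,
        "freq_diff_with_context": f_tgt_ctx - f_orig_ctx,
    }

counts = Counter({"苹果": 900, "平果": 12, "苹果手机": 500, "平果手机": 3})
print(frequency_features(counts, original="平果", target="苹果", context="手机"))
```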
5. The method according to claim 3, characterized in that obtaining the relative historical behavior feature between the target fragment and the original segment specifically comprises:
obtaining a first modification frequency at which the original segment in a PT table is revised into the target fragment;
obtaining a second modification frequency at which the combination of the original segment and the context segment in the PT table is revised into the combination of the target fragment and the context segment; and
obtaining a frequency ratio and/or a frequency difference according to the first modification frequency and the second modification frequency, wherein the frequency ratio is equal to the second modification frequency divided by the first modification frequency, and the frequency difference is equal to the second modification frequency minus the first modification frequency.
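The historical behavior feature of claim 5 is defined by two arithmetic operations on PT-table modification frequencies. The sketch below treats the PT table as a mapping from (source, replacement) pairs to how often that revision was made; the table contents are made up purely for illustration.

```python
# Hedged sketch of claim 5: frequency ratio = second / first, frequency
# difference = second - first, over PT-table modification frequencies.
pt_table = {
    ("平果", "苹果"): 40,          # first modification frequency: segment alone
    ("平果手机", "苹果手机"): 30,  # second modification frequency: segment + context
}

def pt_features(first_freq: int, second_freq: int):
    ratio = second_freq / first_freq if first_freq else 0.0  # second divided by first
    diff = second_freq - first_freq                          # second minus first
    return {"pt_freq_ratio": ratio, "pt_freq_diff": diff}

print(pt_features(pt_table[("平果", "苹果")], pt_table[("平果手机", "苹果手机")]))
```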
6. The method according to claim 3, characterized in that obtaining the semantic similarity feature between the target fragment and the original segment specifically comprises:
obtaining a semantic similarity between the target fragment and the original segment; and/or
obtaining a semantic similarity between the combination of the target fragment and the context segment and the combination of the original segment and the context segment.
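One common way to realize the semantic similarity of claim 6 is cosine similarity between vector representations of the two segments (and of the segment-plus-context combinations). The sketch below uses tiny hand-written vectors in place of whatever representation model the system actually uses; this is an assumption for illustration only.

```python
# Hedged sketch of claim 6: cosine similarity between segment representations.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

embed = {                      # illustrative, hand-written segment embeddings
    "平果": [0.9, 0.1, 0.0],
    "苹果": [0.8, 0.2, 0.1],
    "平果手机": [0.7, 0.3, 0.1],
    "苹果手机": [0.6, 0.4, 0.2],
}

sim_segment = cosine(embed["苹果"], embed["平果"])               # target vs. original
sim_with_context = cosine(embed["苹果手机"], embed["平果手机"])  # combinations with context
print(sim_segment, sim_with_context)
```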
7. The method according to any one of claims 3 to 6, characterized in that obtaining the relative feature information between the target fragment and the original segment further comprises at least one of the following:
obtaining, according to a preset proper-noun dictionary, a proper-noun feature of the original segment and a proper-noun feature of the target fragment respectively; and
obtaining a phonetic edit distance feature between the target fragment and the original segment.
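The extra features of claim 7 can be sketched as a dictionary lookup plus an edit distance over phonetic transcriptions. The proper-noun dictionary and the hard-coded pinyin lookup below are illustrative assumptions; a real system would use its own lexicon and phonetic converter.

```python
# Hedged sketch of claim 7: proper-noun flags and phonetic (pinyin) edit distance.
proper_nouns = {"苹果手机"}                                       # illustrative dictionary
pinyin = {"平果手机": "pingguoshouji", "苹果手机": "pingguoshouji"}  # illustrative lookup

def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein distance via a single-row dynamic program.
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (a[i - 1] != b[j - 1]))
    return dp[len(b)]

original, target = "平果手机", "苹果手机"
features = {
    "original_is_proper_noun": float(original in proper_nouns),
    "target_is_proper_noun": float(target in proper_nouns),
    "phonetic_edit_distance": edit_distance(pinyin[original], pinyin[target]),
}
print(features)  # distance 0: the two segments sound identical
```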
8. The method according to claim 2, characterized in that determining the ideal score of the target fragment according to the feedback information specifically comprises:
inferring, according to the feedback information, whether the user accepts the replacement of the original segment with the target fragment in the corrected text;
if it is inferred that the user accepts the replacement, setting the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept the replacement, setting the ideal score of the target fragment to 0.
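Claim 8 maps inferred user acceptance to a binary ideal score. The tiny sketch below makes that mapping concrete; how acceptance is inferred here (whether the user interacted with the result returned for the corrected text) is an illustrative assumption, not the patent's rule.

```python
# Hedged sketch of claim 8: ideal score is 1 if the replacement is accepted, else 0.
def ideal_score(feedback: dict) -> float:
    user_accepted = feedback.get("clicked_result_for_corrected_text", False)
    return 1.0 if user_accepted else 0.0

print(ideal_score({"clicked_result_for_corrected_text": True}))   # 1.0
print(ideal_score({"clicked_result_for_corrected_text": False}))  # 0.0
```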
9. The method according to claim 2, characterized in that training the segment scoring model according to the relative feature information and the ideal score of the target fragment specifically comprises:
inputting the relative feature information into the segment scoring model to obtain a predicted score of the segment scoring model;
obtaining the magnitude relationship between the predicted score and the ideal score;
if the predicted score is less than the ideal score, adjusting the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in an increasing direction; and
if the predicted score is greater than the ideal score, adjusting the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in a decreasing direction.
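Claim 9 specifies only the direction of the parameter adjustment: raise the predicted score when it is below the ideal score, lower it when it is above. The sketch below makes that direction concrete with a linear model and a simple gradient-style step; the patent does not prescribe this particular model or update rule.

```python
# Hedged sketch of the training step in claim 9: move the predicted score
# toward the ideal score by adjusting the model parameters.
from typing import Dict

def train_step(weights: Dict[str, float], features: Dict[str, float],
               ideal: float, lr: float = 0.1) -> Dict[str, float]:
    prediction = sum(weights.get(k, 0.0) * v for k, v in features.items())
    if prediction < ideal:        # prediction too low: increase the predicted score
        direction = 1.0
    elif prediction > ideal:      # prediction too high: decrease the predicted score
        direction = -1.0
    else:
        return weights            # already matches the ideal score
    return {k: weights.get(k, 0.0) + lr * direction * v for k, v in features.items()}

weights = {"freq_ratio": 0.2, "pt_freq_ratio": 0.1, "semantic_sim": 0.3}
features = {"freq_ratio": 1.5, "pt_freq_ratio": 0.75, "semantic_sim": 0.9}
print(train_step(weights, features, ideal=1.0))
```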
10. A text error correction device based on artificial intelligence, characterized in that the device comprises:
a segment data obtaining module, configured to obtain a target fragment of error correction in a corrected text and an original segment corresponding to the target fragment in an original text, wherein the target fragment is selected from a plurality of candidate segments of the original segment when the original text is subjected to error correction processing based on a pre-trained segment scoring model;
a feedback information obtaining module, configured to obtain feedback information of a user on a target result fed back based on the corrected text;
an incremental training module, configured to perform incremental training on the segment scoring model according to the target fragment, the original segment, and the feedback information; and
an error correction module, configured to perform error correction processing on a subsequent original text based on the trained segment scoring model.
11. The device according to claim 10, characterized in that the incremental training module specifically comprises:
a relative feature information obtaining unit, configured to obtain relative feature information between the target fragment and the original segment;
a determining unit, configured to determine an ideal score of the target fragment according to the feedback information; and
a training unit, configured to train the segment scoring model according to the relative feature information and the ideal score of the target fragment.
12. The device according to claim 11, characterized in that the relative feature information obtaining unit is configured to perform at least one of the following operations:
obtaining a relative quality feature between the target fragment and the original segment;
obtaining a relative historical behavior feature between the target fragment and the original segment; and
obtaining a semantic similarity feature between the target fragment and the original segment.
13. The device according to claim 12, characterized in that the relative feature information obtaining unit is specifically configured to:
obtain a frequency at which the original segment occurs in a corpus and a frequency at which the combination of the original segment and a context segment in the original text occurs together in the corpus;
obtain a frequency at which the target fragment occurs in the corpus and a frequency at which the combination of the target fragment and the context segment occurs together in the corpus; and
obtain, according to the frequency at which the original segment occurs in the corpus, the frequency at which the combination of the original segment and the context segment occurs together in the corpus, the frequency at which the target fragment occurs in the corpus, and the frequency at which the combination of the target fragment and the context segment occurs in the corpus, a ratio of the frequency of the target fragment to that of the original segment in the corpus and a ratio of the frequency of the combination of the target fragment and the context segment to that of the combination of the original segment and the context segment in the corpus, and/or a difference between the frequency of the target fragment and that of the original segment in the corpus and a difference between the frequency of the combination of the target fragment and the context segment and that of the combination of the original segment and the context segment in the corpus.
14. The device according to claim 12, characterized in that the relative feature information obtaining unit is specifically configured to:
obtain a first modification frequency at which the original segment in a PT table is revised into the target fragment;
obtain a second modification frequency at which the combination of the original segment and the context segment in the PT table is revised into the combination of the target fragment and the context segment; and
obtain a frequency ratio and/or a frequency difference according to the first modification frequency and the second modification frequency, wherein the frequency ratio is equal to the second modification frequency divided by the first modification frequency, and the frequency difference is equal to the second modification frequency minus the first modification frequency.
15. The device according to claim 12, characterized in that the relative feature information obtaining unit is specifically configured to:
obtain a semantic similarity between the target fragment and the original segment; and/or
obtain a semantic similarity between the combination of the target fragment and the context segment and the combination of the original segment and the context segment.
16. The device according to any one of claims 12 to 15, characterized in that the relative feature information obtaining unit is further configured to perform at least one of the following:
obtaining, according to a preset proper-noun dictionary, a proper-noun feature of the original segment and a proper-noun feature of the target fragment respectively; and
obtaining a phonetic edit distance feature between the target fragment and the original segment.
17. The device according to claim 11, characterized in that the determining unit is specifically configured to:
infer, according to the feedback information, whether the user accepts the replacement of the original segment with the target fragment in the corrected text;
if it is inferred that the user accepts the replacement, set the ideal score of the target fragment to 1; otherwise, if it is inferred that the user does not accept the replacement, set the ideal score of the target fragment to 0.
18. The device according to claim 11, characterized in that the training unit is specifically configured to:
input the relative feature information into the segment scoring model to obtain a predicted score of the segment scoring model;
obtain the magnitude relationship between the predicted score and the ideal score;
if the predicted score is less than the ideal score, adjust the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in an increasing direction; and
if the predicted score is greater than the ideal score, adjust the parameters of the segment scoring model so that the predicted score output by the segment scoring model changes in a decreasing direction.
19. A computer device, characterized in that the device comprises:
one or more processors; and
a memory, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 9.
20. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 9 is implemented.
CN201711159880.7A 2017-11-20 2017-11-20 Text error correction method and device based on artificial intelligence and computer readable medium Active CN108052499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711159880.7A CN108052499B (en) 2017-11-20 2017-11-20 Text error correction method and device based on artificial intelligence and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711159880.7A CN108052499B (en) 2017-11-20 2017-11-20 Text error correction method and device based on artificial intelligence and computer readable medium

Publications (2)

Publication Number Publication Date
CN108052499A true CN108052499A (en) 2018-05-18
CN108052499B CN108052499B (en) 2021-06-11

Family

ID=62118964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711159880.7A Active CN108052499B (en) 2017-11-20 2017-11-20 Text error correction method and device based on artificial intelligence and computer readable medium

Country Status (1)

Country Link
CN (1) CN108052499B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831212A (en) * 2018-06-28 2018-11-16 深圳语易教育科技有限公司 A kind of oral English teaching auxiliary device and method
CN109032375A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Candidate text sort method, device, equipment and storage medium
CN109376362A (en) * 2018-11-30 2019-02-22 武汉斗鱼网络科技有限公司 A kind of the determination method and relevant device of corrected text
CN109766538A (en) * 2018-11-21 2019-05-17 北京捷通华声科技股份有限公司 A kind of text error correction method, device, electronic equipment and storage medium
CN110399607A (en) * 2019-06-04 2019-11-01 深思考人工智能机器人科技(北京)有限公司 A kind of conversational system text error correction system and method based on phonetic
CN111160013A (en) * 2019-12-30 2020-05-15 北京百度网讯科技有限公司 Text error correction method and device
CN111339755A (en) * 2018-11-30 2020-06-26 中国移动通信集团浙江有限公司 Automatic error correction method and device for office data
CN111832288A (en) * 2020-07-27 2020-10-27 网易有道信息技术(北京)有限公司 Text correction method and device, electronic equipment and storage medium
CN112541342A (en) * 2020-12-08 2021-03-23 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and storage medium
CN112733529A (en) * 2019-10-28 2021-04-30 阿里巴巴集团控股有限公司 Text error correction method and device
CN113159035A (en) * 2021-05-10 2021-07-23 北京世纪好未来教育科技有限公司 Image processing method, device, equipment and storage medium
CN114328798A (en) * 2021-11-09 2022-04-12 腾讯科技(深圳)有限公司 Processing method, device, equipment, storage medium and program product for searching text

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003083858A1 (en) * 2002-03-28 2003-10-09 Koninklijke Philips Electronics N.V. Time domain watermarking of multimedia signals
US20030228007A1 (en) * 2002-06-10 2003-12-11 Fujitsu Limited Caller identifying method, program, and apparatus and recording medium
EP1593049A1 (en) * 2003-02-11 2005-11-09 Telstra Corporation Limited System for predicting speec recognition accuracy and development for a dialog system
CN101866336A (en) * 2009-04-14 2010-10-20 华为技术有限公司 Methods, devices and systems for obtaining evaluation unit and establishing syntactic path dictionary
CN105374356A (en) * 2014-08-29 2016-03-02 株式会社理光 Speech recognition method, speech assessment method, speech recognition system, and speech assessment system
CN104915264A (en) * 2015-05-29 2015-09-16 北京搜狗科技发展有限公司 Input error-correction method and device
CN105068661A (en) * 2015-09-07 2015-11-18 百度在线网络技术(北京)有限公司 Man-machine interaction method and system based on artificial intelligence
CN106528597A (en) * 2016-09-23 2017-03-22 百度在线网络技术(北京)有限公司 POI (Point Of Interest) labeling method and device
CN106598939A (en) * 2016-10-21 2017-04-26 北京三快在线科技有限公司 Method and device for text error correction, server and storage medium
CN107133209A (en) * 2017-03-29 2017-09-05 北京百度网讯科技有限公司 Comment generation method and device, equipment and computer-readable recording medium based on artificial intelligence
CN107239446A (en) * 2017-05-27 2017-10-10 中国矿业大学 A kind of intelligence relationship extracting method based on neutral net Yu notice mechanism
CN107357775A (en) * 2017-06-05 2017-11-17 百度在线网络技术(北京)有限公司 The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MEHDI BEN LAZREG: "Vector representation of non-standard spelling using dynamic time warping and a denoising autoencoder", 2017 IEEE Congress on Evolutionary Computation *
林新建 et al.: "A reversible text watermarking algorithm based on error-correcting coding", 《计算机应用与软件》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831212A (en) * 2018-06-28 2018-11-16 深圳语易教育科技有限公司 A kind of oral English teaching auxiliary device and method
CN109032375B (en) * 2018-06-29 2022-07-19 北京百度网讯科技有限公司 Candidate text sorting method, device, equipment and storage medium
CN109032375A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Candidate text sort method, device, equipment and storage medium
CN109766538A (en) * 2018-11-21 2019-05-17 北京捷通华声科技股份有限公司 A kind of text error correction method, device, electronic equipment and storage medium
CN109766538B (en) * 2018-11-21 2023-12-15 北京捷通华声科技股份有限公司 Text error correction method and device, electronic equipment and storage medium
CN109376362A (en) * 2018-11-30 2019-02-22 武汉斗鱼网络科技有限公司 A kind of the determination method and relevant device of corrected text
CN111339755A (en) * 2018-11-30 2020-06-26 中国移动通信集团浙江有限公司 Automatic error correction method and device for office data
CN110399607A (en) * 2019-06-04 2019-11-01 深思考人工智能机器人科技(北京)有限公司 A kind of conversational system text error correction system and method based on phonetic
CN110399607B (en) * 2019-06-04 2023-04-07 深思考人工智能机器人科技(北京)有限公司 Pinyin-based dialog system text error correction system and method
CN112733529B (en) * 2019-10-28 2023-09-29 阿里巴巴集团控股有限公司 Text error correction method and device
CN112733529A (en) * 2019-10-28 2021-04-30 阿里巴巴集团控股有限公司 Text error correction method and device
CN111160013A (en) * 2019-12-30 2020-05-15 北京百度网讯科技有限公司 Text error correction method and device
CN111160013B (en) * 2019-12-30 2023-11-24 北京百度网讯科技有限公司 Text error correction method and device
CN111832288B (en) * 2020-07-27 2023-09-29 网易有道信息技术(北京)有限公司 Text correction method and device, electronic equipment and storage medium
CN111832288A (en) * 2020-07-27 2020-10-27 网易有道信息技术(北京)有限公司 Text correction method and device, electronic equipment and storage medium
CN112541342A (en) * 2020-12-08 2021-03-23 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and storage medium
CN113159035B (en) * 2021-05-10 2022-06-07 北京世纪好未来教育科技有限公司 Image processing method, device, equipment and storage medium
CN113159035A (en) * 2021-05-10 2021-07-23 北京世纪好未来教育科技有限公司 Image processing method, device, equipment and storage medium
CN114328798A (en) * 2021-11-09 2022-04-12 腾讯科技(深圳)有限公司 Processing method, device, equipment, storage medium and program product for searching text
CN114328798B (en) * 2021-11-09 2024-02-23 腾讯科技(深圳)有限公司 Processing method, device, equipment, storage medium and program product for searching text

Also Published As

Publication number Publication date
CN108052499B (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN108052499A (en) Text error correction method, device and computer-readable medium based on artificial intelligence
CN108091328A (en) Speech recognition error correction method, device and readable medium based on artificial intelligence
KR102577514B1 (en) Method, apparatus for text generation, device and storage medium
EP3230896B1 (en) Localization complexity of arbitrary language assets and resources
CN108108349A (en) Long text error correction method, device and computer-readable medium based on artificial intelligence
CN109344413B (en) Translation processing method, translation processing device, computer equipment and computer readable storage medium
CN106534548B (en) Voice error correction method and device
US11210470B2 (en) Automatic text segmentation based on relevant context
CN107678561A (en) Phonetic entry error correction method and device based on artificial intelligence
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN109753636A (en) Machine processing and text error correction method and device calculate equipment and storage medium
CN110750959A (en) Text information processing method, model training method and related device
CN106537370A (en) Method and system for robust tagging of named entities in the presence of source or translation errors
CN107832299A (en) Rewriting processing method, device and the computer-readable recording medium of title based on artificial intelligence
CN105068997B (en) The construction method and device of parallel corpora
CN110110327A (en) A kind of text marking method and apparatus based on confrontation study
CN109032375A (en) Candidate text sort method, device, equipment and storage medium
CN110347790B (en) Text duplicate checking method, device and equipment based on attention mechanism and storage medium
CN103488627B (en) Full piece patent document interpretation method and translation system
US11593557B2 (en) Domain-specific grammar correction system, server and method for academic text
CN109918627A (en) Document creation method, device, electronic equipment and storage medium
CN110008309A (en) A kind of short phrase picking method and device
CN109710922A (en) Text recognition method, device, computer equipment and storage medium
CN109033073B (en) Text inclusion recognition method and device based on vocabulary dependency triple
Qin et al. Learning latent semantic annotations for grounding natural language to structured data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant