CN109977415A - A kind of text error correction method and device - Google Patents
A kind of text error correction method and device
- Publication number
- CN109977415A (application CN201910261329.6A, filed as CN 201910261329 A; publication CN 109977415 A)
- Authority
- CN
- China
- Prior art keywords
- error correction
- candidate
- long text
- character
- long
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
An embodiment of the invention provides a text error correction method and device. In the method, a long text to be corrected is divided into multiple segments, and at least one candidate corrected segment is determined for each segment. Based on the determined candidate corrected segments, at least one candidate long text corresponding to the long text to be corrected is determined. For each long text among the at least one candidate long text and the long text to be corrected, the method successively predicts at least one candidate character for each character position in that long text, together with the candidate probability of each such candidate character, and computes an evaluation score for that long text based on the predicted candidate probabilities. Based on the evaluation scores of the at least one candidate long text and the evaluation score of the long text to be corrected, the error correction result of the long text to be corrected is determined. The text error correction method provided by the embodiment of the invention can improve the accuracy of error correction results for long texts.
Description
Technical field
The present invention relates to the field of text error correction technology, and in particular to a text error correction method and device.
Background art
In existing text error correction methods for long texts, a long text is split into multiple shorter segments, each segment is corrected independently to obtain a per-segment correction result, and the per-segment results are combined into the correction result for the long text. During correction, each segment corresponds to multiple candidate corrected segments, and one preferred corrected segment is finally chosen for each segment as its correction result.

However, in the course of making the present invention, the inventors found that the prior art has at least the following problem: existing text error correction methods are, in essence, still correction of individual segments. Whether the corrected segments, once joined together, are semantically fluent and whether their context is coherent is not considered by existing methods, which inevitably lowers the accuracy of the correction result for the long text.
Summary of the invention
The purpose of embodiments of the present invention is to provide a text error correction method and device, so as to improve the accuracy of error correction results for long texts. The specific technical solution is as follows:
A text error correction method, comprising:
dividing a long text to be corrected into multiple segments, and determining at least one candidate corrected segment corresponding to each segment;

based on the determined candidate corrected segments, determining at least one candidate long text corresponding to the long text to be corrected;

for each long text among the at least one candidate long text and the long text to be corrected, successively predicting at least one candidate character for each character position in that long text and the candidate probability of each such candidate character, and computing an evaluation score for that long text based on the predicted candidate probabilities; wherein the at least one candidate character of each character position is a character predicted based on the characters at the character positions other than that character position in the long text;

based on the evaluation scores of the at least one candidate long text and the evaluation score of the long text to be corrected, determining the error correction result of the long text to be corrected.
Optionally, the successively predicting, for each long text among the at least one candidate long text and the long text to be corrected, at least one candidate character for each character position in that long text and the candidate probability of each such candidate character comprises:

for each long text among the at least one candidate long text and the long text to be corrected, successively predicting, using a preset prediction model, at least one candidate character for each character position in that long text and the candidate probability of each such candidate character;

wherein the prediction model is trained based on the characters in sample long sentences and a preset probability of each character in the sample long sentences; the sample long sentences include preset sample long sentences and derived sample long sentences, a derived sample long sentence being a sample long sentence obtained by replacing the character at a target character position of a preset sample long sentence, and the preset probability of the character at the target character position of the preset sample long sentence differing from the preset probability of the character at the target character position of the derived sample long sentence.
Optionally, the input layer of the network structure of the prediction model uses a bidirectional recurrent neural network.
Optionally, the computing an evaluation score for that long text based on the predicted candidate probabilities comprises:

for each character position of that long text, taking, among the at least one predicted candidate character of that character position, the candidate character with the highest candidate probability as the correction character of that character position;

computing the evaluation score of that long text based on the candidate probabilities of the correction characters at the character positions.
Optionally, the computing the evaluation score of that long text based on the candidate probabilities of the correction characters at the character positions comprises:

for each character position, determining the candidate probability of the original character at that character position, and computing the logarithmic loss between the candidate probability of the correction character at that character position and the candidate probability of the original character, the original character being the character located at that character position in that long text;

summing the computed logarithmic losses, and dividing the sum by the number of character positions of that long text, to obtain the evaluation score of that long text.
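The per-position scoring just described can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name, the dict-based prediction format, and the reading of "logarithmic loss" as the difference of log-probabilities are all assumptions made for the sketch.

```python
import math

def evaluation_score(original_chars, predictions):
    """Score a long text from per-position candidate predictions.

    `original_chars` is the text's character sequence; `predictions` is a
    list (one entry per character position) of dicts mapping each predicted
    candidate character to its candidate probability (hypothetical format).
    """
    total_log_loss = 0.0
    for orig_char, candidates in zip(original_chars, predictions):
        # Correction character: the candidate with the highest candidate probability.
        corr_char = max(candidates, key=candidates.get)
        p_corr = candidates[corr_char]
        # Candidate probability of the original character at this position
        # (a small floor avoids log(0) when the model never predicts it).
        p_orig = candidates.get(orig_char, 1e-12)
        # Logarithmic loss between the two probabilities (assumed reading).
        total_log_loss += math.log(p_corr) - math.log(p_orig)
    # Sum of the losses divided by the number of character positions.
    return total_log_loss / len(original_chars)
```

Under this reading, a text whose every character already matches the model's top prediction scores 0, and larger scores indicate positions where the model strongly prefers a different character.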
Optionally, the determining the error correction result of the long text to be corrected based on the evaluation scores of the at least one candidate long text and the evaluation score of the long text to be corrected comprises:

determining, among the at least one candidate long text, the candidate long text with the highest evaluation score;

judging whether the difference between the evaluation score of the candidate long text with the highest evaluation score and the evaluation score of the long text to be corrected is greater than a preset threshold, and if so, taking the candidate long text with the highest evaluation score as the error correction result of the long text to be corrected.
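The selection step can be sketched as follows. The function name, the threshold value, and the fallback of keeping the original text when no candidate clears the threshold are illustrative assumptions; the clause above only specifies the replacement case.

```python
def choose_result(original, original_score, scored_candidates, threshold=0.1):
    """Pick the final correction result (illustrative names and threshold).

    `scored_candidates` is a list of (candidate_long_text, score) pairs.
    The highest-scoring candidate replaces the original only when its score
    exceeds the original's score by more than `threshold`.
    """
    best_text, best_score = max(scored_candidates, key=lambda pair: pair[1])
    if best_score - original_score > threshold:
        return best_text
    # Assumed fallback: keep the text to be corrected unchanged.
    return original
```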
Optionally, the determining at least one candidate corrected segment corresponding to each segment comprises:

for each segment, obtaining, using a preset language model, multiple initial corrected segments corresponding to that segment and an evaluation score for each initial corrected segment; wherein the language model is used to correct an input segment and to assign an evaluation score to each segment obtained by the correction;

taking at least one initial corrected segment whose evaluation score satisfies a first preset screening condition as the at least one candidate corrected segment corresponding to that segment.
Optionally, the determining, based on the determined candidate corrected segments, at least one candidate long text corresponding to the long text to be corrected comprises:

based on the determined candidate corrected segments, generating multiple initial long texts corresponding to the long text to be corrected;

for each initial long text, determining the evaluation score of that initial long text according to the evaluation scores of the candidate corrected segments in that initial long text;

taking the initial long texts whose evaluation scores satisfy a second preset screening condition as the at least one candidate long text corresponding to the long text to be corrected.
Optionally, the determining, for each initial long text, the evaluation score of that initial long text according to the evaluation scores of the candidate corrected segments in that initial long text comprises:

taking the product of the evaluation scores of the candidate corrected segments in that initial long text as the evaluation score of that initial long text; or

taking the average of the evaluation scores of the candidate corrected segments of that initial long text as the evaluation score of that initial long text.
A text error correction device, comprising:

a first determining module, configured to divide a long text to be corrected into multiple segments and determine at least one candidate corrected segment corresponding to each segment;

a second determining module, configured to determine, based on the determined candidate corrected segments, at least one candidate long text corresponding to the long text to be corrected;

a prediction module, configured to, for each long text among the at least one candidate long text and the long text to be corrected, successively predict at least one candidate character for each character position in that long text and the candidate probability of each such candidate character; wherein the at least one candidate character of each character position is a character predicted based on the characters at the character positions other than that character position in the long text;

a computing module, configured to, for each long text among the at least one candidate long text and the long text to be corrected, compute an evaluation score for that long text based on the predicted candidate probabilities;

a third determining module, configured to determine the error correction result of the long text to be corrected based on the evaluation scores of the at least one candidate long text and the evaluation score of the long text to be corrected.
Optionally, the prediction module is specifically configured to:

for each long text among the at least one candidate long text and the long text to be corrected, successively predict, using a preset prediction model, at least one candidate character for each character position in that long text and the candidate probability of each such candidate character;

wherein the prediction model is trained based on the characters in sample long sentences and a preset probability of each character in the sample long sentences; the sample long sentences include preset sample long sentences and derived sample long sentences, a derived sample long sentence being a sample long sentence obtained by replacing the character at a target character position of a preset sample long sentence, and the preset probability of the character at the target character position of the preset sample long sentence differing from the preset probability of the character at the target character position of the derived sample long sentence.
Optionally, the input layer of the network structure of the prediction model uses a bidirectional recurrent neural network.
Optionally, the computing module includes a correction submodule and a computation submodule;

the correction submodule is configured to, for each character position of that long text, take, among the at least one predicted candidate character of that character position, the candidate character with the highest candidate probability as the correction character of that character position;

the computation submodule is configured to compute the evaluation score of that long text based on the probabilities of the correction characters at the character positions.
Optionally, the computation submodule is specifically configured to:

for each character position, determine the candidate probability of the original character at that character position, and compute the logarithmic loss between the probability of the correction character at that character position and the probability of the original character, the original character being the character located at that character position in that long text;

sum the computed logarithmic losses, and divide the sum by the number of character positions of that long text, to obtain the evaluation score of that long text.
Optionally, the third determining module includes a first determining submodule and a judging submodule;

the first determining submodule is configured to determine, among the at least one candidate long text, the candidate long text with the highest evaluation score;

the judging submodule is configured to judge whether the difference between the evaluation score of the candidate long text with the highest evaluation score and the evaluation score of the long text to be corrected is greater than a preset threshold, and if so, to take the candidate long text with the highest evaluation score as the error correction result of the long text to be corrected.
Optionally, the first determining module includes a segmentation submodule, a model application submodule, and a first screening submodule;

the segmentation submodule is configured to divide the long text to be corrected into multiple segments;

the model application submodule is configured to, for each segment, obtain, using a preset language model, multiple initial corrected segments corresponding to that segment and an evaluation score for each initial corrected segment; wherein the language model is used to correct an input segment and to assign an evaluation score to each segment obtained by the correction;

the first screening submodule is configured to take at least one initial corrected segment whose evaluation score satisfies a first preset screening condition as the at least one candidate corrected segment corresponding to that segment.
Optionally, the second determining module includes a generating submodule, a second determining submodule, and a second screening submodule;

the generating submodule is configured to generate, based on the determined candidate corrected segments, multiple initial long texts corresponding to the long text to be corrected;

the second determining submodule is configured to, for each initial long text, determine the evaluation score of that initial long text according to the evaluation scores of the candidate corrected segments in that initial long text;

the second screening submodule is configured to take the initial long texts whose evaluation scores satisfy a second preset screening condition as the at least one candidate long text corresponding to the long text to be corrected.
Optionally, the second determining submodule is specifically configured to:

for each initial long text, take the product of the evaluation scores of the candidate corrected segments in that initial long text as the evaluation score of that initial long text, or take the average of the evaluation scores of the candidate corrected segments of that initial long text as the evaluation score of that initial long text.
An electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement any of the above text error correction methods when executing the program stored in the memory.
In another aspect of embodiments of the present invention, a computer-readable storage medium is further provided, the computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute any of the above text error correction methods.
In another aspect of embodiments of the present invention, a computer program product containing instructions is further provided, which, when run on a computer, causes the computer to execute any of the above text error correction methods.
In the text error correction method provided by embodiments of the present invention, at least one candidate long text is determined for the long text to be corrected; for each long text among the at least one candidate long text and the long text to be corrected, at least one candidate character for each character position in that long text and the candidate probability of each such candidate character are successively predicted, and an evaluation score for that long text is computed based on the predicted candidate probabilities; based on the evaluation scores of the candidate long texts and the evaluation score of the long text to be corrected, the error correction result of the long text to be corrected is determined. Because each candidate character of a long text under evaluation is predicted based on the characters at the character positions other than the corresponding character position, the computed evaluation score of the long text takes into account both the semantic fluency of the long text and the coherence of its context. Therefore, the text error correction method provided by embodiments of the present invention achieves higher accuracy in the error correction results for long texts. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flowchart of a text error correction method provided by an embodiment of the present invention;

Fig. 2(a) is a schematic diagram of using a prediction model to predict the candidate probabilities of at least one candidate character for the first character position in a long text;

Fig. 2(b) is a schematic diagram of using a prediction model to predict the candidate probabilities of at least one candidate character for the second character position in a long text;

Fig. 2(c) is a schematic diagram of using a prediction model to predict the candidate probabilities of at least one candidate character for the third character position in a long text;

Fig. 2(d) is a schematic diagram of using a prediction model to predict the candidate probabilities of at least one candidate character for the fourth character position in a long text;

Fig. 3 is a schematic structural diagram of a text error correction device provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to improve the accuracy of long-text error correction results, embodiments of the present invention provide a text error correction method and device.
It should be noted that the execution subject of the text error correction method provided by embodiments of the present invention may be a text error correction device, which may be applied in an electronic device with a text input function. In a specific application, the electronic device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
First, the text error correction method provided by an embodiment of the present invention is described in detail. As shown in Fig. 1, the text error correction method provided by an embodiment of the present invention may include the following steps:
S101: dividing a long text to be corrected into multiple segments, and determining at least one candidate corrected segment corresponding to each segment.
There are multiple specific implementations of dividing the long text to be corrected into multiple segments. Illustratively, in one implementation, the segmentation of the long text to be corrected may be implemented using a language model. The language model may include an N-gram language model, also called an N-gram grammar model. The basic idea of text segmentation with an N-gram language model is to slide a window of size N over the text character by character, forming multiple segments of character length N. In another implementation, the segmentation of the long text to be corrected may be implemented according to the punctuation marks in the long text to be corrected. It should be noted that any method capable of dividing a long text to be corrected into multiple segments may be applied in the text error correction method provided by embodiments of the present invention.
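The sliding-window idea above can be sketched in a few lines of Python; the function name is illustrative, and the handling of texts shorter than the window is an assumption.

```python
def ngram_segments(text, n):
    """Slide a window of character size n over `text` one character at a
    time, producing the overlapping segments of length n described above."""
    if len(text) <= n:
        # Assumed edge case: a text no longer than the window is one segment.
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

For example, with N = 3 the text "abcde" yields the segments "abc", "bcd", and "cde".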
In addition, there are multiple specific implementations of determining at least one candidate corrected segment corresponding to each segment. Illustratively, in one implementation, determining the at least one candidate corrected segment corresponding to each segment may include:

for each segment, obtaining, using a preset language model, multiple initial corrected segments corresponding to that segment and an evaluation score for each initial corrected segment; wherein the language model is used to correct an input segment and to assign an evaluation score to each segment obtained by the correction;

taking at least one initial corrected segment whose evaluation score satisfies a first preset screening condition as the at least one candidate corrected segment corresponding to that segment.
Here, the preset language model may include an N-gram language model. Using an N-gram language model, the probability of each initial corrected segment corresponding to each segment can be obtained; this probability represents the probability that the initial corrected segment is the correct corrected segment for the corresponding segment. It can be understood that the probabilities of the initial corrected segments corresponding to a given segment sum to 1. In embodiments of the present invention, the probability of an initial corrected segment obtained using the N-gram language model may be used as the evaluation score of that initial corrected segment. In addition, the larger the value of N in the N-gram language model, the more initial corrected segments are obtained. In practical applications, a suitable value of N can be preset according to experience or actual demand; for example, N may be 2, 3, 4, and so on.
It can be understood that screening the initial corrected segments with the first preset screening condition can reduce the number of candidate corrected segments, thereby screening out unnecessary initial corrected segments before the subsequent steps. The implementation of screening the initial corrected segments with the first preset screening condition may include: taking the initial corrected segments whose evaluation score is greater than a preset first threshold as candidate corrected segments; or sorting the initial corrected segments corresponding to each segment in descending order of evaluation score and selecting the first X initial corrected segments from the resulting sequence as candidate corrected segments. In the implementation that determines candidate corrected segments according to the first threshold, the first threshold may be a value greater than 0 and less than 1; for example, initial corrected segments with an evaluation score greater than 0.5 may be taken as candidate corrected segments. In the implementation that determines candidate corrected segments by sorting, X may be a positive integer greater than or equal to 2; for example, the first 3 initial corrected segments may be selected from the resulting sequence as candidate corrected segments.
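Both screening variants just described (score threshold, or top-X after descending sort) can be sketched together; the function and parameter names are illustrative.

```python
def screen_segments(scored_segments, threshold=None, top_x=None):
    """Apply the first preset screening condition (illustrative sketch).

    `scored_segments` maps each initial corrected segment to its evaluation
    score. Either keep the segments scoring above `threshold`, or keep the
    `top_x` highest-scoring segments.
    """
    if threshold is not None:
        # Variant 1: evaluation score greater than the preset first threshold.
        return [seg for seg, score in scored_segments.items() if score > threshold]
    # Variant 2: descending sort by score, then take the first top_x segments.
    ranked = sorted(scored_segments, key=scored_segments.get, reverse=True)
    return ranked[:top_x]
```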
Illustratively, in another implementation, determining the at least one candidate corrected segment corresponding to each segment may include:

for each segment, obtaining, using a preset language model, multiple initial corrected segments corresponding to that segment;

taking the obtained multiple initial corrected segments as the candidate corrected segments corresponding to that segment, or randomly selecting a predetermined number of segments from the obtained multiple initial corrected segments as the candidate corrected segments corresponding to that segment.

It should be emphasized that the above specific implementations of determining the at least one candidate corrected segment corresponding to each segment are merely exemplary and should not be construed as limiting the embodiments of the present invention.
S102: based on the determined candidate corrected segments, determining at least one candidate long text corresponding to the long text to be corrected.
A candidate long text corresponding to the long text to be corrected is composed of at least two of the determined candidate corrected segments. Specifically, for each candidate long text, the number of candidate corrected segments included in that candidate long text is equal to the number of segments of the long text to be corrected, and each candidate corrected segment occupies the same position in that candidate long text as the segment to which it corresponds occupies in the text to be corrected.
There may be multiple specific implementations of determining, based on the determined candidate corrected segments, the at least one candidate long text corresponding to the long text to be corrected. Illustratively, in one implementation, determining the at least one candidate long text corresponding to the long text to be corrected based on the determined candidate corrected segments may include:

based on the determined candidate corrected segments, generating multiple initial long texts corresponding to the long text to be corrected;

for each initial long text, determining the evaluation score of that initial long text according to the evaluation scores of the candidate corrected segments in that initial long text;

taking the initial long texts whose evaluation scores satisfy a second preset screening condition as the at least one candidate long text corresponding to the long text to be corrected. It can be understood that screening the initial long texts with the second preset screening condition can reduce the number of candidate long texts, thereby screening out unnecessary initial long texts before the subsequent steps.
Each initial long text corresponding to the long text to be corrected is composed of at least two of the candidate error correction segments. Specifically, for each initial long text: the number of candidate error correction segments it contains equals the number of segments of the long text to be corrected, and the position of each contained candidate error correction segment within the initial long text is identical to the position, within the long text to be corrected, of the segment to which it corresponds. In practical applications, under these two constraints, the initial long texts may be generated as exhaustively as possible by permutation and combination.
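The permutation-and-combination generation described above can be sketched as a Cartesian product over the per-segment candidate lists. The following is a minimal Python sketch; the function name and the example candidate segments are illustrative assumptions, not the patent's implementation.

```python
from itertools import product

def generate_initial_long_texts(candidate_segments):
    """Enumerate every combination of candidate error correction segments.

    candidate_segments: one entry per segment of the long text to be
    corrected; each entry is a list of candidate error correction segments
    (strings) for that position.  Positions are preserved, so each generated
    initial long text keeps every candidate segment at its original position.
    """
    return ["".join(combo) for combo in product(*candidate_segments)]

# Hypothetical candidates for a three-segment text: 2 x 1 x 2 = 4 combinations.
candidates = [["I love", "I lose"], [" Beijing"], [" Tiananmen", " Tian'anmen"]]
texts = generate_initial_long_texts(candidates)
print(len(texts))  # → 4
```

Because every combination is emitted, the number of initial long texts grows multiplicatively with the candidate counts, which is why the subsequent screening step matters.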
In addition, there may also be multiple specific implementations for determining the assessment score of an initial long text from the assessment scores of the candidate error correction segments it contains. Illustratively, in one implementation, the product of the assessment scores of the candidate error correction segments in the initial long text may be taken as the assessment score of the initial long text. In another implementation, the average of the assessment scores of the candidate error correction segments of the initial long text may be taken as its assessment score.
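The two aggregation choices just described (product or average of per-segment scores) can be written directly; this is a minimal sketch with made-up scores, not values from the patent.

```python
from math import prod

def score_by_product(segment_scores):
    # Assessment score of an initial long text as the product of the
    # assessment scores of its candidate error correction segments.
    return prod(segment_scores)

def score_by_average(segment_scores):
    # Alternative: arithmetic mean of the per-segment assessment scores.
    return sum(segment_scores) / len(segment_scores)

scores = [0.9, 0.8, 0.5]          # hypothetical per-segment scores
p = score_by_product(scores)      # ≈ 0.36
a = score_by_average(scores)      # ≈ 0.733
```

The product penalizes any single low-scoring segment sharply, while the average is more forgiving; either is consistent with the screening step that follows.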
In addition, screening the initial long texts with the second preset screening condition reduces the number of initial long texts, filtering out those unnecessary for subsequent steps. The implementation of screening with the second preset screening condition may include: taking the initial long texts whose assessment scores exceed a preset second threshold as the candidate long texts; or sorting the initial long texts in descending order of assessment score and taking the top Y initial long texts of the resulting ordering as the candidate long texts. It can be understood that the second threshold here may be identical to, or different from, the first threshold used when determining the candidate error correction segments in S101. Likewise, the value of Y here may be identical to, or different from, the value of X used when determining the candidate error correction segments in S101.
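Both screening variants (score above a threshold, or top Y by score) are straightforward; the following sketch uses hypothetical texts and scores for illustration only.

```python
def screen_by_threshold(scored_texts, threshold):
    """Keep initial long texts whose assessment score exceeds the threshold."""
    return [t for t, s in scored_texts if s > threshold]

def screen_top_y(scored_texts, y):
    """Keep the Y highest-scoring initial long texts."""
    ranked = sorted(scored_texts, key=lambda ts: ts[1], reverse=True)
    return [t for t, _ in ranked[:y]]

scored = [("A", 0.9), ("B", 0.4), ("C", 0.7)]
print(screen_by_threshold(scored, 0.5))  # → ['A', 'C']
print(screen_top_y(scored, 2))           # → ['A', 'C']
```

The threshold variant yields a variable number of candidates, while top-Y bounds the work of the later prediction step regardless of score distribution.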
S103: for each long text among the at least one candidate long text and the long text to be corrected, successively predict, in that long text, at least one candidate character for each character position and the candidate probability of each such candidate character, and calculate the assessment score of that long text based on the predicted candidate probabilities; here, the candidate characters of each character position are characters predicted from the characters on the other character positions of the long text, that is, the positions other than that character position.
There are multiple specific implementations for successively predicting, in a long text, the candidate characters of each character position and their candidate probabilities. Illustratively, in one implementation, a preset prediction model may be used to successively predict, in the long text, at least one candidate character for each character position and the candidate probability of each such candidate character.
The prediction model is trained on the characters of sample long sentences and the preset probability of each character in those sample long sentences. The sample long sentences include preset sample long sentences and derived sample long sentences, where a derived sample long sentence is obtained by replacing the character at a target character position of a preset sample long sentence; the preset probability of the character at the target character position of the preset sample long sentence differs from that of the character at the target character position of the derived sample long sentence.
In the prior art, a language model based on an RNN (Recurrent Neural Network) can predict the following character from the characters input so far. Using this property, an RNN-based language model can correct shorter texts. For example, when the short text to be corrected is "I very number" (a literal rendering of a Chinese sentence with one wrong character), the wrong character "number" can be corrected to "good" based on the two preceding characters "I very". However, when correcting a long text, besides correcting each character, the correctness of the entire long text must also be considered; in a long text, each character is related not only to the characters before it but also to the characters after it. Therefore, if each character position is predicted only from the characters before it, the basis for correcting the long text is clearly insufficient. For example, when the long text to be corrected is "I love background Tiananmen" (where "background" is a near-homophone of "Beijing" in the original Chinese), predicting the characters after "I love" from "I love" alone yields far too many candidate characters, so the error "background" cannot be corrected. If, however, the following word "Tiananmen" is also taken into account, it is easy to predict that the characters between "I love" and "Tiananmen" should be "Beijing" rather than "background", so that "I love background Tiananmen" is effectively corrected to "I love Beijing Tiananmen".
In embodiments of the present invention, to enable error correction of long texts, the prediction model may be built on a bidirectional RNN (Bi-directional Recurrent Neural Network), which is composed of two RNNs joined together. When predicting the character at any character position of a long text, it can use both the known characters before that position and the known characters after it, thereby enabling effective correction of the long text.
In practical applications, the network structure of the prediction model may be divided into an input layer, an intermediate layer, and an output layer. The input layer uses a bidirectional RNN, so the output of the input layer is jointly determined by the two joined RNNs; the output of the input layer feeds the input of the intermediate layer. The network structure of the intermediate layer may be determined according to the application scenario, for example a fully connected layer; the present invention does not limit the network structure of the intermediate layer. The output of the intermediate layer feeds the input of the output layer. The output layer may be a Softmax layer; using a Softmax layer makes the candidate probabilities predicted for each character position of the long text sum to 1.
It should be understood that the input layer, intermediate layer, and output layer above are functional levels and do not imply that the network structure of the prediction model has only three layers; each of the input layer, intermediate layer, and output layer may itself be multilayer. For example, when the output layer is a Softmax layer, if the vector dimension output by the planned intermediate layer does not match the vector dimension expected by the Softmax layer, a fully connected layer may be inserted between the planned intermediate layer and the Softmax layer; this fully connected layer together with the planned intermediate layer then constitutes the new intermediate layer.
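The input/intermediate/output layering described above can be sketched numerically. The following is a minimal NumPy sketch with toy dimensions and randomly initialized weights; every size, name, and weight here is an illustrative assumption, not the patent's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 6, 4  # toy vocabulary and hidden sizes (illustrative)

# Input layer: two RNNs read the context left-to-right and right-to-left;
# their final states are concatenated, as in a bidirectional RNN.
Wf, Wb = rng.normal(size=(vocab, hidden)), rng.normal(size=(vocab, hidden))
Uf, Ub = rng.normal(size=(hidden, hidden)), rng.normal(size=(hidden, hidden))
# Intermediate layer: a fully connected layer mapping 2*hidden -> vocab.
Wmid = rng.normal(size=(2 * hidden, vocab))

def rnn(ids, Wx, Wh):
    h = np.zeros(hidden)
    for i in ids:
        h = np.tanh(np.eye(vocab)[i] @ Wx + h @ Wh)  # one-hot input step
    return h

def predict(left_ids, right_ids):
    """Candidate-character distribution for one masked character position."""
    h = np.concatenate([rnn(left_ids, Wf, Uf), rnn(right_ids[::-1], Wb, Ub)])
    logits = h @ Wmid                   # intermediate (fully connected) layer
    p = np.exp(logits - logits.max())   # output layer: softmax
    return p / p.sum()

probs = predict([1, 2], [4, 5])  # characters before and after the position
print(probs.sum())  # ≈ 1.0, as guaranteed by the Softmax output layer
```

Note how the backward RNN consumes the right context reversed, so both RNN states summarize the context adjacent to the masked position.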
In addition, in the training stage of the prediction model, the preset sample long sentences may be natural sentences from the relevant vertical domain or from a general domain. For example, given the preset sample long sentence "I love Beijing Tiananmen", the corresponding derived sample long sentences may include "I hate Beijing Tiananmen", "I love background Tiananmen", "nest love Beijing Tiananmen", and so on, each replacing one character of the original sentence. Each character of a preset sample long sentence has a preset probability value. In a derived sample long sentence, a character identical to the corresponding character of the preset sample long sentence may keep that character's probability, while a character that differs from the preset sample long sentence may be assigned a probability different from that of the corresponding character. For example, every character of the preset sample long sentence may have probability 1; in the derived sample long sentences, characters identical to the preset sample long sentence may have probability 1, while the differing characters ("hate", "background", "nest", and the like) may all have probability 0. Correspondingly, in the serving stage of the prediction model, the predicted candidate probabilities of the candidate characters all take values between 0 and 1.
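The derivation-and-labeling scheme above (replaced characters get a low preset probability, unchanged characters keep probability 1) can be sketched as follows; the toy sentence and vocabulary are assumptions for illustration.

```python
def derive_samples(sentence, vocab):
    """For each character position, produce derived sample sentences by
    substituting every other character from a (toy) vocabulary.  Unchanged
    characters keep preset probability 1.0; the substituted character gets
    probability 0.0, matching the labeling scheme described above."""
    samples = []
    for i, orig in enumerate(sentence):
        for ch in vocab:
            if ch == orig:
                continue
            derived = sentence[:i] + ch + sentence[i + 1:]
            labels = [1.0] * len(sentence)
            labels[i] = 0.0  # the replaced character's preset probability
            samples.append((derived, labels))
    return samples

samples = derive_samples("abc", "abcd")
print(len(samples))  # → 9 (3 positions x 3 substitutes each)
```

In practice the substitutes would be drawn from confusable characters (homophones, visually similar glyphs) rather than the whole vocabulary, but the labeling logic is the same.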
Fig. 2(a) is a schematic diagram of using the prediction model to predict the candidate probabilities of at least one candidate character for the first character position of a long text; Fig. 2(b) shows the same for the second character position; Fig. 2(c) for the third character position; and Fig. 2(d) for the fourth character position. As can be seen, the prediction model behaves like a word-filling (cloze) model: each time it masks the character at one character position of the long text and predicts the character to fill in at that position, until the fill-in character of every character position of the long text has been predicted. Fig. 2 also shows that, for each character position, the candidate probabilities of its candidate characters sum to 1.
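The position-by-position masking loop of Fig. 2 can be sketched independently of any particular model; `predict_fn` below stands in for the prediction model and the fixed distribution it returns is purely illustrative.

```python
def cloze_predictions(text, predict_fn):
    """Mask each character position in turn and collect the model's
    candidate-character distribution for that position.  predict_fn takes
    the left and right context around the masked position, as in Fig. 2."""
    results = []
    for i in range(len(text)):
        left, right = text[:i], text[i + 1:]
        results.append(predict_fn(left, right))
    return results

# A stand-in predictor returning a fixed two-candidate distribution.
dists = cloze_predictions("abcd", lambda left, right: {"d": 0.7, "b": 0.3})
print(len(dists))  # → 4, one distribution per character position
```

Each returned distribution sums to 1, mirroring the Softmax property noted for Fig. 2.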
In addition, for each long text among the at least one candidate long text and the long text to be corrected, there are multiple specific implementations for calculating the assessment score of the long text from the predicted candidate probabilities. Illustratively, in one implementation, calculating the assessment score of the long text based on the predicted candidate probabilities may include:

for each character position of the long text, taking, among the predicted candidate characters of that character position, the candidate character with the highest candidate probability as the error correction character of that character position;

calculating the assessment score of the long text based on the candidate probabilities of the error correction characters at the character positions.
It should be understood that, when predicting the candidate characters at a given character position of a long text, the predicted candidate characters may include the original character at that position. As shown in Fig. 2(d), when the prediction model predicts the candidate characters at the fourth character position of the long text "abcd", the predicted candidate characters include "c", "d", and "b". Here "d" is the original character, and since its candidate probability is 70%, it is also the candidate character with the highest candidate probability; "d" is therefore the error correction character as well.
In addition, there are multiple specific implementations for calculating the assessment score of the long text based on the probabilities of the error correction characters at the character positions. Illustratively, in one implementation, this calculation may include:

for each character position, determining the candidate probability of the original character at that character position, and computing the logarithmic loss between the candidate probability of the error correction character and the candidate probability of the original character at that position, where the original character is the character located at that character position in the long text;

summing the computed logarithmic losses and dividing the sum by the number of character positions of the long text, to obtain the assessment score of the long text.
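One plausible reading of the "logarithmic loss" above is the per-position difference of log probabilities between the error correction character and the original character, averaged over the character positions; the patent does not pin down the exact formula, so the following Python sketch is an assumption.

```python
from math import log

def assessment_score(positions):
    """positions: list of (p_correction, p_original) candidate-probability
    pairs, one per character position of the long text.  Uses the assumed
    log-difference loss, averaged over the number of character positions."""
    losses = [log(p_corr) - log(p_orig) for p_corr, p_orig in positions]
    return sum(losses) / len(positions)

# Where the error correction character equals the original, the loss is 0;
# a large gap (0.9 vs 0.3) raises the score of the corrected reading.
score = assessment_score([(0.7, 0.7), (0.9, 0.3)])  # ≈ 0.55
```

Under this reading, a higher score indicates that the model's preferred characters beat the original characters by a wider margin.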
There are multiple specific implementations for determining, for each character position, the candidate probability of the original character at that position. Illustratively, in one implementation, the candidate characters predicted for each character position may include the original character, in which case the candidate probability of the original character can be determined directly.

It should be noted that, in the embodiments of the present invention, since the candidate long texts satisfy the first and second screening conditions, it is rare that the predicted candidate characters for a position in a candidate long text do not include the original character. For the long text to be corrected, however, if the character at some position is relatively uncommon, this case may occur. For example, if the long text to be corrected is "my Beijing Tiananmen", the candidate characters predicted for the second character position may include characters such as "love" and "hate" but not the original character at that position. In response, the candidate probability of that original character may be set to a low value, such as 0.
S104: determine the error correction result of the long text to be corrected based on the assessment score of the at least one candidate long text and the assessment score of the long text to be corrected.

There are multiple specific implementations for this determination. Illustratively, in one implementation, determining the error correction result of the long text to be corrected based on these assessment scores may include:

determining, among the at least one candidate long text, the candidate long text with the highest assessment score;

judging whether the difference between the assessment score of that highest-scoring candidate long text and the assessment score of the long text to be corrected exceeds a preset threshold, and if so, taking the highest-scoring candidate long text as the error correction result of the long text to be corrected.
Here, the preset threshold may be any value greater than or equal to 0. It can be understood that when the judged difference between the assessment score of the highest-scoring candidate long text and the assessment score of the long text to be corrected does not exceed the preset threshold, the long text to be corrected does not need correction.
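The S104 decision rule can be condensed into a few lines; the texts, scores, and threshold below are illustrative assumptions.

```python
def choose_correction(original_text, original_score, candidates, threshold=0.0):
    """candidates: list of (text, assessment_score).  Return the best
    candidate only when it beats the original by more than the preset
    threshold; otherwise keep the original text unchanged."""
    best_text, best_score = max(candidates, key=lambda ts: ts[1])
    if best_score - original_score > threshold:
        return best_text
    return original_text

cands = [("I love Beijing Tiananmen", 0.9), ("I love Beijing Tianans", 0.4)]
result = choose_correction("I love background Tiananmen", 0.2, cands, 0.1)
print(result)  # → 'I love Beijing Tiananmen'
```

The threshold acts as a conservatism knob: a larger value demands stronger evidence before the original text is overwritten.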
In the text error correction method provided by embodiments of the present invention, at least one candidate long text is determined for a long text to be corrected; for each long text among the at least one candidate long text and the long text to be corrected, at least one candidate character for each character position and the candidate probability of each such candidate character are successively predicted, and the assessment score of the long text is calculated from the predicted candidate probabilities; the error correction result of the long text to be corrected is then determined from the assessment scores of the candidate long texts and the assessment score of the long text to be corrected. Since each candidate character of a long text being assessed is predicted from the characters at the character positions other than its own, the calculated assessment score of the long text takes into account both the semantic fluency of the long text and the coherence of its context. The text error correction method provided by embodiments of the present invention therefore achieves a higher accuracy of error correction results for long texts.
Corresponding to the above text error correction method, the present invention further provides a text error correction device. As shown in Fig. 3, the text error correction device may include:

a first determining module 301, configured to divide a long text to be corrected into multiple segments and determine at least one candidate error correction segment corresponding to each segment;

a second determining module 302, configured to determine, based on the determined candidate error correction segments, at least one candidate long text corresponding to the long text to be corrected;

a prediction module 303, configured to, for each long text among the at least one candidate long text and the long text to be corrected, successively predict at least one candidate character for each character position of the long text and the candidate probability of each such candidate character, where the candidate characters of each character position are predicted from the characters at the other character positions of the long text;

a computing module 304, configured to, for each long text among the at least one candidate long text and the long text to be corrected, calculate the assessment score of the long text based on the predicted candidate probabilities;

a third determining module 305, configured to determine the error correction result of the long text to be corrected based on the assessment scores of the at least one candidate long text and the assessment score of the long text to be corrected.
Optionally, in one implementation, the prediction module 303 may be specifically configured to:

for each long text among the at least one candidate long text and the long text to be corrected, successively predict, using a preset prediction model, at least one candidate character for each character position of the long text and the candidate probability of each such candidate character;

where the prediction model is trained on the characters of sample long sentences and the preset probability of each character in those sample long sentences; the sample long sentences include preset sample long sentences and derived sample long sentences, a derived sample long sentence being obtained by replacing the character at a target character position of a preset sample long sentence, with the probability of the character at the target character position of the preset sample long sentence differing from that of the derived sample long sentence.
Optionally, in one implementation, the input layer of the network structure of the prediction model may use a bidirectional recurrent neural network.
Optionally, in one implementation, the computing module 304 may include an error correction submodule and a computing submodule, where the error correction submodule is configured to, for each character position of the long text, take the candidate character with the highest candidate probability among the predicted candidate characters of that character position as the error correction character of that position; and the computing submodule is configured to calculate the assessment score of the long text based on the probabilities of the error correction character and the original character at each character position.
Optionally, in one implementation, the computing submodule may be specifically configured to: for each character position, determine the candidate probability of the original character at that position, and compute the logarithmic loss between the probability of the error correction character and the probability of the original character at that position, the original character being the character at that position in the long text; and sum the computed logarithmic losses and divide the sum by the number of character positions of the long text, to obtain the assessment score of the long text.
Optionally, in one implementation, the third determining module 305 may include a first determining submodule and a judging submodule, where the first determining submodule is configured to determine, among the at least one candidate long text, the candidate long text with the highest assessment score; and the judging submodule is configured to judge whether the difference between the assessment score of that highest-scoring candidate long text and the assessment score of the long text to be corrected exceeds a preset threshold, and if so, to take that candidate long text as the error correction result of the long text to be corrected.
Optionally, in one implementation, the first determining module 301 may include a segmentation submodule, a model application submodule, and a first screening submodule, where the segmentation submodule is configured to divide the long text to be corrected into multiple segments; the model application submodule is configured to, for each segment, obtain, using a preset language model, multiple initial error correction segments corresponding to the segment and the assessment score of each initial error correction segment, the language model being configured to correct an input segment and assign an assessment score to each segment obtained by correction; and the first screening submodule is configured to take the at least one initial error correction segment whose assessment score satisfies a first preset screening condition as the at least one candidate error correction segment corresponding to the segment.
Optionally, in one implementation, the second determining module 302 may include a generation submodule, a second determining submodule, and a second screening submodule, where the generation submodule is configured to generate, based on the determined candidate error correction segments, multiple initial long texts corresponding to the long text to be corrected; the second determining submodule is configured to, for each initial long text, determine the assessment score of the initial long text from the assessment scores of the candidate error correction segments it contains; and the second screening submodule is configured to take the initial long texts whose assessment scores satisfy a second preset screening condition as the at least one candidate long text corresponding to the long text to be corrected.
Optionally, in one implementation, the second determining submodule may be specifically configured to: for each initial long text, take the product of the assessment scores of the candidate error correction segments in the initial long text as the assessment score of the initial long text, or take the average of the assessment scores of the candidate error correction segments of the initial long text as its assessment score.
The text error correction device provided by embodiments of the present invention determines at least one candidate long text for a long text to be corrected; for each long text among the at least one candidate long text and the long text to be corrected, it successively predicts at least one candidate character for each character position and the candidate probability of each such candidate character, and calculates the assessment score of the long text from the predicted candidate probabilities; it then determines the error correction result of the long text to be corrected from the assessment scores of the candidate long texts and the assessment score of the long text to be corrected. Since each candidate character of a long text being assessed is predicted from the characters at the character positions other than its own, the calculated assessment score takes into account both the semantic fluency of the long text and the coherence of its context. The text error correction device provided by embodiments of the present invention therefore achieves a higher accuracy of error correction results for long texts.
Embodiments of the present invention further provide an electronic device. As shown in Fig. 4, the electronic device includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another via the communication bus 404. The memory 403 is configured to store a computer program, and the processor 401 is configured to implement the following steps when executing the program stored in the memory 403:

dividing a long text to be corrected into multiple segments, and determining at least one candidate error correction segment corresponding to each segment;

determining, based on the determined candidate error correction segments, at least one candidate long text corresponding to the long text to be corrected;

for each long text among the at least one candidate long text and the long text to be corrected, successively predicting, in the long text, at least one candidate character for each character position and the candidate probability of each such candidate character, and calculating the assessment score of the long text based on the predicted candidate probabilities, where the candidate characters of each character position are predicted from the characters at the other character positions of the long text;

determining the error correction result of the long text to be corrected based on the assessment score of the at least one candidate long text and the assessment score of the long text to be corrected.
Optionally, successively predicting, for each long text among the at least one candidate long text and the long text to be corrected, at least one candidate character for each character position of the long text and the candidate probability of each such candidate character includes: successively predicting them using a preset prediction model, where the prediction model is trained on the characters of sample long sentences and the preset probability of each character in those sample long sentences; the sample long sentences include preset sample long sentences and derived sample long sentences, a derived sample long sentence being obtained by replacing the character at a target character position of a preset sample long sentence, with the probability of the character at the target character position of the preset sample long sentence differing from that of the derived sample long sentence.
Optionally, the input layer of the network structure of the prediction model uses a bidirectional recurrent neural network.
Optionally, calculating the assessment score of the long text based on the predicted candidate probabilities includes: for each character position of the long text, taking the candidate character with the highest candidate probability among the predicted candidate characters of that position as the error correction character of that position; and calculating the assessment score of the long text based on the candidate probabilities of the error correction characters at the character positions.
Optionally, calculating the assessment score of the long text based on the candidate probabilities of the error correction characters at the character positions includes: for each character position, determining the candidate probability of the original character at that position, and computing the logarithmic loss between the candidate probability of the error correction character and the candidate probability of the original character, the original character being the character located at that position in the long text; and summing the computed logarithmic losses and dividing the sum by the number of character positions of the long text, to obtain the assessment score of the long text.
Optionally, determining the error correction result of the long text to be corrected based on the assessment score of the at least one candidate long text and the assessment score of the long text to be corrected includes: determining, among the at least one candidate long text, the candidate long text with the highest assessment score; and judging whether the difference between the assessment score of that candidate long text and the assessment score of the long text to be corrected exceeds a preset threshold, and if so, taking that candidate long text as the error correction result of the long text to be corrected.
Optionally, determining at least one candidate error correction segment corresponding to each segment includes: for each segment, obtaining, using a preset language model, multiple initial error correction segments corresponding to the segment and the assessment score of each initial error correction segment, the language model being configured to correct an input segment and assign an assessment score to each segment obtained by correction; and taking the at least one initial error correction segment whose assessment score satisfies a first preset screening condition as the at least one candidate error correction segment corresponding to the segment.
Optionally, determining the at least one candidate long text corresponding to the long text to be corrected based on the determined candidate error correction segments comprises:
generating, based on the determined candidate error correction segments, multiple initial long texts corresponding to the long text to be corrected;
for each initial long text, determining the assessment score of the initial long text according to the assessment scores of the candidate error correction segments in that initial long text;
taking the initial long texts whose assessment scores meet a second preset screening condition as the at least one candidate long text corresponding to the long text to be corrected.
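Combining per-segment candidates into initial long texts can be sketched as a Cartesian product over segments (hypothetical names; scoring by product of segment scores and a top-k second screening condition are assumptions consistent with the options the text describes):

```python
from itertools import product

def candidate_long_texts(segment_candidates, k=5):
    """Build candidate long texts from per-segment candidate segments.

    segment_candidates: one list of (segment_text, score) per segment,
    in order. Each initial long text picks one candidate per segment;
    its score here is the product of the chosen segments' scores.
    """
    initial = []
    for combo in product(*segment_candidates):
        text = "".join(seg for seg, _ in combo)
        score = 1.0
        for _, s in combo:
            score *= s
        initial.append((text, score))
    # Second screening condition, assumed here to be top-k by score.
    initial.sort(key=lambda t: t[1], reverse=True)
    return initial[:k]
```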
Optionally, determining the assessment score of each initial long text according to the assessment scores of the candidate error correction segments in that initial long text comprises:
taking the product of the assessment scores of the candidate error correction segments in the initial long text as the assessment score of the initial long text, or
taking the average of the assessment scores of the candidate error correction segments in the initial long text as the assessment score of the initial long text.
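The two aggregation options can be written directly (a minimal sketch; the inputs are the per-segment assessment scores of one initial long text):

```python
def score_by_product(segment_scores):
    """Assessment score as the product of the segment scores."""
    result = 1.0
    for s in segment_scores:
        result *= s
    return result

def score_by_average(segment_scores):
    """Assessment score as the average of the segment scores."""
    return sum(segment_scores) / len(segment_scores)
```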
The electronic device provided by the embodiments of the present invention determines at least one candidate long text for the long text to be corrected; for each of the at least one candidate long text and the long text to be corrected, it predicts, position by position, at least one candidate character for each character position and the candidate probability of each such candidate character, and calculates the assessment score of the long text based on the predicted candidate probabilities; based on the assessment scores of the candidate long texts and the assessment score of the long text to be corrected, it determines the error correction result of the long text to be corrected. Because each candidate character in a long text being assessed is predicted from the characters at the other character positions, the calculated assessment score reflects both the semantic fluency of the long text and the coherence of its context. The electronic device provided by the embodiments of the present invention therefore achieves a higher accuracy rate in correcting long texts.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in the figure, but this does not mean there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the text error correction method of any of the above embodiments.
In another embodiment provided by the present invention, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to execute the text error correction method of any of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in an interrelated manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and electronic device embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (19)
1. A text error correction method, characterized by comprising:
dividing a long text to be corrected into multiple segments, and determining at least one candidate error correction segment corresponding to each segment;
determining, based on the determined candidate error correction segments, at least one candidate long text corresponding to the long text to be corrected;
for each of the at least one candidate long text and the long text to be corrected, predicting, position by position in the long text, at least one candidate character for each character position and the candidate probability of each such candidate character, and calculating an assessment score of the long text based on the predicted candidate probabilities, wherein the at least one candidate character of each character position is predicted from the characters at the other character positions in the long text;
determining the error correction result of the long text to be corrected based on the assessment score of the at least one candidate long text and the assessment score of the long text to be corrected.
2. The method according to claim 1, characterized in that, for each of the at least one candidate long text and the long text to be corrected, predicting in the long text the at least one candidate character of each character position and the candidate probability of each such candidate character comprises:
for each of the at least one candidate long text and the long text to be corrected, predicting, position by position with a preset prediction model, at least one candidate character for each character position in the long text and the candidate probability of each such candidate character;
wherein the prediction model is trained on the characters of sample long sentences and a preset probability for each character in the sample long sentences; the sample long sentences include preset sample long sentences and derived sample long sentences, a derived sample long sentence being obtained by replacing the character at a target character position of a preset sample long sentence, and the probability of the character at the target character position of the preset sample long sentence differing from the probability of the character at the target character position of the derived sample long sentence.
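Building derived sample long sentences by target-position replacement can be sketched as follows (a hypothetical helper; the choice of target positions, replacement characters, and preset probabilities is not specified by the claim):

```python
import random

def derive_samples(sentence, vocab, n=2, seed=0):
    """Derive sample long sentences from a preset sample long sentence.

    Each derived sample replaces the character at one target character
    position with a different character from `vocab`; during training,
    that position would then carry a different preset probability than
    in the original sentence.
    """
    rng = random.Random(seed)
    derived = []
    for _ in range(n):
        pos = rng.randrange(len(sentence))
        replacement = rng.choice([c for c in vocab if c != sentence[pos]])
        derived.append(sentence[:pos] + replacement + sentence[pos + 1:])
    return derived
```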
3. The method according to claim 2, characterized in that the input layer of the network structure of the prediction model uses a bidirectional recurrent neural network.
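A bidirectional recurrent network is beyond a short sketch, but the interface it serves, predicting each character position from the surrounding characters only and never from the position itself, can be illustrated with a toy neighbor-context model (purely illustrative; this is not the patented model):

```python
from collections import Counter, defaultdict

class ContextPredictor:
    """Toy stand-in for the bidirectional prediction model.

    Predicts the character at each position from the characters around
    it (here just the left and right neighbors). A real implementation
    would use a bidirectional recurrent neural network over the whole
    sentence.
    """

    def __init__(self):
        self.table = defaultdict(Counter)

    def fit(self, sentences):
        for s in sentences:
            for i, ch in enumerate(s):
                left = s[i - 1] if i > 0 else "^"
                right = s[i + 1] if i + 1 < len(s) else "$"
                self.table[(left, right)][ch] += 1

    def predict(self, sentence, i):
        """Candidate characters and probabilities for position i."""
        left = sentence[i - 1] if i > 0 else "^"
        right = sentence[i + 1] if i + 1 < len(sentence) else "$"
        counts = self.table[(left, right)]
        total = sum(counts.values()) or 1
        return {ch: c / total for ch, c in counts.items()}
```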
4. The method according to claim 1, characterized in that calculating the assessment score of the long text based on the predicted candidate probabilities comprises:
for each character position of the long text, taking, among the predicted at least one candidate character of that character position, the candidate character with the highest candidate probability as the correction character of that character position;
calculating the assessment score of the long text based on the candidate probabilities of the correction characters at the character positions.
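The per-position selection can be sketched as an argmax over each position's predicted candidates (hypothetical names):

```python
def correction_characters(position_candidates):
    """Pick each character position's correction character.

    position_candidates: one dict {candidate_character: probability}
    per character position; the highest-probability candidate becomes
    the correction character for that position.
    """
    return [max(cands, key=cands.get) for cands in position_candidates]
```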
5. The method according to claim 4, characterized in that calculating the assessment score of the long text based on the candidate probabilities of the correction characters at the character positions comprises:
for each character position, determining the candidate probability of the original character at that character position, and calculating the logarithmic loss of the candidate probability of the correction character at that character position and the candidate probability of the original character, the original character being the character located at that character position in the long text;
summing the calculated logarithmic losses, and dividing the sum by the number of characters of the long text to obtain the assessment score of the long text.
6. The method according to any one of claims 1-5, characterized in that determining the error correction result of the long text to be corrected based on the assessment score of the at least one candidate long text and the assessment score of the long text to be corrected comprises:
determining, among the at least one candidate long text, the candidate long text with the highest assessment score;
judging whether the difference between the assessment score of that candidate long text and the assessment score of the long text to be corrected is greater than a preset threshold, and if so, taking that candidate long text as the error correction result of the long text to be corrected.
7. The method according to claim 1, characterized in that determining the at least one candidate error correction segment corresponding to each segment comprises:
for each segment, obtaining, with a preset language model, multiple initial error correction segments corresponding to the segment and an assessment score for each initial error correction segment, wherein the language model is used to correct an input segment and to assign an assessment score to each segment obtained by the correction;
taking the at least one initial error correction segment whose assessment score meets a first preset screening condition as the at least one candidate error correction segment corresponding to the segment.
8. The method according to claim 7, characterized in that determining the at least one candidate long text corresponding to the long text to be corrected based on the determined candidate error correction segments comprises:
generating, based on the determined candidate error correction segments, multiple initial long texts corresponding to the long text to be corrected;
for each initial long text, determining the assessment score of the initial long text according to the assessment scores of the candidate error correction segments in that initial long text;
taking the initial long texts whose assessment scores meet a second preset screening condition as the at least one candidate long text corresponding to the long text to be corrected.
9. The method according to claim 8, characterized in that, for each initial long text, determining the assessment score of the initial long text according to the assessment scores of the candidate error correction segments in that initial long text comprises:
taking the product of the assessment scores of the candidate error correction segments in the initial long text as the assessment score of the initial long text, or
taking the average of the assessment scores of the candidate error correction segments in the initial long text as the assessment score of the initial long text.
10. A text error correction device, characterized by comprising:
a first determining module, configured to divide a long text to be corrected into multiple segments and determine at least one candidate error correction segment corresponding to each segment;
a second determining module, configured to determine, based on the determined candidate error correction segments, at least one candidate long text corresponding to the long text to be corrected;
a prediction module, configured to, for each of the at least one candidate long text and the long text to be corrected, predict, position by position in the long text, at least one candidate character for each character position and the candidate probability of each such candidate character, wherein the at least one candidate character of each character position is predicted from the characters at the other character positions in the long text;
a computing module, configured to, for each of the at least one candidate long text and the long text to be corrected, calculate an assessment score of the long text based on the predicted candidate probabilities;
a third determining module, configured to determine the error correction result of the long text to be corrected based on the assessment score of the at least one candidate long text and the assessment score of the long text to be corrected.
11. The device according to claim 10, characterized in that the prediction module is specifically configured to:
for each of the at least one candidate long text and the long text to be corrected, predict, position by position with a preset prediction model, at least one candidate character for each character position in the long text and the candidate probability of each such candidate character;
wherein the prediction model is trained on the characters of sample long sentences and a preset probability for each character in the sample long sentences; the sample long sentences include preset sample long sentences and derived sample long sentences, a derived sample long sentence being obtained by replacing the character at a target character position of a preset sample long sentence, and the probability of the character at the target character position of the preset sample long sentence differing from the probability of the character at the target character position of the derived sample long sentence.
12. The device according to claim 11, characterized in that the input layer of the network structure of the prediction model uses a bidirectional recurrent neural network.
13. The device according to claim 10, characterized in that the computing module includes an error correction submodule and a computing submodule;
the error correction submodule is configured to, for each character position of the long text, take, among the predicted at least one candidate character of that character position, the candidate character with the highest candidate probability as the correction character of that character position;
the computing submodule is configured to calculate the assessment score of the long text based on the probabilities of the correction characters at the character positions.
14. The device according to claim 13, characterized in that the computing submodule is specifically configured to:
for each character position, determine the candidate probability of the original character at that character position, and calculate the logarithmic loss of the probability of the correction character at that character position and the probability of the original character, the original character being the character at that character position in the long text;
sum the calculated logarithmic losses, and divide the sum by the number of characters of the long text to obtain the assessment score of the long text.
15. The device according to any one of claims 10-14, characterized in that the third determining module includes a first determining submodule and a judging submodule;
the first determining submodule is configured to determine, among the at least one candidate long text, the candidate long text with the highest assessment score;
the judging submodule is configured to judge whether the difference between the assessment score of that candidate long text and the assessment score of the long text to be corrected is greater than a preset threshold, and if so, to take that candidate long text as the error correction result of the long text to be corrected.
16. The device according to claim 10, characterized in that the first determining module includes a segmentation submodule, a model application submodule, and a first screening submodule;
the segmentation submodule is configured to divide the long text to be corrected into multiple segments;
the model application submodule is configured to, for each segment, obtain, with a preset language model, multiple initial error correction segments corresponding to the segment and an assessment score for each initial error correction segment, wherein the language model is used to correct an input segment and to assign an assessment score to each segment obtained by the correction;
the first screening submodule is configured to take the at least one initial error correction segment whose assessment score meets a first preset screening condition as the at least one candidate error correction segment corresponding to the segment.
17. The device according to claim 16, characterized in that the second determining module includes a generation submodule, a second determining submodule, and a second screening submodule;
the generation submodule is configured to generate, based on the determined candidate error correction segments, multiple initial long texts corresponding to the long text to be corrected;
the second determining submodule is configured to, for each initial long text, determine the assessment score of the initial long text according to the assessment scores of the candidate error correction segments in that initial long text;
the second screening submodule is configured to take the initial long texts whose assessment scores meet a second preset screening condition as the at least one candidate long text corresponding to the long text to be corrected.
18. The device according to claim 17, characterized in that the second determining submodule is specifically configured to:
for each initial long text, take the product of the assessment scores of the candidate error correction segments in the initial long text as the assessment score of the initial long text, or take the average of the assessment scores of the candidate error correction segments in the initial long text as the assessment score of the initial long text.
19. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to, when executing the program stored in the memory, implement the method steps of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910261329.6A CN109977415A (en) | 2019-04-02 | 2019-04-02 | A kind of text error correction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910261329.6A CN109977415A (en) | 2019-04-02 | 2019-04-02 | A kind of text error correction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977415A true CN109977415A (en) | 2019-07-05 |
Family
ID=67082453
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910261329.6A Pending CN109977415A (en) | 2019-04-02 | 2019-04-02 | A kind of text error correction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977415A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111368918A (en) * | 2020-03-04 | 2020-07-03 | 拉扎斯网络科技(上海)有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN111859907A (en) * | 2020-06-11 | 2020-10-30 | 北京百度网讯科技有限公司 | Character error correction method and device, electronic equipment and storage medium |
CN112307208A (en) * | 2020-11-05 | 2021-02-02 | Oppo广东移动通信有限公司 | Long text classification method, terminal and computer storage medium |
CN112949289A (en) * | 2019-12-11 | 2021-06-11 | 北大方正集团有限公司 | Method, device and system for detecting word stacking errors |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473003A (en) * | 2013-09-12 | 2013-12-25 | 天津三星通信技术研究有限公司 | Character input error correction method and device |
CN104951779A (en) * | 2014-03-24 | 2015-09-30 | 中国银联股份有限公司 | Method and system for identifying sales slip characters |
CN106959977A (en) * | 2016-01-12 | 2017-07-18 | 广州市动景计算机科技有限公司 | Candidate collection computational methods and device, word error correction method and device in word input |
CN107704102A (en) * | 2017-10-09 | 2018-02-16 | 北京新美互通科技有限公司 | A kind of text entry method and device |
US20180322370A1 (en) * | 2017-05-05 | 2018-11-08 | Baidu Online Network Technology (Beijing) Co., Ltd . | Method and apparatus of discovering bad case based on artificial intelligence, device and storage medium |
CN109543022A (en) * | 2018-12-17 | 2019-03-29 | 北京百度网讯科技有限公司 | Text error correction method and device |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473003A (en) * | 2013-09-12 | 2013-12-25 | 天津三星通信技术研究有限公司 | Character input error correction method and device |
CN104951779A (en) * | 2014-03-24 | 2015-09-30 | 中国银联股份有限公司 | Method and system for identifying sales slip characters |
CN106959977A (en) * | 2016-01-12 | 2017-07-18 | 广州市动景计算机科技有限公司 | Candidate collection computational methods and device, word error correction method and device in word input |
US20180322370A1 (en) * | 2017-05-05 | 2018-11-08 | Baidu Online Network Technology (Beijing) Co., Ltd . | Method and apparatus of discovering bad case based on artificial intelligence, device and storage medium |
CN107704102A (en) * | 2017-10-09 | 2018-02-16 | 北京新美互通科技有限公司 | A kind of text entry method and device |
CN109543022A (en) * | 2018-12-17 | 2019-03-29 | 北京百度网讯科技有限公司 | Text error correction method and device |
Non-Patent Citations (1)
Title |
---|
HU Yi et al., "An Online Chinese Query Error Correction Method for Search Engines", Journal of Chinese Information Processing (《中文信息学报》) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949289A (en) * | 2019-12-11 | 2021-06-11 | 北大方正集团有限公司 | Method, device and system for detecting word stacking errors |
CN111368918A (en) * | 2020-03-04 | 2020-07-03 | 拉扎斯网络科技(上海)有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN111368918B (en) * | 2020-03-04 | 2024-01-05 | 拉扎斯网络科技(上海)有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN111859907A (en) * | 2020-06-11 | 2020-10-30 | 北京百度网讯科技有限公司 | Character error correction method and device, electronic equipment and storage medium |
CN111859907B (en) * | 2020-06-11 | 2023-06-23 | 北京百度网讯科技有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN112307208A (en) * | 2020-11-05 | 2021-02-02 | Oppo广东移动通信有限公司 | Long text classification method, terminal and computer storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11308405B2 (en) | Human-computer dialogue method and apparatus | |
CN109977415A (en) | A kind of text error correction method and device | |
TWI777010B (en) | Prediction of information conversion rate, information recommendation method and device | |
CN109299344B (en) | Generation method of ranking model, and ranking method, device and equipment of search results | |
WO2020253466A1 (en) | Method and device for generating test case of user interface | |
CN107947951A (en) | Groups of users recommends method, apparatus and storage medium and server | |
US20240020514A1 (en) | Improper neural network input detection and handling | |
US20220261591A1 (en) | Data processing method and apparatus | |
CN110019790A (en) | Text identification, text monitoring, data object identification, data processing method | |
CN110061930B (en) | Method and device for determining data flow limitation and flow limiting values | |
CN107220384A (en) | A kind of search word treatment method, device and computing device based on correlation | |
CN113139052B (en) | Rumor detection method and device based on graph neural network feature aggregation | |
CN115455171B (en) | Text video mutual inspection rope and model training method, device, equipment and medium | |
WO2014176056A2 (en) | Data classification | |
CN107357776B (en) | Related word mining method and device | |
CN103164436B (en) | A kind of image search method and device | |
CN113205189B (en) | Method for training prediction model, prediction method and device | |
CN111565065B (en) | Unmanned aerial vehicle base station deployment method and device and electronic equipment | |
US20230229896A1 (en) | Method and computing device for determining optimal parameter | |
CN107748801A (en) | News recommends method, apparatus, terminal device and computer-readable recording medium | |
CN108491451A (en) | A kind of English reads article and recommends method, apparatus, electronic equipment and storage medium | |
CN111291792B (en) | Flow data type integrated classification method and device based on double evolution | |
CN109582930B (en) | Sliding input decoding method and device and electronic equipment | |
CN105468603A (en) | Data selection method and apparatus | |
CN111897910A (en) | Information pushing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190705 |