CN108595410A - Method and device for automatically correcting handwritten compositions - Google Patents
- Publication number
- CN108595410A (application CN201810223663.8A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- word
- hand
- text
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
- Character Discrimination (AREA)
Abstract
The present invention proposes a method and device for automatically correcting handwritten compositions. The method includes: obtaining a handwritten composition image to be corrected; removing interference and background lines from the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm; performing line segmentation, word segmentation, and recognition on the processed handwritten composition image to obtain the corresponding text content; segmenting the text content into sentences, performing part-of-speech analysis on each sentence to obtain the part of speech of each word and shallow syntactic analysis to obtain the phrases in the sentence and their types, and then selecting the rule-based strategy or deep model for a particular error type to detect specific grammatical errors; for each detected grammatical error, combining the recognition result of the error recognition model with the correction suggestion model to generate a correction suggestion, thereby obtaining the correction result for the whole composition; and, based on the composition text content and the correction result, building a convolutional neural network to score the composition text intelligently.
Description
Technical field
The present invention relates to the technical field of homework correction, and in particular to a method and device for automatically correcting handwritten compositions.
Background
Current composition correction methods are mainly used to correct text content automatically: after the text content is obtained, operations such as sentence segmentation and syntactic analysis are performed on it to obtain a correction suggestion for each sentence, and thus the correction result for the text content. However, most compositions today are handwritten, and handwritten compositions suffer from problems such as touching words and slanted lines. If the above correction method is applied to handwritten compositions, both correction accuracy and correction efficiency are low.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a method for automatically correcting handwritten compositions, in order to solve the problems of low correction accuracy and low correction efficiency in the prior art.
A second object of the present invention is to propose a device for automatically correcting handwritten compositions.
A third object of the present invention is to propose another device for automatically correcting handwritten compositions.
A fourth object of the present invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the present invention is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a method for automatically correcting handwritten compositions, including:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the handwritten composition image, obtaining a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
Recognizing the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
Further, before performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks, the method further includes:
Performing text splitting on the processed handwritten composition image to disconnect the touching parts between adjacent lines in the handwritten composition image.
Further, before recognizing the multiple word image blocks using the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further includes:
Normalizing the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
Further, before recognizing the multiple word image blocks using the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further includes:
Obtaining word recognition samples, which include handwritten word image samples and the corresponding text content;
Normalizing the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples;
Training the word recognition model on the processed word recognition samples to obtain the preset word recognition model.
Further, training the word recognition model on the processed word recognition samples to obtain the preset word recognition model includes:
Dividing the word recognition samples by word length to obtain short-word recognition samples, medium-word recognition samples, and long-word recognition samples;
Training the word recognition model with the short-word recognition samples, the medium-word recognition samples, and the long-word recognition samples in turn to obtain the preset word recognition model.
Further, before performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image, the method further includes:
Inputting the text content into a preset error correction model to obtain error-corrected text content. The error correction model is composed of multiple language models; each language model is an N-gram language model, where N is a positive integer.
Further, before inputting the text content into the preset error correction model to obtain the error-corrected text content, the method further includes:
Obtaining error correction training samples, which include text content samples and the corresponding error-corrected samples;
Normalizing the error correction training samples according to the preset format corresponding to the error correction model to obtain processed error correction training samples;
Training the error correction model on the processed error correction training samples to obtain the preset error correction model.
Further, performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image includes:
Performing sentence segmentation on the text content to obtain the multiple sentences in the text content;
Performing syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence, the analysis result including the words and phrases that the sentence contains, the part of speech of each word, and the type of each phrase;
For each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence and using it to examine the sentence, obtaining the error information in the sentence;
Inputting the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
Generating the correction result corresponding to the handwritten composition image from the correction suggestions corresponding to the sentences.
Further, for each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence and using it to examine the sentence to obtain the error information in the sentence includes:
For each sentence, obtaining the context information corresponding to each word in the sentence;
Determining the matrix vector corresponding to each word from the context information corresponding to that word;
Selecting, according to the part of speech of each word, the error recognition model corresponding to that part of speech;
Inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to its part of speech to obtain the error information in the sentence.
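The routing described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `context_window`, `article_model`, and `detect_errors` are hypothetical, the part-of-speech tags are assumed to come from an upstream tagger, and a hand-written rule stands in for a trained error recognition model that would consume a numeric matrix vector rather than raw tokens.

```python
VOWELS = {"a", "e", "i", "o", "u"}


def context_window(tokens, index, size=2, pad="<pad>"):
    """Return the tokens within `size` positions on each side of tokens[index]."""
    padded = [pad] * size + list(tokens) + [pad] * size
    center = index + size
    return padded[center - size:center + size + 1]


def article_model(window):
    """Hypothetical stand-in for a trained model; expects a window built with size=2."""
    center, following = window[2], window[3]
    if center.lower() == "a" and following[:1].lower() in VOWELS:
        return "use 'an' before a vowel sound"
    return None


def detect_errors(tagged_sentence, models_by_pos):
    """Route each word's context to the model registered for its part of speech."""
    tokens = [word for word, _ in tagged_sentence]
    errors = []
    for i, (word, pos) in enumerate(tagged_sentence):
        model = models_by_pos.get(pos)
        if model is None:
            continue  # no error model registered for this part of speech
        message = model(context_window(tokens, i))
        if message:
            errors.append((i, word, message))
    return errors


# Assumed input: (word, part-of-speech) pairs from an upstream tagger.
tagged = [("She", "PRON"), ("ate", "VERB"), ("a", "DET"), ("apple", "NOUN")]
errors = detect_errors(tagged, {"DET": article_model})
```

Keeping one model per part of speech lets each model specialize in the errors typical of that word class, which is the design choice this sketch is meant to show.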
Further, before inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to its part of speech to obtain the error information in the sentence, the method further includes:
Obtaining the error recognition training samples corresponding to each part of speech, which include the matrix vectors of words having that part of speech and the error information corresponding to those words;
For the error recognition model corresponding to each part of speech, training the error recognition model with the error recognition training samples corresponding to that part of speech.
Further, performing sentence segmentation on the text content to obtain the multiple sentences in the text content includes:
Obtaining the type corresponding to the text content, the type indicating how accurately the text content is divided into sentences;
Obtaining the features to be extracted that correspond to the type;
Performing feature extraction on the text content according to the features to be extracted to obtain the segmentation feature information in the text content;
Performing sentence segmentation on the text content according to the segmentation feature information to obtain the multiple sentences in the text content.
Further, before selecting, for each sentence, the corresponding error recognition model according to the analysis result of the sentence and using it to examine the sentence to obtain the error information in the sentence, the method further includes:
For each sentence, comparing the sentence with a preset error pattern library to obtain the error information in the sentence, the error pattern library including regular expressions corresponding to multiple error patterns.
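A regular-expression pattern library of this kind might look like the following sketch. The two patterns and the name `match_error_patterns` are illustrative assumptions, not patterns taken from the disclosure; the vowel-letter rule in particular is a crude heuristic that would misfire on phrases such as "a university".

```python
import re

# Hypothetical error pattern library: (error type, compiled regular expression).
ERROR_PATTERNS = [
    ("repeated word", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)),
    ("'a' before a vowel sound", re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE)),
]


def match_error_patterns(sentence):
    """Return (error_type, matched_text) pairs for every pattern hit in the sentence."""
    hits = []
    for error_type, pattern in ERROR_PATTERNS:
        for match in pattern.finditer(sentence):
            hits.append((error_type, match.group(0)))
    return hits


hits = match_error_patterns("I saw a elephant in the the zoo.")
```

Rule matching of this sort is cheap, so running it before the model-based recognition step can catch common, well-defined mistakes without invoking a model.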
Further, generating the correction result corresponding to the handwritten composition image from the correction suggestions corresponding to the sentences includes:
For each sentence, when the sentence has multiple correction suggestions, inputting the multiple correction suggestions into a preset correction selection model to obtain a single correction suggestion for the sentence;
Generating the correction result corresponding to the handwritten composition image from the single correction suggestion corresponding to each sentence.
Further, before inputting the error information in the sentence into the preset correction suggestion model to obtain the correction suggestion corresponding to the sentence, the method further includes:
Obtaining correction training samples, which include sentence samples, the error information in the sentence samples, and the correction suggestions corresponding to the sentence samples;
Training the correction suggestion model on the correction training samples to obtain the preset correction suggestion model.
Further, after performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image, the method further includes:
Obtaining feature information from the text content, the feature information including any one or more of the following: lexical information, syntactic information, sentence information, and correction information;
When the feature information is lexical information, syntactic information, or sentence information, inputting the feature information into the corresponding scoring model to obtain the score corresponding to that feature information;
When the feature information includes lexical information, syntactic information, sentence information, and correction information, inputting the feature information into a preset comprehensive scoring model to obtain the comprehensive score corresponding to the handwritten composition image.
The method for automatically correcting handwritten compositions according to the embodiment of the present invention obtains a handwritten composition image to be corrected; removes interference and background lines from the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm; performs line segmentation, word segmentation, and recognition on the processed handwritten composition image to obtain the corresponding text content; segments the text content into sentences, performs part-of-speech analysis on each sentence to obtain the part of speech of each word and shallow syntactic analysis to obtain the phrases in the sentence and their types, and then selects the rule-based strategy or deep model for a particular error type to detect specific grammatical errors; for each detected grammatical error, combines the recognition result of the error recognition model with the correction suggestion model to generate a correction suggestion, thereby obtaining the correction result for the whole composition; and, based on the composition text content and the correction result, builds a convolutional neural network to score the composition text intelligently. Correction accuracy and correction efficiency are thereby improved.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a device for automatically correcting handwritten compositions, including:
An acquisition module, configured to obtain a handwritten composition image to be corrected;
A processing module, configured to process the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the handwritten composition image, obtaining a processed handwritten composition image;
A segmentation module, configured to perform line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
A recognition module, configured to recognize the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
A correction module, configured to perform sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
Further, the correction module includes:
A segmentation unit, configured to perform sentence segmentation on the text content to obtain the multiple sentences in the text content;
An analysis unit, configured to perform syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence, the analysis result including the words and phrases that the sentence contains, the part of speech of each word, and the type of each phrase;
A recognition unit, configured to select, for each sentence, the corresponding error recognition model according to the analysis result of the sentence and use it to examine the sentence, obtaining the error information in the sentence;
An input unit, configured to input the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
A generation unit, configured to generate the correction result corresponding to the handwritten composition image from the correction suggestions corresponding to the sentences.
Further, the recognition unit is specifically configured to:
For each sentence, obtain the context information corresponding to each word in the sentence;
Determine the matrix vector corresponding to each word from the context information corresponding to that word;
Select, according to the part of speech of each word, the error recognition model corresponding to that part of speech;
Input the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to its part of speech to obtain the error information in the sentence.
The device for automatically correcting handwritten compositions according to the embodiment of the present invention obtains a handwritten composition image to be corrected; removes interference and background lines from the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm; performs line segmentation, word segmentation, and recognition on the processed handwritten composition image to obtain the corresponding text content; segments the text content into sentences, performs part-of-speech analysis on each sentence to obtain the part of speech of each word and shallow syntactic analysis to obtain the phrases in the sentence and their types, and then selects the rule-based strategy or deep model for a particular error type to detect specific grammatical errors; for each detected grammatical error, combines the recognition result of the error recognition model with the correction suggestion model to generate a correction suggestion, thereby obtaining the correction result for the whole composition; and, based on the composition text content and the correction result, builds a convolutional neural network to score the composition text intelligently. Correction accuracy and correction efficiency are thereby improved.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes another device for automatically correcting handwritten compositions, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for automatically correcting handwritten compositions described above.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method for automatically correcting handwritten compositions described above.
To achieve the above objects, an embodiment of the fifth aspect of the present invention proposes a computer program product. When the instructions in the computer program product are executed by a processor, a method for automatically correcting handwritten compositions is performed, the method including:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the handwritten composition image, obtaining a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
Recognizing the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a method for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another method for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another device for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a further device for automatically correcting handwritten compositions according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The method and device for automatically correcting handwritten compositions according to the embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a schematic flowchart of a method for automatically correcting handwritten compositions according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
S101: Obtain a handwritten composition image to be corrected.
The method provided by the present invention is executed by a device for automatically correcting handwritten compositions. The device may be a hardware device, such as a terminal device, a server, or a server cluster, or software installed on a hardware device. The handwritten composition image in this embodiment may be an electronic image obtained by photographing or scanning a handwritten composition.
S102: Process the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the handwritten composition image, obtaining a processed handwritten composition image.
In this embodiment, the device may first binarize the handwritten composition image and then process it using the connected-component analysis algorithm and the line detection algorithm, removing interference and background lines from the handwritten composition image to obtain the processed handwritten composition image.
Here, binarizing the handwritten composition image means determining the gray value of each pixel from its color, comparing the gray value of each pixel with a threshold gray value, and setting the gray value of each pixel to 0 or 255 according to the comparison result, so that the whole image shows a clear black-and-white visual effect.
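The binarization step can be sketched as follows. This is an illustration under stated assumptions: the image is a nested list of (R, G, B) tuples, the gray value uses the common 0.299/0.587/0.114 luma weights, and the fixed threshold of 128 is an arbitrary choice; the embodiment does not specify how the threshold is selected.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to a gray value using common luma weights."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def binarize(image, threshold=128):
    """Set every pixel to 0 (black) or 255 (white) by comparing its gray value with a threshold."""
    return [[255 if to_gray(px) >= threshold else 0 for px in row] for row in image]


# A 2x3 RGB image: dark ink on light paper.
image = [
    [(250, 250, 245), (30, 30, 40), (240, 238, 230)],
    [(20, 25, 30), (245, 245, 245), (35, 30, 30)],
]
binary = binarize(image)
```

In practice an adaptive threshold is often preferable for photographed pages with uneven lighting, but a fixed threshold keeps the idea visible.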
In this embodiment, processing the handwritten composition image with the connected-component analysis algorithm can reconnect broken strokes within a given region of the handwritten composition image, for example the region where a word is located. It can also remove interference from the handwritten composition image, such as local highlights produced when the handwritten composition is photographed.
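One way connected-component analysis can remove interference is sketched below: ink pixels are grouped into 4-connected components, and components below a size threshold are treated as noise and erased. The function names, the `min_size` value, and the size-based noise criterion are assumptions for illustration; the embodiment does not state its criterion, and reconnecting broken strokes is not shown here.

```python
from collections import deque


def connected_components(binary, ink=0):
    """Group 4-connected ink pixels; returns a list of lists of (row, col) coordinates."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != ink or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and not seen[ny][nx] and binary[ny][nx] == ink:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def remove_small_components(binary, min_size=3, ink=0, paper=255):
    """Erase ink components smaller than min_size pixels (treated as noise in this sketch)."""
    cleaned = [row[:] for row in binary]
    for pixels in connected_components(binary, ink):
        if len(pixels) < min_size:
            for y, x in pixels:
                cleaned[y][x] = paper
    return cleaned


noisy = [
    [0, 0, 255, 255, 255],
    [0, 0, 255, 255, 0],
    [255, 255, 255, 255, 255],
]
cleaned = remove_small_components(noisy)
```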
In this embodiment, processing the handwritten composition image with the line detection algorithm can detect and remove background lines from the handwritten composition image, such as the horizontal rules of a notebook or four-line, three-space ruling. In a handwritten composition, the strokes of words often touch the background lines, or words are embedded in them, so that after word segmentation the resulting word image blocks contain parts of the background lines, which reduces word recognition accuracy. Detecting and removing the background lines with the line detection algorithm therefore improves word recognition accuracy and, in turn, correction accuracy.
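As a simplified stand-in for the line detection step, the sketch below treats any pixel row that is mostly ink as a ruled background line and erases it, keeping pixels that have ink directly above and below so that strokes crossing the rule survive. This is a substitute technique chosen for brevity, not the algorithm of the embodiment (which is not named and may, for example, be a Hough-style detector); it assumes one-pixel-thick, perfectly horizontal rules, and `min_fill` is an arbitrary parameter.

```python
def remove_horizontal_rules(binary, min_fill=0.8, ink=0, paper=255):
    """Erase rows that are at least min_fill ink, preserving pixels where a stroke crosses."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y, row in enumerate(binary):
        fill = sum(1 for value in row if value == ink) / width
        if fill < min_fill:
            continue  # not dense enough to be a ruled line
        for x in range(width):
            above = y > 0 and binary[y - 1][x] == ink
            below = y + 1 < height and binary[y + 1][x] == ink
            if not (above and below):
                cleaned[y][x] = paper  # rule pixel with no crossing stroke
    return cleaned


# A vertical stroke in column 1 crossing a full-width rule in row 1.
page = [
    [255, 0, 255, 255, 255],
    [0, 0, 0, 0, 0],
    [255, 0, 255, 255, 255],
]
without_rule = remove_horizontal_rules(page)
```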
S103: Perform line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks.
Further, in a handwritten composition the words of adjacent lines may touch each other. To avoid treating two touching lines as a single line during line segmentation, before step S103 the method may further include: performing text splitting on the processed handwritten composition image to disconnect the touching parts between adjacent lines in the handwritten composition image. During line segmentation, the two lines that previously touched can then be cut into two separate lines, which improves the accuracy of line segmentation and, in turn, word recognition accuracy.
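Once touching lines have been disconnected, line segmentation can be illustrated with a horizontal projection: consecutive pixel rows that contain ink form one text line, and blank rows separate lines. The embodiment does not name its segmentation technique, so projection is an assumption here, as is the function name `segment_rows`.

```python
def segment_rows(binary, ink=0):
    """Return (start, end) row-index pairs, end exclusive, for each band of rows containing ink."""
    bands, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(value == ink for value in row)
        if has_ink and start is None:
            start = y  # a new text line begins
        elif not has_ink and start is not None:
            bands.append((start, y))  # a blank row closes the current line
            start = None
    if start is not None:
        bands.append((start, len(binary)))
    return bands


page = [
    [255, 0, 0, 255],
    [0, 0, 255, 255],
    [255, 255, 255, 255],
    [255, 0, 0, 0],
]
rows = segment_rows(page)
```

The sketch also shows why the splitting step matters: if a descender from the first line touched the second line, no blank row would separate them and both would be returned as one band.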
S104: Recognize the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image.
In this embodiment, the preset word recognition model may be, for example, a long short-term memory (LSTM) network, a type of recurrent neural network. Specifically, the device may perform step S104 by recognizing the multiple word image blocks with the preset word recognition model to obtain the word corresponding to each word image block, and then assembling those words to obtain the text content corresponding to the handwritten composition image.
In this embodiment, to improve the recognition accuracy of the word recognition model, both the word image blocks and the word recognition samples can be normalized according to the preset format corresponding to the word recognition model. Therefore, on the basis of the above embodiment, before step S104 the method may further include: normalizing the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
Correspondingly, before step S104, the training process of the word recognition model may include: obtaining word recognition samples, which include handwritten word image samples and the corresponding text content; normalizing the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples; and training the word recognition model on the processed word recognition samples to obtain the preset word recognition model.
Here, normalizing a word image block may mean, for example, adjusting its brightness, color, and similar properties to the format required by the word recognition model.
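A normalization of this kind might be sketched as below, assuming the model expects a fixed-size block of gray values scaled to the range 0 to 1. The fixed target size, the nearest-neighbor resampling, and the name `normalize_block` are assumptions; the embodiment mentions only brightness and color as examples.

```python
def normalize_block(block, target_height=2, target_width=4):
    """Resize a gray-value block by nearest neighbor and scale its values from 0..255 to 0..1."""
    src_height, src_width = len(block), len(block[0])
    return [
        [
            block[y * src_height // target_height][x * src_width // target_width] / 255.0
            for x in range(target_width)
        ]
        for y in range(target_height)
    ]


block = [
    [0, 255],
    [255, 0],
]
normalized = normalize_block(block)
```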
In this embodiment, to improve the training effect when training the word recognition model, the word recognition samples can be divided by word length into short-word recognition samples, medium-word recognition samples, and long-word recognition samples. The word recognition model is then trained with the short-word, medium-word, and long-word recognition samples in turn to obtain the preset word recognition model. The handwritten words in the short-word recognition samples are shorter than those in the medium-word recognition samples, and the handwritten words in the medium-word recognition samples are shorter than those in the long-word recognition samples.
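The staged training order can be sketched as follows. The length boundaries (`short_max`, `long_min`), the function names, and the `train_step` callback are hypothetical; the embodiment does not give the boundaries, and a real `train_step` would update the recognition model rather than append to a log.

```python
def split_by_length(samples, short_max=4, long_min=8):
    """Partition (image, text) samples into short, medium, and long groups by text length."""
    short, medium, long_group = [], [], []
    for sample in samples:
        length = len(sample[1])
        if length <= short_max:
            short.append(sample)
        elif length < long_min:
            medium.append(sample)
        else:
            long_group.append(sample)
    return short, medium, long_group


def train_in_stages(train_step, samples):
    """Feed the groups to train_step in order: short words, then medium, then long."""
    stages = zip(("short", "medium", "long"), split_by_length(samples))
    for stage, group in stages:
        for sample in group:
            train_step(stage, sample)


# Image placeholders stand in for handwritten word images.
samples = [("img1", "cat"), ("img2", "beautiful"), ("img3", "window"), ("img4", "it")]
log = []
train_in_stages(lambda stage, sample: log.append((stage, sample[1])), samples)
```

Presenting short words first gives the model easier examples before longer sequences, in the spirit of curriculum learning.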
Further, over-segmentation and under-segmentation can occur during word segmentation. Over-segmentation means that one word is split into two or more words; under-segmentation means that two or more words are segmented as a single word. As a result, a word output by the word recognition model may actually consist of several words, or may be only part of a word. Therefore, to improve the accuracy of word recognition, after the word corresponding to a word image block is recognized in step 104, the word may be compared with the words in a preset dictionary to find the dictionary words that match it. When the word is only part of a dictionary word, the second word corresponding to the word image block before or after the current word image block is obtained, and it is judged whether the combination of the word and the second word is a word in the dictionary. Based on the result, over-segmented words are merged and under-segmented words are split again.
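The dictionary check for over-segmented words can be sketched as follows. This simplified version only tries to join a fragment with the fragment that follows it, whereas the embodiment also allows the preceding block; the function name and the use of a plain Python set as the dictionary are assumptions made for illustration.

```python
def merge_fragments(words, dictionary):
    """Merge adjacent recognized fragments whose concatenation is a dictionary word."""
    merged = []
    i = 0
    while i < len(words):
        word = words[i]
        if word not in dictionary and i + 1 < len(words):
            combined = word + words[i + 1]
            if combined in dictionary:
                # The two fragments form one dictionary word: treat as over-segmentation.
                merged.append(combined)
                i += 2
                continue
        merged.append(word)
        i += 1
    return merged
```

For instance, the fragments "inter" and "esting" are rejoined when "interesting" is in the dictionary, while fragments that cannot be matched are left unchanged for later error correction.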
Further, because the recognition accuracy of the word recognition model is not 100%, its recognition results may contain errors. To improve the recognition accuracy for the handwritten composition image, after step 104 the method may further include: inputting the text content into a preset error correction model to obtain the corrected text content. The error correction model consists of multiple language models; each language model is an N-gram language model, where N is a positive integer.
Correspondingly, the training process of the error correction model may be: obtaining error correction training samples, which include text content samples and the corresponding corrected samples; normalizing the error correction training samples according to the preset format corresponding to the error correction model to obtain processed error correction training samples; and training the error correction model on the processed samples to obtain the preset error correction model.
The error correction model may process the text content as follows: obtain the candidate words in the text content, obtain the visually similar words of each candidate word, and treat those similar words as candidate words as well; normalize the candidate words, for example their letter case, full-width and half-width forms, singular and plural forms, digits and punctuation, the spaces between candidate words, and the punctuation inserted before and after candidate words; take each candidate word in the text content together with its visually similar words as one column of candidate words; calculate the transition probabilities between the columns of candidate words and keep the candidate words whose transition probability exceeds a first transition probability threshold; then, using an optimal path search algorithm, select the most suitable word in each column and combine the selected words to obtain the text content.
It should be noted that, when calculating the transition probabilities between the columns of candidate words, a bigram language model may first be used to calculate the transition probability between two adjacent columns of candidate words. When this transition probability is less than a second transition probability threshold, the corresponding candidate word is deleted; when it is greater than the second transition probability threshold, the trigram language model or a higher-order language model is no longer used to calculate the transition probabilities across three or more consecutive columns of candidate words. This reduces the amount of transition probability computation and increases the speed at which the error correction model processes the text content.
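The column-wise candidate selection can be sketched as a small dynamic-programming search over bigram transition probabilities. The embodiment does not name its optimal path search algorithm; the Viterbi-style search below, the dictionary-based bigram table, and the single pruning threshold are all assumptions chosen for illustration.

```python
import math


def best_path(columns, bigram, threshold=0.0):
    """Pick one word per column, maximizing the product of bigram transition probabilities.

    columns: list of candidate-word lists, one list per position in the text.
    bigram: dict mapping (previous_word, word) to a transition probability.
    Transitions whose probability does not exceed `threshold` are pruned.
    """
    # scores maps each surviving candidate to (log-probability, best path ending there).
    scores = {word: (0.0, [word]) for word in columns[0]}
    for column in columns[1:]:
        new_scores = {}
        for word in column:
            best = None
            for prev, (logp, path) in scores.items():
                p = bigram.get((prev, word), 0.0)
                if p <= threshold or p <= 0.0:
                    continue  # pruned transition
                candidate = (logp + math.log(p), path + [word])
                if best is None or candidate[0] > best[0]:
                    best = candidate
            if best is not None:
                new_scores[word] = best
        if not new_scores:
            raise ValueError("every transition into this column was pruned")
        scores = new_scores
    return max(scores.values(), key=lambda item: item[0])[1]
```

With the recognized word "arn" and its visually similar candidate "am" in one column, a bigram table that rates "I am" far above "I arn" leads the search to keep "am".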
S105, performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
In the method for automatic correction of handwritten compositions according to this embodiment of the present invention, a handwritten composition image to be corrected is obtained; the handwritten composition image is processed with a connected-component analysis algorithm and a line detection algorithm to remove the interference and background lines in it, yielding a processed handwritten composition image; row segmentation and word segmentation are performed on the processed image to obtain multiple word image blocks; the word image blocks are recognized with a preset word recognition model to obtain the text content corresponding to the handwritten composition image; and sentence segmentation, syntactic analysis, and sentence correction are performed on the text content to obtain the correction result corresponding to the handwritten composition image. In this way, before the handwritten composition image is corrected, the interference and background lines in it are removed and the image is recognized to obtain the corresponding text content, and the text content is then corrected, which improves both the accuracy and the efficiency of correction.
Fig. 2 is a schematic flowchart of another method for automatic correction of handwritten compositions provided by an embodiment of the present invention. As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, step 105 may specifically include the following steps:
S1051, performing sentence segmentation on the text content to obtain multiple sentences in the text content.
In this embodiment, the device for automatic correction of handwritten compositions may perform step 1051 as follows: obtain the type corresponding to the text content, where the type indicates how accurate the sentence division of the text content is; obtain the features to be extracted that correspond to the type; extract those features from the text content to obtain the segmentation feature information in the text content; and segment the text content into sentences according to the segmentation feature information to obtain the multiple sentences in the text content.
In this embodiment, the device for automatic correction of handwritten compositions may use a machine learning method to determine the type corresponding to the text content. The type may be either handwritten composition image or ordinary composition. The handwritten composition image type means that the text content was obtained by recognizing a handwritten composition image.
For example, when the type is handwritten composition image, punctuation marks may have been misused during handwriting. Therefore, to improve the accuracy of sentence division, the features to be extracted for this type may be capitalized initial letters and the like, and the text content is segmented into sentences according to these features to obtain the multiple sentences in the text content. When the type is ordinary composition, the features to be extracted for the type may be punctuation marks and the like.
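A minimal sketch of type-dependent sentence segmentation follows. The type labels "handwritten" and "ordinary", the function name, and the two regular expressions are assumptions for illustration; in particular, splitting before every capitalized word is a crude stand-in for the capitalized-initial feature and would also split before proper nouns.

```python
import re


def split_sentences(text, text_type):
    """Split text into sentences using the cue assumed for the given type."""
    if text_type == "handwritten":
        # Punctuation is unreliable: start a new sentence before each capitalized
        # word of two or more letters (so the pronoun "I" does not trigger a split).
        pattern = r"\s+(?=[A-Z][a-z])"
    else:
        # Punctuation is reliable: split after '.', '!' or '?'.
        pattern = r"(?<=[.!?])\s+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

The same text type would be determined beforehand by the classifier mentioned above, which this sketch does not attempt to model.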
Further, to improve the accuracy of correction, before step 1051 the method may also include: preprocessing the text content, for example encoding processing, verification, interference filtering, and normalization.
S1052, performing syntactic analysis on each sentence of the text content to obtain the analysis result of each sentence; the analysis result includes: the words and phrases that the sentence contains, the part of speech of each word, and the type of each phrase.
In this embodiment, the device for automatic correction of handwritten compositions may perform step 1052 as follows: for each sentence in the text content, split the sentence into words to obtain the words in the sentence; obtain the part of speech of each word; and match the sentence against preset phrase regular expressions to obtain the phrases in the sentence. In addition, the device may adjust the phrase regular expressions according to the matching results.
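Phrase matching with regular expressions can be sketched by encoding each word's part of speech as one letter and matching the rules against the resulting tag string. The toy lexicon, the one-letter tag codes, and the two phrase rules below are hypothetical; a real system would use a trained part-of-speech tagger and a much larger rule set.

```python
import re

# Hypothetical toy lexicon mapping words to one-letter part-of-speech codes:
# D determiner, J adjective, N noun, V verb, P preposition.
LEXICON = {"the": "D", "a": "D", "red": "J", "big": "J", "dog": "N",
           "ball": "N", "park": "N", "chases": "V", "in": "P"}

# Hypothetical phrase rules written as regular expressions over the tag string.
PHRASE_RULES = {"NP": re.compile(r"D?J*N"), "PP": re.compile(r"PD?J*N")}


def analyze(sentence):
    """Return (words, tags, phrases); phrases are (phrase_type, phrase_text) pairs."""
    words = sentence.lower().split()
    tags = "".join(LEXICON.get(word, "X") for word in words)  # X marks unknown words
    phrases = []
    for phrase_type, rule in PHRASE_RULES.items():
        for match in rule.finditer(tags):
            # One tag letter per word, so match offsets are also word indices.
            phrases.append((phrase_type, " ".join(words[match.start():match.end()])))
    return words, list(tags), phrases
```

Because each tag occupies exactly one character, the span of a regular-expression match maps directly back to the words that form the phrase.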
S1053, for each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence and using it to examine the sentence, to obtain the error information in the sentence.
In this embodiment, a preset model library stores in advance the error recognition model corresponding to each word, part of speech, phrase, or phrase type. For a word, the part of speech of a word, a phrase, or the type of a phrase in the sentence, the model library is queried to obtain the corresponding error recognition model, and the sentence is input into that model to obtain the error information in the sentence. The error information may be, for example, a preposition usage error, a phrase usage error, or a word usage error.
Specifically, the device for automatic correction of handwritten compositions may perform step 1053 as follows: for each sentence, obtain the context information corresponding to each word in the sentence; determine the matrix vector corresponding to each word according to its context information; select the error recognition model corresponding to the part of speech of each word; and input the matrix vector of each word in the sentence into the error recognition model corresponding to its part of speech to obtain the error information in the sentence.
The context information corresponding to each word in a sentence refers to the information about the other words within a preset distance of that word and the information about the phrases in the sentence that contain the word. For example, in the sentence "I am interested in something", the other words within the preset distance of the preposition "in" may include "am", "interested", and "something". The matrix vector corresponding to each word may be, for example, a vector generated by a Word2Vec model from the word's context information.
The part of speech is, for example, noun, verb, or preposition. In this embodiment, after the matrix vector of each word in the sentence is input into the error recognition model corresponding to its part of speech, the model can determine from the matrix vector the probability that the word should be each candidate word. For the preposition "in" in the example above, the model outputs the probability that it should be "on", "at", "before", "after", "since", and so on. When the preposition with the highest probability differs from the preposition "in" used in the sentence, the highest-probability preposition is recorded as the error information in the sentence; if the highest-probability preposition is "in", the preposition "in" is used correctly in the sentence and there is no error information.
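The decision rule in the preceding paragraph reduces to comparing the written word with the most probable candidate. In the sketch below the probability table stands in for the output of the error recognition model; the function name and the example probabilities are hypothetical.

```python
def check_preposition(written, probabilities):
    """Return the most probable preposition if it differs from the written one, else None."""
    best = max(probabilities, key=probabilities.get)
    return None if best == written else best
```

A return value of None corresponds to "no error information", and any other return value is the preposition that the model considers most likely in that context.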
Further, to improve the accuracy of the obtained error information, step 105 may also include: for each sentence, comparing the sentence against a preset error pattern library to obtain the error information in the sentence. The error pattern library includes the regular expressions corresponding to multiple error patterns.
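An error pattern library of this kind might look like the following sketch. The two patterns and their names are illustrative assumptions and are not taken from the embodiment; the first one in particular is a rough spelling-based approximation of the "a" versus "an" rule.

```python
import re

# Hypothetical error pattern library: pattern name -> compiled regular expression.
ERROR_PATTERNS = {
    "article before vowel": re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE),
    "repeated word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
}


def find_errors(sentence):
    """Return (pattern_name, matched_text) pairs for every pattern hit in the sentence."""
    return [(name, match.group(0))
            for name, pattern in ERROR_PATTERNS.items()
            for match in pattern.finditer(sentence)]
```

Rule-based hits like these complement the model-based detection of step 1053, which is why the two sources of error information can overlap.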
S1054, inputting the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence.
In this embodiment, after the error information in each sentence has been obtained, either the error information in the sentence or the sentence containing the error information may be input into the preset correction suggestion model to obtain the correction suggestion corresponding to the sentence.
The correction suggestion model may be trained as follows: obtain correction training samples, which include a large number of sentences containing error information together with the corresponding correction suggestions; and train an initial correction suggestion model on the correction training samples to obtain the preset correction suggestion model. The correction suggestion model may be a recurrent neural network.
S1055, generating the correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence.
In this embodiment, for a given sentence, the error information obtained by the error recognition model may overlap with the error information obtained by comparison against the preset error pattern library, so a sentence may have multiple correction suggestions. Therefore, the device for automatic correction of handwritten compositions may perform step 1055 as follows: for each sentence that has multiple correction suggestions, input those suggestions into a preset correction selection model to obtain a single correction suggestion corresponding to the sentence; then generate the correction result corresponding to the handwritten composition image from the single correction suggestion of each sentence.
In this embodiment, the correction selection model may score the correction suggestions corresponding to a sentence and take the highest-scoring suggestion as the most probable correction suggestion for the sentence.
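Selecting a single suggestion can be sketched as de-duplication followed by an argmax over scores. The scoring function is passed in as a parameter because the embodiment does not describe how the correction selection model computes its scores; the function name is likewise hypothetical.

```python
def select_suggestion(suggestions, score):
    """Drop duplicate suggestions, then return the one with the highest score."""
    unique = list(dict.fromkeys(suggestions))  # keeps first-seen order
    return max(unique, key=score)
```

Removing duplicates first handles the case where the error recognition model and the error pattern library propose the same correction.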
Further, on the basis of the above embodiments, the handwritten composition image may also be scored. Therefore, after step 105, the method may also include:
S106, obtaining feature information from the text content; the feature information includes any one or more of the following: lexical information, syntactic information, sentence information, and correction information.
S107, when the feature information is lexical information, syntactic information, or sentence information, inputting the feature information into the corresponding scoring model to obtain the score corresponding to the feature information.
S108, when the feature information includes lexical information, syntactic information, sentence information, and correction information, inputting the feature information into a preset comprehensive scoring model to obtain the comprehensive score corresponding to the handwritten composition image.
For example, when the feature information includes lexical information, the lexical information is input into the corresponding vocabulary scoring model to obtain a vocabulary score; when the feature information includes syntactic information, the syntactic information is input into the corresponding grammar scoring model to obtain a grammar score; and when the feature information includes sentence information, such as the structure and length of sentences, the sentence information is input into the corresponding sentence scoring model to obtain a sentence score.
In this embodiment, each word in the text content may be represented by a corresponding unique vector, for example a one-hot vector. The number of dimensions of a one-hot vector equals the total number of words; the one-hot vector of a word has the value 1 in the dimension corresponding to that word and 0 in all other dimensions. For example, if the total number of words is 5000 and the first word in the text content is the 1000th word, then the one-hot vector of the first word has 5000 dimensions, with the value 1 in the 1000th dimension and 0 in every other dimension. In this embodiment, the lexical information in the text content may be represented by the unique vectors of the words in the text content; that is, the input of the vocabulary scoring model may be the vector set corresponding to the text content. The vector set corresponding to the text content is the set of vectors obtained by replacing each word in the text content with its unique vector.
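The one-hot representation and the resulting vector set can be sketched as follows. The function names and the vocabulary dictionary are hypothetical, and indices are zero-based, so the 1000th word of the example above has index 999.

```python
def one_hot(index, vocabulary_size):
    """Return a one-hot list with a 1 at the zero-based `index` and 0 elsewhere."""
    vector = [0] * vocabulary_size
    vector[index] = 1
    return vector


def encode_text(words, vocabulary):
    """Replace each word with its one-hot vector; vocabulary maps word -> zero-based index."""
    return [one_hot(vocabulary[word], len(vocabulary)) for word in words]
```

The list returned by `encode_text` plays the role of the vector set that is fed to the vocabulary scoring model.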
Correspondingly, the vocabulary scoring model may be trained as follows: obtain the vector set corresponding to a composition sample and the vocabulary score corresponding to that sample; input the vector set and the vocabulary score into the vocabulary scoring model; and train the vocabulary scoring model.
In this embodiment, the syntactic information and sentence information in the text content may also be represented by the vector set corresponding to the text content. Correspondingly, the grammar scoring model may be trained as follows: obtain the vector set corresponding to a composition sample and the grammar score corresponding to that sample; input the vector set and the grammar score into the grammar scoring model; and train the grammar scoring model. The sentence scoring model may be trained as follows: obtain the vector set corresponding to a composition sample and the sentence score corresponding to that sample; input the vector set and the sentence score into the sentence scoring model; and train the sentence scoring model.
As another example, when the feature information includes lexical information, syntactic information, sentence information, and correction information, the following operations may be performed on the feature information in sequence: vectorize the feature information; input the resulting vectors into a convolutional neural network (CNN); input the output of the CNN into a recurrent neural network (LSTM); and input the output of the LSTM into an attention mechanism. The output of the attention mechanism is the comprehensive score corresponding to the handwritten composition image.
In this embodiment, the vocabulary scoring model, the grammar scoring model, the sentence scoring model, the CNN, the LSTM, and the attention mechanism can each be trained on corresponding training samples, which is not described in detail here.
In this embodiment, the correction information refers to the error types in the text content and the number of errors of each type. The lexical information, syntactic information, and sentence information may be represented by the vector set corresponding to the text content. Correspondingly, the comprehensive scoring model may be trained as follows: obtain the vector set corresponding to a composition sample, the error types in the sample, the number of errors of each type, and the comprehensive score corresponding to the sample; input all of these into the comprehensive scoring model; and train the comprehensive scoring model.
In the method for automatic correction of handwritten compositions according to this embodiment of the present invention, a handwritten composition image to be corrected is obtained; the handwritten composition image is processed with a connected-component analysis algorithm and a line detection algorithm to remove the interference and background lines in it, yielding a processed handwritten composition image; row segmentation and word segmentation are performed on the processed image to obtain multiple word image blocks; and the word image blocks are recognized with a preset word recognition model to obtain the text content corresponding to the handwritten composition image. Sentence segmentation is then performed on the text content; part-of-speech analysis of each sentence yields the part of speech of each word, and shallow syntactic analysis yields the phrases in the sentence and their types; a rule-based strategy or a deep model for the particular error type is then selected to detect specific grammatical errors. For each grammatical error detected, the recognition result of the error recognition model is combined with the correction suggestion model to generate a correction suggestion, so that the correction result for the whole composition is obtained. Based on the text content of the composition and the correction result, a convolutional neural network is built to score the composition text intelligently, which improves both the accuracy and the efficiency of correction.
Fig. 3 is a schematic structural diagram of a device for automatic correction of handwritten compositions provided by an embodiment of the present invention. As shown in Fig. 3, the device includes: an acquisition module 31, a processing module 32, a segmentation module 33, a recognition module 34, and a correction module 35.
The acquisition module 31 is configured to obtain a handwritten composition image to be corrected;
the processing module 32 is configured to process the handwritten composition image with a connected-component analysis algorithm and a line detection algorithm, removing the interference and background lines in the handwritten composition image to obtain a processed handwritten composition image;
the segmentation module 33 is configured to perform row segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
the recognition module 34 is configured to recognize the multiple word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
the correction module 35 is configured to perform sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
The device for automatic correction of handwritten compositions provided by the present invention may be a hardware device, such as a terminal device, a server, or a server cluster, or may be software installed on a hardware device. The handwritten composition image in this embodiment may be an electronic image obtained by photographing or scanning a handwritten composition.
In this embodiment, the device for automatic correction of handwritten compositions may first binarize the handwritten composition image and then process it with the connected-component analysis algorithm and the line detection algorithm, removing the interference and background lines in the handwritten composition image to obtain the processed handwritten composition image.
In this embodiment, processing the handwritten composition image with the connected-component analysis algorithm can connect broken lines within a given region of the image, for example the region where a word is located. Processing the handwritten composition image with the connected-component analysis algorithm can also remove interference from the image, such as local highlights introduced when the handwritten composition is photographed.
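One way connected-component analysis can remove interference is to discard foreground components that are too small to be handwriting. The sketch below assumes a binarized image stored as rows of 0 and 1 with 1 as foreground, uses 4-connectivity, and takes the minimum component size as a hypothetical parameter; it illustrates only the noise-removal use and not the joining of broken strokes.

```python
from collections import deque


def remove_small_components(image, min_size):
    """Clear 4-connected foreground components that contain fewer than min_size pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects every pixel of this component.
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```

The input image is left unmodified, and the returned copy keeps only the components that reach the size limit.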
In this embodiment, processing the handwritten composition image with the line detection algorithm can detect and remove the background lines in the image, such as the ruled lines of a notebook or four-line three-grid writing lines. In a handwritten composition, the strokes of words often touch the background lines, or words are embedded in them, so that after word segmentation the resulting word image blocks contain parts of the background lines, which lowers the recognition accuracy for the words. Detecting and removing the background lines in the handwritten composition image with the line detection algorithm therefore improves the recognition accuracy for words and, in turn, the accuracy of correction.
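As a simplified illustration of background line removal, the sketch below clears every pixel row that is almost entirely foreground. This row-projection heuristic is a stand-in chosen for brevity and is not the line detection algorithm of the embodiment, which is not specified; the fill threshold and the binarized 0/1 image format are assumptions, and the heuristic handles only horizontal, unskewed lines.

```python
def remove_horizontal_lines(image, min_fill=0.9):
    """Clear each row whose fraction of foreground (1) pixels is at least min_fill."""
    result = [row[:] for row in image]
    for y, row in enumerate(image):
        if sum(row) / len(row) >= min_fill:
            result[y] = [0] * len(row)
    return result
```

Note that clearing a whole row also erases the parts of strokes that cross the line, so a practical implementation would need to restore those pixels from the neighbouring rows.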
Further, in a handwritten composition the words in adjacent rows may touch each other. To avoid treating two touching rows of words as a single row during row segmentation, the device may also include: a splitting module, configured to split the words in the processed handwritten composition image before row segmentation is performed on it, disconnecting the touching parts between adjacent rows in the handwritten composition image. During row segmentation, the two rows of words that previously touched can then be segmented as two rows, which improves the accuracy of row segmentation and, in turn, the recognition accuracy for words.
In this embodiment, to improve the recognition accuracy of the word recognition model, the word image blocks and the word recognition samples may be normalized according to the preset format corresponding to the word recognition model. Therefore, on the basis of the above embodiment, the device may also include: a normalization module, configured to normalize the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
Correspondingly, the device may also include: a training module, configured to obtain word recognition samples, where each sample includes a handwritten word image sample and its corresponding text content; normalize the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples; and train the word recognition model on the processed samples to obtain the preset word recognition model.
The normalization of a word image block may, for example, adjust the brightness, color, and similar properties of the word image block to the format required by the word recognition model.
In this embodiment, to improve the training effect of the word recognition model, the word recognition samples may be divided by word length into short, medium, and long word recognition samples, and the word recognition model is trained with the short, medium, and long samples in that order to obtain the preset word recognition model. The handwritten words in the short samples are shorter than those in the medium samples, and the handwritten words in the medium samples are shorter than those in the long samples.
Further, over-segmentation and under-segmentation can occur during word segmentation. Over-segmentation means that one word is split into two or more words; under-segmentation means that two or more words are segmented as a single word. As a result, a word output by the word recognition model may actually consist of several words, or may be only part of a word. Therefore, to improve the accuracy of word recognition, after the device for automatic correction of handwritten compositions recognizes the word corresponding to a word image block, it may compare the word with the words in a preset dictionary to find the dictionary words that match it. When the word is only part of a dictionary word, the second word corresponding to the word image block before or after the current word image block is obtained, and it is judged whether the combination of the word and the second word is a word in the dictionary. Based on the result, over-segmented words are merged and under-segmented words are split again.
Further, because the recognition accuracy of the word recognition model is not 100%, its recognition results may contain errors. To improve the recognition accuracy for the handwritten composition image, the device may also include: an input module, configured to input the text content into a preset error correction model to obtain the corrected text content. The error correction model consists of multiple language models; each language model is an N-gram language model, where N is a positive integer.
Correspondingly, the training process of the error correction model may be: obtaining error correction training samples, which include text content samples and the corresponding corrected samples; normalizing the error correction training samples according to the preset format corresponding to the error correction model to obtain processed error correction training samples; and training the error correction model on the processed samples to obtain the preset error correction model.
The error correction model may process the text content as follows: obtain the candidate words in the text content, obtain the visually similar words of each candidate word, and treat those similar words as candidate words as well; normalize the candidate words, for example their letter case, full-width and half-width forms, singular and plural forms, digits and punctuation, the spaces between candidate words, and the punctuation inserted before and after candidate words; take each candidate word in the text content together with its visually similar words as one column of candidate words; calculate the transition probabilities between the columns of candidate words and keep the candidate words whose transition probability exceeds a first transition probability threshold; then, using an optimal path search algorithm, select the most suitable word in each column and combine the selected words to obtain the text content.
It should be noted that, when calculating the transition probabilities between the columns of candidate words, a bigram language model may first be used to calculate the transition probability between two adjacent columns of candidate words. When this transition probability is less than a second transition probability threshold, the corresponding candidate word is deleted; when it is greater than the second transition probability threshold, the trigram language model or a higher-order language model is no longer used to calculate the transition probabilities across three or more consecutive columns of candidate words. This reduces the amount of transition probability computation and increases the speed at which the error correction model processes the text content.
In the device for automatic correction of handwritten compositions according to this embodiment of the present invention, a handwritten composition image to be corrected is obtained; the handwritten composition image is processed with a connected-component analysis algorithm and a line detection algorithm to remove the interference and background lines in it, yielding a processed handwritten composition image; row segmentation and word segmentation are performed on the processed image to obtain multiple word image blocks; the word image blocks are recognized with a preset word recognition model to obtain the text content corresponding to the handwritten composition image; and sentence segmentation, syntactic analysis, and sentence correction are performed on the text content to obtain the correction result corresponding to the handwritten composition image. In this way, before the handwritten composition image is corrected, the interference and background lines in it are removed and the image is recognized to obtain the corresponding text content, and the text content is then corrected, which improves both the accuracy and the efficiency of correction.
Fig. 4 is a schematic structural diagram of another automatic correction device for handwritten compositions provided by an embodiment of the present invention. As shown in Fig. 4, on the basis of the embodiment shown in Fig. 3, the correction module 35 may specifically include: a segmentation unit 351, an analysis unit 352, a recognition unit 353, an input unit 354, and a generation unit 355.
The segmentation unit 351 is configured to perform sentence segmentation on the text content to obtain multiple sentences in the text content;
The analysis unit 352 is configured to perform syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence; the analysis result includes: the words and phrases contained in the sentence, the part of speech of each word, and the type of each phrase;
The recognition unit 353 is configured to, for each sentence, select the corresponding error recognition model according to the analysis result of the sentence and use it to examine the sentence, obtaining the error information in the sentence;
The input unit 354 is configured to input the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
The generation unit 355 is configured to generate the correction result corresponding to the handwritten composition image according to the correction suggestions corresponding to the sentences.
In this embodiment, the segmentation unit 351 may specifically be configured to: obtain the type corresponding to the text content, where the type indicates how accurately the text content is divided into sentences; obtain the features to be extracted that correspond to the type; perform feature extraction on the text content according to the features to be extracted, obtaining the segmentation feature information in the text content; and perform sentence segmentation on the text content according to the segmentation feature information, obtaining multiple sentences in the text content.
In this embodiment, the automatic correction device for handwritten compositions may use a machine learning method to determine the type corresponding to the text content. The type may specifically be handwritten composition image or ordinary composition; the handwritten composition image type means that the text content was obtained by recognizing a handwritten composition image.
For example, when the type is handwritten composition image, punctuation marks may have been used incorrectly during handwriting. Therefore, to improve the accuracy of sentence division, the features to be extracted for this type may be capitalized initial letters and the like; sentence segmentation is performed on the text content according to these features to obtain multiple sentences in the text content. When the type is ordinary composition, the features to be extracted for this type may be punctuation marks and the like.
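The type-dependent segmentation described above can be sketched as follows. This is a simplified illustration: the function name `split_sentences` and the type labels `"handwritten"` and `"ordinary"` are assumptions made for the example, and the capital-letter rule is deliberately naive (it would also split before the pronoun "I" and before proper nouns).

```python
import re


def split_sentences(text, text_type):
    """Split text into sentences using the feature that matches the text type.

    "handwritten": punctuation is unreliable, so split before capitalized words.
    "ordinary":    split after sentence-ending punctuation.
    """
    if text_type == "handwritten":
        # Split at whitespace that is followed by an uppercase letter.
        parts = re.split(r"\s+(?=[A-Z])", text)
    else:
        # Split at whitespace that follows '.', '!' or '?'.
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


print(split_sentences("the weather is nice today We went to the park", "handwritten"))
print(split_sentences("It rains. We stay home! Why?", "ordinary"))
```

A real system would likely combine several features per type rather than rely on a single regular expression.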
Further, to improve the accuracy of correction, before the segmentation unit 351 segments the text content, the device may first preprocess the text content, for example by encoding processing, verification, interference filtering, and normalization.
In this embodiment, the analysis unit 352 may specifically be configured to, for each sentence in the text content: split the sentence into words to obtain the words in the sentence; obtain the part of speech of each word; and match the sentence against preset phrase regular expressions to obtain the phrases in the sentence. In addition, the automatic correction device for handwritten compositions may also adjust the phrase regular expressions according to the matching results.
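Matching a sentence against phrase regular expressions can be sketched as follows. The toy `LEXICON`, the tag names, and the two patterns in `PHRASE_PATTERNS` are hypothetical; a real system would use a trained part-of-speech tagger and a much larger pattern set. Each word is encoded as `word/TAG` so that the patterns can refer to parts of speech while the matched words remain recoverable.

```python
import re

# Hypothetical toy lexicon standing in for a trained part-of-speech tagger.
LEXICON = {"i": "PRON", "am": "VERB", "interested": "ADJ", "in": "PREP",
           "the": "DET", "old": "ADJ", "book": "NOUN"}

W = r"[^\s/]+"  # one word, without its tag
# Assumed phrase patterns over "word/TAG " tokens; (?<!\S) anchors at a token start.
PHRASE_PATTERNS = {
    "noun_phrase": re.compile(rf"(?<!\S)(?:{W}/DET )?(?:{W}/ADJ )*{W}/NOUN "),
    "prep_phrase": re.compile(rf"(?<!\S){W}/PREP (?:{W}/DET )?(?:{W}/ADJ )*{W}/NOUN "),
}


def analyze(sentence):
    """Return (tagged words, phrases) for one sentence."""
    words = sentence.lower().rstrip(".!?").split()
    tagged = [(word, LEXICON.get(word, "UNK")) for word in words]
    encoded = "".join(f"{word}/{tag} " for word, tag in tagged)
    phrases = []
    for kind, pattern in PHRASE_PATTERNS.items():
        for match in pattern.finditer(encoded):
            text = " ".join(token.split("/")[0] for token in match.group().split())
            phrases.append((kind, text))
    return tagged, phrases


tagged, phrases = analyze("I am interested in the old book.")
print(tagged)
print(phrases)
```

Keeping the patterns in a dictionary makes it straightforward to adjust or extend them when matching results show that a pattern is too loose or too strict.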
In this embodiment, a preset model library stores in advance the error recognition model corresponding to each word, each part of speech, or each phrase. For the words, parts of speech, or phrases in a sentence, the model library is queried to obtain the corresponding error recognition model, and the sentence is input into that error recognition model to obtain the error information in the sentence. The error information may be, for example, a preposition usage error, a phrase usage error, or a word usage error.
Further, the recognition unit 353 is specifically configured to, for each sentence: obtain the context information corresponding to each word in the sentence; determine the matrix vector corresponding to each word according to its context information; select the error recognition model corresponding to the part of speech of each word; and input the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to its part of speech, obtaining the error information in the sentence.
The context information corresponding to each word in a sentence refers to information about the other words within a preset distance of that word in the sentence, and information about the phrases in the sentence that contain the word. For example, in the sentence "I am interested in something", the other words within the preset distance of the preposition "in" may include "am", "interested", and "something". The matrix vector corresponding to each word may be, for example, a matrix vector generated by a Word2Vec model from the context information of the word.
The part of speech is, for example, noun, verb, or preposition. In this embodiment, after the matrix vector corresponding to each word in the sentence is input into the error recognition model for its part of speech, the error recognition model can determine, from the matrix vector of a word, the probability that the word should be each possible word. For the preposition "in" in the example above, these are the probability that it is "on", the probability that it is "at", the probability that it is "before", the probability that it is "after", the probability that it is "since", and so on. When the preposition with the highest probability differs from the preposition "in" in the sentence, the preposition with the highest probability is determined to be the error information in the sentence; if the preposition with the highest probability is "in", the preposition "in" is used correctly in the sentence and there is no error information.
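The decision rule described above can be sketched as follows. The probability table passed to `check_word` is a hypothetical stand-in for the output of a trained error recognition model, and the function names and the preset distance of 2 are assumptions made for the example.

```python
def context_window(words, index, distance):
    """Return the other words within `distance` positions of words[index]."""
    start = max(0, index - distance)
    window = words[start:index + distance + 1]
    return [word for i, word in enumerate(window, start) if i != index]


def check_word(written, probabilities):
    """Return the most probable word if it differs from the written one, else None."""
    best = max(probabilities, key=probabilities.get)
    return None if best == written else best


words = ["I", "am", "interested", "on", "something"]
print(context_window(words, 3, 2))

# Hypothetical model output for the preposition slot at index 3.
probabilities = {"in": 0.80, "on": 0.10, "at": 0.05, "before": 0.02,
                 "after": 0.02, "since": 0.01}
print(check_word(words[3], probabilities))  # suggests a replacement
print(check_word("in", probabilities))      # written word is already the top choice
```

Returning `None` for a correct word keeps the "no error information" case explicit for the caller.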
Further, on the basis of the above embodiments, the correction module 35 may also include a comparison unit, configured to, for each sentence, compare the sentence with a preset error pattern library to obtain the error information in the sentence; the error pattern library includes the regular expressions corresponding to multiple error patterns.
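An error pattern library of regular expressions can be sketched as follows. The three patterns and their names are hypothetical examples chosen for illustration; the source does not list the actual patterns, and the article rule here is a spelling-based simplification.

```python
import re

# Hypothetical error pattern library: (error type, regular expression).
ERROR_PATTERNS = [
    # "a" before a word spelled with an initial vowel (ignores pronunciation).
    ("article_a_before_vowel", re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE)),
    # The same word written twice in a row.
    ("repeated_word", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)),
    # A modal verb followed by "to".
    ("modal_plus_to", re.compile(r"\b(?:can|could|must|should)\s+to\b", re.IGNORECASE)),
]


def find_errors(sentence):
    """Return (error type, matched text) for every pattern that matches."""
    errors = []
    for name, pattern in ERROR_PATTERNS:
        for match in pattern.finditer(sentence):
            errors.append((name, match.group(0)))
    return errors


print(find_errors("I can to eat a apple."))
print(find_errors("She is is happy."))
```

Rule-based patterns like these are cheap to evaluate and complement the model-based recognition for error types that have a fixed surface form.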
In this embodiment, after the error information in each sentence is obtained, either the error information in the sentence or the sentence containing the error information may be input into the preset correction suggestion model to obtain the correction suggestion corresponding to the sentence.
The correction suggestion model may be trained as follows: obtain correction training samples, which include a large number of sentences with error information and their corresponding correction suggestions; then train an initial correction suggestion model on the correction training samples to obtain the preset correction suggestion model. The correction suggestion model may be a recurrent neural network.
In this embodiment, the generation unit 355 may specifically be configured to, for each sentence, when the sentence has multiple corresponding correction suggestions, input the multiple correction suggestions into a preset correction selection model to obtain a single correction suggestion for the sentence; and generate the correction result corresponding to the handwritten composition image according to the single correction suggestion of each sentence.
In this embodiment, the correction selection model may score the correction suggestions corresponding to a sentence and take the highest-scoring correction suggestion as the most probable correction suggestion for the sentence.
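Selecting the highest-scoring suggestion can be sketched as follows. The `score` function is a hypothetical stand-in for the trained correction selection model: it simply counts how many adjacent word pairs of a suggestion appear in a small reference set `KNOWN_PAIRS`, which is also an assumption of the example.

```python
# Hypothetical reference set of acceptable adjacent word pairs.
KNOWN_PAIRS = {("am", "interested"), ("interested", "in"), ("in", "something")}


def score(suggestion):
    """Toy score: number of adjacent word pairs found in KNOWN_PAIRS."""
    words = suggestion.lower().split()
    return sum((a, b) in KNOWN_PAIRS for a, b in zip(words, words[1:]))


def select_suggestion(suggestions):
    """Return the single highest-scoring suggestion (the first one wins ties)."""
    return max(suggestions, key=score)


suggestions = [
    "I am interested on something",
    "I am interested in something",
    "I am interested at something",
]
print(select_suggestion(suggestions))
```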
Further, on the basis of the above embodiments, the handwritten composition image may also be scored. Therefore, the device may also include a scoring module, configured to: obtain feature information from the text content, where the feature information includes any one or more of lexical information, syntactic information, sentence information, and correction information; when the feature information is lexical information, syntactic information, or sentence information, input the feature information into the corresponding scoring model to obtain the score corresponding to the feature information; and when the feature information includes lexical information, syntactic information, sentence information, and correction information, input the feature information into a preset comprehensive scoring model to obtain the comprehensive score corresponding to the handwritten composition image.
For example, when the feature information includes lexical information, the lexical information is input into the corresponding vocabulary scoring model to obtain a vocabulary score; when the feature information includes syntactic information, the syntactic information is input into the corresponding grammar scoring model to obtain a grammar score; and when the feature information includes sentence information, such as the structure and length of sentences, the sentence information is input into the corresponding sentence scoring model to obtain a sentence score.
In this embodiment, each word in the text content may be represented by a corresponding unique vector, for example a one-hot vector. The dimensionality of the one-hot vector is the total number of words in the vocabulary; the one-hot vector of each word has the value 1 only in the dimension corresponding to that word and the value 0 in all other dimensions. For example, if the vocabulary contains 5000 words and the first word in the text content is the 1000th word of the vocabulary, the one-hot vector of the first word has 5000 dimensions, its 1000th dimension has the value 1, and all other dimensions have the value 0. In this embodiment, the lexical information in the text content may specifically be represented by the unique vectors of the words in the text content; that is, the input of the vocabulary scoring model may be the vector set corresponding to the text content. The vector set corresponding to the text content is the set of vectors obtained by replacing each word in the text content with its corresponding unique vector.
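The one-hot representation described above can be sketched as follows. The function names and the three-word vocabulary are assumptions made for the example; positions are 1-based to match the wording of the text, while Python list indices are 0-based.

```python
def one_hot(position, vocab_size):
    """One-hot vector for the word at 1-based `position` in the vocabulary."""
    vector = [0] * vocab_size
    vector[position - 1] = 1  # the 1000th dimension is list index 999
    return vector


def text_to_vectors(words, vocabulary):
    """Replace each word with its one-hot vector, giving the vector set of a text."""
    positions = {word: i + 1 for i, word in enumerate(vocabulary)}
    return [one_hot(positions[word], len(vocabulary)) for word in words]


vector = one_hot(1000, 5000)
print(len(vector), vector[999], sum(vector))

vocabulary = ["book", "good", "is"]  # hypothetical three-word vocabulary
print(text_to_vectors(["book", "is", "good"], vocabulary))
```

One-hot vectors grow with the vocabulary, so a practical system might store only the index of the nonzero dimension and expand it on demand.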
Correspondingly, the vocabulary scoring model may be trained as follows: obtain the vector set corresponding to each composition sample and the vocabulary score corresponding to that sample; then input the vector set and the vocabulary score of each composition sample into the vocabulary scoring model to train it.
In this embodiment, the syntactic information and the sentence information in the text content may also be represented by the vector set corresponding to the text content. Correspondingly, the grammar scoring model may be trained as follows: obtain the vector set corresponding to each composition sample and the grammar score corresponding to that sample; then input the vector set and the grammar score of each composition sample into the grammar scoring model to train it. The sentence scoring model may be trained as follows: obtain the vector set corresponding to each composition sample and the sentence score corresponding to that sample; then input the vector set and the sentence score of each composition sample into the sentence scoring model to train it.
As another example, when the feature information includes lexical information, syntactic information, sentence information, and correction information, the following operations may be performed on the feature information in sequence: vectorize the feature information; input the resulting vectors into a convolutional neural network (CNN); input the output of the CNN into a long short-term memory (LSTM) recurrent neural network; and input the output of the LSTM into an attention mechanism. The output of the attention mechanism is the comprehensive score corresponding to the handwritten composition image.
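The final attention step of this pipeline can be sketched in plain Python as follows. Only the attention pooling is illustrated: the CNN and LSTM stages are omitted, `hidden_states` is a hypothetical stand-in for the LSTM outputs, and the `query`, `out_weights`, and `bias` parameters (which would be learned in training) are assumptions, since the source does not specify the form of attention used.

```python
import math


def attention_pool(hidden_states, query):
    """Dot-product attention: weight each time step by softmax(h . query)."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled


def comprehensive_score(hidden_states, query, out_weights, bias=0.0):
    """Map the attention-pooled vector to a single score with a linear layer."""
    _, pooled = attention_pool(hidden_states, query)
    return sum(w * p for w, p in zip(out_weights, pooled)) + bias


# Hypothetical LSTM outputs for a two-step sequence with two hidden units.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = attention_pool(hidden_states, query=[0.0, 0.0])
print(weights, pooled)
print(comprehensive_score(hidden_states, [0.0, 0.0], out_weights=[2.0, 4.0]))
```

With a zero query every time step receives the same weight, so the pooled vector is the plain average; a trained query would shift weight toward the time steps most relevant to the score.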
In this embodiment, the vocabulary scoring model, the grammar scoring model, the sentence scoring model, the convolutional neural network (CNN), the LSTM recurrent neural network, and the attention mechanism may each be trained on corresponding training samples, which is not described in detail here.
In this embodiment, the correction information refers to the error types in the text content and the number of errors of each error type. The lexical information, syntactic information, and sentence information may be represented by the vector set corresponding to the text content. Correspondingly, the comprehensive scoring model may be trained as follows: obtain the vector set corresponding to each composition sample, the error types in the sample, the number of errors of each error type, and the comprehensive score corresponding to the sample; then input these into the comprehensive scoring model to train it.
The automatic correction device for handwritten compositions according to the embodiment of the present invention obtains a handwritten composition image to be corrected; processes the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, obtaining a processed handwritten composition image; performs line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks; recognizes the word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image; performs sentence segmentation on the text content; performs part-of-speech analysis on each sentence to obtain the part of speech of each word, and shallow syntactic analysis to obtain the phrases in the sentence and their types; then selects a rule strategy or a deep model for a particular error type to detect specific grammatical errors; for each detected grammatical error, combines the recognition result of the error recognition model with the correction suggestion generated by the correction suggestion model, thereby obtaining the correction result for the whole composition; and, based on the composition text content and the correction result, builds a convolutional neural network to score the composition text intelligently.
Fig. 5 is a schematic structural diagram of another automatic correction device for handwritten compositions provided by an embodiment of the present invention. The automatic correction device for handwritten compositions includes:
A memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002.
When executing the program, the processor 1002 implements the automatic correction method for handwritten compositions provided in the above embodiments.
Further, the automatic correction device for handwritten compositions also includes:
A communication interface 1003, used for communication between the memory 1001 and the processor 1002.
The memory 1001 is used to store the computer program executable on the processor 1002.
The memory 1001 may include high-speed RAM, and may also include non-volatile memory, for example at least one magnetic disk memory.
The processor 1002 is used to implement the automatic correction method for handwritten compositions described in the above embodiments when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, they may be connected to one another by a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one thick line in Fig. 5, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on a single chip, they may communicate with one another through internal interfaces.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the automatic correction method for handwritten compositions described above is implemented.
The present invention also provides a computer program product; when the instructions in the computer program product are executed by a processor, an automatic correction method for handwritten compositions is performed, the method including:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the handwritten composition image, obtaining a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
Recognizing the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logic functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, or each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (21)
1. An automatic correction method for handwritten compositions, characterized by comprising:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the handwritten composition image, obtaining a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
Recognizing the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
2. The method according to claim 1, characterized in that, before performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks, the method further comprises:
Performing word splitting on the processed handwritten composition image to disconnect the adhered parts between adjacent lines in the handwritten composition image.
3. The method according to claim 1, characterized in that, before recognizing the multiple word image blocks using the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further comprises:
Normalizing the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
4. The method according to claim 1 or 3, characterized in that, before recognizing the multiple word image blocks using the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further comprises:
Obtaining word recognition samples; the word recognition samples include handwritten word image samples and the corresponding text content;
Normalizing the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples;
Training the word recognition model on the processed word recognition samples to obtain the preset word recognition model.
5. The method according to claim 4, characterized in that training the word recognition model on the processed word recognition samples to obtain the preset word recognition model comprises:
Dividing the word recognition samples according to word length to obtain short-word recognition samples, medium-word recognition samples, and long-word recognition samples;
Training the word recognition model with the short-word recognition samples, the medium-word recognition samples, and the long-word recognition samples in that order to obtain the preset word recognition model.
6. The method according to claim 1, characterized in that, before performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image, the method further comprises:
Inputting the text content into a preset error-correction model to obtain the text content after error correction; the error-correction model is composed of multiple language models; each language model is an N-gram language model, where N is a positive integer.
7. The method according to claim 6, characterized in that, before inputting the text content into the preset error-correction model to obtain the text content after error correction, the method further comprises:
Obtaining error-correction training samples; the error-correction training samples include text content samples and the corresponding samples after error correction;
Normalizing the error-correction training samples according to the preset format corresponding to the error-correction model to obtain processed error-correction training samples;
Training the error-correction model on the processed error-correction training samples to obtain the preset error-correction model.
8. The method according to claim 1, characterized in that performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image comprises:
Performing sentence segmentation on the text content to obtain multiple sentences in the text content;
Performing syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence; the analysis result includes: the words and phrases contained in the sentence, the part of speech of each word, and the type of each phrase;
For each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence and using it to examine the sentence, obtaining the error information in the sentence;
Inputting the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
Generating the correction result corresponding to the handwritten composition image according to the correction suggestions corresponding to the sentences.
9. The method according to claim 8, characterized in that, for each sentence, selecting the corresponding error identification model according to the analysis result of the sentence and identifying errors in the sentence to obtain the error information in the sentence comprises:
For each sentence, obtaining the context information corresponding to each word in the sentence;
Determining the matrix vector corresponding to each word according to the context information corresponding to that word;
Selecting, according to the part of speech of each word, the error identification model corresponding to that part of speech;
Inputting the matrix vector corresponding to each word in the sentence into the error identification model corresponding to its part of speech to obtain the error information in the sentence.
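The per-part-of-speech dispatch described above can be sketched as follows. This is only an illustrative stand-in: the context record replaces a learned matrix vector, the two toy rule functions replace trained error identification models, and the tag names, function names and rules are all assumptions made for this example.

```python
def context_vector(tokens, i, window=1):
    """Describe word i by itself and its neighbours (toy stand-in for a learned vector)."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return {"word": tokens[i], "left": left, "right": right}

def noun_model(vec):
    # Toy rule: singular determiner followed by a plural-looking noun.
    if vec["left"] and vec["left"][-1] in {"a", "an", "this"} and vec["word"].endswith("s"):
        return "number disagreement with determiner"
    return None

def verb_model(vec):
    # Toy rule: third-person singular pronoun followed by a bare verb form.
    if vec["left"] and vec["left"][-1] in {"he", "she", "it"} and not vec["word"].endswith("s"):
        return "subject-verb disagreement"
    return None

# Hypothetical registry: one error identification model per part of speech.
MODELS_BY_POS = {"NOUN": noun_model, "VERB": verb_model}

def identify_errors(tagged):
    """tagged is a list of (word, part_of_speech) pairs from syntactic analysis."""
    tokens = [word.lower() for word, _ in tagged]
    errors = []
    for i, (word, pos) in enumerate(tagged):
        model = MODELS_BY_POS.get(pos)
        if model is None:
            continue  # no model registered for this part of speech
        message = model(context_vector(tokens, i))
        if message:
            errors.append((i, word, message))
    return errors

tagged = [("She", "PRON"), ("like", "VERB"), ("a", "DET"), ("books", "NOUN")]
result = identify_errors(tagged)
```

Keeping one model per part of speech lets each model specialise in the error types that part of speech typically exhibits, which appears to be the motivation for the selection step.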
10. The method according to claim 9, characterized in that, before inputting the matrix vector corresponding to each word in the sentence into the error identification model corresponding to its part of speech to obtain the error information in the sentence, the method further comprises:
Obtaining error identification training samples corresponding to each part of speech; the error identification training samples include: the matrix vectors corresponding to words having that part of speech and the error information corresponding to those words;
For the error identification model corresponding to each part of speech, training that error identification model with the error identification training samples corresponding to the part of speech.
11. The method according to claim 8, characterized in that performing sentence segmentation on the text content to obtain multiple sentences in the text content comprises:
Obtaining the type corresponding to the text content; the type indicates how accurately the text content is divided into sentences;
Obtaining the features to be extracted that correspond to the type;
Performing feature extraction on the text content according to the features to be extracted, obtaining the segmentation feature information in the text content;
Performing sentence segmentation on the text content according to the segmentation feature information to obtain multiple sentences in the text content.
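A minimal sketch of type-dependent sentence segmentation is given below. It assumes two hypothetical text types, distinguished by a crude punctuation-density heuristic, and uses regular-expression boundary features; the type names, the threshold and the features are assumptions for illustration and are not taken from the claim.

```python
import re

# Hypothetical mapping from text type to the boundary features to extract.
FEATURES_BY_TYPE = {
    # Reliable punctuation: split only after sentence-final marks.
    "well_punctuated": [r"(?<=[.!?])\s+"],
    # Unreliable punctuation: also treat line breaks as boundaries.
    "poorly_punctuated": [r"(?<=[.!?])\s+", r"\n+"],
}

def classify(text):
    """Guess the text type from the ratio of sentence-final marks to words."""
    words = len(text.split())
    marks = len(re.findall(r"[.!?]", text))
    if words and marks / words >= 0.05:
        return "well_punctuated"
    return "poorly_punctuated"

def split_sentences(text):
    """Select the features for the detected type and cut the text at them."""
    features = FEATURES_BY_TYPE[classify(text)]
    boundary = re.compile("|".join(features))
    return [piece.strip() for piece in boundary.split(text) if piece.strip()]

punctuated = split_sentences("I like tea. Do you like it? Yes!")
unpunctuated = split_sentences("my dog is big\nhe runs fast\nwe play every day")
```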
12. The method according to claim 8, characterized in that, before selecting, for each sentence, the corresponding error identification model according to the analysis result of the sentence and identifying errors in the sentence to obtain the error information in the sentence, the method further comprises:
For each sentence, comparing the sentence with a preset error pattern library to obtain error information in the sentence; the error pattern library includes regular expressions corresponding to multiple error patterns.
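The error pattern library can be pictured as a list of named regular expressions that every sentence is matched against. The three patterns below are hypothetical examples chosen for illustration (the second one is deliberately crude and would also flag correct phrases such as "a university"); the patent does not list its actual patterns.

```python
import re

# Hypothetical error pattern library: (error type, compiled regular expression).
ERROR_PATTERNS = [
    ("repeated word", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)),
    ("a before vowel", re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE)),
    ("space before punctuation", re.compile(r"\s+[,.;!?]")),
]

def find_errors(sentence):
    """Return (error type, matched text, start offset) for every pattern hit."""
    hits = []
    for name, pattern in ERROR_PATTERNS:
        for match in pattern.finditer(sentence):
            hits.append((name, match.group(0), match.start()))
    return hits

errors = find_errors("She has a apple and and a pear .")
```

Pattern matching of this kind is cheap, so running it before the model-based identification step catches frequent, well-defined mistakes without invoking a trained model.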
13. The method according to claim 8 or 12, characterized in that generating the correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence comprises:
For each sentence, when the sentence has multiple corresponding correction suggestions, inputting those correction suggestions into a preset correction selection model to obtain a single correction suggestion corresponding to the sentence;
Generating the correction result corresponding to the handwritten composition image according to the single correction suggestion corresponding to each sentence.
14. The method according to claim 8, characterized in that, before inputting the error information in the sentence into the preset correction suggestion model to obtain the correction suggestion corresponding to the sentence, the method further comprises:
Obtaining correction training samples; the correction training samples include: sentence samples, the error information in the sentence samples, and the correction suggestions corresponding to the sentence samples;
Training a correction suggestion model on the correction training samples to obtain the preset correction suggestion model.
15. The method according to claim 1, characterized in that, after performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image, the method further comprises:
Obtaining the feature information in the text content; the feature information includes any one or more of the following: lexical information, syntactic information, sentence information, correction information;
When the feature information is lexical information, syntactic information or sentence information, inputting the feature information into the corresponding scoring model to obtain the score corresponding to the feature information;
When the feature information includes lexical information, syntactic information, sentence information and correction information, inputting the feature information into a preset comprehensive scoring model to obtain the comprehensive score corresponding to the handwritten composition image.
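The claim does not state what form the comprehensive scoring model takes. Purely as an illustration, the sketch below assumes the simplest possible form, a weighted average of per-dimension scores; the dimension names, the example scores and the weights are all hypothetical.

```python
def comprehensive_score(scores, weights):
    """Weighted average of per-dimension scores (assumed model form)."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight

# Hypothetical per-dimension scores on a 0-100 scale and integer weights.
scores = {"lexical": 80, "syntactic": 70, "sentence": 90, "correction": 60}
weights = {"lexical": 3, "syntactic": 3, "sentence": 2, "correction": 2}
overall = comprehensive_score(scores, weights)
```

A trained regression model over the same four kinds of feature information would be a natural alternative to fixed weights.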
16. An automatic correction device for handwritten compositions, characterized by comprising:
An acquisition module, configured to obtain a handwritten composition image to be corrected;
A processing module, configured to process the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm, removing interference and background lines from the handwritten composition image to obtain a processed handwritten composition image;
A segmentation module, configured to perform line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
A recognition module, configured to recognize the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
A correction module, configured to perform sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
17. The device according to claim 16, characterized in that the correction module comprises:
A segmentation unit, configured to perform sentence segmentation on the text content to obtain multiple sentences in the text content;
An analysis unit, configured to perform syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence; the analysis result includes: the words and phrases that the sentence contains, the part of speech of each word, and the type of each phrase;
An identification unit, configured to select, for each sentence, the corresponding error identification model according to the analysis result of the sentence and identify errors in the sentence, obtaining the error information in the sentence;
An input unit, configured to input the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
A generation unit, configured to generate the correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence.
18. The device according to claim 17, characterized in that the identification unit is specifically configured to:
For each sentence, obtain the context information corresponding to each word in the sentence;
Determine the matrix vector corresponding to each word according to the context information corresponding to that word;
Select, according to the part of speech of each word, the error identification model corresponding to that part of speech;
Input the matrix vector corresponding to each word in the sentence into the error identification model corresponding to its part of speech to obtain the error information in the sentence.
19. An automatic correction device for handwritten compositions, characterized by comprising:
A memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the automatic correction method for handwritten compositions according to any one of claims 1-15.
20. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the automatic correction method for handwritten compositions according to any one of claims 1-15.
21. A computer program product, wherein, when the instructions in the computer program product are executed by a processor, an automatic correction method for handwritten compositions is performed, the method comprising:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm, removing interference and background lines from the handwritten composition image to obtain a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
Recognizing the multiple word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
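The claims do not say how line segmentation and word segmentation are carried out. One common technique, shown here only as an assumed illustration, is projection-profile cutting on a binarized image: rows without ink separate text lines, and sufficiently wide runs of empty columns separate words. The function names, the `min_gap` parameter and the tiny test image are hypothetical.

```python
def cut_lines(image):
    """Split a binary image (rows of 0/1, 1 = ink) into (top, bottom) row ranges."""
    lines, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(image)))
    return lines

def cut_words(image, top, bottom, min_gap=2):
    """Within one text line, split at runs of at least min_gap empty columns."""
    width = len(image[0])
    ink_cols = [any(image[y][x] for y in range(top, bottom)) for x in range(width)]
    words, start, gap = [], None, 0
    for x, ink in enumerate(ink_cols):
        if ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, x - gap + 1))  # end is exclusive
                start, gap = None, 0
    if start is not None:
        words.append((start, width - gap))
    return words

# Tiny hypothetical image: two text lines, the first containing two words.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
]
lines = cut_lines(image)
word_boxes = [cut_words(image, top, bottom) for top, bottom in lines]
```

Each (line range, column range) pair identifies one word image block that could then be passed to a word recognition model.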
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810223663.8A CN108595410B (en) | 2018-03-19 | 2018-03-19 | Automatic correction method and device for handwritten composition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810223663.8A CN108595410B (en) | 2018-03-19 | 2018-03-19 | Automatic correction method and device for handwritten composition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108595410A (en) | 2018-09-28 |
CN108595410B (en) | 2023-03-24 |
Family
ID=63626800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810223663.8A (Active, CN108595410B (en)) | Automatic correction method and device for handwritten composition | 2018-03-19 | 2018-03-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108595410B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670040A (en) * | 2018-11-27 | 2019-04-23 | 平安科技(深圳)有限公司 | Write householder method, device and storage medium, computer equipment |
CN110188274A (en) * | 2019-05-30 | 2019-08-30 | 口口相传(北京)网络技术有限公司 | Search for error correction method and device |
CN110489747A (en) * | 2019-07-31 | 2019-11-22 | 北京大米科技有限公司 | A kind of image processing method, device, storage medium and electronic equipment |
CN110765996A (en) * | 2019-10-21 | 2020-02-07 | 北京百度网讯科技有限公司 | Text information processing method and device |
CN110851599A (en) * | 2019-11-01 | 2020-02-28 | 中山大学 | Automatic scoring method and teaching and assisting system for Chinese composition |
CN111079500A (en) * | 2019-07-11 | 2020-04-28 | 广东小天才科技有限公司 | Method and system for correcting dictation content |
CN111199801A (en) * | 2018-11-19 | 2020-05-26 | 零氪医疗智能科技(广州)有限公司 | Construction method and application of model for identifying disease types of medical records |
CN111737968A (en) * | 2019-03-20 | 2020-10-02 | 小船出海教育科技(北京)有限公司 | Method and terminal for automatically correcting and scoring composition |
CN111950240A (en) * | 2020-08-26 | 2020-11-17 | 北京高途云集教育科技有限公司 | Data correction method, device and system |
CN112149680A (en) * | 2020-09-28 | 2020-12-29 | 武汉悦学帮网络技术有限公司 | Wrong word detection and identification method and device, electronic equipment and storage medium |
CN112528651A (en) * | 2021-02-08 | 2021-03-19 | 深圳市阿卡索资讯股份有限公司 | Intelligent correction method, system, electronic equipment and storage medium |
CN112597754A (en) * | 2020-12-23 | 2021-04-02 | 北京百度网讯科技有限公司 | Text error correction method and device, electronic equipment and readable storage medium |
CN112634689A (en) * | 2020-12-24 | 2021-04-09 | 广州奇大教育科技有限公司 | Application method of regular expression in automatic subjective question changing in computer teaching |
CN113361511A (en) * | 2020-03-05 | 2021-09-07 | 顺丰科技有限公司 | Method, device and equipment for establishing correction model and computer readable storage medium |
CN113536743A (en) * | 2020-11-06 | 2021-10-22 | 腾讯科技(深圳)有限公司 | Text processing method and related device |
CN113836894A (en) * | 2021-09-26 | 2021-12-24 | 武汉天喻信息产业股份有限公司 | Multidimensional English composition scoring method and device and readable storage medium |
CN114489439A (en) * | 2022-01-20 | 2022-05-13 | 安徽淘云科技股份有限公司 | Article correcting method and related equipment thereof |
Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW315432B (en) * | 1995-08-31 | 1997-09-11 | Nat Univ Tsing Hua | The auto debugging and correcting device and method for computer document |
JPH09305714A (en) * | 1996-05-17 | 1997-11-28 | N T T Data Tsushin Kk | System and method for recognizing character |
US6085206A (en) * | 1996-06-20 | 2000-07-04 | Microsoft Corporation | Method and system for verifying accuracy of spelling and grammatical composition of a document |
US6154579A (en) * | 1997-08-11 | 2000-11-28 | At&T Corp. | Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
US20040006466A1 (en) * | 2002-06-28 | 2004-01-08 | Ming Zhou | System and method for automatic detection of collocation mistakes in documents |
US20050074169A1 (en) * | 2001-02-16 | 2005-04-07 | Parascript Llc | Holistic-analytical recognition of handwritten text |
WO2005045786A1 (en) * | 2003-10-27 | 2005-05-19 | Educational Testing Service | Automatic essay scoring system |
US20080077859A1 (en) * | 1998-05-26 | 2008-03-27 | Global Information Research And Technologies Llc | Spelling and grammar checking system |
WO2011044658A1 (en) * | 2009-10-15 | 2011-04-21 | 2167959 Ontario Inc. | System and method for text cleaning |
WO2012039686A1 (en) * | 2010-09-24 | 2012-03-29 | National University Of Singapore | Methods and systems for automated text correction |
CN103294660A (en) * | 2012-02-29 | 2013-09-11 | 张跃 | Automatic English composition scoring method and system |
CN103365838A (en) * | 2013-07-24 | 2013-10-23 | 桂林电子科技大学 | Method for automatically correcting syntax errors in English composition based on multivariate features |
US20140214401A1 (en) * | 2013-01-29 | 2014-07-31 | Tencent Technology (Shenzhen) Company Limited | Method and device for error correction model training and text error correction |
CN105045779A (en) * | 2015-07-13 | 2015-11-11 | 北京大学 | Deep neural network and multi-tag classification based wrong sentence detection method |
CN105183713A (en) * | 2015-08-27 | 2015-12-23 | 北京时代焦点国际教育咨询有限责任公司 | English composition automatic correcting method and system |
CN105279149A (en) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Chinese text automatic correction method |
WO2016147330A1 (en) * | 2015-03-18 | 2016-09-22 | 株式会社日立製作所 | Text processing method and text processing system |
WO2017043130A1 (en) * | 2015-09-07 | 2017-03-16 | 信也 赤木 | Text evaluation device, text evaluation method, and program |
CN106610930A (en) * | 2015-10-22 | 2017-05-03 | 科大讯飞股份有限公司 | Foreign language writing automatic error correction method and system |
CN107239449A (en) * | 2017-06-08 | 2017-10-10 | 锦州医科大学 | A kind of English recognition methods and interpretation method |
CN107357775A (en) * | 2017-06-05 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence |
US20170337443A1 (en) * | 2014-11-06 | 2017-11-23 | Achiav KOLTON | Location based optical character recognition (ocr) |
CN107403130A (en) * | 2017-04-19 | 2017-11-28 | 北京粉笔未来科技有限公司 | A kind of character identifying method and character recognition device |
CN107704859A (en) * | 2017-11-01 | 2018-02-16 | 哈尔滨工业大学深圳研究生院 | A kind of character recognition method based on deep learning training framework |
2018-03-19: CN application CN201810223663.8A, patent CN108595410B (en), status Active
Patent Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW315432B (en) * | 1995-08-31 | 1997-09-11 | Nat Univ Tsing Hua | The auto debugging and correcting device and method for computer document |
JPH09305714A (en) * | 1996-05-17 | 1997-11-28 | N T T Data Tsushin Kk | System and method for recognizing character |
US6085206A (en) * | 1996-06-20 | 2000-07-04 | Microsoft Corporation | Method and system for verifying accuracy of spelling and grammatical composition of a document |
US6154579A (en) * | 1997-08-11 | 2000-11-28 | At&T Corp. | Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique |
US20080077859A1 (en) * | 1998-05-26 | 2008-03-27 | Global Information Research And Technologies Llc | Spelling and grammar checking system |
US20050074169A1 (en) * | 2001-02-16 | 2005-04-07 | Parascript Llc | Holistic-analytical recognition of handwritten text |
US20040006466A1 (en) * | 2002-06-28 | 2004-01-08 | Ming Zhou | System and method for automatic detection of collocation mistakes in documents |
WO2005045786A1 (en) * | 2003-10-27 | 2005-05-19 | Educational Testing Service | Automatic essay scoring system |
WO2011044658A1 (en) * | 2009-10-15 | 2011-04-21 | 2167959 Ontario Inc. | System and method for text cleaning |
WO2012039686A1 (en) * | 2010-09-24 | 2012-03-29 | National University Of Singapore | Methods and systems for automated text correction |
CN103294660A (en) * | 2012-02-29 | 2013-09-11 | 张跃 | Automatic English composition scoring method and system |
US20140214401A1 (en) * | 2013-01-29 | 2014-07-31 | Tencent Technology (Shenzhen) Company Limited | Method and device for error correction model training and text error correction |
CN103365838A (en) * | 2013-07-24 | 2013-10-23 | 桂林电子科技大学 | Method for automatically correcting syntax errors in English composition based on multivariate features |
US20170337443A1 (en) * | 2014-11-06 | 2017-11-23 | Achiav KOLTON | Location based optical character recognition (ocr) |
WO2016147330A1 (en) * | 2015-03-18 | 2016-09-22 | 株式会社日立製作所 | Text processing method and text processing system |
CN105045779A (en) * | 2015-07-13 | 2015-11-11 | 北京大学 | Deep neural network and multi-tag classification based wrong sentence detection method |
CN105183713A (en) * | 2015-08-27 | 2015-12-23 | 北京时代焦点国际教育咨询有限责任公司 | English composition automatic correcting method and system |
WO2017043130A1 (en) * | 2015-09-07 | 2017-03-16 | 信也 赤木 | Text evaluation device, text evaluation method, and program |
CN105279149A (en) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Chinese text automatic correction method |
CN106610930A (en) * | 2015-10-22 | 2017-05-03 | 科大讯飞股份有限公司 | Foreign language writing automatic error correction method and system |
CN107403130A (en) * | 2017-04-19 | 2017-11-28 | 北京粉笔未来科技有限公司 | A kind of character identifying method and character recognition device |
CN107357775A (en) * | 2017-06-05 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence |
CN107239449A (en) * | 2017-06-08 | 2017-10-10 | 锦州医科大学 | A kind of English recognition methods and interpretation method |
CN107704859A (en) * | 2017-11-01 | 2018-02-16 | 哈尔滨工业大学深圳研究生院 | A kind of character recognition method based on deep learning training framework |
Non-Patent Citations (2)
Title |
---|
卫晓欣: "基于长短型记忆递归神经网络的英文手写识别", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
陈珊珊: "自动作文评分模型及方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199801A (en) * | 2018-11-19 | 2020-05-26 | 零氪医疗智能科技(广州)有限公司 | Construction method and application of model for identifying disease types of medical records |
CN111199801B (en) * | 2018-11-19 | 2023-08-08 | 零氪医疗智能科技(广州)有限公司 | Construction method and application of model for identifying disease types of medical records |
CN109670040B (en) * | 2018-11-27 | 2024-04-05 | 平安科技(深圳)有限公司 | Writing assistance method and device, storage medium and computer equipment |
CN109670040A (en) * | 2018-11-27 | 2019-04-23 | 平安科技(深圳)有限公司 | Write householder method, device and storage medium, computer equipment |
CN111737968A (en) * | 2019-03-20 | 2020-10-02 | 小船出海教育科技(北京)有限公司 | Method and terminal for automatically correcting and scoring composition |
CN110188274B (en) * | 2019-05-30 | 2021-06-08 | 口口相传(北京)网络技术有限公司 | Search error correction method and device |
CN110188274A (en) * | 2019-05-30 | 2019-08-30 | 口口相传(北京)网络技术有限公司 | Search for error correction method and device |
CN111079500B (en) * | 2019-07-11 | 2023-10-27 | 广东小天才科技有限公司 | Method and system for correcting dictation content |
CN111079500A (en) * | 2019-07-11 | 2020-04-28 | 广东小天才科技有限公司 | Method and system for correcting dictation content |
CN110489747A (en) * | 2019-07-31 | 2019-11-22 | 北京大米科技有限公司 | A kind of image processing method, device, storage medium and electronic equipment |
CN110765996B (en) * | 2019-10-21 | 2022-07-29 | 北京百度网讯科技有限公司 | Text information processing method and device |
CN110765996A (en) * | 2019-10-21 | 2020-02-07 | 北京百度网讯科技有限公司 | Text information processing method and device |
CN110851599B (en) * | 2019-11-01 | 2023-04-28 | 中山大学 | Automatic scoring method for Chinese composition and teaching assistance system |
CN110851599A (en) * | 2019-11-01 | 2020-02-28 | 中山大学 | Automatic scoring method and teaching and assisting system for Chinese composition |
CN113361511A (en) * | 2020-03-05 | 2021-09-07 | 顺丰科技有限公司 | Method, device and equipment for establishing correction model and computer readable storage medium |
CN111950240A (en) * | 2020-08-26 | 2020-11-17 | 北京高途云集教育科技有限公司 | Data correction method, device and system |
CN112149680A (en) * | 2020-09-28 | 2020-12-29 | 武汉悦学帮网络技术有限公司 | Wrong word detection and identification method and device, electronic equipment and storage medium |
CN112149680B (en) * | 2020-09-28 | 2024-01-16 | 武汉悦学帮网络技术有限公司 | Method and device for detecting and identifying wrong words, electronic equipment and storage medium |
CN113536743A (en) * | 2020-11-06 | 2021-10-22 | 腾讯科技(深圳)有限公司 | Text processing method and related device |
CN112597754A (en) * | 2020-12-23 | 2021-04-02 | 北京百度网讯科技有限公司 | Text error correction method and device, electronic equipment and readable storage medium |
CN112597754B (en) * | 2020-12-23 | 2023-11-21 | 北京百度网讯科技有限公司 | Text error correction method, apparatus, electronic device and readable storage medium |
CN112634689A (en) * | 2020-12-24 | 2021-04-09 | 广州奇大教育科技有限公司 | Application method of regular expression in automatic subjective question changing in computer teaching |
CN112528651A (en) * | 2021-02-08 | 2021-03-19 | 深圳市阿卡索资讯股份有限公司 | Intelligent correction method, system, electronic equipment and storage medium |
CN113836894A (en) * | 2021-09-26 | 2021-12-24 | 武汉天喻信息产业股份有限公司 | Multidimensional English composition scoring method and device and readable storage medium |
CN113836894B (en) * | 2021-09-26 | 2023-08-15 | 武汉天喻信息产业股份有限公司 | Multi-dimensional English composition scoring method and device and readable storage medium |
CN114489439A (en) * | 2022-01-20 | 2022-05-13 | 安徽淘云科技股份有限公司 | Article correcting method and related equipment thereof |
Also Published As
Publication number | Publication date |
---|---|
CN108595410B (en) | 2023-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108595410A (en) | The automatic of hand-written composition corrects method and device | |
CN110489760B (en) | Text automatic correction method and device based on deep neural network | |
CN110188202B (en) | Training method and device of semantic relation recognition model and terminal | |
US5784489A (en) | Apparatus and method for syntactic signal analysis | |
KR102199835B1 (en) | System for correcting language and method thereof, and method for learning language correction model | |
CN113657098B (en) | Text error correction method, device, equipment and storage medium | |
KR102143745B1 (en) | Method and system for error correction of korean using vector based on syllable | |
CN114282527A (en) | Multi-language text detection and correction method, system, electronic device and storage medium | |
CN112560506B (en) | Text semantic analysis method, device, terminal equipment and storage medium | |
CN112287100A (en) | Text recognition method, spelling error correction method and voice recognition method | |
CN111177375A (en) | Electronic document classification method and device | |
CN115862045A (en) | Case automatic identification method, system, equipment and storage medium based on image-text identification technology | |
CN115100668A (en) | Method and device for identifying table information in image | |
CN114742039A (en) | Chinese spelling error correction method and system, storage medium and terminal | |
CN112818693A (en) | Automatic extraction method and system for electronic component model words | |
Boukharouba et al. | Recognition of handwritten Arabic literal amounts using a hybrid approach | |
CN116484842A (en) | Statement error correction method and device, electronic equipment and storage medium | |
Nguyen et al. | An in-depth analysis of OCR errors for unconstrained Vietnamese handwriting | |
KR100919497B1 (en) | Method and computer-readable recording medium for separating component parts of hangul in order to recognize the hangul | |
CN113033188B (en) | Tibetan grammar error correction method based on neural network | |
CN109086272B (en) | Sentence pattern recognition method and system | |
Vinitha | Error detection and correction in Indic OCRs | |
CN107977354A (en) | A kind of mixing language material segmenting method based on Bi-LSTM-CNN | |
KR102236639B1 (en) | Method and system for error correction of korean using vector based on syllable | |
CN115204151A (en) | Chinese text error correction method, system and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20230619; Address after: 6001, 6th Floor, No.1 Kaifeng Road, Shangdi Information Industry Base, Haidian District, Beijing, 100085; Patentee after: Beijing Baige Feichi Technology Co.,Ltd.; Address before: 100085 4001, 4th floor, No.1 Kaifa Road, Shangdi Information Industry base, Haidian District, Beijing; Patentee before: XIAOCHUANCHUHAI EDUCATION TECHNOLOGY (BEIJING) CO.,LTD. |