CN108595410A - Method and device for automatically correcting handwritten compositions - Google Patents

Method and device for automatically correcting handwritten compositions

Info

Publication number
CN108595410A
Authority
CN
China
Prior art keywords
sentence
word
hand
text
content
Prior art date
Legal status
Granted
Application number
CN201810223663.8A
Other languages
Chinese (zh)
Other versions
CN108595410B (en)
Inventor
王岩
宋旸
张绍亮
袁景伟
黄宇飞
程童
Current Assignee
Beijing Baige Feichi Technology Co ltd
Original Assignee
Boat Education Technology (beijing) Co Ltd
Priority date
Filing date
Publication date
Application filed by Boat Education Technology (beijing) Co Ltd
Priority to CN201810223663.8A
Publication of CN108595410A
Application granted
Publication of CN108595410B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention proposes a method and device for automatically correcting handwritten compositions. The method includes: obtaining a handwritten composition image to be corrected; removing interference and background lines from the image using a connected-component analysis algorithm and a line detection algorithm; performing line segmentation, word segmentation and recognition on the processed image to obtain the corresponding text content; splitting the text content into sentences, performing part-of-speech analysis on each sentence to obtain the part of speech of each word, and performing shallow syntactic analysis to obtain the phrases in the sentence and their types; then selecting rule-based strategies or deep models for particular error types to detect specific grammatical errors; for each detected grammatical error, generating a correction suggestion by combining the recognition result of the error recognition model with a correction suggestion model, thereby obtaining the correction result for the whole composition; and, based on the composition text and the correction result, building a convolutional neural network to score the composition text automatically.

Description

Method and device for automatically correcting handwritten compositions
Technical field
The present invention relates to the technical field of homework correction, and in particular to a method and device for automatically correcting handwritten compositions.
Background art
Current composition correction methods are mainly designed to correct text content automatically: once the text content is obtained, operations such as sentence segmentation and syntactic analysis are performed on it, a correction suggestion is obtained for each sentence, and the correction result for the text is then assembled. However, most compositions today are handwritten, and handwritten compositions suffer from problems such as touching characters and tilted lines. Applying the above correction methods to handwritten compositions therefore yields low correction accuracy and low correction efficiency.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a method for automatically correcting handwritten compositions, so as to solve the problems of low correction accuracy and low correction efficiency in the prior art.
A second object of the present invention is to propose a device for automatically correcting handwritten compositions.
A third object of the present invention is to propose another device for automatically correcting handwritten compositions.
A fourth object of the present invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the present invention is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a method for automatically correcting handwritten compositions, comprising:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, thereby obtaining a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks;
Recognizing the plurality of word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
Further, before performing line segmentation and word segmentation on the processed handwritten composition image to obtain the plurality of word image blocks, the method further comprises:
Performing character splitting on the processed handwritten composition image to break the touching portions between adjacent lines in the image.
Further, before recognizing the plurality of word image blocks using the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further comprises:
Normalizing the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
Further, before recognizing the plurality of word image blocks using the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further comprises:
Obtaining word recognition samples, each comprising a handwritten word image sample and the corresponding text content;
Normalizing the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples;
Training the word recognition model with the processed word recognition samples to obtain the preset word recognition model.
Further, training the word recognition model with the processed word recognition samples to obtain the preset word recognition model comprises:
Dividing the word recognition samples by word length into short-word, medium-word and long-word recognition samples;
Training the word recognition model successively with the short-word, medium-word and long-word recognition samples to obtain the preset word recognition model.
Further, before performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image, the method further comprises:
Inputting the text content into a preset error-correcting model to obtain error-corrected text content, wherein the error-correcting model is composed of a plurality of language models, each language model is an N-gram language model, and N is a positive integer.
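As an illustration only, N-gram error correction can be sketched in Python with a single bigram model (N = 2) using add-one smoothing, plus a hypothetical confusion set that maps a suspicious word to candidate replacements (for example OCR look-alikes). The patent's model combines several N-gram models and does not specify how candidates are generated, so `BigramLM`, `correct` and the confusion set are assumptions.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram language model (a minimal stand-in for one
    of the N-gram models inside the error-correcting model)."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for words in sentences:
            padded = ["<s>"] + words + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, words):
        """Log-probability of a whole sentence under the smoothed bigram model."""
        padded = ["<s>"] + words + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab_size))
            for a, b in zip(padded, padded[1:])
        )


def correct(words, lm, confusions):
    """Greedily replace each word, left to right, by the candidate (the word
    itself or one of its hypothetical confusions) that maximizes the score
    of the whole sentence."""
    words = list(words)
    for i, w in enumerate(words):
        candidates = [w] + confusions.get(w, [])
        words[i] = max(candidates, key=lambda c: lm.log_prob(words[:i] + [c] + words[i + 1:]))
    return words


lm = BigramLM([["i", "like", "cats"], ["i", "like", "dogs"], ["we", "like", "cats"]])
print(correct(["i", "lke", "cats"], lm, {"lke": ["like", "lake"]}))  # ['i', 'like', 'cats']
```

Combining several N-gram orders, for example by linear interpolation, would be a natural extension toward the multi-model design described above but is not shown.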
Further, before inputting the text content into the preset error-correcting model to obtain the error-corrected text content, the method further comprises:
Obtaining error-correction training samples, each comprising a text content sample and the corresponding corrected sample;
Normalizing the error-correction training samples according to the preset format corresponding to the error-correcting model to obtain processed error-correction training samples;
Training the error-correcting model with the processed error-correction training samples to obtain the preset error-correcting model.
Further, performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image comprises:
Performing sentence segmentation on the text content to obtain a plurality of sentences in the text content;
Performing syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence, the analysis result comprising the words and phrases contained in the sentence, the part of speech of each word, and the type of each phrase;
For each sentence, selecting a corresponding error recognition model according to the analysis result of the sentence and using it to recognize the sentence, so as to obtain the error information in the sentence;
Inputting the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
Generating the correction result corresponding to the handwritten composition image according to the correction suggestion for each sentence.
Further, for each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence and recognizing the sentence to obtain the error information in the sentence comprises:
For each sentence, obtaining the context information corresponding to each word in the sentence;
Determining the vector corresponding to each word according to its context information;
Selecting, according to the part of speech of each word, the error recognition model corresponding to that part of speech;
Inputting the vector corresponding to each word in the sentence into the error recognition model for its part of speech to obtain the error information in the sentence.
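The per-part-of-speech dispatch described above might look like the following sketch. It assumes a context window of one word on each side, a toy embedding table standing in for the word vectors, and hypothetical per-POS error models represented as plain callables; none of these details come from the patent.

```python
def context_vector(words, i, embed, window=1, dim=2):
    """Concatenate the embeddings of words[i - window .. i + window].

    Positions outside the sentence, and words missing from the (assumed)
    `embed` table, contribute a zero vector of length `dim`."""
    vec = []
    for j in range(i - window, i + window + 1):
        if 0 <= j < len(words):
            vec.extend(embed.get(words[j], [0.0] * dim))
        else:
            vec.extend([0.0] * dim)
    return vec


def detect_errors(tagged, embed, models, dim=2):
    """tagged: list of (word, pos) pairs for one sentence.
    models: dict mapping a POS tag to a callable(vector) that returns an
    error label or None (hypothetical per-POS error recognition models).
    Returns (index, word, label) for every flagged word."""
    words = [w for w, _ in tagged]
    errors = []
    for i, (word, pos) in enumerate(tagged):
        model = models.get(pos)
        if model is None:
            continue  # no error model trained for this part of speech
        label = model(context_vector(words, i, embed, dim=dim))
        if label is not None:
            errors.append((i, word, label))
    return errors


embed = {"he": [1.0, 0.0], "go": [0.0, 1.0], "school": [0.5, 0.5]}


def toy_verb_model(v):
    # Flags "go" directly after "he" (a toy agreement rule, not a trained model).
    return "agreement" if v[:2] == [1.0, 0.0] and v[2:4] == [0.0, 1.0] else None


tagged = [("he", "PRON"), ("go", "VERB"), ("school", "NOUN")]
print(detect_errors(tagged, embed, {"VERB": toy_verb_model}))  # [(1, 'go', 'agreement')]
```

Keeping one model per part of speech lets each model specialize in the error types typical of that word class, which is the motivation the claims imply for the selection step.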
Further, before inputting the vector corresponding to each word in the sentence into the error recognition model for its part of speech to obtain the error information in the sentence, the method further comprises:
Obtaining error recognition training samples corresponding to each part of speech, each comprising the vector of a word having that part of speech and the error information corresponding to the word;
For the error recognition model corresponding to each part of speech, training that model with the error recognition training samples for that part of speech.
Further, performing sentence segmentation on the text content to obtain the plurality of sentences in the text content comprises:
Obtaining the type corresponding to the text content, the type indicating how accurately the text content is divided into sentences;
Obtaining the features to be extracted that correspond to the type;
Extracting those features from the text content to obtain segmentation feature information for the text content;
Performing sentence segmentation on the text content according to the segmentation feature information to obtain a plurality of sentences in the text content.
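A minimal sketch of type-dependent sentence cutting, assuming two hypothetical text types: "standard" text, where terminal punctuation is trusted, and "loose" text, where a comma followed by a capital letter is also treated as a boundary (a common comma-splice pattern in student writing). The type names and feature sets are illustrative assumptions, not the patent's.

```python
import re

# Hypothetical feature sets per text type.
FEATURES = {
    "standard": ["terminal_punct"],
    "loose": ["terminal_punct", "comma_before_capital"],
}


def boundary_positions(text, features):
    """Return sorted character offsets just after each detected boundary."""
    cuts = set()
    if "terminal_punct" in features:
        cuts.update(m.end() for m in re.finditer(r"[.!?]+", text))
    if "comma_before_capital" in features:
        # Cut right after a comma that is followed by whitespace and a capital.
        cuts.update(m.start() + 1 for m in re.finditer(r",\s+(?=[A-Z])", text))
    return sorted(cuts)


def split_sentences(text, text_type):
    """Cut `text` at the boundaries implied by the features of its type."""
    sentences, prev = [], 0
    for cut in boundary_positions(text, FEATURES[text_type]) + [len(text)]:
        piece = text[prev:cut].strip()
        if piece:
            sentences.append(piece)
        prev = cut
    return sentences


print(split_sentences("I went home, It was late. We ate.", "loose"))
# ['I went home,', 'It was late.', 'We ate.']
```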
Further, before selecting, for each sentence, the corresponding error recognition model according to the analysis result of the sentence and recognizing the sentence to obtain the error information in the sentence, the method further comprises:
For each sentence, comparing the sentence with a preset error pattern library to obtain the error information in the sentence, the error pattern library comprising regular expressions corresponding to multiple error patterns.
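The error pattern library can be illustrated with Python's `re` module. The three patterns below are hypothetical examples chosen for brevity, not the patent's library; a real library would need far more precise expressions (for instance, "a university" is correct English yet matches the simple vowel rule here).

```python
import re

# A tiny, hypothetical error-pattern library: (error type, compiled regex).
ERROR_PATTERNS = [
    ("repeated word", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)),
    ("a before vowel", re.compile(r"\ba\s+(?=[aeiou])", re.IGNORECASE)),
    ("lowercase i", re.compile(r"\bi\b")),
]


def match_error_patterns(sentence):
    """Return (error_type, matched_text, start_offset) for every pattern hit,
    ordered by position in the sentence."""
    hits = []
    for name, pattern in ERROR_PATTERNS:
        for m in pattern.finditer(sentence):
            hits.append((name, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])


for hit in match_error_patterns("i saw a apple on the the table"):
    print(hit)
```

Regular expressions are cheap and precise for fixed surface patterns, which is presumably why the claims run this check before the model-based recognition step.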
Further, generating the correction result corresponding to the handwritten composition image according to the correction suggestion for each sentence comprises:
For each sentence, when the sentence has multiple correction suggestions, inputting them into a preset correction selection model to obtain a single correction suggestion for the sentence;
Generating the correction result corresponding to the handwritten composition image according to the single correction suggestion for each sentence.
Further, before inputting the error information in the sentence into the preset correction suggestion model to obtain the correction suggestion for the sentence, the method further comprises:
Obtaining correction training samples, each comprising a sentence sample, the error information in the sentence sample, and the correction suggestion for the sentence sample;
Training the correction suggestion model with the correction training samples to obtain the preset correction suggestion model.
Further, after performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image, the method further comprises:
Obtaining feature information from the text content, the feature information comprising any one or more of: lexical information, syntactic information, sentence information, and correction information;
When the feature information is lexical, syntactic or sentence information, inputting it into the corresponding rating model to obtain a score for that feature information;
When the feature information comprises lexical, syntactic, sentence and correction information, inputting it into a preset comprehensive scoring model to obtain an overall score for the handwritten composition image.
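The two scoring paths can be sketched as follows. The patent describes learned rating models (the abstract mentions a convolutional neural network); here a clipped linear function and a weighted average stand in for them, and every weight and feature name is an assumption for illustration.

```python
def dimension_score(features, weights, bias=0.0):
    """Stand-in for a per-dimension rating model (lexical, syntactic or
    sentence): a weighted sum of numeric features, clipped to [0, 100].
    The weights are assumed here, not learned."""
    raw = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    return max(0.0, min(100.0, raw))


def composite_score(dim_scores, dim_weights):
    """Stand-in for the comprehensive scoring model: a weighted average of
    the lexical, syntactic, sentence and correction scores."""
    total = sum(dim_weights.values())
    return sum(dim_weights[d] * dim_scores[d] for d in dim_weights) / total


lexical = dimension_score({"type_token_ratio": 0.5, "errors": 2},
                          {"type_token_ratio": 100, "errors": -5}, bias=10)
print(lexical)  # 50.0
print(composite_score({"lexical": 80, "syntactic": 60, "sentence": 70, "correction": 90},
                      {"lexical": 1, "syntactic": 1, "sentence": 1, "correction": 1}))  # 75.0
```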
In the method for automatically correcting handwritten compositions according to the embodiments of the present invention, a handwritten composition image to be corrected is obtained; interference and background lines are removed from it using a connected-component analysis algorithm and a line detection algorithm; line segmentation, word segmentation and recognition are performed on the processed image to obtain the corresponding text content; the text content is split into sentences, part-of-speech analysis is performed on each sentence to obtain the part of speech of each word, and shallow syntactic analysis yields the phrases in the sentence and their types; rule-based strategies or deep models for particular error types are then selected to detect specific grammatical errors; for each detected grammatical error, a correction suggestion is generated by combining the recognition result of the error recognition model with a correction suggestion model, giving the correction result for the whole composition; and a convolutional neural network built on the composition text and the correction result scores the composition automatically. Correction accuracy and correction efficiency are thereby improved.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a device for automatically correcting handwritten compositions, comprising:
An acquisition module, configured to obtain a handwritten composition image to be corrected;
A processing module, configured to process the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, thereby obtaining a processed handwritten composition image;
A segmentation module, configured to perform line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks;
A recognition module, configured to recognize the plurality of word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
A correction module, configured to perform sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
Further, the correction module comprises:
A segmentation unit, configured to perform sentence segmentation on the text content to obtain a plurality of sentences in the text content;
An analysis unit, configured to perform syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence, the analysis result comprising the words and phrases contained in the sentence, the part of speech of each word, and the type of each phrase;
A recognition unit, configured to select, for each sentence, a corresponding error recognition model according to the analysis result of the sentence and use it to recognize the sentence, so as to obtain the error information in the sentence;
An input unit, configured to input the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
A generation unit, configured to generate the correction result corresponding to the handwritten composition image according to the correction suggestion for each sentence.
Further, the recognition unit is specifically configured to:
For each sentence, obtain the context information corresponding to each word in the sentence;
Determine the vector corresponding to each word according to its context information;
Select, according to the part of speech of each word, the error recognition model corresponding to that part of speech;
Input the vector corresponding to each word in the sentence into the error recognition model for its part of speech to obtain the error information in the sentence.
The device for automatically correcting handwritten compositions according to the embodiments of the present invention obtains a handwritten composition image to be corrected; removes interference and background lines from it using a connected-component analysis algorithm and a line detection algorithm; performs line segmentation, word segmentation and recognition on the processed image to obtain the corresponding text content; splits the text content into sentences, performs part-of-speech analysis on each sentence to obtain the part of speech of each word, and performs shallow syntactic analysis to obtain the phrases in the sentence and their types; then selects rule-based strategies or deep models for particular error types to detect specific grammatical errors; for each detected grammatical error, generates a correction suggestion by combining the recognition result of the error recognition model with a correction suggestion model, giving the correction result for the whole composition; and builds a convolutional neural network on the composition text and the correction result to score the composition automatically. Correction accuracy and correction efficiency are thereby improved.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes another device for automatically correcting handwritten compositions, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for automatically correcting handwritten compositions described above.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for automatically correcting handwritten compositions described above.
To achieve the above objects, an embodiment of the fifth aspect of the present invention proposes a computer program product; when instructions in the computer program product are executed by a processor, a method for automatically correcting handwritten compositions is performed, the method comprising:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, thereby obtaining a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks;
Recognizing the plurality of word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a method for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another method for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another device for automatically correcting handwritten compositions according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of yet another device for automatically correcting handwritten compositions according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and are not to be construed as limiting it.
The method and device for automatically correcting handwritten compositions according to embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for automatically correcting handwritten compositions according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
S101: Obtain a handwritten composition image to be corrected.
The method provided by the present invention is executed by a device for automatically correcting handwritten compositions. The device may be a hardware device, such as a terminal device, a server or a server cluster, or software installed on a hardware device. The handwritten composition image in this embodiment may be an electronic image obtained by photographing or scanning a handwritten composition.
S102: Process the handwritten composition image using a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, obtaining a processed handwritten composition image.
In this embodiment, the device may first binarize the handwritten composition image, and then process it using the connected-component analysis algorithm and the line detection algorithm to remove interference and background lines, obtaining the processed handwritten composition image.
Binarizing the handwritten composition image means determining the gray value of each pixel from its color, comparing each gray value with a threshold gray value, and setting each pixel's gray value to 0 or 255 according to the comparison result, so that the whole image shows a clear black-and-white visual effect.
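Binarization as described above can be sketched in pure Python on images stored as lists of rows. The color-to-gray conversion uses the ITU-R BT.601 luma weights and the threshold defaults to 128; both are assumptions, since the patent does not specify how the gray value or the threshold is chosen (an adaptive method such as Otsu's would also fit).

```python
def rgb_to_gray(rgb):
    """Convert rows of (r, g, b) tuples to integer gray levels (0-255)
    using the ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def binarize(gray, threshold=128):
    """Set each gray level below `threshold` to 0 (ink) and the rest to 255 (paper)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]


photo = [[(250, 248, 240), (30, 30, 35)], [(200, 210, 190), (90, 60, 70)]]
print(binarize(rgb_to_gray(photo)))  # [[255, 0], [255, 0]]
```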
In this embodiment, processing the handwritten composition image with the connected-component analysis algorithm can connect broken strokes within a certain area, for example the area in which a word is located. Connected-component analysis can also remove interference from the image, such as local highlights (glare) produced when the handwritten composition was photographed.
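One common use of connected-component analysis for interference removal is to label 8-connected ink regions and erase those too small to be strokes. The sketch below assumes a binarized image (0 = ink, 255 = background) and a hypothetical `min_size` parameter; reconnecting broken strokes, which the text above also mentions, is not shown.

```python
from collections import deque


def remove_small_components(img, min_size=3):
    """Erase 8-connected ink components (pixels == 0) with fewer than
    `min_size` pixels. Returns a new image; the input is left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == 0:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) < min_size:
                for cy, cx in comp:
                    out[cy][cx] = 255  # too small to be a stroke: treat as noise
    return out


img = [
    [0, 0, 0, 255, 255],
    [255, 255, 255, 255, 255],
    [255, 255, 255, 255, 0],
]
print(remove_small_components(img, min_size=2))  # the isolated pixel at (2, 4) is erased
```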
In this embodiment, processing the handwritten composition image with the line detection algorithm detects and removes background lines, such as notebook ruled lines or the four-line, three-space grid used for English handwriting. In handwritten compositions, word strokes often touch the background lines or words are embedded in them, so the word image blocks obtained after word segmentation contain parts of the background lines, which reduces word recognition accuracy. Detecting and removing the background lines with the line detection algorithm therefore improves word recognition accuracy and, in turn, correction accuracy.
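The patent does not name a specific line detection algorithm. As a simple stand-in for horizontal ruled lines (a Hough transform would be a more general choice), the sketch below flags rows that are mostly ink and erases them, while keeping pixels where a stroke crosses the line. `min_ratio` is an assumed threshold.

```python
def remove_horizontal_rules(img, min_ratio=0.8):
    """Erase long horizontal ruled lines from a binary image (0 = ink).

    A row counts as part of a ruled line if at least `min_ratio` of its
    pixels are ink. Within such a row, an ink pixel is kept only if the
    pixel directly above or below it (outside the line) is also ink, so
    strokes crossing the line are not cut."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    line_rows = {y for y in range(h) if sum(px == 0 for px in img[y]) >= min_ratio * w}
    for y in line_rows:
        for x in range(w):
            above = y > 0 and (y - 1) not in line_rows and img[y - 1][x] == 0
            below = y < h - 1 and (y + 1) not in line_rows and img[y + 1][x] == 0
            if not (above or below):
                out[y][x] = 255
    return out


img = [
    [255, 0, 255, 255],
    [0, 0, 0, 0],  # a ruled line crossed by a vertical stroke in column 1
    [255, 0, 255, 255],
]
print(remove_horizontal_rules(img))  # the stroke survives, the rest of the line is erased
```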
S103: Perform line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks.
Further, because words in adjacent lines of a handwritten composition may touch each other, two touching lines might be treated as a single line during line segmentation. To avoid this, before step S103 the method may further comprise: performing character splitting on the processed handwritten composition image to break the touching portions between adjacent lines. During line segmentation, the two lines that previously touched can then be separated into two lines, which improves line segmentation accuracy and, in turn, word recognition accuracy.
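Projection profiles are one common way to implement line and word segmentation; the patent does not say which method it uses, so the following is an assumption. `split_lines` treats near-empty rows as gaps, and raising `max_gap_ink` lets a thin stroke joining two lines be cut, which loosely mirrors the adhesion-breaking step above. `split_words` separates words by runs of blank columns at least `min_gap` wide. Both parameters are hypothetical.

```python
def split_lines(img, max_gap_ink=0):
    """Return (top, bottom) row ranges (bottom exclusive) of text lines in a
    binary image (0 = ink), using the horizontal projection profile."""
    profile = [sum(px == 0 for px in row) for row in img]
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > max_gap_ink and start is None:
            start = y
        elif count <= max_gap_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(img)))
    return lines


def split_words(line_img, min_gap=2):
    """Return (left, right) column ranges (right exclusive) of words in one
    text-line image; shorter blank runs are treated as gaps between letters."""
    w = len(line_img[0])
    ink = [any(row[x] == 0 for row in line_img) for x in range(w)]
    words, start, blank = [], None, 0
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = x
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                words.append((start, x - blank + 1))
                start, blank = None, 0
    if start is not None:
        words.append((start, w - blank))
    return words


page = [
    [0, 0, 255],
    [0, 0, 0],
    [255, 0, 255],  # one stroke pixel joins the two lines
    [0, 0, 0],
    [255, 255, 255],
]
print(split_lines(page))                 # [(0, 4)]: the lines are merged
print(split_lines(page, max_gap_ink=1))  # [(0, 2), (3, 4)]: the thin joint is cut
```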
S104: Recognize the plurality of word image blocks using a preset word recognition model to obtain the text content corresponding to the handwritten composition image.
In this embodiment, the preset word recognition model may be, for example, a long short-term memory (LSTM) recurrent neural network. Specifically, the device may perform step S104 as follows: recognize the word image blocks with the preset word recognition model to obtain the word corresponding to each block, then assemble these words to obtain the text content corresponding to the handwritten composition image.
In this embodiment, to improve the recognition accuracy of the word recognition model, both the word image blocks and the word recognition samples may be normalized according to the preset format corresponding to the word recognition model. Therefore, on the basis of the above embodiment, before step S104 the method may further comprise: normalizing the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
Correspondingly, before step S104, the training process of the word recognition model may comprise: obtaining word recognition samples, each comprising a handwritten word image sample and the corresponding text content; normalizing the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples; and training the word recognition model with the processed word recognition samples to obtain the preset word recognition model.
Here, normalizing a word image block may mean, for example, adjusting its brightness, color and similar properties to the format required by the word recognition model.
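The "preset format" is not specified in the patent. As an illustration only, the sketch below assumes the model expects a fixed input height (a hypothetical 32 pixels) with the aspect ratio preserved and pixel intensities stretched to the range [0, 1]; resizing uses nearest-neighbour sampling to stay dependency-free.

```python
from typing import List


def normalize_block(gray: List[List[int]], target_h: int = 32) -> List[List[float]]:
    """Resize a grayscale word block to target_h rows and stretch brightness to [0, 1].

    gray: rows of 0..255 intensities (assumed format). The aspect ratio is kept.
    """
    h, w = len(gray), len(gray[0])
    target_w = max(1, round(w * target_h / h))  # keep the aspect ratio
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    span = (hi - lo) or 1  # avoid division by zero on a blank block
    out: List[List[float]] = []
    for y in range(target_h):
        src_y = min(h - 1, y * h // target_h)  # nearest-neighbour source row
        row: List[float] = []
        for x in range(target_w):
            src_x = min(w - 1, x * w // target_w)  # nearest-neighbour source column
            row.append((gray[src_y][src_x] - lo) / span)  # min-max brightness stretch
        out.append(row)
    return out
```

A production pipeline would more likely use an image library with proper interpolation; the point here is only that training samples and inference blocks pass through the same function.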
In this embodiment, to improve the training effect of the word recognition model, the word recognition samples may be divided by word length into short-word, medium-word and long-word recognition samples, and the word recognition model is then trained successively with the short-word, medium-word and long-word samples to obtain the preset word recognition model. The handwritten words in the short-word samples are shorter than those in the medium-word samples, and the handwritten words in the medium-word samples are shorter than those in the long-word samples.
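This short-to-long schedule is a form of curriculum learning. The sketch below assumes samples are `(image_path, transcription)` pairs and measures length by the number of characters in the transcription; the cut-offs (3 and 7 characters) and the `train_stage` callback are hypothetical placeholders for whatever training routine is actually used.

```python
from typing import Callable, List, Tuple

# Assumed sample format: (image_path, transcription).
Sample = Tuple[str, str]


def curriculum_order(samples: List[Sample], short_max: int = 3,
                     medium_max: int = 7) -> List[List[Sample]]:
    """Bucket samples into short, medium and long words by transcription length."""
    short: List[Sample] = []
    medium: List[Sample] = []
    long_: List[Sample] = []
    for sample in samples:
        n = len(sample[1])
        if n <= short_max:
            short.append(sample)
        elif n <= medium_max:
            medium.append(sample)
        else:
            long_.append(sample)
    return [short, medium, long_]


def train_in_stages(model: object, samples: List[Sample],
                    train_stage: Callable[[object, List[Sample]], None]) -> None:
    """Train on short words first, then medium, then long; empty stages are skipped."""
    for stage in curriculum_order(samples):
        if stage:
            train_stage(model, stage)
```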
Further, word segmentation may produce over-segmentation or under-segmentation. Over-segmentation means that one word is split into two or more pieces; under-segmentation means that two or more words are treated as a single word. As a result, a word output by the word recognition model may actually consist of several words, or may be only part of a word. Therefore, to improve recognition accuracy, after the word corresponding to a word image block is recognized in step 104, the word may be compared with the words in a preset dictionary to find the matching dictionary word. When the recognized word is only part of a dictionary word, the second word, which corresponds to the word image block immediately before or after the current block, is obtained, and it is determined whether the combination of the recognized word and the second word is a word in the dictionary. Over-segmented words are then merged according to the result, under-segmented words are split again, and so on.
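A minimal sketch of this dictionary-based repair, assuming the recognizer outputs a list of token strings and the dictionary is a plain set of words. It only looks one token ahead for merging (a fragment that should join the previous token is caught when the previous token is processed) and only tries two-way splits; the patent does not fix these details.

```python
from typing import List, Optional, Set, Tuple


def _split_in_two(token: str, dictionary: Set[str]) -> Optional[Tuple[str, str]]:
    """Return the first split of token into two dictionary words, if any."""
    for k in range(1, len(token)):
        if token[:k] in dictionary and token[k:] in dictionary:
            return token[:k], token[k:]
    return None


def repair_segmentation(tokens: List[str], dictionary: Set[str]) -> List[str]:
    """Merge over-segmented fragments and split under-segmented tokens."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in dictionary:
            out.append(tok)
            i += 1
            continue
        # Over-segmentation: this fragment plus the next one forms a dictionary word.
        if i + 1 < len(tokens) and tok + tokens[i + 1] in dictionary:
            out.append(tok + tokens[i + 1])
            i += 2
            continue
        # Under-segmentation: the token splits into two dictionary words.
        parts = _split_in_two(tok, dictionary)
        if parts:
            out.extend(parts)
        else:
            out.append(tok)  # leave unknown tokens for the later error-correction stage
        i += 1
    return out
```

For example, `["I", "li", "ke", "readingbooks"]` becomes `["I", "like", "reading", "books"]` with a dictionary containing those four words.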
Further, because the recognition accuracy of the word recognition model is not 100%, its results may contain errors. To improve the recognition accuracy for the handwritten composition image, the method may further include, after step 104: inputting the text content into a preset error correction model to obtain corrected text content. The error correction model consists of multiple language models, each of which is an N-gram language model, where N is a positive integer.
Correspondingly, the error correction model may be trained as follows: obtain error correction training samples, each comprising a text content sample and its corrected counterpart; normalize the error correction training samples according to the preset format corresponding to the error correction model to obtain processed training samples; and train the error correction model with the processed training samples to obtain the preset error correction model.
The error correction model may process the text content as follows. Obtain the candidate words in the text content, obtain the visually similar (look-alike) words of each candidate word, and also treat the look-alike words as candidate words. Normalize the candidate words, for example with respect to letter case, full-width and half-width characters, singular and plural forms, digits and punctuation, spaces between candidate words, and punctuation inserted before or after candidate words. Treat each candidate word in the text content together with its look-alike words as one column of candidates, compute the transition probabilities between columns, and keep the candidate words whose transition probability exceeds a first transition probability threshold. Finally, use an optimal path search algorithm to select the most suitable word in each column, and assemble the selected words into the text content.
It should be noted that, when computing the transition probabilities between columns, a bigram language model may first be used to compute the transition probability between two adjacent columns of candidates. When this transition probability is below a second transition probability threshold, the corresponding candidate word is deleted; only when it exceeds the second threshold is a trigram or higher-order language model used to compute the transition probability across three or more consecutive columns. This reduces the amount of transition-probability computation and increases the speed at which the error correction model processes the text content.
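The column-of-candidates search described above can be sketched as a Viterbi search over a bigram model with threshold pruning. Assumptions: the bigram model is a plain dictionary of `(previous, next) -> probability`, unseen pairs get a small `floor` probability, and `prune_below` plays the role of the pruning threshold. The trigram rescoring stage is omitted. If pruning would eliminate a whole column, the sketch retries that column without pruning so a path always exists.

```python
import math
from typing import Dict, List, Tuple

Bigram = Dict[Tuple[str, str], float]


def _step(score: Dict[str, float], col: List[str], bigram: Bigram,
          prune_below: float, floor: float):
    """Extend every surviving path by one column; return new scores and back-pointers."""
    new_score: Dict[str, float] = {}
    pointers: Dict[str, str] = {}
    for w in col:
        best_prev, best = None, -math.inf
        for prev, s in score.items():
            p = bigram.get((prev, w), floor)
            if p < prune_below:
                continue  # prune low-probability transitions
            cand = s + math.log(p)
            if cand > best:
                best_prev, best = prev, cand
        if best_prev is not None:
            new_score[w], pointers[w] = best, best_prev
    return new_score, pointers


def best_path(columns: List[List[str]], bigram: Bigram,
              prune_below: float = 1e-4, floor: float = 1e-6) -> List[str]:
    """Pick one candidate per column maximizing the bigram log-probability."""
    score = {w: 0.0 for w in columns[0]}
    back: List[Dict[str, str]] = []
    for col in columns[1:]:
        new_score, pointers = _step(score, col, bigram, prune_below, floor)
        if not new_score:  # every transition pruned: retry this column unpruned
            new_score, pointers = _step(score, col, bigram, 0.0, floor)
        score = new_score
        back.append(pointers)
    word = max(score, key=score.get)  # best final word
    path = [word]
    for pointers in reversed(back):  # follow back-pointers to the start
        word = pointers[word]
        path.append(word)
    return path[::-1]
```

With the columns `[["I"], ["lave", "love", "live"], ["reading", "heading"]]` and a toy bigram table favouring "I love reading", the look-alike "lave" is pruned and the path `["I", "love", "reading"]` is returned.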
S105: perform sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
In the automatic correction method for handwritten compositions according to the embodiment of the present invention, a handwritten composition image to be corrected is obtained; the image is processed with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines, yielding a processed handwritten composition image; line segmentation and word segmentation are performed on the processed image to obtain a plurality of word image blocks; the word image blocks are recognized with a preset word recognition model to obtain the text content corresponding to the image; and sentence segmentation, syntactic analysis and sentence correction are performed on the text content to obtain the correction result corresponding to the image. Because interference and background lines are removed before the handwritten composition image is recognized, and correction is then performed on the recognized text content, both the accuracy and the efficiency of correction are improved.
Fig. 2 is a schematic flowchart of another automatic correction method for handwritten compositions according to an embodiment of the present invention. As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, step 105 may specifically include the following steps:
S1051: perform sentence segmentation on the text content to obtain a plurality of sentences in the text content.
In this embodiment, the automatic correction device may perform step 1051 as follows: obtain the type of the text content, the type indicating how reliably the sentences in the text content are delimited; obtain the features to be extracted that correspond to this type; extract these features from the text content to obtain segmentation feature information; and segment the text content into sentences according to the segmentation feature information, obtaining a plurality of sentences.
In this embodiment, the automatic correction device may determine the type of the text content by machine learning. The type may be, for example, handwritten composition image or ordinary composition; the handwritten composition image type means that the text content was obtained by recognizing a handwritten composition image.
For example, when the type is handwritten composition image, punctuation marks may have been misused during writing. Therefore, to improve the accuracy of sentence division, the features to be extracted for this type may include capitalized initial letters and the like, and the text content is segmented into sentences according to these features. When the type is ordinary composition, the features to be extracted may be punctuation marks and the like.
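A rule-based sketch of type-dependent sentence splitting, assuming just two hypothetical type labels, `"ordinary"` and `"handwritten"`. For ordinary text it trusts sentence-final punctuation; for handwritten text it additionally breaks before a capitalized word that follows a lower-case word, excluding the pronoun "I". This stands in for the machine-learned feature extraction the patent describes and will wrongly split before proper nouns.

```python
import re
from typing import List

_PUNCT_BREAK = r"(?<=[.!?])\s+"                # whitespace after . ! or ?
_CAPITAL_BREAK = r"(?<=[a-z])\s+(?=(?!I\b)[A-Z])"  # lower-case word, then a capital (not "I")


def split_sentences(text: str, text_type: str) -> List[str]:
    """Split text into sentences using features chosen by the (assumed) text type."""
    if text_type == "ordinary":
        pattern = _PUNCT_BREAK
    elif text_type == "handwritten":
        # Punctuation may be missing or wrong, so capitals also mark boundaries.
        pattern = _PUNCT_BREAK + "|" + _CAPITAL_BREAK
    else:
        raise ValueError(f"unknown text type: {text_type}")
    return [p for p in re.split(pattern, text.strip()) if p]
```

For example, `split_sentences("I like cats they are cute Today I went home", "handwritten")` yields two sentences, breaking before "Today" but not before "I".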
Further, to improve correction accuracy, the method may further include, before step 1051: preprocessing the text content, for example by encoding processing, verification processing, interference filtering and normalization.
S1052: perform syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence. The analysis result includes the words and phrases contained in the sentence, the part of speech of each word, and the type of each phrase.
In this embodiment, the automatic correction device may perform step 1052 as follows: for each sentence in the text content, split the sentence into words; obtain the part of speech of each word; and match the sentence against preset phrase regular expressions to obtain the phrases in the sentence. In addition, the device may adjust the phrase regular expressions according to the matching results.
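One way to match "phrase regular expressions" is to run the regex over the part-of-speech sequence rather than over the raw words. The sketch below assumes tokens already carry Universal-Dependencies-style tags, maps each tag to a hypothetical one-letter code so that match offsets equal token indices, and uses two illustrative patterns (noun phrase and prepositional phrase); the patent's actual pattern set is not disclosed.

```python
import re
from typing import List, Tuple

# Hypothetical one-letter codes for a small tag set; unknown tags become "X".
TAG_CODE = {"DET": "D", "ADJ": "J", "NOUN": "N", "VERB": "V",
            "ADP": "P", "PRON": "R", "AUX": "A"}

# Illustrative phrase patterns over the coded tag string.
PHRASE_PATTERNS = {
    "NP": re.compile(r"D?J*N+"),    # optional determiner, adjectives, nouns
    "PP": re.compile(r"PD?J*N+"),   # preposition followed by a noun phrase
}


def find_phrases(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (phrase_type, phrase_text) pairs found in a POS-tagged sentence."""
    codes = "".join(TAG_CODE.get(tag, "X") for _, tag in tagged)
    words = [w for w, _ in tagged]
    phrases: List[Tuple[str, str]] = []
    for ptype, pattern in PHRASE_PATTERNS.items():
        for m in pattern.finditer(codes):
            # One character per token, so match offsets are token indices.
            phrases.append((ptype, " ".join(words[m.start():m.end()])))
    return phrases
```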
S1053: for each sentence, select the corresponding error recognition model according to the analysis result of the sentence, and use it to examine the sentence, obtaining the error information in the sentence.
In this embodiment, a preset model library stores the error recognition model corresponding to each word, part of speech, phrase or phrase type. For the words, parts of speech, phrases or phrase types in a sentence, the model library is queried to obtain the corresponding error recognition models, and the sentence is input into these models to obtain the error information in the sentence. The error information may be, for example, a word usage error, a phrase usage error or a preposition usage error.
Specifically, the automatic correction device may perform step 1053 as follows: for each sentence, obtain the context information of each word in the sentence; determine the matrix vector of each word from its context information; select, according to the part of speech of each word, the error recognition model corresponding to that part of speech; and input the matrix vector of each word in the sentence into the error recognition model for its part of speech, obtaining the error information in the sentence.
Here, the context information of a word in a sentence refers to information about the other words within a preset distance of that word, together with information about the phrases contained in the sentence. For example, in the sentence "I am interested in something", the other words within the preset distance of the preposition "in" may include "am", "interested" and "something". The matrix vector of each word may be, for example, a vector generated by a Word2Vec model from the word's context information.
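The context window in the example above can be reproduced with a preset distance of 2. The second function shows one hedged way to turn that window into a single vector: averaging pretrained word embeddings (for instance from Word2Vec). The averaging step, the `embeddings` dictionary and the `dim` parameter are assumptions for illustration; the patent only says the vector is generated from the context information.

```python
from typing import Dict, List, Sequence


def context_window(tokens: Sequence[str], index: int, distance: int = 2) -> List[str]:
    """Words within `distance` positions of tokens[index], excluding the word itself."""
    lo = max(0, index - distance)
    hi = min(len(tokens), index + distance + 1)
    return list(tokens[lo:index]) + list(tokens[index + 1:hi])


def context_vector(tokens: Sequence[str], index: int,
                   embeddings: Dict[str, List[float]], dim: int,
                   distance: int = 2) -> List[float]:
    """Average the embeddings of the context words; zeros if none are known."""
    vecs = [embeddings[t] for t in context_window(tokens, index, distance)
            if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
```

For `"I am interested in something".split()` and the index of "in" (3), `context_window` returns `["am", "interested", "something"]`, matching the example in the text.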
Parts of speech include, for example, noun, verb and preposition. In this embodiment, after the matrix vector of each word in the sentence is input into the error recognition model for its part of speech, the model can determine from the vector the probability that the position should hold each candidate word. For the preposition "in" in the example above, these are the probabilities of "on", "at", "before", "after", "since" and so on. If the preposition with the highest probability differs from the preposition "in" in the sentence, that highest-probability preposition is taken as the error information for the sentence; if the highest-probability preposition is "in" itself, the preposition "in" is used correctly and there is no error information.
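The decision rule in this paragraph reduces to an argmax comparison. The sketch assumes the error recognition model has already produced a dictionary of candidate probabilities; the probability values in the usage check are made up for illustration.

```python
from typing import Dict, Optional


def check_preposition(observed: str, probs: Dict[str, float]) -> Optional[str]:
    """Return the suggested preposition, or None if `observed` is the model's top choice.

    probs: candidate word -> probability, as output by the (assumed) error model.
    """
    best = max(probs, key=probs.get)
    return None if best == observed else best
```

For "I am interested in something", a model favouring "in" gives no error; for a sentence such as "I am good in math", a model favouring "at" would return `"at"` as the error information.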
Further, to further improve the accuracy of obtaining error information, step 105 may also include: for each sentence, comparing the sentence with a preset error pattern library to obtain the error information in the sentence. The error pattern library contains the regular expressions corresponding to a variety of error patterns.
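A minimal error pattern library is simply a list of named regular expressions. The three patterns below are hypothetical examples, not the patent's library, and they are deliberately naive (the article rule, for instance, would also flag "a university").

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical error patterns: (error name, compiled regular expression).
ERROR_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("subject-verb agreement", re.compile(r"\b(?:he|she|it) (?:do|have|go)\b", re.I)),
    ("article before vowel", re.compile(r"\ba [aeiou]\w*", re.I)),
    ("double negative", re.compile(r"\b(?:don't|doesn't|didn't) \w+ nothing\b", re.I)),
]


def match_error_patterns(sentence: str) -> List[Tuple[str, str]]:
    """Return (error name, matched text) for every pattern hit in the sentence."""
    return [(name, m.group(0))
            for name, pattern in ERROR_PATTERNS
            for m in pattern.finditer(sentence)]
```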
S1054: input the error information in each sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence.
In this embodiment, after the error information in each sentence is obtained, the error information, or the sentence containing it, may be input into the preset correction suggestion model to obtain the correction suggestion corresponding to the sentence.
The correction suggestion model may be trained as follows: obtain correction training samples, which include a large number of sentences containing errors together with their corresponding correction suggestions; and train an initial correction suggestion model with these samples to obtain the preset correction suggestion model. The correction suggestion model may be an LSTM recurrent neural network.
S1055: generate the correction result corresponding to the handwritten composition image from the correction suggestions corresponding to each sentence.
In this embodiment, for each sentence, the error information identified by the error recognition models may overlap with the error information obtained by comparison with the preset error pattern library, so a sentence may have multiple correction suggestions. The automatic correction device may therefore perform step 1055 as follows: for each sentence that has multiple correction suggestions, input these suggestions into a preset correction selection model to obtain a single correction suggestion for the sentence; then generate the correction result corresponding to the handwritten composition image from the single correction suggestion of each sentence.
In this embodiment, the correction selection model may score each correction suggestion of a sentence and take the highest-scoring suggestion as the most probable correction suggestion for that sentence.
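The selection step can be sketched as de-duplication followed by an argmax over a scoring function. Here `score` is a hypothetical stand-in for the correction selection model; sentences are keyed by an integer id and sentences without suggestions are left out of the result.

```python
from typing import Callable, Dict, List


def select_suggestion(suggestions: List[str], score: Callable[[str], float]) -> str:
    """Drop duplicate suggestions, then return the highest-scoring one."""
    unique = list(dict.fromkeys(suggestions))  # de-duplicate, keep first-seen order
    return max(unique, key=score)


def correction_result(per_sentence: Dict[int, List[str]],
                      score: Callable[[str], float]) -> Dict[int, str]:
    """Map each sentence id with at least one suggestion to its single best suggestion."""
    return {sid: select_suggestion(s, score) for sid, s in per_sentence.items() if s}
```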
Further, on the basis of the above embodiments, the handwritten composition image may also be scored. Accordingly, after step 105, the method may further include:
S106: obtain feature information from the text content. The feature information includes any one or more of the following: lexical information, grammatical information, sentence information and correction information.
S107: when the feature information is lexical information, grammatical information or sentence information, input the feature information into the corresponding scoring model to obtain the score corresponding to the feature information.
S108: when the feature information includes lexical information, grammatical information, sentence information and correction information, input the feature information into a preset comprehensive scoring model to obtain the overall score corresponding to the handwritten composition image.
For example, when the feature information includes lexical information, the lexical information is input into the corresponding vocabulary scoring model to obtain a vocabulary score; when the feature information includes grammatical information, the grammatical information is input into the corresponding grammar scoring model to obtain a grammar score; and when the feature information includes sentence information, such as sentence structure and length, the sentence information is input into the corresponding sentence scoring model to obtain a sentence score.
In this embodiment, each word in the text content may be represented by a corresponding unique vector, for example a one-hot vector. The dimension of a one-hot vector equals the total number of words in the vocabulary; the one-hot vector of each word has the value 1 in the dimension corresponding to that word and 0 in every other dimension. For example, if the vocabulary contains 5000 words and the first word of the text content is the 1000th word of the vocabulary, the one-hot vector of that word has 5000 dimensions, with value 1 in the 1000th dimension and 0 in all others. In this embodiment, the lexical information of the text content may be represented by the unique vectors of the words in the text content; that is, the input of the vocabulary scoring model may be the vector set of the text content, namely the set of vectors obtained by replacing each word in the text content with its unique vector.
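The one-hot example above, written out. Python indices are 0-based, so "the 1000th dimension" is index 999. The `vocab` mapping from word to index is assumed, and words missing from it are simply skipped, which is a choice the patent does not address.

```python
from typing import Dict, List


def one_hot(index: int, vocab_size: int) -> List[int]:
    """Vector of length vocab_size with a single 1 at `index` (0-based)."""
    if not 0 <= index < vocab_size:
        raise IndexError(f"index {index} outside vocabulary of size {vocab_size}")
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


def vector_set(tokens: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Replace each known word in the text with its one-hot vector."""
    size = len(vocab)
    return [one_hot(vocab[t], size) for t in tokens if t in vocab]
```

In practice one-hot vectors of this size are usually stored as indices or fed through an embedding layer rather than materialized as dense lists.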
Correspondingly, the vocabulary scoring model may be trained as follows: obtain the vector set of each composition sample and the vocabulary score of that sample; and input the vector sets and the corresponding vocabulary scores into the vocabulary scoring model to train it.
In this embodiment, the grammatical information and sentence information in the text content may also be represented by the vector set of the text content. Correspondingly, the grammar scoring model may be trained as follows: obtain the vector set of each composition sample and the grammar score of that sample, and input them into the grammar scoring model to train it. The sentence scoring model may be trained as follows: obtain the vector set of each composition sample and the sentence score of that sample, and input them into the sentence scoring model to train it.
As another example, when the feature information includes lexical information, grammatical information, sentence information and correction information, the following operations may be performed on the feature information in sequence: vectorize it; input the resulting vectors into a convolutional neural network (CNN); input the output of the CNN into an LSTM recurrent neural network; and input the output of the LSTM into an attention mechanism. The output of the attention mechanism is the overall score corresponding to the handwritten composition image.
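The CNN and LSTM stages need a deep-learning framework, but the final attention step can be shown on its own. The sketch assumes the LSTM hidden states are already available as a list of vectors, uses dot-product attention against a learned query vector, and maps the pooled context to a scalar score with a linear layer (`w_out`, `b_out`). These shapes and the linear output head are assumptions; the patent does not specify the attention variant.

```python
import math
from typing import List, Tuple


def attention_score(hidden: List[List[float]], query: List[float],
                    w_out: List[float], b_out: float) -> Tuple[float, List[float]]:
    """Dot-product attention pooling over hidden states, then a linear score head."""
    logits = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax attention weights
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    score = sum(c * w for c, w in zip(context, w_out)) + b_out
    return score, weights
```

The returned weights also indicate which positions of the composition contributed most to the score, which is one reason attention is used at this stage.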
In this embodiment, the vocabulary scoring model, the grammar scoring model, the sentence scoring model, the CNN, the LSTM and the attention mechanism may each be trained with corresponding training samples; this is not described in further detail here.
In this embodiment, correction information refers to the error types in the text content and the number of errors of each type. Lexical information, grammatical information and sentence information may be represented by the vector set of the text content. Correspondingly, the comprehensive scoring model may be trained as follows: for each composition sample, obtain its vector set, its error types, the number of errors of each type, and its overall score; and input these into the comprehensive scoring model to train it.
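Since correction information is "error types plus the count of each type", it can be turned into a fixed-length feature vector for the comprehensive scoring model. The list of error types below is hypothetical, and any type outside it is counted in a final "other" slot.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical, fixed order of error types used as feature positions.
ERROR_TYPES = ["spelling", "preposition", "agreement", "article"]


def correction_features(errors: Iterable[str]) -> List[int]:
    """Count errors per known type; the last element counts all other types."""
    counts = Counter(errors)
    known = [counts.get(t, 0) for t in ERROR_TYPES]
    other = sum(n for t, n in counts.items() if t not in ERROR_TYPES)
    return known + [other]
```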
In the automatic correction method for handwritten compositions according to the embodiment of the present invention, a handwritten composition image to be corrected is obtained; the image is processed with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines, yielding a processed handwritten composition image; line segmentation and word segmentation are performed on the processed image to obtain a plurality of word image blocks; and the word image blocks are recognized with a preset word recognition model to obtain the text content corresponding to the image. The text content is then segmented into sentences; part-of-speech analysis of each sentence yields the part of speech of each word, and shallow syntactic analysis yields the phrases in the sentence and their types. Rule-based strategies or deep models for particular error types are then selected to detect specific grammatical errors. For each detected grammatical error, a correction suggestion is generated by combining the recognition result of the error recognition model with the correction suggestion model, yielding the correction result for the whole composition. Finally, based on the composition text and the correction results, a convolutional neural network is built to score the composition text automatically. This improves both the accuracy and the efficiency of correction.
Fig. 3 is a schematic structural diagram of an automatic correction device for handwritten compositions according to an embodiment of the present invention. As shown in Fig. 3, the device includes: an acquisition module 31, a processing module 32, a segmentation module 33, a recognition module 34 and a correction module 35.
The acquisition module 31 is configured to obtain a handwritten composition image to be corrected;
the processing module 32 is configured to process the handwritten composition image with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, obtaining a processed handwritten composition image;
the segmentation module 33 is configured to perform line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks;
the recognition module 34 is configured to recognize the plurality of word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image; and
the correction module 35 is configured to perform sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
The automatic correction device for handwritten compositions provided by the present invention may be a hardware device, such as a terminal device, a server or a server cluster, or software installed on such a hardware device. The handwritten composition image in this embodiment may be an electronic image obtained by photographing or scanning a handwritten composition.
In this embodiment, the automatic correction device may first binarize the handwritten composition image and then process it with the connected-component analysis algorithm and the line detection algorithm to remove interference and background lines, obtaining the processed handwritten composition image.
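The patent does not name a binarization method. Otsu's method, which picks the global threshold maximizing the between-class variance of the grayscale histogram, is a common choice and is sketched below in plain Python under that assumption. Dark pixels (at or below the threshold) are marked as ink (1).

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 0..255 threshold maximizing between-class variance (Otsu)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = 0      # pixel count at or below t (dark class)
    sum_b = 0    # intensity sum of the dark class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break  # every pixel is in the dark class; no further split possible
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var = w_b * w_f * (m_b - m_f) ** 2  # between-class variance (unnormalized)
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray: List[List[int]]) -> List[List[int]]:
    """Mark pixels at or below the Otsu threshold as ink (1), others as background (0)."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[1 if p <= t else 0 for p in row] for row in gray]
```

Photographed pages with uneven lighting often need an adaptive (local) threshold instead; a single global threshold is the simplest starting point.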
In this embodiment, processing the handwritten composition image with the connected-component analysis algorithm can reconnect broken strokes within a certain region, for example the region in which a word is located. Connected-component analysis can also remove interference from the image, such as glare highlights on part of the page when the handwritten composition was photographed.
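The interference-removal half of connected-component analysis can be sketched as labeling 8-connected ink regions with breadth-first search and erasing those smaller than a size threshold. The `min_size` value is a hypothetical parameter; reconnecting broken strokes (typically done with morphological closing) is not covered by this sketch.

```python
from collections import deque
from typing import List


def remove_small_components(binary: List[List[int]], min_size: int = 3) -> List[List[int]]:
    """Erase 8-connected ink components with fewer than min_size pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first search over the component containing (y, x).
            comp = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) < min_size:  # treat small specks as noise
                for cy, cx in comp:
                    out[cy][cx] = 0
    return out
```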
In this embodiment, processing the handwritten composition image with the line detection algorithm can detect and remove background lines, such as ruled notebook lines and the four-line, three-space grids used for English handwriting. In handwritten compositions, word strokes often touch background lines, or words are embedded in them, so the word image blocks obtained by word segmentation would contain parts of background lines and reduce word recognition accuracy. Detecting and removing the background lines with the line detection algorithm therefore improves word recognition accuracy and, in turn, correction accuracy.
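The patent does not specify the line detection algorithm (a Hough transform is a common general-purpose choice). For horizontal ruled lines, a simpler sketch is shown below, assuming a binary image: a row is treated as a background line when its ink fraction reaches a hypothetical `min_fill`, and each pixel on such a row is erased unless a stroke passes through it (ink directly above or below that is not itself on a line row), so letters crossing the line are kept.

```python
from typing import List, Tuple


def remove_horizontal_lines(binary: List[List[int]],
                            min_fill: float = 0.8) -> Tuple[List[List[int]], List[int]]:
    """Erase horizontal ruled lines; return (cleaned image, detected line rows)."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    line_rows = [y for y in range(h) if sum(binary[y]) >= min_fill * w]
    line_set = set(line_rows)
    for y in line_rows:
        for x in range(w):
            # Ink just above or below (outside another line row) means a stroke crosses here.
            above = y > 0 and binary[y - 1][x] and (y - 1) not in line_set
            below = y < h - 1 and binary[y + 1][x] and (y + 1) not in line_set
            if not (above or below):
                out[y][x] = 0  # pure line pixel: erase it
    return out, line_rows
```

Real notebook lines are often slightly slanted or broken, which is why production systems typically deskew first or use a more tolerant detector.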
Further, in a handwritten composition the words of adjacent lines may touch one another. To avoid treating two touching lines as a single line during line segmentation, the device may further include a splitting module configured to perform, before line segmentation of the processed handwritten composition image, word splitting on the processed image to disconnect the touching portions between adjacent lines. In the subsequent line segmentation, two lines that previously touched can then be separated into two lines, which improves the accuracy of line segmentation and, in turn, the word recognition accuracy.
In this embodiment, to improve the recognition accuracy of the word recognition model, both the word image blocks and the word recognition samples may be normalized according to the preset format required by the word recognition model. Accordingly, on the basis of the above embodiment, the device may further include a normalization module configured to normalize the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
Correspondingly, the device may further include a training module configured to: obtain word recognition samples, each comprising a handwritten word image sample and its corresponding text content; normalize the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples; and train the word recognition model with the processed samples to obtain the preset word recognition model.
Here, normalizing a word image block may mean, for example, adjusting its brightness, color and similar properties to the format required by the word recognition model.
In this embodiment, to improve the training effect of the word recognition model, the word recognition samples may be divided by word length into short-word, medium-word and long-word recognition samples, and the word recognition model is then trained successively with the short-word, medium-word and long-word samples to obtain the preset word recognition model. The handwritten words in the short-word samples are shorter than those in the medium-word samples, and the handwritten words in the medium-word samples are shorter than those in the long-word samples.
Further, word segmentation may produce over-segmentation or under-segmentation. Over-segmentation means that one word is split into two or more pieces; under-segmentation means that two or more words are treated as a single word. As a result, a word output by the word recognition model may actually consist of several words, or may be only part of a word. Therefore, to improve recognition accuracy, after recognizing the word corresponding to a word image block, the automatic correction device may compare the word with the words in a preset dictionary to find the matching dictionary word. When the recognized word is only part of a dictionary word, the device obtains the second word, which corresponds to the word image block immediately before or after the current block, and determines whether the combination of the recognized word and the second word is a word in the dictionary. Over-segmented words are then merged according to the result, under-segmented words are split again, and so on.
Further, because the recognition accuracy of the word recognition model is not 100%, its results may contain errors. To improve the recognition accuracy for the handwritten composition image, the device may further include an input module configured to input the text content into a preset error correction model to obtain corrected text content. The error correction model consists of multiple language models, each of which is an N-gram language model, where N is a positive integer.
Correspondingly, the error correction model may be trained as follows: obtain error correction training samples, each comprising a text content sample and its corrected counterpart; normalize the error correction training samples according to the preset format corresponding to the error correction model to obtain processed training samples; and train the error correction model with the processed training samples to obtain the preset error correction model.
The error correction model may process the text content as follows. Obtain the candidate words in the text content, obtain the visually similar (look-alike) words of each candidate word, and also treat the look-alike words as candidate words. Normalize the candidate words, for example with respect to letter case, full-width and half-width characters, singular and plural forms, digits and punctuation, spaces between candidate words, and punctuation inserted before or after candidate words. Treat each candidate word in the text content together with its look-alike words as one column of candidates, compute the transition probabilities between columns, and keep the candidate words whose transition probability exceeds a first transition probability threshold. Finally, use an optimal path search algorithm to select the most suitable word in each column, and assemble the selected words into the text content.
Note that the transition probabilities between columns may be computed in two stages. First, a bigram language model computes the transition probability between each pair of adjacent columns. When this probability is below a second transition probability threshold, the corresponding candidate word is deleted. When it exceeds the threshold, a trigram (or higher-order) language model is then used to compute the transition probability across three or more consecutive columns. This reduces the amount of transition probability computation and speeds up the processing of the text content by the error correction model.
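The candidate-column search with bigram pruning described above can be sketched as a Viterbi search. This is a minimal illustration under stated assumptions, not the patented implementation:
- the bigram table, the threshold `prune_below`, and the function name `correct_sequence` are all hypothetical;
- pruning is applied per transition, so a candidate whose every incoming transition is pruned is effectively deleted;
- the trigram rescoring of surviving paths is omitted.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical bigram table: Bigram[(prev, word)] = P(word | prev).
Bigram = Dict[Tuple[str, str], float]


def correct_sequence(columns: List[List[str]], bigram: Bigram,
                     prune_below: float = 0.01) -> List[str]:
    """Choose one candidate per column by Viterbi search over bigram
    transition probabilities, pruning transitions below `prune_below`."""
    # best[w] = (log-probability of the best path ending in w, that path)
    best = {w: (0.0, [w]) for w in columns[0]}
    for column in columns[1:]:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for word in column:
            for prev, (score, path) in best.items():
                p = bigram.get((prev, word), 0.0)
                if p <= 0.0 or p < prune_below:
                    continue  # pruned: this transition is never extended
                candidate = score + math.log(p)
                if word not in nxt or candidate > nxt[word][0]:
                    nxt[word] = (candidate, path + [word])
        if not nxt:
            raise ValueError(f"every transition into {column} was pruned")
        best = nxt
    # Return the highest-scoring complete path.
    return max(best.values(), key=lambda item: item[0])[1]


# Toy example: OCR candidates for "I have a dog" with look-alike words.
columns = [["I"], ["have", "hove"], ["a"], ["dag", "dog", "day"]]
bigram = {("I", "have"): 0.2, ("I", "hove"): 0.001, ("have", "a"): 0.3,
          ("hove", "a"): 0.05, ("a", "dag"): 0.0001, ("a", "dog"): 0.04,
          ("a", "day"): 0.03}
print(correct_sequence(columns, bigram))  # ['I', 'have', 'a', 'dog']
```

With these invented probabilities, the look-alike "hove" and the misread "dag" are both pruned, and "dog" beats "day" on its bigram probability.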
The automatic handwritten composition correction device of this embodiment works as follows:
- it obtains a handwritten composition image to be corrected;
- it processes the image with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines, obtaining a processed handwritten composition image;
- it performs line segmentation and word segmentation on the processed image, obtaining multiple word image blocks;
- it recognizes the word image blocks with a preset word recognition model, obtaining the text content corresponding to the image;
- it performs sentence segmentation, syntactic analysis, and sentence correction on the text content, obtaining the correction result for the image.

Interference and background lines are removed before the image is recognized, and correction is then performed on the recognized text content. This improves both the accuracy and the efficiency of correction.
Fig. 4 is a schematic structural diagram of another automatic handwritten composition correction device provided by an embodiment of the present invention. As shown in Fig. 4, and building on the embodiment shown in Fig. 3, the correction module 35 may specifically include a segmentation unit 351, an analysis unit 352, a recognition unit 353, an input unit 354, and a generation unit 355.
The segmentation unit 351 is configured to perform sentence segmentation on the text content to obtain multiple sentences in the text content;
The analysis unit 352 is configured to perform syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence. The analysis result includes the words and phrases in the sentence, the part of speech of each word, and the type of each phrase;
The recognition unit 353 is configured to select, for each sentence, a corresponding error recognition model according to the sentence's analysis result. It uses that model to recognize the sentence and obtain the error information in it;
The input unit 354 is configured to input the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion for the sentence;
The generation unit 355 is configured to generate the correction result for the handwritten composition image according to the correction suggestion for each sentence.
In this embodiment, the segmentation unit 351 may specifically be configured to:
- obtain the type of the text content, where the type indicates how reliably the text content can be divided into sentences;
- obtain the features to be extracted for that type;
- extract those features from the text content, obtaining segmentation feature information;
- perform sentence segmentation on the text content according to the segmentation feature information, obtaining multiple sentences in the text content.
In this embodiment, the device may determine the type of the text content using machine learning. The type may be, for example, "handwritten composition image" or "ordinary composition". The "handwritten composition image" type means that the text content was obtained by recognizing a handwritten composition image.
For example, when the type is "handwritten composition image", punctuation may have been misused while writing. To improve the accuracy of sentence division, the features to be extracted for this type may include capitalized initial letters and the like, and the text content is split into sentences according to these features. When the type is "ordinary composition", the features to be extracted may be punctuation marks and the like.
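A minimal sketch of type-dependent sentence splitting follows. It makes two assumptions:
- there are only two type labels, `"handwritten"` and anything else;
- handwritten text uses a deliberately simple capital-letter heuristic.

The real feature set, and how it is learned, are not specified in the text. The function name `split_sentences` is hypothetical.

```python
import re
from typing import List

# Split after sentence-final punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str, doc_type: str) -> List[str]:
    # Stage 1 (all types): split on sentence-final punctuation.
    chunks = [c.strip() for c in _SENTENCE_END.split(text.strip()) if c.strip()]
    if doc_type != "handwritten":
        return chunks
    # Stage 2 (handwritten only): punctuation may be missing, so also start a
    # new sentence at a capitalized word (other than "I") that directly
    # follows a word ending in a lowercase letter.
    sentences: List[str] = []
    for chunk in chunks:
        current: List[str] = []
        prev = ""
        for tok in chunk.split():
            if current and tok[0].isupper() and tok != "I" and prev[-1].islower():
                sentences.append(" ".join(current))
                current = []
            current.append(tok)
            prev = tok
        if current:
            sentences.append(" ".join(current))
    return sentences


text = "I like cats They are cute. My dog is big."
print(split_sentences(text, "handwritten"))
# ['I like cats', 'They are cute.', 'My dog is big.']
```

The heuristic also splits before proper nouns, such as a name in mid-sentence. A practical system would therefore need richer features than this.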
Further, to improve correction accuracy, the device may preprocess the text content before the segmentation unit 351 segments it. Preprocessing may include encoding, validation, interference filtering, and normalization.
In this embodiment, the analysis unit 352 may specifically be configured to, for each sentence in the text content:
- split the sentence into words;
- obtain the part of speech of each word;
- match the sentence against preset phrase regular expressions to obtain the phrases in the sentence.

In addition, the device may adjust the phrase regular expressions according to the matching results.
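The sketch below illustrates matching phrase regular expressions against a part-of-speech-tagged sentence. It joins the tokens into `word/TAG` form and applies two hypothetical rules, one for a noun phrase and one for a prepositional phrase. The Penn Treebank tag names and the rules themselves are assumptions; the patent does not list its expressions.

```python
import re
from typing import List, Tuple

# Hypothetical phrase rules over a "word/TAG" string (Penn Treebank tags).
# (?<!\S) forces every match to start at a token boundary.
PHRASE_PATTERNS = {
    "NP": re.compile(r"(?<!\S)(?:\S+/DT )?(?:\S+/JJ )*\S+/NNS?\b"),
    "PP": re.compile(r"(?<!\S)\S+/IN (?:\S+/DT )?(?:\S+/JJ )*\S+/NNS?\b"),
}


def find_phrases(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (phrase_type, phrase_text) for every rule match."""
    text = " ".join(f"{word}/{tag}" for word, tag in tagged)
    found = []
    for phrase_type, pattern in PHRASE_PATTERNS.items():
        for match in pattern.finditer(text):
            # Strip the "/TAG" suffix to recover the plain words.
            words = [tok.rsplit("/", 1)[0] for tok in match.group().split()]
            found.append((phrase_type, " ".join(words)))
    return found


tagged = [("I", "PRP"), ("am", "VBP"), ("interested", "JJ"), ("in", "IN"),
          ("the", "DT"), ("old", "JJ"), ("car", "NN")]
print(find_phrases(tagged))
# [('NP', 'the old car'), ('PP', 'in the old car')]
```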
In this embodiment, a preset model library stores an error recognition model for each word, each part of speech, or each phrase. For the words, parts of speech, or phrases in a sentence, the device queries the model library to obtain the corresponding error recognition model. It then inputs the sentence into that model to obtain the error information in the sentence. The error information may be, for example, a word usage error, a phrase usage error, or a preposition usage error.
Further, the recognition unit 353 is specifically configured to, for each sentence:
- obtain the context information of each word in the sentence;
- determine a matrix vector for each word from its context information;
- select the error recognition model corresponding to each word's part of speech;
- input the matrix vectors of the words in the sentence into the error recognition model for that part of speech, obtaining the error information in the sentence.
Here, the context information of a word in a sentence consists of two parts: information about the other words within a preset distance of that word, and information about the phrases in the sentence that contain the word. For example, in the sentence "I am interested in something", the other words within the preset distance of the preposition "in" may include "am", "interested", and "something". The matrix vector of each word may be, for example, a vector that a Word2Vec model generates from the word's context information.
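A sketch of the "words within a preset distance" context follows. It assumes a symmetric window of 2 tokens, which reproduces the "in" example above. As a stand-in for the Word2Vec-derived vector, `context_vector` averages toy embeddings of the context words, in the style of a CBOW input. It does not train Word2Vec, and both function names and the embedding values are hypothetical.

```python
from typing import Dict, List


def context_window(tokens: List[str], index: int, distance: int = 2) -> List[str]:
    """Words within `distance` positions of tokens[index], excluding it."""
    start = max(0, index - distance)
    return tokens[start:index] + tokens[index + 1:index + 1 + distance]


def context_vector(tokens: List[str], index: int,
                   embeddings: Dict[str, List[float]],
                   distance: int = 2) -> List[float]:
    """Average the embeddings of the known context words (CBOW-style input)."""
    context = [w for w in context_window(tokens, index, distance) if w in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not context:
        return [0.0] * dim
    return [sum(embeddings[w][k] for w in context) / len(context)
            for k in range(dim)]


tokens = "I am interested in something".split()
print(context_window(tokens, 3))  # ['am', 'interested', 'something']
toy = {"am": [1.0, 0.0], "interested": [0.0, 1.0], "something": [1.0, 1.0]}
print(context_vector(tokens, 3, toy))  # roughly [0.667, 0.667]
```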
Parts of speech include, for example, noun, verb, and preposition. In this embodiment, the matrix vectors of the words in a sentence are input into the error recognition model for the corresponding part of speech. From a word's matrix vector, the model then estimates, for each candidate word, the probability that it belongs in that position. For the preposition "in" in the example above, the model gives the probability of "on", "at", "before", "after", "since", and so on.

If the highest-probability preposition differs from the preposition "in" actually used in the sentence, the highest-probability preposition is recorded as error information for the sentence. If the highest-probability preposition is "in" itself, the preposition is used correctly and there is no error information.
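The decision rule described above (compare the model's most probable preposition with the one actually written) reduces to an argmax. In this sketch, the probability dictionary stands in for the output of the unspecified preposition error recognition model. The function name and the returned dictionary format are hypothetical.

```python
from typing import Dict, Optional


def check_preposition(used: str, probs: Dict[str, float]) -> Optional[dict]:
    """Compare the preposition actually used with the model's top choice.

    `probs` maps each candidate preposition to the probability that a
    (hypothetical) preposition error recognition model assigns at this
    position in the sentence.
    """
    best = max(probs, key=probs.get)
    if best == used:
        return None  # used correctly: no error information
    return {"error_type": "preposition", "found": used,
            "suggested": best, "probability": probs[best]}


# "I am interested on something": suppose the model prefers "in".
print(check_preposition("on", {"in": 0.8, "on": 0.1, "at": 0.05, "since": 0.05}))
```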
Further, building on the above embodiments, the correction module 35 may also include a comparison unit. The comparison unit compares each sentence with a preset error pattern library to obtain the error information in the sentence. The error pattern library contains regular expressions corresponding to a variety of error patterns.
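A minimal error pattern library might pair each error name with a compiled regular expression and report every match in a sentence. The three patterns below are invented for illustration, as is the function name `match_error_patterns`. They are deliberately naive: for example, the article rule would wrongly flag "a university".

```python
import re
from typing import List, Tuple

# Hypothetical error pattern library: (error name, compiled regex).
ERROR_PATTERNS = [
    ("article_a_before_vowel",
     re.compile(r"\ba [aeiou]\w*", re.IGNORECASE)),
    ("modal_of_for_have",
     re.compile(r"\b(?:should|could|would|must) of\b", re.IGNORECASE)),
    ("interested_wrong_preposition",
     re.compile(r"\binterested (?:on|at|for)\b", re.IGNORECASE)),
]


def match_error_patterns(sentence: str) -> List[Tuple[str, str]]:
    """Return (error name, matched text) for every pattern hit."""
    return [(name, m.group())
            for name, pattern in ERROR_PATTERNS
            for m in pattern.finditer(sentence)]


print(match_error_patterns("I am interested on a apple and I should of eaten it."))
```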
In this embodiment, after the error information in each sentence has been obtained, the preset correction suggestion model receives either the error information alone or the sentence together with its error information. The model then outputs the correction suggestion for the sentence.
The correction suggestion model may be trained as follows:
- obtain correction training samples, comprising a large number of sentences with error information and their corresponding correction suggestions;
- build an initial correction suggestion model and train it on the correction training samples, obtaining the preset correction suggestion model.

The correction suggestion model may be a time-recurrent neural network, such as an LSTM.
In this embodiment, the generation unit 355 may specifically be configured to:
- for each sentence that has multiple correction suggestions, input those suggestions into a preset correction selection model to obtain a single correction suggestion for the sentence;
- generate the correction result for the handwritten composition image from the single correction suggestion of each sentence.
In this embodiment, the correction selection model may score each correction suggestion for a sentence. It then takes the highest-scoring suggestion as the sentence's most probable correction suggestion.
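The per-sentence selection step can be sketched as follows. The behaviour is:
- `score` is a hypothetical stand-in for the correction selection model: any callable that returns a higher value for a better suggestion;
- sentences without suggestions are left out of the result;
- a single suggestion is used as-is, without scoring.

```python
from typing import Callable, List, Tuple


def select_suggestions(per_sentence: List[Tuple[str, List[str]]],
                       score: Callable[[str], float]) -> List[Tuple[str, str]]:
    """Keep one correction suggestion per sentence (highest score wins)."""
    result = []
    for sentence, suggestions in per_sentence:
        if not suggestions:
            continue  # nothing to correct in this sentence
        if len(suggestions) == 1:
            best = suggestions[0]
        else:
            best = max(suggestions, key=score)
        result.append((sentence, best))
    return result


# Hypothetical model scores for candidate suggestions.
scores = {"I am interested in music.": 0.9,
          "I am interesting in music.": 0.2,
          "He goes to school.": 0.8}
per_sentence = [
    ("I am interested on music.",
     ["I am interesting in music.", "I am interested in music."]),
    ("He go to school.", ["He goes to school."]),
    ("It is fine.", []),
]
print(select_suggestions(per_sentence, scores.get))
```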
Further, building on the above embodiments, the handwritten composition image may also be scored. Accordingly, the device may further include a scoring module, configured to obtain feature information from the text content. The feature information includes any one or more of lexical information, grammar information, sentence information, and correction information.

When the feature information is lexical, grammar, or sentence information, it is input into the corresponding scoring model to obtain a score for that feature information. When the feature information includes all four (lexical, grammar, sentence, and correction information), it is input into a preset comprehensive scoring model to obtain the comprehensive score of the handwritten composition image.
For example:
- when the feature information includes lexical information, the lexical information is input into the corresponding vocabulary scoring model to obtain a vocabulary score;
- when it includes grammar information, the grammar information is input into the corresponding grammar scoring model to obtain a grammar score;
- when it includes sentence information, such as sentence structure and length, the sentence information is input into the corresponding sentence scoring model to obtain a sentence score.
In this embodiment, each word in the text content may be represented by a corresponding unique vector, for example a one-hot vector. The dimension of a one-hot vector equals the total number of words in the vocabulary. A word's one-hot vector has the value 1 in the dimension corresponding to that word and 0 in all other dimensions.

For example, suppose the vocabulary contains 5000 words and the first word in the text content is the 1000th word of the vocabulary. Its one-hot vector then has 5000 dimensions, with the value 1 in the 1000th dimension and 0 everywhere else.

In this embodiment, the lexical information of the text content may be represented by the unique vectors of its words. That is, the input to the vocabulary scoring model may be the vector set of the text content: the set of vectors obtained by replacing each word in the text content with its unique vector.
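The one-hot encoding and the "vector set" described above can be sketched in a few lines. Positions are 1-based to match the 5000-word example in the text. The function names and the toy vocabulary are hypothetical, and out-of-vocabulary words would raise a `KeyError` here.

```python
from typing import Dict, List


def one_hot(position: int, vocab_size: int) -> List[int]:
    """One-hot vector for the word at 1-based `position` in the vocabulary."""
    if not 1 <= position <= vocab_size:
        raise ValueError("position must be in 1..vocab_size")
    vec = [0] * vocab_size
    vec[position - 1] = 1  # 1-based position -> 0-based list index
    return vec


def vector_set(tokens: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Replace each word by its one-hot vector (the text's 'vector set')."""
    return [one_hot(vocab[t], len(vocab)) for t in tokens]


v = one_hot(1000, 5000)
print(len(v), v[999], sum(v))  # 5000 1 1
print(vector_set(["i", "like", "cats"], {"i": 1, "like": 2, "cats": 3}))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```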
Correspondingly, the vocabulary scoring model may be trained as follows. First, obtain the vector set of each composition sample and the vocabulary score of each composition sample. Then input the vector sets and the corresponding vocabulary scores into the vocabulary scoring model to train it.
In this embodiment, the grammar information and sentence information of the text content may also be represented by the vector set of the text content. Correspondingly:
- the grammar scoring model may be trained by obtaining the vector set and grammar score of each composition sample and inputting them into the grammar scoring model;
- the sentence scoring model may be trained by obtaining the vector set and sentence score of each composition sample and inputting them into the sentence scoring model.
As another example, the feature information may include lexical, grammar, sentence, and correction information. In that case, the following operations may be applied to it in sequence:
1. vectorize the feature information;
2. input the resulting vectors into a convolutional neural network (CNN);
3. input the CNN output into a long short-term memory (LSTM) recurrent neural network;
4. input the LSTM output into an attention mechanism, whose output is the comprehensive score of the handwritten composition image.
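The sketch below shows only the final stage of that pipeline: attention pooling over a sequence of LSTM output vectors, followed by a linear layer that produces one score. The CNN and LSTM are not implemented, and the weights are plain lists rather than trained parameters. The linear output layer is an assumption, since the text only says that the attention output is the score. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def attention_pool(hidden: List[List[float]],
                   w: List[float]) -> Tuple[List[float], List[float]]:
    """Softmax-weighted sum of hidden vectors; returns (weights, context)."""
    scores = [sum(h_k * w_k for h_k, w_k in zip(h, w)) for h in hidden]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    context = [sum(a * h[k] for a, h in zip(weights, hidden)) for k in range(dim)]
    return weights, context


def composite_score(hidden: List[List[float]], w_att: List[float],
                    w_out: List[float], b_out: float) -> float:
    """Attention-pooled context vector -> linear layer -> single score."""
    _, context = attention_pool(hidden, w_att)
    return sum(c * o for c, o in zip(context, w_out)) + b_out


lstm_outputs = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for per-step LSTM outputs
print(composite_score(lstm_outputs, [0.0, 0.0], [2.0, 2.0], 1.0))  # 3.0
```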
In this embodiment, each of the following may be trained on its own training samples: the vocabulary scoring model, the grammar scoring model, the sentence scoring model, the CNN, the LSTM, and the attention mechanism. The details are not repeated here.
In this embodiment, correction information means the error types in the text content and the number of errors of each type. Lexical, grammar, and sentence information may be represented by the vector set of the text content.

Correspondingly, the comprehensive scoring model may be trained as follows. For each composition sample, obtain four items: its vector set, its error types, the number of errors of each type, and its comprehensive score. Then input these into the comprehensive scoring model to train it.
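Correction information, as defined above, can be turned into a fixed-length feature vector by counting the detected errors of each type. The list of error types, its order, and the function name are hypothetical. In the training process described, such counts would be combined with the composition's vector set as input to the comprehensive scoring model.

```python
from collections import Counter
from typing import List

# Hypothetical, fixed ordering of error types.
ERROR_TYPES = ["spelling", "article", "preposition", "verb_form"]


def correction_features(detected: List[str]) -> List[int]:
    """Count detected errors per type, in ERROR_TYPES order."""
    counts = Counter(detected)
    unknown = set(counts) - set(ERROR_TYPES)
    if unknown:
        raise ValueError(f"unknown error types: {sorted(unknown)}")
    return [counts[t] for t in ERROR_TYPES]


print(correction_features(["article", "preposition", "article"]))  # [0, 2, 1, 0]
```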
The automatic handwritten composition correction device of this embodiment works as follows:
- it obtains a handwritten composition image to be corrected;
- it processes the image with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines, obtaining a processed handwritten composition image;
- it performs line segmentation and word segmentation on the processed image, obtaining multiple word image blocks;
- it recognizes the word image blocks with a preset word recognition model, obtaining the text content corresponding to the image;
- it splits the text content into sentences, tags the part of speech of each word in each sentence, and applies shallow parsing to obtain the phrases in each sentence and their types;
- it selects rule-based strategies or deep models for particular error types to detect specific grammatical errors;
- for each detected error, it combines the output of the error recognition model with the correction suggestion model to generate a correction suggestion, yielding the correction result for the whole composition;
- based on the composition's text content and correction result, it builds a convolutional neural network to score the composition automatically.
Fig. 5 is a schematic structural diagram of another automatic handwritten composition correction device provided by an embodiment of the present invention. The device includes:
A memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002.
When executing the program, the processor 1002 implements the automatic handwritten composition correction method provided in the above embodiments.
Further, the automatic handwritten composition correction device also includes:
A communication interface 1003, used for communication between the memory 1001 and the processor 1002.
The memory 1001 is used to store the computer program executable on the processor 1002.
The memory 1001 may include high-speed RAM and may further include non-volatile memory, for example at least one disk memory.
The processor 1002 is configured to implement the automatic handwritten composition correction method described in the above embodiments when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, they may be connected to one another through a bus and communicate over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 5, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, the memory 1001, the processor 1002, and the communication interface 1003 may be integrated on a single chip. In that case, they may communicate with one another through an internal interface.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The present invention further provides a non-transitory computer-readable storage medium storing a computer program. When executed by a processor, the program implements the automatic handwritten composition correction method described above.
The present invention further provides a computer program product. When instructions in the computer program product are executed by a processor, an automatic handwritten composition correction method is performed, the method comprising:
obtaining a handwritten composition image to be corrected;
processing the handwritten composition image with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, obtaining a processed handwritten composition image;
performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
recognizing the multiple word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result for the handwritten composition image.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", and the like means the following: a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where they do not contradict one another, those skilled in the art may join and combine the different embodiments or examples described in this specification, and their features.
In addition, the terms "first" and "second" are used for descriptive purposes only. They shall not be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, segment, or portion of code. Such code comprises one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions are performed out of the order shown or discussed. This includes performing them substantially concurrently, or in reverse order, depending on the functions involved, as those skilled in the art to which the embodiments of the present invention pertain should understand.
The logic and/or steps represented in a flowchart, or otherwise described herein, may be embodied in any computer-readable medium. An example is an ordered list of executable instructions that may be considered to implement logic functions. Such a medium is used by, or in connection with, an instruction execution system, apparatus, or device. Examples include a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them.

For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include:
- an electrical connection (electronic device) having one or more wires;
- a portable computer diskette (magnetic device);
- random access memory (RAM);
- read-only memory (ROM);
- erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber device;
- a portable compact disc read-only memory (CD-ROM).

The computer-readable medium may even be paper or another suitable medium on which the program is printed. The program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary. It can then be stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If, as in another embodiment, they are implemented in hardware, any one or a combination of the following techniques well known in the art may be used:
- a discrete logic circuit having logic gates for implementing logic functions on data signals;
- an application-specific integrated circuit having suitable combinational logic gates;
- a programmable gate array (PGA);
- a field-programmable gate array (FPGA);
- and the like.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module. Alternatively, each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If it is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and shall not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (21)

1. An automatic correction method for a handwritten composition, characterized by comprising:
obtaining a handwritten composition image to be corrected;
processing the handwritten composition image with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the handwritten composition image, obtaining a processed handwritten composition image;
performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
recognizing the multiple word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result for the handwritten composition image.
2. The method according to claim 1, characterized in that, before performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks, the method further comprises:
performing word splitting on the processed handwritten composition image to disconnect the portions of handwriting that stick together between adjacent lines in the handwritten composition image.
3. The method according to claim 1, characterized in that, before recognizing the multiple word image blocks with the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further comprises:
normalizing the word image blocks according to the preset format corresponding to the word recognition model to obtain processed word image blocks.
4. The method according to claim 1 or 3, characterized in that, before recognizing the multiple word image blocks with the preset word recognition model to obtain the text content corresponding to the handwritten composition image, the method further comprises:
obtaining word recognition samples, the word recognition samples comprising handwritten word image samples and the corresponding text content;
normalizing the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples;
training the word recognition model on the processed word recognition samples to obtain the preset word recognition model.
5. The method according to claim 4, characterized in that training the word recognition model on the processed word recognition samples to obtain the preset word recognition model comprises:
dividing the word recognition samples by word length into short-word recognition samples, medium-word recognition samples, and long-word recognition samples;
training the word recognition model successively with the short-word, medium-word, and long-word recognition samples to obtain the preset word recognition model.
6. The method according to claim 1, characterized in that, before performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain the correction result for the handwritten composition image, the method further comprises:
inputting the text content into a preset error correction model to obtain error-corrected text content, the error correction model consisting of multiple language models, each language model being an N-gram language model, where N is a positive integer.
7. The method according to claim 6, characterized in that, before inputting the text content into the preset error correction model to obtain the error-corrected text content, the method further comprises:
obtaining error correction training samples, the error correction training samples comprising text content samples and the corresponding error-corrected samples;
normalizing the error correction training samples according to the preset format corresponding to the error correction model to obtain processed error correction training samples;
training the error correction model on the processed error correction training samples to obtain the preset error correction model.
8. The method according to claim 1, wherein performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image comprises:
Performing sentence segmentation on the text content to obtain multiple sentences in the text content;
Performing syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence, the analysis result including the words and phrases contained in the sentence, the part of speech of each word, and the type of each phrase;
For each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence and using it to recognize the sentence, thereby obtaining the error information in the sentence;
Inputting the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
Generating the correction result corresponding to the handwritten composition image according to the correction suggestion of each sentence.
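The five steps of claim 8 form a per-sentence pipeline. The sketch below wires them together with injected callables; `split_sentences`, `analyze`, `select_model` and `suggest` are hypothetical stand-ins for the segmentation, syntactic analysis, error recognition and correction suggestion models, whose internals the claim leaves open.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SentenceResult:
    sentence: str
    errors: List[str]
    suggestions: List[str]


def correct_text(text: str,
                 split_sentences: Callable[[str], List[str]],
                 analyze: Callable[[str], Dict[str, Any]],
                 select_model: Callable[[Dict[str, Any]], Callable[[str, Dict[str, Any]], List[str]]],
                 suggest: Callable[[str, List[str]], List[str]]) -> List[SentenceResult]:
    results = []
    for sentence in split_sentences(text):
        analysis = analyze(sentence)           # words, phrases, POS tags, phrase types
        detector = select_model(analysis)      # error recognition model for this analysis
        errors = detector(sentence, analysis)  # error information for the sentence
        suggestions = suggest(sentence, errors) if errors else []
        results.append(SentenceResult(sentence, errors, suggestions))
    return results


def render_report(results: List[SentenceResult]) -> str:
    """Assemble the overall correction result from the per-sentence suggestions."""
    return "\n".join(f"{r.sentence} -> {'; '.join(r.suggestions)}"
                     for r in results if r.errors)
```

Keeping each stage behind an interface mirrors the claim's structure and lets the models of claims 9 to 14 be swapped in independently.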
9. The method according to claim 8, wherein, for each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence to recognize the sentence and obtain the error information in the sentence comprises:
For each sentence, obtaining the context information of each word in the sentence;
Determining the matrix vector corresponding to each word according to its context information;
Selecting, according to the part of speech of each word, the error recognition model corresponding to that part of speech;
Inputting the matrix vector of each word in the sentence into the error recognition model corresponding to its part of speech to obtain the error information in the sentence.
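Claim 9 builds a vector for each word from its context and routes it to an error recognition model chosen by part of speech. A minimal sketch, assuming the "matrix vector" is the concatenation of word embeddings over a window of one word on each side (zero-padded at sentence edges) and that each per-POS model is a callable returning an error label or `None`; the embedding table and the models are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
ErrorModel = Callable[[Vector], Optional[str]]


def context_vector(tokens: Sequence[str], i: int, embed: Dict[str, Vector],
                   dim: int, window: int = 1) -> Vector:
    """Concatenate embeddings of tokens[i-window .. i+window], zero-padded."""
    zero = [0.0] * dim
    vec: Vector = []
    for j in range(i - window, i + window + 1):
        if 0 <= j < len(tokens):
            vec.extend(embed.get(tokens[j], zero))  # unknown words map to zeros
        else:
            vec.extend(zero)  # padding beyond the sentence edges
    return vec


def detect_errors(tagged: Sequence[Tuple[str, str]], embed: Dict[str, Vector], dim: int,
                  models: Dict[str, ErrorModel]) -> List[Tuple[int, str, str]]:
    """Return (position, word, error label) for each word its POS model flags."""
    tokens = [word for word, _ in tagged]
    errors = []
    for i, (word, pos) in enumerate(tagged):
        model = models.get(pos)
        if model is None:
            continue  # no error recognition model for this part of speech
        label = model(context_vector(tokens, i, embed, dim))
        if label is not None:
            errors.append((i, word, label))
    return errors
```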
10. The method according to claim 9, wherein before inputting the matrix vector of each word in the sentence into the error recognition model corresponding to its part of speech to obtain the error information in the sentence, the method further comprises:
Obtaining error recognition training samples corresponding to each part of speech, the samples comprising the matrix vectors of words of that part of speech and the error information corresponding to those words;
For the error recognition model corresponding to each part of speech, training that model with the error recognition training samples of that part of speech.
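Claim 10 trains one error recognition model per part of speech on that part of speech's samples. The sketch groups samples by POS and fits a nearest-centroid classifier to each group; the classifier is a deliberately simple stand-in (the patent does not name one), and `None` serves as the "no error" label.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Vector = List[float]


def fit_nearest_centroid(data: List[Tuple[Vector, Hashable]]) -> Callable[[Vector], Hashable]:
    """Average the vectors of each label; predict the label whose centroid is closest."""
    sums: Dict[Hashable, Vector] = {}
    counts: Dict[Hashable, int] = {}
    for vec, label in data:
        if label not in sums:
            sums[label], counts[label] = [0.0] * len(vec), 0
        sums[label] = [a + b for a, b in zip(sums[label], vec)]
        counts[label] += 1
    centroids = {lab: [x / counts[lab] for x in s] for lab, s in sums.items()}

    def predict(vec: Vector) -> Hashable:
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], vec)))
    return predict


def train_per_pos(samples: Iterable[Tuple[str, Vector, Optional[str]]]) -> Dict[str, Callable]:
    """samples: (part of speech, matrix vector, error label or None)."""
    grouped: Dict[str, list] = defaultdict(list)
    for pos, vec, label in samples:
        grouped[pos].append((vec, label))
    return {pos: fit_nearest_centroid(data) for pos, data in grouped.items()}
```

The returned dictionary has the same shape as the `models` mapping a per-POS dispatcher would consume.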
11. The method according to claim 8, wherein performing sentence segmentation on the text content to obtain multiple sentences in the text content comprises:
Obtaining the type corresponding to the text content, the type indicating how accurately sentences in the text content are delimited;
Obtaining the features to be extracted that correspond to the type;
Extracting those features from the text content to obtain segmentation feature information in the text content;
Performing sentence segmentation on the text content according to the segmentation feature information to obtain multiple sentences in the text content.
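Claim 11 makes the boundary features depend on how reliably the text's sentences are delimited. In the sketch below, which is an assumption about what such features could be, the hypothetical type `"standard"` splits only at terminal punctuation followed by whitespace, while `"noisy"` also splits where recognition has dropped the space after `.`, `!` or `?` before a capital letter.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mapping from text type to boundary features (regular expressions).
FEATURES_BY_TYPE: Dict[str, List[str]] = {
    "standard": [r"(?<=[.!?])\s+"],
    "noisy": [r"(?<=[.!?])\s+", r"(?<=[.!?])(?=[A-Z])"],  # second rule: missing space
}


def boundary_positions(text: str, text_type: str) -> List[Tuple[int, int]]:
    """Extract the type's features; each match is a (start, end) boundary span."""
    spans = set()
    for pattern in FEATURES_BY_TYPE[text_type]:
        for m in re.finditer(pattern, text):
            spans.add((m.start(), m.end()))
    return sorted(spans)


def split_sentences(text: str, text_type: str) -> List[str]:
    sentences, start = [], 0
    for b_start, b_end in boundary_positions(text, text_type):
        piece = text[start:b_start].strip()
        if piece:
            sentences.append(piece)
        start = b_end  # skip the boundary whitespace (empty for zero-width matches)
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Abbreviations such as "e.g. The" would be split wrongly by both rules; a production feature set would need exceptions for them.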
12. The method according to claim 8, wherein before, for each sentence, selecting the corresponding error recognition model according to the analysis result of the sentence to recognize the sentence and obtain the error information in the sentence, the method further comprises:
For each sentence, comparing the sentence with a preset error pattern library to obtain the error information in the sentence, the error pattern library comprising regular expressions corresponding to a plurality of error patterns.
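Claim 12 matches each sentence against a library of regular expressions, one per error pattern. The three patterns below are illustrative assumptions and deliberately naive (the article rule looks at spelling rather than sound, so it would flag "a university", and the pronoun rule would flag the "i" in "i.e."); the sketch shows only the matching mechanism.

```python
import re
from typing import List, NamedTuple, Pattern, Tuple


class ErrorPattern(NamedTuple):
    name: str
    regex: Pattern
    message: str


# Hypothetical error pattern library.
ERROR_PATTERNS = [
    ErrorPattern("article_a_before_vowel", re.compile(r"\ba\s+(?=[aeiou])", re.I),
                 'Use "an" before a vowel sound.'),
    ErrorPattern("repeated_word", re.compile(r"\b(\w+)\s+\1\b", re.I), "Repeated word."),
    ErrorPattern("lowercase_i", re.compile(r"\bi\b"), 'Capitalize the pronoun "I".'),
]


def match_error_patterns(sentence: str) -> List[Tuple[str, int, str, str]]:
    """Return (pattern name, start offset, matched text, message) for every hit."""
    found = []
    for pattern in ERROR_PATTERNS:
        for m in pattern.regex.finditer(sentence):
            found.append((pattern.name, m.start(), m.group(0), pattern.message))
    return found
```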
13. The method according to claim 8 or 12, wherein generating the correction result corresponding to the handwritten composition image according to the correction suggestion of each sentence comprises:
For each sentence that has multiple correction suggestions, inputting those suggestions into a preset correction selection model to obtain a single correction suggestion for the sentence;
Generating the correction result corresponding to the handwritten composition image according to the single correction suggestion of each sentence.
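Claim 13 reduces several correction suggestions for a sentence to one through a correction selection model. As a hypothetical stand-in, the sketch scores each candidate by its confidence minus a penalty per word-level edit from the original sentence, preferring confident but minimal rewrites; the confidences and the `penalty` value are assumptions.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


def select_suggestion(original: str, candidates: List[Tuple[str, float]],
                      penalty: float = 0.1) -> str:
    """Pick the (text, confidence) candidate with the best penalized score."""
    orig_words = original.split()

    def score(candidate: Tuple[str, float]) -> float:
        text, confidence = candidate
        return confidence - penalty * word_edit_distance(orig_words, text.split())

    return max(candidates, key=score)[0]
```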
14. The method according to claim 8, wherein before inputting the error information in the sentence into the preset correction suggestion model to obtain the correction suggestion corresponding to the sentence, the method further comprises:
Obtaining correction training samples, each comprising a sentence sample, the error information in the sentence sample, and the correction suggestion corresponding to the sentence sample;
Training the correction suggestion model with the correction training samples to obtain the preset correction suggestion model.
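Claim 14 trains the correction suggestion model on sentences annotated with error information and suggestions. A frequency table is about the simplest model that fits that description: the sketch maps each (error type, erroneous span) pair to the correction seen most often in training. It ignores the sentence context that a learned model would use and is only an illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Optional, Tuple

# (sentence sample, erroneous span, error type, correction suggestion)
TrainingSample = Tuple[str, str, str, str]


def train_suggestion_model(samples: Iterable[TrainingSample]) -> Callable[[str, str], Optional[str]]:
    table: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for _sentence, span, error_type, correction in samples:
        table[(error_type, span.lower())][correction] += 1
    # Keep the most frequent correction for each (error type, span) key.
    model = {key: counts.most_common(1)[0][0] for key, counts in table.items()}

    def suggest(span: str, error_type: str) -> Optional[str]:
        return model.get((error_type, span.lower()))  # None if never seen in training
    return suggest
```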
15. The method according to claim 1, wherein after performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image, the method further comprises:
Obtaining feature information from the text content, the feature information comprising any one or more of: lexical information, syntactic information, sentence information, and correction information;
When the feature information is lexical information, syntactic information or sentence information, inputting the feature information into the corresponding scoring model to obtain the score corresponding to that feature information;
When the feature information comprises lexical information, syntactic information, sentence information and correction information, inputting the feature information into a preset comprehensive scoring model to obtain a comprehensive score for the handwritten composition image.
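Claim 15 scores individual dimensions with their own models and, when all four kinds of feature information are present, produces a comprehensive score. The sketch assumes linear models with made-up feature names and weights, and a comprehensive score equal to the mean dimension score minus a fixed penalty per correction suggestion; the patent specifies none of these.

```python
from typing import Dict, Tuple

Features = Dict[str, float]

# Hypothetical per-dimension linear scoring models: (bias, weights).
DIMENSION_MODELS: Dict[str, Tuple[float, Features]] = {
    "lexical": (5.0, {"type_token_ratio": 5.0, "spelling_errors": -1.0}),
    "syntactic": (10.0, {"grammar_errors": -1.5}),
    "sentence": (10.0, {"fragments": -2.0}),
}
CORRECTION_PENALTY = 0.25  # hypothetical points deducted per correction suggestion


def linear_score(features: Features, bias: float, weights: Features) -> float:
    return bias + sum(weights.get(name, 0.0) * value for name, value in features.items())


def score_composition(info: Dict[str, Features]) -> Dict[str, float]:
    """Score each available dimension; add a comprehensive score when all four are present."""
    scores = {dim: linear_score(info[dim], *DIMENSION_MODELS[dim])
              for dim in DIMENSION_MODELS if dim in info}
    if len(scores) == len(DIMENSION_MODELS) and "correction" in info:
        mean = sum(scores.values()) / len(scores)
        scores["comprehensive"] = mean - CORRECTION_PENALTY * info["correction"]["num_suggestions"]
    return scores
```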
16. An automatic correction device for handwritten compositions, comprising:
An acquisition module, configured to obtain a handwritten composition image to be corrected;
A processing module, configured to process the handwritten composition image with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, obtaining a processed handwritten composition image;
A segmentation module, configured to perform line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
A recognition module, configured to recognize the multiple word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
A correction module, configured to perform sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
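The processing and segmentation modules of claim 16 can be illustrated on a binary image (1 = ink). The sketch assumes background ruled lines are rows that are almost entirely ink and removes them first, so they do not join every stroke into one component; it then deletes 4-connected components smaller than `min_size` and cuts text lines and words with horizontal and vertical projection profiles. These are common textbook choices rather than the patent's stated algorithms, and zeroing whole rows also erases any stroke that crosses a ruled line.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]  # 1 = ink, 0 = background


def remove_ruled_lines(img: Grid, min_fill: float = 0.9) -> Grid:
    """Blank any row whose ink fraction reaches min_fill (assumed ruled line)."""
    w = len(img[0])
    return [[0] * w if sum(row) >= min_fill * w else row[:] for row in img]


def remove_small_components(img: Grid, min_size: int) -> Grid:
    """Connected-component analysis (4-connectivity); drop components below min_size."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and out[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) < min_size:
                    for cy, cx in comp:
                        out[cy][cx] = 0
    return out


def preprocess(img: Grid, min_size: int = 2) -> Grid:
    return remove_small_components(remove_ruled_lines(img), min_size)


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """[start, end) index ranges where the projection profile is non-zero."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_words(img: Grid, min_gap: int = 2) -> List[Tuple[int, int, int, int]]:
    """Return word blocks as (top, bottom, left, right) with exclusive ends."""
    blocks = []
    for top, bottom in runs([sum(row) for row in img]):  # text lines
        line = img[top:bottom]
        cols = [sum(r[x] for r in line) for x in range(len(line[0]))]
        words: List[Tuple[int, int]] = []
        for s, e in runs(cols):
            if words and s - words[-1][1] < min_gap:  # narrow gap: same word
                words[-1] = (words[-1][0], e)
            else:
                words.append((s, e))
        blocks.extend((top, bottom, s, e) for s, e in words)
    return blocks
```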
17. The device according to claim 16, wherein the correction module comprises:
A segmentation unit, configured to perform sentence segmentation on the text content to obtain multiple sentences in the text content;
An analysis unit, configured to perform syntactic analysis on each sentence of the text content to obtain an analysis result for each sentence, the analysis result including the words and phrases contained in the sentence, the part of speech of each word, and the type of each phrase;
A recognition unit, configured to, for each sentence, select the corresponding error recognition model according to the analysis result of the sentence and use it to recognize the sentence, obtaining the error information in the sentence;
An input unit, configured to input the error information in the sentence into a preset correction suggestion model to obtain the correction suggestion corresponding to the sentence;
A generation unit, configured to generate the correction result corresponding to the handwritten composition image according to the correction suggestion of each sentence.
18. The device according to claim 17, wherein the recognition unit is specifically configured to:
For each sentence, obtain the context information of each word in the sentence;
Determine the matrix vector corresponding to each word according to its context information;
Select, according to the part of speech of each word, the error recognition model corresponding to that part of speech;
Input the matrix vector of each word in the sentence into the error recognition model corresponding to its part of speech to obtain the error information in the sentence.
19. An automatic correction device for handwritten compositions, comprising:
A memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the automatic correction method for handwritten compositions according to any one of claims 1-15.
20. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the automatic correction method for handwritten compositions according to any one of claims 1-15.
21. A computer program product, wherein when instructions in the computer program product are executed by a processor, an automatic correction method for handwritten compositions is performed, the method comprising:
Obtaining a handwritten composition image to be corrected;
Processing the handwritten composition image with a connected-component analysis algorithm and a line detection algorithm to remove interference and background lines from the image, obtaining a processed handwritten composition image;
Performing line segmentation and word segmentation on the processed handwritten composition image to obtain multiple word image blocks;
Recognizing the multiple word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image;
Performing sentence segmentation, syntactic analysis and sentence correction on the text content to obtain the correction result corresponding to the handwritten composition image.
CN201810223663.8A 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition Active CN108595410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810223663.8A CN108595410B (en) 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810223663.8A CN108595410B (en) 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition

Publications (2)

Publication Number Publication Date
CN108595410A true CN108595410A (en) 2018-09-28
CN108595410B CN108595410B (en) 2023-03-24

Family

ID=63626800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810223663.8A Active CN108595410B (en) 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition

Country Status (1)

Country Link
CN (1) CN108595410B (en)

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW315432B (en) * 1995-08-31 1997-09-11 Nat Univ Tsing Hua Automatic debugging and correction device and method for computer documents
JPH09305714A (en) * 1996-05-17 1997-11-28 N T T Data Tsushin Kk System and method for recognizing character
US6085206A (en) * 1996-06-20 2000-07-04 Microsoft Corporation Method and system for verifying accuracy of spelling and grammatical composition of a document
US6154579A (en) * 1997-08-11 2000-11-28 At&T Corp. Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique
US20040006466A1 (en) * 2002-06-28 2004-01-08 Ming Zhou System and method for automatic detection of collocation mistakes in documents
US20050074169A1 (en) * 2001-02-16 2005-04-07 Parascript Llc Holistic-analytical recognition of handwritten text
WO2005045786A1 (en) * 2003-10-27 2005-05-19 Educational Testing Service Automatic essay scoring system
US20080077859A1 (en) * 1998-05-26 2008-03-27 Global Information Research And Technologies Llc Spelling and grammar checking system
WO2011044658A1 (en) * 2009-10-15 2011-04-21 2167959 Ontario Inc. System and method for text cleaning
WO2012039686A1 (en) * 2010-09-24 2012-03-29 National University Of Singapore Methods and systems for automated text correction
CN103294660A (en) * 2012-02-29 2013-09-11 张跃 Automatic English composition scoring method and system
CN103365838A (en) * 2013-07-24 2013-10-23 桂林电子科技大学 Method for automatically correcting syntax errors in English composition based on multivariate features
US20140214401A1 (en) * 2013-01-29 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method and device for error correction model training and text error correction
CN105045779A (en) * 2015-07-13 2015-11-11 北京大学 Wrong sentence detection method based on deep neural network and multi-label classification
CN105183713A (en) * 2015-08-27 2015-12-23 北京时代焦点国际教育咨询有限责任公司 English composition automatic correcting method and system
CN105279149A (en) * 2015-10-21 2016-01-27 上海应用技术学院 Chinese text automatic correction method
WO2016147330A1 (en) * 2015-03-18 2016-09-22 株式会社日立製作所 Text processing method and text processing system
WO2017043130A1 (en) * 2015-09-07 2017-03-16 信也 赤木 Text evaluation device, text evaluation method, and program
CN106610930A (en) * 2015-10-22 2017-05-03 科大讯飞股份有限公司 Foreign language writing automatic error correction method and system
CN107239449A (en) * 2017-06-08 2017-10-10 锦州医科大学 English recognition method and translation method
CN107357775A (en) * 2017-06-05 2017-11-17 百度在线网络技术(北京)有限公司 Artificial-intelligence-based recurrent neural network text error correction method and device
US20170337443A1 (en) * 2014-11-06 2017-11-23 Achiav KOLTON Location based optical character recognition (ocr)
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 Character recognition method and character recognition device
CN107704859A (en) * 2017-11-01 2018-02-16 哈尔滨工业大学深圳研究生院 Character recognition method based on a deep learning training framework

Non-Patent Citations (2)

Title
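Wei Xiaoxin (卫晓欣): "English handwriting recognition based on long short-term memory recurrent neural networks", China Master's Theses Full-text Database, Information Science and Technology *
Chen Shanshan (陈珊珊): "Research on automatic essay scoring models and methods", China Master's Theses Full-text Database, Information Science and Technology *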

Cited By (26)

Publication number Priority date Publication date Assignee Title
CN111199801A (en) * 2018-11-19 2020-05-26 零氪医疗智能科技(广州)有限公司 Construction method and application of model for identifying disease types of medical records
CN111199801B (en) * 2018-11-19 2023-08-08 零氪医疗智能科技(广州)有限公司 Construction method and application of model for identifying disease types of medical records
CN109670040B (en) * 2018-11-27 2024-04-05 平安科技(深圳)有限公司 Writing assistance method and device, storage medium and computer equipment
CN109670040A (en) * 2018-11-27 2019-04-23 平安科技(深圳)有限公司 Writing assistance method and device, storage medium and computer equipment
CN111737968A (en) * 2019-03-20 2020-10-02 小船出海教育科技(北京)有限公司 Method and terminal for automatically correcting and scoring composition
CN110188274B (en) * 2019-05-30 2021-06-08 口口相传(北京)网络技术有限公司 Search error correction method and device
CN110188274A (en) * 2019-05-30 2019-08-30 口口相传(北京)网络技术有限公司 Search error correction method and device
CN111079500B (en) * 2019-07-11 2023-10-27 广东小天才科技有限公司 Method and system for correcting dictation content
CN111079500A (en) * 2019-07-11 2020-04-28 广东小天才科技有限公司 Method and system for correcting dictation content
CN110489747A (en) * 2019-07-31 2019-11-22 北京大米科技有限公司 Image processing method and device, storage medium and electronic equipment
CN110765996B (en) * 2019-10-21 2022-07-29 北京百度网讯科技有限公司 Text information processing method and device
CN110765996A (en) * 2019-10-21 2020-02-07 北京百度网讯科技有限公司 Text information processing method and device
CN110851599B (en) * 2019-11-01 2023-04-28 中山大学 Automatic scoring method for Chinese composition and teaching assistance system
CN110851599A (en) * 2019-11-01 2020-02-28 中山大学 Automatic scoring method and teaching and assisting system for Chinese composition
CN113361511A (en) * 2020-03-05 2021-09-07 顺丰科技有限公司 Method, device and equipment for establishing correction model and computer readable storage medium
CN111950240A (en) * 2020-08-26 2020-11-17 北京高途云集教育科技有限公司 Data correction method, device and system
CN112149680A (en) * 2020-09-28 2020-12-29 武汉悦学帮网络技术有限公司 Wrong word detection and identification method and device, electronic equipment and storage medium
CN112149680B (en) * 2020-09-28 2024-01-16 武汉悦学帮网络技术有限公司 Method and device for detecting and identifying wrong words, electronic equipment and storage medium
CN113536743A (en) * 2020-11-06 2021-10-22 腾讯科技(深圳)有限公司 Text processing method and related device
CN112597754A (en) * 2020-12-23 2021-04-02 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and readable storage medium
CN112597754B (en) * 2020-12-23 2023-11-21 北京百度网讯科技有限公司 Text error correction method, apparatus, electronic device and readable storage medium
CN112634689A (en) * 2020-12-24 2021-04-09 广州奇大教育科技有限公司 Application method of regular expressions in automatic grading of subjective questions in computer teaching
CN112528651A (en) * 2021-02-08 2021-03-19 深圳市阿卡索资讯股份有限公司 Intelligent correction method, system, electronic equipment and storage medium
CN113836894A (en) * 2021-09-26 2021-12-24 武汉天喻信息产业股份有限公司 Multidimensional English composition scoring method and device and readable storage medium
CN113836894B (en) * 2021-09-26 2023-08-15 武汉天喻信息产业股份有限公司 Multi-dimensional English composition scoring method and device and readable storage medium
CN114489439A (en) * 2022-01-20 2022-05-13 安徽淘云科技股份有限公司 Article correcting method and related equipment thereof

Also Published As

Publication number Publication date
CN108595410B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230619

Address after: 6001, 6th Floor, No.1 Kaifeng Road, Shangdi Information Industry Base, Haidian District, Beijing, 100085

Patentee after: Beijing Baige Feichi Technology Co.,Ltd.

Address before: 100085 4001, 4th floor, No.1 Kaifa Road, Shangdi Information Industry base, Haidian District, Beijing

Patentee before: XIAOCHUANCHUHAI EDUCATION TECHNOLOGY (BEIJING) CO.,LTD.