CN108985175B - Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning - Google Patents

Info

Publication number
CN108985175B
CN108985175B CN201810634249.6A CN201810634249A CN108985175B CN 108985175 B CN108985175 B CN 108985175B CN 201810634249 A CN201810634249 A CN 201810634249A CN 108985175 B CN108985175 B CN 108985175B
Authority
CN
China
Prior art keywords
standard peripheral
picture
peripheral outline
character
handwritten chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810634249.6A
Other languages
Chinese (zh)
Other versions
CN108985175A (en)
Inventor
王琦琦
尹成娟
王以忠
杨国威
许素霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Science and Technology
Original Assignee
Tianjin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Science and Technology
Priority to CN201810634249.6A
Publication of CN108985175A
Application granted
Publication of CN108985175B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/30 Writer recognition; Reading and verifying signatures
    • G06V40/33 Writer recognition; Reading and verifying signatures based only on signature image, e.g. static signature recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Character Input (AREA)

Abstract

The invention provides a handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning, comprising the following steps. (1) Character cutting: the Chinese sentence set to be recognized is written on special paper printed with standard peripheral outlines, the paper is scanned, and the scanned picture is cut into single-character pictures, each with its standard peripheral outline. (2) Picture processing: the standard peripheral outline is removed from each cut single-character picture, and the picture is enlarged and binarized. (3) Character recognition: a recognition module is called on each processed single-character picture to obtain the recognition result. Because the character cutting step relies on the standard peripheral outline, mis-cutting of characters is effectively avoided and the accuracy of character recognition is improved.

Description

Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning
Technical Field
The invention belongs to the technical field of image processing and relates to the recognition of handwritten Chinese, in particular to a handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning.
Background
The examination system is an important mechanism for talent selection in China and an objective reflection of both students' learning outcomes and teachers' teaching outcomes. Among the many examination subjects, Chinese composition, as an expression of reading and writing ability, is an indispensable subject in China's primary and secondary schools. Chinese compositions have traditionally been marked on paper. With the advent of the "paperless" era, however, paper-based marking can no longer meet the daily needs of primary and secondary schools and has several disadvantages. For example, loose binding may unintentionally expose student information to the teacher, so that marks are influenced by subjective impressions of individual students and the examination becomes unfair; and when compositions are marked on paper for everyday practice, the teacher has to write comments by hand, which lengthens the marking cycle and reduces students' opportunities to practise. To solve these problems, electronic marking is becoming the mainstream marking method. Electronic marking eliminates the binding of paper answer sheets, which improves marking efficiency, gives students more opportunities to practise, and avoids the unfairness associated with paper-based marking.
In conventional electronic marking of Chinese compositions, the teacher begins marking directly after the test papers have been scanned, so the teacher still reads handwritten Chinese characters. Writing styles differ from person to person, and when the number of test papers is large the marker easily suffers visual fatigue, which greatly increases the probability of misjudgment. Therefore, to relieve the visual fatigue caused by long marking sessions, a character recognition system is needed that converts differently handwritten Chinese sentence sets into a uniform printed form.
Most character recognition systems operate on individual characters, so recognizing a handwritten Chinese sentence set first requires cutting it into individual characters. Many character segmentation methods have been studied in China and abroad, including recognition-based segmentation, projection-based segmentation, and methods that analyze specific background regions and apply different segmentation strategies to particular kinds of touching digits. However, existing character cutting methods still suffer from drawbacks such as low cutting efficiency and poor real-time recognition performance. How to cut each character accurately has therefore become an important problem in research on character recognition systems.
Disclosure of Invention
To overcome the shortcomings of the prior art in this field, the invention provides a handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning. The technical scheme is as follows:
Character cutting: the Chinese sentence set to be recognized is written on special paper printed with standard peripheral outlines, the paper is scanned, and the scanned picture is cut into single-character pictures, each with its standard peripheral outline.
Picture processing: the standard peripheral outline is removed from each cut single-character picture, and the picture is enlarged and binarized.
Character recognition: a recognition module is called on each processed single-character picture to obtain the recognition result.
Compared with the prior art, the invention has the following advantages:
(1) The character cutting step introduces the standard peripheral outline, which effectively avoids mis-cutting of characters and improves the accuracy of character recognition.
(2) The introduction of deep learning compensates to a large extent for the shortcomings of traditional character recognition techniques, enables the recognition of handwritten Chinese characters, greatly improves recognition accuracy, and is robust to characters with complex backgrounds and low resolution.
(3) The improved deep convolutional neural network structure reduces the complexity of the network and improves its portability.
Drawings
FIG. 1 shows the special paper with standard peripheral outlines;
FIG. 2 shows a handwritten Chinese sentence set to be recognized;
FIG. 3 shows the single-character pictures obtained after character cutting;
FIG. 4 shows a single-character picture with the standard peripheral outline removed;
FIG. 5 shows an enlarged and binarized single-character picture;
FIG. 6 shows the character recognition result.
Detailed Description
The invention is described in further detail below by means of specific embodiments, with reference to the accompanying drawings.
The handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning mainly comprises character cutting, picture processing and character recognition, through which picture information is converted into text information. The character cutting and picture processing parts are implemented by calling the OpenCV interface on the Visual Studio platform, and the character recognition module is implemented with an improved version of the AlexNet model under the open-source Caffe framework.
Character cutting: the Chinese sentence set to be recognized is written on the special paper with standard peripheral outlines shown in FIG. 1, and the paper is scanned, as shown in FIG. 2. The scanned picture is cut using the minimum circumscribed rectangle method: the picture of the handwritten Chinese sentence set with standard peripheral outlines is cut into single-character pictures, each with its standard peripheral outline, which are saved in order, as shown in FIG. 3.
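The cutting step can be illustrated with a minimal sketch. It assumes that the outline pixels have already been separated into a binary mask, that each printed box is a closed outline that does not touch its neighbours, that the scan is not skewed, and that the minimum circumscribed rectangle is taken as the upright bounding rectangle. The function names are hypothetical, and plain Python lists stand in for the OpenCV interface mentioned above.

```python
from collections import deque


def bounding_boxes(mask):
    """Return the upright bounding box (top, left, bottom, right) of every
    4-connected group of outline pixels (value 1) in a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected outline.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sorting by (top, left) gives reading order when the boxes of one row
    # share the same top edge (assumption: unskewed scan).
    return sorted(boxes)


def cut_cells(image, boxes):
    """Crop one sub-image per box; each crop still contains its outline."""
    return [[row[left:right + 1] for row in image[top:bottom + 1]]
            for top, left, bottom, right in boxes]
```

Because the crops are returned in sorted order, saving them one after another preserves the order of the characters in the sentence set, which the recognition step relies on.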
Picture processing: the standard peripheral outline of each single-character picture in FIG. 3 is removed by adjusting the RGB color channel threshold, as shown in FIG. 4. In practice, most input handwritten Chinese character pictures are multi-channel images. For the sake of recognition accuracy, the single-character picture with the outline removed is enlarged and converted into a binary picture by adjusting the color channel threshold, and the picture is then saved, as shown in FIG. 5.
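The picture processing step can be sketched as three small functions. The sketch assumes that the outline is printed in red and the handwriting is in dark ink; the text does not state the outline color, and all threshold values and function names here are hypothetical. Enlargement uses simple nearest-neighbour repetition, which is one possible choice rather than the method used by the invention.

```python
WHITE = (255, 255, 255)


def remove_outline(img, r_min=150, gb_max=100):
    """Paint pixels that look like the (assumed red) outline white.
    img is a list of rows, each row a list of (r, g, b) tuples."""
    return [[WHITE if (r >= r_min and g <= gb_max and b <= gb_max) else (r, g, b)
             for (r, g, b) in row]
            for row in img]


def enlarge(img, factor):
    """Nearest-neighbour enlargement by an integer factor."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def binarize(img, threshold=128):
    """Return 1 for ink (every channel below the threshold), else 0."""
    return [[1 if max(px) < threshold else 0 for px in row] for row in img]


def process(img, factor=2):
    """Outline removal, enlargement and binarization in the order described."""
    return binarize(enlarge(remove_outline(img), factor))
```

Removing the outline before binarization matters: once the picture is reduced to two values, the outline and the ink can no longer be told apart by color.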
Character recognition: the recognition module is called on each single-character picture in sequence to obtain the recognition result, as shown in FIG. 6. The recognition module must be trained before it is called; it can be trained in advance through the following steps:
Step one: the GNT-format data of the HWDB1.1 data set is parsed and binarized, the 240 × 1000 handwritten Chinese character samples are divided into a training set and a test set, and both sets are converted into LMDB-format data sets usable by Caffe.
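The division into training and test sets can be sketched as a per-class split. The text does not state the split ratio, so the 80/20 fraction, the fixed random seed and the function name below are assumptions for illustration only; parsing the GNT files and writing the LMDB data sets are not shown.

```python
import random


def split_per_class(samples_by_class, train_fraction=0.8, seed=0):
    """Split each class separately so that every character class appears in
    both sets. samples_by_class maps a class label to its list of samples."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in sorted(samples_by_class.items()):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test
```

With 1000 classes of 240 samples each, the assumed fraction yields 192 training and 48 test samples per class; a different ratio only requires changing `train_fraction`.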
Step two: the convolutional neural network is given its initial configuration.
Step three: an improved version of the deep convolutional neural network AlexNet performs repeated supervised learning on the 1000-class handwritten Chinese character training set, and the connection weights between the layers of the network are adjusted continuously according to the learning error. Each time the learning has been repeated a set number of times, the test set is fed to the network for testing and the test accuracy is obtained. The improvements to the AlexNet model are as follows: the number of convolution kernels in convolutional layer 1 is changed from 96 to 80, and the corresponding pooling layer is adjusted accordingly; and the first fully connected layer is removed.
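The effect of the two modifications on network size can be estimated with a short calculation. The sketch below assumes the layer shapes of the commonly used Caffe reference AlexNet (3-channel 227 × 227 input, 11 × 11 first-layer kernels with stride 4, a grouped 5 × 5 second convolution, and 9216 features entering the fully connected stack); the text does not give these shapes, so the resulting numbers are illustrative rather than the invention's actual figures. Removing the first fully connected layer is interpreted here as feeding the remaining 4096-unit layer directly from the last pooling layer.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def conv_params(in_channels, out_channels, kernel, groups=1):
    """Weights plus biases of a (possibly grouped) convolutional layer."""
    return out_channels * (in_channels // groups) * kernel * kernel + out_channels


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def summarize(conv1_kernels, fc_sizes, fc_input=256 * 6 * 6):
    """Parameter counts for conv1, conv2 and the fully connected stack."""
    conv1 = conv_params(3, conv1_kernels, 11)
    conv2 = conv_params(conv1_kernels, 256, 5, groups=2)
    fc_total, n_in = 0, fc_input
    for n_out in fc_sizes:
        fc_total += fc_params(n_in, n_out)
        n_in = n_out
    return {"conv1": conv1, "conv2": conv2, "fc": fc_total}


original = summarize(96, [4096, 4096, 1000])   # assumed reference shapes
modified = summarize(80, [4096, 1000])         # 80 kernels, one FC layer removed
for name in original:
    print(name, original[name], "->", modified[name])
```

Under these assumptions, most of the saving comes from dropping the fully connected layer, while reducing the first layer to 80 kernels trims only the first two convolutional layers slightly.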
Step four: when the learning error of the convolutional neural network is lower than a first preset value and the test accuracy is higher than a second preset value, training is stopped and the connection weights between the layers of the current network are saved, yielding the optimal recognition model.
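The stopping rule of step four can be written as a small check applied at each test point. The two preset values are not given in the text, so the thresholds below (0.1 for the learning error and 0.95 for the test accuracy) and the function names are hypothetical.

```python
def should_stop(loss, accuracy, max_loss, min_accuracy):
    """Both conditions of step four must hold at the same test point."""
    return loss < max_loss and accuracy > min_accuracy


def first_stopping_point(history, max_loss=0.1, min_accuracy=0.95):
    """history yields (iteration, learning_error, test_accuracy) tuples,
    one per test point. Returns the iteration at which training would stop,
    or None if the criteria are never met."""
    for iteration, loss, accuracy in history:
        if should_stop(loss, accuracy, max_loss, min_accuracy):
            return iteration
    return None
```

Requiring both conditions guards against stopping on a network that fits the training set well but does not yet generalize to the test set.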
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the inventive concept, and all such changes and modifications fall within the scope of protection of the present invention.

Claims (3)

1. A handwritten Chinese character sentence set recognition method based on a standard peripheral outline and deep learning, comprising the following steps: (1) preparing special paper with standard peripheral outlines, writing the Chinese character sentence set to be recognized on the paper, and scanning it; (2) cutting the scanned picture using the minimum circumscribed rectangle method, so that the picture of the handwritten Chinese character sentence set with standard peripheral outlines is cut into single-character pictures with standard peripheral outlines, which are saved in order; (3) removing the standard peripheral outline of each single-character picture by adjusting the RGB color channel threshold; (4) enlarging the single-character picture with the standard peripheral outline removed, converting it into a binarized picture by adjusting the color channel threshold, and then saving the picture; (5) calling the character recognition module on the single-character pictures in order to obtain the recognition result.
2. The handwritten Chinese character sentence set recognition method based on a standard peripheral outline and deep learning according to claim 1, characterized in that: the recognition module requires model training before it is called; the handwritten Chinese character data set HWDB1.1 is used, its 240 × 1000 handwritten samples are divided into a test set and a training set, and the deep convolutional neural network Alexnet model is called to repeatedly train on and predict the 1000 classes of handwritten Chinese characters, finally yielding the optimal recognition model and its weight parameters.
3. The handwritten Chinese character sentence set recognition method based on a standard peripheral outline and deep learning according to claim 2, characterized in that: the number of convolution kernels in the first convolutional layer of the Alexnet model is 80, and the first fully connected layer is removed.
CN201810634249.6A 2018-06-20 2018-06-20 Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning Active CN108985175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810634249.6A CN108985175B (en) 2018-06-20 2018-06-20 Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810634249.6A CN108985175B (en) 2018-06-20 2018-06-20 Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning

Publications (2)

Publication Number Publication Date
CN108985175A CN108985175A (en) 2018-12-11
CN108985175B CN108985175B (en) 2021-06-04

Family

ID=64540752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810634249.6A Active CN108985175B (en) 2018-06-20 2018-06-20 Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning

Country Status (1)

Country Link
CN (1) CN108985175B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399850B (en) * 2019-07-30 2021-10-15 西安工业大学 A Continuous Sign Language Recognition Method Based on Deep Neural Network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750556A (en) * 2012-06-01 2012-10-24 山东大学 Off-line handwritten form Chinese character recognition method
CN103513898A (en) * 2012-06-21 2014-01-15 夏普株式会社 Handwritten character segmenting method and electronic equipment
CN104239879A (en) * 2014-09-29 2014-12-24 小米科技有限责任公司 Character segmentation method and device
CN104484643A (en) * 2014-10-27 2015-04-01 中国科学技术大学 Intelligent identification method and system for hand-written table
CN105574486A (en) * 2015-11-25 2016-05-11 成都数联铭品科技有限公司 Image table character segmenting method
CN105654087A (en) * 2015-12-30 2016-06-08 李宇 Color template-based offline handwritten character extraction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930545A (en) * 2009-06-24 2010-12-29 夏普株式会社 Handwriting recognition method and device

Also Published As

Publication number Publication date
CN108985175A (en) 2018-12-11

Similar Documents

Publication Publication Date Title
US11790641B2 (en) Answer evaluation method, answer evaluation system, electronic device, and medium
CN108764074B (en) Subjective item intelligently reading method, system and storage medium based on deep learning
CN110210413B (en) A system and method for content detection and recognition of multi-disciplinary test papers based on deep learning
US10339428B2 (en) Intelligent scoring method and system for text objective question
CN110298236B (en) Automatic Braille image identification method and system based on deep learning
CN111814616A (en) Automatic examination paper marking processing system without answer sheet and implementation method thereof
CN104463101A (en) Answer recognition method and system for textual test question
WO2022161293A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN105427696A (en) Method for distinguishing answer to target question
CN110414563A (en) Total marks of the examination statistical method, system and computer readable storage medium
CN111832551B (en) Text image processing method, device, electronic scanning equipment and storage medium
CN111008594B (en) Error-correction question review method, related device and readable storage medium
CN108985175B (en) Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning
CN107220610A (en) A kind of subjective item fraction recognition methods applied to marking system
CN115346222A (en) Handwritten Chinese character quality evaluation model acquisition method, evaluation method and device
CN110503101A (en) Font evaluation method, apparatus, device and computer-readable storage medium
CN111428623B (en) Chinese blackboard-writing style analysis system based on big data and computer vision
CN113903039A (en) Color-based answer area acquisition method for answer sheet
CN119379500A (en) A system for software operation skill assessment and its answering and scoring method
Roque et al. Assistive technology for braille reading using optical braille recognition and text-to-speech
CN111814606A (en) An automatic scoring system and implementation method for technical image processing and pattern recognition
CN110705610A (en) Evaluation system and method based on handwriting detection and temporary writing capability
CN115171109A (en) Handwritten braille identification method and system based on deep learning
JP4710707B2 (en) Additional recording information processing method, additional recording information processing apparatus, and program
CN113947078A (en) Newspaper dictation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant