CN108985175B

CN108985175B - Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning

Info

Publication number: CN108985175B
Application number: CN201810634249.6A
Authority: CN
Inventors: 王琦琦; 尹成娟; 王以忠; 杨国威; 许素霞
Original assignee: Tianjin University of Science and Technology
Current assignee: Tianjin University of Science and Technology
Priority date: 2018-06-20
Filing date: 2018-06-20
Publication date: 2021-06-04
Anticipated expiration: 2038-06-20
Also published as: CN108985175A

Abstract

The invention provides a handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning, comprising the following steps. (1) Character cutting: the Chinese sentence set to be recognized is written on specific paper with standard peripheral outlines, the paper is scanned, and the scanned picture is cut into single-character pictures, each with its standard peripheral outline. (2) Picture processing: the standard peripheral outline is removed from each cut single-character picture, and the picture is enlarged and binarized. (3) Character recognition: a recognition module is called on each processed single-character picture to obtain the recognition result. Because the character cutting step introduces the standard peripheral outline, cutting errors are effectively avoided and the accuracy of character recognition is improved.

Description

Handwritten Chinese sentence set recognition method based on standard peripheral outline and deep learning

Technical Field

The invention belongs to the technical field of image processing and relates to the recognition of handwritten Chinese, in particular to a handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning.

Background

The examination system is an important mechanism for talent selection in China and an objective reflection of both students' learning outcomes and teachers' teaching outcomes. Among the many examination subjects, Chinese composition, as a demonstration of reading and writing ability, is an indispensable examination item in China's primary and secondary schools. Chinese compositions have traditionally been marked on paper. With the arrival of the "paperless" era, however, paper marking can no longer meet the daily needs of primary and secondary schools, and it has several drawbacks. For example, loose binding may let a teacher inadvertently see student information, so that scores are influenced by personal impressions and the examination becomes unfair. In addition, when compositions are marked on paper during routine practice, the teacher must write comments by hand, which lengthens the marking cycle and reduces students' opportunities to practice. To solve these problems, electronic marking is becoming the mainstream marking method. Electronic marking eliminates the binding of paper answer sheets, which improves marking efficiency, gives students more practice opportunities, and avoids the unfairness associated with paper marking.

In conventional electronic marking of Chinese compositions, the teacher begins marking directly after the test papers are scanned, so the teacher still reads handwritten Chinese characters. Because writing styles differ from person to person, a large number of papers easily causes visual fatigue, which greatly increases the probability of misjudgment. To relieve the visual fatigue caused by long marking sessions, a character recognition system is therefore needed that converts the various handwritten Chinese sentence sets into a uniform printed form.

Most character recognition systems operate on individual characters, so recognizing a handwritten Chinese sentence set first requires cutting it into single characters. Many character segmentation methods have been studied at home and abroad, including recognition-based segmentation, projection-based segmentation, and methods that analyze specific background regions and apply different segmentation strategies to particular touching digits. However, existing character cutting methods still suffer from defects such as low cutting efficiency and poor real-time recognition performance. How to cut each character accurately has therefore become an important problem in research on character recognition systems.

Disclosure of Invention

To overcome the shortcomings of the prior art in this field, the invention provides a handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning. The technical scheme is as follows:

Character cutting: the Chinese sentence set to be recognized is written on specific paper with standard peripheral outlines, the paper is scanned, and the scanned picture is cut into single-character pictures, each with its standard peripheral outline.

Picture processing: the standard peripheral outline is removed from each cut single-character picture, and the picture is enlarged and binarized.

Character recognition: a recognition module is called on each processed single-character picture to obtain the recognition result.
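The three steps above form a simple pipeline. The following is a minimal sketch of that composition only; `cut`, `process` and `classify` are hypothetical callables standing in for the three stages and are not names used by the invention.

```python
def recognize_sentences(scanned_page, cut, process, classify):
    """Return the recognized text for one scanned page.

    cut(page)         -> single-character pictures in reading order
    process(picture)  -> picture with outline removed, enlarged, binarized
    classify(picture) -> the recognized character as a string
    All three callables are placeholders supplied by the caller.
    """
    return "".join(classify(process(picture)) for picture in cut(scanned_page))
```

Keeping the stages as separate callables means each one can be replaced or tested on its own.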

Compared with the prior art, the invention has the following advantages:

(1) The character cutting step introduces the standard peripheral outline, which effectively avoids cutting errors and improves the accuracy of character recognition.

(2) The introduction of deep learning largely makes up for the shortcomings of traditional character recognition techniques, enables the recognition of handwritten Chinese characters, greatly improves recognition accuracy, and is robust to characters with complex backgrounds or low resolution.

(3) The improvement of the deep convolutional neural network structure reduces the complexity of the network and improves the portability of the network.
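To give a feel for the reduction in complexity, the sketch below counts learnable parameters before and after the two modifications named in the detailed description (80 instead of 96 kernels in the first convolutional layer, and removal of the first fully connected layer). It assumes the stock Caffe AlexNet layout, which the patent does not spell out: a 3-channel input, 11 × 11, 5 × 5 and 3 × 3 kernels, grouped convolutions in layers 2, 4 and 5, and a 6 × 6 × 256 feature map before the fully connected layers. The unspecified change to the pooling layer is ignored, since pooling has no parameters.

```python
def conv_params(in_channels, out_channels, kernel, groups=1):
    """Weights plus biases of a (possibly grouped) square convolution."""
    return out_channels * (in_channels // groups) * kernel * kernel + out_channels


def fc_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


def alexnet_params(conv1_kernels=96, keep_first_fc=True, in_channels=3, classes=1000):
    """Parameter count under the assumed stock Caffe AlexNet layout."""
    total = conv_params(in_channels, conv1_kernels, 11)
    total += conv_params(conv1_kernels, 256, 5, groups=2)
    total += conv_params(256, 384, 3)
    total += conv_params(384, 384, 3, groups=2)
    total += conv_params(384, 256, 3, groups=2)
    features = 256 * 6 * 6  # assumed flattened size of the last pooling output
    # Removing the first fully connected layer leaves a single 4096-unit layer
    # that is fed directly by the flattened convolutional features.
    hidden = [4096, 4096] if keep_first_fc else [4096]
    for width in hidden:
        total += fc_params(features, width)
        features = width
    return total + fc_params(features, classes)


original = alexnet_params()
modified = alexnet_params(conv1_kernels=80, keep_first_fc=False)
print(original, modified, original - modified)
```

Under these assumptions almost all of the saving comes from dropping one 4096 × 4096 fully connected layer; the smaller first convolutional layer contributes comparatively little.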

Drawings

FIG. 1 shows the specific paper with standard peripheral outlines;

FIG. 2 shows a handwritten Chinese sentence set to be recognized;

FIG. 3 shows a single-character picture after character cutting;

FIG. 4 shows a single-character picture with the standard peripheral outline removed;

FIG. 5 shows an enlarged and binarized single-character picture;

FIG. 6 shows the character recognition result.

Detailed Description

The invention will be further described in detail by means of specific embodiments with reference to the accompanying drawings.

The handwritten Chinese sentence set recognition method based on a standard peripheral outline and deep learning mainly comprises character cutting, picture processing, and character recognition, by which picture information is converted into text information. The character cutting and picture processing parts are implemented by calling the OpenCV interface on the Visual Studio platform, and the character recognition module is implemented with an improved version of the AlexNet model under the open-source Caffe framework.

Character cutting: the Chinese sentence set to be recognized is written on the specific paper with standard peripheral outlines shown in FIG. 1 and then scanned, as shown in FIG. 2. The scanned picture is cut using the minimum circumscribed rectangle method: the handwritten Chinese sentence set picture with standard peripheral outlines is cut into single-character pictures, each with its standard peripheral outline, which are saved in order, as shown in FIG. 3.
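The cutting step can be sketched as follows. This is an illustration, not the implementation of the invention: it uses plain Python lists instead of OpenCV (where `cv2.findContours` and `cv2.boundingRect` would typically play this role), it assumes an axis-aligned bounding rectangle, and it assumes a binary `mask` in which only outline pixels are 1. The function names and `row_tolerance` are hypothetical.

```python
from collections import deque


def bounding_boxes(mask):
    """Return (top, left, bottom, right) for each 4-connected blob of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one outline, tracking its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def reading_order(boxes, row_tolerance=2):
    """Sort boxes row by row, then left to right within each row."""
    ordered, row = [], []
    for box in sorted(boxes):
        # Start a new text row when the top edge jumps by more than the tolerance.
        if row and box[0] - row[0][0] > row_tolerance:
            ordered.extend(sorted(row, key=lambda b: b[1]))
            row = []
        row.append(box)
    ordered.extend(sorted(row, key=lambda b: b[1]))
    return ordered


def cut_characters(page, mask, row_tolerance=2):
    """Crop the page to each outline's bounding rectangle, in reading order."""
    return [[line[left:right + 1] for line in page[top:bottom + 1]]
            for top, left, bottom, right
            in reading_order(bounding_boxes(mask), row_tolerance)]
```

Because every character sits inside its own closed outline, each connected outline yields exactly one crop, so neighboring characters cannot be merged or split by the cutting step.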

Picture processing: the standard peripheral outline of the single-character picture in FIG. 3 is removed by adjusting the RGB color channel thresholds, as shown in FIG. 4. In practice, most input handwritten Chinese character pictures are multi-channel images. For the sake of recognition accuracy, the single-character picture with the outline removed is enlarged, converted into a binarized picture by adjusting the color channel thresholds, and then saved, as shown in FIG. 5.
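A minimal sketch of the three operations in this step is given below, using nested lists of `(r, g, b)` tuples instead of OpenCV images. The patent does not state the outline color or any threshold values, so the red outline assumed by `looks_red`, its thresholds, the integer scale factor, and the binarization threshold of 128 are all illustrative assumptions.

```python
def looks_red(pixel, min_red=150, max_other=100):
    """Hypothetical outline test: strong red channel, weak green and blue."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other


def remove_outline(image, is_outline=looks_red, background=(255, 255, 255)):
    """Replace every pixel flagged as outline with the background color."""
    return [[background if is_outline(px) else px for px in row] for row in image]


def enlarge(image, factor):
    """Nearest-neighbor upscaling by an integer factor."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def binarize(image, threshold=128):
    """Map dark pixels (ink) to 0 and all other pixels to 255."""
    return [[0 if sum(px) / 3 < threshold else 255 for px in row] for row in image]


def process_picture(image, factor=2):
    """Outline removal, enlargement, then binarization, in that order."""
    return binarize(enlarge(remove_outline(image), factor))
```

Removing the outline before binarization matters: once the picture is reduced to two levels, a dark outline could no longer be told apart from ink by color.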

Character recognition: the recognition module is called on the single-character pictures one by one, in their saved order, to obtain the recognition result, as shown in FIG. 6. The recognition module must be trained before it is called; it can be obtained in advance through the following training steps:

Step one: Parse the GNT-format data of the HWDB1.1 data set, binarize it, divide the 240 × 1000 handwritten Chinese character samples into a training set and a test set, and convert both sets into LMDB-format data sets usable by Caffe.

Step two: Perform the initial configuration of the convolutional neural network.

Step three: Use an improved version of the deep convolutional neural network AlexNet to perform repeated supervised learning on the 1000-class handwritten Chinese character training set, continuously adjusting the connection weights between the layers of the network according to the learning error. Each time the learning has been repeated a set number of times, the test set is fed to the network for testing and the test accuracy is obtained. The improvements to the AlexNet model are as follows: the number of convolution kernels in convolutional layer 1 is changed from 96 to 80, with the corresponding pooling layer adjusted accordingly; and the first fully connected layer is removed.

Step four: When the learning error of the convolutional neural network is lower than a first preset value and the test accuracy is higher than a second preset value, stop training and save the current connection weights between the layers of the network to obtain the optimal recognition model.
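As a rough illustration of the data preparation in step one and the stopping rule in steps three and four, the sketch below uses only the Python standard library. The GNT record layout (a 10-byte little-endian header holding the record size, a 2-byte GB2312 tag code, the width and the height, followed by width × height grayscale bytes) is an assumption about the HWDB1.1 files. The names `test_fraction`, `test_interval`, `max_iterations`, `train_step` and `evaluate` are hypothetical; the patent gives neither the split ratio nor the two preset values. Conversion to LMDB and the Caffe solver itself are not shown.

```python
import random
import struct
from collections import defaultdict


def read_gnt(data):
    """Yield (character, width, height, pixels) from the bytes of a GNT file."""
    offset = 0
    while offset + 10 <= len(data):
        # Assumed header: uint32 size, 2-byte GB2312 tag, uint16 width, uint16 height.
        _size, tag, width, height = struct.unpack_from("<I2sHH", data, offset)
        start = offset + 10
        pixels = data[start:start + width * height]
        yield tag.decode("gb2312"), width, height, pixels
        offset = start + width * height


def split_by_class(samples, test_fraction, seed=0):
    """Split (label, item) pairs into train and test lists, class by class."""
    by_label = defaultdict(list)
    for label, item in samples:
        by_label[label].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test += [(label, item) for item in items[:n_test]]
        train += [(label, item) for item in items[n_test:]]
    return train, test


def train_until_converged(train_step, evaluate, max_loss, min_accuracy,
                          test_interval, max_iterations):
    """Train until loss < max_loss and test accuracy > min_accuracy.

    train_step() runs one learning iteration and returns the learning error;
    evaluate() returns the test accuracy. The network is tested every
    test_interval iterations. max_iterations is a safety cap added for this
    sketch. Returns (iterations, loss, accuracy, converged).
    """
    loss, accuracy = float("inf"), 0.0
    for iteration in range(1, max_iterations + 1):
        loss = train_step()
        if iteration % test_interval == 0:
            accuracy = evaluate()
            if loss < max_loss and accuracy > min_accuracy:
                return iteration, loss, accuracy, True
    return max_iterations, loss, accuracy, False
```

Splitting class by class keeps the same proportion of each of the 1000 characters in both sets, so the test accuracy is not skewed toward characters that happen to be over-represented.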

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various changes and modifications can be made without departing from the inventive concept, and these changes and modifications are all within the scope of the present invention.

Claims

1. A handwritten Chinese character sentence set recognition method based on a standard peripheral outline and deep learning, comprising the following steps:

(1) preparing specific paper with standard peripheral outlines, writing the Chinese character sentence set to be recognized on the specific paper, and scanning it;

(2) cutting the scanned picture using the minimum circumscribed rectangle method, so that the handwritten Chinese character sentence set picture with standard peripheral outlines is cut into single-character pictures with standard peripheral outlines, and saving them in order;

(3) removing the standard peripheral outline of each single-character picture by adjusting the RGB color channel thresholds;

(4) enlarging the single-character picture with the standard peripheral outline removed, converting it into a binarized picture by adjusting the color channel thresholds, and then saving the picture;

(5) calling the character recognition module on the single-character pictures in their saved order to obtain the recognition result.

2. The handwritten Chinese character sentence set recognition method based on a standard peripheral outline and deep learning according to claim 1, characterized in that: the recognition module requires model training before being called; the handwritten Chinese character data set HWDB1.1 is used, and its 240 × 1000 handwritten samples are divided into a test set and a training set; the deep convolutional neural network AlexNet model is used to repeatedly train on and predict the 1000 classes of handwritten Chinese characters, finally yielding the optimal recognition model and its weight parameters.

3. The handwritten Chinese character sentence set recognition method based on a standard peripheral outline and deep learning according to claim 2, characterized in that: the number of convolution kernels in the first convolutional layer of the AlexNet model is 80, and the first fully connected layer is removed.