CN110348280A - Water book character recognition method based on CNN artificial neural network - Google Patents

Water book character recognition method based on CNN artificial neural network

Info

Publication number
CN110348280A
CN110348280A (application number CN201910217488.6A; also written CN 110348280 A)
Authority
CN
China
Prior art keywords
box
grid
water book
classification
confidence level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910217488.6A
Other languages
Chinese (zh)
Inventor
丁琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Industry Polytechnic College
Original Assignee
Guizhou Industry Polytechnic College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Industry Polytechnic College filed Critical Guizhou Industry Polytechnic College
Priority to CN201910217488.6A priority Critical patent/CN110348280A/en
Publication of CN110348280A publication Critical patent/CN110348280A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06V30/32 Digital ink
    • G06V30/333 Preprocessing; Feature extraction
    • G06V30/36 Matching; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a water book (Shui script) character recognition method based on a CNN artificial neural network, comprising: step 1, collecting water book text samples; step 2, performing feature extraction and classification on the water book text samples using a CNN-based artificial neural network model; step 3, locating and detecting water book characters using the YOLO algorithm. The method solves the problem that recognizing water book text with conventional text recognition techniques in the prior art suffers from low accuracy.

Description

Water book character recognition method based on CNN artificial neural network
Technical field
The invention belongs to the field of character recognition technology, and in particular relates to a water book character recognition method based on a CNN artificial neural network.
Background art:
In the field of computer vision, object recognition and localization have long been an important research direction, and the recognition of water book text falls within this scope. Water book is a pictographic, non-standardized script; the collected water book samples are almost all handwritten, and the same character written by different people can differ greatly. Moreover, water book text is often mixed with other scripts (such as Chinese characters), so the water book characters must be accurately recognized within a mixed document without misidentifying other text or patterns. This places very high demands on character segmentation, localization and detection, and on feature extraction and classification algorithms. In addition, because water book characters differ greatly in their features from natural images, while neural networks are usually pre-trained on general image classification datasets such as ImageNet, common neural networks struggle to generalize to these features. Recognizing water book text with prior-art character recognition methods therefore suffers from problems such as low accuracy.
Summary of the invention:
The technical problem to be solved by the present invention: to provide a water book character recognition method based on a CNN artificial neural network, so as to solve the problem that recognizing water book text with conventional text recognition techniques in the prior art suffers from low accuracy.
Technical solution of the present invention:
A water book character recognition method based on a CNN artificial neural network, comprising:
Step 1: collect water book text samples;
Step 2: perform feature extraction and classification on the water book text samples using a CNN-based artificial neural network model;
Step 3: locate and detect water book characters using the YOLO algorithm.
The method of locating and detecting water book characters with the YOLO algorithm in step 3 comprises:
Step 3.1: divide the input image into an S*S grid;
Step 3.2: each grid cell predicts B boxes and their confidences;
Step 3.3: each box comprises five predicted values: x, y, w, h and confidence, where x, y are the coordinates of the box center relative to the grid cell, w, h are the box width and height, and the confidence is the IOU between the predicted box and the ground-truth boxes;
Step 3.4: each grid cell also predicts C conditional class probabilities Pr(Class_i | Object), conditioned on the cell containing an object; one set of C conditional probabilities is predicted per cell, regardless of B;
Step 3.5: at detection time, the conditional class probability is multiplied by each box's confidence:
Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU
This gives each box a confidence score for each class; the score encodes both the probability of the class and how well the predicted box fits the object.
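The class-score computation of step 3.5 can be sketched as a NumPy broadcast of the per-cell conditional class probabilities against the per-box confidences. The array shapes and names below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def class_scores(cond_class_probs, box_confidences):
    """Combine Pr(Class_i | Object) with each box's Pr(Object) * IOU.

    cond_class_probs: (S, S, C) array, one class distribution per grid cell
    box_confidences:  (S, S, B) array, one confidence per predicted box
    returns:          (S, S, B, C) class-specific confidence scores,
                      i.e. Pr(Class_i) * IOU for every box/class pair
    """
    return box_confidences[..., :, None] * cond_class_probs[:, :, None, :]
```

Every box thus receives one score per class, combining class probability with box quality, as the text above describes.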
Beneficial effects of the present invention:
The convolutional neural network (CNN) used by the present invention is a special deep neural network model, a new type of network produced by combining BP (backpropagation) with deep learning. A convolutional neural network is a multilayer perceptron designed to recognize two-dimensional shapes, featuring local receptive fields, a hierarchical structure, and joint global training of feature extraction and classification. This network structure is highly invariant to translation, rotation, scaling, skew and other deformations. The CNN exploits the spatial information of the image to enhance image features, and because weights are shared the number of parameters is greatly reduced, so the CNN structure is less prone to overfitting during training and the generalization ability of the model is improved.
The YOLO (You Only Look Once) algorithm used by the present invention is a target detection algorithm that uses a deep convolutional neural network to learn features for detecting and locating objects, i.e., a deep-learning-based target detection and recognition algorithm. YOLO imitates the way humans recognize objects, drawing on prior knowledge to locate and identify objects quickly and accurately. YOLO casts detection as a regression problem: from the input image, a single neural network directly outputs bounding boxes and the class probability of each bounding box. The entire detection pipeline is a single network, so it can be optimized end to end.
YOLO divides the input image into an S*S grid; if the center of an object's ground-truth box falls into a grid cell, that cell is responsible for detecting the object. The advantages of YOLO are:
1) It is extremely fast. Whereas the R-CNN family uses candidate regions to turn detection into a classification problem, YOLO formulates detection directly as regression, so no complicated pipeline is needed: the image is fed into the network and the result is obtained directly.
2) It uses global information. At detection time YOLO reasons over the whole image rather than, as sliding-window and region-proposal methods do, over local regions only. It therefore implicitly encodes information about both objects and background, so it rarely mistakes background for objects: YOLO's background false detection rate is less than half that of Fast R-CNN.
3) It generalizes well. YOLO learns more general object features; when trained on natural images and then tested on pictures of man-made artwork, YOLO outperforms top detection algorithms such as DPM and R-CNN by a wide margin.
Since water book characters differ greatly in their features from natural images, and neural networks are usually pre-trained on general image classification datasets such as ImageNet, common neural networks struggle to generalize to these features; the present invention therefore selects the YOLO algorithm, which outputs position and class information directly from the test image. YOLO uses 24 convolutional layers followed by 2 fully connected layers, with 1*1 convolutional layers interleaved to reduce the feature space from the preceding layer. The last two fully connected layers are replaced with convolutional layers, so inputs of various sizes can be detected; compared with fully connected layers, a fully convolutional network also better preserves the spatial position information of the target. In addition, a residual structure is introduced, which greatly reduces the difficulty of training deep networks, so the network can be made deeper, with a clear improvement in accuracy.
The invention solves the problem that recognizing water book text with conventional text recognition techniques in the prior art suffers from low accuracy.
Specific embodiment:
Step 1: collect water book text samples.
To cover water book text together with its translations, 120 pages of the "Water Book Common Dictionary" were selected as the basic samples.
Step 2: perform feature extraction and classification on the water book text samples using a CNN-based artificial neural network model.
Image feature extraction obtains from the object itself the measurements or attributes useful for classification. Based on the extracted feature vector, each object is assigned a category label, so that the analyzed samples are divided into n classes; two objects are considered similar because they have similar features, and samples with similar features belong to the same category.
In conventional machine learning methods, features are mostly extracted from the image by hand and then fed into a common classifier (such as an SVM, a decision tree, or a random forest) to obtain per-class probability values and decide which class the image to be classified belongs to.
To address the insufficient generalization ability of existing character recognition networks, the present invention uses a CNN network model.
A convolutional neural network (CNN) is a special deep neural network model, a new type of network produced by combining BP (backpropagation) with deep learning. It is a multilayer perceptron designed to recognize two-dimensional shapes, featuring local receptive fields, a hierarchical structure, and joint global training of feature extraction and classification. This network structure is highly invariant to translation, rotation, scaling, skew and other deformations.
The CNN exploits the spatial information of the image to enhance image features, and because weights are shared the number of parameters is greatly reduced, so the CNN structure is less prone to overfitting during training and the generalization ability of the model is improved.
To recognize water book characters, a dataset must first be prepared. This embodiment classifies 17 water book characters; each character is expanded to 500 samples stored under its corresponding folder, with each picture saved in a fixed 50*50*1 format, packaged into a trainable data format and split into a training set and a test set. A network is then built: here only one convolutional layer is used, with 20 convolution kernels of size 3*3, followed by a ReLU activation and max pooling, and then a fully connected layer for classification. With the training parameters set, the network is trained and its accuracy reported: after 10 epochs the accuracy is 93.74%.
Considering that this CNN structure is relatively simple and there are only 17 target classes, and comparing with an MLP network's 90.4% accuracy on a 6-character classification task, the result shows that the CNN structure brings a considerable improvement in classification performance. Further gains in accuracy can be obtained by modifying the training hyperparameters and improving the generated samples.
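The single-convolutional-layer network described above (20 kernels of 3*3, ReLU, 2*2 max pooling on 50*50*1 inputs) would normally be built in a deep learning framework; as a framework-free sketch of the forward pass only, with the layer sizes mirroring the embodiment and everything else (variable names, the omitted fully connected classifier) being illustrative assumptions:

```python
import numpy as np

def conv_relu_pool(img, kernels):
    """Forward pass of the embodiment's single conv layer on one 50*50 image.

    img:     (50, 50) grayscale array
    kernels: (20, 3, 3) array of convolution kernels
    returns: (20, 24, 24) feature maps after 'valid' 3*3 convolution,
             ReLU, and 2*2 max pooling (the final fully connected
             classification layer is omitted here)
    """
    n, kh, kw = kernels.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    conv = np.zeros((n, oh, ow))
    for k in range(n):                      # slide each kernel over the image
        for i in range(oh):
            for j in range(ow):
                conv[k, i, j] = np.sum(img[i:i + kh, j:j + kw] * kernels[k])
    conv = np.maximum(conv, 0.0)            # ReLU activation
    # 2*2 max pooling with stride 2
    pooled = conv.reshape(n, oh // 2, 2, ow // 2, 2).max(axis=(2, 4))
    return pooled
```

The 50*50 input yields 48*48 valid-convolution maps, pooled down to 24*24, which a fully connected layer would then classify into the 17 character classes.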
Since larger network structures require larger datasets and longer training times, common practice is to fine-tune a pre-trained base network: take a network with trained weights, change only the structure of the final fully connected layer, and train with the earlier weights as initial values. By comparative testing, VGG16 was selected as the base network. VGG16 is composed of 13 convolutional layers and 3 fully connected layers, and has the following advantages:
1) As a base network, its classification performance is very good;
2) The network structure of VGG16 is very regular, which makes it relatively easy to modify;
3) Models trained on ImageNet have been published, so fine-tuning for other datasets can be carried out on that basis, and it adapts well to other datasets;
4) Many network structures in the object detection field use VGG16 as the base network, with similarly good results.
VGG16 is a network pre-trained on the ImageNet image library of a large number of real pictures; the trained VGG16 weights are transferred as the initial weights of our own convolutional neural network, so our network does not have to be trained from scratch on a large amount of data, which improves training speed.
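The fine-tuning scheme described above (freeze the pre-trained base, retrain only the final classifier) can be illustrated without VGG16 itself. In this sketch, a fixed random projection stands in for the frozen convolutional base, and only a softmax head is updated by gradient descent; all names, dimensions and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen VGG16 convolutional base: weights are fixed
# and never updated during training.
W_frozen = rng.normal(size=(2500, 64))

def features(x):
    """Frozen feature extractor: ReLU projection with unit-norm rows."""
    f = np.maximum(0.0, x @ W_frozen)
    return f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-9)

def train_head(X, y, n_classes, lr=0.5, steps=300):
    """Train only the final softmax classifier on top of frozen features."""
    F = features(X)
    W = np.zeros((F.shape[1], n_classes))
    for _ in range(steps):
        logits = F @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0               # softmax cross-entropy gradient
        W -= lr * (F.T @ p) / len(y)                 # only the head is updated
    return W
```

The design point this mirrors is that the expensive feature extractor keeps its pre-trained weights, so only the small head needs data and training time.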
Step 3: locate and detect water book characters using the YOLO algorithm.
YOLO (You Only Look Once) is a target detection algorithm that uses a deep convolutional neural network to learn features for detecting and locating objects, i.e., a deep-learning-based target detection and recognition algorithm. YOLO imitates the way humans recognize objects, drawing on prior knowledge to locate and identify objects quickly and accurately. YOLO casts detection as a regression problem: from the input image, a single neural network directly outputs bounding boxes and the class probability of each bounding box. The entire detection pipeline is a single network, so it can be optimized end to end.
YOLO divides the input image into an S*S grid; if the center of an object's ground-truth box falls into a grid cell, that cell is responsible for detecting the object. The advantages of YOLO are:
1) It is extremely fast. Whereas the R-CNN family uses candidate regions to turn detection into a classification problem, YOLO formulates detection directly as regression, so no complicated pipeline is needed: the image is fed into the network and the result is obtained directly.
2) It uses global information. At detection time YOLO reasons over the whole image rather than, as sliding-window and region-proposal methods do, over local regions only. It therefore implicitly encodes information about both objects and background, so it rarely mistakes background for objects: YOLO's background false detection rate is less than half that of Fast R-CNN.
3) It generalizes well. YOLO learns more general object features; when trained on natural images and then tested on pictures of man-made artwork, YOLO outperforms top detection algorithms such as DPM and R-CNN by a wide margin.
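The grid-responsibility rule described above (the cell containing an object's center is the one that detects it) reduces to a floor computation over normalized center coordinates. S = 7 is YOLO's usual grid size and is assumed here for illustration:

```python
def responsible_cell(cx, cy, S=7):
    """Return the (row, col) of the S*S grid cell responsible for an object
    whose center lies at normalized image coordinates (cx, cy) in [0, 1]."""
    col = min(int(cx * S), S - 1)   # clamp so cx == 1.0 stays inside the grid
    row = min(int(cy * S), S - 1)
    return row, col
```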
Since water book characters differ greatly in their features from natural images, and neural networks are usually pre-trained on general image classification datasets such as ImageNet, common neural networks struggle to generalize to these features; the YOLO algorithm, which outputs position and class information directly from the test image, was therefore selected.
The YOLO target detection process:
1) Divide the input image into an S*S grid; if the center of an object falls in a grid cell, that cell is responsible for detecting the object.
2) Each grid cell predicts B boxes and their confidences, where confidence = Pr(Object) * IOU(truth, pred): if the cell contains no object, the confidence is 0; if it contains an object, the confidence is the intersection over union (IOU) of the predicted box and the ground-truth box.
3) Each box comprises five predicted values: x, y, w, h and confidence. x, y are the coordinates of the box center relative to the grid cell; w, h are the box width and height, normalized relative to the picture. The confidence is the IOU of the predicted box with the ground-truth boxes.
4) Each grid cell also predicts C conditional class probabilities Pr(Class_i | Object), conditioned on the cell containing an object. One set of C conditional probabilities is predicted per cell, regardless of B.
5) At detection time, the conditional class probability is multiplied by each box's confidence:
Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU
This gives each box a confidence score for each class; the score encodes both the probability of the class and how well the predicted box fits the object.
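The IOU used in steps 2) and 3) above, the intersection over union of a predicted box and a ground-truth box, can be sketched as follows, with boxes represented by their (x1, y1, x2, y2) corners (a representation assumed here for illustration, not specified by the patent):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```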
YOLO uses 24 convolutional layers followed by 2 fully connected layers, with 1*1 convolutional layers interleaved to reduce the feature space from the preceding layer. The last two fully connected layers are replaced with convolutional layers, so inputs of various sizes can be detected; compared with fully connected layers, a fully convolutional network also better preserves the spatial position information of the target. In addition, a residual structure is introduced, which greatly reduces the difficulty of training deep networks, so the network can be made deeper, with a clear improvement in accuracy.
To improve the training effect of the network and increase its generalization ability and recognition robustness, training samples are randomly scaled, rotated, and adjusted in tone, contrast and distortion to expand the sample set.
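The random augmentations just mentioned could be sketched as follows. This is a simplified stand-in (brightness/contrast jitter plus 90-degree rotations rather than the arbitrary-angle rotation and distortion the patent implies), and all parameter ranges are assumptions:

```python
import numpy as np

def augment(img, rng):
    """Return a randomly perturbed copy of a grayscale character image.

    Simplified augmentation: contrast/tone jitter plus a random 90-degree
    rotation. Real small-angle rotation, scaling, and elastic distortion
    would need an image library such as OpenCV or PIL, and for character
    images the rotation range must stay small enough not to change the label.
    """
    out = img.astype(float)
    out = out * rng.uniform(0.8, 1.2) + rng.uniform(-10.0, 10.0)  # contrast / tone
    out = np.rot90(out, k=int(rng.integers(0, 4)))                # coarse rotation
    return np.clip(out, 0.0, 255.0)
```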

Claims (2)

1. A water book character recognition method based on a CNN artificial neural network, comprising:
Step 1: collect water book text samples;
Step 2: perform feature extraction and classification on the water book text samples using a CNN-based artificial neural network model;
Step 3: locate and detect water book characters using the YOLO algorithm.
2. The water book character recognition method based on a CNN artificial neural network according to claim 1, characterized in that the method of locating and detecting water book characters with the YOLO algorithm in step 3 comprises:
Step 3.1: divide the input image into an S*S grid;
Step 3.2: each grid cell predicts B boxes and their confidences;
Step 3.3: each box comprises five predicted values: x, y, w, h and confidence, where x, y are the coordinates of the box center relative to the grid cell, w, h are the box width and height, and the confidence is the IOU between the predicted box and the ground-truth boxes;
Step 3.4: each grid cell also predicts C conditional class probabilities Pr(Class_i | Object), conditioned on the cell containing an object; one set of C conditional probabilities is predicted per cell, regardless of B;
Step 3.5: at detection time, the conditional class probability is multiplied by each box's confidence:
Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU
This gives each box a confidence score for each class; the score encodes both the probability of the class and how well the predicted box fits the object.
CN201910217488.6A 2019-03-21 2019-03-21 Water book character recognition method based on CNN artificial neural network Pending CN110348280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910217488.6A CN110348280A (en) 2019-03-21 2019-03-21 Water book character recognition method based on CNN artificial neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910217488.6A CN110348280A (en) 2019-03-21 2019-03-21 Water book character recognition method based on CNN artificial neural network

Publications (1)

Publication Number Publication Date
CN110348280A true CN110348280A (en) 2019-10-18

Family

ID=68174344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910217488.6A Pending CN110348280A (en) 2019-03-21 2019-03-21 Water book character recognition method based on CNN artificial neural network

Country Status (1)

Country Link
CN (1) CN110348280A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909734A (en) * 2019-10-29 2020-03-24 福建两岸信息技术有限公司 Document character detection and identification method
CN111126128A (en) * 2019-10-29 2020-05-08 福建两岸信息技术有限公司 Method for detecting and dividing document layout area
CN111310868A (en) * 2020-03-13 2020-06-19 厦门大学 Water-based handwritten character recognition method based on convolutional neural network
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016033710A1 (en) * 2014-09-05 2016-03-10 Xiaoou Tang Scene text detection system and method
CN108520254A (en) * 2018-03-01 2018-09-11 腾讯科技(深圳)有限公司 A kind of Method for text detection, device and relevant device based on formatted image
CN109241904A (en) * 2018-08-31 2019-01-18 平安科技(深圳)有限公司 Text region model training, character recognition method, device, equipment and medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016033710A1 (en) * 2014-09-05 2016-03-10 Xiaoou Tang Scene text detection system and method
CN108520254A (en) * 2018-03-01 2018-09-11 腾讯科技(深圳)有限公司 A kind of Method for text detection, device and relevant device based on formatted image
CN109241904A (en) * 2018-08-31 2019-01-18 平安科技(深圳)有限公司 Text region model training, character recognition method, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郑伊 et al., "An International Phonetic Alphabet character recognition method using a YOLO network with variable candidate-box density", 《计算机应用》 (Journal of Computer Applications) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909734A (en) * 2019-10-29 2020-03-24 福建两岸信息技术有限公司 Document character detection and identification method
CN111126128A (en) * 2019-10-29 2020-05-08 福建两岸信息技术有限公司 Method for detecting and dividing document layout area
CN111310868A (en) * 2020-03-13 2020-06-19 厦门大学 Water-based handwritten character recognition method based on convolutional neural network
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN111401371B (en) * 2020-06-03 2020-09-08 中邮消费金融有限公司 Text detection and identification method and system and computer equipment

Similar Documents

Publication Publication Date Title
Alonso et al. Adversarial generation of handwritten text images conditioned on sequences
Khan et al. KNN and ANN-based recognition of handwritten Pashto letters using zoning features
Mishra et al. Top-down and bottom-up cues for scene text recognition
Yao et al. Strokelets: A learned multi-scale representation for scene text recognition
CN110348280A (en) Water book character recognition method based on CNN artificial neural network
Bai et al. Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition
CN103942550B (en) A kind of scene text recognition methods based on sparse coding feature
Rioux-Maldague et al. Sign language fingerspelling classification from depth and color images using a deep belief network
CN106408030B (en) SAR image classification method based on middle layer semantic attribute and convolutional neural networks
Bhowmik et al. Recognition of Bangla handwritten characters using an MLP classifier based on stroke features
CN107729865A (en) A kind of handwritten form mathematical formulae identified off-line method and system
CN110555475A (en) few-sample target detection method based on semantic information fusion
Sun et al. Robust text detection in natural scene images by generalized color-enhanced contrasting extremal region and neural networks
Burie et al. ICFHR2016 competition on the analysis of handwritten text in images of balinese palm leaf manuscripts
Hu et al. MST-based visual parsing of online handwritten mathematical expressions
CN107704859A (en) A kind of character recognition method based on deep learning training framework
CN104834941A (en) Offline handwriting recognition method of sparse autoencoder based on computer input
Pacha et al. Towards self-learning optical music recognition
Tian et al. Natural scene text detection with MC–MR candidate extraction and coarse-to-fine filtering
Patel et al. Gujarati handwritten character recognition using hybrid method based on binary tree-classifier and k-nearest neighbour
CN109034281A (en) The Chinese handwritten body based on convolutional neural networks is accelerated to know method for distinguishing
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
Hajič Jr et al. Detecting noteheads in handwritten scores with convnets and bounding box regression
Antony et al. Haar features based handwritten character recognition system for Tulu script
CN109902692A (en) A kind of image classification method based on regional area depth characteristic coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191018)