CN109800754B

CN109800754B - Ancient font classification method based on convolutional neural network

Info

Publication number: CN109800754B
Application number: CN201811487296.9A
Authority: CN
Inventors: 吴以凡; 赵月; 张桦; 戴国骏; 史建凯
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2018-12-06
Filing date: 2018-12-06
Publication date: 2020-11-06
Anticipated expiration: 2038-12-06
Also published as: CN109800754A

Abstract

The invention discloses an ancient font classification method based on a convolutional neural network. First, an image data set of ancient font categories is collected with a web crawler, and the training set is balanced across classes through data expansion. The balanced samples are converted to grayscale and resized to the target image size, histogram equalization is applied to the sample set, isolated noise points are removed with an N8-connected noise reduction algorithm, and the images are finally binarized using a Shannon entropy function based on fuzzy set theory, which preserves the detail features of the image well. The objective function for the classification task combines the center loss function with the traditional cross-entropy loss function, which increases the inter-class distance, reduces the intra-class distance, and improves the discriminative power of the features to some extent. The preprocessed images are used to train a predefined network model, and the accuracy of the classification results is evaluated with a confusion matrix.

Description

Ancient font classification method based on convolutional neural network

Technical Field

The invention relates to the field of traditional Chinese character image processing, and in particular to a method for classifying ancient calligraphy fonts based on a convolutional neural network.

Background

Chinese characters have been in use for thousands of years and are an important component of traditional Chinese art and culture. Over time, however, old written works have weathered and been damaged, so it is necessary to protect them using advanced techniques. The invention provides a preprocessing (denoising) algorithm aimed at ancient Chinese calligraphy works and, on that basis, classifies the data set with a convolutional neural network to achieve better classification accuracy. Most ancient fonts (traditional Chinese calligraphy) were written with traditional Chinese writing brushes, whose strokes are much thicker and heavier than those of a hard-tipped pen, so the characters carry more shape information; however, weathered works contain a great deal of noise, which strongly affects the classification result.

In recent years, large volumes of ancient calligraphy have been digitized for research and for widespread artistic practice, so the demand for ancient font recognition and classification is increasing. Many existing solutions are based on feature extraction combined with K-nearest-neighbor techniques; their image preprocessing brings little improvement, and they are generally used to recognize characters and extract single features. Convolutional neural networks, on the other hand, have been widely used for handwritten character recognition, but research on ancient Chinese fonts is lacking. Against this background, the invention explores applying a convolutional neural network to recognizing the style of ancient fonts, with the goal of systematic classification, laying a solid foundation for subsequent accurate recognition and for the research and management of ancient fonts. To address these problems, the invention improves the data preprocessing and trains a convolutional neural network model with optimized parameter settings and suitable training techniques to achieve better classification performance.

Disclosure of Invention

The invention aims to provide an ancient font classification method based on a convolutional neural network.

The method addresses font style classification by applying a deep-learning-based convolutional neural network to traditional Chinese calligraphy fonts. First, the data set images are preprocessed by combining histogram equalization with an image binarization algorithm based on fuzzy set theory; a convolutional neural network is then trained on the preprocessed sample set to classify it. Experimental results show that the method classifies and recognizes degraded Chinese characters more accurately.

Ancient font classification based on a convolutional neural network is a classification problem: after supervised learning, the constructed model establishes a discrete mapping. The implementation comprises a data set acquisition module, a data expansion module, an image preprocessing module, a convolutional neural network model module, an objective function module, an optimizer module, a network training module and a network testing module. The technical solution for achieving the purpose of the invention comprises the following steps:

Step 1, acquiring a data set: single calligraphy characters pre-segmented in the CADAL digital library are crawled using Beautiful Soup to obtain images of five standard ancient font types, which together form the ancient font image data set required by the experiment.

Step 2, expanding the ancient font image data set: the number of data samples in the data set obtained in step 1 is expanded. Because the crawler collects different numbers of samples for the different ancient font styles, the classes with fewer samples are expanded to facilitate model training. Existing sample images are randomly selected and transformed by horizontal/vertical flipping, small-range rotation, cropping by a supervised data expansion method, and scale transformation, which increases the diversity of the training and test samples; this helps to avoid overfitting and also improves model performance to some extent.
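The class-balancing idea in step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: images are represented as nested lists of pixel rows, the dictionary layout of `dataset` and the function names are hypothetical, and only the two flips are shown (rotation, cropping and scaling are omitted for brevity).

```python
import random

def hflip(img):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror an image top to bottom."""
    return [row[:] for row in img[::-1]]

def balance_classes(dataset, seed=0):
    """Oversample minority classes with random flips until every class
    has as many images as the largest class.

    `dataset` maps a class label to a list of images (hypothetical layout).
    """
    rng = random.Random(seed)
    target = max(len(images) for images in dataset.values())
    balanced = {}
    for label, images in dataset.items():
        expanded = list(images)
        while len(expanded) < target:
            source = rng.choice(images)            # randomly draw an existing sample
            transform = rng.choice([hflip, vflip])  # pick one expansion transform
            expanded.append(transform(source))
        balanced[label] = expanded
    return balanced
```

Calling `balance_classes` on a dictionary with one large and one small class returns a dictionary in which both classes have the size of the larger one.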

Step 3, preprocessing the expanded, complete ancient font image data set. The preprocessing comprises grayscale conversion, proportional image scaling, image edge padding, histogram equalization, a connected-domain noise reduction algorithm and an image binarization algorithm based on fuzzy set theory. Each original ancient font image is processed into a square image, because the input to a convolutional neural network model is typically square.

First, the original ancient font image is converted to grayscale;

Second, the size of the grayscale image (height, width and number of channels) is obtained through reshape, and the image is scaled with the resize() function so that its longer side equals the target value;

Then, edge padding is applied along the shorter side: the image is extended outwards using the pixel values at the image boundary, with the padding on each side equal to half of the difference from the target size, giving a square image of the set target size;

Next, histogram equalization is applied to the square image, transforming its uneven gray-level distribution so that it occupies the whole gray-level range and details become richer. After histogram equalization, the image is denoised with an N8-connected denoising algorithm, which examines the 8-neighborhood of each pixel to remove isolated noise points.

Finally, the image is binarized using fuzzy set theory. A fuzzy set X relating each pixel to the foreground and background thresholds is first established, that is, a fuzzy subset mapping the image X to the interval [0, 1] is defined; a complete fuzzy matrix is then built by dynamically adjusting the threshold; finally, the minimum information entropy E of the fuzzy matrix of the whole image is found with the Shannon entropy function, and the threshold corresponding to that fuzzy matrix is the binarization segmentation threshold of the image.
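The geometric and intensity steps above (scale to the longer side, pad the shorter side, equalize, remove isolated points) can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, images are nested lists of gray values in 0..255, and the rule that a pixel counts as ink when its value is at most `ink_max` is an assumption, since the text does not say how an isolated point is detected on a grayscale image.

```python
def square_geometry(height, width, target):
    """Return the scaled (height, width) and the (top, bottom, left, right)
    padding that together yield a target x target image."""
    scale = target / max(height, width)      # the longer side is the reference
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    pad_h, pad_w = target - new_h, target - new_w
    # Half of the size difference goes on each side; an odd pixel goes bottom/right.
    padding = (pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2)
    return (new_h, new_w), padding

def equalize_histogram(img, levels=256):
    """Spread the gray levels of img over the full range [0, levels - 1]."""
    hist = [0] * levels
    for row in img:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:                     # constant image: nothing to spread
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[value] for value in row] for row in img]

def remove_isolated_pixels(img, ink_max=127, paper=255):
    """Replace ink pixels that have no ink pixel in their 8-neighborhood."""
    height, width = len(img), len(img[0])
    cleaned = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            if img[y][x] > ink_max:
                continue                     # not an ink pixel
            has_ink_neighbour = any(
                img[y + dy][x + dx] <= ink_max
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
                and 0 <= y + dy < height and 0 <= x + dx < width
            )
            if not has_ink_neighbour:
                cleaned[y][x] = paper
    return cleaned
```

For a 300 × 150 image and a hypothetical target size of 224, `square_geometry` gives a 224 × 112 scaled image with 56 pixels of padding on the left and on the right.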

Step 4, defining a convolutional neural network model: a convolutional neural network based on the VGG19 model is used, with the images preprocessed in step 3 as input. First, each convolution uses a 3 × 3 kernel with stride 1 and padding 1, which preserves the input height and width, and each max pooling layer uses a 2 × 2 window with a down-sampling stride of 2. Second, a BatchNorm layer is added after each convolution layer so that the input of each layer keeps the same distribution during training, making the deep network easier and more stable to train. A nonlinear ReLU activation function is then applied after each BatchNorm layer to speed up convergence. Three fully connected layers follow, with random inactivation (dropout) used as the regularization method for the fully connected layers; this reduces the dependence among neurons to some extent, helps to prevent overfitting and improves the generalization of the network. Finally, the 5-dimensional output of the last fully connected layer is passed to a Softmax function; the fully connected layers map the network features to the label space of the samples to make the prediction.
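To make the layer arithmetic of step 4 concrete, the sketch below traces feature-map sizes through the standard VGG19 layout (16 convolution layers in five blocks, each block followed by a 2 × 2 max pool). The input size of 224 and the single input channel are assumptions for illustration; the text does not state the target image size.

```python
# VGG19 feature extractor: numbers are convolution output channels,
# "M" marks a 2x2 max-pooling layer with stride 2.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution; 3x3 / stride 1 / padding 1 keeps the size."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling; 2x2 / stride 2 halves the size."""
    return (size - kernel) // stride + 1

def trace_shapes(input_size, in_channels=1):
    """List (layer kind, channels, spatial size) for every feature layer."""
    channels, size = in_channels, input_size
    layers = []
    for item in VGG19_CFG:
        if item == "M":
            size = pool_out(size)
            layers.append(("maxpool", channels, size))
        else:
            channels, size = item, conv_out(size)
            layers.append(("conv-bn-relu", channels, size))
    return layers
```

With a 224 × 224 input, the trace ends at 512 channels of size 7 × 7, which would then be flattened and fed to the three fully connected layers ending in 5 outputs.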

Step 5, defining an objective function, which measures the error between the predicted value and the true sample label. For the classification task, the center loss function is used together with the traditional cross-entropy loss function. While the inter-class distance is taken into account, the center loss also pays attention to reducing the intra-class variation, so that the features become more discriminative as the intra-class variation shrinks, that is, the classes become increasingly distinct. In terms of classification performance, a network trained with the combination of center loss and cross-entropy loss outperforms one that uses cross-entropy loss alone as the objective function: classification accuracy is pursued by both increasing the inter-class distance and reducing the intra-class distance, which also improves the discriminative power of the features.

Step 6, defining an optimizer and setting a suitable learning rate for the model: the initial learning rate is set to 0.001 and is reduced as the number of training epochs increases. The decay mechanism is as follows: if the loss stops decreasing for two or more training epochs, the learning rate is decreased to

The model is trained and its parameters solved with a momentum-based stochastic gradient descent optimization algorithm. The momentum factor μ is set dynamically: its initial value is 0.5, and it gradually increases to 0.9 as the number of training epochs grows. This effectively suppresses oscillation and promotes convergence in the middle and later stages of training, and it helps the network parameters escape a local minimum when they oscillate back and forth near it, so that better network parameters are found.
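The optimizer behaviour described in step 6 can be sketched as follows. Several details are assumptions rather than values from the text: the momentum factor is ramped linearly from 0.5 to 0.9, the learning rate is multiplied by a hypothetical `factor` of 0.1 when the loss stalls, and all names are illustrative.

```python
def momentum_at(epoch, total_epochs, start=0.5, end=0.9):
    """Momentum factor for a given epoch (linear ramp; the ramp shape is an assumption)."""
    if total_epochs <= 1:
        return end
    fraction = min(epoch, total_epochs - 1) / (total_epochs - 1)
    return start + (end - start) * fraction

class PlateauDecay:
    """Shrink the learning rate when the loss has not improved for `patience` epochs."""

    def __init__(self, lr=0.001, patience=2, factor=0.1):
        # `factor` is hypothetical: the text does not give the decay value.
        self.lr, self.patience, self.factor = lr, patience, factor
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        """Record the loss of one epoch and return the learning rate to use next."""
        if loss < self.best:
            self.best, self.stale = loss, 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr

def sgd_momentum_step(weight, grad, velocity, lr, mu):
    """One momentum-SGD update for a single scalar parameter."""
    velocity = mu * velocity - lr * grad
    return weight + velocity, velocity
```

In a training loop, `momentum_at(epoch, total_epochs)` and `PlateauDecay.step(epoch_loss)` would be evaluated once per epoch and their results passed to the parameter update.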

Step 7, training the network: when the network training module trains the convolutional neural network, 80% of the data samples in the data set from step 3 are first selected as the training sample set. The training data are randomly shuffled so that the samples "seen" by the model differ between training epochs; this not only speeds up convergence but also improves the model's predictions on the test data set. The objective function from step 5 and the optimizer from step 6 are applied, the network parameters are adjusted and the metrics are recorded. The network model from step 4 is trained on the data samples, and the trained model is saved so that it can be loaded later.
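The 80% split and per-epoch shuffling of step 7 can be sketched with the standard library. The function names and the batch size are hypothetical; the sketch only shows the data handling, not the network update.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle once, then split into training and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def epoch_batches(train_set, batch_size, rng):
    """Yield mini-batches in a fresh random order, so each epoch sees a different sequence."""
    order = list(train_set)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]
```

A training loop would call `epoch_batches(train, batch_size, rng)` once per epoch with the same `rng` object, so that the order changes from epoch to epoch.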

Step 8, testing the network: the network test module evaluates the model with a confusion matrix, a tool that quantifies the accuracy of a classification algorithm and presents classification performance visually. The model's predictions are compared with the test data, the classification effect is measured with the accuracy metric, and the accuracy for each ancient font type and the overall accuracy are finally obtained.
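The confusion-matrix evaluation of step 8 can be illustrated with a short sketch. Labels are assumed to be integers starting at 0, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts the samples of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly (row-wise)."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

For the five font classes, `num_classes` would be 5, and the diagonal of the matrix holds the correctly classified samples of each class.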

The specific implementation of step 3 is as follows:

Definition of the fuzzy set X:

$$X = \{(x_{mn}, \mu_x(x_{mn}))\}$$

In the above formula, $x_{mn}$ represents the gray value of the pixel (m, n). For binarization, each pixel should be closely related to the class (foreground or background) to which it belongs; therefore $\mu_x(x_{mn})$ is used to express the degree of association between the pixel gray value $x_{mn}$ and the foreground/background threshold, i.e. the membership degree of the pixel (m, n) in the fuzzy set X:

In the above formula, $\mu_0$ represents the background pixel mean, $\mu_1$ represents the foreground pixel mean, t represents the selected image gray threshold, and C represents the maximum pixel gray difference.
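A membership function that is consistent with the symbols defined here ($\mu_0$, $\mu_1$, t, C) is the one used in Huang and Wang's fuzzy thresholding method. The form below is given as an assumption for orientation; it is not confirmed by this text, and which side of t counts as background is likewise assumed.

```latex
\mu_x(x_{mn}) =
\begin{cases}
\dfrac{1}{1 + \lvert x_{mn} - \mu_0 \rvert / C}, & x_{mn} \le t \\[2ex]
\dfrac{1}{1 + \lvert x_{mn} - \mu_1 \rvert / C}, & x_{mn} > t
\end{cases}
```

Under this form the membership degree always lies in [1/2, 1], and it equals 1 when a pixel's gray value coincides with the mean of its class.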

Definition of the minimum information entropy E of the image fuzzy matrix, based on the histogram:

In the above formula, MN is the total number of image pixels, g is a gray level of the image, $\mu_x(g)$ is the membership degree of gray level g, h(g) is the number of pixels with gray level g, and S is the Shannon function, expressed as:

$$S(\mu_A(x_i)) = -\mu_A(x_i)\ln[\mu_A(x_i)] - [1-\mu_A(x_i)]\ln[1-\mu_A(x_i)]$$

In the above formula, $\mu_A(x_i)$ is the probability of occurrence of $x_i$ in set A. The Shannon entropy function is used to measure the fuzziness of an image, i.e. the fuzziness of a fuzzy set.
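A histogram-based entropy that matches the symbols listed here (MN, g, $\mu_x(g)$, h(g), S) is the normalized form used in Huang and Wang's method. It is shown as an assumption; the normalizing constant in particular is not confirmed by this text.

```latex
E(X) = \frac{1}{MN \ln 2} \sum_{g} S\bigl(\mu_x(g)\bigr)\, h(g)
```

Because S is zero when the membership degree is 1 and largest when it is 1/2, E is small when most pixels lie close to the mean of their own class, which is why the threshold that minimizes E is chosen.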

The objective function described in step 5 is specifically as follows:

The final objective function of the network can be expressed as:

In the above formula, λ is the adjustment parameter between the two loss functions: the larger λ is, the greater the share of the intra-class variation in the whole objective function, and vice versa. N is the number of training samples; the input feature of the i-th sample at the last classification layer of the network is $x_i$, and its true label is $y_i \in \{1, 2, \ldots, C\}$; $h = (h_1, h_2, \ldots, h_C)^T$ is the final output of the network, i.e. the prediction for sample i. In the cross-entropy loss function $L_{cross entropy loss}$, C is the number of classes; in the center loss function $L_{center loss}$, the class center is the mean ("center") of all deep features of class $y_i$.
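The standard way of combining the two losses, consistent with the symbols described here, is sketched below. The symbol $c_{y_i}$ for the class center and the scaling constants (1/N and 1/2) are assumptions taken from the usual formulation of center loss, not from this text.

```latex
L_{final} = L_{cross entropy loss} + \lambda\, L_{center loss}

L_{cross entropy loss} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(h_{y_i})}{\sum_{j=1}^{C} \exp(h_j)}

L_{center loss} = \frac{1}{2} \sum_{i=1}^{N} \lVert x_i - c_{y_i} \rVert_2^2
```

Setting λ to 0 recovers plain cross-entropy training, while larger values pull the features of each class more strongly towards their center.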

The specific implementation of step 6 is as follows:

The learning rate decay formula is defined as:

In the above formula, p is the number of training epochs.

The invention has the following beneficial effects:

The method classifies ancient Chinese fonts with a convolutional neural network and uses a complete sample set covering 5 standard ancient font types. In preprocessing, histogram equalization makes image details more pronounced; connected-domain noise reduction lessens the influence of unnecessary noise on the prediction result; and binarization based on fuzzy set theory presents the image feature information effectively, so the edge features of the font image can be distinguished well. The VGGNet model generalizes well, and the deeper convolutional architecture achieves good performance. Batch normalization stabilizes the learning process and effectively improves the convergence rate of the model. The objective function combines the center loss function with the cross-entropy loss function, pursuing accurate classification by increasing the inter-class distance and reducing the intra-class distance, which effectively improves the discriminative power of the features. With suitable training techniques and well-chosen network parameters, optimization algorithm and learning rate, the network is more stable, the results are more reliable, and the accuracy of ancient font classification is greatly improved.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The invention is further illustrated by the following figures and examples.

As shown in FIG. 1, the ancient font classification method based on the convolutional neural network specifically comprises the following steps:

Step 1, acquiring a data set: single calligraphy characters pre-segmented in the CADAL digital library are crawled using Beautiful Soup. The HTML of the web page is first fetched to obtain its source code; the content is then passed to Beautiful Soup and parsed into an object that can be processed; the image links in the img tags are obtained by searching the document tree; the images are downloaded through these links to a specified file location; and images of five standard ancient font styles are finally obtained, forming the ancient font image data set required by the experiment.
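The link-extraction part of step 1 can be sketched as follows. To keep the example free of third-party dependencies it uses Python's standard `html.parser` as a stand-in for Beautiful Soup (with Beautiful Soup the same links would typically come from `soup.find_all("img")`). The class and function names and the example URL are hypothetical, and the download step is omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageLinkParser(HTMLParser):
    """Collect the absolute URL of every <img src=...> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:                      # skip <img> tags without a source
                self.links.append(urljoin(self.base_url, src))

def extract_image_links(html, base_url):
    """Parse page source and return the image links it contains."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Each returned link could then be saved to a chosen folder, for example with `urllib.request.urlretrieve(link, path)`.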

Step 2, data expansion: the number of data samples in the ancient font image data set obtained in step 1 is expanded, targeting the classes with fewer samples. Existing sample images are randomly selected and transformed by horizontal/vertical flipping, small-range rotation, cropping by a supervised data expansion method, and scale transformation, which increases the diversity of the training and test samples; this helps to avoid overfitting and also improves model performance to some extent.

First, the original ancient font image is converted to grayscale;

Finally, the image is binarized using fuzzy set theory. A fuzzy set X relating each pixel to the foreground and background thresholds is first established, that is, a fuzzy subset mapping the image X to the interval [0, 1] is defined; a complete fuzzy matrix is then built by dynamically adjusting the threshold; finally, the minimum information entropy E of the fuzzy matrix of the whole image is found with the Shannon entropy function, and the threshold corresponding to that fuzzy matrix is the binarization segmentation threshold of the image. The fuzzy set X is defined as:

$$X = \{(x_{mn}, \mu_x(x_{mn}))\}$$

$$S(\mu_A(x_i)) = -\mu_A(x_i)\ln[\mu_A(x_i)] - [1-\mu_A(x_i)]\ln[1-\mu_A(x_i)]$$

In the above formula, $\mu_A(x_i)$ is the probability of occurrence of $x_i$ in set A. The Shannon entropy function is used to measure the fuzziness of an image, i.e. the fuzziness of a fuzzy set. The threshold t at which the Shannon entropy value is smallest over the whole search is taken as the final segmentation threshold.
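The threshold search described above can be sketched as an exhaustive scan over candidate thresholds. The membership function and the entropy normalization follow Huang and Wang's fuzzy thresholding method, which is an assumption about the exact formulas intended here; the function names are illustrative.

```python
import math

def shannon(mu):
    """Shannon function S(mu); taken as 0 at the end points mu = 0 and mu = 1."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)

def fuzzy_threshold(hist):
    """Return the gray level t that minimizes the fuzzy entropy of a histogram."""
    levels = [g for g, count in enumerate(hist) if count > 0]
    g_min, g_max = levels[0], levels[-1]
    if g_min == g_max:
        return g_min                       # constant image: nothing to separate
    c = g_max - g_min                      # C: maximum gray-level difference
    total = sum(hist)                      # MN: total number of pixels
    best_t, best_entropy = g_min, float("inf")
    for t in range(g_min, g_max):          # both sides of t stay non-empty
        low = range(g_min, t + 1)
        high = range(t + 1, g_max + 1)
        n_low = sum(hist[g] for g in low)
        n_high = sum(hist[g] for g in high)
        mean_low = sum(g * hist[g] for g in low) / n_low
        mean_high = sum(g * hist[g] for g in high) / n_high
        entropy = 0.0
        for g in range(g_min, g_max + 1):
            mean = mean_low if g <= t else mean_high
            mu = 1.0 / (1.0 + abs(g - mean) / c)   # membership of gray level g
            entropy += shannon(mu) * hist[g]
        entropy /= total * math.log(2)
        if entropy < best_entropy:
            best_t, best_entropy = t, entropy
    return best_t

def binarize(img, t):
    """Map gray values above t to 255 and all others to 0."""
    return [[255 if value > t else 0 for value in row] for row in img]
```

The scan costs on the order of the square of the number of gray levels, which is small for 256 levels; when several thresholds give the same entropy, the sketch keeps the lowest one.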

Step 5, defining an objective function, which measures the error between the predicted value and the true sample label. For the classification task, the center loss function is used together with the traditional cross-entropy loss function. While the inter-class distance is taken into account, the center loss also pays attention to reducing the intra-class variation, so that the features become more discriminative as the intra-class variation shrinks, that is, the classes become increasingly distinct. In terms of classification performance, a network trained with the combination of center loss and cross-entropy loss outperforms one that uses cross-entropy loss alone as the objective function: classification accuracy is pursued by both increasing the inter-class distance and reducing the intra-class distance, which also improves the discriminative power of the features. The final objective function of the network can be expressed as follows:

Here the class center is the mean ("center") of all deep features of class $y_i$.
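A plain-Python sketch of the combined objective is given below. It assumes the usual formulation (softmax cross-entropy averaged over the samples, plus λ times half the summed squared distance of each feature to its class center); labels are zero-based here for indexing convenience, and all function names are illustrative.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; `logits` is the network output h."""
    peak = max(logits)                     # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(h - peak) for h in logits))
    return log_sum - logits[label]

def class_centers(features, labels, num_classes):
    """Mean deep feature of each class (the class "center")."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for k, value in enumerate(x):
            sums[y][k] += value
    return [[s / counts[c] for s in sums[c]] if counts[c] else sums[c]
            for c in range(num_classes)]

def center_loss(features, labels, centers):
    """Half the summed squared distance between each feature and its class center."""
    return 0.5 * sum(
        sum((value - centre) ** 2 for value, centre in zip(x, centers[y]))
        for x, y in zip(features, labels)
    )

def joint_loss(logits, features, labels, centers, lam):
    """Mean cross-entropy plus lam times the center loss."""
    ce = sum(cross_entropy(h, y) for h, y in zip(logits, labels)) / len(labels)
    return ce + lam * center_loss(features, labels, centers)
```

In practice the centers are usually updated per mini-batch during training rather than recomputed over the whole data set; the full recomputation here is only for clarity.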

The model is trained and its parameters solved with a momentum-based stochastic gradient descent optimization algorithm. The momentum factor μ is set dynamically: its initial value is 0.5, and it gradually increases to 0.9 as the number of training epochs grows. This effectively suppresses oscillation and promotes convergence in the middle and later stages of training, and it helps the network parameters escape a local minimum when they oscillate back and forth near it, so that better network parameters are found. The learning rate decay formula is defined as:

In the above formula, p is the number of training epochs.

Claims

1. An ancient font classification method based on a convolutional neural network, characterized by comprising the following steps:

step 1, acquiring a data set: single calligraphy characters pre-segmented in the CADAL digital library are crawled using Beautiful Soup to obtain images of five standard ancient font types, and the ancient font image data set required by the experiment is formed from these images;

step 2, data expansion: the number of data samples in the ancient font image data set obtained in step 1 is expanded, targeting the classes with fewer samples; the expansion comprises randomly selecting existing sample images and transforming them by horizontal/vertical flipping, small-range rotation, cropping by a supervised data expansion method, and scale transformation, which increases the diversity of the training and test samples; finally, the number of images of each ancient font class is made equal to obtain a complete data set;

step 3, preprocessing the images of the expanded complete data set and processing each into a square image; the preprocessing comprises grayscale conversion, proportional image scaling, image edge padding, histogram equalization, a connected-domain noise reduction algorithm and an image binarization algorithm based on fuzzy set theory;

step 4, defining a convolutional neural network model: a convolutional neural network based on the VGG19 model is used, with the images preprocessed in step 3 as input;

step 5, defining an objective function, which measures the error between the predicted value and the true sample label; for the classification task, the center loss function is used together with the traditional cross-entropy loss function;

the model is trained and its parameters solved using a momentum-based stochastic gradient descent optimization algorithm; the momentum factor μ is set dynamically, with an initial value of 0.5 that gradually increases to 0.9 as the number of training epochs grows, so that oscillation is effectively suppressed and better network parameters are found;

step 7, network training: when training the convolutional neural network, 80% of the data samples in the data set from step 3 are first selected as the training sample set, and the training data are randomly shuffled so that the samples "seen" by the model differ between training epochs; the objective function from step 5 and the optimizer from step 6 are applied, the network parameters are adjusted and the metrics are recorded; the network model from step 4 is trained on the data samples, and the trained model is saved to allow fast loading later;

step 8, network testing: the model is evaluated with a confusion matrix, a tool that quantifies the accuracy of a classification algorithm and presents classification performance visually; the model's predictions are compared with the test data, the classification effect is measured with the accuracy metric, and the accuracy for each ancient font type and the overall accuracy are finally obtained.

2. The ancient font classification method based on the convolutional neural network as claimed in claim 1, wherein the preprocessing of the images of the expanded complete data set in step 3 is specifically implemented as follows:

first, the original ancient font image is converted to grayscale;

second, the size of the image, comprising the height, the width and the number of channels, is obtained through reshape; the image is scaled with the resize() function so that its longer side equals the target value;

then, edge padding is applied along the shorter side: the image is extended outwards using the pixel values at the image boundary, with the padding on each side equal to half of the difference from the target size, giving a square image of the set target size;

next, histogram equalization is applied to the square image, transforming its uneven gray-level distribution so that it occupies the whole gray-level range; after histogram equalization, the image is denoised with an N8-connected denoising algorithm, which examines the 8-neighborhood of each pixel to remove isolated noise points;

finally, the image is binarized using fuzzy set theory: a fuzzy set X relating each pixel to the foreground and background thresholds is first established, that is, a fuzzy subset mapping the image X to the interval [0, 1] is defined; a complete fuzzy matrix is then built by dynamically adjusting the threshold; finally, the minimum information entropy E of the fuzzy matrix of the whole image is found with the Shannon entropy function, and the threshold corresponding to that fuzzy matrix is the binarization segmentation threshold of the image.

3. The ancient font classification method based on the convolutional neural network as claimed in claim 2, wherein the step 3 is implemented as follows:

definition of the fuzzy set X:

$$X = \{(x_{mn}, \mu_x(x_{mn}))\}$$

in the above formula, $x_{mn}$ represents the gray value of the pixel (m, n); for binarization, each pixel should be closely related to the class to which it belongs; therefore $\mu_x(x_{mn})$ is used to express the degree of association between the pixel gray value $x_{mn}$ and the foreground/background threshold, i.e. the membership degree of the pixel (m, n) in the fuzzy set X:

in the above formula, $\mu_0$ represents the background pixel mean, $\mu_1$ represents the foreground pixel mean, t represents the selected image gray threshold, and C represents the maximum pixel gray difference;

in the above formula, MN is the total number of image pixels, g is a gray level of the image, $\mu_x(g)$ is the membership degree of gray level g, h(g) is the number of pixels with gray level g, and S is the Shannon function, expressed as:

$$S(\mu_A(x_i)) = -\mu_A(x_i)\ln[\mu_A(x_i)] - [1-\mu_A(x_i)]\ln[1-\mu_A(x_i)]$$

in the above formula, $\mu_A(x_i)$ is the probability of occurrence of $x_i$ in set A; the Shannon entropy function is used to measure the fuzziness of an image, i.e. the fuzziness of a fuzzy set.

4. The ancient font classification method based on the convolutional neural network as claimed in claim 3, wherein the step 4 is implemented as follows:

first, each convolution uses a 3 × 3 kernel with stride 1 and padding 1, which preserves the input height and width, and each max pooling layer uses a 2 × 2 window with a down-sampling stride of 2; second, a BatchNorm layer is added after each convolution layer so that the input of each layer keeps the same distribution during training; a nonlinear ReLU activation function is then applied after each BatchNorm layer; three fully connected layers follow, with random inactivation (dropout) used as the regularization method for the fully connected layers; finally, the 5-dimensional output of the last fully connected layer is passed to a Softmax function, the fully connected layers mapping the network features to the label space of the samples to make the prediction.

5. The ancient font classification method based on the convolutional neural network as claimed in claim 4, wherein the objective function in step 5 is specifically as follows:

the final objective function of the network can be expressed as:

in the above formula, λ is the adjustment parameter between the two loss functions: the larger λ is, the greater the share of the intra-class variation in the whole objective function, and vice versa; N is the number of training samples; the input feature of the i-th sample at the last classification layer of the network is $x_i$, and its true label is $y_i \in \{1, 2, \ldots, C\}$; $h = (h_1, h_2, \ldots, h_C)^T$ is the final output of the network, i.e. the prediction for sample i; in the cross-entropy loss function $L_{cross entropy loss}$, C is the number of classes; in the center loss function $L_{center loss}$, the class center is the mean of all deep features of class $y_i$.

6. The ancient font classification method based on the convolutional neural network as claimed in claim 5, wherein the step 6 is implemented as follows:

the learning rate decay formula is defined as:

in the above formula, p is the number of training epochs.