CN109800754B - Ancient font classification method based on convolutional neural network - Google Patents

Ancient font classification method based on convolutional neural network

Info

Publication number
CN109800754B
Authority
CN
China
Prior art keywords
image
ancient
training
network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811487296.9A
Other languages
Chinese (zh)
Other versions
CN109800754A (en
Inventor
吴以凡
赵月
张桦
戴国骏
史建凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201811487296.9A priority Critical patent/CN109800754B/en
Publication of CN109800754A publication Critical patent/CN109800754A/en
Application granted granted Critical
Publication of CN109800754B publication Critical patent/CN109800754B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses an ancient font classification method based on a convolutional neural network. First, an image data set of ancient font categories is crawled with a web crawler, and data augmentation is used to balance the training samples across classes. The balanced samples are converted to grayscale and resized to the target image size, histogram equalization is applied to the sample set, isolated noise points are removed with an N8-connected noise reduction algorithm, and finally the images are binarized using a Shannon entropy function based on fuzzy set theory, which preserves the detail features of the image well. For the classification objective, the center loss function is combined with the traditional cross-entropy loss function, which increases the inter-class distance, reduces the intra-class distance, and improves the discriminative power of the features to a certain extent. The preprocessed images are used to train a predefined network model, and the accuracy of the classification results is evaluated with a confusion matrix.

Description

Ancient font classification method based on convolutional neural network
Technical Field
The invention relates to the field of traditional Chinese character image processing, and in particular to a method for classifying ancient calligraphy fonts based on a convolutional neural network.
Background
Chinese characters have been in use for thousands of years and are an important part of traditional Chinese art and culture. Over time, however, old written works have weathered and been damaged, so it is necessary to protect them with advanced techniques. The invention provides a preprocessing (denoising) algorithm for ancient Chinese calligraphy works and, on that basis, uses a convolutional neural network to classify the data set with better accuracy. Most ancient fonts (traditional Chinese calligraphy) were written with a traditional Chinese writing brush, whose strokes are much thicker and heavier than those of a hard-tipped pen, so the characters carry more shape information; weathered works, however, contain a great deal of noise, which strongly affects classification.
In recent years, large volumes of ancient calligraphy have been digitized for research and for widespread artistic practice, so the demand for ancient font recognition and classification is growing. Many existing solutions rely on feature extraction combined with k-nearest-neighbor techniques; even after image preprocessing their results are not significant, and they are generally used for character recognition and for extracting single features. Convolutional neural networks, on the other hand, have been widely used for handwritten character recognition, but research directed at ancient Chinese fonts is lacking. The invention therefore explores applying a convolutional neural network to the recognition of ancient font styles, with the goal of systematic classification, laying a solid foundation for subsequent accurate recognition and for the research and management of ancient fonts. To this end, the invention improves the data preprocessing and trains a convolutional neural network model with optimized parameter settings and suitable training techniques to achieve better classification performance.
Disclosure of Invention
The invention aims to provide an ancient font classification method based on a convolutional neural network.
The method addresses font style classification by applying a deep-learning convolutional neural network to traditional Chinese calligraphy fonts. First, the data set images are preprocessed by combining histogram equalization with an image binarization algorithm based on fuzzy set theory; a convolutional neural network is then trained on the preprocessed sample set to classify it. Experimental results show that the method classifies and recognizes degraded Chinese characters more accurately.
In this method, classification means that the constructed model, after supervised learning, establishes a discrete mapping from inputs to classes. The implementation comprises a data set acquisition module, a data expansion module, an image preprocessing module, a convolutional neural network model module, an objective function module, an optimizer module, a network training module and a network testing module. The technical solution comprises the following steps:
Step 1, acquiring a data set: single calligraphy characters pre-segmented in the CADAL digital library are crawled with Beautiful Soup to obtain images of five standard ancient font types, which together form the ancient font image data set required by the experiment.
Step 2, expanding the ancient font image data set obtained in step 1. The crawler collects different numbers of samples for different font styles, so to facilitate model training the classes with fewer samples are expanded: existing sample images are randomly selected and augmented by horizontal/vertical flipping, small-range rotation, supervised data-augmentation cropping, and scale transformation. This increases the diversity of the training and test samples, which helps avoid overfitting and improves model performance to a certain degree.
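The class-balancing idea in step 2 can be illustrated with a minimal sketch. It assumes images are plain 2-D lists of gray values and uses only the two flip operations; rotation, cropping and scale transformation are omitted. The function names and the dictionary layout are illustrative, not taken from the patent.

```python
import random

def hflip(img):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror a 2-D image top to bottom."""
    return [row[:] for row in img[::-1]]

def balance_classes(samples, rng=None):
    """Oversample smaller classes with random flips until every class
    has as many images as the largest class.

    samples: dict mapping a class name to a list of 2-D images.
    """
    rng = rng or random.Random(0)
    target = max(len(imgs) for imgs in samples.values())
    balanced = {}
    for label, imgs in samples.items():
        out = list(imgs)
        while len(out) < target:
            src = rng.choice(imgs)            # randomly pick an existing sample
            op = rng.choice((hflip, vflip))   # randomly pick a transform
            out.append(op(src))
        balanced[label] = out
    return balanced
```

Calling `balance_classes({"a": [...3 images...], "b": [...1 image...]})` returns three images for each class; the originals are kept and only the shortfall is filled with augmented copies.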
Step 3, preprocessing the expanded, complete ancient font image data set. The preprocessing comprises grayscale conversion, proportional (aspect-preserving) scaling, edge padding, histogram equalization, connected-domain noise reduction, and image binarization based on fuzzy set theory. Each original image is processed into a square image because the input to a convolutional neural network model is typically square.
First, the original ancient font image is converted to grayscale.
Second, the size of the grayscale image (height, width and number of channels) is obtained through reshape, and the image is scaled with the resize() function so that its longer side equals the target size.
Then the shorter side is edge-padded: the image is extended outward using the pixel values at the image boundary, and the number of pixels added on each side is half the difference from the target size, yielding a square image of the target size.
Next, histogram equalization is applied to the square image, transforming its uneven gray-level distribution so that it spans the whole gray-level range and details become richer. After equalization, the image is denoised with the N8-connected noise reduction algorithm, which examines the 8-neighborhood of each pixel to remove isolated noise points.
Finally, the image is binarized using fuzzy set theory. A fuzzy set X relating each pixel to the foreground/background threshold is established, i.e. a fuzzy subset mapping the image X to the interval [0,1] is defined; a complete fuzzy matrix is then built by dynamically adjusting the threshold; and the minimum information entropy E of the whole image's fuzzy matrix is found with the Shannon entropy function. The threshold corresponding to that fuzzy matrix is the binarization segmentation threshold.
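The scaling and edge-padding sub-steps above can be sketched as follows. This is a dependency-free illustration on 2-D lists, assuming nearest-neighbor interpolation (the patent does not name the interpolation used by resize()) and border replication for the padding; `resize_nearest` and `to_square` are hypothetical helper names.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor resize of a 2-D image (list of rows)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def to_square(img, target):
    """Scale so the longer side equals `target`, then pad the shorter
    side by replicating border pixels, half of the gap on each side."""
    h, w = len(img), len(img[0])
    scale = target / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    img = resize_nearest(img, new_h, new_w)
    # Split the missing rows/columns between the two opposite sides.
    top = (target - new_h) // 2
    bottom = target - new_h - top
    left = (target - new_w) // 2
    right = target - new_w - left
    # Extend each row with its first/last pixel, then repeat edge rows.
    rows = [[row[0]] * left + row + [row[-1]] * right for row in img]
    return ([rows[0][:] for _ in range(top)] + rows +
            [rows[-1][:] for _ in range(bottom)])
```

When the gap is odd, the extra pixel goes to the bottom or right side; the patent only states that each side receives half the difference, so this tie-break is an assumption.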
Step 4, defining the convolutional neural network model: a convolutional neural network based on the VGG19 model is used, with the images preprocessed in step 3 as input. First, the model uses 3 × 3 convolution kernels with stride 1 and padding 1 to preserve the input height and width, and 2 × 2 max-pooling windows with a down-sampling stride of 2. Second, a BatchNorm layer is added after each convolutional layer so that the input to each layer keeps the same distribution during training, making the deep network easier and more stable to train. A nonlinear ReLU activation follows each BatchNorm layer to speed up convergence. Three fully connected layers are then attached, with dropout as the regularization method for the fully connected layers; this reduces the dependence among neurons to a certain extent, helps prevent overfitting, and markedly improves generalization. Finally, the output of the last fully connected layer, of dimension 5, is passed to a Softmax function; the fully connected layers map the network features to the label space of the samples to make the prediction.
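To make the layer arithmetic in step 4 concrete, the sketch below walks the standard VGG19 configuration (16 convolutional layers in five blocks, then 3 fully connected layers) and tracks the feature-map size. The single input channel, the 224-pixel input and the 4096-unit hidden layers are assumptions borrowed from the original VGG design; the patent does not state them.

```python
# Standard VGG19 layout: numbers are conv output channels, "M" is max pooling.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution; 3x3/stride 1/padding 1 keeps it."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling; 2x2/stride 2 halves it."""
    return (size - kernel) // stride + 1

def vgg19_plan(input_size, num_classes=5, fc_width=4096):
    """Return (layers, flattened_feature_count) for a square input."""
    layers, size, channels = [], input_size, 1  # 1 channel: binarized image
    for item in VGG19_CFG:
        if item == "M":
            size = pool_out(size)
            layers.append(("maxpool", channels, size))
        else:
            size = conv_out(size)
            channels = item
            layers.append(("conv-bn-relu", channels, size))
    flat = channels * size * size
    layers += [("fc-dropout", fc_width, 1),
               ("fc-dropout", fc_width, 1),
               ("fc-softmax", num_classes, 1)]
    return layers, flat
```

Because every convolution preserves height and width, only the five pooling layers change the spatial size, so a 224 × 224 input reaches the first fully connected layer as 512 maps of 7 × 7.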
Step 5, defining the objective function, which measures the error between the predicted value and the ground-truth label. For this classification task, the center loss function is used together with the traditional cross-entropy loss function. While the inter-class distance is taken into account, the center loss also focuses on reducing intra-class variation, so that the features become more discriminative and the classes more clearly separated. In classification performance, combining the center loss with the cross-entropy loss outperforms a network that uses cross-entropy loss alone as the objective: classification is made accurate by both increasing the inter-class distance and reducing the intra-class distance, which also improves the discriminative power of the features.
Step 6, defining the optimizer and setting a suitable learning rate for the model. The initial learning rate is set to 0.001 and is reduced as the number of training epochs increases. The reduction mechanism is: if the loss stops decreasing for two or more training epochs, the learning rate is decreased to
Figure BDA0001894850290000041
The model is trained and its parameters solved with a momentum-based stochastic gradient descent optimization algorithm. The momentum factor μ is set dynamically: its initial value is 0.5, and it gradually increases to 0.9 as the number of training epochs grows. This effectively suppresses oscillation, lets the network converge in the middle and later stages of training, and helps the parameters escape when they oscillate back and forth near a local minimum, so that better network parameters are found.
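A minimal sketch of the momentum mechanism follows. The patent says only that μ moves "gradually" from 0.5 to 0.9, so the linear ramp here is an assumption, as are the function names; the update rule is the classical momentum form v ← μ·v − lr·g, w ← w + v.

```python
def momentum_at(epoch, total_epochs, start=0.5, end=0.9):
    """Momentum factor for a 0-based epoch, ramped linearly (assumed)."""
    if total_epochs <= 1:
        return end
    frac = min(1.0, epoch / (total_epochs - 1))
    return start + (end - start) * frac

def sgd_momentum_step(params, grads, velocity, lr, mu):
    """One classical momentum update on flat lists of floats."""
    new_v = [mu * v - lr * g for v, g in zip(velocity, grads)]
    new_p = [p + v for p, v in zip(params, new_v)]
    return new_p, new_v
```

A smaller μ early on lets the parameters react quickly to the gradient, while a larger μ later averages over more past steps, which is what damps the oscillation the text describes.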
Step 7, network training: when the network training module trains the convolutional neural network, 80% of the data samples in the data set from step 3 are first selected as the training sample set, and the training data are randomly shuffled so that the samples the model "sees" differ between training epochs. This both speeds up model convergence and improves the model's predictions on the test data set. The objective function from step 5 and the optimizer from step 6 are used, and the network parameters and statistical metrics are tuned. The network model from step 4 is trained on the data samples, and the model is saved after training so that it can be loaded later.
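The 80% split and the per-epoch shuffling can be sketched as below. Treating the remaining 20% as the test set is an assumption (the patent mentions a test data set but not how it is formed), and the fixed seeds are there only to make the sketch reproducible.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle once, then cut into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def epoch_batches(train, batch_size, epoch, seed=0):
    """Yield mini-batches in a fresh random order for each epoch."""
    rng = random.Random(seed + epoch)   # a different order per epoch
    order = train[:]
    rng.shuffle(order)
    for i in range(0, len(order), batch_size):
        yield order[i:i + batch_size]
```

Each call to `epoch_batches` with a new `epoch` value reorders the training samples, so the model sees differently composed batches from one pass to the next.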
Step 8, network testing: the network test module evaluates the model with a confusion matrix, a tool for quantifying the accuracy of a classification algorithm and visualizing classification performance. The model's predictions are compared with the test data, the classification effect is measured with the accuracy metric, and the probability for each type of ancient font and the overall accuracy are finally obtained.
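As an illustration of the evaluation in step 8, the sketch below builds a confusion matrix from integer class labels and derives a per-class rate and the overall accuracy. Reading the "probability of each type" as the share of each true class that is predicted correctly is an interpretation, not something the patent spells out.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """m[t][p] counts samples of true class t predicted as class p."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def per_class_accuracy(m):
    """Diagonal entry divided by its row total, for each true class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(m)]

def overall_accuracy(m):
    """Correct predictions (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total if total else 0.0
```

Off-diagonal entries show which font styles are mistaken for which, information that a single accuracy number hides.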
The specific implementation of step 3 is as follows:
definition of fuzzy set X:
$X = \{(x_{mn}, \mu_X(x_{mn}))\}$
In the above formula, $x_{mn}$ represents the gray value of pixel $(m, n)$. For binarization, each pixel should be closely related to the class (foreground or background) to which it belongs; therefore $\mu_X(x_{mn})$ is used to express the degree of association between the pixel gray value $x_{mn}$ and the foreground/background threshold, i.e. the membership degree of pixel $(m, n)$ in the fuzzy set $X$:
Figure BDA0001894850290000051
Figure BDA0001894850290000052
In the above formula, $\mu_0$ represents the background pixel mean, $\mu_1$ the foreground pixel mean, $t$ the selected image gray threshold, and $C$ the maximum pixel gray-level difference.
Definition of the minimum information entropy E of the image fuzzy matrix, based on the histogram:
Figure BDA0001894850290000053
In the above formula, $MN$ is the total number of image pixels, $g$ is a gray level, $\mu_X(g)$ is the membership degree of gray level $g$, $h(g)$ is the number of pixels with gray level $g$, and $S$ is the Shannon function, expressed as:
$S(\mu_A(x_i)) = -\mu_A(x_i)\ln[\mu_A(x_i)] - [1-\mu_A(x_i)]\ln[1-\mu_A(x_i)]$
In the above formula, $\mu_A(x_i)$ denotes the probability of occurrence of $x_i$ in set $A$. The Shannon entropy function is used to measure the fuzziness of the image, i.e. the fuzziness of the fuzzy set.
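The threshold search described in this section can be sketched as follows. The membership and entropy formulas themselves are not reproduced in the text, so the code assumes the widely used form $\mu_X(g) = 1 / (1 + |g - \mu_k| / C)$, with $\mu_k$ the mean of the class that $g$ falls in, and $E(t) = \frac{1}{MN \ln 2} \sum_g S(\mu_X(g))\, h(g)$; these match the symbols defined above but may differ in detail from the patent's figures. Which side of $t$ counts as background is likewise an assumption.

```python
import math

def shannon(mu):
    """Shannon function S(mu); defined as 0 at the end points."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)

def fuzzy_threshold(hist):
    """Return the gray threshold t that minimizes the fuzzy entropy.

    hist[g] is the number of pixels with gray level g.
    """
    levels = [g for g, n in enumerate(hist) if n > 0]
    g_min, g_max = levels[0], levels[-1]
    c = max(1, g_max - g_min)            # maximum gray-level difference C
    total = sum(hist)                    # MN, the number of pixels
    best_t, best_e = g_min, float("inf")
    for t in range(g_min, g_max):        # dynamically adjust the threshold
        n0 = sum(hist[g_min:t + 1])
        n1 = total - n0
        mu0 = sum(g * hist[g] for g in range(g_min, t + 1)) / n0
        mu1 = sum(g * hist[g] for g in range(t + 1, g_max + 1)) / n1
        e = 0.0
        for g in levels:
            mean = mu0 if g <= t else mu1
            mu = 1.0 / (1.0 + abs(g - mean) / c)   # assumed membership form
            e += shannon(mu) * hist[g]
        e /= total * math.log(2)
        if e < best_e:
            best_t, best_e = t, e
    return best_t

def binarize(img, t):
    """Map gray values above t to 1 and the rest to 0."""
    return [[1 if v > t else 0 for v in row] for row in img]
```

Pixels close to their class mean have membership near 1 and contribute almost no entropy, so the minimum is reached when the threshold separates two compact gray-level clusters.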
The objective function described in step 5 is specifically as follows:
the final objective function form of the network can be expressed as:
Figure BDA0001894850290000054
In the above formula, λ is a parameter that balances the two loss functions: the larger λ is, the greater the share of the intra-class term in the overall objective, and vice versa. $N$ is the number of training samples; $x_i$ is the input feature of the $i$-th sample at the last classification layer of the network, and $y_i \in \{1, 2, \ldots, C\}$ is its true label; $h = (h_1, h_2, \ldots, h_C)^T$ is the final output of the network, i.e. the prediction for sample $i$; in the cross-entropy loss function $L_{\text{cross entropy loss}}$, $C$ is the number of classes; and in the center loss function $L_{\text{center loss}}$,
Figure BDA0001894850290000063
is the mean ("center") of all deep features of class $y_i$.
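A small numeric sketch of the combined objective is given below. Since the formula image is not reproduced in the text, it assumes the usual form: softmax cross-entropy on the output $h$ plus λ times the center loss $\tfrac{1}{2}\lVert x_i - c_{y_i}\rVert^2$, both averaged over the batch. The averaging, the 0-based labels and the function names are assumptions.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(h - m) for h in logits))
    return log_sum - logits[label]

def center_loss(feature, center):
    """Half the squared distance between a feature and its class center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(feature, center))

def total_loss(logits_batch, features, labels, centers, lam):
    """Cross-entropy plus lam times center loss, averaged over the batch.

    centers maps a class label to that class's mean feature vector.
    """
    n = len(labels)
    ce = sum(cross_entropy(h, y) for h, y in zip(logits_batch, labels)) / n
    cl = sum(center_loss(x, centers[y]) for x, y in zip(features, labels)) / n
    return ce + lam * cl
```

With λ = 0 the objective reduces to plain cross-entropy; raising λ pulls each feature toward its class center, which is the intra-class tightening the text describes.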
The specific implementation of step 6 is as follows:
The learning rate reduction formula
Figure BDA0001894850290000061
is defined as:
Figure BDA0001894850290000062
In the above formula, p is the number of training epochs.
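The trigger condition for the learning rate reduction (no improvement in the loss for two epochs) can be sketched independently of the reduction formula, which is not recoverable from the text. In the sketch the reduction rule is therefore passed in as `decay_fn(p, lr)`; the halving rule used in the usage example is purely a placeholder and is not the patent's formula.

```python
class PlateauSchedule:
    """Reduce the learning rate when the loss stops decreasing."""

    def __init__(self, decay_fn, initial_lr=0.001, patience=2):
        self.decay_fn = decay_fn      # caller-supplied rule: (epoch, lr) -> lr
        self.lr = initial_lr
        self.patience = patience
        self.best = float("inf")
        self.stale = 0                # epochs without improvement

    def step(self, epoch, loss):
        """Record this epoch's loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr = self.decay_fn(epoch, self.lr)
                self.stale = 0
        return self.lr
```

Usage, with the placeholder rule: `sched = PlateauSchedule(decay_fn=lambda p, lr: lr * 0.5)`, then call `sched.step(p, loss)` once per epoch `p` and feed the returned value to the optimizer.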
The invention has the following beneficial effects:
The method classifies ancient Chinese fonts with a convolutional neural network and uses a complete sample set of five standard ancient font types. The data preprocessing includes histogram equalization, which makes image details more distinct; connected-domain noise reduction, which reduces the influence of unnecessary noise on the prediction result; and binarization based on fuzzy set theory, after which the feature information of the image is shown effectively and the edge features of the font image can be distinguished well. The VGGNet model generalizes well, and using a deeper convolutional network architecture gives good performance; batch normalization stabilizes learning and effectively improves the convergence speed of the model. The objective function combines the center loss function with the cross-entropy loss function, so that classification is made accurate by increasing the inter-class distance and reducing the intra-class distance, which effectively improves the discriminative power of the features. With suitable training techniques and appropriately chosen network parameters, optimization algorithm and learning rate, the network is more stable, the results are more reliable, and the accuracy of ancient font classification is greatly improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
As shown in the figure, the ancient font classification method based on the convolutional neural network specifically comprises the following steps:
Step 1, acquiring a data set: single calligraphy characters pre-segmented in the CADAL digital library are crawled with Beautiful Soup. The HTML of the web page is first fetched to obtain the source code; the content is then passed to Beautiful Soup and parsed into an object to be processed; the picture links in the img tags are obtained by searching the document tree; and the pictures are downloaded through those links to a specified file path. Images of five standard ancient font styles are finally obtained, forming the ancient font image data set required by the experiment.
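The link-extraction part of step 1 can be sketched as below. To stay runnable without third-party packages, the sketch uses Python's standard-library `html.parser` in place of Beautiful Soup, which is a deliberate substitution; the page markup and base URL in the usage note are made up for illustration, and the download step is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImgCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, src))

def extract_image_links(html, base_url):
    """Parse the page source and return its image links in document order."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    return parser.links
```

For example, `extract_image_links('<img src="/a/1.jpg">', "http://example.org/lib/")` yields `["http://example.org/a/1.jpg"]`; each link could then be fetched and written to the target folder, for instance with `urllib.request.urlretrieve`.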
Step 2, data expansion: the number of data samples in the ancient font image data set obtained in step 1 is expanded, with sample expansion directed at the classes that have fewer samples. Existing sample images are randomly selected and augmented by horizontal/vertical flipping, small-range rotation, supervised data-augmentation cropping, and scale transformation. This increases the diversity of the training and test samples, which helps avoid overfitting and improves model performance to a certain degree.
Step 3, preprocessing the expanded, complete ancient font image data set. The preprocessing comprises grayscale conversion, proportional (aspect-preserving) scaling, edge padding, histogram equalization, connected-domain noise reduction, and image binarization based on fuzzy set theory. Each original image is processed into a square image because the input to a convolutional neural network model is typically square.
First, the original ancient font image is converted to grayscale.
Second, the size of the grayscale image (height, width and number of channels) is obtained through reshape, and the image is scaled with the resize() function so that its longer side equals the target size.
Then the shorter side is edge-padded: the image is extended outward using the pixel values at the image boundary, and the number of pixels added on each side is half the difference from the target size, yielding a square image of the target size.
Next, histogram equalization is applied to the square image, transforming its uneven gray-level distribution so that it spans the whole gray-level range and details become richer. After equalization, the image is denoised with the N8-connected noise reduction algorithm, which examines the 8-neighborhood of each pixel to remove isolated noise points.
Finally, the image is binarized using fuzzy set theory. A fuzzy set X relating each pixel to the foreground/background threshold is established, i.e. a fuzzy subset mapping the image X to the interval [0,1] is defined; a complete fuzzy matrix is then built by dynamically adjusting the threshold; and the minimum information entropy E of the whole image's fuzzy matrix is found with the Shannon entropy function. The threshold corresponding to that fuzzy matrix is the binarization segmentation threshold. The fuzzy set X is defined as:
$X = \{(x_{mn}, \mu_X(x_{mn}))\}$
In the above formula, $x_{mn}$ represents the gray value of pixel $(m, n)$. For binarization, each pixel should be closely related to the class (foreground or background) to which it belongs; therefore $\mu_X(x_{mn})$ is used to express the degree of association between the pixel gray value $x_{mn}$ and the foreground/background threshold, i.e. the membership degree of pixel $(m, n)$ in the fuzzy set $X$:
Figure BDA0001894850290000081
Figure BDA0001894850290000082
In the above formula, $\mu_0$ represents the background pixel mean, $\mu_1$ the foreground pixel mean, $t$ the selected image gray threshold, and $C$ the maximum pixel gray-level difference.
Definition of the minimum information entropy E of the image fuzzy matrix, based on the histogram:
Figure BDA0001894850290000083
In the above formula, $MN$ is the total number of image pixels, $g$ is a gray level, $\mu_X(g)$ is the membership degree of gray level $g$, $h(g)$ is the number of pixels with gray level $g$, and $S$ is the Shannon function, expressed as:
$S(\mu_A(x_i)) = -\mu_A(x_i)\ln[\mu_A(x_i)] - [1-\mu_A(x_i)]\ln[1-\mu_A(x_i)]$
In the above formula, $\mu_A(x_i)$ denotes the probability of occurrence of $x_i$ in set $A$. The Shannon entropy function is used to measure the fuzziness of the image, i.e. the fuzziness of the fuzzy set. The threshold $t$ at which the Shannon entropy is smallest over the whole search is taken as the final segmentation threshold.
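The two sub-steps of step 3 that precede binarization, histogram equalization and N8-connected denoising, can be sketched on 2-D lists of 8-bit gray values as follows. The patent does not say how "isolated" is decided on a gray image, so the sketch assumes dark ink on a light background and treats an ink pixel with no ink pixel in its 8-neighborhood as noise; `ink_below`, `background` and `min_neighbors` are hypothetical parameters.

```python
def equalize(img, levels=256):
    """Histogram equalization: spread gray values over the full range."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [], 0
    for n in hist:
        running += n
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:                  # single gray level: nothing to do
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[v] for v in row] for row in img]

def remove_isolated(img, ink_below=128, background=255, min_neighbors=1):
    """Clear ink pixels that have too few ink pixels among their 8 neighbors."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            if img[r][c] >= ink_below:
                continue                  # not an ink pixel
            neighbors = sum(
                1
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc)
                and 0 <= r + dr < h and 0 <= c + dc < w
                and img[r + dr][c + dc] < ink_below)
            if neighbors < min_neighbors:
                out[r][c] = background    # isolated point: treat as noise
    return out
```

Neighbor counts are taken from the input image rather than the partially cleaned output, so the result does not depend on the scan order.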
Step 4, defining the convolutional neural network model: a convolutional neural network based on the VGG19 model is used, with the images preprocessed in step 3 as input. First, the model uses 3 × 3 convolution kernels with stride 1 and padding 1 to preserve the input height and width, and 2 × 2 max-pooling windows with a down-sampling stride of 2. Second, a BatchNorm layer is added after each convolutional layer so that the input to each layer keeps the same distribution during training, making the deep network easier and more stable to train. A nonlinear ReLU activation follows each BatchNorm layer to speed up convergence. Three fully connected layers are then attached, with dropout as the regularization method for the fully connected layers; this reduces the dependence among neurons to a certain extent, helps prevent overfitting, and markedly improves generalization. Finally, the output of the last fully connected layer, of dimension 5, is passed to a Softmax function; the fully connected layers map the network features to the label space of the samples to make the prediction.
Step 5, defining the objective function, which measures the error between the predicted value and the ground-truth label. For this classification task, the center loss function is used together with the traditional cross-entropy loss function. While the inter-class distance is taken into account, the center loss also focuses on reducing intra-class variation, so that the features become more discriminative and the classes more clearly separated. In classification performance, combining the center loss with the cross-entropy loss outperforms a network that uses cross-entropy loss alone as the objective: classification is made accurate by both increasing the inter-class distance and reducing the intra-class distance, which also improves the discriminative power of the features. The final objective function of the network can be expressed as:
Figure BDA0001894850290000091
In the above formula, λ is a parameter that balances the two loss functions: the larger λ is, the greater the share of the intra-class term in the overall objective, and vice versa. $N$ is the number of training samples; $x_i$ is the input feature of the $i$-th sample at the last classification layer of the network, and $y_i \in \{1, 2, \ldots, C\}$ is its true label; $h = (h_1, h_2, \ldots, h_C)^T$ is the final output of the network, i.e. the prediction for sample $i$; in the cross-entropy loss function $L_{\text{cross entropy loss}}$, $C$ is the number of classes; and in the center loss function $L_{\text{center loss}}$,
Figure BDA0001894850290000092
is the mean ("center") of all deep features of class $y_i$.
Step 6, defining the optimizer and setting a suitable learning rate for the model. The initial learning rate is set to 0.001 and is reduced as the number of training epochs increases. The reduction mechanism is: if the loss stops decreasing for two or more training epochs, the learning rate is decreased to
Figure BDA0001894850290000101
The model is trained and its parameters solved with a momentum-based stochastic gradient descent optimization algorithm. The momentum factor μ is set dynamically: its initial value is 0.5, and it gradually increases to 0.9 as the number of training epochs grows. This effectively suppresses oscillation, lets the network converge in the middle and later stages of training, and helps the parameters escape when they oscillate back and forth near a local minimum, so that better network parameters are found. The learning rate reduction formula
Figure BDA0001894850290000102
is defined as:
Figure BDA0001894850290000103
In the above formula, p is the number of training epochs.
Step 7, network training: when the network training module trains the convolutional neural network, 80% of the data samples in the data set from step 3 are first selected as the training sample set, and the training data are randomly shuffled so that the samples the model "sees" differ between training epochs. This both speeds up model convergence and improves the model's predictions on the test data set. The objective function from step 5 and the optimizer from step 6 are used, and the network parameters and statistical metrics are tuned. The network model from step 4 is trained on the data samples, and the model is saved after training so that it can be loaded later.
Step 8, network testing: the network test module evaluates the model with a confusion matrix, a tool for quantifying the accuracy of a classification algorithm and visualizing classification performance. The model's predictions are compared with the test data, the classification effect is measured with the accuracy metric, and the probability for each type of ancient font and the overall accuracy are finally obtained.

Claims (6)

1. A ancient font classification method based on a convolutional neural network is characterized by comprising the following steps:
step 1, acquiring a data set, namely crawling a single calligraphy character pre-segmented in a CADAL digital library by utilizing Beautiful Soup in a crawler technology to acquire five standard ancient font type images, and forming an ancient font image data set required by the experiment of the invention by utilizing the five standard ancient font type images;
step 2, data expansion, namely expanding the number of data samples on the ancient character image data set obtained in the step 1, and carrying out sample expansion aiming at the types with less data samples, wherein the expansion mode comprises the steps of using image horizontal/vertical overturning, small-range rotation transformation, deduction by a supervised data expansion method and scale transformation, randomly extracting the existing sample images and expanding the number of the samples by using the data expansion method, so that the diversity of training samples and test samples is increased, and finally, the number of the images of each type of ancient character samples is unified to obtain a complete data set;
step 3, preprocessing the images of the expanded complete data set into square images; the preprocessing comprises grayscale conversion, aspect-preserving scaling, edge padding, histogram equalization, a connected-domain noise reduction algorithm, and an image binarization algorithm based on fuzzy set theory;
step 4, defining a convolutional neural network model: a convolutional neural network based on the VGG19 model is used, with the images preprocessed in step 3 as input;
step 5, defining an objective function, which measures the error between the predicted value and the true sample label; the objective function for the classification task combines the center loss function with the conventional cross-entropy loss function;
step 6, defining an optimizer and setting a suitable learning rate for the model: the initial learning rate is set to 0.001 and is reduced as the number of batches increases during training, the reduction mechanism being: if the loss stops decreasing for two or more training batches, the learning rate is decreased to
Figure FDA0002685217420000011
The model is trained and its parameters are solved with a momentum-based stochastic gradient descent optimization algorithm; the momentum factor μ is set dynamically, with an initial value of 0.5 that gradually increases to 0.9 as the number of training batches grows, which effectively suppresses oscillation and finds better network parameters;
step 7, network training: when training the convolutional neural network, 80% of the data samples in the data set from step 3 are first selected as the training sample set, and the training data are randomly shuffled so that the data samples "seen" by the model differ between training batches; the objective function from step 5 and the optimizer from step 6 are set up, the network parameters are adjusted, and the metrics are recorded; the network model from step 4 is used as the training model to train on the data samples, and the model is saved after training so that it can be loaded quickly later;
step 8, network testing: the model is evaluated with a confusion matrix, a tool that quantifies the accuracy of a classification algorithm and presents classification performance visually; the model's predictions are compared with the test data, the classification effect is measured with the accuracy metric, and the per-class probability for each type of ancient font and the overall accuracy are finally obtained.
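The momentum-based update in step 6 can be sketched as follows. The claim only says that μ "gradually" moves from 0.5 to 0.9, so the linear ramp, the `ramp_batches` parameter, and the toy quadratic problem are assumptions made for illustration.

```python
def momentum_at(batch_index, ramp_batches=100, mu_start=0.5, mu_end=0.9):
    """Linearly ramp the momentum factor from mu_start to mu_end, then hold it.

    The linear shape and ramp length are assumptions, not the patent's schedule.
    """
    progress = min(batch_index / ramp_batches, 1.0)
    return mu_start + (mu_end - mu_start) * progress

def sgd_momentum_step(weights, grads, velocity, lr, mu):
    """One classical momentum update: v <- mu * v - lr * g, then w <- w + v."""
    new_velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy problem: minimise f(w) = w^2, whose gradient is 2w.
weights, velocity = [5.0], [0.0]
for batch in range(500):
    grads = [2.0 * w for w in weights]
    weights, velocity = sgd_momentum_step(
        weights, grads, velocity, lr=0.001, mu=momentum_at(batch))
print(weights[0])  # close to the minimiser w = 0
```

A small μ early on keeps the first, noisy updates from building up too much velocity; a larger μ later averages gradients over more batches, which is the oscillation damping the claim refers to.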
2. The ancient font classification method based on the convolutional neural network as claimed in claim 1, wherein the preprocessing is performed on the image of the extended complete data set in step 3, and the method is specifically realized as follows:
firstly, gray processing is carried out on an original ancient font image;
secondly, the size of the image, including its length, width and number of channels, is obtained through reshape; the longer of the two sides is taken as the reference and scaled to the target value with the resize() function;
then, edge padding is applied along the shorter side: the image is extended outwards using the pixel values at the image boundary, the number of pixels added in each direction being half the difference between the current size and the target size, which yields a square image of the set target size;
then, histogram equalization is applied to the square image so that its unevenly distributed gray levels are transformed to span the whole gray range; after histogram equalization, the image is denoised with an N8 connected noise reduction algorithm, which examines the 8-neighborhood of each pixel in the image to remove isolated noise points;
and finally, the image is binarized using fuzzy set theory: a fuzzy set X relating the pixels to the foreground/background threshold is first established, i.e. a fuzzy subset that maps the image X to the interval [0,1] is defined; a complete fuzzy matrix is then built by dynamically adjusting the threshold; finally, the minimum information entropy E of the whole image fuzzy matrix is found with the Shannon entropy function, and the threshold corresponding to that fuzzy matrix is the image binarization segmentation threshold.
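The geometric and intensity steps of claim 2 can be sketched in plain Python on a grayscale image stored as a list of rows. Several details are assumptions made for illustration: nearest-neighbour interpolation for the resize, treating pixels darker than `ink_level` as strokes, and painting removed points with a white background value. In practice an image library would normally do this work.

```python
def scale_longer_side(img, target):
    """Nearest-neighbour resize so that the longer side equals `target`."""
    h, w = len(img), len(img[0])
    scale = target / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(new_w)] for r in range(new_h)]

def pad_to_square(img, target):
    """Replicate border pixels so the image becomes target x target."""
    h, w = len(img), len(img[0])
    top, left = (target - h) // 2, (target - w) // 2   # half the gap per side
    def clamp(v, lo, hi):
        return max(lo, min(hi, v))
    return [[img[clamp(r - top, 0, h - 1)][clamp(c - left, 0, w - 1)]
             for c in range(target)] for r in range(target)]

def equalize_histogram(img, levels=256):
    """Map gray levels through the normalised cumulative histogram."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total, cdf_min = running, next(c for c in cdf if c > 0)
    if total == cdf_min:                       # constant image: nothing to spread
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[v] for v in row] for row in img]

def remove_isolated_points(img, ink_level=128, background=255):
    """Clear dark pixels that have no dark pixel among their 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            if img[r][c] >= ink_level:
                continue
            has_neighbour = any(
                img[rr][cc] < ink_level
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c))
            if not has_neighbour:
                out[r][c] = background
    return out
```

Padding with replicated border pixels, rather than a constant colour, avoids introducing an artificial edge between the character image and the padded region.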
3. The ancient font classification method based on the convolutional neural network as claimed in claim 2, wherein the step 3 is implemented as follows:
definition of fuzzy set X:
$X = \{(x_{mn}, \mu_x(x_{mn}))\}$
in the above formula, $x_{mn}$ represents the gray value of the pixel (m, n); for binarization, each pixel should be closely related to the class to which it belongs, and therefore $\mu_x(x_{mn})$ is used to express the degree of association between the pixel gray value $x_{mn}$ and the foreground/background threshold, i.e. the membership degree of the pixel (m, n) in the fuzzy set X:
Figure FDA0002685217420000031
Figure FDA0002685217420000032
in the above formulas, $\mu_0$ represents the background pixel mean, $\mu_1$ the foreground pixel mean, t the selected gray threshold of the image, and C the maximum gray-level difference between pixels;
definition of minimum information entropy E of an image blur matrix based on a histogram:
Figure FDA0002685217420000033
in the above formula, MN is the total number of image pixels, g is the gray level of the image pixels, $\mu_x(g)$ represents the membership degree of gray level g, h(g) represents the number of pixels with gray level g, and S represents the Shannon function, which is expressed as:
$S(\mu_A(x_i)) = -\mu_A(x_i)\ln[\mu_A(x_i)] - [1 - \mu_A(x_i)]\ln[1 - \mu_A(x_i)]$
in the above formula, $\mu_A(x_i)$ is the probability that $x_i$ occurs in the set A; the Shannon entropy function is used to measure the fuzziness of an image, i.e. the fuzziness of a fuzzy set.
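The threshold search in claim 3 can be sketched as below. The membership function itself appears only as a figure in the source, so this sketch assumes the widely used form $\mu(g) = 1 / (1 + |g - \mu_k| / C)$, with $\mu_0$ applied to gray levels at or below t and $\mu_1$ to levels above t, and assumes $E(t) = \frac{1}{MN}\sum_g S(\mu_x(g))\,h(g)$. Both are assumptions chosen to match the symbol definitions in the claim, not a transcription of the patent's figures.

```python
import math

def shannon(mu):
    """S(mu) = -mu ln(mu) - (1 - mu) ln(1 - mu), with S(0) = S(1) = 0."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)

def fuzzy_entropy_threshold(hist):
    """Return the gray level t that minimises the assumed fuzzy entropy E(t)."""
    levels = [g for g, count in enumerate(hist) if count > 0]
    g_min, g_max = levels[0], levels[-1]
    if g_min == g_max:                     # constant image: no real threshold
        return g_min
    spread = g_max - g_min                 # C: largest gray-level difference
    total = sum(hist)                      # MN: number of pixels
    best_t, best_entropy = g_min, float("inf")
    for t in range(g_min, g_max):          # both sides of t stay non-empty
        low = [(g, hist[g]) for g in levels if g <= t]
        high = [(g, hist[g]) for g in levels if g > t]
        mu0 = sum(g * n for g, n in low) / sum(n for _, n in low)
        mu1 = sum(g * n for g, n in high) / sum(n for _, n in high)
        entropy = 0.0
        for g, n in low:
            entropy += shannon(1.0 / (1.0 + abs(g - mu0) / spread)) * n
        for g, n in high:
            entropy += shannon(1.0 / (1.0 + abs(g - mu1) / spread)) * n
        entropy /= total
        if entropy < best_entropy:
            best_t, best_entropy = t, entropy
    return best_t

def binarize(img, t):
    """Pixels above the threshold become 255, all others 0."""
    return [[255 if v > t else 0 for v in row] for row in img]
```

Under this assumed membership, a pixel close to its class mean has membership near 1 and contributes almost no entropy, so the minimum of E(t) falls where both classes are tightly grouped around their means.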
4. The ancient font classification method based on the convolutional neural network as claimed in claim 3, wherein the step 4 is implemented as follows:
first, in each model, a 3 × 3 convolution kernel sliding window with stride 1 and padding 1 is used to preserve the input height and width, and the max pooling layer uses a 2 × 2 sliding window with a down-sampling stride of 2; secondly, a BatchNorm layer is added after each convolution layer so that the input of each layer of the neural network keeps the same distribution during network training; a non-linear ReLU activation function is then used after each BatchNorm layer; three fully connected layers follow, with random inactivation (dropout) used as the network regularization method for the fully connected layers; and finally, the 5-dimensional output of the last fully connected layer is passed to a Softmax function, the fully connected layers mapping the network features to the label space of the samples to make the corresponding prediction.
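The effect of these layer settings on feature-map size follows from the usual output-size formula, out = (in + 2·padding − kernel) / stride + 1. The sketch below traces sizes through the standard VGG19 convolutional layout; the channel widths come from the original VGG19 design and the 224-pixel, single-channel input is a hypothetical example, since the claim states neither.

```python
# Standard VGG19 layout: numbers are conv output channels, "M" is a max-pool.
VGG19_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
                512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution; 3x3, stride 1, padding 1 keeps it."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, window=2, stride=2):
    """Spatial size after max pooling; 2x2 with stride 2 halves it."""
    return (size - window) // stride + 1

def trace_feature_maps(input_size, in_channels=1):
    """Return (layer kind, channels, height/width) after every layer."""
    trace, channels, size = [], in_channels, input_size
    for item in VGG19_LAYOUT:
        if item == "M":
            size = pool_output_size(size)
            trace.append(("maxpool", channels, size))
        else:
            size = conv_output_size(size)
            channels = item
            trace.append(("conv-bn-relu", channels, size))
    return trace

for kind, channels, size in trace_feature_maps(224):
    print(f"{kind:13s} {channels:4d} x {size} x {size}")
```

With a 224 × 224 input the last pooling layer yields 512 maps of 7 × 7, which would be flattened before the three fully connected layers and the 5-way Softmax.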
5. The ancient font classification method based on the convolutional neural network as claimed in claim 4, wherein the objective function in step 5 is specifically as follows:
the final objective function form of the network can be expressed as:
Figure FDA0002685217420000041
in the above formula, λ is the adjustment parameter between the two loss functions: the larger λ is, the greater the proportion of the intra-class difference in the objective function, and vice versa; N is the number of training samples; the input feature of the i-th sample at the last classification layer of the network is $x_i$, and its corresponding true label is $y_i \in \{1, 2, \ldots, C\}$; $h = (h_1, h_2, \ldots, h_C)^T$ is the final output of the network, i.e. the prediction result for sample i; in the cross-entropy loss function $L_{\text{cross entropy loss}}$, C is the number of classes; in the center loss function $L_{\text{center loss}}$,
Figure FDA0002685217420000044
is the mean of all deep features of class $y_i$.
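A minimal sketch of such a combined objective is given below. The exact formula is only available as a figure, so the sketch assumes the common form: softmax cross-entropy averaged over the N samples, plus λ times half the squared distance between each feature and its class center, also averaged over N. The averaging, the factor 1/2, and the 0-based labels are assumptions made for illustration.

```python
import math

def cross_entropy(logits, label):
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(h - peak) for h in logits))
    return log_sum - logits[label]

def class_centers(features, labels, num_classes):
    """Mean feature vector of every class (zero vector for empty classes)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for k, value in enumerate(x):
            sums[y][k] += value
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]]
            for c in range(num_classes)]

def joint_loss(logits, features, labels, centers, lam):
    """Cross-entropy plus lam times the (assumed) center loss."""
    n = len(labels)
    ce = sum(cross_entropy(h, y) for h, y in zip(logits, labels)) / n
    center = 0.5 * sum(
        sum((a - b) ** 2 for a, b in zip(x, centers[y]))
        for x, y in zip(features, labels)) / n
    return ce + lam * center

# Two samples of class 0 with 2-D features and uninformative logits.
features = [[1.0, 0.0], [3.0, 0.0]]
labels = [0, 0]
logits = [[0.0] * 5, [0.0] * 5]
centers = class_centers(features, labels, num_classes=5)
print(joint_loss(logits, features, labels, centers, lam=0.5))
```

The cross-entropy term pushes classes apart, while the center term pulls each feature towards its class mean; λ sets how strongly the intra-class term counts.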
6. The ancient font classification method based on the convolutional neural network as claimed in claim 5, wherein the step 6 is implemented as follows:
the learning rate decay formula
Figure FDA0002685217420000042
is defined as:
Figure FDA0002685217420000043
in the above formula, p is the number of training batches.
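The trigger described in step 6 (reduce the learning rate once the loss has stopped decreasing for two or more training batches) can be sketched as follows. Because the decay formula itself is only available as a figure, the sketch takes it as a caller-supplied function of the current rate and the batch count p; the `PlateauDecay` class and the `halve` rule are hypothetical placeholders, not the patent's formula.

```python
class PlateauDecay:
    """Lower the learning rate when the loss has stopped decreasing."""

    def __init__(self, decay_rule, initial_lr=0.001, patience=2):
        self.decay_rule = decay_rule     # callable: (current_lr, p) -> new lr
        self.lr = initial_lr
        self.patience = patience         # stalled batches tolerated before decay
        self.best_loss = float("inf")
        self.stalled = 0                 # consecutive batches without improvement
        self.batches_seen = 0            # p in the claim

    def step(self, loss):
        """Record one batch loss and return the learning rate to use next."""
        self.batches_seen += 1
        if loss < self.best_loss:
            self.best_loss = loss
            self.stalled = 0
        else:
            self.stalled += 1
            if self.stalled >= self.patience:
                self.lr = self.decay_rule(self.lr, self.batches_seen)
                self.stalled = 0
        return self.lr

def halve(lr, p):
    """Placeholder rule, NOT the patent's formula: ignore p and halve lr."""
    return lr / 2

scheduler = PlateauDecay(halve)
rates = [scheduler.step(loss) for loss in [1.0, 0.9, 0.9, 0.9, 0.8]]
print(rates)
```

Passing the rule in as a function keeps the plateau detection separate from the decay formula, so the actual formula can be substituted without touching the trigger logic.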
CN201811487296.9A 2018-12-06 2018-12-06 Ancient font classification method based on convolutional neural network Active CN109800754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811487296.9A CN109800754B (en) 2018-12-06 2018-12-06 Ancient font classification method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN109800754A CN109800754A (en) 2019-05-24
CN109800754B true CN109800754B (en) 2020-11-06

Family

ID=66556515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811487296.9A Active CN109800754B (en) 2018-12-06 2018-12-06 Ancient font classification method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN109800754B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416390A (en) * 2018-03-16 2018-08-17 西北工业大学 Hand-written script recognition methods based on two-dimensional convolution dimensionality reduction

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881208B2 (en) * 2016-06-20 2018-01-30 Machine Learning Works, LLC Neural network based recognition of mathematical expressions
WO2018125926A1 (en) * 2016-12-27 2018-07-05 Datalogic Usa, Inc Robust string text detection for industrial optical character recognition
CN108664996B (en) * 2018-04-19 2020-12-22 厦门大学 Ancient character recognition method and system based on deep learning
CN108710831B (en) * 2018-04-24 2021-09-21 华南理工大学 Small data set face recognition algorithm based on machine vision
CN108898137B (en) * 2018-05-25 2022-04-12 黄凯 Natural image character recognition method and system based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant