CN115359494A - Handwriting identification method and device based on deep learning - Google Patents


Info

Publication number
CN115359494A
Authority
CN
China
Prior art keywords
layer
convolutional
handwriting
convolution
size
Prior art date
Legal status
Pending
Application number
CN202210860161.2A
Other languages
Chinese (zh)
Inventor
陈海波
罗志鹏
段鹏飞
徐振宇
Current Assignee
Shenyan Technology Beijing Co ltd
Original Assignee
Shenyan Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Shenyan Technology Beijing Co ltd
Priority to CN202210860161.2A
Publication of CN115359494A
Legal status: Pending


Classifications

    • G06V 30/15 Segmentation of character regions: cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G06T 5/30 Image enhancement or restoration by the use of local operators: erosion or dilatation, e.g. thinning
    • G06V 10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06V 30/19173 Classification techniques (design or setup of recognition systems)
    • G06V 40/30 Writer recognition; Reading and verifying signatures

Abstract

The invention provides a handwriting identification method and a handwriting identification device based on deep learning, relating to the technical field of note identification. The method realizes fast handwriting comparison and handwriting retrieval/identification, and retains a good identification effect under conditions such as blurred handwriting, different writing types and different illumination. The method comprises the following steps: S1, acquiring picture data; S2, detecting characters, which comprises two steps, text segmentation and post-processing (text segmentation: segmenting the text in the picture with a ResNet having an FPN structure; post-processing: applying a progressive scale expansion algorithm to the text segmentation result); S3, classifying the characters with an HwNet network; S4, character splicing preprocessing; S5, performing handwriting recognition with an HwNet network; S6, calculating and outputting the identification result. In the HwNet network, a BatchNorm layer and a Scale layer are connected in sequence between each of the first eight convolutional layers and its corresponding PReLU layer.

Description

Handwriting identification method and device based on deep learning
Technical Field
The invention relates to the technical field of note authentication, in particular to a handwriting authentication method and device based on deep learning.
Background
Handwriting authentication has wide application scenes in departments such as a public security system, a judicial system and the like.
Chinese patent publication No. CN103389336A discloses a mass spectrometry imaging method for rapidly identifying handwriting authenticity, which first uses a mixed solution of methanol and water in a certain proportion as an ionization reagent, and performs two-dimensional mass spectrometry scanning on the handwriting by adopting a surface desorption chemical ionization mass spectrometry and combining with a three-dimensional high-precision automatic control moving system; then obtaining a mass spectrum image of the characteristic compound in the handwriting according to the change of the ion signal intensity of the characteristic compound in the handwriting; and finally, analyzing the obtained mass spectrum image to identify the authenticity of the handwriting. The method for identifying the handwriting uses methanol, water and a three-dimensional high-precision automatic control moving system, uses more materials and tools, and has a complicated analysis method of mass spectrometry images, so that the method has a complicated process and low efficiency.
Chinese patent publication No. CN102252621A discloses a writing identification method, which irradiates the strokes to be measured with light at an angle of less than 10° to the plane of the paper and with light at an angle between 30° and 60° to the plane of the paper; photographs the strokes from above the paper with an optical observation instrument and a digital camera; measures, respectively, the width of a stroke under the first condition and the width of the shadow section formed on the stroke by its concave edge under the second illumination condition; takes the ratio of the shadow-section width to the stroke width as a measurement index; repeats these steps to obtain the measurement indices corresponding to handwriting strokes of different structural types; and, taking the stroke types as the horizontal axis and the measurement indices as the vertical axis, prepares a handwriting measurement index chart for the reference piece. A chart for the piece to be identified is prepared by the same method, and authenticity can be identified by comparing the two charts. This handwriting identification method uses an optical observation instrument and digital camera equipment, and its identification process is complex and inefficient.
The invention patent with publication number CN1200387C discloses a statistical handwriting identification and verification method based on single characters, which preprocesses the handwriting to be examined and extracts four-direction line element features; then uses PCA (principal component analysis) to transform and reduce the dimensionality, and LDA (linear discriminant analysis) to transform and extract the handwriting features; and finally performs classification and identification with a Euclidean distance classifier. This handwriting identification method relies on PCA, LDA, a Euclidean distance classifier and the like, and its identification process is complex and inefficient.
In summary, the existing handwriting identification method has the problems of complex identification process and low efficiency.
Therefore, there is a need to develop a new method and apparatus for handwriting authentication based on deep learning to address the deficiencies of the prior art and to solve or alleviate one or more of the above problems.
Disclosure of Invention
In view of the above, the invention provides a handwriting authentication method and device based on deep learning, which can realize rapid handwriting comparison and handwriting retrieval recognition functions and have a good recognition effect under the conditions of fuzzy handwriting, different writing types, different illumination conditions and the like.
In one aspect, the invention provides a handwriting authentication method based on deep learning, and the method comprises the following steps:
S1, acquiring picture data;
S2, carrying out character detection on the acquired picture data;
the character detection comprises two steps: text segmentation and post-processing;
text segmentation: performing text segmentation on the picture by adopting a ResNet with an FPN structure;
post-processing: performing post-processing on the text segmentation result by adopting a progressive scale expansion algorithm;
S3, classifying the characters by adopting an HwNet network;
S4, splicing the classified characters;
S5, performing handwriting recognition on the spliced characters by adopting an HwNet network;
and S6, calculating and outputting the identification result.
The above-mentioned aspect and any possible implementation manner further provide an implementation manner, and the content of text segmentation in step S2 includes:
selecting 4 feature maps P2, P3, P4 and P5 of different sizes, the number of channels of each of the 4 feature maps being 256;
up-sampling the feature maps P3, P4 and P5 by factors of 2, 4 and 8 respectively, so that the up-sampled P3, P4 and P5 match the size of P2;
concatenating the feature maps P2, P3, P4 and P5, and then applying a 3 × 3 convolution to obtain the fused feature F;
performing n conv1×1 convolution, upsampling and sigmoid operations on the fused feature F to obtain the n corresponding segmentation masks M1, M2, …, Mn; n is a positive integer.
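The fusion step above can be sketched in a few lines of plain Python, using toy 2-D lists in place of real feature maps; the 256-channel tensors and the 3 × 3 fusion convolution are omitted, so this only illustrates the upsample-and-concatenate shape logic:

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D grid by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([wide] * factor)
    return out

def fuse(p2, p3, p4, p5):
    """Upsample P3/P4/P5 to P2's size and stack the four maps channel-wise."""
    maps = [p2, upsample_nearest(p3, 2), upsample_nearest(p4, 4), upsample_nearest(p5, 8)]
    h, w = len(p2), len(p2[0])
    assert all(len(m) == h and len(m[0]) == w for m in maps)
    return maps  # fused feature F: four "channels" of size h x w

# Toy pyramid: P2 is 8x8, P3 is 4x4, P4 is 2x2, P5 is 1x1.
p2 = [[0] * 8 for _ in range(8)]
p3 = [[1] * 4 for _ in range(4)]
p4 = [[2] * 2 for _ in range(2)]
p5 = [[3]]
F = fuse(p2, p3, p4, p5)
```

After fusion every level contributes a map of P2's resolution, which is why a single 3 × 3 convolution over the concatenation suffices in the real network.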
The above-mentioned aspect and any possible implementation further provide an implementation in which the progressive scale expansion algorithm is based on a breadth-first search and comprises the steps of:
starting the search from the segmentation mask with the smallest scale;
progressively expanding the regions by adding the pixels of the larger segmentation masks;
stopping the search once the largest segmentation mask has been processed.
In accordance with the foregoing aspect and any possible implementation manner, there is further provided an implementation manner, where the HwNet network structure in step S3 includes:
the input layer is a single-character picture with a size of 112 × 112 × 3;
the first layer is a convolutional layer with 64 convolution kernels of size 3 × 3 and a stride of 2; 64 feature maps of 56 × 56 pixels are obtained after this layer;
the second layer is a depthwise separable convolutional layer with 64 convolution kernels of size 3 × 3 and a stride of 1;
the third to seventh layers are bottleneck layers with a typical residual structure;
the eighth layer is an ordinary convolutional layer with 512 convolution kernels of size 1 × 1 and a stride of 1; 512 feature maps of 7 × 7 pixels are obtained after this layer;
the ninth layer is a grouped depthwise convolutional layer with 512 convolution kernels and a stride of 1; 512 feature vectors of 1 × 1 pixels are obtained after this layer;
the tenth layer is an ordinary 1 × 1 convolutional layer with 128 convolution kernels and a stride of 1; a 1 × 128-dimensional feature vector is output after this layer.
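As a sanity check on the sizes listed above, the standard convolution output formula reproduces the 112 → 56 → 7 → 1 chain. The per-layer strides inside the bottleneck stages are not stated in the text, so the values used for layers 3 to 7 below are assumptions chosen to land on the stated 7 × 7 maps:

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

size = 112                                  # input: 112 x 112 x 3
size = conv_out(size, 3, 2, 1)              # layer 1: 3x3, stride 2 -> 56
assert size == 56
size = conv_out(size, 3, 1, 1)              # layer 2: depthwise 3x3, stride 1 -> 56
assert size == 56
for stride in (2, 1, 2, 1, 2):              # layers 3-7: hypothetical bottleneck strides
    size = conv_out(size, 3, stride, 1)
assert size == 7                            # matches the 7x7 maps entering layer 8
size = conv_out(size, 1, 1, 0)              # layer 8: 1x1 conv keeps 7x7
assert size == 7
size = conv_out(size, 7, 1, 0)              # layer 9: global depthwise 7x7 -> 1x1
assert size == 1
```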
In the aspect and any possible implementation described above, there is further provided an implementation in which, in step S3, a BatchNorm layer and a Scale layer are connected in sequence between each of the first to eighth layers of the HwNet network and its corresponding PReLU layer. The BatchNorm layer normalizes the output of the neurons to a distribution with mean 0 and variance 1, so that the data distribution is easier to learn and less prone to overfitting, while the Scale layer reverses the damage that the BatchNorm layer does to the features, making the deep neural network easier to train;
the ninth layer, the grouped depthwise convolutional layer, is followed only by a BatchNorm layer.
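A minimal numeric sketch of the BatchNorm + Scale pair described above, with per-channel statistics over a toy batch; gamma and beta stand in for the learned reconstruction parameters of the Scale layer:

```python
def batchnorm_scale(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise xs to mean 0 / variance 1, then apply the learned scale/shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    normed = [(x - mean) / (var + eps) ** 0.5 for x in xs]   # BatchNorm layer
    return [gamma * x + beta for x in normed]                # Scale layer

xs = [2.0, 4.0, 6.0, 8.0]
out = batchnorm_scale(xs)
m = sum(out) / len(out)
v = sum((x - m) ** 2 for x in out) / len(out)
assert abs(m) < 1e-9 and abs(v - 1.0) < 1e-3     # normalised as claimed
# With learned gamma=2, beta=5 the Scale layer restores a useful scale/shift,
# i.e. it "reverses the damage" of the fixed mean-0/variance-1 distribution.
out2 = batchnorm_scale(xs, gamma=2.0, beta=5.0)
assert abs(sum(out2) / len(out2) - 5.0) < 1e-9
```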
As to the above-mentioned aspect and any possible implementation manner, there is further provided an implementation manner, and the content of step S4 includes:
The richness of the picture background content is increased by grayscale inversion, brightness/contrast changes and erosion/dilation; at the same time, a plurality of characters are spliced 5 × 5 to obtain a spliced character picture containing 25 single characters in a 5 × 5 grid.
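The 5 × 5 splicing step can be sketched as follows, with toy 2 × 2 tiles standing in for real single-character crops:

```python
def stitch(tiles, grid=5):
    """Lay out grid*grid equally sized 2-D tiles into one composite 2-D image."""
    assert len(tiles) == grid * grid
    th, tw = len(tiles[0]), len(tiles[0][0])
    out = []
    for gr in range(grid):          # grid row
        for r in range(th):         # pixel row inside the tile
            row = []
            for gc in range(grid):  # grid column
                row.extend(tiles[gr * grid + gc][r])
            out.append(row)
    return out

# 25 tiles of 2x2 pixels, each filled with its own index.
tiles = [[[i, i], [i, i]] for i in range(25)]
img = stitch(tiles)
assert len(img) == 10 and len(img[0]) == 10   # 5*2 x 5*2 composite
assert img[0][0] == 0 and img[9][9] == 24
```

In the real pipeline each tile would additionally pass through the augmentation operations (inversion, contrast change, erosion/dilation) before splicing.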
In accordance with the foregoing aspect and any possible implementation manner, there is further provided an implementation manner, where the HwNet network structure in step S5 includes:
the input layer is a single-character splicing picture with a size of 112 × 112 × 3;
the first layer is a convolutional layer with 64 convolution kernels of size 3 × 3 and a stride of 2; 64 feature maps of 56 × 56 pixels are obtained after this layer;
the second layer is a depthwise separable convolutional layer with 64 convolution kernels of size 3 × 3 and a stride of 1;
the third to seventh layers are bottleneck layers, all with a typical residual structure;
the eighth layer is an ordinary convolutional layer with 512 convolution kernels of size 1 × 1 and a stride of 1; 512 feature maps of 7 × 7 pixels are obtained after this layer;
the ninth layer is a grouped depthwise convolutional layer with 512 convolution kernels and a stride of 1; 512 feature vectors of 1 × 1 pixels are obtained after this layer;
the tenth layer is an ordinary 1 × 1 convolutional layer with 128 convolution kernels and a stride of 1; a feature vector of dimensions 1 × 128 is output after this layer.
In the aspect and any possible implementation described above, there is further provided an implementation in which, in step S5, a BatchNorm layer and a Scale layer are connected in sequence between each of the first to eighth layers of the HwNet network and its corresponding PReLU layer. The BatchNorm layer normalizes the output of the neurons to a distribution with mean 0 and variance 1, so that the data distribution is easier to learn and less prone to overfitting, while the Scale layer reverses the damage that the BatchNorm layer does to the features, making the deep neural network easier to train;
the ninth layer, the grouped depthwise convolutional layer, is followed only by a BatchNorm layer.
The above-described aspect and any possible implementation manner further provide an implementation manner, and the content of step S6 includes:
Whether the two pictures are written by the same person is judged by calculating the cosine similarity of the feature vectors of the two pictures.
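Step S6 can be sketched directly. The decision threshold is not specified in the text, so the value below is a placeholder assumption:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def same_writer(feat_a, feat_b, threshold=0.5):
    """Hypothetical decision rule: similarity above threshold -> same writer."""
    return cosine_similarity(feat_a, feat_b) >= threshold

a = [1.0, 2.0, 3.0]
assert abs(cosine_similarity(a, a) - 1.0) < 1e-12   # identical features
assert cosine_similarity([1.0, 0.0], [0.0, 1.0]) == 0.0  # orthogonal features
```

In practice the inputs would be the two 1 × 128 vectors produced by HwNet in step S5.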
In another aspect, the invention provides a handwriting identification device based on deep learning, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
Compared with the prior art, one of the technical schemes has the following advantages or beneficial effects: the invention innovatively adopts a deep learning method to complete the whole process (detection, screening and identification) of handwriting identification, greatly simplifies the complexity of handwriting identification and simultaneously keeps good comparison and identification effects;
Another of the above technical schemes has the following advantages or beneficial effects: the invention uses a binary classification network to screen for user handwriting, removing non-handwritten characters (background information such as printed characters, punctuation and other noise) and enriching the features of the spliced characters;
Another of the above technical schemes has the following advantages or beneficial effects: the method adopts multi-character splicing and applies grayscale inversion, brightness/contrast changes and erosion/dilation to the data to simulate real-scene fonts, increasing the richness of the data set and improving the generalization ability of the model;
Another of the above technical schemes has the following advantages or beneficial effects: the loss function adopts a fusion of cross entropy and a triplet loss together with an adaptive margin strategy, which increases the discrimination between samples during network training, enlarging the inter-class distance and shrinking the intra-class distance, thereby improving the classification and identification performance of the model.
Of course, it is not necessary for any product to achieve all of the above-described technical effects simultaneously in the practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for handwriting authentication based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a character detection process according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an HwNet network structure according to an embodiment of the present invention;
FIG. 4 is a diagram of character detection and handwriting recognition provided by one embodiment of the present invention;
fig. 5 is a schematic diagram of a deep convolution according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical scheme of the invention, the following detailed description of the embodiments of the invention is made with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the defects of the prior art, the invention provides a deep-learning-based approach and designs a handwriting identification method that is more stable and faster, with a good detection effect in some special scenes. The method performs character detection on a handwriting picture with an OCR detection algorithm, classifies the detected characters to screen out the genuinely handwritten ones, splices the screened characters 5 × 5 to obtain a spliced image carrying the handwriting characteristics, and then inputs the spliced image into a neural network for feature extraction. The data set uses web-crawled data, to which grayscale inversion, erosion/dilation and contrast/brightness changes are applied.
A handwriting authentication method based on deep learning is disclosed. As shown in figure 1, the flow is: input a picture with handwritten characters; feed the picture into a character detection algorithm for character detection and segmentation to obtain single characters (Chinese characters); feed the obtained characters into a character classification algorithm to screen out handwritten characters and remove printed ones; splice the screened handwritten characters 5 × 5 and apply preprocessing; feed the resulting spliced characters into a handwriting recognition algorithm for feature extraction; and judge whether two pieces of handwriting were written by the same person by extracting the features of the two handwriting pictures and calculating their similarity. The detailed steps comprise:
step 1, acquiring picture data;
the acquired data is a picture with handwriting; the picture should be input upright (characters the right way up) as far as possible, and the handwriting may be of different types such as gel pen, chalk and whiteboard pen;
step 2, carrying out character detection on the acquired picture data;
step 3, classifying the segmented characters;
step 4, splicing the classified characters;
step 5, performing handwriting recognition on the characters subjected to splicing pretreatment;
step 6, outputting the identification result.
The character detection part adopts the PSENet method; the character classification part adopts a binary classification network based on HwNet; the handwriting recognition part uses dynamic-size cropping to enlarge the image, and increases the richness of the image background with grayscale inversion, erosion/dilation and contrast/brightness changes so as to simulate real scenes. The HwNet network is used to extract the handwriting features; the loss function adopts a fusion of cross entropy and a triplet loss, together with an adaptive margin strategy, so that the discrimination between samples during network training increases, the inter-class distance grows and the intra-class distance shrinks, improving the classification and identification performance of the model.
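The cross entropy + triplet fusion is only named above, not specified, and the adaptive margin rule is not given either; the sketch below therefore shows just the standard triplet term with a fixed margin as a hypothetical stand-in, using squared Euclidean distances:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin): pull positives in, push negatives out."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 1.0]
assert triplet_loss(a, p, n) == 0.0            # already separated by > margin
assert triplet_loss(a, n, p, margin=0.2) > 0   # a violating triplet is penalised
```

In the patent's training scheme this term would be summed with a cross-entropy classification loss, and the margin would be adapted per batch rather than fixed.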
The character segmentation in step 2 is implemented with a PSENet network and, as shown in fig. 2, includes two steps: text instance segmentation (1) and a post-processing algorithm (2).
(1) Text instance segmentation
The base network adopts a ResNet with an FPN structure. Feature maps {P2, P3, P4, P5} are selected; the 4 feature maps of different sizes each have 256 channels. P3, P4 and P5 are up-sampled by factors of 2, 4 and 8 respectively (so their sizes match P2); P2, P3, P4 and P5 are then concatenated and passed through a 3 × 3 convolutional layer to obtain the fused feature F. From F, n conv1×1 + upsampling + sigmoid operations produce the corresponding n segmentation masks, denoted M1, M2, …, Mn. It should be noted that M1, M2, …, Mn represent segmentation masks of the text instance at different sizes, referred to as "kernels" in the present invention. For the same text instance, the network outputs n kernel segmentation masks of different sizes, namely M1, M2, …, Mn (from small to large). Each kernel shares a similar shape with the original whole text instance; they are all located at the same center point but differ in scale.
(2) Post-processing algorithm
In order to solve the problem that adjacent text instances cannot be divided separately, a progressive scale expansion algorithm is introduced to post-process the segmentation masks output by the network. For the predicted n segmentation kernels M1, …, Mn, the progressive scale expansion algorithm is used to obtain the final detection result. It is based on breadth-first search (BFS) and comprises three steps:
1. start from the kernel M1 with the smallest scale (instances can already be distinguished at this step: different instances have different connected components);
2. expand their areas by progressively adding the pixels of the larger kernels;
3. continue in this way until the largest kernel has been processed.
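The three steps above can be sketched as a BFS over a toy label grid. Ties between competing instances are resolved first-come-first-served here, which is one plausible reading of the algorithm:

```python
from collections import deque

def expand(labels, larger_mask):
    """Grow labelled regions into pixels of larger_mask via BFS (4-connectivity)."""
    h, w = len(labels), len(labels[0])
    q = deque((r, c) for r in range(h) for c in range(w) if labels[r][c])
    while q:
        r, c = q.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and larger_mask[nr][nc] and not labels[nr][nc]:
                labels[nr][nc] = labels[r][c]   # inherit the instance label
                q.append((nr, nc))
    return labels

# Two text instances: seeds 1 and 2 come from the smallest kernel M1 and are
# expanded into a larger kernel M2 that still keeps them apart (middle column 0).
labels = [[1, 0, 0, 0, 2],
          [0, 0, 0, 0, 0]]
m2     = [[1, 1, 0, 1, 1],
          [1, 1, 0, 1, 1]]
labels = expand(labels, m2)
```

Repeating `expand` with each successively larger kernel yields the final instance map without merging the adjacent instances.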
The character classification in step 3 comprises the following steps:
the network used in the character classification is HwNet, the network layer structure is shown in fig. 3, for the character classification method, the input layer of the network is a single character picture with the size of 112 × 3, the number of convolution kernels of the first convolution layer is 64, the size of each convolution kernel is 3 × 3, the step length of the convolution operation is 2, a feature image with 64 pixels of 56 × 56 is obtained after the first convolution layer, a BatchNorm layer and a Scale layer are sequentially connected between the convolution layer and the corresponding PReLU layer, and the size of the feature image is not changed by the BatchNorm layer, the Scale layer and the PReLU layer. When the deep network has too many layers, signals and gradients are smaller and smaller, deep layers are difficult to train, namely gradient dispersion and possibly larger and larger, also called gradient explosion, the output of the neurons is normalized to the distribution with the mean value of 0 and the variance of 1 through the BatchNorm layer, so that the data distribution is easier to learn by the network and is not easy to over-fit, and the Scale layer reverses the damage of the BatchNorm layer to the characteristics so as to overcome the defect that the deep neural network is difficult to train. 
During training of a deep network, the input distribution of each layer keeps changing, because the parameter updates of the previous layer shift the distribution of the next layer's inputs. BatchNorm and Scale address this shifting of the distribution in the middle layers: the BatchNorm layer normalizes each neuron's output to mean 0 and variance 1, so all neurons follow the same distribution after it. However, restricting the output to this fixed distribution weakens the expressive power of the network and destroys the features learned by the previous layer, so a Scale layer is added; it reverses the damage to the features through learned reconstruction parameters, which are adjusted during model training. This both normalizes the neurons and preserves the features learned by the previous layers, promoting model convergence and, since the inputs become stable, preventing overfitting to some extent. By normalizing the scales layer by layer, gradient vanishing and overflow are avoided, convergence is accelerated, and, acting as a regularization technique, generalization improves. By contrast, using sigmoid or tanh as the activation function is computationally expensive, the derivative computations during back-propagation are also expensive, and these functions saturate easily, causing gradients to vanish; that is, near convergence the transformation becomes too slow, losing information.
The PReLU layer lets some neurons output a small negative value, which induces sparsity, alleviates overfitting, is closer to a real neuron activation model (overcoming gradient vanishing), and converges markedly faster than sigmoid and tanh activations even without unsupervised pre-training (i.e. training the first hidden layer of the network, then the second, and finally using the trained parameter values as the initial values of the overall network). The second layer is a depthwise separable convolution; it has 64 convolution kernels, each of size 3 × 3, with a stride of 1, yielding 64 feature maps of 56 × 56 pixels, and a BatchNorm layer and a Scale layer are connected in sequence between this convolutional layer and the corresponding PReLU layer. The following 5 bottlenecks are fairly typical residual structures, used as backbone layers to deepen the network and improve its nonlinear capacity (a BatchNorm layer and a Scale layer are connected in sequence between each of these five layers and the corresponding PReLU layer). The eighth layer is an ordinary convolutional layer with 512 kernels of size 1 × 1 and a stride of 1, yielding 512 feature maps of 7 × 7 pixels; a BatchNorm layer and a Scale layer are connected in sequence between this layer and the corresponding PReLU layer. The ninth layer is a grouped depthwise convolutional layer, whose structure is shown in fig. 5; it has 512 convolution kernels with a stride of 1 and yields 512 feature vectors of 1 × 1 pixels, followed only by a BatchNorm layer, since a Scale layer is no longer needed for the dimension change. The last layer is an ordinary 1 × 1 convolutional layer (this is the output layer, so no BatchNorm or Scale layer follows it) with 128 kernels and a stride of 1; it finally outputs a 1 × 128-dimensional feature vector, which is ultimately used for the handwritten/non-handwritten character classification task. Here, Global Depthwise Convolution (denoted GDConv) is used instead of global average pooling. The GDConv kernel has the same size as the input feature map, with pad = 0 and stride = 1. The output is calculated as follows:
G_m = Σ_{i,j} K_{i,j,m} · F_{i,j,m}    (1)
In formula (1), F is the input feature map, K is the depthwise convolution kernel, and G is the output feature map, of size 1 × M; the computational cost and the number of parameters are both W × H × M. Depthwise convolution of this global form is also called global weighted pooling, and differs from global average pooling in that global weighted pooling gives each location a learnable weight. This weighting is effective for aligned images: for Chinese characters, for example, different weights can be used at the central and boundary positions, which can greatly improve recognition accuracy.
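Formula (1) can be sketched directly (an illustrative plain-Python version; the feature map F and kernel K values below are made up):

```python
# Global depthwise convolution (GDConv): per-channel global weighted
# pooling. F and K are H x W x M nested lists; the output G has one
# value per channel m, i.e. G_m = sum over i,j of K[i][j][m] * F[i][j][m].

def gdconv(F, K):
    H, W, M = len(F), len(F[0]), len(F[0][0])
    return [sum(K[i][j][m] * F[i][j][m] for i in range(H) for j in range(W))
            for m in range(M)]

# With all kernel weights equal to 1/(H*W), GDConv reduces to global
# average pooling; learnable weights let aligned positions (e.g. the
# centre of a character) count more than the border.
H, W, M = 2, 2, 3
F = [[[float(i + j + m) for m in range(M)] for j in range(W)] for i in range(H)]
K = [[[1.0 / (H * W)] * M for _ in range(W)] for _ in range(H)]
G = gdconv(F, K)   # equals the per-channel mean of F: [1.0, 2.0, 3.0]
```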
The splicing preprocessing in step 4 comprises: processing the handwritten character pictures obtained in step 3, using operations such as grayscale inversion, brightness/contrast change and erosion/dilation to enrich the background content of the pictures, and at the same time splicing the characters into a 5 × 5 grid to obtain a spliced character picture containing 25 single characters.
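The 5 × 5 splicing can be sketched as follows (a hypothetical minimal version using tiny dummy images; real inputs would be the character pictures obtained in step 3):

```python
# Tile 25 equally sized single-character images into one 5 x 5 mosaic,
# row by row. Images are plain nested lists of pixel values here.

def splice(chars, grid=5):
    """Tile grid*grid images (each h x w) into one (grid*h) x (grid*w) image."""
    assert len(chars) == grid * grid
    h, w = len(chars[0]), len(chars[0][0])
    mosaic = []
    for gr in range(grid):            # which row of characters
        for r in range(h):            # pixel row inside that strip
            row = []
            for gc in range(grid):    # which column of characters
                row.extend(chars[gr * grid + gc][r])
            mosaic.append(row)
    return mosaic

# 25 dummy 4x4 "character" tiles, each filled with its own index
chars = [[[k] * 4 for _ in range(4)] for k in range(25)]
m = splice(chars)                     # a 20 x 20 mosaic
```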
In step 5, the content of handwriting recognition comprises:
The network HwNet is used for handwriting recognition; the network layer structure is shown in fig. 3. For the handwriting recognition method, a character splicing diagram whose single characters are of size 112 × 112 × 3 is input. The number of convolution kernels of the first convolutional layer is 64, the size of each kernel is 3 × 3, and the stride of the convolution operation is 2; 64 feature images of 56 × 56 pixels are obtained after this layer. A BatchNorm layer and a Scale layer are connected in sequence between the convolutional layer and the corresponding PReLU layer, and the PReLU layer does not change the size of the feature images. When a deep network has too many layers, the signals and gradients may become smaller and smaller, making the deep layers hard to train (gradient dispersion), or larger and larger (gradient explosion). The BatchNorm layer normalizes the outputs of the neurons to a distribution with mean 0 and variance 1, so that the data distribution is easier for the network to learn and less prone to overfitting, and the Scale layer reverses the damage of the BatchNorm layer to the features, overcoming the difficulty of training deep neural networks.
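As a check of the sizes quoted above, the standard convolution output-size formula reproduces the 112 → 56 reduction of the first layer (padding 1 is an assumption, since the patent does not state it explicitly):

```python
# Spatial output size of a convolution: floor((in + 2*pad - kernel) / stride) + 1.

def conv_out_size(in_size, kernel, stride, pad):
    return (in_size + 2 * pad - kernel) // stride + 1

first = conv_out_size(112, kernel=3, stride=2, pad=1)    # 112 -> 56 (first layer)
second = conv_out_size(first, kernel=3, stride=1, pad=1) # 56 -> 56 (depthwise layer keeps size)
eighth = conv_out_size(7, kernel=1, stride=1, pad=0)     # 7 -> 7 (1x1 convolution)
```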
Since the input data distribution of each layer keeps changing during training of a deep network (the parameter updates of the previous layer change the distribution of the inputs to the next layer), BatchNorm and Scale are used to solve the problem that the data distribution of the intermediate layers changes during training. The BatchNorm layer normalizes the output of each neuron to a distribution with mean 0 and variance 1, so all neurons are normalized to the same distribution after passing through a BatchNorm layer. However, restricting the outputs to mean 0 and variance 1 weakens the expressive capability of the network and destroys the features learned by the previous layer, so a Scale layer is added: the damage to the features is reversed by learned reconstruction parameters, and this reversal is adjusted during model training. Thus the neurons are normalized while the features learned by the previous layers are kept, and as the inputs become stable, model convergence is promoted and overfitting is prevented to some extent. Normalizing the scale layer by layer avoids gradient vanishing and overflow, accelerates convergence, and, acting as a regularization technique, improves generalization. By contrast, using sigmoid or tanh as the activation function is computationally expensive, the derivative computation when back-propagating the error gradient is also heavy, and the sigmoid and tanh functions saturate easily, causing gradient vanishing: near convergence the transformation becomes too slow, resulting in information loss.
The PReLU layer allows some neurons to output a small negative value, which introduces sparsity and relieves overfitting; being closer to a real neuron activation model, it overcomes gradient vanishing, and it converges significantly faster than the sigmoid and tanh activation functions even without unsupervised pre-training (i.e., training the first hidden layer of the network, then the second hidden layer, and finally using the trained parameter values as initial values of the overall network parameters). The second layer is a depthwise separable convolution; the number of convolution kernels is 64, the size of each kernel is 3 × 3, the stride of the convolution operation is 1, 64 feature images of 56 × 56 pixels are obtained after this layer, and a BatchNorm layer and a Scale layer are connected in sequence between the convolutional layer and the corresponding PReLU layer. The following 5 bottlenecks are typical residual structures that serve as the backbone, deepening the network to improve its nonlinear capability (a BatchNorm layer and a Scale layer are connected in sequence between each of these five layers and the corresponding PReLU layer). The eighth layer is an ordinary convolutional layer; the convolution kernel size is 1 × 1, the number of kernels is 512, the stride is 1, 512 feature images of 7 × 7 pixels are obtained after this layer, and a BatchNorm layer and a Scale layer are connected in sequence between the convolutional layer and the corresponding PReLU layer. The ninth layer is a grouped depthwise convolutional layer, whose structure is shown in fig. 4; the number of convolution kernels is 512, the stride is 1, 512 feature vectors of 1 × 1 pixels are obtained after this layer, and a BatchNorm layer follows without a Scale layer. The last layer is an ordinary 1 × 1 convolutional layer; the number of convolution kernels is 128, the stride is 1, and a 1 × 128-dimensional feature vector is finally output, which is used for handwriting comparison and handwriting retrieval; this convolutional layer is the output layer, so no BatchNorm or Scale layer needs to follow it.
The content of the authentication result output in the step 6 comprises:
The feature vector obtained in step 5 is used as the object of handwriting comparison and handwriting retrieval. For the handwriting comparison task, whether two pictures were written by the same person is judged by calculating the cosine similarity of their feature vectors. For the handwriting retrieval task, features are first extracted from the handwriting pictures in the base library to obtain a base library of feature vectors; features are then extracted from the picture to be retrieved, the cosine similarity between its feature vector and every feature vector in the base library is calculated, and the retrieval results are sorted by similarity score: the higher the score, the greater the similarity and the higher the ranking.
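Step 6 can be sketched as follows (the feature vectors are made-up stand-ins for the 1 × 128 HwNet outputs):

```python
# Cosine similarity for handwriting comparison, plus a similarity-ranked
# retrieval over a small gallery of feature vectors.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query, gallery):
    """Return gallery indices sorted by cosine similarity, best first."""
    scores = [(cosine(query, g), i) for i, g in enumerate(gallery)]
    return [i for s, i in sorted(scores, reverse=True)]

q = [1.0, 0.0]
gallery = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking = retrieve(q, gallery)   # gallery[1] is closest to q, so index 1 ranks first
```

In the comparison task, the same cosine score would simply be thresholded to decide "same writer" or "different writer".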
The handwriting authentication method based on deep learning provided by the embodiment of the application is described in detail above. The above description of the embodiments is only for the purpose of helping to understand the method of the present application and its core ideas; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in an article or system that includes the element. "Substantially" means within an acceptable error range; a person skilled in the art can solve the technical problem within a certain error range to substantially achieve the technical effect.

Claims (10)

1. A handwriting authentication method based on deep learning is characterized by comprising the following steps:
s1, acquiring picture data;
s2, carrying out character detection on the acquired picture data;
the character detection comprises two steps of text segmentation and post-processing;
text segmentation: performing text segmentation on the picture by adopting ResNet of an FPN structure;
and (3) post-treatment: performing post-processing on the text segmentation result by adopting a progressive size expansion algorithm;
s3, classifying the characters by adopting an HwNet network;
s4, splicing the characters;
s5, performing handwriting recognition on the characters subjected to splicing pretreatment by adopting an HwNet network;
and S6, calculating and outputting the identification result.
2. The method for handwriting authentication based on deep learning of claim 1, wherein the text segmentation in step S2 comprises:
selecting 4 feature maps P2, P3, P4 and P5 of different sizes, wherein the number of channels of each of the 4 feature maps is 256;
respectively up-sampling the characteristic diagrams P3, P4 and P5 by 2, 4 and 8 times, so that the up-sampled P3, P4 and P5 are consistent with the size of P2;
splicing the feature maps P2, P3, P4 and P5, and then convolving with conv _3 x 3 to obtain a fused feature F;
performing n conv_1 × 1 convolution, up-sampling and sigmoid operations on the fused feature F to obtain n corresponding segmentation masks M1, M2, ..., Mn, where n is a positive integer.
3. The method for handwriting authentication based on deep learning of claim 2, wherein said progressive resizing algorithm is obtained based on breadth-first search, comprising the steps of:
starting the search from the segmentation mask with the smallest scale;
progressively expanding the areas of the kernels by adding more pixels within the larger segmentation masks;
the search is stopped until the largest segmentation mask is found.
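The three steps of claim 3 can be sketched as follows (a minimal breadth-first expansion on made-up masks; nonzero labels mark text kernels from the smallest mask and 0 is background):

```python
# Grow kernel labels from the smallest segmentation mask breadth-first
# inside the next larger mask; a pixel keeps whichever label reaches it first.
from collections import deque

def expand(small, large):
    """small: grid of kernel labels (0 = background); large: binary mask."""
    h, w = len(small), len(small[0])
    out = [row[:] for row in small]
    q = deque((i, j) for i in range(h) for j in range(w) if small[i][j])
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and large[ni][nj] and not out[ni][nj]:
                out[ni][nj] = out[i][j]
                q.append((ni, nj))
    return out

small = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
large = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
grown = expand(small, large)   # label 1 fills the whole plus-shaped mask
```

Repeating this from the smallest mask up to the largest mask reproduces the progressive size expansion of the claim.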
4. The deep learning based handwriting authentication method according to claim 1, wherein the network structure of HwNet in step S3 comprises:
the input layer is a single-character picture of size 112 × 112 × 3;
the first layer, convolutional layer, the number of convolutional kernels is 64, the size of each convolutional kernel is 3 × 3, the step size of the convolutional operation is 2, and 64 characteristic images with 56 × 56 pixels are obtained after the convolutional layer;
a second layer, which is a depthwise separable convolutional layer, the number of convolution kernels is 64, the size of each convolution kernel is 3 × 3, and the stride of the convolution operation is 1;
the third to seventh layers, bottleneck layers, are all typical residual structures;
an eighth layer, which is a common convolutional layer, the convolutional kernels have the size of 1 × 1, the number of the convolutional kernels is 512, the step length is 1, and 512 characteristic images with 7 × 7 pixels are obtained after the convolutional layer;
a ninth layer, which is a grouped depthwise convolutional layer, the number of convolution kernels is 512, the stride is 1, and 512 feature vectors of 1 × 1 pixels are obtained after passing through this layer;
the tenth layer is a common 1 × 1 convolutional layer, the number of convolutional kernels is 128, the step size is 1, and a feature vector with 1 × 128 dimensions is output after passing through the convolutional layer.
5. The method for authenticating handwriting based on deep learning according to claim 4, wherein in step S3, a BatchNorm layer and a Scale layer are connected between each layer and the corresponding PReLU layer in sequence in the first to eighth layers of the HwNet network; normalizing the output of the neuron to a distribution with a mean value of 0 and a variance of 1 through the BatchNorm layer, so that the data distribution is easier to learn and less prone to overfitting, and reversing the damage of the BatchNorm layer to the features through the Scale layer, so that the deep neural network is easy to train;
the ninth layer, the grouped depthwise convolutional layer, is followed by a BatchNorm layer.
6. The method for handwriting authentication based on deep learning of claim 1, wherein the content of step S4 comprises:
and the richness of the picture background content is improved by grayscale inversion, brightness/contrast change, and erosion/dilation, and at the same time a plurality of characters are spliced into a 5 × 5 grid to obtain a spliced character picture containing 25 single characters.
7. The handwriting authentication method based on deep learning of claim 1, wherein the network structure of HwNet in step S5 comprises:
the input layer inputs a character splicing diagram whose single characters are of size 112 × 112 × 3;
the first layer, a convolutional layer; the number of convolution kernels is 64, the size of each convolution kernel is 3 × 3, and the stride of the convolution operation is 2; 64 feature images of 56 × 56 pixels are obtained after passing through the convolutional layer;
a second layer, a depthwise separable convolutional layer; the number of convolution kernels is 64, the size of each convolution kernel is 3 × 3, and the stride of the convolution operation is 1;
the third to seventh layers, bottleneck layers, are all typical residual structures;
an eighth layer, which is a common convolutional layer, the convolutional kernels have the size of 1 × 1, the number of the convolutional kernels is 512, the step length is 1, and 512 characteristic images with 7 × 7 pixels are obtained after the convolutional layer;
a ninth layer, a grouped depthwise convolutional layer; the number of convolution kernels is 512, the stride is 1, and 512 feature vectors of 1 × 1 pixels are obtained after passing through this layer;
in the tenth layer, a common 1 × 1 convolutional layer, the number of convolutional kernels is 128, the step size is 1, and a feature vector with dimensions of 1 × 128 is output after passing through the convolutional layer.
8. The method for authenticating handwriting based on deep learning according to claim 7, wherein in step S5, a BatchNorm layer and a Scale layer are connected between each layer and the corresponding PReLU layer in sequence in the first to eighth layers of the HwNet network; normalizing the output of the neuron to a distribution with a mean value of 0 and a variance of 1 through the BatchNorm layer, so that the data distribution is easier to learn and is not easy to over-fit, and reversing the damage of the BatchNorm layer to the characteristics through the Scale layer so that the deep neural network is easy to train;
the ninth layer, the grouped depthwise convolutional layer, is followed by a BatchNorm layer.
9. The method for handwriting authentication based on deep learning of claim 1, wherein the content of step S6 comprises:
whether the pictures are written by the same person or not is judged by calculating the cosine similarity of the feature vectors of the two pictures.
10. A handwriting authentication apparatus based on deep learning, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, realizes the steps of the method according to any one of claims 1 to 9.
CN202210860161.2A 2022-07-21 2022-07-21 Handwriting identification method and device based on deep learning Pending CN115359494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210860161.2A CN115359494A (en) 2022-07-21 2022-07-21 Handwriting identification method and device based on deep learning


Publications (1)

Publication Number Publication Date
CN115359494A true CN115359494A (en) 2022-11-18

Family

ID=84031394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210860161.2A Pending CN115359494A (en) 2022-07-21 2022-07-21 Handwriting identification method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN115359494A (en)

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN108492298B (en) Multispectral image change detection method based on generation countermeasure network
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
US20200134382A1 (en) Neural network training utilizing specialized loss functions
CN111860309A (en) Face recognition method and system
CN112926652A (en) Fish fine-grained image identification method based on deep learning
CN113159045A (en) Verification code identification method combining image preprocessing and convolutional neural network
Fan et al. A novel sonar target detection and classification algorithm
CN110163274A (en) A kind of object classification method based on ghost imaging and linear discriminant analysis
CN111582057B (en) Face verification method based on local receptive field
CN112818774A (en) Living body detection method and device
CN112070116A (en) Automatic art painting classification system and method based on support vector machine
CN111832463A (en) Deep learning-based traffic sign detection method
Li et al. An improved PCB defect detector based on feature pyramid networks
Li et al. A new algorithm of vehicle license plate location based on convolutional neural network
US11715288B2 (en) Optical character recognition using specialized confidence functions
CN115761667A (en) Unmanned vehicle carried camera target detection method based on improved FCOS algorithm
CN115359494A (en) Handwriting identification method and device based on deep learning
Bureš et al. Semantic text segmentation from synthetic images of full-text documents
CN113239895A (en) SAR image change detection method of capsule network based on attention mechanism
Ahmed et al. Cursive scene text analysis by deep convolutional linear pyramids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination