WO2023134402A1

WO2023134402A1 - Calligraphy character recognition method based on siamese convolutional neural network

Info

Publication number: WO2023134402A1
Application number: PCT/CN2022/140065
Authority: WO
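Inventors: Feng Wei (冯伟); Ou Yuhao (欧宇浩); Zhou Zhaokun (周昭坤); Che Qishu (车其姝)
Original assignee: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (中国科学院深圳先进技术研究院)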
Priority date: 2022-01-14
Filing date: 2022-12-19
Publication date: 2023-07-20
Also published as: CN116486419A

Abstract

A calligraphy character recognition method based on a siamese convolutional neural network. The method comprises: obtaining a calligraphy character image to be recognized; inputting the calligraphy character image into a trained siamese convolutional neural network model, the siamese neural network model comprising a first convolutional neural network and a second convolutional neural network, wherein the first convolutional neural network outputs a corresponding first feature vector, and the second convolutional neural network outputs a corresponding second feature vector; calculating the similarity between the first feature vector and the second feature vector; and predicting the category of a calligraphy character on the basis of the similarity result. According to the method, the siamese convolutional neural network can be trained by means of a small number of samples or even a single sample, thereby reducing the training cost and significantly improving the accuracy of calligraphy character recognition.

Description

A Calligraphy Character Recognition Method Based on Siamese Convolutional Neural Network

Technical Field

The present invention relates to the technical field of calligraphy character recognition, and more specifically, to a calligraphy character recognition method based on a siamese convolutional neural network.

Background Art

Chinese calligraphy has a long and rich history, yet today's young generation still lacks understanding of many calligraphic characters. For example, a visitor who climbs Yueyang Tower and walks along the new stele corridor to the east of the building, facing ancient characters written with sweeping "dragon and phoenix" flourishes, may be left embarrassed because the characters are hard to read. This reading difficulty could be overcome if a machine could recognize the characters quickly.

Calligraphy scripts are usually divided into five categories: regular (kai), cursive (cao), running (xing), clerical (li) and seal script. Their morphological characteristics differ considerably, and people who have not studied calligraphy systematically may find them hard to recognize. Applications and software addressing this need exist on the market, but their accuracy is not high; for example, the character "微" written in a connected cursive hand is easily misjudged as the character "Guo". The fundamental reason is that existing recognition technology relies only on simple feature comparison: a database holds hundreds of samples, and after the user inputs a character, feature comparison finds the best-matching result. This approach needs extremely large data samples to improve accuracy, but samples of Chinese calligraphy characters are scarce, so its accuracy is low and its cost is too high.

In the prior art, solutions for calligraphy character recognition generally fall into two categories. The first does not train a neural network: samples are collected to build a large database, the character to be recognized is searched for and compared against the database, and the entry with the highest similarity is taken as the recognition result. The second learns through a neural network: a large amount of sample data must be collected for training, and the result that best matches the learned representation is selected, so as to achieve accurate recognition.

Among methods that do not train a neural network, for example, patent application publication CN103093240A ("Calligraphy Character Recognition Method") binarizes, denoises and normalizes calligraphy characters and then extracts feature information such as the positions of the four boundary points, the average number of stroke crossings, projection values and contour points; it extracts the same feature information from the calligraphy character to be recognized and performs shape matching and comparison to give the recognition result. This method has low recognition accuracy. As another example, patent application publication CN101785030A ("Handwriting/Calligraphy Generation Based on Hidden Markov Model") uses a hidden Markov model to generate handwritten characters; the trained hidden Markov model can use techniques such as maximum a posteriori estimation and maximum likelihood linear regression, but this method also suffers from low recognition accuracy.

Methods that train a neural network require a large amount of data as support, but calligraphy data sets are small and difficult to collect. According to the latest edition of the Xinhua Dictionary, there are more than 11,000 Chinese characters, of which 3,500 are commonly used. Dozens to thousands of samples would need to be collected for each character, so existing recognition technology has a high time cost and low accuracy. For example, patent application publications CN110334782A ("Multi-convolution layer-driven deep belief network calligraphy style recognition method") and CN108764242A ("Offline handwritten Chinese font recognition method based on deep convolutional neural network") cannot efficiently train a neural network when only a small amount of sample data is available. Patent application publication CN108985348 ("Calligraphy style recognition method based on convolutional neural network") can only recognize calligraphy style, not calligraphy characters.

In short, existing calligraphy character recognition methods have low accuracy, mainly because calligraphy characters take many forms and individual calligraphers have great freedom of personal expression. For calligraphy samples with non-standard shapes, hand-designed traditional feature extraction algorithms perform poorly; and because rare characters have few samples, the databases available for machine learning are small, which makes the training of conventional deep-learning-based machine vision algorithms unsatisfactory.

Summary of the Invention

The purpose of the present invention is to overcome the above defects of the prior art by providing a calligraphy character recognition method based on a siamese convolutional neural network. The method includes the following steps:

Obtain an image of the calligraphy character to be recognized;

Input the calligraphy character image into a trained siamese convolutional neural network model, the siamese neural network model comprising a first convolutional neural network and a second convolutional neural network, wherein the first convolutional neural network outputs a corresponding first feature vector and the second convolutional neural network outputs a corresponding second feature vector;

Calculate the similarity between the first feature vector and the second feature vector;

Predict the category of the calligraphy character based on the similarity result.

Compared with the prior art, the present invention can complete learning with a small number of samples or even a single sample (few-shot/one-shot learning), which significantly reduces the amount of neural network training without losing accuracy, so that neural networks can be successfully applied to calligraphy character recognition. In addition, traditional deep learning methods based on convolutional neural networks cannot recognize objects not encountered during training: to recognize a new object, a large number of samples of that object must be collected and the entire neural network (or at least its fully connected layers) must be retrained. The siamese neural network architecture provided by the present invention, however, does not directly output the label of a sample; it outputs similarity values between the sample and the members of a sample library. If the sample is "not similar to any member", the object is judged to be one that has never been seen before. Because the number of Chinese characters is so large that no database can realistically include them all, this property is very important and enhances the robustness of calligraphy character recognition.
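The open-set behaviour described above, answering "not similar to any member" instead of forcing a label, can be sketched as nearest-neighbour search with a rejection threshold. This is a minimal illustration under stated assumptions, not the patented implementation: the feature vectors are assumed to come from the trained network, cosine similarity is one of the two measures the description mentions, and the function names and the 0.8 threshold are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recognize_or_reject(query, library, threshold=0.8):
    """Return the best-matching label, or None if no member is similar enough.

    `library` maps a character label to a list of reference feature vectors.
    `threshold` is a hypothetical cut-off; the description does not give one.
    """
    best_label, best_score = None, -1.0
    for label, refs in library.items():
        for ref in refs:
            score = cosine_similarity(query, ref)
            if score > best_score:
                best_label, best_score = label, score
    if best_score < threshold:
        return None  # not similar to any member: treat as a never-seen character
    return best_label
```

With a library of reference vectors, a query close to a stored member returns that member's label, while a query far from every member returns None, i.e. an unseen character rather than a forced misclassification.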

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

FIG. 1 is a flowchart of a calligraphy character recognition method based on a siamese convolutional neural network according to an embodiment of the present invention;

FIG. 2 is an overall architecture diagram of a siamese convolutional neural network according to an embodiment of the present invention;

FIG. 3 is an overall architecture diagram of a siamese convolutional neural network according to another embodiment of the present invention;

FIG. 4 is a detailed structural diagram of a siamese convolutional neural network according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of font samples according to an embodiment of the present invention;

FIG. 6 is a comparison diagram of experimental results according to an embodiment of the present invention.

Labels used in the accompanying figures include: Input Layer; input; output; none; Model; Functional; Euclidean Distance; Max Pooling; Global Average Pooling.

Detailed Description of the Embodiments

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and in no way taken as limiting the invention, its application or uses.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the description.

In all examples shown and discussed herein, any specific values should be construed as exemplary only, and not as limitations. Therefore, other instances of the exemplary embodiment may have different values.

It should be noted that like numerals and letters denote like items in the following figures, therefore, once an item is defined in one figure, it does not require further discussion in subsequent figures.

The present invention builds a model framework based on a siamese convolutional neural network to realize calligraphy character recognition. In brief, during training, two samples from the training set are input into two identical convolutional neural networks, respectively, to obtain two feature vectors. The similarity of the two feature vectors is then calculated as a single value. If the two characters have the same label, they are the same character and the target similarity of their feature vectors is 1; conversely, if their labels differ, the target similarity is 0. The difference between this Boolean label and the calculated similarity value is back-propagated, and stochastic gradient descent is performed to train the neural network. In practical applications, an image of the calligraphy character to be recognized is input into the trained siamese convolutional neural network, which outputs the corresponding feature vector; this feature vector is compared with the members of a feature vector library, and the member with the highest similarity is selected as the recognition result for the character. The present invention can be used to recognize calligraphy characters, and can also be used to recognize the font (script style) of calligraphy characters, such as regular script, cursive script and running script.
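The pair-labelling rule above (target 1 when both samples share a label, 0 otherwise) can be sketched in a few lines. The function name and the (image, label) tuple layout are assumptions for illustration only.

```python
import itertools


def make_pairs(samples):
    """Build (image_a, image_b, target) training pairs from labelled samples.

    `samples` is a list of (image, label) tuples. The target is 1 when both
    samples carry the same character label and 0 otherwise.
    """
    pairs = []
    for (img_a, lab_a), (img_b, lab_b) in itertools.combinations(samples, 2):
        pairs.append((img_a, img_b, 1 if lab_a == lab_b else 0))
    return pairs
```

Enumerating all pairs grows quadratically and yields far more negative than positive pairs; in practice one would likely sample a balanced subset, although the description does not specify a sampling scheme.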

Specifically, referring to FIG. 1, the calligraphy character recognition method based on a siamese convolutional neural network includes the following steps.

Step S110, constructing a Siamese convolutional neural network model.

In one embodiment, as shown in FIG. 2, the overall architecture of the siamese convolutional neural network includes an input layer, two convolutional neural networks, a pooling layer (labeled dense_1) and a fully connected layer (labeled dense_2). The siamese convolutional neural network processes its input as follows: it receives two grayscale images of the same size, such as 100×100, and feeds them into two identical deep convolutional neural networks (CNNs) that extract features at different depths. For example, each convolutional neural network contains four levels of feature extraction structures, each consisting mainly of a convolutional layer and a pooling layer (see Table 1 below). An image is first passed through a convolutional layer and then a pooling layer, after which the ReLU activation function and batch normalization (BN) are applied. In FIG. 2, these layers are repeated four times, each time with a slightly different kernel size and number of kernels. Finally, a global pooling layer and a fully connected layer are applied. In the embodiment of FIG. 2, the two convolutional neural networks extract image features and represent each image as a feature vector of 48 values.
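Because Table 1 is not reproduced here, the following sketch only traces how a 100×100 input could shrink through four convolution and pooling stages before global average pooling and a 48-unit fully connected layer. The kernel counts and sizes in STAGES are hypothetical choices (they fall within the ranges of the second embodiment described below), and "valid" convolutions with stride 1 and non-overlapping pooling are assumed.

```python
# Hypothetical per-stage settings: (num_kernels, kernel_size, pool_size).
STAGES = ((64, 10, 2), (128, 7, 2), (128, 4, 2), (256, 4, 1))


def out_size(size, window, stride=1):
    """Output width of a 'valid' convolution or pooling window (no padding)."""
    return (size - window) // stride + 1


def trace_shapes(size=100, stages=STAGES, embed_dim=48):
    """Return (spatial size, channels) after each stage, plus the final embedding."""
    shapes = []
    for num_kernels, kernel, pool in stages:
        size = out_size(size, kernel)               # convolution, stride 1
        size = out_size(size, pool, stride=pool)    # non-overlapping max pooling
        shapes.append((size, num_kernels))          # ReLU and BN keep the shape
    # Global average pooling leaves one value per channel; the fully connected
    # layer then projects those channels to the embedding dimension.
    shapes.append((1, embed_dim))
    return shapes
```

Under these assumptions the feature maps go 100 → 45 → 19 → 8 → 5 pixels wide, and the global pooling step makes the final 48-value vector independent of that last spatial size.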

Table 1 Deep Convolutional Neural Network

In another embodiment, the structure of the siamese convolutional neural network is shown in FIG. 3 and FIG. 4, where the input image size is m×n, m and n each being an integer between 28 and 1000, and the feature vector dimension x is an integer between 10 and 100.

Specifically, the first feature extraction structure is set as:

32 to 128 convolution kernels, each a p×p matrix, where p is an integer between 5 and 15;

a k×k pooling layer, where k is an integer between 1 and 5;

a batch normalization layer;

a dropout layer that retains 25% to 75% of the neurons.

The second feature extraction structure is set as:

64 to 256 convolution kernels, each a q×q matrix, where q is an integer between 5 and 10;

a k×k pooling layer, where k is an integer between 1 and 5;

a batch normalization layer;

a dropout layer that retains 25% to 75% of the neurons.

The third feature extraction structure is set as:

64 to 256 convolution kernels, each an s×s matrix, where s is an integer between 2 and 6;

a k×k pooling layer, where k is an integer between 1 and 5;

a batch normalization layer;

a dropout layer that retains 25% to 75% of the neurons.

The fourth feature extraction structure is set as:

128 to 512 convolution kernels, each a t×t matrix, where t is an integer between 2 and 6;

a k×k pooling layer, where k is an integer between 1 and 5;

a batch normalization layer;

a dropout layer that retains 25% to 75% of the neurons.
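The parameter ranges of this embodiment can be expressed as a configuration check. This is a hypothetical helper that assumes the stated ranges are inclusive; it only validates numbers and does not build a network.

```python
from dataclasses import dataclass

# (kernel-count range, kernel-size range) for the four feature extraction structures.
STAGE_RANGES = [
    ((32, 128), (5, 15)),   # first structure: p x p kernels
    ((64, 256), (5, 10)),   # second structure: q x q kernels
    ((64, 256), (2, 6)),    # third structure: s x s kernels
    ((128, 512), (2, 6)),   # fourth structure: t x t kernels
]


@dataclass
class Stage:
    num_kernels: int
    kernel_size: int
    pool_size: int        # k, integer in [1, 5]
    keep_fraction: float  # fraction of neurons retained by dropout, in [0.25, 0.75]


def check_config(m, n, x, stages):
    """Return a list of human-readable violations (empty if the config fits)."""
    errors = []
    if not (28 <= m <= 1000 and 28 <= n <= 1000):
        errors.append(f"input size {m}x{n} outside 28..1000")
    if not 10 <= x <= 100:
        errors.append(f"feature dimension {x} outside 10..100")
    if len(stages) != len(STAGE_RANGES):
        errors.append("expected exactly four feature extraction structures")
        return errors
    for i, (stage, ((kmin, kmax), (smin, smax))) in enumerate(zip(stages, STAGE_RANGES), 1):
        if not kmin <= stage.num_kernels <= kmax:
            errors.append(f"stage {i}: {stage.num_kernels} kernels outside {kmin}..{kmax}")
        if not smin <= stage.kernel_size <= smax:
            errors.append(f"stage {i}: kernel size {stage.kernel_size} outside {smin}..{smax}")
        if not 1 <= stage.pool_size <= 5:
            errors.append(f"stage {i}: pool size {stage.pool_size} outside 1..5")
        if not 0.25 <= stage.keep_fraction <= 0.75:
            errors.append(f"stage {i}: keep fraction {stage.keep_fraction} outside 0.25..0.75")
    return errors
```

If such a configuration were mapped onto Keras, note that `keras.layers.Dropout` takes the fraction of units to drop, so retaining 25% to 75% of neurons corresponds to a rate between 0.25 and 0.75.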

Step S120, collect a data set and build a training set to train the siamese neural network model, the training set reflecting the correspondence between characters or fonts and sample images.

In this step, the data set is first collected and the training set is then constructed. In one embodiment, the training set comprises a plurality of characters (i.e., each character is a category), and each character corresponds to one or more samples, where the samples of each character reflect different font classes and different morphological characteristics.

For example, Chinese calligraphy characters can be downloaded from the http://www.shufazidian.com/ website. As of July 23, 2021, the website had stored a total of 440,412 images covering 8 fonts and 6197 different characters. Commonly used characters have samples in more fonts, whereas some fonts have few or no samples for a given character. Table 2 summarizes the number of characters and the number of samples per character.

Table 2 Font categories and the number of samples for each different character

Most of the downloaded images contain only one character, but some contain multiple characters, so such images need to be segmented into individual characters. FIG. 5 shows an example character with 38 samples and an image containing multiple characters.

Specifically, for example, 1000 images containing multiple characters and 1000 images containing a single character are first labeled. These two data sets are then used to train a CNN with the same structure as the one used in the siamese convolutional neural network; this classifier achieved 99.8% accuracy in identifying whether an image contains multiple characters or a single character. The high accuracy is due to the visually significant difference between single-character and multi-character images. All 440,412 images are then classified by the trained CNN into the corresponding category (multi-character or single-character). Because training from a small number of samples requires only a few samples per character, in one embodiment the multi-character images of a character are deleted if that character already has three or more samples. For characters with fewer than three samples, the multi-character images are kept and segmented into individual characters.
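The retention rule for multi-character images can be sketched as follows. This assumes each record already carries the classifier's single/multi verdict and that "three or more samples" counts single-character images; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def filter_multi_char_images(records, min_single=3):
    """Split records into kept single-character images and multi-character images to segment.

    `records` is a list of (image_id, label, is_multi) tuples, where `is_multi`
    is the classifier's verdict (True means the image holds several characters).
    Multi-character images are dropped for characters that already have
    `min_single` or more single-character samples, and kept otherwise.
    """
    single_counts = defaultdict(int)
    for _, label, is_multi in records:
        if not is_multi:
            single_counts[label] += 1

    keep, to_segment = [], []
    for image_id, label, is_multi in records:
        if not is_multi:
            keep.append(image_id)
        elif single_counts[label] < min_single:
            to_segment.append(image_id)  # still needed: split into single characters
    return keep, to_segment
```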

The collected data set is then preprocessed. Preprocessing of the input images includes organizing the image files, normalizing image shape and color, normalizing image resolution, and creating the training and test sets. Because resolution and color vary greatly between images, and too low a resolution causes information loss while too high a resolution causes insufficient memory, a resolution of 100×100 pixels is preferably used. Since color usually plays no role in calligraphy recognition, all images can be converted to grayscale. The pixel values are then scaled to the range 0-1 and standardized to zero mean and unit variance.

The siamese convolutional neural network model provided by the present invention can recognize either character categories or font categories. Character recognition and font recognition are trained separately, which keeps the design simpler and the code more direct; and because the data set is small, training time is short. To recognize characters regardless of font, the samples of each character in all fonts are merged, and the samples in each character class are then randomly divided into a training set, a validation set and a test set in a ratio of 8:1:1. In another embodiment, to recognize fonts regardless of character, all samples belonging to each font are combined, and the samples in each font category are then randomly divided into a training set, a validation set and a test set in a ratio of 8:1:1.
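The per-class 8:1:1 split can be sketched as below; the same function serves the character-wise and the font-wise grouping, depending on how the samples are keyed. The function name, seed handling and rounding policy (floor for the training and validation shares, remainder to the test set) are assumptions.

```python
import random


def split_per_class(samples_by_class, ratios=(8, 1, 1), seed=0):
    """Randomly split each class's samples into train/val/test lists of (sample, label)."""
    rng = random.Random(seed)
    total = sum(ratios)
    train, val, test = [], [], []
    for label, samples in samples_by_class.items():
        items = list(samples)
        rng.shuffle(items)
        n_train = len(items) * ratios[0] // total
        n_val = len(items) * ratios[1] // total
        train += [(s, label) for s in items[:n_train]]
        val += [(s, label) for s in items[n_train:n_train + n_val]]
        test += [(s, label) for s in items[n_train + n_val:]]  # remainder
    return train, val, test
```

With only three samples in a class, this floor-based rounding gives two training samples, no validation sample and one test sample, so the exact policy matters for very small classes.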

Preferably, the data set is not subjected to noise removal, contrast enhancement, removal of extraneous objects and the like, because the convolutional neural network used takes these factors into account automatically. In addition, to reduce the amount of sample data required, data diversity is increased by randomly rotating and/or shifting the samples.
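Random displacement can be sketched with NumPy as below. The maximum shift and fill value are hypothetical parameters, and `max_shift` is assumed to be smaller than the image size. Random rotation is not shown; one option would be `scipy.ndimage.rotate` with `reshape=False` so that the output keeps the 100×100 shape.

```python
import numpy as np


def random_shift(image, max_shift=5, rng=None, fill=0.0):
    """Shift a 2-D image by a random (dy, dx) in [-max_shift, max_shift].

    Vacated pixels are filled with `fill` instead of wrapping around.
    Assumes max_shift < image height and width.
    """
    rng = rng or np.random.default_rng()
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = image.shape
    out = np.full_like(image, fill)
    # Source and destination windows of the region that stays inside the frame.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out
```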

To make the training set of the siamese convolutional neural network as complete as possible, more than 3000 characters are collected, each with at least one sample. For example, the number of samples per character can be capped at 10: for characters with more than 10 samples, part of the sample set is randomly deleted so that no more than 10 samples remain. This ensures that the trained siamese convolutional neural network does not rely on a large sample data set. When new characters are encountered after training, the model can be extended efficiently without collecting a large number of samples of the new characters for retraining.
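Capping the number of samples per character by random deletion might look like this; the function name and seed handling are assumptions. Passing a cap of 3 gives the reduced data set of the following embodiment.

```python
import random


def cap_samples(samples_by_char, max_per_char=10, seed=0):
    """Randomly delete samples so that no character keeps more than `max_per_char`."""
    rng = random.Random(seed)
    capped = {}
    for char, samples in samples_by_char.items():
        samples = list(samples)
        if len(samples) > max_per_char:
            samples = rng.sample(samples, max_per_char)  # random subset, no repeats
        capped[char] = samples
    return capped
```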

In another preferred embodiment, a reduced version of the data set is used. A small-sample calligraphy character recognition model is trained because some characters have very few samples and because the model must be able to recognize new character categories that are not included in the data set of 6197 Chinese calligraphy characters. To test the few-shot learning ability of the siamese convolutional neural network, samples of each character are randomly deleted so that no character has more than 3 samples. The training, validation and test set separation process described above is then repeated to create data sets for character recognition and font recognition, respectively. Table 3 shows the character and sample counts after the training set has been reduced.

Table 3 The number of words and the number of samples after the training set is reduced

Taking character recognition as an example, the training process of the siamese convolutional neural network is shown in FIG. 2 and FIG. 3. Sample A and sample B, each with a resolution of m×n (for example, 100×100), are input into two identical convolutional neural networks, respectively. Each network processes its input character image and outputs a 10- to 100-dimensional feature vector; the Euclidean distance or cosine similarity between the two vectors is then calculated.

If input samples A and B are the same character, the Euclidean distance between the two output feature vectors is small, or their cosine similarity is large; if A and B are different characters, the Euclidean distance between the two output feature vectors is large, or their cosine similarity is small.
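The two similarity measures named above can be written directly from their standard definitions; `math.dist` (Python 3.8+) gives the Euclidean distance. The helper names are illustrative.

```python
import math


def euclidean_distance(u, v):
    """Smaller for feature vectors of the same character."""
    return math.dist(u, v)


def cosine_similarity(u, v):
    """Larger (closer to 1) for feature vectors of the same character."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

The two measures move in opposite directions, so a decision rule must use "small distance" or "large similarity" depending on which one is chosen.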

The known labels of the two characters give a Boolean value (for example, "1" when the two input images show the same character and "0" when they show different characters), and the calculated distance or similarity value is compared with this Boolean value to find their difference.

For example, after the two images are passed through the two identical CNNs, two feature vectors of dimension 48 each are obtained. The Euclidean distance between the two vectors is then computed as a measure of the similarity between the two images. Finally, two sigmoid functions are applied in succession. The output is interpreted as a Boolean value: 0 means the two images contain different characters, and 1 means they contain the same character.
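The following sketch maps the Euclidean distance between two 48-dimensional embeddings to a score in (0, 1) and measures the squared difference from the Boolean target. It is a simplification under stated assumptions: the head parameters w and b are hypothetical, a single sigmoid stands in for the two described above, and the squared difference is one possible reading of "difference" (binary cross-entropy is a common alternative).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pair_similarity(f1, f2, w=-2.0, b=4.0):
    """Map the Euclidean distance between two embeddings to a score in (0, 1).

    A negative w makes the score fall as the distance grows, so pairs of the
    same character score near 1 and pairs of different characters near 0.
    """
    return sigmoid(w * math.dist(f1, f2) + b)


def pair_loss(f1, f2, target, w=-2.0, b=4.0):
    """Squared difference between the Boolean target (1 same, 0 different) and the score."""
    return (target - pair_similarity(f1, f2, w, b)) ** 2
```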

During training, the difference in similarity is used as the loss function for back propagation, which updates all the weights and biases of the entire siamese neural network architecture and thereby completes the training.
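To make the back-propagation step concrete, the sketch below applies stochastic gradient descent to the two parameters w and b of a sigmoid score s = sigmoid(w·d + b), where d is the distance between two embeddings, using the chain rule on the squared-difference loss (t - s)². It is deliberately reduced: in the full model the same error signal would continue into the shared CNN weights, which a framework's automatic differentiation would normally handle. The toy distances and targets are made up for illustration.

```python
import math


def sgd_step(w, b, pairs, lr=0.1):
    """One SGD pass over (distance, target) pairs, updating only w and b."""
    for d, t in pairs:
        s = 1.0 / (1.0 + math.exp(-(w * d + b)))
        grad_z = -2.0 * (t - s) * s * (1.0 - s)  # dL/dz for L = (t - s)^2, z = w*d + b
        w -= lr * grad_z * d                      # dz/dw = d
        b -= lr * grad_z                          # dz/db = 1
    return w, b


def mean_loss(w, b, pairs):
    """Mean squared difference between targets and sigmoid scores."""
    return sum((t - 1.0 / (1.0 + math.exp(-(w * d + b)))) ** 2 for d, t in pairs) / len(pairs)
```

For example, with small distances labelled 1 and large distances labelled 0, repeated calls drive w negative and lower the mean loss.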

It should be noted that if a character that has never been seen before is encountered during training or in actual use, the new character can be added to the character library (feature vector library) to increase the number of characters the model can recognize.
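Enrolling a new character without retraining amounts to appending its feature vector to the library. Below is a hypothetical minimal library class, assuming Euclidean distance for lookup and feature vectors already produced by the trained network.

```python
import math


class FeatureLibrary:
    """Hypothetical feature-vector library keyed by character label."""

    def __init__(self):
        self.members = {}  # label -> list of reference feature vectors

    def enroll(self, label, feature):
        """Add a reference vector; a new label needs only one example."""
        self.members.setdefault(label, []).append(list(feature))

    def nearest(self, query):
        """Return (label, distance) of the closest member (raises ValueError if empty)."""
        return min(
            ((label, math.dist(query, ref)) for label, refs in self.members.items() for ref in refs),
            key=lambda item: item[1],
        )
```

After one example of a new character is enrolled, queries near it resolve to the new label while the network weights stay untouched.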

Step S130, using a target image containing calligraphy characters as input, predict the character category or font category with the trained siamese neural network model.

After model training is completed, target images can be recognized in real time. For example, to predict the category of an image, the same number of images can be drawn from each category and input, together with the image to be predicted, into the siamese neural network; the prediction is obtained by determining which category's images the image to be predicted is most similar to.
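The prediction procedure above, drawing the same number of images from every category and picking the most similar category, can be sketched as an N-way k-shot episode. The `similarity` callable stands in for the trained siamese network, and averaging the k scores per category is an assumed aggregation rule (taking the maximum would be another reasonable choice).

```python
import random


def predict_episode(query, gallery, similarity, k=1, seed=0):
    """N-way k-shot prediction: draw k images per category, return the best category.

    `gallery` maps category -> list of images (each list needs at least k items);
    `similarity(a, b)` returns a higher value for a closer match.
    """
    rng = random.Random(seed)
    scores = {}
    for category, images in gallery.items():
        drawn = rng.sample(images, k)  # same number of images from every category
        scores[category] = sum(similarity(query, img) for img in drawn) / k
    return max(scores, key=scores.get), scores
```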

To further verify the effect of the present invention, experiments were carried out. The network was first trained to recognize character classes. Samples of the same character in different fonts were merged, and the samples in each character class were then randomly divided into a training set, a validation set and a test set in a ratio of 8:1:1. The images were fed through the siamese neural network (SNN), and the training results are shown in FIG. 6, which plots the training loss and accuracy. The results show an accuracy of 94.5% on the training set with a loss of 0.5.

To train the siamese convolutional network to recognize fonts regardless of subcategory, all subcategories belonging to each font were combined, and the samples in each font class were then randomly split into training, validation and test sets in a ratio of 8:1:1. The training results show an accuracy of 95.5% on the training set with a loss of 0.5.

In summary, the present invention uses a siamese convolutional neural network architecture to complete training with small-sample data while achieving high recognition accuracy. In addition, a character that does not exist in the training set is not misclassified; instead, it is recognized as a character that has not been seen before, and it can be recognized after being seen only once.

The present invention can be a system, method and/or computer program product. A computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present invention.

A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the above. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ and Python, and conventional procedural programming languages such as the "C" language or similar programming languages. Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, field programmable gate array (FPGA) or programmable logic array (PLA), can be customized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer readable program instructions to implement various aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium and cause a computer, programmable data processing apparatus and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

Computer-readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other equipment, so that a series of operational steps is performed on it to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other equipment implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of instructions that contains one or more executable instructions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.

Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. A method for recognizing calligraphy characters based on a siamese convolutional neural network, comprising the following steps:

obtaining an image of a calligraphy character to be recognized;

inputting the calligraphy character image into a trained siamese convolutional neural network model, the siamese convolutional neural network model comprising a first convolutional neural network and a second convolutional neural network, wherein the first convolutional neural network outputs a corresponding first feature vector and the second convolutional neural network outputs a corresponding second feature vector;

calculating the similarity between the first feature vector and the second feature vector; and

predicting the category of the calligraphy character based on the similarity result.
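The inference steps of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: `embed`, `cosine`, `predict_category` and the `references` dictionary are hypothetical names, each "CNN branch" is replaced by a trivial flatten-and-normalize stand-in, and it is assumed (as is usual for siamese networks, although claim 1 does not state it) that the two branches share weights and that the second branch embeds labelled reference images.

```python
import math


def embed(image):
    """Hypothetical stand-in for one CNN branch: flatten a 2-D grayscale
    grid and L2-normalize it. A trained branch would apply learned layers."""
    flat = [float(p) for row in image for p in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # avoid division by zero
    return [v / norm for v in flat]


def cosine(u, v):
    """Cosine similarity of two unit-length vectors is their dot product."""
    return sum(a * b for a, b in zip(u, v))


def predict_category(query_image, references):
    """Return (category, score) of the reference most similar to the query.

    references: dict mapping a category label to a list of reference images
    (hypothetical layout). Both branches call the same embed(), i.e. the
    twin networks are assumed to share weights.
    """
    first = embed(query_image)  # first feature vector (query branch)
    best_label, best_score = None, -math.inf
    for label, images in references.items():
        for ref in images:
            second = embed(ref)  # second feature vector (reference branch)
            score = cosine(first, second)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


references = {
    "一": [[[0, 0, 0], [1, 1, 1], [0, 0, 0]]],  # horizontal stroke
    "丨": [[[0, 1, 0], [0, 1, 0], [0, 1, 0]]],  # vertical stroke
}
label, score = predict_category([[0, 0, 0], [1, 1, 0.8], [0, 0, 0]], references)
print(label, round(score, 3))
```

In a real system `embed` would be the trained network, and reference embeddings would normally be computed once and cached rather than recomputed for every query.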
2. The method according to claim 1, wherein the siamese convolutional neural network model is trained by the following steps:

constructing a training set in which each character is a category and each category corresponds to one or more sample images, wherein the sample images of each category reflect font categories and morphological features; and

training the siamese convolutional neural network model on the training set with a loss as the optimization objective, wherein two sample images are input into the first convolutional neural network and the second convolutional neural network respectively to obtain two feature vectors, the similarity of the two feature vectors is calculated, and a Boolean value labels whether the two sample images contain the same calligraphy character; during training, stochastic gradient descent is performed by backpropagating the difference between the Boolean value and the calculated similarity value.
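A toy version of the training step in claim 2 is sketched below. Assumptions: the two branches are a shared element-wise scaling `w` instead of CNNs, similarity is `exp(-squared distance)` so that it lies in (0, 1] and can be compared with the Boolean label, and the backpropagated "difference" is taken as the squared error `(y - s) ** 2`, with its gradient derived by hand. All names are hypothetical.

```python
import math
import random


def similarity(w, x1, x2):
    """Both inputs pass through the same element-wise scaling w (a toy stand-in
    for weight-sharing CNN branches); similarity = exp(-squared distance)."""
    d2 = sum((wi * (a - b)) ** 2 for wi, a, b in zip(w, x1, x2))
    return math.exp(-d2)


def loss(w, pairs):
    """Mean squared difference between the Boolean label and the similarity."""
    return sum((y - similarity(w, x1, x2)) ** 2 for x1, x2, y in pairs) / len(pairs)


def sgd_train(w, pairs, lr=0.1, epochs=50, seed=0):
    """Stochastic gradient descent over shuffled pairs (x1, x2, y), y in {0, 1}."""
    w = list(w)
    pairs = list(pairs)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x1, x2, y in pairs:
            s = similarity(w, x1, x2)  # forward pass with current weights
            for i, (a, b) in enumerate(zip(x1, x2)):
                # dL/dw_i for L = (y - s)^2 with s = exp(-sum_j (w_j * (a_j - b_j))^2)
                grad = -4.0 * s * (s - y) * w[i] * (a - b) ** 2
                w[i] -= lr * grad
    return w


# Toy data: feature 0 varies with writing style, feature 1 with character identity.
pairs = [
    ([0.0, 1.0], [1.0, 1.0], 1),  # same character, different style
    ([0.0, 1.0], [0.0, 0.0], 0),  # different characters
]
w0 = [1.0, 1.0]
w = sgd_train(w0, pairs)
print(loss(w0, pairs), loss(w, pairs))
```

On this toy data, training shrinks the weight on the style feature and enlarges the weight on the identity feature, so same-character pairs become more similar and different-character pairs less similar.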
3. The method according to claim 2, wherein the training set contains more than 3000 characters, and each character corresponds to at most 10 samples.
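The dataset constraint of claim 3 can be checked mechanically, and counting pairs shows why a few samples per character can still yield many training pairs for a siamese model. The function names and the `samples_per_char` mapping (character to sample count) are assumptions for illustration.

```python
from math import comb


def meets_claim3(samples_per_char, min_chars=3000, max_samples=10):
    """True if there are more than min_chars characters and every character
    has between 1 and max_samples samples."""
    return len(samples_per_char) > min_chars and all(
        1 <= n <= max_samples for n in samples_per_char.values()
    )


def pair_counts(samples_per_char):
    """Count same-character (positive) and different-character (negative) pairs."""
    total = sum(samples_per_char.values())
    positive = sum(comb(n, 2) for n in samples_per_char.values())
    return positive, comb(total, 2) - positive


full = {f"char{i}": 10 for i in range(3001)}
print(meets_claim3(full), pair_counts(full))
```

For 3001 characters with 10 samples each, `pair_counts` gives 135,045 same-character pairs and 450,150,000 different-character pairs, so a practical pair sampler would likely need to balance the two classes (an assumption; the claim does not specify a sampling scheme).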
4. The method according to claim 1, wherein the first convolutional neural network and the second convolutional neural network have the same structure, each comprising four feature extraction structures, wherein:

the first feature extraction structure comprises a convolution layer with 32 to 128 convolution kernels of size p×p, where p is an integer from 5 to 15; a k×k pooling layer, where k is an integer from 1 to 5; a batch normalization layer; and a dropout layer set to retain 25% to 75% of the neurons;

the second feature extraction structure comprises a convolution layer with 64 to 256 convolution kernels of size q×q, where q is an integer from 5 to 10; a k×k pooling layer, where k is an integer from 1 to 5; a batch normalization layer; and a dropout layer set to retain 25% to 75% of the neurons;

the third feature extraction structure comprises a convolution layer with 64 to 256 convolution kernels of size s×s, where s is an integer from 2 to 6; a k×k pooling layer, where k is an integer from 1 to 5; a batch normalization layer; and a dropout layer set to retain 25% to 75% of the neurons;

the fourth feature extraction structure comprises a convolution layer with 128 to 512 convolution kernels of size t×t, where t is an integer from 2 to 6; a k×k pooling layer, where k is an integer from 1 to 5; a batch normalization layer; and a dropout layer set to retain 25% to 75% of the neurons.
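The ranges in claim 4 can be captured as a configuration and the resulting feature-map size traced through the four structures. This sketch is illustrative only: the concrete values (64@10×10, 128@7×7, 128@4×4, 256@4×4, 2×2 pooling, 50% retention), the 105×105 input size, stride-1 "valid" convolutions and stride-k pooling are all assumptions chosen within the claimed ranges, and `FeatureBlock`, `within_claim4` and `output_shape` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureBlock:
    kernels: int      # number of convolution kernels (output channels)
    kernel_size: int  # the kernel is kernel_size x kernel_size
    pool: int         # k for the k x k pooling layer
    keep: float       # fraction of neurons retained by dropout


# (kernel-count range, kernel-size range) for the four structures of claim 4.
CLAIM4_RANGES = [((32, 128), (5, 15)), ((64, 256), (5, 10)),
                 ((64, 256), (2, 6)), ((128, 512), (2, 6))]


def within_claim4(blocks):
    """Check a four-block configuration against the ranges recited in claim 4."""
    return len(blocks) == 4 and all(
        kmin <= b.kernels <= kmax and smin <= b.kernel_size <= smax
        and 1 <= b.pool <= 5 and 0.25 <= b.keep <= 0.75
        for b, ((kmin, kmax), (smin, smax)) in zip(blocks, CLAIM4_RANGES)
    )


def output_shape(blocks, height, width):
    """Trace (channels, height, width) through the blocks, assuming stride-1
    'valid' convolutions and non-overlapping k x k pooling (stride k).
    Batch normalization and dropout leave the shape unchanged."""
    channels = None
    for b in blocks:
        height, width = height - b.kernel_size + 1, width - b.kernel_size + 1
        height, width = height // b.pool, width // b.pool
        channels = b.kernels
    return channels, height, width


blocks = [FeatureBlock(64, 10, 2, 0.5), FeatureBlock(128, 7, 2, 0.5),
          FeatureBlock(128, 4, 2, 0.5), FeatureBlock(256, 4, 2, 0.5)]
print(within_claim4(blocks), output_shape(blocks, 105, 105))
```

Under these assumptions a 105×105 input yields a 256×3×3 feature map (2304 values when flattened). How that map becomes the first or second feature vector, for example through a fully connected layer, is not recited in claim 4.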
5. The method according to claim 1, characterized in that a Euclidean distance or a cosine similarity is used to measure the similarity between the first feature vector and the second feature vector.
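Both measures named in claim 5 are shown below in plain Python; the helper names are illustrative. The two measures point in opposite directions (a smaller Euclidean distance and a larger cosine similarity both mean "more similar"), and for L2-normalized vectors they rank pairs identically, because the squared distance equals 2 minus twice the cosine similarity.

```python
import math


def euclidean_distance(u, v):
    """Smaller distance means more similar."""
    return math.dist(u, v)


def cosine_similarity(u, v):
    """Larger value (at most 1) means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def l2_normalize(u):
    n = math.hypot(*u)
    return [a / n for a in u]


u = l2_normalize([1.0, 2.0, 2.0])
v = l2_normalize([2.0, 1.0, 2.0])
# For unit vectors: ||u - v||^2 == 2 - 2 * cos(u, v)
print(euclidean_distance(u, v) ** 2, 2 - 2 * cosine_similarity(u, v))
```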
6. The method according to claim 1, further comprising:

determining, according to the similarity result between the first feature vector and the second feature vector, whether the character to be recognized exists in the font library; and

if not, adding the calligraphy character to be recognized to the font library.
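The font-library check of claim 6 might look like the sketch below. The claim does not say how "exists in the font library" is decided; here it is assumed that a character is present when its best cosine similarity to a stored feature vector reaches a threshold (0.9, an arbitrary illustrative value). The `library` list of `(label, vector)` entries and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def find_in_library(library, query, threshold=0.9):
    """Return the label of the best match if its similarity reaches the
    (hypothetical) threshold, otherwise None."""
    best_label, best_score = None, -1.0
    for label, vector in library:
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


def include_if_new(library, query, label, threshold=0.9):
    """Add (label, query) to the library when no entry is similar enough.
    Returns True if the character was added."""
    if find_in_library(library, query, threshold) is None:
        library.append((label, query))
        return True
    return False


library = [("永", [1.0, 0.0])]
print(include_if_new(library, [0.99, 0.05], "unknown"), include_if_new(library, [0.0, 1.0], "新"))
```

The threshold trades off duplicate entries against wrongly merging distinct characters and would need to be tuned on validation data.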
7. The method according to claim 2, further comprising: training the siamese convolutional neural network model with a second training set in which fonts are used as categories and each category corresponds to one or more sample images.
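Claim 7 reuses the pairing scheme with fonts instead of characters as categories. A minimal sketch, assuming each sample is stored as a hypothetical `(character, font, image)` record and that, by analogy with claim 2, a pair is labelled 1 when both images come from the same font:

```python
from collections import defaultdict


def build_font_training_set(records):
    """Regroup (character, font, image) records so that fonts, not characters,
    are the categories. The record layout is hypothetical."""
    by_font = defaultdict(list)
    for _char, font, image in records:
        by_font[font].append(image)
    return dict(by_font)


def font_pairs(font_set):
    """Yield (image_a, image_b, same_font) for every unordered pair of images,
    labelled 1 when both come from the same font category, else 0."""
    items = [(font, img) for font, imgs in font_set.items() for img in imgs]
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            yield items[i][1], items[j][1], int(items[i][0] == items[j][0])


records = [("永", "kai", "a"), ("书", "kai", "b"), ("永", "li", "c")]
font_set = build_font_training_set(records)
print(font_set, list(font_pairs(font_set)))
```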
8. The method according to claim 2, characterized in that, for the training set, if a character has three or more samples, sample images containing multiple fonts are deleted; if a character has fewer than three samples, images containing multiple fonts are retained and split into individual fonts.
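The cleaning rule of claim 8 is sketched below under a hypothetical data model: each sample is `(character, parts)`, where `parts` lists the `(font, image)` pieces it contains, so a sample with more than one part mixes several fonts. How a multi-font image is actually segmented into single-font pieces is not specified in the claim and is assumed to have been done already.

```python
from collections import defaultdict


def clean_training_set(samples):
    """Apply the claim 8 rule per character.

    samples: list of (character, parts), where parts is a list of
    (font, image) pieces; more than one piece means a mixed-font image.
    """
    by_char = defaultdict(list)
    for char, parts in samples:
        by_char[char].append(parts)
    cleaned = []
    for char, char_samples in by_char.items():
        if len(char_samples) >= 3:
            # Three or more samples: delete images that mix several fonts.
            cleaned.extend((char, parts) for parts in char_samples if len(parts) == 1)
        else:
            # Fewer than three: keep mixed-font images, split into single-font samples.
            cleaned.extend((char, [part]) for parts in char_samples for part in parts)
    return cleaned


samples = [
    ("永", [("kai", "a")]),
    ("永", [("kai", "b")]),
    ("永", [("li", "c"), ("cao", "d")]),
    ("书", [("kai", "e"), ("xing", "f")]),
]
print(clean_training_set(samples))
```

Read literally, the rule counts all samples of a character, including mixed-font ones, before deciding; a character can therefore end up with fewer than three samples after deletion, as "永" does in this example.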
9. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
10. A computer device comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 8.