WO2023134402A1 - Calligraphy character recognition method based on siamese convolutional neural network - Google Patents


Info

Publication number
WO2023134402A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
convolutional neural
layer
similarity
calligraphy
Application number
PCT/CN2022/140065
Other languages
French (fr)
Chinese (zh)
Inventor
冯伟
欧宇浩
周昭坤
车其姝
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2023134402A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/226 Character recognition characterised by the type of writing of cursive writing

Definitions

  • FIG. 1 is a flowchart of the calligraphy character recognition method based on a Siamese convolutional neural network according to an embodiment of the present invention.
  • FIG. 2 is an overall architecture diagram of a Siamese convolutional neural network according to an embodiment of the present invention.
  • FIG. 3 is an overall architecture diagram of a Siamese convolutional neural network according to another embodiment of the present invention.
  • FIG. 4 is a detailed structural diagram of a Siamese convolutional neural network according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of font samples according to an embodiment of the present invention.
  • FIG. 6 is a comparison of experimental results according to an embodiment of the present invention.
  • Labels used in the figures: Input Layer (input layer); input; output; None; Model; Functional; Euclidean Distance; Max Pooling; Global Average Pooling.
  • The present invention builds a model framework based on Siamese convolutional neural networks to perform calligraphy character recognition.
  • Two samples from the training set are input into two identical convolutional neural networks, respectively, to obtain two feature vectors.
  • The difference between the Boolean label (same or different) and the computed similarity value is backpropagated, and stochastic gradient descent is used to train the neural network.
  • The present invention can be used to recognize calligraphy characters and also to recognize the fonts of calligraphy characters, such as regular script, cursive script, and running script.
  • The calligraphy character recognition method based on a Siamese convolutional neural network includes the following steps.
  • Step S110: construct a Siamese convolutional neural network model.
  • The overall architecture of the Siamese convolutional neural network includes an input layer, two convolutional neural networks, a pooling layer (labeled dense_1), and a fully connected layer (labeled dense_2).
  • The Siamese convolutional neural network processes its input as follows: it receives two grayscale images of the same size, such as 100 × 100, and feeds them into two identical deep convolutional neural networks (CNNs) to extract features at different depths.
  • Each convolutional neural network contains four feature-extraction structures, each consisting mainly of a convolutional layer and a pooling layer (see Table 1 below). Images first pass through the convolutional layer and then the pooling layer, after which the ReLU activation function and batch normalization (BN) are applied.
  • The structure of the Siamese convolutional neural network is shown in FIG. 3 and FIG. 4, where m and n are integers between 28 and 1000, and x is an integer between 10 and 100.
  • The first feature-extraction structure is configured as:
  • 32 to 128 convolution kernels, each a p × p matrix, where p is an integer between 5 and 15;
  • a k × k pooling layer, where k is an integer between 1 and 5;
  • a dropout layer that retains 25% to 75% of the neurons.
  • The second feature-extraction structure is configured as:
  • 64 to 256 convolution kernels, each a q × q matrix, where q is an integer between 5 and 10;
  • a k × k pooling layer, where k is an integer between 1 and 5;
  • a dropout layer that retains 25% to 75% of the neurons.
  • The third feature-extraction structure is configured as:
  • 64 to 256 convolution kernels, each an s × s matrix, where s is an integer between 2 and 6;
  • a k × k pooling layer, where k is an integer between 1 and 5;
  • a dropout layer that retains 25% to 75% of the neurons.
  • The fourth feature-extraction structure is configured as:
  • 128 to 512 convolution kernels, each a t × t matrix, where t is an integer between 2 and 6;
  • a k × k pooling layer, where k is an integer between 1 and 5;
  • a dropout layer that retains 25% to 75% of the neurons.
  • Step S120: collect a dataset and build a training set to train the Siamese neural network model; the training set reflects the correspondence between characters or fonts and sample images.
  • In this step, the dataset is collected first and the training set is then constructed. In one embodiment, the training set contains multiple characters (that is, each character is a category), each character corresponds to one or more samples, and the samples of each character reflect different font classes and morphological characteristics.
  • Chinese calligraphy characters can be downloaded from the website http://www.shufazidian.com/. As of July 23, 2021, the website stored a total of 440,412 images covering 8 fonts and 6,197 different characters. Commonly used characters have samples in more fonts, while some fonts have few or no samples. Table 2 summarizes the character counts and the number of samples per character.
  • FIG. 5 shows an example character with 38 samples and an image containing multiple characters.
  • Images containing multiple characters are deleted if the character already has three or more samples. For categories with fewer than three samples, images containing multiple characters are kept and split into single-character images.
  • Preprocessing of input images includes organizing image files, normalizing image shape and color, normalizing image resolution, and creating the training and test sets. Because resolution and color vary widely between images, and a resolution that is too low loses information while one that is too high exhausts memory, 100 × 100 pixels is preferably used. Since color usually plays no role in calligraphy recognition, all images can be converted to grayscale. The pixel values are then scaled to the range 0 to 1 and standardized to zero mean and unit variance.
  • The Siamese convolutional neural network model provided by the present invention can recognize either character categories or font categories.
  • Character recognition and font recognition are trained separately, which simplifies the design and makes coding more direct.
  • Training time is also shorter.
  • For character recognition, the samples of all fonts belonging to each character are merged, and the samples in each character class are then randomly divided into training, validation, and test sets in a ratio of 8:1:1.
  • For font recognition, all characters belonging to each font are combined, and the samples in each font category are then randomly divided into training, validation, and test sets in a ratio of 8:1:1.
  • The dataset is not subjected to noise removal, contrast enhancement, removal of extraneous objects, or similar processing, because the convolutional neural network takes these factors into account automatically. In addition, to reduce the amount of sample data required, data diversity is increased by randomly rotating and/or shifting samples.
  • More than 3,000 objects are collected, and each object has at least one sample.
  • Specifically, for characters with more than 10 samples, some members of the sample set are randomly deleted so that fewer than 10 samples remain. This ensures that the trained Siamese convolutional neural network does not rely on large sample datasets. After training, if new characters are encountered in use, the model can be extended efficiently without collecting a large number of new samples for training.
  • An abridged version of the dataset is used.
  • A small-sample calligraphy character recognition model is trained because some characters have few samples and because the model must be able to recognize new character categories that are not included in the dataset of 6,197 Chinese calligraphy characters.
  • The samples of each character are randomly deleted so that no character has more than 3 samples. The training, validation, and test set separation process above is then repeated to create datasets for character recognition and font recognition, respectively. Table 3 shows the character and sample count statistics after the training set has been reduced.
  • The training process of the Siamese convolutional neural network is shown in FIG. 2 and FIG. 3.
  • Sample A and sample B, each with a resolution of m × n, are input into two identical convolutional neural networks, respectively.
  • Each convolutional neural network processes one of the two input character images and produces a feature vector of 10 to 100 dimensions; the similarity between the two vectors is then computed as a Euclidean distance or a cosine similarity.
  • If A and B are the same character, the Euclidean distance between the two output feature vectors is smaller, or their cosine similarity is larger; if A and B are different characters, the Euclidean distance between the two output feature vectors is larger, or their cosine similarity is smaller.
  • The similarity difference is used as the loss function for backpropagation, which updates all weights and biases of the entire Siamese neural network architecture and thereby completes the training.
  • Step S130: using a target image containing calligraphy characters as input, predict the character category or font category with the trained Siamese neural network model.
  • The target image can be recognized in real time. For example, to predict the category of an image, the same number of images can be drawn from each category and input into the Siamese neural network together with the image to be predicted; the prediction result is obtained by determining which category's images the image to be predicted is most similar to.
  • The present invention uses the Siamese convolutional neural network architecture to complete training with small-sample data and achieve high recognition accuracy.
  • When the model encounters a character that does not exist in the training set, it does not misclassify it but recognizes it as a previously unseen character, and it can recognize that character after seeing it only once.
  • The present invention may be a system, a method, and/or a computer program product.
  • A computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to implement various aspects of the present invention.
  • A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • A computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the above.
  • Computer-readable storage media are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or transmitted electrical signals.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.
  • Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • An electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA).
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium and direct a computer, programmable data processing apparatus, and/or other devices to work in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions.
  • The functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.
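The four feature-extraction structures specified under Step S110 can be captured as a small configuration check. The sketch below is a framework-free illustration under stated assumptions: the `StageConfig` class, the values in `EXAMPLE_STAGES`, and the shape rule (size-preserving convolutions followed by non-overlapping k × k pooling with stride k) are hypothetical choices, not taken from the patent. Because the two branches are described as identical, one configuration describes both; Siamese implementations typically also share the weights between branches.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageConfig:
    filters: int       # number of convolution kernels in the stage
    kernel: int        # each kernel is a kernel x kernel matrix
    pool: int          # pooling window is pool x pool (assumed stride = pool)
    keep_ratio: float  # fraction of neurons retained by the dropout layer

# (min_filters, max_filters, min_kernel, max_kernel) for stages 1-4, as stated in the text
STAGE_RANGES = [
    (32, 128, 5, 15),
    (64, 256, 5, 10),
    (64, 256, 2, 6),
    (128, 512, 2, 6),
]

def validate(stages, m, n, x):
    """Return a list of human-readable violations of the stated ranges."""
    problems = []
    if len(stages) != len(STAGE_RANGES):
        problems.append("expected four feature-extraction stages")
    if not (28 <= m <= 1000 and 28 <= n <= 1000):
        problems.append("input size m x n must satisfy 28 <= m, n <= 1000")
    if not 10 <= x <= 100:
        problems.append("feature-vector length x must lie in [10, 100]")
    for i, (cfg, (f_lo, f_hi, k_lo, k_hi)) in enumerate(zip(stages, STAGE_RANGES), start=1):
        if not f_lo <= cfg.filters <= f_hi:
            problems.append(f"stage {i}: {cfg.filters} kernels not in [{f_lo}, {f_hi}]")
        if not k_lo <= cfg.kernel <= k_hi:
            problems.append(f"stage {i}: kernel size {cfg.kernel} not in [{k_lo}, {k_hi}]")
        if not 1 <= cfg.pool <= 5:
            problems.append(f"stage {i}: pooling size {cfg.pool} not in [1, 5]")
        if not 0.25 <= cfg.keep_ratio <= 0.75:
            problems.append(f"stage {i}: dropout keep ratio {cfg.keep_ratio} not in [0.25, 0.75]")
    return problems

def feature_map_shape(m, n, stages):
    """(height, width, channels) after the last stage, assuming size-preserving
    convolutions and non-overlapping pooling that floors odd sizes."""
    h, w = m, n
    for cfg in stages:
        h, w = h // cfg.pool, w // cfg.pool
    return h, w, stages[-1].filters

# Hypothetical example configuration that lies inside the stated ranges
EXAMPLE_STAGES = [
    StageConfig(filters=64, kernel=10, pool=2, keep_ratio=0.5),
    StageConfig(filters=128, kernel=7, pool=2, keep_ratio=0.5),
    StageConfig(filters=128, kernel=4, pool=2, keep_ratio=0.5),
    StageConfig(filters=256, kernel=4, pool=1, keep_ratio=0.5),
]

if __name__ == "__main__":
    print(validate(EXAMPLE_STAGES, 100, 100, 64))       # []
    print(feature_map_shape(100, 100, EXAMPLE_STAGES))  # (12, 12, 256)
```

With a 100 × 100 input, this example shrinks to a 12 × 12 × 256 feature map, which a final dense layer (not modeled here) would project to the x-dimensional feature vector.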

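The data-preparation steps described under Step S120 (resizing to 100 × 100 grayscale with pixels scaled to 0 to 1 and then standardized, random shifting for augmentation, randomly capping the number of samples per character, and the per-class 8:1:1 split) can be sketched in plain Python. This is a minimal sketch under assumptions: nearest-neighbour resizing, BT.601 grayscale weights, the `max_shift` range, and the integer rounding of the split sizes are illustrative choices the text does not specify, and rotation is omitted for brevity.

```python
import math
import random

def preprocess(rgb, size=100):
    """Resize to size x size, convert to grayscale in [0, 1], then standardize.
    `rgb` is a list of rows of (r, g, b) integers in 0..255."""
    src_h, src_w = len(rgb), len(rgb[0])
    # Nearest-neighbour resize (assumed; the text fixes only the target size)
    resized = [[rgb[i * src_h // size][j * src_w // size] for j in range(size)]
               for i in range(size)]
    # Grayscale with ITU-R BT.601 luma weights (assumed), scaled to [0, 1]
    gray = [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row]
            for row in resized]
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat)) or 1.0  # guard blank images
    return [[(v - mean) / std for v in row] for row in gray]

def random_shift(img, rng, max_shift=5, fill=0.0):
    """Augmentation: translate the image by a random offset (max_shift is hypothetical)."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    h, w = len(img), len(img[0])
    return [[img[i - dy][j - dx] if 0 <= i - dy < h and 0 <= j - dx < w else fill
             for j in range(w)] for i in range(h)]

def cap_samples(samples_by_class, max_per_class, rng):
    """Randomly delete samples so that no class keeps more than max_per_class."""
    return {label: (rng.sample(items, max_per_class) if len(items) > max_per_class else list(items))
            for label, items in samples_by_class.items()}

def split_811(samples_by_class, rng):
    """Per-class random 8:1:1 split into train/validation/test lists of (label, sample)."""
    train, val, test = [], [], []
    for label, items in samples_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train, n_val = len(items) * 8 // 10, len(items) // 10
        train += [(label, s) for s in items[:n_train]]
        val += [(label, s) for s in items[n_train:n_train + n_val]]
        test += [(label, s) for s in items[n_train + n_val:]]
    return train, val, test
```

Keying `samples_by_class` by character gives the character-recognition split, and keying it by font gives the font-recognition split. With the integer rounding chosen here, very small classes (for example, 3 samples after capping) end up with 2 training samples, no validation sample, and 1 test sample.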
Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Character Discrimination (AREA)

Abstract

A calligraphy character recognition method based on a Siamese convolutional neural network. The method comprises: obtaining a calligraphy character image to be recognized; inputting the calligraphy character image into a trained Siamese convolutional neural network model, the Siamese neural network model comprising a first convolutional neural network and a second convolutional neural network, wherein the first convolutional neural network outputs a corresponding first feature vector and the second convolutional neural network outputs a corresponding second feature vector; calculating the similarity between the first feature vector and the second feature vector; and predicting the category of the calligraphy character on the basis of the similarity result. With this method, the Siamese convolutional neural network can be trained with a small number of samples or even a single sample, which reduces the training cost and significantly improves the accuracy of calligraphy character recognition.
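The similarity computation summarized above, together with the training signal described in the Definitions (the difference between the Boolean same/different label and the computed similarity, backpropagated with stochastic gradient descent), can be illustrated on plain feature vectors. This is a sketch under assumptions: the mapping s = 1 / (1 + d) from Euclidean distance to a similarity in (0, 1] and the squared-error form of the loss are illustrative choices, not the patent's stated formulas. A common form of the contrastive loss (Hadsell, Chopra, and LeCun, 2006) is included as a widely used alternative.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.dist(u, v)

def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0

def similarity_from_distance(d):
    """Map a distance d >= 0 to a similarity in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + d)

def label_similarity_loss(same, d):
    """Squared difference between the Boolean label (1 = same character) and the similarity."""
    return (float(same) - similarity_from_distance(d)) ** 2

def label_similarity_grad(same, d):
    """dLoss/dd: the signal backpropagated into both (shared) branches.
    Uses ds/dd = -s**2 for s = 1 / (1 + d)."""
    s = similarity_from_distance(d)
    return 2.0 * (float(same) - s) * s * s

def contrastive_loss(same, d, margin=1.0):
    """Contrastive loss: pull same-character pairs together, push others beyond `margin`."""
    return d * d if same else max(0.0, margin - d) ** 2
```

For a same-character pair the gradient with respect to d is positive, so a gradient-descent step shrinks the distance; for a different-character pair it is negative, so the step pushes the embeddings apart.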

Description

A Calligraphy Character Recognition Method Based on a Siamese Convolutional Neural Network
Technical Field
The present invention relates to the technical field of calligraphy character recognition and, more specifically, to a calligraphy character recognition method based on a Siamese convolutional neural network.
Background
The history of Chinese calligraphy is long and its forms are rich and profound, yet the younger generation today is unfamiliar with many calligraphic characters. For example, a visitor who climbs Yueyang Tower and walks through the new stele corridor on its east side, facing ancient inscriptions written in a bold, flowing hand ("like dragons flying and phoenixes dancing"), may be left embarrassed because the characters are hard to read. If a machine could recognize such characters quickly, this reading barrier could be overcome.
Calligraphy fonts are usually divided into five categories: regular (楷, kai), cursive (草, cao), running (行, xing), clerical (隶, li), and seal (篆, zhuan) script. The morphological characteristics of these fonts differ considerably, and people without systematic training may find them hard to recognize. Applications and software addressing this need exist on the market, but their accuracy is not high. For example, a handwritten 围 ("surround") written with joined strokes is easily misrecognized as 国 ("country"). The root cause is that existing recognition technology relies only on simple feature comparison: for a character such as 国, dozens to hundreds of images are collected, and when the user inputs a character, feature comparison is used to find the best match. This approach needs an extremely large number of samples to achieve high accuracy, but samples of Chinese calligraphy characters are scarce, so its accuracy is low and its cost is too high.
In the prior art, calligraphy character recognition solutions generally fall into two categories. The first does not train a neural network; instead, samples are collected to build a large database, the character to be recognized is looked up and compared against the database, and the entry with the highest similarity is taken as the recognition result. The second learns with a neural network; this requires collecting a large amount of sample data for training and selecting the result whose representation matches, in order to achieve accurate recognition.
Among methods that do not train a neural network, patent application publication No. CN103093240A ("Calligraphy Character Recognition Method") binarizes, denoises, and normalizes calligraphy characters and then extracts features such as the positions of four boundary points, the average number of stroke crossings, projection values, and contour points; it then extracts the same features from the character to be recognized and performs shape matching to produce the recognition result. This method has low recognition accuracy. As another example, patent application publication No. CN101785030A ("Hidden Markov Model Based Handwriting/Calligraphy Generation") uses a Markov model to generate handwritten characters. The trained hidden Markov model may use techniques such as maximum a posteriori techniques and maximum likelihood linear techniques, but this method likewise suffers from low recognition accuracy.
Methods that train a neural network need a large amount of data, but calligraphy datasets are small and difficult to collect. According to the latest edition of the Xinhua Dictionary, there are more than 11,000 Chinese characters, of which 3,500 are commonly used. Dozens to thousands of samples must be collected for each character, so existing recognition techniques have a high time cost and low accuracy. For example, patent application publications No. CN110334782A ("Calligraphy Style Recognition Method Using a Deep Belief Network Driven by Multiple Convolutional Layers") and No. CN108764242A ("Offline Handwritten Chinese Font Recognition Method Based on a Deep Convolutional Neural Network") cannot train neural networks effectively when the sample size is small, and patent application publication No. CN108985348 ("Calligraphy Style Recognition Method Based on a Convolutional Neural Network") can recognize only calligraphy styles, not calligraphy characters.
In short, existing calligraphy character recognition methods have low accuracy, mainly because calligraphic characters take many forms and individual calligraphers have wide latitude for personal expression. For calligraphy samples with non-standard shapes, hand-crafted feature-extraction algorithms perform poorly; and because some rare characters have few samples, the databases available for machine learning are small, so conventional deep-learning-based machine vision algorithms train poorly.
发明内容Contents of the invention
The object of the present invention is to overcome the above defects of the prior art by providing a calligraphy character recognition method based on a Siamese convolutional neural network. The method comprises the following steps:
obtaining an image of the calligraphy character to be recognized;
inputting the calligraphy character image into a trained Siamese convolutional neural network model, the model comprising a first convolutional neural network and a second convolutional neural network, wherein the first convolutional neural network outputs a corresponding first feature vector and the second convolutional neural network outputs a corresponding second feature vector;
calculating the similarity between the first feature vector and the second feature vector; and
predicting the category of the calligraphy character based on the similarity result.
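In a Siamese network the two branches described in these steps are usually one network applied twice with shared weights. The Python sketch below illustrates only that property. It is not the patented network: the trained CNN is replaced by a hypothetical toy encoder (a fixed random linear projection followed by tanh), and cosine similarity stands in for the similarity measure.

```python
import math
import random

def make_shared_encoder(input_dim, feature_dim, seed=0):
    """Toy stand-in for the trained CNN: one weight matrix used by both branches."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(feature_dim)]

    def encode(pixels):
        # Linear projection followed by tanh; both branches call this same closure,
        # so the "first" and "second" networks share every parameter.
        return [math.tanh(sum(w * p for w, p in zip(row, pixels))) for row in weights]

    return encode

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def siamese_similarity(encoder, image_a, image_b):
    first = encoder(image_a)   # first feature vector
    second = encoder(image_b)  # second feature vector, same weights
    return cosine_similarity(first, second)
```

Because the weights are shared, an image compared with itself scores 1 and the score does not depend on which branch receives which image.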
Compared with the prior art, the present invention can learn from a few samples or even a single sample (few-shot/one-shot learning), which significantly reduces the amount of neural-network training without sacrificing accuracy, and the trained network can be used successfully for calligraphy character recognition. In addition, conventional deep-learning methods based on convolutional neural networks cannot recognize objects that were not encountered during training: to make such a network recognize a new object, a large number of samples of that object must be collected and the whole network (or at least its fully connected layers) must be retrained. The Siamese architecture of the present invention, by contrast, does not output the sample's label directly; it outputs similarity values between the sample and the other members of the sample library. For a new object it can therefore conclude that the object "is not similar to any member of the sample library", i.e., that the object has never been seen before. Because Chinese has so many characters that no database can include them all, this feature is especially important and makes calligraphy character recognition more robust.
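The "not similar to any member" conclusion can be sketched as nearest-member search with a rejection threshold. In this Python sketch the library maps each label to stored feature vectors; the distance-to-score mapping `1 / (1 + d)` and the threshold of 0.5 are hypothetical choices, since the text does not state how the cut-off is set.

```python
import math

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity(u, v):
    # Hypothetical mapping from distance to a score in (0, 1]; identical vectors score 1.
    return 1.0 / (1.0 + euclidean_distance(u, v))

def identify(query, library, threshold=0.5):
    """Return (label, score) of the most similar member, or (None, score) if none is similar enough."""
    best_label, best_score = None, -1.0
    for label, vectors in library.items():
        for vec in vectors:
            score = similarity(query, vec)
            if score > best_score:
                best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score  # not similar to any member: treat as an unseen character
    return best_label, best_score
```

Returning `None` instead of forcing the closest label is what lets the model flag a character it has never seen.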
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a calligraphy character recognition method based on a Siamese convolutional neural network according to an embodiment of the present invention;
Fig. 2 is an overall architecture diagram of a Siamese convolutional neural network according to an embodiment of the present invention;
Fig. 3 is an overall architecture diagram of a Siamese convolutional neural network according to another embodiment of the present invention;
Fig. 4 is a detailed structural diagram of a Siamese convolutional neural network according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of character samples according to an embodiment of the present invention;
Fig. 6 is a comparison of experimental results according to an embodiment of the present invention.
The drawings use the following English labels: Input Layer; input; output; none; Model; Functional; Euclidean Distance; Max Pooling; Global Average Pooling.
Detailed Description of the Embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. Unless specifically stated otherwise, the relative arrangements of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be considered part of the specification.
In all examples shown and discussed herein, any specific value should be construed as merely exemplary rather than limiting; other instances of the exemplary embodiments may therefore use different values.
Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
The present invention builds its model architecture on a Siamese convolutional neural network to recognize calligraphy characters. In brief, during training, two samples from the training set are fed into two identical convolutional neural networks, yielding two feature vectors, and the similarity between the two vectors is computed as a single value. If the two samples have the same label, they show the same character and the target similarity of their feature vectors is 1; if their labels differ, the target similarity is 0. The difference between this Boolean label value and the computed similarity is back-propagated, and stochastic gradient descent is used to train the network. In practical use, an image of the calligraphy character to be recognized is input into the trained Siamese network, which outputs the corresponding feature vector; this vector is compared with the members of a feature-vector library, and the member with the highest similarity is taken as the recognition result. The invention can recognize calligraphy characters and can also recognize calligraphy fonts, such as regular script, cursive script, and running script.
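The Boolean training targets described above come from pairing samples. The following Python sketch builds such pairs from a mapping of label to samples: every same-label pair gets target 1, and different-label pairs get target 0. Subsampling the negatives to match the number of positives is an assumption made here for balance; the text does not say how pairs are drawn.

```python
import itertools
import random

def make_pairs(samples_by_label, num_negative=None, seed=0):
    """Build (sample_a, sample_b, target) triples: 1 for the same character, 0 otherwise."""
    rng = random.Random(seed)
    positives = []
    for samples in samples_by_label.values():
        for a, b in itertools.combinations(samples, 2):
            positives.append((a, b, 1))
    all_negatives = []
    for la, lb in itertools.combinations(list(samples_by_label), 2):
        for a in samples_by_label[la]:
            for b in samples_by_label[lb]:
                all_negatives.append((a, b, 0))
    # Assumed balancing: by default keep as many negatives as positives.
    if num_negative is None:
        num_negative = len(positives)
    negatives = rng.sample(all_negatives, min(num_negative, len(all_negatives)))
    pairs = positives + negatives
    rng.shuffle(pairs)
    return pairs
```

Different-label pairs vastly outnumber same-label pairs once there are thousands of classes, which is why some form of negative subsampling is usually needed.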
Specifically, as shown in Fig. 1, the calligraphy character recognition method based on a Siamese convolutional neural network includes the following steps.
Step S110: construct the Siamese convolutional neural network model.
In one embodiment, as shown in Fig. 2, the overall architecture of the Siamese convolutional neural network includes an input layer, two convolutional neural networks, a pooling layer (labeled dense_1), and a fully connected layer (labeled dense_2). The network processes its inputs as follows: it receives two grayscale images of the same size, e.g., 100×100, and feeds each into one of two identical deep convolutional neural networks (CNNs), which extract features at different depths. For example, each CNN contains four feature-extraction stages, each consisting mainly of a convolutional layer and a pooling layer (see Table 1 below). An image passes first through the convolutional layer and then through the pooling layer, after which a ReLU activation function and batch normalization (BN) are applied. In Fig. 2 these layers are repeated four times, each time with a slightly different kernel size and number of kernels. Finally, a global pooling layer and a fully connected layer are applied. In the embodiment of Fig. 2, each of the two CNNs represents the extracted image features as a feature vector of 48 values.
Table 1 Deep convolutional neural network
Figure PCTCN2022140065-appb-000001
In another embodiment, the structure of the Siamese convolutional neural network is as shown in Fig. 3 and Fig. 4, where the input image is m×n pixels, m and n each being an integer between 28 and 1000, and the output feature vector has x dimensions, x being an integer between 10 and 100.
Specifically, the first feature-extraction stage is configured as follows:
32 to 128 convolution kernels, each a p×p matrix, where p is an integer between 5 and 15;
a k×k pooling layer, where k is an integer between 1 and 5;
a batch normalization layer;
a dropout layer that retains 25% to 75% of the neurons.
The second feature-extraction stage is configured as follows:
64 to 256 convolution kernels, each a q×q matrix, where q is an integer between 5 and 10;
a k×k pooling layer, where k is an integer between 1 and 5;
a batch normalization layer;
a dropout layer that retains 25% to 75% of the neurons.
The third feature-extraction stage is configured as follows:
64 to 256 convolution kernels, each an s×s matrix, where s is an integer between 2 and 6;
a k×k pooling layer, where k is an integer between 1 and 5;
a batch normalization layer;
a dropout layer that retains 25% to 75% of the neurons.
The fourth feature-extraction stage is configured as follows:
128 to 512 convolution kernels, each a t×t matrix, where t is an integer between 2 and 6;
a k×k pooling layer, where k is an integer between 1 and 5;
a batch normalization layer;
a dropout layer that retains 25% to 75% of the neurons.
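The parameter ranges of the four stages, together with the 28 to 1000 pixel input size and the 10 to 100 dimensional feature vector given for this embodiment, can be captured as a configuration check. In this Python sketch the class, field, and function names are hypothetical; it only verifies that a concrete choice of hyperparameters falls inside the stated ranges.

```python
from dataclasses import dataclass

@dataclass
class StageConfig:
    num_kernels: int
    kernel_size: int
    pool_size: int
    keep_fraction: float  # fraction of neurons kept by dropout

# (min, max) ranges per stage, as listed above.
STAGE_RANGES = [
    {"num_kernels": (32, 128), "kernel_size": (5, 15)},
    {"num_kernels": (64, 256), "kernel_size": (5, 10)},
    {"num_kernels": (64, 256), "kernel_size": (2, 6)},
    {"num_kernels": (128, 512), "kernel_size": (2, 6)},
]
POOL_RANGE = (1, 5)
KEEP_RANGE = (0.25, 0.75)

def validate(stages, input_hw, feature_dim):
    """Return a list of violations; an empty list means the configuration fits."""
    errors = []
    h, w = input_hw
    if not (28 <= h <= 1000 and 28 <= w <= 1000):
        errors.append(f"input size {h}x{w} outside 28..1000")
    if not 10 <= feature_dim <= 100:
        errors.append(f"feature dimension {feature_dim} outside 10..100")
    if len(stages) != len(STAGE_RANGES):
        errors.append(f"expected {len(STAGE_RANGES)} stages, got {len(stages)}")
    for i, (stage, ranges) in enumerate(zip(stages, STAGE_RANGES), start=1):
        for field, (lo, hi) in ranges.items():
            value = getattr(stage, field)
            if not lo <= value <= hi:
                errors.append(f"stage {i}: {field}={value} outside {lo}..{hi}")
        if not POOL_RANGE[0] <= stage.pool_size <= POOL_RANGE[1]:
            errors.append(f"stage {i}: pool_size={stage.pool_size} outside 1..5")
        if not KEEP_RANGE[0] <= stage.keep_fraction <= KEEP_RANGE[1]:
            errors.append(f"stage {i}: keep_fraction={stage.keep_fraction} outside 0.25..0.75")
    return errors
```

If the layers are built with Keras, note that its `Dropout` layer takes the fraction to drop, so a keep fraction f corresponds to a rate of 1 - f; the 25% to 75% range maps onto itself.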
Step S120: collect a dataset and build a training set to train the Siamese neural network model; the training set reflects the correspondence between characters or fonts and sample images.
In this step, a dataset is first collected and a training set is then built from it. In one embodiment, the training set contains multiple characters (i.e., characters serve as the classes), each with one or more samples, and the samples of each character cover different font classes and different morphological features.
For example, Chinese calligraphy characters can be downloaded from the website http://www.shufazidian.com/. As of July 23, 2021, the site stored 440,412 images covering 8 fonts and 6,197 distinct characters. Common characters have samples in more fonts, while some characters have few or no samples in certain fonts. Table 2 summarizes the number of characters and the number of samples per character.
Table 2 Font classes and number of samples per distinct character
Figure PCTCN2022140065-appb-000002
Most of the downloaded images contain only one character, but some contain several and must be segmented into single-character images. Fig. 5 shows an example character with 38 samples and an image containing multiple characters.
Specifically, first, e.g., 1,000 images containing multiple characters and 1,000 images containing a single character are labeled. These two sets are used to train a CNN with the same structure as the one used in the Siamese network; trained this way, it reaches 99.8% accuracy in determining whether an image contains one character or several. The accuracy is high because single-character and multi-character images differ markedly in appearance. The trained CNN is then used to sort all 440,412 images into the corresponding category (multi-character or single-character). Because training on few samples needs only a small number of samples per character, in one embodiment the multi-character images of a character are deleted if that character already has three or more samples. For classes with fewer than three samples, the multi-character images are kept and segmented into single-character images.
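The keep-or-delete rule for multi-character images can be sketched as a filter over image records. This Python sketch assumes each record is a hypothetical `(label, is_multi_character, image_id)` tuple, where `is_multi_character` is the single/multi classifier's output, and that "already has three or more samples" counts only single-character images; the text leaves both details open.

```python
def filter_multi_character_images(images, min_samples=3):
    """Split records into kept single-character images and multi-character images to segment.

    Multi-character images are dropped for characters that already have min_samples or more
    single-character images; otherwise they are kept and queued for segmentation.
    """
    single_counts = {}
    for label, is_multi, _ in images:
        if not is_multi:
            single_counts[label] = single_counts.get(label, 0) + 1
    kept, to_segment = [], []
    for label, is_multi, image_id in images:
        if not is_multi:
            kept.append((label, image_id))
        elif single_counts.get(label, 0) < min_samples:
            to_segment.append((label, image_id))  # keep, then split into single characters
        # otherwise: enough samples already, drop the multi-character image
    return kept, to_segment
```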
Next, the collected dataset is preprocessed. Preprocessing of the input images covers organizing the image files, normalizing image shape and color, standardizing image resolution, and creating the training and test sets. Image resolution and color vary widely: too low a resolution loses information, while too high a resolution exhausts memory, so 100×100 pixels is preferred. Since color usually plays no role in calligraphy character recognition, all images can be converted to grayscale. The pixel values are then scaled to the range 0 to 1 and standardized to zero mean and unit variance.
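The preprocessing chain above (grayscale, 100×100, scale to 0 to 1, then zero mean and unit variance) can be sketched in plain Python on nested lists. The ITU-R BT.601 luminance weights and nearest-neighbour resizing are assumptions, since the text does not name a conversion or interpolation method; in practice an image library would do these steps.

```python
import math

def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples in 0..255 to luminance (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_rows]

def resize_nearest(gray, out_h=100, out_w=100):
    """Nearest-neighbour resize to out_h x out_w."""
    in_h, in_w = len(gray), len(gray[0])
    return [[gray[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def normalize(gray, eps=1e-12):
    """Scale to [0, 1], then standardize to zero mean and unit variance."""
    flat = [p / 255.0 for row in gray for p in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((p - mean) ** 2 for p in flat) / len(flat))
    if std < eps:  # blank image: nothing to standardize
        scaled = [0.0] * len(flat)
    else:
        scaled = [(p - mean) / std for p in flat]
    width = len(gray[0])
    return [scaled[i:i + width] for i in range(0, len(scaled), width)]

def preprocess(rgb_rows, size=100):
    return normalize(resize_nearest(to_grayscale(rgb_rows), size, size))
```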
The Siamese convolutional neural network model of the present invention can recognize either character classes or font classes. Character recognition and font recognition are trained separately because this keeps the design simpler and the code more direct, and, given the small sample size of the dataset, training takes less time. To recognize characters regardless of font, all fonts of each character are merged, and the samples in each character class are then randomly divided into training, validation, and test sets in an 8:1:1 ratio. In another embodiment, to recognize fonts regardless of character, all characters belonging to each font are merged, and the samples in each font class are then randomly divided into training, validation, and test sets in an 8:1:1 ratio.
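The per-class 8:1:1 split can be sketched as follows. The same function serves both setups: pass characters as classes for character recognition, or fonts as classes for font recognition. Rounding each share with Python's `round` is an assumption; with very small classes some classes will end up with no validation or test samples.

```python
import random

def split_per_class(samples_by_class, ratios=(8, 1, 1), seed=0):
    """Randomly split each class's samples into train/validation/test in the given ratio."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    total = sum(ratios)
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n = len(shuffled)
        n_train = round(n * ratios[0] / total)
        n_val = round(n * ratios[1] / total)
        train += [(label, s) for s in shuffled[:n_train]]
        val += [(label, s) for s in shuffled[n_train:n_train + n_val]]
        test += [(label, s) for s in shuffled[n_train + n_val:]]
    return train, val, test
```

Splitting inside each class, rather than over the pooled data, keeps every class represented in the training set.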
Preferably, the dataset is not subjected to noise removal, contrast enhancement, removal of extraneous objects, or similar processing, because the convolutional neural network takes these factors into account automatically. In addition, to reduce the amount of sample data that must be collected, data diversity is increased by randomly rotating and/or shifting the samples.
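The random-shift half of this augmentation can be sketched as an integer translation with padding. The maximum offset of 5 pixels and the fill value are hypothetical; the fill should match the image background. Rotation is omitted here and would normally come from an image library, for example Pillow's `Image.rotate`.

```python
import random

def shift(image, dy, dx, fill=0.0):
    """Translate a 2-D image by (dy, dx) pixels, filling exposed pixels with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = image[src_y][src_x]
    return out

def random_shift(image, max_shift=5, fill=0.0, rng=random):
    """Apply a random integer translation in [-max_shift, max_shift] along each axis."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return shift(image, dy, dx, fill)
```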
To make the training set of the Siamese network as complete as possible, more than 3,000 character classes are collected, each with at least one sample. For example, the number of samples per character can be capped at 10: for any character with more than 10 samples, members of its sample set are randomly deleted until no more than 10 remain. This ensures that the trained Siamese network does not depend on a large sample dataset, so that when a new character is encountered after training, the model can be extended to it efficiently without collecting many new samples for retraining.
In another preferred embodiment, a reduced dataset is used. A few-shot calligraphy character recognition model is trained because some characters have few samples and because the model must be able to recognize new character classes, e.g., characters not included in the 6,197-character Chinese calligraphy dataset. To test the few-shot learning ability of the Siamese network, samples of each character were randomly deleted so that no character has more than 3 samples. The training/validation/test split described above was then repeated to create separate datasets for character recognition and font recognition. Table 3 lists the number of characters and samples after the training set was reduced.
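Both caps described above (at most 10 samples per character, or at most 3 in the reduced dataset) amount to random deletion down to a maximum. A minimal Python sketch, with a hypothetical function name:

```python
import random

def cap_samples(samples_by_class, max_per_class, seed=0):
    """Randomly delete samples so that no class keeps more than max_per_class of them."""
    rng = random.Random(seed)
    capped = {}
    for label, samples in samples_by_class.items():
        if len(samples) > max_per_class:
            capped[label] = rng.sample(list(samples), max_per_class)
        else:
            capped[label] = list(samples)
    return capped
```

Calling it with `max_per_class=10` reproduces the first cap and with `max_per_class=3` the reduced dataset; classes already below the cap are left unchanged.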
Table 3 Number of characters and samples after training-set reduction
Figure PCTCN2022140065-appb-000003
Figure PCTCN2022140065-appb-000004
Taking character-class recognition as an example, the training of the Siamese convolutional neural network proceeds as follows (see Figs. 2 and 3). Samples A and B, each with resolution m×n (e.g., 100×100), are input into two identical convolutional neural networks. Each network processes its input character image to produce a feature vector of 10 to 100 dimensions, and the Euclidean distance or cosine similarity between the two vectors is then computed.
If input samples A and B show the same character, the Euclidean distance between the two output feature vectors is small (or their cosine similarity is large); if A and B show different characters, the Euclidean distance is large (or the cosine similarity is small).
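The two comparison measures named above behave in opposite directions: distance shrinks and cosine similarity grows as two feature vectors become more alike. A minimal Python sketch; the example vectors are made up for illustration.

```python
import math

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for a zero vector; treat as dissimilar
    return dot / (norm_u * norm_v)

# Made-up feature vectors: a "same character" pair and a "different character" pair.
same_pair = ([1.0, 0.9, 0.1], [0.9, 1.0, 0.0])
diff_pair = ([1.0, 0.9, 0.1], [-0.2, 0.1, 1.0])
```

Euclidean distance is sensitive to vector length, while cosine similarity depends only on direction, so which one works better depends on whether the feature magnitudes carry information.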
The computed distance or similarity value is compared with a Boolean value derived from the known labels of the two characters (e.g., 1 if the two input images show the same character and 0 if they show different characters), and the difference between the two is computed.
For example, after the two images pass through the two identical CNNs, two 48-dimensional feature vectors are obtained. The Euclidean distance between the two vectors is then computed as a measure of the similarity between the two images. Finally, two sigmoid functions are applied in succession. The output is interpreted as a Boolean value: 0 means the two images contain different characters, and 1 means they contain the same character.
During training, the similarity difference serves as the loss function and is back-propagated, updating all the weights and biases of the entire Siamese architecture to complete training.
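The step from distance to a score compared with the Boolean label can be sketched for the output head alone. This Python sketch swaps in a simpler head than the one described: the text applies two sigmoid functions without giving their parameters, so here a single sigmoid with hypothetical parameters `w` and `b` maps the distance to (0, 1). The squared difference from the label is used as the loss, which is one reading of "the similarity difference serves as the loss". Back-propagation through the CNN branches is not shown; only the head's gradients are.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def similarity_score(distance, w=-1.0, b=2.0):
    """Map a Euclidean distance to (0, 1); with w < 0, small distances score near 1."""
    return sigmoid(w * distance + b)

def pair_loss(distance, target, w=-1.0, b=2.0):
    """Squared difference between the Boolean target (1 same, 0 different) and the score."""
    return (target - similarity_score(distance, w, b)) ** 2

def head_gradients(distance, target, w, b):
    """Gradients of pair_loss with respect to w and b, by the chain rule."""
    s = similarity_score(distance, w, b)
    dloss_ds = -2.0 * (target - s)
    ds_dz = s * (1.0 - s)
    return dloss_ds * ds_dz * distance, dloss_ds * ds_dz

def sgd_step(distance, target, w, b, lr=0.5):
    """One stochastic gradient descent update of the head parameters."""
    gw, gb = head_gradients(distance, target, w, b)
    return w - lr * gw, b - lr * gb
```

In a full implementation a deep-learning framework would compute these gradients automatically for every weight in both (shared) branches.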
Note that if a character not seen before is encountered during training or in practical use, it can be added to the character library (feature-vector library), increasing the number of characters the model can recognize.
Step S130: with a target image containing a calligraphy character as input, use the trained Siamese neural network model to predict the character class or font class.
Once the model is trained, target images can be recognized in real time. For example, to predict the class of an image, the same number of images can be drawn from each class and each paired with the image to be predicted as input to the Siamese network; the prediction is the class whose images are most similar to the target image.
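The prediction step, together with adding unseen characters to the library, can be sketched over precomputed feature vectors. Averaging the similarities of the sampled references per class is an assumption (taking the maximum is an equally plausible reading), as are the function names and `per_class=5`.

```python
import math
import random

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_class(query, library, per_class=5, seed=0):
    """Score each class by the mean similarity between the query and up to per_class of its vectors."""
    rng = random.Random(seed)
    scores = {}
    for label, vectors in library.items():
        if not vectors:
            continue  # a class with no stored vectors cannot be scored
        chosen = rng.sample(vectors, min(per_class, len(vectors)))
        scores[label] = sum(cosine_similarity(query, v) for v in chosen) / len(chosen)
    best = max(scores, key=scores.get)
    return best, scores

def add_class(library, label, vectors):
    """Register a previously unseen character by storing its feature vectors; no retraining."""
    library.setdefault(label, []).extend(vectors)
```

Because a new class is just new entries in the library, one stored vector is enough for the model to start recognizing that character.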
Experiments were conducted to further verify the effect of the present invention. The network was first trained to recognize character classes. Samples of the same character in different fonts were merged, and the samples in each character class were randomly divided into training, validation, and test sets in an 8:1:1 ratio. The images were passed through the SNN (Siamese neural network); Fig. 6 shows the resulting training loss and accuracy (Training Loss and Accuracy). The accuracy on the training set was 94.5%, with a loss of 0.5.
To train the Siamese network to recognize fonts regardless of character class, all character classes belonging to each font were merged, and the samples in each font class were randomly divided into training, validation, and test sets in an 8:1:1 ratio. The accuracy on the training set was 95.5%, with a loss of 0.5.
In summary, by using a Siamese convolutional neural network architecture, the present invention can be trained on small amounts of data and achieves high recognition accuracy. In addition, when it encounters a character absent from the training set, it does not misclassify it; instead it identifies the character as unseen, and it can recognize that character after seeing it only once.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. It may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, or semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (e.g., through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) is personalized using state information of the computer-readable program instructions, and this circuitry can execute the instructions to implement aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks therein, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on it, producing a computer-implemented process such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
附图中的流程图和框图显示了根据本发明的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。对于本领域技术人员来说公知的是,通过硬件方式实现、通过软件方式实现以及通过软件和硬件结合的方式实现都是等价的。The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a portion of a program segment, or an instruction that includes one or more Executable instructions. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or action , or may be implemented by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by means of hardware, implementation by means of software, and implementation by a combination of software and hardware are all equivalent.
Various embodiments of the present invention have been described above. The foregoing description is exemplary rather than exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims (10)

  1. A calligraphy character recognition method based on a Siamese convolutional neural network, comprising the following steps:
    obtaining an image of a calligraphy character to be recognized;
    inputting the calligraphy character image into a trained Siamese convolutional neural network model, the Siamese network model comprising a first convolutional neural network and a second convolutional neural network, wherein the first convolutional neural network outputs a corresponding first feature vector and the second convolutional neural network outputs a corresponding second feature vector;
    calculating the similarity between the first feature vector and the second feature vector; and
    predicting the category of the calligraphy character based on the similarity result.
  2. The method according to claim 1, wherein the Siamese convolutional neural network model is trained by the following steps:
    constructing a training set in which characters serve as categories, each category corresponding to one or more sample images, wherein the sample images of each category reflect font categories and morphological features;
    training the Siamese convolutional neural network model on the training set with a set loss as the optimization objective, wherein two sample images are input into the first convolutional neural network and the second convolutional neural network respectively to obtain two feature vectors, the similarity between the two feature vectors is calculated, and a Boolean value is used to label the similarity result of the two feature vectors so as to indicate whether the two sample images contain the same calligraphy character; and during training, stochastic gradient descent is performed by backpropagating the difference between the Boolean value and the calculated similarity value.
  3. The method according to claim 2, wherein the number of characters contained in the training set is greater than 3000, and the number of samples corresponding to each character is less than or equal to 10.
  4. The method according to claim 1, wherein the first convolutional neural network and the second convolutional neural network have the same structure, each comprising four feature extraction structures, wherein:
    the first feature extraction structure comprises a convolutional layer with 32-128 convolution kernels of size p×p, where p is an integer between 5 and 15; a k×k pooling layer, where k is an integer between 1 and 5; a batch normalization layer; and a dropout layer set to retain 25%-75% of the neurons;
    the second feature extraction structure comprises a convolutional layer with 64-256 convolution kernels of size q×q, where q is an integer between 5 and 10; a k×k pooling layer, where k is an integer between 1 and 5; a batch normalization layer; and a dropout layer set to retain 25%-75% of the neurons;
    the third feature extraction structure comprises a convolutional layer with 64-256 convolution kernels of size s×s, where s is an integer between 2 and 6; a k×k pooling layer, where k is an integer between 1 and 5; a batch normalization layer; and a dropout layer set to retain 25%-75% of the neurons;
    the fourth feature extraction structure comprises a convolutional layer with 128-512 convolution kernels of size t×t, where t is an integer between 2 and 6; a k×k pooling layer, where k is an integer between 1 and 5; a batch normalization layer; and a dropout layer set to retain 25%-75% of the neurons.
  5. The method according to claim 1, wherein Euclidean distance or cosine similarity is used to measure the similarity between the first feature vector and the second feature vector.
  6. The method according to claim 1, further comprising:
    determining, according to the similarity result between the first feature vector and the second feature vector, whether the character to be recognized exists in the character library; and
    if not, adding the calligraphy character to be recognized to the character library.
  7. The method according to claim 2, further comprising: training the Siamese convolutional neural network model with a second training set, in which fonts serve as categories, each category corresponding to one or more sample images.
  8. The method according to claim 2, wherein, for the training set, if a character already has three or more samples, sample images with multiple fonts are deleted; and if a character has fewer than three samples, images with multiple fonts are retained and split into single fonts.
  9. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
  10. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 8.
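Claim 1 does not say what the second branch receives at recognition time. One plausible reading, assumed here, is that it embeds reference images of known characters and the query takes the label of the most similar reference. The minimal pure-Python sketch below follows that reading. `embed` is a hypothetical placeholder for a trained convolutional branch (it just passes a precomputed feature list through), the labels and vectors are made up, and cosine similarity is used because claim 5 names it as one option.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def embed(image_features: Vector) -> List[float]:
    """Hypothetical stand-in for one trained CNN branch.

    A real model would map a character image to a feature vector; here the
    input is assumed to already be a feature vector and is returned unchanged.
    """
    return list(image_features)


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_character(query: Vector, references: Dict[str, List[Vector]]) -> Tuple[str, float]:
    """Return (label, score) of the reference most similar to the query."""
    query_vec = embed(query)  # first branch: the image to be recognized
    best_label, best_score = "", -math.inf
    for label, samples in references.items():
        for sample in samples:
            # second branch: a reference image; the same embed() models shared weights
            score = cosine_similarity(query_vec, embed(sample))
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    refs = {"永": [[1.0, 0.1, 0.0]], "山": [[0.0, 1.0, 0.2]]}
    print(predict_character([0.9, 0.2, 0.0], refs))
```

Siamese branches usually share weights; the sketch assumes so by calling one `embed` function for both inputs, which keeps the two feature vectors in the same space so that comparing them is meaningful.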
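Claim 2 fixes the training signal: each image pair carries a Boolean label (1 for the same character, 0 otherwise), and the difference between that label and the predicted similarity is backpropagated for stochastic gradient descent. The toy sketch below illustrates only that signal and is not the patented network. It assumes a hypothetical shared linear encoder `W` standing in for both weight-sharing CNN branches, a similarity s = exp(-||W a - W b||²) so that s lies in (0, 1], and a squared-error loss (y - s)², whose gradient with respect to W is derived by hand as 4 (y - s) s (W d) dᵀ with d = a - b.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[Sequence[float], Sequence[float], int]  # (image_a, image_b, 1 if same character else 0)


def _diff_and_projection(w: Matrix, a: Sequence[float], b: Sequence[float]):
    d = [x - y for x, y in zip(a, b)]
    wd = [sum(w_ij * d_j for w_ij, d_j in zip(row, d)) for row in w]  # equals W a - W b (W is linear)
    return d, wd


def similarity(w: Matrix, a: Sequence[float], b: Sequence[float]) -> float:
    """s = exp(-||W a - W b||^2): 1 for identical embeddings, towards 0 as they move apart."""
    _, wd = _diff_and_projection(w, a, b)
    return math.exp(-sum(v * v for v in wd))


def total_loss(w: Matrix, pairs: List[Pair]) -> float:
    return sum((y - similarity(w, a, b)) ** 2 for a, b, y in pairs)


def sgd_step(w: Matrix, a: Sequence[float], b: Sequence[float], y: int, lr: float) -> None:
    d, wd = _diff_and_projection(w, a, b)
    s = math.exp(-sum(v * v for v in wd))
    # For L = (y - s)^2 the gradient is dL/dW = 4 (y - s) s (W d) d^T.
    coeff = 4.0 * (y - s) * s
    for i, wd_i in enumerate(wd):
        for j, d_j in enumerate(d):
            w[i][j] -= lr * coeff * wd_i * d_j


def train(pairs: List[Pair], epochs: int = 100, lr: float = 0.2, seed: int = 0) -> Matrix:
    dim = len(pairs[0][0])
    w = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]  # identity initialisation
    rng = random.Random(seed)
    order = list(pairs)
    for _ in range(epochs):
        rng.shuffle(order)  # stochastic: one pair per update, random order each epoch
        for a, b, y in order:
            sgd_step(w, a, b, y, lr)
    return w


if __name__ == "__main__":
    pairs = [((0.0, 0.0), (0.1, 0.0), 1), ((0.0, 0.0), (1.0, 1.0), 0),
             ((1.0, 1.0), (1.1, 0.9), 1), ((1.0, 0.0), (0.0, 1.0), 0)]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    w = train(pairs)
    print(total_loss(identity, pairs), total_loss(w, pairs))
```

In a real implementation an autodiff framework would compute the gradient through the convolutional layers; the hand-derived update here only makes the "label minus similarity" term visible.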
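Claim 3 bounds the shape of the training set: many classes (more than 3000 characters) but few samples per class (at most 10). This is a regime in which pairwise similarity learning is commonly preferred over a per-class classifier. The hypothetical helper below checks a list of (character, image_id) records against those two limits; the record layout is an assumption.

```python
from collections import Counter
from typing import Iterable, Tuple


def satisfies_claim3(samples: Iterable[Tuple[str, str]],
                     min_characters: int = 3000,
                     max_per_character: int = 10) -> bool:
    """True if there are more than `min_characters` distinct characters
    and no character has more than `max_per_character` samples."""
    counts = Counter(character for character, _ in samples)
    return len(counts) > min_characters and all(n <= max_per_character for n in counts.values())
```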
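Claim 4 gives ranges rather than one concrete configuration, and it does not state padding, strides, activation functions, or input size. The sketch below encodes the recited ranges as a validator and, under the added assumptions of stride-1 unpadded convolutions and non-overlapping pooling, computes the output feature-map shape of one branch. The example block values and the 105×105 input are hypothetical picks inside the ranges, not values from the patent, and "between" is read as inclusive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """One feature extraction structure: conv -> k x k pooling -> batch norm -> dropout."""
    filters: int      # number of convolution kernels
    kernel: int       # square kernel side (p, q, s or t in claim 4)
    pool: int         # square pooling window side k
    keep_prob: float  # fraction of neurons retained by the dropout layer


# Per-block (filter range, kernel-side range) from claim 4; bounds read as inclusive (assumption).
CLAIM4_RANGES = [((32, 128), (5, 15)), ((64, 256), (5, 10)), ((64, 256), (2, 6)), ((128, 512), (2, 6))]


def validate(blocks: List[Block]) -> None:
    """Raise ValueError if any hyperparameter falls outside the ranges recited in claim 4."""
    if len(blocks) != len(CLAIM4_RANGES):
        raise ValueError("claim 4 recites exactly four feature extraction structures")
    for idx, (blk, ((f_lo, f_hi), (k_lo, k_hi))) in enumerate(zip(blocks, CLAIM4_RANGES), start=1):
        checks = [
            (f_lo <= blk.filters <= f_hi, f"filters {blk.filters} not in {f_lo}-{f_hi}"),
            (k_lo <= blk.kernel <= k_hi, f"kernel {blk.kernel} not in {k_lo}-{k_hi}"),
            (1 <= blk.pool <= 5, f"pool {blk.pool} not in 1-5"),
            (0.25 <= blk.keep_prob <= 0.75, f"keep_prob {blk.keep_prob} not in 0.25-0.75"),
        ]
        for ok, message in checks:
            if not ok:
                raise ValueError(f"block {idx}: {message}")


def output_shape(blocks: List[Block], side: int) -> Tuple[int, int, int]:
    """(channels, height, width) after all blocks for a square side x side input.

    Assumes stride-1 convolutions without padding and non-overlapping pooling
    (stride = window); batch normalization and dropout do not change the shape.
    """
    for blk in blocks:
        side = (side - blk.kernel + 1) // blk.pool
        if side < 1:
            raise ValueError("input image too small for this configuration")
    return blocks[-1].filters, side, side


if __name__ == "__main__":
    example = [Block(64, 10, 2, 0.5), Block(128, 7, 2, 0.5), Block(128, 4, 2, 0.5), Block(256, 4, 1, 0.5)]
    validate(example)
    print(output_shape(example, 105))  # (256, 6, 6)
```

Under these assumptions the example yields a 256×6×6 map per branch (9216 values once flattened). If the blocks were built in PyTorch, `torch.nn.Dropout(p)` takes the probability of dropping a unit, so p = 1 - keep_prob; retaining 25%-75% of neurons therefore also corresponds to p between 0.25 and 0.75.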
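Claim 5 allows either Euclidean distance or cosine similarity. They point in opposite directions: a smaller distance means more similar, while a larger cosine means more similar. For L2-normalized vectors they are tied by ||a - b||² = 2 - 2 cos(a, b), so they rank pairs identically. The sketch below shows both measures with the Python standard library; the function names are illustrative.

```python
import math
from typing import List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Smaller means more similar."""
    return math.dist(a, b)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Larger means more similar; range [-1, 1]."""
    norm_a, norm_b = math.hypot(*a), math.hypot(*b)
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.hypot(*v)
    return [x / norm for x in v]


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    cos = cosine_similarity(a, b)
    dist_sq = euclidean_distance(l2_normalize(a), l2_normalize(b)) ** 2
    print(cos, dist_sq, 2 - 2 * cos)
```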
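Claim 6 adds an open-set step: if no stored character is similar enough, the query is treated as new and added to the character library. The claim states no decision rule, so the sketch below assumes a hypothetical cosine-similarity threshold of 0.8 and assumes the caller supplies the label for a newly enrolled character (for example, from manual annotation). The library here maps labels to lists of feature vectors.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def recognize_or_enroll(query: Sequence[float],
                        library: Dict[str, List[List[float]]],
                        new_label: str,
                        threshold: float = 0.8) -> str:
    """Return the matched label, or add the query to `library` under `new_label`."""
    best_label: Optional[str] = None
    best_score = -math.inf
    for label, references in library.items():
        for ref in references:
            score = cosine_similarity(query, ref)
            if score > best_score:
                best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return best_label  # the character already exists in the library
    library.setdefault(new_label, []).append(list(query))  # not found: include it in the library
    return new_label
```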
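Claim 7 reuses the Siamese training procedure with fonts instead of characters as the categories. One way to see this is that only the key deciding whether a pair counts as "same" changes. The hypothetical helper below builds Boolean-labelled pairs from records carrying both a character and a font label; the record fields and font names are assumptions, and enumerating all pairs is quadratic, so a practical pipeline would likely sample pairs instead.

```python
import itertools
from typing import Dict, List, Tuple

Record = Dict[str, str]  # assumed fields: "image", "char", "font"


def make_pairs(records: List[Record], label_key: str) -> List[Tuple[str, str, bool]]:
    """All unordered image pairs, labelled True when both records share `label_key`."""
    return [
        (a["image"], b["image"], a[label_key] == b[label_key])
        for a, b in itertools.combinations(records, 2)
    ]


if __name__ == "__main__":
    records = [
        {"image": "img1", "char": "永", "font": "regular"},
        {"image": "img2", "char": "永", "font": "cursive"},
        {"image": "img3", "char": "山", "font": "regular"},
    ]
    print(make_pairs(records, "char"))  # first training set: characters as categories
    print(make_pairs(records, "font"))  # second training set: fonts as categories
```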
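Claim 8 describes a cleaning rule for the training set. The sketch below is one reading of it under stated assumptions: each sample is a (character, image_id, font labels) tuple, the three-sample threshold counts all samples including multi-font ones, and "split into single fonts" means emitting one single-font copy of the image per font label. As written, a character whose three or more samples are all multi-font would end up with no samples; the claim does not address that case.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str, Tuple[str, ...]]  # (character, image_id, font labels)


def clean_training_set(samples: List[Sample]) -> List[Tuple[str, str, str]]:
    """Return (character, image_id, font) samples with exactly one font each."""
    by_char: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_char[sample[0]].append(sample)
    cleaned: List[Tuple[str, str, str]] = []
    for group in by_char.values():
        if len(group) >= 3:
            # Enough samples: delete images tagged with more than one font.
            cleaned.extend((c, img, fonts[0]) for c, img, fonts in group if len(fonts) == 1)
        else:
            # Scarce character: keep every image, split into one sample per font label.
            cleaned.extend((c, img, font) for c, img, fonts in group for font in fonts)
    return cleaned
```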
PCT/CN2022/140065 2022-01-14 2022-12-19 Calligraphy character recognition method based on siamese convolutional neural network WO2023134402A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210042795.7A CN116486419A (en) 2022-01-14 2022-01-14 Handwriting word recognition method based on twin convolutional neural network
CN202210042795.7 2022-01-14

Publications (1)

Publication Number Publication Date
WO2023134402A1 (en) 2023-07-20

Family

ID=87210659

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/140065 WO2023134402A1 (en) 2022-01-14 2022-12-19 Calligraphy character recognition method based on siamese convolutional neural network

Country Status (2)

Country Link
CN (1) CN116486419A (en)
WO (1) WO2023134402A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132998A (en) * 2023-08-29 2023-11-28 安徽以观文化科技有限公司 Method and system for identifying single fonts of calligraphic works
CN117437530A (en) * 2023-10-12 2024-01-23 中国科学院声学研究所 Synthetic aperture sonar interest small target twin matching identification method and system
CN117970224A (en) * 2024-03-29 2024-05-03 国网福建省电力有限公司 CVT error state online evaluation method, system, equipment and medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN117727053B (en) * 2024-02-08 2024-04-19 西南科技大学 Multi-category Chinese character single sample font identification method

Citations (5)

Publication number Priority date Publication date Assignee Title
US20190019058A1 (en) * 2017-07-13 2019-01-17 Endgame, Inc. System and method for detecting homoglyph attacks with a siamese convolutional neural network
CN109993236A (en) * 2019-04-10 2019-07-09 大连民族大学 Few sample language of the Manchus matching process based on one-shot Siamese convolutional neural networks
CN111191067A (en) * 2019-12-25 2020-05-22 深圳市优必选科技股份有限公司 Picture book identification method, terminal device and computer readable storage medium
CN112163400A (en) * 2020-06-29 2021-01-01 维沃移动通信有限公司 Information processing method and device
US20210312628A1 (en) * 2020-04-07 2021-10-07 Naver Corporation A method for training a convolutional neural network for image recognition using image-conditioned masked language modeling

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN117132998A (en) * 2023-08-29 2023-11-28 安徽以观文化科技有限公司 Method and system for identifying single fonts of calligraphic works
CN117132998B (en) * 2023-08-29 2024-05-03 安徽以观文化科技有限公司 Method and system for identifying single fonts of calligraphic works
CN117437530A (en) * 2023-10-12 2024-01-23 中国科学院声学研究所 Synthetic aperture sonar interest small target twin matching identification method and system
CN117970224A (en) * 2024-03-29 2024-05-03 国网福建省电力有限公司 CVT error state online evaluation method, system, equipment and medium

Also Published As

Publication number Publication date
CN116486419A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
WO2023134402A1 (en) Calligraphy character recognition method based on siamese convolutional neural network
JP6351689B2 (en) Attention based configurable convolutional neural network (ABC-CNN) system and method for visual question answering
US20210397876A1 (en) Similarity propagation for one-shot and few-shot image segmentation
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
US20210027098A1 (en) Weakly Supervised Image Segmentation Via Curriculum Learning
US20190087677A1 (en) Method and system for converting an image to text
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
RU2693916C1 (en) Character recognition using a hierarchical classification
CN111062277B (en) Sign language-lip language conversion method based on monocular vision
CN109983473B (en) Flexible integrated recognition and semantic processing
WO2023015939A1 (en) Deep learning model training method for text detection, and text detection method
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
RU2712101C2 (en) Prediction of probability of occurrence of line using sequence of vectors
CN114417872A (en) Contract text named entity recognition method and system
Nguyen et al. Nom document digitalization by deep convolution neural networks
US20240028828A1 (en) Machine learning model architecture and user interface to indicate impact of text ngrams
CN115906835B (en) Chinese question text representation learning method based on clustering and contrast learning
US20240152749A1 (en) Continual learning neural network system training for classification type tasks
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
Ueki et al. Survey on deep learning-based Kuzushiji recognition
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
Awal et al. A hybrid classifier for handwritten mathematical expression recognition
CN107943972A (en) A kind of intelligent response method and its system
Sharma et al. Optical Character Recognition Using Hybrid CRNN Based Lexicon-Free Approach with Grey Wolf Hyperparameter Optimization
Huang et al. Spatial Aggregation for Scene Text Recognition.

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22920033

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE