CN112232195A - Handwritten Chinese character recognition method, device and storage medium - Google Patents

Handwritten Chinese character recognition method, device and storage medium

Info

Publication number
CN112232195A
CN112232195A (application CN202011102640.5A)
Authority
CN
China
Prior art keywords
chinese character
network
network structure
layer
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011102640.5A
Other languages
Chinese (zh)
Other versions
CN112232195B (en)
Inventor
薛晗庆
潘红九
陈政
梁宇
窦小明
金娜
薛凯
顾天祺
张建
雷净
于雪洁
赵俊翔
底亚峰
封慧英
李萌萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Near Space Vehicles System Engineering
Original Assignee
Beijing Institute of Near Space Vehicles System Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Near Space Vehicles System Engineering filed Critical Beijing Institute of Near Space Vehicles System Engineering
Priority to CN202011102640.5A priority Critical patent/CN112232195B/en
Publication of CN112232195A publication Critical patent/CN112232195A/en
Application granted granted Critical
Publication of CN112232195B publication Critical patent/CN112232195B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/30 Writer recognition; Reading and verifying signatures
    • G06V40/33 Writer recognition; Reading and verifying signatures based only on signature image, e.g. static signature recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a handwritten Chinese character recognition method, apparatus, and storage medium for improving the reliability and efficiency of recognizing handwritten Chinese characters in scanned document images. The handwritten Chinese character recognition method provided by the application comprises the following steps: constructing a network structure; inputting a Chinese character sequence image to be recognized; determining the weight parameters of each network layer in the network structure; calculating over the Chinese character sequence image in the network structure according to the weight parameters and determining a label sequence, where the label sequence comprises Chinese character probability information; and performing Chinese character inverse mapping according to the label sequence and the Chinese character lexicon table to determine the recognized Chinese characters. The application also provides a handwritten Chinese character recognition apparatus and a storage medium.

Description

Handwritten Chinese character recognition method, device and storage medium
Technical Field
The present application relates to the field of information processing, and in particular to a handwritten Chinese character recognition method, apparatus, and storage medium.
Background
With the growth of visual content data, demand for handwritten Chinese character recognition is increasing in fields such as photographed documents, bills, forms, manuscripts, and educational materials. Non-handwritten Chinese characters mainly appear as print or watermarks in images or video, with uniform glyph structure and uniform spacing between characters and paragraphs. Handwritten characters, by contrast, vary with individual writing style: the same character written by different people takes different shapes, and the variation is even more pronounced for children just learning to write. Handwritten Chinese characters are therefore harder to recognize than non-handwritten ones. In the prior art, individual characters are first segmented, then features are extracted character by character and matched against a feature library for recognition; this is inefficient and unreliable.
Disclosure of Invention
In view of the above technical problems, embodiments of the present application provide a method, an apparatus, and a storage medium for handwritten Chinese character recognition, so as to improve recognition efficiency and reliability of handwritten Chinese characters.
In a first aspect, a method for identifying handwritten Chinese characters provided in an embodiment of the present application includes:
constructing a network structure;
inputting a Chinese character sequence image to be recognized;
determining a weight parameter of each layer of network in the network structure;
calculating over the Chinese character sequence image to be recognized in the network structure according to the weight parameters, and determining a label sequence, wherein the label sequence comprises Chinese character probability information;
and performing Chinese character inverse mapping according to the label sequence and the Chinese character lexicon table to determine the recognized Chinese characters.
Further, the determining the weight parameter of each layer of network in the network structure includes:
loading training data in batches;
inputting the training data into the network structure for calculation, and determining a Chinese character category probability matrix;
obtaining an error value through a loss function operation according to the label sequence of the training data and the probability matrix;
returning the error value to a network structure for gradient updating of weight parameters;
and determining the optimal weight parameter as the weight parameter of each layer of network in the network structure.
Further, the constructing the network structure further includes:
defining a loss function;
setting a training hyper-parameter, wherein the hyper-parameter comprises one or a combination of the following: learning rate, learning decay rate, or training period.
Preferably, the loading of training data in batches includes:
segmenting the training data into batches;
randomly shuffling the segmented training data;
and storing the shuffled training data in an iterator.
Further, the inputting of the training data into the network structure for calculation includes:
traversing the iterator and feeding the data into the network structure batch by batch for calculation.
Further, before inputting the Chinese character sequence image to be recognized, the method further comprises: generating a Chinese character lexicon table.
Preferably, the generating of the Chinese character lexicon table includes:
acquiring information of a Chinese character lexicon;
creating a linked list, and adding the texts in the Chinese character lexicon to the linked list one by one;
performing de-duplication processing on the texts in the linked list and removing repeated texts from it;
creating an index linked list, whose indices increase from 0 to the length of the de-duplicated lexicon linked list;
and sorting the lexicon linked list in ascending order by the first letter of each entry, establishing a mapping relation with the index linked list after sorting, and determining the mapping relation as the Chinese character lexicon table.
Further, the mapping relation includes:
establishing a one-to-one mapping between the index values in the index linked list and the Chinese characters in the lexicon linked list.
As a preferred example, the segmenting of the training data into batches includes:
acquiring training image data;
reading the label file corresponding to each image in the training image data;
performing the mapping from Chinese characters to indices according to the characters of the label file and the Chinese character lexicon table, to obtain a label index;
packaging each image in the image data together with its corresponding label index into a data-packet linked list;
and randomly shuffling all data packets and dividing them equally into N/n batches, where N is the total number of packets, each batch contains n items, and n is an integer greater than or equal to 2.
Preferably, the constructing of the network structure includes:
constructing an M-layer convolutional neural network (CNN);
initializing the parameters of each layer of the CNN;
compressing the character sequence features output by the CNN along the dimension axis;
adding two layers of bidirectional Long Short-Term Memory networks (Bidirectional LSTM);
adding one SoftMax network layer;
wherein M is an integer greater than or equal to 2.
Further, the method also comprises the following steps:
reading a configuration file of the network structure;
according to the configuration file, performing the following operations for each layer of the CNN:
judging whether a batch normalization operation follows the convolutional layer, and if so, executing batch normalization;
judging the nonlinear mapping mode: if the mapping mode is the ordinary one, applying the ReLU function to the convolution output, and otherwise applying the Leaky ReLU activation function;
and judging whether a max pooling layer is to be added, and if so, initializing the max pooling layer according to the parameters of the current CNN layer.
The method adopts a CNN + RNN network structure: the CNN extracts features from the image, and the LSTM network then treats them directly as a sequence analysis problem, converting text recognition into a time-dependent sequence learning problem with automatic feature extraction and end-to-end recognition. Compared with the traditional method, this improves the timeliness, generalization, and accuracy of the algorithm.
In a second aspect, an embodiment of the present application further provides a handwritten Chinese character recognition apparatus, including:
a data loader module for loading the training data in batches;
a Chinese character lexicon table generating module for generating the Chinese character lexicon table;
a network structure building module for building the convolutional neural network (CNN), the bidirectional Long Short-Term Memory (Bidirectional LSTM) network, and the SoftMax network;
a network training module for determining the weight parameters of the CNN, Bidirectional LSTM, and SoftMax networks according to the training data;
and a handwritten Chinese character prediction module for determining the label sequence with the highest combined probability from the weight parameters and the handwritten Chinese character image to be recognized, and outputting the Chinese characters according to the label sequence and the Chinese character lexicon table.
In a third aspect, an embodiment of the present application further provides a handwritten chinese character recognition apparatus, including: a memory, a processor, and a user interface;
the memory for storing a computer program;
the user interface is used for realizing interaction with a user;
the processor is used for reading the computer program in the memory, and when the processor executes the computer program, the handwritten Chinese character recognition method provided by the invention is realized.
In a fourth aspect, an embodiment of the present invention further provides a storage medium readable by a processor, where the storage medium readable by the processor stores a computer program, and the processor executes the computer program to implement the method for recognizing handwritten Chinese characters provided in the present invention.
The method, apparatus, and storage medium provided by the invention significantly improve the recognition accuracy of handwritten Chinese characters in scanned document images. The end-to-end approach effectively addresses the poor timeliness of the traditional multi-stage image and character recognition pipeline, reduces the misrecognition rate caused by blurred characters after document scanning, and improves both accuracy and efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a schematic diagram of a handwritten Chinese character recognition method provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a process of generating a chinese character lexicon table according to an embodiment of the present application;
fig. 3 is a schematic diagram of a data loading process provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a data slicing process provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a network construction process provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a deep learning training process provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of another deep learning training process provided in the embodiments of the present application;
FIG. 8 is a block diagram illustrating a handwritten Chinese character recognition apparatus according to an embodiment of the present application;
fig. 9 is a schematic diagram of another handwritten Chinese character recognition apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some of the words that appear in the text are explained below:
1. The term "and/or" in the embodiments of the present invention describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
2. In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
3. The Convolutional Neural Network (CNN) is a class of feedforward neural networks that contain convolution computations and have a deep structure, and is one of the representative artificial neural networks of deep learning; its weight-sharing network structure significantly reduces model complexity and the number of weights. A convolutional neural network can take an image directly as the network input and extract features automatically, and it is highly invariant to image deformations such as translation, scaling, and tilt.
4. The Bidirectional Long Short-Term Memory network (Bidirectional LSTM) is a kind of Recurrent Neural Network (RNN), a special neural network that calls itself along a time or character sequence (depending on the application). It is a variant of the Long Short-Term Memory network (LSTM) in which a mirrored reverse network is added, so that long-term dependency information can be learned in both directions. LSTM avoids the long-term dependency problem by deliberate design and has been successfully applied in many fields such as speech recognition, image captioning, and natural language processing.
5. The SoftMax network layer is an algorithm that normalizes the classification result:

$a_j^L = \dfrac{e^{z_j^L}}{\sum_{k} e^{z_k^L}}$

where $z_j^L$ is the output of the j-th neuron of the L-th network layer, and k ranges over the neurons of that layer;
6. The learning rate is an important hyperparameter in supervised and deep learning; it determines whether and when the objective function can converge to a local minimum. An appropriate learning rate allows the objective function to converge in a reasonable time;
7. the learning decay rate is the rate at which the learning rate decays;
8. the training period (epoch count) is the number of passes the neural network makes over the training data;
9. the ReLU function (Rectified Linear Unit) is a commonly used activation function in artificial neural networks;
10. the Leaky ReLU activation function is a variant of the ReLU function.
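The two activation functions defined above can be sketched in a few lines of Python; the 0.01 negative slope for Leaky ReLU is a common default, not a value taken from the patent:

```python
def relu(x: float) -> float:
    # ReLU: passes positive inputs through unchanged, zeroes out negatives
    return x if x > 0 else 0.0


def leaky_relu(x: float, slope: float = 0.01) -> float:
    # Leaky ReLU: like ReLU, but negative inputs keep a small non-zero slope
    # (the 0.01 default is an illustrative choice, not specified by the patent)
    return x if x > 0 else slope * x
```

The small negative slope is what lets gradients keep flowing through inactive units, which is why step C4 below selects it for some layers.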
It should be noted that the display sequence of the embodiment of the present application only represents the sequence of the embodiment, and does not represent the merits of the technical solutions provided by the embodiments.
Example one
Referring to fig. 1, a schematic diagram of a handwritten chinese character recognition method provided in an embodiment of the present application is shown, where the method includes steps S101 to S105:
s101, constructing a network structure;
as another preferred example, before the step S101 constructs the network structure, a step of generating a chinese word library table is further included. Illustratively, the step of generating the chinese word library table is shown in fig. 2, and includes steps S201 to S205:
s201, acquiring information of a Chinese character word stock; for example, text information such as chinese characters, english, numerals, punctuation marks, etc.;
s202, creating a linked list, and adding the texts in the Chinese character lexicon to the linked list one by one;
s203, carrying out duplication elimination processing on the texts in the linked list, and removing the repeated texts from the linked list;
s204, creating an index linked list, wherein the index is increased from 0 to the length of the word stock linked list after duplication removal;
s205, the word stock linked list is sorted from small to large according to the first letter of the word, a mapping relation is established with the index linked list after sorting, and the mapping relation is determined as a Chinese word stock list.
As a preferred example, the mapping relationship in step S205 may be a one-to-one mapping relationship between an index value in an index linked list and a chinese character in the word library linked list, for example, a mapping relationship between { index: kanji } represents a mapping, such as { 1: large }, { 2: small }.
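The lexicon-table construction of steps S201 to S205 can be sketched in Python as follows; sorting by Unicode code point stands in for the patent's "first letter" ordering, and the function name and return shape are illustrative assumptions:

```python
def build_lexicon_table(texts):
    """Build the {index: character} lexicon table of steps S201-S205.

    A minimal sketch: collect characters one by one (S202), de-duplicate
    (S203), sort ascending (S205, using code-point order as a stand-in for
    the patent's first-letter ordering), then pair indices counting from 0
    with the sorted characters (S204/S205).
    """
    chars = []
    for ch in texts:            # S202: add characters one by one
        if ch not in chars:     # S203: skip duplicates
            chars.append(ch)
    chars.sort()                # S205: ascending order
    index_to_char = {i: ch for i, ch in enumerate(chars)}
    char_to_index = {ch: i for i, ch in index_to_char.items()}
    return index_to_char, char_to_index
```

For example, `build_lexicon_table("大小大中")` de-duplicates the repeated 大 and yields a three-entry table whose two dictionaries are exact inverses of each other.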
As another preferred example, the step of building the network structure is shown in fig. 5 and includes steps S501 to S505:
S501, constructing an M-layer convolutional neural network (CNN);
In this step, the number of CNN layers M is determined first, and the construction is then performed. Preferably, M is greater than or equal to 2; the specific number of layers is determined according to actual requirements, which this embodiment does not limit.
S502, initializing the parameters of each layer of the CNN;
For the M-layer CNN determined in step S501, a parameter initialization operation is performed for each layer. As a preferred example, after this step, the following operations can be performed according to the contents of a preset configuration file:
step S502-1: judging whether a batch normalization operation follows the convolutional layer, and if so, executing batch normalization;
step S502-2: judging the nonlinear mapping mode: if the mapping mode is the ordinary one, applying the ReLU function to the convolution output, and otherwise applying the Leaky ReLU activation function;
step S502-3: judging whether a max pooling layer is to be added, and if so, initializing the max pooling layer according to the parameters of the current CNN layer.
S503, performing dimension compression processing on the Chinese character sequence features output by the CNN, collapsing the four-dimensional output into a sequence form;
S504, adding two layers of bidirectional Long Short-Term Memory networks (Bidirectional LSTM);
S505, adding one SoftMax network layer;
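The dimension compression of step S503 can be illustrated with a pure-Python sketch of the common CRNN-style "map-to-sequence" operation; the patent's exact target shape is ambiguous in translation, so treat this as one plausible reading (height folded into channels, width becoming the time axis), not the definitive one:

```python
def map_to_sequence(feature_map):
    """Turn a CNN output of shape [batch, channels, height, width]
    (nested lists) into, per image, a sequence of width-many feature
    vectors of size channels*height, suitable as LSTM input.

    This is one common CRNN reading of step S503; the patent does not
    pin down the exact target shape.
    """
    sequences = []
    for image in feature_map:                      # iterate over the batch
        channels = len(image)
        height = len(image[0])
        width = len(image[0][0])
        seq = []
        for w in range(width):                     # width becomes the time axis
            # stack the column at position w across all channels and rows
            vec = [image[c][h][w] for c in range(channels) for h in range(height)]
            seq.append(vec)
        sequences.append(seq)
    return sequences
```

Each time step of the resulting sequence then corresponds to a narrow vertical slice of the input image, which is what lets the LSTM read the character line left to right.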
as a preferred example, before starting to construct the network structure, the following two steps can be further included:
step A: defining a loss function;
and B: setting a training hyper-parameter, wherein the hyper-parameter comprises one or a combination of the following: learning rate, learning decay rate, or training period.
Taking a 7-layer CNN network as an example, the specific steps for constructing the network structure include steps C1 to C8:
C1, designing and initializing the number of input and output channels, the convolution kernel size, the convolution stride, and the other parameters of each network layer. The channel counts are defined as nm = [64, 128, 256, 256, 512, 512, 512], where each entry is the output channel count of one layer and the input channel count of the next layer, and so on; the weight parameters of each layer are initialized by the standard Xavier method;
C2, building the CNN network structure: set up a 7-layer structure and initialize an index i to 0; while convolutional layers remain to be added, add the i-th convolutional layer and set its parameters using the values of step C1; once all layers are built, proceed to step C6 and then build the LSTM network structure.
C3, judging from the configuration file whether a batch normalization (BatchNormalization) operation is needed after the convolutional layer; if so, executing the BatchNorm algorithm, and otherwise directly executing step C4. In this step, batch normalization is used to accelerate convergence and reduce the risk of network overfitting;
C4, judging the nonlinear mapping mode: if the mapping mode is the ordinary one, applying the ReLU function to the convolution output, and otherwise applying the Leaky ReLU activation function. In this step, the nonlinear mapping is used to increase the nonlinear expressive power of the features;
C5, judging from the configuration file whether a max pooling layer needs to be added; if so, setting the size of the max pooling layer using the initialization parameters of step C1; otherwise, the building of this network layer is finished, and step C2 is executed to continue with the next layer. In this step, the max pooling layer is added to reduce the sensitivity of the convolutional layers to the position of the Chinese character sequence;
C6, performing dimension compression on the Chinese character sequence features output by the 7-layer CNN network. As a preferred example, the matrix of shape [batch_size, channels, height, width] is reshaped, collapsing the height dimension so that the width axis can serve as the sequence axis, to fit the input of the LSTM network;
C7, adding two layers of bidirectional long short-term memory networks (Bidirectional LSTM) to capture long-distance dependencies. In an image-based sequence, the contexts of the two directions are mutually complementary, so one forward and one backward LSTM network are combined into a bidirectional LSTM, and multiple such layers are stacked to capture higher-level abstract information, improving the robustness of handwritten Chinese character sequence recognition;
C8, adding a SoftMax network as the last layer, mapping the numeric Chinese character predictions of the CNN + LSTM network into probability values between 0 and 1. For example, with 5000 Chinese characters, the predicted value for each character is converted into a 1 × 5000 probability matrix such as [0.12, 0.22, 0.02, …, 0.44], in which the largest entry indicates which character of the 5000-character dictionary is the most probable prediction.
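The SoftMax mapping of step C8 can be sketched as follows; subtracting the maximum before exponentiating is a standard numerical-stability trick added here, not a step stated in the patent:

```python
import math


def softmax(logits):
    """Normalize per-character scores into a probability distribution,
    as the final SoftMax layer of step C8 does.

    Subtracting the max logit before exponentiating avoids overflow for
    large scores; it does not change the resulting probabilities.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For a 5000-character lexicon the input list would have length 5000; the index of the largest returned probability is the predicted character's index in the lexicon table.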
S102, inputting a Chinese character sequence image to be recognized;
s103, determining the weight parameters of each layer of network in the network structure;
in this step, the weight parameters of each layer of network are determined by the network structure operation constructed in S101 according to the training data. As a preferred example, the method for determining the weight parameter is shown in fig. 6, and includes steps S601 to S605:
s601, loading training data according to batches;
s602, inputting the training data into the network structure for calculation, and determining a Chinese character category probability matrix;
s603, obtaining an error value through loss function operation according to the label sequence of the training data and the probability matrix;
s604, returning the error value to the network structure for gradient updating of the weight parameter;
s605, determining the optimal weighting parameter as the weighting parameter of each layer of the network structure. As a preferred example, the error value is subjected to a back propagation algorithm to obtain a gradient value of the error with respect to the network weight parameters of each layer, and the gradient update is performed by using the deep learning optimizer RMSprop until the loss function converges, at which time the weight of each layer is determined as the optimal weight parameter in the network structure.
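The RMSprop update mentioned in step S605 can be sketched for a flat list of weights; the hyperparameter defaults below are the usual textbook values, not values specified by the patent:

```python
def rmsprop_step(weights, grads, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop weight update, as used for the gradient updates in S605.

    cache holds the running average of squared gradients, one entry per
    weight, initialized to 0. lr, rho, and eps are common defaults, not
    values taken from the patent.
    """
    new_w, new_cache = [], []
    for w, g, c in zip(weights, grads, cache):
        c = rho * c + (1.0 - rho) * g * g         # running avg of squared grads
        w = w - lr * g / ((c ** 0.5) + eps)       # gradient step, scaled per-weight
        new_w.append(w)
        new_cache.append(c)
    return new_w, new_cache
```

In training, this update is applied after each batch's backward pass until the loss converges; the per-weight scaling is what lets RMSprop cope with gradients of very different magnitudes across layers.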
In the above steps, loading the training data in batches is accomplished through a loading process. Specifically, the loading process is shown in fig. 3 and includes:
S301, segmenting the training data into batches;
S302, randomly shuffling the segmented training data;
S303, storing the shuffled training data in an iterator.
After the training data are loaded through steps S301 to S303, the iterator is traversed and the data are fed into the network structure batch by batch for calculation.
It should be noted that, segmenting the training data by batches may be accomplished by the method shown in fig. 4, which includes steps S401 to S405:
s401, acquiring training image data;
s402, reading a label file corresponding to each image in the training image data;
s403, performing the mapping conversion from Chinese characters to indexes according to the characters in the label file and the Chinese character lexicon table, to obtain label indexes;
s404, packaging each image in the image data and the label index corresponding to each image into a data packet linked list. As a preferred example, the packaging format may be [[img1, label1], [img2, label2], …, [imgN, labelN]], where N is the total data length, img1 is an image name, and label1 is the label index corresponding to that image;
s405, randomly arranging all the data packets and equally dividing them into N/n batches, where each batch contains n data items and n is an integer greater than or equal to 2.
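The mapping and packaging of S403 to S404 can be sketched as follows (a minimal illustration: the lexicon, image names and function name are all toy assumptions, and a real lexicon table would contain thousands of characters):

```python
def build_packets(images, label_texts, lexicon):
    # S403: map each Chinese character of the label to its lexicon index
    # S404: pack [image, label-index sequence] pairs into a packet list
    char_to_idx = {ch: i for i, ch in enumerate(lexicon)}
    packets = []
    for img, text in zip(images, label_texts):
        packets.append([img, [char_to_idx[ch] for ch in text]])
    return packets

lexicon = ["我", "是", "谁"]  # toy lexicon table
packets = build_packets(["img1", "img2"], ["我是", "谁"], lexicon)
# packets now follows the [[img1, label1], [img2, label2], ...] format
```

S405's random arrangement and splitting into N/n batches would then operate on this packet list.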
The data loading process is illustrated below by taking as an example the encapsulation of the data and label indexes into a DataSet class package:
d1, acquiring a local storage path of the training image data, and respectively recording the image and the corresponding label file in text files with different names and storing the image and the corresponding label file in a local unified path folder;
d2, traversing the storage path of the training image data to determine whether the image is the last image, if so, ending, otherwise, executing the step D3;
d3, acquiring a corresponding label file path according to the current training image, checking whether the path exists, executing the step D4 if the path exists, and returning to the step D2 if the path does not exist;
d4, opening the label file and, taking the character as the unit of division, reading it into memory character by character;
d5, performing code conversion on the characters. As a preferred example, the code conversion may be One-Hot vector encoding;
d6, inputting the code-converted Chinese characters into the Chinese character lexicon table and performing the mapping conversion from Chinese characters to indexes, converting the Chinese characters into numeric index form so that the computer can recognize them and perform the corresponding calculations;
d7, reading the Chinese character sequence images according to the storage path of the training image data and preprocessing the images, where the preprocessing includes font blurring processing, partial occlusion processing, and the like;
d8, packaging the image matrix and the label index sequence into a DataSet class packet until all training data are traversed;
d9, randomly shuffling all the data packets and dividing them according to the set number of batches.
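The One-Hot transcoding of step D5 can be sketched as follows (an illustrative fragment; the vocabulary size and index are toy values, and the real encoding would span the full deduplicated lexicon):

```python
def one_hot(index, vocab_size):
    # D5: One-Hot vector encoding of a character's lexicon index,
    # so the computer can recognize it and compute on it (D6).
    vec = [0] * vocab_size
    vec[index] = 1
    return vec

vec = one_hot(2, 5)  # the 3rd lexicon entry in a toy 5-character vocabulary
```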
The data loading process can be completed by a data loader. After data loading is finished, training can be carried out on the constructed network structure to obtain the weight parameters of each layer of the network.
It should be noted that the training process may be accelerated by an accelerator, such as a CUDA GPU accelerator, to increase the operation speed; if an accelerator is supported, it may be enabled to speed up training.
One specific example is given below in connection with the determination of whether an accelerator is supported, as shown in fig. 7:
s701, initializing a data loader;
s702, initializing a network structure;
s703, judging whether CUDA GPU acceleration is available; if so, executing S704, otherwise executing S705;
s704, converting the model training mode into a GPU computing mode;
s705, converting the model training mode into a CPU calculation mode;
s706, defining a CTC (Connectionist Temporal Classification) loss function. The loss function is used to measure recognition redundancy and to resolve the misalignment between the data labels and the prediction output of the neural network. For example, when a Chinese character is recognized twice in succession, "who am I" may be recognized as "who who am I".
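The sequence-merging behaviour behind the CTC loss can be illustrated with a greedy transcription sketch (an assumption-laden illustration: the blank index, function name and example indices are not from the patent, and the patented method computes the full CTC loss rather than only this collapse step):

```python
BLANK = 0  # conventional CTC blank index (an assumption)

def ctc_greedy_collapse(indices, blank=BLANK):
    # CTC transcription: merge consecutive repeats, then drop blanks.
    # This is what turns a duplicated prediction such as
    # "who who am I" back into "who am I".
    out, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# raw per-frame argmax [5, 5, 0, 5, 7]: the blank separates two genuine 5s
collapsed = ctc_greedy_collapse([5, 5, 0, 5, 7])
```

Note how the blank lets a genuinely doubled character survive the merge, while accidental frame-to-frame repeats are removed.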
S707, initializing the training end epoch end_epochs;
s708, judging whether the current epoch is less than the end epoch end_epochs; if yes, executing S709, otherwise executing S712, where end_epochs is an integer greater than or equal to 2;
s709, inputting the data into the network structure in batches, the data stream being trained through the CNN + LSTM + SoftMax layers;
s710, calculating a CTC loss value;
and S711, defining an RMSprop optimizer, passing the loss value into the optimizer, performing back propagation, iteratively updating the network weight parameters layer by layer, and returning to step S708.
In this example, after the training data is loaded in batches, it is trained for end_epochs epochs through the convolutional neural network CNN, the two-layer bidirectional long short-term memory (Bidirectional LSTM) network, and the SoftMax network, thereby iterating to the optimal weight parameters.
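The device-selection branch of S703 to S705 can be sketched as follows (a hedged illustration: the torch import is guarded so the sketch also runs where PyTorch is not installed, and the function name is hypothetical):

```python
def pick_device():
    # S703: judge whether CUDA GPU acceleration is available
    # S704/S705: choose the GPU or CPU computing mode accordingly
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

device = pick_device()
# In PyTorch one would then move the model and each batch to `device`,
# e.g. model.to(device), before the S709-S711 training loop.
```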
S104, calculating the Chinese character sequence image to be identified in the network structure according to the weight parameters, and determining a label sequence, wherein the label sequence comprises Chinese character probability information;
it should be noted that the probability information may be represented in the form of a probability matrix. As a preferred example, the prediction of the Chinese character "a" may be represented as [0.98, 0.0012, 0.0046, 0, …], where "a" is the 1st word in the lexicon table, so the first probability value of the matrix is the probability of predicting "a".
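The prediction of S104 and the inverse mapping of S105 can be sketched together as follows (a toy illustration: the lexicon, probability values and function name are assumptions, and a real probability row spans the whole lexicon):

```python
def predict_char(prob_row, lexicon):
    # S104: take the highest-probability index from one row of the
    # Chinese character category probability matrix;
    # S105: inverse-map that index to a character via the lexicon table.
    best = max(range(len(prob_row)), key=prob_row.__getitem__)
    return lexicon[best], prob_row[best]

lexicon = ["甲", "乙", "丙", "丁"]  # toy lexicon table
char, p = predict_char([0.98, 0.0012, 0.0046, 0.014], lexicon)
```

Applying this row by row over the label sequence yields the recognized Chinese character string.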
And S105, performing Chinese character inverse mapping according to the label sequence and the Chinese character word base table, and determining the recognized Chinese characters.
With the method of this embodiment, a CNN + RNN network structure is adopted: the CNN extracts features from the image, and the LSTM network then handles recognition directly as a sequence analysis problem; compared with the prior art, the timeliness, generalization and accuracy of the algorithm are improved. This embodiment also uses the sequence merging mechanism of the CTC loss function, removing recognition redundancy through CTC transcription and thereby improving the recognition accuracy of the model.
The method improves the recognition accuracy of handwritten Chinese characters in scanned document images, effectively solves, in an end-to-end manner, the low timeliness of the multi-stage image character recognition of existing methods, and effectively reduces the misrecognition rate caused by blurred Chinese characters in scanned documents.
Example two
Based on the same inventive concept, an embodiment of the present invention further provides a handwritten Chinese character recognition device, as shown in fig. 8, the device includes:
a data loader module 801 for loading training data in batches;
a Chinese character word stock table generating module 802 for generating a Chinese character word stock table;
a Network structure building module 803, configured to build a convolutional neural network CNN, a two-layer bidirectional long short-term memory (Bidirectional LSTM) network, and a SoftMax network;
a network training module 804, configured to determine, according to the training data, the weight parameters of the CNN, the Bidirectional LSTM network, and the SoftMax network;
and the handwritten Chinese character prediction module 805 is configured to determine a tag sequence with the highest probability combination according to the weight parameter and the handwritten Chinese character image to be recognized, and output a Chinese character according to the tag sequence and the Chinese character word library table.
It should be noted that the data loader module 801 provided in this embodiment can implement the data loading process in fig. 3 and/or fig. 4, solves the same technical problem, achieves the same technical effect, and is not described herein again;
correspondingly, the module 802 for generating a chinese character lexicon table according to this embodiment can implement all functions of generating the chinese character lexicon table shown in fig. 2, solve the same technical problem, achieve the same technical effect, and is not described herein again;
correspondingly, the network structure building module 803 provided in this embodiment can implement all the functions of network building shown in fig. 5, solve the same technical problem, achieve the same technical effect, and is not described herein again;
the network training module 804 provided in this embodiment can implement the training process shown in fig. 6 and/or fig. 7, solve the same technical problem, achieve the same technical effect, and is not described herein again;
the handwritten Chinese character prediction module 805 provided in this embodiment can implement all functions of handwritten Chinese character recognition in fig. 1, solve the same technical problems, and achieve the same technical effects, which are not described herein again.
It should be noted that the apparatus provided in the second embodiment and the method provided in the first embodiment belong to the same inventive concept, solve the same technical problem, and achieve the same technical effect, and the apparatus provided in the second embodiment can implement all the methods of the first embodiment, and the same parts are not described again.
EXAMPLE III
Based on the same inventive concept, an embodiment of the present invention further provides a handwritten Chinese character recognition device, as shown in fig. 9, the device includes:
including memory 902, processor 901, and user interface 903;
the memory 902 is used for storing computer programs;
the user interface 903 is used for realizing interaction with a user;
the processor 901 is configured to read the computer program in the memory 902, and when the processor 901 executes the computer program, the processor is configured to:
constructing a network structure;
inputting a Chinese character sequence image to be recognized;
determining a weight parameter of each layer of network in the network structure;
according to the weight parameters, calculating the Chinese character sequence image to be identified in the network structure, and determining a label sequence, wherein the label sequence comprises Chinese character probability information;
and performing Chinese character inverse mapping according to the label sequence and the Chinese character word base table to determine the recognized Chinese characters.
In fig. 9, the bus architecture may include any number of interconnected buses and bridges, specifically linking together one or more processors represented by processor 901 and various memory circuits represented by memory 902. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The processor 901 is responsible for managing the bus architecture and general processing, and the memory 902 may store data used by the processor 901 in performing operations.
The processor 901 may be a CPU, an ASIC, an FPGA or a CPLD, and the processor 901 may also adopt a multi-core architecture.
The processor 901, when executing the computer program stored in the memory 902, implements any of the handwritten Chinese character recognition methods shown in figs. 1 to 7.
It should be noted that the apparatus provided in the third embodiment and the method provided in the first embodiment belong to the same inventive concept, solve the same technical problem, and achieve the same technical effect, and the apparatus provided in the third embodiment can implement all the methods of the first embodiment, and the same parts are not described again.
The present application also proposes a processor-readable storage medium. The processor-readable storage medium stores a computer program, and the processor implements any one of the handwritten Chinese character recognition methods shown in fig. 1 to 7 when executing the computer program.
It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. A handwritten Chinese character recognition method is characterized by comprising the following steps:
constructing a network structure;
inputting a Chinese character sequence image to be recognized;
determining a weight parameter of each layer of network in the network structure;
according to the weight parameters, calculating the Chinese character sequence image to be identified in the network structure, and determining a label sequence, wherein the label sequence comprises Chinese character probability information;
and performing Chinese character inverse mapping according to the label sequence and the Chinese character word base table to determine the recognized Chinese characters.
2. The method of claim 1, wherein determining the weighting parameters for each layer of the network fabric comprises:
loading training data in batches;
inputting the training data into the network structure for calculation, and determining a Chinese character category probability matrix;
obtaining an error value through a loss function operation according to the label sequence of the training data and the probability matrix;
returning the error value to a network structure for gradient updating of weight parameters;
and determining the optimal weight parameter as the weight parameter of each layer of network in the network structure.
3. The method of claim 1, wherein the constructing the network structure further comprises:
defining a loss function;
setting a training hyper-parameter, wherein the hyper-parameter comprises one or a combination of the following: learning rate, learning decay rate, or training period.
4. The method of claim 2, wherein the loading training data in batches comprises:
segmenting the training data according to batches;
randomly disorganizing the segmented training data;
and storing the training data after random scrambling in an iterator.
5. The method of claim 4, wherein the inputting the training data into the network structure for computation comprises:
and traversing the iterator, and inputting the data into a network structure according to batches for calculation.
6. The method of claim 1, wherein before inputting the sequence image of the Chinese character to be recognized, the method further comprises:
generating a Chinese word library table.
7. The method of claim 6, wherein said generating a chinese character thesaurus table comprises:
acquiring information of a Chinese character word stock;
creating a linked list, and adding the texts in the Chinese character lexicon to the linked list one by one;
carrying out deduplication processing on the texts in the linked list, and removing repeated texts from the linked list;
creating an index linked list, wherein the index is increased from 0 to the length of the word stock linked list after the duplication is removed;
and sequencing the words of the word bank linked list from small to large according to the first letter, establishing a mapping relation with the index linked list after sequencing, and determining the mapping relation as a Chinese word bank table.
8. The method of claim 7, wherein the mapping comprises:
and establishing a one-to-one mapping relation between the index values in the index linked list and the Chinese characters in the word stock linked list.
9. The method of claim 4, wherein the slicing training data in batches comprises:
acquiring training image data;
reading a label file corresponding to each image in the training image data;
mapping conversion from the Chinese character to the index is carried out according to the characters of the label file and the Chinese character word base table to obtain a label index;
packaging each image in the image data and a label index corresponding to each image into a data packet linked list;
all data packets are randomly arranged and equally divided into N/n batches, wherein each batch has n data items and n is an integer greater than or equal to 2.
10. The method of claim 1, wherein constructing the network structure comprises:
constructing an M-layer convolutional neural network CNN;
initializing parameters of each layer of the CNN;
compressing the character sequence characteristics output by the CNN according to the dimension;
adding two layers of bidirectional long short-term memory (Bidirectional LSTM) networks;
adding a layer of SoftMax network;
wherein M is an integer of 2 or more.
11. The method of claim 10, further comprising:
reading a configuration file of a network structure;
according to the configuration file, for each layer of the CNN, the following operations are executed:
judging whether a batch normalization operation is performed after the convolutional layer, and if so, executing batch normalization processing;
judging the nonlinear mapping mode: if the mapping mode is ordinary mapping, using the ReLU function to perform nonlinear mapping on the convolution output, otherwise using the Leaky ReLU activation function to perform nonlinear mapping on the convolution output;
and judging whether to add a max pooling layer, and if so, initializing the max pooling layer according to the parameters of the current CNN layer.
12. A handwritten chinese character recognition device, comprising:
the data loader module is used for loading the training data according to batches;
the Chinese character word stock table generating module is used for generating a Chinese character word stock table;
the Network structure building module is used for building a convolutional neural Network CNN, a two-layer Bidirectional Long Short Term Memory (Bidirectional LSTM) Network and a SoftMax Network;
the network training module is used for determining the weight parameters of the CNN, the Bidirectional LSTM network and the SoftMax network according to training data;
and the handwritten Chinese character prediction module is used for determining a label sequence with the highest probability combination according to the weight parameters and the handwritten Chinese character image to be recognized and outputting the Chinese character according to the label sequence and the Chinese character word base table.
13. A handwritten Chinese character recognition device is characterized by comprising a memory, a processor and a user interface;
the memory for storing a computer program;
the user interface is used for realizing interaction with a user;
the processor is used for reading the computer program in the memory, and when the processor executes the computer program, the handwritten Chinese character recognition method according to any one of claims 1 to 11 is realized.
14. A processor-readable storage medium, wherein the processor-readable storage medium stores a computer program, and wherein the processor, when executing the computer program, implements the handwritten chinese character recognition method according to any one of claims 1 to 11.
CN202011102640.5A 2020-10-15 2020-10-15 Handwritten Chinese character recognition method, device and storage medium Active CN112232195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011102640.5A CN112232195B (en) 2020-10-15 2020-10-15 Handwritten Chinese character recognition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN112232195A true CN112232195A (en) 2021-01-15
CN112232195B CN112232195B (en) 2024-02-20

Family

ID=74113160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011102640.5A Active CN112232195B (en) 2020-10-15 2020-10-15 Handwritten Chinese character recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN112232195B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154503A (en) * 2021-12-02 2022-03-08 四川启睿克科技有限公司 Sensitive data type identification method
CN114529910A (en) * 2022-01-27 2022-05-24 北京鼎事兴教育咨询有限公司 Handwritten character recognition method and device, storage medium and electronic equipment
CN115019404A (en) * 2022-07-01 2022-09-06 中国光大银行股份有限公司 Handwritten signature recognition method and device, storage medium and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007086956A (en) * 2005-09-21 2007-04-05 Fuji Xerox Co Ltd Image processor, image processing method and computer program
CN102866781A (en) * 2011-07-06 2013-01-09 哈尔滨工业大学 Pinyin-to-character conversion method and pinyin-to-character conversion system
CN107636691A (en) * 2015-06-12 2018-01-26 商汤集团有限公司 Method and apparatus for identifying the text in image
CN108268449A (en) * 2018-02-10 2018-07-10 北京工业大学 A kind of text semantic label abstracting method based on lexical item cluster
CA3000166A1 (en) * 2017-04-03 2018-10-03 Royal Bank Of Canada Systems and methods for cyberbot network detection
CN109002461A (en) * 2018-06-04 2018-12-14 平安科技(深圳)有限公司 Handwriting model training method, text recognition method, device, equipment and medium
CN111046946A (en) * 2019-12-10 2020-04-21 昆明理工大学 Burma language image text recognition method based on CRNN
CN111104884A (en) * 2019-12-10 2020-05-05 电子科技大学 Chinese lip language identification method based on two-stage neural network model
CN111414527A (en) * 2020-03-16 2020-07-14 腾讯音乐娱乐科技(深圳)有限公司 Similar item query method and device and storage medium
CN111461112A (en) * 2020-03-03 2020-07-28 华南理工大学 License plate character recognition method based on double-cycle transcription network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LING-QUN ZUO等: "Natural Scene Text Recognition Based on Encoder-Decoder Framework", 《IEEE ACCESS》, vol. 7, pages 62616 - 62623, XP011726968, DOI: 10.1109/ACCESS.2019.2916616 *
XIWEN QU等: "Data augmentation and directional feature maps extraction for in-air handwritten Chinese character recognition based on convolutional neural network", 《PATTERN RECOGNITION LETTERS》, vol. 111, pages 9 - 15 *
刘文臻: "中文文本多标签分类算法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 07, pages 138 - 1470 *

Also Published As

Publication number Publication date
CN112232195B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN112232195B (en) Handwritten Chinese character recognition method, device and storage medium
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN112686345B (en) Offline English handwriting recognition method based on attention mechanism
CN113836992B (en) Label identification method, label identification model training method, device and equipment
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
Naseer et al. Meta features-based scale invariant OCR decision making using LSTM-RNN
WO2022203899A1 (en) Document distinguishing based on page sequence learning
CN113987187A (en) Multi-label embedding-based public opinion text classification method, system, terminal and medium
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN109918652A (en) A kind of statement similarity judgment method and judge system
CN114612921B (en) Form recognition method and device, electronic equipment and computer readable medium
CN115881265B (en) Intelligent medical record quality control method, system and equipment for electronic medical record and storage medium
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN113255498A (en) Financial reimbursement invoice management method based on block chain technology
CN113849653A (en) Text classification method and device
CN115080766A (en) Multi-modal knowledge graph characterization system and method based on pre-training model
CN112818117A (en) Label mapping method, system and computer readable storage medium
CN115759082A (en) Text duplicate checking method and device based on improved Simhash algorithm
CN114328845A (en) Automatic structuralization method and system for key information of document image
CN117743890A (en) Expression package classification method with metaphor information based on contrast learning
CN112699684A (en) Named entity recognition method and device, computer readable storage medium and processor
Bhatt et al. Pho (SC)-CTC—a hybrid approach towards zero-shot word image recognition
CN111985204A (en) Customs import and export commodity tax number prediction method
CN114881038B (en) Chinese entity and relation extraction method and device based on span and attention mechanism
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant