CN115862038A - Wood board printing Manchu recognition method based on end-to-end neural network - Google Patents

Wood board printing Manchu recognition method based on end-to-end neural network

Info

Publication number
CN115862038A
Authority
CN
China
Prior art keywords
manchu
word
sequence
picture
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211405330.XA
Other languages
Chinese (zh)
Inventor
卢思洋
王志炜
魏翔
卢苇
王明泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202211405330.XA priority Critical patent/CN115862038A/en
Publication of CN115862038A publication Critical patent/CN115862038A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention provides a method for recognizing woodblock-printed Manchu based on an end-to-end neural network. The method comprises the following steps: cutting individual Manchu words out of Manchu ancient book pictures and representing each Manchu word as a Manchu word picture; performing convolution operations on the Manchu word picture with a convolutional neural network (CNN) to extract the feature vector of the picture; processing the feature vector extracted by the CNN with a long short-term memory (LSTM) network to obtain a vector sequence; and converting the vector sequence output by the LSTM into the label sequence corresponding to the Manchu word with the Connectionist Temporal Classification (CTC) algorithm, then recognizing the Manchu word from the label sequence. The method can effectively extract clear, separable, and general features from large numbers of printed Manchu images and recognize them, while effectively screening and associating the contextual feature information, thereby obtaining a printed Manchu recognition model with high recognition accuracy.

Description

Wood board printing Manchu recognition method based on end-to-end neural network
Technical Field
The invention relates to the technical field of text recognition, in particular to a wood board printing Manchu recognition method based on an end-to-end neural network.
Background
At present, text recognition methods in natural language processing (NLP) mainly involve four stages: text localization, text verification, text input, and text recognition. The goal of text localization is to precisely locate the words contained in an image while reducing the non-text content passed on, so that the located regions can subsequently be verified, extracted, and recognized. Text localization methods mainly include gradient transformation analysis, stroke transformation analysis, connected-component analysis, and neural-network-based methods. Text verification checks whether a localized image region actually contains text to be recognized, reducing the probability that non-text is fed to the subsequent network model; its main methods are the Support Vector Machine (SVM) and the Convolutional Neural Network (CNN). Text input feeds the verified image text information into the network model so that text recognition can then be completed; it has become one of the most challenging problems in the field of text recognition. Text input methods are broadly divided into segmentation-based methods, which are further divided into line-level and character-level segmentation, and segmentation-free methods. In segmentation-based approaches, the accuracy of text recognition depends largely on the quality of the particular segmentation. Text recognition converts the image text information input to the network into a target character sequence; it is the last step of the whole pipeline and determines the recognition performance of the entire network model. Current text recognition methods mainly include non-deep-learning-based methods and deep-learning-based methods.
Text recognition methods in the prior art comprise two kinds: manual recognition and machine learning. Manual recognition usually requires a person to extract the features of the text information from the image; it depends on clear features being extracted by hand and is characterized by low generalization and low performance. The Hidden Markov Model (HMM) is a common method in traditional machine learning and is often used for text recognition and error analysis.
With the development of deep learning, deep neural networks have been introduced into the text recognition field as an effective new approach, overcoming some defects of traditional methods. Because Mongolian has simple strokes and strong contextual relevance, Kang et al. used a Long Short-Term Memory (LSTM) network to recognize Mongolian words: a recurrent neural network (RNN) depends on text information from the previous time step and cannot solve the long-term dependency problem of text, so the LSTM's ability to associate context is used instead. Later schemes jointly trained a CNN and an RNN and analyzed how the model's different networks affect text recognition accuracy; on this basis, the Connectionist Temporal Classification (CTC) algorithm was used to convert the RNN output into a target character sequence, further improving recognition accuracy. There are also schemes that recognize Mongolian words with an end-to-end neural network model; the end-to-end model greatly reduces the difficulty of model training and can process character sequences of arbitrary length.
At present, the defects of prior-art woodblock-printed Manchu recognition methods include the following: existing methods extract word features from Manchu words manually and then classify and recognize the extracted features using statistical theory. Such methods usually depend on a person extracting clear features from the words, and suffer from unclear extracted features, high extraction time cost, poor generalization ability, and similar problems.
Machine-learning-based methods can effectively avoid the problem of fuzzy features being extracted manually and can achieve better performance, but they may still ignore contextual feature relationships, extract features inaccurately, and spend a great deal of time when recognizing large numbers of Manchu words. Traditional machine learning methods such as the Support Vector Machine (SVM) and the Hidden Markov Model (HMM) simplify classification, regression, and similar problems and perform well on small sample data, but still struggle to distinguish many similar features when processing massive data.
Disclosure of Invention
The embodiment of the invention provides a wood board printing Manchu recognition method based on an end-to-end neural network, so as to effectively recognize Manchu pictures.
In order to achieve the purpose, the invention adopts the following technical scheme.
A wood board printing Manchu recognition method based on an end-to-end neural network comprises the following steps:
cutting out independent Manchu words from the Manchu ancient book picture, and representing each Manchu word by using a Manchu word picture;
carrying out convolution operation on the Manchu word picture by using a Convolutional Neural Network (CNN), and extracting a feature vector of the Manchu word picture;
processing the feature vector extracted by the CNN with a long short-term memory (LSTM) network to obtain a vector sequence containing semantic information;
and converting the vector sequence output by the LSTM into the label sequence corresponding to the Manchu word with the Connectionist Temporal Classification (CTC) algorithm, and outputting the recognition result of the Manchu word according to the label sequence.
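For illustration, the four steps above compose into a single end-to-end model. The following is a minimal sketch of such a CNN + Bi-LSTM pipeline with a CTC-ready output layer, written in PyTorch for brevity (the patent's own experiments use TensorFlow 1.13.1); all layer sizes, the class count, and the module names are illustrative assumptions, not the patented architecture:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of the CNN -> Bi-LSTM -> per-timestep label scores pipeline."""
    def __init__(self, num_classes=37):  # 36 glyph labels + 1 CTC blank (assumed)
        super().__init__()
        # CNN: extracts a feature map while shrinking the spatial dimensions.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width
        )
        # Bi-LSTM: reads the width axis as a sequence and adds context.
        self.rnn = nn.LSTM(256, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)  # label scores per timestep

    def forward(self, x):                      # x: (batch, 1, H, W)
        f = self.cnn(x)                        # (batch, 256, 1, W')
        f = f.squeeze(2).permute(0, 2, 1)      # (batch, W', 256)
        h, _ = self.rnn(f)                     # (batch, W', 512)
        return self.fc(h).log_softmax(2)       # log-probs, as CTC expects
```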
Preferably, the cutting of individual Manchu words from the Manchu ancient book pictures, with each Manchu word represented by a Manchu word picture, comprises:
cutting independent Manchu words out of each digitized Manchu ancient book picture, representing each Manchu word by a Manchu word picture of varying length and width, scaling all Manchu word pictures to the same size by normalizing all word pictures to the same width while preserving their aspect ratio, rotating each Manchu word picture 90 degrees counterclockwise, assigning different labels to all Manchu letters, and annotating all extracted woodblock-printed Manchu words with glyph codes.
Preferably, the performing of the convolution operation on the Manchu word picture with the CNN and the extraction of the feature vector of the Manchu word picture comprise:
the CNN selects multiple groups of convolution kernels suited to the feature sizes of Manchu words; the Manchu word picture passes through these groups of convolution kernels of different sizes in the CNN; the length and width of each convolution kernel represent the size of the feature region extracted by one convolution operation; after each convolution kernel, the output feature map is dimension-reduced with max pooling, which selects the maximum value in each candidate region as its overall representation; finally, a feature vector containing more word texture information is obtained.
Preferably, the processing of the CNN-extracted feature vector with the LSTM to obtain a vector sequence comprises:
processing the feature vector of the Manchu word picture with an LSTM operating in two directions to obtain the context information $h_1, h_2, \ldots, h_T$ contained in the feature vector, and integrating all the context information into the vector sequence $h_i$ output by the Bi-LSTM:

$$h_i = \overrightarrow{h_i} + \overleftarrow{h_i} \tag{1}$$

where $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$ denote the forward and backward hidden-state sequences, respectively, and the final output vector sequence $h_i$ is their sum.
Preferably, the converting of the vector sequence output by the LSTM into the label sequence corresponding to the Manchu word with the Connectionist Temporal Classification (CTC) algorithm, and the recognition of the Manchu word from the label sequence, comprise:
converting the vector sequence output by the LSTM into the label sequence corresponding to the Manchu word with the CTC algorithm, calculating and comparing the conditional probabilities of different label sequences, and outputting the label sequence with the highest conditional probability as the prediction result, wherein the conditional probability of each letter in a Manchu word is defined as the probability of observing the corresponding letter picture at a given moment, and the conditional probability of the whole Manchu word is obtained by summing the conditional probabilities over the different letter arrangements;
the process of the CTC algorithm calculating the conditional probability of a label sequence comprises: denoting the sequence input to the CTC algorithm by $y = y_1, y_2, \ldots, y_T$, where $T$ is the length of the input sequence, $L$ denotes the 36 character labels used, and $L'$ adds a blank label to $L$, i.e. $L' = L \cup \{\text{blank}\}$, where blank represents the blank label; $\pi$ is defined as any possible label sequence, with $\pi \in L'^{T}$;

the conditional probability of $\pi$ is calculated as follows:

$$p(\pi \mid y) = \prod_{t=1}^{T} y_{\pi_t}^{t} \tag{2}$$

where $y_{\pi_t}^{t}$ denotes the probability of the label $\pi_t$ occurring at time $t$, and $p(\pi \mid y)$ denotes the conditional probability of the label sequence $\pi$ given the sequence $y_1, y_2, \ldots, y_T$;

on $L'^{T}$ a many-to-one mapping $B$ is defined as follows: $B$ first removes repeated labels from $\pi$ and then removes all blank labels, yielding a new label sequence $l$; the conditional probability of $l$ is defined as the sum of the conditional probabilities of all label sequences $\pi$ that $B$ maps to $l$:

$$p(l \mid y) = \sum_{\pi \in B^{-1}(l)} p(\pi \mid y) \tag{3}$$

in the dictionary-free mode, the sequence with the highest conditional probability among the label sequences is selected and output as the recognition result of the Manchu word, calculated as follows:

$$l' = \arg\max_{\pi} p(\pi \mid y) \tag{4}$$

$$h(x) \approx B(l') \tag{5}$$

where $h(x)$ is the final predicted word label sequence and $l'$ denotes the label sequence with the highest conditional probability;

in the dictionary-based mode, the sequence closest to a word in the dictionary with the highest conditional probability is selected and output as the recognition result of the Manchu word.
According to the technical scheme provided by the embodiment of the invention, the method can effectively extract and recognize clear, separable, and general features from large numbers of printed Manchu images, while effectively screening and associating the contextual feature information, thereby obtaining a printed Manchu recognition model with high recognition accuracy.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a processing flow chart of a wood printing Manchu recognition method based on an end-to-end neural network according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating different examples of a same Manchu word according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating processing of a Manchu word picture by using a CNN according to an embodiment of the present invention;
FIG. 4 is a flowchart of the overall processing of CNN, bi-LSTM and CTC according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The embodiment of the invention introduces the deep neural network into the field of woodblock-printed Manchu recognition, avoiding the defects of traditional recognition methods. Combining a convolutional neural network, a long short-term memory network, and the Connectionist Temporal Classification algorithm, the embodiment provides a woodblock-printed Manchu recognition method based on an end-to-end neural network and realizes high-precision recognition of woodblock-printed Manchu pictures. The processing flow of the method is shown in Fig. 1; the whole method can be divided into a data preparation module and a model training module.
In the data preparation module, independent Manchu words are cut out of each digitized Manchu ancient book picture. Each Manchu word is a PNG picture of varying length and width. For the network to generate a target string sequence for each word, all Manchu word pictures must be scaled to the same size; in the invention, all word pictures are normalized to the same width while preserving their aspect ratio. Because Manchu, unlike Chinese, is written vertically from top to bottom with columns running from left to right, each Manchu word picture must also be rotated 90 degrees counterclockwise so that it can be correctly read and input by the neural network.
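As an illustration of this preprocessing, the following sketch uses Pillow to normalize a word picture to a common width while preserving its aspect ratio, then rotate it 90 degrees counterclockwise; the target width of 120 pixels is taken from the experimental setup described later and is otherwise an assumption:

```python
from PIL import Image

TARGET_WIDTH = 120  # assumed; matches the 120x400 normalization used later

def preprocess(path: str) -> Image.Image:
    img = Image.open(path).convert("L")           # grayscale PNG word picture
    w, h = img.size
    new_h = max(1, round(h * TARGET_WIDTH / w))   # keep the aspect ratio
    img = img.resize((TARGET_WIDTH, new_h), Image.BILINEAR)
    return img.rotate(90, expand=True)            # counterclockwise rotation
```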
Manchu is an alphabetic script composed of Manchu letters ("teeth"); each Manchu word is composed of different teeth, and a tooth takes different shapes at different positions within a word. Teeth can be divided into four forms: the independent form, the word-initial form, the word-medial form, and the word-final form. To enable the model to recognize Manchu words accurately, different labels are assigned to all Manchu letters, and all extracted woodblock-printed Manchu words are annotated with glyph codes. The specific labeling method follows the Möllendorff transcription scheme, proposed by the German linguist Paul Georg von Möllendorff, which is currently the most widely adopted Manchu transcription scheme internationally. It contains 36 different glyph codes: "a", "e", "i", "o", "u", "ū", "n", "ng", "k", "g", "h", "b", "p", "s", "š", "t", "d", "l", "m", "c", "j", "y", "r", "f", "w", "k'", "g'", "h'", "ts'", "ts", "dz", "dzi", "ž", "sy", "c'y", "jy".
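For illustration, assigning integer class labels to these 36 glyph codes can be sketched as follows; reserving index 0 for the CTC blank is an assumption, since the patent does not state which index the blank occupies:

```python
# The 36 Möllendorff glyph codes listed above.
MOLLENDORFF = [
    "a", "e", "i", "o", "u", "ū", "n", "ng", "k", "g", "h", "b", "p",
    "s", "š", "t", "d", "l", "m", "c", "j", "y", "r", "f", "w",
    "k'", "g'", "h'", "ts'", "ts", "dz", "dzi", "ž", "sy", "c'y", "jy",
]
assert len(MOLLENDORFF) == 36

LABEL = {code: idx + 1 for idx, code in enumerate(MOLLENDORFF)}  # labels 1..36
BLANK = 0  # assumed position of the CTC blank label
```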
Recognizing woodblock-printed Manchu words presents the following challenges:
(1) Manchu ancient books were printed from engraved wood blocks; engraving techniques varied and inking was inconsistent, so the same letters and words appear in different shapes. Fig. 2 shows examples of different printed forms of the same Manchu word.
(2) The ancient books have been stored for a long time and suffer from damaged and deformed paper, blurred glyphs, and similar problems.
(3) The books' layouts are complex, which can affect the cutting and recognition of Manchu words.
In the model training module, to obtain rich feature information from the Manchu word picture, the picture is first processed with the CNN according to the flow shown in Fig. 3. Specifically, all pictures are normalized to the same size before entering the CNN convolutional layers. During the convolution operations, the picture passes through multiple groups of convolution kernels of different sizes; the length and width of a kernel determine the size of the feature region extracted by one convolution operation. Each word picture contains several groups of features of different sizes, so to extract complete feature information, multiple convolution kernels suited to the feature sizes of Manchu words are selected. After each convolution kernel, the output feature map is dimension-reduced with max pooling, which takes the maximum value in each candidate region as its overall representation; each such operation screens out the features most useful for classification and recognition while preserving more of the characters' texture information. Average pooling, by contrast, takes the mean of each candidate region as its overall representation and therefore tends to retain more background information from the image. Since extracting Manchu word features requires attending to the glyphs and filtering out irrelevant background information, max pooling is combined with the CNN convolutional layers.
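A toy example of this difference between the two pooling operations: on a sharp stroke edge, max pooling keeps the strongest responses while average pooling blends them with the background (the numbers below are made up for illustration):

```python
import torch
import torch.nn.functional as F

# A 1x1x4x4 feature map with a strong stroke response on the right side.
patch = torch.tensor([[[[0.0, 0.0, 0.9, 1.0],
                        [0.0, 0.0, 0.8, 1.0],
                        [0.0, 0.0, 0.0, 0.1],
                        [0.0, 0.0, 0.0, 0.0]]]])
print(F.max_pool2d(patch, 2))  # keeps the stroke peaks: [[0.0, 1.0], [0.0, 0.1]]
print(F.avg_pool2d(patch, 2))  # dilutes them: [[0.0, 0.925], [0.0, 0.025]]
```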
The overall processing flow of the CNN, Bi-LSTM, and CTC provided by the embodiment of the invention is shown in Fig. 4. So that the extracted feature vectors contain context information, further improving the accuracy of Manchu word recognition, the invention processes the feature vectors extracted by the CNN with a Long Short-Term Memory (LSTM) network. LSTM networks are chosen because they excel at time-series-sensitive problems and tasks: facing serialized data, an LSTM has a memory effect, and its special gating unit structure can select and exploit effective memory information, solving the long-term dependency problem of RNNs. Since the context of the letters within a Manchu word affects the recognition of the whole word, a Bi-directional Long Short-Term Memory (Bi-LSTM) network is used to obtain a vector sequence containing the context information, denoted $h_1, h_2, \ldots, h_T$. The final output of the Bi-LSTM can be expressed as:

$$h_i = \overrightarrow{h_i} + \overleftarrow{h_i} \tag{1}$$

where $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$ denote the forward and backward hidden-state sequences, respectively; the final output vector sequence $h_i$ is their sum.
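A minimal sketch of formula (1): run an LSTM over the feature sequence in both directions and sum the forward and backward hidden states at each step, as the description specifies (rather than concatenating them); the batch, sequence, and channel sizes are illustrative:

```python
import torch
import torch.nn as nn

features = torch.randn(32, 30, 256)         # (batch, timesteps, channels)
bilstm = nn.LSTM(256, 256, bidirectional=True, batch_first=True)
out, _ = bilstm(features)                   # (32, 30, 512): both directions
fwd, bwd = out[..., :256], out[..., 256:]   # split forward / backward halves
h = fwd + bwd                               # formula (1): h_i = fwd_i + bwd_i
```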
The feature vectors output by the Bi-LSTM are processed by the Connectionist Temporal Classification (CTC) algorithm and output as the target character sequence corresponding to the Manchu word; that is, the CTC algorithm converts the vector sequence output by the long short-term memory network into the label sequence corresponding to the Manchu word. CTC computes and compares the conditional probabilities of different sequences and outputs the label sequence with the highest probability as the prediction result, with remarkable transcription performance and stability. CTC adds a blank label to the existing labels and ignores the absolute position of each label. The conditional probability of each letter is defined as the probability of observing the corresponding letter picture at a given moment, and the conditional probability of the whole word is obtained by summing the conditional probabilities over the different letter arrangements.
The CTC conditional probability is calculated as follows. First, denote the sequence input to CTC by $y = y_1, y_2, \ldots, y_T$, where $T$ is the length of the input sequence. $L$ denotes the 36 character labels currently used, and $L'$ adds a blank label to $L$, i.e. $L' = L \cup \{\text{blank}\}$, where blank represents the blank label. $\pi$ is defined as any possible label sequence, with $\pi \in L'^{T}$. The conditional probability of $\pi$ is calculated as:

$$p(\pi \mid y) = \prod_{t=1}^{T} y_{\pi_t}^{t} \tag{2}$$

where $y_{\pi_t}^{t}$ denotes the probability of the label $\pi_t$ occurring at time $t$, and $p(\pi \mid y)$ denotes the conditional probability of the label sequence $\pi$ given the sequence $y_1, y_2, \ldots, y_T$.

Next, a many-to-one mapping $B$ is defined on $L'^{T}$, as follows: $B$ first removes repeated labels from $\pi$ and then removes all blank labels, yielding a new label sequence $l$. For example, $B$ maps "-a-r-a-hh-aa-" to "araha" (where "-" represents the blank label). The conditional probability of $l$ is then defined as the sum of the conditional probabilities of all label sequences $\pi$ that $B$ maps to $l$. Since summing over exponentially many paths would consume excessive computing resources, equation (3) is in practice computed efficiently with a forward-backward algorithm:

$$p(l \mid y) = \sum_{\pi \in B^{-1}(l)} p(\pi \mid y) \tag{3}$$
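The mapping B itself is simple to implement; the following sketch merges consecutive repeated labels and then deletes blanks, reproducing the example above:

```python
from itertools import groupby

def B(pi: str, blank: str = "-") -> str:
    collapsed = [label for label, _ in groupby(pi)]     # merge repeated labels
    return "".join(l for l in collapsed if l != blank)  # then drop blanks

assert B("-a-r-a-hh-aa-") == "araha"
```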
and finally, selecting the label sequence as a word recognition result. There are generally two modes in selecting a tag, a dictionary-based mode and a non-dictionary mode. And in a dictionary-free mode, selecting a sequence with the highest conditional probability in the tag sequences as a prediction result to be output. And in the dictionary-based mode, a sequence which is closest to the word in the dictionary and has the highest conditional probability is selected as a prediction result to be output. However, the accuracy and efficiency of word recognition in this mode is generally limited by the size of the lexicon itself, and therefore, the label sequence of the output word in the non-lexicon mode is selected in the present invention. The calculation method is as follows.
Figure BDA0003936865220000106
h(x)≈B(l′)(5)
Where h (x) is the final predicted word tag sequence and l' denotes the selection of the tag sequence with the highest conditional probability.
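A sketch of this dictionary-free decoding in formulas (4)-(5): take the most probable label at every timestep to form the best path l', then collapse it with the mapping B (here applied directly to integer labels); log_probs stands for the network's per-timestep output, and id2label for an assumed label table:

```python
import torch

def greedy_decode(log_probs: torch.Tensor, id2label: dict, blank: int = 0) -> str:
    """log_probs: (timesteps, num_classes) log-probabilities for one word."""
    path = log_probs.argmax(dim=-1).tolist()  # l': best label per timestep
    out, prev = [], None
    for p in path:                            # apply B: merge repeats, drop blanks
        if p != blank and p != prev:
            out.append(id2label[p])
        prev = p
    return "".join(out)
```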
The experiments of the invention were completed on the Ubuntu 18.04 operating system with TensorFlow 1.13.1, an NVIDIA 2080Ti graphics card, and 12 GB of video memory. In the data preparation module, images are normalized to 120 × 400 before being input to the CNN. The CNN performs feature extraction and dimension reduction using convolution kernels of sizes 16 × 16, 8 × 8, 4 × 4, and 2 × 2 and pooling windows of sizes 2 × 5, 1 × 4, and 1 × 2; the size, type, and channel count of the convolution kernels can be adjusted. After the CNN convolutions, an output of size 30 × 1 × 256 is obtained. The number of memory cells in the Bi-LSTM is set to 256. In the experiments, the batch size is set to 32, ReLU is selected as the activation function, and the network loss is optimized with the RMSProp optimizer; these parts can be modified and adjusted. After model training is completed, the letter recognition accuracy (character accuracy) and the word recognition accuracy (word accuracy) are computed respectively.
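In PyTorch terms, a training step with this configuration (CTC loss, RMSProp, batch size 32) might be sketched as follows, reusing the CRNN module from the earlier sketch; the learning rate, target lengths, and tensor shapes are assumptions, since the patent only names the optimizer, activation, and batch size:

```python
import torch
import torch.nn as nn

model = CRNN(num_classes=37)                  # from the earlier sketch
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
opt = torch.optim.RMSprop(model.parameters(), lr=1e-4)  # assumed lr

images = torch.randn(32, 1, 120, 400)         # batch of normalized word pictures
log_probs = model(images).permute(1, 0, 2)    # CTCLoss expects (T, batch, C)
targets = torch.randint(1, 37, (32, 8))       # dummy glyph-label sequences
in_lens = torch.full((32,), log_probs.size(0), dtype=torch.long)
tgt_lens = torch.full((32,), 8, dtype=torch.long)

opt.zero_grad()
loss = ctc(log_probs, targets, in_lens, tgt_lens)
loss.backward()
opt.step()
```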
In summary, the embodiment of the invention provides a recognition method for woodblock-printed Manchu that combines CNN, Bi-LSTM, and CTC, extracts more prominent and separable features from large numbers of Manchu word pictures, and analyzes the extracted features to obtain a woodblock-printed Manchu recognition model with high accuracy.
The wood board printing Manchu recognition method based on an end-to-end neural network provided by the embodiment of the invention combines the CNN's ability to capture more feature information with the Bi-LSTM's ability to connect contextual feature information; it recognizes each Manchu word from the perspective of the tooth (phoneme) and takes accuracy, stability, efficiency, and generalization into account. The method recognized 20,000 woodblock-printed Manchu word pictures with an accuracy of 86.98%.
Among existing Manchu recognition methods, traditional statistics-based methods split Manchu strokes for recognition. However, such methods ignore the deformation produced when letters are combined, and their recognition targets are limited to specific Manchu letters; they cannot recognize whole Manchu words. CNN-based recognition methods classify all Manchu words into 666 classes and recognize each word as a whole; in practice, however, there are far more than 666 kinds of Manchu words, and such methods cannot accurately recognize words from classes that did not appear in training.
The comparison between the current Manchu recognition methods is shown in Table 1.
TABLE 1 comparison of different Manchu recognition methods
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and reference may be made to the partial descriptions of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's solution. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (5)

1. A wood board printing Manchu recognition method based on an end-to-end neural network is characterized by comprising the following steps:
cutting out independent Manchu words from the Manchu ancient book picture, and representing each Manchu word by using a Manchu word picture;
carrying out convolution operation on the Manchu word picture by using a Convolutional Neural Network (CNN), and extracting a feature vector of the Manchu word picture;
processing the feature vector extracted by the CNN with a long short-term memory (LSTM) network to obtain a vector sequence containing semantic information;
and converting the vector sequence output by the LSTM into the label sequence corresponding to the Manchu word with the Connectionist Temporal Classification (CTC) algorithm, and outputting the recognition result of the Manchu word according to the label sequence.
2. The method of claim 1, wherein said cutting of individual Manchu words from the Manchu ancient book pictures, with each Manchu word represented by a Manchu word picture, comprises:
cutting independent Manchu words out of each digitized Manchu ancient book picture, representing each Manchu word by a Manchu word picture of varying length and width, scaling all Manchu word pictures to the same size by normalizing all word pictures to the same width while preserving their aspect ratio, rotating each Manchu word picture 90 degrees counterclockwise, assigning different labels to all Manchu letters, and annotating all extracted woodblock-printed Manchu words with glyph codes.
3. The method according to claim 1 or 2, wherein performing the convolution operation on the Manchu word picture with the CNN and extracting the feature vector of the Manchu word picture comprises:
the CNN selects multiple groups of convolution kernels suited to the feature sizes of Manchu words; the Manchu word picture passes through these groups of convolution kernels of different sizes in the CNN; the length and width of each convolution kernel represent the size of the feature region extracted by one convolution operation; after each convolution kernel, the output feature map is dimension-reduced with max pooling, which selects the maximum value in each candidate region as its overall representation; finally, a feature vector containing more word texture information is obtained.
4. The method of claim 3, wherein the processing of the CNN-extracted feature vector with the LSTM to obtain a vector sequence comprises:
processing the feature vector of the Manchu word picture with an LSTM operating in two directions to obtain the context information $h_1, h_2, \ldots, h_T$ contained in the feature vector, and integrating all the context information into the vector sequence $h_i$ output by the Bi-LSTM:

$$h_i = \overrightarrow{h_i} + \overleftarrow{h_i} \tag{1}$$

where $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$ denote the forward and backward hidden-state sequences, respectively, and the final output vector sequence $h_i$ is their sum.
5. The method of claim 4, wherein converting the vector sequence output by the LSTM into the label sequence corresponding to the Manchu word with the CTC algorithm and recognizing the Manchu word from the label sequence comprises:
converting the vector sequence output by the LSTM into the label sequence corresponding to the Manchu word with the CTC algorithm, calculating and comparing the conditional probabilities of different label sequences, and outputting the label sequence with the highest conditional probability as the prediction result, wherein the conditional probability of each letter in a Manchu word is defined as the probability of observing the corresponding letter picture at a given moment, and the conditional probability of the whole Manchu word is obtained by summing the conditional probabilities over the different letter arrangements;
the process of calculating the conditional probability of the tag sequence by the CTC algorithm comprises the following steps: tag sequence to be input into CTC algorithm by y 1, y 2 ,....,y r Where T is the length of the input sequence, L represents the 36 different character labels used, L 'is the addition of a blank label on the basis of L, i.e., L' = L utou { blank }, where blank represents a blank label, and pi is defined as any possible occurring sequence of labels, and there are
Figure FDA0003936865210000024
The conditional probability of pi is calculated as follows:
Figure FDA0003936865210000025
wherein
Figure FDA0003936865210000026
Indicating the occurrence of the label pi at time t t P (π | y) denotes in the sequence y 1, y 2 ,....,y r The conditional probability of occurrence of a tag sequence pi;
in that
Figure FDA0003936865210000031
A group of many-to-one mapping methods B is defined above, and is described in detail as follows: b firstly removes repeated labels in pi, removes all blank labels to obtain a new label sequence l, and defines the conditional probability of l as the sum of the conditional probabilities of the label sequence pi mapped to l through B.
The conditional probability calculation formula for l is as follows:
Figure FDA0003936865210000032
in a dictionary-free mode, selecting a sequence with the highest conditional probability in the label sequences as a recognition result of the Manchu word to be output, wherein the calculation method comprises the following steps:
Figure FDA0003936865210000033
h(x)≈B(l′)#(5)
wherein h (x) is the final predicted word tag sequence, and l' represents the selection of the tag sequence with the highest conditional probability;
in the dictionary-based mode, a sequence which is closest to the word in the dictionary and has the highest conditional probability is selected and output as a recognition result of the Manchu word.
CN202211405330.XA 2022-11-10 2022-11-10 Wood board printing Manchu recognition method based on end-to-end neural network Pending CN115862038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211405330.XA CN115862038A (en) 2022-11-10 2022-11-10 Wood board printing Manchu recognition method based on end-to-end neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211405330.XA CN115862038A (en) 2022-11-10 2022-11-10 Wood board printing Manchu recognition method based on end-to-end neural network

Publications (1)

Publication Number Publication Date
CN115862038A true CN115862038A (en) 2023-03-28

Family

ID=85662982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211405330.XA Pending CN115862038A (en) 2022-11-10 2022-11-10 Wood board printing Manchu recognition method based on end-to-end neural network

Country Status (1)

Country Link
CN (1) CN115862038A (en)

Similar Documents

Publication Publication Date Title
Giotis et al. A survey of document image word spotting techniques
Naz et al. The optical character recognition of Urdu-like cursive scripts
Mathew et al. Benchmarking scene text recognition in Devanagari, Telugu and Malayalam
US8761500B2 (en) System and methods for arabic text recognition and arabic corpus building
US20190087677A1 (en) Method and system for converting an image to text
Javed et al. Segmentation free nastalique urdu ocr
CN110084239A (en) The method of network training over-fitting when reducing offline hand-written mathematical formulae identification
Lutf et al. Arabic font recognition based on diacritics features
Peng et al. Multi-font printed Mongolian document recognition system
Wu et al. LCSegNet: An efficient semantic segmentation network for large-scale complex Chinese character recognition
CN111914825A (en) Character recognition method and device and electronic equipment
Romero et al. Modern vs diplomatic transcripts for historical handwritten text recognition
Cojocaru et al. Watch your strokes: improving handwritten text recognition with deformable convolutions
Addis et al. Printed ethiopic script recognition by using lstm networks
He et al. Open set Chinese character recognition using multi-typed attributes
Bilgin Tasdemir Printed Ottoman text recognition using synthetic data and data augmentation
Ul-Hasan Generic text recognition using long short-term memory networks
Al Ghamdi A novel approach to printed Arabic optical character recognition
Ashraf et al. An analysis of optical character recognition (ocr) methods
CN115862038A (en) Wood board printing Manchu recognition method based on end-to-end neural network
Khaled et al. A Hybrid Deep Learning Approach for Arabic Handwritten Recognition: Exploring the Complexities of the Arabic Language
CN114332476A (en) Method, device, electronic equipment, storage medium and product for identifying dimensional language
Wang et al. End-to-end model based on bidirectional lstm and ctc for segmentation-free traditional mongolian recognition
Ajao et al. Hidden markov model approach for offline Yoruba handwritten word recognition
Sturgeon Unsupervised extraction of training data for pre-modern Chinese OCR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination