CN113723421B - Zero-shot Chinese character recognition method based on matching category embedding - Google Patents

Zero-shot Chinese character recognition method based on matching category embedding

Info

Publication number
CN113723421B
CN113723421B (Application CN202111038228.6A)
Authority
CN
China
Prior art keywords
chinese character
embedding
category
embedded
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111038228.6A
Other languages
Chinese (zh)
Other versions
CN113723421A (en)
Inventor
黄宇浩
金连文
彭德智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202111038228.6A
Publication of CN113723421A
Application granted
Publication of CN113723421B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention relates to a zero-shot Chinese character recognition method based on matching category embedding, which comprises the following steps: extracting visual features from a Chinese character text image; performing category embedding of the Chinese character classes, in which each character is hierarchically decomposed into its components by a hierarchical-decomposition embedding algorithm and the corresponding embedding vector is computed; mapping the category embeddings into the visual space with a bidirectional embedding transfer module, so that the dimension of the category embedding equals the dimension of the visual space while the original information of the character classes is preserved; and matching the visual features of the text image against the category embedding information with a distance-based CTC decoder to output the final recognition result. By matching category embeddings, the invention achieves zero-shot Chinese character text recognition and is suitable both for long Chinese text recognition and for zero-shot Chinese character recognition.

Description

Zero-shot Chinese character recognition method based on matching category embedding
Technical Field
The invention relates to the technical field of pattern recognition and artificial intelligence, and in particular to a zero-shot Chinese character recognition method based on matching category embedding.
Background
Chinese characters are among the oldest writing systems in the world and remain carriers of Chinese history and culture. Research on Chinese character recognition enables the digitization of historical documents and is therefore of great value for the preservation of cultural heritage. The Chinese character set is enormous, however: beyond the roughly 4,000 characters in daily use, more than 85,000 character classes are recorded in historical documents and academic archives, most of them rare characters, complex characters, or variant characters whose samples are difficult to collect manually. Current Chinese text recognition models typically combine a convolutional neural network with CTC decoding or attention decoding and follow a data-driven scheme: a large amount of data is collected or synthesized for every character class to train the model, which works for common characters. For rare, complex, and variant characters, real samples are hard to obtain and difficult to synthesize, so collecting and labeling sufficient data is costly in both time and money.
To address these problems, a zero-shot Chinese character recognition method based on matching category embedding is adopted, which recognizes rare, complex, and variant characters by learning only the component features present in common Chinese character samples.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a zero-shot Chinese character recognition method based on matching category embedding, which solves the problem of zero-shot Chinese character recognition and enables the recognition of rare, complex, and variant character samples.
In order to achieve the above object, the present invention provides the following solution:
A zero-shot Chinese character recognition method based on matching category embedding comprises the following steps:
extracting visual features from a Chinese character text image;
performing category embedding of the Chinese character classes: hierarchically decomposing each character into its components with a hierarchical-decomposition embedding algorithm and computing the corresponding embedding vector;
mapping the category embedding of the Chinese character classes into a visual space with a bidirectional embedding transfer module, so that the dimension of the category embedding equals the dimension of the visual space while the original information of the Chinese character classes is preserved;
and matching the visual features of the Chinese character text image against the category embedding information with a distance-based CTC decoder, and outputting the final recognition result of the Chinese character text image.
Preferably, a text encoder based on a convolutional neural network is used to extract the visual features of the Chinese character text image.
Preferably, extracting the visual features of the Chinese character text image with the convolutional-neural-network text encoder specifically includes:
using a ResNet18 model as the backbone network, removing its last fully connected layer, and replacing the final global average pooling layer with pooling over the feature-map height only, so that the output feature map has a height of 1 while its width is unchanged.
Preferably, a dropout strategy with the dropout probability set to 0.3 is applied at the output of the last convolutional layer of the backbone network to prevent the network from overfitting.
Preferably, the hierarchical-decomposition embedding algorithm specifically includes:
obtaining the components and structures of a Chinese character from its ideographic description sequence; then embedding the components and structures of the character according to the embedding function to obtain the corresponding category embedding, where the function is expressed as formula (1):
φ = Σ_{n_i∈R} v_{n_i}·y_{n_i} + λ·Σ_{n_j∈S} v_{n_j}·y_{n_j} (1)
where n_i denotes a component in the component set R, n_j denotes a structure in the structure set S, y_n is the one-hot encoding vector of a component or structure, λ is a hyperparameter set to 0.5, and v_n is the influence factor of a component or structure, computed by formula (2):
where α and β are hyperparameters set to 0.5 and 0.001 respectively, p_i denotes a node on the path from the root node to a leaf node, and l is the length of the path.
Preferably, the bidirectional embedding transfer module consists of a forward fully connected layer and a reverse fully connected layer, and the two fully connected layers share parameters.
Preferably, the forward fully connected layer maps the category embedding of the Chinese characters into the visual space so that the dimension of the category embedding equals the dimension of the visual features of the text image.
Preferably, the reverse fully connected layer is formed from the transpose of the parameter matrix of the forward fully connected layer; the category embedding can be reconstructed through the reverse fully connected layer, and a reconstruction loss function computes the mean square error between the reconstructed category embedding and the original category embedding, so that the category embedding mapped into the visual space retains its original information.
Preferably, the specific operations of the distance-based CTC decoder include:
computing the distance between the visual features and the Chinese character category embeddings with a cosine similarity function, expressed as:
d(V, Φ') = (V·Φ')/(‖V‖·‖Φ'‖)
where V denotes the visual features and Φ' denotes the mapped category embeddings; after the cosine similarity between the visual features and the category embeddings is computed, it is substituted into a distance-based CTC loss function, which serves as the optimization target of the network;
the distance-based CTC loss function is expressed as:
where l_i is a label and α is a learnable parameter that adjusts the magnitude of the cosine similarity.
The beneficial effects of the invention are as follows:
(1) The invention designs a zero-shot recognition model for Chinese character text, addressing the dependence of existing text recognition methods on large amounts of labeled training data and their difficulty in recognizing zero-shot data. The resulting text recognition model has better generalization ability, can recognize variant, rare, and complex characters, is simple and flexible to implement, and can be adapted to existing text recognition frameworks.
(2) Most existing zero-shot recognition methods focus only on the recognition of isolated Chinese characters; the invention addresses zero-shot recognition of Chinese text, which is more challenging and of greater practical value.
(3) By adopting the matching category embedding method and the distance-based CTC decoder, the model can both recognize zero-shot samples and process long texts, overcoming the long training time and unsuitability for long-text recognition of existing attention-based zero-shot recognition methods.
Drawings
To more clearly illustrate the embodiments of the invention and the technical solutions of the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a text encoder of the present invention;
FIG. 3 is a schematic diagram of the hierarchical decomposition structure of a Chinese character according to the present invention;
FIG. 4 is a schematic diagram of the bidirectional embedding transfer module of the present invention;
fig. 5 is a schematic diagram of a distance-based CTC decoder of the present invention.
Detailed Description
The embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, embodiments of the invention; all other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
To make the above objects, features, and advantages of the invention more readily apparent, the invention is described in further detail below with reference to the accompanying drawings and the detailed description.
The zero-shot Chinese character recognition method based on matching category embedding of the invention, shown in FIG. 1, comprises the following steps:
s1, extracting visual features of a Chinese character text image, wherein a text encoder based on a convolutional neural network is adopted to extract the visual features of the Chinese character text image, and the method specifically comprises the following steps:
the ResNet18 model is used as a backbone network to extract visual features of the text image, as shown in FIG. 2, and the text encoder takes the text image as input and outputs a one-dimensional sequence of visual features. The ResNet18 model removes the last full connection layer of the network, so that the network is only used for extracting the features, and the height of the output feature map is subjected to average pooling, so that the output height of the feature map is 1, and the width of the feature map is kept unchanged, thereby obtaining a one-dimensional visual feature sequence. In addition, in order to prevent network overfitting, a dropout strategy is adopted at the output of the last convolutional layer to prevent network overfitting, and the probability of dropout is set to 0.3.
S2, performing category embedding of the Chinese character classes, hierarchically decomposing each character into its components with the hierarchical-decomposition embedding algorithm and computing the corresponding embedding vectors, which specifically includes:
The components and structures of a Chinese character are obtained from its ideographic description sequence, as shown in FIG. 3. The components and structures of the character are then embedded according to the function of the hierarchical-decomposition embedding algorithm to obtain the corresponding category embedding, where the function is expressed as:
φ = Σ_{n_i∈R} v_{n_i}·y_{n_i} + λ·Σ_{n_j∈S} v_{n_j}·y_{n_j}
where n_i denotes a component in the component set R, n_j denotes a structure in the structure set S, y_n is the one-hot encoding vector of a component or structure (all one-hot vectors have the same dimension), λ is a hyperparameter balancing the two terms and is set to 0.5, and v_n is the influence factor of a component or structure, computed by the following formula:
where α and β are hyperparameters set to 0.5 and 0.001 respectively, p_i denotes a node on the path from the root node to a leaf node, and l is the length of the path. In addition, the "blank" symbol used by CTC is represented by its own one-hot encoding vector. Because the hierarchical-decomposition embedding algorithm embeds component and structure information for every Chinese character class, it can represent both seen and unseen character classes.
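As a rough illustration of how such a hierarchical-decomposition embedding can be computed, the sketch below builds a category embedding from a toy decomposition tree. The component and structure vocabularies, the example character, and the depth-decay weight alpha**depth (used as a stand-in because formula (2) is not reproduced in this text) are all assumptions for illustration, not the patented formula.

```python
# Illustrative sketch of hierarchical-decomposition embedding (HDE).
# alpha**depth stands in for the influence factor v_n; vocabularies and tree are hypothetical.
import numpy as np

COMPONENTS = ["亻", "木", "口", "日", "寸"]       # component set R (toy example)
STRUCTURES = ["⿰", "⿱", "⿸"]                     # structure set S (toy example)
LAMBDA, ALPHA = 0.5, 0.5

def one_hot(symbol: str) -> np.ndarray:
    vocab = COMPONENTS + STRUCTURES + ["<blank>"]  # CTC blank also gets a one-hot slot
    vec = np.zeros(len(vocab))
    vec[vocab.index(symbol)] = 1.0
    return vec

def hde_embedding(tree, depth: int = 0) -> np.ndarray:
    """tree = (symbol, [children]); leaves are components, internal nodes are structures."""
    symbol, children = tree
    weight = ALPHA ** depth                        # stand-in for influence factor v_n
    term = weight * one_hot(symbol)
    if symbol in STRUCTURES:                       # structure terms are scaled by lambda
        term = LAMBDA * term
    for child in children:
        term = term + hde_embedding(child, depth + 1)
    return term

# "村" decomposes as ⿰(木, 寸): a left-right structure with two components.
phi = hde_embedding(("⿰", [("木", []), ("寸", [])]))
print(phi.round(3))
```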
S3, based on the Chinese character category embedding, mapping the category embedding into the visual space with a bidirectional embedding transfer module, which specifically includes:
The bidirectional embedding transfer module, shown in FIG. 4, consists of a forward fully connected layer and a reverse fully connected layer that share parameters. The forward fully connected layer maps the category embedding of the Chinese characters into the visual space so that the dimension of the category embedding equals the dimension of the visual features of the text image; the layer is a linear mapping, and the input category embedding passes through it to produce the projected embedding. Because no additional constraint is placed on the projected embedding, its original information would gradually be lost during training and the generalization ability of the network would weaken, so a reverse fully connected layer is added as an extra constraint to preserve that information. To simplify the computation, the transpose of the parameter matrix of the forward fully connected layer is used as the parameters of the reverse fully connected layer, which takes the projected embedding as input and outputs a reconstructed embedding. So that the network learns this property, a reconstruction loss function computes the mean square error between the reconstructed and original category embeddings, allowing the category embedding mapped into the visual space to retain its original information. The reconstruction loss is expressed as:
L_rec = (1/N)·Σ_i ‖φ̂_i − φ_i‖²
where φ̂_i is the reconstructed category embedding and φ_i is the original category embedding.
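A minimal PyTorch sketch of such a bidirectional embedding transfer module is given below, assuming a bias-free linear layer whose transposed weight matrix serves as the reverse mapping; the dimensions and names are illustrative.

```python
# Sketch of the bidirectional embedding transfer module: a forward linear map into the
# visual space, a reverse map built from the transposed weight, and an MSE reconstruction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalTransfer(nn.Module):
    def __init__(self, embed_dim: int, visual_dim: int):
        super().__init__()
        self.forward_fc = nn.Linear(embed_dim, visual_dim, bias=False)

    def forward(self, phi: torch.Tensor):
        projected = self.forward_fc(phi)                    # category embedding -> visual space
        reconstructed = projected @ self.forward_fc.weight  # reverse layer = transposed weights
        return projected, reconstructed

module = BidirectionalTransfer(embed_dim=9, visual_dim=512)
phi = torch.randn(4, 9)                                     # 4 category embeddings
projected, reconstructed = module(phi)
rec_loss = F.mse_loss(reconstructed, phi)                   # reconstruction loss (MSE)
```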
S4, matching the visual features of the Chinese character text image against the Chinese character category embedding with a distance-based CTC decoder and outputting the recognition result of the Chinese character text image, which specifically includes:
The distance-based CTC decoder, shown in FIG. 5, decodes the visual features by matching the most similar category embeddings. First, the distance between the visual features and the category embeddings is computed with a cosine similarity function, expressed as:
d(V, Φ') = (V·Φ')/(‖V‖·‖Φ'‖)
where V denotes the visual features and Φ' denotes the mapped category embeddings. Computing the cosine similarity between the visual features and every category embedding yields a cosine similarity matrix d(V, Φ'), in which d_ij is the cosine similarity between position i of the visual feature sequence and category embedding j. Taking the maximum of each row of this matrix gives a prediction for every position of the visual feature sequence. These per-position predictions are not aligned one-to-one with the ground-truth labels, however, so the CTC loss function is used to optimize this process and align the predictions with the labels. Substituting the cosine similarity matrix into the CTC loss function as the optimization target of the network yields the distance-based CTC loss, expressed as:
where l_i is a label and α is a learnable parameter that adjusts the magnitude of the cosine similarity. Finally, after repeated labels and blank labels are removed from the output sequence, the recognition result of the Chinese character text image is obtained.
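The decoding and loss computation can be sketched as follows, assuming PyTorch's built-in nn.CTCLoss and treating the α-scaled cosine similarities as class scores; the shapes, the blank index, and the initial value of α are illustrative assumptions.

```python
# Sketch of the distance-based CTC loss: cosine similarities between the visual feature
# sequence and the projected category embeddings, scaled by a learnable alpha, are used
# as class scores and fed to the standard CTC loss. Shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distance_ctc_loss(visual, projected_phi, targets, target_lengths, alpha):
    # visual: (B, T, C) feature sequence; projected_phi: (K, C) embeddings incl. index-0 blank
    sim = F.cosine_similarity(visual.unsqueeze(2),
                              projected_phi.unsqueeze(0).unsqueeze(0), dim=-1)   # (B, T, K)
    log_probs = F.log_softmax(alpha * sim, dim=-1)
    input_lengths = torch.full((visual.size(0),), visual.size(1), dtype=torch.long)
    return nn.CTCLoss(blank=0)(log_probs.permute(1, 0, 2), targets,
                               input_lengths, target_lengths)

alpha = nn.Parameter(torch.tensor(10.0))        # learnable similarity scale
visual = torch.randn(2, 8, 512)                 # from the text encoder
projected_phi = torch.randn(9, 512)             # from the transfer module
targets = torch.tensor([[3, 4, 5], [6, 7, 0]])  # padded label sequences
loss = distance_ctc_loss(visual, projected_phi, targets, torch.tensor([3, 2]), alpha)
loss.backward()
```

At inference time the same α-scaled similarities are simply argmaxed at each position, and repeated labels and blanks are collapsed to produce the final transcription, as described above.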
The categories contained in the training set are referred to as seen categories, and the categories not contained in the training set as unseen categories. The goal of zero-shot recognition is to enable the network to recognize unseen categories by learning the characteristics of the seen categories.
By matching category embeddings, the invention achieves zero-shot Chinese character text recognition and is suitable both for long Chinese text recognition and for zero-shot Chinese character recognition.
The above embodiments merely illustrate preferred embodiments of the present invention, and the scope of the invention is not limited to them; various modifications and improvements made by those skilled in the art without departing from the spirit of the invention fall within the scope of the invention as defined by the appended claims.

Claims (7)

1. A zero-shot Chinese character recognition method based on matching category embedding, characterized by comprising the following steps:
extracting visual features from a Chinese character text image;
performing category embedding of the Chinese character classes: hierarchically decomposing each character into its components with a hierarchical-decomposition embedding algorithm and computing the corresponding embedding vector;
the hierarchical-decomposition embedding algorithm specifically comprising:
obtaining the components and structures of a Chinese character from its ideographic description sequence; then embedding the components and structures of the character according to the embedding function to obtain the corresponding category embedding, where the function is expressed as formula (1):
φ = Σ_{n_i∈R} v_{n_i}·y_{n_i} + λ·Σ_{n_j∈S} v_{n_j}·y_{n_j} (1)
where n_i denotes a component in the component set R, n_j denotes a structure in the structure set S, y_n is the one-hot encoding vector of a component or structure, λ is a hyperparameter set to 0.5, and v_n is the influence factor of a component or structure, computed by formula (2):
where α and β are hyperparameters set to 0.5 and 0.001 respectively, p_i denotes a node on the path from the root node to a leaf node, and l is the length of the path;
mapping the category embedding of the Chinese character classes into a visual space with a bidirectional embedding transfer module, so that the dimension of the category embedding equals the dimension of the visual space while the original information of the Chinese character classes is preserved;
matching the visual features of the Chinese character text image against the category embedding information through a distance-based CTC decoder, and outputting the final recognition result of the Chinese character text image;
the specific operations of the distance-based CTC decoder comprising:
computing the distance between the visual features and the Chinese character category embeddings with a cosine similarity function, expressed as:
d(V, Φ') = (V·Φ')/(‖V‖·‖Φ'‖)
where V denotes the visual features and Φ' denotes the mapped category embeddings; after the cosine similarity between the visual features and the category embeddings is computed, it is substituted into a distance-based CTC loss function, which serves as the optimization target of the network;
the distance-based CTC loss function being expressed as:
where l_i is a label and α is a learnable parameter that adjusts the magnitude of the cosine similarity.
2. The zero-shot Chinese character recognition method based on matching category embedding according to claim 1, characterized in that a text encoder based on a convolutional neural network is used to extract the visual features of the Chinese character text image.
3. The zero-shot Chinese character recognition method based on matching category embedding according to claim 2, characterized in that extracting the visual features of the Chinese character text image with the convolutional-neural-network text encoder specifically comprises:
using a ResNet18 model as the backbone network, removing its last fully connected layer, and replacing the final global average pooling layer with pooling over the feature-map height only, so that the output feature map has a height of 1 while its width is unchanged.
4. The zero-shot Chinese character recognition method based on matching category embedding according to claim 3, characterized in that a dropout strategy with the dropout probability set to 0.3 is applied at the output of the last convolutional layer of the backbone network to prevent the network from overfitting.
5. The zero-shot Chinese character recognition method based on matching category embedding according to claim 1, characterized in that the bidirectional embedding transfer module consists of a forward fully connected layer and a reverse fully connected layer, and the two fully connected layers share parameters.
6. The zero-shot Chinese character recognition method based on matching category embedding according to claim 5, characterized in that the forward fully connected layer maps the category embedding of the Chinese characters into the visual space so that the dimension of the category embedding equals the dimension of the visual features of the text image.
7. The zero-shot Chinese character recognition method based on matching category embedding according to claim 5, characterized in that the reverse fully connected layer is formed from the transpose of the parameter matrix of the forward fully connected layer; the category embedding can be reconstructed through the reverse fully connected layer, and a reconstruction loss function computes the mean square error between the reconstructed category embedding and the original category embedding, so that the category embedding mapped into the visual space retains its original information.
CN202111038228.6A 2021-09-06 2021-09-06 Zero-shot Chinese character recognition method based on matching category embedding Active CN113723421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111038228.6A CN113723421B (en) Zero-shot Chinese character recognition method based on matching category embedding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111038228.6A CN113723421B (en) Zero-shot Chinese character recognition method based on matching category embedding

Publications (2)

Publication Number Publication Date
CN113723421A (en) 2021-11-30
CN113723421B 2023-10-17

Family

ID=78681933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111038228.6A Active CN113723421B (en) Zero-shot Chinese character recognition method based on matching category embedding

Country Status (1)

Country Link
CN (1) CN113723421B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570456A (en) * 2016-10-13 2017-04-19 华南理工大学 Handwritten Chinese character recognition method based on full-convolution recursive network
CN108399421A (en) * 2018-01-31 2018-08-14 南京邮电大学 A kind of zero sample classification method of depth of word-based insertion
CN112508108A (en) * 2020-12-10 2021-03-16 西北工业大学 Zero-sample Chinese character recognition method based on etymons
CN112633431A (en) * 2020-12-31 2021-04-09 西北民族大学 Tibetan-Chinese bilingual scene character recognition method based on CRNN and CTC

Also Published As

Publication number Publication date
CN113723421A (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant