CN113221885B - Hierarchical modeling method and system based on whole words and radicals - Google Patents
- Publication number
- CN113221885B (application CN202110523430.1A)
- Authority
- CN
- China
- Prior art keywords
- decoding
- whole word
- whole
- radical
- text line
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V30/10—Character recognition
Abstract
The invention relates to a hierarchical modeling method and system based on whole words and radicals, wherein the method comprises the following steps. S1: pass the text line image through a convolutional neural network and a recurrent neural network to obtain the sequence features of the text line image. S2: input the sequence features of the text line image into a whole-word decoding module with an attention mechanism to obtain the context feature vector of the whole word and the decoding result of the whole word. S3: input the context feature vector of the whole word into a radical decoding module to obtain the decoding result of each radical at the whole-word level. S4: fuse the decoding confidences of the whole word and of each radical using a confidence-score fusion strategy to obtain the recognition result of the whole word. The method realizes recognition of both the whole word and its radicals at each decoding moment; by fusing the two decoding confidences, it improves the recognition of low-frequency characters while preserving, as far as possible, the recognition of non-low-frequency characters.
Description
Technical Field
The invention relates to the technical field of electronic information, in particular to a hierarchical modeling method and system based on whole words and radicals.
Background
In daily life, text is an indispensable source of visual information. Compared with other content in images and videos, text often carries stronger semantic information, so extracting and recognizing text in images is of great significance. With the rapid development of deep learning, deep learning models have been widely applied to the field of text recognition. However, deep learning models require large amounts of training data; with few training samples it is difficult to train a model well. In particular, for languages with very large character sets, such as Chinese, recognizing low-frequency characters is difficult.
Existing schemes for recognizing low-frequency characters fall into two categories. The first employs a language model: a language model is trained on a larger text corpus and assists in recognizing low-frequency characters. The second employs radical modeling, i.e. characters are split according to their radicals; for example, the character 科 ("subject") is split into ⿰, 禾 ("rice") and 斗 ("bucket"), where ⿰ indicates a left-right structure.
For the language-model scheme, the recognition of low-frequency characters relies excessively on the language model, and the choice of training corpus strongly affects the recognition of low-frequency characters. For the radical-modeling scheme, whole characters are split too finely; for example, the character 朋 is split into 月 and 月, and each individual component can itself be regarded as a whole character, which increases the recognition difficulty.
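The radical-splitting idea above can be illustrated with a small sketch that expands characters into radical sequences. The decomposition table and function names here are hypothetical stand-ins; a real system would use a full ideographic-description decomposition dictionary covering the Chinese character set:

```python
# Hypothetical radical decomposition table (ideographic-description style).
# A real system would cover the full Chinese character set.
DECOMP = {
    "科": ["⿰", "禾", "斗"],  # left-right structure marker, then the components
    "朋": ["⿰", "月", "月"],
    "明": ["⿰", "日", "月"],
}

def to_radical_sequence(text):
    """Expand each character into its radical sequence when a decomposition
    exists; characters without an entry are kept as whole characters."""
    out = []
    for ch in text:
        out.extend(DECOMP.get(ch, [ch]))
    return out
```

Note how 朋 expands into two copies of 月, illustrating the problem described above: purely radical-level decoding blurs the boundary between components and whole characters.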
Disclosure of Invention
In order to solve the technical problems, the invention provides a hierarchical modeling method and system based on whole words and radicals.
The technical solution of the invention is as follows: a hierarchical modeling method based on whole words and radicals comprises the following steps:
step S1: the method comprises the steps that a text line image passes through a convolutional neural network and a cyclic neural network to obtain sequence characteristics of the text line image;
step S2: inputting the sequence characteristics of the text line images into a whole word decoding module with an attention mechanism to obtain a context characteristic vector of a whole word and a decoding result of the whole word;
step S3: inputting the context feature vector of the whole word into a radical decoding module to obtain the decoding result of each radical under the whole word level;
step S4: and respectively calculating the confidence coefficient of the decoding result of the whole word and the confidence coefficient of the decoding result of each radical by using a confidence coefficient score fusion strategy, and fusing to obtain the final recognition result of the whole word.
Compared with the prior art, the invention has the following advantages:
the invention provides a hierarchical modeling based on whole words and partial radicals, which uses the idea of partial radical modeling for reference, but is different from the existing partial radical modeling method.
Drawings
FIG. 1 is a flowchart of the hierarchical modeling method based on whole words and radicals according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S1 of the method (passing the text line image through a convolutional neural network and a recurrent neural network to obtain the sequence features of the text line image);
FIG. 3 is a flowchart of step S2 of the method (inputting the sequence features of the text line image into a whole-word decoding module with attention mechanism to obtain the context feature vector of the whole word and the decoding result of the whole word);
FIG. 4 is a flowchart of step S3 of the method (inputting the context feature vector of the whole word into a radical decoding module to obtain the decoding result of each radical at the whole-word level);
FIG. 5 is a flowchart of step S4 of the method (calculating the confidence of the whole-word decoding result and the confidence of the radical decoding results with a confidence-score fusion strategy, and fusing them to obtain the final recognition result of the whole word);
FIG. 6 is a schematic diagram of the modeling structure of radicals at the whole-word level according to an embodiment of the present invention;
FIG. 7 is a block diagram of the hierarchical modeling system based on whole words and radicals in an embodiment of the present invention.
Detailed Description
The invention provides a hierarchical modeling method based on whole words and radicals. It adopts the strategy of adding a radical-modeling branch under the whole-word modeling level, so that it realizes not only the recognition of the whole word but also the recognition of its radicals at the same moment. Finally, by fusing the whole-word modeling confidence with the radical-modeling confidence, it improves the recognition of low-frequency characters while preserving, as far as possible, the recognition of non-low-frequency characters.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in fig. 1, a hierarchical modeling method based on whole words and radicals according to an embodiment of the present invention includes the following steps:
step S1: the text line image is subjected to a convolutional neural network and a cyclic neural network to obtain sequence characteristics of the text line image;
step S2: inputting the sequence characteristics of the text line images into a whole word decoding module with an attention mechanism to obtain context characteristic vectors of whole words and decoding results of the whole words;
step S3: inputting the context feature vector of the whole word into a radical decoding module to obtain the decoding result of each radical under the whole word level;
step S4: and respectively calculating the confidence coefficient of the decoding result of the whole word and the confidence coefficient of the decoding result of each radical by using a confidence coefficient score fusion strategy, and fusing to obtain the final recognition result of the whole word.
As shown in fig. 2, in one embodiment, the step S1 of passing the text line image through a convolutional neural network and a recurrent neural network to obtain the sequence features of the text line image specifically comprises the following steps:
Step S11: normalize the text line image to obtain a normalized text line image;
In the embodiment of the invention, the text line image is scaled to a height of 64 pixels, and the pixel values are normalized to [-1, 1].
Step S12: inputting the normalized text line image into a convolutional neural network to obtain a feature vector of the text line image;
in this step, the normalized text line image obtained in step S11 is input to a convolutional neural network for feature extraction, but the embodiment of the present invention uses a Resnet29 neural network, the height direction of the image is downsampled 6 times, that is, reduced by 64 times, the width direction of the image is downsampled 3 times, that is, reduced by 8 times, and the size of the obtained text line image feature map is [ H, l, d ], because in the embodiment of the present invention, the image height is 64 pixels, after passing through the Resnet29S neural network, H denotes that the height H of the feature map is 1, l denotes the length of the feature map, and d denotes the number of channels of the feature map. And carrying out slicing operation on the obtained feature map in length so as to obtain feature vectors with the dimension of l being d.
Step S13: and inputting the feature vector into a recurrent neural network to obtain the sequence features of the text line image.
In this step, the l feature vectors obtained in step S12 are input into the recurrent neural network. The embodiment of the invention uses two layers of bidirectional LSTM as the recurrent neural network to output the sequence features of the text line image; the length of the output sequence features is l.
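The tensor shapes in steps S12 and S13 can be checked with the following sketch. The ResNet-29 CNN and the two-layer bidirectional LSTM are replaced by random stand-ins, so only the shapes are meaningful here, not the feature values:

```python
import numpy as np

def encode_text_line(img, d=512):
    """Step S1 shape sketch: a 64-pixel-high image is downsampled 64x in
    height and 8x in width, giving a feature map [1, l, d] that is sliced
    along its length into l vectors of dimension d, then passed through a
    stand-in for the recurrent network."""
    h, w = img.shape                     # h == 64 after normalization
    l = w // 8                           # width downsampled 3 times (2^3 = 8)
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((h // 64, l, d))  # stand-in CNN output
    frames = feature_map[0]              # slicing: l vectors of dimension d
    W_rnn = rng.standard_normal((d, d)) * 0.01  # stand-in for the BiLSTM
    return frames @ W_rnn                # sequence features of length l
```

For a 64x256 input this yields l = 32 sequence frames of dimension d = 512.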
As shown in fig. 3, in one embodiment, the step S2 of inputting the sequence features of the text line image into a whole-word decoding module with attention mechanism to obtain the context feature vector of the whole word and the decoding result of the whole word specifically comprises the following steps:
Step S21: input the sequence features of the text line image into the whole-word decoding module with attention mechanism given by formulas (1) to (3) below, to obtain the context feature vector c_t of the whole word;
e_ti = o(s_{t-1}, h_i)    (1)
α_ti = exp(e_ti) / Σ_{j=1}^{l} exp(e_tj)    (2)
c_t = Σ_{i=1}^{l} α_ti · h_i    (3)
wherein s_{t-1} is the hidden state at the previous moment, h_i is the i-th frame of the sequence features, and o denotes the dot-product operation; α_ti is the attention-mechanism weight and l is the number of feature vectors; c_t is the context feature vector of the whole word;
the whole word decoding module in the embodiment of the invention adopts a layer of unidirectional LSTM.
Step S22: output y of last moment t-1 And context feature vector c t After the operation of the cascade layer, the whole word decoding result y at the current moment is obtained through the classification layer t ;
The classification layer in the embodiment of the invention adopts a Softmax function.
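One decoding step of formulas (1) to (3) followed by the Softmax classifier can be sketched as below. The LSTM state update is omitted, and the weight matrix and embeddings are caller-supplied stand-ins rather than trained parameters:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def whole_word_decode_step(s_prev, H, y_prev_emb, W_cls):
    """s_prev: previous hidden state s_{t-1}, shape (d,);
    H: the l sequence-feature frames h_i, shape (l, d);
    y_prev_emb: embedding of the previous output y_{t-1};
    W_cls: classification-layer weights over the concatenated vector."""
    e = H @ s_prev                    # (1): e_ti = o(s_{t-1}, h_i), dot product
    alpha = softmax(e)                # (2): attention weights alpha_ti
    c_t = alpha @ H                   # (3): context feature vector c_t
    logits = W_cls @ np.concatenate([y_prev_emb, c_t])  # cascade + classify
    return c_t, softmax(logits)       # c_t and the whole-word distribution y_t
```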
As shown in fig. 4, in one embodiment, the step S3 of inputting the context feature vector of the whole word into a radical decoding module to obtain the decoding result of each radical at the whole-word level specifically comprises the following steps:
Step S31: the context feature vector c_t is input into the radical decoding module, whose output at moment t is r_t;
The radical decoding module in the embodiment of the invention also adopts one layer of unidirectional LSTM.
Step S32: r_t passes through a classification layer to obtain the decoding result of the radicals of the whole word;
the classification layer in this step also uses a Softmax function.
Step S33: and counting the number of the split radicals corresponding to each whole word in a batch, and taking the obtained maximum number as the maximum decoding length of the radicals in the batch.
During training, the radical decoding module counts, over all whole words in a batch, the number of radicals each whole word splits into, and the maximum count obtained is used as the maximum decoding length for radical decoding in that batch.
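The batch-level length computation of step S33 can be sketched as follows, with a hypothetical `decomp` table mapping each character to its radical sequence:

```python
def max_radical_length(batch_labels, decomp):
    """Step S33 sketch: over all whole words (characters) in the batch labels,
    count how many radicals each splits into and return the maximum; this
    maximum is used as the radical-decoding length for the batch."""
    return max(
        len(decomp.get(ch, [ch]))  # a character with no entry counts as 1
        for line in batch_labels
        for ch in line
    )
```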
As shown in fig. 5, in one embodiment, the step S4 of calculating the confidence of the whole-word decoding result and the confidence of the radical decoding results respectively by using a confidence-score fusion strategy and fusing them to obtain the final recognition result of the whole word specifically comprises the following steps:
Step S41: judge whether the whole-word decoding result y_t is a Chinese character; if not, take y_t as the final decoding result; if yes, calculate the confidence of the whole-word decoding according to formula (4) and the confidence of the radical decoding according to formula (5), and go to step S42;
-log p_i    (4)
-(1/l_i) Σ_{j=1}^{l_i} log p_i^j    (5)
wherein in formula (4) p_i denotes the recognition probability of the i-th decoded character; in formula (5), l_i denotes the number of radicals the i-th character splits into, and p_i^j denotes the recognition probability of the j-th radical of the i-th decoded character;
Step S42: compare the confidence of the whole-word decoding with the confidence of the radical decoding, and take the result with the smaller value as the final decoding result at this moment;
Step S43: repeat steps S41-S42 for the decoding at each moment until the maximum decoding length is reached or an end symbol is encountered.
Fig. 6 is a schematic diagram of a modeling structure of radicals at the whole word level according to an embodiment of the present invention.
The invention provides a hierarchical modeling method based on whole words and radicals. It adopts a hierarchical structural design in which a radical-modeling branch is added under the whole-word modeling level, with the context feature vector at each moment serving as the input of the radical modeling under the whole word. The method therefore realizes not only the recognition of the whole word but also the recognition of its radicals at the same moment. Finally, through the strategy of fusing the whole-word modeling confidence with the radical-modeling confidence, it improves the recognition of low-frequency characters while preserving, as far as possible, the recognition of non-low-frequency characters.
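The fusion strategy of formulas (4) and (5) can be sketched as follows. Confidences here are negative log-probabilities, so the branch with the smaller value is the more probable one; the function and argument names are illustrative:

```python
import math

def fuse_confidences(word_prob, radical_probs):
    """Pick the whole-word or the radical decoding result for one character:
    formula (4) gives -log p_i for the whole word, formula (5) the averaged
    radical confidence -(1/l_i) * sum_j log p_i^j; per step S42, the smaller
    value wins."""
    word_conf = -math.log(word_prob)
    radical_conf = -sum(math.log(p) for p in radical_probs) / len(radical_probs)
    return "word" if word_conf <= radical_conf else "radical"
```

A confidently decoded whole word (high probability) thus keeps its whole-word result, while an uncertain one falls back to the radical decoding.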
Example two
As shown in fig. 7, an embodiment of the present invention provides a hierarchical modeling system based on whole words and radicals, including the following modules:
a sequence feature obtaining module 51, configured to pass the text line image through a convolutional neural network and a recurrent neural network to obtain the sequence features of the text line image;
a whole word context feature vector and decoding result obtaining module 52, configured to input the sequence features of the text line image into a whole word decoding module with attention mechanism, so as to obtain a whole word context feature vector and a whole word decoding result;
a decoding result module 53 for obtaining each radical, configured to input the context feature vector of the whole word into the radical decoding module, and obtain a decoding result of each radical in the whole word level;
and the recognition result obtaining module 54 is configured to calculate the confidence of the decoding result of the whole word and the confidence of the decoding result of each radical respectively by using a confidence score fusion policy, and perform fusion to obtain a final recognition result of the whole word.
The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.
Claims (4)
1. A hierarchical modeling method based on whole words and radicals is characterized by comprising the following steps:
step S1: the text line image passes through a convolutional neural network and a recurrent neural network to obtain the sequence features of the text line image;
step S2: input the sequence features of the text line image into a whole-word decoding module with attention mechanism to obtain the context feature vector of the whole word and the decoding result of the whole word, specifically comprising:
step S21: input the sequence features of the text line image into the whole-word decoding module with attention mechanism given by formulas (1) to (3) below, to obtain the context feature vector c_t of the whole word;
e_ti = o(s_{t-1}, h_i)    (1)
α_ti = exp(e_ti) / Σ_{j=1}^{l} exp(e_tj)    (2)
c_t = Σ_{i=1}^{l} α_ti · h_i    (3)
wherein s_{t-1} is the hidden state at the previous moment, h_i is the i-th frame of the sequence features, and o denotes the dot-product operation; α_ti is the attention-mechanism weight and l is the number of feature vectors; c_t is the context feature vector of the whole word;
step S22: the output y_{t-1} at the previous moment and the context feature vector c_t are concatenated by the cascade layer and then passed through the classification layer to obtain the whole-word decoding result y_t at the current moment;
step S3: input the context feature vector of the whole word into a radical decoding module to obtain the decoding result of each radical at the whole-word level, specifically comprising:
step S31: the context feature vector c_t is input into the radical decoding module, whose output at moment t is r_t;
step S32: r_t passes through a classification layer to obtain the decoding result of the radicals of the whole word;
step S33: count the number of radicals each whole word in a batch splits into, and take the maximum count as the maximum radical decoding length for the batch;
step S4: calculate the confidence of the whole-word decoding result and the confidence of the radical decoding results respectively by using a confidence-score fusion strategy, and fuse them to obtain the final recognition result of the whole word.
2. The hierarchical modeling method based on whole words and radicals according to claim 1, characterized in that the step S1 of passing the text line image through a convolutional neural network and a recurrent neural network to obtain the sequence features of the text line image specifically includes the following steps:
step S11: normalize the text line image to obtain a normalized text line image;
step S12: input the normalized text line image into the convolutional neural network to obtain the feature vectors of the text line image;
step S13: input the feature vectors into the recurrent neural network to obtain the sequence features of the text line image.
3. The hierarchical modeling method based on whole words and radicals according to claim 1, characterized in that the step S4 of calculating the confidence of the whole-word decoding result and the confidence of the radical decoding results respectively by using a confidence-score fusion strategy and fusing them to obtain the final recognition result of the whole word specifically includes the following steps:
step S41: judge whether the whole-word decoding result y_t is a Chinese character; if not, take y_t as the final decoding result; if yes, calculate the confidence of the whole-word decoding according to formula (4) and the confidence of the radical decoding according to formula (5), and go to step S42;
-log p_i    (4)
-(1/l_i) Σ_{j=1}^{l_i} log p_i^j    (5)
wherein in formula (4) p_i denotes the recognition probability of the i-th decoded character; in formula (5), l_i denotes the number of radicals the i-th character splits into, and p_i^j denotes the recognition probability of the j-th radical of the i-th decoded character;
step S42: compare the confidence of the whole-word decoding with the confidence of the radical decoding, and take the result with the smaller value as the final decoding result at moment t;
step S43: repeat steps S41-S42 for the decoding at each moment until the maximum decoding length is reached or an end symbol is encountered.
4. A hierarchical modeling system based on whole words and radicals is characterized by comprising the following modules:
the sequence feature module, configured to pass the text line image through a convolutional neural network and a recurrent neural network to obtain the sequence features of the text line image;
the whole-word context feature vector and decoding result obtaining module, configured to input the sequence features of the text line image into a whole-word decoding module with attention mechanism to obtain the context feature vector of the whole word and the decoding result of the whole word, wherein the module specifically performs:
step S21: input the sequence features of the text line image into the whole-word decoding module with attention mechanism given by formulas (1) to (3) below, to obtain the context feature vector c_t of the whole word;
e_ti = o(s_{t-1}, h_i)    (1)
α_ti = exp(e_ti) / Σ_{j=1}^{l} exp(e_tj)    (2)
c_t = Σ_{i=1}^{l} α_ti · h_i    (3)
wherein s_{t-1} is the hidden state at the previous moment, h_i is the i-th frame of the sequence features, and o denotes the dot-product operation; α_ti is the attention-mechanism weight and l is the number of feature vectors; c_t is the context feature vector of the whole word;
step S22: the output y_{t-1} at the previous moment and the context feature vector c_t are concatenated by the cascade layer and then passed through the classification layer to obtain the whole-word decoding result y_t at the current moment;
the radical decoding result obtaining module, configured to input the context feature vector of the whole word into the radical decoding module to obtain the decoding result of each radical at the whole-word level, wherein the module specifically performs:
step S31: the context feature vector c_t is input into the radical decoding module, whose output at moment t is r_t;
step S32: r_t passes through a classification layer to obtain the decoding result of the radicals of the whole word;
step S33: count the number of radicals each whole word in a batch splits into, and take the maximum count as the maximum radical decoding length for the batch;
the recognition result obtaining module, configured to calculate the confidence of the whole-word decoding result and the confidence of the radical decoding results respectively by using a confidence-score fusion strategy, and fuse them to obtain the final recognition result of the whole word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110523430.1A CN113221885B (en) | 2021-05-13 | 2021-05-13 | Hierarchical modeling method and system based on whole words and radicals |
Publications (2)
Publication Number | Publication Date
---|---
CN113221885A | 2021-08-06
CN113221885B | 2022-09-06
Family
ID=77095683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110523430.1A Active CN113221885B (en) | 2021-05-13 | 2021-05-13 | Hierarchical modeling method and system based on whole words and radicals |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113221885B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115187997B (en) * | 2022-07-13 | 2023-07-28 | 厦门理工学院 | Zero-sample Chinese character recognition method based on key component analysis |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107305630A (en) * | 2016-04-25 | 2017-10-31 | 腾讯科技(深圳)有限公司 | Text sequence recognition methods and device |
CN107797992A (en) * | 2017-11-10 | 2018-03-13 | 北京百分点信息科技有限公司 | Name entity recognition method and device |
CN109389091A (en) * | 2018-10-22 | 2019-02-26 | 重庆邮电大学 | The character identification system and method combined based on neural network and attention mechanism |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN111126410A (en) * | 2019-12-31 | 2020-05-08 | 讯飞智元信息科技有限公司 | Character recognition method, device, equipment and readable storage medium |
CN111401268A (en) * | 2020-03-19 | 2020-07-10 | 内蒙古工业大学 | Multi-mode emotion recognition method and device for open environment |
CN111553349A (en) * | 2020-04-26 | 2020-08-18 | 佛山市南海区广工大数控装备协同创新研究院 | Scene text positioning and identifying method based on full convolution network |
Non-Patent Citations (2)
- Guo-lin Zhang et al., "Fused Confidence for Scene Text Detection via Intersection-over-Union," 2019 IEEE 19th International Conference on Communication Technology (ICCT), pp. 1540-1543.
- Guo Xuchao et al., "Named Entity Recognition of Crop Diseases and Pests Based on Radical Embedding and Attention Mechanism," Transactions of the Chinese Society for Agricultural Machinery, 2020, vol. 51, pp. 335-343.
Also Published As
Publication number | Publication date |
---|---|
CN113221885A (en) | 2021-08-06 |
Legal Events
Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant