CN112990175A - Method and device for recognizing handwritten Chinese characters, computer equipment and storage medium - Google Patents

Method and device for recognizing handwritten Chinese characters, computer equipment and storage medium Download PDF

Info

Publication number
CN112990175A
CN112990175A (application CN202110357440.2A)
Authority
CN
China
Prior art keywords
image
text
handwritten Chinese
features
semantic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110357440.2A
Other languages
Chinese (zh)
Other versions
CN112990175B (en)
Inventor
邱泰儒
姚旭峰
贾佳亚
沈小勇
吕江波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Smartmore Technology Co Ltd
Shanghai Smartmore Technology Co Ltd
Original Assignee
Shenzhen Smartmore Technology Co Ltd
Shanghai Smartmore Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Smartmore Technology Co Ltd, Shanghai Smartmore Technology Co Ltd filed Critical Shenzhen Smartmore Technology Co Ltd
Priority to CN202110357440.2A priority Critical patent/CN112990175B/en
Publication of CN112990175A publication Critical patent/CN112990175A/en
Application granted granted Critical
Publication of CN112990175B publication Critical patent/CN112990175B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to a method and a device for recognizing handwritten Chinese characters, a computer device, and a storage medium. The method comprises the following steps: acquiring an image to be recognized, the image to be recognized comprising handwritten Chinese characters; extracting target image features of the image to be recognized, the target image features being used to represent text features of the image to be recognized; segmenting the target image features to obtain semantic information features of the handwritten Chinese characters; and determining a recognition result of the handwritten Chinese characters in the image to be recognized according to their semantic information features. Because recognition is performed only on the semantic information features of the handwritten Chinese characters, the method helps improve the accuracy of handwritten Chinese character recognition.

Description

Method and device for recognizing handwritten Chinese characters, computer equipment and storage medium
Technical Field
The present application relates to the field of character recognition technology, and in particular, to a method and an apparatus for recognizing handwritten Chinese characters, a computer device, and a storage medium.
Background
Text is one of the most important information carriers today and is ubiquitous in daily life, office work, and teaching; Chinese characters are a typical example.
At present, methods for recognizing handwritten Chinese characters generally cut out each handwritten character in a text line and then recognize the characters one by one to obtain the content of the whole line. However, recognition of handwritten Chinese characters is susceptible to the writer's style, and recognizing a text line merely by dividing it into individual characters results in low recognition accuracy.
Disclosure of Invention
In view of the above, there is a need to provide a method, an apparatus, a computer device, and a storage medium for recognizing handwritten Chinese characters that can improve recognition accuracy.
A method for recognizing handwritten Chinese characters, the method comprising:
acquiring an image to be recognized; the image to be recognized comprises handwritten Chinese characters;
extracting target image characteristics of the image to be recognized; the target image feature is used for representing a text feature of the image to be recognized;
segmenting the target image features to obtain semantic information features of the handwritten Chinese characters;
and determining the recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character.
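The four claimed steps can be sketched end to end as follows; every helper here is a simplified stand-in for illustration only, not the patent's actual implementation:

```python
import numpy as np

def extract_target_features(image):
    # Stand-in for the patent's CNN backbone: flatten each image row
    # into one feature vector per sequence step (step 2).
    return image.reshape(image.shape[0], -1)

def split_features(features):
    # Halve the feature dimension: first half stands in for the semantic
    # information features, second half for font appearance (step 3).
    d = features.shape[-1]
    return features[..., : d // 2], features[..., d // 2:]

def decode_text(semantic_features):
    # Stand-in decoder: one placeholder character per sequence step (step 4).
    return "".join("字" for _ in range(len(semantic_features)))

def recognize_handwritten_line(image):
    target_features = extract_target_features(image)    # step 2
    semantic, _font = split_features(target_features)   # step 3
    return decode_text(semantic)                        # step 4
```

Only the semantic half of the split features reaches the decoder, mirroring the claim that recognition uses semantic information features alone.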
In one embodiment, the extracting the target image feature of the image to be recognized includes:
inputting the image to be recognized into a feature extraction model to obtain image features output by a neural network layer at least two preset positions in the feature extraction model;
and aggregating the image features output by the neural network layers at the at least two preset positions to obtain the target image features of the image to be recognized.
In one embodiment, before performing segmentation processing on the target image feature to obtain a semantic information feature of the handwritten Chinese character, the method further includes:
performing convolution processing on the target image features to obtain target image features after convolution processing;
the segmenting processing of the target image features to obtain the semantic information features of the handwritten Chinese characters comprises the following steps:
carrying out segmentation processing on the target image features after the convolution processing to obtain first image features and second image features; the first image feature is used for representing semantic information features of the handwritten Chinese character, the second image feature is used for representing font appearance features of the handwritten Chinese character, and feature dimensions of the first image feature and the second image feature are the same;
and determining the first image feature as the semantic information feature of the handwritten Chinese character.
In one embodiment, the determining, according to the semantic information feature of the handwritten Chinese character, a recognition result of the handwritten Chinese character in the image to be recognized includes:
converting the semantic information features of the handwritten Chinese characters into corresponding text sequence features;
and acquiring a combination of the characters corresponding to each sequence feature in the text sequence features as a recognition result of the handwritten Chinese characters in the image to be recognized.
In one embodiment, the obtaining a combination of the characters corresponding to each sequence feature in the text sequence features as a recognition result of the handwritten Chinese characters in the image to be recognized includes:
inputting the text sequence features into a pre-trained text prediction model to obtain a recognition result of the handwritten Chinese characters in the image to be recognized; the pre-trained text prediction model is used to acquire the character corresponding to each sequence feature in the text sequence features and to combine those characters into the recognition result of the handwritten Chinese characters in the image to be recognized.
In one embodiment, the pre-trained text prediction model is trained by:
acquiring sample semantic information features and sample font appearance features; wherein the sample semantic information features comprise a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image, and the sample font appearance features comprise a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image; the second text image has the same text content as the first text image but a different writer; the third text image has different text content from the first text image but the same writer;
inputting a first text sequence characteristic corresponding to the first semantic information characteristic into a text prediction model to be trained to obtain a recognition result of a handwritten Chinese character in the first text image;
obtaining a target loss value according to the sample semantic information characteristics, the sample font appearance characteristics and the recognition result of the handwritten Chinese characters in the first text image;
and adjusting the model parameters of the text prediction model to be trained according to the target loss value, repeatedly training the text prediction model after model parameter adjustment until the target loss value obtained according to the trained text prediction model is smaller than a preset threshold value, and taking the trained text prediction model as the pre-trained text prediction model.
In one embodiment, the obtaining a target loss value according to the sample semantic information features, the sample font appearance features, and a recognition result of a handwritten Chinese character in the first text image includes:
obtaining a first loss value according to the first semantic information characteristic, the second semantic information characteristic and the third semantic information characteristic;
obtaining a second loss value according to the first font appearance characteristic, the second font appearance characteristic and the third font appearance characteristic;
obtaining a third loss value according to a difference value between a recognition result of the handwritten Chinese character in the first text image and an actual result of the handwritten Chinese character;
and obtaining the target loss value according to the first loss value, the second loss value and the third loss value.
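The patent does not fix the exact loss formulas. One common realization of the two feature losses is a margin-based triplet loss, sketched here with numpy; the margin value, the random feature vectors, and the placeholder recognition loss are all assumptions for illustration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Margin-based triplet loss on L2 distances: pull the positive pair
    # together, push the negative pair apart by at least `margin`.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
sem1, sem2, sem3 = (rng.standard_normal(512) for _ in range(3))
font1, font2, font3 = (rng.standard_normal(512) for _ in range(3))

# First loss value: text images 1 and 2 share text content (positive
# pair); text image 3 has different content (negative).
loss_semantic = triplet_loss(sem1, sem2, sem3)
# Second loss value: text images 1 and 3 share the writer (positive
# pair); text image 2 has a different writer (negative).
loss_font = triplet_loss(font1, font3, font2)
# Third loss value: a recognition loss (e.g. cross-entropy or CTC);
# a placeholder constant stands in for it here.
loss_recognition = 0.5

# Target loss as a plain sum; the patent does not specify weights.
target_loss = loss_semantic + loss_font + loss_recognition
```

Training would then adjust the model parameters to drive `target_loss` below the preset threshold, as described above.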
A device for recognizing handwritten Chinese characters, the device comprising:
the image acquisition module is used for acquiring an image to be recognized; the image to be recognized comprises handwritten Chinese characters;
the characteristic extraction module is used for extracting the target image characteristics of the image to be recognized; the target image feature is used for representing a text feature of the image to be recognized;
the feature segmentation module is used for carrying out segmentation processing on the target image features to obtain semantic information features of the handwritten Chinese characters;
and the character recognition module is used for determining a recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an image to be recognized; the image to be recognized comprises handwritten Chinese characters;
extracting target image characteristics of the image to be recognized; the target image feature is used for representing a text feature of the image to be recognized;
segmenting the target image features to obtain semantic information features of the handwritten Chinese characters;
and determining the recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring an image to be recognized; the image to be recognized comprises handwritten Chinese characters;
extracting target image characteristics of the image to be recognized; the target image feature is used for representing a text feature of the image to be recognized;
segmenting the target image features to obtain semantic information features of the handwritten Chinese characters;
and determining the recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character.
According to the method, device, computer equipment, and storage medium for recognizing handwritten Chinese characters, an image to be recognized comprising handwritten Chinese characters is acquired, and target image features of the image, used to represent its text features, are extracted; the target image features are then segmented to obtain semantic information features of the handwritten Chinese characters; finally, the recognition result of the handwritten Chinese characters in the image to be recognized is determined according to those semantic information features. Recognition is thus performed only on the semantic information features of the handwritten Chinese characters, without considering their appearance feature information, which improves recognition accuracy and overcomes the defect that recognition accuracy is low when recognition is influenced by the writer's style.
Drawings
FIG. 1 is a flow diagram illustrating a method for recognition of handwritten Chinese characters in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating the training steps of the text prediction model in one embodiment;
FIG. 3 is a diagram illustrating training of a text prediction model in one embodiment;
FIG. 4 is a flow chart illustrating a method for recognition of handwritten Chinese characters in yet another embodiment;
FIG. 5 is a block diagram of an apparatus for recognition of handwritten Chinese characters in one embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In an embodiment, as shown in fig. 1, a method for recognizing handwritten Chinese characters is provided. The method is described as applied to a server by way of example; the server may be an independent server or a server cluster formed by a plurality of servers. It is understood that the method may also be applied to a terminal, or to a system comprising a terminal and a server and realized through their interaction. In this embodiment, the method includes the following steps:
step S101, acquiring an image to be identified; the image to be recognized comprises handwritten Chinese characters.
The image to be recognized refers to an image including a string of handwritten Chinese characters, such as an image of Chinese characters written by a real user; in an actual scene, the image to be recognized may be uploaded by a terminal, obtained from a network, or stored locally.
Wherein, a handwritten Chinese character refers to a Chinese character written by a real user; it should be noted that the handwritten Chinese characters mentioned in the present application refer to a text line of handwritten Chinese characters, composed of a plurality of single handwritten characters.
Specifically, the terminal generates a character recognition request according to an image to be recognized which is uploaded by a user and comprises a handwritten Chinese character, and sends the character recognition request to a corresponding server; and the server analyzes the received character recognition request to obtain an image to be recognized.
Of course, the server may also obtain the image to be recognized including the handwritten Chinese characters from a local database and recognize it.
Step S102, extracting target image characteristics of an image to be recognized; the target image features are used for representing text features of the image to be recognized.
The target image features refer to image features used for representing text features of the image to be recognized, and specifically refer to image features including depth feature information and multi-scale feature information of the image to be recognized.
Specifically, the server performs feature extraction processing on the image to be recognized through a preset target image feature extraction instruction to obtain a target image feature of the image to be recognized, and the target image feature is used as a text feature of the image to be recognized; the preset target image feature extraction instruction is an instruction for extracting a target image feature of an image to be recognized.
Alternatively, the server may input the image to be recognized into a feature extraction model and perform convolution processing on it through the model to obtain the target image features of the image to be recognized; the feature extraction model is a neural network model used for extracting the target image features of the image to be recognized.
Step S103, the target image characteristics are segmented to obtain the semantic information characteristics of the handwritten Chinese characters.
The semantic information features of the handwritten Chinese characters are used to represent the content information of the handwritten Chinese characters. It should be noted that the text features include semantic information features and font appearance features; the target image features used to represent the text features of the image to be recognized therefore contain semantic information features, which can be obtained by segmenting the target image features.
Specifically, the server acquires a preset semantic font decoupling instruction, and according to the preset semantic font decoupling instruction, divides semantic information features from target image features used for representing text features of an image to be recognized to serve as the semantic information features of the handwritten Chinese characters. Therefore, the method is beneficial to determining the recognition result of the handwritten Chinese character in the image to be recognized subsequently according to the semantic information characteristics of the handwritten Chinese character without considering the appearance characteristics of the font, thereby improving the recognition accuracy of the handwritten Chinese character.
And step S104, determining the recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character.
The recognition result of the handwritten Chinese character in the image to be recognized refers to the text content corresponding to the handwritten Chinese character in the image to be recognized.
Specifically, the server converts the semantic information features of the handwritten Chinese characters into corresponding text sequence features, inputs the text sequence features into a text prediction model, and processes them through the model to obtain the recognition result of the handwritten Chinese characters in the image to be recognized. In this way, the recognition result is determined only from the semantic information features of the handwritten Chinese characters, which improves recognition accuracy.
For example, the server converts the semantic information features of the handwritten Chinese characters to obtain the text sequence features of the handwritten Chinese characters; mapping each sequence feature in the text sequence features of the handwritten Chinese characters to obtain characters corresponding to each sequence feature in the text sequence features; and combining the characters corresponding to each sequence feature in the text sequence features to obtain text contents corresponding to the handwritten Chinese characters, wherein the text contents are used as recognition results of the handwritten Chinese characters in the image to be recognized.
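The per-step mapping and combination described above can be sketched as a greedy decode; the tiny character set and the score values are hypothetical (a real model covers thousands of Chinese characters):

```python
import numpy as np

# Hypothetical four-character set for illustration only.
charset = ["你", "好", "世", "界"]

def greedy_decode(sequence_scores):
    # Map each sequence feature (here: per-step character scores) to its
    # highest-scoring character, then join the characters into the line.
    indices = sequence_scores.argmax(axis=-1)
    return "".join(charset[i] for i in indices)

scores = np.array([
    [0.9, 0.0, 0.1, 0.0],  # step 1: highest score for 你
    [0.1, 0.8, 0.0, 0.1],  # step 2: highest score for 好
])
text = greedy_decode(scores)  # "你好"
```

In the patent's pipeline this role is played by the pre-trained text prediction model rather than a plain argmax; the sketch only shows the map-then-combine structure.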
In the method for recognizing handwritten Chinese characters, an image to be recognized comprising handwritten Chinese characters is acquired, and target image features used to represent its text features are extracted; the target image features are then segmented to obtain semantic information features of the handwritten Chinese characters; finally, the recognition result of the handwritten Chinese characters in the image to be recognized is determined according to those semantic information features. Recognition is thus performed only on the semantic information features, without considering the appearance feature information of the handwritten Chinese characters, which improves recognition accuracy and overcomes the defect that recognition accuracy is low when recognition is influenced by the writer's style.
In an embodiment, the step S101 of extracting the target image feature of the image to be recognized specifically includes: inputting an image to be recognized into a feature extraction model to obtain image features output by a neural network layer at least two preset positions in the feature extraction model; and aggregating the image characteristics output by the neural network layers at least two preset positions to obtain the target image characteristics of the image to be recognized.
The feature extraction model is a neural network model comprising a plurality of neural network layers and is used for extracting the image features of the image to be recognized. It should be noted that the neural network layers at at least two preset positions refer to two or more neural network layers at specific positions; in practice, they are typically three consecutive neural network layers.
Specifically, the server inputs the image to be recognized into the feature extraction model and obtains the image features output by the third, fourth, and fifth neural network layers of the model; the image features output by the third and fourth layers contain multi-scale feature information, and the image features output by the fifth layer contain depth feature information. The server then concatenates the image features output by the third, fourth, and fifth layers through an image feature concatenation instruction to obtain image features containing both multi-scale and depth feature information, which serve as the target image features of the image to be recognized and represent its text features.
For example, the server inputs the image to be recognized into a residual network (ResNet-50) and uses a Feature Pyramid Network (FPN) to aggregate the image features output by the 3rd, 4th, and 5th residual modules of the residual network, obtaining image features that include multi-scale and depth feature information as the target image features representing the text features of the image to be recognized. Concretely, the server upsamples the image features output by the 5th residual module and then concatenates and convolves them with the image features output by the 4th residual module; the result is upsampled again and concatenated and convolved with the image features output by the 3rd residual module to obtain the target image features of the image to be recognized.
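The top-down aggregation described above can be sketched in numpy; the channel counts and spatial sizes are illustrative assumptions, and the convolution after each concatenation is omitted for brevity:

```python
import numpy as np

def upsample2x(feat):
    # Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map.
    return feat.repeat(2, axis=1).repeat(2, axis=2)

# Illustrative outputs of residual modules 3, 4 and 5; these channel
# counts and spatial sizes are assumptions, not values from the patent.
c3 = np.random.rand(256, 16, 64)
c4 = np.random.rand(512, 8, 32)
c5 = np.random.rand(1024, 4, 16)

# Top-down aggregation: upsample the deepest features, concatenate with
# the next stage, and repeat (the per-stage convolution is omitted).
p4 = np.concatenate([upsample2x(c5), c4], axis=0)  # shape (1536, 8, 32)
p3 = np.concatenate([upsample2x(p4), c3], axis=0)  # shape (1792, 16, 64)
target_features = p3
```

In a real FPN the concatenation would be followed by a convolution that also reduces the channel count; the sketch only shows how deep and shallow features end up fused at the shallowest resolution.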
According to the technical solution provided by this embodiment, target image features including multi-scale and depth feature information are obtained by inputting the image to be recognized into the feature extraction model, so that the recognition result of the handwritten Chinese characters subsequently obtained from the target image features is more accurate, further improving recognition accuracy.
In an embodiment, before the step S103 of performing segmentation processing on the target image features to obtain the semantic information features of the handwritten Chinese characters, the method further includes: performing convolution processing on the target image features to obtain convolved target image features; then, in step S103, the segmentation processing is performed on the convolved target image features to obtain a first image feature and a second image feature, wherein the first image feature is used to represent the semantic information features of the handwritten Chinese characters, the second image feature is used to represent the font appearance features of the handwritten Chinese characters, and the two features have the same feature dimension; and the first image feature is determined as the semantic information feature of the handwritten Chinese characters.
The convolution processing is performed on the target image features in order to reduce the dimension of the target image features. The font appearance characteristics of the handwritten Chinese characters are used for representing appearance information of the handwritten Chinese characters.
Specifically, the server inputs the target image features into a convolutional network and performs convolution processing on them through the convolutional network to obtain the convolved target image features; inputs the convolved target image features into a semantic-font decoupling network and performs segmentation processing on them through the semantic-font decoupling network to obtain two groups of image features, namely a first image feature used for representing the semantic information features of the handwritten Chinese characters and a second image feature used for representing the font appearance features of the handwritten Chinese characters; and finally, recognizes the first image feature as the semantic information features of the handwritten Chinese characters and the second image feature as the font appearance features of the handwritten Chinese characters.
For example, the server performs dimension reduction on the target image features through a 1 × 1 convolutional network to obtain dimension-reduced target image features, and then cuts the dimension-reduced target image features in half along the feature dimension to obtain a first image feature and a second image feature of the same feature dimension, used for representing the semantic information features and the font appearance features of the handwritten Chinese characters respectively; for example, a 1024-dimensional target image feature is divided into two 512-dimensional image features.
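The dimension-reduction-and-split step above can be sketched as follows. This is an illustrative stand-in, not the patented implementation: the 1 × 1 convolution is imitated by a fixed averaging projection, and the input feature is a made-up flat vector.

```python
# Sketch (not the patented implementation): reduce a feature vector's
# dimensionality and split it into two equal halves, mimicking the
# 1x1-convolution-then-split step described above. The projection is a
# fixed averaging, a placeholder for trained 1x1-conv weights.

def reduce_and_split(feature, out_dim):
    # A 1x1 convolution at a single spatial position is a linear
    # projection over channels; we imitate one by averaging groups.
    in_dim = len(feature)
    step = in_dim // out_dim
    reduced = [sum(feature[i * step:(i + 1) * step]) / step for i in range(out_dim)]
    half = out_dim // 2
    semantic_part = reduced[:half]   # first image feature (semantics)
    font_part = reduced[half:]       # second image feature (font appearance)
    return semantic_part, font_part

target = [float(i) for i in range(2048)]    # stand-in target image feature
sem, font = reduce_and_split(target, 1024)  # 1024-d reduced, split 512/512
print(len(sem), len(font))                  # 512 512
```

Only the halving along the feature dimension is taken directly from the description above; a trained network would learn the projection weights.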
According to the technical scheme provided by this embodiment, the target image features are subjected to convolution processing, and the semantic information features of the handwritten Chinese characters are segmented out of the convolved target image features, so that the recognition result of the handwritten Chinese characters in the image to be recognized can subsequently be determined from the semantic information features alone, without considering the font appearance features, thereby improving the recognition accuracy of the handwritten Chinese characters.
In an embodiment, the step S104 of determining the recognition result of the handwritten Chinese characters in the image to be recognized according to the semantic information features of the handwritten Chinese characters specifically includes: converting the semantic information features of the handwritten Chinese characters into corresponding text sequence features; and acquiring the combination of the characters corresponding to each column of features in the text sequence features as the recognition result of the handwritten Chinese characters in the image to be recognized.
The text sequence features are composed of a plurality of columns of features.
Specifically, the server inputs the semantic information features of the handwritten Chinese characters into a bidirectional long short-term memory network and converts them through the network to obtain the text sequence features of the handwritten Chinese characters; queries the mapping relation between features and characters to obtain the character corresponding to each column of features in the text sequence features; and combines the characters corresponding to each column of features to obtain the recognition result of the handwritten Chinese characters in the image to be recognized.
For example, the server inputs the semantic information features of the handwritten Chinese characters into a bidirectional long short-term memory network; a hidden state is obtained through operations such as convolution and pooling, the hidden state is iterated 35 times in the long short-term memory network, one C-dimensional vector is output each time, and a 35 × C sequence feature is finally obtained by splicing and serves as the text sequence features of the handwritten Chinese characters; each column of the text sequence features is then mapped to its corresponding character to obtain the final recognition result.
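The shape bookkeeping of this sequence-modeling step can be illustrated with a toy stand-in. The real model is a bidirectional LSTM; the trivial recurrence below only demonstrates "35 steps, one C-dimensional vector each, spliced into 35 × C", not real LSTM gating.

```python
# Toy stand-in for the BiLSTM step above: 35 time steps, each emitting
# one C-dimensional vector; stacking them yields a 35 x C sequence
# feature. The "recurrence" is a running sum, purely for shape
# illustration.

def toy_sequence_features(semantic_feature, steps=35, c_dim=8):
    hidden = [0.0] * c_dim
    outputs = []
    for t in range(steps):
        # fold one slice of the input into the hidden state
        for j in range(c_dim):
            hidden[j] += semantic_feature[(t + j) % len(semantic_feature)]
        outputs.append(list(hidden))   # one C-dimensional vector per step
    return outputs                     # 35 rows, each of length C

seq = toy_sequence_features([0.1] * 512)
print(len(seq), len(seq[0]))  # 35 8
```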
The technical scheme provided by this embodiment determines the recognition result of the handwritten Chinese characters in the image to be recognized according to the semantic information features of the handwritten Chinese characters alone, which is beneficial to improving the recognition accuracy of the handwritten Chinese characters.
In one embodiment, acquiring the combination of the characters corresponding to each column of features in the text sequence features as the recognition result of the handwritten Chinese characters in the image to be recognized specifically includes: inputting the text sequence features into a pre-trained text prediction model to obtain the recognition result of the handwritten Chinese characters in the image to be recognized; the pre-trained text prediction model is used for acquiring the character corresponding to each column of features in the text sequence features, and combining the characters corresponding to each column of features to obtain the recognition result of the handwritten Chinese characters in the image to be recognized.
The pre-trained text prediction model is a model for predicting text contents of handwritten Chinese characters, such as an attention-based sequence prediction model.
Specifically, the server inputs the text sequence features into a pre-trained text prediction model, which maps each column of features to its corresponding character based on an attention mechanism. For example, for text sequence features comprising columns A, B, C, D, and E, the character corresponding to column A is "美", the character corresponding to column B is "丽", the character corresponding to column C is "的", the character corresponding to column D is "天", and the character corresponding to column E is "空"; the characters corresponding to the columns are then combined to obtain "美丽的天空" ("beautiful sky"), which serves as the recognition result of the handwritten Chinese characters in the image to be recognized.
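The column-by-column decoding described above can be sketched as follows; the vocabulary, score vectors, and blank handling are illustrative assumptions, not the trained attention model.

```python
# Sketch of the column-to-character decoding described above. Each
# column feature is mapped to its most likely character; a "<null>"
# (blank) prediction is dropped before the characters are joined.
# The column "features" and the vocabulary are made up for illustration.

VOCAB = ["美", "丽", "的", "天", "空", "<null>"]

def decode(column_scores):
    chars = []
    for scores in column_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])
        if VOCAB[best] != "<null>":
            chars.append(VOCAB[best])
    return "".join(chars)

# columns A..E plus a trailing blank, each a score vector over VOCAB
columns = [
    [9, 1, 0, 0, 0, 0],   # A -> 美
    [0, 9, 1, 0, 0, 0],   # B -> 丽
    [0, 0, 9, 0, 1, 0],   # C -> 的
    [0, 0, 0, 9, 0, 1],   # D -> 天
    [0, 0, 0, 0, 9, 1],   # E -> 空
    [0, 0, 0, 0, 1, 9],   # blank column, dropped
]
print(decode(columns))  # 美丽的天空
```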
The technical scheme provided by this embodiment recognizes only the text sequence features converted from the semantic information features of the handwritten Chinese characters, without considering their font appearance information. This is beneficial to improving the recognition accuracy of the handwritten Chinese characters, and avoids the low recognition accuracy caused by the influence of a writer's style during recognition.
In an embodiment, as shown in fig. 2, the method for recognizing handwritten Chinese characters further includes a training step of the text prediction model, which specifically includes the following steps:
Step S201, obtaining sample semantic information features and sample font appearance features.
The sample semantic information features include a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image; the sample font appearance features include a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image. The second text image has the same text content as the first text image but a different writer; the third text image has different text content from the first text image but the same writer.
The first text image, the second text image, and the third text image each contain handwritten Chinese characters drawn from a data set of handwritten Chinese text lines, and this text-line data set is generated manually from a data set of individual handwritten Chinese characters.
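The manual generation of text-line data from single-character data can be sketched as a horizontal concatenation of glyph images. The tiny nested-list "images" below are placeholders for pixel grids; a real pipeline would also vary spacing, scale, and writer.

```python
# Sketch of the manual text-line generation mentioned above: single
# handwritten-character images (nested lists standing in for pixel
# grids) are concatenated horizontally into one text-line image.

def concat_chars(char_images):
    height = len(char_images[0])
    line = [[] for _ in range(height)]
    for img in char_images:
        for row in range(height):
            line[row].extend(img[row])
    return line

a = [[1, 0], [0, 1]]            # fake 2x2 glyph
b = [[1, 1], [1, 0]]            # fake 2x2 glyph
line_img = concat_chars([a, b])
print(len(line_img), len(line_img[0]))  # 2 4
```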
It should be noted that the difficulty of recognizing handwritten Chinese characters mainly lies in the large intra-class distance, the small inter-class distance, and the shortage of text line data. The intra-class distance is large because handwritten data is collected from different writers, whose font styles may differ greatly, so the same character can present completely different appearances in the data of different writers. The inter-class distance is small because, unlike English data, the commonly used Chinese characters number as many as 7,000, among which many characters have extremely similar glyphs; combined with the style differences among writers, two different characters with similar glyphs may show an extremely small appearance difference across different writers' data. As for the shortage of text line data, unlike street-view text data, which can be obtained in large quantities, handwritten Chinese data requires extensive manual writing and annotation; the text-line data sets currently available for handwritten Chinese character recognition are therefore harder to obtain and smaller than street-view text data sets.
Step S202, inputting the first text sequence feature corresponding to the first semantic information feature into the text prediction model to be trained, to obtain a recognition result of the handwritten Chinese characters in the first text image.
Step S203, obtaining a target loss value according to the sample semantic information features, the sample font appearance features, and the recognition result of the handwritten Chinese characters in the first text image.
Specifically, obtaining the target loss value according to the sample semantic information features, the sample font appearance features, and the recognition result of the handwritten Chinese characters in the first text image includes: obtaining a first loss value according to the first, second, and third semantic information features; obtaining a second loss value according to the first, second, and third font appearance features; obtaining a third loss value according to the difference between the recognition result of the handwritten Chinese characters in the first text image and their actual result; and obtaining the target loss value according to the first, second, and third loss values.
For example, the server obtains the first loss value based on the first, second, and third semantic information features in combination with a first loss function; obtains the second loss value based on the first, second, and third font appearance features in combination with a second loss function; obtains the third loss value based on the difference between the recognition result of the handwritten Chinese characters in the first text image and their actual result, in combination with a third loss function; acquires a first product of the first loss value and its coefficient, a second product of the second loss value and its coefficient, and a third product of the third loss value and its coefficient; and adds the three products to obtain the target loss value.
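A minimal sketch of this loss combination, assuming the first and second loss functions are standard margin-based triplet losses and using placeholder values for the recognition loss and the coefficients:

```python
# Sketch of the weighted loss combination described above. Lsem and
# Lfont use the standard triplet margin form
# max(0, d(a,p) - d(a,n) + margin); Lrec and the coefficients
# lambda1/lambda2 are placeholder values, not values from the patent.

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

# toy semantic features: P (same text) is the positive, N the negative
l_sem = triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])
# toy font features: one plausible pairing makes the same-writer image
# the style positive and the different-writer image the negative
l_font = triplet_loss([0.5, 0.5], [0.4, 0.6], [1.0, 0.0])
l_rec = 0.7                        # placeholder recognition loss
lam1, lam2 = 0.1, 0.1              # assumed weighting coefficients
target_loss = l_rec + lam1 * l_sem + lam2 * l_font
print(round(target_loss, 4))       # 0.752
```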
Step S204, adjusting the model parameters of the text prediction model to be trained according to the target loss value, and repeatedly training the text prediction model after the parameter adjustment until the target loss value obtained from the trained text prediction model is smaller than a preset threshold, then taking the trained text prediction model as the pre-trained text prediction model.
Specifically, if the target loss value is not smaller than the preset threshold, the model parameters of the text prediction model to be trained are adjusted according to the target loss value, and steps S201 to S203 are repeatedly executed to retrain the text prediction model after the parameter adjustment, until the target loss value obtained from the trained text prediction model is smaller than the preset threshold, at which point training stops; if the target loss value obtained from the trained text prediction model is smaller than the preset threshold, the trained text prediction model is taken as the pre-trained text prediction model.
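The adjust-retrain-stop control flow described above can be sketched with a one-parameter "model"; the quadratic "loss" and gradient step are stand-ins, and only the stop-on-threshold logic mirrors the description.

```python
# Minimal sketch of the stop-on-threshold training loop described
# above. The "model" is a single scalar parameter and the "loss" its
# squared distance to a target, so the control flow (adjust, retrain,
# stop when loss < threshold) is visible.

def train_until_threshold(param, target, lr=0.1, threshold=1e-4, max_iters=1000):
    loss = (param - target) ** 2
    for _ in range(max_iters):
        loss = (param - target) ** 2
        if loss < threshold:                 # model counts as trained
            return param, loss
        param -= lr * 2 * (param - target)   # adjust model parameters
    return param, loss

p, final_loss = train_until_threshold(0.0, 1.0)
print(final_loss < 1e-4)  # True
```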
For example, referring to fig. 3, fig. 3 shows a handwritten Chinese character recognition network based on semantic-font decoupling, composed of three networks: a feature extraction network, a sequence modeling network, and a text prediction network. Specifically, the server first obtains a triplet consisting of a text picture A to be recognized, a text picture P with the same text as A but a different writer, and a text picture N with the same writer as A but different text. Next, the triplet is input into a weight-sharing backbone network, such as a residual network, which performs feature extraction on each text picture to obtain a feature map containing the multi-scale features of each picture, used as that picture's text features. The text features of each picture are then input into the semantic-font decoupling module, reduced in dimension through a 1 × 1 convolutional network, and divided into two groups of feature representations, representing the semantic information features and the font appearance features of each picture, i.e., those of A, P, and N respectively. One copy of the semantic information features of the text picture A is input into a bidirectional long short-term memory network (BiLSTM), which outputs the corresponding text sequence features for subsequent text recognition; the other copy of A's semantic information features is computed together with the semantic information features of P and N to obtain the first loss value Lsem, and the font appearance features of A are computed together with those of P and N to obtain the second loss value Lfont. The text sequence features of A are then input into an attention-based sequence prediction module, which automatically extracts the semantic information in the text sequence features to produce the final recognition result; this result is computed against the character label of A to obtain the third loss value Lrec, giving the final target loss value L = Lrec + λ1 × Lsem + λ2 × Lfont. Finally, the semantic-font-decoupling-based handwritten Chinese character recognition network is trained according to the target loss value to obtain the trained network.
Furthermore, after the pre-trained text prediction model is obtained, in order to reduce its number of parameters and make it easier to deploy on a mobile-terminal application platform, the text prediction model can be compressed by model pruning. This reduces the model size and increases its running speed on the mobile-terminal application platform, achieving fast recognition of handwritten Chinese characters.
According to the technical scheme provided by the embodiment, the text prediction model is trained for multiple times, so that the accuracy of the recognition result obtained through the trained text prediction model is improved, and the recognition accuracy of the handwritten Chinese character is improved.
In one embodiment, as shown in fig. 4, another method for recognizing handwritten Chinese characters is provided, which is described by taking its application to a server as an example, and includes the following steps:
step S401, acquiring an image to be identified; the image to be recognized comprises handwritten Chinese characters.
Step S402, inputting the image to be recognized into the feature extraction model to obtain the image features output by neural network layers at two or more preset positions in the feature extraction model.
Step S403, aggregating the image features output by the neural network layers at the preset positions to obtain the target image features of the image to be recognized.
Step S404, performing convolution processing on the target image features to obtain convolved target image features.
Step S405, segmenting the convolved target image features to obtain a first image feature and a second image feature; the first image feature is used for representing the semantic information features of the handwritten Chinese characters, the second image feature is used for representing the font appearance features of the handwritten Chinese characters, and the feature dimensions of the two are the same.
Step S406, recognizing the first image feature as the semantic information features of the handwritten Chinese characters.
Step S407, converting the semantic information features of the handwritten Chinese characters into corresponding text sequence features.
Step S408, inputting the text sequence features into a pre-trained text prediction model to obtain the recognition result of the handwritten Chinese characters in the image to be recognized; the pre-trained text prediction model is used for acquiring the character corresponding to each column of features in the text sequence features, and combining the characters corresponding to each column of features to obtain the recognition result of the handwritten Chinese characters in the image to be recognized.
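The data flow of steps S401 to S408 can be summarized with trivial stand-ins for each stage; none of these functions are the real networks, they only show what each step consumes and produces.

```python
# End-to-end shape sketch of steps S401-S408 above:
# image -> features -> split -> sequence -> predicted text.

def extract_features(image):          # S402-S403: backbone + aggregation
    return [float(sum(row)) for row in image] * 4

def conv_and_split(feat):             # S404-S406: 1x1 conv + split
    half = len(feat) // 2
    return feat[:half], feat[half:]   # (semantic, font appearance)

def to_sequence(semantic):            # S407: BiLSTM stand-in
    return [semantic[i::2] for i in range(2)]

def predict_text(seq):                # S408: text prediction stand-in
    return "?" * len(seq)             # one character per column

image = [[1, 2], [3, 4]]              # fake image to be recognized
semantic, font = conv_and_split(extract_features(image))
text = predict_text(to_sequence(semantic))
print(text)  # ??
```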
The above method for recognizing handwritten Chinese characters determines the recognition result according to the semantic information features of the handwritten Chinese characters in the image to be recognized alone, without considering their font appearance information. This improves the recognition accuracy of the handwritten Chinese characters and overcomes the low recognition accuracy caused by the influence of a writer's style during recognition.
In one embodiment, the present application further provides a Semantic-Font Decoupling Network (SFDN) for handwritten Chinese character recognition. A semantic-font decoupling module is introduced into the network to decouple the semantic information of text characters from the font information of different writers' styles, so that the model can recognize handwritten Chinese data from different writers more robustly. In addition, during training the network introduces a triplet loss function (triplet loss) to minimize the intra-class distance of the same character and maximize the inter-class distance of different characters, so that the model can more accurately distinguish handwriting that is difficult to recognize. Finally, to achieve high operating efficiency on the mobile-terminal application platform, the proposed network model is compressed by model pruning, reducing the model size to one third of the original.
This embodiment can achieve the following technical effects: (1) the semantic-font-decoupling-based handwritten Chinese character recognition network addresses the inter-class distance problem in handwritten Chinese character recognition; by decoupling the semantic information and the font appearance information of characters, the model reaches 82.11% accuracy on the handwritten Chinese character benchmark data set CASIA-HWDB, a state-of-the-art result; (2) with a simpler system framework and model pruning, the running speed of the model on the mobile-terminal application platform can be greatly increased, reaching 27.8 FPS, a real-time inference speed on the mobile terminal, while the accuracy is still kept at 80.10%; (3) the proposed semantic-font decoupling module decouples the semantic information of text characters from the font information of different writers' styles, and, combined with the triplet loss, improves the feature representation of handwritten Chinese characters and the robustness of the model; (4) a data set of handwritten Chinese text lines, composed of a series of triplet data, is constructed by manual generation from a data set of individual handwritten Chinese characters and can be used for training and evaluating handwritten Chinese character recognition.
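The pruning mentioned in effect (2) can be sketched as magnitude-based pruning keeping roughly one third of the weights; the weight list and keep fraction below are illustrative, and a real deployment would store the result sparsely and fine-tune afterwards.

```python
# Sketch of magnitude-based pruning as a way to shrink a model toward
# one third of its size. Weights with the smallest absolute values are
# zeroed; only the keep fraction reflects the "one third" figure above.

def prune_to_fraction(weights, keep_fraction):
    k = int(len(weights) * keep_fraction)
    # indices of the k largest-magnitude weights survive
    keep = set(sorted(range(len(weights)), key=lambda i: -abs(weights[i]))[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.3]   # toy weight vector
pruned = prune_to_fraction(w, 1 / 3)     # keep ~one third of the weights
print(sum(1 for x in pruned if x != 0.0))  # 2
```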
It should be understood that although the steps in the flowcharts of fig. 1, 2, and 4 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1, 2, and 4 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided an apparatus for recognizing handwritten Chinese characters, including an image acquisition module 510, a feature extraction module 520, a feature segmentation module 530, and a character recognition module 540, wherein:
an image acquisition module 510, configured to acquire an image to be recognized; the image to be recognized contains handwritten Chinese characters.
A feature extraction module 520, configured to extract the target image features of the image to be recognized; the target image features are used for representing the text features of the image to be recognized.
The feature segmentation module 530 is configured to perform segmentation processing on the target image features to obtain the semantic information features of the handwritten Chinese characters.
And the character recognition module 540 is configured to determine the recognition result of the handwritten Chinese characters in the image to be recognized according to the semantic information features of the handwritten Chinese characters.
In an embodiment, the feature extraction module 520 is further configured to input the image to be recognized into the feature extraction model to obtain the image features output by neural network layers at two or more preset positions in the feature extraction model, and to aggregate these image features to obtain the target image features of the image to be recognized.
In an embodiment, the apparatus for recognizing handwritten Chinese characters further includes a convolution processing module, configured to perform convolution processing on the target image features to obtain convolved target image features;
the feature segmentation module 530 is further configured to perform segmentation processing on the convolved target image features to obtain a first image feature and a second image feature, where the first image feature is used for representing the semantic information features of the handwritten Chinese characters, the second image feature is used for representing the font appearance features of the handwritten Chinese characters, and the feature dimensions of the two are the same; and to recognize the first image feature as the semantic information features of the handwritten Chinese characters.
In one embodiment, the character recognition module 540 is further configured to convert the semantic information features of the handwritten Chinese characters into corresponding text sequence features, and to acquire the combination of the characters corresponding to each column of features in the text sequence features as the recognition result of the handwritten Chinese characters in the image to be recognized.
In one embodiment, the character recognition module 540 is further configured to input the text sequence features into a pre-trained text prediction model to obtain the recognition result of the handwritten Chinese characters in the image to be recognized; the pre-trained text prediction model is used for acquiring the character corresponding to each column of features in the text sequence features, and combining the characters corresponding to each column of features to obtain the recognition result of the handwritten Chinese characters in the image to be recognized.
In one embodiment, the apparatus for recognizing handwritten Chinese characters further includes a model training module, configured to: acquire sample semantic information features and sample font appearance features, where the sample semantic information features include a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image, the sample font appearance features include a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image, the second text image has the same text content as the first text image but a different writer, and the third text image has different text content from the first text image but the same writer; input the first text sequence feature corresponding to the first semantic information feature into the text prediction model to be trained to obtain a recognition result of the handwritten Chinese characters in the first text image; obtain a target loss value according to the sample semantic information features, the sample font appearance features, and the recognition result of the handwritten Chinese characters in the first text image; and adjust the model parameters of the text prediction model to be trained according to the target loss value, repeatedly training the model after parameter adjustment until the target loss value obtained from the trained model is smaller than a preset threshold, then take the trained text prediction model as the pre-trained text prediction model.
In one embodiment, the model training module is further configured to obtain a first loss value according to the first, second, and third semantic information features; obtain a second loss value according to the first, second, and third font appearance features; obtain a third loss value according to the difference between the recognition result of the handwritten Chinese characters in the first text image and their actual result; and obtain the target loss value according to the first, second, and third loss values.
For the specific limitations of the apparatus for recognizing handwritten Chinese characters, reference may be made to the limitations of the method for recognizing handwritten Chinese characters above, which are not repeated here. All or part of the modules in the above apparatus can be implemented by software, hardware, or a combination thereof. The modules can be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data such as target image characteristics, semantic information characteristics, recognition results and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of recognition of handwritten Chinese characters.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the steps of the above method embodiments.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when executed by a processor, carries out the steps of the above method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps in the above method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they shall not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for recognizing handwritten Chinese characters, the method comprising:
acquiring an image to be recognized; the image to be recognized comprises handwritten Chinese characters;
extracting target image characteristics of the image to be recognized; the target image feature is used for representing a text feature of the image to be recognized;
segmenting the target image features to obtain semantic information features of the handwritten Chinese characters;
and determining the recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character.
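The four steps of claim 1 can be wired together as in the following illustrative sketch. The stages are toy stand-ins passed as callables, not the claimed feature extraction, segmentation, or decoding models; all names here are hypothetical.

```python
def recognize_handwritten_chinese(image, extract_features, segment_features, decode):
    # Step 1 (acquisition) is assumed done by the caller: `image` is the
    # image to be recognized, containing handwritten Chinese characters.
    # Step 2: extract target image features representing its text features.
    target_features = extract_features(image)
    # Step 3: segment the target image features to obtain the semantic
    # information features of the handwritten Chinese characters.
    semantic_features = segment_features(target_features)
    # Step 4: determine the recognition result from the semantic features.
    return decode(semantic_features)
```

With identity-like stand-ins for the three stages, the pipeline simply round-trips its input, which is enough to show the data flow between the steps.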
2. The method according to claim 1, wherein the extracting the target image feature of the image to be recognized comprises:
inputting the image to be recognized into a feature extraction model to obtain image features output by neural network layers at at least two preset positions in the feature extraction model;
and aggregating the image features output by the neural network layers at the at least two preset positions to obtain the target image features of the image to be recognized.
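One way to aggregate the outputs of the layers at the preset positions is channel-wise concatenation, sketched below. This is an assumption: claim 2 does not fix the aggregation operation, and real feature maps would be tensors rather than nested lists.

```python
def aggregate_features(layer_outputs):
    # Each element of `layer_outputs` is the feature map from one preset
    # layer, given here as a list of per-position channel vectors; all
    # maps are assumed already aligned to the same spatial size.
    num_positions = len(layer_outputs[0])
    assert all(len(fm) == num_positions for fm in layer_outputs), \
        "feature maps must be aligned before aggregation"
    # Concatenate the channel vectors position by position to form the
    # target image features.
    return [sum((fm[i] for fm in layer_outputs), []) for i in range(num_positions)]
```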
3. The method according to claim 1, further comprising, before the segmenting of the target image features to obtain semantic information features of the handwritten Chinese characters:
performing convolution processing on the target image features to obtain target image features after convolution processing;
the segmenting processing of the target image features to obtain the semantic information features of the handwritten Chinese characters comprises the following steps:
carrying out segmentation processing on the target image features after the convolution processing to obtain first image features and second image features; the first image feature is used for representing semantic information features of the handwritten Chinese character, the second image feature is used for representing font appearance features of the handwritten Chinese character, and feature dimensions of the first image feature and the second image feature are the same;
and taking the first image feature as the semantic information feature of the handwritten Chinese characters.
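The segmentation of claim 3 can be pictured as splitting each convolved feature vector into two equal-dimension halves, one carrying semantic information and one carrying font appearance. The half-and-half split below is an assumption consistent with the claim's requirement that the two features have the same feature dimensions, not the claimed mechanism itself.

```python
def split_target_features(feature_vector):
    # Split a convolved per-position feature vector into two halves of
    # equal dimension: the first half is taken as the semantic information
    # feature, the second half as the font appearance feature.
    assert len(feature_vector) % 2 == 0, "feature dimension must be even"
    half = len(feature_vector) // 2
    semantic_feature = feature_vector[:half]
    font_feature = feature_vector[half:]
    return semantic_feature, font_feature
```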
4. The method according to claim 1, wherein the determining a recognition result of the handwritten chinese character in the image to be recognized according to semantic information features of the handwritten chinese character comprises:
converting the semantic information features of the handwritten Chinese characters into corresponding text sequence features;
and acquiring a combination of the characters corresponding to each row of features in the text sequence features as the recognition result of the handwritten Chinese characters in the image to be recognized.
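As an illustrative sketch of claim 4's decoding step: if each row of the text sequence features holds per-character scores, greedy selection of the best character per row, followed by concatenation, yields the recognition result. Greedy decoding is an assumption; the claim only requires mapping each row to a character and combining them.

```python
def decode_text_sequence(sequence_features, charset):
    # `sequence_features` holds one row of per-character scores per
    # position; pick the highest-scoring character from `charset` for
    # each row, then combine the characters into the recognized text.
    chars = []
    for row_scores in sequence_features:
        best = max(range(len(row_scores)), key=lambda i: row_scores[i])
        chars.append(charset[best])
    return "".join(chars)
```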
5. The method according to claim 4, wherein the acquiring of the combination of the characters corresponding to each row of features in the text sequence features as the recognition result of the handwritten Chinese characters in the image to be recognized comprises:
inputting the text sequence characteristics into a pre-trained text prediction model to obtain a recognition result of the handwritten Chinese characters in the image to be recognized; and the pre-trained text prediction model is used for acquiring characters corresponding to each row of characteristics in the text sequence characteristics, and combining the characters corresponding to each row of characteristics to obtain a recognition result of the handwritten Chinese characters in the image to be recognized.
6. The method of claim 5, wherein the pre-trained text prediction model is trained by:
acquiring sample semantic information features and sample font appearance features; wherein the sample semantic information features comprise a first semantic information feature of a first text image, a second semantic information feature of a second text image, and a third semantic information feature of a third text image, and the sample font appearance features comprise a first font appearance feature of the first text image, a second font appearance feature of the second text image, and a third font appearance feature of the third text image; the second text image has the same text content as the first text image but a different author of the text content; and the third text image has different text content from the first text image but the same author of the text content;
inputting a first text sequence characteristic corresponding to the first semantic information characteristic into a text prediction model to be trained to obtain a recognition result of a handwritten Chinese character in the first text image;
obtaining a target loss value according to the sample semantic information characteristics, the sample font appearance characteristics and the recognition result of the handwritten Chinese characters in the first text image;
and adjusting the model parameters of the text prediction model to be trained according to the target loss value, repeatedly training the text prediction model after model parameter adjustment until the target loss value obtained according to the trained text prediction model is smaller than a preset threshold value, and taking the trained text prediction model as the pre-trained text prediction model.
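The training loop of claim 6 can be sketched as the following stopping rule: keep adjusting the model parameters according to the target loss value until the target loss obtained from the trained model falls below the preset threshold. The batch structure, the `adjust` update rule, and the use of the worst per-batch loss as the stopping statistic are assumptions of this sketch.

```python
def train_until_threshold(model, batches, compute_target_loss, adjust, threshold):
    # Repeatedly adjust the model according to the target loss value and
    # retrain until the target loss obtained from the trained model is
    # smaller than the preset threshold.
    while True:
        worst_loss = 0.0
        for batch in batches:
            loss = compute_target_loss(model, batch)
            adjust(model, loss)  # adjust model parameters by the loss
            worst_loss = max(worst_loss, loss)
        if worst_loss < threshold:
            return model  # the pre-trained text prediction model
```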
7. The method of claim 6, wherein obtaining a target loss value based on the sample semantic information features, the sample font appearance features, and recognition results for handwritten Chinese characters in the first text image comprises:
obtaining a first loss value according to the first semantic information characteristic, the second semantic information characteristic and the third semantic information characteristic;
obtaining a second loss value according to the first font appearance characteristic, the second font appearance characteristic and the third font appearance characteristic;
obtaining a third loss value according to a difference value between a recognition result of the handwritten Chinese character in the first text image and an actual result of the handwritten Chinese character;
and obtaining the target loss value according to the first loss value, the second loss value and the third loss value.
8. An apparatus for recognizing handwritten Chinese characters, the apparatus comprising:
the image acquisition module is used for acquiring an image to be recognized; the image to be recognized comprises handwritten Chinese characters;
the characteristic extraction module is used for extracting the target image characteristics of the image to be recognized; the target image feature is used for representing a text feature of the image to be recognized;
the feature segmentation module is used for carrying out segmentation processing on the target image features to obtain semantic information features of the handwritten Chinese characters;
and the character recognition module is used for determining a recognition result of the handwritten Chinese character in the image to be recognized according to the semantic information characteristics of the handwritten Chinese character.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110357440.2A 2021-04-01 2021-04-01 Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters Active CN112990175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110357440.2A CN112990175B (en) 2021-04-01 2021-04-01 Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters

Publications (2)

Publication Number Publication Date
CN112990175A true CN112990175A (en) 2021-06-18
CN112990175B CN112990175B (en) 2023-05-30

Family

ID=76338887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110357440.2A Active CN112990175B (en) 2021-04-01 2021-04-01 Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters

Country Status (1)

Country Link
CN (1) CN112990175B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651323A (en) * 2020-12-22 2021-04-13 山东山大鸥玛软件股份有限公司 Chinese handwriting recognition method and system based on text line detection
CN113792741A (en) * 2021-09-17 2021-12-14 平安普惠企业管理有限公司 Character recognition method, device, equipment and storage medium
CN114140802A (en) * 2022-01-29 2022-03-04 北京易真学思教育科技有限公司 Text recognition method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993057A (en) * 2019-02-25 2019-07-09 平安科技(深圳)有限公司 Method for recognizing semantics, device, equipment and computer readable storage medium
CN110363194A (en) * 2019-06-17 2019-10-22 深圳壹账通智能科技有限公司 Intelligently reading method, apparatus, equipment and storage medium based on NLP
WO2019232870A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Method for acquiring handwritten character training sample, apparatus, computer device, and storage medium
CN110942004A (en) * 2019-11-20 2020-03-31 深圳追一科技有限公司 Handwriting recognition method and device based on neural network model and electronic equipment
CN111242840A (en) * 2020-01-15 2020-06-05 上海眼控科技股份有限公司 Handwritten character generation method, apparatus, computer device and storage medium
CN111275046A (en) * 2020-01-10 2020-06-12 中科鼎富(北京)科技发展有限公司 Character image recognition method and device, electronic equipment and storage medium
CN111368841A (en) * 2020-02-28 2020-07-03 深圳前海微众银行股份有限公司 Text recognition method, device, equipment and storage medium
CN111652332A (en) * 2020-06-09 2020-09-11 山东大学 Deep learning handwritten Chinese character recognition method and system based on two classifications
US20210056336A1 (en) * 2019-08-22 2021-02-25 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Z R: "Writer-Aware CNN for Parsimonious HMM-Based Offline Handwritten Chinese Text Recognition" *
JIN Lianwen: "A Survey of Deep Learning Applications in Handwritten Chinese Character Recognition" *

Also Published As

Publication number Publication date
CN112990175B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN112990175B (en) Method, device, computer equipment and storage medium for recognizing handwritten Chinese characters
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN111709406B (en) Text line identification method and device, readable storage medium and electronic equipment
CN113762269B (en) Chinese character OCR recognition method, system and medium based on neural network
CN114596566B (en) Text recognition method and related device
CN112836702B (en) Text recognition method based on multi-scale feature extraction
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN114677695A (en) Table analysis method and device, computer equipment and storage medium
CN111275051A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN110580507B (en) City texture classification and identification method
CN116469111B (en) Character generation model training method and target character generation method
CN111444906B (en) Image recognition method and related device based on artificial intelligence
CN116361502B (en) Image retrieval method, device, computer equipment and storage medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN113554549B (en) Text image generation method, device, computer equipment and storage medium
CN114283429A (en) Material work order data processing method, device, equipment and storage medium
CN114241470A (en) Natural scene character detection method based on attention mechanism
CN113516148A (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN114399782B (en) Text image processing method, apparatus, device, storage medium, and program product
CN110852102A (en) Part-of-speech tagging method and device for Chinese, storage medium and electronic equipment
CN114241495B (en) Data enhancement method for off-line handwritten text recognition
CN116975298B (en) NLP-based modernized society governance scheduling system and method
CN113569608A (en) Text recognition method, device and equipment based on deep learning and storage medium
Li et al. An Improved Algorithm for Identifying Facial Images Feature of Shadow Puppets Based on YOLOv5s

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Qiu Tairu; Yao Xufeng; Shen Xiaoyong; Lv Jiangbo

Inventor before: Qiu Tairu; Yao Xufeng; Jia Jiaya; Shen Xiaoyong; Lv Jiangbo

GR01 Patent grant