CN114494678A - Character recognition method and electronic equipment - Google Patents

Character recognition method and electronic equipment

Info

Publication number
CN114494678A
Authority
CN
China
Prior art keywords: character, recognized, image, text, characters
Prior art date
Legal status
Pending
Application number
CN202111473992.6A
Other languages
Chinese (zh)
Inventor
陈苏
王维晟
石光
吴志敏
金鑫
王宇
刘柏锋
焦亮
Current Assignee
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN202111473992.6A
Publication of CN114494678A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention provides a character recognition method and electronic equipment. The method comprises the following steps: determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized; inputting the image to be recognized into a character type recognition model to obtain character type information, wherein the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features and performs character type recognition on the characters to be recognized of the image to be recognized, and the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized; determining a character detection and recognition model corresponding to the character type information; and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.

Description

Character recognition method and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a character recognition method and electronic equipment.
Background
In existing methods, characters in pictures are generally identified by manually checking the picture information, so the accuracy and efficiency of character recognition on the massive, multi-type picture data on the Internet are poor.
Disclosure of Invention
The invention provides a character recognition method and electronic equipment to overcome the defects of the prior art, in which characters in pictures are recognized by manually checking picture information and the accuracy and efficiency of character recognition on massive, multi-type picture data on the Internet are therefore poor.
The invention provides a character recognition method, which comprises the following steps:
determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized;
inputting the image to be recognized into a character type recognition model to obtain character type information; the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized;
determining a character detection and recognition model corresponding to the character type information; and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
According to a character recognition method provided by the invention, the character type recognition model comprises a feature extraction layer, and the feature extraction layer comprises:
the first convolution layer, which is used for performing a convolution operation on the image to be recognized with a preset first convolution kernel to obtain a first feature map;
the first linear transformation layer, which is used for linearly transforming the first feature map with a preset second convolution kernel to obtain a first abstract feature map whose size is consistent with that of the image to be recognized;
and the pooling layer, which is used for performing maximum pooling on the first abstract feature map to obtain the first target feature map containing the multi-dimensional features.
According to the character recognition method provided by the invention, the character type recognition model further comprises:
the scene classification layer, which is used for performing scene classification on the first target feature map output by the feature extraction layer to obtain the character type information of the characters to be recognized.
According to the character recognition method provided by the invention, the scene classification layer comprises:
the second convolution layer, which is used for performing a full convolution operation on the first target feature map to obtain a first target one-dimensional feature vector;
the calculation classification layer, which is used for calculating on the first target one-dimensional feature vector through a softmax function to obtain scene classification probabilities of multiple categories;
and the classification output layer, which is used for obtaining, from the scene classification probabilities of the multiple categories, the scene classification corresponding to the maximum scene classification probability and outputting it as the character type information.
According to the character recognition method provided by the invention, before the image to be recognized is determined, the character recognition method further comprises the step of training the character type recognition model; the training of the character type recognition model comprises:
obtaining a sample image, wherein the sample image comprises characters to be recognized;
performing feature extraction on the sample image based on a convolutional neural network to obtain a second target feature map containing multi-dimensional features;
and performing scene classification on the second target feature map to obtain the character type information of the characters to be recognized of the sample image.
According to the character recognition method provided by the invention, performing feature extraction on the sample image based on the convolutional neural network to obtain a second target feature map containing multi-dimensional features comprises the following steps:
performing a convolution operation on the sample image with a preset first convolution kernel to obtain a second feature map;
linearly transforming the second feature map with a preset second convolution kernel to obtain a second abstract feature map whose size is consistent with that of the sample image;
and performing maximum pooling on the second abstract feature map to obtain the second target feature map containing the multi-dimensional features.
According to the character recognition method provided by the invention, performing scene classification on the second target feature map to obtain the character type information of the characters to be recognized of the sample image comprises the following steps:
performing full convolution operation on the second target feature map to obtain a second target one-dimensional feature vector;
calculating the second target one-dimensional feature vector through a softmax function to obtain scene classification probabilities of multiple types;
and obtaining, from the scene classification probabilities of the multiple categories, the scene classification corresponding to the maximum scene classification probability, and outputting it as the character type information.
According to the character recognition method provided by the invention, determining the character detection and recognition model corresponding to the character type information and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model comprise the following:
when the character type information is natural scene characters, the character detection and recognition model comprises an EAST model and a CRNN model, and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model comprises:
detecting a first character region position of a natural scene character in the image to be recognized based on an EAST model;
and identifying the text content of the natural scene text at the position of the first text area based on the CRNN model.
According to the character recognition method provided by the invention, when the character type information is printed characters or handwritten characters, the character detection and recognition model comprises a CTPN model and a CRNN model, and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model comprises the following steps:
detecting a second character area position of the printed characters or the handwritten characters in the image to be recognized based on the CTPN model;
and identifying the text content of the printed text or the handwritten text at the position of the second text area based on the CRNN model.
According to the character recognition method provided by the invention, after the image to be recognized is determined, the image to be recognized is preprocessed;
preprocessing the image to be recognized comprises the following steps:
converting the image to be recognized into a pixel matrix;
adjusting the size of the pixel matrix to meet the input requirement of the convolutional neural network;
and deleting abnormal images to be recognized, wherein an abnormal image to be recognized is an image to be recognized with missing data or a bitmap error.
The present invention also provides a character recognition apparatus, comprising:
the first determining module, which is used for determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized;
the character type acquisition module, which is used for inputting the image to be recognized into a character type recognition model to obtain character type information; the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized;
and the second determining module, which is used for determining the character detection and recognition model corresponding to the character type information and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and runnable on the processor, wherein the processor implements the steps of any one of the character recognition methods described above when executing the program.
The invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the character recognition method described in any one of the above when executed by a processor.
The invention also provides a computer program product comprising a computer program which implements the steps of the character recognition method described in any one of the above when executed by a processor.
The character recognition method and the electronic equipment provided by the invention determine the character type information of the characters to be recognized in the image to be recognized through the character type recognition model, determine the character detection and recognition model corresponding to the character type information, and perform character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
The character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized. Because the target feature map containing the multi-dimensional features represents the features of different character types, the different character types of the characters to be recognized in the image to be recognized can be distinguished through the character type recognition model.
Compared with manually recognizing the characters in pictures, determining the character type information of the characters to be recognized through the character type recognition model and then performing character recognition based on the corresponding character detection and recognition model improves the accuracy and efficiency of character recognition on the massive, multi-type picture data on the Internet. In addition, training the character type recognition model on sample images of multiple types of characters to be recognized enlarges the range of characters that the model can recognize in images to be recognized.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a text recognition method according to the present invention;
FIG. 2 is a second schematic flow chart of the text recognition method according to the present invention;
FIG. 3 is a flowchart illustrating an embodiment of detecting a position of a first text region of a text in a natural scene in an image to be recognized based on an EAST model;
FIG. 4 is a third schematic flow chart of a text recognition method according to the present invention;
fig. 5 is a flowchart illustrating an embodiment of detecting a second text region position of a printed text or a handwritten text in the image to be recognized based on a CTPN model;
FIG. 6 is a schematic diagram of an RPN network of the present invention;
FIG. 7 is a block diagram illustrating a process of performing text recognition on an image to be recognized through a CRNN model according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating recognition of text content according to a text detection recognition model corresponding to a text type according to an embodiment of the present invention;
FIG. 9 is a block diagram illustrating an overall flow of a text recognition method according to an embodiment of the present invention;
FIG. 10 is a fourth flowchart illustrating a character recognition method according to the present invention;
FIG. 11 is a fifth flowchart illustrating a character recognition method according to the present invention;
FIG. 12 is a schematic structural diagram of a character recognition device according to the present invention;
fig. 13 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The character recognition method of the present invention is described below with reference to fig. 1 to 11.
Referring to fig. 1, the character recognition method of the present invention includes:
step 200, determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized;
An image to be recognized that includes the characters to be recognized is determined through the electronic equipment. Various types of character images circulate in Internet scenes, for example: (1) conventionally printed characters, such as text scanned from a document; (2) characters in natural scenes, such as billboards in street-view photographs; and (3) handwritten characters, namely characters handwritten by the user.
It should be noted that the image to be recognized may be supplied in an encoded form with specific attributes, such as base64 encoding, and decoded before processing. The image to be recognized may be in any of the mainstream data formats, including jpg, jpeg, bmp, png, gif and tif.
Step 300, inputting the image to be recognized into a character type recognition model to obtain character type information; the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized;
the target feature map containing the multi-dimensional features is obtained by extracting the features of the image to be recognized based on the convolutional neural network, so that the features of the image to be recognized are converted into detailed features of a high layer from abstract features of a low layer, and different character types in the image to be recognized in a region are facilitated.
Specifically, in some possible embodiments, the character type recognition model includes a feature extraction layer, and the feature extraction layer includes: a first convolution layer, a first linear transformation layer and a pooling layer.
The first convolution layer is used for performing a convolution operation on the image to be recognized with a preset first convolution kernel to obtain a first feature map;
specifically, the convolutional neural network (CNN) has a structure based on VGG16, and several preset first convolution kernels are stacked to form the first convolution layer of the CNN. A convolution operation is performed on the image to be recognized with the preset first convolution kernel to obtain the first feature map. The preset first convolution kernel may be a small 3 × 3 convolution kernel.
The first linear transformation layer is used for linearly transforming the first feature map with a preset second convolution kernel to obtain a first abstract feature map whose size is consistent with that of the image to be recognized;
on the basis of the first feature map, the electronic equipment linearly transforms the first feature map with the preset second convolution kernel to obtain the first abstract feature map. Specifically, in some embodiments, the preset second convolution kernel is a 1 × 1 convolution kernel performing a linear transformation, so that the numbers of input and output channels are unchanged and no dimension reduction occurs. The stride of the convolution is set to 1 and padding is applied, so that the size is unchanged before and after the convolution, yielding a first abstract feature map consistent in size with the image to be recognized.
The pooling layer is used for performing maximum pooling on the first abstract feature map to obtain the first target feature map containing the multi-dimensional features.
A maximum pooling operation is performed through the pooling layer on the first abstract feature map produced by the convolution operations, downsampling it to obtain the first target feature map containing the multi-dimensional features and further reducing its size.
The feature extraction layer finds low-level features such as edges and directions in the image to be recognized, which are then abstracted into more specific high-level features for recognizing the image content. Through repeated convolution and pooling calculations, the feature maps gradually change from low-level abstract features to high-level detailed features; they contain the multi-dimensional features of the image and can be used to distinguish pictures of natural scene characters, printed characters or handwritten characters.
It should be noted that the character type recognition model of this embodiment further includes a scene classification layer, which performs scene classification on the first target feature map output by the feature extraction layer to obtain the character type information of the characters to be recognized.
A full convolution operation is performed on the first target feature map to obtain a one-dimensional feature vector, which is classified by a softmax algorithm layer to obtain the classification result, namely the probability of each scene. The category with the maximum probability value is taken as the final scene classification result of the image and output by the character type recognition model.
In some possible embodiments, the scene classification layer includes a second convolution layer, a calculation classification layer and a classification output layer.
The second convolution layer is used for performing a full convolution operation on the first target feature map to obtain a first target one-dimensional feature vector.
Specifically, this embodiment mainly takes recognizing images to be recognized of three character types as an example. The first target feature map extracted by the feature extraction layer therefore passes through three fully connected layers with 4096, 4096 and 3 neurons respectively and becomes a first target one-dimensional feature vector.
The calculation classification layer is used for calculating the first target one-dimensional feature vector through a softmax function to obtain scene classification probabilities of multiple types;
and calculating the first target one-dimensional feature vector by using a softmax function to obtain the probability of each scene classification. softmax maps the image to be recognized onto 3 classes, each of which has an output constrained to be between 0 and 1.
The classification output layer is used for obtaining, from the scene classification probabilities of the multiple categories, the scene classification corresponding to the maximum scene classification probability and outputting it as the character type information.
The electronic equipment outputs the category with the maximum probability value as the final scene classification result, determining whether the character picture belongs to a natural scene, a printed character scene or a handwritten character scene.
For example, suppose the first target one-dimensional feature vector obtained from the second convolution layer is (-4, 2.5, 3.7), whose components correspond to the natural scene, the printed character scene and the handwritten character scene respectively. The calculation classification layer applies the softmax function to (-4, 2.5, 3.7) to compute the probability of each scene classification: the scene classification probability corresponding to -4 is about 0.0003, that corresponding to 2.5 is about 0.2314, and that corresponding to 3.7 is about 0.7683. The handwritten character scene, which has the highest probability in (0.0003, 0.2314, 0.7683), is output as the final scene classification result; that is, the character type output by the character type recognition model is handwritten characters.
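The softmax step in this example can be checked directly (a minimal NumPy sketch; the class order follows the example above):

```python
import numpy as np

logits = np.array([-4.0, 2.5, 3.7])       # natural, printed, handwritten
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.round(4))                     # [0.0003 0.2314 0.7683]
print(int(probs.argmax()))                # 2 -> handwritten character scene
```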
Step 400, determining the character detection and recognition model corresponding to the character type information, and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
On the basis of the character type information obtained through the character type recognition model, the electronic equipment determines the character detection and recognition model corresponding to that character type information.
In some possible embodiments, referring to fig. 2, when the text type information is a text in a natural scene, the text detection and recognition model includes an EAST model and a CRNN model, and the step 400 of performing text recognition on the text to be recognized of the image to be recognized based on the text detection and recognition model includes:
step 410, detecting a first text region position of the natural scene text in the image to be recognized based on an EAST model;
it should be noted that the natural scene character recognition is to perform character detection on a natural scene picture, and acquire a character region position (or called a text position) in the picture, thereby performing subsequent character recognition work.
Inputting an image to be recognized, positioning the text position of the image to be recognized by utilizing an EAST model, and outputting all character position coordinates on the image to be recognized.
Because natural scene characters such as billboards and shop signs are mostly short texts, the EAST model is used: it is suited to detecting short text in natural scenes and can recognize oblique characters. The EAST model adopts the idea of the FCN fully convolutional neural network to predict the image at pixel level, and finally outputs the coordinates of the text region.
In one possible embodiment, detecting the first text region position of the text in the natural scene in the image to be recognized based on the EAST model comprises the following steps:
step 411, analyzing the input image to be identified into an RBG three-dimensional pixel matrix.
The requirement of the model is met by analyzing the input image to be identified into an RBG three-dimensional pixel matrix.
Step 412, extracting features from the RGB three-dimensional pixel matrix through a convolutional neural network.
Since the EAST model adopts the structure of the FCN fully convolutional neural network, features are first extracted from the RGB three-dimensional pixel matrix by the CNN, and the feature map obtained at each convolution layer is stored. The CNN may adopt the VGG16 network structure to extract features progressively from shallow to deep layers; the feature maps output by the successive layers correspond to features from the bottom level to the top level.
Step 413, performing feature fusion on the feature maps extracted by the convolutional neural network to obtain a global feature map.
The feature maps extracted by the convolutional neural network are fused and upsampled one by one, and the final global feature map is restored to the same size as the original picture. Feature fusion retains both the deep features and the shallow features, so that characters of different sizes are located better.
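A minimal sketch of one such fusion step is given below, assuming PyTorch; the channel counts and the bilinear upsampling mode are illustrative assumptions, while the upsample-and-merge structure follows the description:

```python
import torch
import torch.nn.functional as F

def fuse(deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
    # Upsample the deeper (smaller) map to the shallow map's size, then
    # concatenate along channels, so deep and shallow features are both kept.
    deep = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                         align_corners=False)
    return torch.cat([deep, shallow], dim=1)

# Fuse a 14 x 14 deep map into a 28 x 28 shallow map.
merged = fuse(torch.randn(1, 256, 14, 14), torch.randn(1, 128, 28, 28))
print(merged.shape)  # torch.Size([1, 384, 28, 28])
```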
Step 414, outputting the global feature map.
The global feature map produces a prediction result through the output layer, in which three operations are performed: text detection, text box detection and angle detection. Text boxes meeting a threshold are merged using non-maximum suppression (NMS) to obtain the final text position box coordinates. In this way, the position of the first character region of the natural scene characters in the image to be recognized is detected through the EAST model.
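For illustration, a standard axis-aligned NMS routine is sketched below in NumPy; EAST itself uses a locality-aware variant over possibly rotated boxes, so this simplified version only shows the merge-by-threshold idea:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """boxes: (N, 4) rows of [x1, y1, x2, y2]; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]        # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the best box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes overlapping too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # [0, 2]
```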
Referring to fig. 3, fig. 3 is a block diagram illustrating a flow chart of detecting a position of a first text region of a natural scene text in the image to be recognized based on the EAST model according to the embodiment.
In some other embodiments, referring to fig. 4, when the text type information is a printed text or a handwritten text, the text detection and recognition model includes a CTPN model and a CRNN model, and the step 400 of performing text recognition on the text to be recognized of the image to be recognized based on the text detection and recognition model includes:
step 420, detecting a second character area position of the printed characters or the handwritten characters in the image to be recognized based on the CTPN model;
and performing character detection on the image to be identified including the printed characters through a CTPN (text detection based on a network connected with a preselected frame) model, and acquiring a second character area position (or called text position) in the image to be identified so as to perform subsequent character identification work. Inputting an image to be recognized, positioning the text position by using the CTPN model, and outputting all text position coordinates of the natural scene characters on the image to be recognized.
Printed or handwritten text often has a long text length. The CTPN is composed of a CNN + RNN + RPN network, and the addition of the RNN structure enables the CTPN to better position a long text line. Therefore, the CTPN model can perform better text position detection on the printed characters or the handwritten characters.
In one possible embodiment, the step 420 of detecting the second text region position of the printed text or the handwritten text in the image to be recognized based on the CTPN model comprises the following steps:
step 421, analyzing the input image to be identified into an RBG three-dimensional pixel matrix.
The requirement of the model is met by analyzing the input image to be identified into an RBG three-dimensional pixel matrix.
Step 422, extracting features from the RGB three-dimensional pixel matrix through a convolutional neural network to obtain spatial feature maps; a spatial feature map represents at least one of the edge, contour or corner features of the image to be recognized.
The pixel matrix of the image to be recognized is convolved with a CNN convolutional neural network to obtain features such as edges, contours or corners, and these features form a series of spatial feature maps.
Step 423, constructing a serialized combined feature map by performing feature transformation on the spatial feature maps.
Specifically, the spatial feature maps can be transformed with a sliding window to construct serialized feature maps that meet the requirements of the RNN recurrent neural network. BLSTM is an RNN algorithm model that can extract the sequence features of text. After BLSTM processing, a combined feature map is formed that contains both spatial features and sequence features.
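A minimal sketch of the BLSTM step follows, assuming PyTorch; treating each horizontal position of the serialized feature map as one time step is the usual convention, and all sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

conv_features = torch.randn(1, 512, 1, 40)       # (N, C, H, W) spatial features
seq = conv_features.squeeze(2).permute(2, 0, 1)  # (W, N, C): W time steps
blstm = nn.LSTM(input_size=512, hidden_size=128, bidirectional=True)
combined, _ = blstm(seq)                         # combined feature map
print(combined.shape)  # torch.Size([40, 1, 256]): spatial + sequence features
```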
Step 424, determining the text position boxes from the combined feature map based on the RPN.
The combined feature map is transformed through a fully connected layer and input into an RPN (Region Proposal Network) for character region localization, giving a series of text candidate regions; finally, the candidate regions are merged to obtain the final text position boxes.
Referring to fig. 5, fig. 5 is a block diagram illustrating a flow chart of detecting a position of a second text region of a printed text or a handwritten text in the image to be recognized based on the CTPN model according to the present embodiment.
Referring to fig. 6, fig. 6 is a schematic diagram of the RPN network, in which the sizes, numbers and widths of the anchor boxes are preset. In the RPN, the classification layer classifies the feature region where each anchor box is located, judges whether characters exist in that region, and outputs the classification result and a confidence. Finally, regions of the same category that meet a certain confidence are merged using non-maximum suppression (NMS), thereby obtaining the position of the second character region of the printed or handwritten characters in the image to be recognized.
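As a simplified illustration of the final merging, the sketch below joins horizontally adjacent fixed-width proposals into one text-line box; the gap threshold is an illustrative assumption, and the vertical-overlap and confidence checks of the actual method are omitted for brevity:

```python
def merge_proposals(boxes, max_gap=16):
    """boxes: iterable of [x1, y1, x2, y2] proposals; returns merged line boxes."""
    lines = []
    for x1, y1, x2, y2 in sorted(boxes):
        if lines and x1 - lines[-1][2] <= max_gap:   # close enough horizontally
            last = lines[-1]                         # grow the current text line
            last[1] = min(last[1], y1)
            last[2] = max(last[2], x2)
            last[3] = max(last[3], y2)
        else:
            lines.append([x1, y1, x2, y2])           # start a new text line
    return lines

# Three adjacent 16-pixel-wide proposals collapse into one line box.
print(merge_proposals([[0, 10, 16, 30], [16, 9, 32, 31], [33, 10, 48, 30]]))
# [[0, 9, 48, 31]]
```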
After the text position of the character to be recognized in the image to be recognized is recognized, the character content of the character to be recognized needs to be recognized at the text position.
Specifically, referring to fig. 7, fig. 7 is a block diagram illustrating the flow of performing text recognition on the image to be recognized through the CRNN model according to this embodiment. Character recognition is performed with the CRNN model on the located text positions in the image to be recognized, the character content at each text position is recognized, and after final processing the text of the whole picture is output. That is, the image to be recognized is input, character recognition is performed through the CRNN model, and the text content in the image is output.
For natural scene characters, after step 410 of detecting the first character region position of the natural scene characters in the image to be recognized based on the EAST model, the method includes:
and 430, identifying the text content of the natural scene text at the position of the first text area based on the CRNN model.
For the printed text or the handwritten text, after detecting the position of the second text area of the printed text or the handwritten text in the image to be recognized based on the CTPN model in step 420, the method includes:
and 440, identifying the text content of the printed text or the handwritten text at the position of the second text area based on the CRNN model.
In this embodiment, whether for natural scene characters, printed characters or handwritten characters, the character content is recognized through the CRNN model; the process is described below by way of exemplary steps.
Step 401, parsing the input image to be recognized into an RGB three-dimensional pixel matrix.
Parsing the input image to be recognized into an RGB three-dimensional pixel matrix satisfies the input requirement of the model.
Step 402, extracting a feature sequence from the RGB three-dimensional pixel matrix.
A feature sequence is extracted from the pixel matrix of the image to be recognized; specifically, this comprises preprocessing the RGB three-dimensional pixel matrix, convolution operations, and the operation of extracting sequence features.
Step 403, predicting the label distribution of each feature vector in the feature sequence through a bidirectional LSTM recurrent neural network.
The recurrent layer is composed of a bidirectional LSTM recurrent neural network and predicts the label distribution of each feature vector in the feature sequence; the width of the sequence is taken as the number of LSTM time steps in the model.
Step 404, converting the label distributions into the final recognition result.
The label distributions obtained from the recurrent layer are converted into the final recognition result through operations such as de-duplication and integration. A CTC module is connected after the bidirectional LSTM network layer in the CRNN model, thereby realizing end-to-end recognition and, in turn, recognition of the characters to be recognized in the image to be recognized.
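The de-duplication and integration performed by the CTC module can be sketched as greedy best-path decoding; the three-symbol alphabet below is an illustrative assumption:

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, alphabet: str, blank: int = 0) -> str:
    """probs: (T, num_classes) per-time-step label distributions from the BLSTM."""
    best_path = probs.argmax(axis=1)             # most likely label per time step
    decoded, prev = [], blank
    for label in best_path:
        if label != blank and label != prev:     # drop blanks and repeated labels
            decoded.append(alphabet[label - 1])  # index 0 is reserved for blank
        prev = label
    return "".join(decoded)

# The path [blank, h, h, blank, i] decodes to "hi".
demo = np.array([[9, 0, 0], [0, 9, 0], [0, 9, 0], [9, 0, 0], [0, 0, 9]], float)
print(ctc_greedy_decode(demo, alphabet="hi"))    # hi
```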
Referring to fig. 8, in summary, for images to be recognized of different character types, this embodiment recognizes the character content with the character detection and recognition model corresponding to the character type, as the sketch after the following list illustrates.
a) For natural character scene pictures, character detection is performed on the natural scene characters using the EAST model to obtain the region position information of the characters; the text content and its corresponding position information are then recognized using the CRNN algorithm.
b) For handwritten character scene pictures, character detection is performed on the handwritten characters using the CTPN algorithm to obtain the region position information of the characters; the text content and its corresponding position information are then recognized using the CRNN algorithm.
c) For scanned character scene pictures, character detection is performed on the scanned characters using the CTPN algorithm to obtain the region position information of the characters; the text content and its corresponding position information are then recognized using the CRNN algorithm.
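A high-level sketch of this dispatch follows; east_detect, ctpn_detect and crnn_recognize are hypothetical stand-ins for the models described above, injected as callables so the sketch stays self-contained:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]

def recognize(image, scene: str,
              east_detect: Callable, ctpn_detect: Callable,
              crnn_recognize: Callable) -> List[Tuple[Box, str]]:
    if scene == "natural":
        regions = east_detect(image)    # EAST suits short natural-scene text
    else:                               # "handwritten" or "scanned" (printed)
        regions = ctpn_detect(image)    # CTPN suits long text lines
    return [(box, crnn_recognize(image, box)) for box in regions]

# Usage with trivial stubs:
out = recognize("img", "natural",
                east_detect=lambda im: [(0, 0, 10, 10)],
                ctpn_detect=lambda im: [],
                crnn_recognize=lambda im, box: "demo")
print(out)  # [((0, 0, 10, 10), 'demo')]
```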
Referring to fig. 9, fig. 9 is a block diagram illustrating an overall flow of the character recognition method according to the present embodiment.
This embodiment determines the character type information of the characters to be recognized in the image to be recognized through the character type recognition model, then determines the character detection and recognition model corresponding to the character type information, and performs character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
The character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized. Because the target feature map containing the multi-dimensional features represents the features of different character types, the different character types of the characters to be recognized in the image to be recognized can be distinguished through the character type recognition model.
Compared with manually recognizing the characters in pictures, determining the character type information of the characters to be recognized through the character type recognition model and then performing character recognition based on the corresponding character detection and recognition model improves the accuracy and efficiency of character recognition on the massive, multi-type picture data on the Internet. In addition, training the character type recognition model on sample images of multiple types of characters to be recognized enlarges the range of characters that the model can recognize in images to be recognized.
In other aspects of the present application, referring to fig. 10, before the image to be recognized is determined in step 200, the character recognition method further includes step 100, training the character type recognition model. Referring to fig. 11, step 100, the training of the character type recognition model, includes:
step 110, obtaining a sample image, wherein the sample image comprises characters to be recognized;
A sample image including characters to be recognized is determined by the electronic device. The sample images may contain various types of characters, for example: (1) conventionally printed characters, such as text scanned from a document; (2) characters in natural scenes, such as billboards in street-view photographs; and (3) handwritten characters, namely characters handwritten by a user.
In this embodiment, VGG16 is used as the CNN infrastructure, and a scene classification dataset of 200 images per category is used as the training set, namely 200 sample images of natural scene characters, 200 sample images of printed characters and 200 sample images of handwritten characters.
Step 120, extracting features of the sample image based on a convolutional neural network to obtain a second target feature map containing multi-dimensional features;
and performing feature extraction on the sample image based on a convolutional neural network through electronic equipment to obtain a second target feature map containing multi-dimensional features. Specifically, step 120, performing feature extraction on the sample image based on the convolutional neural network to obtain a second target feature map including multi-dimensional features, includes:
step 121, performing convolution operation on the sample image through a preset first convolution kernel to obtain a second feature map;
the electronic device may perform a convolution operation on the sample image by a small-sized 3 x 3 convolution kernel, resulting in a second feature map.
Step 122, linearly transforming the second feature map through a preset second convolution kernel to obtain a second abstract feature map with the size consistent with that of the image to be identified;
the electronic device can adopt 1 × 1 convolution kernel to make linear conversion, so that the number of input channels and the number of output channels are unchanged, and dimension reduction does not occur. And setting the step size of the convolution operation to be 1, and carrying out padding operation to ensure that the sizes before and after convolution are unchanged, so as to obtain a second abstract feature map which is consistent with the size of the sample image.
Step 123, performing maximum pooling on the second abstract feature map to obtain the second target feature map containing the multi-dimensional features.
A maximum pooling operation is performed on the convolved second abstract feature map, downsampling it to obtain the second target feature map containing the multi-dimensional features and further reducing its size.
Step 130, performing scene classification on the second target feature map to obtain the character type information of the characters to be recognized of the sample image.
The electronic equipment performs scene classification on the second target feature map to obtain the character type information of the characters to be recognized of the sample image. Specifically, step 130, performing scene classification on the second target feature map to obtain the character type information of the characters to be recognized of the sample image, includes:
Step 131, performing a full convolution operation on the second target feature map to obtain a second target one-dimensional feature vector;
the second target feature map passes through three fully connected layers with 4096, 4096 and 3 neurons respectively and becomes the second target one-dimensional feature vector.
Step 132, calculating the second target one-dimensional feature vector through a softmax function to obtain scene classification probabilities of multiple types;
and calculating the one-dimensional feature vector of the second target by using a softmax function to obtain the probability of each scene classification. softmax will map the image to be recognized onto 3 categories,
Step 133, obtaining, from the scene classification probabilities of the multiple categories, the scene classification corresponding to the maximum scene classification probability, and outputting it as the character type information.
The scene classification corresponding to the maximum scene classification probability is obtained from the scene classification probabilities of the multiple categories and output as the character type information.
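A minimal PyTorch sketch of this scene classification head is given below; the flattened input size (512 × 7 × 7, the VGG16 convolutional output for a 224 × 224 input) and the ReLU activations are assumptions, while the 4096-4096-3 fully connected layers and the softmax follow the description:

```python
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Flatten(),                       # second target feature map -> vector
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 3),                 # second target one-dimensional feature vector
)

feature_map = torch.randn(1, 512, 7, 7)          # from the feature extraction layer
probs = torch.softmax(head(feature_map), dim=1)  # scene classification probabilities
scene = ["natural scene", "printed", "handwritten"][int(probs.argmax())]
print(probs, scene)                              # category with maximum probability
```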
In other aspects of the present application, after the step 200 of determining the image to be identified, the method further includes preprocessing the image to be identified;
step 210, preprocessing the image to be recognized, including:
step 211, converting the image to be recognized into a pixel matrix;
and reading the image to be recognized, wherein the image to be recognized is an RGB image. Converting an image to be identified into a three-dimensional pixel matrix in a (w, h, c) form by using opencv software, wherein w is the width of the image to be identified, h is the height of the image to be identified, and c is the number of image channels;
step 212, adjusting the size of the pixel matrix to meet the input requirement of the convolutional neural network;
specifically, the width and height of the image to be recognized are fixed to a size of 224 × 224, which is a fixed input size of the CNN network, by a resize operation.
Step 213, deleting abnormal images to be recognized; an abnormal image to be recognized is an image to be recognized with missing data or a bitmap error.
Unqualified input images, such as images to be recognized with missing data or with bitmap errors, are eliminated.
Preprocessing the image to be recognized ensures that it meets the input requirement of the convolutional neural network, so that the subsequent steps proceed smoothly.
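A minimal sketch of steps 211 to 213 using OpenCV follows; cv2.imread returns None for unreadable files, which covers the abnormal-image check, and the 224 × 224 target size follows step 212:

```python
import cv2

def preprocess(path: str, size: int = 224):
    image = cv2.imread(path)      # (h, w, c) pixel matrix, or None on failure
    if image is None:             # step 213: discard abnormal images
        return None
    # Note: OpenCV loads channels in BGR order; convert with
    # cv2.cvtColor(image, cv2.COLOR_BGR2RGB) if RGB order is required.
    return cv2.resize(image, (size, size))  # step 212: fixed CNN input size

matrix = preprocess("sample.jpg")
if matrix is not None:
    print(matrix.shape)           # (224, 224, 3)
```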
The following describes the character recognition device provided by the present invention, and the character recognition device described below and the character recognition method described above may be referred to correspondingly.
Referring to fig. 12, the present invention provides a character recognition apparatus, including:
a first determining module 201, configured to determine an image to be recognized, where the image to be recognized includes characters to be recognized;
a character type acquisition module 202, configured to input the image to be recognized into a character type recognition model to obtain character type information, where the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized;
and a second determining module 203, configured to determine the character detection and recognition model corresponding to the character type information and perform character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
On the basis of the foregoing embodiments, as an optional embodiment, the character type recognition model in the character type acquisition module includes a feature extraction layer, and the feature extraction layer includes:
the first convolution layer, which is used for performing a convolution operation on the image to be recognized with a preset first convolution kernel to obtain a first feature map;
the first linear transformation layer, which is used for linearly transforming the first feature map with a preset second convolution kernel to obtain a first abstract feature map whose size is consistent with that of the image to be recognized;
and the pooling layer, which is used for performing maximum pooling on the first abstract feature map to obtain the first target feature map containing the multi-dimensional features.
On the basis of the foregoing embodiments, as an optional embodiment, the character type recognition model in the character type acquisition module further includes:
the scene classification layer, which is used for performing scene classification on the first target feature map output by the feature extraction layer to obtain the character type information of the characters to be recognized.
On the basis of the foregoing embodiments, as an optional embodiment, the scene classification layer includes:
the second convolution layer, which is used for performing a full convolution operation on the first target feature map to obtain a first target one-dimensional feature vector;
the calculation classification layer, which is used for calculating on the first target one-dimensional feature vector through a softmax function to obtain scene classification probabilities of multiple categories;
and the classification output layer, which is used for obtaining, from the scene classification probabilities of the multiple categories, the scene classification corresponding to the maximum scene classification probability and outputting it as the character type information.
On the basis of the above embodiments, as an optional embodiment, the character recognition apparatus further includes a training module for training the character type recognition model; the training module comprises:
the first acquisition module, which is used for acquiring a sample image, wherein the sample image comprises characters to be recognized;
the feature map acquisition module, which is used for performing feature extraction on the sample image based on a convolutional neural network to obtain a second target feature map containing multi-dimensional features;
and the scene classification module, which is used for performing scene classification on the second target feature map to obtain the character type information of the characters to be recognized of the sample image.
On the basis of the foregoing embodiments, as an optional embodiment, the feature map obtaining module includes:
the first sub-feature map acquisition module, which is used for performing a convolution operation on the sample image with a preset first convolution kernel to obtain a second feature map;
the second sub-feature map acquisition module, which is used for linearly transforming the second feature map with a preset second convolution kernel to obtain a second abstract feature map whose size is consistent with that of the sample image;
and the third sub-feature map acquisition module, which is used for performing maximum pooling on the second abstract feature map to obtain the second target feature map containing the multi-dimensional features.
On the basis of the foregoing embodiments, as an optional embodiment, the scene classification module includes:
the one-dimensional feature vector acquisition module, which is used for performing a full convolution operation on the second target feature map to obtain a second target one-dimensional feature vector;
the scene classification probability acquisition module, which is used for calculating on the second target one-dimensional feature vector through a softmax function to obtain scene classification probabilities of multiple categories;
and the character type output module, which is used for obtaining, from the scene classification probabilities of the multiple categories, the scene classification corresponding to the maximum scene classification probability and outputting it as the character type information.
On the basis of the above-described embodiments, as an alternative embodiment,
when the character type information is natural scene characters, the character detection and recognition model includes an EAST model and a CRNN model, and the second determining module includes:
the first character area position determining module is used for detecting a first character area position of the natural scene characters in the image to be recognized based on the EAST model;
and the first identification module is used for identifying the text content of the natural scene text at the position of the first text area based on the CRNN model.
On the basis of the foregoing embodiments, as an optional embodiment, when the character type information is printed characters or handwritten characters, the second determining module includes:
the second character area position determining module is used for detecting the position of a second character area of the printed characters or the handwritten characters in the image to be recognized based on the CTPN model;
and the second identification module is used for identifying the text content of the printed text or the handwritten text at the position of the second text area based on the CRNN model.
On the basis of the foregoing embodiments, as an optional embodiment, the character recognition apparatus further includes a preprocessing module for preprocessing the image to be recognized;
the preprocessing module includes:
the conversion module is used for converting the image to be recognized into a pixel matrix;
the size adjusting module is used for adjusting the size of the pixel matrix to meet the input requirement of the convolutional neural network;
and the deleting module is used for deleting an abnormal image to be recognized, where an abnormal image to be recognized is an image to be recognized with missing data or a bitmap error.
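A compact sketch of this preprocessing, assuming OpenCV for decoding; the 224x224 target size and the normalization are assumptions, since the disclosure only requires that the pixel matrix meet the convolutional network's input requirement and that abnormal images be deleted.

    import cv2
    import numpy as np

    def preprocess(path: str, size: int = 224):
        """Converts an image to a pixel matrix, resizes it to the assumed network
        input size, and discards abnormal images (missing data or bitmap errors)."""
        matrix = cv2.imread(path)  # pixel matrix; None when the bitmap cannot be read
        if matrix is None or matrix.size == 0:
            return None            # abnormal image to be recognized: delete
        return cv2.resize(matrix, (size, size)).astype(np.float32) / 255.0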
Fig. 13 illustrates a physical structure diagram of an electronic device. As shown in Fig. 13, the electronic device may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, where the processor 810, the communication interface 820 and the memory 830 communicate with one another via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a character recognition method comprising: determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized; acquiring character type information by inputting the image to be recognized into a character type recognition model; the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized; determining a character detection and recognition model corresponding to the character type information; and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer can perform the character recognition method provided by the above methods, the method comprising: determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized; acquiring character type information by inputting the image to be recognized into a character type recognition model; the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized; determining a character detection and recognition model corresponding to the character type information; and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the character recognition method provided by the above methods, the method comprising: determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized; acquiring character type information by inputting the image to be recognized into a character type recognition model; the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized; determining a character detection and recognition model corresponding to the character type information; and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
The above-described apparatus embodiments are merely illustrative; units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units, i.e., they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A character recognition method, comprising:
determining an image to be recognized, wherein the image to be recognized comprises characters to be recognized;
acquiring character type information by inputting the image to be recognized into a character type recognition model; the character type recognition model performs feature extraction on the image to be recognized based on a convolutional neural network to obtain a target feature map containing multi-dimensional features, and performs character type recognition on the characters to be recognized of the image to be recognized; the character type recognition model is obtained by training on sample images of multiple types of characters to be recognized;
determining a character detection and recognition model corresponding to the character type information; and performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model.
2. The character recognition method of claim 1, wherein the character type recognition model comprises a feature extraction layer, the feature extraction layer comprising:
the first convolution layer is used for performing a convolution operation on the image to be recognized through a preset first convolution kernel to obtain a first feature map;
the first linear transformation layer is used for linearly transforming the first feature map through a preset second convolution kernel to obtain a first abstract feature map whose size is consistent with that of the image to be recognized;
and the pooling layer is used for performing max pooling on the first abstract feature map to obtain a first target feature map containing multi-dimensional features.
3. The character recognition method of claim 2, wherein the character type recognition model further comprises:
the scene classification layer is used for performing scene classification on the first target feature map output by the feature extraction layer to obtain the character type information of the characters to be recognized.
4. The character recognition method of claim 3, wherein the scene classification layer comprises:
the second convolution layer is used for performing a full convolution operation on the first target feature map to obtain a first target one-dimensional feature vector;
the calculation classification layer is used for processing the first target one-dimensional feature vector with a softmax function to obtain scene classification probabilities of multiple types;
and the classification output layer is used for selecting, from the scene classification probabilities of the multiple types, the scene classification corresponding to the maximum scene classification probability, and outputting that scene classification as the character type information.
5. The character recognition method of claim 1, wherein before the determining of the image to be recognized, the character recognition method further comprises training the character type recognition model; the training of the character type recognition model comprises:
obtaining a sample image, wherein the sample image comprises characters to be recognized;
performing feature extraction on the sample image based on a convolutional neural network to obtain a second target feature map containing multi-dimensional features;
and performing scene classification on the second target feature map to obtain the character type information of the characters to be recognized in the sample image.
6. The character recognition method of claim 5, wherein the performing feature extraction on the sample image based on the convolutional neural network to obtain the second target feature map containing multi-dimensional features comprises:
performing a convolution operation on the sample image through a preset first convolution kernel to obtain a second feature map;
linearly transforming the second feature map through a preset second convolution kernel to obtain a second abstract feature map whose size is consistent with that of the image to be recognized;
and performing max pooling on the second abstract feature map to obtain the second target feature map containing the multi-dimensional features.
7. The character recognition method of claim 5, wherein the performing scene classification on the second target feature map to obtain the character type information of the characters to be recognized in the sample image comprises:
performing a full convolution operation on the second target feature map to obtain a second target one-dimensional feature vector;
processing the second target one-dimensional feature vector with a softmax function to obtain scene classification probabilities of multiple types;
and selecting, from the scene classification probabilities of the multiple types, the scene classification corresponding to the maximum scene classification probability, and outputting that scene classification as the character type information.
8. The character recognition method of claim 1, wherein when the character type information indicates natural scene characters, the character detection and recognition model includes an EAST model and a CRNN model, and the performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model comprises:
detecting a first character area position of the natural scene characters in the image to be recognized based on the EAST model;
and recognizing the text content of the natural scene characters at the first character area position based on the CRNN model.
9. The character recognition method of claim 1, wherein when the character type information indicates printed characters or handwritten characters, the character detection and recognition model includes a CTPN model and a CRNN model, and the performing character recognition on the characters to be recognized of the image to be recognized based on the character detection and recognition model comprises:
detecting a second character area position of the printed characters or the handwritten characters in the image to be recognized based on the CTPN model;
and recognizing the text content of the printed characters or the handwritten characters at the second character area position based on the CRNN model.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the character recognition method according to any one of claims 1 to 9.
CN202111473992.6A 2021-12-02 2021-12-02 Character recognition method and electronic equipment Pending CN114494678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111473992.6A CN114494678A (en) 2021-12-02 2021-12-02 Character recognition method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111473992.6A CN114494678A (en) 2021-12-02 2021-12-02 Character recognition method and electronic equipment

Publications (1)

Publication Number Publication Date
CN114494678A true CN114494678A (en) 2022-05-13

Family

ID=81493093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111473992.6A Pending CN114494678A (en) 2021-12-02 2021-12-02 Character recognition method and electronic equipment

Country Status (1)

Country Link
CN (1) CN114494678A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573255A (en) * 2017-03-13 2018-09-25 阿里巴巴集团控股有限公司 Method and device for recognizing character composite images, and image recognition method
CN109558787A (en) * 2018-09-28 2019-04-02 浙江农林大学 Bamboo insect pest recognition method based on a convolutional neural network model
CN110751143A (en) * 2019-09-26 2020-02-04 中电万维信息技术有限责任公司 Electronic invoice information extraction method and electronic equipment
CN111597958A (en) * 2020-05-12 2020-08-28 西安网算数据科技有限公司 Highly automated bill classification method and system
CN112052852A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Character recognition method for handwritten meteorological archive data based on deep learning
CN112508015A (en) * 2020-12-15 2021-03-16 山东大学 Nameplate recognition method, computer equipment and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
DEEPHUB: "End-to-End Text OCR with Deep Learning: Extracting Text from Natural Scene Images Using EAST", Tencent Cloud
EVAN SHELHAMER ET AL: "Fully Convolutional Networks for Semantic Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence
KAREN SIMONYAN AND ANDREW ZISSERMAN: "Very Deep Convolutional Networks for Large-Scale Image Recognition", arXiv:1409.1556v6
张强 et al.: "High-Resolution Remote Sensing Image Processing and Applications Based on Deep Neural Network Technology", 31 August 2020
董洪义: "Deep Learning with PyTorch: Object Detection in Practice", 31 March 2020
言有三: "Deep Learning for Face Image Processing: Core Algorithms and Case Studies", 31 July 2020

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630755A (en) * 2023-04-10 2023-08-22 雄安创新研究院 Method, system and storage medium for detecting text position in scene image
CN116630755B (en) * 2023-04-10 2024-04-02 雄安创新研究院 Method, system and storage medium for detecting text position in scene image

Similar Documents

Publication Publication Date Title
US10762376B2 (en) Method and apparatus for detecting text
CN109902622B (en) Character detection and identification method for boarding check information verification
WO2019192397A1 (en) End-to-end recognition method for scene text in any shape
CN111488826B (en) Text recognition method and device, electronic equipment and storage medium
CN113111871B (en) Training method and device of text recognition model, text recognition method and device
CN110348294B (en) Method and device for positioning chart in PDF document and computer equipment
CN107133622B (en) Word segmentation method and device
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
US9014480B2 (en) Identifying a maximally stable extremal region (MSER) in an image by skipping comparison of pixels in the region
CN111178290A (en) Signature verification method and device
CN114429637B (en) Document classification method, device, equipment and storage medium
CN111860309A (en) Face recognition method and system
WO2024041032A1 (en) Method and device for generating editable document based on non-editable graphics-text image
CN113255501B (en) Method, apparatus, medium and program product for generating form recognition model
CN110796145A (en) Multi-certificate segmentation association method based on intelligent decision and related equipment
CN114494678A (en) Character recognition method and electronic equipment
CN113496212A (en) Text recognition method and device for box-type structure and electronic equipment
Salunkhe et al. Recognition of multilingual text from signage boards
CN114463767A (en) Credit card identification method, device, computer equipment and storage medium
CN114120305A (en) Training method of text classification model, and recognition method and device of text content
CN114170423A (en) Image document layout identification method, device and system
CN114550179A (en) Method, system and equipment for guiding handwriting Chinese character blackboard writing
CN113516148A (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN115631493B (en) Text region determining method, system and related device
CN116259050B (en) Method, device, equipment and detection method for positioning and identifying label characters of filling barrel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220513