CN114943958A

CN114943958A - Character recognition method, character recognition device, computer equipment and storage medium

Info

Publication number: CN114943958A
Application number: CN202210384895.8A
Authority: CN
Inventors: 申啸尘; 周有喜
Original assignee: Shenzhen Aishen Yingtong Information Technology Co Ltd
Current assignee: Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority date: 2022-04-13
Filing date: 2022-04-13
Publication date: 2022-08-26

Abstract

The application provides a character recognition method, a device, a computer device and a storage medium, wherein the method comprises the following steps: acquiring an image to be recognized, wherein the image to be recognized comprises characters to be recognized; in the process of extracting the features of the image to be recognized through a feature extraction network, performing target vector replacement on at least one feature map extracted by the feature extraction network to obtain a target feature map corresponding to the image to be recognized; the target vector is a vector arranged in the character direction in the feature map, and the target vector replacement is to replace another target vector in the feature map by using one target vector in the feature map; and performing character recognition based on the target feature map to determine characters in the image to be recognized. The technical scheme fully establishes the incidence relation between the characters and improves the accuracy of character recognition.

Description

Character recognition method, character recognition device, computer equipment and storage medium

Technical Field

The present application relates to the field of image recognition, and in particular, to a character recognition method, apparatus, computer device, and storage medium.

Background

Optical Character Recognition (OCR) refers to a process in which an electronic device (e.g., a scanner or a camera) detects characters printed on a paper document, determines the shape of the characters by detecting dark and light patterns, and then translates the shape into characters by a character recognition method. In a conventional OCR scheme, a projection method is generally used to cut out a single character, and then the cut-out character is sent to a Convolutional Neural Network (CNN) for classification.

With the development of technology, people propose an end-to-end OCR recognition scheme based on deep learning, that is, characters are not cut any more, but character recognition is converted into a problem of sequence learning, the cutting of characters is merged into the deep learning, text recognition is directly performed on a text image containing the characters, and the characters in the text image are determined. The deep learning-based end-to-end OCR recognition scheme is mainly characterized in that characters are output after translation and transcription are carried out on a prediction label through convolution feature extraction and sequence feature prediction. Because there is a certain association relation between characters, when character recognition is performed, if the correlation between characters can be established, the recognition efficiency can be improved. Therefore, how to establish the association relationship between characters becomes a technical problem which needs to be solved urgently.

Disclosure of Invention

The application provides a character recognition method, a character recognition device, computer equipment and a storage medium, so as to establish an incidence relation between characters and improve the accuracy of character recognition.

In a first aspect, a character recognition method is provided, including:

acquiring an image to be recognized, wherein the image to be recognized comprises characters to be recognized;

in the process of extracting the features of the image to be recognized through a feature extraction network, performing target vector replacement on at least one feature map extracted by the feature extraction network to obtain a target feature map corresponding to the image to be recognized; the target vector is a vector arranged in the character direction in the feature map, and the target vector replacement is that one target vector in the feature map is used for replacing another target vector in the feature map;

and performing character recognition based on the target feature map to determine characters in the image to be recognized. .

According to the technical scheme, after the image to be recognized is obtained, in the process of extracting the features of the image to be recognized through the feature extraction network, target vector replacement is carried out on the feature graph obtained by the feature extraction network, a target feature graph corresponding to the image to be recognized is obtained, and then character recognition is carried out on the basis of the target feature graph, so that characters in the image to be recognized are determined. The target vector is a vector arranged in the characteristic diagram along the character direction, the target vector replacement is to replace another target vector in the characteristic diagram by using one target vector in the characteristic diagram, the target vector replacement is performed on the characteristic diagram in the characteristic extraction process, namely, the vector arranged in the characteristic diagram along the character direction is replaced in the characteristic extraction process, and the vector arranged in the character direction can indicate part of characteristic information of a certain character, so that the relation among the characters can be established by replacing the vector arranged in the character direction in the characteristic diagram in the characteristic extraction process, the target characteristic diagram contains the correlation among the characters, and the character recognition accuracy can be improved.

With reference to the first aspect, in a possible implementation manner, the feature extraction network includes a plurality of feature extraction structures connected in sequence, each feature extraction structure includes at least one convolution layer, and in two adjacent feature extraction structures, a next feature extraction structure is used to perform feature extraction on a feature atlas output by a previous feature extraction structure; the performing target vector replacement on at least one feature map extracted by the feature extraction network to obtain a target feature map corresponding to the image to be recognized includes: acquiring a first feature atlas, wherein the first feature atlas is a feature atlas output by a first feature extraction structure in the feature extraction network, the first feature atlas comprises a plurality of feature maps, the first feature extraction structure is any one feature extraction structure in a preset structure set, and the preset structure set comprises at least one feature extraction structure in the feature extraction network; performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set, and inputting the updated first feature map set to a second feature extraction structure to obtain a second feature map set, where the second feature extraction structure is a next feature extraction structure connected to the first feature extraction structure, and the second feature map set is a feature map set output by the second feature extraction structure; and determining a third feature map set or an updated third feature map set as the target feature map, wherein the third feature map set is a feature map set output by a last feature extraction structure in the feature extraction network. In the process of feature extraction, the feature map is updated by replacing the target vector of the partial feature map output by the partial feature extraction structure, so that the feature map not only contains the feature information of characters, but also contains the associated information among the characters, and the accuracy of character recognition can be improved.

With reference to the first aspect, in a possible implementation manner, the performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set includes: in a first feature map, for each first target vector, replacing the first target vector with a target vector adjacent to the first target vector to obtain an updated first feature map, where the first feature map is a feature map in the first feature map set that needs to be replaced by the target vector, and the first target vector is any target vector to be replaced in the first feature map. Because the correlation between adjacent characters is higher, when the target vector is replaced, the adjacent target vectors are adopted for replacement, and the association relationship between the characters can be better established.

With reference to the first aspect, in a possible implementation manner, at least one feature map in the first feature map set includes 2n feature maps in the first feature map set, where n is a positive integer greater than or equal to 1; the performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set includes: in a second feature map, for each second target vector, replacing the second target vector with a target vector adjacent to the second target vector in the first character direction to obtain an updated second feature map, where the second feature map is any one of n feature maps in the 2n feature maps, and the second target vector is any one target vector to be replaced in the second feature map; in a third feature map, for each third target vector, replacing the third target vector with a target vector adjacent to the third target vector in a second character direction to obtain an updated third feature map, where the third feature map is any one of n other feature maps in the 2n feature maps, and the third target vector is any one target vector to be replaced in the third feature map; the first character direction and the second character direction are two opposite character directions. When the target vectors are replaced, the target vectors adjacent to each other in the two character directions are respectively adopted for replacement, the incidence relation between the adjacent characters can be fully established, and the accuracy of character recognition is improved.

With reference to the first aspect, in a possible implementation manner, the n feature maps are first n feature maps in the first feature map set, and the n other feature maps are last n feature maps in the first feature map set.

With reference to the first aspect, in a possible implementation manner, the feature extraction network includes M feature extraction structures, the preset structure set includes an ith feature extraction structure in the feature extraction network, i is greater than or equal to 2 and is less than or equal to (M-1), and M is a positive integer greater than 4. The characteristic extraction structure without the first characteristic extraction structure and the last characteristic extraction structure in the characteristic extraction network is extracted to obtain characteristic graphs, target vector replacement is carried out on the characteristic graphs, and the association relation between the characters can be fully established.

With reference to the first aspect, in a possible implementation manner, before performing target vector replacement on at least one feature map extracted by the feature extraction network, the method further includes: determining the number of characters to be recognized contained in the image to be recognized; and determining the number of the at least one feature map according to the number of the characters to be recognized. Before the target vector replacement is carried out, the number of the characters to be recognized is determined to determine the number of the characters to be subjected to the target vector replacement, so that the association relation between the characters can be established more reasonably, and the character recognition accuracy is improved.

In a second aspect, there is provided a character recognition apparatus comprising:

the device comprises an image acquisition module, a recognition module and a recognition module, wherein the image acquisition module is used for acquiring an image to be recognized, and the image to be recognized comprises characters to be recognized;

the replacing module is used for replacing a target vector of at least one feature map extracted by the feature extraction network in the process of extracting the features of the image to be identified through the feature extraction network so as to obtain a target feature map corresponding to the image to be identified; the target vector is a vector arranged in the character direction in the feature map, and the target vector replacement is that one target vector in the feature map is used for replacing another target vector in the feature map;

and the character determining module is used for performing character recognition based on the target feature map so as to determine characters in the image to be recognized.

In a third aspect, there is provided a computer device comprising a memory and one or more processors for executing one or more computer programs stored in the memory, the one or more processors, when executing the one or more computer programs, causing the computer device to implement the character recognition method of the first aspect described above.

In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the character recognition method of the first aspect.

The application can realize the following technical effects: the target vector replacement of the feature map in the feature extraction process means that the vectors arranged along the character direction in the feature map are replaced in the feature extraction process, and the vectors arranged along the character direction can indicate part of feature information of a certain character, so that the association between characters can be established by replacing the vectors arranged along the character direction in the feature map in the feature extraction process, the target feature map contains the correlation between the characters, and the character recognition accuracy can be improved.

Drawings

FIG. 1 is a schematic diagram of an OCR recognition system;

fig. 2 is a schematic flowchart of a character recognition method according to an embodiment of the present application;

FIG. 3 illustrates a process for target vector replacement of a feature map;

fig. 4 is a schematic diagram of a specific configuration of a feature extraction network according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

The technical scheme of the application can be applied to various OCR recognition scenes. The technical scheme of the application can be particularly applied to various types of computer equipment, such as notebook computers, servers and the like.

For ease of understanding, a recognition scheme for OCR recognition will first be described. Referring to fig. 1, fig. 1 is a schematic diagram of an OCR recognition system. As shown in fig. 1, the OCR recognition system 10 is a cyclic convolution network (CRNN), which specifically includes three parts, from top to bottom, a convolution structure 101, a cyclic structure 102 and a transcription structure 103, where the convolution structure 101 includes a plurality of convolution layers for performing feature extraction on an input text image to obtain a feature map; the loop structure 102 includes a bidirectional long-and-short term memory network (BiLSTM) for converting the feature map into a feature sequence, predicting the feature sequence, and outputting a predicted tag distribution; the transcription structure 103 is used to classify (CTC) loss based on the timing of a neural network, convert the predicted tag distribution into a final tag sequence, and obtain a text in a text image based on the tag sequence. In the CRNN network, the features are time-sequentially associated through the BiLSTM in the loop structure 102, and this processing method can establish the association relationship between the characters to a certain extent, but since the feature extraction only involves extracting the features of the characters themselves when extracting the features, the association relationship between the characters established only through the BiLSTM may be insufficient, that is, the association degree between the established characters is not strong enough, which may bring certain errors to character recognition.

In view of the above, the present application provides a new technical idea, before performing sequence feature prediction based on a feature map and translating and transcribing a prediction tag to output characters in an image, in the process of extracting features from the image to obtain the feature map, performing target vector replacement on the feature map extracted by a feature extraction network, that is, replacing vectors arranged in the character direction in the feature map, so that the feature map includes features of each character and an association relationship between characters, and by establishing the association relationship between characters in a feature extraction stage, the association degree between characters can be improved, thereby facilitating better sequence feature prediction and character output, and improving the accuracy of character recognition.

The technical solution of the present application is specifically described below.

Referring to fig. 2, fig. 2 is a schematic flowchart of a character recognition method provided in an embodiment of the present application, where the method is applicable to a computer device, and as shown in fig. 2, the method includes the following steps:

s201, acquiring an image to be identified.

Here, the image to be recognized refers to an image in which characters need to be recognized, the image to be recognized includes the characters to be recognized, and the characters to be recognized refer to characters that need to be recognized, including but not limited to chinese characters, english characters, numeric characters, and the like.

Specifically, the image to be recognized can be obtained by shooting or scanning a paper document containing characters; or, the image may also be obtained by capturing an image of a file scanned into an electronic file, and the method for obtaining the image to be identified is not limited in the present application.

S202, in the process of extracting the features of the image to be recognized through the feature extraction network, target vector replacement is carried out on at least one feature map extracted through the feature extraction network so as to obtain a target feature map corresponding to the image to be recognized.

In the embodiment of the application, the process of extracting the features of the image to be recognized through the feature extraction network refers to a process of inputting the image to be recognized into the feature extraction network, and recognizing and extracting various image features (such as color features, texture features, size features, spatial features and the like) of the image to be recognized through a method or a structure for extracting the image features, such as network operators or convolution kernels and the like in the feature extraction network, so as to obtain a feature map capable of representing the various image features of the image to be recognized. Illustratively, the feature extraction network may be the convolution structure 101 of FIG. 1 described previously.

The target vector is a vector arranged in the character direction in the feature map. Specifically, in the case where the character direction is horizontal (i.e., the characters are arranged horizontally in the image to be recognized), the target vector may be a vector arranged horizontally in the feature map, i.e., the target vector is a column vector in the feature map; in the case where the character direction is vertical (i.e., the characters are vertically arranged in the image to be recognized), the target vector is a vector vertically arranged in the feature map, i.e., the target vector is a row vector in the feature map.

And target vector replacement, namely replacing one target vector in the feature map by another target vector in the feature map. Taking the target vector as a column vector in the feature map as an example, the target vector replacement means replacing one column vector in the feature map with another column vector in the feature map. Performing target vector replacement on the feature map, namely performing one or more target vector replacement processes in the feature map to obtain a new feature map; wherein, the target vector replaced in each replacing process is different. The number of times of the target vector replacement process in the feature map and the target vector to be replaced in the target feature map can be set according to specific requirements. Exemplarily, referring to fig. 3, fig. 3 shows a process of target vector replacement for a feature map. For the feature map T1 in fig. 3, replacing the column vector L2 in the feature map T1 with the column vector L3 in the feature map T1, replacing the column vector L1 in the feature map T1 with the column vector L2 in the feature map T1, replacing the column vector L4 in the feature map with the column vector L3 in the feature map, replacing the column vector L5 in the feature map with the column vector L4 in the feature map, and replacing the column vector L1 in the feature map with the column vector L5 in the feature map; after 5 times of target vector replacement process, a feature map T2 can be obtained, and the feature map T2 is a new feature map corresponding to the feature map T1. The characteristic diagram is obtained by extracting the characteristics of the image to be recognized, the characteristic diagram is used for reflecting the image characteristics of the image to be recognized, the target vector is a vector arranged in the characteristic diagram along the character direction and used for indicating the characteristic information of a certain character, and the characteristic information of other characters can be merged into the characteristic diagram by performing one or more target vector replacement processes in the characteristic diagram, so that the relation between the characters is established.

In some possible cases, the feature extraction network is based on a convolutional neural network to perform feature extraction on the image to be recognized so as to obtain a feature map. The feature extraction network comprises feature extraction structures which are connected in sequence, each feature extraction structure comprises at least one convolution layer, and in two adjacent feature extraction structures, the next feature extraction structure is used for performing feature extraction on a feature atlas output by the previous feature extraction. Target vector replacement can be performed on at least one feature map extracted by the feature extraction network through the following steps a1-a4 to obtain a target feature map corresponding to the image to be recognized.

And A1, acquiring a first feature atlas.

And A2, performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set.

And A3, inputting the updated first feature atlas into the second feature extraction structure to obtain a second feature atlas.

And A4, determining the third feature map set or the updated third feature map set as a target feature map corresponding to the image to be recognized.

The first feature map set in step a1 is a feature map set output by a first feature extraction structure in the feature extraction network structure, where the first feature map set includes multiple feature maps, the first feature extraction structure is any one feature extraction structure in a preset structure set, and the preset structure set includes at least one feature extraction structure in the feature extraction network. The first feature extraction structure can be understood as a feature extraction structure which is preset in a feature extraction network and needs to be subjected to target vector replacement; and presetting a structure set, and extracting the structure set for the features needing target vector replacement. The preset feature extraction structure requiring target vector replacement can be set according to the structure and specific requirements of the feature extraction network.

In a specific implementation manner, the feature extraction network includes M feature extraction structures, M is a positive integer greater than 4, the preset structure set may include an ith feature extraction structure in the feature extraction network, i is greater than or equal to 2 and less than or equal to (M-1), that is, the 2 nd feature extraction structure and the 3 rd (M-1) th feature extraction structure … … in the feature extraction network are all feature extraction structures that need to be replaced by target vectors, that is, the first feature extraction structure.

Taking the feature extraction network as the convolution structure 101 in fig. 1 as an example, a specific configuration of the convolution structure 101 may specifically include 5 feature extraction structures, namely, a feature extraction structure Q1, a feature extraction structure Q2, a feature extraction structure Q3, a feature extraction structure Q4, and a feature extraction structure Q5, as shown in fig. 4. The feature extraction structure Q1 comprises a convolution layer and a maximum pooling layer, and is used for performing feature extraction on 1 text image grayscale map of 32 × 160 to obtain 64 feature maps of 16 × 80; the feature extraction structure Q2 includes a convolution layer and a maximum pooling layer, and is used for performing feature extraction on 64 16 × 80 feature maps to obtain 128 8 × 40 feature maps; a feature extraction structure Q3, including 2 convolutional layers and one max pooling layer, for performing feature extraction on 128 features of 8 × 40, to obtain 256 features of 4 × 40; the feature extraction structure Q4 includes 2 convolutional layers and one max pooling layer, and is used for performing feature extraction on 256 4 × 40 feature maps to obtain 512 2 × 40 feature maps; the feature extraction structure Q5 includes a convolution layer, and is used to perform feature extraction on 512 features of 2 × 40, so as to obtain 512 features of 1 × 40. For the feature extraction network shown in fig. 4, the feature extraction structure Q2, the feature extraction structure Q3, and the feature extraction structure Q4 may be determined as the feature extraction structure to be performed, i.e., the first feature extraction structure. Corresponding to the first feature extraction structure, the first feature map set may include 64 16 × 80 feature maps output by the feature extraction structure Q2, or 128 8 × 40 feature maps output by the feature extraction structure Q3, or 512 2 × 40 feature maps output by the feature extraction structure Q4.

The characteristic images obtained by extracting the characteristic extracting structures except the first characteristic extracting structure and the last characteristic extracting structure in the characteristic extracting network are subjected to target vector replacement, so that the incidence relation between the characters can be fully established.

In one possible implementation manner, for step a2, target vector replacement may be performed on at least one feature map in the first feature map set based on neighboring target vector replacement.

For a feature map (hereinafter referred to as a first feature map) in the first feature map set, for each target vector to be replaced (hereinafter referred to as a first target vector) in the first feature map, replacing the first target vector with a target vector adjacent to the first target vector to obtain an updated first feature map.

Specifically, which feature maps in the first feature map set are determined as the first feature maps and which target vectors in the first feature maps are determined as the first target vectors may be set according to specific requirements. Specifically, a preset number of feature maps in the first feature map set may be determined as the first feature map. Or, the first feature maps are selected in proportion, and the number of the first features is equal to the product of the number of the features in the first feature map set and a preset proportion. The first feature map is selected in proportion, so that the feature map output by the first feature extraction structure can be ensured to contain original features of characters and associated features between adjacent characters, and the accuracy of character recognition is improved. Specifically, all target vectors in the first feature map or other target vectors except the first and last two target vectors may be determined as the first target vector. In this way, as many target vectors as possible in the first feature map can be replaced, and the association relationship between adjacent characters can be fully established.

Since there are two target vectors adjacent to some target vectors in one feature map (e.g., column vector L2 in fig. 3), the number of feature maps to be replaced by target vectors may be set to 2n, where n is a positive integer greater than or equal to 1, in order to establish the relationship between the two adjacent target vectors; for any feature map (hereinafter referred to as a second feature map) of the n feature maps requiring target vector replacement, in the second feature map, for each target vector to be replaced (hereinafter referred to as a second target vector) in the second feature map, replacing the second target vector with a target vector adjacent to the second target vector in the first character direction to obtain an updated second feature map; for any feature in the n feature maps (hereinafter referred to as a third feature map) that needs to be replaced by the target vector, in the third feature map, for each target vector to be replaced in the third feature map (hereinafter referred to as a third target vector), the third target vector may be replaced by a target vector adjacent to the third target vector in the second character direction, so as to obtain an updated third feature map.

The first character direction and the second character direction are two opposite character directions. Taking the character direction as the horizontal direction as an example, the first character direction may be from left to right, and the second character direction may be from right to left; alternatively, the first character direction may be from right to left and the second character direction may be from left to right. The target vectors to be replaced may be other target vectors except the first and last two target vectors in the feature map to be replaced with the feature target vectors, that is, the second target vector is other target vectors except the first and last two target vectors in the second feature map, and the third target vector is other target vectors except the first and last two target vectors in the third feature map.

In a specific implementation manner, in 2n feature maps that need to be subjected to vector replacement, the n feature maps that need to be subjected to target vector replacement may be the first n feature maps in the first feature map set, that is, feature maps output by the first n channels in the first feature extraction structure; the n additional feature maps requiring target vector replacement may be the last n feature maps in the first feature map set, that is, feature maps output by the last n channels in the first feature extraction structure.

Taking the first feature extraction structure as the feature extraction structure Q2 in fig. 4 as an example, the first feature map set includes 128 pieces of 8 × 40 feature maps, and then the first 16 pieces of 8 × 40 feature maps and the last 16 pieces of 8 × 40 feature maps may be selected as 2n feature maps to be subjected to vector replacement. In the replaced 2n feature maps, the target vectors are column vectors, the column vectors are sequentially arranged in the character direction, and the arrangement sequence is 1-40 in sequence. Then, for the first 16 feature maps of 8 × 40, the 2 st column vector in the feature map may be replaced by the 1 st column vector in the feature map, the 3 rd column vector in the feature map may be replaced by the 2 nd column vector in the feature map, … …, and the 39 th column vector in the feature map may be replaced by the 38 th column vector in the feature map, so as to update each feature map, and obtain the updated first 16 feature maps of 8 × 40. For the last 16 feature maps 8 × 40, the 39 th column vector in the feature map may be replaced by the 40 th column vector in the feature map, the 38 th column vector in the feature map may be replaced by the 39 th column vector in the feature map, … …, and the 2 nd column vector in the feature map may be replaced by the 3 rd column vector in the feature map, so as to update each feature map, and obtain the updated last 16 feature maps 8 × 40. Thus, the updated first 16 feature maps of 8 × 40, the updated last 16 feature maps of 8 × 40, and the non-updated 96 feature maps of 8 × 40 constitute the updated first feature map set. It should be understood that when the first feature extraction structure is the feature extraction structure Q3 and the feature extraction structure Q4 in fig. 4, the same reason as that when the first feature extraction structure is the feature extraction structure Q2 in fig. 4 is not repeated.

When the target vectors are replaced, the target vectors adjacent to each other in the two character directions are respectively adopted for replacement, and the incidence relation between the adjacent characters can be fully established; because the incidence relation between adjacent characters in the characters is close and recent, the accuracy of character recognition is improved by fully utilizing the incidence relation between the adjacent characters.

Optionally, other target vector replacement manners may also be adopted to establish the association relationship between the characters, which is not limited in this application.

The second feature extraction structure in step a3 is the next feature extraction structure connected to the first feature extraction structure, and the second feature atlas is the feature atlas output by the second feature extraction structure. For example, the first feature extraction structure is the feature extraction structure Q2 in fig. 4, the second feature extraction structure is the feature extraction structure Q3, and the second feature map set includes 256 4 × 40 feature maps output by the feature extraction structure Q3; the first feature extraction structure is the feature extraction structure Q3 in fig. 4, the second feature extraction structure is the feature extraction structure Q4, and the second feature map set includes 512 pieces of 2 × 40 feature maps output by the feature extraction structure Q4; the first feature extraction structure is the feature extraction structure Q4 in fig. 4, the second feature extraction structure is the feature extraction structure Q5, and the second feature map set includes 512 pieces of 1 × 40 feature maps output by the feature extraction structure Q5.

In step a4, the third feature map set is the feature map set output by the last feature extraction structure in the feature extraction network. And determining the third feature map set as a target feature map under the condition that the last feature extraction structure in the feature extraction network is not a preset feature extraction structure needing target vector replacement. For example, as shown in fig. 4, if the feature extraction structure Q2, the feature extraction structure Q3, and the feature extraction structure Q4 are determined as feature extraction structures to be performed, 512 sheets of 1 × 40 feature maps output by the feature extraction structure Q5 may be determined as target feature maps corresponding to the image to be recognized. And under the condition that the last feature extraction structure in the feature extraction network is a preset feature extraction structure needing target vector replacement, determining the updated third feature map set as a target feature map.

And S203, performing character recognition based on the target feature map corresponding to the image to be recognized to determine characters in the image to be recognized.

In a possible implementation, the target feature map corresponding to the image to be recognized may be input into the loop structure 102 shown in fig. 1, and the output of the transcription structure 103 may be obtained to determine the characters in the image to be recognized.

In the technical scheme, after the image to be recognized is obtained, in the process of extracting the features of the image to be recognized through the feature extraction network, the feature map mentioned by the feature extraction network is subjected to target vector replacement to obtain a target feature map corresponding to the image to be recognized, and then character recognition is performed based on the target feature map, so that characters in the image to be recognized are determined. The target vector is a vector arranged in the characteristic diagram along the character direction, the target vector replacement is to replace another target vector in the characteristic diagram by using one target vector in the characteristic diagram, the target vector replacement is performed on the characteristic diagram in the characteristic extraction process, namely, the vector arranged in the characteristic diagram along the character direction is replaced in the characteristic extraction process, and the vector arranged in the character direction can indicate part of characteristic information of a certain character, so that the relation among the characters can be established by replacing the vector arranged in the character direction in the characteristic diagram in the characteristic extraction process, the target characteristic diagram contains the correlation among the characters, and the character recognition accuracy can be improved.

Optionally, in some possible cases, the number of feature maps to be replaced by the target vector may also be determined according to the character condition contained in the image to be recognized. Before target vector replacement is carried out on at least one feature map extracted by the feature extraction network, the number of characters to be recognized contained in the image to be recognized can be determined; and determining the number of feature maps which need to be subjected to target vector replacement in the feature extraction network according to the number of the characters to be recognized contained in the image to be recognized.

Specifically, the number of characters to be recognized contained in the image to be recognized may be determined by a character number prediction model; the character number prediction model may be obtained by pre-training. The number of the characters to be recognized contained in the image to be recognized can also be determined in a manner of cutting the characters by a projection method.

Specifically, when the number of the characters to be recognized is large, it is described that the incidence relation among the characters included in one target vector in the feature map is large, and the number of the feature maps which need to be replaced by the target vector can be reduced; when the number of the characters to be recognized is small, it is described that the incidence relation among the characters included in one target vector in the feature map is small, and the number of the feature maps which need to be replaced by the target vector can be increased. The method comprises the steps of obtaining a first feature map set, obtaining a number of characters to be recognized, determining the proportion of feature maps needing target vector replacement compared with the first feature map set according to the number of the characters to be recognized, and determining the number of the feature maps needing target vector replacement by the product of the total number of the feature maps contained in the first feature map set and the proportion.

Before the target vector replacement is carried out, the number of the characters to be recognized is determined to determine the number of the characters to be subjected to the target vector replacement, so that the association relation between the characters can be established more reasonably, and the character recognition accuracy is improved.

While the method of the present application is described above, to better practice the method of the present application, the apparatus of the present application is described next.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application, where the character recognition apparatus may be a computer device or a part of a computer device. As shown in fig. 5, the character recognition apparatus 30 includes:

the image acquisition module 301 is configured to acquire an image to be recognized, where the image to be recognized includes characters to be recognized;

a replacing module 302, configured to perform target vector replacement on at least one feature map extracted by a feature extraction network in a process of performing feature extraction on the image to be identified through the feature extraction network, so as to obtain a target feature map corresponding to the image to be identified; the target vector is a vector arranged in the character direction in the feature map, and the target vector replacement is to replace another target vector in the feature map by using one target vector in the feature map;

a character determining module 303, configured to perform character recognition based on the target feature map to determine characters in the image to be recognized.

In some possible designs, the feature extraction network includes a plurality of feature extraction structures connected in sequence, each feature extraction structure includes at least one convolution layer, and in two adjacent feature extraction structures, a next feature extraction structure is used for performing feature extraction on a feature atlas output by a previous feature extraction structure; the replacement module 302 is specifically configured to: acquiring a first feature atlas, wherein the first feature atlas is a feature atlas output by a first feature extraction structure in the feature extraction network, the first feature atlas comprises a plurality of feature maps, the first feature extraction structure is any one feature extraction structure in a preset structure set, and the preset structure set comprises at least one feature extraction structure in the feature extraction network; performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set, and inputting the updated first feature map set to a second feature extraction structure to obtain a second feature map set, where the second feature extraction structure is a next feature extraction structure connected with the first feature extraction structure, and the second feature map set is a feature map set output by the second feature extraction structure; and determining a third feature map set or an updated third feature map set as the target feature map, wherein the third feature map set is a feature map set output by a last feature extraction structure in the feature extraction network.

In some possible designs, the replacement module 302 is specifically configured to: in a first feature map, for each first target vector, replacing the first target vector with a target vector adjacent to the first target vector to obtain an updated first feature map, where the first feature map is a feature map in the first feature map set that needs to be replaced by the target vector, and the first target vector is any target vector to be replaced in the first feature map.

In some possible designs, at least one feature map in the first feature map set includes 2n feature maps in the first feature map set, n being a positive integer greater than or equal to 1; the replacement module 302 is specifically configured to: in a second feature map, for each second target vector, replacing the second target vector with a target vector adjacent to the second target vector in the first character direction to obtain an updated second feature map, where the second feature map is any one of n feature maps in the 2n feature maps, and the second target vector is any one target vector to be replaced in the second feature map; in a third feature map, for each third target vector, replacing the third target vector with a target vector adjacent to the third target vector in a second character direction to obtain an updated third feature map, where the third feature map is any one of n other feature maps in the 2n feature maps, and the third target vector is any one target vector to be replaced in the third feature map; the first character direction and the second character direction are two opposite character directions.

In some possible designs, the n feature maps are the first n feature maps in the first feature map set, and the n additional feature maps are the last n feature maps in the first feature map set.

In one possible design, the feature extraction network includes M feature extraction structures, the preset structure set includes the ith feature extraction structure in the feature extraction network, i is greater than or equal to 2 and less than or equal to (M-1), and M is a positive integer greater than 4.

In some possible designs, the character recognition apparatus 30 further includes a number determining module 304 for determining the number of characters to be recognized contained in the image to be recognized; and determining the number of the at least one feature map according to the number of the characters to be recognized.

It should be noted that, for what is not mentioned in the embodiment corresponding to fig. 5, reference may be made to the description of the foregoing method embodiment, and details are not described here again.

According to the device, after the image to be recognized is obtained, in the process of performing feature extraction on the image to be recognized through the feature extraction network, the feature map mentioned by the feature extraction network is subjected to target vector replacement to obtain the target feature map corresponding to the image to be recognized, and then character recognition is performed on the basis of the target feature map, so that characters in the image to be recognized are determined. The target vector is a vector arranged in the characteristic diagram along the character direction, the target vector replacement is to replace another target vector in the characteristic diagram by using one target vector in the characteristic diagram, the target vector replacement is performed on the characteristic diagram in the characteristic extraction process, namely, the vector arranged in the characteristic diagram along the character direction is replaced in the characteristic extraction process, and the vector arranged in the character direction can indicate part of characteristic information of a certain character, so that the relation among the characters can be established by replacing the vector arranged in the character direction in the characteristic diagram in the characteristic extraction process, the target characteristic diagram contains the correlation among the characters, and the character recognition accuracy can be improved.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present application, where the computer device 40 includes a processor 401 and a memory 402. The processor 401 is connected to the memory 402, for example, the processor 401 may be connected to the memory 402 through a bus.

The processor 401 is configured to support the computer device 40 to perform the corresponding functions in the methods in the above-described method embodiments. The processor 401 may be a Central Processing Unit (CPU), a Network Processor (NP), a hardware chip, or any combination thereof. The hardware chip may be an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.

The memory 402 is used to store program codes and the like. Memory 402 may include Volatile Memory (VM), such as Random Access Memory (RAM); the memory 402 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory (flash memory), a hard disk (HDD) or a solid-state drive (SSD); the memory 402 may also comprise a combination of memories of the kind described above.

Processor 401 may call the program code to perform the following:

and performing character recognition based on the target feature map to determine characters in the image to be recognized.

Embodiments of the present application also provide a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to execute the method according to the foregoing embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application, so that the present application is not limited thereto, and all equivalent variations and modifications can be made to the present application.

Claims

1. A character recognition method, comprising:

2. The method according to claim 1, wherein the feature extraction network comprises a plurality of feature extraction structures connected in sequence, each feature extraction structure comprises at least one convolution layer, and wherein, of two adjacent feature extraction structures, the next feature extraction structure is used for performing feature extraction on a feature atlas output by the previous feature extraction structure;

the performing target vector replacement on at least one feature map extracted by the feature extraction network to obtain a target feature map corresponding to the image to be recognized includes:

acquiring a first feature atlas, wherein the first feature atlas is a feature atlas output by a first feature extraction structure in the feature extraction network, the first feature atlas comprises a plurality of feature maps, the first feature extraction structure is any one feature extraction structure in a preset structure set, and the preset structure set comprises at least one feature extraction structure in the feature extraction network;

performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set, and inputting the updated first feature map set to a second feature extraction structure to obtain a second feature map set, where the second feature extraction structure is a next feature extraction structure connected with the first feature extraction structure, and the second feature map set is a feature map set output by the second feature extraction structure;

and determining a third feature map set or an updated third feature map set as the target feature map, wherein the third feature map set is a feature map set output by a last feature extraction structure in the feature extraction network.

3. The method according to claim 2, wherein the performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set comprises:

in a first feature map, for each first target vector, replacing the first target vector with a target vector adjacent to the first target vector to obtain an updated first feature map, where the first feature map is a feature map in the first feature map set that needs to be replaced by the target vector, and the first target vector is any target vector to be replaced in the first feature map.

4. The method according to claim 2, wherein at least one feature map in the first feature map set comprises 2n feature maps in the first feature map set, n being a positive integer greater than or equal to 1;

the performing target vector replacement on at least one feature map in the first feature map set to update the first feature map set includes:

in a second feature map, for each second target vector, replacing the second target vector with a target vector adjacent to the second target vector in the first character direction to obtain an updated second feature map, where the second feature map is any one of n feature maps in the 2n feature maps, and the second target vector is any one target vector to be replaced in the second feature map;

in a third feature map, for each third target vector, replacing the third target vector with a target vector adjacent to the third target vector in a second character direction to obtain an updated third feature map, where the third feature map is any one of n other feature maps in the 2n feature maps, and the third target vector is any one target vector to be replaced in the third feature map;

the first character direction and the second character direction are two opposite character directions.

5. The method according to claim 4, wherein the n feature maps are the first n feature maps in the first feature map set, and the n additional feature maps are the last n feature maps in the first feature map set.

6. The method of any one of claims 2-5, wherein the feature extraction network comprises M feature extraction structures, and wherein the preset set of structures comprises the i-th feature extraction structure in the feature extraction network, 2 ≦ i ≦ M-1, M being a positive integer greater than 4.

7. The method according to any one of claims 1 to 5, wherein before performing the target vector replacement on the at least one feature map extracted by the feature extraction network, the method further comprises:

determining the number of characters to be recognized contained in the image to be recognized;

and determining the number of the at least one feature map according to the number of the characters to be recognized.

8. A character recognition apparatus, comprising:

9. A computer device comprising a memory and a processor for executing one or more computer programs stored in the memory, the processor, when executing the one or more computer programs, causing the computer device to implement the method of any of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.