CN114387588A - Character recognition method and device, electronic equipment and storage medium


Info

Publication number
CN114387588A
Authority
CN
China
Prior art keywords
character
feature
target object
feature map
sub
Prior art date
Legal status
Pending
Application number
CN202011110649.0A
Other languages
Chinese (zh)
Inventor
苏伟博
马原
Current Assignee
Beijing Pengsi Technology Co ltd
Original Assignee
Beijing Pengsi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Pengsi Technology Co ltd
Priority to CN202011110649.0A
Publication of CN114387588A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a character recognition method and apparatus, an electronic device and a storage medium, belonging to the technical field of artificial intelligence. The method comprises: performing target object detection on an acquired image to determine the region in the image where the target object is located; performing feature extraction on the region to obtain a feature map of the target object; acquiring an associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object; and identifying each character in the target object according to a preset character set and the associated feature map of each character. In this way, there is no need to perform character segmentation on the region where the target object is located and then recognize each character from its own small segmented region, so the processing is simple; in addition, when each character in the target object is recognized, only the associated feature map of that character is considered, so the amount of data to be processed is small and character recognition efficiency can be improved.

Description

Character recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a character recognition method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence technology, character recognition is applied in more and more fields, and the requirements on its accuracy and speed keep rising.
Taking recognition of a license plate in an image as an example, in the related art license plate detection is first performed on the acquired image to determine the region where the license plate is located, character segmentation is then performed on that region, and finally each character is recognized individually from the small region obtained by the segmentation. Similar issues arise when recognizing characters in other objects in an image, such as production dates and factory serial numbers.
As can be seen, the related art suffers from relatively low efficiency when recognizing characters in an image.
Disclosure of Invention
The embodiment of the application provides a character recognition method, a character recognition device, electronic equipment and a storage medium, which are used for solving the problem that the recognition efficiency of characters in an image is low in the related art.
In a first aspect, an embodiment of the present application provides a character recognition method, including:
carrying out target object detection on the obtained image to determine an area where a target object in the image is located, wherein the target object comprises N characters, and N is a positive integer;
extracting the features of the region to obtain a feature map of the target object;
acquiring an associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object;
and identifying each character in the target object according to a preset character set and the associated feature map of each character in the target object.
In a possible implementation manner, acquiring an associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object includes:
determining a corresponding area of each character in the feature map according to preset position division information of each character in the target object, wherein the determined area is larger than the area actually corresponding to the character in the feature map; and acquiring the feature map corresponding to the determined area from the feature map as the associated feature map of the character.
In a possible implementation manner, acquiring an associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object includes: dividing the feature map into N sub-feature maps; taking, among the N sub-feature maps, the sub-feature map matched with the preset position division information of each character in the target object as the reference feature map of the character; and correcting the reference feature map according to the sub-feature maps adjacent to the reference feature map to obtain the associated feature map of the character.
In a possible implementation manner, the modifying the reference feature map according to the sub-feature maps adjacent to the reference feature map to obtain the associated feature map of the character includes:
performing feature extraction on the reference feature map and sub-feature maps adjacent to the reference feature map for multiple times, wherein the sub-feature maps subjected to feature extraction each time are different, and the sub-feature maps subjected to feature extraction each time are continuous in the feature map;
performing pooling processing on the feature maps extracted each time;
and fusing the pooled feature maps and the reference feature map to obtain the associated feature map of the character.
In one possible implementation, the feature extraction is performed on the reference feature map and the sub-feature maps adjacent to the reference feature map for multiple times, and includes:
and performing feature extraction on the reference feature map and sub-feature maps adjacent to the reference feature map by using different feature extraction layers in a character recognition network model, wherein the sub-feature maps for performing feature extraction on each feature extraction layer are different, the sub-feature maps extracted by each feature extraction layer are continuous in the feature map, and the character recognition network model is obtained by training the reference feature map of a character sample and the sub-feature maps adjacent to the reference feature map.
In one possible implementation, the pooling process is performed on each extracted feature map, and includes:
and inputting the output result of each feature extraction layer into a pooling layer which is connected with each feature extraction layer in the character recognition network model for pooling.
In a possible embodiment, the fusing the pooled feature maps and the reference feature map to obtain the associated feature map of the character includes:
and inputting the output result of each pooling layer and the reference feature map into a feature fusion layer in the character recognition network model for fusion processing to obtain the associated feature map of the character.
In one possible implementation, the character recognition network model is trained according to the following steps:
performing feature extraction on a reference feature map of a character sample and sub-feature maps adjacent to the reference feature map by using different feature extraction layers in a neural network model, wherein the sub-feature maps for performing feature extraction of each feature extraction layer are different, and the sub-feature maps extracted by each feature extraction layer are continuous in the feature maps;
inputting the output results of the feature extraction layers into the pooling layers connected with the feature extraction layers in the neural network model for pooling;
inputting the output result of each pooling layer and the reference feature map into a feature fusion layer in the neural network model for fusion processing;
respectively utilizing the output results of the feature fusion layer and the output results of the pooling layers to perform character recognition;
and adjusting the model parameters of the neural network model according to the character recognition error of the output result of the feature fusion layer and the character recognition error of the output result of each pooling layer to obtain the character recognition network model.
In a second aspect, an embodiment of the present application provides a character recognition apparatus, including:
the detection module is used for detecting a target object in the acquired image so as to determine an area where the target object is located in the image, wherein the target object comprises N characters, and N is a positive integer;
the feature extraction module is used for extracting features of the region to obtain a feature map of the target object;
the acquisition module is used for acquiring the associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object;
and the identification module is used for identifying each character in the target object according to a preset character set and the associated feature map of each character in the target object.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the character recognition methods described above.
In a fourth aspect, embodiments of the present application provide a storage medium, where instructions in the storage medium are executed by a processor of an electronic device, and the electronic device is capable of executing any one of the above character recognition methods.
In the embodiment of the application, target object detection is performed on the obtained image to determine the region in the image where the target object is located, feature extraction is performed on the region to obtain the feature map of the target object, the associated feature map of each character in the target object is obtained from the feature map according to preset position division information of each character in the target object, and each character in the target object is then identified according to a preset character set and the associated feature map of each character. In this way, each character in the target object is identified based on the feature map of the target object; there is no need to perform character segmentation on the region where the target object is located and then recognize each character from its own small segmented region, so the processing is relatively simple. In addition, when each character in the target object is recognized, only the associated feature map of that character is considered rather than the whole feature map of the target object, so the amount of data to be processed is relatively small, and character recognition efficiency can therefore be improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a character recognition method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a convolutional neural network for feature extraction of a target object in an image according to an embodiment of the present disclosure;
fig. 3 is a flowchart for acquiring an association feature map of each character in a target object according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a character recognition model corresponding to an ith character position in a target object according to an embodiment of the present application;
fig. 5 is a flowchart of a training method for a character recognition network model corresponding to each character position in a target object according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a process of performing feature extraction on a target object in an image according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a character recognition model corresponding to each character position in a target object according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of a training process of a character recognition network model corresponding to each character position in a target object according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application;
fig. 10 is a schematic hardware structure diagram of an electronic device for implementing a character recognition method according to an embodiment of the present application.
Detailed Description
In order to solve the problem that the recognition efficiency of characters in an image is low in the related art, embodiments of the present application provide a character recognition method, apparatus, electronic device, and storage medium.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it should be understood that the preferred embodiments described herein are merely for illustrating and explaining the present application, and are not intended to limit the present application, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
To facilitate understanding of the present application, the technical terms used herein are explained first:
Feature map: a feature map is obtained by performing a convolution operation on an image with a convolution kernel, and a feature map can itself be convolved with a convolution kernel to generate a new feature map.
N × N represents the convolution kernel size (window size) used when performing convolution operation on an image, and N is an odd number.
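For readers less familiar with these terms, the following minimal sketch (assuming PyTorch as the framework, which the application itself does not prescribe; the tensor sizes are illustrative only) shows a feature map being produced by a convolution and then convolved again to yield a new feature map.

```python
# Minimal illustration of the "feature map" term above, assuming PyTorch as the
# framework (the application itself does not prescribe one); sizes are illustrative.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 32, 112)                            # one RGB image, height 32, width 112
conv = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)   # 3x3 convolution kernel
feature_map = conv(image)                                     # feature map: 1 x 32 x 32 x 112
new_feature_map = nn.Conv2d(32, 32, kernel_size=3, padding=1)(feature_map)  # convolving a feature map yields a new one
print(feature_map.shape, new_feature_map.shape)
```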
Fig. 1 is a flowchart of a character recognition method provided in an embodiment of the present application, including the following steps:
s101: and detecting a target object of the acquired image to determine an area where the target object is located in the image, wherein the target object comprises N characters, and N is a positive integer.
In specific implementation, target objects such as license plates, production dates, factory serial numbers and the like all contain a fixed number of characters.
S102: and performing feature extraction on the region where the target object is located in the image to obtain a feature map of the target object.
In specific implementation, the region where the target object is located in the image may be cut, and feature extraction may be performed on the sub-image obtained by cutting, so as to obtain a feature map of the target object.
Fig. 2 shows a schematic diagram of a convolutional neural network for feature extraction on the cropped sub-image. The convolutional neural network comprises a first convolution block, a first pooling layer, a second convolution block, a second pooling layer, a third convolution block, a third pooling layer and a fourth convolution block, wherein the first convolution block is formed by stacking two convolution layers with a convolution kernel size of 3x3, a step length (stride) of 1, a pad of 1 and 32 channels; the convolution kernel size of the first pooling layer is 3x3 with stride 2; the second convolution block is formed by stacking two convolution layers with a convolution kernel size of 3x3, stride of 1, pad of 1 and 64 channels; the convolution kernel size of the second pooling layer is 3x3 with stride 2; the third convolution block is formed by stacking three convolution layers with a convolution kernel size of 3x3, stride of 1, pad of 1 and 128 channels; the convolution kernel size of the third pooling layer is 3x3 with stride 2; and the fourth convolution block is formed by stacking three convolution layers with a convolution kernel size of 3x3, stride of 1, pad of 1 and 128 channels. Here, stride controls the step with which the convolution window slides, and pad controls the size of the image after the convolution operation; by setting pad appropriately, the image size can be kept consistent before and after the convolution operation.
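The following is a sketch of this feature extraction network, written in PyTorch as an assumed framework. The 3x3 convolutions with stride 1 and pad 1 and the 3x3, stride-2 pooling follow the text; the ReLU activations and the padding of 1 on the pooling layers are assumptions made so that each pooling step exactly halves the spatial size, matching the 32×112 → 16×56 → 8×28 → 4×14 progression used in the embodiment described later.

```python
# Sketch of the feature extraction network of fig. 2, assuming PyTorch. The ReLU
# activations and the padding of 1 on the pooling layers are assumptions; the latter
# makes each pooling step exactly halve the spatial size (32x112 -> 16x56 -> 8x28 -> 4x14).
import torch.nn as nn

def conv_block(in_ch, out_ch, num_convs):
    """Stack of 3x3 convolutions with stride 1 and pad 1, as described in the text."""
    layers, ch = [], in_ch
    for _ in range(num_convs):
        layers += [nn.Conv2d(ch, out_ch, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True)]
        ch = out_ch
    return nn.Sequential(*layers)

class FeatureExtractor(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(in_channels, 32, 2),                     # first convolution block
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),   # first pooling layer
            conv_block(32, 64, 2),                              # second convolution block
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),   # second pooling layer
            conv_block(64, 128, 3),                             # third convolution block
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),   # third pooling layer
            conv_block(128, 128, 3),                            # fourth convolution block
        )

    def forward(self, x):
        return self.net(x)
```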
S103: and acquiring the associated characteristic diagram of each character in the target object from the characteristic diagram according to preset position division information of each character in the target object.
In a possible implementation manner, a region corresponding to each character in the feature map may be determined according to preset position division information of each character in the target object, and then, a feature map corresponding to the region is acquired from the feature map as an associated feature map of the character, where the determined region is larger than a region actually corresponding to the character in the feature map.
It is assumed that the characters in the target object are uniformly distributed. If the position division information of the first character in the target object is [formula omitted], it may be determined that the corresponding region of the first character in the feature map runs from the start position of the feature map to the [formula omitted] position of the feature map; similarly, if the position division information of the second character in the target object is [formula omitted], it may be determined that the corresponding region of the second character in the feature map runs from the [formula omitted] position to the [formula omitted] position of the feature map.
Therefore, the corresponding area of each character in the feature map can be ensured to be larger than the area actually corresponding to the character in the feature map, so that the associated feature map of the character obtained from the feature map can fully express the semantics of the character, which improves the accuracy of subsequent character recognition.
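As a rough illustration of this "enlarged region" variant, the sketch below cuts each character's slice out of the feature map after widening it by a margin on both sides. The exact enlargement is not reproduced in the text above (the formulas appear only as image references in the source), so the half-slot margin used here is purely an illustrative assumption.

```python
# Illustrative sketch only: the enlargement margin below is an assumption, since the
# exact position division formulas are not reproduced in the source text.
import torch

def associated_feature_map(feature_map, char_index, num_chars, margin_ratio=0.5):
    """feature_map: (C, H, W); characters are assumed to be uniformly distributed."""
    _, _, width = feature_map.shape
    slot = width / num_chars                                         # nominal width of one character
    start = max(0.0, (char_index - margin_ratio) * slot)             # widen the slice to the left
    end = min(float(width), (char_index + 1 + margin_ratio) * slot)  # and to the right
    return feature_map[:, :, int(start):int(round(end))]

# e.g. the first of 7 characters in a 128 x 4 x 14 feature map gets a slice wider than 14 / 7 = 2
wide_slice = associated_feature_map(torch.randn(128, 4, 14), char_index=0, num_chars=7)
```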
In a possible implementation manner, the feature map may be divided into N sub-feature maps, a sub-feature map in the N sub-feature maps, which is matched with the position division information of each character in the target object, is used as a reference feature map of the character, and then, the reference feature map is modified according to the sub-feature maps adjacent to the reference feature map, so as to obtain the associated feature map of the character.
In specific implementation, the associated feature map of each character in the target object may be obtained according to the process shown in fig. 3, where the process includes the following steps:
s301 a: the feature map is divided into N sub-feature maps.
In specific implementation, the feature map may be divided into N sub-feature maps of the same size, that is, the feature map is divided evenly into N sub-feature maps. Assuming the feature map is expressed in the form of "channel × height × width", the feature map may be divided into N sub-feature maps of the same size along the width direction.
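A one-line sketch of this even division along the width dimension, assuming PyTorch and the channel × height × width layout mentioned above:

```python
# Even division of the feature map along the width dimension, assuming PyTorch
# and the channel x height x width layout mentioned above.
import torch

feature_map = torch.randn(128, 4, 14)            # e.g. a 128 x 4 x 14 feature map
N = 7                                            # number of characters in the target object
sub_feature_maps = feature_map.chunk(N, dim=2)   # 7 sub-feature maps, each 128 x 4 x 2
```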
S302 a: and taking the sub-feature map matched with the position division information of each character in the target object in the N sub-feature maps as a reference feature map of the character.
In specific implementation, the target object includes N characters, and if the feature map is divided into N sub-feature maps, the characters in the target object and the sub-feature maps may be in one-to-one correspondence.
Assuming that the position division information of each character in the target object is the position number of the character in the target object, the sub feature maps may be numbered according to the position numbering manner of the character in the target object, for example, numbering from left to right, and then the sub feature map with the same position number as that of each character in the target object in the N sub feature maps is used as the reference feature map of the character.
S303 a: and correcting the reference characteristic diagram according to the adjacent sub characteristic diagrams of the reference characteristic diagram to obtain the associated characteristic diagram of the character.
In practical application, although the reference feature map of each character in the target object is determined from the N sub-feature maps, there may be a case where the reference feature map is not aligned with the character, that is, the semantic expression of the reference feature map for the corresponding character is not accurate.
In specific implementation, feature extraction may be performed multiple times on the reference feature map and the sub-feature maps adjacent to it, the feature maps extracted each time are pooled, and the pooled feature maps are fused with the reference feature map to obtain the associated feature map of the character.
Considering that the sub-feature maps corresponding to the same character are generally continuously adjacent, in order to enable the associated feature map of the character to more accurately express the image feature of the character, the sub-feature maps extracted each time are required to be different, and the sub-feature maps extracted each time are continuous in the feature map, that is, no sub-feature map is spaced between the sub-feature maps extracted each time.
In one possible implementation, the associated feature map of each character in the target object may be determined by using a character recognition model corresponding to the character.
In specific implementation, the reference feature map of a character sample and the sub-feature maps adjacent to the reference feature map can be used for training to obtain a character recognition network model corresponding to each character position in the target object. Subsequently, different feature extraction layers in the character recognition network model corresponding to each character can be used to perform feature extraction on the reference feature map of the character and the sub-feature maps adjacent to it; the output result of each feature extraction layer is then input into the pooling layer connected to that feature extraction layer in the character recognition network model for pooling; and finally, the output result of each pooling layer and the reference feature map are input into the feature fusion layer of the character recognition network model for fusion processing, so as to obtain the associated feature map of the character.
Similarly, in order to enable the associated feature map of the character to more accurately express the image features of the character, the sub-feature maps extracted by each feature extraction layer in the character recognition network model may be required to be different, and the sub-feature maps extracted by each feature extraction layer are continuous in the feature map.
Referring to fig. 4, taking the ith character in the target object as an example, taking the reference feature map of the ith character as the ith sub-feature map, in specific implementation, feature extraction may be performed on the i-2 th sub-feature map, the i-1 th sub-feature map and the ith sub-feature map by using a first feature extraction layer in a character recognition model i corresponding to the ith character, and the extracted features are input into a first pooling layer in the character recognition model i for pooling; performing feature extraction on the (i-1) th sub-feature graph and the ith sub-feature graph by using a second feature extraction layer in the character recognition model i, and inputting the extracted features into a second pooling layer in the character recognition model i for pooling; performing feature extraction on the ith sub-feature map and the (i + 1) th sub-feature map by using a third feature extraction layer in the character recognition model i, and inputting the extracted features into a third pooling layer in the character recognition model i for pooling; and performing feature extraction on the ith sub-feature map, the (i + 1) th sub-feature map and the (i + 2) th sub-feature map by using a fourth feature extraction layer in the character recognition model i, and inputting the extracted features into a fourth pooling layer in the character recognition model i for pooling. Further, the pooling results of the four pooling layers in the character recognition model i and the ith sub-feature map are input into a feature fusion layer in the character recognition model i, so that the associated feature map of the ith character is obtained.
Fig. 4 is described taking as an example the two sub-feature maps on each side of the ith reference feature map. In practical applications, other numbers of sub-feature maps adjacent to the ith reference feature map on the left and right may also be selected to correct the ith reference feature map, and sub-feature maps whose index would be negative are omitted; details are not repeated herein.
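The sketch below models the branch network of fig. 4 for the i-th character position, assuming PyTorch. Representing each feature extraction layer by a single 3x3 convolution, each pooling layer by adaptive max pooling down to the 4 × 2 size of one sub-feature map, and using 65 output classes (taken from the license plate embodiment below) are all assumptions, since the text only names the layer types; out-of-range neighbouring sub-feature maps are simply dropped, as described above for negative indices.

```python
# Sketch of the branch model of fig. 4 for the i-th character position, assuming PyTorch.
# The single-convolution extraction layers, the adaptive max pooling, and the 65-class
# output are assumptions; the grouping of sub-feature maps follows fig. 4.
import torch
import torch.nn as nn

class BranchCharRecognizer(nn.Module):
    def __init__(self, channels=128, sub_h=4, sub_w=2, num_classes=65):
        super().__init__()
        # four feature extraction layers, each applied to a different contiguous group of sub-feature maps
        self.extractors = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(4)]
        )
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d((sub_h, sub_w)) for _ in range(4)])
        # fusion = channel concatenation of the four pooled maps with the reference sub-feature map
        self.fc = nn.Linear(channels * 5 * sub_h * sub_w, num_classes)

    def forward(self, subs, i):
        """subs: list of N sub-feature maps, each (B, C, 4, 2); i: index of the character position."""
        n = len(subs)
        groups = [                       # neighbourhoods of the reference sub-feature map i;
            subs[max(i - 2, 0):i + 1],   # out-of-range (negative or >= N) indices are simply dropped
            subs[max(i - 1, 0):i + 1],
            subs[i:min(i + 2, n)],
            subs[i:min(i + 3, n)],
        ]
        pooled = [
            pool(extract(torch.cat(group, dim=3)))     # concatenate along width, extract features, pool
            for group, extract, pool in zip(groups, self.extractors, self.pools)
        ]
        fused = torch.cat(pooled + [subs[i]], dim=1)   # associated feature map of the i-th character
        return self.fc(fused.flatten(1))               # scores over the preset character set
```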
S104: and identifying each character in the target object according to the preset character set and the associated characteristic diagram of each character in the target object.
In practical applications, each character in the preset character set may be a chinese character, a letter, a number, or a special symbol, where the chinese character includes 31 province chinese characters, the letter includes "a-Z" and "a-Z", the number includes "0-9", and the special symbol includes "+", "&", "#", and so on.
In specific implementation, the associated feature map of each character in the target object may be compared with the stored feature data of each character in the preset character set to determine the similarity between the character and each character in the preset character set, the distribution probability of the character between each character in the preset character set is determined according to the similarity between the character and each character in the preset character set, and the preset character with the highest probability is determined as the recognition result of the character.
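This comparison step is only characterized abstractly here; in the license plate embodiment below it is realized by a fully connected layer. Purely as an illustration of the abstract description, the sketch below models the stored feature data as one prototype vector per preset character, the similarity as cosine similarity, and the distribution probability as a softmax over the similarities; all of these concrete choices are assumptions.

```python
# Purely illustrative sketch of the abstract comparison step above: the prototype
# vectors, cosine similarity and softmax are assumptions (the embodiment below
# instead uses a fully connected layer to produce the probability distribution).
import torch
import torch.nn.functional as F

def recognize_character(assoc_feature_map, prototypes, charset):
    """assoc_feature_map: (C, H, W); prototypes: (len(charset), C*H*W), one stored vector per preset character."""
    query = assoc_feature_map.flatten().unsqueeze(0)              # (1, C*H*W)
    similarity = F.cosine_similarity(query, prototypes, dim=1)    # similarity to every preset character
    probs = F.softmax(similarity, dim=0)                          # distribution probability over the character set
    return charset[int(probs.argmax())], probs                    # most probable preset character
```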
In specific implementation, the character recognition network model corresponding to each character position in the target object may be trained according to the process shown in fig. 5, where the process includes the following steps:
s501: and performing feature extraction on the reference feature map of the character sample and the sub-feature maps adjacent to the reference feature map by using different feature extraction layers in the neural network model, wherein the sub-feature maps for performing feature extraction of each feature extraction layer are different, and the sub-feature maps extracted by each feature extraction layer are continuous in the feature maps.
S502: and inputting the output result of each feature extraction layer into the pooling layer connected with each feature extraction layer in the neural network model for pooling.
S503: and inputting the output result of each pooling layer and the reference characteristic diagram into a characteristic fusion layer in the neural network model for fusion processing.
S504: and respectively carrying out character recognition by using the output results of the feature fusion layer and the output results of the pooling layers.
S505: and adjusting the model parameters of the neural network model according to the character recognition error of the output result of the feature fusion layer and the character recognition error of the output result of each pooling layer to obtain the character recognition network model.
In specific implementation, a training end condition of the neural network model may be set, for example that the character recognition error of the neural network model is smaller than a preset error, or that the number of training iterations reaches a preset number. Subsequently, before the model parameters of the neural network model are adjusted each time, it may be determined whether the training end condition is currently satisfied; if not, training of the neural network model continues, and if so, training ends and the character recognition network model is obtained.
In the embodiment of the application, when the character recognition network model is trained, character recognition is performed separately with the output result of the feature fusion layer and with the output result of each pooling layer, and the model parameters of the neural network model are adjusted according to the character recognition error of the feature fusion layer output and the character recognition errors of the pooling layer outputs. This is equivalent to adding multiple pieces of supervision information to the loss calculation of the character recognition network model, so the recognition performance of the whole character recognition network model can be effectively improved.
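A compact sketch of one training step with the multiple supervision signals of steps S501 to S505, assuming PyTorch: main_logits is the prediction from the feature fusion output, and aux_logits_list holds one prediction per pooling layer, each assumed to pass through its own auxiliary classifier. The equal weighting of the auxiliary losses is also an assumption.

```python
# Sketch of one multi-supervision training step (S501-S505), assuming PyTorch.
# main_logits comes from the feature fusion output; aux_logits_list holds one
# prediction per pooling layer (each assumed to have its own auxiliary classifier).
# Weighting all auxiliary losses equally is also an assumption.
import torch
import torch.nn.functional as F

def training_step(main_logits, aux_logits_list, target, optimizer):
    """target: ground-truth indices of the character in the preset character set, shape (B,)."""
    loss = F.cross_entropy(main_logits, target)           # recognition error of the feature fusion output
    for aux_logits in aux_logits_list:                    # recognition error of each pooling layer output
        loss = loss + F.cross_entropy(aux_logits, target)
    optimizer.zero_grad()
    loss.backward()                                       # adjust model parameters by gradient descent
    optimizer.step()
    return loss.item()
```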
The technical solution of the present application is described below with reference to specific embodiments.
Assume that a license plate in an image is to be recognized, that the license plate contains 7 characters, and that the preset character set contains 65 characters: the Chinese abbreviation characters of 31 provinces, 24 English letters, and the 10 digits 0-9.
In specific implementation, after an original image acquired by a camera is acquired, license plate detection can be performed on the original image to determine a region where a license plate is located in the original image, the region is cut out from the original image to obtain a license plate image, and the license plate image is input into a multi-label license plate recognition convolutional neural network for license plate recognition, wherein the multi-label license plate recognition convolutional neural network comprises a feature extraction network and a character recognition network.
The size of a feature map is represented by "channel × height × width". Referring to fig. 6, the license plate image may be scaled to a height × width of 32 × 112 and input into the first convolution block of the feature extraction network to obtain a 32 × 32 × 112 feature map; the 32 × 32 × 112 feature map is input into the first pooling layer of the feature extraction network to obtain a 32 × 16 × 56 feature map; the 32 × 16 × 56 feature map is input into the second convolution block of the feature extraction network to obtain a 64 × 16 × 56 feature map; the 64 × 16 × 56 feature map is input into the second pooling layer to obtain a 64 × 8 × 28 feature map; the 64 × 8 × 28 feature map is input into the third convolution block to obtain a 128 × 8 × 28 feature map; the 128 × 8 × 28 feature map is input into the third pooling layer to obtain a 128 × 4 × 14 feature map; the 128 × 4 × 14 feature map is input into the fourth convolution block of the feature extraction network to obtain a 128 × 4 × 14 feature map; and the 128 × 4 × 14 feature map is then input into the character recognition network to obtain the license plate recognition result.
The first convolution block is formed by stacking two convolution layers with a convolution kernel size of 3x3, stride of 1, pad of 1 and 32 channels; the convolution kernel size of the first pooling layer is 3x3 with stride 2; the second convolution block is formed by stacking two convolution layers with a convolution kernel size of 3x3, stride of 1, pad of 1 and 64 channels; the convolution kernel size of the second pooling layer is 3x3 with stride 2; the third convolution block is formed by stacking three convolution layers with a convolution kernel size of 3x3, stride of 1, pad of 1 and 128 channels; the convolution kernel size of the third pooling layer is 3x3 with stride 2; and the fourth convolution block is formed by stacking three convolution layers with a convolution kernel size of 3x3, stride of 1, pad of 1 and 128 channels. Here, stride controls the step with which the convolution window slides, and pad controls the size of the image after the convolution operation; by setting pad appropriately, the image size can be kept consistent before and after the convolution operation.
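As a quick consistency check of the shape progression described above, the snippet below reuses the FeatureExtractor sketch given earlier (which is itself an assumption about the implementation) and runs a scaled license plate image through the backbone:

```python
# Quick shape check of the progression described above, reusing the FeatureExtractor
# sketch assumed earlier in this document.
import torch

backbone = FeatureExtractor(in_channels=3)
plate = torch.randn(1, 3, 32, 112)     # license plate image scaled to height 32, width 112
features = backbone(plate)
print(features.shape)                  # expected: torch.Size([1, 128, 4, 14])
```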
Referring to fig. 7, the character recognition network includes a global feature segmentation layer and a first to a seventh branch character recognition network. The global feature segmentation layer is configured to segment the 128 × 4 × 14 feature map into 7 equal parts by columns, each sub-feature map having a size of 128 × 4 × 2; assuming the sub-feature maps are numbered 0 to 6 from left to right, the i-th sub-feature map is the reference feature map of the i-th character in the license plate, with i greater than or equal to 0 and less than or equal to 6. The first branch character recognition network is used for recognizing the first character in the license plate by using sub-feature map 0 and the sub-feature maps adjacent to it; the second branch character recognition network is used for recognizing the second character in the license plate by using sub-feature map 1 and the sub-feature maps adjacent to it; the third branch character recognition network is used for recognizing the third character in the license plate by using sub-feature map 2 and the sub-feature maps adjacent to it; the fourth branch character recognition network is used for recognizing the fourth character in the license plate by using sub-feature map 3 and the sub-feature maps adjacent to it; the fifth branch character recognition network is used for recognizing the fifth character in the license plate by using sub-feature map 4 and the sub-feature maps adjacent to it; the sixth branch character recognition network is used for recognizing the sixth character in the license plate by using sub-feature map 5 and the sub-feature maps adjacent to it; and the seventh branch character recognition network is used for recognizing the seventh character in the license plate by using sub-feature map 6 and the sub-feature maps adjacent to it.
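Assembling the pieces, the sketch below mirrors the overall structure of fig. 7: the backbone produces a 128 × 4 × 14 feature map, the global feature segmentation layer splits it into 7 sub-feature maps by columns, and each branch predicts one character. It reuses the FeatureExtractor and BranchCharRecognizer sketches above and inherits their assumptions.

```python
# Sketch of the overall structure of fig. 7, assuming PyTorch and reusing the
# FeatureExtractor and BranchCharRecognizer sketches above (with their assumptions).
import torch
import torch.nn as nn

class PlateRecognizer(nn.Module):
    def __init__(self, num_chars=7, num_classes=65):
        super().__init__()
        self.backbone = FeatureExtractor(in_channels=3)
        self.branches = nn.ModuleList(
            [BranchCharRecognizer(channels=128, num_classes=num_classes) for _ in range(num_chars)]
        )

    def forward(self, plate_image):
        features = self.backbone(plate_image)                      # (B, 128, 4, 14)
        subs = list(features.chunk(len(self.branches), dim=3))     # global feature segmentation: 7 x (B, 128, 4, 2)
        return [branch(subs, i) for i, branch in enumerate(self.branches)]  # one score vector per character
```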
The following describes a process of recognizing the first character of the license plate by taking the first branch character recognition network as an example.
In specific implementation, different feature extraction layers in the first branch character recognition network can be used to perform feature extraction on sub-feature map 0 and the sub-feature maps adjacent to it. In fig. 7, the first feature extraction layer in the first branch character recognition network can be used to perform feature extraction on sub-feature map 0 and sub-feature map 1 to obtain a 128 × 4 × 4 feature map, and the second feature extraction layer in the first branch character recognition network can be used to perform feature extraction on sub-feature map 0, sub-feature map 1 and sub-feature map 2 to obtain a 128 × 4 × 6 feature map.
Furthermore, the output results of the feature extraction layers in the first branch character recognition network can be pooled by the pooling layers connected to them respectively, and the feature map output by each pooling layer has the same 4 × 2 size as sub-feature map 0.
Furthermore, the 128 × 4 × 2 feature map output by each pooling layer in the first branch character recognition network and the 128 × 4 × 2 sub-feature map 0 are input into a channel fusion layer (concat) in the first branch character recognition network to obtain the associated feature map of the character; the associated feature map of the character is input into a fully connected (FC) layer, so that the probability distribution of the first character in the license plate over the 65 preset characters can be obtained, and the preset character with the highest probability is taken as the recognition result of the first character.
The process of recognizing the fourth character in the license plate in fig. 7 is similar to the above process, and is not repeated herein.
In order to improve the recognition accuracy of the character recognition network, more supervision information can be added in the training stage of the character recognition network. Referring to fig. 8 and taking the first branch character recognition network as an example, the associated feature map of the first character in the license plate can be input into a fully connected layer to obtain a first recognition result of the first character, and a first recognition error is calculated according to the first recognition result and the labeling information of the first character; the output result of each pooling layer in the first branch character recognition network is input into a fully connected layer to obtain a second recognition result of the first character, and a second recognition error is calculated according to the second recognition result and the labeling information of the first character; the value of a loss function is then calculated from the first recognition error and the second recognition error. Further, the network parameters of the first branch character recognition network are adjusted according to the value of the loss function using a gradient descent algorithm, and when the training end condition of the first branch character recognition network is determined to be met, the first branch character recognition network is taken as the character recognition model of the first character in the license plate, where the training end condition is, for example, that the character recognition error of the first branch character recognition network is smaller than a preset error, or that the number of training iterations reaches a preset number.
Therefore, the loss calculation is performed by adding a plurality of pieces of supervision information to the first branch character recognition network, and the network parameter adjustment of the first branch character recognition network is guided, so that the recognition performance of the finally obtained character recognition model can be effectively improved.
The training process of the other branch character recognition networks in fig. 8 is similar and will not be described herein.
In the embodiment of the application, local feature information corresponding to a character is used instead of global feature information for character recognition, so there is less interference information, feature extraction is more targeted, and the feature expression is more reasonable, which improves character recognition efficiency. In addition, to address the problem that character features may not be aligned with the characters, after the reference feature map of each character is determined, the reference feature map is corrected according to the sub-feature maps adjacent to it, which strengthens the semantic expression of the final associated feature map for the character and thus further ensures character recognition accuracy.
When the character recognition method provided in the embodiments of the present application is implemented in software or hardware or a combination of software and hardware, a plurality of functional modules may be included in the electronic device, and each functional module may include software, hardware or a combination thereof.
Fig. 9 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application, and includes a detection module 901, a feature extraction module 902, an acquisition module 903, and a recognition module 904.
A detection module 901, configured to perform target object detection on an acquired image to determine an area where a target object in the image is located, where the target object includes N characters, and N is a positive integer;
a feature extraction module 902, configured to perform feature extraction on the region to obtain a feature map of the target object;
an obtaining module 903, configured to obtain, according to preset position division information of each character in the target object, an associated feature map of each character in the target object from the feature map;
and the identifying module 904 is configured to identify each character in the target object according to a preset character set and an associated feature map of each character in the target object.
In a possible implementation manner, the obtaining module 903 is specifically configured to:
determining a corresponding area of each character in the feature map according to preset position division information of each character in the target object, wherein the determined area is larger than the area actually corresponding to the character in the feature map; and acquiring the feature map corresponding to the determined area from the feature map as the associated feature map of the character.
In another possible implementation, the obtaining module 903 is specifically configured to:
dividing the feature map into N sub-feature maps; taking, among the N sub-feature maps, the sub-feature map matched with the preset position division information of each character in the target object as the reference feature map of the character; and correcting the reference feature map according to the sub-feature maps adjacent to the reference feature map to obtain the associated feature map of the character.
In a possible implementation manner, the obtaining module 903 is further specifically configured to:
performing feature extraction on the reference feature map and sub-feature maps adjacent to the reference feature map for multiple times, wherein the sub-feature maps subjected to feature extraction each time are different, and the sub-feature maps subjected to feature extraction each time are continuous in the feature map;
performing pooling processing on the feature maps extracted each time;
and fusing the pooled feature maps and the reference feature map to obtain the associated feature map of the character.
In a possible implementation manner, the obtaining module 903 is further specifically configured to:
and performing feature extraction on the reference feature map and sub-feature maps adjacent to the reference feature map by using different feature extraction layers in a character recognition network model, wherein the sub-feature maps for performing feature extraction on each feature extraction layer are different, the sub-feature maps extracted by each feature extraction layer are continuous in the feature map, and the character recognition network model is obtained by training the reference feature map of a character sample and the sub-feature maps adjacent to the reference feature map.
In a possible implementation manner, the obtaining module 903 is further specifically configured to:
and inputting the output result of each feature extraction layer into a pooling layer which is connected with each feature extraction layer in the character recognition network model for pooling.
In a possible implementation manner, the obtaining module 903 is further specifically configured to:
and inputting the output result of each pooling layer and the reference feature map into a feature fusion layer in the character recognition network model for fusion processing to obtain the associated feature map of the character.
In one possible implementation, the character recognition network model is trained according to the following steps:
performing feature extraction on a reference feature map of a character sample and sub-feature maps adjacent to the reference feature map by using different feature extraction layers in a neural network model, wherein the sub-feature maps for performing feature extraction of each feature extraction layer are different, and the sub-feature maps extracted by each feature extraction layer are continuous in the feature maps;
inputting the output results of the feature extraction layers into the pooling layers connected with the feature extraction layers in the neural network model for pooling;
inputting the output result of each pooling layer and the reference feature map into a feature fusion layer in the neural network model for fusion processing;
respectively utilizing the output results of the feature fusion layer and the output results of the pooling layers to perform character recognition;
and adjusting the model parameters of the neural network model according to the character recognition error of the output result of the feature fusion layer and the character recognition error of the output result of each pooling layer to obtain the character recognition network model.
The division of the modules in the embodiments of the present application is schematic, and only one logical function division is provided, and in actual implementation, there may be another division manner, and in addition, each functional module in each embodiment of the present application may be integrated in one processor, may also exist alone physically, or may also be integrated in one module by two or more modules. The coupling of the various modules to each other may be through interfaces that are typically electrical communication interfaces, but mechanical or other forms of interfaces are not excluded. Thus, modules described as separate components may or may not be physically separate, may be located in one place, or may be distributed in different locations on the same or different devices. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device includes a transceiver 1001 and a processor 1002, and the processor 1002 may be a Central Processing Unit (CPU), a microprocessor, an application specific integrated circuit, a programmable logic circuit, a large scale integrated circuit, or a digital Processing Unit. The transceiver 1001 is used for data transmission and reception between an electronic device and other devices.
The electronic device may further comprise a memory 1003 for storing software instructions executed by the processor 1002, and may of course also store some other data required by the electronic device, such as identification information of the electronic device, encryption information of the electronic device, user data, etc. The Memory 1003 may be a Volatile Memory (Volatile Memory), such as a Random-Access Memory (RAM); the Memory 1003 may also be a Non-Volatile Memory (Non-Volatile Memory) such as a Read-Only Memory (ROM), a Flash Memory (Flash Memory), a Hard Disk Drive (HDD) or a Solid-State Drive (SSD), or the Memory 1003 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1003 may be a combination of the above memories.
The embodiment of the present application does not limit the specific connection medium among the processor 1002, the memory 1003, and the transceiver 1001. In the embodiment of the present application, only the memory 1003, the processor 1002, and the transceiver 1001 are connected by the bus 1004 in fig. 10, the bus is shown by a thick line in fig. 10, and the connection manner between the other components is only schematically illustrated and is not limited thereto. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The processor 1002 may be dedicated hardware or a processor running software, and when the processor 1002 can run software, the processor 1002 reads software instructions stored in the memory 1003 and executes the character recognition method involved in the foregoing embodiments under the driving of the software instructions.
The embodiment of the present application also provides a storage medium, and when instructions in the storage medium are executed by a processor of an electronic device, the electronic device can execute the character recognition method in the foregoing embodiment.
In some possible embodiments, the various aspects of the character recognition method provided in the present application may also be implemented in the form of a program product, which includes program code for causing an electronic device to perform the character recognition method referred to in the foregoing embodiments when the program product runs on the electronic device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable Disk, a hard Disk, a RAM, a ROM, an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for character recognition in the embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the user computing device through any kind of Network, including a Local Area Network (LAN) or Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided into and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be broken down into multiple steps for execution.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A character recognition method, comprising:
carrying out target object detection on the obtained image to determine an area where a target object in the image is located, wherein the target object comprises N characters, and N is a positive integer;
extracting the features of the region to obtain a feature map of the target object;
acquiring an associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object;
and identifying each character in the target object according to a preset character set and the associated feature map of each character in the target object.
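For illustration only, the sketch below wires the four steps of claim 1 together in Python/PyTorch. The helper names (detect_region, backbone, char_head), the left-to-right position-division table, and the width-wise slicing are assumptions made for this example and are not taken from the claims or the embodiments.

import torch

def recognize_characters(image, detect_region, backbone, position_division, char_head, charset):
    # Step 1: detect the target object and crop the area where it is located.
    x, y, w, h = detect_region(image)
    region = image[..., y:y + h, x:x + w]
    # Step 2: extract features of the area to obtain the feature map of the target object.
    feature_map = backbone(region.unsqueeze(0))
    # Steps 3 and 4: slice the associated feature map of each character according to the
    # preset position division information, then classify it over the preset character set.
    result = []
    for start, end in position_division:
        associated = feature_map[..., start:end]
        logits = char_head(associated)
        result.append(charset[int(logits.argmax(dim=-1))])
    return "".join(result)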
2. The method of claim 1, wherein acquiring an associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object comprises:
determining a corresponding area of each character in the feature map according to preset position division information of each character in the target object, wherein the determined area is larger than an actual corresponding area of the character in the feature map; acquiring a feature map corresponding to the determined area from the feature map as an associated feature map of the character;
or
dividing the feature map into N sub-feature maps; taking, among the N sub-feature maps, the sub-feature map matched with the preset position division information of each character in the target object as a reference feature map of the character; and correcting the reference feature map according to the sub-feature maps adjacent to the reference feature map to obtain the associated feature map of the character.
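A minimal sketch of the two alternatives of claim 2, assuming the characters are laid out left to right so the feature map can be sliced or chunked along its width; the margin value and the use of only the immediate neighbours are illustrative choices, not requirements of the claim.

import torch

def enlarged_character_slice(feature_map, start, end, margin=2):
    # Alternative 1: take an area larger than the character's actual columns.
    width = feature_map.shape[-1]
    return feature_map[..., max(0, start - margin):min(width, end + margin)]

def reference_and_neighbours(feature_map, n_chars, i):
    # Alternative 2: divide the feature map into N sub-feature maps, take the sub-feature
    # map matching the i-th character's position as its reference feature map, and keep
    # the adjacent sub-feature maps so the reference can later be corrected into the
    # associated feature map.
    sub_maps = list(torch.chunk(feature_map, n_chars, dim=-1))
    reference = sub_maps[i]
    neighbours = sub_maps[max(0, i - 1):i] + sub_maps[i + 1:i + 2]
    return reference, neighbours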
3. The method of claim 2, wherein correcting the reference feature map according to the sub-feature maps adjacent to the reference feature map to obtain the associated feature map of the character comprises:
performing feature extraction on the reference feature map and the sub-feature maps adjacent to the reference feature map multiple times, wherein the sub-feature maps subjected to feature extraction each time are different, and the sub-feature maps subjected to feature extraction each time are contiguous in the feature map;
performing pooling processing on the feature map extracted each time;
and fusing the pooled feature maps and the reference feature map to obtain the associated feature map of the character.
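The sketch below walks through claim 3 with concrete stand-ins: randomly initialised 3x3 convolutions for the feature extraction steps, adaptive average pooling for the pooling processing, and channel-wise concatenation for the fusion. All of these operator choices are assumptions for illustration, not the claimed network.

import torch
import torch.nn.functional as F

def associated_feature_map(reference, left, right):
    # Feature extraction is performed twice, each time over a different but contiguous
    # span of sub-feature maps (left + reference, then reference + right).
    channels = reference.shape[1]
    weight = torch.randn(channels, channels, 3, 3)
    spans = [torch.cat([left, reference], dim=-1),
             torch.cat([reference, right], dim=-1)]
    extracted = [F.conv2d(span, weight, padding=1) for span in spans]
    # Pooling processing is applied to the feature map extracted each time.
    pooled = [F.adaptive_avg_pool2d(e, reference.shape[-2:]) for e in extracted]
    # The pooled feature maps are fused with the reference feature map.
    return torch.cat(pooled + [reference], dim=1)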
4. The method of claim 3, wherein performing feature extraction on the reference feature map and the sub-feature maps adjacent to the reference feature map multiple times comprises:
and performing feature extraction on the reference feature map and the sub-feature maps adjacent to the reference feature map by using different feature extraction layers in a character recognition network model, wherein the sub-feature maps on which each feature extraction layer performs feature extraction are different, the sub-feature maps extracted by each feature extraction layer are contiguous in the feature map, and the character recognition network model is obtained by training with a reference feature map of a character sample and sub-feature maps adjacent to that reference feature map.
5. The method of claim 4, wherein pooling the extracted feature maps comprises:
and inputting the output result of each feature extraction layer into the pooling layer connected to that feature extraction layer in the character recognition network model for pooling processing.
6. The method according to claim 5, wherein fusing the pooled feature maps and the reference feature map to obtain the associated feature map of the character comprises:
and inputting the output result of each pooling layer and the reference feature map into a feature fusion layer in the character recognition network model for fusion processing to obtain the associated feature map of the character.
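Claims 4 to 6 place these steps onto named layers of a character recognition network model. The module below guesses at a two-branch layout with convolutional feature extraction layers, one pooling layer connected to each of them, and a 1x1 convolution as the feature fusion layer; the branch count, channel width and pooled size are assumptions, and the (pooled outputs, fused map) return value is only a convention for the training sketch that follows claim 7.

import torch
import torch.nn as nn

class CharacterFeatureNet(nn.Module):
    def __init__(self, channels=64, branches=2, pooled_size=(4, 4)):
        super().__init__()
        # Different feature extraction layers, each operating on a different
        # contiguous group of sub-feature maps.
        self.extraction_layers = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(branches))
        # One pooling layer connected to each feature extraction layer.
        self.pooling_layers = nn.ModuleList(
            nn.AdaptiveMaxPool2d(pooled_size) for _ in range(branches))
        self.reference_pool = nn.AdaptiveMaxPool2d(pooled_size)
        # Feature fusion layer combining the pooling outputs with the reference map.
        self.fusion_layer = nn.Conv2d(channels * (branches + 1), channels, kernel_size=1)

    def forward(self, reference, branch_inputs):
        pooled = [pool(extract(x)) for extract, pool, x in
                  zip(self.extraction_layers, self.pooling_layers, branch_inputs)]
        fused = self.fusion_layer(torch.cat(pooled + [self.reference_pool(reference)], dim=1))
        return pooled, fused  # pooling-layer outputs and the associated feature map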
7. The method of claim 4, wherein the character recognition network model is trained according to the following steps:
performing feature extraction on a reference feature map of a character sample and sub-feature maps adjacent to the reference feature map by using different feature extraction layers in a neural network model, wherein the sub-feature maps on which each feature extraction layer performs feature extraction are different, and the sub-feature maps extracted by each feature extraction layer are contiguous in the feature map;
inputting the output result of each feature extraction layer into the pooling layer connected to that feature extraction layer in the neural network model for pooling processing;
inputting the output result of each pooling layer and the reference feature map into a feature fusion layer in the neural network model for fusion processing;
performing character recognition separately using the output result of the feature fusion layer and the output result of each pooling layer;
and adjusting the model parameters of the neural network model according to the character recognition error of the output result of the feature fusion layer and the character recognition error of the output result of each pooling layer to obtain the character recognition network model.
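A hedged sketch of the training scheme of claim 7: the character recognition error is computed both on the feature fusion layer's output and on each pooling layer's output, and the combined loss is used to adjust the model parameters. The auxiliary loss weight, the per-branch classifier heads and the (pooled, fused) return convention follow the CharacterFeatureNet sketch above; none of these specifics come from the embodiments.

import torch
import torch.nn as nn

def training_step(model, branch_heads, fusion_head, optimizer,
                  reference, branch_inputs, labels, aux_weight=0.4):
    criterion = nn.CrossEntropyLoss()
    pooled_outputs, fused = model(reference, branch_inputs)
    # Character recognition error of the feature fusion layer's output.
    loss = criterion(fusion_head(fused), labels)
    # Character recognition error of each pooling layer's output.
    for pooled, head in zip(pooled_outputs, branch_heads):
        loss = loss + aux_weight * criterion(head(pooled), labels)
    # Adjust the model parameters according to the combined recognition error.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()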
8. A character recognition apparatus, comprising:
the detection module is used for detecting a target object in the acquired image so as to determine an area where the target object is located in the image, wherein the target object comprises N characters, and N is a positive integer;
the feature extraction module is used for extracting the features of the region to obtain a feature map of the target object;
the acquisition module is used for acquiring an associated feature map of each character in the target object from the feature map according to preset position division information of each character in the target object;
and the identification module is used for identifying each character in the target object according to a preset character set and the associated feature map of each character in the target object.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the character recognition method of any one of claims 1-7.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the character recognition method of any of claims 1-7.
CN202011110649.0A 2020-10-16 2020-10-16 Character recognition method and device, electronic equipment and storage medium Pending CN114387588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011110649.0A CN114387588A (en) 2020-10-16 2020-10-16 Character recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011110649.0A CN114387588A (en) 2020-10-16 2020-10-16 Character recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114387588A true CN114387588A (en) 2022-04-22

Family

ID=81193510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011110649.0A Pending CN114387588A (en) 2020-10-16 2020-10-16 Character recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114387588A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452667A (en) * 2023-06-16 2023-07-18 成都实时技术股份有限公司 Target identification and positioning method based on image processing
CN116452667B (en) * 2023-06-16 2023-08-22 成都实时技术股份有限公司 Target identification and positioning method based on image processing

Similar Documents

Publication Publication Date Title
CN112184508B (en) Student model training method and device for image processing
WO2022017245A1 (en) Text recognition network, neural network training method, and related device
US10762373B2 (en) Image recognition method and device
US20210183097A1 (en) Spare Part Identification Using a Locally Learned 3D Landmark Database
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
CN108629414B (en) Deep hash learning method and device
CN111460877B (en) Object detection method and device utilizing image cascade and CNN
CN111783767B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN111652250B (en) Remote sensing image building extraction method and device based on polygons and storage medium
KR102378887B1 (en) Method and Apparatus of Bounding Box Regression by a Perimeter-based IoU Loss Function in Object Detection
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
WO2016075274A1 (en) Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features
CN108229680B (en) Neural network system, remote sensing image recognition method, device, equipment and medium
CN115731422A (en) Training method, classification method and device of multi-label classification model
US11687712B2 (en) Lexical analysis training of convolutional neural network by windows of different lengths with matrix of semantic vectors
CN114387588A (en) Character recognition method and device, electronic equipment and storage medium
US11507744B2 (en) Information processing apparatus, information processing method, and computer-readable recording medium
CN114996495A (en) Single-sample image segmentation method and device based on multiple prototypes and iterative enhancement
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
CN113313720B (en) Object segmentation method and device
CN113869398A (en) Unbalanced text classification method, device, equipment and storage medium
CN110659648A (en) Character recognition method and device
CN112825121A (en) Deep convolutional neural network initialization and training method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20220422