CN112434698A - Character recognition method, character recognition device, electronic equipment and storage medium

Character recognition method, character recognition device, electronic equipment and storage medium

Info

Publication number
CN112434698A
CN112434698A (application CN202011317741.4A)
Authority
CN
China
Prior art keywords
character
sample
probability
target
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011317741.4A
Other languages
Chinese (zh)
Inventor
丁笑天
刘岩
朱兴杰
张秋晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd filed Critical Taikang Insurance Group Co Ltd
Priority to CN202011317741.4A priority Critical patent/CN112434698A/en
Publication of CN112434698A publication Critical patent/CN112434698A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The application aims to provide a character recognition method, a character recognition device, electronic equipment and a storage medium, to solve the problems that character recognition methods in the prior art are difficult to apply to character strings of different lengths and cannot be applied well to scenes with disordered text directions. In the embodiment of the application, each character can be segmented based on an ordered vertex position sequence of the single character, from which the orientation of the character can be determined. Recognition based on single characters is suitable for character strings of indefinite length, and since orientation information is generated for each character during segmentation, the relative positions of different characters can be sorted out and the character string organized for output.

Description

Character recognition method, character recognition device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a character recognition method and apparatus, an electronic device, and a storage medium.
Background
Character recognition is an important direction in computer vision, and image-based character recognition methods have been continuously optimized and improved in recent years. In the related art, image-based character recognition is mostly implemented with deep neural networks and comprises two stages: character detection and character recognition. The detection stage locates a line-level text block in the image, and the recognition stage then identifies each character within the text block to obtain a character string.
However, the inventor has found through research that this approach is difficult to apply to character strings of varying length and performs poorly in scenes where the text direction is disordered.
Disclosure of Invention
The application aims to provide a character recognition method, a character recognition device, electronic equipment and a storage medium, to solve the problems that character recognition methods in the prior art are difficult to apply to character strings of different lengths and cannot be applied well to scenes with disordered text directions.
In a first aspect, an embodiment of the present application provides a character recognition method, which is applied to a server or a terminal, and the method includes:
acquiring a target image, wherein the target image comprises character strings with a plurality of character orientations;
carrying out feature extraction on a target image to obtain character features of the target image;
carrying out single-character region prediction on the character features to obtain the probability that each target position in the target image contains a character and the character region information of each target position; the character region information comprises a vertex position sequence ordered according to the character direction;
when the probability that any target position contains characters to be recognized is larger than a probability threshold value, cutting out a character area represented by the character area information of the target position from the target image;
classifying each character area obtained by cutting to obtain characters contained in each character area;
according to the vertex position sequence of each character area, sorting characters in each character area to obtain a character recognition result of the target image;
and outputting the character recognition result of the target image to a terminal for display.
In some embodiments, performing feature extraction on the target image to obtain the character features of the target image, and carrying out single-character region prediction on the character features to obtain the probability that each target position in the target image contains a character and the character region information of each target position, comprises the following steps:
inputting the target image into a pre-trained character segmentation model, and performing feature extraction on the target image by a feature extraction module of the character segmentation model to obtain character features of the target image;
performing, by a probability prediction module of the character segmentation model, probability estimation based on the character features, to obtain the probability that each target position contains a character; and
and predicting the character positions of the target positions by a position prediction module of the character segmentation model based on the character features to obtain character area information of each target position.
In some embodiments, the method further comprises:
training the character segmentation model according to the following method:
acquiring a training sample, wherein the training sample comprises a sample image, character region information and a standard probability value of each sample character in the sample image; the standard probability value is used for representing the probability that the target position of the sample character in the sample image contains the character;
inputting the training sample into the character segmentation model to obtain a probability estimation value of characters contained in each target position of the sample image output by the character segmentation model and character region information estimation information of each target position;
calculating a probability estimation loss value according to the probability estimation value of the sample image and a standard probability value of the sample image, and calculating a region prediction loss value according to the character region information estimation information of the sample image and the character region information of the sample image;
calculating a total loss value of the sample image according to the probability estimation loss and the region prediction loss value;
and adjusting the model parameters of the character segmentation model according to the total loss value.
In some embodiments, the method further comprises:
acquiring the character area information of each sample character in the sample image according to the following method:
acquiring a pre-labeled text block, wherein the text block comprises at least one sample character and is associated with pre-labeled text direction reference information representing the direction of the text block;
when the text block comprises a plurality of sample characters, pixel projection processing is carried out on the text block in the character direction according to the text direction reference information associated with the text block, and the character area information of each sample character in the text block is obtained.
In some embodiments, the method further comprises:
obtaining the standard probability value of each sample character in the sample image according to the following method:
for each sample character, identifying the sample character area from the sample image according to the character area information corresponding to the sample character;
carrying out affine transformation on a pre-generated Gaussian distribution map according to the direction and the size of the sample character region in the sample image, and mapping the Gaussian distribution map to the sample character region;
acquiring, from the mapped Gaussian distribution map, the mapping point within the sample character region of a target point whose probability is greater than a set probability threshold;
and taking the mapping point as the target position corresponding to the sample character, and taking the probability of the target point in the mapped Gaussian distribution map as the standard probability value of the sample character.
In some embodiments, the classifying each of the character regions obtained by the cutting to obtain characters included in each of the character regions includes:
classifying each character region obtained by cutting to obtain the probability that the character region belongs to each known character;
and taking the known character with the highest probability as the character contained in the character area.
In some embodiments, the vertex position sequence includes four corner vertices ordered according to the character orientation, and the step of sorting the characters in each character region according to the vertex position sequence of each character region to obtain the character recognition result of the target image includes:
determining the orientation of each character and its adjacent characters according to the four corner vertices;
and arranging adjacent characters in sequence to obtain the character recognition result of the target image.
In some embodiments, after the classifying each of the character regions obtained by the clipping, the method further includes:
obtaining the confidence coefficient of each character obtained by the classification processing;
the vertex position sequence includes four corner vertices ordered according to the character orientation, and after the characters in each character region are sorted according to the vertex position sequence of each character region to obtain the character recognition result of the target image, the method further includes:
and correcting the character recognition result by adopting a natural language processing method according to the confidence coefficient of each character to obtain a final character recognition result.
In some embodiments, the performing feature extraction on the target image to obtain the character features of the target image includes:
extracting high-level features and low-level features in the target image;
and performing feature fusion on the obtained high-level features and low-level features to obtain the character features of the target image.
In a second aspect, the present application also provides a character recognition apparatus, the apparatus comprising:
the character segmentation module is used for acquiring a target image, wherein the target image comprises character strings with a plurality of character orientations, and for extracting the features of the target image to obtain the character features of the target image; carrying out single-character region prediction on the character features to obtain the probability that each target position in the target image contains a character and the character region information of each target position; the character region information comprises a vertex position sequence ordered according to the character direction;
the character region cutting module is used for cutting out a character region represented by the character region information of the target position from the target image when the probability that any target position contains a character to be recognized is greater than a probability threshold;
the recognition module is used for classifying each character area obtained by cutting to obtain characters contained in each character area;
the sorting module is used for sorting the characters in each character region according to the vertex position sequence of each character region to obtain a character recognition result of the target image;
and the output module is used for outputting the character recognition result of the target image to a terminal for displaying.
In some embodiments, the character segmentation module is to:
inputting the target image into a pre-trained character segmentation model, and performing feature extraction on the target image by a feature extraction module of the character segmentation model to obtain character features of the target image;
performing, by a probability prediction module of the character segmentation model, probability estimation based on the character features, to obtain the probability that each target position contains a character; and
and predicting the character positions of the target positions by a position prediction module of the character segmentation model based on the character features to obtain character area information of each target position.
In some embodiments, the apparatus further comprises:
a character segmentation model training module for training the character segmentation model according to the following method:
acquiring a training sample, wherein the training sample comprises a sample image, character region information and a standard probability value of each sample character in the sample image; the standard probability value is used for representing the probability that the target position of the sample character in the sample image contains the character;
inputting the training sample into the character segmentation model to obtain a probability estimation value of characters contained in each target position of the sample image output by the character segmentation model and character region information estimation information of each target position;
calculating a probability estimation loss value according to the probability estimation value of the sample image and a standard probability value of the sample image, and calculating a region prediction loss value according to the character region information estimation information of the sample image and the character region information of the sample image;
calculating a total loss value of the sample image according to the probability estimation loss and the region prediction loss value;
and adjusting the model parameters of the character segmentation model according to the total loss value.
In some embodiments, the apparatus further comprises:
a character region information obtaining module, configured to obtain the character region information of each sample character in the sample image according to the following method:
acquiring a pre-labeled text block, wherein the text block comprises at least one sample character and is associated with pre-labeled text direction reference information representing the direction of the text block;
when the text block comprises a plurality of sample characters, pixel projection processing is carried out on the text block in the character direction according to the text direction reference information associated with the text block, and the character area information of each sample character in the text block is obtained.
In some embodiments, the apparatus further comprises:
a standard probability value obtaining module, configured to obtain the standard probability value of each sample character in the sample image according to the following method:
for each sample character, identifying the sample character area from the sample image according to the character area information corresponding to the sample character;
carrying out affine transformation on a pre-generated Gaussian distribution map according to the direction and the size of the sample character region in the sample image, and mapping the Gaussian distribution map to the sample character region;
acquiring, from the mapped Gaussian distribution map, the mapping point within the sample character region of a target point whose probability is greater than a set probability threshold;
and taking the mapping point as the target position corresponding to the sample character, and taking the probability of the target point in the mapped Gaussian distribution map as the standard probability value of the sample character.
In some embodiments, the identification module is to:
classifying each character region obtained by cutting to obtain the probability that the character region belongs to each known character;
and taking the known character with the highest probability as the character contained in the character area.
In some embodiments, the vertex position sequence includes four corner vertices ordered according to the character orientation, and the sorting module is configured to determine the orientation of each character and its adjacent characters according to the four corner vertices, and to arrange adjacent characters in sequence to obtain the character recognition result of the target image.
In some embodiments, after the classifying processing is performed on each of the character regions obtained by the cutting, the apparatus further includes:
a confidence coefficient obtaining module, configured to obtain a confidence coefficient of each character obtained through the classification processing;
the vertex position sequence includes four corner vertices ordered according to the character orientation, and after the characters in each character region are sorted according to the vertex position sequence of each character region to obtain the character recognition result of the target image, the apparatus further includes:
and the correction module is used for correcting the character recognition result through natural language processing according to the confidence of each character, to obtain the final character recognition result.
In some embodiments, the character segmentation module is to:
extracting high-level features and low-level features in the target image;
and performing feature fusion on the obtained high-level features and low-level features to obtain the character features of the target image.
In a third aspect, another embodiment of the present application further provides an electronic device, including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of the first aspect.
In a fourth aspect, the present application further provides a non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the method of the first aspect of the embodiments of the present application.
In the embodiment of the application, each character can be segmented based on the ordered vertex position sequence of the single character, from which the orientation of the character can be determined. Recognition based on single characters is suitable for character strings of indefinite length, and since orientation information is generated for each character during segmentation, the relative positions of different characters can be sorted out and the character string organized for output.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic illustration of an application environment according to one embodiment of the present application;
FIG. 2 is a flow chart illustrating a character recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a sequence of vertex positions according to one embodiment of the present application;
FIGS. 4-5 are schematic views of a model structure according to one embodiment of the present application;
FIG. 6 is a schematic flow chart illustrating training of a character segmentation model according to an embodiment of the present application;
FIG. 7 is an illustrative diagram of some implementations in a character recognition process according to one embodiment of the application;
FIG. 8 is a schematic illustration of locating the target position and probability of a sample character according to one embodiment of the present application;
FIG. 9 is a schematic illustration of a stamp image for character organization according to one embodiment of the present application;
FIG. 10 is a flow diagram illustrating character recognition according to one embodiment of the present application;
FIG. 11 is a schematic diagram of a character recognition apparatus according to one embodiment of the present application;
FIG. 12 is a schematic view of an electronic device according to one embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the description of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The inventor has found through research that, although deep learning neural networks can be applied in the related art, a line-level text block in the image must first be identified in a character detection stage, and each character is then identified from the text block in a character recognition stage to obtain a character string. This approach is difficult to apply to character strings of varying length and performs poorly in scenes where the text direction is disordered.
In view of this, the embodiment of the present application provides a character recognition method, and the inventive concept of the method is as follows: first, each character is annotated with an ordered sequence of vertices (also referred to as a sequence of vertex positions) that represents its character orientation, and then a character segmentation model is trained based on the ordered sequence of vertices so that the model can segment out a single character in the image and give a sequence of vertex positions for the single character. The character orientation can be well determined according to the vertex position sequence so as to facilitate better character recognition processing.
The following describes the character recognition method provided in the present application with reference to the drawings. FIG. 1 is a schematic diagram of an application environment according to one embodiment of the present application.
As shown in fig. 1, the application environment may include at least one server 20 and a plurality of terminal devices 40. The terminal devices 40 can communicate with each other, and each terminal device 40 can communicate with the server 20 over a network. Terminal devices include, but are not limited to, computers, laptops, smartphones, tablets, scanners, and other types of user terminals. The server 20 is a device capable of providing interactive services and character recognition capabilities. A user may upload images to the server 20; for example, a terminal device may scan or photograph an insurance policy and upload the resulting image to the server 20, which then performs the character recognition of the embodiments of the present application. Of course, when the terminal has sufficient processing capability, character recognition of the image may also be performed by the terminal device itself.
In some embodiments, an example of inconsistent text orientation is a stamp: when a stamp appears in an image, the text in it is arranged in a wrap-around layout. Because the related art cannot determine the orientation of such characters, the characters in a seal cannot be recognized well; the present application, by contrast, segments individual characters and detects their orientation, so that they can be recognized well. Moreover, since recognition is performed character by character, the method is applicable to character strings of indefinite length.
As shown in fig. 2, a schematic flow chart of the character recognition method provided in the embodiment of the present application may be applied to a server or a terminal, and the method includes the following steps:
in step 201, a target image is acquired, the target image including a character string with a plurality of character orientations.
For example, the target image may contain a seal or handwritten characters, where the character directions may be disordered.
In step 202, feature extraction is performed on a target image to obtain character features of the target image;
in step 203, performing single-character region prediction on the character features to obtain the probability that each target position in the target image contains a character and the character region information of each target position; the character region information comprises a vertex position sequence ordered according to the character direction;
in some embodiments, the sequence of vertex positions may comprise an ordered sequence of four vertices of a rectangular box of characters, for example as shown in the (a) diagram in FIG. 3, the ordered sequence of vertices comprising, in order, an upper-left vertex position, an upper-right vertex position, a lower-right vertex position, and a lower-left vertex position of the rectangular box of characters.
In other embodiments, the present invention can be applied to any embodiment as long as the orientation and the region of the character can be recognized. For example, the sequence of vertex positions may include three vertices. As shown in (b) of fig. 3, the top left corner vertex position, the top right corner vertex position, and the center position of the lower boundary of the character of the rectangular box of the character, respectively. For another example, as shown in (c) of fig. 3, a character center position and two adjacent vertex positions of a rectangular frame of the character may be further included.
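As an illustration, the three encodings of FIG. 3 could be represented as ordered point lists; the coordinate values, tuple layout and variable names below are assumptions made for the example, not the patent's data format.

```python
import math

# (a) four ordered corners: top-left, top-right, bottom-right, bottom-left
quad = [(10, 10), (30, 10), (30, 40), (10, 40)]

# (b) three points: top-left corner, top-right corner, centre of the
#     character's lower boundary
tri = [(10, 10), (30, 10), (20, 40)]

# (c) character centre plus two adjacent corners of its rectangular box
ctr = [(20, 25), (10, 10), (30, 10)]

# Because the points are ordered, the character's orientation can be
# recovered, e.g. from the top edge of encoding (a):
(tlx, tly), (trx, try_) = quad[0], quad[1]
angle = math.degrees(math.atan2(try_ - tly, trx - tlx))  # 0 = upright
print(angle)  # 0.0 for this axis-aligned example
```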
In step 204, when the probability that any target position contains a character to be recognized is greater than a probability threshold value, cutting out a character area represented by the character area information of the target position from the target image;
in step 205, performing classification processing on each cut character region to obtain characters included in each character region;
in step 206, according to the vertex position sequence of each character region, performing sorting processing on the characters in each character region to obtain a character recognition result of the target image;
in step 207, the character recognition result of the target image is output to a terminal for display.
Therefore, in the embodiment of the application, each character region can be segmented from the target image through feature extraction and single-character region prediction, together with the vertex position sequence of the character in each region. Each character region can then be classified to recognize its character, and the recognized characters can be sorted according to their vertex position sequences to obtain the character recognition result. Throughout the process, no line-level text block needs to be located or detected, and single-character recognition is compatible with character strings of different lengths. The overall flow is sketched below.
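A minimal sketch of steps 201-207 as one function, assuming hypothetical injected callables that stand in for the modules described in the text; it is not the patent's reference implementation, and the 0.5 threshold is an illustrative value.

```python
from typing import Callable, List, Tuple

Quad = List[Tuple[float, float]]  # 4 ordered corners, as in FIG. 3(a)

def recognize(image,
              segment: Callable,    # steps 202-203: image -> [(prob, quad)]
              crop: Callable,       # step 204: (image, quad) -> char patch
              classify: Callable,   # step 205: patch -> character
              order: Callable,      # step 206: [(quad, char)] -> str
              prob_threshold: float = 0.5) -> str:
    """Run the single-character pipeline over one target image."""
    chars = []
    for prob, quad in segment(image):
        if prob > prob_threshold:                 # step 204 gate
            chars.append((quad, classify(crop(image, quad))))
    return order(chars)                           # step 206 ordering
```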
In some embodiments, character segmentation of the target image may be achieved based on a deep-learned character segmentation model. As shown in fig. 4, a schematic diagram of a character segmentation model provided in the embodiment of the present application is shown, where the character segmentation model includes a feature extraction module 401, a probability prediction module 402, and a position prediction module 403. Wherein:
(1) the feature extraction module 401 is configured to perform feature extraction on the target image to obtain character features of the target image;
in some embodiments, in order to be able to extract features in different receptive fields of a target image, in the embodiments of the present application, a high-level feature and a low-level feature in the target image may be extracted; and then, performing feature fusion on the acquired high-level features and low-level features to obtain character features of the target image.
As shown in fig. 5, the feature extraction module may include four backbone network feature layers and three feature fusion layers. The backbone feature layers extract image features of different levels, including high-level and low-level features; the feature fusion layers fuse these features of different levels to finally obtain the character features, which are output to the probability prediction module and the position prediction module.
In practice, the feature backbone network is generally a multi-layer neural network consisting mainly of convolutions, used to extract features of different levels from the input image. The lower-level parts of the backbone extract simpler features of the original image, while the higher-level parts extract more complex features and combinations of features. The present application does not limit the structure or type of backbone; in general, networks that perform well on image classification (e.g., on the ImageNet dataset), such as ResNet (including ResNet, ResNetv2, ResNeXt), DenseNet or VGG, may be selected as the backbone.
In feature fusion, for an input image of fixed size W × H, outputs of different sizes are taken from different convolutional stages of the backbone. Taking ResNet as an example: the Conv2_x module outputs W/4 × H/4; Conv3_x outputs W/8 × H/8; Conv4_x outputs W/16 × H/16; and Conv5_x outputs W/32 × H/32.
In the present application, the highest-level layer of the backbone, i.e., the layer with the smallest feature map (e.g., Conv5_x at 1/32 scale), is subjected to upsampling plus convolution so that its feature output size matches that of the previous layer (e.g., Conv4_x at 1/16 scale).
In other embodiments, the upsampling technique may be replaced with image scaling (Resize) or deconvolution (De-Convolution).
The upsampled higher-level feature map is fused with the next lower-level feature map, either by element-wise addition of matrix values or by simple matrix concatenation (Concatenate). Upsampling plus convolution is then applied to the fused features so that the feature map matches the size of the next lower level, and fusion by addition or concatenation continues, and so on, until the lowest-level features have been fused. At this point the backbone output is an image feature matrix that fuses features of different levels from low to high.
When building the model, there is no fixed restriction on which backbone layers are used for fusion; for ResNet, for example, the fused layers may be C2-C5, C2-C4, or C3-C5, and custom convolution, pooling and other operations can also be applied on top of C5 to derive C6, C7 and so on (i.e., fusing C3-C7). A sketch of such top-down fusion follows.
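A minimal PyTorch sketch of one possible top-down fusion of ResNet-50-style feature maps, as described above; the channel counts, bilinear upsampling choice and element-wise-addition variant are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseBlock(nn.Module):
    """One fusion step: upsample-and-convolve the higher-level map, then
    fuse it with the next lower level by element-wise addition (matrix
    concatenation would work equally, as noted above)."""
    def __init__(self, high_ch: int, low_ch: int):
        super().__init__()
        self.lateral = nn.Conv2d(high_ch, low_ch, kernel_size=1)
        self.smooth = nn.Conv2d(low_ch, low_ch, kernel_size=3, padding=1)

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        up = F.interpolate(self.lateral(high), size=low.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.smooth(up + low)   # element-wise addition, then conv

# ResNet-50-like maps for a 256x256 input (strides 4/8/16/32):
c2 = torch.randn(1, 256, 64, 64)
c3 = torch.randn(1, 512, 32, 32)
c4 = torch.randn(1, 1024, 16, 16)
c5 = torch.randn(1, 2048, 8, 8)
f4 = FuseBlock(2048, 1024)(c5, c4)  # 1/32 -> 1/16
f3 = FuseBlock(1024, 512)(f4, c3)   # 1/16 -> 1/8
f2 = FuseBlock(512, 256)(f3, c2)    # 1/8  -> 1/4: final character features
```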
(2) The probability prediction module 402 is used for performing probability estimation on the probability that the target positions contain characters based on the character features to obtain the probability that each target position contains characters;
(3) and a position prediction module 403 for predicting the character positions of the respective target positions based on the character features to obtain character region information of each target position.
The probability prediction module 402 and the position prediction module 403 may each be a small multi-layer convolutional sub-network. The kernel sizes and depths of the two modules can be adjusted for the specific application scenario, and the modules may also include pooling and activation layers. Both take as input the fused image feature matrix (i.e., the character features) output by the previous step. The probability prediction module 402 outputs a 1-channel matrix of size W/s × H/s, where s is the scaling factor relative to the original image, determined by the lowest fused layer as described above (s = 4 if C2 is the lowest fused layer, s = 32 if C5 is, and so on); the single channel holds the character score, i.e., the probability that the pixel belongs to a character. The position prediction module 403 outputs a multi-channel matrix of size W/s × H/s, each channel identifying one coordinate of the localization, with s again the scaling factor relative to the original image. If the ordered vertex sequence is identified by the four corner vertices of a rectangle, the position prediction module 403 outputs an 8-channel matrix, the 8 channels being the offsets of the x and y coordinates of the character's 4 vertices relative to the pixel. A sketch of such heads follows.
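A hedged PyTorch sketch of the two prediction heads on the fused W/s × H/s features; the depths, kernel sizes and the sigmoid on the score head are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class PredictionHeads(nn.Module):
    """Score head: 1 channel, the per-pixel character probability.
    Position head: 8 channels, the x/y offsets from each pixel to the
    character's 4 ordered corner vertices."""
    def __init__(self, in_ch: int = 256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.offset = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 8, 1))

    def forward(self, feats: torch.Tensor):
        return self.score(feats), self.offset(feats)

score_map, offset_map = PredictionHeads()(torch.randn(1, 256, 64, 64))
print(score_map.shape, offset_map.shape)  # (1,1,64,64) (1,8,64,64)
```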
Having introduced the structure and functions of the character segmentation model of the present application, the following describes how it is trained. As shown in fig. 6, training comprises the following steps:
in step 601, a training sample is obtained, wherein the training sample comprises a sample image, character region information of each sample character in the sample image and a standard probability value; the standard probability value is used for representing the probability that the target position of the sample character in the sample image contains the character;
that is, in the embodiment of the present application, each sample image may correspond to two labels, one label is a probability that has a character corresponding to a character position, and the other label is character region information, where the character region information is described in an ordered vertex sequence and used for describing a region of a character and an orientation of the character.
1. Acquisition of character region information
In some embodiments, taking the ordered four-corner vertices shown in diagram (a) of fig. 3 as an example, the four corner vertices of each character can be labeled manually or by machine, arranged in order from the top-left vertex through the top-right and bottom-right vertices to the bottom-left vertex.
In another embodiment, in order to improve the efficiency of labeling the character region information, in this embodiment, the character region information of each sample character in the sample image may be obtained according to the following method:
firstly, as shown in fig. 6, in step a1, a pre-labeled text block is obtained, where the text block includes at least one sample character and the text block is associated with pre-labeled text direction reference information for representing the text block; for example, as shown in fig. 7, a character string including "text block" in the sample image may be used as a text block to be analyzed. This string of "text blocks" can be manually labeled in the image, for example, the four corner vertices shown by the dashed box in FIG. 7. And the four corner vertexes are orderly arranged, such as E1, E2, E3 and E4 are orderly arranged according to the orientation of characters.
Since the four-corner vertex sequence of each character needs to be obtained by segmentation in the present application, after the text block is marked, in step a2, when the text block includes a plurality of sample characters, pixel projection processing is performed on the text block in the character direction according to text direction reference information associated with the text block, so as to obtain character region information of each sample character in the text block.
Continuing with fig. 7, pixel projection is applied to the three-character string "text block", yielding the projection curve with distinct valleys shown in fig. 7. The coordinates of a single character in the text block can then be determined from the distribution of the pixel projection values (a valley with a lower projection value may also appear in the middle of a character) combined with the number of characters in the text block. As shown in fig. 7, the intersection points A, B, D and C form the ordered four-corner vertex coordinate sequence of a single character.
Therefore, in this embodiment, individual characters do not need to be labeled manually; only the text block is labeled, and the ordered vertex sequence of each character is then obtained by pixel projection. This effectively reduces the manual labeling workload and improves the efficiency of producing training samples. A sketch of this projection-based split follows.
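A simplified NumPy sketch of the projection-based split, assuming a binarized text block (text pixels = 1) already rotated upright along its labeled direction, with a known character count; the 3-tap smoothing and valley search window are assumptions.

```python
import numpy as np

def split_by_projection(block: np.ndarray, n_chars: int):
    """Split a text-block image into per-character column ranges by
    projecting pixel counts onto the text direction and cutting at the
    deepest valley near each expected character boundary."""
    profile = block.sum(axis=0).astype(float)      # column-wise projection
    smooth = np.convolve(profile, np.ones(3) / 3, mode="same")
    approx = len(profile) / n_chars                # expected char width
    cuts = [0]
    for k in range(1, n_chars):
        lo, hi = int((k - 0.5) * approx), int((k + 0.5) * approx)
        cuts.append(lo + int(np.argmin(smooth[lo:hi])))  # deepest valley
    cuts.append(len(profile))
    return [(cuts[i], cuts[i + 1]) for i in range(n_chars)]

# e.g. a 20x60 block holding three 20x20 characters separated by blank
# columns would yield roughly [(0, 20), (20, 40), (40, 60)].
```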
2. Acquisition of standard probability values
In general, a character occupies a certain area containing many pixels, and different characters have different shapes, so a choice must be made as to which position within the character region should represent the character. In practice, positions with higher probability under a Gaussian distribution can be selected as the character position.
First, a template conforming to a Gaussian distribution may be generated: for example, a square image whose pixel values follow a two-dimensional Gaussian distribution can be produced from a Cartesian Gaussian function. The image size may be 50 × 50 pixels; in practice, the size of the Gaussian distribution map can be set according to the actual application scene.
Because the pixel value of each point in the Gaussian distribution map represents a probability, the map can be used as a template and matched against the image of a sample character, and the pixels at high-probability positions are taken as the position of the sample character in the image.
Therefore, as shown in fig. 6, the following steps may be performed:
In step B1, for each sample character, the sample character region is identified from the sample image according to the character region information corresponding to that sample character;
in step B2, affine transformation is performed on a gaussian distribution map generated in advance according to the direction and size of the sample character region in the sample image, and the gaussian distribution map is mapped to the sample character region;
In step B3, the mapping point, within the sample character region, of a target point whose probability is greater than a set probability threshold is obtained from the mapped Gaussian distribution map.
In step B4, the mapping point is used as the target position corresponding to the sample character, and the probability of the target point in the mapped Gaussian distribution map is used as the standard probability value of the sample character.
As shown in fig. 8, the upright circular pattern at the upper left is the Gaussian distribution map used as the template, and the tilted white box below it represents the sample character region of a sample character in the sample image. According to the size and orientation of the sample character, the template is mapped into the sample character region through affine transformation; the mapping point of the high-probability point O of the Gaussian distribution map within the character region is taken as the position of the sample character in the sample image, and the probability of point O in the Gaussian distribution map is taken as the standard probability value of that character for subsequent training.
Therefore, in the embodiment of the application, the position of each sample character in the sample image can be located accurately and efficiently based on the Gaussian distribution map, to serve as the target position used in the training stage, and a reasonable probability is used as the training label for the model. A sketch of this construction follows.
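A NumPy/OpenCV sketch of the template generation and mapping; it uses a perspective warp over the four ordered corners (a strictly affine warp, as the text says, would be fit from three of them), and sigma and the 0.9 threshold are assumed values.

```python
import numpy as np
import cv2

def gaussian_template(size: int = 50, sigma: float = 10.0) -> np.ndarray:
    """Square map whose pixel values follow a 2-D Gaussian (peak 1.0);
    50x50 matches the size suggested above, sigma is an assumption."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2)).astype(np.float32)

def map_to_region(template: np.ndarray, quad, image_hw, thresh: float = 0.9):
    """Warp the template onto a character's (possibly rotated) quadrilateral
    region and return the mapped probability image plus the target positions
    where the mapped probability exceeds `thresh`."""
    s = template.shape[0] - 1
    src = np.float32([[0, 0], [s, 0], [s, s], [0, s]])  # template corners
    dst = np.float32(quad)                # ordered TL, TR, BR, BL corners
    M = cv2.getPerspectiveTransform(src, dst)
    h, w = image_hw
    mapped = cv2.warpPerspective(template, M, (w, h))
    targets = np.argwhere(mapped > thresh)   # (y, x) target positions
    return mapped, targets
```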
In practice, each sample image corresponds to a Score map and an Offset map. The Score map holds the standard probability value distribution of each target character, i.e., the labeled probability that each target position contains a character; the Offset map describes the character region information of the character at each target position.
In order to use the ordered four-corner vertices, the present application expresses the character region information as the position offsets from the target position to the four corner vertices of the sample character.
Therefore, after the four corner vertices of each character in the sample image are obtained by segmentation and the target position of each character is determined, the offsets between the ordered four-corner vertex positions and the target position can be calculated to construct the Offset map. For example, a W/4 × H/4 × 8 image matrix is initialized with all pixel values 0. In the Gaussian distribution of each Score map, the pixel positions whose value exceeds a threshold (e.g., 0.9) are found, and the offsets (Δx1, Δy1, Δx2, Δy2, Δx3, Δy3, Δx4, Δy4) from the pixel coordinates to the 4 corner points of the corresponding character are assigned to the 8 channels of the Offset map at those positions, as sketched below.
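A NumPy sketch of this Offset map construction; the pairing of a per-character Gaussian score map (already at map resolution) with its labeled quadrilateral is an assumption of the sketch.

```python
import numpy as np

def build_offset_map(char_maps, img_w, img_h, stride=4, thresh=0.9):
    """Build the W/4 x H/4 x 8 Offset map described above: pixels whose
    per-character Gaussian score exceeds `thresh` receive the offsets from
    their image coordinates to the character's 4 ordered corner vertices.
    `char_maps` is a list of (score_map, quad) pairs, one per character."""
    mw, mh = img_w // stride, img_h // stride
    offset = np.zeros((mh, mw, 8), np.float32)
    for score_map, quad in char_maps:            # quad: [(x1,y1)..(x4,y4)]
        for y, x in np.argwhere(score_map > thresh):
            px, py = x * stride, y * stride      # back to image coordinates
            offset[y, x] = [d for (qx, qy) in quad
                            for d in (qx - px, qy - py)]
    return offset
```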
In summary, after the sample images and the ground-truth Score maps and Offset maps are obtained, the training samples may be fed in batches to the built character segmentation model for parameter-fitting training. The batch size is determined by the hardware performance and scale used for training, and is typically 8.
That is, in step 602, the training sample may be input to the character segmentation model, and a probability estimation value that each target position of the sample image output by the character segmentation model contains a character and character region information estimation information of each target position are obtained;
in step 603, calculating a probability estimation loss value according to the probability estimation value of the sample image and the standard probability value of the sample image, and calculating a region prediction loss value according to the character region information estimation information of the sample image and the character region information of the sample image;
in step 604, calculating a total loss value of the sample image according to the probability estimation loss and the region prediction loss value;
in step 605, the model parameters of the character segmentation model are adjusted according to the total loss value.
That is, each time a batch of sample images is input, the model outputs predicted Score map and Offset map values, and a loss function is defined to evaluate the difference between the predictions and the ground truth. In this embodiment, a mean square error (MSE) function may be used as the loss for the Score map, and the Smooth L1 distance as the loss for the Offset map. The total loss is then computed from the Score map loss and the Offset map loss to fit the parameters of the character segmentation model; in some embodiments, the probability estimation loss and the region prediction loss are weighted and summed to obtain the total loss value, as in the sketch below.
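A PyTorch sketch of such a combined loss, supervising offsets only at labeled character positions; the 0.9 mask threshold and the weighting are assumed hyper-parameters.

```python
import torch
import torch.nn.functional as F

def total_loss(score_pred, score_gt, offset_pred, offset_gt, w_offset=1.0):
    """MSE on the Score map (B,1,H,W) plus Smooth-L1 on the Offset map
    (B,8,H,W), weighted and summed into the total loss value."""
    score_loss = F.mse_loss(score_pred, score_gt)
    mask = (score_gt > 0.9).float()               # labeled character pixels
    n = mask.sum().clamp(min=1.0)
    offset_loss = (F.smooth_l1_loss(offset_pred, offset_gt,
                                    reduction="none") * mask).sum() / n
    return score_loss + w_offset * offset_loss    # weighted sum
```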
In addition, in another embodiment, the input sample images and labels are generally augmented synchronously during training with scaling, rotation, translation, deformation and the like, to make the model more robust. To increase training speed, a single-machine multi-GPU environment or multi-machine distributed training is generally used.
In addition, in the training stage, the character segmentation model can also output the characters whose Score values rank in the top N as candidate characters, together with their scores, so that the recognized characters can subsequently be corrected by a natural language processing method.
In addition to the character segmentation stage, the classification processing of the character recognition stage may also be implemented with a neural network model in the embodiment of the present application. For training the character recognition model, the training samples may be individual manually labeled characters, or synthesized images each containing a single character to be recognized.
In implementation, as shown in fig. 5, after the probability prediction module gives the probability that each target position contains a character and the vertex position sequence corresponding to each target position, target positions with high probability are selected based on these two pieces of information, and characters are located in the target image, i.e., the image patch of each single character is segmented out. Each patch is then input to the character recognition network for classification to obtain the character content. The character content and the positional relations of the different character strings are then input to a natural language processing (NLP) module, which corrects the recognized content, after which the character recognition result of the entire target image is organized and output.
In implementation, the classification processing classifies each cropped character region to obtain the probability that the region belongs to each known character, and the known character with the highest probability is then taken as the character contained in that region, as in the sketch below. In this way, the character of each character region can be obtained accurately.
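A minimal sketch of this classification step; the charset and logit values are illustrative.

```python
import torch

def classify_region(logits: torch.Tensor, charset: list):
    """Pick the known character with the highest probability and return it
    with its confidence; `charset` maps class indices to characters."""
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return charset[idx.item()], conf.item()

char, confidence = classify_region(torch.tensor([2.0, 0.1, -1.0]),
                                   ["泰", "康", "保"])
print(char, round(confidence, 3))  # highest-probability known character
```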
To further optimize the classification results, in the present application, after each cropped character region is classified, the confidence of each resulting character can be obtained. Then, with the vertex position sequence comprising four ordered corner vertices, the characters of the character regions are sorted according to their vertex position sequences to obtain the character recognition result of the target image, after which the result is corrected with a natural language processing method according to the confidence of each character, giving the final character recognition result. For example, after each line of text is obtained, the line is corrected using NLP.
When organizing the character recognition result of the target image for output, taking the ordered four-corner vertex sequence as an example, the orientation of a character and its adjacent characters can be determined from the four corner vertices, and the adjacent characters are then arranged in sequence to obtain the character recognition result of the target image, as in the sketch below.
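A sketch of how orientation and the expected adjacent character could be derived from the four ordered corners; the one-character-width adjacency rule is a simplifying assumption, not the patent's stated rule.

```python
import math

def orientation(quad):
    """Reading-direction angle of a character from its ordered top edge
    (top-left -> top-right vertex)."""
    (x1, y1), (x2, y2) = quad[0], quad[1]
    return math.atan2(y2 - y1, x2 - x1)

def expected_neighbor(quad):
    """Centre where the next character is expected: one character width
    onward along the reading direction."""
    cx = sum(p[0] for p in quad) / 4.0
    cy = sum(p[1] for p in quad) / 4.0
    width = math.dist(quad[0], quad[1])
    a = orientation(quad)
    return cx + width * math.cos(a), cy + width * math.sin(a)
```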
Taking a stamp as an example, as shown in fig. 9, the characters in a common circular stamp are generally arranged in order, their angles vary according to a regular pattern, and their spacing is nearly uniform. Once the orientation and position of each character are determined, character strings matching these characteristics of a circular stamp can be located, so that the characters in the stamp can be organized and output.
In another embodiment, characters that lie on the same line without obvious separation can be merged into text blocks according to the positional relation and content of each character. Meanwhile, based on the candidate characters of each character, the text content of each text block is corrected and output with the help of NLP.
On output, the recognition result may be produced in a desired format, for example JSON, containing each text block in the input image and the position of each character; an illustrative layout follows.
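The patent specifies JSON output with each text block and each character's position but no concrete schema, so all field names and values below are assumptions.

```python
import json

result = {
    "text_blocks": [{
        "text": "泰康",
        "characters": [
            {"char": "泰", "confidence": 0.97,
             "quad": [[12, 8], [40, 8], [40, 36], [12, 36]]},
            {"char": "康", "confidence": 0.95,
             "quad": [[44, 8], [72, 8], [72, 36], [44, 36]]},
        ],
    }]
}
print(json.dumps(result, ensure_ascii=False, indent=2))
```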
The method for recognizing characters in the embodiment of the present application is described by taking a claim settlement scenario as an example. As shown in fig. 10, the following steps may be included:
in step 1001, the terminal device is used to take a picture of the invoice and upload the picture to the server.
In step 1002, the server first pre-processes the invoice image to optimize invoice image quality.
In step 1003, the server extracts the low-level and high-level features of the invoice image and fuses them to obtain the fused character features.
In step 1004, probability estimation is performed, based on the character features, on the probabilities that a plurality of target positions contain characters, to obtain the probability that each of the target positions contains a character; and,
in step 1005, the character positions of the respective target positions are predicted based on the character features, and an ordered vertex sequence of each target position is obtained.
In step 1006, when the probability that the target location contains a character is higher than the probability threshold, the server crops the character region from the invoice image according to the ordered vertex sequence of the target location.
In step 1007, each character region obtained by clipping is classified to obtain a character and a confidence of the character.
In step 1008, the natural language processing method is used to correct each recognized character according to the confidence of each character, so as to obtain a final character recognition result, and the final character recognition result is output to the review terminal for display.
The recognized characters are displayed together with the invoice image so that the recognition results can be conveniently checked manually.
In summary, in the embodiment of the present application, segmentation and recognition based on single characters are applicable to recognition of characters of indefinite length, and the orientation information generated for each character during segmentation allows the relative positions of different characters to be sorted out, NLP correction to be applied to the character string, and the string to be organized and output.
Based on the same inventive concept, an embodiment of the present application further provides a character recognition apparatus, as shown in fig. 11, the apparatus 1100 includes:
the character segmentation module 1101 is configured to acquire a target image, wherein the target image comprises character strings with a plurality of character orientations, and to perform feature extraction on the target image to obtain character features of the target image; to carry out single character region prediction on the character features to obtain the probability that each target position in the target image contains characters and the character region information of each target position; the character region information comprises vertex position sequences ordered according to the character direction;
a character region clipping module 1102, configured to clip, from the target image, a character region indicated by the character region information of any target position when the probability that the target position contains a character to be recognized is greater than a probability threshold;
the identification module 1103 is configured to perform classification processing on each cut character region to obtain characters included in each character region;
a sorting module 1104, configured to perform sorting processing on the characters in each character region according to the vertex position sequence of each character region, so as to obtain a character recognition result of the target image;
an output module 1105, configured to output the character recognition result of the target image to a terminal for display.
In some embodiments, the character segmentation module is to:
inputting the target image into a pre-trained character segmentation model, and performing feature extraction on the target image by a feature extraction module of the character segmentation model to obtain character features of the target image;
probability estimation is carried out by a probability prediction module of the character segmentation model on the probability that the target positions contain characters, based on the character features, to obtain the probability that each target position contains characters; and,
and predicting the character positions of the target positions by a position prediction module of the character segmentation model based on the character features to obtain character area information of each target position.
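A minimal PyTorch-style sketch of such a model: a shared feature-extraction module feeding a probability prediction head and a position prediction head that emits eight vertex coordinates per target position. The backbone depth and channel widths are placeholders, not the architecture actually used by this application:

```python
import torch.nn as nn

class CharSegmentationModel(nn.Module):
    """Shared backbone with a probability head and a position head,
    producing dense score maps over target positions."""

    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        # Feature extraction module (stand-in for a deeper backbone)
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        # Probability prediction module: the probability that each
        # target position contains a character
        self.prob_head = nn.Sequential(nn.Conv2d(feat_ch, 1, 1), nn.Sigmoid())
        # Position prediction module: (x, y) offsets of the four
        # ordered vertices for each target position -> 8 channels
        self.pos_head = nn.Conv2d(feat_ch, 8, 1)

    def forward(self, image):
        f = self.features(image)
        return self.prob_head(f), self.pos_head(f)
```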
In some embodiments, the apparatus further comprises:
a character segmentation model training module for training the character segmentation model according to the following method:
acquiring a training sample, wherein the training sample comprises a sample image, character region information and a standard probability value of each sample character in the sample image; the standard probability value is used for representing the probability that the target position of the sample character in the sample image contains the character;
inputting the training sample into the character segmentation model to obtain, for each target position of the sample image, an estimated probability that the position contains a character and estimated character region information, both output by the character segmentation model;
calculating a probability estimation loss value from the estimated probabilities and the standard probability values of the sample image, and calculating a region prediction loss value from the estimated character region information and the labeled character region information of the sample image;
calculating a total loss value for the sample image from the probability estimation loss value and the region prediction loss value;
and adjusting the model parameters of the character segmentation model according to the total loss value.
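A sketch of the combined objective, with predictions flattened over target positions. The application does not name concrete loss functions, so MSE for the probability map, smooth-L1 for vertex regression, and the weighting factor alpha are all illustrative assumptions:

```python
import torch.nn.functional as F

def total_loss(prob_pred, prob_gt, pos_pred, pos_gt, char_mask, alpha=1.0):
    """prob_pred, prob_gt: (N,) probabilities per target position;
    pos_pred, pos_gt: (N, 8) vertex coordinates; char_mask: (N,) bool,
    true where the standard probability marks a character."""
    prob_loss = F.mse_loss(prob_pred, prob_gt)
    # Only penalize vertex regression where a character actually is
    pos_loss = (F.smooth_l1_loss(pos_pred[char_mask], pos_gt[char_mask])
                if char_mask.any() else prob_pred.new_zeros(()))
    return prob_loss + alpha * pos_loss
```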
In some embodiments, the apparatus further comprises:
a character region information obtaining module, configured to obtain the character region information of each sample character in the sample image according to the following method:
acquiring a pre-labeled text block, wherein the text block comprises at least one sample character and is associated with pre-labeled text direction reference information representing the reading direction of the text block;
when the text block comprises a plurality of sample characters, pixel projection processing is carried out on the text block in the character direction according to the text direction reference information associated with the text block, and the character area information of each sample character in the text block is obtained.
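A sketch of that projection step, assuming the text block has already been rotated so its text direction reference is horizontal and the pixels are binarized (nonzero where ink is present):

```python
import numpy as np

def split_block_by_projection(block):
    """Split a rectified, binarized text block into per-character
    column spans by projecting ink counts along the text direction."""
    # Column-wise ink counts; gaps between characters project to zero
    profile = (block > 0).sum(axis=0)
    in_char, start, spans = False, 0, []
    for i, v in enumerate(profile):
        if v > 0 and not in_char:
            in_char, start = True, i          # character begins
        elif v == 0 and in_char:
            in_char = False
            spans.append((start, i))          # character ends
    if in_char:                               # character touches the edge
        spans.append((start, len(profile)))
    return spans
```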
In some embodiments, the apparatus further comprises:
a standard probability value obtaining module, configured to obtain the standard probability value of each sample character in the sample image according to the following method:
for each sample character, identifying the sample character area from the sample image according to the character area information corresponding to the sample character;
carrying out affine transformation on a pre-generated Gaussian distribution map according to the direction and the size of the sample character region in the sample image, and mapping it onto the sample character region to obtain a target Gaussian distribution map;
acquiring, from the target Gaussian distribution map, the mapped points within the sample character region whose probability is greater than a set probability threshold;
and taking these mapped points as the target positions corresponding to the sample character, with their probabilities in the target Gaussian distribution map as the standard probability values of the sample character.
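A sketch of building the standard probability map for one sample character. Note one deliberate substitution: the application speaks of an affine transformation, while this sketch uses a perspective warp between the template corners and the character's four vertices, which covers the affine case. The template size and sigma are assumed hyperparameters:

```python
import cv2
import numpy as np

def gaussian_gt_for_char(image_shape, vertices, size=64, sigma=0.4):
    """Warp a pre-generated 2-D Gaussian onto one character region so
    that its direction and size are respected.

    image_shape -- (H, W) of the sample image
    vertices    -- ordered four-corner vertices of the character region
    """
    # Pre-generated Gaussian template, peaking at the template center
    ax = np.linspace(-1.0, 1.0, size)
    xx, yy = np.meshgrid(ax, ax)
    template = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)).astype(np.float32)

    # Map template corners onto the (possibly rotated) character region
    src = np.float32([[0, 0], [size, 0], [size, size], [0, size]])
    dst = np.asarray(vertices, dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    h, w = image_shape
    return cv2.warpPerspective(template, M, (w, h))
```

Thresholding the warped map then yields the target positions, whose values serve as the standard probability values.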
In some embodiments, the identification module is to:
classifying each character region obtained by cutting to obtain the probability that the character region belongs to each known character;
and taking the known character with the highest probability as the character contained in the character area.
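A minimal sketch of this classification step, assuming a PyTorch classifier over a fixed character set (`charset` is a hypothetical list mapping class indices to known characters):

```python
import torch.nn.functional as F

def classify_region(classifier, region, charset):
    """Classify one cropped character region; the highest-probability
    known character is the prediction, its probability the confidence.

    region -- (C, H, W) tensor of the rectified character crop
    """
    logits = classifier(region.unsqueeze(0))       # (1, num_classes)
    probs = F.softmax(logits, dim=-1).squeeze(0)   # probability per known character
    conf, idx = probs.max(dim=-1)
    return charset[idx.item()], conf.item()
```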
In some embodiments, the vertex position sequence includes four corner vertices ordered according to the character direction, and the sorting module is configured to determine the orientation of each character and its adjacent characters according to the four corner vertices, and to arrange adjacent characters in sequence to obtain the character recognition result of the target image.
In some embodiments, after the classifying processing is performed on each of the character regions obtained by the cutting, the apparatus further includes:
a confidence coefficient obtaining module, configured to obtain a confidence coefficient of each character obtained through the classification processing;
the vertex position sequence includes four corner vertices ordered according to the character direction, and after the characters in each character region are sorted according to the vertex position sequence of each character region to obtain a character recognition result of the target image, the apparatus further includes:
and the correction module is used for correcting the character recognition result by a natural language processing method according to the confidence of each character, so as to obtain a final character recognition result.
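A sketch of confidence-gated correction. Since the application does not fix a particular NLP method, `lm_score` here is a hypothetical caller-supplied scorer mapping a string to a fluency score (higher is better), and `candidates[i]` holds the classifier's top alternatives for character i:

```python
def correct_text(chars, candidates, confidences, lm_score, conf_thresh=0.9):
    """Re-choose low-confidence characters among their candidates using
    a language-model scorer; high-confidence characters are kept."""
    text = list(chars)
    for i, conf in enumerate(confidences):
        if conf >= conf_thresh:
            continue                      # trust the classifier here
        text[i] = max(
            candidates[i],
            key=lambda c: lm_score("".join(text[:i] + [c] + text[i + 1:])),
        )
    return "".join(text)
```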
In some embodiments, the character segmentation module is to:
extracting high-level features and low-level features in the target image;
and performing feature fusion on the obtained high-level features and low-level features to obtain the character features of the target image.
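A minimal FPN-style fusion sketch in PyTorch. The channel sizes and the add-then-smooth design are assumptions; the application only states that high-level and low-level features are fused into character features:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Fuse a low-level (high-resolution) feature map with a high-level
    (semantic) one into a single character-feature map."""

    def __init__(self, low_ch=64, high_ch=256, out_ch=64):
        super().__init__()
        self.lateral = nn.Conv2d(high_ch, out_ch, 1)  # align channels
        self.reduce = nn.Conv2d(low_ch, out_ch, 1)
        self.smooth = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, low, high):
        # Upsample the coarse high-level map to the low-level resolution,
        # then fuse by addition and smooth the result.
        high_up = F.interpolate(self.lateral(high), size=low.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.smooth(self.reduce(low) + high_up)
```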
For the implementation and beneficial effects of the operations in the character recognition apparatus, reference is made to the description of the foregoing method, and further description is omitted here.
Having described the character recognition method and apparatus of the exemplary embodiments of the present application, an electronic device according to another exemplary embodiment of the present application is described next.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
In some possible implementations, an electronic device according to the present application may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the character recognition method according to various exemplary embodiments of the present application described above in the present specification.
The electronic device 130 according to this embodiment of the present application is described below with reference to fig. 12. The electronic device 130 shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 12, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, various aspects of a character recognition method provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps in the character recognition method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for character recognition of the embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external electronic device (for example, through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more units described above may be embodied in a single unit. Conversely, the features and functions of a single unit described above may be further divided and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. A character recognition method is applied to a server or a terminal, and the method comprises the following steps:
acquiring a target image, wherein the target image comprises character strings with a plurality of character orientations;
carrying out feature extraction on a target image to obtain character features of the target image;
carrying out single character region prediction on the character features to obtain the probability that each target position in the target image contains characters and character region information of each target position; the character region information comprises vertex position sequences which are orderly arranged according to the character direction;
when the probability that any target position contains characters to be recognized is larger than a probability threshold value, cutting out a character area represented by the character area information of the target position from the target image;
classifying each character area obtained by cutting to obtain characters contained in each character area;
according to the vertex position sequence of each character area, sorting characters in each character area to obtain a character recognition result of the target image;
and outputting the character recognition result of the target image to a terminal for display.
2. The method according to claim 1, wherein the feature extraction is performed on the target image to obtain character features of the target image; carrying out single character region prediction on the character features to obtain the probability that each target position in the target image contains characters and character region information of each target position; the method comprises the following steps:
inputting the target image into a pre-trained character segmentation model, and performing feature extraction on the target image by a feature extraction module of the character segmentation model to obtain character features of the target image;
probability estimation is carried out by a probability prediction module of the character segmentation model on the probability that the target positions contain characters, based on the character features, to obtain the probability that each target position contains characters; and,
and predicting the character positions of the target positions by a position prediction module of the character segmentation model based on the character features to obtain character area information of each target position.
3. The method of claim 2, further comprising:
training the character segmentation model according to the following method:
acquiring a training sample, wherein the training sample comprises a sample image, character region information and a standard probability value of each sample character in the sample image; the standard probability value is used for representing the probability that the target position of the sample character in the sample image contains the character;
inputting the training sample into the character segmentation model to obtain, for each target position of the sample image, an estimated probability that the position contains a character and estimated character region information, both output by the character segmentation model;
calculating a probability estimation loss value from the estimated probabilities and the standard probability values of the sample image, and calculating a region prediction loss value from the estimated character region information and the labeled character region information of the sample image;
calculating a total loss value for the sample image from the probability estimation loss value and the region prediction loss value;
and adjusting the model parameters of the character segmentation model according to the total loss value.
4. The method of claim 3, further comprising:
acquiring the character area information of each sample character in the sample image according to the following method:
acquiring a pre-labeled text block, wherein the text block comprises at least one sample character and is associated with pre-labeled text direction reference information representing the reading direction of the text block;
when the text block comprises a plurality of sample characters, pixel projection processing is carried out on the text block in the character direction according to the text direction reference information associated with the text block, and the character area information of each sample character in the text block is obtained.
5. The method of claim 3, further comprising:
obtaining the standard probability value of each sample character in the sample image according to the following method:
for each sample character, identifying the sample character area from the sample image according to the character area information corresponding to the sample character;
carrying out affine transformation on a pre-generated Gaussian distribution map according to the direction and the size of the sample character region in the sample image, and mapping it onto the sample character region to obtain a target Gaussian distribution map;
acquiring, from the target Gaussian distribution map, the mapped points within the sample character region whose probability is greater than a set probability threshold;
and taking these mapped points as the target positions corresponding to the sample character, with their probabilities in the target Gaussian distribution map as the standard probability values of the sample character.
6. The method according to any one of claims 1 to 5, wherein the classifying each of the character regions obtained by the cutting to obtain the characters included in each of the character regions comprises:
classifying each character region obtained by cutting to obtain the probability that the character region belongs to each known character;
and taking the known character with the highest probability as the character contained in the character area.
7. The method according to any one of claims 1 to 5, wherein the vertex position sequence includes four corner vertices ordered according to the character direction, and the sorting of the characters in each character region according to the vertex position sequence of each character region to obtain the character recognition result of the target image comprises:
determining the orientation of each character and its adjacent characters according to the four corner vertices;
and arranging adjacent characters in sequence to obtain the character recognition result of the target image.
8. The method according to any one of claims 1 to 5, wherein after the classifying processing is performed on each of the character regions obtained by the cutting, the method further comprises:
obtaining the confidence coefficient of each character obtained by the classification processing;
the vertex position sequence includes four corner vertices ordered according to the character direction, and after the characters in each character region are sorted according to the vertex position sequence of each character region to obtain a character recognition result of the target image, the method further comprises:
and correcting the character recognition result by adopting a natural language processing method according to the confidence coefficient of each character to obtain a final character recognition result.
9. The method according to claim 1, wherein the performing feature extraction on the target image to obtain the character features of the target image comprises:
extracting high-level features and low-level features in the target image;
and performing feature fusion on the obtained high-level features and low-level features to obtain the character features of the target image.
10. An apparatus for character recognition, the apparatus comprising:
the character segmentation module is used for acquiring a target image, wherein the target image comprises character strings with a plurality of character orientations, and extracting the characteristics of the target image to acquire the character characteristics of the target image; carrying out single character region prediction on the character features to obtain the probability that each target position in the target image contains characters and character region information of each target position; the character region information comprises vertex position sequences which are orderly arranged according to the character direction;
the character region cutting module is used for cutting out a character region represented by the character region information of the target position from the target image when the probability that any target position contains a character to be recognized is greater than a probability threshold;
the recognition module is used for classifying each character area obtained by cutting to obtain characters contained in each character area;
the sorting module is used for sorting the characters in each character region according to the vertex position sequence of each character region to obtain a character recognition result of the target image;
and the output module is used for outputting the character recognition result of the target image to a terminal for displaying.
11. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
12. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method of any one of claims 1-9.
CN202011317741.4A 2020-11-23 2020-11-23 Character recognition method, character recognition device, electronic equipment and storage medium Pending CN112434698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011317741.4A CN112434698A (en) 2020-11-23 2020-11-23 Character recognition method, character recognition device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112434698A true CN112434698A (en) 2021-03-02

Family

ID=74692793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011317741.4A Pending CN112434698A (en) 2020-11-23 2020-11-23 Character recognition method, character recognition device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112434698A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304835A (en) * 2018-01-30 2018-07-20 百度在线网络技术(北京)有限公司 character detecting method and device
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN108960115A (en) * 2018-06-27 2018-12-07 电子科技大学 Multi-direction Method for text detection based on angle point
CN111767908A (en) * 2019-04-02 2020-10-13 顺丰科技有限公司 Character detection method, device, detection equipment and storage medium
CN110766008A (en) * 2019-10-29 2020-02-07 北京华宇信息技术有限公司 Text detection method facing any direction and shape

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313117A (en) * 2021-06-25 2021-08-27 北京奇艺世纪科技有限公司 Method and device for recognizing text content
CN113313117B (en) * 2021-06-25 2023-07-25 北京奇艺世纪科技有限公司 Method and device for identifying text content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination