CN111046859B - Character recognition method and device - Google Patents

Character recognition method and device

Info

Publication number
CN111046859B
Authority
CN
China
Prior art keywords
character
network
image
character recognition
recognized
Prior art date
Legal status
Active
Application number
CN201811184618.2A
Other languages
Chinese (zh)
Other versions
CN111046859A (en)
Inventor
朱尧 (Zhu Yao)
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
2018-10-11
Filing date
2018-10-11
Publication date
2023-09-29
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811184618.2A
Publication of CN111046859A
Application granted
Publication of CN111046859B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The application provides a character recognition method and device. The method comprises: inputting an image to be recognized into a character recognition model; locating, by a character locating network in the model, character key points in the image to be recognized and outputting them to a character correction network in the model; determining, by the character correction network and using the correspondence between the character key points and preset position points, a rectified image corresponding to the character region in the image to be recognized; and outputting the rectified image to a character recognition network in the model, which recognizes the characters in it. Because the character correction network can rectify images affected by inclination, rotation, deformation and similar problems, the recognition result is stable and the recognition accuracy is high. Moreover, the model obtains its result simply by locating character key points with the locating network and passing them through the correction and recognition networks; it does not need to detect precise character boxes in the image or perform segmentation, which further improves recognition accuracy.

Description

Character recognition method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for character recognition.
Background
Current character recognition technology generally comprises two stages: character region location and character segmentation. In deep learning approaches, character recognition is implemented with several separate deep learning models: the image is input into a feature extraction model to extract image features; the features output by the feature extraction model are input into a target detection model to detect character boxes; and finally the character boxes output by the target detection model, together with the features output by the feature extraction model, are input into a character segmentation model to segment the characters.
However, because these deep learning models exist independently and each of them exchanges data with an external platform, redundant computation arises and memory is occupied, so character recognition is slow.
Disclosure of Invention
In view of the above, the present application provides a character recognition method and apparatus to solve the problem of low recognition speed in related-art character recognition methods.
According to a first aspect of an embodiment of the present application, there is provided a character recognition method, the method including:
inputting an image to be recognized into a trained character recognition model, so that the character recognition model locates character key points in the image to be recognized through a character locating network and outputs them to a character correction network in the model; determining, by the character correction network and using the correspondence between the character key points and preset position points, a rectified image corresponding to the character region in the image to be recognized; outputting the rectified image to a character recognition network in the model to recognize the characters in the rectified image; and acquiring the character recognition result output by the character recognition model.
According to a second aspect of an embodiment of the present application, there is provided a character recognition apparatus, the apparatus including:
a character recognition module, configured to input an image to be recognized into a trained character recognition model, so that the model locates character key points in the image through a character locating network and outputs them to a character correction network in the model; the character correction network determines, using the correspondence between the character key points and preset position points, a rectified image corresponding to the character region in the image to be recognized, and outputs the rectified image to a character recognition network in the model to recognize the characters in it; and an acquisition module, configured to acquire the character recognition result output by the character recognition model.
According to a third aspect of embodiments of the present application, there is provided an electronic device comprising a readable storage medium and a processor;
wherein the readable storage medium is for storing machine executable instructions;
the processor is configured to read the machine-executable instructions on the readable storage medium and execute the instructions to implement the steps of the character recognition method described above.
As described above, the whole recognition process takes place inside a single character recognition model, so there is no data interaction between multiple models and an external platform; this improves recognition speed and reduces maintenance effort. The character correction network in the model can rectify images affected by inclination, rotation, deformation and similar problems, so the recognition result of the model is stable and accurate. After an image is input, the model directly outputs a character recognition result, achieving truly end-to-end character recognition. Furthermore, the model only needs to locate character key points through the character locating network and pass them through the correction and recognition networks to obtain a result; it neither detects precise character boxes in the image nor performs segmentation, which further improves recognition accuracy.
Drawings
FIG. 1 is a block diagram of a character recognition model according to an exemplary embodiment of the present application;
FIG. 2A is a flow chart illustrating an embodiment of a method of character recognition according to an exemplary embodiment of the present application;
FIG. 2B is a schematic diagram of a located character keypoint according to the embodiment of the application shown in FIG. 2A;
FIG. 2C is a schematic view of a preset position point according to the embodiment of FIG. 2A;
FIG. 2D is a schematic illustration of a rectified image according to the embodiment of FIG. 2A;
FIG. 3 is a flow chart illustrating another character recognition method according to an exemplary embodiment of the present application;
FIG. 4 is a hardware architecture diagram of an electronic device according to an exemplary embodiment of the application;
FIG. 5 is a block diagram of an embodiment of a character recognition apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information, without departing from the scope of the application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
The character recognition technology of the related art, implemented with several deep learning models (a feature extraction model, a target detection model, and a character segmentation model), has the following problems. 1. The models exist independently and each exchanges data with an external platform, so redundant computation arises, memory is occupied, and character recognition is slow. 2. If the image suffers from inclination, deformation, or similar problems, a recognition result may not be obtained, so the technology is unstable. 3. The accuracy of character segmentation by the segmentation model depends on the accuracy of character box detection by the target detection model; the segmentation task is therefore strongly coupled to detection, and an insufficiently accurate character box easily leads to segmentation errors.
Based on this, FIG. 1 shows a character recognition model according to an exemplary embodiment of the present application. As shown in FIG. 1, an image to be recognized is input into a trained character recognition model. First, the model locates character key points in the image through a character locating network and outputs them to a character correction network in the model. The character correction network then determines, in the image to be recognized, a rectified image corresponding to the character region, using the correspondence between the character key points and preset position points, and outputs the rectified image to the character recognition network in the model, which recognizes the characters in it. The character recognition result output by the model can then be obtained.
As described above, the whole recognition process takes place inside a single character recognition model, so there is no data interaction between multiple models and an external platform; this improves recognition speed and reduces maintenance effort. The character correction network in the model can rectify images affected by inclination, deformation and similar problems, so the recognition result of the model is stable and accurate. After an image is input, the model directly outputs a character recognition result, achieving truly end-to-end character recognition. Furthermore, the model only needs to locate character key points through the character locating network and pass them through the correction and recognition networks to obtain a result; it neither detects precise character boxes in the image nor performs segmentation, which further improves recognition accuracy.
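As a high-level illustration, the following PyTorch sketch composes the three sub-networks into a single model. The class names, constructor arguments, and interfaces are illustrative assumptions rather than structures fixed by the application; concrete sketches of each sub-network are given in the embodiments below.

import torch.nn as nn

class CharacterRecognitionModel(nn.Module):
    """Single end-to-end model built from the three sub-networks described
    above; loss on the final output trains all three jointly."""
    def __init__(self, locator, rectifier, recognizer):
        super().__init__()
        self.locator = locator          # character locating network
        self.rectifier = rectifier      # character correction network (TPS)
        self.recognizer = recognizer    # character recognition network

    def forward(self, image):
        keypoints = self.locator(image)               # locate character key points
        rectified = self.rectifier(image, keypoints)  # rectify the character region
        return self.recognizer(rectified)             # recognise the characters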
The technical scheme of the application is described in detail below by using specific examples.
FIG. 2A is a flowchart of an embodiment of a character recognition method according to an exemplary embodiment of the present application. In combination with the character recognition model structure shown in FIG. 1, the character recognition model is trained in advance and may include a character locating network, a character correction network, and a character recognition network. As shown in FIG. 2A, the character recognition method includes the following steps:
step 201: inputting an image to be recognized into a trained character recognition model, positioning character key points in the image to be recognized through a character positioning network by the character recognition model, outputting the character key points to a character correction network in the character recognition model, determining a correction image corresponding to a character area in the image to be recognized by the character correction network in the image to be recognized by utilizing the corresponding relation between the character key points and preset position points, and outputting the correction image to the character recognition network in the character recognition model to recognize characters in the correction image.
In an embodiment, the character locating network may locate the character key points as follows: a feature extraction network in the character locating network extracts the features of the image to be recognized and outputs them to a key point regression network in the character locating network, and the key point regression network extracts the character key points from the extracted features.
The image to be recognized may be a grey-scale image from a natural scene (such as shop sign or billboard recognition) or from a specific scene (such as license plate, business card, or certificate recognition). The feature extraction network may comprise several convolutional layers and pooling layers, with at least one convolution before each pooling. The key point regression network may comprise a fully connected layer and several regression layers. The extracted character key points may be edge points of the character region in the image to be recognized, and their number can be set according to actual requirements; in FIG. 2B, each "+" indicates a character key point, and there are 16 of them.
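As a concrete illustration, the following PyTorch sketch shows one possible shape of the character locating network just described, assuming a grey-scale input and 16 key points; the channel widths, layer counts, and pooled size are illustrative assumptions, not values fixed by the application.

import torch
import torch.nn as nn

class KeypointLocator(nn.Module):
    """Feature extraction network (conv + pooling, with at least one
    convolution before each pooling) followed by a key point regression
    network (a fully connected layer plus a regression layer)."""
    def __init__(self, num_keypoints=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.regressor = nn.Sequential(
            nn.AdaptiveAvgPool2d((4, 8)),
            nn.Flatten(),
            nn.Linear(128 * 4 * 8, 256), nn.ReLU(),  # fully connected layer
            nn.Linear(256, num_keypoints * 2),       # regression: (x, y) per point
        )

    def forward(self, x):                   # x: (B, 1, H, W) grey-scale image
        pts = self.regressor(self.features(x))
        return pts.view(x.size(0), -1, 2)   # (B, num_keypoints, 2)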
In an embodiment, the character correction network may determine the rectified image as follows: a thin plate spline (TPS) transformation matrix is determined from the correspondence between the character key points and the preset position points; a blank rectified image is created; and, for each position point in the rectified image, the TPS transformation matrix is used to determine the corresponding coordinate in the image to be recognized, a rectified pixel value is obtained by interpolating the pixel values of the pixels near that coordinate, and the rectified pixel value is filled into that position point of the rectified image.
The number of character key points equals the number of preset position points. The preset position points can be laid out according to a rule; assuming n character key points, one such rule is: the preset position points form two parallel rows a preset height h apart, each row has a preset length w, adjacent points within a row are equally spaced, and the n points are arranged accordingly. The created rectified image may then be of size w × h. The pixels near a coordinate may be the four pixels surrounding it; for example, if a position point of the rectified image maps to the coordinate (100.5, 2.6) in the image to be recognized, the nearby pixels are (100, 2), (101, 2), (100, 3), and (101, 3).
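The following NumPy sketch generates preset position points according to the rule above; splitting the n points evenly between the two rows is an assumption consistent with FIG. 2C.

import numpy as np

def preset_points(n, w, h):
    """Two parallel rows of evenly spaced position points: the top row along
    y = 0 and the bottom row along y = h - 1, each spanning the preset
    length w. Assumes n is even, so each row holds n / 2 points."""
    xs = np.linspace(0.0, w - 1.0, n // 2)
    top = np.stack([xs, np.zeros(n // 2)], axis=1)
    bottom = np.stack([xs, np.full(n // 2, h - 1.0)], axis=1)
    return np.concatenate([top, bottom])   # (n, 2) array of (x, y) points

# e.g. 16 preset points for a 100 x 32 rectified image
pts = preset_points(16, w=100, h=32)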
It should be noted that, since the transformation matrix is obtained from the correspondence between the character key points and the preset position points, the coordinates in the image to be recognized that correspond to the position points of the blank rectified image all fall within the character region; the image formed by interpolating the pixels near each such coordinate is therefore the rectified image corresponding to the character region.
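A minimal NumPy sketch of this rectification step follows. It assumes the standard thin plate spline formulation with radial basis U(r) = r^2 log r^2, fits the mapping from the preset position points back to the character key points (backward warping), and fills every position point of the blank rectified image by bilinear interpolation of the four surrounding source pixels. It can be combined with the preset_points helper sketched above.

import numpy as np

def tps_kernel(d2):
    # U(r) = r^2 log(r^2), with U(0) = 0 taken by continuity
    return np.where(d2 == 0.0, 0.0, d2 * np.log(np.maximum(d2, 1e-12)))

def fit_tps(src, dst):
    """Solve for TPS parameters mapping src control points onto dst points."""
    n = len(src)
    K = tps_kernel(np.sum((src[:, None] - src[None, :]) ** 2, axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = dst
    return np.linalg.solve(L, Y)        # (n + 3, 2): n weights + affine part

def apply_tps(params, ctrl, pts):
    d2 = np.sum((pts[:, None] - ctrl[None, :]) ** 2, axis=-1)
    affine = np.hstack([np.ones((len(pts), 1)), pts])
    return tps_kernel(d2) @ params[:-3] + affine @ params[-3:]

def rectify(image, keypoints, preset, w, h):
    """Backward-warp: map each position point of the blank w x h rectified
    image into the image to be recognised, then bilinearly interpolate the
    four surrounding pixels to obtain the rectified pixel value."""
    params = fit_tps(preset, keypoints)   # rectified coords -> source coords
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    src = apply_tps(params, preset, grid)
    x0 = np.clip(np.floor(src[:, 0]).astype(int), 0, image.shape[1] - 2)
    y0 = np.clip(np.floor(src[:, 1]).astype(int), 0, image.shape[0] - 2)
    fx, fy = src[:, 0] - x0, src[:, 1] - y0
    out = (image[y0, x0] * (1 - fx) * (1 - fy) + image[y0, x0 + 1] * fx * (1 - fy)
           + image[y0 + 1, x0] * (1 - fx) * fy + image[y0 + 1, x0 + 1] * fx * fy)
    return out.reshape(h, w)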
In an exemplary scenario, character key points are located in the image to be recognized through the character locating network, giving the key points shown in FIG. 2B, denoted P = {P1, P2, ..., P16}. The preset position points, shown in FIG. 2C, are denoted P' = {P'1, P'2, ..., P'16}. A TPS transformation matrix is determined from the correspondence between P and P', a blank w × h rectified image is created, the coordinate in the image to be recognized corresponding to each position point of the rectified image is determined through the TPS transformation matrix, a rectified pixel value is obtained by interpolating the pixel values near that coordinate, and the rectified pixel value is filled into the corresponding position point of the rectified image, yielding the rectified image shown in FIG. 2D.
In an embodiment, the character recognition network may recognize the characters in the rectified image as follows: features of the rectified image are extracted by a convolutional neural network in the character recognition network and output to a recurrent neural network in the character recognition network; the recurrent neural network weights and encodes the features and outputs them to a decoding network in the character recognition network; the decoding network decodes the weighted, encoded features into at least one feature sequence and outputs it to a classification layer in the character recognition network; and the classification layer classifies each feature sequence to obtain the character corresponding to it.
Here, the convolutional neural network may be based on a ResNet (deep residual network) structure, and the decoding network may be based on an attention model structure.
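To make the data flow concrete, here is a PyTorch sketch of such a recognition network. The plain convolutional trunk merely stands in for the ResNet-style CNN, the decoder performs one additive-attention step per output character, and all layer sizes, the fixed decoding length, and the class inventory are illustrative assumptions.

import torch
import torch.nn as nn

class Recognizer(nn.Module):
    """CNN feature extractor -> recurrent encoder (weighted encoding) ->
    attention-based decoder (one feature vector per step) -> classification
    layer mapping each feature vector to a character class."""
    def __init__(self, num_classes, max_len=32, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                    # stand-in for a ResNet trunk
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.AdaptiveAvgPool2d((1, None)),         # collapse height into a sequence
        )
        self.encoder = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        self.attn_enc = nn.Linear(2 * hidden, hidden)
        self.attn_dec = nn.Linear(hidden, hidden)
        self.attn_v = nn.Linear(hidden, 1)
        self.decoder = nn.LSTMCell(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)
        self.max_len, self.hidden = max_len, hidden

    def forward(self, x):                            # x: (B, 1, H, W) rectified image
        feats = self.cnn(x).squeeze(2).permute(0, 2, 1)  # (B, T, 128) feature columns
        enc, _ = self.encoder(feats)                     # (B, T, 2 * hidden)
        h = x.new_zeros(x.size(0), self.hidden)
        c = x.new_zeros(x.size(0), self.hidden)
        logits = []
        for _ in range(self.max_len):
            score = self.attn_v(torch.tanh(self.attn_enc(enc)
                                           + self.attn_dec(h).unsqueeze(1)))
            ctx = (torch.softmax(score, dim=1) * enc).sum(dim=1)  # weighted feature
            h, c = self.decoder(ctx, (h, c))
            logits.append(self.classifier(h))        # classify this step's feature
        return torch.stack(logits, dim=1)            # (B, max_len, num_classes)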
Step 202: and acquiring a character recognition result output by the character recognition model.
Based on the above scenario, after the rectified image of FIG. 2D is input into the character recognition network, the character recognition result "GIORDANO" is obtained.
In the embodiment of the application, the image to be recognized can be input into the trained character recognition model; the model locates character key points in the image through the character locating network and outputs them to the character correction network in the model; the character correction network determines, using the correspondence between the character key points and preset position points, a rectified image corresponding to the character region in the image to be recognized and outputs it to the character recognition network in the model to recognize the characters in it; and the character recognition result output by the model is thereby obtained.
As described above, the whole recognition process takes place inside a single character recognition model, so there is no data interaction between multiple models and an external platform; this improves recognition speed and reduces maintenance effort. The character correction network in the model can rectify images affected by inclination, rotation, deformation and similar problems, so the recognition result of the model is stable and accurate. After an image is input, the model directly outputs a character recognition result, achieving truly end-to-end character recognition. Furthermore, the model only needs to locate character key points through the character locating network and pass them through the correction and recognition networks to obtain a result; it neither detects precise character boxes in the image nor performs segmentation, which further improves recognition accuracy.
FIG. 3 is a flowchart of another embodiment of a character recognition method according to an exemplary embodiment of the present application. Building on the embodiment shown in FIG. 2A, this embodiment illustrates how the character recognition model is trained. As shown in FIG. 3, the flow of training the character recognition model may include:
step 301: a training sample containing characters is obtained.
In an embodiment, images of various natural scenes or specific scenes can be obtained, and characters contained in the images are labeled, so that training samples are obtained.
Wherein the number of training samples may be set according to practical experience.
Step 302: and training the character recognition model end to end by using training samples until the training times reach the preset times, and stopping training.
In an embodiment, in the training process, parameters in the character recognition model can be adjusted by calculating the loss value of the character recognition result output by the character recognition model each time relative to the labeled character until the training times reach the preset times, and the training is stopped. The training times can be set according to practical experience.
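A minimal sketch of such an end-to-end training loop follows, assuming model is the whole character recognition model, loader yields image batches with padded character-index labels, and cross-entropy against the labelled characters serves as the loss; the optimiser choice, learning rate, and padding convention are assumptions.

import torch
import torch.nn as nn

def train_end_to_end(model, loader, preset_iters, pad_idx=0, lr=1e-4):
    """Adjust all parameters of the single model from the final recognition
    loss, so gradients flow through the recognition, correction and locating
    networks jointly; stop once the preset iteration count is reached."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)  # skip padded positions
    done = 0
    while done < preset_iters:
        for images, labels in loader:       # labels: (B, max_len) char indices
            logits = model(images)          # (B, max_len, num_classes)
            loss = criterion(logits.flatten(0, 1), labels.flatten())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            done += 1
            if done >= preset_iters:
                break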
This completes the flow of FIG. 3. Through it, the character recognition model is trained as a single model: the neural networks inside it need not be trained independently and separately, which avoids the error propagation that separate training would cause.
FIG. 4 is a hardware configuration diagram of an electronic device according to an exemplary embodiment of the present application. The electronic device includes a communication interface 401, a processor 402, a machine-readable storage medium 403, and a bus 404; the communication interface 401, the processor 402, and the machine-readable storage medium 403 communicate with each other via the bus 404. The processor 402 can perform the character recognition method described above by reading and executing, from the machine-readable storage medium 403, the machine-executable instructions corresponding to the control logic of the method; for details, refer to the above embodiments, which are not elaborated here.
The machine-readable storage medium 403 referred to in this disclosure may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions or data. For example, a machine-readable storage medium may be volatile memory, non-volatile memory, or a similar storage medium. In particular, the machine-readable storage medium 403 may be RAM (Random Access Memory), flash memory, a storage drive (e.g., a hard drive), any type of storage disk (e.g., an optical disk or DVD), a similar storage medium, or a combination thereof.
Fig. 5 is a block diagram showing an embodiment of a character recognition apparatus according to an exemplary embodiment of the present application, and as shown in fig. 5, the character recognition apparatus includes:
a character recognition module 510, configured to input an image to be recognized into a trained character recognition model, so that the model locates character key points in the image through a character locating network and outputs them to a character correction network in the model; the character correction network determines, using the correspondence between the character key points and preset position points, a rectified image corresponding to the character region in the image to be recognized, and outputs the rectified image to a character recognition network in the model to recognize the characters in it;
and the obtaining module 520 is configured to obtain a character recognition result output by the character recognition model.
In an optional implementation, the character recognition module 510 is specifically configured so that, when the character locating network locates the character key points in the image to be recognized, a feature extraction network in the character locating network extracts the features of the image to be recognized and outputs them to a key point regression network in the character locating network, and the key point regression network extracts the character key points from the extracted features.
In an optional implementation, the character recognition module 510 is specifically configured so that, when the character correction network determines the rectified image corresponding to the character region in the image to be recognized, a corresponding thin plate spline (TPS) transformation matrix is determined from the correspondence between the character key points and the preset position points, the number of character key points being equal to the number of preset position points; a blank rectified image is created; and, for each position point in the rectified image, the TPS transformation matrix is used to determine the corresponding coordinate in the image to be recognized, a rectified pixel value is obtained by interpolating the pixel values of the pixels near that coordinate, and the rectified pixel value is filled into that position point of the rectified image.
In an optional implementation, the character recognition module 510 is specifically configured so that, when the character recognition network recognizes the characters in the rectified image, a convolutional neural network in the character recognition network extracts the features of the rectified image and outputs them to a recurrent neural network in the character recognition network; the recurrent neural network weights and encodes the features and outputs them to a decoding network in the character recognition network; the decoding network decodes the weighted, encoded features into at least one feature sequence and outputs it to a classification layer in the character recognition network; and the classification layer classifies each feature sequence to obtain the character content corresponding to it.
In an alternative implementation, the apparatus further comprises (not shown in fig. 5):
a training module, configured to acquire training samples containing characters, and to train the character recognition model end to end with the training samples, stopping training when the number of training iterations reaches a preset count.
For details of how the functions and roles of each unit in the above apparatus are implemented, refer to the implementation of the corresponding steps in the above method; they are not described here again.
Since the apparatus embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over several network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Persons of ordinary skill in the art can understand and implement the application without inventive effort.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely a description of preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall fall within its scope of protection.

Claims (9)

1. A method of character recognition, the method comprising:
inputting an image to be recognized into a trained character recognition model, so that the character recognition model locates character key points in the image to be recognized through a character locating network and outputs them to a character correction network in the model; determining, by the character correction network and using the correspondence between the character key points and preset position points, a rectified image corresponding to the character region in the image to be recognized; and outputting the rectified image to a character recognition network in the model to recognize the characters in the rectified image;
the character recognition network recognizes characters in the rectified image, comprising: extracting the characteristics of the corrected image through a convolutional neural network in the character recognition network, and outputting the characteristics to a cyclic neural network in the character recognition network; the cyclic neural network performs weighted coding on the characteristics and outputs the weighted coded characteristics to a decoding network in the character recognition network; the decoding network decodes the weighted and coded features to obtain at least one feature sequence, and outputs the at least one feature sequence to a classification layer in the character recognition network; the classifying layer classifies each characteristic sequence to obtain character content corresponding to each characteristic sequence;
and acquiring a character recognition result output by the character recognition model.
2. The method of claim 1, wherein the character locating network locating character key points in the image to be recognized comprises:
extracting the features of the image to be recognized through a feature extraction network in the character locating network and outputting them to a key point regression network in the character locating network;
and the key point regression network extracting the character key points from the extracted features.
3. The method of claim 1, wherein the character correction network determining, in the image to be recognized, the rectified image corresponding to the character region by using the correspondence between the character key points and preset position points comprises:
determining a corresponding thin plate spline (TPS) transformation matrix according to the correspondence between the character key points and the preset position points, wherein the number of character key points is equal to the number of preset position points;
creating a blank rectified image;
and determining, by using the TPS transformation matrix, the coordinate in the image to be recognized corresponding to each position point in the rectified image, interpolating the pixel values of the pixels near that coordinate to obtain a rectified pixel value, and filling the rectified pixel value into that position point of the rectified image.
4. The method of claim 1, wherein the character recognition model is trained by:
acquiring a training sample containing characters;
and performing end-to-end training on the character recognition model by using the training samples, stopping training when the number of training iterations reaches a preset count.
5. A character recognition apparatus, the apparatus comprising:
a character recognition module, configured to input an image to be recognized into a trained character recognition model, so that the model locates character key points in the image through a character locating network and outputs them to a character correction network in the model; the character correction network determines, using the correspondence between the character key points and preset position points, a rectified image corresponding to the character region in the image to be recognized, and outputs the rectified image to a character recognition network in the model to recognize the characters in the rectified image;
wherein the character recognition module is specifically configured so that, when the character recognition network recognizes the characters in the rectified image, a convolutional neural network in the character recognition network extracts the features of the rectified image and outputs them to a recurrent neural network in the character recognition network; the recurrent neural network weights and encodes the features and outputs them to a decoding network in the character recognition network; the decoding network decodes the weighted, encoded features into at least one feature sequence and outputs it to a classification layer in the character recognition network; and the classification layer classifies each feature sequence to obtain the character content corresponding to each feature sequence;
and the acquisition module is used for acquiring the character recognition result output by the character recognition model.
6. The apparatus of claim 5, wherein the character recognition module is specifically configured so that, when the character locating network locates the character key points in the image to be recognized, a feature extraction network in the character locating network extracts the features of the image to be recognized and outputs them to a key point regression network in the character locating network, and the key point regression network extracts the character key points from the extracted features.
7. The apparatus of claim 5, wherein the character recognition module is specifically configured so that, when the character correction network determines the rectified image corresponding to the character region in the image to be recognized by using the correspondence between the character key points and the preset position points, a corresponding thin plate spline (TPS) transformation matrix is determined according to that correspondence, the number of character key points being equal to the number of preset position points; a blank rectified image is created; and, for each position point in the rectified image, the TPS transformation matrix is used to determine the corresponding coordinate in the image to be recognized, a rectified pixel value is obtained by interpolating the pixel values of the pixels near that coordinate, and the rectified pixel value is filled into that position point of the rectified image.
8. The apparatus of claim 5, wherein the apparatus further comprises:
a training module, configured to acquire training samples containing characters, and to perform end-to-end training on the character recognition model by using the training samples, stopping training when the number of training iterations reaches a preset count.
9. An electronic device comprising a readable storage medium and a processor;
wherein the readable storage medium is for storing machine executable instructions;
the processor is configured to read the machine-executable instructions on the readable storage medium and execute the instructions to implement the steps of the method of any of claims 1-4.
CN201811184618.2A 2018-10-11 2018-10-11 Character recognition method and device Active CN111046859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811184618.2A CN111046859B (en) 2018-10-11 2018-10-11 Character recognition method and device

Publications (2)

Publication Number Publication Date
CN111046859A CN111046859A (en) 2020-04-21
CN111046859B (en) 2023-09-29

Family

ID=70229220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811184618.2A Active CN111046859B (en) 2018-10-11 2018-10-11 Character recognition method and device

Country Status (1)

Country Link
CN (1) CN111046859B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767754B (en) * 2020-06-30 2024-05-07 创新奇智(北京)科技有限公司 Identification code identification method and device, electronic equipment and storage medium
CN112132139A (en) * 2020-09-22 2020-12-25 深兰科技(上海)有限公司 Character recognition method and device
CN112464798A (en) * 2020-11-24 2021-03-09 创新奇智(合肥)科技有限公司 Text recognition method and device, electronic equipment and storage medium
CN112508003B (en) * 2020-12-18 2023-10-13 北京百度网讯科技有限公司 Character recognition processing method and device
CN112597940B (en) * 2020-12-29 2022-08-23 苏州科达科技股份有限公司 Certificate image recognition method and device and storage medium
CN115690803A (en) * 2022-10-31 2023-02-03 中电金信软件(上海)有限公司 Digital image recognition method and device, electronic equipment and readable storage medium
CN116434234B (en) * 2023-05-25 2023-10-17 珠海亿智电子科技有限公司 Method, device, equipment and storage medium for detecting and identifying casting blank characters

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599900B (en) * 2015-10-20 2020-04-21 华中科技大学 Method and device for recognizing character strings in image
US9990564B2 (en) * 2016-03-29 2018-06-05 Wipro Limited System and method for optical character recognition
CN106407976B (en) * 2016-08-30 2019-11-05 百度在线网络技术(北京)有限公司 The generation of image character identification model and perpendicular column character picture recognition methods and device
TWI607387B (en) * 2016-11-25 2017-12-01 財團法人工業技術研究院 Character recognition systems and character recognition methods thereof

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678293A (en) * 2015-12-30 2016-06-15 成都数联铭品科技有限公司 Complex image and text sequence identification method based on CNN-RNN
CN106407971A (en) * 2016-09-14 2017-02-15 北京小米移动软件有限公司 Text recognition method and device
CN108121984A (en) * 2016-11-30 2018-06-05 杭州海康威视数字技术股份有限公司 A kind of character identifying method and device
WO2018099194A1 (en) * 2016-11-30 2018-06-07 杭州海康威视数字技术股份有限公司 Character identification method and device
WO2018166114A1 (en) * 2017-03-13 2018-09-20 平安科技(深圳)有限公司 Picture identification method and system, electronic device, and medium
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device
CN107977665A (en) * 2017-12-15 2018-05-01 北京科摩仕捷科技有限公司 The recognition methods of key message and computing device in a kind of invoice
CN108334499A (en) * 2018-02-08 2018-07-27 海南云江科技有限公司 A kind of text label tagging equipment, method and computing device
CN108399408A (en) * 2018-03-06 2018-08-14 李子衿 A kind of deformed characters antidote based on deep space converting network

Also Published As

Publication number Publication date
CN111046859A (en) 2020-04-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant