CN111275051A - Character recognition method, character recognition device, computer equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN111275051A
Authority
CN
China
Prior art keywords
image
interference
value
text image
character recognition
Prior art date
Legal status
Pending
Application number
CN202010128301.8A
Other languages
Chinese (zh)
Inventor
周康明
于洋
Current Assignee
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202010128301.8A priority Critical patent/CN111275051A/en
Publication of CN111275051A publication Critical patent/CN111275051A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition


Abstract

The present application relates to a character recognition method, a character recognition apparatus, a computer device, and a computer-readable storage medium. The character recognition method includes: acquiring position information of interference information in a text image; extracting an interference-region image from the text image according to that position information; obtaining a character image with the interference information removed from the interference-region image and an interference removal model; and obtaining a character recognition result of the text image from the character image and the text image. With this method, the character recognition accuracy of the text image can be improved.

Description

Character recognition method, character recognition device, computer equipment and computer-readable storage medium
Technical Field
The present invention relates to the field of character recognition technologies, and in particular, to a character recognition method, a character recognition apparatus, a computer device, and a computer-readable storage medium.
Background
Image recognition, a branch of the computer vision research field, has been widely applied across many industries, and character recognition belongs to this field. OCR (Optical Character Recognition) technology can recognize character information from an input text image. In practical application scenarios, interference information such as a stamp or handwritten characters may exist in input images such as contract text images and form text images.
At present, for text images containing interference information such as stamps or handwritten characters, character recognition technology cannot accurately recognize the character information covered by the interference information, so the accuracy of character recognition is low.
Disclosure of Invention
In view of the above, it is necessary to provide a character recognition method, a character recognition apparatus, a computer device, and a computer-readable storage medium capable of improving the character recognition accuracy of a text image.
In a first aspect, an embodiment of the present application provides a character recognition method, where the character recognition method includes: acquiring position information of interference information in a text image; extracting an interference area image from the text image according to the position information of the interference information; obtaining a character image with interference information removed according to the interference area image and the interference removing model; and acquiring a character recognition result of the text image according to the character image and the text image.
In one embodiment, obtaining a character recognition result of a text image according to a character image and the text image includes: replacing the interference area image in the text image with a character image to obtain a replaced text image; and inputting the replaced text image into the character recognition model to obtain a character recognition result of the text image.
In one embodiment, obtaining the character image without the interference information according to the interference region image and the interference removing model includes: inputting the interference area image into an interference removing model to obtain an output image; the output pixel value of each pixel point in the output image is a normalized pixel value; acquiring a real pixel value of each pixel point according to the output pixel value of each pixel point in the output image; and generating a character image according to the real pixel value of each pixel point.
In one embodiment, the output pixel values include a normalized R value, a normalized G value, and a normalized B value; obtaining the real pixel value of each pixel point according to the output pixel value of each pixel point in the output image, comprising: respectively multiplying the normalized R value, the normalized G value and the normalized B value of each pixel point by preset times to obtain a real R value, a real G value and a real B value of each pixel point; and determining the real R value, the real G value and the real B value of each pixel point as the real pixel values of the corresponding pixel points.
In one embodiment, the training process of the interference elimination model includes: acquiring a plurality of groups of sample images; each group of sample images comprises a sample interference area image and a sample character image corresponding to the sample interference area image; and replacing the Softmax layer of the initial full convolution network with a normalization layer, and training the initial full convolution network according to the multiple groups of sample images to obtain an interference removal model.
In one embodiment, training the initial full convolution network according to the multiple groups of sample images to obtain the interference removal model includes: pre-training the base network of the initial full convolution network on the ImageNet data set to obtain a pre-trained base network; and determining the parameters of the pre-trained base network as the initialization parameters of the base network of the initial full convolution network, and training the parameter-initialized initial full convolution network with the multiple groups of sample images to obtain the interference removal model.
In a second aspect, an embodiment of the present application provides a character recognition apparatus, including: the first acquisition module is used for acquiring the position information of the interference information in the text image; the extraction module is used for extracting an interference area image from the text image according to the position information of the interference information; the second acquisition module is used for acquiring the character image without the interference information according to the interference area image and the interference elimination model; and the recognition module is used for acquiring a character recognition result of the text image according to the character image and the text image.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method according to the first aspect as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
position information of interference information in a text image is obtained; an interference-region image is extracted from the text image according to that position information; a character image with the interference information removed is obtained from the interference-region image and the interference removal model; and a character recognition result of the text image is obtained from the character image and the text image. Because the character image is produced by removing the interference information from the interference region of the text image with the interference removal model, the influence of the interference information on character recognition is reduced. This solves the problem in the conventional technology that, when a text image contains interference information such as a stamp or handwritten characters, character recognition technology cannot accurately recognize the character information under the interference information, resulting in low recognition accuracy. The character recognition accuracy of text images containing interference information is thereby improved.
Drawings
FIG. 1 is a flow diagram illustrating a character recognition method according to one embodiment;
FIG. 2 is a flow diagram illustrating a character recognition method, according to an embodiment;
FIG. 3 is a schematic diagram illustrating how the replaced text image is obtained from the text image in the character recognition method according to an embodiment;
FIG. 4 is a flow diagram illustrating a method for character recognition, according to an embodiment;
FIG. 5 is a diagram illustrating a detailed step of step S320 in the character recognition method according to an embodiment;
FIG. 6 is a flowchart illustrating a process of interference cancellation model training according to an embodiment;
FIG. 7 is a schematic diagram illustrating a refinement of step S520 in the training process of the interference removal model according to an embodiment;
FIG. 8 is a flowchart illustrating a character recognition method according to another embodiment;
FIG. 9 is a flowchart illustrating a character recognition method according to another embodiment;
FIG. 10 is a block diagram of a character recognition apparatus according to an embodiment;
FIG. 11 is an internal block diagram of a computer device, provided in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The character recognition method, character recognition apparatus, computer device, and computer-readable storage medium provided by the embodiments of the present application aim to solve the technical problem in the conventional technology that, for a text image with interference information such as a stamp or handwritten characters, character recognition technology cannot accurately recognize the character information under the interference information, resulting in low recognition accuracy. The technical solutions of the present application, and how they solve the above technical problem, are described in detail below through embodiments and with reference to the drawings. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
It should be noted that, in the character recognition method provided in the embodiments of the present application, the execution subject may be a character recognition apparatus, which may be implemented as part or all of a computer device by software, hardware, or a combination of the two. In the following method embodiments, the execution subject is a computer device, which may be a server; it should be understood that the character recognition method provided by the following method embodiments may also be applied to a terminal, or to a system including a terminal and a server, implemented through interaction between the terminal and the server.
Please refer to fig. 1, which illustrates a flowchart of a character recognition method according to an embodiment of the present application. The embodiment relates to a specific implementation process for removing interference information in a text image and then performing character recognition. As shown in fig. 1, the character recognition method of the present embodiment may include the following steps:
and step S100, acquiring position information of interference information in the text image.
The text image may be an image obtained by photographing or scanning a bill, a contract text, a form, a certificate, and the like; the interference information may be information such as a stamp or handwritten characters that covers the characters to be recognized in the text image and interferes with their recognition.
In this embodiment, for a text image with interference information, the computer device first obtains the position information of the interference information in the text image. As one implementation, the computer device may obtain this position information by using the SSD (Single Shot MultiBox Detector) target detection algorithm.
In other embodiments, the computer device may further use a YOLO target detection algorithm to obtain location information of the interference information in the text image, and the like, which is not limited herein.
Step S200: extract an interference-region image from the text image according to the position information of the interference information.
The computer device determines the specific position of the interference information in the text image according to the position information, and extracts the interference-region image from the text image.
As an embodiment, the position information may include coordinates of a position frame, and the computer device cuts out a target area corresponding to the coordinates of the position frame in the text image as the interference area image.
In other embodiments, the computer device may further correct the position-frame coordinates by a preset coefficient to improve the accuracy of the extracted interference-region image; the preset coefficient may be set as needed in practice.
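The box correction and cropping described above can be sketched as follows. This is a minimal illustrative implementation, not the patent's actual code: the function names are our own, and the symmetric expansion by a scale factor is one plausible interpretation of "correcting the coordinates by a preset coefficient".

```python
import numpy as np

def expand_box(box, scale, img_w, img_h):
    """Expand a detector box (x1, y1, x2, y2) about its center by a
    correction factor `scale` (the preset coefficient), clamped to the
    image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(img_w, int(cx + half_w)), min(img_h, int(cy + half_h)))

def crop_interference_region(text_image, box, scale=1.1):
    """Cut the interference-region image out of a (H, W, 3) text image,
    returning the crop and the corrected box actually used."""
    h, w = text_image.shape[:2]
    x1, y1, x2, y2 = expand_box(box, scale, w, h)
    return text_image[y1:y2, x1:x2].copy(), (x1, y1, x2, y2)
```

Returning the corrected box alongside the crop makes it easy to paste the de-interfered character image back at the same coordinates later.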
Step S300: obtain the character image with the interference information removed from the interference-region image and the interference removal model.
The computer device extracts the interference-region image from the text image according to the position information of the interference information, and then obtains the character image with the interference information removed from the interference-region image and the interference removal model. In this embodiment, the interference removal model may be obtained by training based on an FCN (Fully Convolutional Network) model framework.
In the conventional technology, the FCN is used for pixel-level segmentation, that is, to classify every pixel in an image: after the network's features pass through a Softmax layer, a class probability value of each pixel over 21 classes (e.g., the PASCAL VOC categories plus background) is obtained, and the 21 class probability values of each pixel sum to 1.
In this embodiment, the FCN is used for pixel-value prediction rather than classification, so the class probability of each pixel in the interference-region image does not need to be computed. Therefore, when training the interference removal model, the computer device removes the Softmax layer from the FCN model framework and, by adjusting the model parameters, makes the interference removal model output a three-channel value for each pixel.
The computer device takes the three-channel value of each pixel point output by the interference removal model as the RGB pixel value of the corresponding pixel point, so that the character image can be obtained from the RGB pixel values of all pixel points.
In other embodiments, to help the interference removal model learn better during training, the computer device may add a normalization layer to the model, so that the model outputs the normalized three-channel value of each pixel point. The computer device then restores the normalized three-channel values to obtain the character image. For example, during normalization each three-channel value is divided by 255, mapping it into the [0,1] interval; during restoration, the computer device multiplies the three-channel value by 255, restoring it to the [0,255] interval and obtaining the RGB pixel value of each pixel point.
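The divide-by-255 normalization and its restoration form an exact round trip for 8-bit pixels, which a short sketch makes concrete (illustrative helper names, not from the patent):

```python
import numpy as np

def normalize_pixels(img):
    """Map uint8 RGB values from [0, 255] into [0, 1], as the
    normalization described above does for three-channel values."""
    return img.astype(np.float32) / 255.0

def restore_pixels(norm_img):
    """Invert the normalization: multiply by 255 and round back to
    uint8, recovering the real RGB pixel values of the character image."""
    return np.clip(np.round(norm_img * 255.0), 0, 255).astype(np.uint8)
```

Rounding and clipping matter in practice: a trained model's outputs may fall slightly outside [0, 1], and clipping keeps the restored values in the valid [0, 255] range.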
Step S400: obtain the character recognition result of the text image from the character image and the text image.
And after the computer equipment acquires the character image, acquiring a character recognition result of the text image according to the character image and the text image.
In this embodiment, as one implementation, the computer device may replace the interference-region image in the text image with the character image and then input the replaced text image into the character recognition model to obtain the character recognition result of the text image, where the character recognition model may be an OCR model.
In other embodiments, the computer device may instead replace the interference-region image in the text image with a blank image, perform character recognition on the replaced text image with the character recognition model, separately perform character recognition on the character image, and combine the two recognition results into the final character recognition result of the text image.
In this embodiment, the position information of the interference information in the text image is obtained; an interference-region image is extracted from the text image according to that position information; a character image with the interference information removed is obtained from the interference-region image and the interference removal model; and the character recognition result of the text image is obtained from the character image and the text image. Because the character image is produced by removing the interference information from the interference region of the text image with the interference removal model, the influence of the interference information on character recognition is reduced, which solves the problem in the conventional technology that character information under interference information such as a stamp or handwritten characters cannot be accurately recognized. This embodiment therefore improves the character recognition accuracy of text images containing interference information.
Fig. 2 is a schematic flowchart of a character recognition method according to another embodiment. On the basis of the embodiment shown in fig. 1, as shown in fig. 2, in the present embodiment, the step S400 includes a step S410 and a step S420, specifically:
and step S410, replacing the interference area image in the text image with a character image to obtain a replaced text image.
In this embodiment, the computer device obtains position information of interference information in the text image, extracts an interference area image from the text image according to the position information of the interference information, obtains a character image from which the interference information is removed according to the interference area image and the interference removal model, and then replaces the interference area image in the text image with the character image to obtain a replaced text image.
Step S420: input the replaced text image into the character recognition model to obtain the character recognition result of the text image.
The computer device inputs the replaced text image into the character recognition model to obtain the character recognition result of the text image. It can be understood that, because the replaced text image is obtained by removing the interference information from the text image, the interference with character recognition is avoided and the character recognition accuracy of the text image is improved.
In this embodiment, specifically, the example that the interference information is a stamp is taken as an example, and the implementation process of the character recognition method in this embodiment is further described.
Fig. 3 is a schematic diagram of obtaining the replaced text image from the text image in this embodiment. As shown in fig. 3, the four images from left to right are the text image, the interference-region image, the character image, and the replaced text image. The text image contains a stamp as interference information. The computer device obtains the position information of the interference information in the text image with the SSD target detection algorithm; the position information may include the coordinates of a position frame, and the computer device cuts out the target area corresponding to those coordinates as the interference-region image. The computer device then obtains the character image with the interference information removed from the interference-region image and the interference removal model, and replaces the interference-region image in the text image with the character image to obtain the replaced text image. Finally, the computer device inputs the replaced text image into the character recognition model to obtain the character recognition result of the text image. In this way, the influence of the interference information on character recognition is reduced, and the accuracy of character recognition of the text image is improved.
In other embodiments, the interference information in the text image may also be multiple stamps. The computer device obtains the coordinates of the position frame of each stamp in the text image with the SSD target detection algorithm, cuts out the multiple target areas corresponding to those coordinates as multiple interference-region images, and inputs them into the interference removal model to obtain multiple character images with the interference information removed. The computer device then replaces each interference-region image in the text image with its corresponding character image to obtain the replaced text image, thereby completing the removal of the interference information, reducing its influence on character recognition, and improving the character recognition accuracy of the text image.
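The replacement step for one or several stamps can be sketched as below. This is an illustrative sketch under the assumption that each box is given as pixel coordinates (x1, y1, x2, y2) and each character image has exactly the box's size; the function name is ours, not the patent's.

```python
import numpy as np

def replace_regions(text_image, boxes, character_images):
    """Paste each de-interfered character image back over its
    interference region, yielding the replaced text image that is
    then fed to the OCR model. The input image is left untouched."""
    out = text_image.copy()
    for (x1, y1, x2, y2), char_img in zip(boxes, character_images):
        # Each character image must match its region's size exactly.
        assert char_img.shape[:2] == (y2 - y1, x2 - x1)
        out[y1:y2, x1:x2] = char_img
    return out
```

Working on a copy means the original text image remains available if the fallback strategy (blank region plus separate recognition of the character image) is used instead.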
Fig. 4 is a flowchart illustrating a character recognition method according to another embodiment. On the basis of the embodiment shown in fig. 1, as shown in fig. 4, in the present embodiment, the step S300 includes a step S310, a step S320, and a step S330, specifically:
and step S310, inputting the interference area image into the interference elimination model to obtain an output image.
And the output pixel value of each pixel point in the output image is the normalized pixel value.
In this embodiment, specifically, the interference removal model is obtained by training based on the FCN model framework. When training the interference removal model, the computer device removes the Softmax layer from the FCN model framework and adds a normalization layer, that is, a normalization operation replaces the Softmax operation; the model parameters are adjusted through iterative training so that the output of the interference removal model is the normalized pixel value of each pixel point.
Step S320: obtain the real pixel value of each pixel point from its output pixel value in the output image.
The computer device obtains the real pixel value of each pixel point from its output pixel value in the output image. Specifically, the output pixel values were normalized into the [0,1] interval by dividing by 255, while real pixel values range over [0,255]; the computer device therefore multiplies the output pixel value of each pixel point by 255 to obtain its real pixel value.
Step S330: generate the character image from the real pixel value of each pixel point.
The real pixel value obtained for each pixel point is the real pixel value of the corresponding pixel point in the character image. In this embodiment, the interference-region image is input into the interference removal model to obtain an output image; the real pixel value of each pixel point is obtained from its output pixel value; and the character image is generated from these real pixel values. In this way, the normalized pixel values produced by the FCN-based interference removal model are restored to obtain the character image, removing the interference information in the interference-region image and avoiding its influence on the character recognition of the text image.
On the basis of the embodiment shown in fig. 4, fig. 5 is a schematic diagram illustrating a step S320 in a character recognition method according to another embodiment. As shown in fig. 5, step S320 of the present embodiment includes step S321 and step S322, specifically:
step S321, multiplying the normalized R value, the normalized G value, and the normalized B value of each pixel point by preset multiples, respectively, to obtain a true R value, a true G value, and a true B value of each pixel point.
In this embodiment, the output pixel values include a normalized R value, a normalized G value, and a normalized B value. The computer device inputs the interference-region image into the interference removal model to obtain the output image, in which the pixel value of each pixel point is a normalized three-channel value, i.e., the normalized R, G, and B values. The computer device multiplies these by a preset multiple, which is 255 in this embodiment, to obtain the true R value, true G value, and true B value of each pixel point.
Step S322: determine the true R value, true G value, and true B value of each pixel point as the real pixel value of the corresponding pixel point.
The computer device determines the true R value, true G value, and true B value of each pixel point as the real pixel value of the corresponding pixel point, and generates a color character image from these real pixel values.
In this embodiment, the normalized R value, the normalized G value, and the normalized B value of each pixel are multiplied by preset multiples, respectively, to obtain a true R value, a true G value, and a true B value of each pixel; determining the real R value, the real G value and the real B value of each pixel point as the real pixel values of the corresponding pixel points; therefore, the character image without the interference information is generated, and the accuracy of character recognition can be improved based on the character recognition of the character image.
Based on the embodiment shown in fig. 1, referring to fig. 6, fig. 6 is a flowchart illustrating a training process of the interference cancellation model according to an embodiment. As shown in fig. 6, the training process of the interference elimination model of the present embodiment includes step S510 and step S520, specifically:
in step S510, a plurality of sets of sample images are acquired.
Each group of sample images comprises a sample interference area image and a corresponding sample character image. In the data preparation stage, taking a stamp as the interference information as an example: a text without a stamp is photographed to obtain image A; a stamp is added to the same text, which is photographed again to obtain image B; the computer device then obtains the coordinates of the stamp's position frame in image B using an interference detection model trained on the Single Shot MultiBox Detector (SSD) object detection framework. The computer device crops image C from image A according to the position frame coordinates, image C being the sample character image; and crops image D from image B according to the same coordinates, image D being the sample interference area image, thereby obtaining one group of sample images. In the data preparation stage, multiple groups of sample images are obtained in this way, using texts of different formats to ensure the richness of the samples.
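The sample-pair construction above can be sketched as follows. The `(x, y, w, h)` layout of the position frame coordinates and the nested-list image representation are assumptions for illustration; the patent only states that images C and D are cut out at the same coordinates.

```python
def crop(image, box):
    """Cut a rectangular region out of an image.

    image: list of rows, each row a list of pixel values.
    box:   (x, y, w, h) position frame coordinates (assumed layout).
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def make_sample_pair(clean_image, stamped_image, box):
    """One training pair: (sample interference area image, sample character image)."""
    sample_interference = crop(stamped_image, box)  # image D, model input
    sample_character = crop(clean_image, box)       # image C, model target
    return sample_interference, sample_character
```

Because both crops use the same box, the input and target are pixel-aligned, which is what lets a fully convolutional model learn a per-pixel mapping from stamped to clean content.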
Step S520: replace the Softmax layer of the initial full convolution network with a normalization layer, and train the initial full convolution network on the multiple groups of sample images to obtain the interference removal model.
The interference removal model is implemented based on the FCN (fully convolutional network) model framework.
In this embodiment, the computer device replaces the Softmax layer of the initial full convolution network with the normalization layer, takes the sample interference area images in the multiple groups of sample images as the model input, and, after dividing the pixel value of each pixel point in each sample character image by 255 for normalization, takes the resulting RGB three-channel images as the model output; the initial full convolution network is then trained to obtain the interference removal model.
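The target-side preparation (dividing each pixel value by 255) might look like this minimal sketch; the tuple-per-pixel RGB layout is an assumption.

```python
def normalize_target(image, scale=255.0):
    """Scale an RGB sample character image into [0, 1] so the training
    target matches the range of the model's normalization output layer
    (division by 255, as described above)."""
    return [
        [tuple(channel / scale for channel in pixel) for pixel in row]
        for row in image
    ]
```

Keeping targets in [0, 1] matches whatever bounded activation the normalization layer produces, so the regression loss stays well-scaled across channels.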
As an implementation manner, referring to fig. 7, fig. 7 is a schematic diagram of a step of refining step S520 in another embodiment. As shown in fig. 7, step S520 includes step S521 and step S522, specifically:
Step S521: replace the Softmax layer of the initial full convolution network with a normalization layer, and pre-train the basic network of the initial full convolution network using the ImageNet data set to obtain the pre-trained basic network.
In this embodiment, the initial full convolution network uses the convolutional neural network AlexNet as its backbone, i.e. the basic network. After replacing the Softmax layer of the initial full convolution network with the normalization layer, the computer device pre-trains AlexNet on the ImageNet data set, a widely used public data set in the image processing field; pre-training AlexNet speeds up the training of the interference removal model. Pre-training AlexNet yields the pre-trained basic network.
Step S522: determine the parameters of the pre-trained basic network as the initialization parameters of the basic network of the initial full convolution network, and train the parameter-initialized initial full convolution network with the multiple groups of sample images to obtain the interference removal model.
The computer device determines the parameters of the pre-trained basic network as the initialization parameters of the basic network of the initial full convolution network; that is, the basic network's parameters are initialized to the pre-trained values. The computer device then performs fine-tuning training on the parameter-initialized full convolution network using the multiple groups of sample images, fine-tuning its parameters to obtain the interference removal model.
In this embodiment, as one implementation, based on experimental results the base learning rate is set to 0.0001, the weight decay to 0.0005, and the momentum factor of the model parameters to 0.99; the SGD optimization algorithm is used, yielding the optimal model, i.e. the final interference removal model.
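The stated hyperparameters plug into a standard SGD-with-momentum update. The exact update rule below (weight decay folded into the gradient, classic momentum formulation) is an assumption, since the patent names only the hyperparameter values.

```python
def sgd_step(params, grads, velocity, lr=0.0001, momentum=0.99, weight_decay=0.0005):
    """One SGD-with-momentum update over flat parameter lists.

    Weight decay is applied as L2 regularization added to the gradient,
    then the velocity accumulates the momentum term; the patent does not
    specify the rule, so this classic form is an assumption.
    """
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p          # L2 weight decay
        v = momentum * v - lr * g         # momentum-smoothed step
        new_params.append(p + v)
        new_velocity.append(v)
    return new_params, new_velocity
```

A momentum factor as high as 0.99 heavily smooths the update direction, which pairs naturally with the small base learning rate of 0.0001 used here for fine-tuning.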
In the character recognition process, the computer device obtains the position information of the interference information in the text image, extracts the interference area image from the text image according to that position information, obtains the character image with the interference information removed using the interference area image and the interference removal model trained as in the above embodiments, and obtains the character recognition result of the text image according to the character image and the text image, thereby improving the accuracy of character recognition in the text image.
Fig. 8 is a flowchart illustrating a character recognition method according to another embodiment. On the basis of the embodiment shown in fig. 1, as shown in fig. 8, in the present embodiment, the step S100 includes a step S110, specifically:
step S110, inputting the text image into the interference detection model to obtain the position information of the interference information in the text image.
The position information comprises the coordinates of a position frame, and the interference detection model is trained on the SSD object detection model framework. In this embodiment, the interference detection model is implemented based on the SSD model framework, which uses VGG16 as the backbone network, with the fully connected (FC) layers of VGG16 replaced by convolutional layers so that features of different scales can be extracted. The framework outputs, from the six convolutional layers Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, and Conv11_2, the position frame coordinates of targets detected at different scales together with confidence scores for the different categories; post-processing then yields the final targets, i.e. the position information of the interference information and the category confidence scores. In post-processing, among multiple overlapping position frames, the one with the highest confidence score is kept as the final position frame.
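The overlap-resolution step described above is standard non-maximum suppression (NMS). A minimal sketch follows, with the `(x1, y1, x2, y2)` box layout and the 0.5 IoU threshold as assumptions (the patent does not give a threshold):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Among overlapping position frames, keep only the one with the
    highest confidence score (the post-processing described above).
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Here the second box overlaps the first heavily and has a lower score, so it is suppressed, while the disjoint third box survives.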
In this embodiment, the training process of the interference detection model is described by taking the example that the interference information is a stamp. In the data preparation stage, a plurality of sample text images containing the stamps are obtained, a rectangular frame is used for marking stamp areas in the sample text images, and corresponding category labels are recorded, wherein the category labels comprise four types: round seal, square seal, rectangular seal and background.
In the training process, VGG16 is pre-trained on the ImageNet training set to obtain a basic model; pre-training VGG16 speeds up the training of the interference detection model. The SSD model framework containing this basic model is then fine-tuned on the multiple labeled sample text images until the final interference detection model is obtained.
In this embodiment, as one implementation, based on experimental results the base learning rate is set to 0.001, the weight decay to 0.0005, and the momentum factor of the model parameters to 0.99; the SGD optimization algorithm is used, yielding the optimal model, i.e. the final interference detection model.
The computer device inputs the text image into the interference detection model to obtain the position information of the interference information in the text image, specifically the coordinates of a position frame. If the text image contains multiple pieces of interference information, e.g. multiple stamps, the position information comprises a rectangular coordinate array containing multiple groups of coordinates, each group corresponding to the position of one stamp.
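Handling such a rectangular coordinate array, with one group of coordinates per stamp, might look like the following sketch (the `(x, y, w, h)` group layout is an assumption):

```python
def extract_interference_regions(text_image, position_frames):
    """Crop one interference area image per detected position frame.

    text_image:      list of rows, each row a list of pixel values.
    position_frames: the "rectangular coordinate array" -- a list of
                     (x, y, w, h) groups, e.g. one per stamp (layout assumed).
    """
    regions = []
    for x, y, w, h in position_frames:
        regions.append([row[x:x + w] for row in text_image[y:y + h]])
    return regions
```

Each cropped region is then fed independently through the interference removal model, so a text image with several stamps is cleaned stamp by stamp.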
Further, please continue to refer to fig. 8, step S200 of the present embodiment includes step S210, specifically:
Step S210: according to the coordinates of the position frame, crop the target area corresponding to those coordinates from the text image as the interference area image.
The computer device crops, according to the coordinates of the position frame, the target area corresponding to those coordinates from the text image as the interference area image; obtains the character image with the interference information removed according to the interference area image and the trained interference removal model; and obtains the character recognition result of the text image according to the character image and the text image, thereby improving the accuracy of the character recognition result for the text image.
Referring to fig. 9, fig. 9 is a schematic flowchart of a character recognition method according to another embodiment. On the basis of the embodiments shown in fig. 1 to 8, as shown in fig. 9, in the present embodiment, the character recognition method includes:
step S110, inputting the text image into the interference detection model to obtain the position information of the interference information in the text image.
The position information comprises the coordinates of a position frame, and the interference detection model is trained on the SSD object detection model framework. In this embodiment, the interference detection model is implemented based on the SSD model framework. The computer device inputs the text image into the interference detection model to obtain the position information of the interference information in the text image, specifically the coordinates of a position frame; if the text image contains multiple pieces of interference information, such as multiple stamps, the position information comprises a rectangular coordinate array with multiple groups of coordinates, each group corresponding to the position of one stamp.
Step S210: according to the coordinates of the position frame, crop the target area corresponding to those coordinates from the text image as the interference area image.
The computer device crops, according to the coordinates of the position frame, the target area corresponding to those coordinates from the text image as the interference area image.
Step S310: input the interference area image into the interference removal model to obtain an output image.
The output pixel value of each pixel point in the output image is the normalized pixel value.
In this embodiment, specifically, the interference removal model is trained on the FCN model framework. During training, the computer device removes the Softmax layer of the FCN model framework and adds a normalization layer, i.e. replaces the Softmax operation with a normalization operation; the model parameters of the interference removal model are adjusted through iterative training, so that the output of the interference removal model is the normalized pixel value of each pixel point.
Step S321, multiplying the normalized R value, the normalized G value, and the normalized B value of each pixel point by preset multiples, respectively, to obtain a true R value, a true G value, and a true B value of each pixel point.
In this embodiment, the output pixel values include a normalized R value, a normalized G value, and a normalized B value. The computer device multiplies the normalized R, G, and B values of each pixel point by a preset multiple (255 in this embodiment) to obtain the true R value, true G value, and true B value of each pixel point.
In step S322, the true R value, the true G value, and the true B value of each pixel point are determined as the true pixel values of the corresponding pixel points.
The computer device determines the true R value, true G value, and true B value of each pixel point as the true pixel value of the corresponding pixel point.
Step S330, generating a character image according to the real pixel value of each pixel point.
The computer device thus obtains the true pixel value of each pixel point, i.e. the true pixel value of each pixel point in the character image. In this way, the interference removal model based on the FCN framework yields the normalized pixel value of each pixel point, the normalized values are restored to true values, and the character image is obtained with the interference information in the interference area image removed.
Step S410: replace the interference area image in the text image with the character image to obtain the replaced text image.
The computer device extracts the interference area image from the text image according to the position information of the interference information; after obtaining the character image with the interference information removed from the interference area image and the interference removal model, it replaces the interference area image in the text image with the character image to obtain the replaced text image.
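The replacement step can be sketched as pasting the restored character image back over the interference area (the `(x, y, w, h)` coordinate layout and nested-list image representation are assumed, as in the earlier data-preparation description):

```python
def replace_region(text_image, character_image, box):
    """Replace the interference area region of the text image with the
    restored character image, yielding the replaced text image.

    text_image:      list of rows, each row a list of pixel values.
    character_image: the interference-free crop, same size as the box.
    box:             (x, y, w, h) position frame coordinates (assumed layout).
    """
    x, y, w, h = box
    replaced = [row[:] for row in text_image]   # copy; keep the original intact
    for dy in range(h):
        replaced[y + dy][x:x + w] = character_image[dy]
    return replaced
```

Because the character image was produced from a crop at exactly these coordinates, the paste-back is pixel-aligned and the rest of the text image is untouched.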
Step S420, inputting the replaced text image into the character recognition model, and obtaining a character recognition result of the text image.
The computer device inputs the replaced text image into the character recognition model to obtain the character recognition result of the text image. It will be appreciated that, because the replaced text image is obtained by removing the interference information from the text image, the interference information no longer disturbs character recognition, and the character recognition accuracy for the text image is improved.
It should be understood that, although the steps in the above flowcharts are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; their execution order is likewise not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, there is provided a character recognition apparatus including: the first obtaining module 10 is configured to obtain position information of interference information in a text image; an extracting module 20, configured to extract an interference area image from the text image according to the position information of the interference information; the second obtaining module 30 is configured to obtain a character image without interference information according to the interference region image and the interference removing model; and the recognition module 40 is used for acquiring a character recognition result of the text image according to the character image and the text image.
Optionally, the identification module 40 includes: the replacing submodule is used for replacing the interference area image in the text image with a character image to obtain a replaced text image; and the recognition submodule is used for inputting the replaced text image into the character recognition model to obtain a character recognition result of the text image.
Optionally, the second obtaining module 30 includes: the first acquisition submodule is used for inputting the interference area image into the interference removing model to obtain an output image; the output pixel value of each pixel point in the output image is a normalized pixel value; the second obtaining submodule is used for obtaining the real pixel value of each pixel point according to the output pixel value of each pixel point in the output image; and the generating submodule is used for generating a character image according to the real pixel value of each pixel point.
Optionally, the output pixel values comprise a normalized R value, a normalized G value, and a normalized B value; the second acquisition sub-module includes: the restoring unit is used for multiplying the normalized R value, the normalized G value and the normalized B value of each pixel point by preset multiples respectively to obtain a real R value, a real G value and a real B value of each pixel point; and the determining unit is used for determining the real R value, the real G value and the real B value of each pixel point as the real pixel value of the corresponding pixel point.
Optionally, the apparatus further comprises: the third acquisition module is used for acquiring a plurality of groups of sample images; each group of sample images comprises a sample interference area image and a sample character image corresponding to the sample interference area image; and the first training module is used for replacing the Softmax layer of the initial full convolution network with a normalization layer, and training the initial full convolution network according to the multiple groups of sample images to obtain the interference removal model.
Optionally, the first training module comprises: the pre-training sub-module is used for replacing a Softmax layer of the initial full convolution network with a normalization layer, and pre-training a basic network of the initial full convolution network by adopting an ImageNet data set to obtain a pre-trained basic network; and the training submodule is used for determining the parameters of the pre-trained basic network as the initialization parameters of the basic network of the initial full convolution network, and training the initial full convolution network after the parameters are initialized by adopting a plurality of groups of sample images to obtain the interference removal model.
Optionally, the first obtaining module 10 includes: a third obtaining submodule, configured to input the text image into the interference detection model to obtain the position information of the interference information in the text image, the position information comprising the coordinates of a position frame, and the interference detection model being trained on the SSD object detection model framework. The extraction module 20 includes: an extraction submodule, configured to crop, according to the coordinates of the position frame, the target area corresponding to those coordinates from the text image as the interference area image.
The character recognition apparatus provided in this embodiment may implement the above-mentioned character recognition method embodiment, and its implementation principle and technical effect are similar, which are not described herein again. For the specific definition of the character recognition device, reference may be made to the above definition of the character recognition method, which is not described herein again. The respective modules in the character recognition apparatus described above may be implemented in whole or in part by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, there is also provided a computer device as shown in fig. 11, which may be a server. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing character recognition data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a character recognition method.
Those skilled in the art will appreciate that the architecture shown in fig. 11 is a block diagram of only a portion of the architecture associated with the subject application, and is not intended to limit the computing device to which the subject application may be applied, and that a computing device may in particular include more or less components than those shown, or combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program for causing a computer to perform some or all of the above-described method embodiments.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments merely express several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of character recognition, the method comprising:
acquiring position information of interference information in a text image;
extracting an interference area image from the text image according to the position information of the interference information;
obtaining a character image without the interference information according to the interference area image and the interference removing model;
and acquiring a character recognition result of the text image according to the character image and the text image.
2. The method according to claim 1, wherein the obtaining a character recognition result of the text image according to the character image and the text image comprises:
replacing the interference area image in the text image with the character image to obtain a replaced text image;
and inputting the replaced text image into a character recognition model to obtain a character recognition result of the text image.
3. The method according to claim 1, wherein obtaining the character image without the interference information according to the interference region image and the interference elimination model comprises:
inputting the interference area image into the interference removing model to obtain an output image; the output pixel value of each pixel point in the output image is a normalized pixel value;
acquiring a real pixel value of each pixel point according to the output pixel value of each pixel point in the output image;
and generating the character image according to the real pixel value of each pixel point.
4. The method of claim 3, wherein the output pixel values comprise a normalized R value, a normalized G value, and a normalized B value; the obtaining of the true pixel value of each pixel point according to the output pixel value of each pixel point in the output image includes:
multiplying the normalized R value, the normalized G value and the normalized B value of each pixel point by preset times respectively to obtain a real R value, a real G value and a real B value of each pixel point;
and determining the real R value, the real G value and the real B value of each pixel point as the real pixel value of the corresponding pixel point.
5. The method according to any of claims 1-4, wherein the training process of the interference rejection model comprises:
acquiring a plurality of groups of sample images; each group of sample images comprises a sample interference area image and a sample character image corresponding to the sample interference area image;
replacing a Softmax layer of the initial full convolution network with a normalization layer, and training the initial full convolution network according to the plurality of groups of sample images to obtain the interference elimination model.
6. The method of claim 5, wherein training the initial full convolution network from the plurality of sets of sample images to obtain the interference rejection model comprises:
pre-training the basic network of the initial full convolution network by adopting an ImageNet data set to obtain a pre-trained basic network;
and determining the parameters of the pre-trained basic network as initialization parameters of the basic network of the initial full convolution network, and training the initial full convolution network after parameter initialization by adopting the plurality of groups of sample images to obtain the interference removal model.
7. The method according to claim 1, wherein the obtaining of the location information of the interference information in the text image comprises:
inputting the text image into an interference detection model to obtain position information of interference information in the text image; the position information comprises coordinates of a position frame, and the interference detection model is obtained based on training of an SSD target detection model framework;
the extracting an interference area image from the text image according to the position information of the interference information includes:
and according to the coordinates of the position frame, intercepting a target area corresponding to the coordinates of the position frame in the text image as the interference area image.
8. An apparatus for character recognition, the apparatus comprising:
the first acquisition module is used for acquiring the position information of the interference information in the text image;
the extraction module is used for extracting an interference area image from the text image according to the position information of the interference information;
the second acquisition module is used for acquiring the character image without the interference information according to the interference area image and the interference elimination model;
and the recognition module is used for acquiring a character recognition result of the text image according to the character image and the text image.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010128301.8A 2020-02-28 2020-02-28 Character recognition method, character recognition device, computer equipment and computer-readable storage medium Pending CN111275051A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010128301.8A CN111275051A (en) 2020-02-28 2020-02-28 Character recognition method, character recognition device, computer equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010128301.8A CN111275051A (en) 2020-02-28 2020-02-28 Character recognition method, character recognition device, computer equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111275051A true CN111275051A (en) 2020-06-12

Family

ID=71004127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010128301.8A Pending CN111275051A (en) 2020-02-28 2020-02-28 Character recognition method, character recognition device, computer equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111275051A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181255A (en) * 2020-10-12 2021-01-05 深圳市欢太科技有限公司 Control identification method and device, terminal equipment and storage medium
CN113420763A (en) * 2021-08-19 2021-09-21 北京世纪好未来教育科技有限公司 Text image processing method and device, electronic equipment and readable storage medium
CN113516125A (en) * 2021-06-24 2021-10-19 北京世纪好未来教育科技有限公司 Model training method, using method, device, equipment and storage medium
CN116757886A (en) * 2023-08-16 2023-09-15 南京尘与土信息技术有限公司 Data analysis method and analysis device

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005258683A (en) * 2004-03-10 2005-09-22 Fujitsu Ltd Character recognition device, character recognition method, medium processing method, character recognition program, and computer readable recording medium recording character recognition program
CN103336761A (en) * 2013-05-14 2013-10-02 成都网安科技发展有限公司 Interference filtration matching algorithm based on dynamic partitioning and semantic weighting
CN106227808A (en) * 2016-07-22 2016-12-14 无锡云商通科技有限公司 A kind of method removing mail interference information and method for judging rubbish mail
WO2017162069A1 (en) * 2016-03-25 2017-09-28 阿里巴巴集团控股有限公司 Image text identification method and apparatus
CN108205678A (en) * 2017-11-22 2018-06-26 杭州电子科技大学 Nameplate text recognition processing method for images containing speck interference
CN108846379A (en) * 2018-07-03 2018-11-20 南京览笛信息科技有限公司 Face sheet recognition method, system, terminal device and storage medium
CN108877030A (en) * 2018-07-19 2018-11-23 深圳怡化电脑股份有限公司 Image processing method, device, terminal and computer readable storage medium
CN109766749A (en) * 2018-11-27 2019-05-17 上海眼控科技股份有限公司 Detection method for bent table lines in financial statements
CN110533748A (en) * 2019-08-28 2019-12-03 上海眼控科技股份有限公司 Seal removal method and device
WO2019232870A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Method for acquiring handwritten character training sample, apparatus, computer device, and storage medium
CN110598566A (en) * 2019-08-16 2019-12-20 深圳中兴网信科技有限公司 Image processing method, device, terminal and computer readable storage medium
CN110738030A (en) * 2019-10-17 2020-01-31 上海眼控科技股份有限公司 Table reconstruction method and device, electronic equipment and storage medium
CN110751156A (en) * 2019-10-17 2020-02-04 上海眼控科技股份有限公司 Method, system, device and medium for table line bulk interference removal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FAGUI LIU et al.: "FTPN: Scene Text Detection With Feature Pyramid Based Text Proposal Network" *
YUE SUN et al.: "Retracted: A Review: Text Detection in Natural Scene Image" *
刘艺: "Research on OCR recognition algorithms for low-quality Chinese images based on deep learning" *
和文杰: "Research on text detection and recognition algorithms for natural scenes based on deep neural networks" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181255A (en) * 2020-10-12 2021-01-05 深圳市欢太科技有限公司 Control identification method and device, terminal equipment and storage medium
CN113516125A (en) * 2021-06-24 2021-10-19 北京世纪好未来教育科技有限公司 Model training method, usage method, device, equipment and storage medium
CN113420763A (en) * 2021-08-19 2021-09-21 北京世纪好未来教育科技有限公司 Text image processing method and device, electronic equipment and readable storage medium
CN113420763B (en) * 2021-08-19 2021-11-05 北京世纪好未来教育科技有限公司 Text image processing method and device, electronic equipment and readable storage medium
CN116757886A (en) * 2023-08-16 2023-09-15 南京尘与土信息技术有限公司 Data analysis method and analysis device
CN116757886B (en) * 2023-08-16 2023-11-28 南京尘与土信息技术有限公司 Data analysis method and analysis device

Similar Documents

Publication Publication Date Title
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
CN108710866B (en) Chinese character model training method, Chinese character recognition method, device, equipment and medium
CN111275051A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN110569721A (en) Recognition model training method, image recognition method, device, equipment and medium
CN111080628A (en) Image tampering detection method and device, computer equipment and storage medium
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN112052781A (en) Feature extraction model training method, face recognition device, face recognition equipment and medium
US20120093396A1 (en) Digital image analysis utilizing multiple human labels
CN110705233B (en) Note generation method and device based on character recognition technology and computer equipment
JP2022532177A (en) Forged face recognition methods, devices, and non-transitory computer-readable storage media
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN111899247B (en) Method, device, equipment and medium for identifying lumen area of choroidal blood vessel
CN111242840A (en) Handwritten character generation method, apparatus, computer device and storage medium
CN111340025A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN115731220A (en) Grey cloth defect positioning and classifying method, system, equipment and storage medium
CN112232336A (en) Certificate identification method, device, equipment and storage medium
CN111046755A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN113673528B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN112508000B (en) Method and equipment for generating OCR image recognition model training data
CN111414889A (en) Financial statement identification method and device based on character identification
CN114170231A (en) Image semantic segmentation method and device based on convolutional neural network and electronic equipment
KR102026280B1 (en) Method and system for scene text detection using deep learning
US20220398399A1 (en) Optical character recognition systems and methods for personal data extraction
US11335108B2 (en) System and method to recognise characters from an image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 2024-05-24