CN114898381A - OCR recognition method and device, storage medium and electronic equipment - Google Patents

OCR recognition method and device, storage medium and electronic equipment

Info

Publication number
CN114898381A
CN114898381A
Authority
CN
China
Prior art keywords
image
target
ocr recognition
characteristic
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210550710.6A
Other languages
Chinese (zh)
Inventor
卢健 (Lu Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210550710.6A priority Critical patent/CN114898381A/en
Publication of CN114898381A publication Critical patent/CN114898381A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses an OCR recognition method and device, a storage medium, and electronic equipment, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a target image, where the target image is an image to be subjected to OCR recognition; and inputting the target image into a target OCR recognition model and outputting an OCR recognition result for the target image, where the target OCR recognition model is constructed on the basis of a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of the following: a GRU neural network, an LSTM neural network, or an RNN recurrent neural network. The method and device solve the problem of low OCR optical character recognition accuracy in the related art.

Description

OCR recognition method and device, storage medium and electronic equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to an OCR recognition method and device, a storage medium and electronic equipment.
Background
Currently, OCR (Optical Character Recognition) is widely applied in scenarios such as bill recognition and contract recognition. In the related art, OCR is mainly performed with CRNN (Convolutional Recurrent Neural Network) and CTC (Connectionist Temporal Classification), algorithms commonly used in fields such as speech recognition and text recognition. However, the CRNN and CTC algorithms yield OCR recognition results of low accuracy and take a long time to infer optical characters.
Aiming at the problem of low OCR optical character recognition accuracy in the related art, no effective solution has been proposed so far.
Disclosure of Invention
The present application mainly aims to provide an OCR recognition method and apparatus, a storage medium, and an electronic device, so as to solve the problem of low accuracy of OCR optical character recognition in the related art.
In order to achieve the above object, according to one aspect of the present application, an OCR recognition method is provided. The method comprises the following steps: acquiring a target image, where the target image is an image to be subjected to OCR recognition; and inputting the target image into a target OCR recognition model and outputting an OCR recognition result for the target image, where the target OCR recognition model is constructed based on a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of the following: a GRU neural network, an LSTM neural network, or an RNN recurrent neural network.
Further, before inputting the target image into a target OCR recognition model, the method further includes: acquiring a plurality of sample images; processing each sample image by using a plurality of residual error networks to obtain a plurality of first characteristic images; sampling and splicing the plurality of first characteristic images by using the characteristic pyramid network to obtain a target characteristic image; constructing a first OCR recognition model according to the target feature image, the time sequence neural network and the attention mechanism; and carrying out model training on the first OCR recognition model to obtain the target OCR recognition model.
Further, the sampling and splicing the plurality of first feature images by using the feature pyramid network to obtain a target feature image comprises: acquiring the size of a second characteristic image in the plurality of first characteristic images; according to the size of the second characteristic image, utilizing the characteristic pyramid network to perform downsampling processing on a third characteristic image in the plurality of first characteristic images to obtain a fourth characteristic image; according to the size of the second characteristic image, utilizing the characteristic pyramid network to perform upsampling processing on a fifth characteristic image in the plurality of first characteristic images to obtain a sixth characteristic image; and splicing the second characteristic image, the fourth characteristic image and the sixth characteristic image to obtain the target characteristic image.
Further, constructing a first OCR recognition model from the target feature image, the temporal neural network, and the attention mechanism comprises: inputting the target characteristic image into a time sequence neural network for encoding, and outputting a first sequence characteristic of the target characteristic image; carrying out weighted summation processing on the first sequence features by using the attention mechanism to obtain second sequence features; and constructing the first OCR recognition model based on the second sequence characteristics and a time-sequence neural network for decoding.
Further, before model training is performed on the first OCR recognition model to obtain the target OCR recognition model, the method further includes: acquiring a real character sequence of each sample image; acquiring a predicted character sequence of each sample image, wherein the predicted character sequence is a character sequence output by inputting each sample image into the first OCR recognition model; training the first OCR recognition model by using a loss function based on the real character sequence and the predicted character sequence to obtain the target OCR recognition model, wherein the loss function at least comprises: a cross entropy function.
Further, before obtaining the predicted text sequence for each sample image, the method further comprises: converting the data in each sample image into a marker character; and inputting the marked characters into the time-sequence neural network for decoding based on the second sequence characteristics, and outputting a predicted literal sequence of each sample image.
Further, prior to acquiring the plurality of sample images, the method further comprises: acquiring a first original image corresponding to each sample image; changing the size of the first original image according to a first preset requirement to obtain a first image; and whitening the first image to obtain each sample image.
Further, before acquiring the target image, the method further comprises: acquiring a second original image corresponding to the target image; whitening the second original image to obtain a second image; and changing the size of the second image according to a second preset requirement to obtain the target image, wherein the second preset requirement is different from the first preset requirement.
In order to achieve the above object, according to another aspect of the present application, an OCR recognition apparatus is provided. The apparatus includes: a first acquisition unit, configured to acquire a target image, where the target image is an image to be subjected to OCR recognition; and a first output unit, configured to input the target image into a target OCR recognition model and output an OCR recognition result for the target image, where the target OCR recognition model is constructed based on a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of the following: a GRU neural network, an LSTM neural network, or an RNN recurrent neural network.
Further, the apparatus further comprises: the second acquisition unit is used for acquiring a plurality of sample images before the target image is input into the target OCR recognition model; the first processing unit is used for processing each sample image by utilizing the residual error networks to obtain a plurality of first characteristic images; the second processing unit is used for sampling and splicing the plurality of first characteristic images by using the characteristic pyramid network to obtain target characteristic images; the first construction unit is used for constructing a first OCR recognition model according to the target feature image, the time sequence neural network and the attention mechanism; and the first training unit is used for carrying out model training on the first OCR recognition model to obtain the target OCR recognition model.
Further, the second processing unit includes: the first acquisition module is used for acquiring the size of a second characteristic image in the plurality of first characteristic images; the first processing module is used for performing downsampling processing on a third feature image in the plurality of first feature images by using the feature pyramid network according to the size of the second feature image to obtain a fourth feature image; the second processing module is used for performing upsampling processing on a fifth characteristic image in the plurality of first characteristic images by using the characteristic pyramid network according to the size of the second characteristic image to obtain a sixth characteristic image; and the third processing module is used for splicing the second characteristic image, the fourth characteristic image and the sixth characteristic image to obtain the target characteristic image.
Further, the first building unit includes: the first output module is used for inputting the target characteristic image into a time sequence neural network for encoding and outputting a first sequence characteristic of the target characteristic image; the fourth processing module is used for carrying out weighted summation processing on the first sequence characteristics by utilizing the attention mechanism to obtain second sequence characteristics; and the first construction module is used for constructing the first OCR recognition model based on the second sequence characteristics and the time-series neural network for decoding.
Further, the apparatus further comprises: the third acquisition unit is used for acquiring a real character sequence of each sample image before model training is carried out on the first OCR recognition model to obtain the target OCR recognition model; a fourth obtaining unit, configured to obtain a predicted character sequence of each sample image, where the predicted character sequence is a character sequence output by inputting each sample image into the first OCR recognition model; a second training unit, configured to train the first OCR recognition model by using a loss function based on the real character sequence and the predicted character sequence, to obtain the target OCR recognition model, where the loss function at least includes: a cross entropy function.
Further, the apparatus further comprises: a first conversion unit for converting data in each sample image into a marker character before acquiring a predicted character sequence of each sample image; and the second output unit is used for inputting the marking characters into the time sequence neural network for decoding based on the second sequence characteristics and outputting the predicted character sequence of each sample image.
Further, the apparatus further comprises: the fifth acquisition unit is used for acquiring a first original image corresponding to each sample image before acquiring a plurality of sample images; the third processing unit is used for changing the size of the first original image according to a first preset requirement to obtain a first image; and the fourth processing unit is used for carrying out whitening processing on the first image to obtain each sample image.
Further, the apparatus further comprises: a sixth acquiring unit, configured to acquire a second original image corresponding to a target image before acquiring the target image; a fifth processing unit, configured to perform whitening processing on the second original image to obtain a second image; and the sixth processing unit is configured to change the size of the second image according to a second preset requirement to obtain the target image, where the second preset requirement is different from the first preset requirement.
In order to achieve the above object, according to another aspect of the present application, there is provided a computer-readable storage medium storing a program, wherein the program performs the OCR recognition method as described in any one of the above.
To achieve the above object, according to another aspect of the present application, there is provided an electronic device including one or more processors and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the OCR recognition method according to any one of the above.
Through the present application, the following steps are adopted: acquiring a target image, where the target image is an image to be subjected to OCR recognition; and inputting the target image into a target OCR recognition model and outputting an OCR recognition result for the target image, where the target OCR recognition model is constructed on the basis of a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of the following: a GRU neural network, an LSTM neural network, or an RNN recurrent neural network. This solves the problem of low OCR optical character recognition accuracy in the related art. By acquiring the target image and inputting it into a model constructed from a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, the OCR recognition result for the target image can be output, thereby improving the accuracy of OCR optical character recognition.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of an OCR recognition method provided according to an embodiment of the application;
FIG. 2 is a diagram of a ResNet convolution module in an embodiment of the present application;
FIG. 3 is a schematic diagram of the operation of the FPN structure in an embodiment of the present application;
FIG. 4 is a schematic diagram of an original image in an embodiment of the present application;
FIG. 5 is a schematic diagram of an embodiment of the present application after processing an original image;
FIG. 6 is a schematic diagram of an image of an OCR recognition model trained to be input in the embodiment of the present application after processing;
FIG. 7 is a flow diagram of an alternative OCR recognition method provided in accordance with an embodiment of the present application;
FIG. 8 is a schematic diagram of an OCR recognition apparatus provided in accordance with an embodiment of the present application;
FIG. 9 is a schematic diagram of an alternative OCR recognition device provided in accordance with an embodiment of the present application;
fig. 10 is a schematic diagram of an electronic device provided according to an embodiment of the application.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, such that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
The present invention is described below with reference to preferred implementation steps, and fig. 1 is a flowchart of an OCR recognition method according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
step S101, a target image is obtained, wherein the target image is an image to be subjected to OCR recognition.
In this embodiment, the target image may be an image obtained by preprocessing the image to be subjected to OCR recognition. For example, the image to be OCR-recognized may be whitened, and its height and width may be scaled, before it is fed to the model.
Step S102, inputting the target image into a target OCR recognition model, and outputting an OCR recognition result of the target image, where the target OCR recognition model is constructed based on a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of the following: a GRU neural network, an LSTM neural network, or an RNN recurrent neural network.
For example, ResNet (Residual Network) performs well at image classification, is widely applied in the fields of image classification and object detection, and can serve as the backbone network for extracting image semantics. FPN (Feature Pyramid Network) works top-down, fusing upsampled high-level features with lower-level features to obtain new feature maps, and can be used to enhance the extraction of image semantics. The attention mechanism amounts to a weighted summation over sequence features, producing a lower-dimensional feature tensor with richer semantics. In this embodiment, the time-series neural network may be a GRU (Gated Recurrent Unit), an LSTM (Long Short-Term Memory), or an RNN (Recurrent Neural Network). Accordingly, the preprocessed image requiring OCR recognition is input into an OCR recognition model constructed on the basis of a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, and an OCR recognition result is output.
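As a minimal sketch of the attention operation described above (an illustration in plain Python, not the patent's actual implementation; the feature values and scores are invented for demonstration), raw attention scores are normalized with softmax and used to weight-sum a sequence of feature vectors into a single tensor:

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(seq_features, scores):
    """Weighted summation of sequence features (one vector per time step)."""
    weights = softmax(scores)
    dim = len(seq_features[0])
    return [sum(w * vec[i] for w, vec in zip(weights, seq_features))
            for i in range(dim)]

# Three time steps of 4-dimensional features; the second step gets the
# largest score and therefore dominates the pooled result.
seq = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0]]
pooled = attention_pool(seq, scores=[0.1, 2.0, 0.1])
```

The pooled vector has the same dimensionality as one time step, so a variable-length sequence collapses into a fixed-size representation, which is what makes it useful between the encoder and decoder.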
Through the steps S101 to S102, the OCR recognition result of the target image can be output by acquiring the target image and inputting the target image into the model constructed based on the residual error network, the feature pyramid network, the time-series neural network and the attention mechanism, so that the accuracy of OCR optical character recognition is improved.
Optionally, in the OCR recognition method provided in the embodiment of the present application, before inputting the target image into the target OCR recognition model, the method further includes: acquiring a plurality of sample images; processing each sample image with a plurality of residual networks to obtain a plurality of first feature images; sampling and splicing the plurality of first feature images with the feature pyramid network to obtain a target feature image; constructing a first OCR recognition model from the target feature image, the time-series neural network, and the attention mechanism; and performing model training on the first OCR recognition model to obtain the target OCR recognition model.
In this embodiment, the specific steps of constructing a recognition model with a ResNet-FPN structure and an attention mechanism may be: acquire a plurality of sample images; run each sample image through a plurality of ResNet convolution modules to obtain feature maps; sample and splice the feature maps with an FPN-like operation to obtain a final feature image; construct an OCR recognition model from the final feature map and a GRU-Attention structure; and train the OCR recognition model to obtain the final OCR recognition model to be used. Further, the model can be built with the PaddlePaddle platform or with another deep learning platform, such as TensorFlow, PyTorch, or MXNet.
With this scheme, an OCR recognition model can be constructed from ResNet, FPN, the time-series neural network, and the attention mechanism, and the constructed model can be conveniently trained into the final OCR recognition model. In addition, with the trained OCR recognition model, character recognition accuracy and recall on printed bills can reach 99%, about 3 percentage points higher than the accuracy and recall of a CRNN-CTC model, so both the accuracy and the timeliness of OCR recognition can be improved.
Optionally, in the OCR recognition method provided in the embodiment of the present application, the sampling and stitching the plurality of first feature images by using the feature pyramid network to obtain the target feature image includes: acquiring the size of a second characteristic image in the plurality of first characteristic images; according to the size of the second characteristic image, utilizing a characteristic pyramid network to perform downsampling processing on a third characteristic image in the plurality of first characteristic images to obtain a fourth characteristic image; according to the size of the second characteristic image, utilizing a characteristic pyramid network to perform upsampling processing on a fifth characteristic image in the plurality of first characteristic images to obtain a sixth characteristic image; and splicing the second characteristic image, the fourth characteristic image and the sixth characteristic image to obtain a target characteristic image.
Fig. 2 is a schematic diagram of a ResNet convolution module in the embodiment of the present application; as shown in Fig. 2, ResNet is composed of a plurality of convolution modules. In this embodiment, three ResNet convolution module operations can be stacked to convert the input picture into a feature map of smaller size (each convolution module operation reduces the size of the feature map), and then one more ResNet convolution module operation outputs a still smaller feature map. In addition, Fig. 3 is a schematic diagram of the operation of the FPN structure in the embodiment of the present application. As shown in Fig. 3, an FPN-like operation down-samples the feature map output by the second ResNet convolution module operation and up-samples the smallest feature map output last, so that both match the size of the feature map output by the third ResNet convolution module operation. Finally, the down-sampled output of the second module, the up-sampled output of the last module, and the output of the third module are spliced to obtain the final feature map.
In summary, with the ResNet and FPN structure operations, the input image can be converted into a feature image that fuses both wider-receptive-field and narrower-receptive-field features of the image, laying the foundation for subsequently improving the accuracy of the OCR recognition result.
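The down-sample / up-sample / splice flow can be sketched as follows. This is a simplified stand-in using 2x2 max-pool down-sampling, nearest-neighbour up-sampling, and channel-wise concatenation on single-channel maps; the patent does not specify these exact operators, and the map contents are invented:

```python
def downsample_2x(fmap):
    """Halve a 2-D feature map by taking the max of each 2x2 block."""
    return [[max(fmap[2*r][2*c], fmap[2*r][2*c+1],
                 fmap[2*r+1][2*c], fmap[2*r+1][2*c+1])
             for c in range(len(fmap[0]) // 2)]
            for r in range(len(fmap) // 2)]

def upsample_2x(fmap):
    """Double a 2-D feature map by nearest-neighbour repetition."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def concat_channels(*fmaps):
    """Stack same-sized maps along a channel dimension."""
    rows, cols = len(fmaps[0]), len(fmaps[0][0])
    return [[[m[r][c] for m in fmaps] for c in range(cols)]
            for r in range(rows)]

# Outputs of three backbone stages: 8x8 (second), 4x4 (third), 2x2 (last).
stage2 = [[float(r + c) for c in range(8)] for r in range(8)]
stage3 = [[1.0] * 4 for _ in range(4)]
stage4 = [[2.0] * 2 for _ in range(2)]

# Bring everything to the 4x4 resolution of the third stage, then splice.
fused = concat_channels(downsample_2x(stage2), stage3, upsample_2x(stage4))
```

Each spatial location of the fused map now carries one value from every stage, which is how the wider and narrower receptive fields end up in a single feature image.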
Optionally, in the OCR recognition method provided in the embodiment of the present application, constructing the first OCR recognition model according to the target feature image, the time-series neural network, and the attention mechanism includes: inputting the target characteristic image into a time sequence neural network for encoding, and outputting a first sequence characteristic of the target characteristic image; carrying out weighted summation processing on the first sequence features by using an attention mechanism to obtain second sequence features; and constructing a first OCR recognition model based on the second sequence characteristics and the time-sequence neural network for decoding.
In this embodiment, the obtained final feature map may be encoded with a bidirectional GRU to obtain sequence features; attention is then applied to the encoder GRU outputs to perform weighted feature extraction, yielding the weighted-sum sequence features; these are then combined with the decoder GRU to construct the OCR recognition model prior to model training.
By the scheme, the OCR recognition model before model training can be quickly and accurately constructed according to the time sequence neural network and the attention mechanism and by combining the characteristic diagram of the image.
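To make the time-series component concrete, here is one GRU cell step in plain Python, reduced to scalar input and state. The gating equations are the standard GRU formulation; the weight values are purely illustrative and are not taken from the patent:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, w):
    """One GRU time step with scalar input and state (weights are demo values)."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)                # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)                # reset gate
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * h_tilde                      # blended new state

# Run a short sequence through the cell with arbitrary demo weights.
weights = {"wz": 0.5, "uz": 0.1, "wr": 0.4, "ur": 0.2, "wh": 0.9, "uh": 0.3}
h = 0.0
for x in [1.0, -0.5, 0.25]:
    h = gru_cell(x, h, weights)
```

Because the new state is a gate-weighted blend of the previous state and a tanh candidate, the hidden state always stays in (-1, 1); an encoder stacks this step over every column of the feature map, and the decoder reuses the same cell to emit characters one at a time.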
Optionally, in the OCR recognition method provided in the embodiment of the present application, before performing model training on the first OCR recognition model to obtain the target OCR recognition model, the method further includes: acquiring a real character sequence of each sample image; acquiring a predicted character sequence of each sample image, wherein the predicted character sequence is a character sequence output by inputting each sample image into a first OCR recognition model; training the first OCR recognition model by using a loss function based on the real character sequence and the predicted character sequence to obtain a target OCR recognition model, wherein the loss function at least comprises: a cross entropy function.
For example, model training is performed using a cross-entropy function as the loss function, of the standard form

    L(Y, Ŷ) = -Σᵢ Yᵢ log Ŷᵢ

where Y is the real character label sequence and Ŷ is the predicted character sequence.
In conclusion, the constructed OCR recognition model is trained by utilizing the loss function, so that a more accurate OCR recognition model can be conveniently obtained, and the accuracy of OCR optical character recognition can be improved.
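A minimal sketch of this loss follows; the vocabulary size and the per-step token probabilities are invented for illustration and do not come from the patent:

```python
import math

def sequence_cross_entropy(true_tokens, predicted_probs):
    """Sum of -log P(correct token) over the sequence.

    predicted_probs[t] is the model's probability distribution (a list)
    over the vocabulary at step t; true_tokens[t] indexes into it.
    """
    return -sum(math.log(probs[tok])
                for tok, probs in zip(true_tokens, predicted_probs))

# Vocabulary of 3 characters; the true sequence is [0, 1, 2].
probs = [
    [0.7, 0.2, 0.1],   # step 0: the correct token 0 gets probability 0.7
    [0.1, 0.8, 0.1],   # step 1: the correct token 1 gets probability 0.8
    [0.2, 0.2, 0.6],   # step 2: the correct token 2 gets probability 0.6
]
loss = sequence_cross_entropy([0, 1, 2], probs)
```

The loss shrinks toward zero as the model puts more probability mass on the ground-truth character at each step, which is exactly the signal used to train the recognition model.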
Optionally, in the OCR recognition method provided in the embodiment of the present application, before obtaining the predicted text sequence of each sample image, the method further includes: converting the data in each sample image into a marker character; and inputting the marking characters into a time sequence neural network for decoding based on the second sequence characteristics, and outputting a predicted literal sequence of each sample image.
For example, the annotation data is converted into tokens (marker characters) as follows: every character and symbol appearing in the labels is assigned an integer number, with numbering starting from 0, and the label corresponding to each picture is then converted into its integer token representation. For example, a three-character label may be converted into "[0, 1, 2]". The marker characters, combined with the weighted-sum sequence features, are then input into the decoding-portion GRU, which outputs the predicted text sequence for each sample image.
By the scheme, data in the image can be quickly and accurately converted into the marker characters, and the predicted character sequence of each sample image can be output according to the marker characters, the sequence characteristics after weighted summation and the time sequence neural network.
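The label-to-token conversion described above can be sketched in a few lines. The English sample labels below are illustrative only; the numbering-from-0 rule follows the scheme described in the text.

```python
def build_vocab(labels):
    """Assign an integer number, starting from 0, to every character in the labels."""
    vocab = {}
    for label in labels:
        for ch in label:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def to_tokens(label, vocab):
    """Convert one picture's label into its integer token representation."""
    return [vocab[ch] for ch in label]

labels = ["fee", "eel"]                      # hypothetical annotation labels
vocab = build_vocab(labels)                  # e.g. {'f': 0, 'e': 1, 'l': 2}
tokens = [to_tokens(lbl, vocab) for lbl in labels]
```

Characters are numbered in first-appearance order, so the same character always maps to the same integer across all labels.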
Optionally, in the OCR recognition method provided in the embodiment of the present application, before acquiring a plurality of sample images, the method further includes: acquiring a first original image corresponding to each sample image; changing the size of the first original image according to a first preset requirement to obtain a first image; and whitening the first image to obtain each sample image.
For example, the first preset requirement may be to set the picture size to a height of 56 and a width of 760. That is, the picture is first read, converted to grayscale, and resized to 56 high and 760 wide. The method specifically comprises the following steps: if the aspect ratio (height:width) of the original picture is greater than 56:760, it is scaled by height to 56, and the resulting width, which is less than 760, is padded on the right with gray-white to 760; if the aspect ratio is less than or equal to 56:760, it is scaled by width to 760, and the resulting height, which is less than or equal to 56, is padded on the top and bottom with gray-white to 56. The image pixel values are then reduced by 127.5 for normalization (whitening) to obtain the sample images for training. In addition, a schematic diagram of an original image in the embodiment of the present application is shown in fig. 4, and a schematic diagram of the original image after its height is padded is shown in fig. 5.
By the scheme, the size setting and whitening processing can be quickly and accurately carried out on the image, so that the image can be better subjected to OCR recognition.
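The resize-and-pad rule above reduces to simple arithmetic on the image dimensions. The helper below is a sketch of that arithmetic (the function name and return format are my own); the padded region would be filled with gray-white pixels before the 127.5 whitening step.

```python
def target_geometry(h, w, out_h=56, out_w=760):
    """Compute the scaled size and padding implied by the first preset requirement.

    Returns (new_h, new_w, pad_side, pad_amount).
    """
    if h * out_w > w * out_h:                 # aspect ratio greater than 56:760
        new_h, new_w = out_h, round(w * out_h / h)   # scale by height to 56
        return new_h, new_w, "right", out_w - new_w  # pad right to 760
    new_h, new_w = round(h * out_w / w), out_w       # scale by width to 760
    return new_h, new_w, "top-bottom", out_h - new_h # pad top/bottom to 56

tall = target_geometry(100, 100)    # relatively tall: scale by height, pad right
wide = target_geometry(56, 1520)    # relatively wide: scale by width, pad top/bottom
```

A 100x100 picture is taller than 56:760, so it scales to 56x56 and is padded 704 pixels on the right; a 56x1520 picture scales to 28x760 and needs 28 pixels of top-and-bottom padding.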
Optionally, in the OCR recognition method provided in the embodiment of the present application, before acquiring the target image, the method further includes: acquiring a second original image corresponding to the target image; whitening the second original image to obtain a second image; and changing the size of the second image according to a second preset requirement to obtain the target image, wherein the second preset requirement is different from the first preset requirement.
For example, the second preset requirement may be to set the picture height to 56 without limiting the width. That is, after the model has been sufficiently trained, a new picture is converted to grayscale and scaled to a height of 56 at its original aspect ratio; the width is no longer limited to 760. Fig. 6 is a schematic diagram of a processed image to be input into the trained OCR recognition model in the embodiment of the present application; the image shown in fig. 6 is input into the model for character sequence prediction.
In summary, the image to be input into the trained OCR recognition model can be processed quickly and accurately, and OCR recognition can then be performed on it using the trained model.
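The inference-time preprocessing (grayscale input scaled to height 56 at its original aspect ratio, then whitened by subtracting 127.5) can be sketched as follows. The nearest-neighbor resampling and the function name are illustrative assumptions; a real pipeline would typically use a library resize with interpolation.

```python
import numpy as np

def preprocess_for_inference(gray_image, out_h=56):
    """Scale a grayscale image to height 56, keeping the original aspect
    ratio (width is not limited at inference time), then subtract 127.5."""
    h, w = gray_image.shape
    new_w = max(1, round(w * out_h / h))
    rows = np.arange(out_h) * h // out_h          # nearest-neighbor row picks
    cols = np.arange(new_w) * w // new_w          # nearest-neighbor column picks
    resized = gray_image[rows][:, cols].astype(np.float64)
    return resized - 127.5                        # whitening

img = np.full((112, 1600), 200, dtype=np.uint8)   # hypothetical grayscale input
x = preprocess_for_inference(img)
```

A 112x1600 input halves to 56x800; every pixel value 200 becomes 72.5 after whitening.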
Fig. 7 is a flowchart of an alternative OCR recognition method according to an embodiment of the present application, and as shown in fig. 7, the flow of the OCR recognition specifically includes:
step S701, performing resizing and normalization processing on the image data;
step S702, converting the label data into tokens (marker characters);
step S703, constructing a recognition model of a ResNet-FPN structure and an attention mechanism;
step S704, using a cross entropy function as a loss function to carry out model training;
step S705, the prediction picture is processed, and the trained model is used for inference.
In summary, the OCR recognition method provided by the embodiment of the present application acquires a target image, where the target image is an image to be subjected to OCR recognition, inputs the target image into a target OCR recognition model, and outputs an OCR recognition result of the target image, wherein the target OCR recognition model is a model constructed based on a residual error network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of the following: a GRU neural network, an LSTM neural network, and an RNN recurrent neural network. This solves the problem of low OCR optical character recognition accuracy in the related art: by acquiring the target image and inputting it into the model constructed based on the residual error network, the feature pyramid network, the time-series neural network, and the attention mechanism, the OCR recognition result of the target image can be output, thereby improving the accuracy of OCR optical character recognition.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The embodiment of the present application further provides an OCR recognition apparatus, and it should be noted that the OCR recognition apparatus of the embodiment of the present application may be used to execute the OCR recognition method provided by the embodiment of the present application. The following describes an OCR recognition apparatus according to an embodiment of the present application.
FIG. 8 is a schematic diagram of an OCR recognition device according to an embodiment of the application. As shown in fig. 8, the apparatus includes: a first acquisition unit 801 and a first output unit 802.
Specifically, the first obtaining unit 801 is configured to obtain a target image, where the target image is an image to be subjected to OCR recognition;
a first output unit 802, configured to input a target image into a target OCR recognition model, and output an OCR recognition result of the target image, where the target OCR recognition model is a model constructed based on a residual error network, a feature pyramid network, a time-series neural network, and an attention mechanism, where the time-series neural network is one of: GRU neural networks, LSTM neural networks, and RNN recurrent neural networks.
To sum up, in the OCR recognition apparatus provided in the embodiment of the present application, the first obtaining unit 801 obtains a target image, where the target image is an image to be subjected to OCR recognition; the first output unit 802 inputs the target image into a target OCR recognition model and outputs an OCR recognition result of the target image, wherein the target OCR recognition model is a model constructed based on a residual error network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of the following: a GRU neural network, an LSTM neural network, and an RNN recurrent neural network. By obtaining the target image and inputting it into the model constructed based on the residual error network, the feature pyramid network, the time-series neural network, and the attention mechanism, the OCR recognition result of the target image can be output, thereby improving the accuracy of OCR optical character recognition.
Optionally, in an OCR recognition apparatus provided in an embodiment of the present application, the apparatus further includes: the second acquisition unit is used for acquiring a plurality of sample images before the target image is input into the target OCR recognition model; the first processing unit is used for processing each sample image by utilizing a plurality of residual error networks to obtain a plurality of first characteristic images; the second processing unit is used for sampling and splicing the plurality of first characteristic images by using the characteristic pyramid network to obtain target characteristic images; the first construction unit is used for constructing a first OCR recognition model according to the target characteristic image, the time sequence neural network and the attention mechanism; and the first training unit is used for carrying out model training on the first OCR recognition model to obtain a target OCR recognition model.
Optionally, in an OCR device provided in an embodiment of the present application, the second processing unit includes: the first acquisition module is used for acquiring the size of a second characteristic image in the plurality of first characteristic images; the first processing module is used for carrying out downsampling processing on a third characteristic image in the plurality of first characteristic images by utilizing the characteristic pyramid network according to the size of the second characteristic image to obtain a fourth characteristic image; the second processing module is used for performing upsampling processing on a fifth characteristic image in the plurality of first characteristic images by utilizing the characteristic pyramid network according to the size of the second characteristic image to obtain a sixth characteristic image; and the third processing module is used for splicing the second characteristic image, the fourth characteristic image and the sixth characteristic image to obtain a target characteristic image.
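The sampling-and-splicing performed by the second processing unit can be illustrated with plain arrays: the finer feature image is downsampled and the coarser one upsampled to the reference size, then all three are spliced along the channel axis. This is a schematic nearest-neighbor sketch, not the feature pyramid network's learned convolutions; the channel counts and spatial sizes are assumptions for illustration.

```python
import numpy as np

def fuse_feature_maps(f2, f3, f5):
    """Align f3 (finer) and f5 (coarser) to f2's spatial size, then splice.

    Each map is (C, H, W); f3 is downsampled by strided slicing, f5 is
    upsampled by nearest-neighbor repetition, and all three are
    concatenated along the channel axis.
    """
    _, h, w = f2.shape
    f4 = f3[:, ::f3.shape[1] // h, ::f3.shape[2] // w]       # downsample
    f6 = np.repeat(np.repeat(f5, h // f5.shape[1], axis=1),
                   w // f5.shape[2], axis=2)                 # upsample
    return np.concatenate([f2, f4, f6], axis=0)              # splice

f2 = np.zeros((8, 14, 190))    # reference (second) feature image
f3 = np.zeros((4, 28, 380))    # finer (third) feature image
f5 = np.zeros((16, 7, 95))     # coarser (fifth) feature image
target = fuse_feature_maps(f2, f3, f5)
```

The spliced target feature image keeps the reference spatial size and stacks the channels of all three inputs (8 + 4 + 16 = 28 here).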
Optionally, in an OCR recognition apparatus provided in an embodiment of the present application, the first building unit includes: the first output module is used for inputting the target characteristic image into a time sequence neural network for coding and outputting a first sequence characteristic of the target characteristic image; the fourth processing module is used for carrying out weighted summation processing on the first sequence characteristics by utilizing an attention mechanism to obtain second sequence characteristics; and the first construction module is used for constructing a first OCR recognition model based on the second sequence characteristics and the time sequence neural network for decoding.
Optionally, in an OCR recognition apparatus provided in an embodiment of the present application, the apparatus further includes: the third acquisition unit is used for acquiring a real character sequence of each sample image before model training is carried out on the first OCR recognition model to obtain a target OCR recognition model; a fourth obtaining unit, configured to obtain a predicted character sequence of each sample image, where the predicted character sequence is a character sequence output by inputting each sample image into the first OCR recognition model; and the second training unit is used for training the first OCR recognition model by utilizing a loss function based on the real character sequence and the predicted character sequence to obtain a target OCR recognition model, wherein the loss function at least comprises: a cross entropy function.
Optionally, in an OCR recognition apparatus provided in an embodiment of the present application, the apparatus further includes: a first conversion unit for converting data in each sample image into a marker character before acquiring a predicted character sequence of each sample image; and the second output unit is used for inputting the marking characters into a time sequence neural network for decoding based on the second sequence characteristics and outputting the predicted character sequence of each sample image.
Optionally, in an OCR recognition apparatus provided in an embodiment of the present application, the apparatus further includes: the fifth acquisition unit is used for acquiring a first original image corresponding to each sample image before acquiring a plurality of sample images; the third processing unit is used for changing the size of the first original image according to the first preset requirement to obtain a first image; and the fourth processing unit is used for carrying out whitening processing on the first image to obtain each sample image.
Optionally, in an OCR recognition apparatus provided in an embodiment of the present application, the apparatus further includes: a sixth acquiring unit, configured to acquire a second original image corresponding to the target image before acquiring the target image; the fifth processing unit is used for carrying out whitening processing on the second original image to obtain a second image; and the sixth processing unit is used for changing the size of the second image according to a second preset requirement to obtain the target image, wherein the second preset requirement is different from the first preset requirement.
The OCR recognition device includes a processor and a memory, the first obtaining unit 801 and the first output unit 802 are stored in the memory as program units, and the processor executes the program units stored in the memory to implement corresponding functions.
Fig. 9 is a schematic diagram of an alternative OCR recognition apparatus provided according to an embodiment of the present application, as shown in fig. 9, the apparatus includes: the device comprises a training sample processing unit, a model training unit, a prediction image processing unit, a model OCR recognition unit and a prediction result output unit.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. The kernel can be set to be one or more, and the accuracy of OCR optical character recognition is improved by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium on which a program is stored, which, when executed by a processor, implements the OCR recognition method.
As shown in fig. 10, an embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor executes the program to implement the following steps: acquiring a target image, wherein the target image is an image to be subjected to OCR recognition; inputting the target image into a target OCR recognition model, and outputting an OCR recognition result of the target image, wherein the target OCR recognition model is a model constructed based on a residual error network, a feature pyramid network, a time sequence neural network and an attention mechanism, and the time sequence neural network is one of the following: GRU neural networks, LSTM neural networks, and RNN recurrent neural networks.
The processor executes the program and further realizes the following steps: before inputting the target image into a target OCR recognition model, the method further comprises: acquiring a plurality of sample images; processing each sample image by using a plurality of residual error networks to obtain a plurality of first characteristic images; sampling and splicing the plurality of first characteristic images by using the characteristic pyramid network to obtain a target characteristic image; constructing a first OCR recognition model according to the target feature image, the time sequence neural network and the attention mechanism; and carrying out model training on the first OCR recognition model to obtain the target OCR recognition model.
The processor executes the program and further realizes the following steps: sampling and splicing the plurality of first feature images by using the feature pyramid network to obtain a target feature image, wherein the step of sampling and splicing the plurality of first feature images comprises the following steps: acquiring the size of a second characteristic image in the plurality of first characteristic images; according to the size of the second characteristic image, utilizing the characteristic pyramid network to perform downsampling processing on a third characteristic image in the plurality of first characteristic images to obtain a fourth characteristic image; according to the size of the second characteristic image, utilizing the characteristic pyramid network to perform upsampling processing on a fifth characteristic image in the plurality of first characteristic images to obtain a sixth characteristic image; and splicing the second characteristic image, the fourth characteristic image and the sixth characteristic image to obtain the target characteristic image.
The processor executes the program and further realizes the following steps: constructing a first OCR recognition model from the target feature image, the temporal neural network, and the attention mechanism comprises: inputting the target characteristic image into a time sequence neural network for encoding, and outputting a first sequence characteristic of the target characteristic image; carrying out weighted summation processing on the first sequence features by using the attention mechanism to obtain second sequence features; and constructing the first OCR recognition model based on the second sequence characteristics and a time-sequence neural network for decoding.
The processor executes the program and further realizes the following steps: before model training of the first OCR recognition model to obtain the target OCR recognition model, the method further includes: acquiring a real character sequence of each sample image; acquiring a predicted character sequence of each sample image, wherein the predicted character sequence is a character sequence output by inputting each sample image into the first OCR recognition model; training the first OCR recognition model by using a loss function based on the real character sequence and the predicted character sequence to obtain the target OCR recognition model, wherein the loss function at least comprises: a cross entropy function.
The processor executes the program and further realizes the following steps: prior to obtaining the predicted text sequence for each sample image, the method further comprises: converting the data in each sample image into a marker character; and inputting the marked characters into the time-sequence neural network for decoding based on the second sequence characteristics, and outputting a predicted literal sequence of each sample image.
The processor executes the program and further realizes the following steps: prior to acquiring the plurality of sample images, the method further comprises: acquiring a first original image corresponding to each sample image; changing the size of the first original image according to a first preset requirement to obtain a first image; and whitening the first image to obtain each sample image.
The processor executes the program and further realizes the following steps: prior to acquiring the target image, the method further comprises: acquiring a second original image corresponding to the target image; whitening the second original image to obtain a second image; and changing the size of the second image according to a second preset requirement to obtain the target image, wherein the second preset requirement is different from the first preset requirement. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product adapted to perform a program for initializing the following method steps when executed on a data processing device: acquiring a target image, wherein the target image is an image to be subjected to OCR recognition; inputting the target image into a target OCR recognition model, and outputting an OCR recognition result of the target image, wherein the target OCR recognition model is a model constructed based on a residual error network, a feature pyramid network, a time sequence neural network and an attention mechanism, and the time sequence neural network is one of the following: GRU neural networks, LSTM neural networks, and RNN recurrent neural networks.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: before inputting the target image into a target OCR recognition model, the method further comprises: acquiring a plurality of sample images; processing each sample image by using a plurality of residual error networks to obtain a plurality of first characteristic images; sampling and splicing the plurality of first characteristic images by using the characteristic pyramid network to obtain a target characteristic image; constructing a first OCR recognition model according to the target feature image, the time sequence neural network and the attention mechanism; and carrying out model training on the first OCR recognition model to obtain the target OCR recognition model.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: sampling and splicing the plurality of first feature images by using the feature pyramid network to obtain a target feature image, wherein the step of sampling and splicing the plurality of first feature images comprises the following steps: acquiring the size of a second characteristic image in the plurality of first characteristic images; according to the size of the second characteristic image, utilizing the characteristic pyramid network to perform downsampling processing on a third characteristic image in the plurality of first characteristic images to obtain a fourth characteristic image; according to the size of the second characteristic image, utilizing the characteristic pyramid network to perform upsampling processing on a fifth characteristic image in the plurality of first characteristic images to obtain a sixth characteristic image; and splicing the second characteristic image, the fourth characteristic image and the sixth characteristic image to obtain the target characteristic image.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: constructing a first OCR recognition model from the target feature image, the temporal neural network, and the attention mechanism comprises: inputting the target characteristic image into a time sequence neural network for encoding, and outputting a first sequence characteristic of the target characteristic image; carrying out weighted summation processing on the first sequence features by using the attention mechanism to obtain second sequence features; and constructing the first OCR recognition model based on the second sequence characteristics and a time-sequence neural network for decoding.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: before model training of the first OCR recognition model to obtain the target OCR recognition model, the method further includes: acquiring a real character sequence of each sample image; acquiring a predicted character sequence of each sample image, wherein the predicted character sequence is a character sequence output by inputting each sample image into the first OCR recognition model; training the first OCR recognition model by using a loss function based on the real character sequence and the predicted character sequence to obtain the target OCR recognition model, wherein the loss function at least comprises: a cross entropy function.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: prior to obtaining the predicted text sequence for each sample image, the method further comprises: converting the data in each sample image into a marker character; and inputting the marked characters into the time-sequence neural network for decoding based on the second sequence characteristics, and outputting a predicted literal sequence of each sample image.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: prior to acquiring the plurality of sample images, the method further comprises: acquiring a first original image corresponding to each sample image; changing the size of the first original image according to a first preset requirement to obtain a first image; and whitening the first image to obtain each sample image.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: prior to acquiring the target image, the method further comprises: acquiring a second original image corresponding to the target image; whitening the second original image to obtain a second image; and changing the size of the second image according to a second preset requirement to obtain the target image, wherein the second preset requirement is different from the first preset requirement.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (11)

1. An OCR recognition method, comprising:
acquiring a target image, wherein the target image is an image to be subjected to OCR recognition;
inputting the target image into a target OCR recognition model, and outputting an OCR recognition result of the target image, wherein the target OCR recognition model is a model constructed based on a residual error network, a feature pyramid network, a time sequence neural network and an attention mechanism, and the time sequence neural network is one of the following: GRU neural networks, LSTM neural networks, and RNN recurrent neural networks.
2. The method of claim 1, prior to inputting the target image into a target OCR recognition model, the method further comprising:
acquiring a plurality of sample images;
processing each sample image by using a plurality of residual networks to obtain a plurality of first feature images;
sampling and stitching the plurality of first feature images by using the feature pyramid network to obtain a target feature image;
constructing a first OCR recognition model according to the target feature image, the time-series neural network and the attention mechanism;
and carrying out model training on the first OCR recognition model to obtain the target OCR recognition model.
3. The method of claim 2, wherein sampling and stitching the plurality of first feature images using the feature pyramid network to obtain a target feature image comprises:
acquiring the size of a second feature image among the plurality of first feature images;
performing, according to the size of the second feature image, downsampling processing on a third feature image among the plurality of first feature images by using the feature pyramid network to obtain a fourth feature image;
performing, according to the size of the second feature image, upsampling processing on a fifth feature image among the plurality of first feature images by using the feature pyramid network to obtain a sixth feature image;
and stitching the second feature image, the fourth feature image and the sixth feature image to obtain the target feature image.
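The resampling-and-stitching step of claim 3 can be illustrated with NumPy under assumed feature-map sizes: a reference map sets the target spatial size, a larger map is downsampled to it, a smaller one is upsampled to it, and the three are concatenated along the channel axis. The nearest-neighbour resize and the concrete shapes are hypothetical choices; the claim does not fix an interpolation method:

```python
import numpy as np

def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) feature map."""
    h, w, _ = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[rows][:, cols]

second = np.ones((8, 32, 4))    # reference ("second") feature image
third  = np.ones((16, 64, 4))   # larger map -> downsampled to reference size
fifth  = np.ones((4, 16, 4))    # smaller map -> upsampled to reference size

fourth = resize_nearest(third, 8, 32)
sixth  = resize_nearest(fifth, 8, 32)
target = np.concatenate([second, fourth, sixth], axis=-1)
print(target.shape)   # (8, 32, 12)
```

The channel count of the stitched result is the sum of the three inputs' channels, which is what makes the fused map usable as a single input to the downstream encoder.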
4. The method of claim 2, wherein constructing a first OCR recognition model from the target feature image, the time-series neural network, and the attention mechanism comprises:
inputting the target feature image into a time-series neural network for encoding, and outputting a first sequence feature of the target feature image;
performing weighted summation processing on the first sequence feature by using the attention mechanism to obtain a second sequence feature;
and constructing the first OCR recognition model based on the second sequence feature and a time-series neural network for decoding.
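The weighted-summation step of claim 4 is, in attention terms, a softmax-normalised combination of the encoder states. A minimal NumPy sketch follows; the dot-product scoring function and the decoder query are assumptions, since the claim names the mechanism but not its scoring form:

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

enc = np.arange(12.0).reshape(4, 3)      # "first sequence feature": 4 steps, 3-dim
query = np.array([1.0, 0.0, 0.0])        # assumed decoder query vector
weights = softmax(enc @ query)           # one attention weight per encoder step
context = weights @ enc                  # weighted summation -> "second sequence feature"
print(context.shape)   # (3,)
```

The weights sum to one, so the context vector is a convex combination of encoder states; the decoder then conditions on this context at each output step.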
5. The method of claim 4, wherein before performing model training on the first OCR recognition model to obtain the target OCR recognition model, the method further comprises:
acquiring a real character sequence of each sample image;
acquiring a predicted character sequence of each sample image, wherein the predicted character sequence is the character sequence output by inputting each sample image into the first OCR recognition model;
training the first OCR recognition model by using a loss function based on the real character sequence and the predicted character sequence to obtain the target OCR recognition model, wherein the loss function comprises at least a cross-entropy function.
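The cross-entropy loss named in claim 5 compares the model's per-character probability distribution with the ground-truth character at each position. A self-contained NumPy version is below; the vocabulary size and logit values are made up for illustration:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of target indices under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.1]])   # scores over an assumed 3-char vocabulary
targets = np.array([0, 1])             # ground-truth character indices
loss = cross_entropy(logits, targets)
```

The loss is strictly positive and shrinks toward zero as the logit of the correct character dominates, which is the signal gradient descent uses to fit the recognition model.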
6. The method of claim 5, wherein before acquiring the predicted character sequence of each sample image, the method further comprises:
converting the data in each sample image into marker characters;
and inputting the marker characters into the time-series neural network for decoding based on the second sequence feature, and outputting the predicted character sequence of each sample image.
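The conversion to marker characters in claim 6 typically means wrapping the ground-truth text in start/end tokens so the decoder can be driven one step at a time (teacher forcing). A minimal sketch, with an assumed vocabulary and `<s>`/`</s>` markers that the patent does not specify:

```python
def to_marked_sequence(text, vocab):
    """Map text to marker-delimited token ids; the ids are illustrative."""
    return [vocab["<s>"]] + [vocab[c] for c in text] + [vocab["</s>"]]

vocab = {"<s>": 0, "</s>": 1, "a": 2, "b": 3}
print(to_marked_sequence("ab", vocab))   # [0, 2, 3, 1]
```

During training the decoder receives this marked sequence shifted by one position as input and predicts the next token, conditioned on the attention context.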
7. The method of claim 2, wherein prior to acquiring the plurality of sample images, the method further comprises:
acquiring a first original image corresponding to each sample image;
changing the size of the first original image according to a first preset requirement to obtain a first image;
and whitening the first image to obtain each sample image.
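The preprocessing of claim 7 (resize to a preset size, then whiten) can be sketched as follows; the target size, nearest-neighbour interpolation, and the epsilon guard are illustrative assumptions, since the claim fixes neither:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image."""
    h, w = img.shape
    return img[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def whiten(img, eps=1e-8):
    """Per-image standardisation: subtract mean, divide by std."""
    return (img - img.mean()) / (img.std() + eps)

raw = np.arange(16.0).reshape(4, 4)      # stand-in "first original image"
sample = whiten(resize_nearest(raw, 2, 2))
print(sample.shape)   # (2, 2)
```

Whitening zero-centres each image at unit variance, which keeps pixel statistics comparable across samples during training; claim 8 applies the same two operations to the inference-time image, only in the opposite order and with a different preset size.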
8. The method of claim 7, wherein prior to acquiring the target image, the method further comprises:
acquiring a second original image corresponding to the target image;
whitening the second original image to obtain a second image;
and changing the size of the second image according to a second preset requirement to obtain the target image, wherein the second preset requirement is different from the first preset requirement.
9. An OCR recognition apparatus, comprising:
a first acquisition unit, configured to acquire a target image, wherein the target image is an image to be subjected to OCR recognition;
a first output unit, configured to input the target image into a target OCR recognition model and output an OCR recognition result of the target image, wherein the target OCR recognition model is a model constructed based on a residual network, a feature pyramid network, a time-series neural network, and an attention mechanism, and the time-series neural network is one of: a GRU neural network, an LSTM neural network, and an RNN recurrent neural network.
10. A computer-readable storage medium, characterized in that the storage medium stores a program, wherein the program, when run, performs the OCR recognition method according to any one of claims 1 to 8.
11. An electronic device comprising one or more processors and memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the OCR recognition method of any one of claims 1 to 8.
CN202210550710.6A 2022-05-20 2022-05-20 OCR recognition method and device, storage medium and electronic equipment Pending CN114898381A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210550710.6A CN114898381A (en) 2022-05-20 2022-05-20 OCR recognition method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210550710.6A CN114898381A (en) 2022-05-20 2022-05-20 OCR recognition method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114898381A true CN114898381A (en) 2022-08-12

Family

ID=82723843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210550710.6A Pending CN114898381A (en) 2022-05-20 2022-05-20 OCR recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114898381A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630979A (en) * 2023-04-10 2023-08-22 雄安创新研究院 OCR (optical character recognition) method, system, storage medium and edge equipment
CN116630979B (en) * 2023-04-10 2024-04-30 雄安创新研究院 OCR (optical character recognition) method, system, storage medium and edge equipment

Similar Documents

Publication Publication Date Title
CN109886330B (en) Text detection method and device, computer readable storage medium and computer equipment
CN111191654A (en) Road data generation method and device, electronic equipment and storage medium
CN111738280A (en) Image identification method, device, equipment and readable storage medium
CN112784857B (en) Model training and image processing method and device
CN111401273A (en) User feature extraction system and device for privacy protection
CN110968689A (en) Training method of criminal name and law bar prediction model and criminal name and law bar prediction method
CN111368795B (en) Face feature extraction method, device and equipment
CN114238904A (en) Identity recognition method, and training method and device of two-channel hyper-resolution model
CN112183542A (en) Text image-based recognition method, device, equipment and medium
CN114898381A (en) OCR recognition method and device, storage medium and electronic equipment
CN116503876A (en) Training method and device of image recognition model, and image recognition method and device
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN111741329B (en) Video processing method, device, equipment and storage medium
CN113762326A (en) Data identification method, device and equipment and readable storage medium
CN110796003B (en) Lane line detection method and device and electronic equipment
CN111539435A (en) Semantic segmentation model construction method, image segmentation equipment and storage medium
CN116798041A (en) Image recognition method and device and electronic equipment
CN117523219A (en) Image processing method and device, electronic equipment and storage medium
CN115239590A (en) Sample image generation method, device, equipment, medium and program product
CN114973271A (en) Text information extraction method, extraction system, electronic device and storage medium
CN114494302A (en) Image processing method, device, equipment and storage medium
CN114639004A (en) Multi-focus image fusion method and device
CN116778534B (en) Image processing method, device, equipment and medium
CN117314938B (en) Image segmentation method and device based on multi-scale feature fusion decoding
CN117095019B (en) Image segmentation method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination