CN111105549A - Optical character recognition method, device and computer storage medium - Google Patents

Optical character recognition method, device and computer storage medium Download PDF

Info

Publication number
CN111105549A
CN111105549A (application CN201911318760.6A)
Authority
CN
China
Prior art keywords
training
neural network
model
detection
character recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911318760.6A
Other languages
Chinese (zh)
Inventor
乐识非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unicloud Nanjing Digital Technology Co Ltd
Original Assignee
Unicloud Nanjing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unicloud Nanjing Digital Technology Co Ltd filed Critical Unicloud Nanjing Digital Technology Co Ltd
Priority to CN201911318760.6A priority Critical patent/CN111105549A/en
Publication of CN111105549A publication Critical patent/CN111105549A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07DHANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20Testing patterns thereon
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07DHANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20Testing patterns thereon
    • G07D7/2008Testing patterns thereon using pre-processing, e.g. de-blurring, averaging, normalisation or rotation

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

An optical character recognition method and apparatus, wherein the method includes: preprocessing an image acquired by an optical device using an SSD network model; detecting text in the preprocessed image using a trained EAST neural network detection model; and recognizing the detection results using a trained CRNN neural network recognition model. The scheme of the invention adopts SSD network segmentation, EAST network detection and CRNN network recognition, and balances invoice type coverage, detection speed and recognition accuracy in the office scene, thereby providing an effective technical solution for deploying intelligent optical character recognition (OCR) in office scenes.

Description

Optical character recognition method, device and computer storage medium
Technical Field
The invention belongs to the field of character recognition, and particularly relates to an optical character recognition method, an optical character recognition device and a computer readable storage medium.
Background
With the rapid development of artificial intelligence technology, the application of character recognition has gradually shifted from simple, research-oriented scenes to complex application scenes closely tied to social activities. Accordingly, the design and use of optical character recognition is gradually shifting from standalone, single-function tools to cloud services. However, existing common optical character recognition (OCR) technology can only complete detection and recognition within a single invoice type; once the invoice background is highly noisy or the invoice types differ greatly, existing OCR technology struggles to separate the boundaries of the various invoices from the background, making it unsuitable for office-scene optical character recognition.
Current OCR technology is mainly applied in office scenes and in natural scenes. For natural scenes, one-stage detection techniques represented by the YOLO series dominate existing detection practice, but they suffer from low recall for text at different scales. For common office-scene text detection, the prior art typically works for only one type of invoice; when clustering methods are used to recognize multiple invoice types, invoices of different types cannot be distinguished with high accuracy.
Disclosure of Invention
In view of the above-mentioned deficiencies of the prior art, one objective of the present invention is to solve the problem that the prior art cannot distinguish invoices of different types with high accuracy, and to avoid the defect of low recall for text at different scales.
The embodiment of the invention discloses an optical character recognition method, which comprises the steps of preprocessing an image acquired by optical equipment through an SSD network model; detecting the preprocessed image by adopting an EAST neural network detection model obtained by training; and identifying the detection result by adopting a CRNN neural network identification model obtained by training.
In one possible embodiment, the pre-processing includes data cleansing and dataset preparation of the image; and carrying out image segmentation processing by using the trained SSD network model.
In one possible embodiment, the EAST neural network detection model is obtained as follows: a detection model is pre-trained by changing the data set path, adjusting the training parameters according to available resources, clearing previous pre-training state, and starting the training process under the multi-window terminal manager tmux; the pre-trained parameters are then saved and restored for retraining to obtain the final detection model.
In one possible embodiment, the CRNN neural network recognition model is obtained as follows: the images to be recognized are placed under the same path and cropped according to the detection results to obtain the data to be recognized; then the data set path is changed, training parameters are adjusted, previous pre-training state is cleared, the training process is started under the multi-window terminal manager tmux, and the recognition model is obtained by training.
In a possible embodiment, the method further comprises verifying the detection result and the identification result.
In one possible embodiment, the method further comprises optimizing the EAST neural network detection model and the CRNN neural network identification model according to the verification result.
The embodiment of the invention also discloses an optical character recognition device, which comprises a preprocessing module, a data processing module and a data processing module, wherein the preprocessing module is used for preprocessing the image acquired by the optical equipment through the SSD network model; the detection module is used for detecting the preprocessed image by adopting an EAST neural network detection model obtained by training; and the recognition module is used for recognizing the detection result by adopting a CRNN neural network recognition model obtained by training.
In one possible embodiment, the preprocessing module is further configured to: performing data cleaning and data set production on the image; and carrying out image segmentation processing by using the trained SSD network model.
In one possible embodiment, the apparatus further comprises a verification module for optimizing the EAST neural network detection model and the CRNN neural network recognition model according to the verification results.
The invention also discloses a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements any of the methods described above.
The invention has the following beneficial effects: the scheme of the invention adopts SSD network segmentation, EAST network detection and CRNN network recognition, and balances invoice type coverage, detection speed and recognition accuracy in the office scene, thereby providing an effective technical solution for deploying intelligent optical character recognition (OCR) in office scenes.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a specific method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a cloud service environment deployment architecture according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention.
The embodiment of the invention discloses an optical character recognition method, which comprises the following steps:
s101, preprocessing an image acquired by the optical equipment through an SSD network model.
The preprocessing comprises performing data cleaning and data set production on the image, and performing image segmentation using the trained SSD network model. SSD stands for Single Shot MultiBox Detector: "Single Shot" indicates that the SSD algorithm is a one-stage method, and "MultiBox" indicates that SSD performs multi-box prediction.
Specifically, referring to fig. 2, the acquired image is preprocessed, that is, data samples are prepared. This usually involves two sub-processes: a data processing sub-process and an original-sample segmentation sub-process.
The data processing sub-process mainly comprises data acquisition, data cleaning and data set production processes.
The data acquisition process mainly comprises applying to the relevant departments for invoice data, sampling on-site data after sampling permission is obtained, performing simple normalization and organization of the acquired data, and grading the samples by quality to complete a coarse-grained data analysis.
The data cleaning process comprises fine-grained cleaning of the coarsely cleaned data along the following dimensions: pictures that do not meet minimum size, resolution and area-occupancy requirements are filtered out. The cleaning must achieve the following goals: the key fields of the invoice are clear, the illumination is uniform, and the invoice has no distortion, ink stains, obvious fold marks or ghosting, with a smooth, wrinkle-free surface.
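The filtering dimensions above can be sketched as a simple predicate. The threshold values below (minimum side, minimum pixel count, minimum occupied proportion) are illustrative assumptions for the sketch, not values from the patent.

```python
# Illustrative coarse filter for invoice images; all thresholds are
# assumed values for this sketch, not taken from the patent.
def passes_cleaning(width, height, occupied_ratio,
                    min_side=300, min_pixels=640 * 480, min_ratio=0.2):
    """Return True if an image meets the minimum dimension,
    resolution and occupied-proportion requirements."""
    if min(width, height) < min_side:      # minimum dimension
        return False
    if width * height < min_pixels:        # minimum resolution
        return False
    if occupied_ratio < min_ratio:         # invoice occupies too little of the frame
        return False
    return True

samples = [(1200, 800, 0.6), (200, 150, 0.9), (1600, 1200, 0.05)]
kept = [s for s in samples if passes_cleaning(*s)]
```

Only the first sample survives: the second fails the size check and the third fails the occupancy check.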
The data set production requires a standard data set format; the data can be produced in a VOC-like format comprising the following four parts: Annotations holds the labeled data, JPEGImages contains the images in jpg format, Scores contains the data samples for each quality grade, and Layout contains the sample lists for training, training-validation and validation.
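Producing the Layout sample lists (train, validation and the combined train-validation list) can be sketched as follows; the 80/20 split ratio and the zero-padded sample ids are assumptions for illustration.

```python
# Sketch of producing the Layout sample lists for a VOC-like data set;
# the split ratio and id format are illustrative assumptions.
import random

def make_layout(sample_ids, train_frac=0.8, seed=0):
    """Shuffle the sample ids deterministically and split them into
    train, val and trainval lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    train, val = ids[:cut], ids[cut:]
    return {"train": train, "val": val, "trainval": train + val}

layout = make_layout([f"{i:06d}" for i in range(100)])
```

Each list could then be written as one sample id per line into the Layout directory.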
Second, the original-sample segmentation sub-process mainly comprises SSD interface design and SSD image segmentation training. In the SSD interface design, successful invoice segmentation relies on sorting both the OCR results and the labels by the output order of the same SSD boxes, which guarantees the relative ordering of the large boxes. After the SSD interface design is finished, SSD image segmentation training is performed, and the model obtained by training is used to distinguish the coarse-grained invoice types.
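The idea of sorting both OCR results and labels by the same SSD box order can be sketched as deriving one canonical ordering from the boxes (here top-to-bottom, then left-to-right) and applying it to every parallel list. The (x, y, w, h) box format and the field names are assumptions for the sketch.

```python
# Sketch: derive one canonical ordering from the SSD boxes and apply it
# to both the crops and the labels, so the two stay aligned.
def box_order(boxes):
    """Indices that sort boxes top-to-bottom, then left-to-right.
    Each box is assumed to be (x, y, w, h)."""
    return sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))

boxes  = [(50, 200, 80, 30), (10, 10, 80, 30), (200, 10, 80, 30)]
labels = ["amount", "code", "number"]   # hypothetical field labels

order = box_order(boxes)
sorted_labels = [labels[i] for i in order]
```

Because the same index order would also be applied to the cropped sub-images, labels and crops cannot drift out of alignment.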
And S102, detecting the preprocessed image by adopting an EAST neural network detection model obtained by training.
The detection model needs to be trained before detection. The EAST neural network detection model is obtained as follows: a detection model is pre-trained by changing the data set path, adjusting the training parameters according to available resources, clearing previous pre-training state, and starting the training process under the multi-window terminal manager tmux; the pre-trained parameters are then saved and restored for retraining to obtain the final detection model.
Specifically, referring to fig. 2, the EAST detection model training is divided into pre-training and retraining. Pre-training comprises, in the EAST pre-training stage, changing the data set path, adjusting the training parameters on the multi-core V100 according to available resources, clearing previous pre-training state, starting the training process under tmux, and training to obtain a detection model. Retraining comprises, in the EAST retraining stage, keeping the pre-training checkpoint, inputting the relevant images and corresponding json files, and completing retraining by restoring the pre-trained parameters.
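The restore-then-retrain step can be sketched framework-agnostically: keep only the checkpoint parameters whose names and shapes match the current model, a common pattern when retraining from a checkpoint. The variable names and the list-based weight representation are hypothetical simplifications, not the patent's actual implementation.

```python
# Sketch of restoring pre-trained parameters before retraining: reuse
# checkpoint entries matching the current model by name and shape;
# everything else keeps its fresh initialization.
def restore_matching(model_vars, checkpoint):
    restored = {}
    for name, value in model_vars.items():
        ckpt_val = checkpoint.get(name)
        if ckpt_val is not None and len(ckpt_val) == len(value):
            restored[name] = ckpt_val      # reuse pre-trained weights
        else:
            restored[name] = value         # keep fresh initialization
    return restored

model = {"conv1/w": [0.0] * 4, "head/w": [0.0] * 2}   # hypothetical variables
ckpt  = {"conv1/w": [1.0] * 4, "head/w": [9.0] * 3}   # head shape differs
params = restore_matching(model, ckpt)
```

Here the backbone weights are restored from the checkpoint, while the mismatched head keeps its fresh initialization.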
And S103, identifying the detection result by adopting a CRNN neural network identification model obtained by training.
The CRNN neural network recognition model is obtained as follows: the images to be recognized are placed under the same path and cropped according to the detection results to obtain the data to be recognized; then the data set path is changed, training parameters are adjusted, previous pre-training state is cleared, the training process is started under the multi-window terminal manager tmux, and the recognition model is obtained by training. CRNN is a convolutional recurrent neural network architecture designed for image-based sequence recognition, in particular scene text recognition.
Specifically, referring to fig. 2, the recognition model training is divided into label generation and training. Label generation comprises placing the invoices to be recognized under the same folder path, cropping sub-images from the QUAD eight-point coordinates in the detection results, packaging each sub-image with its corresponding label to form label and path files, and then modifying the CRNN training label set to avoid automatic escaping of predicted characters. Training comprises, in the CRNN pre-training stage, changing the data set path, adjusting the training parameters on the multi-core V100 according to available resources, clearing previous pre-training state, starting the training process under tmux, and training to obtain the recognition model.
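Cropping a sub-image from QUAD eight-point coordinates can be sketched with the axis-aligned bounding box of the four corner points; a full implementation would typically use a perspective warp for tilted quads. The nested-list image representation is purely illustrative.

```python
# Sketch: take the axis-aligned bounding box of a QUAD's four corners
# (x1, y1, ..., x4, y4) and crop it from an image stored as rows of pixels.
def crop_quad(image, quad):
    xs, ys = quad[0::2], quad[1::2]        # separate x and y coordinates
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]

# Toy 5x6 "image" whose pixel at (row r, col c) is 10*r + c.
image = [[10 * r + c for c in range(6)] for r in range(5)]
quad = (1, 1, 4, 1, 4, 3, 1, 3)            # roughly rectangular QUAD
sub = crop_quad(image, quad)
```

The cropped sub-image here spans rows 1 to 3 and columns 1 to 4 of the toy image.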
After the corresponding models, namely the EAST neural network detection model and the CRNN neural network recognition model, are obtained, they can be verified to finally produce a detection and recognition analysis report. Specifically, referring to fig. 2, the verification comprises detection model verification: checking specific detection results and checking macroscopic detection metrics, where the former includes the bounding boxes of the invoice's code, number, date, time, mileage and amount, and the latter includes field-level precision, recall and F1 score; and recognition model verification: checking specific recognition results and checking macroscopic recognition metrics, where the former includes the specific field values of the invoice's code, number, date, time, mileage and amount, and the latter includes field-level precision, recall and F1 score.
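The field-level metrics used in both verification steps, precision (the "correct rate"), recall and F1, can be computed from the true-positive, false-positive and false-negative counts; the counts below are illustrative.

```python
# Field-level precision, recall and F1 score for the verification step.
def prf1(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts for one field, e.g. the invoice number.
p, r, f1 = prf1(true_positives=80, false_positives=20, false_negatives=20)
```

With these counts all three metrics come out to 0.8, since precision and recall are equal.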
The method also comprises an improvement process, mainly divided into data quality improvement and algorithm improvement. Referring to fig. 2, data quality improvement may, for under-sampled categories, mainly adopt a supplementary invoice resampling strategy; for data samples of specific applications, data quality is mainly improved by image-processing means such as data augmentation. Algorithm improvement can be divided into API-level image-processing improvements and core-algorithm improvements: the core algorithms for object detection, clustering, text detection and text recognition are selected at the macroscopic level, while image operations are performed at the API level.
The method may be released to a cloud server. Referring to fig. 3, the corresponding cloud server design includes a basic deployment environment and a cluster deployment environment, where the basic deployment environment may include: 1) deploying the Docker environment: install standard docker and configure permissions, create a docker group, add the current user to the group, and install nvidia-docker; 2) building a Docker image and uploading it to a repository: first register an account on Docker Hub, create a repository after registration, then build the Docker image locally and push it to the repository; 3) installing a deep learning image in the cluster using Docker: first download the deep learning image, then create the deep learning runtime container, and finally start the container to complete the deployment of the basic container.
Deploying the cluster environment may include: 1) installing the deep learning cluster framework components: after the deep learning image is installed, install the cluster framework components and deploy the K8S client and server on the relevant servers respectively; 2) creating the K8S deployment and service: the deployment consists of 3 server replicas controlled by a Kubernetes Deployment; if the deployment and pod states are checked and shown as Running, the K8S deployment and service have been created successfully; 3) invoking the K8S deployment and service: package the office-scene optical character recognition as a cloud service and release it on a public cloud to complete the office-scene OCR deployment.
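The 3-replica K8S deployment described above can be sketched as a manifest built as a plain dictionary; the service name, labels and container image are hypothetical placeholders, not values from the patent.

```python
# Sketch of a Kubernetes Deployment manifest with 3 replicas for the OCR
# service; the name, labels and image are illustrative assumptions.
ocr_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "ocr-service"},
    "spec": {
        "replicas": 3,                                # 3 server replicas
        "selector": {"matchLabels": {"app": "ocr"}},
        "template": {
            "metadata": {"labels": {"app": "ocr"}},   # must match the selector
            "spec": {
                "containers": [{
                    "name": "ocr",
                    "image": "example/ocr-service:latest",  # hypothetical image
                    "ports": [{"containerPort": 8080}],
                }],
            },
        },
    },
}
```

Serialized to YAML and applied with kubectl, a manifest of this shape would produce the 3 Running pods the text describes checking for.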
By this method, better office-scene character recognition results can be obtained while balancing invoice type coverage, detection speed and recognition accuracy in the office scene. By packaging a simple character recognition service as a cloud service, the invention aims to provide broader services to more users, and constructs an office-scene-oriented optical character recognition cloud service.
The embodiment of the present invention further discloses an optical character recognition apparatus 10, as shown in fig. 4, including: a preprocessing module 101, configured to preprocess an image acquired by an optical device through an SSD network model; the detection module 102 is configured to detect the preprocessed image by using an EAST neural network detection model obtained through training; and the recognition module 103 is configured to recognize the detection result by using the trained CRNN neural network recognition model.
In one embodiment, the preprocessing module 101 is further configured to: performing data cleaning and data set production on the image; and carrying out image segmentation processing by using the trained SSD network model.
In one embodiment, the apparatus further comprises a verification module for optimizing the EAST neural network detection model and the CRNN neural network recognition model according to the verification result.
For the specific implementation of the apparatus 10, reference may be made to the method embodiment, which is not described in detail.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments are merely illustrative, and for example, a division of a unit is merely a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
While the invention has been described in terms of its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. An optical character recognition method is characterized in that an image acquired by an optical device is preprocessed through an SSD network model; detecting the preprocessed image by adopting an EAST neural network detection model obtained by training; and identifying the detection result by adopting a CRNN neural network identification model obtained by training.
2. The method of claim 1, wherein the pre-processing comprises data cleansing and dataset production of the image; and carrying out image segmentation processing by using the trained SSD network model.
3. The method of claim 1 or 2, wherein the EAST neural network detection model is obtained by: pre-training to obtain a detection model by changing a data set path, adjusting training parameters according to resources, cleaning pre-training, and starting a training process under a multi-window manager tmux; the pre-trained parameters are stored for retraining to obtain a detection model.
4. The method of claim 1 or 2, wherein the CRNN neural network recognition model is obtained by: placing the images to be recognized under the same path, and then cropping the images according to the detection results to obtain the data to be recognized; changing the data set path, adjusting training parameters, clearing previous pre-training state, starting the training process under the multi-window manager tmux, and training to obtain the recognition model.
5. The method of claim 1, further comprising verifying the detection result and the identification result.
6. The method of claim 1 or 5, further comprising optimizing the EAST neural network detection model and the CRNN neural network identification model based on the validation results.
7. An optical character recognition device is characterized by comprising a preprocessing module, a character recognition module and a character recognition module, wherein the preprocessing module is used for preprocessing an image acquired by an optical device through an SSD network model; the detection module is used for detecting the preprocessed image by adopting an EAST neural network detection model obtained by training; and the recognition module is used for recognizing the detection result by adopting a CRNN neural network recognition model obtained by training.
8. The apparatus of claim 7, wherein the pre-processing module is further to: performing data cleaning and data set production on the image; and carrying out image segmentation processing by using the trained SSD network model.
9. The apparatus of claim 7, further comprising a verification module to optimize the EAST neural network detection model and the CRNN neural network identification model based on a verification result.
10. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the optical character recognition method of any of the preceding claims 1-6.
CN201911318760.6A 2019-12-19 2019-12-19 Optical character recognition method, device and computer storage medium Pending CN111105549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911318760.6A CN111105549A (en) 2019-12-19 2019-12-19 Optical character recognition method, device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911318760.6A CN111105549A (en) 2019-12-19 2019-12-19 Optical character recognition method, device and computer storage medium

Publications (1)

Publication Number Publication Date
CN111105549A true CN111105549A (en) 2020-05-05

Family

ID=70422173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911318760.6A Pending CN111105549A (en) 2019-12-19 2019-12-19 Optical character recognition method, device and computer storage medium

Country Status (1)

Country Link
CN (1) CN111105549A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358608A (en) * 2017-08-23 2017-11-17 西安邮电大学 Bone tissue geometric state parameter auto-testing device and method based on image processing techniques
CN108921166A (en) * 2018-06-22 2018-11-30 深源恒际科技有限公司 Medical bill class text detection recognition method and system based on deep neural network
CN109919317A (en) * 2018-01-11 2019-06-21 华为技术有限公司 A kind of machine learning model training method and device
CN110210542A (en) * 2019-05-24 2019-09-06 厦门美柚信息科技有限公司 Picture character identification model training method, device and character identification system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
许庆志: "Traffic sign recognition and implementation based on deep learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254158A (en) * 2021-06-11 2021-08-13 苏州浪潮智能科技有限公司 Deployment method and device of deep learning system
WO2022257303A1 (en) * 2021-06-11 2022-12-15 苏州浪潮智能科技有限公司 Method and apparatus for deploying deep learning system
CN113435437A (en) * 2021-06-24 2021-09-24 随锐科技集团股份有限公司 Method and device for identifying state of switch on/off indicator and storage medium
CN115588207A (en) * 2022-10-13 2023-01-10 成都卓视智通科技有限公司 Monitoring video date recognition method based on OCR

Similar Documents

Publication Publication Date Title
US11631234B2 (en) Automatically detecting user-requested objects in images
CN110348441B (en) Value-added tax invoice identification method and device, computer equipment and storage medium
CN109635110A (en) Data processing method, device, equipment and computer readable storage medium
CN109086756A (en) A kind of text detection analysis method, device and equipment based on deep neural network
CN111652232B (en) Bill identification method and device, electronic equipment and computer readable storage medium
CN111105549A (en) Optical character recognition method, device and computer storage medium
CN109460769A (en) A kind of mobile end system and method based on table character machining and identification
WO2017088537A1 (en) Component classification method and apparatus
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN109858414A (en) A kind of invoice piecemeal detection method
CN110490238A (en) A kind of image processing method, device and storage medium
CN110348511A (en) A kind of picture reproduction detection method, system and electronic equipment
CN109886147A (en) A kind of more attribute detection methods of vehicle based on the study of single network multiple-task
CN109934255A (en) A kind of Model Fusion method for delivering object Classification and Identification suitable for beverage bottle recycling machine
CN114387499A (en) Island coastal wetland waterfowl identification method, distribution query system and medium
CN111522951A (en) Sensitive data identification and classification technical method based on image identification
CN111144215A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112132776A (en) Visual inspection method and system based on federal learning, storage medium and equipment
US11600088B2 (en) Utilizing machine learning and image filtering techniques to detect and analyze handwritten text
CN108460277A (en) A kind of automation malicious code mutation detection method
CN112766255A (en) Optical character recognition method, device, equipment and storage medium
CN113221947A (en) Industrial quality inspection method and system based on image recognition technology
CN109376868A (en) Information management system
CN109145723A (en) A kind of seal recognition methods, system, terminal installation and storage medium
CN113360737A (en) Page content acquisition method and device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505