CN112348024A

CN112348024A - Image-text identification method and system based on deep learning optimization network

Info

Publication number: CN112348024A
Application number: CN202011178476.6A
Authority: CN
Inventors: 戴亦斌
Original assignee: Beijing Information Technology Bote Intelligent Technology Co ltd
Current assignee: Beijing Information Technology Bote Intelligent Technology Co ltd
Priority date: 2020-10-29
Filing date: 2020-10-29
Publication date: 2021-02-09

Abstract

The invention discloses an image-text recognition method and system based on a deep learning optimization network, belonging to the technical field of optical character recognition. The method comprises at least the following steps. Step one: identifying an object in a single-frame image by a deep learning target detection technique. Step two: matting the picture of the object out of the image by means of a matting model and an alignment model, and aligning it. Step three: performing OCR recognition on the whole picture. Step four: sending the character recognition result obtained by OCR recognition into an NLP correction model built on deep learning natural language processing for correction, and finally outputting the character recognition result. By using a deep learning target detection technique and building a correction model for inaccurate text, the invention provides a processing technique that rapidly recognizes whole blocks of characters in photos and videos and can mark the whole blocks of characters in a whole photo or a whole video frame, thereby saving the system resources of OCR processing and greatly improving character recognition efficiency.

Description

Image-text identification method and system based on deep learning optimization network

Technical Field

The invention belongs to the technical field of optical character recognition, and particularly relates to an image-text recognition method and system based on a deep learning optimization network.

Background

As is well known, OCR (Optical Character Recognition) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by a character recognition method. That is, for printed characters, the characters in a paper document are optically converted into a black-and-white dot-matrix image file, and recognition software converts the characters in the image into a text format for further editing and processing by word-processing software. How to correct errors or use auxiliary information to improve recognition accuracy is the most important issue in OCR, and the term ICR (Intelligent Character Recognition) arose accordingly. The main indicators for measuring the performance of an OCR system are the rejection rate, the false recognition rate, the recognition speed, the friendliness of the user interface, product stability, usability, feasibility, and the like.
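As an illustration of the first two indicators, the following minimal sketch computes a rejection rate and a false recognition rate from per-character results. The function name `ocr_error_rates`, the input format, and the convention of dividing both counts by the total number of characters are assumptions made for this example; the patent itself does not define the indicators.

```python
from typing import Optional, Sequence, Tuple


def ocr_error_rates(
    pairs: Sequence[Tuple[str, Optional[str]]]
) -> Tuple[float, float]:
    """Return (rejection_rate, false_recognition_rate).

    Each pair is (true_char, recognized_char). A recognized_char of None
    means the recognizer rejected the character (assumed convention).
    """
    total = len(pairs)
    if total == 0:
        raise ValueError("no characters to evaluate")
    # Characters for which the recognizer gave no answer.
    rejected = sum(1 for _, rec in pairs if rec is None)
    # Characters that were answered, but with the wrong symbol.
    wrong = sum(1 for truth, rec in pairs if rec is not None and rec != truth)
    return rejected / total, wrong / total
```

Other conventions exist, for example dividing the misrecognized count by the number of accepted characters only; the choice should be stated whenever the rates are reported.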

Referring to FIG. 1, conventional OCR recognition technology usually locates individual text blocks first. These blocks are typically numerous, and many small blocks must be stitched together, which wastes a great deal of system resources and greatly reduces character recognition efficiency.

Disclosure of Invention

In view of the problems in the prior art, the invention provides an image-text recognition method and system based on a deep learning optimization network. By using a deep learning target detection technique and building a correction model for inaccurate text, it provides a processing technique that rapidly recognizes whole blocks of characters in photos and videos and can mark the whole blocks of characters in a whole photo or a whole video frame, thereby saving the system resources of OCR processing and greatly improving character recognition efficiency.

The first purpose of the invention is to provide an image-text recognition method based on a deep learning optimization network, which comprises the following steps:

step one: identifying an object in a single-frame image by a deep learning target detection technique;

step two: matting the picture of the object out of the image by means of a matting model and an alignment model, and aligning it;

step three: performing OCR recognition on the whole picture;

step four: sending the character recognition result obtained by OCR recognition into an NLP correction model built on deep learning natural language processing for correction, and finally outputting the character recognition result.

Preferably, the specific steps of establishing the NLP correction model based on deep learning natural language processing are as follows:

firstly, initializing a deep artificial neural network by utilizing a corpus accumulated in the early stage;

then, the network is trained on a collated data set comprising the recognition-process information from whole-block OCR recognition, the error information from manual correction of the NLP input text, the records of the correction process, and the aligned target-object information together with the low-accuracy text, and the weights of the deep artificial neural network are adjusted by means of a suitably chosen loss function.

Preferably, the single frame image is a single picture in a photo album or a single frame picture in a video.

The second purpose of the invention is to provide an image-text recognition system based on a deep learning optimization network, which comprises at least:

an object identification module: identifying an object in a single-frame image by a deep learning target detection technique;

an alignment module: matting the picture of the object out of the image by means of a matting model and an alignment model, and aligning it;

an OCR recognition module: performing OCR recognition on the whole picture;

a correction module: sending the character recognition result obtained by OCR recognition into an NLP correction model built on deep learning natural language processing for correction, and finally outputting the character recognition result.

Preferably, the single frame image is a single picture of a photo album or a single frame picture in a video.

The third purpose of the invention is to provide a computer program for realizing the image-text recognition method based on the deep learning optimization network.

The fourth purpose of the invention is to provide an information data processing terminal for realizing the image-text identification method based on the deep learning optimization network.

The fifth purpose of the invention is to provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to execute the image-text recognition method based on the deep learning optimization network.

In summary, the advantages and positive effects of the invention are:

With the technical scheme of the invention, whole blocks of characters in photos and videos can be recognized rapidly, and the whole blocks of characters in a whole photo or a whole video frame can be marked, thereby saving the system resources of OCR processing and greatly improving character recognition efficiency.

Drawings

FIG. 1 is a flow chart of a conventional solution;

FIG. 2 is a flow chart of a preferred embodiment of the present invention;

FIG. 3 is a flow chart of the establishment of the NLP correction model in the preferred embodiment of the present invention;

FIG. 4 is a flow chart of NLP application in the preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to FIG. 2, an image-text recognition method based on a deep learning optimization network includes the following steps:

1) Objects are first identified in an image or video frame by a deep learning target detection technique.

2) Through a plurality of background and target-object matting models and alignment models, the picture of each identified object is matted out and aligned.

3) OCR recognition is performed on the aligned object picture as a whole, instead of on single characters. Because whole-picture OCR recognition is subject to a large amount of interference and distortion, the characters recognized at this stage may be highly inaccurate.

4) The inaccurate character recognition result of the previous step is sent into an NLP (Natural Language Processing) correction model built on deep learning natural language processing for correction, and a more accurate character recognition result is finally output.
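The four steps above can be sketched as a simple pipeline. This is a minimal illustration only: the names `DetectedObject`, `recognize_frame`, and the four callables are hypothetical, and each callable stands in for a trained model (detector, matting and alignment models, OCR engine, NLP correction model) that the patent does not specify in code.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Frame = Any    # a single photo or video frame, in whatever form the models expect
Picture = Any  # the matted-out and aligned picture of one object


@dataclass
class DetectedObject:
    """One object found by the detector (hypothetical structure)."""
    label: str
    box: Tuple[int, int, int, int]  # assumed (top, left, bottom, right)


def recognize_frame(
    frame: Frame,
    detect: Callable[[Frame], List[DetectedObject]],
    matte_and_align: Callable[[Frame, DetectedObject], Picture],
    ocr: Callable[[Picture], str],
    correct: Callable[[str], str],
) -> List[str]:
    """Run steps 1) to 4) on one frame and return one text per object."""
    results = []
    for obj in detect(frame):                   # step 1: target detection
        picture = matte_and_align(frame, obj)   # step 2: matting and alignment
        raw_text = ocr(picture)                 # step 3: whole-picture OCR
        results.append(correct(raw_text))       # step 4: NLP correction
    return results
```

Passing the models in as callables keeps the sketch independent of any particular detection, OCR, or language-model library.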

The specific steps of establishing the NLP correction model based on deep learning natural language processing are as follows:

1) First, a deep artificial neural network (DNN) is initialized using a previously accumulated corpus.

2) The DNN is then trained on a collated data set comprising the recognition-process information from whole-block OCR recognition, the error information from manual correction of the NLP input text, the records of the correction process, and the aligned target-object information together with the low-accuracy text; the weights of the DNN are adjusted by means of a suitably chosen loss function.
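The two steps above (initialization from a corpus, then weight adjustment through a loss function) can be illustrated with a deliberately tiny stand-in. The sketch below is not the patent's DNN: it is a single softmax layer that maps each OCR character to a corrected character, with biases initialized from corpus character frequencies and weights trained by gradient descent on a cross-entropy loss. All names (`init_from_corpus`, `train`, `correct`), the character-level formulation, and the hyperparameters are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def init_from_corpus(corpus: str, alphabet: str) -> Dict:
    """Step 1: initialize the model from a corpus (smoothed log-frequencies)."""
    counts = Counter(ch for ch in corpus if ch in alphabet)
    total = sum(counts.values()) + len(alphabet)  # add-one smoothing
    bias = [math.log((counts[ch] + 1) / total) for ch in alphabet]
    n = len(alphabet)
    # weights[i][j]: learned evidence that OCR character i is really character j
    weights = [[0.0] * n for _ in range(n)]
    return {"alphabet": alphabet, "bias": bias, "weights": weights}


def _probs(model: Dict, ocr_char: str) -> List[float]:
    """Softmax over candidate corrections for one OCR character."""
    i = model["alphabet"].index(ocr_char)
    scores = [w + b for w, b in zip(model["weights"][i], model["bias"])]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train(model: Dict, pairs: Sequence[Tuple[str, str]],
          epochs: int = 50, lr: float = 0.5) -> List[float]:
    """Step 2: adjust weights by gradient descent on cross-entropy loss.

    pairs holds (ocr_char, manually_corrected_char); returns mean loss per epoch.
    """
    alphabet = model["alphabet"]
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for ocr_char, true_char in pairs:
            i, j = alphabet.index(ocr_char), alphabet.index(true_char)
            probs = _probs(model, ocr_char)
            total_loss += -math.log(probs[j])
            for k in range(len(alphabet)):
                # Gradient of cross-entropy w.r.t. the score of class k.
                grad = probs[k] - (1.0 if k == j else 0.0)
                model["weights"][i][k] -= lr * grad
        history.append(total_loss / len(pairs))
    return history


def correct(model: Dict, text: str) -> str:
    """Replace each known character by its most probable correction."""
    alphabet = model["alphabet"]
    out = []
    for ch in text:
        if ch not in alphabet:
            out.append(ch)  # leave unknown symbols (e.g. spaces) unchanged
            continue
        probs = _probs(model, ch)
        out.append(alphabet[max(range(len(alphabet)), key=probs.__getitem__)])
    return "".join(out)
```

A real correction model would use sentence context and the additional inputs listed in step 2 (recognition-process information, correction records, object information); the sketch keeps only the low-accuracy text and its manual correction to show the initialize-then-train pattern.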

The use of the trained NLP correction model built on deep learning natural language processing is shown in FIG. 4: the image or video frame requiring character recognition is fed to the trained DNN as input, and the DNN outputs corrected, more accurate text.

The artificial neural network (DNN) referred to in FIGS. 3 and 4 of the present invention includes, but is not limited to, the following networks or combinations thereof: CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), GAN (Generative Adversarial Network), LSTM (Long Short-Term Memory), etc. The sub-networks it contains include, but are not limited to, combinations of the following methods: R-CNN (Region-based CNN), Fast R-CNN, Mask R-CNN, etc. Such networks or sub-networks, with or without attention mechanisms, are encompassed by the present invention.

1. The invention provides a technique for recognizing characters in an image or a video by means of deep learning target detection, which is completely different from the way conventional OCR technology recognizes characters in images or videos.

2. The conventional OCR method performs the text-block recognition step directly on the image or video frame and does not use a target detection method to identify specific objects (other than text blocks). Using a target detection method to identify specific objects in an OCR task is an innovation of the invention and is within its scope of protection.

3. Because it does not identify objects, the conventional OCR method does not mat out and align the image of a specific object within the image (operations such as rotating or aligning the whole image are not counted here).

4. The conventional OCR method focuses only on each individual character in an image or video frame and does not combine object information to recognize the characters within an object as a whole block.

The invention designs a dedicated deep neural network construction method for whole-block character recognition combined with target detection. Its construction steps, network structure or sub-structures, and training and application methods are innovative. For image-text recognition tasks and OCR tasks on images or video frames, any approach that uses steps similar to those of the invention and that, when building an artificial neural network, uses a network structure and a training and application method similar to those shown in FIGS. 3 and 4 falls within the scope of protection of this patent.

Referring to FIG. 3, a second preferred embodiment is an image-text recognition system based on a deep learning optimization network, including:

An object identification module: objects are first identified in an image or video frame by a deep learning target detection technique.

An alignment module: through a plurality of background and target-object matting models and alignment models, the picture of each identified object is matted out and aligned.

An OCR recognition module: OCR recognition is performed on the aligned object picture as a whole, instead of on single characters. Because whole-picture OCR recognition is subject to a large amount of interference and distortion, the characters recognized at this stage may be highly inaccurate.

A correction module: the inaccurate character recognition result of the previous step is sent into an NLP correction model built on deep learning natural language processing for correction, and a more accurate character recognition result is finally output.
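The role of the alignment module can be illustrated on a toy image stored as nested lists. This is a simplified sketch under stated assumptions: the bounding box is taken as already known from the object identification module, matting is reduced to rectangular cropping, and alignment is reduced to rotation by multiples of 90 degrees. The function names `crop`, `rotate90_cw`, and `align` are hypothetical; a learned matting or alignment model would replace them in practice.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out the object region; box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def rotate90_cw(image: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def align(image: Image, quarter_turns: int) -> Image:
    """Bring the object upright by applying clockwise quarter turns."""
    for _ in range(quarter_turns % 4):
        image = rotate90_cw(image)
    return image
```

For example, `align(crop(image, box), 1)` cuts the object out of the frame and turns it a quarter turn clockwise before it is handed to the OCR recognition module.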

In a third preferred embodiment, a computer program for implementing a deep learning optimization network-based image-text recognition method includes the following steps:

4) The inaccurate character recognition result of the previous step is sent into an NLP correction model built on deep learning natural language processing for correction, and a more accurate character recognition result is finally output.

1) First, a deep artificial neural network (DNN) is initialized using a previously accumulated corpus.

The fourth preferred embodiment is an information data processing terminal for realizing the image-text identification method based on the deep learning optimization network. The image-text identification method based on the deep learning optimization network comprises the following steps:

A fifth preferred embodiment is a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the following image-text recognition method based on a deep learning optimization network:

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. An image-text recognition method based on a deep learning optimization network, characterized by comprising at least the following steps:

step three: performing OCR recognition on the whole picture;

2. The image-text recognition method based on the deep learning optimization network according to claim 1, characterized in that the specific steps of establishing the NLP correction model based on deep learning natural language processing are as follows:

3. The image-text recognition method based on the deep learning optimization network according to claim 1 or 2, wherein the single-frame image is a single picture in a photo set or a single frame picture in a video.

4. An image-text recognition system based on a deep learning optimization network, characterized by comprising at least the following modules:

an OCR recognition module: performing OCR recognition on the whole picture;

5. The image-text recognition system based on the deep learning optimization network according to claim 4, characterized in that the specific steps of establishing the NLP correction model based on deep learning natural language processing are as follows:

6. The image-text recognition system based on the deep learning optimization network according to claim 4 or 5, wherein the single-frame image is a single picture in a photo set or a single frame picture in a video.

7. A computer program for implementing the image-text recognition method based on the deep learning optimization network according to any one of claims 1 to 3.

8. An information data processing terminal for implementing the image-text recognition method based on the deep learning optimization network according to any one of claims 1 to 3.

9. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the image-text recognition method based on the deep learning optimization network according to any one of claims 1 to 3.