CN112733858B - Image character rapid identification method and device based on character region detection - Google Patents


Info

Publication number
CN112733858B
CN112733858B (application CN202110021200.5A)
Authority
CN
China
Prior art keywords
character
image
area
region
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110021200.5A
Other languages
Chinese (zh)
Other versions
CN112733858A
Inventor
张博
张乐平
侯磊
匡海泉
李海峰
Current Assignee
Beijing Deepctrl Co ltd
Original Assignee
Beijing Deepctrl Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Deepctrl Co ltd filed Critical Beijing Deepctrl Co ltd
Priority to CN202110021200.5A
Publication of CN112733858A
Application granted
Publication of CN112733858B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The method and device for fast recognition of image characters based on character-region detection use a convolutional layer to convert the number of channels of a target image into the number of categories; a character-region detection model generates a scaled-down feature map of the target image; the generated mask is overlaid on the target image to obtain a region sub-image containing characters; character gaps in the region sub-image are identified and marked as background, separating each character in the mask map; character regions are then extracted from the region sub-image through the mask map to obtain the region of each character. A horizontal closing operation merges the mask map into a closed region, the rectangular outline of the closed region is extracted to obtain the whole region of a line of characters, each character region of the whole region is extracted with the initial mask map, the extracted character regions are sorted from left to right, and the sorted character regions are combined by line. The method can be implemented with a small neural network, giving fast inference and low resource usage.

Description

Image character rapid identification method and device based on character region detection
Technical Field
The invention relates to the technical field of image detection, and in particular to a method and a device for fast recognition of image characters based on character-region detection.
Background
At present, recognizing characters in an image with deep learning generally requires two steps (see FIG. 1). The first step is text region detection (Text Region Detect): detect whether the image contains a character region and extract the sub-images of that region. The second step is text recognition (Text Recognition): divide the text region of the image into individual character-region images, recognize each character image as a character (i.e., a machine-readable code), and then reassemble the characters into a string.
Text region detection generally uses an image detection model (such as YOLO or SSD) that outputs text-region coordinates, from which the corresponding region sub-image is extracted and then passed to character recognition. There are two main methods for recognizing a text-region image. One cuts the image into a series of small images, each containing a single character, by sliding windows, histogram cuts, or similar means, then classifies each small image with a convolutional neural network to obtain a single character, and finally recombines the characters into a string. The other directly recognizes the text-region image with a recurrent neural network to generate a character sequence.
The prior art has the following problems:
first, the character-region detection model only marks the whole character region and does not distinguish the position of each character, so extra computation is needed to identify each character;
second, splitting the character region with a sliding window or histogram generalizes poorly and has low accuracy. A sliding window easily over-segments, which increases the load of subsequent processing and harms the accuracy of the reconstructed string; histogram segmentation is affected by noise, making character gaps hard to find;
third, without segmenting characters, directly recognizing the whole character region with a recurrent neural network to reconstruct the string requires a complex model with long inference time and heavy compute; such a model can only be deployed on large servers for cloud services with weak real-time requirements and is hard to deploy on embedded platforms or in real-time image processing scenarios.
In summary, a technical solution for fast recognition of image text is needed.
Disclosure of Invention
Therefore, embodiments of the present invention provide a method and a device for fast recognition of image characters based on character-region detection, which perform character-level region detection on text images, directly segmenting the region of each character while detecting the text region, so that character extraction and recognition of high-resolution images can be completed in real time.
In order to achieve the above object, an embodiment of the present invention provides the following: the image character fast recognition method based on character area detection comprises the following steps:
extracting the characteristics of a target image by adopting a convolutional neural network, and converting the number of channels of the target image into the number of categories by utilizing a convolutional layer;
generating a feature map of the target image in a reduced scale through a character region detection model, wherein the value of each pixel point in the feature map corresponds to the category;
amplifying the feature map to the size of the target image and using the feature map as a mask map of a character area, and overlapping the generated mask and the target image to obtain an area sub-image containing characters;
performing character-gap recognition on the region sub-image, marking the character gaps as background, and separating each character in the mask map;
and extracting character areas of the area sub-images through the mask image to obtain the character area of each character.
As a preferred scheme of the method for fast recognition of image characters based on character-region detection, the method further comprises the following steps: performing a horizontal closing operation on the mask map to merge it into a closed region, extracting the rectangular outline of the closed region to obtain the whole region of a line of characters, sorting each extracted character region of the whole region from left to right, and combining the sorted character regions by line;
the obtaining of the character region of each character further comprises: extracting each character region of the whole region by using the initial mask map.
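As an illustrative sketch (not the patented implementation itself), extracting the rectangular outline of one closed line region can be approximated by taking the bounding box of the nonzero pixels in the closed mask; the function name and the numpy-based approach are assumptions for illustration:

```python
import numpy as np

def bounding_box(region_mask):
    """Rectangular outline (x0, y0, x1, y1) of one closed region in a
    binary mask: the tight bounding box over its nonzero pixels."""
    ys, xs = np.nonzero(region_mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

In practice a contour-extraction routine (e.g. connected-component labelling followed by a per-component bounding box) would be applied so that each closed line region yields its own rectangle.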
As a preferred scheme of the method for fast recognition of image characters based on character-region detection, a character recognition model is used to recognize and classify the line-combined character regions in batches, and the character regions are recombined in their existing order to form a string.
As a preferred scheme of the method for fast recognition of image characters based on character-region detection, the character-region detection model is trained as an image classification model: a Flatten layer is added after the convolutional output for conversion, and a softmax layer is then added to output the category.
As a preferred scheme of the method for fast recognition of image characters based on character-region detection, during inference the character-region detection model removes the final Flatten layer and softmax layer and takes the output of the convolutional layer directly.
As a preferred scheme of the method for fast recognition of image characters based on character-region detection, the target image is input at its original size or proportionally scaled; the character-region detection model scans the whole target image to extract features, the convolution sequentially covers one region of the target image at a time, and forward computation of the model yields the feature corresponding to each region;
when characters within the preset range appear in a region, the region is marked as a character region;
while scanning the input target image, as the computation window passes over a group of characters, the surroundings of each character output 0 and the center of each character outputs 1, so each character region is separated on the final feature map.
As a preferred scheme of the method for fast recognition of image characters based on character-region detection, the same target image is recognized multiple times at multiple scales, and the feature masks are fused at the original image size to obtain a comprehensive character decision.
As a preferred scheme of the method for fast recognition of image characters based on character-region detection, the method is used for real-time video analysis, video text-content monitoring, and public-screen text-content protection.
The invention also provides a device for quickly identifying image characters based on character region detection, which comprises:
the image feature extraction module is used for extracting the features of the target image by adopting a convolutional neural network and converting the number of channels of the target image into the number of categories by utilizing a convolutional layer;
the pixel point category processing module is used for generating a characteristic diagram of the target image in a reduced proportion through a character region detection model, and the value of each pixel point in the characteristic diagram corresponds to the category;
the area sub-image generation module is used for amplifying the feature map to the size of the target image and using the feature map as a mask map of a character area, and overlapping the generated mask map and the target image to obtain an area sub-image containing characters;
the character-gap recognition module is used for performing character-gap recognition on the region sub-image, marking the character gaps as background, and separating each character in the mask map;
and the character area extraction module is used for extracting the character area of the area sub-image through the mask image to obtain the character area of each character.
As a preferable scheme of the image character rapid recognition device based on character region detection, the device further comprises:
a closed region generating module, configured to perform a horizontal closing operation on the mask map to synthesize a closed region;
the whole area generating module is used for extracting the rectangular outline of the closed area to obtain a whole area of a line of characters;
the character area arrangement module is used for extracting each character area of the whole area by using the initial mask image, sequencing each extracted character area of the whole area from left to right, and combining the sequenced character areas according to rows;
and the character generation module is used for adopting a character recognition model to recognize and classify the character areas combined according to the rows in batches and recombining the character areas according to the existing sequence of the character areas to form a character string.
A convolutional neural network extracts the features of the target image, and a convolutional layer converts the number of channels of the target image into the number of categories; the character-region detection model generates a scaled-down feature map of the target image in which the value of each pixel corresponds to a category; the feature map is enlarged to the size of the target image and used as the mask map of the character region, and the generated mask is overlaid on the target image to obtain a region sub-image containing characters; character-gap recognition is performed on the region sub-image, the character gaps are marked as background, and each character in the mask map is separated; character regions are extracted from the region sub-image through the mask map to obtain the region of each character. A horizontal closing operation merges the mask map into a closed region, the rectangular outline of the closed region is extracted to obtain the whole region of a line of characters, each character region of the whole region is extracted with the initial mask map, the extracted character regions are sorted from left to right, and the sorted character regions are combined by line. A character recognition model recognizes and classifies the line-combined character regions in batches and recombines them in their existing order to form strings.
The method performs character-level region detection on text images, directly segmenting the region of each character while detecting the text region; multi-scale inference adapts to text images of various sizes and recognizes fonts of different sizes; a horizontal closing operation on the character-region mask followed by rectangular-contour computation assembles the character regions into whole lines. The method can be implemented with a small neural network, with fast inference and low resource usage; it can be built on an embedded platform as an edge-computing device, deployed on site in industrial or commercial scenes, and used for video text-content monitoring, public-screen text-content protection, and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It should be apparent that the following drawings are merely exemplary, and other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
FIG. 1 is a diagram illustrating a prior art image text recognition process involved in the background of the invention;
fig. 2 is a schematic diagram of a character region detection model adopted for image character fast recognition based on character region detection according to an embodiment of the present invention;
fig. 3 is a flowchart of an image text fast recognition method based on character region detection according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a process for quickly recognizing image characters based on character region detection according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a character region mask in an image character fast recognition process based on character region detection according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an image and text fast recognition device based on character region detection according to an embodiment of the present invention.
Detailed Description
The present invention is described herein through particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Example 1
In contrast to fig. 1, and referring to fig. 2, fig. 3, fig. 4 and fig. 5, a method for fast recognition of image text based on character-region detection is provided, used for real-time video analysis, video text-content monitoring and public-screen text-content protection, and comprising the following steps:
s1: extracting the characteristics of a target image by adopting a convolutional neural network, and converting the number of channels of the target image into the number of categories by utilizing a convolutional layer;
s2: generating a feature map of the target image in a reduced scale through a character region detection model, wherein the value of each pixel point in the feature map corresponds to the category;
s3: amplifying the feature map to the size of the target image and using the feature map as a mask map of a character area, and overlapping the generated mask and the target image to obtain an area sub-image containing characters;
s4: performing character-gap recognition on the region sub-image, marking the character gaps as background, and separating each character in the mask map;
s5: and extracting character areas of the area sub-images through the mask image to obtain the character area of each character.
Referring to fig. 3, fig. 4 and fig. 5 again, in the process of the image character fast recognition method based on character region detection, the method further includes:
s6: performing a horizontal closing operation on the mask map to merge it into a closed region, and extracting the rectangular outline of the closed region to obtain the whole region of a line of characters;
s7: extracting each character region of the whole region by using the initial mask map; sorting each extracted character region of the whole region from left to right, and combining the sorted character regions by line;
s8: and adopting a character recognition model to recognize and classify the character areas combined according to the rows in batches, and recombining the character areas according to the existing sequence of the character areas to form character strings.
Because the output of the character-region detection model is a group of single-character images, the character recognition model only needs an image classification model built on a convolutional neural network, and character recognition and string reconstruction are completed quickly with batch acceleration. No recurrent neural network needs to be introduced, which reduces training cost and runtime resources. Compared with traditional character segmentation based on image-processing sliding windows or image-histogram segmentation, this scheme implements character segmentation with a deep learning model, achieves higher accuracy through training on big data, and avoids over-segmentation and under-segmentation, so the overall recognition accuracy is higher.
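A minimal sketch of this batched recognize-then-reassemble step; `classify_batch` stands in for the CNN classifier's single batched forward pass, and all names here are illustrative assumptions rather than the patent's actual interfaces:

```python
def recognize_line(char_crops, classify_batch, charset):
    """Classify all single-character crops of one line in a single batch,
    then rebuild the string in their existing left-to-right order."""
    class_ids = classify_batch(char_crops)  # one batched forward pass
    return ''.join(charset[i] for i in class_ids)
```

Because the crops arrive already segmented and ordered, no sequence model is needed; the classifier runs once per line regardless of character count.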
Specifically, the character-region detection model is trained as an image classification model: a Flatten layer is added after the output of the 1x1 convolutional layer for conversion, and a softmax layer then outputs the category. The training images are fixed-size grayscale images (1 × 45 × 45); in addition to character images and background images, a large amount of data such as character-gap images, character-edge images, and character-center images is used for training, with these labeled as background (category 0). During inference, the trained model removes the final Flatten layer, outputs the 1x1 convolutional feature map, and applies softmax and argmax over the category channel to obtain a two-dimensional feature map in which each pixel is a category value (text region 1, background 0). Because of the convolution kernel size and stride, the feature map is a scaled-down version of the input image (a 45 × 45 area of the target image corresponds to one pixel of the feature map); the feature map is scaled up to the size of the target image and becomes the character-region mask map of the target image. Character-region extraction is then performed on the target image with the mask map to obtain the region of each character.
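The softmax/argmax step over the category channel and the nearest-neighbour upscaling back to the target-image size can be sketched with numpy as follows (a simplified illustration; the function name is invented, and the scale factor corresponds to the 45-pixel receptive field described above):

```python
import numpy as np

def logits_to_mask(logits, scale):
    """Convert a (C, H, W) class-logit feature map into an upscaled
    character-region mask (text = 1, background = 0)."""
    # softmax over the class channel, then per-pixel argmax
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)
    class_map = probs.argmax(axis=0)  # (H, W), values in {0..C-1}
    # nearest-neighbour upscale back to the target-image size
    return np.repeat(np.repeat(class_map, scale, axis=0), scale, axis=1)
```

Each feature-map pixel expands into a `scale` × `scale` block of the mask, which is then overlaid on the target image.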
Specifically, the Flatten layer flattens the input, i.e., turns multidimensional input into one dimension, providing the transition from the convolutional layer to the fully connected layer; Flatten does not affect the batch-size hyperparameter. The softmax layer maps the neuron outputs computed by the convolutional neural network into the (0, 1) interval, giving the probability of each class.
To obtain the ordering of a line of characters, this scheme first performs a horizontal closing operation on the mask map, so that the nearby characters of a line merge into one closed region; the rectangular outline of the closed region is extracted to obtain the whole region of the line; then each character region is extracted with the original mask map and sorted from left to right, so all characters can be combined by line. The subsequent character recognition model need not attend to the order of the characters; it only needs to recognize and classify in batches and recombine the string according to the existing order.
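A pure-numpy sketch of the horizontal closing operation (dilation then erosion with a 1 × k structuring element, k odd); in practice a library routine such as OpenCV's `morphologyEx` with `MORPH_CLOSE` would typically be used, and the simplified border handling here is an assumption:

```python
import numpy as np

def horizontal_close(mask, k):
    """Morphological closing with a 1 x k horizontal structuring element
    (k odd), merging nearby per-character blobs into one region per line."""
    pad = k // 2
    h, w = mask.shape
    # dilation: a pixel becomes 1 if any pixel in its horizontal window is 1
    padded = np.pad(mask, ((0, 0), (pad, pad)), constant_values=0)
    windows = np.stack([padded[:, i:i + w] for i in range(k)])
    dilated = windows.max(axis=0)
    # erosion: a pixel stays 1 only if its whole horizontal window is 1
    padded = np.pad(dilated, ((0, 0), (pad, pad)), constant_values=1)
    windows = np.stack([padded[:, i:i + w] for i in range(k)])
    return windows.min(axis=0)
```

Gaps narrower than k are filled (joining a line into one region) while wider gaps, such as those between separate lines or columns, survive.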
In this scheme, the text-region detection stage implements a simplified fully convolutional network (FCN): a multilayer convolutional neural network extracts image features, and a 1x1 convolutional layer then converts the number of channels into the number of categories (unlike the classical FCN, no subsequent transposed convolutional layers are added). The character-region detection model finally generates a scaled-down feature map of the target image in which the value of each pixel is its category (character region 1, background 0). The feature map is enlarged to the size of the target image and used as the mask map of the character region. Multiplying the mask map with the target image yields the region sub-image containing characters. The character-region detection model can identify the gaps within a line of characters and mark them as background, so each character in the mask map is separated, and an image of a single character can be obtained with only simple image-processing operations.
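The channel-to-category conversion performed by the 1x1 convolution is just a per-pixel linear map over channels; a minimal numpy sketch with illustrative names:

```python
import numpy as np

def conv1x1(features, weight, bias):
    """Apply a 1x1 convolution mapping feature channels to class scores.
    features: (C_in, H, W); weight: (C_out, C_in); bias: (C_out,)."""
    # per-pixel matrix multiply over the channel dimension
    out = np.einsum('oc,chw->ohw', weight, features)
    return out + bias[:, None, None]
```

Because the kernel is 1x1, spatial resolution is unchanged: only the channel count changes from C_in to the number of categories C_out.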
Specifically, the character-region detection model scans the whole image to extract features; the convolution sequentially covers one region at a time, and forward computation of the model yields the feature value of each region:
first, when the region contains a complete character of moderate size, matching the character images in the training data, the feature value is 1 (character);
second, when the region contains no character, the feature value is 0 (background);
third, when the region contains part of a character and the character's edge does not reach past the center point of the region, the output feature value is 0 (background);
fourth, when the center point of the region lies in the gap between two characters, the output feature value is 0 (background);
fifth, when the whole region lies inside a character (the character is larger than the region), the output feature value is 0 (background);
sixth, when the characters in the region are too small, the output feature value is 0 (background).
A region is marked as a text region only when a character occupies most of the region and is of moderate size; otherwise the text feature is not activated. While scanning the input target image, as the computation window passes over a group of characters, the surroundings of each character output 0 and only the center of each character outputs 1, so each character region is separated on the final feature map.
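Because the gaps between characters are output as background, separating the characters of one line reduces to splitting the line mask at all-background columns; a numpy sketch (illustrative, assuming a horizontal line of text):

```python
import numpy as np

def split_characters(mask):
    """Split a line mask into per-character column ranges by treating
    all-background columns as gaps; returns (start, end) pairs, left
    to right, with end exclusive."""
    cols = mask.any(axis=0).astype(int)            # 1 where a column has text
    edges = np.diff(np.concatenate(([0], cols, [0])))
    starts = np.flatnonzero(edges == 1)            # background -> text
    ends = np.flatnonzero(edges == -1)             # text -> background
    return list(zip(starts, ends))
```

Each returned column range can then be cropped from the region sub-image to obtain one single-character image.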
Specifically, the target image is input at its original size or proportionally scaled; the character-region detection model scans the whole target image to extract features, the convolution sequentially covers one region of the target image at a time, and forward computation of the model yields the feature corresponding to each region;
when characters within the preset range appear in a region, the region is marked as a character region;
while scanning the input target image, as the computation window passes over a group of characters, the surroundings of each character output 0 and the center of each character outputs 1, so each character region is separated on the final feature map.
In one embodiment of the method for fast recognition of image characters based on character-region detection, the input target image is scaled to different sizes so that character regions of different sizes can be recognized effectively: a small input size favors detecting large fonts, and a large input size favors detecting small fonts. After the character-region detection model is deployed, the scaling of the input image can be set according to the size of the target characters, so the method flexibly fits various scenarios. In addition, the same target image can be recognized multiple times at multiple scales, and the feature masks fused at the original image size to obtain a comprehensive decision, which can greatly improve recognition accuracy.
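A sketch of the multi-scale fusion step, assuming the per-scale masks have already been resized back to the original image size; the voting threshold is an illustrative assumption, since the patent does not specify the fusion rule:

```python
import numpy as np

def fuse_multiscale_masks(masks, vote=1):
    """Fuse character-region masks produced at several input scales
    (each already resized to the original image size): a pixel is kept
    as text if at least `vote` scales agree."""
    stacked = np.stack(masks)                 # (n_scales, H, W)
    return (stacked.sum(axis=0) >= vote).astype(np.uint8)
```

With `vote=1` the fusion is a union (sensitive to both large and small fonts); higher thresholds trade recall for precision.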
Example 2
Referring to fig. 6, the present invention further provides an image and text fast recognition apparatus based on character region detection, including:
the image feature extraction module 1 is used for extracting target image features by adopting a convolutional neural network and converting the number of channels of the target image into the number of categories by utilizing a convolutional layer;
a pixel point category processing module 2, configured to generate a feature map in which the target image is scaled down through a text region detection model, where a value of each pixel point in the feature map corresponds to the category;
the area sub-image generation module 3 is used for amplifying the feature map to the size of the target image and using the feature map as a mask map of a character area, and overlapping the generated mask map and the target image to obtain an area sub-image containing characters;
a character gap recognition module 4, configured to perform character gap recognition on the region sub-image, mark the character gap as a background, and separate each character in the mask image;
and the character area extraction module 5 is configured to perform character area extraction on the area sub-images through the mask map to obtain a character area of each character.
Specifically, the apparatus further includes:
a closed region generating module 6, configured to perform a horizontal closing operation on the mask map to synthesize a closed region;
the whole area generating module 7 is used for extracting the rectangular outline of the closed area to obtain a whole area of a line of characters;
a character region arrangement module 8, configured to extract each character region of the entire region by using the initial mask map, sort each extracted character region of the entire region from left to right, and combine the sorted character regions according to rows;
and the character generation module 9 is used for adopting a character recognition model to recognize and classify the character areas combined according to the rows in batches and recombining the character areas according to the existing sequence of the character areas to form a character string.
Because the output of the character region detection model is a group of single-character images, the character recognition model need only be an image classification model built on a convolutional neural network, and character recognition and character-string reconstruction are completed quickly with batch acceleration. No recurrent neural network needs to be introduced, which reduces both training cost and runtime resources. Compared with traditional character segmentation based on an image-processing sliding window or on image-histogram segmentation, this scheme realizes character segmentation with a deep learning model; trained on large data sets, it achieves higher accuracy and avoids both over-segmentation and under-segmentation, so the accuracy of the overall recognition is higher.
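As a rough illustration of this batch scheme, the sketch below classifies a stack of single-character crops in one call and joins the results in reading order; `ALPHABET` and the intensity-bucket "classifier" are stand-in assumptions for the trained CNN classifier.

```python
import numpy as np

# hypothetical alphabet; a real model would output softmax scores per 45x45 crop
ALPHABET = ["A", "B", "C"]

def classify_batch(crops: np.ndarray) -> list:
    """Stub classifier: buckets each crop by mean intensity.
    In the patent's scheme this is one batched forward pass of the CNN."""
    idx = (crops.mean(axis=(1, 2)) * len(ALPHABET)).astype(int)
    idx = idx.clip(0, len(ALPHABET) - 1)
    return [ALPHABET[i] for i in idx]

def recognize_line(char_crops: list) -> str:
    """Batch-classify the crops, then rejoin them in their existing
    left-to-right order — no recurrent network is needed."""
    batch = np.stack(char_crops)   # (N, H, W): one batched inference
    return "".join(classify_batch(batch))
```

A usage example: three synthetic crops with increasing brightness map to `"ABC"` under the stub, showing how the string is reassembled purely from the pre-sorted region order.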
Specifically, the character region detection model is trained as an image classification model: a Flatten layer is added to convert the output of the 1x1 convolution layer, followed by a softmax output layer over the categories. The training images are fixed-size grayscale images (1 × 45 × 45); large amounts of data — character-gap images, character-edge images, and character-center images, together with character images and background images — are used for training, the non-character samples being marked as background (category 0). For inference, the last Flatten layer of the trained model is removed, the 1x1 convolutional feature map is output directly, and softmax and argmax operations over the category channel yield a two-dimensional feature map in which each pixel is a category value (text region 1, background 0). Owing to the chosen convolution kernel size and stride, the feature map is a proportional reduction of the input image (a 45 × 45 region of the target image corresponds to one pixel of the feature map); enlarging the feature map to the size of the target image turns it into the character region mask map of that image. Character region extraction on the target image with this mask map then yields the region of each character.
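The inference-time conversion described here — reading the 1x1-conv output as a spatial map and applying softmax and argmax over the category channel — can be sketched with NumPy; the logits array below is a hypothetical model output, not real network weights.

```python
import numpy as np

def logits_to_mask(logits: np.ndarray) -> np.ndarray:
    """Collapse a (num_classes, H, W) convolutional feature map into a 2-D
    class map: softmax over the class channel, then argmax per pixel
    (text = 1, background = 0). This mirrors dropping the Flatten layer at
    inference and reading the 1x1-conv output directly as a spatial map."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))  # numerically stable softmax
    probs = e / e.sum(axis=0, keepdims=True)
    return probs.argmax(axis=0).astype(np.uint8)

# hypothetical (2 classes, 2x2 spatial) logits: class 1 ("text") wins
# at (0,1) and (1,0), class 0 ("background") elsewhere
logits = np.array([[[2.0, -1.0], [0.0, 0.0]],
                   [[-1.0, 3.0], [1.0, -2.0]]])
print(logits_to_mask(logits))  # → [[0 1] [1 0]]
```

Since argmax of the softmax equals argmax of the raw logits, the softmax is only needed when per-pixel probabilities are wanted; the class map itself could skip it.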
Specifically, the Flatten layer flattens the input, i.e., turns the multidimensional input into one dimension, realizing the transition from the convolution layer to the fully connected layer; Flatten does not affect the batch-size hyperparameter. The softmax layer is a fully connected layer that maps the neuron outputs computed by the convolutional neural network into the (0, 1) interval, giving the probability of each class.
To obtain the ordering of a line of characters, this scheme first applies a horizontal closing operation to the mask map, so that the characters of a line, being close to one another, merge into one closed region. The rectangular contour of this closed region is extracted to obtain the overall region of the line; each character region is then extracted with the original mask map and sorted from left to right, so that all characters can be combined by line. The subsequent character recognition model need not attend to the order of the characters; it only performs batch recognition and classification and recombines the character string according to the existing order.
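A minimal one-row sketch of the transverse closing, in plain NumPy rather than a morphology library; the window width `k` is an assumed parameter.

```python
import numpy as np

def horizontal_close(row: np.ndarray, k: int) -> np.ndarray:
    """1-D morphological closing (dilation then erosion) with a width-k window
    along one row of the binary mask — a sketch of the 'transverse closing'
    that fuses a line of characters into one region."""
    r = k // 2
    dilated = np.zeros(len(row), dtype=bool)
    for i in np.flatnonzero(row):          # dilation: switch on everything within r
        dilated[max(0, i - r):i + r + 1] = True
    eroded = np.zeros(len(row), dtype=bool)
    for i in np.flatnonzero(dilated):      # erosion: keep i only if the whole window is on
        lo, hi = max(0, i - r), min(len(row), i + r + 1)
        eroded[i] = dilated[lo:hi].all()
    return eroded

# two "characters" at columns 2-3 and 6-7, separated by a gap at columns 4-5
row = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0], dtype=bool)
closed = horizontal_close(row, 3)
print(np.flatnonzero(closed))  # → [2 3 4 5 6 7] — the gap is closed into one span
```

The contiguous span of the closed mask gives the line's bounding extent; per-character regions are then re-read from the original (unclosed) mask and sorted by their left edge.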
In this scheme, the text region detection stage is realized as a simplified fully convolutional network (FCN): a multilayer convolutional neural network extracts the image features during construction, and a 1x1 convolution layer then converts the channel count into the number of categories (unlike the classical FCN, no subsequent transposed convolution layers are added). The character region detection model finally generates a proportionally reduced feature map of the target image in which the value of each pixel is the corresponding category (text region 1, background 0). The feature map is enlarged to the size of the target image and used as the mask map of the character regions; multiplying the mask map with the target image and superimposing yields the region sub-image containing the characters. The character region detection model can identify the gaps within a line of characters and mark them as background, so that the individual characters in the mask map are separated and an image of each single character is obtained with only simple image-processing operations.
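Conceptually, the conv-only FCN behaves like a patch classifier slid over the image on a stride grid. The sketch below makes that equivalence explicit, with `classify` standing in for the trained network and the 45/45 window and stride taken from the description; the function names are illustrative.

```python
import numpy as np

def feature_map(image: np.ndarray, window: int, stride: int, classify) -> np.ndarray:
    """Conceptual equivalent of the conv-only FCN: classify each
    window-sized patch on a stride grid, producing the proportionally
    reduced class map (1 = text, 0 = background)."""
    h = (image.shape[0] - window) // stride + 1
    w = (image.shape[1] - window) // stride + 1
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            patch = image[i * stride:i * stride + window,
                          j * stride:j * stride + window]
            out[i, j] = classify(patch)   # forward pass on one region
    return out

# toy 90x90 image whose left half is "text"; a mean-intensity threshold
# stands in for the trained patch classifier
img = np.zeros((90, 90))
img[:, :45] = 1
fm = feature_map(img, window=45, stride=45, classify=lambda p: int(p.mean() > 0.5))
print(fm)  # → [[1 0] [1 0]]
```

In the real model this loop is a single convolutional forward pass; writing it as an explicit scan just shows why one feature-map pixel corresponds to one 45 × 45 region of the input.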
Specifically, the character region detection model scans the whole image and extracts features, the convolution taking one region at a time; the forward computation of the model yields the corresponding feature value for each region:
firstly, when the region contains a complete character of moderate size, matching the character-labeled pictures in the training data, the feature value is 1 (text);
secondly, when the region contains no character at all, the feature value is 0 (background);
thirdly, when the region contains only part of a character and the character's edge does not reach beyond the center point of the region, the output feature value is 0 (background);
fourthly, when the center point of the region lies in the gap between two characters, the output feature value is 0 (background);
fifthly, when the whole region lies inside a character (the character is larger than the region), the output feature value is 0 (background);
sixthly, when the characters in the region are too small, the output feature value is 0 (background).
Only when a character mostly fills the region and is of moderate size is the region marked as a text region; otherwise the text feature is not activated. As the input target image is scanned and the computation window passes over a group of characters, the output around each character is 0 and only at each character's center is it 1, so the individual character regions are already separated on the final feature map.
Specifically, the target image is input at its original size or proportionally scaled. The character region detection model scans the whole target image and extracts features, the convolution taking one region of the target image at a time, and the forward computation of the model yields the corresponding feature of each region;
when characters within the preset size range appear in a region, the region is marked as a character region;
while the input target image is scanned, as the computation window passes over a group of characters, the output around each character is 0 and at each character's center is 1, dividing the individual character regions on the final feature map.
In this embodiment, the input target image is enlarged or reduced at different scales, so that character regions of different sizes can be recognized effectively: a smaller input favors detection of large fonts, while a larger input favors small fonts. After the character region detection model is deployed, the scaling of the input image can be set according to the size of the target characters, so the method adapts flexibly to various scenarios. In addition, the same target image can be recognized several times at multiple scales and the feature masks fused at the original image size to obtain a comprehensive judgment, which can greatly improve recognition accuracy.
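One plausible reading of the multi-scale fusion step is pixel voting across the per-scale masks after resizing each back to the original image size; the `vote` threshold below is an assumption and is not stated in the patent.

```python
import numpy as np

def fuse_multiscale(masks: list, target_shape: tuple, vote: int = 2) -> np.ndarray:
    """Resize each scale's binary mask to the original image size
    (nearest neighbour) and keep pixels that at least `vote` scales agree on."""
    h, w = target_shape
    acc = np.zeros((h, w), dtype=np.int32)
    for m in masks:
        ys = np.arange(h) * m.shape[0] // h   # nearest-neighbour row indices
        xs = np.arange(w) * m.shape[1] // w   # nearest-neighbour column indices
        acc += m[np.ix_(ys, xs)].astype(np.int32)
    return (acc >= vote).astype(np.uint8)

# hypothetical masks from two scales; only the top-left pixel agrees
m1 = np.array([[1, 0], [0, 0]], dtype=np.uint8)      # coarse scale (2x2)
m2 = np.zeros((4, 4), dtype=np.uint8); m2[0, 0] = 1  # fine scale (4x4)
fused = fuse_multiscale([m1, m2], (4, 4))
print(fused.sum())  # → 1: only the pixel both scales voted for survives
```

Other fusion rules (union, probability averaging before argmax) would fit the description equally well; voting is just one concrete instance.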
Example 3
The present invention provides a computer-readable storage medium in which a program code for image text fast recognition based on character region detection is stored, the program code including instructions for executing the image text fast recognition method based on character region detection in embodiment 1 or any possible implementation manner thereof.
The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Example 4
The invention provides an electronic device, which comprises a processor, wherein the processor is coupled with a storage medium, and when the processor executes instructions in the storage medium, the electronic device is enabled to execute the method for quickly recognizing image and characters based on character region detection in embodiment 1 or any possible implementation manner thereof.
Specifically, the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated in the processor, located external to the processor, or stand-alone.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions; when these computer instructions are loaded and executed on a computer, the processes or functions described according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
Specifically, a Central Processing Unit (CPU) executes various processes in accordance with a program stored in a Read Only Memory (ROM) or a program loaded from a storage section to a Random Access Memory (RAM). In the RAM, data necessary when the CPU executes various processes and the like is also stored as necessary. The CPU, ROM, and RAM are connected to each other via a bus. An input/output interface is also connected to the bus.
The following components are connected to the input/output interface: an input section (including a keyboard, a mouse, etc.), an output section (including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.), a storage section (including a hard disk, etc.), a communication section (including a network interface card such as a LAN card, a modem, etc.). The communication section performs communication processing via a network such as the internet. The driver may also be connected to an input/output interface as desired. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like can be mounted on the drive as needed, so that the computer program read out therefrom is installed in the storage section as needed.
In the case where the above-described series of processes is realized by software, a program constituting the software is installed from a network such as the internet or a storage medium such as a removable medium.
It will be understood by those skilled in the art that such a storage medium is not limited to a removable medium storing the program, distributed separately from the apparatus, to provide the program to the user. Examples of the removable medium include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc-read only memory (CD-ROM) and a Digital Versatile Disc (DVD)), a magneto-optical disk (including a mini-disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be a ROM, a hard disk included in a storage section, or the like, in which programs are stored and which are distributed to users together with the device including them.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. The image character rapid identification method based on character region detection is characterized by comprising the following steps:
extracting the characteristics of a target image by adopting a convolutional neural network, and converting the number of channels of the target image into the number of categories by utilizing a convolutional layer;
generating a feature map of the target image in a reduced scale through a character region detection model, wherein the value of each pixel point in the feature map corresponds to the category;
amplifying the feature map to the size of the target image and using the feature map as a mask map of a character area, and superimposing the generated mask map on the target image to obtain an area sub-image containing characters;
performing character gap recognition on the regional subimages, marking the character gaps as backgrounds, and separating each character in the mask image;
and extracting character areas of the area sub-images through the mask image to obtain the character area of each character.
2. The method for rapidly recognizing image characters based on character region detection according to claim 1, further comprising: performing transverse closing operation on the mask image to synthesize a closed area, extracting a rectangular outline of the closed area to obtain an integral area of a line of characters, sequencing each character area of the extracted integral area from left to right, and combining the sequenced character areas according to the line;
the obtaining of the character area of each character further comprises: extracting each character area of the whole area by using the initial mask image.
3. The image character rapid recognition method based on character region detection as claimed in claim 2, characterized in that the character region after line combination is classified by batch recognition using character recognition model, and the character string is formed by recombination according to the existing sequence of the character region.
4. The method for rapidly recognizing image characters based on character region detection according to claim 1, wherein the character region detection model is trained according to an image classification model during training, a Flatten layer conversion is added after a convolution layer is output, and then a softmax layer output category is added.
5. The method for rapidly recognizing image characters based on character region detection according to claim 4, wherein said character region detection model removes a last flatten layer and a softmax layer during inference, and directly obtains the output of a convolutional layer.
6. The method of claim 1, wherein the target image is input after being scaled in original size or in proportion, the whole target image is scanned and feature extracted by a text region detection model, one region of the target image is sequentially extracted by convolution each time, and the corresponding feature of each region is obtained by forward calculation of the text region detection model;
when the preset range characters appear in the area, marking the area as a character area;
in the process of scanning an input target image, when a calculation window passes through a group of characters, the periphery of each character is output to be 0, the center of each character is output to be 1, and each character area is divided on a final feature map.
7. The method as claimed in claim 1, wherein the same target image is multi-scale and multi-time recognized, and the feature masks are fused on the size of the original image to obtain the comprehensive character judgment result.
8. The image character rapid identification method based on character area detection as claimed in any one of claims 1 to 7, which is used for video real-time analysis, video character content monitoring and public screen character content protection.
9. Image characters quick identification equipment based on character region detects, its characterized in that includes:
the image feature extraction module is used for extracting the features of the target image by adopting a convolutional neural network and converting the number of channels of the target image into the number of categories by utilizing a convolutional layer;
the pixel point category processing module is used for generating a characteristic diagram of the target image in a reduced proportion through a character region detection model, and the value of each pixel point in the characteristic diagram corresponds to the category;
the area sub-image generation module is used for amplifying the feature map to the size of the target image and using the feature map as a mask map of a character area, and overlapping the generated mask map and the target image to obtain an area sub-image containing characters;
the character gap recognition module is used for carrying out character gap recognition on the regional subimages, marking the character gaps as backgrounds and separating each character in the mask image;
and the character area extraction module is used for extracting the character area of the area sub-image through the mask image to obtain the character area of each character.
10. The device for rapidly recognizing image characters based on character region detection according to claim 9, further comprising:
a closed region generating module, configured to perform a horizontal closing operation on the mask map to synthesize a closed region;
the whole area generating module is used for extracting the rectangular outline of the closed area to obtain a whole area of a line of characters;
the character area arrangement module is used for extracting each character area of the whole area by using the initial mask image, sequencing each extracted character area of the whole area from left to right, and combining the sequenced character areas according to rows;
and the character generation module is used for adopting a character recognition model to recognize and classify the character areas combined according to the rows in batches and recombining the character areas according to the existing sequence of the character areas to form a character string.
CN202110021200.5A 2021-01-08 2021-01-08 Image character rapid identification method and device based on character region detection Active CN112733858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110021200.5A CN112733858B (en) 2021-01-08 2021-01-08 Image character rapid identification method and device based on character region detection


Publications (2)

Publication Number Publication Date
CN112733858A CN112733858A (en) 2021-04-30
CN112733858B true CN112733858B (en) 2021-10-26

Family

ID=75589745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110021200.5A Active CN112733858B (en) 2021-01-08 2021-01-08 Image character rapid identification method and device based on character region detection

Country Status (1)

Country Link
CN (1) CN112733858B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807351B (en) * 2021-09-18 2024-01-16 京东鲲鹏(江苏)科技有限公司 Scene text detection method and device
CN114973291B (en) * 2022-07-28 2022-11-04 北京和人广智科技有限公司 Text line image character segmentation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700092A (en) * 2015-03-26 2015-06-10 南京理工大学 Small-character number identification method based on template and feature matching
CN110555439A (en) * 2019-09-04 2019-12-10 北京迈格威科技有限公司 identification recognition method, training method and device of model thereof and electronic system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100647284B1 (en) * 2004-05-21 2006-11-23 삼성전자주식회사 Apparatus and method for extracting character of image
US10068146B2 (en) * 2016-02-25 2018-09-04 Conduent Business Services, Llc Method and system for detection-based segmentation-free license plate recognition
CN109117846B (en) * 2018-08-22 2021-11-16 北京旷视科技有限公司 Image processing method and device, electronic equipment and computer readable medium
CN109117848B (en) * 2018-09-07 2022-11-18 泰康保险集团股份有限公司 Text line character recognition method, device, medium and electronic equipment
CN111144399B (en) * 2018-11-06 2024-03-05 富士通株式会社 Apparatus and method for processing image
CN110197179B (en) * 2019-03-14 2020-11-10 北京三快在线科技有限公司 Method and device for identifying card number, storage medium and electronic equipment
CN110503103B (en) * 2019-08-28 2023-04-07 上海海事大学 Character segmentation method in text line based on full convolution neural network
CN111160352B (en) * 2019-12-27 2023-04-07 创新奇智(北京)科技有限公司 Workpiece metal surface character recognition method and system based on image segmentation
CN112036395A (en) * 2020-09-04 2020-12-04 联想(北京)有限公司 Text classification identification method and device based on target detection


Also Published As

Publication number Publication date
CN112733858A (en) 2021-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant