CN111612005A - Character detection method and device - Google Patents

Character detection method and device

Info

Publication number
CN111612005A
Authority
CN
China
Prior art keywords
interest
region
layer
image
text
Prior art date
Legal status
Pending
Application number
CN202010266335.3A
Other languages
Chinese (zh)
Inventor
杨璐
范志刚
Current Assignee
Xian Wanxiang Electronics Technology Co Ltd
Original Assignee
Xian Wanxiang Electronics Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xian Wanxiang Electronics Technology Co Ltd filed Critical Xian Wanxiang Electronics Technology Co Ltd
Priority to CN202010266335.3A priority Critical patent/CN111612005A/en
Publication of CN111612005A publication Critical patent/CN111612005A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Abstract

The disclosure provides a character detection method and device, relates to the technical field of image processing, and can solve the problem of missed detection in existing character recognition. The specific technical scheme is as follows: acquiring a kernel density estimation map of an image to be processed; determining N minimum value points in the kernel density estimation map; layering the image to be processed with the N minimum value points as boundary points to obtain N+1 image layers; identifying the text region of each of the N+1 layers; and superimposing the text regions of all layers to obtain a target text image. The invention is used for character detection.

Description

Character detection method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a text detection method and apparatus.
Background
At present, when recognizing characters in an image, conventional methods mostly binarize the image with a threshold to identify regions of interest, using fixed-threshold or adaptive binarization algorithms such as OTSU (Otsu's method, also known as the maximum inter-class variance method). However, such binarization methods are prone to missed detections, so the processing effect is poor.
Disclosure of Invention
The embodiment of the disclosure provides a character detection method and a character detection device, which can solve the problem of missed detection in existing character recognition. The technical scheme is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a text detection method, including:
acquiring a kernel density estimation map of an image to be processed;
determining N minimum value points in the kernel density estimation map, wherein N is not less than 1;
taking the N minimum value points as boundary points, layering the image to be processed to obtain N+1 image layers;
identifying a text region of each of the N+1 layers;
and superimposing the text regions of all layers to obtain a target text image.
By layering the image to be processed and identifying the text region of each layer, a complete text region can be obtained and missed detection is avoided; after the text region of each layer is obtained, the text regions of all layers are superimposed to generate the final target text image.
In one embodiment, identifying the text region of each of the N+1 layers includes:
carrying out binarization processing on each of the N+1 layers to obtain N+1 binarization layers;
identifying a target region of interest in each of the N+1 binarization layers;
and recognizing the target region of interest of each binarization layer to obtain the text region of each layer.
In one embodiment, identifying a target region of interest in each of the N+1 binarization layers includes:
performing dilation and erosion on each binarization layer to obtain an initial region of interest of each binarization layer;
and screening the initial regions of interest of each binarization layer to obtain a target region of interest of each binarization layer.
In one embodiment, the step of screening the initial region of interest of each binarization layer to obtain a target region of interest of each binarization layer includes:
calculating the area of each initial region of interest in each binarization layer, wherein the area of the initial region of interest is the sum of the number of all pixels in the initial region of interest;
and eliminating the initial region of interest with the area smaller than a preset area threshold value in each binarization layer to obtain a target region of interest of each binarization layer.
In one embodiment, the step of screening the initial region of interest of each binarization layer to obtain a target region of interest of each binarization layer includes:
acquiring the area of each initial region of interest of each binarization layer and the number of corresponding inflection points, wherein an inflection point (corner point) is a pixel whose value differs from the values of the surrounding pixels by more than a preset threshold;
calculating the inflection point density of each initial region of interest according to the area of each initial region of interest and the number of corresponding inflection points;
determining the uniformity of inflection point distribution in the corresponding initial region of interest according to the inflection point density of each initial region of interest;
and eliminating initial interested areas with uneven inflection point density distribution in each binarization layer to obtain a target interested area of each binarization layer.
In one embodiment, determining the uniformity of the distribution of the inflection points in the corresponding initial region of interest according to the inflection point density of each initial region of interest includes:
averagely dividing each initial region of interest into M sub-regions;
calculating the sub-inflection point density of each sub-region in the M sub-regions of each initial region of interest;
detecting whether the sub-inflection point density of each sub-region is less than the corresponding inflection point density;
and when the sub-inflection point density of the sub-regions with the preset number is smaller than the corresponding inflection point density, determining that the inflection point distribution in the corresponding initial region of interest is not uniform.
In one embodiment, the step of screening the initial region of interest of each binarization layer to obtain a target region of interest of each binarization layer includes:
acquiring the outline of each initial region of interest in each binarization layer and the minimum circumscribed rectangle corresponding to the outline of each initial region of interest;
calculating the ratio of the outline area of each initial region of interest to the corresponding minimum circumscribed rectangle area to obtain the rectangularity of each initial region of interest;
and eliminating the initial region of interest with the rectangularity smaller than a preset threshold value in the binarization layer to obtain a target region of interest of the binarization layer.
In one embodiment, the obtaining the target text image by overlapping the text regions of each layer includes:
overlapping the text area of each layer to obtain a superimposed text image;
and when an overlapped region exists in the superimposed text image, reserving the layer whose text region has the largest area to obtain the target text image, wherein an overlapped region indicates that the same part of the superimposed text image contains repeated character regions from different layers.
In one embodiment, the method further comprises:
and post-processing each text region in the target text image and eliminating false detection regions that do not meet a preset requirement, to obtain a final target text image; the preset requirement is that the connected domain in which the text region lies, after being expanded by a preset step length, connects with the connected domains of other text regions.
According to a second aspect of the embodiments of the present disclosure, there is provided a text detection apparatus, which includes a processor and a memory, the memory storing at least one computer instruction that is loaded and executed by the processor to implement the steps performed in the text detection method described in the first aspect or any implementation of the first aspect.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, in which at least one computer instruction is stored, the instruction being loaded and executed by a processor to implement the steps performed in the text detection method described in the first aspect or any implementation of the first aspect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a text detection method provided in an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a method for identifying text regions of each layer according to an embodiment of the disclosure;
fig. 3 is a flowchart of identifying a target region of interest provided by an embodiment of the present disclosure;
FIG. 4 is a graph illustrating a trend of kernel probability density estimation for a gray histogram provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a minimum point provided by an embodiment of the present disclosure;
FIG. 6 is an image before dilation and erosion provided by an embodiment of the disclosure;
FIG. 7 is an image after dilation and erosion provided by an embodiment of the present disclosure;
fig. 8 is a schematic diagram of a uniform distribution of inflection points provided by the embodiment of the present disclosure;
fig. 9 is a schematic diagram of uneven inflection point distribution provided by the embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an image to be processed according to an embodiment of the disclosure;
fig. 11 to fig. 13 are images of the binarization layers, after dilation and erosion, obtained by layering the image to be processed of fig. 10 according to an embodiment of the present disclosure;
fig. 14 is a text image obtained by superimposing fig. 11 to fig. 13 according to an embodiment of the present disclosure;
FIG. 15 is a schematic diagram of a false positive area check provided by an embodiment of the present disclosure;
FIG. 16 is a schematic diagram of an image to be processed provided by an embodiment of the present disclosure;
fig. 17 is a text image after identifying an image to be processed according to an embodiment of the present disclosure;
fig. 18 is a structural diagram of a text detection apparatus according to an embodiment of the present disclosure;
fig. 19 is a structural diagram of a text detection device according to an embodiment of the present disclosure;
fig. 20 is a structural diagram of a text detection apparatus according to an embodiment of the present disclosure;
fig. 21 is a structural diagram of a character detection apparatus according to an embodiment of the present disclosure;
fig. 22 is a structural diagram of a character detection device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The embodiment of the present disclosure provides a text detection method, as shown in fig. 1, the text detection method includes the following steps:
101. Acquire a kernel density estimation map of the image to be processed.
In the embodiment of the present disclosure, acquiring the kernel density estimation map of the image to be processed includes: acquiring a gray level histogram of the image to be processed, and processing the gray level histogram with a kernel density estimation method to obtain the kernel density estimation map. Specifically, when the image to be processed is a color image, it is first converted to a grayscale image; the grayscale image may also be filtered and denoised. Once the grayscale image is obtained, a gray level histogram is drawn from the gray value of each pixel, and a kernel probability density estimation method is then used to plot the trend of the histogram; this plotted trend is the kernel density estimation map of the image to be processed.
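As a rough illustration of this step, the following NumPy sketch smooths the 256-bin gray histogram into a density curve. The Gaussian kernel and the bandwidth value are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def kde_of_gray_histogram(gray, bandwidth=8.0):
    """Smooth the 256-bin gray histogram with a Gaussian kernel to
    approximate a kernel density estimate of the gray-value distribution."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    hist /= hist.sum()
    x = np.arange(256)
    # Gaussian kernel evaluated between every pair of gray levels;
    # columns normalised so the output still sums to 1
    kernel = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    kernel /= kernel.sum(axis=0)
    return kernel @ hist

# Synthetic two-mode "image": gray values cluster near 60 and 200
rng = np.random.default_rng(0)
img = np.clip(np.concatenate([rng.normal(60, 10, 5000),
                              rng.normal(200, 10, 5000)]), 0, 255).astype(np.uint8)
density = kde_of_gray_histogram(img)
```

The resulting curve has two peaks (around 60 and 200) with a valley between them; the valley is exactly the kind of minimum value point the next step looks for.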
102. Determine N minimum value points in the kernel density estimation map.
In the embodiment of the present disclosure, the kernel density estimation map is a function curve describing the trend of the gray level histogram; the N minimum value points are therefore found on this curve, where a minimum value point is a point whose function value is smaller than the function values on both sides of it. Note that a minimum value point is a local minimum: it is only smaller than the function values of nearby points on a segment of the curve, and is not necessarily the smallest value over the whole domain of the function.
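On a discrete density curve, such local minimum value points can be found by a simple neighbour comparison. A minimal sketch (a real implementation might smooth the curve first to suppress spurious minima):

```python
def local_minima(curve):
    """Indices whose value is strictly smaller than both neighbours:
    the local minimum value points of the density curve (local, not
    global, exactly as noted above)."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]

# Toy density curve: local minima at indices 2 and 5
curve = [0.9, 0.5, 0.2, 0.6, 0.8, 0.3, 0.7]
minima = local_minima(curve)
```

Here two minima are found, so the image would be split into three layers.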
103. Using the N minimum value points as boundaries, layer the image to be processed to obtain N+1 image layers.
In the embodiment of the present disclosure, the abscissa of the kernel density estimation curve is the gray value, ranging from the minimum gray value to the maximum gray value of the grayscale image. Therefore, with the N minimum value points as boundaries, the pixels between the minimum gray value and the first minimum value point form the first layer, the pixels between the first and second minimum value points form the second layer, and so on, until the pixels between the Nth minimum value point and the maximum gray value form the (N+1)th layer. In this way, N+1 image layers are separated from one image.
104. A text region of each of the N+1 layers is identified.
In the embodiment of the present disclosure, referring to fig. 2, identifying the text region of each of the N+1 layers in step 104 includes:
21. Carry out binarization processing on each of the N+1 layers to obtain N+1 binarization layers.
22. Identify a target region of interest in each of the N+1 binarization layers.
23. Recognize the target region of interest of each binarization layer to obtain the text region of each layer.
The binarization processing of each layer includes: for the first layer, setting the pixel value of the pixel points whose gray value lies between the minimum gray value and the first minimum value point to 255 and all other pixel values to 0, obtaining the first binarization layer; for the second layer, setting the pixel value of the pixel points whose gray value lies between the first and second minimum value points to 255 and all other pixel values to 0, obtaining the second binarization layer; and so on, for the (N+1)th layer, setting the pixel value of the pixel points whose gray value lies between the Nth minimum value point and the maximum gray value to 255 and all other pixel values to 0, obtaining the (N+1)th binarization layer. Thus, N+1 binarization layers are formed.
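The layering-plus-binarization just described can be sketched as follows. Whether a boundary gray value falls in the lower or upper layer is an implementation choice (half-open bands are assumed here; the patent does not specify):

```python
import numpy as np

def binarize_layers(gray, minima):
    """Split the gray range at the N minimum value points and build N+1
    binarization layers: pixels inside the band are set to 255, all
    others to 0."""
    bounds = [0] + sorted(minima) + [256]
    return [np.where((gray >= lo) & (gray < hi), 255, 0).astype(np.uint8)
            for lo, hi in zip(bounds[:-1], bounds[1:])]

gray = np.array([[10, 100], [150, 240]], dtype=np.uint8)
layers = binarize_layers(gray, [64, 192])   # 2 minima -> 3 layers
```

Each pixel is white in exactly one layer, so the layers together partition the image.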
After the N+1 binarization layers are acquired, the target region of interest of each binarization layer is identified; the region to be processed is delineated from the processed image with a box, circle, ellipse, irregular polygon, or the like. Referring to fig. 3, identifying the target region of interest of each binarization layer in step 22 includes the following steps:
31. Perform dilation and erosion on each binarization layer to obtain the region of interest of each binarization layer.
Dilation and erosion are morphological operations, a family of shape-based image processing operations; the region of interest of each binarization layer can be obtained through dilation and erosion. It should be noted that each binarization layer contains at least one region of interest.
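A from-scratch NumPy sketch of binary dilation and erosion with a square structuring element is shown below (in practice OpenCV's `cv2.dilate`/`cv2.erode` would be used; the kernel size is arbitrary):

```python
import numpy as np

def dilate(mask, k=3):
    """Binary dilation: a pixel becomes True if any pixel in its
    k x k neighbourhood is True."""
    p = k // 2
    padded = np.pad(mask, p, mode='constant')
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask, k=3):
    """Binary erosion, expressed as the dual of dilation on the
    complement of the mask."""
    return ~dilate(~mask, k)

# A single pixel survives a dilate-then-erode (closing) round trip
m = np.zeros((5, 5), dtype=bool)
m[2, 2] = True
closed = erode(dilate(m, 3), 3)
```

Dilating first merges nearby white strokes into one connected region; eroding afterwards shrinks the result back, which is why dilation and erosion together connect text strokes into candidate regions of interest.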
32. Screen the regions of interest of each binarization layer to obtain the target region of interest of each binarization layer.
In the embodiment of the present disclosure, regions of interest that are not text regions may be removed according to at least one of the area of the region of interest, the uniformity of the inflection point (corner) distribution, and the rectangularity of the region of interest; the different cases are described in detail below.
In a first example, the step of screening the region of interest of each binarization layer to obtain a target region of interest of each binarization layer includes:
calculating the area of each initial region of interest in each binarization layer, wherein the area of the initial region of interest is the sum of the number of all pixels in the initial region of interest;
and eliminating the initial region of interest with the area smaller than a preset area threshold value in each binarization layer to obtain a target region of interest of each binarization layer.
In this embodiment, the area of the region of interest is screened, and the region of interest with the area smaller than the area threshold is considered as noise false detection and is directly rejected. Taking an area threshold value of 30 pixels as an example, counting the area of each initial region of interest in each binarization image layer, comparing the area of each initial region of interest with the area threshold value 30, removing the initial region of interest smaller than the area threshold value 30, and reserving the initial region of interest larger than the area threshold value 30 as a target region of interest.
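The area screening above can be sketched with a plain connected-component pass (4-connectivity and the 30-pixel threshold from the text; both are parameters in practice):

```python
import numpy as np
from collections import deque

def connected_regions(mask):
    """4-connected components of a boolean mask, each returned as a
    list of (row, col) pixels; a region's area is simply len(region)."""
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    regions = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                queue, region = deque([(sy, sx)]), []
                seen[sy, sx] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

def filter_by_area(regions, min_area=30):
    """Drop candidate regions whose pixel count is below the threshold."""
    return [r for r in regions if len(r) >= min_area]

mask = np.zeros((10, 10), dtype=bool)
mask[1:7, 1:7] = True        # 36-pixel block: large enough to keep
mask[9, 9] = True            # lone pixel: treated as noise
regions = connected_regions(mask)
kept = filter_by_area(regions, min_area=30)
```

The lone pixel is rejected as noise false detection while the 36-pixel block survives.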
In a second example, the step of screening the region of interest of each binarization layer to obtain a target region of interest of each binarization layer includes:
acquiring the area of each initial region of interest of each binarization layer and the number of corresponding inflection points, wherein the inflection points are used for indicating that the absolute value of the difference value between the pixel value of a pixel point and the pixel values of surrounding pixels is greater than a preset threshold value;
calculating the inflection point density of each initial region of interest according to the area of each initial region of interest and the number of corresponding inflection points;
determining the uniformity of inflection point distribution in the corresponding initial region of interest according to the inflection point density of each initial region of interest;
and eliminating initial interested areas with uneven inflection point density distribution in each binarization layer to obtain a target interested area of each binarization layer.
Specifically, inflection point (corner) detection is performed on each initial region of interest with the FAST algorithm, and the number of inflection points in each region is counted; the inflection point density of each initial region of interest is the number of its inflection points divided by its area. Then, each initial region of interest is equally divided into M sub-regions and the sub-inflection-point density of each sub-region is calculated. If the sub-inflection-point densities of a preset number of sub-regions in an initial region of interest are smaller than the region's overall inflection point density, the inflection point distribution of that region is considered non-uniform; regions with non-uniform inflection point distribution are rejected, and regions with uniform distribution are retained.
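A sketch of the uniformity test, using the nine-part split and the more-than-three-sparse-cells rule from the worked example later in this document; the `low_ratio` factor quantifying "much lower than the average density" is an assumption:

```python
import numpy as np

def is_uniform(corner_mask, low_ratio=0.2, max_sparse=3):
    """Split a region's corner (inflection point) map into a 3x3 grid;
    the region counts as non-uniform when more than max_sparse cells
    have a corner density far below the region's average density."""
    h, w = corner_mask.shape
    avg = corner_mask.mean()                 # corners per pixel over the region
    ys = np.linspace(0, h, 4, dtype=int)
    xs = np.linspace(0, w, 4, dtype=int)
    sparse = 0
    for i in range(3):
        for j in range(3):
            cell = corner_mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            if cell.size and cell.mean() < low_ratio * avg:
                sparse += 1
    return sparse <= max_sparse

uniform = np.zeros((9, 9), dtype=bool)
uniform[::2, ::2] = True          # corners spread evenly over the region
clustered = np.zeros((9, 9), dtype=bool)
clustered[0:3, 0:3] = True        # all corners packed into one cell
```

The evenly spread corner map passes the test, while the clustered one (typical of a small picture or noise rather than text) is flagged as non-uniform.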
In a third example, the step of screening the initial region of interest of each binarization layer to obtain a target region of interest of each binarization layer includes:
acquiring the outline of each initial region of interest in each binarization layer and the minimum circumscribed rectangle corresponding to the outline of each initial region of interest;
calculating the ratio of the outline area of each initial region of interest to the corresponding minimum circumscribed rectangle area to obtain the rectangularity of each initial region of interest;
and eliminating the initial region of interest with the rectangularity smaller than a preset threshold value in the binarization layer to obtain a target region of interest of the binarization layer.
Specifically, the contour of each initial region of interest is extracted and its minimum circumscribed rectangle is calculated, giving the contour area and the corresponding minimum circumscribed rectangle area of each initial region of interest. The rectangularity is defined as the ratio of the contour area to the minimum circumscribed rectangle area. The smaller the rectangularity, the more irregular the shape of the region and the less likely it is to be a text region; a rectangularity close to 1 means the contour is close to a rectangle, so the larger the rectangularity, the more likely the region is text. Therefore, the rectangularity of each initial region of interest is compared with a preset threshold: regions with rectangularity smaller than the threshold are rejected, and regions with rectangularity larger than the threshold are retained as the target regions of interest of each binarization layer.
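A simplified rectangularity sketch: the axis-aligned bounding box is used here as a stand-in for the minimum circumscribed rectangle (a rotated rectangle, e.g. from OpenCV's `cv2.minAreaRect`, would be tighter), and pixel count stands in for contour area:

```python
def rectangularity(region):
    """Pixel area of a region divided by its bounding-box area: an
    axis-aligned approximation of the contour-area over
    minimum-circumscribed-rectangle ratio described above."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    box = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(region) / box

block = [(y, x) for y in range(4) for x in range(10)]   # solid 4x10 block
diagonal = [(i, i) for i in range(10)]                  # thin diagonal line
```

The solid block scores 1.0 (text-like) while the thin diagonal scores 0.1 and would be rejected by any reasonable threshold.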
It should be noted that the screening may use any one of the area of the region of interest, the uniformity of the inflection point distribution, and the rectangularity of the region of interest, or any combination of two or all three of them. By jointly removing regions of interest that are not text regions, a complete and accurate text region can be obtained, avoiding both missed detection and false detection.
105. Superimpose the text regions of all layers to obtain the target text image.
In the embodiment of the present disclosure, since the text regions are dispersed over the layers, the text regions of all layers are superimposed and integrated to obtain the superimposed text image. The same part of the superimposed text image may, however, be covered by overlapping text regions from different layers; in that case the layer whose text region has the largest area is selected as the layer with the largest contribution, and only that layer is retained for the overlapping part. This avoids the same part of the superimposed text image containing multiple repeated text regions, while all possible regions of interest are still effectively extracted, avoiding the large amount of missed detection found in existing methods.
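The keep-the-largest-contribution rule can be realised greedily, claiming regions largest-first so that a smaller overlapping region from another layer never wins. The exact conflict-resolution order is an assumption; the patent only states that the largest-area region is retained:

```python
def merge_text_regions(regions_per_layer):
    """Superimpose per-layer text regions (lists of (row, col) pixels).
    Regions are claimed largest-first, so where regions from different
    layers overlap, only the largest-area one is kept."""
    all_regions = sorted((r for layer in regions_per_layer for r in layer),
                         key=len, reverse=True)
    claimed, kept = set(), []
    for region in all_regions:
        if not claimed.intersection(region):   # no clash with a larger region
            kept.append(region)
            claimed.update(region)
    return kept

big = [(0, c) for c in range(5)]       # layer 1: large text region
small = [(0, 1), (0, 2)]               # layer 2: overlaps the big region
other = [(3, 0), (3, 1), (3, 2)]       # layer 2: disjoint region
kept = merge_text_regions([[big], [small, other]])
```

The overlapping smaller region is dropped, while disjoint regions from every layer survive the merge.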
According to the character detection method provided by the embodiment of the present disclosure, the kernel density estimation map of the image to be processed is acquired and the image is divided into N+1 layers according to the N minimum value points in the map; by layering the image to be processed and identifying the text region of each layer, a complete text region can be obtained and missed detection is avoided.
Based on the text detection method provided in the embodiment corresponding to fig. 1, another embodiment of the present disclosure provides a text detection method, where the text detection method provided in this embodiment is used for detecting text in a natural scene, and specifically includes the following steps:
step 1, carrying out gray level processing and filtering denoising processing on an image.
Specifically, the grayscale processing refers to converting a color picture into a grayscale image.
And 2, carrying out self-adaptive hierarchical threshold processing on the image.
First, the gray level histogram of the image is drawn from the gray value (the Y value of the YUV pixel value) of each pixel. Next, a kernel probability density estimation method is used to plot the trend of the histogram, as shown by the black curve in fig. 4. Finally, the minimum value points of the curve are found, where a minimum value point is a point whose value is smaller than the values on both sides of it; as shown by the black circles in fig. 5, there are three minimum values in total, so the image can be divided into four layers.
Step 3, taking the minimum value point as a boundary, and carrying out layering processing on the image; and carrying out binarization processing on each layer to generate a binarization layer.
In this step, if there are n minimum values, the image is divided into n+1 layers; binarization processing is carried out on each layer, giving n+1 binarization layers.
With the minimum value points as boundaries, the image is layered as follows: for the first minimum value point, the pixels whose gray value lies between 0 and the first minimum value point are set to 255 and all others to 0, forming the 1st binarization layer; for the second minimum value point, the pixels whose gray value lies between the first and second minimum value points are set to 255 and all others to 0, forming the 2nd binarization layer. In the same way, 4 binarization layers are formed, which is similar to separating 4 image layers from one image.
And 4, respectively extracting the region of interest in each binarization layer by using a dilation and erosion operation, and identifying the text regions from the regions of interest.
In this step, the white regions in the binarization layer are connected together by dilation and erosion to generate regions of interest, as shown in fig. 6 and 7: fig. 6 is an image before dilation and erosion, fig. 7 is the image after dilation and erosion, and the white portions in fig. 7 are the regions of interest.
Since text forms continuous regions with many inflection points, and a common text region is not an extremely irregular geometric shape, the following processing may be performed on each binarization layer to obtain a more accurate text region:
firstly, the area of the region of interest is screened, and the region of interest with the area smaller than the area threshold is considered as noise false detection and is directly rejected. Wherein, the area threshold value can be 30 pixels.
Secondly, in the region of interest, inflection point (corner) detection is performed with the FAST algorithm (a pixel whose value differs greatly from the surrounding pixels can be regarded as an inflection point); the inflection point density of the region of interest is calculated; the uniformity of the inflection point distribution in the region is determined from the inflection point density, and regions of interest with uneven inflection point distribution are rejected.
Here, the inflection point density is the number of inflection points in a region of interest divided by its area. To speed up the calculation, a region of interest may be divided into nine equal parts; if the inflection point density of more than three parts is much lower than the average inflection point density, the inflection point distribution is considered uneven, and the region of interest is likely a small picture or noise rather than a text region.
For example: as shown in fig. 8, the region of interest A is divided into nine parts and the total number of inflection points in the whole region is 18; the inflection point distribution of region A is uniform. As shown in fig. 9, the region of interest B is also divided into nine parts with 18 inflection points in total, but the inflection point density of 4 of the parts is too small, so region B is considered not to be a text region.
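The nine-part uniformity test can be sketched as below. The corner map is a boolean mask of detected inflection points, and the `sparse_ratio` cutoff for "much lower than the average density" is an assumed value, since the patent does not quantify it:

```python
import numpy as np

def is_uniform(corner_mask, max_sparse_parts=3, sparse_ratio=0.25):
    """Split an ROI's inflection-point map into a 3x3 grid and flag the
    ROI as non-uniform when more than `max_sparse_parts` cells have a
    corner density far below the ROI-wide average density."""
    h, w = corner_mask.shape
    total = corner_mask.sum()
    if total == 0:
        return False  # no corners at all: certainly not text-like
    avg_density = total / (h * w)
    sparse = 0
    for i in range(3):
        for j in range(3):
            cell = corner_mask[i * h // 3:(i + 1) * h // 3,
                               j * w // 3:(j + 1) * w // 3]
            density = cell.sum() / max(cell.size, 1)
            if density < sparse_ratio * avg_density:
                sparse += 1
    return sparse <= max_sparse_parts
```

Regions flagged as non-uniform (more than three sparse parts) are rejected, mirroring the rule applied to region B above.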
Thirdly, the rectangularity of each region of interest is calculated, and regions of interest with low rectangularity are removed.
In this step, the outline of the region of interest is extracted using the Sobel operator, and the minimum bounding rectangle of the outline is calculated; rectangularity is defined as the ratio of the outline area to the area of the minimum bounding rectangle. The smaller the rectangularity, the more irregular the shape of the region of interest, and the less likely it is to be a text region. A region of interest whose rectangularity is smaller than a preset threshold is therefore probably not a text region.
Specifically, the outline and the minimum circumscribed rectangle of the region of interest are first computed with the open-source library OpenCV, avoiding nested duplicate rectangular frames; the outline may be of any shape, while the circumscribed rectangle is the rectangle wrapped around the region of interest. The area ratio of the two is then calculated: the closer the ratio is to 1, the closer the outline is to a rectangle, the higher the rectangularity, and the more likely the region of interest is text.
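A simplified rectangularity computation is shown below. The patent extracts the contour with the Sobel operator and takes the minimum circumscribed rectangle via OpenCV (`cv2.minAreaRect`), whereas this dependency-free sketch substitutes the axis-aligned bounding box and the filled region area, so it is an approximation of the described measure:

```python
import numpy as np

def rectangularity(mask):
    """Ratio of the region's pixel area to the area of its bounding
    rectangle; values near 1 mean the region is nearly rectangular."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0.0
    rect_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return float(mask.sum()) / rect_area
```

Regions whose rectangularity falls below the preset threshold would then be discarded as non-text.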
Thus, through the first to third processing steps above (screening by area, by uniformity of the inflection point distribution, and by rectangularity), regions of interest that are not text regions can be removed, and the remaining regions of interest are very likely text regions.
Step 5, superposing the layers processed in step 4 to obtain an image containing a plurality of text regions; for each part of the image, only the text region of the layer with the largest contribution is retained.
Because the text regions may be dispersed across the layers, in this step the text regions of all layers are integrated, as follows:
firstly, all layers are directly combined and superimposed. Because each part of the superimposed image may contain several text regions, the layer containing the text region with the largest area is selected as the layer with the largest contribution, and only that layer is retained for the part. This prevents the same part of the superimposed image from containing multiple repeated text regions.
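The "largest contribution" rule of step 5 can be sketched with text regions reduced to `(x, y, w, h)` bounding boxes, a simplification of the patent's pixel-level layers:

```python
def merge_layers(regions_per_layer):
    """Combine text regions from all layers; when regions from different
    layers overlap, keep only the one with the largest area (the layer
    with the largest contribution for that part of the image)."""
    def overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    merged = []
    all_regions = [r for layer in regions_per_layer for r in layer]
    # Consider larger regions first so each spot keeps its biggest region.
    for box in sorted(all_regions, key=lambda b: b[2] * b[3], reverse=True):
        if not any(overlaps(box, kept) for kept in merged):
            merged.append(box)
    return merged
```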
It should be noted that in this way all possible regions of interest can be extracted effectively, avoiding the large number of missed detections seen with conventional binarization or adaptive binarization algorithms such as Otsu's method.
For example, fig. 10 is an image to be processed. Using the character detection method provided by the present invention, a plurality of binarized layers are generated; after erosion and dilation they appear as shown in figs. 11 to 13, and the image generated after superposition is shown in fig. 14. It can be seen that the present invention extracts all possible text regions (regions of interest) in the image, so there is no missed detection.
Step 6, eliminating false detection regions by post-processing the image.
In a specific implementation, the image generated in step 5 may still contain false detection regions; a post-processing technique may be used to check for them and delete isolated points.
Specifically, a false detection region is generally an independent connected domain. A connected domain with a small area is expanded outwards by a factor of 10 and it is checked whether it connects to any other connected domain; if not, it is regarded as an isolated point and removed.
For example, as shown in fig. 15, a connected domain in the lower left corner of the figure is a false detection. After its rectangular region is expanded tenfold (the black bold rectangular frame), it is still not connected to any other rectangle, so it is regarded as an isolated point and deleted. Character recognition can then be performed in the text regions of the resulting image.
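The isolated-point check of step 6 can be sketched as follows, again with connected domains reduced to `(x, y, w, h)` boxes; the tenfold expansion factor comes from the text, while the box representation is an assumption:

```python
def is_isolated(box, others, scale=10):
    """Expand a connected domain's bounding box by `scale` about its
    centre and report whether it still touches no other domain; such
    boxes are treated as false detections and removed."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    ew, eh = w * scale, h * scale
    ex, ey = cx - ew / 2, cy - eh / 2
    for ox, oy, ow, oh in others:
        # Standard axis-aligned rectangle intersection test.
        if ex < ox + ow and ox < ex + ew and ey < oy + oh and oy < ey + eh:
            return False
    return True
```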
Experimental results show that, as shown in figs. 16 and 17, using the text detection method of the present invention on an i7-8550U CPU under Windows 10, identifying the text regions takes 10-20 ms. An existing machine recognition method takes about 100 ms under the same conditions.
In the character detection method provided by the embodiments of the disclosure, the image is divided into a plurality of layers and each layer is binarized, yielding a plurality of regions of interest; then, according to the area of each region of interest, the uniformity of its inflection point distribution, and its rectangularity, regions of interest that are not text regions are removed from each layer, so that the remaining regions of interest are very likely text regions. Complete and accurate text regions are thus obtained through the binarization of each layer, and missed detection is avoided.
In addition, since the resulting image is generated by superimposing a plurality of layers, one part of the image may contain several overlapping text regions. The invention retains, for each part of the image, only the text region of the layer with the largest contribution, i.e. the largest of the text regions contributed by the several layers, which prevents the same part of the superimposed image from containing multiple repeated text regions.
Based on the text detection method described in the embodiments corresponding to fig. 1 to fig. 3, the following is an embodiment of the apparatus of the present disclosure, which can be used to execute an embodiment of the method of the present disclosure.
The embodiment of the present disclosure provides a character detection apparatus, as shown in fig. 18, the character detection apparatus 180 includes: an obtaining module 1801, a determining module 1802, a layering module 1803, an identifying module 1804 and a superimposing module 1805;
an obtaining module 1801, configured to obtain a kernel density estimation map of an image to be processed;
a determining module 1802, configured to determine N minimum value points in the kernel density estimation map, where N ≥ 1;
a layering module 1803, configured to layer the image to be processed by using the N minimum value points as boundary points to obtain N +1 image layers;
an identifying module 1804, configured to identify a text region of each of the N +1 layers;
and a superimposing module 1805, configured to superimpose the text regions of the layers to obtain a target text image.
In one embodiment, as shown in FIG. 19, the identifying module 1804 comprises: a binarization processing submodule 1901 and an identifying sub-module 1902;
a binarization processing submodule 1901, configured to perform binarization processing on each of the N +1 image layers to obtain N +1 binarization image layers;
an identifying sub-module 1902, configured to identify a target region of interest of each binarization layer in the N +1 binarization layers;
the identifying sub-module 1902 is configured to identify a target region of interest of each binarization layer, so as to obtain a text region of each layer.
In one embodiment, as shown in fig. 20, the identifying sub-module 1902 includes: a dilation-erosion unit 2001 and a screening unit 2002;
a dilation-erosion unit 2001, configured to perform dilation and erosion on each binarization layer to obtain an initial region of interest of the binarization layer;
and a screening unit 2002, configured to screen the initial region of interest of each binarization layer to obtain a target region of interest of the binarization layer.
As shown in fig. 21, the screening unit 2002 includes: a calculating subunit 2101, a removing subunit 2102, an obtaining subunit 2103, and a determining subunit 2104;
in an embodiment, the calculating subunit 2101 is configured to calculate the area of each initial region of interest in each binarization layer, where the area of an initial region of interest is the total number of pixels it contains;
a removing subunit 2102, configured to remove the initial region of interest in each binarization layer, where the area of the initial region of interest is smaller than a preset area threshold, to obtain a target region of interest in each binarization layer.
In an embodiment, the obtaining subunit 2103 is configured to obtain the area of each initial region of interest of each binarization layer and the corresponding number of inflection points, where an inflection point indicates a pixel whose value differs from the pixel values of surrounding pixels by more than a preset threshold;
the calculating subunit 2101 is configured to calculate the inflection point density of each initial region of interest according to its area and the corresponding number of inflection points;
a determining subunit 2104 for determining uniformity of inflection point distribution in the corresponding initial region of interest according to the inflection point density of each initial region of interest;
a removing subunit 2102, configured to remove the initial region of interest with uneven inflection point density distribution in each binarization layer, to obtain a target region of interest of each binarization layer.
In one embodiment, the determining subunit 2104 is configured to: divide each initial region of interest equally into M sub-regions; calculate the sub-inflection point density of each of the M sub-regions; detect whether the sub-inflection point density of each sub-region is less than the corresponding inflection point density; and, when the sub-inflection point densities of a preset number of sub-regions are smaller than the corresponding inflection point density, determine that the inflection point distribution in the corresponding initial region of interest is not uniform.
In an embodiment, the obtaining subunit 2103 is configured to obtain the contour of each initial region of interest in each binarization layer and the minimum bounding rectangle corresponding to each contour;
the calculating subunit 2101 is configured to calculate a ratio of an outline area of each initial region of interest to a corresponding minimum circumscribed rectangle area to obtain a rectangularity of each initial region of interest;
the removing subunit 2102 is configured to remove initial regions of interest whose rectangularity is smaller than a preset threshold from the binarization layer, to obtain the target regions of interest of the binarization layer.
In an embodiment, the superimposing module 1805 is configured to superimpose the text regions of the layers to obtain a superimposed text image; and when an overlapped region exists in the superimposed text image, retain the layer corresponding to the text region with the largest area to obtain the target text image, where an overlapped region indicates that the same part of the superimposed text image contains repeated text regions from different layers.
In one embodiment, as shown in fig. 22, the text detection apparatus 180 further includes: a post-processing module 1806;
and a post-processing module 1806, configured to post-process each text region in the target text image and remove false detection regions that do not meet a preset requirement, to obtain the final target text image, where the preset requirement is that the connected domain where a text region is located, after being expanded by a preset step length, connects to the connected domains of other text regions.
The character detection device provided by the embodiments of the disclosure obtains a kernel density estimation map of the image to be processed, divides the image into N + 1 layers according to the N minimum value points in the map, and identifies the text regions of each layer, so that complete text regions can be obtained and missed detection is avoided.
The embodiment of the present disclosure further provides a text detection device, where the text detection device includes a receiver, a transmitter, a memory, and a processor, where the transmitter and the memory are respectively connected to the processor, the memory stores at least one computer instruction, and the processor is configured to load and execute the at least one computer instruction, so as to implement the text detection method described in the embodiment corresponding to fig. 1 to 3.
Based on the word detection method described in the embodiments corresponding to fig. 1 to fig. 3, embodiments of the present disclosure further provide a computer-readable storage medium, for example, the non-transitory computer-readable storage medium may be a Read Only Memory (ROM), a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The storage medium stores computer instructions for executing the text detection method described in the embodiment corresponding to fig. 1 to 3, which is not described herein again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method for detecting text, the method comprising:
acquiring a kernel density estimation map of an image to be processed;
determining N minimum value points in the kernel density estimation map, wherein N is greater than or equal to 1;
layering the to-be-processed image by taking the N minimum value points as boundary points to obtain N +1 image layers;
identifying a text region of each layer in the N +1 layers;
and overlapping the text areas of all the layers to obtain a target text image.
2. The method of claim 1, wherein the identifying text regions for each of the N +1 layers comprises:
performing binarization processing on each image layer in the N +1 image layers to obtain N +1 binarization image layers;
identifying a target region of interest of each binarization layer in the N +1 binarization layers;
and identifying the target region of interest of each binarization layer to obtain a text region of each layer.
3. The method according to claim 2, wherein said identifying a target region of interest for each of said N +1 binarization image layers comprises:
performing dilation and erosion on each binarization layer to obtain an initial region of interest of each binarization layer;
and screening the initial region of interest of each binarization layer to obtain a target region of interest of each binarization layer.
4. The method according to claim 3, wherein the step of screening the initial region of interest of each binarized image layer to obtain the target region of interest of each binarized image layer comprises:
calculating the area of each initial region of interest in each binarization layer, wherein the area of the initial region of interest is the total number of pixels in the initial region of interest;
and eliminating the initial region of interest with the area smaller than a preset area threshold value in each binarization layer to obtain a target region of interest of each binarization layer.
5. The method according to claim 3, wherein the step of screening the initial region of interest of each binarized image layer to obtain the target region of interest of each binarized image layer comprises:
acquiring the area of each initial region of interest of each binarization layer and the number of corresponding inflection points, wherein the inflection points are used for indicating that the difference value between the pixel value of a pixel point and the pixel values of surrounding pixels is greater than a preset threshold value;
calculating the inflection point density of each initial region of interest according to the area of each initial region of interest and the number of corresponding inflection points;
determining the uniformity of inflection point distribution in the corresponding initial region of interest according to the inflection point density of each initial region of interest;
and eliminating initial interested areas with uneven inflection point density distribution in each binarization layer to obtain a target interested area of each binarization layer.
6. The method of claim 5, wherein said determining a uniformity of a distribution of inflection points in a corresponding initial region of interest based on the inflection density of each of the initial regions of interest comprises:
dividing each initial region of interest into M sub-regions on average;
calculating a sub-inflection density of each sub-region in the M sub-regions of each initial region of interest;
detecting whether the sub-inflection point density of each sub-region is less than the corresponding inflection point density;
and when the sub-inflection point density of the sub-regions with the preset number is smaller than the corresponding inflection point density, determining that the inflection point distribution in the corresponding initial region of interest is not uniform.
7. The method according to claim 3, wherein the step of screening the initial region of interest of each binarized image layer to obtain the target region of interest of each binarized image layer comprises:
acquiring the outline of each initial region of interest in each binarization layer and the minimum circumscribed rectangle corresponding to each outline;
calculating the ratio of the outline area of each initial region of interest to the corresponding minimum circumscribed rectangle area to obtain the rectangularity of each initial region of interest;
and eliminating the initial region of interest with the rectangularity smaller than a preset threshold value in the binarization layer to obtain a target region of interest of the binarization layer.
8. The method according to claim 1, wherein the superimposing the text region of each layer to obtain the target text image comprises:
overlapping the text area of each layer to obtain a superimposed text image;
and when an overlapped region is detected in the superimposed text image, retaining the layer corresponding to the text region with the largest area to obtain the target text image, wherein the overlapped region indicates that the same part of the superimposed text image contains repeated text regions from different layers.
9. The method of claim 1, further comprising:
and post-processing each text region in the target text image, and eliminating false detection regions which do not meet preset requirements to obtain a final target text image, wherein the preset requirements are that a connected domain where the text region is located is expanded by a preset step length and then is connected with connected domains of other text regions.
10. A text detection apparatus comprising a processor and a memory, the memory having stored therein at least one computer instruction, the instruction being loaded and executed by the processor to perform the steps performed in the text detection method of any one of claims 1 to 9.
CN202010266335.3A 2020-04-07 2020-04-07 Character detection method and device Pending CN111612005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010266335.3A CN111612005A (en) 2020-04-07 2020-04-07 Character detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010266335.3A CN111612005A (en) 2020-04-07 2020-04-07 Character detection method and device

Publications (1)

Publication Number Publication Date
CN111612005A true CN111612005A (en) 2020-09-01

Family

ID=72197649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010266335.3A Pending CN111612005A (en) 2020-04-07 2020-04-07 Character detection method and device

Country Status (1)

Country Link
CN (1) CN111612005A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI760876B (en) * 2020-10-05 2022-04-11 凌華科技股份有限公司 Detection method of time-dependent text region of interest


Similar Documents

Publication Publication Date Title
CN110717489B (en) Method, device and storage medium for identifying text region of OSD (on Screen display)
CN110678901B (en) Information processing apparatus, information processing method, and computer-readable storage medium
US9965695B1 (en) Document image binarization method based on content type separation
US9042649B2 (en) Color document image segmentation and binarization using automatic inpainting
EP2709039A1 (en) Device and method for detecting the presence of a logo in a picture
Wakaf et al. Defect detection based on extreme edge of defective region histogram
WO2012074361A1 (en) Method of image segmentation using intensity and depth information
CN111882568B (en) Fire image edge extraction processing method, terminal and system
Sridevi et al. A survey on monochrome image segmentation methods
CN109344824B (en) Text line region detection method, device, medium and electronic equipment
CN111815570A (en) Regional intrusion detection method and related device thereof
WO2021118463A1 (en) Defect detection in image space
CN112597846A (en) Lane line detection method, lane line detection device, computer device, and storage medium
JP6183038B2 (en) Region extraction apparatus, region extraction method and program
CN113435219B (en) Anti-counterfeiting detection method and device, electronic equipment and storage medium
CN111612005A (en) Character detection method and device
CN111191482A (en) Brake lamp identification method and device and electronic equipment
WO2023160061A1 (en) Method and apparatus for determining moving object in image, electronic device, and storage medium
CN109977965B (en) Method and device for determining detection target in remote sensing airport image
CN115908802A (en) Camera shielding detection method and device, electronic equipment and readable storage medium
CN111739025B (en) Image processing method, device, terminal and storage medium
JP3150762B2 (en) Gradient vector extraction method and character recognition feature extraction method
CN112862846A (en) Official seal identification method and system combining multiple detection methods
CN114648751A (en) Method, device, terminal and storage medium for processing video subtitles
CN113538500B (en) Image segmentation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination