CN112580655A - Text detection method and device based on improved CRAFT - Google Patents


Info

Publication number
CN112580655A
CN112580655A (application) · CN112580655B (granted publication)
Authority
CN
China
Prior art keywords
picture
detected
cutting
feature
feature map
Prior art date
Legal status
Granted
Application number
CN202011574073.3A
Other languages
Chinese (zh)
Other versions
CN112580655B (en)
Inventor
范凌 (Fan Ling)
Current Assignee
Tezign Shanghai Information Technology Co Ltd
Original Assignee
Tezign Shanghai Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tezign Shanghai Information Technology Co Ltd
Priority to CN202011574073.3A
Publication of CN112580655A
Application granted
Publication of CN112580655B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The application discloses a text detection method and device based on an improved CRAFT. The method comprises: cropping a picture to be detected to obtain a set of cropped pictures; computing a feature map for each cropped picture to obtain a set of cropped feature maps, where each feature map contains single-character position information and inter-character connection information; stitching the cropped feature maps according to the positions of the cropped pictures in the picture to be detected, to obtain a feature map corresponding to the whole picture; and detecting text regions on that feature map according to the single-character position information and the inter-character connection information, and outputting the positions of the text regions. The method and device solve the technical problem of excessive GPU memory usage when detecting large pictures.

Description

Text detection method and device based on improved CRAFT
Technical Field
The present application relates to the field of text detection, and in particular to a text detection method and apparatus based on an improved CRAFT.
Background
With the continuous development of artificial intelligence, character recognition in natural scenes has become an indispensable link in the process. Text must be detected before it can be recognized. In the related art, CTPN (Connectionist Text Proposal Network) and CRAFT (Character Region Awareness For Text detection) are commonly used for text detection. However, CTPN cannot recognize curved text, requires a large number of thresholds to be set, is not robust, and cannot detect very large pictures: its GPU memory usage becomes excessive when detecting large pictures, which can easily crash other applications deployed on the same GPU. Although the CRAFT detection algorithm overcomes CTPN's inability to detect arbitrarily arranged text and its reliance on many thresholds, its GPU memory usage is still excessive when detecting large pictures.
For the problem of excessive GPU memory usage when detecting large pictures in the related art, no effective solution has yet been proposed.
Disclosure of Invention
The main objective of the present application is to provide a text detection method based on an improved CRAFT, so as to solve the problem of excessive GPU memory usage when detecting large pictures.
To achieve the above objective, the present application provides a text detection method and apparatus based on an improved CRAFT.
In a first aspect, the present application provides a text detection method based on an improved CRAFT.
The text detection method based on the improved CRAFT comprises the following steps:
cropping a picture to be detected to obtain a set of cropped pictures;
computing a feature map for each cropped picture to obtain a set of cropped feature maps, where each feature map contains single-character position information and inter-character connection information;
stitching the cropped feature maps according to the positions of the cropped pictures in the picture to be detected, to obtain a feature map to be detected corresponding to the picture to be detected;
and detecting text regions on the feature map to be detected according to the single-character position information and the inter-character connection information, and outputting the positions of the text regions.
Further, detecting text regions on the feature map to be detected according to the single-character position information and the inter-character connection information includes:
denoising the feature map to be detected according to the single-character position information and the inter-character connection information to obtain a denoised feature map;
detecting contours in the denoised feature map and generating an upright (axis-aligned) bounding rectangle for each contour;
and determining the text regions according to the upright bounding rectangles.
Further, denoising the feature map to be detected to obtain a denoised feature map includes:
denoising the channels of the heatmap of the picture to be detected to obtain a denoised feature map, where the heatmap values follow a Gaussian distribution.
Further, denoising the channels of the heatmap of the picture to be detected to obtain a denoised feature map includes:
adding the first and second channels of the heatmap to obtain a feature overlay, where the first channel contains single-character position information and the second channel contains inter-character connection information;
and binarizing the feature overlay to obtain the denoised feature map.
Further, cropping the picture to be detected includes:
cropping the picture to be detected according to a preset side length, and judging whether the remaining side length after each cut is greater than the minimum cropping side length;
if it is greater than the minimum cropping side length, continuing to crop according to the preset side length;
if it is less than the minimum cropping side length, stopping cropping.
Further, before the picture to be detected is cropped to obtain the set of cropped pictures, the method further includes:
standardizing the pixels of the picture to be detected according to a preset pixel standard deviation and a preset pixel mean.
In a second aspect, the present application provides an improved CRAFT-based text detection apparatus.
The text detection apparatus based on the improved CRAFT comprises:
a picture cropping module, configured to crop the picture to be detected to obtain a set of cropped pictures;
a feature acquisition module, configured to compute a feature map for each cropped picture to obtain a set of cropped feature maps, where each feature map contains single-character position information and inter-character connection information;
a feature stitching module, configured to stitch the cropped feature maps according to the positions of the cropped pictures in the picture to be detected, to obtain a feature map to be detected corresponding to the picture to be detected;
and a text detection module, configured to detect text regions on the feature map to be detected according to the single-character position information and the inter-character connection information, and output the positions of the text regions.
Further, the text detection module comprises:
a denoising unit, configured to denoise the feature map to be detected according to the single-character position information and the inter-character connection information to obtain a denoised feature map;
a contour detection unit, configured to detect contours in the denoised feature map and generate an upright bounding rectangle for each contour;
and a text determination unit, configured to determine the text regions according to the upright bounding rectangles.
Further, the denoising unit is also configured to:
denoise the channels of the heatmap of the picture to be detected to obtain a denoised feature map, where the heatmap values follow a Gaussian distribution;
add the first and second channels of the heatmap to obtain a feature overlay, where the first channel contains single-character position information and the second channel contains inter-character connection information; and binarize the feature overlay to obtain the denoised feature map.
Further, the picture cropping module comprises:
a side-length judging unit, configured to crop the picture to be detected according to the preset side length and judge whether the remaining side length is greater than the minimum cropping side length;
a continue-cropping unit, configured to continue cropping according to the preset side length if the remaining side length is greater than the minimum cropping side length;
and a stop-cropping unit, configured to stop cropping if the remaining side length is less than the minimum cropping side length.
Further, the text detection apparatus based on the improved CRAFT further comprises:
a picture preprocessing module, configured to standardize the pixels of the picture to be detected according to the preset pixel standard deviation and the preset pixel mean.
In a third aspect, the present application provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the method for improved CRAFT-based text detection provided in the first aspect.
In a fourth aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for text detection based on improved CRAFT provided in the first aspect when executing the program.
In the embodiments of the application, the picture to be detected is cropped, feature maps containing single-character position information and inter-character connection information are computed for the cropped pictures, the feature maps are stitched according to the positions of the cropped pictures in the picture to be detected, and text is detected on the stitched feature map. Pictures of any size can thus be detected while saving GPU memory, which solves the technical problem of excessive GPU memory usage when detecting large pictures.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a schematic flow chart of a text detection method based on improved CRAFT according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a text detection method based on improved CRAFT according to another embodiment of the present application;
FIG. 3 is a block diagram of a text detection device based on an improved CRAFT according to an embodiment of the present application;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to an embodiment of the present application, there is provided a text detection method based on improved CRAFT, as shown in fig. 1, the method includes the following steps S1 to S4:
s1: and cutting the picture to be detected to obtain a cut picture set.
Specifically, "cropping the picture to be detected" means: cropping the picture to be detected according to a preset side length, and judging whether the remaining side length is greater than the minimum cropping side length; if so, continuing to crop according to the preset side length; if not, stopping cropping.
The size of the picture to be detected is not fixed, so it must be cropped in a loop according to specified sizes: a maximum cropping side length (i.e., the preset side length) and a minimum cropping side length (i.e., the preset side length plus the minimum side length of a cropped picture). Optionally, if the remaining side length after cropping at the maximum side length is greater than the minimum cropping side length, cropping continues at the preset side length and each cropped picture is stored in the cropped picture set; if it is less than the minimum cropping side length, the remainder is not cropped further and is stored directly in the cropped picture set. For example, suppose the picture to be detected is 1800 × 1400 pixels, the maximum cropping side length is 1200 pixels, and the minimum cropping side length is 1500 pixels; the picture is then cropped with a window of side 1200 pixels. The 1800-pixel side is greater than the minimum cropping side length, so it is cut into 1200 and 600 pixels; the 1400-pixel side is less than the minimum cropping side length, so it is not cut. The cropped pictures are therefore 1200 × 1400 and 600 × 1400 pixels, and the cropped picture set contains these two pictures.
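As a sketch, the looping crop rule described above can be written as follows; the function names and the 1200/300 defaults mirror the example in the text, but are otherwise illustrative, not taken from the patent.

```python
def split_side(length, max_side=1200, min_side=300):
    """Split one image dimension into crop segments.

    Keep cutting windows of max_side as long as the remaining length is at
    least max_side + min_side (the minimum cropping side length); the final
    remainder is kept as one segment.
    """
    segments = []
    while length >= max_side + min_side:
        segments.append(max_side)
        length -= max_side
    segments.append(length)
    return segments


def crop_sizes(width, height, max_side=1200, min_side=300):
    """Return the (w, h) size of every tile in the cropped picture set."""
    return [(w, h)
            for h in split_side(height, max_side, min_side)
            for w in split_side(width, max_side, min_side)]


# The 1800 x 1400 example from the description:
# 1800 -> 1200 + 600, 1400 stays whole, giving two tiles.
print(crop_sizes(1800, 1400))  # [(1200, 1400), (600, 1400)]
```

Note that the termination test uses the minimum cropping side length (1500 = 1200 + 300), so no tile is ever narrower than the minimum side length of 300 pixels.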
S2: and calculating a feature map of the cutting picture set to obtain the cutting feature map set, wherein the feature map comprises single character position information and connection information between characters.
And after the cut picture set is obtained, calculating the characteristic graph of each picture in the cut picture set respectively. The characteristics of the picture may include characteristics of single character position information and characteristics of inter-character connection information, and may also include information that can express character characteristics, such as gray information and character layout information, which is not limited herein. The first channel of the feature map contains feature information of the position of a single character, and the second channel of the feature map contains feature information of connection between characters. Specifically, after combining a plurality of features of each picture, performing convolution, calculating a feature map of a second channel of each picture, and storing the feature map into a cutting feature map set to obtain a cutting feature map set.
S3: and splicing the cut feature picture set according to the position of the picture to be detected corresponding to the cut picture set to obtain the feature picture to be detected corresponding to the picture to be detected.
After the clipping feature map set is obtained, the clipping feature map set needs to be spliced according to the position of an original image (i.e., an image to be detected), so as to obtain a feature map to be detected. Optionally, after splicing the cropping characteristic diagrams, the spliced characteristic diagrams to be detected can be verified. For example, the concatenation check may be performed according to a single-word position feature in the feature diagram, or may be performed according to an inter-word connection feature in the feature diagram, which is not limited herein.
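The stitching step can be sketched as follows, ignoring any scale factor between the input picture and the feature map; `stitch` and its arguments are illustrative names, not from the patent.

```python
import numpy as np


def stitch(tiles, positions, full_h, full_w, channels=2):
    """Paste per-tile feature maps back at their original offsets.

    positions holds the (top, left) corner of each tile in the original
    picture; the result is one feature map aligned with that picture.
    """
    full = np.zeros((full_h, full_w, channels), dtype=np.float32)
    for tile, (top, left) in zip(tiles, positions):
        h, w, _ = tile.shape
        full[top:top + h, left:left + w] = tile
    return full


# Two tiles covering a 4 x 6 picture side by side.
a = np.ones((4, 4, 2), dtype=np.float32)
b = np.full((4, 2, 2), 2.0, dtype=np.float32)
stitched = stitch([a, b], [(0, 0), (0, 4)], 4, 6)
print(stitched.shape)  # (4, 6, 2)
```

If the list contains a single feature map (the picture was never cropped), the loop degenerates to one paste and no real stitching occurs, matching the single-map case in the embodiment below.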
S4: and detecting a text region on the characteristic diagram to be detected according to the single character position information and the inter-character connection information, and outputting the position of the text region.
Specifically, detecting the text region on the feature map to be detected according to the individual character position information and the inter-character connection information includes: according to the single character position information and the inter-character connection information, carrying out noise reduction processing on the feature map to be detected to obtain a noise reduction feature map; detecting the outline of the noise reduction characteristic diagram and generating a positive circumscribed rectangle corresponding to the outline; and determining the text area according to the positive circumscribed rectangle.
And (3) denoising the characteristic graph to be detected according to the single character position information and the inter-character connection information, wherein in an example, denoising characteristic graphs are obtained by deleting noise points except the single character position information and the inter-character connection information. And further detecting the outline of the noise reduction feature map to obtain the outline of each character block in the feature overlay map, and generating a positive circumscribed rectangle corresponding to the outline. And storing all the positive circumscribed rectangles in the binarized feature overlay image to generate a list, simultaneously storing the outline generation list of each text block, and enabling the positive circumscribed rectangles to correspond to the outlines, wherein the output positive circumscribed rectangles are determined text areas.
Denoising the feature map to be detected to obtain a denoised feature map specifically includes: denoising the channels of the heatmap of the picture to be detected, where the heatmap values follow a Gaussian distribution.
The feature map to be detected is a two-channel heatmap, and after denoising, each channel approximates a Gaussian distribution. Optionally, the denoising can use a grayscale heatmap with an approximately Gaussian profile: text regions of the feature map correspond to the peaks (the "hot" regions) of such a heatmap. The feature map is denoised according to a threshold, which may be preset by the system or set by the user as required, and is not limited here. For example, with a threshold of 50, pixels with values below 50 in the feature map are deleted as noise. Optionally, after the heatmap channels are denoised, the features of the picture to be detected are extracted.
Denoising the channels of the heatmap of the picture to be detected to obtain a denoised feature map specifically includes: adding the first and second channels of the heatmap to obtain a feature overlay, where the first channel contains single-character position information and the second channel contains inter-character connection information; and binarizing the feature overlay to obtain the denoised feature map.
When processing the heatmap, its two channels are added together for denoising: the feature map of the first channel and the feature map of the second channel are combined by matrix addition to obtain a feature overlay that contains both the single-character position information and the inter-character connection information. The feature overlay has a single channel, and a connected region in it may be a text block of a single character or a text block composed of multiple characters (i.e., a piece of text). The feature overlay is then binarized to obtain the denoised feature map; optionally, the binarization threshold may be a user-defined or system-preset confidence level.
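A minimal sketch of the channel addition and binarization described above; the function name and the 0.5 confidence threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np


def denoise(heatmap, conf_threshold=0.5):
    """Add the character-position channel (0) and the inter-character
    connection channel (1), then binarize the overlay at a confidence
    threshold so weak responses are suppressed as noise."""
    overlay = heatmap[..., 0] + heatmap[..., 1]  # matrix addition
    return (overlay > conf_threshold).astype(np.uint8)


hm = np.zeros((3, 3, 2), dtype=np.float32)
hm[1, 1, 0] = 0.6  # a character-position response, kept
hm[1, 2, 1] = 0.3  # a weak connection response, removed as noise
mask = denoise(hm)
print(mask)
```

The resulting binary mask is what the contour detection in the next step operates on: each connected region of ones is one text block.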
Further, before the picture to be detected is cropped to obtain the set of cropped pictures, the method further includes:
standardizing the pixels of the picture to be detected according to a preset pixel standard deviation and a preset pixel mean.
The preset pixel standard deviation may be determined from a data set (e.g., ImageNet) or set by the user, which is not limited here; the same applies to the preset pixel mean. "Standardizing the pixels of the picture to be detected according to the preset pixel standard deviation and the preset pixel mean" specifically means transforming the pixel values of the picture with the preset standard deviation and mean so that they follow a standard normal distribution.
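Assuming the conventional ImageNet per-channel statistics and the usual (x - mean) / std direction of standardization (the translated text words the mapping the other way around), the preprocessing might look like this; all names here are illustrative.

```python
import numpy as np

# ImageNet per-channel statistics, a common assumed default.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def standardize(img):
    """Standardize pixels toward a standard normal distribution.

    img is an H x W x 3 array with values in [0, 1]; each channel is
    shifted by its mean and scaled by its standard deviation.
    """
    return (img.astype(np.float32) - IMAGENET_MEAN) / IMAGENET_STD


img = np.full((2, 2, 3), 0.5, dtype=np.float32)
out = standardize(img)
print(out.shape)  # (2, 2, 3)
```

This runs once per input picture, before the crop loop of step S1.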
Illustratively, according to another alternative embodiment of the present application, as shown in fig. 2, the method includes:
(1) Input a picture. On input, the pixel values are standardized to a standard normal distribution using the per-channel mean and standard deviation, both taken from the ImageNet data set.
(2) Crop the picture and compute the feature map of each slice in a loop. This step crops the picture to a specified size. For example, with a maximum side length of 1200 pixels (the preset side length of step S1 above) and a minimum remaining side length of 300 pixels (the minimum picture side length of step S1 above), the picture is cropped with a window of side 1200 pixels and each cropped piece is stored in a picture list; if the length remaining at a window position after subtracting 1200 pixels would be less than 300 pixels, that position is not cropped further and the remainder is stored directly in the picture list. A loop then computes the feature map of each picture (the feature map has 2 channels, corresponding to the first and second channels of step S2) and puts it into a feature map list. If the longest side of the picture is less than 1200 plus 300 pixels, the feature map is computed directly without cropping.
(3) Stitch the feature maps. The feature maps in the list from step (2) are stitched according to their positions in the original picture, yielding a feature map that matches and corresponds to the original picture. If there is only one feature map in the list, no stitching is needed.
(4) Detect text regions on the feature map. Channel 1 of the feature map (the first channel of step S2) represents the position of each character (each estimated as a Gaussian distribution), and channel 2 (the second channel of step S2) represents the connection between adjacent characters (each also estimated as a Gaussian distribution). Channels 1 and 2 are added (matrix addition) to obtain a single-channel feature map containing both the character positions and the connections between them (the connections join characters into text blocks, i.e., sentences). This feature map is binarized (the binarization threshold is a confidence level, meaning only text regions with confidence above the threshold are extracted), the contour of each text block is found by contour detection, and an upright bounding rectangle is generated for each contour.
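The contour-and-rectangle step is typically done with OpenCV's `findContours` and `boundingRect`; the dependency-free sketch below illustrates the same idea with a 4-connected flood fill (all names are illustrative, not from the patent).

```python
from collections import deque


def bounding_rects(mask):
    """Upright bounding rectangle (x, y, w, h) for each connected text
    block in a binary mask given as a list of lists of 0/1.

    A pure-Python stand-in for cv2.findContours + cv2.boundingRect.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Flood-fill one text block, tracking its extents.
                x0 = x1 = x
                y0 = y1 = y
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                rects.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return rects


# One 3 x 2 word block and one isolated character.
m = [[0, 0, 0, 0, 0, 0, 0, 0],
     [0, 1, 1, 1, 0, 0, 0, 0],
     [0, 1, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 0, 0, 0]]
print(bounding_rects(m))  # [(1, 1, 3, 2), (6, 3, 1, 1)]
```

Each rectangle corresponds to one contour, matching the rectangle/contour pairing that step (5) outputs.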
(5) Output the positions of all text regions. The bounding rectangles from step (4) are stored in a list and the contours in another list, with each rectangle corresponding to one contour. The rectangle list is output as the text regions, and the contour list as the actual outline of the text in each region, which can be used to remove unnecessary information.
From the above description, it can be seen that the present application achieves the following technical effects:
In the embodiments of the application, the picture to be detected is cropped, feature maps containing single-character position information and inter-character connection information are computed for the cropped pictures, the feature maps are stitched according to the positions of the cropped pictures in the picture to be detected, and text is detected on the stitched feature map; pictures of any size can thus be detected, saving GPU memory.
Comparing the improved-CRAFT text detection method with the CTPN text detection algorithm: CTPN cannot recognize curved text, requires many thresholds, is not robust, cannot detect very large pictures, and its high GPU usage during large-picture detection can easily crash other applications deployed on the same GPU. The CRAFT detection algorithm can detect arbitrarily arranged text with fewer thresholds and an intuitive network structure, and the improved CRAFT can detect pictures of any size while saving GPU memory.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, the steps may in some cases be performed in an order different from that presented here.
According to an embodiment of the present application, there is further provided an apparatus 30 for implementing the above text detection method based on the improved CRAFT. As shown in fig. 3, the text detection apparatus 30 based on the improved CRAFT comprises:
the picture cutting module 301 is configured to cut a picture to be detected to obtain a cut picture set;
the feature obtaining module 302 is configured to calculate a feature map of the clipped picture set to obtain a clipped feature map set, where the feature map includes single character position information and inter-character connection information;
the feature splicing module 303 is configured to splice the cut feature map set according to the position of the cut picture set corresponding to the to-be-detected picture, so as to obtain a to-be-detected feature map corresponding to the to-be-detected picture;
and the text detection module 304 is configured to detect a text region on the feature map to be detected according to the single character position information and the inter-character connection information, and output a position of the text region.
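The round trip performed by the picture cutting module 301 and the feature splicing module 303 can be sketched as follows. For simplicity the sketch assumes each cut picture's feature map has the same resolution as the cut itself (a real CRAFT head outputs a lower-resolution map, so positions would be scaled accordingly); all names are illustrative:

```python
import numpy as np

def cut_with_positions(img, side):
    """Picture cutting (module 301): cut `img` into side-by-side tiles,
    recording each tile's top-left position in the original picture."""
    h, w = img.shape[:2]
    return [((y, x), img[y:y + side, x:x + side])
            for y in range(0, h, side)
            for x in range(0, w, side)]

def splice_features(tiles, full_shape):
    """Feature splicing (module 303): reassemble per-tile feature maps
    into one feature map for the whole picture, using the recorded
    positions of the cut picture set."""
    out = np.zeros(full_shape, dtype=np.float32)
    for (y, x), feat in tiles:
        out[y:y + feat.shape[0], x:x + feat.shape[1]] = feat
    return out

img = np.arange(36, dtype=np.float32).reshape(6, 6)
tiles = cut_with_positions(img, 3)            # four 3x3 cuts
restored = splice_features(tiles, img.shape)  # identity round trip here
assert np.array_equal(restored, img)
```

Because only one tile is in GPU memory at a time during feature computation, this is where the memory saving for arbitrarily large pictures comes from.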
Further, the picture cutting module 301 includes:
the side length judging unit is used for cutting the picture to be detected according to a preset side length and judging whether the side length of the cut picture is larger than a minimum cutting side length;
the continuous cutting unit is used for continuing to cut according to the preset side length if the side length is larger than the minimum cutting side length;
and the cutting stopping unit is used for not continuing to cut if the side length is smaller than the minimum cutting side length.
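One hedged reading of the stopping rule implemented by these three units, along a single axis: `preset` and `min_side` stand in for the preset side length and the minimum cutting side length, and the concrete numbers are illustrative:

```python
def cut_starts(total, preset, min_side):
    """Return cut start offsets along one axis: keep cutting every
    `preset` pixels while the remaining side is longer than `min_side`;
    once the remainder is at most `min_side`, stop cutting and leave it
    inside the last cut."""
    starts, pos = [0], preset
    while total - pos > min_side:
        starts.append(pos)
        pos += preset
    return starts

print(cut_starts(1000, 300, 100))  # [0, 300, 600]
print(cut_starts(250, 300, 100))   # [0] -- picture smaller than preset
```

Stopping rather than emitting a sliver avoids feeding the network a crop too small to carry usable character context.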
Further, the text detection module 304 includes:
the noise reduction processing unit is used for performing noise reduction processing on the feature map to be detected according to the single character position information and the inter-character connection information to obtain a noise reduction feature map;
the contour detection unit is used for detecting the contour of the noise reduction feature map and generating a positive circumscribed rectangle corresponding to the contour;
and the text determining unit is used for determining the text area according to the positive circumscribed rectangle.
Further, the noise reduction processing unit further includes:
and denoising the channels in the thermodynamic diagram of the picture to be detected to obtain a denoising feature diagram, wherein the thermodynamic diagram is in Gaussian distribution.
Further, the noise reduction processing unit further includes:
the method comprises the steps of adding a first channel and a second channel of the thermodynamic diagram to obtain a feature superposition diagram, wherein the first channel of the thermodynamic diagram contains feature information of a single character position, and the second channel of the thermodynamic diagram contains feature information of connection between characters; and carrying out noise reduction processing on the feature overlay image through binarization to obtain a noise reduction feature image after noise reduction.
Further, the text detection device 30 based on modified CRAFT further includes:
and the picture preprocessing module is used for carrying out standardization processing on the pixels of the picture to be detected according to the preset pixel standard deviation and the preset pixel mean value.
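The standardization performed by the picture preprocessing module can be sketched as follows. The per-channel values shown are the commonly used ImageNet statistics, included only as an illustrative choice; the patent says merely that the mean and standard deviation are preset:

```python
import numpy as np

# Illustrative preset values (ImageNet per-channel statistics);
# the patent does not specify the actual preset numbers.
PRESET_MEAN = np.array([0.485, 0.456, 0.406])
PRESET_STD = np.array([0.229, 0.224, 0.225])

def standardize(img):
    """Standardize an HxWx3 uint8 picture with the preset per-channel
    pixel mean and standard deviation before cutting and inference."""
    return (img.astype(np.float32) / 255.0 - PRESET_MEAN) / PRESET_STD

white = np.full((2, 2, 3), 255, dtype=np.uint8)
out = standardize(white)
assert np.isclose(out[0, 0, 0], (1.0 - 0.485) / 0.229)
```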
Specifically, for the implementation of each module in this embodiment, reference may be made to the corresponding implementation in the method embodiment, and details are not repeated here.
From the above description, it can be seen that the following technical effects are achieved by the present application:
in the embodiment of the application, the picture to be detected is cut into a set of cut pictures; feature maps containing single character position information and inter-character connection information are calculated for the cut pictures and spliced according to the positions of the cut pictures in the picture to be detected; and text detection is performed on the spliced feature map. Pictures of any size can therefore be detected, achieving the technical effect of saving GPU memory.
An embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above text detection method based on improved CRAFT, including: cutting the picture to be detected to obtain a cut picture set; calculating feature maps of the cut picture set to obtain a cut feature map set, wherein the feature maps comprise single character position information and inter-character connection information; splicing the cut feature map set according to the position of the cut picture set corresponding to the picture to be detected to obtain a feature map to be detected corresponding to the picture to be detected; and detecting a text region on the feature map to be detected according to the single character position information and the inter-character connection information, and outputting the position of the text region.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device includes: a processor 401, a memory 402, and a bus 403;
The processor 401 and the memory 402 communicate with each other through the bus 403. The processor 401 is configured to call program instructions in the memory 402 to perform the text detection method based on improved CRAFT provided by the above embodiments, including: cutting the picture to be detected to obtain a cut picture set; calculating feature maps of the cut picture set to obtain a cut feature map set, wherein the feature maps comprise single character position information and inter-character connection information; splicing the cut feature map set according to the position of the cut picture set corresponding to the picture to be detected to obtain a feature map to be detected corresponding to the picture to be detected; and detecting a text region on the feature map to be detected according to the single character position information and the inter-character connection information, and outputting the position of the text region.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices; and they may alternatively be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device, fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description covers only preferred embodiments of the present application and is not intended to limit the present application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (10)

1. A text detection method based on improved CRAFT is characterized by comprising the following steps:
cutting the picture to be detected to obtain a cut picture set;
calculating feature maps of the cut picture set to obtain a cut feature map set, wherein the feature maps comprise single character position information and inter-character connection information;
splicing the cut feature map set according to the position of the cut picture set corresponding to the picture to be detected to obtain a feature map to be detected corresponding to the picture to be detected;
and detecting a text region on the characteristic diagram to be detected according to the single character position information and the inter-character connection information, and outputting the position of the text region.
2. The text detection method based on the improved CRAFT of claim 1, wherein the detecting the text region on the feature map to be detected according to the single character position information and the inter-character connection information comprises:
according to the single character position information and the inter-character connection information, carrying out noise reduction processing on the feature map to be detected to obtain a noise reduction feature map;
detecting the contour of the noise reduction feature map, and generating a positive circumscribed rectangle corresponding to the contour;
and determining a text area according to the positive circumscribed rectangle.
3. The text detection method based on improved CRAFT of claim 2, wherein the performing noise reduction processing on the feature map to be detected to obtain a noise reduction feature map comprises:
denoising a channel in the heat map of the picture to be detected to obtain the noise reduction feature map, wherein the heat map follows a Gaussian distribution.
4. The method according to claim 3, wherein the denoising a channel in the heat map of the picture to be detected to obtain the noise reduction feature map comprises:
adding a first channel and a second channel of the heat map to obtain a feature overlay map, wherein the first channel of the heat map contains feature information of single character positions, and the second channel of the heat map contains feature information of connections between characters;
and performing noise reduction processing on the feature overlay map through binarization to obtain the noise reduction feature map.
5. The method according to claim 1, wherein the cropping the picture to be detected comprises:
cutting the picture to be detected according to a preset side length, and judging whether the side length of the cut picture is larger than a minimum cutting side length;
if the side length is larger than the minimum cutting side length, continuing to cut according to the preset side length;
and if the side length is smaller than the minimum cutting side length, not continuing to cut.
6. The method according to claim 1, wherein before the cropping the picture to be detected to obtain the cropped picture set, the method further comprises:
and carrying out standardization processing on the pixels of the picture to be detected according to the preset pixel standard deviation and the preset pixel mean value.
7. A text detection device based on modified CRAFT, comprising:
the picture cutting module is used for cutting the picture to be detected to obtain a cut picture set;
the feature acquisition module is used for calculating feature maps of the cut picture set to obtain a cut feature map set, wherein the feature maps comprise single character position information and inter-character connection information;
the feature splicing module is used for splicing the cut feature map set according to the position of the cut picture set corresponding to the picture to be detected to obtain a feature map to be detected corresponding to the picture to be detected;
and the text detection module is used for detecting the text region on the characteristic diagram to be detected according to the single character position information and the connection information between the characters and outputting the position of the text region.
8. The apparatus of claim 7, wherein the text detection module comprises:
the noise reduction processing unit is used for performing noise reduction processing on the feature map to be detected according to the single character position information and the inter-character connection information to obtain a noise reduction feature map;
the contour detection unit is used for detecting the contour of the noise reduction feature map and generating a positive circumscribed rectangle corresponding to the contour;
and the text determining unit is used for determining a text area according to the positive circumscribed rectangle.
9. A computer-readable storage medium storing computer instructions for causing a computer to perform the method for improved CRAFT-based text detection according to any one of claims 1-6.
10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method for improved CRAFT based text detection of any of claims 1-6.
CN202011574073.3A 2020-12-25 2020-12-25 Text detection method and device based on improved CRAFT Active CN112580655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011574073.3A CN112580655B (en) 2020-12-25 2020-12-25 Text detection method and device based on improved CRAFT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011574073.3A CN112580655B (en) 2020-12-25 2020-12-25 Text detection method and device based on improved CRAFT

Publications (2)

Publication Number Publication Date
CN112580655A true CN112580655A (en) 2021-03-30
CN112580655B CN112580655B (en) 2021-10-08

Family

ID=75140347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011574073.3A Active CN112580655B (en) 2020-12-25 2020-12-25 Text detection method and device based on improved CRAFT

Country Status (1)

Country Link
CN (1) CN112580655B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919146A (en) * 2019-02-02 2019-06-21 上海兑观信息科技技术有限公司 Picture character recognition methods, device and platform
CN110032938A (en) * 2019-03-12 2019-07-19 北京汉王数字科技有限公司 A kind of Tibetan language recognition method, device and electronic equipment
US20190303715A1 (en) * 2018-03-29 2019-10-03 Qualcomm Incorporated Combining convolution and deconvolution for object detection
CN111553346A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Scene text detection method based on character region perception
CN111563505A (en) * 2019-02-14 2020-08-21 北京奇虎科技有限公司 Character detection method and device based on pixel segmentation and merging
CN111611933A (en) * 2020-05-22 2020-09-01 中国科学院自动化研究所 Information extraction method and system for document image
CN111612003A (en) * 2019-02-22 2020-09-01 北京京东尚科信息技术有限公司 Method and device for extracting text in picture
CN111709956A (en) * 2020-06-19 2020-09-25 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and readable storage medium
CN112016545A (en) * 2020-08-11 2020-12-01 中国银联股份有限公司 Image generation method and device containing text
CN112085022A (en) * 2020-09-09 2020-12-15 上海蜜度信息技术有限公司 Method, system and equipment for recognizing characters


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MINGYU SHANG 等: ""Character Region Awareness Network for Scene Text Detection"", 《2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO(ICME)》 *
YOUNGMIN BAEK 等: ""Character Region Awareness for Text Detection"", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION(CVPR)》 *
王建新: ""基于深度学习的自然场景文本检测与识别综述"", 《软件学报》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408657A (en) * 2021-07-14 2021-09-17 福建天晴在线互动科技有限公司 Method and system for dynamically planning and merging game atlas
CN113408657B (en) * 2021-07-14 2024-01-30 福建天晴在线互动科技有限公司 Method and system for dynamic planning and merging of game atlas

Also Published As

Publication number Publication date
CN112580655B (en) 2021-10-08

Similar Documents

Publication Publication Date Title
US7715628B2 (en) Precise grayscale character segmentation apparatus and method
CN110866529A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN112926565B (en) Picture text recognition method, system, equipment and storage medium
CN112580655B (en) Text detection method and device based on improved CRAFT
CN112801232A (en) Scanning identification method and system applied to prescription entry
CN110781770A (en) Living body detection method, device and equipment based on face recognition
CN111461070B (en) Text recognition method, device, electronic equipment and storage medium
CN113159026A (en) Image processing method, image processing apparatus, electronic device, and medium
CN112651953A (en) Image similarity calculation method and device, computer equipment and storage medium
CN113592720B (en) Image scaling processing method, device, equipment and storage medium
CN110533020B (en) Character information identification method and device and storage medium
CN113221601A (en) Character recognition method, device and computer readable storage medium
CN112580738B (en) AttentionOCR text recognition method and device based on improvement
CN113129298A (en) Definition recognition method of text image
CN109614923B (en) OCR document recognition method and device
CN111695550A (en) Character extraction method, image processing device and computer readable storage medium
CN112580638B (en) Text detection method and device, storage medium and electronic equipment
CN114170604A (en) Character recognition method and system based on Internet of things
CN114596638A (en) Face living body detection method, device and storage medium
CN112784825A (en) Method for identifying characters in picture, method, device and equipment for searching keywords
CN111079643A (en) Face detection method and device based on neural network and electronic equipment
CN115049837B (en) Characteristic diagram interference removing method and screen shot watermark identification method comprising same
CN111259744B (en) Face detection method and device based on skin model and SVM classifier
CN117037175A (en) Text detection method, device, storage medium, electronic equipment and product
CN117058052A (en) Image enhancement method, device, equipment and medium based on bidirectional normalized stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant