CN114445825A - Character detection method and device, electronic equipment and storage medium - Google Patents

Character detection method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN114445825A
Authority
CN
China
Prior art keywords
initial
information
bounding box
characters
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210117039.6A
Other languages
Chinese (zh)
Inventor
刘威威
杜宇宁
李晨霞
郭若愚
赖宝华
马艳军
于佃海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210117039.6A priority Critical patent/CN114445825A/en
Publication of CN114445825A publication Critical patent/CN114445825A/en
Priority to PCT/CN2022/109024 priority patent/WO2023147717A1/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/162 Quantising the image signal

Abstract

The present disclosure provides a character detection method and apparatus, an electronic device, and a storage medium, and relates to the field of artificial intelligence, in particular to the fields of deep learning, character recognition, and image processing. The character detection method is implemented as follows: detecting a binary image representing a region where characters are located in an image to be processed, to obtain contour information of the region where the characters are located; determining position information of an initial bounding box for the characters according to the contour information; determining an edge extension value for the initial bounding box according to the position information; and extending the edges of the initial bounding box according to the edge extension value, to obtain position information of a bounding box used for recognizing the characters.

Description

Character detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to the fields of deep learning, character recognition, and image processing, and more particularly to a character detection method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer technology and network technology, deep learning technology has been widely used in many fields. For example, a deep learning technique may be used to detect characters in an image and locate a character region in the image, so as to identify the characters in the image.
Disclosure of Invention
The present disclosure provides a character detection method and apparatus, an electronic device, and a storage medium that improve detection efficiency.
According to an aspect of the present disclosure, there is provided a character detection method, including: detecting a binary image representing a region where characters are located in an image to be processed, to obtain contour information of the region where the characters are located; determining position information of an initial bounding box for the characters according to the contour information; determining an edge extension value for the initial bounding box according to the position information; and extending the edges of the initial bounding box according to the edge extension value, to obtain position information of a bounding box used for recognizing the characters.
According to another aspect of the present disclosure, there is provided a character detection apparatus, including: an image detection module configured to detect a binary image representing a region where characters are located in an image to be processed, to obtain contour information of the region where the characters are located; a position determination module configured to determine position information of an initial bounding box for the characters according to the contour information; a value determination module configured to determine an edge extension value for the initial bounding box according to the position information; and a position obtaining module configured to extend the edges of the initial bounding box according to the edge extension value, to obtain position information of a bounding box used for recognizing the characters.
According to another aspect of the present disclosure, there is provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the character detection method provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the character detection method provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product including computer programs/instructions which, when executed by a processor, implement the character detection method provided by the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an application scenario of a text detection method and apparatus according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram of a text detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating the principle of extending the sides of an initial bounding box according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a text detection method according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a text detection device according to an embodiment of the present disclosure; and
FIG. 6 is a block diagram of an electronic device for implementing a text detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a character detection method that includes an image detection stage, a position determination stage, an extension value determination stage, and a position acquisition stage. In the image detection stage, a binary image representing the region where characters are located in an image to be processed is detected to obtain contour information of the region where the characters are located. In the position determination stage, position information of an initial bounding box for the characters is determined according to the contour information. In the extension value determination stage, an edge extension value for the initial bounding box is determined from the position information. In the position acquisition stage, the edges of the initial bounding box are extended according to the edge extension value to obtain position information of a bounding box used for recognizing the characters.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a text detection method and apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be various electronic devices with processing functionality, including but not limited to a smartphone, a tablet, a laptop, a desktop computer, a server, and so on.
The electronic device 110 may perform text detection on an input image 120 to obtain a text bounding box 130. The text included in the image 120 can then be obtained by performing text recognition on the image region within the text bounding box 130, for example using a text recognition technology such as optical character recognition (OCR). By first detecting the text bounding box 130 and then performing text recognition, the accuracy of scene text recognition (STR) can be improved; for example, accurate recognition of curved text can be achieved.
In one embodiment, the electronic device 110 may perform text detection via a text detection algorithm 140 to obtain the text bounding box 130. The text detection algorithm 140 may be a regression-based detection algorithm or a segmentation-based detection algorithm. Regression-based detection algorithms include, for example, the TextBoxes algorithm, the TextBoxes++ algorithm, and the Rotational Region CNN (R2CNN) algorithm. Segmentation-based detection algorithms include, for example, the PixelLink algorithm, the Progressive Scale Expansion Network (PSENet), and the Differentiable Binarization Network (DBNet).
In one embodiment, the text detection algorithm 140 may be provided by the server 150. For example, the electronic device 110 may be communicatively coupled to the server 150 via a network to send algorithm acquisition requests to the server 150. Accordingly, the server 150 may send the text detection algorithm 140 to the electronic device 110 in response to the algorithm acquisition request. When the text detection algorithm 140 is implemented by a deep learning network model, the server 150 may also be used to train the deep learning network model, for example, and send the trained deep learning network model in response to an algorithm acquisition request.
In an embodiment, the electronic device 110 may also send the input image 120 to the server 150; the server 150 then performs text detection on the image 120 to obtain a text bounding box and recognizes the text within the text bounding box.
It should be noted that the text detection method provided by the present disclosure may be executed by the electronic device 110, and may also be executed by the server 150. Accordingly, the text detection apparatus provided by the present disclosure may be disposed in the electronic device 110, and may also be disposed in the server 150.
It should be understood that the number and type of electronic devices 110 and servers 150 in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110 and servers 150, as desired for an implementation.
Hereinafter, the text detection method provided by the present disclosure will be described in detail with reference to figs. 2 to 4, in conjunction with fig. 1.
Fig. 2 is a schematic flow chart diagram of a text detection method according to an embodiment of the present disclosure.
As shown in fig. 2, the text detection method 200 of this embodiment includes operations S210 to S240.
In operation S210, a binary image representing a region where a character is located in the image to be processed is detected, and outline information of the region where the character is located is obtained.
According to an embodiment of the present disclosure, the image to be processed is an image including characters. For example, the image to be processed may be obtained by photographing an entity having characters such as a billboard, a trademark, a car, an invoice, and the like.
The embodiment may acquire a binary image generated in advance for detection. Alternatively, after acquiring the image to be processed, the embodiment may detect the image to be processed using a segmentation-based text detection algorithm, so as to obtain the binary image representing the region where the characters are located. In the binary image, the pixels whose value is not 0 are the pixels where characters are located, and the region formed by all such pixels can be taken as the region where the characters are located. The principle of obtaining the binary image is described in detail below and is not repeated here.
According to an embodiment of the present disclosure, each pixel in the binary image may be scanned, and it may be determined whether any of the four pixels adjacent to that pixel has the value 0. If so, the pixel is taken as a contour point; otherwise, the pixel is a point inside or outside the contour. In this embodiment, a target region formed by connecting the contour points in the binary image may be used as the region where the characters are located, and the coordinate values, in the binary image, of the contour points enclosing the target region may be used as the contour information of the region where the characters are located. Here, a target region is a region in which the values of the pixels are not 0. A minimal sketch of this neighbor test is given below.
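The following is a minimal sketch of the 4-neighborhood contour test described above, not part of the original disclosure; the function name contour_points and the convention that pixels outside the image count as 0 are assumptions for illustration:

```python
import numpy as np

def contour_points(binary: np.ndarray):
    """Collect foreground pixels whose 4-neighborhood touches a 0 pixel."""
    points = []
    h, w = binary.shape
    for y in range(h):
        for x in range(w):
            if binary[y, x] == 0:
                continue  # background pixels are not contour points
            # Treat out-of-range neighbors as 0, so pixels on the image
            # border can also qualify as contour points.
            neighbors = [
                binary[y - 1, x] if y > 0 else 0,
                binary[y + 1, x] if y < h - 1 else 0,
                binary[y, x - 1] if x > 0 else 0,
                binary[y, x + 1] if x < w - 1 else 0,
            ]
            if min(neighbors) == 0:
                points.append((x, y))  # (x, y) in binary-image coordinates
    return points
```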
In operation S220, position information of an initial bounding box for the text is determined according to the outline information.
According to an embodiment of the present disclosure, the minimum bounding rectangle of the target region described above may be used as the initial bounding box for the characters. For example, among the contour points enclosing the target region, the embodiment may determine the two contour points located at the two ends in the width direction of the binary image and the two contour points located at the two ends in the height direction of the binary image. These four contour points are taken as four target contour points, and a rectangular box whose four sides pass through the four target contour points respectively is taken as the initial bounding box. The coordinate values of the four vertices of the rectangular box in the binary image may be used as the position information of the initial bounding box, or the coordinate values of the four target contour points in the binary image may be used as the position information of the initial bounding box. A sketch of this construction for the axis-aligned case follows.
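Read literally for the axis-aligned case, the construction above amounts to taking the extreme contour coordinates; the sketch below is an illustration under that assumption (the function name initial_box_from_contour is invented here), not the disclosure's own code:

```python
import numpy as np

def initial_box_from_contour(points: np.ndarray) -> np.ndarray:
    """Axis-aligned rectangle whose sides pass through the extreme
    contour points in the width (x) and height (y) directions.

    `points` is assumed to be an (N, 2) array of (x, y) contour
    coordinates in the binary image.
    """
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    # Four vertices ordered clockwise from the top-left corner.
    return np.array([[x_min, y_min], [x_max, y_min],
                     [x_max, y_max], [x_min, y_max]])
```

For rotated text, the minAreaRect-based variant described later in this disclosure is the more general option.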
In operation S230, an edge extension value for the initial bounding box is determined according to the location information.
In operation S240, the edge of the initial bounding box is extended according to the edge extension value, resulting in position information of the bounding box for recognizing the text.
According to an embodiment of the present disclosure, after the position information of the initial bounding box is obtained, the width and height of the initial bounding box may be determined according to the position information. The root mean square of the width and height may then be taken as the edge extension value. Alternatively, the product of the width and a predetermined ratio may be taken as the edge extension value in the width direction of the initial bounding box, and the product of the height and the predetermined ratio as the edge extension value in the height direction. The predetermined ratio can be set according to actual requirements.
After the edge extension value is obtained, the edges of the initial bounding box can be extended. For each edge of the initial bounding box, its two end points may be extended in opposite directions by the same length, resulting in an extended edge. After all edges of the initial bounding box have been extended, a plurality of extended edges is obtained. This embodiment may then connect all end points of the extended edges to obtain the bounding box used for recognizing the characters; that is, the outline of this bounding box passes through all end points of the extended edges. For example, the coordinate values of all end points of the extended edges in the binary image may be used as the position information of the bounding box used for recognizing the characters.
By determining the edge extension value from the position information of the initial bounding box and extending the edges of the initial bounding box accordingly to obtain the position information of the bounding box used for recognizing the characters, the resulting bounding box can better cover the region where the characters are located in the image to be processed. The reason is that, when an image to be processed is detected to obtain a binary image, the region where characters are located is often shrunk to some extent so that the binary image can better distinguish different text lines; as a result, the character region represented by the binary image cannot completely cover the characters. Correspondingly, both the contour information obtained by detecting the binary image and the initial bounding box determined from that contour information fail to fully reflect the character region.
In the character detection method provided by the embodiments of the present disclosure, the edges of the initial bounding box are extended according to the determined edge extension value, and the character bounding box is obtained from the extended edges; this both enlarges the bounding box and improves the precision of the bounding box used for recognizing the characters. Compared with the technical solution of calling a graphics processing library (such as the Clipper library) to process the binary image and thereby obtain the bounding box, this approach simplifies the processing flow, reduces the computation amount and resource occupation, and improves processing efficiency. The processing logic of the character detection method can therefore be deployed on devices with limited computing performance, such as terminal devices, which helps improve the robustness of the character detection method.
In an embodiment, a contour detection function may be called to detect the binary image, so as to obtain the contour information of the region where the characters are located. The contour detection function may be a function in a computer vision library, such as OpenCV. For example, taking OpenCV as the computer vision library, the contour detection function may be the findContours function. OpenCV is Intel's open-source computer vision library, composed of a series of C functions and a small number of C++ classes. It is understood that OpenCV is merely an example to facilitate understanding of the present disclosure; any lightweight computer vision library may be employed, and the present disclosure is not limited thereto.
For example, the embodiment may pass the binary image as the image parameter of the findContours function and, after processing by findContours, obtain a vector that includes at least one point set, each point set corresponding to one contour. The points included in each point set are analogous to the contour points described above and are not detailed here. A hedged usage sketch follows.
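As an illustration only (the toy binary map below is invented here; the call itself follows the OpenCV 4.x convention, under which findContours returns the contours and a hierarchy):

```python
import cv2
import numpy as np

# Toy binary map standing in for B: a small blob of non-zero pixels
# marking a character region.
binary = np.zeros((64, 128), dtype=np.uint8)
binary[20:40, 30:100] = 255

# RETR_LIST retrieves all contours; CHAIN_APPROX_SIMPLE compresses
# straight segments to their end points.  OpenCV 3.x would instead
# return (image, contours, hierarchy).
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# Each entry of `contours` is one point set (an (N, 1, 2) integer
# array), matching the vector of point sets described above.
```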
In an embodiment, a minimum bounding rectangle function may be invoked to determine the position information of the initial bounding box for the characters. The minimum bounding rectangle function may be a function in a computer vision library. For example, taking OpenCV as the computer vision library, the minimum bounding rectangle function may be the minAreaRect function.
For example, each point set described above may be used as the input of the minAreaRect function, which outputs the coordinate values of the four vertices of a rectangle. This rectangle is the initial bounding box, and the coordinate values of its four vertices can be used as the position information of the initial bounding box. The rectangle produced by the minAreaRect function may have a deflection angle; that is, the angle between a side of the rectangle and the width direction of the binary image, and the angle between that side and the height direction, may both be non-zero. A hedged sketch of this call follows.
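A minimal sketch of this call, assuming one point set from findContours (the literal coordinates below are invented for illustration):

```python
import cv2
import numpy as np

# One point set as returned by cv2.findContours (here hand-written).
contour = np.array([[[30, 20]], [[99, 20]],
                    [[99, 39]], [[30, 39]]], dtype=np.int32)

rect = cv2.minAreaRect(contour)   # ((cx, cy), (w, h), angle)
vertices = cv2.boxPoints(rect)    # (4, 2) float32 array of the corners
# The angle may be non-zero, i.e. the initial bounding box may be
# deflected with respect to the axes of the binary image.
```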
In this embodiment, the contour information of the region where the characters are located and the position information of the initial bounding box are obtained by calling functions in a computer vision library, so the processing logic of the character detection method can be implemented in code with high running efficiency, such as C++ code. Compared with the technical solution of implementing the processing logic in Python code that depends on the Clipper library, this reduces the deployment time of the processing logic, reduces the memory space occupied by the code implementing it, and improves character detection efficiency.
Fig. 3 is a schematic diagram of a principle of obtaining a bounding box for recognizing text according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, when determining the edge extension value of the initial bounding box, the size information of the initial bounding box may be determined according to the position information of the initial bounding box. An edge extension value for the initial bounding box is then determined based on the size information and a predetermined extension coefficient. According to the embodiment, the expansion range of the initial bounding box can be flexibly adjusted according to actual requirements by setting the extension coefficient, so that the obtained bounding box for recognizing characters can better meet the actual requirements.
In an embodiment, the determined size information of the initial bounding box may include the width and height of the initial bounding box. This embodiment may take the product of the height and a predetermined extension coefficient as the edge extension value for the height-direction edges of the initial bounding box, and the product of the width and the predetermined extension coefficient as the edge extension value for the width-direction edges. The predetermined extension coefficient may be set according to actual requirements, which is not limited in this disclosure; a small sketch follows.
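A trivial sketch of this variant, assuming a hypothetical coefficient value of 0.1 (the function name and default value are inventions for illustration):

```python
def edge_extension_values(width: float, height: float, coeff: float = 0.1):
    """Edge extension values from the box size and a predetermined
    extension coefficient (0.1 is an assumed example value)."""
    # Width-direction edges extend in proportion to the width,
    # height-direction edges in proportion to the height.
    return width * coeff, height * coeff
```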
In an embodiment, the determined size information of the initial bounding box may include the perimeter and area of the initial bounding box. This embodiment may determine the edge extension value from the perimeter and area in a manner similar to the calculation of the offset coefficient D' in the Vatti clipping algorithm. For example, the ratio of the area to the perimeter may be determined first, and the product of this ratio and a predetermined extension coefficient used as the edge extension value. Determining the edge extension value from the perimeter and area makes the character detection method of the embodiments of the present disclosure behave more like a detection method implemented with the Clipper library: processing efficiency is improved, while the resulting bounding box used for recognizing the characters stays close to the bounding box obtained through the more complex processing flow, so detection precision is preserved.
For example, if the position information of the initial bounding box is represented by the coordinate values of its four vertices in the binary image, the area of the initial bounding box is A, and the perimeter of the initial bounding box is P, then the edge extension value d may be calculated using the following formula:
d = A * unclip_ratio / P
where P is the perimeter of the initial bounding box, and unclip_ratio is a hyperparameter used to adjust the extent of the bounding box expansion; its value may be, for example, 1.5, and the present disclosure is not limited thereto.
For example, the area A of the initial bounding box may be calculated using the shoelace formula:
A = (1/2) * |Σ_{i=1}^{n} (x_i * y_{i+1} - x_{i+1} * y_i)|, with (x_{n+1}, y_{n+1}) = (x_1, y_1)
where x_1, ..., x_n are the horizontal-axis coordinate values of the four vertices in the coordinate system constructed based on the binary image, y_1, ..., y_n are the corresponding vertical-axis coordinate values, and n = 4.
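A sketch computing A, P, and d as just defined, under the stated unclip_ratio example of 1.5 (the function name unclip_distance is invented here):

```python
import numpy as np

def unclip_distance(vertices: np.ndarray, unclip_ratio: float = 1.5) -> float:
    """Edge extension value d = A * unclip_ratio / P for a polygon.

    `vertices` is an (n, 2) array of box corners (n = 4 here) in
    binary-image coordinates.
    """
    x, y = vertices[:, 0].astype(float), vertices[:, 1].astype(float)
    # Shoelace formula for the polygon area A.
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(np.roll(x, -1), y))
    # Perimeter P: sum of side lengths around the closed polygon.
    perimeter = np.sum(np.linalg.norm(
        vertices - np.roll(vertices, -1, axis=0), axis=1))
    return area * unclip_ratio / perimeter
```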
It should be noted that the determined edge extension value is the length by which an edge is extended in a single direction; in this embodiment, each edge may be extended in both of its two opposite directions.
In one embodiment, as shown in fig. 3, in this embodiment 300 the four vertices of the initial bounding box 301 are denoted p_0, p_1, p_2, and p_3, and the edge extension value is determined to be d. When the edges of the initial bounding box are extended according to the edge extension value, each edge may be extended based on the extension value to obtain the position information of the extended edge, and the position information of the bounding box used for recognizing the characters is then determined from the position information of the extended edges.
For example, the edge formed by connecting vertex p_0 and vertex p_1 of the initial bounding box 301 is extended by d in its two opposite directions, giving the extended edge formed by connecting point 311 and point 312. Similarly, the edge formed by connecting vertex p_0 and vertex p_3 is extended by d in its two opposite directions, giving the extended edge formed by connecting point 313 and point 314. The edge connecting vertex p_1 with vertex p_2 and the edge connecting vertex p_2 with vertex p_3 are extended in the same way. Extending all four edges of the initial bounding box 301 yields 8 points. The position information of the extended edge corresponding to each edge may be represented by the coordinate values of the two points obtained by extending that edge; for example, the extended edge obtained from the edge connecting vertex p_0 and vertex p_3 may be represented by the coordinate values of point 313 and point 314.
The resulting 8 points form four point groups located near vertices p_0, p_1, p_2, and p_3 respectively, each group including two points; for example, the group near vertex p_0 includes point 311 and point 313. For each vertex, the embodiment may determine the rectangular box that has that vertex and the two points of the nearby point group as three of its vertices, and take the remaining vertex of that rectangular box as one vertex of the bounding box used for recognizing the characters. Applying this to each vertex of the initial bounding box 301 yields the four vertices of the bounding box used for recognizing the characters, giving the bounding box 302. The position information of the bounding box 302 may be represented by the coordinate values of its four vertices.
In an embodiment, after the 8 points are obtained, their coordinate values may form a point set, which is used as the input of the minAreaRect function described above; minAreaRect then outputs the coordinate values of the four vertices of the bounding box 302 used for recognizing the characters, thereby giving the position information of that bounding box. A hedged end-to-end sketch of this step follows.
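Putting the pieces together, the following sketch extends each side of an initial box by d at both ends and fits the minimum-area rectangle through the 8 resulting end points; the function name extend_box and the vertex-ordering assumption are inventions for illustration:

```python
import cv2
import numpy as np

def extend_box(vertices: np.ndarray, d: float) -> np.ndarray:
    """Bounding box for recognizing characters from an initial box.

    `vertices` is assumed to be the (4, 2) corner array (p_0..p_3) of
    the initial bounding box, ordered around the rectangle.
    """
    points = []
    for i in range(4):
        p = vertices[i].astype(float)
        q = vertices[(i + 1) % 4].astype(float)
        u = (q - p) / np.linalg.norm(q - p)  # unit vector along the side
        points.append(p - d * u)             # end point extended past p
        points.append(q + d * u)             # end point extended past q
    points = np.asarray(points, dtype=np.float32)
    # Minimum-area rectangle through the 8 points: its four corners are
    # the position information of the enlarged bounding box.
    return cv2.boxPoints(cv2.minAreaRect(points))
```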
FIG. 4 is a schematic diagram of a text detection method according to an embodiment of the present disclosure.
As shown in fig. 4, in this embodiment 400, when detecting characters, a to-be-processed image 401 may be detected first, so as to obtain a binary image representing an area where the characters are located.
For example, the embodiment may employ a segmentation-based text detection algorithm to detect the image to be processed 401. Taking the DBNet 410 as the text detection algorithm, the image to be processed 401 can be input into the DBNet 410, and features are extracted through the backbone network of the DBNet 410 to obtain a feature map F. The DBNet 410 may then predict a probability map P 402 and a threshold map T 403 simultaneously from the feature map F. Finally, the embodiment may take the predicted probability map P 402 output by the DBNet 410 as the binary image B 404.
It can be understood that in the training process of the DBNet 410, the binary image B is calculated from the prediction probability map P and the threshold map T. For example, during training, the element value B_{i,j} in the ith row and jth column of the binary image B can be calculated using the following formula:
B_{i,j} = 1 / (1 + exp(-k * (P_{i,j} - T_{i,j})))
where k is the amplification factor, which can be set empirically; P_{i,j} is the value of the element in the ith row and jth column of the prediction probability map P, each element of which represents the probability that the corresponding pixel in the image to be processed belongs to text; and T_{i,j} is the value of the element in the ith row and jth column of the threshold map T, each element of which represents a threshold for the corresponding pixel in the image to be processed.
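This is the standard differentiable binarization of DB-style networks; a one-line sketch (k = 50 is the value commonly used in DB training, assumed here):

```python
import numpy as np

def approximate_binary_map(P: np.ndarray, T: np.ndarray,
                           k: float = 50.0) -> np.ndarray:
    """Differentiable binarization B = 1 / (1 + exp(-k * (P - T))).

    P and T are same-shaped probability and threshold maps; a larger k
    pushes the soft step closer to a hard threshold.
    """
    return 1.0 / (1.0 + np.exp(-k * (P - T)))
```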
After the binary image B 404 is obtained, post-processing may be performed on it to obtain the position information of the bounding box 405 used for recognizing the characters. The post-processing can be implemented through operations S210 to S240 described with reference to fig. 2, and is not repeated here.
In the character detection method of this embodiment, the image to be processed is detected with a segmentation-based text detection algorithm, so curved characters and adjacent characters in natural scenes can be detected accurately. Because the DBNet separates the character regions from the non-character regions of the image to be processed using a dynamic threshold, using the DBNet to detect the image to be processed can, to a certain extent, improve both the detection precision and the detection efficiency of the character detection method. The DBNet can use a dynamic threshold because it is a deep learning network: by continuously learning image features, it can continuously adjust the network parameters involved in deriving the threshold map T from the feature map F.
Based on the character detection method provided by the disclosure, the disclosure also provides a character detection device. The apparatus will be described in detail below with reference to fig. 5.
Fig. 5 is a block diagram of a structure of a text detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the text detection apparatus 500 of this embodiment may include an image detection module 510, a position determination module 520, a value determination module 530, and a position obtaining module 540.
The image detection module 510 is configured to detect the binary image representing the region where the characters are located in the image to be processed, to obtain the contour information of the region where the characters are located. In an embodiment, the image detection module 510 may be configured to perform operation S210 described above, which is not described herein again.
The position determining module 520 is configured to determine the position information of the initial bounding box for the text according to the contour information. In an embodiment, the position determining module 520 may be configured to perform the operation S220 described above, which is not described herein again.
The value determination module 530 is configured to determine an edge extension value for the initial bounding box based on the location information. In an embodiment, the value determining module 530 may be configured to perform the operation S230 described above, which is not described herein again.
The position obtaining module 540 is configured to extend the edges of the initial bounding box according to the edge extension value, so as to obtain the position information of the bounding box used for recognizing the characters. In an embodiment, the position obtaining module 540 may be configured to perform operation S240 described above, which is not described herein again.
According to an embodiment of the present disclosure, the value determination module 530 may include a size determination submodule and an extension value determination submodule. The size determination submodule is configured to determine the size information of the initial bounding box according to the position information, and the extension value determination submodule is configured to determine the edge extension value for the initial bounding box according to the size information and a predetermined extension coefficient.
According to an embodiment of the present disclosure, the size determination submodule is specifically configured to determine the perimeter of the initial bounding box and the area of the initial bounding box according to the position information.
According to an embodiment of the present disclosure, the position obtaining module 540 may include an edge extension submodule and a position determination submodule. The edge extension submodule is configured to extend the edges of the initial bounding box based on the edge extension value, to obtain the position information of the extended edges. The position determination submodule is configured to determine, according to the position information of the extended edges, the position information of the bounding box used for recognizing the characters.
According to an embodiment of the present disclosure, the text detection apparatus 500 may further include a binary image obtaining module configured to detect the image to be processed using a segmentation-based text detection algorithm, so as to obtain the binary image representing the region where the characters are located.
According to an embodiment of the present disclosure, the image detection module 510 is specifically configured to invoke a contour detection function in a computer vision library to detect the binary image, so as to obtain the contour information. The position determination module 520 is specifically configured to call a minimum bounding rectangle function in the computer vision library to determine the position information of the initial bounding box according to the contour information.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the users involved all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. In the technical solutions of the present disclosure, the user's authorization or consent is obtained before the user's personal information is acquired or collected.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement the text detection method of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The calculation unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 performs the respective methods and processes described above, such as the character detection method. For example, in some embodiments, the text detection method can be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the text detection method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the text detection method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service extensibility in a traditional physical host and a VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A text detection method, comprising:
detecting a binary image representing a region where characters in an image to be processed are located, to obtain contour information of the region where the characters are located;
determining position information of an initial bounding box for the characters according to the contour information;
determining an edge extension value for the initial bounding box according to the position information; and
extending edges of the initial bounding box according to the edge extension value, to obtain position information of a bounding box for recognizing the characters.
2. The method of claim 1, wherein the determining an edge extension value for the initial bounding box according to the position information comprises:
determining size information of the initial bounding box according to the position information; and
determining an edge extension value for the initial bounding box based on the size information and a predetermined extension coefficient.
3. The method of claim 2, wherein the determining size information of the initial bounding box from the location information comprises:
and determining the perimeter of the initial surrounding frame and the area of the initial surrounding frame according to the position information.
4. The method of claim 1, wherein the extending edges of the initial bounding box according to the edge extension value, to obtain the bounding box for recognizing the characters, comprises:
extending the edges of the initial bounding box based on the edge extension value, to obtain position information of extended edges; and
determining the position information of the bounding box for recognizing the characters according to the position information of the extended edges.
5. The method of claim 1, further comprising:
and detecting the image to be processed by adopting a text detection algorithm based on segmentation to obtain a binary image representing the region of the character.
6. The method of claim 1, wherein:
the detecting a binary image representing a region where characters in the image to be processed are located, to obtain the contour information of the region where the characters are located, comprises: calling a contour detection function in a computer vision library to detect the binary image, to obtain the contour information; and
the determining, according to the contour information, the position information of the initial bounding box for the characters comprises: calling a minimum bounding rectangle function in the computer vision library to determine the position information of the initial bounding box according to the contour information.
7. A text detection apparatus comprising:
an image detection module configured to detect a binary image representing a region where characters in an image to be processed are located, to obtain contour information of the region where the characters are located;
a position determination module configured to determine position information of an initial bounding box for the characters according to the contour information;
a value determination module configured to determine an edge extension value for the initial bounding box according to the position information; and
a position obtaining module configured to extend edges of the initial bounding box according to the edge extension value, to obtain position information of a bounding box for recognizing the characters.
8. The apparatus of claim 7, wherein the value determination module comprises:
a size determination submodule configured to determine size information of the initial bounding box according to the position information; and
an extension value determination submodule configured to determine an edge extension value for the initial bounding box according to the size information and a predetermined extension coefficient.
9. The apparatus of claim 8, wherein the size determination submodule is configured to:
determine a perimeter of the initial bounding box and an area of the initial bounding box according to the position information.
10. The apparatus of claim 7, wherein the location obtaining module comprises:
an edge extension submodule configured to extend the edges of the initial bounding box based on the edge extension value, to obtain position information of extended edges; and
a position determination submodule configured to determine the position information of the bounding box for recognizing the characters according to the position information of the extended edges.
11. The apparatus of claim 7, further comprising:
and the binary image obtaining module is used for detecting the image to be processed by adopting a text detection algorithm based on segmentation to obtain a binary image representing the region where the character is located.
12. The apparatus of claim 7, wherein:
the graph detection module is to: calling a contour detection function in a computer vision library to detect the binary image to obtain the contour information;
the location determination module is to: and calling a minimum area rectangular function in the computer vision library to determine the position information of the initial bounding box according to the contour information.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of claims 1-6.
15. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 6.
CN202210117039.6A 2022-02-07 2022-02-07 Character detection method and device, electronic equipment and storage medium Pending CN114445825A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210117039.6A CN114445825A (en) 2022-02-07 2022-02-07 Character detection method and device, electronic equipment and storage medium
PCT/CN2022/109024 WO2023147717A1 (en) 2022-02-07 2022-07-29 Character detection method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210117039.6A CN114445825A (en) 2022-02-07 2022-02-07 Character detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114445825A true CN114445825A (en) 2022-05-06

Family

ID=81372235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210117039.6A Pending CN114445825A (en) 2022-02-07 2022-02-07 Character detection method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114445825A (en)
WO (1) WO2023147717A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842474A (en) * 2022-05-09 2022-08-02 北京百度网讯科技有限公司 Character recognition method, character recognition device, electronic equipment and medium
WO2023147717A1 (en) * 2022-02-07 2023-08-10 北京百度网讯科技有限公司 Character detection method and apparatus, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN110929755A (en) * 2019-10-21 2020-03-27 北京海益同展信息科技有限公司 Poultry egg detection method, device and system, electronic equipment and storage medium
CN112308794A (en) * 2020-10-27 2021-02-02 深圳Tcl数字技术有限公司 Method and apparatus for correcting display image, and computer-readable storage medium
CN113486828A (en) * 2021-07-13 2021-10-08 杭州睿胜软件有限公司 Image processing method, device, equipment and storage medium
CN113780283A (en) * 2021-09-17 2021-12-10 湖北天天数链技术有限公司 Model training method, text detection method and device and lightweight network model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313111B (en) * 2021-05-28 2024-02-13 北京百度网讯科技有限公司 Text recognition method, device, equipment and medium
CN113780098B (en) * 2021-08-17 2024-02-06 北京百度网讯科技有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN114445825A (en) * 2022-02-07 2022-05-06 北京百度网讯科技有限公司 Character detection method and device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN110929755A (en) * 2019-10-21 2020-03-27 北京海益同展信息科技有限公司 Poultry egg detection method, device and system, electronic equipment and storage medium
CN112308794A (en) * 2020-10-27 2021-02-02 深圳Tcl数字技术有限公司 Method and apparatus for correcting display image, and computer-readable storage medium
CN113486828A (en) * 2021-07-13 2021-10-08 杭州睿胜软件有限公司 Image processing method, device, equipment and storage medium
CN113780283A (en) * 2021-09-17 2021-12-10 湖北天天数链技术有限公司 Model training method, text detection method and device and lightweight network model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023147717A1 (en) * 2022-02-07 2023-08-10 北京百度网讯科技有限公司 Character detection method and apparatus, electronic device and storage medium
CN114842474A (en) * 2022-05-09 2022-08-02 北京百度网讯科技有限公司 Character recognition method, character recognition device, electronic equipment and medium
CN114842474B (en) * 2022-05-09 2023-08-08 北京百度网讯科技有限公司 Character recognition method, device, electronic equipment and medium

Also Published As

Publication number Publication date
WO2023147717A1 (en) 2023-08-10

Similar Documents

Publication Publication Date Title
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
US11074445B2 (en) Remote sensing image recognition method and apparatus, storage medium and electronic device
CN113657390B (en) Training method of text detection model and text detection method, device and equipment
EP3910543A2 (en) Method for training object detection model, object detection method and related apparatus
CN112560862B (en) Text recognition method and device and electronic equipment
CN114445825A (en) Character detection method and device, electronic equipment and storage medium
CN113065614B (en) Training method of classification model and method for classifying target object
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN113313083B (en) Text detection method and device
CN112597837A (en) Image detection method, apparatus, device, storage medium and computer program product
US20220172376A1 (en) Target Tracking Method and Device, and Electronic Apparatus
CN113205041A (en) Structured information extraction method, device, equipment and storage medium
CN111492407B (en) System and method for map beautification
CN112508005B (en) Method, apparatus, device and storage medium for processing image
CN114724133A (en) Character detection and model training method, device, equipment and storage medium
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN113902899A (en) Training method, target detection method, device, electronic device and storage medium
CN113326766A (en) Training method and device of text detection model and text detection method and device
CN113033346A (en) Text detection method and device and electronic equipment
CN113032071B (en) Page element positioning method, page testing method, device, equipment and medium
CN115719356A (en) Image processing method, apparatus, device and medium
CN115359502A (en) Image processing method, device, equipment and storage medium
CN115409856A (en) Lung medical image processing method, device, equipment and storage medium
CN113762027B (en) Abnormal behavior identification method, device, equipment and storage medium
CN114677566A (en) Deep learning model training method, object recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination