CN114387605A - Text detection method and device, electronic equipment and storage medium

Info

Publication number: CN114387605A
Application number: CN202210032955.XA
Authority: CN
Inventors: 黄聚; 谢群义; 李煜林; 钦夏孟; 姚锟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-01-12
Filing date: 2022-01-12
Publication date: 2022-04-22

Abstract

The present disclosure provides a text detection method and apparatus, an electronic device, and a storage medium, and relates to the field of image processing and pattern recognition. The specific implementation is as follows: acquiring a target image, wherein the target image includes a target text to be detected; determining a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrilateral, wherein the circumscribed rectangle is the minimum positive (upright, axis-aligned) rectangle circumscribing the target text, and the target quadrilateral is determined by a plurality of target points on the target text and contains the target text; and detecting target position information of the plurality of first corner points based on the circumscribed rectangle.

Description

Text detection method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of image processing and pattern recognition technologies, and in particular, to a text detection method and apparatus, an electronic device, and a storage medium.

Background

In the related art, Anchor-based text detection methods predict a positive (upright) rectangle as the detection result, and therefore cannot cope with text rotated by varying degrees or with irregular character shapes.

Disclosure of Invention

The present disclosure provides a method, apparatus, device, and storage medium for text detection.

According to an aspect of the present disclosure, there is provided a text detection method, including: acquiring a target image, wherein the target image includes a target text to be detected; determining a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrilateral, wherein the circumscribed rectangle is the minimum positive (upright, axis-aligned) rectangle circumscribing the target text, and the target quadrilateral is determined by a plurality of target points on the target text and contains the target text; and detecting target position information of the plurality of first corner points based on the circumscribed rectangle.

Optionally, detecting the target position information of the plurality of first corner points based on the circumscribed rectangle includes: determining offset position information of each first corner point relative to a second corner point in the circumscribed rectangle; and determining the target position information of each first corner point based on the offset position information.

Optionally, determining the target position information of each first corner point based on the offset position information includes: determining the target position information of each first corner point based on the offset position information and the size of the circumscribed rectangle.

Optionally, determining the target position information of each first corner point based on the offset position information and the size of the circumscribed rectangle includes: adjusting the offset position information based on the length, the width, and the center point position information of the circumscribed rectangle to obtain the target position information of each first corner point.

Optionally, detecting the target position information of the plurality of first corner points based on the circumscribed rectangle includes: processing the size of the circumscribed rectangle based on a first target model to obtain the target position information of each first corner point, wherein the first target model is trained on the size of a circumscribed rectangle sample of a text sample in a target image sample and on a quadrilateral sample of the text sample, the circumscribed rectangle sample is the minimum positive rectangle circumscribing the text sample, and the quadrilateral sample is determined by a plurality of target point samples on the text sample and contains the text sample.

Optionally, the method further includes: determining a plurality of anchor boxes corresponding to the text to which the target text belongs; and detecting the text based on the plurality of anchor boxes respectively to obtain a plurality of target detection results, wherein the plurality of target detection results correspond to the plurality of anchor boxes one to one, and each target detection result represents the detection result of one target text.

Optionally, at least two of the plurality of target texts corresponding to the plurality of target detection results overlap.

Optionally, the method further includes: acquiring a plurality of feature maps of the target image; and determining the plurality of anchor boxes corresponding to the text to which the target text belongs includes: determining the sizes of the plurality of anchor boxes based on the size of each feature map.

Optionally, detecting the text based on the plurality of anchor boxes respectively to obtain the plurality of target detection results includes: detecting the target text based on each anchor box and the circumscribed rectangle of the corresponding target text to obtain each target detection result.

Optionally, the circumscribed rectangle does not coincide with the target quadrilateral.

Optionally, the target text is a line of text.

According to another aspect of the present disclosure, there is provided a text detection apparatus, including: an acquisition unit configured to acquire a target image, wherein the target image includes a target text to be detected; a determining unit configured to determine a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrilateral, wherein the circumscribed rectangle is the minimum positive rectangle circumscribing the target text, and the target quadrilateral is determined by a plurality of target points on the target text and contains the target text; and a detection unit configured to detect target position information of the plurality of first corner points based on the circumscribed rectangle.

Optionally, the detection unit includes: a first determining module configured to determine offset position information of each first corner point relative to a second corner point in the circumscribed rectangle; and a second determining module configured to determine the target position information of each first corner point based on the offset position information.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the text detection method of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of text detection of the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of text detection of an embodiment of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

FIG. 1 is a flow chart of a text detection method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a commonly used text detection network according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a flowchart for anchor-based four corner point modeling in accordance with an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a text detection network based on four-corner modeling in an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of an application effect of a text detection network based on four-corner modeling according to an embodiment of the present disclosure;

FIG. 6 is a diagram of text detection effect of EAST in the related art;

FIG. 7 is a diagram illustrating text detection effects according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of a text detection device according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of the structure of a detection unit in a text detection apparatus according to an embodiment of the present disclosure;

FIG. 10 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The following describes a text detection method according to an embodiment of the present disclosure.

Text detection is generally divided into segmentation-based schemes and detection-based anchor-free schemes, as follows:

Segmentation-based: the response of each pixel in the picture to text is predicted directly, which handles long text well; an example is DBNet (Differentiable Binarization network);

Detection-based anchor-free: for each grid point in the picture that has a text response, the four corner points of the text are predicted; an example is EAST.

Another option is to apply general object detection directly. General object detection is mainly divided into an anchor-free class and an anchor-based class: the anchor-free class has a simple structure and is more flexible, while the anchor-based class introduces anchor boxes (anchors) as a prior to guide detection. The two differ mainly in whether anchor prior knowledge is used to reduce the prediction difficulty when the final detection frame is predicted. The modeling modes of several typical methods are as follows:

The anchor-free class includes: top-left corner + bottom-right corner (CornerNet/CornerNet-Lite); 4 extreme points + center point (ExtremeNet); 9 learned adaptive jitter sampling points (RepPoints); center point + width + height (CenterNet, Objects as Points); center point + 2 distances to the frame (FCOS);

The anchor-based class includes: center point + width + height (Faster R-CNN), center point + width + height (SSD), and center point + width + height (YOLOv3).

However, in a segmentation-based scheme, if texts overlap, the pixel response cannot tell which field a pixel belongs to. An anchor-free method such as EAST may have difficulty handling occlusion because of the problem of distinguishing positive and negative samples. An anchor-based method predicts a positive (upright) rectangle and therefore cannot cope with rotation of varying degrees or irregular character shapes, but it can rely on different anchors to solve the text overlap problem.

Fig. 1 is a flow chart of a text detection method according to an embodiment of the present disclosure, which may include the following steps, as shown in fig. 1:

Step S102: acquire a target image, wherein the target image includes a target text to be detected.

In the technical solution provided in step S102 of the present disclosure, a target image including a target text to be detected is acquired. For example, when a medical insurance reimbursement is processed, a medical clinic charging bill must be provided, and text information on the bill, such as personal information and charging items, is scanned by a character recognition device so that the reimbursement can be processed quickly.

Optionally, in this embodiment, the target image may be an image of any of various bills.

Step S104: determine a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrilateral, wherein the circumscribed rectangle is the minimum positive rectangle circumscribing the target text, and the target quadrilateral is determined by a plurality of target points on the target text and contains the target text.

In the technical solution provided in step S104 of the present disclosure, the circumscribed rectangle of the target text and the plurality of first corner points of the target quadrilateral are determined. For example, after the image of a bill is acquired, text portions on the bill such as personal information, charging items, and charged amounts may overlap, or the text may be distorted by the shooting angle, so the different categories of information on the bill image need to be located and detected. Determining the circumscribed rectangle of the bill text and the corner points of the quadrilateral containing the text makes it possible to locate and detect the text information.

In this embodiment, the circumscribed rectangle is the minimum positive (upright, axis-aligned) rectangle circumscribing the target text. For example, in a medical clinic charging bill, the information in the charging-item column is one or more lines of text, so the target text to be detected can be extracted through the minimum positive rectangle circumscribing those lines.

In this embodiment, the target quadrilateral is determined by a plurality of target points on the target text and contains the target text. For example, the target text to be detected may be the charging-item field on a medical clinic charging bill, located by its minimum circumscribed rectangle. If the photograph was taken improperly or texts on the bill overlap, the minimum circumscribed rectangle alone cannot isolate the desired target text; adding a quadrilateral on top of the minimum circumscribed rectangle enables detection of distorted or inclined text.

Optionally, in this embodiment, before the circumscribed rectangle of the target text and the plurality of first corner points of the target quadrilateral are determined, a plurality of anchor boxes corresponding to the text to which the target text belongs may be determined by an anchor-based method, and the text is then detected based on the plurality of anchor boxes to obtain a plurality of target detection results.

Step S106: detect target position information of the plurality of first corner points based on the circumscribed rectangle.

In the technical solution provided in step S106 of the present disclosure, target position information of the plurality of first corner points is detected based on the circumscribed rectangle. For example, to detect bill text accurately and efficiently, four corner points may be determined on the basis of the minimum circumscribed positive rectangle of the target text, and the quadrilateral obtained by connecting the four corner points in sequence is then determined; this quadrilateral delimits the target text within the minimum circumscribed positive rectangle more accurately.

Optionally, in this embodiment, offset position information of each first corner point relative to a second corner point in the circumscribed rectangle is determined, and the target position information of each first corner point is determined based on the offset position information.

Optionally, in this embodiment, the size of the circumscribed rectangle is processed based on a first target model to obtain the target position information of each first corner point, wherein the first target model is trained on the size of a circumscribed rectangle sample of a text sample in a target image sample and on a quadrilateral sample of the text sample, the circumscribed rectangle sample is the minimum positive rectangle circumscribing the text sample, and the quadrilateral sample is determined by a plurality of target point samples on the text sample and contains the text sample.

Through steps S102 to S106, a target image including a target text to be detected is acquired; a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrilateral are determined, wherein the circumscribed rectangle is the minimum positive rectangle circumscribing the target text, and the target quadrilateral is determined by a plurality of target points on the target text and contains the target text; and target position information of the plurality of first corner points is detected based on the circumscribed rectangle. That is, the position information of the corner points is detected based on the minimum circumscribed positive rectangle of the text and the minimum quadrilateral of the text, so the text can be detected effectively even if it is rotated by varying degrees or irregular in shape. This solves the technical problem of low text detection efficiency and achieves the technical effect of improving text detection efficiency.

The above-described method of this embodiment is described in further detail below.

As an optional implementation, in step S106, detecting the target position information of the plurality of first corner points based on the circumscribed rectangle includes: determining offset position information of each first corner point relative to a second corner point in the circumscribed rectangle; and determining the target position information of each first corner point based on the offset position information.

In this embodiment, offset position information of each first corner point relative to the second corner point in the circumscribed rectangle is determined. For example, in addition to the 4 regression values (dx, dy, dw, dh) predicted by the original Anchor-based method, 8 predicted values are newly added; every 2 values form one group and represent the relative position of the x and y coordinates of one corner point within the minimum circumscribed rectangle. The offsets of the four corner points within the circumscribed rectangle are then predicted by a trained convolutional neural network.
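As an illustration of the 4 + 8 output layout described above, the following minimal sketch splits the 12 regression values of one anchor into the box part and four corner pairs. The ordering of the values inside the vector and the function name `split_prediction` are assumptions made for illustration; the disclosure does not specify them.

```python
from typing import List, Tuple

BoxValues = Tuple[float, float, float, float]
Point = Tuple[float, float]


def split_prediction(values: List[float]) -> Tuple[BoxValues, List[Point]]:
    """Split one anchor's 12 regression outputs into box and corner parts.

    Assumed (hypothetical) layout: [dx, dy, dw, dh, p_x1, p_y1, ..., p_x4, p_y4].
    """
    if len(values) != 12:
        raise ValueError("expected 4 box values + 8 corner values")
    dx, dy, dw, dh = values[:4]
    # Every 2 consecutive values form one group: the (x, y) prediction of one corner.
    corners = [(values[i], values[i + 1]) for i in range(4, 12, 2)]
    return (dx, dy, dw, dh), corners


box, corners = split_prediction([0.1, 0.2, 0.3, 0.4, 1, 2, 3, 4, 5, 6, 7, 8])
```

Here `box` holds the four values of the upright rectangle and `corners` holds one pair per corner point.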

In this embodiment, the target position information of each first corner point is determined based on the offset position information. For example, the convolutional neural network is trained so that the offsets in the offset position information fall within a target threshold range that meets the requirement, and the target position information of each first corner point is determined accordingly.

In this embodiment, the second corner point in the circumscribed rectangle may be the top left corner point of the smallest circumscribed rectangle of the target text.

As an optional implementation, determining the target location information of each first corner point based on the offset location information includes: and determining the target position information of each first corner point based on the offset position information and the size of the circumscribed rectangle.

In this embodiment, the target position information of each first corner point is determined based on the offset position information and the size of the circumscribed rectangle. For example, the offset position information of each first corner point relative to the top-left corner point of the circumscribed rectangle and the size of the minimum circumscribed rectangle are determined, where the offset position information is the horizontal-axis and vertical-axis coordinates of each first corner point in the grid, and the target position information of each first corner point is then determined from them.

In this embodiment, the offset position information may include the horizontal axis and vertical axis coordinates of each first corner point in the grid.

In this embodiment, the offset position information may be determined according to the following formula:

wherein G_xi and G_yi respectively represent the coordinates of the 4 corner points of the real detection frame, p_xi and p_yi respectively represent the predicted values of the network, and D_xi and D_yi respectively represent the offsets of the coordinates of the four corner points relative to the coordinates of the top-left corner of the minimum circumscribed positive rectangle.
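The quantities defined above can be illustrated with a short sketch that, for a labelled quadrilateral, computes the minimum upright bounding rectangle and the offset of each corner from the rectangle's top-left corner. This only illustrates the definitions; it is not the formula of the disclosure, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def upright_bounding_rect(quad: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, width, height) of the smallest axis-aligned rectangle around quad."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def corner_offsets(quad: List[Point]) -> List[Point]:
    """Offsets of each corner from the top-left corner of the bounding rectangle."""
    x_min, y_min, _, _ = upright_bounding_rect(quad)
    return [(x - x_min, y - y_min) for x, y in quad]


def corners_from_offsets(x_min: float, y_min: float, offsets: List[Point]) -> List[Point]:
    """Inverse step: recover absolute corner coordinates from the offsets."""
    return [(x_min + dx, y_min + dy) for dx, dy in offsets]


# A slightly rotated text line, corners listed clockwise from the top-left.
quad = [(12.0, 20.0), (112.0, 30.0), (110.0, 60.0), (10.0, 50.0)]
rect = upright_bounding_rect(quad)
offsets = corner_offsets(quad)
```

Because every offset is measured from the same top-left corner, all offsets are non-negative and bounded by the rectangle's width and height.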

In this embodiment, the dimensions of the circumscribed rectangle may be the length and width of the smallest circumscribed rectangle.

As an optional implementation, determining the target position information of each first corner point based on the offset position information and the size of the circumscribed rectangle includes: adjusting the offset position information based on the length, the width, and the center point position information of the circumscribed rectangle to obtain the target position information of each first corner point.

In this embodiment, the offset position information is adjusted based on the length, the width, and the center point position information of the circumscribed rectangle to obtain the target position information of each first corner point. For example, the length, the width, and the center point coordinates of the minimum circumscribed rectangle are obtained by modeling with an Anchor-based method, and the target position information of each first corner point is obtained from these values together with the network-predicted horizontal-axis and vertical-axis offsets of each first corner point in the grid.
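One way to picture this adjustment is sketched below. It assumes, purely for illustration, that each raw corner prediction is squashed by a sigmoid into [0, 1] and scaled by the rectangle's width or height, measured from the top-left corner. This scaling rule and the name `decode_corners` are assumptions, not the formula of the disclosure.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def decode_corners(center_x: float, center_y: float, width: float, height: float,
                   corner_preds: List[Point]) -> List[Point]:
    """Map raw corner predictions to absolute coordinates inside the rectangle.

    Assumption: offset = sigmoid(prediction) * rectangle size, from the top-left corner.
    """
    left = center_x - width / 2.0
    top = center_y - height / 2.0
    return [(left + sigmoid(px) * width, top + sigmoid(py) * height)
            for px, py in corner_preds]


# Rectangle centered at (50, 40), 100 wide and 20 high, with four raw corner predictions.
corners = decode_corners(50.0, 40.0, 100.0, 20.0,
                         [(-4.0, -1.0), (4.0, -2.0), (4.0, 2.0), (-4.0, 1.0)])
```

Under this assumption every decoded corner stays inside the circumscribed rectangle, which matches the requirement that the quadrilateral be contained in it.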

In this embodiment, the length, width, and center point position information of the circumscribed rectangle may be obtained by Anchor-based modeling, as follows:

G_x = σ(d_x) + C_x

G_y = σ(d_y) + C_y

G_w = P_w e^{d_w}

G_h = P_h e^{d_h}

wherein G_x and G_y represent the x and y coordinates of the center point of the real detection frame, G_w and G_h represent the width and height of the real detection frame, C_x and C_y represent the horizontal and vertical coordinates of the top-left corner point of the grid cell where the anchor is located, P_w and P_h represent the width and height of the anchor, and d_x, d_y, d_w, d_h represent the predicted values of the network.

As an optional implementation, in step S106, detecting the target position information of the plurality of first corner points based on the circumscribed rectangle includes: processing the size of the circumscribed rectangle based on a first target model to obtain the target position information of each first corner point, wherein the first target model is trained on the size of a circumscribed rectangle sample of a text sample in a target image sample and on a quadrilateral sample of the text sample, the circumscribed rectangle sample is the minimum positive rectangle circumscribing the text sample, and the quadrilateral sample is determined by a plurality of target point samples on the text sample and contains the text sample.

In this embodiment, the size of the circumscribed rectangle is processed based on the first target model to obtain the target position information of each first corner point. For example, through four-corner-point modeling, the minimum circumscribed positive rectangle of the text is adjusted to fit the text, so that the target position information of each first corner point is obtained and the text is located and detected more accurately.

In this embodiment, the first target model may be a four corner point model, and the first target model may be determined according to the following formula:

As an optional implementation, the method further includes: determining a plurality of anchor boxes corresponding to the text to which the target text belongs; and detecting the text based on the plurality of anchor boxes respectively to obtain a plurality of target detection results, wherein the plurality of target detection results correspond to the plurality of anchor boxes one to one, and each target detection result represents the detection result of one target text.

In this embodiment, a plurality of anchor boxes corresponding to the text to which the target text belongs are determined. For example, when modeling with an Anchor-based method, the text is selected and recognized through the anchor boxes, a minimum circumscribed rectangle of the text is then established to fit the size of the text, and the text within the minimum circumscribed rectangle is detected.

In this embodiment, the text is detected based on the plurality of anchor boxes respectively to obtain a plurality of target detection results, wherein the plurality of target detection results correspond to the plurality of anchor boxes one to one, and each target detection result represents the detection result of one target text. For example, after a feature map of the target text is obtained through a convolutional neural network, each feature point predicts not just one target but a plurality of targets, which correspond to different anchor boxes.
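The idea that one feature point can carry several targets through different anchor boxes can be sketched as follows. The matching rule used here (pick the anchor whose width and height best overlap the text's, as in YOLO-style detectors) and the anchor sizes are assumptions for illustration; the disclosure does not state how texts are assigned to anchors.

```python
from typing import List, Tuple

Size = Tuple[float, float]


def shape_iou(size_a: Size, size_b: Size) -> float:
    """IoU of two boxes that share the same center, so only width and height matter."""
    inter = min(size_a[0], size_b[0]) * min(size_a[1], size_b[1])
    union = size_a[0] * size_a[1] + size_b[0] * size_b[1] - inter
    return inter / union


def best_anchor(text_size: Size, anchors: List[Size]) -> int:
    """Index of the anchor whose shape is closest to the text's bounding rectangle."""
    return max(range(len(anchors)), key=lambda i: shape_iou(text_size, anchors[i]))


# Hypothetical anchors (width, height) available at one feature point.
anchors = [(40.0, 20.0), (160.0, 20.0), (20.0, 80.0)]
horizontal_line = best_anchor((150.0, 22.0), anchors)  # long horizontal text
vertical_stamp = best_anchor((18.0, 70.0), anchors)    # tall text overlapping it
```

Two overlapping texts centered on the same grid cell end up on different anchors, so each one gets its own prediction slot.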

As an optional implementation, at least two of the plurality of target texts corresponding to the plurality of target detection results overlap.

In this embodiment, at least two of the plurality of target texts corresponding to the plurality of target detection results overlap. For example, text portions on a bill such as personal information, charging items, and charged amounts may overlap because of how the bill was photographed or for similar reasons; four-corner-point modeling makes it possible to detect the target texts in this overlapping case.

As an optional implementation, the method further includes: acquiring a plurality of feature maps of the target image; and determining the plurality of anchor boxes corresponding to the text to which the target text belongs includes: determining the sizes of the plurality of anchor boxes based on the size of each feature map.

In this embodiment, a plurality of feature maps of the target image are acquired. For example, when text detection is performed by the trained convolutional neural network, the target detection network outputs feature maps at 3 levels, and the feature map of each level has a different size.

In this embodiment, determining the plurality of anchor boxes corresponding to the text to which the target text belongs includes determining the sizes of the plurality of anchor boxes based on the size of each feature map. For example, each feature point on each feature map may have a plurality of anchor boxes of different sizes, so that characters of different sizes can be allocated to anchor boxes of different sizes.
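A minimal sketch of per-level anchors is given below, assuming three feature maps with strides 8, 16, and 32 and two anchor sizes per level. The strides, the anchor sizes, and the name `anchors_for_image` are all hypothetical; the disclosure states only that anchor sizes depend on the feature map size.

```python
from typing import Dict, List, Tuple

# Hypothetical anchor sizes (width, height) in pixels, keyed by feature-map stride.
ANCHORS_BY_STRIDE: Dict[int, List[Tuple[int, int]]] = {
    8: [(32, 8), (64, 16)],       # large feature map, small text
    16: [(128, 16), (128, 32)],   # medium feature map
    32: [(256, 32), (384, 64)],   # small feature map, large text
}


def anchors_for_image(image_w: int, image_h: int) -> List[Tuple[int, int, int, int, int]]:
    """List (stride, grid_x, grid_y, anchor_w, anchor_h) for every cell and anchor."""
    result = []
    for stride, sizes in ANCHORS_BY_STRIDE.items():
        cols, rows = image_w // stride, image_h // stride
        for grid_y in range(rows):
            for grid_x in range(cols):
                for anchor_w, anchor_h in sizes:
                    result.append((stride, grid_x, grid_y, anchor_w, anchor_h))
    return result


all_anchors = anchors_for_image(64, 64)
```

For a 64 x 64 input this yields 8 x 8, 4 x 4, and 2 x 2 grids, each cell carrying two anchors.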

As an optional implementation manner, detecting the target text based on the anchor boxes respectively, and obtaining a plurality of target detection results includes: and detecting the target text based on each anchor point box and the corresponding circumscribed rectangle of the target text to obtain each target detection result.

In this embodiment, the target text is detected based on each Anchor box and the circumscribed rectangle of the corresponding target text to obtain each target detection result, for example, when the minimum circumscribed rectangle is obtained by the Anchor-based method modeling, the length, width, and center point coordinates of the circumscribed rectangle of the target text can be obtained by network prediction.

In this embodiment, the target detection result may be obtained by the following formula:

G_x = σ(d_x) + C_x

G_y = σ(d_y) + C_y

G_w = P_w e^{d_w}

G_h = P_h e^{d_h}

wherein G_x and G_y represent the x and y coordinates of the center point of the real detection frame, G_w and G_h represent the width and height of the real detection frame, C_x and C_y represent the horizontal and vertical coordinates of the top-left corner point of the grid cell where the anchor is located, P_w and P_h represent the width and height of the anchor, and d_x, d_y, d_w, d_h represent the predicted values of the network, i.e., the target detection results.

As an alternative embodiment, the circumscribed rectangle is not coincident with the target quadrilateral.

In this embodiment, the circumscribed rectangle is not coincident with the target quadrilateral, e.g., there is slight rotation and distortion of the smallest quadrilateral.

As an alternative embodiment, the target text is a line of text.

In this embodiment, the target text is a text line, such as a text line in a billing item on a hospital clinic billing ticket.

In this embodiment, offset position information of each first corner point relative to a second corner point in the circumscribed rectangle is determined; the target position information of each first corner point is determined based on the offset position information; the size of the circumscribed rectangle is processed based on a first target model to obtain the target position information of each first corner point, wherein the first target model is trained on the size of a circumscribed rectangle sample of a text sample in a target image sample and on a quadrilateral sample of the text sample, the circumscribed rectangle sample is the minimum positive rectangle circumscribing the text sample, and the quadrilateral sample is determined by a plurality of target point samples on the text sample and contains the text sample; a plurality of anchor boxes corresponding to the text to which the target text belongs are determined; and the text is detected based on the plurality of anchor boxes respectively to obtain a plurality of target detection results, wherein the plurality of target detection results correspond to the plurality of anchor boxes one to one, and each target detection result represents the detection result of one target text. That is, the position information of the corner points is detected based on the minimum circumscribed positive rectangle of the text and the minimum quadrilateral of the text.

The text detection method of the present disclosure is further described below with reference to preferred embodiments.

Fig. 2 is a schematic diagram of a commonly used text detection network according to an embodiment of the present disclosure, which may include the following, as shown in fig. 2:

First, an original picture is input and passed through a series of Convolutional Neural Networks (CNN for short) to obtain a set of candidate frames, each of which includes the category and the position to which it belongs; the final detection frames are then obtained through Non-Maximum Suppression (NMS for short).
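The final NMS step can be sketched in a few lines. This is the standard greedy NMS on upright boxes given as (x_min, y_min, x_max, y_max); the threshold value of 0.5 is an assumption, and a quadrilateral-aware variant would need polygon overlap instead of the rectangle overlap used here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop candidates that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


candidates = [(0.0, 0.0, 100.0, 20.0), (2.0, 1.0, 102.0, 21.0), (0.0, 40.0, 100.0, 60.0)]
kept = nms(candidates, [0.9, 0.8, 0.7])
```

The second candidate nearly duplicates the first and is suppressed, while the third, which covers a different text line, survives.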

In the process of detection modeling, there are generally two ways to represent the position of an object (target) in an image:

(x, y, w, h): x and y refer to the horizontal and vertical coordinates of the center point of the object, and w and h represent the width and height of the object;

(x_min, y_min, x_max, y_max): i.e. the upper left corner point and the lower right corner point of the object.
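The two representations carry the same information and can be converted into each other. The following is a minimal sketch of that conversion; the function names `center_to_corners` and `corners_to_center` are hypothetical and are not taken from the disclosure.

```python
def center_to_corners(x, y, w, h):
    """Convert center form (x, y, w, h) to corner form (x_min, y_min, x_max, y_max)."""
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


def corners_to_center(x_min, y_min, x_max, y_max):
    """Convert corner form (x_min, y_min, x_max, y_max) back to center form (x, y, w, h)."""
    w = x_max - x_min
    h = y_max - y_min
    return (x_min + w / 2.0, y_min + h / 2.0, w, h)


box = center_to_corners(50.0, 40.0, 20.0, 10.0)
print(box)                      # (40.0, 35.0, 60.0, 45.0)
print(corners_to_center(*box))  # (50.0, 40.0, 20.0, 10.0)
```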

In a general modeling manner, the network usually does not predict the 4 values x, y, w and h directly, because these are absolute values in an open-ended range and are therefore considerably difficult to predict. Instead, relative values such as dx, dy, dw and dh are predicted and converted into the real values x, y, w and h through a predefined formula; this conversion between real values and predicted values is the modeling of detection.

In this embodiment, a text image to be detected is input into the CNN to obtain a series of candidate frame (anchor frame) sets, and a final detection frame is obtained through a non-maximum suppression operation, so that text detection is realized.
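The non-maximum suppression step can be illustrated with a short sketch. The version below is the usual greedy NMS on boxes in (x_min, y_min, x_max, y_max) form; the disclosure does not specify which NMS variant or threshold is used, so the function names and the threshold of 0.5 are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


candidates = [(10, 10, 110, 40), (12, 11, 112, 42), (200, 50, 300, 80)]
scores = [0.9, 0.8, 0.7]
print(nms(candidates, scores))  # [0, 2]
```

The second candidate overlaps the first almost completely and is suppressed, while the third candidate lies elsewhere in the image and is kept.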

Fig. 3 is a schematic diagram of a flowchart of anchor-based four-corner point modeling according to an embodiment of the disclosure, and as shown in fig. 3, the flowchart may include the following steps:

in step S302, 4 regression values (dx, dy, dw, dh) are predicted based on the Anchor-based model.

In the technical solution provided in the above step S302 of the present disclosure, Anchor-based methods all model a regular rectangle in a similar way. Taking YOLOv3 as an example, the formulas are as follows:

G_x = σ(d_x) + C_x

G_y = σ(d_y) + C_y

G_w = P_w · e^(d_w)

G_h = P_h · e^(d_h)

wherein, Gx and Gy respectively represent the central point coordinates x and y of the real detection frame, Gw and Gh respectively represent the width and height of the real detection frame, Cx and Cy respectively represent the horizontal and vertical coordinates of the upper left corner point of the grid where the anchor is located, Pw and Ph respectively represent the width and height of the anchor, and dx, dy, dw and dh respectively represent the predicted values of the network.
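As an illustration, the decoding of the 4 regression values can be sketched as follows. Here σ is taken to be the sigmoid function, and the width and height are decoded with the usual YOLOv3 exponential form implied by the definitions of Gw, Gh, Pw, Ph, dw and dh above; the function name `decode_rect` and the sample numbers are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic function, used to keep the center offset inside the grid cell."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_rect(d, cell, anchor):
    """Decode network outputs d = (dx, dy, dw, dh) into (Gx, Gy, Gw, Gh).

    cell   = (Cx, Cy): upper left corner of the grid cell holding the anchor
    anchor = (Pw, Ph): width and height of the anchor
    """
    dx, dy, dw, dh = d
    cx, cy = cell
    pw, ph = anchor
    gx = sigmoid(dx) + cx   # G_x = sigma(d_x) + C_x
    gy = sigmoid(dy) + cy   # G_y = sigma(d_y) + C_y
    gw = pw * math.exp(dw)  # G_w = P_w * exp(d_w), assumed YOLOv3 form
    gh = ph * math.exp(dh)  # G_h = P_h * exp(d_h), assumed YOLOv3 form
    return gx, gy, gw, gh


print(decode_rect((0.0, 0.0, 0.0, 0.0), cell=(3, 5), anchor=(4.0, 1.5)))
# (3.5, 5.5, 4.0, 1.5)
```

With all predictions equal to zero, the decoded box sits at the center of its grid cell and has exactly the size of the anchor.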

In step S304, on the basis of the positive rectangle prediction, four-corner-point modeling is appended.

In the technical solution provided in the above step S304 of the present disclosure, on the basis of the 4 regression values (dx, dy, dw, dh) predicted by the original Anchor-based method, 8 prediction values are newly added in 4 groups of 2, and each group represents the relative position of the x and y coordinates of one corner point in the minimum circumscribed rectangle. As follows:

wherein G_xi and G_yi respectively represent the coordinates of the 4 corner points of the real detection frame, and p_xi and p_yi respectively represent the predicted values of the network, namely the offsets of the coordinates of the four corner points relative to the coordinates of the upper left corner of the minimum circumscribed rectangle. In essence, four-corner-point modeling predicts the offsets of the four corner points within the original circumscribed rectangle.
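The exact corner formula is not reproduced in this text, so the following is only a sketch under a stated assumption: each pair (p_xi, p_yi) is treated as an offset from the upper left corner of the minimum circumscribed rectangle, normalized by the width and height of that rectangle. The function name `decode_corners` and the sample values are hypothetical.

```python
def decode_corners(rect, offsets):
    """Turn 4 predicted offset pairs into absolute corner coordinates.

    rect    = (Gx, Gy, Gw, Gh): decoded circumscribed rectangle in center form
    offsets = [(p_x1, p_y1), ..., (p_x4, p_y4)]: assumed to be fractions of the
              rectangle width and height, measured from its upper left corner
    """
    gx, gy, gw, gh = rect
    left = gx - gw / 2.0  # x coordinate of the upper left corner
    top = gy - gh / 2.0   # y coordinate of the upper left corner
    return [(left + px * gw, top + py * gh) for px, py in offsets]


rect = (50.0, 40.0, 20.0, 10.0)
offsets = [(0.0, 0.25), (1.0, 0.0), (1.0, 0.75), (0.0, 1.0)]
print(decode_corners(rect, offsets))
# [(40.0, 37.5), (60.0, 35.0), (60.0, 42.5), (40.0, 45.0)]
```

Under this assumption every decoded corner stays inside the circumscribed rectangle as long as the offsets lie in the range 0 to 1, which matches the statement that the corner points are predicted within the original circumscribed rectangle.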

Through steps S302 to S304 of this embodiment, 4 regression values (dx, dy, dw, dh) are first predicted by the Anchor-based model, and four-corner-point modeling is then appended on the basis of the positive rectangle prediction. That is, the position information of the corner points is detected based on the minimum circumscribed positive rectangle of the text and the minimum quadrangle of the text, so that the text can be detected effectively even when it is rotated to different degrees or has an irregular shape. This solves the technical problem of low text detection efficiency and achieves the technical effect of improving text detection efficiency.

Fig. 4 is a schematic diagram of a text detection network based on four-corner modeling according to an embodiment of the disclosure, and as shown in fig. 4, the network may include the following:

compared with a general text detection network, the network follows the target detection network and outputs 3 feature map layers, each of a different size. In the feature map of each layer, each feature point predicts not just one target but a plurality of targets, and these targets are assigned to different anchors, which gives the network the capability of detecting overlapped texts.
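The text does not state the rule by which targets are assigned to anchors. One common choice, sketched here purely as an assumption, is to compare widths and heights only and give each target in a cell the free anchor whose shape matches it best; the names `shape_iou` and `assign_to_anchors` and the anchor sizes are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes that share a center, so only width and height matter."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def assign_to_anchors(targets_wh, anchors_wh):
    """Greedily give each target in one cell its best-matching unused anchor.

    Returns a dict mapping target index to anchor index. Targets beyond the
    number of anchors are left unassigned.
    """
    free = set(range(len(anchors_wh)))
    assignment = {}
    for t, wh in enumerate(targets_wh):
        if not free:
            break
        best = max(sorted(free), key=lambda a: shape_iou(wh, anchors_wh[a]))
        assignment[t] = best
        free.remove(best)
    return assignment


anchors = [(200.0, 30.0), (60.0, 30.0), (30.0, 90.0)]  # assumed (width, height) per anchor
targets = [(180.0, 28.0), (50.0, 32.0)]                # two overlapping texts in one cell
print(assign_to_anchors(targets, anchors))  # {0: 0, 1: 1}
```

Because the two targets land on different anchors of the same feature point, both can be predicted even though they overlap.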

Fig. 5 is a schematic diagram of an application effect of a text detection network based on four-corner modeling according to an embodiment of the present disclosure, and as shown in fig. 5, the effect diagram may include the following contents:

in the figure, moving outward from the center point of the detection frame, the following are shown in order: the minimum quadrilateral (four corner points), the minimum circumscribed positive rectangle, and the original anchor box (anchor).

Fig. 6 is a diagram of the text detection effect of EAST in the related art. As shown in fig. 6, four corner points of the text are predicted at each grid point in the picture where there is a text response. However, because of the problem of distinguishing between positive and negative samples, it is difficult to deal with occlusion; in addition, multiple anchor boxes may occur for the same text line, which may result in inaccurate text detection. For example, the text line "medical institution" in fig. 6 is selected by two anchor boxes for detection.

Fig. 7 is a text detection effect diagram according to an embodiment of the present disclosure. As shown in fig. 7, four-corner-point modeling is appended on the basis of the positive rectangle prediction, and the model predicts the minimum circumscribed positive rectangle and the minimum quadrangle (four corner points) of the text line at the same time, so that it can cope with different degrees of rotation and irregular shapes of the text. For example, in the text detection effect diagram shown in fig. 7, the problem in fig. 6 of the text line "medical institution" being selected by two anchor boxes does not occur.

The embodiment of the disclosure also provides a text detection device for executing the embodiment shown in fig. 1.

Fig. 8 is a schematic diagram of a text detection apparatus according to an embodiment of the present disclosure, and as shown in fig. 8, the text detection apparatus 80 may include: an acquisition unit 81, a determination unit 82, and a detection unit 83.

An acquiring unit 81, configured to acquire a target image, where the target image includes a target text to be detected;

a determining unit 82, configured to determine a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrangle, where the circumscribed rectangle is a smallest positive rectangle circumscribed to the target text, and the target quadrangle is determined by a plurality of target points on the target text and includes the target text;

the detecting unit 83 is configured to detect target position information of a plurality of first corner points based on the circumscribed rectangle.

The embodiment of the disclosure also provides a detection unit for being arranged in the text detection device according to the embodiment of the disclosure shown in fig. 4.

Fig. 9 is a detection unit provided in a text detection apparatus according to an embodiment of the present disclosure, as shown in fig. 9, the detection unit including: a first determination module 91 and a second determination module 92.

A first determining module 91, configured to determine offset position information of each first corner point relative to a second corner point in the circumscribed rectangle.

The second determination module 92 is configured to determine target location information based on the offset location information.
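The relationship between the target quadrangle, its circumscribed rectangle, and the offset position information can be sketched as follows. The sketch assumes that the second corner point is the upper left corner of the circumscribed rectangle and that offsets are normalized by the rectangle width and height; the function names are hypothetical.

```python
def circumscribed_rect(quad):
    """Minimum axis-aligned rectangle enclosing the quadrangle,
    returned as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)


def corner_offsets(quad):
    """Offsets of each quadrangle corner from the rectangle's upper left corner,
    expressed as fractions of the rectangle width and height (an assumption)."""
    x_min, y_min, x_max, y_max = circumscribed_rect(quad)
    w = x_max - x_min
    h = y_max - y_min
    if w <= 0 or h <= 0:
        raise ValueError("degenerate quadrangle: zero width or height")
    return [((x - x_min) / w, (y - y_min) / h) for x, y in quad]


quad = [(40.0, 37.5), (60.0, 35.0), (60.0, 42.5), (40.0, 45.0)]
print(circumscribed_rect(quad))  # (40.0, 35.0, 60.0, 45.0)
print(corner_offsets(quad))      # [(0.0, 0.25), (1.0, 0.0), (1.0, 0.75), (0.0, 1.0)]
```

Multiplying the offsets by the rectangle width and height and adding the upper left corner recovers the original corner coordinates, which is the adjustment step performed when the target position information is determined.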

Optionally, the second determining module 92 comprises a first determining subunit, and the first determining subunit comprises a first determining submodule. The first determining subunit is configured to determine the target position information of each first corner point based on the offset position information and the size of the circumscribed rectangle; the first determining submodule is configured to adjust the offset position information based on the length, the width and the position information of the central point of the circumscribed rectangle to obtain the target position information of each first corner point.

Optionally, the detection unit 83 further comprises a processing module. The processing module is used for processing the size of the external rectangle based on a first target model to obtain target position information of each first corner point, wherein the first target model is obtained by training the size of the external rectangle sample of the text sample in the target image sample and a quadrilateral sample of the text sample, the external rectangle sample is a minimum positive rectangle externally connected with the text sample, and the quadrilateral sample is determined by a plurality of target point samples on the text sample and comprises the text sample.

Optionally, the detection unit 83 includes a third determining module and a detection module, wherein the detection module includes a second determining subunit and a detection subunit, and the third determining module includes an acquisition subunit and a third determining subunit. The third determining module is configured to determine a plurality of anchor boxes corresponding to the text to which the target text belongs. The detection module is configured to detect the text based on each of the anchor boxes to obtain a plurality of target detection results, wherein the target detection results correspond to the anchor boxes one by one, and each target detection result represents the detection result of one target text. The second determining subunit is configured to determine that at least two of the target texts corresponding to the target detection results overlap. The acquisition subunit is configured to acquire a plurality of feature maps of the target image. The third determining subunit is configured to determine the sizes of the plurality of anchor boxes based on the size of each feature map. The detection subunit is configured to detect the target text based on each anchor box and the corresponding circumscribed rectangle of the target text to obtain each target detection result.

Optionally, the apparatus further comprises: a fourth determination module and a fifth determination module. The fourth determining module is used for determining that the circumscribed rectangle is not coincident with the target quadrangle; and the fifth determining module is used for determining the target text as a text line.

In the text detection apparatus of this embodiment, a target image is acquired by the acquisition unit 81, wherein the target image includes a target text to be detected; the determining unit 82 determines a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrangle, wherein the circumscribed rectangle is a minimum positive rectangle circumscribed to the target text, and the target quadrangle is determined by a plurality of target points on the target text and comprises the target text; the detection unit 83 detects the target position information of the first corner points based on the external rectangle, so that the efficiency of detecting the text is improved, the technical problem of low efficiency of detecting the text is solved, and the technical effect of improving the efficiency of detecting the text is achieved.

In the technical solution of the present disclosure, the acquisition, storage, application and other processing of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

Embodiments of the present disclosure provide an electronic device, which may include: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the text detection method of the embodiments of the present disclosure.

Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Alternatively, in the present embodiment, the above-mentioned nonvolatile storage medium may be configured to store a computer program for executing the steps of:

step S102, acquiring a target image, wherein the target image comprises a target text to be detected;

step S104, determining an external rectangle of the target text and a plurality of first corner points of a target quadrangle, wherein the external rectangle is a minimum positive rectangle externally connected with the target text, and the target quadrangle is determined by a plurality of target points on the target text and comprises the target text;

Alternatively, in the present embodiment, the non-transitory computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, realizes the steps of:

FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1001 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 performs the methods and processes described above, for example, the method of detecting target position information of a plurality of first corner points based on a circumscribed rectangle. For example, in some embodiments, this method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured by any other suitable means (e.g., by means of firmware) to perform the method.

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A text detection method, comprising:

acquiring a target image, wherein the target image comprises a target text to be detected;

determining a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrangle, wherein the circumscribed rectangle is a minimum positive rectangle circumscribed to the target text, and the target quadrangle is determined by a plurality of target points on the target text and comprises the target text;

and detecting target position information of the plurality of first corner points based on the circumscribed rectangle.

2. The method of claim 1, wherein detecting target location information for the plurality of first corner points based on the circumscribed rectangle comprises:

determining offset position information of each first corner point relative to a second corner point in the circumscribed rectangle;

and determining target position information of each first corner point based on the offset position information.

3. The method of claim 2, wherein determining target location information for each of the first corner points based on the offset location information comprises:

and determining the target position information of each first corner point based on the offset position information and the size of the circumscribed rectangle.

4. The method of claim 3, wherein determining the target location information for each of the first corner points based on the offset location information and the dimensions of the circumscribed rectangle comprises:

and adjusting the offset position information based on the length, the width and the position information of the central point of the circumscribed rectangle to obtain the target position information of each first angular point.

5. The method of claim 1, wherein detecting target location information for the plurality of first corner points based on the circumscribed rectangle comprises:

processing the size of the external rectangle based on a first target model to obtain the target position information of each first corner point, wherein the first target model is obtained by training the size of an external rectangle sample of a text sample in a target image sample and a quadrilateral sample of the text sample, the external rectangle sample is a minimum positive rectangle externally connected with the text sample, and the quadrilateral sample is determined by a plurality of target point samples on the text sample and comprises the text sample.

6. The method of claim 1, further comprising:

determining a plurality of anchor boxes corresponding to texts to which the target texts belong;

and detecting the text based on the anchor boxes respectively to obtain a plurality of target detection results, wherein the target detection results correspond to the anchor boxes one by one, and each target detection result is used for representing a detection result of the target text.

7. The method of claim 6, wherein at least two of the plurality of target texts corresponding to the plurality of target detection results overlap.

8. The method of claim 6, wherein,

the method further comprises the following steps: acquiring a plurality of feature maps of the target image;

determining a plurality of anchor boxes corresponding to text to which the target text belongs comprises: determining the size of the anchor boxes based on the size of each feature map.

9. The method of claim 6, wherein detecting the target text based on the anchor boxes, respectively, and obtaining a plurality of target detection results comprises:

and detecting the target text based on each anchor point box and the corresponding circumscribed rectangle of the target text to obtain each target detection result.

10. The method of any one of claims 1 to 9, wherein the circumscribed rectangle is non-coincident with the target quadrilateral.

11. The method of any one of claims 1 to 9, wherein the target text is a line of text.

12. A text detection apparatus comprising:

an acquisition unit, configured to acquire a target image, wherein the target image comprises a target text to be detected;

a determining unit, configured to determine a circumscribed rectangle of the target text and a plurality of first corner points of a target quadrangle, wherein the circumscribed rectangle is a minimum positive rectangle circumscribed to the target text, and the target quadrangle is determined by a plurality of target points on the target text and comprises the target text;

and a detection unit, configured to detect the target position information of the plurality of first corner points based on the circumscribed rectangle.

13. The apparatus of claim 12, wherein the detection unit comprises:

the first determining module is used for determining offset position information of each first corner point relative to a second corner point in the circumscribed rectangle;

a second determining module to determine the target location information based on the offset location information.

14. The apparatus of claim 13, wherein the second determining module is configured to determine the target location information based on the offset location information by:

15. The apparatus of claim 14, wherein the second determining module is configured to determine the target location information for each of the first corner points based on the offset location information and the dimensions of the circumscribed rectangle by:

16. The apparatus of claim 12, wherein the detection unit comprises:

the processing module is configured to process the size of the external rectangle based on a first target model to obtain the target position information of each first corner point, where the first target model is obtained by training a size of an external rectangle sample based on a text sample in a target image sample and a quadrilateral sample of the text sample, the external rectangle sample is a smallest positive rectangle externally connected to the text sample, and the quadrilateral sample is determined by a plurality of target point samples on the text sample and includes the text sample.

17. The apparatus of claim 12, wherein the detection unit further comprises:

a third determining module, configured to determine a plurality of anchor boxes corresponding to texts to which the target texts belong;

and the detection module is used for detecting the text based on the anchor boxes respectively to obtain a plurality of target detection results, wherein the target detection results correspond to the anchor boxes one by one, and each target detection result is used for representing a detection result of the target text.

18. The apparatus of claim 17, wherein the detection module is configured to detect the text based on the anchor boxes, respectively, to obtain a plurality of target detection results by:

determining that at least two of the target texts corresponding to the target detection results overlap.

19. The apparatus of claim 17, wherein the third determination module is configured to determine a plurality of anchor boxes corresponding to text to which the target text belongs by:

acquiring a plurality of feature maps of the target image;

20. The apparatus of claim 17, wherein the detection module is configured to detect the target text based on the anchor boxes, respectively, to obtain a plurality of target detection results by:

21. The apparatus of any one of claims 12 to 20, wherein the apparatus further comprises:

and the fourth determining module is used for determining that the circumscribed rectangle is not coincident with the target quadrangle.

22. The apparatus of any one of claims 12 to 20, wherein the apparatus further comprises:

and the fifth determining module is used for determining that the target text is a text line.

23. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.

24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-11.

25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-11.