CN116453133B - Bezier curve and key point-based banner text detection method and system - Google Patents
- Publication number: CN116453133B (application CN202310714974.5A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/1801—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/1918—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/42—Document-oriented image-based pattern recognition based on the type of document
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and a system for detecting banner text based on Bezier curves and key points. An initial text box of a text region is first generated from the image labels; the number of long-side coordinates of the initial text box is then reduced using a fixed threshold, Bezier curves are generated from the reduced long-side coordinates, and the two Bezier curves are connected end to end to form a new text box. The labels of the text box are converted from the boundary coordinates of the text box into key point coordinates and key point widths. A banner text detection network model is then constructed and trained, and finally the text in the banner image is detected with the trained banner text detection network model. The method solves the problem that the prior art cannot accurately frame banner text, and improves detection speed while using less data to complete text detection.
Description
Technical Field
The invention belongs to the technical field of natural scene text positioning, and particularly relates to a banner text detection method and system based on Bezier curves and key points.
Background
With the continuous development of computer vision, target detection and semantic segmentation are continuously and iteratively updated as well. Technologies such as target detection and semantic segmentation can divide a banner into text regions and non-text regions, thereby detecting the banner text so that the content of the text regions can subsequently be recognized.
A study of the text regions of banner images reveals problems such as the large aspect ratio of banner text and text distortion. Although target detection methods have made some progress in processing such text, problems remain, such as false detection, missed detection, and large non-text regions within the detected text. Meanwhile, text detection methods based on semantic segmentation also have problems, such as complex post-processing, low detection speed and high hardware requirements. A new text detection method is therefore urgently needed to solve these problems.
Disclosure of Invention
Aiming at the problem that the prior art cannot accurately frame a banner text, the invention provides a banner text detection method and system based on Bezier curves and key points.
In order to achieve the above object, the present invention provides a method for detecting a banner text based on a bezier curve and a key point, comprising the steps of:
Step 1, generating an initial text box of a text region according to labels of a public text data set, simplifying the number of long-side coordinates of the text box through a fixed threshold value, generating Bezier curves based on simplified long-side coordinate points, connecting two Bezier curves end to form a new text box, and converting the labels of the text box from boundary coordinate points of the text box to key point coordinates and the width of key points;
step 1.1, selecting images with special texts such as long texts, distorted texts and the like in a public text image data set as a data set, and generating an initial text box of a text area according to labels of the public text data set;
step 1.2, judging the bending degree of the long edge of the text box by adopting a fixed threshold value method;
step 1.3, selectively simplifying coordinate points of two long sides of the text box according to the bending degree of the two long sides of the text box;
step 1.4, taking coordinate points on two simplified long sides as control points of a Bezier curve, generating two corresponding Bezier curves, and connecting the two Bezier curves end to obtain a real boundary frame of the text;
step 1.5, converting labels of the public data set from text box boundary coordinate points into key point coordinates and the width of the key points;
Step 2, constructing a banner text detection network model;
step 3, training the banner text detection network model constructed in the step 2 by utilizing the key point data set obtained in the step 1;
and 4, detecting the text in the banner image by using the trained banner text detection network model.
In step 1.1, the labels of the public text data set are several groups of coordinates arranged clockwise; each group gives the boundary-point coordinates of the text box framing one text instance, and connecting each group of coordinates clockwise forms a closed polygon, giving the initial text box of the text. Let the number of boundary points of a data-set image be 2n; the first n points are selected in order as upper boundary points and the last n points as lower boundary points, and the polyline through the upper boundary points and the polyline through the lower boundary points are taken as the two long sides of the initial text box.
In step 1.2, the length of the line joining the head and tail coordinates of each long side of a text box in the data set is compared, through fixed thresholds, against the distances from the other coordinate points on that long side to this line, and the bending degree of the two long sides of the text box is judged, namely:

f(r) = straight, if 0 ≤ r < θ1; partially bent, if θ1 ≤ r < θ2; fully bent, if r ≥ θ2    (1)

where f indicates the degree of curvature of the long side of the text box, and r represents the ratio of the farthest distance from a coordinate point on the long side to the head-tail coordinate line, divided by the length of that line. When r is greater than or equal to 0 and less than θ1, the long side is judged to be a straight line; when r is greater than or equal to θ1 and less than θ2, the long side is judged to be partially bent; when r is greater than or equal to θ2, the long side is judged to be fully bent. θ1 and θ2 are set thresholds.
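The threshold rule of formula (1) can be sketched in plain Python as follows. This is an illustrative assumption, not the patented implementation: the function name `bend_degree` is invented here, and the default thresholds 0.1 and 0.7 are taken from Example 1 later in the description.

```python
import math

def bend_degree(points, theta1=0.1, theta2=0.7):
    """Classify the bending degree of one long side of a text box.

    points: (x, y) coordinates along the long side, head to tail.
    Computes r = (farthest point-to-chord distance) / (chord length)
    and compares it against the thresholds theta1 and theta2.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    chord = math.hypot(x1 - x0, y1 - y0)

    def dist(p):
        # Perpendicular distance from p to the head-tail chord.
        px, py = p
        return abs((x1 - x0) * (y0 - py) - (x0 - px) * (y1 - y0)) / chord

    r = max((dist(p) for p in points[1:-1]), default=0.0) / chord
    if r < theta1:
        return 'straight'
    if r < theta2:
        return 'partial'
    return 'full'
```

For instance, a long side whose farthest interior point deviates by 1% of the chord length is classified as straight, while a deviation of 80% is classified as fully bent.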
In step 1.3, let the distance from a coordinate point on the long side to the line joining the head and tail coordinate points be d, and let the head and tail coordinate points be P_s and P_e. The specific simplification process is as follows: when the long side is judged to be a straight line, only its head and tail coordinate points are retained; when the long side is judged to be partially bent, the coordinate point farthest from the head-tail line and the head and tail coordinate points are retained; when the long side is judged to be fully bent, a threshold D is set to 0.1 times the length of the head-tail line, and a coordinate point is retained when its d is greater than D while the other coordinate points are discarded. Let the coordinate point with the largest d be P_m; P_m is used to divide the curve into the two parts (P_s, P_m) and (P_m, P_e), and the above operation is repeated on each part until no coordinate point's distance to the corresponding line exceeds D.
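The fully-bent case of step 1.3 follows a Douglas-Peucker-style recursive split. A minimal sketch is given below; it is an assumption of how the step could be coded (the function names are invented, and the threshold D is computed once from the overall head-tail line, as the step describes, rather than per sub-curve as in classic Douglas-Peucker).

```python
import math

def _dp(points, D):
    """Recursive split: keep endpoints, recurse around the farthest point."""
    if len(points) < 3:
        return list(points)
    (x0, y0), (x1, y1) = points[0], points[-1]
    chord = math.hypot(x1 - x0, y1 - y0) or 1e-12

    def dist(p):
        return abs((x1 - x0) * (y0 - p[1]) - (x0 - p[0]) * (y1 - y0)) / chord

    dmax, imax = max((dist(p), i) for i, p in enumerate(points[1:-1], 1))
    if dmax <= D:
        return [points[0], points[-1]]        # nothing exceeds the threshold
    # Split at the farthest point P_m and simplify both halves.
    return _dp(points[:imax + 1], D)[:-1] + _dp(points[imax:], D)

def simplify_long_side(points):
    """Simplify a fully bent long side with D = 0.1 x head-tail length."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    D = 0.1 * math.hypot(x1 - x0, y1 - y0)
    return _dp(points, D)
```

Points lying close to the local chord (d ≤ D) are discarded, while sharp deviations survive as control points for the subsequent Bezier fitting.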
In step 1.4, the coordinate points on the reduced long side are used as control points of a Bezier curve. The Bezier curve is represented by a parametric curve based on Bernstein polynomials, specifically defined as:

B(t) = Σ_{i=0}^{n} b_{i,n}(t) · P_i, 0 ≤ t ≤ 1    (2)

b_{i,n}(t) = C(n, i) · t^i · (1 − t)^{n−i}, i = 0, …, n    (3)

where B(t) represents the coordinate set of points on the Bezier curve, n represents the Bezier curve order, P_i represents the coordinates of the i-th control point, b_{i,n}(t) represents the Bernstein polynomial of the i-th control point, C(n, i) represents the binomial coefficient, and t is the curve parameter corresponding to the coordinates of all points on the Bezier curve. Since every term except the first vanishes at t = 0 and every term except the last vanishes at t = 1, the first coordinate point on the long side is selected as the position of the Bezier curve at time 0, and the last coordinate point on the long side as its position at time 1.
Two Bezier curves are generated through the formula (2), and a closed polygon formed by connecting the two Bezier curves end to end is used as a real text box of the text example.
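Formulas (2) and (3) and the end-to-end connection of the two curves can be sketched directly (the function names and the choice of 20 sample points per curve are illustrative assumptions, not part of the patent):

```python
from math import comb

def bezier_point(ctrl, t):
    """Evaluate formula (2): B(t) = sum_i C(n,i) t^i (1-t)^(n-i) P_i."""
    n = len(ctrl) - 1
    x = sum(comb(n, i) * t**i * (1 - t)**(n - i) * px for i, (px, _) in enumerate(ctrl))
    y = sum(comb(n, i) * t**i * (1 - t)**(n - i) * py for i, (_, py) in enumerate(ctrl))
    return (x, y)

def text_box(upper_ctrl, lower_ctrl, samples=20):
    """Connect the upper and lower Bezier curves end to end into a closed polygon."""
    ts = [i / (samples - 1) for i in range(samples)]
    upper = [bezier_point(upper_ctrl, t) for t in ts]
    lower = [bezier_point(lower_ctrl, t) for t in ts]
    return upper + lower[::-1]   # traverse the lower side backwards to close the loop
```

Note that `bezier_point(ctrl, 0)` returns the first control point and `bezier_point(ctrl, 1)` the last, matching the endpoint property stated after formula (3).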
In step 1.5, the boundary points on the two long sides are converted into a group of key points to represent the text box, and before the key points are converted, the number of the boundary points on the upper and lower long sides of the text box is ensured to be consistent by adopting an upward compatible mode, and the specific steps are as follows: when the upper edge and the lower edge are respectively straight lines and partially bent, the middle point of the straight line edge is extracted as one boundary point, so that the boundary points of the upper edge and the lower edge are three; when the upper edge and the lower edge are respectively straight lines and completely bent, dividing the straight lines equally according to the number of coordinate points of the completely bent edges, and extracting equally divided coordinate points so that the number of boundary points of the upper edge and the lower edge is consistent; when the upper edge and the lower edge are respectively in partial bending and full bending, dividing the two curves of the partial bending edge equally according to the coordinate point quantity of the full bending edge minus the coordinate point quantity of the partial bending edge, and extracting the equally divided coordinate points, so that the boundary point quantity of the upper edge and the lower edge is consistent. 
After the number of the upper boundary points and the lower boundary points are unified through the operation, the boundary points are converted, coordinates of the upper edge and the lower edge are in one-to-one correspondence from the beginning to the end, the middle point coordinates of the corresponding coordinate points are taken as key point coordinates, one half of the distance of the corresponding coordinate points is taken as the width of the key points, and the labels in the public image text data set are converted into a group of key point coordinates and corresponding widths from the coordinate points of the boundary frames.
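The boundary-point-to-key-point conversion of step 1.5 can be sketched as follows, assuming the upper and lower boundary points have already been matched one-to-one from head to tail (if the polygon is stored clockwise, the lower side would first be reversed — an assumption, since the patent does not spell out the ordering). The function name is invented for illustration.

```python
import math

def to_keypoints(upper, lower):
    """Convert matched upper/lower boundary points into key points and widths.

    Each key point is the midpoint of a corresponding pair of boundary
    points, and its width is one half of the distance between the pair.
    """
    assert len(upper) == len(lower)
    keypoints, widths = [], []
    for (ux, uy), (lx, ly) in zip(upper, lower):
        keypoints.append(((ux + lx) / 2, (uy + ly) / 2))
        widths.append(math.hypot(ux - lx, uy - ly) / 2)
    return keypoints, widths
```

This halves the label size again: a 2n-point polygon becomes n key points plus n scalar widths.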
And, the banner text detection network model in the step 2 comprises a feature extraction module, a feature fusion module, a regression module and a text box generation module. And the feature extraction module is used for extracting feature information of different layers to obtain feature images containing semantic information from a lower layer to a higher layer. And the feature fusion module is used for combining the feature images of different layers to obtain a fused feature image which is used for detecting the banner text subsequently. And the regression module is used for regressing the shape of the text instance, and the coordinates of the key points and the width of the key points of the text instance. And the text box generation module is used for generating a banner image text box based on the key point coordinates and the width information in the current image.
The backbone network of the feature extraction module adopts a ResNet-50 model. After an image is input into the ResNet-50 model, four feature maps C2, C3, C4 and C5 are obtained in turn through channel-increasing and downsampling processing. The channel numbers of these four feature maps of different scales obtained in the backbone network are unified to give the FPN inputs F2, F3, F4 and F5. Then, starting from the smallest-scale feature map F5, upsampling is performed and the result is added to the feature map of the same scale at the FPN input, F4, to obtain the fused lower-scale feature map P4; P4 is upsampled and added to F3 to obtain the fused lower-scale feature map P3; likewise, P3 is upsampled and added to F2 to obtain the fused feature map P2. Finally, the fused feature maps P2, P3, P4 and P5 are taken as the output of the FPN.
The feature fusion module combines the fused feature maps of different scales to obtain the combined fused feature map F. The specific calculation process is:

F = Concat(P2, Up×2(P3), Up×4(P4), Up×8(P5))    (4)

where Concat indicates channel concatenation, Up×2, Up×4 and Up×8 indicate 2-fold, 4-fold and 8-fold upsampling respectively, and P2, P3, P4 and P5 are the fused feature maps.
The fused feature map F is then upsampled so that F has the same size as the original image.
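Formula (4) amounts to aligning the four pyramid levels to the largest spatial size and concatenating along the channel axis. A minimal NumPy sketch (nearest-neighbour upsampling is an assumption; the patent does not specify the interpolation mode):

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse(p2, p3, p4, p5):
    """Formula (4): channel-concatenate P2 with upsampled P3, P4 and P5."""
    return np.concatenate([p2, upsample(p3, 2), upsample(p4, 4), upsample(p5, 8)],
                          axis=0)
```

With C channels per level, the fused map has 4C channels at the spatial resolution of P2.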
The regression module comprises two parts, shape regression and key point regression. Shape regression converts the fused feature map F into a text-shape feature map through a convolution layer with an activation function; a threshold t is set to binarize the feature map, where regions above t are text regions and regions below t are background regions, giving a text-shape binary map that separates text from background. The text outline shape in the binary map is compared with the text box shape generated from the image key point labels, and the two shapes are matched through their intersection-over-union (IOU). The input of key point regression is the fused feature map F and the output is the key point coordinates and widths, produced by two branches. One branch outputs K key point heat maps, where K is the largest number of key points of any text instance in the detected image; in each key point heat map, the N highest-scoring highlighted coordinate points are selected as the key point coordinates, these being the key point coordinates of the corresponding key points of the image's text instances, where N is the number of text instances in the detected image; when a text instance has fewer key points, the number of highlighted coordinates is correspondingly reduced. The other branch outputs the width corresponding to each detected key point; when the number of key points of a text instance is insufficient, the remaining width information is 0.
The text box generation module takes the key point coordinates and width information output by the regression module as text instance information and generates a text box from it. The width of a key point is the distance from the key point to its corresponding long-side coordinate points: the line joining two adjacent key points is taken as the normal of the line joining the key point and the long-side coordinate point, and the key point is extended upwards and downwards perpendicular to that normal by its width; the resulting end points are the long-side coordinate points. Processing every key point in this way yields two groups of long-side coordinate points equal in number to the key points. Two Bezier curves are generated using these long-side coordinate points as Bezier control points, and connecting the two curves end to end gives a completely closed curve frame, which is the text box of the text instance. Finally, the image with the framed text is output, realizing text detection of the banner image.
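The key-point-to-long-side expansion can be sketched as follows. This is a geometric approximation of the step just described, with invented names; the local tangent at each key point is estimated from its neighbouring key points, and the point is offset by ±width along the unit normal.

```python
import math

def expand_keypoints(kps, widths):
    """Recover upper/lower long-side control points from key points and widths."""
    upper, lower = [], []
    for i, ((x, y), w) in enumerate(zip(kps, widths)):
        # Tangent estimated from the neighbouring key points (clamped at the ends).
        x0, y0 = kps[max(i - 1, 0)]
        x1, y1 = kps[min(i + 1, len(kps) - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty) or 1.0
        nx, ny = -ty / norm, tx / norm      # unit normal to the tangent
        upper.append((x + nx * w, y + ny * w))
        lower.append((x - nx * w, y - ny * w))
    return upper, lower
```

The two returned point lists can then serve as control points for the two Bezier curves that close into the final text box.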
In step 3, the key point data set obtained in step 1 is divided into a training set and a test set; the training set is input into the banner text detection network model for iterative training, the parameters of the model are updated so as to minimize the loss function, the accuracy of the model on the test set is recorded, and the optimal model is saved. The training process is divided into shape detection training and key point detection training; the corresponding loss function L is calculated as:

L = L_shape + λ · L_kp    (5)

where L_shape is the shape loss function, L_kp is the key point loss function, and λ is a weight factor of the loss function.
The shape loss function L_shape is calculated as:

L_shape = 1 − IOU + ρ²(b, b_gt) / c² + αv    (6)

where IOU represents the intersection-over-union of the regressed text outline shape and the text box generated from the key point labels; b and b_gt respectively represent the center-point coordinates of the regressed text outline shape and of the text box generated from the key point labels. The center point of the regressed text outline shape is the coordinate of the clockwise-median key point among the key points of the text outline shape; when the number of key points is even, the midpoint of the line joining the two middle key points is selected. The center point of the text box generated from the key point labels is the clockwise-median key point coordinate among the key points of the generated text box; when the number of key points is even, the midpoint of the line joining the two middle key points is likewise selected. ρ(b, b_gt) represents the Euclidean distance between the two center points; c represents the diagonal length of the minimum closure area that can contain both the regressed text outline shape and the text box generated from the key point labels; α is a regulating factor used to balance the weights between overlap area and aspect-ratio similarity; and v is an index measuring the similarity of the length-to-width ratio.
The key point loss function L_kp comprises two parts, key point coordinates and width; the specific calculation formulas are:

L_kp = L_coord + μ · L_width    (7)

L_coord = −(1/N) Σ_{c=1}^{C} Σ_{x=1}^{H} Σ_{y=1}^{W} { (1 − Ŷ_{xyc})^α · log(Ŷ_{xyc}), if Y_{xyc} = 1; (1 − Y_{xyc})^β · (Ŷ_{xyc})^α · log(1 − Ŷ_{xyc}), otherwise }    (8)

L_width = (1/N) Σ_{i=1}^{N} |ŵ_i − w_i|    (9)

where L_coord is the key point coordinate loss function, L_width is the key point width loss function, μ is a weight factor, N is the number of text instances in the image, C represents the number of channels of the regressed key point heat map, H and W respectively represent the height and width of the regressed key point heat map, Ŷ_{xyc} is the score of position (x, y) in channel c of the key point heat map regressed by the regression module, Y_{xyc} represents the score of the corresponding point in the real key point heat map obtained by passing the key point-labeled image through a Gaussian function, α and β are hyper-parameters controlling the contribution of each key point, with the factor (1 − Y_{xyc})^β used to reduce the penalty for points around the key point coordinates, ŵ_i and w_i are the regressed and labeled key point widths, and |·| indicates the absolute value of the quantity in brackets.
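The coordinate term of formula (8) has the shape of a heat-map focal loss. A minimal NumPy sketch under that reading (the function name and the α = 2, β = 4 defaults are assumptions; the patent does not state the hyper-parameter values):

```python
import numpy as np

def keypoint_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """Heat-map focal loss sketched from formula (8).

    pred, gt: arrays of shape (C, H, W); gt is the Gaussian-rendered real
    heat map, equal to 1 exactly at the key point coordinates. The loss is
    normalized by the number of positive points (at least 1).
    """
    pred = np.clip(pred, 1e-6, 1 - 1e-6)        # guard the logarithms
    pos = gt == 1
    pos_loss = ((1 - pred) ** alpha * np.log(pred))[pos].sum()
    neg_loss = ((1 - gt) ** beta * pred ** alpha * np.log(1 - pred))[~pos].sum()
    n = max(pos.sum(), 1)
    return -(pos_loss + neg_loss) / n
```

The (1 − gt)^β factor down-weights the penalty on pixels near a key point, exactly the behaviour described after formula (8).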
In order to accelerate model convergence, non-text region coordinate points are not considered in the regression of the key point coordinates, so that the number of negative samples is reduced. After training the banner text detection network model by using training set data, putting the test set into the model, comparing the accuracy and the detection speed of text detection, and extracting an optimal detection model.
The invention also provides a banner text detection system based on the Bezier curve and the key points, which is used for realizing the banner text detection method based on the Bezier curve and the key points.
Further, the system comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the stored instructions in the memory to execute a banner text detection method based on the Bezier curve and the key points.
Compared with the prior art, the invention has the following advantages:
1) A group of key points is used instead of a rectangular frame for regression: fixed-shape anchor frames are replaced by text boxes generated from key points to represent text instances, and the long sides of the text box are replaced by Bezier curves, whose shape variability adapts to different text shapes. This solves the problem that fixed anchor frames cannot accurately represent the shapes of text instances.
2) In the key point label making stage, an adaptive approach reduces the number of long-side coordinate points, lowering the calculation pressure of text box coordinate regression; regressing key point coordinates and widths instead of text box coordinates significantly reduces the calculation cost of the regression labels, so that text detection is completed with a higher detection speed and less data.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a block diagram of a banner text detection network in accordance with an embodiment of the present invention.
Detailed Description
The invention provides a method and a system for detecting banner texts based on Bezier curves and key points, and the technical scheme of the invention is further described below with reference to drawings and embodiments.
Example 1
As shown in fig. 1, the invention provides a method for detecting a banner text based on a bezier curve and key points, which comprises the following steps:
step 1, generating an initial text box of a text region according to labels of a public text data set, simplifying the number of long-side coordinates of the text box through a fixed threshold value, generating a Bezier curve based on simplified long-side coordinate points, connecting two Bezier curves end to form a new text box, and converting the labels of the text box from boundary coordinate points of the text box to key point coordinates and the width of key points.
And 1.1, selecting images with special texts such as long texts, distorted texts and the like in the public text image data set as a data set, and generating an initial text box of a text area according to the labels of the public text data set.
The labels of the public text data set are a plurality of groups of coordinates which are arranged clockwise, each group of coordinates is a text box boundary point coordinate of the framed text, and each group of coordinates is connected clockwise to form a closed polygon, so that the text box of the text, namely the initial text box, is obtained. In this embodiment, the number of boundary points of the ctw-1500 dataset is 14, the first 7 are sequentially selected as upper boundary points, the last 7 are sequentially selected as lower boundary points, and the connecting line of the upper boundary points (1-7) and the connecting line of the lower boundary points (8-14) are sequentially selected as the long sides of the initial text box.
And 1.2, judging the bending degree of the long edge of the text box by adopting a fixed threshold value method.
And comparing the connecting line distance of the head and tail coordinates of the long sides of the text box in the data set with the distance from other coordinate points on the long sides to the connecting line through a fixed threshold value, and judging the bending degree of the two long sides of the text box, namely:
f(r) = straight, if 0 ≤ r < 0.1; partially bent, if 0.1 ≤ r < 0.7; fully bent, if r ≥ 0.7    (1)

where f indicates the degree of curvature of the long side of the text box, and r represents the ratio of the farthest distance from a coordinate point on the long side of a text box in the image data set to the line joining the head and tail coordinates of that long side, divided by the length of that line. When the ratio is greater than or equal to 0 and less than 0.1, the long side is judged to be a straight line; when the ratio is greater than or equal to 0.1 and less than 0.7, the long side is judged to be partially bent; when the ratio is greater than or equal to 0.7, the long side is judged to be fully bent.
And 1.3, selectively simplifying coordinate points of two long sides of the text box according to the bending degree of the two long sides of the text box.
Let the distance from a coordinate point on the long side to the line joining the head and tail coordinate points be d, and let the head and tail coordinate points be P_s and P_e. The specific simplification process is as follows: when the long side is judged to be a straight line, only its head and tail coordinate points are retained; when the long side is judged to be partially bent, the coordinate point farthest from the head-tail line and the head and tail coordinate points are retained; when the long side is judged to be fully bent, following the heuristic of the Douglas-Peucker algorithm, a threshold D is set to 0.1 times the length of the head-tail line, and a coordinate point is retained when its d is greater than D while the other coordinate points are discarded. Let the coordinate point with the largest d be P_m; P_m is used to divide the curve into the two parts (P_s, P_m) and (P_m, P_e), and the above operation is repeated on each part until no coordinate point's distance to the corresponding line exceeds D.
Through the operation, the coordinate points on the long sides are simplified, the number of labels is reduced, the calculated amount of a banner image text detection model regression module is reduced, and the detection speed is improved.
And 1.4, taking coordinate points on two simplified long sides as control points of the Bezier curves, generating two corresponding Bezier curves, and connecting the two Bezier curves end to obtain a real boundary frame of the text.
And taking the coordinate points on the reduced long side as control points of a Bezier curve, wherein the Bezier curve is expressed by using a parameter curve based on a Bernstein polynomial, and the specific definition is as follows:
B(t) = Σ_{i=0}^{n} b_{i,n}(t) · P_i, 0 ≤ t ≤ 1    (2)

b_{i,n}(t) = C(n, i) · t^i · (1 − t)^{n−i}, i = 0, …, n    (3)

where B(t) represents the coordinate set of points on the Bezier curve, n represents the Bezier curve order, P_i represents the coordinates of the i-th control point, b_{i,n}(t) represents the Bernstein polynomial of the i-th control point, C(n, i) represents the binomial coefficient, and t is the curve parameter corresponding to the coordinates of all points on the Bezier curve. Since every term except the first vanishes at t = 0 and every term except the last vanishes at t = 1, the first coordinate point on the long side is selected as the position of the Bezier curve at time 0, and the last coordinate point on the long side as its position at time 1.
Two Bezier curves are generated through the formula (2), and a closed polygon formed by connecting the two Bezier curves end to end is used as a real text box of the text example.
And step 1.5, converting the labels of the public data set from text box boundary coordinate points to key point coordinates and the width of the key points.
In order to further reduce the number of data set labels and improve detection efficiency, the boundary points on the two long sides are converted into a group of key points to represent the text box. Before the text box is converted into key points, the numbers of boundary points on the upper and lower long sides of the text box must be made consistent. Since the two long sides may have different bending degrees, an upward-compatible mode is adopted to make the numbers of boundary points consistent, as follows:
when one of the upper and lower sides is a straight line and the other is partially bent, the midpoint of the straight side is extracted as an additional boundary point, so that both sides have three boundary points; when one side is a straight line and the other is completely bent, the straight side is divided equally according to the number of coordinate points of the completely bent side, and the equally divided coordinate points are extracted so that the numbers of boundary points of the two sides are consistent; when one side is partially bent and the other is completely bent, the two curve segments of the partially bent side are divided equally according to the number of coordinate points of the completely bent side minus the number of coordinate points of the partially bent side, and the equally divided coordinate points are extracted so that the numbers of boundary points of the two sides are consistent.
After the numbers of upper and lower boundary points are unified by the above operation, the boundary points are converted: the coordinates of the upper and lower sides are put in one-to-one correspondence from head to tail, the midpoint of each pair of corresponding coordinate points is taken as a key point coordinate, and one half of the distance between the corresponding coordinate points is taken as the width of the key point. The labels in the public image text data set are thus converted from bounding box coordinate points into a group of key point coordinates and corresponding widths, realizing key-point-based label production.
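The midpoint-and-half-distance conversion above can be sketched as follows; `to_keypoint_labels` is an illustrative name, and the pairing of upper and lower points is assumed to have been made consistent beforehand:

```python
def to_keypoint_labels(upper, lower):
    """Convert matched upper/lower boundary points into (keypoint, width) labels:
    keypoint = midpoint of each corresponding pair, width = half the pair distance."""
    assert len(upper) == len(lower), "boundary point counts must be unified first"
    labels = []
    for (ux, uy), (lx, ly) in zip(upper, lower):
        cx, cy = (ux + lx) / 2, (uy + ly) / 2            # midpoint -> key point
        width = ((ux - lx) ** 2 + (uy - ly) ** 2) ** 0.5 / 2  # half distance -> width
        labels.append(((cx, cy), width))
    return labels
```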
And 2, constructing a banner text detection network model.
The banner text detection network model comprises a feature extraction module, a feature fusion module, a regression module and a text box generation module. And the feature extraction module is used for extracting feature information of different layers to obtain feature images containing semantic information from a lower layer to a higher layer. And the feature fusion module is used for superposing and combining the feature images of different layers to obtain a fused feature image which is used for detecting the banner text subsequently. And the regression module is used for regressing the shape of the text instance, and the coordinates of the key points and the width of the key points of the text instance. And the text box generation module is used for generating a banner image text box based on the key point coordinates and the width information vector information in the current image.
First, the banner text detection network model extracts four feature images of different scales using ResNet-50 as the backbone network, and the FPN (feature pyramid network) combines the feature images of different scales in turn to obtain four fused feature images of different scales. The fused feature images are up-sampled by corresponding factors to obtain four feature images of the same scale, which are then superimposed into one fused feature image; this fused feature image is up-sampled by a factor of four to the same size as the original image. A regression operation on the fused feature image yields two parts of regression data: the shape of the regressed text outline is compared with the shape of the real text box to judge their degree of similarity, while the key point regression data are sent to the text box generation module, which obtains two groups of long-side control point coordinates from the key point coordinates and width information, converts the obtained control points into two Bezier curves, and connects the Bezier curves to obtain the final text box of the text instance.
The backbone network of the feature extraction module adopts the ResNet-50 model. After an image is input into the ResNet-50 model, the image is first down-sampled so that its length and width are each reduced to 1/4 of the original image and the number of channels is increased from 3 to 64; a 1 × 1 convolution kernel then increases the number of channels from 64 to 256 with the length and width of the image unchanged, giving the first feature map C1. Channel increase and down-sampling are then applied to the feature map, doubling the number of channels each time the length and width of the image are halved, giving the second feature map C2; repeating this operation yields the four feature images C1, C2, C3, C4. ResNet-50 with the fully connected layer removed is combined with the FPN structure, and the four feature images of different scales obtained in the backbone network are used as the input of the FPN structure. Before feature images of different scales are fused, their channel numbers must be processed uniformly, so a 1 × 1 convolution kernel is added at the input end of the FPN structure to reduce the number of channels of each feature image to 256, giving P1, P2, P3, P4.
Starting from the lowest-scale feature map P4, double up-sampling is performed by nearest-neighbour interpolation and the result is added to the feature map of the same scale at the input end of the FPN structure, P3, to obtain the fused lower-scale feature image F3. Nearest-neighbour interpolation is again used to up-sample F3 by a factor of two, and the result is added to P2 to obtain the fused low-scale feature image F2; likewise, F2 is up-sampled by a factor of two and added to P1 to obtain the fused feature image F1. Finally, the fused feature images F1, F2, F3, F4 (with F4 = P4) are taken as the output of the FPN.
The feature fusion module combines the fused feature images of different scales to obtain the combined fused feature image F. The specific calculation process is as follows:

F = C(F1, F2↑2, F3↑4, F4↑8)   (4)

where C denotes channel concatenation, ↑2, ↑4 and ↑8 denote up-sampling by factors of 2, 4 and 8 respectively, and F1, F2, F3, F4 are the fused feature images.
A 3 × 3 convolution layer (with BN and ReLU layers to accelerate model convergence and reduce model parameters) reduces the number of channels of F to 256, and the feature image F is then up-sampled by a factor of 4 so that F has the same size as the original image.
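The channel concatenation of formula (4) with nearest-neighbour up-sampling can be sketched in NumPy. The (channels, height, width) layout, the toy map sizes, and the function names are assumptions for illustration:

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour up-sampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(f1, f2, f3, f4):
    """Formula (4): concatenate f1 with f2/f3/f4 up-sampled 2x/4x/8x along channels."""
    return np.concatenate(
        [f1, upsample_nn(f2, 2), upsample_nn(f3, 4), upsample_nn(f4, 8)], axis=0
    )

# Toy maps: 256 channels at scales 1/4 .. 1/32 of a 64 x 64 input image.
f1 = np.zeros((256, 16, 16))
f2 = np.zeros((256, 8, 8))
f3 = np.zeros((256, 4, 4))
f4 = np.zeros((256, 2, 2))
fused = fuse_features(f1, f2, f3, f4)
```

The four 256-channel maps concatenate into a 1024-channel map at the F1 scale, which a subsequent 3 × 3 convolution would reduce back to 256 channels as described above.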
The regression module comprises two parts: shape regression and key point regression. The shape regression converts the fused feature map F into a text shape feature map through a 3 × 3 convolution layer with a Sigmoid activation function; the feature map is binarized with a set threshold of 0.5, regions above the threshold being taken as text regions and regions below it as background regions, giving a text shape binary map in which text is separated from the background. The text outline shapes in the binary map are compared with the shapes of the text boxes generated from the image key point labels, and each text outline shape is matched to a generated text box shape by the intersection-over-union (IOU) of the two shapes. Because an image may contain multiple texts, key points could otherwise be matched to other texts; the function of the shape regression is to ensure that the key points lie inside the corresponding text outline shape, avoiding false detection. The input of the key point regression is the fused feature map F, which is fed into two different detection branches. One branch detects the key point coordinates and outputs N key point heat maps, where N is the maximum number of key points of any text instance in the detected image; in each key point heat map, the M highest-scoring highlighted coordinate points are selected as key point coordinates, corresponding to the key points of the text instances of the image, where M is the number of text instances in the detected image. If a text instance has fewer key points, the number of highlighted coordinates decreases accordingly.
The other branch detects the key point widths and, through a 3 × 3 convolution layer and a 1 × 1 convolution layer, outputs M groups of width information, where M is the number of text instances in the detected image. The width information corresponds one-to-one with the key points; if a text instance has fewer key points, the remaining width information is 0.
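Selecting the highest-scoring highlighted coordinates from one key point heat map can be sketched as below; the plain argsort-based top-k shown here is an assumption, not necessarily the patent's exact peak-picking procedure:

```python
import numpy as np

def topk_keypoints(heatmap, k):
    """Pick the k highest-scoring coordinates in one (H, W) key point heat map."""
    flat = heatmap.ravel()
    idx = np.argsort(flat)[::-1][:k]          # indices of the k largest scores
    ys, xs = np.unravel_index(idx, heatmap.shape)
    return [(int(y), int(x), float(heatmap[y, x])) for y, x in zip(ys, xs)]

heat = np.zeros((8, 8))
heat[2, 3] = 0.9   # two synthetic text-instance peaks
heat[5, 6] = 0.8
peaks = topk_keypoints(heat, 2)
```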
The text box generation module takes the key point coordinates and width information output by the regression module as text instance information and generates a text box from this information. Specifically, the width of a key point is the distance from the key point to the corresponding long-side coordinate point. The line connecting two adjacent key points is taken as the line to which the connecting line of the key point and the long-side coordinate point is normal; from the key point, this normal is extended upwards and downwards by the width corresponding to the key point, and the end point coordinates are the long-side coordinate points. Each key point is processed in this way to obtain two groups of long-side coordinate points equal in number to the key points. The long-side coordinate points are taken as control points of Bezier curves, two Bezier curves are obtained according to formula (2), and connecting the two Bezier curves end to end gives a completely closed curve frame, which is taken as the text box of the text instance. Finally, the image with the framed text is output, realizing text detection of the banner image.
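The long-side point recovery step can be sketched as follows; the handling of the first and last key points (reusing the nearest available neighbour for the direction) is an assumption, and `long_side_points` is an illustrative name:

```python
import numpy as np

def long_side_points(keypoints, widths):
    """For each key point, move +/- width along the normal of the local
    key point direction to recover the two long-side control points."""
    kps = np.asarray(keypoints, float)
    upper, lower = [], []
    for i, (kp, w) in enumerate(zip(kps, widths)):
        # Direction of the line through the neighbouring key points.
        a = kps[max(i - 1, 0)]
        b = kps[min(i + 1, len(kps) - 1)]
        d = b - a
        d = d / np.linalg.norm(d)
        normal = np.array([-d[1], d[0]])      # perpendicular to the key point line
        upper.append(tuple(kp + w * normal))  # upper long-side point
        lower.append(tuple(kp - w * normal))  # lower long-side point
    return upper, lower
```

The two returned point groups would then serve as Bezier control points, exactly as in formula (2).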
And 3, training the banner text detection network model constructed in the step 2 by using the key point data set obtained in the step 1.
The key point data set obtained in step 1 is divided into a training set and a test set; the training set is input into the banner text detection network model for iterative training, the parameters of the banner text detection network model are updated to minimize the loss function, the accuracy of the model on the test set is recorded, and the optimal model is saved. The training process is divided into shape detection training and key point detection training, and the corresponding loss function L is calculated as follows:
L = L_shape + λ · L_kp   (5)

where L_shape is the shape loss function, L_kp is the key point loss function, and λ is the weight factor of the loss function, whose value is set in this embodiment.
Considering that banner text has arbitrary shapes and large aspect ratios, the CIOU loss function is used to define L_shape, with the specific formula:
L_shape = 1 − IOU + ρ²(b, b_gt) / c² + α · v   (6)

where IOU represents the intersection-over-union of the regressed text outline shape and the text box generated from the key point labels; b and b_gt respectively represent the center point coordinates of the regressed text outline shape and of the text box generated from the key point labels: the center point coordinate of the regressed text outline shape is the center of its key points taken clockwise, and when the number of key points is even, the midpoint of the line connecting the two most central key points is selected; the center point coordinate of the text box generated from the key point labels is determined in the same way from the key points of the generated text box; ρ denotes the Euclidean distance between the two center points; c denotes the diagonal length of the minimum closure area that can contain both the regressed text outline shape and the text box generated from the key point labels; α is a regulating factor for balancing the weights between overlap area and aspect-ratio similarity; and v is an index measuring the similarity of the aspect ratios.
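Formula (6) can be sketched for axis-aligned boxes; the patent applies it to text outline shapes, so the box parameterization here is a simplifying assumption, and the computation of α from v and IOU follows the common CIoU formulation rather than anything stated in the patent:

```python
import math

def ciou_loss(box_a, box_b):
    """CIoU loss of formula (6) for two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection-over-union term.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # Squared distance between center points (rho^2).
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box (c^2).
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax1, bx1, ax2, bx2), max(ay1, by1, ay2, by2)
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    # Aspect-ratio similarity v and its regulating factor alpha.
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1))
                              - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```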
The key point label comprises two parts, the key point coordinates and the width, so the key point loss function L_kp is calculated as:

L_kp = L_coord + β · L_width   (7)

where L_coord is the key point coordinate loss function, L_width is the key point width loss function, and β is the weight factor, set to 0.2 in this embodiment.
Considering that during training the number of negative samples of key point coordinates far exceeds the number of positive samples, a variant of the focal loss function is adopted as L_coord to address the imbalance between positive and negative samples:

L_coord = −(1/N) · Σ_{c=1}^{C} Σ_{i=1}^{H} Σ_{j=1}^{W} { (1 − Ŷ_cij)^α1 · log(Ŷ_cij),  if Y_cij = 1;  (1 − Y_cij)^α2 · (Ŷ_cij)^α1 · log(1 − Ŷ_cij),  otherwise }   (8)

where Ŷ_cij is the score of the key point at position (c, i, j) in the key point heat map regressed by the regression module, Y_cij is the corresponding coordinate point score of the real key point heat map obtained by applying a Gaussian function to the image with key point labels, N is the number of text instances in the image, C is the number of channels of the regressed key point heat map, H and W are respectively the height and width of the regressed key point heat map, and α1 and α2 are hyper-parameters controlling the contribution of each key point, whose values are set in this embodiment; the factor (1 − Y_cij)^α2 reduces the penalty on points around the key point coordinates.
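Formula (8) can be sketched over dense heat maps as below; the default hyper-parameter values are illustrative (the common CenterNet-style settings), not the embodiment's, and the normalizer uses the peak count as a stand-in for the number of text instances:

```python
import numpy as np

def keypoint_focal_loss(pred, gt, a1=2, a2=4):
    """Variant focal loss of formula (8) over predicted and Gaussian ground-truth
    heat maps; a1/a2 play the roles of the alpha1/alpha2 hyper-parameters."""
    eps = 1e-12
    pos = gt == 1
    # Positive locations: (1 - Y_hat)^a1 * log(Y_hat).
    pos_loss = ((1 - pred[pos]) ** a1 * np.log(pred[pos] + eps)).sum()
    # Negative locations: (1 - Y)^a2 * Y_hat^a1 * log(1 - Y_hat),
    # where (1 - Y)^a2 softens the penalty around true key points.
    neg = ~pos
    neg_loss = ((1 - gt[neg]) ** a2 * pred[neg] ** a1 * np.log(1 - pred[neg] + eps)).sum()
    n = max(int(pos.sum()), 1)  # stand-in for the number of text instances N
    return -(pos_loss + neg_loss) / n
```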
Since the width generated for each key point is random, the L1 loss function is adopted as L_width:

L_width = (1/N) · Σ_{k=1}^{N} | Y_k − Ŷ_k |   (9)

where N is the number of text instances in the image, |·| denotes the absolute value of the quantity in brackets, Y_k represents the coordinate point score of the real key point heat map obtained by applying a Gaussian function to the image with key point labels, and Ŷ_k is the score of the corresponding key point in the key point heat map regressed by the regression module.
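Formula (9) is a plain mean absolute error and can be sketched in a few lines; `width_l1_loss` is an illustrative name:

```python
def width_l1_loss(pred_widths, gt_widths):
    """L1 loss of formula (9): mean absolute error over the N regressed widths."""
    n = len(gt_widths)
    return sum(abs(g - p) for g, p in zip(gt_widths, pred_widths)) / n
```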
In order to accelerate model convergence, non-text region coordinate points are not considered in the regression of the key point coordinates, so that the number of negative samples is reduced.
After training the banner text detection network model by using training set data, putting the test set into the model, comparing the accuracy and the detection speed of text detection, and extracting an optimal detection model.
And 4, detecting the text in the banner image by using the trained banner text detection network model.
An image containing banner text is input into the banner text detection network model trained in step 3, and a banner text image with text boxes is obtained through feature extraction, feature fusion, regression and text box generation. The specific process is as follows: the banner text image is input into the banner text detection network model, and four feature images of different scales are obtained through the ResNet-50 + FPN feature extraction network; the four images are up-sampled by different factors so that their scales become identical, and the four feature images are fused into one fused feature image, which is up-sampled by a factor of four to the size of the original image; the fused feature image is then activated and mapped to obtain the key point heat maps, from which the key point coordinates and widths are obtained; finally, two groups of long-side coordinate points are calculated from the key point coordinates and width information, two Bezier curves are generated with the long-side coordinate points as control points, the closed curve frame obtained by connecting the two Bezier curves end to end is taken as the text box, and the banner text image with text box marks is output.
Example two
Based on the same inventive concept, the invention also provides a banner text detection system based on the Bezier curve and the key points, which comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions in the memory to execute the banner text detection method based on the Bezier curve and the key points.
In particular, the method according to the technical solution of the present invention may be implemented by those skilled in the art using computer software technology to implement an automatic operation flow, and a system apparatus for implementing the method, such as a computer readable storage medium storing a corresponding computer program according to the technical solution of the present invention, and a computer device including the operation of the corresponding computer program, should also fall within the protection scope of the present invention.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.
Claims (10)
1. A banner text detection method based on Bezier curves and key points is characterized by comprising the following steps:
Step 1, generating an initial text box of a text region according to labels of a public text data set, simplifying the number of long-side coordinates of the text box through a fixed threshold value, generating Bezier curves based on simplified long-side coordinate points, connecting two Bezier curves end to form a new text box, and converting the labels of the text box from boundary coordinate points of the text box to key point coordinates and the width of key points;
step 1.1, selecting images with long texts and distorted texts in a public text image dataset as a dataset, and generating an initial text box of a text region according to labels of the public text dataset;
step 1.2, judging the bending degree of the long edge of the text box by adopting a fixed threshold value method;
step 1.3, selectively simplifying coordinate points of two long sides of the text box according to the bending degree of the two long sides of the text box;
step 1.4, taking coordinate points on two simplified long sides as control points of a Bezier curve, generating two corresponding Bezier curves, and connecting the two Bezier curves end to obtain a real boundary frame of the text;
step 1.5, converting labels of the public data set from text box boundary coordinate points into key point coordinates and the width of the key points;
Step 2, constructing a banner text detection network model;
the banner text detection network model comprises a feature extraction module, a feature fusion module, a regression module and a text box generation module; the feature extraction module is used for extracting feature information of different layers of an input image to obtain feature images containing semantic information from a lower layer to a higher layer; the feature fusion module is used for combining the feature images of different layers to obtain a fused feature image which is used for detecting the banner text subsequently; the regression module comprises shape regression and key point regression, wherein the shape regression obtains a text shape binary image with text separated from a background based on the fusion feature image, and the key point regression obtains key point coordinates and width based on the fusion feature image; the text box generation module is used for generating a banner image text box based on the key point coordinates and the width information in the current image output by the regression module;
step 3, training the banner text detection network model constructed in the step 2 by utilizing the image obtained in the step 1 and the key point data set in the label;
the image obtained in step 1 and the key point data set in the labels are divided into a training set and a test set; the training set is input into the banner text detection network model for iterative training, the parameters of the banner text detection network model are updated to minimize the loss function, the accuracy of the model on the test set is recorded, and the optimal model is saved; the training process is divided into shape detection training and key point detection training, and the corresponding loss function L is calculated as follows:

L = L_shape + λ · L_kp   (5)

where L_shape is the shape loss function, L_kp is the key point loss function, and λ is the weight factor of the loss function;
the shape loss function L_shape is calculated as:

L_shape = 1 − IOU + ρ²(b, b_gt) / c² + α · v   (6)

where IOU represents the intersection-over-union of the regressed text outline shape and the text box generated from the key point labels; b and b_gt respectively represent the center point coordinates of the regressed text outline shape and of the text box generated from the key point labels: the center point coordinate of the regressed text outline shape is the center of its key points taken clockwise, and when the number of key points is even, the midpoint of the line connecting the two most central key points is selected; the center point coordinate of the text box generated from the key point labels is determined in the same way; ρ denotes the Euclidean distance between the two center points; c denotes the diagonal length of the minimum closure area that can contain both the regressed text outline shape and the text box generated from the key point labels; α is a regulating factor for balancing the weights between overlap area and aspect-ratio similarity; and v is an index measuring the similarity of the aspect ratios;
and 4, detecting the text in the banner image by using the trained banner text detection network model.
2. The method for detecting the banner text based on the Bezier curve and the key points as claimed in claim 1, wherein the method comprises the following steps: the labels of the public text data set in step 1.1 are a plurality of groups of coordinates arranged clockwise, each group of coordinates being the text box boundary point coordinates of the framed text; each group of coordinates is connected clockwise to form a closed polygon, giving the initial text box of the text; the number of boundary points of a data set image is set as n, the first n/2 points in order are selected as upper boundary points and the remaining n/2 points as lower boundary points, and the connecting line of the upper boundary points and the connecting line of the lower boundary points are taken as the two long sides of the initial text box.
3. The method for detecting the banner text based on the Bezier curve and the key points as claimed in claim 1, wherein the method comprises the following steps: in step 1.2, the bending degree of the two long sides of the text box is judged by comparing, through fixed thresholds, the length of the head-tail coordinate connecting line of each long side of a text box in the data set with the distances from the other coordinate points on the long side to that connecting line, namely:

S = { straight line, if 0 ≤ r < T1;  partially bent, if T1 ≤ r < T2;  completely bent, if r ≥ T2 }   (1)

where S indicates the bending degree of the long side of the text box, and r represents the ratio of the farthest distance from the coordinate points on the long side of a text box in the image data set to the head-tail coordinate connecting line of that long side, to the length of the connecting line; when r is greater than or equal to 0 and less than T1, the long side is judged to be a straight line; when r is greater than or equal to T1 and less than T2, the long side is judged to be partially bent; when r is greater than or equal to T2, the long side is judged to be completely bent; T1 and T2 are set thresholds.
4. A method for detecting a banner text based on bezier curves and keypoints as claimed in claim 3, wherein: in step 1.3, let the distance from a coordinate point on the long side to the line connecting the head and tail coordinate points be d, and let the head and tail coordinate points be P_s and P_e respectively; the specific simplification process is as follows: when the long side is judged to be a straight line, only the head and tail coordinate points of the long side are retained; when the long side is judged to be partially bent, the coordinate point farthest from the head-tail connecting line and the head and tail coordinate points are retained; when the long side is judged to be completely bent, a threshold D is set to 0.1 times the length of the head-tail connecting line, every coordinate point whose distance d exceeds D is retained, and the other coordinate points are discarded; let the coordinate point with the maximum d be P_max; P_max is used to divide the curve into the two parts [P_s, P_max] and [P_max, P_e], and the above operation is repeated until no coordinate point lies farther than D from its connecting line.
5. The method for detecting the banner text based on the Bezier curve and the key points as claimed in claim 1, wherein the method comprises the following steps: in step 1.4, the coordinate points on the simplified long sides are taken as control points of a Bezier curve, and the Bezier curve is expressed as a parametric curve based on Bernstein polynomials, specifically defined by the following formulas:

B(t) = Σ_{i=0}^{n} b_{i,n}(t) · P_i,  t ∈ [0, 1]   (2)

b_{i,n}(t) = C(n, i) · t^i · (1 − t)^{n−i}   (3)

where B(t) denotes the coordinate of a point on the Bezier curve, n denotes the Bezier curve order, P_i denotes the coordinates of the i-th control point, b_{i,n}(t) denotes the Bernstein polynomial of the i-th control point, C(n, i) denotes the binomial coefficient, and t denotes the time parameter corresponding to the points on the Bezier curve; since t^i or (1 − t)^{n−i} equals 0 when t is 0 or 1, the first coordinate point on the long side is selected as the position coordinate of the Bezier curve at time 0, and the last coordinate point on the long side as the position coordinate of the Bezier curve at time 1;
two Bezier curves are generated through the formula (2), and a closed polygon formed by connecting the two Bezier curves end to end is used as a real text box of the text example.
6. A method for detecting a banner text based on bezier curves and keypoints as claimed in claim 3, wherein: in step 1.5, the boundary points on the two long sides are converted into a group of key points to represent the text box; before the boundary points are converted into key points, an upward-compatible mode is adopted to make the numbers of boundary points on the upper and lower long sides of the text box consistent, as follows: when one of the upper and lower sides is a straight line and the other is partially bent, the midpoint of the straight side is extracted as an additional boundary point, so that both sides have three boundary points; when one side is a straight line and the other is completely bent, the straight side is divided equally according to the number of coordinate points of the completely bent side, and the equally divided coordinate points are extracted so that the numbers of boundary points of the two sides are consistent; when one side is partially bent and the other is completely bent, the two curve segments of the partially bent side are divided equally according to the number of coordinate points of the completely bent side minus the number of coordinate points of the partially bent side, and the equally divided coordinate points are extracted so that the numbers of boundary points of the two sides are consistent; after the numbers of upper and lower boundary points are unified by the above operation, the boundary points are converted: the coordinates of the upper and lower sides are put in one-to-one correspondence from head to tail, the midpoint of each pair of corresponding coordinate points is taken as a key point coordinate, and one half of the distance between the corresponding coordinate points is taken as the width of the key point; the labels in the public image text data set are thus converted from bounding box coordinate points into a group of key point coordinates and corresponding widths.
7. The method for detecting the banner text based on the Bezier curve and the key points as claimed in claim 1, wherein the method comprises the following steps: in step 2, the backbone network of the feature extraction module adopts the ResNet-50 model; after an image is input into the ResNet-50 model, four feature images C1, C2, C3, C4 are obtained in turn through channel-increase and down-sampling processing; the channel numbers of the four feature images of different scales obtained in the backbone network are processed uniformly to obtain P1, P2, P3, P4; then, starting from the lowest-scale feature map P4, an up-sampling process is performed and the result is added to the feature map of the same scale at the input end of the FPN structure, P3, to obtain the fused lower-scale feature image F3; F3 is up-sampled and added to P2 to obtain the fused low-scale feature image F2; likewise, F2 is up-sampled and added to P1 to obtain the fused feature image F1; finally, the fused feature images F1, F2, F3, F4 are taken as the output of the FPN;

the feature fusion module combines the fused feature images of different scales to obtain the combined fused feature image F, with the specific calculation process:

F = C(F1, F2↑2, F3↑4, F4↑8)   (4)

where C denotes channel concatenation, ↑2, ↑4 and ↑8 denote up-sampling by factors of 2, 4 and 8 respectively, and F1, F2, F3, F4 are the fused feature images;

the fused feature image F is up-sampled so that F has the same size as the original image.
8. The method for detecting banner text based on Bezier curves and key points as claimed in claim 7, wherein: the regression module in step 2 comprises two parts, shape regression and key-point regression; the shape regression converts the fused feature map F into a text-shape feature map through a convolution layer with an activation function, and a threshold t is set to binarize the feature map, regions above the threshold t being text regions and regions below the threshold t being background regions, so as to obtain a text-shape binary image in which the text is separated from the background; the text contour shape in the binary image is compared with the text-box shape generated from the key-point labels of the image, and the two shapes are matched by their intersection over union (IoU); the input of the key-point regression is the fused feature map F and its output is the key-point coordinates and widths, produced by two branches: one branch outputs K key-point heat maps, K being the maximum number of key points in a text instance of the detected image, and the K highest-scoring highlighted coordinate points in the key-point heat maps are taken as the key-point coordinates of the key points of the N text instances of the image, N being the number of text instances in the detected image; when the number of key points in an instance is insufficient, the number of highlighted coordinates is reduced accordingly; the other branch outputs the detected widths, one per key point, and when the number of key points of a text instance is insufficient the remaining width entries are 0; the text-box generation module takes the key-point coordinates and width information output by the regression module as text-instance information and uses this information to generate the text box; the width of a key point is the distance from the key point to its corresponding long-side coordinate point, and the line from a key point to its long-side coordinate point is normal to the line connecting the two adjacent key points, so each key point is extended upwards and downwards, perpendicular to that line, by its width, the end-point coordinates being the long-side coordinate points; processing every key point in this way yields two groups of long-side coordinate points equal in number to the key points; taking the long-side coordinate points as control points of Bezier curves, two Bezier curves are generated and connected end to end to obtain a completely closed curved frame, which is the text box of the text instance; finally, the image with the framed text is output, realizing text detection for the banner image.
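The text-box construction described in this claim — offsetting each key point along the local normal by its width to obtain the two groups of long-side control points, then fitting a Bezier curve to each group and joining the curves end to end — can be sketched as follows. The function names, the use of `np.gradient` to approximate the adjacent-key-point direction, and the sample count are illustrative assumptions:

```python
import numpy as np
from math import comb

def bezier(control_pts: np.ndarray, n_samples: int = 20) -> np.ndarray:
    """Evaluate a Bezier curve of arbitrary degree from its control points
    using the Bernstein polynomial form."""
    n = len(control_pts) - 1
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    basis = [comb(n, i) * (1 - t) ** (n - i) * t ** i for i in range(n + 1)]
    return sum(b * p for b, p in zip(basis, control_pts))

def text_box(keypoints: np.ndarray, widths: np.ndarray) -> np.ndarray:
    """Build a closed curved text box: offset each key point by its width
    along the local normal (perpendicular to the line through the adjacent
    key points) to get the two groups of long-side control points, fit a
    Bezier curve to each group, and join the two curves end to end."""
    # Tangent at each key point, approximated from neighbouring key points.
    tangents = np.gradient(keypoints, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)
    upper = keypoints + normals * widths[:, None]   # one long side
    lower = keypoints - normals * widths[:, None]   # the other long side
    top = bezier(upper)
    bottom = bezier(lower[::-1])                    # reversed to close the loop
    return np.vstack([top, bottom])                 # closed polygon

# Four key points on a gently curved centre line, constant width 5.
kps = np.array([[0.0, 0.0], [10.0, 3.0], [20.0, 3.0], [30.0, 0.0]])
box = text_box(kps, np.full(4, 5.0))
```

Note that each curve here has as many control points as there are key points (degree K − 1), matching the claim's requirement that the two groups of long-side coordinate points equal the key points in number.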
9. The method for detecting banner text based on Bezier curves and key points as claimed in claim 1, wherein: the key-point loss function L_kp in step 3 comprises two parts, key-point coordinates and width, and the specific calculation formulas are as follows:
L_kp = L_coord + λ · L_width    (7)

L_coord = −(1/N) Σ_{c=1}^{C} Σ_{x=1}^{W} Σ_{y=1}^{H} { (1 − P_xyc)^α · log(P_xyc)  if Y_xyc = 1;  (1 − Y_xyc)^β · (P_xyc)^α · log(1 − P_xyc)  otherwise }    (8)

L_width = (1/N) Σ | w − w* |    (9)
wherein L_coord is the key-point coordinate loss function, L_width is the key-point width loss function, λ is a weight factor, N is the number of text instances in the image, C denotes the number of channels of the regressed key-point heat map, H and W denote the height and width of the regressed key-point heat map respectively, P_xyc is the score of key point (x, y, c) in the heat map regressed by the regression module, Y_xyc denotes the coordinate-point score of the real key-point heat map obtained by applying a Gaussian function to the image with key-point labels, α and β are hyper-parameters controlling the contribution of each key point, the term (1 − Y_xyc)^β reduces the penalty for points around the key-point coordinates, and | · | denotes the absolute value of the quantity in the brackets;
when the key-point coordinates are regressed, coordinates in non-text regions are not considered, which reduces the number of negative samples; after training the banner text detection network model with the training-set data, the test set is fed into the model, the text-detection accuracy and detection speed are compared, and the optimal detection model is selected.
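Since the loss formulas of claim 9 are reproduced only as images in the original, the sketch below implements one plausible reading consistent with the variable roles described there: a penalty-reduced focal loss on the key-point heat maps (in the style of CenterNet) plus an L1 width term weighted by λ. The function name, the default α = 2 and β = 4, and the use of the key-point count as the normaliser are assumptions:

```python
import numpy as np

def keypoint_loss(pred, gt, pred_w, gt_w, alpha=2.0, beta=4.0, lam=1.0):
    """Sketch of a penalty-reduced focal loss over key-point heat maps plus
    an L1 width loss. pred and gt are (C, H, W) arrays; gt holds
    Gaussian-smoothed labels with value 1 exactly at key-point centres."""
    pos = gt == 1
    n = max(1, int(pos.sum()))  # normaliser: key-point count (an assumption)
    eps = 1e-12                 # numerical guard for log
    pos_loss = ((1 - pred[pos]) ** alpha * np.log(pred[pos] + eps)).sum()
    neg_loss = ((1 - gt[~pos]) ** beta * pred[~pos] ** alpha
                * np.log(1 - pred[~pos] + eps)).sum()
    l_coord = -(pos_loss + neg_loss) / n
    l_width = np.abs(pred_w - gt_w).mean()   # | · | absolute-value width term
    return l_coord + lam * l_width
```

With a perfect heat-map and width prediction this loss is 0, and any deviation in either branch increases it; the (1 − gt)^β factor is what softens the penalty for near-miss points around each labelled key-point centre.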
10. A Bezier curve and key point based banner text detection system, comprising a processor and a memory, the memory being configured to store program instructions and the processor being configured to invoke the instructions stored in the memory to perform the Bezier curve and key point based banner text detection method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310714974.5A CN116453133B (en) | 2023-06-16 | 2023-06-16 | Banner curve and key point-based banner text detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116453133A CN116453133A (en) | 2023-07-18 |
CN116453133B true CN116453133B (en) | 2023-09-05 |
Family
ID=87132471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310714974.5A Active CN116453133B (en) | 2023-06-16 | 2023-06-16 | Banner curve and key point-based banner text detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116453133B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564639A (en) * | 2018-04-27 | 2018-09-21 | 广州视源电子科技股份有限公司 | Handwriting storage method and device, intelligent interaction equipment and readable storage medium |
CN111414915A (en) * | 2020-02-21 | 2020-07-14 | 华为技术有限公司 | Character recognition method and related equipment |
CN112183322A (en) * | 2020-09-27 | 2021-01-05 | 成都数之联科技有限公司 | Text detection and correction method for any shape |
CN113537187A (en) * | 2021-01-06 | 2021-10-22 | 腾讯科技(深圳)有限公司 | Text recognition method and device, electronic equipment and readable storage medium |
CN114898379A (en) * | 2022-05-10 | 2022-08-12 | 度小满科技(北京)有限公司 | Method, device and equipment for recognizing curved text and storage medium |
CN115731539A (en) * | 2022-11-16 | 2023-03-03 | 武汉电信实业有限责任公司 | Video banner text detection method and system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8457403B2 (en) * | 2011-05-19 | 2013-06-04 | Seiko Epson Corporation | Method of detecting and correcting digital images of books in the book spine area |
US10289924B2 (en) * | 2011-10-17 | 2019-05-14 | Sharp Laboratories Of America, Inc. | System and method for scanned document correction |
US9230514B1 (en) * | 2012-06-20 | 2016-01-05 | Amazon Technologies, Inc. | Simulating variances in human writing with digital typography |
WO2021113346A1 (en) * | 2019-12-03 | 2021-06-10 | Nvidia Corporation | Landmark detection using curve fitting for autonomous driving applications |
- 2023-06-16 CN CN202310714974.5A patent/CN116453133B/en active Active
Non-Patent Citations (1)
Title |
---|
Yuliang Liu et al. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, pp. 9806-9815. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||