CN112149620A - Method for constructing natural scene character region detection model based on no anchor point - Google Patents
- Publication number: CN112149620A
- Application number: CN202011098722.7A
- Authority: CN (China)
- Prior art keywords: feature map, loss, feature, centrality, input
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V 30/40: Document-oriented image-based pattern recognition (G06V 30/00: Character recognition)
- G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N 3/045: Combinations of networks (neural network architectures)
- G06N 3/08: Learning methods for neural networks
- G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners, strokes; connectivity analysis
- G06V 30/10: Character recognition
Abstract
The invention discloses a method for constructing a natural scene character region detection model based on no anchor point. The method uses pixel-based detection and introduces a convolution branch that predicts the tilt angle of the bounding box, so that tilted text in natural scenes can be detected. Deformable convolution (DCN) is added to several layers of the network backbone, improving the network's ability to express the specific features of a text instance and making the receptive field more flexible with respect to the shape of the text target. An attention module is introduced into the network to filter the extracted features, enhancing positive information and suppressing interference. The classification Loss, regression Loss CIoU Loss, centrality Loss and angle Loss are used as a joint Loss function, which improves detection accuracy, makes the regression of the target box more stable, and achieves faster convergence.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a method for constructing a natural scene character region detection model based on no anchor point.
Background
Character region detection is an active research topic in the field of computer vision. Its goal is to locate the text in a natural scene image so that it can be further recognized, converting the image into machine-readable character information. Text in natural scene images varies greatly in font, layout and character size, and the images themselves vary greatly in illumination intensity, resolution, noise and shooting angle; these complicating factors make character region detection in natural scenes considerably more difficult.
A common approach to natural scene character region detection is bounding-box regression, which treats text as a class of target and directly predicts its bounding box as the detection result. Bounding-box regression methods divide into two-stage and one-stage methods. A two-stage method first generates a series of candidate boxes as samples and then classifies the samples with a convolutional neural network; a one-stage method converts target-box localization directly into a regression problem, with no candidate-box generation. In general, two-stage methods are more accurate, while one-stage methods are faster.
Given these trade-offs, one-stage methods are commonly used when real-time performance matters, for example character region detection in autonomous driving, where recognition time must be short. The FCOS algorithm (Tian Z, Shen C, Chen H, et al., "FCOS: Fully Convolutional One-Stage Object Detection") is an anchor-free one-stage target detection algorithm: it discards the anchor-box mechanism and introduces three strategies based on per-pixel regression, multi-scale features, and center-ness prediction, and without anchor boxes it can outperform many mainstream anchor-based target detection algorithms. However, the FCOS algorithm still suffers from low accuracy on this task.
Disclosure of Invention
The invention provides a method for constructing a natural scene character region detection model based on no anchor point, which aims to solve the low accuracy of existing anchor-free natural scene character region detection.
The invention provides a method for constructing a natural scene character region detection model without anchor points, which comprises the following steps:
S100, collecting a data set of natural scene images containing text, wherein the data set comprises a training image set T_train and a test image set T_test;
Step S200, inputting the natural image as Input to a Feature extraction network to generate a Feature pyramid consisting of multi-scale Input Feature maps, wherein the Feature extraction network comprises a deformable convolution DCN;
step S300, feeding the Feature pyramid into the Attention Module Attention, which filters the Input Feature Map of each pyramid level (head) to generate a Refined Feature Map, wherein the Attention Module Attention comprises a Channel Attention Module and a Spatial Attention Module;
step S400, transmitting the Refined Feature Map into an output layer comprising three Convolution branches to generate Feature maps, wherein the Feature maps comprise a Classification Feature Map, a Center-ness Feature Map, a Regression Feature Map and an Angle Feature Map,
among the three Convolution branches, the first Convolution branch is responsible for the classification task and the centrality prediction task, the second for the regression of the bounding box, and the third for predicting the tilt angle of the bounding box;
step S500, inputting the training images of the training image set T_train into step S200 and obtaining, through steps S200, S300 and S400, the characteristic feature map corresponding to each training image,
training with a joint loss function on each training image's labeled target-box centrality, target-box regression coordinates and target-box text tilt angle together with the corresponding characteristic feature map, to obtain an anchor-free natural scene character region detection model;
step S600, inputting the detection images of the test image set T_test into the anchor-free natural scene character region detection model to obtain the text detection region in each detection image.
Optionally, step S200 comprises:
step S210, transmitting the natural images to the feature extraction network, wherein the third layer C3, fourth layer C4 and fifth layer C5 of the ResNet backbone generate the corresponding input feature maps P3, P4 and P5;
step S220, applying two further convolution layers to the input feature map P5 generated at the fifth layer to produce two new input feature maps P6 and P7, yielding a feature pyramid composed of five input feature maps of different sizes.
Optionally, step S300 comprises:
step S310, compressing the Input Feature Map of the Feature pyramid along the spatial dimension using max pooling (MaxPool) and average pooling (AvgPool) to generate two different spatial context descriptors; inputting the two descriptors into a shared network consisting of a multi-layer perceptron (MLP) with one hidden layer, which generates a corresponding sub-channel attention map for each descriptor; merging the two sub-channel attention maps to generate an attention weight map; and taking the element-wise product of the attention weight map and the Input Feature Map to generate the Channel Refined Feature Map;
step S320, applying MaxPool and AvgPool to the Channel Refined Feature Map along its channel axis and concatenating the resulting maps into a feature descriptor; applying a convolution layer Conv to the feature descriptor to generate a Spatial Attention Map; and taking the element-wise product of the Spatial Attention Map and the Channel Refined Feature Map to generate the Spatial Attention Refined Feature Map.
Optionally, the classification Loss, regression Loss CIoU Loss, centrality Loss and angle Loss are used as a joint Loss function, and the calculation formula of the joint Loss function is:

$$L = \frac{1}{N_{pos}}\sum_{x,y} L_{cls}\big(p_{x,y}, c^{*}_{x,y}\big) + \frac{1}{N_{pos}}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}}\Big(L_{reg} + L_{\theta} + L_{ces}\Big)$$

wherein $L_{cls}$, $L_{reg}$, $L_{\theta}$ and $L_{ces}$ are respectively the classification loss, regression loss, angle loss and centrality loss, $N_{pos}$ denotes the number of positive samples, and $\mathbb{1}_{\{c^{*}_{x,y}>0\}}$ is an indicator function whose value is 1 when the position is classified as text and 0 otherwise;
specifically, the classification loss is the focal loss used by the FCOS baseline:

$$L_{cls}(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\log(p_t)$$

where $p_t$ is the predicted probability of the true class and $\alpha_t$, $\gamma$ are the focal-loss balancing parameters;

the regression Loss CIoU Loss is:

$$L_{reg} = 1 - IoU + \frac{\rho^{2}\big(b, b^{gt}\big)}{d^{2}} + \alpha v$$

wherein $b$ and $b^{gt}$ respectively represent the centre points of the prediction box and the target box, $\rho(\cdot)$ is the Euclidean distance between the two centre points, $d$ is the diagonal length of the smallest box enclosing both boxes, $\alpha$ is a coefficient that balances the aspect-ratio term, and $v$ measures the aspect-ratio consistency of the prediction box and the target box;
the angle loss function is:

$$L_{\theta}(\theta, \theta^{*}) = 1 - \cos(\theta - \theta^{*})$$

where $\theta$ represents the predicted tilt angle and $\theta^{*}$ represents the tilt angle of the target-box text.
The centrality loss is:

$$L_{ces}(c, c^{*}) = -\big(c^{*}\log(c) + (1 - c^{*})\log(1 - c)\big)$$

wherein $c$ and $c^{*}$ are respectively the predicted centrality and the centrality of the target box.
Optionally, step S600 comprises:
step S610, inputting the detection images of the test image set T_test into the anchor-free natural scene character region detection model to obtain the characteristic Feature Map corresponding to each detection image; for each point, the Regression Feature Map gives the distances from the corresponding pixel in the detection image to the four sides of a prediction box and the Angle Feature Map gives its tilt angle, which together generate the prediction box;
obtaining the point's preliminary Classification score and centrality score from the Classification Feature Map and the Center-ness Feature Map, and multiplying the preliminary Classification score by the centrality score to obtain the final Classification score;
step S620, filtering the prediction boxes with the non-maximum suppression algorithm NMS using the final Classification scores, to obtain the text regions in the detection image.
The invention provides a method for constructing a natural scene character region detection model based on no anchor point, which adds, on top of an anchor-free algorithm, a convolution branch that predicts the tilt angle of the bounding box, so that tilted text in natural scenes can be detected. Deformable convolution DCN is added to several layers of the network backbone, improving the network's ability to express the specific features of a text instance and making the receptive field more flexible with respect to the shape of the text target. An attention module is introduced into the network to filter the extracted features, enhancing positive information and suppressing interference. The classification Loss, regression Loss CIoU Loss, centrality Loss and angle Loss are used as a joint Loss function, which improves detection accuracy, makes the regression of the target box more stable, and achieves faster convergence.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings used in the embodiments are briefly described below; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flow chart of the method for constructing a natural scene character region detection model based on no anchor point according to the present invention;
FIG. 2 is a network structure diagram of the method for constructing a natural scene character region detection model based on no anchor point according to the present invention;
FIG. 3 is a network architecture diagram of the attention module of the present invention;
FIG. 4 is a network architecture diagram of a channel attention module according to the present invention;
FIG. 5 is a network architecture diagram of the spatial attention module of the present invention.
Detailed Description
The invention provides a method for constructing a natural scene character region detection model based on no anchor point. It is suited to applications with strict real-time requirements, maintaining a high detection speed while ensuring high accuracy.
Fig. 1 is a flowchart and fig. 2 a network structure diagram of the method for constructing a natural scene character region detection model based on no anchor point. As shown in fig. 1 and fig. 2, the method comprises:
Step S100, collecting a data set of natural scene images containing text, wherein the data set comprises a training image set T_train and a test image set T_test.
Step S200, inputting the natural image as Input to a Feature extraction network to generate a Feature pyramid consisting of multi-scale Input Feature maps, wherein the Feature extraction network comprises deformable convolution DCN (Deformable Convolutional Network).
The feature extraction network uses ResNet50 as its backbone, with deformable convolution DCN added so that the network is better suited to extracting text information. The network is built into a feature pyramid structure, a multi-scale strategy that allows the network to detect targets of various scales well.
In the present invention, step S200 specifically includes:
step S210, transmitting the natural images to the feature extraction network, wherein the third layer C3, fourth layer C4 and fifth layer C5 of the ResNet backbone generate the corresponding input feature maps P3, P4 and P5;
step S220, applying two further convolution layers to the input feature map P5 generated at the fifth layer to produce two new input feature maps P6 and P7, yielding a feature pyramid composed of five input feature maps of different sizes.
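For concreteness, the sizes of the five pyramid levels can be sketched from their strides. The FCOS-style strides of 8, 16, 32, 64 and 128 for P3 to P7 are an assumption here; the patent text does not state them explicitly:

```python
def pyramid_sizes(h, w, strides=(8, 16, 32, 64, 128)):
    """Spatial size of each pyramid level P3..P7 for an h x w input.

    P6 and P7 come from stride-2 convolutions stacked on P5, so each
    level halves the grid of the previous one.  Ceil-division keeps
    odd input sizes on a valid grid.
    """
    return [(-(-h // s), -(-w // s)) for s in strides]

# An 800 x 800 image yields a 100 x 100 grid at P3 down to 7 x 7 at P7.
levels = pyramid_sizes(800, 800)
```

Each grid cell at each level corresponds to one pixel location that later predicts a classification score, a centrality, four side distances and a tilt angle.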
Step S300, feeding the Feature pyramid into the Attention Module Attention, which filters the Input Feature Map of each pyramid level (head) to generate a Refined Feature Map, wherein the Attention Module Attention comprises a Channel Attention Module and a Spatial Attention Module, as shown in fig. 3.
In the present invention, step S300 specifically includes:
step S310, compressing the Input Feature Map of the Feature pyramid along the spatial dimension using max pooling (MaxPool) and average pooling (AvgPool) to generate two different spatial context descriptors; inputting the two descriptors into a shared network consisting of a multi-layer perceptron (MLP) with one hidden layer, which generates a corresponding sub-channel attention map for each descriptor; merging the two sub-channel attention maps to generate an attention weight map; and taking the element-wise product of the attention weight map and the Input Feature Map to generate the Channel Refined Feature Map, as shown in fig. 4;
step S320, applying MaxPool and AvgPool to the Channel Refined Feature Map along its channel axis and concatenating the resulting maps into a feature descriptor; applying a convolution layer Conv to the feature descriptor to generate a Spatial Attention Map; and taking the element-wise product of the Spatial Attention Map and the Channel Refined Feature Map to generate the Spatial Attention Refined Feature Map, as shown in fig. 5.
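The two attention operations of steps S310 and S320 follow the CBAM pattern. A minimal NumPy sketch is given below; the randomly initialized shared-MLP weights, the reduction ratio of 4, and the replacement of the learned convolution in the spatial branch by a simple average are all illustrative assumptions, not details fixed by the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Max- and avg-pooled channel descriptors pass
    through the shared MLP (w1, w2); the two sub-maps are merged by
    addition, squashed by a sigmoid, and reweight the channels."""
    c = x.shape[0]
    max_desc = x.reshape(c, -1).max(axis=1)
    avg_desc = x.reshape(c, -1).mean(axis=1)
    att = sigmoid(w2 @ np.maximum(w1 @ max_desc, 0.0)
                  + w2 @ np.maximum(w1 @ avg_desc, 0.0))
    return x * att[:, None, None]            # Channel Refined Feature Map

def spatial_attention(x):
    """Max- and avg-pool along the channel axis, stack the two maps,
    and collapse them to a per-pixel weight (a learned 7x7 convolution
    would normally sit before the sigmoid)."""
    desc = np.stack([x.max(axis=0), x.mean(axis=0)])   # (2, H, W)
    att = sigmoid(desc.mean(axis=0))                   # (H, W)
    return x * att[None, :, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
w1 = rng.standard_normal((4, 16))   # hidden layer, reduction ratio 4
w2 = rng.standard_normal((16, 4))
refined = spatial_attention(channel_attention(x, w1, w2))
```

The output keeps the input's shape; only the per-channel and per-pixel weighting changes, which is what lets the module suppress interference without altering the downstream heads.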
Step S400, transmitting the Refined Feature Map into an output layer comprising three Convolution branches to generate Feature maps, wherein the Feature maps comprise a Classification Feature Map, a Center-ness Feature Map, a Regression Feature Map and an Angle Feature Map;
among the three Convolution branches, the first Convolution branch is responsible for the classification task and the centrality prediction task, the second for the regression of the bounding box, and the third for predicting the tilt angle of the bounding box.
Compared with the FCOS algorithm, the invention adds a convolution branch that predicts the tilt angle of the bounding box, so that the algorithm can detect tilted text.
Step S500, inputting the training images of the training image set T_train into step S200 and obtaining, through steps S200, S300 and S400, the characteristic feature map corresponding to each training image,
and training with a joint loss function on each training image's labeled target-box centrality, target-box regression coordinates and target-box text tilt angle together with the corresponding characteristic feature map, to obtain an anchor-free natural scene character region detection model.
In order to improve detection accuracy, make the regression of the target box more stable and achieve faster convergence, the invention uses the classification Loss, regression Loss CIoU Loss, centrality Loss and angle Loss as a joint Loss function, and the calculation formula of the joint Loss function is:

$$L = \frac{1}{N_{pos}}\sum_{x,y} L_{cls}\big(p_{x,y}, c^{*}_{x,y}\big) + \frac{1}{N_{pos}}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}}\Big(L_{reg} + L_{\theta} + L_{ces}\Big)$$

wherein $L_{cls}$, $L_{reg}$, $L_{\theta}$ and $L_{ces}$ are respectively the classification loss, regression loss, angle loss and centrality loss, $N_{pos}$ denotes the number of positive samples, and $\mathbb{1}_{\{c^{*}_{x,y}>0\}}$ is an indicator function whose value is 1 when the position is classified as text and 0 otherwise.
Specifically, the classification loss is the focal loss used by the FCOS baseline:

$$L_{cls}(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\log(p_t)$$

where $p_t$ is the predicted probability of the true class and $\alpha_t$, $\gamma$ are the focal-loss balancing parameters.

The regression Loss CIoU Loss is:

$$L_{reg} = 1 - IoU + \frac{\rho^{2}\big(b, b^{gt}\big)}{d^{2}} + \alpha v$$

wherein $b$ and $b^{gt}$ respectively represent the centre points of the prediction box and the target box, $\rho(\cdot)$ is the Euclidean distance between the two centre points, $d$ is the diagonal length of the smallest box enclosing both boxes, $\alpha$ is a coefficient that balances the aspect-ratio term, and $v$ measures the aspect-ratio consistency of the prediction box and the target box;
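As an illustration, the CIoU regression loss can be computed for axis-aligned boxes as below; the tilt angle is deliberately left out, since the method handles it through the separate angle loss, and the function name and the (x0, y0, x1, y1) box format are assumptions of this sketch:

```python
import numpy as np

def ciou_loss(box, gt):
    """CIoU loss 1 - IoU + rho^2/d^2 + alpha*v for (x0, y0, x1, y1) boxes."""
    x0, y0, x1, y1 = box
    gx0, gy0, gx1, gy1 = gt
    # Plain IoU term.
    iw = max(0.0, min(x1, gx1) - max(x0, gx0))
    ih = max(0.0, min(y1, gy1) - max(y0, gy0))
    inter = iw * ih
    union = (x1 - x0) * (y1 - y0) + (gx1 - gx0) * (gy1 - gy0) - inter
    iou = inter / union
    # Squared centre-point distance rho^2 and squared diagonal d^2 of
    # the smallest enclosing box.
    rho2 = ((x0 + x1 - gx0 - gx1) / 2) ** 2 + ((y0 + y1 - gy0 - gy1) / 2) ** 2
    d2 = (max(x1, gx1) - min(x0, gx0)) ** 2 + (max(y1, gy1) - min(y0, gy0)) ** 2
    # Aspect-ratio consistency v and its balancing coefficient alpha.
    v = (4.0 / np.pi ** 2) * (np.arctan((gx1 - gx0) / (gy1 - gy0))
                              - np.arctan((x1 - x0) / (y1 - y0))) ** 2
    alpha = v / (1.0 - iou + v + 1e-9)
    return 1.0 - iou + rho2 / d2 + alpha * v
```

A perfectly matching box gives a loss of zero; shifting or reshaping the prediction increases every term that is affected, which is what makes the regression more stable than a plain IoU loss.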
The angle loss function is:

$$L_{\theta}(\theta, \theta^{*}) = 1 - \cos(\theta - \theta^{*})$$

where $\theta$ represents the predicted tilt angle and $\theta^{*}$ represents the tilt angle of the target-box text.
The centrality loss is:

$$L_{ces}(c, c^{*}) = -\big(c^{*}\log(c) + (1 - c^{*})\log(1 - c)\big)$$

wherein $c$ and $c^{*}$ are respectively the predicted centrality and the centrality of the target box.
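The angle and centrality losses translate directly into code. The centrality target formula below is the FCOS center-ness definition, which is an assumption here since the patent does not spell the target out:

```python
import numpy as np

def angle_loss(theta, theta_gt):
    """L_theta = 1 - cos(theta - theta_gt); zero when the angles agree."""
    return 1.0 - np.cos(theta - theta_gt)

def centrality_loss(c, c_gt, eps=1e-7):
    """Binary cross-entropy between predicted and target centrality."""
    c = min(max(c, eps), 1.0 - eps)
    return -(c_gt * np.log(c) + (1.0 - c_gt) * np.log(1.0 - c))

def centerness_target(l, t, r, b):
    """FCOS-style centrality target from the distances to the four
    sides: sqrt(min(l,r)/max(l,r) * min(t,b)/max(t,b))."""
    return np.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```

A point at the exact centre of its target box gets a target of 1, while points near the border get targets near 0, which later down-weights their boxes during NMS.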
Step S600, inputting the detection images of the test image set T_test into the anchor-free natural scene character region detection model to obtain the text detection region in each detection image.
In the present invention, step S600 specifically includes:
step S610, inputting the detection images of the test image set T_test into the anchor-free natural scene character region detection model to obtain the characteristic Feature Map corresponding to each detection image; for each point, the Regression Feature Map gives the distances from the corresponding pixel in the detection image to the four sides of a prediction box and the Angle Feature Map gives its tilt angle, which together generate the prediction box;
obtaining the point's preliminary Classification score and centrality score from the Classification Feature Map and the Center-ness Feature Map, and multiplying the preliminary Classification score by the centrality score to obtain the final Classification score.
Step S620, filtering the prediction boxes with the non-maximum suppression algorithm NMS using the final Classification scores, to obtain the text regions in the detection image. In the present invention, the NMS threshold is an overlap (IoU) of 0.6 between prediction boxes.
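Steps S610 and S620 can be sketched as follows for the axis-aligned case; the tilt angle would additionally rotate each decoded box about its centre, and that rotation (as well as a rotated-IoU computation) is omitted here to keep the sketch short:

```python
import numpy as np

def decode_box(px, py, l, t, r, b):
    """A pixel at (px, py) plus its predicted distances to the four
    sides yields the prediction box (x0, y0, x1, y1)."""
    return (px - l, py - t, px + r, py + b)

def iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, cls_scores, centrality, iou_thresh=0.6):
    """Final score = classification score x centrality; a box whose
    overlap with an already-kept box exceeds iou_thresh is suppressed."""
    scores = np.asarray(cls_scores) * np.asarray(centrality)
    keep = []
    for i in np.argsort(-scores):
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(int(i))
    return keep
```

Multiplying by centrality pushes down boxes decoded from off-centre pixels, so NMS prefers boxes predicted near the middle of a text instance.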
The invention provides a method for constructing a natural scene character region detection model based on no anchor point, which can detect tilted text in natural scenes by adding a convolution branch that predicts the tilt angle of the bounding box. Deformable convolution is added to several layers of the network backbone, improving the network's ability to express the specific features of a text instance and making the receptive field more flexible with respect to the shape of the text target. An attention module is introduced into the network to filter the extracted features, enhancing positive information and suppressing interference. The classification Loss, regression Loss CIoU Loss, centrality Loss and angle Loss are used as a joint Loss function, which improves detection accuracy, makes the regression of the target box more stable, and achieves faster convergence.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.
Claims (5)
1. A method for constructing a natural scene character region detection model based on no anchor point, characterized by comprising the following steps:
S100, collecting a data set of natural scene images containing text, wherein the data set comprises a training image set T_train and a test image set T_test;
Step S200, inputting the natural image as Input to a Feature extraction network to generate a Feature pyramid consisting of multi-scale Input Feature maps, wherein the Feature extraction network comprises a deformable convolution DCN;
step S300, feeding the Feature pyramid into the Attention Module Attention, which filters the Input Feature Map of each pyramid level (head) to generate a Refined Feature Map, wherein the Attention Module Attention comprises a Channel Attention Module and a Spatial Attention Module;
step S400, transmitting the Refined Feature Map into an output layer comprising three Convolution branches to generate Feature maps, wherein the Feature maps comprise a Classification Feature Map, a Center-ness Feature Map, a Regression Feature Map and an Angle Feature Map,
among the three Convolution branches, the first Convolution branch is responsible for the classification task and the centrality prediction task, the second for the regression of the bounding box, and the third for predicting the tilt angle of the bounding box;
step S500, inputting the training images of the training image set T_train into step S200 and obtaining, through steps S200, S300 and S400, the characteristic feature map corresponding to each training image,
training with a joint loss function on each training image's labeled target-box centrality, target-box regression coordinates and target-box text tilt angle together with the corresponding characteristic feature map, to obtain an anchor-free natural scene character region detection model;
step S600, inputting the detection images of the test image set T_test into the anchor-free natural scene character region detection model to obtain the text detection region in each detection image.
2. The method for constructing a natural scene character region detection model based on no anchor point as claimed in claim 1, wherein step S200 comprises:
step S210, transmitting the natural images to the feature extraction network, wherein the third layer C3, fourth layer C4 and fifth layer C5 of the ResNet backbone generate the corresponding input feature maps P3, P4 and P5;
step S220, applying two further convolution layers to the input feature map P5 generated at the fifth layer to produce two new input feature maps P6 and P7, yielding a feature pyramid composed of five input feature maps of different sizes.
3. The method for constructing a natural scene character region detection model based on no anchor point as claimed in claim 1, wherein step S300 comprises:
step S310, compressing the Input Feature Map of the Feature pyramid along the spatial dimension using max pooling (MaxPool) and average pooling (AvgPool) to generate two different spatial context descriptors; inputting the two descriptors into a shared network consisting of a multi-layer perceptron (MLP) with one hidden layer, which generates a corresponding sub-channel attention map for each descriptor; merging the two sub-channel attention maps to generate an attention weight map; and taking the element-wise product of the attention weight map and the Input Feature Map to generate the Channel Refined Feature Map;
step S320, applying MaxPool and AvgPool to the Channel Refined Feature Map along its channel axis and concatenating the resulting maps into a feature descriptor; applying a convolution layer Conv to the feature descriptor to generate a Spatial Attention Map; and taking the element-wise product of the Spatial Attention Map and the Channel Refined Feature Map to generate the Spatial Attention Refined Feature Map.
4. The method for constructing a natural scene character region detection model based on no anchor point as claimed in claim 1, wherein the classification Loss, regression Loss CIoU Loss, centrality Loss and angle Loss are used as a joint Loss function, and the calculation formula of the joint Loss function is:

$$L = \frac{1}{N_{pos}}\sum_{x,y} L_{cls}\big(p_{x,y}, c^{*}_{x,y}\big) + \frac{1}{N_{pos}}\sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y}>0\}}\Big(L_{reg} + L_{\theta} + L_{ces}\Big)$$

wherein $L_{cls}$, $L_{reg}$, $L_{\theta}$ and $L_{ces}$ are respectively the classification loss, regression loss, angle loss and centrality loss, $N_{pos}$ denotes the number of positive samples, and $\mathbb{1}_{\{c^{*}_{x,y}>0\}}$ is an indicator function whose value is 1 when the position is classified as text and 0 otherwise;

specifically, the classification loss is the focal loss used by the FCOS baseline:

$$L_{cls}(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\log(p_t)$$

where $p_t$ is the predicted probability of the true class and $\alpha_t$, $\gamma$ are the focal-loss balancing parameters;

the regression Loss CIoU Loss is:

$$L_{reg} = 1 - IoU + \frac{\rho^{2}\big(b, b^{gt}\big)}{d^{2}} + \alpha v$$

wherein $b$ and $b^{gt}$ respectively represent the centre points of the prediction box and the target box, $\rho(\cdot)$ is the Euclidean distance between the two centre points, $d$ is the diagonal length of the smallest box enclosing both boxes, $\alpha$ is a coefficient that balances the aspect-ratio term, and $v$ measures the aspect-ratio consistency of the prediction box and the target box;

the angle loss function is:

$$L_{\theta}(\theta, \theta^{*}) = 1 - \cos(\theta - \theta^{*})$$

where $\theta$ represents the predicted tilt angle and $\theta^{*}$ represents the tilt angle of the target-box text;

the centrality loss is:

$$L_{ces}(c, c^{*}) = -\big(c^{*}\log(c) + (1 - c^{*})\log(1 - c)\big)$$

wherein $c$ and $c^{*}$ are respectively the predicted centrality and the centrality of the target box.
5. The method for constructing the text region detection model based on the natural scene without anchor point as claimed in claim 1, wherein step S600 comprises,
step S610, for the anchor-free natural scene text detection model, taking the test image data set T_test as input to obtain the Feature Map corresponding to each detected image; for a point in the Regression Feature Map and the Angle Feature Map, generating the distances from the corresponding pixel in the detected image to the four borders of the prediction box, thereby producing the prediction box;

obtaining a preliminary classification score and a centrality score for the point from the Classification Feature Map and the Centrality Feature Map, and multiplying the preliminary classification score obtained from the Classification Feature Map by the centrality score to obtain the final classification score;
step S620, filtering the prediction boxes with the non-maximum suppression algorithm NMS and the final classification scores to obtain the text regions in the detected image.
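Step S620 can be sketched as below for axis-aligned boxes; the function `filter_boxes`, the thresholds, and the greedy NMS variant are illustrative assumptions (the claim fixes none of them, and the model's boxes may additionally carry an angle):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes
    inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * \
            max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + \
            (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def filter_boxes(boxes, cls_scores, centrality, iou_thr=0.5, score_thr=0.3):
    """Final score = preliminary classification score x centrality score,
    then greedy NMS keeps the highest-scoring non-overlapping boxes."""
    scores = np.asarray(cls_scores) * np.asarray(centrality)
    keep = []
    for i in np.argsort(-scores):          # visit boxes by descending score
        if scores[i] < score_thr:
            continue
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return [(boxes[i], float(scores[i])) for i in keep]

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
result = filter_boxes(boxes, [0.9, 0.8, 0.7], [0.9, 0.5, 0.9])
# keeps the first and third boxes; the second overlaps the first heavily
```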
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011098722.7A CN112149620A (en) | 2020-10-14 | 2020-10-14 | Method for constructing natural scene character region detection model based on no anchor point |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112149620A true CN112149620A (en) | 2020-12-29 |
Family
ID=73951780
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011098722.7A Pending CN112149620A (en) | 2020-10-14 | 2020-10-14 | Method for constructing natural scene character region detection model based on no anchor point |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112149620A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560857A (en) * | 2021-02-20 | 2021-03-26 | 鹏城实验室 | Character area boundary detection method, equipment, storage medium and device |
CN112926584A (en) * | 2021-05-11 | 2021-06-08 | 武汉珈鹰智能科技有限公司 | Crack detection method and device, computer equipment and storage medium |
CN112966690A (en) * | 2021-03-03 | 2021-06-15 | 中国科学院自动化研究所 | Scene character detection method based on anchor-free frame and suggestion frame |
CN113255906A (en) * | 2021-04-28 | 2021-08-13 | 中国第一汽车股份有限公司 | Method, device, terminal and storage medium for returning obstacle 3D angle information in automatic driving |
CN113435266A (en) * | 2021-06-09 | 2021-09-24 | 东莞理工学院 | FCOS intelligent target detection method based on extreme point feature enhancement |
CN113723563A (en) * | 2021-09-13 | 2021-11-30 | 中科南京人工智能创新研究院 | Vehicle detection algorithm based on FCOS improvement |
CN114022558A (en) * | 2022-01-05 | 2022-02-08 | 深圳思谋信息科技有限公司 | Image positioning method and device, computer equipment and storage medium |
CN114841244A (en) * | 2022-04-05 | 2022-08-02 | 西北工业大学 | Target detection method based on robust sampling and mixed attention pyramid |
CN114913110A (en) * | 2021-02-08 | 2022-08-16 | 深圳中科飞测科技股份有限公司 | Detection method and system, equipment and storage medium |
CN118711040A (en) * | 2024-08-29 | 2024-09-27 | 杭州久烁网络科技有限公司 | FCOS network optimization method and system based on feature fusion and attention mechanism |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117836A (en) * | 2018-07-05 | 2019-01-01 | 中国科学院信息工程研究所 | Text detection localization method and device under a kind of natural scene based on focal loss function |
US20200090506A1 (en) * | 2018-09-19 | 2020-03-19 | National Chung-Shan Institute Of Science And Technology | License plate recognition system and license plate recognition method |
CN111126472A (en) * | 2019-12-18 | 2020-05-08 | 南京信息工程大学 | Improved target detection method based on SSD |
WO2020097734A1 (en) * | 2018-11-15 | 2020-05-22 | Element Ai Inc. | Automatically predicting text in images |
CN111723798A (en) * | 2020-05-27 | 2020-09-29 | 西安交通大学 | Multi-instance natural scene text detection method based on relevance hierarchy residual errors |
Non-Patent Citations (4)
Title |
---|
JIFENG DAI et al.: "Deformable Convolutional Networks", arXiv, pages 1-12 |
SANGHYUN WOO et al.: "CBAM: Convolutional Block Attention Module", arXiv, pages 1-17 |
ZHI TIAN et al.: "FCOS: Fully Convolutional One-Stage Object Detection", arXiv, pages 1-13 |
LIU Jiyue: "Research on Real-Time Face Detection Methods Based on Lightweight Networks", China Masters' Theses Full-text Database: Information Science and Technology, no. 7, pages 1-84 |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114913110A (en) * | 2021-02-08 | 2022-08-16 | 深圳中科飞测科技股份有限公司 | Detection method and system, equipment and storage medium |
CN112560857A (en) * | 2021-02-20 | 2021-03-26 | 鹏城实验室 | Character area boundary detection method, equipment, storage medium and device |
CN112966690B (en) * | 2021-03-03 | 2023-01-13 | 中国科学院自动化研究所 | Scene character detection method based on anchor-free frame and suggestion frame |
CN112966690A (en) * | 2021-03-03 | 2021-06-15 | 中国科学院自动化研究所 | Scene character detection method based on anchor-free frame and suggestion frame |
CN113255906A (en) * | 2021-04-28 | 2021-08-13 | 中国第一汽车股份有限公司 | Method, device, terminal and storage medium for returning obstacle 3D angle information in automatic driving |
CN112926584B (en) * | 2021-05-11 | 2021-08-06 | 武汉珈鹰智能科技有限公司 | Crack detection method and device, computer equipment and storage medium |
CN112926584A (en) * | 2021-05-11 | 2021-06-08 | 武汉珈鹰智能科技有限公司 | Crack detection method and device, computer equipment and storage medium |
CN113435266A (en) * | 2021-06-09 | 2021-09-24 | 东莞理工学院 | FCOS intelligent target detection method based on extreme point feature enhancement |
CN113435266B (en) * | 2021-06-09 | 2023-09-01 | 东莞理工学院 | FCOS intelligent target detection method based on extremum point characteristic enhancement |
CN113723563A (en) * | 2021-09-13 | 2021-11-30 | 中科南京人工智能创新研究院 | Vehicle detection algorithm based on FCOS improvement |
CN114022558A (en) * | 2022-01-05 | 2022-02-08 | 深圳思谋信息科技有限公司 | Image positioning method and device, computer equipment and storage medium |
CN114841244A (en) * | 2022-04-05 | 2022-08-02 | 西北工业大学 | Target detection method based on robust sampling and mixed attention pyramid |
CN114841244B (en) * | 2022-04-05 | 2024-03-12 | 西北工业大学 | Target detection method based on robust sampling and mixed attention pyramid |
CN118711040A (en) * | 2024-08-29 | 2024-09-27 | 杭州久烁网络科技有限公司 | FCOS network optimization method and system based on feature fusion and attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112149620A (en) | Method for constructing natural scene character region detection model based on no anchor point | |
CN108961235B (en) | Defective insulator identification method based on YOLOv3 network and particle filter algorithm | |
CN110428432B (en) | Deep neural network algorithm for automatically segmenting colon gland image | |
JP6547069B2 (en) | Convolutional Neural Network with Subcategory Recognition Function for Object Detection | |
CN112150821B (en) | Lightweight vehicle detection model construction method, system and device | |
CN111222396B (en) | All-weather multispectral pedestrian detection method | |
CN114565860B (en) | Multi-dimensional reinforcement learning synthetic aperture radar image target detection method | |
CN112836713A (en) | Image anchor-frame-free detection-based mesoscale convection system identification and tracking method | |
CN113052200B (en) | Sonar image target detection method based on yolov3 network | |
CN111626993A (en) | Image automatic detection counting method and system based on embedded FEFnet network | |
JP2022025008A (en) | License plate recognition method based on text line recognition | |
CN113569724B (en) | Road extraction method and system based on attention mechanism and dilation convolution | |
CN116342894B (en) | GIS infrared feature recognition system and method based on improved YOLOv5 | |
CN113222824B (en) | Infrared image super-resolution and small target detection method | |
CN113033315A (en) | Rare earth mining high-resolution image identification and positioning method | |
CN112446292B (en) | 2D image salient object detection method and system | |
CN111860587A (en) | Method for detecting small target of picture | |
CN113052215A (en) | Sonar image automatic target identification method based on neural network visualization | |
CN113505634A (en) | Double-flow decoding cross-task interaction network optical remote sensing image salient target detection method | |
CN114821316A (en) | Three-dimensional ground penetrating radar crack disease identification method and system | |
CN114565824B (en) | Single-stage rotating ship detection method based on full convolution network | |
CN113609904B (en) | Single-target tracking algorithm based on dynamic global information modeling and twin network | |
CN114821346A (en) | Radar image intelligent identification method and system based on embedded platform | |
CN111476226B (en) | Text positioning method and device and model training method | |
CN117152508A (en) | Target detection method for decoupling positioning and classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20201229 |