CN112149620A - Method for constructing natural scene character region detection model based on no anchor point - Google Patents

Method for constructing natural scene character region detection model based on no anchor point Download PDF

Info

Publication number
CN112149620A
CN112149620A
Authority
CN
China
Prior art keywords
feature map
loss
feature
centrality
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011098722.7A
Other languages
Chinese (zh)
Inventor
徐亦飞
王冕
王爱臣
严汤文
王优
李斌
尉萍萍
肖志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Huiyichen Technology Co ltd
Original Assignee
Nanchang Huiyichen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Huiyichen Technology Co ltd filed Critical Nanchang Huiyichen Technology Co ltd
Priority to CN202011098722.7A priority Critical patent/CN112149620A/en
Publication of CN112149620A publication Critical patent/CN112149620A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Abstract

The invention discloses a method for constructing a natural scene character region detection model based on no anchor point. The method adopts pixel-based detection and introduces a convolution branch that predicts the inclination angle of the bounding box, so that inclined characters in natural scenes can be detected. Deformable convolution DCN is added to some layers of the network backbone, which improves the network's ability to express the specific features of a text instance and makes the receptive field more flexible with respect to the shape of the text target. An attention module is introduced into the network to filter the extracted features, enhancing useful information and suppressing interference. The classification loss, the CIoU regression loss, the centrality loss and the angle loss are used together as a joint loss function, which improves detection accuracy, makes the regression of the target box more stable, and achieves faster convergence.

Description

Method for constructing natural scene character region detection model based on no anchor point
Technical Field
The invention relates to the technical field of image processing, in particular to a method for constructing a natural scene character region detection model based on no anchor point.
Background
Character region detection is an active research topic in the field of computer vision. It aims to detect the positions of characters in a natural scene image for further recognition, so that the image can be converted into real character information that a computer can process. Characters in natural scene images generally differ greatly in font, layout and character size, and the images themselves vary greatly in illumination intensity, resolution, noise and shooting angle; these complicating factors greatly increase the difficulty of character region detection in natural scenes.
One commonly used method for natural scene character region detection is based on bounding-box regression, which generally treats text as a class of target and directly predicts its bounding box as the detection result. Bounding-box regression methods comprise two-stage methods and one-stage methods: the former first generates a series of candidate boxes as samples with an algorithm and then classifies the samples with a convolutional neural network; the latter generates no candidate boxes and converts the box localization problem directly into a regression problem. Generally, the former achieves higher accuracy, while the latter achieves higher speed.
Given the characteristics of the two methods, the one-stage method is commonly used in situations with high real-time requirements; for example, character region detection in an autonomous driving scenario requires a short recognition time. The FCOS algorithm, proposed by Tian Z, Shen C, Chen H, et al. in "FCOS: Fully Convolutional One-Stage Object Detection", is an anchor-free one-stage object detection algorithm: it removes the anchor-box mechanism and introduces three strategies, namely pixel-based regression prediction, multi-scale features and centrality (Center-ness) prediction, and can ultimately outperform various mainstream anchor-based object detection algorithms without using anchor boxes. However, the FCOS algorithm still suffers from low accuracy.
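The center-ness strategy introduced by FCOS can be illustrated with a minimal sketch (Python; the function name and example values are illustrative only, not part of the invention): for a pixel at distances l, t, r, b from the left, top, right and bottom sides of its ground-truth box, the center-ness target is the square root of (min(l,r)/max(l,r)) times (min(t,b)/max(t,b)).

```python
import math

def centerness(l: float, t: float, r: float, b: float) -> float:
    """FCOS center-ness target for a pixel whose distances to the
    left/top/right/bottom sides of its ground-truth box are l, t, r, b.
    Equals 1.0 at the box centre and decays towards 0 near the sides."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

# A pixel at the exact centre of a box scores 1.0;
# a pixel close to one side scores much lower.
print(centerness(10, 10, 10, 10))              # 1.0
print(round(centerness(1, 10, 19, 10), 3))     # 0.229
```

Down-weighting classification scores by this quantity suppresses low-quality boxes predicted far from object centres.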
Disclosure of Invention
The invention provides a method for constructing a natural scene character region detection model based on no anchor point, aiming to solve the low accuracy of existing anchor-free natural scene character region detection.
The invention provides a method for constructing a natural scene character region detection model without anchor points, which comprises the following steps:
Step S100, collecting a data set of character images in natural scenes, wherein the data set comprises a training image set Ttrain and a detection image set Ttest;
Step S200, inputting the natural image as Input to a Feature extraction network to generate a Feature pyramid consisting of multi-scale Input Feature maps, wherein the Feature extraction network comprises a deformable convolution DCN;
step S300, passing the Feature pyramid into the Attention Module, which filters the Input Feature Maps through Head operations to generate Refined Feature Maps, wherein the Attention Module comprises a Channel Attention Module and a Spatial Attention Module;
step S400, passing the Refined Feature Map into an output layer comprising three Convolution branches to generate feature maps, namely a Classification Feature Map, a Center-ness Feature Map, a Regression Feature Map and an Angle Feature Map,
wherein, among the three Convolution branches, the first Convolution branch is responsible for the classification task and the centrality prediction task, the second Convolution branch is responsible for bounding-box regression, and the third Convolution branch is responsible for predicting the inclination angle of the bounding box;
step S500, inputting the training images of the training image set Ttrain into step S200, and obtaining the feature maps corresponding to each training image through steps S200, S300 and S400;
training, with a joint loss function, on the labelled centrality, regression coordinates and character inclination angle of each actual target box together with the corresponding feature maps, to obtain the anchor-free natural scene character region detection model;
step S600, inputting the detection images of the detection image set Ttest into the anchor-free natural scene character region detection model to obtain the character detection regions in the detection images.
Optionally, step S200, including,
step S210, transmitting the natural images into the feature extraction network, wherein the third layer C3, the fourth layer C4 and the fifth layer C5 of the ResNet network in the feature extraction network generate the corresponding input feature maps P3, P4 and P5;
step S220, applying two additional convolution layers on top of the input feature map P5 generated by the fifth layer to generate two new input feature maps P6 and P7, obtaining a feature pyramid composed of five input feature maps of different sizes.
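The sizes of the five pyramid levels can be illustrated with a short sketch (Python). The strides of 8, 16, 32, 64 and 128 for P3 to P7 are the usual FPN/FCOS convention and are an assumption here, since the patent does not state them:

```python
def pyramid_shapes(h, w, strides=(8, 16, 32, 64, 128)):
    """Spatial sizes of the five pyramid levels P3..P7 for an h x w input,
    assuming the conventional strides of 8..128 (P6 and P7 produced by
    extra stride-2 convolutions on top of P5)."""
    return {f"P{i + 3}": (h // s, w // s) for i, s in enumerate(strides)}

print(pyramid_shapes(800, 1024))
# P3 is (100, 128) ... P7 is (6, 8)
```

The coarser levels (P6, P7) let the detector handle large text instances while P3 keeps resolution for small ones.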
Optionally, step S300, including,
step S310, compressing the Input Feature Map in the Feature pyramid along the spatial dimensions using max pooling (MaxPool) and average pooling (AvgPool) operations to generate two different spatial context descriptors; feeding the two descriptors into a shared network, which consists of a multi-layer perceptron (MLP) with one hidden layer, to generate a corresponding sub-channel attention map for each descriptor; merging the two sub-channel attention maps to generate an attention weight map; and performing a dot-product operation between the attention weight map and the Input Feature Map to generate a Channel Refined Feature Map;
step S320, performing MaxPool and AvgPool operations on the Channel Refined Feature Map along its channel axis and concatenating the resulting maps to generate a feature descriptor; applying a convolution layer (Conv) to the feature descriptor to generate a Spatial Attention Map; and performing a dot-product operation between the Spatial Attention Map and the Channel Refined Feature Map to generate a Spatial Attention Refined Feature Map.
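A toy NumPy sketch of the two attention steps follows. It is illustrative only: the hidden-layer size is arbitrary, and the spatial module's convolution over the concatenated descriptor is replaced here by a simple average of the two pooled maps for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention: squeeze the C x H x W map with max- and
    avg-pooling over the spatial dims, push both descriptors through a
    shared two-layer MLP (weights w1, w2), sum, and gate the channels."""
    mx = x.max(axis=(1, 2))                      # (C,)
    av = x.mean(axis=(1, 2))                     # (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ mx, 0)
                  + w2 @ np.maximum(w1 @ av, 0)) # (C,)
    return x * att[:, None, None]                # channel-refined map

def spatial_attention(x):
    """Spatial attention: pool along the channel axis, then (here, for
    brevity) average the two maps in place of the usual convolution."""
    mx = x.max(axis=0)                           # (H, W)
    av = x.mean(axis=0)                          # (H, W)
    att = sigmoid((mx + av) / 2.0)               # stand-in for Conv
    return x * att[None, :, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))       # toy C x H x W feature map
w1 = rng.standard_normal((2, 8)) * 0.1   # hidden layer (reduction to 2)
w2 = rng.standard_normal((8, 2)) * 0.1
y = spatial_attention(channel_attention(x, w1, w2))
print(y.shape)                           # output keeps the input shape
```

Both gates only rescale the input, so the refined map has exactly the shape of the input map, as required for the downstream convolution branches.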
Optionally, the classification loss, the CIoU regression loss, the centrality loss and the angle loss are used together as a joint loss function, which is calculated as:
L = (1/N_pos) · Σ_{x,y} L_cls(p_{x,y}, c*_{x,y}) + (1/N_pos) · Σ_{x,y} 1{c*_{x,y} > 0} · ( L_reg + L_θ + L_ces )

where L_cls, L_reg, L_θ and L_ces are the classification loss, regression loss, angle loss and centrality loss respectively, N_pos denotes the number of positive samples, and 1{c*_{x,y} > 0} is an indicator function whose value is 1 when the position (x, y) is classified as text and 0 otherwise;
specifically, the classification loss is the focal loss:

L_cls(p_{x,y}, c*_{x,y}) = -α_t · (1 - p_t)^γ · log(p_t)

where p_t = p_{x,y} for a positive sample and p_t = 1 - p_{x,y} otherwise, and α_t and γ are the balancing and focusing parameters of the focal loss;
the regression Loss CIoU Loss is:

L_reg = 1 - IoU + ρ²(b, b^gt)/c² + αv

where b and b^gt denote the centre points of the prediction box and the target box respectively, ρ(·) is the Euclidean distance between the two centre points, c is the diagonal length of the smallest enclosing box covering both boxes, α is a coefficient balancing the aspect-ratio term, and v measures the aspect-ratio consistency of the prediction box and the target box;
the angle loss function is:
L_θ(θ, θ*) = 1 - cos(θ - θ*), where θ denotes the predicted inclination angle and θ* denotes the inclination angle of the target-box characters.
The centrality loss is:
L_ces(c, c*) = -[c* · log(c) + (1 - c*) · log(1 - c)], where c and c* denote the predicted centrality and the centrality of the target box respectively.
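The three analytic losses above can be checked numerically with a small sketch (Python, illustrative only). The CIoU term follows the standard formulation, with c taken as the diagonal of the smallest enclosing box and v as the usual arctan-based aspect-ratio term, which is an assumption consistent with the symbols defined in the text:

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss between axis-aligned boxes (x1, y1, x2, y2):
    1 - IoU + rho^2(b, b_gt)/c^2 + alpha * v."""
    (px1, py1, px2, py2), (gx1, gy1, gx2, gy2) = box_p, box_g
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / ((px2 - px1) * (py2 - py1)
                   + (gx2 - gx1) * (gy2 - gy1) - inter)
    # squared centre distance over squared enclosing-box diagonal
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2
            + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 \
        + (max(py2, gy2) - min(py1, gy1)) ** 2
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v

def angle_loss(theta, theta_gt):
    return 1 - math.cos(theta - theta_gt)

def centrality_loss(c, c_gt):
    return -(c_gt * math.log(c) + (1 - c_gt) * math.log(1 - c))

print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0 for a perfect match
print(angle_loss(0.5, 0.5))                        # 0.0 for a perfect angle
```

All three terms vanish when the prediction matches the target, so the joint loss is minimized exactly at the ground truth.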
Optionally, step S600, including,
step S610, inputting the detection images of the detection image set Ttest into the anchor-free natural scene character detection model to obtain the feature maps corresponding to each detection image; for each point of the Regression Feature Map and the Angle Feature Map, generating the distances from the corresponding pixel in the detection image to the four sides of a box, together with the inclination angle, thereby generating a prediction box;
obtaining a preliminary classification score and a centrality score for the point from the Classification Feature Map and the Center-ness Feature Map, and multiplying the preliminary classification score by the centrality score to obtain the final classification score;
and step S620, filtering the prediction boxes using the non-maximum suppression algorithm (NMS) and the final classification scores, to obtain the character regions in the detection image.
The invention provides a method for constructing a natural scene character region detection model based on no anchor point, which adds to an anchor-free algorithm a convolution branch for predicting the inclination angle of the bounding box, so that inclined characters in natural scenes can be detected. Deformable convolution DCN is added to some layers of the network backbone, improving the network's ability to express the specific features of a text instance and making the receptive field more flexible with respect to the shape of the text target. An attention module is introduced into the network to filter the extracted features, enhancing useful information and suppressing interference. The classification loss, the CIoU regression loss, the centrality loss and the angle loss are used as a joint loss function, which improves detection accuracy, makes the regression of the target box more stable, and achieves faster convergence.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings needed in the embodiments are briefly described below; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flow chart of a method for constructing a text region detection model based on a natural scene without anchor points according to the present invention;
FIG. 2 is a network structure diagram of the method for constructing a text region detection model based on a natural scene without anchor points according to the present invention;
FIG. 3 is a network architecture diagram of the attention module of the present invention;
FIG. 4 is a network architecture diagram of a channel attention module according to the present invention;
FIG. 5 is a network architecture diagram of the spatial attention module of the present invention.
Detailed Description
The invention provides a method for constructing a natural scene character region detection model based on no anchor point, which is applicable to situations with high real-time requirements and ensures high accuracy while maintaining a high detection speed.
Fig. 1 is a flowchart of a method for constructing a natural scene text region detection model based on no anchor point, fig. 2 is a network structure diagram of a method for constructing a natural scene text region detection model based on no anchor point, and as shown in fig. 1 and fig. 2, the method for constructing a natural scene text region detection model based on no anchor point comprises,
Step S100, collecting a data set of character images in natural scenes, wherein the data set comprises a training image set Ttrain and a detection image set Ttest.
Step S200, inputting the natural image as Input into a Feature extraction network to generate a Feature pyramid consisting of multi-scale Input Feature Maps, wherein the Feature extraction network includes deformable convolution DCN (Deformable Convolutional Network).
The feature extraction network uses ResNet50 as the backbone network, with deformable convolution DCN added so that the network is better suited to extracting text information. The network is organized into a feature pyramid structure, providing a multi-scale strategy so that targets of various scales can be detected well.
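The effect of deformable convolution can be illustrated with a single 3x3 tap (a NumPy sketch, not the actual DCN implementation): each kernel position is shifted by a learned fractional offset and the feature map is sampled bilinearly there, so the receptive field can bend to follow the shape of a text instance.

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear sample of a 2-D map at fractional coordinates (y, x)."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0] + (1 - dy) * dx * img[y0, x1]
            + dy * (1 - dx) * img[y1, x0] + dy * dx * img[y1, x1])

def deformable_3x3(img, cy, cx, offsets, weights):
    """One output position of a 3x3 deformable convolution: each of the
    nine taps is shifted by a learned (dy, dx) offset and sampled
    bilinearly before being weighted and summed."""
    out, k = 0.0, 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            out += weights[k] * bilinear(img, cy + ky + dy, cx + kx + dx)
            k += 1
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
zero_off = [(0.0, 0.0)] * 9          # learned offsets, here all zero
avg_w = [1.0 / 9] * 9                # averaging kernel for the demo
# with zero offsets this degenerates to an ordinary 3x3 convolution
print(deformable_3x3(img, 2, 2, zero_off, avg_w))
```

In a real DCN the offsets are themselves produced by a convolution over the input, so they vary per position and are trained end to end.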
In the present invention, step S200 specifically includes:
step S210, transmitting the natural images into the feature extraction network, wherein the third layer C3, the fourth layer C4 and the fifth layer C5 of the ResNet network in the feature extraction network generate the corresponding input feature maps P3, P4 and P5;
step S220, applying two additional convolution layers on top of the input feature map P5 generated by the fifth layer to generate two new input feature maps P6 and P7, obtaining a feature pyramid composed of five input feature maps of different sizes.
Step S300, the Feature pyramid is passed into the Attention Module, which filters the Input Feature Maps through Head operations to generate Refined Feature Maps, wherein the Attention Module comprises a Channel Attention Module and a Spatial Attention Module, as shown in fig. 3.
In the present invention, step S300 specifically includes:
step S310, compressing the Input Feature Map in the Feature pyramid along the spatial dimensions using max pooling (MaxPool) and average pooling (AvgPool) operations to generate two different spatial context descriptors; feeding the two descriptors into a shared network, which consists of a multi-layer perceptron (MLP) with one hidden layer, to generate a corresponding sub-channel attention map for each descriptor; merging the two sub-channel attention maps to generate an attention weight map; and performing a dot-product operation between the attention weight map and the Input Feature Map to generate a Channel Refined Feature Map, as shown in fig. 4;
step S320, performing MaxPool and AvgPool operations on the Channel Refined Feature Map along its channel axis and concatenating the resulting maps to generate a feature descriptor; applying a convolution layer (Conv) to the feature descriptor to generate a Spatial Attention Map; and performing a dot-product operation between the Spatial Attention Map and the Channel Refined Feature Map to generate a Spatial Attention Refined Feature Map, as shown in fig. 5.
Step S400, passing the Refined Feature Map into an output layer comprising three Convolution branches to generate feature maps, namely a Classification Feature Map, a Center-ness Feature Map, a Regression Feature Map and an Angle Feature Map;
among the three Convolution branches, the first Convolution branch is responsible for the classification task and the centrality prediction task, the second Convolution branch is responsible for bounding-box regression, and the third Convolution branch is responsible for predicting the inclination angle of the bounding box.
Compared with the FCOS algorithm, the invention adds a convolution branch for predicting the inclination angle of the bounding box, so that the algorithm can detect inclined text.
Step S500, the training images of the training image set Ttrain are input into step S200, and the feature maps corresponding to each training image are obtained through steps S200, S300 and S400;
the labelled centrality, regression coordinates and character inclination angle of each actual target box, together with the corresponding feature maps, are trained with a joint loss function to obtain the anchor-free natural scene character region detection model.
In order to improve detection accuracy, make the regression of the target box more stable and achieve faster convergence, the invention uses the classification loss, the CIoU regression loss, the centrality loss and the angle loss as a joint loss function, calculated as:
L = (1/N_pos) · Σ_{x,y} L_cls(p_{x,y}, c*_{x,y}) + (1/N_pos) · Σ_{x,y} 1{c*_{x,y} > 0} · ( L_reg + L_θ + L_ces )

where L_cls, L_reg, L_θ and L_ces are the classification loss, regression loss, angle loss and centrality loss respectively, N_pos denotes the number of positive samples, and 1{c*_{x,y} > 0} is an indicator function whose value is 1 when the position (x, y) is classified as text and 0 otherwise.
Specifically, the classification loss is the focal loss:

L_cls(p_{x,y}, c*_{x,y}) = -α_t · (1 - p_t)^γ · log(p_t)

where p_t = p_{x,y} for a positive sample and p_t = 1 - p_{x,y} otherwise, and α_t and γ are the balancing and focusing parameters of the focal loss;
the regression Loss CIoU Loss is:

L_reg = 1 - IoU + ρ²(b, b^gt)/c² + αv

where b and b^gt denote the centre points of the prediction box and the target box respectively, ρ(·) is the Euclidean distance between the two centre points, c is the diagonal length of the smallest enclosing box covering both boxes, α is a coefficient balancing the aspect-ratio term, and v measures the aspect-ratio consistency of the prediction box and the target box;
the angle loss function is:
L_θ(θ, θ*) = 1 - cos(θ - θ*), where θ denotes the predicted inclination angle and θ* denotes the inclination angle of the target-box characters.
The centrality loss is:
L_ces(c, c*) = -[c* · log(c) + (1 - c*) · log(1 - c)], where c and c* denote the predicted centrality and the centrality of the target box respectively.
Step S600, the detection images of the detection image set Ttest are input into the anchor-free natural scene character region detection model to obtain the character detection regions in the detection images.
In the present invention, step S600 specifically includes:
step S610, inputting the detection images of the detection image set Ttest into the anchor-free natural scene character detection model to obtain the feature maps corresponding to each detection image; for each point of the Regression Feature Map and the Angle Feature Map, generating the distances from the corresponding pixel in the detection image to the four sides of a box, together with the inclination angle, thereby generating a prediction box;
obtaining a preliminary classification score and a centrality score for the point from the Classification Feature Map and the Center-ness Feature Map, and multiplying the preliminary classification score by the centrality score to obtain the final classification score;
and step S620, filtering the prediction boxes using the non-maximum suppression algorithm (NMS) and the final classification scores, to obtain the character regions in the detection image. In the present invention, the NMS threshold is an overlap of 0.6 between prediction boxes.
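The greedy NMS filtering of step S620 can be sketched as follows (Python, illustrative only: boxes are simplified here to axis-aligned (x1, y1, x2, y2) tuples, whereas the invention's prediction boxes also carry an inclination angle; the 0.6 threshold matches the value stated above).

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.6):
    """Greedy NMS: keep the highest-scoring box, drop every remaining
    box whose overlap with it exceeds `thresh`, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]   # final scores: classification x centrality
print(nms(boxes, scores))  # [0, 2]: the near-duplicate box 1 is suppressed
```

Because the scores fed to NMS are the classification score multiplied by the centrality, boxes predicted far from a text centre are ranked low and tend to be suppressed first.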
The invention provides a method for constructing a natural scene character region detection model based on no anchor point, which can detect inclined characters in natural scenes by adding a convolution branch for predicting the inclination angle of the bounding box. Deformable convolution is added to some layers of the network backbone, improving the network's ability to express the specific features of a text instance and making the receptive field more flexible with respect to the shape of the text target. An attention module is introduced into the network to filter the extracted features, enhancing useful information and suppressing interference. The classification loss, the CIoU regression loss, the centrality loss and the angle loss are used as a joint loss function, which improves detection accuracy, makes the regression of the target box more stable, and achieves faster convergence.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims (5)

1. A construction method of a natural scene character region detection model based on no anchor point is characterized by comprising the following steps:
step S100, collecting a data set of character images in natural scenes, wherein the data set comprises a training image set Ttrain and a detection image set Ttest;
Step S200, inputting the natural image as Input to a Feature extraction network to generate a Feature pyramid consisting of multi-scale Input Feature maps, wherein the Feature extraction network comprises a deformable convolution DCN;
step S300, passing the Feature pyramid into the Attention Module, which filters the Input Feature Maps through Head operations to generate Refined Feature Maps, wherein the Attention Module comprises a Channel Attention Module and a Spatial Attention Module;
step S400, passing the Refined Feature Map into an output layer comprising three Convolution branches to generate feature maps, namely a Classification Feature Map, a Center-ness Feature Map, a Regression Feature Map and an Angle Feature Map,
wherein, among the three Convolution branches, the first Convolution branch is responsible for the classification task and the centrality prediction task, the second Convolution branch is responsible for bounding-box regression, and the third Convolution branch is responsible for predicting the inclination angle of the bounding box;
step S500, inputting the training images of the training image set Ttrain into step S200, and obtaining the feature maps corresponding to each training image through steps S200, S300 and S400;
training, with a joint loss function, on the labelled centrality, regression coordinates and character inclination angle of each actual target box together with the corresponding feature maps, to obtain the anchor-free natural scene character region detection model;
step S600, inputting the detection images of the detection image set Ttest into the anchor-free natural scene character region detection model to obtain the character detection regions in the detection images.
2. The method for constructing the text region detection model based on the natural scene without anchor point as claimed in claim 1, wherein step S200 comprises,
step S210, transmitting the natural images into the feature extraction network, wherein the third layer C3, the fourth layer C4 and the fifth layer C5 of the ResNet network in the feature extraction network generate the corresponding input feature maps P3, P4 and P5;
step S220, applying two additional convolution layers on top of the input feature map P5 generated by the fifth layer to generate two new input feature maps P6 and P7, obtaining a feature pyramid composed of five input feature maps of different sizes.
3. The method for constructing the text region detection model based on the natural scene without anchor point as claimed in claim 1, wherein step S300 comprises,
step S310, compressing the Input Feature Map in the Feature pyramid along the spatial dimensions using max pooling (MaxPool) and average pooling (AvgPool) operations to generate two different spatial context descriptors; feeding the two descriptors into a shared network, which consists of a multi-layer perceptron (MLP) with one hidden layer, to generate a corresponding sub-channel attention map for each descriptor; merging the two sub-channel attention maps to generate an attention weight map; and performing a dot-product operation between the attention weight map and the Input Feature Map to generate a Channel Refined Feature Map;
step S320, performing MaxPool and AvgPool operations on the Channel Refined Feature Map along its channel axis and concatenating the resulting maps to generate a feature descriptor; applying a convolution layer (Conv) to the feature descriptor to generate a Spatial Attention Map; and performing a dot-product operation between the Spatial Attention Map and the Channel Refined Feature Map to generate a Spatial Attention Refined Feature Map.
4. The method for constructing a natural scene character region detection model based on no anchor point according to claim 1, wherein a classification loss, a CIoU regression loss, a centrality loss and an angle loss are used as a joint loss function, which is calculated as:
L = (1/N_pos) · Σ_{x,y} L_cls(p_{x,y}, c*_{x,y}) + (1/N_pos) · Σ_{x,y} 1{c*_{x,y} > 0} · ( L_reg + L_θ + L_ces )

where L_cls, L_reg, L_θ and L_ces are the classification loss, regression loss, angle loss and centrality loss respectively, N_pos denotes the number of positive samples, and 1{c*_{x,y} > 0} is an indicator function whose value is 1 when the position (x, y) is classified as text and 0 otherwise;
specifically, the classification loss is the focal loss:

L_cls(p_{x,y}, c*_{x,y}) = -α_t · (1 - p_t)^γ · log(p_t)

where p_t = p_{x,y} for a positive sample and p_t = 1 - p_{x,y} otherwise, and α_t and γ are the balancing and focusing parameters of the focal loss;
the regression Loss CIoU Loss is:

L_reg = 1 - IoU + ρ²(b, b^gt)/c² + αv

where b and b^gt denote the centre points of the prediction box and the target box respectively, ρ(·) is the Euclidean distance between the two centre points, c is the diagonal length of the smallest enclosing box covering both boxes, α is a coefficient balancing the aspect-ratio term, and v measures the aspect-ratio consistency of the prediction box and the target box;
the angle loss function is:
L_θ(θ, θ*) = 1 - cos(θ - θ*), where θ denotes the predicted inclination angle and θ* denotes the inclination angle of the target-box characters.
The centrality loss is:
L_ces(c, c*) = -[c* · log(c) + (1 - c*) · log(1 - c)], where c and c* denote the predicted centrality and the centrality of the target box respectively.
5. The method for constructing the natural scene character region detection model based on no anchor point as claimed in claim 1, wherein step S600 comprises:
step S610, inputting the detection image data set Ttest into the anchor-free natural scene character detection model to obtain the Feature Maps corresponding to the detected image; for a given point, generating from the Regression Feature Map and the Angle Feature Map the distances from the corresponding pixel in the detected image to the four borders of the prediction box, together with its tilt angle, so as to generate the prediction box;
obtaining a preliminary classification score and a centrality score for the point from the Classification Feature Map and the Centrality Feature Map respectively, and multiplying the preliminary classification score by the centrality score to obtain the final classification score;
step S620, filtering the prediction boxes with the non-maximum suppression algorithm NMS according to the final classification scores, to obtain the character regions in the detected image.
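The score fusion and NMS filtering of steps S610 to S620 can be sketched as below; a plain axis-aligned NMS is shown for brevity, whereas the claim's inclined boxes would need a rotated-IoU variant (all names are illustrative):

```python
import numpy as np

def final_scores(cls_scores, centrality):
    """Final classification score = preliminary score x predicted centrality."""
    return cls_scores * centrality

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over axis-aligned boxes (x1, y1, x2, y2): keep the highest
    scoring box, suppress boxes overlapping it beyond iou_thresh, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```

Down-weighting the classification score by the predicted centrality (as in step S610) pushes low-quality boxes far from a text center below the NMS ranking of well-centered ones before filtering.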
CN202011098722.7A 2020-10-14 2020-10-14 Method for constructing natural scene character region detection model based on no anchor point Pending CN112149620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011098722.7A CN112149620A (en) 2020-10-14 2020-10-14 Method for constructing natural scene character region detection model based on no anchor point


Publications (1)

Publication Number Publication Date
CN112149620A true CN112149620A (en) 2020-12-29

Family

ID=73951780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011098722.7A Pending CN112149620A (en) 2020-10-14 2020-10-14 Method for constructing natural scene character region detection model based on no anchor point

Country Status (1)

Country Link
CN (1) CN112149620A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117836A (en) * 2018-07-05 2019-01-01 中国科学院信息工程研究所 Text detection localization method and device under a kind of natural scene based on focal loss function
US20200090506A1 (en) * 2018-09-19 2020-03-19 National Chung-Shan Institute Of Science And Technology License plate recognition system and license plate recognition method
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
WO2020097734A1 (en) * 2018-11-15 2020-05-22 Element Ai Inc. Automatically predicting text in images
CN111723798A (en) * 2020-05-27 2020-09-29 西安交通大学 Multi-instance natural scene text detection method based on relevance hierarchy residual errors


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIFENG DAI et al.: "Deformable Convolutional Networks", arXiv, pages 1-12 *
SANGHYUN WOO et al.: "CBAM: Convolutional Block Attention Module", arXiv, pages 1-17 *
ZHI TIAN et al.: "FCOS: Fully Convolutional One-Stage Object Detection", arXiv, pages 1-13 *
LIU JIYUE: "Research on real-time face detection methods based on lightweight networks", China Master's Theses Full-text Database: Information Science and Technology, no. 7, pages 1-84 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913110A (en) * 2021-02-08 2022-08-16 深圳中科飞测科技股份有限公司 Detection method and system, equipment and storage medium
CN112560857A (en) * 2021-02-20 2021-03-26 鹏城实验室 Character area boundary detection method, equipment, storage medium and device
CN112966690B (en) * 2021-03-03 2023-01-13 中国科学院自动化研究所 Scene character detection method based on anchor-free frame and suggestion frame
CN112966690A (en) * 2021-03-03 2021-06-15 中国科学院自动化研究所 Scene character detection method based on anchor-free frame and suggestion frame
CN113255906A (en) * 2021-04-28 2021-08-13 中国第一汽车股份有限公司 Method, device, terminal and storage medium for returning obstacle 3D angle information in automatic driving
CN112926584B (en) * 2021-05-11 2021-08-06 武汉珈鹰智能科技有限公司 Crack detection method and device, computer equipment and storage medium
CN112926584A (en) * 2021-05-11 2021-06-08 武汉珈鹰智能科技有限公司 Crack detection method and device, computer equipment and storage medium
CN113435266A (en) * 2021-06-09 2021-09-24 东莞理工学院 FCOS intelligent target detection method based on extreme point feature enhancement
CN113435266B (en) * 2021-06-09 2023-09-01 东莞理工学院 FCOS intelligent target detection method based on extremum point characteristic enhancement
CN113723563A (en) * 2021-09-13 2021-11-30 中科南京人工智能创新研究院 Vehicle detection algorithm based on FCOS improvement
CN114022558A (en) * 2022-01-05 2022-02-08 深圳思谋信息科技有限公司 Image positioning method and device, computer equipment and storage medium
CN114841244A (en) * 2022-04-05 2022-08-02 西北工业大学 Target detection method based on robust sampling and mixed attention pyramid
CN114841244B (en) * 2022-04-05 2024-03-12 西北工业大学 Target detection method based on robust sampling and mixed attention pyramid
CN118711040A (en) * 2024-08-29 2024-09-27 杭州久烁网络科技有限公司 FCOS network optimization method and system based on feature fusion and attention mechanism

Similar Documents

Publication Publication Date Title
CN112149620A (en) Method for constructing natural scene character region detection model based on no anchor point
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
JP6547069B2 (en) Convolutional Neural Network with Subcategory Recognition Function for Object Detection
CN112150821B (en) Lightweight vehicle detection model construction method, system and device
CN111222396B (en) All-weather multispectral pedestrian detection method
CN114565860B (en) Multi-dimensional reinforcement learning synthetic aperture radar image target detection method
CN112836713A (en) Image anchor-frame-free detection-based mesoscale convection system identification and tracking method
CN113052200B (en) Sonar image target detection method based on yolov3 network
CN111626993A (en) Image automatic detection counting method and system based on embedded FEFnet network
JP2022025008A (en) License plate recognition method based on text line recognition
CN113569724B (en) Road extraction method and system based on attention mechanism and dilation convolution
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN113222824B (en) Infrared image super-resolution and small target detection method
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
CN112446292B (en) 2D image salient object detection method and system
CN111860587A (en) Method for detecting small target of picture
CN113052215A (en) Sonar image automatic target identification method based on neural network visualization
CN113505634A (en) Double-flow decoding cross-task interaction network optical remote sensing image salient target detection method
CN114821316A (en) Three-dimensional ground penetrating radar crack disease identification method and system
CN114565824B (en) Single-stage rotating ship detection method based on full convolution network
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
CN114821346A (en) Radar image intelligent identification method and system based on embedded platform
CN111476226B (en) Text positioning method and device and model training method
CN117152508A (en) Target detection method for decoupling positioning and classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20201229)