CN110516669B - Multi-level and multi-scale fusion character detection method in complex environment - Google Patents

Multi-level and multi-scale fusion character detection method in complex environment

Info

Publication number
CN110516669B
CN110516669B · CN201910781042.6A
Authority
CN
China
Prior art keywords
features
substep
network
scale
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910781042.6A
Other languages
Chinese (zh)
Other versions
CN110516669A (en)
Inventor
袁媛
王琦
刘琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201910781042.6A priority Critical patent/CN110516669B/en
Publication of CN110516669A publication Critical patent/CN110516669A/en
Application granted granted Critical
Publication of CN110516669B publication Critical patent/CN110516669B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/243Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images

Abstract

The invention relates to a multi-level and multi-scale fusion character detection method in a complex environment. It effectively solves the problem of text localization and detection against complex backgrounds (variable illumination, variable contrast, and the like), trains quickly, and reaches a detection precision above 77%. For pictures containing text of various shapes and scales in natural scenes, the method is efficient, accurate, and simple.

Description

Multi-level and multi-scale fusion character detection method in complex environment
Technical Field
The invention belongs to the technical field of computer vision and graphics processing, and particularly relates to a multi-level and multi-scale fusion character detection method in a complex environment.
Background
Text detection in complex scenes plays an important role in intelligent transportation, bill recognition, and similar applications. Because the pictures to be examined are usually captured in complex real-world scenes, detection must cope with interference such as poor image quality, blurred fonts, curved text, low contrast, and diverse typefaces. Moreover, unlike general targets, words or lines of text vary greatly in shape (for example, in length), so text detection in a complex environment also faces large variations in target size.
To address these problems, two families of text detection methods have been built on top of general object detection. One detects the position of a target center point, proposes candidate boxes of various sizes and shapes around that point, and then selects the most appropriate candidate; the other skips candidate boxes entirely and directly regresses the coordinates.
Among candidate-box methods, the documents "J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue, 'Arbitrary-oriented scene text detection via rotation proposals,' IEEE Transactions on Multimedia, vol. 20, no. 11, pp. 3111-3122, 2018" propose a rotated region proposal network that generates candidate boxes with oblique angles to cope with the shape variations of text.
Among direct coordinate-regression methods, the documents "X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, 'EAST: An efficient and accurate scene text detector,' in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5551-5560" propose a simple and effective network that directly predicts text lines and removes the intermediate step of generating candidate boxes.
Both approaches have limitations: they struggle to exploit multi-scale feature information effectively in multi-scale text detection, and their large parameter counts and complex network structures limit their practicality.
Disclosure of Invention
The technical problem solved by the invention is as follows: to overcome the limitations of existing scene text detection algorithms on the problems above, the invention provides a multi-level and multi-scale fusion character detection method for complex environments.
The technical scheme of the invention is as follows: a multi-level and multi-scale fusion character detection method in a complex environment comprises the following steps:
Step one: a training phase comprising the following substeps:
Substep one: expand the image data of the labeled training pictures through a combination of three operations (rotation, flipping, and brightness change): extract 30% of all data, apply the three operations to obtain expanded image data, and merge it into the original image data to form a new enlarged image dataset for subsequent operations.
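For illustration, a minimal PyTorch-style sketch of this expansion, assuming PIL images and torchvision's functional transforms (transforming the ground-truth box annotations alongside the pixels is omitted here); the parameter ranges follow the further technical scheme given below:

```python
import random
import torchvision.transforms.functional as TF

def augment(img):
    # Rotate by a random angle drawn from [-90, 90] degrees.
    img = TF.rotate(img, random.uniform(-90.0, 90.0), expand=True)
    # Flip left-right, flip up-down, or keep unchanged, at random.
    mode = random.choice(["hflip", "vflip", "none"])
    if mode == "hflip":
        img = TF.hflip(img)
    elif mode == "vflip":
        img = TF.vflip(img)
    # Scale brightness by a random factor in [0.5, 1.5].
    return TF.adjust_brightness(img, random.uniform(0.5, 1.5))

def expand_dataset(images):
    # Draw 30% of the training pictures, augment them, and merge them back in.
    subset = random.sample(images, int(0.3 * len(images)))
    return images + [augment(im) for im in subset]
```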
Substep two: input the sample images of the enlarged image dataset obtained in substep one into a ResNeXt-101 network and extract the output features of its 8 layers 'conv1', 'conv2_1', 'conv3_1', 'conv3_4', 'conv4_3', 'conv4_12', 'conv4_20', and 'conv5_3'; these are depth features of different scales.
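The patent does not say how the eight taps are wired; a sketch of one way to collect them, using forward hooks on the torchvision ResNeXt-101 implementation, follows. The mapping from the patent's layer names to torchvision submodule paths is an assumption of this sketch.

```python
import torch
import torchvision

# Assumed correspondence between the patent's layer names and torchvision paths.
TAP_LAYERS = {
    "conv1": "conv1",        "conv2_1": "layer1.0",
    "conv3_1": "layer2.0",   "conv3_4": "layer2.3",
    "conv4_3": "layer3.2",   "conv4_12": "layer3.11",
    "conv4_20": "layer3.19", "conv5_3": "layer4.2",
}

backbone = torchvision.models.resnext101_32x8d(weights=None)  # or pretrained
features = {}

def make_hook(name):
    def hook(module, inputs, output):
        features[name] = output  # cache this layer's depth feature
    return hook

for name, path in TAP_LAYERS.items():
    backbone.get_submodule(path).register_forward_hook(make_hook(name))

x = torch.randn(1, 3, 512, 512)  # dummy input picture
backbone(x)                      # one forward pass fills `features`
```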
Substep three: fuse the 8 depth features of different scales obtained in substep two. First, pass each of the 8 features through its own transformation module, formed by cascading a convolution layer with a convolution kernel size of 1x1, a Batch Normalization layer, and a ReLU (rectified linear activation function) layer, to obtain 8 transformed features of different scales. Then apply bilinear upsampling to each transformed feature so that all are unified to the largest scale among them. Finally, stack the 8 scale-unified features along the channel dimension to form a multi-scale fusion feature.
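A sketch of one transformation module and of the fusion step; the output channel width (64) is an assumption, since the patent fixes only the 1x1 kernel size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformModule(nn.Module):
    """1x1 convolution -> Batch Normalization -> ReLU, as in substep three."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

def fuse(depth_feats, transforms):
    # Transform each depth feature, bilinearly upsample all of them to the
    # largest spatial scale, and stack along the channel dimension.
    transformed = [t(f) for t, f in zip(transforms, depth_feats)]
    h = max(f.shape[2] for f in transformed)
    w = max(f.shape[3] for f in transformed)
    unified = [F.interpolate(f, size=(h, w), mode="bilinear",
                             align_corners=False) for f in transformed]
    return torch.cat(unified, dim=1)  # the multi-scale fusion feature
```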
Substep four: pass the multi-scale fusion feature obtained in substep three through k decoding networks to obtain multi-level multi-scale features. Each decoding network consists of n convolution layers mirrored by n completely symmetric deconvolution layers; its output has the same spatial size as its input, so it can be concatenated with the multi-scale fusion feature, and the concatenated features are sent to the next decoding network. Each decoding network yields 2n features at n scales; the 2nk features output by the k decoding networks are grouped by scale, features of the same scale are concatenated along the channel dimension, and the result is n multi-level features at n different scales.
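One possible realization of a single decoding network; stride-2 convolutions mirrored by stride-2 transposed convolutions with ReLU activations are assumptions (the patent specifies only the symmetric n-conv/n-deconv structure), and the input height and width are assumed divisible by 2^n so the output recovers the original size:

```python
import torch.nn as nn
import torch.nn.functional as F

class DecodingNetwork(nn.Module):
    """n stride-2 convolutions mirrored by n stride-2 transposed convolutions
    (n = 3 in the embodiment); every intermediate map is kept so that features
    of the same scale can later be merged across decoders."""
    def __init__(self, ch, n=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, stride=2, padding=1) for _ in range(n)])
        self.deconvs = nn.ModuleList(
            [nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1) for _ in range(n)])

    def forward(self, x):
        feats = []
        for conv in self.convs:
            x = F.relu(conv(x))
            feats.append(x)    # n progressively downsampled features
        for deconv in self.deconvs:
            x = F.relu(deconv(x))
            feats.append(x)    # n progressively upsampled features (2n total)
        return x, feats        # x recovers the input spatial size
```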
Substep five: introduce a CBAM (Convolutional Block Attention Module) model; pass each of the n multi-level features obtained in substep four through a CBAM module to obtain n multi-level fusion features of different scales, then upsample these n features and concatenate them along the channel dimension into one multi-scale multi-level feature.
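CBAM itself is defined in Woo et al., 2018 (cited in the references below); a compact sketch of the module, with the paper's default reduction ratio of 16:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention (Woo et al., 2018)."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // r),
            nn.ReLU(inplace=True),
            nn.Linear(ch // r, ch),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: a shared MLP over avg- and max-pooled descriptors.
        att = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                            self.mlp(x.amax(dim=(2, 3))))
        x = x * att.view(b, c, 1, 1)
        # Spatial attention: 7x7 convolution over channel-wise avg and max maps.
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(pooled))
```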
Substep six: send the multi-scale multi-level feature into a regression prediction network consisting of a convolution layer with a convolution kernel size of 1x1 and a fully connected layer; it outputs a feature of size 1 x 5h representing the h predicted results. Each prediction result comprises 5 attributes (the coordinate values of the upper-left and lower-right corner points of the detection box, plus a prediction score), and the result with the maximum prediction score is finally screened out as the prediction.
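A sketch of such a head; the intermediate channel width, the spatial size of the incoming feature, and the value of h are assumptions needed to size the fully connected layer:

```python
import torch.nn as nn

class RegressionHead(nn.Module):
    """1x1 convolution followed by a fully connected layer emitting h results
    of 5 attributes each: (x1, y1, x2, y2, score)."""
    def __init__(self, in_ch, feat_hw, h=100, mid_ch=32):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.fc = nn.Linear(mid_ch * feat_hw * feat_hw, 5 * h)
        self.h = h

    def forward(self, x):
        z = self.conv(x).flatten(1)
        preds = self.fc(z).view(-1, self.h, 5)  # h results x 5 attributes
        best = preds[..., 4].argmax(dim=1)      # index of the maximum score
        return preds, best
```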
Substep seven: optimize the GIoU objective function with stochastic gradient descent, continuously updating the parameters of every network layer, and store the set of model parameters once iterative optimization makes the result stable.
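A sketch of the GIoU objective for boxes in the (x1, y1, x2, y2) corner format used by the prediction head; in training, torch.optim.SGD would then carry out the stochastic gradient descent updates over all network-layer parameters:

```python
import torch

def giou_loss(pred, target):
    """L = 1 - GIoU, with GIoU = IoU - |C - (A U B)| / |C| and C the smallest
    axis-aligned box enclosing both the predicted box A and the target box B."""
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c_area = (cw * ch).clamp(min=1e-7)          # enclosing box C
    giou = iou - (c_area - union) / c_area
    return (1.0 - giou).mean()
```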
Step two: a detection phase comprising the following substeps:
Substep one: for each text picture to be detected, read the model parameters obtained in the training phase, apply them to every network layer and fix them, then repeat substeps two to six of step one on the picture to obtain h predicted outputs, each comprising coordinate positions and a prediction score.
Substep two: select the coordinates of the regression box with the maximum prediction score as the text coordinate position.
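A minimal sketch of this detection phase, assuming the full pipeline above is wrapped in a single model whose forward returns the h predictions and the index of the best one (the file path and model interface are hypothetical):

```python
import torch

@torch.no_grad()
def detect(image, model, weights_path="text_detector.pt"):  # hypothetical path
    model.load_state_dict(torch.load(weights_path))  # read trained parameters
    model.eval()                                     # apply and fix them
    preds, best = model(image)    # h results of (x1, y1, x2, y2, score)
    return preds[0, best[0], :4]  # box with the maximum prediction score
```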
A further technical scheme of the invention is as follows: in substep one, the rotation angle is randomly generated between -90 and 90 degrees using a computer random seed; the flipping mode is chosen at random among left-right flipping, up-down flipping, and no flipping; and the brightness change randomly scales brightness by a factor of 0.5 to 1.5.
Effects of the invention
The technical effects of the invention are as follows: the invention discloses a method for localizing and detecting arbitrarily oriented text of varying scale. It effectively solves the problem of text localization and detection against complex backgrounds (variable illumination, variable contrast, and the like), trains quickly, and reaches a detection precision above 77%. For pictures containing text of various shapes and scales in natural scenes, the method is efficient, accurate, and simple.
Drawings
FIG. 1 is a flow chart of the multi-scale multi-level text detection method according to the present invention.
Detailed Description
Referring to fig. 1, the invention provides a multi-level and multi-scale fusion text detection method in a complex environment, addressing the limitations of conventional text detection methods and of general object detection on this problem. The technical scheme comprises two stages: a training phase and a testing phase.
A training stage:
1. Expand the original training set sample pictures by rotation, flipping, brightness change, and combinations thereof, obtaining a multi-angle, multi-brightness enlarged image dataset;
2. Input the sample images of the enlarged image dataset obtained in step 1 into a ResNeXt-101 network; balancing computational complexity against effectiveness, extract the outputs of the 8 layers 'conv1', 'conv2_1', 'conv3_1', 'conv3_4', 'conv4_3', 'conv4_12', 'conv4_20', and 'conv5_3' to obtain depth features at 8 different scales;
3. Pass the 8 depth features of different scales obtained in step 2 through 8 transformation modules to obtain 8 transformed features of different scales; apply bilinear upsampling to each so that all are unified to the largest scale among them; finally, stack the 8 unified features along the channel dimension into a multi-scale fusion feature;
4. Send the multi-scale fusion feature obtained in step 3 into the 1st decoding network to obtain a group of 1st-level features at multiple scales; concatenate the last-layer output of the 1st decoding network with the multi-scale fusion feature of step 3 and send it into the 2nd decoding network to obtain a group of 2nd-level features at multiple scales; concatenate the last-layer output of the 2nd decoding network with the fusion feature and send it into the 3rd decoding network to obtain a group of 3rd-level features, and so on, until the k decoding networks have produced k groups of k-level multi-scale features. Merge the same-scale features across the k groups to obtain one group of multi-scale 'multi-level features'. A decoding network is a lightweight feature-extraction unit formed by a convolution network and a deconvolution network; in this embodiment it consists of n convolution layers and n deconvolution layers, and features at multiple scales are extracted from every layer as one group of that level. Here k denotes the number of decoding networks and n the number of convolution layers (equal to the number of deconvolution layers) in each. In theory k may take any value from 2 upward, but in view of complexity 2 ≤ k ≤ 5 is generally used, and k = 3 in this embodiment; likewise n may in theory be any integer greater than 1, but in view of complexity n ≤ 5 is generally used, and n = 3 in this embodiment. A sketch of this cascade follows below.
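The following sketch of the k-decoder cascade and the cross-group merge reuses the DecodingNetwork sketch given earlier; the 1x1 projections that restore the channel width after each concatenation, and the grouping of features by spatial size, are assumptions of this sketch:

```python
import torch

def run_decoders(fusion, decoders, projs):
    """decoders: k DecodingNetwork instances (k = 3 here); projs: k 1x1 convs
    mapping the concatenated (output + fusion) width back to the input width."""
    all_feats, x = [], fusion
    for dec, proj in zip(decoders, projs):
        out, feats = dec(x)                        # one group of 2n features
        all_feats.extend(feats)
        x = proj(torch.cat([out, fusion], dim=1))  # input for the next decoder
    # Group the collected features by spatial size, then concatenate each group
    # along channels: one merged "multi-level feature" per distinct scale.
    buckets = {}
    for f in all_feats:
        buckets.setdefault(tuple(f.shape[-2:]), []).append(f)
    return [torch.cat(fs, dim=1) for fs in buckets.values()]
```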
5. Introduce a CBAM model: send the group of multi-scale 'multi-level features' obtained in step 4 through CBAM modules to obtain a group of multi-scale 'multi-level fusion features'; unify the different scales to the largest one by upsampling; finally, concatenate along the channel dimension to obtain the 'multi-scale multi-level feature'.
6. Send the multi-scale multi-level feature obtained in step 5 into a regression prediction network composed of a convolution layer and a fully connected layer. The output has size 1 x 5h and represents h preliminary predictions, each comprising four coordinate values and a confidence; the result with the maximum confidence is selected as the predicted coordinate position;
7. Optimize with the GIoU objective function, and store the set of depth-model parameters finally obtained from training;
Testing stage:
1. and when detecting the text picture every time, reading the model parameters obtained in the training stage, applying the model parameters to each network layer and fixing, repeating the steps 2 to 6 in the training stage on the text picture to be detected, obtaining h prediction outputs, wherein the outputs comprise coordinate positions and prediction scores. All the network layers mentioned above (including the ResNeXt-101 network, the transformation module, the decoding network, the CBAM, the prediction regression network and the like) have a large number of parameters, the parameters are continuously updated and optimized based on an objective function and an optimization method in a training stage, finally, stable solutions are obtained and stored, the parameters which are trained and optimized are directly read in a testing stage and applied to each network layer, and a picture to be tested can obtain an expected ideal output result through the network layers. The prediction score is also the result of the output of the network, which represents that this result of the prediction may not be credible, and is predicted by the model learned during the training phase without human factor control.
2. Select the coordinates of the regression box with the maximum prediction score as the text coordinate position. (The network outputs the coordinates of the upper-left and lower-right corners of the box; these four values determine a rectangle that frames the text in the picture.) To illustrate the implemented technical solution more clearly, each module required is briefly introduced in the description of the embodiment below. It should be apparent that the drawings in the following description are only flow charts of the present invention; those skilled in the art can expand them and obtain other drawings without creative effort.
Referring to fig. 1, the implementation steps of the invention are as follows:
step 1, firstly, performing data expansion on a labeled training picture through a combination of three modes of rotating, turning and changing light and shade, wherein the rotating angle is randomly generated by using a computer random seed from-90 degrees to 90 degrees, the turning mode is a random one of left-right turning, up-down turning and keeping unchanged, the changing light and shade mode is random scaling by 0.5-1.5 times, extracting 30% of all data to perform the three operations, obtaining expanded data, and combining the expanded data with the original data to form a new expanded data set for subsequent operation.
Step 2: input the sample images of the enlarged image dataset obtained in step 1 into the ResNeXt-101 network and extract the output features of its 8 layers 'conv1', 'conv2_1', 'conv3_1', 'conv3_4', 'conv4_3', 'conv4_12', 'conv4_20', and 'conv5_3'; these are depth features of different scales.
Step 3: fuse the 8 depth features of different scales obtained in step 2. First, pass each feature through its own transformation module, formed by cascading a convolution layer with a convolution kernel size of 1x1, a Batch Normalization layer, and a ReLU (rectified linear activation function) layer, to obtain 8 transformed features of different scales. Then apply bilinear upsampling to each transformed feature so that all are unified to the largest scale among them. Finally, stack the 8 scale-unified features along the channel dimension to form a multi-scale fusion feature.
Step 4: pass the multi-scale fusion feature obtained in step 3 through k decoding networks to obtain multi-level multi-scale features. Each decoding network consists of n convolution layers mirrored by n completely symmetric deconvolution layers; its output has the same spatial size as its input, so it can be concatenated with the multi-scale fusion feature and sent to the next decoding network. Each decoding network yields 2n features at n scales; the 2nk features output by the k decoding networks are grouped by scale, features of the same scale are concatenated along the channel dimension, and the result is n multi-level features at n different scales.
Step 5: introduce a CBAM (Convolutional Block Attention Module) model; pass each of the n multi-level features obtained in step 4 through a CBAM module to obtain n multi-level fusion features of different scales, then upsample these n features and concatenate them along the channel dimension into one multi-scale multi-level feature.
Step 6: send the multi-scale multi-level feature into the regression prediction network, which consists of a convolution layer with a convolution kernel size of 1x1 and a fully connected layer; it outputs a feature of size 1 x 5h representing the h predicted results. Each result comprises 5 attributes (the coordinate values of the upper-left and lower-right corner points of the detection box, plus a prediction score); the result with the maximum prediction score is finally kept as the prediction.
Step 7: optimize the GIoU objective function with stochastic gradient descent, continuously updating the parameters of every network layer, and store the set of model parameters once iterative optimization makes the result stable.
Step 8: remove the optimization of step 7, import the trained model parameters, process the picture to be tested through steps 2 to 6, infer the h output results, and select the one with the largest prediction score as the text position output.
The effects of the present invention can be further explained by the following simulation experiments.
1. Simulation conditions
The simulation was carried out with the PyTorch framework on the Ubuntu 14.04 LTS operating system, on a machine whose central processing unit is an Intel(R) Core(TM) i7-6800K CPU @ 3.40 GHz, with 128 GB of memory and a Tesla 1080Ti graphics processor.
The data used in the simulation were the text detection pictures of ICDAR 2015.
2. Simulation content
First, features are learned with the training set according to the training steps of the detailed description. Then, following the test steps, the pictures of the test set are compared against the ground-truth annotations to compute the precision P, the recall R, and the F1 value, where

P = TP / (TP + FP),  R = TP / (TP + FN),  F1 = 2PR / (P + R),

and TP, FP, and FN denote the numbers of true positive, false positive, and false negative detections.
To demonstrate the effectiveness of the algorithm, the Deep Matching Prior Network (DMPNet), the Connectionist Text Proposal Network (CTPN), and a fully convolutional multi-oriented text detection network (MCLAB FCN) are selected as comparison algorithms. The DMPNet algorithm is described in detail in "Y. Liu and L. Jin, 'Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3454-3461, 2017"; the CTPN algorithm is proposed in "Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, 'Detecting Text in Natural Image with Connectionist Text Proposal Network,' in Proceedings of the European Conference on Computer Vision, pp. 56-72, 2016"; the MCLAB FCN algorithm is proposed in "Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, 'Multi-oriented Text Detection with Fully Convolutional Networks,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4159-4167, 2016". These comparison methods are representative methods with prominent results in the art; the comparison results are shown in Table 1.
TABLE 1 comparative results
Method Recall Precision F-measure
CTPN 51.56% 74.22% 60.85%
MCLAB_FCN 43.09% 70.81% 53.58%
DMPNet 68.22% 73.23% 70.64%
MMText(ours) 52.34% 76.26% 62.07%
As can be seen from Table 1, the detection precision of the invention (76.26%) is the highest among the compared methods. Although the recall is somewhat lower than DMPNet's, the method adapts well to complex environments and shows good practicality and robustness in such detection settings.
The effectiveness of the invention can be verified through the simulation experiment.

Claims (2)

1. A multi-level and multi-scale fusion character detection method in a complex environment, characterized by comprising the following steps:
Step one: a training phase comprising the following substeps:
Substep one: expand the image data of the labeled training pictures through a combination of three operations (rotation, flipping, and brightness change) applied to 30% of all data to obtain expanded image data; merge it into the original image data to form a new enlarged image dataset for subsequent operations;
Substep two: input the sample images of the enlarged image dataset obtained in substep one into a ResNeXt-101 network and extract the output features of its 8 layers 'conv1', 'conv2_1', 'conv3_1', 'conv3_4', 'conv4_3', 'conv4_12', 'conv4_20', and 'conv5_3'; these are depth features of different scales;
Substep three: fuse the 8 depth features of different scales obtained in substep two: first, pass each of the 8 features through its own transformation module, formed by cascading a convolution layer with a convolution kernel size of 1x1, a Batch Normalization layer, and a ReLU (rectified linear activation function) layer, to obtain 8 transformed features of different scales; then apply bilinear upsampling to each transformed feature so that all are unified to the largest scale among them; finally, stack the 8 scale-unified features along the channel dimension to form a multi-scale fusion feature;
Substep four: pass the multi-scale fusion feature obtained in substep three through k decoding networks to obtain multi-level multi-scale fusion features: each decoding network consists of n convolution layers mirrored by n completely symmetric deconvolution layers, and its output has the same spatial size as its input, so it can be concatenated with the multi-scale fusion feature; the concatenated features are sent to the next decoding network; each decoding network yields 2n features at n scales, the 2nk features output by the k decoding networks are grouped by scale, features of the same scale are concatenated along the channel dimension, and the result is n multi-level features at n different scales;
Substep five: introduce a CBAM (Convolutional Block Attention Module) model; pass each of the n multi-level features obtained in substep four through a CBAM module to obtain n multi-level fusion features of different scales, then upsample these n features and concatenate them along the channel dimension into one multi-scale multi-level feature;
Substep six: send the multi-scale multi-level feature into a regression prediction network consisting of a convolution layer with a convolution kernel size of 1x1 and a fully connected layer, which outputs a feature of size 1 x 5h representing the h predicted results; each prediction result comprises 5 attributes, and the result with the maximum prediction score is screened out as the prediction;
Substep seven: optimize the GIoU objective function with stochastic gradient descent, continuously updating the parameters of every network layer, and store the set of model parameters once iterative optimization makes the result stable;
Step two: a detection phase comprising the following substeps:
Substep one: for each text picture to be detected, read the model parameters obtained in the training phase, apply them to every network layer and fix them, then repeat substeps two to six of step one on the picture to obtain h predicted outputs, each comprising coordinate positions and a prediction score;
Substep two: select the coordinates of the regression box with the maximum prediction score as the text coordinate position.
2. The multi-level and multi-scale fusion character detection method in a complex environment according to claim 1, characterized in that in substep one the rotation angle is randomly generated between -90 and 90 degrees using a computer random seed, the flipping mode is chosen at random among left-right flipping, up-down flipping, and no flipping, and the brightness change randomly scales brightness by a factor of 0.5 to 1.5.
CN201910781042.6A 2019-08-23 2019-08-23 Multi-level and multi-scale fusion character detection method in complex environment Active CN110516669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910781042.6A CN110516669B (en) 2019-08-23 2019-08-23 Multi-level and multi-scale fusion character detection method in complex environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910781042.6A CN110516669B (en) 2019-08-23 2019-08-23 Multi-level and multi-scale fusion character detection method in complex environment

Publications (2)

Publication Number Publication Date
CN110516669A CN110516669A (en) 2019-11-29
CN110516669B (en) 2022-04-29

Family

ID=68626229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910781042.6A Active CN110516669B (en) 2019-08-23 2019-08-23 Multi-level and multi-scale fusion character detection method in complex environment

Country Status (1)

Country Link
CN (1) CN110516669B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259764A (en) * 2020-01-10 2020-06-09 中国科学技术大学 Text detection method and device, electronic equipment and storage device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107750366A (en) * 2015-01-08 2018-03-02 线性代数技术有限公司 Hardware accelerator for histogram of gradients
CN109165697A (en) * 2018-10-12 2019-01-08 福州大学 A kind of natural scene character detecting method based on attention mechanism convolutional neural networks
CN109753956A (en) * 2018-11-23 2019-05-14 西北工业大学 The multi-direction text detection algorithm extracted based on dividing candidate area
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107750366A (en) * 2015-01-08 2018-03-02 线性代数技术有限公司 Hardware accelerator for histogram of gradients
CN109165697A (en) * 2018-10-12 2019-01-08 福州大学 A kind of natural scene character detecting method based on attention mechanism convolutional neural networks
CN109753956A (en) * 2018-11-23 2019-05-14 西北工业大学 The multi-direction text detection algorithm extracted based on dividing candidate area
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Aggregated Residual Transformations for Deep Neural Networks; Saining Xie et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-11-09; pp. 5987-5995 *
Attention-YOLO: a YOLO detection algorithm incorporating an attention mechanism; Xu Chengji et al.; Computer Engineering and Applications, vol. 55, no. 6; 2019-03; pp. 13-23, 125 (in Chinese) *
CBAM: Convolutional Block Attention Module; Sanghyun Woo et al.; arXiv; 2018-07-18; pp. 1-17 *
Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection; Yuliang Liu et al.; arXiv; 2017-03-04; pp. 1-8 *
Multi-Oriented Text Detection with Fully Convolutional Networks; Zheng Zhang et al.; 2016 IEEE Conference on Computer Vision and Pattern Recognition; 2016-12-12; pp. 4159-4167 *
Scene Text Detection via Deep Semantic Feature Fusion and Attention-based Refinement; Yu Song et al.; 2018 24th International Conference on Pattern Recognition (ICPR); 2018-11-29; pp. 3747-3752 *
Squeeze-and-Excitation Networks; Jie Hu et al.; arXiv; 2019-05-16; pp. 1-13 *

Also Published As

Publication number Publication date
CN110516669A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN111461110A (en) Small target detection method based on multi-scale image and weighted fusion loss
CN112183501B (en) Depth counterfeit image detection method and device
CN111489357A (en) Image segmentation method, device, equipment and storage medium
CN111353544B (en) Improved Mixed Pooling-YOLOV 3-based target detection method
CN111274981B (en) Target detection network construction method and device and target detection method
CN115131797B (en) Scene text detection method based on feature enhancement pyramid network
CN110866938B (en) Full-automatic video moving object segmentation method
CN110598715A (en) Image recognition method and device, computer equipment and readable storage medium
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN111553351A (en) Semantic segmentation based text detection method for arbitrary scene shape
CN111612789A (en) Defect detection method based on improved U-net network
Ji et al. LGCNet: A local-to-global context-aware feature augmentation network for salient object detection
CN112070040A (en) Text line detection method for video subtitles
CN110633706B (en) Semantic segmentation method based on pyramid network
Zhu et al. DFTR: Depth-supervised fusion transformer for salient object detection
CN101650824B (en) Content erotic image zooming method based on conformal energy
CN110516669B (en) Multi-level and multi-scale fusion character detection method in complex environment
CN113052187B (en) Global feature alignment target detection method based on multi-scale feature fusion
Zong et al. A cascaded refined rgb-d salient object detection network based on the attention mechanism
CN110580462B (en) Natural scene text detection method and system based on non-local network
CN111753714A (en) Multidirectional natural scene text detection method based on character segmentation
CN113052156B (en) Optical character recognition method, device, electronic equipment and storage medium
CN110619387A (en) Channel expansion method based on convolutional neural network
CN113343979B (en) Method, apparatus, device, medium and program product for training a model
Guo et al. Text detection of power equipment nameplates based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant