CN111062386B - Natural scene text detection method based on depth pyramid attention and feature fusion - Google Patents
Natural scene text detection method based on depth pyramid attention and feature fusion
- Publication number: CN111062386B
- Application number: CN201911192949.5A
- Authority: CN (China)
- Prior art keywords: feature, network, text, depth, conv5
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/63 — Scene text, e.g. street names
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a natural scene text detection method that combines a depth pyramid attention network with feature fusion. It addresses two problems: models whose feature levels are imbalanced cannot be fully exploited despite a sound original design, which limits overall performance; and because convolution operates on local receptive fields, long-range dependencies fade as the network deepens. By combining feature fusion with a depth pyramid attention model, the method makes better use of the model, overcoming the common defect of text detection models that are well designed yet underutilized, and preserving the long-range dependencies that would otherwise vanish as the convolution, based on local receptive fields, deepens.
Description
Technical Field
The invention relates to a natural scene text detection method, and in particular to a natural scene text detection algorithm that combines a depth pyramid attention network with a feature fusion technique.
Background
With the progress of science and technology, demand for internet products keeps growing, and more and more applications need the text information contained in images. Text detection is the first, and an extremely important, step toward fully recognizing the text content of an image, and it directly affects text recognition performance.
Text detection in natural scenes must overcome the complexity caused by background interference, variable character aspect ratios, arbitrary text orientations, and small text, making it one of the most challenging problems in computer vision today. By feature extraction approach, natural scene text detection divides into traditional methods and deep-learning-based methods. Unlike document images, scene images contain complex backgrounds and rotated text, which are hard to separate from the background using traditional methods alone. Current deep-learning text detectors fall into two main categories: region-proposal-based methods and image-segmentation-based methods. Analysis of both reveals that most models lack balance across feature levels, so otherwise well-designed models cannot be fully exploited and overall performance is limited.
To exploit the model more fully, the invention proposes a new network that overcomes the defect that a well-designed model cannot be fully utilized, which limits overall performance, and addresses the loss of long-range dependencies caused by the local receptive field of deepening convolution.
Disclosure of Invention
The invention provides a natural scene text detection algorithm combining a depth pyramid attention network with feature fusion, solving the problems that an otherwise well-designed model cannot be fully utilized and that overall performance is limited.
The technical scheme of the invention is as follows:
a natural scene text detection method based on depth pyramid attention and feature fusion comprises the following steps:
step one, taking a text public data set related to a natural scene as a training sample;
Step two, inputting training samples in batches of 8 images into a preliminary feature extraction network (the feature extraction network of PixelLink), whose basic framework is a VGG16 network arranged in a Unet structure. The top-down path is the VGG16 network, a deep network built from stacked 3×3 convolutions and max pooling. Stacking several small convolutions in series needs fewer parameters and yields more nonlinear transformations than a single larger convolution kernel.
The bottom-up path is the up-sampling stage; up-sampling uses bilinear interpolation.
Lateral connections prevent the loss of context that direct up-sampling of the VGG16 feature maps would cause: feature maps of the same spatial size from the top-down and bottom-up paths are fused, complementing missing information and making the up-sampled feature representation stronger.
Step three, the feature extraction network of PixelLink yields 4 feature mapping layers: h4, h3, h2 and h1; the 4 layers are up-sampled to the size of h4 and their pixel values averaged, with the number of channels unchanged, which is called feature fusion; the up-sampling is bilinear interpolation; the formula of feature fusion is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)
where Up×2(·) and Up×4(·) denote up-sampling by factors of 2 and 4, respectively;
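As a concrete sketch of the fusion in Eq. (1), the NumPy code below bilinearly up-samples h3, h2 and h1 to the size of h4 and averages the pixel values; the function names and the pixel-centre alignment convention are illustrative assumptions, not details fixed by the patent.

```python
import numpy as np

def bilinear_upsample(x, factor):
    """Bilinearly up-sample an (H, W, C) feature map by an integer factor.
    Target pixel centres are mapped back to source coordinates (a common
    convention; the patent does not pin down the alignment, so this is an
    assumption)."""
    h, w, _ = x.shape
    ys = (np.arange(h * factor) + 0.5) / factor - 0.5
    xs = (np.arange(w * factor) + 0.5) / factor - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :, None]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def fuse_features(h4, h3, h2, h1):
    """Eq. (1): F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4, channels unchanged."""
    return (h4
            + bilinear_upsample(h3, 2)
            + bilinear_upsample(h2, 4)
            + bilinear_upsample(h1, 4)) / 4.0
```

With the sizes given later in the description (h4 of 64×64, h3 of 32×32, h2 and h1 of 16×16), all four terms align at 64×64 and the channel count stays unchanged.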
Step four, taking the output of the feature fusion as the input of a depth pyramid attention model; adding this model refines the features further and utilizes the model more fully;
the depth pyramid attention model consists of three branches: depth feature pyramid network branches, nonlinear transformation branches, and global average pooling branches. The invention does not simply add the extracted information to the depth feature pyramid network, but performs refinement processing. The depth feature pyramid network branches are convolved with 2 7 x 7, 25 x 5,2 x 3*3, respectively, in order to extract information from different pyramid scales. The same convolution kernel adopts a serial form, and different convolution kernels adopt a parallel form. The present invention labels conv7×7 in the left half, bn, relu as conv7_1, conv7×7 in the right half, bn as conv7_2. Similarly, conv5 x 5 in the left half, bn, relu is denoted Conv5_1, conv5 x 5 in the right half, bn is denoted Conv5_2, conv3 x 3 in the left half, bn, relu is denoted Conv3_1, conv3 x 3 in the right half, bn is denoted Conv3_2. The refining process is as follows: the feature map after feature fusion first goes through conv7_1, conv5_1, conv3_1 and conv3_2, respectively. The feature map of conv3_2 is then up-sampled and superimposed with the feature map of conv5_1 by pixel values and the superimposed result is input to conv5_2. And finally, up-sampling the Conv5_2 feature map, superposing the pixel values with the Conv7_1 feature map, and inputting the superposition result to the Conv7_2. Wherein the up-sampling is deconvolution, the size of the kernel is 4*4, the step size is 2, and BN and Relu activation functions are used;
Step five, inputting the refined feature mapping layer into the PixelLink output network;
the PixelLink output network consists mainly of two parts: the first part predicts whether each pixel is text; the second part predicts whether each pixel and the 8 pixels around it belong to the same text instance; positive pixels are joined through positive links to form connected components, each component being one text instance;
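The linking rule of the output network can be sketched as a union-find over positive pixels: two positive pixels end up in the same text instance whenever one of them predicts a positive link toward the other. This is a minimal illustration; the ordering of the 8 link channels is an assumption.

```python
import numpy as np

# 8-neighbour offsets; the patent predicts a link for each of a pixel's 8
# neighbours, but the channel ordering below is assumed for illustration.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def link_pixels(text_pos, link_pos):
    """Group positive pixels into text instances via union-find.
    text_pos: (H, W) bool; link_pos: (H, W, 8) bool, channel k = link to NEIGHBOURS[k].
    Returns an (H, W) int label map, 0 for background."""
    h, w = text_pos.shape
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(h):
        for x in range(w):
            if not text_pos[y, x]:
                continue
            parent.setdefault((y, x), (y, x))
            for k, (dy, dx) in enumerate(NEIGHBOURS):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and text_pos[ny, nx] and link_pos[y, x, k]:
                    parent.setdefault((ny, nx), (ny, nx))
                    ra, rb = find((y, x)), find((ny, nx))
                    if ra != rb:
                        parent[ra] = rb          # merge the two components

    labels = np.zeros((h, w), dtype=int)
    roots = {}
    for p in parent:
        labels[p] = roots.setdefault(find(p), len(roots) + 1)
    return labels
```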
Step six, finally, the segmented text instances are turned into final connected domains via minAreaRect in the OpenCV connected-domain method; a connected region whose shortest side is less than 10 pixels or whose area is less than 300 pixels is regarded as a false detection and the text region is automatically filtered out; finally the bounding boxes are output.
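The post-processing of step six can be sketched as follows; only the two thresholds come from the patent, the helper names are illustrative, and whether "area" means pixel count or rectangle area is an assumption (pixel count is used here).

```python
import numpy as np

MIN_SIDE = 10    # shortest rectangle side in pixels (threshold from step six)
MIN_AREA = 300   # minimum region area in pixels (threshold from step six)

def keep_box(width, height, area):
    """Filtering rule of step six: a component whose shortest side is under
    10 px or whose area is under 300 px is treated as a false detection."""
    return min(width, height) >= MIN_SIDE and area >= MIN_AREA

def boxes_from_mask(text_mask):
    """Turn a binary text-instance mask into oriented bounding boxes using
    OpenCV connected components + minAreaRect, as the patent describes.
    cv2 is imported lazily so keep_box stays testable without OpenCV."""
    import cv2
    n, labels = cv2.connectedComponents(text_mask.astype(np.uint8))
    boxes = []
    for lab in range(1, n):
        ys, xs = np.nonzero(labels == lab)
        pts = np.stack([xs, ys], axis=1).astype(np.float32)
        rect = cv2.minAreaRect(pts)          # ((cx, cy), (w, h), angle)
        w, h = rect[1]
        if keep_box(w, h, len(xs)):
            boxes.append(cv2.boxPoints(rect))
    return boxes
```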
The invention has the beneficial effects that:
(1) Feature fusion and the depth pyramid attention model raise the utilization of the model, overcoming the defect that many text detection models are well designed yet underutilized, which limits overall performance.
(2) The problem that long-range dependencies vanish as convolution, which is based on local receptive fields, deepens is avoided.
(3) The method is effective for multi-scale text.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the overall network architecture of the present invention.
FIG. 3 is a schematic diagram of a portion of a deep pyramid attention network architecture.
Detailed Description
The following describes the embodiments of the present invention further with reference to the drawings and technical schemes.
As shown in fig. 1, the following steps are specifically described:
Step one, taking the training set of a text public data set related to natural scenes as training samples;
Step two, using the feature extraction network of PixelLink as the preliminary feature extraction network; the basic framework is a VGG16 network with a Unet structure;
the Unet is composed of a top-down path, a bottom-up path and a transverse connection.
(1) The top-down path is the VGG16 network, a deep network built from stacked 3×3 convolutions and max pooling. Stacking several small convolutions in series needs fewer parameters and yields more nonlinear transformations than a single larger convolution kernel.
(2) The bottom-up path is the up-sampling stage; up-sampling uses bilinear interpolation.
(3) Lateral connections prevent the context loss that direct up-sampling of the VGG16 feature maps would cause: feature maps of the same spatial size from the two paths are fused, complementing missing information and making the up-sampled feature representation stronger.
Step three, the feature extraction network of PixelLink yields 4 feature mapping layers: h4, h3, h2 and h1; the 4 layers are up-sampled to the size of h4 and their pixel values averaged, with the number of channels unchanged, namely feature fusion; the up-sampling is bilinear interpolation; the formula of feature fusion is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)
where Up×2(·) and Up×4(·) denote up-sampling by factors of 2 and 4, respectively;
(1) Owing to hardware constraints, the training image size is 256×256; h4 is 64×64, h3 is 32×32, h2 is 16×16, and h1 is 16×16.
Step four, taking the output of the feature fusion as the input of the depth pyramid attention network, which further refines the features and utilizes the model more fully;
(1) The depth pyramid attention network consists of a depth feature pyramid network branch, a nonlinear transformation branch, and a global average pooling branch. Rather than simply fusing the features of each branch, the depth feature pyramid network branch is given additional structure so that each of its parts is further refined.
Step five, the refined feature mapping layer is input to the PixelLink output network.
(1) This output network mainly comprises two parts. The first predicts whether each pixel is text or not; the second predicts whether each pixel and the 8 pixels around it belong to the same text instance. Positive pixels are joined through positive links into connected components, each component being one text instance;
Step six, the segmented text instances are finally turned into connected domains via minAreaRect in the OpenCV connected-domain method; because this method is sensitive to noise and may predict noise as real text, several thresholds are set to reduce false positives. A connected region whose shortest side is less than 10 pixels or whose area is less than 300 pixels is regarded as a false detection and the text region is automatically filtered out; finally the bounding boxes are output.
The refinement network of the invention thus brings two benefits: model utilization is improved, avoiding the situation where many current text detection models are well designed yet cannot be fully exploited; and the long-range dependencies that convolution, based on local receptive fields, loses with increasing depth are preserved.
The following describes embodiments of the invention in detail with reference to the accompanying drawings; the embodiments and their operating procedures are given on the premise of the technical solution of the invention, but the scope of protection is not limited to them.
The experiments used the ICDAR2015 and ICDAR2013 data sets. ICDAR2015 contains 1500 natural-scene images at 1280×720 resolution, of which 1000 are training images and 500 are test images. Unlike images from earlier ICDAR competitions, these were captured with Google Glass and shot very casually; the text can be tilted and blurred, deliberately increasing the detection difficulty.
ICDAR2013 contains 229 training images and 233 test images. The data set is a subset of ICDAR2011, with duplicate images removed and incorrect annotations repaired. It is widely used in text detection but contains only horizontal text.
The experiments ran on a computer with an Intel(R) Core i7-6700 CPU @ 3.40 GHz under Linux Ubuntu 14.04, using PyCharm with Python 2.7. The deep-learning framework was tensorflow-gpu 1.3.0; the main additional libraries were OpenCV 2, setproctitle, and matplotlib.
ICDAR2015 experiment: training images from the ICDAR2015 data set were input at 256×256, and test images at 1280×704. Evaluation uses the recall (R), precision (P), and F-measure (F) published by the ICDAR2015 challenge.
Table 1 reports the R, P, and F values of the proposed model and of PixelLink on the ICDAR2015 data set:
table 1 ICDAR2015 multi-directional text detection experimental results
Model | Recall (R) | Precision (P) | F-measure |
---|---|---|---|
Proposed model | 0.7708 | 0.7595 | 0.7651 |
PixelLink | 0.7299 | 0.7607 | 0.7450 |
ICDAR2013 experiment: training images from the ICDAR2013 data set were input at 256×256, and test images at 384×384. Evaluation uses the recall (R), precision (P), and F-measure (F) published by the ICDAR2013 challenge.
Table 2 reports the R, P, and F values of the proposed model and of PixelLink on the ICDAR2013 data set:
table 2 ICDAR2013 horizontal text test results
Model | Recall (R) | Precision (P) | F-measure |
---|---|---|---|
Proposed model | 0.8168 | 0.7041 | 0.7563 |
PixelLink | 0.6919 | 0.7508 | 0.7201 |
Claims (1)
1. A natural scene text detection method based on depth pyramid attention and feature fusion is characterized by comprising the following steps:
step one, taking a text public data set related to a natural scene as a training sample;
step two, inputting training samples in batches of 8 images into a preliminary feature extraction network, of which the basic framework is a VGG16 network adopting a Unet structure; the preliminary feature extraction network is the feature extraction network of PixelLink;
step three, the feature extraction network of PixelLink yields 4 feature mapping layers: h4, h3, h2 and h1; the 4 layers are up-sampled to h4 and their pixel values averaged, with the number of channels unchanged, which is called feature fusion; the up-sampling is bilinear interpolation; the formula of feature fusion is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)
where Up×2(·) and Up×4(·) denote up-sampling by factors of 2 and 4, respectively;
step four, taking the output of the feature fusion as the input of a depth pyramid attention model, which further refines the features and utilizes the model more fully;
the depth pyramid attention model consists of three branches: a depth feature pyramid network branch, a nonlinear transformation branch, and a global average pooling branch; the depth feature pyramid network branch uses 2 convolutions of 7×7, 2 convolutions of 5×5 and 2 convolutions of 3×3, so as to extract information from different pyramid scales; convolutions with the same kernel size are connected in series, and those with different kernel sizes in parallel; the 7×7 convolution + BN + ReLU in the left half is labeled Conv7_1, and the 7×7 convolution + BN in the right half Conv7_2; similarly, the 5×5 convolution + BN + ReLU in the left half is labeled Conv5_1, the 5×5 convolution + BN in the right half Conv5_2, the 3×3 convolution + BN + ReLU in the left half Conv3_1, and the 3×3 convolution + BN in the right half Conv3_2; the refining process is as follows: the feature mapping after feature fusion first passes through Conv7_1, Conv5_1, Conv3_1 and Conv3_2 respectively; the feature map of Conv3_2 is then up-sampled, superposed pixel-wise with the feature map of Conv5_1, and the result is input to Conv5_2; finally, the feature map of Conv5_2 is up-sampled, superposed pixel-wise with the feature map of Conv7_1, and the result is input to Conv7_2; the up-sampling is a deconvolution with kernel size 4×4 and stride 2, using BN and ReLU activation functions;
step five, inputting the refined feature mapping layer into a PixelLink output network;
the PixelLink output network comprises two parts: the first part predicts whether a pixel is text; the second part predicts whether the pixel and the 8 pixels around it belong to the same text instance; positive pixels are connected by positive links to form connected components, each component being a text instance;
step six, finally, the segmented text instances are turned into final connected domains through minAreaRect in the OpenCV connected-domain method; a connected region whose shortest side is less than 10 pixels or whose area is less than 300 pixels is regarded as a false detection and the text region is automatically filtered out; finally the bounding boxes are output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911192949.5A CN111062386B (en) | 2019-11-28 | 2019-11-28 | Natural scene text detection method based on depth pyramid attention and feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111062386A CN111062386A (en) | 2020-04-24 |
CN111062386B true CN111062386B (en) | 2023-12-29 |
Family
ID=70299270
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325534A (en) * | 2018-09-22 | 2019-02-12 | 天津大学 | A kind of semantic segmentation method based on two-way multi-Scale Pyramid |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN110287960A (en) * | 2019-07-02 | 2019-09-27 | 中国科学院信息工程研究所 | The detection recognition method of curve text in natural scene image |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10679085B2 (en) * | 2017-10-31 | 2020-06-09 | University Of Florida Research Foundation, Incorporated | Apparatus and method for detecting scene text in an image |
Non-Patent Citations (2)
- "Scene text detection based on feature pyramid" (in Chinese); Chang Yufei, Chen Xinpeng, Wang Yuanhang, Qian Bing; Journal of Information Engineering University, (05), full text *
- "Automatic building recognition in high-resolution imagery combining dilated-convolution residual networks and pyramid pooling representation" (in Chinese); Qiao Wenfan, Shen Li, Dai Yanshuai, Cao Yungang; Geography and Geo-Information Science, (05), full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |