CN111062386B - Natural scene text detection method based on depth pyramid attention and feature fusion - Google Patents

Natural scene text detection method based on depth pyramid attention and feature fusion

Info

Publication number
CN111062386B
CN111062386B (application CN201911192949.5A)
Authority
CN
China
Prior art keywords
feature
network
text
depth
conv5
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911192949.5A
Other languages
Chinese (zh)
Other versions
CN111062386A (en)
Inventor
贾世杰
冯宇静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Jiaotong University
Original Assignee
Dalian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Jiaotong University filed Critical Dalian Jiaotong University
Priority to CN201911192949.5A
Publication of CN111062386A
Application granted
Publication of CN111062386B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63: Scene text, e.g. street names
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a natural scene text detection method based on depth pyramid attention and feature fusion, namely a detection algorithm that combines a depth pyramid attention network with feature fusion. It aims to solve two problems: a well-designed model often cannot be fully utilized, which limits overall performance; and because convolution operates on local receptive fields, long-range dependencies disappear as the convolution deepens. Feature fusion together with the depth pyramid attention model improves the utilization of the model, overcoming the defect that many text detection models are well designed in structure yet cannot be fully utilized, and counteracting the loss of long-range dependencies.

Description

Natural scene text detection method based on depth pyramid attention and feature fusion
Technical Field
The invention relates to a natural scene text detection method, in particular to a natural scene text detection algorithm combining a depth pyramid attention network and a feature fusion technology.
Background
With the progress of science and technology, demand for internet products keeps growing, and more and more applications need the text information contained in images. Text detection is the first, and an extremely important, step toward recognizing the text content of an image, and it directly affects text recognition performance.
Text detection in natural scenes must overcome the complexity caused by background interference, widely varying aspect ratios, arbitrary text orientations and small text, and it remains one of the most challenging problems in computer vision. By the way features are extracted, natural scene text detection can be divided into traditional methods and deep-learning-based methods. Unlike document pictures, scene pictures contain complex backgrounds and text at varying angles, which are difficult to distinguish from the background with traditional methods alone. Deep-learning-based scene text detection falls mainly into two categories, region-proposal-based methods and image-segmentation-based methods. Analysis of the two shows that most models lack a balance across feature levels, so otherwise well-designed models cannot be fully utilized and overall performance is limited.
To utilize the model more fully, the invention proposes a new network that overcomes the defects that a well-designed model cannot be fully utilized and that overall performance is limited, and solves the problem that long-range dependencies disappear as convolution, which is based on local receptive fields, deepens.
Disclosure of Invention
The invention provides a natural scene text detection algorithm combining a depth pyramid attention network with feature fusion, which solves the problems that a well-designed model cannot be fully utilized and that overall performance is limited.
The technical scheme of the invention is as follows:
a natural scene text detection method based on depth pyramid attention and feature fusion comprises the following steps:
step one, taking a text public data set related to a natural scene as a training sample;
step two, inputting the training samples into a preliminary feature extraction network (the feature extraction network of PixelLink) in batches of 8 pictures, wherein the backbone is a VGG16 network adopting a U-Net structure; the top-down path adopts the VGG16 network, a deep network formed by stacked 3×3 convolutions and max pooling. Compared with a single larger convolution kernel, stacking several small convolutions requires fewer parameters and provides more nonlinear transformations (for example, two stacked 3×3 convolutions cover a 5×5 receptive field with 2×9 = 18 weights per channel pair instead of 25).
The bottom-up path is the upsampling stage, where upsampling is performed with bilinear interpolation.
To avoid losing context information when the feature maps output by VGG16 are directly upsampled, lateral connections are employed: feature maps of the same spatial size from the top-down and bottom-up paths are fused, so that missing information is complemented and the feature representation after upsampling is stronger. A minimal sketch is given below.
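A minimal sketch of one lateral connection, assuming TensorFlow 2 (the 1×1 projection and the channel count are illustrative assumptions; the patent itself only specifies bilinear upsampling and fusion of same-size maps):

```python
import tensorflow as tf
from tensorflow.keras import layers

def lateral_fuse(top_down, bottom_up, channels=128):
    """One U-Net lateral connection: project the top-down (VGG16)
    feature map to a common depth, bilinearly upsample the coarser
    bottom-up map to the same spatial size, and add element-wise."""
    projected = layers.Conv2D(channels, 1, padding="same")(top_down)
    upsampled = tf.image.resize(bottom_up, tf.shape(projected)[1:3],
                                method="bilinear")
    return projected + upsampled
```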
Step three, taking the 4 feature mapping layers obtained from the PixelLink feature extraction network, h4, h3, h2 and h1, upsampling them to the size of h4 and averaging their pixel values with the number of channels unchanged, which is called feature fusion; the upsampling is bilinear interpolation. The formula of feature fusion is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)
wherein Up×2(·) and Up×4(·) denote 2-fold and 4-fold upsampling respectively; a short sketch of Eq. (1) follows.
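A short sketch of Eq. (1), assuming TensorFlow 2 (the patent's own environment used tensorflow-gpu 1.3.0, where tf.image.resize_bilinear is the equivalent call):

```python
import tensorflow as tf

def feature_fusion(h4, h3, h2, h1):
    """Eq. (1): bilinearly upsample h3 (2x) and h2, h1 (4x) to the
    spatial size of h4, then average the pixel values; the channel
    count is unchanged."""
    size = tf.shape(h4)[1:3]
    up = lambda x: tf.image.resize(x, size, method="bilinear")
    return (h4 + up(h3) + up(h2) + up(h1)) / 4.0
```

With the sizes given later in the description (h4 at 64×64, h3 at 32×32, h2 and h1 at 16×16), the three resizes realize exactly the 2-fold and 4-fold expansions of Eq. (1).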
step four, taking the output of feature fusion as the input of a depth pyramid attention model, which further refines the features so that the model is utilized more fully;
the depth pyramid attention model consists of three branches: depth feature pyramid network branches, nonlinear transformation branches, and global average pooling branches. The invention does not simply add the extracted information to the depth feature pyramid network, but performs refinement processing. The depth feature pyramid network branches are convolved with 2 7 x 7, 25 x 5,2 x 3*3, respectively, in order to extract information from different pyramid scales. The same convolution kernel adopts a serial form, and different convolution kernels adopt a parallel form. The present invention labels conv7×7 in the left half, bn, relu as conv7_1, conv7×7 in the right half, bn as conv7_2. Similarly, conv5 x 5 in the left half, bn, relu is denoted Conv5_1, conv5 x 5 in the right half, bn is denoted Conv5_2, conv3 x 3 in the left half, bn, relu is denoted Conv3_1, conv3 x 3 in the right half, bn is denoted Conv3_2. The refining process is as follows: the feature map after feature fusion first goes through conv7_1, conv5_1, conv3_1 and conv3_2, respectively. The feature map of conv3_2 is then up-sampled and superimposed with the feature map of conv5_1 by pixel values and the superimposed result is input to conv5_2. And finally, up-sampling the Conv5_2 feature map, superposing the pixel values with the Conv7_1 feature map, and inputting the superposition result to the Conv7_2. Wherein the up-sampling is deconvolution, the size of the kernel is 4*4, the step size is 2, and BN and Relu activation functions are used;
step five, inputting the refined feature mapping layers into a PixelLink output network;
the pixelink output network mainly comprises two parts: the first part is to predict whether the pixel is text; the second part is to predict whether the pixel and 8 pixels around the pixel belong to the same text instance; connecting the positive pixels by positive connection to form a connected component, wherein each component is a text example;
step six, finally, obtaining the final connected domains from the segmented text instances through minAreaRect in the OpenCV connected-domain method; connected regions whose shortest side is less than 10 pixels or whose area is less than 300 pixels are regarded as false detections and filtered out automatically, and the bounding boxes are finally output.
The invention has the beneficial effects that:
(1) Feature fusion and the depth pyramid attention model improve the utilization of the model, overcoming the defect that many text detection models are well designed in structure yet cannot be fully utilized, which limits overall performance.
(2) The method avoids the problem that, because the convolution operation is based on local receptive fields, long-range dependencies disappear as the convolution deepens.
(3) The method is effective for multi-scale text.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the overall network architecture of the present invention.
FIG. 3 is a schematic diagram of a portion of a deep pyramid attention network architecture.
Detailed Description
The following describes the embodiments of the present invention further with reference to the drawings and technical schemes.
As shown in fig. 1, the following steps are specifically described:
Step one, taking the training set of a text public data set related to natural scenes as training samples;
Step two, using the feature extraction network of PixelLink as the preliminary feature extraction network, wherein the backbone is a VGG16 network adopting a U-Net structure;
The U-Net is composed of a top-down path, a bottom-up path and lateral connections.
(1) The top-down path adopts the VGG16 network, a deep network formed by stacked 3×3 convolutions and max pooling. Compared with a single larger convolution kernel, stacking several small convolutions requires fewer parameters and provides more nonlinear transformations.
(2) The bottom-up path is the upsampling stage, where upsampling is performed with bilinear interpolation.
(3) To avoid losing context information when the feature maps output by VGG16 are directly upsampled, lateral connections are adopted: feature maps of the same spatial size from the top-down and bottom-up paths are fused, so that missing information is complemented and the feature representation after upsampling is stronger.
Step three, taking the 4 feature mapping layers obtained from the PixelLink feature extraction network, h4, h3, h2 and h1, upsampling them to the size of h4 and averaging their pixel values with the number of channels unchanged, namely feature fusion; the upsampling is bilinear interpolation. The formula of feature fusion is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)
wherein Up×2(·) and Up×4(·) denote 2-fold and 4-fold upsampling respectively;
(1) Owing to hardware constraints, the training picture size is 256×256, so h4 is 64×64, h3 is 32×32, h2 is 16×16 and h1 is 16×16.
Step four, taking the output of the feature fusion as the input of the depth pyramid attention network, further refining the features and utilizing the model more fully;
(1) The depth pyramid attention network is composed of a depth feature pyramid network branch, a nonlinear transformation branch and a global average pooling branch. The depth feature pyramid network branch is specially designed so that the features of the branches are not simply fused; instead, each part within the branch is further refined.
Step five, inputting the refined feature mapping layers into the PixelLink output network.
(1) This output network mainly comprises two parts. The first part predicts whether each pixel is text or non-text; the second part predicts whether the pixel and the 8 pixels around it belong to the same text instance. Positive pixels are joined by positive links to form connected components, and each component is one text instance;
Step six, finally, obtaining the final connected domains from the segmented text instances through minAreaRect in the OpenCV connected-domain method. Because this step is sensitive to noise and may predict noise as real text, several thresholds are set to reduce false positives: connected regions whose shortest side is less than 10 pixels or whose area is less than 300 pixels are regarded as false detections and filtered out automatically, and the bounding boxes are finally output. A post-processing sketch follows.
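A post-processing sketch with OpenCV, applying the thresholds stated above (the label-map layout is an assumption carried over from the decoding sketch):

```python
import cv2
import numpy as np

def boxes_from_labels(labels, min_side=10, min_area=300):
    """Fit a rotated rectangle to every text instance with
    cv2.minAreaRect and drop noisy detections whose shortest side is
    under 10 pixels or whose area is under 300 pixels."""
    boxes = []
    for inst in range(1, labels.max() + 1):
        ys, xs = np.nonzero(labels == inst)
        pts = np.column_stack((xs, ys)).astype(np.float32)
        (cx, cy), (w, h), angle = cv2.minAreaRect(pts)
        if min(w, h) < min_side or w * h < min_area:
            continue  # regarded as a false detection and filtered out
        boxes.append(cv2.boxPoints(((cx, cy), (w, h), angle)))
    return boxes  # each box: the 4 corner points of a bounding box
```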
The invention is characterized in that the refinement network is composed of two parts, feature fusion and the depth pyramid attention model: together they improve the utilization of the model, avoiding both the problem that many current text detection models are well designed in structure yet cannot be fully utilized, and the problem that long-range dependencies disappear as convolution, which is based on local receptive fields, deepens.
Embodiments of the present invention are described in detail below with reference to the accompanying drawings; the embodiments and their specific operating procedures are given on the premise of the technical solution of the invention, but the scope of protection of the invention is not limited to the following embodiments.
The datasets used in the experiments are ICDAR2015 and ICDAR2013. The ICDAR2015 dataset contains 1500 natural-scene pictures at a resolution of 1280×720, of which 1000 are training pictures and 500 are test pictures. Unlike the images of previous ICDAR competitions, these pictures were captured quite casually with Google Glass, and the text may be tilted or blurred, which is intended to increase the difficulty of detection.
ICDAR2013 contains 229 training pictures and 233 test pictures. The dataset is a subset of ICDAR2011, with the duplicate pictures of ICDAR2011 removed and incorrect image annotations repaired. It is widely used in text detection but contains only horizontal text.
The experiments were performed on a computer equipped with an Intel(R) Core i7-6700 CPU @ 3.40 GHz, running the Linux Ubuntu 14.04 operating system and PyCharm with Python 2.7. The deep learning framework is tensorflow-gpu 1.3.0, and the main required libraries are OpenCV 2, setproctitle and matplotlib.
ICDAR2015 experiment: when testing on ICDAR2015, the training pictures were input at 256×256 and the test pictures at a resolution of 1280×704. The evaluation criteria are the R, P and F values published for the ICDAR2015 challenge.
Table 1 shows the R, P and F values of the model of the invention and of PixelLink on the ICDAR2015 dataset:

Table 1. ICDAR2015 multi-directional text detection experimental results

Model                    Recall   Precision   F value
Model of the invention   0.7708   0.7595      0.7651
PixelLink                0.7299   0.7607      0.7450
ICDAR2013 experiment: in the ICDAR2013 experiments, the training pictures were input at 256×256 and the test pictures at a resolution of 384×384. The evaluation criteria are the R, P and F values published for the ICDAR2013 challenge.
Table 2 shows the R, P and F values of the model of the invention and of PixelLink on the ICDAR2013 dataset:

Table 2. ICDAR2013 horizontal text detection experimental results

Model                    Recall   Precision   F value
Model of the invention   0.8168   0.7041      0.7563
PixelLink                0.6919   0.7508      0.7201

Claims (1)

1. A natural scene text detection method based on depth pyramid attention and feature fusion is characterized by comprising the following steps:
step one, taking a text public data set related to a natural scene as a training sample;
step two, inputting the training samples into a preliminary feature extraction network in batches of 8 pictures, wherein the backbone is a VGG16 network adopting a U-Net structure; the preliminary feature extraction network is the feature extraction network of PixelLink;
step three, taking the 4 feature mapping layers obtained from the PixelLink feature extraction network, h4, h3, h2 and h1, upsampling them to the size of h4 and averaging their pixel values with the number of channels unchanged, which is called feature fusion; the upsampling is bilinear interpolation; the formula of feature fusion is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)
wherein Up×2(·) and Up×4(·) denote 2-fold and 4-fold upsampling respectively;
step four, taking the output of feature fusion as the input of a depth pyramid attention model, which further refines the features so that the model is utilized more fully;
the depth pyramid attention model consists of three branches: a depth feature pyramid network branch, a nonlinear transformation branch and a global average pooling branch; the depth feature pyramid network branch uses two 7×7 convolutions, two 5×5 convolutions and two 3×3 convolutions to extract information from different pyramid scales; convolutions with the same kernel size are connected in series, and convolutions with different kernel sizes are connected in parallel; the 7×7 convolution with BN and ReLU in the left half is labeled Conv7_1 and the 7×7 convolution with BN in the right half Conv7_2; similarly, the 5×5 convolution with BN and ReLU in the left half is labeled Conv5_1, the 5×5 convolution with BN in the right half Conv5_2, the 3×3 convolution with BN and ReLU in the left half Conv3_1, and the 3×3 convolution with BN in the right half Conv3_2; the refining process is as follows: the feature map after feature fusion first passes through Conv7_1, Conv5_1, Conv3_1 and Conv3_2 in turn; the feature map of Conv3_2 is then upsampled, superimposed pixel-wise with the feature map of Conv5_1, and the result is input to Conv5_2; finally, the feature map of Conv5_2 is upsampled, superimposed pixel-wise with the feature map of Conv7_1, and the result is input to Conv7_2; wherein the upsampling is a deconvolution with a 4×4 kernel and stride 2, using BN and ReLU activation functions;
step five, inputting the refined feature mapping layers into a PixelLink output network;
the PixelLink output network comprises two parts: the first part predicts whether a pixel is text; the second part predicts whether the pixel and the 8 pixels around it belong to the same text instance; positive pixels are joined by positive links to form connected components, and each component is one text instance;
step six, finally, obtaining the final connected domains from the segmented text instances through minAreaRect in the OpenCV connected-domain method; connected regions whose shortest side is less than 10 pixels or whose area is less than 300 pixels are regarded as false detections, the text regions are filtered out automatically, and the bounding boxes are finally output.
CN201911192949.5A 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion Active CN111062386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911192949.5A CN111062386B (en) 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911192949.5A CN111062386B (en) 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion

Publications (2)

Publication Number Publication Date
CN111062386A CN111062386A (en) 2020-04-24
CN111062386B true CN111062386B (en) 2023-12-29

Family

ID=70299270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911192949.5A Active CN111062386B (en) 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion

Country Status (1)

Country Link
CN (1) CN111062386B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753714B (en) * 2020-06-23 2023-09-01 中南大学 Multidirectional natural scene text detection method based on character segmentation
CN111898570A (en) * 2020-08-05 2020-11-06 盐城工学院 Method for recognizing text in image based on bidirectional feature pyramid network
CN112257708A (en) * 2020-10-22 2021-01-22 润联软件系统(深圳)有限公司 Character-level text detection method and device, computer equipment and storage medium
CN112613561B (en) * 2020-12-24 2022-06-03 哈尔滨理工大学 EAST algorithm optimization method
CN113744279B (en) * 2021-06-09 2023-11-14 东北大学 Image segmentation method based on FAF-Net network
CN113609892A (en) * 2021-06-16 2021-11-05 北京工业大学 Handwritten poetry recognition method integrating deep learning with scenic spot knowledge map
CN113743291B (en) * 2021-09-02 2023-11-07 南京邮电大学 Method and device for detecting texts in multiple scales by fusing attention mechanisms
CN113903022B (en) * 2021-09-23 2024-07-09 山东师范大学 Text detection method and system based on feature pyramid and attention fusion
CN115471831B (en) * 2021-10-15 2024-01-23 中国矿业大学 Image saliency detection method based on text reinforcement learning
CN113822232B (en) * 2021-11-19 2022-02-08 华中科技大学 Pyramid attention-based scene recognition method, training method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325534A (en) * 2018-09-22 2019-02-12 天津大学 A kind of semantic segmentation method based on two-way multi-Scale Pyramid
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679085B2 (en) * 2017-10-31 2020-06-09 University Of Florida Research Foundation, Incorporated Apparatus and method for detecting scene text in an image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN109325534A (en) * 2018-09-22 2019-02-12 天津大学 A kind of semantic segmentation method based on two-way multi-Scale Pyramid
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Scene text detection based on feature pyramids; Chang Yufei; Chen Xinpeng; Wang Yuanhang; Qian Bing; Journal of Information Engineering University (05); full text *
Automatic building recognition in high-resolution imagery combining dilated-convolution residual networks and pyramid pooling representation; Qiao Wenfan; Shen Li; Dai Yanshuai; Cao Yungang; Geography and Geo-Information Science (05); full text *

Also Published As

Publication number Publication date
CN111062386A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN111062386B (en) Natural scene text detection method based on depth pyramid attention and feature fusion
CN112232391B (en) Dam crack detection method based on U-net network and SC-SAM attention mechanism
CN104751142B A natural scene text detection method based on stroke features
CN109784372B (en) Target classification method based on convolutional neural network
CN112767418B (en) Mirror image segmentation method based on depth perception
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
CN110334709B (en) License plate detection method based on end-to-end multi-task deep learning
Hou et al. BSNet: Dynamic hybrid gradient convolution based boundary-sensitive network for remote sensing image segmentation
CN115063373A (en) Social network image tampering positioning method based on multi-scale feature intelligent perception
CN114266794B (en) Pathological section image cancer region segmentation system based on full convolution neural network
CN114742799B (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
Li et al. Towards photo-realistic visible watermark removal with conditional generative adversarial networks
CN111986164A (en) Road crack detection method based on multi-source Unet + Attention network migration
Niu et al. Defect attention template generation cycleGAN for weakly supervised surface defect segmentation
CN112132164B (en) Target detection method, system, computer device and storage medium
Chen et al. Single depth image super-resolution using convolutional neural networks
CN111914654A (en) Text layout analysis method, device, equipment and medium
CN112507876A (en) Wired table picture analysis method and device based on semantic segmentation
CN113591719A (en) Method and device for detecting text with any shape in natural scene and training method
Chen et al. Adaptive fusion network for RGB-D salient object detection
JP2012252691A (en) Method and device for extracting text stroke image from image
CN111222564A (en) Image identification system, method and device based on image channel correlation
CN107563963B (en) Super-resolution reconstruction method based on single depth map
Yu et al. Progressive refined redistribution pyramid network for defect detection in complex scenarios
Shao et al. Generative image inpainting with salient prior and relative total variation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant