CN113888505B - Natural scene text detection method based on semantic segmentation - Google Patents


Info

Publication number
CN113888505B
CN113888505B
Authority
CN
China
Prior art keywords
feature
network
size
output
map
Prior art date
Legal status
Active
Application number
CN202111157377.4A
Other languages
Chinese (zh)
Other versions
CN113888505A (en)
Inventor
张立和
隋国际
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202111157377.4A
Publication of CN113888505A
Application granted
Publication of CN113888505B
Legal status: Active
Anticipated expiration


Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06F 18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N 3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/048 — Neural networks; activation functions
    • G06N 3/08 — Neural networks; learning methods
    • G06T 7/11 — Segmentation; edge detection; region-based segmentation
    • G06T 7/13 — Segmentation; edge detection
    • G06T 2207/20081 — Special algorithmic details; training, learning
    • G06T 2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/20221 — Special algorithmic details; image combination; image fusion, image merging
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and deep learning, and provides a natural scene text detection method based on semantic segmentation. The method first constructs a feature extraction network; a feature selection module then screens effective information, and the screened multi-scale features are fused through a feature pyramid network structure. An edge enhancement network and a semantic segmentation network then produce a semantic segmentation result in which the edges of text regions are markedly strengthened, from which the boundary coordinates of the text regions are finally obtained. The invention realizes a fast, lightweight text detection model that not only detects text regions of various complex shapes against complex backgrounds, but also does so quickly and accurately.

Description

Natural scene text detection method based on semantic segmentation
Technical Field
The invention belongs to the technical field of artificial intelligence, involves deep learning and computer vision, and particularly relates to a natural scene text detection method.
Background
Text detection is an important step in enabling a computer to acquire key information about human society and to realize human-machine interaction; its goal is to let the computer quickly locate, as a human does, the text regions in its field of view that contain effective information. In a natural scene image, the region with the greatest information density is usually text, and the first step in acquiring that information is to find where the text is. By selecting the text regions that contain effective information, the computer acquires information more accurately and efficiently, and redundant computation and storage in later stages are reduced, improving the overall performance of image understanding. In general, an image contains both text regions carrying valid information and background regions of irrelevant content; understanding the image requires attending only to the valid information while ignoring the rest, and this separation of foreground from background is closely analogous to semantic segmentation in computer vision. It is therefore feasible to perform scene text detection by using a computer to simulate the human visual system.
Earlier text detection methods used traditional machine learning to statistically analyze the pixel distribution in an image. Such methods cannot fully exploit global information and merely traverse the image with a fixed algorithm, so neither their speed nor their accuracy is ideal. Deep learning based methods effectively address both problems. Early deep methods mainly used a neural network to regress the bounding-box information of text regions directly; limited by the expressive power of the network, such direct box regression can only detect simple text regions, and it performs poorly when the background and text are hard to separate or the text is curved. Semantic segmentation, by contrast, solves these problems well. First, thanks to the development of deep learning and the rapid growth of computing power, the speed at which a neural network processes an image can meet real-time requirements. Furthermore, semantic segmentation can accurately separate a target's foreground from its background, so detection remains possible even for targets with complex outlines, in complex scenes, and with complex text. By tracing the detected semantic information, the exact outline of a text region can be obtained, which makes extracting complex text from natural scenes more efficient.
Disclosure of Invention
The technical problem addressed by the invention is to overcome the shortcomings of current scene text detection by providing an edge-enhanced natural scene text detection method based on semantic segmentation, thereby achieving high-precision and high-efficiency detection.
The technical scheme of the invention is as follows:
a natural scene text detection method based on semantic segmentation comprises the following steps:
(1) Constructing a basic feature extraction network
The feature extraction network adopts a classical ResNet or MobileNet structure as its backbone; feature maps at 1/4, 1/8, 1/16 and 1/32 of the input image size are extracted from different layers as outputs, with 64, 128, 256 and 512 channels respectively;
(2) Construction of feature screening Module
The input of the feature screening module has two parts, i and h: i is an output feature of the feature extraction network, and h is the output feature of the previous feature screening module. The two parts are fused by convolution and then normalized with a sigmoid function; the normalized result serves as a weight with which the two inputs i and h are selectively fused, yielding the fused output feature. The whole operation is defined as follows:
S=sigmoid(conv3(conv1(h),conv2(i)))
out=conv4((1-S)·h+S·i)
where S denotes the normalized feature screening heat map, conv(x) denotes a sub-network consisting of convolution, batch normalization and ReLU activation, and out denotes the final output feature map, fixed at 64 channels. Note that the above operations also imply a channel transformation step;
(3) Construction of feature pyramid networks
The feature pyramid network fuses the outputs of the feature screening module. The module is used at 3 places in the network, but there is only one module instance; i.e., 1 module is reused at all 3 places. First, an atrous spatial pyramid pooling (ASPP) network expands the 1/32-size feature map output by the feature extraction network, giving a 1/32-size feature map res4. res4 is upsampled to 1/16 size and, together with the 1/16-size feature map output by the feature extraction network, fed to the feature screening module as its h and i inputs respectively; the module outputs a 1/16-size feature map res3. Repeating these steps yields res2 and res1, at 1/8 and 1/4 size respectively. Finally, res2, res3 and res4 are upsampled to the size of res1 and concatenated along the channel dimension, giving a 256-channel multi-scale fusion feature map;
(4) Constructing edge-enhanced networks
The edge enhancement network consists of 3 neural network layers: the first two layers consist of convolution, batch normalization and ReLU activation, and the last layer consists of convolution, bias and sigmoid activation. The result is a 1-channel edge enhancement heat map whose pixel values lie in [0,1]; the larger a value, the closer that pixel is to an edge;
(5) Constructing semantic segmentation networks
First, the 256-channel feature map output by the feature pyramid network and the 1-channel map output by the edge enhancement network are concatenated along the channel dimension, and the result is fed into a 3-layer convolutional neural network. The first 2 layers consist of upsampling, convolution, batch normalization and ReLU activation, where the upsampling uses bilinear interpolation to double the feature map size. The final layer uses convolution, bias and sigmoid activation to produce a 1-channel semantic segmentation heat map with values between 0 and 1. Using 0.7 as a threshold, the heat map is converted into a binary map containing only the values 0 and 1;
(6) Contour forming
The different text regions are separated from the binary map using OpenCV, and for each region the closed polygon of minimum perimeter enclosing it is computed; the vertex coordinates of this polygon are the position coordinates of the text region in the image. For a rectangular text region, the coordinates consist of 4 points; for other, irregular text regions, OpenCV determines the number of polygon vertices automatically.
(7) Training method
When a ResNet serves as the backbone structure, it is first pre-trained on the image classification dataset ImageNet and the pre-trained weight parameters are saved. The whole network is then warmed up on the synthetic dataset SynthText so that the model converges on the task scene. Finally, formal training is performed on the specific scene dataset. In addition, the OHEM algorithm is used in the loss function design for online hard example mining, balancing the area gap between foreground and background.
The invention has the following beneficial effects. It makes full use of the strong foreground-background discrimination of semantic segmentation algorithms and performs multi-scale feature extraction through a feature pyramid network, ensuring that both small and large text in the image can be detected effectively. The introduced information selection gate structure lets the upsampling and feature fusion stages propagate and output only effective information, removing redundant information from the network. In addition, the compatibility of the semantic segmentation algorithm and the contour forming algorithm in handling irregular regions guarantees the scheme's ability to detect irregular text regions accurately.
Drawings
Fig. 1 illustrates the multi-scale feature extraction network. The top row represents the feature extraction backbone, whose progressively smaller boxes represent the progressively smaller extracted feature maps. The middle row represents the feature filter gates with their two inputs, and ASPP denotes the atrous spatial pyramid pooling network. The next row of differently sized boxes represents the extracted feature maps at different scales. Finally, an upsampling step aggregates the feature maps together;
FIG. 2 shows the internal structure of the feature filter gate; conv(x) denotes several convolutional layers, × denotes pixel-wise multiplication, and + denotes pixel-wise addition;
FIG. 3 is a schematic diagram of the edge enhancement network, the semantic segmentation network and the binarization process, where conv(x) denotes several convolutional layers;
FIG. 4 is the ground-truth map for the edge enhancement structure. Of the three lines, the innermost represents the boundary after the text outline has been shrunk to 0.5 times its original size, and all pixel values inside it are set to 0. The outermost boundary represents the text outline enlarged to 1.25 times its original size, and all pixel values outside it are set to 0. The middle black line, with value 1, represents the original boundary; pixel values between the three boundary lines are linearly interpolated (a sketch of this ground-truth construction follows the figure list);
fig. 5 is an input image example;
FIG. 6 is a semantic segmentation result example;
fig. 7 is a frame example of a text region.
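To make the fig. 4 construction concrete, the following sketch generates such a ground-truth map for one text polygon. Scaling the polygon about its centroid to obtain the 0.5× and 1.25× outlines, and normalizing the distance to the original boundary across the band, are simplifying assumptions about details the text does not specify.

```python
import cv2
import numpy as np

def edge_ground_truth(shape, polygon):
    """Sketch of the fig. 4 ground truth: 1 on the original text boundary,
    decaying roughly linearly to 0 at the 0.5x-shrunk and 1.25x-enlarged
    outlines.  Centroid scaling and distance normalization are assumptions."""
    h, w = shape
    gt = np.zeros((h, w), np.float32)
    centroid = polygon.mean(axis=0)
    inner = (centroid + 0.50 * (polygon - centroid)).astype(np.int32)
    outer = (centroid + 1.25 * (polygon - centroid)).astype(np.int32)
    band = np.zeros((h, w), np.uint8)              # region between the outlines
    cv2.fillPoly(band, [outer], 1)
    cv2.fillPoly(band, [inner], 0)
    edge = np.zeros((h, w), np.uint8)              # the original boundary itself
    cv2.polylines(edge, [polygon.astype(np.int32)], True, 1)
    # distance of each band pixel to the original boundary, normalized so the
    # value is 1 on the boundary and falls toward 0 across the band
    dist = cv2.distanceTransform(1 - edge, cv2.DIST_L2, 3)
    in_band = band == 1
    gt[in_band] = np.clip(1.0 - dist[in_band] / max(dist[in_band].max(), 1e-6), 0, 1)
    return gt
```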
Detailed Description
The embodiments of the invention are described further below with reference to the drawings and the technical scheme.
A natural scene text detection method based on semantic segmentation comprises the following steps:
(1) Constructing a basic feature extraction network
The feature extraction network adopts a ResNet structure as its backbone, shown as the upper row conv(x) in fig. 1. Its input is a 3-channel RGB image, as shown in fig. 5. Feature maps at 1/4, 1/8, 1/16 and 1/32 of the input image size are extracted from layers 4, 6, 9 and 13 of the ResNet as outputs, with 64, 128, 256 and 512 channels respectively;
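As an illustration, the following PyTorch sketch shows one way such a backbone could expose the four feature maps. It wraps torchvision's resnet18, whose four stages natively output 64/128/256/512 channels at 1/4 to 1/32 scale; the wrapper, its use of torchvision stage boundaries rather than the layer indices 4/6/9/13 named above, and the example input size are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class Backbone(nn.Module):
    """Hypothetical wrapper exposing 1/4, 1/8, 1/16, 1/32 feature maps."""
    def __init__(self):
        super().__init__()
        net = resnet18()  # ImageNet pre-training is loaded separately, per step (7)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        c1 = self.layer1(x)   # 1/4 of input size, 64 channels
        c2 = self.layer2(c1)  # 1/8, 128 channels
        c3 = self.layer3(c2)  # 1/16, 256 channels
        c4 = self.layer4(c3)  # 1/32, 512 channels
        return c1, c2, c3, c4

# Example: a 3-channel RGB image as in fig. 5
feats = Backbone()(torch.randn(1, 3, 640, 640))
print([f.shape for f in feats])  # spatial sizes 160, 80, 40, 20
```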
(2) Construction of feature screening Module
As shown in fig. 2, the inputs of the feature screening module are i and h: i is an output feature of the feature extraction network, and h is the output feature of the previous feature screening module. The two parts are fused by convolution and then normalized with a sigmoid function; the normalized result serves as a weight with which the inputs i and h are selectively fused, yielding the fused output feature. The whole operation is defined as follows:
S=sigmoid(conv3(conv1(h),conv2(i)))
out=conv4((1-S)·h+S·i)
where S is the normalized feature screening heat map and out is the final output feature map, which has 64 channels and the same spatial size as i and h;
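A minimal PyTorch sketch of this gate follows. The equations leave open how conv1(h) and conv2(i) are combined before conv3; pixel-wise addition is assumed here based on fig. 2, and conv2 doubles as the implied channel transform of i.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, k=3):
    """conv(x): convolution + batch normalization + ReLU, as in the text."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class FeatureScreening(nn.Module):
    """Sketch of S = sigmoid(conv3(conv1(h), conv2(i))) and
    out = conv4((1-S)*h + S*i).  Summing the two branches before conv3
    is an assumption based on the pixel-addition symbol in fig. 2."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = conv_bn_relu(ch, ch)
        self.conv2 = conv_bn_relu(ch, ch)      # also the implied channel transform
        self.conv3 = nn.Conv2d(ch, 1, 3, padding=1)
        self.conv4 = conv_bn_relu(ch, ch)

    def forward(self, h, i):
        i_t = self.conv2(i)
        s = torch.sigmoid(self.conv3(self.conv1(h) + i_t))  # screening heat map
        return self.conv4((1 - s) * h + s * i_t)            # selective fusion
```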
(3) Construction of feature pyramid networks
The feature pyramid network fuses the outputs of the feature screening module. As shown in fig. 1, the module is used at 3 places in the network, but there is only one module instance; i.e., 1 module is reused at all 3 places. First, an atrous spatial pyramid pooling (ASPP) network expands the 1/32-size feature map output by the feature extraction network, giving a 1/32-size feature map res4. res4 is upsampled to 1/16 size and, together with the 1/16-size feature map output by the feature extraction network, fed to the feature screening module as its h and i inputs respectively; the module outputs a 1/16-size feature map res3. Repeating these steps yields res2 and res1, at 1/8 and 1/4 size respectively. Finally, res2, res3 and res4 are upsampled to the size of res1 and concatenated along the channel dimension, giving a 256-channel multi-scale fusion feature map;
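The fusion loop could look like the following sketch. Because a single gate instance is shared across all 3 places, hypothetical 1×1 lateral convolutions first bring each backbone map to the gate's fixed 64 channels; the patent only says a channel transform is implied, so this placement is an assumption, as is the internal design of the ASPP block (any module mapping 512 to 64 channels serves for the sketch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusion(nn.Module):
    """Sketch of step (3): one shared FeatureScreening gate reused 3 times."""
    def __init__(self, gate, aspp, backbone_chs=(64, 128, 256)):
        super().__init__()
        self.gate, self.aspp = gate, aspp  # aspp: 512 -> 64 channels (assumed)
        # hypothetical 1x1 laterals so the shared gate always sees 64 channels
        self.lateral = nn.ModuleList(nn.Conv2d(c, 64, 1) for c in backbone_chs)

    def forward(self, c1, c2, c3, c4):
        up = lambda x, ref: F.interpolate(x, size=ref.shape[2:],
                                          mode='bilinear', align_corners=False)
        res4 = self.aspp(c4)                                 # 1/32 size
        res3 = self.gate(up(res4, c3), self.lateral[2](c3))  # 1/16
        res2 = self.gate(up(res3, c2), self.lateral[1](c2))  # 1/8
        res1 = self.gate(up(res2, c1), self.lateral[0](c1))  # 1/4
        # upsample all maps to res1's size and concatenate: 4 x 64 = 256 channels
        return torch.cat([res1, up(res2, c1), up(res3, c1), up(res4, c1)], 1)
```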
(4) Constructing edge-enhanced networks
The edge enhancement network consists of 3 neural network layers: the first two layers consist of convolution, batch normalization and ReLU activation, and the last layer consists of convolution, bias and sigmoid activation. The result is a 1-channel edge enhancement heat map whose pixel values lie in [0,1]; the larger a value, the closer that pixel is to an edge. FIG. 4 illustrates the distribution of pixel values at text edge locations in the heat map;
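A sketch of this 3-layer head, assuming a 256-channel input and 64-channel intermediate widths (the widths are not specified above):

```python
import torch.nn as nn

def edge_head(cin=256):
    """Sketch of the edge enhancement network: two conv+BN+ReLU layers,
    then conv + bias + sigmoid, producing a 1-channel heat map in [0,1]."""
    return nn.Sequential(
        nn.Conv2d(cin, 64, 3, padding=1, bias=False),
        nn.BatchNorm2d(64), nn.ReLU(inplace=True),
        nn.Conv2d(64, 64, 3, padding=1, bias=False),
        nn.BatchNorm2d(64), nn.ReLU(inplace=True),
        nn.Conv2d(64, 1, 3, padding=1, bias=True),  # final layer keeps its bias
        nn.Sigmoid(),
    )
```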
(5) Constructing semantic segmentation networks
First, the 256-channel feature map output by the feature pyramid network and the 1-channel map output by the edge enhancement network are concatenated along the channel dimension, and the result is fed into a 3-layer convolutional neural network. The first 2 layers consist of upsampling, convolution, batch normalization and ReLU activation, where the upsampling uses bilinear interpolation to double the feature map size. The final layer uses convolution, bias and sigmoid activation to produce a 1-channel semantic segmentation heat map with values between 0 and 1. Using 0.7 as a threshold, the heat map is converted into a binary map containing only the values 0 and 1, as shown in fig. 6, where the black areas mark the positions of text and the white area is background;
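A sketch of the segmentation head and binarization, assuming the concatenated input has 256+1=257 channels and enters at 1/4 resolution, so that two 2× upsamplings restore full resolution:

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Sketch of step (5): two upsample+conv+BN+ReLU blocks, each doubling
    the spatial size, then conv + bias + sigmoid and a 0.7 threshold."""
    def __init__(self, cin=257):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                nn.Conv2d(ci, co, 3, padding=1, bias=False),
                nn.BatchNorm2d(co), nn.ReLU(inplace=True))
        self.body = nn.Sequential(block(cin, 64), block(64, 64),
                                  nn.Conv2d(64, 1, 3, padding=1, bias=True),
                                  nn.Sigmoid())

    def forward(self, pyramid, edge):
        heat = self.body(torch.cat([pyramid, edge], dim=1))  # heat map in (0,1)
        return heat, (heat > 0.7).float()                    # binary map at 0.7
```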
(6) Contour forming
The different text regions are separated from the binary map using OpenCV, and for each region the closed polygon of minimum perimeter enclosing it is computed; the vertex coordinates of the polygon are the position coordinates of the text region in the image. In fig. 6, 3 text regions in total are detected by semantic segmentation and binarization, and in fig. 7 the border of each text region is derived from the binary map using OpenCV. For the 3 rectangular text regions in fig. 7, OpenCV outputs the coordinates of 4 vertices each; these coordinate points are taken as the text region coordinates. For other, irregular text regions, OpenCV likewise determines the number of polygon vertices automatically.
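One common OpenCV realization of this step is sketched below; cv2.approxPolyDP with a small epsilon is used as a stand-in for the minimum-perimeter polygon described above, and the epsilon factor and minimum-area filter are assumptions.

```python
import cv2
import numpy as np

def extract_text_polygons(binary_map, min_area=10):
    """Sketch of step (6): separate regions in the binary map and return an
    enclosing polygon, as a (num_vertices, 2) array, for each region."""
    mask = (binary_map * 255).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for cnt in contours:
        if cv2.contourArea(cnt) < min_area:
            continue  # drop speckle noise
        eps = 0.01 * cv2.arcLength(cnt, closed=True)  # epsilon factor assumed
        poly = cv2.approxPolyDP(cnt, eps, closed=True)
        polygons.append(poly.reshape(-1, 2))
    return polygons
```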
(7) Training method
A ResNet is used as the backbone network; it is pre-trained on the image classification dataset ImageNet and the pre-trained weight parameters are saved. The whole network is then pre-trained on the synthetic dataset SynthText so that the model converges on the task scene, and final formal training is performed on the specific scene dataset. In addition, the OHEM algorithm is used in the loss function design to balance positive and negative samples, compensating for the area gap between foreground and background. The network optimizer is Adam with a batch size of 8 and an exponentially decaying learning rate: the initial learning rate is 0.0001 and is multiplied by 0.95 every 10,000 iterations, for 100,000 iterations in total.
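The loss balancing and schedule described above could be sketched as follows; the 3:1 negative-to-positive ratio is an assumption (the text only names OHEM), and the model line is a stand-in for the full network.

```python
import torch
import torch.nn.functional as F

def ohem_bce_loss(pred, gt, neg_ratio=3):
    """Sketch of the OHEM-balanced loss: keep all positive (text) pixels and
    only the hardest negatives, at an assumed negative:positive ratio."""
    loss = F.binary_cross_entropy(pred, gt, reduction='none')
    pos = gt > 0.5
    n_neg = min(int(pos.sum()) * neg_ratio, int((~pos).sum()))
    neg_loss = (loss[~pos].topk(n_neg).values.sum()
                if n_neg > 0 else loss.new_tensor(0.0))
    return (loss[pos].sum() + neg_loss) / (int(pos.sum()) + n_neg + 1e-6)

# Optimizer and schedule as described: Adam, lr 1e-4, x0.95 every 10k iters.
model = torch.nn.Conv2d(3, 1, 3, padding=1)  # stand-in for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10_000,
                                            gamma=0.95)
# each of the 100k iterations: loss.backward(); optimizer.step(); scheduler.step()
```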

Claims (1)

1. A natural scene text detection method based on semantic segmentation is characterized by comprising the following steps:
(1) Constructing a basic feature extraction network
The feature extraction network adopts a ResNet or MobileNet structure as its backbone; feature maps at 1/4, 1/8, 1/16 and 1/32 of the input image size are extracted from different layers as outputs, with 64, 128, 256 and 512 channels respectively;
(2) Construction of feature screening Module
The input of the feature screening module has two parts, i and h: i is an output feature of the feature extraction network, and h is the output feature of the previous feature screening module; the two parts are fused by convolution and then normalized with a sigmoid function; the normalized result serves as a weight with which the two inputs i and h are selectively fused, yielding the fused output feature; the whole operation is defined as follows:
S=sigmoid(conv3(conv1(h),conv2(i)))
out=conv4((1-S)·h+S·i)
wherein S denotes the normalized feature screening heat map; conv(x) denotes a sub-network consisting of convolution, batch normalization and ReLU activation; out denotes the final output feature map, fixed at 64 channels; a channel transformation step is also implied in the above operations;
(3) Construction of feature pyramid networks
The feature pyramid network fuses the outputs of the feature screening module; the module is used at 3 places in the feature pyramid network, but there is only one module instance, i.e. 1 module is reused at all 3 places; first, a pyramid pooling network expands the 1/32-size feature map output by the feature extraction network, giving a 1/32-size feature map res4; res4 is upsampled to 1/16 size and, together with the 1/16-size feature map output by the feature extraction network, fed to the feature screening module as its h and i inputs respectively, the module outputting a 1/16-size feature map res3; repeating these steps yields res2 and res1, at 1/8 and 1/4 size respectively; finally, res2, res3 and res4 are upsampled to the size of res1 and concatenated along the channel dimension, giving a 256-channel multi-scale fusion feature map;
(4) Constructing edge-enhanced networks
The edge enhancement network consists of 3 neural network layers: the first two layers consist of convolution, batch normalization and ReLU activation, and the last layer consists of convolution, bias and sigmoid activation; the result is a 1-channel edge enhancement heat map whose pixel values lie in [0,1], where a larger value means the pixel is closer to an edge;
(5) Constructing semantic segmentation networks
First, the 256-channel feature map output by the feature pyramid network and the 1-channel feature map output by the edge enhancement network are concatenated along the channel dimension, and the result is fed into a 3-layer convolutional neural network; the first 2 layers consist of upsampling, convolution, batch normalization and ReLU activation, where the upsampling uses bilinear interpolation to double the feature map size; the final layer uses convolution, bias and sigmoid activation to produce a 1-channel semantic segmentation heat map with values between 0 and 1; using 0.7 as a threshold, the heat map is converted into a binary map containing only the values 0 and 1;
(6) Contour forming
The different text regions are separated from the binary map using OpenCV, and for each region the closed polygon of minimum perimeter enclosing it is computed, the vertex coordinates of the polygon being the position coordinates of the text region in the image; for a rectangular text region, the coordinates consist of 4 points; for other, irregular text regions, OpenCV determines the number of polygon vertices automatically;
(7) Training method
A ResNet is used as the backbone network; it is pre-trained on the image classification dataset ImageNet and the pre-trained weight parameters are saved; the whole network is then pre-trained on the synthetic dataset SynthText so that the model converges on the task scene; finally, formal training is performed on the specific scene dataset; in addition, the OHEM algorithm is used in the loss function design to balance positive and negative samples, compensating for the area difference between foreground and background; the network optimizer is Adam with a batch size of 8 and an exponentially decaying learning rate, the initial learning rate being 0.0001 and multiplied by 0.95 every 10,000 iterations, for 100,000 iterations in total.
CN202111157377.4A 2021-09-30 2021-09-30 Natural scene text detection method based on semantic segmentation Active CN113888505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111157377.4A CN113888505B (en) 2021-09-30 2021-09-30 Natural scene text detection method based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111157377.4A CN113888505B (en) 2021-09-30 2021-09-30 Natural scene text detection method based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN113888505A CN113888505A (en) 2022-01-04
CN113888505B (en) 2024-05-07

Family

ID=79004733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111157377.4A Active CN113888505B (en) 2021-09-30 2021-09-30 Natural scene text detection method based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN113888505B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399710A (en) * 2022-01-06 2022-04-26 昇辉控股有限公司 Identification detection method and system based on image segmentation and readable storage medium
CN114092930B (en) * 2022-01-07 2022-05-03 中科视语(北京)科技有限公司 Character recognition method and system


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097049A * 2019-04-03 2019-08-06 Institute of Computing Technology, Chinese Academy of Sciences Natural scene text detection method and system
CN110322495A * 2019-06-27 2019-10-11 University of Electronic Science and Technology of China Scene text segmentation method based on weakly supervised deep learning
CN111553351A * 2020-04-26 2020-08-18 Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute Text detection method for arbitrary scene shapes based on semantic segmentation
CN112966691A * 2021-04-14 2021-06-15 Chongqing University of Posts and Telecommunications Multi-scale text detection method and device based on semantic segmentation, and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang. Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation. arXiv, 2019. Full text. *
Text detection in natural scenes based on a lightweight network; Sun Jingjing; Zhang Qinglin; Electronic Measurement Technology; 2020-04-23 (No. 08); full text *

Also Published As

Publication number Publication date
CN113888505A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN110287849B (en) Lightweight depth network image target detection method suitable for raspberry pi
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
CN109034210A Object detection method based on super feature fusion and multi-scale pyramid network
CN113888505B (en) Natural scene text detection method based on semantic segmentation
CN111612008A (en) Image segmentation method based on convolution network
CN113807355A (en) Image semantic segmentation method based on coding and decoding structure
CN111046917B (en) Object-based enhanced target detection method based on deep neural network
CN111797841B (en) Visual saliency detection method based on depth residual error network
CN110532946A Method for identifying axle types of green-channel vehicles based on convolutional neural networks
CN115620010A (en) Semantic segmentation method for RGB-T bimodal feature fusion
CN113706545A (en) Semi-supervised image segmentation method based on dual-branch nerve discrimination dimensionality reduction
CN111353544A Target detection method based on improved Mixed Pooling-YOLOv3
CN114820579A (en) Semantic segmentation based image composite defect detection method and system
CN110852330A (en) Behavior identification method based on single stage
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN114943876A (en) Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium
CN113516126A (en) Adaptive threshold scene text detection method based on attention feature fusion
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
Zhang et al. R2net: Residual refinement network for salient object detection
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN113408524A (en) Crop image segmentation and extraction algorithm based on MASK RCNN
CN107766838B (en) Video scene switching detection method
CN115578721A (en) Streetscape text real-time detection method based on attention feature fusion
CN112861860B (en) Text detection method in natural scene based on upper and lower boundary extraction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant