CN110751154B - Complex environment multi-shape text detection method based on pixel-level segmentation - Google Patents

Complex environment multi-shape text detection method based on pixel-level segmentation

Info

Publication number
CN110751154B
CN110751154B (application CN201910929393.7A)
Authority
CN
China
Prior art keywords
text
image
pixel
segmentation
fused
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910929393.7A
Other languages
Chinese (zh)
Other versions
CN110751154A (en)
Inventor
袁媛
王琦
陈旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201910929393.7A priority Critical patent/CN110751154B/en
Publication of CN110751154A publication Critical patent/CN110751154A/en
Application granted granted Critical
Publication of CN110751154B publication Critical patent/CN110751154B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image


Abstract

The invention provides a complex environment multi-shape text detection method based on pixel-level segmentation. First, the images in the dataset are preprocessed, including augmentation, to expand the dataset and to generate labels of different sizes; next, a complex-environment text segmentation model based on a fully convolutional network is constructed and trained; finally, the trained model performs text detection on a given image. The method can detect text of various shapes, including curved text, effectively handles text at different scales, is robust to illumination changes and complex backgrounds, and achieves high detection precision and recall.

Description

Complex environment multi-shape text detection method based on pixel-level segmentation
Technical Field
The invention belongs to the technical field of computer vision and graphic processing, and particularly relates to a complex environment multi-shape text detection method based on pixel-level segmentation.
Background
Text recognition consists of two specific steps, text detection and character recognition, neither of which can be omitted, and detection is the prerequisite for recognition. Text detection is not a simple task, and detection in complex scenes is especially challenging. Yet text recognition in natural scenes is important for intelligent transportation, autonomous driving, picture translation and the like, and because of this strong application value it is also a research hotspot in the field of computer vision.
Text in natural scenes is highly complex, varying in tilt angle, language, arrangement, scale and font. Shooting conditions add further difficulty: brightness changes, blur, or deformation of the text caused by the capture conditions all increase the complexity of natural-scene text and make detection harder. Since traditional methods have difficulty coping with such complexity, machine learning methods have increasingly been applied to text detection in recent years.
Scene text detection methods based on deep learning mainly build on convolutional neural networks and fall roughly into two categories. The first is regression-based methods, typically built on a generic object detection framework. For example, "J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue, 'Arbitrary-oriented scene text detection via rotation proposals,' IEEE Transactions on Multimedia, vol. 20, no. 11, pp. 3111-3122, 2018" proposes the RRPN method, which generates rotated candidate regions based on the Faster R-CNN region proposal network (RPN) to detect text in any orientation. The second is segmentation-based methods, primarily built on fully convolutional networks (FCNs). For example, "D. Deng, H. Liu, X. Li, and D. Cai, 'PixelLink: Detecting scene text via instance segmentation,' Proc. AAAI Conference on Artificial Intelligence, 2018" proposes the PixelLink method, which classifies pixels as text/non-text and predicts pixel links between different text instances, then performs connected-component analysis and merging to obtain the final text boxes.
Building on generic detection, these methods overcome the difficulty traditional methods have with oblique text. They still have limitations, however, such as the inability to cope effectively with text exhibiting large curvature or large scale variation.
Disclosure of Invention
To overcome the defects that existing text detection methods cannot handle text with large curvature or scale variation and cannot correctly separate multi-line text, the invention provides a complex environment multi-shape text detection method based on pixel-level segmentation. First, the images in the dataset are preprocessed, including augmentation, to expand the dataset and to generate labels of different sizes; next, a complex-environment text segmentation model based on a fully convolutional network is constructed and trained; finally, the trained model performs text detection on a given image. The method can detect text of various shapes, including curved text, effectively handles text at different scales, is more robust to illumination changes and complex backgrounds, and achieves higher detection precision and recall.
A complex environment multi-shape text detection method based on pixel level segmentation is characterized by comprising the following steps:
step 1, data preprocessing:
All images in the dataset are augmented, and the augmented images are merged with the original dataset into a new image dataset. The text region labels of each image in the new dataset are shrunk to 1/2 and 1/4 of their original size, and together with the original labels this yields three groups of labels. The augmentation comprises image rotation, brightness adjustment and scaling.
Step 2, constructing and training a complex environment text segmentation model based on a full convolution network:
step 2.1: input the samples into a ResNet50 network and extract the outputs of the pool2, pool3, pool4 and pool5 layers to obtain 4 features of different scales, denoted f_1, f_2, f_3 and f_4 from the smallest scale to the largest;
step 2.2: pass the smallest-scale feature f_1 through an upsampling layer, concatenate it with f_2, and input the concatenated features into a feature fusion module to obtain a first fused feature; pass the first fused feature through an upsampling layer, concatenate it with f_3, and pass the concatenated features through a feature fusion module to obtain a second fused feature; pass the second fused feature through an upsampling layer, concatenate it with f_4, and pass the concatenated features through a feature fusion module to finally obtain features fusing all 4 scales; the feature fusion module consists of a convolution layer with a 3×3 kernel, a Batch Normalization layer and a ReLU layer;
step 2.3: input the finally fused features of step 2.2 into a convolution layer with a 1×1 kernel followed by a Sigmoid activation layer to obtain a pixel-level segmentation image;
step 2.4: train the model of steps 2.1 to 2.3 against the image labels, using cross entropy as the loss function to compute the loss value; training with the three groups of different labels yields three segmentation models;
step 3, text detection:
step 3.1: input the text image to be detected into each of the three segmentation models obtained in step 2 and binarize the outputs, obtaining three segmentation results A_1, A_2 and A_3, corresponding respectively to the 1/4, 1/2 and original-size text region segmentation images;
step 3.2: perform connected-component analysis on A_1 and mark different connected components with different positive integers; superpose the marked image on A_2, perform connected-component analysis on the superposed image, and apply region removal and region expansion to obtain a 1/2-sized segmentation image A'_2; superpose A'_2 on A_3, perform connected-component analysis on the superposed image, and apply region removal and region expansion to obtain the final original-size segmentation image A'_3; region removal means that, for a connected region whose maximum value is 1, all pixel values are set to 0; after region removal, each remaining pixel of value 1 is set to the value of the nearest pixel whose value is neither 0 nor 1;
step 3.3: process the segmentation image A'_3 with an OpenCV contour detection function to obtain the contour point coordinates of the different text regions.
The invention has the beneficial effects that: because the network model fuses features of different scales, the method detects text of various sizes well. Because an image segmentation approach is adopted, not only rectangular text regions but also irregular text such as curved text can be detected well. Because the text core region is expanded, multi-line text in dense regions can be separated well, and compared with direct segmentation, overlapping text regions are also separated well, which reduces the false detection rate. The deep network of the method can cope with text detection tasks in complex backgrounds, with higher detection accuracy and better robustness.
Drawings
FIG. 1 is a flowchart of the complex environment multi-shape text detection method of the present invention
FIG. 2 is a diagram of the complex environment text segmentation model of the present invention
Detailed Description
The present invention is further described below with reference to the drawings and an embodiment; the invention includes, but is not limited to, the following embodiment.
As shown in fig. 1, the present invention provides a method for detecting a multi-shape text in a complex environment, which is implemented as follows:
1. data pre-processing
Step 1.1: images in the used ICDAR2015 and Total-Text datasets, whose pictures carry Text region labels, are first data enhanced to fit complex scenes. The image in the data set is subjected to image data enhancement through a combination of rotation, brightness adjustment and a scaling mode, in the embodiment, the rotation angle is randomly generated from-90 degrees to 90 degrees, the brightness adjustment mode is that the brightness is randomly increased or decreased by 50%, and the scaling mode is that the brightness is randomly scaled 1/2 to 2 times. After the images are subjected to the enhancement processing, the processed images are combined into the original data set to obtain an expanded image data set, and the expanded image data set is used for training samples of a subsequent feature learning algorithm to deal with light changes and shooting angle changes in a complex environment.
Step 1.2: and respectively reducing the text region labels of each image in the new data set to 1/2 and 1/4, and adding the original labels to each image to obtain three groups of labels with different sizes. The method specifically comprises the following steps: firstly, generating an image (the size is the size of an original image) with all pixel values of 0, filling a text region of a label with 1 by using an Opencv polygon filling algorithm, then respectively corroding the text region with the widths of 1/4 and 3/8 (namely the minimum value of the distances between four corner points) by using an Opencv corrosion algorithm, so that a new text label is changed into 1/2 and 1/4 of the original size, and adding the original label to obtain three groups of pixel-level segmentation labels with different sizes.
2. Construction and training of complex environment text segmentation model based on full convolution network
As shown in FIG. 2, the model is constructed and trained as follows:
Step 2.1: Construct a multi-scale feature extractor based on a Feature Pyramid Network (FPN). Using ResNet50 as the backbone network, generate a feature pyramid and take the features output by the four layers pool2, pool3, pool4 and pool5, denoted f_1, f_2, f_3 and f_4 from the smallest scale to the largest.
The ResNet50 network is described in "Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition [A]. IEEE Conference on Computer Vision and Pattern Recognition [C]. 2016."
Step 2.2: f_1 passes through an upsampling layer (bilinear upsampling, doubling the scale) and is concatenated with f_2. The concatenated features pass through a feature fusion module, consisting of a convolution layer with a 3×3 kernel, a Batch Normalization layer and a ReLU layer, to obtain features fusing f_1 and f_2. Likewise, the resulting features pass through an upsampling layer, are concatenated with f_3 and pass through the feature fusion module, fusing f_1, f_2 and f_3. The new features pass through an upsampling layer, are concatenated with f_4 and pass through the feature fusion module, finally yielding the features obtained by fusing all 4 scales.
Step 2.3: the fused transformation features are segmented using convolution layers (Conv1x1) with convolution kernel size 1x1 and Sigmoid function activation layers, and segmented images with pixel values of 0 to 1 are output, corresponding to the confidence of each pixel in the detection region.
Step 2.4: and inputting the labeled image to train the model. The cross entropy is used as a loss function to calculate a loss value, the learning rate is set to be 0.001, the batch size is set to be 32, and the model is trained by using a stochastic gradient descent method. And respectively obtaining three text segmentation models for the three groups of different labels.
3. Text detection
Step 3.1: the text image to be detected is input into the above to obtain three segmentation models, the output is subjected to binarization processing, and three segmentation results, namely segmentation images of 1/4 size, 1/2 size and original size of each text region are respectively obtained and are respectively represented as A _1, A _2 and A _ 3. The binarization threshold value is set to 0.6 in this embodiment.
Step 3.2: and performing connected component analysis on A _1, and marking different connected components (the marking method is to set all pixel values in the components to different positive integers). The resulting image is superimposed with a _2, i.e. the value of each pixel is added, and connected domain analysis is performed. Setting all the values to be 0 for the connected region with the maximum value of 1, and removing the text region with lower credibility; for each remaining pixel with a value of 1, the value of the pixel with a value other than 0 or 1 closest to the pixel is set, resulting in an example segmented image that extends to a size of 1/2. Similarly, the obtained image is superposed with the segmentation result of a _3, and the same operation as the above process is performed, and finally the text instance segmentation image is obtained.
Step 3.3: and processing the segmented image obtained in the last step by using an OpenCV contour detection function to obtain contour point coordinates of different text areas, namely the required final output result.
To verify the effectiveness of the method of the present invention, simulation experiments were performed under an Ubuntu 18.04 LTS system on a machine with an Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz, 64 GB of memory and a GeForce 1080Ti GPU, using the PyTorch framework. The experiments used the public dataset ICDAR2015, which contains oblique text, and the public dataset Total-Text, which contains curved text.
First, features are learned on the training set following the training steps of the detailed description; then the images in the test set are detected following the detection steps, and, against the ground-truth annotations, the precision P (the accuracy of the detection results), the recall R (the proportion of existing text regions that are detected) and the F value are computed. The F value combines precision and recall; the larger it is, the better the method performs.
For comparison, the Connectionist Text Proposal Network (CTPN) ("Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, 'Detecting text in natural image with connectionist text proposal network,' in ECCV, 2016"), SegLink ("B. Shi, X. Bai, and S. Belongie, 'Detecting oriented text in natural images by linking segments,' in CVPR, 2017") and the Rotation Region Proposal Network (RRPN) ("J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue, 'Arbitrary-oriented scene text detection via rotation proposals,' IEEE Transactions on Multimedia, 2018") were selected as comparison methods, and the results are given in Tables 1 and 2. The results show that the method detects both oblique and curved text well; in particular, its results on curved text far exceed those of the other methods, demonstrating good practicality and robustness for detecting complex text in natural environments.
TABLE 1 (ICDAR2015)

Method                   Recall    Precision   F value
CTPN                     51.56%    74.22%      60.85%
SegLink                  76.8%     73.1%       75.0%
RRPN                     73.0%     82.0%       77.0%
Method of the invention  73.62%    79.81%      76.6%

TABLE 2 (Total-Text)

Method                   Recall    Precision   F value
CTPN                     20.7%     28.6%       24.0%
SegLink                  23.8%     30.3%       26.7%
RRPN                     36.2%     40.2%       38.09%
Method of the invention  69.54%    77.02%      73.09%

Claims (1)

1. A complex environment multi-shape text detection method based on pixel level segmentation is characterized by comprising the following steps:
step 1, data preprocessing:
respectively carrying out enhancement processing on all images in the data set, and combining the images subjected to enhancement processing and the images in the original data set into a new image data set; respectively reducing the text region labels of each image in the new data set to 1/2 and 1/4, and adding the original labels to obtain three groups of labels; the enhancement processing comprises image rotation, brightness adjustment and scaling processing;
step 2, constructing and training a complex environment text segmentation model based on a full convolution network:
step 2.1: inputting the samples into a ResNet50 network, and extracting the outputs of the pool2, pool3, pool4 and pool5 layers to obtain 4 features of different scales, denoted f_1, f_2, f_3 and f_4 from the smallest scale to the largest;
step 2.2: passing the smallest-scale feature f_1 through an upsampling layer and concatenating it with f_2, and inputting the concatenated features into a feature fusion module to obtain a first fused feature; passing the first fused feature through an upsampling layer and concatenating it with f_3, and passing the concatenated features through a feature fusion module to obtain a second fused feature; passing the second fused feature through an upsampling layer and concatenating it with f_4, and passing the concatenated features through a feature fusion module to finally obtain features fusing all 4 scales; the feature fusion module consists of a convolution layer with a 3×3 kernel, a Batch Normalization layer and a ReLU layer;
step 2.3: inputting the transformation characteristics finally fused in the step 2.2 into a convolution layer with a convolution kernel size of 1x1, and activating the layer through a Sigmoid function to obtain a pixel-level segmentation image;
step 2.4: training the models in the steps 2.1 to 2.3 by taking the label of the image as a target and using cross entropy as a loss function to calculate a loss value, and training three groups of different labels to obtain three segmentation models;
step 3, text detection:
step 3.1: inputting the text image to be detected into each of the three segmentation models obtained in step 2 and binarizing the outputs to obtain three segmentation results A_1, A_2 and A_3, corresponding respectively to the 1/4, 1/2 and original-size text region segmentation images;
step 3.2: performing connected-component analysis on A_1 and marking different connected components with different positive integers; superposing the marked image on A_2, performing connected-component analysis on the superposed image, and applying region removal and region expansion to obtain a 1/2-sized segmentation image A'_2; superposing A'_2 on A_3, performing connected-component analysis on the superposed image, and applying region removal and region expansion to obtain the final original-size segmentation image A'_3; the region removal means that, for a connected region whose maximum value is 1, all pixel values are set to 0; after the region removal, each remaining pixel of value 1 is set to the value of the nearest pixel whose value is neither 0 nor 1;
step 3.3: processing the segmentation image A'_3 with an OpenCV contour detection function to obtain the contour point coordinates of the different text regions.
CN201910929393.7A 2019-09-27 2019-09-27 Complex environment multi-shape text detection method based on pixel-level segmentation Active CN110751154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910929393.7A CN110751154B (en) 2019-09-27 2019-09-27 Complex environment multi-shape text detection method based on pixel-level segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910929393.7A CN110751154B (en) 2019-09-27 2019-09-27 Complex environment multi-shape text detection method based on pixel-level segmentation

Publications (2)

Publication Number Publication Date
CN110751154A CN110751154A (en) 2020-02-04
CN110751154B true CN110751154B (en) 2022-04-08

Family

ID=69277379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910929393.7A Active CN110751154B (en) 2019-09-27 2019-09-27 Complex environment multi-shape text detection method based on pixel-level segmentation

Country Status (1)

Country Link
CN (1) CN110751154B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368848B (en) * 2020-05-28 2020-08-21 北京同方软件有限公司 Character detection method under complex scene
CN112200181B (en) * 2020-08-19 2023-10-10 西安理工大学 Character shape approximation method based on particle swarm optimization algorithm
CN112926372B (en) * 2020-08-22 2023-03-10 清华大学 Scene character detection method and system based on sequence deformation
CN112101355B (en) * 2020-09-25 2024-04-02 北京百度网讯科技有限公司 Method and device for detecting text in image, electronic equipment and computer medium
CN113255646B (en) * 2021-06-02 2022-10-18 北京理工大学 Real-time scene text detection method
CN114049625B (en) * 2021-11-11 2024-02-27 西北工业大学 Multidirectional text detection method based on novel image shrinkage method


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784654A (en) * 2016-08-26 2018-03-09 杭州海康威视数字技术股份有限公司 Image partition method, device and full convolutional network system
CN107609549A (en) * 2017-09-20 2018-01-19 北京工业大学 The Method for text detection of certificate image under a kind of natural scene
CN108288088A (en) * 2018-01-17 2018-07-17 浙江大学 A kind of scene text detection method based on end-to-end full convolutional neural networks
CN108830855A (en) * 2018-04-02 2018-11-16 华南理工大学 A kind of full convolutional network semantic segmentation method based on the fusion of multiple dimensioned low-level feature
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN110059539A (en) * 2019-02-27 2019-07-26 天津大学 A kind of natural scene text position detection method based on image segmentation
CN110008950A (en) * 2019-03-13 2019-07-12 南京大学 The method of text detection in the natural scene of a kind of pair of shape robust
CN110232381A (en) * 2019-06-19 2019-09-13 梧州学院 License Plate Segmentation method, apparatus, computer equipment and computer readable storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Arbitrary-Oriented Scene Text Detection via Rotation Proposals; Jianqi Ma et al.; IEEE Transactions on Multimedia; 2018-11-30; vol. 20, no. 11; pp. 3111-3122 *
Detecting Oriented Text in Natural Images by Linking Segments; Baoguang Shi et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-11-09; pp. 3482-3490 *
Detecting Text in Natural Image with Connectionist Text Proposal Network; Zhi Tian et al.; ECCV 2016; 2016-09-17; pp. 56-72 *
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes; Pengyuan Lyu et al.; arXiv; 2018-08-01; pp. 1-18 *
PixelLink: Detecting Scene Text via Instance Segmentation; Dan Deng et al.; arXiv; 2018-01-04; pp. 1-8 *
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes; Shangbang Long et al.; ECCV 2018; 2018-10-09; pp. 19-35 *
Arbitrary-Orientation Text Recognition Based on Semantic Segmentation; Wang Tao et al.; Applied Science and Technology (应用科技); 2018-06; vol. 45, no. 3; pp. 55-60 *

Also Published As

Publication number Publication date
CN110751154A (en) 2020-02-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant