CN106650725B - Candidate text box generation and text detection method based on full convolution neural network - Google Patents

Candidate text box generation and text detection method based on full convolution neural network

Info

Publication number
CN106650725B
CN106650725B CN201611070587.9A CN106650725A
Authority
CN
China
Prior art keywords
text
candidate
box
network
boxes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611070587.9A
Other languages
Chinese (zh)
Other versions
CN106650725A (en)
Inventor
马景法
金连文
钟卓耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201611070587.9A priority Critical patent/CN106650725B/en
Publication of CN106650725A publication Critical patent/CN106650725A/en
Application granted granted Critical
Publication of CN106650725B publication Critical patent/CN106650725B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a candidate text box generation and text detection method based on a full convolution neural network, which comprises the following steps: generating text region candidate boxes, wherein an Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input and generates a controllable number of word region candidate boxes, sliding an Inception network over the convolutional feature response map of a VGG16 model with a set of text feature prior boxes at each sliding position; incorporating ambiguity-prone text category supervision information, fusing multi-level region downsampling information, and performing text detection; training the Inception candidate box generation network and the text detection network in an end-to-end manner through back propagation and stochastic gradient descent; candidate box iterative voting achieves a higher text recall in a complementary manner, and a candidate box filtering algorithm removes superfluous detection boxes. The invention achieves accuracies of 0.83 and 0.85 on the ICDAR 2011 and ICDAR 2013 robust text detection benchmark databases respectively, surpassing the previous best results.

Description

Candidate text box generation and text detection method based on full convolution neural network
Technical Field
The invention relates to a technology for generating a text candidate box and detecting a text in a natural scene picture, in particular to a method for generating a candidate text box and detecting a text based on a full convolution neural network.
Background
Text in images provides rich and accurate high-level semantic information that is critical to a large number of potential applications such as scene understanding, image and video retrieval, and content-based recommendation systems. Text detection in natural scene pictures has therefore attracted a great deal of attention in the computer vision and image understanding communities. However, text detection in natural scenes remains a challenging and unsolved problem. First, the backgrounds of text pictures are complex, and regions such as symbols, signs, bricks and grass are very difficult to distinguish from text. In addition, confounding factors such as non-uniform lighting, strong exposure, low contrast, blur and low resolution add significant challenges to the text detection task.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a candidate text box generation and text detection method based on a full convolution neural network.
The technical scheme of the invention is realized as follows:
the candidate text box generation and text detection method based on the full convolution neural network includes the steps of
S1: generating text region candidate boxes, wherein an Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input and generates a controllable number of word region candidate boxes, sliding an Inception network over the convolutional feature response map of a VGG16 model with a set of text feature prior boxes at each sliding position;
S2: incorporating ambiguity-prone text category supervision information, fusing multi-level region downsampling information, and performing text detection;
S3: training the Inception candidate box generation network and the text detection network in an end-to-end manner through back propagation and stochastic gradient descent;
S4: candidate box iterative voting achieves a higher text recall in a complementary manner, and a candidate box filtering algorithm removes superfluous detection boxes.
Further, step S1 includes the steps of
S11: designing the text feature prior boxes;
S12: constructing the Inception candidate box generation network.
Further, in step S11, there are 24 text feature prior boxes, where the width of the sliding window at each sliding position is set to 32, 48, 64 or 80, and the length-to-width ratio to 0.2, 0.5, 0.8, 1.0, 1.2 or 1.5.
Further, the Inception candidate box generation network in step S12 is formed by connecting a 3 × 3 convolutional layer, a 5 × 5 convolutional layer and a 3 × 3 max pooling layer to the corresponding spatial receptive field of the Conv5_3 feature response map as input.
Further, in step S2, the text category supervision information is: a candidate box with IoU overlap of 0.5 or more is designated as containing text, a candidate box with IoU overlap of 0.2 or more and less than 0.5 is designated as "fuzzy text", and the others are designated as containing no text information.
Further, the multi-level region downsampling information in step S2 is: the convolutional feature response maps of Conv4_3 and Conv5_3 in the VGG16 network both undergo multi-level region downsampling, yielding two 512-channel sampled features, which are then concatenated and decoded with a 512-channel 1 × 1 convolutional layer to join the features together.
Compared with the prior art, the invention provides an Inception candidate box generation network, which applies sliding windows of different sizes on a convolutional feature map and assists a set of text feature prior boxes at each sliding position to generate word region candidate boxes. The sliding windows of different sizes preserve local information at the corresponding positions while also capturing context, helping to filter out candidate boxes without text; the Inception candidate box generation network of the invention obtains a high recall rate using only hundreds of word candidate boxes. The invention also introduces additional ambiguity-prone text category supervision information and multi-level region downsampling information, both fused into the text detection network; this information helps the text detection network learn more discriminative features to distinguish text from complex backgrounds. In addition, to better utilize the intermediate models of the training process, the invention provides a candidate box iterative voting scheme, obtaining a higher word recall rate in a complementary manner.
Drawings
FIG. 1 is a flow chart of a candidate text box generation and text detection method based on a full convolution neural network according to the present invention.
FIG. 2 is an exemplary diagram of word region candidate boxes whose IoU overlap falls in a particular interval, according to one embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the method for generating a candidate text box and detecting a text based on a full convolution neural network of the present invention includes four steps: s1, generating a text region candidate box; s2, text detection; s3, end-to-end learning optimization; and S4, heuristic processing.
The function of component S1 is as follows: the Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input and generates a controllable number of word region candidate boxes. To search for word region candidate boxes, we slide an Inception network over the convolutional feature response map of the VGG16 model, with a set of text feature prior boxes at each sliding position. The method comprises the following steps: (1) designing the text feature prior boxes and (2) constructing the Inception candidate box generation network. Four different scales (32, 48, 64 and 80) and six different aspect ratios (0.2, 0.5, 0.8, 1.0, 1.2 and 1.5) are set at each sliding position, for a total of 24 prior sliding windows. In the learning phase, a prior box whose intersection-over-union overlap with a ground-truth text box is greater than 0.5 is assigned a text label, whereas one whose overlap is less than 0.3 is assigned a background label. The designed Inception candidate box generation network connects a 3 × 3 convolutional layer, a 5 × 5 convolutional layer and a 3 × 3 max pooling layer to the corresponding spatial receptive field of the Conv5_3 feature response map as input. In addition, to reduce dimensionality, a 1 × 1 convolution is applied after the 3 × 3 max pooling layer. We then concatenate the features of each part along the channel axis, and the resulting 640-dimensional feature vector is fed to two output layers: a classification layer that predicts a text/non-text score for the region, and a regression layer that refines the text region position for each kind of prior window at each sliding position.
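As a concrete illustration, the 24 prior sliding windows described above can be enumerated directly. Interpreting each ratio as height/width (so a 32-wide window with ratio 0.5 is 16 high) is an assumption of this sketch; the patent fixes only the four widths and the six length-to-width ratios.

```python
def make_prior_boxes(widths=(32, 48, 64, 80),
                     ratios=(0.2, 0.5, 0.8, 1.0, 1.2, 1.5)):
    """Enumerate the text feature prior boxes: 4 widths x 6 ratios = 24.

    Treating each ratio as height/width is an assumption of this sketch;
    the patent states only the widths and length-to-width ratios.
    Returns (width, height) pairs centered at the sliding position.
    """
    return [(w, w * r) for w in widths for r in ratios]

priors = make_prior_boxes()
print(len(priors))  # 24 prior windows per sliding position
```

Because text tends to be wider than tall, most of these priors are wide, low boxes, which matches the word regions the network is asked to propose.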
Step S2 includes: (1) incorporating ambiguity-prone text category supervision information, which provides more reasonable supervision, helps the classifier learn more discriminative features, distinguishes text regions from complex and diverse backgrounds, and filters out candidate boxes that contain no text; (2) fusing multi-level region downsampling information, whose effect is to better utilize the convolutional features of multiple layers and enrich the discriminative information of each sliding window.
Much previous work on detection networks designates candidate boxes with IoU overlap greater than 0.5 as containing text and all others as background. However, this way of deciding whether text is present in a candidate box is not reasonable, because a candidate box with IoU overlap in the interval 0.2 to 0.5 may still contain partial or substantial text information, as shown in FIG. 2. Such confounding label information can confuse the classification learning of text and non-text candidate boxes. To this end, we propose to designate candidate boxes with IoU overlap of 0.5 or more as containing text, candidate boxes with IoU overlap of 0.2 or more and less than 0.5 as "fuzzy text", and the others as containing no text. This strategy provides more reasonable supervision information, helping the classifier learn more discriminative features to distinguish text from complex and diverse backgrounds and to filter out candidate boxes that contain no text.
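The three-way labeling rule above can be sketched as follows. The (x1, y1, x2, y2) box representation and the integer encoding of the three classes are choices of this sketch, not fixed by the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def text_label(candidate, gt_boxes):
    """Three-way label from the patent's thresholds:
    2 = text (IoU >= 0.5), 1 = fuzzy text (0.2 <= IoU < 0.5),
    0 = no text. The integer encoding is an arbitrary choice here."""
    best = max((iou(candidate, g) for g in gt_boxes), default=0.0)
    if best >= 0.5:
        return 2
    if best >= 0.2:
        return 1
    return 0
```

For example, a candidate that overlaps a ground-truth word by a third of their union lands in the "fuzzy text" band rather than being forced into the background class.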
In order to better utilize multi-level convolutional features and enrich the discriminative information of each candidate box, the invention performs multi-level region downsampling on the convolutional feature response maps of Conv4_3 and Conv5_3 of the VGG16 network, obtaining two 512-channel sampled features. The concatenated features are then decoded with a 512-channel 1 × 1 convolutional layer. The effects of this 1 × 1 convolutional layer are (1) to combine the sampled features of the multiple levels with learned weighted fusion during training, and (2) to reduce the dimensionality to match the first fully connected layer of VGG16.
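The fusion step can be sketched with plain array operations: a 1 × 1 convolution over the concatenated features is just a per-position linear map from 1024 channels back down to 512. The 7 × 7 spatial size of the pooled region features is an assumption of this sketch (the patent fixes only the 512 channels per level), and the random inputs stand in for real pooled features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for region features pooled from Conv4_3 and Conv5_3
# (the 7x7 spatial size is an assumption of this sketch).
f4 = rng.standard_normal((512, 7, 7))
f5 = rng.standard_normal((512, 7, 7))

cat = np.concatenate([f4, f5], axis=0)  # (1024, 7, 7)

# A 1x1 convolution with 512 output channels applied at every spatial
# position: it fuses the two levels with learned weights and restores
# the 512-dim input expected by the first fully connected layer of VGG16.
w = rng.standard_normal((512, 1024)) * 0.01  # 1x1 conv kernel
fused = np.einsum('oc,chw->ohw', w, cat)

print(fused.shape)  # (512, 7, 7)
```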
Component S3 differs from the previously proposed four-step training strategy combining RPN and Fast-RCNN: the invention trains the Inception candidate box generation network and the text detection network in an end-to-end manner through back propagation and stochastic gradient descent. The shared convolutional network is initialized with an ImageNet-pretrained classification network. The weights of the new layers are initialized from a Gaussian distribution with a mean of 0 and a variance of 0.01. The base learning rate is 0.001 and is reduced to one tenth every 40000 iterations. The momentum and weight decay are set to 0.9 and 0.0005, respectively.
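The step-decay schedule just described (base rate 0.001, divided by ten every 40000 iterations) can be written as a small helper. This sketches only the schedule, not the full SGD update with momentum 0.9 and weight decay 0.0005.

```python
def learning_rate(iteration, base_lr=1e-3, step=40000, gamma=0.1):
    """Step-decay schedule: the base learning rate 0.001 is reduced
    to one tenth every 40000 iterations, as in the training setup."""
    return base_lr * gamma ** (iteration // step)

print(learning_rate(0))       # 0.001 for the first 40000 iterations
print(learning_rate(40000))   # then 0.0001, and so on
```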
The Inception candidate box generation network and the text detection network each have two sibling output layers: a classification layer and a regression layer. The differences between the output layers of the two networks are as follows: (1) in the Inception candidate box generation network, each prior box must be parameterized independently, so we predict all k = 24 prior candidate boxes simultaneously; the classification layer outputs 2k scores judging whether each candidate box contains text, and the regression layer outputs 4k values giving the refined candidate boxes' offsets from the original candidate boxes. (2) The text detection network outputs three scores for each candidate box, corresponding respectively to background, fuzzy text and text, and its regression layer outputs 4 regression offset values for each text candidate box. We minimize the following multi-task loss function during training:
L(p, p*, t, t*) = L_cls(p, p*) + λ·L_reg(t, t*),    (0.1)
The classification-layer loss function L_cls is the softmax loss, where p and p* are the predicted label and the ground-truth label, respectively. The regression loss function L_reg applies the smooth-L1 loss. In addition, t = {t_x, t_y, t_w, t_h} and t* = {t*_x, t*_y, t*_w, t*_h} denote the regression offset vectors of the predicted and ground-truth candidate boxes, respectively, where t* is obtained from the following formula:

t*_x = (G_x − P_x) / P_w,  t*_y = (G_y − P_y) / P_h,  t*_w = log(G_w / P_w),  t*_h = log(G_h / P_h)    (0.2)

Here, P = {P_x, P_y, P_w, P_h} and G = {G_x, G_y, G_w, G_h} denote the center coordinates, width and height of the candidate box P and the ground-truth text box G, respectively. λ denotes the loss-balance parameter; in the Inception candidate box generation network we set λ = 3 to bias the loss toward better candidate box positions, and in the text detection network we set λ = 1.
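A minimal sketch of the regression offsets t* relating a candidate box P to a ground-truth box G, together with the inverse transform used to decode predicted offsets back into a box; both boxes are (center_x, center_y, width, height). The inverse decode is the standard counterpart of this parameterization and is an assumption of this sketch, as the patent states only the forward formula.

```python
import math

def regression_target(P, G):
    """Offsets t* from the candidate box P to the ground-truth box G;
    P and G are (center_x, center_y, width, height)."""
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    return ((Gx - Px) / Pw, (Gy - Py) / Ph,
            math.log(Gw / Pw), math.log(Gh / Ph))

def apply_offsets(P, t):
    """Inverse transform: decode predicted offsets back into a box
    (an assumption of this sketch; the patent gives only the forward map)."""
    Px, Py, Pw, Ph = P
    tx, ty, tw, th = t
    return (Px + tx * Pw, Py + ty * Ph,
            Pw * math.exp(tw), Ph * math.exp(th))
```

Dividing the center shifts by the candidate's width and height, and taking logs of the size ratios, makes the targets scale-invariant, so boxes of all 24 prior sizes are regressed on a comparable footing.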
Component S4 comprises a candidate box iterative voting mechanism and a filtering algorithm. The iterative voting mechanism lets the invention obtain a higher text recall rate in a complementary manner, improving the performance of the text detection system. The filtering algorithm lets the invention remove superfluous detection boxes to improve accuracy.
The invention first feeds a natural scene picture and a set of ground-truth text box data into the Inception candidate box generation network, which generates a certain number of word region candidate boxes. The resulting word region candidate boxes are then sent to the text detection network for text/non-text classification and text localization; during training this network adds the ambiguity-prone text category supervision information and the fused multi-level region downsampling information. The entire system is trained in an end-to-end fashion through back propagation and gradient descent. In order to fully utilize the intermediate models of the training process, the invention adopts a candidate box iterative voting mechanism to obtain a high recall rate of text instances in a complementary manner, improving the performance of the whole text detection system. Finally, the invention applies a filtering algorithm that finds, by coordinate position, the inner and outer candidate boxes of each text instance, retains the high-score candidate boxes, and removes the low-score candidate boxes.
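The final filtering step can be sketched as follows. The strict containment test and the pairwise scan are assumptions of this sketch; the patent says only that inner and outer candidate boxes of each text instance are found by coordinate position and that the low-score ones are removed.

```python
def contains(outer, inner):
    """True if box `inner` lies inside box `outer`; boxes are (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def filter_nested(boxes, scores):
    """Sketch of the filtering step: when one detection box encloses
    another (both presumably covering the same text instance), keep only
    the higher-scoring one. Returns the kept indices in order."""
    keep = set(range(len(boxes)))
    for i in range(len(boxes)):
        for j in range(len(boxes)):
            if i != j and (contains(boxes[i], boxes[j])
                           or contains(boxes[j], boxes[i])):
                keep.discard(i if scores[i] < scores[j] else j)
    return sorted(keep)
```

For instance, a low-score box nested inside a high-score detection of the same word is dropped, while detections of distinct words elsewhere in the image are untouched.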
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (6)

1. The candidate text box generation and text detection method based on the full convolution neural network is characterized by comprising the following steps
S1: generating text region candidate boxes, wherein an Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input and generates a controllable number of word region candidate boxes, sliding an Inception network over the convolutional feature response map of a VGG16 model with a set of text feature prior boxes at each sliding position;
S2: incorporating ambiguity-prone text category supervision information, fusing multi-level region downsampling information, and performing text detection;
S3: training the Inception candidate box generation network and the text detection network in an end-to-end manner through back propagation and stochastic gradient descent;
S4: iteratively voting on the candidate boxes to obtain a higher text recall rate in a complementary manner, and removing superfluous detection boxes with a candidate box filtering algorithm;
the multi-task loss function is minimized during training, with the formula:

L(p, p*, t, t*) = L_cls(p, p*) + λ·L_reg(t, t*)    (0.1)

the classification-layer loss function L_cls is the softmax loss, p and p* being the predicted label and the ground-truth label, respectively; the regression loss function L_reg applies the smooth-L1 loss; in addition, t = {t_x, t_y, t_w, t_h} and t* = {t*_x, t*_y, t*_w, t*_h} denote the regression offset vectors of the predicted and ground-truth candidate boxes, respectively, t* being obtained from the following formula:

t*_x = (G_x − P_x) / P_w,  t*_y = (G_y − P_y) / P_h,  t*_w = log(G_w / P_w),  t*_h = log(G_h / P_h)

wherein P = {P_x, P_y, P_w, P_h} and G = {G_x, G_y, G_w, G_h} denote the center coordinates, width and height of the candidate box P and the ground-truth text box G, respectively, and λ denotes the loss-balance parameter.
2. The method for generating candidate text boxes and detecting text based on the full convolution neural network as claimed in claim 1, wherein step S1 includes the steps of
S11: designing the text feature prior boxes;
S12: constructing the Inception candidate box generation network.
3. The method for generating candidate text boxes and detecting text based on the full convolution neural network as claimed in claim 2, wherein there are 24 text feature prior boxes in step S11, the width of the sliding window at each sliding position being set to 32, 48, 64 or 80 and the length-to-width ratio to 0.2, 0.5, 0.8, 1.0, 1.2 or 1.5.
4. The method as claimed in claim 2, wherein the Inception candidate box generation network in step S12 is formed by connecting a 3 × 3 convolutional layer, a 5 × 5 convolutional layer and a 3 × 3 max pooling layer to the corresponding spatial receptive field of the Conv5_3 feature response map as input.
5. The method for generating candidate text boxes and detecting text based on the full convolution neural network as claimed in claim 1, wherein the text category supervision information in step S2 is: a candidate box with IoU overlap of 0.5 or more is designated as containing text, a candidate box with IoU overlap of 0.2 or more and less than 0.5 is designated as "fuzzy text", and the others are designated as containing no text information.
6. The method for generating candidate text boxes and detecting text based on the full convolution neural network as claimed in claim 1, wherein the multi-level region downsampling information in step S2 is: the convolutional feature response maps of Conv4_3 and Conv5_3 in the VGG16 network both undergo multi-level region downsampling, yielding two 512-channel sampled features, which are then concatenated and decoded with a 512-channel 1 × 1 convolutional layer to join the features together.
CN201611070587.9A 2016-11-29 2016-11-29 Candidate text box generation and text detection method based on full convolution neural network Active CN106650725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611070587.9A CN106650725B (en) 2016-11-29 2016-11-29 Candidate text box generation and text detection method based on full convolution neural network


Publications (2)

Publication Number Publication Date
CN106650725A CN106650725A (en) 2017-05-10
CN106650725B true CN106650725B (en) 2020-06-26

Family

ID=58813359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611070587.9A Active CN106650725B (en) 2016-11-29 2016-11-29 Candidate text box generation and text detection method based on full convolution neural network

Country Status (1)

Country Link
CN (1) CN106650725B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316058A (en) * 2017-06-15 2017-11-03 国家新闻出版广电总局广播科学研究院 Improve the method for target detection performance by improving target classification and positional accuracy
CN107397658B (en) * 2017-07-26 2020-06-19 成都快眼科技有限公司 Multi-scale full-convolution network and visual blind guiding method and device
CN109389114B (en) * 2017-08-08 2021-12-03 富士通株式会社 Text line acquisition device and method
CN107480649B (en) * 2017-08-24 2020-08-18 浙江工业大学 Fingerprint sweat pore extraction method based on full convolution neural network
CN108090443B (en) * 2017-12-15 2020-09-22 华南理工大学 Scene text detection method and system based on deep reinforcement learning
CN108288088B (en) * 2018-01-17 2020-02-28 浙江大学 Scene text detection method based on end-to-end full convolution neural network
CN108154145B (en) * 2018-01-24 2020-05-19 北京地平线机器人技术研发有限公司 Method and device for detecting position of text in natural scene image
CN108647681B (en) * 2018-05-08 2019-06-14 重庆邮电大学 A kind of English text detection method with text orientation correction
CN108764228A (en) * 2018-05-28 2018-11-06 嘉兴善索智能科技有限公司 Word object detection method in a kind of image
CN110619325B (en) * 2018-06-20 2024-03-08 北京搜狗科技发展有限公司 Text recognition method and device
CN109190458B (en) * 2018-07-20 2022-03-25 华南理工大学 Method for detecting head of small person based on deep learning
CN109165697B (en) * 2018-10-12 2021-11-30 福州大学 Natural scene character detection method based on attention mechanism convolutional neural network
CN109376658B (en) * 2018-10-26 2022-03-08 信雅达科技股份有限公司 OCR method based on deep learning
CN109492630A (en) * 2018-10-26 2019-03-19 信雅达系统工程股份有限公司 A method of the word area detection positioning in the financial industry image based on deep learning
CN109299274B (en) * 2018-11-07 2021-12-17 南京大学 Natural scene text detection method based on full convolution neural network
CN109598290A (en) * 2018-11-22 2019-04-09 上海交通大学 A kind of image small target detecting method combined based on hierarchical detection
CN109800756B (en) * 2018-12-14 2021-02-12 华南理工大学 Character detection and identification method for dense text of Chinese historical literature
WO2020132322A1 (en) * 2018-12-19 2020-06-25 Aquifi, Inc. Systems and methods for joint learning of complex visual inspection tasks using computer vision
CN109918987B (en) * 2018-12-29 2021-05-14 中国电子科技集团公司信息科学研究院 Video subtitle keyword identification method and device
CN110135408B (en) * 2019-03-26 2021-02-19 北京捷通华声科技股份有限公司 Text image detection method, network and equipment
CN110135248A (en) * 2019-04-03 2019-08-16 华南理工大学 A kind of natural scene Method for text detection based on deep learning
CN110135424B (en) * 2019-05-23 2021-06-11 阳光保险集团股份有限公司 Inclined text detection model training method and ticket image text detection method
CN112418207B (en) * 2020-11-23 2024-03-19 南京审计大学 Weak supervision character detection method based on self-attention distillation
CN112765353B (en) * 2021-01-22 2022-11-04 重庆邮电大学 Scientific research text-based biomedical subject classification method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915386A (en) * 2015-05-25 2015-09-16 中国科学院自动化研究所 Short text clustering method based on deep semantic feature learning
CN105740892A (en) * 2016-01-27 2016-07-06 北京工业大学 High-accuracy human body multi-position identification method based on convolutional neural network
CN105912611A (en) * 2016-04-05 2016-08-31 中国科学技术大学 CNN based quick image search method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015132665A2 (en) * 2014-03-07 2015-09-11 Wolf, Lior System and method for the detection and counting of repetitions of repetitive activity via a trained network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dictionary Pair Classifier Driven Convolutional Neural Networks for Object Detection; Keze Wang et al.; 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 20160630; pp. 2138-2146 *
A survey of deep learning applications in handwritten Chinese character recognition; Jin Lianwen et al.; Acta Automatica Sinica; 20160831; pp. 1125-1142 *

Also Published As

Publication number Publication date
CN106650725A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106650725B (en) Candidate text box generation and text detection method based on full convolution neural network
CN110852368B (en) Global and local feature embedding and image-text fusion emotion analysis method and system
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN111858954B (en) Task-oriented text-generated image network model
EP3447727B1 (en) A method, an apparatus and a computer program product for object detection
CN110866140A (en) Image feature extraction model training method, image searching method and computer equipment
CN111160350B (en) Portrait segmentation method, model training method, device, medium and electronic equipment
CN106897732A (en) Multi-direction Method for text detection in a kind of natural picture based on connection word section
CN111914622A (en) Character interaction detection method based on deep learning
CN108345850A (en) The scene text detection method of the territorial classification of stroke feature transformation and deep learning based on super-pixel
CN113361432B (en) Video character end-to-end detection and identification method based on deep learning
CN111274981B (en) Target detection network construction method and device and target detection method
CN111598183A (en) Multi-feature fusion image description method
CN110929665A (en) Natural scene curve text detection method
US20180314894A1 (en) Method, an apparatus and a computer program product for object detection
CN112734803A (en) Single target tracking method, device, equipment and storage medium based on character description
CN111598155A (en) Fine-grained image weak supervision target positioning method based on deep learning
Piergiovanni et al. Video question answering with iterative video-text co-tokenization
Shi et al. Weakly supervised deep learning for objects detection from images
Yan Computational Methods for Deep Learning: Theory, Algorithms, and Implementations
Li A deep learning-based text detection and recognition approach for natural scenes
Puspasari et al. A Survey of Data Mining Techniques for Smart Museum Applications
Shao et al. Multi-object detection based on deep learning in real classrooms
Hsu et al. DensityLayout: Density-Conditioned Layout GAN for Visual-Textual Presentation Designs
Castro et al. Segmentation task for fashion and apparel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant