CN109002463A - A text detection method based on a deep metric model - Google Patents

A text detection method based on a deep metric model

Info

Publication number
CN109002463A
CN109002463A (application CN201810568042.3A)
Authority
CN
China
Prior art keywords
text
region
character
class
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810568042.3A
Other languages
Chinese (zh)
Inventor
赵永彬
刚毅凝
李巍
刘树吉
陈硕
熊先亮
梁凯
周杨浩
杨育彬
郝跃冬
刘嘉华
康睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
State Grid Corp of China SGCC
Nari Information and Communication Technology Co
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Original Assignee
Nanjing University
Nari Information and Communication Technology Co
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University, Nari Information and Communication Technology Co, Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd filed Critical Nanjing University
Priority to CN201810568042.3A priority Critical patent/CN109002463A/en
Publication of CN109002463A publication Critical patent/CN109002463A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a text detection method based on a deep metric model, comprising: step 1, using the MSER detection algorithm to obtain character-level candidate regions; step 2, filtering the candidate regions with a classifier to remove non-character regions; step 3, clustering the obtained characters into text lines according to their geometric position information; step 4, splitting the text lines into individual words according to heuristic rules; step 5, constructing a word-level training set; step 6, training the deep metric learning model; step 7, using the deep metric model obtained in step 6 to classify the text boxes and obtain the final text regions.

Description

A text detection method based on a deep metric model
Technical field
The invention belongs to the field of computer vision, and in particular relates to a text detection method based on a deep metric model.
Background technique
In machine learning models, the loss function can usually be expressed as a loss term plus a regularization term. The loss term describes how well the model fits the training data, while the regularization term constrains the model so that it does not become overly complex while fitting the data, thereby preventing overfitting. Common loss functions in statistical learning include the 0-1 loss, squared loss, absolute loss and logarithmic loss; deep learning mainly uses the quadratic loss and the cross-entropy loss based on one-hot encoding. None of these existing loss functions considers the relationship between pairs of samples; they are merely borrowed from statistical machine learning and do not make full use of other available discriminative information.
Summary of the invention
Purpose of the invention: classifying text lines is a typical binary classification problem in text detection. The present invention introduces the idea of metric learning into deep learning, minimizing the distance within a class and maximizing the distance between classes, so that the classification boundary becomes more distinct and the discriminative power of the model improves.
To address the shortcomings of current approaches to this binary classification problem, the present invention provides a processing method that introduces a deep metric learning model.
The method specifically comprises the following steps:
Step 1, the MSER (Maximally Stable Extremal Regions) detection algorithm is applied to the input image to obtain character-level candidate regions;
Step 2, a character-level training dataset is constructed. The training data of the present invention are mainly drawn from the scene-text datasets ICDAR 2003, ICDAR 2011 and ICDAR 2013. According to the annotated character regions, the text information in each character region is cropped as the positive class; among the candidate regions obtained in step 1, those that do not overlap the positive class are chosen as the negative class. The positive and negative classes form the character-level training dataset, which is used as input to train a deep neural network. This trained deep neural network serves as a classifier (the classifier judges whether a region contains a character); it classifies the candidate characters of the candidate regions, filters them, and removes the negative class;
Step 3, the center point of each candidate region is taken, and a small threshold (typically 5 pixels) is set on the coordinates of the center points; all candidate character regions within this threshold are assigned to the same text-line region along the horizontal direction;
Step 4, the average distance between characters in each text-line region obtained in step 3 is computed; two characters whose distance is greater than twice the average are split apart into two different words, and conversely two characters whose distance is less than twice the average belong to the same word, thereby obtaining word-level candidate regions;
Step 5, according to the word-level candidate regions obtained in step 4 (each character belongs to one word, and a word consists of at least one character), all the constructed words are taken as the word-level dataset; according to the word-level text annotations (i.e., the annotations of the characters contained in the word-level dataset), the corresponding regions are cropped as the positive class, and the regions that do not overlap the positive class are taken as the negative class;
Step 6, with the positive and negative classes obtained in step 5 as input, the deep metric model is built and trained; the trained model can be used for word-level classification;
Step 7, with the deep metric model obtained in step 6, the test image is filtered to obtain the final text regions.
In step 1, when using the MSER algorithm, the MSER threshold is set to the minimum value of 1. When detecting the text regions in an image, the MSER algorithm needs to be run on four channels: H (hue), L (lightness), S (saturation) and grayscale.
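To illustrate the four-channel detection, the following is a minimal stdlib-only sketch of the channel decomposition. The MSER detection itself (e.g. OpenCV's `cv2.MSER_create`, an assumed choice of detector) is deliberately left out, and the ITU-R 601 luma weights used for the gray channel are our assumption, since the patent does not specify a grayscale formula.

```python
import colorsys

def hls_and_gray_channels(rgb_image):
    """Split an RGB image (nested lists of (r, g, b) tuples, values 0-255)
    into the four single-channel images the method runs MSER on:
    H, L, S and grayscale, each rescaled to 0-255."""
    channels = {"h": [], "l": [], "s": [], "gray": []}
    for row in rgb_image:
        rows = {k: [] for k in channels}
        for r, g, b in row:
            h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
            rows["h"].append(int(h * 255))
            rows["l"].append(int(l * 255))
            rows["s"].append(int(s * 255))
            # ITU-R 601 luma weights -- an assumption; the patent gives no formula
            rows["gray"].append(int(0.299 * r + 0.587 * g + 0.114 * b))
        for k in channels:
            channels[k].append(rows[k])
    return channels["h"], channels["l"], channels["s"], channels["gray"]
```

Each returned channel would then be fed to an MSER detector and the resulting candidate sets merged.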
In step 2, the dataset is self-constructed. When constructing it, the similarity between the constructed dataset and the images to be detected should be taken into account; in general, the higher the similarity, the better.
In step 5, the negative class of text lines is removed and the word-level dataset is constructed; the construction process of this training set is similar to step 2. According to the word-level annotations, the corresponding regions are cropped as the positive class, and the regions that do not overlap the positive class serve as the negative class.
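The non-overlap rule for choosing negatives can be sketched as follows. The any-intersection test is one simple reading — the patent does not specify an overlap measure such as an IoU threshold — and the function names are our own.

```python
def overlaps(a, b):
    """True if axis-aligned boxes a = (x, y, w, h) and b intersect at all."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def negatives(candidates, annotated):
    """Candidates that overlap none of the annotated (positive) regions are
    kept as negative samples; overlapping candidates are discarded here,
    since the positive class is cropped directly from the annotations."""
    return [c for c in candidates if not any(overlaps(c, a) for a in annotated)]
```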
Step 6 includes:
Step 6-1, each image in the word-level dataset obtained in step 5 is mapped into a d-dimensional Euclidean space; the following constraint is then required to hold:

||f(x_i^a) − f(x_i^p)||² + margin < ||f(x_i^a) − f(x_i^n)||²   (1)

In formula (1), (x_i^a, x_i^p, x_i^n) is a triplet: x_i^a and x_i^p are samples that belong to the same class (positive class or negative class) in the word-level dataset constructed in step 5, x_i^n is a sample of a different class, f(·) denotes the deep metric model, and margin is the parameter separating the sample pair (x_i^a, x_i^p) from the sample pair (x_i^a, x_i^n);
Step 6-2, the following loss function is designed:

L = Σ_{i=1}^{N} max(0, ||f(x_i^a) − f(x_i^p)||² − ||f(x_i^a) − f(x_i^n)||² + margin)

For the triplets whose loss term is non-zero, the gradient is derived as:

∂L/∂f(x_i^a) = 2(f(x_i^n) − f(x_i^p)),
∂L/∂f(x_i^p) = 2(f(x_i^p) − f(x_i^a)),
∂L/∂f(x_i^n) = 2(f(x_i^a) − f(x_i^n)),

where N denotes the number of triplets, f(x_i^a) denotes the feature the deep metric model extracts from the i-th anchor sample, f(x_i^p) denotes the feature extracted from a sample of the same class as the i-th anchor, and f(x_i^n) denotes the feature extracted from a sample of a different class from the i-th anchor;
Step 6-3, the deep metric model is trained with this loss function. The network of the deep metric model contains two convolutional layers, two pooling layers and two fully connected layers. All images in the word-level dataset obtained in step 5 are first normalized to 32 × 32. The first convolutional layer has 6 kernels of size 5 × 5; the second convolutional layer has 12 kernels of size 5 × 5, and the kernel parameters are initialized randomly. The first convolutional layer outputs 6 feature maps of size 28 × 28; the pooling layers have size 2 × 2 and use max pooling, so after the first pooling the feature map size is 14 × 14. After the second convolution the feature map size is 10 × 10. The two fully connected layers have 150 and 50 units respectively. An L2 normalization layer is added after the convolutions to standardize the features. After passing through these layers, every image in the word-level dataset obtained in step 5 becomes an effective feature vector. Finally, a Triplet loss layer is introduced for training; the loss function proposed in step 6-2 is the triplet loss.
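A minimal NumPy sketch of the triplet loss and its gradients from step 6-2 — a reconstruction consistent with the standard triplet formulation, since the original equations are not reproduced here; the function name and the default margin of 0.2 are our assumptions.

```python
import numpy as np

def triplet_loss_and_grads(fa, fp, fn, margin=0.2):
    """Triplet loss L = sum_i max(0, ||fa-fp||^2 - ||fa-fn||^2 + margin),
    with gradients w.r.t. the three embeddings, zeroed for inactive triplets.
    fa, fp, fn: (N, d) arrays of anchor/positive/negative embeddings."""
    d_pos = np.sum((fa - fp) ** 2, axis=1)
    d_neg = np.sum((fa - fn) ** 2, axis=1)
    per_triplet = d_pos - d_neg + margin
    active = (per_triplet > 0).astype(fa.dtype)[:, None]  # hinge mask
    loss = np.sum(np.maximum(per_triplet, 0.0))
    grad_a = 2.0 * (fn - fp) * active  # d/dfa of the active terms
    grad_p = 2.0 * (fp - fa) * active
    grad_n = 2.0 * (fa - fn) * active
    return loss, grad_a, grad_p, grad_n
```

When the negative is already far enough away, the hinge makes both the loss term and all three gradients vanish for that triplet.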
Step 7 includes: for an image to be tested, character-level candidate regions are obtained with the detection method of step 1; the negative samples among the candidate regions are removed with the deep neural network of step 2; candidate word-level regions are constructed with the methods of steps 3 and 4; the deep metric model of step 6 is then used to classify each word-level region and filter out the negative class, thereby obtaining the final text regions.
In step 6, the most important issue for metric learning is how to measure the distance between images. The idea of the invention is to minimize the intra-class distance and maximize the inter-class distance, so that the classification boundary becomes more distinct. To this end, the Triplet loss is chosen, and the idea of the invention is realized by building triplets. Each image is mapped into a d-dimensional Euclidean space, which guarantees that x_i^a (the anchor) is closer to the same-class sample x_i^p (the positive) and farther from the different-class sample x_i^n (the negative); hence the constraint of formula (1) holds.
During training, the smaller the loss becomes over the iterations, the better: the closer the anchor is to its corresponding positive sample, and the farther it is from its corresponding negative sample, the better. Regarding the value of the margin:
(1) When the margin is small, the loss easily tends to 0: the anchor does not need to be pulled very close to the positive sample, nor pushed very far from the negative sample, for the loss to approach 0 quickly. A model trained this way often cannot distinguish similar images well.
(2) When the margin is large, the network parameters must work much harder to pull the anchor closer to the positive sample and to push the negative sample farther away; in particular, when the margin is set too large, the loss often remains stuck at a large value.
Therefore, setting a reasonable margin is crucial: it is the key indicator of similarity between samples, and the size of the gap between same-class and different-class images requires a careful trade-off. The reasoning above gives guidance for setting the margin, but no exact rule can be given directly; in our experiments the value is chosen by adjusting it repeatedly over many runs.
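The margin trade-off described above can be made concrete with a toy calculation; the squared distances 1.0 and 1.5 are purely hypothetical values chosen to represent a weakly separated triplet.

```python
def hinge_loss(d_pos, d_neg, margin):
    """Per-triplet hinge loss max(0, d_pos - d_neg + margin), where d_pos and
    d_neg are the squared anchor-positive and anchor-negative distances."""
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical weakly separated triplet: the positive is only slightly
# closer to the anchor than the negative.
d_pos, d_neg = 1.0, 1.5
small = hinge_loss(d_pos, d_neg, 0.1)  # small margin: loss is already 0,
                                       # so the weak separation is never improved
large = hinge_loss(d_pos, d_neg, 2.0)  # large margin: loss stays at 1.5
                                       # and is hard to drive toward 0
```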
The overall network structure uses convolutional layers for their feature-extraction property and pooling layers for their ability to select features and reduce the number of parameters; comparatively speaking, the structure is fairly complete.
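The feature-map sizes quoted in step 6-3 (28, 14, 10) follow from valid 5×5 convolutions interleaved with non-overlapping 2×2 max pooling. A quick sanity check of those shapes; the final 5×5 map and the 300-dimensional flattened vector are our inference rather than figures stated in the text.

```python
def conv_out(size, kernel):
    # valid convolution, stride 1
    return size - kernel + 1

def pool_out(size, pool):
    # non-overlapping max pooling
    return size // pool

s = conv_out(32, 5)  # C1, 6 kernels 5x5: 32 -> 28
s = pool_out(s, 2)   # P1: 28 -> 14
s = conv_out(s, 5)   # C2, 12 kernels 5x5: 14 -> 10
s = pool_out(s, 2)   # P2: 10 -> 5 (inferred, not stated in the text)
flat = 12 * s * s    # 12 maps of 5x5 = 300 features feeding F1 (150) then F2 (50)
```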
In mathematics, a metric (or distance function) describes the distance between elements of a set; a set equipped with a metric is called a metric space. Metric learning is the often-mentioned similarity-based feature learning: the goal of learning a distance metric is to measure how similar samples are, and this measurement is one of the core problems of pattern recognition. If we want to compute the similarity between two images, then how to measure it so that the similarity between images of different classes is small and the similarity between images of the same class is large is exactly the problem metric learning must consider. If the target is faces, a suitable distance function must be constructed to quantify facial features such as hair color and face shape; if the target is pose recognition, a distance function that measures pose similarity must be built. Features are diverse, and to model these similarities one can, according to the specific task, select suitable features and hand-pick a distance function. Of course, this approach may require a large investment of manual time and effort, and may be very non-robust to changes in the data. Metric learning, as an alternative, can automatically learn a suitable distance metric function for a given task.
The present invention introduces the idea of metric learning into deep learning and applies this model to the field of text detection, specifically to the problem of classifying positive and negative classes during the text detection process.
Beneficial effects: the present invention addresses the following problems of the prior art. Metric learning is combined with deep learning, and the loss function of deep learning is improved: the Triplet loss replaces the original Softmax function, and the learned features are expressed through their Euclidean distances. The distance between samples of the same class is minimized and the distance between samples of different classes is maximized, so that the difference between classes becomes larger and the boundary between them clearer. This improvement has an obvious effect on the binary classification problem of detecting and classifying text lines.
Detailed description of the invention
The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments, and the above and other advantages of the invention will become apparent.
Fig. 1 illustrates the idea of deep metric learning.
Fig. 2 illustrates how metric learning is used for discrimination.
Fig. 3 is the Triplet network model diagram.
Fig. 4 is the parameter table of the model.
Fig. 5 shows the detection results of the model.
Fig. 6 is the flow chart of the whole method.
Specific embodiment
The present invention will be further described with reference to the accompanying drawings and embodiments.
The present invention is suitable for text-line detection problems in images, and is particularly suited to the binary classification of candidate text boxes. The invention proposes a new method for text detection and classification. 1) For candidate-region detection and filtering, the MSER detection algorithm is used to obtain character-level candidate regions, and a character-level training dataset is constructed. Our training set is mainly drawn from the scene-text datasets ICDAR 2003, ICDAR 2011 and ICDAR 2013: according to the annotated character regions, the corresponding text information is cropped as the positive class, and among the candidate regions produced by the detection algorithm, those that do not overlap the positive class are chosen as the negative class. With these as input, a deep learning network is trained; this classifier classifies the candidate characters, filters them, and removes the negative class. 2) The candidate characters are clustered into text lines with a seed-growing algorithm: the center point of each candidate region is taken, a small threshold is set on the coordinates of the center points, and all candidate character regions within this threshold are assigned to the same text-line region along the horizontal direction. The average distance between characters is then computed, and the line is split between two characters whose distance is significantly greater than the average, dividing it into two different words. 3) Text lines are classified with the deep metric model, minimizing the distance between same-class samples and maximizing the difference between classes. The present invention comprises steps 1 to 7 as set out above.
Embodiment:
Using the above scheme, the present invention implements the detection of text regions on ICDAR 2011.
The implementation is as follows: the three datasets named above are standard text-detection datasets. First, candidate characters are detected with the detection algorithm. After filtering out the non-character candidates, the characters are clustered into text lines according to geometric information. For each text line, the trained deep metric classifier classifies the candidate characters and removes the non-text regions, yielding the final detection result.
Step 1, using the MSER detection algorithm, detection is carried out separately on the four channels H, L, S and grayscale, in order to obtain as many candidate regions as possible.
Step 2, a character-level training dataset is constructed. The training set is mainly drawn from ICDAR 2003, ICDAR 2011 and ICDAR 2013. According to the annotated character regions, the corresponding text information is cropped as the positive class; among the candidate regions obtained by the detection algorithm, those that do not overlap the positive class are chosen as the negative class. With these as input, a deep learning network is trained and used as a classifier to classify the candidate characters, filter them, and remove the negative class;
Step 3, the candidate characters are clustered into text lines with a seed-growing algorithm. The center point of each candidate region is taken and a small threshold is set on the coordinates of the center points; all candidate character regions within this threshold are assigned to the same text-line region along the horizontal direction;
Step 4, the horizontal text boxes obtained in step 3 are refined: the average distance between characters is computed, and the box is split between two characters whose distance is significantly greater than the average, dividing it into two different words;
Step 5, the word-level dataset is constructed; the construction process of the training set is similar to step 2. According to the word-level annotations, the corresponding regions are cropped as the positive class, and the regions that do not overlap the positive class serve as the negative class. The deep metric model is then trained. Step 6, the deep metric model is built and trained. The training objective of the deep metric model is shown in Figs. 1 and 2. In Fig. 1, "anchor" denotes the anchor image (carrying a text label), "positive" denotes an image of the same class as the anchor (itself containing text), and "negative" denotes an image of a different class from the anchor. In Fig. 2 the input is a pair of images; passing them through a neural network (with parameters w) yields hidden representations h, "Distance Metric" denotes the predefined metric distance, and the pair is finally judged to be the same or different. The ternary objective constrains the closeness among the three inputs of the deep metric model (same-class samples are pulled closer than different-class ones). Fig. 3 shows how the deep metric model extracts features from an image, i.e., its training process: on the left are the input images containing text; a neural network produces the feature representation (Feature Representation), and the Triplet Loss defined in step 6 then computes the loss value. Fig. 4 lists the parameters of the network structure, i.e., the details of the multi-layer neural network: "Input image size" is the input image size (32×32), "Kernel size" is the convolution kernel size (5×5), "Pooling size" is the down-sampling ratio, C1 and C2 denote the two convolutional layers, and F1 and F2 denote the two fully connected layers.
Step 7, for an image to be tested, character-level candidate regions are obtained with the detection method of step 1; the negative samples among the candidate regions are removed with the deep neural network of step 2; candidate word-level regions are constructed with the methods of steps 3 and 4; the deep metric model of step 6 is then used to classify each word-level region and filter out the negative class, thereby obtaining the final text regions. Fig. 5 shows the concrete detection results, and Fig. 6 shows the flow chart of the present invention.
The present invention provides a text detection method based on a deep metric model. There are many ways and approaches to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. Any component not specified in this embodiment can be implemented with the available prior art.

Claims (4)

1. A text detection method based on a deep metric model, characterized by comprising the following steps:
Step 1, the input image is detected with the MSER detection algorithm to obtain character-level candidate regions;
Step 2, a character-level training data set is constructed: according to the labelled character regions, the text information in the character regions is cropped as the positive class; among the candidate regions obtained in step 1, those that do not overlap the positive class are chosen as the negative class; the positive and negative classes form the character-level training data set, which is used as input to train a deep neural network; this trained deep neural network is then used as a classifier over the candidate characters of the candidate regions, and the negative class is removed by screening and filtering;
Step 3, the centre point of each candidate region is taken and a small threshold is set on the abscissa of each centre point; all candidate character regions falling within this threshold are assigned to the same text-line region along the horizontal direction;
Step 4, the average distance between characters in each text-line region obtained in step 3 is computed; two characters whose distance is greater than twice the average distance are split into two different words, whereas two characters whose distance is less than twice the average distance belong to the same word, thereby obtaining word-level candidate regions;
Step 5, according to the word-level candidate regions obtained in step 4, each character belongs to one word and a word consists of at least one character; all constructed words form the word-level data set: according to the word-level text annotations, the corresponding regions are cropped as the positive class, and regions that do not overlap the positive class are taken as the negative class;
Step 6, with the positive and negative classes obtained in step 5 as input, a deep metric model is built and trained; the trained model can be used for word-level classification;
Step 7, with the deep metric model obtained in step 6, the image to be tested is filtered to obtain the final text regions.
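Steps 3 and 4 of claim 1 are purely geometric and can be sketched as follows. This is a minimal illustration, not the patented implementation: the `(x, y, w, h)` box format and the tolerance value are assumptions, and the tolerance is applied to the vertical centre coordinate, which is the coordinate characters on the same horizontal text line share.

```python
def group_into_lines(boxes, line_tol=10):
    """Step 3 sketch: greedily assign character boxes (x, y, w, h) to text
    lines whose vertical centres lie within line_tol of each other."""
    lines = []  # each entry: [reference centre-y, list of boxes]
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        cy = box[1] + box[3] / 2
        if lines and abs(cy - lines[-1][0]) <= line_tol:
            lines[-1][1].append(box)
        else:
            lines.append([cy, [box]])
    # return each line's boxes ordered left to right
    return [sorted(chars, key=lambda b: b[0]) for _, chars in lines]

def split_into_words(line):
    """Step 4 sketch: split a text line at gaps wider than twice the
    average inter-character gap."""
    if len(line) < 2:
        return [line]
    gaps = [line[i + 1][0] - (line[i][0] + line[i][2])
            for i in range(len(line) - 1)]
    avg = sum(gaps) / len(gaps)
    words, word = [], [line[0]]
    for gap, box in zip(gaps, line[1:]):
        if gap > 2 * avg:   # distance greater than twice the average: new word
            words.append(word)
            word = []
        word.append(box)
    words.append(word)
    return words
```

A line of three tightly spaced characters followed by one distant character is split into two words, as claim 1 prescribes.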
2. The method according to claim 1, characterized in that, when the MSER algorithm is used in step 1, the MSER threshold is set to the minimum value of 1, and when detecting the text regions in an image the MSER algorithm needs to be applied on four channels: H, L, S and grayscale.
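The channel preparation that claim 2 prescribes can be sketched with the standard library alone; the MSER detector itself (e.g. OpenCV's, with its delta threshold set to 1 as the claim specifies) is assumed and not shown. The nested-list pixel format and the BT.601 grayscale weights are illustrative choices, not from the patent.

```python
import colorsys

def channels_for_mser(rgb_pixels):
    """Split an image (rows of (r, g, b) floats in [0, 1]) into the four
    single-channel images of claim 2: H, L, S and grayscale."""
    h_img, l_img, s_img, gray_img = [], [], [], []
    for row in rgb_pixels:
        h_row, l_row, s_row, g_row = [], [], [], []
        for r, g, b in row:
            h, l, s = colorsys.rgb_to_hls(r, g, b)
            h_row.append(h)
            l_row.append(l)
            s_row.append(s)
            # ITU-R BT.601 luma as the grayscale channel
            g_row.append(0.299 * r + 0.587 * g + 0.114 * b)
        h_img.append(h_row)
        l_img.append(l_row)
        s_img.append(s_row)
        gray_img.append(g_row)
    return h_img, l_img, s_img, gray_img
```

Each returned channel would then be rescaled to 0-255 and passed separately to the MSER detector, with the candidate regions of all four channels pooled.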
3. The method according to claim 2, characterized in that step 6 comprises:
Step 6-1, each image in the word-level data set obtained in step 5 is mapped into a d-dimensional Euclidean space; then:
||f(x_i^a) - f(x_i^p)||^2 + margin < ||f(x_i^a) - f(x_i^n)||^2    (1)
In formula (1), (x_i^a, x_i^p, x_i^n) is a triplet: x_i^a and x_i^p are samples belonging to the same class in the word-level data set constructed in step 5, x_i^n is a sample of a different class from x_i^a, f(·) denotes the deep metric model, and margin is the parameter separating the sample pair (x_i^a, x_i^p) from the sample pair (x_i^a, x_i^n);
Step 6-2, the following loss function L is designed:
L = sum_{i=1}^{N} max(0, ||f(x_i^a) - f(x_i^p)||^2 - ||f(x_i^a) - f(x_i^n)||^2 + margin)    (2)
The gradient derivation, for triplets whose loss term is positive, is as follows:
dL/df(x_i^a) = 2(f(x_i^n) - f(x_i^p))
dL/df(x_i^p) = 2(f(x_i^p) - f(x_i^a))
dL/df(x_i^n) = 2(f(x_i^a) - f(x_i^n))
where N denotes the number of triplets, f(x_i^a) denotes the feature the deep metric model extracts from the i-th anchor sample, f(x_i^p) denotes the feature extracted from a sample of the same class as the i-th anchor, and f(x_i^n) denotes the feature extracted from a sample of a different class;
Step 6-3, the deep metric model is trained with this loss function. The network of the deep metric model comprises two convolutional layers, two pooling layers and two fully connected layers. All images in the word-level data set obtained in step 5 are first normalized to 32 × 32. The first convolutional layer has 6 kernels of size 5 × 5; the second convolutional layer has 12 kernels of size 5 × 5; the kernel parameters are initialized randomly. The first convolutional layer outputs 6 feature maps of size 28 × 28; the pooling layers are 2 × 2 and use the max-pooling strategy, so after the first pooling the feature maps are 14 × 14, and after the second convolution they are 10 × 10. The two fully connected layers have 150 and 50 units respectively. An L2 regularization layer is added after the convolutional stage so that the features are normalized. After processing by these layers, every image in the word-level data set obtained in step 5 becomes an effective feature vector; finally a Triplet loss training layer is attached, and the loss function proposed in step 6-2 is the triplet loss.
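The triplet loss and gradients of step 6-2 can be checked numerically with a small NumPy sketch. The batch layout and the margin value are assumptions for illustration; the patent's trained network is not reproduced here.

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """Triplet loss of step 6-2: hinge on the anchor-positive versus
    anchor-negative squared distances, summed over the batch.
    a, p, n: (batch, d) arrays of anchor, positive and negative features."""
    d_ap = np.sum((a - p) ** 2, axis=1)
    d_an = np.sum((a - n) ** 2, axis=1)
    return np.sum(np.maximum(0.0, d_ap - d_an + margin))

def triplet_grads(a, p, n, margin=0.2):
    """Analytic gradients for the active triplets (loss term > 0):
    dL/da = 2(n - p), dL/dp = 2(p - a), dL/dn = 2(a - n)."""
    active = (np.sum((a - p) ** 2, axis=1)
              - np.sum((a - n) ** 2, axis=1) + margin > 0)[:, None]
    ga = 2 * (n - p) * active
    gp = 2 * (p - a) * active
    gn = 2 * (a - n) * active
    return ga, gp, gn
```

Note that inactive triplets (those already separated by more than the margin) contribute neither loss nor gradient, which is why triplet selection matters during training.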
4. The method according to claim 3, characterized in that step 7 comprises: for an image to be tested, character-level candidate regions are obtained with the detection method of step 1; the deep neural network of step 2 is used to remove the negative samples among the candidate regions; candidate word-level regions are then constructed with the methods of steps 3 and 4; finally, the deep metric model of step 6 classifies each word-level region and filters out the negative class, thereby obtaining the final text regions.
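The test-time pipeline of claim 4 can be sketched with every trained component injected as a callable. All the stand-in functions here are hypothetical placeholders for the MSER detector of step 1, the character classifier of step 2, the grouping of steps 3 and 4, and the deep metric model of step 6.

```python
def detect_text(image, mser_candidates, char_classifier,
                group_into_lines, split_into_words, word_classifier):
    """Claim 4 pipeline sketch: every stage is passed in as a callable,
    since the trained networks themselves are not reproduced here."""
    # Steps 1-2: character candidates, with negatives filtered out
    chars = [c for c in mser_candidates(image) if char_classifier(c)]
    # Steps 3-4: group characters into lines, then split lines into words
    words = []
    for line in group_into_lines(chars):
        words.extend(split_into_words(line))
    # Step 7: keep only the word regions the deep metric model accepts
    return [w for w in words if word_classifier(w)]
```

Dependency injection keeps the control flow of the claim visible while leaving each learned component replaceable.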
CN201810568042.3A 2018-06-05 2018-06-05 A kind of Method for text detection based on depth measure model Pending CN109002463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810568042.3A CN109002463A (en) 2018-06-05 2018-06-05 A kind of Method for text detection based on depth measure model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810568042.3A CN109002463A (en) 2018-06-05 2018-06-05 A kind of Method for text detection based on depth measure model

Publications (1)

Publication Number Publication Date
CN109002463A true CN109002463A (en) 2018-12-14

Family

ID=64573314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810568042.3A Pending CN109002463A (en) 2018-06-05 2018-06-05 A kind of Method for text detection based on depth measure model

Country Status (1)

Country Link
CN (1) CN109002463A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211123A (en) * 2019-06-14 2019-09-06 北京文安智能技术股份有限公司 A kind of optimization method, the apparatus and system of deep learning neural network
CN111582069A (en) * 2020-04-22 2020-08-25 北京航空航天大学 Track obstacle zero sample classification method and device for air-based monitoring platform
CN111832390A (en) * 2020-05-26 2020-10-27 西南大学 Handwritten ancient character detection method
CN111860525A (en) * 2020-08-06 2020-10-30 宁夏宁电电力设计有限公司 Bottom-up optical character recognition method suitable for terminal block
CN112598614A (en) * 2019-09-17 2021-04-02 南京大学 Judicial image quality measurement method based on deep neural network
CN113538075A (en) * 2020-04-14 2021-10-22 阿里巴巴集团控股有限公司 Data processing method, model training method, device and equipment
WO2023015939A1 (en) * 2021-08-13 2023-02-16 北京百度网讯科技有限公司 Deep learning model training method for text detection, and text detection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298604A (en) * 2011-05-27 2011-12-28 中国科学院自动化研究所 Video event detection method based on multi-media analysis
US20170236069A1 (en) * 2016-02-11 2017-08-17 Nec Laboratories America, Inc. Scalable supervised high-order parametric embedding for big data visualization
CN107330074A (en) * 2017-06-30 2017-11-07 中国科学院计算技术研究所 The image search method encoded based on deep learning and Hash


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU Q H,ET AL: "Deep metric learning for scene text detection", 《2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC)》 *


Similar Documents

Publication Publication Date Title
CN109002463A (en) A kind of Method for text detection based on depth measure model
CN107609459B (en) A kind of face identification method and device based on deep learning
Lu et al. Learning optimal seeds for diffusion-based salient object detection
CN103514456B (en) Image classification method and device based on compressed sensing multi-core learning
CN105427309B (en) The multiple dimensioned delamination process of object-oriented high spatial resolution remote sense information extraction
CN106503727B (en) A kind of method and device of classification hyperspectral imagery
CN109919177B (en) Feature selection method based on hierarchical deep network
CN106845510A (en) Chinese tradition visual culture Symbol Recognition based on depth level Fusion Features
CN105930815A (en) Underwater organism detection method and system
CN109711448A (en) Based on the plant image fine grit classification method for differentiating key field and deep learning
CN109740686A (en) A kind of deep learning image multiple labeling classification method based on pool area and Fusion Features
Aditya et al. Batik classification using neural network with gray level co-occurence matrix and statistical color feature extraction
CN109886161A (en) A kind of road traffic index identification method based on possibility cluster and convolutional neural networks
CN107909102A (en) A kind of sorting technique of histopathology image
CN104282008B (en) The method and apparatus that Texture Segmentation is carried out to image
CN107679509A (en) A kind of small ring algae recognition methods and device
CN110264454B (en) Cervical cancer histopathological image diagnosis method based on multi-hidden-layer conditional random field
CN107392968A (en) The image significance detection method of Fusion of Color comparison diagram and Color-spatial distribution figure
CN110399820A (en) A kind of margin of roads scenery visual identity analysis method
CN108664969A (en) Landmark identification method based on condition random field
CN104573701B (en) A kind of automatic testing method of Tassel of Corn
CN108288061A (en) A method of based on the quick positioning tilt texts in natural scene of MSER
CN110348320A (en) A kind of face method for anti-counterfeit based on the fusion of more Damage degrees
CN106548195A (en) A kind of object detection method based on modified model HOG ULBP feature operators
CN105844299B (en) A kind of image classification method based on bag of words

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190611

Address after: 110 000 No. 18 Ningbo Road, Heping District, Shenyang, Liaoning Province

Applicant after: Guo Wang Information Communication Branch Company of Liaoning Electric Power Co., Ltd.

Applicant after: Nanjing University

Applicant after: NANJING NARI INFORMATION COMMUNICATION SCIENCE & TECHNOLOGY CO., LTD.

Applicant after: State Grid Corporation of China

Address before: 11 006 No. 18 Ningbo Road, Heping District, Shenyang City, Liaoning Province

Applicant before: Guo Wang Information Communication Branch Company of Liaoning Electric Power Co., Ltd.

Applicant before: Nanjing University

Applicant before: NANJING NARI INFORMATION COMMUNICATION SCIENCE & TECHNOLOGY CO., LTD.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181214
