CN108549893B - End-to-end identification method for scene text with any shape - Google Patents
- Publication number
- CN108549893B CN108549893B CN201810294058.XA CN201810294058A CN108549893B CN 108549893 B CN108549893 B CN 108549893B CN 201810294058 A CN201810294058 A CN 201810294058A CN 108549893 B CN108549893 B CN 108549893B
- Authority
- CN
- China
- Prior art keywords
- text
- network
- character
- region
- rcnn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an end-to-end recognition method for scene text of arbitrary shape, in which text features are extracted through a feature pyramid network and candidate text boxes are generated through a region extraction network; the positions of the candidate text boxes are then adjusted through the fast region classification regression branch to obtain more accurate text bounding box positions; next, the bounding box positions are input into a segmentation branch, and a predicted character sequence is obtained through a pixel voting algorithm; finally, the predicted character sequence is processed through a weighted edit distance algorithm, which finds the best-matching word of the predicted sequence in a given dictionary to obtain the final text recognition result. The method can simultaneously detect and recognize scene text of arbitrary shape in natural images, including horizontal, multi-oriented and curved text, and can be trained completely end to end. Compared with the prior art, the proposed detection and recognition method performs excellently in accuracy and generality and has strong practical application value.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an end-to-end identification method for a scene text with any shape.
Background
Scene text detection and recognition is a very active and challenging research direction in the field of computer vision, on which many real-life applications depend, such as image-based geo-location, real-time translation, and assistance for the visually impaired.
Scene text detection and recognition aims to detect and recognize text from natural scenes simultaneously, i.e. it comprises the two tasks of detection and recognition. Most past research has treated these tasks separately: in a first step, a trained detector detects character regions in a natural scene picture, and in a second step the detected regions are input into a recognition module to obtain the character content. However, the two tasks are highly correlated and complementary: on the one hand, the quality of the detection step determines the accuracy of recognition; on the other hand, the recognition result can also provide feedback to detection. Such separate processing can therefore leave the performance of both detection and recognition short of optimal.
Recently, two end-to-end trainable frameworks for scene text recognition have been proposed. Given the complementarity between detection and recognition, these unified models are significantly superior to previous approaches. However, both methods have two major drawbacks: first, they cannot be trained in a fully end-to-end fashion; second, they can only recognize horizontal or oriented text, whereas the shape of text in real scene pictures varies greatly, from horizontal or oriented to curved forms. An end-to-end recognition method capable of handling scene text of arbitrary shape therefore needs to be designed.
Disclosure of Invention
The invention aims to provide an end-to-end recognition method for scene text of arbitrary shape, consisting of a text detector based on instance segmentation and a text recognizer based on character segmentation. Text of arbitrary shape is detected by segmenting instance text regions, and recognized by semantic segmentation in two-dimensional space, so that irregular text instances can be recognized. The method can detect and recognize text instances of arbitrary shape and can be trained completely end to end.
In order to achieve the above object, the present invention provides an end-to-end recognition method for scene texts with arbitrary shapes, which solves the problem of scene text detection and recognition from a completely new perspective, and comprises the following steps:
(1) training an arbitrarily-shaped scene text end-to-end recognition network model, comprising the following sub-steps:
(1.1) carrying out word-level labeling of the multi-oriented text of all pictures in the original data set, where the labels are the clockwise polygon vertex coordinates of each word-level text bounding box and the character sequence of the word, obtaining a labeled standard training data set;
(1.2) defining the arbitrary-shape scene text end-to-end recognition network model, which consists of a feature pyramid structure network, a region extraction network, a fast region classification regression branch network and a segmentation branch network; calculating training labels according to the labeled standard training data set of step (1.1), designing a loss function, and training the arbitrary-shape scene text end-to-end recognition network with the back-propagation method to obtain the end-to-end recognition network model for scene text of arbitrary shape; this specifically comprises the following substeps:
(1.2.1) constructing the arbitrary-shape scene text end-to-end recognition network model, which consists of a feature pyramid structure network, a region extraction network, a fast region classification regression branch network and a segmentation branch network. The feature pyramid structure network, shown in fig. 3, is formed by adding bottom-up connections, top-down connections and lateral connections to a ResNet-50 deep convolutional base network, and extracts features fusing different resolutions from the input standard data set pictures. The extracted features of different scales are input into the region extraction network to obtain candidate text regions; after region-of-interest alignment, candidate text regions of fixed scale are obtained and input separately into the fast region classification regression branch network and the segmentation branch network. The candidate text regions of resolution 7 × 7 extracted by the region extraction network are input into the fast region classification regression network, whose classification branch predicts the probability that an input candidate region is a positive sample, providing more accurate candidate text regions, and whose regression branch computes the offset of the candidate region relative to the real text region and adjusts its position. As shown in fig. 4, the segmentation branch network is composed of four convolutional layers Conv1, Conv2, Conv3, Conv4, one deconvolution layer DeConv and one final convolutional layer Conv5; the candidate text regions of resolution 16 × 64 extracted by the region extraction network are input into the segmentation branch, and convolution and deconvolution operations finally generate 38 target segmentation layers of resolution 32 × 128: 1 global text instance segmentation layer predicting the exact position of the text region, plus 36 character segmentation layers and 1 character background segmentation layer from which the predicted character sequence is obtained by the pixel voting algorithm.
(1.2.2) generating horizontal initial bounding boxes on the original image according to the labeled standard training data set and the feature maps, and generating training labels for the region extraction network module, the fast region classification regression branch network module and the segmentation branch network module of the recognition network model: for the labeled standard training data set Itr, the ground-truth label of an input picture contains a polygon P = {p1, p2, …, pm} representing the text region and a character label C = {c1 = (cc1, cl1), c2 = (cc2, cl2), …, cn = (ccn, cln)} indicating the category and position of each character. For an input picture Itri, Pi is the polygonal bounding box of a text region in Itri, pij = (xij, yij) is the coordinate of the j-th vertex of polygon Pi, and m denotes the number of polygonal text labels; cck and clk are respectively the category and position of the k-th character in the text. In the present invention C is not required for all training samples.
For a given standard data set Itr, the polygon P = {p1, p2, …, pm} in each data set label is first converted into the smallest horizontal rectangle enclosing the polygonal text label box, denoted Gd = (x, y, h, w) with rectangle center (x, y), height h and width w. For the region extraction network, according to the labeled bounding boxes Gd = (x, y, h, w) of the labeled data set, each pixel of every feature map to be extracted output by the feature pyramid is mapped back to the original image, and several initial bounding boxes are generated according to the candidate text regions predicted by the region extraction network. The Jaccard coefficient of each initial bounding box Q0 with respect to the labeled bounding boxes Gd is calculated: when the Jaccard coefficients between Q0 and all labeled bounding boxes Gd are less than 0.5, the initial bounding box Q0 is labeled as negative non-text with class label Prpn = 0; otherwise, i.e. there is at least one labeled bounding box Gd whose Jaccard coefficient with Q0 is not less than 0.5, Q0 is labeled as positive text with class label Prpn = 1, and the position offsets are calculated relative to the labeled box with the largest Jaccard coefficient, according to the formulas:
x = x0 + w0·Δx
y = y0 + h0·Δy
w = w0·exp(Δw)
h = h0·exp(Δh)
where x0 and y0 are respectively the abscissa and ordinate of the center point of the initial bounding box Q0, w0 and h0 are respectively its width and height, Δx and Δy are the offsets of the center point of Q0 relative to the center point of Gd along the abscissa and ordinate, and exp is the exponential function. The training label of the region extraction network is obtained as:
gtrpn=(Δxrpn,Δyrpn,Δhrpn,Δwrpn,Prpn)
For the fast region classification regression branch network, the training label is calculated similarly: gtrcnn = (Δxrcnn, Δyrcnn, Δhrcnn, Δwrcnn, Prcnn)
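The Jaccard-based label assignment and offset encoding described above can be illustrated with a minimal pure-Python sketch (the function names are hypothetical and boxes are simplified to axis-aligned center/height/width tuples (x, y, h, w) as in the text; this is not the patent's implementation):

```python
import math

def jaccard(box_a, box_b):
    """Jaccard (IoU) coefficient of two boxes in (x, y, h, w) center form."""
    def to_corners(b):
        x, y, h, w = b
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2
    ax0, ay0, ax1, ay1 = to_corners(box_a)
    bx0, by0, bx1, by1 = to_corners(box_b)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def encode_offsets(q0, gd):
    """Offsets (dx, dy, dh, dw) of labeled box gd relative to initial box q0,
    inverting x = x0 + w0*dx, y = y0 + h0*dy, h = h0*exp(dh), w = w0*exp(dw)."""
    x0, y0, h0, w0 = q0
    x, y, h, w = gd
    return ((x - x0) / w0, (y - y0) / h0,
            math.log(h / h0), math.log(w / w0))

def assign_label(q0, labeled_boxes, thresh=0.5):
    """P_rpn = 1 plus offsets w.r.t. the best-overlapping labeled box when the
    largest Jaccard coefficient reaches thresh, else P_rpn = 0."""
    best = max(labeled_boxes, key=lambda g: jaccard(q0, g))
    if jaccard(q0, best) < thresh:
        return 0, None
    return 1, encode_offsets(q0, best)
```

Applying the four decoding formulas above to these offsets recovers the labeled box, which is how the regression targets are interpreted at test time.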
For the segmentation branch network, two types of target labels need to be generated: a global label for text instance segmentation and character labels for character semantic segmentation. For a given positive candidate text box r, the best-matching horizontal rectangle is obtained first, then the matching polygon and character boxes, which are shifted and resized to align the candidate box r with a target label of preset height H and width W according to the following formulas:
Bx = (Bx0 - min(rx)) × W / (max(rx) - min(rx))
By = (By0 - min(ry)) × H / (max(ry) - min(ry))
where (rx, ry) are the vertices of the candidate text box r, and (Bx, By) and (Bx0, By0) are the updated and original vertices of the polygon and of all character boxes; specifically, rx is the set of abscissas of all vertices of the candidate text box r, ry the set of their ordinates, and Bx, Bx0, By, By0 are defined similarly. The target global label Xg is then generated by drawing the standard polygon on a zero-initialized mask and filling it with the value 1. For the character label, with the center kept as origin, each standard character box is shrunk to one eighth of the size of the original box so that character masks do not overlap each other; the shrunken character boxes are drawn on a zero-initialized mask and filled with their corresponding category indices, generating the character label Xc. If C does not exist, all pixels of the character layers are set to -1 and are ignored during optimization. Finally the overall segmentation branch label gtmask = X is obtained, and combining the above labels gtrpn, gtrcnn, gtmask, the final training label is generated as:
gt = {Δxrpn, Δyrpn, Δhrpn, Δwrpn, Prpn, Δxrcnn, Δyrcnn, Δhrcnn, Δwrcnn, Prcnn, X};
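The shift-and-resize alignment of polygon and character-box vertices into the H × W target label frame can be sketched as follows (a hedged pure-Python illustration; the helper name is hypothetical):

```python
def align_vertices(vertices, rx, ry, H, W):
    """Map original vertices (Bx0, By0) into the H x W target label frame of a
    candidate box r, per Bx = (Bx0 - min(rx)) * W / (max(rx) - min(rx)) and the
    analogous formula for By."""
    sx = W / (max(rx) - min(rx))
    sy = H / (max(ry) - min(ry))
    return [((bx0 - min(rx)) * sx, (by0 - min(ry)) * sy)
            for bx0, by0 in vertices]
```

For example, a candidate box spanning abscissas [100, 300] and ordinates [40, 72] maps its corners onto the corners of a 32 × 128 target mask.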
(1.2.3) The standard training data set Itr is taken as input of the recognition network model and features are extracted with the feature pyramid network module: the pictures of the standard training data set Itr are input into the bottom-up ResNet-50 structure of the feature pyramid network, in which each group of convolutional-layer units that does not change the feature-map size is defined as a stage (stages {P2, P3, P4, P5, P6}), and the final output convolution features F of each stage are extracted. The top-down connections of the feature pyramid network module upsample the output convolution features of ResNet-50 to generate multi-scale upsampled features, and the lateral connection structure of the module fuses the upsampled feature of each stage from the top-down pass with the feature generated in the bottom-up pass, producing the final features {F2, F3, F4, F5, F6}, as shown in fig. 3.
(1.2.4) inputting the features extracted by the feature pyramid network into a region extraction network, distributing anchor points, adjusting a feature map by using a region-of-interest alignment method, and generating a candidate text box:
for an input picture Itrk, features of 5 stages {F2, F3, F4, F5, F6} are extracted through the feature pyramid network; according to the stages {P2, P3, P4, P5, P6}, the anchor scales of the different stages are defined as {32², 64², 128², 256², 512²}, and each scale layer has 3 aspect ratios {1:2, 1:1, 2:1}; thus 15 feature maps of different scales and ratios {Ftr1, Ftr2, …, Ftr15} can be extracted, denoted Ftrp with subscript p = 1, …, 15;
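The 15 scale/ratio anchor combinations described above can be enumerated with a short sketch (pure Python, hypothetical helper name), assuming each anchor keeps area scale² while its width-to-height ratio takes the values {1:2, 1:1, 2:1}:

```python
import math

def generate_anchor_shapes(scales=(32, 64, 128, 256, 512),
                           ratios=(0.5, 1.0, 2.0)):
    """Return the 15 (height, width) anchor shapes, one per scale/aspect-ratio
    pair, keeping h * w = scale**2 and w / h = ratio."""
    shapes = []
    for s in scales:
        for r in ratios:
            h = math.sqrt(s * s / r)
            w = math.sqrt(s * s * r)
            shapes.append((h, w))
    return shapes
```

Each of the 5 pyramid stages thus contributes 3 anchor shapes, giving the 15 combinations mentioned in the text.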
Through the region-of-interest alignment operation, candidate text regions of fixed scale are generated from the features Ftrp: a candidate text region Rrcnn of resolution 7 × 7 is generated for the fast region classification regression network, and a candidate text region Rmask of resolution 16 × 64 for the segmentation branch. The probability Prpn that each candidate text box is a correct text region bounding box is predicted through classification, and the candidate text box offsets are predicted through regression:
Yrpn = (Δxrpn, Δyrpn, Δhrpn, Δwrpn).
(1.2.5) The candidate text regions Rrcnn of size 7 × 7 generated by the region extraction network are input into the fast region classification regression branch network module, the loss function is calculated through the classification and regression branches and back-propagated, and predicted text bounding boxes are finally generated: the fast region classification regression branch network is divided into a classification branch and a regression branch. The candidate text regions Rrcnn of size 7 × 7 are input into the classification branch, which outputs through convolution operations the classification score Prcnn of the predicted bounding box, i.e. the probability that the bounding box is a positive text box, a decimal in [0, 1]. Rrcnn is also input into the regression branch, which outputs the predicted regression offset Yrcnn = (Δxrcnn, Δyrcnn, Δhrcnn, Δwrcnn) composed of 4 decimals in [0, 1], namely the predicted offsets of the center abscissa and ordinate and of the height and width of the predicted bounding box Gq, when predicted as a positive text box, relative to the labeled bounding box Gd.
(1.2.6) The candidate text regions Rmask of size 16 × 64 generated by the region extraction network are input into the segmentation branch network module, which generates 38 target segmentation layers based on instance segmentation and semantic segmentation operations: the segmentation branch network module comprises 4 convolutional layers Conv1, Conv2, Conv3, Conv4, a deconvolution layer DeConv and a final convolutional layer Conv5. The candidate text boxes Rmask of size 16 × 64 are input into the segmentation branch module, and convolution, deconvolution and related operations finally generate 38 target segmentation layers {Mglobal, M1, M2, …, M36, Mbackground} of scale 32 × 128; the value X of each pixel of an output layer lies in [0, 1]. From the global segmentation layer Mglobal among the output layers, the text region polygon Pm = {pm1, pm2, …, pmn} can be predicted directly; from the character segmentation layers {M1, M2, …, M36} and the character background segmentation layer Mbackground, the character sequence Sq can be predicted with the pixel voting algorithm.
(1.2.7) Taking the training label gt as the desired output of the network and the predicted labels as the network prediction output, an objective loss function between the desired output and the prediction output is designed for the constructed network model: the training label gt calculated in step (1.2.2) is taken as the desired output, and the predicted labels of steps (1.2.4), (1.2.5) and (1.2.6) as the network prediction output. For the network model constructed in (1.2.1), the overall objective loss function consists of the loss functions of the region extraction network, the fast region classification regression branch network and the segmentation branch network, with the expression:
L(Prpn,Yrpn,Prcnn,Yrcnn,X)=Lrpn(Prpn,Yrpn)+α1Lrcnn(Prcnn,Yrcnn)+α2Lmask(X)
where Lrpn(Prpn, Yrpn) is the loss function of the region extraction network, Lrcnn(Prcnn, Yrcnn) the loss function of the fast region classification regression branch network, and Lmask(X) the loss function of the segmentation branch network; α1 and α2 are the weight coefficients of the loss functions Lrcnn and Lmask respectively, and are simply set to 1;
According to the designed overall objective loss function, the model is trained iteratively with the back-propagation algorithm to minimize the overall loss and obtain the optimal network model. For the scene text detection and recognition task, the training process first iterates on the synthetic text data set (SynthText) to obtain initial network parameters, and then trains on real data sets to fine-tune the network parameters.
(2) carrying out character recognition on the text picture to be recognized with the trained model, comprising the following substeps:
(2.1) Features are extracted from the scene text picture to be detected and recognized and input into the fast region classification regression branch network to generate candidate text regions, which are filtered by a non-maximum suppression operation to obtain more accurate candidate text regions: the k-th picture Itstk of the data set to be detected Itst is input into the model trained in step (1.2); after the feature pyramid network and the region extraction network, initial bounding boxes are generated and input into the fast region classification regression branch network. For each initial bounding box Gq, the classification branch outputs the classification score Prcnn, the score with which Gq is predicted to be a positive sample; the regression branch outputs a predicted regression offset Yrcnn = (Δxrcnn, Δyrcnn, Δhrcnn, Δwrcnn) composed of 4 decimals, namely the offsets of the center abscissa and ordinate and of the height and width of Gq, when predicted as a positive text box, relative to the labeled bounding box Gd. From these position offsets the position Qz of the quadrilateral text bounding box predicted by the network can be calculated;
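Decoding a predicted offset back into a box position follows the same four equations given in step (1.2.2); a minimal sketch (hypothetical function name, center/height/width box form):

```python
import math

def decode_box(q0, offsets):
    """Apply predicted offsets (dx, dy, dh, dw) to an initial box
    (x0, y0, h0, w0), giving the predicted box via x = x0 + w0*dx,
    y = y0 + h0*dy, h = h0*exp(dh), w = w0*exp(dw)."""
    x0, y0, h0, w0 = q0
    dx, dy, dh, dw = offsets
    return (x0 + w0 * dx, y0 + h0 * dy,
            h0 * math.exp(dh), w0 * math.exp(dw))
```

This is the inverse of the offset encoding used to build the training labels, so encoding followed by decoding returns the original labeled box.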
The predicted text bounding boxes Qz are filtered by a non-maximum suppression operation to obtain the output result: the network model regresses a horizontal quadrilateral position for every initial bounding box Q0 predicted as positive text on each feature map Ftstp, and the positive text quadrilaterals regressed on the different feature maps of the same test picture Itstk usually overlap one another, so non-maximum suppression is performed on all positive quadrilateral positions. The specific steps are: 1) a predicted text bounding box is kept if and only if its text classification score Prcnn ≥ 0.5; 2) non-maximum suppression (NMS) with a Jaccard coefficient threshold of 0.2 is applied to the boxes kept in the previous step, yielding the finally kept quadrilateral bounding boxes of positive text.
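The two filtering steps above (score threshold 0.5, then NMS at Jaccard coefficient 0.2) can be sketched in pure Python on corner-form boxes (hypothetical names; the patent's boxes are quadrilaterals, simplified here to axis-aligned rectangles):

```python
def iou(a, b):
    """Jaccard coefficient of two corner-form boxes (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_and_nms(boxes, scores, score_thresh=0.5, nms_thresh=0.2):
    """Keep boxes scoring >= score_thresh, then greedy NMS at nms_thresh:
    iterate in descending score order and drop any box overlapping a kept one."""
    cand = sorted(((s, b) for s, b in zip(scores, boxes) if s >= score_thresh),
                  key=lambda t: t[0], reverse=True)
    kept = []
    for s, b in cand:
        if all(iou(b, k) < nms_thresh for _, k in kept):
            kept.append((s, b))
    return [b for _, b in kept]
```

With the low 0.2 threshold, near-duplicate regressions of the same word from different pyramid levels collapse to the single highest-scoring box.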
(2.2) The predicted candidate text regions are input into the segmentation branch network for text instance segmentation and character segmentation, generating a global text instance segmentation mask and character segmentation masks respectively; the polygonal word text region is obtained by computing the contour of the text region on the global text instance segmentation mask, and the character sequence is predicted from the character segmentation masks with the pixel voting algorithm: the predicted quadrilateral text bounding box position Qz is input into the segmentation branch, which generates 38 target segmentation layers. First, the contour of the text region is computed directly on the global text instance segmentation mask, yielding the text region polygon. Second, the character sequence Sq is generated with the pixel voting algorithm.
For the 36 character segmentation layers {M1, M2, …, M36}, the value pci(x, y) of a pixel in the i-th segmentation layer represents the probability that the pixel pg(x, y) at the corresponding position of the global text segmentation layer is the character zi, where zi is the i-th of the 36 characters {0, 1, …, 9, a, b, …, z}; the probabilities at a given pixel position summed over the 36 character segmentation layers equal 1, i.e. Σ(i=1..36) pci(x, y) = 1.
For the character background segmentation layer Mbackground, the layer is first binarized; on the binarized background map, the set of character regions of the background layer is defined as R = {r1, r2, …, rn}, where ri is the i-th character region on the character background segmentation layer and n is the number of characters on the background segmentation layer;
The pixel voting algorithm proceeds as follows: first, for the character regions ri of the character background segmentation layer, the set of connected regions in the 36 character segmentation layers is defined as Ci = {ci1, ci2, …, ci36}, where cij is the region block of the j-th character segmentation layer corresponding to the i-th character region of the character background segmentation layer. Then, for region ri and its corresponding connected regions Ci, the predicted character is obtained by pixel voting in the following steps: first, the mean of all pixel values inside each cij of the connected regions Ci is calculated; second, the cij_max with the largest mean is found, and the character class zj_max of its character layer Mj_max is the predicted character of this character region. Finally, applying this operation to every character region ri of the character background segmentation layer yields the final predicted character sequence Sq.
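The pixel voting procedure can be sketched as follows (pure Python; `char_maps` stands for the character segmentation layers as 2-D probability grids and `regions` for the character regions of the binarized background layer — both names are hypothetical):

```python
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"

def vote_character(char_maps, region):
    """Average each character layer's probabilities over the region's pixels
    and return the character whose layer has the largest mean (pixel voting).
    `region` is an iterable of (x, y) pixel coordinates; layers index [y][x]."""
    best_char, best_mean = None, -1.0
    for idx, layer in enumerate(char_maps):
        mean = sum(layer[y][x] for x, y in region) / len(region)
        if mean > best_mean:
            best_char, best_mean = CHARSET[idx], mean
    return best_char

def vote_sequence(char_maps, regions):
    """Apply pixel voting to every character region, left to right."""
    return "".join(vote_character(char_maps, r) for r in regions)
```

Because the 36 character layers' probabilities sum to 1 at each pixel, averaging over the region and taking the argmax picks the dominant class for that character area.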
(2.3) The character sequence predicted by the segmentation branch is processed with the weighted edit distance algorithm, which finds the best-matching word of the predicted sequence in a given dictionary to obtain the final recognition result: in the pixel voting stage, the probabilities of all character classes of each character region of the predicted sequence are available, and different weights are defined for the deletion, insertion and substitution operations according to these probabilities. For a deletion, the cost is the probability with which the character was predicted as the currently deleted character; for an insertion, the cost is the average probability of the two characters adjacent to the insertion position; for a substitution, the cost is max(1 - s1/s2, 0), where s1 and s2 are the probabilities of the candidate character and of the predicted character to be replaced. The predicted string is matched against the given dictionary with the weighted edit distance, with different weights for deletion, insertion and substitution, and the predicted word is adjusted accordingly, improving accuracy and yielding the final recognition result.
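The weighted edit distance can be implemented with the standard dynamic program, with the per-operation costs supplied as callables following the definitions above (a sketch with hypothetical names; in the actual method the costs come from the pixel-voting probabilities):

```python
def weighted_edit_distance(pred, word, del_cost, ins_cost, sub_cost):
    """Dynamic-programming edit distance between predicted string `pred` and a
    dictionary word, with position-dependent deletion, insertion and
    substitution costs supplied as callables."""
    m, n = len(pred), len(word)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost(i - 1)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost(0, word[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = 0.0 if pred[i - 1] == word[j - 1] else sub_cost(i - 1, word[j - 1])
            d[i][j] = min(d[i - 1][j] + del_cost(i - 1),
                          d[i][j - 1] + ins_cost(i, word[j - 1]),
                          d[i - 1][j - 1] + same)
    return d[m][n]

def best_match(pred, lexicon, del_cost, ins_cost, sub_cost):
    """Return the dictionary word with the smallest weighted edit distance."""
    return min(lexicon, key=lambda w: weighted_edit_distance(
        pred, w, del_cost, ins_cost, sub_cost))
```

With unit costs this reduces to the ordinary Levenshtein distance; probability-derived costs bias the search toward corrections the segmentation branch itself considers plausible.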
Through the technical scheme, compared with the prior art, the invention has the following technical effects:
(1) the accuracy is high: for the problem of recognizing text of arbitrary shape in scenes, the method innovatively uses instance segmentation to detect text and semantic segmentation to recognize it, detecting text positions and recognizing text content more accurately.
(2) The speed is high: the detection and recognition model provided by the invention has the advantages that the detection and recognition accuracy is ensured, and the training speed is high.
(3) The universality is strong: the invention discloses an end-to-end trainable text detection and recognition model, which can not only simultaneously detect and recognize texts and realize complete end-to-end training, but also process texts in various shapes, including horizontal, directional and curved texts;
(4) The robustness is strong: the invention can cope with changes in text scale and shape, and can simultaneously detect and recognize horizontal, oriented and curved text.
Drawings
FIG. 1 is a flow chart of an arbitrary-shaped scene text end-to-end recognition method of the present invention, in which a solid arrow represents training and a dashed arrow represents testing;
FIG. 2 is a diagram of an arbitrarily shaped scene text end-to-end recognition network model of the present invention;
FIG. 3 is a schematic diagram of a network structure of a feature pyramid structure module in an arbitrary-shaped scene text end-to-end recognition model according to the present invention;
FIG. 4 is a diagram of a segmentation branch network structure in an arbitrary-shaped scene text end-to-end recognition model according to the present invention;
FIG. 5 is a schematic diagram of a test portion pixel voting algorithm of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The technical terms of the present invention are first explained:
ResNet-50: a neural network for classification, mainly consisting of 50 convolutional layers, pooling layers and shortcut connection layers. The convolutional layers extract picture features; the pooling layers reduce the dimensionality of the feature vectors output by the convolutional layers and reduce overfitting; the shortcut connection layers propagate gradients and alleviate the vanishing and exploding gradient problems. The network parameters can be updated by the back-propagation algorithm;
Region extraction network: a network for generating candidate text regions. A sliding window over the extracted feature map produces fully connected features of a fixed dimension, from which two fully connected branches classify and regress candidate text regions; finally, candidate text regions of different scales and aspect ratios are generated for the subsequent network according to the different anchors and ratios.
Jaccard coefficient: the Jaccard coefficient measures the similarity and difference between finite sample sets. In the field of text detection, the Jaccard coefficient is by default taken to be the IoU (Intersection over Union), i.e. the intersection area of two boxes divided by their union area. It describes the overlap rate between a predicted text box generated by the model and the original labeled text box: the larger the IoU, the higher the degree of overlap and the more accurate the detection.
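As a concrete illustration, the IoU (intersection area over union area) of two horizontal boxes can be computed as follows. This is a minimal sketch; the (x1, y1, x2, y2) corner format is an assumption made for illustration, not fixed by the text.

```python
def iou(a, b):
    """Jaccard coefficient of two horizontal boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # clamp at zero so disjoint boxes get an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

Two unit boxes offset by half their side, for example, have intersection 1 and union 7, giving IoU 1/7.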
Non-maximum suppression (NMS): non-maximum suppression is a post-processing algorithm widely used in computer vision detection. Given a set threshold, it filters overlapping detection boxes by iteratively sorting, traversing and rejecting, removing redundant detection boxes to obtain the final detection result.
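The sort-traverse-reject loop described above can be sketched as follows (a simple O(n²) version; the corner-coordinate box format is an assumption for illustration):

```python
def _iou(a, b):
    # IoU of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh):
    """Keep the highest-scoring box of every overlapping cluster:
    sort by score, traverse, and reject any box whose IoU with an
    already-kept box reaches the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(_iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

With a threshold of 0.5, a box heavily overlapping a higher-scoring one is discarded, while a distant box survives.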
As shown in fig. 1, the method for recognizing a scene text in an arbitrary shape from end to end of the present invention includes the following steps:
(1) training an arbitrarily-shaped scene text end-to-end recognition network model, comprising the following sub-steps:
(1.1) carrying out word-level labeling on multidirectional texts of all pictures in an original data set, wherein labels are polygon clockwise vertex coordinates of a text bounding box at the word level and word character sequences of the texts, and a standard training data set with labels is obtained;
(1.2) defining an arbitrary-shape scene text end-to-end recognition network model, where the recognition network model consists of a feature pyramid structure network, a region extraction network, a fast region classification regression branch network and a segmentation branch network; calculating training labels according to the labeled standard training data set of step (1.1), designing a loss function, and training the arbitrary-shape scene text end-to-end recognition network by the back-propagation method to obtain the arbitrary-shape scene text end-to-end recognition network model; the method specifically comprises the following sub-steps:
(1.2.1) constructing an arbitrary-shape scene text end-to-end recognition network model, where the recognition network model consists of a feature pyramid structure network, a region extraction network, a fast region classification regression branch network and a segmentation branch network. The feature pyramid structure network, shown in fig. 3, takes the ResNet-50 deep convolutional neural network as its base network and adds bottom-up, top-down and lateral connections; it extracts features fusing different resolutions from an input standard data set picture. The extracted features of different scales are input into the region extraction network to obtain candidate text regions; after region-of-interest alignment, candidate text regions of fixed scale are obtained and input into the fast region classification regression branch network and the segmentation branch network respectively. The candidate text region of resolution 7 × 7 extracted by the region extraction network is input into the fast region classification regression network, which predicts through its classification branch the probability that the input candidate text region is a positive sample, providing a more accurate candidate text region, and computes through its regression branch the offset of the candidate text region relative to the real text region, adjusting its position. As shown in fig. 4, the segmentation branch network consists of four convolutional layers Conv1, Conv2, Conv3, Conv4, one deconvolution layer DeConv and one final convolutional layer Conv5; the candidate text region of resolution 16 × 64 extracted by the region extraction network is input into the segmentation branch, and 38 target segmentation layers of resolution 32 × 128 are finally generated through convolution and deconvolution operations: 1 global text instance segmentation layer predicts the exact position of the text region, while 36 character segmentation layers and 1 character background segmentation layer yield the predicted character sequence through the pixel voting algorithm.
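A shape-level numpy sketch of the data flow through the segmentation branch follows. The convolutions are untrained placeholders (only the tensor shapes follow the text): four size-preserving convolutions, one stride-2 deconvolution that doubles 16 × 64 to 32 × 128, and a final projection to the 38 output layers.

```python
import numpy as np

def segmentation_branch(roi_feat):
    """Shape-level sketch of the segmentation branch (not a trained network).

    roi_feat: (C, 16, 64) candidate text region from RoI alignment.
    Returns 38 segmentation layers of resolution 32 x 128:
    index 0 -> global text instance layer, 1..36 -> character layers,
    37 -> character background layer.
    """
    c, h, w = roi_feat.shape
    assert (h, w) == (16, 64)
    x = roi_feat
    for _ in range(4):                            # Conv1..Conv4: 3x3, pad 1 -> size kept
        x = np.maximum(x, 0.0)                    # placeholder for conv + ReLU
    x = x.repeat(2, axis=1).repeat(2, axis=2)     # DeConv: 2x upsample -> 32 x 128
    logits = x[:1].repeat(38, axis=0) * 0.0       # Conv5 placeholder: 38 output maps
    probs = 1.0 / (1.0 + np.exp(-logits))         # per-pixel values in [0, 1]
    return probs
```

The sigmoid at the end matches the text's statement that every output pixel value lies in [0, 1].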
(1.2.2) generating horizontal initial bounding boxes on the original image according to the labeled standard training data set and the feature maps, and generating training labels for the region extraction network, fast region classification regression branch network and segmentation branch network modules of the recognition network model: for the labeled standard training data set I_tr, the true label of an input picture contains a polygon set P = {p_1, p_2, …, p_m} representing the text regions and a character label set C = {c_1 = (cc_1, cl_1), c_2 = (cc_2, cl_2), …, c_n = (cc_n, cl_n)} indicating the class and position of each character. For an input picture I_tri, P_i is the polygonal bounding box of a text region in picture I_tri, p_ij = (x_ij, y_ij) is the coordinate of the j-th vertex of polygon P_i, m denotes the number of polygonal text labels, and cc_k and cl_k are respectively the class and position of the k-th character of the text; in the present invention C is not required for all training samples.
For a given standard data set I_tr, the polygon P = {p_1, p_2, …, p_m} in the data set label is first converted to the smallest horizontal rectangle bounding the polygonal text label box, denoted G_d = (x, y, h, w) by the center point (x, y) of the rectangle and its height h and width w. For the region extraction network, according to the labeled bounding box G_d = (x, y, h, w) of the labeled data set, each pixel on each of the feature maps to be extracted output by the feature pyramid is mapped back to the original image, and a number of initial bounding boxes are generated according to the candidate text regions predicted by the region extraction network. The Jaccard coefficient of each initial bounding box Q_0 with respect to the labeled bounding boxes G_d of the data set is computed: when the Jaccard coefficients of all labeled bounding boxes G_d with the initial bounding box Q_0 are less than 0.5, the initial bounding box Q_0 is labeled as negative (non-text) and the class label P_rpn takes the value 0; otherwise, i.e. when there is at least one labeled bounding box G_d whose Jaccard coefficient with Q_0 is not less than 0.5, Q_0 is labeled as positive (text), the class label P_rpn takes the value 1, and the position offset is computed relative to the labeled box with the largest Jaccard coefficient, according to the formulas:
x = x_0 + w_0·Δx
y = y_0 + h_0·Δy
w = w_0·exp(Δw)
h = h_0·exp(Δh)
where x_0, y_0 are respectively the abscissa and ordinate of the center point of the initial bounding box Q_0, w_0, h_0 are respectively the width and height of the initial bounding box Q_0, Δx, Δy are respectively the horizontal and vertical position offsets of the center point of Q_0 relative to the center point of G_d, and exp is the exponential function; the training label of the region extraction network is thus obtained as:

gt_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn, P_rpn)
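The four formulas can be inverted to encode training offsets and re-applied at test time to decode predicted boxes. A minimal sketch using the patent's (x, y, h, w) center-size box parameterization; the function names are illustrative:

```python
import math

def encode_offsets(anchor, gt):
    """Offsets (dx, dy, dh, dw) turning the anchor box into the ground truth.

    anchor, gt: (x, y, h, w) with (x, y) the center point, per the formulas
    x = x0 + w0*dx, y = y0 + h0*dy, w = w0*exp(dw), h = h0*exp(dh)."""
    x0, y0, h0, w0 = anchor
    x, y, h, w = gt
    return ((x - x0) / w0, (y - y0) / h0,
            math.log(h / h0), math.log(w / w0))

def decode_offsets(anchor, deltas):
    """Apply predicted offsets to an anchor, recovering a box (x, y, h, w)."""
    x0, y0, h0, w0 = anchor
    dx, dy, dh, dw = deltas
    return (x0 + w0 * dx, y0 + h0 * dy,
            h0 * math.exp(dh), w0 * math.exp(dw))
```

Encoding then decoding round-trips exactly, which is what lets the regression branch learn offsets instead of absolute coordinates.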
For the fast region classification regression branch network, the training label can similarly be calculated as:

gt_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn, P_rcnn)
for a split branch network, two types of target tags need to be generated: global labels for text instance segmentation and character labels for character semantic segmentation; for a given positive candidate text box r, firstly, obtaining a best matching horizontal rectangle, further obtaining a matching polygon and a character box, and then, shifting and resizing the matching polygon and the character box so as to align the candidate text box r with a target label with a preset height H and a preset width W according to the following formula:
B_x = (B_x0 − min(r_x)) × W / (max(r_x) − min(r_x))
B_y = (B_y0 − min(r_y)) × H / (max(r_y) − min(r_y))
where (r_x, r_y) are the vertices of the candidate text box r, and (B_x, B_y) and (B_x0, B_y0) are respectively the updated and original vertices of the polygon and of all character boxes; specifically, r_x is the set of abscissas of all vertices of the candidate text box r and r_y is the set of ordinates of all vertices of the candidate text box r. The target global label X_g is then generated by drawing the standard polygon on a zero-initialized mask and filling its interior with the value 1. For the character label, each standard character box is shrunk about its center (taken as the origin) to one eighth of the original box size so that character masks do not overlap each other; the shrunk character boxes are drawn on a zero-initialized mask and filled with their corresponding class indices, generating the character label X_c. If C does not exist, all pixels of the character layers are set to −1 and ignored during optimization. Finally the overall segmentation branch label gt_mask = X is obtained, and combining the above labels gt_rpn, gt_rcnn, gt_mask, the final training label is generated as:
gt = {Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn, P_rpn, Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn, P_rcnn, X};
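The character-label construction just described (shrink each standard character box about its center, then paint it on a zero-initialized mask with its class index) can be sketched in numpy. The box format, the exact interpretation of the "one eighth" shrink factor, and the use of axis-aligned boxes are simplifying assumptions:

```python
import numpy as np

def char_label_mask(h, w, char_boxes, scale=0.125):
    """Build the character label X_c on an h x w zero-initialized mask.

    char_boxes: list of (class_index, x1, y1, x2, y2) with class_index in
    1..36. Each box is shrunk about its center by `scale` (one eighth,
    following the text) so neighboring character masks cannot overlap,
    then filled with its class index."""
    mask = np.zeros((h, w), dtype=np.int32)
    for cls, x1, y1, x2, y2 in char_boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        half_w = (x2 - x1) * scale / 2.0
        half_h = (y2 - y1) * scale / 2.0
        xa, xb = int(round(cx - half_w)), int(round(cx + half_w))
        ya, yb = int(round(cy - half_h)), int(round(cy + half_h))
        mask[ya:yb + 1, xa:xb + 1] = cls        # paint the shrunk box
    return mask
```

A 17 × 17 character box on a 32 × 128 label, for instance, shrinks to a small patch around its center while the rest of the mask stays 0 (background).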
(1.2.3) taking the standard training data set I_tr as the input of the recognition network model and extracting features with the feature pyramid network module: the pictures of the standard training data set I_tr are input into the bottom-up ResNet-50 network structure of the feature pyramid network; the convolutional layer units that do not change the size of the feature map are defined as stages {P2, P3, P4, P5, P6}, and the final output convolutional features F of each stage are extracted. The top-down connections of the feature pyramid network module upsample the output convolutional features of ResNet-50 to generate multi-scale upsampled features, and the lateral connection structure of the module fuses the upsampled features of each stage in the top-down pass with the features generated in the bottom-up pass, producing the final features {F2, F3, F4, F5, F6}; the process is shown in fig. 3.
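The top-down plus lateral fusion can be sketched as follows. This is a minimal numpy sketch, not ResNet-50 itself: the learned 1 × 1 lateral convolution is stood in for by a fixed channel slice, and upsampling is nearest-neighbor.

```python
import numpy as np

def fpn_merge(bottom_up, out_channels=4):
    """Fuse bottom-up feature maps with a top-down path.

    bottom_up: maps ordered fine -> coarse, shapes (C, H, W) with H and W
    halving at each stage. Returns the merged maps in the same order."""
    laterals = [f[:out_channels] for f in bottom_up]   # stand-in for 1x1 lateral conv
    merged = [laterals[-1]]                            # start from the coarsest stage
    for lat in reversed(laterals[:-1]):
        up = merged[0].repeat(2, axis=1).repeat(2, axis=2)  # 2x nearest upsample
        merged.insert(0, lat + up)                     # lateral + top-down sum
    return merged
```

Each finer level thus accumulates semantic information flowing down from the coarser levels while keeping its own resolution.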
(1.2.4) inputting the features extracted by the feature pyramid network into a region extraction network, distributing anchor points, adjusting a feature map by using a region-of-interest alignment method, and generating a candidate text box:
For an input picture I_trk, 5 stages of features {F2, F3, F4, F5, F6} are extracted through the feature pyramid network, and the anchor scales of the different stages, following stages {P2, P3, P4, P5, P6}, are defined as {32², 64², 128², 256², 512²}, each scale layer having 3 aspect ratios {1:2, 1:1, 2:1}; thus 15 feature maps of different scales and ratios {Ftr_1, Ftr_2, …, Ftr_15} can be extracted, denoted Ftr_p with subscript p = 1, …, 15;
Through the region-of-interest alignment operation, candidate text regions of fixed scale are generated from the features Ftr_p: a candidate text region R_rcnn of resolution 7 × 7 is generated for the fast region classification regression branch, and a candidate text region R_mask of resolution 16 × 64 is generated for the segmentation branch. The classification branch predicts the probability P_rpn that each candidate text box is a correct text region bounding box, and the regression branch predicts the candidate text box offsets:
Y_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn).
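The 5 scales × 3 aspect ratios giving 15 anchor shapes can be sketched directly; per-pixel placement and stride handling are omitted, and the (h, w) output format is an assumption:

```python
def make_anchors(scales=(32, 64, 128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """One base anchor shape (h, w) per (scale, aspect ratio) pair.

    Each anchor keeps area = scale**2 while ratio = w:h, matching the
    patent's stage scales {32^2, ..., 512^2} and ratios {1:2, 1:1, 2:1},
    for 5 x 3 = 15 anchor shapes in total."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * r ** 0.5
            h = s / r ** 0.5
            anchors.append((h, w))
    return anchors
```

Every anchor preserves the stage's area, so a 1:2 anchor at scale 32 is tall and narrow while still covering 32² pixels.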
(1.2.5) The candidate text region R_rcnn of size 7 × 7 generated by the region extraction network is input into the fast region classification regression branch network module, which computes the loss function through its classification and regression branches, back-propagates, and finally generates the predicted text bounding box: the fast region classification regression network is divided into classification and regression branches. The candidate text region R_rcnn of size 7 × 7 is input into the classification branch, which through convolution operations outputs the classification score P_rcnn of the predicted bounding box, i.e. the probability that the bounding box is a positive text box, a decimal value in [0, 1]. R_rcnn is also input into the regression branch, which outputs the predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn) consisting of 4 decimals in [0, 1]: the predicted position offsets of the center point abscissa, ordinate, height and width of the predicted bounding box G_q, when predicted as a positive text box, relative to the center point abscissa, ordinate, height and width of the labeled bounding box G_d.
(1.2.6) The candidate text region R_mask of size 16 × 64 generated by the region extraction network is input into the segmentation branch network module, which generates 38 target segmentation layers based on instance segmentation and semantic segmentation operations: the segmentation branch network module comprises 4 convolutional layers Conv1, Conv2, Conv3, Conv4, a deconvolution layer DeConv, and a final convolutional layer Conv5. The candidate text box R_mask of size 16 × 64 is input into the segmentation branch module, and through convolution, deconvolution and related operations 38 target segmentation layers of scale 32 × 128 are finally generated, {M_global, M_1, M_2, …, M_36, M_background}, with the value X of each pixel of the output layers lying in [0, 1]. From the global segmentation layer M_global the text region polygon Pm = {pm_1, pm_2, …, pm_n} can be predicted directly; from the character segmentation layers {M_1, M_2, …, M_36} and the character background segmentation layer M_background the character sequence S_q can be predicted by the pixel voting algorithm.
(1.2.7) taking the training label gt as the expected output of the network and the predicted label as the network prediction output, and designing a target loss function between the expected output and the prediction output for the constructed network model: with the training label gt computed in step (1.2.2) as the expected output and the predicted labels of steps (1.2.4), (1.2.5) and (1.2.6) as the network prediction output, a target loss function between expected output and prediction output is designed for the network model constructed in (1.2.1). The overall target loss function consists of the region extraction network, fast region classification regression branch network and segmentation branch network loss functions, and its expression is:
L(P_rpn, Y_rpn, P_rcnn, Y_rcnn, X) = L_rpn(P_rpn, Y_rpn) + α_1·L_rcnn(P_rcnn, Y_rcnn) + α_2·L_mask(X)
where L_rpn(P_rpn, Y_rpn) is the loss function of the region extraction network, L_rcnn(P_rcnn, Y_rcnn) is the loss function of the fast region classification regression branch network, and L_mask(X) is the loss function of the segmentation branch network; α_1 and α_2 are the weight coefficients of the loss functions L_rcnn and L_mask respectively, and are simply set to 1;
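The weighted combination can be sketched numerically. The concrete forms of the three component losses are not fixed by the text; binary cross-entropy for the classification and mask terms and smooth-L1 for the regression terms are common choices used here as assumptions:

```python
import math

def smooth_l1(pred, target):
    # quadratic near zero, linear for large errors
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def bce(p, y, eps=1e-7):
    # binary cross-entropy of probability p against label y in {0, 1}
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def total_loss(rpn, rcnn, mask_px, alpha1=1.0, alpha2=1.0):
    """L = L_rpn + a1*L_rcnn + a2*L_mask, both weights 1 as in the text.

    rpn, rcnn: (cls_prob, cls_label, deltas_pred, deltas_gt)
    mask_px: list of per-pixel (prob, label) pairs for the mask term."""
    l_rpn = bce(rpn[0], rpn[1]) + sum(smooth_l1(p, t) for p, t in zip(rpn[2], rpn[3]))
    l_rcnn = bce(rcnn[0], rcnn[1]) + sum(smooth_l1(p, t) for p, t in zip(rcnn[2], rcnn[3]))
    l_mask = sum(bce(p, y) for p, y in mask_px) / len(mask_px)
    return l_rpn + alpha1 * l_rcnn + alpha2 * l_mask
```

With α_1 = α_2 = 1 the three terms contribute equally, as the text states.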
According to the designed overall target loss function, the model is iteratively trained with the back-propagation algorithm to minimize the overall target loss function and obtain the optimal network model. For the scene text detection and recognition task, the training process first iterates on a synthetic text data set (SynthText) to obtain initial network parameters, and then trains on real data sets to fine-tune the network parameters.
(2) Character detection and recognition are carried out on the scene text picture to be recognized using the trained model, comprising the following sub-steps:
(2.1) The extracted features of the scene text picture to be detected and recognized are input into the fast region classification regression branch network to generate candidate text regions, which are then filtered by the non-maximum suppression operation to obtain more accurate candidate text regions: the k-th picture I_tstk of the data set to be detected I_tst is input into the model trained in step (1.2); after the feature pyramid network and the region extraction network, the model generates initial bounding boxes, which are input into the fast region classification regression branch network. For each initial bounding box G_q, the classification branch outputs a predicted classification score P_rcnn as the score that the initial bounding box G_q is predicted as a positive sample, and the regression branch outputs a predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn) consisting of 4 decimals: the position offsets of the center point abscissa, ordinate, height and width of G_q, when predicted as a positive text box, relative to those of the labeled bounding box G_d. From these position offsets, the position Q_z of the quadrilateral text bounding box predicted by the network can be calculated;
The predicted text bounding boxes Q_z are filtered by the non-maximum suppression operation to obtain the output result: the network model regresses a horizontal quadrilateral position for each initial bounding box Q_0 predicted as positive text on the feature maps Ftst_p, and the positive text quadrilaterals regressed on the feature maps of the same test picture I_tstk usually overlap each other, so non-maximum suppression is applied to the positions of all positive text quadrilaterals. The specific steps are: 1) a predicted text bounding box is retained if and only if its text classification score satisfies P_rcnn ≥ 0.5; 2) the non-maximum suppression operation (NMS) is applied to the text boxes retained in the previous step with a Jaccard coefficient threshold of 0.2, yielding the finally retained quadrilateral bounding boxes of positive text.
(2.2) The predicted candidate text regions are input into the segmentation branch network for text instance segmentation and character segmentation, generating a global text instance segmentation mask and character segmentation masks respectively; the polygonal word text region is obtained by computing the contour of the text region on the global text instance segmentation mask, and the character sequence is obtained by prediction with the pixel voting algorithm on the character segmentation masks: the predicted quadrilateral text bounding box position Q_z is input into the segmentation branch, which generates 38 target segmentation layers. First, the contour of the text region is computed directly from the global text instance segmentation mask, giving the text region polygon. Second, the character sequence S_q is generated using the pixel voting algorithm.
For the 36 character segmentation layers {M_1, M_2, …, M_36}, the value p_ci(x, y) of a pixel in the i-th segmentation layer represents the probability that the pixel p_g(x, y) at the corresponding position of the global text segmentation layer is the character z_i, where z_i is the i-th of the 36 characters {0, 1, …, 9, a, b, …, z}; the probabilities at corresponding pixel positions of the 36 character segmentation layers sum to 1, i.e. Σ_{i=1}^{36} p_ci(x, y) = 1.
For the character background segmentation layer M_background, the layer is first binarized; then, on the binarized background layer, the set of character regions is defined as R = {r_1, r_2, …, r_n}, where r_i is the i-th character region on the character background segmentation layer and n is the number of all characters on the background segmentation layer;
the pixel voting algorithm process is as follows: for each character region r_i on the character background segmentation layer, its set of corresponding connected regions in the 36 character segmentation layers is defined as C_i = {c_i1, c_i2, …, c_i36}, where c_ij is the region block in the j-th character segmentation layer corresponding to the i-th character region on the character background segmentation layer. For the region r_i and its corresponding connected regions C_i, the predicted character is obtained by the pixel voting algorithm as follows: first, the values of all pixels inside each c_ij of the connected regions C_i are averaged; second, the c_ij_max with the largest average is found, and the character class z_j_max corresponding to its character layer M_j_max is taken as the predicted character of this character region. Finally, performing this operation for every character region r_i on the character background segmentation layer yields the final predicted character sequence S_q.
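The voting loop above can be sketched as follows, assuming the 36 character probability layers are stacked in a numpy array and each connected region r_i is given as a boolean mask (the connected-component extraction itself is omitted):

```python
import numpy as np

CHARS = "0123456789abcdefghijklmnopqrstuvwxyz"   # the 36 classes z_1..z_36

def pixel_vote(char_maps, regions):
    """char_maps: (36, H, W) character probability layers M_1..M_36.
    regions: one boolean (H, W) mask per character region r_i, taken from
    the binarized background layer. For each region, average every layer
    over the region's pixels and keep the arg-max class."""
    out = []
    for region in regions:
        means = char_maps[:, region].mean(axis=1)   # mean prob per layer over r_i
        out.append(CHARS[int(np.argmax(means))])
    return "".join(out)
```

Concatenating the winning classes left to right yields the predicted sequence S_q.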
(2.3) processing the character sequence predicted by the segmentation branch with a weighted edit distance algorithm, finding the best-matching word for the predicted sequence in a given dictionary, and obtaining the final recognition result: in the pixel voting stage, the probabilities of all character classes for each character region of the predicted sequence are available, and different weights are defined for the deletion, insertion and substitution operations according to these probabilities. For a deletion operation, the cost is the probability that the character was predicted as the currently deleted character; for an insertion operation, the cost is the average probability of the two characters adjacent to the insertion position; for a substitution operation, the cost is max(1 - s1/s2, 0), where s1 and s2 are the probabilities of the candidate character and of the predicted character being replaced. The weighted edit distance algorithm matches the predicted character string against the given dictionary with these deletion, insertion and substitution weights and adjusts the predicted word accordingly, improving accuracy and yielding the final recognition result.
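A dynamic-programming sketch of the weighted edit distance with the costs described above. The probability-table format `probs[i][c]` and the unit cost for boundary insertions are assumptions not fixed by the text:

```python
def weighted_edit_distance(pred, word, probs):
    """probs[i] maps a character to its predicted probability at position i
    of `pred` (in the method these come from pixel voting). Costs follow
    the text: deletion = probability of the deleted character; insertion =
    mean probability of the two adjacent characters; substitution =
    max(1 - s1/s2, 0)."""
    n, m = len(pred), len(word)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                       # delete all of pred
        D[i][0] = D[i - 1][0] + probs[i - 1][pred[i - 1]]
    for j in range(1, m + 1):                       # boundary inserts: unit cost (assumption)
        D[0][j] = D[0][j - 1] + 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c_del = probs[i - 1][pred[i - 1]]
            left = probs[i - 1][pred[i - 1]]
            right = probs[i][pred[i]] if i < n else left
            c_ins = (left + right) / 2.0
            s2 = max(probs[i - 1][pred[i - 1]], 1e-9)
            s1 = probs[i - 1].get(word[j - 1], 0.0)
            c_sub = 0.0 if pred[i - 1] == word[j - 1] else max(1.0 - s1 / s2, 0.0)
            D[i][j] = min(D[i - 1][j] + c_del,
                          D[i][j - 1] + c_ins,
                          D[i - 1][j - 1] + c_sub)
    return D[n][m]

def best_match(pred, lexicon, probs):
    # the lexicon word with the smallest weighted edit distance wins
    return min(lexicon, key=lambda w: weighted_edit_distance(pred, w, probs))
```

A misread "hcllo" whose second position also carries some probability for "e" is thus cheaply corrected to "hello", while unrelated lexicon words stay expensive.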
Claims (10)
1. An end-to-end identification method for scene texts with arbitrary shapes is characterized by comprising the following steps:
(1) training an arbitrarily-shaped scene text end-to-end recognition network model, comprising the following sub-steps:
(1.1) carrying out word-level labeling on multidirectional texts of all pictures in an original data set, wherein labels are polygon clockwise vertex coordinates of a text bounding box at the word level and word character sequences of the texts, and a standard training data set with labels is obtained;
(1.2) defining a scene text end-to-end recognition network model in any shape, calculating a training label according to the standard training data set with labels in the step (1.1), designing a loss function, and training the scene text end-to-end recognition network by using a reverse conduction method to obtain the scene text end-to-end recognition network model; the method comprises the following steps:
(1.2.1) constructing a scene text end-to-end identification network model in any shape, wherein the identification network model consists of a characteristic pyramid structure network, a region extraction network, a rapid region classification regression branch and a segmentation branch;
(1.2.2) generating a horizontal initial bounding box on an original image according to the feature map, and generating training labels for a region extraction network module, a fast region classification regression branch network module and a segmentation branch network module in the recognition network model;
(1.2.3) training data set I with the standardtrAs input for identifying the network model, extracting features by using a feature pyramid network module;
(1.2.4) inputting the features extracted by the feature pyramid network into a region extraction network, and generating a candidate text box by using a region-of-interest alignment method to adjust a feature map through anchor point distribution;
(1.2.5) inputting the candidate text box into a rapid regional classification regression network module, calculating a loss function and conducting reversely through two branches of classification and regression, and finally generating a predicted text bounding box;
(1.2.6) inputting the candidate text box into a segmentation branch network module, and generating a target segmentation layer based on example segmentation and semantic segmentation;
(1.2.7) taking the training label gt as the expected output of the network and the predicted label as the network prediction output, and designing a target loss function between the expected output and the predicted output for the constructed network model;
(2) the character detection and recognition of the text picture of the scene to be detected and recognized by utilizing the trained model comprises the following substeps:
(2.1) inputting extracted features of the text picture of the scene to be detected into a fast region classification regression branch network to generate a candidate text region, and filtering the candidate text region by non-maximum suppression operation to obtain a more accurate candidate text region;
(2.2) inputting the predicted candidate text region into a segmentation branch network to perform text example segmentation and character segmentation, respectively generating a global text example segmentation mask and a character segmentation mask, obtaining a polygonal word text region by calculating the outline of the text region on the global text example segmentation mask, and predicting by utilizing a pixel voting algorithm on the character segmentation mask to obtain a character sequence;
and (2.3) processing the character sequence predicted by the segmentation branch through a weighted edit distance algorithm, finding the best matched word of the predicted sequence in the given dictionary, and obtaining the final recognition result.
2. The method for end-to-end recognition of arbitrarily shaped scene text according to claim 1, wherein the recognition network model of step (1.2.1) is specifically:
The recognition network model consists of a feature pyramid structure network, a region extraction network, a fast region classification regression branch network and a segmentation branch network. The feature pyramid structure network takes the ResNet-50 deep convolutional neural network as its base network and adds bottom-up, top-down and lateral connections; it extracts features fusing different resolutions from input standard training data set pictures. The extracted features of different scales are input into the region extraction network to obtain candidate text regions; after region-of-interest alignment, candidate text regions of fixed scale are obtained and input into the fast region classification regression branch network and the segmentation branch network respectively. The candidate text region of resolution 7 × 7 extracted by the region extraction network is input into the fast region classification regression network, which predicts through its classification branch the probability that the input candidate text region is a positive sample, providing a more accurate candidate text region, and computes through its regression branch the offset of the candidate text region relative to the real text region, adjusting its position. The segmentation branch network consists of four convolutional layers Conv1, Conv2, Conv3, Conv4, a deconvolution layer DeConv and a final convolutional layer Conv5; the candidate text region of resolution 16 × 64 extracted by the region extraction network is input into the segmentation branch, and 38 target segmentation layers of resolution 32 × 128 are finally generated through convolution and deconvolution operations: 1 global text instance segmentation layer predicts the exact position of the text region, while 36 character segmentation layers and 1 character background segmentation layer yield the predicted character sequence through the pixel voting algorithm.
3. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (1.2.2) is specifically as follows:
For the labeled standard training data set I_tr, the true label of an input picture contains a polygon set P = {p_1, p_2, …, p_m} representing the text regions and a character label set C = {c_1 = (cc_1, cl_1), c_2 = (cc_2, cl_2), …, c_n = (cc_n, cl_n)} indicating the class and position of each character; for an input picture, P_i is the polygonal bounding box of a text region in the picture, p_ij = (x_ij, y_ij) is the coordinate of the j-th vertex of polygon P_i, m denotes the number of polygonal text labels, and cc_k and cl_k are respectively the class and position of the k-th character of the text;
for a given standard training dataset I_tr, first convert each polygon in the dataset label P = {p_1, p_2, …, p_m} into the smallest horizontal rectangle bounding the polygonal text label box, denoted G_d = (x, y, h, w) by the rectangle's center point (x, y), height h and width w; for the region extraction network, according to the labeled bounding boxes G_d of the standard training dataset, each pixel of each feature map output by the feature pyramid is mapped back to the original image and multiple initial bounding boxes are generated as the candidate text regions predicted by the region extraction network; the Jaccard coefficients of each initial bounding box Q_0 relative to the labeled bounding boxes G_d of the standard training dataset are then computed: when the Jaccard coefficients of Q_0 with all labeled bounding boxes G_d are less than 0.5, Q_0 is labeled as negative non-text, with class label P_rpn = 0; otherwise, i.e. when at least one labeled bounding box G_d has a Jaccard coefficient with Q_0 of not less than 0.5, Q_0 is labeled as positive text, with class label P_rpn = 1, and the position offset is computed relative to the labeled box with the maximum Jaccard coefficient, by the following formulas:
x = x_0 + w_0·Δx
y = y_0 + h_0·Δy
w = w_0·exp(Δw)
h = h_0·exp(Δh)
where x_0, y_0 are respectively the abscissa and ordinate of the center point of the initial bounding box Q_0, w_0, h_0 are respectively the width and height of Q_0, Δx, Δy are respectively the horizontal and vertical position offsets of the center point of Q_0 relative to the center point of G_d, and exp is the exponential operation; the training label of the region extraction network is thus obtained as:
gt_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn, P_rpn)
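As an illustrative sketch only (not part of the claims), the Jaccard-based labeling rule and the offset formulas above can be written as follows; all function names are my own, and boxes are assumed axis-aligned:

```python
import numpy as np

def jaccard(box_a, box_b):
    """Jaccard coefficient (IoU) of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def encode_offsets(anchor, gt):
    """Offsets (dx, dy, dw, dh) of ground-truth box gt relative to an anchor;
    boxes are (cx, cy, w, h).  Inverse of the decode formulas in the claim."""
    x0, y0, w0, h0 = anchor
    x, y, w, h = gt
    return ((x - x0) / w0, (y - y0) / h0, np.log(w / w0), np.log(h / h0))

def decode_offsets(anchor, deltas):
    """Apply x = x0 + w0*dx, y = y0 + h0*dy, w = w0*exp(dw), h = h0*exp(dh)."""
    x0, y0, w0, h0 = anchor
    dx, dy, dw, dh = deltas
    return (x0 + w0 * dx, y0 + h0 * dy, w0 * np.exp(dw), h0 * np.exp(dh))

def label_anchor(anchor_xyxy, gt_boxes_xyxy, pos_thresh=0.5):
    """P_rpn = 1 if any labeled box has Jaccard coefficient >= 0.5, else 0."""
    ious = [jaccard(anchor_xyxy, g) for g in gt_boxes_xyxy]
    return int(max(ious, default=0.0) >= pos_thresh), ious
```

Encoding then decoding the same anchor/ground-truth pair is an exact round trip, which is how the regression target and the box reconstruction stay consistent.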
for the fast region classification-regression branch network, the training labels are similarly computed as:
gt_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn, P_rcnn);
for the segmentation branch network, two types of target labels need to be generated: a global label for text instance segmentation and character labels for character semantic segmentation; for a given positive candidate text box r, first obtain the best-matching horizontal rectangle, and from it the matching polygon and character boxes; then shift and resize the matching polygon and character boxes so as to align the candidate text box r with a target label of preset height H and preset width W according to the following formula:
where (r_x, r_y) are the vertices of the candidate text box r, and (B_x, B_y) and the corresponding original coordinates are the updated and original vertices of the polygon and of all character boxes; specifically, r_x is the set of abscissas of all vertices of the candidate text box r, r_y is the set of ordinates of all vertices of r, and B_x, B_y are defined similarly. The target global label X_g is then generated by drawing the standard polygon on a zero-initialized mask and filling the value 1. For the character label, with the center as origin, each standard character box is reduced to one eighth of the size of the original box so that the character masks do not overlap each other; the reduced character boxes are drawn on a zero-initialized mask and filled with their corresponding category indices to generate the character label X_c. If C does not exist, all pixels in the character layers are set to −1 and ignored during optimization. Finally the overall segmentation branch label gt_mask = X is obtained; combining the above labels gt_rpn, gt_rcnn, gt_mask generates the final training label as follows:
gt = {Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn, P_rpn, Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn, P_rcnn, X}.
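The global and character label generation described above (filling the text region with 1 on a zero-initialized mask, and drawing character boxes shrunk about their centers with their category indices) can be sketched as follows. This is a simplified illustration, not the claimed method: it uses axis-aligned rectangles instead of general polygons and interprets "one eighth of the size" as scaling width and height by 1/8; all names are hypothetical:

```python
import numpy as np

def shrink_box(box, scale=0.125):
    """Shrink an axis-aligned box (x1, y1, x2, y2) about its center."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def draw_box(mask, box, value):
    """Fill the integer-rounded box region of a mask with the given value."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    mask[y1:y2, x1:x2] = value
    return mask

def make_labels(h, w, text_box, char_boxes, char_classes):
    """Global label X_g (text pixels = 1) and character label X_c
    (shrunk character boxes filled with their class index, 0 = background)."""
    x_g = draw_box(np.zeros((h, w), dtype=np.int32), text_box, 1)
    x_c = np.zeros((h, w), dtype=np.int32)
    for box, cls in zip(char_boxes, char_classes):
        draw_box(x_c, shrink_box(box), cls)
    return x_g, x_c
```

With the target size H = 32, W = 128 from the claim, each shrunk character box occupies a small non-overlapping patch of the character layer.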
4. The end-to-end recognition method for scene text of arbitrary shape as claimed in claim 1 or 2, wherein the step (1.2.3) is specifically as follows:
the pictures of the standard training dataset I_tr are input into the bottom-up ResNet-50 structure of the feature pyramid network; the convolutional layer units that do not change the feature map size are defined as stages (stages {P2, P3, P4, P5, P6}), and the final output convolutional features F of each stage are extracted; the top-down connections of the feature pyramid network module upsample the output convolutional features of ResNet-50 to generate multi-scale upsampled features, and the lateral connection structure of the module fuses the upsampled features of each stage from the top-down pass with the features generated in the bottom-up pass, producing the final features {F2, F3, F4, F5, F6}.
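A minimal sketch of the top-down/lateral fusion just described, assuming (my simplification) that lateral 1×1 convolutions have already equalized channel counts, so fusion reduces to nearest-neighbour upsampling plus elementwise addition:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(bottom_up):
    """Top-down pathway with lateral connections.
    bottom_up: list of (C, H, W) maps ordered fine to coarse, each level at
    half the previous resolution.  Returns the fused maps {F2..F6}-style,
    fine to coarse."""
    merged = [bottom_up[-1]]                          # coarsest level seeds the path
    for feat in reversed(bottom_up[:-1]):
        merged.append(feat + upsample2x(merged[-1]))  # lateral add after upsampling
    return list(reversed(merged))
```

Each finer level accumulates the semantics of all coarser levels, which is the point of the top-down pass.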
5. The end-to-end recognition method for scene text of arbitrary shape as claimed in claim 1 or 2, wherein the step (1.2.4) is specifically as follows:
for an input picture I_trk, 5 stages of features {F2, F3, F4, F5, F6} are extracted through the feature pyramid network; according to the stages {P2, P3, P4, P5, P6}, the anchor scales of the different stages are defined as {32², 64², 128², 256², 512²}, each scale layer having 3 aspect ratios {1:2, 1:1, 2:1}; 15 feature maps of different scales and ratios {Ftr_1, Ftr_2, …, Ftr_15} can thus be extracted, denoted Ftr_p with subscript p = 1, …, 15;
through the region-of-interest alignment operation, candidate text regions of fixed scale are generated from the features Ftr_p: candidate text regions R_rcnn of resolution 7 × 7 for the fast region classification-regression network, and candidate text regions R_mask of resolution 16 × 64 for the segmentation branch; the classification branch predicts the probability P_rpn that each candidate text box is a correct text region bounding box, and the regression branch predicts the candidate text box offsets Y_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn).
6. The end-to-end recognition method for scene text of arbitrary shape as claimed in claim 1 or 2, wherein the step (1.2.5) is specifically as follows:
the fast region classification-regression branch network is divided into two branches, classification and regression; the candidate text regions R_rcnn of size 7 × 7 are input into the classification branch, which outputs through convolution operations the classification score P_rcnn of the predicted bounding box, i.e. the probability that the bounding box is a positive text box, a decimal in [0, 1]; R_rcnn is also input into the regression branch, which outputs the predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn) composed of 4 decimals in [0, 1]: the position offsets of the center-point abscissa and ordinate and of the height and width of the predicted bounding box G_q, when predicted as a positive text box, relative to the center-point abscissa and ordinate and the height and width of the labeled bounding box G_d.
7. The end-to-end recognition method for scene text of arbitrary shape as claimed in claim 1 or 2, wherein the step (1.2.6) is specifically as follows:
the segmentation branch network module comprises 4 convolutional layers Conv1, Conv2, Conv3, Conv4, a deconvolution layer Deconv, and a final convolutional layer Conv5; the candidate text boxes R_mask of size 16 × 64 generated by the region extraction network are input into the segmentation branch module, and through convolution, deconvolution and related operations, 38 target segmentation layers {M_global, M_1, M_2, …, M_36, M_background} of scale 32 × 128 are finally generated; the pixel value X of each pixel in a layer, a decimal in [0, 1], is output. Among the output layers, the global segmentation layer M_global can directly predict the text region polygon P_m = {p_m1, p_m2, …, p_mn}; the character segmentation layers {M_1, M_2, …, M_36} and the character background segmentation layer M_background can predict the character sequence S_q according to the pixel voting algorithm.
8. The end-to-end recognition method for scene text of arbitrary shape as claimed in claim 1 or 2, wherein the step (1.2.7) is specifically as follows:
taking the training label gt computed in step (1.2.2) as the expected output of the network, and the prediction labels of steps (1.2.4), (1.2.5) and (1.2.6) as the network prediction output, a target loss function between expected and predicted output is designed for the network model constructed in (1.2.1); the overall target loss function consists of the loss functions of the region extraction network, the fast region classification-regression branch network and the segmentation branch network, and its expression is:
L(P_rpn, Y_rpn, P_rcnn, Y_rcnn, X) = L_rpn(P_rpn, Y_rpn) + α1·L_rcnn(P_rcnn, Y_rcnn) + α2·L_mask(X)
where L_rpn(P_rpn, Y_rpn) is the loss function of the region extraction network, L_rcnn(P_rcnn, Y_rcnn) is the loss function of the fast region classification-regression branch network, L_mask(X) is the loss function of the segmentation branch network, and α1, α2 are the weight coefficients of L_rcnn and L_mask respectively, both simply set to 1;
according to the designed overall target loss function, the model is trained iteratively with the back-propagation algorithm to minimize the overall loss and obtain the optimal network model; for the scene text detection and recognition task, the training first iterates on a synthetic text dataset to obtain initial network parameters, and then trains on real datasets to fine-tune the network parameters.
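The patent does not spell out the per-branch loss forms; as an illustrative sketch only, using the customary choices for such networks (binary cross-entropy for classification and mask terms, smooth-L1 for regression) with α1 = α2 = 1, the combination L = L_rpn + α1·L_rcnn + α2·L_mask might look like:

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between predicted probability p and label y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

def smooth_l1(pred, target):
    """Smooth-L1 regression loss, summed over the 4 offset components."""
    d = np.abs(np.asarray(pred, float) - np.asarray(target, float))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())

def total_loss(rpn, rcnn, mask, alpha1=1.0, alpha2=1.0):
    """L = L_rpn + a1 * L_rcnn + a2 * L_mask; each box branch contributes a
    classification plus a regression term, the mask branch a pixel-wise BCE."""
    l_rpn = bce(rpn["p"], rpn["label"]) + smooth_l1(rpn["offsets"], rpn["gt"])
    l_rcnn = bce(rcnn["p"], rcnn["label"]) + smooth_l1(rcnn["offsets"], rcnn["gt"])
    l_mask = float(np.mean([bce(p, y) for p, y in zip(mask["p"], mask["gt"])]))
    return l_rpn + alpha1 * l_rcnn + alpha2 * l_mask
```

Confident predictions drive the total toward zero, and worsening any branch raises it, which is what back-propagation on the combined objective exploits.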
9. The end-to-end recognition method for scene text of arbitrary shape as claimed in claim 1 or 2, wherein the step (2.1) is specifically as follows:
for the k-th picture I_tstk of the dataset to be detected I_tst, input it into the model trained in step (1.2); after the feature pyramid network and the region extraction network, the model generates initial bounding boxes, which are input into the fast region classification-regression branch network; for each initial bounding box G_q, the classification branch outputs a classification score P_rcnn as the score of G_q being predicted a positive sample; the regression branch outputs a predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn) composed of 4 decimals, i.e. the position offsets of the center-point abscissa and ordinate and of the height and width of G_q, when predicted as a positive text box, relative to the labeled bounding box G_d; from these position offsets, the position Q_z of the quadrilateral text bounding box predicted by the network can be computed.
For the predicted text bounding boxes Q_z, a non-maximum suppression operation is performed for filtering to obtain the output result: the network model regresses a horizontal quadrilateral position for every initial bounding box Q_0 predicted as positive text on each feature map F_tstp, and the positive text quadrilaterals regressed on the different feature maps of the same test picture I_tstk usually overlap with each other, so non-maximum suppression must be applied to all positive text quadrilateral positions, specifically: 1) a predicted text bounding box is detected and retained if and only if its text classification score P_rcnn ≥ 0.5; 2) non-maximum suppression with a Jaccard coefficient threshold of 0.2 is applied to the text boxes retained in the previous step, giving the final retained quadrilateral bounding boxes of positive text.
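The two-step filtering above (score threshold 0.5, then greedy non-maximum suppression at Jaccard coefficient 0.2) can be sketched as follows; this is an illustrative implementation, not the patent's code:

```python
import numpy as np

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.2):
    """Keep boxes with score >= score_thresh, then greedy non-maximum
    suppression at the given Jaccard (IoU) threshold.
    boxes: (N, 4) array of (x1, y1, x2, y2); returns kept indices."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    idx = np.where(scores >= score_thresh)[0]     # step 1: score filter
    idx = idx[np.argsort(-scores[idx])]           # highest score first
    keep = []
    while idx.size:
        i, idx = idx[0], idx[1:]
        keep.append(int(i))
        if idx.size == 0:
            break
        # IoU of the kept box against the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[idx, 0])
        y1 = np.maximum(boxes[i, 1], boxes[idx, 1])
        x2 = np.minimum(boxes[i, 2], boxes[idx, 2])
        y2 = np.minimum(boxes[i, 3], boxes[idx, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[idx, 2] - boxes[idx, 0]) * (boxes[idx, 3] - boxes[idx, 1])
        iou = inter / (area_i + areas - inter)
        idx = idx[iou < iou_thresh]               # step 2: suppress overlaps
    return keep
```

Near-duplicate regressions of the same text from different feature maps collapse to the single highest-scoring box.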
10. The end-to-end recognition method for scene text of arbitrary shape as claimed in claim 1 or 2, wherein the step (2.2) is specifically as follows:
the predicted quadrilateral text bounding box positions Q_z are input into the segmentation branch to generate the 38 target segmentation layers; first, the polygon of the text region is obtained by directly computing the contour of the text region on the global text instance segmentation mask; second, the character sequence S_q is generated by the pixel voting algorithm.
For the 36 character segmentation layers {M_1, M_2, …, M_36}, the value p_ci(x, y) of a pixel in the i-th segmentation layer represents the probability that the pixel p_g(x, y) at the corresponding position of the global text segmentation layer is the character z_i, where z_i is the i-th of the 36 characters {0, 1, …, 9, a, b, …, z}; the probabilities at corresponding pixel positions of the 36 character segmentation layers sum to 1, i.e. Σ_{i=1}^{36} p_ci(x, y) = 1.
For the character background segmentation layer M_background, the layer is first binarized; on the binary background layer, the set of character regions is defined as R = {r_1, r_2, …, r_n}, where r_i is the i-th character region on the character background segmentation layer and n is the number of all characters on the background layer;
the pixel voting algorithm proceeds as follows: first, for the character region r_i of the character background segmentation layer, define its set of connected regions in the 36 character segmentation layers as C_i = {c_i1, c_i2, …, c_i36}, where c_ij is the region block in the j-th character segmentation layer corresponding to the i-th character region of the character background segmentation layer; then, for the region r_i and its corresponding connected regions C_i, the predicted character is obtained by pixel voting: first, average the values of all pixels inside each connected region c_ij; second, find the c_ij_max with the largest average, and take the character class z_j_max corresponding to its character layer M_j_max as the predicted character of this character region; finally, applying this operation to every character region r_i of the character background segmentation layer yields the final predicted character sequence S_q.
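A simplified sketch of the pixel voting algorithm just described, using a flood-fill connected-component pass on the binarized background layer and a mean-probability vote per region; the region extraction and reading-order heuristic here are my own simplifications, not the patent's:

```python
import numpy as np

CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # the 36 character classes

def connected_regions(binary):
    """4-connected components of a boolean mask via iterative flood fill."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not seen[sy, sx]:
                stack, pixels = [(sy, sx)], []
                seen[sy, sx] = True
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                regions.append(pixels)
    return regions

def pixel_vote(char_maps, background_map, threshold=0.5):
    """For each character region r_i on the binarized background map, average
    every character map over r_i and emit the character whose map has the
    largest mean, in left-to-right order of the region centroids."""
    regions = connected_regions(background_map > threshold)
    regions.sort(key=lambda px: np.mean([x for _, x in px]))  # reading order
    out = []
    for pixels in regions:
        ys, xs = zip(*pixels)
        means = [m[ys, xs].mean() for m in char_maps]  # one vote per class
        out.append(CHARSET[int(np.argmax(means))])
    return "".join(out)
```

Each character region thus resolves to the single class whose probability mass dominates it, concatenating into the predicted sequence S_q.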
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810294058.XA CN108549893B (en) | 2018-04-04 | 2018-04-04 | End-to-end identification method for scene text with any shape |
PCT/CN2019/080354 WO2019192397A1 (en) | 2018-04-04 | 2019-03-29 | End-to-end recognition method for scene text in any shape |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810294058.XA CN108549893B (en) | 2018-04-04 | 2018-04-04 | End-to-end identification method for scene text with any shape |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108549893A CN108549893A (en) | 2018-09-18 |
CN108549893B true CN108549893B (en) | 2020-03-31 |
Family
ID=63514169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810294058.XA Active CN108549893B (en) | 2018-04-04 | 2018-04-04 | End-to-end identification method for scene text with any shape |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108549893B (en) |
WO (1) | WO2019192397A1 (en) |
CN111507292B (en) * | 2020-04-22 | 2023-05-12 | 广东光大信息科技股份有限公司 | Handwriting board correction method, handwriting board correction device, computer equipment and storage medium |
CN111553351A (en) * | 2020-04-26 | 2020-08-18 | 佛山市南海区广工大数控装备协同创新研究院 | Semantic segmentation based text detection method for arbitrary scene shape |
CN111563502B (en) * | 2020-05-09 | 2023-12-15 | 腾讯科技(深圳)有限公司 | Image text recognition method and device, electronic equipment and computer storage medium |
CN111723841A (en) * | 2020-05-09 | 2020-09-29 | 北京捷通华声科技股份有限公司 | Text detection method and device, electronic equipment and storage medium |
CN111640089B (en) * | 2020-05-09 | 2023-08-15 | 武汉精立电子技术有限公司 | Defect detection method and device based on feature map center point |
CN111597945B (en) * | 2020-05-11 | 2023-08-18 | 济南博观智能科技有限公司 | Target detection method, device, equipment and medium |
CN111524135B (en) * | 2020-05-11 | 2023-12-26 | 安徽继远软件有限公司 | Method and system for detecting defects of tiny hardware fittings of power transmission line based on image enhancement |
CN111753653B (en) * | 2020-05-15 | 2024-05-03 | 中铁第一勘察设计院集团有限公司 | High-speed rail contact net fastener identification and positioning method based on attention mechanism |
CN111553355B (en) * | 2020-05-18 | 2023-07-28 | 城云科技(中国)有限公司 | Monitoring video-based method for detecting and notifying store outgoing business and managing store owner |
CN111753828B (en) * | 2020-05-19 | 2022-12-27 | 重庆邮电大学 | Natural scene horizontal character detection method based on deep convolutional neural network |
CN111783523B (en) * | 2020-05-19 | 2022-10-21 | 中国人民解放军93114部队 | Remote sensing image rotating target detection method |
CN112001878A (en) * | 2020-05-21 | 2020-11-27 | 合肥合工安驰智能科技有限公司 | Deep learning ore scale measuring method based on binarization neural network and application system |
CN111612081B (en) * | 2020-05-25 | 2024-04-02 | 深圳前海微众银行股份有限公司 | Training method, device, equipment and storage medium for recognition model |
CN111667469B (en) * | 2020-06-03 | 2023-10-31 | 北京小白世纪网络科技有限公司 | Lung disease classification method, device and equipment |
CN111932583A (en) * | 2020-06-05 | 2020-11-13 | 西安羚控电子科技有限公司 | Space-time information integrated intelligent tracking method based on complex background |
CN111709987B (en) * | 2020-06-11 | 2023-04-07 | 上海东普信息科技有限公司 | Package volume measuring method, device, equipment and storage medium |
CN111860479B (en) * | 2020-06-16 | 2024-03-26 | 北京百度网讯科技有限公司 | Optical character recognition method, device, electronic equipment and storage medium |
CN111783572B (en) * | 2020-06-17 | 2023-11-14 | 泰康保险集团股份有限公司 | Text detection method and device |
CN111753714B (en) * | 2020-06-23 | 2023-09-01 | 中南大学 | Multidirectional natural scene text detection method based on character segmentation |
CN111915628B (en) * | 2020-06-24 | 2023-11-24 | 浙江大学 | Single-stage instance segmentation method based on prediction target dense boundary points |
CN111898597A (en) * | 2020-06-24 | 2020-11-06 | 泰康保险集团股份有限公司 | Method, device, equipment and computer readable medium for processing text image |
CN111985525B (en) * | 2020-06-30 | 2023-09-22 | 上海海事大学 | Text recognition method based on multi-mode information fusion processing |
CN111950353B (en) * | 2020-06-30 | 2024-04-19 | 深圳市雄帝科技股份有限公司 | Seal text recognition method and device and electronic equipment |
CN111783427B (en) * | 2020-06-30 | 2024-04-02 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for training model and outputting information |
CN111798516B (en) * | 2020-07-01 | 2023-12-22 | 广东省特种设备检测研究院珠海检测院 | Method for detecting running state quantity and analyzing errors of bridge crane equipment |
CN111783763A (en) * | 2020-07-07 | 2020-10-16 | 厦门商集网络科技有限责任公司 | Text positioning box correction method and system based on convolutional neural network |
CN111931572B (en) * | 2020-07-07 | 2024-01-09 | 广东工业大学 | Target detection method for remote sensing image |
CN111783705B (en) * | 2020-07-08 | 2023-11-14 | 厦门商集网络科技有限责任公司 | Character recognition method and system based on attention mechanism |
CN111860264B (en) * | 2020-07-10 | 2024-01-05 | 武汉理工大学 | Multi-task instance-level road scene understanding algorithm based on gradient equalization strategy |
CN111814705B (en) * | 2020-07-14 | 2022-08-02 | 广西师范大学 | Pedestrian re-identification method based on batch blocking shielding network |
CN111798480A (en) * | 2020-07-23 | 2020-10-20 | 北京思图场景数据科技服务有限公司 | Character detection method and device based on single character and character connection relation prediction |
CN111860506B (en) * | 2020-07-24 | 2024-03-29 | 北京百度网讯科技有限公司 | Method and device for recognizing characters |
CN111914727B (en) * | 2020-07-28 | 2024-04-26 | 联芯智能(南京)科技有限公司 | Small target human body detection method based on balance sampling and nonlinear feature fusion |
CN111898610B (en) * | 2020-07-29 | 2024-04-19 | 平安科技(深圳)有限公司 | Card unfilled corner detection method, device, computer equipment and storage medium |
CN111753812A (en) * | 2020-07-30 | 2020-10-09 | 上海眼控科技股份有限公司 | Text recognition method and equipment |
CN112016403B (en) * | 2020-08-05 | 2023-07-21 | 中山大学 | Video abnormal event detection method |
CN111930622B (en) * | 2020-08-10 | 2023-10-13 | 中国工商银行股份有限公司 | Interface control testing method and system based on deep learning |
CN112069907A (en) * | 2020-08-11 | 2020-12-11 | 盛视科技股份有限公司 | X-ray machine image recognition method, device and system based on example segmentation |
CN112069910B (en) * | 2020-08-11 | 2024-03-01 | 上海海事大学 | Multi-directional ship target detection method for remote sensing image |
CN112200181B (en) * | 2020-08-19 | 2023-10-10 | 西安理工大学 | Character shape approximation method based on particle swarm optimization algorithm |
CN112102250B (en) * | 2020-08-20 | 2022-11-04 | 西北大学 | Method for establishing and detecting pathological image detection model with training data as missing label |
CN112926372B (en) * | 2020-08-22 | 2023-03-10 | 清华大学 | Scene character detection method and system based on sequence deformation |
CN112070082B (en) * | 2020-08-24 | 2023-04-07 | 西安理工大学 | Curve character positioning method based on instance perception component merging network |
CN111985439A (en) * | 2020-08-31 | 2020-11-24 | 中移(杭州)信息技术有限公司 | Face detection method, device, equipment and storage medium |
CN112036405A (en) * | 2020-08-31 | 2020-12-04 | 浪潮云信息技术股份公司 | Detection and identification method for handwritten document text |
CN112052853B (en) * | 2020-09-09 | 2024-02-02 | 国家气象信息中心 | Text positioning method of handwriting meteorological archive data based on deep learning |
CN112085122B (en) * | 2020-09-21 | 2024-03-15 | 中国科学院上海微系统与信息技术研究所 | Ontology-based semi-supervised image scene semantic deepening method |
CN112101277B (en) * | 2020-09-24 | 2023-07-28 | 湖南大学 | Remote sensing target detection method based on image semantic feature constraint |
CN112101386B (en) * | 2020-09-25 | 2024-04-23 | 腾讯科技(深圳)有限公司 | Text detection method, device, computer equipment and storage medium |
CN112183322B (en) * | 2020-09-27 | 2022-07-19 | 成都数之联科技股份有限公司 | Text detection and correction method for any shape |
CN112085735B (en) * | 2020-09-28 | 2022-10-25 | 西安交通大学 | Aluminum material image defect detection method based on self-adaptive anchor frame |
CN112183545B (en) * | 2020-09-29 | 2024-05-17 | 佛山市南海区广工大数控装备协同创新研究院 | Natural scene text recognition method with arbitrary shape |
CN112287977B (en) * | 2020-10-06 | 2024-02-09 | 武汉大学 | Target detection method based on bounding box key point distance |
CN112036398B (en) * | 2020-10-15 | 2024-02-23 | 北京一览群智数据科技有限责任公司 | Text correction method and system |
CN112215235B (en) * | 2020-10-16 | 2024-04-26 | 深圳华付技术股份有限公司 | Scene text detection method aiming at large character spacing and local shielding |
CN112308150B (en) * | 2020-11-02 | 2022-04-15 | 平安科技(深圳)有限公司 | Target detection model training method and device, computer equipment and storage medium |
CN112419174B (en) * | 2020-11-04 | 2022-09-20 | 中国科学院自动化研究所 | Image character removing method, system and device based on gate cycle unit |
CN112270370B (en) * | 2020-11-06 | 2023-06-02 | 北京环境特性研究所 | Vehicle apparent damage assessment method |
CN112434698A (en) * | 2020-11-23 | 2021-03-02 | 泰康保险集团股份有限公司 | Character recognition method, character recognition device, electronic equipment and storage medium |
CN112464943B (en) * | 2020-11-25 | 2023-07-14 | 创新奇智(南京)科技有限公司 | Semantic segmentation method and device based on few samples, electronic equipment and storage medium |
CN112418134B (en) * | 2020-12-01 | 2024-02-27 | 厦门大学 | Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method |
CN112529768B (en) * | 2020-12-04 | 2023-01-06 | 中山大学 | Garment editing and generating method based on generation countermeasure network |
CN112541491B (en) * | 2020-12-07 | 2024-02-02 | 沈阳雅译网络技术有限公司 | End-to-end text detection and recognition method based on image character region perception |
CN112446372B (en) * | 2020-12-08 | 2022-11-08 | 电子科技大学 | Text detection method based on channel grouping attention mechanism |
CN112650832B (en) * | 2020-12-14 | 2022-09-06 | 中国电子科技集团公司第二十八研究所 | Knowledge correlation network key node discovery method based on topology and literature characteristics |
CN112633343B (en) * | 2020-12-16 | 2024-04-19 | 国网江苏省电力有限公司检修分公司 | Method and device for checking wiring of power equipment terminal strip |
CN112598635B (en) * | 2020-12-18 | 2024-03-12 | 武汉大学 | Point cloud 3D target detection method based on symmetric point generation |
CN112528997B (en) * | 2020-12-24 | 2022-04-19 | 西北民族大学 | Tibetan-Chinese bilingual scene text detection method based on text center region amplification |
CN112669446B (en) * | 2020-12-24 | 2024-04-19 | 联通(浙江)产业互联网有限公司 | Building scene modeling method and device |
CN112580738B (en) * | 2020-12-25 | 2021-07-23 | 特赞(上海)信息科技有限公司 | AttentionOCR text recognition method and device based on improvement |
CN113435466A (en) * | 2020-12-26 | 2021-09-24 | 上海有个机器人有限公司 | Method, device, medium and terminal for detecting elevator door position and switch state |
CN112598683B (en) * | 2020-12-27 | 2024-04-02 | 北京化工大学 | Sweep OCT human eye image segmentation method based on sweep frequency optical coherence tomography |
CN112651948B (en) * | 2020-12-30 | 2022-04-12 | 重庆科技学院 | Machine vision-based artemisinin extraction intelligent tracking and identification method |
CN112862842B (en) * | 2020-12-31 | 2023-05-12 | 青岛海尔科技有限公司 | Image data processing method and device, storage medium and electronic device |
CN112686245B (en) * | 2021-01-04 | 2022-05-13 | 福州大学 | Character and text parallel detection method based on character response |
CN112686203B (en) * | 2021-01-12 | 2023-10-31 | 重庆大学 | Vehicle safety warning device detection method based on space priori |
CN112801146B (en) * | 2021-01-13 | 2024-03-19 | 华中科技大学 | Target detection method and system |
CN112733768B (en) * | 2021-01-15 | 2022-09-09 | 中国科学技术大学 | Natural scene text recognition method and device based on bidirectional characteristic language model |
CN112766361A (en) * | 2021-01-18 | 2021-05-07 | 山东师范大学 | Target fruit detection method and detection system under homochromatic background |
CN112712535B (en) * | 2021-01-18 | 2024-03-22 | 长安大学 | Mask-RCNN landslide segmentation method based on simulation difficult sample |
CN112883795B (en) * | 2021-01-19 | 2023-01-31 | 贵州电网有限责任公司 | Rapid and automatic table extraction method based on deep neural network |
CN112651989B (en) * | 2021-01-19 | 2024-01-19 | 华东理工大学 | SEM image molecular sieve particle size statistical method and system based on Mask RCNN example segmentation |
CN112766263B (en) * | 2021-01-21 | 2024-02-02 | 西安理工大学 | Identification method for multi-layer control stock relationship share graphs |
CN112766262B (en) * | 2021-01-21 | 2024-02-02 | 西安理工大学 | Identification method for single-layer one-to-many and many-to-one share graphs |
CN112784737B (en) * | 2021-01-21 | 2023-10-20 | 上海云从汇临人工智能科技有限公司 | Text detection method, system and device combining pixel segmentation and line segment anchor |
CN112766194A (en) * | 2021-01-26 | 2021-05-07 | 上海海洋大学 | Detection method for mesoscale ocean eddy |
CN112818975A (en) * | 2021-01-27 | 2021-05-18 | 北京金山数字娱乐科技有限公司 | Text detection model training method and device and text detection method and device |
CN112990211B (en) * | 2021-01-29 | 2023-07-11 | 华为技术有限公司 | Training method, image processing method and device for neural network |
CN112801092B (en) * | 2021-01-29 | 2022-07-15 | 重庆邮电大学 | Method for detecting character elements in natural scene image |
CN112766274B (en) * | 2021-02-01 | 2023-07-07 | 长沙市盛唐科技有限公司 | Water gauge image water level automatic reading method and system based on Mask RCNN algorithm |
CN112946436A (en) * | 2021-02-02 | 2021-06-11 | 成都国铁电气设备有限公司 | Online intelligent detection method for arc extinction and disconnection of vehicle-mounted contact net insulator |
CN112818873B (en) * | 2021-02-04 | 2023-05-26 | 苏州魔视智能科技有限公司 | Lane line detection method and system and electronic equipment |
CN112700444B (en) * | 2021-02-19 | 2023-06-23 | 中国铁道科学研究院集团有限公司铁道建筑研究所 | Bridge bolt detection method based on self-attention and central point regression model |
CN112883887B (en) * | 2021-03-01 | 2023-07-18 | 中央财经大学 | Building instance automatic extraction method based on high spatial resolution optical remote sensing image |
CN113095319B (en) * | 2021-03-03 | 2022-11-15 | 中国科学院信息工程研究所 | Multidirectional scene character detection method and device based on full convolution angular point correction network |
CN113065401A (en) * | 2021-03-04 | 2021-07-02 | 国网河北省电力有限公司 | Intelligent platform for full-ticket account reporting |
CN113065404B (en) * | 2021-03-08 | 2023-02-24 | 国网河北省电力有限公司 | Method and system for detecting train ticket content based on equal-width character segments |
CN113159021A (en) * | 2021-03-10 | 2021-07-23 | 国网河北省电力有限公司 | Text detection method based on context information |
CN113033346B (en) * | 2021-03-10 | 2023-08-04 | 北京百度网讯科技有限公司 | Text detection method and device and electronic equipment |
CN112966678B (en) * | 2021-03-11 | 2023-01-24 | 南昌航空大学 | Text detection method and system |
CN113011597B (en) * | 2021-03-12 | 2023-02-28 | 山东英信计算机技术有限公司 | Deep learning method and device for regression task |
US11682220B2 (en) * | 2021-03-15 | 2023-06-20 | Optum Technology, Inc. | Overlap-aware optical character recognition |
CN113052369B (en) * | 2021-03-15 | 2024-05-10 | 北京农业智能装备技术研究中心 | Intelligent agricultural machinery operation management method and system |
CN113033377A (en) * | 2021-03-16 | 2021-06-25 | 北京有竹居网络技术有限公司 | Character position correction method, character position correction device, electronic equipment and storage medium |
CN112907605B (en) * | 2021-03-19 | 2023-11-17 | 南京大学 | Data enhancement method for instance segmentation |
CN113128560B (en) * | 2021-03-19 | 2023-02-24 | 西安理工大学 | CNN regular script style classification method based on attention module enhancement |
CN112733822B (en) * | 2021-03-31 | 2021-07-27 | 上海旻浦科技有限公司 | End-to-end text detection and identification method |
CN113052759B (en) * | 2021-03-31 | 2023-03-21 | 华南理工大学 | Scene complex text image editing method based on MASK and automatic encoder |
CN112926692B (en) * | 2021-04-09 | 2023-05-09 | 四川翼飞视科技有限公司 | Target detection device, method and storage medium based on non-uniform mixed convolution |
CN112927245B (en) * | 2021-04-12 | 2022-06-21 | 华中科技大学 | End-to-end instance segmentation method based on instance query |
CN113033540A (en) * | 2021-04-14 | 2021-06-25 | 易视腾科技股份有限公司 | Contour fitting and correcting method for scene characters, electronic device and storage medium |
CN113033482B (en) * | 2021-04-20 | 2024-01-30 | 上海应用技术大学 | Traffic sign detection method based on regional attention |
CN113177389A (en) * | 2021-04-23 | 2021-07-27 | 网易(杭州)网络有限公司 | Text processing method and device, electronic equipment and storage medium |
CN113139541B (en) * | 2021-04-24 | 2023-10-24 | 西安交通大学 | Power distribution cabinet dial nixie tube visual identification method based on deep learning |
CN113269197B (en) * | 2021-04-25 | 2024-03-08 | 南京三百云信息科技有限公司 | Certificate image vertex coordinate regression system and identification method based on semantic segmentation |
CN113762237B (en) * | 2021-04-26 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Text image processing method, device, equipment and storage medium |
CN113159053A (en) * | 2021-04-27 | 2021-07-23 | 北京有竹居网络技术有限公司 | Image recognition method and device and computing equipment |
CN113269045A (en) * | 2021-04-28 | 2021-08-17 | 南京大学 | Chinese artistic word detection and recognition method under natural scene |
CN113191296A (en) * | 2021-05-13 | 2021-07-30 | 中国人民解放军陆军炮兵防空兵学院 | Method for detecting five parameters of target in any orientation based on YOLOV5 |
CN113139625B (en) * | 2021-05-18 | 2023-12-15 | 北京世纪好未来教育科技有限公司 | Model training method, electronic equipment and storage medium thereof |
CN113221773B (en) * | 2021-05-19 | 2022-09-13 | 中国电子科技集团公司第二十八研究所 | Method for quickly constructing airplane classification data set based on remote sensing image |
CN113516116B (en) * | 2021-05-19 | 2022-11-22 | 西安建筑科技大学 | Text detection method, system and medium suitable for complex natural scene |
CN113177511A (en) * | 2021-05-20 | 2021-07-27 | 中国人民解放军国防科技大学 | Rotating frame intelligent perception target detection method based on multiple data streams |
CN113159037B (en) * | 2021-05-25 | 2023-08-08 | 中国平安人寿保险股份有限公司 | Picture correction method, device, computer equipment and storage medium |
CN113379761B (en) * | 2021-05-25 | 2023-04-28 | 重庆顺多利机车有限责任公司 | Linkage method and system of multiple AGVs and automatic doors based on artificial intelligence |
CN113177553B (en) * | 2021-05-31 | 2022-08-12 | 哈尔滨工业大学(深圳) | Method and device for identifying floor buttons of inner panel of elevator |
CN113191358B (en) * | 2021-05-31 | 2023-01-24 | 上海交通大学 | Metal part surface text detection method and system |
CN113313173B (en) * | 2021-06-01 | 2023-05-30 | 中山大学 | Human body analysis method based on graph representation and improved transducer |
CN115457531A (en) * | 2021-06-07 | 2022-12-09 | 京东科技信息技术有限公司 | Method and device for recognizing text |
CN113362380A (en) * | 2021-06-09 | 2021-09-07 | 北京世纪好未来教育科技有限公司 | Image feature point detection model training method and device and electronic equipment thereof |
CN113343980B (en) * | 2021-06-10 | 2023-06-09 | 西安邮电大学 | Natural scene text detection method and system |
CN113378815B (en) * | 2021-06-16 | 2023-11-24 | 南京信息工程大学 | Scene text positioning and identifying system and training and identifying method thereof |
CN113345106A (en) * | 2021-06-24 | 2021-09-03 | 西南大学 | Three-dimensional point cloud analysis method and system based on multi-scale multi-level converter |
CN113360655B (en) * | 2021-06-25 | 2022-10-04 | 中国电子科技集团公司第二十八研究所 | Track point classification and text generation method based on sequence annotation |
CN113255669B (en) * | 2021-06-28 | 2021-10-01 | 山东大学 | Method and system for detecting text of natural scene with any shape |
CN113569650A (en) * | 2021-06-29 | 2021-10-29 | 上海红檀智能科技有限公司 | Unmanned aerial vehicle autonomous inspection positioning method based on electric power tower label identification |
CN113343987B (en) * | 2021-06-30 | 2023-08-22 | 北京奇艺世纪科技有限公司 | Text detection processing method and device, electronic equipment and storage medium |
CN113469177B (en) * | 2021-06-30 | 2024-04-26 | 河海大学 | Deep learning-based drainage pipeline defect detection method and system |
WO2023279186A1 (en) * | 2021-07-06 | 2023-01-12 | Orbiseed Technology Inc. | Methods and systems for extracting text and symbols from documents |
CN113435542A (en) * | 2021-07-22 | 2021-09-24 | 安徽理工大学 | Coal and gangue real-time detection method based on deep learning |
CN113343990B (en) * | 2021-07-28 | 2021-12-03 | 浩鲸云计算科技股份有限公司 | Key text detection and classification training method for certificate pictures |
CN113657213A (en) * | 2021-07-30 | 2021-11-16 | 五邑大学 | Text recognition method, text recognition device and computer-readable storage medium |
CN113763326B (en) * | 2021-08-04 | 2023-11-21 | 武汉工程大学 | Pantograph detection method based on Mask scanning R-CNN network |
CN113807336B (en) * | 2021-08-09 | 2023-06-30 | 华南理工大学 | Semi-automatic labeling method, system, computer equipment and medium for image text detection |
CN113780087B (en) * | 2021-08-11 | 2024-04-26 | 同济大学 | Postal package text detection method and equipment based on deep learning |
CN113643136A (en) * | 2021-09-01 | 2021-11-12 | 京东科技信息技术有限公司 | Information processing method, system and device |
CN113807340B (en) * | 2021-09-07 | 2024-03-15 | 南京信息工程大学 | Attention mechanism-based irregular natural scene text recognition method |
CN113807351B (en) * | 2021-09-18 | 2024-01-16 | 京东鲲鹏(江苏)科技有限公司 | Scene text detection method and device |
CN113837168A (en) * | 2021-09-22 | 2021-12-24 | 易联众智鼎(厦门)科技有限公司 | Image text detection and OCR recognition method, device and storage medium |
CN113989708A (en) * | 2021-10-27 | 2022-01-28 | 福州大学 | Campus library epidemic prevention and control method based on YOLO v4 |
TWI807467B (en) * | 2021-11-02 | 2023-07-01 | 中國信託商業銀行股份有限公司 | Key-item detection model building method, business-oriented key-value identification system and method |
CN114049625B (en) * | 2021-11-11 | 2024-02-27 | 西北工业大学 | Multidirectional text detection method based on novel image shrinkage method |
CN114155540B (en) * | 2021-11-16 | 2024-05-03 | 深圳市联洲国际技术有限公司 | Character recognition method, device, equipment and storage medium based on deep learning |
CN114140786B (en) * | 2021-12-03 | 2024-05-17 | 杭州师范大学 | HRNet coding and double-branch decoding-based scene text recognition method |
CN114332839A (en) * | 2021-12-30 | 2022-04-12 | 福州大学 | Streetscape text detection method based on multi-space joint perception |
CN114332841A (en) * | 2021-12-31 | 2022-04-12 | 福州大学 | Scene text detection method based on selective feature fusion pyramid |
CN114399757A (en) * | 2022-01-13 | 2022-04-26 | 福州大学 | Natural scene text recognition method and system for multi-path parallel position correlation network |
CN114067321B (en) * | 2022-01-14 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Text detection model training method, device, equipment and storage medium |
CN114418001B (en) * | 2022-01-20 | 2023-05-12 | 北方工业大学 | Character recognition method and system based on parameter reconstruction network |
CN114419020B (en) * | 2022-01-26 | 2022-10-18 | 深圳大学 | Medical image segmentation method, medical image segmentation device, computer equipment and storage medium |
CN114201967B (en) * | 2022-02-17 | 2022-06-10 | 杭州费尔斯通科技有限公司 | Entity identification method, system and device based on candidate entity classification |
CN114549958B (en) * | 2022-02-24 | 2023-08-04 | 四川大学 | Night and camouflage target detection method based on context information perception mechanism |
CN114359912B (en) * | 2022-03-22 | 2022-06-24 | 杭州实在智能科技有限公司 | Software page key information extraction method and system based on graph neural network |
CN114399769B (en) * | 2022-03-22 | 2022-08-02 | 北京百度网讯科技有限公司 | Training method of text recognition model, and text recognition method and device |
CN114723946B (en) * | 2022-04-11 | 2024-02-27 | 合肥工业大学 | Preferential direction deviation early warning system and method based on semantic segmentation |
CN114862648B (en) * | 2022-05-27 | 2023-06-20 | 晋城市大锐金马工程设计咨询有限公司 | Cross-watermark encrypted document using two documents, A and B |
CN114972947B (en) * | 2022-07-26 | 2022-12-06 | 之江实验室 | Depth scene text detection method and device based on fuzzy semantic modeling |
CN114972710B (en) * | 2022-07-27 | 2022-10-28 | 深圳爱莫科技有限公司 | Method and system for realizing multi-shape target detection in image |
CN115346206B (en) * | 2022-10-20 | 2023-01-31 | 松立控股集团股份有限公司 | License plate detection method based on improved super-resolution deep convolution feature recognition |
CN115546778B (en) * | 2022-10-22 | 2023-06-13 | 清华大学 | Scene text detection method and system based on multitask learning |
CN115909376A (en) * | 2022-11-01 | 2023-04-04 | 北京百度网讯科技有限公司 | Text recognition method, text recognition model training device and storage medium |
CN115422389B (en) * | 2022-11-07 | 2023-04-07 | 北京百度网讯科技有限公司 | Method and device for processing text image and training method of neural network |
CN115497106B (en) * | 2022-11-14 | 2023-01-24 | 合肥中科类脑智能技术有限公司 | Battery laser code-spraying identification method based on data enhancement and multitask model |
CN116701347B (en) * | 2023-05-08 | 2023-12-05 | 北京三维天地科技股份有限公司 | Data modeling method and system based on category expansion |
CN116342627B (en) * | 2023-05-23 | 2023-09-08 | 山东大学 | Intestinal epithelial metaplasia area image segmentation system based on multi-instance learning |
CN116434234B (en) * | 2023-05-25 | 2023-10-17 | 珠海亿智电子科技有限公司 | Method, device, equipment and storage medium for detecting and identifying casting blank characters |
CN116442393B (en) * | 2023-06-08 | 2024-02-13 | 山东博硕自动化技术有限公司 | Intelligent unloading method, system and control equipment for mixing plant based on video identification |
CN116436987B (en) * | 2023-06-12 | 2023-08-22 | 深圳舜昌自动化控制技术有限公司 | IO-Link master station data message transmission processing method and system |
CN116524521B (en) * | 2023-06-30 | 2023-09-15 | 武汉纺织大学 | English character recognition method and system based on deep learning |
CN116524529B (en) * | 2023-07-04 | 2023-10-27 | 青岛海信信息科技股份有限公司 | Novel method for identifying layers based on graph nesting relationship |
CN117078901B (en) * | 2023-07-12 | 2024-04-16 | 长江勘测规划设计研究有限责任公司 | Automatic marking method for single-point bar in steel bar view |
CN116740688B (en) * | 2023-08-11 | 2023-11-07 | 武汉市中西医结合医院(武汉市第一医院) | Medicine identification method and system |
CN116863482B (en) * | 2023-09-05 | 2023-12-19 | 华立科技股份有限公司 | Mutual inductor detection method, device, equipment and storage medium |
CN117037173B (en) * | 2023-09-22 | 2024-02-27 | 武汉纺织大学 | Two-stage English character detection and recognition method and system |
CN117409400A (en) * | 2023-10-18 | 2024-01-16 | 无锡九霄科技有限公司 | Complex condition character recognition method based on deep learning network |
CN117221146B (en) * | 2023-11-09 | 2024-01-23 | 成都科江科技有限公司 | Interface layout system and layout method for ladder diagram logic configuration |
CN117315702B (en) * | 2023-11-28 | 2024-02-23 | 山东正云信息科技有限公司 | Text detection method, system and medium based on set prediction |
CN117315238B (en) * | 2023-11-29 | 2024-03-15 | 福建理工大学 | Vehicle target detection method and terminal |
CN117436442B (en) * | 2023-12-19 | 2024-03-12 | 中南大学 | Text term multiple segmentation, merging, labeling and splitting method and device |
CN117556806B (en) * | 2023-12-28 | 2024-03-22 | 大连云智信科技发展有限公司 | Fine granularity segmentation method for traditional Chinese medicine syndrome names |
CN117475038B (en) * | 2023-12-28 | 2024-04-19 | 浪潮电子信息产业股份有限公司 | Image generation method, device, equipment and computer readable storage medium |
CN117560456A (en) * | 2024-01-11 | 2024-02-13 | 卓世未来(天津)科技有限公司 | Large model data leakage prevention method and system |
CN117975467A (en) * | 2024-04-02 | 2024-05-03 | 华南理工大学 | Bridge type end-to-end character recognition method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740909A (en) * | 2016-02-02 | 2016-07-06 | 华中科技大学 | Text recognition method for natural scenes based on spatial transformation |
CN106446899A (en) * | 2016-09-22 | 2017-02-22 | 北京市商汤科技开发有限公司 | Text detection method and device and text detection training method and device |
CN106897732A (en) * | 2017-01-06 | 2017-06-27 | 华中科技大学 | Multi-directional text detection method in natural images based on connected character segments |
CN107617573A (en) * | 2017-09-30 | 2018-01-23 | 浙江瀚镪自动化设备股份有限公司 | Logistics code identification and sorting method based on multi-task deep learning |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9245191B2 (en) * | 2013-09-05 | 2016-01-26 | Ebay, Inc. | System and method for scene text recognition |
CN104751153B (en) * | 2013-12-31 | 2018-08-14 | 中国科学院深圳先进技术研究院 | A kind of method and device of identification scene word |
CN106778757B (en) * | 2016-12-12 | 2019-06-04 | 哈尔滨工业大学 | Scene text detection method based on text conspicuousness |
CN108549893B (en) * | 2018-04-04 | 2020-03-31 | 华中科技大学 | End-to-end identification method for scene text with any shape |
2018
- 2018-04-04 CN CN201810294058.XA patent/CN108549893B/en active Active
2019
- 2019-03-29 WO PCT/CN2019/080354 patent/WO2019192397A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework; Michal Busta et al.; 2017 IEEE International Conference on Computer Vision; 2017-12-31; pp. 2223-2231 *
TextBoxes: A Fast Text Detector with a Single Deep Neural Network; Minghui Liao et al.; Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence; 2017-12-31; pp. 4161-4167 *
Digit Recognition in Natural Scenes Based on Convolutional Neural Networks (in Chinese); Zhou Chengwei; Computer Technology and Development; 2017-11-30; Vol. 27, No. 11; pp. 101-105 *
Also Published As
Publication number | Publication date |
---|---|
CN108549893A (en) | 2018-09-18 |
WO2019192397A1 (en) | 2019-10-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108549893B (en) | End-to-end identification method for scene text with any shape | |
CN110837835B (en) | End-to-end scene text identification method based on boundary point detection | |
CN109299274B (en) | Natural scene text detection method based on full convolution neural network | |
CN108304835B (en) | character detection method and device | |
CN110738207B (en) | Character detection method for fusing character area edge information in character image | |
CN108229303B (en) | Detection recognition and training method, device, equipment and medium for detection recognition network | |
US10424072B2 (en) | Leveraging multi cues for fine-grained object classification | |
WO2019089578A1 (en) | Font identification from imagery | |
Alidoost et al. | A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image | |
CN111488826A (en) | Text recognition method and device, electronic equipment and storage medium | |
CN110782420A (en) | Small target feature representation enhancement method based on deep learning | |
CN110751154B (en) | Complex environment multi-shape text detection method based on pixel-level segmentation | |
CN111461039B (en) | Landmark identification method based on multi-scale feature fusion | |
CN112541491B (en) | End-to-end text detection and recognition method based on image character region perception | |
CN115131797B (en) | Scene text detection method based on feature enhancement pyramid network | |
CN116645592B (en) | Crack detection method based on image processing and storage medium | |
CN111860309A (en) | Face recognition method and system | |
CN110852327A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
US20230095533A1 (en) | Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling | |
Sharma et al. | Segmentation of handwritten words using structured support vector machine | |
Cai et al. | IOS-Net: An inside-to-outside supervision network for scale robust text detection in the wild | |
Mohammad et al. | Contour-based character segmentation for printed Arabic text with diacritics | |
El Abbadi | Scene Text Detection and Recognition by Using Multi-Level Feature Extraction Based on You Only Look Once Version Five (YOLOv5) and Maximally Stable Extremal Regions (MSERs) with Optical Character Recognition (OCR) | |
Naosekpam et al. | UTextNet: a UNet based arbitrary shaped scene text detector | |
Shi et al. | Fuzzy support tensor product adaptive image classification for the internet of things |
Legal Events
Code | Title | Date | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||