CN108549893B - End-to-end recognition method for scene text of arbitrary shape


Info

Publication number: CN108549893B
Application number: CN201810294058.XA
Authority: CN (China)
Prior art keywords: text, network, character, region, rcnn
Other languages: Chinese (zh)
Other versions: CN108549893A
Legal status: Active
Inventors: 白翔, 吕鹏原, 廖明辉, 姚聪, 储佳佳
Assignee (current and original): Huazhong University of Science and Technology

Events:
    • Application CN201810294058.XA filed by Huazhong University of Science and Technology
    • Publication of CN108549893A
    • Priority to PCT application PCT/CN2019/080354 (WO2019192397A1)
    • Application granted; publication of CN108549893B

Classifications

    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06N3/084 Learning methods: backpropagation, e.g. using gradient descent
    • G06F18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/63 Scene text, e.g. street names
    • G06V30/153 Segmentation of character regions using recognition of characters or words


Abstract

The invention discloses an end-to-end recognition method for scene text of arbitrary shape. Text features are extracted by a feature pyramid network and candidate text boxes are generated by a region extraction network; the positions of the candidate text boxes are then adjusted by the fast region classification regression branch to obtain more accurate text bounding box positions; next, the bounding box positions are input into the segmentation branch, and the predicted character sequence is obtained with a pixel voting algorithm; finally, the predicted character sequence is processed with a weighted edit distance algorithm, and the best-matching word for the predicted sequence is found in a given dictionary to obtain the final text recognition result. The method can simultaneously detect and recognize scene text of arbitrary shape in natural images, including horizontal, multi-oriented and curved text, and can be trained entirely end to end. Compared with the prior art, the detection and recognition method of the invention achieves excellent accuracy and universality and has strong practical application value.

Description

End-to-end recognition method for scene text of arbitrary shape
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an end-to-end recognition method for scene text of arbitrary shape.
Background
Scene text detection and recognition is a very active and challenging research direction in the field of computer vision, and many real-life applications depend on it, such as picture-based geo-location, real-time translation, and assistance for the visually impaired.
Scene text detection and recognition aims to detect and recognize text from natural scenes simultaneously, i.e. it divides into the two tasks of detection and recognition. In most previous research, text detection and recognition are handled separately: in the first step, a trained detector detects text regions in a natural scene picture; in the second step, the text regions detected in the first step are input into a recognition module to obtain the text content. However, these two tasks are highly correlated and complementary: on the one hand, the quality of the detection step determines the accuracy of recognition; on the other hand, the recognition result can also provide feedback to the detection. Such separate processing may keep detection and recognition from reaching optimal performance.
Recently, two end-to-end trainable frameworks for scene text recognition have been proposed. Benefiting from the complementarity between detection and recognition, these unified models significantly outperform previous approaches. However, both methods have two major drawbacks: first, they cannot be trained completely end to end; second, they can only recognize horizontal or oriented text, whereas the shape of text in real scene pictures varies greatly, from horizontal or oriented to curved. Therefore, an end-to-end recognition method that can handle scene text of arbitrary shape needs to be designed.
Disclosure of Invention
The invention aims to provide an end-to-end recognition method for scene text of arbitrary shape, consisting of a text detector based on instance segmentation and a text recognizer based on character segmentation. Text of arbitrary shape is detected by segmenting instance text regions, and text is recognized by semantic segmentation in two-dimensional space, so that irregular text instances are recognized. The method can detect and recognize text instances of arbitrary shape and can be trained completely end to end.
In order to achieve the above object, the present invention provides an end-to-end recognition method for scene text of arbitrary shape, which addresses scene text detection and recognition from a completely new perspective and comprises the following steps:
(1) training an arbitrarily-shaped scene text end-to-end recognition network model, comprising the following sub-steps:
(1.1) word-level labeling is performed on the multi-oriented text of all pictures in the original data set; the labels are the clockwise vertex coordinates of the word-level polygonal text bounding boxes and the character sequences of the words, giving a labeled standard training data set;
(1.2) an arbitrary-shape scene text end-to-end recognition network model is defined, consisting of a feature pyramid network, a region extraction network, a fast region classification regression branch network and a segmentation branch network. Training labels are calculated from the labeled standard training data set of step (1.1), a loss function is designed, and the arbitrary-shape scene text end-to-end recognition network is trained with backpropagation to obtain the arbitrary-shape scene text end-to-end recognition network model; this specifically comprises the following sub-steps:
(1.2.1) constructing an arbitrary-shape scene text end-to-end recognition network model, which consists of a feature pyramid network, a region extraction network, a fast region classification regression branch network and a segmentation branch network. The feature pyramid network, shown in fig. 3, is formed by adding bottom-up, top-down and lateral connections to a ResNet-50 deep convolutional neural network backbone, and extracts features fusing different resolutions from the input standard data set pictures. The extracted features of different scales are input into the region extraction network to obtain candidate text regions; after region-of-interest alignment, candidate text regions of fixed scale are obtained and input into the fast region classification regression branch network and the segmentation branch network respectively. The 7 × 7 candidate text regions extracted by the region extraction network are input into the fast region classification regression network, whose classification branch predicts the probability that an input candidate text region is a positive sample, providing more accurate candidate text regions, and whose regression branch computes the offset of the candidate text region relative to the real text region and adjusts its position. As shown in fig. 4, the segmentation branch network consists of four convolutional layers Conv1, Conv2, Conv3, Conv4, one deconvolution layer DeConv and one final convolutional layer Conv5; the 16 × 64 candidate text regions extracted by the region extraction network are input into the segmentation branch, and convolution and deconvolution operations finally generate 38 target segmentation maps of resolution 32 × 128: 1 global text instance segmentation map used to predict the precise position of the text region, and 36 character segmentation maps plus 1 character background segmentation map used to obtain the predicted character sequence through a pixel voting algorithm.
(1.2.2) generating horizontal initial bounding boxes on the original image according to the labeled standard training data set and the feature maps, and generating training labels for the region extraction network module, the fast region classification regression branch network module and the segmentation branch network module of the recognition network model: for the labeled standard training data set I_tr, the ground-truth label of an input picture I_tr_i contains polygons P = {p_1, p_2, …, p_m} describing the text regions and character labels C = {c_1 = (cc_1, cl_1), c_2 = (cc_2, cl_2), …, c_n = (cc_n, cl_n)} giving the category and position of each character, where P_i is the polygonal bounding box of a text region in picture I_tr_i, p_ij = (x_ij, y_ij) is the coordinate of the j-th vertex of polygon P_i, m denotes the number of polygonal text labels, and cc_k and cl_k are respectively the category and position of the k-th character in the text; in the present invention C is not required for all training samples.
For a given standard data set I_tr, the polygons P = {p_1, p_2, …, p_m} in the data set labels are first converted to the minimum horizontal rectangular bounding boxes of the polygonal text label boxes; such a box is denoted G_d = (x, y, h, w) by the center point (x, y), height h and width w of the rectangle. For the region extraction network, according to the labeled bounding boxes G_d = (x, y, h, w) of the labeled data set, every pixel of each feature map output by the feature pyramid is mapped back to the original image, several initial bounding boxes are generated as the candidate text regions predicted by the region extraction network, and the position offset of each initial bounding box Q_0 relative to a labeled bounding box G_d is computed: when the Jaccard coefficients between Q_0 and all labeled bounding boxes G_d are less than 0.5, the initial bounding box Q_0 is labeled as negative (non-text) and the class label P_rpn takes the value 0; otherwise, i.e. when at least one labeled bounding box G_d has a Jaccard coefficient with Q_0 of no less than 0.5, Q_0 is labeled as positive (text), the class label P_rpn takes the value 1, and the position offsets are computed relative to the labeled box with the largest Jaccard coefficient according to the formulas:

x = x_0 + w_0·Δx
y = y_0 + h_0·Δy
w = w_0·exp(Δw)
h = h_0·exp(Δh)

where x_0, y_0 are the abscissa and ordinate of the center point of the initial bounding box Q_0, w_0, h_0 are the width and height of Q_0, Δx, Δy are the horizontal and vertical position offsets of the center point of Q_0 relative to the center point of G_d, and exp is the exponential operation. The training label of the region extraction network is thus obtained as:

gt_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn, P_rpn)
For the fast region classification regression branch network, the training labels can similarly be calculated as:

gt_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn, P_rcnn)
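To make the geometry of these offset labels concrete, the four formulas above can be read as the following encode/decode pair. This is an illustrative sketch, not code from the patent; the function names are hypothetical.

```python
import math

def encode_offsets(gd, q0):
    """Compute (dx, dy, dh, dw) of labeled box gd relative to initial box q0.
    Boxes are (x, y, h, w) with (x, y) the center point, per the patent's notation."""
    x, y, h, w = gd
    x0, y0, h0, w0 = q0
    dx = (x - x0) / w0          # horizontal center offset, normalized by box width
    dy = (y - y0) / h0          # vertical center offset, normalized by box height
    dh = math.log(h / h0)       # log-scale height ratio, so h = h0 * exp(dh)
    dw = math.log(w / w0)       # log-scale width ratio, so w = w0 * exp(dw)
    return dx, dy, dh, dw

def decode_offsets(q0, offsets):
    """Invert the encoding: recover (x, y, h, w) from q0 and predicted offsets."""
    x0, y0, h0, w0 = q0
    dx, dy, dh, dw = offsets
    return (x0 + w0 * dx, y0 + h0 * dy, h0 * math.exp(dh), w0 * math.exp(dw))
```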
For the segmentation branch network, two types of target labels need to be generated: a global label for text instance segmentation and character labels for character semantic segmentation. For a given positive candidate text box r, the best-matching horizontal rectangle is obtained first, from which the matching polygon and character boxes are obtained; the matching polygon and character boxes are then shifted and resized to align the candidate text box r with the target label of preset height H and width W according to the formulas:

B_x = (B_x0 − min(r_x)) × W / (max(r_x) − min(r_x))
B_y = (B_y0 − min(r_y)) × H / (max(r_y) − min(r_y))

where (r_x, r_y) are the vertices of the candidate text box r, and (B_x, B_y) and (B_x0, B_y0) are the updated vertices and the original vertices of the polygon and of all character boxes; specifically, r_x is the set of abscissas of all vertices of the candidate text box r, r_y is the set of ordinates of all its vertices, and B_x, B_x0, B_y, B_y0 are defined analogously. The target global label X_g is then generated by drawing the standard polygon on a zero-initialized mask and filling its value with 1. For the character labels, taking the center as the origin, each standard character box is shrunk to one eighth of the size of the original box so that the character masks do not overlap each other; the shrunk character boxes are drawn on a zero-initialized mask and filled with their corresponding category indices to generate the character label X_c. If C does not exist, all pixels in the character maps are set to −1 and are ignored during optimization. This finally gives the overall segmentation branch label gt_mask = X. Combining the above labels gt_rpn, gt_rcnn and gt_mask generates the final training label:

gt = {Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn, P_rpn, Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn, P_rcnn, X};
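A minimal sketch of how the two mask targets could be rasterized, assuming the polygon and character boxes have already been aligned to the H × W target by the formulas above. The OpenCV drawing calls, helper names and class-index convention (1..36 for characters, 0 for background) are assumptions of this illustration.

```python
import numpy as np
import cv2

H, W = 32, 128  # target label size of the segmentation branch

def make_global_label(polygon):
    """Draw the aligned text polygon on a zero-initialized mask, filled with 1 (X_g)."""
    mask = np.zeros((H, W), dtype=np.uint8)
    cv2.fillPoly(mask, [polygon.astype(np.int32)], 1)
    return mask

def shrink_box(box, factor=1 / 8):
    """Shrink a character box about its own center so character masks don't overlap."""
    center = box.mean(axis=0)
    return center + (box - center) * factor

def make_char_label(char_boxes, char_classes):
    """Draw shrunk character boxes filled with their category index (X_c). If the
    character annotation C does not exist, every pixel is set to -1 and ignored
    during optimization."""
    if char_boxes is None:
        return np.full((H, W), -1, dtype=np.int32)
    mask = np.zeros((H, W), dtype=np.uint8)
    for box, cls in zip(char_boxes, char_classes):
        small = shrink_box(np.asarray(box, dtype=np.float32))
        cv2.fillPoly(mask, [small.astype(np.int32)], int(cls))
    return mask.astype(np.int32)
```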
(1.2.3) taking the standard training data set I_tr as the input of the recognition network model, features are extracted with the feature pyramid network module: the pictures of the standard training data set I_tr are input into the bottom-up ResNet-50 structure of the feature pyramid network, in which each group of convolutional layer units that does not change the feature map size is defined as a stage (stages {P2, P3, P4, P5, P6}), and the final output convolution features F of every stage are extracted. The top-down connections of the feature pyramid network module upsample the output convolution features of ResNet-50 to generate multi-scale upsampled features, and the lateral connection structure of the feature pyramid network module fuses the features of each stage upsampled in the top-down pass with the features generated in the bottom-up pass to produce the final features {F2, F3, F4, F5, F6}; the process is shown in fig. 3.
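One top-down/lateral fusion step of this kind can be sketched in PyTorch as follows; the 256 output channels and the layer names are common FPN conventions assumed here, not values fixed by the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    """Fuse one bottom-up ResNet-50 feature C_i with the upsampled top-down feature."""
    def __init__(self, c_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(c_channels, out_channels, kernel_size=1)         # 1x1 lateral conv
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, c_i, top_down):
        # upsample the coarser top-down feature to the lateral feature's spatial size
        up = F.interpolate(top_down, size=c_i.shape[-2:], mode="nearest")
        return self.smooth(self.lateral(c_i) + up)  # element-wise sum, then smooth
```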
(1.2.4) inputting the features extracted by the feature pyramid network into the region extraction network, assigning anchors, adjusting the feature maps with the region-of-interest alignment method, and generating candidate text boxes:

For an input picture I_tr_k, 5 stages of features {F2, F3, F4, F5, F6} are extracted through the feature pyramid network; according to the stages {P2, P3, P4, P5, P6}, the anchor scales at the different stages are defined as {32², 64², 128², 256², 512²}, and each scale has 3 aspect ratios {1:2, 1:1, 2:1}; 15 feature maps of different scales and ratios {Ftr_1, Ftr_2, …, Ftr_15} can thus be extracted, denoted Ftr_p with subscript p = 1, …, 15.

Through the region-of-interest alignment operation, candidate text regions of fixed scale are generated from the features Ftr_p: a candidate text region R_rcnn of resolution 7 × 7 for the fast region classification regression network, and a candidate text region R_mask of resolution 16 × 64 for the segmentation branch. The classification branch predicts the probability P_rpn that each candidate text box is a correct text region bounding box, and the regression branch predicts the candidate text box offsets:

Y_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn).
(1.2.5) the 7 × 7 candidate text regions R_rcnn generated by the region extraction network are input into the fast region classification regression branch network module, the loss function is computed through the classification and regression branches and back-propagated, and the predicted text bounding boxes are finally generated: the module is divided into a classification branch and a regression branch. The 7 × 7 candidate text region R_rcnn is input into the classification branch, and convolution operations output the classification score P_rcnn of the predicted bounding box, i.e. the probability that the bounding box is predicted as a positive text box, a decimal with value in [0, 1]. R_rcnn is also input into the regression branch, which outputs the predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn), composed of 4 decimals in [0, 1], i.e. the predicted position offsets of the center-point abscissa, ordinate, height and width of the predicted bounding box G_q, when predicted as a positive text box, relative to the center-point abscissa, ordinate, height and width of the labeled bounding box G_d.
(1.2.6) the 16 × 64 candidate text regions R_mask generated by the region extraction network are input into the segmentation branch network module, which generates 38 target segmentation maps based on instance segmentation and semantic segmentation operations: the segmentation branch network module comprises 4 convolutional layers Conv1, Conv2, Conv3, Conv4, one deconvolution layer DeConv and one final convolutional layer Conv5. The 16 × 64 candidate text box R_mask generated by the region extraction network is input into the segmentation branch module, and convolution, deconvolution and related operations finally generate 38 target segmentation maps of scale 32 × 128, {M_global, M_1, M_2, …, M_36, M_background}, outputting a value X for each pixel of each map, with value in [0, 1]. Among the output maps, the global segmentation map M_global directly predicts the text region polygon Pm = {pm_1, pm_2, …, pm_n}, while the character segmentation maps {M_1, M_2, …, M_36} and the character background segmentation map M_background predict the character sequence S_q according to the pixel voting algorithm.
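A sketch of such a segmentation branch under the stated layer layout (Conv1 to Conv4, DeConv, Conv5, 38 output maps of 32 × 128 from 16 × 64 inputs); kernel sizes, channel widths and the sigmoid output are assumptions of this illustration.

```python
import torch.nn as nn

class MaskBranch(nn.Module):
    """Conv1-4 -> DeConv (2x upsample: 16x64 -> 32x128) -> Conv5 -> 38 maps.
    Map 0: global text instance; maps 1-36: characters; map 37: character background."""
    def __init__(self, in_channels=256, mid=256, num_maps=38):
        super().__init__()
        convs = []
        for i in range(4):  # Conv1..Conv4
            convs += [nn.Conv2d(in_channels if i == 0 else mid, mid, 3, padding=1),
                      nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*convs)
        self.deconv = nn.ConvTranspose2d(mid, mid, 2, stride=2)  # DeConv: doubles resolution
        self.predict = nn.Conv2d(mid, num_maps, 1)               # Conv5: 38 output maps

    def forward(self, x):                 # x: (N, 256, 16, 64) RoI-aligned features
        x = self.deconv(self.convs(x))
        return self.predict(x).sigmoid()  # per-pixel values in [0, 1], shape (N, 38, 32, 128)
```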
(1.2.7) taking the training label gt as the expected output of the network and the predicted labels ŷ = {P_rpn, Y_rpn, P_rcnn, Y_rcnn, X} as the network prediction output, an objective loss function between the expected output and the predicted output is designed for the constructed network model: the training label gt calculated in step (1.2.2) is the expected output of the network, and the predicted labels of steps (1.2.4), (1.2.5) and (1.2.6) are the network prediction output. For the network model constructed in (1.2.1), the overall objective loss function consists of the loss functions of the region extraction network, the fast region classification regression branch network and the segmentation branch network, with the expression:

L(P_rpn, Y_rpn, P_rcnn, Y_rcnn, X) = L_rpn(P_rpn, Y_rpn) + α1·L_rcnn(P_rcnn, Y_rcnn) + α2·L_mask(X)
where L_rpn(P_rpn, Y_rpn) is the loss function of the region extraction network, L_rcnn(P_rcnn, Y_rcnn) is the loss function of the fast region classification regression branch network, L_mask(X) is the loss function of the segmentation branch network, and α1, α2 are the weight coefficients of the loss functions L_rcnn and L_mask respectively, both simply set to 1.
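The weighted sum can be composed as below; the individual terms (binary cross-entropy for classification and masks, smooth-L1 for regression) and the dictionary interface are assumed choices for illustration, since the text fixes only the overall form with α1 = α2 = 1.

```python
import torch.nn.functional as F

def total_loss(rpn_out, rcnn_out, mask_out, gt, alpha1=1.0, alpha2=1.0):
    """L = L_rpn + a1*L_rcnn + a2*L_mask with a1 = a2 = 1, as in the text."""
    l_rpn = (F.binary_cross_entropy(rpn_out["score"], gt["p_rpn"])      # P_rpn term
             + F.smooth_l1_loss(rpn_out["offsets"], gt["y_rpn"]))       # Y_rpn term
    l_rcnn = (F.binary_cross_entropy(rcnn_out["score"], gt["p_rcnn"])   # P_rcnn term
              + F.smooth_l1_loss(rcnn_out["offsets"], gt["y_rcnn"]))    # Y_rcnn term
    l_mask = F.binary_cross_entropy(mask_out, gt["masks"])              # X term over the 38 maps
    return l_rpn + alpha1 * l_rcnn + alpha2 * l_mask
```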
According to the designed overall objective loss function, the model is iteratively trained with the backpropagation algorithm to minimize the overall objective loss and obtain the optimal network model. For the scene text detection and recognition task, training first iterates on a synthetic text data set (SynthText) to obtain initial network parameters, and then continues on real data sets to fine-tune the network parameters.
(2) Text recognition is performed on the text picture to be recognized with the trained model, comprising the following sub-steps:
(2.1) features are extracted from the scene text picture to be detected and recognized and input into the fast region classification regression branch network to generate candidate text regions, which are filtered by a non-maximum suppression operation to obtain more accurate candidate text regions: the k-th picture I_tst_k of the data set I_tst to be detected is input into the model trained in step (1.2); after the feature pyramid network and the region extraction network, the model generates initial bounding boxes, which are input into the fast region classification regression branch network. For each initial bounding box G_q, the classification branch outputs the prediction value P_rcnn of the classification score, i.e. the score with which G_q is predicted as a positive sample; the regression branch outputs a predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn) composed of 4 decimals, i.e. the position offsets of the center-point abscissa, ordinate, height and width of G_q, when predicted as a positive text box, relative to those of the labeled bounding box G_d; from these position offsets, the position Q_z of the quadrilateral text bounding box predicted by the network can be calculated.

The predicted text bounding boxes Q_z are filtered by a non-maximum suppression operation to obtain the output result: the network model regresses a horizontal quadrilateral position for every initial bounding box Q_0 on the feature maps Ftst_p that is predicted as positive text, and the positive text quadrilaterals regressed on the different feature maps of the same test picture I_tst_k usually overlap each other, so non-maximum suppression is applied to the positions of all positive text quadrilaterals. The specific steps are: 1) a predicted text bounding box is kept if and only if its text classification score P_rcnn ≥ 0.5; 2) a non-maximum suppression operation (NMS) with a Jaccard coefficient of 0.2 is applied to the text boxes kept in the previous step to obtain the finally kept quadrilateral bounding boxes of positive text.
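The two filtering steps can be sketched as a plain greedy NMS; the corner-format box representation and function names are assumptions of this illustration.

```python
import numpy as np

def iou(a, b):
    """Jaccard coefficient of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def filter_boxes(boxes, scores, score_thr=0.5, nms_thr=0.2):
    """Step 1: keep boxes with classification score >= 0.5.
    Step 2: greedy NMS, suppressing any box whose Jaccard with a kept box exceeds 0.2."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_thr]
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thr for j in kept):
            kept.append(i)
    return kept  # indices of the finally kept positive text boxes
```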
(2.2) the predicted candidate text regions are input into the segmentation branch network for text instance segmentation and character segmentation, generating a global text instance segmentation mask and character segmentation masks respectively; the polygonal word text region is obtained by computing the contour of the text region on the global text instance segmentation mask, and the character sequence is obtained by prediction with the pixel voting algorithm on the character segmentation masks: the predicted quadrilateral text bounding box position Q_z is input into the segmentation branch, which generates the 38 target segmentation maps. First, the contour of the text region is computed directly from the global text instance segmentation mask, giving the text region polygon. Second, the character sequence S_q is generated with the pixel voting algorithm.
For the 36 character segmentation maps {M_1, M_2, …, M_36}, the value p_ci(x, y) of a pixel in the i-th segmentation map represents the probability that the pixel p_g(x, y) at the corresponding position of the global text segmentation map is the character z_i, where z_i is the i-th of the 36 characters {0, 1, …, 9, a, b, …, z}; the probabilities at corresponding pixel positions across the 36 character segmentation maps sum to 1, i.e.

Σ_{i=1}^{36} p_ci(x, y) = 1.
For the character background segmentation map M_background, the map is first binarized; on the binarized background map, the set of character regions is defined as R = {r_1, r_2, …, r_n}, where r_i is the i-th character region on the character background segmentation map and n is the number of all characters on the background segmentation map.

The pixel voting algorithm proceeds as follows: first, the set of regions in the 36 character segmentation maps connected to the character region r_i of the character background segmentation map is defined as C_i = {c_i1, c_i2, …, c_i36}, where c_ij is the region block in the j-th character segmentation map corresponding to the i-th character region of the character background segmentation map; then, for the region r_i and the corresponding connected regions C_i, the predicted character is obtained with the pixel voting algorithm: first, the mean of the values of all pixels inside each c_ij of the connected regions C_i is calculated; second, the c_ij_max with the largest mean is found, and the character category z_j_max corresponding to its character map M_j_max is the predicted character of this character region; finally, performing this operation on every character region r_i of the character background segmentation map yields the final predicted character sequence S_q.
(2.3) the character sequence predicted by the segmentation branch is processed with a weighted edit distance algorithm, and the best-matching word for the predicted sequence is found in a given dictionary to obtain the final recognition result: in the pixel voting stage, the probabilities of all character categories for each character region of the predicted sequence are available, and according to these probabilities different weights are defined for the deletion, insertion and substitution operations. For a deletion, the cost is the probability with which the character was predicted as the currently deleted character; for an insertion, the cost is the average probability of the two characters adjacent to the insertion position; for a substitution, the cost is computed as max(1 − s1/s2, 0), where s1 and s2 are the probabilities of the candidate character and of the predicted character to be replaced. Matching the predicted character string against a given dictionary with the weighted edit distance, using these different weights for deletion, insertion and substitution, adjusts the predicted word, improving accuracy and giving the final recognition result.
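The weighted edit distance can be sketched as the classical dynamic program with the three costs above; the prob[i][c] interface to the pixel-voting probabilities is an assumption of this illustration.

```python
def weighted_edit_distance(pred, word, prob):
    """pred: predicted string; word: dictionary word; prob[i][c]: probability that
    character region i of pred is class c. Deletion costs the probability of the
    deleted character; insertion costs the mean probability of its neighbours;
    substitution costs max(1 - s1/s2, 0) with s1 the candidate, s2 the prediction."""
    n, m = len(pred), len(word)
    if n == 0:
        return float(m)  # degenerate case: pure insertions
    def del_cost(i):
        return prob[i][pred[i]]
    def ins_cost(i):  # mean probability of the two regions adjacent to the gap
        left, right = max(i - 1, 0), min(i, n - 1)
        return (prob[left][pred[left]] + prob[right][pred[right]]) / 2
    def sub_cost(i, c):
        s1, s2 = prob[i][c], prob[i][pred[i]]
        return 0.0 if c == pred[i] else max(1 - s1 / s2, 0.0)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost(i - 1)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost(0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + del_cost(i - 1),
                          D[i][j - 1] + ins_cost(i),
                          D[i - 1][j - 1] + sub_cost(i - 1, word[j - 1]))
    return D[n][m]

# the dictionary word minimizing this distance is taken as the recognition result
```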
Through the above technical scheme, compared with the prior art, the invention has the following technical effects:
(1) High accuracy: for the problem of recognizing arbitrarily shaped text in scene text, the invention innovatively detects text with instance segmentation and recognizes text with semantic segmentation, locating text positions and recognizing text more accurately.
(2) High speed: the detection and recognition model of the invention trains quickly while maintaining detection and recognition accuracy.
(3) Strong universality: the invention discloses an end-to-end trainable text detection and recognition model, which not only detects and recognizes text simultaneously with complete end-to-end training, but also handles text of various shapes, including horizontal, oriented and curved text;
(4) Strong robustness: the invention can cope with changes in text scale and shape, and can simultaneously detect and recognize horizontal, oriented and curved text.
Drawings
FIG. 1 is a flow chart of an arbitrary-shaped scene text end-to-end recognition method of the present invention, in which a solid arrow represents training and a dashed arrow represents testing;
FIG. 2 is a diagram of an arbitrarily shaped scene text end-to-end recognition network model of the present invention;
FIG. 3 is a schematic diagram of a network structure of a feature pyramid structure module in an arbitrary-shaped scene text end-to-end recognition model according to the present invention;
FIG. 4 is a diagram of a segmentation branch network structure in an arbitrary-shaped scene text end-to-end recognition model according to the present invention;
FIG. 5 is a schematic diagram of the pixel voting algorithm used in the test stage of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The technical terms of the present invention are first explained:
ResNet-50: a neural network for classification, mainly comprising 50 convolutional layers together with pooling layers and shortcut connections. The convolutional layers extract picture features; the pooling layers reduce the dimensionality of the feature vectors output by the convolutional layers and reduce overfitting; the shortcut connections propagate gradients and alleviate the vanishing and exploding gradient problems. The network parameters can be updated by the backpropagation algorithm;
Region extraction network: a network for generating candidate text regions; a sliding window on the extracted feature maps generates fully connected features of a specific dimension, from which two fully connected branches classify and regress candidate text regions; finally, according to the different anchors and ratios, candidate text regions of different scales and aspect ratios are generated for the subsequent network.
Jaccard coefficient: used to compare similarity and difference between finite sample sets. In the field of text detection, the Jaccard coefficient is by default taken to equal the IoU (intersection over union) of two boxes, i.e. the area of their intersection divided by the area of their union; it describes the overlap between a predicted text box generated by the model and the originally labeled text box: the larger the IoU, the higher the overlap and the more accurate the detection.
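For axis-aligned boxes the IoU is a few lines of arithmetic (see the NMS sketch in step (2.1) above); for the quadrilateral and polygonal text boxes of this invention, a general polygon intersection is needed. A sketch using the shapely library, which is an illustrative choice and not part of the patent:

```python
from shapely.geometry import Polygon

def jaccard(poly_a, poly_b):
    """Jaccard coefficient (IoU) of two polygons given as vertex lists [(x, y), ...]."""
    a, b = Polygon(poly_a), Polygon(poly_b)
    inter = a.intersection(b).area  # overlap area
    union = a.union(b).area         # combined area
    return inter / union if union > 0 else 0.0

# e.g. two unit squares offset by half a side overlap with IoU = 1/3
print(jaccard([(0, 0), (1, 0), (1, 1), (0, 1)],
              [(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)]))
```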
Non-maximum suppression (NMS): a post-processing algorithm widely used in computer-vision detection. According to a set threshold, overlapping detection boxes are filtered by loop iterations of sorting, traversing and rejecting, removing redundant detection boxes to obtain the final detection result.
As shown in fig. 1, the arbitrary-shape scene text end-to-end recognition method of the present invention comprises the following steps:
(1) training an arbitrarily-shaped scene text end-to-end recognition network model, comprising the following sub-steps:
(1.1) word-level labeling is performed on the multi-oriented text of all pictures in the original data set; the labels are the clockwise vertex coordinates of the word-level polygonal text bounding boxes and the character sequences of the words, giving a labeled standard training data set;
(1.2) an arbitrary-shape scene text end-to-end recognition network model is defined, consisting of a feature pyramid network, a region extraction network, a fast region classification regression branch network and a segmentation branch network. Training labels are calculated from the labeled standard training data set of step (1.1), a loss function is designed, and the arbitrary-shape scene text end-to-end recognition network is trained with backpropagation to obtain the arbitrary-shape scene text end-to-end recognition network model; this specifically comprises the following sub-steps:
(1.2.1) constructing an arbitrary-shape scene text end-to-end recognition network model, which consists of a feature pyramid network, a region extraction network, a fast region classification regression branch network and a segmentation branch network. The feature pyramid network, shown in fig. 3, is formed by adding bottom-up, top-down and lateral connections to a ResNet-50 deep convolutional neural network backbone, and extracts features fusing different resolutions from the input standard data set pictures. The extracted features of different scales are input into the region extraction network to obtain candidate text regions; after region-of-interest alignment, candidate text regions of fixed scale are obtained and input into the fast region classification regression branch network and the segmentation branch network respectively. The 7 × 7 candidate text regions extracted by the region extraction network are input into the fast region classification regression network, whose classification branch predicts the probability that an input candidate text region is a positive sample, providing more accurate candidate text regions, and whose regression branch computes the offset of the candidate text region relative to the real text region and adjusts its position. As shown in fig. 4, the segmentation branch network consists of four convolutional layers Conv1, Conv2, Conv3, Conv4, one deconvolution layer DeConv and one final convolutional layer Conv5; the 16 × 64 candidate text regions extracted by the region extraction network are input into the segmentation branch, and convolution and deconvolution operations finally generate 38 target segmentation maps of resolution 32 × 128: 1 global text instance segmentation map used to predict the precise position of the text region, and 36 character segmentation maps plus 1 character background segmentation map used to obtain the predicted character sequence through a pixel voting algorithm.
(1.2.2) generating horizontal initial bounding boxes on the original image according to the labeled standard training data set and the feature maps, and generating training labels for the region extraction network module, the fast region classification regression branch network module and the segmentation branch network module of the recognition network model: for the labeled standard training data set I_tr, the ground-truth label of an input picture I_tr_i contains polygons P = {p_1, p_2, …, p_m} describing the text regions and character labels C = {c_1 = (cc_1, cl_1), c_2 = (cc_2, cl_2), …, c_n = (cc_n, cl_n)} giving the category and position of each character, where P_i is the polygonal bounding box of a text region in picture I_tr_i, p_ij = (x_ij, y_ij) is the coordinate of the j-th vertex of polygon P_i, m denotes the number of polygonal text labels, and cc_k and cl_k are respectively the category and position of the k-th character in the text; in the present invention C is not required for all training samples.
For a given standard data set I_tr, the polygons P = {p_1, p_2, …, p_m} in the data set labels are first converted to the minimum horizontal rectangular bounding boxes of the polygonal text label boxes; such a box is denoted G_d = (x, y, h, w) by the center point (x, y), height h and width w of the rectangle. For the region extraction network, according to the labeled bounding boxes G_d = (x, y, h, w) of the labeled data set, every pixel of each feature map output by the feature pyramid is mapped back to the original image, several initial bounding boxes are generated as the candidate text regions predicted by the region extraction network, and the position offset of each initial bounding box Q_0 relative to a labeled bounding box G_d is computed: when the Jaccard coefficients between Q_0 and all labeled bounding boxes G_d are less than 0.5, the initial bounding box Q_0 is labeled as negative (non-text) and the class label P_rpn takes the value 0; otherwise, i.e. when at least one labeled bounding box G_d has a Jaccard coefficient with Q_0 of no less than 0.5, Q_0 is labeled as positive (text), the class label P_rpn takes the value 1, and the position offsets are computed relative to the labeled box with the largest Jaccard coefficient according to the formulas:

x = x_0 + w_0·Δx
y = y_0 + h_0·Δy
w = w_0·exp(Δw)
h = h_0·exp(Δh)

where x_0, y_0 are the abscissa and ordinate of the center point of the initial bounding box Q_0, w_0, h_0 are the width and height of Q_0, Δx, Δy are the horizontal and vertical position offsets of the center point of Q_0 relative to the center point of G_d, and exp is the exponential operation. The training label of the region extraction network is thus obtained as:

gt_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn, P_rpn)
For the fast region classification regression branch network, the training labels can similarly be calculated as:

gt_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn, P_rcnn)
for a split branch network, two types of target tags need to be generated: global labels for text instance segmentation and character labels for character semantic segmentation; for a given positive candidate text box r, firstly, obtaining a best matching horizontal rectangle, further obtaining a matching polygon and a character box, and then, shifting and resizing the matching polygon and the character box so as to align the candidate text box r with a target label with a preset height H and a preset width W according to the following formula:
Figure BDA0001618307750000161
By=(By0-min(ry))×H/(max(ry))
wherein (r)x,ry) Is the vertex of the candidate text box r, (B)x,By) And
Figure BDA0001618307750000162
are the updated vertices and the original vertices of the polygon and all character boxes, specifically rxSet of abscissas of all vertices of a candidate text box r, ryIs the set of ordinates of all the vertices of the candidate text box r,
Figure BDA0001618307750000163
similarly, the target global label X is then generated by drawing a standard polygon on a zero-initialized mask and filling the value to 1gFor the character label, the character label X is generated by using the center as the origin, reducing the standard character frame to one eighth of the size of the origin frame, avoiding the character masks from overlapping each other, drawing the reduced character frames on the zero initialization mask and using the corresponding category index padding of the reduced character framescIf C does not exist, all pixels in the character layer are set to be-1 and are ignored during optimization, and finally the segmentation branch overall label gt is obtainedmaskX, in combination with the above label gtrpn,gtrcnn,gtmaskGenerating the final training label as follows:
gt={Δxrpn,Δyrpn,Δhrpn,Δwrpn,Prpn,Δxrcnn,Δyrcnn
Δhrcnn,Δwrcnn,Prcnn,X};
(1.2.3) taking the standard training data set I_tr as the input of the recognition network model, features are extracted with the feature pyramid network module: the pictures of the standard training data set I_tr are input into the bottom-up ResNet-50 structure of the feature pyramid network, in which each group of convolutional layer units that does not change the feature map size is defined as a stage (stages {P2, P3, P4, P5, P6}), and the final output convolution features F of every stage are extracted. The top-down connections of the feature pyramid network module upsample the output convolution features of ResNet-50 to generate multi-scale upsampled features, and the lateral connection structure of the feature pyramid network module fuses the features of each stage upsampled in the top-down pass with the features generated in the bottom-up pass to produce the final features {F2, F3, F4, F5, F6}; the process is shown in FIG. 3.
(1.2.4) inputting the features extracted by the feature pyramid network into the region extraction network, assigning anchors, adjusting the feature maps with the region-of-interest alignment method, and generating candidate text boxes:

For an input picture I_tr_k, 5 stages of features {F2, F3, F4, F5, F6} are extracted through the feature pyramid network; according to the stages {P2, P3, P4, P5, P6}, the anchor scales at the different stages are defined as {32², 64², 128², 256², 512²}, and each scale has 3 aspect ratios {1:2, 1:1, 2:1}; 15 feature maps of different scales and ratios {Ftr_1, Ftr_2, …, Ftr_15} can thus be extracted, denoted Ftr_p with subscript p = 1, …, 15, as illustrated by the sketch below.
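The 15 scale/ratio combinations can be enumerated as follows; the helper name is hypothetical and only the anchor geometry is shown.

```python
# Anchor areas per stage {P2..P6} and the 3 aspect ratios shared by every stage.
SCALES = [32, 64, 128, 256, 512]   # anchor side lengths, area = scale**2
RATIOS = [(1, 2), (1, 1), (2, 1)]  # width : height

def anchor_shapes():
    """Return the 15 (width, height) anchor shapes: 5 scales x 3 aspect ratios,
    each keeping the area scale**2 while changing the aspect ratio."""
    shapes = []
    for s in SCALES:
        for rw, rh in RATIOS:
            k = (s * s / (rw * rh)) ** 0.5  # solve (k*rw) * (k*rh) = s**2
            shapes.append((k * rw, k * rh))
    return shapes

print(len(anchor_shapes()))  # 15 scale/ratio configurations, matching Ftr_1..Ftr_15
```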
Through the region-of-interest alignment operation, candidate text regions of fixed scale are generated from the features Ftr_p: a candidate text region R_rcnn of resolution 7 × 7 for the fast region classification regression network, and a candidate text region R_mask of resolution 16 × 64 for the segmentation branch. The classification branch predicts the probability P_rpn that each candidate text box is a correct text region bounding box, and the regression branch predicts the candidate text box offsets:

Y_rpn = (Δx_rpn, Δy_rpn, Δh_rpn, Δw_rpn).
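Region-of-interest alignment to the two fixed resolutions can be illustrated with torchvision's roi_align operator; the feature map size, box coordinates and spatial scale below are placeholder assumptions.

```python
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 256, 200, 336)               # one FPN level (N, C, H, W), illustrative
boxes = torch.tensor([[0, 48.0, 40.0, 240.0, 88.0]])   # (batch_idx, x1, y1, x2, y2)

# 7 x 7 regions R_rcnn for the fast region classification regression branch
r_rcnn = roi_align(features, boxes, output_size=(7, 7), spatial_scale=1.0)

# 16 x 64 regions R_mask for the segmentation branch (a wide shape that suits text)
r_mask = roi_align(features, boxes, output_size=(16, 64), spatial_scale=1.0)

print(r_rcnn.shape, r_mask.shape)  # torch.Size([1, 256, 7, 7]) torch.Size([1, 256, 16, 64])
```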
(1.2.5) the 7 × 7 candidate text regions R_rcnn generated by the region extraction network are input into the fast region classification regression branch network module, the loss function is computed through the classification and regression branches and back-propagated, and the predicted text bounding boxes are finally generated: the module is divided into a classification branch and a regression branch. The 7 × 7 candidate text region R_rcnn is input into the classification branch, and convolution operations output the classification score P_rcnn of the predicted bounding box, i.e. the probability that the bounding box is predicted as a positive text box, a decimal with value in [0, 1]. R_rcnn is also input into the regression branch, which outputs the predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn), composed of 4 decimals in [0, 1], i.e. the predicted position offsets of the center-point abscissa, ordinate, height and width of the predicted bounding box G_q, when predicted as a positive text box, relative to the center-point abscissa, ordinate, height and width of the labeled bounding box G_d.
(1.2.6) the 16 × 64 candidate text regions R_mask generated by the region extraction network are input into the segmentation branch network module, which generates 38 target segmentation maps based on instance segmentation and semantic segmentation operations: the segmentation branch network module comprises 4 convolutional layers Conv1, Conv2, Conv3, Conv4, one deconvolution layer DeConv and one final convolutional layer Conv5. The 16 × 64 candidate text box R_mask generated by the region extraction network is input into the segmentation branch module, and convolution, deconvolution and related operations finally generate 38 target segmentation maps of scale 32 × 128, {M_global, M_1, M_2, …, M_36, M_background}, outputting a value X for each pixel of each map, with value in [0, 1]. Among the output maps, the global segmentation map M_global directly predicts the text region polygon Pm = {pm_1, pm_2, …, pm_n}, while the character segmentation maps {M_1, M_2, …, M_36} and the character background segmentation map M_background predict the character sequence S_q according to the pixel voting algorithm.
(1.2.7) taking the training label gt as the expected output of the network and the predicted labels ŷ = {P_rpn, Y_rpn, P_rcnn, Y_rcnn, X} as the network prediction output, an objective loss function between the expected output and the predicted output is designed for the constructed network model: the training label gt calculated in step (1.2.2) is the expected output of the network, and the predicted labels of steps (1.2.4), (1.2.5) and (1.2.6) are the network prediction output. For the network model constructed in (1.2.1), the overall objective loss function consists of the loss functions of the region extraction network, the fast region classification regression branch network and the segmentation branch network, with the expression:

L(P_rpn, Y_rpn, P_rcnn, Y_rcnn, X) = L_rpn(P_rpn, Y_rpn) + α1·L_rcnn(P_rcnn, Y_rcnn) + α2·L_mask(X)
where L_rpn(P_rpn, Y_rpn) is the loss function of the region extraction network, L_rcnn(P_rcnn, Y_rcnn) is the loss function of the fast region classification regression branch network, L_mask(X) is the loss function of the segmentation branch network, and α1, α2 are the weight coefficients of the loss functions L_rcnn and L_mask respectively, both simply set to 1.
According to the designed overall objective loss function, the model is iteratively trained with the backpropagation algorithm to minimize the overall objective loss and obtain the optimal network model. For the scene text detection and recognition task, training first iterates on a synthetic text data set (SynthText) to obtain initial network parameters, and then continues on real data sets to fine-tune the network parameters.
(2) Text recognition is performed on the text picture to be recognized with the trained model, comprising the following sub-steps:
(2.1) features are extracted from the scene text picture to be detected and recognized and input into the fast region classification regression branch network to generate candidate text regions, which are filtered by a non-maximum suppression operation to obtain more accurate candidate text regions: the k-th picture I_tst_k of the data set I_tst to be detected is input into the model trained in step (1.2); after the feature pyramid network and the region extraction network, the model generates initial bounding boxes, which are input into the fast region classification regression branch network. For each initial bounding box G_q, the classification branch outputs the prediction value P_rcnn of the classification score, i.e. the score with which G_q is predicted as a positive sample; the regression branch outputs a predicted regression offset Y_rcnn = (Δx_rcnn, Δy_rcnn, Δh_rcnn, Δw_rcnn) composed of 4 decimals, i.e. the position offsets of the center-point abscissa, ordinate, height and width of G_q, when predicted as a positive text box, relative to those of the labeled bounding box G_d; from these position offsets, the position Q_z of the quadrilateral text bounding box predicted by the network can be calculated.

The predicted text bounding boxes Q_z are filtered by a non-maximum suppression operation to obtain the output result: the network model regresses a horizontal quadrilateral position for every initial bounding box Q_0 on the feature maps Ftst_p that is predicted as positive text, and the positive text quadrilaterals regressed on the different feature maps of the same test picture I_tst_k usually overlap each other, so non-maximum suppression is applied to the positions of all positive text quadrilaterals. The specific steps are: 1) a predicted text bounding box is kept if and only if its text classification score P_rcnn ≥ 0.5; 2) a non-maximum suppression operation (NMS) with a Jaccard coefficient of 0.2 is applied to the text boxes kept in the previous step to obtain the finally kept quadrilateral bounding boxes of positive text.
(2.2) the predicted candidate text regions are input into the segmentation branch network for text instance segmentation and character segmentation, generating a global text instance segmentation mask and character segmentation masks respectively; the polygonal word text region is obtained by computing the contour of the text region on the global text instance segmentation mask, and the character sequence is obtained by prediction with the pixel voting algorithm on the character segmentation masks: the predicted quadrilateral text bounding box position Q_z is input into the segmentation branch, which generates the 38 target segmentation maps. First, the contour of the text region is computed directly from the global text instance segmentation mask, giving the text region polygon. Second, the character sequence S_q is generated with the pixel voting algorithm.
For the 36 character segmentation maps {M_1, M_2, …, M_36}, the value p_ci(x, y) of a pixel in the i-th segmentation map represents the probability that the pixel p_g(x, y) at the corresponding position of the global text segmentation map is the character z_i, where z_i is the i-th of the 36 characters {0, 1, …, 9, a, b, …, z}; the probabilities at corresponding pixel positions across the 36 character segmentation maps sum to 1, i.e.

Σ_{i=1}^{36} p_ci(x, y) = 1.
For the character background segmentation map M_background, the map is first binarized; on the binarized background map, the set of character regions is defined as R = {r_1, r_2, …, r_n}, where r_i is the i-th character region on the character background segmentation map and n is the number of all characters on the background segmentation map.

The pixel voting algorithm proceeds as follows: first, the set of regions in the 36 character segmentation maps connected to the character region r_i of the character background segmentation map is defined as C_i = {c_i1, c_i2, …, c_i36}, where c_ij is the region block in the j-th character segmentation map corresponding to the i-th character region of the character background segmentation map; then, for the region r_i and the corresponding connected regions C_i, the predicted character is obtained with the pixel voting algorithm: first, the mean of the values of all pixels inside each c_ij of the connected regions C_i is calculated; second, the c_ij_max with the largest mean is found, and the character category z_j_max corresponding to its character map M_j_max is the predicted character of this character region; finally, performing this operation on every character region r_i of the character background segmentation map yields the final predicted character sequence S_q.
(2.3) The character sequence predicted by the segmentation branch is processed with a weighted edit distance algorithm, and the best matching word for the predicted sequence is found in a given dictionary to obtain the final recognition result: in the pixel voting stage, the probabilities of all character categories for each character region of the predicted sequence are available, and different weights are defined for the deletion, insertion and substitution operations according to these probabilities. For a deletion operation, the cost is the probability that the character was predicted as the currently deleted character; for an insertion operation, the cost is the average probability of the two characters adjacent to the insertion position; for a substitution operation, the cost is computed as max(1 − s1/s2, 0), where s1 and s2 are the probabilities of the candidate character and of the predicted character to be replaced. The predicted character string is matched against the given dictionary by the weighted edit distance algorithm with these deletion, insertion and substitution weights, and the predicted word is adjusted accordingly, which improves accuracy and yields the final recognition result. A sketch of this matching step follows.
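A sketch of the weighted edit distance under the stated costs, assuming probs[i] maps every character to its probability at position i of the predicted sequence, and simplifying the boundary insertion cost to 1; names are illustrative:

import numpy as np

def weighted_edit_distance(pred, word, probs):
    """pred: predicted string; word: dictionary word; probs: list of dicts,
    probs[i][c] = probability of character c at position i of pred."""
    n, m = len(pred), len(word)
    D = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):                       # delete pred[i-1]
        D[i][0] = D[i - 1][0] + probs[i - 1][pred[i - 1]]
    for j in range(1, m + 1):                       # insert before pred[0]
        D[0][j] = D[0][j - 1] + 1.0                 # boundary cost simplified
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s2 = probs[i - 1][pred[i - 1]]          # predicted char probability
            s1 = probs[i - 1].get(word[j - 1], 0.0) # candidate char probability
            sub = 0.0 if pred[i - 1] == word[j - 1] else max(1 - s1 / s2, 0)
            # insertion cost: average prob of the two chars around the gap
            left = probs[i - 1][pred[i - 1]]
            right = probs[i][pred[i]] if i < n else left
            ins = (left + right) / 2
            dele = probs[i - 1][pred[i - 1]]        # deletion cost
            D[i][j] = min(D[i - 1][j] + dele,
                          D[i][j - 1] + ins,
                          D[i - 1][j - 1] + sub)
    return D[n][m]

def best_match(pred, lexicon, probs):
    return min(lexicon, key=lambda w: weighted_edit_distance(pred, w, probs))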

Claims (10)

1. An end-to-end identification method for scene texts with arbitrary shapes is characterized by comprising the following steps:
(1) training an arbitrarily-shaped scene text end-to-end recognition network model, comprising the following sub-steps:
(1.1) carrying out word-level labeling of the multidirectional text in all pictures of an original data set, the labels being the clockwise polygon vertex coordinates of each word-level text bounding box and the character sequence of each word, to obtain a labeled standard training data set;
(1.2) defining a scene text end-to-end recognition network model in any shape, calculating a training label according to the standard training data set with labels in the step (1.1), designing a loss function, and training the scene text end-to-end recognition network by using a reverse conduction method to obtain the scene text end-to-end recognition network model; the method comprises the following steps:
(1.2.1) constructing a scene text end-to-end identification network model in any shape, wherein the identification network model consists of a characteristic pyramid structure network, a region extraction network, a rapid region classification regression branch and a segmentation branch;
(1.2.2) generating a horizontal initial bounding box on an original image according to the feature map, and generating training labels for a region extraction network module, a fast region classification regression branch network module and a segmentation branch network module in the recognition network model;
(1.2.3) taking the standard training data set Itr as input to the recognition network model, and extracting features using the feature pyramid network module;
(1.2.4) inputting the features extracted by the feature pyramid network into the region extraction network, generating candidate text boxes on the feature maps through the anchor distribution, and adjusting them to fixed-scale candidate regions using the region-of-interest alignment method;
(1.2.5) inputting the candidate text boxes into the rapid regional classification regression network module, calculating the loss function through the two branches of classification and regression and back-propagating it, and finally generating the predicted text bounding boxes;
(1.2.6) inputting the candidate text boxes into the segmentation branch network module, and generating target segmentation layers based on instance segmentation and semantic segmentation;
(1.2.7) taking the training label gt as the expected output of the network and the predicted labels (Prpn, Yrpn, Prcnn, Yrcnn, X) as the network prediction output, and designing a target loss function between the expected output and the predicted output for the constructed network model;
(2) the character detection and recognition of the text picture of the scene to be detected and recognized by utilizing the trained model comprises the following substeps:
(2.1) inputting the extracted features of the scene text picture to be detected and recognized into the fast region classification regression branch network to generate candidate text regions, and filtering the candidate text regions with a non-maximum suppression operation to obtain more accurate candidate text regions;
(2.2) inputting the predicted candidate text regions into the segmentation branch network to perform text instance segmentation and character segmentation, respectively generating a global text instance segmentation mask and character segmentation masks, obtaining a polygonal word text region by computing the contour of the text region on the global text instance segmentation mask, and obtaining a character sequence by prediction with the pixel voting algorithm on the character segmentation masks;
and (2.3) processing the character sequence predicted by the segmentation branch through a weighted edit distance algorithm, finding the best matched word of the predicted sequence in the given dictionary, and obtaining the final recognition result.
2. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1, wherein the step (1.2.1) of detecting and recognizing the network model specifically comprises:
the recognition network model consists of a feature pyramid structure network, a region extraction network, a fast region classification regression branch network and a segmentation branch network; the feature pyramid structure network takes the ResNet-50 deep convolutional neural network as the base network and adds bottom-up connections, top-down connections and lateral connections, and is used to extract and fuse features of different resolutions from the input standard training dataset pictures; the extracted features of different scales are input into the region extraction network to obtain candidate text regions, which after region-of-interest alignment become candidate text regions of fixed scale and are input into the fast region classification regression branch network and the segmentation branch network respectively; the candidate text regions with resolution 7 × 7 extracted by the region extraction network are input into the fast region classification regression network, whose classification branch predicts the probability that an input candidate text region is a positive sample, providing more accurate candidate text regions, and whose regression branch computes the offset of the candidate text region relative to the real text region to adjust the position of the candidate text region; the segmentation branch network is composed of four convolutional layers Conv1, Conv2, Conv3, Conv4, a deconvolutional layer Deconv and a final convolutional layer Conv5; the candidate text regions with resolution 16 × 64 extracted by the region extraction network are input into the segmentation branch, and 38 target segmentation layers with resolution 32 × 128 are finally generated through the convolution and deconvolution operations; among these, 1 global text instance segmentation layer is used to predict the precise position of the text region, while the 36 character segmentation layers and 1 character background segmentation layer are used to obtain the predicted character sequence through the pixel voting algorithm. A sketch of this segmentation branch follows.
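A rough PyTorch sketch of the segmentation branch described in this claim; the channel widths are assumptions, while the layer sequence, the 16 × 64 input, the 32 × 128 output and the 38 maps come from the text:

import torch
import torch.nn as nn

class SegBranch(nn.Module):
    """Conv1-4, one 2x deconvolution, and Conv5 producing 38 maps
    (1 global text + 36 characters + 1 background)."""
    def __init__(self, in_ch=256, mid_ch=256):
        super().__init__()
        self.convs = nn.Sequential(                  # Conv1..Conv4
            *[nn.Sequential(nn.Conv2d(in_ch if i == 0 else mid_ch,
                                      mid_ch, 3, padding=1),
                            nn.ReLU(inplace=True)) for i in range(4)])
        self.deconv = nn.ConvTranspose2d(mid_ch, mid_ch, 2, stride=2)  # 2x up
        self.conv5 = nn.Conv2d(mid_ch, 38, 1)        # 38 target layers
    def forward(self, x):                            # x: (N, 256, 16, 64)
        return self.conv5(self.deconv(self.convs(x)))  # (N, 38, 32, 128)

masks = SegBranch()(torch.zeros(2, 256, 16, 64))
assert masks.shape == (2, 38, 32, 128)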
3. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (1.2.2) is specifically as follows:
for the labeled standard training dataset Itr, the ground-truth label of an input picture Itrk includes a polygon P = {p1, p2 … pm} representing a text region and a character label C = {c1 = (cc1, cl1), c2 = (cc2, cl2), …, cn = (ccn, cln)} indicating the category and position of each character, where Pi is the polygonal bounding box of a text region in the picture Itrk, pij = (xij, yij) is the coordinate of the j-th vertex of the polygon Pi, m denotes the number of polygonal text labels, and cck and clk are respectively the category and position of the k-th character in the text;
for the given standard training dataset Itr, the polygon P = {p1, p2 … pm} in the dataset label is first converted into the minimum horizontal rectangular bounding box of the polygonal text label box, denoted Gd(x, y, h, w) with rectangle center point (x, y), height h and width w; for the region extraction network, according to the standard training dataset labeled bounding boxes Gd(x, y, h, w), each pixel on each feature map output by the feature pyramid corresponds to a position on the original image and generates multiple initial bounding boxes; the Jaccard coefficient of each initial bounding box Q0 relative to the labeled bounding boxes Gd of the standard training dataset is computed: when the Jaccard coefficients of all labeled bounding boxes Gd with the initial bounding box Q0 are less than 0.5, the initial bounding box Q0 is labeled as negative-class non-text, with class label Prpn of value 0; otherwise, i.e. there is at least one labeled bounding box Gd whose Jaccard coefficient with Q0 is not less than 0.5, Q0 is labeled as positive-class text, with class label Prpn of value 1, and the position offset is calculated relative to the labeled box with the maximum Jaccard coefficient, according to the formulas:
x=x0+w0Δx
y=y0+h0Δy
w=w0exp(Δw)
h=h0exp(Δh)
where x0 and y0 are respectively the abscissa and ordinate of the center point of the initial bounding box Q0, w0 and h0 are respectively the width and height of the initial bounding box Q0, Δx and Δy are respectively the horizontal and vertical offsets of the center point of Q0 relative to the center point of Gd, and exp is the exponential operation; the training label of the region extraction network is obtained as follows (a sketch of this offset encoding appears after this claim):
gtrpn=(Δxrpn,Δyrpn,Δhrpn,Δwrpn,Prpn)
for the fast region classification regression branch network, similarly, the training labels can be calculated as follows:
gtrcnn=(Δxrcnn,Δyrcnn,Δhrcnn,Δwrcnn,Prcnn);
for the segmentation branch network, two types of target labels need to be generated: a global label for text instance segmentation and character labels for character semantic segmentation; for a given positive candidate text box r, the best matching horizontal rectangle is first obtained, from which the matching polygon and character boxes are further obtained, and the matching polygon and character boxes are then shifted and resized to align the candidate text box r with a target label of preset height H and width W, according to the following formulas:
Bx = (Bx0 − min(rx)) × W / (max(rx) − min(rx))
By = (By0 − min(ry)) × H / (max(ry) − min(ry))
wherein (rx, ry) are the vertices of the candidate text box r, and (Bx, By) and (Bx0, By0) are respectively the updated vertices and the original vertices of the polygon and of all character boxes; specifically, rx is the set of abscissas of all vertices of the candidate text box r, ry is the set of ordinates of all vertices of the candidate text box r, and Bx, Bx0, By, By0 are the corresponding updated and original vertex coordinate sets.
Then, the target global label Xg is generated by drawing the normalized polygon on a zero-initialized mask and filling its value with 1; for the character label Xc, taking each character's center as the origin, the normalized character box is shrunk to one eighth of the size of the original box so that character masks do not overlap each other, and the shrunk character boxes are drawn on a zero-initialized mask and filled with their corresponding category indices; if C does not exist, all pixels in the character layers are set to −1 and ignored during optimization; finally the overall segmentation branch label gtmask = X is obtained, and combining the above labels gtrpn, gtrcnn, gtmask, the final training label is generated as:
gt={Δxrpn,Δyrpn,Δhrpn,Δwrpn,Prpn,Δxrcnn,Δyrcnn,Δhrcnn,Δwrcnn,Prcnn,X}.
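The four box-offset formulas in this claim (x = x0 + w0Δx, etc.) amount to the following encoding/decoding pair; a minimal NumPy sketch with illustrative function names:

import numpy as np

def decode_box(anchor, deltas):
    """anchor = (x0, y0, w0, h0), deltas = (dx, dy, dw, dh);
    returns the decoded box (x, y, w, h) per the claim's formulas."""
    x0, y0, w0, h0 = anchor
    dx, dy, dw, dh = deltas
    return (x0 + w0 * dx, y0 + h0 * dy, w0 * np.exp(dw), h0 * np.exp(dh))

def encode_box(anchor, gt):
    """Inverse mapping used to build the training labels gt_rpn / gt_rcnn."""
    x0, y0, w0, h0 = anchor
    x, y, w, h = gt
    return ((x - x0) / w0, (y - y0) / h0, np.log(w / w0), np.log(h / h0))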
4. the method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (1.2.3) is specifically as follows:
pictures of the standard training dataset Itr are input into the bottom-up ResNet-50 network structure of the feature pyramid network; the convolutional layer units that do not change the size of the feature map are defined as stages (stages {P2, P3, P4, P5, P6}), and the final output convolutional features F of each stage are extracted; the top-down connections of the feature pyramid network module upsample the output convolutional features of ResNet-50 to generate multi-scale upsampled features, and the lateral connection structure of the feature pyramid network module fuses the features of each stage upsampled in the top-down process with the features generated in the bottom-up process, producing the final features {F2, F3, F4, F5, F6}. A sketch of this fusion follows.
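A rough PyTorch sketch of the top-down and lateral fusion just described; the channel widths are the usual ResNet-50 stage widths and an assumption here, as are nearest-neighbor upsampling and exact factor-2 stage strides:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    """Fuse ResNet-50 stage outputs C2..C5 into pyramid features P2..P6."""
    def __init__(self, in_chs=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_chs)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                    for _ in in_chs)
    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2)  # top-down
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2)
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2)
        p2, p3, p4, p5 = (s(p) for s, p in zip(self.smooth, (p2, p3, p4, p5)))
        p6 = F.max_pool2d(p5, 1, stride=2)   # extra coarse level
        return p2, p3, p4, p5, p6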
5. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (1.2.4) is specifically as follows:
for an input picture Itrk, 5 stage features {F2, F3, F4, F5, F6} are extracted through the feature pyramid network, and according to the stages {P2, P3, P4, P5, P6} the anchor feature scales of the different stages are defined as {32², 64², 128², 256², 512²}, with each scale layer having 3 aspect ratios {1:2, 1:1, 2:1}; 15 feature maps of different scales and ratios {Ftr1, Ftr2, …, Ftr15} can thus be extracted, denoted Ftrp with subscript p = 1, …, 15;
through the region-of-interest alignment operation, candidate text regions of fixed scale are generated on the features Ftrp: candidate text regions Rrcnn with resolution 7 × 7 are generated for the fast region classification regression network, and candidate text regions Rmask with resolution 16 × 64 are generated for the segmentation branch; the classification branch predicts the probability Prpn that each candidate text box is a correct text region bounding box, and the regression branch predicts the candidate text box offsets Yrpn = (Δxrpn, Δyrpn, Δhrpn, Δwrpn). A sketch of the scale/ratio combinations follows.
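The 15 scale/ratio combinations can be enumerated as follows; treating the scale as the square root of the anchor area and the ratio as h:w is an assumption of this sketch:

import numpy as np

SCALES = [32, 64, 128, 256, 512]   # anchor areas are s^2, one per stage
RATIOS = [0.5, 1.0, 2.0]           # aspect ratios 1:2, 1:1, 2:1 (h:w assumed)

def anchor_shapes():
    """Enumerate the 15 (h, w) anchor shapes, keeping the area s^2 fixed."""
    shapes = []
    for s in SCALES:
        for r in RATIOS:
            w = s / np.sqrt(r)     # solve w*h = s^2 with h/w = r
            shapes.append((w * r, w))
    return shapes

print(len(anchor_shapes()))        # 15 scale/ratio combinations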
6. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (1.2.5) is specifically as follows:
the fast region classification regression network is divided into two network branches, classification and regression; the candidate text region Rrcnn of size 7 × 7 is input into the classification branch, which through convolution operations outputs the classification score Prcnn of the predicted bounding box, i.e. the probability that the bounding box is predicted as a positive-class text box, a decimal value between [0, 1]; Rrcnn is input into the regression branch, which outputs the predicted regression offset Yrcnn = (Δxrcnn, Δyrcnn, Δhrcnn, Δwrcnn) consisting of 4 decimals between [0, 1], being the position offsets of the center-point abscissa and ordinate and of the height and width of the predicted bounding box Gq, when predicted as a positive-class text box, relative to the center-point abscissa and ordinate and the height and width of the labeled bounding box Gd. A sketch of these two heads follows.
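A rough PyTorch sketch of the two branches; the fully-connected layout and hidden width are assumptions, and the sigmoid on the regression output only mirrors the claim's statement that the four offsets are decimals in [0, 1]:

import torch
import torch.nn as nn

class FastHead(nn.Module):
    """Classification/regression heads fed by the 7x7 aligned regions."""
    def __init__(self, in_ch=256, hidden=1024):
        super().__init__()
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(in_ch * 7 * 7, hidden), nn.ReLU())
        self.cls = nn.Linear(hidden, 1)   # Prcnn: positive-text probability
        self.reg = nn.Linear(hidden, 4)   # Yrcnn: (dx, dy, dh, dw)
    def forward(self, rois):              # rois: (N, 256, 7, 7)
        h = self.fc(rois)
        return torch.sigmoid(self.cls(h)), torch.sigmoid(self.reg(h))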
7. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (1.2.6) is specifically as follows:
the segmentation branch network module comprises 4 convolutional layers Conv1, Conv2, Conv3, Conv4, a deconvolutional layer Deconv, and a final convolutional layer Conv5; the candidate text box Rmask of size 16 × 64 generated by the region extraction network is input into the segmentation branch module, and 38 target segmentation layers {Mglobal, M1, M2, …, M36, Mbackground} of scale 32 × 128 are finally generated through the convolution, deconvolution and related operations; the value X of each pixel in the output layers is a number between [0, 1]; from the global segmentation layer Mglobal, the text region polygon Pm = {pm1, pm2 … pmn} can be directly predicted, and from the character segmentation layers {M1, M2, …, M36} and the character background segmentation layer Mbackground, the character sequence Sq can be predicted according to the pixel voting algorithm.
8. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (1.2.7) is specifically as follows:
taking the training label gt calculated in step (1.2.2) as the expected output of the network, and taking the predicted labels of steps (1.2.4), (1.2.5) and (1.2.6), namely (Prpn, Yrpn), (Prcnn, Yrcnn) and X, as the network prediction output, a target loss function between the expected output and the predicted output is designed for the network model constructed in (1.2.1); the overall target loss function is composed of the loss functions of the region extraction network, the fast region classification regression branch network and the segmentation branch network, and its expression is as follows:
L(Prpn,Yrpn,Prcnn,Yrcnn,X)=Lrpn(Prpn,Yrpn)+α1Lrcnn(Prcnn,Yrcnn)+α2Lmask(X)
where Lrpn(Prpn, Yrpn) is the loss function of the region extraction network, Lrcnn(Prcnn, Yrcnn) is the loss function of the fast region classification regression branch network, Lmask(X) is the loss function of the segmentation branch network, and α1 and α2 are the weight coefficients of the loss functions Lrcnn and Lmask respectively, both simply set to 1;
according to the designed overall target loss function, the model is iteratively trained with the back-propagation algorithm to minimize the overall target loss function and obtain the optimal network model; for the scene character detection and recognition task, the training process first performs iterative training on a synthetic text dataset to obtain initial network parameters, and then trains on a real dataset to fine-tune the network parameters. A sketch of the combined objective follows.
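The combined objective of this claim amounts to the following weighted sum (a sketch; the individual loss terms and the optimizer are assumed to be provided by the surrounding training loop):

def total_loss(L_rpn, L_rcnn, L_mask, alpha1=1.0, alpha2=1.0):
    # L = Lrpn + α1·Lrcnn + α2·Lmask, with both weights simply set to 1
    return L_rpn + alpha1 * L_rcnn + alpha2 * L_mask

# One training iteration under this objective (assumed PyTorch-style tensors):
# loss = total_loss(L_rpn, L_rcnn, L_mask)
# optimizer.zero_grad(); loss.backward(); optimizer.step()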
9. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (2.1) is specifically as follows:
the i-th picture Itstk of the dataset Itst to be detected is input into the model trained in step (1.2); after the model passes through the feature pyramid network and the region extraction network, the generated initial bounding boxes are input into the fast region classification regression branch network; for each initial bounding box Gq, the classification branch outputs a predicted classification score Prcnn as the score with which the initial bounding box Gq is predicted as a positive-class sample; the regression branch outputs a predicted regression offset Yrcnn = (Δxrcnn, Δyrcnn, Δhrcnn, Δwrcnn) consisting of 4 decimals, being the position offsets of the center-point abscissa and ordinate and of the height and width of Gq, when predicted as a positive-class text box, relative to the center-point abscissa and ordinate and the height and width of the labeled bounding box Gd; from these position offsets, the position Qz of the quadrilateral text bounding box predicted by the network can be calculated;
For the predicted text bounding boxes Qz, a non-maximum suppression operation is carried out for filtering to obtain the output result: the network model regresses every initial bounding box Q0 predicted as positive-class text on the feature maps Ftstp to a horizontal quadrilateral position, and for the same test picture Itstk the positive-class text quadrilaterals regressed on the different feature maps usually overlap each other, so a non-maximum suppression operation needs to be performed on the positions of all positive-class text quadrilaterals, with the following specific steps: 1) a predicted text bounding box is retained as a detected text box if and only if its text classification score Prcnn is greater than or equal to 0.5; 2) the non-maximum suppression operation is performed on the text boxes retained in the previous step with a Jaccard coefficient threshold of 0.2, yielding the finally retained quadrilateral bounding boxes of positive-class text.
10. The method for recognizing the scene text in the arbitrary shape end-to-end as claimed in claim 1 or 2, wherein the step (2.2) is specifically as follows:
the predicted quadrilateral text bounding box position Qz is input into the segmentation branch to generate 38 target segmentation layers; first, the contour of the text region is computed directly from the global text instance segmentation mask to obtain the polygon of the text region; second, the character sequence Sq is generated using the pixel voting algorithm;
For the 36 character segmentation layers {M1, M2, …, M36}, the value pci(x, y) of a pixel in the i-th segmentation layer represents the probability that the pixel pg(x, y) at the corresponding position of the global text segmentation layer is the character zi, where zi is the i-th of the 36 characters {0, 1, …, 9, a, b, …, z}; the probabilities at corresponding pixel positions across the 36 character segmentation layers sum to 1, namely
Σ(i=1..36) pci(x, y) = 1
For the character background segmentation layer Mbackground, the layer is first binarized; then, on the binarized background map, the set of character regions on the background layer is defined as R = {r1, r2, …, rn}, where ri is the i-th character region on the character background segmentation layer and n is the number of all characters on the background segmentation layer;
the pixel voting algorithm proceeds as follows: first, for the character region ri of the character background segmentation layer, the set of its corresponding connected regions in the 36 character segmentation layers is defined as Ci = {ci1, ci2, …, ci36}, where cij is the region block in the j-th character segmentation layer corresponding to the i-th character region of the character background segmentation layer; then, for the region ri and the corresponding connected regions Ci, the predicted character is obtained by the pixel voting algorithm in these steps: first, the mean of the values of all pixels in each cij of the connected regions Ci is computed; second, the cij_max with the largest mean is found, and the character class zj_max corresponding to its character layer Mj_max is taken as the predicted character of this character region; finally, performing this operation on every character region ri of the character background segmentation layer yields the final predicted character sequence Sq.
CN201810294058.XA 2018-04-04 2018-04-04 End-to-end identification method for scene text with any shape Active CN108549893B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810294058.XA CN108549893B (en) 2018-04-04 2018-04-04 End-to-end identification method for scene text with any shape
PCT/CN2019/080354 WO2019192397A1 (en) 2018-04-04 2019-03-29 End-to-end recognition method for scene text in any shape

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810294058.XA CN108549893B (en) 2018-04-04 2018-04-04 End-to-end identification method for scene text with any shape

Publications (2)

Publication Number Publication Date
CN108549893A CN108549893A (en) 2018-09-18
CN108549893B true CN108549893B (en) 2020-03-31

Family

ID=63514169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810294058.XA Active CN108549893B (en) 2018-04-04 2018-04-04 End-to-end identification method for scene text with any shape

Country Status (2)

Country Link
CN (1) CN108549893B (en)
WO (1) WO2019192397A1 (en)

Families Citing this family (329)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549893B (en) * 2018-04-04 2020-03-31 华中科技大学 End-to-end identification method for scene text with any shape
CN109492672A (en) * 2018-10-17 2019-03-19 福州大学 Under a kind of natural scene quickly, the positioning of the bank card of robust and classification method
CN109583449A (en) * 2018-10-29 2019-04-05 深圳市华尊科技股份有限公司 Character identifying method and Related product
CN109299274B (en) * 2018-11-07 2021-12-17 南京大学 Natural scene text detection method based on full convolution neural network
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment
CN112789623A (en) * 2018-11-16 2021-05-11 北京比特大陆科技有限公司 Text detection method, device and storage medium
CN109559300A (en) * 2018-11-19 2019-04-02 上海商汤智能科技有限公司 Image processing method, electronic equipment and computer readable storage medium
CN109753956A (en) * 2018-11-23 2019-05-14 西北工业大学 The multi-direction text detection algorithm extracted based on dividing candidate area
CN109544564A (en) * 2018-11-23 2019-03-29 清华大学深圳研究生院 A kind of medical image segmentation method
CN109785359B (en) * 2018-11-27 2020-12-04 北京理工大学 Video target detection method based on depth feature pyramid and tracking loss
EP3660731B1 (en) * 2018-11-28 2024-05-22 Tata Consultancy Services Limited Digitization of industrial inspection sheets by inferring visual relations
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN111292334B (en) * 2018-12-10 2023-06-09 北京地平线机器人技术研发有限公司 Panoramic image segmentation method and device and electronic equipment
CN109753966A (en) * 2018-12-16 2019-05-14 初速度(苏州)科技有限公司 A kind of Text region training system and method
CN109740484A (en) * 2018-12-27 2019-05-10 斑马网络技术有限公司 The method, apparatus and system of road barrier identification
CN110008808B (en) * 2018-12-29 2021-04-09 北京迈格威科技有限公司 Panorama segmentation method, device and system and storage medium
CN109886286B (en) * 2019-01-03 2021-07-23 武汉精测电子集团股份有限公司 Target detection method based on cascade detector, target detection model and system
CN111489283B (en) * 2019-01-25 2023-08-11 鸿富锦精密工业(武汉)有限公司 Picture format conversion method and device and computer storage medium
CN109858432B (en) * 2019-01-28 2022-01-04 北京市商汤科技开发有限公司 Method and device for detecting character information in image and computer equipment
CN109829437B (en) * 2019-02-01 2022-03-25 北京旷视科技有限公司 Image processing method, text recognition device and electronic system
CN109977997B (en) * 2019-02-13 2021-02-02 中国科学院自动化研究所 Image target detection and segmentation method based on convolutional neural network rapid robustness
CN110176017A (en) * 2019-03-01 2019-08-27 北京纵目安驰智能科技有限公司 A kind of Model for Edge Detection based on target detection, method and storage medium
CN110008950A (en) * 2019-03-13 2019-07-12 南京大学 The method of text detection in the natural scene of a kind of pair of shape robust
CN109948510B (en) * 2019-03-14 2021-06-11 北京易道博识科技有限公司 Document image instance segmentation method and device
CN109919239A (en) * 2019-03-15 2019-06-21 尹显东 A kind of diseases and pests of agronomic crop intelligent detecting method based on deep learning
CN109948533B (en) * 2019-03-19 2021-02-09 讯飞智元信息科技有限公司 Text detection method, device and equipment and readable storage medium
CN109977949B (en) * 2019-03-20 2024-01-26 深圳华付技术股份有限公司 Frame fine adjustment text positioning method and device, computer equipment and storage medium
CN111723627A (en) * 2019-03-22 2020-09-29 北京搜狗科技发展有限公司 Image processing method and device and electronic equipment
CN111753575A (en) * 2019-03-26 2020-10-09 杭州海康威视数字技术股份有限公司 Text recognition method, device and equipment
CN109977952B (en) * 2019-03-27 2021-10-22 深动科技(北京)有限公司 Candidate target detection method based on local maximum
CN109934229B (en) * 2019-03-28 2021-08-03 网易有道信息技术(北京)有限公司 Image processing method, device, medium and computing equipment
CN110135248A (en) * 2019-04-03 2019-08-16 华南理工大学 A kind of natural scene Method for text detection based on deep learning
CN110147786B (en) 2019-04-11 2021-06-29 北京百度网讯科技有限公司 Method, apparatus, device, and medium for detecting text region in image
CN110032969B (en) * 2019-04-11 2021-11-05 北京百度网讯科技有限公司 Method, apparatus, device, and medium for detecting text region in image
CN110059753A (en) * 2019-04-19 2019-07-26 北京朗镜科技有限责任公司 Model training method, interlayer are every recognition methods, device, equipment and medium
CN110321923B (en) * 2019-05-10 2021-05-04 上海大学 Target detection method, system and medium for fusion of different-scale receptive field characteristic layers
CN112001406B (en) * 2019-05-27 2023-09-08 杭州海康威视数字技术股份有限公司 Text region detection method and device
CN110147788B (en) * 2019-05-27 2021-09-21 东北大学 Feature enhancement CRNN-based metal plate strip product label character recognition method
CN110276279B (en) * 2019-06-06 2020-06-16 华东师范大学 Method for detecting arbitrary-shape scene text based on image segmentation
CN110348445B (en) * 2019-06-06 2021-07-27 华中科技大学 Instance segmentation method fusing void convolution and edge information
CN110334705B (en) * 2019-06-25 2021-08-03 华中科技大学 Language identification method of scene text image combining global and local information
CN110263877B (en) * 2019-06-27 2022-07-08 中国科学技术大学 Scene character detection method
CN110276351B (en) * 2019-06-28 2022-09-06 中国科学技术大学 Multi-language scene text detection and identification method
CN110287960B (en) * 2019-07-02 2021-12-10 中国科学院信息工程研究所 Method for detecting and identifying curve characters in natural scene image
CN110443140B (en) * 2019-07-05 2023-10-03 平安科技(深圳)有限公司 Text positioning method, device, computer equipment and storage medium
CN110443258B (en) * 2019-07-08 2021-03-02 北京三快在线科技有限公司 Character detection method and device, electronic equipment and storage medium
CN110443141A (en) * 2019-07-08 2019-11-12 深圳中兴网信科技有限公司 Data set processing method, data set processing unit and storage medium
CN110503090B (en) * 2019-07-09 2021-11-09 中国科学院信息工程研究所 Character detection network training method based on limited attention model, character detection method and character detector
CN110363140B (en) * 2019-07-15 2022-11-11 成都理工大学 Human body action real-time identification method based on infrared image
CN110490191B (en) * 2019-07-16 2022-03-04 北京百度网讯科技有限公司 Training method and system of end-to-end model, and Chinese recognition method and system
CN112241736B (en) * 2019-07-19 2024-01-26 上海高德威智能交通系统有限公司 Text detection method and device
CN110427852B (en) * 2019-07-24 2022-04-15 北京旷视科技有限公司 Character recognition method and device, computer equipment and storage medium
CN113159016A (en) * 2019-07-26 2021-07-23 第四范式(北京)技术有限公司 Text position positioning method and system and model training method and system
CN110895695B (en) * 2019-07-31 2023-02-24 上海海事大学 Deep learning network for character segmentation of text picture and segmentation method
CN110503085A (en) * 2019-07-31 2019-11-26 联想(北京)有限公司 A kind of data processing method, electronic equipment and computer readable storage medium
CN110674807A (en) * 2019-08-06 2020-01-10 中国科学院信息工程研究所 Curved scene character detection method based on semi-supervised and weakly supervised learning
CN110458132A (en) * 2019-08-19 2019-11-15 河海大学常州校区 One kind is based on random length text recognition method end to end
CN110516732B (en) * 2019-08-22 2022-03-15 北京地平线机器人技术研发有限公司 Training method of feature pyramid network, and method and device for extracting image features
CN110852324A (en) * 2019-08-23 2020-02-28 上海撬动网络科技有限公司 Deep neural network-based container number detection method
CN110598698B (en) * 2019-08-29 2022-02-15 华中科技大学 Natural scene text detection method and system based on adaptive regional suggestion network
CN110533113B (en) * 2019-09-04 2022-11-11 湖南大学 Method for detecting branch points of tree structure in digital image
CN110533041B (en) * 2019-09-05 2022-07-01 重庆邮电大学 Regression-based multi-scale scene text detection method
CN110738207B (en) * 2019-09-10 2020-06-19 西南交通大学 Character detection method for fusing character area edge information in character image
CN110705535A (en) * 2019-09-19 2020-01-17 安徽七天教育科技有限公司 Method for automatically detecting test paper layout character line
CN110807764A (en) * 2019-09-20 2020-02-18 成都智能迭迦科技合伙企业(有限合伙) Lung cancer screening method based on neural network
CN110751154B (en) * 2019-09-27 2022-04-08 西北工业大学 Complex environment multi-shape text detection method based on pixel-level segmentation
CN110717427B (en) * 2019-09-27 2022-08-12 华中科技大学 Multi-direction object detection method based on vertex sliding
CN110689012A (en) * 2019-10-08 2020-01-14 山东浪潮人工智能研究院有限公司 End-to-end natural scene text recognition method and system
CN111626279B (en) * 2019-10-15 2023-06-02 西安网算数据科技有限公司 Negative sample labeling training method and highly-automatic bill identification method
CN111126401B (en) * 2019-10-17 2023-06-02 安徽清新互联信息科技有限公司 License plate character recognition method based on context information
CN111062381B (en) * 2019-10-17 2023-09-01 安徽清新互联信息科技有限公司 License plate position detection method based on deep learning
CN110766707B (en) * 2019-10-22 2022-09-23 河海大学常州校区 Cavitation bubble image processing method based on multi-operator fusion edge detection technology
CN111222396B (en) * 2019-10-23 2023-07-18 江苏大学 All-weather multispectral pedestrian detection method
CN110765733A (en) * 2019-10-24 2020-02-07 科大讯飞股份有限公司 Text normalization method, device, equipment and storage medium
CN110781967B (en) * 2019-10-29 2022-08-19 华中科技大学 Real-time text detection method based on differentiable binarization
CN110837835B (en) * 2019-10-29 2022-11-08 华中科技大学 End-to-end scene text identification method based on boundary point detection
CN110807422B (en) * 2019-10-31 2023-05-23 华南理工大学 Natural scene text detection method based on deep learning
CN110796143A (en) * 2019-10-31 2020-02-14 天津大学 Scene text recognition method based on man-machine cooperation
CN112749704A (en) * 2019-10-31 2021-05-04 北京金山云网络技术有限公司 Text region detection method and device and server
CN110956088B (en) * 2019-10-31 2023-06-30 北京易道博识科技有限公司 Overlapped text line positioning and segmentation method and system based on deep learning
CN112749599A (en) * 2019-10-31 2021-05-04 北京金山云网络技术有限公司 Image enhancement method and device and server
CN110837796B (en) * 2019-11-05 2022-08-19 泰康保险集团股份有限公司 Image processing method and device
CN111104962B (en) * 2019-11-05 2023-04-18 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN112825141B (en) * 2019-11-21 2023-02-17 上海高德威智能交通系统有限公司 Method and device for recognizing text, recognition equipment and storage medium
CN111010605B (en) * 2019-11-26 2021-08-17 杭州东信北邮信息技术有限公司 Method for displaying video picture-in-picture window
CN111062386B (en) * 2019-11-28 2023-12-29 大连交通大学 Natural scene text detection method based on depth pyramid attention and feature fusion
CN110969129B (en) * 2019-12-03 2023-09-01 山东浪潮科学研究院有限公司 End-to-end tax bill text detection and recognition method
CN110929678B (en) * 2019-12-04 2023-04-25 山东省计算中心(国家超级计算济南中心) Method for detecting vulvovaginal candida spores
CN111008600B (en) * 2019-12-06 2023-04-07 中国科学技术大学 Lane line detection method
CN111178148B (en) * 2019-12-06 2023-06-02 天津大学 Ground target geographic coordinate positioning method based on unmanned aerial vehicle vision system
CN111061904B (en) * 2019-12-06 2023-04-18 武汉理工大学 Local picture rapid detection method based on image content identification
CN110991440B (en) * 2019-12-11 2023-10-13 易诚高科(大连)科技有限公司 Pixel-driven mobile phone operation interface text detection method
CN112990188A (en) * 2019-12-13 2021-06-18 华为技术有限公司 Text recognition method and device
CN111104892A (en) * 2019-12-16 2020-05-05 武汉大千信息技术有限公司 Human face tampering identification method based on target detection, model and identification method thereof
CN111061915B (en) * 2019-12-17 2023-04-18 中国科学技术大学 Video character relation identification method
CN111079649B (en) * 2019-12-17 2023-04-07 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
CN110991403A (en) * 2019-12-19 2020-04-10 同方知网(北京)技术有限公司 Document information fragmentation extraction method based on visual deep learning
CN111126386B (en) * 2019-12-20 2023-06-30 复旦大学 Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN111144469B (en) * 2019-12-20 2023-05-02 复旦大学 End-to-end multi-sequence text recognition method based on multi-dimensional associated time sequence classification neural network
CN111008613B (en) * 2019-12-24 2023-12-19 黑龙江文旅信息科技有限公司 High-density traffic positioning and monitoring method based on field
CN111126266B (en) * 2019-12-24 2023-05-05 上海智臻智能网络科技股份有限公司 Text processing method, text processing system, equipment and medium
CN111046840B (en) * 2019-12-26 2023-06-23 天津理工大学 Personnel safety monitoring method and system based on artificial intelligence in pollution remediation environment
CN111144411B (en) * 2019-12-27 2024-02-27 南京大学 Irregular text correction and identification method and system based on saliency map
CN111160242A (en) * 2019-12-27 2020-05-15 上海眼控科技股份有限公司 Image target detection method, system, electronic terminal and storage medium
CN111160352B (en) * 2019-12-27 2023-04-07 创新奇智(北京)科技有限公司 Workpiece metal surface character recognition method and system based on image segmentation
CN111160372B (en) * 2019-12-30 2023-04-18 沈阳理工大学 Large target identification method based on high-speed convolutional neural network
CN111126410B (en) * 2019-12-31 2022-11-18 讯飞智元信息科技有限公司 Character recognition method, device, equipment and readable storage medium
CN111178358A (en) * 2019-12-31 2020-05-19 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN111178364A (en) * 2019-12-31 2020-05-19 北京奇艺世纪科技有限公司 Image identification method and device
CN111145202B (en) * 2019-12-31 2024-03-08 北京奇艺世纪科技有限公司 Model generation method, image processing method, device, equipment and storage medium
CN111191611B (en) * 2019-12-31 2023-10-13 同济大学 Traffic sign label identification method based on deep learning
CN111242122B (en) * 2020-01-07 2023-09-08 浙江大学 Lightweight deep neural network rotating target detection method and system
CN111242027B (en) * 2020-01-13 2023-04-14 北京工业大学 Unsupervised learning scene feature rapid extraction method fusing semantic information
CN111310746B (en) * 2020-01-15 2024-03-01 支付宝实验室(新加坡)有限公司 Text line detection method, model training method, device, server and medium
CN111291759A (en) * 2020-01-17 2020-06-16 北京三快在线科技有限公司 Character detection method and device, electronic equipment and storage medium
CN111310609B (en) * 2020-01-22 2023-04-07 西安电子科技大学 Video target detection method based on time sequence information and local feature similarity
CN111428749A (en) * 2020-02-21 2020-07-17 平安科技(深圳)有限公司 Image annotation task pre-verification method, device, equipment and storage medium
CN111340784B (en) * 2020-02-25 2023-06-23 安徽大学 Mask R-CNN-based image tampering detection method
CN113324864B (en) * 2020-02-28 2022-09-20 南京理工大学 Pantograph carbon slide plate abrasion detection method based on deep learning target detection
CN111461114B (en) * 2020-03-03 2023-05-02 华南理工大学 Multi-scale feature pyramid text detection method based on segmentation
CN111368831B (en) * 2020-03-03 2023-05-23 开放智能机器(上海)有限公司 Positioning system and method for vertical text
CN111353458B (en) * 2020-03-10 2023-08-18 腾讯科技(深圳)有限公司 Text box labeling method, device and storage medium
CN113496223A (en) * 2020-03-19 2021-10-12 顺丰科技有限公司 Method and device for establishing text region detection model
CN111553361B (en) * 2020-03-19 2022-11-01 四川大学华西医院 Pathological section label identification method
CN111414855B (en) * 2020-03-19 2023-03-24 国网陕西省电力公司电力科学研究院 Telegraph pole sign target detection and identification method based on end-to-end regression model
CN111310861B (en) * 2020-03-27 2023-05-23 西安电子科技大学 License plate recognition and positioning method based on deep neural network
CN113449760A (en) * 2020-03-27 2021-09-28 北京沃东天骏信息技术有限公司 Character recognition method and device
CN111476302B (en) * 2020-04-08 2023-03-24 北京工商大学 fast-RCNN target object detection method based on deep reinforcement learning
CN113516673B (en) * 2020-04-10 2022-12-02 阿里巴巴集团控股有限公司 Image detection method, device, equipment and storage medium
CN111488883A (en) * 2020-04-14 2020-08-04 上海眼控科技股份有限公司 Vehicle frame number identification method and device, computer equipment and storage medium
CN111444919B (en) * 2020-04-17 2023-07-04 南京大学 Method for detecting text with arbitrary shape in natural scene
CN111461133B (en) * 2020-04-20 2023-04-18 上海东普信息科技有限公司 Express delivery surface single item name identification method, device, equipment and storage medium
CN111461101B (en) * 2020-04-20 2023-05-19 上海东普信息科技有限公司 Method, device, equipment and storage medium for identifying work clothes mark
CN111507333B (en) * 2020-04-21 2023-09-15 腾讯科技(深圳)有限公司 Image correction method and device, electronic equipment and storage medium
CN111582329B (en) * 2020-04-22 2023-03-28 西安交通大学 Natural scene text character detection and labeling method based on multi-example learning
CN111553345B (en) * 2020-04-22 2023-10-20 上海浩方信息技术有限公司 Method for realizing meter pointer reading identification processing based on Mask RCNN and orthogonal linear regression
CN111507292B (en) * 2020-04-22 2023-05-12 广东光大信息科技股份有限公司 Handwriting board correction method, handwriting board correction device, computer equipment and storage medium
CN111553351A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Semantic segmentation based text detection method for arbitrary scene shape
CN111563502B (en) * 2020-05-09 2023-12-15 腾讯科技(深圳)有限公司 Image text recognition method and device, electronic equipment and computer storage medium
CN111723841A (en) * 2020-05-09 2020-09-29 北京捷通华声科技股份有限公司 Text detection method and device, electronic equipment and storage medium
CN111640089B (en) * 2020-05-09 2023-08-15 武汉精立电子技术有限公司 Defect detection method and device based on feature map center point
CN111597945B (en) * 2020-05-11 2023-08-18 济南博观智能科技有限公司 Target detection method, device, equipment and medium
CN111524135B (en) * 2020-05-11 2023-12-26 安徽继远软件有限公司 Method and system for detecting defects of tiny hardware fittings of power transmission line based on image enhancement
CN111753653B (en) * 2020-05-15 2024-05-03 中铁第一勘察设计院集团有限公司 High-speed rail contact net fastener identification and positioning method based on attention mechanism
CN111553355B (en) * 2020-05-18 2023-07-28 城云科技(中国)有限公司 Monitoring video-based method for detecting and notifying store outgoing business and managing store owner
CN111753828B (en) * 2020-05-19 2022-12-27 重庆邮电大学 Natural scene horizontal character detection method based on deep convolutional neural network
CN111783523B (en) * 2020-05-19 2022-10-21 中国人民解放军93114部队 Remote sensing image rotating target detection method
CN112001878A (en) * 2020-05-21 2020-11-27 合肥合工安驰智能科技有限公司 Deep learning ore scale measuring method based on binarization neural network and application system
CN111612081B (en) * 2020-05-25 2024-04-02 深圳前海微众银行股份有限公司 Training method, device, equipment and storage medium for recognition model
CN111667469B (en) * 2020-06-03 2023-10-31 北京小白世纪网络科技有限公司 Lung disease classification method, device and equipment
CN111932583A (en) * 2020-06-05 2020-11-13 西安羚控电子科技有限公司 Space-time information integrated intelligent tracking method based on complex background
CN111709987B (en) * 2020-06-11 2023-04-07 上海东普信息科技有限公司 Package volume measuring method, device, equipment and storage medium
CN111860479B (en) * 2020-06-16 2024-03-26 北京百度网讯科技有限公司 Optical character recognition method, device, electronic equipment and storage medium
CN111783572B (en) * 2020-06-17 2023-11-14 泰康保险集团股份有限公司 Text detection method and device
CN111753714B (en) * 2020-06-23 2023-09-01 中南大学 Multidirectional natural scene text detection method based on character segmentation
CN111915628B (en) * 2020-06-24 2023-11-24 浙江大学 Single-stage instance segmentation method based on prediction target dense boundary points
CN111898597A (en) * 2020-06-24 2020-11-06 泰康保险集团股份有限公司 Method, device, equipment and computer readable medium for processing text image
CN111985525B (en) * 2020-06-30 2023-09-22 上海海事大学 Text recognition method based on multi-mode information fusion processing
CN111950353B (en) * 2020-06-30 2024-04-19 深圳市雄帝科技股份有限公司 Seal text recognition method and device and electronic equipment
CN111783427B (en) * 2020-06-30 2024-04-02 北京百度网讯科技有限公司 Method, device, equipment and storage medium for training model and outputting information
CN111798516B (en) * 2020-07-01 2023-12-22 广东省特种设备检测研究院珠海检测院 Method for detecting running state quantity and analyzing errors of bridge crane equipment
CN111783763A (en) * 2020-07-07 2020-10-16 厦门商集网络科技有限责任公司 Text positioning box correction method and system based on convolutional neural network
CN111931572B (en) * 2020-07-07 2024-01-09 广东工业大学 Target detection method for remote sensing image
CN111783705B (en) * 2020-07-08 2023-11-14 厦门商集网络科技有限责任公司 Character recognition method and system based on attention mechanism
CN111860264B (en) * 2020-07-10 2024-01-05 武汉理工大学 Multi-task instance-level road scene understanding algorithm based on gradient equalization strategy
CN111814705B (en) * 2020-07-14 2022-08-02 广西师范大学 Pedestrian re-identification method based on batch blocking shielding network
CN111798480A (en) * 2020-07-23 2020-10-20 北京思图场景数据科技服务有限公司 Character detection method and device based on single character and character connection relation prediction
CN111860506B (en) * 2020-07-24 2024-03-29 北京百度网讯科技有限公司 Method and device for recognizing characters
CN111914727B (en) * 2020-07-28 2024-04-26 联芯智能(南京)科技有限公司 Small target human body detection method based on balance sampling and nonlinear feature fusion
CN111898610B (en) * 2020-07-29 2024-04-19 平安科技(深圳)有限公司 Card unfilled corner detection method, device, computer equipment and storage medium
CN111753812A (en) * 2020-07-30 2020-10-09 上海眼控科技股份有限公司 Text recognition method and equipment
CN112016403B (en) * 2020-08-05 2023-07-21 中山大学 Video abnormal event detection method
CN111930622B (en) * 2020-08-10 2023-10-13 中国工商银行股份有限公司 Interface control testing method and system based on deep learning
CN112069907A (en) * 2020-08-11 2020-12-11 盛视科技股份有限公司 X-ray machine image recognition method, device and system based on example segmentation
CN112069910B (en) * 2020-08-11 2024-03-01 上海海事大学 Multi-directional ship target detection method for remote sensing image
CN112200181B (en) * 2020-08-19 2023-10-10 西安理工大学 Character shape approximation method based on particle swarm optimization algorithm
CN112102250B (en) * 2020-08-20 2022-11-04 西北大学 Method for establishing and detecting pathological image detection model with training data as missing label
CN112926372B (en) * 2020-08-22 2023-03-10 清华大学 Scene character detection method and system based on sequence deformation
CN112070082B (en) * 2020-08-24 2023-04-07 西安理工大学 Curve character positioning method based on instance perception component merging network
CN111985439A (en) * 2020-08-31 2020-11-24 中移(杭州)信息技术有限公司 Face detection method, device, equipment and storage medium
CN112036405A (en) * 2020-08-31 2020-12-04 浪潮云信息技术股份公司 Detection and identification method for handwritten document text
CN112052853B (en) * 2020-09-09 2024-02-02 国家气象信息中心 Text positioning method of handwriting meteorological archive data based on deep learning
CN112085122B (en) * 2020-09-21 2024-03-15 中国科学院上海微系统与信息技术研究所 Ontology-based semi-supervised image scene semantic deepening method
CN112101277B (en) * 2020-09-24 2023-07-28 湖南大学 Remote sensing target detection method based on image semantic feature constraint
CN112101386B (en) * 2020-09-25 2024-04-23 腾讯科技(深圳)有限公司 Text detection method, device, computer equipment and storage medium
CN112183322B (en) * 2020-09-27 2022-07-19 成都数之联科技股份有限公司 Text detection and correction method for any shape
CN112085735B (en) * 2020-09-28 2022-10-25 西安交通大学 Aluminum material image defect detection method based on self-adaptive anchor frame
CN112183545B (en) * 2020-09-29 2024-05-17 佛山市南海区广工大数控装备协同创新研究院 Natural scene text recognition method with arbitrary shape
CN112287977B (en) * 2020-10-06 2024-02-09 武汉大学 Target detection method based on bounding box key point distance
CN112036398B (en) * 2020-10-15 2024-02-23 北京一览群智数据科技有限责任公司 Text correction method and system
CN112215235B (en) * 2020-10-16 2024-04-26 深圳华付技术股份有限公司 Scene text detection method aiming at large character spacing and local shielding
CN112308150B (en) * 2020-11-02 2022-04-15 平安科技(深圳)有限公司 Target detection model training method and device, computer equipment and storage medium
CN112419174B (en) * 2020-11-04 2022-09-20 中国科学院自动化研究所 Image character removing method, system and device based on gate cycle unit
CN112270370B (en) * 2020-11-06 2023-06-02 北京环境特性研究所 Vehicle apparent damage assessment method
CN112434698A (en) * 2020-11-23 2021-03-02 泰康保险集团股份有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN112464943B (en) * 2020-11-25 2023-07-14 创新奇智(南京)科技有限公司 Semantic segmentation method and device based on few samples, electronic equipment and storage medium
CN112418134B (en) * 2020-12-01 2024-02-27 厦门大学 Pedestrian analysis-based multi-stream multi-tag pedestrian re-identification method
CN112529768B (en) * 2020-12-04 2023-01-06 中山大学 Garment editing and generating method based on generation countermeasure network
CN112541491B (en) * 2020-12-07 2024-02-02 沈阳雅译网络技术有限公司 End-to-end text detection and recognition method based on image character region perception
CN112446372B (en) * 2020-12-08 2022-11-08 电子科技大学 Text detection method based on channel grouping attention mechanism
CN112650832B (en) * 2020-12-14 2022-09-06 中国电子科技集团公司第二十八研究所 Knowledge correlation network key node discovery method based on topology and literature characteristics
CN112633343B (en) * 2020-12-16 2024-04-19 国网江苏省电力有限公司检修分公司 Method and device for checking wiring of power equipment terminal strip
CN112598635B (en) * 2020-12-18 2024-03-12 武汉大学 Point cloud 3D target detection method based on symmetric point generation
CN112528997B (en) * 2020-12-24 2022-04-19 西北民族大学 Tibetan-Chinese bilingual scene text detection method based on text center region amplification
CN112669446B (en) * 2020-12-24 2024-04-19 联通(浙江)产业互联网有限公司 Building scene modeling method and device
CN112580738B (en) * 2020-12-25 2021-07-23 特赞(上海)信息科技有限公司 AttentionOCR text recognition method and device based on improvement
CN113435466A (en) * 2020-12-26 2021-09-24 上海有个机器人有限公司 Method, device, medium and terminal for detecting elevator door position and switch state
CN112598683B (en) * 2020-12-27 2024-04-02 北京化工大学 Sweep OCT human eye image segmentation method based on sweep frequency optical coherence tomography
CN112651948B (en) * 2020-12-30 2022-04-12 重庆科技学院 Machine vision-based artemisinin extraction intelligent tracking and identification method
CN112862842B (en) * 2020-12-31 2023-05-12 青岛海尔科技有限公司 Image data processing method and device, storage medium and electronic device
CN112686245B (en) * 2021-01-04 2022-05-13 福州大学 Character and text parallel detection method based on character response
CN112686203B (en) * 2021-01-12 2023-10-31 重庆大学 Vehicle safety warning device detection method based on space priori
CN112801146B (en) * 2021-01-13 2024-03-19 华中科技大学 Target detection method and system
CN112733768B (en) * 2021-01-15 2022-09-09 中国科学技术大学 Natural scene text recognition method and device based on bidirectional characteristic language model
CN112766361A (en) * 2021-01-18 2021-05-07 山东师范大学 Target fruit detection method and detection system under homochromatic background
CN112712535B (en) * 2021-01-18 2024-03-22 长安大学 Mask-RCNN landslide segmentation method based on simulation difficult sample
CN112883795B (en) * 2021-01-19 2023-01-31 贵州电网有限责任公司 Rapid and automatic table extraction method based on deep neural network
CN112651989B (en) * 2021-01-19 2024-01-19 华东理工大学 SEM image molecular sieve particle size statistical method and system based on Mask RCNN example segmentation
CN112766263B (en) * 2021-01-21 2024-02-02 西安理工大学 Identification method for multi-layer control stock relationship share graphs
CN112766262B (en) * 2021-01-21 2024-02-02 西安理工大学 Identification method for single-layer one-to-many and many-to-one share graphs
CN112784737B (en) * 2021-01-21 2023-10-20 上海云从汇临人工智能科技有限公司 Text detection method, system and device combining pixel segmentation and line segment anchor
CN112766194A (en) * 2021-01-26 2021-05-07 上海海洋大学 Detection method for mesoscale ocean eddy
CN112818975A (en) * 2021-01-27 2021-05-18 北京金山数字娱乐科技有限公司 Text detection model training method and device and text detection method and device
CN112990211B (en) * 2021-01-29 2023-07-11 华为技术有限公司 Training method, image processing method and device for neural network
CN112801092B (en) * 2021-01-29 2022-07-15 重庆邮电大学 Method for detecting character elements in natural scene image
CN112766274B (en) * 2021-02-01 2023-07-07 长沙市盛唐科技有限公司 Water gauge image water level automatic reading method and system based on Mask RCNN algorithm
CN112946436A (en) * 2021-02-02 2021-06-11 成都国铁电气设备有限公司 Online intelligent detection method for arc extinction and disconnection of vehicle-mounted contact net insulator
CN112818873B (en) * 2021-02-04 2023-05-26 苏州魔视智能科技有限公司 Lane line detection method and system and electronic equipment
CN112700444B (en) * 2021-02-19 2023-06-23 中国铁道科学研究院集团有限公司铁道建筑研究所 Bridge bolt detection method based on self-attention and central point regression model
CN112883887B (en) * 2021-03-01 2023-07-18 中央财经大学 Building instance automatic extraction method based on high spatial resolution optical remote sensing image
CN113095319B (en) * 2021-03-03 2022-11-15 中国科学院信息工程研究所 Multidirectional scene character detection method and device based on full convolution angular point correction network
CN113065401A (en) * 2021-03-04 2021-07-02 国网河北省电力有限公司 Intelligent platform for full-ticket account reporting
CN113065404B (en) * 2021-03-08 2023-02-24 国网河北省电力有限公司 Method and system for detecting train ticket content based on equal-width character segments
CN113159021A (en) * 2021-03-10 2021-07-23 国网河北省电力有限公司 Text detection method based on context information
CN113033346B (en) * 2021-03-10 2023-08-04 北京百度网讯科技有限公司 Text detection method and device and electronic equipment
CN112966678B (en) * 2021-03-11 2023-01-24 南昌航空大学 Text detection method and system
CN113011597B (en) * 2021-03-12 2023-02-28 山东英信计算机技术有限公司 Deep learning method and device for regression task
US11682220B2 (en) * 2021-03-15 2023-06-20 Optum Technology, Inc. Overlap-aware optical character recognition
CN113052369B (en) * 2021-03-15 2024-05-10 北京农业智能装备技术研究中心 Intelligent agricultural machinery operation management method and system
CN113033377A (en) * 2021-03-16 2021-06-25 北京有竹居网络技术有限公司 Character position correction method, character position correction device, electronic equipment and storage medium
CN112907605B (en) * 2021-03-19 2023-11-17 南京大学 Data enhancement method for instance segmentation
CN113128560B (en) * 2021-03-19 2023-02-24 西安理工大学 CNN regular script style classification method based on attention module enhancement
CN112733822B (en) * 2021-03-31 2021-07-27 上海旻浦科技有限公司 End-to-end text detection and identification method
CN113052759B (en) * 2021-03-31 2023-03-21 华南理工大学 Scene complex text image editing method based on MASK and automatic encoder
CN112926692B (en) * 2021-04-09 2023-05-09 四川翼飞视科技有限公司 Target detection device, method and storage medium based on non-uniform mixed convolution
CN112927245B (en) * 2021-04-12 2022-06-21 华中科技大学 End-to-end instance segmentation method based on instance query
CN113033540A (en) * 2021-04-14 2021-06-25 易视腾科技股份有限公司 Contour fitting and correcting method for scene characters, electronic device and storage medium
CN113033482B (en) * 2021-04-20 2024-01-30 上海应用技术大学 Traffic sign detection method based on regional attention
CN113177389A (en) * 2021-04-23 2021-07-27 网易(杭州)网络有限公司 Text processing method and device, electronic equipment and storage medium
CN113139541B (en) * 2021-04-24 2023-10-24 西安交通大学 Power distribution cabinet dial nixie tube visual identification method based on deep learning
CN113269197B (en) * 2021-04-25 2024-03-08 南京三百云信息科技有限公司 Certificate image vertex coordinate regression system and identification method based on semantic segmentation
CN113762237B (en) * 2021-04-26 2023-08-18 腾讯科技(深圳)有限公司 Text image processing method, device, equipment and storage medium
CN113159053A (en) * 2021-04-27 2021-07-23 北京有竹居网络技术有限公司 Image recognition method and device and computing equipment
CN113269045A (en) * 2021-04-28 2021-08-17 南京大学 Chinese artistic word detection and recognition method under natural scene
CN113191296A (en) * 2021-05-13 2021-07-30 中国人民解放军陆军炮兵防空兵学院 Method for detecting five parameters of target in any orientation based on YOLOV5
CN113139625B (en) * 2021-05-18 2023-12-15 北京世纪好未来教育科技有限公司 Model training method, electronic equipment and storage medium thereof
CN113221773B (en) * 2021-05-19 2022-09-13 中国电子科技集团公司第二十八研究所 Method for quickly constructing airplane classification data set based on remote sensing image
CN113516116B (en) * 2021-05-19 2022-11-22 西安建筑科技大学 Text detection method, system and medium suitable for complex natural scene
CN113177511A (en) * 2021-05-20 2021-07-27 中国人民解放军国防科技大学 Rotating frame intelligent perception target detection method based on multiple data streams
CN113159037B (en) * 2021-05-25 2023-08-08 中国平安人寿保险股份有限公司 Picture correction method, device, computer equipment and storage medium
CN113379761B (en) * 2021-05-25 2023-04-28 重庆顺多利机车有限责任公司 Linkage method and system of multiple AGVs and automatic doors based on artificial intelligence
CN113177553B (en) * 2021-05-31 2022-08-12 哈尔滨工业大学(深圳) Method and device for identifying floor buttons of inner panel of elevator
CN113191358B (en) * 2021-05-31 2023-01-24 上海交通大学 Metal part surface text detection method and system
CN113313173B (en) * 2021-06-01 2023-05-30 中山大学 Human body parsing method based on graph representation and improved Transformer
CN115457531A (en) * 2021-06-07 2022-12-09 京东科技信息技术有限公司 Method and device for recognizing text
CN113362380A (en) * 2021-06-09 2021-09-07 北京世纪好未来教育科技有限公司 Image feature point detection model training method and device and electronic equipment thereof
CN113343980B (en) * 2021-06-10 2023-06-09 西安邮电大学 Natural scene text detection method and system
CN113378815B (en) * 2021-06-16 2023-11-24 南京信息工程大学 Scene text positioning and identifying system and training and identifying method thereof
CN113345106A (en) * 2021-06-24 2021-09-03 西南大学 Three-dimensional point cloud analysis method and system based on multi-scale multi-level Transformer
CN113360655B (en) * 2021-06-25 2022-10-04 中国电子科技集团公司第二十八研究所 Track point classification and text generation method based on sequence annotation
CN113255669B (en) * 2021-06-28 2021-10-01 山东大学 Method and system for detecting text of natural scene with any shape
CN113569650A (en) * 2021-06-29 2021-10-29 上海红檀智能科技有限公司 Unmanned aerial vehicle autonomous inspection positioning method based on electric power tower label identification
CN113343987B (en) * 2021-06-30 2023-08-22 北京奇艺世纪科技有限公司 Text detection processing method and device, electronic equipment and storage medium
CN113469177B (en) * 2021-06-30 2024-04-26 河海大学 Deep learning-based drainage pipeline defect detection method and system
WO2023279186A1 (en) * 2021-07-06 2023-01-12 Orbiseed Technology Inc. Methods and systems for extracting text and symbols from documents
CN113435542A (en) * 2021-07-22 2021-09-24 安徽理工大学 Coal and gangue real-time detection method based on deep learning
CN113343990B (en) * 2021-07-28 2021-12-03 浩鲸云计算科技股份有限公司 Key text detection and classification training method for certificate pictures
CN113657213A (en) * 2021-07-30 2021-11-16 五邑大学 Text recognition method, text recognition device and computer-readable storage medium
CN113763326B (en) * 2021-08-04 2023-11-21 武汉工程大学 Pantograph detection method based on Mask scanning R-CNN network
CN113807336B (en) * 2021-08-09 2023-06-30 华南理工大学 Semi-automatic labeling method, system, computer equipment and medium for image text detection
CN113780087B (en) * 2021-08-11 2024-04-26 同济大学 Postal package text detection method and equipment based on deep learning
CN113643136A (en) * 2021-09-01 2021-11-12 京东科技信息技术有限公司 Information processing method, system and device
CN113807340B (en) * 2021-09-07 2024-03-15 南京信息工程大学 Attention mechanism-based irregular natural scene text recognition method
CN113807351B (en) * 2021-09-18 2024-01-16 京东鲲鹏(江苏)科技有限公司 Scene text detection method and device
CN113837168A (en) * 2021-09-22 2021-12-24 易联众智鼎(厦门)科技有限公司 Image text detection and OCR recognition method, device and storage medium
CN113989708A (en) * 2021-10-27 2022-01-28 福州大学 Campus library epidemic prevention and control method based on YOLO v4
TWI807467B (en) * 2021-11-02 2023-07-01 中國信託商業銀行股份有限公司 Key-item detection model building method, business-oriented key-value identification system and method
CN114049625B (en) * 2021-11-11 2024-02-27 西北工业大学 Multidirectional text detection method based on novel image shrinkage method
CN114155540B (en) * 2021-11-16 2024-05-03 深圳市联洲国际技术有限公司 Character recognition method, device, equipment and storage medium based on deep learning
CN114140786B (en) * 2021-12-03 2024-05-17 杭州师范大学 HRNet coding and double-branch decoding-based scene text recognition method
CN114332839A (en) * 2021-12-30 2022-04-12 福州大学 Streetscape text detection method based on multi-space joint perception
CN114332841A (en) * 2021-12-31 2022-04-12 福州大学 Scene text detection method based on selective feature fusion pyramid
CN114399757A (en) * 2022-01-13 2022-04-26 福州大学 Natural scene text recognition method and system for multi-path parallel position correlation network
CN114067321B (en) * 2022-01-14 2022-04-08 腾讯科技(深圳)有限公司 Text detection model training method, device, equipment and storage medium
CN114418001B (en) * 2022-01-20 2023-05-12 北方工业大学 Character recognition method and system based on parameter reconstruction network
CN114419020B (en) * 2022-01-26 2022-10-18 深圳大学 Medical image segmentation method, medical image segmentation device, computer equipment and storage medium
CN114201967B (en) * 2022-02-17 2022-06-10 杭州费尔斯通科技有限公司 Entity identification method, system and device based on candidate entity classification
CN114549958B (en) * 2022-02-24 2023-08-04 四川大学 Night and camouflage target detection method based on context information perception mechanism
CN114359912B (en) * 2022-03-22 2022-06-24 杭州实在智能科技有限公司 Software page key information extraction method and system based on graph neural network
CN114399769B (en) * 2022-03-22 2022-08-02 北京百度网讯科技有限公司 Training method of text recognition model, and text recognition method and device
CN114723946B (en) * 2022-04-11 2024-02-27 合肥工业大学 Preferential direction deviation early warning system and method based on semantic segmentation
CN114862648B (en) * 2022-05-27 2023-06-20 晋城市大锐金马工程设计咨询有限公司 Cross-watermark document encryption method using two documents, A and B
CN114972947B (en) * 2022-07-26 2022-12-06 之江实验室 Depth scene text detection method and device based on fuzzy semantic modeling
CN114972710B (en) * 2022-07-27 2022-10-28 深圳爱莫科技有限公司 Method and system for realizing multi-shape target detection in image
CN115346206B (en) * 2022-10-20 2023-01-31 松立控股集团股份有限公司 License plate detection method based on improved super-resolution deep convolution feature recognition
CN115546778B (en) * 2022-10-22 2023-06-13 清华大学 Scene text detection method and system based on multitask learning
CN115909376A (en) * 2022-11-01 2023-04-04 北京百度网讯科技有限公司 Text recognition method, text recognition model training device and storage medium
CN115422389B (en) * 2022-11-07 2023-04-07 北京百度网讯科技有限公司 Method and device for processing text image and training method of neural network
CN115497106B (en) * 2022-11-14 2023-01-24 合肥中科类脑智能技术有限公司 Battery laser code-spraying identification method based on data enhancement and multitask model
CN116701347B (en) * 2023-05-08 2023-12-05 北京三维天地科技股份有限公司 Data modeling method and system based on category expansion
CN116342627B (en) * 2023-05-23 2023-09-08 山东大学 Intestinal epithelial metaplasia area image segmentation system based on multi-instance learning
CN116434234B (en) * 2023-05-25 2023-10-17 珠海亿智电子科技有限公司 Method, device, equipment and storage medium for detecting and identifying casting blank characters
CN116442393B (en) * 2023-06-08 2024-02-13 山东博硕自动化技术有限公司 Intelligent unloading method, system and control equipment for mixing plant based on video identification
CN116436987B (en) * 2023-06-12 2023-08-22 深圳舜昌自动化控制技术有限公司 IO-Link master station data message transmission processing method and system
CN116524521B (en) * 2023-06-30 2023-09-15 武汉纺织大学 English character recognition method and system based on deep learning
CN116524529B (en) * 2023-07-04 2023-10-27 青岛海信信息科技股份有限公司 Novel method for identifying layers based on graph nesting relationship
CN117078901B (en) * 2023-07-12 2024-04-16 长江勘测规划设计研究有限责任公司 Automatic marking method for single-point bar in steel bar view
CN116740688B (en) * 2023-08-11 2023-11-07 武汉市中西医结合医院(武汉市第一医院) Medicine identification method and system
CN116863482B (en) * 2023-09-05 2023-12-19 华立科技股份有限公司 Mutual inductor detection method, device, equipment and storage medium
CN117037173B (en) * 2023-09-22 2024-02-27 武汉纺织大学 Two-stage English character detection and recognition method and system
CN117409400A (en) * 2023-10-18 2024-01-16 无锡九霄科技有限公司 Complex condition character recognition method based on deep learning network
CN117221146B (en) * 2023-11-09 2024-01-23 成都科江科技有限公司 Interface layout system and layout method for ladder diagram logic configuration
CN117315702B (en) * 2023-11-28 2024-02-23 山东正云信息科技有限公司 Text detection method, system and medium based on set prediction
CN117315238B (en) * 2023-11-29 2024-03-15 福建理工大学 Vehicle target detection method and terminal
CN117436442B (en) * 2023-12-19 2024-03-12 中南大学 Text term multiple segmentation, merging, labeling and splitting method and device
CN117556806B (en) * 2023-12-28 2024-03-22 大连云智信科技发展有限公司 Fine granularity segmentation method for traditional Chinese medicine syndrome names
CN117475038B (en) * 2023-12-28 2024-04-19 浪潮电子信息产业股份有限公司 Image generation method, device, equipment and computer readable storage medium
CN117560456A (en) * 2024-01-11 2024-02-13 卓世未来(天津)科技有限公司 Large model data leakage prevention method and system
CN117975467A (en) * 2024-04-02 2024-05-03 华南理工大学 Bridge type end-to-end character recognition method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9245191B2 (en) * 2013-09-05 2016-01-26 Ebay, Inc. System and method for scene text recognition
CN104751153B (en) * 2013-12-31 2018-08-14 中国科学院深圳先进技术研究院 Method and device for recognizing scene text
CN106778757B (en) * 2016-12-12 2019-06-04 哈尔滨工业大学 Scene text detection method based on text saliency
CN108549893B (en) * 2018-04-04 2020-03-31 华中科技大学 End-to-end identification method for scene text with any shape

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740909A (en) * 2016-02-02 2016-07-06 华中科技大学 Text recognition method for natural scenes based on spatial transformation
CN106446899A (en) * 2016-09-22 2017-02-22 北京市商汤科技开发有限公司 Text detection method and device and text detection training method and device
CN106897732A (en) * 2017-01-06 2017-06-27 华中科技大学 Multi-directional text detection method for natural images based on linked segments
CN107617573A (en) * 2017-09-30 2018-01-23 浙江瀚镪自动化设备股份有限公司 Logistics code recognition and sorting method based on multi-task deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework; Michal Busta et al.; 2017 IEEE International Conference on Computer Vision; 20171231; pp. 2223-2231 *
TextBoxes: A Fast Text Detector with a Single Deep Neural Network; Minghui Liao et al.; Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence; 20171231; pp. 4161-4167 *
Digit Recognition in Natural Scenes Based on Convolutional Neural Networks; Zhou Chengwei; Computer Technology and Development; 20171130; Vol. 27, No. 11; pp. 101-105 *

Also Published As

Publication number Publication date
CN108549893A (en) 2018-09-18
WO2019192397A1 (en) 2019-10-10

Similar Documents

Publication Publication Date Title
CN108549893B (en) End-to-end identification method for scene text with any shape
CN110837835B (en) End-to-end scene text identification method based on boundary point detection
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN108304835B (en) Character detection method and device
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN108229303B (en) Detection recognition and training method, device, equipment and medium for detection recognition network
US10424072B2 (en) Leveraging multi cues for fine-grained object classification
WO2019089578A1 (en) Font identification from imagery
Alidoost et al. A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN112541491B (en) End-to-end text detection and recognition method based on image character region perception
CN115131797B (en) Scene text detection method based on feature enhancement pyramid network
CN116645592B (en) Crack detection method based on image processing and storage medium
CN111860309A (en) Face recognition method and system
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
US20230095533A1 (en) Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling
Sharma et al. Segmentation of handwritten words using structured support vector machine
Cai et al. IOS-Net: An inside-to-outside supervision network for scale robust text detection in the wild
Mohammad et al. Contour-based character segmentation for printed Arabic text with diacritics
El Abbadi Scene Text detection and Recognition by Using Multi-Level Features Extractions Based on You Only Look Once Version Five (YOLOv5) and Maximally Stable Extremal Regions (MSERs) with Optical Character Recognition (OCR)
Naosekpam et al. UTextNet: a UNet based arbitrary shaped scene text detector
Shi et al. Fuzzy support tensor product adaptive image classification for the internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant