CN110399845A - Method for detecting and recognizing continuous paragraph text in images - Google Patents

Method for detecting and recognizing continuous paragraph text in images Download PDF

Info

Publication number
CN110399845A
CN110399845A (application CN201910688854.6A)
Authority
CN
China
Prior art keywords
text
moment
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910688854.6A
Other languages
Chinese (zh)
Inventor
刘晋
龚沛朱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University
Priority to CN201910688854.6A priority Critical patent/CN110399845A/en
Publication of CN110399845A publication Critical patent/CN110399845A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/1475Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition

Abstract

The present invention discloses a method for detecting and recognizing continuous paragraph text in images based on the fusion of SegLink and an Attention-based CRNN. The method belongs to the field of optical character recognition (OCR) and addresses several problems in OCR-based document digitization: low detection accuracy for text, especially tilted text, difficult localization, difficult character segmentation, and low recognition accuracy. A SegLink+CRNN model is built on the TensorFlow deep learning framework: text lines in an image are detected by a SegLink network; paragraph text is split into rows; single-line text features are extracted by a densely connected convolutional neural network; the contextual sequence information of the text is processed by a bidirectional recurrent neural network; and a CTC decoding algorithm avoids per-character segmentation, eliminating the influence of the character-segmentation step on recognition accuracy. An Attention mechanism is further incorporated into the CTC transcription to exploit the sequential characteristics of text and improve recognition accuracy. The method is suitable for both printed and handwritten text, and can be applied to multilingual text recognition, such as English and Chinese.

Description

Method for detecting and recognizing continuous paragraph text in images
Technical field
The invention belongs to the fields of computer vision, object detection and optical character recognition, relates to the detection and recognition of text in information documents, and in particular to a method for detecting and recognizing continuous paragraph text in images based on the fusion of SegLink and an Attention-based CRNN.
Background technique
Text detection and recognition in natural scenes is one of the most active areas of current computer vision. It comprises two subtasks: text detection and text recognition.
Most current text detection methods follow a bottom-up pipeline: they start from low-level features such as individual characters and strokes, then perform non-text filtering, text-line construction and text-line verification. The accuracy of such methods depends heavily on the character-detection results, and errors in those results accumulate throughout the bottom-up process; their reliability is therefore poor and their structure extremely complex.
In object detection, deep convolutional neural networks are widely used. The Faster R-CNN architecture, for example, introduced the Region Proposal Network (RPN), which generates high-quality class candidate boxes directly from the convolutional feature maps. However, because a text line is formed by stitching multiple characters together and has no definite closed boundary, problems such as overlapping candidate boxes or missed detections arise. Similarly, the CTPN architecture fixes the horizontal direction and uses vertical anchors to regress the positions of text lines in an image, but it struggles with the detection task for tilted text images.
The text recognition task resembles a multi-class classification problem, and the most common approach is to classify text with deep convolutional and recurrent neural networks, one word (or character) representing one class. This structure is adequate for recognition tasks with few classes — English text recognition, for example, covers 79 classes in total counting upper and lower case letters and common punctuation marks — but performs poorly on large-category tasks such as Chinese text recognition (3,755 commonly used characters), because the larger number of classes forces a deeper network, which in turn causes vanishing/exploding gradients and network degradation.
Summary of the invention
The invention proposes a method for detecting and recognizing continuous paragraph text in images based on the fusion of SegLink and an Attention-based CRNN. The main idea of the method is: detect text-line positions with SegLink and rectify them; feed the text lines into a convolutional neural network for feature extraction; input the resulting feature sequences into a bidirectional long short-term memory recurrent network, which completes the mapping from feature sequences to character sequences; then perform CTC transcription on the character sequences to obtain the final recognition result. This solves the problems in OCR document digitization of low detection accuracy for (especially tilted) text, difficult localization, difficult character segmentation, and low recognition accuracy.
In order to achieve the above object, the invention is realized by the following technical scheme:
A method for detecting and recognizing continuous paragraph text in images based on the fusion of SegLink and an Attention-based CRNN, comprising the steps of:
S1. producing a continuous-text image dataset and dividing it into a training set, a validation set and a test set;
S2. building a SegLink network model under the TensorFlow deep learning framework, detecting text lines of different sizes and aspect ratios by generating feature maps of different scales, and rectifying tilted text lines;
S3. defining the loss function: the weighted sum of the Segment confidence function, the Link confidence function and the predicted-position error function gives the overall loss function used to optimize the model;
S4. training the SegLink network model of step S2 with the training set of step S1 to obtain the final text detection model, and testing it with the test set;
S5. building a CRNN model: the densely connected network DenseNet extracts picture features and outputs feature maps, and a bidirectional long short-term memory network BLSTM, combining the contextual information of the continuous text, predicts each frame of the feature sequence produced by the DenseNet convolutions;
S6. decoding the sequences predicted in step S5 with an Attention-based CTC transcription method suitable for recognizing text sequences of arbitrary length, to obtain the target text;
S7. training the CRNN model of step S5 with the training set of step S1 to obtain the final text recognition model, and feeding the test set into the CRNN model to obtain test results;
S8. training the whole text detection and recognition network to complete the detection and recognition of continuous paragraph text.
Preferably, step S1 further includes: generating single-character dataset pictures of different font styles with a CycleGAN deep learning model, stitching the single-character pictures into semantically meaningful text lines or text paragraphs, and adding noise. The dataset is a foreign-language dataset or a Chinese dataset, and its font style is printed or handwritten.
Preferably, step S2 further includes:
taking the pre-trained VGG16 convolutional neural network as the backbone, replacing its fully connected layers with convolutional layers, and successively halving the convolution size to generate multi-scale feature maps. The network divides the input text image into two parts: slices (Segments) and links (Links). A Segment frames one part of a text line and expresses its location information; a text line contains multiple Segments, and adjacent Segments are connected by Links.
A 3*3 sliding window over the feature map of each scale generates multiple predicted Segments and Links; fusion rules then merge the Segment and Link information across scales and reject redundancy, yielding the final predicted text line.
Preferably, the location information of a Segment is expressed by five parameters: coordinates (x, y), width w, height h and tilt angle θ.
The generated multi-scale feature maps contain six sizes, respectively [64, 32, 16, 8, 4, 2]. The parameters (x, y, w, h, θ) of a Segment are updated from an anchor point (x_a, y_a) as follows:
x_s = x_a + α·Δx_s;
y_s = y_a + α·Δy_s;
w_s = α·exp(Δw_s);
h_s = α·exp(Δh_s);
θ_s = Δθ_s;
wherein x_s, y_s respectively denote the updated horizontal and vertical coordinates of the anchor point; Δx_s, Δy_s denote the predicted horizontal and vertical offsets of the anchor point; w_s, h_s respectively denote the width and height of the anchor box; Δw_s, Δh_s respectively denote the predicted width and height offsets of the anchor box; θ_s, Δθ_s respectively denote the rotation angle of the anchor box and its offset; w_I, h_I respectively denote the width and height of the original image; w_f, h_f respectively denote the width and height of the feature map; α = λ·w_I/w_f expresses the size of the receptive field; λ is a weight coefficient.
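As an illustration, the update rule above can be sketched in a few lines of Python. This is a minimal sketch under the stated definitions; `decode_segment`, the anchor values and λ = 1.5 are invented for the example and are not taken from the patent.

```python
import math

def decode_segment(anchor_cx, anchor_cy, offsets, w_img, w_feat, lam=1.5):
    """Decode one predicted Segment from an anchor point and regressed offsets.

    The constant alpha approximates the receptive-field size of the feature
    map: alpha = lam * w_img / w_feat (lam is a weight coefficient).
    """
    dx, dy, dw, dh, dtheta = offsets
    alpha = lam * w_img / w_feat
    xs = anchor_cx + alpha * dx          # decoded centre x
    ys = anchor_cy + alpha * dy          # decoded centre y
    ws = alpha * math.exp(dw)            # decoded width
    hs = alpha * math.exp(dh)            # decoded height
    theta = dtheta                       # rotation angle is predicted directly
    return xs, ys, ws, hs, theta

# zero offsets reproduce the default box of size alpha centred on the anchor
seg = decode_segment(100.0, 40.0, (0.0, 0.0, 0.0, 0.0, 0.0), w_img=512, w_feat=64)
```

With zero offsets the decoded Segment is simply the default box centred on the anchor, which gives a quick sanity check for the decoding.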
The fusion method obtains a straight line L by least-squares linear regression such that the distance from the centre coordinates of all Segments to L is minimal, projects the centre coordinates onto L, and takes the two farthest projections, denoted (x_m, y_m) and (x_n, y_n). Adding half of the widths w_m, w_n of the Segments containing those two points, and taking as height h the mean of all anchor-box heights, gives:
x = (x_m + x_n)/2;
y = (y_m + y_n)/2;
w = √((x_m − x_n)² + (y_m − y_n)²) + (w_m + w_n)/2;
h = (1/N)·Σ_{i=1}^{N} h_i;
The finally obtained text-box coordinates are (x, y, w, h, θ), θ being the inclination of the line L; wherein N is the number of anchor boxes and h_i is the height of the i-th anchor box.
Preferably, in step S2, the rectification process further includes: detecting the straight line along the text in the image with the Hough line transform, computing the inclination angle of the line, and then rotating the image according to that angle to correct it.
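The rotational correction itself reduces to computing the baseline inclination and rotating coordinates by its negative. A minimal sketch under that assumption — the Hough transform that would supply the two baseline points is omitted, and the function names are illustrative:

```python
import math

def deskew_angle(x1, y1, x2, y2):
    """Inclination (radians) of a detected baseline through two points."""
    return math.atan2(y2 - y1, x2 - x1)

def rotate_point(x, y, cx, cy, angle):
    """Rotate (x, y) about (cx, cy) by -angle to undo the detected skew."""
    c, s = math.cos(-angle), math.sin(-angle)
    dx, dy = x - cx, y - cy
    return cx + c * dx - s * dy, cy + s * dx + c * dy

angle = deskew_angle(0, 0, 100, 100)      # a 45-degree tilted baseline
x, y = rotate_point(100, 100, 0, 0, angle)
```

After the rotation the baseline endpoint lands back on the horizontal axis, which is exactly the effect of the rotational correction described above.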
Preferably, in step S3, the loss function is as follows:
L(y_s, c_s, y_l, c_l, ŝ, s) = (1/N_s)·L_conf(y_s, c_s) + λ₁·(1/N_s)·L_loc(ŝ, s) + λ₂·(1/N_l)·L_conf(y_l, c_l);
wherein L_conf(y_s, c_s) denotes the Segment confidence function; L_conf(y_l, c_l) denotes the Link confidence function; L_loc(ŝ, s) denotes the predicted-position error function; λ₁, λ₂ denote weight coefficients; N_s, N_l respectively denote the number of Segments and the number of Links; y_s, y_l respectively denote the labels of Segments and Links; c_s, c_l respectively denote the predicted values of Segments and Links; ŝ, s respectively denote the predicted geometry of a segment and its ground-truth value.
Preferably, in step S5, the network DenseNet comprises several dense blocks (Dense Block) and transition blocks (Transition Block); Dense Blocks are connected by Transition Blocks. A Dense Block uses Batch Normalization + ReLU + 3*3 convolution as its composite function, and any two convolutional layers within a Dense Block are connected. A Transition Block is composed of a bottleneck layer and a pooling layer.
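The dense connectivity pattern — each layer receiving the concatenation of all earlier feature maps — can be illustrated with a toy NumPy sketch. A random 1×1 projection stands in for the BN+ReLU+3*3 composite function; names and sizes are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_layer(x, out_ch):
    """Stand-in for the BN+ReLU+3x3 composite: a random projection with ReLU."""
    w = rng.standard_normal((x.shape[-1], out_ch))
    return np.maximum(x @ w, 0.0)

def dense_block(x, num_layers, growth_rate):
    """Each layer sees the concatenation of the block input and every earlier
    layer's output, so feature maps are reused rather than recomputed."""
    features = [x]
    for _ in range(num_layers):
        out = conv_layer(np.concatenate(features, axis=-1), growth_rate)
        features.append(out)
    return np.concatenate(features, axis=-1)

x = rng.standard_normal((4, 16))          # 4 positions, 16 input channels
y = dense_block(x, num_layers=3, growth_rate=8)
# channels grow by growth_rate per layer: 16 + 3*8 = 40
```

The channel count grows linearly with depth (here 16 + 3·8 = 40), which is why the Transition Block's bottleneck layer is needed to compress the feature-map dimensionality between blocks.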
Preferably, in step S5, a BLSTM network processes the contextual information of the continuous text sequence in both directions. Compared with a plain recurrent network, the LSTM adds three gates, respectively an update gate, a forget gate and an output gate. The formulas are:
c̃^<t> = tanh(W_c[a^<t−1>, x^<t>] + b_c);
Γ_u = σ(W_u[a^<t−1>, x^<t>] + b_u);
Γ_f = σ(W_f[a^<t−1>, x^<t>] + b_f);
Γ_o = σ(W_o[a^<t−1>, x^<t>] + b_o);
c^<t> = Γ_u * c̃^<t> + Γ_f * c^<t−1>;
a^<t> = Γ_o * tanh c^<t>;
wherein σ is the activation function; c denotes the long-term memory state; a denotes the state; x denotes the input; <t> denotes the current time step t; c^<t> denotes the memory state at time t; W_f, W_u, W_c, W_o are weight matrices; b_f, b_u, b_c, b_o are biases. First, the output a^<t−1> of the previous time step t−1 and the input x^<t> of the current time step t pass through the tanh function to give the intermediate update state c̃^<t>. Γ_u and Γ_f respectively denote the update gate and the forget gate, each taking values in [0, 1], which control the sequences remembered in the memory cell; finally the output gate Γ_o yields the state a^<t> at time t.
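A single time step of the gated update above, written out in NumPy. This is a didactic sketch; the weight packing and dimensions are invented for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x, W, b):
    """One LSTM step with candidate, update, forget and output gates,
    following the formulas above; W/b pack the four gate parameters."""
    za = np.concatenate([a_prev, x])
    c_tilde = np.tanh(W[0] @ za + b[0])           # intermediate update state
    gamma_u = sigmoid(W[1] @ za + b[1])           # update gate, in [0, 1]
    gamma_f = sigmoid(W[2] @ za + b[2])           # forget gate, in [0, 1]
    gamma_o = sigmoid(W[3] @ za + b[3])           # output gate
    c = gamma_u * c_tilde + gamma_f * c_prev      # new memory state
    a = gamma_o * np.tanh(c)                      # new hidden state
    return a, c

rng = np.random.default_rng(1)
n_a, n_x = 4, 3
W = rng.standard_normal((4, n_a, n_a + n_x))
b = np.zeros((4, n_a))
a, c = lstm_step(np.zeros(n_a), np.zeros(n_a), rng.standard_normal(n_x), W, b)
```

Because both the output gate and tanh are bounded, every component of the emitted state a stays strictly inside (−1, 1), matching the gated formulation.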
Preferably, in step S6, CTC aligns the text image with its label over time steps through the forward and backward algorithms, comprising the following process:
Define the forward variable α_t(s) and initialize it first:
α_1(1) = y^1_b;  α_1(2) = y^1_{l'_2};  α_1(s) = 0 for s > 2;
wherein α_1(1) denotes the probability that the first output is the blank label, y^1_b, and α_1(2) denotes the probability that it is the first character of the true sequence, y^1_{l'_2}.
The subsequent recurrence is as follows:
α_t(s) = (α_{t−1}(s) + α_{t−1}(s−1))·y^t_{l'_s},  if l'_s = b or l'_s = l'_{s−2};
α_t(s) = (α_{t−1}(s) + α_{t−1}(s−1) + α_{t−1}(s−2))·y^t_{l'_s},  otherwise;
wherein α_{t−1}(s) is the forward probability, over times 0 to t−1, of being at the s-th character of the sequence l' at time t−1; α_{t−1}(s−1) is the forward probability of being at the (s−1)-th character of l' at time t−1; summing the two gives the total forward probability before time t. b denotes the blank label; l denotes the length of the character sequence in the image; l' denotes the sequence after blank labels are inserted, of length l' = 2l + 1; l'_s, l'_{s−2} respectively denote the s-th and (s−2)-th characters of the sequence l'; y^t_{l'_s} denotes the probability that the prediction at time t is the s-th character of l'.
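The forward recursion above can be checked on a toy example in pure Python. `ctc_forward`, the three-step probability table and the label are invented for the illustration:

```python
def ctc_forward(probs, label, blank=0):
    """CTC forward variables alpha_t(s) over the blank-extended label l'.

    probs[t][k] is the softmax output y_k^t at time t; the extended label
    interleaves blanks, so |l'| = 2*|l| + 1.
    """
    ext = [blank]
    for ch in label:
        ext += [ch, blank]
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]          # alpha_1(1): leading blank
    alpha[0][1] = probs[0][ext[1]]         # alpha_1(2): first real character
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s >= 1:
                a += alpha[t - 1][s - 1]
            # skip transition allowed unless current symbol is blank or repeats l'[s-2]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    # p(l|x) sums the paths ending on the last character or the final blank
    return alpha, alpha[T - 1][S - 1] + alpha[T - 1][S - 2]

# toy example: 3 time steps, alphabet {blank, 'a', 'b'}, label "ab"
probs = [[0.2, 0.6, 0.2], [0.2, 0.2, 0.6], [0.6, 0.2, 0.2]]
alpha, p = ctc_forward(probs, [1, 2])
```

On this toy input the returned p equals the value obtained by enumerating every length-3 path that collapses to the label, confirming that the recursion is a dynamic-programming shortcut over those paths.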
Define the backward variable β_t(s) and initialize it:
β_T(|l'|) = y^T_b;  β_T(|l'|−1) = y^T_{l_{|l|}};  β_T(s) = 0 for s < |l'|−1;
wherein y^T_b denotes the probability of predicting blank at the final time T, taken as the backward probability β_T(|l'|) of the last blank of the sequence l'; y^T_{l_{|l|}} denotes the probability of predicting the last character of l at time T, taken as β_T(|l'|−1), the backward probability of the penultimate symbol of l'.
The subsequent recurrence is as follows:
β_t(s) = (β_{t+1}(s) + β_{t+1}(s+1))·y^t_{l'_s},  if l'_s = b or l'_s = l'_{s+2};
β_t(s) = (β_{t+1}(s) + β_{t+1}(s+1) + β_{t+1}(s+2))·y^t_{l'_s},  otherwise;
wherein β_{t+1}(s) denotes the backward probability of being at the s-th character of the sequence at time t+1 and continuing to the final time T; β_{t+1}(s+1) denotes the corresponding backward probability for the (s+1)-th character; summing the two gives the total backward probability. b denotes the blank label; l denotes the length of the character sequence in the image; l' denotes the sequence after blank labels are inserted, of length l' = 2l + 1; l'_s, l'_{s+2} respectively denote the s-th and (s+2)-th characters of the sequence l'; y^t_{l'_s} denotes the probability that the prediction at time t is the s-th character of l'.
Further, the probability of the label symbol s at step t is obtained as:
p(s, t | x) = α_t(s)·β_t(s) / y^t_{l'_s};  p(l | x) = Σ_{s=1}^{|l'|} α_t(s)·β_t(s) / y^t_{l'_s};
wherein t denotes the time step; T denotes the total number of time steps; l denotes the label sequence; s denotes the s-th symbol of the sequence; y^t_{l'_s} denotes the probability that the prediction at time t is the s-th character of the sequence l'; α_t(s) denotes the probability of forming the prefix l_{1:s} along all possible paths up to time t; β_t(s) denotes the probability of generating the suffix l_{s:l} along all possible paths from time t.
Preferably, in step S6, the loss function of CTC is:
L_CTC = −ln p(z | x),  with  p(z | x) = Σ_{u=1}^{2U+1} α(t, u)·β(t, u) / y^t_{z'_u} for any time step t;
wherein p(z | x) denotes the probability of the label sequence z given the input x; U denotes the length of the original label sequence, so the blank-extended sequence z' has length 2U + 1; t denotes the time step; α(t, u) denotes the forward probability at node u at time t; β(t, u) denotes the backward probability at node u at time t.
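Combining the forward and backward recursions gives the CTC negative log-likelihood. A self-contained pure-Python sketch under the convention used above — both α and β include the emission at their own time step, so their product is divided by y once; the function and the toy inputs are invented for the illustration:

```python
import math

def ctc_loss(probs, label, blank=0):
    """CTC negative log-likelihood via the forward-backward recursions."""
    ext = [blank]
    for ch in label:
        ext += [ch, blank]
    T, S = len(probs), len(ext)
    y = [[probs[t][ext[s]] for s in range(S)] for t in range(T)]

    def can_skip(s):
        # a path may jump over position s-1 only onto a non-blank that
        # differs from the symbol two positions back
        return s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]

    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0], alpha[0][1] = y[0][0], y[0][1]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s] + (alpha[t - 1][s - 1] if s >= 1 else 0.0)
            if can_skip(s):
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * y[t][s]

    beta = [[0.0] * S for _ in range(T)]
    beta[T - 1][S - 1], beta[T - 1][S - 2] = y[T - 1][S - 1], y[T - 1][S - 2]
    for t in range(T - 2, -1, -1):
        for s in range(S):
            b = beta[t + 1][s] + (beta[t + 1][s + 1] if s + 1 < S else 0.0)
            if s + 2 < S and can_skip(s + 2):
                b += beta[t + 1][s + 2]
            beta[t][s] = b * y[t][s]

    # p(z|x) is the same no matter which time step it is evaluated at
    p = sum(alpha[0][s] * beta[0][s] / y[0][s] for s in range(S) if y[0][s] > 0)
    return -math.log(p)

probs = [[0.2, 0.6, 0.2], [0.2, 0.2, 0.6], [0.6, 0.2, 0.2]]
loss = ctc_loss(probs, [1, 2])
```

Evaluating Σ_s α_t(s)·β_t(s)/y^t at any other t yields the same p(z|x), which is a useful internal check when implementing the recursions.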
Compared with the prior art, the beneficial effects of the present invention are: (1) text lines are detected by SegLink; feature maps of different scales detect text lines of different sizes and aspect ratios; following a divide-and-conquer idea, text-line detection is split into Segment detection and Link detection, and combining multi-scale feature maps improves the precision of text-box detection; an angle parameter added to the Segment, together with perspective transform and line detection, solves the text detection task for tilted text images and rectifies the text; (2) DenseNet replaces a conventional convolutional network for text-line feature extraction; its characteristic is that any two layers inside a Dense Block are connected, and feature-map reuse reduces the parameter count and computation, mitigating vanishing/exploding gradients and network degradation and extracting more essential features; (3) the obtained feature sequences are fed into a bidirectional LSTM that combines the contextual sequence information of the text and completes the mapping from feature sequences to character sequences, improving recognition accuracy; (4) finally, trained jointly within the CRNN, the CTC decoding algorithm incorporating the Attention mechanism predicts each time step of the sequence, avoiding the errors caused by character segmentation and completing the text recognition.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention for detecting and recognizing continuous paragraph text in images based on the fusion of SegLink and an Attention-based CRNN;
Fig. 2a-Fig. 2c are sample pictures of the datasets generated in the present invention;
Fig. 3 is the network structure of CycleGAN in the present invention;
Fig. 4 is the network structure of SegLink in the present invention;
Fig. 5 shows the effect of text-line detection in the present invention;
Fig. 6 is the network structure of DenseNet in the present invention;
Fig. 7 shows the effect of text recognition in the present invention;
Fig. 8 is the structure of BLSTM in the present invention;
Fig. 9 is a schematic diagram of the CTC decoding structure incorporating the Attention mechanism in the present invention.
Specific embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. The present invention includes, but is not limited to, the following embodiments.
Fig. 1 shows the overall implementation flow of the method of the invention for detecting and recognizing continuous paragraph text in images based on the fusion of SegLink and an Attention-based CRNN. The specific steps are as follows:
Step 1: select and produce datasets. CycleGAN is used to produce continuous-text image datasets in multiple fonts, and each dataset is divided into a training set, a validation set and a test set.
Step 1 includes the following process: single-character dataset pictures of different font styles are produced and generated by CycleGAN; the single-character data are stitched into semantically meaningful text lines or text paragraphs, and noise such as tilt and rotation is added; each text-line picture is paired with its label sequence for training. The dataset may be in various styles such as printed or handwritten, and may be trained on one of several languages such as English or Chinese.
As shown in Fig. 2c, the English dataset in this embodiment is taken from the IAM handwriting database, which contains handwritten English text scanned at a resolution of 300 dpi and saved as 256-level grayscale PNG images, annotated at word and line level.
As shown in Fig. 2a-Fig. 2b, the Chinese dataset is assembled by stitching the HWDB1.1 Chinese handwritten single-character dataset, and Chinese text datasets in different writing styles are generated by CycleGAN to make the model more robust.
Fig. 3 shows the structure of the CycleGAN deep learning model. CycleGAN consists of generators and discriminators, which together form adversarial networks (GANs); CycleGAN is essentially two mirror-symmetric GANs forming a cycle. The generator tries to produce samples from a distribution, and the discriminator decides whether a sample is an original image or a generated one. Generator A-to-B maps a picture in data domain A to an output image in domain B; to ensure the mapping is meaningful, there must be a significant association between input and output images, so another generator, B-to-A, maps the output image back to the original data domain.
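The cycle-consistency idea — mapping A→B→A should reproduce the input — can be sketched independently of any particular generator network. This is a toy illustration; the pixel-inversion "generators" are invented for the example:

```python
def cycle_consistency_loss(a_images, b_images, g_ab, g_ba):
    """L1 cycle loss: A->B->A (and B->A->B) should reproduce each input.

    g_ab / g_ba stand in for the two generators; here, any callables on
    flat lists of pixel values.
    """
    def l1(u, v):
        return sum(abs(p - q) for p, q in zip(u, v)) / len(u)
    loss = 0.0
    for a in a_images:
        loss += l1(g_ba(g_ab(a)), a)      # forward cycle A -> B -> A
    for b in b_images:
        loss += l1(g_ab(g_ba(b)), b)      # backward cycle B -> A -> B
    return loss / (len(a_images) + len(b_images))

# toy "generators": invert intensities in [0, 1]; inversion is its own
# inverse, so the cycle reconstructs each image up to floating-point rounding
def invert(img):
    return [1.0 - p for p in img]

loss = cycle_consistency_loss([[0.1, 0.9]], [[0.4, 0.4]], invert, invert)
```

In the real model each generator is a network and this loss is added to the two adversarial losses, which is what keeps the A→B mapping meaningfully tied to its input.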
The generator in this example uses the network DenseNet as the transfer module (Transfer Module) and builds the network with an encoder-decoder structure. The encoder part contains three convolutional layers with structure Conv-Norm-ReLU; the kernel sizes are one 7*7 convolution and two 3*3 convolutions. The transfer part consists of three Dense Blocks with the growth rate set to 256. The decoder part contains three deconvolution layers with structure DeConv-Norm-ReLU; the kernel size is 3*3 for the first two layers and 7*7 for the last.
Step 2: build the SegLink network model under the TensorFlow deep learning framework; detect text lines of different sizes and aspect ratios by generating feature maps of different scales; extract the four vertex coordinates of each text line; and correct tilted text lines by four-point perspective transform and rectification.
Fig. 4 shows the structure of the SegLink network model in this example. The pre-trained VGG16 convolutional neural network serves as the backbone; its fully connected layers are replaced by convolutional layers, and the convolution size is successively halved to generate multi-scale feature maps (64, 32, 16, 8, 4, 2 in Fig. 4). The network divides the input text image into two parts: Segments (slices) and Links. A Segment frames one part of a text line; its location information is expressed by the five parameters (x, y, w, h, θ), respectively representing its coordinates (x, y), width (w), height (h) and tilt angle (θ).
A text line can be composed of multiple Segment boxes, adjacent Segments being connected by Links. A 3*3 sliding window over the feature maps of different scales generates multiple predicted Segment boxes and Links (as shown in Fig. 4); fusion rules merge the Segment and Link information across scales and reject redundancy, yielding the final predicted text line.
Feature maps of six sizes, respectively [64, 32, 16, 8, 4, 2], are provided in this example. The parameters (x, y, w, h, θ) of a Segment are updated from an anchor point (x_a, y_a) as follows:
x_s = x_a + α·Δx_s;
y_s = y_a + α·Δy_s;
w_s = α·exp(Δw_s);
h_s = α·exp(Δh_s);
θ_s = Δθ_s;
wherein x_s, y_s respectively denote the updated horizontal and vertical coordinates of the anchor point; Δx_s, Δy_s respectively denote the predicted horizontal and vertical offsets of the anchor point; w_s, h_s respectively denote the width and height of the anchor box; Δw_s, Δh_s respectively denote the predicted width and height offsets of the anchor box; θ_s, Δθ_s respectively denote the rotation angle of the anchor box and its offset; w_I, h_I respectively denote the width and height of the original image; w_f, h_f respectively denote the width and height of the feature map; α = λ·w_I/w_f expresses the size of the receptive field; λ is a weight coefficient.
The fusion method in this embodiment obtains a straight line L by least-squares linear regression such that the distance from the centre coordinates of all Segments to the line is minimal, projects the centre coordinates onto L, and takes the two farthest projections, denoted (x_m, y_m) and (x_n, y_n). Adding half of the widths w_m, w_n of the Segments containing those two points, and taking as height h the mean of all anchor-box heights (N being the number of anchor boxes), the detailed formulas are:
x = (x_m + x_n)/2;
y = (y_m + y_n)/2;
w = √((x_m − x_n)² + (y_m − y_n)²) + (w_m + w_n)/2;
h = (1/N)·Σ_{i=1}^{N} h_i;
The finally obtained text-box coordinates are (x, y, w, h, θ), θ being the inclination of the line L.
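The merging step — fit a line through the Segment centres, project onto it, take the extreme projections and pad by the end widths — can be sketched in pure Python. `merge_segments` and the toy boxes are invented for the illustration, and the angle is limited to the atan range for simplicity:

```python
import math

def merge_segments(segs):
    """Merge Segments (cx, cy, w, h) of one text line into a single box."""
    n = len(segs)
    xs = [s[0] for s in segs]
    ys = [s[1] for s in segs]
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta = math.atan(cov / var)                   # inclination of fitted line L
    ux, uy = math.cos(theta), math.sin(theta)      # unit direction of L
    # project centres onto L and find the two farthest projections
    t = [(x - mx) * ux + (y - my) * uy for x, y in zip(xs, ys)]
    i, j = t.index(min(t)), t.index(max(t))
    xm, ym = mx + t[i] * ux, my + t[i] * uy
    xn, yn = mx + t[j] * ux, my + t[j] * uy
    w = math.hypot(xn - xm, yn - ym) + (segs[i][2] + segs[j][2]) / 2
    h = sum(s[3] for s in segs) / n                # mean of all box heights
    return ((xm + xn) / 2, (ym + yn) / 2, w, h, theta)

# three collinear horizontal segments of width 20 and height 10
box = merge_segments([(0, 0, 20, 10), (30, 0, 20, 10), (60, 0, 20, 10)])
```

For the collinear toy input the merged box spans the two extreme centres plus half a segment width at each end, which matches the formulas above.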
In step 2, rectification detects the straight line along the text in the image with the Hough line transform, computes the inclination angle of the line, and then rotates the image according to that angle to correct it.
Step 3: define the loss function. The weighted sum of three loss subfunctions — the Segment confidence function, the Link confidence function and the predicted-position error function — gives the overall loss function used to optimize the model. The loss function in step 3 is defined as:
L(y_s, c_s, y_l, c_l, ŝ, s) = (1/N_s)·L_conf(y_s, c_s) + λ₁·(1/N_s)·L_loc(ŝ, s) + λ₂·(1/N_l)·L_conf(y_l, c_l);
The loss function is composed of the three loss subfunctions above: the Segment confidence function, the Link confidence function and the predicted-position error function.
The confidence functions of Segment and Link perform binary classification with softmax, judging whether there is text and whether there is a link, and are denoted L_conf; the predicted-position error is computed with the Smooth L1 regression loss and denoted L_loc; c_s, c_l respectively denote the predicted values of Segment and Link; y_s, y_l respectively denote the labels of Segment and Link; N_s, N_l respectively denote the number of Segments and the number of Links; ŝ, s respectively denote the predicted geometry of a segment and its ground-truth value; λ₁, λ₂ denote weight coefficients.
Step 4: train the SegLink network model of step 2 with the training set of step 1 to obtain the final text detection model, and test it with the test set.
Fig. 5 shows the effect of SegLink text detection; the boxes mark the located text positions.
Step 5: build the CRNN model (Convolutional Recurrent Neural Network). The densely connected convolutional neural network DenseNet extracts picture features and outputs feature maps; a bidirectional long short-term memory network BLSTM, combining the contextual information of the continuous text, predicts each frame of the feature sequence produced by the DenseNet convolutions.
The CRNN model of this embodiment is an Attention-based CRNN convolutional recurrent neural network, mainly comprising three parts: the DenseNet convolutional neural network, the BLSTM long short-term memory network, and the CTC transcription layer.
Fig. 6 shows the structure of the DenseNet network in this embodiment. DenseNet consists mainly of two parts: Dense Blocks (dense blocks) and Transition Blocks (transition blocks).
Unlike conventional convolutional neural networks, any two convolutional layers within a Dense Block of the invention are connected: each layer's output becomes the input of all subsequent layers, and each layer's input contains the outputs of all preceding layers. This feature-map reuse reduces the parameter count and computation of the network, preserves shallow features, and mitigates the vanishing- and exploding-gradient problems.
As shown in Fig. 6, in this embodiment a Dense Block uses Batch Normalization + ReLU + 3*3 convolution as its composite function. Dense Blocks are connected by Transition Blocks; a Transition Block is composed of a bottleneck layer and a pooling layer, the 1*1 convolution in the bottleneck layer compressing parameters by reducing the feature-map dimensionality. Because of the dense connection structure, inserting pooling layers directly between the densely connected layers is infeasible, so pooling is instead added between Dense Blocks.
As shown in Fig. 6, the DenseNet network in this embodiment is composed of three Dense Blocks and two Transition Blocks; each Dense Block consists of eight convolutional layers with 3*3 kernels, and each Transition Block consists of a 128-dimensional 1*1 convolution and a pooling layer.
Fig. 8 shows the structure of BLSTM. This embodiment uses BLSTM to process the contextual information of the continuous text sequence in both directions. Compared with a plain recurrent neural network, the LSTM adds three gates, respectively an update gate, a forget gate and an output gate. The formulas are:
c̃^<t> = tanh(W_c[a^<t−1>, x^<t>] + b_c);
Γ_u = σ(W_u[a^<t−1>, x^<t>] + b_u);
Γ_f = σ(W_f[a^<t−1>, x^<t>] + b_f);
Γ_o = σ(W_o[a^<t−1>, x^<t>] + b_o);
c^<t> = Γ_u * c̃^<t> + Γ_f * c^<t−1>;
a^<t> = Γ_o * tanh c^<t>;
wherein c denotes the long-term memory state (memory); a denotes the state (stage); x denotes the input; <t> denotes the current time step t; c^<t> denotes the memory state at time t; σ is the activation function; W_f, W_u, W_c, W_o are weight matrices; b_f, b_u, b_c, b_o are biases. First, the output a^<t−1> of the previous time step and the input x^<t> of the current time step pass through the tanh function to give the intermediate update state c̃^<t>. Γ_u and Γ_f respectively denote the update gate and the forget gate, each with values in [0, 1], to control the sequences remembered in the memory cell; finally the output gate Γ_o yields the state a^<t> at time t.
In the present embodiment an attention mechanism is also incorporated into the BLSTM network, so that each prediction in the time series is computed from the contextual information best suited to the current step.
In step 6, an attention-based CTC transcription method for recognizing text sequences of arbitrary length (CTC incorporating an attention mechanism) decodes the sequence predicted in step 5 to obtain the target text. CTC (Connectionist Temporal Classification) aligns the text image with the label along the time steps through the forward-backward algorithm of dynamic programming, as follows:
(1) Define the forward variable αt(s) and initialize it first:

α1(1) = y<1>(b); α1(2) = y<1>(l1); α1(s) = 0 for s > 2.

Here α1(1) denotes the probability that the first output is the blank label, and α1(2) denotes the probability that the first output is the first character of the true sequence; y<t>(k), the probability that the output at time t is symbol k, is provided by the BLSTM above.
The subsequent recurrence is as follows:

αt(s) = (αt-1(s) + αt-1(s-1)) · y<t>(l's), if l's = b or l's = l's-2;
αt(s) = (αt-1(s) + αt-1(s-1) + αt-1(s-2)) · y<t>(l's), otherwise.
In the formula, αt-1(s) is the forward probability that, over steps 0 to t-1, the output at step t-1 is the s-th symbol of the sequence l'; αt-1(s-1) is the forward probability that the output at step t-1 is the (s-1)-th symbol of l'. Since each prediction can only stay on the same symbol or move to the next symbol at the following step (with an additional skip over a blank when the neighbouring characters differ), summing these terms gives the total forward probability up to time t. b denotes the blank label; l denotes the length of the character sequence in the image, and l' denotes the sequence after blank labels are introduced, of length 2l + 1; l's and l's-2 denote the s-th and (s-2)-th symbols of l'; y<t>(l's), the probability that the output at time t is the s-th symbol of l', is obtained from the BLSTM.
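The initialization and recurrence above can be sketched as follows (a minimal NumPy sketch; the function names and the tiny two-symbol alphabet are illustrative):

```python
import numpy as np

def ctc_forward(y, label, blank=0):
    """Forward variables alpha_t(s) over the augmented sequence
    l' = [b, l_1, b, l_2, ..., l_L, b] (length 2*len(l)+1).
    y[t, k] is the BLSTM probability of symbol k at time t."""
    T = y.shape[0]
    lp = [blank]
    for c in label:
        lp += [c, blank]
    S = len(lp)                          # 2*len(label) + 1
    alpha = np.zeros((T, S))
    alpha[0, 0] = y[0, blank]            # alpha_1(1): first output is blank
    if S > 1:
        alpha[0, 1] = y[0, lp[1]]        # alpha_1(2): first output is l_1
    for t in range(1, T):
        for s in range(S):
            a = alpha[t-1, s]
            if s >= 1:
                a += alpha[t-1, s-1]
            # the skip over a blank is allowed unless l'_s is blank
            # or repeats l'_{s-2}
            if s >= 2 and lp[s] != blank and lp[s] != lp[s-2]:
                a += alpha[t-1, s-2]
            alpha[t, s] = a * y[t, lp[s]]
    return alpha, lp

def ctc_prob(y, label, blank=0):
    """p(l|x): sum of the two terminal states (final blank, final char)."""
    alpha, _ = ctc_forward(y, label, blank)
    return alpha[-1, -1] + alpha[-1, -2]
```

For T = 2 frames with uniform probabilities over {blank, 'a'}, the three paths (a,a), (a,-), (-,a) all collapse to "a", so p = 3 × 0.25 = 0.75, which the recursion reproduces.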
(2) Define the backward variable βt(s) and initialize it:

βT(l') = y<T>(b); βT(l' - 1) = y<T>(ll); βT(s) = 0 for s < l' - 1.

Here y<T>(b) denotes the probability that the output at the final step T is blank, taken as the backward probability βT(l'), i.e. the backward probability of the last blank of l'; y<T>(ll) denotes the probability that the output at step T is the last character of l, taken as βT(l' - 1), i.e. the backward probability of the penultimate symbol of l' (the last character of l).
The subsequent recurrence is as follows:

βt(s) = (βt+1(s) + βt+1(s+1)) · y<t>(l's), if l's = b or l's = l's+2;
βt(s) = (βt+1(s) + βt+1(s+1) + βt+1(s+2)) · y<t>(l's), otherwise.
In the formula, βt+1(s) denotes the backward probability that, from step t+1 to the final step T, the output at step t+1 is the s-th symbol of the sequence; βt+1(s+1) denotes the backward probability that the output at step t+1 is the (s+1)-th symbol. As with the forward probabilities, summing these terms gives the total backward probability. b denotes the blank label; l denotes the length of the character sequence in the image; l' denotes the sequence after blank labels are introduced, of length 2l + 1; l's and l's+2 denote the s-th and (s+2)-th symbols of l'; y<t>(l's), the probability that the output at time t is the s-th symbol of l', is obtained from the BLSTM.
Therefore, the probability of the label passing through symbol s at step t is obtained as:

p(l | x) = Σs αt(s) · βt(s) / y<t>(l's).

In the formula, t denotes the time step; T denotes the total number of steps; l denotes the sequence length; s denotes the s-th symbol of the sequence; y<t>(l's) denotes the probability that the output at time t is the s-th symbol; αt(s) denotes the probability of generating the prefix l1:s along all possible paths up to time t; βt(s) denotes the probability of generating the subsequent sequence ls:l along all possible paths from time t.
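The backward recursion and the identity above (whose value is the same at every t) can be checked with a short self-contained NumPy sketch; here both α and β include the emission at step t, so the identity divides by y<t>(l's) (names are illustrative):

```python
import numpy as np

def ctc_forward_backward(y, label, blank=0):
    """Forward (alpha) and backward (beta) variables over the augmented
    sequence l' = [b, l_1, b, ..., l_L, b]; both include the emission at t."""
    T = y.shape[0]
    lp = [blank]
    for c in label:
        lp += [c, blank]
    S = len(lp)
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0, 0] = y[0, blank]
    alpha[0, 1] = y[0, lp[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t-1, s] + (alpha[t-1, s-1] if s >= 1 else 0.0)
            if s >= 2 and lp[s] != blank and lp[s] != lp[s-2]:
                a += alpha[t-1, s-2]          # skip over a blank
            alpha[t, s] = a * y[t, lp[s]]
    beta[T-1, S-1] = y[T-1, blank]            # last blank of l'
    beta[T-1, S-2] = y[T-1, lp[S-2]]          # last character of l
    for t in range(T-2, -1, -1):
        for s in range(S-1, -1, -1):
            b = beta[t+1, s] + (beta[t+1, s+1] if s+1 < S else 0.0)
            if s+2 < S and lp[s] != blank and lp[s] != lp[s+2]:
                b += beta[t+1, s+2]           # mirror-image skip
            beta[t, s] = b * y[t, lp[s]]
    return alpha, beta, lp

def label_posterior(y, label, t, blank=0):
    """p(l|x) computed at step t from the identity above; because both
    alpha and beta include the emission at t, one factor is divided out."""
    alpha, beta, lp = ctc_forward_backward(y, label, blank)
    return sum(alpha[t, s] * beta[t, s] / y[t, lp[s]] for s in range(len(lp)))
```

Evaluating the sum at different t returns the same total probability p(l | x), which is exactly the property the CTC loss below relies on.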
In step 6, the loss function of CTC is:

L(x, z) = -ln p(z | x);

where p(z | x) denotes the probability of outputting the sequence z given the input x; U denotes the length of the original label sequence; t denotes the time step; α(t, u) denotes the forward probability at node u at time t; β(t, u) denotes the backward probability at node u at time t. The loss is obtained by taking the logarithm of the probability function derived above.
In the present invention an attention mechanism, based on both content and position, is also incorporated into the decoder during training, which improves recognition accuracy and reduces missed, wrong and spurious characters in the OCR process. The CTC decoding structure is shown in Fig. 9. The rationale is that the conventional CTC decoding algorithm weighs all preceding and following information with the same function during decoding, which is clearly unreasonable: when decoding the prediction at time t, information from steps close to t, such as t-1 and t+1, should carry a larger weight, while steps farther from t should have a smaller influence on the prediction. The present invention therefore incorporates an attention mechanism into CTC and adjusts the weights with different functions, making the prediction more precise.
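The patent does not give the exact weighting function; one hypothetical choice consistent with the description (nearby steps weighted more, weight decaying with temporal distance) is an exponential-decay weighting, sketched here with illustrative names and an assumed decay constant tau:

```python
import numpy as np

def locality_weights(T, t, tau=2.0):
    """Hypothetical distance-based attention weights: frames closer to t
    get larger weight (exponential decay in |t'-t|), normalized to sum to 1.
    tau is an assumed decay constant, not a value from the patent."""
    d = np.abs(np.arange(T) - t)
    w = np.exp(-d / tau)
    return w / w.sum()

def attended_context(features, t, tau=2.0):
    """Weighted sum of per-frame features used when predicting step t;
    features has shape (T, feature_dim)."""
    w = locality_weights(features.shape[0], t, tau)
    return w @ features
```

Under this sketch, the context vector feeding the prediction at step t is dominated by frames t-1, t, t+1, matching the intuition stated above.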
In step 7, the CRNN model described in step 5 is trained with the training set of step 1 to obtain the final text recognition model, and the test set is fed into the CRNN model to obtain the test results.
The CRNN network is trained with stochastic gradient descent, and the gradients are computed by the back-propagation algorithm. In particular, during CTC transcription the error is back-propagated with the forward-backward algorithm, and in the BLSTM the error is back-propagated with the BPTT algorithm.
In step 8, the whole text detection and recognition network is trained, completing the detection and recognition of continuous paragraph text. The recognition error rates of the present embodiment are listed in Table 1 below:
Table 1. Recognition error rates in the embodiment
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person skilled in the art may, without creative effort or by means of software programming, make many modifications and variations according to the concept of the present invention. Therefore, any technical solution that a person skilled in the art can obtain, on the basis of the prior art and under the concept of the present invention, through logical analysis, reasoning or limited experimentation, shall fall within the scope of protection determined by the claims.

Claims (10)

1. A method for detecting and recognizing continuous paragraph text in an image based on fused SegLink and attention-based CRNN processing, characterized by comprising the following steps:
S1: producing a continuous text image data set and dividing it into a training set, a validation set and a test set;
S2: building a SegLink network model under the TensorFlow deep learning framework, detecting text lines of different sizes and aspect ratios by generating feature maps of different scales, and rectifying inclined text lines;
S3: constructing the loss function: weighting and summing the predicted segment confidence function, the link confidence function and the predicted position error function to obtain the overall loss function for optimizing the model;
S4: training the SegLink network model of step S2 with the training set of step S1 to obtain the final text detection model, and testing it with the test set;
S5: building a CRNN model: extracting image features with the densely connected network DenseNet to output feature maps, and predicting each frame of the feature sequence produced by the DenseNet convolutions with a bidirectional long short-term memory network BLSTM that combines the contextual information of the continuous text;
S6: decoding the sequence predicted in step S5 with an attention-based CTC transcription method for recognizing text sequences of arbitrary length, to obtain the target text;
S7: training the CRNN model of step S5 with the training set of step S1 to obtain the final text recognition model, and feeding the test set into the CRNN model to obtain test results;
S8: training the whole text detection and recognition network to complete the detection and recognition of continuous paragraph text.
2. The method for detecting and recognizing continuous paragraph text in an image according to claim 1, characterized in that step S1 further comprises:
generating single-character data set pictures of different font styles with a CycleGAN deep learning model, splicing the single-character data set pictures into semantically meaningful text lines or text paragraphs, and adding noise;
the data set is a foreign-language data set or a Chinese data set, and the font style of the data set is printed or handwritten.
3. The method for detecting and recognizing continuous paragraph text in an image according to claim 1, characterized in that step S2 further comprises:
using the pre-trained VGG16 convolutional neural network as the network backbone, replacing its fully connected layers with convolutional layers, and successively halving the convolution size to generate multi-scale feature maps; the network divides the input text image into two parts, segments (Segment) and links (Link); a segment outlines a part of a text line and indicates its position information; a text line contains multiple segments, and the segments are connected by links;
generating multiple predicted segments and links from the feature maps of different scales with a 3*3 sliding window, and merging the segment and link information on each scale by fusion rules while rejecting redundancy, to obtain the finally predicted text lines.
4. The method for detecting and recognizing continuous paragraph text in an image according to claim 3, characterized in that the position information of a segment is represented by five parameters: the coordinates (x, y), the width w, the height h and the inclination angle θ; the generated multi-scale feature maps comprise six sizes, namely [64, 32, 16, 8, 4, 2]; the parameters (x, y, w, h, θ) of a segment are updated as follows:
x = xs + α·Δxs;
y = ys + α·Δys;
ws = α·exp(Δws);
hs = α·exp(Δhs);
θs = Δθs;
α = λ·wI / wf;

wherein xs, ys respectively denote the abscissa and ordinate of the anchor point, and Δxs, Δys denote the predicted horizontal and vertical offsets of the anchor point; ws, hs respectively denote the width and height of the anchor box, and Δws, Δhs respectively denote the predicted width and height offsets of the anchor box; θs, Δθs respectively denote the rotation angle of the anchor box and its offset; wI, hI respectively denote the width and height of the original image; wf, hf respectively denote the width and height of the feature map; α = λ·wI/wf denotes the size of the receptive field; λ is a weight coefficient;
the fusion method obtains a straight line L by least-squares linear regression such that the distance from the centre coordinates of all segments to the line L is minimal; the centre coordinates are projected onto the line L, and the two farthest projections are denoted (xm, ym) and (xn, yn); half of the widths wm, wn of the segments containing these two points is then added, and the height h takes the mean of the heights of all anchor boxes, giving:

x = (xm + xn) / 2;
y = (ym + yn) / 2;
w = sqrt((xm - xn)^2 + (ym - yn)^2) + (wm + wn) / 2;
h = (1/N)·Σi hi;

the finally detected text box coordinates are (x, y, w, h, θ); wherein N is the number of anchor boxes and hi is the height of the i-th anchor box.
5. The method for detecting and recognizing continuous paragraph text in an image according to claim 1, characterized in that in step S2 the rectification process further comprises:
detecting the straight line on which the text lies in the image by the Hough line transform, computing the inclination angle of the line, and then rotating the image for correction according to the inclination angle.
6. The method for detecting and recognizing continuous paragraph text in an image according to claim 3 or 4, characterized in that in step S3 the loss function is as follows:

L(ys, cs, yl, cl, ŝ, s) = (1/Ns)·Lconf(ys, cs) + λ1·(1/Ns)·Lloc(ŝ, s) + λ2·(1/Nl)·Lconf(yl, cl);

wherein Lconf(ys, cs) denotes the predicted segment confidence function; Lconf(yl, cl) denotes the link confidence function; Lloc(ŝ, s) denotes the predicted position error function; λ1, λ2 denote weight coefficients; Ns, Nl respectively denote the number of segments and the number of links; ys, yl respectively denote the labels of segments and links; cs, cl respectively denote the predicted values of segments and links; ŝ, s respectively denote the predicted segment geometry and its ground-truth value.
7. The method for detecting and recognizing continuous paragraph text in an image according to claim 1, characterized in that in step S5 the network DenseNet comprises several dense blocks (Dense Block) and transition blocks (Transition Block), the Dense Blocks being connected to one another by Transition Blocks;
a Dense Block uses Batch Normalization + ReLU + a 3*3 convolutional layer as its composite function;
any two convolutional layers within a Dense Block are connected;
a Transition Block consists of a bottleneck layer and a pooling layer.
8. The method for detecting and recognizing continuous paragraph text in an image according to claim 1, characterized in that in step S5 the contextual information of the continuous text sequence is processed in both directions with the BLSTM network; three gates are added to the LSTM network, namely an update gate, a forget gate and an output gate, with the following formulas:
Γf = σ(Wf[a<t-1>, x<t>] + bf);
Γu = σ(Wu[a<t-1>, x<t>] + bu);
c̃<t> = tanh(Wc[a<t-1>, x<t>] + bc);
c<t> = Γu * c̃<t> + Γf * c<t-1>;
Γo = σ(Wo[a<t-1>, x<t>] + bo);
a<t> = Γo * tanh c<t>;
wherein σ is the activation function; c denotes the long-term memory state; a denotes the hidden state; x denotes the input; <t> denotes the current time step t; c<t> denotes the memory state at time t; Wf, Wu, Wc, Wo are weight matrices; bf, bu, bc, bo are biases; first the output a<t-1> of the previous step t-1 and the input x<t> of the current step t pass through the tanh function to produce the intermediate update state c̃<t>; Γu and Γf respectively denote the update gate and the forget gate, each taking a value in [0, 1] and controlling what the memory cell retains; finally the output gate Γo yields the state a<t> at time t.
9. The method for detecting and recognizing continuous paragraph text in an image according to claim 1, characterized in that in step S6 the alignment of the text image with the label along the time steps by CTC through the forward-backward algorithm comprises the following process:
defining the forward variable αt(s) and initializing it first:

α1(1) = y<1>(b); α1(2) = y<1>(l1); α1(s) = 0 for s > 2;

wherein α1(1) denotes the probability that the first output is the blank label, α1(2) denotes the probability that the first output is the first character of the true sequence, and y<t>(k) denotes the probability, given by the BLSTM, that the output at time t is symbol k;
the subsequent recurrence is as follows:

αt(s) = (αt-1(s) + αt-1(s-1)) · y<t>(l's), if l's = b or l's = l's-2;
αt(s) = (αt-1(s) + αt-1(s-1) + αt-1(s-2)) · y<t>(l's), otherwise;

wherein αt-1(s) is the forward probability that, over steps 0 to t-1, the output at step t-1 is the s-th symbol of the sequence l'; αt-1(s-1) is the forward probability that the output at step t-1 is the (s-1)-th symbol of l'; summing αt-1(s) and αt-1(s-1) gives the total forward probability up to time t; b denotes the blank label; l denotes the length of the character sequence in the image; l' denotes the sequence after blank labels are introduced, of length 2l + 1; l's, l's-2 respectively denote the s-th and (s-2)-th symbols of l'; y<t>(l's) denotes the probability that the output at time t is the s-th symbol of l';
defining the backward variable βt(s) and initializing it:

βT(l') = y<T>(b); βT(l' - 1) = y<T>(ll); βT(s) = 0 for s < l' - 1;

wherein y<T>(b) denotes the probability that the output at the final step T is blank, taken as the backward probability βT(l') of the last blank of l'; y<T>(ll) denotes the probability that the output at step T is the last character of l, taken as βT(l' - 1), the backward probability of the penultimate symbol of l';
the subsequent recurrence is as follows:

βt(s) = (βt+1(s) + βt+1(s+1)) · y<t>(l's), if l's = b or l's = l's+2;
βt(s) = (βt+1(s) + βt+1(s+1) + βt+1(s+2)) · y<t>(l's), otherwise;

wherein βt+1(s) denotes the backward probability that, from step t+1 to the final step T, the output at step t+1 is the s-th symbol of the sequence; βt+1(s+1) denotes the backward probability that the output at step t+1 is the (s+1)-th symbol; summing βt+1(s) and βt+1(s+1) gives the total backward probability; b denotes the blank label; l denotes the length of the character sequence in the image; l' denotes the sequence after blank labels are introduced, of length 2l + 1; l's, l's+2 respectively denote the s-th and (s+2)-th symbols of l'; y<t>(l's) denotes the probability that the output at time t is the s-th symbol of l';
further, the probability of the label passing through symbol s at step t is obtained as:

p(l | x) = Σs αt(s) · βt(s) / y<t>(l's);

wherein t denotes the time step; T denotes the total number of steps; l denotes the sequence length; s denotes the s-th symbol of the sequence; y<t>(l's) denotes the probability that the output at time t is the s-th symbol; αt(s) denotes the probability of generating the prefix l1:s along all possible paths up to time t; βt(s) denotes the probability of generating the subsequent sequence ls:l along all possible paths from time t.
10. The method for detecting and recognizing continuous paragraph text in an image according to claim 9, characterized in that in step S6 the loss function of CTC is:

L(x, z) = -ln p(z | x);

wherein p(z | x) denotes the probability of outputting the sequence z given the input x; U denotes the length of the original label sequence; t denotes the time step; α(t, u) denotes the forward probability at node u at time t; β(t, u) denotes the backward probability at node u at time t.
CN201910688854.6A 2019-07-29 2019-07-29 Method for detecting and recognizing continuous paragraph text in an image Pending CN110399845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910688854.6A CN110399845A (en) 2019-07-29 2019-07-29 Method for detecting and recognizing continuous paragraph text in an image


Publications (1)

Publication Number Publication Date
CN110399845A true CN110399845A (en) 2019-11-01

Family

ID=68326437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910688854.6A Pending CN110399845A (en) Method for detecting and recognizing continuous paragraph text in an image

Country Status (1)

Country Link
CN (1) CN110399845A (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214382A (en) * 2018-07-16 2019-01-15 顺丰科技有限公司 A kind of billing information recognizer, equipment and storage medium based on CRNN
CN109272048A (en) * 2018-09-30 2019-01-25 北京工业大学 A kind of mode identification method based on depth convolutional neural networks
CN109408776A (en) * 2018-10-09 2019-03-01 西华大学 A kind of calligraphy font automatic generating calculation based on production confrontation network
CN109726657A (en) * 2018-12-21 2019-05-07 万达信息股份有限公司 A kind of deep learning scene text recognition sequence method
CN109800749A (en) * 2019-01-17 2019-05-24 湖南师范大学 A kind of character recognition method and device
CN109886174A (en) * 2019-02-13 2019-06-14 东北大学 A kind of natural scene character recognition method of warehouse shelf Sign Board Text region
CN109919060A (en) * 2019-02-26 2019-06-21 上海七牛信息技术有限公司 A kind of identity card content identifying system and method based on characteristic matching
CN109993803A (en) * 2019-02-25 2019-07-09 复旦大学 The intellectual analysis and evaluation method of city tone
CN110032998A (en) * 2019-03-18 2019-07-19 华南师范大学 Character detecting method, system, device and the storage medium of natural scene picture
CN110059694A (en) * 2019-04-19 2019-07-26 山东大学 The intelligent identification Method of lteral data under power industry complex scene


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEX GRAVES et al.: "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks", Proceedings of the 23rd International Conference on Machine Learning *
BAI Xiang et al.: "Scene text detection and recognition based on deep learning", Scientia Sinica *
SHI Baoguang: "Research on deep-learning-based text detection and recognition in natural scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738262A (en) * 2019-10-16 2020-01-31 北京市商汤科技开发有限公司 Text recognition method and related product
CN110969154A (en) * 2019-11-29 2020-04-07 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN111191649A (en) * 2019-12-31 2020-05-22 上海眼控科技股份有限公司 Method and equipment for identifying bent multi-line text image
CN111275046B (en) * 2020-01-10 2024-04-16 鼎富智能科技有限公司 Character image recognition method and device, electronic equipment and storage medium
CN111275046A (en) * 2020-01-10 2020-06-12 中科鼎富(北京)科技发展有限公司 Character image recognition method and device, electronic equipment and storage medium
CN111265317A (en) * 2020-02-10 2020-06-12 上海牙典医疗器械有限公司 Tooth orthodontic process prediction method
CN111680684B (en) * 2020-03-16 2023-09-05 广东技术师范大学 Spine text recognition method, device and storage medium based on deep learning
CN111414908A (en) * 2020-03-16 2020-07-14 湖南快乐阳光互动娱乐传媒有限公司 Method and device for recognizing caption characters in video
CN111680684A (en) * 2020-03-16 2020-09-18 广东技术师范大学 Method, device and storage medium for recognizing spine text based on deep learning
CN111310762A (en) * 2020-03-16 2020-06-19 天津得迈科技有限公司 Intelligent medical bill identification method based on Internet of things
CN111414908B (en) * 2020-03-16 2023-08-29 湖南快乐阳光互动娱乐传媒有限公司 Method and device for recognizing caption characters in video
CN111428715A (en) * 2020-03-26 2020-07-17 广州市南方人力资源评价中心有限公司 Character recognition method based on neural network
CN111539309A (en) * 2020-04-21 2020-08-14 广州云从鼎望科技有限公司 Data processing method, system, platform, equipment and medium based on OCR
CN113553885A (en) * 2020-04-26 2021-10-26 复旦大学 Natural scene text recognition method based on generation countermeasure network
CN111612045A (en) * 2020-04-29 2020-09-01 杭州电子科技大学 Universal method for acquiring target detection data set
CN111612045B (en) * 2020-04-29 2023-06-23 杭州电子科技大学 Universal method for acquiring target detection data set
CN111626292A (en) * 2020-05-09 2020-09-04 北京邮电大学 Character recognition method of building indication mark based on deep learning technology
CN111626292B (en) * 2020-05-09 2023-06-30 北京邮电大学 Text recognition method of building indication mark based on deep learning technology
CN111738255A (en) * 2020-05-27 2020-10-02 复旦大学 Guideboard text detection and recognition algorithm based on deep learning
CN111967391A (en) * 2020-08-18 2020-11-20 清华大学 Text recognition method and computer-readable storage medium for medical laboratory test reports
CN112052853A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Text positioning method of handwritten meteorological archive data based on deep learning
CN112052853B (en) * 2020-09-09 2024-02-02 国家气象信息中心 Text positioning method of handwriting meteorological archive data based on deep learning
CN112115264B (en) * 2020-09-14 2024-03-22 中科苏州智能计算技术研究院 Text classification model adjustment method for data distribution change
CN112115264A (en) * 2020-09-14 2020-12-22 中国科学院计算技术研究所苏州智能计算产业技术研究院 Text classification model adjusting method facing data distribution change
CN111931773A (en) * 2020-09-24 2020-11-13 北京易真学思教育科技有限公司 Image recognition method, device, equipment and storage medium
CN111931773B (en) * 2020-09-24 2022-01-28 北京易真学思教育科技有限公司 Image recognition method, device, equipment and storage medium
CN112418225B (en) * 2020-10-16 2023-07-21 中山大学 Offline text recognition method for address scene recognition
CN112418225A (en) * 2020-10-16 2021-02-26 中山大学 Offline character recognition method for address scene recognition
CN112508023A (en) * 2020-10-27 2021-03-16 重庆大学 Deep learning-based end-to-end identification method for code-spraying characters of parts
CN112528776A (en) * 2020-11-27 2021-03-19 京东数字科技控股股份有限公司 Text line correction method and device
CN112528776B (en) * 2020-11-27 2024-04-09 京东科技控股股份有限公司 Text line correction method and device
CN112183538B (en) * 2020-11-30 2021-03-02 华南师范大学 Manchu recognition method and system
CN112183538A (en) * 2020-11-30 2021-01-05 华南师范大学 Manchu recognition method and system
CN112381175A (en) * 2020-12-05 2021-02-19 中国人民解放军32181部队 Circuit board identification and analysis method based on image processing
CN112560842B (en) * 2020-12-07 2021-10-22 马上消费金融股份有限公司 Information identification method, device, equipment and readable storage medium
CN112560842A (en) * 2020-12-07 2021-03-26 马上消费金融股份有限公司 Information identification method, device, equipment and readable storage medium
CN112528980B (en) * 2020-12-16 2022-02-15 北京华宇信息技术有限公司 OCR recognition result correction method and terminal and system thereof
CN112528980A (en) * 2020-12-16 2021-03-19 北京华宇信息技术有限公司 OCR recognition result correction method and terminal and system thereof
WO2022147965A1 (en) * 2021-01-09 2022-07-14 江苏拓邮信息智能技术研究院有限公司 Arithmetic question marking system based on mixnet-yolov3 and convolutional recurrent neural network (crnn)
EP4047519A1 (en) 2021-02-22 2022-08-24 Carl Zeiss Vision International GmbH Devices and methods for processing eyeglass prescriptions
WO2022175511A1 (en) 2021-02-22 2022-08-25 Carl Zeiss Vision International Gmbh Devices and methods for processing eyeglass prescriptions
CN112818951B (en) * 2021-03-11 2023-11-21 南京大学 Ticket identification method
CN112818951A (en) * 2021-03-11 2021-05-18 南京大学 Ticket identification method
CN112966678A (en) * 2021-03-11 2021-06-15 南昌航空大学 Text detection method and system
CN112862024A (en) * 2021-04-28 2021-05-28 明品云(北京)数据科技有限公司 Text recognition method and system
CN113516124B (en) * 2021-05-29 2023-08-11 大连民族大学 Computer-vision-based algorithm for recognizing electricity consumption readings on electric energy meters
CN113516124A (en) * 2021-05-29 2021-10-19 大连民族大学 Computer-vision-based algorithm for recognizing electricity consumption information on electric energy meters
CN113326842A (en) * 2021-06-01 2021-08-31 武汉理工大学 Financial form character recognition method
CN113435449B (en) * 2021-08-03 2023-08-22 全知科技(杭州)有限责任公司 OCR image character recognition and paragraph output method based on deep learning
CN113435449A (en) * 2021-08-03 2021-09-24 全知科技(杭州)有限责任公司 OCR image character recognition and paragraph output method based on deep learning
CN114155530A (en) * 2021-11-10 2022-03-08 北京中科闻歌科技股份有限公司 Text recognition and question-answering method, device, equipment and medium
CN114140803B (en) * 2022-01-30 2022-06-17 杭州实在智能科技有限公司 Document single word coordinate detection and correction method and system based on deep learning
CN114140803A (en) * 2022-01-30 2022-03-04 杭州实在智能科技有限公司 Document single word coordinate detection and correction method and system based on deep learning
CN114495114A (en) * 2022-04-18 2022-05-13 华南理工大学 Text sequence identification model calibration method based on CTC decoder

Similar Documents

Publication Publication Date Title
CN110399845A (en) Method for detecting and recognizing continuous paragraph text in images
CN107368831B (en) Method for recognizing English words and digits in natural scene images
Álvaro et al. An integrated grammar-based approach for mathematical expression recognition
Li et al. Improving attention-based handwritten mathematical expression recognition with scale augmentation and drop attention
Naz et al. Urdu Nasta’liq text recognition system based on multi-dimensional recurrent neural network and statistical features
Ray et al. Text recognition using deep BLSTM networks
CN111259930A (en) General target detection method of self-adaptive attention guidance mechanism
CN109492630A (en) Deep-learning-based method for detecting and locating text regions in financial industry images
CN110427937A (en) Deep-learning-based method for correcting inclined license plates and recognizing variable-length license plates
CN112818159A (en) Image description text generation method based on generation countermeasure network
Chen et al. Simultaneous script identification and handwriting recognition via multi-task learning of recurrent neural networks
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
Liu et al. ASTS: A unified framework for arbitrary shape text spotting
CN112149665A (en) High-performance multi-scale target detection method based on deep learning
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
Inunganbi et al. Handwritten Meitei Mayek recognition using three‐channel convolution neural network of gradients and gray
CN113378919B (en) Image description generation method for fusing visual sense and enhancing multilayer global features
CN113420833A (en) Visual question-answering method and device based on question semantic mapping
Azizah et al. Tajweed-YOLO: Object Detection Method for Tajweed by Applying HSV Color Model Augmentation on Mushaf Images
CN110738123B (en) Method and device for identifying densely displayed commodities
Budiwati et al. Japanese character (Kana) pattern recognition application using neural network
Vankadaru et al. Text Identification from Handwritten Data using Bi-LSTM and CNN with FastAI
Echi Attention-based CNN-ConvLSTM for Handwritten Arabic Word Extraction
Yu et al. An efficient prototype-based model for handwritten text recognition with multi-loss fusion
Bi et al. Chinese character captcha sequential selection system based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
     Application publication date: 20191101