CN110135419A - An end-to-end text recognition method for natural scenes - Google Patents

An end-to-end text recognition method for natural scenes

Info

Publication number
CN110135419A
Authority
CN
China
Prior art keywords
natural scene
neighbour
identification
optimization algorithm
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910371620.9A
Other languages
Chinese (zh)
Other versions
CN110135419B (en)
Inventor
李武军
陈雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201910371620.9A
Publication of CN110135419A
Application granted
Publication of CN110135419B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 - Distances to prototypes
    • G06F18/24143 - Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 - Scene text, e.g. street names
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an end-to-end text recognition method for natural scenes. The method comprises training a framework with natural scene images and ground-truth annotations, and predicting the text regions and text content in natural scene images. In the training stage: collect natural scene images that contain text; build a dataset containing text positions and content; define a standard end-to-end text recognition framework; train the detection part using the ground-truth detection annotations; optimize the detected regions with the neighbor-correlation boundary optimization algorithm; feed the optimized detection regions into the recognition part to train the recognition parameters; and save the trained framework parameters to the data platform. In the test stage: read the trained framework parameters; input a test image; detect text regions in the detection stage; optimize the detected regions with the neighbor-correlation boundary optimization algorithm; and send the optimized detection regions to the recognition part for text recognition.

Description

An end-to-end text recognition method for natural scenes
Technical field
The present invention relates to an end-to-end text recognition method for natural scenes based on a neighbor-correlation boundary optimization algorithm. It concerns end-to-end text recognition in natural scenes and is especially suited to the problem of recognition failures caused by inaccurate detection-region boundaries.
Background
The goal of end-to-end text recognition in natural scenes is, given a natural scene image containing text regions, both to detect the positions of the text in the image and to recognize the text content at those positions. In this task, the accuracy of the recognition stage depends heavily on the accuracy of the detection stage: only when the detection stage accurately frames all letters of a text can the recognition stage output an accurate result. In particular, existing end-to-end text frameworks predict boundaries inaccurately for long text or large text regions, which makes the subsequent recognition task harder.
Commonly used post-processing algorithms, such as Non-Maximum Suppression (NMS) or Locality-Aware NMS (LANMS), can only merge adjacent regions that have a large intersection-over-union (IoU) and place no requirement on boundary accuracy. The detection process may therefore produce inaccurate boundaries, which degrades the recognition result.
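To make the limitation concrete, the following is a minimal sketch of standard greedy NMS on axis-aligned boxes. It is an illustration only, not code from the patent; the box format `(x_min, y_min, x_max, y_max)`, the function names, and the 0.5 threshold are assumptions. Note that the procedure only decides which boxes survive based on score and overlap, and never adjusts the edges of a surviving box.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept_indices = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

The kept boxes are returned exactly as they were predicted, so any boundary error in the highest-scoring box passes straight through to recognition.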
Summary of the invention
Purpose of the invention: current end-to-end text recognition frameworks place no explicit requirement on the boundary accuracy of detection results. For long text and large text, the boundaries produced by existing frameworks are usually inaccurate and may not even frame the text completely, which makes the recognition result inaccurate. To address this problem, the present invention designs a boundary optimization algorithm based on neighbor correlation and an end-to-end text recognition deep learning framework that uses the algorithm. The method describes the framework structure, the training process, and the test process. It thereby solves the problem of inaccurate boundary prediction and improves the precision of the end-to-end task.
Technical solution: an end-to-end text recognition method for natural scenes, comprising training an end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm, and a test process in which the trained framework performs end-to-end recognition of text regions and their content in natural scenes.
The specific steps for training the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm are:
Step 100: input natural scene images, ground-truth annotated regions, and ground-truth annotated strings to the data processing platform;
Step 101: preprocess the input natural scene images with operations such as random rotation, sampling, and normalization;
Step 102: generate a ground-truth class map and a ground-truth geometry map from the ground-truth annotated regions, to serve as supervision for training;
Step 103: initialize the weights of the shared feature part, the detection part, and the recognition part of the whole framework;
Step 104: on the data processing platform, train the whole framework end to end using the natural scene images, ground-truth class maps, ground-truth geometry maps, and ground-truth annotated strings. The steps are: the natural scene image first passes through the shared feature part to give a shared feature map; the detection part generates detection results from the shared feature map; the neighbor-correlation boundary optimization algorithm optimizes the detection results; bilinear interpolation applied to the shared feature map samples the detection regions to give recognition features; and the recognition part produces the recognition result from the input recognition features;
Step 105: export the weights of each part of the framework and save them to the storage system of the data processing platform.
The trained end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm is used to perform end-to-end recognition of text regions and their content in natural scenes. The specific test steps are:
Step 200: input a natural scene image to the data processing platform;
Step 201: read the saved weights of each part of the trained framework, including the weights of the shared feature part, the detection part, and the recognition part;
Step 202: the natural scene image first passes through the shared feature part to give a shared feature map; the detection part generates detection results from the shared feature map; the neighbor-correlation boundary optimization algorithm optimizes the detection results; bilinear interpolation applied to the shared feature map samples the detection regions to give recognition features; and the recognition part produces the recognition result from the input recognition features.
In the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm, the shared feature part extracts shared features using a U-shaped architecture based on a residual neural network. The U-shaped architecture obtains the shared features by connecting a first encoding module and a first decoding module in sequence.
The first encoding module comprises multiple layers of convolutional structures and downsampling structures between the convolutional structures of adjacent layers. Each downsampling structure downsamples the feature map output by the upper convolutional structure of an adjacent pair and inputs the downsampled feature map into the lower convolutional structure of that pair.
The first decoding module comprises multiple layers of convolutional structures and upsampling structures between the convolutional structures of adjacent layers. Each upsampling structure upsamples the feature map output by the lower convolutional structure of an adjacent pair and inputs the upsampled feature map into the upper convolutional structure of that pair.
The detection part applies several convolutions to the shared features to generate the predicted class map and the predicted geometry map respectively.
The boundary optimization algorithm based on neighbor correlation takes into account that a point on the feature map predicts nearby boundaries accurately. Its inputs are the class map F_score and the geometry map F_geo predicted by the detection part, a single text region obtained from the class map and the geometry map, a score threshold s_t, and a confidence function f_c that depends on a distance threshold r_t. The steps are:
Step 501: for the single text region, obtain the set of points that belong only to that region and whose class probability on the class map F_score is greater than s_t;
Step 502: for each point p in the point set, compute the distances from the point to the top, right, bottom, and left sides of the region;
Step 503: compute the confidence of the point from these distances and the confidence function f_c;
Step 504: for each point p in the point set, use the geometry map F_geo to compute the region predicted by that point itself;
Step 505: from the confidences of all points in the point set and the regions they predict, compute the final region by weighted averaging.
In the weighted averaging that computes the final region, each region is a quadrilateral described by its vertex coordinates, where i = 1, 2, 3, 4 denote the top-left, top-right, bottom-right, and bottom-left vertices of the region respectively. The weighting of the coordinates can then be described by the following formula:
The confidence function f_c in the algorithm may take the following form:
The recognition part obtains the predicted text string by connecting a second encoding module and a second decoding module in sequence. The second encoding module comprises multiple layers of convolutional structures and downsampling structures between adjacent convolutional structures; the second decoding module uses a structure based on a long short-term memory (LSTM) neural network.
The bilinear interpolation sampling part, for a given detection result region, finds the corresponding position on the shared feature map and applies bilinear interpolation sampling to it to obtain the recognition feature map.
Beneficial effects: compared with the prior art, the end-to-end text recognition method based on the neighbor-correlation boundary optimization algorithm provided by the invention exploits the fact that points on the feature map predict nearby boundaries accurately. This improves the precision of the detection boundaries and thereby improves the result of the end-to-end task.
Brief description of the drawings
Fig. 1 is a flowchart of the neighbor-correlation boundary optimization algorithm implemented by the present invention;
Fig. 2 is a schematic diagram of the first decoding module and the U-shaped network in the shared feature layer of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm designed by the present invention;
Fig. 3 is a flowchart of the training process of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm designed by the present invention;
Fig. 4 is a flowchart of training the framework with the specific learning algorithm used;
Fig. 5 is a flowchart of the test process of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm designed by the present invention.
Detailed description of the embodiments
The present invention is further illustrated below with reference to specific embodiments. These embodiments serve only to illustrate the invention and do not limit its scope. After reading the present disclosure, modifications by those skilled in the art to various equivalent forms of the invention fall within the scope defined by the claims appended to this application.
The end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm is structured into several parts: the shared feature part, the detection part, the boundary optimization algorithm part, the bilinear interpolation sampling part, and the recognition part.
The shared feature part may extract shared features using a U-shaped architecture based on a residual neural network. The U-shaped architecture obtains the shared features by connecting a first encoding module and a first decoding module in sequence. The first encoding module comprises multiple layers of convolutional structures and downsampling structures between the convolutional structures of adjacent layers; each downsampling structure downsamples the feature map output by the upper convolutional structure of an adjacent pair and inputs the downsampled feature map into the lower convolutional structure of that pair. The first decoding module comprises multiple layers of convolutional structures and upsampling structures between the convolutional structures of adjacent layers; each upsampling structure upsamples the feature map output by the lower convolutional structure of an adjacent pair and inputs the upsampled feature map into the upper convolutional structure of that pair.
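As a rough illustration of how feature-map sizes move through such a U-shaped encoder and decoder, the sketch below traces shapes only and performs no convolution. The factor-of-2 sampling, the channel widths, the channel-wise concatenation with the encoder output of the same resolution, and the function name `trace_u_shape` are all assumptions made for the example, since the layer table is not reproduced in this text.

```python
from typing import List, Tuple


def trace_u_shape(height: int, width: int,
                  channels: Tuple[int, ...] = (64, 128, 256, 512)):
    """Trace (height, width, channels) through a hypothetical U-shaped network.

    Encoder: each stage halves the spatial size (downsampling structure).
    Decoder: each stage doubles the spatial size (upsampling structure),
    concatenates the encoder output of the same resolution, then a
    convolution reduces the channels to that encoder stage's width.
    """
    encoder: List[Tuple[int, int, int]] = []
    h, w = height, width
    for c in channels:
        h, w = h // 2, w // 2
        encoder.append((h, w, c))

    decoder: List[Tuple[int, int, int, int]] = []
    h, w, c = encoder[-1]
    for skip_h, skip_w, skip_c in reversed(encoder[:-1]):
        h, w = h * 2, w * 2                  # upsampling
        assert (h, w) == (skip_h, skip_w)    # sizes must match to concatenate
        merged = c + skip_c                  # channel-wise concatenation
        c = skip_c                           # convolution reduces channels
        decoder.append((h, w, merged, c))
    return encoder, decoder


encoder_shapes, decoder_shapes = trace_u_shape(512, 512)
```

The internal assertion holds only when the input size is divisible by 2 for every encoder stage, which is one reason inputs to such networks are often resized to a multiple of a power of two.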
The detection part applies several convolutions to the shared features to generate the predicted class map and the predicted geometry map respectively.
The core idea of the boundary optimization algorithm based on neighbor correlation is that the prediction of a given boundary uses only the points near that boundary as high-confidence points in a weighted average. The process is shown in Fig. 1. The inputs are the class map F_score and the geometry map F_geo predicted by the detection part, a single text region obtained from the class map and the geometry map, a score threshold s_t, and a confidence function f_c that depends on a distance threshold r_t. The steps are:
For the single text region, obtain the set of points that belong only to that region and whose class probability on the class map F_score is greater than s_t;
For each point p in the point set, compute the distances from the point to the top, right, bottom, and left sides of the region;
Compute the confidence of the point from these distances and the confidence function f_c;
For each point p in the point set, use the geometry map F_geo to compute the region predicted by that point itself;
From the confidences of all points in the point set and the regions they predict, compute the final region by weighted averaging.
In the weighted averaging that computes the final region, each region is a quadrilateral described by its vertex coordinates, where i = 1, 2, 3, 4 denote the top-left, top-right, bottom-right, and bottom-left vertices of the region respectively. The weighting of the coordinates can then be described by the following formula:
The confidence function f_c may take the following form:
The threshold parameters can be chosen according to the practical problem; for example, s_t = 0.7 and r_t = 0.01.
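The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's formula: regions are treated as axis-aligned boxes `(top, right, bottom, left)` rather than general quadrilaterals; the confidence function is assumed to be a hard indicator that equals 1 when a point lies within `r_t` times the region extent of a side and 0 otherwise; and a side with no confident points keeps its initial position. The names `confidence` and `refine_box` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed simplification: axis-aligned region as (top, right, bottom, left).
Box = Tuple[float, float, float, float]


def confidence(dist: float, extent: float, r_t: float) -> float:
    """Hypothetical f_c: 1 if the point is within r_t * extent of the side."""
    return 1.0 if dist <= r_t * extent else 0.0


def refine_box(region: Box,
               points: Sequence[Tuple[float, float]],
               scores: Sequence[float],
               predicted: Sequence[Box],
               s_t: float = 0.7,
               r_t: float = 0.01) -> Box:
    """Refine each side of `region` with a confidence-weighted average of the
    per-point predictions, so that each side is decided by nearby points."""
    top, right, bottom, left = region
    height, width = bottom - top, right - left
    extents = (height, width, height, width)
    sums = [0.0, 0.0, 0.0, 0.0]
    weights = [0.0, 0.0, 0.0, 0.0]
    for (x, y), score, pred in zip(points, scores, predicted):
        # Keep points inside the region with class probability above s_t.
        if score <= s_t or not (left <= x <= right and top <= y <= bottom):
            continue
        # Distances to the top, right, bottom, and left sides.
        dists = (y - top, right - x, bottom - y, x - left)
        for k in range(4):
            w = confidence(dists[k], extents[k], r_t)
            sums[k] += w * pred[k]
            weights[k] += w
    # Sides with no confident point fall back to the initial region.
    return tuple(sums[k] / weights[k] if weights[k] > 0 else region[k]
                 for k in range(4))


region = (0.0, 100.0, 20.0, 0.0)
points = [(5.0, 10.0), (95.0, 10.0), (50.0, 2.0), (50.0, 18.0)]
scores = [0.9, 0.9, 0.9, 0.5]
predicted = [(0.0, 80.0, 20.0, 0.0), (1.0, 104.0, 19.0, 10.0),
             (-1.0, 90.0, 18.0, 5.0), (0.0, 100.0, 25.0, 0.0)]
refined = refine_box(region, points, scores, predicted, s_t=0.7, r_t=0.25)
```

In this toy example the point near the left end predicts the right side poorly (80 instead of about 104), but it contributes nothing to the right side because its confidence there is zero. This is the intended effect for long text: far-away points no longer drag a boundary toward their less reliable estimates.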
The recognition part obtains the predicted text string by connecting a second encoding module and a second decoding module in sequence. The second encoding module comprises multiple layers of convolutional structures and downsampling structures between adjacent convolutional structures; the second decoding module uses a structure based on a long short-term memory (LSTM) neural network.
The bilinear interpolation sampling part, for a given detection result region, finds the corresponding position on the shared feature map and applies bilinear interpolation sampling to it to obtain the recognition feature map.
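A minimal sketch of bilinear interpolation sampling on a single-channel feature map is shown below. It assumes an axis-aligned region given as `(x_min, y_min, x_max, y_max)` in feature-map coordinates and a fixed output grid; a real implementation would handle rotated quadrilaterals and many channels. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bilinear_sample(fmap: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Value of `fmap` at the real-valued position (x, y), interpolated from
    the four surrounding grid points. Coordinates are clamped to the map."""
    height, width = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    dx, dy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_region(fmap: Sequence[Sequence[float]],
                  box: Tuple[float, float, float, float],
                  out_h: int, out_w: int) -> List[List[float]]:
    """Resample the region `box` of `fmap` onto an out_h x out_w grid."""
    x_min, y_min, x_max, y_max = box
    rows: List[List[float]] = []
    for i in range(out_h):
        fy = i / (out_h - 1) if out_h > 1 else 0.5
        y = y_min + (y_max - y_min) * fy
        row: List[float] = []
        for j in range(out_w):
            fx = j / (out_w - 1) if out_w > 1 else 0.5
            x = x_min + (x_max - x_min) * fx
            row.append(bilinear_sample(fmap, x, y))
        rows.append(row)
    return rows


feature_map = [[0.0, 1.0], [2.0, 3.0]]
center_value = bilinear_sample(feature_map, 0.5, 0.5)
strip = sample_region(feature_map, (0.0, 0.0, 1.0, 1.0), 1, 3)
```

Because the output grid has a fixed size regardless of the region size, detection regions of different shapes all yield recognition features of a uniform shape for the recognition part.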
Table 1 shows the first encoding module of the shared convolutional layers of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm. The module consists of a series of multi-layer convolutional structures and downsampling structures between the convolutional structures of adjacent layers. In the table, the output size is the spatial size of the feature map; [n × n, m] denotes a convolution kernel of size n × n with m channels; the residual convolution blocks of layers 2, 3, 4, and 5 may each be repeated 3 times.
Table 1
Fig. 2 shows the first decoding module and the U-shaped network of the shared convolutional layers of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm. The decoding module comprises multiple layers of convolutional structures and upsampling structures between the convolutional structures of adjacent layers. The U-shaped network obtains the shared features by connecting the first encoding module and the first decoding module in sequence. In the figure, the left side of the U-shaped network is the first encoding module and the right side is the first decoding module; conv, concat, and upsampling denote convolution, channel-wise concatenation, and upsampling respectively.
Table 2 shows the second encoding module of the recognition part of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm. The module consists of a series of multi-layer convolutional structures and downsampling structures between the convolutional structures of adjacent layers. In the table, input, conv, and pool denote the input layer, convolutional layers, and pooling layers respectively.
Table 2
The second decoding module of the recognition part of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm may use a structure based on a bidirectional long short-term memory (LSTM) neural network, which takes the recognition features as input and produces the predicted string.
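The text does not say how the per-step outputs of the recurrent decoder are turned into a string. Since the training description uses a CTC loss for the text string, one common choice is greedy CTC decoding, sketched below as an assumption: take the most likely label at each time step, collapse consecutive repeats, and drop the blank label. The label layout (blank at index 0, characters from index 1) and the function name are hypothetical.

```python
from typing import Sequence


def ctc_greedy_decode(frame_labels: Sequence[int], alphabet: str,
                      blank: int = 0) -> str:
    """Collapse repeated labels and remove blanks from a per-frame label
    sequence. Label k > 0 maps to alphabet[k - 1]; label `blank` is skipped."""
    chars = []
    previous = blank
    for label in frame_labels:
        if label != blank and label != previous:
            chars.append(alphabet[label - 1])
        previous = label
    return "".join(chars)


# Frames "a a - a b b -" decode to "aab": the blank separates the two a's.
decoded = ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0], "ab")
```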
Fig. 3 is a flowchart of the training process of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm. The training process is as follows. When training starts, the framework first initializes the parameters (weights) of the shared feature part, the detection part, and the recognition part. After a series of corresponding natural scene images, ground-truth region positions, and ground-truth text strings are input to the data processing platform, the input images are preprocessed with operations such as random rotation, sampling, and normalization. A ground-truth class map and a ground-truth geometry map are generated from the ground-truth region positions. The shared feature layer produces shared features from the input natural scene image. The shared features pass through the detection part to give a predicted class map and a predicted geometry map, from which detection regions are obtained. The boundary optimization algorithm is applied to the detection regions to give boundary-optimized detection regions. According to the optimized detection regions, bilinear interpolation sampling is applied to the shared features to obtain recognition features. The recognition features pass through the recognition part to give the predicted text string. Losses are computed between the predicted and ground-truth class maps, between the predicted and ground-truth geometry maps, and between the predicted and ground-truth text strings; the gradients are back-propagated and the parameters are updated. Training continues in this way until a termination condition is reached (for example, the number of update rounds exceeds a threshold). The trained parameters are then stored, and the process ends.
Fig. 4 is a flowchart of training the framework with the specific learning algorithm used. The steps are as follows. When training starts, the parameters of each part of the framework are initialized. Natural scene images, ground-truth region positions, and ground-truth text strings are input. The framework generates the ground-truth class map and ground-truth geometry map from the ground-truth region positions. The framework processes the natural scene image and generates the predicted class map, predicted geometry map, and predicted text string. The framework measures the loss between the ground-truth and predicted class maps with a cross-entropy loss function, the loss between the ground-truth and predicted geometry maps with an intersection-over-union (IoU) loss function and a cosine loss function, and the loss between the ground-truth and predicted text strings with a CTC loss function. The framework computes the overall loss, propagates the gradients back with the back-propagation algorithm, and updates the parameters of each part with the SGD algorithm. If a termination condition is reached (for example, the number of update rounds exceeds a threshold), the parameters are stored and training ends. Otherwise, new natural scene images, ground-truth region positions, and ground-truth text strings are input and a new round of training begins.
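The detection-side losses named above can be sketched per pixel as follows. The exact formulas are not given in this text, so the forms below are assumptions borrowed from formulations commonly used in scene-text detectors: binary cross-entropy for the class map, a negative-log IoU loss on the four point-to-side distances, and `1 - cos` of the angle difference for the cosine term. The weights `lambda_geo` and `lambda_theta` are hypothetical, and the CTC loss is omitted because it is normally taken from a deep learning library.

```python
import math
from typing import Tuple

Dists = Tuple[float, float, float, float]  # (top, right, bottom, left) distances


def class_loss(target: float, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a ground-truth label and a predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def iou_loss(true_d: Dists, pred_d: Dists, eps: float = 1e-7) -> float:
    """-log(IoU) of the two boxes implied by one pixel's distances to the sides."""
    t, r, b, l = true_d
    tp, rp, bp, lp = pred_d
    area_true = (t + b) * (l + r)
    area_pred = (tp + bp) * (lp + rp)
    inter = (min(t, tp) + min(b, bp)) * (min(l, lp) + min(r, rp))
    union = area_true + area_pred - inter
    return -math.log((inter + eps) / (union + eps))


def angle_loss(theta_true: float, theta_pred: float) -> float:
    """Cosine loss on the rotation angle: zero when the angles agree."""
    return 1.0 - math.cos(theta_pred - theta_true)


def detection_loss(target: float, prob: float, true_d: Dists, pred_d: Dists,
                   theta_true: float, theta_pred: float,
                   lambda_geo: float = 1.0, lambda_theta: float = 10.0) -> float:
    """Weighted sum of the class and geometry losses for one text pixel."""
    geo = iou_loss(true_d, pred_d) + lambda_theta * angle_loss(theta_true, theta_pred)
    return class_loss(target, prob) + lambda_geo * geo


example = detection_loss(1.0, 0.9, (2.0, 2.0, 2.0, 2.0), (1.0, 1.0, 1.0, 1.0), 0.0, 0.1)
```

In a full training step, this value would be averaged over text pixels, added to the CTC loss for the text string, and minimized with SGD as described above.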
Fig. 5 is a flowchart of the test process of the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm. The test process is as follows. When testing starts, the data processing platform reads the trained parameters of each part and initializes the framework with them. The image to be tested is read. The image passes through the shared feature layer to give shared features. The shared features pass through the detection part to give a predicted class map and a predicted geometry map, from which detection regions are obtained. The boundary optimization algorithm is applied to the detection regions to give the boundary-optimized detection regions, that is, the predicted regions. According to the predicted regions, bilinear interpolation sampling is applied to the shared features to obtain recognition features. The recognition features pass through the recognition part to give the predicted text string. Finally, the predicted regions and predicted text strings are output, and the end-to-end text recognition task ends.
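The order of operations in the test process can be summarized with the following skeleton. It only fixes the data flow; each stage is passed in as a callable, and all names and signatures are hypothetical placeholders rather than parts of the patented framework.

```python
from typing import Any, Callable, List, Tuple


def recognize_end_to_end(image: Any,
                         shared_features: Callable[[Any], Any],
                         detect: Callable[[Any], List[Any]],
                         refine_boundary: Callable[[Any], Any],
                         sample_features: Callable[[Any, Any], Any],
                         recognize: Callable[[Any], str]) -> List[Tuple[Any, str]]:
    """Run the stages in the order described for the test process and
    return (predicted region, predicted text string) pairs."""
    feature_map = shared_features(image)          # shared feature part
    regions = detect(feature_map)                 # detection part
    results: List[Tuple[Any, str]] = []
    for region in regions:
        refined = refine_boundary(region)         # boundary optimization
        features = sample_features(feature_map, refined)  # bilinear sampling
        results.append((refined, recognize(features)))    # recognition part
    return results


# Toy stand-ins that only demonstrate the data flow on a 1-D "image".
output = recognize_end_to_end(
    [1, 2, 3],
    shared_features=lambda img: [v * 2 for v in img],
    detect=lambda fm: [(0, 2)],
    refine_boundary=lambda r: (r[0], r[1] + 1),
    sample_features=lambda fm, r: fm[r[0]:r[1]],
    recognize=lambda f: "".join(str(v) for v in f),
)
```

The toy boundary refinement widens the detected span by one element, so the recognizer sees the full sequence rather than a truncated one, mirroring the motivation for refining boundaries before recognition.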

Claims (10)

1. An end-to-end text recognition method for natural scenes based on a neighbor-correlation boundary optimization algorithm, characterized in that it comprises training an end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm, and a test process in which the trained framework performs end-to-end recognition of text regions and their content in natural scenes;
The specific steps for training the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm are:
Step 100: input natural scene images, ground-truth annotated regions, and ground-truth annotated strings to the data processing platform;
Step 101: preprocess the input natural scene images;
Step 102: generate a ground-truth class map and a ground-truth geometry map from the ground-truth annotated regions, to serve as supervision for training;
Step 103: initialize the weights of the shared feature part, the detection part, and the recognition part of the whole framework;
Step 104: on the data processing platform, train the whole framework end to end using the natural scene images, ground-truth class maps, ground-truth geometry maps, and ground-truth annotated strings; the steps are: the natural scene image first passes through the shared feature part to give a shared feature map; the detection part generates detection results from the shared feature map; the neighbor-correlation boundary optimization algorithm optimizes the detection results; bilinear interpolation applied to the shared feature map samples the detection regions to give recognition features; and the recognition part produces the recognition result from the input recognition features;
Step 105: export the parameters of each part of the framework and save them to the storage system of the data processing platform.
2. The end-to-end text recognition method for natural scenes based on the neighbor-correlation boundary optimization algorithm according to claim 1, characterized in that the trained end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm is used to perform end-to-end recognition of text regions and their content in natural scenes, the specific test steps being:
Step 200: input a natural scene image to the data processing platform;
Step 201: read the saved weights of each part of the trained framework, including the weights of the shared feature part, the detection part, and the recognition part;
Step 202: the natural scene image first passes through the shared feature part to give a shared feature map; the detection part generates detection results from the shared feature map; the neighbor-correlation boundary optimization algorithm optimizes the detection results; bilinear interpolation applied to the shared feature map samples the detection regions to give recognition features; and the recognition part produces the recognition result from the input recognition features.
3. The end-to-end text recognition method for natural scenes based on the neighbor-correlation boundary optimization algorithm according to claim 1, characterized in that, in the end-to-end text recognition deep learning framework based on the neighbor-correlation boundary optimization algorithm, the shared feature part extracts shared features using a U-shaped architecture based on a residual neural network; the U-shaped architecture obtains the shared features by connecting a first encoding module and a first decoding module in sequence;
The first encoding module comprises multiple layers of convolutional structures and downsampling structures between the convolutional structures of adjacent layers; each downsampling structure downsamples the feature map output by the upper convolutional structure of an adjacent pair and inputs the downsampled feature map into the lower convolutional structure of that pair;
The first decoding module comprises multiple layers of convolutional structures and upsampling structures between the convolutional structures of adjacent layers; each upsampling structure upsamples the feature map output by the lower convolutional structure of an adjacent pair and inputs the upsampled feature map into the upper convolutional structure of that pair.
4. The end-to-end text recognition method for natural scenes based on the neighbor-correlation boundary optimization algorithm according to claim 2, characterized in that the detection part applies several convolutions to the shared features to generate the predicted class map and the predicted geometry map respectively.
5. The end-to-end text recognition method in natural scenes based on the neighbor-correlation boundary optimization algorithm as claimed in claim 1, characterized in that the boundary optimization algorithm based on neighbor correlation takes into account the proximity of points on the feature map. The inputs are the class map F_score and the geometry map F_geo predicted by the detection part, a single text region obtained from the class map and the geometry map, a score threshold s_t, and a confidence function f_c that depends on a distance threshold r_t. The steps are:
Step 501: for the single text region, obtain the set of points that belong only to that region and whose class probability on the class map F_score is greater than s_t;
Step 502: for every point p in the point set, compute the distances from the point to the top, right, bottom, and left sides of the region;
Step 503: from these distances and the confidence function f_c, compute the confidence of the point;
Step 504: for every point p in the point set, use the geometry map F_geo to compute the region predicted by that point itself;
Step 505: from the confidences of all points in the point set and the regions they predict, compute the final region by weighted averaging.
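Steps 501 to 505 can be sketched as follows. This is an illustration under several assumptions, not the patented formulas, which are not reproduced in this text: regions are simplified to axis-aligned boxes `(left, top, right, bottom)` in image coordinates with y growing downward, the geometry map stores per-point distances `(d_top, d_right, d_bottom, d_left)`, the confidence function `f_c` is a hypothetical linear decay that reaches zero at distance `r_t`, and the confidence of a vertex is taken as the product of the confidences of its two adjacent sides. The names `refine_region`, `f_score`, and `f_geo` are likewise hypothetical.

```python
def f_c(dist, r_t):
    """Hypothetical confidence: 1 on the side itself, falling linearly to 0 at distance r_t."""
    return max(0.0, 1.0 - dist / r_t)


def refine_region(region, points, f_score, f_geo, s_t, r_t):
    """Return refined vertices (upper-left, upper-right, lower-right, lower-left).

    region  : (left, top, right, bottom) of the single text region
    points  : iterable of (x, y) points belonging only to that region
    f_score : dict (x, y) -> class probability
    f_geo   : dict (x, y) -> (d_top, d_right, d_bottom, d_left)
    """
    left, top, right, bottom = region
    # Step 501: keep points whose class probability exceeds s_t.
    kept = [p for p in points if f_score[p] > s_t]
    sums = [[0.0, 0.0] for _ in range(4)]  # weighted coordinate sums, one pair per vertex
    weights = [0.0] * 4
    for (x, y) in kept:
        # Step 502: distances to the top, right, bottom, and left sides of the region.
        dists = (y - top, right - x, bottom - y, x - left)
        # Step 503: one confidence per side.
        c_top, c_right, c_bottom, c_left = (f_c(d, r_t) for d in dists)
        # Step 504: the region this point predicts from the geometry map.
        g_top, g_right, g_bottom, g_left = f_geo[(x, y)]
        l, t, r, b = x - g_left, y - g_top, x + g_right, y + g_bottom
        verts = [(l, t), (r, t), (r, b), (l, b)]
        # Assumption: a vertex is trusted as much as both of its adjacent sides.
        conf = [c_top * c_left, c_top * c_right, c_bottom * c_right, c_bottom * c_left]
        # Step 505 (accumulation): confidence-weighted sums per vertex.
        for i in range(4):
            sums[i][0] += conf[i] * verts[i][0]
            sums[i][1] += conf[i] * verts[i][1]
            weights[i] += conf[i]
    original = [(left, top), (right, top), (right, bottom), (left, bottom)]
    # Step 505 (normalisation); keep the original vertex if no point carries weight.
    return [
        (sums[i][0] / weights[i], sums[i][1] / weights[i]) if weights[i] > 0 else original[i]
        for i in range(4)
    ]


region = (0, 0, 10, 4)
f_score = {(1, 1): 0.9, (9, 3): 0.9, (5, 2): 0.1}
f_geo = {(1, 1): (1, 9, 3, 1), (9, 3): (3, 2, 1, 9), (5, 2): (2, 5, 2, 5)}
refined = refine_region(region, f_score.keys(), f_score, f_geo, s_t=0.5, r_t=3.0)
```

In this toy input the point near the lower-right corner predicts a right side at x = 11, so the lower-right vertex moves there, while vertices that no nearby point supports keep their original coordinates.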
6. The end-to-end text recognition method in natural scenes based on the neighbor-correlation boundary optimization algorithm as claimed in claim 5, characterized in that, in the weighted averaging process by which the algorithm computes the final region, a vertex coordinate of a region is denoted with an index i; the region is a quadrilateral, and i = 1, 2, 3, 4 denote its upper-left, upper-right, lower-right, and lower-left vertices, respectively; the weighting of the coordinates can then be described by the following formula:
7. The end-to-end text recognition method in natural scenes based on the neighbor-correlation boundary optimization algorithm as claimed in claim 5, characterized in that the confidence function f_c used in the algorithm may take the following form:
8. The end-to-end text recognition method in natural scenes based on the neighbor-correlation boundary optimization algorithm as claimed in claim 2, characterized in that the recognition part obtains the predicted text string by connecting a second encoding module and a second decoding module in sequence; the second encoding module comprises multiple convolutional layers and down-sampling structures between adjacent convolutional layers, and the second decoding module is based on a long short-term memory (LSTM) neural network structure.
9. The end-to-end text recognition method in natural scenes based on the neighbor-correlation boundary optimization algorithm as claimed in claim 1, characterized in that the bilinear interpolation sampling part, for each detected region, finds the corresponding position on the shared feature map and applies bilinear interpolation sampling to it to obtain the recognition feature map.
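The bilinear interpolation sampling in this claim can be sketched as below. The sketch assumes that the detected quadrilateral is already expressed in feature-map coordinates and that the output grid is obtained by blending the four vertices; the claim does not specify either detail, and the names `bilinear_sample` and `sample_region` are hypothetical.

```python
def bilinear_sample(fmap, x, y):
    """Sample a 2-D feature map (list of rows) at a real-valued position (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp the position to the valid range of the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * fmap[y0][x0] + ax * fmap[y0][x1]
    bottom = (1 - ax) * fmap[y1][x0] + ax * fmap[y1][x1]
    return (1 - ay) * top + ay * bottom


def sample_region(fmap, quad, out_h, out_w):
    """Resample a quadrilateral (upper-left, upper-right, lower-right, lower-left)
    into an out_h x out_w recognition feature map."""
    ul, ur, lr, ll = quad
    out = []
    for row in range(out_h):
        v = row / (out_h - 1) if out_h > 1 else 0.0
        line = []
        for col in range(out_w):
            u = col / (out_w - 1) if out_w > 1 else 0.0
            # Blend the four vertices to find the source position of this output cell.
            sx = (1 - u) * (1 - v) * ul[0] + u * (1 - v) * ur[0] + u * v * lr[0] + (1 - u) * v * ll[0]
            sy = (1 - u) * (1 - v) * ul[1] + u * (1 - v) * ur[1] + u * v * lr[1] + (1 - u) * v * ll[1]
            line.append(bilinear_sample(fmap, sx, sy))
        out.append(line)
    return out


shared = [[0, 1, 2],
          [3, 4, 5],
          [6, 7, 8]]
centre = bilinear_sample(shared, 0.5, 0.5)
patch = sample_region(shared, [(0, 0), (2, 0), (2, 2), (0, 2)], 2, 2)
```

Because every output value is a weighted sum of feature-map values, gradients can flow from the recognition part back into the shared features, which is what makes end-to-end training possible.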
10. The end-to-end text recognition method in natural scenes based on the neighbor-correlation boundary optimization algorithm as claimed in claim 1, characterized in that the model is trained via the following steps:
Step 701: run a forward pass on the natural scene image;
Step 702: compute the error between the predicted class map and the true class map with a cross-entropy loss function; compute the error between the predicted geometry map and the true geometry map with an intersection-over-union (IoU) loss function and a cosine similarity function; compute the error between the predicted string and the true string with a CTC loss function;
Step 703: obtain the parameter gradients with the back-propagation algorithm, and update the parameters with an optimization algorithm such as stochastic gradient descent.
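The detection losses of step 702 and the update of step 703 can be sketched as follows. The exact loss forms are not given in this text, so the sketch assumes a geometry encoding of four distances `(top, right, bottom, left)` plus a rotation angle, an IoU loss of the form -log(IoU) with a +1 smoothing term, and an angle term of 1 - cos of the angle difference. The CTC loss is omitted for brevity, and all function names are hypothetical.

```python
import math


def cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def iou_loss(pred, true):
    """-log(IoU) for two boxes given as distances (top, right, bottom, left) from one point."""
    pt, pr, pb, pl = pred
    tt, tr, tb, tl = true
    area_pred = (pt + pb) * (pl + pr)
    area_true = (tt + tb) * (tl + tr)
    inter = (min(pt, tt) + min(pb, tb)) * (min(pl, tl) + min(pr, tr))
    union = area_pred + area_true - inter
    return -math.log((inter + 1.0) / (union + 1.0))  # +1 smoothing is an assumption


def angle_loss(theta_pred, theta_true):
    """Cosine-based angle error: 0 when the two angles agree."""
    return 1.0 - math.cos(theta_pred - theta_true)


def sgd_step(params, grads, lr):
    """One stochastic gradient descent update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


class_err = cross_entropy([0.5], [1])
geo_err = iou_loss((1, 1, 1, 1), (1, 1, 1, 1)) + angle_loss(0.3, 0.3)
new_params = sgd_step([1.0, 2.0], [0.5, -1.0], lr=0.1)
```

In a full training loop the three error terms would be summed, possibly with weights, before back-propagation; how they are weighted is not stated in the claim.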
CN201910371620.9A 2019-05-06 2019-05-06 Method for recognizing end-to-end text in natural scene Active CN110135419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910371620.9A CN110135419B (en) 2019-05-06 2019-05-06 Method for recognizing end-to-end text in natural scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910371620.9A CN110135419B (en) 2019-05-06 2019-05-06 Method for recognizing end-to-end text in natural scene

Publications (2)

Publication Number Publication Date
CN110135419A (en) 2019-08-16
CN110135419B (en) 2023-04-28

Family

ID=67576358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910371620.9A Active CN110135419B (en) 2019-05-06 2019-05-06 Method for recognizing end-to-end text in natural scene

Country Status (1)

Country Link
CN (1) CN110135419B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738262A (en) * 2019-10-16 2020-01-31 北京市商汤科技开发有限公司 Text recognition method and related product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
KR20180092836A (en) * 2017-02-08 2018-08-20 한국과학기술원 System and method for character boundary recognition
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text

Also Published As

Publication number Publication date
CN110135419B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN110782462B (en) Semantic segmentation method based on double-flow feature fusion
CN112766087A (en) Optical remote sensing image ship detection method based on knowledge distillation
CN111612790B (en) Medical image segmentation method based on T-shaped attention structure
CN111680706B (en) Dual-channel output contour detection method based on coding and decoding structure
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN110390340B (en) Feature coding model, training method and detection method of visual relation detection model
CN112597985B (en) Crowd counting method based on multi-scale feature fusion
CN110533631A (en) SAR image change detection based on the twin network of pyramid pondization
CN111259853A (en) High-resolution remote sensing image change detection method, system and device
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN110147745A (en) A kind of key frame of video detection method and device
CN111462230A (en) Typhoon center positioning method based on deep reinforcement learning
CN116363124B (en) Steel surface defect detection method based on deep learning
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN116958163A (en) Multi-organ and/or focus medical image segmentation method and device
CN114782798A (en) Underwater target detection method based on attention fusion
CN116580322A (en) Unmanned aerial vehicle infrared small target detection method under ground background
CN112818777B (en) Remote sensing image target detection method based on dense connection and feature enhancement
CN114821299A (en) Remote sensing image change detection method
CN110135419A (en) End-to-end text recognition method under a kind of natural scene
Hou et al. Retracted: KSSD: single‐stage multi‐object detection algorithm with higher accuracy
CN115410102A (en) SAR image airplane target detection method based on combined attention mechanism
CN114331950A (en) SAR image ship detection method based on dense connection sparse activation network
Idicula et al. Real time SAR Ship Detection using novel SarNeDe method
CN115205710B (en) Double-time-phase remote sensing image change detection method combined with color correction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant