CN109117836A - Method and device for text detection and localization in natural scenes based on a focal loss function - Google Patents

Method and device for text detection and localization in natural scenes based on a focal loss function Download PDF

Info

Publication number
CN109117836A
CN109117836A (application CN201810729838.2A)
Authority
CN
China
Prior art keywords
text
pixel
network
loss function
true value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810729838.2A
Other languages
Chinese (zh)
Other versions
CN109117836B (en)
Inventor
操晓春
田晓玮
伍蹈
代朋纹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN201810729838.2A priority Critical patent/CN109117836B/en
Publication of CN109117836A publication Critical patent/CN109117836A/en
Application granted granted Critical
Publication of CN109117836B publication Critical patent/CN109117836B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Abstract

The present invention discloses a method and device for detecting and localizing text in natural scenes based on a focal loss function. The method first preprocesses the annotated data, then constructs a text detection and localization network, introduces a focal loss function as part of the training loss, and finally runs detection on the natural scene pictures to be tested. By adjusting the existing annotations, the method makes the labels better suited to the designed text detection network; by fusing multiple convolutional layers on top of an FCN, the network is made to better fit the text detection task; and by introducing the focal loss function, positive and negative samples are balanced during training, which improves detection accuracy. The present invention achieves high precision and high recall in text detection and localization.

Description

Method and device for text detection and localization in natural scenes based on a focal loss function
Technical field
The invention belongs to the technical field of computer vision, and in particular relates to a method and device capable of accurately localizing text regions in natural scene pictures.
Background technique
Humans convey information in many ways, and text, as a carrier of information, directly contains rich semantic content. In natural scenes, text is ubiquitous: shop signboards, traffic signs, and even roadside advertisements and posters all convey information through text. Accurately localizing and recognizing text regions in natural scenes helps machines better understand scene semantics and benefits many fields. For example, in street-view recognition, reading the text on building signs helps us better understand street-view information; in assisted driving, reading the text on traffic signs helps support automated driving. In today's era of rapidly developing artificial intelligence, natural scene text recognition has become an important part of computer vision. Text recognition in natural scene images is broadly divided into two tasks: first, text detection, i.e., locating the regions containing text in the picture; second, text recognition, i.e., extracting the text content from the located regions. Because character strokes are rich and detail-sensitive, subsequent recognition is only possible once text regions have been accurately located, so text detection occupies an important position in the overall recognition task.
Text detection in natural scenes differs greatly from traditional text detection, mainly in the following respects. First, the objects being processed are different. Traditional text detection mainly processes document images, usually scanned documents in formats such as PDF. Natural scene text detection mainly processes street-view pictures, usually photographs in formats such as JPG. Second, the relationship between text regions and background differs. In traditional text detection, the text region occupies the main part of the picture, the text is regular, and the background is mostly a solid color without other interference. In natural scene text detection, text regions are irregularly laid out, the text varies in size and color, and the background is complex and full of distractors for text detection, such as railings, electric wires, and occluders. Third, the image quality of the processed objects differs. Traditional text detection deals with clear, high-quality images, whereas natural scene images often contain noise and blur due to shooting angle, camera shake, or lighting conditions. It can be seen that text detection in natural scenes is more complex than traditional text detection and its objects are harder to process. Therefore, text detection in natural scenes has always been a highly challenging task in the field of computer vision.
Currently, text detection methods for natural scenes fall into three classes: those based on connected components, those based on texture features, and those that mix the two. Connected-component methods mainly exploit the correlation between adjacent text pixels to detect text regions. For natural scene text, the gray values of characters in a grayscale image are similar; in addition, text color, stroke width, and so on also exhibit certain correlations. Representative methods include Maximally Stable Extremal Regions (MSER), Color Clustering, the Stroke Width Transform (SWT), and Histograms of Oriented Gradients (HOG). Texture-feature methods mainly exploit the dissimilarity between text and background texture to detect text regions. For natural scene text, text regions often have distinctive textural features; these features can be fed as input to a trained classifier to separate text from the background. Mixed methods combine the above correlation and dissimilarity to detect text regions, broadly in two steps: first use correlation to obtain text candidate regions, then apply texture-feature detection to these candidates to accurately localize the text regions.
In recent years, with the rapid development of deep learning, extracting the textural features of text with deep learning methods, and then distinguishing text regions from the background, has become the mainstream approach to text detection. Here, text is treated as a special kind of object, and mainstream deep-learning object detection methods such as Faster R-CNN, YOLO, and RFCN are applied to natural scene text detection. However, text in natural scenes differs considerably from the objects in object detection, mainly in that text regions may be much longer; applying object detection methods directly is poorly targeted and yields poor results. Therefore, designing a reasonable and efficient text detection method tailored to the characteristics of text remains a considerable challenge.
Summary of the invention
Aiming at images in natural scenes, the present invention proposes a deep-learning text detection and localization method and device based on a focal loss function.
The present invention adopts the FCN from deep neural networks: based on the textural feature information of text, it identifies the pixels belonging to text regions while simultaneously regressing the size and inclination angle of the text box, so that text of arbitrary size and inclination in natural scenes can be detected and localized. In particular, the focal loss function is applied during training when discriminating text regions, which compensates for the fact that text regions occupy only a small portion of the picture and are hard to learn, improving detection accuracy. At test time, because the network design of the invention is sensitive to text at different scales, text regions can be accurately localized without cascaded testing.
The technical solution adopted by the invention is as follows:
A text detection and localization method for natural scenes based on a focal loss function, comprising the following steps:
1) according to a dataset of annotated natural scene pictures, construct a text/background binary classification ground-truth map and a five-dimensional ground-truth map encoding the correspondence between each text pixel and its enclosing text box;
2) construct a text detection network based on an FCN; the loss function of the text detection network includes a focal loss function and a loss function for regressing the text box;
3) train the text detection network using the constructed binary classification ground-truth map and five-dimensional ground-truth map; the text detection network uses the focal loss function to perform pixel-wise text/background binary classification, and uses the text-box regression loss to regress the height, width, and tilt angle of the text box containing each pixel;
4) input the natural scene picture to be detected into the trained text detection network to detect and localize the text.
Further, step 1) first converts the existing annotations into binary classification labels: pixels in text regions are set to 1 and background pixels to 0, constructing the text/background binary classification ground-truth map; it then computes, for each text pixel, the distances to the four borders of the minimum enclosing rectangle and the angle of the enclosing text box with the horizontal direction, forming the five-dimensional ground-truth map.
Further, step 2) uses ResNet-50 as the basic convolutional neural network structure and cascades the results of multiple convolutional layers to construct the text detection network.
Further, in the text detection network of step 2), the result of conv5_c is unpooled and merged with the result of conv4_f, then passed through a 3*3 and a 1*1 convolution to obtain network layer f1; repeating this with conv3_d and conv2_c yields network layers f2 and f3; f3 is passed through two different 3*3 convolutions to obtain two parallel network layers f4_1 and f4_2, which are used to compute the two loss functions and are trained jointly.
Further, the formula of the focal loss function is as follows:
L_seg = -(1/(w·h)) Σ_{i,j} α_t·(1 - p_t)^γ·log(p_t)
where w and h are the width and height of the predicted binary classification map, Y* is the given ground truth, α_t is the parameter for balancing positive and negative samples, γ is the parameter for balancing easy and hard samples, and p_t is the prediction of the binary classification network; p_t and α_t are computed as follows:
p_t = p if y = 1, else 1 - p
α_t = α if y = 1, else 1 - α
where p is the network prediction at the pixel and y is the ground truth at the pixel.
Further, the loss function for regressing the text box is defined as follows:
L_reg = L_IoU + η·L_θ
where L_IoU is the overlap loss between the regressed text box and the real text box, L_θ is the loss between the predicted inclination angle and the true inclination angle, and η is a balance parameter; L_IoU and L_θ are computed as follows:
L_IoU = -log IoU(R̂, R*)
L_θ = c·(1 - cos(θ̂ - θ*))
where R̂ and R* denote the predicted text box and the corresponding annotated text box, θ̂ and θ* denote the predicted tilt angle and the corresponding annotated tilt angle, and the constant c constrains the upper bound of L_θ.
Further, step 4) comprises the following steps:
4.1) proportionally scale down the natural scene picture to be detected until its long side is under 2400 pixels;
4.2) input the picture into the text detection network to obtain the binary classification result and the regression result;
4.3) from the binary classification map, select the text pixel regions exceeding a threshold, and according to the text pixel regions and the corresponding regression results, remove redundant text boxes using a locally-aware non-maximum suppression algorithm.
Corresponding to the above method, the present invention also provides a text detection and localization device for natural scenes based on a focal loss function, comprising:
a ground-truth map construction module, responsible for constructing, from a dataset of annotated natural scene pictures, a text/background binary classification ground-truth map and a five-dimensional ground-truth map encoding the correspondence between each text pixel and its enclosing text box;
a text detection network construction module, responsible for constructing a text detection network based on an FCN, the loss function of the text detection network including a focal loss function and a loss function for regressing the text box;
a text detection network training module, responsible for training the text detection network using the constructed binary classification ground-truth map and five-dimensional ground-truth map; the text detection network uses the focal loss function to perform pixel-wise text/background binary classification, and uses the text-box regression loss to regress the height, width, and tilt angle of the text box containing each pixel;
a text detection and localization module, responsible for inputting the natural scene picture to be detected into the trained text detection network to detect and localize the text.
In conclusion, the present invention designs a deep-learning-based method for detecting and localizing text in natural scenes that achieves high precision and high recall. Compared with the prior art, the present invention has the following advantages:
1. A text detection network is designed by improving on the FCN.
2. The focal loss function is adapted and used, which benefits network training.
3. The network is highly adaptable and can achieve high-precision detection results with few training samples.
Brief description of the drawings
Fig. 1: schematic diagram of the annotation conversion;
Fig. 2: architecture of the natural scene text detection network;
Fig. 3: example results of natural scene text detection.
Specific embodiment
The present invention is described in further detail below through specific embodiments and the accompanying drawings.
The text detection and localization method for natural scenes based on a focal loss function of the invention is broadly divided into a training stage (corresponding to a training module) and a test stage (corresponding to a test module).
The steps of the training stage are as follows:
1) Preprocess the annotated dataset to construct the text/background binary classification ground-truth map and the five-dimensional ground-truth map encoding the correspondence between each text pixel and its enclosing text box.
The annotation conversion of step 1) is shown in Fig. 1. Pixels inside an annotated text box are labeled 1 and background pixels are labeled 0, constructing the text/background binary classification ground-truth map. When an annotated text box is an arbitrary quadrilateral, it is first expanded into the minimum enclosing rectangle; then, to avoid interference, the text box is shrunk by 30% toward its interior (since a text region is not necessarily entirely text, shrinking by 30% filters out positions close to the annotation edge). For each remaining pixel, the distances to the four borders of the enclosing rectangle and the angle of the enclosing text box with the horizontal direction are computed; these four distances and one angle constitute the five-dimensional ground-truth map. As shown in Fig. 1, l denotes left, the distance from the pixel to the left border; t denotes top, the distance from the pixel to the top border; b denotes bottom, the distance from the pixel to the bottom border; r denotes right, the distance from the pixel to the right border; and θ is the angle of the text box with the horizontal direction.
2) Based on the FCN (Evan Shelhamer, Jonathan Long, and Trevor Darrell, "Fully convolutional networks for semantic segmentation," PAMI, 2017, pp. 640-651), use ResNet-50 as the basic convolutional neural network structure and cascade the results of multiple convolutional layers to construct the text detection network.
Step 2) borrows the idea of the Fully Convolutional Network (FCN) and treats the text detection task as an object segmentation task. The network structure is shown in Fig. 2. First, ResNet-50 is used as the basic network structure to extract high-level image features, which are then fused with low-level texture features. In Fig. 2, conv5_c, conv4_f, conv3_d, conv2_c, and conv1 denote convolutional layers. To adapt to character features at different scales, the features of conv5_c, conv4_f, conv3_d, and conv2_c are merged to construct the special network layers f1, f2, and f3. Specifically, the result of conv5_c is unpooled and merged with the result of conv4_f, then passed through a 3*3 and a 1*1 convolution to obtain network layer f1; repeating this with conv3_d and conv2_c yields network layers f2 and f3, i.e., the result of f1 is unpooled and merged with the result of conv3_d, then passed through a 3*3 and a 1*1 convolution to obtain f2, and the result of f2 is unpooled and merged with the result of conv2_c, then passed through a 3*3 and a 1*1 convolution to obtain f3. The numbers 1 and 5 above "predicted value" in Fig. 2 denote dimensions: 1 is the text-pixel decision, and 5 is the five parameters of the five-dimensional ground-truth map.
3) While performing pixel-wise text/background binary classification, for each pixel in a text region the network also regresses the distances to the four borders of the enclosing text box and the tilt angle of the text box with the horizontal direction. In this way, while predicting text regions, the network can also construct text boxes relatively easily, realizing the detection and localization of text.
To construct the text boxes for detection, step 3) performs multi-task learning by adding a regression task alongside the binary classification task. For this purpose, two parallel convolutional layers f4_1 and f4_2 are constructed after f3 to output, respectively, the text/background binary classification result and the regression result for text box size and inclination angle. Specifically, f3 is passed through two different 3*3 convolutions to obtain the two parallel network layers f4_1 and f4_2, which are used to compute the two loss functions and are trained jointly.
4) During network training, samples vary in difficulty; moreover, text regions occupy only a small proportion of the whole picture, so the ratio of positive to negative samples is extremely unbalanced, which usually requires hard example mining to improve training efficiency. The present invention instead introduces the focal loss function (Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár, "Focal loss for dense object detection," in ICCV 2017, pp. 2999-3007), after adaptation, into the training process of the text detection task, so that no hard example mining is needed.
Step 4) adopts, as part of the loss function of the text detection task, a loss function used in object detection to address imbalance between training sample classes (including positive/negative imbalance and easy/hard imbalance). The object of this loss function is converted from anchors (candidate text boxes) to pixels, and the multi-class problem is converted into a binary classification problem to fit the text detection task.
The steps of the test stage are as follows:
1) Resize the test picture and input it into the text detection network; the detection results of the network are the text/background binary classification map and the regression maps of text box size and inclination angle.
Step 1) keeps the picture's aspect ratio and resizes the picture so that its long side does not exceed 2400p, where p denotes pixels, i.e., the long side is at most 2400 pixels long. Pictures exceeding the limit are scaled down proportionally.
2) From the binary classification map, select the text pixel regions whose score exceeds a threshold, and according to the text pixel regions and the corresponding regression results, remove redundant text boxes using a locally-aware non-maximum suppression algorithm (Local-Aware NMS).
In step 2), the threshold score for filtering valid text pixels from the binary classification map output by the network is 0.97. Fig. 3 shows example results of natural scene text detection.
The text detection network of the invention is further explained below. The text detection network in the proposed deep-learning method for detecting and localizing text in natural scenes is mainly obtained by improving on the FCN, as shown in Fig. 2. The network uses two loss functions: the first is the adjusted focal loss function (L_seg), which performs the binary (text/background) classification; the second is the text-box regression loss (L_reg), which regresses the distances from each text pixel to the four borders of its enclosing text box and the angle of the text box with the horizontal direction. The total loss function is as follows:
L = L_seg + λ·L_reg (1)
where the balance parameter λ is set to 1.
For the binary classification process, the loss function is defined as follows:
L_seg = -(1/(w·h)) Σ_{i,j} α_t·(1 - p_t)^γ·log(p_t) (2)
where w and h are the width and height of the predicted binary classification map, Y* is the given ground truth, α_t is the parameter for balancing positive and negative samples, γ is the parameter for balancing easy and hard samples, and p_t is the prediction of the binary classification network. p_t and α_t in (2) are computed similarly, as follows:
p_t = p if y = 1, else 1 - p (3)
α_t = α if y = 1, else 1 - α (4)
where p is the network prediction at the pixel, y is the ground truth at the pixel, and α is a parameter. In the text detection model of this embodiment, α and γ take the values 0.95 and 2, respectively. In other embodiments, α and γ may take other values as needed.
For the regression process, the loss function is defined as follows:
L_reg = L_IoU + η·L_θ (5)
where L_IoU is the overlap (IoU) loss between the regressed text box and the real text box, L_θ is the loss between the predicted and true inclination angles, and η is a balance parameter, set to 20 in this text detection method. The two are computed as follows:
L_IoU = -log IoU(R̂, R*) (6)
L_θ = c·(1 - cos(θ̂ - θ*)) (7)
where R̂ and R* denote the predicted text box and the corresponding annotated text box, θ̂ and θ* denote the predicted tilt angle and the corresponding annotated tilt angle, and the constant c constrains the upper bound of L_θ; it is set to 6 in this text detection method.
The test environment and experimental results of the proposed natural scene text detection method are as follows:
(1) Test environment:
System environment: Ubuntu 16.04;
Hardware environment: memory: 64 GB; GPU: K80; hard disk: 2 TB;
(2) Experimental data:
Training data:
The basic ResNet-50 network is pre-trained on ImageNet.
1229 natural scene pictures (including 299 from the ICDAR2013 training set and 1000 from the ICDAR2015 training set) are used for training until the model is stable and its performance no longer improves.
Training optimization method: ADAM
Test data: ICDAR2015 (500 pictures)
Evaluation method: ICDAR2015 online evaluation
(3) Experimental results:
To illustrate the effect of the invention, the text detection network of the invention was trained on the same dataset both with and without the focal loss function, stopping when the model stabilized and performance no longer improved; it was then tested on the ICDAR2015 test set and compared with mainstream existing text detection methods.
The comparison between existing mainstream methods and the present invention is shown in Table 1 below:
Table 1. Comparison of test results between existing methods and the invention

No.  Method                                  P      R      F
1    CTPN                                    0.516  0.742  0.609
2    EAST                                    0.836  0.735  0.782
3    The invention (without focal loss)      0.819  0.767  0.792
4    The invention (with focal loss)         0.847  0.773  0.809
where P is precision, R is recall, and F is the harmonic mean of P and R. The table clearly shows that the text detection network of the invention substantially improves on the existing text detection methods CTPN and EAST in both precision and recall, and that the network model trained with the focal loss function gains a further improvement in both. For the CTPN method, see Zhi Tian, Weilin Huang, Tong He, Pan He, and Yu Qiao, "Detecting text in natural image with connectionist text proposal network," in ECCV 2016, pp. 56-72; for the EAST method, see Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang, "EAST: an efficient and accurate scene text detector," in CVPR 2017, pp. 2642-2651.
Another embodiment of the present invention provides a text detection and localization device for natural scenes based on a focal loss function, comprising:
a ground-truth map construction module, responsible for constructing, from a dataset of annotated natural scene pictures, a text/background binary classification ground-truth map and a five-dimensional ground-truth map encoding the correspondence between each text pixel and its enclosing text box;
a text detection network construction module, responsible for constructing a text detection network based on an FCN, the loss function of the text detection network including a focal loss function and a loss function for regressing the text box;
a text detection network training module, responsible for training the text detection network using the constructed binary classification ground-truth map and five-dimensional ground-truth map; the text detection network uses the focal loss function to perform pixel-wise text/background binary classification, and uses the text-box regression loss to regress the height, width, and tilt angle of the text box containing each pixel;
a text detection and localization module, responsible for inputting the natural scene picture to be detected into the trained text detection network to detect and localize the text.
The above embodiments are merely intended to illustrate, not to limit, the technical solution of the present invention; a person of ordinary skill in the art may modify the technical solution of the present invention or replace it with equivalents without departing from the spirit and scope of the invention, and the protection scope of the present invention shall be defined by the claims.

Claims (10)

1. A text detection and localization method for natural scenes based on a focal loss function, characterized by comprising the following steps:
1) according to a dataset of annotated natural scene pictures, constructing a text/background binary classification ground-truth map and a five-dimensional ground-truth map encoding the correspondence between each text pixel and its enclosing text box;
2) constructing a text detection network based on an FCN, the loss function of the text detection network including a focal loss function and a loss function for regressing the text box;
3) training the text detection network using the constructed binary classification ground-truth map and five-dimensional ground-truth map; the text detection network using the focal loss function to perform pixel-wise text/background binary classification, and using the text-box regression loss to regress the height, width, and tilt angle of the text box containing each pixel;
4) inputting the natural scene picture to be detected into the trained text detection network to detect and localize the text.
2. The method as claimed in claim 1, characterized in that step 1) first converts the existing annotations into binary classification labels, setting pixels in text regions to 1 and background pixels to 0 to construct the text/background binary classification ground-truth map; it then computes, for each text pixel, the distances to the four borders of the minimum enclosing rectangle and the angle of the enclosing text box with the horizontal direction, forming the five-dimensional ground-truth map.
3. The method as claimed in claim 1, characterized in that step 2) uses ResNet-50 as the basic convolutional neural network structure and cascades the results of multiple convolutional layers to construct the text detection network.
4. The method as claimed in claim 3, characterized in that, in the text detection network of step 2), the result of conv5_c is unpooled and merged with the result of conv4_f, then passed through a 3*3 and a 1*1 convolution to obtain network layer f1; the result of f1 is unpooled and merged with the result of conv3_d, then passed through a 3*3 and a 1*1 convolution to obtain network layer f2; the result of f2 is unpooled and merged with the result of conv2_c, then passed through a 3*3 and a 1*1 convolution to obtain network layer f3; and f3 is passed through two different 3*3 convolutions to obtain two parallel network layers f4_1 and f4_2, which are used to compute the two loss functions and are trained jointly.
5. The method as claimed in claim 1, characterized in that the formula of the focal loss function is as follows:
L_seg = -(1/(w·h)) Σ_{i,j} α_t·(1 - p_t)^γ·log(p_t)
where w and h are the width and height of the predicted binary classification map, Y* is the given ground truth, α_t is the parameter for balancing positive and negative samples, γ is the parameter for balancing easy and hard samples, and p_t is the prediction of the binary classification network; p_t and α_t are computed as follows:
p_t = p if y = 1, else 1 - p
α_t = α if y = 1, else 1 - α
where p is the network prediction at the pixel and y is the ground truth at the pixel.
6. The method as described in claim 1, characterized in that the loss function for regressing the text box is defined as:

Lreg = LIoU + η·Lθ

where LIoU measures the overlap between the regressed text box and the true text box, Lθ is the loss between the predicted tilt angle and the true tilt angle, and η is a balancing parameter; LIoU and Lθ are computed as

LIoU = −log(|R̂ ∩ R*| / |R̂ ∪ R*|),  Lθ = c·(1 − cos(θ̂ − θ*)),

where R̂ and R* are the predicted text box and the corresponding annotated text box, θ̂ and θ* are the predicted tilt angle and the corresponding annotated tilt angle, and the constant c constrains the upper bound of Lθ.
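A minimal sketch of the regression loss in this claim, under the common EAST-style assumption that LIoU is the negative log of the overlap ratio and Lθ is a scaled cosine loss; boxes are simplified to axis-aligned (x0, y0, x1, y1) tuples, whereas the patent regresses four edge distances of a rotated box:

```python
import math

def reg_loss(pred_box, true_box, pred_theta, true_theta, eta=10.0, c=1.0):
    """Box-regression loss sketch for claim 6: IoU overlap term plus an
    angle term whose upper bound (2c) is set by the constant c. The eta
    default is an assumption, not a value from the patent."""
    ax0, ay0, ax1, ay1 = pred_box
    bx0, by0, bx1, by1 = true_box
    # Intersection and union of the two axis-aligned rectangles.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    l_iou = -math.log(max(inter / union, 1e-8))   # clamp avoids log(0)
    l_theta = c * (1.0 - math.cos(pred_theta - true_theta))
    return l_iou + eta * l_theta
```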
7. The method as described in claim 1, characterized in that step 4) comprises the following steps:
4.1) scaling the natural-scene picture to be detected down so that its long side is under 2400 pixels;
4.2) feeding the scaled picture into the text detection network to obtain the two-class result and the regression result;
4.3) selecting from the two-class map the text pixel regions whose score exceeds a threshold, and removing redundant text boxes with locality-aware non-maximum suppression according to the text pixel regions and the corresponding regression results.
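Step 4.3) can be sketched with standard non-maximum suppression (a simplification: the patent uses locality-aware NMS on rotated boxes, while here boxes are hypothetical axis-aligned tuples):

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    that overlaps it above iou_thresh, and repeat. Boxes are axis-aligned
    (x0, y0, x1, y1); returns the indices of the boxes that survive."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Locality-aware NMS differs from this greedy version by first merging geometrically adjacent boxes row by row, which lowers the cost when thousands of per-pixel boxes are decoded.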
8. A device for detecting and locating text in natural scenes based on a focal loss function, characterized by comprising:
a ground-truth map construction module, which constructs, from an annotated data set of natural-scene pictures, the text/background two-class ground-truth map and the five-dimensional ground-truth map relating each text pixel to the text box it belongs to;
a text detection network construction module, which constructs the text detection network on an FCN architecture, the loss function of the text detection network comprising a focal loss function and a loss function for regressing text boxes;
a text detection network training module, which trains the text detection network with the constructed two-class ground-truth map and five-dimensional ground-truth map; the text detection network performs pixel-wise text/background two-class classification with the focal loss function, and regresses the height, width and tilt angle of the text box containing each pixel with the loss function for regressing text boxes;
a text detection and location module, which feeds the natural-scene picture to be detected into the trained text detection network to detect and locate the text.
9. The device as claimed in claim 8, characterized in that the ground-truth map construction module first converts the existing annotations into two-class annotations, with pixels in text regions set to 1 and pixels in background regions set to 0, to construct the text/background two-class ground-truth map; it then computes, for each text pixel, the distances to the four edges of the minimum enclosing rectangle of its text box and the angle of that box with respect to the horizontal, to construct the five-dimensional ground-truth map.
10. The device as claimed in claim 8, characterized in that the formula of the focal loss function is

L_cls = -(1/(w·h)) · Σ α_t · (1 − p_t)^γ · log(p_t)

where the sum runs over all pixels, w and h are the width and height of the predicted two-class map, Y* is the given ground truth, α_t is a parameter for balancing positive and negative samples, γ is a parameter for balancing easy and hard samples, and p_t is the two-class network prediction; p_t and α_t are computed as

p_t = p if y = 1, and 1 − p otherwise;  α_t = α if y = 1, and 1 − α otherwise,

where p is the prediction the network gives at the pixel and y is the ground-truth value (from Y*) at that pixel.
CN201810729838.2A 2018-07-05 2018-07-05 Method and device for detecting and positioning characters in natural scene based on focus loss function Active CN109117836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810729838.2A CN109117836B (en) 2018-07-05 2018-07-05 Method and device for detecting and positioning characters in natural scene based on focus loss function


Publications (2)

Publication Number Publication Date
CN109117836A true CN109117836A (en) 2019-01-01
CN109117836B CN109117836B (en) 2022-05-24

Family

ID=64821941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810729838.2A Active CN109117836B (en) 2018-07-05 2018-07-05 Method and device for detecting and positioning characters in natural scene based on focus loss function

Country Status (1)

Country Link
CN (1) CN109117836B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279753A (en) * 2013-06-09 2013-09-04 中国科学院自动化研究所 English scene text block identification method based on instructions of tree structures
CN105184312A (en) * 2015-08-24 2015-12-23 中国科学院自动化研究所 Character detection method and device based on deep learning
CN105335754A (en) * 2015-10-29 2016-02-17 小米科技有限责任公司 Character recognition method and device
CN106096531A (en) * 2016-05-31 2016-11-09 安徽省云力信息技术有限公司 A kind of traffic image polymorphic type vehicle checking method based on degree of depth study
US20180032840A1 (en) * 2016-07-27 2018-02-01 Beijing Kuangshi Technology Co., Ltd. Method and apparatus for neural network training and construction and method and apparatus for object detection


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740542A (en) * 2019-01-07 2019-05-10 福建博思软件股份有限公司 Method for text detection based on modified EAST algorithm
CN109740542B (en) * 2019-01-07 2020-11-27 福建博思软件股份有限公司 Text detection method based on improved EAST algorithm
CN111460247A (en) * 2019-01-21 2020-07-28 重庆邮电大学 Automatic detection method for network picture sensitive characters
CN111460247B (en) * 2019-01-21 2022-07-01 重庆邮电大学 Automatic detection method for network picture sensitive characters
CN109948480A (en) * 2019-03-05 2019-06-28 中国电子科技集团公司第二十八研究所 A kind of non-maxima suppression method for arbitrary quadrilateral
CN110069985A (en) * 2019-03-12 2019-07-30 北京三快在线科技有限公司 Aiming spot detection method based on image, device, electronic equipment
CN110378243A (en) * 2019-06-26 2019-10-25 深圳大学 A kind of pedestrian detection method and device
CN110674932A (en) * 2019-09-30 2020-01-10 北京小米移动软件有限公司 Two-stage convolutional neural network target detection network training method and device
CN110807523A (en) * 2019-10-23 2020-02-18 中科智云科技有限公司 Method and equipment for generating detection model of similar target
CN110807523B (en) * 2019-10-23 2022-08-05 中科智云科技有限公司 Method and equipment for generating detection model of similar target
CN110827253A (en) * 2019-10-30 2020-02-21 北京达佳互联信息技术有限公司 Training method and device of target detection model and electronic equipment
CN110991440A (en) * 2019-12-11 2020-04-10 易诚高科(大连)科技有限公司 Pixel-driven mobile phone operation interface text detection method
CN110991440B (en) * 2019-12-11 2023-10-13 易诚高科(大连)科技有限公司 Pixel-driven mobile phone operation interface text detection method
CN111274985A (en) * 2020-02-06 2020-06-12 咪咕文化科技有限公司 Video text recognition network model, video text recognition device and electronic equipment
CN111274985B (en) * 2020-02-06 2024-03-26 咪咕文化科技有限公司 Video text recognition system, video text recognition device and electronic equipment
CN111582265A (en) * 2020-05-14 2020-08-25 上海商汤智能科技有限公司 Text detection method and device, electronic equipment and storage medium
CN112184688A (en) * 2020-10-10 2021-01-05 广州极飞科技有限公司 Network model training method, target detection method and related device
CN112184688B (en) * 2020-10-10 2023-04-18 广州极飞科技股份有限公司 Network model training method, target detection method and related device
CN112149620A (en) * 2020-10-14 2020-12-29 南昌慧亦臣科技有限公司 Method for constructing natural scene character region detection model based on no anchor point
CN113139539A (en) * 2021-03-16 2021-07-20 中国科学院信息工程研究所 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary
WO2023125244A1 (en) * 2021-12-30 2023-07-06 中兴通讯股份有限公司 Character detection method, terminal, and readable storage medium

Also Published As

Publication number Publication date
CN109117836B (en) 2022-05-24

Similar Documents

Publication Publication Date Title
CN109117836A (en) Text detection localization method and device under a kind of natural scene based on focal loss function
Li et al. Automatic pavement crack detection by multi-scale image fusion
Ping et al. A deep learning approach for street pothole detection
US8509478B2 (en) Detection of objects in digital images
CN106355188A (en) Image detection method and device
CN105574550A (en) Vehicle identification method and device
CN109285139A (en) A kind of x-ray imaging weld inspection method based on deep learning
CN107346420A (en) Text detection localization method under a kind of natural scene based on deep learning
CN109711288A (en) Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN105975929A (en) Fast pedestrian detection method based on aggregated channel features
CN108765386A (en) A kind of tunnel slot detection method, device, electronic equipment and storage medium
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN105512683A (en) Target positioning method and device based on convolution neural network
CN111368690A (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN109858547A (en) A kind of object detection method and device based on BSSD
CN113111727A (en) Method for detecting rotating target in remote sensing scene based on feature alignment
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN110059539A (en) A kind of natural scene text position detection method based on image segmentation
CN113033516A (en) Object identification statistical method and device, electronic equipment and storage medium
CN110020669A (en) A kind of license plate classification method, system, terminal device and computer program
CN111382766A (en) Equipment fault detection method based on fast R-CNN
CN105868269A (en) Precise image searching method based on region convolutional neural network
CN108664970A (en) A kind of fast target detection method, electronic equipment, storage medium and system
Zhang et al. CFANet: Efficient detection of UAV image based on cross-layer feature aggregation
Kuchi et al. A machine learning approach to detecting cracks in levees and floodwalls

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant