CN109117836A - Text detection localization method and device under a kind of natural scene based on focal loss function - Google Patents
- Publication number
- CN109117836A (application CN201810729838.2A)
- Authority
- CN
- China
- Prior art keywords
- text
- pixel
- network
- loss function
- true value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Abstract
The present invention discloses a method and device for text detection and localization in natural scenes based on a focal loss function. The method first preprocesses the annotated data, then constructs a text detection and localization network, introduces a focal loss function as part of the training loss, and finally runs detection on natural-scene pictures to be detected. By adjusting the existing annotations, the method makes the labels better suited to the designed text detection network; multiple convolutional layers are merged on top of an FCN network so that it better fits the text detection task; and the focal loss function balances positive and negative samples during training, improving detection accuracy. The present invention achieves both high precision and high recall in text detection and localization.
Description
Technical field
The invention belongs to the technical field of computer vision, and in particular relates to a method and apparatus capable of accurately locating text regions in natural-scene pictures.
Background technique
Humans propagate information in many ways, and text, as a carrier of information, directly contains rich semantic content. In natural scenes, text is ubiquitous: shop signboards, traffic signs, and even roadside advertisements and posters all convey information through text. Accurately locating and recognizing text regions in natural scenes helps machines better understand scene semantics and benefits many fields. For example, in street-view recognition, reading the text on building signboards helps us better understand street-view information; in assisted driving, reading the text on traffic signs helps support automatic driving. In today's era of rapidly developing artificial intelligence, natural-scene text recognition has become an important component of computer vision. Text recognition in natural-scene images is broadly divided into two tasks: first, text detection, i.e., locating the regions in the picture where text appears; second, text recognition, i.e., extracting the text content from the located regions. Since character strokes are rich and detail-sensitive, subsequent recognition can only succeed on accurately located text regions, so text detection occupies an important position in the whole recognition task.
Text detection in natural scenes differs greatly from traditional text detection techniques, mainly in the following aspects. First, the objects processed are different. Traditional text detection mainly processes document images, usually scanned documents in formats such as PDF; natural-scene text detection mainly processes street-view pictures, usually photographs in formats such as JPG. Second, the relationship between text regions and background differs. In the objects processed by traditional text detection, the text region occupies the main part of the picture, the text is regular, and the background is mostly a solid color without other interference. In natural-scene text detection, text regions are laid out irregularly, the text varies in size and color, and the background is complex and full of distractors for text detection, such as railings, electric wires, and occluders. Third, the image quality of the processed objects differs. Traditional text detection deals with relatively good-quality, clear images, whereas natural-scene images may contain considerable noise and blur due to shooting angle, camera shake, or lighting conditions. It can be seen that text detection in natural scenes is more complex than traditional text detection and its objects are harder to process. Therefore, text detection in natural scenes has always been a highly challenging task in the field of computer vision.
At present, text detection methods for natural scenes fall into three classes: those based on connected components, those based on texture features, and hybrids of the two. Methods based on connected components mainly exploit the correlation between adjacent text pixels to detect text regions: for natural-scene text, the gray values of characters in a grayscale image are similar, and text color, stroke width, and so on are also correlated. The main methods include maximally stable extremal regions (MSER), color clustering, the stroke width transform (SWT), and histograms of oriented gradients (HOG). Methods based on texture features mainly use the dissimilarity between text and background textures to detect text regions: for natural-scene text, text regions often have distinctive textural characteristics, and such features can be fed as input to a trained classifier to separate text from background. Hybrid methods combine the above correlation and dissimilarity cues to detect text regions, broadly in two steps: first obtain candidate text regions through pixel correlation, then run texture-feature detection on these candidates to accurately locate the text regions.
In recent years, with the rapid development of deep learning, obtaining the textural features of text with deep-learning methods and then distinguishing text regions from background has become the current mainstream approach to text detection. Here, text is treated as a special kind of object, and mainstream object-detection methods in deep learning, such as Faster R-CNN, YOLO, and R-FCN, are applied to natural-scene text detection. However, text in natural scenes differs considerably from the objects in generic object detection, mainly in that text regions may be much more elongated; directly applying object-detection methods is therefore poorly targeted and yields unsatisfactory results. Designing a reasonable and efficient text detection method tailored to the characteristics of text thus remains a considerable challenge.
Summary of the invention
For images in natural scenes, the present invention proposes a deep-learning text detection and localization method and device based on a focal loss function.
The present invention employs an FCN from deep neural networks, distinguishes the pixels of text regions based on the texture feature information of text, and simultaneously regresses the size and inclination angle of the text box, so that text of arbitrary size and inclination in natural scenes can be detected and located. In particular, the focal loss function is applied during training when discriminating text regions, which balances the problem that text regions in a picture are small and hard to learn, improving the accuracy of text detection. At test time, since the network design of the invention is sensitive to text of different scales, text regions can be accurately located without resorting to cascaded tests.
The technical solution adopted by the invention is as follows:
A text detection and localization method for natural scenes based on a focal loss function, comprising the following steps:
1) according to a data set of annotated natural-scene pictures, constructing a two-class text/background truth map and a five-dimensional truth map relating each text pixel to its enclosing text box;
2) constructing a text detection network based on an FCN network, the loss function of the text detection network comprising a focal loss function and a loss function for regressing text boxes;
3) training the text detection network using the constructed two-class truth map and five-dimensional truth map; the text detection network performs pixel-wise two-class text/background classification using the focal loss function, and regresses the height, width, and tilt angle of the text box containing each pixel using the loss function for regressing text boxes;
4) inputting a natural-scene picture to be detected into the trained text detection network to realize the detection and localization of text.
Further, step 1) first converts the existing annotations into two-class annotations: pixels in text regions are set to 1 and pixels in background regions are set to 0, constructing the two-class text/background truth map; then, for each pixel, the distances to the four borders of the minimum enclosing rectangle and the angle of its text box with the horizontal direction are computed, constructing the five-dimensional truth map.
Further, step 2) uses ResNet-50 as the basic convolutional neural network structure and cascades multiple convolutional-layer outputs to construct the text detection network.
Further, in the text detection network of step 2), the output of conv5_c is unpooled and merged with the output of conv4_f, then passed through a 3*3 and a 1*1 convolution to obtain network layer f1; repeating this with conv3_d and conv2_c yields network layers f2 and f3; f3 is passed through two different 3*3 convolutions to obtain two parallel network layers f4_1 and f4_2, which are used to compute the two loss functions and are trained jointly.
Further, the formula of the focal loss function is as follows:

L_seg = -(1 / (w·h)) · Σ_{i=1..w} Σ_{j=1..h} α_t · (1 - p_t)^γ · log(p_t)

where w and h are the width and height of the predicted two-class map, Y* is the given truth, α_t is a parameter for balancing positive and negative samples, γ is a parameter for balancing easy and hard samples, and p_t is the two-class network prediction; p_t and α_t are computed as:

p_t = p if y = 1, otherwise 1 - p;  α_t = α if y = 1, otherwise 1 - α

where p is the network prediction at the pixel and y is the truth at the pixel.
Further, the loss function for regressing text boxes is defined as follows:

L_reg = L_IoU + η·L_θ

where L_IoU is the loss on the overlap between the regressed text box and the real text box, L_θ is the loss between the predicted inclination angle and the true inclination angle, and η is a balance parameter; L_IoU and L_θ are computed as:

L_IoU = -log IoU(R̂, R*),  L_θ = c·(1 - cos(θ̂ - θ*))

where R̂ and R* are the predicted text box and the corresponding annotated text box, θ̂ and θ* are the predicted tilt angle and the corresponding annotated tilt angle, and the constant c constrains the upper bound of L_θ.
Further, step 4) comprises the following steps:
4.1) proportionally scaling down the natural-scene picture to be detected until its long side is < 2400 pixels;
4.2) inputting the picture into the text detection network to obtain the two-class result and the regression result;
4.3) selecting from the two-class map the text-pixel regions whose scores exceed a threshold, and, according to the text-pixel regions and the corresponding regression results, removing redundant text boxes using a locality-aware non-maximum suppression algorithm.
Corresponding to the above method, the present invention also provides a text detection and localization device for natural scenes based on a focal loss function, comprising:
a truth-map construction module, responsible for constructing, from a data set of annotated natural-scene pictures, the two-class text/background truth map and the five-dimensional truth map relating each text pixel to its enclosing text box;
a text-detection-network construction module, responsible for constructing a text detection network based on an FCN network, the loss function of the text detection network comprising a focal loss function and a loss function for regressing text boxes;
a text-detection-network training module, responsible for training the text detection network using the constructed two-class truth map and five-dimensional truth map; the text detection network performs pixel-wise two-class text/background classification using the focal loss function, and regresses the height, width, and tilt angle of the text box containing each pixel using the loss function for regressing text boxes;
a text detection and localization module, responsible for inputting a natural-scene picture to be detected into the trained text detection network to realize the detection and localization of text.
In summary, the present invention designs a method for text detection and localization in natural scenes based on deep learning, achieving results with high precision and high recall. Compared with the prior art, the present invention has the following advantages:
1. A text detection network is designed by improving on the FCN network.
2. The focal loss function is adapted and used, which benefits network training.
3. The network is highly adaptable and can obtain high-precision test results from few training samples.
Detailed description of the invention
Fig. 1: data mark conversion schematic diagram;
Fig. 2: natural scene text detection network architecture diagram;
Fig. 3: natural scene text detection result instance graph.
Specific embodiment
The present invention is described in further detail below through specific embodiments and the drawings.
The text detection and localization method for natural scenes based on a focal loss function of the invention is broadly divided into a training stage (corresponding to the training modules) and a test stage (corresponding to the test module).
The steps of the training stage are as follows:
1) The annotated data set is preprocessed to construct the two-class text/background truth map and the five-dimensional truth map relating each text pixel to its enclosing text box.
The annotation conversion of step 1) is shown in Fig. 1: pixels inside an annotated text box are labeled 1 and background pixels are labeled 0, constructing the two-class text/background truth map. If an annotated text box is an arbitrary quadrilateral, it is first expanded into its minimum enclosing rectangle; then, to avoid interference, the box is shrunk by 30% towards its interior (a text region may not consist entirely of text, so pixels close to the annotation border are filtered out by the 30% shrink). For each remaining pixel, the distances to the four borders of the enclosing rectangle and the angle of the text box with the horizontal direction are computed; the four distances and one angle constitute the five-dimensional truth map. As shown in Fig. 1, l (left) is the pixel's distance to the left border, t (top) its distance to the top border, b (bottom) its distance to the bottom border, r (right) its distance to the right border, and θ the angle of the text box with the horizontal direction.
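The truth-map construction above can be sketched as follows. This is a minimal illustration for axis-aligned enclosing rectangles only (the patent also handles rotated rectangles, which would require a point-in-rotated-rect test); `build_truth_maps` and its box format `(x1, y1, x2, y2, theta)` are hypothetical names chosen for this sketch.

```python
import numpy as np

def build_truth_maps(h, w, boxes, shrink=0.3):
    """Build the two-class score map and the 5-dim geometry map for one image.

    `boxes` lists the minimum enclosing rectangle of each annotated text box
    as (x1, y1, x2, y2, theta), axis-aligned here for simplicity, with theta
    the box angle to the horizontal.
    """
    score = np.zeros((h, w), dtype=np.float32)   # text = 1 / background = 0
    geo = np.zeros((h, w, 5), dtype=np.float32)  # (l, t, r, b, theta)
    for x1, y1, x2, y2, theta in boxes:
        # shrink the box by 30% to drop unreliable pixels near the border
        dx = (x2 - x1) * shrink / 2.0
        dy = (y2 - y1) * shrink / 2.0
        sx1, sy1 = int(round(x1 + dx)), int(round(y1 + dy))
        sx2, sy2 = int(round(x2 - dx)), int(round(y2 - dy))
        for y in range(sy1, sy2):
            for x in range(sx1, sx2):
                score[y, x] = 1.0
                # distances from this pixel to the four borders of the
                # ORIGINAL (unshrunk) rectangle, plus the box angle
                geo[y, x] = (x - x1, y - y1, x2 - x, y2 - y, theta)
    return score, geo
```

Note that the geometry is measured against the unshrunk rectangle even though only the shrunk interior is marked positive, matching the description above.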
2) Based on the FCN network (Evan Shelhamer, Jonathan Long, and Trevor Darrell, "Fully convolutional networks for semantic segmentation," PAMI, 2017, pp. 640-651.), ResNet-50 is used as the basic convolutional neural network structure, and multiple convolutional-layer outputs are cascaded to construct the text detection network.
Step 2) borrows the idea of the Fully Convolutional Network (FCN) and treats the text detection task as a segmentation task. The network structure is shown in Fig. 2. First, ResNet-50 is used as the base network to abstract high-level image features, which are then fused with low-level texture features. In Fig. 2, conv5_c, conv4_f, conv3_d, conv2_c, and conv1 denote convolutional layers. To adapt to text features of different scales, the features of conv5_c, conv4_f, conv3_d, and conv2_c are merged to construct the special network layers f1, f2, and f3. Specifically, the output of conv5_c is unpooled and merged with the output of conv4_f, then passed through a 3*3 and a 1*1 convolution to obtain layer f1; repeating this with conv3_d and conv2_c yields layers f2 and f3, i.e., the output of f1 is unpooled and merged with the output of conv3_d, then passed through a 3*3 and a 1*1 convolution to obtain layer f2, and the output of f2 is unpooled and merged with the output of conv2_c, then passed through a 3*3 and a 1*1 convolution to obtain layer f3. Above "predicted value" in Fig. 2, the numbers 1 and 5 denote dimensions: 1 is the decision of whether a pixel is a text pixel, and 5 covers the five parameters of the five-dimensional truth map.
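The merge cascade above can be sketched in terms of shapes. This is a toy sketch, not the trained network: channel counts are invented, weights are random, and a single channel projection stands in for the 3*3 + 1*1 convolution pair of the f1/f2/f3 layers; only the unpool-concat-project wiring is illustrated.

```python
import numpy as np

def unpool(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, out_c, rng):
    """A 1x1 convolution is a per-pixel channel projection."""
    w = rng.standard_normal((out_c, x.shape[0])) * 0.01
    return np.tensordot(w, x, axes=([1], [0]))

def merge(upper, lateral, out_c, rng):
    """One merge stage: unpool the deeper map, concatenate with the
    lateral ResNet feature, then project channels (stand-in for the
    3*3 and 1*1 convolutions of the patent's f1/f2/f3 layers)."""
    x = np.concatenate([unpool(upper), lateral], axis=0)
    return conv1x1(x, out_c, rng)

rng = np.random.default_rng(0)
# toy stand-ins for conv5_c, conv4_f, conv3_d, conv2_c: (channels, H, W)
conv5_c = rng.standard_normal((8, 4, 4))
conv4_f = rng.standard_normal((8, 8, 8))
conv3_d = rng.standard_normal((4, 16, 16))
conv2_c = rng.standard_normal((4, 32, 32))

f1 = merge(conv5_c, conv4_f, 8, rng)   # resolution of conv4_f
f2 = merge(f1, conv3_d, 4, rng)        # resolution of conv3_d
f3 = merge(f2, conv2_c, 4, rng)        # resolution of conv2_c
# two parallel heads: 1-channel score map and 5-channel geometry map
f4_1 = conv1x1(f3, 1, rng)
f4_2 = conv1x1(f3, 5, rng)
```

The two heads f4_1 and f4_2 correspond to the "1" and "5" prediction dimensions of Fig. 2.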
3) While performing pixel-wise two-class text/background classification, for each pixel in a text region the network also regresses the distances to the four borders of its text box and the tilt angle of the text box with the horizontal direction, so that while predicting text regions the network can easily construct the text boxes, realizing the detection and localization of text.
Step 3) adds a regression task alongside the two-class classification problem in order to construct the text boxes, forming a multi-task learning setup. To this end, two parallel convolutional layers f4_1 and f4_2 are constructed after f3 to output, respectively, the two-class text/background result and the regression result for text-box size and inclination. Specifically, f3 is passed through two different 3*3 convolutions to obtain the two parallel layers f4_1 and f4_2, which are used to compute the two loss functions and are trained jointly.
4) During network training, samples vary in difficulty; moreover, text regions occupy only a small fraction of the whole picture, so the positive/negative sample ratio at sampling time is extremely unbalanced, which would ordinarily require hard-example mining to improve training efficiency. The present invention instead adapts the focal loss function (Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár, "Focal loss for dense object detection," in ICCV 2017, pp. 2999-3007.) and introduces it into the training of the text detection task, eliminating the need for hard-example mining.
Step 4) uses, as part of the text detection loss, a loss function from object detection that addresses imbalance between training sample classes (both positive/negative imbalance and easy/hard imbalance). The object of the loss function is converted from anchors (candidate text boxes) to pixels, and the multi-class problem is converted into a two-class problem to suit the text detection task.
The steps of the test stage are as follows:
1) The test picture is resized and input into the text detection network; the detection outputs are the two-class text/background map and regression maps such as text-box size and inclination.
Step 1) keeps the picture's aspect ratio and resizes it so that the long side does not exceed 2400p, where p denotes pixels, i.e., the long side is at most 2400 pixels. Pictures exceeding the limit are scaled down proportionally.
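The resizing rule can be sketched as a small helper; `resize_long_side` is a hypothetical name for this sketch, which returns the target height and width rather than performing the actual image interpolation.

```python
def resize_long_side(h, w, limit=2400):
    """Scale (h, w) down proportionally so that the long side is at most
    `limit` pixels; pictures already within the limit are left unchanged."""
    long_side = max(h, w)
    if long_side <= limit:
        return h, w
    scale = limit / long_side
    return int(round(h * scale)), int(round(w * scale))
```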
2) Text-pixel regions whose score exceeds a threshold are selected from the two-class map, and, according to the text-pixel regions and the corresponding regression results, redundant text boxes are removed using a locality-aware non-maximum suppression algorithm (Locality-Aware NMS).
In step 2), the threshold for filtering valid text pixels in the network's two-class output map is set to 0.97. Fig. 3 shows example results of natural-scene text detection.
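The locality-aware NMS step can be sketched as follows. This is a simplified, axis-aligned sketch under stated assumptions: it only performs the row-order greedy weighted merging that characterizes locality-aware NMS, whereas a full implementation (and the rotated boxes regressed by the network) would follow with a standard NMS pass over the merged boxes; all function names here are illustrative.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def weighted_merge(a, sa, b, sb):
    """Merge two boxes, weighting coordinates by their scores."""
    s = sa + sb
    box = [(a[i] * sa + b[i] * sb) / s for i in range(4)]
    return box, s

def locality_aware_nms(boxes, scores, thresh=0.5):
    """Walk boxes in row order and greedily merge each one into the
    previous box when they overlap, accumulating the scores."""
    merged, merged_scores = [], []
    for box, s in zip(boxes, scores):
        if merged and iou(merged[-1], box) > thresh:
            merged[-1], merged_scores[-1] = weighted_merge(
                merged[-1], merged_scores[-1], list(box), s)
        else:
            merged.append(list(box))
            merged_scores.append(s)
    return merged, merged_scores
```

Because pixel-wise prediction yields one box per text pixel, this near-linear merging pass is what keeps suppression cheap compared with running plain NMS on all candidates.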
The text detection network of the invention is further described below. The text detection network in the proposed method for text detection and localization in natural scenes based on deep learning is mainly an improvement of FCN, as shown in Fig. 2. The network uses two loss functions: the first is an adapted focal loss function (L_seg) for the two-class (text/background) classification, and the second is the loss function for regressing text boxes (L_reg), which regresses each text pixel's distances to the four borders of its text box and the angle of the text box with the horizontal direction. The total loss function is:

L = L_seg + λ·L_reg  (1)

where the balance parameter λ is set to 1.
For the two-class process, the loss function is defined as follows:

L_seg = -(1 / (w·h)) · Σ_{i=1..w} Σ_{j=1..h} α_t · (1 - p_t)^γ · log(p_t)  (2)

where w and h are the width and height of the predicted two-class map, Y* is the given truth, α_t is a parameter for balancing positive and negative samples, γ is a parameter for balancing easy and hard samples, and p_t is the two-class network prediction. p_t and α_t in (2) are computed similarly:

p_t = p if y = 1, otherwise 1 - p  (3)
α_t = α if y = 1, otherwise 1 - α  (4)

where p is the network prediction at the pixel, y is the truth at the pixel, and α is a parameter. In the text detection model of this embodiment, α and γ take the values 0.95 and 2; in other embodiments, α and γ may take other values as needed.
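The pixel-wise focal loss of equations (2)-(4) can be sketched directly in numpy; `pixel_focal_loss` is an illustrative name, and the clipping constant `eps` is an implementation detail added for numerical stability, not part of the patent's formula.

```python
import numpy as np

def pixel_focal_loss(p, y, alpha=0.95, gamma=2.0, eps=1e-7):
    """Pixel-wise focal loss over a predicted score map.

    p: (h, w) predicted text probabilities; y: (h, w) 0/1 truth map.
    alpha balances positive/negative pixels and gamma down-weights easy
    ones; alpha=0.95 and gamma=2 are the values used in the embodiment.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # eq. (3)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # eq. (4)
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.mean()  # average over the w*h pixels of the map
```

The (1 - p_t)^γ factor shrinks the contribution of confidently correct pixels, which is how the loss copes with the small, hard-to-learn text regions without explicit hard-example mining.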
For the regression process, the loss function is defined as follows:

L_reg = L_IoU + η·L_θ  (5)

where L_IoU is the loss on the overlap (IoU) between the regressed text box and the real text box, L_θ is the loss between the predicted and true inclination angles, and η is a balance parameter, set to 20 in this text detection method. The two are computed as:

L_IoU = -log IoU(R̂, R*)  (6)
L_θ = c·(1 - cos(θ̂ - θ*))  (7)

where R̂ and R* are the predicted text box and the corresponding annotated text box, θ̂ and θ* are the predicted tilt angle and the corresponding annotated tilt angle, and the constant c constrains the upper bound of L_θ, set to 6 in this text detection method.
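The regression loss can be sketched for one pixel, assuming the EAST-style parameterization in which each pixel carries the distances (l, t, r, b) to its box borders plus the angle θ; the exact placement of the bound constant c is a hypothetical reading of the text above, and `regression_loss` is an illustrative name.

```python
import numpy as np

def regression_loss(geo_p, geo_t, eta=20.0, c=6.0, eps=1e-7):
    """Sketch of the text-box regression loss at one pixel.

    geo_* = (l, t, r, b, theta).  The IoU of the predicted and true
    rectangles is recovered from the four border distances; the angle
    term is c * (1 - cos of the angle difference), scaled by eta.
    """
    lp, tp, rp, bp, ap = geo_p
    lt, tt, rt, bt, at = geo_t
    area_p = (lp + rp) * (tp + bp)
    area_t = (lt + rt) * (tt + bt)
    # the intersection rectangle shares the pixel, so its extents are the
    # element-wise minima of the border distances
    inter = (min(lp, lt) + min(rp, rt)) * (min(tp, tt) + min(bp, bt))
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = -np.log(iou + eps)
    l_theta = c * (1.0 - np.cos(ap - at))
    return l_iou + eta * l_theta
```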
The test environment and experimental results of the natural-scene text detection method proposed by the invention are as follows:
(1) Test environment:
System environment: Ubuntu 16.04;
Hardware environment: memory: 64 GB, GPU: K80, hard disk: 2 TB.
(2) Experimental data:
Training data:
The ResNet-50 base network is pre-trained on ImageNet.
1229 natural-scene pictures (299 from the ICDAR2013 training set and 1000 from the ICDAR2015 training set) are used to train the model until it stabilizes and its performance no longer improves.
Training optimization method: ADAM.
Test data: ICDAR2015 (500 pictures).
Evaluation method: ICDAR2015 online evaluation.
(3) Experimental results:
To illustrate the effect of the invention, the text detection network of the invention was trained on the same data set both with and without the focal loss function, training being stopped when the model stabilized and its performance no longer improved; it was then tested on the ICDAR2015 test set and compared with existing mainstream text detection methods.
The comparison between existing mainstream schemes and the present invention is shown in Table 1 below:
Table 1. Comparison of existing methods and the test results of the invention

| Serial number | Method | P | R | F |
| --- | --- | --- | --- | --- |
| 1 | CTPN | 0.516 | 0.742 | 0.609 |
| 2 | EAST | 0.836 | 0.735 | 0.782 |
| 3 | The present invention (without focal loss function) | 0.819 | 0.767 | 0.792 |
| 4 | The present invention (with focal loss function) | 0.847 | 0.773 | 0.809 |
Here P denotes precision, R recall, and F the harmonic mean of P and R. It is clear from the table that the text detection network of the invention improves greatly on both precision and recall over the existing text detection methods CTPN and EAST, and that the network model trained with the focal loss function gains further in both precision and recall. For the CTPN method, see "Zhi Tian, Weilin Huang, Tong He, Pan He, and Yu Qiao, "Detecting text in natural image with connectionist text proposal network," in ECCV 2016, pp. 56-72."; for the EAST method, see "Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang, "EAST: an efficient and accurate scene text detector," in CVPR 2017, pp. 2642-2651.".
Another embodiment of the present invention provides a text detection and localization device for natural scenes based on a focal loss function, comprising:
a truth-map construction module, responsible for constructing, from a data set of annotated natural-scene pictures, the two-class text/background truth map and the five-dimensional truth map relating each text pixel to its enclosing text box;
a text-detection-network construction module, responsible for constructing a text detection network based on an FCN network, the loss function of the text detection network comprising a focal loss function and a loss function for regressing text boxes;
a text-detection-network training module, responsible for training the text detection network using the constructed two-class truth map and five-dimensional truth map; the text detection network performs pixel-wise two-class text/background classification using the focal loss function, and regresses the height, width, and tilt angle of the text box containing each pixel using the loss function for regressing text boxes;
a text detection and localization module, responsible for inputting a natural-scene picture to be detected into the trained text detection network to realize the detection and localization of text.
The above embodiments are merely illustrative of the technical solution of the present invention and do not limit it; a person of ordinary skill in the art may modify the technical solution of the present invention or replace it with equivalents without departing from the spirit and scope of the present invention, and the protection scope of the present invention shall be as set forth in the claims.
Claims (10)
1. A text detection and localization method for natural scenes based on a focal loss function, characterized by comprising the following steps:
1) according to a data set of annotated natural-scene pictures, constructing a two-class text/background truth map and a five-dimensional truth map relating each text pixel to its enclosing text box;
2) constructing a text detection network based on an FCN network, the loss function of the text detection network comprising a focal loss function and a loss function for regressing text boxes;
3) training the text detection network using the constructed two-class truth map and five-dimensional truth map; the text detection network performs pixel-wise two-class text/background classification using the focal loss function, and regresses the height, width, and tilt angle of the text box containing each pixel using the loss function for regressing text boxes;
4) inputting a natural-scene picture to be detected into the trained text detection network to realize the detection and localization of text.
2. The method of claim 1, characterized in that step 1) first converts the existing annotations into two-class annotations: pixels in text regions are set to 1 and pixels in background regions are set to 0, constructing the two-class text/background truth map; then the distances from each pixel to the four borders of the minimum enclosing rectangle and the angle of its text box with the horizontal direction are computed, constructing the five-dimensional truth map.
3. The method of claim 1, characterized in that step 2) uses ResNet-50 as the basic convolutional neural network structure and cascades multiple convolutional-layer outputs to construct the text detection network.
4. The method of claim 3, characterized in that in the text detection network of step 2), the output of conv5_c is unpooled and merged with the output of conv4_f, then passed through a 3*3 and a 1*1 convolution to obtain network layer f1; the output of f1 is unpooled and merged with the output of conv3_d, then passed through a 3*3 and a 1*1 convolution to obtain network layer f2; the output of f2 is unpooled and merged with the output of conv2_c, then passed through a 3*3 and a 1*1 convolution to obtain network layer f3; f3 is passed through two different 3*3 convolutions to obtain two parallel network layers f4_1 and f4_2, which are used to compute the two loss functions and are trained jointly.
5. The method of claim 1, characterized in that the formula of the focal loss function is as follows:

L_seg = -(1 / (w·h)) · Σ_{i=1..w} Σ_{j=1..h} α_t · (1 - p_t)^γ · log(p_t)

where w and h are the width and height of the predicted two-class map, Y* is the given truth, α_t is a parameter for balancing positive and negative samples, γ is a parameter for balancing easy and hard samples, and p_t is the two-class network prediction; p_t and α_t are computed as:

p_t = p if y = 1, otherwise 1 - p;  α_t = α if y = 1, otherwise 1 - α

where p is the network prediction at the pixel and y is the truth at the pixel.
6. The method of claim 1, characterized in that the loss function for regressing text boxes is defined as follows:

L_reg = L_IoU + η·L_θ

where L_IoU is the loss on the overlap between the regressed text box and the real text box, L_θ is the loss between the predicted inclination angle and the true inclination angle, and η is a balance parameter; L_IoU and L_θ are computed as:

L_IoU = -log IoU(R̂, R*),  L_θ = c·(1 - cos(θ̂ - θ*))

where R̂ and R* are the predicted text box and the corresponding annotated text box, θ̂ and θ* are the predicted tilt angle and the corresponding annotated tilt angle, and the constant c constrains the upper bound of L_θ.
7. The method of claim 1, characterized in that step 4) comprises the following steps:
4.1) proportionally scaling down the natural-scene picture to be detected until its long side is < 2400 pixels;
4.2) inputting the picture into the text detection network to obtain the two-class result and the regression result;
4.3) selecting from the two-class map the text-pixel regions exceeding a threshold, and, according to the text-pixel regions and the corresponding regression results, removing redundant text boxes using a locality-aware non-maximum suppression algorithm.
8. A text detection and localization device for natural scenes based on a focal loss function, characterized by comprising:
a truth-map construction module, responsible for constructing, from a data set of annotated natural-scene pictures, the two-class text/background truth map and the five-dimensional truth map relating each text pixel to its enclosing text box;
a text-detection-network construction module, responsible for constructing a text detection network based on an FCN network, the loss function of the text detection network comprising a focal loss function and a loss function for regressing text boxes;
a text-detection-network training module, responsible for training the text detection network using the constructed two-class truth map and five-dimensional truth map; the text detection network performs pixel-wise two-class text/background classification using the focal loss function, and regresses the height, width, and tilt angle of the text box containing each pixel using the loss function for regressing text boxes;
a text detection and localization module, responsible for inputting a natural-scene picture to be detected into the trained text detection network to realize the detection and localization of text.
9. The device as claimed in claim 8, wherein the true-value map construction module first converts the existing annotation into a two-class annotation, setting text-region pixels to 1 and background-region pixels to 0, to construct the two-class text/background true-value map; it then computes, for each pixel, the four distances to the edges of the minimum enclosing rectangle and the angle between the enclosing text box and the horizontal direction, to construct the five-dimensional true-value map.
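A minimal sketch of the true-value construction described above, assuming for simplicity an axis-aligned text rectangle (tilt angle 0); the function name and box encoding are assumptions, not the patent's specification:

```python
import numpy as np

def build_ground_truth(h, w, box):
    """box = (x0, y0, x1, y1): axis-aligned text rectangle (tilt angle 0
    assumed; real annotations also carry a rotation angle).
    Returns the binary text/background map and the 5-channel map holding,
    per text pixel, the four edge distances plus the angle."""
    x0, y0, x1, y1 = box
    score = np.zeros((h, w), dtype=np.float32)
    geo = np.zeros((5, h, w), dtype=np.float32)
    for y in range(y0, y1):
        for x in range(x0, x1):
            score[y, x] = 1.0          # text pixel -> 1, background stays 0
            geo[0, y, x] = y - y0      # distance to top edge
            geo[1, y, x] = y1 - 1 - y  # distance to bottom edge
            geo[2, y, x] = x - x0      # distance to left edge
            geo[3, y, x] = x1 - 1 - x  # distance to right edge
            geo[4, y, x] = 0.0         # tilt angle (assumed 0 here)
    return score, geo
```

For rotated boxes the same idea applies, with the distances measured to the edges of the minimum enclosing rotated rectangle and channel 4 holding its angle to the horizontal.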
10. The device as claimed in claim 8, wherein the focal loss function is defined by the following formula:

L_cls = (1 / (w·h)) · Σ_{i,j} −α_t · (1 − p_t)^γ · log(p_t)

where w and h respectively denote the width and height of the predicted two-class classification map, Y* denotes the given true-value map, α_t is a parameter for balancing positive and negative samples, γ is a parameter for balancing easy and hard samples, and p_t denotes the predicted value of the two-class classification network; p_t and α_t are computed as follows:

p_t = p if y = 1, otherwise p_t = 1 − p;    α_t = α if y = 1, otherwise α_t = 1 − α

where p denotes the predicted value output by the network at the pixel and y denotes the true value at the pixel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810729838.2A CN109117836B (en) | 2018-07-05 | 2018-07-05 | Method and device for detecting and positioning characters in natural scene based on focus loss function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109117836A true CN109117836A (en) | 2019-01-01 |
CN109117836B CN109117836B (en) | 2022-05-24 |
Family
ID=64821941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810729838.2A Active CN109117836B (en) | 2018-07-05 | 2018-07-05 | Method and device for detecting and positioning characters in natural scene based on focus loss function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109117836B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103279753A (en) * | 2013-06-09 | 2013-09-04 | 中国科学院自动化研究所 | English scene text block identification method based on instructions of tree structures |
CN105184312A (en) * | 2015-08-24 | 2015-12-23 | 中国科学院自动化研究所 | Character detection method and device based on deep learning |
CN105335754A (en) * | 2015-10-29 | 2016-02-17 | 小米科技有限责任公司 | Character recognition method and device |
CN106096531A (en) * | 2016-05-31 | 2016-11-09 | 安徽省云力信息技术有限公司 | A kind of traffic image polymorphic type vehicle checking method based on degree of depth study |
US20180032840A1 (en) * | 2016-07-27 | 2018-02-01 | Beijing Kuangshi Technology Co., Ltd. | Method and apparatus for neural network training and construction and method and apparatus for object detection |
2018-07-05: CN application CN201810729838.2A filed; granted as patent CN109117836B (status: Active)
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109740542A (en) * | 2019-01-07 | 2019-05-10 | 福建博思软件股份有限公司 | Method for text detection based on modified EAST algorithm |
CN109740542B (en) * | 2019-01-07 | 2020-11-27 | 福建博思软件股份有限公司 | Text detection method based on improved EAST algorithm |
CN111460247A (en) * | 2019-01-21 | 2020-07-28 | 重庆邮电大学 | Automatic detection method for network picture sensitive characters |
CN111460247B (en) * | 2019-01-21 | 2022-07-01 | 重庆邮电大学 | Automatic detection method for network picture sensitive characters |
CN109948480A (en) * | 2019-03-05 | 2019-06-28 | 中国电子科技集团公司第二十八研究所 | A kind of non-maxima suppression method for arbitrary quadrilateral |
CN110069985A (en) * | 2019-03-12 | 2019-07-30 | 北京三快在线科技有限公司 | Aiming spot detection method based on image, device, electronic equipment |
CN110378243A (en) * | 2019-06-26 | 2019-10-25 | 深圳大学 | A kind of pedestrian detection method and device |
CN110674932A (en) * | 2019-09-30 | 2020-01-10 | 北京小米移动软件有限公司 | Two-stage convolutional neural network target detection network training method and device |
CN110807523A (en) * | 2019-10-23 | 2020-02-18 | 中科智云科技有限公司 | Method and equipment for generating detection model of similar target |
CN110807523B (en) * | 2019-10-23 | 2022-08-05 | 中科智云科技有限公司 | Method and equipment for generating detection model of similar target |
CN110827253A (en) * | 2019-10-30 | 2020-02-21 | 北京达佳互联信息技术有限公司 | Training method and device of target detection model and electronic equipment |
CN110991440A (en) * | 2019-12-11 | 2020-04-10 | 易诚高科(大连)科技有限公司 | Pixel-driven mobile phone operation interface text detection method |
CN110991440B (en) * | 2019-12-11 | 2023-10-13 | 易诚高科(大连)科技有限公司 | Pixel-driven mobile phone operation interface text detection method |
CN111274985A (en) * | 2020-02-06 | 2020-06-12 | 咪咕文化科技有限公司 | Video text recognition network model, video text recognition device and electronic equipment |
CN111274985B (en) * | 2020-02-06 | 2024-03-26 | 咪咕文化科技有限公司 | Video text recognition system, video text recognition device and electronic equipment |
CN111582265A (en) * | 2020-05-14 | 2020-08-25 | 上海商汤智能科技有限公司 | Text detection method and device, electronic equipment and storage medium |
CN112184688A (en) * | 2020-10-10 | 2021-01-05 | 广州极飞科技有限公司 | Network model training method, target detection method and related device |
CN112184688B (en) * | 2020-10-10 | 2023-04-18 | 广州极飞科技股份有限公司 | Network model training method, target detection method and related device |
CN112149620A (en) * | 2020-10-14 | 2020-12-29 | 南昌慧亦臣科技有限公司 | Method for constructing natural scene character region detection model based on no anchor point |
CN113139539A (en) * | 2021-03-16 | 2021-07-20 | 中国科学院信息工程研究所 | Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary |
WO2023125244A1 (en) * | 2021-12-30 | 2023-07-06 | 中兴通讯股份有限公司 | Character detection method, terminal, and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109117836B (en) | 2022-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109117836A (en) | Text detection localization method and device under a kind of natural scene based on focal loss function | |
Li et al. | Automatic pavement crack detection by multi-scale image fusion | |
Ping et al. | A deep learning approach for street pothole detection | |
US8509478B2 (en) | Detection of objects in digital images | |
CN106355188A (en) | Image detection method and device | |
CN105574550A (en) | Vehicle identification method and device | |
CN109285139A (en) | A kind of x-ray imaging weld inspection method based on deep learning | |
CN107346420A (en) | Text detection localization method under a kind of natural scene based on deep learning | |
CN109711288A (en) | Remote sensing ship detecting method based on feature pyramid and distance restraint FCN | |
CN105975929A (en) | Fast pedestrian detection method based on aggregated channel features | |
CN108765386A (en) | A kind of tunnel slot detection method, device, electronic equipment and storage medium | |
CN107273832B (en) | License plate recognition method and system based on integral channel characteristics and convolutional neural network | |
CN105512683A (en) | Target positioning method and device based on convolution neural network | |
CN111368690A (en) | Deep learning-based video image ship detection method and system under influence of sea waves | |
CN109858547A (en) | A kind of object detection method and device based on BSSD | |
CN113111727A (en) | Method for detecting rotating target in remote sensing scene based on feature alignment | |
CN110008899B (en) | Method for extracting and classifying candidate targets of visible light remote sensing image | |
CN110059539A (en) | A kind of natural scene text position detection method based on image segmentation | |
CN113033516A (en) | Object identification statistical method and device, electronic equipment and storage medium | |
CN110020669A (en) | A kind of license plate classification method, system, terminal device and computer program | |
CN111382766A (en) | Equipment fault detection method based on fast R-CNN | |
CN105868269A (en) | Precise image searching method based on region convolutional neural network | |
CN108664970A (en) | A kind of fast target detection method, electronic equipment, storage medium and system | |
Zhang et al. | CFANet: Efficient detection of UAV image based on cross-layer feature aggregation | |
Kuchi et al. | A machine learning approach to detecting cracks in levees and floodwalls |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||