CN107563379A - Localization method for text in natural scene images - Google Patents
- Publication number: CN107563379A (application CN201710781807.7A)
- Authority: CN (China)
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Abstract
The invention discloses a method for locating text in natural scene images based on a fully convolutional neural network. Its steps are: (1) input the image samples to be recognized; (2) normalize them; (3) build and train a fully convolutional neural network; (4) screen the coordinate parameters output by the fully convolutional neural network; (5) locate the text in the natural scene image. The invention builds and trains a fully convolutional neural network that takes natural scene images containing text as input. It effectively addresses two shortcomings of the prior art: features of a single shallow layer are insufficient to characterize deep text information, and manually extracted features are computationally expensive and do not achieve end-to-end automatic text localization. By combining multiple image features, the invention obtains richer and deeper text information and improves the accuracy of text localization in natural scene images.
Description
Technical field
The invention belongs to the technical field of image processing, and further relates to a method, within the field of image localization, for locating text in natural scene images. Aimed at natural images containing text, the invention uses a fully convolutional neural network that incorporates angle variation, and can locate text in images of arbitrary size.
Background art
With the rapid growth of image data, extracting text from large numbers of natural scene images has become a research hotspot, and text localization has become a very important research topic in the field of image processing.
There are currently many papers and patents on text localization in natural scene images. In terms of the technical approaches taken, text localization techniques fall broadly into four kinds: techniques based on connected regions, on edges, on texture, and on corner points; some methods combine two of these techniques. Texture-based methods can locate text regions, but when a non-text region in a complex scene has a texture similar to that of a text region, they are prone to localization errors. Corner-based methods perform better on Chinese text than on English text, because the complex strokes of Chinese characters produce more corner points. These methods all belong to the category of shallow learning: they depend on feature extraction, require a large amount of manual work, and use only a single feature, which is insufficient to characterize the target comprehensively and leads to poor text localization results.
The Institute of Information Engineering, Chinese Academy of Sciences, in its patent application "A text localization method for complex background images" (application number CN201610153384.X, publication number CN105825216A), discloses a text localization method for images with complex backgrounds. The method first applies the MSERs algorithm separately to the R, G and B channels of the color image to be processed, obtaining the coordinates of three sets of MSERs regions on the image; it then denoises the MSERs regions, extracts preset features, and classifies the candidate MSERs regions based on those features to obtain the MSERs regions containing text; finally it connects the resulting text blocks into text lines and de-duplicates them. Although this is a Chinese text localization method applicable to complex scenes, it still has shortcomings: because it relies on connected-region techniques and on preset, manually engineered features, it is time-consuming and computationally expensive, does not extract deep features, and cannot characterize text information comprehensively. This leads to poor localization results, false detection of non-text regions that resemble text, and no end-to-end automatic text localization.
Paper " the Multi-Oriented Text Detection In Scene that M.Basavanna et al. delivers at it
Images”(International Journal of Pattern Recognition&Artificial Intelligence,
2012,26(07):A kind of text positioning method based on edge is disclosed in 1255010-1-1255010-19.).This method is first
First with Sobel operator extractions image border, the Sobel edge features of image are obtained, it is then big according to distance between character edge
The rule of small fixation, increase the size of calculated level direction character edge spacing using edge, be finally solution character and character
Between Characters Stuck problem caused by edge spacing very little, using the method for zero cross point, so as to position the text in image.Should
Weak point is existing for method, due to using Zero-Crossing Method, it is desirable to which text distribution must be horizontal, otherwise will go out
Now intersect, therefore be directed to the inclined situation of text, the accuracy rate of this method String localization is not high.
Content of the invention
The purpose of the present invention is to address the above shortcomings of the prior art by proposing a method for locating text in natural scene images. Compared with other natural scene text localization techniques in the prior art, the present invention needs no manually designed features, adapts well, and achieves high accuracy.
The specific steps of the present invention are as follows:
(1) Input the image samples to be identified:
(1a) From the synthetic text-image dataset and the RCTW-17 (Reading Chinese Text in the Wild) natural scene training dataset, randomly extract 32000 images with known text coordinates to form the training sample set;
(1b) Photograph 200 images containing text in natural scenes to form the test sample set;
(2) Normalization:
(2a) Scale each sample of the training and test sample sets to 416 × 416 pixels, forming the scaled training and test sample sets;
(2b) Normalize the pixel values of each sample in the scaled training and test sample sets, obtaining the normalized training and test sample sets;
(3) Build and train the fully convolutional neural network:
(3a) Build a fully convolutional neural network with 23 layers;
(3b) Input the samples of the normalized training set into the fully convolutional neural network and train it until the loss value of the output vector of the network's output layer is less than or equal to 10;
(4) Screen the coordinate parameters in the output vector of the fully convolutional neural network:
(4a) Take one untested sample from the normalized test set as the input sample;
(4b) Input the current sample into the trained network, obtaining the text-detection probability value and the coordinate parameters in the output vector of the network's output layer;
(4c) Judge whether the text-detection probability value in the output vector of the current input sample is greater than or equal to 0.6; if so, perform step (4d), otherwise perform step (4e);
(4d) Retain the coordinate parameters corresponding to that text-detection probability value, then perform step (4f);
(4e) Discard the coordinate parameters corresponding to that text-detection probability value, then perform step (4f);
(4f) Judge whether the test set still contains untested samples; if so, perform step (4a), otherwise perform step (5);
(5) Locate the text in the natural scene image:
Using the coordinate parameters retained from the output vector of the network's output layer, mark the text of each sample in the test set in turn.
Compared with the prior art, the present invention has the following advantages:
First, because the invention uses a 23-layer fully convolutional neural network, it extracts deeper and more comprehensive text features from natural scene images. This overcomes the prior-art problem that a single shallow feature characterizes text information incompletely and yields poor localization, so the invention can describe the text features of natural scene images finely and comprehensively and improves the accuracy of text localization.
Second, because the invention trains the fully convolutional neural network and obtains the coordinate parameters and text-detection probability directly from its output, it overcomes the prior-art problems of heavy and time-consuming computation with manually extracted features, the lack of end-to-end automatic text localization, and poor localization of tilted text. The invention learns deep image features automatically, does not depend on manual feature extraction, reduces computation, achieves end-to-end automatic text localization, and improves the localization of tilted text.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the flow chart of the step in which the present invention screens the coordinate parameters output by the fully convolutional neural network;
Fig. 3 shows an image from the training sample set input to the simulation experiment of the present invention;
Fig. 4 shows a test sample input image and the corresponding output image of the simulation experiment of the present invention.
Embodiments
The present invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1, the specific steps of the present invention are as follows.
Step 1: input the image samples to be identified.
From the synthetic text-image dataset and the RCTW-17 natural scene training dataset, randomly extract 32000 images with known text coordinates to form the training sample set.
Photograph 200 images containing text in natural scenes to form the test sample set.
Step 2: normalization.
Scale each sample of the training and test sample sets to 416 × 416 pixels, forming the scaled training and test sample sets.
Normalize the pixel values of each sample in the scaled training and test sample sets, obtaining the normalized training and test sample sets.
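Step 2 can be sketched in a few lines. The patent does not name the resizing method or the exact normalization formula, so the nearest-neighbour interpolation and the division by 255 below are assumptions:

```python
import numpy as np

def normalize_sample(image, size=416):
    """Resize an H x W x 3 image to size x size (nearest neighbour)
    and scale pixel values from [0, 255] to [0, 1].

    Minimal sketch of the patent's step 2; interpolation method and
    normalization formula are assumptions, not taken from the patent."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = image[rows][:, cols]       # integer-array indexing resize
    return resized.astype(np.float32) / 255.0

# Example: a random 300 x 500 RGB image
img = np.random.randint(0, 256, (300, 500, 3), dtype=np.uint8)
out = normalize_sample(img)
print(out.shape)   # (416, 416, 3)
```

Every sample, regardless of its original size, ends up as a 416 × 416 array of values in [0, 1], which is what the fixed-size network input requires.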
Step 3: build and train the fully convolutional neural network.
Build a fully convolutional neural network with 23 layers. Its structure, in order, is: convolutional layer Conv1, convolutional layer Conv2, no-sampling convolutional layer NSConv3, low-step convolutional layer LSConv4, convolutional layer Conv5, no-sampling convolutional layer NSConv6, low-step convolutional layer LSConv7, convolutional layer Conv8, no-sampling convolutional layer NSConv9, low-step convolutional layer LSConv10, no-sampling convolutional layer NSConv11, low-step convolutional layer LSConv12, convolutional layer Conv13, no-sampling convolutional layer NSConv14, low-step convolutional layer LSConv15, no-sampling convolutional layer NSConv16, low-step convolutional layer LSConv17, no-sampling convolutional layers NSConv18, NSConv19, NSConv20 and NSConv21, low-step convolutional layer LSConv22, and the output layer. The 23-layer fully convolutional neural network is built as follows:
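The spatial sizes quoted in the construction steps (416 → 208 → 104 → 52 → 26 → 13) follow from this layer list if each plain Conv layer ends in a pooling step that halves the feature map while the NSConv and LSConv layers preserve it, which is how the steps below describe them. A small sketch under that reading:

```python
# Spatial sizes through the 23 layers described above.  Assumption
# (taken from the construction steps): "Conv" layers end with max
# down-sampling that halves the feature map; NSConv ("no sampling")
# and LSConv ("low step") layers keep the spatial size unchanged.
layers = ["Conv1", "Conv2", "NSConv3", "LSConv4", "Conv5", "NSConv6",
          "LSConv7", "Conv8", "NSConv9", "LSConv10", "NSConv11",
          "LSConv12", "Conv13", "NSConv14", "LSConv15", "NSConv16",
          "LSConv17", "NSConv18", "NSConv19", "NSConv20", "NSConv21",
          "LSConv22"]

size = 416                       # normalized input resolution
sizes = {}
for name in layers:
    if name.startswith("Conv"):  # conv + batch norm + Relu + max pool
        size //= 2
    sizes[name] = size           # NSConv / LSConv leave the size as-is

print(sizes["Conv1"], sizes["Conv2"], sizes["Conv5"],
      sizes["Conv8"], sizes["Conv13"], sizes["LSConv22"])
# 208 104 52 26 13 13
```

The five halving layers reduce 416 to 13, matching the 13 × 13 feature maps that reach the output layer.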
1st step: input the samples of the normalized training set to convolutional layer Conv1; perform convolution, batch normalization, Relu activation and max down-sampling in turn, obtaining 32 feature maps of 208 × 208 pixels output by Conv1;
2nd step: input the Conv1 output feature maps to convolutional layer Conv2; perform convolution, batch normalization, Relu activation and max down-sampling in turn, obtaining 64 feature maps of 104 × 104 pixels output by Conv2;
3rd step: input the Conv2 output feature maps to no-sampling convolutional layer NSConv3; perform convolution, batch normalization and Relu activation in turn, obtaining 128 feature maps of 104 × 104 pixels output by NSConv3;
4th step: input the NSConv3 output feature maps to low-step convolutional layer LSConv4; perform low-step convolution and Relu activation in turn, obtaining 64 feature maps of 104 × 104 pixels output by LSConv4;
5th step: input the LSConv4 output feature maps to convolutional layer Conv5; perform convolution, batch normalization, Relu activation and max down-sampling in turn, obtaining 128 feature maps of 52 × 52 pixels output by Conv5;
6th step: input the Conv5 output feature maps to no-sampling convolutional layer NSConv6; perform convolution, batch normalization and Relu activation in turn, obtaining 256 feature maps of 52 × 52 pixels output by NSConv6;
7th step: input the NSConv6 output feature maps to low-step convolutional layer LSConv7; perform low-step convolution and Relu activation in turn, obtaining 128 feature maps of 52 × 52 pixels output by LSConv7;
8th step: input the LSConv7 output feature maps to convolutional layer Conv8; perform convolution, batch normalization, Relu activation and max down-sampling in turn, obtaining 256 feature maps of 26 × 26 pixels output by Conv8;
9th step: input the Conv8 output feature maps to no-sampling convolutional layer NSConv9; perform convolution, batch normalization and Relu activation in turn, obtaining 512 feature maps of 26 × 26 pixels output by NSConv9;
10th step: input the NSConv9 output feature maps to low-step convolutional layer LSConv10; perform low-step convolution and Relu activation in turn, obtaining 256 feature maps of 26 × 26 pixels output by LSConv10;
11th step: input the LSConv10 output feature maps to no-sampling convolutional layer NSConv11; perform convolution, batch normalization and Relu activation in turn, obtaining 512 feature maps of 26 × 26 pixels output by NSConv11;
12th step: input the NSConv11 output feature maps to low-step convolutional layer LSConv12; perform low-step convolution and Relu activation in turn, obtaining 256 feature maps of 26 × 26 pixels output by LSConv12;
13th step: input the LSConv12 output feature maps to convolutional layer Conv13; perform convolution, batch normalization, Relu activation and max down-sampling in turn, obtaining 512 feature maps of 13 × 13 pixels output by Conv13;
14th step: input the Conv13 output feature maps to no-sampling convolutional layer NSConv14; perform convolution, batch normalization and Relu activation in turn, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv14;
15th step: input the NSConv14 output feature maps to low-step convolutional layer LSConv15; perform low-step convolution and Relu activation in turn, obtaining 512 feature maps of 13 × 13 pixels output by LSConv15;
16th step: input the LSConv15 output feature maps to no-sampling convolutional layer NSConv16; perform convolution, batch normalization and Relu activation in turn, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv16;
17th step: input the NSConv16 output feature maps to low-step convolutional layer LSConv17; perform low-step convolution and Relu activation in turn, obtaining 512 feature maps of 13 × 13 pixels output by LSConv17;
18th step: input the LSConv17 output feature maps to no-sampling convolutional layer NSConv18; perform convolution, batch normalization and Relu activation in turn, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv18;
19th step: input the NSConv18 output feature maps to no-sampling convolutional layer NSConv19; perform convolution, batch normalization and Relu activation in turn, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv19;
20th step: input the NSConv19 output feature maps to no-sampling convolutional layer NSConv20; perform convolution, batch normalization and Relu activation in turn, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv20;
21st step: input the NSConv20 output feature maps to no-sampling convolutional layer NSConv21; perform convolution, batch normalization and Relu activation in turn, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv21;
22nd step: input the NSConv21 output feature maps to low-step convolutional layer LSConv22; perform low-step convolution and Relu activation in turn, obtaining 40 feature maps of 13 × 13 pixels output by LSConv22;
23rd step: input the LSConv22 output feature maps to the output layer; perform a linear transform and a non-linear Sigmoid transform in turn, obtaining the output vector of the output layer, where the output vector consists of the coordinate parameters and the text-detection probability value output by the output layer of the fully convolutional neural network.
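The output-layer decoding can be sketched as follows. The patent states only that a linear transform is followed by a non-linear Sigmoid transform and that the result contains coordinate parameters plus a text-detection probability; the seven-value layout used here (x, y, width, height, sine, cosine, probability) is an illustrative assumption taken from the loss description below, not a stated format:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation output for one grid cell after the
# output layer's linear transform.  The 7-value layout is an
# assumption for illustration; the patent does not give the ordering.
raw = np.array([0.2, -0.5, 1.0, 0.3, 0.0, 0.7, 2.0])
decoded = sigmoid(raw)            # the patent's non-linear Sigmoid transform
x, y, w, h, s, t, prob = decoded  # coordinates, angle terms, probability
print(round(float(prob), 3))      # 0.881
```

The sigmoid squashes every output into (0, 1), so the last component can be compared directly against the 0.6 screening threshold used in step 4.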
Input the samples of the normalized training set into the fully convolutional neural network and train it until the loss value of the output vector of the output layer is less than or equal to 10. The network is trained as follows:
1st step: extract one sample from the training set and input it into the constructed fully convolutional neural network; the corresponding output is then produced at the output layer. In this stage, the information of the sample is transformed layer by layer through the network and delivered to the output layer.
2nd step: calculate the loss value of the output vector of the output layer, and use the ADAM algorithm to make a supervised adjustment of all parameters of the network, so that the loss value of the output vector of the output layer gradually decreases.
3rd step: repeat the 1st and 2nd steps until the loss value of the output vector of the output layer is less than or equal to 10, then stop iterating, obtaining and saving the trained fully convolutional neural network.
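The loop above (forward pass, loss, supervised ADAM adjustment, repeat until the loss threshold) can be illustrated with a minimal Adam update on a toy objective. The β and ε values are the common Adam defaults; the learning rate, step count and the scalar quadratic standing in for the network are assumptions, since the patent specifies none of them:

```python
import numpy as np

def adam_minimize(grad_fn, theta, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Sketch of the ADAM parameter adjustment from step 3.
    grad_fn(theta) returns the gradient of the loss at theta."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy "loss" (theta - 3)^2 with gradient 2 * (theta - 3): the
# parameter is driven toward the minimizer at 3.
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta)
```

In the patent, the same update is applied to all network parameters at once, with the output-layer loss described next playing the role of the toy objective.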
The loss value of the output vector of the output layer of the fully convolutional neural network is obtained from the following formulas:

X = (x_p − x_r)², Y = (y_p − y_r)², W = (w_p − w_r)², H = (h_p − h_r)²,
S = (s_p − s_r)², T = (t_p − t_r)², C = (c_p − c_r)²,
l = X + Y + W + H + S + T + C

where X is the loss of the abscissa value in the output vector of the output layer, x_p the abscissa value in the output vector, and x_r the real abscissa of the text region in the input sample; Y is the loss of the ordinate value, y_p the ordinate in the output vector, and y_r the real ordinate of the text region; W is the loss of the width value, w_p the width in the output vector, and w_r the real width of the text region; H is the loss of the height value, h_p the height in the output vector, and h_r the real height of the text region; S is the loss of the sine value, s_p the sine in the output vector, and s_r the real sine of the text region; T is the loss of the cosine value, t_p the cosine in the output vector, and t_r the real cosine of the text region; C is the loss of the text-detection probability value, c_p the text-detection probability in the output vector, and c_r the real (ideal) text-detection probability of the text region; and l is the loss value of the output vector.
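Assuming each penalty term is the squared difference between the predicted value and the real value (the patent pairs each predicted quantity x_p, y_p, …, c_p with a real counterpart x_r, y_r, …, c_r; the squared-error form is an assumption here, since the original formula did not survive extraction), the loss can be computed as:

```python
def localization_loss(pred, real):
    """Sum-of-squared-errors loss over the seven output quantities.
    pred/real: dicts with keys x, y, w, h, s, t, c.  The squared-error
    form of each term is an assumption, not taken from the patent."""
    terms = {k.upper(): (pred[k] - real[k]) ** 2 for k in "xywhstc"}
    terms["l"] = sum(terms.values())   # total loss l = X+Y+W+H+S+T+C
    return terms

# Hypothetical prediction that is slightly off in x, y and c:
pred = dict(x=0.5, y=0.5, w=0.2, h=0.1, s=0.0, t=1.0, c=0.9)
real = dict(x=0.4, y=0.6, w=0.2, h=0.1, s=0.0, t=1.0, c=1.0)
loss = localization_loss(pred, real)
print(round(loss["l"], 4))   # 0.03
```

Three coordinates each off by 0.1 contribute 0.01 apiece, giving a total loss of 0.03; training stops once this total drops to 10 or below over the training data.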
Step 4: screen the coordinate parameters output by the output layer of the fully convolutional neural network.
With reference to Fig. 2, the screening of the coordinate parameters output by the output layer proceeds as follows.
1st step: take one untested sample from the normalized test set as the input sample;
2nd step: input the current sample into the trained network, obtaining the text-detection probability value and the coordinate parameters in the output vector of the output layer;
3rd step: judge whether the text-detection probability value in the output vector of the current input sample is greater than or equal to 0.6; if so, perform the 4th step, otherwise perform the 5th step;
4th step: retain the coordinate parameters corresponding to that text-detection probability value, then perform the 6th step;
5th step: discard the coordinate parameters corresponding to that text-detection probability value, then perform the 6th step;
6th step: judge whether the test set still contains untested samples; if so, return to the 1st step, otherwise perform step 5.
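The retain/discard decision above reduces to a single threshold filter. The (probability, coordinates) tuple layout below is an illustrative assumption; the patent specifies only the 0.6 threshold:

```python
def screen_detections(detections, threshold=0.6):
    """Keep a detection's coordinate parameters only when its
    text-detection probability is at least the threshold (0.6 in
    the patent).  Detections are modelled as (prob, coords) pairs."""
    return [coords for prob, coords in detections if prob >= threshold]

dets = [(0.95, (10, 20, 50, 30)),   # kept
        (0.40, (80, 15, 40, 25)),   # discarded
        (0.60, (5, 5, 20, 20))]     # kept (boundary case: >= 0.6)
print(screen_detections(dets))
# [(10, 20, 50, 30), (5, 5, 20, 20)]
```

Note the comparison is "greater than or equal to", so a probability of exactly 0.6 is retained.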
Step 5: locate the text in the natural scene image.
Using the coordinate parameters retained from the output vector of the network's output layer, mark the text of each sample in the test set in turn.
The effect of the present invention is further described below with reference to a simulation experiment.
1. Simulation conditions:
Hardware platform: Intel Core i7-6700 CPU @ 3.40 GHz, 32 GB RAM, Nvidia GeForce GTX 1060 6 GB GPU. Software platform: Python 3.5.2, Tensorflow 1.0.1.
2. Experiment content and result analysis:
The training sample set used in the simulation experiment consists of 32000 images with known text coordinates extracted at random from the synthetic text-image dataset and the RCTW-17 natural scene training dataset.
The image in Fig. 3(a) is a training image from the synthetic text-image dataset; the image in Fig. 3(b) is a training image from the RCTW-17 natural scene training dataset. The 200 images containing text photographed in natural scenes form the test sample set.
The image in Fig. 4(a) is a test image from the test sample set. The image in Fig. 4(b) shows the test image of Fig. 4(a) after passing through the 23-layer fully convolutional neural network and being marked.
The present invention first establishes the training sample set of 32000 images and the test sample set of 200 images, and normalizes both sets to obtain the normalized training and test sample sets. The 32000 training samples are used to train the 23-layer fully convolutional neural network. The 200 test samples are then input into the trained network to obtain the output coordinate parameters and text-detection probabilities; the coordinate parameters retained after screening are used to mark the input test samples.
Table 1 gives the simulation results: of the 200 test samples, text localization succeeded for 162 samples and failed entirely for 38 samples.
Table 1. Text localization results on the natural scene test sample set.
|                   | All successful | All failed | All samples |
| Number of samples | 162            | 38         | 200         |
| Ratio             | 81.0%          | 19.0%      | 100.0%      |
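The ratios in Table 1 can be recomputed directly from the reported counts:

```python
# Recompute the Table 1 ratios from the reported sample counts.
successes, failures = 162, 38
total = successes + failures           # all test samples
success_rate = 100.0 * successes / total
failure_rate = 100.0 * failures / total
print(total, f"{success_rate:.1f}%", f"{failure_rate:.1f}%")
# 200 81.0% 19.0%
```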
As Table 1 shows, using the proposed method for locating text in natural scene images, the text localization success rate on images containing text in natural scenes is 81.0%. This demonstrates that, by building and training a fully convolutional neural network, the present invention can extract deeper text information from the image, and has the advantages of combining multiple image features, obtaining more comprehensive text information, and improving the accuracy of text localization in natural scenes.
Claims (7)
1. a kind of localization method for natural scene image Chinese version, it is characterised in that comprise the following steps:
(1) image pattern to be identified is inputted:
(1a) is from the artificial synthesized training data that Chinese text RCTW-17 is read containing text image data collection and natural scene
Concentrate, the random image composition training sample set for extracting 32000 width known text coordinates;
(1b) shoots the image composition test sample collection that 200 width contain text from natural scene;
(2) normalized:
Each sample of training sample set and test sample collection is zoomed to 416 × 416 sizes by (2a), forms the training after scaling
Sample set and test sample collection;
(2b) concentrates the pixel value of each sample to be normalized the training sample set after scaling and test sample, obtains
Training sample set and test sample collection after normalized;
(3) build and train full convolutional neural networks:
(3a) structure contains 23 layers of full convolutional neural networks;
The sample that training sample after normalized is concentrated is input in full convolutional neural networks by (3b), to full convolutional Neural
Network is trained, until the penalty values of the output vector of full convolutional neural networks output layer are less than or equal to 10;
(4) screen the coordinate parameters in the output vector of the fully convolutional neural network:
(4a) take one untested sample from the normalized test sample set as the input sample;
(4b) input the current input sample into the trained fully convolutional neural network, obtaining the text detection probability value and the coordinate parameters in the output vector of the network's output layer for the current sample;
(4c) judge whether the text detection probability value in the output vector of the current input sample is greater than or equal to 0.6; if so, perform step (4d); otherwise, perform step (4e);
(4d) retain the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, then perform step (4f);
(4e) discard the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, then perform step (4f);
(4f) judge whether the test sample set still contains untested samples; if so, perform step (4a); otherwise, perform step (5);
(5) locate the text in the natural scene images:
using the coordinate parameters retained from the output vector of the fully convolutional neural network's output layer, mark the text of each sample in the test set in turn.
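The screening of step (4) reduces to thresholding each prediction's text detection probability at 0.6 and keeping or discarding its coordinate parameters accordingly. A minimal sketch, assuming an illustrative `predictions` structure of probability/coordinate pairs (the field names are not part of the claim):

```python
def screen_predictions(predictions, threshold=0.6):
    """Keep the coordinate parameters of predictions whose text
    detection probability meets the threshold (steps 4c-4e)."""
    retained = []
    for pred in predictions:
        # pred["prob"]: text detection probability from the output vector
        # pred["coords"]: (x, y, w, h, sin, cos) coordinate parameters
        if pred["prob"] >= threshold:
            retained.append(pred["coords"])
        # otherwise the coordinate parameters are discarded (step 4e)
    return retained

# Example: two detections at or above the 0.6 threshold, one below.
preds = [
    {"prob": 0.91, "coords": (120, 44, 60, 18, 0.0, 1.0)},
    {"prob": 0.35, "coords": (10, 10, 5, 5, 0.0, 1.0)},
    {"prob": 0.60, "coords": (200, 90, 80, 22, 0.5, 0.87)},
]
boxes = screen_predictions(preds)
print(len(boxes))  # → 2
```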
2. The localization method for text in natural scene images according to claim 1, characterized in that the structure of the 23-layer fully convolutional neural network described in step (3a) is, in order: convolutional layer Conv1, convolutional layer Conv2, no-sampling convolutional layer NSConv3, low-stride convolutional layer LSConv4, convolutional layer Conv5, no-sampling convolutional layer NSConv6, low-stride convolutional layer LSConv7, convolutional layer Conv8, no-sampling convolutional layer NSConv9, low-stride convolutional layer LSConv10, no-sampling convolutional layer NSConv11, low-stride convolutional layer LSConv12, convolutional layer Conv13, no-sampling convolutional layer NSConv14, low-stride convolutional layer LSConv15, no-sampling convolutional layer NSConv16, low-stride convolutional layer LSConv17, no-sampling convolutional layer NSConv18, no-sampling convolutional layer NSConv19, no-sampling convolutional layer NSConv20, no-sampling convolutional layer NSConv21, low-stride convolutional layer LSConv22, and the output layer.
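The layer sequence of claim 2 can be restated as a plain list, which makes the 23-layer total easy to verify by counting the three layer types (the list is only a restatement of the claim, not an implementation):

```python
# The 23 layers of claim 2, in order (Conv = convolution followed by max
# down-sampling, NSConv = no-sampling convolution, LSConv = low-stride
# convolution).
LAYERS = [
    "Conv1", "Conv2", "NSConv3", "LSConv4", "Conv5", "NSConv6",
    "LSConv7", "Conv8", "NSConv9", "LSConv10", "NSConv11", "LSConv12",
    "Conv13", "NSConv14", "LSConv15", "NSConv16", "LSConv17",
    "NSConv18", "NSConv19", "NSConv20", "NSConv21", "LSConv22",
    "Output",
]

n_conv = sum(1 for n in LAYERS if n.startswith("Conv"))
n_ns = sum(1 for n in LAYERS if n.startswith("NSConv"))
n_ls = sum(1 for n in LAYERS if n.startswith("LSConv"))
print(len(LAYERS), n_conv, n_ns, n_ls)  # → 23 5 10 7
```

Five down-sampling Conv layers, ten no-sampling layers, seven low-stride layers and the output layer together give the 23 layers of the claim.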
3. The localization method for text in natural scene images according to claim 1, characterized in that the specific steps for building the 23-layer fully convolutional neural network described in step (3a) are as follows:
Step 1: input the samples of the normalized training sample set into convolutional layer Conv1 and apply, in order, a convolution operation, batch normalization, a linear ReLU transformation, and a max down-sampling operation, obtaining 32 feature maps of 208 × 208 pixels output by Conv1;
Step 2: input the feature maps output by Conv1 into convolutional layer Conv2 and apply, in order, a convolution operation, batch normalization, a linear ReLU transformation, and a max down-sampling operation, obtaining 64 feature maps of 104 × 104 pixels output by Conv2;
Step 3: input the feature maps output by Conv2 into no-sampling convolutional layer NSConv3 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 128 feature maps of 104 × 104 pixels output by NSConv3;
Step 4: input the feature maps output by NSConv3 into low-stride convolutional layer LSConv4 and apply, in order, a low-stride convolution operation and a linear ReLU transformation, obtaining 64 feature maps of 104 × 104 pixels output by LSConv4;
Step 5: input the feature maps output by LSConv4 into convolutional layer Conv5 and apply, in order, a convolution operation, batch normalization, a linear ReLU transformation, and a max down-sampling operation, obtaining 128 feature maps of 52 × 52 pixels output by Conv5;
Step 6: input the feature maps output by Conv5 into no-sampling convolutional layer NSConv6 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 256 feature maps of 52 × 52 pixels output by NSConv6;
Step 7: input the feature maps output by NSConv6 into low-stride convolutional layer LSConv7 and apply, in order, a low-stride convolution operation and a linear ReLU transformation, obtaining 128 feature maps of 52 × 52 pixels output by LSConv7;
Step 8: input the feature maps output by LSConv7 into convolutional layer Conv8 and apply, in order, a convolution operation, batch normalization, a linear ReLU transformation, and a max down-sampling operation, obtaining 256 feature maps of 26 × 26 pixels output by Conv8;
Step 9: input the feature maps output by Conv8 into no-sampling convolutional layer NSConv9 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 512 feature maps of 26 × 26 pixels output by NSConv9;
Step 10: input the feature maps output by NSConv9 into low-stride convolutional layer LSConv10 and apply, in order, a low-stride convolution operation and a linear ReLU transformation, obtaining 256 feature maps of 26 × 26 pixels output by LSConv10;
Step 11: input the feature maps output by LSConv10 into no-sampling convolutional layer NSConv11 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 512 feature maps of 26 × 26 pixels output by NSConv11;
Step 12: input the feature maps output by NSConv11 into low-stride convolutional layer LSConv12 and apply, in order, a low-stride convolution operation and a linear ReLU transformation, obtaining 256 feature maps of 26 × 26 pixels output by LSConv12;
Step 13: input the feature maps output by LSConv12 into convolutional layer Conv13 and apply, in order, a convolution operation, batch normalization, a linear ReLU transformation, and a max down-sampling operation, obtaining 512 feature maps of 13 × 13 pixels output by Conv13;
Step 14: input the feature maps output by Conv13 into no-sampling convolutional layer NSConv14 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv14;
Step 15: input the feature maps output by NSConv14 into low-stride convolutional layer LSConv15 and apply, in order, a low-stride convolution operation and a linear ReLU transformation, obtaining 512 feature maps of 13 × 13 pixels output by LSConv15;
Step 16: input the feature maps output by LSConv15 into no-sampling convolutional layer NSConv16 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv16;
Step 17: input the feature maps output by NSConv16 into low-stride convolutional layer LSConv17 and apply, in order, a low-stride convolution operation and a linear ReLU transformation, obtaining 512 feature maps of 13 × 13 pixels output by LSConv17;
Step 18: input the feature maps output by LSConv17 into no-sampling convolutional layer NSConv18 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv18;
Step 19: input the feature maps output by NSConv18 into no-sampling convolutional layer NSConv19 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv19;
Step 20: input the feature maps output by NSConv19 into no-sampling convolutional layer NSConv20 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv20;
Step 21: input the feature maps output by NSConv20 into no-sampling convolutional layer NSConv21 and apply, in order, a convolution operation, batch normalization, and a linear ReLU transformation, obtaining 1024 feature maps of 13 × 13 pixels output by NSConv21;
Step 22: input the feature maps output by NSConv21 into low-stride convolutional layer LSConv22 and apply, in order, a low-stride convolution operation and a linear ReLU transformation, obtaining 40 feature maps of 13 × 13 pixels output by LSConv22;
Step 23: input the feature maps output by LSConv22 into the output layer and apply, in order, a linear transformation and a nonlinear Sigmoid transformation, obtaining the output vector of the output layer, where the output vector is composed of the coordinate parameters and the text detection probability value output by the output layer of the fully convolutional neural network.
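Only the five Conv layers halve the spatial resolution (via their max down-sampling); the NSConv and LSConv layers preserve it, so a 416 × 416 input shrinks through 208, 104, 52 and 26 to 13 × 13. A small shape-propagation check, with channel counts copied from the construction steps above (the spec format itself is an illustrative assumption):

```python
# (layer name, output channels, halves_resolution?) taken from the
# 23 construction steps of claim 3.
SPEC = [
    ("Conv1", 32, True), ("Conv2", 64, True), ("NSConv3", 128, False),
    ("LSConv4", 64, False), ("Conv5", 128, True), ("NSConv6", 256, False),
    ("LSConv7", 128, False), ("Conv8", 256, True), ("NSConv9", 512, False),
    ("LSConv10", 256, False), ("NSConv11", 512, False),
    ("LSConv12", 256, False), ("Conv13", 512, True),
    ("NSConv14", 1024, False), ("LSConv15", 512, False),
    ("NSConv16", 1024, False), ("LSConv17", 512, False),
    ("NSConv18", 1024, False), ("NSConv19", 1024, False),
    ("NSConv20", 1024, False), ("NSConv21", 1024, False),
    ("LSConv22", 40, False),
]

def propagate(size=416):
    """Track (channels, height, width) through the convolutional stack."""
    shapes = {}
    for name, channels, halves in SPEC:
        if halves:          # max down-sampling step of a Conv layer
            size //= 2
        shapes[name] = (channels, size, size)
    return shapes

shapes = propagate()
print(shapes["Conv1"])     # → (32, 208, 208)
print(shapes["LSConv22"])  # → (40, 13, 13)
```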
4. The localization method for text in natural scene images according to claim 1, characterized in that the specific steps for training the fully convolutional neural network described in step (3b) are as follows:
Step 1: take one sample from the training sample set and input it into the constructed fully convolutional neural network, then obtain the corresponding output from the network's output layer; in this stage, the information of the sample is transformed layer by layer through the fully convolutional neural network and delivered to its output layer;
Step 2: compute the loss value of the output vector of the fully convolutional neural network's output layer, and use the ADAM algorithm to perform a supervised adjustment of all parameters of the network so that the loss value of the output vector of the output layer gradually decreases;
Step 3: repeat Step 1 and Step 2, stopping the iteration when the loss value of the output vector of the output layer is less than or equal to 10, obtaining and saving the trained fully convolutional neural network.
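The train-until-the-loss-drops loop of claim 4 pairs a forward pass with an ADAM parameter update. As a framework-free illustration, the sketch below applies the standard ADAM update rule to a single toy parameter until a quadratic stand-in loss falls to the threshold; the toy loss merely stands in for the network's output-vector loss and is not the patent's network:

```python
import math

def adam_minimize(grad_fn, loss_fn, p, lr=0.1, threshold=10.0,
                  beta1=0.9, beta2=0.999, eps=1e-8, max_iter=10000):
    """Standard ADAM update loop: iterate until loss_fn(p) <= threshold."""
    m = v = 0.0
    for t in range(1, max_iter + 1):
        if loss_fn(p) <= threshold:
            break                                # stopping rule of claim 4
        g = grad_fn(p)
        m = beta1 * m + (1 - beta1) * g          # 1st-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # 2nd-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        p -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p

# Toy stand-in loss: l(p) = (p - 3)^2, starting far from the minimum.
loss = lambda p: (p - 3.0) ** 2
grad = lambda p: 2.0 * (p - 3.0)
p_final = adam_minimize(grad, loss, p=50.0, threshold=10.0)
print(loss(p_final) <= 10.0)  # → True
```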
5. The localization method for text in natural scene images according to claim 1, characterized in that the loss value of the output vector of the fully convolutional neural network's output layer described in step (3b) is obtained by the following formulas:
$$
\begin{cases}
X = (x^p - x^r)^2 \\
Y = (y^p - y^r)^2 \\
W = (w^p - w^r)^2 \\
H = (h^p - h^r)^2 \\
S = (s^p - s^r)^2 \\
T = (t^p - t^r)^2 \\
C = (c^p - c^r)^2 \\
l = \dfrac{5}{2}\left[X + Y + W + H + S + T\right] + \dfrac{3}{2}C
\end{cases}
$$
where X denotes the loss of the abscissa value in the output vector of the fully convolutional neural network's output layer, x^p the abscissa value in the output vector, and x^r the true abscissa value of the text region in the input sample; Y denotes the loss of the ordinate value, y^p the ordinate value in the output vector, and y^r the true ordinate value of the text region; W denotes the loss of the width value, w^p the width value in the output vector, and w^r the true width value of the text region; H denotes the loss of the height value, h^p the height value in the output vector, and h^r the true height value of the text region; S denotes the loss of the sine value, s^p the sine value in the output vector, and s^r the true sine value of the text region; T denotes the loss of the cosine value, t^p the cosine value in the output vector, and t^r the true cosine value of the text region; C denotes the loss of the text detection probability value, c^p the text detection probability value in the output vector, and c^r the ideal text detection probability value of the text region in the input sample; l denotes the loss value of the output vector.
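The loss is thus a weighted sum of squared errors: weight 5/2 on the six geometric terms and 3/2 on the confidence term. A direct transcription, with a worked case where every component is off by 1:

```python
def output_loss(pred, true):
    """Loss l of the output vector, following the formulas of claim 5.

    pred, true: 7-tuples (x, y, w, h, s, t, c) holding the predicted
    and real abscissa, ordinate, width, height, sine, cosine, and text
    detection probability values."""
    sq = [(p - r) ** 2 for p, r in zip(pred, true)]  # X, Y, W, H, S, T, C
    X, Y, W, H, S, T, C = sq
    return 2.5 * (X + Y + W + H + S + T) + 1.5 * C

# Each geometric term off by 1 and the confidence off by 1:
# l = 5/2 * 6 + 3/2 * 1 = 16.5
pred = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
true = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
print(output_loss(pred, true))  # → 16.5
```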
6. The localization method for text in natural scene images according to claim 1, characterized in that the text detection probability value in the output vector of the fully convolutional neural network's output layer described in step (4b) is obtained by the following formula:
$$
c = \frac{1}{1 + e^{-\beta}}
$$
where c denotes the text detection probability value in the output vector of the fully convolutional neural network's output layer, and β denotes the 7th output value, counting from left to right, in the output vector of the network's output layer.
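The formula of claim 6 is the logistic sigmoid of the 7th raw output β, which squashes any real value into (0, 1) and so yields a probability that can be compared against the 0.6 screening threshold of step (4c):

```python
import math

def text_probability(beta):
    """c = 1 / (1 + e^(-beta)): text detection probability computed
    from the 7th raw value of the output vector."""
    return 1.0 / (1.0 + math.exp(-beta))

print(text_probability(0.0))        # → 0.5
print(text_probability(5.0) > 0.6)  # → True: would pass the 0.6 screen
print(text_probability(-5.0))       # small value: would be discarded
```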
7. The localization method for text in natural scene images according to claim 1, characterized in that the coordinate parameters in the output vector of the fully convolutional neural network's output layer described in step (4b) comprise its abscissa value, ordinate value, width value, height value, sine value, and cosine value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710781807.7A CN107563379B (en) | 2017-09-02 | 2017-09-02 | Method for positioning text in natural scene image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107563379A true CN107563379A (en) | 2018-01-09 |
CN107563379B CN107563379B (en) | 2019-12-24 |
Family
ID=60977874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710781807.7A Active CN107563379B (en) | 2017-09-02 | 2017-09-02 | Method for positioning text in natural scene image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107563379B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090116756A1 (en) * | 2007-11-06 | 2009-05-07 | Copanion, Inc. | Systems and methods for training a document classification system using documents from a plurality of users |
CN102663383A (en) * | 2012-04-26 | 2012-09-12 | 北京科技大学 | Method for positioning texts in images of natural scene |
CN104182750A (en) * | 2014-07-14 | 2014-12-03 | 上海交通大学 | Extremum connected domain based Chinese character detection method in natural scene image |
CN104809481A (en) * | 2015-05-21 | 2015-07-29 | 中南大学 | Natural scene text detection method based on adaptive color clustering |
CN105825216A (en) * | 2016-03-17 | 2016-08-03 | 中国科学院信息工程研究所 | Method of locating text in complex background image |
Non-Patent Citations (5)
Title |
---|
ANKUSH GUPTA 等: "Synthetic Data for Text Localisation in Natural Images", 《THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
DENA BAZAZIAN 等: "Improving text proposals for scene images with fully convolutional networks", 《网页在线公开:HTTPS://ARXIV.ORG/ABS/1702.05089》 * |
ZECHENG XIE 等: "Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition", 《网页在线公开:HTTPS://ARXIV.ORG/ABS/ 1610.02616》 * |
贺通姚剑: "基于全卷积网络的场景文本检测", 《黑龙江科技信息》 * |
骆遥: "基于深度全卷积神经网络的文字区域定位方法", 《无线互联科技》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108288088B (en) * | 2018-01-17 | 2020-02-28 | 浙江大学 | Scene text detection method based on end-to-end full convolution neural network |
CN108288088A (en) * | 2018-01-17 | 2018-07-17 | 浙江大学 | A kind of scene text detection method based on end-to-end full convolutional neural networks |
CN108664968A (en) * | 2018-04-18 | 2018-10-16 | 江南大学 | A kind of unsupervised text positioning method based on text selection model |
CN108664968B (en) * | 2018-04-18 | 2020-07-07 | 江南大学 | Unsupervised text positioning method based on text selection model |
CN108805131A (en) * | 2018-05-22 | 2018-11-13 | 北京旷视科技有限公司 | Text line detection method, apparatus and system |
CN109858318A (en) * | 2018-11-16 | 2019-06-07 | 平安科技(深圳)有限公司 | The classification recognition methods of landscape image and device |
CN110399871A (en) * | 2019-06-14 | 2019-11-01 | 华南理工大学 | A kind of appraisal procedure of scene text testing result |
CN110689012A (en) * | 2019-10-08 | 2020-01-14 | 山东浪潮人工智能研究院有限公司 | End-to-end natural scene text recognition method and system |
CN112836696A (en) * | 2019-11-22 | 2021-05-25 | 搜狗(杭州)智能科技有限公司 | Text data detection method and device and electronic equipment |
CN112200598A (en) * | 2020-09-08 | 2021-01-08 | 北京数美时代科技有限公司 | Picture advertisement identification method and device and computer equipment |
CN112200598B (en) * | 2020-09-08 | 2022-02-15 | 北京数美时代科技有限公司 | Picture advertisement identification method and device and computer equipment |
CN113342994A (en) * | 2021-07-05 | 2021-09-03 | 成都信息工程大学 | Recommendation system based on non-sampling cooperative knowledge graph network |
CN113342994B (en) * | 2021-07-05 | 2022-07-05 | 成都信息工程大学 | Recommendation system based on non-sampling cooperative knowledge graph network |
Also Published As
Publication number | Publication date |
---|---|
CN107563379B (en) | 2019-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107563379A (en) | For the localization method to natural scene image Chinese version | |
AU2020100200A4 (en) | Content-guide Residual Network for Image Super-Resolution | |
CN106127684B (en) | Image super-resolution Enhancement Method based on forward-backward recutrnce convolutional neural networks | |
CN104978580B (en) | A kind of insulator recognition methods for unmanned plane inspection transmission line of electricity | |
CN103077511B (en) | Image super-resolution reconstruction method based on dictionary learning and structure similarity | |
CN108549893A (en) | A kind of end-to-end recognition methods of the scene text of arbitrary shape | |
CN111986099A (en) | Tillage monitoring method and system based on convolutional neural network with residual error correction fused | |
CN109064396A (en) | A kind of single image super resolution ratio reconstruction method based on depth ingredient learning network | |
CN108399362A (en) | A kind of rapid pedestrian detection method and device | |
CN108197606A (en) | The recognition methods of abnormal cell in a kind of pathological section based on multiple dimensioned expansion convolution | |
CN108038420A (en) | A kind of Human bodys' response method based on deep video | |
CN105335929B (en) | A kind of depth map ultra-resolution method | |
CN105069746A (en) | Video real-time human face substitution method and system based on partial affine and color transfer technology | |
CN105069825A (en) | Image super resolution reconstruction method based on deep belief network | |
CN106339984B (en) | Distributed image ultra-resolution method based on K mean value driving convolutional neural networks | |
CN110276354A (en) | A kind of training of high-resolution Streetscape picture semantic segmentation and real time method for segmenting | |
CN110458165A (en) | A kind of natural scene Method for text detection introducing attention mechanism | |
CN110223304A (en) | A kind of image partition method, device and computer readable storage medium based on multipath polymerization | |
CN110136060A (en) | The image super-resolution rebuilding method of network is intensively connected based on shallow-layer | |
CN106169174A (en) | A kind of image magnification method | |
CN107424161A (en) | A kind of indoor scene image layout method of estimation by thick extremely essence | |
CN108765349A (en) | A kind of image repair method and system with watermark | |
CN110349087A (en) | RGB-D image superior quality grid generation method based on adaptability convolution | |
CN105095857A (en) | Face data enhancement method based on key point disturbance technology | |
CN104091364B (en) | Single-image super-resolution reconstruction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||