CN107563379A - Method for locating text in natural scene images - Google Patents

Method for locating text in natural scene images Download PDF

Info

Publication number
CN107563379A
Authority
CN
China
Prior art keywords
convolutional layer
output
layer
neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710781807.7A
Other languages
Chinese (zh)
Other versions
CN107563379B (en)
Inventor
宋彬
黄家冕
郭洁
王丹
秦浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201710781807.7A priority Critical patent/CN107563379B/en
Publication of CN107563379A publication Critical patent/CN107563379A/en
Application granted granted Critical
Publication of CN107563379B publication Critical patent/CN107563379B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for locating text in natural scene images based on a fully convolutional neural network. Its steps are: (1) input the image samples to be recognized; (2) normalize them; (3) build and train a fully convolutional neural network; (4) screen the coordinate parameters output by the fully convolutional neural network; (5) locate the text in the natural scene image. The invention builds and trains a fully convolutional neural network that takes images containing text under natural scenes as input. It effectively overcomes the problems of the prior art, in which a single shallow feature is insufficient to characterize deep text information and manually extracted features lead to a large amount of computation and prevent end-to-end automatic text localization. The invention combines multiple image features, obtains richer and deeper text information, and improves the accuracy of text localization in natural scene images.

Description

Method for locating text in natural scene images
Technical field
The invention belongs to the technical field of image processing, and further relates to a method, within the field of image positioning, for locating text in natural scene images. The invention is directed at natural images containing text and uses a fully convolutional neural network that incorporates angle variation, so that text can be located in images of arbitrary size.
Background technology
With the rapid growth of image data, extracting text from large numbers of natural scene images has become a research hotspot, and text localization has become a very important research topic in the field of image processing.
At present there are many papers and patents on text localization in natural scene images. In terms of the technical routes taken, text localization techniques are broadly divided into four kinds: techniques based on connected regions, techniques based on edges, techniques based on texture, and techniques based on corner points. There are also text localization techniques that combine two of these. Some texture-based text localization methods can locate text regions, but for certain non-text regions under complex scenes whose texture resembles that of text regions, positioning deviations easily occur. Some corner-based text localization methods perform better on Chinese text than on English text, because Chinese characters are complex and their strokes form more corner points. These methods belong to the category of shallow learning: they depend on feature extraction, require a large amount of manual work, and use only a single feature, which is insufficient to characterize the target information comprehensively and results in poor text localization.
The Institute of Information Engineering of the Chinese Academy of Sciences, in its patent application "Method of locating text in complex background images" (application number CN201610153384.X, publication number CN105825216A), discloses a method for locating text in images with complex backgrounds. The method first applies the MSERs algorithm separately on the R, G and B channels of the colour image to be processed, obtaining the coordinates of the three sets of MSERs regions on the colour image; it then denoises the MSERs regions, extracts preset features, and classifies the candidate MSERs regions based on those features to obtain the MSERs regions containing text; finally it connects the resulting text blocks into text lines and removes duplicates. Although this is a Chinese text localization method applicable to complex scenes, it still has shortcomings: because it relies on a connected-region technique and on preset features, it takes a long time, requires a large amount of manual work, and is computationally expensive; it does not extract deeper features and cannot characterize text information comprehensively, which leads to poor localization, false detection of non-text regions that resemble text, and failure to achieve end-to-end automatic text localization.
M. Basavanna et al., in their paper "Multi-Oriented Text Detection in Scene Images" (International Journal of Pattern Recognition & Artificial Intelligence, 2012, 26(07): 1255010-1 to 1255010-19), disclose an edge-based text localization method. The method first extracts image edges with the Sobel operator to obtain the Sobel edge features of the image; then, following the rule that the spacing between character edges is fixed, it uses edge growing to enlarge the computed horizontal character-edge spacing; finally, to solve the character-sticking problem caused by very small edge spacing between characters, it applies a zero-crossing method, thereby locating the text in the image. The shortcoming of this method is that, because it uses the zero-crossing method, the text must be distributed horizontally, otherwise crossings occur; for tilted text, the localization accuracy of this method is therefore not high.
Summary of the invention
The purpose of the present invention is to address the above shortcomings of the prior art by proposing a method for locating text in natural scene images. Compared with other natural scene text localization techniques in the prior art, the present invention requires no manually designed features, adapts well to different scenes, and achieves high accuracy.
The specific steps implemented by the present invention are as follows:
(1) Input the image samples to be recognized:
(1a) from the synthetic text-image dataset and the training set of the natural scene Chinese text reading dataset RCTW-17, randomly extract 32000 images with known text coordinates to form the training sample set;
(1b) photograph 200 images containing text in natural scenes to form the test sample set;
(2) Normalization:
(2a) scale each sample of the training sample set and the test sample set to 416 × 416 to form the scaled training sample set and test sample set;
(2b) normalize the pixel values of each sample in the scaled training sample set and test sample set to obtain the normalized training sample set and test sample set;
(3) Build and train the fully convolutional neural network:
(3a) build a fully convolutional neural network containing 23 layers;
(3b) input the samples of the normalized training sample set into the fully convolutional neural network and train it until the loss value of the output vector of the output layer of the fully convolutional neural network is less than or equal to 10;
(4) Screen the coordinate parameters in the output vector of the fully convolutional neural network:
(4a) take one untested sample from the normalized test sample set as the input sample;
(4b) input the current input sample into the trained fully convolutional neural network to obtain, for the current sample, the text detection probability value and the coordinate parameters in the output vector of the output layer of the fully convolutional neural network;
(4c) judge whether the text detection probability value in the output vector of the current input sample is greater than or equal to 0.6; if so, perform step (4d), otherwise perform step (4e);
(4d) retain the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, and perform step (4f);
(4e) discard the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, and perform step (4f);
(4f) judge whether the test sample set still contains untested samples; if so, perform step (4a), otherwise perform step (5);
(5) Locate the text in the natural scene image:
use the coordinate parameters retained from the output vector of the output layer of the fully convolutional neural network to mark, in turn, the text of each sample in the test sample set.
Compared with the prior art, the present invention has the following advantages:
First, because the present invention adopts a fully convolutional neural network containing 23 layers, it extracts deeper and more comprehensive text-information features from natural scene images. This overcomes the problem in the prior art that a single shallow feature characterizes text information incompletely and leads to poor text localization, so that the present invention can describe the text features of natural scene images comprehensively and meticulously and improves the accuracy of text localization.
Second, because the present invention trains the fully convolutional neural network and obtains from it the coordinate parameters and the text detection probability, it overcomes the problems in the prior art that manual feature extraction makes localization computationally expensive and time-consuming, prevents end-to-end automatic text localization, and performs poorly on tilted text. The present invention can therefore learn deeper image features automatically without relying on manual feature extraction, reduces the amount of computation, achieves end-to-end automatic text localization, and improves the localization of tilted text.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the flow chart of the step in which the present invention screens the coordinate parameters output by the fully convolutional neural network;
Fig. 3 shows one image from the training sample set input in the simulation experiment of the present invention;
Fig. 4 shows a test sample input image and a test sample output image of the simulation experiment of the present invention.
Embodiment
The present invention will be further described below in conjunction with the accompanying drawings.
With reference to Fig. 1, the specific steps implemented by the present invention are as follows.
Step 1: input the image samples to be recognized.
From the synthetic text-image dataset and the training set of the natural scene Chinese text reading dataset RCTW-17, randomly extract 32000 images with known text coordinates to form the training sample set.
Photograph 200 images containing text in natural scenes to form the test sample set.
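The assembly of the two sample sets in Step 1 can be sketched in a few lines of Python; the directory names and the JPEG extension below are placeholders for wherever the synthetic dataset, the RCTW-17 training set and the photographed test images are actually stored, and are not specified by the patent.

```python
import glob
import random

# Placeholder paths: the patent draws its training images from a synthetic
# text-image dataset plus the RCTW-17 training set, and uses 200 photographs
# taken under natural scenes as the test sample set.
synthetic_files = glob.glob("synthetic_text/*.jpg")
rctw17_files = glob.glob("RCTW-17/train/*.jpg")

# Random extraction of 32000 images with known text coordinates.
train_files = random.sample(synthetic_files + rctw17_files, 32000)

# 200 images containing text photographed in natural scenes.
test_files = glob.glob("natural_scene_photos/*.jpg")
```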
Step 2: normalization.
Scale each sample of the training sample set and the test sample set to 416 × 416 to form the scaled training sample set and test sample set.
Normalize the pixel values of each sample in the scaled training sample set and test sample set to obtain the normalized training sample set and test sample set.
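Step 2 can be written as the following short sketch; scaling to 416 × 416 follows the patent, while dividing the pixel values by 255 is only one common way to realise the pixel-value normalization the patent leaves unspecified.

```python
import cv2
import numpy as np

def normalize_sample(path):
    """Scale one sample to 416 x 416 and normalize its pixel values."""
    img = cv2.imread(path)                    # input: BGR image, uint8
    img = cv2.resize(img, (416, 416))         # step (2a): fixed 416 x 416 size
    return img.astype(np.float32) / 255.0     # step (2b): pixel values mapped to [0, 1]
```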
Step 3: build and train the fully convolutional neural network.
Build a fully convolutional neural network containing 23 layers. The structure of the 23-layer fully convolutional neural network is, in order: convolutional layer Conv1, convolutional layer Conv2, no-sampling convolutional layer NSConv3, low-step convolutional layer LSConv4, convolutional layer Conv5, no-sampling convolutional layer NSConv6, low-step convolutional layer LSConv7, convolutional layer Conv8, no-sampling convolutional layer NSConv9, low-step convolutional layer LSConv10, no-sampling convolutional layer NSConv11, low-step convolutional layer LSConv12, convolutional layer Conv13, no-sampling convolutional layer NSConv14, low-step convolutional layer LSConv15, no-sampling convolutional layer NSConv16, low-step convolutional layer LSConv17, no-sampling convolutional layer NSConv18, no-sampling convolutional layer NSConv19, no-sampling convolutional layer NSConv20, no-sampling convolutional layer NSConv21, low-step convolutional layer LSConv22, and the output layer. The specific steps of building the 23-layer fully convolutional neural network are as follows:
1st step: input the samples of the normalized training sample set to convolutional layer Conv1 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 32 feature vector maps of 208 × 208 pixels output by Conv1;
2nd step: input the feature vector maps output by Conv1 to convolutional layer Conv2 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 64 feature vector maps of 104 × 104 pixels output by Conv2;
3rd step: input the feature vector maps output by Conv2 to no-sampling convolutional layer NSConv3 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 128 feature vector maps of 104 × 104 pixels output by NSConv3;
4th step: input the feature vector maps output by NSConv3 to low-step convolutional layer LSConv4 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 64 feature vector maps of 104 × 104 pixels output by LSConv4;
5th step: input the feature vector maps output by LSConv4 to convolutional layer Conv5 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 128 feature vector maps of 52 × 52 pixels output by Conv5;
6th step: input the feature vector maps output by Conv5 to no-sampling convolutional layer NSConv6 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 256 feature vector maps of 52 × 52 pixels output by NSConv6;
7th step: input the feature vector maps output by NSConv6 to low-step convolutional layer LSConv7 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 128 feature vector maps of 52 × 52 pixels output by LSConv7;
8th step: input the feature vector maps output by LSConv7 to convolutional layer Conv8 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 256 feature vector maps of 26 × 26 pixels output by Conv8;
9th step: input the feature vector maps output by Conv8 to no-sampling convolutional layer NSConv9 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 512 feature vector maps of 26 × 26 pixels output by NSConv9;
10th step: input the feature vector maps output by NSConv9 to low-step convolutional layer LSConv10 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 256 feature vector maps of 26 × 26 pixels output by LSConv10;
11th step: input the feature vector maps output by LSConv10 to no-sampling convolutional layer NSConv11 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 512 feature vector maps of 26 × 26 pixels output by NSConv11;
12th step: input the feature vector maps output by NSConv11 to low-step convolutional layer LSConv12 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 256 feature vector maps of 26 × 26 pixels output by LSConv12;
13th step: input the feature vector maps output by LSConv12 to convolutional layer Conv13 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 512 feature vector maps of 13 × 13 pixels output by Conv13;
14th step: input the feature vector maps output by Conv13 to no-sampling convolutional layer NSConv14 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv14;
15th step: input the feature vector maps output by NSConv14 to low-step convolutional layer LSConv15 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 512 feature vector maps of 13 × 13 pixels output by LSConv15;
16th step: input the feature vector maps output by LSConv15 to no-sampling convolutional layer NSConv16 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv16;
17th step: input the feature vector maps output by NSConv16 to low-step convolutional layer LSConv17 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 512 feature vector maps of 13 × 13 pixels output by LSConv17;
18th step: input the feature vector maps output by LSConv17 to no-sampling convolutional layer NSConv18 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv18;
19th step: input the feature vector maps output by NSConv18 to no-sampling convolutional layer NSConv19 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv19;
20th step: input the feature vector maps output by NSConv19 to no-sampling convolutional layer NSConv20 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv20;
21st step: input the feature vector maps output by NSConv20 to no-sampling convolutional layer NSConv21 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv21;
22nd step: input the feature vector maps output by NSConv21 to low-step convolutional layer LSConv22 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 40 feature vector maps of 13 × 13 pixels output by LSConv22;
23rd step: input the feature vector maps output by LSConv22 to the output layer and apply to them, in turn, a linear transform and a non-linear sigmoid transform, obtaining the output vector of the output layer, where the output vector consists of the coordinate parameters and the text detection probability value output in the output layer of the fully convolutional neural network.
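The 23-layer structure above can be sketched with the Keras API as below. The patent fixes the layer order, the operations in each layer type and the number of output feature maps, but not the kernel sizes; the 3 × 3 kernels for the Conv/NSConv layers, the 1 × 1 kernels for the low-step LSConv layers, and applying the sigmoid to every output channel are assumptions of this sketch, not details taken from the patent.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv(x, filters):       # Conv: convolution, batch normalization, ReLU, maximum down-sampling
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.MaxPooling2D(2)(x)

def ns_conv(x, filters):    # NSConv: convolution, batch normalization, ReLU (no down-sampling)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def ls_conv(x, filters):    # LSConv: "low-step" convolution followed by ReLU (1x1 kernel assumed)
    x = layers.Conv2D(filters, 1, padding="same")(x)
    return layers.ReLU()(x)

inputs = layers.Input(shape=(416, 416, 3))
x = conv(inputs, 32)        # Conv1  -> 208 x 208 x 32
x = conv(x, 64)             # Conv2  -> 104 x 104 x 64
x = ns_conv(x, 128)         # NSConv3
x = ls_conv(x, 64)          # LSConv4
x = conv(x, 128)            # Conv5  -> 52 x 52 x 128
x = ns_conv(x, 256)         # NSConv6
x = ls_conv(x, 128)         # LSConv7
x = conv(x, 256)            # Conv8  -> 26 x 26 x 256
x = ns_conv(x, 512)         # NSConv9
x = ls_conv(x, 256)         # LSConv10
x = ns_conv(x, 512)         # NSConv11
x = ls_conv(x, 256)         # LSConv12
x = conv(x, 512)            # Conv13 -> 13 x 13 x 512
x = ns_conv(x, 1024)        # NSConv14
x = ls_conv(x, 512)         # LSConv15
x = ns_conv(x, 1024)        # NSConv16
x = ls_conv(x, 512)         # LSConv17
x = ns_conv(x, 1024)        # NSConv18
x = ns_conv(x, 1024)        # NSConv19
x = ns_conv(x, 1024)        # NSConv20
x = ns_conv(x, 1024)        # NSConv21
x = ls_conv(x, 40)          # LSConv22 -> 13 x 13 x 40
outputs = layers.Conv2D(40, 1, activation="sigmoid")(x)   # output layer: linear transform + sigmoid
model = tf.keras.Model(inputs, outputs)
```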
Input the samples of the normalized training sample set into the fully convolutional neural network and train it until the loss value of the output vector of the output layer of the fully convolutional neural network is less than or equal to 10. The specific steps of training the fully convolutional neural network are as follows:
1st step: take one sample from the training sample set and input it into the constructed fully convolutional neural network, then obtain the corresponding output from the output layer of the fully convolutional neural network; in this stage the information of the sample is transformed layer by layer through the fully convolutional neural network and delivered to its output layer;
2nd step: compute the loss value of the output vector of the output layer of the fully convolutional neural network, and use the ADAM algorithm to make a supervised adjustment of all parameters of the fully convolutional neural network so that the loss value of the output vector of the output layer gradually decreases;
3rd step: repeat the 1st and 2nd steps until the loss value of the output vector of the output layer of the fully convolutional neural network is less than or equal to 10, then stop the iteration, and obtain and save the trained fully convolutional neural network.
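A minimal training-loop sketch under the same Keras assumptions follows. Here `model` is the network sketched above and `output_vector_loss` is a loss function like the one sketched after the loss formula below; the batching, learning rate and the way targets are arranged into output vectors are not specified by the patent and are left at placeholder/library defaults.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()          # the ADAM algorithm of the 2nd step

def train(model, dataset, loss_threshold=10.0):
    for images, targets in dataset:             # 1st step: feed a training sample forward
        with tf.GradientTape() as tape:
            predictions = model(images, training=True)
            loss = output_vector_loss(targets, predictions)   # 2nd step: loss of the output vector
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # supervised adjustment of all parameters
        if float(loss) <= loss_threshold:        # 3rd step: stop once the loss is <= 10
            break
    return model
```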
The loss value of the output vector of the output layer of the fully convolutional neural network is obtained by the following formula:

\[
\begin{cases}
X = (x^{p} - x^{r})^{2} \\
Y = (y^{p} - y^{r})^{2} \\
W = (w^{p} - w^{r})^{2} \\
H = (h^{p} - h^{r})^{2} \\
S = (s^{p} - s^{r})^{2} \\
T = (t^{p} - t^{r})^{2} \\
C = (c^{p} - c^{r})^{2} \\
l = \frac{5}{2}\left[ X + Y + W + H + S + T \right] + \frac{3}{2} C
\end{cases}
\]

where X denotes the loss value of the abscissa value in the output vector of the output layer of the fully convolutional neural network, x^p the abscissa value in the output vector, and x^r the true abscissa value of the text region in the input sample; Y denotes the loss value of the ordinate value, y^p the ordinate value in the output vector, and y^r the true ordinate value of the text region in the input sample; W denotes the loss value of the width value, w^p the width value in the output vector, and w^r the true width value of the text region in the input sample; H denotes the loss value of the height value, h^p the height value in the output vector, and h^r the true height value of the text region in the input sample; S denotes the loss value of the sine value, s^p the sine value in the output vector, and s^r the true sine value of the text region in the input sample; T denotes the loss value of the cosine value, t^p the cosine value in the output vector, and t^r the true cosine value of the text region in the input sample; C denotes the loss value of the text detection probability value, c^p the text detection probability value in the output vector, and c^r the true (ideal) text detection probability value of the text region in the input sample; and l denotes the loss value of the output vector.
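Written as code, the loss above becomes the short function below. Here `true` and `pred` each hold the seven components (x, y, w, h, sin, cos, c) of one output vector; how the network's 13 × 13 × 40 output tensor is mapped onto such vectors is not detailed in the patent and is therefore not shown.

```python
def output_vector_loss(true, pred):
    """Loss l of one output vector, following the formula above."""
    X, Y, W, H, S, T, C = [(p - t) ** 2 for p, t in zip(pred, true)]
    return 2.5 * (X + Y + W + H + S + T) + 1.5 * C
```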
Step 4: screen the coordinate parameters output by the output layer of the fully convolutional neural network.
With reference to Fig. 2, the specific steps for screening the coordinate parameters output by the output layer of the fully convolutional neural network are described as follows.
1st step: take one untested sample from the normalized test sample set as the input sample;
2nd step: input the current input sample into the trained fully convolutional neural network to obtain, for the current sample, the text detection probability value and the coordinate parameters in the output vector of the output layer of the fully convolutional neural network;
3rd step: judge whether the text detection probability value in the output vector of the current input sample is greater than or equal to 0.6; if so, perform the 4th step of this step, otherwise perform the 5th step of this step;
4th step: retain the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, and perform the 6th step of this step;
5th step: discard the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, and perform the 6th step of this step;
6th step: judge whether the test sample set still contains untested samples; if so, perform the 1st step of this step, otherwise perform Step 5.
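The screening of the 3rd to 5th steps reduces to a single threshold test; the sketch below assumes each output vector has already been unpacked into its seven components.

```python
def screen_coordinates(output_vectors, threshold=0.6):
    """Keep the coordinate parameters of every output vector whose
    text detection probability value is at least the threshold."""
    retained = []
    for x, y, w, h, s, t, c in output_vectors:
        if c >= threshold:                     # retain the coordinate parameters
            retained.append((x, y, w, h, s, t))
        # otherwise the coordinate parameters are discarded
    return retained
```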
Step 5: locate the text in the natural scene image.
Use the coordinate parameters retained from the output vector of the output layer of the fully convolutional neural network to mark, in turn, the text of each sample in the test sample set.
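One way to realise the calibration of Step 5 is sketched below, assuming the retained parameters (x, y, w, h, sin, cos) describe the centre, size and orientation of a rotated box in image pixels; the patent does not spell out the drawing procedure, so this OpenCV-based sketch is only illustrative.

```python
import cv2
import numpy as np

def calibrate(image, retained_params):
    """Draw each retained text region onto the test image."""
    for x, y, w, h, s, c in retained_params:
        angle = np.degrees(np.arctan2(s, c))          # rotation recovered from (sin, cos)
        box = cv2.boxPoints(((x, y), (w, h), angle))  # four corners of the rotated box
        cv2.polylines(image, [box.astype(np.int32)], True, (0, 255, 0), 2)
    return image
```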
The effect of the present invention is further described below with reference to a simulation experiment.
1. Simulation experiment conditions:
Hardware platform: Intel Core i7-6700 CPU @ 3.40 GHz, 32 GB RAM, Nvidia GeForce GTX 1060 6 GB GPU. Software platform: Python 3.5.2, TensorFlow 1.0.1.
2. Experiment content and analysis of results:
The training sample set used in the simulation experiment of the present invention consists of 32000 images with known text coordinates randomly extracted from the synthetic text-image dataset and the training set of the natural scene Chinese text reading dataset RCTW-17.
The image shown in Fig. 3(a) is an image in the training sample set that belongs to the synthetic text-image dataset, and the image shown in Fig. 3(b) is a training image in the training sample set that belongs to the training set of the natural scene Chinese text reading dataset RCTW-17. The 200 images containing text photographed in natural scenes form the test sample set.
The image shown in Fig. 4(a) is a test image from the test sample set. The image shown in Fig. 4(b) is the test image of Fig. 4(a) after it has passed through the 23-layer fully convolutional neural network and been calibrated.
The present invention first establishes the training sample set containing 32000 images and the test sample set containing 200 images. The training sample set and the test sample set are normalized to obtain the normalized training sample set and test sample set. The 32000 training samples are used to train the 23-layer fully convolutional neural network, yielding the trained 23-layer fully convolutional neural network. The 200 test samples are input into the trained 23-layer fully convolutional neural network to obtain the output coordinate parameters and text detection probabilities; the coordinate parameters retained after screening are used to calibrate the input test samples.
Table 1 shows the simulation results of the present invention: of the 200 test samples, the number of samples in which the text was located successfully is 162, and the number of samples in which all text localization failed is 38.
Table 1. Text localization results on the test sample set under natural scenes

                     Successfully located    Failed    All samples
Number of samples    162                     38        200
Ratio                81.0%                   19.0%     100.0%
It can be seen from Table 1 that, using the method for locating text in natural scene images proposed by the present invention, the accuracy of successful text localization on images containing text under natural scenes is 81.0%. This demonstrates that, by building and training the fully convolutional neural network, the present invention can extract deeper text information from the image, and has the advantages of combining multiple image features, obtaining more comprehensive text information, and improving the accuracy of text localization under natural scenes.
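The percentages in Table 1 follow directly from the reported counts:

```python
successful, failed = 162, 38
total = successful + failed                        # 200 test samples
print(f"success rate: {successful / total:.1%}")   # 81.0%
print(f"failure rate: {failed / total:.1%}")       # 19.0%
```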

Claims (7)

1. A method for locating text in natural scene images, characterised in that it comprises the following steps:
(1) Input the image samples to be recognized:
(1a) from the synthetic text-image dataset and the training set of the natural scene Chinese text reading dataset RCTW-17, randomly extract 32000 images with known text coordinates to form the training sample set;
(1b) photograph 200 images containing text in natural scenes to form the test sample set;
(2) Normalization:
(2a) scale each sample of the training sample set and the test sample set to 416 × 416 to form the scaled training sample set and test sample set;
(2b) normalize the pixel values of each sample in the scaled training sample set and test sample set to obtain the normalized training sample set and test sample set;
(3) Build and train the fully convolutional neural network:
(3a) build a fully convolutional neural network containing 23 layers;
(3b) input the samples of the normalized training sample set into the fully convolutional neural network and train it until the loss value of the output vector of the output layer of the fully convolutional neural network is less than or equal to 10;
(4) Screen the coordinate parameters in the output vector of the fully convolutional neural network:
(4a) take one untested sample from the normalized test sample set as the input sample;
(4b) input the current input sample into the trained fully convolutional neural network to obtain, for the current sample, the text detection probability value and the coordinate parameters in the output vector of the output layer of the fully convolutional neural network;
(4c) judge whether the text detection probability value in the output vector of the current input sample is greater than or equal to 0.6; if so, perform step (4d), otherwise perform step (4e);
(4d) retain the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, and perform step (4f);
(4e) discard the coordinate parameters corresponding to the text detection probability value in the output vector of the current input sample, and perform step (4f);
(4f) judge whether the test sample set still contains untested samples; if so, perform step (4a), otherwise perform step (5);
(5) Locate the text in the natural scene image:
use the coordinate parameters retained from the output vector of the output layer of the fully convolutional neural network to mark, in turn, the text of each sample in the test sample set.
2. The method for locating text in natural scene images according to claim 1, characterised in that: the structure of the 23-layer fully convolutional neural network described in step (3a) is, in order: convolutional layer Conv1, convolutional layer Conv2, no-sampling convolutional layer NSConv3, low-step convolutional layer LSConv4, convolutional layer Conv5, no-sampling convolutional layer NSConv6, low-step convolutional layer LSConv7, convolutional layer Conv8, no-sampling convolutional layer NSConv9, low-step convolutional layer LSConv10, no-sampling convolutional layer NSConv11, low-step convolutional layer LSConv12, convolutional layer Conv13, no-sampling convolutional layer NSConv14, low-step convolutional layer LSConv15, no-sampling convolutional layer NSConv16, low-step convolutional layer LSConv17, no-sampling convolutional layer NSConv18, no-sampling convolutional layer NSConv19, no-sampling convolutional layer NSConv20, no-sampling convolutional layer NSConv21, low-step convolutional layer LSConv22, and the output layer.
3. The method for locating text in natural scene images according to claim 1, characterised in that: the specific steps of building the 23-layer fully convolutional neural network described in step (3a) are as follows:
1st step: input the samples of the normalized training sample set to convolutional layer Conv1 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 32 feature vector maps of 208 × 208 pixels output by Conv1;
2nd step: input the feature vector maps output by Conv1 to convolutional layer Conv2 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 64 feature vector maps of 104 × 104 pixels output by Conv2;
3rd step: input the feature vector maps output by Conv2 to no-sampling convolutional layer NSConv3 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 128 feature vector maps of 104 × 104 pixels output by NSConv3;
4th step: input the feature vector maps output by NSConv3 to low-step convolutional layer LSConv4 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 64 feature vector maps of 104 × 104 pixels output by LSConv4;
5th step: input the feature vector maps output by LSConv4 to convolutional layer Conv5 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 128 feature vector maps of 52 × 52 pixels output by Conv5;
6th step: input the feature vector maps output by Conv5 to no-sampling convolutional layer NSConv6 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 256 feature vector maps of 52 × 52 pixels output by NSConv6;
7th step: input the feature vector maps output by NSConv6 to low-step convolutional layer LSConv7 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 128 feature vector maps of 52 × 52 pixels output by LSConv7;
8th step: input the feature vector maps output by LSConv7 to convolutional layer Conv8 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 256 feature vector maps of 26 × 26 pixels output by Conv8;
9th step: input the feature vector maps output by Conv8 to no-sampling convolutional layer NSConv9 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 512 feature vector maps of 26 × 26 pixels output by NSConv9;
10th step: input the feature vector maps output by NSConv9 to low-step convolutional layer LSConv10 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 256 feature vector maps of 26 × 26 pixels output by LSConv10;
11th step: input the feature vector maps output by LSConv10 to no-sampling convolutional layer NSConv11 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 512 feature vector maps of 26 × 26 pixels output by NSConv11;
12th step: input the feature vector maps output by NSConv11 to low-step convolutional layer LSConv12 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 256 feature vector maps of 26 × 26 pixels output by LSConv12;
13th step: input the feature vector maps output by LSConv12 to convolutional layer Conv13 and apply to them, in turn, a convolution operation, batch normalization, a linear ReLU transform and a maximum down-sampling operation, obtaining the 512 feature vector maps of 13 × 13 pixels output by Conv13;
14th step: input the feature vector maps output by Conv13 to no-sampling convolutional layer NSConv14 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv14;
15th step: input the feature vector maps output by NSConv14 to low-step convolutional layer LSConv15 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 512 feature vector maps of 13 × 13 pixels output by LSConv15;
16th step: input the feature vector maps output by LSConv15 to no-sampling convolutional layer NSConv16 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv16;
17th step: input the feature vector maps output by NSConv16 to low-step convolutional layer LSConv17 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 512 feature vector maps of 13 × 13 pixels output by LSConv17;
18th step: input the feature vector maps output by LSConv17 to no-sampling convolutional layer NSConv18 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv18;
19th step: input the feature vector maps output by NSConv18 to no-sampling convolutional layer NSConv19 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv19;
20th step: input the feature vector maps output by NSConv19 to no-sampling convolutional layer NSConv20 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv20;
21st step: input the feature vector maps output by NSConv20 to no-sampling convolutional layer NSConv21 and apply to them, in turn, a convolution operation, batch normalization and a linear ReLU transform, obtaining the 1024 feature vector maps of 13 × 13 pixels output by NSConv21;
22nd step: input the feature vector maps output by NSConv21 to low-step convolutional layer LSConv22 and apply to them, in turn, a low-step convolution operation and a linear ReLU transform, obtaining the 40 feature vector maps of 13 × 13 pixels output by LSConv22;
23rd step: input the feature vector maps output by LSConv22 to the output layer and apply to them, in turn, a linear transform and a non-linear sigmoid transform, obtaining the output vector of the output layer, where the output vector consists of the coordinate parameters and the text detection probability value output in the output layer of the fully convolutional neural network.
4. The method for locating text in natural scene images according to claim 1, characterised in that: the specific steps of training the fully convolutional neural network described in step (3b) are as follows:
1st step: take one sample from the training sample set and input it into the constructed fully convolutional neural network, then obtain the corresponding output from the output layer of the fully convolutional neural network; in this stage the information of the sample is transformed layer by layer through the fully convolutional neural network and delivered to its output layer;
2nd step: compute the loss value of the output vector of the output layer of the fully convolutional neural network, and use the ADAM algorithm to make a supervised adjustment of all parameters of the fully convolutional neural network so that the loss value of the output vector of the output layer gradually decreases;
3rd step: repeat the 1st and 2nd steps until the loss value of the output vector of the output layer of the fully convolutional neural network is less than or equal to 10, then stop the iteration, and obtain and save the trained fully convolutional neural network.
5. The method for locating text in natural scene images according to claim 1, characterised in that: the loss value of the output vector of the output layer of the fully convolutional neural network described in step (3b) is obtained by the following formula:

\[
\begin{cases}
X = (x^{p} - x^{r})^{2} \\
Y = (y^{p} - y^{r})^{2} \\
W = (w^{p} - w^{r})^{2} \\
H = (h^{p} - h^{r})^{2} \\
S = (s^{p} - s^{r})^{2} \\
T = (t^{p} - t^{r})^{2} \\
C = (c^{p} - c^{r})^{2} \\
l = \frac{5}{2}\left[ X + Y + W + H + S + T \right] + \frac{3}{2} C
\end{cases}
\]

where X denotes the loss value of the abscissa value in the output vector of the output layer of the fully convolutional neural network, x^p the abscissa value in the output vector, and x^r the true abscissa value of the text region in the input sample; Y denotes the loss value of the ordinate value, y^p the ordinate value in the output vector, and y^r the true ordinate value of the text region in the input sample; W denotes the loss value of the width value, w^p the width value in the output vector, and w^r the true width value of the text region in the input sample; H denotes the loss value of the height value, h^p the height value in the output vector, and h^r the true height value of the text region in the input sample; S denotes the loss value of the sine value, s^p the sine value in the output vector, and s^r the true sine value of the text region in the input sample; T denotes the loss value of the cosine value, t^p the cosine value in the output vector, and t^r the true cosine value of the text region in the input sample; C denotes the loss value of the text detection probability value, c^p the text detection probability value in the output vector, and c^r the true (ideal) text detection probability value of the text region in the input sample; and l denotes the loss value of the output vector.
6. The method for locating text in natural scene images according to claim 1, characterised in that: the text detection probability value in the output vector of the output layer of the fully convolutional neural network described in step (4b) is obtained by the following formula:

\[
c = \frac{1}{1 + e^{-\beta}}
\]

where c denotes the text detection probability value in the output vector of the output layer of the fully convolutional neural network, and β denotes the 7th output value, counted from left to right, in the output vector of the output layer of the fully convolutional neural network.
7. The method for locating text in natural scene images according to claim 1, characterised in that: the coordinate parameters in the output vector of the output layer of the fully convolutional neural network described in step (4b) comprise the abscissa value, the ordinate value, the width value, the height value, the sine value and the cosine value.
CN201710781807.7A 2017-09-02 2017-09-02 Method for positioning text in natural scene image Active CN107563379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710781807.7A CN107563379B (en) 2017-09-02 2017-09-02 Method for positioning text in natural scene image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710781807.7A CN107563379B (en) 2017-09-02 2017-09-02 Method for positioning text in natural scene image

Publications (2)

Publication Number Publication Date
CN107563379A true CN107563379A (en) 2018-01-09
CN107563379B CN107563379B (en) 2019-12-24

Family

ID=60977874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710781807.7A Active CN107563379B (en) 2017-09-02 2017-09-02 Method for positioning text in natural scene image

Country Status (1)

Country Link
CN (1) CN107563379B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090116756A1 (en) * 2007-11-06 2009-05-07 Copanion, Inc. Systems and methods for training a document classification system using documents from a plurality of users
CN102663383A (en) * 2012-04-26 2012-09-12 北京科技大学 Method for positioning texts in images of natural scene
CN104182750A (en) * 2014-07-14 2014-12-03 上海交通大学 Extremum connected domain based Chinese character detection method in natural scene image
CN104809481A (en) * 2015-05-21 2015-07-29 中南大学 Natural scene text detection method based on adaptive color clustering
CN105825216A (en) * 2016-03-17 2016-08-03 中国科学院信息工程研究所 Method of locating text in complex background image

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ANKUSH GUPTA et al.: "Synthetic Data for Text Localisation in Natural Images", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
DENA BAZAZIAN et al.: "Improving text proposals for scene images with fully convolutional networks", online: https://arxiv.org/abs/1702.05089 *
ZECHENG XIE et al.: "Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition", online: https://arxiv.org/abs/1610.02616 *
贺通, 姚剑: "Scene text detection based on fully convolutional networks" (基于全卷积网络的场景文本检测), 黑龙江科技信息 (Heilongjiang Science and Technology Information) *
骆遥: "A text region localization method based on deep fully convolutional neural networks" (基于深度全卷积神经网络的文字区域定位方法), 无线互联科技 (Wireless Internet Technology) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288088B (en) * 2018-01-17 2020-02-28 浙江大学 Scene text detection method based on end-to-end full convolution neural network
CN108288088A (en) * 2018-01-17 2018-07-17 浙江大学 A kind of scene text detection method based on end-to-end full convolutional neural networks
CN108664968A (en) * 2018-04-18 2018-10-16 江南大学 A kind of unsupervised text positioning method based on text selection model
CN108664968B (en) * 2018-04-18 2020-07-07 江南大学 Unsupervised text positioning method based on text selection model
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN109858318A (en) * 2018-11-16 2019-06-07 平安科技(深圳)有限公司 The classification recognition methods of landscape image and device
CN110399871A (en) * 2019-06-14 2019-11-01 华南理工大学 A kind of appraisal procedure of scene text testing result
CN110689012A (en) * 2019-10-08 2020-01-14 山东浪潮人工智能研究院有限公司 End-to-end natural scene text recognition method and system
CN112836696A (en) * 2019-11-22 2021-05-25 搜狗(杭州)智能科技有限公司 Text data detection method and device and electronic equipment
CN112200598A (en) * 2020-09-08 2021-01-08 北京数美时代科技有限公司 Picture advertisement identification method and device and computer equipment
CN112200598B (en) * 2020-09-08 2022-02-15 北京数美时代科技有限公司 Picture advertisement identification method and device and computer equipment
CN113342994A (en) * 2021-07-05 2021-09-03 成都信息工程大学 Recommendation system based on non-sampling cooperative knowledge graph network
CN113342994B (en) * 2021-07-05 2022-07-05 成都信息工程大学 Recommendation system based on non-sampling cooperative knowledge graph network

Also Published As

Publication number Publication date
CN107563379B (en) 2019-12-24

Similar Documents

Publication Publication Date Title
CN107563379A (en) For the localization method to natural scene image Chinese version
AU2020100200A4 (en) Content-guide Residual Network for Image Super-Resolution
CN106127684B (en) Image super-resolution Enhancement Method based on forward-backward recutrnce convolutional neural networks
CN104978580B (en) A kind of insulator recognition methods for unmanned plane inspection transmission line of electricity
CN103077511B (en) Image super-resolution reconstruction method based on dictionary learning and structure similarity
CN108549893A (en) A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN111986099A (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN109064396A (en) A kind of single image super resolution ratio reconstruction method based on depth ingredient learning network
CN108399362A (en) A kind of rapid pedestrian detection method and device
CN108197606A (en) The recognition methods of abnormal cell in a kind of pathological section based on multiple dimensioned expansion convolution
CN108038420A (en) A kind of Human bodys&#39; response method based on deep video
CN105335929B (en) A kind of depth map ultra-resolution method
CN105069746A (en) Video real-time human face substitution method and system based on partial affine and color transfer technology
CN105069825A (en) Image super resolution reconstruction method based on deep belief network
CN106339984B (en) Distributed image ultra-resolution method based on K mean value driving convolutional neural networks
CN110276354A (en) A kind of training of high-resolution Streetscape picture semantic segmentation and real time method for segmenting
CN110458165A (en) A kind of natural scene Method for text detection introducing attention mechanism
CN110223304A (en) A kind of image partition method, device and computer readable storage medium based on multipath polymerization
CN110136060A (en) The image super-resolution rebuilding method of network is intensively connected based on shallow-layer
CN106169174A (en) A kind of image magnification method
CN107424161A (en) A kind of indoor scene image layout method of estimation by thick extremely essence
CN108765349A (en) A kind of image repair method and system with watermark
CN110349087A (en) RGB-D image superior quality grid generation method based on adaptability convolution
CN105095857A (en) Face data enhancement method based on key point disturbance technology
CN104091364B (en) Single-image super-resolution reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant