CN109948609A - Intelligent exam-grading localization method based on deep learning - Google Patents

Intelligent exam-grading localization method based on deep learning

Info

Publication number
CN109948609A
CN109948609A (application CN201910168207.2A)
Authority
CN
China
Prior art keywords
bounding box
picture
examination question
deep learning
localization method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910168207.2A
Other languages
Chinese (zh)
Inventor
桂冠
邵蕾
李懋阳
刘超
熊健
杨洁
孙颖异
孟洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201910168207.2A
Publication of CN109948609A
Legal status: Withdrawn (current)

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses an intelligent exam-grading localization method based on deep learning. Taking the localization of oral arithmetic problems as an example, the method includes: photographing a number of oral arithmetic exercise sheets with a mobile phone and dividing the pictures into a training set and a test set; marking the position of every oral arithmetic problem in each training-set picture with a bounding box using the labelImg annotation tool; generating xml files from the annotations and converting the xml files into txt files; modifying the YOLOv3 algorithm by adding a class for primary school pupils' oral arithmetic problems and training it on the data set; and saving the weights after training, testing the pictures in the test set, and returning the bounding box of every oral arithmetic problem in each picture, thereby localizing the pupils' oral arithmetic problems. The present invention can localize several oral arithmetic problems in a single picture with high accuracy, which reduces the workload of manual grading.

Description

Intelligent exam-grading localization method based on deep learning
Technical field
The present invention relates to an intelligent exam-grading localization method based on deep learning, and belongs to the technical field of computer vision and image processing.
Background technique
With the continuous development of information technology, the informatization of education has also made remarkable progress. For teachers, marking large numbers of exam papers is an extremely tedious and time-consuming task, and complete correctness cannot be guaranteed. With the wide application of artificial intelligence in all walks of life, artificial intelligence can help teachers grade papers, reducing their workload while guaranteeing grading accuracy. In the field of computer vision, pictures are usually recognized with OCR text recognition systems, but OCR cannot recognize a picture region by region. When facing an exam paper with many questions, OCR can recognize the text accurately, yet once a large amount of text appears in one picture the recognized text is easily confused, because OCR cannot partition the picture and can only recognize the whole picture at once, so the questions cannot be told apart. Therefore, before applying OCR text recognition, the questions on the paper must first be localized.
Summary of the invention
The purpose of the present invention is to provide an intelligent exam-grading localization method based on deep learning, which solves the problem that an OCR text recognition system cannot automatically recognize a picture region by region.
The intelligent exam-grading localization method based on deep learning comprises the following steps:
1) Create a data set; the detailed process is as follows:
11) Obtain jpg-format pictures of test questions by photographing, each test question picture containing several test questions, and divide the test question pictures into a training set and a test set;
12) Use the labelImg annotation tool to annotate every test question picture in the training set with bounding boxes, marking the specific location of each test question in each picture;
13) After the annotation is completed, generate an xml-format file for each jpg-format picture;
14) Convert the xml-format files into txt-format label files, each label file storing five numerical values for the bounding box of every test question in the picture; among the five values, the first value is the class number of the test question, the second value is the normalized x coordinate of the bounding box center, the third value is the normalized y coordinate of the bounding box center, the fourth value is the normalized bounding box width, and the fifth value is the normalized bounding box height; the bounding box refers to the marked box that encloses the test question;
2) Localize each test question with the deep-learning-based YOLOv3 algorithm; the detailed process is as follows:
21) Predict the bounding box of each test question;
22) Classify the predicted bounding boxes with multi-label classification into two classes, 0 and 1; bounding boxes marked 0 are removed from the picture and bounding boxes marked 1 are kept;
23) Predict boxes at three different scales with the YOLOv3 algorithm;
24) Build the Darknet-53 network and perform feature extraction;
3) Feed the training set into the Darknet-53 network for training and output the location of each test question; a weight file is generated automatically after training is completed;
4) Test the pictures in the test set and output the precisely localized bounding box of each test question in each test picture.
In the aforementioned step 11), the training set and the test set are divided according to a set ratio.
In the aforementioned step 14), the format conversion is performed with the voc_label.py file of the YOLOv3 code.
After the aforementioned format conversion, the label file of every test question picture and the original jpg-format picture are placed in the same folder.
In the aforementioned step 21), the bounding box is predicted as:
bx = σ(tx) + cx (1)
by = σ(ty) + cy (2)
Wherein, (bx, by) is the center point coordinate of the predicted bounding box, bw is the width of the predicted bounding box, bh is the height of the predicted bounding box, σ(·) is the sigmoid function and the coordinate predictions are trained with a squared-error loss, (tx, ty) is the normalized center coordinate of the bounding box, tw and th are the normalized width and height of the bounding box, (cx, cy) is the offset of the grid cell from the top-left corner of the picture, and pw and ph are the width and height of the prior bounding box.
During the aforementioned bounding box prediction, logistic regression predicts an objectness score for each bounding box; the score is expressed as the overlap ratio, and if the overlap ratio does not reach the set threshold, the predicted bounding box is ignored. The overlap ratio refers to the proportion of overlap between the predicted bounding box and the ground-truth bounding box.
The aforementioned threshold is set to 0.5.
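As an illustration of how the overlap ratio can be computed, the following is a minimal Python sketch assuming boxes given as (x1, y1, x2, y2) corner coordinates; the function name and box format are illustrative and not taken from the patent.

```python
def overlap_ratio(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A predicted box is kept only if its overlap with the ground truth reaches the threshold.
THRESHOLD = 0.5
keep = overlap_ratio((10, 10, 60, 40), (12, 8, 58, 42)) >= THRESHOLD
```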
In the aforementioned step 3), during training, the weights are saved every 100 iterations while the number of iterations is below 1000, and every 10000 iterations once the number of iterations exceeds 1000.
The beneficial effects obtained by the present invention are as follows:
The present invention creatively establishes a deep learning model and applies it to the field of artificial-intelligence exam grading and localization; the features of oral arithmetic problems are extracted by training the YOLOv3 algorithm, realizing precise localization of the oral arithmetic problems.
The present invention only needs a photograph of the oral arithmetic exercises to precisely localize the several oral arithmetic problems in the picture, overcoming the inability of OCR text recognition systems to recognize a picture region by region; the test results have high accuracy and robustness, laying a solid foundation for intelligent exam grading.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention;
Fig. 2 is the YOLOv3 network structure used in the present invention;
Fig. 3 is an oral arithmetic exercise picture from the test set;
Fig. 4 is the effect picture obtained after the oral arithmetic problems in Fig. 3 are localized by testing.
Specific embodiments
The invention will be further described below. The following embodiments are only used to clearly illustrate the technical solution of the present invention and are not intended to limit its protection scope.
The present invention is implemented on a Python platform. Referring to Fig. 1, the method mainly includes the following steps:
Step 1: Create the data set
The present invention proposes an intelligent exam-grading localization method based on deep learning that uses supervised learning. In every field, the development of deep learning is inseparable from the development of data sets, so establishing a good data set is a prerequisite for deep learning. In order to train the deep learning network well, the present invention takes the localization of oral arithmetic problems as an example and creates a small oral arithmetic data set. The specific creation process is as follows:
11) Photograph a number of primary school pupils' oral arithmetic exercise sheets and divide the pictures into a training set and a test set.
For example, 120 high-definition pictures of pupils' oral arithmetic exercises are taken with a mobile phone; 90 of them are used as the training set and the remaining 30 as the test set.
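A straightforward way to perform this split is to shuffle the photographed pictures and copy them into separate folders. A minimal sketch is given below, in which the folder names "photos", "train" and "test" are assumptions for illustration and not specified by the patent.

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_dir, train_dir, test_dir, n_train=90, seed=0):
    """Randomly assign the photographed jpg pictures to a training folder and a test folder."""
    pictures = sorted(Path(src_dir).glob("*.jpg"))
    random.Random(seed).shuffle(pictures)
    Path(train_dir).mkdir(parents=True, exist_ok=True)
    Path(test_dir).mkdir(parents=True, exist_ok=True)
    for i, pic in enumerate(pictures):
        dest = train_dir if i < n_train else test_dir
        shutil.copy(pic, Path(dest) / pic.name)

# e.g. 120 pictures: 90 go to the training set, the remaining 30 to the test set
split_dataset("photos", "train", "test", n_train=90)
```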
12) Use the labelImg annotation tool to annotate every training-set picture with bounding boxes, marking the specific location of each oral arithmetic problem so that every problem is enclosed in one bounding box, which facilitates the training of the YOLOv3 network. Because the oral arithmetic problems in a picture lie close to one another, their positions must be outlined accurately; this provides a good precondition for the training of YOLOv3 and, as verified by testing, yields good accuracy and robustness.
13) After every oral arithmetic problem in a picture has been precisely annotated, one xml file is generated per picture: the jpg-format picture produces an xml-format file, for example the original picture "88.jpg" produces the file "88.xml".
The present invention uses the deep-learning-based YOLOv3 algorithm (You Only Look Once, version 3) to realize the localization. The voc_label.py file in the YOLOv3 code converts the generated xml files into txt-format label files suitable for YOLOv3 network training. The label file (txt format) and the original picture (jpg format) of every image are placed in the same folder. Each label file (txt format) stores five numerical values for the bounding box of every oral arithmetic problem in the original picture, for example: (1, 0.314706855353, 0.318333223444, 0.273583234244, 0.12).
The bounding box refers to the box that encloses each target (each oral arithmetic problem). Suppose the coordinate of the lower-left corner of the bounding box is (x1, y1), the coordinate of the upper-right corner is (x2, y2), and its width and height are w and h respectively. Among the five numerical values, the first value is the class number, which can be user-defined; the second value is the normalized center coordinate tx; the third value is the normalized center coordinate ty; the fourth value is the normalized target box width tw; and the fifth value is the normalized target box height th. The center point refers to the center of the bounding box.
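The conversion that voc_label.py performs amounts to normalizing the annotated corner coordinates into the five values written on each line of the txt label file. The sketch below is a minimal illustration of that computation, assuming the Pascal VOC xml layout written by labelImg; the function names and the single class name are illustrative and not taken from the patent.

```python
import xml.etree.ElementTree as ET

def normalize_box(img_w, img_h, x1, y1, x2, y2):
    """Turn corner coordinates into normalized (center x, center y, width, height)."""
    cx = (x1 + x2) / 2.0 / img_w
    cy = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return cx, cy, w, h

def xml_to_txt(xml_path, txt_path, class_names=("oral_arithmetic",)):
    """Write one 'class cx cy w h' line per annotated question."""
    root = ET.parse(xml_path).getroot()
    img_w = int(root.find("size/width").text)
    img_h = int(root.find("size/height").text)
    with open(txt_path, "w") as f:
        for obj in root.iter("object"):
            cls_id = class_names.index(obj.find("name").text)
            box = obj.find("bndbox")
            x1, y1 = float(box.find("xmin").text), float(box.find("ymin").text)
            x2, y2 = float(box.find("xmax").text), float(box.find("ymax").text)
            cx, cy, w, h = normalize_box(img_w, img_h, x1, y1, x2, y2)
            f.write(f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")

xml_to_txt("88.xml", "88.txt")
```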
Step 2: Localize every oral arithmetic problem with the deep-learning-based YOLOv3 algorithm
The present invention modifies the deep-learning-based YOLOv3 algorithm by adding a class for primary school pupils' oral arithmetic problems.
21) Predict the bounding boxes
The YOLOv3 algorithm obtains anchor boxes by clustering. For each bounding box, YOLOv3 predicts the four coordinate values tx, ty, tw, th, where (tx, ty) is the center coordinate and tw, th are the width and height of the bounding box.
For the grid cell making the prediction (the picture is divided into S × S grid cells), the bounding box is predicted according to the following formulas from (cx, cy), the offset of the cell from the top-left corner of the picture, and pw, ph, the width and height of the prior bounding box:
bx = σ(tx) + cx (1)
by = σ(ty) + cy (2)
bx, by, bw, bh are the center point coordinates and the size of the predicted bounding box, and σ(·) is the sigmoid function applied to tx and ty; the coordinate predictions are trained with a squared error.
During training, a sum-of-squared-error loss is used for bx, by, bw, bh, so the error can be computed quickly.
YOLOv3 predicts an objectness score (the overlap ratio) for each bounding box by logistic regression; the overlap ratio refers to the proportion of overlap between the predicted box and the ground-truth box. If a predicted bounding box overlaps the ground-truth bounding box over a large area and does so better than every other predicted bounding box, its score is 1. If the overlap ratio does not reach a threshold (set to 0.5 here), the predicted bounding box is ignored and contributes no loss.
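The box decoding described above can be written directly from formulas (1) and (2). The sketch below follows the standard YOLOv3 formulation, in which the width and height are recovered as bw = pw·exp(tw) and bh = ph·exp(th); the patent text states these quantities only in words, so the exponential mapping is taken from the published YOLOv3 formulation rather than from the patent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map the raw network outputs to a box center and size on the feature grid."""
    bx = sigmoid(tx) + cx      # formula (1)
    by = sigmoid(ty) + cy      # formula (2)
    bw = pw * math.exp(tw)     # prior width scaled by the predicted offset
    bh = ph * math.exp(th)     # prior height scaled by the predicted offset
    return bx, by, bw, bh

# Example: a cell at grid offset (3, 5) with a prior box of size 1.5 x 2.0
print(decode_box(0.2, -0.1, 0.3, 0.1, cx=3, cy=5, pw=1.5, ph=2.0))
```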
22) Classify the bounding boxes
A class (0 or 1) is predicted for each bounding box: bounding boxes marked 0 are removed from the picture and bounding boxes marked 1 are kept. The bounding boxes use multi-label classification. The YOLOv3 algorithm classifies with simple logistic regression and uses a binary cross-entropy loss function.
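The binary cross-entropy loss mentioned here can be expressed in a few lines; the sketch below is a plain-Python illustration for a batch of sigmoid-activated scores and 0/1 labels, not code from the patent.

```python
import math

def binary_cross_entropy(scores, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(scores)

# Boxes predicted as class 1 are kept, boxes predicted as class 0 are removed.
print(binary_cross_entropy([0.9, 0.2, 0.7], [1, 0, 1]))
```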
23) Prediction across scales
The network predicts boxes at three different scales. YOLOv3 extracts features with a feature pyramid network: several convolutional layers are added on top of the basic feature extractor, and the last of them predicts a three-dimensional tensor that encodes the bounding box, the objectness and the class predictions. The bounding box priors are determined by k-means clustering: nine clusters and three scales are selected, and the clusters are then divided evenly across the scales.
24) Feature extraction
The feature extraction model of YOLOv3 combines several models; it draws on YOLOv2, Darknet-19 and residual networks. The model uses 3 × 3 and 1 × 1 convolutional layers, which perform well, and also adds shortcut connections. It ends up with 53 convolutional layers and is therefore named the Darknet-53 network. Referring to Fig. 2, Darknet-53 denotes the 53 convolutional layers; in practice the network occupies 74 layers in total (counted from the network structure printed at run time). The detection layers are responsible for prediction at a given scale: the three grid sizes 13, 26 and 52 are predicted at layers 82, 94 and 106 respectively, and each grid cell predicts the regression values of 3 boxes, including coordinates, objectness and classes, for a total of 3 × (4 + 1 + 20) = 75 values. A route layer with two parameters concatenates the two referenced layers (for example, layer 86 concatenates layers 85 and 61); a route layer with one parameter is identical to the referenced layer (for example, layer 83 is identical to layer 79). A shortcut layer adds the referenced layer to the current layer. The Darknet-53 network applies the skip-connection structure of residual networks, and its performance is better than the ResNet-152 and ResNet-101 networks because the basic network unit differs, the number of layers is smaller, there are fewer parameters and less computation is required. The Darknet-53 network can achieve the highest floating-point operations per second, which means the network structure utilizes the GPU more effectively.
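As an illustration of the 1 × 1 / 3 × 3 convolution pairs and shortcut connections that make up Darknet-53, the following PyTorch sketch shows a single residual unit; the use of PyTorch, the channel width and the leaky-ReLU slope are assumptions for illustration and are not specified by the patent.

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """One Darknet-53 residual unit: 1x1 conv halves the channels, 3x3 conv restores them, shortcut adds the input."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels // 2)
        self.conv2 = nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.act(self.bn2(self.conv2(out)))
        return x + out  # shortcut connection

block = DarknetResidual(64)
print(block(torch.randn(1, 64, 52, 52)).shape)  # torch.Size([1, 64, 52, 52])
```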
Step 4: Feed the training set into the Darknet-53 network for training and output the location of every oral arithmetic problem; a weight file is generated automatically after training is completed. While the number of iterations is below 1000, the weights are saved every 100 iterations; once the number of iterations exceeds 1000, the weights are saved every 10000 iterations.
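The saving schedule described in this step can be expressed as a small helper; the sketch below is illustrative, and the save_weights call in the comment is a placeholder rather than an API of the patent's code.

```python
def should_save(iteration):
    """Save every 100 iterations while below 1000 iterations, then every 10000 iterations."""
    if iteration < 1000:
        return iteration % 100 == 0
    return iteration % 10000 == 0

# e.g. inside the training loop:
# for iteration in range(1, max_iterations + 1):
#     train_one_batch(...)
#     if should_save(iteration):
#         save_weights(f"yolov3_{iteration}.weights")
```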
Step 5: Test the pictures in the test set and output the precisely localized bounding box of every oral arithmetic problem in each test picture, completing the localization of the multiple oral arithmetic problems in one picture. Fig. 3 is a picture of an oral arithmetic exercise, and Fig. 4 is the effect picture in which the oral arithmetic problems of Fig. 3 are localized by testing.
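After testing, the returned bounding boxes can be drawn on the picture to produce an effect picture like Fig. 4. The sketch below assumes the detector has already returned normalized (cx, cy, w, h) boxes; the detect() call is a placeholder, and OpenCV is used only for drawing.

```python
import cv2  # OpenCV, used here only to draw the returned boxes

def draw_boxes(image_path, boxes, out_path):
    """Draw normalized (cx, cy, w, h) boxes returned by the detector on the picture."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    for cx, cy, bw, bh in boxes:
        x1 = int((cx - bw / 2) * w)
        y1 = int((cy - bh / 2) * h)
        x2 = int((cx + bw / 2) * w)
        y2 = int((cy + bh / 2) * h)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imwrite(out_path, img)

# boxes = detect("test/88.jpg")   # placeholder for the trained YOLOv3 detector
# draw_boxes("test/88.jpg", boxes, "88_located.jpg")
```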
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and variations without departing from the technical principles of the present invention, and these improvements and variations should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. An intelligent exam-grading localization method based on deep learning, characterized by comprising the following steps:
1) create a data set, the detailed process being as follows:
11) obtain jpg-format pictures of test questions by photographing, each test question picture containing several test questions, and divide the test question pictures into a training set and a test set;
12) use the labelImg annotation tool to annotate every test question picture in the training set with bounding boxes, marking the specific location of each test question in each picture;
13) after the annotation is completed, generate an xml-format file for each jpg-format picture;
14) convert the xml-format files into txt-format label files, each label file storing five numerical values for the bounding box of every test question in the picture, wherein the first value is the class number of the test question, the second value is the normalized x coordinate of the bounding box center, the third value is the normalized y coordinate of the bounding box center, the fourth value is the normalized bounding box width, and the fifth value is the normalized bounding box height, the bounding box being the marked box that encloses the test question;
2) localize each test question with the deep-learning-based YOLOv3 algorithm, the detailed process being as follows:
21) predict the bounding box of each test question;
22) classify the predicted bounding boxes with multi-label classification into two classes, 0 and 1, bounding boxes marked 0 being removed from the picture and bounding boxes marked 1 being kept;
23) predict boxes at three different scales with the YOLOv3 algorithm;
24) build the Darknet-53 network and perform feature extraction;
3) feed the training set into the Darknet-53 network for training, output the location of each test question, and automatically generate a weight file after training is completed;
4) test the pictures in the test set and output the precisely localized bounding box of each test question in each test picture.
2. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 11), the training set and the test set are divided according to a set ratio.
3. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 14), the format conversion is performed with the voc_label.py file of the YOLOv3 code.
4. The intelligent exam-grading localization method based on deep learning according to claim 3, characterized in that after the format conversion, the label file of every test question picture and the original jpg-format picture are placed in the same folder.
5. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 21), the bounding box is predicted as:
bx = σ(tx) + cx (1)
by = σ(ty) + cy (2)
wherein (bx, by) is the center point coordinate of the predicted bounding box, bw is the width of the predicted bounding box, bh is the height of the predicted bounding box, σ(·) is the sigmoid function and the coordinate predictions are trained with a squared-error loss, (tx, ty) is the normalized center coordinate of the bounding box, tw and th are the normalized width and height of the bounding box, (cx, cy) is the offset of the grid cell from the top-left corner of the picture, and pw and ph are the width and height of the prior bounding box.
6. The intelligent exam-grading localization method based on deep learning according to claim 5, characterized in that during the bounding box prediction, logistic regression predicts an objectness score for each bounding box, the score being expressed as the overlap ratio; if the overlap ratio does not reach the set threshold, the predicted bounding box is ignored; the overlap ratio refers to the proportion of overlap between the predicted bounding box and the ground-truth bounding box.
7. The intelligent exam-grading localization method based on deep learning according to claim 6, characterized in that the threshold is set to 0.5.
8. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 3), during training, the weights are saved every 100 iterations while the number of iterations is below 1000, and every 10000 iterations once the number of iterations exceeds 1000.
CN201910168207.2A 2019-03-06 2019-03-06 Intelligent exam-grading localization method based on deep learning Withdrawn CN109948609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910168207.2A CN109948609A (en) 2019-03-06 2019-03-06 Intelligent exam-grading localization method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910168207.2A CN109948609A (en) 2019-03-06 2019-03-06 Intelligent exam-grading localization method based on deep learning

Publications (1)

Publication Number Publication Date
CN109948609A true CN109948609A (en) 2019-06-28

Family

ID=67009181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910168207.2A Withdrawn CN109948609A (en) 2019-03-06 2019-03-06 Intelligent exam-grading localization method based on deep learning

Country Status (1)

Country Link
CN (1) CN109948609A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751138A (en) * 2019-09-09 2020-02-04 浙江工业大学 Pan head identification method based on yolov3 and CNN
CN110781888A (en) * 2019-10-25 2020-02-11 北京字节跳动网络技术有限公司 Method and device for regressing screen in video picture, readable medium and electronic equipment
CN111242131A (en) * 2020-01-06 2020-06-05 北京十六进制科技有限公司 Method, storage medium and device for image recognition in intelligent marking
CN111242131B (en) * 2020-01-06 2024-05-10 北京十六进制科技有限公司 Method, storage medium and device for identifying images in intelligent paper reading
CN112132143A (en) * 2020-11-23 2020-12-25 北京易真学思教育科技有限公司 Data processing method, electronic device and computer readable medium
CN112597878A (en) * 2020-12-21 2021-04-02 安徽七天教育科技有限公司 Sample making and identifying method for scanning test paper layout analysis

Similar Documents

Publication Publication Date Title
CN109948609A (en) Intelligent exam-grading localization method based on deep learning
CN110399905B (en) Method for detecting and describing wearing condition of safety helmet in construction scene
CN107391703B (en) The method for building up and system of image library, image library and image classification method
CN104573669B (en) Image object detection method
CN104463101B (en) Answer recognition methods and system for character property examination question
Gu et al. A new deep learning method based on AlexNet model and SSD model for tennis ball recognition
CN108520273A (en) A kind of quick detection recognition method of dense small item based on target detection
CN110472642A (en) Fine granularity Image Description Methods and system based on multistage attention
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN108629367A (en) A method of clothes Attribute Recognition precision is enhanced based on depth network
CN107563439A (en) A kind of model for identifying cleaning food materials picture and identification food materials class method for distinguishing
CN110532920A (en) Smallest number data set face identification method based on FaceNet method
CN105260738A (en) Method and system for detecting change of high-resolution remote sensing image based on active learning
CN110096711A (en) The natural language semantic matching method of the concern of the sequence overall situation and local dynamic station concern
CN105701502A (en) Image automatic marking method based on Monte Carlo data balance
CN104680144A (en) Lip language recognition method and device based on projection extreme learning machine
CN104794455B (en) A kind of Dongba pictograph recognition methods
CN106570521A (en) Multi-language scene character recognition method and recognition system
CN105205449A (en) Sign language recognition method based on deep learning
CN105095863A (en) Similarity-weight-semi-supervised-dictionary-learning-based human behavior identification method
CN105184298A (en) Image classification method through fast and locality-constrained low-rank coding process
CN104298974A (en) Human body behavior recognition method based on depth video sequence
CN110287806A (en) A kind of traffic sign recognition method based on improvement SSD network
CN105740908B (en) Classifier design method based on kernel space self-explanatory sparse representation
CN114842208A (en) Power grid harmful bird species target detection method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20190628