CN109948609A - Intelligent exam-grading localization method based on deep learning - Google Patents
Intelligent exam-grading localization method based on deep learning
- Publication number
- CN109948609A (application CN201910168207.2A)
- Authority
- CN
- China
- Prior art keywords
- bounding box
- picture
- examination question
- deep learning
- localization method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Image Analysis (AREA)
Abstract
The present invention discloses an intelligent exam-grading localization method based on deep learning. Taking the localization of oral arithmetic exercises as an example, the method comprises: photographing a number of oral-arithmetic exercise sheets with a mobile phone and dividing the pictures into a training set and a test set; marking the position of every oral arithmetic exercise in each training picture with a Bounding Box using the labelImg annotation tool; generating xml files and converting them into txt files; modifying the YOLOv3 algorithm by adding a class for primary-school oral arithmetic exercises, and training on the data set; saving the weights after training, testing the pictures in the test set, and regressing the Bounding Box of each oral arithmetic exercise in every picture, thereby localizing the exercises. The present invention localizes the several oral arithmetic exercises within a picture with high accuracy, reducing the workload of manual grading.
Description
Technical field
The present invention relates to an intelligent exam-grading localization method based on deep learning, and belongs to the field of computer vision and image processing.
Background art
With the continuous development of information technology, the informatization of education has made remarkable progress. For teachers, grading large numbers of exam papers is extremely tedious and time-consuming work, and complete correctness cannot be guaranteed. With the wide application of artificial intelligence across industries, AI can help teachers grade papers, reducing their workload while guaranteeing grading accuracy. At present, picture recognition in computer vision usually relies on OCR text-recognition systems, but OCR cannot recognize a picture region by region. When facing a paper containing many questions, OCR can recognize the text accurately, yet with a large amount of text in one picture the recognized text is easily jumbled, because OCR can only recognize the whole picture at once and thus cannot tell the questions apart. Therefore, before applying OCR text recognition, the questions on a paper must first be localized.
Summary of the invention
The purpose of the present invention is to provide an intelligent exam-grading localization method based on deep learning, solving the problem that OCR text-recognition systems cannot automatically recognize a picture region by region.
The intelligent exam-grading localization method based on deep learning comprises the following steps:
1) Create the data set, as follows:
11) Obtain jpg-format pictures of several test papers by photographing, each picture containing several questions; divide the pictures into a training set and a test set.
12) Using the labelImg annotation tool, annotate every picture in the training set with Bounding Boxes, marking the specific location of each question in the picture.
13) After annotation, generate an xml-format file for each jpg-format picture.
14) Convert the xml files into txt-format label files. Each label file stores five values for the bounding box of each question in the picture: the first value is the class number of the question; the second is the normalized x coordinate of the bounding-box center; the third is the normalized y coordinate of the center; the fourth is the normalized width; and the fifth is the normalized height. The bounding box is the box of the Bounding Box annotation that encloses the question.
2) Localize each question with the deep-learning-based YOLOv3 algorithm, as follows:
21) Predict the bounding box of each question.
22) Classify the predicted bounding boxes with multi-label classification into two classes, 0 and 1; boxes labeled 0 are removed from the picture and boxes labeled 1 are kept.
23) Predict boxes at three different scales with the YOLOv3 algorithm.
24) Build the Darknet-53 network and extract features.
3) Feed the training set into the Darknet-53 network for training, output the location of each question, and automatically generate the weight file when training finishes.
4) Test the pictures in the test set and output the Bounding Box that precisely localizes each question in each test picture.
In the aforesaid step 11), the training set and test set are divided according to a set ratio.
In the aforesaid step 14), the format conversion is performed with the voc_label.py file of the YOLOv3 algorithm.
After the aforesaid format conversion, the label file of each picture is placed in the same folder as the original jpg-format picture.
In the aforesaid step 21), the bounding-box predictions are as follows:
bx = σ(tx) + cx (1)
by = σ(ty) + cy (2)
bw = pw·e^tw (3)
bh = ph·e^th (4)
where (bx, by) is the center coordinate of the predicted bounding box, bw is its width and bh its height; σ is the logistic (sigmoid) function, and the coordinate predictions are trained with a squared-error loss; (tx, ty) are the predicted center offsets and tw, th the predicted width and height terms; (cx, cy) is the offset of the grid cell from the top-left corner of the picture; and pw, ph are the width and height of the prior bounding box.
During the aforesaid bounding-box prediction, logistic regression predicts an objectness score for each bounding box; the score is expressed as the overlap ratio, and if the overlap ratio does not reach the set threshold, the predicted bounding box is ignored. The overlap ratio is the ratio of overlap between the predicted bounding box and the ground-truth bounding box.
The aforesaid threshold is taken as 0.5.
In the aforesaid step 3), during training, the weights are saved every 100 iterations while the iteration count is below 1000, and every 10000 iterations once it exceeds 1000.
The beneficial effects obtained by the present invention are as follows:
The present invention creatively establishes a deep-learning model and applies it to AI-assisted grading and localization. The features of oral arithmetic exercises are extracted by training the YOLOv3 algorithm, achieving their precise localization.
The present invention needs only a photograph of an exercise sheet to precisely localize the several oral arithmetic exercises in the picture, overcoming the inability of OCR text-recognition systems to recognize region by region. The test results have high accuracy and robustness, laying a good foundation for intelligent exam grading.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the present invention;
Fig. 2 is the YOLOv3 network structure used in the present invention;
Fig. 3 is an oral-arithmetic exercise picture from the test set;
Fig. 4 is the result of localizing the oral arithmetic exercises of Fig. 3 after testing.
Specific embodiment
The invention is further described below. The following embodiments are only intended to clearly illustrate the technical solution of the present invention and are not intended to limit its protection scope.
The present invention is implemented on the Python platform. Referring to Fig. 1, the method mainly includes the following steps:
Step 1: Create the data set
The invention proposes an intelligent exam-grading localization method based on deep learning that adopts supervised learning. In every field, deep learning cannot advance without the development of data sets, so building a good data set is a prerequisite for deep learning. To better train the deep-learning network, the present invention takes the localization of oral arithmetic exercises as an example and creates a small oral-arithmetic data set. The creation process is as follows:
11) Photograph several primary-school oral-arithmetic exercise sheets and divide them into a training set and a test set.
For example, 120 high-definition pictures of primary-school oral-arithmetic exercise sheets were taken with a mobile phone, 90 of which served as the training set and the remaining 30 as the test set.
12) Using the labelImg annotation tool, annotate every picture in the training set with Bounding Boxes, marking the specific location of each oral arithmetic exercise so that each exercise is enclosed in one Bounding Box, which facilitates the training of the YOLOv3 network. When annotating the positions of the exercises in a picture, the exercises lie close together, so their positions must be outlined precisely; this provides a good precondition for training YOLOv3 and, as testing shows, yields good accuracy and robustness.
13) After each oral arithmetic exercise in a picture has been precisely annotated, one xml file is generated per picture: the jpg-format picture produces an xml-format file, e.g. the original picture "88.jpg" produces the file "88.xml".
The present invention realizes the localization with the deep-learning-based YOLOv3 algorithm (You Only Look Once). Running the voc_label.py file of the YOLOv3 algorithm converts the generated xml files into txt-format label files suitable for YOLOv3 network training. The label file (txt format) of every picture is placed in the same folder as the original picture (jpg format). Each label file (txt format) stores five values for the bounding box of every oral arithmetic exercise on the original picture, e.g.:
(1, 0.314706855353, 0.318333223444, 0.273583234244, 0.12).
A bounding box is the box enclosing one target (one oral arithmetic exercise). Suppose the bottom-left corner of a bounding box has coordinates (x1, y1), the top-right corner (x2, y2), and the width and height are w and h. Of the five values, the first is the class number of the picture, which may be user-defined; the second is the normalized x coordinate of the box center; the third is the normalized y coordinate of the center; the fourth is the normalized width of the target box; and the fifth is the normalized height of the target box, the center being the center of the bounding box.
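The normalization just described can be sketched as a small helper, modeled on what the voc_label.py script does (the function name and argument order here are illustrative, not the script's actual API):

```python
# Illustrative sketch of the xml -> txt label conversion described above:
# map absolute corner coordinates of an annotated box to the five normalized
# values (class, x_center, y_center, width, height) stored in the txt label.
def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax, class_id=1):
    """Normalize a box given by its corners against the picture size."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return (class_id, x_center, y_center, width, height)
```

For a 100 × 200 picture, a box with corners (10, 20) and (50, 120) yields the line (1, 0.3, 0.35, 0.4, 0.5), matching the five-value format shown above.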
Step 2: Localize each oral arithmetic exercise with the deep-learning-based YOLOv3 algorithm
The present invention modifies the deep-learning-based YOLOv3 algorithm by adding a class for primary-school oral arithmetic exercises.
21) Predict the bounding boxes
The YOLOv3 algorithm obtains anchor boxes by clustering. For each bounding box, YOLOv3 predicts four coordinate values tx, ty, tw, th, where (tx, ty) gives the center offset and tw, th the width and height terms.
For the cell making the prediction (the picture is divided into an S × S grid of cells), given the offset (cx, cy) of that cell from the top-left corner of the picture and the width and height pw, ph of the prior bounding box, the bounding box is predicted according to the following formulas:
bx = σ(tx) + cx (1)
by = σ(ty) + cy (2)
bw = pw·e^tw (3)
bh = ph·e^th (4)
bx, by, bw, bh are the center coordinates and size of the predicted bounding box; σ is the logistic (sigmoid) function.
During training, bx, by, bw, bh use a sum-of-squared-error loss, so the error can be computed quickly.
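The box equations above can be checked with a minimal, framework-free decode function (a sketch; the function names are illustrative):

```python
import math

def sigmoid(t):
    """Logistic function sigma used in equations (1) and (2)."""
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map raw network outputs (tx, ty, tw, th) to a box (bx, by, bw, bh)
    in grid units, given the cell offset (cx, cy) and prior size (pw, ph)."""
    bx = sigmoid(tx) + cx   # center x, constrained to lie inside the cell
    by = sigmoid(ty) + cy   # center y
    bw = pw * math.exp(tw)  # width scales the prior box
    bh = ph * math.exp(th)  # height scales the prior box
    return bx, by, bw, bh
```

Because σ maps tx, ty into (0, 1), the predicted center can never leave the cell responsible for the prediction, which is the point of equations (1) and (2).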
YOLOv3 predicts an objectness score for each bounding box by logistic regression; the score is the overlap ratio between the predicted box and the ground-truth box. If a predicted bounding box overlaps the ground-truth box over a large area and does so better than every other predicted box, its target value is 1. If the overlap ratio does not reach a threshold (set here to 0.5), the predicted bounding box is ignored and contributes no loss value.
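The overlap ratio referred to above is the standard intersection-over-union (IoU); a minimal sketch for axis-aligned boxes given as corner coordinates (x1, y1, x2, y2):

```python
# Intersection-over-union between two axis-aligned boxes, each given as
# (x1, y1, x2, y2) with (x1, y1) the top-left and (x2, y2) the bottom-right.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)
```

A prediction whose IoU with the ground-truth box falls below the 0.5 threshold would be ignored under the rule described above.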
22) Classify the bounding boxes
Each bounding box receives a class prediction (class 0 or 1): boxes predicted as 0 are removed from the picture and boxes predicted as 1 are kept. The boxes use multi-label classification: the YOLOv3 algorithm classifies with simple logistic regressions, trained with a binary cross-entropy loss function.
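The binary cross-entropy loss mentioned above can be written out for a single prediction (a scalar sketch; YOLOv3 itself applies it per class over whole batches):

```python
import math

# Binary cross-entropy for one logistic classifier output:
# p is the predicted probability, y the 0/1 ground-truth label.
def binary_cross_entropy(p, y, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```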
23) Prediction across scales
The network predicts boxes at three different scales. YOLOv3 extracts features with a feature pyramid network: several convolutional layers are added to the base feature extractor, and the last of them predicts a three-dimensional tensor encoding the Bounding Box, objectness, and class predictions. The bounding box priors are determined by k-means clustering: nine clusters and three scales are selected, and the clusters are then divided evenly across the scales.
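The prior selection above can be sketched with a plain k-means over the annotated box sizes. Note an assumption: YOLOv3 actually clusters with a 1 − IoU distance, while this sketch uses squared Euclidean distance on (width, height) pairs to stay short:

```python
import random

def kmeans_priors(sizes, k=9, iters=50, seed=0):
    """Cluster (w, h) box sizes into k priors, then split them evenly
    across three scales, smallest-area priors on the finest scale."""
    rng = random.Random(seed)
    centers = rng.sample(sizes, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in sizes:
            # Assign each box to the nearest center (squared Euclidean)
            i = min(range(k), key=lambda j: (w - centers[j][0]) ** 2 + (h - centers[j][1]) ** 2)
            groups[i].append((w, h))
        # Recompute each center as the mean of its group (keep empty ones)
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    centers.sort(key=lambda c: c[0] * c[1])  # order by area
    return [centers[i:i + k // 3] for i in range(0, k, k // 3)]
```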
24) Feature extraction
The feature-extraction model of YOLOv3 blends several models: it draws on YOLOv2 and Darknet-19 and adds residual networks. The model uses the well-performing 3 × 3 and 1 × 1 convolutional layers and adds shortcut-connection structures. It ends up with 53 convolutional layers and is therefore named the Darknet-53 network. Referring to Fig. 2, Darknet-53 denotes the 53 convolutional layers; in practice the network spans 74 layers in total (counted from the network structure printed at run time). The detection layers are responsible for prediction at a given scale: layers 82, 94, and 106 predict on 13 × 13, 26 × 26, and 52 × 52 grids respectively, and each grid cell predicts the regression values of 3 boxes, comprising coordinates, objectness, and classes, for 3 × (4 + 1 + 20) = 75 values in total. A route layer with two parameters concatenates the two named layers (e.g. layer 86 concatenates layers 85 and 61); with one parameter, the route layer is identical to the named layer (e.g. layer 83 matches layer 79). A shortcut layer adds the output of its parameter layer to the current layer. The Darknet-53 network applies the skip-connection style of residual networks; its performance exceeds that of the ResNet-152 and ResNet-101 networks because the basic network unit differs: it has fewer layers, fewer parameters, and requires less computation. Darknet-53 achieves the highest floating-point operations per second, which shows that the network structure uses the GPU more effectively.
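The shortcut-connection pattern of the Darknet-53 residual unit can be shown structurally; plain functions stand in for the 1 × 1 and 3 × 3 convolutions (an illustrative sketch, not the actual Darknet implementation):

```python
# Structural sketch of a Darknet-53 residual unit: a 1x1 convolution
# followed by a 3x3 convolution, with a shortcut (skip) connection adding
# the block input back onto the block output. Plain callables stand in
# for the convolutions so the sketch stays framework-free.
def residual_block(x, conv1x1, conv3x3):
    """Apply the two stand-in convolutions and add the shortcut connection."""
    out = conv3x3(conv1x1(x))
    return [a + b for a, b in zip(x, out)]  # shortcut: elementwise sum
```

The elementwise sum is what a Darknet `shortcut` layer computes between its parameter layer and the current layer.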
Step 4: Feed the training set into the Darknet-53 network for training and output the location of every oral arithmetic exercise; the weight file is generated automatically when training finishes. While the iteration count is below 1000, the weights are saved every 100 iterations; once it exceeds 1000, they are saved every 10000 iterations.
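The checkpoint schedule above amounts to a simple rule (an illustrative helper, not part of the original code):

```python
# Save every 100 iterations below 1000, every 10000 iterations afterwards,
# mirroring the checkpoint schedule described in the text.
def should_save(iteration):
    if iteration < 1000:
        return iteration % 100 == 0
    return iteration % 10000 == 0
```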
Step 5: Test the pictures in the test set and output the Bounding Box that precisely localizes each oral arithmetic exercise in each test picture, completing the localization of the multiple exercises on one picture. Fig. 3 is a test picture of oral arithmetic exercises, and Fig. 4 is the localization result it yields after testing.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and variations without departing from the technical principles of the invention, and such improvements and variations shall also fall within the protection scope of the present invention.
Claims (8)
1. An intelligent exam-grading localization method based on deep learning, characterized by comprising the following steps:
1) create the data set, as follows:
11) obtain jpg-format pictures of several test papers by photographing, each picture containing several questions, and divide the pictures into a training set and a test set;
12) using the labelImg annotation tool, annotate every picture in the training set with Bounding Boxes, marking the specific location of each question in the picture;
13) after annotation, generate an xml-format file for each jpg-format picture;
14) convert the xml files into txt-format label files, each label file storing five values for the bounding box of each question in the picture: the first value is the class number of the question, the second is the normalized x coordinate of the bounding-box center, the third is the normalized y coordinate of the center, the fourth is the normalized width, and the fifth is the normalized height, the bounding box being the box of the Bounding Box annotation enclosing the question;
2) localize each question with the deep-learning-based YOLOv3 algorithm, as follows:
21) predict the bounding box of each question;
22) classify the predicted bounding boxes with multi-label classification into two classes, 0 and 1, boxes labeled 0 being removed from the picture and boxes labeled 1 being kept;
23) predict boxes at three different scales with the YOLOv3 algorithm;
24) build the Darknet-53 network and extract features;
3) feed the training set into the Darknet-53 network for training, output the location of each question, and automatically generate the weight file when training finishes;
4) test the pictures in the test set, and output the Bounding Box that precisely localizes each question in each test picture.
2. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 11), the training set and test set are divided according to a set ratio.
3. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 14), the format conversion is performed with the voc_label.py file of the YOLOv3 algorithm.
4. The intelligent exam-grading localization method based on deep learning according to claim 3, characterized in that after the format conversion, the label file of each picture is placed in the same folder as the original jpg-format picture.
5. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 21), the bounding-box predictions are as follows:
bx = σ(tx) + cx (1)
by = σ(ty) + cy (2)
bw = pw·e^tw (3)
bh = ph·e^th (4)
where (bx, by) is the center coordinate of the predicted bounding box, bw is its width and bh its height; σ is the logistic (sigmoid) function, and the coordinate predictions are trained with a squared-error loss; (tx, ty) are the predicted center offsets and tw, th the predicted width and height terms; (cx, cy) is the offset of the grid cell from the top-left corner of the picture; and pw, ph are the width and height of the prior bounding box.
6. The intelligent exam-grading localization method based on deep learning according to claim 5, characterized in that during the bounding-box prediction, logistic regression predicts an objectness score for each bounding box, the score being expressed as the overlap ratio; if the overlap ratio does not reach the set threshold, the predicted bounding box is ignored; the overlap ratio is the ratio of overlap between the predicted bounding box and the ground-truth bounding box.
7. The intelligent exam-grading localization method based on deep learning according to claim 6, characterized in that the threshold is taken as 0.5.
8. The intelligent exam-grading localization method based on deep learning according to claim 1, characterized in that in step 3), during training, the weights are saved every 100 iterations while the iteration count is below 1000, and every 10000 iterations once it exceeds 1000.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910168207.2A CN109948609A (en) | 2019-03-06 | 2019-03-06 | Intelligently reading localization method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109948609A true CN109948609A (en) | 2019-06-28 |
Family
ID=67009181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910168207.2A Withdrawn CN109948609A (en) | 2019-03-06 | 2019-03-06 | Intelligently reading localization method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109948609A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN110751138A (en) * | 2019-09-09 | 2020-02-04 | 浙江工业大学 | Pan head identification method based on yolov3 and CNN |
CN110781888A (en) * | 2019-10-25 | 2020-02-11 | 北京字节跳动网络技术有限公司 | Method and device for regressing screen in video picture, readable medium and electronic equipment |
CN111242131A (en) * | 2020-01-06 | 2020-06-05 | 北京十六进制科技有限公司 | Method, storage medium and device for image recognition in intelligent marking |
CN111242131B (en) * | 2020-01-06 | 2024-05-10 | 北京十六进制科技有限公司 | Method, storage medium and device for identifying images in intelligent paper reading |
CN112132143A (en) * | 2020-11-23 | 2020-12-25 | 北京易真学思教育科技有限公司 | Data processing method, electronic device and computer readable medium |
CN112597878A (en) * | 2020-12-21 | 2021-04-02 | 安徽七天教育科技有限公司 | Sample making and identifying method for scanning test paper layout analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109948609A (en) | Intelligently reading localization method based on deep learning | |
CN110399905B (en) | Method for detecting and describing wearing condition of safety helmet in construction scene | |
CN107391703B (en) | The method for building up and system of image library, image library and image classification method | |
CN104573669B (en) | Image object detection method | |
CN104463101B (en) | Answer recognition methods and system for character property examination question | |
Gu et al. | A new deep learning method based on AlexNet model and SSD model for tennis ball recognition | |
CN108520273A (en) | A kind of quick detection recognition method of dense small item based on target detection | |
CN110472642A (en) | Fine granularity Image Description Methods and system based on multistage attention | |
CN107945153A (en) | A kind of road surface crack detection method based on deep learning | |
CN108629367A (en) | A method of clothes Attribute Recognition precision is enhanced based on depth network | |
CN107563439A (en) | A kind of model for identifying cleaning food materials picture and identification food materials class method for distinguishing | |
CN110532920A (en) | Smallest number data set face identification method based on FaceNet method | |
CN105260738A (en) | Method and system for detecting change of high-resolution remote sensing image based on active learning | |
CN110096711A (en) | The natural language semantic matching method of the concern of the sequence overall situation and local dynamic station concern | |
CN105701502A (en) | Image automatic marking method based on Monte Carlo data balance | |
CN104680144A (en) | Lip language recognition method and device based on projection extreme learning machine | |
CN104794455B (en) | A kind of Dongba pictograph recognition methods | |
CN106570521A (en) | Multi-language scene character recognition method and recognition system | |
CN105205449A (en) | Sign language recognition method based on deep learning | |
CN105095863A (en) | Similarity-weight-semi-supervised-dictionary-learning-based human behavior identification method | |
CN105184298A (en) | Image classification method through fast and locality-constrained low-rank coding process | |
CN104298974A (en) | Human body behavior recognition method based on depth video sequence | |
CN110287806A (en) | A kind of traffic sign recognition method based on improvement SSD network | |
CN105740908B (en) | Classifier design method based on kernel space self-explanatory sparse representation | |
CN114842208A (en) | Power grid harmful bird species target detection method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20190628 |