CN113076900A - Test paper head student information automatic detection method based on deep learning - Google Patents
- Publication number
- CN113076900A (application number CN202110388294.XA)
- Authority
- CN
- China
- Prior art keywords
- network
- text
- data
- student information
- test paper
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses an automatic detection method, based on deep learning, for student information at the head of a test paper, comprising the following steps: S1, acquiring data: scanning the fronts of a number of student test papers with a scanner to obtain full images of the papers; S2, labeling data: manually labeling the header images to obtain detection boxes for the student information, and dividing the data into a training set and a test set; S3, expanding the data volume with synthesized data; S4, constructing a text detector from convolutional neural networks, comprising a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network, with a different loss function designed for each component network; S5, training the text detector; S6, testing: inputting the test data into the trained text detector for detection. The method can detect both the printed fields to be filled in at the head of the test paper and the handwritten entries, and is characterized by high accuracy.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an automatic detection method, based on deep learning, for student information at the head of a test paper.
Background
Computer vision is an important research direction in the field of artificial intelligence, and has important application in the aspects of automatic driving, smart cities, man-machine interaction and the like. Among them, text detection is an important branch of the computer vision field, and has been rapidly developed in recent years.
Text detection has relevant applications in the field of education. In teaching practice, teachers must grade students' test papers, and the follow-up work usually includes entering the student information and score on each paper into an electronic system, so that examination results can be analysed and teaching plans improved. In practice, however, a teacher who carries many classes and subjects faces an excessive burden of test paper data entry. It is therefore very meaningful to find an automatic and accurate method of entering student information.
In recent years, the research progress of deep neural networks has promoted the rapid development of target detection directions, and more detection algorithms are proposed.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provides an automatic detection method, based on deep learning, for student information at the head of a test paper, which can detect both the printed fields to be filled in and the handwritten entries at the head of a test paper, with high accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
the method for automatically detecting the information of the test paper head students based on deep learning comprises the following steps:
s1, acquiring data, namely scanning the front surfaces of a plurality of student test papers by using a scanner to obtain a plurality of test paper full pictures, and cutting the head positions of the test paper images to obtain a plurality of test paper head images;
s2, marking data, manually marking the image of the paper head to obtain a detection frame of student information, and dividing a training set and a test set;
s3, synthesizing data, and expanding data volume through synthesized data;
s4, constructing a text detector, wherein the text detector is constructed by using a convolutional neural network, comprises a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network, and different loss functions are designed for each component network;
s5, training a text detector, setting training relevant parameters by adopting a pre-training model, and inputting labeled data into the text detector for training;
S6, testing: inputting the test data into the trained text detector for detection to obtain the detection results and probabilities for the student information.
Further, the step S2 specifically includes:
labeling software is used to manually mark a horizontal rectangular box around each piece of student information, including its position and category;
the top-left corner coordinates and the width and height of each horizontal rectangular box are recorded in a file;
the images are randomly divided into a training set and a test set.
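The random division above can be sketched in a few lines; the file names, the 5:1 ratio (matching the roughly 2500/500 split mentioned later in the embodiment) and the fixed seed are illustrative assumptions, not prescribed by the method:

```python
import random

def split_dataset(image_names, train_ratio=5 / 6, seed=0):
    """Randomly divide labeled header images into training and test sets."""
    rng = random.Random(seed)              # fixed seed -> reproducible split
    names = list(image_names)
    rng.shuffle(names)
    n_train = int(round(len(names) * train_ratio))
    return names[:n_train], names[n_train:]

# e.g. 3000 labeled headers -> roughly 2500 train / 500 test
train_set, test_set = split_dataset([f"header_{i:04d}.jpg" for i in range(3000)])
```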
Further, the step S3 specifically includes:
s31, carrying out data statistics analysis on the manually marked real data, wherein the data statistics analysis comprises the aspect ratio of the head image of the test paper, the aspect ratio and the size of the marking frames and the distance between the marking frames;
s32, setting the width and height of the generated image and the text interval according to the data statistical result, automatically generating a test paper header image containing the items to be filled but not filled with student information, and simultaneously storing the categories and coordinates of the items to be filled;
s33, crawling student information corpora from the Internet, including student names, classes and schools, filtering out text longer than 10 characters, and saving the text into different json files according to the field each piece of information belongs to, so that each json file forms a corpus of student information for one field;
s34, downloading a Chinese handwriting data set as an image library for subsequently pasting single handwritten character images;
s35, for each field to be filled in at the head of the test paper, randomly selecting a piece of information from the corpus of the corresponding field; for each character of that piece of information, the image library contains a group of single-character images handwritten by different people, so one single-character image is randomly selected from the corresponding group and pasted, in order, to the right of the field in the header image;
s36, applying affine transformation, salt-and-pepper noise, rotation and Gaussian blur to the test paper header image;
and S37, synthesizing a plurality of images based on the steps S31 to S36, and combining the images with artificially labeled real data to form a training set.
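The geometry of step S35, pasting single-character images left to right after a printed field label, can be sketched as follows; the (left, top, width, height) box convention and the `gap` spacing are assumptions for illustration:

```python
def paste_boxes(field_x, field_y, field_w, char_sizes, gap=4):
    """Sketch of step S35's pasting geometry: place single-character images
    left to right, starting just right of a printed field label, and return
    each character box plus the overall box of the handwritten string.
    Boxes are (left, top, width, height); `gap` is an assumed pixel spacing."""
    x = field_x + field_w + gap            # start just right of the label
    boxes = []
    for w, h in char_sizes:
        boxes.append((x, field_y, w, h))
        x += w + gap
    right = boxes[-1][0] + boxes[-1][2]
    height = max(h for _, _, _, h in boxes)
    string_box = (boxes[0][0], field_y, right - boxes[0][0], height)
    return boxes, string_box

# Field label at (10, 20), 50 px wide; two 30x40 handwritten characters
boxes, string_box = paste_boxes(10, 20, 50, [(30, 40), (30, 40)])
```

The per-character boxes double as ground-truth annotations for the synthesized image, which is what makes the synthesis pipeline label-free.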
Further, the feature extraction network specifically includes:
the feature extraction network adopts ResNet50 and a bidirectional feature pyramid network BiFPN in a residual neural network, and the ResNet50 improves the feature extraction capability and relieves the network degradation problem through a shortcut connection mode;
and the BiFPN performs bottom-up and top-down fusion on the extracted features of different layers simultaneously to finally obtain a multi-channel feature map F1.
Further, the network for generating the candidate text region specifically includes:
inputting the multi-channel feature map F1 into a candidate text region to generate a network, and obtaining a candidate text region R;
the candidate text region generation network comprises a two-classification network and a detection frame regression network;
in the binary classification network, F1 is input into convolutional layer 256C (kernel size 3 × 3, stride 1), which outputs a 256-channel feature map F2; the feature map F2 is then input into convolutional layer 2kC (kernel size 1 × 1, stride 1), which outputs 2k channels;
in the detection box regression network, F1 is input into convolutional layer 256C for feature extraction to obtain feature map F2, and F2 is input into a 1 × 1 convolutional layer to obtain the 4k coordinate regression results;
at each pixel of feature map F1, k anchors of different sizes and aspect ratios are predefined; regressing from these anchors yields k candidate regions mapped back to the original image, each with 2 classification confidences, corresponding to the 2k outputs of the binary classification network.
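A minimal sketch of the anchor shapes implied above (k anchors per pixel, 2k classification and 4k regression outputs); the exact scale/ratio parameterisation below is an assumption modeled on common region-proposal practice, with each anchor of scale s and ratio r keeping area s²:

```python
import math

def make_anchors(sizes=(32, 64, 128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Generate the k = len(sizes) * len(ratios) anchor shapes predefined at
    every pixel of F1.  An anchor of scale s and aspect ratio r (w:h) keeps
    area s*s, with w = s*sqrt(r) and h = s/sqrt(r)."""
    return [(s * math.sqrt(r), s / math.sqrt(r)) for s in sizes for r in ratios]

anchors = make_anchors()
k = len(anchors)                  # k = 15 anchors per pixel
cls_outputs = 2 * k               # 2k binary-classification outputs
reg_outputs = 4 * k               # 4k coordinate regression outputs
```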
Further, the region feature sampling module specifically includes:
given a feature map F1 of the whole map and a candidate text region R, the corresponding region of F1 is divided into m × m portions, and a feature vector is sampled for each portion to obtain a local region feature map F3 of m × m size.
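The region feature sampling can be illustrated with a crude single-channel, nearest-neighbour version; real implementations typically use bilinear interpolation (as in RoIAlign), so this is only a sketch of the m × m binning:

```python
def roi_sample(feature_map, region, m=7):
    """Single-channel sketch of the region feature sampling module: divide
    the part of the feature map covered by candidate region R into m x m
    bins and sample one value per bin (nearest neighbour at the bin centre)."""
    x0, y0, x1, y1 = region                 # candidate region, feature-map coords
    out = []
    for i in range(m):
        row = []
        for j in range(m):
            # centre of bin (i, j), mapped back onto the feature map
            y = y0 + (i + 0.5) * (y1 - y0) / m
            x = x0 + (j + 0.5) * (x1 - x0) / m
            row.append(feature_map[int(y)][int(x)])
        out.append(row)
    return out                              # local region feature map F3, m x m
```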
Further, the text positioning network specifically includes:
inputting the local region feature map F3 into a text positioning network to obtain the probability of each region belonging to the text;
the text positioning network comprises two branches: a segmentation branch, and a detection-box regression and classification branch; the latter in turn comprises a detection-box regression branch and a detection-box classification branch;
in the segmentation branch, F3 is input into a full convolution network, a text segmentation map Mask of an input image is obtained, and text pixels and background pixels are distinguished at a pixel level;
in the detection box regression branch, F3 is input into a fully connected layer, and the candidate text region R is regressed to obtain the detection box of the text;
in the detection box classification branch, F3 is input into a fully connected layer, the region inside the detection box is classified, and the probability that the region is text is output.
Further, the loss function is specifically:
for the segmentation branch of the text positioning network, Dice loss is adopted, specifically:
Lmask = 1 - 2|X∩Y| / (|X| + |Y|)
wherein X is the predicted segmentation map and Y is the ground-truth segmentation map;
for the detection branches of the text positioning network and the candidate text region generation network, IoU loss is adopted, specifically:
Lbox = 1 - IoU, where IoU = |D∩G| / |D∪G|
wherein D is the predicted detection box and G is the ground-truth box;
for the classification branches of the text positioning network and the candidate text region generation network, a binary cross entropy loss function is adopted, specifically:
Lcls = -[y·log(p) + (1-y)·log(1-p)]
wherein p is the predicted text probability and y is the ground-truth label;
the final loss function is defined as:
L=Lmask+Lbox+Lcls。
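The three losses can be written out directly; the sketch below uses flat binary lists for the segmentation maps and (x0, y0, x1, y1) boxes, which is an assumed representation for illustration:

```python
import math

def dice_loss(X, Y):
    """Lmask = 1 - 2|X∩Y| / (|X| + |Y|) on flat binary segmentation maps."""
    inter = sum(x * y for x, y in zip(X, Y))
    return 1.0 - 2.0 * inter / (sum(X) + sum(Y))

def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def iou_loss(D, G):
    """Lbox = 1 - IoU for axis-aligned boxes (x0, y0, x1, y1)."""
    ix = max(0.0, min(D[2], G[2]) - max(D[0], G[0]))
    iy = max(0.0, min(D[3], G[3]) - max(D[1], G[1]))
    inter = ix * iy
    union = box_area(D) + box_area(G) - inter
    return 1.0 - inter / union

def bce_loss(p, y):
    """Binary cross entropy for one prediction p with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Total loss: L = Lmask + Lbox + Lcls
```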
further, the step S5 is specifically:
taking a model trained by an Imagenet classification task as a pre-training model of the feature extraction network, and initializing parameters;
setting the training parameters: model parameters are updated by stochastic gradient descent, with initial learning rate lr, weight decay weight_decay, batch size batch_size, number of iterations iters, learning rate update policy step, update coefficient lambda, and update steps stepsize;
in the candidate text region generation network, the anchors are set to sizes of 32², 64², 128², 256² and 512², with aspect ratios of 1:1, 1:2 and 2:1;
and training the text detector: pictures and labels in the training set are read in batches, the pictures are input into the text detector to obtain predictions, the loss between predictions and labels is computed, the loss is reduced by gradient descent to update the network parameters of the feature extraction network, the candidate text region generation network and the text positioning network, and this process is iterated to find the optimal parameters.
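The update rule at the heart of the training loop above is ordinary stochastic gradient descent; the toy one-parameter loss below only illustrates the "compute loss, descend its gradient, iterate" cycle, not the detector itself:

```python
def sgd_step(params, grads, lr=0.01):
    """One gradient-descent update: p <- p - lr * dL/dp."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy stand-in for detector training: minimise L(w) = (w - 3)^2,
# whose gradient is dL/dw = 2(w - 3).
w = [0.0]
for _ in range(1000):
    grads = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grads, lr=0.1)
# w[0] converges toward the loss minimum at 3
```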
Further, the step S6 specifically includes:
inputting the pictures in the test set into a trained text detector for forward reasoning;
after the detection results are obtained, a program automatically compares them with the ground-truth labels to obtain the detection precision and recall, and their harmonic mean (F1) is computed as the whole-image evaluation index;
and randomly selecting several images to inspect the detection effect: the student information and its field are automatically boxed in each image, together with the decision probability.
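The whole-image evaluation index is the harmonic mean (F1) of precision and recall, which can be computed from true-positive, false-positive and false-negative counts:

```python
def f_measure(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1), the whole-image
    evaluation index used in step S6."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```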
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention adopts a detection algorithm with a deep network structure that learns automatically, so effective representations can be learned from the data and detection accuracy is improved; thanks to the end-to-end design, accuracy is higher than with traditional manual entry, and manual entry errors are avoided; the method has high detection accuracy and strong robustness, and can effectively detect both the printed fields to be filled in and the handwritten entries at the head of a test paper.
Drawings
FIG. 1 is a flow chart of the detection method of the present invention;
FIG. 2 is a data acquisition and processing flow diagram of the present invention;
FIG. 3 is a flow chart of the data synthesis of the present invention;
FIG. 4 is a diagram of the deep convolutional neural network of the present invention;
FIG. 5 is an example of the test results of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in FIG. 1, the method for automatically detecting student information at the head of a test paper based on deep learning of the invention comprises the following steps:
S1, acquiring data: as shown in FIG. 2, the fronts of a number of student test papers are scanned with a scanner to obtain full images of the papers; during scanning the pages must be free of curling or folding and the papers centred; the full images are then cropped to obtain test paper headers that contain all of a student's personal information (name, class, seat number, etc.) while excluding large non-student-information areas such as the test paper title and the teacher's score.
S2, labeling data, manually labeling the image of the paper head to obtain a detection frame of student information, and dividing a training set and a test set, as shown in FIG. 2, specifically comprising the following steps:
s21, special labeling software is used to manually calibrate a horizontal rectangular box for each piece of student information, including its position and category; there are two categories: printed fields to be filled in, and the specific information handwritten by students;
s22, the coordinates of each horizontal rectangular box and its category are recorded in a json file; the box coordinates are the top-left corner coordinates and the width and height of the rectangle, with each coordinate value and the category separated by commas;
s23, the images are randomly divided into a training set (about 2500 images) and a test set (about 500 images).
S3, synthesizing data. Because the student information is handwritten in black pen, the writing is not neat and is easily confused with the printed characters of the header, and test paper headers come in many styles, detection is difficult and tens of thousands of training samples are needed to improve model performance; synthesizing data both expands the data volume and reduces the cost of manual labeling. As shown in FIG. 3, this comprises the following steps:
s31, carrying out data statistics analysis on the manually marked real data, wherein the data statistics analysis comprises the aspect ratio of the head image of the test paper, the aspect ratio and the size of the marking frames, the distance between the marking frames and the like;
s32, according to the statistics, the width and height of the generated image, the text spacing and other parameters are set, and a test paper header image containing the fields to be filled in, but with no student information filled in, is generated automatically. The categories and coordinates of the fields are saved at the same time, so that when handwritten single-character images are pasted later, the content and position of the pasted student information can be determined;
s33, student information corpora, including student names, classes, schools and the like, are crawled from the Internet; text longer than 10 characters is filtered out, and the text is saved into different json files according to the field each piece of information belongs to, so that each json file forms a corpus of student information for one field;
s34, a Chinese handwriting data set released by the Institute of Automation, Chinese Academy of Sciences is downloaded as the image library from which handwritten single-character images are later pasted;
s35, for each field to be filled in at the test paper header, a piece of information is randomly selected from the corpus of the corresponding field. For each character of that information, the image library contains a group of single-character images handwritten by different people, so one single-character image is randomly selected from the corresponding group and pasted, in order, to the right of the field in the header image;
s36, affine transformation, salt-and-pepper noise, rotation, Gaussian blur and other operations are applied to the test paper header image;
s37, based on steps S31 to S36, 20000 images are synthesized and combined with 2500 manually labeled real images to form the training set.
S4, constructing a text detector, wherein the text detector is a double-stage text detector and comprises a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network;
in this embodiment, the feature extraction network adopts ResNet50 and a bidirectional feature pyramid network bipfn in a residual neural network;
the ResNet50 improves the feature extraction capability and relieves the problem of network degradation through a shortcut connection mode, the BiFPN performs bottom-up and top-down fusion on the extracted features of different layers simultaneously, and a multi-channel feature map F1 is finally obtained;
the candidate text region generation network is connected after feature map F1 to obtain candidate text regions R:
in the present embodiment, as shown in fig. 4, the candidate text region generating network includes a two-classification network and a detection box regression network;
in the binary classification network, F1 is input into convolutional layer 256C (kernel size 3 × 3, stride 1), which outputs a 256-channel feature map F2; the feature map F2 is then input into convolutional layer 2kC (kernel size 1 × 1, stride 1), which outputs 2k channels;
in the detection box regression network, F1 is input into convolutional layer 256C for feature extraction to obtain feature map F2, and F2 is input into a 1 × 1 convolutional layer to obtain the 4k coordinate regression results;
at each pixel of feature map F1, k anchors of different sizes and aspect ratios are predefined; regressing from these anchors yields k candidate regions mapped back to the original image, each with 2 classification confidences, corresponding to the 2k outputs of the binary classification network.
Given the whole-image feature map F1 and a candidate text region R, the corresponding region of F1 is divided into m × m parts, and a feature vector is sampled for each part, obtaining a local region feature map F3 of size m × m;
inputting the local region feature map F3 into a text positioning network to obtain the probability of each region belonging to the text;
in this embodiment, as shown in fig. 4, the text positioning network includes two branches, a segmentation branch, and a detection box regression and classification branch; the detection frame regression and classification branches comprise a detection frame regression branch and a detection frame classification branch;
in the segmentation branch, F3 is input into a full convolution network, a text segmentation map Mask of an input image is obtained, and text pixels and background pixels are distinguished at a pixel level;
in the detection box regression branch, inputting F3 into the full connection layer, and performing regression on the candidate text region R to obtain a detection box of the text;
in the detection box classification branch, F3 is input into the full link layer, the region inside the detection box is classified, and the probability that the region belongs to the text is output.
In this embodiment, for the segmentation branch of the text positioning network, Dice loss is adopted:
Lmask = 1 - 2|X∩Y| / (|X| + |Y|)
wherein X is a predicted segmentation graph, and Y is a real labeled segmentation graph;
for the detection branches of the text positioning network and the candidate text region generation network, IoU loss is adopted:
Lbox = 1 - IoU, where IoU = |D∩G| / |D∪G|
wherein D is the predicted detection box and G is the ground-truth box;
for the classification branches of the text positioning network and the candidate text region generation network, a binary cross entropy loss function is used:
Lcls = -[y·log(p) + (1-y)·log(1-p)]
the final loss function is defined as:
L=Lmask+Lbox+Lcls
s5, inputting the data with labels into a text detector to train to obtain a model, specifically:
s51, in this embodiment, the training parameters are set: model parameters are updated by stochastic gradient descent, with initial learning rate lr = 0.01, weight decay weight_decay = 0.0005, batch size batch_size = 8, number of iterations iters = 50000, learning rate update policy step, update coefficient lambda = 0.1, and update steps 30000 and 40000. In the candidate text region generation network, the anchors are set to sizes of 32², 64², 128², 256² and 512², with aspect ratios of 1:1, 1:2 and 2:1;
s52, using the model trained by the Imagenet classification task as a pre-training model of the backbone network to initialize parameters;
s53, training the convolutional neural network, and training the feature extraction network, the candidate text region generation network and the text positioning network by adopting an end-to-end training method, wherein the training method specifically comprises the following steps:
reading pictures and labels in the training set in batches, inputting the pictures into the text detector to obtain predictions, computing the loss between predictions and labels, reducing the loss by gradient descent, updating the network parameters of the feature extraction network, the candidate text region generation network and the text positioning network, and iteratively training the text detector to find the optimal parameters.
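The "step" learning-rate policy of S51 (update coefficient 0.1 applied at iterations 30000 and 40000) can be sketched as:

```python
def step_lr(base_lr=0.01, gamma=0.1, milestones=(30000, 40000), iteration=0):
    """'step' learning-rate policy: multiply the base rate by the update
    coefficient gamma at each milestone iteration that has been reached."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr
```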
S6, testing the network, specifically including:
s61, inputting the pictures in the test set into the trained model for forward reasoning;
s62, after the detection results are obtained, a program automatically compares them with the ground-truth labels to obtain the detection precision and recall, and their harmonic mean (F1) is computed as the whole-image evaluation index;
s63, the detection effect is inspected on 30 randomly selected images; in each image, the student information and its field are automatically boxed, together with the decision probability.
As shown in FIG. 5, the detection result for a 4680 × 403 test paper header image is shown: the student information and its fields are outlined, with the decision probability at the top left corner of each box.
It should also be noted that in this specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for automatically detecting student information at the head of a test paper based on deep learning, characterized by comprising the following steps:
s1, acquiring data, namely scanning the front surfaces of a plurality of student test papers by using a scanner to obtain a plurality of test paper full pictures, and cutting the head positions of the test paper images to obtain a plurality of test paper head images;
s2, marking data, manually marking the image of the paper head to obtain a detection frame of student information, and dividing a training set and a test set;
s3, synthesizing data, and expanding data volume through synthesized data;
s4, constructing a text detector, wherein the text detector is constructed by using a convolutional neural network, comprises a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network, and different loss functions are designed for each component network;
s5, training a text detector, setting training relevant parameters by adopting a pre-training model, and inputting labeled data into the text detector for training;
S6, testing: inputting the test data into the trained text detector for detection to obtain the detection results and probabilities for the student information.
2. The method for automatically detecting student information in paper based on deep learning according to claim 1, wherein the step S2 specifically includes:
marking software is adopted to manually mark a horizontal rectangular frame of student information, including the marking of positions and categories;
recording the coordinates of the upper left corner of the horizontal rectangular frame and the width and height data in a file;
the images are randomly divided into a training set and a test set.
3. The method for automatically detecting student information in paper based on deep learning according to claim 1, wherein the step S3 specifically includes:
s31, carrying out data statistics analysis on the manually marked real data, wherein the data statistics analysis comprises the aspect ratio of the head image of the test paper, the aspect ratio and the size of the marking frames and the distance between the marking frames;
s32, setting the width and height of the generated image and the text interval according to the data statistical result, automatically generating a test paper header image containing the items to be filled but not filled with student information, and simultaneously storing the categories and coordinates of the items to be filled;
s33, crawling student information corpora from the Internet, including student names, classes and schools, filtering out text longer than 10 characters, and saving the text into different json files according to the field each piece of information belongs to, so that each json file forms a corpus of student information for one field;
s34, downloading a Chinese handwriting data set as an image library for subsequently pasting single handwritten character images;
s35, for each field to be filled in at the head of the test paper, randomly selecting a piece of information from the corpus of the corresponding field; for each character of that piece of information, the image library contains a group of single-character images handwritten by different people, so one single-character image is randomly selected from the corresponding group and pasted, in order, to the right of the field in the header image;
s36, performing affine transformation, adding salt and pepper noise, rotation and Gaussian blur on the test paper head image;
and S37, synthesizing a plurality of images based on the steps S31 to S36, and combining the images with artificially labeled real data to form a training set.
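The augmentations named in step S36 can be sketched in a few lines; below is a minimal numpy-only illustration (the function names and parameter defaults are my own, not from the patent, and a separable convolution stands in for a full Gaussian blur):

```python
import numpy as np

def add_salt_pepper_noise(img, amount=0.01, rng=None):
    """Flip a random fraction of pixels to pure black (pepper) or white (salt)."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = img.copy()
    mask = rng.random(img.shape[:2])
    noisy[mask < amount / 2] = 0          # pepper
    noisy[mask > 1 - amount / 2] = 255    # salt
    return noisy

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur of a 2-D grayscale image."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"),
                              1, img.astype(float))
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"),
                              0, out)
    return out
```

In practice a library such as OpenCV or imgaug would supply these transforms, including the affine warp and rotation; the sketch only shows the two pixel-level operations.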
4. The method for automatically detecting student information in a test paper header based on deep learning according to claim 1, wherein the feature extraction network specifically comprises:
the feature extraction network adopts the residual neural network ResNet50 and a bidirectional feature pyramid network BiFPN; ResNet50 improves feature extraction capability and alleviates the network degradation problem through its shortcut connections;
the BiFPN fuses the extracted features of different levels both bottom-up and top-down, finally yielding a multi-channel feature map F1.
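The bottom-up/top-down fusion idea can be illustrated with a deliberately simplified two-pass scheme over single-channel pyramid levels; this uses plain averaging instead of BiFPN's learned weighted fusion, and nearest-neighbour resampling instead of convolutions, so it is only a sketch of the data flow:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (H, W) feature map."""
    return np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

def downsample2x(f):
    """2x downsampling of a (H, W) feature map by striding."""
    return f[::2, ::2]

def bidirectional_fuse(levels):
    """Simplified two-pass fusion over a list of (H, W) maps, finest first."""
    # top-down pass: coarse (semantic) information flows into finer levels
    td = list(levels)
    for i in range(len(td) - 2, -1, -1):
        td[i] = 0.5 * (td[i] + upsample2x(td[i + 1]))
    # bottom-up pass: fine (localization) information flows back up
    out = list(td)
    for i in range(1, len(out)):
        out[i] = 0.5 * (out[i] + downsample2x(out[i - 1]))
    return out
```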
5. The method for automatically detecting student information in a test paper header based on deep learning according to claim 4, wherein the candidate text region generation network specifically comprises:
the multi-channel feature map F1 is input into the candidate text region generation network to obtain candidate text regions R;
the candidate text region generation network comprises a binary classification network and a detection box regression network;
in the binary classification network, F1 is input into a convolutional layer 256C with kernel size 3 × 3 and stride 1, which outputs a 256-channel feature map F2; F2 is then input into a convolutional layer 2kC with kernel size 1 × 1 and stride 1, whose number of output channels is 2k;
in the detection box regression network, F1 is input into the convolutional layer 256C for feature extraction to obtain the feature map F2, and F2 is input into a convolutional layer 4kC to obtain 4k coordinate regression results;
each pixel of the feature map F1 predefines k anchors of different sizes and aspect ratios, and regression on these anchors yields k candidate regions mapped back to the original image; each candidate region has 2 classification confidences, corresponding to the 2k outputs of the binary classification network.
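The k predefined anchors per pixel can be generated from a set of areas and aspect ratios; a sketch (the scales and ratios here mirror the 32², …, 512² and 1:1 / 1:2 / 2:1 settings given in claim 9, and the parameterization with ratio = h/w is an assumption on my part):

```python
import math

def make_anchors(scales=(32, 64, 128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return (w, h) for every scale/ratio pair.

    The area stays scale**2 for every ratio, and ratio is h / w,
    following the usual RPN anchor construction.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((w, h))
    return anchors
```

With 5 scales and 3 ratios this gives k = 15 anchors at each feature-map position.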
6. The method for automatically detecting student information in a test paper header based on deep learning according to claim 5, wherein the region feature sampling module specifically comprises:
given the feature map F1 of the whole image and a candidate text region R, the corresponding region of F1 is divided into m × m parts, and one feature vector is sampled from each part, giving a local region feature map F3 of size m × m.
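A minimal version of the m × m region sampling, using max-pooling over each cell in the spirit of RoI pooling (the patent may instead use bilinear sampling as in RoIAlign; this is an illustrative assumption):

```python
import numpy as np

def roi_pool(feature, box, m=7):
    """Crop box = (x0, y0, x1, y1) from a (H, W) map and max-pool into m x m."""
    x0, y0, x1, y1 = box
    region = feature[y0:y1, x0:x1]
    h, w = region.shape
    # split the region into an m x m grid of (possibly uneven) cells
    ys = np.linspace(0, h, m + 1).astype(int)
    xs = np.linspace(0, w, m + 1).astype(int)
    out = np.empty((m, m), dtype=feature.dtype)
    for i in range(m):
        for j in range(m):
            cell = feature[y0 + ys[i]:y0 + max(ys[i + 1], ys[i] + 1),
                           x0 + xs[j]:x0 + max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out
```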
7. The method for automatically detecting student information in a test paper header based on deep learning according to claim 6, wherein the text positioning network specifically comprises:
the local region feature map F3 is input into the text positioning network to obtain the probability that each region belongs to text;
the text positioning network comprises two branches: a segmentation branch and a detection box regression-and-classification branch; the latter comprises a detection box regression branch and a detection box classification branch;
in the segmentation branch, F3 is input into a fully convolutional network to obtain a text segmentation map Mask of the input image, distinguishing text pixels from background pixels at the pixel level;
in the detection box regression branch, F3 is input into a fully connected layer, and the candidate text region R is regressed to obtain the text detection box;
in the detection box classification branch, F3 is input into a fully connected layer, the region inside the detection box is classified, and the probability that the region belongs to text is output.
8. The method for automatically detecting student information in a test paper header based on deep learning according to claim 7, wherein the loss function is specifically:
for the segmentation branch of the text positioning network, the Dice loss is adopted, specifically:
Lmask = 1 − 2|X∩Y| / (|X| + |Y|)
wherein X is the predicted segmentation map and Y is the ground-truth segmentation map;
for the detection branches of the text positioning network and the candidate text region generation network, the IoU loss is adopted, specifically:
Lbox = 1 − IoU, where IoU = |D∩G| / |D∪G|
wherein D is the detection box and G is the ground-truth box;
for the classification branches of the text positioning network and the candidate text region generation network, the binary cross-entropy loss is adopted, specifically:
Lcls = −[y·log(p) + (1 − y)·log(1 − p)]
wherein p is the predicted probability and y is the true label;
the final loss function is defined as:
L = Lmask + Lbox + Lcls.
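Using the symbols above, the three losses can be written out directly; a numpy sketch over binary masks and axis-aligned (x0, y0, x1, y1) boxes (function names are mine):

```python
import numpy as np

def dice_loss(x, y):
    """Lmask = 1 - 2|X∩Y| / (|X| + |Y|) for binary masks x, y."""
    inter = (x * y).sum()
    return 1.0 - 2.0 * inter / (x.sum() + y.sum())

def iou_loss(d, g):
    """Lbox = 1 - IoU for boxes d, g given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(d[0], g[0]), max(d[1], g[1])
    ix1, iy1 = min(d[2], g[2]), min(d[3], g[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return 1.0 - inter / (area(d) + area(g) - inter)

def bce_loss(p, y, eps=1e-7):
    """Lcls = -[y*log(p) + (1-y)*log(1-p)], averaged over elements."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))
```

The total loss L is then simply the sum of the three terms, as stated in the claim.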
9. The method for automatically detecting student information in a test paper header based on deep learning according to claim 1, wherein step S5 is specifically:
a model trained on the ImageNet classification task is used as the pre-training model of the feature extraction network to initialize its parameters;
training-related parameters are set, and model parameters are updated by stochastic gradient descent: the initial learning rate is lr, the weight decay is weight_decay, the number of images per training batch is batch_size, the number of iterations is iters, the learning-rate update policy is step, the update coefficient is lambda, and the update step is stepsize;
in the candidate text region generation network, the anchors are set to sizes of 32², 64², 128², 256² and 512², with aspect ratios of 1:1, 1:2 and 2:1;
the text detector is trained by reading images and labels from the training set in batches, feeding the images into the text detector to obtain predictions, computing the loss between the predictions and the labels, reducing the loss by gradient descent, updating the network parameters of the feature extraction network, the candidate text region generation network and the text positioning network, and iterating until the optimal parameters are found.
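The step learning-rate policy described here reduces to lr · lambda^(iter // stepsize); a one-line sketch (the sample values in the test below are hypothetical, not taken from the patent):

```python
def step_lr(base_lr, lam, stepsize, it):
    """Learning rate after `it` iterations under the 'step' policy:
    multiply by `lam` every `stepsize` iterations."""
    return base_lr * (lam ** (it // stepsize))
```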
10. The method for automatically detecting student information in a test paper header based on deep learning according to claim 1, wherein step S6 specifically comprises:
the images in the test set are input into the trained text detector for forward inference;
after the detection results are obtained, a program automatically compares them with the ground-truth labels to obtain the detection precision and recall, and their harmonic mean is computed as the evaluation metric over the whole set;
several images are randomly selected to visualize the detection results: the student information and the item it belongs to are automatically framed in each image, and the prediction probability is output at the same time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110388294.XA CN113076900B (en) | 2021-04-12 | 2021-04-12 | Test paper head student information automatic detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113076900A true CN113076900A (en) | 2021-07-06 |
CN113076900B CN113076900B (en) | 2022-06-14 |
Family
ID=76617428
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110388294.XA Active CN113076900B (en) | 2021-04-12 | 2021-04-12 | Test paper head student information automatic detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113076900B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103853852A (en) * | 2014-03-31 | 2014-06-11 | 广州视源电子科技股份有限公司 | Electronic test paper importing method |
JP2018067219A (en) * | 2016-10-21 | 2018-04-26 | 株式会社森山商会 | Score input device, program thereof, and computer readable recording medium recording program thereof |
CN110751232A (en) * | 2019-11-04 | 2020-02-04 | 哈尔滨理工大学 | Chinese complex scene text detection and identification method |
US20200090539A1 (en) * | 2018-08-13 | 2020-03-19 | Hangzhou Dana Technology Inc. | Method and system for intelligent identification and correction of questions |
CN111539309A (en) * | 2020-04-21 | 2020-08-14 | 广州云从鼎望科技有限公司 | Data processing method, system, platform, equipment and medium based on OCR |
CN111553423A (en) * | 2020-04-29 | 2020-08-18 | 河北地质大学 | Handwriting recognition method based on deep convolutional neural network image processing technology |
CN111753828A (en) * | 2020-05-19 | 2020-10-09 | 重庆邮电大学 | Natural scene horizontal character detection method based on deep convolutional neural network |
CN111898411A (en) * | 2020-06-16 | 2020-11-06 | 华南理工大学 | Text image labeling system, method, computer device and storage medium |
Non-Patent Citations (1)
Title |
---|
XIANGLE CHEN ET AL: "Radical aggregation network for few-shot offline handwritten Chinese character recognition", 《PATTERN RECOGNITION LETTERS》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343990A (en) * | 2021-07-28 | 2021-09-03 | 浩鲸云计算科技股份有限公司 | Key text detection and classification training method for certificate pictures |
CN113343990B (en) * | 2021-07-28 | 2021-12-03 | 浩鲸云计算科技股份有限公司 | Key text detection and classification training method for certificate pictures |
CN113780087A (en) * | 2021-08-11 | 2021-12-10 | 同济大学 | Postal parcel text detection method and equipment based on deep learning |
CN113780087B (en) * | 2021-08-11 | 2024-04-26 | 同济大学 | Postal package text detection method and equipment based on deep learning |
CN114708127A (en) * | 2022-04-15 | 2022-07-05 | 广东南粤科教研究院 | Student point system comprehensive assessment method and system |
CN115565190A (en) * | 2022-11-17 | 2023-01-03 | 江西风向标智能科技有限公司 | Test paper layout analysis method, system, computer and readable storage medium |
CN116128954A (en) * | 2022-12-30 | 2023-05-16 | 上海强仝智能科技有限公司 | Commodity layout identification method, device and storage medium based on generation network |
CN116128954B (en) * | 2022-12-30 | 2023-12-05 | 上海强仝智能科技有限公司 | Commodity layout identification method, device and storage medium based on generation network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||