CN113076900A - Test paper head student information automatic detection method based on deep learning - Google Patents


Info

Publication number
CN113076900A
Authority
CN
China
Prior art keywords
network
text
data
student information
test paper
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110388294.XA
Other languages
Chinese (zh)
Other versions
CN113076900B (en)
Inventor
陈向乐 (Chen Xiangle)
黄双萍 (Huang Shuangping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110388294.XA
Publication of CN113076900A
Application granted
Publication of CN113076900B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Abstract

The invention discloses a method for automatically detecting student information at the head of a test paper based on deep learning, which comprises the following steps: S1, acquiring data: the front surfaces of a plurality of student test papers are scanned with a scanner to obtain full test-paper images; S2, labeling data: the paper-head images are manually labeled to obtain detection frames of student information, and a training set and a test set are divided; S3, expanding the data volume through synthesized data; S4, constructing a text detector: the text detector is built from convolutional neural networks and comprises a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network, with a different loss function designed for each component network; S5, training the text detector; S6, testing: the test data are input into the trained text detector for detection. The method can detect both the printed to-be-filled items at the head of the test paper and the handwritten student information, and has the characteristic of high accuracy.

Description

Test paper head student information automatic detection method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a test paper head student information automatic detection method based on deep learning.
Background
Computer vision is an important research direction in the field of artificial intelligence, and has important application in the aspects of automatic driving, smart cities, man-machine interaction and the like. Among them, text detection is an important branch of the computer vision field, and has been rapidly developed in recent years.
Text detection has relevant applications in the field of education. In teaching practice, teachers need to grade student test papers, and the follow-up work usually includes entering the student information and scores of each paper into an electronic system, which facilitates statistics on examination performance and improvement of teaching schemes. In actual work, however, if a teacher is responsible for many classes and subjects, the excessive entry work undoubtedly costs the teacher considerable extra effort. It is therefore very meaningful to find an automatic and accurate method for entering student information.
In recent years, the research progress of deep neural networks has promoted the rapid development of target detection directions, and more detection algorithms are proposed.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provides a method for automatically detecting student information at the head of a test paper based on deep learning, which can detect both the printed to-be-filled items and the handwritten student information at the head of a test paper, and has the characteristic of high accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
The method for automatically detecting student information at the head of a test paper based on deep learning comprises the following steps:
S1, acquiring data, namely scanning the front surfaces of a plurality of student test papers by using a scanner to obtain a plurality of full test-paper images, and cropping the head position of each test-paper image to obtain a plurality of test-paper head images;
S2, labeling data, manually labeling the paper-head images to obtain detection frames of student information, and dividing a training set and a test set;
S3, synthesizing data, and expanding the data volume through synthesized data;
S4, constructing a text detector, wherein the text detector is constructed with convolutional neural networks, comprises a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network, and a different loss function is designed for each component network;
S5, training the text detector, adopting a pre-training model, setting training-related parameters, and inputting the labeled data into the text detector for training;
and S6, inputting the test data into the trained text detector for detection to obtain the detection results and probabilities of the student information.
Further, the step S2 specifically includes:
marking software is adopted to manually mark a horizontal rectangular frame of student information, including the marking of positions and categories;
recording the coordinates of the upper left corner of the horizontal rectangular frame and the width and height data in a file;
the images are randomly divided into a training set and a test set; an example annotation record is sketched below.
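For illustration only, one plausible record in such an annotation file is sketched below in Python; the field names and layout are assumptions for the example, not a format fixed by the method.

    # Hypothetical annotation record for one paper-head image (field names assumed).
    # Coordinates follow the scheme above: upper-left corner plus width and height.
    annotation = {
        "image": "paper_head_0001.png",
        "boxes": [
            {"x": 120, "y": 35, "w": 90,  "h": 40, "category": "printed_item"},
            {"x": 215, "y": 35, "w": 160, "h": 40, "category": "handwritten_info"},
        ],
    }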
Further, the step S3 specifically includes:
S31, carrying out statistical analysis of the manually labeled real data, including the aspect ratio of the test-paper head images, the aspect ratio and size of the labeling frames, and the distance between labeling frames;
S32, setting the width and height of the generated image and the text spacing according to the statistical results, automatically generating a test-paper head image that contains the items to be filled but no student information, and meanwhile storing the category and coordinates of each item to be filled;
S33, crawling student-information corpora from the Internet, including student names, classes and schools, filtering out entries longer than 10 characters, and storing the entries in different json files according to the item the information belongs to, so that each json file forms a corpus of student information for a different item;
S34, downloading a Chinese handwriting data set as the image library for subsequently pasting handwritten single-character images;
S35, for each item to be filled at the head of the test paper, randomly selecting a piece of information from the corresponding item corpus; for each character of that information, the image library contains a group of single-character images handwritten by different people, one of which is randomly selected and pasted, in sequence, to the right side of the item to be filled in the test-paper head image;
S36, performing affine transformation on the test-paper head image and adding salt-and-pepper noise, rotation and Gaussian blur;
and S37, synthesizing a plurality of images based on steps S31 to S36 and combining them with the manually labeled real data to form a training set; a code sketch of this synthesis pipeline is given below.
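A minimal sketch of steps S32 to S36 follows, assuming a corpus dictionary built in step S33 (item name to list of strings) and a char_bank dictionary built in step S34 (character to list of single-character PIL images). The ASCII item labels stand in for the real printed Chinese labels, and the sizes, spacing and noise rates are illustrative assumptions, not the patent's values.

    # Minimal synthesis sketch for S32-S36 (assumed inputs: corpus, char_bank).
    import random
    import numpy as np
    from PIL import Image, ImageDraw, ImageFilter

    def synthesize_head(corpus, char_bank, width=2000, height=200, gap=8):
        img = Image.new("RGB", (width, height), "white")
        draw = ImageDraw.Draw(img)
        boxes, x = [], 40
        for item in ("name", "class", "school"):           # items to be filled
            draw.text((x, 85), item + ":", fill="black")   # printed item label
            boxes.append((item, "printed_item", [x, 85, 70, 30]))  # rough box
            x += 90
            x0 = x
            for ch in random.choice(corpus[item]):         # one piece of info
                glyph = random.choice(char_bank[ch]).resize((48, 48))
                img.paste(glyph, (x, 76))                  # paste to the right
                x += 48 + gap
            boxes.append((item, "handwritten_info", [x0, 76, x - x0 - gap, 48]))
            x += 120
        # S36 perturbations: rotation, Gaussian blur, salt-and-pepper noise
        img = img.rotate(random.uniform(-2, 2), fillcolor="white")
        img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.0, 1.0)))
        arr = np.array(img)
        arr[np.random.rand(height, width) < 0.001] = 255   # salt
        arr[np.random.rand(height, width) < 0.001] = 0     # pepper
        return Image.fromarray(arr), boxes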
Further, the feature extraction network specifically includes:
the feature extraction network adopts ResNet50 and a bidirectional feature pyramid network BiFPN in a residual neural network, and the ResNet50 improves the feature extraction capability and relieves the network degradation problem through a shortcut connection mode;
and the BiFPN performs bottom-up and top-down fusion on the extracted features of different layers simultaneously to finally obtain a multi-channel feature map F1.
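One way to realise this backbone is sketched below in PyTorch: the ResNet50 stages supply multi-level features, 1 × 1 lateral convolutions unify the channels, and a top-down pass followed by a bottom-up pass fuses the levels in both directions. The 256-channel width and the simplified unweighted fusion are assumptions; a full BiFPN adds learned fusion weights and repeated fusion blocks.

    # Backbone sketch: ResNet50 features fused by one BiFPN-style layer.
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision

    class Backbone(nn.Module):
        def __init__(self, ch=256):
            super().__init__()
            # older torchvision uses pretrained=True instead of weights=...
            r = torchvision.models.resnet50(weights="IMAGENET1K_V1")
            self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
            self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
            self.lateral = nn.ModuleList(
                nn.Conv2d(c, ch, 1) for c in (256, 512, 1024, 2048))
            self.smooth = nn.ModuleList(
                nn.Conv2d(ch, ch, 3, padding=1) for _ in range(4))

        def forward(self, x):
            feats, x = [], self.stem(x)
            for stage in self.stages:                  # residual stages C2..C5
                x = stage(x)
                feats.append(x)
            p = [l(f) for l, f in zip(self.lateral, feats)]
            for i in range(len(p) - 2, -1, -1):        # top-down fusion
                p[i] = p[i] + F.interpolate(p[i + 1], size=p[i].shape[-2:])
            for i in range(1, len(p)):                 # bottom-up fusion
                p[i] = p[i] + F.interpolate(p[i - 1], size=p[i].shape[-2:])
            return [s(f) for s, f in zip(self.smooth, p)]  # feature maps F1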
Further, the network for generating the candidate text region specifically includes:
inputting the multi-channel feature map F1 into the candidate text region generation network to obtain a candidate text region R;
the candidate text region generation network comprises a two-classification network and a detection frame regression network;
in the binary classification network, F1 is input into convolutional layer 256C, with kernel size 3 × 3 and stride 1, and a 256-channel feature map F2 is output; F2 is then input into convolutional layer 2kC, with kernel size 1 × 1, stride 1 and 2k output channels;
in the detection-frame regression network, F1 is input into convolutional layer 256C for feature extraction to obtain feature map F2, which is then input into a convolutional layer with 4k output channels to obtain 4k coordinate regression results;
each pixel of feature map F1 predefines k anchors of different sizes and aspect ratios, and k candidate regions mapped back to the original image are obtained by anchor regression, each candidate region carrying 2 classification confidences that correspond to the 2k outputs of the binary classification network.
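A minimal PyTorch sketch of this head follows; k = 15 (the five anchor sizes times three aspect ratios given later in step S5) and the padding that keeps F2 the same spatial size as F1 are assumptions.

    import torch
    import torch.nn as nn

    class ProposalHead(nn.Module):
        def __init__(self, in_ch=256, k=15):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, 256, 3, stride=1, padding=1)  # 256C
            self.cls = nn.Conv2d(256, 2 * k, 1, stride=1)  # 2k confidences
            self.reg = nn.Conv2d(256, 4 * k, 1, stride=1)  # 4k box offsets

        def forward(self, f1):
            f2 = torch.relu(self.conv(f1))         # 256-channel feature map F2
            return self.cls(f2), self.reg(f2)      # per-anchor scores and deltas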
Further, the region feature sampling module specifically includes:
given a feature map F1 of the whole map and a candidate text region R, the corresponding region of F1 is divided into m × m portions, and a feature vector is sampled for each portion to obtain a local region feature map F3 of m × m size.
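This sampling scheme corresponds to RoIAlign; the torchvision call below illustrates it, assuming m = 7 and a feature map at 1/8 of the input resolution (both values are assumptions).

    import torch
    from torchvision.ops import roi_align

    f1 = torch.randn(1, 256, 50, 250)                 # feature map of the whole image
    rois = torch.tensor([[0., 40., 10., 400., 58.]])  # (batch_idx, x1, y1, x2, y2)
    f3 = roi_align(f1, rois, output_size=(7, 7), spatial_scale=1 / 8)
    print(f3.shape)                                   # [1, 256, 7, 7]: m x m map F3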
Further, the text positioning network specifically includes:
inputting the local region feature map F3 into a text positioning network to obtain the probability of each region belonging to the text;
the text positioning network comprises two branches: a segmentation branch, and a detection-frame regression and classification branch; the latter comprises a detection-frame regression sub-branch and a detection-frame classification sub-branch;
in the segmentation branch, F3 is input into a fully convolutional network to obtain a text segmentation map Mask of the input image, distinguishing text pixels from background pixels at the pixel level;
in the detection-frame regression sub-branch, F3 is input into a fully connected layer, and the candidate text region R is regressed to obtain the detection frame of the text;
in the detection-frame classification sub-branch, F3 is input into a fully connected layer, the region inside the detection frame is classified, and the probability that the region belongs to text is output.
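A minimal PyTorch sketch of these three outputs on F3 follows; the fully connected width of 1024, the depth of the mask branch, and m = 7 are assumptions.

    import torch.nn as nn

    class LocalizationHead(nn.Module):
        def __init__(self, ch=256, m=7):
            super().__init__()
            self.mask = nn.Sequential(                    # segmentation branch (FCN)
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(ch, ch, 2, stride=2), nn.ReLU(),
                nn.Conv2d(ch, 1, 1))                      # text/background per pixel
            self.fc = nn.Sequential(nn.Flatten(),
                                    nn.Linear(ch * m * m, 1024), nn.ReLU())
            self.box = nn.Linear(1024, 4)                 # box regression sub-branch
            self.cls = nn.Linear(1024, 2)                 # box classification sub-branch

        def forward(self, f3):
            h = self.fc(f3)
            return self.mask(f3), self.box(h), self.cls(h)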
Further, the loss function is specifically:
for the segmentation branch of the text positioning network, the Dice loss is adopted, specifically:

L_mask = 1 - 2|X ∩ Y| / (|X| + |Y|)

wherein X is the predicted segmentation map and Y is the ground-truth segmentation map;

for the detection branches of the text positioning network and the candidate text region generation network, an IoU loss is adopted, specifically:

L_box = 1 - IoU, with IoU = |D ∩ G| / |D ∪ G|

wherein D is the detection frame and G is the ground-truth frame;

for the classification branches of the text positioning network and the candidate text region generation network, a binary cross-entropy loss function is adopted, specifically:

L_cls = -(ŷ log p + (1 - ŷ) log(1 - p))

wherein p denotes the predicted probability and ŷ denotes the ground-truth category;

the final loss function is defined as:

L = L_mask + L_box + L_cls
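Written directly from the three formulas above, a PyTorch sketch of the losses might look as follows; the epsilon terms, added to avoid division by zero, are an assumption.

    import torch
    import torch.nn.functional as F

    def dice_loss(x, y, eps=1e-6):               # x: predicted map, y: ground truth
        return 1 - 2 * (x * y).sum() / (x.sum() + y.sum() + eps)

    def iou_loss(d, g, eps=1e-6):                # boxes as (x1, y1, x2, y2) rows
        ix1, iy1 = torch.max(d[:, 0], g[:, 0]), torch.max(d[:, 1], g[:, 1])
        ix2, iy2 = torch.min(d[:, 2], g[:, 2]), torch.min(d[:, 3], g[:, 3])
        inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
        area = (d[:, 2] - d[:, 0]) * (d[:, 3] - d[:, 1]) \
             + (g[:, 2] - g[:, 0]) * (g[:, 3] - g[:, 1])
        return (1 - inter / (area - inter + eps)).mean()

    def total_loss(mask_pred, mask_gt, boxes_pred, boxes_gt, p, y):
        return (dice_loss(mask_pred, mask_gt)          # L_mask
                + iou_loss(boxes_pred, boxes_gt)       # L_box
                + F.binary_cross_entropy(p, y))        # L_cls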
Further, the step S5 is specifically:
taking a model trained by an Imagenet classification task as a pre-training model of the feature extraction network, and initializing parameters;
setting training-related parameters: model parameters are updated by stochastic gradient descent, with an initial learning rate lr, weight decay weight_decay, number of pictures per training batch batch_size, number of iterations iters, learning rate update strategy step, update coefficient lambda, and update step stepsize;
in the candidate text region generation network, the anchors are set to sizes of 32², 64², 128², 256², and 512², with aspect ratios of 1:1, 1:2, and 2:1;
and training the text detector: pictures and labels in the training set are read in batches, the pictures are input into the text detector to obtain prediction results, the loss between the predictions and the labels is calculated and reduced by gradient descent, the network parameters of the feature extraction network, the candidate text region generation network and the text positioning network are updated, and this process is iterated to find the optimal parameters.
Further, the step S6 specifically includes:
inputting the pictures in the test set into the trained text detector for forward inference;
after the detection results are obtained, they are automatically compared with the ground-truth labels by a program to obtain the detection precision and recall rate, and the harmonic mean of the two is calculated as the overall evaluation index;
and randomly selecting a plurality of images to examine the detection effect: in each image, the student information and the item it belongs to are automatically framed, together with the judgment probability.
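A sketch of this evaluation follows; treating a prediction as correct when its IoU with some ground-truth box reaches 0.5 is an assumed matching rule, since the text does not fix the threshold.

    def evaluate(num_matched, num_pred, num_gt):
        """num_matched: predictions whose IoU with a ground-truth box is >= 0.5."""
        precision = num_matched / num_pred if num_pred else 0.0
        recall = num_matched / num_gt if num_gt else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)            # harmonic mean
        return precision, recall, f1

    print(evaluate(95, 100, 98))  # approx. (0.95, 0.9694, 0.9596)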
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. According to the invention, a detection algorithm with a deep network structure that learns automatically is adopted, so effective representations can be learned from the data and the detection accuracy is improved; thanks to the end-to-end design, the accuracy is higher than that of traditional manual entry, and entry errors are avoided at the same time; the method has high detection accuracy and strong robustness, and can effectively detect the student information of the printed to-be-filled items and the handwriting at the head of the test paper.
Drawings
FIG. 1 is a flow chart of the detection method of the present invention;
FIG. 2 is a data acquisition and processing flow diagram of the present invention;
FIG. 3 is a flow chart of the data synthesis of the present invention;
FIG. 4 is a diagram of the deep convolutional neural network of the present invention;
FIG. 5 is an example of the test results of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in FIG. 1, the method for automatically detecting student information at the head of a test paper based on deep learning of the invention comprises the following steps:
S1, acquiring data: as shown in fig. 2, the front surfaces of a plurality of student test papers are scanned with a scanner to obtain full test-paper images; during scanning, the pages are kept free of curling and folding and the test paper is centred. The full images are then cropped to obtain test-paper heads that contain all the personal information of the students, such as name, class and seat number, while excluding areas of non-student information such as the test-paper title and teacher scores.
S2, labeling data, manually labeling the image of the paper head to obtain a detection frame of student information, and dividing a training set and a test set, as shown in FIG. 2, specifically comprising the following steps:
S21, manually calibrating horizontal rectangular frames of student information with special labeling software, including the calibration of positions and categories; the labeled categories comprise two classes, namely the printed items to be filled and the specific information handwritten by students;
S22, recording the coordinates of each horizontal rectangular frame and the category it belongs to in a json file; the frame coordinates are the upper-left corner coordinates and the width and height of the rectangular frame, and each coordinate value and its category are separated by commas;
S23, randomly dividing the images into a training set (about 2500 sheets) and a test set (about 500 sheets).
S3, synthesizing data. Student information is handwritten by students in black pen, the writing is often irregular, its appearance is close to that of the printed characters of the paper head, and test-paper heads come in many styles, so detection is difficult and tens of thousands of training samples are needed to improve model performance. Synthesizing data expands the data volume and also reduces the cost of manual labeling; as shown in fig. 3, the synthesis comprises the following steps:
S31, carrying out statistical analysis of the manually labeled real data, including the aspect ratio of the test-paper head images, the aspect ratio and size of the labeling frames, the distance between labeling frames, and the like;
S32, setting parameters such as the width and height of the generated image and the text spacing according to the statistical results, and automatically generating test-paper head images that contain the items to be filled but no student information. Meanwhile, the category and coordinates of each item to be filled are stored, so that the pasted student information and its position coordinates can be conveniently determined when handwritten single-character images are pasted later;
S33, crawling student-information corpora from the Internet, including student names, classes, schools and the like, filtering out entries longer than 10 characters, and storing the entries in different json files according to the item the information belongs to, so that each json file forms a corpus of student information for a different item;
S34, downloading the Chinese handwriting data set issued by the Institute of Automation of the Chinese Academy of Sciences as the image library for subsequently pasting handwritten single-character images;
S35, for each item to be filled in the test-paper header, randomly selecting a piece of information from the corresponding item corpus. For each character of the information, the image library has a group of single-character images handwritten by different people, so one single-character image is randomly selected from the corresponding group and pasted, in sequence, to the right side of the item to be filled in the test-paper head image;
S36, performing affine transformation on the test-paper head image and adding salt-and-pepper noise, rotation, Gaussian blur and other operations;
S37, based on steps S31 to S36, 20000 images are synthesized and combined with 2500 manually labeled real images to form the training set.
S4, constructing a text detector, wherein the text detector is a two-stage text detector comprising a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network;
in this embodiment, the feature extraction network adopts ResNet50 and a bidirectional feature pyramid network bipfn in a residual neural network;
the ResNet50 improves the feature extraction capability and relieves the problem of network degradation through a shortcut connection mode, the BiFPN performs bottom-up and top-down fusion on the extracted features of different layers simultaneously, and a multi-channel feature map F1 is finally obtained;
the feature map F1 is then fed into the candidate text region generation network, and a candidate text region R is obtained:
in the present embodiment, as shown in fig. 4, the candidate text region generating network includes a two-classification network and a detection box regression network;
in the binary classification network, F1 is input into convolutional layer 256C, with kernel size 3 × 3 and stride 1, and a 256-channel feature map F2 is output; F2 is then input into convolutional layer 2kC, with kernel size 1 × 1, stride 1 and 2k output channels;
in the detection-frame regression network, F1 is input into convolutional layer 256C for feature extraction to obtain feature map F2, which is then input into a convolutional layer with 4k output channels to obtain 4k coordinate regression results;
each pixel of feature map F1 predefines k anchors of different sizes and aspect ratios, and k candidate regions mapped back to the original image are obtained by anchor regression, each candidate region carrying 2 classification confidences that correspond to the 2k outputs of the binary classification network.
Given the feature map F1 of the whole image and a candidate text region R, the corresponding region of F1 is divided into m × m portions, and a feature vector is sampled for each portion to obtain a local region feature map F3 of size m × m;
inputting the local region feature map F3 into a text positioning network to obtain the probability of each region belonging to the text;
in this embodiment, as shown in fig. 4, the text positioning network comprises two branches: a segmentation branch, and a detection-frame regression and classification branch; the latter comprises a detection-frame regression sub-branch and a detection-frame classification sub-branch;
in the segmentation branch, F3 is input into a fully convolutional network to obtain a text segmentation map Mask of the input image, distinguishing text pixels from background pixels at the pixel level;
in the detection-frame regression sub-branch, F3 is input into a fully connected layer, and the candidate text region R is regressed to obtain the detection frame of the text;
in the detection-frame classification sub-branch, F3 is input into a fully connected layer, the region inside the detection frame is classified, and the probability that the region belongs to text is output.
In this embodiment, for the segmentation branch of the text positioning network, the Dice loss is adopted:

L_mask = 1 - 2|X ∩ Y| / (|X| + |Y|)

wherein X is the predicted segmentation map and Y is the ground-truth segmentation map;

for the detection branches of the text positioning network and the candidate text region generation network, an IoU loss is adopted:

L_box = 1 - IoU, with IoU = |D ∩ G| / |D ∪ G|

wherein D is the detection frame and G is the ground-truth frame;

for the classification branches of the text positioning network and the candidate text region generation network, a binary cross-entropy loss function is used:

L_cls = -(ŷ log p + (1 - ŷ) log(1 - p))

wherein p denotes the predicted probability and ŷ denotes the ground-truth category;

the final loss function is defined as:

L = L_mask + L_box + L_cls
S5, inputting the labeled data into the text detector for training to obtain a model, specifically:
S51, in this embodiment, training-related parameters are set: model parameters are updated by stochastic gradient descent, with initial learning rate lr = 0.01, weight_decay = 0.0005, batch_size (pictures per training batch) = 8, iterations iters = 50000, learning rate update strategy step, update coefficient lambda = 0.1, and update steps 30000 and 40000. In the candidate text region generation network, the anchors are set to sizes of 32², 64², 128², 256², and 512², with aspect ratios of 1:1, 1:2, and 2:1 (this configuration is sketched in code after step S53);
S52, using a model trained on the ImageNet classification task as the pre-training model of the backbone network to initialize its parameters;
S53, training the convolutional neural network, with the feature extraction network, the candidate text region generation network and the text positioning network trained end to end, specifically:
reading pictures and labels in the training set in batches, inputting the pictures into the text detector to obtain prediction results, calculating the loss between the predictions and the labels, reducing the loss by gradient descent, updating the network parameters of the feature extraction network, the candidate text region generation network and the text positioning network, and iteratively training the text detector to find the optimal parameters.
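The schedule of step S51 maps naturally onto PyTorch's SGD optimizer with a MultiStepLR scheduler, sketched below; the momentum value and the detector and train_loader objects are assumptions.

    import torch
    from itertools import cycle

    optimizer = torch.optim.SGD(detector.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=0.0005)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[30000, 40000], gamma=0.1)   # lambda = 0.1

    loader = cycle(train_loader)                           # batch_size = 8
    for it in range(50000):                                # iters = 50000
        images, targets = next(loader)
        loss = detector(images, targets)                   # L_mask + L_box + L_cls
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()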
S6, testing the network, specifically including:
S61, inputting the pictures in the test set into the trained model for forward inference;
S62, after the detection results are obtained, automatically comparing them with the ground-truth labels by a program to obtain the detection precision and recall rate, and calculating the harmonic mean of the two as the overall evaluation index;
and S63, randomly selecting 30 images to examine the detection effect: in each image, the student information and the item it belongs to are automatically framed, together with the judgment probability.
As shown in fig. 5, the detection result of a 4680 × 403 test-paper head image is presented, in which the student information and the items it belongs to are framed, with the judgment probability at the upper-left corner of each frame.
It should also be noted that in this specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for automatically detecting student information at the head of a test paper based on deep learning, characterized by comprising the following steps:
S1, acquiring data, namely scanning the front surfaces of a plurality of student test papers by using a scanner to obtain a plurality of full test-paper images, and cropping the head position of each test-paper image to obtain a plurality of test-paper head images;
S2, labeling data, manually labeling the paper-head images to obtain detection frames of student information, and dividing a training set and a test set;
S3, synthesizing data, and expanding the data volume through synthesized data;
S4, constructing a text detector, wherein the text detector is constructed with convolutional neural networks, comprises a feature extraction network, a candidate text region generation network, a region feature sampling module and a text positioning network, and a different loss function is designed for each component network;
S5, training the text detector, adopting a pre-training model, setting training-related parameters, and inputting the labeled data into the text detector for training;
and S6, inputting the test data into the trained text detector for detection to obtain the detection results and probabilities of the student information.
2. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 1, wherein the step S2 specifically includes:
marking software is adopted to manually mark a horizontal rectangular frame of student information, including the marking of positions and categories;
recording the coordinates of the upper left corner of the horizontal rectangular frame and the width and height data in a file;
the images are randomly divided into a training set and a test set.
3. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 1, wherein the step S3 specifically includes:
S31, carrying out statistical analysis of the manually labeled real data, including the aspect ratio of the test-paper head images, the aspect ratio and size of the labeling frames, and the distance between labeling frames;
S32, setting the width and height of the generated image and the text spacing according to the statistical results, automatically generating a test-paper head image that contains the items to be filled but no student information, and meanwhile storing the category and coordinates of each item to be filled;
S33, crawling student-information corpora from the Internet, including student names, classes and schools, filtering out entries longer than 10 characters, and storing the entries in different json files according to the item the information belongs to, so that each json file forms a corpus of student information for a different item;
S34, downloading a Chinese handwriting data set as the image library for subsequently pasting handwritten single-character images;
S35, for each item to be filled at the head of the test paper, randomly selecting a piece of information from the corresponding item corpus; for each character of that information, the image library contains a group of single-character images handwritten by different people, one of which is randomly selected and pasted, in sequence, to the right side of the item to be filled in the test-paper head image;
S36, performing affine transformation on the test-paper head image and adding salt-and-pepper noise, rotation and Gaussian blur;
and S37, synthesizing a plurality of images based on steps S31 to S36 and combining them with the manually labeled real data to form a training set.
4. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 1, wherein the feature extraction network specifically comprises:
the feature extraction network adopts ResNet50 and a bidirectional feature pyramid network BiFPN in a residual neural network, and the ResNet50 improves the feature extraction capability and relieves the network degradation problem through a shortcut connection mode;
and the BiFPN performs bottom-up and top-down fusion on the extracted features of different layers simultaneously to finally obtain a multi-channel feature map F1.
5. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 4, wherein the candidate text region generation network specifically comprises:
inputting the multi-channel feature map F1 into the candidate text region generation network to obtain a candidate text region R;
the candidate text region generation network comprises a two-classification network and a detection frame regression network;
in the binary classification network, F1 is input into convolutional layer 256C, with kernel size 3 × 3 and stride 1, and a 256-channel feature map F2 is output; F2 is then input into convolutional layer 2kC, with kernel size 1 × 1, stride 1 and 2k output channels;
in the detection-frame regression network, F1 is input into convolutional layer 256C for feature extraction to obtain feature map F2, which is then input into a convolutional layer with 4k output channels to obtain 4k coordinate regression results;
each pixel of feature map F1 predefines k anchors of different sizes and aspect ratios, and k candidate regions mapped back to the original image are obtained by anchor regression, each candidate region carrying 2 classification confidences that correspond to the 2k outputs of the binary classification network.
6. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 5, wherein the region feature sampling module specifically comprises:
given a feature map F1 of the whole map and a candidate text region R, the corresponding region of F1 is divided into m × m portions, and a feature vector is sampled for each portion to obtain a local region feature map F3 of m × m size.
7. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 6, wherein the text positioning network specifically comprises:
inputting the local region feature map F3 into a text positioning network to obtain the probability of each region belonging to the text;
the text positioning network comprises two branches: a segmentation branch, and a detection-frame regression and classification branch; the latter comprises a detection-frame regression sub-branch and a detection-frame classification sub-branch;
in the segmentation branch, F3 is input into a fully convolutional network to obtain a text segmentation map Mask of the input image, distinguishing text pixels from background pixels at the pixel level;
in the detection-frame regression sub-branch, F3 is input into a fully connected layer, and the candidate text region R is regressed to obtain the detection frame of the text;
in the detection-frame classification sub-branch, F3 is input into a fully connected layer, the region inside the detection frame is classified, and the probability that the region belongs to text is output.
8. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 7, wherein the loss function is specifically:
for the segmentation branch of the text positioning network, the Dice loss is adopted, specifically:

L_mask = 1 - 2|X ∩ Y| / (|X| + |Y|)

wherein X is the predicted segmentation map and Y is the ground-truth segmentation map;

for the detection branches of the text positioning network and the candidate text region generation network, an IoU loss is adopted, specifically:

L_box = 1 - IoU, with IoU = |D ∩ G| / |D ∪ G|

wherein D is the detection frame and G is the ground-truth frame;

for the classification branches of the text positioning network and the candidate text region generation network, a binary cross-entropy loss function is adopted, specifically:

L_cls = -(ŷ log p + (1 - ŷ) log(1 - p))

wherein p denotes the predicted probability and ŷ denotes the ground-truth category;

the final loss function is defined as:

L = L_mask + L_box + L_cls
9. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 1, wherein the step S5 is specifically:
taking a model trained by an Imagenet classification task as a pre-training model of the feature extraction network, and initializing parameters;
setting training-related parameters: model parameters are updated by stochastic gradient descent, with an initial learning rate lr, weight decay weight_decay, number of pictures per training batch batch_size, number of iterations iters, learning rate update strategy step, update coefficient lambda, and update step stepsize;
in the candidate text region generation network, the anchors are set to sizes of 32², 64², 128², 256², and 512², with aspect ratios of 1:1, 1:2, and 2:1;
and training the text detector: pictures and labels in the training set are read in batches, the pictures are input into the text detector to obtain prediction results, the loss between the predictions and the labels is calculated and reduced by gradient descent, the network parameters of the feature extraction network, the candidate text region generation network and the text positioning network are updated, and the text detector is iteratively trained to find the optimal parameters.
10. The method for automatically detecting student information at the head of a test paper based on deep learning according to claim 1, wherein the step S6 specifically includes:
inputting the pictures in the test set into the trained text detector for forward inference;
after the detection results are obtained, they are automatically compared with the ground-truth labels by a program to obtain the detection precision and recall rate, and the harmonic mean of the two is calculated as the overall evaluation index;
and randomly selecting a plurality of images to examine the detection effect: in each image, the student information and the item it belongs to are automatically framed, together with the judgment probability.
CN202110388294.XA (filed 2021-04-12, priority date 2021-04-12): Test paper head student information automatic detection method based on deep learning. Granted as CN113076900B. Status: Active.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110388294.XA CN113076900B (en) 2021-04-12 2021-04-12 Test paper head student information automatic detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110388294.XA CN113076900B (en) 2021-04-12 2021-04-12 Test paper head student information automatic detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN113076900A 2021-07-06
CN113076900B 2022-06-14

Family

ID=76617428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110388294.XA Active CN113076900B (en) 2021-04-12 2021-04-12 Test paper head student information automatic detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN113076900B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103853852A (en) * 2014-03-31 2014-06-11 广州视源电子科技股份有限公司 Electronic test paper importing method
JP2018067219A (en) * 2016-10-21 2018-04-26 株式会社森山商会 Score input device, program thereof, and computer readable recording medium recording program thereof
US20200090539A1 (en) * 2018-08-13 2020-03-19 Hangzhou Dana Technology Inc. Method and system for intelligent identification and correction of questions
CN110751232A (en) * 2019-11-04 2020-02-04 哈尔滨理工大学 Chinese complex scene text detection and identification method
CN111539309A (en) * 2020-04-21 2020-08-14 广州云从鼎望科技有限公司 Data processing method, system, platform, equipment and medium based on OCR
CN111553423A (en) * 2020-04-29 2020-08-18 河北地质大学 Handwriting recognition method based on deep convolutional neural network image processing technology
CN111753828A (en) * 2020-05-19 2020-10-09 重庆邮电大学 Natural scene horizontal character detection method based on deep convolutional neural network
CN111898411A (en) * 2020-06-16 2020-11-06 华南理工大学 Text image labeling system, method, computer device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiangle Chen et al.: "Radical aggregation network for few-shot offline handwritten Chinese character recognition", Pattern Recognition Letters *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343990A (en) * 2021-07-28 2021-09-03 浩鲸云计算科技股份有限公司 Key text detection and classification training method for certificate pictures
CN113343990B (en) * 2021-07-28 2021-12-03 浩鲸云计算科技股份有限公司 Key text detection and classification training method for certificate pictures
CN113780087A (en) * 2021-08-11 2021-12-10 同济大学 Postal parcel text detection method and equipment based on deep learning
CN113780087B (en) * 2021-08-11 2024-04-26 同济大学 Postal package text detection method and equipment based on deep learning
CN114708127A (en) * 2022-04-15 2022-07-05 广东南粤科教研究院 Student point system comprehensive assessment method and system
CN115565190A (en) * 2022-11-17 2023-01-03 江西风向标智能科技有限公司 Test paper layout analysis method, system, computer and readable storage medium
CN116128954A (en) * 2022-12-30 2023-05-16 上海强仝智能科技有限公司 Commodity layout identification method, device and storage medium based on generation network
CN116128954B (en) * 2022-12-30 2023-12-05 上海强仝智能科技有限公司 Commodity layout identification method, device and storage medium based on generation network

Also Published As

Publication number Publication date
CN113076900B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN113076900B (en) Test paper head student information automatic detection method based on deep learning
CN111325203B (en) American license plate recognition method and system based on image correction
CN107403130A (en) A kind of character identifying method and character recognition device
CN108537269B (en) Weak interactive object detection deep learning method and system thereof
CN103488711B (en) A kind of method and system of quick Fabrication vector font library
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN109726628A (en) A kind of recognition methods and system of form image
CN111062885A (en) Mark detection model training and mark detection method based on multi-stage transfer learning
CN110163208B (en) Scene character detection method and system based on deep learning
CN105893968A (en) Text-independent end-to-end handwriting recognition method based on deep learning
CN112528862B (en) Remote sensing image target detection method based on improved cross entropy loss function
CN111626292B (en) Text recognition method of building indication mark based on deep learning technology
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN113673338A (en) Natural scene text image character pixel weak supervision automatic labeling method, system and medium
CN110414616A (en) A kind of remote sensing images dictionary learning classification method using spatial relationship
CN109598185A (en) Image recognition interpretation method, device, equipment and readable storage medium storing program for executing
CN111563563B (en) Method for enhancing combined data of handwriting recognition
CN112232371A (en) American license plate recognition method based on YOLOv3 and text recognition
CN111666937A (en) Method and system for recognizing text in image
CN111507353B (en) Chinese field detection method and system based on character recognition
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN114119949A (en) Method and system for generating enhanced text synthetic image
CN108052936B (en) Automatic inclination correction method and system for Braille image
CN110443235B (en) Intelligent paper test paper total score identification method and system
JPH08508128A (en) Image classification method and apparatus using distribution map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant