CN111612045A: Universal method for acquiring a target detection data set
Publication number: CN111612045A (application CN202010355022.5A)
Authority: CN (China)
Prior art keywords: handwritten, data set, sample, image, target detection
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
G06V30/244 Division of the character sequences into groups prior to recognition; selection of dictionaries using graphical properties, e.g. alphabet type or font
G06N3/045 Neural networks; combinations of networks
G06N3/08 Neural networks; learning methods
G06V30/10 Character recognition
Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention provides a general method for obtaining a target detection data set. The invention uses Optical Character Recognition (OCR) technology: characters printed on paper are captured with a laser scanner and the character shapes are translated into computer characters by a recognition method, yielding a data set similar to the public MNIST handwritten digit set. A target detection data set is then obtained by sampling from this data set, and the annotation information can be generated directly by an algorithm. The method acquires a target detection data set quickly, alleviates the problem of insufficient data sets to a certain extent, and the data set generated by the method can also be used as a means of testing network performance.
Description
Technical Field
The invention relates to the field of machine learning data set generation, in particular to a general method for acquiring a target detection data set.
Background
With the progress of science and technology, deep learning and computer vision have developed rapidly. Target detection is an important branch of computer vision; its tasks are mainly to find the position of a target object in an image, judge the category of the object in the target box, and draw all predicted bounding boxes on the image. Thanks to recent breakthroughs in hardware computing power, deep learning techniques have advanced quickly and continue to improve the field of computer vision. Target detection algorithms based on deep learning currently fall into two main categories: one is the RCNN series, such as Fast RCNN and Faster RCNN, which take a two-stage approach; the other is YOLO, SSD and other end-to-end one-stage approaches.
Compared with traditional methods, performing target detection with deep learning reduces false detections and missed detections, adapts better to complex scenes, and greatly improves detection speed. However, if the target detection model to be trained lacks an existing data set, the conventional practice is to manually mark the position of the target object in each image and label its category using annotation software such as LabelImg and similar tools. For the large number of images in a data set, making and processing the data set this way is a tedious and time-consuming task. Besides manual labeling, various library functions can be used. For example, OpenCV threshold filtering can obtain the target position by extracting the portion of the image within a specified range of gray values, but this method places high demands on the image: if the target object is similar in color to the background, or the background is complex, it is difficult to extract the object.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to acquire a target detection data set, the conventional approach uses software such as LabelImg to manually mark the position of the target object in each image and label its category, which consumes a great deal of effort and time.
In view of the above, the present invention provides a general method for acquiring a target detection data set. The invention uses Optical Character Recognition (OCR) technology: characters printed on paper are captured with a laser scanner, the character shapes are translated into computer characters by a recognition method, and a data set similar to the public MNIST handwritten digit set is obtained. A target detection data set is then obtained by sampling from this data set, and the annotation information can be generated directly by an algorithm. The method comprises the following steps:
Step 1, acquiring a paper sample data set and collecting it into a computer. A4 printing paper of uniform specification bearing auxiliary writing standard marks is collected into the computer through a laser scanner, and the type of digits on each sheet of A4 paper is kept consistent.
Step 2, preprocessing the acquired handwritten digit images and generating an original data set.
The preprocessing comprises image correction, graying, image denoising and digit normalization; the image scanned from each sheet of A4 printing paper is cut along the auxiliary writing standard marks so that each cut image contains exactly one handwritten digit, and the original data set is generated in order from left to right and top to bottom.
Step 3, creating corresponding category labels. Because target detection involves a classification problem, digit category labels are created in the order in which the digits are arranged in the image scanned from each sheet of A4 printing paper, and stored as a one-dimensional vector. This order is consistent with the generation order of the original data set in step 2.
Step 4, dividing the data set. If correlation factors exist between the handwritten digit fonts, they are taken into account when dividing the training and validation sets, ensuring the difference between the two sets is as small as possible.
Step 5, generating a target detection data set.
Parameters of each single handwritten digit, such as scale and rotation angle, are set, and constraints on the distance between handwritten digits and on the number of digits in each generated image are added, improving the robustness and usefulness of the data set.
Step 6, constructing a Faster RCNN neural network and testing the effectiveness of the data set.
The Faster RCNN neural network is a fully convolutional network (FCN) that outputs the category and spatial information of detected objects, and can therefore complete the tasks of target detection. The effectiveness of the target detection data set is tested indirectly by constructing a Faster RCNN neural network, fitting the data set, and evaluating the network's performance with the mAP (mean Average Precision) index.
The beneficial effects are:
the method acquires a target detection data set quickly, alleviates the problem of insufficient data sets to a certain extent, and the data set generated by the method can also be used as a means of testing network performance.
Drawings
FIG. 1 is a sample sheet of the paper data set;
FIG. 2 is an example of a cut 28 x 28 handwritten number font image;
FIG. 3 is an example of a 200 x 200 target data set generated;
FIG. 4 is an example of a generated 200 x 200 target data set sample with ground truth bounding boxes;
FIG. 5 is a flowchart of an algorithm for generating a single sample;
FIG. 6 is a diagram showing an example of the storage format of the generated ground truth;
FIG. 7 is a diagram showing an example of the structure of the Faster RCNN neural network in this example;
FIG. 8 shows the Faster RCNN parameter configurations and the test indexes in this example.
Detailed description of the invention
The present invention will be described in further detail below with reference to the accompanying drawings and examples.
FIG. 1 is a sample sheet of the paper data set.
Step 2, preprocessing the acquired handwritten digit images. The specific operations are as follows:
(a) Graying the image. According to the human eye's different sensitivity to colors, the pixel values of the generated grayscale image are calculated as:
Gray(i, j) = 0.299 × R(i, j) + 0.587 × G(i, j) + 0.114 × B(i, j) (1)
where Gray is the target grayscale image matrix and R, G, B correspond to the three channel matrices of the input handwritten digit image; i and j correspond to the rows and columns of the target grayscale image matrix.
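As an illustration, equation (1) can be applied channel-wise with NumPy (a sketch; the function name `to_gray` and the use of NumPy are assumptions, not part of the patent):

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to a grayscale matrix using the
    luminance weights of equation (1)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

# The weights sum to 1, so a pure-white pixel stays at 255.
```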
(b) Image denoising. The grayscaled handwritten digit image is denoised using image filtering techniques.
(c) Inverting the image. The handwritten digit image is inverted according to:
Neg(i, j) = 255 − Gray(i, j) (2)
where Neg is the image matrix obtained after inversion.
(d) Unifying the image size. Each sheet of writing paper used in this embodiment holds 20 × 15 digits, and the size of a single digit region is set to 28 × 28, so the unified image size is 560 × 420 pixels.
(e) Cutting the processed handwritten digit image so that each output image contains exactly one handwritten digit. Because of the auxiliary writing standard marks, namely the spacing lines between digits, the outermost k rings of pixel values in each extracted 28 × 28 tile are set to zero; the specific number of rings can be chosen manually according to the actual situation, with a reference range of 1 to 3 rings, resulting in images like the pattern of FIG. 2. The calculation formula is:
Cropped(i, j) = 0 if i < k or i > 27 − k or j < k or j > 27 − k, otherwise Cropped(i, j) = Neg(i, j) (3)
where Cropped is a single cut handwritten digit image, i = 0, 1, …, 27 and j = 0, 1, …, 27 represent rows and columns respectively, and k is 2 in this example. When each sheet of A4 paper is scanned into an image with the aid of the auxiliary writing standard marks, the original data set is generated in order from left to right and top to bottom.
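The cutting of step (e) might be sketched as follows, assuming the sheet image is a NumPy array whose height and width are multiples of 28 (the helper name `crop_digits` is hypothetical):

```python
import numpy as np

def crop_digits(sheet: np.ndarray, k: int = 2, size: int = 28):
    """Cut a sheet image into size x size digit tiles, left-to-right then
    top-to-bottom, zeroing the outermost k pixel rings of each tile to
    erase the grid lines, as in step 2(e)."""
    tiles = []
    rows, cols = sheet.shape[0] // size, sheet.shape[1] // size
    for i in range(rows):
        for j in range(cols):
            tile = sheet[i * size:(i + 1) * size, j * size:(j + 1) * size].copy()
            tile[:k, :] = 0; tile[-k:, :] = 0   # zero top/bottom rings
            tile[:, :k] = 0; tile[:, -k:] = 0   # zero left/right rings
            tiles.append(tile)
    return tiles
```

For a full 560 × 420 sheet this yields 15 × 20 = 300 tiles in the left-to-right, top-to-bottom order used to build the original data set.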
Step 3, creating corresponding category labels. Because target detection involves a classification problem, digit category labels are created in the order in which the digits are arranged in the image scanned from each sheet of A4 printing paper, and stored as a one-dimensional vector. In this embodiment the category label is the number corresponding to the handwritten digit itself. This order is consistent with the generation order of the original data set in step 2. The category labels are one-hot encoded, which avoids the difficulty classifiers have with attribute data and, to a certain extent, also expands the features.
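A minimal one-hot encoding of the digit labels, matching the R^10 label vectors used here (the function name is an assumption):

```python
import numpy as np

def one_hot(labels: np.ndarray, num_classes: int = 10) -> np.ndarray:
    """Encode a 1-D vector of digit labels (0-9) as one-hot rows, one
    R^10 vector per handwritten digit."""
    out = np.zeros((labels.shape[0], num_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out
```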
Step 4, dividing the data set. The training set and validation set are divided according to the correlation factor between handwritten digit fonts, ensuring the difference between the two sets is as small as possible. The correlation factor can be taken to be the style of the handwritten digits: styles differ between people because of different handwriting habits, and may also differ for the same person at different times. Since the 3000 handwritten digits in this embodiment were written evenly by 5 persons, 5 different handwriting styles are considered. Therefore, when the original data set is divided, the numbers of each category label and each style are kept consistent between the resulting training and validation sets. In this embodiment the training set and validation set are in a 9 to 1 ratio: the training set contains 2700 handwritten digit samples, 270 per category and 540 per style; likewise, the 300 validation set samples contain 30 per category and 60 per style. Dividing the original data set this way is more robust.
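The stratified 9:1 split described above could be sketched as follows, assuming each sample carries a digit label and an integer writer-style id (both the helper name and the per-writer style id are assumptions):

```python
import numpy as np

def stratified_split(labels, styles, val_frac=0.1, seed=0):
    """Split sample indices so that every (digit class, writer style)
    pair appears in the same proportion in the training and validation
    sets, as step 4 requires."""
    rng = np.random.default_rng(seed)
    train, val = [], []
    for lab in np.unique(labels):
        for sty in np.unique(styles):
            idx = rng.permutation(np.where((labels == lab) & (styles == sty))[0])
            n_val = int(round(len(idx) * val_frac))
            val.extend(idx[:n_val])
            train.extend(idx[n_val:])
    return np.array(train), np.array(val)
```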
Step 5, generating a target detection data set. The specific method is as follows:
first the parameters of the target detection data set are determined. The parameters include: and generating the size and the total number of samples of the target detection data set, the scaling range of the handwritten numbers in each sample, the change range of the rotation angle, the number range and the minimum distance of the handwritten numbers.
In this embodiment, 3000 training set samples of 200 × 200 pixels are generated from the 2700 original training set samples, and similarly 1000 validation set samples are generated from the 300 original validation set samples. In each sample: the number of digits is in the range (4, 6), the rotation angle is in the range (−15°, 15°), the scaling of the handwritten digits is in the range (1.5, 3), and the minimum distance between center points is 36 pixels. FIG. 3 is a generated example, and FIG. 4 marks the coordinates and category information of the digits.
A single target detection training set sample and its ground truth (the supervision data for machine learning) are generated as follows; FIG. 5 is a flow chart of the operation.
(a) Generating the handwritten digits in the sample. The number N of handwritten digits in the sample is randomly generated from the established range (4, 6), and N handwritten digits X = {X_1, …, X_n, …, X_N}, where X_n ∈ R^(28×28), are randomly extracted from the 2700 original training set samples, together with the corresponding category labels L = {L_1, …, L_n, …, L_N}, where L_n ∈ R^10. A scaling factor and a rotation angle are then randomly generated for each handwritten digit from the established ranges. Each digit is first rotated and then scaled. The rotation center is the center of the original sample, and the scaling formulas are row = m_1 × 28 and col = m_2 × 28, where row and col are the rows and columns of the scaled digit and m_1, m_2 are scaling factors. The length and width are scaled in the same proportion, i.e. m_1 = m_2, with m_(1,2) ∈ (1.5, 3). It is also possible to scale the length and width inconsistently, i.e. without requiring m_1 = m_2, but this distorts the digits noticeably and is not used in this embodiment. The result of rotating and scaling X is recorded as Y = {Y_1, …, Y_n, …, Y_N}, where Y_n ∈ R^(row×col).
(b) Calculating the coordinates of the rectangular box enclosing the handwritten digit in Y_n. In this embodiment, for a given Y_n, the handwritten digit does not occupy the full row × col pixel sample (the original training sample after rotation and scaling), so the rectangular box cannot simply be set to the full row × col extent; extra calculation is needed to remove the influence of the background around the digit. The formulas for the coordinates of the upper-left and lower-right corners of the rectangular box enclosing the digit in Y_n are:
xl_n = min_x[argwhere(Y_n, t)] − g (4)
xr_n = max_x[argwhere(Y_n, t)] + g (5)
yl_n = min_y[argwhere(Y_n, t)] − g (6)
yr_n = max_y[argwhere(Y_n, t)] + g (7)
where (xl_n, yl_n) and (xr_n, yr_n) are respectively the coordinates of the upper-left and lower-right corners of the rectangular box enclosing the digit in Y_n, argwhere(Y_n, t) extracts the coordinates of pixels in Y_n whose value exceeds the threshold t, and min_x, max_x, min_y, max_y take the minimum or maximum coordinate in the x or y direction respectively. A constant term g is included in the formulas to ensure a certain gap between the rectangular box and the handwritten digit. In this embodiment t is set to 100 and g to 3.
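Equations (4)-(7) amount to a thresholded bounding box with margin g; a NumPy sketch (the helper name `digit_bbox` is an assumption):

```python
import numpy as np

def digit_bbox(y: np.ndarray, t: float = 100.0, g: int = 3):
    """Bounding box of the digit in Y_n per equations (4)-(7): the
    extreme coordinates of pixels above threshold t, padded by g."""
    pts = np.argwhere(y > t)                  # (row, col) pairs
    (r0, c0), (r1, c1) = pts.min(0), pts.max(0)
    # Return (xl, yl, xr, yr), with x indexing columns and y rows.
    return int(c0) - g, int(r0) - g, int(c1) + g, int(r1) + g
```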
(c) Generating the coordinates of the handwritten digits in the sample. In this embodiment the sample size is 200 × 200, and the width and length of the rectangular box enclosing digit n are BW_n = xr_n − xl_n and BL_n = yr_n − yl_n. The upper-left corner coordinates of the digits are (x, y) = {(x_1, y_1), …, (x_n, y_n), …, (x_N, y_N)}, where x_n ∈ (0, 200 − BW_n) and y_n ∈ (0, 200 − BL_n). The minimum distance between handwritten digits is at least 36 pixels, the distance being the Euclidean distance between the digits' center points. The operation of calculating (x, y) is as follows:
(x_n, y_n), the n-th coordinate of (x, y), is randomly sampled from its value range and judged against the following formula:
√[(x_n + BW_n/2 − x_m − BW_m/2)² + (y_n + BL_n/2 − y_m − BL_m/2)²] ≥ 36, for all m < n (8)
that is, the Euclidean distance between the center points of digits n and m must be at least 36 pixels. If the result is false, (x_n, y_n) is resampled from its value range and formula (8) is judged again; if true, the next coordinate (x_(n+1), y_(n+1)) is sampled and judged in the same way, until all coordinates are generated. A maximum number of iterations, set to 10000 in this embodiment, can be imposed to guard against unexpected situations.
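The resampling procedure of this step (minimum 36-pixel center distance, 10000-iteration cap) might be sketched as follows; the function and parameter names are assumptions:

```python
import numpy as np

def place_digits(widths, heights, canvas=200, min_dist=36.0,
                 max_iter=10000, seed=0):
    """Sample a top-left corner (x_n, y_n) for each digit so that every
    pair of digit center points is at least min_dist apart, resampling a
    coordinate up to max_iter times as in step 5(c)."""
    rng = np.random.default_rng(seed)
    placed = []          # list of ((x, y), (w, h))
    for w, h in zip(widths, heights):
        for _ in range(max_iter):
            x = int(rng.integers(0, canvas - w + 1))
            y = int(rng.integers(0, canvas - h + 1))
            cx, cy = x + w / 2, y + h / 2
            if all(np.hypot(cx - (px + pw / 2), cy - (py + ph / 2)) >= min_dist
                   for (px, py), (pw, ph) in placed):
                placed.append(((x, y), (w, h)))
                break
        else:
            raise RuntimeError("exceeded max_iter without a valid placement")
    return [corner for corner, _ in placed]
```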
(d) Generating a sample of the target detection data set. First a 200 × 200 background P is generated, and each handwritten digit Y_n is added pixel by pixel with (x_n, y_n) as its upper-left corner. The specific operation is as follows: for a single handwritten digit Y_n, the pixels of the background P between coordinates (x_n, y_n) and (x_n + BW_n, y_n + BL_n) are added to the corresponding pixels of Y_n. The result after this operation has been completed for all N handwritten digits is recorded as D ∈ R^(200×200), i.e. one sample of the target detection data set.
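Step 5(d) can be sketched as pixel-wise addition onto a background array (a sketch assuming an all-zero black background, integer corners, and the hypothetical helper name `compose_sample`):

```python
import numpy as np

def compose_sample(digits, corners, canvas=200):
    """Build one detection sample: start from a zero background P and add
    each rotated/scaled digit pixel by pixel at its top-left corner
    (x, y), with x indexing columns and y rows."""
    img = np.zeros((canvas, canvas))
    for d, (x, y) in zip(digits, corners):
        h, w = d.shape
        img[y:y + h, x:x + w] += d
    return np.clip(img, 0, 255)   # keep valid 8-bit intensities
```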
(e) Generating the ground truth (the machine learning supervision data) of the target detection data set. The ground truth consists of the handwritten digit category labels L and the rectangular boxes enclosing all handwritten digits in the sample. Denote all rectangular boxes in one sample as B = {B_1, …, B_n, …, B_N}, where B_n = (x_n + xl_n, y_n + yl_n, x_n + xr_n, y_n + yr_n); B_n contains the coordinates of the box's upper-left corner (x_n + xl_n, y_n + yl_n) and lower-right corner (x_n + xr_n, y_n + yr_n) within the sample. The combined ground truth is recorded as LB = {L, B}. As shown in FIG. 6, each line of data output by the code represents the ground truth of one sample: in order, the file name of the sample, the coordinates of the upper-left and lower-right corners of the rectangular box, and the handwritten digit category.
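The per-line ground-truth layout of FIG. 6 (file name, box corners, category) could be serialised as below; the exact field separator is an assumption:

```python
def ground_truth_lines(filename, boxes, labels):
    """One ground-truth line per digit: sample file name, upper-left and
    lower-right box corners, then the digit category, following the
    FIG. 6 layout."""
    return [f"{filename} {x0} {y0} {x1} {y1} {lab}"
            for (x0, y0, x1, y1), lab in zip(boxes, labels)]
```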
This completes one sample D of the target detection data set and its corresponding ground truth LB. The above operations of this step are repeated until 3000 training set samples and their ground truths are generated. For the validation set, the extraction of X and L in operation (a) is simply changed to draw from the 300 original validation set samples, and the remaining operations are repeated until 1000 validation set samples and their ground truths are generated.
Step 6, constructing a Faster RCNN neural network and testing the effectiveness of the data set.
The Faster RCNN neural network is a fully convolutional network (FCN) that outputs the category and spatial information of detected objects and can complete the tasks of target detection. The effectiveness of the target detection data set is tested indirectly by constructing a Faster RCNN neural network, fitting the data set, and evaluating the network's performance with the mAP (mean Average Precision) index.
The network hyperparameters are set as follows: 20 epochs and a learning rate of 0.0001. FIG. 7 is an example diagram of the Faster RCNN neural network. FIG. 8 shows the indexes of the Faster RCNN network under two hyperparameter settings, together with the AP for each digit category and the mAP under one of them.
From the test results, the Faster RCNN network achieves a good mAP, and some digit categories are distinguished with high accuracy. In general, this shows that the data set generated with this invention is suitable for the training and testing of neural networks.
Claims (7)
1. A general method of obtaining a target detection dataset, comprising the steps of:
step 1, acquiring a paper sample data set and collecting it into a computer; A4 printing paper of uniform specification bearing auxiliary writing standard marks is collected into the computer through a laser scanner, and the type of digits on each sheet of A4 paper is kept consistent;
step 2, preprocessing the acquired digital image of the handwritten form and generating an original data set;
the preprocessing comprises image correction, graying digital images, image denoising and digital normalization processing, and cutting images obtained by scanning each piece of A4 printing paper by using an auxiliary writing standard mark, so that each cut image only contains one handwritten figure, and an original data set is sequentially generated from left to right and from top to bottom;
step 3, creating a corresponding category label; due to the classification problem involved in target detection, digital category labels are created according to the arrangement sequence of numbers in an image obtained by scanning each piece of A4 printing paper and are stored as onedimensional vectors; the numerical arrangement sequence is consistent with the generation sequence of the original data set in the step 2;
step 4, dividing a data set; if correlation factors exist among the handwritten number fonts, the correlation factors are considered in the division of a training set and a verification set, and the difference between the training set and the verification set is ensured to be as small as possible;
step 5, generating a target detection data set;
setting parameters of each single handwritten digit, such as scale and rotation angle, and adding constraints on the distance between handwritten digits and on the number of digits in each generated image, to increase the robustness and usefulness of the data set;
step 6, constructing a Faster RCNN neural network and testing the effectiveness of a data set;
the Faster RCNN neural network is a fully convolutional network (FCN) that outputs the category and spatial information of detected objects and can complete the tasks of target detection; the effectiveness of the target detection data set is tested indirectly by constructing a Faster RCNN neural network, fitting the data set, and evaluating the network's performance with the mAP (mean Average Precision) index.
2. The method of claim 1, wherein the step 1 of obtaining a paper sample data set and collecting the paper sample data set into a computer comprises the following specific operations:
uniform A4 printing paper bearing auxiliary writing standard marks is adopted, the marks being a grid of squares printed on the A4 paper to facilitate segmentation and collection by the computer; the sample data set is collected through a laser scanner; a total of 3000 handwritten digits are collected, written evenly by 5 persons, 600 digits per person, with each digit category in the same proportion; 10 sheets of writing paper are used, each holding 20 × 15 digits.
3. The method of claim 2, wherein the step 2 preprocesses the captured digital handwritten image by performing the following operations:
(a) graying the image; according to the human eye's different sensitivity to colors, the pixel values of the generated grayscale image are calculated as:
Gray(i, j) = 0.299 × R(i, j) + 0.587 × G(i, j) + 0.114 × B(i, j) (1)
wherein Gray is the target grayscale image matrix, and R, G, B respectively correspond to the three channel matrices of the input handwritten digit image; i and j respectively correspond to the rows and columns of the target grayscale image matrix;
(b) denoising the image; the grayscaled handwritten digit image is denoised using image filtering techniques;
(c) inverting the image; the handwritten digit image is inverted according to:
Neg(i, j) = 255 − Gray(i, j) (2)
wherein Neg is the image matrix obtained after inversion;
(d) unifying the image size; each sheet of writing paper holds 20 × 15 digits and the size of a single digit region is set to 28 × 28, so the unified image size is 560 × 420 pixels;
(e) cutting the processed handwritten digit image so that each output image contains exactly one handwritten digit; because of the auxiliary writing standard marks, namely the spacing lines between digits, the outermost k rings of pixel values in each extracted 28 × 28 tile are set to zero, the specific number of rings being selectable manually according to the actual situation, with a reference range of 1 to 3 rings; the calculation formula is:
Cropped(i, j) = 0 if i < k or i > 27 − k or j < k or j > 27 − k, otherwise Cropped(i, j) = Neg(i, j) (3)
wherein Cropped is a single cut handwritten digit image, i = {0, 1, …, 27} and j = {0, 1, …, 27} respectively represent rows and columns, and k takes the value 2; when each sheet of A4 paper is scanned into an image with the aid of the auxiliary writing standard marks, the original data set is generated in order from left to right and top to bottom.
4. The general method for acquiring a target detection data set according to claim 3, wherein the step 3 creates the corresponding category label by the following specific operations:
because target detection involves a classification problem, digit category labels are created in the order in which the digits are arranged in the image scanned from each sheet of A4 printing paper, and stored as one-dimensional vectors; the category label is the number corresponding to the handwritten digit itself; this order is consistent with the generation order of the original data set in step 2; the category labels are one-hot encoded, which avoids the difficulty classifiers have with attribute data and, to a certain extent, also expands the features.
5. The method of claim 4, wherein the step 4 dividing the data set is performed by:
the training set and validation set are divided according to the correlation factor between handwritten digit fonts, ensuring the difference between the two sets is as small as possible; the correlation factor can be taken to be the style of the handwritten digits, since styles differ between people because of different handwriting habits and may also differ for the same person at different times; the 3000 handwritten digits were written evenly by 5 persons, so 5 different handwriting styles are considered; therefore, when the original data set is divided, the numbers of each category label and each style are kept consistent between the resulting training and validation sets; the training set and validation set are in a 9 to 1 ratio, the training set containing 2700 handwritten digit samples, 270 per category and 540 per style; likewise, the 300 validation set samples contain 30 per category and 60 per style.
6. The method of claim 5, wherein the step 5 generates the target detection data set by the following steps:
firstly, determining parameters of a target detection data set; the parameters include: generating the size and the total number of samples of the target detection data set, and the scaling range, the rotation angle change range, the number range and the minimum distance of handwritten numbers of the handwritten numbers in each sample;
3000 training set samples of 200 × 200 pixels are generated from the 2700 original training set samples, and similarly 1000 validation set samples are generated from the 300 original validation set samples; in each sample, the number of digits is in the range (4, 6), the rotation angle variation range is (−15°, 15°), the scaling range of the handwritten digits is (1.5, 3), and the minimum distance between center points is 36 pixels;
generating a single target detection training set sample and its ground truth (machine learning supervision data);
(a) generating the handwritten digits in the sample; the number N of handwritten digits in the sample is randomly generated from the established range (4, 6), and N handwritten digits X = {X_1, …, X_n, …, X_N}, wherein X_n ∈ R^(28×28), are randomly extracted from the 2700 original training set samples, together with the corresponding category labels L = {L_1, …, L_n, …, L_N}, wherein L_n ∈ R^10; a scaling factor and a rotation angle are then randomly generated for each digit from the established ranges; each digit is first rotated and then scaled; the rotation center is the center of the original sample, and the scaling formulas are row = m_1 × 28 and col = m_2 × 28, wherein row and col are the rows and columns of the scaled digit and m_1, m_2 are scaling factors; the length and width are scaled in the same proportion, i.e. m_1 = m_2, with m_(1,2) ∈ (1.5, 3); the result of rotating and scaling X is recorded as Y = {Y_1, …, Y_n, …, Y_N}, wherein Y_n ∈ R^(row×col);
(b) calculating the coordinates of the rectangular box enclosing the handwritten digit in Y_n; for a given Y_n, the handwritten digit does not occupy the full row × col sample (the original training sample after rotation and scaling), so the box size cannot simply be set to row × col; an extra computation is needed to remove the influence of the handwritten digit's background; the coordinates of the top-left and bottom-right corners of the rectangular box enclosing the handwritten digit in Y_n are computed as follows:
xl_n = min_x[argwhere(Y_n, t)] − g (4)
xr_n = max_x[argwhere(Y_n, t)] + g (5)
yl_n = min_y[argwhere(Y_n, t)] − g (6)
yr_n = max_y[argwhere(Y_n, t)] + g (7)
wherein (xl_n, yl_n) and (xr_n, yr_n) are respectively the coordinates of the top-left and bottom-right corners of the rectangular box enclosing the handwritten digit in Y_n, argwhere(Y_n, t) extracts the coordinates of pixels in Y_n whose value exceeds the threshold t, and min_x, max_x, min_y, max_y take the minimum or maximum of those coordinates in the x or y direction, respectively; the constant term g in the formulas ensures a small gap between the rectangular box and the handwritten digit it encloses;
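Equations (4)–(7) map directly onto NumPy's `argwhere`; the sketch below is illustrative, and the convention that x indexes columns and y indexes rows is an assumption the claim leaves open:

```python
import numpy as np

def bounding_box(Y_n, t=0.0, g=1):
    """Box enclosing pixels brighter than threshold t, padded by margin g,
    per eqs. (4)-(7); returns (xl, yl, xr, yr) in Y_n's own coordinates."""
    coords = np.argwhere(Y_n > t)        # each row is (row_idx, col_idx) of a foreground pixel
    yl, xl = coords.min(axis=0) - g      # eqs. (4) and (6): minima minus margin g
    yr, xr = coords.max(axis=0) + g      # eqs. (5) and (7): maxima plus margin g
    return xl, yl, xr, yr
```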
(c) generating the coordinates of the handwritten digits in the sample; the sample size is 200 × 200, and the width and length of the rectangular box enclosing a handwritten digit are BW_n = xr_n − xl_n and BL_n = yr_n − yl_n; the top-left coordinates of the handwritten digits are (x, y) = {(x_1, y_1), ..., (x_n, y_n), ..., (x_N, y_N)}, where x_n ∈ (0, 200 − BW_n) and y_n ∈ (0, 200 − BL_n); the handwritten digits must be at least 36 pixels apart, the distance being the Euclidean distance between their centre points; (x, y) is computed as follows:
(x_n, y_n), the n-th element of (x, y), is sampled randomly from its value range and tested against the following condition, which computes the Euclidean distance between the centre point of the n-th handwritten digit and that of every digit already placed:
√[(x_n + BW_n/2 − x_m − BW_m/2)² + (y_n + BL_n/2 − y_m − BL_m/2)²] ≥ 36, for all m < n (8)
if formula (8) is false, (x_n, y_n) is resampled from its value range and formula (8) is tested again; if formula (8) is true, the next coordinate (x_{n+1}, y_{n+1}) is sampled and tested in the same way, until all coordinates are generated; a maximum iteration count can be set to guard against non-termination;
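Step (c) is a rejection-sampling loop with an iteration cap. A minimal sketch, assuming the illustrative function name and that each digit is described by its box size (BW, BL):

```python
import numpy as np

def place_digits(box_sizes, canvas=200, min_dist=36, max_iter=1000, rng=None):
    """Sample a top-left (x, y) for each digit so that centre points are at
    least min_dist apart (Euclidean); resample on violation, with a cap."""
    rng = np.random.default_rng() if rng is None else rng
    centres, coords = [], []
    for bw, bl in box_sizes:
        for _ in range(max_iter):
            x = rng.uniform(0, canvas - bw)          # x_n in (0, 200 - BW_n)
            y = rng.uniform(0, canvas - bl)          # y_n in (0, 200 - BL_n)
            c = np.array([x + bw / 2.0, y + bl / 2.0])   # centre point
            if all(np.linalg.norm(c - p) >= min_dist for p in centres):
                centres.append(c)
                coords.append((x, y))
                break                                 # formula (8) satisfied
        else:
            raise RuntimeError("placement failed; try fewer or smaller digits")
    return coords
```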
(d) generating a sample of the target detection data set; first generate a 200 × 200 background P with all pixel values zero, then add the handwritten digits Y one by one, taking the coordinates (x, y) as their top-left corners; the specific operation is: for a single handwritten digit Y_n, the pixels of the background P between coordinates (x_n, y_n) and (x_n + BW_n, y_n + BL_n) are added to the corresponding pixels of Y_n; after all N handwritten digits have been processed, the result, denoted D ∈ R^(200×200), is a sample of the target detection data set;
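Step (d) is pixel-wise addition into a zero canvas; the sketch below is illustrative, and treating x as the column index and y as the row index is an assumption:

```python
import numpy as np

def compose_sample(crops_with_coords, canvas=200):
    """Add each cropped digit into a zero background P at its (x, y)
    top-left corner, per step (d); overlapping pixels simply sum."""
    P = np.zeros((canvas, canvas), dtype=float)
    for crop, (x_n, y_n) in crops_with_coords:
        BL, BW = crop.shape                        # NumPy is row-major: rows first
        P[y_n:y_n + BL, x_n:x_n + BW] += crop      # pixel-wise addition
    return P
```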
(e) generating the ground truth of the target detection data set, i.e. the machine-learning supervision data; the ground truth consists of the handwritten-digit class labels L and the rectangular boxes enclosing all handwritten digits in the sample; denote all rectangular boxes in a sample as B = {B_1, ..., B_n, ..., B_N}, where B_n = (x_n + xl_n, y_n + yl_n, x_n + xr_n, y_n + yr_n); B_n contains the coordinates of the box's top-left corner (x_n + xl_n, y_n + yl_n) and bottom-right corner (x_n + xr_n, y_n + yr_n) within the sample; the combined ground truth is denoted LB = {L, B}; each line of data output by the code represents the ground truth of one sample, listing in order the sample's file name, the top-left and bottom-right coordinates of a rectangular box, and the handwritten-digit class;
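Assembling LB per step (e) amounts to shifting each local box by its digit's placement and emitting one record per digit; the function name, field order, and space separator below are assumptions, as the claim does not fix an output format:

```python
def ground_truth_lines(fname, coords, boxes, labels):
    """One line per digit: file name, top-left and bottom-right corners of
    the shifted box B_n = (x_n+xl_n, y_n+yl_n, x_n+xr_n, y_n+yr_n), class."""
    lines = []
    for (x_n, y_n), (xl, yl, xr, yr), lab in zip(coords, boxes, labels):
        B_n = (x_n + xl, y_n + yl, x_n + xr, y_n + yr)   # box in sample coordinates
        lines.append(f"{fname} {B_n[0]} {B_n[1]} {B_n[2]} {B_n[3]} {lab}")
    return lines
```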
repeating the above operations of this step until 3000 training-set samples and their corresponding ground truths are generated, wherein a sample D and its corresponding ground truth LB together form one entry of the target detection data set; to generate the validation set, only the extraction of X and L in operation (a) needs to be changed to the 300 original validation-set samples, after which the remaining operations are repeated until 1000 validation-set samples and their corresponding ground truths are generated.
7. The universal method for acquiring a target detection data set according to claim 6, wherein in step 6 a Faster R-CNN neural network is constructed and the validity of the data set is tested, with the following specific operations:
the Faster R-CNN neural network is a fully convolutional neural network that outputs the class and spatial information of detected objects and can carry out target-detection tasks; a Faster R-CNN network is constructed to fit the data set, and the effectiveness of the target detection data set is evaluated indirectly through network performance, using the mAP (mean Average Precision) metric;
the network training parameters are set as follows: 20 epochs and a learning rate of 0.0001.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN202010355022.5A CN111612045B (en)  20200429  20200429  Universal method for acquiring target detection data set 
Applications Claiming Priority (1)
Application Number  Priority Date  Filing Date  Title 

CN202010355022.5A CN111612045B (en)  20200429  20200429  Universal method for acquiring target detection data set 
Publications (2)
Publication Number  Publication Date 

CN111612045A true CN111612045A (en)  20200901 
CN111612045B CN111612045B (en)  20230623 
Family
ID=72198382
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN202010355022.5A Active CN111612045B (en)  20200429  20200429  Universal method for acquiring target detection data set 
Country Status (1)
Country  Link 

CN (1)  CN111612045B (en) 
Cited By (2)
Publication number  Priority date  Publication date  Assignee  Title 

CN112396009A (en) *  20201124  20210223  广东国粒教育技术有限公司  Calculation question correcting method and device based on full convolution neural network model 
CN113920517A (en) *  20211011  20220111  广东电网有限责任公司广州供电局  OCR recognition effect evaluation method and device 
Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

CN109977958A (en) * 20190325 20190705 中国科学技术大学 Offline handwritten mathematical formula recognition and reconstruction method 
CN110399845A (en) * 20190729 20191101 上海海事大学 Method for detecting and recognizing continuous paragraphs of text in an image 
EP3591582A1 (en) *  20180706  20200108  Tata Consultancy Services Limited  Method and system for automatic object annotation using deep network 

Patent Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

EP3591582A1 (en) *  20180706  20200108  Tata Consultancy Services Limited  Method and system for automatic object annotation using deep network 
CN109977958A (en) * 20190325 20190705 中国科学技术大学 Offline handwritten mathematical formula recognition and reconstruction method 
CN110399845A (en) * 20190729 20191101 上海海事大学 Method for detecting and recognizing continuous paragraphs of text in an image 
NonPatent Citations (2)
Title 

YANG J等: "Handwriting text recognition based on faster RCNN" * 
CHEN Ying et al.: "Research on digit recognition algorithms for smart electric energy meters" * 
Also Published As
Publication number  Publication date 

CN111612045B (en)  20230623 
Similar Documents
Publication  Publication Date  Title 

CN111814722B (en)  Method and device for identifying table in image, electronic equipment and storage medium  
CN111325203B (en)  American license plate recognition method and system based on image correction  
CN110503054B (en)  Text image processing method and device  
WO2017016240A1 (en)  Banknote serial number identification method  
CN106778586A (en)  Offline handwriting signature verification method and system  
CN104715256A (en)  Auxiliary calligraphy exercising system and evaluation method based on image method  
CN105046200B (en)  Electronic paper marking method based on straight line detection  
CN111091124B (en)  Spine character recognition method  
CN112307919B (en)  Improved YOLOv3-based digital information area recognition method in document images  
CN115457565A (en)  OCR character recognition method, electronic equipment and storage medium  
CN105117741A (en)  Recognition method of calligraphy character style  
CN102737240B (en)  Method of analyzing digital document images  
CN112712273A (en)  Handwritten Chinese character beauty evaluation method based on skeleton similarity  
CN112241730A (en)  Form extraction method and system based on machine learning  
JP3228938B2 (en)  Image classification method and apparatus using distribution map  
CN108052936B (en)  Automatic inclination correction method and system for Braille image  
CN111612045B (en)  Universal method for acquiring target detection data set  
CN110222660B (en)  Signature authentication method and system based on dynamic and static feature fusion  
CN116824608A (en)  Answer sheet layout analysis method based on target detection technology  
JP3428494B2 (en)  Character recognition device, its character recognition method, and recording medium storing its control program  
CN108062548B (en)  Braille square selfadaptive positioning method and system  
CN114241486A (en)  Method for improving accuracy rate of identifying student information of test paper  
CN109740618B (en)  Test paper score automatic statistical method and device based on FHOG characteristics  
CN112633116A (en)  Method for intelligently analyzing PDF (Portable document Format) imagetext  
CN111325194B (en)  Character recognition method, device and equipment and storage medium 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant 