CN111612045A: Universal method for acquiring a target detection data set
Publication number: CN111612045A (application CN202010355022.5A)
Authority: CN (China)
Prior art keywords: handwritten, data set, sample, image, target detection
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
G06V30/244 Division of the character sequences into groups prior to recognition; selection of dictionaries using graphical properties, e.g. alphabet type or font
G06N3/045 Neural networks; combinations of networks
G06N3/08 Neural networks; learning methods
G06V30/10 Character recognition
Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention provides a general method for obtaining a target detection data set. The invention uses Optical Character Recognition (OCR) technology: characters printed on paper are captured with a laser scanner and the character shapes are translated into computer characters by a recognition method, yielding a data set similar to the public MNIST handwritten digit set. A target detection data set is then obtained by sampling from this data set, and the annotation information can be generated directly by an algorithm. The method acquires a target detection data set quickly, alleviates the problem of insufficient data sets to a certain extent, and the data set generated by the method can also be used as a means of testing network performance.
Description
Technical Field
The invention relates to the field of machine learning data set generation, in particular to a general method for acquiring a target detection data set.
Background
With the progress of science and technology, deep learning and computer vision have developed rapidly. Target detection is an important branch of computer vision; its tasks are mainly to find the position of a target object in an image, judge the category of the object in the target box, and draw all predicted bounding boxes on the image. Thanks to recent breakthroughs in hardware computing power, deep learning techniques have advanced quickly and continue to improve the field of computer vision. Target detection algorithms based on deep learning currently fall into two main categories: one is the RCNN series, such as Fast RCNN and Faster RCNN, which take a two-stage approach; the other is YOLO, SSD and other end-to-end one-stage approaches.
Compared with traditional methods, performing target detection with deep learning reduces false detections and missed detections, adapts better to complex scenes, and greatly improves detection speed. However, if the target detection model to be trained lacks an existing data set, the conventional practice is to manually mark the position of the target object in each image and label its category using annotation software such as LabelImg and similar tools. For the large number of images in a data set, making and processing the data set this way is a tedious and time-consuming task. Besides manual labeling, various library functions can be used. For example, OpenCV threshold filtering can obtain the target position by extracting the portion of the image within a specified range of gray values, but this method places high demands on the image: if the target object is similar in color to the background, or the background is complex, it is difficult to extract the object.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to acquire a target detection data set, the conventional approach uses software such as LabelImg to manually mark the position of the target object in each image and label its category, which consumes a great deal of effort and time.
In view of the above, the present invention provides a general method for acquiring a target detection data set. The invention uses Optical Character Recognition (OCR) technology: characters printed on paper are captured with a laser scanner, the character shapes are translated into computer characters by a recognition method, and a data set similar to the public MNIST handwritten digit set is obtained. A target detection data set is then obtained by sampling from this data set, and the annotation information can be generated directly by an algorithm. The method comprises the following steps:
Step 1, acquiring a paper sample data set and collecting it into a computer. A4 printing paper of uniform specification bearing auxiliary writing standard marks is collected into the computer through a laser scanner, and the type of digits on each sheet of A4 paper is kept consistent.
Step 2, preprocessing the acquired handwritten digit images and generating an original data set.
The preprocessing comprises image correction, graying, image denoising and digit normalization; the image scanned from each sheet of A4 printing paper is cut along the auxiliary writing standard marks so that each cut image contains exactly one handwritten digit, and the original data set is generated in order from left to right and top to bottom.
Step 3, creating corresponding category labels. Because target detection involves a classification problem, digit category labels are created in the order in which the digits are arranged in the image scanned from each sheet of A4 printing paper, and stored as a one-dimensional vector. This order is consistent with the generation order of the original data set in step 2.
Step 4, dividing the data set. If correlation factors exist between the handwritten digit fonts, they are taken into account when dividing the training and validation sets, ensuring the difference between the two sets is as small as possible.
Step 5, generating a target detection data set.
Parameters of each single handwritten digit, such as scale and rotation angle, are set, and constraints on the distance between handwritten digits and on the number of digits in each generated image are added, improving the robustness and usefulness of the data set.
Step 6, constructing a Faster RCNN neural network and testing the effectiveness of the data set.
The Faster RCNN neural network is a fully convolutional network (FCN) that outputs the category and spatial information of detected objects, and can therefore complete the tasks of target detection. The effectiveness of the target detection data set is tested indirectly by constructing a Faster RCNN neural network, fitting the data set, and evaluating the network's performance with the mAP (mean Average Precision) index.
The beneficial effects are:
the method acquires a target detection data set quickly, alleviates the problem of insufficient data sets to a certain extent, and the data set generated by the method can also be used as a means of testing network performance.
Drawings
FIG. 1 is a sample sheet of the paper data set;
FIG. 2 is an example of a cut 28 x 28 handwritten number font image;
FIG. 3 is an example of a 200 x 200 target data set generated;
FIG. 4 is an example of a generated 200 x 200 target data set sample with ground truth bounding boxes;
FIG. 5 is a flowchart of an algorithm for generating a single sample;
FIG. 6 is a diagram showing an example of the storage format of the generated ground truth;
FIG. 7 is a diagram showing an example of the structure of the Faster RCNN neural network in this example;
FIG. 8 shows the Faster RCNN parameter configurations and the test indexes in this example.
Detailed description of the invention
The present invention will be described in further detail below with reference to the accompanying drawings and examples.
FIG. 1 is a sample sheet of the paper data set.
Step 2, preprocessing the acquired handwritten digit images. The specific operations are as follows:
(a) Graying the image. According to the human eye's different sensitivity to colors, the pixel values of the generated grayscale image are calculated as:
Gray(i, j) = 0.299 × R(i, j) + 0.587 × G(i, j) + 0.114 × B(i, j) (1)
where Gray is the target grayscale image matrix and R, G, B correspond to the three channel matrices of the input handwritten digit image; i and j correspond to the rows and columns of the target grayscale image matrix.
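As an illustration, equation (1) can be applied channel-wise with NumPy (a sketch; the function name `to_gray` and the use of NumPy are assumptions, not part of the patent):

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to a grayscale matrix using the
    luminance weights of equation (1)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

# The weights sum to 1, so a pure-white pixel stays at 255.
```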
(b) Image denoising. The grayscaled handwritten digit image is denoised using image filtering techniques.
(c) Inverting the image. The handwritten digit image is inverted according to:
Neg(i, j) = 255 − Gray(i, j) (2)
where Neg is the image matrix obtained after inversion.
(d) Unifying the image size. Each sheet of writing paper used in this embodiment holds 20 × 15 digits, and the size of a single digit region is set to 28 × 28, so the unified image size is 560 × 420 pixels.
(e) Cutting the processed handwritten digit image so that each output image contains exactly one handwritten digit. Because of the auxiliary writing standard marks, namely the spacing lines between digits, the outermost k rings of pixel values in each extracted 28 × 28 tile are set to zero; the specific number of rings can be chosen manually according to the actual situation, with a reference range of 1 to 3 rings, resulting in images like the pattern of FIG. 2. The calculation formula is:
Cropped(i, j) = 0 if i < k or i > 27 − k or j < k or j > 27 − k, otherwise Cropped(i, j) = Neg(i, j) (3)
where Cropped is a single cut handwritten digit image, i = 0, 1, …, 27 and j = 0, 1, …, 27 represent rows and columns respectively, and k is 2 in this example. When each sheet of A4 paper is scanned into an image with the aid of the auxiliary writing standard marks, the original data set is generated in order from left to right and top to bottom.
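The cutting of step (e) might be sketched as follows, assuming the sheet image is a NumPy array whose height and width are multiples of 28 (the helper name `crop_digits` is hypothetical):

```python
import numpy as np

def crop_digits(sheet: np.ndarray, k: int = 2, size: int = 28):
    """Cut a sheet image into size x size digit tiles, left-to-right then
    top-to-bottom, zeroing the outermost k pixel rings of each tile to
    erase the grid lines, as in step 2(e)."""
    tiles = []
    rows, cols = sheet.shape[0] // size, sheet.shape[1] // size
    for i in range(rows):
        for j in range(cols):
            tile = sheet[i * size:(i + 1) * size, j * size:(j + 1) * size].copy()
            tile[:k, :] = 0; tile[-k:, :] = 0   # zero top/bottom rings
            tile[:, :k] = 0; tile[:, -k:] = 0   # zero left/right rings
            tiles.append(tile)
    return tiles
```

For a full 560 × 420 sheet this yields 15 × 20 = 300 tiles in the left-to-right, top-to-bottom order used to build the original data set.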
Step 3, creating corresponding category labels. Because target detection involves a classification problem, digit category labels are created in the order in which the digits are arranged in the image scanned from each sheet of A4 printing paper, and stored as a one-dimensional vector. In this embodiment the category label is the number corresponding to the handwritten digit itself. This order is consistent with the generation order of the original data set in step 2. The category labels are one-hot encoded, which avoids the difficulty classifiers have with attribute data and, to a certain extent, also expands the features.
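A minimal one-hot encoding of the digit labels, matching the R^10 label vectors used here (the function name is an assumption):

```python
import numpy as np

def one_hot(labels: np.ndarray, num_classes: int = 10) -> np.ndarray:
    """Encode a 1-D vector of digit labels (0-9) as one-hot rows, one
    R^10 vector per handwritten digit."""
    out = np.zeros((labels.shape[0], num_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out
```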
Step 4, dividing the data set. The training set and validation set are divided according to the correlation factor between handwritten digit fonts, ensuring the difference between the two sets is as small as possible. The correlation factor can be taken to be the style of the handwritten digits: styles differ between people because of different handwriting habits, and may also differ for the same person at different times. Since the 3000 handwritten digits in this embodiment were written evenly by 5 persons, 5 different handwriting styles are considered. Therefore, when the original data set is divided, the numbers of each category label and each style are kept consistent between the resulting training and validation sets. In this embodiment the training set and validation set are in a 9 to 1 ratio: the training set contains 2700 handwritten digit samples, 270 per category and 540 per style; likewise, the 300 validation set samples contain 30 per category and 60 per style. Dividing the original data set this way is more robust.
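The stratified 9:1 split described above could be sketched as follows, assuming each sample carries a digit label and an integer writer-style id (both the helper name and the per-writer style id are assumptions):

```python
import numpy as np

def stratified_split(labels, styles, val_frac=0.1, seed=0):
    """Split sample indices so that every (digit class, writer style)
    pair appears in the same proportion in the training and validation
    sets, as step 4 requires."""
    rng = np.random.default_rng(seed)
    train, val = [], []
    for lab in np.unique(labels):
        for sty in np.unique(styles):
            idx = rng.permutation(np.where((labels == lab) & (styles == sty))[0])
            n_val = int(round(len(idx) * val_frac))
            val.extend(idx[:n_val])
            train.extend(idx[n_val:])
    return np.array(train), np.array(val)
```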
Step 5, generating a target detection data set. The specific method is as follows:
first the parameters of the target detection data set are determined. The parameters include: and generating the size and the total number of samples of the target detection data set, the scaling range of the handwritten numbers in each sample, the change range of the rotation angle, the number range and the minimum distance of the handwritten numbers.
In this embodiment, 3000 training set samples of 200 × 200 pixels are generated from the 2700 original training set samples, and similarly 1000 validation set samples are generated from the 300 original validation set samples. In each sample: the number of digits is in the range (4, 6), the rotation angle is in the range (−15°, 15°), the scaling of the handwritten digits is in the range (1.5, 3), and the minimum distance between center points is 36 pixels. FIG. 3 is a generated example, and FIG. 4 marks the coordinates and category information of the digits.
A single target detection training set sample and its ground truth (the supervision data for machine learning) are generated as follows; FIG. 5 is a flow chart of the operation.
(a) Generating the handwritten digits in the sample. The number N of handwritten digits in the sample is randomly generated from the established range (4, 6), and N handwritten digits X = {X_1, …, X_n, …, X_N}, where X_n ∈ R^(28×28), are randomly extracted from the 2700 original training set samples, together with the corresponding category labels L = {L_1, …, L_n, …, L_N}, where L_n ∈ R^10. A scaling factor and a rotation angle are then randomly generated for each handwritten digit from the established ranges. Each digit is first rotated and then scaled. The rotation center is the center of the original sample, and the scaling formulas are row = m_1 × 28 and col = m_2 × 28, where row and col are the rows and columns of the scaled digit and m_1, m_2 are scaling factors. The length and width are scaled in the same proportion, i.e. m_1 = m_2, with m_(1,2) ∈ (1.5, 3). It is also possible to scale the length and width inconsistently, i.e. without requiring m_1 = m_2, but this distorts the digits noticeably and is not used in this embodiment. The result of rotating and scaling X is recorded as Y = {Y_1, …, Y_n, …, Y_N}, where Y_n ∈ R^(row×col).
(b) Calculating the coordinates of the rectangular box enclosing the handwritten digit in Y_n. In this embodiment, for a given Y_n, the handwritten digit does not occupy the full row × col pixel sample (the original training sample after rotation and scaling), so the rectangular box cannot simply be set to the full row × col extent; extra calculation is needed to remove the influence of the background around the digit. The formulas for the coordinates of the upper-left and lower-right corners of the rectangular box enclosing the digit in Y_n are:
xl_n = min_x[argwhere(Y_n, t)] − g (4)
xr_n = max_x[argwhere(Y_n, t)] + g (5)
yl_n = min_y[argwhere(Y_n, t)] − g (6)
yr_n = max_y[argwhere(Y_n, t)] + g (7)
where (xl_n, yl_n) and (xr_n, yr_n) are respectively the coordinates of the upper-left and lower-right corners of the rectangular box enclosing the digit in Y_n, argwhere(Y_n, t) extracts the coordinates of pixels in Y_n whose value exceeds the threshold t, and min_x, max_x, min_y, max_y take the minimum or maximum coordinate in the x or y direction respectively. A constant term g is included in the formulas to ensure a certain gap between the rectangular box and the handwritten digit. In this embodiment t is set to 100 and g to 3.
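Equations (4)-(7) amount to a thresholded bounding box with margin g; a NumPy sketch (the helper name `digit_bbox` is an assumption):

```python
import numpy as np

def digit_bbox(y: np.ndarray, t: float = 100.0, g: int = 3):
    """Bounding box of the digit in Y_n per equations (4)-(7): the
    extreme coordinates of pixels above threshold t, padded by g."""
    pts = np.argwhere(y > t)                  # (row, col) pairs
    (r0, c0), (r1, c1) = pts.min(0), pts.max(0)
    # Return (xl, yl, xr, yr), with x indexing columns and y rows.
    return int(c0) - g, int(r0) - g, int(c1) + g, int(r1) + g
```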
(c) Generating the coordinates of the handwritten digits in the sample. In this embodiment the sample size is 200 × 200, and the width and length of the rectangular box enclosing digit n are BW_n = xr_n − xl_n and BL_n = yr_n − yl_n. The upper-left corner coordinates of the digits are (x, y) = {(x_1, y_1), …, (x_n, y_n), …, (x_N, y_N)}, where x_n ∈ (0, 200 − BW_n) and y_n ∈ (0, 200 − BL_n). The minimum distance between handwritten digits is at least 36 pixels, the distance being the Euclidean distance between the digits' center points. The operation of calculating (x, y) is as follows:
(x_n, y_n), the n-th coordinate of (x, y), is randomly sampled from its value range and judged against the following formula:
√[(x_n + BW_n/2 − x_m − BW_m/2)² + (y_n + BL_n/2 − y_m − BL_m/2)²] ≥ 36, for all m < n (8)
that is, the Euclidean distance between the center points of digits n and m must be at least 36 pixels. If the result is false, (x_n, y_n) is resampled from its value range and formula (8) is judged again; if true, the next coordinate (x_(n+1), y_(n+1)) is sampled and judged in the same way, until all coordinates are generated. A maximum number of iterations, set to 10000 in this embodiment, can be imposed to guard against unexpected situations.
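The resampling procedure of this step (minimum 36-pixel center distance, 10000-iteration cap) might be sketched as follows; the function and parameter names are assumptions:

```python
import numpy as np

def place_digits(widths, heights, canvas=200, min_dist=36.0,
                 max_iter=10000, seed=0):
    """Sample a top-left corner (x_n, y_n) for each digit so that every
    pair of digit center points is at least min_dist apart, resampling a
    coordinate up to max_iter times as in step 5(c)."""
    rng = np.random.default_rng(seed)
    placed = []          # list of ((x, y), (w, h))
    for w, h in zip(widths, heights):
        for _ in range(max_iter):
            x = int(rng.integers(0, canvas - w + 1))
            y = int(rng.integers(0, canvas - h + 1))
            cx, cy = x + w / 2, y + h / 2
            if all(np.hypot(cx - (px + pw / 2), cy - (py + ph / 2)) >= min_dist
                   for (px, py), (pw, ph) in placed):
                placed.append(((x, y), (w, h)))
                break
        else:
            raise RuntimeError("exceeded max_iter without a valid placement")
    return [corner for corner, _ in placed]
```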
(d) Generating a sample of the target detection data set. First a 200 × 200 background P is generated, and each handwritten digit Y_n is added pixel by pixel with (x_n, y_n) as its upper-left corner. The specific operation is as follows: for a single handwritten digit Y_n, the pixels of the background P between coordinates (x_n, y_n) and (x_n + BW_n, y_n + BL_n) are added to the corresponding pixels of Y_n. The result after this operation has been completed for all N handwritten digits is recorded as D ∈ R^(200×200), i.e. one sample of the target detection data set.
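Step 5(d) can be sketched as pixel-wise addition onto a background array (a sketch assuming an all-zero black background, integer corners, and the hypothetical helper name `compose_sample`):

```python
import numpy as np

def compose_sample(digits, corners, canvas=200):
    """Build one detection sample: start from a zero background P and add
    each rotated/scaled digit pixel by pixel at its top-left corner
    (x, y), with x indexing columns and y rows."""
    img = np.zeros((canvas, canvas))
    for d, (x, y) in zip(digits, corners):
        h, w = d.shape
        img[y:y + h, x:x + w] += d
    return np.clip(img, 0, 255)   # keep valid 8-bit intensities
```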
(e) Generating the ground truth (the machine learning supervision data) of the target detection data set. The ground truth consists of the handwritten digit category labels L and the rectangular boxes enclosing all handwritten digits in the sample. Denote all rectangular boxes in one sample as B = {B_1, …, B_n, …, B_N}, where B_n = (x_n + xl_n, y_n + yl_n, x_n + xr_n, y_n + yr_n); B_n contains the coordinates of the box's upper-left corner (x_n + xl_n, y_n + yl_n) and lower-right corner (x_n + xr_n, y_n + yr_n) within the sample. The combined ground truth is recorded as LB = {L, B}. As shown in FIG. 6, each line of data output by the code represents the ground truth of one sample: in order, the file name of the sample, the coordinates of the upper-left and lower-right corners of the rectangular box, and the handwritten digit category.
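The per-line ground-truth layout of FIG. 6 (file name, box corners, category) could be serialised as below; the exact field separator is an assumption:

```python
def ground_truth_lines(filename, boxes, labels):
    """One ground-truth line per digit: sample file name, upper-left and
    lower-right box corners, then the digit category, following the
    FIG. 6 layout."""
    return [f"{filename} {x0} {y0} {x1} {y1} {lab}"
            for (x0, y0, x1, y1), lab in zip(boxes, labels)]
```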
This completes one sample D of the target detection data set and its corresponding ground truth LB. The above operations of this step are repeated until 3000 training set samples and their ground truths are generated. For the validation set, the extraction of X and L in operation (a) is simply changed to draw from the 300 original validation set samples, and the remaining operations are repeated until 1000 validation set samples and their ground truths are generated.
Step 6, constructing a Faster RCNN neural network and testing the effectiveness of the data set.
The Faster RCNN neural network is a fully convolutional network (FCN) that outputs the category and spatial information of detected objects and can complete the tasks of target detection. The effectiveness of the target detection data set is tested indirectly by constructing a Faster RCNN neural network, fitting the data set, and evaluating the network's performance with the mAP (mean Average Precision) index.
The network hyperparameters are set as follows: 20 epochs and a learning rate of 0.0001. FIG. 7 is an example diagram of the Faster RCNN neural network. FIG. 8 shows the indexes of the Faster RCNN network under two hyperparameter settings, together with the AP for each digit category and the mAP under one of them.
From the test results, the Faster RCNN network achieves a good mAP, and some digit categories are distinguished with high accuracy. In general, this shows that the data set generated with this invention is suitable for the training and testing of neural networks.
Claims (7)
1. A general method of obtaining a target detection dataset, comprising the steps of:
step 1, acquiring a paper sample data set and collecting it into a computer; A4 printing paper of uniform specification bearing auxiliary writing standard marks is collected into the computer through a laser scanner, and the type of digits on each sheet of A4 paper is kept consistent;
step 2, preprocessing the acquired digital image of the handwritten form and generating an original data set;
the preprocessing comprises image correction, graying digital images, image denoising and digital normalization processing, and cutting images obtained by scanning each piece of A4 printing paper by using an auxiliary writing standard mark, so that each cut image only contains one handwritten figure, and an original data set is sequentially generated from left to right and from top to bottom;
step 3, creating a corresponding category label; due to the classification problem involved in target detection, digital category labels are created according to the arrangement sequence of numbers in an image obtained by scanning each piece of A4 printing paper and are stored as onedimensional vectors; the numerical arrangement sequence is consistent with the generation sequence of the original data set in the step 2;
step 4, dividing a data set; if correlation factors exist among the handwritten number fonts, the correlation factors are considered in the division of a training set and a verification set, and the difference between the training set and the verification set is ensured to be as small as possible;
step 5, generating a target detection data set;
setting parameters of each single handwritten digit, such as scale and rotation angle, and adding constraints on the distance between handwritten digits and on the number of digits in each generated image, to increase the robustness and usefulness of the data set;
step 6, constructing a Faster RCNN neural network and testing the effectiveness of a data set;
the Faster RCNN neural network is a fully convolutional network (FCN) that outputs the category and spatial information of detected objects and can complete the tasks of target detection; the effectiveness of the target detection data set is tested indirectly by constructing a Faster RCNN neural network, fitting the data set, and evaluating the network's performance with the mAP (mean Average Precision) index.
2. The method of claim 1, wherein the step 1 of obtaining a paper sample data set and collecting the paper sample data set into a computer comprises the following specific operations:
uniform A4 printing paper bearing auxiliary writing standard marks is adopted, the marks being a grid of squares printed on the A4 paper to facilitate segmentation and collection by the computer; the sample data set is collected through a laser scanner; a total of 3000 handwritten digits are collected, written evenly by 5 persons, 600 digits per person, with each digit category in the same proportion; 10 sheets of writing paper are used, each holding 20 × 15 digits.
3. The method of claim 2, wherein the step 2 preprocesses the captured digital handwritten image by performing the following operations:
(a) graying the image; according to the human eye's different sensitivity to colors, the pixel values of the generated grayscale image are calculated as:
Gray(i, j) = 0.299 × R(i, j) + 0.587 × G(i, j) + 0.114 × B(i, j) (1)
wherein Gray is the target grayscale image matrix, and R, G, B respectively correspond to the three channel matrices of the input handwritten digit image; i and j respectively correspond to the rows and columns of the target grayscale image matrix;
(b) denoising the image; the grayscaled handwritten digit image is denoised using image filtering techniques;
(c) inverting the image; the handwritten digit image is inverted according to:
Neg(i, j) = 255 − Gray(i, j) (2)
wherein Neg is the image matrix obtained after inversion;
(d) unifying the image size; each sheet of writing paper holds 20 × 15 digits and the size of a single digit region is set to 28 × 28, so the unified image size is 560 × 420 pixels;
(e) cutting the processed handwritten digit image so that each output image contains exactly one handwritten digit; because of the auxiliary writing standard marks, namely the spacing lines between digits, the outermost k rings of pixel values in each extracted 28 × 28 tile are set to zero, the specific number of rings being selectable manually according to the actual situation, with a reference range of 1 to 3 rings; the calculation formula is:
Cropped(i, j) = 0 if i < k or i > 27 − k or j < k or j > 27 − k, otherwise Cropped(i, j) = Neg(i, j) (3)
wherein Cropped is a single cut handwritten digit image, i = {0, 1, …, 27} and j = {0, 1, …, 27} respectively represent rows and columns, and k takes the value 2; when each sheet of A4 paper is scanned into an image with the aid of the auxiliary writing standard marks, the original data set is generated in order from left to right and top to bottom.
4. The general method for acquiring a target detection data set according to claim 3, wherein the step 3 creates the corresponding category label by the following specific operations:
because target detection involves a classification problem, digit category labels are created in the order in which the digits are arranged in the image scanned from each sheet of A4 printing paper, and stored as one-dimensional vectors; the category label is the number corresponding to the handwritten digit itself; this order is consistent with the generation order of the original data set in step 2; the category labels are one-hot encoded, which avoids the difficulty classifiers have with attribute data and, to a certain extent, also expands the features.
5. The method of claim 4, wherein the step 4 dividing the data set is performed by:
the training set and validation set are divided according to the correlation factor between handwritten digit fonts, ensuring the difference between the two sets is as small as possible; the correlation factor can be taken to be the style of the handwritten digits, since styles differ between people because of different handwriting habits and may also differ for the same person at different times; the 3000 handwritten digits were written evenly by 5 persons, so 5 different handwriting styles are considered; therefore, when the original data set is divided, the numbers of each category label and each style are kept consistent between the resulting training and validation sets; the training set and validation set are in a 9 to 1 ratio, the training set containing 2700 handwritten digit samples, 270 per category and 540 per style; likewise, the 300 validation set samples contain 30 per category and 60 per style.
6. The method of claim 5, wherein the step 5 generates the target detection data set by the following steps:
firstly, determining parameters of a target detection data set; the parameters include: generating the size and the total number of samples of the target detection data set, and the scaling range, the rotation angle change range, the number range and the minimum distance of handwritten numbers of the handwritten numbers in each sample;
3000 training set samples of 200 × 200 pixels are generated from the 2700 original training set samples, and similarly 1000 validation set samples are generated from the 300 original validation set samples; in each sample, the number of digits is in the range (4, 6), the rotation angle variation range is (−15°, 15°), the scaling range of the handwritten digits is (1.5, 3), and the minimum distance between center points is 36 pixels;
generating a single target detection training set sample and its ground truth (machine learning supervision data);
(a) generating the handwritten digits in the sample; the number N of handwritten digits in the sample is randomly generated from the established range (4, 6), and N handwritten digits X = {X_1, …, X_n, …, X_N}, wherein X_n ∈ R^(28×28), are randomly extracted from the 2700 original training set samples, together with the corresponding category labels L = {L_1, …, L_n, …, L_N}, wherein L_n ∈ R^10; a scaling factor and a rotation angle are then randomly generated for each digit from the established ranges; each digit is first rotated and then scaled; the rotation center is the center of the original sample, and the scaling formulas are row = m_1 × 28 and col = m_2 × 28, wherein row and col are the rows and columns of the scaled digit and m_1, m_2 are scaling factors; the length and width are scaled in the same proportion, i.e. m_1 = m_2, with m_(1,2) ∈ (1.5, 3); the result of rotating and scaling X is recorded as Y = {Y_1, …, Y_n, …, Y_N}, wherein Y_n ∈ R^(row×col);
(b) calculating the coordinates of the rectangular box enclosing the handwritten digit in Y_n; for a given Y_n, the handwritten digit does not occupy the full row × col sample (the original training sample after rotation and scaling), so the box size cannot simply be set to row × col; an extra computation is needed to remove the influence of the handwritten digit's background; the coordinates of the top-left and bottom-right corners of the rectangular box enclosing the handwritten digit in Y_n are computed as follows:
xl_n = min_x[argwhere(Y_n, t)] − g (4)
xr_n = max_x[argwhere(Y_n, t)] + g (5)
yl_n = min_y[argwhere(Y_n, t)] − g (6)
yr_n = max_y[argwhere(Y_n, t)] + g (7)
wherein (xl_n, yl_n) and (xr_n, yr_n) are respectively the coordinates of the top-left and bottom-right corners of the rectangular box enclosing the handwritten digit in Y_n, argwhere(Y_n, t) extracts the coordinates of pixels in Y_n whose value exceeds the threshold t, and min_x, max_x, min_y, max_y take the minimum or maximum of those coordinates in the x or y direction, respectively; the constant term g in the formulas ensures a small gap between the rectangular box and the handwritten digit it encloses;
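Equations (4)–(7) map directly onto NumPy's `argwhere`; the sketch below is illustrative, and the convention that x indexes columns and y indexes rows is an assumption the claim leaves open:

```python
import numpy as np

def bounding_box(Y_n, t=0.0, g=1):
    """Box enclosing pixels brighter than threshold t, padded by margin g,
    per eqs. (4)-(7); returns (xl, yl, xr, yr) in Y_n's own coordinates."""
    coords = np.argwhere(Y_n > t)        # each row is (row_idx, col_idx) of a foreground pixel
    yl, xl = coords.min(axis=0) - g      # eqs. (4) and (6): minima minus margin g
    yr, xr = coords.max(axis=0) + g      # eqs. (5) and (7): maxima plus margin g
    return xl, yl, xr, yr
```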
(c) generating the coordinates of the handwritten digits in the sample; the sample size is 200 × 200, and the width and length of the rectangular box enclosing a handwritten digit are BW_n = xr_n − xl_n and BL_n = yr_n − yl_n; the top-left coordinates of the handwritten digits are (x, y) = {(x_1, y_1), ..., (x_n, y_n), ..., (x_N, y_N)}, where x_n ∈ (0, 200 − BW_n) and y_n ∈ (0, 200 − BL_n); the handwritten digits must be at least 36 pixels apart, the distance being the Euclidean distance between their centre points; (x, y) is computed as follows:
(x_n, y_n), the n-th element of (x, y), is sampled randomly from its value range and tested against the following condition, which computes the Euclidean distance between the centre point of the n-th handwritten digit and that of every digit already placed:
√[(x_n + BW_n/2 − x_m − BW_m/2)² + (y_n + BL_n/2 − y_m − BL_m/2)²] ≥ 36, for all m < n (8)
if formula (8) is false, (x_n, y_n) is resampled from its value range and formula (8) is tested again; if formula (8) is true, the next coordinate (x_{n+1}, y_{n+1}) is sampled and tested in the same way, until all coordinates are generated; a maximum iteration count can be set to guard against non-termination;
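Step (c) is a rejection-sampling loop with an iteration cap. A minimal sketch, assuming the illustrative function name and that each digit is described by its box size (BW, BL):

```python
import numpy as np

def place_digits(box_sizes, canvas=200, min_dist=36, max_iter=1000, rng=None):
    """Sample a top-left (x, y) for each digit so that centre points are at
    least min_dist apart (Euclidean); resample on violation, with a cap."""
    rng = np.random.default_rng() if rng is None else rng
    centres, coords = [], []
    for bw, bl in box_sizes:
        for _ in range(max_iter):
            x = rng.uniform(0, canvas - bw)          # x_n in (0, 200 - BW_n)
            y = rng.uniform(0, canvas - bl)          # y_n in (0, 200 - BL_n)
            c = np.array([x + bw / 2.0, y + bl / 2.0])   # centre point
            if all(np.linalg.norm(c - p) >= min_dist for p in centres):
                centres.append(c)
                coords.append((x, y))
                break                                 # formula (8) satisfied
        else:
            raise RuntimeError("placement failed; try fewer or smaller digits")
    return coords
```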
(d) generating a sample of the target detection data set; first generate a 200 × 200 background P with all pixel values zero, then add the handwritten digits Y one by one, taking the coordinates (x, y) as their top-left corners; the specific operation is: for a single handwritten digit Y_n, the pixels of the background P between coordinates (x_n, y_n) and (x_n + BW_n, y_n + BL_n) are added to the corresponding pixels of Y_n; after all N handwritten digits have been processed, the result, denoted D ∈ R^(200×200), is a sample of the target detection data set;
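Step (d) is pixel-wise addition into a zero canvas; the sketch below is illustrative, and treating x as the column index and y as the row index is an assumption:

```python
import numpy as np

def compose_sample(crops_with_coords, canvas=200):
    """Add each cropped digit into a zero background P at its (x, y)
    top-left corner, per step (d); overlapping pixels simply sum."""
    P = np.zeros((canvas, canvas), dtype=float)
    for crop, (x_n, y_n) in crops_with_coords:
        BL, BW = crop.shape                        # NumPy is row-major: rows first
        P[y_n:y_n + BL, x_n:x_n + BW] += crop      # pixel-wise addition
    return P
```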
(e) generating the ground truth of the target detection data set, i.e. the machine-learning supervision data; the ground truth consists of the handwritten-digit class labels L and the rectangular boxes enclosing all handwritten digits in the sample; denote all rectangular boxes in a sample as B = {B_1, ..., B_n, ..., B_N}, where B_n = (x_n + xl_n, y_n + yl_n, x_n + xr_n, y_n + yr_n); B_n contains the coordinates of the box's top-left corner (x_n + xl_n, y_n + yl_n) and bottom-right corner (x_n + xr_n, y_n + yr_n) within the sample; the combined ground truth is denoted LB = {L, B}; each line of data output by the code represents the ground truth of one sample, listing in order the sample's file name, the top-left and bottom-right coordinates of a rectangular box, and the handwritten-digit class;
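Assembling LB per step (e) amounts to shifting each local box by its digit's placement and emitting one record per digit; the function name, field order, and space separator below are assumptions, as the claim does not fix an output format:

```python
def ground_truth_lines(fname, coords, boxes, labels):
    """One line per digit: file name, top-left and bottom-right corners of
    the shifted box B_n = (x_n+xl_n, y_n+yl_n, x_n+xr_n, y_n+yr_n), class."""
    lines = []
    for (x_n, y_n), (xl, yl, xr, yr), lab in zip(coords, boxes, labels):
        B_n = (x_n + xl, y_n + yl, x_n + xr, y_n + yr)   # box in sample coordinates
        lines.append(f"{fname} {B_n[0]} {B_n[1]} {B_n[2]} {B_n[3]} {lab}")
    return lines
```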
repeating the above operations of this step until 3000 training-set samples and their corresponding ground truths are generated, wherein a sample D and its corresponding ground truth LB together form one entry of the target detection data set; to generate the validation set, only the extraction of X and L in operation (a) needs to be changed to the 300 original validation-set samples, after which the remaining operations are repeated until 1000 validation-set samples and their corresponding ground truths are generated.
7. The universal method for acquiring a target detection data set according to claim 6, wherein in step 6 a Faster R-CNN neural network is constructed and the validity of the data set is tested, with the following specific operations:
the Faster R-CNN neural network is a fully convolutional neural network that outputs the class and spatial information of detected objects and can carry out target-detection tasks; a Faster R-CNN network is constructed to fit the data set, and the effectiveness of the target detection data set is evaluated indirectly through network performance, using the mAP (mean Average Precision) metric;
the network training parameters are set as follows: 20 epochs and a learning rate of 0.0001.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN202010355022.5A CN111612045B (en)  20200429  20200429  Universal method for acquiring target detection data set 
Applications Claiming Priority (1)
Application Number  Priority Date  Filing Date  Title 

CN202010355022.5A CN111612045B (en)  20200429  20200429  Universal method for acquiring target detection data set 
Publications (2)
Publication Number  Publication Date 

CN111612045A true CN111612045A (en)  20200901 
CN111612045B CN111612045B (en)  20230623 
Family
ID=72198382
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN202010355022.5A Active CN111612045B (en)  20200429  20200429  Universal method for acquiring target detection data set 
Country Status (1)
Country  Link 

CN (1)  CN111612045B (en) 
Cited By (2)
Publication number  Priority date  Publication date  Assignee  Title 

CN112396009A (en) *  20201124  20210223  广东国粒教育技术有限公司  Calculation question correcting method and device based on full convolution neural network model 
CN113920517A (en) *  20211011  20220111  广东电网有限责任公司广州供电局  OCR recognition effect evaluation method and device 
Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

CN109977958A (en) * 20190325 20190705 中国科学技术大学 Offline handwritten mathematical formula recognition and reconstruction method 
CN110399845A (en) * 20190729 20191101 上海海事大学 Method for detecting and recognizing continuous paragraphs of text in an image 
EP3591582A1 (en) *  20180706  20200108  Tata Consultancy Services Limited  Method and system for automatic object annotation using deep network 

Patent Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

EP3591582A1 (en) *  20180706  20200108  Tata Consultancy Services Limited  Method and system for automatic object annotation using deep network 
CN109977958A (en) * 20190325 20190705 中国科学技术大学 Offline handwritten mathematical formula recognition and reconstruction method 
CN110399845A (en) * 20190729 20191101 上海海事大学 Method for detecting and recognizing continuous paragraphs of text in an image 
NonPatent Citations (2)
Title 

YANG J等: "Handwriting text recognition based on faster RCNN" * 
CHEN Ying et al.: "Research on digit recognition algorithms for smart electric energy meters" * 
Also Published As
Publication number  Publication date 

CN111612045B (en)  20230623 
Similar Documents
Publication  Publication Date  Title 

CN111814722B (en)  Method and device for identifying table in image, electronic equipment and storage medium  
CN111325203B (en)  American license plate recognition method and system based on image correction  
CN110503054B (en)  Text image processing method and device  
WO2017016240A1 (en)  Banknote serial number identification method  
CN106778586A (en)  Offline handwriting signature verification method and system  
CN104715256A (en)  Auxiliary calligraphy exercising system and evaluation method based on image method  
CN105046200B (en)  Electronic paper marking method based on straight line detection  
CN111091124B (en)  Spine character recognition method  
CN112307919B (en)  Improved YOLOv3-based digital information area recognition method in document images  
CN115457565A (en)  OCR character recognition method, electronic equipment and storage medium  
CN105117741A (en)  Recognition method of calligraphy character style  
CN102737240B (en)  Method of analyzing digital document images  
CN112712273A (en)  Handwritten Chinese character beauty evaluation method based on skeleton similarity  
CN112241730A (en)  Form extraction method and system based on machine learning  
JP3228938B2 (en)  Image classification method and apparatus using distribution map  
CN108052936B (en)  Automatic inclination correction method and system for Braille image  
CN111612045B (en)  Universal method for acquiring target detection data set  
CN110222660B (en)  Signature authentication method and system based on dynamic and static feature fusion  
CN116824608A (en)  Answer sheet layout analysis method based on target detection technology  
JP3428494B2 (en)  Character recognition device, its character recognition method, and recording medium storing its control program  
CN108062548B (en)  Braille square selfadaptive positioning method and system  
CN114241486A (en)  Method for improving accuracy rate of identifying student information of test paper  
CN109740618B (en)  Test paper score automatic statistical method and device based on FHOG characteristics  
CN112633116A (en)  Method for intelligently analyzing PDF (Portable document Format) imagetext  
CN111325194B (en)  Character recognition method, device and equipment and storage medium 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant 