CN111353976A - Sand grain target detection method based on convolutional neural network - Google Patents
- Publication number
- CN111353976A CN111353976A CN202010114804.XA CN202010114804A CN111353976A CN 111353976 A CN111353976 A CN 111353976A CN 202010114804 A CN202010114804 A CN 202010114804A CN 111353976 A CN111353976 A CN 111353976A
- Authority
- CN
- China
- Prior art keywords
- network
- image
- sand
- convolutional
- sand grain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The invention discloses a sand grain target detection method based on a convolutional neural network, comprising the following steps: 1) designing a convolutional network structure built by stacking convolution modules and residual modules, and adding a double-ended input structure and a multi-scale detection structure; 2) preprocessing the sand grain images and their annotations, and constructing a training data set from the annotated sand grain images; 3) training the convolutional network on the training data set, including defining an objective function and optimizing the training process; 4) applying the trained convolutional network to predict target positions in sand grain images. The invention fully exploits the features of both the single-polarization and orthogonal-polarization images and applies convolutional neural network technology, improving detection accuracy and efficiency. The network trains quickly, completes sand grain target detection rapidly, is suitable for automatic detection of massive sand grain image collections, and offers good extensibility, robustness, and practicality.
Description
Technical Field
The invention belongs to the field of image detection and recognition, and in particular relates to a sand grain target detection method based on a convolutional neural network.
Background
In the geological field, sand grain classification and statistics have long been an important part of sand grain research, and sand grain target detection in sand grain images is one of its most basic steps. Traditional classification and statistics first require manually marking the sand grain targets in each image, but because of the peculiarities of sand grain imagery, manual annotation is inefficient and poorly repeatable on large volumes of image data. Specifically, the contrast between targets and background in sand grain images is weak, so manually annotating large amounts of image data yields low accuracy; moreover, annotating sand grain targets requires a degree of domain expertise, making it difficult to recruit large numbers of annotators.
Object detection is a technique that uses deep learning to detect specific objects in an image. By training a deep network on a certain amount of annotated data, the detection process can be automated. Therefore, in the sand grain detection task, training the network on a certain amount of sand grain image data enables automatic detection with high accuracy. A detection method built on object detection technology can significantly reduce the manual annotation workload while maintaining detection precision.
Current mainstream deep-learning object detection methods fall into two families. Two-stage detectors split detection into two subtasks, candidate-box extraction and candidate-box classification; they achieve high accuracy but are slow. Single-stage detectors, based on a single scan of the image, treat detection as one task; they are faster than two-stage detectors but less accurate. These methods can effectively detect targets in images of natural objects, but when applied directly to the sand grain detection task they do not reach satisfactory accuracy.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention aims to provide a sand grain target detection method based on a Convolutional Neural Network (CNN), which uses a fully convolutional network with a residual structure, trains the convolutional network on an annotated data set, automatically extracts features from the single-polarization and orthogonal-polarization images with the trained network, and finally predicts the sand grain targets in the sand grain image from those features.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
The invention discloses a sand grain target detection method based on a convolutional neural network, comprising the following steps:
1) designing a convolutional network structure, adding a double-ended input structure and a multi-scale detection structure;
2) preprocessing the sand grain images and annotations;
3) training the convolutional network;
4) applying the trained convolutional network to predict target positions in sand grain images.
Further, the convolutional network structure in step 1) is formed by stacking convolution modules and residual modules.
Further, a convolution module in step 1) is formed by connecting a convolution layer, a Batch Normalization (BN) layer, and an activation layer in series.
The convolution strategy of a convolution layer is of two kinds, dimension-reducing and non-dimension-reducing, depending on where it is used; the non-dimension-reducing strategy uses kernels of two sizes, 1 × 1 and 3 × 3, while the dimension-reducing strategy uses a 3 × 3 kernel.
For the non-dimension-reducing strategy with a 1 × 1 kernel, the input is a tensor X of size C_in × W × H, the kernel K has size C_out × C_in × 1 × 1, and the output O of the convolution layer has size C_out × W × H; its value at position [c, x, y], 1 ≤ c ≤ C_out, 1 ≤ x ≤ W, 1 ≤ y ≤ H, is:
O[c, x, y] = Σ_{i=1..C_in} K[c, i, 1, 1] · X[i, x, y] (1)
For the non-dimension-reducing strategy with a 3 × 3 kernel, the input X is zero-padded to size C_in × (W + 2) × (H + 2), the kernel K has size C_out × C_in × 3 × 3, and the output O again has size C_out × W × H; its value at position [c, x, y], 1 ≤ c ≤ C_out, 1 ≤ x ≤ W, 1 ≤ y ≤ H, is:
O[c, x, y] = Σ_{i=1..C_in} Σ_{u=1..3} Σ_{v=1..3} K[c, i, u, v] · X[i, x + u − 1, y + v − 1] (2)
For the dimension-reducing strategy with a 3 × 3 kernel, the input is a tensor X of size C_in × (W + 1) × (H + 1), where W and H are even, the kernel K has size C_out × C_in × 3 × 3 and is applied with stride 2, and the output O has size C_out × (W/2) × (H/2); its value at position [c, x, y], 1 ≤ c ≤ C_out, 1 ≤ x ≤ W/2, 1 ≤ y ≤ H/2, is:
O[c, x, y] = Σ_{i=1..C_in} Σ_{u=1..3} Σ_{v=1..3} K[c, i, u, v] · X[i, 2x + u − 2, 2y + v − 2] (3)
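The three convolution strategies can be illustrated with a naive, dependency-light sketch; this is a toy illustration of the shape behaviour only, not the patent's implementation, and all names are illustrative:

```python
import numpy as np

def conv2d(x, k, stride=1, pad=0):
    """Naive 2-D convolution over a C_in x W x H tensor.

    x: input of shape (C_in, W, H); k: kernel of shape (C_out, C_in, s, s).
    Returns an output of shape (C_out, W_out, H_out).
    """
    if pad:
        x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    c_out, _, s, _ = k.shape
    _, w, h = x.shape
    w_out = (w - s) // stride + 1
    h_out = (h - s) // stride + 1
    o = np.zeros((c_out, w_out, h_out))
    for c in range(c_out):
        for i in range(w_out):
            for j in range(h_out):
                patch = x[:, i * stride:i * stride + s, j * stride:j * stride + s]
                o[c, i, j] = np.sum(k[c] * patch)
    return o

x = np.random.randn(4, 8, 8)                                   # C_in=4, W=H=8
o1 = conv2d(x, np.random.randn(6, 4, 1, 1))                    # 1x1, no reduction
o2 = conv2d(x, np.random.randn(6, 4, 3, 3), pad=1)             # 3x3, no reduction
o3 = conv2d(x, np.random.randn(6, 4, 3, 3), stride=2, pad=1)   # 3x3, halves W and H
```

With C_out = 6, the first two calls keep the 8 × 8 spatial size while the strided call halves it to 4 × 4, matching the three strategies.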
The batch-normalization layer keeps the input and output identically distributed under small-batch input, preventing the slow training convergence that output-distribution drift causes as the number of network layers grows. For each hidden layer under mini-batch training with batch size m and activation inputs x^(k), 1 ≤ k ≤ m, the output x̂^(k) after the batch-normalization operation is:
x̂^(k) = (x^(k) − E[x]) / √(Var[x])
where E[·] denotes the mathematical expectation and Var[·] the variance.
To preserve the network's expressive capacity, two learnable parameters γ and β apply an inverse transform to the normalized activation:
y^(k) = γ · x̂^(k) + β
The activation layer adds nonlinearity to the network using the Leaky ReLU function, whose output has a small gradient for negative inputs; it is defined as:
y = LReLU(x) = x for x ≥ 0, and y = k · x for x < 0
where LReLU(·) denotes the Leaky ReLU function, x is the input, y is the output, and k is the negative-input slope.
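A minimal NumPy sketch of the batch-normalization and Leaky ReLU operations described above (the slope k = 0.1 is an assumed value; the patent does not fix it):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each activation over the mini-batch (axis 0), then
    # apply the learnable inverse transform y = gamma * x_hat + beta.
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return gamma * x_hat + beta

def leaky_relu(x, k=0.1):
    # Identity for non-negative inputs, small negative slope k otherwise.
    return np.where(x >= 0, x, k * x)

batch = np.random.randn(32, 16)          # batch size m = 32, 16 activations
y = batch_norm(batch, gamma=1.0, beta=0.0)
```

After normalization each activation has approximately zero mean and unit variance over the batch, which is what keeps the layer's input distribution stable.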
Further, the residual module in step 1) is a cascade of a zero-padding layer, a convolution unit, and residual-structure modules.
The Zero Padding layer enlarges the input to fit the following layer: an input of size C_in × W × H is expanded to C_in × (W + 1) × (H + 1). The convolution unit uses a dimension-reducing kernel as defined by formula (3). A residual-structure module connects two non-dimension-reducing convolution units, with 1 × 1 and 3 × 3 kernels as defined by formulas (1) and (2) respectively, through a residual connection. The residual connection uses a shortcut mechanism to alleviate the vanishing-gradient problem caused by increasing depth, making the neural network easier to optimize.
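The shortcut mechanism can be sketched as follows; for brevity both convolutions in the residual branch are 1 × 1 stand-ins for the 1 × 1/3 × 3 pair, so this illustrates only the identity connection, not the exact module:

```python
import numpy as np

def conv1x1(x, k):
    # 1x1 convolution: a per-pixel linear map across channels.
    # x: (C_in, W, H); k: (C_out, C_in).
    return np.einsum('oi,iwh->owh', k, x)

def residual_block(x, k1, k2):
    """Shortcut connection: output = input + F(input).

    F is two stacked 1x1 convolutions (channel bottleneck and expansion);
    the identity path keeps gradients flowing even when F learns little.
    """
    f = conv1x1(conv1x1(x, k1), k2)
    return x + f

x = np.random.randn(8, 4, 4)
k1 = np.random.randn(4, 8)   # 8 -> 4 channels (bottleneck)
k2 = np.random.randn(8, 4)   # 4 -> 8 channels, so shapes match for the sum
y = residual_block(x, k1, k2)
```

When the residual branch outputs zero, the block reduces to the identity mapping, which is what makes deep stacks of such blocks easy to optimize.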
Further, the double-ended input structure of the convolutional network in step 1) feeds the single-polarization image and the orthogonal-polarization image into the network simultaneously, so that during training the network learns features of both images at once; the two ends have identical structure but do not share parameters.
After the input structure, the network merges the two input branches and generates three branches feeding the detection networks at the large, medium, and small scales.
Further, the multi-scale detection structure in step 1) detects sand grain targets in the image at different scales. The multi-scale detection network uses three scales, large, medium, and small, and contains five modules: a large-scale detection structure, a large-to-medium-scale branch structure, a medium-scale detection structure, a medium-to-small-scale branch structure, and a small-scale detection structure.
The tensors output at all three scales have B × 5 channels, where B is the number of target boxes in the original image that each vector in the tensor maps to, and 5 covers the box confidence plus the offsets, relative to the corresponding preset anchor box, in the x direction, the y direction, the width, and the height. To obtain accurate box information, the positions and confidences in the output tensor are transformed. Let the position offsets of a target box be Δx, Δy, Δw, Δh, where Δx and Δy are offsets of the box center, and let the predicted confidence be c_o. The transformed prediction is computed as:
x = [sigmoid(Δx) + g_x] · s;
y = [sigmoid(Δy) + g_y] · s;
w = p_w · e^Δw · s;
h = p_h · e^Δh · s;
c = sigmoid(c_o);
where sigmoid(·) denotes the sigmoid function; g_x and g_y are the coordinates of the grid cell containing the box center; s is the scale factor, 8 for the small scale, 16 for the medium scale, and 32 for the large scale; and p_w and p_h are the width and height of the anchor box.
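The decoding formulas above can be written directly as a small sketch (all argument names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(dx, dy, dw, dh, c_o, gx, gy, s, pw, ph):
    """Map raw network offsets to an absolute box, following the formulas
    above: grid offset plus sigmoid for the centre, exponential scaling of
    the anchor for width/height, sigmoid for the confidence."""
    x = (sigmoid(dx) + gx) * s
    y = (sigmoid(dy) + gy) * s
    w = pw * np.exp(dw) * s
    h = ph * np.exp(dh) * s
    c = sigmoid(c_o)
    return x, y, w, h, c

# Zero offsets land the centre in the middle of grid cell (3, 4) at s = 16.
bx, by, bw, bh, bc = decode_box(0.0, 0.0, 0.0, 0.0, 0.0,
                                gx=3, gy=4, s=16, pw=2.0, ph=1.5)
```

With zero offsets the sigmoid terms evaluate to 0.5, so the decoded centre sits exactly in the middle of its grid cell and the box takes the anchor's size.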
Further, step 2) specifically includes: the training data set constructed from the annotated sand grain images contains two parts, an image part and an annotation part; the image part is a set of paired single-polarization and orthogonal-polarization images, and the annotation part gives the position of each target sand grain in each image.
Further, preprocessing the sand grain images and annotations in step 2) specifically comprises:
Preprocessing a sand grain image:
21) downsample or upsample the image so that its longer side matches the input size of the convolutional network;
22) extend the shorter side to the network input size, filling the added pixels with the value 125;
23) normalize the image;
Preprocessing the annotation data:
24) convert the position information in the annotations according to the scaling applied to the image data;
25) compare the intersection-over-union of each position entry with the anchor boxes; map entries above a given threshold into a tensor of the same size as the output tensor, and store the remaining entries separately;
26) obtain the tensors at all three scales, keeping them consistent with the output tensors of the convolutional network.
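Steps 21)–23) amount to a letterbox-style resize; a rough sketch under assumed values (input size 416 and nearest-neighbour resampling are assumptions, chosen only to keep the sketch dependency-free; the patent requires only matching the network input size):

```python
import numpy as np

def letterbox(img, size=416, fill=125):
    """Resize so the longer side equals `size`, pad the shorter side with
    the constant value 125, then normalize to [0, 1].

    img is an H x W x 3 uint8 array. Returns the padded float image and
    the scale factor applied (needed later to rescale the annotations)."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour down/up-sampling via index maps.
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((size, size, 3), fill, dtype=np.uint8)
    canvas[:nh, :nw] = resized
    return canvas.astype(np.float32) / 255.0, scale

img = np.random.randint(0, 256, (300, 600, 3), dtype=np.uint8)
out, scale = letterbox(img)
```

Because both sides are scaled by the same factor before padding, the aspect ratio of the targets is preserved, which is the point of this scheme over resampling each side independently.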
Further, step 3) specifically includes training the convolutional network on the training data set, which comprises defining an objective function and optimizing the training process.
Further, the loss function of the convolutional network in step 3) consists of two parts, a bounding-box loss L_box and a confidence loss L_conf:
L = L_box + L_conf
Each part simultaneously accounts for the losses of the different prediction boxes at the different scales, summed as:
Σ_{l=1..3} Σ_{i=1..S_l} Σ_{j=1..S_l} loss_{l,i,j}
where S_l, 1 ≤ l ≤ 3, is the output size at the l-th scale.
The bounding-box loss uses the G-IoU loss. The G-IoU function between two bounding boxes A and B, where A is the prediction box and B is the annotated (ground-truth) box, is:
GIoU(A, B) = IoU(A, B) − |C \ (A ∪ B)| / |C|
where C is the smallest convex region enclosing A and B, and IoU(A, B) is the intersection-over-union of A and B, defined as:
IoU(A, B) = |A ∩ B| / |A ∪ B|
Based on the above, the bounding-box loss is:
L_box = Σ_{l=1..3} Σ_{i=1..S_l} Σ_{j=1..S_l} 1_{l,i,j} · C_{l,i,j} · (2 − (w · h)/(W · H)) · [1 − GIoU(ô_{l,i,j}, o)]
where C_{l,i,j} is the true confidence of the grid cell; 1_{l,i,j} indicates whether the predicted bounding box is valid (a prediction with IoU ≥ 0.3 against the true bounding box is considered valid); W and H are the width and height of the input image; o is the true bounding box, with width w and height h; and ô_{l,i,j} is the predicted bounding box.
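A sketch of the IoU and G-IoU computations for axis-aligned boxes (here the enclosing region C is the axis-aligned bounding rectangle, a common simplification of the smallest convex region):

```python
def iou_giou(a, b):
    """a, b: boxes as (x1, y1, x2, y2) corner coordinates.
    Returns (IoU, GIoU), with GIoU = IoU - |C \\ (A u B)| / |C|
    for the smallest enclosing axis-aligned box C."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    area_c = cw * ch
    giou = iou - (area_c - union) / area_c
    return iou, giou

iou, giou = iou_giou((0, 0, 2, 2), (1, 1, 3, 3))
loss = 1 - giou   # the per-box G-IoU loss term
```

Unlike plain IoU, G-IoU stays informative (and negative) even when the boxes do not overlap, which gives a useful gradient for badly placed predictions.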
The confidence loss uses the cross-entropy loss with sigmoid and logist, summed over the scales and grid cells:
L_conf = Σ_{l=1..3} Σ_{i=1..S_l} Σ_{j=1..S_l} CEL(Ĉ_{l,i,j}, C_{l,i,j})
where Ĉ_{l,i,j} is the prediction confidence and CEL(·) is the cross-entropy loss function with sigmoid and logist:
CEL(X, Y) = −[logist(Y) · log(σ(X)) + (1 − logist(Y)) · log(1 − σ(X))] (14)
wherein the logist function is:
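A minimal sketch of the cross-entropy on a raw score (logit); here the target y is used directly, the standard BCE-with-logits form, since the exact logist transform is not recovered from the source:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_with_logits(x, y):
    """Cross-entropy between a raw score x and a target y in [0, 1]:
    -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    p = sigmoid(x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

At x = 0 the prediction is maximally uncertain (σ(x) = 0.5), so the loss equals log 2 for either target value.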
further, the optimizer used in the training phase of the convolutional network in the step 3) is an Adam optimizer; the learning rate scheduler adopts Linear cosine scheduling, and the learning rate lr (i) of the training stage i is:
wherein, W is the number of training stages of the linear stage; lr of0Setting the basic learning rate as 1 e-6; lr ofmaxThe maximum learning rate is set to 1 e-3.
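One plausible reading of linear-cosine scheduling, sketched under assumptions (linear warm-up over W stages followed by cosine decay; the total stage count of 100 and the exact decay form are assumptions, as the source gives only W, lr_0 = 1e-6, and lr_max = 1e-3):

```python
import math

def lr_schedule(i, warmup=5, total=100, lr0=1e-6, lr_max=1e-3):
    """Linear warm-up from lr0 to lr_max over `warmup` stages, then a
    cosine decay from lr_max back toward lr0 over the remaining stages."""
    if i <= warmup:
        return lr0 + (lr_max - lr0) * i / warmup
    t = (i - warmup) / (total - warmup)          # fraction of the decay phase
    return lr0 + 0.5 * (lr_max - lr0) * (1 + math.cos(math.pi * t))
```

The warm-up avoids large unstable updates at the start of training, while the cosine tail lets the network settle into a minimum with a gently shrinking step size.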
Further, the process of predicting target positions in a sand grain image in step 4) is as follows: preprocess the sand grain images, then input the pair of orthogonal-polarization and single-polarization images into the trained convolutional network to obtain prediction tensors at the three scales; set a confidence threshold of 0.3 for all prediction boxes and delete those whose confidence falls below it; finally, remove redundant prediction boxes with non-maximum suppression, output the remaining boxes as the prediction result, and mark them on the original sand grain image.
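The confidence filtering and non-maximum suppression of step 4) can be sketched as follows (the IoU threshold of 0.5 is an assumed value; only the confidence threshold 0.3 appears in the source):

```python
def nms(boxes, scores, conf_thresh=0.3, iou_thresh=0.5):
    """Confidence filtering followed by greedy non-maximum suppression.
    boxes: list of (x1, y1, x2, y2); scores: matching confidences.
    Returns the surviving boxes, highest-confidence first."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # Drop low-confidence predictions, then sort by confidence (descending).
    cand = sorted((p for p in zip(scores, boxes) if p[0] >= conf_thresh),
                  reverse=True)
    keep = []
    for score, box in cand:
        # Keep a box only if it does not heavily overlap an already-kept one.
        if all(iou(box, kb) < iou_thresh for _, kb in keep):
            keep.append((score, box))
    return [b for _, b in keep]

kept = nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)],
           [0.9, 0.8, 0.7])
```

In this example the second box overlaps the first one too strongly and is suppressed, while the disjoint third box survives.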
The beneficial effects of the invention are:
The method prevents the small differences between the features of different sand grain types from degrading detection accuracy, fully exploits the features of the single-polarization and orthogonal-polarization images, simplifies the network structure, and improves detection accuracy and efficiency. The network trains quickly, completes sand grain target detection rapidly, is suitable for automatic detection of massive sand grain image collections, and offers good extensibility, robustness, and practicality.
Drawings
FIG. 1 is an overall block diagram of the process of the present invention;
FIG. 2 is a block diagram of a sand image pre-processing process;
FIG. 3a is a schematic diagram of cross-polarization input;
FIG. 3b is a schematic diagram of a single polarization input;
FIG. 3c is a graph showing the results of cross polarization prediction;
FIG. 3d is a graph showing the single polarization prediction result;
fig. 4 is a structural diagram of a convolutional network designed by the present invention.
Detailed Description
To aid understanding by those skilled in the art, the present invention is further described below with reference to embodiments and the drawings; these do not limit the invention.
The invention discloses a sand grain target detection method based on a convolutional neural network, which uses a trained CNN to automatically extract features from the single-polarization and orthogonal-polarization images of sand grains and predicts the sand grain targets in the images from those features. Unlike traditional object detection methods, the method discards the classification part of the traditional detection framework and focuses on predicting target positions; it inputs the orthogonal-polarization and single-polarization images simultaneously so as to fully exploit the features of both.
Referring to fig. 1, the present invention adopts the following steps:
1) designing a convolutional network structure;
2) preprocessing the sand grain images and annotations; a training data set is constructed from the annotated sand grain images, comprising an image part and an annotation part: the image part is a set of paired single-polarization and orthogonal-polarization images, and the annotation part is the position of each target sand grain in each image;
3) training the convolutional network on the training data set; the training process includes defining an objective function and optimizing the training;
4) applying the trained convolutional network to predict target positions in sand grain images.
The network structure in step 1) is shown in fig. 4: the network is a fully convolutional network with a residual structure, formed by stacking convolution units and residual modules, and it includes a double-ended input structure and a multi-scale detection structure.
As shown in [1] of fig. 4, a convolution module in the network is formed by connecting a convolution (conv) layer, a Batch Normalization (BN) layer, and a Leaky ReLU activation layer in series.
The convolution strategy of a convolution layer is of two kinds, dimension-reducing and non-dimension-reducing, depending on where it is used; the non-dimension-reducing strategy uses kernels of two sizes, 1 × 1 and 3 × 3, while the dimension-reducing strategy uses a 3 × 3 kernel.
For the non-dimension-reducing strategy with a 1 × 1 kernel, the input is a tensor X of size C_in × W × H, the kernel K has size C_out × C_in × 1 × 1, and the output O of the convolution layer has size C_out × W × H; its value at position [c, x, y], 1 ≤ c ≤ C_out, 1 ≤ x ≤ W, 1 ≤ y ≤ H, is given by formula (1):
O[c, x, y] = Σ_{i=1..C_in} K[c, i, 1, 1] · X[i, x, y]
For the non-dimension-reducing strategy with a 3 × 3 kernel, the input X is zero-padded to size C_in × (W + 2) × (H + 2), the kernel K has size C_out × C_in × 3 × 3, and the output O again has size C_out × W × H; its value at position [c, x, y], 1 ≤ c ≤ C_out, 1 ≤ x ≤ W, 1 ≤ y ≤ H, is given by formula (2):
O[c, x, y] = Σ_{i=1..C_in} Σ_{u=1..3} Σ_{v=1..3} K[c, i, u, v] · X[i, x + u − 1, y + v − 1]
For the dimension-reducing strategy with a 3 × 3 kernel, the input is a tensor X of size C_in × (W + 1) × (H + 1), where W and H are even, the kernel K has size C_out × C_in × 3 × 3 and is applied with stride 2, and the output O has size C_out × (W/2) × (H/2); its value at position [c, x, y], 1 ≤ c ≤ C_out, 1 ≤ x ≤ W/2, 1 ≤ y ≤ H/2, is given by formula (3):
O[c, x, y] = Σ_{i=1..C_in} Σ_{u=1..3} Σ_{v=1..3} K[c, i, u, v] · X[i, 2x + u − 2, 2y + v − 2]
The batch-normalization layer keeps input and output identically distributed under small-batch input, preventing the slow training convergence caused by output-distribution drift as the number of network layers grows. For each hidden layer in the network, the input distribution, which is gradually pulled toward the saturated ends of the nonlinear function's range, is transformed back to an approximately standard normal distribution with mean 0 and variance 1, so that the input to the nonlinear transform lies in a region that is sensitive to its input, avoiding the vanishing-gradient problem. For a hidden layer under mini-batch training with batch size m and activation inputs x^(k), 1 ≤ k ≤ m, the output x̂^(k) after the batch-normalization operation is:
x̂^(k) = (x^(k) − E[x]) / √(Var[x])
where E[·] denotes the mathematical expectation and Var[·] the variance.
This transform moves the activation input into the linear region of the nonlinearity, strengthening the flow of back-propagated information and accelerating training convergence, but it also reduces the network's expressive capacity; to prevent that reduction, two learnable parameters γ and β apply an inverse transform to the normalized activation:
y^(k) = γ · x̂^(k) + β
The activation layer uses the Leaky ReLU function to add nonlinearity to the network; compared with the ReLU function, its output has a small gradient for negative inputs. It is defined as:
y = LReLU(x) = x for x ≥ 0, and y = k · x for x < 0
where LReLU(·) denotes the Leaky ReLU function, x is the input, y is the output, and k is the negative-input slope.
As shown in [3] of FIG. 4, a residual unit in the network is a cascade of a Zero Padding (ZP) layer, a convolution unit, and several residual-structure modules. The Zero Padding layer enlarges the input to fit the following layer: an input of size C_in × W × H is expanded to C_in × (W + 1) × (H + 1). The convolution unit uses a dimension-reducing kernel as defined by formula (3). A residual-structure module connects two non-dimension-reducing convolution units, using 1 × 1 and 3 × 3 kernels as defined by formulas (1) and (2) respectively, through a residual connection, as shown in [2] of FIG. 4. The residual connection uses a shortcut mechanism to alleviate the vanishing-gradient problem that comes with increasing depth; by establishing a direct channel between input and output through an identity mapping, the network can concentrate on learning the residual between them, which makes the neural network easier to optimize.
The double-ended input structure feeds the single-polarization image and the orthogonal-polarization image into the network simultaneously, so that during training the network learns features of both at once; in the test stage, the network can then predict the sand grain positions in the image more accurately from the orthogonal-polarization and single-polarization pair. The two ends have identical structure but do not share parameters; one possible input sub-network is shown in [4] of fig. 4 and table 1:
TABLE 1
After the input structure, the network merges the two input branches and generates three branches feeding the large-, medium-, and small-scale detection networks; one possible merge and multi-scale generation structure is shown in [5] of FIG. 4 and Table 2:
TABLE 2
The multi-scale detection structure in the network detects sand grain targets in the image at different scales. In particular, small targets are detected better at a smaller scale, and likewise large targets at a larger scale. In general, the more detection scales the multi-scale structure uses, the higher the model's detection accuracy, but the longer detection may take. To balance detection time and accuracy, the network in this embodiment uses a three-scale (large, medium, small) detection structure. The multi-scale detection network uses an FPN structure; one possible three-scale detection network is shown in [6] to [10] of fig. 4. Specifically, it contains five modules: a large-scale detection structure, a large-to-medium-scale branch structure, a medium-scale detection structure, a medium-to-small-scale branch structure, and a small-scale detection structure.
The large-scale detection structure is shown as [6] in FIG. 4 and in the following Table 3:
TABLE 3
The large-scale medium-scale branch structure is shown in the following table 4 and [7] in fig. 4:
TABLE 4
The mesoscale detection structure is shown in [8] in FIG. 4 and in Table 5 below:
TABLE 5
The meso-scale versus micro-scale detection structure is shown in fig. 4 [9] and table 6 below:
TABLE 6
The small scale detection structure is shown as [10] in fig. 4 and in table 7 below:
TABLE 7
The tensors output at all three scales have B × 5 channels, where B is the number of target boxes predicted for the original-image region (RoI) that each vector in the tensor maps to, and 5 covers the predicted box confidence plus the offsets of the corresponding anchor box in the x direction, the y direction, the width, and the height.
Assuming that the size of the input image is 416 × 416 and B is 3, the output of the CNN is 3 tensors: a small-scale tensor T_s of size 52 × 52 × 15, a medium-scale tensor T_m of size 26 × 26 × 15, and a large-scale tensor T_l of size 13 × 13 × 15.
Then, the position and confidence in the output tensor are transformed. Specifically, assume that the position offsets of a certain target frame are Δx, Δy, Δw and Δh, where Δx and Δy represent the offset of the centre position of the target frame, and the predicted confidence is c_o. The transformed prediction result is calculated as follows:
x = [sigmoid(Δx) + g_x] · s;
y = [sigmoid(Δy) + g_y] · s;
w = p_w · e^(Δw) · s;
h = p_h · e^(Δh) · s;
c = sigmoid(c_o);
wherein sigmoid(·) represents the sigmoid function; g_x and g_y represent the coordinates of the grid cell containing the centre; s is the scale scaling coefficient, equal to 8 at the small scale, 16 at the medium scale and 32 at the large scale; p_w and p_h represent the length and width of the anchor frame, respectively.
After the output tensor of the CNN is decoded, the data in the tensor directly represent the target frame information predicted by the network.
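The decoding formulas above can be sketched as follows; the function signature and argument names are illustrative, not from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(dx, dy, dw, dh, c_o, gx, gy, s, pw, ph):
    """Decode one predicted frame: grid-relative offsets -> image coordinates."""
    x = (sigmoid(dx) + gx) * s          # centre x: cell offset plus grid position, times stride
    y = (sigmoid(dy) + gy) * s          # centre y
    w = pw * np.exp(dw) * s             # width scaled from the anchor width
    h = ph * np.exp(dh) * s             # height scaled from the anchor height
    c = sigmoid(c_o)                    # confidence squashed into [0, 1]
    return x, y, w, h, c
```

For zero offsets the frame sits at the centre of its grid cell with exactly the anchor's shape.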
The training phase first requires preprocessing of the data set. The data set contains two parts: an image part and an annotation part. The image part is a group of single-polarization and orthogonal-polarization images; the annotation part records the target locations in each image. The purpose of the data preprocessing is to match the image data to the input of the CNN and the annotation data to the output of the CNN.
The preprocessing of the image data in the above step 2) is shown in FIG. 2. First, the image is down-sampled or up-sampled so that its longer side matches the input size of the CNN; next, the shorter side is padded out to the input size of the CNN, the padding pixel value typically being 125; finally, the image is normalized. Compared with the conventional scheme of directly resizing both sides of the image to the CNN input size, this method preserves the aspect ratio of the targets in the image and prevents the network from learning distorted target information.
The preprocessing of the annotation data is as follows: first, the position information in the annotation data is transformed according to the scaling applied to the image data; next, the intersection-over-union of each position entry with the anchor frames is computed, entries above a certain threshold are mapped into a tensor with the same size as the output tensor, and the remaining entries are stored separately; finally, tensors of the three scales, consistent in size with the CNN output tensors, are obtained.
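The annotation-mapping step can be sketched as follows, under two assumptions not stated here: anchors are matched by width/height IoU (centres aligned), and the 0.3 threshold mentioned later for box validity is reused as the mapping threshold:

```python
import numpy as np

def wh_iou(box_wh, anchor_wh):
    """IoU of two boxes compared by width/height only, centres aligned."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union

def assign_label(box, anchors, grid, stride, iou_thresh=0.3):
    """Write one ground-truth box (x, y, w, h in pixels, centre-based)
    into a (grid, grid, len(anchors), 5) target tensor; a box whose best
    anchor IoU is below the threshold is returned for separate storage."""
    target = np.zeros((grid, grid, len(anchors), 5), dtype=np.float32)
    x, y, w, h = box
    ious = [wh_iou((w, h), a) for a in anchors]
    best = int(np.argmax(ious))
    if ious[best] < iou_thresh:
        return target, box                      # stored separately
    gx, gy = int(x // stride), int(y // stride) # grid cell of the centre
    target[gy, gx, best] = (x, y, w, h, 1.0)    # 1.0 = true confidence
    return target, None
```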
The loss function of the convolutional network in the step 3) consists of two parts, a bounding-box loss L_box and a confidence loss L_conf:

L = L_box + L_conf
wherein each part simultaneously accounts for the losses of the different prediction frames at the different scales, in the following form:
wherein S_l, 1 ≤ l ≤ 3, is the output size at the l-th scale;
The bounding-box loss uses the G-IoU loss. The G-IoU function between two bounding boxes A and B, where A is the prediction frame and B is the label (true) frame, is given by:

GIoU(A, B) = IoU(A, B) − |C \ (A ∪ B)| / |C|
where C is the smallest convex region enclosing A and B, and IoU(A, B) represents the intersection ratio of A and B, defined as:

IoU(A, B) = |A ∩ B| / |A ∪ B|
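Specialised to axis-aligned boxes, where the smallest convex region C reduces to the smallest enclosing rectangle, IoU and G-IoU can be computed as:

```python
def iou_and_giou(a, b):
    """a, b: boxes as (x1, y1, x2, y2). Returns (IoU, GIoU)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # C: smallest axis-aligned rectangle enclosing both boxes
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / area_c
    return iou, giou
```

G-IoU equals IoU for identical boxes and, unlike IoU, stays informative (negative) when the boxes do not overlap at all.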
Based on the above, the bounding-box loss is:

L_box = Σ_{l,i,j} 1^obj_{l,i,j} · (2 − w·h / (W·H)) · [1 − GIoU(b̂_{l,i,j}, o)]
wherein C_{l,i,j} represents the true confidence of the grid cell in question; 1^obj_{l,i,j} indicates whether the predicted bounding box is valid, a prediction with IoU ≥ 0.3 against the true bounding box being considered valid; W and H denote the width and height of the input image, respectively; o represents the real bounding box, with w and h its width and height, respectively; and b̂_{l,i,j} represents the predicted bounding box;
The confidence loss uses a cross-entropy loss with sigmoid and logit, summed in the same way over scales and grid cells:

L_conf = Σ_{l,i,j} CEL(Ĉ_{l,i,j}, C_{l,i,j})
wherein Ĉ_{l,i,j} is the prediction confidence, and CEL(·) is the cross-entropy loss function with sigmoid and logit:
CEL(X, Y) = −[logit(Y) · log(σ(X)) + (1 − logit(Y)) · log(1 − σ(X))]   (14)
wherein the logit function is:

logit(y) = log(y / (1 − y))
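Taken literally, Eq. (14) applies the logit map to the label Y, which is only finite for soft labels strictly between 0 and 1 (e.g. IoU-valued confidences); a sketch with clamping to keep it numerically defined — the clamp value is an assumption, not from the patent:

```python
import math

def logit(y, eps=1e-7):
    """log(y / (1 - y)), with y clamped away from 0 and 1."""
    y = min(max(y, eps), 1.0 - eps)
    return math.log(y / (1.0 - y))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cel(x, y):
    """Cross-entropy between sigmoid(x) and the logit-mapped label y,
    following the form of Eq. (14) in the text."""
    s = sigmoid(x)
    return -(logit(y) * math.log(s) + (1.0 - logit(y)) * math.log(1.0 - s))
```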
the optimizer used in the training phase of the convolutional network in the step 3) is an Adam optimizer; the learning rate scheduler adopts Linear cosine scheduling, and the learning rate lr (i) of the training stage i is:
wherein W is the number of training stages in the linear stage; lr_0 is the base learning rate, set to 1e-6; and lr_max is the maximum learning rate, set to 1e-3.
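The exact formula of the scheduler appears only in the patent figure; one common reading of "linear cosine scheduling", a linear warm-up over the first W stages followed by a cosine decay back towards the base rate, can be sketched as follows (the `total` stage count is an assumed parameter):

```python
import math

def lr_schedule(i, W=5, total=100, lr0=1e-6, lr_max=1e-3):
    """Linear warm-up from lr0 to lr_max over the first W stages,
    then cosine decay from lr_max back towards lr0."""
    if i < W:                                    # linear warm-up phase
        return lr0 + (lr_max - lr0) * i / W
    t = (i - W) / max(1, total - W)              # decay progress in [0, 1]
    return lr0 + 0.5 * (lr_max - lr0) * (1.0 + math.cos(math.pi * t))
```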
The specific procedure of the detection process in the step 4) is as follows: first, the sand grain image is preprocessed; then the two images to be detected, one orthogonal-polarization and one single-polarization, are input into the network to obtain the three-scale prediction tensors; a confidence threshold, typically set to 0.3, is applied to all prediction frames, and those with confidence below the threshold are deleted; finally, redundant prediction frames are removed by non-maximum suppression, and the remaining prediction frames are output as the prediction result and marked on the original image.
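The thresholding and non-maximum-suppression steps above can be sketched as follows; the 0.3 confidence threshold is from the text, while the NMS IoU threshold of 0.5 is an assumption:

```python
def nms(boxes, iou_thresh=0.5, conf_thresh=0.3):
    """boxes: list of (x1, y1, x2, y2, conf). Drops low-confidence frames,
    then greedily suppresses overlapping frames in descending confidence."""
    def iou(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        ua = ((a[2] - a[0]) * (a[3] - a[1])
              + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / ua if ua > 0 else 0.0
    boxes = sorted((b for b in boxes if b[4] >= conf_thresh),
                   key=lambda b: b[4], reverse=True)
    keep = []
    for b in boxes:
        if all(iou(b, k) < iou_thresh for k in keep):
            keep.append(b)
    return keep
```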
As shown in fig. 3a to fig. 3d, experiments show that the sand target detection method provided by the invention trains quickly, completes sand target detection rapidly, is suitable for automatic detection of large numbers of sand images, and has good extensibility, robustness and practicability. Specifically, the detection rate on the sand targets in the test set reaches 97.40%, and the AP reaches 90.98%, fully meeting the requirements of the sand target detection task.
In summary, the invention is based on the task of detecting the sand target, and has the following characteristics:
(1) To prevent the small feature differences between different types of sand grains from degrading the detection precision, the multi-classification part of the traditional target detection framework is discarded, which simplifies the framework structure while improving detection precision;
(2) To make full use of the image features in the data set, a double-ended input network structure taking a single-polarization image and an orthogonal-polarization image as input is designed; compared with a single-input framework, this framework achieves higher detection precision because more image features are available in the training and testing stages.
While the invention has been described in terms of its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.
Claims (10)
1. A sand grain target detection method based on a convolutional neural network is characterized by comprising the following steps:
1) designing a convolution network structure, and adding a double-end input structure and a multi-scale detection structure;
2) preprocessing is carried out based on the sand grain images and the labels;
3) training a convolutional network;
4) and predicting the target position of the sand grain image by applying the trained convolutional network.
2. The sand grain target detection method based on the convolutional neural network as claimed in claim 1, wherein the convolutional network structure in step 1) is formed by stacking a convolutional module and a residual module.
3. The sand grain target detection method based on the convolutional neural network as claimed in claim 2, wherein the convolutional module in step 1) is formed by serially connecting a convolutional layer, a batch normalization layer and an activation layer.
4. The sand grain target detection method based on the convolutional neural network as claimed in claim 2, wherein the residual module in step 1) is formed by cascading a zero padding layer, a convolution unit and a residual structure module.
5. The convolutional neural network-based sand target detection method as claimed in claim 1, wherein the double-ended input structure of the convolutional network structure in step 1) is used to simultaneously input the single-polarized and orthogonal-polarized imaging images into the network, so that the network simultaneously learns the features of the orthogonal-polarized image and the single-polarized image during training; each end of the double-end input structure has the same structure, and parameters are not shared.
6. The convolutional neural network-based sand target detection method as claimed in claim 1, wherein the multi-scale detection structure in step 1) is used for detecting sand targets in images at different scales; the multi-scale detection network uses three scales of large, medium and small; contains 5 modules: a large-scale detection structure, a large-scale medium-scale branch structure, a medium-scale detection structure, a medium-scale small-scale detection structure and a small-scale detection structure.
7. The sand grain target detection method based on the convolutional neural network as claimed in claim 1, wherein the step 2) specifically comprises: constructing a trained data set based on the marked sand grain images, wherein the trained data set comprises two parts: an image part and a label part; the image part is a group of single polarization and orthogonal polarization images; the label part is the position of the target sand grain in each image.
8. The convolutional neural network-based sand target detection method according to claim 7, wherein the preprocessing of the sand image and the annotation in step 2) specifically comprises:
preprocessing a sand image:
21) downsampling or upsampling the image to make the longer side of the image match the input size of the convolution network;
22) the shorter edge is extended to the convolution input size, with an extended pixel value of 125;
23) normalizing the image;
preprocessing the marked data:
24) converting the position information of the annotation data according to the scaling of the image data;
25) comparing the intersection ratio of each piece of position information and the anchor frame, mapping the position information larger than a certain threshold value to a tensor which has the same size as the output tensor, and storing other position information independently;
26) and obtaining the tensors of the three scales, and keeping the tensors consistent with the output tensors of the convolutional network.
9. The sand grain target detection method based on the convolutional neural network as claimed in claim 1, wherein the step 3) specifically comprises: training a convolutional network based on a training data set, comprising: defining an objective function and optimizing a training process.
10. The convolutional neural network-based sand target detection method as claimed in claim 1, wherein the processing procedure for predicting the sand image target position in step 4) is: preprocessing the sand grain image with the trained convolutional network, inputting the two orthogonally polarized and singly polarized sand grain images into the convolutional network to obtain prediction tensors of three scales, setting a confidence threshold of 0.3 for all prediction frames, and deleting the prediction frames with confidence smaller than the threshold; and finally, removing redundant prediction frames by non-maximum suppression, outputting the remaining prediction frames as prediction results, and marking the prediction results on the original sand grain image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010114804.XA CN111353976B (en) | 2020-02-25 | 2020-02-25 | Sand grain target detection method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111353976A true CN111353976A (en) | 2020-06-30 |
CN111353976B CN111353976B (en) | 2023-07-25 |
Family
ID=71197181
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160328838A1 (en) * | 2015-05-01 | 2016-11-10 | Applied Research LLC. | Automatic target recognition system with online machine learning capability |
CN106557758A (en) * | 2016-11-25 | 2017-04-05 | 南京大学 | A kind of multiple target automatic identification method of grains of sand micro-image |
CN107146233A (en) * | 2017-04-24 | 2017-09-08 | 四川大学 | Granulometry Segmentation based on petrographic thin section polarisation sequence chart |
CN109523566A (en) * | 2018-09-18 | 2019-03-26 | 姜枫 | A kind of automatic division method of Sandstone Slice micro-image |
CN109668909A (en) * | 2017-10-13 | 2019-04-23 | 南京敏光视觉智能科技有限公司 | A kind of glass defect detection method |
CN110188720A (en) * | 2019-06-05 | 2019-08-30 | 上海云绅智能科技有限公司 | A kind of object detection method and system based on convolutional neural networks |
CN110348376A (en) * | 2019-07-09 | 2019-10-18 | 华南理工大学 | A kind of pedestrian's real-time detection method neural network based |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113256704A (en) * | 2021-03-26 | 2021-08-13 | 上海师范大学 | Grain length and width measuring method |
CN113256704B (en) * | 2021-03-26 | 2024-04-05 | 上海师范大学 | Grain length and width measuring method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||