CN111353976A - Sand grain target detection method based on convolutional neural network - Google Patents

Sand grain target detection method based on convolutional neural network

Info

Publication number
CN111353976A
Authority
CN
China
Prior art keywords
network
image
sand
convolutional
sand grain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010114804.XA
Other languages
Chinese (zh)
Other versions
CN111353976B (en)
Inventor
Wang Cong (王聪)
Gu Qing (顾庆)
Jiang Zhiwei (蒋智威)
Hao Huizhen (郝慧珍)
Dong Xiaolong (董小龙)
Hu Xiumian (胡修棉)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202010114804.XA
Publication of CN111353976A
Application granted
Publication of CN111353976B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Neural networks; architecture; combinations of networks
    • G06N 3/084: Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06T 7/13: Image analysis; segmentation; edge detection
    • G06T 2207/10004: Image acquisition modality; still image; photographic image
    • G06T 2207/20081: Special algorithmic details; training; learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a sand grain target detection method based on a convolutional neural network, which comprises the following steps: 1) designing a convolutional network structure by stacking convolution modules and residual modules, and adding a double-ended input structure and a multi-scale detection structure; 2) preprocessing the sand grain images and their annotations, and constructing a training data set from the annotated images; 3) training the convolutional network on the training data set, which includes defining the objective function and optimizing the training process; 4) applying the trained network to predict target positions in sand grain images. The invention makes full use of the features of both the single-polarization and orthogonal-polarization images and applies convolutional neural network technology, improving detection accuracy and efficiency. The network trains quickly, completes sand grain target detection rapidly, is suitable for automatic detection of large volumes of sand grain images, and has good extensibility, robustness, and practicality.

Description

Sand grain target detection method based on convolutional neural network
Technical Field
The invention belongs to the field of image detection and identification, and particularly relates to a sand target detection method based on a convolutional neural network.
Background
In the geological field, classifying and counting sand grains has long been an important part of sand research, and detecting sand grain targets in sand images is one of its most basic steps. Traditional classification and statistics first require the sand targets to be marked out in the image by hand, but owing to the particular nature of sand images, manual annotation is inefficient and poorly repeatable when facing large volumes of image data. Specifically, the contrast between target and background in a sand grain image is weak, so manually annotating large amounts of image data tends to yield low accuracy; moreover, annotating sand targets requires a degree of domain expertise, making it difficult to recruit large numbers of annotators for the task.
Object detection is a deep-learning technique for locating specific objects in an image. Training a deep network on a certain amount of annotated data makes the detection process automatic. In the sand grain target detection task, therefore, training the network on a certain amount of annotated sand image data enables automatic detection with high accuracy. A detection method built on object detection technology markedly reduces the manual annotation workload while preserving detection precision.
Current mainstream target detection methods based on deep learning fall into two categories. Two-stage detectors based on candidate regions divide the detection task into two subtasks, feature-box extraction and feature-box classification; they achieve high detection accuracy but at a high time cost. Single-stage detectors based on a single scan treat the detection task as a whole; their detection accuracy is lower than that of two-stage detectors, but so is their time cost. Both families can effectively detect targets in images of natural objects, but applied directly to the sand grain detection task they do not reach satisfactory detection accuracy.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention provides a sand grain target detection method based on a convolutional neural network (CNN). It uses a fully convolutional network with a residual structure, trains the network on an annotated data set, automatically extracts features from the single-polarization and orthogonal-polarization images with the trained network, and finally predicts the sand grain targets in the image from those features.
To achieve this purpose, the invention adopts the following technical scheme:
the sand grain target detection method based on a convolutional neural network disclosed by the invention comprises the following steps:
1) designing a convolutional network structure, adding a double-ended input structure and a multi-scale detection structure;
2) preprocessing the sand grain images and their annotations;
3) training the convolutional network;
4) predicting target positions in sand grain images with the trained network.
Further, the convolutional network structure in step 1) is formed by stacking convolution modules and residual modules.
Further, the convolution module in step 1) is formed by connecting a convolutional layer, a batch normalization (BN) layer, and an activation layer in series.
The convolution strategy of the convolutional layer falls into two kinds according to use: a non-dimensionality-reduction strategy, whose kernels come in the two sizes 1 × 1 and 3 × 3, and a dimensionality-reduction (downsampling) strategy, whose kernel size is 3 × 3.
For the non-dimensionality-reduction strategy with kernel size 1 × 1, the input X is of size C_in × W × H, the kernel K is of size C_out × C_in × 1 × 1, and the output O of the convolutional layer is of size C_out × W × H; at position [c, x, y], with 1 ≤ c ≤ C_out, 1 ≤ x ≤ W, 1 ≤ y ≤ H, its value is:

$$O[c,x,y]=\sum_{c'=1}^{C_{in}}K[c,c',1,1]\cdot X[c',x,y] \qquad (1)$$

For the non-dimensionality-reduction strategy with kernel size 3 × 3, the input X is of size C_in × (W+2) × (H+2) (the W × H input padded by one pixel on each side), the kernel K is of size C_out × C_in × 3 × 3, and the output O is of size C_out × W × H; at position [c, x, y], with 1 ≤ c ≤ C_out, 1 ≤ x ≤ W, 1 ≤ y ≤ H, its value is:

$$O[c,x,y]=\sum_{c'=1}^{C_{in}}\sum_{i=0}^{2}\sum_{j=0}^{2}K[c,c',i,j]\cdot X[c',x+i,y+j] \qquad (2)$$

For the dimensionality-reduction strategy with kernel size 3 × 3 and stride 2, the input is a tensor X of size C_in × (W+1) × (H+1) (W and H even, after zero padding), the kernel K is of size C_out × C_in × 3 × 3, and the output O is of size C_out × (W/2) × (H/2); at position [c, x, y], with 1 ≤ c ≤ C_out, 1 ≤ x ≤ W/2, 1 ≤ y ≤ H/2, its value is:

$$O[c,x,y]=\sum_{c'=1}^{C_{in}}\sum_{i=0}^{2}\sum_{j=0}^{2}K[c,c',i,j]\cdot X[c',2x-1+i,2y-1+j] \qquad (3)$$
The batch normalization layer keeps the input and output identically distributed under small-batch input, preventing the slow training convergence caused by output distribution drift as the number of network layers grows. For each hidden layer, in small-batch training with batch size m and activation inputs x^(k), 1 ≤ k ≤ m, the output of the batch normalization operation is:

$$\hat{x}^{(k)}=\frac{x^{(k)}-\mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}} \qquad (4)$$

where E[·] denotes the mathematical expectation and Var[·] the variance. To preserve the expressive capacity of the network, two learnable parameters γ and β are added to apply an inverse transformation to the normalized activation:

$$y^{(k)}=\gamma\,\hat{x}^{(k)}+\beta \qquad (5)$$
The activation layer adds nonlinearity to the network using the Leaky ReLU function, whose output has a small gradient for negative inputs; it is defined as:

$$y=\mathrm{LReLU}(x)=\begin{cases}x, & x\ge 0\\ kx, & x<0\end{cases} \qquad (6)$$

where LReLU(·) denotes the Leaky ReLU function, x is the input, y is the output, and k is the negative-side gradient.
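As a concrete illustration, a minimal PyTorch-style sketch of such a convolution module follows; the class name and the negative-side gradient k = 0.1 are illustrative assumptions, not values taken from the patent:

```python
import torch.nn as nn

class ConvModule(nn.Module):
    """Convolution module of [1] in FIG. 4: convolution -> batch
    normalization -> Leaky ReLU, following formulas (1)-(6)."""
    def __init__(self, c_in, c_out, kernel_size, stride=1, negative_slope=0.1):
        super().__init__()
        # the 3x3 non-reduction strategy pads by 1 to keep W x H (formula (2));
        # the reduction strategy relies on the external zero padding layer
        padding = 1 if kernel_size == 3 and stride == 1 else 0
        self.conv = nn.Conv2d(c_in, c_out, kernel_size, stride, padding, bias=False)
        self.bn = nn.BatchNorm2d(c_out)          # formulas (4) and (5)
        self.act = nn.LeakyReLU(negative_slope)  # formula (6)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```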
Further, the residual module in step 1) is formed by cascading a zero padding layer, a convolution unit, and residual structure modules.
The zero padding layer enlarges the input to fit the input of the following layer: an input of size C_in × W × H is expanded to C_in × (W+1) × (H+1). The convolution unit uses the dimensionality-reduction kernel defined by formula (3). The residual structure module connects two non-dimensionality-reduction convolution units, with kernel sizes 1 × 1 and 3 × 3 respectively, defined by formulas (1) and (2), through a residual connection. The residual connection uses a shortcut mechanism to alleviate the vanishing-gradient problem that comes with increasing depth, making the neural network easier to optimize.
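The residual module can likewise be sketched in PyTorch. The channel halving inside the residual structure module and the block count are illustrative assumptions:

```python
import torch.nn as nn

def conv_unit(c_in, c_out, k, stride=1):
    """Convolution unit (conv -> BN -> Leaky ReLU), as sketched above."""
    pad = 1 if k == 3 and stride == 1 else 0
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride, pad, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class ResidualStructure(nn.Module):
    """Residual structure module: 1x1 then 3x3 non-reduction units
    (formulas (1) and (2)) joined by a shortcut connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            conv_unit(channels, channels // 2, k=1),  # assumed channel halving
            conv_unit(channels // 2, channels, k=3),
        )

    def forward(self, x):
        return x + self.body(x)  # shortcut eases optimization

class ResidualModule(nn.Module):
    """Residual module: zero padding -> stride-2 3x3 conv (formula (3)) ->
    a stack of residual structure modules."""
    def __init__(self, c_in, c_out, n_blocks=1):
        super().__init__()
        self.pad = nn.ZeroPad2d((1, 0, 1, 0))  # C x W x H -> C x (W+1) x (H+1)
        self.down = conv_unit(c_in, c_out, k=3, stride=2)
        self.blocks = nn.Sequential(*[ResidualStructure(c_out) for _ in range(n_blocks)])

    def forward(self, x):
        return self.blocks(self.down(self.pad(x)))
```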
Further, the double-ended input structure of the convolutional network in step 1) feeds the single-polarization image and the orthogonal-polarization image into the network simultaneously, so that during training the network learns the features of both images at once. The two ends of the double-ended input structure are structurally identical but do not share parameters.
After the input structure, the network merges the two input streams and generates three branches feeding the detection networks of the three scales (large, medium, and small); a sketch of this arrangement is given below.
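One way to realize the double-ended input is sketched here; merging the two streams by channel concatenation is an assumption, since the patent leaves the merge operation to the tables of the detailed description:

```python
import torch
import torch.nn as nn

class DualInputBackbone(nn.Module):
    """Double-ended input: two structurally identical sub-networks that do
    not share parameters, merged into one feature map from which the three
    scale branches are then taken."""
    def __init__(self, make_stream):
        super().__init__()
        self.single_pol = make_stream()  # single-polarization stream
        self.cross_pol = make_stream()   # orthogonal-polarization stream

    def forward(self, x_single, x_cross):
        # assumed merge: channel concatenation of the two feature maps
        return torch.cat([self.single_pol(x_single), self.cross_pol(x_cross)], dim=1)

# example: both streams built from the modules sketched above
# backbone = DualInputBackbone(lambda: nn.Sequential(
#     conv_unit(3, 32, k=3), ResidualModule(32, 64)))
```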
Further, the multi-scale detection structure in step 1) detects sand grain targets in the image at different scales. The multi-scale detection network uses three scales, large, medium, and small, and contains 5 modules: a large-scale detection structure, a large-to-medium-scale branch structure, a medium-scale detection structure, a medium-to-small-scale branch structure, and a small-scale detection structure.
The number of channels of the tensor output at each of the three scales of the network is B × 5, where B is the number of target boxes in the original image that each vector in the tensor maps to, and 5 covers the confidence of the target box plus its offsets in the x direction, the y direction, the width, and the height relative to the corresponding preset anchor box. To obtain accurate target box information, the position and confidence entries of the output tensor are transformed. Let the position offsets of a target box be Δx, Δy, Δw, Δh, where Δx and Δy are the offsets of the box center, and let the predicted confidence be c_o. The transformed prediction is computed as:

$$x=[\mathrm{sigmoid}(\Delta x)+g_x]\cdot s$$
$$y=[\mathrm{sigmoid}(\Delta y)+g_y]\cdot s$$
$$w=p_w\cdot e^{\Delta w}\cdot s$$
$$h=p_h\cdot e^{\Delta h}\cdot s$$
$$c=\mathrm{sigmoid}(c_o)$$

where sigmoid(·) denotes the sigmoid function; g_x and g_y are the coordinates of the grid cell containing the box center; s is the scale factor (8 for the small scale, 16 for the medium scale, 32 for the large scale); and p_w and p_h are the width and height of the anchor box.
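Applied to one output tensor, the transform above can be sketched as follows (NumPy, with an assumed tensor layout of (S, S, B, 5)):

```python
import numpy as np

def decode_predictions(raw, anchors, stride):
    """Decode one output tensor of shape (S, S, B, 5) into absolute boxes.

    Implements the transform above: raw entries are (dx, dy, dw, dh, c_o);
    `anchors` is a (B, 2) array of anchor width/height in grid units and
    `stride` is the scale factor s (8, 16 or 32)."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    S = raw.shape[0]
    gy, gx = np.meshgrid(np.arange(S), np.arange(S), indexing="ij")
    grid = np.stack([gx, gy], axis=-1)[:, :, None, :]  # (S, S, 1, 2) = (g_x, g_y)

    xy = (sigmoid(raw[..., 0:2]) + grid) * stride      # box centers in pixels
    wh = anchors * np.exp(raw[..., 2:4]) * stride      # box sizes in pixels
    conf = sigmoid(raw[..., 4:5])                      # confidence in [0, 1]
    return np.concatenate([xy, wh, conf], axis=-1)
```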
Further, step 2) specifically comprises: constructing a training data set from the annotated sand grain images, consisting of two parts, an image part and an annotation part. The image part is a set of paired single-polarization and orthogonal-polarization images; the annotation part gives the position of each target sand grain in each image.
Further, the preprocessing of the sand grain images and annotations in step 2) specifically comprises (a sketch follows the list below):
preprocessing a sand image:
21) downsampling or upsampling the image so that its longer side matches the input size of the convolutional network;
22) extending the shorter side to the network input size, filling the extended pixels with the value 125;
23) normalizing the image;
preprocessing the annotation data:
24) converting the position information of the annotations according to the scaling applied to the image data;
25) comparing the intersection-over-union of each position entry with the anchor boxes, mapping entries above a given threshold into a tensor of the same size as the output tensor, and storing the remaining entries separately;
26) obtaining tensors at the three scales, consistent with the output tensors of the convolutional network.
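A minimal sketch of the image side of this preprocessing (steps 21–23); the 608 × 608 input size is the example used later in the description, and the nearest-neighbour resize stands in for whatever interpolation a real pipeline would use:

```python
import numpy as np

def preprocess_image(img, input_size=608, pad_value=125):
    """Letterbox-style preprocessing: resize so the longer side matches the
    network input, pad the shorter side with value 125, then normalize.
    `img` is an H x W x C uint8 array; also returns the scale for step 24."""
    h, w = img.shape[:2]
    scale = input_size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize via index maps
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((input_size, input_size, img.shape[2]), pad_value, dtype=img.dtype)
    canvas[:new_h, :new_w] = resized                 # extend the shorter edge
    return canvas.astype(np.float32) / 255.0, scale  # normalized image + scale
```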
Further, step 3) specifically comprises: training the convolutional network on the training data set, which includes defining the objective function and optimizing the training process.
Further, the loss function of the convolutional network in step 3) consists of two parts, a bounding-box loss L_box and a confidence loss L_conf:

$$L=L_{box}+L_{conf} \qquad (7)$$

Each part accounts simultaneously for the losses of the different prediction boxes at the different scales:

$$L_{*}=\sum_{l=1}^{3}\sum_{i=1}^{S_l}\sum_{j=1}^{S_l}\sum_{b=1}^{B}\ell_{*}(l,i,j,b) \qquad (8)$$

where S_l, 1 ≤ l ≤ 3, is the output size at the l-th scale.

The bounding-box loss uses the G-IoU loss. The G-IoU function between two bounding boxes A and B, where A is the prediction box and B is the annotated (true) box, is:

$$\mathrm{GIoU}(A,B)=\mathrm{IoU}(A,B)-\frac{|C\setminus(A\cup B)|}{|C|} \qquad (9)$$

where C is the smallest convex region enclosing A and B, and IoU(A, B) is the intersection-over-union of A and B, defined as:

$$\mathrm{IoU}(A,B)=\frac{|A\cap B|}{|A\cup B|} \qquad (10)$$

Based on the above, the bounding-box loss is:

$$L_{box}=\sum_{l=1}^{3}\sum_{i,j=1}^{S_l}\sum_{b=1}^{B}\mathbb{1}_{l,i,j,b}\cdot C_{l,i,j}\cdot\left(2-\frac{w\cdot h}{W\cdot H}\right)\cdot\bigl(1-\mathrm{GIoU}(o,\hat{o})\bigr) \qquad (11)$$

where C_{l,i,j} is the true confidence of the grid cell; the indicator

$$\mathbb{1}_{l,i,j,b}=\begin{cases}1, & \mathrm{IoU}(o,\hat{o})\ge 0.3\\ 0, & \text{otherwise}\end{cases} \qquad (12)$$

states whether the predicted bounding box is valid: a box with IoU ≥ 0.3 against the true bounding box is considered valid; W and H are the width and height of the input image; o is the true bounding box, w and h its width and height; and ô is the predicted bounding box.

The confidence loss uses the cross-entropy loss with sigmoid and logit:

$$L_{conf}=\sum_{l=1}^{3}\sum_{i,j=1}^{S_l}\sum_{b=1}^{B}\mathrm{CEL}(\hat{C}_{l,i,j},C_{l,i,j}) \qquad (13)$$

where Ĉ_{l,i,j} is the predicted confidence and CEL(·) is the cross-entropy loss with sigmoid and logit:

$$\mathrm{CEL}(X,Y)=-\bigl[\mathrm{logit}(Y)\cdot\log(\sigma(X))+(1-\mathrm{logit}(Y))\cdot\log(1-\sigma(X))\bigr] \qquad (14)$$

where the logit function is:

$$\mathrm{logit}(y)=\log\frac{y}{1-y} \qquad (15)$$
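The G-IoU term at the heart of the bounding-box loss can be computed as below; a plain-Python sketch for axis-aligned boxes assumed to be non-degenerate:

```python
def giou(box_a, box_b):
    """G-IoU between (x1, y1, x2, y2) boxes, formulas (9) and (10)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # smallest enclosing (convex) box C
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    return iou - (area_c - union) / area_c

def giou_loss(pred_box, true_box):
    """Per-box term 1 - GIoU used inside formula (11)."""
    return 1.0 - giou(pred_box, true_box)
```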
further, the optimizer used in the training phase of the convolutional network in the step 3) is an Adam optimizer; the learning rate scheduler adopts Linear cosine scheduling, and the learning rate lr (i) of the training stage i is:
Figure BDA0002391155310000054
wherein, W is the number of training stages of the linear stage; lr of0Setting the basic learning rate as 1 e-6; lr ofmaxThe maximum learning rate is set to 1 e-3.
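A sketch of this schedule follows; the warm-up length W = 5 and total T = 100 are illustrative assumptions, as the patent fixes only lr_0 and lr_max:

```python
import math

def linear_cosine_lr(i, warmup=5, total=100, lr0=1e-6, lr_max=1e-3):
    """Linear warm-up followed by cosine decay, formula (16)."""
    if i <= warmup:
        return lr0 + (lr_max - lr0) * i / warmup  # linear stage
    progress = (i - warmup) / (total - warmup)
    return lr0 + 0.5 * (lr_max - lr0) * (1 + math.cos(math.pi * progress))
```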
Further, the process of predicting target positions in a sand grain image in step 4) is: preprocess the sand grain image, input the pair of orthogonal-polarization and single-polarization sand grain images into the trained convolutional network to obtain the prediction tensors at the three scales, set a confidence threshold of 0.3 for all prediction boxes, and delete the boxes whose confidence falls below the threshold; finally, remove redundant prediction boxes with non-maximum suppression, output the remaining boxes as the prediction result, and mark the result on the original sand grain image.
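The post-processing can be sketched as below; the NMS IoU threshold of 0.5 is an assumed value, since the patent specifies only the confidence threshold of 0.3:

```python
import numpy as np

def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes, formula (10)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def postprocess(boxes, scores, conf_thresh=0.3, nms_iou=0.5):
    """Drop boxes below the confidence threshold, then remove redundant
    boxes with non-maximum suppression; `boxes` is an (N, 4) array."""
    keep = scores >= conf_thresh
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort(-scores)          # highest confidence first
    selected = []
    while order.size > 0:
        best = order[0]
        selected.append(best)
        rest = order[1:]
        ious = np.array([iou_xyxy(boxes[best], boxes[r]) for r in rest])
        order = rest[ious < nms_iou]     # suppress overlapping boxes
    return boxes[selected], scores[selected]
```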
The invention has the following beneficial effects:
the method prevents the small feature differences between different kinds of sand grains from degrading detection accuracy, makes full use of the features of both the single-polarization and orthogonal-polarization images, simplifies the network structure, and improves detection accuracy and efficiency. The network trains quickly, completes sand grain target detection rapidly, is suitable for automatic detection of large volumes of sand images, and has good extensibility, robustness, and practicality.
Drawings
FIG. 1 is an overall block diagram of the process of the present invention;
FIG. 2 is a block diagram of a sand image pre-processing process;
FIG. 3a is a schematic diagram of cross-polarization input;
FIG. 3b is a schematic diagram of a single polarization input;
FIG. 3c is a graph showing the results of cross polarization prediction;
FIG. 3d is a graph showing the single polarization prediction result;
fig. 4 is a structural diagram of a convolutional network designed by the present invention.
Detailed Description
To help those skilled in the art understand the invention, it is further described below with reference to the following examples and drawings, which do not limit the invention.
The invention discloses a sand grain target detection method based on a convolutional neural network which uses a trained CNN to automatically extract features from the paired single-polarization and orthogonal-polarization images of the sand grains and predicts the sand grain targets in the image from these features. Unlike traditional object detection methods, the method drops the classification part of the traditional detection framework and concentrates on predicting target positions; it inputs the orthogonal-polarization and single-polarization images together so as to fully exploit the features of both.
Referring to fig. 1, the invention proceeds in the following steps:
1) designing the convolutional network structure;
2) preprocessing the sand grain images and annotations; a training data set is constructed from the annotated sand images, consisting of an image part, a set of paired single-polarization and orthogonal-polarization images, and an annotation part, the position of each target sand grain in each image;
3) training the convolutional network on the training data set, including defining the objective function and optimizing the training process;
4) predicting target positions in sand grain images with the trained network.
The network structure of step 1) is shown in fig. 4. The network is a fully convolutional network with a residual structure, formed by stacking convolution units and residual modules, and includes a double-ended input structure and a multi-scale detection structure.
As shown at [1] in fig. 4, the convolution module of the network is formed by connecting a convolution (conv) layer, a batch normalization (BN) layer, and a Leaky ReLU activation layer in series.
The convolution strategy of the convolutional layer follows the dimensionality-reduction and non-dimensionality-reduction strategies defined by formulas (1)–(3) above.
The batch normalization layer keeps input and output identically distributed under small-batch input, preventing the slow training convergence caused by output distribution drift as network depth grows. For each hidden layer, the input distribution, which is gradually pushed toward the saturated ends of the nonlinear function's range, is pulled back to an approximately standard normal distribution with mean 0 and variance 1, so that the input to the nonlinear transformation lies in a region sensitive to its input and the vanishing-gradient problem is avoided. For a hidden layer trained in small batches of size m with activation inputs x^(k), 1 ≤ k ≤ m, the batch-normalized output is given by formula (4). Converting the activation input into the linear region of the nonlinear transformation strengthens the flow of back-propagated information and accelerates training convergence, but it also reduces the expressive capacity of the network; to prevent this, the two learnable parameters γ and β of formula (5) apply an inverse transformation to the normalized activation.
The activation layer uses the Leaky ReLU function of formula (6) to add nonlinearity to the network; compared with the plain ReLU, its output has a small gradient k for negative inputs.
As shown at [3] in fig. 4, a residual unit of the network is formed by cascading a zero padding (ZP) layer, a convolution unit, and several residual structure modules. The zero padding layer enlarges the input to fit the following layer: an input of size C_in × W × H is expanded to C_in × (W+1) × (H+1). The convolution unit uses the dimensionality-reduction kernel defined by formula (3). A residual structure module, shown at [2] in fig. 4, connects two non-dimensionality-reduction convolution units with 1 × 1 and 3 × 3 kernels, defined by formulas (1) and (2), through a residual connection. The residual connection uses a shortcut mechanism to alleviate the vanishing-gradient problem that comes with increasing depth: an identity mapping establishes a direct channel between input and output, letting the network concentrate on learning the residual between them and making it easier to optimize.
The double-ended input structure of the network feeds the single-polarization image and the orthogonal-polarization image into the network simultaneously, so that the network learns the features of both during training; in the test phase, the network can then predict sand grain positions in the image more accurately from the pair of images. The two ends of the structure are identical but do not share parameters. One possible input sub-network structure is shown at [4] in fig. 4 and in Table 1.
TABLE 1 [the table content is available only as an image in the original publication]
After the input structure, the network merges the two input streams and generates three branches feeding the detection networks of the large, medium, and small scales respectively. One possible merging and multi-scale generating structure is shown at [5] in fig. 4 and in Table 2.
TABLE 2 [the table content is available only as an image in the original publication]
The multi-scale detection structure of the network detects sand grain targets in the image at different scales. In particular, small targets are detected better at a smaller scale, and large targets at a larger scale. In general, the more detection scales the multi-scale structure uses, the higher the model's detection accuracy, but the longer the detection may take. To balance detection time against accuracy, the network of this embodiment uses a three-scale (large, medium, small) detection structure. The multi-scale detection network uses the FPN structure; one possible three-scale detection network is shown at [6] to [10] in fig. 4. Specifically, it contains 5 modules: a large-scale detection structure, a large-to-medium-scale branch structure, a medium-scale detection structure, a medium-to-small-scale branch structure, and a small-scale detection structure.
The large-scale detection structure is shown at [6] in fig. 4 and in Table 3.
TABLE 3 [the table content is available only as an image in the original publication]
The large-to-medium-scale branch structure is shown at [7] in fig. 4 and in Table 4.
TABLE 4 [the table content is available only as an image in the original publication]
The medium-scale detection structure is shown at [8] in fig. 4 and in Table 5.
TABLE 5 [the table content is available only as an image in the original publication]
The medium-to-small-scale branch structure is shown at [9] in fig. 4 and in Table 6.
TABLE 6 [the table content is available only as an image in the original publication]
The small-scale detection structure is shown at [10] in fig. 4 and in Table 7.
TABLE 7 [the table content is available only as an image in the original publication]
The number of channels of the tensor output at each of the three scales is B × 5, where B is the number of target boxes (regions of interest) in the original image predicted by each vector in the tensor, and 5 covers the predicted box confidence plus the offsets in the x direction, the y direction, the width, and the height relative to the corresponding anchor box.
Assuming an input image of size 608 × 608 and B = 3, the CNN outputs 3 tensors: a small-scale tensor T_s of size 76 × 76 × 15, a medium-scale tensor T_m of size 38 × 38 × 15, and a large-scale tensor T_l of size 19 × 19 × 15.
The position and confidence entries of each output tensor are then transformed: given the position offsets Δx, Δy, Δw, Δh of a target box, where Δx and Δy are the offsets of the box center, and the predicted confidence c_o, the transformed prediction is computed by the formulas given above in the disclosure.
After the output tensors of the CNN are decoded in this way, the data in them directly represent the target box information predicted by the network.
The training phase first preprocesses the data set, which contains two parts: an image part, a set of paired single-polarization and orthogonal-polarization images, and an annotation part, the target positions in the images. The purpose of preprocessing is to match the image data to the input of the CNN and the annotation data to its output.
The preprocessing of the image data in step 2) is shown in fig. 2. First, the image is downsampled or upsampled so that its longer side matches the CNN input size; second, the shorter side is extended to the input size, the extended pixels generally taking the value 125; finally, the image is normalized. Compared with the traditional scheme of resampling both sides directly to the CNN input size, this preserves the aspect ratio of the targets in the image and prevents the network from learning distorted target information.
The preprocessing of the annotation data proceeds as follows: first, the position information of the annotations is converted according to the scaling applied to the image data; second, the intersection-over-union of each position entry with the anchor boxes is compared, entries above a given threshold are mapped into a tensor of the same size as the output tensor, and the remaining entries are stored separately; finally, tensors at the three scales, matching the sizes of the CNN output tensors, are obtained.
The loss function of the convolutional network in step 3) consists of the bounding-box loss and the confidence loss as defined in formulas (7)–(15) above. The optimizer used in the training phase is the Adam optimizer, and the learning-rate scheduler uses the linear-cosine schedule of formula (16), with the base learning rate lr_0 set to 1e-6 and the maximum learning rate lr_max set to 1e-3.
The specific detection process of step 4) is: first, preprocess the sand grain image; then input the pair of orthogonal-polarization and single-polarization images to be detected into the network to obtain the three-scale prediction tensors, set a confidence threshold for all prediction boxes, generally 0.3, and delete boxes whose confidence falls below it; finally, remove redundant prediction boxes with non-maximum suppression, output the remaining boxes as the prediction result, and mark the result on the original image.
As shown in fig. 3a to fig. 3d, experiments show that the sand grain target detection method of the invention trains quickly, completes sand target detection rapidly, is suitable for automatic detection of large volumes of sand images, and has good extensibility, robustness, and practicality. Specifically, the detection rate of sand grain targets on the test set reaches 97.40%, and the AP reaches 90.98%, fully meeting the requirements of the sand grain target detection task.
In summary, addressing the sand grain target detection task, the invention has the following characteristics:
(1) to prevent the small feature differences between different kinds of sand grains from degrading detection accuracy, the multi-class classification part of the traditional detection framework is dropped, simplifying the framework while improving detection accuracy;
(2) to make full use of the image features in the data set, a double-ended input network structure taking the single-polarization and orthogonal-polarization images as input is designed; compared with a single-input framework, it has more image features available in the training and test phases and hence higher detection accuracy.
While the invention has been described in terms of its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. A sand grain target detection method based on a convolutional neural network, characterized by comprising the following steps:
1) designing a convolutional network structure, adding a double-ended input structure and a multi-scale detection structure;
2) preprocessing the sand grain images and their annotations;
3) training the convolutional network;
4) predicting target positions in sand grain images with the trained network.
2. The sand grain target detection method based on a convolutional neural network according to claim 1, characterized in that the convolutional network structure in step 1) is formed by stacking convolution modules and residual modules.
3. The sand grain target detection method based on a convolutional neural network according to claim 2, characterized in that the convolution module in step 1) is formed by connecting a convolutional layer, a batch normalization layer, and an activation layer in series.
4. The sand grain target detection method based on a convolutional neural network according to claim 2, characterized in that the residual module in step 1) is formed by cascading a zero padding layer, a convolution unit, and a residual structure module.
5. The sand grain target detection method based on a convolutional neural network according to claim 1, characterized in that the double-ended input structure of the convolutional network in step 1) feeds the single-polarization and orthogonal-polarization images into the network simultaneously, so that during training the network learns the features of both images at once; the two ends of the double-ended input structure are structurally identical but do not share parameters.
6. The sand grain target detection method based on a convolutional neural network according to claim 1, characterized in that the multi-scale detection structure in step 1) detects sand grain targets in the image at different scales; the multi-scale detection network uses three scales, large, medium, and small, and contains 5 modules: a large-scale detection structure, a large-to-medium-scale branch structure, a medium-scale detection structure, a medium-to-small-scale branch structure, and a small-scale detection structure.
7. The sand grain target detection method based on a convolutional neural network according to claim 1, characterized in that step 2) specifically comprises: constructing a training data set from the annotated sand grain images, the training data set comprising two parts, an image part and an annotation part; the image part is a set of paired single-polarization and orthogonal-polarization images; the annotation part is the position of each target sand grain in each image.
8. The sand grain target detection method based on a convolutional neural network according to claim 7, characterized in that the preprocessing of the sand grain images and annotations in step 2) specifically comprises:
preprocessing a sand image:
21) downsampling or upsampling the image so that its longer side matches the input size of the convolutional network;
22) extending the shorter side to the network input size, filling the extended pixels with the value 125;
23) normalizing the image;
preprocessing the annotation data:
24) converting the position information of the annotations according to the scaling applied to the image data;
25) comparing the intersection-over-union of each position entry with the anchor boxes, mapping entries above a given threshold into a tensor of the same size as the output tensor, and storing the remaining entries separately;
26) obtaining tensors at the three scales, consistent with the output tensors of the convolutional network.
9. The sand grain target detection method based on a convolutional neural network according to claim 1, characterized in that step 3) specifically comprises: training the convolutional network on the training data set, including defining an objective function and optimizing the training process.
10. The sand grain target detection method based on a convolutional neural network according to claim 1, characterized in that the process of predicting target positions in a sand grain image in step 4) is: preprocessing the sand grain image, inputting the pair of orthogonal-polarization and single-polarization sand grain images into the trained convolutional network to obtain prediction tensors at the three scales, setting a confidence threshold of 0.3 for all prediction boxes and deleting boxes whose confidence falls below it; finally, removing redundant prediction boxes with non-maximum suppression, outputting the remaining boxes as the prediction result, and marking the result on the original sand grain image.
CN202010114804.XA 2020-02-25 2020-02-25 Sand grain target detection method based on convolutional neural network Active CN111353976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010114804.XA CN111353976B (en) 2020-02-25 2020-02-25 Sand grain target detection method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010114804.XA CN111353976B (en) 2020-02-25 2020-02-25 Sand grain target detection method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN111353976A true CN111353976A (en) 2020-06-30
CN111353976B CN111353976B (en) 2023-07-25

Family

ID=71197181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010114804.XA Active CN111353976B (en) 2020-02-25 2020-02-25 Sand grain target detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111353976B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328838A1 (en) * 2015-05-01 2016-11-10 Applied Research LLC. Automatic target recognition system with online machine learning capability
CN106557758A (en) * 2016-11-25 2017-04-05 南京大学 A kind of multiple target automatic identification method of grains of sand micro-image
CN107146233A (en) * 2017-04-24 2017-09-08 四川大学 Granulometry Segmentation based on petrographic thin section polarisation sequence chart
CN109668909A (en) * 2017-10-13 2019-04-23 南京敏光视觉智能科技有限公司 A kind of glass defect detection method
CN109523566A (en) * 2018-09-18 2019-03-26 姜枫 A kind of automatic division method of Sandstone Slice micro-image
CN110188720A (en) * 2019-06-05 2019-08-30 上海云绅智能科技有限公司 A kind of object detection method and system based on convolutional neural networks
CN110348376A (en) * 2019-07-09 2019-10-18 华南理工大学 A kind of pedestrian's real-time detection method neural network based

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256704A (en) * 2021-03-26 2021-08-13 上海师范大学 Grain length and width measuring method
CN113256704B (en) * 2021-03-26 2024-04-05 上海师范大学 Grain length and width measuring method

Also Published As

Publication number Publication date
CN111353976B (en) 2023-07-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant