CN115346125B - Target detection method based on deep learning - Google Patents
- Publication number
- CN115346125B (CN202211270276.2A)
- Authority
- CN
- China
- Prior art keywords
- training
- layer
- target detection
- data set
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a target detection method based on deep learning, comprising: preprocessing an image to be detected and establishing a data set, wherein the data set comprises a target sample data set and a non-target sample data set; performing loss training and regression training on the data set in sequence to expand the target samples; extracting the feature vectors of all target samples, determining an adjacency matrix from the feature vectors with the target samples as nodes, and establishing a weighted undirected graph; constructing an initial target detection model based on deep learning, and performing first training on the initial target detection model using the weighted undirected graph and a training set from the non-target sample data set; if the number of training iterations reaches a preset number, performing second training using an optimization module; and if the loss function converges after the second training, stopping the training to obtain a target detection model, and performing target detection on an input image using the target detection model. The target detection model built by the invention has better robustness.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method based on deep learning.
Background
The traditional target detection algorithm can be divided into three steps: region selection, feature extraction, and classification. Feature extraction relies on manually selected image features, so the features are limited and robustness is poor. Neural networks changed this situation: they do not depend on manual feature extraction, and they developed rapidly after a major breakthrough in the field of image classification. Target detection algorithms based on deep learning have now become the mainstream of target detection research.
As the field continues to develop, models for target detection algorithms have multiplied, and manually optimizing them has become very inefficient given the huge number of models. Automated approaches also require substantial computation to select a qualified model and are time-consuming.
Disclosure of Invention
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the invention provides a target detection method based on deep learning, which addresses the low detection speed of existing detection models and the low detection precision caused by small sample sizes.
In order to solve the above technical problems, the present invention provides the following technical solutions, including: preprocessing an image to be detected, and establishing a data set, wherein the data set comprises a target sample data set and a non-target sample data set; carrying out loss training and regression training on the data set in sequence to expand a target sample; extracting feature vectors of all target samples, determining an adjacency matrix according to the feature vectors by taking the target samples as nodes, and establishing a weighted undirected graph; constructing an initial target detection model based on deep learning, and performing first training on the initial target detection model through a weighted undirected graph and a training set in a non-target sample data set; if the training times reach the preset training times, performing second training by using the optimization module; and if the loss function is converged after the second training, stopping the training to obtain a target detection model, and performing target detection on the input image by using the target detection model.
As a preferable aspect of the deep learning-based target detection method of the present invention, the preprocessing comprises noise reduction, binarization and normalization. Noise interference in the image to be detected is suppressed by a filtering module, which comprises a plurality of two-dimensional filter matrices, a light-shielding storage unit and at least one CMOS sensor array; at least N rows of gratings of the image to be detected are exposed through the CMOS sensor array, and the charges generated by the exposure are transferred to the light-shielding storage unit at time t, so as to capture the pixel points of M pixel regions in the image to be detected; finally, the pixel values of the pixel points are convolved with the two-dimensional filter matrices to complete the noise reduction. The denoised image is binarized to obtain a binary image, which is segmented according to its connected domains to separate target samples from non-target samples and obtain the non-target sample data set, wherein a target sample contains at least one object to be detected and a non-target sample contains none. Target samples containing objects to be detected of different sizes are sampled at the same ratio to obtain data blocks, which are normalized to obtain the target sample data set; 70% of the data set is used as the training set and 30% as the test set.
As a preferable aspect of the deep learning-based target detection method of the present invention, wherein: the loss training and the regression training include: performing loss training on the data set by adopting a cross entropy loss function; and performing regression training on the target sample data set by adopting a Smooth L1 loss function.
As a preferable aspect of the deep learning-based target detection method of the present invention, wherein: the feature vector includes: extracting multilayer semantic features of all target samples, and successively performing down-sampling and up-sampling on the multilayer semantic features according to a preset sampling rate to obtain a first feature vector; and fusing the first feature vector and the multilayer semantic features to obtain a second feature vector, and successively performing convolution and downsampling on the second feature vector to obtain the feature vector.
As a preferable aspect of the deep learning-based target detection method of the present invention, establishing the weighted undirected graph comprises: determining an adjacency matrix element $K_{n,m}$ from the feature vectors:
where $\mu$ is a distance function, $D_n$ is the feature vector of node n, and $D_m$ is the feature vector of node m;
establishing a weighted undirected graph G according to the adjacency matrix elements $K_{n,m}$:
$G = (K, R)$
where K is the adjacency matrix set and R is the edge weight between nodes.
As a preferable aspect of the deep learning-based target detection method of the present invention, the construction of the initial target detection model comprises the following. The initial target detection model consists of convolution layers, an LSTM layer, a residual network layer, a fully connected layer and an output layer. The convolution layers comprise a first convolution layer with a 3 × 3 convolution kernel, the LSTM layer, a second convolution layer with a 1 × 1 convolution kernel and a third convolution layer with a 5 × 5 convolution kernel. The residual network layer comprises a fourth convolution layer with a 1 × 1 convolution kernel, a spatial pyramid pooling layer with a 2 × 2 pooling window, a fifth convolution layer with a 3 × 3 convolution kernel and a sixth convolution layer with a 1 × 1 convolution kernel. The first convolution layer serves as the input layer of the initial target detection model; the LSTM layer is connected to the output of the first convolution layer and to the input of the second convolution layer; the second convolution layer outputs to the third convolution layer through a Leaky ReLU activation function; the third convolution layer outputs to the fourth convolution layer through a ReLU activation function; the fourth convolution layer, the spatial pyramid pooling layer, the fifth convolution layer and the sixth convolution layer output through a MaxOut activation function; the features extracted by the sixth convolution layer are nonlinearly combined by the fully connected layer and passed to the output layer, which outputs the detection result through a softmax function.
As a preferable aspect of the deep learning-based target detection method of the present invention, the first training comprises: inputting 30% of the training set, drawn from the weighted undirected graph and the non-target sample data set, into the initial target detection model and performing the first training by stochastic gradient descent; when the number of training iterations reaches 20, freezing the first convolution layer, the LSTM layer and the second convolution layer, and training the remaining layers with the remaining 70% of the training set; and performing the second training when the number of training iterations reaches the preset number.
As a preferable aspect of the deep learning-based target detection method of the present invention, the optimization module comprises: setting a learning rate, and constructing a loss function Loss based on the model weight N obtained after the first training:
where y is the output value of the initial target detection model, $\hat{y}$ is the predicted value of the initial target detection model, $\lambda$ is the balance factor, $\gamma$ is the learning rate, $L_{pos}$ is the loss value of the target samples, and $L_{neg}$ is the loss value of the non-target samples;
performing the second training using the whale optimization algorithm (WOA), and stopping the training when the loss function value reaches its minimum, that is, converges, so as to obtain the optimal model weights.
As a preferable aspect of the deep learning-based target detection method of the present invention, wherein: the second training includes: taking the balance factors and the model weight as whale individuals, and initializing the number of the whale individuals, the maximum iteration times T and the number of initial target detection model neurons; randomly generating the position of the whale individual, and calculating the fitness of the whale individual; updating the position of the individual whale, calculating the fitness of the individual whale at the moment, and selecting the optimal individual according to the fitness; stopping training when the loss function value reaches the minimum and the model precision meets the requirement, and obtaining the optimal model weight; wherein, the individual fitness of whale includes:
$F = 1/\mathrm{Loss}$
wherein F is the fitness of the whale individual.
As a preferable aspect of the deep learning-based target detection method of the present invention, wherein: the model precision comprises: and inputting the test set into the model after the first training for testing, and performing precision evaluation by adopting an accuracy ACC.
The invention has the beneficial effects that: the method builds the target detection model based on deep learning, and performs optimization training on the target detection model by setting a secondary training mechanism, so that the method has better robustness.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
fig. 1 is a schematic flowchart of a target detection method based on deep learning according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of a target detection model of the target detection method based on deep learning according to the first embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to fig. 1 to 2, a first embodiment of the present invention provides a deep learning-based target detection method, including:
S1: Preprocessing an image to be detected and establishing a data set, wherein the data set comprises a target sample data set and a non-target sample data set.
In order to better detect the target, the embodiment first performs preprocessing, i.e., denoising, binarization and normalization, on the image to be detected, specifically:
(1) Noise reduction processing
Noise interference in the image to be detected is suppressed by a filtering module, which comprises a plurality of two-dimensional filter matrices, a light-shielding storage unit and at least one CMOS sensor array; at least N rows of gratings of the image to be detected are exposed through the CMOS sensor array, and the charges generated by the exposure are transferred to the light-shielding storage unit at time t, so as to capture the pixel points of M pixel regions in the image to be detected; finally, the pixel values of the pixel points are convolved with the two-dimensional filter matrices to complete the noise reduction;
(2) Binarization
Performing binarization processing on the denoised image to obtain a binary image, segmenting according to a connected domain of the binary image to divide a target sample and a non-target sample to obtain a non-target sample data set, wherein the target sample at least comprises an object to be detected, and the non-target sample does not comprise the object to be detected; the object to be detected may be, for example, a vehicle, a radar, a crowd, a ship, etc.
(3) Normalization
Sampling target samples containing objects to be detected with different sizes according to the same proportion to obtain data blocks, and performing normalization operation to obtain a target sample data set; wherein 70% of the data set is used as the training set and 30% of the data set is used as the test set.
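The software side of the three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patented filtering module: the CMOS exposure and charge transfer are hardware steps and are not modelled, and the function names, the replicated-border handling, the fixed threshold, the 4-connectivity and the min-max normalization are all assumptions made for the example.

```python
from collections import deque


def convolve2d(image, kernel):
    """Convolve a 2-D image (list of rows) with an odd-sized square kernel.
    Border pixels are replicated, which is an assumption of this sketch."""
    h, w, k = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    # The flipped kernel index makes this a true convolution.
                    out[r][c] += image[rr][cc] * kernel[k - dr][k - dc]
    return out


def binarize(image, threshold):
    """Map each pixel to 1 if it exceeds the (assumed fixed) threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in image]


def connected_components(binary):
    """Return the 4-connected foreground regions as lists of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def min_max_normalize(block):
    """Scale a data block to the range [0, 1]; a constant block maps to zeros."""
    flat = [v for row in block for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in block]


def split_70_30(samples):
    """Split samples into a 70% training set and a 30% test set."""
    cut = len(samples) * 7 // 10
    return samples[:cut], samples[cut:]
```

Each connected region returned by `connected_components` would then be judged as a target or non-target sample; how that judgement is made is not specified in the text and is left out of the sketch.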
S2: Performing loss training and regression training on the data set in sequence to expand the target samples.
To improve detection precision, a large number of samples is needed to train the model, but target samples are difficult to acquire. This embodiment therefore expands the acquired target samples through loss training and regression training, specifically:
(1) Performing loss training on the data set using a cross-entropy loss function (Cross Entropy Loss);
(2) Performing regression training on the target sample data set using a Smooth L1 loss function.
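For reference, the two loss functions named in step S2 can be written out as below. This is a sketch of the standard definitions only; the `beta` transition point and the per-sample averaging are assumptions, and the code does not show how the embodiment uses the losses to expand the target samples.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy for one sample: -log of the probability of the true class."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(probs[label], eps))


def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over paired values (e.g. bounding-box coordinates).
    Quadratic for |error| < beta, linear otherwise; beta is an assumed default."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)
```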
S3: Extracting the feature vectors of all target samples, determining an adjacency matrix from the feature vectors with the target samples as nodes, and establishing a weighted undirected graph.
(1) Extracting multilayer semantic features of all target samples, and successively down-sampling and up-sampling the multilayer semantic features at a preset sampling rate to obtain a first feature vector;
(2) Fusing the first feature vector with the multilayer semantic features to obtain a second feature vector, and successively performing convolution and down-sampling on the second feature vector to obtain the feature vector.
Further, an adjacency matrix is determined from the feature vectors, and a weighted undirected graph is established:
(1) Determining an adjacency matrix element $K_{n,m}$ from the feature vectors:
where $\mu$ is a distance function, $D_n$ is the feature vector of node n, and $D_m$ is the feature vector of node m;
(2) Establishing a weighted undirected graph G according to the adjacency matrix elements $K_{n,m}$:
$G = (K, R)$
where K is the adjacency matrix set and R is the edge weight between nodes.
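A minimal sketch of the graph construction follows. The formula for the adjacency matrix element is not reproduced in this text, so the sketch assumes the simplest reading, that each element is the distance function applied to the two feature vectors, and it assumes a Euclidean distance and edge weights taken directly from the matrix. The function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance, used here as an assumed choice for the distance function."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_weighted_graph(features, mu=euclidean):
    """Build a symmetric adjacency matrix K and an edge-weight map R.
    Assumes K[n][m] = mu(D_n, D_m); the exact formula is not given in the text."""
    n = len(features)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            K[i][j] = K[j][i] = mu(features[i], features[j])
    # One weight per undirected edge, keyed by the ordered node pair (i < j).
    R = {(i, j): K[i][j] for i in range(n) for j in range(i + 1, n)}
    return K, R
```

Because the matrix is filled symmetrically, the resulting graph is undirected by construction.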
S4: Constructing an initial target detection model based on deep learning, and performing first training on the initial target detection model using the weighted undirected graph and a training set from the non-target sample data set.
Referring to fig. 2, the initial target detection model consists of convolution layers, an LSTM layer, a residual network layer, a fully connected layer and an output layer. The convolution layers comprise a first convolution layer with a 3 × 3 convolution kernel, the LSTM layer, a second convolution layer with a 1 × 1 convolution kernel and a third convolution layer with a 5 × 5 convolution kernel, which effectively enlarges the receptive field. The residual network layer comprises a fourth convolution layer with a 1 × 1 convolution kernel, a spatial pyramid pooling layer with a 2 × 2 pooling window, a fifth convolution layer with a 3 × 3 convolution kernel and a sixth convolution layer with a 1 × 1 convolution kernel; preferably, the residual network is optimized by combining it with the pooling layer, which improves the detection of dense targets.
The first convolution layer serves as the input layer of the initial target detection model; the LSTM layer is connected to the output of the first convolution layer and to the input of the second convolution layer; the second convolution layer outputs to the third convolution layer through a Leaky ReLU activation function; the third convolution layer outputs to the fourth convolution layer through a ReLU activation function; the fourth convolution layer, the spatial pyramid pooling layer, the fifth convolution layer and the sixth convolution layer output through a MaxOut activation function; the features extracted by the sixth convolution layer are nonlinearly combined by the fully connected layer and passed to the output layer, which outputs the detection result through a softmax function.
Preferably, the fourth convolution layer first reduces the dimension, so that the fifth convolution layer operates with a reduced amount of computation; the spatial pyramid pooling layer then converts the convolution features of the fourth convolution layer to the same dimension, so that the fifth convolution layer can process images of any size; and the sixth convolution layer then restores the dimension. This maintains precision while reducing the amount of computation and greatly increasing the detection speed.
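The four activation functions named in the model description can be sketched on scalars and short lists as follows. These are the standard definitions; the Leaky ReLU slope of 0.01 is an assumed default, and MaxOut is shown in its usual form as the maximum over a group of candidate pre-activations.

```python
import math


def relu(x):
    """ReLU: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: negatives are scaled by a small slope (assumed 0.01)."""
    return x if x > 0 else slope * x


def maxout(candidates):
    """MaxOut: the largest of several candidate linear responses."""
    return max(candidates)


def softmax(logits):
    """Softmax over a list of scores, shifted by the maximum for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```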
Further, the initial target detection model is trained for the first time using the weighted undirected graph and the training set from the non-target sample data set:
30% of the training set, drawn from the weighted undirected graph and the non-target sample data set, is input into the initial target detection model, and the first training is performed by stochastic gradient descent. When the number of training iterations reaches 20, the first convolution layer, the LSTM layer and the second convolution layer are frozen, and the remaining layers are trained with the remaining 70% of the training set. The second training is performed when the number of training iterations reaches the preset number.
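The freezing schedule of the first training can be sketched as a small helper that reports which layers are still updated in a given iteration. The layer names are hypothetical labels for the layers listed above, and treating "training times" as a 1-based iteration counter is an assumption of this sketch.

```python
# Hypothetical layer labels, in the order given in the model description.
LAYER_ORDER = ["conv1", "lstm", "conv2", "conv3", "conv4",
               "spp", "conv5", "conv6", "fc", "output"]

# Layers frozen once the iteration count reaches the threshold.
FROZEN_AFTER_WARMUP = {"conv1", "lstm", "conv2"}


def trainable_layers(iteration, freeze_at=20):
    """Return the layers updated during the given 1-based iteration.
    All layers train up to and including iteration `freeze_at`; afterwards
    the first convolution layer, LSTM layer and second convolution layer
    are held fixed."""
    if iteration <= freeze_at:
        return list(LAYER_ORDER)
    return [name for name in LAYER_ORDER if name not in FROZEN_AFTER_WARMUP]


def split_training_set(samples, first_percent=30):
    """Split the training set into the 30% used before freezing and the
    remaining 70% used afterwards."""
    cut = len(samples) * first_percent // 100
    return samples[:cut], samples[cut:]
```

In a deep learning framework, freezing would typically be done by disabling gradient updates for the parameters of the frozen layers; that step is framework-specific and is not shown here.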
S5: if the training times reach the preset training times, performing second training by using the optimization module; and if the loss function is converged after the second training, stopping the training to obtain a target detection model, and performing target detection on the input image by using the target detection model.
The optimization module comprises:
(1) Setting a learning rate, and constructing a loss function Loss based on the model weight N obtained after the first training:
where y is the output value of the initial target detection model, $\hat{y}$ is the predicted value of the initial target detection model, $\lambda$ is the balance factor, $\gamma$ is the learning rate, $L_{pos}$ is the loss value of the target samples, and $L_{neg}$ is the loss value of the non-target samples;
This embodiment sets the learning rate to 0.01, since a learning rate that is too small may cause overfitting, and sets dropout to 0.7 to prevent overfitting while accelerating training.
(2) Performing the second training using the whale optimization algorithm (WOA), and stopping the training when the loss function value reaches its minimum, that is, converges, so as to obtain the optimal model weights.
a) Taking the balance factors and the model weight as whale individuals, and initializing the number of the whale individuals, the maximum iteration times T and the number of initial target detection model neurons;
b) Randomly generating the position of the whale individual, and calculating the fitness of the whale individual;
The fitness of a whale individual is:
$F = 1/\mathrm{Loss}$
wherein F is the fitness of the whale individual.
c) Updating the position of the individual whale, calculating the fitness of the individual whale at the moment, and selecting the optimal individual according to the fitness;
The position of the whale individual is updated according to the following formula:
$X = X_q - aD$
where X is the updated position of the whale individual, a is a random number in (0, 1), and D is the distance between the whale and the prey.
The fitness obtained in step c) is compared with that obtained in step b), and the individual with the higher fitness is selected as the optimal individual.
d) Stopping training when the loss function value reaches the minimum and the model precision meets the requirement, and obtaining the optimal model weight; and inputting the test set into the model after the first training for testing, and performing precision evaluation by adopting an accuracy ACC.
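Steps a) to d) can be illustrated with the simplified search loop below. It is a sketch under several assumptions, not the embodiment's optimizer: the source formula's reference position is taken to be the best position found so far, D is taken as the element-wise absolute distance to that position, each whale is a plain parameter vector standing in for the balance factor and model weights, and `loss_fn` is a hypothetical callable. The full whale optimization algorithm also includes spiral and random-search phases that are omitted here.

```python
import random


def fitness(loss_value):
    """Fitness of a whale individual: F = 1 / Loss."""
    return 1.0 / loss_value


def woa_sketch(loss_fn, dim, n_whales=10, max_iter=50, seed=0):
    """Simplified whale search using only the update X = X_q - a * D."""
    rng = random.Random(seed)
    # Steps a) and b): random initial positions, then pick the fittest as X_q.
    whales = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_whales)]
    best = max(whales, key=lambda w: fitness(loss_fn(w)))[:]
    for _ in range(max_iter):
        for i, whale in enumerate(whales):
            a = rng.random()  # drawn from [0, 1)
            # Step c): move relative to the best position, D = |X_q - X|.
            candidate = [bq - a * abs(bq - x) for bq, x in zip(best, whale)]
            if fitness(loss_fn(candidate)) > fitness(loss_fn(whale)):
                whales[i] = candidate
        leader = max(whales, key=lambda w: fitness(loss_fn(w)))
        if fitness(loss_fn(leader)) > fitness(loss_fn(best)):
            best = leader[:]
    # Step d) is reduced here to a fixed iteration budget.
    return best, loss_fn(best)
```

The loss passed in must be strictly positive so that the fitness is defined; a small constant can be added to a loss that may reach zero.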
Preferably, the method combines stochastic gradient descent with the WOA algorithm to train the target detection model, thereby obtaining the optimal model weights and enhancing robustness.
Example 2
To verify and explain the technical effects of the method, this embodiment runs a comparison test among a CNN target detection algorithm, an RPN-based target detection method, and the present method, and compares the test results to verify the actual effect of the method.
The detection accuracy of the CNN target detection algorithm is high, but the corresponding calculation amount is large, so that the time cost is too high, and the target detection method based on the RPN network can only be trained through small-batch data, so that the detection performance is poor.
In order to verify that the method has higher detection precision and higher detection speed compared with a CNN target detection algorithm and an RPN network-based target detection method, in this embodiment, the existing technical scheme (CNN target detection algorithm and RPN network-based target detection method) and the method are respectively adopted to perform detection comparison on 500 vehicle images to be detected, accuracy (ACC), recall rate (TPR), precision (PRE) and F1-score (F1) are adopted as measurement indexes of each method, and the results are shown in table 1.
In the formulas, TP (true positives) is the number of positive samples that are correctly detected; FP (false positives) is the number of negative samples that are incorrectly detected as positive; FN (false negatives) is the number of positive samples that are missed; and TN (true negatives) is the number of negative samples that are correctly not detected.
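The four measurement indexes can be computed from these counts using their standard definitions, sketched below. The function name is hypothetical, and the counts in the usage line are illustrative values, not results from Table 1.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute ACC, TPR (recall), PRE (precision) and F1 from confusion counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)   # ACC = (TP + TN) / all samples
    tpr = tp / (tp + fn)                    # TPR = TP / (TP + FN)
    pre = tp / (tp + fp)                    # PRE = TP / (TP + FP)
    f1 = 2 * pre * tpr / (pre + tpr)        # F1 = harmonic mean of PRE and TPR
    return {"ACC": acc, "TPR": tpr, "PRE": pre, "F1": f1}


# Illustrative counts only.
example = detection_metrics(tp=40, fp=10, fn=10, tn=40)
```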
Table 1: target detection performance of different approaches.
As is clear from the data in Table 1, the method has good performance in target detection, and each performance index is superior to a CNN target detection algorithm and a target detection method based on an RPN network.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated onto a computing platform, such as a hard disk, optically read and/or write storage media, RAM, ROM, etc., so that it is readable by a programmable computer, which when read by the computer can be used to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions should be covered by the claims of the present invention.
Claims (8)
1. A target detection method based on deep learning, characterized by comprising the following steps:
preprocessing an image to be detected and establishing a data set, wherein the data set comprises a target sample data set and a non-target sample data set;
performing loss training and regression training on the data set in sequence to expand the target samples;
extracting feature vectors of all target samples, determining an adjacency matrix from the feature vectors with the target samples as nodes, and establishing a weighted undirected graph;
constructing an initial target detection model based on deep learning, and performing first training on the initial target detection model using the weighted undirected graph and a training set in the non-target sample data set;
when the number of training iterations reaches a preset number, performing second training using an optimization module;
if the loss function has converged after the second training, stopping the training to obtain a target detection model, and performing target detection on an input image using the target detection model;
wherein the optimization module comprises:
setting a learning rate, and constructing a loss function Loss based on the model weight N obtained after the first training:
$$\mathrm{Loss} = N\left[-y \lg y' - (1-y)\lg(1-y)\right] + \gamma\left[\lambda L_{pos} + (1-\lambda) L_{neg}\right]$$
wherein $y$ is the output value of the initial target detection model, $y'$ is the predicted value of the initial target detection model, $\lambda$ is the balance factor, $\gamma$ is the learning rate, $L_{pos}$ is the loss value of the target samples, and $L_{neg}$ is the loss value of the non-target samples;
performing the second training using the whale optimization algorithm (WOA), and stopping the training when the loss function value reaches its minimum, i.e., converges, so as to obtain the optimal model weight;
the second training specifically comprises the following steps:
taking the balance factors and the model weight as whale individuals, and initializing the number of the whale individuals, the maximum iteration times T and the number of initial target detection model neurons;
randomly generating the position of the whale individual, and calculating the fitness of the whale individual;
updating the positions of the whale individuals, calculating the fitness of the whale individuals at the moment, and selecting the optimal individuals according to the fitness;
stopping training when the loss function value reaches the minimum and the model precision meets the requirement, and obtaining the optimal model weight;
wherein the fitness of a whale individual is:
$$F = 1/\mathrm{Loss}$$
wherein $F$ is the fitness of the whale individual.
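The loss function, the fitness F = 1/Loss and the whale position update recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `lg` is read as the base-10 logarithm; y is treated as the target label and y' as the predicted probability, with the second logarithm taken of (1 - y') as in the usual binary cross-entropy; the update rules follow the commonly published form of the whale optimization algorithm; and `toy_loss`, the search bounds, the spiral constant `b` and the population settings are hypothetical stand-ins for evaluating the real detection model.

```python
import math
import random


def loss(n, y, y_pred, lam, gamma, l_pos, l_neg, eps=1e-12):
    """Loss = N[-y lg y' - (1-y) lg(1-y')] + gamma[lam*L_pos + (1-lam)*L_neg]."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)  # keep both logarithms finite
    bce = -y * math.log10(y_pred) - (1.0 - y) * math.log10(1.0 - y_pred)
    return n * bce + gamma * (lam * l_pos + (1.0 - lam) * l_neg)


def fitness(loss_value):
    """F = 1 / Loss; the loss is assumed to be strictly positive."""
    return 1.0 / loss_value


def woa(loss_fn, bounds, n_whales=30, max_iter=200, b=1.0, seed=0):
    """Simplified WOA; each whale is a position vector such as (lambda, N)."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = max(whales, key=lambda w: fitness(loss_fn(w)))[:]
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 towards 0
        for w in whales:
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore near a random one.
                ref = best if abs(A) < 1.0 else rng.choice(whales)
                new = [r - A * abs(C * r - x) for r, x in zip(ref, w)]
            else:
                # Spiral update around the best whale.
                s = rng.uniform(-1.0, 1.0)
                new = [abs(r - x) * math.exp(b * s) * math.cos(2.0 * math.pi * s) + r
                       for r, x in zip(best, w)]
            # Clamp the new position to the search bounds.
            w[:] = [min(max(v, lo), hi) for v, (lo, hi) in zip(new, bounds)]
            if fitness(loss_fn(w)) > fitness(loss_fn(best)):
                best = w[:]
    return best, loss_fn(best)


def toy_loss(position):
    """Hypothetical stand-in for the model loss; its minimum is 0.1."""
    lam, n = position
    return (lam - 0.3) ** 2 + (n - 1.5) ** 2 + 0.1


best_pos, best_loss = woa(toy_loss, bounds=[(0.0, 1.0), (0.0, 3.0)])
```

Because the fitness is the reciprocal of the loss, selecting the fittest whale is the same as selecting the position with the smallest loss, which is why the search stops improving once the loss reaches its minimum.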
2. The deep learning-based target detection method according to claim 1, wherein the preprocessing includes noise reduction processing, binarization, and normalization;
suppressing noise interference in the image to be detected by means of a filtering module, wherein the filtering module comprises a plurality of two-dimensional filter matrices, a light-shielding storage unit and at least one CMOS sensor array; exposing at least N raster rows of the image to be detected through the CMOS sensor array, and transferring the charges generated by the exposure to the light-shielding storage unit at time t, so as to capture the pixel points of M pixel regions in the image to be detected; and finally convolving the pixel values of the pixel points with the two-dimensional filter matrices to complete the noise reduction processing;
performing binarization processing on the denoised image to obtain a binary image, and segmenting the binary image according to its connected domains into target samples and non-target samples to obtain the non-target sample data set, wherein a target sample contains at least an object to be detected and a non-target sample contains no object to be detected;
sampling the target samples, which contain objects to be detected of different sizes, at the same ratio to obtain data blocks, and normalizing the data blocks to obtain the target sample data set;
wherein 70% of the data set is used as a training set and 30% of the data set is used as a testing set.
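The software side of the preprocessing in claim 2 (two-dimensional filtering, binarization, normalization and the 70%/30% split) can be sketched as below. This is an illustrative sketch only: the 3 × 3 mean kernel, the threshold value, min-max normalization, the shuffling before the split and all function names are assumptions, and the CMOS exposure and the connected-domain segmentation are not modeled.

```python
import random


def filter2d(image, kernel):
    """Slide an odd-sized kernel over the image with zero padding.

    The kernel is not flipped, which equals convolution for symmetric
    kernels such as the mean filter used below.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - kh // 2, j + v - kw // 2
                    if 0 <= r < h and 0 <= c < w:
                        out[i][j] += image[r][c] * kernel[u][v]
    return out


def binarize(image, threshold):
    """Map each pixel to 1 if it reaches the threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


def normalize(image):
    """Min-max normalization of pixel values to the range [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on constant images
    return [[(p - lo) / span for p in row] for row in image]


def split_dataset(samples, train_share=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_share)
    return items[:cut], items[cut:]


mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
image = [[9.0] * 3 for _ in range(3)]
denoised = filter2d(image, mean_kernel)
binary = binarize(denoised, threshold=5.0)
train_set, test_set = split_dataset(range(10))
```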
3. The deep learning-based target detection method of claim 1, wherein the loss training and the regression training comprise:
performing loss training on the data set by adopting a cross entropy loss function;
and performing regression training on the target sample data set by adopting a Smooth L1 loss function.
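The two loss functions named in claim 3 can be sketched as follows. The sketch assumes the common definitions: cross entropy as the negative natural logarithm of the probability assigned to the true class, and Smooth L1 with a transition point `beta` (a hypothetical parameter name, set to 1.0 here) averaged over the coordinates.

```python
import math


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability of the true class."""
    return -math.log(max(probs[label], eps))


def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors, linear for large ones, averaged."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


small_error = smooth_l1([0.5], [0.0])  # quadratic branch
large_error = smooth_l1([3.0], [0.0])  # linear branch
```

The linear branch keeps the gradient bounded for large box regression errors, which is the usual reason for preferring Smooth L1 over a plain squared error.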
4. The deep learning-based target detection method according to claim 2 or 3, wherein extracting the feature vectors comprises:
extracting multilayer semantic features of all target samples, and successively performing down-sampling and up-sampling on the multilayer semantic features according to a preset sampling rate to obtain a first feature vector;
and fusing the first feature vector and the multilayer semantic features to obtain a second feature vector, and successively performing convolution and downsampling on the second feature vector to obtain the feature vector.
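The order of operations in claim 4 can be illustrated on a one-dimensional toy signal. This sketch makes several assumptions that the claim does not fix: a single feature layer stands in for the multilayer semantic features, up-sampling is nearest-neighbor repetition, fusion is element-wise addition, and the sampling rate of 2 and the averaging kernel are hypothetical values.

```python
def downsample(x, rate):
    """Keep every rate-th element."""
    return x[::rate]


def upsample(x, rate, length):
    """Nearest-neighbor up-sampling, truncated to the original length."""
    return [v for v in x for _ in range(rate)][:length]


def fuse(a, b):
    """Element-wise addition (an assumed fusion rule)."""
    return [p + q for p, q in zip(a, b)]


def conv1d(x, kernel):
    """'Valid' sliding-window filtering."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def feature_vector(semantic, rate=2, kernel=(0.5, 0.5)):
    first = upsample(downsample(semantic, rate), rate, len(semantic))
    second = fuse(first, semantic)  # second feature vector
    return downsample(conv1d(second, kernel), rate)


features = feature_vector([1, 2, 3, 4, 5, 6, 7, 8])
```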
5. The deep learning-based target detection method of claim 4, wherein building a weighted undirected graph comprises:
determining the adjacency matrix element $K_{n,m}$ from the feature vectors:
$$K_{n,m} = \mu(D_n, D_m)$$
wherein $\mu$ is a distance function, $D_n$ is the feature vector of node n, and $D_m$ is the feature vector of node m;
establishing the weighted undirected graph G according to the adjacency matrix elements $K_{n,m}$:
$$G = (K, R)$$
wherein K is the adjacency matrix set, and R is the edge weight between nodes.
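A minimal sketch of building the graph of claim 5 follows. The claim leaves the distance function μ and the derivation of the edge weights R open; the sketch assumes the Euclidean distance for μ and takes each edge weight directly from the corresponding adjacency matrix element. The function names and the example feature vectors are hypothetical.

```python
import math


def adjacency_matrix(features, mu=math.dist):
    """K[n][m] = mu(D_n, D_m) for every pair of nodes."""
    count = len(features)
    return [[mu(features[n], features[m]) for m in range(count)]
            for n in range(count)]


def weighted_undirected_graph(features, mu=math.dist):
    """Return G = (K, R); R maps each unordered node pair to its weight."""
    k = adjacency_matrix(features, mu)
    r = {(n, m): k[n][m]
         for n in range(len(k)) for m in range(n + 1, len(k))}
    return k, r


K, R = weighted_undirected_graph([(0.0, 0.0), (3.0, 4.0), (0.0, 8.0)])
```

Storing only the pairs with n < m reflects that the graph is undirected, so each edge appears once in R while K remains symmetric.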
6. The deep learning-based target detection method of claim 5, wherein constructing an initial target detection model comprises:
the initial target detection model consists of convolution layers, an LSTM layer, a residual network layer, a fully connected layer and an output layer; the convolution layers comprise a first convolution layer with a 3 × 3 convolution kernel, an LSTM layer, a second convolution layer with a 1 × 1 convolution kernel and a third convolution layer with a 5 × 5 convolution kernel; the residual network layer comprises a fourth convolution layer with a 1 × 1 convolution kernel, a spatial pyramid pooling layer with a 2 × 2 pooling window, a fifth convolution layer with a 3 × 3 convolution kernel and a sixth convolution layer with a 1 × 1 convolution kernel;
the first convolution layer serves as the input layer of the initial target detection model; the LSTM layer is connected to the output of the first convolution layer and to the input of the second convolution layer; the second convolution layer outputs to the third convolution layer through a Leaky ReLU activation function; the third convolution layer outputs to the fourth convolution layer through a ReLU activation function; the fourth convolution layer, the spatial pyramid pooling layer, the fifth convolution layer and the sixth convolution layer output through a MaxOut activation function; the features extracted by the sixth convolution layer are nonlinearly combined by the fully connected layer and passed to the output layer; and the output layer outputs the detection result through a softmax function.
7. The deep learning-based target detection method of claim 6, wherein the first training comprises:
inputting 30% of the training sets in the weighted undirected graph and the non-target sample data set into the initial target detection model, and performing the first training by stochastic gradient descent; when the number of training iterations reaches 20, freezing the first convolution layer, the LSTM layer and the second convolution layer, and training the remaining layers with the remaining 70% of the training sets in the weighted undirected graph and the non-target sample data set; and performing the second training when the number of training iterations reaches the preset number.
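The staged schedule of claim 7 can be sketched without a deep learning framework by tracking which layers are trainable and which share of the data is used at each iteration. The layer names follow the order described in claim 6 but are hypothetical identifiers, as are `schedule` and its helpers; the actual gradient step is omitted.

```python
LAYERS = ["conv1", "lstm", "conv2", "conv3", "conv4", "spp", "conv5", "conv6", "fc"]
FROZEN_AFTER_WARMUP = {"conv1", "lstm", "conv2"}
WARMUP_ITERATIONS = 20


def split_training_set(samples, first_share=0.3):
    """Split the training data into the first 30% and the remaining 70%."""
    cut = round(len(samples) * first_share)
    return samples[:cut], samples[cut:]


def trainable_layers(iteration):
    """All layers train during warm-up; afterwards the first three are frozen."""
    if iteration < WARMUP_ITERATIONS:
        return list(LAYERS)
    return [name for name in LAYERS if name not in FROZEN_AFTER_WARMUP]


def schedule(samples, preset_iterations):
    """Yield (iteration, data, trainable layers) for the first training."""
    first, rest = split_training_set(samples)
    for iteration in range(preset_iterations):
        data = first if iteration < WARMUP_ITERATIONS else rest
        yield iteration, data, trainable_layers(iteration)


# Hypothetical run: 10 samples and a preset count of 25 iterations.
plan = list(schedule(list(range(10)), 25))
```

Once the schedule is exhausted, the preset number of iterations has been reached and the second training with the optimization module would begin.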
8. The deep learning-based target detection method of claim 7, wherein the model precision comprises:
inputting the test set into the model after the first training for testing, and evaluating the precision using the accuracy (ACC).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211270276.2A CN115346125B (en) | 2022-10-18 | 2022-10-18 | Target detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115346125A (en) | 2022-11-15 |
CN115346125B (en) | 2023-03-24 |
Family
ID=83957722
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211270276.2A Active CN115346125B (en) | 2022-10-18 | 2022-10-18 | Target detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115346125B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116405310B (en) * | 2023-04-28 | 2024-03-15 | 北京宏博知微科技有限公司 | Network data security monitoring method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215795A (en) * | 2020-09-02 | 2021-01-12 | 苏州超集信息科技有限公司 | Intelligent server component detection method based on deep learning |
CN112668440A (en) * | 2020-12-24 | 2021-04-16 | 西安电子科技大学 | SAR ship target detection method based on regression loss of balance sample |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417099B (en) * | 2020-11-20 | 2022-10-04 | 南京邮电大学 | Method for constructing fraud user detection model based on graph attention network |
CN113468803B (en) * | 2021-06-09 | 2023-09-26 | 淮阴工学院 | WOA-GRU flood flow prediction method and system based on improvement |
CN114021935A (en) * | 2021-10-29 | 2022-02-08 | 陕西科技大学 | Aquatic product safety early warning method based on improved convolutional neural network model |
CN114694144B (en) * | 2022-06-01 | 2022-08-23 | 南京航空航天大学 | Intelligent identification and rating method for non-metallic inclusions in steel based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||