CN115346125A - Target detection method based on deep learning - Google Patents


Publication number
CN115346125A
CN115346125A
Authority
CN
China
Prior art keywords
training
target detection
layer
data set
target
Prior art date
Legal status
Granted
Application number
CN202211270276.2A
Other languages
Chinese (zh)
Other versions
CN115346125B (en)
Inventor
韩德红
杜益龙
圣道翠
Current Assignee
Nanjing Jinhantu Technology Co ltd
Original Assignee
Nanjing Jinhantu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Jinhantu Technology Co ltd
Priority to CN202211270276.2A
Publication of CN115346125A
Application granted
Publication of CN115346125B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention discloses a target detection method based on deep learning. The method preprocesses an image to be detected and establishes a data set comprising a target sample data set and a non-target sample data set; performs loss training and regression training on the data set in sequence to expand the target samples; extracts the feature vectors of all target samples and, taking the target samples as nodes, determines an adjacency matrix from the feature vectors and builds a weighted undirected graph; constructs an initial deep-learning target detection model and trains it a first time on the weighted undirected graph and the training set drawn from the non-target sample data set; when the number of training iterations reaches a preset count, performs a second training with an optimization module; and when the loss function converges after the second training, stops training to obtain the target detection model, which is then used to detect targets in input images. The target detection model built by the method has better robustness.

Description

Target detection method based on deep learning
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method based on deep learning.
Background
Traditional target detection algorithms can be divided into three steps: region selection, feature extraction, and classifier-based classification. Feature extraction relies on manually selected image features, which are limited in variety and poor in robustness. Neural networks changed this situation: they do not depend on hand-crafted feature extraction, and after major breakthroughs in the field of image classification they have developed rapidly, so deep-learning-based algorithms have become the mainstream of target detection research.
As the field has developed, target detection models have multiplied, and manually optimizing the pipeline has become very inefficient given the huge number of models. Automated approaches, in turn, require substantial computing power to select a qualified model and are time-consuming.
Disclosure of Invention
The present invention has been made in view of the above-mentioned problems with the prior art.
Therefore, the invention provides a target detection method based on deep learning, which addresses the low detection speed of existing detection models and the low detection precision caused by small sample sizes.
In order to solve the above technical problems, the present invention provides the following technical solution: preprocessing an image to be detected and establishing a data set, wherein the data set comprises a target sample data set and a non-target sample data set; performing loss training and regression training on the data set in sequence to expand the target samples; extracting the feature vectors of all target samples, determining an adjacency matrix from the feature vectors with the target samples as nodes, and establishing a weighted undirected graph; constructing an initial target detection model based on deep learning, and performing a first training of the initial target detection model on the weighted undirected graph and the training set in the non-target sample data set; if the number of training iterations reaches a preset count, performing a second training with the optimization module; and if the loss function converges after the second training, stopping training to obtain the target detection model and using it to perform target detection on input images.
As a preferable aspect of the deep-learning-based target detection method of the present invention, wherein: the preprocessing comprises noise reduction, binarization and normalization; noise interference in the image to be detected is suppressed by a filtering module, wherein the filtering module comprises a plurality of two-dimensional filter matrices, a light-shielded storage unit and at least one CMOS sensor array; at least N raster rows of the image to be detected are exposed through the CMOS sensor array, and the charges generated by exposure are transferred to the light-shielded storage unit at time t, so as to capture the pixel points of M pixel regions in the image to be detected; finally, the pixel values of the pixel points are convolved with a two-dimensional filter matrix to complete the noise reduction; the denoised image is binarized to obtain a binary image, which is segmented by its connected domains to divide target samples from non-target samples and obtain the non-target sample data set, wherein a target sample contains at least one object to be detected and a non-target sample contains none; target samples containing objects to be detected of different sizes are sampled at the same proportion to obtain data blocks, which are normalized to obtain the target sample data set; wherein 70% of the data set is used as the training set and 30% as the test set.
As a preferable aspect of the deep learning-based target detection method of the present invention, wherein: the loss training and the regression training include: performing loss training on the data set by adopting a cross entropy loss function; and performing regression training on the target sample data set by adopting a Smooth L1 loss function.
As a preferable scheme of the target detection method based on deep learning of the present invention, wherein: extracting the feature vectors comprises: extracting multi-layer semantic features of all target samples, and successively down-sampling and up-sampling the multi-layer semantic features at a preset sampling rate to obtain a first feature vector; and fusing the first feature vector with the multi-layer semantic features to obtain a second feature vector, then successively convolving and down-sampling the second feature vector to obtain the feature vector.
As a preferable aspect of the deep-learning-based target detection method of the present invention, wherein: establishing the weighted undirected graph comprises the following steps: determining the adjacency matrix elements K_{n,m} from the feature vectors:

K_{n,m} = μ(D_n, D_m)

wherein μ is a distance function, D_n is the feature vector of node n, and D_m is the feature vector of node m;

and establishing the weighted undirected graph G from the adjacency matrix elements K_{n,m}:

G = (K, R)

wherein K is the adjacency matrix and R is the set of edge weights between nodes.
As a preferable aspect of the deep-learning-based target detection method of the present invention, wherein: constructing the initial target detection model comprises: the initial target detection model consists of convolution layers, an LSTM layer, a residual network layer, a fully connected layer and an output layer; the convolution stage comprises a first convolution layer with a 3 × 3 kernel, the LSTM layer, a second convolution layer with a 1 × 1 kernel and a third convolution layer with a 5 × 5 kernel; the residual network layer comprises a fourth convolution layer with a 1 × 1 kernel, a spatial pyramid pooling layer with a 2 × 2 pooling window, a fifth convolution layer with a 3 × 3 kernel and a sixth convolution layer with a 1 × 1 kernel; the first convolution layer serves as the input layer of the initial target detection model; the LSTM layer connects the output of the first convolution layer to the input of the second convolution layer; the second convolution layer feeds the third through a Leaky ReLU activation function, the third feeds the fourth through a ReLU activation function, and the fourth convolution layer, spatial pyramid pooling layer, fifth convolution layer and sixth convolution layer are output through a MaxOut activation function; the features extracted by the sixth convolution layer are combined nonlinearly by the fully connected layer and passed to the output layer, which outputs the detection result through a softmax function.
As a preferable scheme of the target detection method based on deep learning of the present invention, wherein: the first training comprises: inputting the weighted undirected graph and 30% of the training set from the non-target sample data set into the initial target detection model and performing the first training by stochastic gradient descent; when the number of training iterations reaches 20, freezing the first convolution layer, the LSTM layer and the second convolution layer, and training the remaining layers with the weighted undirected graph and the remaining 70% of the training set; and when the number of iterations reaches the preset count, performing the second training.
As a preferable aspect of the deep learning-based target detection method of the present invention, wherein: the optimization module comprises: setting a learning rate, and constructing a Loss function Loss based on the model weight N obtained after the first training:
Loss = λ · L_pos + (1 − λ) · L_neg

wherein y is the output value of the initial target detection model, ŷ is the predicted value of the initial target detection model, λ is the balance factor, γ is the learning rate, L_pos is the loss value over the target samples, and L_neg is the loss value over the non-target samples;
and performing the second training with the WOA algorithm, stopping training when the loss function value reaches its minimum, i.e. converges, to obtain the optimal model weights.
As a preferable aspect of the deep-learning-based target detection method of the present invention, wherein: the second training comprises: taking the balance factor and the model weights as a whale individual, and initializing the number of whale individuals, the maximum number of iterations T and the number of neurons of the initial target detection model; randomly generating the positions of the whale individuals and calculating their fitness; updating the positions of the whale individuals, recalculating their fitness, and selecting the optimal individual by fitness; and stopping training when the loss function value reaches its minimum and the model precision meets the requirement, obtaining the optimal model weights; wherein the whale individual fitness is:

F = 1 / Loss

wherein F is the fitness of the whale individual.
As a preferable aspect of the deep-learning-based target detection method of the present invention, wherein: the model precision evaluation comprises: inputting the test set into the model after the first training for testing, and evaluating the precision with the accuracy ACC.
The invention has the following beneficial effects: the method builds a target detection model based on deep learning and optimizes it through a two-stage training mechanism, so that the model has better robustness.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
fig. 1 is a schematic flowchart of a deep learning-based target detection method according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of a target detection model of the target detection method based on deep learning according to the first embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and it will be appreciated by those skilled in the art that the present invention may be practiced without departing from the spirit and scope of the present invention and that the present invention is not limited by the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail below with reference to the drawings. For convenience of illustration, the drawings are not drawn to scale and are merely exemplary; they should not be construed as limiting the scope of the present invention. In addition, the actual three-dimensional dimensions of length, width and depth should be taken into account in practice.
Also in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, which are only for convenience of description and simplification of description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to fig. 1 to 2, a first embodiment of the present invention provides a deep learning-based target detection method, including:
s1: preprocessing an image to be detected, and establishing a data set, wherein the data set comprises a target sample data set and a non-target sample data set.
In order to better detect the target, the embodiment first performs preprocessing, i.e., denoising, binarization and normalization, on the image to be detected, specifically:
(1) Noise reduction processing
Noise interference in the image to be detected is suppressed by a filtering module, which comprises a plurality of two-dimensional filter matrices, a light-shielded storage unit and at least one CMOS sensor array; at least N raster rows of the image to be detected are exposed through the CMOS sensor array, and the charges generated by exposure are transferred to the light-shielded storage unit at time t, so as to capture the pixel points of M pixel regions in the image to be detected; finally, the pixel values of the pixel points are convolved with a two-dimensional filter matrix to complete the noise reduction;
(2) Binarization method
Carrying out binarization processing on the denoised image to obtain a binary image, and segmenting according to a connected domain of the binary image to divide a target sample and a non-target sample to obtain a non-target sample data set, wherein the target sample at least comprises an object to be detected, and the non-target sample does not comprise the object to be detected; the object to be detected can be, for example, a vehicle, a radar, a crowd, a ship, etc.
(3) Normalization
Sampling target samples containing objects to be detected with different sizes according to the same proportion to obtain data blocks, and performing normalization operation to obtain a target sample data set; wherein 70% of the data set is used as a training set and 30% of the data set is used as a testing set.
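The preprocessing pipeline of S1 (convolving pixel values with a two-dimensional filter matrix, binarization, normalization, and the 70%/30% split) can be sketched in plain Python. The mean-filter kernel, the threshold of 128 and the fixed shuffle seed are illustrative assumptions, not values given in the patent:

```python
import random

def convolve2d(img, kernel):
    """Denoise by convolving pixel values with a 2-D filter matrix (edge pixels clamped)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    yy = min(max(y + ky - kh // 2, 0), h - 1)
                    xx = min(max(x + kx - kw // 2, 0), w - 1)
                    acc += img[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out

def binarize(img, threshold=128):
    """Binarize the denoised image: 1 for pixels at/above the threshold, else 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in img]

def normalize(block):
    """Scale a sampled data block into [0, 1]."""
    lo = min(min(row) for row in block)
    hi = max(max(row) for row in block)
    span = (hi - lo) or 1.0
    return [[(px - lo) / span for px in row] for row in block]

def split_dataset(samples, train_frac=0.7, seed=0):
    """70% training set / 30% test set, as in the patent."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

For example, smoothing with a 3 × 3 mean filter uses the kernel `[[1/9] * 3 for _ in range(3)]`.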
S2: Carrying out loss training and regression training on the data set in sequence to expand the target samples.
In order to improve the detection precision, a large number of samples are required to train the model, but the acquisition difficulty of the target sample is large, and the embodiment expands the acquired target sample through loss training and regression training, and specifically comprises the following steps:
(1) Performing loss training on the data set with a cross-entropy loss function (Cross Entropy Loss);
(2) Performing regression training on the target sample data set with a Smooth L1 loss function.
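The two loss functions named above have standard forms; a minimal pure-Python sketch, assuming binary cross-entropy and the common Smooth L1 with β = 1:

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy, averaged over samples (the loss-training objective)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def smooth_l1(y_true, y_pred, beta=1.0):
    """Smooth L1 (the regression-training objective): quadratic for |x| < beta,
    linear beyond, so it is less sensitive to outliers than plain L2."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        x = abs(t - p)
        total += 0.5 * x * x / beta if x < beta else x - 0.5 * beta
    return total / len(y_true)
```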
S3: Extracting the feature vectors of all target samples, determining an adjacency matrix from the feature vectors with the target samples as nodes, and establishing a weighted undirected graph.
(1) Extracting multilayer semantic features of all target samples, and successively performing down-sampling and up-sampling on the multilayer semantic features according to a preset sampling rate to obtain a first feature vector;
(2) Fusing the first feature vector with the multi-layer semantic features to obtain a second feature vector, and successively convolving and down-sampling the second feature vector to obtain the feature vector.
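Steps (1) and (2) can be sketched in one dimension; the factor-2 sampling rate, average-pool down-sampling, nearest-neighbour up-sampling and additive fusion are all illustrative assumptions:

```python
def downsample(feat, rate=2):
    """Average-pool down-sampling by the given rate (1-D sketch)."""
    return [sum(feat[i:i + rate]) / rate for i in range(0, len(feat) - rate + 1, rate)]

def upsample(feat, rate=2):
    """Nearest-neighbour up-sampling by the given rate."""
    return [v for v in feat for _ in range(rate)]

def fuse(a, b):
    """Fuse two feature vectors of equal length by element-wise addition."""
    return [x + y for x, y in zip(a, b)]

# First feature vector: down-sample then up-sample the semantic features;
# second feature vector: fuse the result back with the semantic features.
semantic = [1.0, 3.0, 2.0, 4.0]
first = upsample(downsample(semantic))   # [2.0, 2.0, 3.0, 3.0]
second = fuse(first, semantic)           # [3.0, 5.0, 5.0, 7.0]
```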
Further, the adjacency matrix is determined from the feature vectors, and the weighted undirected graph is established:
(1) Determining the adjacency matrix elements K_{n,m} from the feature vectors:

K_{n,m} = μ(D_n, D_m)

wherein μ is a distance function, D_n is the feature vector of node n, and D_m is the feature vector of node m;
(2) Establishing the weighted undirected graph G from the adjacency matrix elements K_{n,m}:

G = (K, R)

wherein K is the adjacency matrix and R is the set of edge weights between nodes.
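Assuming μ is the Euclidean distance (the patent does not fix the distance function here), the adjacency matrix and the weighted undirected graph G = (K, R) can be built as follows:

```python
import math

def mu(d_n, d_m):
    """Assumed distance function: Euclidean distance between feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d_n, d_m)))

def build_weighted_graph(features):
    """Nodes are target samples; K[n][m] = mu(D_n, D_m); each edge carries that weight."""
    n = len(features)
    K = [[mu(features[i], features[j]) for j in range(n)] for i in range(n)]
    # Undirected: keep each unordered node pair once, weighted by its K entry.
    R = {(i, j): K[i][j] for i in range(n) for j in range(i + 1, n)}
    return K, R

K, R = build_weighted_graph([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
```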
S4: Constructing an initial target detection model based on deep learning, and performing the first training of the initial target detection model on the weighted undirected graph and the training set in the non-target sample data set.
Referring to fig. 2, the initial target detection model consists of convolution layers, an LSTM layer, a residual network layer, a fully connected layer and an output layer. The convolution stage comprises a first convolution layer with a 3 × 3 kernel, the LSTM layer, a second convolution layer with a 1 × 1 kernel and a third convolution layer with a 5 × 5 kernel, which effectively enlarges the receptive field; the residual network layer comprises a fourth convolution layer with a 1 × 1 kernel, a spatial pyramid pooling layer with a 2 × 2 pooling window, a fifth convolution layer with a 3 × 3 kernel and a sixth convolution layer with a 1 × 1 kernel. Preferably, the residual network is combined with the pooling layer, which improves the detection of dense targets.
The first convolution layer serves as the input layer of the initial target detection model; the LSTM layer connects the output of the first convolution layer to the input of the second convolution layer; the second convolution layer feeds the third through a Leaky ReLU activation function, the third feeds the fourth through a ReLU activation function, and the fourth convolution layer, spatial pyramid pooling layer, fifth convolution layer and sixth convolution layer are output through a MaxOut activation function; the features extracted by the sixth convolution layer are combined nonlinearly by the fully connected layer and passed to the output layer, which outputs the detection result through a softmax function.
Preferably, the 1 × 1 fourth convolution layer first reduces the dimensionality to cut the amount of computation; the spatial pyramid pooling layer then converts the fourth layer's convolution features to a common dimension, so that the fifth convolution layer can process images of any size; and the sixth convolution layer restores the dimensionality. This maintains precision while reducing computation and greatly accelerating detection.
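The computation saving from the 1 × 1 reduce/restore pattern around the residual block can be checked with a quick multiply-accumulate count. The feature-map size and channel counts below are illustrative assumptions, not values from the patent:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

H = W = 32
# One 3x3 convolution at full channel width:
direct = conv_macs(H, W, 256, 256, 3)

# Bottleneck: 1x1 reduce to 64 channels, 3x3 convolution, 1x1 restore to 256.
bottleneck = (conv_macs(H, W, 256, 64, 1)
              + conv_macs(H, W, 64, 64, 3)
              + conv_macs(H, W, 64, 256, 1))

ratio = direct / bottleneck  # the bottleneck needs several times fewer MACs
```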
Further, the initial target detection model is trained for the first time on the weighted undirected graph and the training set from the non-target sample data set:
The weighted undirected graph and 30% of the training set from the non-target sample data set are input into the initial target detection model, and the first training is performed by stochastic gradient descent; when the number of training iterations reaches 20, the first convolution layer, the LSTM layer and the second convolution layer are frozen, and the remaining layers are trained with the weighted undirected graph and the remaining 70% of the training set; when the number of iterations reaches the preset count, the second training is performed.
S5: If the number of training iterations reaches the preset count, performing the second training with the optimization module; and if the loss function converges after the second training, stopping training to obtain the target detection model, which is used to perform target detection on input images.
The optimization module comprises:
(1) Setting a learning rate, and constructing a Loss function Loss based on the model weight N obtained after the first training:
Loss = λ · L_pos + (1 − λ) · L_neg

wherein y is the output value of the initial target detection model, ŷ is the predicted value of the initial target detection model, λ is the balance factor, γ is the learning rate, L_pos is the loss value over the target samples, and L_neg is the loss value over the non-target samples;
the embodiment sets the learning rate to 0.01, and sets dropout to 0.7 to prevent over-fitting while speeding up training when the learning rate is too low, which may cause over-fitting.
(2) The second training is performed with the WOA (Whale Optimization Algorithm), and training stops when the loss function value reaches its minimum, i.e. converges, yielding the optimal model weights.
a) Taking the balance factors and the model weight as whale individuals, and initializing the number of the whale individuals, the maximum iteration times T and the number of initial target detection model neurons;
b) Randomly generating the position of the whale individual, and calculating the fitness of the whale individual;
the fitness of individual whale includes:
F=1/ Loss
wherein F is the fitness of whale individuals.
c) Updating the position of the individual whale, calculating the fitness of the individual whale at the moment, and selecting the optimal individual according to the fitness;
the location of individual whales was updated according to the following formula:
X=X q -aD
wherein X is the updated position of the whale individual, a is a random number of (0, 1), and D is the distance between the whale and the prey.
And c) comparing the fitness with the fitness in the step b), and selecting the individual with higher fitness as the optimal individual.
d) Stopping training when the loss function value reaches its minimum and the model precision meets the requirement, obtaining the optimal model weights; the precision is evaluated by inputting the test set into the model after the first training and computing the accuracy ACC.
Preferably, the method is combined with a random gradient descent method and a WOA algorithm to train the target detection model, so that the optimal model weight is obtained, and the robustness is enhanced.
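Steps a) to d) amount to a stripped-down Whale Optimization Algorithm. The sketch below uses the stated encircling update X = X_q - a·D and fitness F = 1/Loss; the search bounds, population size and toy quadratic loss standing in for the network loss are illustrative assumptions:

```python
import random

def woa_minimize(loss, dim, n_whales=10, max_iter=50, seed=1):
    """Minimal WOA sketch: whales contract toward the best individual (the prey)."""
    rng = random.Random(seed)
    whales = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=loss)[:]            # copy: current prey position X_q
    for _ in range(max_iter):
        for w in whales:
            a = rng.random()                   # random number in (0, 1)
            for d in range(dim):
                D = abs(best[d] - w[d])        # distance between whale and prey
                w[d] = best[d] - a * D         # encircling update X = X_q - a*D
        cand = min(whales, key=loss)
        if loss(cand) < loss(best):            # keep the fitter individual
            best = cand[:]
    return best

def fitness(loss_value):
    """Whale individual fitness F = 1 / Loss."""
    return 1.0 / loss_value

# Toy stand-in for the network loss (an assumption): minimum at (2, -1).
toy_loss = lambda x: (x[0] - 2) ** 2 + (x[1] + 1) ** 2
best = woa_minimize(toy_loss, dim=2)
```

Because the best individual is only replaced when a fitter candidate appears, the loss of the returned solution never exceeds that of the best initial whale.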
Example 2
In order to verify and explain the technical effects of the method, this embodiment selects a CNN target detection algorithm and an RPN-network-based target detection method for a comparison test against the method, and verifies its real effect through scientific comparison of the test results.
The CNN target detection algorithm achieves high detection accuracy, but its computation cost is correspondingly large, so its time cost is too high; the RPN-network-based target detection method can only be trained on small batches of data, so its detection performance is poor.
In order to verify that the method achieves higher detection precision and speed than these two, this embodiment applies the existing schemes (the CNN target detection algorithm and the RPN-network-based method) and the proposed method to detect targets in 500 vehicle images to be detected, using accuracy (ACC), recall (TPR), precision (PRE) and F1-score (F1) as the measurement indices for each method; the results are shown in Table 1.
ACC = (TP + TN) / (TP + TN + FP + FN)

TPR = TP / (TP + FN)

PRE = TP / (TP + FP)

F1 = 2 × PRE × TPR / (PRE + TPR)

In these formulas, TP (true positives) is the number of target samples correctly detected; FP (false positives) is the number of non-target samples wrongly detected as targets; FN (false negatives) is the number of target samples missed; and TN (true negatives) is the number of non-target samples correctly rejected.
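The four measurement indices reduce to simple ratios over the confusion-matrix counts; a small sketch (the counts below are illustrative, not the patent's results):

```python
def detection_metrics(tp, fp, fn, tn):
    """ACC, TPR (recall), PRE (precision) and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)
    pre = tp / (tp + fp)
    f1 = 2 * pre * tpr / (pre + tpr)
    return acc, tpr, pre, f1

# Illustrative counts for a 500-image test run (assumed, not from Table 1).
acc, tpr, pre, f1 = detection_metrics(tp=430, fp=20, fn=25, tn=25)
```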
Table 1: target detection performance of different approaches.
As is clear from the data in Table 1, the method has good performance in target detection, and each performance index is superior to a CNN target detection algorithm and a target detection method based on an RPN network.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (10)

1. A target detection method based on deep learning is characterized by comprising the following steps:
preprocessing an image to be detected, and establishing a data set, wherein the data set comprises a target sample data set and a non-target sample data set;
carrying out loss training and regression training on the data set in sequence to expand a target sample;
extracting the characteristic vectors of all target samples, determining an adjacent matrix according to the characteristic vectors by taking the target samples as nodes, and establishing a weighted undirected graph;
constructing an initial target detection model based on deep learning, and performing first training on the initial target detection model through a weighted undirected graph and a training set in a non-target sample data set;
if the training times reach the preset training times, performing second training by using the optimization module;
and if the loss function is converged after the second training, stopping the training to obtain a target detection model, and performing target detection on the input image by using the target detection model.
2. The deep learning-based target detection method according to claim 1, wherein the preprocessing includes noise reduction processing, binarization, and normalization;
the method comprises the steps that noise interference in an image to be detected is suppressed through a filtering module, wherein the filtering module comprises a plurality of two-dimensional filtering matrixes, a light-shielding storage unit and at least one CMOS sensor array; exposing at least N rows of gratings of an image to be detected through a CMOS sensor array, and transferring charges generated by exposure to an insulating and shielding storage unit at t so as to capture pixel points of M pixel regions in the image to be detected; finally, convolution is carried out on the pixel values of the pixel points through a two-dimensional filter matrix to complete noise reduction processing;
performing binarization processing on the denoised image to obtain a binary image, and segmenting according to a connected domain of the binary image to divide a target sample and a non-target sample to obtain a non-target sample data set, wherein the target sample at least comprises an object to be detected, and the non-target sample does not comprise the object to be detected;
sampling target samples containing objects to be detected with different sizes according to the same proportion to obtain data blocks, and performing normalization operation to obtain a target sample data set;
wherein 70% of the data set is used as the training set and 30% of the data set is used as the test set.
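A minimal sketch of the binarization and 70%/30% split described in claim 2, under assumptions not fixed by the claim: a grayscale input, an illustrative threshold of 128, and a fixed random seed.

```python
import numpy as np

def binarize(image, threshold=128):
    """Return a 0/1 binary image from a grayscale array.
    The threshold value 128 is illustrative, not from the patent."""
    return (image >= threshold).astype(np.uint8)

def split_dataset(samples, train_frac=0.7, seed=0):
    """Shuffle and split samples 70% / 30% into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(train_frac * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]
```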
3. The deep learning-based target detection method of claim 1, wherein the loss training and the regression training comprise:
performing loss training on the data set by adopting a cross entropy loss function;
and performing regression training on the target sample data set by adopting a Smooth L1 loss function.
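The two loss functions named in claim 3 can be sketched as follows; these are NumPy implementations of the standard cross-entropy and Smooth L1 definitions, and the transition point `beta` is an assumed parameter.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy loss for the loss-training (classification) step."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def smooth_l1(residuals, beta=1.0):
    """Smooth L1 loss for the regression-training step:
    quadratic near zero, linear for large residuals."""
    ax = np.abs(residuals)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta).mean()
```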
4. The deep learning-based object detection method according to claim 2 or 3, wherein extracting the feature vectors comprises:
extracting multilayer semantic features of all target samples, and successively performing down-sampling and up-sampling on the multilayer semantic features according to a preset sampling rate to obtain a first feature vector;
and fusing the first feature vector and the multilayer semantic features to obtain a second feature vector, and successively performing convolution and downsampling on the second feature vector to obtain the feature vector.
5. The deep learning-based target detection method of claim 4, wherein building a weighted undirected graph comprises:
determining an adjacency matrix element K_{n,m} from the feature vectors:

K_{n,m} = μ(D_n, D_m)

wherein μ is a distance function, D_n is the feature vector of node n, and D_m is the feature vector of node m;

establishing the weighted undirected graph G according to the adjacency matrix elements K_{n,m}:

G = (K, R)

wherein K is the adjacency matrix and R is the edge weight between nodes.
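Under the illustrative assumption that the distance function μ is the Euclidean distance (claim 5 does not fix a particular metric), the adjacency matrix K can be built from the sample feature vectors as:

```python
import numpy as np

def adjacency_matrix(features):
    """features: (n_samples, dim) array of node feature vectors D_n.
    Returns the symmetric (n, n) matrix K with K[n, m] = ||D_n - D_m||,
    i.e. mu taken as Euclidean distance (an assumed choice)."""
    diff = features[:, None, :] - features[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```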
6. The deep learning-based target detection method of claim 5, wherein constructing an initial target detection model comprises:
the initial target detection model consists of a convolution layer, an LSTM layer, a residual error network layer, a full connection layer and an output layer; the convolution layers comprise a first convolution layer with convolution kernel of 3 × 3, an LSTM layer, a second convolution layer with convolution kernel of 1 × 1 and a third convolution layer with convolution kernel of 5 × 5; the residual network layer comprises a fourth convolution layer with convolution kernel 1 x 1, a spatial pyramid pooling layer with pooling window 2 x 2, a fifth convolution layer with convolution kernel 3 x 3 and a sixth convolution layer with convolution kernel 1 x 1;
the first convolution layer is used as an input layer of the initial target detection model, the LSTM layer is respectively connected with the output of the first convolution layer and the input of the second convolution layer, the second convolution layer is output to the third convolution layer through a Leaky ReLU activation function, the third convolution layer is output to the fourth convolution layer through a ReLU activation function, the fourth convolution layer, the spatial pyramid pooling layer, the fifth convolution layer and the sixth convolution layer are output through a MaxOut activation function, the characteristics extracted by the sixth convolution layer are subjected to nonlinear combination through the full-connection layer and are output to the output layer, and the output layer outputs a detection result through a softmax function.
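The three activation functions named in claim 6 (Leaky ReLU, ReLU and MaxOut) can be sketched in NumPy as follows; the negative slope of Leaky ReLU and the MaxOut group size are assumed values, not specified in the patent.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: pass positives through, scale negatives by alpha (assumed)."""
    return np.where(x > 0, x, alpha * x)

def relu(x):
    """Standard ReLU: clamp negatives to zero."""
    return np.maximum(x, 0.0)

def maxout(x, pieces=2):
    """MaxOut over groups of `pieces` consecutive channels (last axis)."""
    x = x.reshape(*x.shape[:-1], x.shape[-1] // pieces, pieces)
    return x.max(axis=-1)
```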
7. The deep learning-based target detection method of claim 6, wherein the first training comprises:
inputting the weighted undirected graph and 30% of the training set of the non-target sample data set into the initial target detection model, and performing the first training by a stochastic gradient descent method; when the number of training iterations reaches 20, freezing the first convolution layer, the LSTM layer and the second convolution layer, and training the remaining layers with the weighted undirected graph and the remaining 70% of the training set of the non-target sample data set; when the number of training iterations reaches the preset number, performing the second training.
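The freeze-then-train schedule of claim 7 amounts to masking out gradient updates for the frozen layers; a toy sketch with illustrative layer names and a plain SGD update (the patent uses stochastic gradient descent but does not give this form):

```python
import numpy as np

def sgd_step(weights, grads, frozen, lr=0.1):
    """One SGD step over a dict of per-layer weights, skipping any layer
    whose name appears in `frozen` (e.g. the first conv, LSTM and second
    conv layers after 20 iterations)."""
    return {name: w if name in frozen else w - lr * grads[name]
            for name, w in weights.items()}
```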
8. The deep learning-based target detection method of claim 7, wherein the optimization module comprises:
setting a learning rate, and constructing a Loss function Loss based on the model weight N obtained after the first training, wherein y is the output value of the initial target detection model, ŷ is the predicted value of the initial target detection model, λ is the balance factor, γ is the learning rate, L_pos is the loss value of the target samples, and L_neg is the loss value of the non-target samples;
and performing the second training by using the WOA (whale optimization algorithm); when the loss function value reaches its minimum, i.e., converges, stopping the training to obtain the optimal model weight.
9. The deep learning-based target detection method of claim 8, wherein the second training comprises:
taking the balance factors and the model weight as whale individuals, and initializing the number of the whale individuals, the maximum iteration times T and the number of initial target detection model neurons;
randomly generating the position of the whale individual, and calculating the fitness of the whale individual;
updating the positions of the whale individuals, calculating the fitness of the whale individuals at the moment, and selecting the optimal individuals according to the fitness;
stopping training when the loss function value reaches the minimum and the model precision meets the requirement, and obtaining the optimal model weight;
wherein F denotes the fitness of the whale individual.
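The WOA iteration of claim 9 (random initialization, fitness evaluation, position update, best-individual selection) can be sketched on a toy minimization problem. In this sketch the whale positions stand in for the balance factor and model weights, and the fitness is simply the loss to be minimized; the population size, iteration count, search bounds and sphere loss are all illustrative.

```python
import numpy as np

def woa_minimize(loss, dim, n_whales=20, max_iter=200, seed=0):
    """Minimal Whale Optimization Algorithm sketch: returns the best
    position found for minimizing `loss` over R^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n_whales, dim))        # random whale positions
    best = min(X, key=loss).copy()                 # initial best individual
    for t in range(max_iter):
        a = 2 - 2 * t / max_iter                   # a decreases from 2 to 0
        for i in range(n_whales):
            r, p = rng.random(dim), rng.random()
            A, C = 2 * a * r - a, 2 * rng.random(dim)
            if p < 0.5:
                if np.all(np.abs(A) < 1):          # encircle the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                              # explore a random whale
                    Xr = X[rng.integers(n_whales)]
                    X[i] = Xr - A * np.abs(C * Xr - X[i])
            else:                                  # spiral update around best
                l = rng.uniform(-1, 1)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(l) * np.cos(2 * np.pi * l) + best
            if loss(X[i]) < loss(best):            # keep the best individual
                best = X[i].copy()
    return best
```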
10. The deep learning-based target detection method of claim 9, wherein the model accuracy comprises:
and inputting the test set into the model obtained after the first training for testing, and performing the precision evaluation by adopting the accuracy (ACC).
CN202211270276.2A 2022-10-18 2022-10-18 Target detection method based on deep learning Active CN115346125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211270276.2A CN115346125B (en) 2022-10-18 2022-10-18 Target detection method based on deep learning


Publications (2)

Publication Number Publication Date
CN115346125A true CN115346125A (en) 2022-11-15
CN115346125B CN115346125B (en) 2023-03-24

Family

ID=83957722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211270276.2A Active CN115346125B (en) 2022-10-18 2022-10-18 Target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN115346125B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215795A (en) * 2020-09-02 2021-01-12 苏州超集信息科技有限公司 Intelligent server component detection method based on deep learning
CN112417099A (en) * 2020-11-20 2021-02-26 南京邮电大学 Method for constructing fraud user detection model based on graph attention network
CN112668440A (en) * 2020-12-24 2021-04-16 西安电子科技大学 SAR ship target detection method based on regression loss of balance sample
CN113468803A (en) * 2021-06-09 2021-10-01 淮阴工学院 Improved WOA-GRU-based flood flow prediction method and system
CN114021935A (en) * 2021-10-29 2022-02-08 陕西科技大学 Aquatic product safety early warning method based on improved convolutional neural network model
CN114694144A (en) * 2022-06-01 2022-07-01 南京航空航天大学 Intelligent identification and rating method for non-metallic inclusions in steel based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
REHAM BARHAM等: ""Link Prediction Based on Whale Optimization Algorithm"", 《2017 INTERNATIONAL CONFERENCE ON NEW TRENDS IN COMPUTING SCIENCES》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116405310A (en) * 2023-04-28 2023-07-07 北京宏博知微科技有限公司 Network data security monitoring method and system
CN116405310B (en) * 2023-04-28 2024-03-15 北京宏博知微科技有限公司 Network data security monitoring method and system

Also Published As

Publication number Publication date
CN115346125B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN107704857B (en) End-to-end lightweight license plate recognition method and device
CN109685152B (en) Image target detection method based on DC-SPP-YOLO
CN109190537B (en) Mask perception depth reinforcement learning-based multi-person attitude estimation method
CN107220618B (en) Face detection method and device, computer readable storage medium and equipment
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN111401516B (en) Searching method for neural network channel parameters and related equipment
CN107529650B (en) Closed loop detection method and device and computer equipment
JP7233807B2 (en) Computer-implemented method, computer system, and computer program for simulating uncertainty in artificial neural networks
CN111414987B (en) Training method and training device of neural network and electronic equipment
CN104866868A (en) Metal coin identification method based on deep neural network and apparatus thereof
CN111160217B (en) Method and system for generating countermeasure sample of pedestrian re-recognition system
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN114140683A (en) Aerial image target detection method, equipment and medium
CN111105017A (en) Neural network quantization method and device and electronic equipment
CN111626379B (en) X-ray image detection method for pneumonia
CN115346125B (en) Target detection method based on deep learning
CN112364974B (en) YOLOv3 algorithm based on activation function improvement
EP3822864A1 (en) Method and apparatus with deep neural network model fusing
CN111428566B (en) Deformation target tracking system and method
CN110490058B (en) Training method, device and system of pedestrian detection model and computer readable medium
CN112420125A (en) Molecular attribute prediction method and device, intelligent equipment and terminal
CN115439708A (en) Image data processing method and device
WO2022100607A1 (en) Method for determining neural network structure and apparatus thereof
CN112634174B (en) Image representation learning method and system
CN113407820A (en) Model training method, related system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant