CN114492540A - Training method and device of target detection model, computer equipment and storage medium


Info

Publication number: CN114492540A
Application number: CN202210308846.6A
Authority: CN (China)
Prior art keywords: loss, detection model, training, anchor frame, target
Legal status: Granted; active
Other versions: CN114492540B
Original language: Chinese (zh)
Inventor: not announced (at the applicant's request)
Current and original assignee: Chengdu Shuzhilian Technology Co Ltd
Events: application filed by Chengdu Shuzhilian Technology Co Ltd, with priority to CN202210308846.6A; publication of CN114492540A; application granted; publication of CN114492540B

Classifications

    • G06F2218/12 Aspects of pattern recognition specially adapted for signal processing: classification; matching
    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/23213 Pattern recognition: non-hierarchical clustering using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
    • G06N3/045 Neural network architectures: combinations of networks
    • G06N3/084 Neural network learning methods: backpropagation, e.g. using gradient descent
    • Y02T10/40 Climate change mitigation technologies related to transportation: engine management systems

Abstract

The embodiments of this application disclose a training method and device for a target detection model, computer equipment, and a storage medium, in the technical field of image recognition. The method comprises the following steps: constructing the network structure of the target detection model to obtain an initial detection model; inputting a plurality of signal time-frequency diagrams into the initial detection model and outputting final prediction results; constructing a loss function to calculate the training loss; and adjusting the initial detection model according to the training loss to obtain the target detection model. Dilated convolution layers and deconvolution layers are added to the network structure, and a negative sample weight value is added to the loss function, so that during training the target detection model smoothly learns to distinguish real target regions from blank regions. The model can therefore be trained quickly, detect signal data in signal time-frequency diagrams quickly, and detect the various types of signal data accurately.

Description

Training method and device of target detection model, computer equipment and storage medium
Technical Field
The present application belongs to the technical field of image recognition, and particularly relates to a training method and device for a target detection model, computer equipment, and a storage medium.
Background
To ensure reliable information transmission, an information transmission system must have stable anti-interference capability, and signal detection is one of the most effective anti-interference measures. The existing signal detection scheme is time-frequency analysis: a one-dimensional signal is mapped onto a two-dimensional plane to generate a signal time-frequency diagram, and target signal data in the diagram is then detected with a deep neural network, which makes this a target detection problem. Target detection on signal time-frequency diagrams can be performed with the YOLO, YOLOv3, and Poly-YOLO algorithms. The YOLO algorithm detects small and dense targets poorly and can hardly detect signal time-frequency diagrams with short, dense signals. The YOLOv3 algorithm identifies large targets inaccurately, regresses bounding boxes inaccurately, and detects dense small targets inaccurately. The network structure of the Poly-YOLO algorithm makes it hard to converge during training, and its training and testing speed is slow, so it cannot meet the speed and real-time requirements of signal detection.
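As a non-limiting illustration of this mapping, the following sketch turns a one-dimensional signal into a time-frequency diagram with the short-time Fourier transform (scipy.signal.stft); the sampling rate, window parameters, and toy signal are assumptions, not values fixed by this application.

```python
# Non-limiting sketch: map a 1-D signal onto a 2-D time-frequency diagram
# via the short-time Fourier transform. All parameter values and the toy
# signal below are illustrative assumptions.
import numpy as np
from scipy.signal import stft

fs = 1_000_000                                   # assumed sampling rate, Hz
t = np.arange(0, 0.01, 1 / fs)                   # 10 ms of signal
x = np.sin(2 * np.pi * 100_000 * t)              # toy narrowband component
x += 0.1 * np.random.randn(t.size)               # noise floor

f, tau, Z = stft(x, fs=fs, nperseg=256, noverlap=192)
tf_image = 20 * np.log10(np.abs(Z) + 1e-12)      # dB magnitude per (frequency, time) bin
print(tf_image.shape)                            # the 2-D plane handed to the detector
```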
Disclosure of Invention
In order to solve the problems that existing target detection algorithms detect the different types of signal data in signal time-frequency diagrams poorly and are difficult to train, the present application provides a training method and device for a target detection model, computer equipment, and a storage medium, which can accurately detect the different types of signal data in signal time-frequency diagrams, train and detect quickly, and meet the real-time requirements of signal detection.
In order to achieve the above purpose, the invention adopts the following technical solutions:
in a first aspect, an embodiment of the present application provides a method for training a target detection model, where the method includes:
acquiring a training set, wherein the training set comprises a plurality of signal time-frequency graphs;
constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
inputting the signal time-frequency graphs into the backbone network, calculating each signal time-frequency graph through the backbone network, and outputting a plurality of characteristic layers;
inputting a plurality of characteristic layers into the head network, and respectively calculating each characteristic layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
acquiring real target information of the signal time-frequency diagrams;
constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
and adjusting parameters of the initial detection model according to the current training loss feedback to obtain a target detection model.
In one possible implementation, the head network includes: a first transform convolution layer, a second transform convolution layer, a third transform convolution layer, a fourth transform convolution layer, a first dilated convolution layer, a second dilated convolution layer, a first deconvolution layer, a second deconvolution layer, and subsequent convolution layers; and the plurality of feature layers includes: a first feature layer, a second feature layer, and a third feature layer;
the step of inputting the first feature layer, the second feature layer, and the third feature layer into the head network, and calculating the first feature layer, the second feature layer, and the third feature layer through the head network to obtain final prediction results corresponding to the signal time-frequency diagrams includes:
obtaining a first output result by passing the second feature layer through the second transform convolution layer;
transforming the number of channels of the third feature layer through the third transform convolution layer, expanding the receptive field through the first dilated convolution layer, and upsampling through the first deconvolution layer to obtain a second output result;
adding the first output result and the second output result to obtain a third output result;
transforming the number of channels of the third output result through the fourth transform convolution layer, expanding the receptive field through the second dilated convolution layer, and upsampling through the second deconvolution layer to obtain a fourth output result;
obtaining a fifth output result by passing the first feature layer through the first transform convolution layer;
adding the fourth output result and the fifth output result to obtain a sixth output result;
and passing the sixth output result through the subsequent convolution layers to obtain the final prediction result corresponding to each signal time-frequency diagram.
In one possible implementation, the method further includes:
dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
the preset loss function includes: a first loss function, a second loss function, a third loss function;
the step of calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function comprises the following steps:
calculating the objectness loss (whether a target exists) of each anchor frame through the first loss function, and summing the objectness losses of all anchor frames to obtain a first prediction loss;
calculating the target category loss of each anchor frame through the second loss function, and calculating the sum of the target category losses of each anchor frame to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and calculating the sum of the target coordinate losses of each anchor frame to obtain a third prediction loss;
and summing the product value of the first prediction loss multiplied by a first weight coefficient, the product value of the second prediction loss multiplied by a second weight coefficient and the product value of the third prediction loss multiplied by a third weight coefficient to obtain the current training loss.
In one possible implementation, the step of calculating the objectness loss of each anchor frame through the first loss function includes:
when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function from the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function from the predicted confidence that real target information exists in the jth anchor frame of the ith grid and from the intersection-over-union (IoU) of the jth anchor frame of the ith grid with the region where the real target information is located.
In a possible implementation, when the jth anchor frame of the ith grid contains real target information, the step of calculating its objectness loss through the first loss function from the predicted confidence includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (1):

$L^{\mathrm{obj}}_{ij} = -\mathbb{1}^{\mathrm{obj}}_{ij}\,\log(\hat{C}_{ij})$    (1)

where $L^{\mathrm{obj}}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}^{\mathrm{obj}}_{ij}$ indicates whether the jth anchor frame of the ith grid contains real target information (1 if it does, 0 otherwise).
In a possible implementation, when the jth anchor frame of the ith grid does not contain real target information, the step of calculating its objectness loss through the first loss function from the predicted confidence and the IoU with the region where the real target information is located includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2):

$L^{\mathrm{noobj}}_{ij} = -\mathbb{1}^{\mathrm{noobj}}_{ij}\,(1-\mathrm{IoU}_{ij})^{2}\,\log(1-\hat{C}_{ij})$    (2)

where $L^{\mathrm{noobj}}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, and $\mathbb{1}^{\mathrm{noobj}}_{ij}$ indicates whether the jth anchor frame of the ith grid contains no real target information (1 if it contains none, 0 otherwise).
In a possible implementation, the step of calculating the target category loss of each anchor frame through the second loss function includes:
determining the target category loss of each anchor frame according to formula (3):

$L^{\mathrm{cls}}_{ij} = -\frac{1}{nc}\sum_{k=1}^{nc}\big[\,y^{(k)}_{ij}\log(p^{(k)}_{ij}) + (1-y^{(k)}_{ij})\log(1-p^{(k)}_{ij})\,\big]$    (3)

where $L^{\mathrm{cls}}_{ij}$ is the target category loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k indexes the categories of real target information, $y^{(k)}_{ij}$ is 1 when category k is the category of the real target information corresponding to the jth anchor frame of the ith grid and 0 otherwise, and $p^{(k)}_{ij}$ is the predicted probability that the jth anchor frame of the ith grid belongs to category k.
In a possible implementation, the step of calculating the target coordinate loss of each anchor frame through the third loss function includes:
determining the target coordinate loss of each anchor frame according to formula (4):

$L^{\mathrm{coord}}_{ij} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}_{ij}}{c^{2}_{ij}} + \alpha v$    (4)

where $L^{\mathrm{coord}}_{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the predicted target region corresponding to the jth anchor frame of the ith grid with the region where the real target information is located, $\rho_{ij}$ is the straight-line distance between the center point of that predicted target region and the center point of the region where the real target information is located, $c_{ij}$ is the diagonal length of the minimum enclosing rectangle of that predicted target region and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
In one possible implementation, the method further includes:
clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through a Kmeans algorithm to obtain cluster centers corresponding to the cluster categories;
and acquiring the coordinates of each clustering center, and adjusting the size of the corresponding anchor frame according to the coordinates of each clustering center.
In a possible implementation manner, the step of adjusting parameters of the initial detection model according to the current training loss feedback to obtain a target detection model includes:
judging whether the current training loss is smaller than a preset training loss threshold value or not;
if the current training loss is not smaller than the preset training loss threshold, continuing to train the initial detection model, adjusting the initial detection model by back-propagation through a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
In a second aspect, an embodiment of the present application provides an apparatus for training a target detection model, where the apparatus includes:
the first acquisition module is used for acquiring a training set, wherein the training set comprises a plurality of signal time-frequency graphs;
the first construction module is used for constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
the first calculation module is used for inputting the signal time-frequency graphs into the backbone network, calculating each signal time-frequency graph through the backbone network and outputting a plurality of characteristic layers;
the second calculation module is used for inputting the plurality of characteristic layers into the head network, and calculating each characteristic layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
the second acquisition module is used for acquiring the real target information of the signal time-frequency diagrams;
the second construction module is used for constructing a preset loss function and calculating the current training loss between the final prediction result corresponding to each signal time-frequency graph and the real target information through the preset loss function;
and the adjusting module is used for adjusting the parameters of the initial detection model according to the current training loss feedback to obtain a target detection model.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, performs the training method for the target detection model according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the training method of the object detection model according to the first aspect.
Compared with the prior art, the method has the following beneficial effects:
the training method, the training device, the computer equipment and the storage medium of the target detection model provided by the embodiment use a network structure added with an expansion convolutional layer and a deconvolution layer, and expand the visual field of a characteristic layer; and a loss function added with the weight value of the negative sample is used, so that the target detection model is converged quickly in training. The finally obtained target detection model can quickly and accurately detect various types of signal data in the signal time-frequency diagram.
Drawings
In order to explain the technical solutions of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope of protection. Like reference numerals are used for like elements in the various figures, and those of ordinary skill in the art may derive other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a method for training a target detection model according to an embodiment of the present invention;
fig. 2 is an exemplary diagram of a signal time-frequency diagram according to an embodiment of the present invention;
FIG. 3 is a diagram of an exemplary initial detection model provided by an embodiment of the invention;
FIG. 4 is another schematic flow chart of a method for training a target detection model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a training apparatus for an object detection model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, as used in various embodiments of the present invention, are intended only to indicate specific features, numbers, steps, operations, elements, components, or combinations thereof, and should not be construed as excluding the existence or possible addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
This embodiment provides a training method for a target detection model. As shown in fig. 1, the training method for the target detection model provided in this embodiment includes the following steps:
step S110, a training set is obtained, wherein the training set comprises a plurality of signal time-frequency graphs.
When signal data in a signal time-frequency diagram is detected by the target detection model, regions where a signal exists are signal target regions and regions without signal are noise regions, with a clear boundary between them. Fig. 2 shows an example of a signal time-frequency diagram, where the abscissa is the time domain and the ordinate is the frequency domain. Signal data with short duration and a small frequency range appear in the time-frequency diagram as small, dense rectangular objects, such as 201 in fig. 2; signal data with long duration and a wide frequency range appear as rectangular objects with large length and width, such as 202 in fig. 2. In practical applications these different types of signals may appear in one signal time-frequency diagram at the same time. To better detect the various types and shapes of signal data in signal time-frequency diagrams, this embodiment provides a target detection model and its training method.
Step S120, an initial detection model is constructed, and the initial detection model comprises a backbone network and a head network.
In one embodiment, the backbone network is a ResNet18 network, whose relatively small network structure trains and runs quickly and thus better meets the real-time requirements of target detection.
In one embodiment, the head network comprises: a first transform convolution layer, a second transform convolution layer, a third transform convolution layer, a fourth transform convolution layer, a first dilated convolution layer, a second dilated convolution layer, a first deconvolution layer, a second deconvolution layer, and subsequent convolution layers. The operation of these convolution layers is described in detail in steps S130 and S140.
Step S130, inputting the signal time-frequency graphs into the backbone network, calculating each signal time-frequency graph through the backbone network, and outputting a plurality of characteristic layers.
As shown in fig. 3, in an embodiment, the signal time-frequency diagram passes through a stem layer and four Block layers (B1, B2, B3, B4) of the backbone network ResNet18: the feature layer obtained after the second Block layer B2 is F2, i.e., the first feature layer; the feature layer obtained after the third Block layer B3 is F3, i.e., the second feature layer; and the feature layer obtained after the last Block layer B4 is F4, i.e., the third feature layer. A feature layer has three dimensions: the length and width of its feature maps and the number of channels; the number of channels is the number of feature maps, which is usually the number of convolution kernels of the convolution layer the feature layer last passed through.
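As a non-limiting sketch of how F2, F3, and F4 can be tapped from a ResNet18 backbone, the following code uses torchvision's feature extractor; mapping the patent's B2/B3/B4 onto torchvision's layer2/layer3/layer4 names and the 640 × 640 input size are assumptions for illustration.

```python
# Non-limiting sketch: tap ResNet18's intermediate Block outputs as the
# feature layers F2, F3, F4. Mapping B2/B3/B4 onto torchvision's
# layer2/layer3/layer4 and the 640x640 input size are assumptions.
import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import create_feature_extractor

backbone = create_feature_extractor(
    resnet18(weights=None),
    return_nodes={"layer2": "F2", "layer3": "F3", "layer4": "F4"},
)
feats = backbone(torch.randn(1, 3, 640, 640))
for name, f in feats.items():
    # F2: (1, 128, 80, 80), F3: (1, 256, 40, 40), F4: (1, 512, 20, 20)
    print(name, tuple(f.shape))
```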
Step S140, inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network to obtain the final prediction result corresponding to each signal time-frequency diagram. The head network is HeadNet in fig. 3.
As shown in fig. 3 and 4, the step of inputting the first feature layer, the second feature layer, and the third feature layer into the head network, and calculating the first feature layer, the second feature layer, and the third feature layer through the head network to obtain the final prediction result corresponding to each signal time-frequency diagram includes:
step S410, obtaining a first output result from the second feature layer through the second transform convolution layer.
The number of channels of the second feature layer F3 is transformed by the second transform convolution layer conv2, giving the first output result.
Step S420, transforming the number of channels of the third feature layer through the third transform convolution layer, expanding the receptive field through the first dilated convolution layer, and upsampling through the first deconvolution layer to obtain a second output result.
The third feature layer is F4 in fig. 3. In one embodiment, the third transform convolution layer conv3 is a 1 × 1 convolution layer; the first dilated convolution layer diaconv1 has a dilation rate of 2, which expands the receptive field; and the first deconvolution layer transconv1 transforms the length and width of the feature maps in the third feature layer to 2 times their original values without changing the number of channels.
In step S430, the first output result and the second output result are added to obtain a third output result, i.e., H1 in fig. 3.
Step S440, transforming the number of channels of the third output result through the fourth transform convolution layer, expanding the receptive field through the second dilated convolution layer, and upsampling through the second deconvolution layer to obtain a fourth output result.
The parameters of the fourth transform convolution layer conv4, the second dilated convolution layer diaconv2, and the second deconvolution layer transconv2 are set according to the specific task and are not limited here.
Step S450, a fifth output result is obtained by passing the first feature layer through the first transform convolution layer.
The number of channels of the first feature layer F2 is transformed by the first transform convolution layer conv1; the parameters of conv1 are set according to the specific task and are not limited here.
In step S460, the fourth output result and the fifth output result are added to obtain a sixth output result, i.e., H2 in fig. 3.
And step S470, obtaining the final prediction result corresponding to each signal time-frequency diagram through the subsequent convolutional layer according to the sixth output result.
The subsequent convolution layers convs change only the number of channels of the sixth output result, not the size of its feature maps; the resulting final prediction result is H3 in fig. 3.
In this embodiment, all feature layers, feature maps, and output results are stored in the computer device as matrices, and their addition and subtraction operations follow matrix arithmetic.
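As an illustrative, non-limiting sketch of the data flow in steps S410 to S470 above, the following PyTorch code assembles the head network from the convolution layers named in this embodiment. The F2/F3/F4 channel counts (128/256/512, matching ResNet18's Block outputs), the internal channel width, and the output channel count (9 anchors × (4 box coordinates + 1 confidence + 9 classes)) are assumptions for illustration, not values fixed by this embodiment.

```python
# Non-limiting PyTorch sketch of the head network of steps S410-S470.
# Channel counts and output width are illustrative assumptions.
import torch
import torch.nn as nn

class HeadNet(nn.Module):
    def __init__(self, width=64, out_ch=9 * (4 + 1 + 9)):
        super().__init__()
        self.conv1 = nn.Conv2d(128, width, 1)    # first transform conv (applied to F2)
        self.conv2 = nn.Conv2d(256, width, 1)    # second transform conv (applied to F3)
        self.conv3 = nn.Conv2d(512, width, 1)    # third transform conv (applied to F4)
        self.diaconv1 = nn.Conv2d(width, width, 3, padding=2, dilation=2)  # expands receptive field
        self.transconv1 = nn.ConvTranspose2d(width, width, 2, stride=2)    # upsamples x2, channels kept
        self.conv4 = nn.Conv2d(width, width, 1)  # fourth transform conv (applied to H1)
        self.diaconv2 = nn.Conv2d(width, width, 3, padding=2, dilation=2)
        self.transconv2 = nn.ConvTranspose2d(width, width, 2, stride=2)
        self.convs = nn.Conv2d(width, out_ch, 3, padding=1)  # subsequent convs: channels only

    def forward(self, f2, f3, f4):
        out1 = self.conv2(f3)                                   # step S410
        out2 = self.transconv1(self.diaconv1(self.conv3(f4)))   # step S420
        h1 = out1 + out2                                        # step S430
        out4 = self.transconv2(self.diaconv2(self.conv4(h1)))   # step S440
        out5 = self.conv1(f2)                                   # step S450
        h2 = out4 + out5                                        # step S460
        return self.convs(h2)                                   # step S470 -> H3

f2 = torch.randn(1, 128, 80, 80)   # stride-8 feature layer (assumed 640x640 input)
f3 = torch.randn(1, 256, 40, 40)   # stride-16
f4 = torch.randn(1, 512, 20, 20)   # stride-32
print(HeadNet()(f2, f3, f4).shape)  # torch.Size([1, 126, 80, 80])
```

The dilated convolutions keep the feature-map size (kernel 3, dilation 2, padding 2) while widening the receptive field, and the deconvolutions double the length and width without changing the channel count, matching the behavior described for diaconv1/transconv1 above.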
And step S150, acquiring the real target information of the signal time-frequency diagrams.
In this embodiment, the real target information is signal data of an area where the real target is located in the signal time-frequency diagram, and includes whether the signal data exists in the area where the real target is located, a category of the signal data, and a coordinate of the signal data.
Step S160, constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function.
In an embodiment, the step of calculating, by using the preset loss function, a current training loss between a final predicted result corresponding to each signal time-frequency diagram and real target information includes:
dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
in the target detection algorithm, an Anchor Box (Anchor Box) is a plurality of preset rectangular boxes with different sizes for detecting signal data, which are centered on the central point of the grid, and the size of the Anchor Box is configured when the model is configured.
In one embodiment, the anchor box sizes are obtained by the Kmeans algorithm: the real target information in each signal time-frequency diagram is clustered into the preset number of cluster categories by the Kmeans algorithm to obtain the cluster center corresponding to each cluster category; the coordinates of each cluster center are acquired, and the size of the corresponding anchor box is adjusted according to those coordinates. The Kmeans algorithm is a common clustering algorithm that assigns each sample in a data set to the class of the cluster center nearest to it.
The number of anchor boxes set for each grid cell equals the number of cluster categories used during clustering; for example, if the real target information in the signal time-frequency diagrams is clustered into 9 categories, 9 anchor boxes are set for each grid cell, as in the sketch below.
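As a non-limiting sketch of this clustering step, the following code clusters the widths and heights of real target regions into 9 cluster centers with scikit-learn's KMeans and uses the centers as anchor sizes; the (width, height) samples are random stand-ins for statistics collected from the training set.

```python
# Non-limiting sketch: cluster real-target box sizes with k-means and use
# the 9 cluster centers as anchor sizes. The samples are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
wh = rng.uniform(low=4, high=200, size=(500, 2))      # stand-in (w, h) of real target regions

kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(wh)
anchors = sorted(kmeans.cluster_centers_.tolist(), key=lambda a: a[0] * a[1])
for w, h in anchors:
    print(f"anchor: {w:.1f} x {h:.1f}")               # one anchor size per cluster category
```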
In one embodiment, the preset loss function comprises: a first loss function, a second loss function, and a third loss function.
Calculating the objectness loss of each anchor frame through the first loss function, and summing the objectness losses of all anchor frames to obtain a first prediction loss;
calculating the target category loss of each anchor frame through the second loss function, and calculating the sum of the target category losses of each anchor frame to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and calculating the sum of the target coordinate losses of each anchor frame to obtain a third prediction loss;
in one embodiment, said step of calculating by said first penalty function whether there is a penalty on the target of each of said anchor boxes comprises:
when the jth anchor frame of the ith grid contains real target information, calculating the target loss of the jth anchor frame of the ith grid according to the confidence coefficient of the fact that the real target information exists in the jth anchor frame of the ith grid through the first loss function;
and when the jth anchor frame of the ith grid does not contain real target information, calculating the intersection ratio of the jth anchor frame of the ith grid and the area where the real target information exists according to the confidence coefficient of the real target information of the jth anchor frame of the ith grid and the jth anchor frame of the ith grid in the predicted jth anchor frame of the ith grid through the first loss function, wherein the target of the jth anchor frame of the ith grid has no loss.
The confidence that real target information exists is predicted by the target detection model, and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located is the ratio of the intersection of the two areas to their union.
In one embodiment, when the jth anchor frame of the ith grid contains real target information, calculating its objectness loss through the first loss function from the predicted confidence comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (1):

$L^{\mathrm{obj}}_{ij} = -\mathbb{1}^{\mathrm{obj}}_{ij}\,\log(\hat{C}_{ij})$    (1)

where $L^{\mathrm{obj}}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}^{\mathrm{obj}}_{ij}$ indicates whether the jth anchor frame of the ith grid contains real target information (1 if it does, 0 otherwise).
It should be noted that although the base of the logarithmic function log is omitted in formula (1), in practical application the natural base e is used; the same applies to formulas (2) and (3).
In one embodiment, when the jth anchor frame of the ith grid does not contain real target information, calculating its objectness loss through the first loss function from the predicted confidence and the intersection-over-union with the region where the real target information is located comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2):

$L^{\mathrm{noobj}}_{ij} = -\mathbb{1}^{\mathrm{noobj}}_{ij}\,(1-\mathrm{IoU}_{ij})^{2}\,\log(1-\hat{C}_{ij})$    (2)

where $L^{\mathrm{noobj}}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, and $\mathbb{1}^{\mathrm{noobj}}_{ij}$ indicates whether the jth anchor frame of the ith grid contains no real target information (1 if it contains none, 0 otherwise).
In the following description of the loss functions, a positive sample is a sample consistent with the category of the real target information, and a negative sample is a sample inconsistent with it. Formulas (1) and (2) compute losses from the confidences predicted for all anchor frames and penalize wrongly predicting anchor frames in proportion to the degree of error. In practice an anchor frame may contain no target yet overlap heavily with the region where real target information is located, so its intersection-over-union with that region is relatively high; this typically happens for anchor frames in grid cells near the center of the real target information. For such an anchor frame it is unreasonable to demand that the confidence be predicted as exactly 1 or 0, and doing so harms the training effect of the target detection model. Therefore, in the no-target case, the loss on the predicted confidence $\hat{C}_{ij}$ is multiplied by the negative sample weight value $(1-\mathrm{IoU}_{ij})^{2}$: when $\mathrm{IoU}_{ij}$ is high, the square calculation makes the weight very low, so that even a predicted confidence close to 1 produces no large objectness loss; and when the anchor frame covers a blank region, $\mathrm{IoU}_{ij} = 0$ gives a negative sample weight of 1, so a wrong prediction produces a large objectness loss. It should also be noted that in early training the number of anchor frames is very large and the positive and negative samples are highly unbalanced, so the objectness loss easily causes gradient explosion: extreme values of $\hat{C}_{ij}$ close to 0 or close to 1 accumulate large objectness losses and large gradients. Therefore, before the objectness loss is calculated, $\hat{C}_{ij}$ is truncated to the interval [0.0001, 0.9999]: values smaller than 0.0001 are taken as 0.0001, and values larger than 0.9999 are taken as 0.9999. For example, if $\hat{C}_{ij}$ is 0.00005 the calculation uses 0.0001, and if it is 0.99999 the calculation uses 0.9999.
By adding the negative sample weight value, this embodiment guides the target detection model to distinguish real target regions from blank regions, improving the training effect of the target detection model; by truncating the confidence to this interval, the gradient explosion phenomenon is avoided.
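The following is a minimal sketch of the objectness loss of formulas (1) and (2) as reconstructed above, including the negative sample weight $(1-\mathrm{IoU})^{2}$ and the truncation of the predicted confidence to [0.0001, 0.9999]; the flat tensor layout is an assumption for illustration.

```python
# Non-limiting sketch of the objectness loss of formulas (1) and (2):
# positive anchors contribute -log(c_hat); negative anchors contribute
# -(1 - IoU)^2 * log(1 - c_hat); c_hat is truncated to [0.0001, 0.9999].
import torch

def objectness_loss(c_hat, is_pos, iou):
    """c_hat: predicted confidences; is_pos: 1 where the anchor contains a
    real target, else 0; iou: IoU of each anchor with the real target region."""
    c_hat = c_hat.clamp(0.0001, 0.9999)       # truncation against gradient explosion
    pos = -is_pos * torch.log(c_hat)                              # formula (1)
    neg = -(1 - is_pos) * (1 - iou) ** 2 * torch.log(1 - c_hat)   # formula (2)
    return (pos + neg).sum()                  # summed over anchors: first prediction loss

c_hat = torch.tensor([0.95, 0.80, 0.10])
is_pos = torch.tensor([1.0, 0.0, 0.0])
iou = torch.tensor([0.90, 0.70, 0.00])  # 2nd anchor overlaps the target: weight (1-0.7)^2 = 0.09
print(objectness_loss(c_hat, is_pos, iou))
```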
In one embodiment, the step of calculating the target category loss of each anchor frame through the second loss function comprises: determining the target category loss of each anchor frame according to formula (3):

$L^{\mathrm{cls}}_{ij} = -\frac{1}{nc}\sum_{k=1}^{nc}\big[\,y^{(k)}_{ij}\log(p^{(k)}_{ij}) + (1-y^{(k)}_{ij})\log(1-p^{(k)}_{ij})\,\big]$    (3)

where $L^{\mathrm{cls}}_{ij}$ is the target category loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k indexes the categories of real target information, $y^{(k)}_{ij}$ is 1 when category k is the category of the real target information corresponding to the jth anchor frame of the ith grid and 0 otherwise, and $p^{(k)}_{ij}$ is the predicted probability that the jth anchor frame of the ith grid belongs to category k.
A target category loss is produced only when the anchor frame contains real target information; in all other cases it is 0. The target category loss of a single anchor frame is the average of the two-class cross-entropy losses over all its categories, and it measures the prediction accuracy of the target detection model. For example, if the signals in the signal time-frequency diagrams are divided into 9 categories, the preset number nc is also 9, and $p^{(k)}_{ij}$ is the predicted probability that the jth anchor frame of the ith grid belongs to category k of the real target information.
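A minimal sketch of formula (3) for a single anchor frame that contains real target information follows; the probability clipping is an added numerical safeguard, not part of formula (3).

```python
# Non-limiting sketch of formula (3): mean two-class cross-entropy over the
# nc categories for one anchor that contains real target information.
import torch

def class_loss(p, true_k):
    """p: predicted per-category probabilities, shape (nc,); true_k: index of
    the category of the real target information."""
    y = torch.zeros_like(p)
    y[true_k] = 1.0
    p = p.clamp(1e-4, 1 - 1e-4)               # added safeguard against log(0)
    bce = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
    return bce.mean()                         # averaged over the nc categories

p = torch.tensor([0.05, 0.90, 0.02, 0.01, 0.01, 0.30, 0.05, 0.02, 0.10])  # nc = 9
print(class_loss(p, true_k=1))
```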
In one embodiment, the step of calculating the target coordinate loss of each anchor frame through the third loss function comprises: determining the target coordinate loss of each anchor frame according to formula (4):

$L^{\mathrm{coord}}_{ij} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}_{ij}}{c^{2}_{ij}} + \alpha v$    (4)

where $L^{\mathrm{coord}}_{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the predicted target region corresponding to the jth anchor frame of the ith grid with the region where the real target information is located, $\rho_{ij}$ is the straight-line distance between the center point of that predicted target region and the center point of the region where the real target information is located, $c_{ij}$ is the diagonal length of the minimum enclosing rectangle of that predicted target region and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
In one embodiment, v is calculated by formula (5):

$v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}_{ij}}{h^{gt}_{ij}} - \arctan\frac{w}{h}\right)^{2}$    (5)

where v is the second preset parameter, $w^{gt}_{ij}$ and $h^{gt}_{ij}$ are the width and height of the region where the real target information of the jth anchor frame of the ith grid is located, w is the width of the predicted target region, and h is the height of the predicted target region.
In one embodiment, α is calculated by formula (6):

$\alpha = \frac{v}{(1-\mathrm{IoU}_{ij}) + v}$    (6)

where α is the first preset parameter, and $\mathrm{IoU}_{ij}$ and v are as explained for formula (4) and formula (5), and are not described again here.
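As a non-limiting sketch, the following code evaluates formulas (4) to (6) for one predicted box against its real target region; the corner-form (x1, y1, x2, y2) box encoding is an assumption for illustration.

```python
# Non-limiting sketch of formulas (4)-(6): the coordinate loss
# 1 - IoU + rho^2 / c^2 + alpha * v for one predicted box and its target.
import math

def coordinate_loss(pred, gt):
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)                      # intersection-over-union
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2  # squared center distance
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2                     # squared enclosing diagonal
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    v = 4 / math.pi ** 2 * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2  # formula (5)
    alpha = v / ((1 - iou) + v + 1e-12)                          # formula (6)
    return 1 - iou + rho2 / c2 + alpha * v                       # formula (4)

print(coordinate_loss((10, 10, 50, 40), (12, 8, 52, 44)))
```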
The current training loss is obtained by summing the product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient. The first weight coefficient, the second weight coefficient, and the third weight coefficient are preset parameters.
And S170, adjusting parameters of the initial detection model according to the current training loss feedback to obtain a target detection model.
The step of adjusting the parameters of the initial detection model according to the current training loss feedback to obtain a target detection model comprises:
judging whether the current training loss is smaller than a preset training loss threshold value or not;
if the current training loss is not smaller than the preset training loss threshold, continuing to train the initial detection model, adjusting the initial detection model by back-propagation through a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
In the actual training process, training parameters and verification parameters are preset. The training parameters include, but are not limited to, the total number of training rounds, the number of samples per training step, and so on; the verification parameters include the evaluation period, the evaluation index, the preset training loss threshold, and so on. In one training period, the preset number of signal time-frequency diagrams are input into the initial detection model, random data augmentation is applied to each signal time-frequency diagram, the training loss is calculated through the loss function preset in step S160 and propagated back to the initial detection model, and the parameters of the initial detection model are adjusted. In one embodiment, a mini-batch gradient descent algorithm may be used.
The evaluation index is calculated whenever the number of completed training periods is a multiple of the evaluation period, i.e., whenever the remainder of the training period count divided by the evaluation period is 0. The evaluation index may be the mean average precision (mAP), which measures the detection accuracy of the adjusted initial detection model on the signal data in the signal time-frequency diagrams.
In practical applications, training may simply be stopped when the number of training periods reaches the total number of training rounds, taking the adjusted initial detection model as the target detection model and calculating its evaluation index. The method adopted in this embodiment instead takes the adjusted initial detection model as the target detection model only once the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, which guarantees to some extent that the training has been effective; a sketch of this loop follows.
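A minimal sketch of this training procedure, assuming a PyTorch model, data loader, and combined loss (w1·L_obj + w2·L_cls + w3·L_coord) built as in steps S110 to S160; the learning rate, threshold, and N are illustrative values, not values fixed by this embodiment.

```python
# Non-limiting sketch of step S170: mini-batch gradient descent with
# back-propagation, stopping once the training loss has stayed below the
# preset threshold for N consecutive training periods.
import torch

def train(model, loader, total_loss_fn, lr=1e-3, threshold=0.05, n_consecutive=5, max_epochs=300):
    opt = torch.optim.SGD(model.parameters(), lr=lr)   # mini-batch gradient descent
    below = 0
    for epoch in range(max_epochs):                    # one epoch = one training period
        epoch_loss = 0.0
        for images, targets in loader:
            loss = total_loss_fn(model(images), targets)
            opt.zero_grad()
            loss.backward()                            # back-propagate the training loss
            opt.step()                                 # adjust the model parameters
            epoch_loss += loss.item()
        epoch_loss /= max(len(loader), 1)
        below = below + 1 if epoch_loss < threshold else 0
        if below >= n_consecutive:                     # N consecutive periods below threshold
            return model                               # adjusted model = target detection model
    return model
```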
Existing target detection mainly uses the following schemes: (1) designing rectangular boxes from prior knowledge of the input signal, one box of the corresponding size for every possible duration and bandwidth; this requires strong prior knowledge of the signal and a large amount of computation; (2) the YOLO algorithm, which divides the image into 7 × 7 grid cells, sets 2 anchor boxes for each cell, and detects signals from the anchor boxes; with so few cells, its detection of small and dense targets is poor; (3) the YOLOv3 algorithm, which increases the number of anchor boxes per cell to 9, divided into large, medium, and small scales, but still identifies dense small targets inaccurately; (4) the Poly-YOLO algorithm, which shrinks the cells and divides the grid more densely, but whose model trains and tests slowly and regresses the coordinates and boundaries of large targets inaccurately.
Compared with these existing schemes, the training method provided by this embodiment improves the network structure of the target detection model, solving the inaccurate identification of dense small targets by the YOLO and YOLOv3 algorithms as well as the slow training and testing and inaccurate large-target coordinate and boundary regression of the Poly-YOLO network, so that target data in signal time-frequency diagrams can be detected quickly and accurately. It also improves the loss function, computing the negative-sample loss with a negative sample weight coefficient based on the intersection-over-union, so that the target detection model smoothly learns to distinguish real target regions from blank regions and converges quickly during training.
In summary, the training method provided by this embodiment improves the network structure of the target detection model, upsampling with dilated convolution layers and deconvolution layers to expand the receptive field of the feature layers, and adds a negative sample weight coefficient to the loss function, so that the target detection model distinguishes real target regions from blank regions more smoothly during training, trains quickly, detects signal data in signal time-frequency diagrams quickly, and detects the various types of signal data accurately.
Example 2
Referring to fig. 5, the training apparatus 500 for a target detection model includes a first obtaining module 510, a first constructing module 520, a first calculating module 530, a second calculating module 540, a second obtaining module 550, a second constructing module 560, and an adjusting module 570.
In this embodiment, the first obtaining module 510 is configured to: acquiring a training set, wherein the training set comprises a plurality of signal time-frequency graphs;
the first building block 520 is configured to: constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
the first calculation module 530 is configured to: inputting the signal time-frequency graphs into the backbone network, calculating each signal time-frequency graph through the backbone network, and outputting a plurality of characteristic layers;
the second computing module 540 is configured to: inputting a plurality of characteristic layers into the head network, and respectively calculating each characteristic layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
the second obtaining module 550 is configured to: acquiring real target information of the signal time-frequency diagrams;
the second building module 560 is configured to: constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
the adjusting module 570 is configured to: and adjusting parameters of the initial detection model according to the current training loss feedback to obtain a target detection model.
In an embodiment, the second calculation module 540 is specifically configured to: obtain a first output result by passing the second feature layer through the second transform convolution layer;
transform the number of channels of the third feature layer through the third transform convolution layer, expand the receptive field through the first dilated convolution layer, and upsample through the first deconvolution layer to obtain a second output result;
add the first output result and the second output result to obtain a third output result;
transform the number of channels of the third output result through the fourth transform convolution layer, expand the receptive field through the second dilated convolution layer, and upsample through the second deconvolution layer to obtain a fourth output result;
obtain a fifth output result by passing the first feature layer through the first transform convolution layer;
add the fourth output result and the fifth output result to obtain a sixth output result;
and pass the sixth output result through the subsequent convolution layers to obtain the final prediction result corresponding to each signal time-frequency diagram.
In an embodiment, the second building module 560 is specifically configured to: dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
calculate the objectness loss of each anchor frame through the first loss function, and sum the objectness losses of all anchor frames to obtain a first prediction loss;
calculating the target category loss of each anchor frame through the second loss function, and calculating the sum of the target category losses of each anchor frame to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and calculating the sum of the target coordinate losses of each anchor frame to obtain a third prediction loss;
and summing the product value of the first prediction loss multiplied by a first weight coefficient, the product value of the second prediction loss multiplied by a second weight coefficient and the product value of the third prediction loss multiplied by a third weight coefficient to obtain the current training loss.
In an embodiment, the second building module 560 is specifically further configured to: when the jth anchor frame of the ith grid contains real target information, calculate the objectness loss of the jth anchor frame of the ith grid through the first loss function from the predicted confidence that real target information exists in that anchor frame;
and when the jth anchor frame of the ith grid does not contain real target information, calculate the objectness loss of the jth anchor frame of the ith grid through the first loss function from the predicted confidence that real target information exists in that anchor frame and from the intersection-over-union of that anchor frame with the region where the real target information is located.
In an embodiment, the second building module 560 is further specifically configured to: calculating the objectness loss of the jth anchor frame of the ith grid according to the following formula (1):
the formula (1) is:
$L_{obj}^{ij} = -\,\mathbb{1}_{ij}^{obj}\,\log\left(\hat{C}_i^j\right)$
wherein $L_{obj}^{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_i^j$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}_{ij}^{obj}$ indicates that the jth anchor frame of the ith grid contains real target information.
In one embodiment, when the jth anchor frame of the ith grid does not contain real target information, the objectness loss of the jth anchor frame of the ith grid is calculated according to formula (2):
the formula (2) is:
$w_i^j = 1 - \mathrm{IoU}_i^j$
$L_{noobj}^{ij} = -\,\mathbb{1}_{ij}^{noobj}\, w_i^j\,\log\left(1-\hat{C}_i^j\right)$
wherein $L_{noobj}^{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_i^j$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_i^j$ is the intersection ratio between the jth anchor frame of the ith grid and the region where the real target information is located, $w_i^j$ is the negative sample weight coefficient, and $\mathbb{1}_{ij}^{noobj}$ indicates that the jth anchor frame of the ith grid contains no real target information.
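Under the reconstruction of formulas (1) and (2) above, the per-anchor objectness loss can be sketched as follows; treating the negative sample weight coefficient as 1 − IoU is an assumption consistent with the description, not a value confirmed by the original figures:

```python
import math

def objectness_loss(has_target, conf_pred, iou=0.0, eps=1e-7):
    """Per-anchor objectness loss following formulas (1) and (2) above.

    has_target: whether the anchor frame contains real target information
    conf_pred:  predicted confidence that real target information exists
    iou:        intersection ratio of the anchor frame and the real target region
    """
    if has_target:
        # Formula (1): positive anchor frame
        return -math.log(conf_pred + eps)
    # Formula (2): negative anchor frame, weighted by the assumed
    # negative sample weight coefficient w = 1 - IoU
    weight = 1.0 - iou
    return -weight * math.log(1.0 - conf_pred + eps)
```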
In an embodiment, the second building module 560 is further specifically configured to: determining the target category loss of each anchor frame according to formula (3), wherein:
the formula (3) is:
$L_{cls}^{ij} = -\sum_{k=0}^{nc-1} y_k \log\left(p_i^j(k)\right)$
wherein $L_{cls}^{ij}$ is the target category loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k is the category of the real target information, $y_k$ is 1 when the prediction category corresponding to the jth anchor frame of the ith grid is the category of the real target information and 0 otherwise, and $p_i^j(k)$ is the predicted probability that the jth anchor frame of the ith grid belongs to category k.
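A minimal sketch of formula (3) for a single anchor frame; the probability list and the epsilon guard against log(0) are illustrative:

```python
import math

def class_loss(probs, true_class, eps=1e-7):
    """Per-anchor target category loss following formula (3) above.

    probs:      predicted probabilities over the nc categories
    true_class: index k of the real target information's category
    """
    # y_k is 1 only for the real category, so a single term survives the sum
    return -sum((1.0 if k == true_class else 0.0) * math.log(probs[k] + eps)
                for k in range(len(probs)))
```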
In an embodiment, the second building module 560 is further specifically configured to: determining the target coordinate loss of each anchor frame according to formula (4), wherein:
equation (4) is:
$L_{box}^{ij} = 1 - \mathrm{IoU}_i^j + \dfrac{\rho^2\left(b_i^j,\, b_{gt}\right)}{c^2} + \alpha v$
wherein $L_{box}^{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_i^j$ represents the intersection ratio between the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\rho\left(b_i^j, b_{gt}\right)$ represents the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, c represents the diagonal length of the minimum circumscribed rectangle enclosing the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
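A minimal sketch of formula (4) for a single anchor frame. In the common CIoU formulation, α and v are derived from the boxes' aspect ratios, but since this application describes them as preset parameters, the sketch accepts them as inputs:

```python
def ciou_loss(pred_box, gt_box, alpha, v):
    """Per-anchor target coordinate loss following formula (4) above.

    Boxes are (x1, y1, x2, y2); alpha and v are the two preset parameters.
    """
    eps = 1e-7
    # Intersection ratio (IoU) of the predicted box and the real target region
    ix1, iy1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    ix2, iy2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_g = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    iou = inter / (area_p + area_g - inter + eps)
    # Squared straight-line distance between the two center points (rho^2)
    cpx, cpy = (pred_box[0] + pred_box[2]) / 2, (pred_box[1] + pred_box[3]) / 2
    cgx, cgy = (gt_box[0] + gt_box[2]) / 2, (gt_box[1] + gt_box[3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    # Squared diagonal of the minimum circumscribed rectangle (c^2)
    ex1, ey1 = min(pred_box[0], gt_box[0]), min(pred_box[1], gt_box[1])
    ex2, ey2 = max(pred_box[2], gt_box[2]), max(pred_box[3], gt_box[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - iou + rho2 / (c2 + eps) + alpha * v
```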
In an embodiment, the second building module 560 is further specifically configured to: clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through a Kmeans algorithm to obtain cluster centers corresponding to the cluster categories;
and acquiring the coordinates of each clustering center, and adjusting the size of the corresponding anchor frame according to the coordinates of each clustering center.
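A minimal sketch of this anchor-size selection, assuming scikit-learn's KMeans and an (N, 2) array of (width, height) pairs collected from the real target annotations:

```python
# Sketch of anchor-size selection by clustering ground-truth box sizes.
# box_sizes is an (N, 2) array of (width, height) pairs taken from the
# real target information; preset_number is the number of anchor frames.
import numpy as np
from sklearn.cluster import KMeans

def anchor_sizes_from_kmeans(box_sizes, preset_number=9):
    kmeans = KMeans(n_clusters=preset_number, n_init=10, random_state=0)
    kmeans.fit(np.asarray(box_sizes, dtype=float))
    # Each cluster center's coordinates give one anchor frame's width/height
    return kmeans.cluster_centers_
```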
In an embodiment, the adjusting module 570 is specifically configured to: judging whether the current training loss is smaller than a preset training loss threshold;
if the current training loss is larger than the preset training loss threshold, continuing to train the initial detection model, adjusting the initial detection model through back propagation with a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
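A minimal sketch of this feedback adjustment; model, optimizer, data_loader, and compute_loss are assumed PyTorch-style placeholders:

```python
# Sketch of the feedback adjustment: keep training with mini-batch gradient
# descent and back propagation until the period loss stays below the
# threshold for N consecutive training periods.
def train_until_converged(model, optimizer, data_loader, compute_loss,
                          loss_threshold, n_periods):
    below = 0
    while below < n_periods:
        period_loss = 0.0
        for images, targets in data_loader:           # mini-batches
            loss = compute_loss(model(images), targets)
            optimizer.zero_grad()
            loss.backward()                            # back propagation
            optimizer.step()
            period_loss += loss.item()
        period_loss /= len(data_loader)
        below = below + 1 if period_loss < loss_threshold else 0
    return model   # adjusted initial detection model -> target detection model
```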
The training device for the target detection model provided by this embodiment improves the network structure of the target detection model, performs upsampling using expansion convolutional layers and deconvolution layers to enlarge the receptive field of the feature layers, and adds a negative sample weight coefficient to the loss function, so that the target detection model can more smoothly distinguish real target regions from blank regions during training, can be trained quickly on the signal data in signal time-frequency diagrams, and can accurately detect multiple types of signal data.
Example 3
The present embodiment provides a computer device, which comprises a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the training method of the object detection model according to embodiment 1.
The computer device provided in this embodiment may implement the method for training the target detection model described in embodiment 1, and details are not described here again to avoid repetition.
Example 4
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the training method of the object detection model according to embodiment 1.
The computer-readable storage medium provided in this embodiment may implement the method for training the target detection model described in embodiment 1, and is not described herein again to avoid repetition.
In this embodiment, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (13)

1. A method for training an object detection model, the method comprising:
acquiring a training set, wherein the training set comprises a plurality of signal time-frequency graphs;
constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
inputting the signal time-frequency graphs into the backbone network, calculating each signal time-frequency graph through the backbone network, and outputting a plurality of characteristic layers;
inputting a plurality of characteristic layers into the head network, and respectively calculating each characteristic layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
acquiring real target information of the signal time-frequency diagrams;
constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
and adjusting parameters of the initial detection model according to the current training loss feedback to obtain a target detection model.
2. The method of training of an object detection model according to claim 1, wherein the head network comprises: a first transformation convolutional layer, a second transformation convolutional layer, a third transformation convolutional layer, a fourth transformation convolutional layer, a first expansion convolutional layer, a second expansion convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer; the plurality of feature layers includes: the first characteristic layer, the second characteristic layer and the third characteristic layer;
the step of inputting the first feature layer, the second feature layer, and the third feature layer into the head network, and calculating the first feature layer, the second feature layer, and the third feature layer through the head network to obtain final prediction results corresponding to the signal time-frequency diagrams includes:
obtaining a first output result by the second characteristic layer through the second transformation convolution layer;
converting the number of channels of the third feature layer through the third transformation convolutional layer, enlarging the receptive field through the first expansion convolutional layer, and performing upsampling through the first deconvolution layer to obtain a second output result;
adding the first output result and the second output result to obtain a third output result;
converting the number of channels of the third output result through the fourth transformation convolutional layer, enlarging the receptive field through the second expansion convolutional layer, and performing upsampling through the second deconvolution layer to obtain a fourth output result;
obtaining a fifth output result by passing the first characteristic layer through the first transformation convolution layer;
adding the fourth output result and the fifth output result to obtain a sixth output result;
and obtaining the final prediction result corresponding to each signal time-frequency graph by the sixth output result through the subsequent convolution layer.
3. The method of training an object detection model according to claim 1, the method further comprising:
dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
the preset loss function includes: a first loss function, a second loss function, a third loss function;
the step of calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function comprises the following steps:
calculating the objectness loss of each anchor frame through the first loss function, and calculating the sum of the objectness losses of the anchor frames to obtain a first prediction loss;
calculating the target category loss of each anchor frame through the second loss function, and calculating the sum of the target category losses of each anchor frame to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and calculating the sum of the target coordinate losses of each anchor frame to obtain a third prediction loss;
and summing the product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient to obtain the current training loss.
4. The method for training an object detection model according to claim 3, wherein the step of calculating the objectness loss of each of the anchor frames through the first loss function comprises:
when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection ratio between the jth anchor frame of the ith grid and the region where the real target information is located.
5. The method for training the object detection model according to claim 4, wherein, when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (1),
the formula (1) is:
$L_{obj}^{ij} = -\,\mathbb{1}_{ij}^{obj}\,\log\left(\hat{C}_i^j\right)$
wherein $L_{obj}^{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_i^j$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}_{ij}^{obj}$ indicates that the jth anchor frame of the ith grid contains real target information.
6. The method for training the object detection model according to claim 4, wherein, when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection ratio between the jth anchor frame of the ith grid and the region where the real target information is located comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2),
the formula (2) is:
$w_i^j = 1 - \mathrm{IoU}_i^j$
$L_{noobj}^{ij} = -\,\mathbb{1}_{ij}^{noobj}\, w_i^j\,\log\left(1-\hat{C}_i^j\right)$
wherein $L_{noobj}^{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_i^j$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_i^j$ is the intersection ratio between the jth anchor frame of the ith grid and the region where the real target information is located, $w_i^j$ is the negative sample weight coefficient, and $\mathbb{1}_{ij}^{noobj}$ indicates that the jth anchor frame of the ith grid contains no real target information.
7. The method for training an object detection model according to claim 3, wherein the step of calculating the target category loss of each anchor frame through the second loss function comprises:
determining the target category loss of each of the anchor frames according to formula (3),
the formula (3) is:
$L_{cls}^{ij} = -\sum_{k=0}^{nc-1} y_k \log\left(p_i^j(k)\right)$
wherein $L_{cls}^{ij}$ is the target category loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k is the category of the real target information, $y_k$ is 1 when the prediction category corresponding to the jth anchor frame of the ith grid is the category of the real target information and 0 otherwise, and $p_i^j(k)$ is the predicted probability that the jth anchor frame of the ith grid belongs to category k.
8. The method for training an object detection model according to claim 3, wherein the step of calculating the target coordinate loss of each anchor frame through the third loss function comprises:
determining the target coordinate loss of each of the anchor frames according to formula (4), wherein,
equation (4) is:
$L_{box}^{ij} = 1 - \mathrm{IoU}_i^j + \dfrac{\rho^2\left(b_i^j,\, b_{gt}\right)}{c^2} + \alpha v$
wherein $L_{box}^{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_i^j$ represents the intersection ratio between the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\rho\left(b_i^j, b_{gt}\right)$ represents the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, c represents the diagonal length of the minimum circumscribed rectangle enclosing the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
9. The method of training of an object detection model according to claim 3, the method further comprising:
clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through a Kmeans algorithm to obtain cluster centers corresponding to the cluster categories;
and acquiring the coordinates of each clustering center, and adjusting the size of the corresponding anchor frame according to the coordinates of each clustering center.
10. The method for training the target detection model according to claim 1, wherein the step of adjusting the parameters of the initial detection model according to the current training loss feedback to obtain the target detection model comprises:
judging whether the current training loss is smaller than a preset training loss threshold;
if the current training loss is larger than the preset training loss threshold, continuing to train the initial detection model, adjusting the initial detection model through back propagation with a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
11. An apparatus for training an object detection model, the apparatus comprising:
the first acquisition module is used for acquiring a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
the first construction module is used for constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
the first calculation module is used for inputting the signal time-frequency graphs into the backbone network, calculating the signal time-frequency graphs through the backbone network and outputting a plurality of characteristic layers;
the second calculation module is used for inputting the plurality of characteristic layers into the head network, and calculating each characteristic layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
the second acquisition module is used for acquiring the real target information of the signal time-frequency diagrams;
the second construction module is used for constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
and the adjusting module is used for adjusting the parameters of the initial detection model according to the current training loss feedback to obtain a target detection model.
12. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when run by the processor, performs the method of training an object detection model according to any one of claims 1-10.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, performs the method of training an object detection model according to any one of claims 1-10.
CN202210308846.6A 2022-03-28 2022-03-28 Training method and device of target detection model, computer equipment and storage medium Active CN114492540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210308846.6A CN114492540B (en) 2022-03-28 2022-03-28 Training method and device of target detection model, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114492540A true CN114492540A (en) 2022-05-13
CN114492540B CN114492540B (en) 2022-07-05

Family

ID=81489139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210308846.6A Active CN114492540B (en) 2022-03-28 2022-03-28 Training method and device of target detection model, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114492540B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100492A (en) * 2022-08-26 2022-09-23 摩尔线程智能科技(北京)有限责任公司 Yolov3 network training and PCB surface defect detection method and device
CN116776130A (en) * 2023-08-23 2023-09-19 成都新欣神风电子科技有限公司 Detection method and device for abnormal circuit signals

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10511908B1 (en) * 2019-03-11 2019-12-17 Adobe Inc. Audio denoising and normalization using image transforming neural network
CN111160255A (en) * 2019-12-30 2020-05-15 成都数之联科技有限公司 Fishing behavior identification method and system based on three-dimensional convolutional network
CN111541511A (en) * 2020-04-20 2020-08-14 中国人民解放军海军工程大学 Communication interference signal identification method based on target detection in complex electromagnetic environment
CN111832462A (en) * 2020-07-07 2020-10-27 四川大学 Frequency hopping signal detection and parameter estimation method based on deep neural network
US20210157312A1 (en) * 2016-05-09 2021-05-27 Strong Force Iot Portfolio 2016, Llc Intelligent vibration digital twin systems and methods for industrial environments
CN113033473A (en) * 2021-04-15 2021-06-25 中国人民解放军空军航空大学 ST2DCNN + SE-based radar overlapped signal identification method
CN113421281A (en) * 2021-05-17 2021-09-21 西安电子科技大学 Pedestrian micromotion part separation method based on segmentation theory
CN114154545A (en) * 2021-12-07 2022-03-08 中国人民解放军32802部队 Intelligent unmanned aerial vehicle measurement and control signal identification method under strong mutual interference condition
CN114171044A (en) * 2021-12-09 2022-03-11 江苏恩美谛医疗科技有限公司 Time domain full convolution based deep neural network electronic stethoscope self-adaptive noise elimination method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUNG IM CHOI等: "Adversarial Attack and Defense of YOLO Detectors in Autonomous Driving Scenarios", 《COMPUTER VISION AND PATTERN RECOGNITION》 *
JUN-YING HU等: "Saliency-based YOLO for single target detection", 《KNOWLEDGE AND INFORMATION SYSTEMS》 *
GUO QI: "Research on Electromagnetic Signal Recognition Technology Based on Machine Learning", China Masters' Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN114492540B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN114492540B (en) Training method and device of target detection model, computer equipment and storage medium
US6564176B2 (en) Signal and pattern detection or classification by estimation of continuous dynamical models
Ouarda et al. Intercomparison of regional flood frequency estimation methods at ungauged sites for a Mexican case study
CN107992447B (en) Feature selection decomposition method applied to river water level prediction data
CN111091233A (en) Wind power plant short-term wind power prediction modeling method based on wavelet analysis and multi-model AdaBoost depth network
CN111046787A (en) Pedestrian detection method based on improved YOLO v3 model
CN112735097A (en) Regional landslide early warning method and system
CN114565124A (en) Ship traffic flow prediction method based on improved graph convolution neural network
CN112633174B (en) Improved YOLOv4 high-dome-based fire detection method and storage medium
CN110224771B (en) Spectrum sensing method and device based on BP neural network and information geometry
CN111880158A (en) Radar target detection method and system based on convolutional neural network sequence classification
CN113988357B (en) Advanced learning-based high-rise building wind induced response prediction method and device
CN113852432A (en) RCS-GRU model-based spectrum prediction sensing method
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN113487600A (en) Characteristic enhancement scale self-adaptive sensing ship detection method
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
CN113536373A (en) Desensitization meteorological data generation method
CN115859840B (en) Marine environment power element region extremum analysis method
CN117218545A (en) LBP feature and improved Yolov 5-based radar image detection method
CN114822562A (en) Training method of voiceprint recognition model, voiceprint recognition method and related equipment
CN104730509A (en) Radar detection method based on knowledge auxiliary permutation detection
CN113688774B (en) Advanced learning-based high-rise building wind induced response prediction and training method and device
CN118094487B (en) Method and system for predicting precipitation by multi-source meteorological elements based on space-time perception mechanism
KR102664948B1 (en) Apparatus for preprocessing input data for artificial intelligence model and method therefor
CN116432518B (en) Rapid forecasting method, system, equipment and medium for occurrence probability of malformed wave

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220513

Assignee: Chengdu Shuzhi Innovation Lean Technology Co.,Ltd.

Assignor: Chengdu shuzhilian Technology Co.,Ltd.

Contract record no.: X2024510000014

Denomination of invention: Training methods, devices, computer equipment, and storage media for object detection models

Granted publication date: 20220705

License type: Common License

Record date: 20240723
