CN111353440A - Target detection method - Google Patents
Target detection method
- Publication number: CN111353440A
- Application number: CN202010137294.8A
- Authority
- CN
- China
- Prior art keywords
- training
- target
- model
- data set
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a target detection method comprising the following steps: constructing a video target detection and recognition model; inputting the labelled data of a training data set into a fully convolutional network; optimizing a composite loss value function generated as a weighted linear combination of independent loss functions; and obtaining the overall loss function by training minimization. The video target detection and recognition model uses a logistic activation function at the last layer of the network model to calculate a classification confidence value and a rectangular selection box. The invention improves the accuracy and real-time performance of target detection and recognition in high-frame-rate, high-definition video.
Description
Technical Field
The invention belongs to the technical field of video identification, and particularly relates to a target detection method.
Background
With the rapid development of fields such as intelligent surveillance and intelligent transportation, the growth of high-frame-rate, high-definition data sources, and the demanding, complex, and variable nature of practical application scenarios, classical methods can no longer meet the latest requirements: they cannot deliver the accuracy and real-time performance required for target detection and recognition in high-frame-rate, high-definition video.
Therefore, how to provide a target detection method with high accuracy and real-time performance is a problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of this, the present invention provides a target detection method that can meet the accuracy and real-time requirements of high-frame-rate, high-definition video target detection and recognition in demanding, complex, and variable practical application scenarios.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method of target detection, comprising:
constructing a video target detection and recognition model; inputting the labelled data in a training data set into a fully convolutional network through a high-speed storage medium; optimizing a composite loss value function generated as a weighted linear combination of independent loss functions; and obtaining the overall loss function by training minimization;
all layers of the video target detection and recognition model except the last use a leaky rectified linear unit (Leaky ReLU) as the activation function, and the last layer of the network model uses a logistic activation function to calculate a classification confidence value and a rectangular selection box, thereby realizing real-time detection and recognition of vehicle and pedestrian targets in video.
Preferably, the method further comprises testing the video target detection and recognition model: the verification data in the validation data set are fed into the optimized fully convolutional network to obtain predicted rectangular selection boxes, which are compared with the true rectangular selection boxes in the validation data set to compute the average area under the function curve.
Preferably, pre-trained weights and threshold initializations are input to the fully convolutional network, which is trained with a learning rate decay strategy in a learning mode.
Preferably, the method for initializing the pre-trained weights and thresholds is as follows: before training, the pre-trained weights and thresholds are initialized with the parameters of a pre-trained model obtained by training on a standard data set.
Preferably, the learning rate decay strategy is as follows: model parameters are initialized before training with parameters obtained by the video target detection and recognition model; training uses a mini-batch gradient descent method whose initial per-iteration learning rate is lr = 0.001; and the model outputs a confidence value and selection box coordinates for each relevant target class in each partition.
Preferably, the method of optimizing the composite loss value function generated as a weighted linear combination of independent loss functions and obtaining the overall loss function by training minimization is as follows:
setting Si(xg,yg,ωg,hg) True rectangular selection box, x, for the targetg,ygIs the central point;
setting S (x, y, omega, h) as a prediction rectangle selection frame of a target, wherein x and y are central points;
wherein, x and y are coordinates of the central point of the prediction rectangle selection frame, and omega and h are the width and height of the prediction rectangle selection frame; x is the number ofg,ygSelecting the coordinates of the center point of the frame, omega, for the true rectanglegAnd hgSelecting the width and height of the frame for the true rectangle;
weighting, based on the regular-term hyperparameter λ, the error D, namely the Euclidean distance between the predicted rectangular selection box of the target detection algorithm and the true selection box of the target's real label, to obtain the loss function L1 = λ·D, thereby generalizing the detection and recognition capability, and at the same time calculating the Intersection over Union (IOU) loss value L2 = λ·IOU, where IOU is a standard measure of the accuracy with which the corresponding object is detected in a specific data set;
after each input sample passes through the network, its 81 partitions generate 486 probabilities: a given spatial point has a probability Pr that quantitatively measures the presence of a target, the 81 partitions give class probabilities Pc for the 6 classes, and these are compounded with the partitions' quantitative target-presence probability Po; after determining the probability that a target present in a given partition belongs to a given class, compounding the unconditional probability of that single independent class yields the corresponding classification loss value Lc = Pc·Po; the overall loss function is L = L1 + L2 + Lc.
Preferably, the convolutional and pooling layers used by the video target detection and recognition model support multi-size image input; a single-size data set is enhanced through preprocessing, and during training images of different sizes are randomly selected after every M batches.
Preferably, in multi-class target detection, a recall-precision curve can be drawn for each target class, and AveP(q) is the area under that curve, as shown in Equation 8; mAP is the average of the AP values over the Q classes, where the class index q is an integer from 1 to Q.
Preferably, the training data set consists of various types of video at various resolutions, including the D5 format; the validation data set is formed by reserving 20% of the data in the training data set as validation data.
The invention has the beneficial effects that:
the method is based on separation confidence calculation and regular term hyper-parameters, designs a composite loss function based on classification confidence values, can be used for detecting and identifying high-frame-rate high-definition video targets under high-requirement and complex and variable practical application scenes, and obviously improves the accuracy and the real-time performance of measurement based on mAP and camera frame rate fps.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a diagram illustrating a video object detection and recognition model according to the present invention.
FIG. 2 is a diagram illustrating the testing of the video target detection and recognition model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a target detection method, including:
constructing a video target detection and recognition model; inputting the labelled data in the training data set into the fully convolutional network through a high-speed storage medium; loading the pre-trained weights and threshold initializations into the fully convolutional network and training it with a learning rate decay strategy in a learning mode; and using a weighted linear combination of independent loss functions to optimize the generated composite loss value function, obtaining the overall loss function by training minimization.
The video target detection and recognition model uses a logistic activation function at the last layer of the network model to calculate a classification confidence value and a rectangular selection box, thereby realizing real-time detection and recognition of vehicle and pedestrian targets in video.
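The activation scheme described in this disclosure (a leaky rectified linear unit in every layer but the last, a logistic function in the last layer) can be sketched as follows; the leak slope α = 0.1 is an illustrative assumption, since the patent does not state it:

```python
import math

def leaky_relu(x, alpha=0.1):
    # Leaky ReLU used in every layer except the last; alpha is an
    # assumed slope, not a value stated in the patent.
    return x if x >= 0.0 else alpha * x

def logistic(x):
    # Logistic (sigmoid) activation used in the last layer to produce
    # classification confidence values and selection-box coordinates in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))
```

Because the logistic output is bounded in (0, 1), the last layer's confidence values are directly comparable across partitions.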
The invention also comprises testing of the video target detection and recognition model: the verification data in the validation data set are fed into the optimized fully convolutional network to obtain predicted rectangular selection boxes, which are compared with the true rectangular selection boxes in the validation data set to compute the average area under the function curve, mAP.
The training data set consists of various types of video at various resolutions, including the D5 format; the validation data set is formed by reserving 20% of the data in the training data set as validation data.
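Reserving 20% of the labelled data as validation data can be sketched as follows; the shuffling and fixed seed are illustrative assumptions, since the patent only fixes the 20% fraction:

```python
import random

def split_train_val(samples, val_fraction=0.2, seed=0):
    # Shuffle a copy of the data, then reserve val_fraction of it
    # as the validation set (20% per the text).
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training set, validation set)
```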
The method for initializing the pre-trained weights and thresholds is as follows: before training, the pre-trained weights and thresholds are initialized with the parameters of a pre-trained model obtained by training on a standard data set.
The learning rate decay strategy is as follows: model parameters are initialized before training with parameters obtained by the video target detection and recognition model, and training uses a mini-batch gradient descent method whose initial per-iteration learning rate is lr = 0.001. The purpose is to retain both learning capacity and memory capacity, so that new knowledge can be learned without completely forgetting old knowledge. Because memory usage is large and prone to overflow, the batch size generally does not exceed 128. The model outputs a confidence value and selection box coordinates for each relevant target class in each partition.
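The schedule above fixes only the initial learning rate (lr = 0.001) and the batch-size ceiling (128); the step-decay factor and interval in this sketch are illustrative assumptions:

```python
MAX_BATCH_SIZE = 128  # batch size generally does not exceed 128 (memory limits)

def learning_rate(iteration, lr0=0.001, decay=0.5, step=10000):
    # Step decay: start at lr0 = 0.001 and multiply by `decay` every
    # `step` iterations; the decay factor and step size are assumed values,
    # not stated in the patent.
    return lr0 * (decay ** (iteration // step))
```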
The method of using a weighted linear combination of independent loss functions to optimize the generated composite loss value function and obtain the overall loss function by training minimization is as follows:
setting Si(xg,yg,ωg,hg) True rectangular selection box, x, for the targetg,ygIs the central point;
setting S (x, y, omega, h) as a prediction rectangle selection frame of a target, wherein x and y are central points;
wherein, x and y are coordinates of the central point of the prediction rectangle selection frame, and omega and h are the width and height of the prediction rectangle selection frame; x is the number ofg,ygSelecting the coordinates of the center point of the frame, omega, for the true rectanglegAnd hgSelecting the width and height of the frame for the true rectangle;
The error D, namely the Euclidean distance between the predicted rectangular selection box of the target detection algorithm and the true selection box of the target's real label, is weighted based on the regular-term hyperparameter λ to obtain the loss function L1 = λ·D, thereby generalizing the detection and recognition capability; at the same time, the Intersection over Union (IOU) loss value L2 = λ·IOU is calculated, where IOU is a standard measure of the accuracy with which the corresponding object is detected in a specific data set, and the regular-term hyperparameter λ ∈ [0, ∞).
After each input sample passes through the network, its 81 partitions generate 486 probabilities. A given spatial point has a probability Pr that quantitatively measures the presence of a target, and the 81 partitions give class probabilities Pc for the 6 classes; these must be compounded with the partitions' quantitative target-presence probability Po. After determining the probability that a target present in a given partition belongs to a given class, compounding the unconditional probability of that single independent class yields the corresponding classification loss value Lc = Pc·Po. The overall loss function is L = L1 + L2 + Lc.
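A sketch of the composite loss for a single prediction, with boxes in centre format (x, y, ω, h). Three interpretive assumptions: the text gives L2 = λ·IOU literally (many detectors instead use λ·(1 − IoU)); the classification term Lc is read here as the compounded probability Pc·Po; and the value λ = 0.5 is arbitrary, since the text only constrains λ ∈ [0, ∞):

```python
import math

def iou(box_a, box_b):
    # Intersection over Union for centre-format boxes (x, y, w, h).
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0.0 else 0.0

def composite_loss(pred, truth, pc, po, lam=0.5):
    # L1 = λ·D: Euclidean distance between predicted and true centre points.
    l1 = lam * math.hypot(pred[0] - truth[0], pred[1] - truth[1])
    # L2 = λ·IOU, as stated in the text.
    l2 = lam * iou(pred, truth)
    # Lc = Pc·Po: class probability compounded with the partition's
    # target-presence probability (one reading of the text).
    lc = pc * po
    return l1 + l2 + lc  # overall loss L = L1 + L2 + Lc
```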
In another embodiment, the convolutional and pooling layers used by the video target detection and recognition model support multi-size image input; a single-size data set is enhanced through preprocessing, and during training images of different sizes are randomly selected after every M batches (M = 20). The down-sampling parameters of the network model are set to multiples of 64, i.e. sizes in {320, 384, …, 1216}, with the smallest size 320 × 320 and the largest 1216 × 1216. The model only needs to be fine-tuned to the corresponding size before training of the next batch continues.
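The multi-scale size selection described above (square sizes that are multiples of 64, from 320 × 320 to 1216 × 1216, re-drawn every M = 20 batches) can be sketched as:

```python
import random

RESIZE_EVERY_M_BATCHES = 20  # M = 20 batches between size changes
SIZES = list(range(320, 1216 + 1, 64))  # {320, 384, ..., 1216}, multiples of 64

def random_input_size(rng=random):
    # Pick a square input size; the model is then fine-tuned to this
    # size before training of the next batch continues.
    s = rng.choice(SIZES)
    return (s, s)
```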
In multi-class target detection, a recall-precision curve can be drawn for each target class, and AveP(q) is the area under that curve, as shown in Equation 8; mAP is the average of the AP values over the Q classes, where the class index q is an integer from 1 to Q: mAP = (1/Q)·Σ_{q=1}^{Q} AveP(q).
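AveP(q) and mAP as defined above can be sketched with a step-wise integration of the recall-precision curve; the integration scheme is an assumption, since Equation 8 is not reproduced in the text:

```python
def average_precision(recalls, precisions):
    # AveP(q): area under the recall-precision curve for one class,
    # integrated step-wise over monotonically increasing recall values.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_average_precision(aps):
    # mAP: average of AveP(q) over the Q classes, q = 1..Q.
    return sum(aps) / len(aps)
```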
With the model designed by this method, the classification confidence value and rectangular selection box are computed in the last layer of the network, realizing a real-time detection and recognition model for vehicles and pedestrians in video. The method can be used for high-frame-rate, high-definition video target detection and recognition in demanding, complex, and variable practical application scenarios; compared with classical methods, it significantly improves accuracy (mAP) and real-time performance (frame rate, supporting up to 70 fps), meeting the performance requirements of high-frame-rate, high-definition video target detection and recognition.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A method of object detection, comprising:
constructing a video target detection and recognition model; inputting the labelled data in a training data set into a fully convolutional network through a high-speed storage medium; optimizing a composite loss value function generated as a weighted linear combination of independent loss functions; and obtaining the overall loss function by training minimization;
the video target detection and recognition model calculates a classification confidence value and a rectangular selection box by using a logistic activation function at the last layer of the network model, so that real-time detection and recognition of vehicle and pedestrian targets in video is realized.
2. The method of claim 1, further comprising testing the video target detection and recognition model: feeding the verification data in the validation data set into the optimized fully convolutional network to obtain predicted rectangular selection boxes, and computing, against the true rectangular selection boxes in the validation data set, the average area under the function curve.
3. The method of claim 2, wherein pre-trained weights and threshold initializations are input to the fully convolutional network, which is trained with a learning rate decay strategy in a learning mode.
4. The method of claim 3, wherein initializing the pre-trained weights and thresholds comprises: initializing them before training with the parameters of a pre-trained model obtained by training on a standard data set.
5. The method of claim 4, wherein the learning rate decay strategy comprises: initializing model parameters before training with parameters obtained by the video target detection and recognition model; training with a mini-batch gradient descent method whose initial per-iteration learning rate is lr = 0.001; and having the model output a confidence value and selection box coordinates for each relevant target class in each partition.
6. The method of claim 1, wherein using a weighted linear combination of independent loss functions to optimize the generated composite loss value function and obtain the overall loss function by training minimization comprises:
setting S_i(x_g, y_g, ω_g, h_g) as the true rectangular selection box of the target, wherein x_g and y_g are the coordinates of its centre point, and ω_g and h_g are its width and height;
setting S(x, y, ω, h) as the predicted rectangular selection box of the target, wherein x and y are the coordinates of its centre point, and ω and h are its width and height;
weighting, based on the regular-term hyperparameter λ, the error D, namely the Euclidean distance between the predicted rectangular selection box of the target detection algorithm and the true selection box of the target's real label, to obtain the loss function L1 = λ·D, thereby generalizing the detection and recognition capability, and at the same time calculating the Intersection over Union (IOU) loss value L2 = λ·IOU, where IOU is a standard measure of the accuracy with which the corresponding object is detected in a specific data set;
after each input sample passes through the network, its 81 partitions generate 486 probabilities: a given spatial point has a probability Pr that quantitatively measures the presence of a target, the 81 partitions give class probabilities Pc for the 6 classes, and these are compounded with the partitions' quantitative target-presence probability Po; after determining the probability that a target present in a given partition belongs to a given class, compounding the unconditional probability of that single independent class yields the corresponding classification loss value Lc = Pc·Po; the overall loss function is L = L1 + L2 + Lc.
7. The method of claim 1, wherein the convolutional and pooling layers of the video target detection and recognition model support multi-size image input, the single-size data set is enhanced through preprocessing, and images of different sizes are randomly selected after every M training batches.
8. The method as claimed in claim 2, wherein, in the detection of multiple classes of targets, a recall-precision curve can be drawn for each target class, and AveP(q) is the area under that curve, as shown in Equation 8; mAP is the average of the AP values over the Q classes, where the class index q is an integer from 1 to Q.
9. The method of claim 2, wherein the training data set consists of various types of video at various resolutions, including the D5 format; the validation data set is formed by reserving 20% of the data in the training data set as validation data.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019113972729 | 2019-12-30 | | |
CN201911397272 | 2019-12-30 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111353440A | 2020-06-30 |
Family
ID=71197229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010137294.8A (pending) | Target detection method | 2019-12-30 | 2020-03-03 |
Country Status (1)
Country | Link |
---|---|
CN | CN111353440A |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860510A | 2020-07-29 | 2020-10-30 | 浙江大华技术股份有限公司 | X-ray image target detection method and device |
CN112149501A | 2020-08-19 | 2020-12-29 | 北京豆牛网络科技有限公司 | Method and device for identifying packaged fruits and vegetables, electronic equipment and computer readable medium |
CN112613462A | 2020-12-29 | 2021-04-06 | 安徽大学 | Weighted intersection ratio method |
CN112613462B | 2020-12-29 | 2022-09-23 | 安徽大学 | Weighted intersection ratio method |
CN113537242A | 2021-07-19 | 2021-10-22 | 安徽炬视科技有限公司 | Small target detection algorithm based on dense deconvolution and specific loss function |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135267B (en) | Large-scene SAR image fine target detection method | |
CN111652321B (en) | Marine ship detection method based on improved YOLOV3 algorithm | |
CN112396002B (en) | SE-YOLOv 3-based lightweight remote sensing target detection method | |
CN111353440A (en) | Target detection method | |
CN114202672A (en) | Small target detection method based on attention mechanism | |
CN112101430B (en) | Anchor frame generation method for image target detection processing and lightweight target detection method | |
CN111126472A (en) | Improved target detection method based on SSD | |
CN111079739B (en) | Multi-scale attention feature detection method | |
Zhou et al. | Octr: Octree-based transformer for 3d object detection | |
CN112927279A (en) | Image depth information generation method, device and storage medium | |
CN114332578A (en) | Image anomaly detection model training method, image anomaly detection method and device | |
CN112801183A (en) | Multi-scale target detection method based on YOLO v3 | |
CN111310821A (en) | Multi-view feature fusion method, system, computer device and storage medium | |
CN116310850B (en) | Remote sensing image target detection method based on improved RetinaNet | |
CN116266387A (en) | YOLOV4 image recognition algorithm and system based on re-parameterized residual error structure and coordinate attention mechanism | |
CN114565842A (en) | Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware | |
KR20210093875A (en) | Video analysis methods and associated model training methods, devices, and devices | |
CN115661767A (en) | Image front vehicle target identification method based on convolutional neural network | |
CN117853955A (en) | Unmanned aerial vehicle small target detection method based on improved YOLOv5 | |
CN114494823A (en) | Commodity identification, detection and counting method and system in retail scene | |
CN113435324B (en) | Vehicle target detection method and device and computer readable storage medium | |
CN113139540B (en) | Backboard detection method and equipment | |
CN118279320A (en) | Target instance segmentation model building method based on automatic prompt learning and application thereof | |
US20230343082A1 (en) | Encoding of training data for training of a neural network | |
CN114998672B (en) | Small sample target detection method and device based on meta learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| WD01 | Invention patent application deemed withdrawn after publication | |
Application publication date: 2020-06-30