CN115588150A - Pet dog video target detection method and system based on improved YOLOv5-L - Google Patents

Pet dog video target detection method and system based on improved YOLOv5-L

Info

Publication number
CN115588150A
CN115588150A
Authority
CN
China
Prior art keywords
video
pet dog
module
model
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211151017.8A
Other languages
Chinese (zh)
Inventor
黄步添
汪志刚
刘振广
焦颖颖
许曼迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yunxiang Network Technology Co Ltd
Original Assignee
Hangzhou Yunxiang Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yunxiang Network Technology Co Ltd filed Critical Hangzhou Yunxiang Network Technology Co Ltd
Priority to CN202211151017.8A priority Critical patent/CN115588150A/en
Publication of CN115588150A publication Critical patent/CN115588150A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pet dog video target detection method based on improved YOLOv5-L, which comprises the following steps: collecting pet dog image data to construct an initial training set; collecting video data containing pet dogs to construct a test set; performing frame extraction on the videos in the test set and storing the resulting frame images; preprocessing the initial training set to obtain a final training set; improving the YOLOv5-L model, specifically: building a BackBone network, improving the Pred module, and adding an SK attention mechanism behind the BackBone network; setting training parameters, training the improved model, and saving an optimal weight parameter file; loading the weight parameter file into a detector, detecting the videos in the test set, saving all video frames in which a pet dog is detected, and evaluating the detection result with the AP index. The invention reduces the parameter count of the model and improves the detection accuracy on blurred and occluded video frame images.

Description

Pet dog video target detection method and system based on improved YOLOv5-L
Technical Field
The invention relates to the technical field of video target detection, in particular to a pet dog video target detection method and system based on improved YOLOv 5-L.
Background
Pet dogs are currently among the most common companion animals; many people keep them to relieve loneliness or for entertainment. After domestication, dogs are intelligent, active animals that understand human moods and are loyal to their owners, so understanding the behavior of pet dogs is an important research topic.
Target detection is currently a hotspot in the field of computer vision. A traditional classification task generally concerns only the image as a whole and produces a single content description of it, whereas target detection focuses on specific object targets: it must extract the targets of interest from the background and determine their positions, so its output is a list containing the category and position of each target. Existing target detection algorithms generally fall into two types: two-stage detection models and one-stage detection models. A two-stage detection model first generates candidate regions, called region proposals, and then classifies the samples with a convolutional network; common two-stage detection models include R-CNN, SPP-Net and Fast R-CNN. A one-stage detection model does not need to generate region proposals; it extracts features directly from the input data and directly predicts the category and position of an object. Common algorithms include SSD and YOLO.
Although existing two-stage detection models achieve good test accuracy on general-purpose data sets, their detection speed is very slow; in particular, for video detection, two-stage models cannot process videos with an fps greater than 25 in real time. One-stage detection models are faster, and the detection speed of the YOLOv5 model far exceeds that of two-stage models. However, existing target detection models are only well suited to objects with regular shapes; in pet dog video target detection, the dog's form changes as it moves, and the model has difficulty detecting it accurately.
Disclosure of Invention
In view of the above problems, the present invention aims to provide an improved YOLOv 5-L-based target detection model, and to enhance data by preprocessing a data set, so as to improve the accuracy of detecting a motion video frame of a pet dog.
Based on the above purpose, the invention provides a pet dog video target detection method and system based on improved YOLOv 5-L.
A pet dog video target detection method based on improved YOLOv5-L comprises the following steps:
respectively constructing an initial training set and a test set based on the acquired image data containing the pet dog and the acquired video data containing the pet dog;
carrying out frame extraction on the video containing the pet dog to obtain a frame image;
preprocessing the initial training set to obtain a final training set;
improving and training a YOLOv5-L model, specifically comprising the following steps: building a BackBone network, improving a Pred module, and adding an SK attention mechanism behind the BackBone network; setting training parameters, training the improved YOLOv5-L model, and saving an optimal weight parameter file; loading the optimal weight parameter file into a detector, detecting the videos in the final test set, saving all video frames in which a pet dog is detected, and evaluating the detection result with the AP index to obtain the optimal improved YOLOv5-L model;
and inputting the video of the pet dog to be detected into the optimal improved YOLOv5-L model to obtain a corresponding detection result.
As an implementation, constructing the initial training set and the test set comprises the following steps:
obtaining all marked pet dog pictures based on the obtained image data containing the pet dog;
labeling all the pictures, which have different backgrounds, with the LabelImg labeling tool to obtain labeled pet dog pictures, wherein the different backgrounds comprise at least one or more of grassland, snow mountain, indoor and street;
merging the marked pet dog pictures into an initial training set;
searching for videos of interaction between a person and a pet dog on a video website, and downloading and storing the videos with a 4K Video tool;
and cutting the stored videos, splitting the original videos into short clips of 3s-10s, and storing all the short clips to obtain a test set.
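The 3s-10s clip splitting described above can be sketched as follows; `plan_clips` and its parameters are illustrative assumptions, not names from the patent:

```python
# Sketch: plan how one long source video is cut into 3 s - 10 s test clips.

def plan_clips(duration_s, min_len=3.0, max_len=10.0):
    """Return (start, end) second pairs cutting a video into short clips;
    a trailing remainder shorter than min_len is dropped."""
    clips, start = [], 0.0
    while duration_s - start >= min_len:
        end = min(start + max_len, duration_s)
        clips.append((start, end))
        start = end
    return clips

print(plan_clips(25.0))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 25.0)]
```

Each planned (start, end) pair would then be handed to a video cutter; the planning arithmetic alone guarantees every clip length falls in the stated 3s-10s range.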
As an implementation, the frame extraction of the video in the test set and the preprocessing of the initial training set include the following steps:
extracting the video in the test set frame by frame through an extractor algorithm, and storing all video frame images;
selecting, from the video frame images, pictures in which part of the pet dogs have abnormal shapes or motion blur, and labeling them to obtain labeled pictures;
randomly selecting a plurality of labeled pictures to perform left-right translation, multi-picture superposition and proportional scaling to obtain processed labeled pictures with various morphological characteristics;
and merging the processed labeled picture and the initial training set to obtain a final training set.
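The three augmentations named above (left-right translation, multi-picture superposition, proportional scaling) can be sketched on raw NumPy arrays; the function names are assumptions, not from the patent:

```python
import numpy as np

def translate_lr(img, shift):
    """Left-right translation: shift columns, zero-filling the vacated strip."""
    out = np.zeros_like(img)
    if shift >= 0:
        out[:, shift:] = img[:, : img.shape[1] - shift]
    else:
        out[:, :shift] = img[:, -shift:]
    return out

def overlay(img_a, img_b, alpha=0.5):
    """Multi-picture superposition: weighted blend of two same-sized images."""
    return (alpha * img_a + (1 - alpha) * img_b).astype(img_a.dtype)

def scale(img, factor):
    """Proportional scaling by nearest-neighbour sampling."""
    h, w = img.shape[:2]
    ys = (np.arange(int(h * factor)) / factor).astype(int)
    xs = (np.arange(int(w * factor)) / factor).astype(int)
    return img[ys][:, xs]

patch = np.full((4, 4, 3), 100, dtype=np.uint8)
print(scale(patch, 2.0).shape)  # (8, 8, 3)
```

In practice such transforms would also remap the bounding-box labels; the sketch covers only the pixel side.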
As an implementation, the BackBone network is built comprising a down-sampling module, a CBR module, a Res module and a CSP_X module;
the down-sampling module divides the 640×640-pixel RGB image into a 12-channel feature map with a split algorithm, and obtains a 64-channel feature map by convolution;
the CBR module comprises a 3×3 convolution layer, a regularization layer and a ReLU function;
the Res module comprises two CBR modules and an empty-layer (identity) residual, connected with each other;
the CSP_X module is used for extracting features and comprises a CBR module, X Res modules and an empty-layer residual, connected with one another, wherein X denotes the number of Res modules.
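The down-sampling module's split algorithm (assuming the usual YOLOv5 Focus-style slicing, which the 3→12 channel figure above matches) can be sketched as follows:

```python
import numpy as np

def split_downsample(img):
    """Focus-style slice: (H, W, C) -> (H/2, W/2, 4C), e.g. 3 -> 12 channels.
    The four even/odd pixel phases become four channel groups; no pixel
    information is lost, unlike plain strided pooling."""
    return np.concatenate(
        [img[0::2, 0::2], img[1::2, 0::2], img[0::2, 1::2], img[1::2, 1::2]],
        axis=-1,
    )

x = np.zeros((640, 640, 3))
print(split_downsample(x).shape)  # (320, 320, 12)
```

The subsequent convolution to 64 channels is an ordinary learned layer and is omitted from the sketch.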
As an implementation, the improved Pred module comprises: adding a flatten algorithm before the output layer to flatten the feature map into one dimension, and replacing the convolution layer in the output layer with a fully connected layer.
As an implementation, the SK attention mechanism comprises a split unit, a fuse unit and a select unit; the split unit convolves the original feature map with convolution kernels of three sizes; the fuse unit calculates the weight of each convolution kernel: it sums the feature maps of the three branches element-wise and generates channel statistics through global average pooling, obtaining a new feature of dimension C×1; the select unit calculates a weight for each branch with softmax and fuses all the branches to form the final output.
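The fuse/select arithmetic can be sketched as below. This is a simplified illustration: the real SK block derives the branch logits from fully connected layers applied to the pooled statistic, which is omitted here, and the stand-in per-branch logits are an assumption.

```python
import numpy as np

def sk_select(branches):
    """branches: list of (C, H, W) feature maps from kernels of different sizes."""
    fused = np.sum(branches, axis=0)                  # element-wise sum of branches
    stat = fused.mean(axis=(1, 2))                    # global average pool -> C x 1 statistic
    # Stand-in for the FC-derived logits: per-branch pooled means (assumption).
    logits = np.stack([b.mean(axis=(1, 2)) for b in branches])            # (B, C)
    weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # softmax over branches
    out = sum(w[:, None, None] * b for w, b in zip(weights, branches))
    return out, weights, stat
```

The softmax guarantees the per-channel branch weights sum to 1, so the output is a convex combination of the three branch feature maps.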
As an implementation, the improving YOLOv5-L model and training further comprises the following steps:
modifying the class-number field in the YAML configuration file to change the detection categories, the categories comprising: dog, human;
setting an NMS mechanism to retain the best prediction box and reduce the confidence of the remaining prediction boxes to 0;
setting a Loss function as DIOU _ Loss;
setting the training hyper-parameters: the number of training rounds is 300, the optimizer is an improved SGD, the initial learning rate is 0.01, the learning-rate momentum is 0.95 and the training batch size is 64;
and feeding the training set into the model for training, obtaining the optimal weight parameters through multiple iterations, and saving the file as best.pt.
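The DIOU_Loss chosen above combines IoU with a normalized centre-distance penalty; a minimal sketch for two axis-aligned (x1, y1, x2, y2) boxes:

```python
def diou_loss(box_a, box_b):
    """DIoU loss: 1 - IoU + (centre distance)^2 / (enclosing-box diagonal)^2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # squared distance between box centres
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    return 1.0 - iou + (rho2 / c2 if c2 > 0 else 0.0)

print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for identical boxes
```

Unlike plain IoU loss, the distance term still provides a gradient when the predicted and ground-truth boxes do not overlap at all.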
As an implementation, the optimal weight parameters are loaded into a detector, a scaling algorithm is added to fix the size of each incoming video frame at 640×640 pixels, the test-set videos are fed in for detection, and all video frames in which a pet dog is detected are saved; the accuracy of the model is evaluated with the AP index, calculated as: AP = number of video frames in which the pet dog is detected / total number of video frames in which a pet dog appears.
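The AP index as defined above is a frame-level ratio (effectively frame recall, not the usual precision-recall-curve AP); a minimal sketch:

```python
def frame_ap(detected_frames, gt_frames):
    """AP as defined in the text: frames in which the dog was detected,
    divided by all frames in which a dog appears. Both arguments are sets
    of frame indices; only frames that truly contain a dog count as hits."""
    hits = len(detected_frames & gt_frames)
    return hits / len(gt_frames) if gt_frames else 0.0

print(frame_ap({1, 2, 3}, {1, 2, 3, 4}))  # 0.75
```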
A pet dog video target detection system based on improved YOLOv5-L comprises a data acquisition module, an image extraction module, a preprocessing module, a model improvement training module and a result detection module;
the data acquisition module is used for respectively constructing an initial training set and a test set based on the acquired image data containing the pet dog and the acquired video data containing the pet dog;
the image extraction module is used for carrying out frame extraction on the video containing the pet dog to obtain a frame image;
the preprocessing module is used for preprocessing the initial training set to obtain a final training set;
the model improvement training module is used for improving and training a YOLOv5-L model, specifically: building a BackBone network, improving a Pred module, and adding an SK attention mechanism behind the BackBone network; setting training parameters, training the improved YOLOv5-L model, and saving an optimal weight parameter file; loading the optimal weight parameter file into a detector, detecting the videos in the final test set, saving all video frames in which a pet dog is detected, and evaluating the detection result with the AP index to obtain the optimal improved YOLOv5-L model;
and the result detection module is used for inputting the video of the pet dog to be detected into the optimal YOLOv5-L model to obtain a corresponding detection result.
Compared with the prior art, the pet dog video target detection method based on the improved YOLOv5-L has the following beneficial effects:
1. by combining a plurality of data sets as training sets, the data volume during training is increased, and the features which can be trained by the model are enriched;
2. by improving the YOLOv5-L model, the parameter quantity of the model is reduced, and the detection speed is increased;
3. blurred and occluded frames are extracted from the test-set videos and merged into the training set, which improves the accuracy of detecting motion-blurred pet dogs; when the form of the pet dog changes, the detection accuracy is higher than that of the unimproved YOLOv5-L model;
4. an SK attention mechanism is added, the attention degree of the model to important features is improved, and local and global relations are better acquired.
Drawings
FIG. 1 is a flow chart of the overall implementation of the present invention.
Fig. 2 is a diagram of the steps of frame extraction for videos in the test set and preprocessing for the initial training set.
Fig. 3 shows the detection result of a video frame of a video in the test set.
Detailed Description
In order to clearly illustrate the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings, so that those skilled in the art can implement the embodiments with reference to the description.
FIG. 1 is a diagram of the overall process steps of the invention, and a pet dog video target detection method based on improved YOLOv5-L comprises the following steps:
step one, constructing and constructing an initial training set and a test set: collecting a data set dogbred and a data set Dogs vs Cats Redox on a kaggee, and extracting pictures related to pet Dogs in the two data sets; collecting pictures with different background noises (such as grassland, snow mountain, indoor and street), wherein the pictures contain the pictures of pet dogs; labeling all the pictures by using a LabelImg labeling tool to obtain labeled pictures of the pet dog; merging the marked pet dog pictures into an initial training set; collecting videos of interaction between a person and a pet dog on a youtube website, and downloading and storing the videos by using a 4Kvideo tool; and cutting the stored video, splitting the original video into short videos of 3s-10s, and storing all the short videos to obtain a test set.
Step two, performing frame extraction on the videos in the test set and preprocessing the initial training set, specifically: extracting the videos in the test set frame by frame with an extractor algorithm and storing all video frame images; selecting and labeling pictures in which part of the pet dogs have abnormal shapes or motion blur, obtaining labeled pictures; randomly selecting pictures in the training set for left-right translation, multi-picture superposition and scaling, thereby enriching the morphological characteristics of the pet dog; and merging the labeled pictures with the initial training set to obtain the final training set.
Step three, improving the YOLOv5-L model by first building the BackBone network, which specifically comprises a down-sampling module, CBR modules, Res modules and CSP_X modules; the down-sampling module divides the 640×640-pixel RGB image into a 12-channel feature map with a split algorithm and obtains a 64-channel feature map by convolution; the Backbone comprises 5 CBR modules, each consisting of a 3×3 convolution layer, a regularization layer and a ReLU function; the Res module connects two CBR modules with an empty-layer (identity) residual; the CSP_X module, used for extracting the main features, connects a CBR module and X Res modules with an empty-layer residual; the Backbone comprises one CSP_2 module, two CSP_4 modules and one CSP_8 module.
Step four, improving the YOLOv5-L model by then improving the Pred module, specifically: adding a flatten algorithm before the output module to flatten the feature map into one dimension, and replacing the convolution layer in the output module with a fully connected layer; since the model has few detection classes, the fully connected layer does not add excessive parameter computation and can achieve better detection accuracy.
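The flatten-plus-fully-connected output of this step can be sketched as follows; all shapes here are illustrative assumptions, not figures from the patent:

```python
import numpy as np

def pred_head(feature_map, weight, bias):
    """Flatten the feature map to 1-D, then apply a fully connected layer
    in place of the output convolution."""
    flat = feature_map.reshape(-1)   # the flatten algorithm
    return weight @ flat + bias      # fully connected output

rng = np.random.default_rng(0)
fm = rng.normal(size=(8, 4, 4))        # C x H x W feature map (assumed shape)
w = rng.normal(size=(2, 8 * 4 * 4))    # two outputs: dog, human
out = pred_head(fm, w, np.zeros(2))
print(out.shape)  # (2,)
```

The parameter count of the fully connected layer grows with the flattened size times the number of outputs, which stays modest here because only two classes are detected, matching the reasoning in the step above.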
Step five, improving the YOLOv5-L model by adding an SK attention mechanism behind the BackBone network; the SK attention mechanism consists of split, fuse and select parts; the split part first convolves the original feature map with convolution kernels of three sizes; the fuse part calculates the weight of each convolution kernel: it sums the feature maps of the three branches element-wise and generates channel statistics through global average pooling, obtaining a new feature of dimension C×1; the select part calculates a weight for each branch with softmax and fuses all the branches to form the final output.
Step six, training the improved model, specifically: modifying the class-number field in the YAML configuration file to change the detection categories, the categories comprising: dog, human; setting an NMS mechanism to retain the best prediction box and reduce the confidence of the remaining prediction boxes to 0; setting the Loss function to DIOU_Loss; setting the training hyper-parameters: 300 training rounds, an improved SGD optimizer, an initial learning rate of 0.01, a learning-rate momentum of 0.95 and a training batch size of 64; feeding the training set into the model for training, obtaining the optimal weight parameters through multiple iterations, and saving the file as best.pt.
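The NMS mechanism set in this step (retain the best prediction box, drop the confidence of overlapping boxes to 0) can be sketched for a single class; the single suppression round and the threshold value are simplifying assumptions:

```python
def nms_keep_best(boxes, scores, iou_thr=0.5):
    """Keep the highest-scoring box; zero the confidence of every other box
    whose IoU with it exceeds the threshold."""
    def iou(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        ua = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / ua if ua else 0.0
    best = max(range(len(scores)), key=scores.__getitem__)
    return [s if i == best or iou(boxes[i], boxes[best]) <= iou_thr else 0.0
            for i, s in enumerate(scores)]

print(nms_keep_best([(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 60, 60)],
                    [0.9, 0.8, 0.7]))  # [0.9, 0.0, 0.7]
```

A distant non-overlapping box keeps its confidence, so multiple dogs in one frame can survive suppression.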
Step seven, loading the weight parameter file best.pt into a detector, adding a scaling algorithm to fix the size of each incoming video frame at 640×640 pixels, feeding in the test-set videos for detection, and saving all video frames in which a pet dog is detected; the accuracy of the model is evaluated with the AP index, calculated as: AP = number of video frames in which the pet dog is detected / total number of video frames in which a pet dog appears.
The embodiments described above are presented to enable a person of ordinary skill in the art to make and use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on this disclosure fall within the protection scope of the present invention.

Claims (9)

1. A pet dog video target detection method based on improved YOLOv5-L is characterized by comprising the following steps:
respectively constructing an initial training set and a test set based on the acquired image data containing the pet dog and the acquired video data containing the pet dog;
carrying out frame extraction on the video containing the pet dog to obtain a frame image;
preprocessing the initial training set to obtain a final training set;
improving and training a YOLOv5-L model, specifically comprising the following steps: building a BackBone network, improving a Pred module, and adding an SK attention mechanism behind the BackBone network; setting training parameters, training the improved YOLOv5-L model, and saving an optimal weight parameter file; loading the optimal weight parameter file into a detector, detecting the videos in the final test set, saving all video frames in which a pet dog is detected, and evaluating the detection result with the AP index to obtain the optimal improved YOLOv5-L model;
and inputting the video of the pet dog to be detected into the optimal YOLOv5-L model to obtain a corresponding detection result.
2. The improved YOLOv 5-L-based pet dog video target detection method as claimed in claim 1, wherein the constructing of the initial training set and the test set comprises the following steps:
obtaining all marked pet dog pictures based on the obtained image data containing the pet dog;
labeling all the pictures, which have different backgrounds, with the LabelImg labeling tool to obtain labeled pet dog pictures, wherein the different backgrounds comprise at least one or more of grassland, snow mountain, indoor and street;
merging the marked pet dog pictures into an initial training set;
searching for videos of interaction between a person and a pet dog on a video website, and downloading and storing the videos with a 4K Video tool;
and cutting the stored videos, splitting the original videos into 3s-10s short clips, and storing all the short clips to obtain a test set.
3. The improved YOLOv 5-L-based pet dog video target detection method as claimed in claim 1, wherein the steps of performing frame extraction on the videos in the test set and performing pre-processing on the initial training set comprise the following steps:
extracting the video in the test set frame by frame through an extractor algorithm, and storing all video frame images;
selecting, from the video frame images, pictures in which part of the pet dogs have abnormal shapes or motion blur, and labeling them to obtain labeled pictures;
randomly selecting a plurality of labeled pictures to perform left-right translation, multi-picture superposition and proportional scaling to obtain processed labeled pictures with various morphological characteristics;
and merging the processed labeled picture and the initial training set to obtain a final training set.
4. The improved YOLOv5-L-based pet dog video target detection method as claimed in claim 1, wherein the BackBone network is built comprising a down-sampling module, a CBR module, a Res module and a CSP_X module;
the down-sampling module divides the 640×640-pixel RGB image into a 12-channel feature map with a split algorithm, and obtains a 64-channel feature map by convolution;
the CBR module comprises a 3×3 convolution layer, a regularization layer and a ReLU function;
the Res module comprises two CBR modules and an empty-layer (identity) residual, connected with each other;
the CSP_X module is used for extracting features and comprises a CBR module, X Res modules and an empty-layer residual, connected with one another, wherein X denotes the number of Res modules.
5. The improved YOLOv5-L-based pet dog video target detection method as claimed in claim 1, wherein the improved Pred module comprises: adding a flatten algorithm before the output layer to flatten the feature map into one dimension, and replacing the convolution layer in the output layer with a fully connected layer.
6. The improved YOLOv5-L-based pet dog video target detection method according to claim 1, wherein the SK attention mechanism comprises a split unit, a fuse unit and a select unit; the split unit convolves the original feature map with convolution kernels of three sizes; the fuse unit calculates the weight of each convolution kernel: it sums the feature maps of the three branches element-wise and generates channel statistics through global average pooling, obtaining a new feature of dimension C×1; the select unit calculates a weight for each branch with softmax and fuses all the branches to form the final output.
7. The improved YOLOv5-L based pet dog video target detection method of claim 1, wherein the improved YOLOv5-L model is trained and further comprises the following steps:
modifying the class-number field in the YAML configuration file to change the detection categories;
setting an NMS mechanism to retain the best prediction box and reduce the confidence of the other prediction boxes to 0;
setting a Loss function as DIOU _ Loss;
setting the training hyper-parameters: the number of training rounds is 300, the optimizer is an improved SGD, the initial learning rate is 0.01, the learning-rate momentum is 0.95 and the training batch size is 64;
and the training set enters a model for training, and the optimal weight parameter is obtained through multiple iterations.
8. The improved YOLOv5-L-based pet dog video target detection method as claimed in claim 1, wherein the optimal weight parameters are loaded into a detector, a scaling algorithm is added to fix the size of each incoming video frame at 640×640 pixels, the test-set videos are fed in for detection, and all video frames in which a pet dog is detected are saved; the accuracy of the model is evaluated with the AP index, calculated as: AP = number of video frames in which the pet dog is detected / total number of video frames in which a pet dog appears.
9. A pet dog video target detection system based on improved YOLOv5-L is characterized by comprising a data acquisition module, an image extraction module, a preprocessing module, a model improvement training module and a result detection module;
the data acquisition module is used for respectively constructing an initial training set and a test set based on the acquired image data containing the pet dog and the acquired video data containing the pet dog;
the image extraction module is used for carrying out frame extraction on the video containing the pet dog to obtain a frame image;
the preprocessing module is used for preprocessing the initial training set to obtain a final training set;
the model improvement training module is used for improving and training a YOLOv5-L model, specifically: building a BackBone network, improving a Pred module, and adding an SK attention mechanism behind the BackBone network; setting training parameters, training the improved YOLOv5-L model, and saving an optimal weight parameter file; loading the optimal weight parameter file into a detector, detecting the videos in the final test set, saving all video frames in which a pet dog is detected, and evaluating the detection result with the AP index to obtain the optimal improved YOLOv5-L model;
and the result detection module is used for inputting the video of the pet dog to be detected into the optimal YOLOv5-L model to obtain a corresponding detection result.
CN202211151017.8A 2022-09-21 2022-09-21 Pet dog video target detection method and system based on improved YOLOv5-L Pending CN115588150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211151017.8A CN115588150A (en) 2022-09-21 2022-09-21 Pet dog video target detection method and system based on improved YOLOv5-L

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211151017.8A CN115588150A (en) 2022-09-21 2022-09-21 Pet dog video target detection method and system based on improved YOLOv5-L

Publications (1)

Publication Number Publication Date
CN115588150A true CN115588150A (en) 2023-01-10

Family

ID=84773007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211151017.8A Pending CN115588150A (en) 2022-09-21 2022-09-21 Pet dog video target detection method and system based on improved YOLOv5-L

Country Status (1)

Country Link
CN (1) CN115588150A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132767A (en) * 2023-10-23 2023-11-28 中国铁塔股份有限公司湖北省分公司 Small target detection method, device, equipment and readable storage medium
CN117132767B (en) * 2023-10-23 2024-03-19 中国铁塔股份有限公司湖北省分公司 Small target detection method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US20220092351A1 (en) Image classification method, neural network training method, and apparatus
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN113470076B (en) Multi-target tracking method for yellow feather chickens in flat raising chicken house
CN112419202B (en) Automatic wild animal image recognition system based on big data and deep learning
CN111783712A (en) Video processing method, device, equipment and medium
CN113487576B (en) Insect pest image detection method based on channel attention mechanism
CN112613548B (en) User customized target detection method, system and storage medium based on weak supervised learning
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN113392937B (en) 3D point cloud data classification method and related device thereof
CN112528961A (en) Video analysis method based on Jetson Nano
CN112580458A (en) Facial expression recognition method, device, equipment and storage medium
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN116310718A (en) Method, system and equipment for detecting pest target based on YOLOv5 model
CN112668638A (en) Image aesthetic quality evaluation and semantic recognition combined classification method and system
CN116434002A (en) Smoke detection method, system, medium and equipment based on lightweight neural network
CN115588150A (en) Pet dog video target detection method and system based on improved YOLOv5-L
CN111898418A (en) Human body abnormal behavior detection method based on T-TINY-YOLO network
EP3467677A1 (en) Image screening method and device
CN114782859A (en) Method for establishing space-time perception positioning model of target behaviors and application
CN112560668A (en) Human behavior identification method based on scene prior knowledge
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN115578624A (en) Agricultural disease and pest model construction method, detection method and device
CN115294467A (en) Detection method and related device for tea diseases
CN111950586A (en) Target detection method introducing bidirectional attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination