CN113902044A - Image target extraction method based on lightweight YOLOV3 - Google Patents

Image target extraction method based on lightweight YOLOV3

Info

Publication number
CN113902044A
CN113902044A (application CN202111496943.4A)
Authority
CN
China
Prior art keywords
target
network
center point
yolov3
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111496943.4A
Other languages
Chinese (zh)
Other versions
CN113902044B (en)
Inventor
徐嘉辉
王彬
徐凯
陈石
赵佳佳
王中杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Daoyuan Technology Group Co ltd
Original Assignee
Jiangsu Peregrine Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Peregrine Microelectronics Co., Ltd.
Priority to CN202111496943.4A
Publication of CN113902044A
Application granted
Publication of CN113902044B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image target extraction method based on lightweight YOLOV3. The method improves the backbone network structure of the existing YOLOV3: a depthwise separable convolution is adopted as the basic convolution block, a point convolution for raising the dimension is introduced before the depthwise separable convolution to strengthen the feature extraction capability, and residual connections are introduced while keeping the same downsampling multiples, which greatly reduces the parameters of the network and makes the trained model easier to deploy on low-compute embedded devices. In addition, the method detects targets by predicting the target center, which reduces the parameters and complexity required by the network head compared with the existing YOLOV3; and because a large number of prior boxes are no longer needed, the network does not need a non-maximum suppression algorithm during inference, which greatly increases the inference speed.

Description

Image target extraction method based on lightweight YOLOV3
Technical Field
The invention relates to the field of target detection, and in particular to an image target extraction method based on an improved YOLOV3.
Background
The task of target detection is to find all objects of interest in an image and to determine their category and location; it is one of the core problems in the field of computer vision. Because objects vary widely in appearance, shape and pose, and imaging is further disturbed by factors such as illumination and occlusion, target detection has long been one of the most challenging problems in computer vision.
Target detection algorithms based on deep learning fall mainly into two categories: two-stage and one-stage. A two-stage network first generates candidate regions, called region proposals (RP), i.e. pre-selected boxes that may contain the objects to be detected, and then classifies these samples with a convolutional neural network. Common two-stage target detection algorithms include R-CNN, SPP-Net, Fast R-CNN, R-FCN, and the like. One-stage networks pursue speed and abandon the two-stage architecture: no separate network is set up to generate region proposals; instead, dense sampling is performed directly on the feature map to generate a large number of prior boxes. Common one-stage target detection algorithms include YOLO, SSD, RetinaNet, and the like.
Among them, the YOLO series is the most classical family of one-stage algorithms. The YOLO algorithm extracts three feature maps at different scales to detect large, medium and small targets respectively, generates a large number of prior boxes on these three feature maps, and then filters the prior boxes with a non-maximum suppression algorithm. The speed of the YOLO series has improved greatly compared with other networks, but applying YOLO to low-cost devices such as embedded devices still faces the following problems:
1. The backbone network Darknet of YOLOV3 borrows the idea of ResNet, which improves the feature extraction capability of the network but greatly increases the depth and the number of parameters of the network. The model trained with this network is therefore large, cannot be deployed on low-compute embedded devices, and increases the cost of practical deployment.
2. The prior-box (anchor) mechanism of YOLOV3 increases the complexity of the network head and the number of network parameters; at the same time, the network has to filter the prior boxes with a non-maximum suppression algorithm, so the model takes more time during inference.
Disclosure of Invention
The purpose of the invention is as follows: in view of the problems of the prior art, an image target extraction method based on lightweight YOLOV3 is provided, so that the parameters of the conventional YOLOV3 network are greatly reduced and the trained model is easier to deploy on low-compute embedded devices.
The technical scheme is as follows: an image target extraction method based on lightweight YOLOV3 comprises the following steps:
step 1: constructing a lightweight YOLOV3 network;
the backbone network of the lightweight YOLOV3 network comprises a CBL module and a plurality of Res modules connected in sequence, wherein the CBL module comprises a 1 × 1 point convolution, a depthwise separable convolution, a BN layer and a LeakyReLU, and the Res module comprises two connected CBL modules; after downsampling and feature fusion in the backbone network, the input picture yields feature maps at three scales, the downsampling multiples being 8, 16 and 32 respectively; with these downsampling multiples fixed, the feature extraction capability of the network and the number of network parameters are balanced by adjusting the number of Res modules;
the Head network of the lightweight YOLOV3 network is composed of three conv layers whose sizes are 1 × 1 × cls, 1 × 1 × 2 and 1 × 1 × 2 respectively, where cls represents the number of classes of the dataset; the three conv layers respectively output: the predicted center point coordinates for each class of target in the dataset, the predicted offset of the target center point, and the predicted target size, where the target size refers to the width and height of the target box containing the target;
step 2: training the lightweight YOLOV3 network;
Firstly, the training set pictures are annotated with the target size (W, H), the target center point coordinates (x, y), and the target category c, where the target size (W, H) is composed of the width W and the height H of the target box containing the target. From the annotation information, the size of the feature map output by the network and the coordinates of the target center point in the feature map (x̃, ỹ) are computed, where x̃ = ⌊x/R⌋ and ỹ = ⌊y/R⌋, ⌊·⌋ denotes rounding down, and R represents the downsampling multiple.
Then, the pixels within a circle of radius r around the target center point are Gaussian-smoothed to obtain:

Y_xyc = exp( -((x - x̃)² + (y - ỹ)²) / (2σ²) )

where Y_xyc represents the confidence of class c at pixel coordinates (x, y), the value of Y_xyc lies between 0 and 1, and σ is a standard deviation obtained adaptively from the target size; the confidence values outside the pixel circle are all set to 0.

Finally, the network is trained using the data after Gaussian smoothing.
And step 3: the test picture is input into the trained lightweight YOLOV3 network for target feature extraction. For each class of target, the network outputs the predicted center point coordinates (x̂, ŷ), the predicted offset of the target center point (δx, δy), and the predicted target size, and the coordinates of the upper-left and lower-right corners of the target box are decoded according to the following formulas:

x1 = x̂ + δx - ŵ/2,  y1 = ŷ + δy - ĥ/2
x2 = x̂ + δx + ŵ/2,  y2 = ŷ + δy + ĥ/2

where ŵ and ĥ respectively represent the predicted width and height of the target.
Further, in step 2, if two adjacent targets exist in the same picture, the Gaussian smoothing is performed separately with each target as the center, and the confidence of each pixel in the overlapping portion of the two pixel circles takes the larger of the two values.
Further, the radius r is determined by requiring that a box with the width w and height h of the annotated target box, when shifted by r, still has an intersection-over-union of at least overlap with the target box; r is obtained by solving the resulting quadratic equation in r. Here w and h are the width and height of the target box containing the annotated target, and overlap is the set threshold representing the intersection-over-union between the shifted box and the target box.
Further, in step 2, the loss function adopted in training the network is as follows:

L = L_k + λ1 · L_off + λ2 · L_size

where λ1 and λ2 are coefficients that adjust the loss function and L is the loss function value.

L_k is the target center point loss:

L_k = -(1/N) · Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                  if Y_xyc = 1
                       (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }

where N indicates the number of targets in the picture, the sum runs over all coordinate points (x, y) of the channel in which class c is located, Ŷ_xyc represents the confidence of class c predicted at coordinates (x, y), and α and β represent hyperparameters.

L_off is the center point offset loss:

L_off = (1/N) · Σ_p | Ô_p̃ - (p/R - p̃) |

where p denotes the target center point coordinates (x, y), Ô_p̃ denotes the predicted target center point offset, and p̃ denotes the target center point coordinates (x̃, ỹ) in the feature map.

L_size is the target size loss:

L_size = (1/N) · Σ_{k=1..N} | Ŝ_k - s_k |

where Ŝ_k is the predicted target size and s_k = (W_k, H_k) is the annotated size of the k-th target.
Beneficial effects: 1. The invention improves the structure of the existing YOLOV3 backbone network: a depthwise separable convolution is adopted as the basic convolution block, a point convolution for raising the dimension is introduced before the depthwise separable convolution to enhance the feature extraction capability, and residual connections are introduced while keeping the same downsampling multiples, which greatly reduces the parameters of the network and makes the trained model easier to deploy on low-compute embedded devices.
2. The method detects targets by predicting the target center, which reduces the parameters and complexity required by the network head compared with the existing YOLOV3; at the same time, because a large number of prior boxes are no longer needed, the network does not need a non-maximum suppression algorithm during inference, which greatly increases the inference speed.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the backbone network architecture of the lightweight YOLOV3 of the present invention;
FIG. 3 is a block diagram of the CBL module in the lightweight YOLOV3 of the present invention;
FIG. 4 is a block diagram of the Res module in the lightweight YOLOV3 of the present invention;
FIG. 5 is a block diagram of the Head network in the lightweight YOLOV3 of the present invention;
FIG. 6 is a complete block diagram of the lightweight YOLOV3 of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings.
As shown in fig. 1, an image target extraction method based on lightweight YOLOV3 includes:
step 1: a lightweight YOLOV3 network was constructed.
The backbone network of the lightweight YOLOV3 network includes a CBL module and several Res modules connected in sequence, as shown in fig. 2. The CBL module is composed of a 1 × 1 point convolution, a depthwise separable convolution, a BN layer and a LeakyReLU, as shown in fig. 3. In this embodiment, the size of the input picture is 608 × 608, and feature maps of 76 × 76, 38 × 38 and 19 × 19 are output after the downsampling and feature fusion of the backbone network, i.e. the downsampling multiples are 8, 16 and 32 respectively.
The Res module includes two connected CBL modules, as shown in fig. 4; adding Res modules introduces residual connections, which avoids simple stacking of convolutional layers and reduces the training difficulty of the network. While keeping the downsampling multiples unchanged, the feature extraction capability of the network and the number of network parameters can be balanced by adjusting the number of Res modules according to the feature complexity of the pictures: the number of Res modules can be increased to strengthen the feature extraction capability when the pictures are complex, and decreased to reduce the parameter count and computation when the pictures are simple. The backbone network is connected to the Neck network of the YOLOV3 network.
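As a minimal sketch of how the CBL and Res modules described above could be realized, the following PyTorch-style code follows the stated ordering (point convolution for raising the dimension, depthwise separable convolution, BN, LeakyReLU) plus a residual connection; the expansion ratio, kernel size, stride and the exact placement of normalization and activation are assumptions, since the patent does not fix them:

import torch
import torch.nn as nn

class CBL(nn.Module):
    """Basic convolution block: 1x1 point conv raises the channel dimension,
    then a depthwise separable conv (depthwise + pointwise), BN and LeakyReLU."""
    def __init__(self, in_ch, out_ch, stride=1, expand=2):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),               # point conv (dimension raise)
            nn.Conv2d(mid, mid, kernel_size=3, stride=stride, padding=1,
                      groups=mid, bias=False),                              # depthwise conv
            nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),              # pointwise projection
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Res(nn.Module):
    """Residual block: two connected CBL modules plus a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(CBL(ch, ch), CBL(ch, ch))

    def forward(self, x):
        return x + self.body(x)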
The Head network of the lightweight YOLOV3 network is composed of three conv layers whose sizes are 1 × 1 × cls, 1 × 1 × 2 and 1 × 1 × 2 respectively, where cls represents the number of classes of the dataset. The three conv layers of the Head network respectively output: the predicted center point coordinates for each class of target in the dataset, the predicted offset of the target center point, and the predicted target size. In this embodiment, the output sizes of the three conv layers are 19 × 19 × cls, 19 × 19 × 2 and 19 × 19 × 2 respectively; the target size refers to the width and height of the target box containing the target.
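A minimal sketch of such a head, assuming a single input feature map with in_ch channels and a sigmoid on the center heatmap (the patent only fixes the three 1 × 1 convolutions and their output channel counts):

import torch.nn as nn

class Head(nn.Module):
    """Anchor-free head: three 1x1 convs predicting the per-class center
    heatmap, the center point offset (dx, dy) and the box size (w, h)."""
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.heatmap = nn.Conv2d(in_ch, num_classes, kernel_size=1)   # center confidence per class
        self.offset = nn.Conv2d(in_ch, 2, kernel_size=1)              # center point offset
        self.size = nn.Conv2d(in_ch, 2, kernel_size=1)                # target width and height

    def forward(self, x):
        return self.heatmap(x).sigmoid(), self.offset(x), self.size(x)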
Compared with the prior art, when the number of classes is 80, the computation of the Head network in the prior art is: 76 × 76 × 255 × 128 + 38 × 38 × 255 × 256 + 19 × 19 × 255 × 1024 = 377,057,280 (377 MFLOPs), while after applying the scheme of the invention it becomes: 76 × 76 × 84 × 128 + 38 × 38 × 84 × 256 + 19 × 19 × 84 × 1024 = 124,207,104 (124 MFLOPs). The number of parameters is reduced from the original 255 × 128 + 255 × 256 + 255 × 1024 = 359,040 to 84 × 128 + 84 × 256 + 84 × 1024 = 118,272. Meanwhile, compared with the prior art, a large number of prior boxes no longer need to be generated, so the time for non-maximum suppression is saved during network inference and the inference speed is increased.
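For reference, the computation and parameter counts quoted above can be reproduced with a few lines of arithmetic; the per-scale input channel widths 128, 256 and 1024 and the 1 × 1 head convolutions are taken from the text, while counting pure multiply-accumulates is an assumption:

# Reproduces the head computation and parameter counts quoted above, assuming
# 1x1 convs over the 76x76, 38x38 and 19x19 feature maps with 128, 256 and
# 1024 input channels respectively.
scales = [(76, 128), (38, 256), (19, 1024)]

def head_macs(out_ch):
    return sum(s * s * out_ch * in_ch for s, in_ch in scales)

def head_params(out_ch):
    return sum(out_ch * in_ch for _, in_ch in scales)

print(head_macs(255), head_macs(84))      # 377057280 124207104
print(head_params(255), head_params(84))  # 359040 118272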
The complete YOLOV3 network constructed by the present invention is shown in fig. 6.
Step 2: the constructed lightweight YOLOV3 network was trained.
Firstly, the training set pictures are annotated with the target size (W, H), the target center point coordinates (x, y), and the target category c. From the annotation information, the size of the feature map output by the network and the coordinates of the target center point in the feature map (x̃, ỹ) are computed, where x̃ = ⌊x/R⌋ and ỹ = ⌊y/R⌋, ⌊·⌋ denotes rounding down, and R represents the downsampling multiple; the target size (W, H) is composed of the width W and the height H of the target box containing the target.
In order to make the training process smoother, the pixels within a circle of radius r around the target center point are Gaussian-smoothed to obtain:

Y_xyc = exp( -((x - x̃)² + (y - ỹ)²) / (2σ²) )

where Y_xyc represents the confidence of class c at pixel coordinates (x, y), the value of Y_xyc lies between 0 and 1, and a larger value of Y_xyc means that a target is more likely to be located there; σ is the standard deviation obtained adaptively from the target size.
The radius r is determined by requiring that a box with the width w and height h of the annotated target box, when shifted by r, still has an intersection-over-union of at least overlap with the target box; r is obtained by solving the resulting quadratic equation in r. Here w and h are the width and height of the target box containing the annotated target, and overlap is the set threshold representing the intersection-over-union between the shifted box and the target box, set to 0.7 in this embodiment. The confidence values outside the pixel circle are all set to 0.
If two adjacent targets exist in the same picture, the above Gaussian smoothing is performed separately with each target as the center, and the confidence of each pixel in the overlapping portion of the two pixel circles takes the larger of the two values.
Finally, the network is trained using the label data after Gaussian smoothing.
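The construction of the Gaussian-smoothed center heatmap described above might be sketched as follows; choosing σ = r/3 and iterating pixel by pixel are assumptions made only for illustration, since the patent derives σ adaptively from the target size:

import numpy as np

def draw_center_gaussian(heatmap, cx, cy, r):
    """Write a Gaussian bump of radius r centred on feature-map cell (cx, cy)
    into one class channel; where two targets overlap, the larger value wins,
    and everything outside the pixel circle stays 0."""
    h, w = heatmap.shape
    r = int(r)
    sigma = max(r / 3.0, 1e-3)                       # assumption: sigma tied to the radius
    for y in range(max(0, cy - r), min(h, cy + r + 1)):
        for x in range(max(0, cx - r), min(w, cx + r + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= r * r:                          # only inside the pixel circle
                value = np.exp(-d2 / (2.0 * sigma ** 2))
                heatmap[y, x] = max(heatmap[y, x], value)
    return heatmap

# usage: a 19x19 single-class heatmap with a target centre at cell (9, 7)
hm = np.zeros((19, 19), dtype=np.float32)
draw_center_gaussian(hm, cx=9, cy=7, r=2)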
In this embodiment, face data is used for network training, so the number of target classes cls is set to 2, that is, the targets fall into two categories: one is a face and the other is not a face. In training the network, the loss function used is as follows:
L = L_k + λ1 · L_off + λ2 · L_size

where λ1 and λ2 are coefficients that adjust the loss function, set to 0.1 and 1 respectively in this embodiment, and L is the loss function value.

L_k is the target center point loss:

L_k = -(1/N) · Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                  if Y_xyc = 1
                       (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }

where N indicates the number of targets in the picture, the sum runs over all coordinate points (x, y) of the channel in which class c is located, Ŷ_xyc represents the confidence of class c predicted at coordinates (x, y), and α and β are adjustable hyperparameters, set to 2 and 4 respectively in this embodiment.

L_off is the center point offset loss:

L_off = (1/N) · Σ_p | Ô_p̃ - (p/R - p̃) |

where p denotes the target center point coordinates (x, y), Ô_p̃ denotes the predicted target center point offset, and p̃ denotes the target center point coordinates (x̃, ỹ) in the feature map. The predicted target center point coordinates correspond to the feature map; through the target center point offset predicted by the network, the predicted center point coordinates can be mapped back to the original image.

L_size is the target size loss:

L_size = (1/N) · Σ_{k=1..N} | Ŝ_k - s_k |

where Ŝ_k is the predicted target size and s_k = (W_k, H_k) is the annotated size of the k-th target.
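A compact sketch of this loss, assuming the heatmap has already passed through a sigmoid and that a binary mask marks the feature-map cells that contain a target center; the pairing of the weight 0.1 with the offset term and 1 with the size term mirrors the order used above but is an assumption:

import torch

def center_focal_loss(pred, gt, alpha=2, beta=4):
    """Penalty-reduced focal loss over the center heatmap; pred and gt are
    N x C x H x W tensors with values in (0, 1)."""
    eps = 1e-6
    pos = gt.eq(1).float()                     # cells that are exact target centers
    neg = 1.0 - pos
    pos_loss = pos * (1 - pred) ** alpha * torch.log(pred + eps)
    neg_loss = neg * (1 - gt) ** beta * pred ** alpha * torch.log(1 - pred + eps)
    num_targets = pos.sum().clamp(min=1)
    return -(pos_loss + neg_loss).sum() / num_targets

def masked_l1_loss(pred, target, mask):
    """L1 loss for the offset / size branches, averaged over the number of targets."""
    num_targets = mask.sum().clamp(min=1)
    return (torch.abs(pred - target) * mask).sum() / num_targets

def total_loss(heat_p, heat_gt, off_p, off_gt, size_p, size_gt, mask,
               lambda_off=0.1, lambda_size=1.0):
    return (center_focal_loss(heat_p, heat_gt)
            + lambda_off * masked_l1_loss(off_p, off_gt, mask)
            + lambda_size * masked_l1_loss(size_p, size_gt, mask))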
And step 3: the test picture is input into the trained lightweight YOLOV3 network for target feature extraction. For each class of target, the network outputs the predicted center point coordinates (x̂, ŷ), the predicted offset of the target center point (δx, δy), and the predicted target size, and the coordinates of the upper-left and lower-right corners of the target box are decoded according to the following formulas:

x1 = x̂ + δx - ŵ/2,  y1 = ŷ + δy - ĥ/2
x2 = x̂ + δx + ŵ/2,  y2 = ŷ + δy + ĥ/2

where ŵ and ĥ respectively represent the predicted width and height of the target.
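A minimal sketch of this NMS-free decoding step: a 3 × 3 max-pool keeps only local peaks of the center heatmap, the top-k peaks give the centers, and boxes are recovered from the predicted offset and size; the top-k selection, the 3 × 3 window and scaling the result back by the downsampling multiple are assumptions made for illustration:

import torch
import torch.nn.functional as F

def decode_centers(heatmap, offset, size, k=100, down=32):
    """Decode boxes from one image (tensors of shape 1xCxHxW) without NMS."""
    # keep only local maxima of the heatmap
    peaks = (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1)).float() * heatmap
    n, c, h, w = peaks.shape
    k = min(k, c * h * w)
    scores, idx = peaks.view(n, -1).topk(k)
    cls = torch.div(idx, h * w, rounding_mode="floor")
    ys = torch.div(idx % (h * w), w, rounding_mode="floor")
    xs = idx % w
    boxes = []
    for i in range(k):
        x, y = xs[0, i].item(), ys[0, i].item()
        dx, dy = offset[0, :, y, x].tolist()             # predicted center offset
        bw, bh = size[0, :, y, x].tolist()               # predicted width and height
        cx, cy = x + dx, y + dy
        boxes.append([down * (cx - bw / 2), down * (cy - bh / 2),
                      down * (cx + bw / 2), down * (cy + bh / 2),
                      scores[0, i].item(), cls[0, i].item()])
    return boxes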
The foregoing is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and such modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (4)

1. An image target extraction method based on lightweight YOLOV3 is characterized by comprising the following steps:
step 1: constructing a lightweight YOLOV3 network;
the backbone network of the lightweight YOLOV3 network comprises a CBL module and a plurality of Res modules connected in sequence, wherein the CBL module comprises a 1 × 1 point convolution, a depthwise separable convolution, a BN layer and a LeakyReLU, and the Res module comprises two connected CBL modules; after downsampling and feature fusion in the backbone network, the input picture yields feature maps at three scales, the downsampling multiples being 8, 16 and 32 respectively; with these downsampling multiples fixed, the feature extraction capability of the network and the number of network parameters are balanced by adjusting the number of Res modules;
the Head network of the lightweight YOLOV3 network is composed of three conv layers whose sizes are 1 × 1 × cls, 1 × 1 × 2 and 1 × 1 × 2 respectively, where cls represents the number of classes of the dataset; the three conv layers respectively output: the predicted center point coordinates for each class of target in the dataset, the predicted offset of the target center point, and the predicted target size, where the target size refers to the width and height of the target box containing the target;
step 2: training the lightweight YOLOV3 network;
Firstly, the training set pictures are annotated with the target size (W, H), the target center point coordinates (x, y), and the target category c, where the target size (W, H) is composed of the width W and the height H of the target box containing the target. From the annotation information, the size of the feature map output by the network and the coordinates of the target center point in the feature map (x̃, ỹ) are computed, where x̃ = ⌊x/R⌋ and ỹ = ⌊y/R⌋, ⌊·⌋ denotes rounding down, and R represents the downsampling multiple.
Then, the pixels within a circle of radius r around the target center point are Gaussian-smoothed to obtain:

Y_xyc = exp( -((x - x̃)² + (y - ỹ)²) / (2σ²) )

where Y_xyc represents the confidence of class c at pixel coordinates (x, y), the value of Y_xyc lies between 0 and 1, and σ is a standard deviation obtained adaptively from the target size; the confidence values outside the pixel circle are all set to 0.

Finally, the network is trained using the data after Gaussian smoothing.
And step 3: the test picture is input into the trained lightweight YOLOV3 network for target feature extraction. For each class of target, the network outputs the predicted center point coordinates (x̂, ŷ), the predicted offset of the target center point (δx, δy), and the predicted target size, and the coordinates of the upper-left and lower-right corners of the target box are decoded according to the following formulas:

x1 = x̂ + δx - ŵ/2,  y1 = ŷ + δy - ĥ/2
x2 = x̂ + δx + ŵ/2,  y2 = ŷ + δy + ĥ/2

where ŵ and ĥ respectively represent the predicted width and height of the target.
2. The method as claimed in claim 1, wherein in step 2, if two adjacent targets exist in the same picture, the Gaussian smoothing is performed separately with each target as the center, and the confidence of each pixel in the overlapping portion of the two pixel circles takes the larger of the two values.
3. The method for extracting image targets based on lightweight YOLOV3 as claimed in claim 1, wherein the radius r is determined by requiring that a box with the width w and height h of the annotated target box, when shifted by r, still has an intersection-over-union of at least overlap with the target box, r being obtained by solving the resulting quadratic equation in r; w and h are the width and height of the target box containing the annotated target, and overlap is the set threshold representing the intersection-over-union between the shifted box and the target box.
4. The method for extracting image targets based on lightweight YOLOV3 as claimed in any one of claims 1 to 3, wherein in step 2, the loss function used in training the network is as follows:

L = L_k + λ1 · L_off + λ2 · L_size

wherein λ1 and λ2 are coefficients that adjust the loss function and L is the loss function value;

L_k is the target center point loss:

L_k = -(1/N) · Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                  if Y_xyc = 1
                       (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }

wherein N indicates the number of targets in the picture, the sum runs over all coordinate points (x, y) of the channel in which class c is located, Ŷ_xyc represents the confidence of class c predicted at coordinates (x, y), and α and β represent hyperparameters;

L_off is the center point offset loss:

L_off = (1/N) · Σ_p | Ô_p̃ - (p/R - p̃) |

wherein p denotes the target center point coordinates (x, y), Ô_p̃ denotes the predicted target center point offset, and p̃ denotes the target center point coordinates (x̃, ỹ) in the feature map;

L_size is the target size loss:

L_size = (1/N) · Σ_{k=1..N} | Ŝ_k - s_k |

wherein Ŝ_k is the predicted target size and s_k = (W_k, H_k) is the annotated size of the k-th target.
CN202111496943.4A 2021-12-09 2021-12-09 Image target extraction method based on lightweight YOLOV3 Active CN113902044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111496943.4A CN113902044B (en) 2021-12-09 2021-12-09 Image target extraction method based on lightweight YOLOV3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111496943.4A CN113902044B (en) 2021-12-09 2021-12-09 Image target extraction method based on lightweight YOLOV3

Publications (2)

Publication Number Publication Date
CN113902044A (en) 2022-01-07
CN113902044B (en) 2022-03-01

Family

ID=79025453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111496943.4A Active CN113902044B (en) 2021-12-09 2021-12-09 Image target extraction method based on lightweight YOLOV3

Country Status (1)

Country Link
CN (1) CN113902044B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881830A (en) * 2023-07-26 2023-10-13 中国信息通信研究院 Self-adaptive detection method and system based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325504A (en) * 2018-09-07 2019-02-12 中国农业大学 A kind of underwater sea cucumber recognition methods and system
CN110796186A (en) * 2019-10-22 2020-02-14 华中科技大学无锡研究院 Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN111191566A (en) * 2019-12-26 2020-05-22 西北工业大学 Optical remote sensing image multi-target detection method based on pixel classification
CN111222474A (en) * 2020-01-09 2020-06-02 电子科技大学 Method for detecting small target of high-resolution image with any scale
CN112101434A (en) * 2020-09-04 2020-12-18 河南大学 Infrared image weak and small target detection method based on improved YOLO v3
CN112581386A (en) * 2020-12-02 2021-03-30 南京理工大学 Full-automatic lightning arrester detection and tracking method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
白士磊: "Traffic sign detection algorithm based on lightweight YOLOv3", Computer and Modernization (《计算机与现代化》) *
齐榕: "Lightweight object detection network based on YOLOv3", Computer Applications and Software (《计算机应用与软件》) *


Also Published As

Publication number Publication date
CN113902044B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN110210551B (en) Visual target tracking method based on adaptive subject sensitivity
CN109543606B (en) Human face recognition method with attention mechanism
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
CN108182456B (en) Target detection model based on deep learning and training method thereof
WO2023015743A1 (en) Lesion detection model training method, and method for recognizing lesion in image
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN111445488B (en) Method for automatically identifying and dividing salt body by weak supervision learning
CN106599789A (en) Video class identification method and device, data processing device and electronic device
CN107145867A (en) Face and face occluder detection method based on multitask deep learning
CN108182454A (en) Safety check identifying system and its control method
CN111079739B (en) Multi-scale attention feature detection method
CN108197569A (en) Obstacle recognition method, device, computer storage media and electronic equipment
CN115661943B (en) Fall detection method based on lightweight attitude assessment network
CN113420643B (en) Lightweight underwater target detection method based on depth separable cavity convolution
CN111832484A (en) Loop detection method based on convolution perception hash algorithm
KR20170038622A (en) Device and method to segment object from image
CN113326735B (en) YOLOv 5-based multi-mode small target detection method
CN107292229A (en) A kind of image-recognizing method and device
CN110136162B (en) Unmanned aerial vehicle visual angle remote sensing target tracking method and device
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN112633149A (en) Domain-adaptive foggy-day image target detection method and device
CN106780546A (en) The personal identification method of the motion blur encoded point based on convolutional neural networks
CN114419413A (en) Method for constructing sensing field self-adaptive transformer substation insulator defect detection neural network
CN112418032A (en) Human behavior recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220316

Address after: No. 88, Wenchang East Road, Yangzhou, Jiangsu 225000

Patentee after: Jiangsu Daoyuan Technology Group Co.,Ltd.

Address before: 211135 enlightenment star Nanjing maker space G41, second floor, No. 188, Qidi street, Qilin science and Technology Innovation Park, Qixia District, Nanjing, Jiangsu Province

Patentee before: Jiangsu Peregrine Microelectronics Co.,Ltd.