CN112668662B - Outdoor mountain forest environment target detection method based on improved YOLOv3 network

Info

Publication number: CN112668662B (granted publication of application CN112668662A)
Authority: CN (China)
Prior art keywords: target, loss, network, mountain forest, box
Legal status: Active (granted)
Application number: CN202011639547.8A
Other languages: Chinese (zh)
Inventors: 彭志红 (Peng Zhihong), 蒋卓 (Jiang Zhuo), 陈杰 (Chen Jie), 奚乐乐 (Xi Lele), 王星博 (Wang Xingbo)
Assignee: Beijing Institute of Technology (BIT)
Priority/filing date: 2020-12-31
Publication dates: 2021-04-16 (CN112668662A), 2022-12-06 (CN112668662B)


Abstract

The invention discloses an outdoor mountain forest environment target detection method based on an improved YOLOv3 network. The method comprises: constructing a target detection data set for the outdoor mountain forest environment; embedding an attention-based spatial transformation layer (STL) into the YOLOv3 detection model; adding a de-STL layer on this basis to facilitate model training; designing the resulting improved YOLOv3 detection network; and fine-tuning the network to obtain the target detection model finally used in the outdoor mountain forest environment. The invention can accurately detect targets in the outdoor mountain forest environment, and improves both the detector's precision and the recall rate of small-scale targets in that environment.

Description

Outdoor mountain forest environment target detection method based on improved YOLOv3 network
Technical Field
The invention belongs to the technical field of computer vision and machine learning, and particularly relates to an outdoor mountain forest environment target detection method based on an improved YOLOv3 network.
Background
With the development of science and technology, target detection has become a popular direction and research focus in computer vision. Target detection technology is applied in many real scenes, such as autonomous driving, unmanned aerial vehicle surveillance and scene recognition, but target detection algorithms for the outdoor mountain forest environment still face many problems. The outdoor mountain forest environment is highly complex: illumination changes drastically, the climate varies irregularly, and occlusion between targets and non-targets is severe, all of which increase the difficulty of vision-based target detection; in addition, practical requirements demand real-time detection speed.
Traditional target detection algorithms rely primarily on the apparent characteristics of real objects. For objects with rich texture, hand-designed feature descriptors such as SIFT, PCA-SIFT and SURF extract strongly representative feature points from the image, which are then matched for detection. For objects with little or no texture, template matching is the preferred solution, and its core problem is designing a reasonable and universal distance metric. However, both approaches are easily affected by the environment, such as occlusion and cluttered backgrounds, and traditional methods are very sensitive to illumination variation and noise.
With the development of deep learning in recent years, more and more engineering work applies deep convolutional networks (CNNs) to practical problems. Deep-learning-based target detection requires no hand-designed feature descriptors; the network itself learns higher-level semantic information. CNNs are also more robust to factors such as illumination, scale change and noise, greatly improving model generalization and the accuracy of target detection algorithms.
The literature (Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580-587.) first proposed a deep-learning-based target detection algorithm, a typical two-stage approach that detects via candidate-box extraction followed by classification and regression; another typical two-stage algorithm is Faster R-CNN (Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [C]// Advances in Neural Information Processing Systems. 2015: 91-99.). Although such algorithms achieve high precision, they cannot meet real-time speed requirements. To balance precision and speed, another class of one-stage target detection models based on the regression idea has received much attention, the most typical being the YOLO series (Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.) and the SSD series (Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector [C]// European Conference on Computer Vision. Springer, Cham, 2016: 21-37.). The SSD detection model discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios, fine-tunes the default boxes, and detects on feature maps of different scales; its accuracy is high, but its real-time performance is far below that of YOLO. The YOLO series offers very good real-time performance; the YOLOv3 model essentially balances speed and precision and greatly improves detection of small and multiple targets, so compared with SSD, YOLO is more widely applied owing to its efficiency.
Disclosure of Invention
In view of this, the invention provides an outdoor mountain forest environment target detection method based on an improved YOLOv3 network; its new network model can effectively improve the accuracy of target detection in the outdoor mountain forest environment.
The technical scheme for realizing the invention is as follows:
a field mountain forest environment target detection method based on an improved YOLOv3 network comprises the following steps:
acquiring a background picture and a foreground object of a field mountain forest, and presetting detection target types of people and vehicles;
secondly, superposing the background and the foreground through image preprocessing to generate a data set, acquiring bounding box and category data of a foreground object, and generating an xml file with the same format as the PASCALVOC2012 data set to obtain a training set, a verification set and a test set;
thirdly, based on a YOLOv3 network model, adding a spatial transformation layer STL behind feature extraction layers of different scales, training by taking feature maps of different scales as input to obtain different affine transformation results, and performing subsequent classification and bounding box regression on output features of the STL layer, wherein the method can effectively solve the problem of detection effect reduction when a target rotates and scales change in a field mountain forest environment;
fourthly, a de-STL conversion layer is added at last in the network, so that the final calculation result is matched with the encoding result of a true value bounding box which takes the feature map as the coordinate system before affine transformation, x and y corresponding to the original image coordinate system are obtained, and loss is calculated;
and step five, training the improved YOLOv3 network obtained in the step four by using the training set, the verification set and the test set obtained in the step two to obtain a performance optimal model.
Further, the first step specifically comprises the following process: using python3 + Beautiful Soup + requests + lxml, images are collected from the Internet for the four keywords forest, valley, plain and wetland (700, 600, 600 and 600 images respectively), and unsuitable images are manually removed; person (person, category_id = 1) and car (car, category_id = 3) data from the COCO2014 data set are selected as foreground objects, since that data set's objects are more everyday and contain more deformation and occlusion cases, and the foreground objects are randomly inserted into the background images to construct the outdoor mountain forest environment data set.
Further, the third step specifically comprises the following process: the attention-based STL layer design centers on the localization net, which outputs 6 parameters for an input feature map to apply an affine transformation to the original feature map, thereby alleviating the detection-accuracy drop caused by target rotation and scale change; the STL is therefore embedded after the conv26, conv43 and conv52 feature maps of Darknet-53, which keeps the information loss through the STL layer small while remaining sensitive to rotation changes, and the classification, regression and other losses are then computed on the output result.
Further, the fourth step specifically comprises the following process: the de-STL layer is embedded behind the convolution layer that outputs the image target position, so that the location loss can be computed; the location loss is:

$$Loss_{loc}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}+\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]$$

where $\lambda_{coord}$ is the location loss coefficient, $S^{2}$ is the number of feature grid cells, and $B$ is the number of predicted boxes per cell; $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$, so that the $j$th bounding-box prediction is valid for the prediction, and 0 if no target exists in grid cell $i$; $x_{i}, y_{i}, w_{i}, h_{i}$ denote the predicted target position and the width and height output by the detection network, and $\hat{x}_{i}, \hat{y}_{i}, \hat{w}_{i}, \hat{h}_{i}$ the target position and width/height values in the data set.
Further, the fifth step is specifically:
(1) The YOLOv3 model has 9 anchors in total: 3 outputs of different scales, each using 3 anchors, so each output position predicts 3 boxes; for each box, the output parameters comprise the target position coordinates and width/height values, the confidence score that the box contains an object, and the probability of each object class in the box;
(2) The loss function for training the entire network is configured as follows:
$$Loss=\lambda_{1}Loss_{loc}+\lambda_{2}Loss_{conf}+\lambda_{3}Loss_{cls}$$

where $\lambda_{1}$ is the target position loss coefficient, $\lambda_{2}$ the target confidence loss coefficient, and $\lambda_{3}$ the target class loss coefficient;
the target confidence loss $Loss_{conf}$ uses binary cross entropy (Binary Cross Entropy), where $o_{i}\in\{0,1\}$ indicates whether a target actually exists in predicted target bounding box $i$ (0 for absent, 1 for present); $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$, so that the $j$th bounding-box prediction is valid for the prediction, and 0 if no target exists in grid cell $i$; conversely, $\mathbb{1}_{ij}^{noobj}$ is 0 if a target exists in grid cell $i$ and 1 if none exists; $\lambda_{noobj}$ is the loss coefficient for a grid cell containing no real target object, and $\lambda_{obj}$ the loss coefficient for a grid cell containing one; $\hat{c}_{i}$ denotes the sigmoid probability that an object exists in predicted target rectangular box $i$, and $c_{i}$ the true value; the target confidence loss $Loss_{conf}$ is:

$$Loss_{conf}=-\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\left(\lambda_{obj}\mathbb{1}_{ij}^{obj}+\lambda_{noobj}\mathbb{1}_{ij}^{noobj}\right)\left[o_{i}\ln\hat{c}_{i}+\left(1-o_{i}\right)\ln\left(1-\hat{c}_{i}\right)\right]$$
the target class loss $Loss_{cls}$ also uses binary cross entropy, where $o_{ij}\in\{0,1\}$ indicates whether a class-$j$ target really exists in predicted target bounding box $i$ (0 for absent, 1 for present) and $\hat{c}_{ij}$ denotes the sigmoid probability of a class-$j$ target in network-predicted target bounding box $i$; $Loss_{cls}$ is:

$$Loss_{cls}=-\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{j\in classes}\left[o_{ij}\ln\hat{c}_{ij}+\left(1-o_{ij}\right)\ln\left(1-\hat{c}_{ij}\right)\right]$$
(3) Training is carried out with the designed loss function using stochastic gradient descent (SGD), with Adam adopted as the gradient update scheme.
Beneficial effects:
When solving the target detection problem in the outdoor mountain forest environment, the invention introduces the attention-based STL spatial transformation layer after the feature input layers, so that the network automatically learns affine transformation parameters to handle the rotation and occlusion problems of targets at different scales; this improves small-target detection accuracy and effectively raises overall detection precision. The final results show that the method achieves a strong detection effect on the outdoor mountain forest data set.
Drawings
FIG. 1 is a data set produced by the present invention;
FIG. 2 is a network model structure of the present invention;
FIG. 3 is a detailed structure of the spatial transformation layer STL;
FIG. 4 is a specific structure of the de-STL layer;
FIG. 5 is a diagram showing the actual detection effect of the present invention.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention mainly addresses the technical problem that existing target detection methods struggle in the outdoor mountain forest environment under severe conditions such as object occlusion, rotation and scale change. The core of the method is the YOLOv3 target detection algorithm, an efficient detector that balances precision and speed, built on a convolutional network whose specific structure is shown in FIG. 2. The invention can accurately detect targets in the outdoor mountain forest environment, improving the detector's precision and the recall rate of small-scale targets.
The detailed steps of the invention are as follows:
step 1: and acquiring a background picture and a foreground object of the wild mountain forest, and presetting detection types of people and vehicles. The invention uses python3+ Beautiful Soup + requests + lxml to respectively obtain four pictures of keywords of forest, valley, plain and wetland on the Internet, the number of the pictures is 700, 600, and the unsuitable pictures (such as pictures already containing foreground data and pictures with more watermarks) are manually rejected, and the average size of the pictures is [600,400] and the pictures are saved in a 'jpg' format.
Step 2: superpose the background pictures and foreground data through image preprocessing (the synthesized picture effect is shown in FIG. 1) to generate the data set; the script automatically acquires each object's bounding box and class data and generates an xml file in the same format as the PASCAL VOC2012 data set for subsequent training. When synthesizing a picture, the following rules are followed:
a) The COCO data set already contains many deformation and occlusion cases, so no extra randomness of this kind is added;
b) Scale changes are applied to the foreground data such that (larger foreground side / smaller background side) lies in the range 0.2 to 0.45;
c) To make the synthesized picture more realistic, require |foreground pixel mean - background pixel mean| < 30;
d) Each picture contains 3, 2 or 1 foreground instances with probability 0.2, 0.3 and 0.5 respectively, and each instance is equally likely to belong to the person or vehicle class;
e) The bounding box format is (x_min, y_min), (x_max, y_max).
In addition, attention must be paid to the picture coordinate-system conventions when writing the script (for example, cv2.imread treats the picture height as the x direction by default, cv2.rectangle treats the picture width as the x direction, and the xml file also treats the width as the x direction). 2500 pictures are synthesized, randomly split into a training set and a test set at a 4:1 ratio, stored following the PASCAL VOC2012 directory structure, and the data set is named Detection2500 for convenience of description; a minimal compositing sketch of the rules above follows.
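The sketch below realizes rules b) through e) with OpenCV and NumPy; foreground extraction from COCO and xml writing are omitted, and the helper names are illustrative assumptions rather than the patent's scripts.

```python
# Minimal sketch of the step-2 synthesis rules, under stated assumptions.
import random
import cv2
import numpy as np

def composite(background: np.ndarray, foreground: np.ndarray):
    """Paste one foreground crop onto the background per the rules above."""
    bg_h, bg_w = background.shape[:2]   # note: cv2 arrays are (height, width, channels)
    fg_h, fg_w = foreground.shape[:2]
    # Rule b): larger foreground side / smaller background side in [0.2, 0.45].
    ratio = random.uniform(0.2, 0.45)
    scale = ratio * min(bg_h, bg_w) / max(fg_h, fg_w)
    fg = cv2.resize(foreground, (max(1, int(fg_w * scale)), max(1, int(fg_h * scale))))
    # Rule c): |foreground pixel mean - background pixel mean| < 30 for realism.
    if abs(float(fg.mean()) - float(background.mean())) >= 30:
        return None, None  # caller retries with a different pair
    h, w = fg.shape[:2]
    x_min = random.randint(0, bg_w - w)
    y_min = random.randint(0, bg_h - h)
    out = background.copy()
    out[y_min:y_min + h, x_min:x_min + w] = fg
    # Rule e): bounding box as (x_min, y_min), (x_max, y_max).
    return out, (x_min, y_min, x_min + w, y_min + h)

# Rule d): 3 / 2 / 1 foreground instances with probability 0.2 / 0.3 / 0.5.
n_instances = random.choices([3, 2, 1], weights=[0.2, 0.3, 0.5], k=1)[0]
```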
Step 3: design the STL layer as shown in FIG. 3. The spatial transformer network has the following characteristics: (1) it can easily be embedded as a module in any network; (2) the parameters it must learn are those of the localization net; in other words, during training the network autonomously learns what affine transformation makes the output of the classifier or detector more confident. In addition, the target detection task differs from image classification: besides accurately knowing the target's class, a bounding box of the target must also be obtained. Taking this into account, placing the STL in the first layer, i.e., right behind the input image, is unreasonable for two reasons: a) training the STL may cause the input U to lose part of its information, and once features are extracted by the convolution layers, an appreciable number of features may be lost, especially when the input image contains multiple objects simultaneously; b) a major characteristic of the YOLOv3 structure is that feature maps of different scales are "responsible" for detecting targets of different scales, and the difference between the "proper" affine transformations required for targets with large scale differences is also larger than for targets of similar scale. It is therefore reasonable to train different affine transformations for feature maps of different scales, using the feature maps as input. The STL is thus embedded after the feature maps of different scales, and classification, bounding-box regression and other operations are performed on the STL's output. A minimal PyTorch sketch follows.
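The module below regresses the 6 affine parameters from a feature map and resamples it, as the text describes; the localization-net architecture is an assumption (the patent only fixes its role), and initializing to the identity transform is a common spatial-transformer convention rather than a stated requirement.

```python
# A hedged PyTorch sketch of the STL; architecture details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STL(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Localization net: predicts the 2x3 affine matrix from the feature map.
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(channels * 16, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 6),          # the 6 affine parameters named in the text
        )
        # Start at the identity transform so early training leaves features unchanged.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x: torch.Tensor):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        # Return theta as well, so the de-STL step can invert the transform later.
        return F.grid_sample(x, grid, align_corners=False), theta
```

In the patent's design such a module sits after the conv26, conv43 and conv52 feature maps of Darknet-53, one instance per scale.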
Step 4: design the de-STL layer as shown in FIG. 4, where "de" is a negative prefix meaning "inverse"; de-STL is defined as the inverse operation of the STL. The reason for adding this operation is that after the STL, the output of the location part takes the affine-transformed feature map as its coordinate system, which obviously does not match the encoding of the ground-truth bounding box, whose coordinate system is the pre-transform feature map; the location output must therefore be inverse-transformed, i.e., passed through de-STL, to obtain x and y in the matching coordinate system, after which the loss is calculated and the network trained. The location loss $Loss_{loc}$ is:
$$Loss_{loc}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}+\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]$$

where $\lambda_{coord}$ is the location loss coefficient, $S^{2}$ is the number of feature grid cells, and $B$ is the number of predicted boxes per cell; $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$, so that the $j$th bounding-box prediction is valid for the prediction, and 0 if no target exists in grid cell $i$; $x_{i}, y_{i}, w_{i}, h_{i}$ denote the predicted target position and the width and height output by the detection network, and $\hat{x}_{i}, \hat{y}_{i}, \hat{w}_{i}, \hat{h}_{i}$ the target position and width/height values in the data set. These are further calculated from the offsets of the default boxes output by the network. The target position in a data-set image is commonly represented by (x_min, y_min), (x_max, y_max), i.e., the coordinates of the bounding box's upper-left and lower-right corners. To use the above loss calculation and keep the translation and scaling invariance of the bounding box, it must be encoded, i.e., represented by its center-point coordinates and width/height. A short sketch of this encoding, and of the de-STL inverse mapping, follows.
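Inverting the 3 × 3 homogeneous extension of the STL's 2 × 3 affine matrix is one plausible realization of the inverse operation the patent describes, not its verbatim implementation; the helper names are illustrative.

```python
# Hedged sketch: box encoding plus a de-STL inverse mapping, under assumptions.
import torch

def encode_box(x_min, y_min, x_max, y_max):
    """Corner format -> (center x, center y, width, height)."""
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)

def de_stl(points: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Map predicted (x, y) centers from the affine-transformed feature map's
    coordinate system back to the pre-transform one.
    points: (N, 2) centers; theta: the (2, 3) affine matrix from the STL."""
    full = torch.cat([theta, torch.tensor([[0.0, 0.0, 1.0]])], dim=0)  # 3x3 homogeneous
    inv = torch.inverse(full)[:2]                                      # back to 2x3
    homog = torch.cat([points, torch.ones(points.size(0), 1)], dim=1)  # (N, 3)
    return homog @ inv.t()                                             # (N, 2)
```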
Step 5: the network training process is configured as follows:
(1) The YOLOv3 model has 9 anchors in total: 3 outputs of different scales, 3 anchors per output, so each output position predicts 3 boxes. For each box, the output parameters include x, y, w, h (further calculated from the default-box offsets output by the network), the confidence score that the box contains an object, and the probability of each object class in the box. Thus, for a VOC data set containing 20 classes, the YOLOv3 outputs have 3 sizes: 13 × 13 × (3 × (20 + 5)) = 13 × 13 × 75, 26 × 26 × (3 × (20 + 5)) = 26 × 26 × 75, and 52 × 52 × (3 × (20 + 5)) = 52 × 52 × 75 (see the quick check below).
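The three head sizes follow directly from 3 anchors × (classes + 5 box/confidence values):

```python
# Quick arithmetic check of the three YOLOv3 head sizes quoted above.
num_classes, anchors_per_scale = 20, 3
for grid in (13, 26, 52):
    depth = anchors_per_scale * (num_classes + 5)
    print(f"{grid} x {grid} x {depth}")   # 13 x 13 x 75, 26 x 26 x 75, 52 x 52 x 75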
(2) The loss function for training the entire network is configured as:
$$Loss=\lambda_{1}Loss_{loc}+\lambda_{2}Loss_{conf}+\lambda_{3}Loss_{cls}$$

where $\lambda_{1}$ is the target position loss coefficient, $\lambda_{2}$ the target confidence loss coefficient, and $\lambda_{3}$ the target class loss coefficient;
the target confidence loss $Loss_{conf}$ uses binary cross entropy (Binary Cross Entropy), where $o_{i}\in\{0,1\}$ indicates whether a target actually exists in predicted target bounding box $i$ (0 for absent, 1 for present); $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$, so that the $j$th bounding-box prediction is valid for the prediction, and 0 if no target exists in grid cell $i$; conversely, $\mathbb{1}_{ij}^{noobj}$ is 0 if a target exists in grid cell $i$ and 1 if none exists; $\lambda_{noobj}$ is the loss coefficient for a grid cell containing no real target object, and $\lambda_{obj}$ the loss coefficient for a grid cell containing one; $\hat{c}_{i}$ denotes the sigmoid probability that a target exists in predicted target rectangular box $i$, and $c_{i}$ the true value; the target confidence loss $Loss_{conf}$ is:

$$Loss_{conf}=-\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\left(\lambda_{obj}\mathbb{1}_{ij}^{obj}+\lambda_{noobj}\mathbb{1}_{ij}^{noobj}\right)\left[o_{i}\ln\hat{c}_{i}+\left(1-o_{i}\right)\ln\left(1-\hat{c}_{i}\right)\right]$$
the target class loss $Loss_{cls}$ also uses binary cross entropy, because the same object can belong to several categories at the same time (for example, a cat belongs to both "cat" and "animal"), which copes with more complex scenes; here $o_{ij}\in\{0,1\}$ indicates whether a class-$j$ target really exists in predicted target bounding box $i$ (0 for absent, 1 for present), $\hat{c}_{ij}$ denotes the sigmoid probability of a class-$j$ target in network-predicted target bounding box $i$, $S^{2}$ is the number of feature grid cells and $B$ the number of predicted boxes per cell; $Loss_{cls}$ is:

$$Loss_{cls}=-\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{j\in classes}\left[o_{ij}\ln\hat{c}_{ij}+\left(1-o_{ij}\right)\ln\left(1-\hat{c}_{ij}\right)\right]$$
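A hedged PyTorch sketch of the two binary-cross-entropy terms above; tensor layouts, the mask construction and the default coefficient values are assumptions for illustration.

```python
# Sketch of the confidence and class loss terms as binary cross-entropy.
import torch
import torch.nn.functional as F

def conf_cls_loss(pred_conf, pred_cls, true_conf, true_cls,
                  obj_mask, lambda_obj=1.0, lambda_noobj=0.5):
    """pred_conf/pred_cls are raw logits; true_conf is o_i and true_cls is o_ij
    (float 0/1 tensors); obj_mask (bool) marks boxes responsible for a target."""
    bce = F.binary_cross_entropy_with_logits
    # Loss_conf: weight responsible boxes by lambda_obj, the rest by lambda_noobj.
    weights = torch.where(obj_mask,
                          lambda_obj * torch.ones_like(pred_conf),
                          lambda_noobj * torch.ones_like(pred_conf))
    loss_conf = (weights * bce(pred_conf, true_conf, reduction="none")).sum()
    # Loss_cls: independent sigmoid per class, summed only where a target exists,
    # which is what lets one object carry several labels at once.
    loss_cls = bce(pred_cls[obj_mask], true_cls[obj_mask], reduction="sum")
    return loss_conf, loss_cls
```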
(3) Training is carried out with the designed loss function using stochastic gradient descent (SGD), with Adam adopted as the gradient update scheme.
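Since the text names both SGD and Adam, a minimal Adam-based update loop is sketched here; `model`, `train_loader`, the `compute_losses` helper and the coefficient values are assumptions, not the patent's configuration.

```python
# Hedged sketch of the training loop, under the stated assumptions.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
lambda1, lambda2, lambda3 = 1.0, 1.0, 1.0  # loss weights; values are assumptions

for images, targets in train_loader:
    optimizer.zero_grad()
    # Hypothetical helper returning Loss_loc, Loss_conf and Loss_cls.
    loss_loc, loss_conf, loss_cls = model.compute_losses(images, targets)
    loss = lambda1 * loss_loc + lambda2 * loss_conf + lambda3 * loss_cls
    loss.backward()
    optimizer.step()
```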
Compared with other detection methods, the proposed model achieves higher detection precision and performance on the outdoor mountain forest environment target detection task, including higher accuracy on small targets; the specific detection effect is shown in FIG. 5.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An outdoor mountain forest environment target detection method based on an improved YOLOv3 network, characterized by comprising the following steps:
step one, acquiring background pictures and foreground objects of the outdoor mountain forest, and presetting the detection target classes person and vehicle;
step two, superposing the background and the foreground through image preprocessing to generate a data set, acquiring the bounding box and class data of each foreground object, and generating an xml file in the same format as the PASCAL VOC2012 data set, thereby obtaining a training set, a verification set and a test set;
step three, based on the YOLOv3 network model, adding a spatial transformation layer STL behind the feature extraction layers of different scales, training with the feature maps of different scales as input to obtain different affine transformation results, and then performing the subsequent classification and bounding-box regression on the STL layer's output features;
step four, adding an inverse spatial transformation layer de-STL at the end of the network so that the final calculation result matches the encoding of the ground-truth bounding box, which takes the pre-affine-transform feature map as its coordinate system, thereby obtaining x and y in the original image coordinate system, and then calculating the location loss, the target confidence loss and the target class loss to obtain the improved YOLOv3 network model;
step five, training the improved YOLOv3 network obtained in step four with the training, verification and test sets obtained in step two to obtain the best-performing model, and performing target detection with that model.
2. The method for detecting targets in the outdoor mountain forest environment based on the improved YOLOv3 network according to claim 1, characterized in that the first step specifically comprises the following process: using python3 + Beautiful Soup + requests + lxml, pictures are collected from the Internet for the four keywords forest, valley, plain and wetland (700, 600, 600 and 600 pictures respectively), and unsuitable pictures are manually removed; person and vehicle data from the COCO2014 data set are selected as foreground objects and randomly inserted into the background images to construct the outdoor mountain forest environment data set.
3. The method for detecting targets in the outdoor mountain forest environment based on the improved YOLOv3 network according to claim 1, characterized in that the third step specifically comprises the following process: the attention-based STL layer design centers on the localization net, which outputs 6 parameters for an input feature map to apply an affine transformation to the original feature map, thereby alleviating the detection-accuracy drop caused by target rotation and scale change; the STL is therefore embedded after the conv26, conv43 and conv52 feature maps of Darknet-53, which keeps the information loss through the STL layer small while remaining sensitive to rotation changes, and the classification and regression losses are then computed on the output result.
4. The method for detecting targets in the outdoor mountain forest environment based on the improved YOLOv3 network according to claim 1, characterized in that the fourth step specifically comprises the following process: the de-STL layer is embedded behind the convolution layer that outputs the image target position, so that the location loss can be computed; the location loss is:

$$Loss_{loc}=\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}+\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]$$

where $\lambda_{coord}$ is the location loss coefficient, $S^{2}$ is the number of feature grid cells, and $B$ is the number of predicted boxes per cell; $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$, so that the $j$th bounding-box prediction is valid for the prediction, and 0 if no target exists in grid cell $i$; $x_{i}, y_{i}, w_{i}, h_{i}$ denote the predicted target position and the width and height output by the detection network, and $\hat{x}_{i}, \hat{y}_{i}, \hat{w}_{i}, \hat{h}_{i}$ the target position and width/height values in the data set.
5. The method for detecting targets in the outdoor mountain forest environment based on the improved YOLOv3 network according to claim 1, characterized in that the fifth step is specifically:
(1) The YOLOv3 model has 9 anchors in total: 3 outputs of different scales, each using 3 anchors, so each output position predicts 3 boxes; for each box, the output parameters comprise the target position coordinates and width/height values, the confidence score that the box contains an object, and the probability of each object class in the box;
(2) The loss function for training the entire network is configured as follows:

$$Loss=\lambda_{1}Loss_{loc}+\lambda_{2}Loss_{conf}+\lambda_{3}Loss_{cls}$$

where $\lambda_{1}$ is the target position loss coefficient, $\lambda_{2}$ the target confidence loss coefficient, and $\lambda_{3}$ the target class loss coefficient;
the target confidence loss $Loss_{conf}$ uses binary cross entropy, where $o_{i}\in\{0,1\}$ indicates whether a target actually exists in predicted target bounding box $i$ (0 for absent, 1 for present); $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$, so that the $j$th bounding-box prediction is valid for the prediction, and 0 if no target exists in grid cell $i$; conversely, $\mathbb{1}_{ij}^{noobj}$ is 0 if a target exists in grid cell $i$ and 1 if none exists; $\lambda_{noobj}$ is the loss coefficient for a grid cell containing no real target object, and $\lambda_{obj}$ the loss coefficient for a grid cell containing one; $\hat{c}_{i}$ denotes the sigmoid probability that an object exists in predicted target rectangular box $i$, and $c_{i}$ the true value; the target confidence loss $Loss_{conf}$ is:

$$Loss_{conf}=-\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\left(\lambda_{obj}\mathbb{1}_{ij}^{obj}+\lambda_{noobj}\mathbb{1}_{ij}^{noobj}\right)\left[o_{i}\ln\hat{c}_{i}+\left(1-o_{i}\right)\ln\left(1-\hat{c}_{i}\right)\right]$$

the target class loss $Loss_{cls}$ also uses binary cross entropy, where $o_{ij}\in\{0,1\}$ indicates whether a class-$j$ target really exists in predicted target bounding box $i$ (0 for absent, 1 for present) and $\hat{c}_{ij}$ denotes the sigmoid probability of a class-$j$ target in network-predicted target bounding box $i$; $Loss_{cls}$ is:

$$Loss_{cls}=-\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{j\in classes}\left[o_{ij}\ln\hat{c}_{ij}+\left(1-o_{ij}\right)\ln\left(1-\hat{c}_{ij}\right)\right]$$

(3) Training is carried out with the designed loss function using stochastic gradient descent (SGD), with Adam adopted as the gradient update scheme.


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant