CN112668662A - Outdoor mountain forest environment target detection method based on improved YOLOv3 network - Google Patents


Info

Publication number
CN112668662A
Authority
CN
China
Prior art keywords
target
loss
network
mountain forest
box
Prior art date
Legal status
Granted
Application number
CN202011639547.8A
Other languages
Chinese (zh)
Other versions
CN112668662B (en)
Inventor
彭志红
蒋卓
陈杰
奚乐乐
王星博
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology (BIT)
Priority to CN202011639547.8A
Publication of CN112668662A
Application granted
Publication of CN112668662B
Legal status: Active, Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an outdoor mountain forest environment target detection method based on an improved YOLOv3 network. The method comprises: constructing a target detection data set for the outdoor mountain forest environment; introducing a spatial transformation layer (STL) based on an attention mechanism and embedding it into the YOLOv3 detection model; adding a de-STL layer on this basis to facilitate model training; designing the improved YOLOv3 detection network; and fine-tuning the network to obtain the target detection model finally used in the outdoor mountain forest environment. The invention can accurately detect targets in the outdoor mountain forest environment, and improves both the detection precision of the detector in this environment and the recall rate for small-scale targets.

Description

Outdoor mountain forest environment target detection method based on improved YOLOv3 network
Technical Field
The invention belongs to the technical fields of computer vision and machine learning, and particularly relates to an outdoor mountain forest environment target detection method based on an improved YOLOv3 network.
Background
With the development of science and technology, target detection has become a popular direction and research focus in the field of computer vision. Target detection technology is applied in many practical scenarios, such as autonomous driving, unmanned aerial vehicle surveillance and scene recognition, but target detection in the outdoor mountain forest environment still faces many problems. The environment is highly complex: illumination changes drastically, the weather changes irregularly, and occlusion between targets and non-targets is severe, all of which increase the difficulty of vision-based target detection; in addition, practical requirements demand that the detection speed be real-time.
Traditional target detection algorithms rely primarily on the apparent characteristics of real objects. For objects with rich texture, hand-designed feature descriptors such as SIFT, PCA-SIFT and SURF are used to extract strongly representative feature points from the image for subsequent matching and detection. For objects with little or even no texture, template matching is the preferred solution; its core problem is designing a reasonable and general distance metric. However, both approaches are easily affected by the environment, such as occlusion and cluttered backgrounds, and the traditional methods are very sensitive to illumination variation and noise.
With the development of deep learning in recent years, more and more engineering applications use deep convolutional neural networks (CNNs) to solve practical problems. A deep-learning target detection method requires no hand-designed feature descriptors; instead, the network learns higher-level semantic information by itself. Deep CNNs are also more robust to factors such as illumination, scale change and noise, which greatly improves the generalization ability of the model and the precision of the target detection algorithm.
The literature (Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580-587.) first proposed a deep-learning target detection algorithm of the typical Two-Stage type, which detects mainly through candidate-box extraction followed by classification and regression; another typical algorithm of this type is Faster R-CNN, proposed in the literature (Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [C]// Advances in Neural Information Processing Systems. 2015: 91-99.). Although this series of algorithms achieves high detection accuracy, the complexity of the computation keeps the network speed far from the requirement of real-time performance. To balance the demands of precision and speed, another class of One-Stage target detection models based on the regression idea has received much attention, among which are the YOLO series (Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.) and the SSD series (Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector [C]// European Conference on Computer Vision. Springer, Cham, 2016: 21-37.). The SSD detection model discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios, fine-tunes the default boxes, and detects on feature maps of different scales; its accuracy is high, but its real-time performance is far inferior to that of YOLO. The real-time performance of the YOLO series is very good; the YOLOv3 model essentially balances speed and precision and greatly improves the detection of small and multiple targets, and owing to this efficiency the YOLO series is more widely applied than SSD.
Disclosure of Invention
In view of this, the invention provides a method for detecting targets in an outdoor mountain forest environment based on an improved YOLOv3 network; the new network model can effectively improve the accuracy of target detection in this environment.
The technical scheme for realizing the invention is as follows:
a field mountain forest environment target detection method based on an improved YOLOv3 network comprises the following steps:
acquiring a background picture and a foreground object of a field mountain forest, and presetting detection target types of people and vehicles;
secondly, superposing the background and the foreground through image preprocessing to generate a data set, acquiring bounding box and category data of a foreground object, and generating an xml file with the same format as the PASCALVOC2012 data set to obtain a training set, a verification set and a test set;
thirdly, based on a YOLOv3 network model, adding a spatial transformation layer STL behind feature extraction layers with different scales, training by taking feature maps with different scales as input to obtain different affine transformation results, and performing subsequent classification and bounding box regression on output features of the STL layer, wherein the method can effectively solve the problem of detection effect reduction when a target rotates and scales change in a field mountain forest environment;
fourthly, a de-STL conversion layer is added at last in the network, so that the final calculation result is matched with the encoding result of a true value bounding box which takes the feature map as the coordinate system before affine transformation, x and y corresponding to the original image coordinate system are obtained, and loss is calculated;
and step five, training the improved YOLOv3 network obtained in the step four by using the training set, the verification set and the test set obtained in the step two to obtain a performance optimal model.
Further, step one specifically includes the following process: using python3, Beautiful Soup, requests and lxml, pictures for the four keywords forest, valley, plain and wetland are crawled from the Internet, 700, 600, 600 and 600 respectively, and unsuitable pictures are removed manually; the data of people (person, category_id = 1) and cars (car, category_id = 3) in the COCO2014 data set are selected as foreground objects, since the objects they contain are common in daily life with frequent deformation and occlusion, and the foreground objects are randomly inserted into the background pictures to construct the outdoor mountain forest environment data set.
Further, step three specifically includes the following process: the design of the attention-based STL layer centers on the localization net, which outputs 6 parameters for an input feature map and applies an affine transformation to the original feature map, thereby alleviating the drop in detection accuracy caused by target rotation and scale change; the STL is therefore embedded after the conv26, conv43 and conv52 feature maps of Darknet-53, where little information is lost in passing through the STL layer while sensitivity to rotation change is preserved, and the classification, regression and other losses are then calculated on the output result.
Further, step four specifically includes the following process: the de-STL layer is embedded behind the convolution layer that outputs the image target position, so that the location loss can be calculated conveniently; the location loss is:

$$Loss_{loc} = \lambda_{coord}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]$$

where $\lambda_{coord}$ is the loss coefficient, $S$ is the number of feature grid cells, $B$ is the number of predicted boxes, and $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$ and the $j$-th bounding-box prediction is responsible for it, and 0 if no target exists in grid cell $i$; $(x_i, y_i, w_i, h_i)$ are the target position and width-height values predicted by the detection network, and $(\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$ are the target position and width-height values in the data set.
Further, the fifth step is specifically:
(1) the YOLOv3 model has a total of 9 anchors and 3 outputs of different scales, each output using 3 anchors, so each position of an output predicts 3 boxes; for each box, the output parameters comprise the target position coordinates and width-height values, the confidence score that the box contains an object, and the probability of each object class in the box;
(2) the loss function for training the entire network is configured as follows:
$$Loss = \lambda_1 Loss_{loc} + \lambda_2 Loss_{conf} + \lambda_3 Loss_{cls}$$

where $\lambda_1$ is the target position loss coefficient, $\lambda_2$ the target confidence loss coefficient, and $\lambda_3$ the target class loss coefficient;
the target confidence loss $Loss_{conf}$ uses binary cross entropy, where $c_i \in \{0,1\}$ indicates whether a target really exists in predicted target bounding box $i$, 0 meaning absent and 1 meaning present; $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$ and the $j$-th bounding-box prediction is responsible for it, and 0 if no target exists in grid cell $i$; conversely, $\mathbb{1}_{ij}^{noobj}$ is 0 when a target exists in grid cell $i$ and 1 when none exists; $\lambda_{noobj}$ is the loss coefficient for grids containing no real target, and $\lambda_{obj}$ is the loss coefficient for grids containing a real target; $\hat{c}_i$ denotes the sigmoid probability that an object is present in predicted rectangular box $i$, and $c_i$ denotes the true value. The formula of the target confidence loss $Loss_{conf}$ is:

$$Loss_{conf} = -\lambda_{obj}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[c_i\ln\hat{c}_i+(1-c_i)\ln(1-\hat{c}_i)\right]-\lambda_{noobj}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left[c_i\ln\hat{c}_i+(1-c_i)\ln(1-\hat{c}_i)\right]$$
the target class loss $Loss_{cls}$ also uses binary cross entropy, where $o_{ij} \in \{0,1\}$ indicates whether a target of the $j$-th class really exists in predicted target bounding box $i$, 0 meaning absent and 1 meaning present, and $\hat{o}_{ij}$ denotes the sigmoid probability predicted by the network that a target of the $j$-th class is in bounding box $i$; the formula of $Loss_{cls}$ is:

$$Loss_{cls} = -\sum_{i=0}^{S}\mathbb{1}_{i}^{obj}\sum_{j \in classes}\left[o_{ij}\ln\hat{o}_{ij}+(1-o_{ij})\ln(1-\hat{o}_{ij})\right]$$
(3) training is carried out with the designed loss function using stochastic gradient descent (SGD), and Adam is adopted as the gradient update method.
Advantageous effects:
when the method is used for solving the target detection problem in the field mountain forest environment, the attention mechanism-based STL space conversion layer is introduced and connected to the characteristic input layer, so that the network can automatically learn affine transformation parameters to solve the rotation problem and the shielding problem of targets with different scales, the accuracy of small target detection can be improved, and the target detection precision is effectively improved. The final result shows that the method has high target detection effect on the wild mountain forest data set.
Drawings
FIG. 1 shows the data set produced by the present invention;
FIG. 2 shows the network model structure of the present invention;
FIG. 3 shows the detailed structure of the spatial transformation layer STL;
FIG. 4 shows the detailed structure of the de-STL layer;
FIG. 5 shows the actual detection effect of the present invention.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The invention mainly addresses the technical problem that existing target detection methods struggle to cope with the outdoor mountain forest environment under severe conditions such as object occlusion, rotation and scale change. The core of the method is the YOLOv3 target detection algorithm, an efficient detector that balances precision and speed and adopts a convolutional network structure, shown in FIG. 2. The invention can accurately detect targets in the outdoor mountain forest environment, and improves both the detection precision of the detector in this environment and the recall rate of small-scale targets.
The detailed steps of the invention are as follows:
step 1: and acquiring a background picture and a foreground object of the wild mountain forest, and presetting detection types of people and vehicles. The invention uses python3+ Beautiful Soup + requests + lxml to crawl four pictures of "keywords" on the Internet, 700, 600 respectively, and manually eliminates the data of unsuitable pictures (such as pictures already containing foreground data and pictures with more watermarks), statistically, the average size of the pictures is [600,400] and the pictures are all stored in 'jpg' format, selects the data of people (person, category _ id 1) and cars (car, category _ id 3) in COCO2014 as foreground objects, on one hand, because the accurate segmentation calibration allows the extraction of the foreground objects, on the other hand, because the objects contained are more frequent, deformation and occlusion conditions are more favorable for increasing the robustness of the detector, and in order to extract the objects more completely, sets the extraction script to extract 2 picture and 1352 picture in total of the outline of people and cars, besides foreground data, pixel values of other positions in each picture are 0, and the picture can be conveniently superposed with a background picture.
Step 2: superimpose the background pictures and the foreground data through image preprocessing; the effect of a synthesized picture is shown in FIG. 1. A data set is generated, the bounding box and category data of each object are automatically acquired in the script, and xml files in the same format as the PASCAL VOC 2012 data set are generated for subsequent training. When composing a picture, the following rules are followed:
a) many deformation and occlusion situations already exist in the COCO data set, so no additional randomness of this kind is added;
b) scale changes are applied to the foreground data such that the ratio (longer side of the foreground / shorter side of the background) lies in the range 0.2 to 0.45;
c) to make the synthesized picture more realistic, |foreground pixel mean − background pixel mean| < 30 is required;
d) each picture contains 3, 2 or 1 foreground instances with probabilities 0.2, 0.3 and 0.5 respectively, and each instance belongs to the person or car class with equal probability;
e) the bounding box format is (x_min, y_min), (x_max, y_max).
In addition, attention must be paid to the direction of the picture coordinate axes when writing the script (for example, cv2.imread by default treats the picture height as the x direction, cv2.rectangle treats the picture width as the x direction, and in the xml file the picture width is also the x direction). 2500 pictures are synthesized in total and randomly divided into a training set and a test set at a ratio of 4:1, stored according to the directory structure of PASCAL VOC 2012; for convenience of description, the constructed data set is named Detection2500.
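The compositing itself can be sketched as follows; this is only an illustrative reading of rules b), c) and e) (rule d), the number of instances per picture, would be handled by the caller), and the function name and retry convention are hypothetical.

```python
import random
import cv2

def composite(bg, fg):
    """Paste one zero-background foreground onto a background picture.
    Returns the synthesized picture and its (x_min, y_min, x_max, y_max) box,
    or None if rule c) rejects the pair (the caller retries with another pair)."""
    # Rule b): rescale so (longer side of fg) / (shorter side of bg) is in [0.2, 0.45].
    ratio = random.uniform(0.2, 0.45)
    scale = ratio * min(bg.shape[:2]) / max(fg.shape[:2])
    fg = cv2.resize(fg, None, fx=scale, fy=scale)

    mask = fg.sum(axis=2) > 0                         # object pixels are non-zero
    # Rule c): keep foreground and background brightness close.
    if abs(float(fg[mask].mean()) - float(bg.mean())) >= 30:
        return None

    # Random placement; note that OpenCV arrays are indexed (row = y, col = x).
    h, w = fg.shape[:2]
    y0 = random.randint(0, bg.shape[0] - h)
    x0 = random.randint(0, bg.shape[1] - w)
    out = bg.copy()
    out[y0:y0 + h, x0:x0 + w][mask] = fg[mask]

    # Rule e): bounding box in (x_min, y_min), (x_max, y_max) format.
    return out, (x0, y0, x0 + w, y0 + h)
```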
Step 3: design the STL layer as shown in FIG. 3. The spatial transformer network has the following characteristics:
(1) the network can easily be embedded as a module in any network; (2) the parameters the network needs to learn are those of the localization net. In other words, during training the network autonomously learns which affine transformations make the output of the classifier or detector more confident. In addition, the target detection task differs from the image classification task: besides accurately knowing the type of the target, a bounding box of the target must also be obtained. In view of this, it is unreasonable to place the STL at the first layer, i.e. directly after the input image, for two reasons: a) training the STL may cause the input U to lose part of its information, and if features were then extracted by the convolution layers, an appreciable number of features could be lost, especially when the input image contains multiple objects at the same time; b) one of the notable features of the YOLOv3 structure is that feature maps of different scales are "responsible" for detecting objects of different scales, and the "proper" affine transformations required by objects with large scale differences differ more than those required by objects of similar scale. It is therefore reasonable to train a different affine transformation for each scale, taking the feature map of that scale as input. The STL is thus embedded after the feature maps of different scales, and classification, bounding-box regression and other operations are then performed on the STL output.
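The patent does not give the STL in code form; a minimal PyTorch sketch, assuming the standard spatial-transformer formulation (localization net → 6 affine parameters → grid sampling), is shown below. The localization-net layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STL(nn.Module):
    """Spatial transformation layer: learns one 2x3 affine matrix per image
    and warps the input feature map with it."""
    def __init__(self, channels):
        super().__init__()
        # Localization net: regresses the 6 affine parameters theta.
        self.loc = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, 6),
        )
        # Start at the identity transform so early training matches plain YOLOv3.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)            # (N, 2, 3) affine matrices
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False), theta

# e.g. for the 26x26 scale: warped, theta = STL(256)(feature_map)
```

Returning theta alongside the warped features matters here, because the de-STL step described next needs the same affine parameters in order to invert the transform.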
Step 4: design the de-STL layer as shown in FIG. 4, where "de" is a negating prefix meaning "inverse"; de-STL is defined as the inverse operation of the STL. The reason for adding this operation is that after the STL, the output of the location part takes the affine-transformed feature map as its coordinate system, which obviously does not match the encoding of the ground-truth bounding box, whose coordinate system is the feature map before the affine transformation; the output of the location part therefore needs to be inverse-transformed, i.e. passed through the de-STL, to obtain x and y in the matching coordinate system, after which the loss is calculated and the network trained further. Here $Loss_{loc}$ is:

$$Loss_{loc} = \lambda_{coord}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]$$

where $\lambda_{coord}$ is the loss coefficient, $S$ is the number of feature grid cells, $B$ is the number of predicted boxes, and $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$ and the $j$-th bounding-box prediction is responsible for it, and 0 if no target exists in grid cell $i$; $(x_i, y_i, w_i, h_i)$ are the target position and width-height values predicted by the detection network, and $(\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$ are the target position and width-height values in the data set. The predicted values are further calculated from the offsets relative to the default boxes output by the network. The target position in an image is commonly represented in data sets as (x_min, y_min), (x_max, y_max), i.e. the position coordinates of the upper-left and lower-right corners of the bounding box. To use the above loss calculation while maintaining the translation and scaling invariance of the bounding box, the box must be encoded, i.e. represented by its center-point coordinates and its width and height.
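As a sketch only (the patent gives no code for the de-STL), the inverse transform can be realized by inverting the learned affine matrix, and the corner-to-center encoding is a two-line helper; all names are hypothetical.

```python
import torch

def de_stl(points, theta):
    """Map (x, y) points from the affine-transformed feature map back to the
    coordinate system of the feature map before the transform."""
    n = theta.size(0)                                  # theta: (N, 2, 3)
    bottom = torch.tensor([0.0, 0.0, 1.0]).expand(n, 1, 3)
    inv = torch.inverse(torch.cat([theta, bottom], 1))[:, :2, :]      # (N, 2, 3)
    homo = torch.cat([points, torch.ones_like(points[..., :1])], -1)  # (N, P, 3)
    return homo @ inv.transpose(1, 2)                  # (N, P, 2)

def encode_box(x_min, y_min, x_max, y_max):
    """Corner format -> (center_x, center_y, width, height)."""
    return ((x_min + x_max) / 2, (y_min + y_max) / 2,
            x_max - x_min, y_max - y_min)
```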
Step 5: the network training process is configured as follows:
(1) There are a total of 9 anchors in the YOLOv3 model and 3 outputs of different scales, 3 anchors for each output, so each position of an output predicts 3 boxes. For each box, the output parameters include x, y, w, h, which are further calculated from the offsets relative to the default box output by the network, together with the confidence score that the box contains an object and the probability of each object class in the box. Thus, for a VOC data set containing 20 categories, the output of YOLOv3 has 3 sizes: 13 × 13 × (3 × (20 + 5)) = 13 × 13 × 75, 26 × 26 × (3 × (20 + 5)) = 26 × 26 × 75, and 52 × 52 × (3 × (20 + 5)) = 52 × 52 × 75.
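The per-scale tensor depth follows directly from this layout; a quick check, assuming 20 classes and 3 anchors per scale:

```python
num_classes, anchors_per_scale = 20, 3
depth = anchors_per_scale * (num_classes + 5)   # 5 = 4 box values + 1 confidence
for grid in (13, 26, 52):
    print(f"{grid} x {grid} x {depth}")         # 13 x 13 x 75, 26 x 26 x 75, 52 x 52 x 75
```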
(2) The loss function for training the entire network is configured as follows:
$$Loss = \lambda_1 Loss_{loc} + \lambda_2 Loss_{conf} + \lambda_3 Loss_{cls}$$

where $\lambda_1$ is the target position loss coefficient, $\lambda_2$ the target confidence loss coefficient, and $\lambda_3$ the target class loss coefficient;
the target confidence loss $Loss_{conf}$ uses binary cross entropy, where $c_i \in \{0,1\}$ indicates whether a target really exists in predicted target bounding box $i$, 0 meaning absent and 1 meaning present; $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$ and the $j$-th bounding-box prediction is responsible for it, and 0 if no target exists in grid cell $i$; conversely, $\mathbb{1}_{ij}^{noobj}$ is 0 when a target exists in grid cell $i$ and 1 when none exists; $\lambda_{noobj}$ is the loss coefficient for grids containing no real target, and $\lambda_{obj}$ is the loss coefficient for grids containing a real target; $\hat{c}_i$ denotes the sigmoid probability that an object is present in predicted rectangular box $i$, and $c_i$ denotes the true value. The formula of the target confidence loss $Loss_{conf}$ is:

$$Loss_{conf} = -\lambda_{obj}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[c_i\ln\hat{c}_i+(1-c_i)\ln(1-\hat{c}_i)\right]-\lambda_{noobj}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left[c_i\ln\hat{c}_i+(1-c_i)\ln(1-\hat{c}_i)\right]$$
the target class loss $Loss_{cls}$ also uses binary cross entropy; the reason is that the same target can belong to several classes at the same time (for example, a cat can be classified both as a cat and as an animal), so the method can cope with more complex scenes. Here $o_{ij} \in \{0,1\}$ indicates whether a target of the $j$-th class really exists in predicted target bounding box $i$, 0 meaning absent and 1 meaning present, $\hat{o}_{ij}$ denotes the sigmoid probability predicted by the network that a target of the $j$-th class is in bounding box $i$, $S$ is the number of feature grid cells, and $B$ is the number of predicted boxes; the formula of $Loss_{cls}$ is:

$$Loss_{cls} = -\sum_{i=0}^{S}\mathbb{1}_{i}^{obj}\sum_{j \in classes}\left[o_{ij}\ln\hat{o}_{ij}+(1-o_{ij})\ln(1-\hat{o}_{ij})\right]$$
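Pulling the three terms together, a compact PyTorch sketch under the definitions above might look as follows; the flattened tensor layout (one row per predicted box: x, y, w, h, confidence logit, class logits) and all names are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn.functional as F

def total_loss(pred, truth, obj_mask, noobj_mask,
               l1=1.0, l2=1.0, l3=1.0,
               lam_coord=5.0, lam_obj=1.0, lam_noobj=0.5):
    """pred/truth: (N, 4 + 1 + C) per box = x, y, w, h, confidence, classes.
    obj_mask / noobj_mask: boolean masks realizing the indicator functions."""
    # Loss_loc: squared error on (x, y, w, h) for responsible boxes only.
    loss_loc = lam_coord * ((pred[obj_mask, :4] - truth[obj_mask, :4]) ** 2).sum()

    # Loss_conf: binary cross entropy, split into object / no-object terms.
    bce = lambda m: F.binary_cross_entropy(
        torch.sigmoid(pred[m, 4]), truth[m, 4], reduction="sum")
    loss_conf = lam_obj * bce(obj_mask) + lam_noobj * bce(noobj_mask)

    # Loss_cls: binary cross entropy over the per-class sigmoid outputs.
    loss_cls = F.binary_cross_entropy(
        torch.sigmoid(pred[obj_mask, 5:]), truth[obj_mask, 5:], reduction="sum")

    return l1 * loss_loc + l2 * loss_conf + l3 * loss_cls
```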
(3) training is carried out with the designed loss function using stochastic gradient descent (SGD), and Adam is adopted as the gradient update method.
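Since the text names both SGD training and the Adam update rule, one plausible reading is mini-batch gradient training with Adam as the optimizer; a hedged sketch, where model, loader, the learning rate and the target preparation are all assumptions and total_loss refers to the sketch above:

```python
import torch

# Assumptions: `model` is the improved YOLOv3 network with STL/de-STL layers,
# and `loader` yields (images, (truth, obj_mask, noobj_mask)) mini-batches
# built from the Detection2500 training split.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for epoch in range(100):
    for images, targets in loader:
        optimizer.zero_grad()
        loss = total_loss(model(images), *targets)
        loss.backward()
        optimizer.step()
```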
Compared with other detection methods, the model provided by the invention achieves higher detection precision and performance on the target detection task in the outdoor mountain forest environment, as well as higher accuracy for small-target detection; the specific detection effect is shown in FIG. 5.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An outdoor mountain forest environment target detection method based on an improved YOLOv3 network, characterized by comprising the following steps:
step one, acquiring background pictures and foreground objects of the outdoor mountain forest, and presetting the detection target categories of people and vehicles;
step two, superimposing the background and the foreground through image preprocessing to generate a data set, acquiring the bounding box and category data of each foreground object, and generating xml files in the same format as the PASCAL VOC 2012 data set, thereby obtaining a training set, a verification set and a test set;
step three, based on the YOLOv3 network model, adding a spatial transformation layer (STL) behind the feature extraction layers of different scales, training with the feature maps of different scales as input to obtain different affine transformation results, and then performing the subsequent classification and bounding-box regression on the output features of the STL layers;
step four, adding a de-STL layer at the end of the network so that the final computed result matches the encoding of the ground-truth bounding box, which takes the feature map before the affine transformation as its coordinate system, thereby obtaining x and y in the original image coordinate system and calculating the loss;
and step five, training the improved YOLOv3 network obtained in step four with the training set, verification set and test set obtained in step two to obtain the model with optimal performance.
2. The outdoor mountain forest environment target detection method based on an improved YOLOv3 network according to claim 1, characterized in that step one specifically comprises the following process: using python3 + Beautiful Soup + requests + lxml, images for the four keywords forest, valley, plain and wetland are crawled from the Internet, 700, 600, 600 and 600 respectively, and unsuitable images are removed manually; the data of people and cars in the COCO2014 data set are selected as foreground objects, since the objects they contain are common in daily life with frequent deformation and occlusion, and the foreground objects are randomly inserted into the background images to construct the outdoor mountain forest environment data set.
3. The outdoor mountain forest environment target detection method based on an improved YOLOv3 network according to claim 1, characterized in that step three specifically comprises the following process: the design of the attention-based STL layer centers on the localization net, which outputs 6 parameters for an input feature map and applies an affine transformation to the original feature map, thereby alleviating the drop in detection accuracy caused by target rotation and scale change; the STL is therefore embedded after the conv26, conv43 and conv52 feature maps of Darknet-53, where little information is lost in passing through the STL layer while sensitivity to rotation change is preserved, and the classification, regression and other losses are then calculated on the output result.
4. The outdoor mountain forest environment target detection method based on an improved YOLOv3 network according to claim 1, characterized in that step four specifically comprises the following process: the de-STL layer is embedded behind the convolution layer that outputs the image target position, so that the location loss can be calculated conveniently; the location loss is:

$$Loss_{loc} = \lambda_{coord}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right]$$

where $\lambda_{coord}$ is the loss coefficient, $S$ is the number of feature grid cells, $B$ is the number of predicted boxes, and $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$ and the $j$-th bounding-box prediction is responsible for it, and 0 if no target exists in grid cell $i$; $(x_i, y_i, w_i, h_i)$ are the target position and width-height values predicted by the detection network, and $(\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$ are the target position and width-height values in the data set.
5. The outdoor mountain forest environment target detection method based on an improved YOLOv3 network according to claim 1, characterized in that step five is specifically:
(1) the YOLOv3 model has a total of 9 anchors and 3 outputs of different scales, each output using 3 anchors, so each position of an output predicts 3 boxes; for each box, the output parameters comprise the target position coordinates and width-height values, the confidence score that the box contains an object, and the probability of each object class in the box;
(2) the loss function for training the entire network is configured as follows:
$$Loss = \lambda_1 Loss_{loc} + \lambda_2 Loss_{conf} + \lambda_3 Loss_{cls}$$

where $\lambda_1$ is the target position loss coefficient, $\lambda_2$ the target confidence loss coefficient, and $\lambda_3$ the target class loss coefficient;
the target confidence loss $Loss_{conf}$ uses binary cross entropy, where $c_i \in \{0,1\}$ indicates whether a target really exists in predicted target bounding box $i$, 0 meaning absent and 1 meaning present; $\mathbb{1}_{ij}^{obj}$ is defined to be 1 if a target exists in grid cell $i$ and the $j$-th bounding-box prediction is responsible for it, and 0 if no target exists in grid cell $i$; conversely, $\mathbb{1}_{ij}^{noobj}$ is 0 when a target exists in grid cell $i$ and 1 when none exists; $\lambda_{noobj}$ is the loss coefficient for grids containing no real target, and $\lambda_{obj}$ is the loss coefficient for grids containing a real target; $\hat{c}_i$ denotes the sigmoid probability that an object is present in predicted rectangular box $i$, and $c_i$ denotes the true value. The formula of the target confidence loss $Loss_{conf}$ is:

$$Loss_{conf} = -\lambda_{obj}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[c_i\ln\hat{c}_i+(1-c_i)\ln(1-\hat{c}_i)\right]-\lambda_{noobj}\sum_{i=0}^{S}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left[c_i\ln\hat{c}_i+(1-c_i)\ln(1-\hat{c}_i)\right]$$
the target class loss $Loss_{cls}$ also uses binary cross entropy, where $o_{ij} \in \{0,1\}$ indicates whether a target of the $j$-th class really exists in predicted target bounding box $i$, 0 meaning absent and 1 meaning present, and $\hat{o}_{ij}$ denotes the sigmoid probability predicted by the network that a target of the $j$-th class is in bounding box $i$; the formula of $Loss_{cls}$ is:

$$Loss_{cls} = -\sum_{i=0}^{S}\mathbb{1}_{i}^{obj}\sum_{j \in classes}\left[o_{ij}\ln\hat{o}_{ij}+(1-o_{ij})\ln(1-\hat{o}_{ij})\right]$$
(3) training is carried out with the designed loss function using stochastic gradient descent (SGD), and Adam is adopted as the gradient update method.
CN202011639547.8A 2020-12-31 2020-12-31 Outdoor mountain forest environment target detection method based on improved YOLOv3 network Active CN112668662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011639547.8A CN112668662B (en) 2020-12-31 2020-12-31 Outdoor mountain forest environment target detection method based on improved YOLOv3 network


Publications (2)

Publication Number Publication Date
CN112668662A (en) 2021-04-16
CN112668662B (en) 2022-12-06

Family

ID=75413678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011639547.8A Active CN112668662B (en) 2020-12-31 2020-12-31 Outdoor mountain forest environment target detection method based on improved YOLOv3 network

Country Status (1)

Country Link
CN (1) CN112668662B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020100371A4 (en) * 2020-03-12 2020-04-16 Jilin University Hierarchical multi-object tracking method based on saliency detection
CN111507271A (en) * 2020-04-20 2020-08-07 北京理工大学 Airborne photoelectric video target intelligent detection and identification method
CN111626128A (en) * 2020-04-27 2020-09-04 江苏大学 Improved YOLOv 3-based pedestrian detection method in orchard environment
CN111814621A (en) * 2020-06-29 2020-10-23 中国科学院合肥物质科学研究院 Multi-scale vehicle and pedestrian detection method and device based on attention mechanism
CN112101434A (en) * 2020-09-04 2020-12-18 河南大学 Infrared image weak and small target detection method based on improved YOLO v3

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020100371A4 (en) * 2020-03-12 2020-04-16 Jilin University Hierarchical multi-object tracking method based on saliency detection
CN111507271A (en) * 2020-04-20 2020-08-07 北京理工大学 Airborne photoelectric video target intelligent detection and identification method
CN111626128A (en) * 2020-04-27 2020-09-04 江苏大学 Improved YOLOv 3-based pedestrian detection method in orchard environment
CN111814621A (en) * 2020-06-29 2020-10-23 中国科学院合肥物质科学研究院 Multi-scale vehicle and pedestrian detection method and device based on attention mechanism
CN112101434A (en) * 2020-09-04 2020-12-18 河南大学 Infrared image weak and small target detection method based on improved YOLO v3

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TINGCHAO SHI, ET AL: "Fast Classification and Detection of Marine Targets in Complex Scenes with YOLOv3", 《OCEANS 2019 - MARSEILLE》 *
FAN HONGCHAO et al.: "Traffic sign detection based on Anchor-free", 《Journal of Geo-Information Science》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837087A (en) * 2021-09-24 2021-12-24 上海交通大学宁波人工智能研究院 Animal target detection system and method based on YOLOv3
CN113837087B (en) * 2021-09-24 2023-08-29 上海交通大学宁波人工智能研究院 Animal target detection system and method based on YOLOv3
CN115291730A (en) * 2022-08-11 2022-11-04 北京理工大学 Wearable bioelectric equipment and bioelectric action identification and self-calibration method
CN115291730B (en) * 2022-08-11 2023-08-15 北京理工大学 Wearable bioelectric equipment and bioelectric action recognition and self-calibration method

Also Published As

Publication number Publication date
CN112668662B (en) 2022-12-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant