CN110263679B - Fine-grained vehicle detection method based on deep neural network - Google Patents

Fine-grained vehicle detection method based on deep neural network

Info

Publication number
CN110263679B
CN110263679B
Authority
CN
China
Prior art keywords
vehicle
alpha
output
detection
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910476604.6A
Other languages
Chinese (zh)
Other versions
CN110263679A (en)
Inventor
袁泽剑
罗芳颖
刘芮金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201910476604.6A priority Critical patent/CN110263679B/en
Publication of CN110263679A publication Critical patent/CN110263679A/en
Application granted granted Critical
Publication of CN110263679B publication Critical patent/CN110263679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a fine-grained vehicle detection method based on a deep neural network that accurately detects the specific pose category and contour of a vehicle by defining the output, the detection network, and the training network. Given prior knowledge such as the ground plane and camera calibration information, the detection result can be used to estimate the drivable area, time to collision, and so on, further assisting the driver and safeguarding safe driving. Compared with a general object detection network, the method outputs more information and can meet different application requirements: it outputs the pose category and contour position of the vehicle, which helps to judge more accurately the position and heading of the vehicle on the road. The method also places low demands on the data-acquisition sensor, which benefits production and use: all computation is performed on ordinary RGB images, no depth sensor, radar, or similar equipment is needed, and a single ordinary camera suffices, at low cost.

Description

Fine-grained vehicle detection method based on deep neural network
[ technical field ]
The invention relates to a fine-grained vehicle detection method based on a deep neural network.
[ background of the invention ]
Vehicle detection is an important task in autonomous driving and driver-assistance systems; it can be used to compute collision distance and time to collision and to safeguard driving safety. A general object detection task yields only a coarse rectangular-box result: the rectangular box cannot distinguish the position of each face of the vehicle, cannot support accurate analysis of the passable area beside the vehicle, and is insensitive to changes in vehicle pose. Fine-grained vehicle detection therefore requires detecting the exact contour of the vehicle and distinguishing its sides from its front and rear.
Two approaches are mainly used for contour detection. One performs instance segmentation in ordinary RGB images based on segmentation proposals or per-pixel classification; the other captures images with depth information and performs object detection in RGB-D images. Although instance segmentation can recover the vehicle contour, it cannot distinguish the different faces of the vehicle and is slow to compute. 3D detection methods can obtain the vehicle position accurately, but require a 3D sensor to acquire depth information, which makes data acquisition expensive.
With continuing algorithmic improvements, neural networks that complete object detection in a single stage approach or meet the requirements of real-time detection. Completing vehicle contour detection directly within this framework avoids additional computation or acquisition cost and suits autonomous driving and driver-assistance systems.
[ summary of the invention ]
The invention aims to overcome the defects of the prior art and provide a fine-grained vehicle detection method based on a deep neural network.
To achieve this purpose, the invention adopts the following technical scheme:
a fine-grained vehicle detection method based on a deep neural network comprises the following steps:
step 1: defining an output
Given a rectangle (v, x, y, w, h), where v indicates positive and negative samples, v ∈ {0, 1}, 0 indicates background, and 1 indicates vehicle; x, y, w and h represent the position, width, and height of the rectangular box; on this basis, the output is extended with pose subclass codes and control points;
step 2: the detection network
Let (w_f, h_f, c_f) denote the width, height, and number of channels of the feature layer at scale f; if V, A, P are the numbers of categories v, a, p, respectively, then convolving a feature layer at scale f produces an output of dimension (w_f, h_f, B_f × (V + A + P + 4 + 3)); the detection result contains the information (v, a, p, x, y, w, h, α, β, γ), where B_f is the number of default boxes generated at each location;
in the detection process, the detector predicts a conditional probability at each node of the hierarchy, and the conditional probabilities from the root node down to a node are multiplied to obtain the joint probability; if the joint probability at a node falls below a chosen threshold, the downward traversal stops; the final category and the geometric shape of the vehicle are then predicted;
step 3: training the network
Let $x_{ij}^{d} \in \{0, 1\}$ be an indicator function of whether the i-th default box matches the j-th ground-truth box of category d; after matching against the ground-truth labels, N matched default boxes are obtained; the overall loss function is the sum of the classification and localization losses:

$$L = \frac{1}{N}\left(L_{cls} + L_{loc}\right)$$
The invention is further improved as follows:
the specific method of step 1 is as follows:
1-1) attitude coding
Two subclasses are used to encode the 9 imaged 2D vehicle poses p ∈ {p_1, ..., p_9}; the 9 poses are the 9 leaf nodes; the first subclass a ∈ {R, F, N} indicates which faces of the vehicle are visible, where R indicates the rear is visible, F indicates the front is visible, and N indicates that neither the front nor the rear is visible and only a side is visible; the other subclass is s, which represents the spatial configuration and determines whether the side is to the left or right of the front or rear; for a = N, s denotes the direction of the side; s ∈ {l, r, n}, where l denotes the left side, r denotes the right side, and n denotes that the target vehicle is directly ahead and only one rectangular face is visible; 9 different poses can be encoded from the values of a and s;
1-2) control points
3 virtual control points are defined on the basis of the rectangular box (x, y, w, h) to form the contour boundary of each visible face of the vehicle; α denotes the position of the boundary between two faces, and β, γ define the position of the upper base of the trapezoid; if s = l, then β, γ are defined on the leftmost border; across the 9 2D poses, black dots indicate the control points required in each pose: when s = n no control points are required, and when a = N and s = l or s = r only the two control points β and γ are required;
the output is defined as (v, a, s, x, y, w, h, α, β, γ), the output result of the third layer, i.e., the result of the leaf node, can be directly denoted by p, and thus, the output can also be defined as (v, a, p, x, y, w, h, α, β, γ);
1-3) hierarchical Structure
A hierarchical output structure is adopted, and the detection result is output in 3 layers: the first layer outputs whether the target is a vehicle, namely category v; the second layer outputs the visible-face information of the vehicle, namely category a; and the third layer outputs the precise pose category p.
The specific method of step 3 is as follows:
3-1) network classification
The loss function of the classification task is as follows:

$$L_{cls} = -\sum_{i \in Pos} \sum_{d} x_{ij}^{d} \log\left(\hat{c}_{i}^{d}\right) \;-\; \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right)$$

where $\hat{c}_{i}^{d}$ denotes the confidence of category d after softmax, computed as:

$$\hat{c}_{i}^{d} = \frac{\exp\left(c_{i}^{d}\right)}{\sum_{d'} \exp\left(c_{i}^{d'}\right)}$$
3-2) control Point regression
Let (α_x, α_y) denote the coordinates of control point α; control points β and γ are defined similarly. Owing to the geometric constraint that α, β and γ all lie on the boundary of the rectangular box, the only values that need to be regressed are the position of the rectangular box together with α_x, β_y and γ_y. The deviations relative to the default box are defined as:

$$\hat{\alpha}_{x} = \frac{\alpha_{x}^{g} - cx}{w}, \qquad \hat{\beta}_{y} = \frac{\beta_{y}^{g} - cy}{h}, \qquad \hat{\gamma}_{y} = \frac{\gamma_{y}^{g} - cy}{h}$$

where cx, cy denote the center coordinates of the default box; w, h denote the width and height of the default box; $\alpha_{x}^{g}$ denotes the ground-truth x coordinate of control point α and $\alpha_{x}$ its predicted value; $\beta_{y}^{g}$, $\beta_{y}$, $\gamma_{y}^{g}$ and $\gamma_{y}$ are defined analogously;
the loss of the localization task is as follows:

$$L_{loc} = L_{box} + \sum_{i \in Pos}\; \sum_{t \in \{\alpha_{x},\, \beta_{y},\, \gamma_{y}\}} \bar{x}_{i}^{t} \, L\!\left(t_{i} - \hat{t}_{i}\right)$$

where $L_{box}$ denotes the rectangular-box regression loss used in object detection; L denotes the robust loss function smooth L1; and $\bar{x}_{i}^{t}$ is an indicator function specifying whether the i-th default box contributes to coordinate t: when the ground-truth pose matched to a default box does not contain the control point α, that box does not contribute to the regression of α_x, and likewise for β_y and γ_y.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a deep neural network method for detecting fine-grained vehicles, which can accurately detect the specific attitude category and contour of the vehicles. When the priori knowledge such as the ground plane, camera calibration information and the like is given, the detection result can be used for estimating a travelable area, collision time and the like, and further assisting and guaranteeing the safe driving of a driver. The invention has three advantages:
compared with a common target detection network, the method can output more information and can meet different application requirements. The invention outputs the attitude category and the outline position information of the vehicle, and the information is beneficial to more accurately judging the position and the driving direction of the vehicle in the road.
Secondly, the calculated amount is small, and the time efficiency is high. The method provided by the invention is expanded from a frame for completing target detection in one step which is close to or meets the real-time detection requirement, compared with the target detection method, the method provided by the invention does not generate additional characteristics, does not increase the number of candidate windows, only expands the output channel of the detector, hardly increases the calculated amount, and has the same detection efficiency as the original target detection algorithm.
Thirdly, the requirement on a sensor for collecting data is low, and the production and the use are facilitated. The calculation of the invention is completed in the common RGB image, no equipment such as a depth sensor or a radar is needed, only one common camera is needed to meet the requirement, and the cost is low.
[ description of the drawings ]
FIG. 1 is an overall network framework;
FIG. 2 is an example of a vehicle contour detection effect;
FIG. 3 is a pose encoding and control point definition;
FIG. 4 is a detection network structure;
FIG. 5 is a demonstration of control point regression;
FIG. 6 shows a part of the actual test results.
[ detailed description ]
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments, and are not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
In the context of the present disclosure, when a layer/element is referred to as being "on" another layer/element, it can be directly on the other layer/element or intervening layers/elements may be present. In addition, if a layer/element is "on" another layer/element in one orientation, then that layer/element may be "under" the other layer/element when the orientation is reversed.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to fig. 2, depending on the characteristics of images taken by a forward-facing camera, each visible face of the vehicle can be located with a rectangle or a trapezoid. Compared with the original rectangular box (dashed box), fig. 2 shows that this representation accurately depicts the front, rear and sides of the vehicle. Since the combinations of visible faces of a vehicle are limited, the present invention proposes a 2D vehicle pose encoding strategy to represent this output.
Definition of output
The output representation proposed by the invention is expanded on the basis of a rectangular frame. Given a rectangle (v, x, y, w, h), where v indicates positive and negative samples, v ∈ {0,1}, 0 indicates background, and 1 indicates vehicle. And x, y, w and h represent the position and width and height of the rectangular frame, and on the basis, the output is expanded from two parts of posture subclass coding and control points.
① Pose coding
The invention uses two subclasses to encode the 9 imaged 2D vehicle poses p ∈ {p_1, ..., p_9}; as shown in fig. 3(a), the 9 poses are the 9 leaf nodes of the tree. The first subclass a ∈ {R, F, N} indicates which faces of the vehicle are visible, where R indicates the rear is visible, F indicates the front is visible, and N indicates that neither the front nor the rear is visible and only a side is visible. The other subclass is s, which represents the spatial configuration and determines whether the side is to the left or right of the front (or rear). For a = N, s denotes the direction of the side. s ∈ {l, r, n}, where l denotes the left side, r denotes the right side, and n denotes that the target vehicle is almost directly ahead and only one rectangular face is visible. The 9 different poses can be encoded from the values of a and s.
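For illustration, the short sketch below enumerates the nine (a, s) combinations and maps each to a leaf label p1..p9. The concrete index assignment is an assumption; the patent only fixes the set of combinations and the tree of fig. 3(a).

```python
from itertools import product

# Pose sub-classes as defined in the text:
#   a in {R, F, N}: rear visible, front visible, or neither (only a side visible)
#   s in {l, r, n}: side to the left, side to the right, or target directly ahead
FACES = ("R", "F", "N")
SIDES = ("l", "r", "n")

# The patent states that the 9 (a, s) combinations correspond to the 9 leaf
# nodes p1..p9 of Fig. 3(a); the concrete index assignment below is assumed.
POSE_INDEX = {pair: i + 1 for i, pair in enumerate(product(FACES, SIDES))}

def encode_pose(a: str, s: str) -> str:
    """Return the leaf-node pose label p1..p9 for the sub-class pair (a, s)."""
    return f"p{POSE_INDEX[(a, s)]}"

if __name__ == "__main__":
    print(encode_pose("R", "r"))  # rear and right side visible
    print(encode_pose("N", "l"))  # only the left side visible
```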
② Control points
The invention defines 3 virtual control points on the basis of the rectangular box (x, y, w, h) to form the contour boundary of each visible face of the vehicle. α denotes the position of the boundary between the two faces, and β, γ define the position of the upper base of the trapezoid. The definitions of α, β, γ in the a = R, s = r pose are given in fig. 3(b). If s = l, then β, γ are defined on the leftmost boundary. Not all control points are required for each of the 9 2D poses; in fig. 3(a), the required control points are indicated by black dots for each pose: when s = n no control point is required, and when a = N and s = l or s = r only the two control points β and γ are required.
Finally, the output is defined as (v, a, s, x, y, w, h, α, β, γ), and if the result is output according to the tree structure shown in fig. 3(a), the output result of the third layer, i.e., the result of the leaf node, can be directly denoted by p, and thus, the output can also be defined as (v, a, p, x, y, w, h, α, β, γ).
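To make the geometric meaning of this output tuple concrete, the following sketch decodes (x, y, w, h, α, β, γ) into the two visible faces for the pose a = R, s = r shown in fig. 3(b). The vertex assignment used here (α splitting the box vertically into a rear rectangle and a side trapezoid, β and γ marking the upper base of that trapezoid on the rightmost edge) is our reading of the figure description and should be treated as an assumption.

```python
def decode_contour(x, y, w, h, alpha_x, beta_y, gamma_y):
    """Decode control points into the two visible faces for the pose a = R, s = r.

    Assumed geometry (our reading of Fig. 3(b), not stated verbatim in the text):
      - alpha_x: x coordinate of the vertical boundary between rear face and side face;
      - beta_y, gamma_y: y coordinates of the upper base of the side-face trapezoid,
        located on the rightmost edge of the bounding box.
    Returns (rear_face, side_face) as lists of (x, y) vertices.
    """
    x2, y2 = x + w, y + h
    rear_face = [(x, y), (alpha_x, y), (alpha_x, y2), (x, y2)]                # rectangle
    side_face = [(alpha_x, y), (x2, beta_y), (x2, gamma_y), (alpha_x, y2)]    # trapezoid
    return rear_face, side_face

if __name__ == "__main__":
    rear, side = decode_contour(x=100, y=50, w=120, h=80,
                                alpha_x=170, beta_y=60, gamma_y=120)
    print("rear face:", rear)
    print("side face:", side)
```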
③ Use of the hierarchical structure
Computing the confidences of the 9 poses and the background directly with a flat structure is also feasible, but the tree structure is more advantageous when refining the classification. For example, if the detector detects the vehicle and its front, i.e. determines that a = F, but cannot further determine the specific 2D pose, the tree-structured detector still outputs a high-confidence rectangular box while not outputting a contour from the control points, which preserves the correctness of the output.
Detecting a network
Let (w_f, h_f, c_f) denote the width, height, and number of channels of the feature layer at scale f, and let V, A, P be the numbers of categories v, a, p, respectively. Convolving a feature layer at scale f then produces an output of dimension (w_f, h_f, B_f × (V + A + P + 4 + 3)); the detection result contains the information (v, a, p, x, y, w, h, α, β, γ), where B_f is the number of default boxes generated at each location, as shown schematically in fig. 4.
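As a quick check of this dimensionality, the helper below evaluates B_f × (V + A + P + 4 + 3) for one feature layer, using the category counts implied by the text (V = 2, A = 3, P = 9) as illustrative defaults.

```python
def head_output_shape(w_f: int, h_f: int, b_f: int,
                      num_v: int = 2, num_a: int = 3, num_p: int = 9):
    """Output shape of the convolutional detection head at one feature scale.

    For every default box the head predicts V + A + P class scores,
    4 rectangle offsets (x, y, w, h) and 3 control-point offsets (alpha, beta, gamma).
    """
    channels_per_box = num_v + num_a + num_p + 4 + 3
    return (w_f, h_f, b_f * channels_per_box)

if __name__ == "__main__":
    # e.g. a 19x19 feature map with 6 default boxes per location
    print(head_output_shape(19, 19, 6))  # -> (19, 19, 126)
```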
In the detection process, the detector predicts a conditional probability at each node of the hierarchy, and multiplying the conditional probabilities from the root node down to a node gives the joint probability. If the joint probability at a node falls below a chosen threshold, the downward traversal stops; the final category and the geometric shape of the vehicle are then predicted.
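A minimal sketch of this decision rule, with an illustrative 0.5 threshold and dictionary layout; it is not the patent's implementation, only the traversal logic described above.

```python
def hierarchical_predict(p_vehicle, p_a_given_vehicle, p_p_given_a, threshold=0.5):
    """Traverse the 3-level hierarchy v -> a -> p, multiplying conditional
    probabilities into a joint probability and stopping as soon as it falls
    below the threshold. Returns (category, joint probability).

    p_vehicle:         P(v = vehicle)
    p_a_given_vehicle: dict a -> P(a | vehicle), a in {"R", "F", "N"}
    p_p_given_a:       dict a -> dict p -> P(p | a)
    """
    if p_vehicle < threshold:
        return ("background", p_vehicle)

    best_a = max(p_a_given_vehicle, key=p_a_given_vehicle.get)
    joint_a = p_vehicle * p_a_given_vehicle[best_a]
    if joint_a < threshold:
        return ("vehicle", p_vehicle)      # stop at level 1: box only, no contour

    best_p = max(p_p_given_a[best_a], key=p_p_given_a[best_a].get)
    joint_p = joint_a * p_p_given_a[best_a][best_p]
    if joint_p < threshold:
        return (best_a, joint_a)           # stop at level 2: visible faces known

    return (best_p, joint_p)               # level 3: exact 2D pose

if __name__ == "__main__":
    print(hierarchical_predict(
        0.95,
        {"R": 0.7, "F": 0.2, "N": 0.1},
        {"R": {"p1": 0.8, "p2": 0.1, "p3": 0.1},
         "F": {"p4": 0.4, "p5": 0.3, "p6": 0.3},
         "N": {"p7": 0.5, "p8": 0.3, "p9": 0.2}}))
```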
Network training
Let $x_{ij}^{d} \in \{0, 1\}$ be an indicator function of whether the i-th default box matches the j-th ground-truth box of category d. After matching against the ground-truth labels, N matched default boxes are obtained. The overall loss function is the sum of the classification and localization losses:

$$L = \frac{1}{N}\left(L_{cls} + L_{loc}\right)$$
① Network classification
The loss function of the classification task is as follows:

$$L_{cls} = -\sum_{i \in Pos} \sum_{d} x_{ij}^{d} \log\left(\hat{c}_{i}^{d}\right) \;-\; \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right)$$

where $\hat{c}_{i}^{d}$ denotes the confidence of category d after softmax, computed as:

$$\hat{c}_{i}^{d} = \frac{\exp\left(c_{i}^{d}\right)}{\sum_{d'} \exp\left(c_{i}^{d'}\right)}$$
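The NumPy sketch below implements the classification term as reconstructed above: per-box softmax confidences, cross-entropy over matched (positive) boxes, and a background term over negatives. Since the patent does not spell out how the three category levels v, a, p share this loss, the sketch treats a single generic category axis.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; logits has shape (num_boxes, num_classes)."""
    z = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classification_loss(logits, labels, positive_mask):
    """Cross-entropy over matched (positive) boxes plus a background term over negatives.

    logits:        (N, C) raw class scores per default box, class 0 = background
    labels:        (N,)   matched ground-truth class index per default box
    positive_mask: (N,)   boolean, True where the box matched a ground-truth box
    """
    conf = softmax(logits)
    eps = 1e-12
    pos = positive_mask
    neg = ~positive_mask
    loss_pos = -np.log(conf[pos, labels[pos]] + eps).sum()
    loss_neg = -np.log(conf[neg, 0] + eps).sum()
    return loss_pos + loss_neg

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 13))          # 8 default boxes, 13 classes
    labels = rng.integers(1, 13, size=8)
    pos = np.zeros(8, dtype=bool)
    pos[:3] = True
    print(classification_loss(logits, labels, pos))
```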
② control point regression
Let (α_x, α_y) denote the coordinates of control point α; control points β and γ are defined similarly. Owing to the geometric constraint that α, β and γ all lie on the boundary of the rectangular box, the only values that need to be regressed are the position of the rectangular box together with α_x, β_y and γ_y. The relationship between the ground-truth box, the default box and the control points is shown in fig. 5. The deviations relative to the default box are defined as:

$$\hat{\alpha}_{x} = \frac{\alpha_{x}^{g} - cx}{w}, \qquad \hat{\beta}_{y} = \frac{\beta_{y}^{g} - cy}{h}, \qquad \hat{\gamma}_{y} = \frac{\gamma_{y}^{g} - cy}{h}$$

where cx, cy denote the center coordinates of the default box; w, h denote the width and height of the default box; $\alpha_{x}^{g}$ denotes the ground-truth x coordinate of control point α and $\alpha_{x}$ its predicted value; $\beta_{y}^{g}$, $\beta_{y}$, $\gamma_{y}^{g}$ and $\gamma_{y}$ are defined analogously.
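A small sketch of encoding and decoding the control-point regression targets relative to a default box, following the SSD-style normalization assumed in the reconstruction above; the patent gives the exact formulas only as an image, so treat this as an approximation.

```python
def encode_control_points(default_box, alpha_x_gt, beta_y_gt, gamma_y_gt):
    """Encode ground-truth control-point coordinates as offsets w.r.t. a default box.

    default_box: (cx, cy, w, h) in center form.
    Returns (t_alpha_x, t_beta_y, t_gamma_y), the regression targets,
    using the SSD-style normalization assumed in the reconstruction above.
    """
    cx, cy, w, h = default_box
    return ((alpha_x_gt - cx) / w,
            (beta_y_gt - cy) / h,
            (gamma_y_gt - cy) / h)

def decode_control_points(default_box, t_alpha_x, t_beta_y, t_gamma_y):
    """Invert the encoding at inference time."""
    cx, cy, w, h = default_box
    return (t_alpha_x * w + cx,
            t_beta_y * h + cy,
            t_gamma_y * h + cy)

if __name__ == "__main__":
    box = (160.0, 90.0, 120.0, 80.0)
    targets = encode_control_points(box, alpha_x_gt=170.0, beta_y_gt=60.0, gamma_y_gt=120.0)
    print(targets)
    print(decode_control_points(box, *targets))  # round-trips to the inputs
```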
The loss of the localization task is as follows:

$$L_{loc} = L_{box} + \sum_{i \in Pos}\; \sum_{t \in \{\alpha_{x},\, \beta_{y},\, \gamma_{y}\}} \bar{x}_{i}^{t} \, L\!\left(t_{i} - \hat{t}_{i}\right)$$

where $L_{box}$ denotes the rectangular-box regression loss used in object detection; L denotes the robust loss function smooth L1; and $\bar{x}_{i}^{t}$ is an indicator function specifying whether the i-th default box contributes to coordinate t: when the ground-truth pose matched to a default box does not contain the control point α, that box does not contribute to the regression of α_x, and likewise for β_y and γ_y.
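The sketch below covers the control-point part of this localization loss: smooth L1 applied to the offset residuals, with a per-coordinate 0/1 mask implementing the indicator function, so boxes whose matched pose lacks a control point contribute nothing for that coordinate. The standard box-regression term L_box is omitted for brevity.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1 (Huber) loss with delta = 1."""
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x * x, absx - 0.5)

def control_point_loss(pred_offsets, target_offsets, contrib_mask):
    """Masked smooth-L1 loss over the control-point coordinates.

    pred_offsets, target_offsets: (N, 3) arrays for (alpha_x, beta_y, gamma_y)
    contrib_mask:                 (N, 3) 0/1 indicator; a column is 0 when the matched
                                  ground-truth pose has no such control point
    """
    per_coord = smooth_l1(pred_offsets - target_offsets)
    return float((contrib_mask * per_coord).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pred = rng.normal(size=(4, 3))
    target = rng.normal(size=(4, 3))
    mask = np.array([[1, 1, 1],    # pose with all three control points
                     [0, 1, 1],    # a = N: only beta and gamma
                     [0, 0, 0],    # s = n: no control points
                     [1, 1, 1]])
    print(control_point_loss(pred, target, mask))
```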
Network framework:
The network framework implementing the overall detection is shown in fig. 1. The input image passes through a feature extraction network to obtain different feature layers, and detection is performed on the different feature maps to obtain the required classification scores, rectangular boxes, and control-point regression results. The detector is shown in detail in fig. 4.
The parameter settings of the feature extraction network are shown in table 1, and the default step size stride is 1.
Table 1: Feature extraction network parameter settings
(The table entries are provided as an image in the original publication.)
Implementation details:
before training, an appropriate default box was generated by the K-Means clustering method. The pre-trained model on ImageNet was then fine-tuned on the collected dataset. The Adam gradient descent algorithm was used for training, with the size of the mini-batch set to 2, the initial learning rate set to 0.001 and descending at a rate of 0.94 of decay factor in each training period. On a single NVIDIA GeForce 1080Ti GPU, the entire training process based on tensrflow takes approximately 12 hours.
All objects in the target dataset are clustered into L categories, where L equals the number of feature layers used for detection. During clustering, the target scale of each cluster center is set to match the scale of the targets handled by each feature layer. Thus, for each feature layer k, a scale range $[s_k^{\min}, s_k^{\max}]$ is obtained for the objects belonging to that cluster; according to this scale range, let $s_k$ denote the anchor scale of the k-th feature layer.
Five aspect ratios are set, r ∈ {1, 2, 3, 1/2, 1/3}. The anchor sizes for each layer are as follows:

$$w_{k}^{r} = W \cdot s_{k}\sqrt{r}, \qquad h_{k}^{r} = H \cdot \frac{s_{k}}{\sqrt{r}}$$

where $w_{k}^{r}$ denotes the width of the anchor with aspect ratio r on the k-th feature layer; $h_{k}^{r}$ denotes the height of that anchor; W denotes the width of the input image; and H denotes the height of the input image.
When the aspect ratio r = 1, an anchor at one additional scale is also computed:

$$s_{k}' = \sqrt{s_{k}\, s_{k+1}}$$
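Applying the anchor equations as reconstructed above, the sketch below produces the anchor sizes for one feature layer; since those equations were recovered from the surrounding text rather than from the original images, this is an SSD-style approximation rather than the patent's exact parameterization.

```python
import math

def layer_anchor_sizes(s_k, s_k_next, img_w, img_h,
                       ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """Anchor (width, height) pairs in pixels for one feature layer.

    s_k, s_k_next: scale of this layer and of the next layer (relative to image size)
    img_w, img_h:  width and height of the input image
    Follows the SSD-style equations reconstructed above, which are an assumption.
    """
    anchors = [(img_w * s_k * math.sqrt(r), img_h * s_k / math.sqrt(r)) for r in ratios]
    # extra anchor at an intermediate scale for the aspect ratio r = 1
    s_prime = math.sqrt(s_k * s_k_next)
    anchors.append((img_w * s_prime, img_h * s_prime))
    return anchors

if __name__ == "__main__":
    for w, h in layer_anchor_sizes(0.2, 0.35, img_w=512, img_h=512):
        print(f"{w:.1f} x {h:.1f}")
```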
the effect of the network implementation is shown in fig. 6.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (2)

1. A fine-grained vehicle detection method based on a deep neural network is characterized by comprising the following steps:
step 1: defining an output
Given a rectangle (v, x, y, w, h), where v indicates positive and negative samples, v ∈ {0, 1}, 0 indicates background, and 1 indicates vehicle; x, y, w and h represent the position, width, and height of the rectangular box; on this basis, the output is extended with two parts, pose subclass coding and control points, as follows:
1-1) attitude coding
Two subclasses are used to encode the 9 imaged 2D vehicle poses p ∈ {p_1, ..., p_9}; the 9 poses are the 9 leaf nodes; the first subclass a ∈ {R, F, N} indicates which faces of the vehicle are visible, where R indicates the rear is visible, F indicates the front is visible, and N indicates that neither the front nor the rear is visible and only a side is visible; the other subclass is s, which represents the spatial configuration and determines whether the side is to the left or right of the front or rear; for a = N, s denotes the direction of the side; s ∈ {l, r, n}, where l denotes the left side, r denotes the right side, and n denotes that the target vehicle is directly ahead and only one rectangular face is visible; 9 different poses can be encoded from the values of a and s;
1-2) control points
3 virtual control points are defined on the basis of the rectangular box (x, y, w, h) to form the contour boundary of each visible face of the vehicle; α denotes the position of the boundary between two faces, and β, γ define the position of the upper base of the trapezoid; if s = l, then β, γ are defined on the leftmost border; among the 9 2D poses, no control point is needed when s = n, and only the two control points β and γ are needed when a = N and s = l or s = r;
the output is defined as (v, a, s, x, y, w, h, α, β, γ), the output result of the third layer, i.e., the result of the leaf node, can be directly denoted by p, and thus, the output can also be defined as (v, a, p, x, y, w, h, α, β, γ);
1-3) hierarchical Structure
A hierarchical output structure is adopted, and the detection result is output in 3 layers: the first layer outputs whether the target is a vehicle, namely category v; the second layer outputs the visible-face information of the vehicle, namely category a; and the third layer outputs the precise pose category p;
step 2: the detection network
Let (w_f, h_f, c_f) denote the width, height, and number of channels of the feature layer at scale f; if V, A, P are the numbers of categories v, a, p, respectively, then convolving a feature layer at scale f produces an output of dimension (w_f, h_f, B_f × (V + A + P + 4 + 3)); the detection result contains the information (v, a, p, x, y, w, h, α, β, γ), where B_f is the number of default boxes generated at each location;
in the detection process, the detector predicts a conditional probability at each node of the hierarchy, and the conditional probabilities from the root node down to a node are multiplied to obtain the joint probability; if the joint probability at a node falls below a chosen threshold, the downward traversal stops; the final category and the geometric shape of the vehicle are then predicted;
step 3: training the network
Let $x_{ij}^{d} \in \{0, 1\}$ be an indicator function of whether the i-th default box matches the j-th ground-truth box of category d; after matching against the ground-truth labels, N matched default boxes are obtained; the overall loss function is the sum of the classification and localization losses:

$$L = \frac{1}{N}\left(L_{cls} + L_{loc}\right)$$
2. the fine-grained vehicle detection method based on the deep neural network as claimed in claim 1, wherein the specific method in step 3 is as follows:
3-1) network classification
The loss function of the classification task is as follows:

$$L_{cls} = -\sum_{i \in Pos} \sum_{d} x_{ij}^{d} \log\left(\hat{c}_{i}^{d}\right) \;-\; \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right)$$

where $\hat{c}_{i}^{d}$ denotes the confidence of category d after softmax, computed as:

$$\hat{c}_{i}^{d} = \frac{\exp\left(c_{i}^{d}\right)}{\sum_{d'} \exp\left(c_{i}^{d'}\right)}$$
3-2) control Point regression
Let (α_x, α_y) denote the coordinates of control point α; control points β and γ are defined similarly. Owing to the geometric constraint that α, β and γ all lie on the boundary of the rectangular box, the only values that need to be regressed are the position of the rectangular box together with α_x, β_y and γ_y. The deviations relative to the default box are defined as:

$$\hat{\alpha}_{x} = \frac{\alpha_{x}^{g} - cx}{w}, \qquad \hat{\beta}_{y} = \frac{\beta_{y}^{g} - cy}{h}, \qquad \hat{\gamma}_{y} = \frac{\gamma_{y}^{g} - cy}{h}$$

where cx, cy denote the center coordinates of the default box; w, h denote the width and height of the default box; $\alpha_{x}^{g}$ denotes the ground-truth x coordinate of control point α and $\alpha_{x}$ its predicted value; $\beta_{y}^{g}$, $\beta_{y}$, $\gamma_{y}^{g}$ and $\gamma_{y}$ are defined analogously;
the loss of the localization task is as follows:

$$L_{loc} = L_{box} + \sum_{i \in Pos}\; \sum_{t \in \{\alpha_{x},\, \beta_{y},\, \gamma_{y}\}} \bar{x}_{i}^{t} \, L\!\left(t_{i} - \hat{t}_{i}\right)$$

where $L_{box}$ denotes the rectangular-box regression loss used in object detection; L denotes the robust loss function smooth L1; and $\bar{x}_{i}^{t}$ is an indicator function specifying whether the i-th default box contributes to coordinate t: when the ground-truth pose matched to a default box does not contain the control point α, that box does not contribute to the regression of α_x, and likewise for β_y and γ_y.
CN201910476604.6A 2019-06-03 2019-06-03 Fine-grained vehicle detection method based on deep neural network Active CN110263679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910476604.6A CN110263679B (en) 2019-06-03 2019-06-03 Fine-grained vehicle detection method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910476604.6A CN110263679B (en) 2019-06-03 2019-06-03 Fine-grained vehicle detection method based on deep neural network

Publications (2)

Publication Number Publication Date
CN110263679A CN110263679A (en) 2019-09-20
CN110263679B true CN110263679B (en) 2021-08-13

Family

ID=67916462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910476604.6A Active CN110263679B (en) 2019-06-03 2019-06-03 Fine-grained vehicle detection method based on deep neural network

Country Status (1)

Country Link
CN (1) CN110263679B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814847B (en) * 2020-06-19 2024-03-26 浙江工业大学 Clustering method based on three-dimensional contour of vehicle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066953A (en) * 2017-03-22 2017-08-18 北京邮电大学 It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN107590440A (en) * 2017-08-21 2018-01-16 南京邮电大学 The method and system of Human detection under a kind of Intelligent household scene
CN109800631A (en) * 2018-12-07 2019-05-24 天津大学 Fluorescence-encoded micro-beads image detecting method based on masked areas convolutional neural networks

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6839466B2 (en) * 1999-10-04 2005-01-04 Xerox Corporation Detecting overlapping images in an automatic image segmentation device with the presence of severe bleeding
GB2409031A (en) * 2003-12-11 2005-06-15 Sony Uk Ltd Face detection
US9547808B2 (en) * 2013-07-17 2017-01-17 Emotient, Inc. Head-pose invariant recognition of facial attributes
CN108596053B (en) * 2018-04-09 2020-06-02 华中科技大学 Vehicle detection method and system based on SSD and vehicle posture classification
CN109343041B (en) * 2018-09-11 2023-02-14 昆山星际舟智能科技有限公司 Monocular distance measuring method for advanced intelligent auxiliary driving
CN109829400B (en) * 2019-01-18 2023-06-30 青岛大学 Rapid vehicle detection method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066953A (en) * 2017-03-22 2017-08-18 北京邮电大学 It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN107590440A (en) * 2017-08-21 2018-01-16 南京邮电大学 The method and system of Human detection under a kind of Intelligent household scene
CN109800631A (en) * 2018-12-07 2019-05-24 天津大学 Fluorescence-encoded micro-beads image detecting method based on masked areas convolutional neural networks

Also Published As

Publication number Publication date
CN110263679A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN109636905B (en) Environment semantic mapping method based on deep convolutional neural network
CN109597087B (en) Point cloud data-based 3D target detection method
CN111310574B (en) Vehicle-mounted visual real-time multi-target multi-task joint sensing method and device
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
Gosala et al. Bird’s-eye-view panoptic segmentation using monocular frontal view images
CN110222626B (en) Unmanned scene point cloud target labeling method based on deep learning algorithm
CN113128348B (en) Laser radar target detection method and system integrating semantic information
CN110728200A (en) Real-time pedestrian detection method and system based on deep learning
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN106156748A (en) Traffic scene participant's recognition methods based on vehicle-mounted binocular camera
CN106203342A (en) Target identification method based on multi-angle local feature coupling
CN110189339A (en) The active profile of depth map auxiliary scratches drawing method and system
CN110516633B (en) Lane line detection method and system based on deep learning
CN101996410A (en) Method and system of detecting moving object under dynamic background
CN111292366B (en) Visual driving ranging algorithm based on deep learning and edge calculation
CN113267761B (en) Laser radar target detection and identification method, system and computer readable storage medium
CN106023249A (en) Moving object detection method based on local binary similarity pattern
CN115424017B (en) Building inner and outer contour segmentation method, device and storage medium
CN108021857B (en) Building detection method based on unmanned aerial vehicle aerial image sequence depth recovery
CN115128628A (en) Road grid map construction method based on laser SLAM and monocular vision
CN114372523A (en) Binocular matching uncertainty estimation method based on evidence deep learning
CN110263679B (en) Fine-grained vehicle detection method based on deep neural network
CN117274749B (en) Fused 3D target detection method based on 4D millimeter wave radar and image
CN113221739B (en) Monocular vision-based vehicle distance measuring method
CN110598711A (en) Target segmentation method combined with classification task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant