CN112801027A - Vehicle target detection method based on event camera - Google Patents

Vehicle target detection method based on event camera

Info

Publication number
CN112801027A
CN112801027A (application CN202110182127.XA)
Authority
CN
China
Prior art keywords
dvs
aps
image
event
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110182127.XA
Other languages
Chinese (zh)
Inventor
孙艳丰
刘萌允
齐娜
施云惠
尹宝才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110182127.XA priority Critical patent/CN112801027A/en
Publication of CN112801027A publication Critical patent/CN112801027A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vehicle target detection method based on an event camera, developed using deep learning technology. An event camera can generate frames and event data asynchronously, which greatly helps to overcome motion blur and extreme lighting conditions. First, the events are converted into an event image; then the frame image and the event image are fed into a fusion convolutional neural network simultaneously, with convolutional layers added to extract features from the event image. In the middle layers of the network, the features of the two inputs are fused by a fusion module. Finally, the loss function is redesigned to improve the effectiveness of vehicle target detection. The method compensates for the shortcomings of using only frame images for target detection in extreme scenes: by fusing event images with frame images in the fusion convolutional neural network, vehicle target detection in extreme scenes is enhanced.

Description

Vehicle target detection method based on event camera
Technical Field
The invention discloses a vehicle target detection method for extreme scenes, based on an event camera and implemented with deep learning technology. It belongs to the field of computer vision and particularly relates to technologies such as deep learning and target detection.
Background
With the rapid development of the automobile industry, autonomous driving technology has received extensive attention from academia and industry in recent years. Vehicle target detection is a challenging task in autonomous driving and an important application in autonomous driving and intelligent transportation systems, where it plays a key role. The purpose of vehicle target detection is to accurately locate the other vehicles in the surrounding environment and thus avoid accidents with them.
A great deal of current target detection research uses deep neural networks to enhance detection systems. These studies generally use a frame-based camera, the Active Pixel Sensor (APS). As a result, most of the detected objects are stationary or slowly moving, and the lighting conditions are favorable. In practice, vehicles encounter a variety of complex and extreme scenarios. Under extreme lighting and fast motion, images produced by conventional frame-based cameras suffer from overexposure and blur, which presents a significant challenge to target detection.
Dynamic Vision Sensors (DVS) have the key features of high dynamic range and low latency. These characteristics enable them to capture environmental information and generate images faster than standard cameras. At the same time, they are not affected by motion blur, which complements frame cameras in extreme cases. Furthermore, their low latency and short response time can make autonomous vehicles more responsive. The Dynamic and Active Pixel Vision Sensor (DAVIS) can output regular grayscale frames and asynchronous events through its APS and DVS channels, respectively. Regular grayscale frames provide the main information for target detection, while asynchronous events provide information about fast motion and illumination changes. Following this insight, detection performance can be improved by combining the two kinds of data.
In recent years, deep learning algorithms have achieved great success and are widely used in image classification and target detection. Deep neural networks have excellent feature extraction capability and strong learning capability, and can identify target categories and locate target positions in a recognition task. A Convolutional Neural Network (CNN) based on bounding-box regression can directly regress the position and class of a target from an input image without searching for candidate regions. However, this requires that the objects to be discriminated in the image fed into the CNN are sharp, whereas objects in images generated in extreme scenes may be blurred. Using a CNN alone on frame images generated in extreme scenes therefore cannot meet the requirement.
The CNN-based vehicle detection method proposed here fuses the frame and event data output by a DAVIS camera. The event data are reconstructed into an image, the frame image and the event image are fed into a convolutional neural network simultaneously, and the features extracted from the event image are fused with the features extracted from the frame image in the intermediate layers of the network through a fusion module. At the final detection layer, the loss function of the network is redesigned and a loss term is added for the DVS features. The dataset used in the experiments is a self-built vehicle target detection dataset (Dataset of APS and DVS, DAD). A comparison of different input modes shows that the vehicle detection results are clearly improved under different environmental conditions. Meanwhile, compared with other methods, such as networks using a single image input and networks using both kinds of data as input, the method proposed here achieves a significant improvement.
Disclosure of Invention
The invention provides a vehicle target detection method based on an event camera, using deep learning technology. Since an ordinary camera produces motion blur, overexposure, or underexposure in fast-moving and extreme-brightness scenes, the event data generated by an event camera are used to enhance the detection effect. The event camera asynchronously outputs events for changes in brightness, each containing the pixel coordinates, the brightness polarity, and a timestamp, so the events are first converted into images. This is because image-based target detection technology is mature, and detection on events can thus be realized with image detection techniques. The frame image (APS) and the event image (DVS) are fed simultaneously into a fusion convolutional network framework (ADF) for convolution operations, and feature extraction and feature fusion are performed within this framework. In this way, the features of both images are extracted, and the finally extracted features carry effective information from both. Finally, the loss function of the model is modified so that, in addition to the loss terms on the APS, loss terms on the DVS are also added. The overall framework of the method is shown in Figure 1, and the method can be divided into the following four steps: converting the event data into an event image, extracting features with the overall framework of the fusion convolutional neural network, fusing the features through the fusion module, and performing target detection on the extracted features through the detection layer.
(1) Converting event data into an event image
Considering that current target detection algorithms for images are relatively mature, the event data of the DVS channel are converted into an image and then sent into the network together with the APS image for target detection. Each event consists of the pixel abscissa x, the pixel ordinate y, the brightness polarity (+1 for an increase, -1 for a decrease), and a timestamp. According to the changes of pixel coordinates and polarity, the event data within an accumulation time are converted into an event image of the same size as the frame image.
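For illustration only, a single DVS event as described above can be represented by the following minimal Python structure; the field names and the microsecond timestamp unit are assumptions made for this example, not details fixed by the patent.

from typing import NamedTuple

class Event(NamedTuple):
    """One asynchronous DVS event (illustrative field names)."""
    x: int          # pixel abscissa
    y: int          # pixel ordinate
    polarity: int   # +1 for a brightness increase, -1 for a decrease
    timestamp: int  # time stamp, assumed here to be in microseconds

# Example: a brightness increase at pixel (120, 64).
e = Event(x=120, y=64, polarity=+1, timestamp=1_000_250)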
(2) Overall framework for feature extraction
The invention uses Darknet-53 as the basic framework and, in addition to the convolution operations performed on the APS image, adds convolutional layers for extracting features from the DVS image. Because the data of the DVS channel are sparse, fewer convolutional layers are used to extract features at the different resolutions. As in Darknet-53, the DVS channel still uses successive 3 × 3 and 1 × 1 convolutional layers. The specific number of convolutional layers is shown in Table 1.
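For concreteness, the following PyTorch sketch shows one pair of successive 1 × 1 and 3 × 3 convolutional layers of the kind used on the DVS channel; the channel widths, batch normalization, and LeakyReLU activation follow common Darknet-53 practice and are assumptions rather than parameters fixed by the patent.

import torch
import torch.nn as nn

class ConvBNLeaky(nn.Module):
    """Convolution + BatchNorm + LeakyReLU, the basic unit of a Darknet-style backbone."""
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.block(x)

class DVSBlock(nn.Module):
    """One successive 1x1 / 3x3 convolution pair for the sparse DVS channel (illustrative widths)."""
    def __init__(self, channels):
        super().__init__()
        self.reduce = ConvBNLeaky(channels, channels // 2, k=1)
        self.expand = ConvBNLeaky(channels // 2, channels, k=3)

    def forward(self, x):
        return self.expand(self.reduce(x))

# Usage: extract DVS features from a single-channel 416 x 416 event image (size is an example).
stem = ConvBNLeaky(1, 32, k=3)
block = DVSBlock(32)
features = block(stem(torch.randn(1, 1, 416, 416)))  # -> shape (1, 32, 416, 416)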
(3) Fusion module
In the network structure, a fusion module is designed with reference to ResNet. The fusion module extracts the main features of the DVS at different resolutions and then fuses them with APS features of the same size, so as to guide the network to learn more detailed features of both the APS and the DVS simultaneously. The fusion module is shown in Figure 2.
(4) Target detection on the extracted features through the detection layer
The loss function of the network is modified at the detection layer. The loss function for the APS features adopts a cross-entropy loss, including the losses of coordinates, classes, and confidences. On this basis, a cross-entropy loss is also computed on the DVS features. Finally, the detection results of the APS and of the DVS are combined. A result obtained from the APS alone or the DVS alone may still be correct, so taking only the intersection of the two results would lose many correct detections. Taking the union of the two results reduces errors and improves accuracy.
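As a minimal sketch of taking the union of the two detection results, the snippet below keeps every APS detection and adds any DVS detection that does not overlap an APS detection of the same class; the box format, IoU threshold, and de-duplication rule are assumptions for illustration, not the patent's prescribed procedure.

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def merge_detections(aps_dets, dvs_dets, iou_thr=0.5):
    """Union of APS and DVS detections: keep all APS boxes, add non-duplicate DVS boxes."""
    merged = list(aps_dets)
    for box_d, cls_d, score_d in dvs_dets:
        duplicate = any(cls_a == cls_d and iou(box_a, box_d) > iou_thr
                        for box_a, cls_a, _ in aps_dets)
        if not duplicate:
            merged.append((box_d, cls_d, score_d))
    return merged

# Example: a fast-moving car seen only by the DVS channel is kept in the union.
aps = [((10, 10, 50, 40), "car", 0.9)]
dvs = [((10, 10, 52, 41), "car", 0.6), ((80, 30, 120, 60), "car", 0.7)]
print(merge_detections(aps, dvs))  # the shared box once, plus the DVS-only box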
Compared with the prior art, the invention has the following obvious advantages and beneficial effects:
Based on the APS images and DVS data generated by an event camera, the invention adopts convolutional neural network technology to detect vehicles in extreme scenes. Compared with using only traditional APS images, the event data are converted into event images, which are then recognized with mature deep learning techniques. A fusion module is added to the convolutional neural network to perform feature-level fusion of the two kinds of information. Finally, by modifying the loss function, the ability of the network to identify targets when the image suffers from target blur, unsuitable illumination, and similar problems is improved, achieving good results in extreme scenes.
Drawings
FIG. 1 is a block diagram of an overall network architecture;
FIG. 2 is a schematic diagram of a fusion module;
FIG. 3 is a graph of experimental results;
Detailed Description
In light of the above description, a specific implementation flow is as follows, but the scope of protection of this patent is not limited to this implementation flow.
Step 1: event data converted into event image
Based on the generation mechanism of the event, there are three reconstruction methods to convert the event into the frame. They are a fixed event number method, a leaky integrator method, and a fixed time interval method, respectively. In the present invention, it is an object to be able to detect fast moving objects. The event reconstruction is set to a fixed frame length of 10ms using a fixed time interval method. In each time interval, according to the pixel position generated by the event, at the corresponding pixel point generated with polarity, the event with the polarity increased is drawn as a white pixel, the event with the polarity decreased is drawn as a black pixel, and the background color of the image is gray. And finally generating an event image with the same size as the APS image.
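A minimal NumPy sketch of this fixed-time-interval reconstruction is given below; it assumes the events arrive as (x, y, polarity, timestamp) tuples with timestamps in microseconds, and the 260 × 346 frame size in the usage example is illustrative, both being assumptions rather than details fixed by the patent.

import numpy as np

def events_to_image(events, height, width, t_start_us, interval_us=10_000):
    """Accumulate the events of one 10 ms window into a gray-background image.

    events: iterable of (x, y, polarity, timestamp_us) with polarity +1 or -1.
    Returns a uint8 image: gray background (128), white (255) where the
    brightness increased, black (0) where it decreased, same size as the APS frame.
    """
    img = np.full((height, width), 128, dtype=np.uint8)  # gray background
    t_end_us = t_start_us + interval_us
    for x, y, polarity, t in events:
        if t_start_us <= t < t_end_us:
            img[y, x] = 255 if polarity > 0 else 0
    return img

# Usage: one 10 ms event image matching the APS frame size.
demo_events = [(10, 20, +1, 1_000), (11, 20, -1, 2_500)]
event_image = events_to_image(demo_events, height=260, width=346, t_start_us=0)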
Step 2: feature extraction via a network ensemble framework
APS images and DVS images are simultaneously input into a network framework, and features are extracted through respective 3 × 3 and 1 × 1 convolutional layers, except that the number of convolutional layers for extracting the features is different, and the DVS is less than that of the APS. The network (2) predicts the input APS image and also predicts the DVS image. Both APS and DVS images are divided into S × S grids, each grid predicts B bounding boxes, and predicts C classes altogether. Each bbox was introduced into the Gaussian model, predicting 8 coordinate values, μ _ x, ε _ x, μ _ y, ε _ y, μ _ w, ε _ w, μ _ h, ε _ h. A confidence score p is also predicted. So at the last input detection layer of the network is a tensor of 2 × S × B × (C + 9). The three size tensors of the APS channel and the three same size tensors of the DVS channel are fed into the detection layer, respectively.
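The PyTorch sketch below illustrates the layout of that prediction tensor for one channel and one scale (S × S grid, B prior boxes, eight Gaussian coordinate values, one confidence, and C classes); the 256-channel input feature width is an assumed placeholder, not a value taken from the patent.

import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Predicts B * (C + 9) values per grid cell: 8 Gaussian box parameters
    (mean and uncertainty for x, y, w, h), 1 confidence score, and C class scores."""
    def __init__(self, in_channels, num_anchors_b, num_classes_c):
        super().__init__()
        self.b, self.c = num_anchors_b, num_classes_c
        self.pred = nn.Conv2d(in_channels, num_anchors_b * (num_classes_c + 9), kernel_size=1)

    def forward(self, feat):                          # feat: (N, in_channels, S, S)
        out = self.pred(feat)                         # (N, B*(C+9), S, S)
        n, _, s, _ = out.shape
        return out.view(n, self.b, self.c + 9, s, s)  # (N, B, C+9, S, S)

# One of the three scales for one channel (APS or DVS); the APS and DVS channels
# together give the 2 x S x S x B x (C + 9) tensor described above.
head = GaussianHead(in_channels=256, num_anchors_b=3, num_classes_c=1)
aps_out = head(torch.randn(1, 256, 13, 13))
print(aps_out.shape)  # torch.Size([1, 3, 10, 13, 13])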
And step 3: fusion module
Passing APS and DVS through respective convolution layers to obtain characteristic FapsAnd FdvsFeeding into a fusion model, and first FapsAnd FdvsF → U, F ∈ R, U ∈ RM×N×C,U=[u1,u2,…,uC]To obtain a transformation characteristic UapsAnd UdvsWherein u iscIs a feature matrix of size M × N for the C-th channel among the C channels. Briefly, the Tc operation is taken as a convolution operation;
obtaining transformation characteristics UdvsThen, we consider the global information of all channels in the feature, compress this global information into one channel to get the aggregation information zc. Operating Tst (U) by global average poolingdvs) To accomplish this formally expressed as:
Figure BDA0002941730490000041
wherein u isc(i, j) is the (i, j) th value in the feature matrix. In order to utilize the aggregate information z in the compression operationcExcitation operation is carried out, convolution characteristic information of each channel is fused, and a dependency relation s on the channels is obtained, namely:
s=Tex(z,E)=δ(E2σ(E1z))#(2)
where σ denotes a ReLU activation function, δ denotes a sigmoid activation function, E1And E2Two weights. This is achieved using two fully connected layers;
using s to activate switch U through Tscale operationapsObtaining a feature block U':
U′=Tscale(Uaps,s)=Uaps·s#(3)
finally, the DVS feature block is fused with the APS feature to obtain the final fusion feature Faps′:
Figure BDA0002941730490000051
Splicing operation is adopted in specific implementation.
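A minimal PyTorch sketch of this squeeze-excitation-style fusion module is shown below; it assumes that T_c is a 1 × 1 convolution, that the two fully connected layers use a reduction ratio of 16, and that T_fuse is the channel-wise concatenation stated above. These hyperparameters are illustrative and not fixed by the patent.

import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Squeeze the DVS feature into channel statistics, excite them into channel
    weights s, rescale the APS feature (U' = U_aps * s), then concatenate with
    the DVS feature to obtain the fused feature F'_aps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.t_c_aps = nn.Conv2d(channels, channels, kernel_size=1)  # T_c for APS (assumed 1x1 conv)
        self.t_c_dvs = nn.Conv2d(channels, channels, kernel_size=1)  # T_c for DVS (assumed 1x1 conv)
        self.squeeze = nn.AdaptiveAvgPool2d(1)                       # T_sq: global average pooling, Eq. (1)
        self.excite = nn.Sequential(                                 # T_ex: two FC layers, Eq. (2)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, f_aps, f_dvs):
        u_aps, u_dvs = self.t_c_aps(f_aps), self.t_c_dvs(f_dvs)
        z = self.squeeze(u_dvs).flatten(1)            # aggregated information z, Eq. (1)
        s = self.excite(z).view(z.size(0), -1, 1, 1)  # channel dependency s, Eq. (2)
        u_prime = u_aps * s                           # T_scale, Eq. (3)
        return torch.cat([u_prime, u_dvs], dim=1)     # splicing (concatenation), Eq. (4)

# Usage: fuse same-sized APS and DVS feature maps.
fusion = FusionModule(channels=64)
f_out = fusion(torch.randn(2, 64, 52, 52), torch.randn(2, 64, 52, 52))
print(f_out.shape)  # torch.Size([2, 128, 52, 52])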
And 4, step 4: target detection is carried out on the extracted features through a detection layer
The same as APS part, DVS detection result is added in the detection layer, binary cross entropy loss is carried out on the objects and classes detected by DVS, and the negative log likelihood loss function (NLL) of the coordinate frame is as follows:
Figure BDA0002941730490000052
wherein
Figure BDA0002941730490000053
Is the NLL loss of the x coordinate of the DVS. W and H are the number of grids for each width and height, respectively, and K is the prior frame number. The output of the detection layer at the kth prior box of the (i, j) grid is:
Figure BDA0002941730490000054
and
Figure BDA0002941730490000055
Figure BDA0002941730490000056
the coordinates of x are shown as such,
Figure BDA0002941730490000057
representing the uncertainty of the x coordinate.
Figure BDA0002941730490000058
Is the group Truth in x-coordinate, which is calculated from the width and height of the adjusted image in Gaussian yollov 3 and the kth prior box prior. ξ is a fixed value of 10-9.
Figure BDA0002941730490000059
The same as the x coordinate, represents the loss of the remaining coordinates y, w, h.
γ_ijk = ω_scale × δ^obj_ijk   (6)
ω_scale = 2 − w^G × h^G   (7)
ω_scale provides different weights according to the size (w^G, h^G) of the object during training. δ^obj_ijk in (6) is a parameter that is applied in the loss only when the prior box contains the anchor that best fits the current object; its value is 1 or 0, determined by the intersection over union (IOU) between the ground truth and the k-th prior box of grid cell (i, j).
[Equation (8): the confidence (objectness) loss of the DVS channel, a binary cross-entropy over all W × H grid cells and K prior boxes.]
The value of C_ijk depends on whether the bounding box of the grid cell fits the predicted object: if it fits, C_ijk = 1; otherwise, C_ijk = 0. τ^noobj_ijk indicates that the k-th prior box of the grid cell does not fit the target. An additional indicator denotes the correct category, and another indicates that the k-th prior box of the grid cell is not responsible for predicting the target.
The class loss is as follows:
[Equation (9): the class loss of the DVS channel, a binary cross-entropy over the predicted class probabilities P_ij.]
P_ij denotes the probability that the currently detected object belongs to the correct class.
The loss function of the DVS part is:
L_DVS = L^DVS_coord + L^DVS_class + L^DVS_conf   (10)
where L_DVS is the sum of the coordinate loss, class loss, and confidence loss of the DVS channel. L_APS takes the same form as L_DVS, so the overall network loss function is:
L = L_APS + L_DVS   (11)
By adding the loss function of the DVS channel, the model becomes more robust to data from extreme environments, and the accuracy of the algorithm is improved.
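A minimal PyTorch sketch of the Gaussian negative log-likelihood coordinate loss of Eq. (5) and the combined loss of Eq. (11) is given below; it covers a single coordinate and treats the masks, confidence, and class terms in a simplified way, so it illustrates the principle rather than the full Gaussian YOLOv3 formulation used by the patent.

import torch

def gaussian_nll_coord_loss(mu, sigma, target, gamma, xi=1e-9):
    """Eq. (5)-style NLL loss for one coordinate (x, y, w, or h).

    mu, sigma: predicted mean and uncertainty, shape (W, H, K).
    target:    ground-truth coordinate x^G, same shape.
    gamma:     per-box weight gamma_ijk = omega_scale * delta_obj, Eqs. (6)-(7).
    """
    var = sigma.clamp_min(xi) ** 2
    gauss = torch.exp(-(target - mu) ** 2 / (2 * var)) / torch.sqrt(2 * torch.pi * var)
    return -(gamma * torch.log(gauss + xi)).sum()

# Toy example: a 13 x 13 grid with K = 3 prior boxes (vehicle class only).
W = H = 13
K = 3
mu = torch.rand(W, H, K)
sigma = torch.rand(W, H, K) * 0.5 + 0.1
target = torch.rand(W, H, K)
gamma = (torch.rand(W, H, K) > 0.9).float() * 1.5   # omega_scale * delta_obj (illustrative values)

l_x_dvs = gaussian_nll_coord_loss(mu, sigma, target, gamma)

# Eq. (11): the total loss is the sum of the APS and DVS channel losses, each of
# which sums its coordinate, class, and confidence terms.
l_aps = torch.tensor(0.0)   # placeholder for the full APS loss
l_total = l_aps + l_x_dvs   # L = L_APS + L_DVS (DVS part reduced to its x term here)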
To verify the validity of the proposed solution, experiments were first performed on the self-built dataset. Comparative experiments were conducted for different input modes: inputting only the APS image, inputting only the DVS image, inputting the pixel-wise superposition of the APS and DVS images, and inputting both images simultaneously. The experimental results are shown in Table 2, and the effect of the different input modes is shown in Figure 3, where each column corresponds to one input mode. Four scenes (fast motion, over-bright illumination, over-dark illumination, and normal conditions) were selected for each method. In a scene with a fast-moving object, the DVS-only input can detect the fast-moving vehicle but may miss a relatively stationary one; conversely, the APS-only input can detect a relatively stationary vehicle but not a fast-moving one. The effect of inputting the superposition of APS and DVS pixels is comparable to that of inputting the APS image alone. When the two images are input simultaneously, a good detection effect is obtained whether the vehicle is moving fast or stationary. Under too-strong or too-dark illumination, neither the APS-only input nor the superposed image gives a good detection effect, whereas inputting the APS and DVS images simultaneously fuses the two kinds of features well, so the shortcomings of the APS can be compensated by the DVS. The DVS-only input performs worst in normal scenes, because only brightness changes generate information and regions without brightness change correspond to the background and cannot be recognized. Overall, the method of fusing the two images within the ADF network is significantly superior to the other methods.
Several state-of-the-art single-input networks were also selected for comparison, as shown in Table 3, which compares the results of single-image-input networks on the self-built dataset. As can be seen from the table, when the model of the present invention receives only a single image, its performance is not as good as that of the other networks, because the network itself is designed for dual input. When the model receives frames and events simultaneously, the experimental results improve, demonstrating the benefit of using event data for recognition.
In addition, the method is compared on the PKU-DDD17-CAR dataset with the JDF network, which also takes both kinds of data as input; the results are shown in Table 4. The event data in this dataset are converted into images and then fed into the ADF network. The results of inputting only the frame image and of inputting the frame image and event data simultaneously are compared separately. Although the proposed network is inferior to the JDF network when only a frame image is input, it outperforms the JDF network when both kinds of data are input simultaneously.
TABLE 1 number of convolution layers in network framework
[Table 1 is provided as an image in the original publication.]
Table 2 experimental results on custom data set
[Table 2 is provided as an image in the original publication.]
TABLE 3 comparison with Single image input network
[Table 3 is provided as an image in the original publication.]
TABLE 4 comparison of two data inputs into different networks
[Table 4 is provided as an image in the original publication.]

Claims (5)

1. A vehicle target detection method based on an event camera, characterized by comprising the following steps: based on the APS images and DVS data generated by an event camera, adopting convolutional neural network technology to detect vehicle targets in extreme scenes, and converting the event data into event images; according to the changes of pixel coordinates and polarity, converting the event data within an accumulation time into an event image of the same size as the frame image; using a mature convolutional neural network based on the Darknet-53 framework, adding, in addition to the convolution operations performed on the APS image, convolutional layers for extracting features from the DVS image, the DVS channel still adopting successive 3 × 3 and 1 × 1 convolutional layers; then adding a fusion module into the convolutional neural network, which extracts DVS features at different resolutions and uses them to weight APS features of the same size, so as to guide the network to learn more detailed features of both the APS and the DVS simultaneously; and modifying the loss function of the network at the detection layer, wherein the loss function of the APS features is a cross-entropy loss comprising the losses of coordinates, classes, and confidences, and the cross-entropy loss function also performs a loss calculation on the DVS features.
2. The event camera-based vehicle target detection method of claim 1, wherein the events are converted into an image by the fixed time interval method; to achieve detection at a speed of 100 frames per second (FPS), the frame reconstruction is set to a fixed frame length of 10 ms; in each time interval, at the pixel positions where events occur, events with increasing polarity are drawn as white pixels and events with decreasing polarity as black pixels, and the background color of the image is gray; an event image of the same size as the APS image is finally generated.
3. The event camera-based vehicle target detection method of claim 1, wherein successive 3 × 3 and 1 × 1 convolutional layers that extract features from the DVS image are added; the APS image and the DVS image are input into the network framework simultaneously and features are extracted through their respective 3 × 3 and 1 × 1 convolutional layers, the difference being that the DVS channel uses fewer feature-extraction convolutional layers than the APS channel; the network predicts both the input APS image and the DVS image; both the APS image and the DVS image are divided into S × S grids, each grid predicts B bounding boxes, and C classes are predicted; each bounding box is introduced into a Gaussian model, predicting eight coordinate values μ_x, ε_x, μ_y, ε_y, μ_w, ε_w, μ_h, ε_h; a confidence score p is also predicted; the input to the final detection layer of the network is therefore a tensor of size 2 × S × S × B × (C + 9); and the three tensors of the APS channel and the three tensors of the same sizes from the DVS channel are fed into the detection layer separately.
4. The event camera-based vehicle target detection method of claim 1, wherein the two parts of features are effectively fused in a fusion module; the APS and DVS features F_aps and F_dvs, obtained from their respective convolutional layers, are fed into the fusion module; first, F_aps and F_dvs are passed through a given transformation operation T_c: F → U, with U ∈ R^(M×N×C), U = [u_1, u_2, …, u_C], to obtain the transformed features U_aps and U_dvs, where u_c is the M × N feature matrix of the c-th of the C channels; for simplicity, the T_c operation is taken to be a convolution operation;
after obtaining the transformed feature U_dvs, the global information of all channels in the feature is considered and compressed per channel to obtain the aggregated information z_c; this is done through a global average pooling operation T_sq(U_dvs), formally expressed as:
z_c = T_sq(u_c) = (1 / (M × N)) Σ_{i=1}^{M} Σ_{j=1}^{N} u_c(i, j)   (1)
where u_c(i, j) is the (i, j)-th value of the feature matrix; to make use of the aggregated information z_c from the squeeze operation, an excitation operation is performed to fuse the convolutional feature information of each channel and obtain the channel dependency s:
s = T_ex(z, E) = δ(E_2 σ(E_1 z))   (2)
where σ denotes the ReLU activation function, δ denotes the sigmoid activation function, and E_1 and E_2 are two weight matrices; this is implemented with two fully connected layers;
using s, the transformed APS feature U_aps is rescaled through the T_scale operation to obtain the feature block U′:
U′ = T_scale(U_aps, s) = U_aps · s   (3)
finally, the DVS feature block is fused with the rescaled APS feature to obtain the final fused feature F′_aps:
F′_aps = T_fuse(U′, U_dvs)   (4)
in the specific implementation, the fusion is a splicing (concatenation) operation.
5. The event camera-based vehicle target detection method of claim 1, wherein a loss term for the DVS features is added at the detection layer; in the same way as for the APS part, the DVS detection results are added at the detection layer: binary cross-entropy loss is computed for the objects and classes detected by the DVS, and the negative log-likelihood (NLL) loss of the coordinate box is:
L_x^DVS = − Σ_{i=1}^{W} Σ_{j=1}^{H} Σ_{k=1}^{K} γ_ijk log( N(x^G_ijk | μ_x(x_ijk), ε_x(x_ijk)) + ξ )   (5)
where L_x^DVS is the NLL loss of the x coordinate of the DVS; W and H are the numbers of grid cells along the width and height, respectively, and K is the number of prior boxes; μ_x(x_ijk) and ε_x(x_ijk) are the outputs of the detection layer at the k-th prior box of grid cell (i, j): μ_x(x_ijk) represents the x coordinate, and ε_x(x_ijk) represents the uncertainty of the x coordinate; x^G_ijk is the ground truth of the x coordinate, calculated, as in Gaussian YOLOv3, from the width and height of the resized image and the k-th prior box; ξ is a fixed value of 10^-9; the losses of the remaining coordinates y, w, and h take the same form as that of the x coordinate;
γ_ijk = ω_scale × δ^obj_ijk   (6)
ω_scale = 2 − w^G × h^G   (7)
ω_scale provides different weights according to the size (w^G, h^G) of the object during training; δ^obj_ijk in (6) is a parameter that is applied in the loss only when the prior box contains the anchor that best fits the current object; its value is 1 or 0, determined by the intersection over union (IOU) between the ground truth and the k-th prior box of grid cell (i, j);
[Equation (8): the confidence (objectness) loss of the DVS channel, a binary cross-entropy over all W × H grid cells and K prior boxes.]
the value of C_ijk depends on whether the bounding box of the grid cell fits the predicted object: if it fits, C_ijk = 1; otherwise, C_ijk = 0; τ^noobj_ijk indicates that the k-th prior box of the grid cell does not fit the target; an additional indicator denotes the correct category, and another indicates that the k-th prior box of the grid cell is not responsible for predicting the target;
the class loss is as follows:
[Equation (9): the class loss of the DVS channel, a binary cross-entropy over the predicted class probabilities P_ij.]
P_ij denotes the probability that the currently detected object belongs to the correct class;
the loss function of the DVS part is:
L_DVS = L^DVS_coord + L^DVS_class + L^DVS_conf   (10)
where L_DVS is the sum of the coordinate loss, class loss, and confidence loss of the DVS channel; L_APS takes the same form as L_DVS; the overall network loss function is therefore:
L = L_APS + L_DVS   (11).
CN202110182127.XA 2021-02-09 2021-02-09 Vehicle target detection method based on event camera Pending CN112801027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182127.XA CN112801027A (en) 2021-02-09 2021-02-09 Vehicle target detection method based on event camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110182127.XA CN112801027A (en) 2021-02-09 2021-02-09 Vehicle target detection method based on event camera

Publications (1)

Publication Number Publication Date
CN112801027A true CN112801027A (en) 2021-05-14

Family

ID=75815068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110182127.XA Pending CN112801027A (en) 2021-02-09 2021-02-09 Vehicle target detection method based on event camera

Country Status (1)

Country Link
CN (1) CN112801027A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762409A (en) * 2021-09-17 2021-12-07 北京航空航天大学 Unmanned aerial vehicle target detection method based on event camera
CN115497028A (en) * 2022-10-10 2022-12-20 中国电子科技集团公司信息科学研究院 Event-driven dynamic hidden target detection and identification method and device
CN115631407A (en) * 2022-11-10 2023-01-20 中国石油大学(华东) Underwater transparent biological detection based on event camera and color frame image fusion
WO2023025185A1 (en) * 2021-08-24 2023-03-02 The University Of Hong Kong Event-based auto-exposure for digital photography
CN116206196A (en) * 2023-04-27 2023-06-02 吉林大学 Ocean low-light environment multi-target detection method and detection system thereof
CN116416602A (en) * 2023-04-17 2023-07-11 江南大学 Moving object detection method and system based on combination of event data and image data
CN116682000A (en) * 2023-07-28 2023-09-01 吉林大学 Underwater frogman target detection method based on event camera

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN111461083A (en) * 2020-05-26 2020-07-28 青岛大学 Rapid vehicle detection method based on deep learning
CN112163602A (en) * 2020-09-14 2021-01-01 湖北工业大学 Target detection method based on deep neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN111461083A (en) * 2020-05-26 2020-07-28 青岛大学 Rapid vehicle detection method based on deep learning
CN112163602A (en) * 2020-09-14 2021-01-01 湖北工业大学 Target detection method based on deep neural network

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023025185A1 (en) * 2021-08-24 2023-03-02 The University Of Hong Kong Event-based auto-exposure for digital photography
CN113762409A (en) * 2021-09-17 2021-12-07 北京航空航天大学 Unmanned aerial vehicle target detection method based on event camera
CN115497028A (en) * 2022-10-10 2022-12-20 中国电子科技集团公司信息科学研究院 Event-driven dynamic hidden target detection and identification method and device
CN115497028B (en) * 2022-10-10 2023-11-07 中国电子科技集团公司信息科学研究院 Event-driven-based dynamic hidden target detection and recognition method and device
CN115631407A (en) * 2022-11-10 2023-01-20 中国石油大学(华东) Underwater transparent biological detection based on event camera and color frame image fusion
CN115631407B (en) * 2022-11-10 2023-10-20 中国石油大学(华东) Underwater transparent biological detection based on fusion of event camera and color frame image
CN116416602A (en) * 2023-04-17 2023-07-11 江南大学 Moving object detection method and system based on combination of event data and image data
CN116416602B (en) * 2023-04-17 2024-05-24 江南大学 Moving object detection method and system based on combination of event data and image data
CN116206196A (en) * 2023-04-27 2023-06-02 吉林大学 Ocean low-light environment multi-target detection method and detection system thereof
CN116206196B (en) * 2023-04-27 2023-08-08 吉林大学 Ocean low-light environment multi-target detection method and detection system thereof
CN116682000A (en) * 2023-07-28 2023-09-01 吉林大学 Underwater frogman target detection method based on event camera
CN116682000B (en) * 2023-07-28 2023-10-13 吉林大学 Underwater frogman target detection method based on event camera

Similar Documents

Publication Publication Date Title
CN112801027A (en) Vehicle target detection method based on event camera
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN111814621B (en) Attention mechanism-based multi-scale vehicle pedestrian detection method and device
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN112686207B (en) Urban street scene target detection method based on regional information enhancement
CN108416780B (en) Object detection and matching method based on twin-region-of-interest pooling model
CN110705412A (en) Video target detection method based on motion history image
CN111832453B (en) Unmanned scene real-time semantic segmentation method based on two-way deep neural network
WO2023030182A1 (en) Image generation method and apparatus
CN114202743A (en) Improved fast-RCNN-based small target detection method in automatic driving scene
Wang et al. Multi-stage fusion for multi-class 3d lidar detection
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN115035298A (en) City streetscape semantic segmentation enhancement method based on multi-dimensional attention mechanism
CN112329861A (en) Layered feature fusion method for multi-target detection of mobile robot
CN116246059A (en) Vehicle target recognition method based on improved YOLO multi-scale detection
CN116258940A (en) Small target detection method for multi-scale features and self-adaptive weights
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
Tang et al. HIC-YOLOv5: Improved YOLOv5 For Small Object Detection
CN110503049B (en) Satellite video vehicle number estimation method based on generation countermeasure network
Sun et al. UAV image detection algorithm based on improved YOLOv5
CN116597144A (en) Image semantic segmentation method based on event camera
CN116311154A (en) Vehicle detection and identification method based on YOLOv5 model optimization
CN115909276A (en) Improved YOLOv 5-based small traffic sign target detection method in complex weather

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination