AU2020100048A4 - Method of object detection for vehicle on-board video based on RetinaNet

Method of object detection for vehicle on-board video based on RetinaNet

Info

Publication number
AU2020100048A4
AU2020100048A4
Authority
AU
Australia
Prior art keywords
training
network
retinanet
data set
subnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020100048A
Inventor
Mengfang Ding
Heyang Huang
Zhixu Liu
Yufei Song
Yihe Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2020100048A priority Critical patent/AU2020100048A4/en
Application granted granted Critical
Publication of AU2020100048A4 publication Critical patent/AU2020100048A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

This invention lies in the field of digital signal processing. It is an image recognition system, based on deep learning, for obstacles and road signs around automated driving vehicles. The invention consists of the following steps. Firstly, we collected images from cameras mounted on several cars. Secondly, after selecting and preprocessing the images, we divided them into two data sets: one for training, the other for testing. We then put the training data set into the convolutional neural network. In order to reach the best performance, we adjusted some parameters of the network, and finally, we put the test data set into the network and evaluated the accuracy of recognition. In conclusion, this system can recognise different types of obstacles and road signs with high accuracy without human intervention.
[Figure 1 (flowchart): download the RetinaNet training model and the Pascal VOC 2007 data set; divide the Pascal VOC 2007 set into training and testing data sets; initialize the neural network; adjust parameters to train the network; test using the testing data set. Figure 2 (network structure): (a) ResNet, (b) feature pyramid net, (c) class subnet (top), (d) box subnet (bottom).]

Description

Method of object detection for vehicle on-board video based on RetinaNet
Field of Invention
This invention is in the field of digital signal processing. It aims to recognise different kinds of obstacles and road signs around automated driving vehicles, in order to reduce the risk of traffic accidents and allow the vehicle to adjust itself in time under complicated traffic conditions.
Background
In the 20th century, soon after the car was invented, some scientists were already thinking about automated driving cars. In an automated driving system, the vehicle should recognise road signs (e.g. zebra crossings, temporary construction signs) and obstacles (e.g. other vehicles, people). But at that time, computers were neither small enough to be placed inside a car nor powerful enough to handle such complex data.
Around the beginning of this century, basic automated driving systems started to appear. In that period, however, researchers had to work out the distinguishing features of these objects themselves and enter those features into the computer, which would then compare them with the image. This caused researchers great difficulty, since there are too many features to extract by hand. The accuracy of recognition was also very low, so traffic accidents happened frequently during road tests. Nowadays, with the development of technology, a computer can learn how to differentiate images by itself; this is called deep learning. With this technique, the computer automatically learns the features of different objects from a large number of images and uses these features to classify a new image, which means that no human intervention is needed anymore. It reduces the difficulty for researchers while increasing the efficiency and accuracy of recognition. Hence, we use this technique to create an image recognition system for automated driving vehicles.
In this invention, we used RetinaNet, a one-stage object detector, as our deep learning framework. [1] It has a faster detection speed than two-stage detectors such as Faster R-CNN. Also, thanks to Focal Loss, we address the imbalance between positive and negative examples in one-stage detectors, so the efficiency and accuracy of recognition are greatly improved.
Summary
This invention aims to recognise different kinds of obstacles and road signs around automated driving vehicles, in order to reduce the risk of traffic accidents and allow the vehicle to adjust itself in time under complicated traffic conditions. Using RetinaNet, a one-stage detector with a faster detection speed, together with Focal Loss, which keeps a balance between foreground and background classes, yields a significant improvement in both the efficiency and the accuracy of recognition.
The framework of our image recognition method includes: the Pascal VOC 2007 data set, a convolutional neural network based on RetinaNet, parameter adjustment, and the application of recognition.
In order to make the training process effective, the data set should be large, diverse and reliable. Therefore, we chose Pascal VOC 2007 as our training and testing data set.
Our convolutional neural network is based on RetinaNet; Figure 2 shows its general structure. In the network, we used a Feature Pyramid Network (FPN) backbone on top of a feedforward ResNet architecture (a) to generate a rich, multi-scale convolutional feature pyramid (b). To this backbone, RetinaNet attaches two subnetworks, one for classifying anchor boxes (c) and one for regressing from anchor boxes to ground-truth object boxes (d). The network we used was downloaded from GitHub.
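As an illustration of the two subnetworks, the following is a minimal PyTorch sketch of the head design, assuming 256 FPN channels, A = 9 anchors per location, and K object classes as in the RetinaNet paper. It is a sketch only, not the actual network downloaded from GitHub.

```python
import torch
import torch.nn as nn

def make_head(out_channels, in_channels=256):
    """Four 3x3 conv + ReLU layers, then a final 3x3 conv, as in the paper's heads."""
    layers = []
    for _ in range(4):
        layers += [nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU()]
    layers.append(nn.Conv2d(in_channels, out_channels, 3, padding=1))
    return nn.Sequential(*layers)

K, A = 20, 9                          # e.g. 20 object classes, 9 anchors per location
class_subnet = make_head(K * A)       # classification subnet: K scores per anchor
box_subnet = make_head(4 * A)         # box subnet: 4 regression offsets per anchor

p = torch.randn(1, 256, 64, 64)       # a dummy FPN feature map, e.g. level P3
print(class_subnet(p).shape)          # torch.Size([1, 180, 64, 64])
print(box_subnet(p).shape)            # torch.Size([1, 36, 64, 64])
```

In the paper's design, the same two heads are applied to every pyramid level, with parameters shared across levels but not between the two heads.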
Description of Drawing
Figure 1 shows the procedure of our invention.
Figure 2 shows the general structure of our convolutional neural network based on RetinaNet.
Description of Preferred Embodiment
Network Design
The network model we used is RetinaNet, a one-stage detector created by researchers at Facebook AI Research (FAIR). As its authors explain, the highest-accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but had trailed the accuracy of two-stage detectors. They found that the extreme foreground-background class imbalance encountered during the training of dense detectors is the central cause, and proposed to address it by reshaping the standard cross-entropy loss so that it down-weights the loss assigned to well-classified examples. This novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. Their results show that, trained with the focal loss, RetinaNet matches the speed of previous one-stage detectors while surpassing the accuracy of existing state-of-the-art two-stage detectors.
In order to verify the validity of the focal loss, a one-stage target detector, RetinaNet, was designed. Its design uses an efficient feature pyramid network and adopts anchor boxes. The best-performing RetinaNet configuration uses a ResNet-101-FPN backbone, which reaches 39.1 AP on the COCO test set at a speed of 5 fps.
Focal Loss starts from the balanced cross entropy CE(p_t) = -α_t log(p_t): the weight α can balance the importance of positive and negative samples, but not of easy and hard samples. The CE loss is therefore reshaped to reduce the weight of easy samples and put more attention on hard negatives during training. The standard focal loss is FL(p_t) = -(1 - p_t)^γ log(p_t); the experiments also add the balance parameter, giving FL(p_t) = -α_t (1 - p_t)^γ log(p_t).
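A minimal NumPy sketch of the focal loss formula above, using the paper's default values α = 0.25 and γ = 2 (the function name and defaults are ours):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for binary labels y in {0, 1}.

    p is the predicted probability of the positive class. For a well-classified
    easy example p_t is close to 1, so (1 - p_t)^gamma down-weights its loss.
    """
    p = np.clip(p, 1e-12, 1.0 - 1e-12)           # numerical stability
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy negative (p = 0.1) contributes far less than a hard negative (p = 0.9):
print(focal_loss(np.array([0.1, 0.9]), np.array([0, 0])))
```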
RetinaNet essentially consists of a ResNet backbone, an FPN, and two FCN subnetworks. The design idea is that the backbone is an effective feature extraction network such as VGG or ResNet; the paper mainly tries ResNet-50 and ResNet-101. The FPN strengthens the multi-scale features formed in the ResNet, producing feature maps with more expressive multi-scale information about target regions. Finally, two FCN subnetworks with the same structure but no shared parameters are applied to the FPN feature maps to perform the object classification and box position regression tasks.
Anchor information: anchor areas range from 32^2 to 512^2 across the feature pyramid levels P3 to P7. At each level there are three aspect ratios {1:2, 1:1, 2:1}; for denser scale coverage, three size multipliers {2^0, 2^(1/3), 2^(2/3)} are added to the anchor set at each level. Each anchor is assigned a classification vector of length K and a box regression vector of length 4.
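Under these conventions, the anchor widths and heights per pyramid level can be derived as in the sketch below (the function name is ours; aspect ratio r is taken as height/width):

```python
import numpy as np

def anchors_for_level(level):
    """Return the nine (width, height) anchor pairs for pyramid level P3..P7."""
    base = 2.0 ** (level + 2)                    # P3 -> 32, P4 -> 64, ..., P7 -> 512
    scales = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]
    ratios = [0.5, 1.0, 2.0]                     # aspect ratios 1:2, 1:1, 2:1
    anchors = []
    for s in scales:
        area = (base * s) ** 2                   # areas run from 32^2 up to 512^2
        for r in ratios:
            w = np.sqrt(area / r)                # w * h = area and h / w = r
            anchors.append((w, w * r))
    return anchors

print(anchors_for_level(3))                      # the nine P3 anchors
```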
During inference, the trained model decodes at most the top 1,000 boxes with the highest predicted object probability at each FPN level; the boxes from all levels are then merged and filtered with non-maximum suppression (NMS) at a 0.5 IoU threshold to obtain the final box positions. The training loss consists of an L1 loss on the box position information and the focal loss on the category information. For model initialization, considering the extreme imbalance between positive and negative samples, the bias of the final classification convolution is initialized accordingly (in the original RetinaNet paper, to b = -log((1 - π)/π) with π = 0.01).
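The NMS filtering step can be sketched as follows in NumPy; this is a minimal greedy version for illustration, and the implementation in the downloaded network may differ:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,)."""
    order = np.argsort(scores)[::-1]             # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]          # drop boxes overlapping box i too much
    return keep
```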
In the network, we used a Feature Pyramid Network (FPN) backbone on top of a feedforward ResNet architecture to generate a rich, multi-scale convolutional feature pyramid. To this backbone, RetinaNet attaches two subnetworks, one for classifying anchor boxes and one for regressing from anchor boxes to ground-truth object boxes. [1]
Focal Loss is designed to solve the extreme imbalance between foreground and background classes in one-stage detectors during training.
Procedure
The procedure of this invention is implemented as follows:
1. Preparing: We downloaded the RetinaNet network and the Pascal VOC 2007 data set from the internet.
2. Training and testing data set splitting: We divided the Pascal VOC 2007 set into a training data set and a testing data set (see the sketch after this list).
3. We put the training data set into the convolutional neural network for training.
4. Finally, we put the testing data set into the convolutional neural network and evaluated the accuracy of recognition.
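A minimal sketch of step 2 under an assumed directory layout. Note that Pascal VOC 2007 ships with its own split lists under ImageSets/Main; a random split like the one below is only one possible choice, and the path is hypothetical.

```python
import random
from pathlib import Path

def split_voc(image_dir, train_ratio=0.8, seed=0):
    """Randomly split the VOC 2007 images into training and testing lists."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    rng = random.Random(seed)                    # fixed seed for a reproducible split
    rng.shuffle(images)
    n_train = int(len(images) * train_ratio)
    return images[:n_train], images[n_train:]

# Hypothetical path; adjust to wherever the data set was unpacked.
train_set, test_set = split_voc("VOCdevkit/VOC2007/JPEGImages")
```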

Claims (1)

1. Method of object detection for vehicle on-board video based on RetinaNet, wherein said RetinaNet essentially consists of a ResNet backbone, an FPN, and two FCN subnetworks; the design idea is that the backbone is an effective feature extraction network such as VGG or ResNet, mainly ResNet-50 and ResNet-101.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020100048A AU2020100048A4 (en) 2020-01-10 2020-01-10 Method of object detection for vehicle on-board video based on RetinaNet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020100048A AU2020100048A4 (en) 2020-01-10 2020-01-10 Method of object detection for vehicle on-board video based on RetinaNet

Publications (1)

Publication Number Publication Date
AU2020100048A4 true AU2020100048A4 (en) 2020-02-13

Family

ID=69412711

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020100048A Ceased AU2020100048A4 (en) 2020-01-10 2020-01-10 Method of object detection for vehicle on-board video based on RetinaNet

Country Status (1)

Country Link
AU (1) AU2020100048A4 (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626200A (en) * 2020-05-26 2020-09-04 北京联合大学 Multi-scale target detection network and traffic identification detection method based on Libra R-CNN
CN111598061A (en) * 2020-07-21 2020-08-28 成都中轨轨道设备有限公司 System and method for autonomously identifying and positioning contents of track signboard
CN112529095A (en) * 2020-12-22 2021-03-19 合肥市正茂科技有限公司 Single-stage target detection method based on convolution region re-registration
CN112529095B (en) * 2020-12-22 2023-04-07 合肥市正茂科技有限公司 Single-stage target detection method based on convolution region re-registration
CN113392757A (en) * 2021-06-11 2021-09-14 恒睿(重庆)人工智能技术研究院有限公司 Method, device and medium for training human body detection model by using unbalanced data
CN113392757B (en) * 2021-06-11 2023-08-15 恒睿(重庆)人工智能技术研究院有限公司 Method, device and medium for training human body detection model by using unbalanced data
CN113610070A (en) * 2021-10-11 2021-11-05 中国地质环境监测院(自然资源部地质灾害技术指导中心) Landslide disaster identification method based on multi-source data fusion

Similar Documents

Publication Publication Date Title
AU2020100048A4 (en) Method of object detection for vehicle on-board video based on RetinaNet
CN110363201B (en) Weak supervision semantic segmentation method and system based on collaborative learning
CN106022300B (en) Traffic sign recognition method and system based on cascade deep study
Khalil Car plate recognition using the template matching method
Zhang et al. Vehicle detection in the aerial infrared images via an improved yolov3 network
CN108734189A (en) Vehicle License Plate Recognition System based on atmospherical scattering model and deep learning under thick fog weather
Aggarwal et al. A robust method to authenticate car license plates using segmentation and ROI based approach
CN102163278B (en) Illegal vehicle intruding detection method for bus lane
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN110009058A (en) A kind of parking lot Vehicle License Plate Recognition System and method
CN105956610B (en) A kind of remote sensing images classification of landform method based on multi-layer coding structure
Le et al. Vehicle count system based on time interval image capture method and deep learning mask R-CNN
Su et al. A new local-main-gradient-orientation HOG and contour differences based algorithm for object classification
Naimi et al. Multi-nation and multi-norm license plates detection in real traffic surveillance environment using deep learning
Prabhu et al. Recognition of Indian license plate number from live stream videos
Onim et al. Traffic surveillance using vehicle license plate detection and recognition in bangladesh
Latha et al. Image understanding: semantic segmentation of graphics and text using faster-RCNN
Shomee et al. License plate detection and recognition system for all types of bangladeshi vehicles using multi-step deep learning model
Vu et al. Traffic incident recognition using empirical deep convolutional neural networks model
Zhang et al. Learning with free object segments for long-tailed instance segmentation
Ismail License plate Recognition for moving vehicles case: At night and under rain condition
CN104517127A (en) Self-learning pedestrian counting method and apparatus based on Bag-of-features model
CN116052206A (en) Bird identification method and system integrating visual saliency
CN115456941A (en) Novel electric power insulator defect detection and identification method and system
Noaeen et al. Social media analysis for traffic management

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry