CN109670450B - Video-based man-vehicle object detection method
- Publication number: CN109670450B
- Application number: CN201811565548.5A
- Authority
- CN
- China
- Prior art keywords
- video
- detection method
- object detection
- conv4
- vehicle object
- Prior art date
- Legal status: Active (assumed, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/584—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/625—License plates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention provides a video-based person-and-vehicle object detection method, which comprises the following steps: A. collecting data; B. labeling the data of step A; C. generating a training set and a test set from the labeled data; D. constructing a convolutional neural network; E. performing model training on the convolutional neural network of step D; F. detecting. The beneficial effects of the invention are: the detection rate is high, and snapshots achieve a good effect even for vehicles that are difficult to detect and identify by the license plate; the method also identifies and monitors pedestrians and non-motor vehicles comparatively accurately, better supports various kinds of monitoring evidence collection, and provides a guarantee for a harmonious society, safe traffic and intelligent travel.
Description
Technical Field
The invention belongs to the technical field of traffic monitoring, and particularly relates to a video-based person-and-vehicle object detection method.
Background
In the traffic field, vehicles, pedestrians and non-motor vehicles must be detected and separated so that each can be monitored individually and illegal events can be recorded and pre-warned; detection is therefore the core of the traffic monitoring field. Vehicle detection is relatively mature and is based on license plates, with an accuracy of up to 99%; however, for unlicensed vehicles and some engineering vehicles, snapshot cameras cannot effectively locate the vehicle through license plate recognition, which complicates later evidence-collection work. Pedestrian and non-motor-vehicle detection, because the targets are comparatively small and the complexity of their pose features is far higher than in vehicle detection, is still being explored and optimized. Pedestrians and non-motor vehicles are nonetheless principal participants in traffic, so detecting them effectively is indispensable and plays an important role in the progress of the traffic field.
Disclosure of Invention
In view of the above, the present invention is directed to a video-based person-and-vehicle object detection method, so as to overcome the above-mentioned drawbacks.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
a person and vehicle object detection method based on video comprises the following steps:
A. collecting data;
B. labeling the data in the step A;
C. generating a training set and a testing set for the marked data;
D. constructing a convolutional neural network;
E. performing model training on the convolutional neural network in the step D;
F. detecting.
Further, in the step a, pictures of various traffic targets in different time periods of various traffic scenes are collected.
Further, in the step B, the circumscribed rectangle of the traffic target is used as the boundary for labeling.
Further, in the step C, the training set and the test set are randomly generated at a 4:1 ratio of picture counts.
Further, the process of constructing the convolutional neural network in the step D is as follows:
D1. using a VGG network, replacing each filter of size 5*5 or larger in the convolutional layers with several filters of size 3*3 or smaller;
D2. removing the pooling layers that follow Conv1_2, Conv2_2, Conv3_2, Conv4_2 and Conv5_2 in the VGG network, and adding four groups of convolution modules Conv5_x, Conv6_x, Conv7_x and Conv8_x, where the channel number of ConvY_1 in each convolution module group is 1/2 that of ConvY_2.
Further, the training process in the step E is as follows:
E1. performing data enhancement on the training set and test set generated in step C by changing brightness and saturation, rotation, mirroring and image cropping;
E2. performing position regression and classification probability calculation respectively on the feature maps obtained from Conv4_2, Conv5_2, Conv6_2, Conv7_2 and Conv8_2.
Further, the detection process in the step F is as follows:
F1. image color conversion: converting the image color format from YUV to BGR; the conversion formulas are as follows,
B=Y+1.779*(U-128)
G=Y-0.3455*(U-128)-0.7169*(V-128)
R=Y+1.4075*(V-128);
F2. sending the BGR-format image into the trained model for detection; the model outputs the detected target type, target position (x, y, w, h) and target confidence;
F3. filtering the results: false detections are filtered out by thresholding the target confidence; the most accurate target type information is obtained through the Intersection-over-Union criterion; and the most accurate target position information (x, y, w, h) is obtained through non-maximum suppression.
Compared with the prior art, the video-based person-and-vehicle object detection method has the following advantages:
the detection rate is high, and good results are achieved even for vehicles that are difficult to detect and identify by the license plate; the method identifies and monitors pedestrians and non-motor vehicles comparatively accurately, better supports various kinds of monitoring evidence collection, and provides a guarantee for a harmonious society, safe traffic and intelligent travel.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a training flow chart according to an embodiment of the present invention;
fig. 2 is a network configuration diagram of an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in FIG. 1, a video-based person-and-vehicle object detection method comprises the following steps:
A. collecting data;
B. labeling the data in the step A;
C. generating a training set and a testing set for the marked data;
D. constructing a convolutional neural network;
E. performing model training on the convolutional neural network in the step D;
F. detecting.
In the step A, pictures of various traffic targets are collected in different time periods (including daytime, nighttime, front light and backlight) of various traffic scenes, where the traffic targets include Bicycle, Pedestrian, Car, Truck, Bus, Tricycle and Engineering Truck.
In the step B, the circumscribed rectangle of the traffic target is used as the boundary for labeling, and the labels are Bicycle (non-motor bicycle), Pedestrian, Car, Truck, Bus, Tricycle and Engineering_Truck.
In the step C, the training set and the test set are randomly generated at a 4:1 ratio of picture counts; the training set is used to train the network weights, and the test set is used to evaluate the accuracy of the trained model and prevent overfitting.
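The 4:1 random split described above can be sketched as follows; this is a minimal illustration, not code from the patent, and the function name and seed are hypothetical choices.

```python
import random

def split_dataset(pictures, train_ratio=0.8, seed=0):
    """Randomly split annotated pictures into a training set and a test set (4:1)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = pictures[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset([f"img_{i}.jpg" for i in range(100)])
print(len(train), len(test))  # 80 20
```

Keeping the split random (rather than chronological) reduces the chance that one scene or time period dominates either set.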
The process of constructing the convolutional neural network in the step D is as follows:
D1. using a VGG network, replacing each large-size filter in the convolutional layers with several small-size filters, where small size means 3*3 and below and large size generally means 5*5 and above; this leaves the receptive field over the input data unchanged, and an improved convolutional network is designed specifically for training on vehicles and people: two 3*3 convolutions have the same receptive field as one 5*5 convolution, and three 3*3 convolutions have the same receptive field as one 7*7 convolution;
D2. removing the pooling layers that follow Conv1_2, Conv2_2, Conv3_2, Conv4_2 and Conv5_2 in the VGG network, thereby reducing the information loss caused by down-sampling, and adding four groups of convolution modules Conv5_x, Conv6_x, Conv7_x and Conv8_x, where the channel number of ConvY_1 in each group is 1/2 that of ConvY_2; by deepening the convolutional network and configuring the channels differently, denser target features can be obtained; the specific network structure is shown in fig. 2;
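The receptive-field equivalence claimed in step D1 follows from the standard rule that a stack of stride-1 convolutions has an effective receptive field of 1 + Σ(kᵢ − 1). The helper below is an illustrative sketch (not from the patent) that verifies the two equivalences:

```python
def stacked_receptive_field(kernel_sizes):
    """Effective receptive field of stride-1 convolutions applied in sequence.

    Each k*k convolution extends the receptive field by k - 1 pixels.
    """
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

print(stacked_receptive_field([3, 3]))     # 5  -> two 3*3 convs match one 5*5
print(stacked_receptive_field([3, 3, 3]))  # 7  -> three 3*3 convs match one 7*7
```

The stacked small filters also use fewer parameters (e.g. 2·3·3 = 18 weights per channel pair versus 25 for a single 5*5), which is the usual motivation for this VGG-style substitution.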
the training process in the step E is as follows:
E1. since deep learning requires a huge number of samples, the existing samples (the training set and test set generated in step C) are augmented by changing brightness and saturation, rotation, mirroring, image cropping and other methods;
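Two of the augmentations named above, mirroring and brightness shifting, can be sketched on a plain nested-list "image" of 8-bit grey values; this is an illustrative toy, not the patent's implementation, and real pipelines would operate on full colour arrays.

```python
def mirror(image):
    """Horizontal mirroring: reverse each pixel row."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Shift every pixel by delta, clamped to the 8-bit range [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]

img = [[10, 200], [0, 255]]
print(mirror(img))                 # [[200, 10], [255, 0]]
print(adjust_brightness(img, 60))  # [[70, 255], [60, 255]]
```

Note that mirroring also requires flipping the x-coordinates of the labeled bounding boxes, a step omitted here for brevity.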
E2. position regression and classification probability calculation are performed respectively on the feature maps obtained from Conv4_2, Conv5_2, Conv6_2, Conv7_2 and Conv8_2, so that the model has multi-scale features; in this way, even a small target has distinct position features on a larger feature map, so that large and small targets can both be detected well.
The loss function Loss used is a weighted sum of the position error mbox_loc and the confidence error mbox_conf:
Loss(x,c,l,g) = (1/N) * (mbox_conf(x,c) + α * mbox_loc(x,l,g))
where the weight α takes values in the range 0-1, N is the number of positive samples matched to the Ground Truth (true target boxes), c is the class confidence prediction, l is the position prediction of the predicted box, and g is the position parameter of the Ground Truth;
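The weighted loss above reduces to simple arithmetic once the two error terms are computed; the sketch below is illustrative only (the function name and example values are not from the patent).

```python
def total_loss(mbox_conf, mbox_loc, num_positives, alpha=1.0):
    """Weighted sum of confidence and position errors, normalised by N positives."""
    if num_positives == 0:
        return 0.0  # no matched boxes: conventionally the loss is zero
    return (mbox_conf + alpha * mbox_loc) / num_positives

# e.g. conf error 6.0, loc error 2.0, 4 positive matches, alpha = 0.5:
print(total_loss(6.0, 2.0, 4, 0.5))  # (6.0 + 0.5*2.0) / 4 = 1.75
```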
The position error mbox_loc is calculated using the SmoothL1 loss:
mbox_loc(x,l,g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_ij^k * smoothL1(l_i^m − ĝ_j^m)
where k denotes the class of the Ground Truth, and x_ij^k indicates whether the i-th predicted box and the j-th true box match with respect to class k (1 if matched, 0 otherwise);
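The SmoothL1 term itself is the standard piecewise function, quadratic near zero and linear beyond |x| = 1; a minimal sketch (not taken from the patent):

```python
def smooth_l1(x):
    """SmoothL1 loss: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5

print(smooth_l1(0.5))   # 0.125  (quadratic region)
print(smooth_l1(2.0))   # 1.5    (linear region)
```

The linear tail makes the regression less sensitive to outlier boxes than a pure L2 loss, while the quadratic region keeps gradients smooth near a perfect match.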
The confidence error mbox_conf is obtained by the softmax loss:
mbox_conf(x,c) = −Σ_{i∈Pos} x_ij^p * log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0), where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
where p denotes the predicted class and x_ij^p indicates whether the i-th predicted box and the j-th true box match with respect to class p.
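For a single box, the softmax loss is the negative log of the softmax probability assigned to the true class; a minimal illustrative sketch (names and example scores are not from the patent):

```python
import math

def softmax(scores):
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(scores, true_class):
    """Cross-entropy on softmax probabilities: -log p(true class)."""
    return -math.log(softmax(scores)[true_class])

# two equally scored classes -> probability 0.5 -> loss ln(2)
print(round(softmax_loss([0.0, 0.0], 0), 4))
```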
The detection process in the step F is as follows:
F1. image color conversion: since the image format acquired by the monitoring camera is YUV and the network input is BGR, the image color format is converted from YUV to BGR; the conversion formulas are as follows,
B=Y+1.779*(U-128)
G=Y-0.3455*(U-128)-0.7169*(V-128)
R=Y+1.4075*(V-128)
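The three formulas above can be applied per pixel as sketched below; the clamping to [0, 255] is an assumption added for 8-bit output and is not stated in the patent.

```python
def yuv_to_bgr(y, u, v):
    """Convert one YUV pixel to BGR using the coefficients above, clamped to [0, 255]."""
    b = y + 1.779 * (u - 128)
    g = y - 0.3455 * (u - 128) - 0.7169 * (v - 128)
    r = y + 1.4075 * (v - 128)
    clamp = lambda x: int(min(255, max(0, round(x))))
    return clamp(b), clamp(g), clamp(r)

# neutral chroma (U = V = 128) leaves a grey pixel unchanged:
print(yuv_to_bgr(128, 128, 128))  # (128, 128, 128)
```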
F2. sending the BGR-format image into the trained model for detection; the model outputs the detected target type, target position (x, y, w, h) and target confidence;
F3. filtering the results: false detections are filtered out by thresholding the target confidence; the most accurate target type information is obtained through the Intersection-over-Union (IoU) criterion; and the most accurate target position information (x, y, w, h) is obtained through non-maximum suppression (NMS). In this embodiment, a detection with confidence greater than or equal to 0.8 is taken as a correct target, and targets below 0.8 are filtered out; boxes with IoU greater than or equal to 0.4 are considered the same target; and the NMS threshold is taken as 0.4.
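Step F3 combines a confidence threshold, IoU, and greedy NMS; the sketch below illustrates that pipeline with the embodiment's 0.8/0.4 thresholds. It is an illustrative reconstruction, and the detection tuples and sample boxes are invented for the example.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(detections, conf_thresh=0.8, iou_thresh=0.4):
    """Drop low-confidence detections, then greedily suppress overlapping boxes."""
    boxes = sorted((d for d in detections if d[1] >= conf_thresh),
                   key=lambda d: d[1], reverse=True)
    kept = []
    for label, conf, box in boxes:
        if all(iou(box, k[2]) < iou_thresh for k in kept):
            kept.append((label, conf, box))
    return kept

dets = [("Car", 0.95, (10, 10, 50, 50)),
        ("Car", 0.85, (12, 12, 50, 50)),   # heavy overlap with the first -> suppressed
        ("Bus", 0.90, (200, 40, 80, 60)),
        ("Car", 0.50, (10, 10, 50, 50))]   # below the 0.8 confidence threshold
print(nms(dets))
```

Only the highest-confidence Car and the Bus survive: the duplicate Car box exceeds the 0.4 IoU threshold against a kept box, and the 0.5-confidence detection fails the 0.8 gate.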
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (4)
1. A video-based human-vehicle object detection method, characterized by comprising the following steps:
A. collecting data;
B. labeling the data in the step A;
C. generating a training set and a testing set for the marked data;
D. constructing a convolutional neural network;
E. performing model training on the convolutional neural network in the step D;
F. detecting;
the convolutional neural network of the step D comprises the following steps:
Conv1_1, Conv1_2, Conv2_1, Conv2_2, Conv3_1, Conv3_2, Conv4_1, Conv4_2, Conv4_3, Conv5_1, Conv5_2, Conv6_1, Conv6_2, Conv7_1, Conv7_2, Conv8_1 and Conv8_2 convolutional layers;
the training process in the step E is as follows:
E1. c, carrying out data enhancement on the training set and the test set generated in the step C through changing brightness, saturation, rotation and mirroring and image clipping;
E2. position regression and classification probability calculation are performed respectively on the feature maps obtained from Conv4_2, Conv5_2, Conv6_2, Conv7_2 and Conv8_2;
during this process, the feature maps obtained from Conv4_2, Conv5_2, Conv6_2, Conv7_2 and Conv8_2 are computed respectively, and the results on the feature maps are finally fused to obtain the final output information.
2. The video-based person-vehicle object detection method according to claim 1, wherein: in step A, pictures containing various traffic targets are collected in different time periods of various traffic scenes.
3. The video-based person-vehicle object detection method according to claim 1, wherein: in step B, the circumscribed rectangle of the traffic target is used as the boundary for labeling.
4. The video-based person-vehicle object detection method according to claim 1, wherein: in step C, the training set and the test set are randomly generated at a 4:1 ratio of picture counts.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811565548.5A CN109670450B (en) | 2018-12-20 | 2018-12-20 | Video-based man-vehicle object detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109670450A (en) | 2019-04-23
CN109670450B (en) | 2023-07-25
Family
ID=66144134
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811565548.5A Active CN109670450B (en) | 2018-12-20 | 2018-12-20 | Video-based man-vehicle object detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109670450B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110111577B (en) * | 2019-05-15 | 2020-11-27 | 武汉纵横智慧城市股份有限公司 | Non-motor vehicle identification method, device, equipment and storage medium based on big data |
CN110399800B (en) * | 2019-06-28 | 2021-05-07 | 智慧眼科技股份有限公司 | License plate detection method and system based on deep learning VGG16 framework and storage medium |
CN111461130B (en) * | 2020-04-10 | 2021-02-09 | 视研智能科技(广州)有限公司 | High-precision image semantic segmentation algorithm model and segmentation method |
CN113239962A (en) * | 2021-04-12 | 2021-08-10 | 南京速度软件技术有限公司 | Traffic participant identification method based on single fixed camera |
CN114973154A (en) * | 2022-07-29 | 2022-08-30 | 成都宜泊信息科技有限公司 | Parking lot identification method, parking lot identification system, parking lot control method, parking lot control system, parking lot equipment and parking lot control medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537387A (en) * | 2014-12-16 | 2015-04-22 | 广州中国科学院先进技术研究所 | Method and system for classifying automobile types based on neural network |
CN107358182A (en) * | 2017-06-29 | 2017-11-17 | 维拓智能科技(深圳)有限公司 | Pedestrian detection method and terminal device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10068171B2 (en) * | 2015-11-12 | 2018-09-04 | Conduent Business Services, Llc | Multi-layer fusion in a convolutional neural network for image classification |
CN107316007B (en) * | 2017-06-07 | 2020-04-03 | 浙江捷尚视觉科技股份有限公司 | Monitoring image multi-class object detection and identification method based on deep learning |
CN107944442B (en) * | 2017-11-09 | 2019-08-13 | 北京智芯原动科技有限公司 | Based on the object test equipment and method for improving convolutional neural networks |
CN108090457A (en) * | 2017-12-27 | 2018-05-29 | 天津天地人和企业管理咨询有限公司 | A kind of motor vehicle based on video does not give precedence to pedestrian detection method |
CN108564555B (en) * | 2018-05-11 | 2021-09-21 | 中北大学 | NSST and CNN-based digital image noise reduction method |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537387A (en) * | 2014-12-16 | 2015-04-22 | 广州中国科学院先进技术研究所 | Method and system for classifying automobile types based on neural network |
CN107358182A (en) * | 2017-06-29 | 2017-11-17 | 维拓智能科技(深圳)有限公司 | Pedestrian detection method and terminal device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670450B (en) | Video-based man-vehicle object detection method | |
CN111368687B (en) | Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation | |
Chen et al. | Vehicle detection, tracking and classification in urban traffic | |
CN111104903B (en) | Depth perception traffic scene multi-target detection method and system | |
CN104134079B (en) | A kind of licence plate recognition method based on extremal region and extreme learning machine | |
CN104463241A (en) | Vehicle type recognition method in intelligent transportation monitoring system | |
CN111800507A (en) | Traffic monitoring method and traffic monitoring system | |
CN110222596B (en) | Driver behavior analysis anti-cheating method based on vision | |
CN103093249A (en) | Taxi identifying method and system based on high-definition video | |
CN115311241B (en) | Underground coal mine pedestrian detection method based on image fusion and feature enhancement | |
CN111881739B (en) | Automobile tail lamp state identification method | |
Razalli et al. | Emergency vehicle recognition and classification method using HSV color segmentation | |
CN111274886A (en) | Deep learning-based pedestrian red light violation analysis method and system | |
CN114329074B (en) | Traffic energy efficiency detection method and system for ramp road section | |
CN113033275A (en) | Vehicle lane-changing non-turn signal lamp analysis system based on deep learning | |
CN115424217A (en) | AI vision-based intelligent vehicle identification method and device and electronic equipment | |
Zhang et al. | Vehicle detection in UAV aerial images based on improved YOLOv3 | |
CN114359196A (en) | Fog detection method and system | |
CN104778454A (en) | Night vehicle tail lamp extraction method based on descending luminance verification | |
Zhang et al. | A front vehicle detection algorithm for intelligent vehicle based on improved gabor filter and SVM | |
CN105574490A (en) | Vehicle brand identification method and system based on headlight image characteristics | |
CN115019263A (en) | Traffic supervision model establishing method, traffic supervision system and traffic supervision method | |
CN114882469A (en) | Traffic sign detection method and system based on DL-SSD model | |
CN114372556A (en) | Driving danger scene identification method based on lightweight multi-modal neural network | |
Visshwak et al. | On-the-fly traffic sign image labeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||