CN116153108A

CN116153108A - Method for evaluating safety influence of illumination on intersection by using random forest model

Info

Publication number: CN116153108A
Application number: CN202211740037.9A
Authority: CN
Inventors: 汪圆; 贾晨雨; 杜文俊; 殷玉明; 方浩杰
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2022-12-31
Filing date: 2022-12-31
Publication date: 2023-05-23

Abstract

A method for evaluating the influence of illumination on intersection safety using a random forest model, comprising: 1. designing the experiment; 2. processing the data; 3. analyzing the variables and the data; 4. establishing a model and determining evaluation indexes; 5. training the model and analyzing the results. The invention takes the difference in vehicle speed before and after the intersection entrance lane as the measure of a driver's speed change, and uses the random forest method to build a regression model whose output variable is the speed change and whose input variables are traffic characteristics such as illuminance. The invention can analyze the influence of each feature variable on the speed at which drivers pass through the intersection and calculate the illuminance value that supports safe driving in different scenarios.

Description

Method for evaluating safety influence of illumination on intersection by using random forest model

Technical Field

The invention relates to the field of intersection safety, and in particular to a method for evaluating the influence of illumination on intersection safety using a random forest model.

Background

Intersection safety has long been an important part of addressing local road safety challenges. Intersections vary greatly in size, shape, number of entrance lanes, and number of turning lanes. Signalized intersections typically carry the heaviest traffic and are complex to operate, and many factors can lead to safety problems there. Maneuvers such as crossing and turning at an intersection may bring a vehicle into conflict with other vehicles, pedestrians, and cyclists. More than 50% of car accidents in urban areas and more than 30% in rural areas are attributed to intersections. According to reports of the National Highway Traffic Safety Administration (NHTSA), intersection-related deaths have averaged about 25% of all traffic deaths in the past few years, and intersection-related injuries about 50% of all traffic injuries.

Illumination at intersections is one of the key factors affecting safety at night. The International Commission on Illumination (CIE) reports that more than half of accidents occur at night, which indicates that the proportion of collisions at night is higher than during the day and that poor illumination can lead to serious safety problems and road traffic accidents. Under different traffic conditions, there is a statistically significant relationship between average road luminance and safety, and the accident rate decreases as the visibility level increases. Maintaining proper illuminance in the driving environment of an intersection is therefore critical to driving safety. In summary, a method for evaluating the influence of illuminance on intersection safety is needed. Such a method is significant for the design of intelligent intersection lighting systems based on real-time detection of driver speed, and management agencies can also use it to formulate quantifiable measures that improve intersection safety.

Disclosure of Invention

To address the intersection safety problem described above, the invention provides a method for evaluating the influence of illumination on intersection safety using a random forest model.

The method takes the difference in vehicle speed before and after the intersection entrance lane as the measure of a driver's speed change, and uses the random forest method to build a regression model whose output variable is the speed change and whose input variables are traffic characteristics such as illuminance. The method can analyze the influence of each feature variable on the speed at which drivers pass through the intersection and calculate the illuminance value that supports safe driving in different scenarios.

The technical scheme of the invention is as follows:

A method for evaluating the influence of illumination on intersection safety using a random forest model, comprising the following steps:

1. designing the experiment;

The field test is performed at night on typical workdays (e.g., Tuesday, Wednesday, Thursday). The selected intersections should be as similar as possible in size and environmental conditions, and all should be under signal control. Each intersection should have a center line, separation lines between motor-vehicle and non-motor-vehicle lanes, and left-turn lanes. The speed limit at all intersections does not exceed 50 km/h. The width of each entrance lane is measured with a laser rangefinder.

Video for initial traffic data collection is recorded with an unmanned aerial vehicle, and each intersection is surveyed for the same duration. GPS is used to identify parameters of the aircraft such as flight altitude and battery level. The present method uses only vehicle and other traffic data on the horizontal two-way road.

In addition, the illuminance of each intersection is recorded while the video is shot. The measuring instrument is placed on the ground to measure the illuminance around the sidewalks, and measurements are taken at regular intervals during the night. In each period, 12 points are recorded at each intersection (the start point, middle point, and end point of each of the 4 sidewalks). Finally, the 12 illuminance values around the sidewalks are averaged to give the intersection illuminance for the corresponding period:

$$E = \frac{1}{12}\sum_{i=1}^{12} E_i, \tag{1}$$

where $E_i$ is the illuminance measured at the $i$-th recorded point and $E$ is the intersection illuminance for the period.

2. processing the data;

To examine the influence of illumination on the change in vehicle speed as a driver passes through an intersection, the data collected by the unmanned aerial vehicle are first screened. A valid event is defined as a driver passing through the intersection entrance without other disturbances. The criteria for vehicle selection in video processing are:

1) No selected vehicle is affected by interactions with other road users (e.g., other vehicles or pedestrians).

2) Only vehicles that pass freely through the intersection are selected (vehicles stopped by a red signal are excluded).

3) Turning vehicles are excluded from the dataset.

After the valid video data are identified, the video is processed with the Tracker software to obtain trajectory and speed data for vehicles during the green interval, before and after the intersection entrance. The software can automatically track only one vehicle at a time. Each vehicle is treated as a point mass and is captured at 24 frames per second. Tracker calculates the actual coordinates and the actual speed by converting the pixel coordinates and the pixel speed. The parameters t (time), x (abscissa), y (ordinate), and v (speed) of each vehicle are then exported from the software.
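The conversion from pixel tracks to real coordinates and speeds can be illustrated with a minimal sketch. It is not Tracker's own implementation: the function name `pixel_track_to_speeds`, the single `metres_per_pixel` scale factor, and the frame-to-frame finite difference are all assumptions made for illustration; only the 24 frames per second rate comes from the text.

```python
from math import hypot

FPS = 24  # frame rate stated in the text


def pixel_track_to_speeds(pixel_xy, metres_per_pixel, fps=FPS):
    """Convert per-frame pixel positions to (t, x, y, v) rows.

    v is the finite-difference speed in km/h between consecutive frames;
    the first frame has no predecessor, so its speed is None.
    """
    rows = []
    prev = None
    for frame, (px, py) in enumerate(pixel_xy):
        # Assumed calibration: one uniform scale for both image axes.
        x, y = px * metres_per_pixel, py * metres_per_pixel
        if prev is None:
            v = None
        else:
            dist = hypot(x - prev[0], y - prev[1])  # metres travelled in one frame
            v = dist * fps * 3.6  # m/frame -> m/s -> km/h
        rows.append((frame / fps, x, y, v))
        prev = (x, y)
    return rows


# Hypothetical track: 10 px per frame along x, with 0.05 m per pixel.
track = [(10 * i, 0) for i in range(4)]
rows = pixel_track_to_speeds(track, metres_per_pixel=0.05)
```

In this hypothetical track the vehicle covers 0.5 m per frame, which corresponds to 12 m/s, or about 43.2 km/h.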

3. Analyzing the variables and the data;

The invention uses the speed change as the dependent variable to measure how safely drivers pass through the intersection. Speed change is the change in speed of a single vehicle; it is a useful measure for evaluating safety, and a larger speed change is associated with a higher accident rate. A small change in vehicle speed before and after the intersection entrance lane indicates safe driving. The speed change is calculated as follows:

$$\Delta V = V_s - V_o, \tag{2}$$

where $V_s$ is the average vehicle speed in the starting area of the intersection (5 meters long in this study), and $V_o$ is the initial speed, calculated as the average vehicle speed before the vehicle begins to pass through the intersection (i.e., from 60 to 40 meters before the intersection start line).
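Equation (2) can be sketched in a few lines. The function name `speed_change`, the signed-distance convention (negative values are upstream of the intersection start line), and the sample values are assumptions for illustration; the 5-meter starting area and the 60 to 40 meter approach zone come from the text.

```python
from statistics import mean


def speed_change(samples, start_zone=(0.0, 5.0), approach_zone=(40.0, 60.0)):
    """Return delta_v = V_s - V_o in km/h.

    samples: list of (position_m, speed_kmh), where position_m is the signed
    distance past the intersection start line (negative = upstream).
    """
    # V_s: mean speed inside the intersection starting area.
    v_s = mean(v for d, v in samples if start_zone[0] <= d <= start_zone[1])
    # V_o: mean speed 60 to 40 m before the start line.
    v_o = mean(v for d, v in samples
               if approach_zone[0] <= -d <= approach_zone[1])
    return v_s - v_o


# Hypothetical track of one vehicle that slows down while approaching.
samples = [(-60.0, 42.0), (-50.0, 40.0), (-40.0, 38.0), (-20.0, 36.0),
           (0.0, 35.0), (2.5, 34.0), (5.0, 33.0)]
delta_v = speed_change(samples)  # V_s = 34, V_o = 40
```

Under this sign convention a negative result indicates deceleration into the intersection and a positive result indicates acceleration.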

The independent variables used in the invention are illuminance (Illuminance), initial speed (V0), left-turn traffic flow (LeftV), right-turn traffic flow (RightV), straight traffic flow (StraightV), total traffic flow (TotalV), ratio of straight to right-turn traffic flow (RatioSR), and number of entrance lanes (NumLane). The illuminance at the intersection is calculated according to equation (1). The illuminance value at a specific time point is then calculated by linear interpolation. Linear interpolation fits a straight line between adjacent known data points and uses it to construct new data points within the range of those points. All traffic flows (left-turn, right-turn, straight, and total) are counted once per fixed time period of the experiment. Likewise, the traffic flow at an arbitrary time point is obtained by linear interpolation.
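The interpolation step can be sketched as follows. The function name `interpolate` and the illuminance readings are hypothetical; the sketch only assumes that the recording times are sorted and that the query time lies inside the recorded interval.

```python
from bisect import bisect_right


def interpolate(t, times, values):
    """Linearly interpolate values (recorded at sorted times) at time t."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the recorded interval")
    j = bisect_right(times, t)  # index of the first recorded time after t
    if j == len(times):  # t equals the last recorded time
        return values[-1]
    t0, t1 = times[j - 1], times[j]
    w = (t - t0) / (t1 - t0)  # fractional position between the two records
    return values[j - 1] + w * (values[j] - values[j - 1])


# Hypothetical illuminance readings (lux) at minutes 0, 15 and 30.
times = [0, 15, 30]
lux = [48.0, 42.0, 30.0]
lux_at_5 = interpolate(5, times, lux)  # one third of the way from 48 to 42
```

The same function applies to traffic flow counts, with the counting interval in place of the illuminance recording interval.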

4. Establishing a model and determining an evaluation index;

The invention uses a random forest (RF) model to model the influence of traffic factors such as intersection illuminance on a driver's speed change. Each decision tree in the model is trained on a random sample drawn from the dataset, and the results of all decision trees are then combined. Compared with a linear model, a random forest regression model can capture nonlinear interactions between the features and the target and can quickly analyze the relationship between the data features and the label data. It handles tabular data well when the features are numerical or are categorical with fewer than a few hundred categories.

The RF algorithm performs regression modeling with an ensemble of decision trees, combining bagging (bootstrap aggregation) with bootstrap resampling. The RF algorithm flow of the invention is as follows:

41) Bootstrapping: each base learner is a simple decision tree. Each tree is built from a random set of observations selected from the training dataset. All base models are built independently on different subsamples of the dataset. The selected data samples (approximately two-thirds of the cases) are called bootstrap samples, and the remaining samples (approximately one-third of the cases) are called out-of-bag (OOB) samples.

42) Training: the RF model generates a forest of decision trees. Each tree is grown using a random subset of all features (i.e., some of Illuminance, V0, LeftV, RightV, StraightV, TotalV, RatioSR, and NumLane), and the best split among the sampled features is chosen to build the tree.

43) Testing/voting: a new or test sample can be predicted by averaging the predictions of all decision trees (regression) or by taking a majority vote over the trees (classification). In the invention, the final regression result of the RF algorithm is the average of the outputs of all decision trees built on the bootstrap samples.
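The bootstrapping and averaging steps can be sketched as follows. This is an illustration, not the R implementation used in the invention: the names `bootstrap_split` and `forest_predict`, the sample size of 1000, and the two stand-in "trees" are assumptions. Drawing n indices with replacement leaves roughly a share 1/e (about 37%) of the cases out of the bag, which matches the "approximately one-third" stated above.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag) index lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]  # never drawn: out-of-bag
    return in_bag, oob


def forest_predict(trees, x):
    """Regression forest output: the mean of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


rng = random.Random(0)
in_bag, oob = bootstrap_split(1000, rng)  # hypothetical sample size
oob_share = len(oob) / 1000  # close to 1/e, i.e. roughly one third

# Two stand-in "trees" (plain functions) to show the averaging step only.
trees = [lambda x: x + 1.0, lambda x: x + 3.0]
prediction = forest_predict(trees, 2.0)  # (3.0 + 5.0) / 2 = 4.0
```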

RF can calculate the mean squared error (MSE) on the OOB samples. Because the OOB error is an internal estimate of the prediction error, a separate validation procedure (e.g., cross-validation) may be unnecessary. To verify the accuracy of the RF regression model, the mean of squared residuals (MSE) and the percent variance explained (% VarExplained) are used as evaluation indexes.

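The mean squared error is calculated as follows:

$$\mathrm{MSE}_{\mathrm{OOB}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{\mathrm{OOB}}\right)^2,$$

where $\hat{y}_i^{\mathrm{OOB}}$ is the OOB prediction for the $i$-th observation (the average prediction of the trees for which that observation is out-of-bag), $y_i$ is the observed value of the $i$-th observation, and $n$ is the number of observations. The percent variance explained (% VarExplained) is calculated as follows:

$$\%\,\mathrm{VarExplained} = \left(1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\hat{\sigma}_y^2}\right) \times 100\%,$$

where $\hat{\sigma}_y^2$ is the variance of $y$ calculated with $n$ as the divisor. The accuracy of the RF model can be expressed by the % VarExplained value: the greater the value, the higher the model accuracy.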
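The two evaluation indexes can be sketched directly, assuming the usual definitions for random forest regression: the mean of squared OOB residuals, and one minus the ratio of that MSE to the variance of the response computed with divisor n. The function names and the four data points are hypothetical.

```python
def oob_mse(y_true, y_oob_pred):
    """Mean of squared residuals between observed values and OOB predictions."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_oob_pred)) / n


def percent_var_explained(y_true, y_oob_pred):
    """100 * (1 - MSE / Var(y)), with the variance computed using divisor n."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    var_n = sum((y - y_bar) ** 2 for y in y_true) / n  # divisor n, not n - 1
    return 100.0 * (1.0 - oob_mse(y_true, y_oob_pred) / var_n)


# Hypothetical speed changes (km/h) and their OOB predictions.
y = [0.0, 2.0, 4.0, 6.0]
y_hat = [1.0, 2.0, 4.0, 5.0]
mse_value = oob_mse(y, y_hat)               # (1 + 0 + 0 + 1) / 4 = 0.5
pve_value = percent_var_explained(y, y_hat)  # variance is 5, so about 90
```

A model that always predicts the mean of y would score 0 on the second index, and a model with worse predictions than that scores below 0.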

Importantly, RF can calculate the importance of a feature variable as the increase in prediction error when the OOB data for that variable are permuted while all other variables remain unchanged. Compared with other models, random forest analysis has clear advantages: adding more trees does not lead to overfitting, and the generalization error converges to a limit as the number of trees grows; the problem of multicollinearity is also alleviated, because sampling the variables reduces the likelihood of selecting highly correlated features together. In the RF algorithm, the contribution of highly correlated features is retained without unduly affecting the top-ranked influential features. Unlike some other learners, such as SVMs, the random process in RF allows it to maintain high performance when there are many noise features.
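The permutation idea behind the importance measure can be sketched as follows. This simplified version permutes one feature over a single evaluation set and measures the increase in MSE; a real random forest does this per tree on that tree's OOB cases and averages the increases. The function names, the stand-in model, and the data rows are assumptions for illustration.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions over the rows."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, rng):
    """Increase in MSE after shuffling one feature column across the rows."""
    baseline = mse(model, rows, targets)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    # Copy each row with the shuffled value; all other features are unchanged.
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
    return mse(model, permuted, targets) - baseline


# Stand-in model that depends on Illuminance only (hypothetical).
def model(row):
    return 2.0 * row["Illuminance"]


rows = [{"Illuminance": float(lux), "NumLane": lanes}
        for lux, lanes in [(5, 2), (15, 3), (25, 2), (45, 4), (55, 3), (65, 4)]]
targets = [model(r) for r in rows]

rng = random.Random(0)
imp_illuminance = permutation_importance(model, rows, targets, "Illuminance", rng)
imp_numlane = permutation_importance(model, rows, targets, "NumLane", rng)
```

Because the stand-in model ignores NumLane, permuting that column cannot change the error, so its importance is exactly zero, while permuting Illuminance can only increase the error.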

5. Training a model and analyzing results;

Valid vehicle tracking segments are identified from the unmanned aerial vehicle video. The analysis uses 1 dependent variable (speed change) and 8 independent variables (initial speed, illuminance, etc.).

The random forest algorithm is implemented in R (x64 3.3.3). The dataset is randomly split into two parts: 70% is used as training data and the remaining 30% as test data. The model requires two main parameters: the number of decision trees (n_tree) and the number of variables sampled at each split (m_try). The larger the value of m_try, the greater both the strength of each tree and the correlation between trees. Choosing a suitable m_try value is therefore of vital importance in RF analysis. For regression problems, the suggested values of m_try are 0.5, 1, and 2 times one-third of the total number of input variables.
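The random 70/30 split can be sketched as follows. The function name `train_test_split`, the fixed seed, and the 100 placeholder records are assumptions for illustration; the invention itself performs this step in R.

```python
import random


def train_test_split(records, train_share=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 record identifiers.
train, test = train_test_split(range(100))
```

Fixing the seed makes the split reproducible, which helps when several parameter settings are compared on the same partition.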

The goodness of fit (percent variance explained) on the RF test data is analyzed, and the importance of each variable in the final model is calculated to represent its contribution to the predicted response. To further quantify the relationship between illuminance and intersection safety, two-dimensional kernel density plots are drawn over all selected data to study the speed change under different illuminance values, initial speeds, and straight traffic flows. A two-dimensional kernel density plot is a smoothed color-density representation of a scatter plot; it is a nonparametric technique that estimates the probability density function by kernel density estimation. The goal of density estimation is to take a finite data sample and infer the underlying probability density function everywhere, including where there are no data points. In kernel density estimation, the contribution of each data point is smoothed from a single point into its neighborhood. The smoothed density plots show the average trend of the scatter plots.
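A two-dimensional kernel density estimate can be sketched with a Gaussian product kernel. The text does not state which kernel or bandwidths were used, so the Gaussian kernel, the bandwidths `hx` and `hy`, the function name `kde2d`, and the sample points are all assumptions.

```python
from math import exp, pi


def kde2d(points, x, y, hx, hy):
    """Gaussian product-kernel density estimate at (x, y).

    points: list of (xi, yi) observations; hx, hy: bandwidths per axis.
    """
    total = 0.0
    for xi, yi in points:
        u, v = (x - xi) / hx, (y - yi) / hy
        total += exp(-0.5 * (u * u + v * v))  # each point spreads into its neighborhood
    return total / (len(points) * 2.0 * pi * hx * hy)


# Hypothetical (illuminance in lux, speed change in km/h) observations.
points = [(42.0, 0.5), (45.0, -0.3), (48.0, 0.1), (8.0, -12.0)]
dense = kde2d(points, 45.0, 0.0, hx=5.0, hy=2.0)    # inside the cluster
sparse = kde2d(points, 25.0, -6.0, hx=5.0, hy=2.0)  # between the groups
```

Evaluating `kde2d` on a regular grid of illuminance and speed-change values gives the matrix that a density plot colors.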

The working principle of the invention is as follows: an unmanned aerial vehicle records video of each intersection for initial traffic data collection, and the speed change, calculated as the difference in speed before and after the intersection entrance lane, serves as the safety evaluation index. A random forest (RF) model is established, and regression analysis is carried out with the speed change as the output and illuminance, initial speed, left-turn traffic flow, right-turn traffic flow, straight traffic flow, total traffic flow, ratio of straight to right-turn traffic flow, and number of entrance lanes as the inputs.

The advantages of the invention are as follows: the invention can analyze the influence of each feature variable on the speed at which drivers pass through the intersection and calculate the illuminance value that supports safe driving in different scenarios, which provides insight for the design of intelligent intersection lighting systems based on real-time detection of driver speed. Authorities can also use the method to formulate quantifiable measures that improve the safety level of intersections.

Drawings

FIGS. 1(a)-1(d) are geometry diagrams of the four intersections of the present invention (overhead views taken in the daytime), wherein FIG. 1(a) is the Huangshan-Xinglong street intersection, FIG. 1(b) is the Leshan-Xinglong street intersection, FIG. 1(c) is the Taishan-Oriental street intersection, and FIG. 1(d) is the Huangshan-Meng street intersection;

FIG. 2 is a screenshot of a drone shooting interface of the present invention;

FIG. 3 shows the video processing procedure in the Tracker software according to the present invention;

FIG. 4 is a schematic illustration of the vehicle speed change calculation of the present invention;

FIG. 5 is a flow chart of regression modeling of the RF algorithm of the present invention;

FIG. 6 is a graph showing the distribution of the speed change under different illuminance values according to the present invention;

FIGS. 7(a)-7(b) are graphs showing the distribution of the speed change versus illuminance for different initial speed scenarios according to the present invention, wherein FIG. 7(a) is for the low-speed scenario and FIG. 7(b) is for the high-speed scenario;

FIGS. 8(a)-8(b) are graphs showing the distribution of the speed change versus illuminance for different straight traffic flow scenarios according to the present invention, wherein FIG. 8(a) is for the smaller traffic flow scenario and FIG. 8(b) is for the larger traffic flow scenario.

Detailed Description

The technical scheme of the invention is further described below with reference to the accompanying drawings.

The method for evaluating the influence of illumination on intersection safety using a random forest model comprises the following steps:

1. designing the experiment;

Field tests were conducted at night, from 17:30 to 19:30, on typical workdays (Tuesday, Wednesday, Thursday), from 11 months 2017 to 18 months 12. Data were collected at 4 intersections (the Huangshan-Xinglong street intersection, the Leshan-Xinglong street intersection, the Taishan-Oriental street intersection, and the Huangshan-Meng street intersection) in a built-up area of Nanjing, Jiangsu, China. Fig. 1 shows daytime overhead views of the four signalized intersections. The four intersections are almost the same size and are all under signal control. Each intersection has a center line, separation lines between motor-vehicle and non-motor-vehicle lanes, and left-turn lanes. The speed limit at all intersections does not exceed 50 km/h. The width of each entrance lane was measured with a laser rangefinder. The number of lanes and the entrance widths of the four intersections are listed in Table 1.

Table 1. Number and width of entrance lanes at each intersection

Video for initial traffic data collection was recorded with an unmanned aerial vehicle (DJI Phantom 3 Advanced), and each intersection was surveyed for two hours. Fig. 2 is a screenshot of the unmanned aerial vehicle operation interface. GPS was used to identify parameters of the aircraft such as flight altitude and battery level. The present method uses only vehicle and other traffic data on the horizontal two-way road.

In addition, the illuminance of the four intersections was recorded while the video was shot. Illuminance was measured in lux with an illuminance meter (Konica Minolta T-10A). The meter was placed on the ground to measure the illuminance around the sidewalks, and measurements were taken at regular intervals during the night. In each period (typically 15 minutes), 12 points were recorded at each intersection (the start point, middle point, and end point of each of the 4 sidewalks). Finally, the 12 illuminance values around the sidewalks were averaged to give the intersection illuminance for the corresponding period:

$$E = \frac{1}{12}\sum_{i=1}^{12} E_i, \tag{1}$$

where $E_i$ is the illuminance measured at the $i$-th recorded point and $E$ is the intersection illuminance for the period.
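The averaging of the 12 measurement points can be sketched as below. The function name `intersection_illuminance` and the readings are hypothetical; the only facts taken from the text are that 12 points are recorded per period and that their mean is used.

```python
from statistics import fmean


def intersection_illuminance(readings_lux):
    """Mean of the 12 readings taken in one period (3 points on each of 4 sidewalks)."""
    if len(readings_lux) != 12:
        raise ValueError("expected 12 readings: start, middle and end of 4 sidewalks")
    return fmean(readings_lux)


# Hypothetical readings (lux) for one 15-minute period.
readings = [40.0, 42.0, 44.0] * 4
period_lux = intersection_illuminance(readings)  # 42.0
```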

2. processing the data;

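To examine the influence of illumination on the change in vehicle speed as a driver passes through an intersection, the data collected by the unmanned aerial vehicle are first screened. A valid event is defined as a driver passing through the intersection entrance without other disturbances. The criteria for vehicle selection in video processing are:

1) No selected vehicle is affected by interactions with other road users (e.g., other vehicles or pedestrians).

2) Only vehicles that pass freely through the intersection are selected (vehicles stopped by a red signal are excluded).

3) Turning vehicles are excluded from the dataset.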

After the valid video data are identified, the video is processed with the Tracker software to obtain trajectory and speed data for vehicles during the green interval, before and after the intersection entrance, as shown in Fig. 3. The software can automatically track only one vehicle at a time. Each vehicle is treated as a point mass and is captured at 24 frames per second. Tracker calculates the actual coordinates and the actual speed by converting the pixel coordinates and the pixel speed. The parameters t (time), x (abscissa), y (ordinate), and v (speed) of each vehicle are then exported from the software.

3. Analyzing the variables and the data;

The invention uses the speed change as the dependent variable to measure how safely drivers pass through the intersection. Speed change is the change in speed of a single vehicle; it is a useful measure for evaluating safety, and a larger speed change is associated with a higher accident rate. A small change in vehicle speed before and after the intersection entrance lane indicates safe driving. Fig. 4 shows how the vehicle speed change is calculated in this study. The speed change is calculated as follows:

$$\Delta V = V_s - V_o, \tag{2}$$

where $V_s$ is the average vehicle speed in the starting area of the intersection (5 meters long in this study), and $V_o$ is the initial speed, calculated as the average vehicle speed before the vehicle begins to pass through the intersection (i.e., from 60 to 40 meters before the intersection start line).

The independent variables used in the invention are illuminance (Illuminance), initial speed (V0), left-turn traffic flow (LeftV), right-turn traffic flow (RightV), straight traffic flow (StraightV), total traffic flow (TotalV), ratio of straight to right-turn traffic flow (RatioSR), and number of entrance lanes (NumLane). The illuminance at the intersection is calculated according to equation (1). Illuminance was recorded every 15 minutes at night, and the illuminance value at a specific time point was calculated by linear interpolation. Linear interpolation fits a straight line between adjacent known data points and uses it to construct new data points within the range of those points. All traffic flows (left-turn, right-turn, straight, and total) were counted every 5 minutes in this experiment. Likewise, the traffic flow at an arbitrary time point was obtained by linear interpolation.

4. Establishing a model and determining an evaluation index;

The RF algorithm performs regression modeling with an ensemble of decision trees, combining bagging (bootstrap aggregation) with bootstrap resampling. As shown in Fig. 5, the RF algorithm flow of the invention is as follows:

1) Bootstrapping: each base learner is a simple decision tree. Each tree is built from a random set of observations selected from the training dataset. All base models are built independently on different subsamples of the dataset. The selected data samples (approximately two-thirds of the cases) are called bootstrap samples, and the remaining samples (approximately one-third of the cases) are called out-of-bag (OOB) samples.

2) Training: the RF model generates a forest of decision trees. Each tree is grown using a random subset of all features (i.e., some of Illuminance, V0, LeftV, RightV, StraightV, TotalV, RatioSR, and NumLane), and the best split among the sampled features is chosen to build the tree.

3) Testing/voting: a new or test sample can be predicted by averaging the predictions of all decision trees (regression) or by taking a majority vote over the trees (classification). In the invention, the final regression result of the RF algorithm is the average of the outputs of all decision trees built on the bootstrap samples.

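The mean squared error is calculated as follows:

$$\mathrm{MSE}_{\mathrm{OOB}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{\mathrm{OOB}}\right)^2,$$

where $\hat{y}_i^{\mathrm{OOB}}$ is the OOB prediction for the $i$-th observation (the average prediction of the trees for which that observation is out-of-bag), $y_i$ is the observed value of the $i$-th observation, and $n$ is the number of observations. The percent variance explained (% VarExplained) is calculated as follows:

$$\%\,\mathrm{VarExplained} = \left(1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\hat{\sigma}_y^2}\right) \times 100\%,$$

where $\hat{\sigma}_y^2$ is the variance of $y$ calculated with $n$ as the divisor.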

5. Training a model and analyzing results;

A total of 213 valid vehicle tracking segments were identified from the unmanned aerial vehicle video. The analysis used 1 dependent variable (speed change) and 8 independent variables (initial speed, illuminance, etc.). Table 2 summarizes the characteristics of all variables in the dataset selected for the invention.

Table 2. Characteristics of the variables

The random forest algorithm was implemented in R (x64 3.3.3). The dataset was randomly split into two parts: 70% was used as training data and the remaining 30% as test data. The model requires two main parameters: the number of decision trees (n_tree) and the number of variables sampled at each split (m_try). The larger the value of m_try, the greater both the strength of each tree and the correlation between trees. Choosing a suitable m_try value is therefore of vital importance in RF analysis. For regression problems, the suggested values of m_try are 0.5, 1, and 2 times one-third of the total number of input variables (2, 3, and 6 in this study), and the candidate values of n_tree are 200, 300, and 500. To avoid overfitting when applying RF analysis to a small dataset, the recommended values of both parameters were tested. Finally, the invention sets m_try = 3 and n_tree = 500, which gives the minimum OOB error and the maximum percent variance explained.
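The candidate parameter values can be reproduced with a short sketch. The text gives the results (2, 3, and 6 for m_try) but not the rounding rule; rounding up is an assumption that happens to reproduce those values for 8 input variables. The function name `mtry_candidates` and the grid construction are likewise illustrative.

```python
from itertools import product
from math import ceil

N_FEATURES = 8  # number of input variables listed in the text


def mtry_candidates(n_features, multipliers=(0.5, 1, 2)):
    """Candidate m_try values: multiples of one-third of the feature count."""
    base = n_features / 3
    # Assumption: round up, which yields 2, 3 and 6 for 8 features.
    return [ceil(m * base) for m in multipliers]


# Every (m_try, n_tree) combination that would be compared by OOB error.
grid = list(product(mtry_candidates(N_FEATURES), (200, 300, 500)))
```

Each pair in `grid` would be fitted once, and the pair with the lowest OOB error and the highest percent variance explained would be kept; in the invention this is m_try = 3 with n_tree = 500.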

The RF analysis results show that the goodness of fit (percent variance explained) on the test data is 86.44%, which indicates that the regression model is highly accurate. The importance of each variable in the final model, representing its contribution to the predicted response, was calculated and is shown in Table 3. As the table shows, initial speed, illuminance, and straight traffic flow are the three most important variables in the final model, with RF importance values of 23.47, 20.43, and 13.29, respectively. In this study, the initial speed before the vehicle enters the intersection therefore has the greatest effect on the change in vehicle speed, followed by illuminance and straight traffic flow.

Table 3. RF importance scores of the features

To further quantify the relationship between illuminance and intersection safety, two-dimensional kernel density plots were drawn over all selected data to study the speed change under different illuminance values, initial speeds, and straight traffic flows. A two-dimensional kernel density plot is a smoothed color-density representation of a scatter plot; it is a nonparametric technique that estimates the probability density function by kernel density estimation. The goal of density estimation is to take a finite data sample and infer the underlying probability density function everywhere, including where there are no data points. In kernel density estimation, the contribution of each data point is smoothed from a single point into its neighborhood. The smoothed density plots show the average trend of the scatter plots.

(1) Illuminance level

The two-dimensional kernel density plot in Fig. 6 depicts the distribution of the speed change at different illuminance values. The speed change is clearly and stably concentrated around zero when the illuminance is between 40 and 50 lux. In this range, drivers have sufficient visibility for safe driving and maintain their speed through the intersection during the green interval. Two quite different trends in the distribution of the speed change can also be observed outside this range. When the illuminance is below 30 lux, and in particular below 10 lux, the speed change is widely distributed between 0 and 20 km/h. This means that under such lighting conditions drivers may not be able to see the intersection clearly and therefore decelerate to varying degrees as they pass through it. When the illuminance is 60 lux or more, there are 16 cases in which the speed change is 5 km/h or more. This means that the illumination satisfies the driver's visibility needs and drivers may actively accelerate at the intersection. The study therefore strongly suggests keeping the intersection illuminance between 40 and 50 lux, which balances traffic safety and energy consumption.

(2) Initial velocity

The relationship between the speed change and the illuminance at different initial speeds was studied. As Table 2 shows, the mean initial speed was 42.49 km/h and the median was 40 km/h. An initial speed below 40 km/h was therefore defined as the low-speed scenario and an initial speed above 40 km/h as the high-speed scenario. Figs. 7(a) and 7(b) are two-dimensional kernel density plots of the distribution of the speed change versus illuminance at low and high initial speeds, respectively. The figures show that vehicles with a low initial speed tend to accelerate under high illuminance (above 50 lux), whereas vehicles with a high initial speed tend to decelerate under low illuminance (below 40 lux). This suggests that high illuminance (above 40 lux) is needed more at high speed than at low speed. An illuminance of 40-50 lux is the best choice for drivers arriving at the intersection at both low and high initial speeds.

(3) Straight traffic flow

The distribution of the speed change versus illuminance under different straight traffic flow conditions was studied, as shown in Fig. 8. In this study, a straight traffic flow of 400 vehicles/h or less is defined as the smaller traffic flow scenario, and a flow above 400 vehicles/h as the larger traffic flow scenario. The results show that the distribution of the speed change across illuminance values is less distinct in the larger traffic flow scenario than in the smaller one; that is, the effect of illumination on the speed change at the intersection is relatively insignificant when traffic is heavy. This may be due to car-following behavior in heavy traffic.
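The scenario definitions used in the analysis can be written as a small classifier. The thresholds (40 km/h and 400 vehicles per hour) come from the text; the handling of values exactly at a threshold and the function name `scenario` are assumptions, since the text does not settle those cases.

```python
SPEED_THRESHOLD_KMH = 40.0  # median initial speed reported in the text
FLOW_THRESHOLD_VPH = 400.0  # straight traffic flow threshold from the text


def scenario(initial_speed_kmh, straight_flow_vph):
    """Label one observation with its speed and traffic flow scenarios."""
    # Assumption: exactly 40 km/h counts as high speed.
    speed = "low speed" if initial_speed_kmh < SPEED_THRESHOLD_KMH else "high speed"
    # Assumption: exactly 400 vehicles/h counts as the smaller flow scenario.
    flow = ("smaller flow" if straight_flow_vph <= FLOW_THRESHOLD_VPH
            else "larger flow")
    return speed, flow


labels = scenario(35.0, 520.0)
```

Splitting the observations this way before drawing the density plots gives one plot per scenario, as in Figs. 7 and 8.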

Claims

1. A method for evaluating the influence of illumination on intersection safety using a random forest model, comprising the steps of:

1. designing the experiment;

performing field tests at night on typical working days; the selected intersections should be as similar as possible in size and environmental conditions, and all are under signal control; each intersection should have a center line, separation lines between motor-vehicle and non-motor-vehicle lanes, and left-turn lanes; the speed limit at all intersections does not exceed 50 km/h; measuring the width of each entrance lane with a laser rangefinder;

acquiring video for initial traffic data collection with an unmanned aerial vehicle, and surveying each intersection for the same duration; identifying parameters of the aircraft such as flight altitude and battery level using GPS; the present method uses only vehicle and other traffic data on the horizontal two-way road;

in addition, recording the illuminance of each intersection while the video is shot; placing the measuring instrument on the ground to measure the illuminance around the sidewalks, and measuring the illuminance at fixed intervals during the night; recording 12 points at each intersection in each period (the start point, middle point, and end point of each of the 4 sidewalks); and finally, averaging the 12 illuminance values around the sidewalks to give the intersection illuminance for the corresponding period:

$$E = \frac{1}{12}\sum_{i=1}^{12} E_i, \tag{1}$$

where $E_i$ is the illuminance measured at the $i$-th recorded point and $E$ is the intersection illuminance for the period;

2. processing the data;

in order to examine the influence of illumination on the change in a driver's speed when passing through an intersection, the data acquired by the unmanned aerial vehicle are first screened; a valid event is defined as a driver passing through the intersection entrance without other interference; the criteria for vehicle selection in video processing include:

1) no selected vehicle is affected by other interactions;

2) the vehicles pass freely through the intersection, excluding vehicles stopped by a red light;

3) no turning vehicles are included in the dataset;

after the video data are screened, the video is processed with Tracker software to obtain trajectory data and speed data of vehicles during the green-light interval before and after the intersection entrance; with this software, only one vehicle can be tracked automatically at a time; each vehicle is treated as a particle and is captured at a rate of 24 frames per second; Tracker calculates actual coordinate values and speed values by converting pixel coordinates and pixel speeds; the following parameters are then exported from the software for each vehicle: time t, abscissa x, ordinate y, and speed v;

3. analyzing the variables and the data;

taking the speed change as the dependent variable to measure the safety of drivers at the intersection; the speed change is the change in speed of the same vehicle and is a useful measure for evaluating safety: the larger the speed change, the higher the accident rate; a smaller change in the driver's vehicle speed before and after the intersection entrance lane indicates safer driving; the speed change is calculated as follows:

$$\Delta V = V_s - V_o \tag{2}$$

in the formula, $V_s$ is the average vehicle speed in the area at the start of the intersection; $V_o$ denotes the initial speed, calculated as the average vehicle speed before the vehicle begins to pass through the intersection;
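The speed change of formula (2) can be illustrated with a minimal sketch. The function name `speed_change`, the track format (position along the travel direction in meters, with 0 at the intersection start line) and the sample data are assumptions made for illustration; the default averaging windows (60 to 40 m before the start line, and the first 5 m of the intersection area) follow values stated elsewhere in this document.

```python
from statistics import mean


def speed_change(track, before_window=(-60.0, -40.0), start_window=(0.0, 5.0)):
    """Return delta_v = V_s - V_o for one tracked vehicle.

    track: list of (position_m, speed) pairs. Position is measured along the
    travel direction with 0 at the intersection start line (assumed convention).
    """
    v_o = [v for s, v in track if before_window[0] <= s <= before_window[1]]
    v_s = [v for s, v in track if start_window[0] <= s <= start_window[1]]
    if not v_o or not v_s:
        raise ValueError("track does not cover both averaging windows")
    return mean(v_s) - mean(v_o)


# Hypothetical track: the vehicle slows from 12 to 10 (same speed unit throughout).
track = [(-60.0, 12.0), (-50.0, 12.0), (-40.0, 12.0),
         (-20.0, 11.0), (0.0, 10.0), (5.0, 10.0)]
delta_v = speed_change(track)  # negative value means the vehicle decelerated
```

A negative result indicates deceleration on entering the intersection; the magnitude is what the method treats as the safety-relevant quantity.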

the independent variables include illuminance (Illuminance), initial speed (V0), left-turn traffic flow (LeftV), right-turn traffic flow (RightV), straight traffic flow (StraightV), total traffic flow (TotalV), ratio of straight to right-turn traffic flow (RatioSR), and number of entrance lanes (NumLane); the illuminance of the intersection is calculated according to formula (1); the illuminance value at a specific time point is then calculated by linear interpolation; linear interpolation is a method of curve fitting using linear polynomials, which constructs new data points within the range of a set of known discrete data points; all traffic flows, including left-turn, right-turn, straight and total traffic flow, are counted at fixed time intervals; likewise, the traffic flow at an arbitrary time point is obtained by linear interpolation;
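The linear interpolation step can be sketched as follows. This is an illustrative implementation only; the function name `interpolate` and the sample measurement times and illuminance values are hypothetical.

```python
def interpolate(t, times, values):
    """Linearly interpolate `values`, sampled at strictly increasing `times`, at time t."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled range")
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)  # fractional position between the two samples
            return values[i] + w * (values[i + 1] - values[i])


# Hypothetical illuminance readings (lux) taken every 10 minutes.
sample_times = [0.0, 10.0, 20.0]
lux = [8.0, 12.0, 10.0]
lux_at_15 = interpolate(15.0, sample_times, lux)
```

The same function applies unchanged to traffic flow counts taken at fixed intervals.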

4. establishing a model and determining an evaluation index;

modeling the influence of traffic factors such as intersection illumination on the driver's speed change with a random forest (RF) model; each decision tree in the model is trained on a random sample drawn from the data set, and the results of all decision trees are finally combined; compared with a linear model, the random forest regression model can capture nonlinear interactions between the features and the target and rapidly analyze the correlation between the data features and the label data; it handles well tabular data with numerical features, or with categorical features of fewer than several hundred categories;

the RF algorithm performs regression modeling based on a set of decision trees and operates in combination with the bagging (bootstrap aggregating) and bootstrapping (resampling with replacement) techniques; the RF algorithm flow is as follows:

41) Bootstrapping: each base learner is a simple decision tree; each tree is constructed from a set of observations randomly selected from the training data set; all base models are built independently using different subsamples of the data set; the selected data samples are referred to as bootstrap samples, and the remaining data samples are referred to as out-of-bag (OOB) samples;

42) Training: the RF model generates a forest of decision trees, each tree using a random subset of all features (i.e., some of the features Illuminance, V0, LeftV, RightV, StraightV, TotalV, RatioSR and NumLane), and the best split is obtained by building the decision trees as described above;

43) Testing/Voting: the prediction for a new sample or test sample is obtained by averaging the predictions of all decision trees (regression) or by taking a majority vote over the decision trees (classification); the final regression result of the RF algorithm is obtained by averaging the outputs of all decision trees built on the bootstrap samples;
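Steps 41) to 43) can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the R implementation used in this method: each tree is reduced to a single split (a regression stump), the feature subset is drawn once per tree, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, features):
    """Fit a one-split regression tree using only the given feature indices."""
    best = None
    for f in features:
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of the split are non-empty.
        for threshold in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[f] > threshold]
            m_left, m_right = mean(left), mean(right)
            sse = (sum((v - m_left) ** 2 for v in left)
                   + sum((v - m_right) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, m_left, m_right)
    if best is None:  # every candidate feature is constant in this sample
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, m_left, m_right = stump
    return m_left if row[f] <= threshold else m_right


def fit_forest(X, y, n_tree=50, m_try=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_tree):
        in_bag = [rng.randrange(n) for _ in range(n)]   # 41) bootstrap sample
        oob = sorted(set(range(n)) - set(in_bag))       # out-of-bag rows
        features = rng.sample(range(p), m_try)          # 42) random feature subset
        stump = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag], features)
        forest.append((stump, oob))
    return forest


def predict_forest(forest, row):
    # 43) regression output: average over all trees
    return mean(predict_stump(stump, row) for stump, _ in forest)


# Synthetic data: the target depends only on the first feature.
X = [[float(i), float((7 * i) % 5), float((3 * i) % 4)] for i in range(20)]
y = [1.0 if i < 10 else 3.0 for i in range(20)]
forest = fit_forest(X, y, n_tree=50, m_try=2, seed=0)
pred_low = predict_forest(forest, X[0])
pred_high = predict_forest(forest, X[19])
```

The out-of-bag row indices are stored with each tree so that an OOB error can be computed from the rows a tree never saw during training.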

RF can calculate the mean square error (MSE) on the OOB samples; when this error is low, separate test verification may be unnecessary; in the RF regression model, to verify the accuracy of the model, the mean of squared residuals (MSE) and the percent variance explained (% VarExplained) are used as evaluation indexes;

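the mean square error is calculated as follows:

$$\mathrm{MSE}_{OOB} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{OOB}\right)^2$$

wherein $\hat{y}_i^{OOB}$ denotes the prediction obtained from the OOB data for the i-th observation, $y_i$ is the corresponding real value, and n is the number of observations; the percent variance explained (% VarExplained) is calculated as follows:

$$\%\,\mathrm{VarExplained} = \left(1 - \frac{\mathrm{MSE}_{OOB}}{\hat{\sigma}_y^2}\right) \times 100\%$$

in the formula, $\hat{\sigma}_y^2$ is the variance of the real values, calculated by taking n as the divisor; the accuracy of the RF model can be expressed by the % VarExplained value, and the greater the value, the higher the model accuracy;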
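The two evaluation indexes can be computed as in the following sketch, which assumes the usual definitions (mean of squared OOB residuals, and one minus the ratio of that error to the variance of the observed values computed with n as divisor, expressed in percent). The function names and the sample values are hypothetical.

```python
def oob_mse(y_true, y_oob_pred):
    """Mean of squared residuals between observed values and OOB predictions."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_oob_pred)) / n


def percent_var_explained(y_true, y_oob_pred):
    """Percent variance explained, with the variance computed using divisor n."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    var_n = sum((yt - y_bar) ** 2 for yt in y_true) / n  # divisor n, not n - 1
    return 100.0 * (1.0 - oob_mse(y_true, y_oob_pred) / var_n)


# Hypothetical observed speed changes and their OOB predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 2.0]
mse_value = oob_mse(y_true, y_pred)
pve_value = percent_var_explained(y_true, y_pred)
```

Note that the percentage can be negative when the OOB predictions are worse than simply predicting the mean of the observed values.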

importantly, RF can calculate the importance of a feature variable as the increase in prediction error when the OOB data for that variable are permuted while all other variables remain unchanged; the clear advantages of random forest analysis over other models include that it does not overfit as more trees are grown and that its generalization error converges to a limit; it alleviates the multicollinearity problem by reducing the likelihood that highly correlated features are selected together in the variable samples; in the RF algorithm, the contribution of highly correlated features is preserved, so the ranking of the top influential features is not greatly affected; unlike other classifiers such as SVMs, the random process in RF can maintain high performance when there are many noise features;
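The permutation idea behind the variable importance can be sketched as follows. This simplified version permutes one column of a single data set and uses an arbitrary prediction function as a stand-in for the forest; in RF the same comparison is made on the OOB data of each tree. The function names, the stand-in model and the data are hypothetical.

```python
import random


def mse(model, X, y):
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


def permutation_importance(model, X, y, feature, seed=0):
    """Increase in MSE after shuffling one feature column, all others unchanged."""
    rng = random.Random(seed)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_perm = [row[:feature] + [value] + row[feature + 1:]
              for row, value in zip(X, column)]
    return mse(model, X_perm, y) - mse(model, X, y)


# Stand-in model that uses only the first feature.
def model(row):
    return 2.0 * row[0]


X = [[1.0, 5.0], [2.0, 7.0], [3.0, 9.0]]
y = [2.0, 4.0, 6.0]
importance_used = permutation_importance(model, X, y, feature=0)
importance_unused = permutation_importance(model, X, y, feature=1)
```

A feature the model ignores yields an importance of zero, while permuting a feature the model relies on cannot lower the error below the unpermuted baseline in this example.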

5. training a model and analyzing results;

identifying valid vehicle tracking segments from the unmanned aerial vehicle video; the analysis is performed using 1 dependent variable and 8 independent variables;

the random forest algorithm was implemented in the R (x64 3.3.3) software; the data set was randomly split into two parts, of which 70% was used as training data and the remaining 30% as test data; the model requires two main parameters: the number of decision trees ($n_{tree}$) and the number of variables tried at each split ($m_{try}$); the larger the $m_{try}$ value, the greater the strength of each tree and the correlation between trees; thus, selecting a suitable $m_{try}$ value is critical in RF analysis;
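The 70/30 split and the choice of candidate $m_{try}$ values can be sketched as below. The candidate rule (0.5, 1 and 2 times one third of the number of input variables) is the one stated for regression elsewhere in this document; rounding one third down to an integer, the fixed random seed and the function names are assumptions made for illustration.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Randomly split rows into training and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def m_try_candidates(n_features):
    """Candidate m_try values: 0.5, 1 and 2 times one third of the feature count."""
    base = max(n_features // 3, 1)  # assumed rounding: floor, at least 1
    return sorted({max(base // 2, 1), base, min(2 * base, n_features)})


rows = list(range(100))            # placeholder for 100 vehicle records
train, test = split_train_test(rows)
candidates = m_try_candidates(8)   # 8 independent variables
```

Each candidate would then be evaluated, for example by its OOB error, and the best-performing value retained.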

analyzing the goodness of fit of the RF model on the test data, and calculating the importance of each variable in the final model to represent the contribution of each variable to the predicted response; to further quantify the relationship between illuminance and intersection safety, two-dimensional kernel density plots are drawn for all selected data sets, and the speed change under different illuminance, initial speed and straight traffic flow is studied; the two-dimensional kernel density plot is a smoothed color-density representation of the scatter plot, based on kernel density estimation, a non-parametric technique for estimating a probability density function; the goal of density estimation is to take a finite sample of data and infer the underlying probability density function everywhere, including where there are no data points; in kernel density estimation, the contribution of each data point is smoothed out from a single point into a neighborhood; these smoothed density plots show the average trend of the scatter plots.
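The two-dimensional kernel density estimate can be illustrated with a minimal sketch. A Gaussian product kernel with fixed bandwidths is assumed here; the kernel, the bandwidth parameters `hx` and `hy`, the function name and the sample points are illustrative choices, not details given in this document.

```python
import math


def kde2d(points, x, y, hx, hy):
    """Gaussian product-kernel density estimate at (x, y).

    points: list of (px, py) observations, e.g. (illuminance, speed change).
    hx, hy: bandwidths along each axis (assumed, chosen by the analyst).
    """
    total = 0.0
    for px, py in points:
        u, v = (x - px) / hx, (y - py) / hy
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(points) * 2.0 * math.pi * hx * hy)


# Hypothetical (illuminance, speed change) observations.
points = [(10.0, -1.0), (11.0, -1.5), (12.0, -0.5), (30.0, 0.5)]
dense = kde2d(points, 11.0, -1.0, hx=2.0, hy=1.0)   # near the cluster
sparse = kde2d(points, 20.0, 3.0, hx=2.0, hy=1.0)   # far from any observation
```

Evaluating the function on a grid and coloring each cell by its value gives the smoothed density plot; the estimate remains positive even where no observations lie, which is how the density is inferred everywhere.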

2. The method for evaluating the influence of illumination on intersection safety using a random forest model according to claim 1, wherein: the typical working days described in step 1 are Tuesday, Wednesday or Thursday.

3. The method for evaluating the influence of illumination on intersection safety using a random forest model according to claim 1, wherein: in step 3, $V_s$ is the average vehicle speed over the first 5 meters of the area at the start of the intersection; $V_o$ denotes the initial speed, calculated as the average vehicle speed over the section from 60 m to 40 m before the intersection start line.

4. The method for evaluating the influence of illumination on intersection safety using a random forest model according to claim 1, wherein: the bootstrap sample of step 41) is two-thirds of the data sample, and the out-of-bag sample is one-third of the data sample.

5. The method for evaluating the influence of illumination on intersection safety using a random forest model according to claim 1, wherein: the dependent variable in step 5 is the speed change, and the independent variables comprise the initial speed and the illuminance.

6. The method for evaluating the influence of illumination on intersection safety using a random forest model according to claim 1, wherein: for regression problems, the number of variables ($m_{try}$) described in step 5 is 0.5, 1 or 2 times one third of the total number of feature/input variables.