CN106095963B - Vehicle driving behavior analysis big data public service platform under internet + era - Google Patents

Info

Publication number
CN106095963B
Authority
CN
China
Prior art keywords
data
driving
vehicle
big data
platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610430864.6A
Other languages
Chinese (zh)
Other versions
CN106095963A (en)
Inventor
文巨峰
罗赞文
穆祥强
许伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jingda Information Technology Co Ltd
Original Assignee
Shanghai Jingda Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/904 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

A big data public service platform for analyzing vehicle driving behavior in the Internet+ era analyzes vehicle driving behavior and vehicle driving risk probability in three steps. Step one stores the driving data of a vehicle over a period of time on a big data platform and processes and analyzes the data with the MapReduce framework. Step two, based on the data processed on the big data platform, counts the abnormal driving events of the vehicle in a given area and time period; abnormal driving includes speeding, overloading, rapid acceleration and rapid deceleration, fatigue driving, and night driving. Step three, after the abnormal driving behaviors have been counted, calculates the vehicle risk probability with a self-organizing map neural network algorithm and predicts the subsequent driving risk of the vehicle. The invention collects the Beidou/GPS navigation data of the vehicle, preprocesses the collected data on the big data platform, and, after further data mining, analyzes and judges the driving behavior of the driver.

Description

Vehicle driving behavior analysis big data public service platform in the Internet+ era
Technical field:
The invention relates to the field of physics, in particular to data storage and processing technology, and specifically to a big data public service platform for vehicle driving behavior analysis in the Internet+ era.
Background art:
As informatization advances, daily life is filled with massive amounts of information, some of it closely tied to people's lives. Analyzing and mining this information can improve daily life, serve the public, raise the level of social informatization, and open new opportunities for smart cities and dynamic, information-based management schemes.
As the number of vehicles on the road increases, so does the risk of personal injury and property loss caused by illegal driving. The risk of traffic violations also differs from region to region, so it cannot simply be averaged. Because the driving environment must be taken into account, the driving area is first divided into regions and the risk probability within each region is then analyzed. The prior art cannot analyze vehicle driving behavior from the massive information available in daily life.
Summary of the invention:
The invention aims to provide a big data public service platform for vehicle driving behavior analysis in the Internet+ era, in order to solve the technical problem that the prior art cannot analyze vehicle driving behavior from the massive information available in daily life.
The big data public service platform for vehicle driving behavior analysis in the Internet+ era analyzes vehicle driving behavior and vehicle driving risk probability through the following steps:
Step one: store the driving data of the vehicle over a period of time on a big data platform, and process and analyze the data with the MapReduce framework.
Step two: based on the data processed on the big data platform, count the abnormal driving events of the vehicle in a given area and time period, where abnormal driving includes speeding, overloading, rapid acceleration and rapid deceleration, fatigue driving, and night driving.
Step three: after the abnormal driving behaviors have been counted, calculate the vehicle risk probability with a self-organizing map neural network algorithm and predict the subsequent driving risk of the vehicle.
Further, processing and analyzing the vehicle driving data comprises: first, establishing a data storage layer, whose main schemes are a storage scheme based on the Hadoop Distributed File System (HDFS) and a storage scheme based on HDFS plus the distributed, column-oriented open-source database HBase; second, establishing a data processing layer, in which the raw data is processed with the large-scale parallel computing framework MapReduce (MR for short); third, establishing a data analysis layer, which further analyzes, models, and mines the data; fourth, establishing data visualization; and fifth, establishing a security layer, which provides a security protection system for the big data platform and a unified external service interface, so that users access the big data service platform through a unified data access interface or a unified portal page.
Furthermore, the road network section on which the vehicle travels is found with a hidden Markov algorithm, the information of that road section is looked up in the map data, and finally the driving speed in the positioning data is input into a space-time speed limit model to judge whether the vehicle is speeding.
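The speeding check above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the planar coordinates, the one-point-per-segment road representation, the transition weights (0.6, 0.35, 0.05), the parameter `sigma`, the day/night limit table, and the function names `match_segments` and `is_speeding` are all assumptions introduced for the example.

```python
import math


def match_segments(points, segments, adjacency, sigma=20.0):
    """Viterbi decoding of the most likely road segment for each GPS point.

    points:    list of (x, y) positions in metres (hypothetical planar coordinates)
    segments:  dict mapping segment id -> (x, y) representative point of the segment
    adjacency: dict mapping segment id -> set of directly connected segment ids
    """
    ids = list(segments)

    def log_emit(point, seg):
        # Gaussian-style score: closer segments explain the observation better.
        return -math.dist(point, segments[seg]) ** 2 / (2 * sigma ** 2)

    def log_trans(a, b):
        # Illustrative weights: staying is likeliest, then moving to a neighbour.
        if a == b:
            return math.log(0.6)
        return math.log(0.35) if b in adjacency.get(a, ()) else math.log(0.05)

    score = {s: log_emit(points[0], s) for s in ids}
    backpointers = []
    for point in points[1:]:
        step, new_score = {}, {}
        for s in ids:
            best_prev = max(ids, key=lambda p: score[p] + log_trans(p, s))
            step[s] = best_prev
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + log_emit(point, s)
        backpointers.append(step)
        score = new_score

    # Trace the best final state back to the first observation.
    path = [max(ids, key=lambda s: score[s])]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return path[::-1]


def is_speeding(segment, hour, speed_kmh, limits):
    """limits: dict segment id -> (day limit, night limit) in km/h (hypothetical table)."""
    day_limit, night_limit = limits[segment]
    limit = night_limit if hour >= 22 or hour < 6 else day_limit
    return speed_kmh > limit
```

A real deployment would project each point onto the road geometry rather than onto a single representative point, and would take the speed limits from the map data.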
Further, a support vector machine algorithm is implemented on the big data platform as follows. First, the raw data is standardized to eliminate the influence of differing dimensions (units). Second, the average speed, average acceleration, and starting acceleration are calculated from the time, speed, and mileage in the raw data and used as training samples. Third, a large number of training samples are uploaded to HDFS and split into blocks; each node starts a Hadoop Map task that runs sequential minimal optimization (SMO) training on the data in its block to compute the support vectors of that block, and the support vectors trained at each node are then input to a Hadoop Reduce task and submitted to SMO training once more to obtain the final support vectors. Fourth, a classifier is obtained from the support vectors of the third step and used to judge whether the vehicle is overloaded.
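The first two steps above (standardization and feature extraction) can be sketched as below. The SMO training and the Map/Reduce merging of support vectors are not shown. The record layout `(time_s, speed_m_s, mileage_m)`, the z-score form of standardization, the definition of starting acceleration as the acceleration over the first sampling interval, and the function names are assumptions made for illustration.

```python
from statistics import mean, pstdev


def trip_features(samples):
    """samples: time-ordered list of (time_s, speed_m_s, mileage_m) readings.

    Returns (average speed, average acceleration, starting acceleration).
    """
    (t_first, _, m_first), (t_last, _, m_last) = samples[0], samples[-1]
    average_speed = (m_last - m_first) / (t_last - t_first)
    # Acceleration over each pair of consecutive readings.
    accelerations = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1, _), (t2, v2, _) in zip(samples, samples[1:])
    ]
    average_acceleration = mean(accelerations)
    starting_acceleration = accelerations[0]  # assumption: first sampling interval
    return average_speed, average_acceleration, starting_acceleration


def standardize(rows):
    """Z-score each feature column so that units no longer dominate the distances."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # constant columns are left at 0
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]
```

The standardized rows, together with overloaded / not-overloaded labels, would then serve as the training samples for the classifier.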
Further, a linear regression model is established on the big data platform as follows. First, the big data platform is used to find the driving data of all vehicles on the road section, and the speed difference between each pair of adjacent data points is calculated. Second, a statistical linear regression algorithm is applied, in which the input is the speed change Δv of a vehicle from an initial speed v0 to the next driving moment, the output is the distance S traveled by the vehicle in that time, and the data of all vehicles traveling on the road section are counted. Third, the regression equation is computed to obtain the model S(Δv, v0) relating the speed change of vehicles on the road section to the distance traveled. Finally, the equation is used to judge the probability of rapid acceleration and rapid deceleration, according to the rule that the larger the error e = ||S − Si|| between the distance Si of the i-th vehicle and the model distance S, the larger the probability that the vehicle accelerated or decelerated rapidly, and the smaller the error, the smaller the probability.
Specifically, the big data platform finds, in the raw data, the driving data of all vehicles at all times on a given road section, processes the data to obtain the speed difference between each pair of adjacent points, and then obtains a regression model by regression analysis. The regression model gives the probability that a vehicle on the road section accelerated or decelerated rapidly. Let Δv be the speed difference between two time points on the same road section and v0 the initial speed of the vehicle at each time point. Since the distance traveled by the vehicle over a period of time is linear in the speed difference and the initial speed, the regression model can be written as
$$\hat{S} = b_0 + b_1 \Delta v + b_2 v_0 \qquad (1)$$
where b0, b1, and b2 are the regression coefficients, b1 and b2 being the partial regression coefficients of the variables Δv and v0.
The values of the regression coefficients are obtained from training data by the least squares principle, the resulting model is tested by analysis of variance, and the probability of rapid acceleration and rapid deceleration is finally obtained from the size of the error e = ||S − Ŝ||, where S is the actual driving distance of the vehicle and Ŝ is the driving distance that satisfies the regression model.
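A least squares fit of the distance model, and the error used to rank vehicles, might look like the sketch below. It solves the normal equations for the three coefficients directly; the analysis of variance test and the mapping from error to a probability are not shown because the text does not give them. The function names and the `(delta_v, v0, distance)` row layout are assumptions.

```python
def fit_distance_model(rows):
    """Ordinary least squares for distance = b0 + b1 * delta_v + b2 * v0.

    rows: list of (delta_v, v0, distance) observations from one road section.
    Returns (b0, b1, b2).
    """
    design = [(1.0, dv, v0) for dv, v0, _ in rows]
    target = [s for _, _, s in rows]
    # Augmented normal equations: (X^T X | X^T y).
    a = [
        [sum(x[i] * x[j] for x in design) for j in range(3)]
        + [sum(x[i] * y for x, y in zip(design, target))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


def distance_error(coefficients, delta_v, v0, actual_distance):
    """Absolute gap between the observed distance and the model distance."""
    b0, b1, b2 = coefficients
    return abs(actual_distance - (b0 + b1 * delta_v + b2 * v0))
```

Under the rule stated above, vehicles with a larger `distance_error` on a road section would be treated as more likely to have accelerated or decelerated rapidly.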
Further, the step of calculating the vehicle risk probability comprises: first, measuring the risk probability; second, establishing a risk probability model. To establish the model, the historical data is preprocessed and clustered to obtain l clusters, Cn = {I1, I2, ..., Il}, where Ii is one of the clusters. A prediction model is then constructed from the clustering result by calculating the average risk probability of each cluster and the similarity coefficient between each feature vector and the average vector, and the risk probability is predicted.
Specifically, the risk probability is the probability that a risk occurs, expressed as a percentage; it measures how likely a given state is to occur. Driving data from vehicles in the same area is highly similar, while driving data from different areas differs considerably. Taking one area as one cluster, the vehicle driving data is clustered by feature points to obtain the driving data of the different areas. Each feature point (feature vector) corresponds to a point in a cluster. Assuming the i-th feature vector in the cluster is Xi (i ∈ N), the risk probability can be expressed as the ratio
$$\frac{U(N_r(X_i))}{\lVert N_r(X_i) \rVert} \qquad (2)$$
where Nr(Xi) is the neighborhood of radius r around the point Xi, ||Nr(Xi)|| is the number of all points in that neighborhood, and U(Nr(Xi)) is the number of security risks occurring in the cluster. Equation (2) shows that the risk probability is the average probability over all points in the cluster. It can also be expressed in terms of the similarity between the feature point Xi and the average vector of all points.
Determining the risk probability prediction model involves three parts:
Preprocessing of historical data
(1) Data attributes are abstracted from the historical data. The driving behavior data of a vehicle includes the license plate number, longitude and latitude, driving speed, driving direction, and driving time.
(2) The historical data is clustered to obtain l clusters, Cn = {I1, I2, ..., Il}, where Ii is one of the clusters. Assume cluster Ii contains k feature vectors, with Xij the j-th feature vector in the cluster.
Construction of the prediction model
After clustering, the risk probability of the feature vectors in each cluster is calculated as follows:
(1) Calculate the average risk probability in cluster Ii: the sum of the indicator values of all points in the cluster divided by |Ii|, where |Ii| is the number of elements in cluster Ii and the indicator of a point is 1 if the point represents a risk occurrence and 0 otherwise.
(2) Calculate the average feature vector of all points in cluster Ii.
(3) Calculate the similarity coefficient of each feature vector in the cluster.
(4) Calculate the risk probability of each feature vector.
Risk probability prediction
After the risk probability model has been computed, the risk of new vehicle data can be predicted as follows:
(1) Perform cluster analysis on the new vehicle driving data Xp.
(2) Clustering assigns the new data to one of the clusters; then calculate the similarity coefficient between the data and the average vector of that cluster.
(3) Calculate the risk probability of the new data point from the average risk probability of that cluster and the similarity coefficient.
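The model construction and prediction steps can be illustrated with the sketch below. The exact formulas for the similarity coefficient and for the per-vector risk probability are not reproduced in this text, so two stand-ins are assumed here and are not the patent's own formulas: the similarity is taken as 1 / (1 + Euclidean distance), which is largest (equal to 1) for identical vectors, and the predicted risk is taken as the cluster's average risk multiplied by that similarity. Assigning a new point to the nearest cluster center is likewise an assumption.

```python
import math


def build_cluster_model(vectors, risk_flags):
    """Summarise one cluster by its average risk probability and average vector.

    vectors:    list of feature vectors (tuples of floats) in the cluster
    risk_flags: list of 0/1 indicators, 1 where the point was a risk occurrence
    """
    count = len(vectors)
    average_risk = sum(risk_flags) / count
    center = tuple(sum(column) / count for column in zip(*vectors))
    return average_risk, center


def similarity(vector, center):
    # Assumed form: 1 for identical vectors, decreasing with Euclidean distance.
    return 1.0 / (1.0 + math.dist(vector, center))


def predict_risk(vector, cluster_models):
    """Assign the new vector to the nearest cluster, then scale that cluster's risk."""
    average_risk, center = min(cluster_models, key=lambda m: math.dist(vector, m[1]))
    return average_risk * similarity(vector, center)
```

For example, a point lying exactly on the center of a cluster in which half of the historical points were risk occurrences would receive a predicted risk of 0.5 under these assumptions.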
Further, in the step of clustering the data, a Hadoop-based self-organizing map neural network clustering algorithm is adopted, and the data is computed with the MapReduce parallel computing framework.
Further, in the step of computing the data with the MapReduce parallel computing framework, the winning neuron and the weight updates are calculated in the Map stage; in the Reduce stage, the total update of each weight connected to the same winning neuron is accumulated and the weights are then updated.
Further, in the algorithm implementing the support vector machine, the similarity between two vectors is measured with the Euclidean distance.
Specifically, the Hadoop-based self-organizing map (SOM) neural network clustering algorithm described above uses competitive learning to adapt and reorganize its internal structure: after receiving external input, the network divides itself automatically into different regions, each of which responds differently to the input. The steps of the algorithm are as follows:
(1) Initialization and normalization. The current input vector X and the neuron weight vectors Wi in the competition layer are normalized, an initial winning neighborhood is established, and the gain μ(0) is assigned.
(2) Finding the winning neuron. The input vector is compared for similarity with all the corresponding weight vectors in the competition layer, and the neuron with the most similar weight vector is taken as the winning neuron. Mathematically, the more similar two normalized vectors are, the larger their dot product, so the winning neuron i* satisfies:
$$W_{i^*}^{T} X = \max_{i} \left( W_i^{T} X \right)$$
(3) Adjusting the node weights in the winning neighborhood:
$$W_i(t+1) = W_i(t) + \mu(t) \left[ X - W_i(t) \right]$$
where μ(t) is the gain, which decreases toward 0 over time.
(4) If the gain has dropped to the set threshold μmin, the process ends; otherwise the input data starts a new round of calculation from the first step.
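The four steps can be sketched on a single machine as follows. This is a simplified illustration under stated assumptions: the winning neighborhood is reduced to the winning neuron alone, the gain decays geometrically by a factor `decay`, and the initial weights are taken from evenly spaced samples. None of these choices is specified in the text, and the function names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def winner(x, weights):
    """Index of the neuron whose weight vector has the largest dot product with x."""
    return max(range(len(weights)), key=lambda i: sum(w * v for w, v in zip(weights[i], x)))


def train_som(samples, n_neurons, mu0=0.5, mu_min=0.01, decay=0.9):
    data = [normalize(s) for s in samples]
    # Assumption: initialise the weights from evenly spaced samples.
    weights = [list(data[i * len(data) // n_neurons]) for i in range(n_neurons)]
    mu = mu0
    while mu > mu_min:  # step (4): stop once the gain reaches the threshold
        for x in data:
            i = winner(x, weights)  # step (2): competition
            # Step (3): move the winning weight vector toward the input.
            updated = [w + mu * (v - w) for w, v in zip(weights[i], x)]
            weights[i] = normalize(updated)
        mu *= decay  # the gain shrinks toward 0 over time
    return weights
```

After training, each weight vector serves as a cluster center, and `winner` assigns a normalized sample to its cluster.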
The algorithm runs on a Hadoop platform, with the following implementation steps:
Map phase
Input a key-value pair <key, value>, where key is the sample sequence number and value is the sample feature set.
Calculate the output values of the competition layer according to the competitive learning principle.
Determine the winning neuron; set its state to 1 and the states of the other neurons to 0. Set flag to the position of the winning neuron in the competition layer.
Calculate the update for each weight connected to the winning neuron, leaving the other weights unchanged.
Output a key-value pair, where key is flag and value is the weight update connected to the winning neuron.
Reduce phase
Accept the output of the map function as the input of the reduce function, where the input key is the position of the winning neuron and values is the set of weight updates connected to that winning neuron.
Accumulate the updates to obtain the total update of each weight, and calculate the average update.
Adjust the corresponding network weights.
Output a key-value pair, where key is the weight update state and value is the updated network weights.
Update the global vector and repeat the MapReduce process until the error between the weight vectors and the winning samples is smaller than a set threshold.
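One MapReduce round of the kind listed above can be simulated in plain Python as below. The sketch only imitates the data flow (map emits `(flag, update)`, reduce averages the updates per winning neuron); it does not use the Hadoop API, and the function names `som_map`, `som_reduce`, and `som_round` are hypothetical.

```python
from collections import defaultdict


def som_map(sample_id, features, weights, mu):
    """Map: find the winning neuron and emit (flag, weight update) for it."""
    outputs = [sum(w * x for w, x in zip(neuron, features)) for neuron in weights]
    flag = max(range(len(weights)), key=lambda i: outputs[i])
    update = [mu * (x - w) for w, x in zip(weights[flag], features)]
    return flag, update


def som_reduce(flag, updates):
    """Reduce: average all updates emitted for the same winning neuron."""
    count = len(updates)
    return flag, [sum(column) / count for column in zip(*updates)]


def som_round(samples, weights, mu):
    """Run one map/shuffle/reduce pass and return the adjusted weights."""
    grouped = defaultdict(list)
    for sample_id, features in enumerate(samples):
        flag, update = som_map(sample_id, features, weights, mu)
        grouped[flag].append(update)  # shuffle: group by winning neuron
    new_weights = [list(neuron) for neuron in weights]
    for flag, updates in grouped.items():
        _, average_update = som_reduce(flag, updates)
        new_weights[flag] = [w + d for w, d in zip(new_weights[flag], average_update)]
    return new_weights
```

A driver loop would call `som_round` repeatedly, broadcasting the returned weights as the new global vector, until the stopping condition described above is met.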
The calculation steps above show that SOM learning clusters the samples adaptively according to their intrinsic relationships: the output neuron weight vectors Wi gradually approach the winning samples, so the weight vector set {Wi | i = 1, 2, ..., l} can be regarded as the cluster centers of all samples. The similarity measure is described as follows:
A feature vector X = {x1, x2, ..., xn} belongs to a cluster Ii in the feature space, and the similarity coefficient between the feature vector and the average vector (cluster center) is given by equation (9).
Equation (9) measures the similarity between vectors using the Euclidean distance: the more similar the two vectors, the larger the value, with a maximum of 1.
Compared with the prior art, the invention has positive and evident effects. The invention collects the Beidou/GPS navigation data of the vehicle, preprocesses the collected data on the big data platform, and, after further data mining, analyzes and judges the driving behavior of the driver.
Description of the drawings:
Fig. 1 is a framework diagram of the vehicle driving behavior analysis big data public service platform in the Internet+ era according to the invention.
Fig. 2 is a flow chart of the self-organizing map based neural network algorithm of the invention.
Detailed description:
Example 1:
As shown in Fig. 1 and Fig. 2, the vehicle driving behavior analysis big data public service platform in the Internet+ era according to the invention includes four parts: a data storage layer, a data processing layer, a data analysis layer, and a data visualization layer.
To judge speeding: first, the optimal road section traveled by the vehicle is found with a hidden Markov algorithm; second, the speed limit corresponding to the road type is looked up in the map data; third, the driving speed in the collected data is compared with the speed limit of the road section to judge whether the vehicle is speeding.
To judge overloading: first, the raw data is normalized to eliminate the influence of differing dimensions (units); second, the average speed, average acceleration, and starting acceleration are calculated from the time, speed, and mileage in the raw data and used as training samples; third, a classifier is trained with the training samples from the second step; fourth, the classifier is used to judge whether the vehicle is overloaded.
To judge rapid acceleration and rapid deceleration, the big data platform first finds, in the raw data, the driving data of all vehicles at all times on a given road section, processes the data to obtain the speed difference between each pair of adjacent points, and then obtains a regression model by regression analysis. The regression model gives the probability that a vehicle on the road section accelerated or decelerated rapidly. Let Δv be the speed difference between two time points on the same road section and v0 the initial speed of the vehicle at each time point. Because the distance traveled by the vehicle over a period of time is linear in the speed difference and the initial speed, the regression model is:
$$\hat{S} = b_0 + b_1 \Delta v + b_2 v_0$$
where b0, b1, and b2 are the regression coefficients, b1 and b2 being the partial regression coefficients of the variables Δv and v0.
The values of the regression coefficients are obtained from training data by the least squares principle, the resulting model is tested by analysis of variance, and the probability of rapid acceleration and rapid deceleration is finally obtained from the size of the error e = ||S − Ŝ||, where S is the actual driving distance of the vehicle and Ŝ is the driving distance that satisfies the regression model.
The risk probability is the probability that a risk occurs, expressed as a percentage; it measures how likely a given state is to occur. Driving data from vehicles in the same area is highly similar, while driving data from different areas differs considerably. Taking one area as one cluster, the vehicle driving data is clustered by feature points to obtain the driving data of the different areas. Each feature point (feature vector) corresponds to a point in a cluster. Assuming the i-th feature vector in the cluster is Xi (i ∈ N), the risk probability can be expressed as the ratio
$$\frac{U(N_r(X_i))}{\lVert N_r(X_i) \rVert} \qquad (1)$$
where Nr(Xi) is the neighborhood of radius r around the point Xi, ||Nr(Xi)|| is the number of all points in that neighborhood, and U(Nr(Xi)) is the number of security risks occurring in the cluster. Equation (1) shows that the risk probability is the average probability over all points in the cluster. It can also be expressed in terms of the similarity between the feature point Xi and the average vector of all points.
The risk probability prediction model is divided into three parts:
Preprocessing of historical data
(1) Data attributes are abstracted from the historical data. The driving behavior data of a vehicle includes the license plate number, longitude and latitude, driving speed, driving direction, and driving time.
(2) The historical data is clustered to obtain l clusters, Cn = {I1, I2, ..., Il}, where Ii is one of the clusters. Assume cluster Ii contains k feature vectors, with Xij the j-th feature vector in the cluster.
Construction of the prediction model
After clustering, the risk probability of the feature vectors in each cluster is calculated as follows:
(1) Calculate the average risk probability in cluster Ii: the sum of the indicator values of all points in the cluster divided by |Ii|, where |Ii| is the number of elements in cluster Ii and the indicator of a point is 1 if the point represents a risk occurrence and 0 otherwise.
(2) Calculate the average feature vector of all points in cluster Ii.
(3) Calculate the similarity coefficient of each feature vector in the cluster.
(4) Calculate the risk probability of each feature vector.
Risk probability prediction
After the risk probability model has been computed, the risk of new vehicle data can be predicted as follows:
(1) Perform cluster analysis on the new vehicle driving data Xp.
(2) Clustering assigns the new data to one of the clusters; then calculate the similarity coefficient between the data and the average vector of that cluster.
(3) Calculate the risk probability of the new data point from the average risk probability of that cluster and the similarity coefficient.
The key to solving the risk probability model of the data points is how to cluster the historical data with a clustering algorithm. A self-organizing map (SOM) based neural network clustering algorithm is employed here. The algorithm uses competitive learning to adapt and reorganize its internal structure: after receiving external input, the network divides itself automatically into different regions, each of which responds differently to the input. The steps of the algorithm are as follows:
(1) Initialization and normalization. The current input vector X and the neuron weight vectors Wi in the competition layer are normalized, an initial winning neighborhood Ni(0) is established, and the gain μ(0) is assigned.
(2) Finding the winning neuron. The input vector is compared for similarity with all the corresponding weight vectors in the competition layer, and the neuron with the most similar weight vector is taken as the winning neuron. Mathematically, the more similar two normalized vectors are, the larger their dot product, so the winning neuron i* satisfies:
$$W_{i^*}^{T} X = \max_{i} \left( W_i^{T} X \right)$$
(3) Adjusting the node weights in the winning neighborhood:
$$W_i(t+1) = W_i(t) + \mu(t) \left[ X - W_i(t) \right]$$
where μ(t) is the gain, which decreases toward 0 over time.
(4) If the gain has dropped to the set threshold μmin, the process ends; otherwise the input data starts a new round of calculation from the first step.
The algorithm runs on a Hadoop platform, with the following implementation steps:
Map phase
Input a key-value pair <key, value>, where key is the sample sequence number and value is the sample feature set.
Calculate the output values of the competition layer according to the competitive learning principle.
Determine the winning neuron; set its state to 1 and the states of the other neurons to 0. Set flag to the position of the winning neuron in the competition layer.
Calculate the update for each weight connected to the winning neuron, leaving the other weights unchanged.
Output a key-value pair, where key is flag and value is the weight update connected to the winning neuron.
Reduce phase
Accept the output of the map function as the input of the reduce function, where the input key is the position of the winning neuron and values is the set of weight updates connected to that winning neuron.
Accumulate the updates to obtain the total update of each weight, and calculate the average update.
Adjust the corresponding network weights.
Output a key-value pair, where key is the weight update state and value is the updated network weights.
Update the global vector and repeat the MapReduce process until the error between the weight vectors and the winning samples is smaller than a set threshold.
The calculation steps above show that SOM learning clusters the samples adaptively according to their intrinsic relationships: the output neuron weight vectors Wi gradually approach the winning samples, so the weight vector set {Wi | i = 1, 2, ..., l} can be regarded as the cluster centers of all samples. The similarity measure is as follows:
A feature vector X = {x1, x2, ..., xn} belongs to a cluster Ii in the feature space, and the similarity coefficient between the feature vector and the average vector (cluster center) is given by equation (9).
Equation (9) measures the similarity between vectors using the Euclidean distance: the more similar the two vectors, the larger the value, with a maximum of 1.

Claims (4)

1. A vehicle driving behavior analysis big data public service platform in the Internet+ era, characterized in that the vehicle driving behavior and the vehicle driving risk probability are analyzed by the following steps:
Step one, storing the driving data of the vehicle over a period of time on a big data platform, and processing and analyzing the data with the MapReduce framework,
Step two, based on the data processed on the big data platform, counting the abnormal driving events of the vehicle in a given area and time period, wherein abnormal driving includes speeding, overloading, rapid acceleration and rapid deceleration, fatigue driving, and night driving,
Step three, after the abnormal driving behaviors have been counted, calculating the vehicle risk probability with a self-organizing map neural network algorithm and predicting the subsequent driving risk of the vehicle;
The method comprises the steps of establishing a data storage layer in the first step, and establishing a Hadoop distributed file system based storage scheme and a Hadoop distributed file system + distributed column-oriented source database technology based storage scheme; establishing a data processing layer, processing original data by adopting a large-scale parallel computing frame in the data processing layer, establishing a data analysis layer in the third step, further analyzing, modeling and mining the data, establishing data visualization in the fourth step, establishing a security layer in the fifth step, providing a security protection system for the big data platform, providing a uniform service interface for the outside, and accessing the big data service platform by a user through a uniform data access interface or a uniform portal page;
The road network segment on which a vehicle is traveling is found using a hidden Markov algorithm, the information for that road segment is retrieved from map data, and the travel speed from the positioning data is then input into a spatio-temporal speed-limit model to determine whether the vehicle is speeding;
A support vector machine algorithm is implemented on the big data platform, comprising: first, standardizing the raw data to eliminate the influence of dimensions; second, calculating the average speed, average acceleration, and starting acceleration from the time, speed, and mileage in the raw data, and using them as training samples; third, uploading the large number of training samples to the Hadoop Distributed File System and partitioning them into blocks, each node starting a Map task of the Hadoop distributed processing platform to perform sequential minimal optimization (SMO) training on the data in its block and compute the support vectors of that block, and finally inputting the support vectors trained at each node into a Reduce task of the Hadoop distributed processing platform, where they are submitted to SMO training to obtain the final support vectors; fourth, obtaining a classifier from the support vectors of the third step, and using the classifier to judge whether the vehicle is overloaded;
A linear regression model is established on the big data platform, comprising: first, using the big data platform to find the driving data of all vehicles on the road section and calculating the speed difference between each two adjacent data records; second, obtaining the speed variation Δv of vehicles on a given road using a statistical linear regression algorithm, where the input is the speed variation Δv of a vehicle from an initial speed v0 to the next recorded driving time, the output is the distance S traveled by the vehicle in that time, and the statistics cover the data of all vehicles that have traveled on the road;
The vehicle risk probability is calculated as follows: in the first step, the risk probability is measured; in the second step, a risk probability model is established, in which the historical data are preprocessed and then clustered to obtain l clusters, Cn = {I1, I2, ..., Il}, where Ii is one of the clusters; a prediction model is constructed from the clustering result; and the average risk probability of each cluster and the similarity coefficient between a feature vector and the cluster's average vector are calculated in order to predict the risk probability.
2. The vehicle driving behavior analysis big data public service platform for the Internet+ era according to claim 1, wherein: in the step of clustering the data, a self-organizing map neural network clustering algorithm is adopted, and the data are computed using the massively parallel MapReduce computing framework.
3. The vehicle driving behavior analysis big data public service platform for the Internet+ era according to claim 2, wherein: in the step of computing the data using the massively parallel MapReduce computing framework, the winning neuron and the weight modification are calculated in the Map stage, the total update for each weight connected to the same winning neuron is summed in the Reduce stage, and the weights are then updated.
4. The vehicle driving behavior analysis big data public service platform for the Internet+ era according to claim 1, wherein: in the algorithm implementing the support vector machine, the similarity between two vectors is measured using the Euclidean distance metric.
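The Map/Reduce split described in claim 3 can be illustrated with a small single-process Python sketch. It is an assumption-laden simplification rather than the patented implementation: the function names (`map_sample`, `reduce_updates`, `som_epoch`) and the learning rate are hypothetical, only the winning neuron is updated (no neighborhood function), and ordinary function calls stand in for Hadoop Map and Reduce tasks.

```python
import math
from collections import defaultdict

def map_sample(sample, weights, learning_rate):
    """Map stage: find the winning neuron and emit its weight modification."""
    # The winner is the neuron whose weight vector is closest to the sample.
    winner = min(range(len(weights)),
                 key=lambda j: math.dist(sample, weights[j]))
    delta = [learning_rate * (x - w)
             for x, w in zip(sample, weights[winner])]
    return winner, delta

def reduce_updates(pairs, dim):
    """Reduce stage: sum the modifications emitted for each winning neuron."""
    totals = defaultdict(lambda: [0.0] * dim)
    for winner, delta in pairs:
        totals[winner] = [t + d for t, d in zip(totals[winner], delta)]
    return dict(totals)

def som_epoch(samples, weights, learning_rate=0.5):
    """One Map -> Reduce -> weight-update pass over all samples."""
    pairs = [map_sample(s, weights, learning_rate) for s in samples]
    totals = reduce_updates(pairs, len(weights[0]))
    new_weights = [list(w) for w in weights]  # leave the input untouched
    for j, total in totals.items():
        new_weights[j] = [w + t for w, t in zip(weights[j], total)]
    return new_weights

# Two neurons, two samples: each sample pulls its nearest neuron toward it.
weights = [[0.0, 0.0], [10.0, 10.0]]
samples = [[1.0, 1.0], [9.0, 9.0]]
updated = som_epoch(samples, weights)
```

Because the Reduce stage sums rather than averages the modifications, a small learning rate (or a division by the number of samples per winner) would be needed in practice to keep the update from overshooting when many samples share one winning neuron.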
CN201610430864.6A 2016-06-17 2016-06-17 Vehicle driving behavior analysis big data public service platform under internet + era Active CN106095963B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610430864.6A CN106095963B (en) 2016-06-17 2016-06-17 Vehicle driving behavior analysis big data public service platform under internet + era

Publications (2)

Publication Number Publication Date
CN106095963A CN106095963A (en) 2016-11-09
CN106095963B true CN106095963B (en) 2019-12-06

Family

ID=57235694

Country Status (1)

Country Link
CN (1) CN106095963B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant