CN112183615B - Automobile risk user screening method with Markov chain data processing function - Google Patents
- Publication number: CN112183615B (application CN202011021233.1A)
- Authority: CN (China)
- Prior art keywords: data, state, Markov chain, transition, probability
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- Y02T10/40: Engine management systems
Abstract
The invention discloses an automobile risk user screening method with Markov chain data processing, belonging to the technical field of automobile risk user classification. The method comprises: acquiring the data attributes of user trips, including the longitude, latitude and time of the data, and cleaning and integrating the data; dividing the data into small areas by time and space and extracting the data features of each area; mapping the processed data of each area into Markov chain form to obtain a state transition matrix; and training a convolutional neural network classifier on the extracted features, updating the network parameters with a cross-entropy loss. The invention addresses the problem of extracting the most effective classification features from many candidate features, transforms the feature space dimension, and obtains a set of classification features that are invariant across similar samples and discriminative between different samples.
Description
Technical Field
The invention relates to the technical field of automobile risk user classification, in particular to an automobile risk user screening method with Markov chain data processing.
Background
With the continuous development of artificial intelligence, deep-learning classification networks have been carried over into many research fields, and realizing the practical value of artificial intelligence has become a goal for many researchers. For example, applying classification networks to screen high- and low-risk users in the automotive industry is an important aid and reference for serving users better. Because the data used to screen high- and low-risk automobile users come from the vehicle driving system, they are huge in volume and contain a great deal of noise; their discriminability is also low, and the available data attributes differ greatly, all of which makes network training difficult. As the network deepens, it easily overfits, and the network model may fail to converge.
At present, Markov chains, a concept for describing time processes, are widely used in artificial intelligence fields such as speech recognition, text recognition and path recognition. In finance they are used to predict the market share of enterprise products, and they also serve as signal models in entropy coding; they have become a mainstream approach to many problems. However, Markov chains have not yet been applied to screening automobile risk users, where they could be used to preprocess the data and improve screening accuracy.
Disclosure of Invention
In view of the above deficiencies of the prior art, the invention provides an automobile risk user screening method with Markov chain data processing.
To solve these technical problems, the invention adopts the following technical solution: an automobile risk user screening method with Markov chain data processing, comprising the following steps:
Step 1: Read the driving-behavior data from the database and preprocess them according to the longitude, latitude and acquisition time recorded by the GPS, so as to improve the confidence and reliability of the data, as follows:
Step 1.1: Check the data for duplicate records and, if any exist, keep only one copy.
Step 1.2: Handle missing values by deleting the incomplete tuples, filling with 0, filling with the mean, or imputing with the K-nearest-neighbor method.
Step 1.3: Treat data lying outside the longitude and latitude range of each city as abnormal and, depending on the actual situation, clean them with one of the following statistical methods: stepwise backward deletion, mean-based elimination, or deletion of logical errors.
Step 1.4: Because satellite positioning limits the positioning accuracy, treat data whose amount falls below a threshold as invalid, and then clean the data again.
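The cleaning pass in step 1 can be sketched as below. This is an illustrative reading, not the patented implementation: the record layout `(vehicle_id, timestamp, lon, lat, speed)`, the function name `clean_records`, the bounding-box test for "outside the city", the per-vehicle interpretation of the step 1.4 threshold (`min_points`), and the use of plain mean filling (one of the step 1.2 options) are all assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical record layout: (vehicle_id, timestamp, lon, lat, speed); speed may be None.


def clean_records(records, lon_range, lat_range, min_points=1):
    """Illustrative cleaning pass loosely following steps 1.1 to 1.4."""
    # Step 1.1: keep only the first copy of each exact duplicate.
    seen, unique = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)

    # Step 1.3: points outside the city's bounding box are treated as abnormal and dropped.
    (lon_min, lon_max), (lat_min, lat_max) = lon_range, lat_range
    inside = [r for r in unique
              if lon_min <= r[2] <= lon_max and lat_min <= r[3] <= lat_max]

    # Step 1.4 (assumed reading): drop vehicles with fewer than min_points records.
    per_vehicle = Counter(r[0] for r in inside)
    kept = [r for r in inside if per_vehicle[r[0]] >= min_points]

    # Step 1.2 (mean filling only): replace a missing speed with the mean observed speed.
    observed = [r[4] for r in kept if r[4] is not None]
    fill = mean(observed) if observed else 0.0
    return [r if r[4] is not None else r[:4] + (fill,) for r in kept]
```

The range filter runs before mean filling here so that out-of-city outliers cannot bias the fill value; the patent lists step 1.2 before step 1.3, so the order is a design choice of this sketch.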
Step 2: Based on the location of each city, divide it into a grid by longitude, latitude and time, count the vehicle operation data in the driving behavior of each small area in each time period, and process the data, as follows:
Step 2.1: Select a city, merge all data from different acquisition times, and draw a scatter plot by longitude and latitude to observe how vehicles are distributed. Set candidate grid division criteria according to the density of the scatter plot, and obtain the cell size under each criterion.
Step 2.1.1: Let the maximum and minimum longitudes of the city be max(X) and min(X), the maximum and minimum latitudes be max(Y) and min(Y), and the grid side length be $r_i$ ($i = 1, 2, \dots, m$), where m is the number of candidate division criteria. Under the i-th criterion the city is then divided into $n_{length,i}$ cells along the longitude direction and $n_{width,i}$ cells along the latitude direction.
Step 2.1.2: Sum the variances of the vehicle operation data over all cells under each candidate criterion and, by the minimum-variance method, select the criterion with the smallest total variance as the optimal grid division. Alternatively, adjust the choice by a vote based on the within-cell variances, so as to avoid producing a large number of cells that contain no data.
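Steps 2.1.1 and 2.1.2 can be sketched as follows. The patent's formula for $n_{length,i}$ and $n_{width,i}$ is not reproduced in this text, so `grid_shape` assumes the extent divided by $r_i$ and rounded up; the helper names and the `(lon, lat, value)` input layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import pvariance


def grid_shape(lons, lats, r):
    """Assumed form of (n_length,i, n_width,i): coordinate extent / side length r, rounded up."""
    n_length = max(1, math.ceil((max(lons) - min(lons)) / r))
    n_width = max(1, math.ceil((max(lats) - min(lats)) / r))
    return n_length, n_width


def total_cell_variance(points, r):
    """Sum of within-cell variances of the value field for side length r.

    points: list of (lon, lat, value) tuples.
    """
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    n_length, n_width = grid_shape(lons, lats, r)
    lon0, lat0 = min(lons), min(lats)
    cells = defaultdict(list)
    for lon, lat, value in points:
        # Clamp so points lying exactly on the max edge fall in the last cell.
        col = min(int((lon - lon0) // r), n_length - 1)
        row = min(int((lat - lat0) // r), n_width - 1)
        cells[(row, col)].append(value)
    # Empty cells never appear in `cells`, so they contribute nothing.
    return sum(pvariance(values) for values in cells.values())


def best_side_length(points, candidates):
    """Minimum-variance choice among the candidate side lengths r_1..r_m."""
    return min(candidates, key=lambda r: total_cell_variance(points, r))
```

Used on its own, this criterion tends to favor the finest grid, since smaller cells hold fewer and more similar points; that is why step 2.1.2 also allows a variance-based vote to avoid many empty cells.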
Step 2.2: Within each spatial cell, divide time into M periods according to whether each time point falls in the peak hours of the road section.
Step 2.3: For each time period of each cell, compute the mean and the variance $\sigma_{ijk}$ of the data in the k-th time period of the cell in row i, column j of the city grid, and on that basis transform the unprocessed data $x_k$ of that cell and period into the processed data $x'_k$.
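The transformation formula of step 2.3 is not given in this text. A common choice consistent with "mean and variance statistics" is per-group standardization, sketched below under that assumption: each value is z-scored within its (row i, column j, period k) group. The patent calls $\sigma_{ijk}$ a variance, while this sketch divides by the standard deviation. `PEAK_HOURS` and the two-period split (M = 2) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

PEAK_HOURS = {7, 8, 17, 18}  # hypothetical peak hours for the road section


def period_of(hour):
    """Hypothetical two-period split (M = 2): 1 for peak hours, 0 otherwise."""
    return 1 if hour in PEAK_HOURS else 0


def standardize_by_cell_period(samples):
    """Z-score each value within its (row i, column j, period k) group.

    samples: list of (i, j, k, x) tuples; returns the processed values x' in input order.
    """
    groups = defaultdict(list)
    for i, j, k, x in samples:
        groups[(i, j, k)].append(x)
    stats = {key: (mean(v), pstdev(v)) for key, v in groups.items()}
    processed = []
    for i, j, k, x in samples:
        mu, sd = stats[(i, j, k)]
        # A group with zero spread carries no variation; map it to 0.
        processed.append((x - mu) / sd if sd > 0 else 0.0)
    return processed
```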
Step 3: Divide the time-series data into several states according to the processed feature data, choose an interval partition that follows the measured distribution without bias, and count the state transitions and the state transition matrix of the data over the partitioned intervals, as follows:
Step 3.1: Divide the time-series data into N states according to the distribution of the processed feature data x'; the states may use equal or unequal intervals depending on the actual distribution.
Step 3.2: Convert the gridded data into states according to the upper and lower state boundaries, i.e. map x(i) to s(i), i = 1, 2, …, to generate a Markov chain, where x(i) is the gridded data at time i and s(i) is the state at time i.
Step 3.2.1: Let the upper and lower boundaries of the states be B and A, respectively; with equal intervals, the width of each state interval is $\delta = (B - A)/N$.
Step 3.2.2: When $x(i) \in [A + (k-1)\delta,\ A + k\delta]$, set s(i) = k, k = 1, 2, …, N. The feature value at each time point is thereby converted into a state in {1, 2, …, N}. Since the state data have the Markov property, the sequence formed by all states s(i) is a Markov chain.
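The equal-interval mapping of steps 3.2.1 and 3.2.2 can be sketched as below. The function name `to_states` is hypothetical. The intervals in the text are closed on both sides, so a value on a shared boundary is ambiguous; this sketch assigns it to the higher state and clamps values at or beyond the bounds into states 1 and N.

```python
import math


def to_states(xs, a, b, n):
    """Map each value to a state k in 1..n using equal intervals of width (b - a) / n."""
    delta = (b - a) / n
    states = []
    for x in xs:
        k = math.floor((x - a) / delta) + 1
        # Clamp: x == b (or any out-of-range value) is mapped into state 1 or n.
        states.append(min(max(k, 1), n))
    return states
```

For example, `to_states([0.0, 0.5, 1.0, 2.9, 3.0], 0.0, 3.0, 3)` gives `[1, 1, 2, 3, 3]`.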
Step 3.3: Count the transitions of each state s(i) and extract the Markov features.
Step 3.3.1: The Markov features are defined as the transitions of each state: for each state i, count the number of upward transitions, the number of downward transitions, and the number of times $k_i$ that the state is held. The counts are taken over consecutive pairs (s(j), s(j+1)), where s(j) is the state at time j, s(j+1) is the state at time j+1, and L is the number of data points.
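The counting in step 3.3.1 can be sketched as follows. The exact counting formula is not reproduced in this text, so the sketch assumes that any move to a higher (lower) state counts as an upward (downward) transition. The name `count_transitions` is hypothetical.

```python
from collections import Counter


def count_transitions(states, n):
    """Return (up, down, hold) lists; index i-1 holds the counts for state i.

    Counts are taken over consecutive pairs (s(j), s(j+1)) for j = 1..L-1.
    """
    up, down, hold = Counter(), Counter(), Counter()
    for cur, nxt in zip(states, states[1:]):
        if nxt > cur:
            up[cur] += 1      # assumed: any move to a higher state is "upward"
        elif nxt < cur:
            down[cur] += 1    # assumed: any move to a lower state is "downward"
        else:
            hold[cur] += 1    # k_i: the state is held
    state_ids = range(1, n + 1)
    return ([up[i] for i in state_ids],
            [down[i] for i in state_ids],
            [hold[i] for i in state_ids])
```

For the sequence `[1, 1, 2, 3, 2, 2]` with N = 3, this gives up = `[1, 1, 0]`, down = `[0, 0, 1]` and hold = `[1, 1, 0]`.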
Step 3.4: From the extracted upward, downward and holding counts, compute the state transition probabilities and the state transition matrix as follows:
Step 3.4.1: For i = 1, the state retention probability $p_{1,1}$ is the probability of remaining in state 1, and the state transition probability $p_{1,2}$ is the probability of moving from state 1 to state 2.
Step 3.4.2: For 1 < i < N, $p_{i,i-1}$ is the probability of moving from state i to state i−1, $p_{i,i}$ is the probability of remaining in state i, and $p_{i,i+1}$ is the probability of moving from state i to state i+1.
Step 3.4.3: For i = N, $p_{N,N-1}$ is the probability of moving from state N to state N−1, and $p_{N,N}$ is the probability of remaining in state N.
Step 3.4.4: The state transition matrix P is the N × N matrix formed by these probabilities $p_{i,j}$.
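Step 3.4 defines only the entries $p_{i,i-1}$, $p_{i,i}$ and $p_{i,i+1}$, so the matrix is tridiagonal. The sketch below builds it from the counts of step 3.3, assuming each probability is the corresponding count divided by the total number of moves out of state i. The probability formulas themselves are not reproduced in this text, and treating a never-left state as absorbing is also an assumption.

```python
def transition_matrix(up, down, hold):
    """Build the tridiagonal N x N matrix P from per-state up/down/hold counts."""
    n = len(hold)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = up[i] + down[i] + hold[i]
        if total == 0:
            P[i][i] = 1.0  # assumption: a state with no observed moves is treated as absorbing
            continue
        P[i][i] = hold[i] / total               # p_{i,i}
        if i + 1 < n:
            P[i][i + 1] = up[i] / total         # p_{i,i+1}
        if i > 0:
            P[i][i - 1] = down[i] / total       # p_{i,i-1}
    return P
```

With the counts above (up = `[1, 1, 0]`, down = `[0, 0, 1]`, hold = `[1, 1, 0]`), each row sums to 1 and the result is `[[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 1.0, 0.0]]`.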
Step 4: Preprocess the state transition counts and the state transition matrix, and combine them with the features that were not processed by the Markov chain to form the input features of the classification neural network, as follows:
Step 4.1: Standardize the torque and power data, which are not processed by the Markov chain, and combine them with the state transition matrix to form the feature vector of the high/low-risk user screening neural network.
Step 4.2: Using S-fold cross-validation, randomly select 75% of the data as the training set and 25% as the test set.
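Step 4 can be sketched as a z-score over each non-Markov feature column (such as torque or power across users), a row-by-row flattening of the transition matrix, and a shuffled 75/25 split. The concatenation order, the helper names and the fixed `seed` are assumptions. The sketch shows a single hold-out split; an S-fold scheme would rotate which portion is held out.

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature column (e.g. torque across users)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def build_feature_vector(matrix, extra):
    """Flatten an N x N transition matrix row by row and append standardized extra features."""
    return [p for row in matrix for p in row] + list(extra)


def train_test_split(samples, train_fraction=0.75, seed=0):
    """Shuffle a copy of the samples and cut it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```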
Step 5: In the training stage, a deep convolutional neural network compresses the feature size, enriches the feature dimensions and extracts the main features. The feature vector output by the last layer of the feature network is fed to the fully connected layers, and softmax normalization yields the user's high/low-risk classification, as follows:
Step 5.1: The three convolutional layers extract and combine local features by applying parameter-sharing convolution kernels of various feature dimensions to the feature vector. Each element of the standard convolution output matrix $Y = (y_{ij})$ is computed from the input feature matrix $X = (x_{ij})$ and the convolution kernel matrix $W = (w_{ij})$ as $y_{ij} = \sum_{m}\sum_{n} w_{mn}\,x_{i+m,j+n}$, where m, n are position coordinates in the kernel and run over the K × K kernel, i, j are position coordinates in the input feature matrix, $w_{mn}$ is the kernel weight at position (m, n), $x_{i+m,j+n}$ is the input element that this weight processes when the kernel is at position (i, j), and K is the convolution kernel size.
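The convolution of step 5.1 can be sketched as a "valid" 2-D cross-correlation, which is how deep-learning frameworks implement convolution. Stride 1, no padding, no bias and the name `conv2d_valid` are assumptions of this sketch.

```python
def conv2d_valid(x, w):
    """y[i][j] = sum over m, n of w[m][n] * x[i+m][j+n] for a K x K kernel w.

    Stride 1, no padding, no bias; x is a list of equal-length rows.
    """
    k = len(w)
    rows = len(x) - k + 1
    cols = len(x[0]) - k + 1
    return [[sum(w[m][n] * x[i + m][j + n] for m in range(k) for n in range(k))
             for j in range(cols)]
            for i in range(rows)]
```

For example, with `x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and `w = [[1, 0], [0, 1]]`, each output element is the sum of a diagonal pair, giving `[[6, 8], [12, 14]]`.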
Step 5.2: The two max-pooling layers keep the largest value within each local receptive field, extracting the key information and compressing the features.
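Max pooling as described in step 5.2 keeps the largest value in each window. In the sketch below, the 2 × 2 window with stride 2 and the helper name `max_pool2d` are assumptions, since the patent does not state the pooling size.

```python
def max_pool2d(x, size=2, stride=2):
    """Keep the largest value in each size x size window (no padding)."""
    rows = (len(x) - size) // stride + 1
    cols = (len(x[0]) - size) // stride + 1
    return [[max(x[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]
```

A 4 × 4 input is reduced to 2 × 2, so each pooling layer quarters the number of feature values per channel.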
Step 5.3: The feature vector output by the last convolutional layer is fed to the fully connected layers, whose weight matrices combine the extracted local features into a global representation. Two fully connected layers are used to improve the nonlinear expressiveness of the model, Dropout is used to prevent overfitting, and softmax normalization yields the user's high/low-risk classification.
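The classification head of step 5.3 can be sketched as two fully connected layers with ReLU and inverted dropout between them, followed by a numerically stable softmax. The ReLU activation, the dropout rate, the `(weights, bias)` layer layout and the class order (index 1 as "high risk") are hypothetical choices, not taken from the patent.

```python
import math
import random


def dense(v, weights, bias):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def dropout(v, rate, training, rng):
    """Inverted dropout: zero units with probability `rate` and rescale survivors."""
    if not training or rate == 0.0:
        return list(v)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def classify(v, layer1, layer2, rate=0.5, training=False, rng=None):
    """Two FC layers -> class probabilities (index 1 assumed to mean high risk)."""
    h = dropout(relu(dense(v, *layer1)), rate, training, rng or random.Random(0))
    return softmax(dense(h, *layer2))
```

Dropout is active only when `training=True`, so at test time the same weights give deterministic probabilities.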
Step 6: Compute the cross-entropy loss and minimize it by stochastic gradient descent to update the network parameters, thereby improving high/low-risk user screening.
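The loss and update rule of step 6 can be sketched for the final softmax layer alone. For softmax with cross-entropy, the gradient with respect to the logits is the predicted probabilities minus the one-hot label. Backpropagation through the convolutional layers is omitted, and the learning rate and helper names are assumptions.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def cross_entropy(probs, label):
    """Negative log-probability of the true class."""
    return -math.log(probs[label])


def sgd_step(W, b, x, label, lr):
    """One stochastic-gradient update of a linear softmax layer, in place.

    Returns the loss computed before the update.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    p = softmax(logits)
    loss = cross_entropy(p, label)
    for c in range(len(W)):
        g = p[c] - (1.0 if c == label else 0.0)  # d loss / d logit_c
        W[c] = [w - lr * g * xi for w, xi in zip(W[c], x)]
        b[c] -= lr * g
    return loss
```

Starting from zero weights with two classes, the first loss is ln 2 ≈ 0.693, and repeating the step on the same sample lowers it.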
The beneficial effects of the above technical solution are as follows:
1. The method uses a lightweight network that extracts and combines local features by convolution and compresses features by max pooling, which suits the task of screening high- and low-risk users.
2. To address noise in the data, the invention provides a Markov chain data processing method: taking the characteristics of the data into account, it divides the time-series data into several states, builds a Markov chain, and generates the state transition counts and state transition matrix that form part of the neural network input.
3. The invention combines deep learning from the field of artificial intelligence with Markov chain processing for high/low-risk user screening. Even when the data are noisy and poorly discriminable, it can extract the most effective classification features from many candidates, and the Markov chain processing transforms the feature space dimension, yielding a set of classification features that are invariant across similar samples and discriminative between different samples.
Drawings
FIG. 1 is a flow chart of the automobile risk user screening method with Markov chain data processing according to an embodiment of the invention.
Detailed Description
The embodiments of the invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the invention and are not intended to limit its scope.
As shown in FIG. 1, the method of this embodiment is as follows.
Step 1: Read the driving-behavior data from the database and preprocess them according to the longitude, latitude and acquisition time recorded by the GPS, so as to improve the confidence and reliability of the data, as follows:
Step 1.1: Check the data for duplicate records and, if any exist, keep only one copy.
Step 1.2: Handle missing values by deleting the incomplete tuples, filling with 0, filling with the mean, or imputing with the K-nearest-neighbor method.
Step 1.3: Treat data lying outside the longitude and latitude range of each city as abnormal and, depending on the actual situation, clean them with one of the following statistical methods: stepwise backward deletion, mean-based elimination, or deletion of logical errors.
Step 1.4: Because satellite positioning limits the positioning accuracy, treat data whose amount falls below a threshold as invalid, and then clean the data again.
In this embodiment, data on the vehicle condition, the driver's behavior and the environment outside the vehicle during driving are obtained from the vehicle driving system, then analyzed and processed.
Step 2: Based on the location of each city, divide it into a grid by longitude, latitude and time, count the vehicle operation data in the driving behavior of each small area in each time period, and process the data, as follows:
Step 2.1: Select a city, merge all data from different acquisition times, and draw a scatter plot by longitude and latitude to observe how vehicles are distributed. Set candidate grid division criteria according to the density of the scatter plot, and obtain the cell size under each criterion.
Step 2.1.1: Let the maximum and minimum longitudes of the city be max(X) and min(X), the maximum and minimum latitudes be max(Y) and min(Y), and the grid side length be $r_i$ ($i = 1, 2, \dots, m$), where m is the number of candidate division criteria. Under the i-th criterion the city is then divided into $n_{length,i}$ cells along the longitude direction and $n_{width,i}$ cells along the latitude direction.
Step 2.1.2: Sum the variances of the vehicle operation data over all cells under each candidate criterion and, by the minimum-variance method, select the criterion with the smallest total variance as the optimal grid division. Alternatively, adjust the choice by a vote based on the within-cell variances, so as to avoid producing a large number of cells that contain no data.
Step 2.2: Within each spatial cell, divide time into M periods according to whether each time point falls in the peak hours of the road section.
Step 2.3: For each time period of each cell, compute the mean and the variance $\sigma_{ijk}$ of the data in the k-th time period of the cell in row i, column j of the city grid, and on that basis transform the unprocessed data $x_k$ of that cell and period into the processed data $x'_k$.
In this embodiment, four candidate grid division criteria are determined. Under the four criteria, the numbers of cells into which the city is divided along the longitude and latitude directions are:
Criterion 1: 38 and 27;
Criterion 2: 18 and 15;
Criterion 3: 12 and 8;
Criterion 4: 9 and 7.
Step 3: Divide the time-series data into several states according to the processed feature data, choose an interval partition that follows the measured distribution without bias, and count the state transitions and the state transition matrix of the data over the partitioned intervals, as follows:
Step 3.1: Divide the time-series data into N states according to the distribution of the processed feature data x'; the states may use equal or unequal intervals depending on the actual distribution.
Step 3.2: Convert the gridded data into states according to the upper and lower state boundaries, i.e. map x(i) to s(i), i = 1, 2, …, to generate a Markov chain, where x(i) is the gridded data at time i and s(i) is the state at time i.
Step 3.2.1: Let the upper and lower boundaries of the states be B and A, respectively; with equal intervals, the width of each state interval is $\delta = (B - A)/N$.
Step 3.2.2: When $x(i) \in [A + (k-1)\delta,\ A + k\delta]$, set s(i) = k, k = 1, 2, …, N. The feature value at each time point is thereby converted into a state in {1, 2, …, N}. Since the state data have the Markov property, the sequence formed by all states s(i) is a Markov chain.
Step 3.3: Count the transitions of each state s(i) and extract the Markov features.
Step 3.3.1: The Markov features are defined as the transitions of each state: for each state i, count the number of upward transitions, the number of downward transitions, and the number of times $k_i$ that the state is held. The counts are taken over consecutive pairs (s(j), s(j+1)), where s(j) is the state at time j, s(j+1) is the state at time j+1, and L is the number of data points.
Step 3.4: From the extracted upward, downward and holding counts, compute the state transition probabilities and the state transition matrix as follows:
Step 3.4.1: For i = 1, the state retention probability $p_{1,1}$ is the probability of remaining in state 1, and the state transition probability $p_{1,2}$ is the probability of moving from state 1 to state 2.
Step 3.4.2: For 1 < i < N, $p_{i,i-1}$ is the probability of moving from state i to state i−1, $p_{i,i}$ is the probability of remaining in state i, and $p_{i,i+1}$ is the probability of moving from state i to state i+1.
Step 3.4.3: For i = N, $p_{N,N-1}$ is the probability of moving from state N to state N−1, and $p_{N,N}$ is the probability of remaining in state N.
Step 3.4.4: The state transition matrix P is the N × N matrix formed by these probabilities $p_{i,j}$.
In this example, the driver's speed data from the vehicle driving system are taken as an example: the speed data are divided into 7 states, giving a 7 × 7 state transition matrix.
Step 4: Preprocess the state transition counts and the state transition matrix, and combine them with the features that were not processed by the Markov chain to form the input features of the classification neural network, as follows:
Step 4.1: Standardize the torque and power data, which are not processed by the Markov chain, and combine them with the state transition matrix to form the feature vector of the high/low-risk user screening neural network.
Step 4.2: Using S-fold cross-validation, randomly select 75% of the data as the training set and 25% as the test set.
In this embodiment, the feature vector of the high/low-risk user screening neural network has 175 dimensions.
Step 5: In the training stage, a deep convolutional neural network compresses the feature size, enriches the feature dimensions and extracts the main features. The feature vector output by the last layer of the feature network is fed to the fully connected layers, and softmax normalization yields the user's high/low-risk classification, as follows:
Step 5.1: The three convolutional layers extract and combine local features by applying parameter-sharing convolution kernels of various feature dimensions to the feature vector. Each element of the standard convolution output matrix $Y = (y_{ij})$ is computed from the input feature matrix $X = (x_{ij})$ and the convolution kernel matrix $W = (w_{ij})$ as $y_{ij} = \sum_{m}\sum_{n} w_{mn}\,x_{i+m,j+n}$, where m, n are position coordinates in the kernel and run over the K × K kernel, i, j are position coordinates in the input feature matrix, $w_{mn}$ is the kernel weight at position (m, n), $x_{i+m,j+n}$ is the input element that this weight processes when the kernel is at position (i, j), and K is the convolution kernel size.
Step 5.2: The two max-pooling layers keep the largest value within each local receptive field, extracting the key information and compressing the features.
Step 5.3: The feature vector output by the last convolutional layer is fed to the fully connected layers, whose weight matrices combine the extracted local features into a global representation. Two fully connected layers are used to improve the nonlinear expressiveness of the model, Dropout is used to prevent overfitting, and softmax normalization yields the user's high/low-risk classification.
In this embodiment, three convolutional layers are defined, each followed by an activation function layer and a max pooling layer.
Step 6: Compute the cross-entropy loss and minimize it by stochastic gradient descent to update the network parameters, thereby improving high/low-risk user screening.
In this embodiment, the loss is computed with softmax cross-entropy. In the test stage, the embodiment achieves an overall classification accuracy of 86.46%, with a screening accuracy of 89.74% for high-risk users and 79.95% for low-risk users.
Claims (6)
1. An automobile risk user screening method with Markov chain data processing, comprising the following steps:
Step 1: reading driving-behavior data from a database, and preprocessing the data according to the longitude, latitude and acquisition time recorded by the GPS to improve the confidence and reliability of the data;
Step 2: based on the location of each city, dividing it into a grid by longitude, latitude and time, counting the vehicle operation data in the driving behavior of each small area in each time period, and processing the data;
Step 3: dividing the time-series data into several states according to the processed feature data, choosing an interval partition that follows the measured distribution without bias, and counting the state transitions and the state transition matrix of the data over the partitioned intervals;
Step 4: preprocessing the state transition counts and the state transition matrix, and combining them with the features not processed by the Markov chain to form the input features of the classification neural network;
Step 5: in the training stage, using a deep convolutional neural network to compress the feature size, enrich the feature dimensions and extract the main features, feeding the feature vector output by the last layer of the feature network to the fully connected layers, and obtaining the user's high/low-risk classification after softmax normalization;
step 5 comprises the following steps:
step 5.1: the three convolutional layers of the neural network extract and combine local features by convolving the feature vectors with parameter-sharing convolution kernels across the feature dimensions; a standard convolution output matrix $Y = (y_{ij})$ is calculated from the input feature matrix $X = (x_{ij})$ and the convolution kernel matrix $W = (w_{mn})$ as:
$$y_{ij} = \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} w_{mn}\, x_{i+m,\,j+n}$$
where m, n are position coordinates in the weight matrix, i, j are position coordinates in the input feature matrix, $w_{mn}$ is the filter weight at position (m, n), $x_{i+m,j+n}$ is the input feature value covered by the filter when it is positioned at (i, j), and K is the convolution kernel size;
step 5.2: the two max-pooling layers of the neural network extract key information by taking the maximum value in each local receptive field, thereby compressing the features;
step 5.3: the neural network feeds the feature vector output by the last convolutional layer into the fully connected layers, which connect the extracted local features through weight matrices and map them back to a global representation; two fully connected layers are used to improve the nonlinear expressive capacity of the model, Dropout is used to prevent overfitting, and the high/low-risk classification of the user is obtained through softmax normalization;
step 6: calculate the cross-entropy loss and minimize the loss function by stochastic gradient descent, updating the network model parameters to obtain better screening of high- and low-risk users.
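The three building blocks named in steps 5.1 to 5.3 can be sketched in plain Python as follows. This is a minimal illustration, assuming a single-channel 2-D feature matrix, a square K×K kernel, stride 1 with no padding, and non-overlapping 2×2 pooling windows; the function names are hypothetical, and the full network described in the claim (three convolutional layers, two pooling layers, two fully connected layers with Dropout) is not reproduced. Note that the step 5.1 formula is the cross-correlation form that deep-learning frameworks conventionally call convolution.

```python
import math


def conv2d_valid(x, w):
    """y[i][j] = sum over m, n of w[m][n] * x[i+m][j+n] (stride 1, no padding)."""
    k = len(w)  # square K x K kernel assumed
    rows = len(x) - k + 1
    cols = len(x[0]) - k + 1
    return [[sum(w[m][n] * x[i + m][j + n] for m in range(k) for n in range(k))
             for j in range(cols)]
            for i in range(rows)]


def max_pool2d(x, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(x[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


def softmax(logits):
    # Normalize the final-layer outputs into class probabilities.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

For example, with a 4×4 input and the kernel `[[1, 0], [0, 1]]`, each output entry is the sum of an input value and its lower-right neighbour, giving a 3×3 map, while 2×2 max pooling of the same 4×4 input keeps the maximum of each quadrant.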
2. The method for screening automobile risk users with Markov chain data processing according to claim 1, wherein step 1 comprises the following steps:
step 1.1: check the data for duplicate records and, if any exist, keep only one copy;
step 1.2: handle missing data by tuple deletion, zero filling, mean filling, or K-nearest-neighbor imputation;
step 1.3: based on the longitude and latitude range of each city, treat data outside that city's range as abnormal and, depending on the actual situation, clean the data with one of the following statistical methods: stepwise backward deletion, mean elimination, or logical-error deletion;
step 1.4: considering the effect of satellite positioning technology on positioning accuracy, treat data below the threshold data amount as invalid, and then clean the data again.
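A compact sketch of the cleaning steps 1.1 to 1.4 is shown below. It rests on several assumptions not stated in the claim: records are hypothetical tuples `(vehicle_id, lon, lat, t, value)` with `value = None` when missing; mean filling is used (one of the options listed in step 1.2); the city range is a simple longitude/latitude bounding box; and the step 1.4 threshold is applied per vehicle. The statistical options of step 1.3 (stepwise backward deletion, etc.) are not shown.

```python
from collections import Counter


def clean(records, lon_range, lat_range, min_points):
    """Hypothetical cleaning pass over (vehicle_id, lon, lat, t, value) tuples."""
    # Step 1.1: keep a single copy of exact duplicates, preserving order.
    unique = list(dict.fromkeys(records))
    # Step 1.2: fill missing values with the mean of the observed values.
    observed = [r[4] for r in unique if r[4] is not None]
    fill = sum(observed) / len(observed) if observed else 0.0
    filled = [r if r[4] is not None else r[:4] + (fill,) for r in unique]
    # Step 1.3: points outside the city's longitude/latitude box are abnormal.
    inside = [r for r in filled
              if lon_range[0] <= r[1] <= lon_range[1]
              and lat_range[0] <= r[2] <= lat_range[1]]
    # Step 1.4: drop vehicles left with fewer than min_points records.
    counts = Counter(r[0] for r in inside)
    return [r for r in inside if counts[r[0]] >= min_points]
```

Because the claim orders step 1.2 before step 1.3, the fill value in this sketch is computed before out-of-range points are removed; computing it afterwards would be an equally reasonable choice.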
3. The method for screening automobile risk users with Markov chain data processing according to claim 1, wherein step 2 comprises the following steps:
step 2.1: select a city, merge all data from the different acquisition times, draw a scatter plot by longitude and latitude to observe the driving distribution of the vehicles, set the city grid division standards according to the density of the scatter plot, and obtain the area grid size under each division standard;
step 2.1.1: assume the maximum and minimum longitudes of the city are max(X) and min(X), the maximum and minimum latitudes are max(Y) and min(Y), and the side length of the city grid is $r_i$, $i = 1, 2, 3, \dots, m$, where m is the number of candidate grid division standards; the numbers of grid cells into which the city is divided by longitude and by latitude are then, respectively:
where $n_{length,i}$ denotes the number of grid cells along the longitude direction under the i-th candidate division standard, and $n_{width,i}$ the number of grid cells along the latitude direction;
step 2.1.2: sum the variances of the vehicle operation data of all areas under each candidate division standard and, by the minimum-variance method, select the standard with the smallest total variance as the optimal grid division standard; alternatively, adjust by voting according to the variance within small areas, so as to avoid a large number of areas without data;
step 2.2: divide each spatial cell into M time periods according to whether each time point falls in a peak period for the road section;
step 2.3: for each time period of each small area, compute the mean and variance of the data, and process the data on that basis:
where $\mu_{ijk}$ and $\sigma_{ijk}$ are respectively the mean and variance of the data in the k-th time period of the grid cell in row i, column j of the city, $x_k$ is the unprocessed data of the grid cell in time period k, and $x'_k$ is the processed data of the grid cell in time period k.
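The gridding and per-cell normalization of claim 3 can be sketched as below. Two points are assumptions because the original formulas are not reproduced in this text: the cell counts are rounded up with a ceiling so that the grid covers the whole city, and the step 2.3 processing is taken to be z-score standardization within each (row, column, time period) group, dividing by the standard deviation. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def grid_shape(lon_min, lon_max, lat_min, lat_max, r):
    """Cells along longitude and latitude for side length r (ceiling assumed)."""
    n_length = math.ceil((lon_max - lon_min) / r)
    n_width = math.ceil((lat_max - lat_min) / r)
    return n_length, n_width


def cell_index(lon, lat, lon_min, lat_min, r, n_length, n_width):
    """(row, col) of the cell containing a point; max-edge points go to the last cell."""
    col = min(int((lon - lon_min) / r), n_length - 1)
    row = min(int((lat - lat_min) / r), n_width - 1)
    return row, col


def standardize_groups(samples):
    """Z-score values within each (row, col, period) group.

    samples: list of ((row, col, period), value); output keeps the input order.
    A group with zero spread maps to 0.0.
    """
    groups = {}
    for key, v in samples:
        groups.setdefault(key, []).append(v)
    stats = {k: (mean(vs), pstdev(vs)) for k, vs in groups.items()}
    out = []
    for key, v in samples:
        mu, sigma = stats[key]
        out.append((v - mu) / sigma if sigma > 0 else 0.0)
    return out
```

Running `grid_shape` for each candidate side length $r_i$ and comparing the summed per-cell variances would then implement the minimum-variance selection of step 2.1.2.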
4. The method for screening automobile risk users with Markov chain data processing according to claim 1, wherein step 3 comprises the following steps:
step 3.1: divide the time-series data into N states according to the distribution of the processed feature data x';
the states are divided into equal or unequal intervals according to the actual distribution;
step 3.2: convert the gridded data into states according to the upper and lower state boundaries, i.e. $x(i) \to s(i)$, $i = 1, 2, \dots$, generating a Markov chain;
where x(i) is the gridded data at time i and s(i) is the state at time i;
step 3.2.1: assuming the upper and lower boundaries of the states are B and A respectively, the interval width of each state is:
$$\delta = \frac{B - A}{N}$$
step 3.2.2: when $x(i) \in [A + (k-1)\delta,\ A + k\delta]$, $s(i) = k$, $k = 1, 2, \dots, N$; the feature data at each time point are thus converted into state data in $\{1, 2, \dots, N\}$, and since the state data have the Markov property, the sequence formed by all the states s(i) is a Markov chain;
step 3.3: count the transitions of each state s(i) and extract the Markov features;
step 3.3.1: define the Markov features as the transitions of each state: count the number of upward transitions and the number of downward transitions of state i, and the number of times $k_i$ that state i is held; these counts are calculated as follows:
where s(j) denotes the state at time j, s(j+1) denotes the state at time j+1, and L denotes the number of data points;
step 3.4: calculate the state transition probabilities and the state transition matrix from the extracted upward and downward transition counts and $k_i$.
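Steps 3.2 to 3.3.1 can be sketched as follows, assuming the equal-interval division of step 3.2.1 and counting over the L-1 consecutive pairs (s(j), s(j+1)). Because the original counting formulas are not reproduced in this text, three choices here are assumptions: a value on an interior boundary goes to the upper state (B itself maps to state N), values outside [A, B] are clipped to the end states, and any move to a higher (lower) state counts as an upward (downward) transition. Function names are hypothetical.

```python
def to_states(xs, A, B, N):
    """Map each value to a state 1..N using N equal-width intervals on [A, B]."""
    delta = (B - A) / N
    states = []
    for x in xs:
        k = int((x - A) // delta) + 1
        states.append(min(max(k, 1), N))  # clip to the range 1..N
    return states


def count_transitions(states, N):
    """Per-state counts of upward moves, downward moves and holds.

    Lists have length N + 1 so that index i refers to state i (index 0 unused).
    """
    up = [0] * (N + 1)
    down = [0] * (N + 1)
    hold = [0] * (N + 1)
    for a, b in zip(states, states[1:]):
        if b > a:
            up[a] += 1
        elif b < a:
            down[a] += 1
        else:
            hold[a] += 1  # contributes to k_i
    return up, down, hold
```

For instance, with A = 0, B = 10 and N = 5 (so δ = 2), the series 0.5, 3.0, 9.9, 10.0, 4.0, 4.0 becomes the state sequence 1, 2, 5, 5, 3, 3.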
5. The method for screening automobile risk users with Markov chain data processing according to claim 4, wherein step 3.4 comprises the following steps:
step 3.4.1: when i = 1, the corresponding state transition probability and state retention probability are:
where $p_{1,1}$ is the probability of transitioning from state 1 to state 1 (i.e. remaining in state 1) and $p_{1,2}$ is the probability of transitioning from state 1 to state 2;
step 3.4.2: when 1 < i < N, the corresponding state transition probabilities and state retention probability are:
where $p_{i,i-1}$ is the probability of transitioning from state i to state i-1, $p_{i,i}$ is the probability of transitioning from state i to state i, and $p_{i,i+1}$ is the probability of transitioning from state i to state i+1;
step 3.4.3: when i = N, the corresponding state transition probability and state retention probability are:
where $p_{N,N-1}$ is the probability of transitioning from state N to state N-1 and $p_{N,N}$ is the probability of transitioning from state N to state N;
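step 3.4.4: the state transition matrix is expressed as:
$$P = \begin{pmatrix} p_{1,1} & p_{1,2} & 0 & \cdots & 0 \\ p_{2,1} & p_{2,2} & p_{2,3} & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & p_{N-1,N-2} & p_{N-1,N-1} & p_{N-1,N} \\ 0 & \cdots & 0 & p_{N,N-1} & p_{N,N} \end{pmatrix}$$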
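The tridiagonal matrix of steps 3.4.1 to 3.4.4 can be sketched as below, assuming each row is normalized by the total number of moves observed out of that state (the usual maximum-likelihood estimate). The patent's own probability formulas are not reproduced in this text, so this normalization is an assumption. The inputs `up`, `down` and `hold` are hypothetical per-state count lists indexed 1..N (index 0 unused); state 1 gets no downward entry and state N no upward entry, matching the three cases of the claim.

```python
def transition_matrix(up, down, hold, N):
    """Tridiagonal N x N transition matrix estimated from per-state counts."""
    P = [[0.0] * N for _ in range(N)]
    for i in range(1, N + 1):
        d = down[i] if i > 1 else 0   # no p_{1,0}
        u = up[i] if i < N else 0     # no p_{N,N+1}
        total = d + hold[i] + u
        if total == 0:
            continue  # no moves observed out of state i: row left as zeros
        r = i - 1  # 0-based row index
        P[r][r] = hold[i] / total          # p_{i,i}
        if i > 1:
            P[r][r - 1] = d / total        # p_{i,i-1}
        if i < N:
            P[r][r + 1] = u / total        # p_{i,i+1}
    return P
```

Each row with at least one observed move sums to 1; flattening `P` gives the Markov part of the feature vector used in step 4.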
6. The method for screening automobile risk users with Markov chain data processing according to claim 1, wherein step 4 comprises the following steps:
step 4.1: standardize the torque and power data, which are not processed by the Markov chain, and combine them with the state transition matrix to form the feature vector of the high/low-risk user screening neural network;
step 4.2: using an S-fold cross-validation model, randomly select 75% of the data as the training set and 25% as the test set.
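Claim 6 can be sketched as follows, assuming one torque value, one power value and one transition matrix per sample (a hypothetical layout; the claim does not fix the granularity). Torque and power are z-scored across samples and concatenated with the flattened matrix, and the samples are then split 75%/25% at random. The S-fold cross-validation loop itself is not shown, and the function names and fixed seed are illustrative choices.

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def build_features(torque, power, matrices):
    """One vector per sample: standardized torque and power, then the flattened matrix."""
    t, p = zscore(torque), zscore(power)
    return [[t[k], p[k]] + [v for row in matrices[k] for v in row]
            for k in range(len(matrices))]


def split_75_25(samples, seed=0):
    """Random 75% / 25% train/test split with a fixed seed for reproducibility."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(0.75 * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]
```

Standardizing torque and power before concatenation keeps them on a scale comparable to the transition probabilities, which already lie in [0, 1].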
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011021233.1A CN112183615B (en) | 2020-09-25 | 2020-09-25 | Automobile risk user screening method with Markov chain data processing function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112183615A CN112183615A (en) | 2021-01-05 |
CN112183615B true CN112183615B (en) | 2023-08-18 |
Family
ID=73943430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011021233.1A Active CN112183615B (en) | 2020-09-25 | 2020-09-25 | Automobile risk user screening method with Markov chain data processing function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112183615B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001148019A (en) * | 1999-06-01 | 2001-05-29 | Fujitsu Ltd | Method and device for classifying motion between traveling objects, image recognizing device, and method and device for recognizing traveling object |
WO2018022821A1 (en) * | 2016-07-29 | 2018-02-01 | Arizona Board Of Regents On Behalf Of Arizona State University | Memory compression in a deep neural network |
CN107742193A (en) * | 2017-11-28 | 2018-02-27 | 江苏大学 | A kind of driving Risk Forecast Method based on time-varying state transition probability Markov chain |
CN111209966A (en) * | 2020-01-07 | 2020-05-29 | 中南大学 | Markov chain-based path travel time determination method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10902336B2 (en) * | 2017-10-03 | 2021-01-26 | International Business Machines Corporation | Monitoring vehicular operation risk using sensing devices |
- 2020-09-25: CN202011021233.1A (CN), patent CN112183615B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN112183615A (en) | 2021-01-05 |
Similar Documents
Publication | Title |
---|---|
CN112991354B (en) | High-resolution remote sensing image semantic segmentation method based on deep learning |
CN109635010B (en) | User characteristic and characteristic factor extraction and query method and system |
CN111861756B (en) | Group partner detection method based on financial transaction network and realization device thereof |
CN101694744A (en) | Method and system for evaluating road emergency evacuation capacity and method and system for grading road emergency evacuation capacity |
CN111598460B (en) | Method, device, equipment and storage medium for monitoring heavy metal content of soil |
CN112766283B (en) | Two-phase flow pattern identification method based on multi-scale convolution network |
CN111539444A (en) | Gaussian mixture model method for modified mode recognition and statistical modeling |
Meng et al. | Improving automobile insurance claims frequency prediction with telematics car driving data |
CN115905818A (en) | Landslide early warning method based on data mining |
CN116307103A (en) | Traffic accident prediction method based on hard parameter sharing multitask learning |
CN111242028A (en) | Remote sensing image ground object segmentation method based on U-Net |
CN114973019A (en) | Deep learning-based geospatial information change detection classification method and system |
CN113206808B (en) | Channel coding blind identification method based on one-dimensional multi-input convolutional neural network |
CN114882373A (en) | Multi-feature fusion sandstorm prediction method based on deep neural network |
CN117726939A (en) | Hyperspectral image classification method based on multi-feature fusion |
CN112183615B (en) | Automobile risk user screening method with Markov chain data processing function |
CN117436653A (en) | Prediction model construction method and prediction method for travel demands of network about vehicles |
CN110188324B (en) | Traffic accident poisson regression analysis method based on feature vector space filtering value |
CN114943290B (en) | Biological intrusion recognition method based on multi-source data fusion analysis |
CN114252706B (en) | Lightning early warning method and system |
CN115293639A (en) | Battlefield situation studying and judging method based on hidden Markov model |
CN115130599A (en) | Semi-supervision method for strip mine card state recognition under time series GAN data enhancement |
CN112465054A (en) | Multivariate time series data classification method based on FCN |
CN116698410B (en) | Rolling bearing multi-sensor data monitoring method based on convolutional neural network |
CN117216490B (en) | Intelligent big data acquisition system |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |