CN112183615B - Automobile risk user screening method with Markov chain data processing function - Google Patents

Automobile risk user screening method with Markov chain data processing function

Info

Publication number
CN112183615B
Authority
CN
China
Prior art keywords
data
state
markov chain
transition
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011021233.1A
Other languages
Chinese (zh)
Other versions
CN112183615A (en)
Inventor
刘洋
郑泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruichida New Energy Automotive Technology Beijing Co ltd
Original Assignee
Ruichida New Energy Automotive Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruichida New Energy Automotive Technology Beijing Co ltd
Priority to CN202011021233.1A
Publication of CN112183615A
Application granted
Publication of CN112183615B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an automobile risk user screening method with Markov chain data processing, belonging to the technical field of automobile risk user classification. The method comprises: acquiring the data attributes of a user's journey, including the longitude, latitude, and time of the data, and cleaning and integrating the data; dividing the data into small regions by time and space and extracting the data features of each region; mapping the processed data of each region into the data format of a Markov chain to obtain a state transition matrix; and using the extracted features to train a convolutional neural network classifier, updating the network parameters with a cross-entropy loss. The invention addresses the problem of extracting the most effective classification features from many candidate features, transforms the dimension of the feature space, and obtains a set of classification features that are invariant across samples of the same class and discriminative between samples of different classes.

Description

Automobile risk user screening method with Markov chain data processing function
Technical Field
The invention relates to the technical field of automobile risk user classification, in particular to an automobile risk user screening method with Markov chain data processing.
Background
With the continuing development of artificial intelligence, deep-learning classification networks have been carried over into many research fields, and delivering practical value from artificial intelligence has become a goal of many researchers. For example, applying classification networks to the screening of high- and low-risk users gives the automotive industry an important aid and reference for serving users better. The data used to screen high- and low-risk automobile users come from the vehicle driving system. These data are very large in volume and contain a great deal of noise, they are poorly discriminable, and the available data attributes differ widely from one another, all of which makes network training difficult. As the network deepens, overfitting occurs easily, and the network model may fail to converge.
At present, the Markov chain, as a concept for describing processes over time, is widely applied in artificial intelligence fields such as speech recognition, text recognition, and path recognition. It is also used in finance to predict the market share of enterprise products, and as a signal model in entropy coding, and in each case it is a principal means of solving the problem. In the screening of automobile risk users, however, there is as yet no application of Markov chains to data preprocessing to improve screening accuracy.
Disclosure of Invention
In view of the above deficiencies of the prior art, the present invention provides an automobile risk user screening method with Markov chain data processing.
To solve the above technical problems, the invention adopts the following technical scheme: an automobile risk user screening method with Markov chain data processing, comprising the following steps:
Step 1: Read the driving-behavior data from the database and preprocess it using the longitude and latitude acquired by GPS and the data acquisition time, improving the confidence and reliability of the data. The process is as follows:
Step 1.1: Check whether the data contain duplicate records; if so, keep only one copy.
Step 1.2: Handle missing values and 0 values by deleting the tuples, filling with the mean, or filling with the K-nearest-neighbor distance method.
Step 1.3: Using the longitude and latitude range of each city, treat data that fall outside the range of the cities as abnormal data, and clean them with one of the following statistical methods, chosen according to the actual situation: stepwise backward deletion, mean elimination, or logical-error deletion.
Step 1.4: Taking into account the effect of the satellite positioning technology on positioning accuracy, treat data whose amount is below a threshold as invalid, and then clean the data again.
Step 2: Using the location information of each city, divide the data into a grid by longitude, latitude, and time, compute statistics of the vehicle operation data in the driving behavior for each small region in each time period, and process the data. The process is as follows:
Step 2.1: Select a city, merge all data from the different acquisition times, and draw a scatter plot by longitude and latitude to observe the distribution of vehicle driving. Set candidate grid division standards for the city according to the density of the scatter plot, and obtain the grid cell size under each standard.
Step 2.1.1: Let the maximum and minimum longitudes of the city be $\max(X)$ and $\min(X)$, and the maximum and minimum latitudes be $\max(Y)$ and $\min(Y)$. Let the side length of the city grid be $r_i$ ($i = 1, 2, 3, \ldots, m$), where $m$ is the number of candidate grid division standards. The numbers of grid cells into which the city is divided along longitude and latitude are then, respectively:
where $n_{\text{length},i}$ is the number of grid cells along longitude under the $i$-th candidate division standard, and $n_{\text{width},i}$ is the number of grid cells along latitude under the $i$-th candidate division standard.
Step 2.1.2: Under each candidate division standard, sum the variances of the vehicle operation data over all regions, and choose the standard with the smallest total variance as the optimal grid division standard (minimum variance method). Alternatively, use a voting method adjusted by the variance within the small regions, so as to avoid a large number of regions with no data.
Step 2.2: Within each spatial cell, divide the data into $M$ time periods according to whether the time falls in a peak period for the road section.
Step 2.3: For each time period of each small region, compute the mean and variance of the data, and on that basis compute:
where the mean and the variance $\sigma_{ijk}$ are taken over the data in the $k$-th time period of the grid cell in row $i$, column $j$ of the city; $x_k$ is the unprocessed data of that cell in time period $k$, and $x'_k$ is the processed data of that cell in time period $k$.
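The formulas referenced in steps 2.1.1 and 2.3 do not appear in this text. A plausible reconstruction from the surrounding definitions is sketched below. Rounding up to a whole number of cells, the symbol $\mu_{ijk}$ for the mean, and treating $\sigma_{ijk}$ as the spread in a z-score style standardization are all assumptions, not formulas confirmed by the patent.

```latex
n_{\text{length},i} = \left\lceil \frac{\max(X) - \min(X)}{r_i} \right\rceil ,
\qquad
n_{\text{width},i} = \left\lceil \frac{\max(Y) - \min(Y)}{r_i} \right\rceil

x'_k = \frac{x_k - \mu_{ijk}}{\sigma_{ijk}}
```

Under this reading, a smaller side length $r_i$ gives more cells in both directions, and the processed value $x'_k$ expresses each observation relative to the typical level of its own cell and time period.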
Step 3: Using the processed feature data, divide the time-series data into several states, determine an interval partition under which the measured values are distributed without bias, and compute the state transitions and the state transition matrix of the data over the partition. The process is as follows:
Step 3.1: Divide the time-series data into $N$ states according to the distribution of the processed feature data $x'$.
The states are divided into equal or unequal intervals according to the actual distribution.
Step 3.2: Convert the gridded data into states according to the upper and lower boundaries of the states, i.e. $x(i) \to s(i)$, $i = 1, 2, \ldots$, generating a Markov chain,
where $x(i)$ is the gridded data at time $i$ and $s(i)$ is the state at time $i$.
Step 3.2.1: Let the upper and lower boundaries of the states be $B$ and $A$, respectively. The interval between states is:
$$\Delta = \frac{B - A}{N}$$
Step 3.2.2: When $x(i) \in [A + (k-1)\Delta,\ A + k\Delta]$, set $s(i) = k$, $k = 1, 2, \ldots, N$. The feature data at each time point are thereby converted into state data in $\{1, 2, \ldots, N\}$. The state data have the Markov property, so the data set formed by all the state information $s(i)$ is a Markov chain.
Step 3.3: Count the transitions of each state $s(i)$ and extract the Markov features.
Step 3.3.1: Define the Markov features, i.e. the transition behavior of each state: the numbers of upward and downward transitions of state $i$, and the number of times $k_i$ that state $i$ is held. These counts are calculated as follows:
where $s(j)$ is the state at time $j$, $s(j+1)$ is the state at time $j+1$, and $L$ is the number of data points.
Step 3.4: From the extracted upward and downward transition counts and $k_i$, calculate the state transition probabilities and the state transition matrix, as follows:
Step 3.4.1: When the state is $i = 1$, the corresponding state transition probability and state retention probability are:
where $p_{1,1}$ is the probability of transition from state 1 to state 1, and $p_{1,2}$ is the probability of transition from state 1 to state 2.
Step 3.4.2: When the state satisfies $1 < i < N$, the corresponding state transition probabilities and state retention probability are:
where $p_{i,i-1}$ is the probability of transition from state $i$ to state $i-1$, $p_{i,i}$ is the probability of transition from state $i$ to state $i$, and $p_{i,i+1}$ is the probability of transition from state $i$ to state $i+1$.
Step 3.4.3: When the state is $i = N$, the corresponding state transition probability and state retention probability are:
where $p_{N,N-1}$ is the probability of transition from state $N$ to state $N-1$, and $p_{N,N}$ is the probability of transition from state $N$ to state $N$.
Step 3.4.4: The state transition matrix can be expressed as:
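The count, probability, and matrix formulas of steps 3.3.1 to 3.4.4 are not reproduced in this text. One reconstruction consistent with the definitions above is sketched here. The symbols $u_i$ and $d_i$ for the upward and downward counts are hypothetical names, and the tridiagonal shape of the matrix is an inference from the fact that only $p_{i,i-1}$, $p_{i,i}$, and $p_{i,i+1}$ are defined.

```latex
u_i = \sum_{j=1}^{L-1} \mathbf{1}\left[ s(j) = i,\ s(j+1) > i \right], \quad
d_i = \sum_{j=1}^{L-1} \mathbf{1}\left[ s(j) = i,\ s(j+1) < i \right], \quad
k_i = \sum_{j=1}^{L-1} \mathbf{1}\left[ s(j) = i,\ s(j+1) = i \right]

p_{i,i-1} = \frac{d_i}{u_i + d_i + k_i}, \qquad
p_{i,i}   = \frac{k_i}{u_i + d_i + k_i}, \qquad
p_{i,i+1} = \frac{u_i}{u_i + d_i + k_i}

P = \begin{pmatrix}
p_{1,1} & p_{1,2} & 0       & \cdots    & 0 \\
p_{2,1} & p_{2,2} & p_{2,3} & \cdots    & 0 \\
\vdots  & \ddots  & \ddots  & \ddots    & \vdots \\
0       & \cdots  & p_{N-1,N-2} & p_{N-1,N-1} & p_{N-1,N} \\
0       & \cdots  & 0       & p_{N,N-1} & p_{N,N}
\end{pmatrix}
```

With $d_1 = 0$ and $u_N = 0$ at the two boundary states, each row of $P$ sums to 1 whenever state $i$ is left or held at least once.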
Step 4: Preprocess the state transitions and the state transition matrix and combine them with the features not processed by the Markov chain, so that together they form the data features used as input to the classification neural network. The process is as follows:
Step 4.1: Standardize the part of the data not processed by the Markov chain (torque and power), and combine it with the state transition matrix to form the feature vector of the high/low-risk user screening neural network.
Step 4.2: Using S-fold cross-validation, randomly select 75% of the data as the training set and 25% as the test set.
Step 5: In the training stage, use a deep convolutional neural network to compress the feature size, enrich the feature dimensions, and extract the main features. The feature vector output by the last layer of the feature network is fed to a fully connected layer, and the high/low-risk classification of the user is obtained after softmax normalization. The process is as follows:
Step 5.1: The three convolutional layers of the neural network use convolution kernels with shared parameters, at several feature dimensions, to extract and combine local features of the feature vector. The standard convolution output matrix $Y = (y_{ij})$ is computed from the input feature matrix $X = (x_{ij})$ and the convolution kernel matrix $W = (w_{ij})$ as:
where $m, n$ are position coordinates in the weight matrix, $i, j$ are position coordinates in the input feature matrix, $w_{mn}$ is the filter weight at position $(m, n)$, $x_{i+m,j+n}$ is the feature tensor element to be processed by the filter at position $(i, j)$, and $K$ is the convolution kernel size.
Step 5.2: The two max pooling layers of the neural network extract key information by taking the point with the largest value in each local receptive field, compressing the features.
Step 5.3: The neural network feeds the feature vector output by the last convolutional layer into the fully connected layers, which connect the extracted local features through a weight matrix and map them back to a global representation. Two fully connected layers are used to improve the nonlinear expressive capacity of the model, Dropout is used to prevent overfitting, and the high/low-risk classification of the user is obtained through softmax normalization.
Step 6: Compute the cross-entropy loss and minimize the loss function by stochastic gradient descent, thereby updating the network model parameters and achieving better screening of high- and low-risk users.
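The convolution formula of step 5.1 is missing from this text, and steps 5.3 and 6 name softmax, cross-entropy, and stochastic gradient descent without writing them out. The standard forms are sketched below as an aid to reading. The zero-based summation limits, the absence of a bias term, and the symbols $z_c$ (logits), $t_c$ (one-hot target), $\theta$ (parameters), and $\eta$ (learning rate) are assumptions introduced here, not notation from the patent.

```latex
y_{ij} = \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} w_{mn}\, x_{i+m,\,j+n}

\hat{p}_c = \frac{e^{z_c}}{\sum_{c'} e^{z_{c'}}}, \qquad
\mathcal{L} = -\sum_{c} t_c \log \hat{p}_c, \qquad
\theta \leftarrow \theta - \eta\, \nabla_{\theta} \mathcal{L}
```

For the two-class task described here, $c$ ranges over the high-risk and low-risk classes.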
The beneficial effects of the above technical scheme are as follows:
1. The method designs a lightweight network that extracts and combines local features by convolution and compresses features by max pooling, suiting it to the task of screening high- and low-risk users.
2. To address data noise, the invention provides a data processing method based on the Markov chain: taking the characteristics of the data into account, the time-series data are divided into several states, a Markov chain is established, and the state transitions and the state transition matrix are generated as part of the input data of the neural network.
3. The invention combines deep learning with high/low-risk user screening and Markov chain processing. It can extract the most effective classification features from many features even when the data are noisy and poorly discriminable, and the Markov chain data processing transforms the dimension of the feature space, yielding a set of classification features that are invariant across samples of the same class and discriminative between samples of different classes.
Drawings
FIG. 1 is a flow chart of a method for screening risk users of an automobile with Markov chain data processing in an embodiment of the invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
As shown in FIG. 1, the method of this embodiment is as follows.
Step 1: Read the driving-behavior data from the database and preprocess it using the longitude and latitude acquired by GPS and the data acquisition time, improving the confidence and reliability of the data. The process is as follows:
Step 1.1: Check whether the data contain duplicate records; if so, keep only one copy.
Step 1.2: Handle missing values and 0 values by deleting the tuples, filling with the mean, or filling with the K-nearest-neighbor distance method.
Step 1.3: Using the longitude and latitude range of each city, treat data that fall outside the range of the cities as abnormal data, and clean them with one of the following statistical methods, chosen according to the actual situation: stepwise backward deletion, mean elimination, or logical-error deletion.
Step 1.4: Taking into account the effect of the satellite positioning technology on positioning accuracy, treat data whose amount is below a threshold as invalid, and then clean the data again.
In this embodiment, data such as the condition of the vehicle, the behavior of the driver, and the environment outside the vehicle during the driving process of the driver are obtained from the vehicle driving system and analyzed and processed.
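The cleaning steps of Step 1 can be illustrated with a minimal Python sketch. The record layout (`lon`, `lat`, `time`, `speed`), the function name `preprocess`, and the `min_records` threshold are hypothetical. The sketch fills missing values with the mean only, as a simpler stand-in for the K-nearest-neighbor option, and it filters by the city bounding box before filling so that out-of-range points do not affect the mean.

```python
from statistics import mean


def preprocess(records, city_box, min_records=3):
    """Clean a list of dicts with keys 'lon', 'lat', 'time', 'speed'.

    city_box is (lon_min, lon_max, lat_min, lat_max). All names are illustrative.
    """
    # Step 1.1: drop exact duplicates, keeping the first copy.
    seen, unique = set(), []
    for r in records:
        key = (r["lon"], r["lat"], r["time"], r["speed"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))  # copy so the caller's data is not modified

    # Step 1.3: treat points outside the city bounding box as abnormal.
    lon_min, lon_max, lat_min, lat_max = city_box
    inside = [r for r in unique
              if lon_min <= r["lon"] <= lon_max and lat_min <= r["lat"] <= lat_max]

    # Step 1.2 (mean-fill variant): replace missing speeds with the observed mean.
    observed = [r["speed"] for r in inside if r["speed"] is not None]
    if not observed:
        return []
    fill = mean(observed)
    for r in inside:
        if r["speed"] is None:
            r["speed"] = fill

    # Step 1.4: too little data is treated as invalid.
    return inside if len(inside) >= min_records else []
```

A call such as `preprocess(records, (115.4, 117.5, 39.4, 41.1))` returns the cleaned records, or an empty list when fewer than `min_records` remain. The bounding-box values in that call are placeholders, not figures from the patent.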
Step 2: Using the location information of each city, divide the data into a grid by longitude, latitude, and time, compute statistics of the vehicle operation data in the driving behavior for each small region in each time period, and process the data. The process is as follows:
Step 2.1: Select a city, merge all data from the different acquisition times, and draw a scatter plot by longitude and latitude to observe the distribution of vehicle driving. Set candidate grid division standards for the city according to the density of the scatter plot, and obtain the grid cell size under each standard.
Step 2.1.1: Let the maximum and minimum longitudes of the city be $\max(X)$ and $\min(X)$, and the maximum and minimum latitudes be $\max(Y)$ and $\min(Y)$. Let the side length of the city grid be $r_i$ ($i = 1, 2, 3, \ldots, m$), where $m$ is the number of candidate grid division standards. The numbers of grid cells into which the city is divided along longitude and latitude are then, respectively:
where $n_{\text{length},i}$ is the number of grid cells along longitude under the $i$-th candidate division standard, and $n_{\text{width},i}$ is the number of grid cells along latitude under the $i$-th candidate division standard.
Step 2.1.2: Under each candidate division standard, sum the variances of the vehicle operation data over all regions, and choose the standard with the smallest total variance as the optimal grid division standard (minimum variance method). Alternatively, use a voting method adjusted by the variance within the small regions, so as to avoid a large number of regions with no data.
Step 2.2: Within each spatial cell, divide the data into $M$ time periods according to whether the time falls in a peak period for the road section.
Step 2.3: For each time period of each small region, compute the mean and variance of the data, and on that basis compute:
where the mean and the variance $\sigma_{ijk}$ are taken over the data in the $k$-th time period of the grid cell in row $i$, column $j$ of the city; $x_k$ is the unprocessed data of that cell in time period $k$, and $x'_k$ is the processed data of that cell in time period $k$.
In this embodiment, 4 candidate grid division standards are determined. Under the four standards, the numbers of grid cells into which the city is divided along longitude and latitude are, respectively:
Standard 1: 38 and 27;
Standard 2: 18 and 15;
Standard 3: 12 and 8;
Standard 4: 9 and 7.
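The grid division and the minimum-variance choice between candidate standards can be sketched as follows. This is an illustration under stated assumptions: side lengths are taken in the same units as the coordinates, cell counts are rounded up, and all function names are hypothetical. The city bounds behind the four standards above are not given, so the sketch does not attempt to reproduce those numbers.

```python
import math
from statistics import pvariance


def grid_shape(lons, lats, r):
    """Cells along longitude and latitude for side length r (rounded up, at least 1)."""
    n_length = max(1, math.ceil((max(lons) - min(lons)) / r))
    n_width = max(1, math.ceil((max(lats) - min(lats)) / r))
    return n_length, n_width


def cell_index(lon, lat, lon_min, lat_min, r, shape):
    """(row, column) of the cell holding a point; far-edge points go to the last cell."""
    col = min(int((lon - lon_min) / r), shape[0] - 1)
    row = min(int((lat - lat_min) / r), shape[1] - 1)
    return row, col


def total_variance(points, r):
    """Sum over non-empty cells of the population variance of the value.

    points is a list of (lon, lat, value) tuples.
    """
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    shape = grid_shape(lons, lats, r)
    cells = {}
    for lon, lat, value in points:
        key = cell_index(lon, lat, min(lons), min(lats), r, shape)
        cells.setdefault(key, []).append(value)
    return sum(pvariance(values) for values in cells.values())


def best_side_length(points, candidates):
    """Candidate side length with the smallest summed per-cell variance."""
    return min(candidates, key=lambda r: total_variance(points, r))
```

Only cells that contain data contribute to the sum, so this sketch does not penalize empty cells. The voting adjustment mentioned in step 2.1.2 is not modeled.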
Step 3: Using the processed feature data, divide the time-series data into several states, determine an interval partition under which the measured values are distributed without bias, and compute the state transitions and the state transition matrix of the data over the partition. The process is as follows:
Step 3.1: Divide the time-series data into $N$ states according to the distribution of the processed feature data $x'$.
The states are divided into equal or unequal intervals according to the actual distribution.
Step 3.2: Convert the gridded data into states according to the upper and lower boundaries of the states, i.e. $x(i) \to s(i)$, $i = 1, 2, \ldots$, generating a Markov chain,
where $x(i)$ is the gridded data at time $i$ and $s(i)$ is the state at time $i$.
Step 3.2.1: Let the upper and lower boundaries of the states be $B$ and $A$, respectively. The interval between states is:
$$\Delta = \frac{B - A}{N}$$
Step 3.2.2: When $x(i) \in [A + (k-1)\Delta,\ A + k\Delta]$, set $s(i) = k$, $k = 1, 2, \ldots, N$. The feature data at each time point are thereby converted into state data in $\{1, 2, \ldots, N\}$. The state data have the Markov property, so the data set formed by all the state information $s(i)$ is a Markov chain.
Step 3.3: Count the transitions of each state $s(i)$ and extract the Markov features.
Step 3.3.1: Define the Markov features, i.e. the transition behavior of each state: the numbers of upward and downward transitions of state $i$, and the number of times $k_i$ that state $i$ is held. These counts are calculated as follows:
where $s(j)$ is the state at time $j$, $s(j+1)$ is the state at time $j+1$, and $L$ is the number of data points.
Step 3.4: From the extracted upward and downward transition counts and $k_i$, calculate the state transition probabilities and the state transition matrix, as follows:
Step 3.4.1: When the state is $i = 1$, the corresponding state transition probability and state retention probability are:
where $p_{1,1}$ is the probability of transition from state 1 to state 1, and $p_{1,2}$ is the probability of transition from state 1 to state 2.
Step 3.4.2: When the state satisfies $1 < i < N$, the corresponding state transition probabilities and state retention probability are:
where $p_{i,i-1}$ is the probability of transition from state $i$ to state $i-1$, $p_{i,i}$ is the probability of transition from state $i$ to state $i$, and $p_{i,i+1}$ is the probability of transition from state $i$ to state $i+1$.
Step 3.4.3: When the state is $i = N$, the corresponding state transition probability and state retention probability are:
where $p_{N,N-1}$ is the probability of transition from state $N$ to state $N-1$, and $p_{N,N}$ is the probability of transition from state $N$ to state $N$.
Step 3.4.4: The state transition matrix can be expressed as:
In this embodiment, taking the driver's speed data from the vehicle driving system as an example, the speed data are divided into 7 states, and the state transition matrix is as follows:
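The embodiment's matrix is not reproduced in this text. As an illustration of how such a matrix could be computed, the sketch below discretizes a series into equal-width states and row-normalizes the observed transition counts. The function names, the clamping of boundary values, and the sample speeds in the usage note are assumptions. Unlike the tridiagonal form of step 3.4, this version records every observed jump, so it coincides with that form only when the series moves by at most one state per step.

```python
def to_states(values, lower, upper, n_states):
    """Map each value to a state 1..n_states using equal-width intervals on [lower, upper]."""
    delta = (upper - lower) / n_states
    states = []
    for x in values:
        k = int((x - lower) / delta) + 1
        states.append(min(max(k, 1), n_states))  # clamp boundary and out-of-range values
    return states


def transition_matrix(states, n_states):
    """Row-normalized counts of s(j) -> s(j+1); rows for unvisited states stay all zero."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current - 1][following - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

For example, `to_states([5, 15, 25, 25, 15, 5, 5], 0, 70, 7)` yields the state sequence `[1, 2, 3, 3, 2, 1, 1]`, and `transition_matrix` of that sequence gives a 7 by 7 matrix whose first row starts with 0.5 and 0.5.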
Step 4: Preprocess the state transitions and the state transition matrix and combine them with the features not processed by the Markov chain, so that together they form the data features used as input to the classification neural network. The process is as follows:
Step 4.1: Standardize the part of the data not processed by the Markov chain (torque and power), and combine it with the state transition matrix to form the feature vector of the high/low-risk user screening neural network.
Step 4.2: Using S-fold cross-validation, randomly select 75% of the data as the training set and 25% as the test set.
In this embodiment, the feature vector of the high/low-risk user screening neural network formed in this way has 175 dimensions.
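A small sketch of Step 4 follows: standardizing the non-Markov features, concatenating them with the flattened transition matrix, and making a random 75/25 split. The function names and the fixed random seed are hypothetical. The text does not say how the 175 dimensions are composed, so the sketch does not try to reproduce that number. A single 75/25 split corresponds to one fold of 4-fold cross-validation.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers; a constant list maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def build_feature_vector(matrix, extra_features):
    """Flatten the transition matrix row by row and append already-standardized extras."""
    return [p for row in matrix for p in row] + list(extra_features)


def split_75_25(samples, seed=0):
    """Shuffle a copy of the samples and cut it into 75% training and 25% test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]
```

Standardization would normally use statistics computed on the training portion only, a detail the patent text does not address.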
Step 5: In the training stage, use a deep convolutional neural network to compress the feature size, enrich the feature dimensions, and extract the main features. The feature vector output by the last layer of the feature network is fed to a fully connected layer, and the high/low-risk classification of the user is obtained after softmax normalization. The process is as follows:
Step 5.1: The three convolutional layers of the neural network use convolution kernels with shared parameters, at several feature dimensions, to extract and combine local features of the feature vector. The standard convolution output matrix $Y = (y_{ij})$ is computed from the input feature matrix $X = (x_{ij})$ and the convolution kernel matrix $W = (w_{ij})$ as:
where $m, n$ are position coordinates in the weight matrix, $i, j$ are position coordinates in the input feature matrix, $w_{mn}$ is the filter weight at position $(m, n)$, $x_{i+m,j+n}$ is the feature tensor element to be processed by the filter at position $(i, j)$, and $K$ is the convolution kernel size.
Step 5.2: The two max pooling layers of the neural network extract key information by taking the point with the largest value in each local receptive field, compressing the features.
Step 5.3: The neural network feeds the feature vector output by the last convolutional layer into the fully connected layers, which connect the extracted local features through a weight matrix and map them back to a global representation. Two fully connected layers are used to improve the nonlinear expressive capacity of the model, Dropout is used to prevent overfitting, and the high/low-risk classification of the user is obtained through softmax normalization.
In this embodiment, three convolutional layers are defined, each followed by an activation function layer and a max pooling layer.
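One convolution, activation, and pooling block of the kind described can be sketched in plain Python. This is an illustration only: ReLU as the activation, a 2 by 2 pooling window, "valid" (unpadded) convolution, and the absence of a bias term are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
def conv2d_valid(x, w):
    """Unpadded 2-D convolution as used in CNNs: y[i][j] = sum_m sum_n w[m][n] * x[i+m][j+n]."""
    k = len(w)
    rows, cols = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(w[m][n] * x[i + m][j + n] for m in range(k) for n in range(k))
             for j in range(cols)]
            for i in range(rows)]


def relu(x):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a window are dropped."""
    rows, cols = len(x) // size, len(x[0]) // size
    return [[max(x[i * size + a][j * size + b] for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]


def conv_block(x, w):
    """Convolution followed by activation and max pooling, as one block."""
    return max_pool(relu(conv2d_valid(x, w)))
```

Stacking three such blocks, each with its own kernel, would mirror the layer ordering stated for this embodiment. A practical implementation would use a deep learning framework rather than nested lists.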
Step 6: Compute the cross-entropy loss and minimize the loss function by stochastic gradient descent, thereby updating the network model parameters and achieving better screening of high- and low-risk users.
In this embodiment, the loss is calculated using softmax cross entropy. In the test stage, the embodiment achieves a classification accuracy of 86.46%, wherein the screening accuracy of high-risk users is 89.74%, and the screening accuracy of low-risk users is 79.95%.
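The softmax cross-entropy loss and a stochastic gradient descent update can be sketched for a single linear layer. This is a simplified stand-in for the final classification layer, not the full network. The function names, the learning rate, and the two-class setup in the usage note are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])


def sgd_step(weights, bias, features, label, lr=0.1):
    """One SGD update of a linear layer followed by softmax, on a single sample.

    weights holds one row per class and is updated in place, as is bias.
    Returns the loss measured before the update.
    """
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    loss = cross_entropy(probs, label)
    for c, row in enumerate(weights):
        grad = probs[c] - (1.0 if c == label else 0.0)  # d(loss) / d(logit_c)
        for d in range(len(row)):
            row[d] -= lr * grad * features[d]
        bias[c] -= lr * grad
    return loss
```

Starting from zero weights with two classes, the first call returns a loss of ln 2, and repeating the step on the same sample lowers the loss.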

Claims (6)

1. An automobile risk user screening method with Markov chain data processing, comprising the following steps:
step 1: reading driving-behavior data from a database and preprocessing the data using the longitude and latitude acquired by GPS and the data acquisition time, improving the confidence and reliability of the data;
step 2: using the location information of each city, dividing the data into a grid by longitude, latitude, and time, computing statistics of the vehicle operation data in the driving behavior for each small region in each time period, and processing the data;
step 3: using the processed feature data, dividing the time-series data into several states, determining an interval partition under which the measured values are distributed without bias, and computing the state transitions and the state transition matrix of the data over the partition;
step 4: preprocessing the state transitions and the state transition matrix and combining them with the features not processed by the Markov chain, so that together they form the data features used as input to the classification neural network;
step 5: in the training stage, using a deep convolutional neural network to compress the feature size, enrich the feature dimensions, and extract the main features, feeding the feature vector output by the last layer of the feature network to a fully connected layer, and obtaining the high/low-risk classification of the user after softmax normalization;
said step 5 comprises the following steps:
step 5.1: the three convolutional layers of the neural network use convolution kernels with shared parameters, at several feature dimensions, to extract and combine local features of the feature vector, and the standard convolution output matrix $Y = (y_{ij})$ is computed from the input feature matrix $X = (x_{ij})$ and the convolution kernel matrix $W = (w_{ij})$ as:
wherein $m, n$ are position coordinates in the weight matrix, $i, j$ are position coordinates in the input feature matrix, $w_{mn}$ is the filter weight at position $(m, n)$, $x_{i+m,j+n}$ is the feature tensor element to be processed by the filter at position $(i, j)$, and $K$ is the convolution kernel size;
step 5.2: the two max pooling layers of the neural network extract key information by taking the point with the largest value in each local receptive field, compressing the features;
step 5.3: the neural network feeds the feature vector output by the last convolutional layer into the fully connected layers, which connect the extracted local features through a weight matrix and map them back to a global representation; two fully connected layers are used to improve the nonlinear expressive capacity of the model, Dropout is used to prevent overfitting, and the high/low-risk classification of the user is obtained through softmax normalization;
step 6: computing the cross-entropy loss and minimizing the loss function by stochastic gradient descent, thereby updating the network model parameters and achieving better screening of high- and low-risk users.
2. The automobile risk user screening method with Markov chain data processing function of claim 1, wherein step 1 comprises the following steps:
step 1.1: checking whether repeated data exist in the data, and if so, only reserving one piece of data;
step 1.2: handling missing data by deleting tuples, by filling with 0 or the mean value, or by filling with a K-nearest-neighbor distance method;
step 1.3: according to the longitude and latitude range of each city, treating data that fall outside the range of that city as abnormal data, and cleaning the data using, as the actual situation requires, one of the following statistical methods: stepwise backward deletion, mean elimination, or logical-error deletion;
step 1.4: in view of the influence of the satellite positioning technology on positioning precision, treating data whose amount is below a threshold as invalid data, and then cleaning the data again.
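The cleaning steps of claim 2 can be sketched roughly as follows. The record layout (dictionaries with hypothetical `vehicle`, `lon`, `lat` and `speed` fields), the choice of mean filling for step 1.2, and the reading of step 1.4 as "drop vehicles with fewer than `min_count` records" are all assumptions made for this example; the claim leaves these details open.

```python
def clean_records(records, bounds, min_count):
    """Hedged sketch of steps 1.1 to 1.4 on a list of dict records."""
    # Step 1.1: keep only one copy of each duplicated record.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Step 1.2 (one of the listed options): fill missing speeds with the mean.
    known = [r["speed"] for r in unique if r["speed"] is not None]
    mean_speed = sum(known) / len(known) if known else 0.0
    filled = [dict(r, speed=mean_speed if r["speed"] is None else r["speed"])
              for r in unique]

    # Step 1.3: treat points outside the city's bounding box as abnormal.
    lon_min, lon_max, lat_min, lat_max = bounds
    inside = [r for r in filled
              if lon_min <= r["lon"] <= lon_max and lat_min <= r["lat"] <= lat_max]

    # Step 1.4 (assumed reading): drop vehicles with too few remaining records.
    counts = {}
    for r in inside:
        counts[r["vehicle"]] = counts.get(r["vehicle"], 0) + 1
    return [r for r in inside if counts[r["vehicle"]] >= min_count]
```

For example, calling `clean_records(records, (115.4, 117.5, 39.4, 41.1), 2)` would keep only in-bounds records from vehicles that still have at least two records after deduplication.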
3. The automobile risk user screening method with Markov chain data processing function of claim 1, wherein step 2 comprises the following steps:
step 2.1: selecting a certain city, merging all data of different acquisition times, drawing a scatter diagram according to longitude and latitude, observing driving distribution conditions of an automobile, setting a city grid division standard according to the density degree of the scatter diagram, and obtaining the area grid size under different division standard conditions;
step 2.1.1: assuming that the maximum and minimum longitudes of the city are max(X) and min(X), respectively, that the maximum and minimum latitudes are max(Y) and min(Y), respectively, and that the side length of the city grid is set to $r_i$, i = 1, 2, 3, …, m, where m is the number of candidate standards for dividing the grid, the numbers of grids into which the city is divided along longitude and latitude are, respectively:
$$n_{length,i} = \frac{\max(X) - \min(X)}{r_i}, \qquad n_{width,i} = \frac{\max(Y) - \min(Y)}{r_i}$$
wherein $n_{length,i}$ represents the number of grids along longitude under the ith candidate division standard, and $n_{width,i}$ represents the number of grids along latitude under the ith candidate division standard;
step 2.1.2: summing the variances of the vehicle operation data of each area under the different candidate division standards, and selecting, by the minimum variance method, the division standard with the smallest variance as the optimal grid division standard; alternatively, adjusting the selection by a voting method according to the variance within the small areas, so as to avoid a large number of areas without data;
step 2.2: within each divided spatial cell, dividing the data into M time periods according to whether or not the time falls in the peak period of the road section;
step 2.3: in each time period of each divided small area, computing the mean and variance of the data, and processing the data on that basis:
$$x'_k = \frac{x_k - \bar{x}_{ijk}}{\sigma_{ijk}}$$
wherein $\bar{x}_{ijk}$ and $\sigma_{ijk}$ are, respectively, the mean and variance of the data in the kth time period of the grid in the ith row and jth column of the city, $x_k$ is the unprocessed data of the grid in the kth time period, and $x'_k$ is the processed data of the grid in the kth time period.
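The gridding and per-cell processing of claim 3 might look like the sketch below. Rounding the grid counts up with `math.ceil`, the row/column convention in `cell_index`, and dividing by the standard deviation (a conventional z-score) rather than the raw variance are assumptions of this example, not details stated in the claim.

```python
import math

def grid_counts(lon_min, lon_max, lat_min, lat_max, r):
    """Number of grid cells along longitude and latitude for side length r.

    Rounding up so the grid covers the whole city is an assumption.
    """
    n_length = math.ceil((lon_max - lon_min) / r)
    n_width = math.ceil((lat_max - lat_min) / r)
    return n_length, n_width

def cell_index(lon, lat, lon_min, lat_min, r):
    """(row, column) of the cell containing a point; rows follow latitude."""
    return int((lat - lat_min) // r), int((lon - lon_min) // r)

def standardize(values):
    """Center and scale the data of one cell and one time period."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0] * len(values)  # constant data: nothing to scale
    return [(v - mean) / std for v in values]
```

To choose among the candidate side lengths $r_i$ as in step 2.1.2, one could evaluate each candidate with these helpers, sum the per-cell variances, and keep the candidate with the smallest total.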
4. The automobile risk user screening method with Markov chain data processing function of claim 1, wherein step 3 comprises the following steps:
step 3.1: dividing the time sequence data into N states according to the distribution condition of the processed characteristic data x';
the state is divided into equal intervals or unequal intervals according to actual distribution conditions;
step 3.2: converting the gridded data into states according to the upper and lower boundaries of the states, namely x(i) → s(i), i = 1, 2, …, thereby generating a Markov chain;
wherein x (i) is data subjected to meshing processing at the moment i, and s (i) is a state at the moment i;
step 3.2.1: assuming that the upper and lower boundaries of the states are B and A, respectively, the interval between the states is:
$$\Delta = \frac{B - A}{N}$$
step 3.2.2: when $x(i) \in [A + (k-1)\Delta,\ A + k\Delta]$, s(i) = k, k = 1, 2, …, N, so that the feature data at each time point are converted into state data in {1, 2, …, N}; since the state data have the Markov property, the data set formed by all the state information s(i) is a Markov chain;
step 3.3: counting the transition condition of each state s (i), and extracting Markov characteristics;
step 3.3.1: defining the Markov feature as the transition behavior of each state, and counting, for each state i, the number of upward transitions, the number of downward transitions, and the number of times state i is held, $k_i$; these counts are calculated as follows:
where s (j) represents the state at the moment j, s (j+1) represents the state at the moment j+1, and L represents the number of data points;
step 3.4: calculating the state transition probabilities and the state transition matrix from the extracted transition counts and $k_i$.
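Steps 3.2 and 3.3 can be illustrated with the sketch below, which maps values to states 1 to N using equal intervals of width (B − A)/N and then counts upward, downward and hold transitions between consecutive time points. Assigning a boundary value to the higher state, clamping out-of-range values into 1..N, and treating any increase (or decrease) of the state as an upward (or downward) transition are assumptions of this example.

```python
def to_states(x, lower, upper, n_states):
    """Map each value to a state in 1..n_states using equal-width intervals."""
    delta = (upper - lower) / n_states
    states = []
    for v in x:
        k = int((v - lower) // delta) + 1
        states.append(min(max(k, 1), n_states))  # clamp, e.g. v == upper
    return states

def count_transitions(states, n_states):
    """Count upward, downward and hold transitions out of each state."""
    up = {i: 0 for i in range(1, n_states + 1)}
    down = {i: 0 for i in range(1, n_states + 1)}
    hold = {i: 0 for i in range(1, n_states + 1)}
    for cur, nxt in zip(states, states[1:]):
        if nxt > cur:
            up[cur] += 1
        elif nxt < cur:
            down[cur] += 1
        else:
            hold[cur] += 1
    return up, down, hold
```

With L data points there are L − 1 consecutive pairs, so the three counts summed over all states always total L − 1.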
5. The automobile risk user screening method with Markov chain data processing function of claim 4, wherein step 3.4 comprises the following steps:
step 3.4.1: when the state is i=1, the corresponding state transition probability and state retention probability are:
wherein $p_{1,1}$ is the probability of transition from state 1 to state 1, and $p_{1,2}$ is the probability of transition from state 1 to state 2;
step 3.4.2: when the state is 1 < i < N, the corresponding state transition probability and state retention probability are as follows:
wherein $p_{i,i-1}$ is the probability of transition from state i to state i−1, $p_{i,i}$ is the probability of transition from state i to state i, and $p_{i,i+1}$ is the probability of transition from state i to state i+1;
step 3.4.3: when the state is i=n, the corresponding state transition probability and state retention probability are:
wherein $p_{N,N-1}$ is the probability of transition from state N to state N−1, and $p_{N,N}$ is the probability of transition from state N to state N;
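step 3.4.4: the state transition matrix is expressed as:
$$P = \begin{pmatrix} p_{1,1} & p_{1,2} & & & \\ p_{2,1} & p_{2,2} & p_{2,3} & & \\ & \ddots & \ddots & \ddots & \\ & & p_{N-1,N-2} & p_{N-1,N-1} & p_{N-1,N} \\ & & & p_{N,N-1} & p_{N,N} \end{pmatrix}$$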
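One plausible way to turn the transition counts into the probabilities of steps 3.4.1 to 3.4.3 is to use relative frequencies, as sketched below. The inputs `up`, `down` and `hold` are hypothetical dictionaries keyed by state 1..N holding the upward, downward and hold counts; dividing each count by the total number of transitions out of the state, and attributing all upward (downward) transitions to the neighboring state i+1 (i−1), are assumptions of this example rather than formulas taken from the claim.

```python
def transition_matrix(up, down, hold, n_states):
    """Build an N x N matrix with entries only on the three central diagonals."""
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for i in range(1, n_states + 1):
        total = up[i] + down[i] + hold[i]
        if total == 0:
            continue  # state never left or held: its row stays all zeros
        matrix[i - 1][i - 1] = hold[i] / total      # p_{i,i}
        if i > 1:
            matrix[i - 1][i - 2] = down[i] / total  # p_{i,i-1}
        if i < n_states:
            matrix[i - 1][i] = up[i] / total        # p_{i,i+1}
    return matrix
```

Under these assumptions every row for a visited state sums to 1, since state 1 has no downward transitions and state N has no upward ones.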
6. The automobile risk user screening method with Markov chain data processing function of claim 1, wherein step 4 comprises the following steps:
step 4.1: standardizing the portion of the data that is not processed by the Markov chain, namely torque and power, and combining it with the state transition matrix to jointly form the feature vector of the high/low-risk user screening neural network;
step 4.2: using an S-fold cross-validation scheme, randomly selecting 75% of the data as the training set and the remaining 25% as the test set.
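A rough sketch of claim 6 follows: torque and power are standardized, the state transition matrix is flattened and appended, and the samples are shuffled and split 75/25. The function names, the concatenation order, the z-score standardization and the fixed random seed are assumptions of this example; the claim does not specify how the feature vector is laid out.

```python
import math
import random

def zscore(values):
    """Standardize a sequence to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def build_feature_vector(torque, power, matrix):
    """Concatenate standardized torque and power with the flattened matrix."""
    flat = [p for row in matrix for p in row]
    return zscore(torque) + zscore(power) + flat

def split_train_test(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

A 75/25 split matches holding out one fold in 4-fold cross-validation, so repeating the split over four disjoint test folds would be one way to realize the S-fold scheme with S = 4.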
CN202011021233.1A 2020-09-25 2020-09-25 Automobile risk user screening method with Markov chain data processing function Active CN112183615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011021233.1A CN112183615B (en) 2020-09-25 2020-09-25 Automobile risk user screening method with Markov chain data processing function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011021233.1A CN112183615B (en) 2020-09-25 2020-09-25 Automobile risk user screening method with Markov chain data processing function

Publications (2)

Publication Number Publication Date
CN112183615A CN112183615A (en) 2021-01-05
CN112183615B true CN112183615B (en) 2023-08-18

Family

ID=73943430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011021233.1A Active CN112183615B (en) 2020-09-25 2020-09-25 Automobile risk user screening method with Markov chain data processing function

Country Status (1)

Country Link
CN (1) CN112183615B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001148019A (en) * 1999-06-01 2001-05-29 Fujitsu Ltd Method and device for classifying motion between traveling objects, image recognizing device, and method and device for recognizing traveling object
WO2018022821A1 (en) * 2016-07-29 2018-02-01 Arizona Board Of Regents On Behalf Of Arizona State University Memory compression in a deep neural network
CN107742193A (en) * 2017-11-28 2018-02-27 江苏大学 A kind of driving Risk Forecast Method based on time-varying state transition probability Markov chain
CN111209966A (en) * 2020-01-07 2020-05-29 中南大学 Markov chain-based path travel time determination method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10902336B2 (en) * 2017-10-03 2021-01-26 International Business Machines Corporation Monitoring vehicular operation risk using sensing devices

Also Published As

Publication number Publication date
CN112183615A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112991354B (en) High-resolution remote sensing image semantic segmentation method based on deep learning
CN109635010B (en) User characteristic and characteristic factor extraction and query method and system
CN111861756B (en) Group partner detection method based on financial transaction network and realization device thereof
CN101694744A (en) Method and system for evaluating road emergency evacuation capacity and method and system for grading road emergency evacuation capacity
CN111598460B (en) Method, device, equipment and storage medium for monitoring heavy metal content of soil
CN112766283B (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN111539444A (en) Gaussian mixture model method for modified mode recognition and statistical modeling
Meng et al. Improving automobile insurance claims frequency prediction with telematics car driving data
CN115905818A (en) Landslide early warning method based on data mining
CN116307103A (en) Traffic accident prediction method based on hard parameter sharing multitask learning
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
CN114973019A (en) Deep learning-based geospatial information change detection classification method and system
CN113206808B (en) Channel coding blind identification method based on one-dimensional multi-input convolutional neural network
CN114882373A (en) Multi-feature fusion sandstorm prediction method based on deep neural network
CN117726939A (en) Hyperspectral image classification method based on multi-feature fusion
CN112183615B (en) Automobile risk user screening method with Markov chain data processing function
CN117436653A (en) Prediction model construction method and prediction method for travel demands of network about vehicles
CN110188324B (en) Traffic accident poisson regression analysis method based on feature vector space filtering value
CN114943290B (en) Biological intrusion recognition method based on multi-source data fusion analysis
CN114252706B (en) Lightning early warning method and system
CN115293639A (en) Battlefield situation studying and judging method based on hidden Markov model
CN115130599A (en) Semi-supervision method for strip mine card state recognition under time series GAN data enhancement
CN112465054A (en) Multivariate time series data classification method based on FCN
CN116698410B (en) Rolling bearing multi-sensor data monitoring method based on convolutional neural network
CN117216490B (en) Intelligent big data acquisition system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant