CN116228438A - Vehicle mileage-based vehicle risk pricing model construction method, device and system - Google Patents

Vehicle mileage-based vehicle risk pricing model construction method, device and system

Info

Publication number
CN116228438A
Authority
CN
China
Prior art keywords
mileage
model
factor
vehicle
factors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211659332.1A
Other languages
Chinese (zh)
Inventor
李欣
于忠华
邹家伟
叶灵玲
熊寅庚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dingran Information Technology Co ltd
Original Assignee
Shenzhen Dingran Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dingran Information Technology Co ltd
Priority to CN202211659332.1A
Publication of CN116228438A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283Price estimation or determination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Operations Research (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Technology Law (AREA)
  • Educational Administration (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)

Abstract

The invention discloses a vehicle mileage-based vehicle risk pricing model construction method, device and system. The method comprises the following steps: acquiring raw data of a vehicle in a historical time period; processing the raw data to obtain a single mileage factor and a dependent variable; constructing a plurality of mileage interval factors from the single mileage factor; and taking the mileage interval factors as independent variables and putting them, together with the dependent variable, into a preset model to build a mileage pricing model. The method addresses the problem that traditional vehicle insurance models usually enter mileage as a single continuous variable, and that a single mileage factor usually has weak explanatory power when applied to vehicle insurance pricing.

Description

Vehicle mileage-based vehicle risk pricing model construction method, device and system
Technical Field
The invention relates to the technical field of big data pricing, in particular to a vehicle mileage-based vehicle insurance pricing model construction method, device and system.
Background
Pay-As-You-Drive (PAYD), i.e. pricing by mileage, is a new pricing method in the motor vehicle insurance market. Unlike traditional vehicle insurance pricing, PAYD vehicle insurance considers not only factors relating to the insured vehicle and to the insured person as a driver, but also treats driving mileage as a main pricing factor; some PAYD products also consider the driving behavior of the vehicle owner. Under this pricing method, the lower the owner's driving mileage, the lower the premium paid.
Driving mileage is a factor positively related to driving risk and is important to the scientific soundness and rationality of vehicle insurance pricing. In traditional PAYD vehicle insurance, the mileage variable is usually put into the model as a single continuous variable. When applied to vehicle insurance pricing, a single mileage factor usually has weak explanatory power and cannot accurately reflect the relationship between mileage and driving risk, so the predictions of the vehicle insurance pricing model are not very accurate.
Therefore, a method is needed for converting the mileage factor into modeling variables that accurately predict vehicle risk pricing, so as to improve the accuracy and scientific soundness of PAYD vehicle risk pricing.
Disclosure of Invention
In order to solve the above problems, the invention provides a vehicle mileage-based vehicle risk pricing model construction method, device and system, in which a continuous variable factor is divided into a plurality of interval factors that serve as independent variables of the model, so that risk recognition across different mileage groups is enhanced, the explanatory power for the target variable is improved, and model prediction and pricing are improved.
The invention provides a vehicle mileage-based vehicle risk pricing model construction method, which comprises the following steps:
s10: acquiring raw data of a vehicle in a historical time period;
s20: processing the raw data to obtain a single mileage factor and a dependent variable;
s30: constructing a plurality of mileage interval factors according to the single mileage factor;
s40: taking the plurality of mileage interval factors as independent variables, putting them together with the dependent variable into a preset model, and establishing a mileage pricing model.
Preferably, the raw data of the vehicle in the historical time period at least comprises the vehicle mileage in the historical time period.
Preferably, the dependent variable obtained after the data processing is performed on the original data is the odds or the odds.
Preferably, constructing a plurality of mileage interval factors according to the single mileage factor comprises the following steps:
s31: acquiring a single mileage factor;
s32: dividing a single mileage factor according to a preset mileage grouping mode and a preset mileage grouping number;
s33: and generating a plurality of corresponding mileage interval factors according to the segmentation result.
Preferably, the preset model is a Tweedie model, a type of generalized linear model.
Preferably, taking the mileage interval factors as the independent variables and putting them into a preset model together with the dependent variable to obtain a mileage pricing model comprises the following steps:
s41: acquiring the plurality of mileage interval factors and the dependent variable;
s42: putting the plurality of mileage interval factors and the dependent variable into the Tweedie model, solving the model, and establishing the mileage pricing model;
s43: evaluating the model effect of the mileage pricing model.
Preferably, putting the plurality of mileage interval factors and the odds into the Tweedie model for model solving includes: putting the plurality of factors into the model as independent variables, and performing a hypothesis test by checking the model result and the P value corresponding to each factor; if P > 0.05, the group whose P value is not significant is merged with the group whose coefficient is closest, or the grouping thresholds are reselected according to the one-dimensional analysis result; if P < 0.05, the null hypothesis is rejected.
Preferably, evaluating the model effect of the mileage pricing model uses: a scatter plot of model predictions against actual values, the Gini coefficient, a lift curve, and a cumulative lift curve.
Preferably, the historical time period may be one month, two months, one quarter, half a year, or another specified time period, according to the actual situation.
In order to achieve the above object, the present invention provides a vehicle mileage-based vehicle risk pricing model construction device, the device comprising:
the data acquisition module is used for acquiring original data of the vehicle in a historical time period;
the data processing module is used for processing the original data and acquiring a single mileage factor and a dependent variable;
the mileage factor construction module is used for constructing a plurality of mileage interval factors according to the single mileage factor;
and the data modeling module is used for establishing a mileage pricing model.
In order to achieve the above object, the present invention further provides a vehicle mileage-based vehicle risk pricing model construction system, which is characterized in that the system comprises:
at least one server comprising at least one processor, at least one memory, and vehicle risk pricing model instructions stored in the memory, wherein the computer program instructions, when executed by the processor and/or when the raw data is loaded into the server, implement the method of any of the above.
In summary, the vehicle risk pricing model construction method, device and system based on vehicle mileage provided by the embodiment of the invention divide a certain continuous variable factor into a plurality of interval factors, such as dividing a single mileage factor into a plurality of mileage interval factors, take the plurality of mileage interval factors as independent variables of the model, enhance risk identification capability under different mileage groups, improve interpretation capability of target variables, improve model prediction and pricing effects, enable PAYD vehicle risk to accurately reflect driving risk, stimulate an applicant to select optimal driving mileage, reduce risk of occurrence of insurance accidents, and increase efficiency and social benefit level of insurance market.
Drawings
FIG. 1 is a flow chart of a vehicle mileage-based vehicle risk pricing model construction method in an embodiment of the invention.
FIG. 2 is a flow chart of constructing a plurality of mileage interval factors in an embodiment of the invention.
FIG. 3 is a schematic diagram of the scatter plot evaluation of a mileage pricing model according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the Gini coefficient evaluation of a mileage pricing model according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the lift curve evaluation of a mileage pricing model according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of the cumulative lift curve evaluation of a mileage pricing model according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of the scatter plot evaluation of the mileage pricing model of Embodiment 2 of the present invention.
FIG. 8 is a schematic diagram of the Gini coefficient evaluation of the mileage pricing model of Embodiment 2 of the present invention.
FIG. 9 is a schematic diagram of the lift curve evaluation of the mileage pricing model of Embodiment 2 of the present invention.
FIG. 10 is a schematic diagram of the cumulative lift curve evaluation of the mileage pricing model of Embodiment 2 of the present invention.
FIG. 11 is a schematic diagram of the scatter plot evaluation of the mileage pricing model of Embodiment 3 of the present invention.
FIG. 12 is a schematic diagram of the Gini coefficient evaluation of the mileage pricing model of Embodiment 3 of the present invention.
FIG. 13 is a schematic diagram of the lift curve evaluation of the mileage pricing model of Embodiment 3 of the present invention.
FIG. 14 is a schematic diagram of the cumulative lift curve evaluation of the mileage pricing model of Embodiment 3 of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely configured to illustrate the invention and are not configured to limit the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the invention by showing examples of the invention.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The embodiment of the invention provides a vehicle risk pricing model construction method based on vehicle mileage, which comprises the following steps: acquiring raw data of a vehicle in a historical time period, the raw data at least including the vehicle's driving mileage in that period; processing the raw data to obtain a single mileage factor and a dependent variable; constructing a plurality of mileage interval factors from the single mileage factor; and taking the mileage interval factors as independent variables and putting them, together with the dependent variable, into a preset model to build a mileage pricing model. FIG. 1 is a flowchart of the vehicle risk pricing model construction method based on vehicle mileage according to an embodiment of the present disclosure, which may specifically include the following steps:
S10: Acquire the raw data of the vehicle in the historical time period, the raw data at least including the vehicle's driving mileage in that period. The raw data is mainly vehicle mileage, which is obtained mainly in the following three ways:
1. A GPS positioning system tracks and records the running condition of the vehicle in real time, and the information is then transmitted to the insurance company over a wireless network or manually by the driver. Vehicle mileage is collected through an on-board OBD system, a handheld communication device, or another on-board information device. A GPS device can acquire data dynamically, monitoring not only the mileage of the insured vehicle but also its speed, braking, distance, time, location, route, and so on.
2. A device for recording driving mileage data is developed jointly with institutions such as banks and gas stations; the driver's mileage data is extracted automatically when the driver uses a bank card or refuels at a gas station, and is then transmitted to the insurance company. In this way, mileage data is collected with bank card transactions and refueling as carriers.
3. The mileage driven over a period of time is read directly from the vehicle's odometer.
The raw data may also be obtained from databases, including underwriting and claims databases and internet-of-vehicles databases. The database may be one built by the enterprise itself, a shared database obtained through cooperation with other enterprises, or an official database obtained through cooperation with government units. The larger the sample size of the database and the more genuine and valid the data, the better the prediction of the subsequent model. The raw data obtained from the database is used by the method only to build the mileage pricing model. The raw data includes at least one of driving behavior in the historical time period, vehicle attributes, driver socioeconomic attributes, and underwriting data. The raw data falls into static data and dynamic data: the static data comprise the driver's socioeconomic attributes and the vehicle attributes, and the dynamic data comprise the driving mileage in the historical time period, driving behavior, and underwriting data.
Driving behavior, vehicle attributes, driver socioeconomic attributes, and underwriting data are all relevant to building the mileage pricing model. Driving behavior includes continuous driving time, number of turns, turning speed, the ratio of expressway and national-road driving to urban-road driving, driving area, number and duration of speeding events, frequency of sudden braking, and so on. Taking driving behavior as an example, the driver's driving behavior information is collected through an on-board OBD system, a handheld communication device, or another on-board information device, and is judged and analyzed by a driving behavior judging module. While the driver is driving, a driving behavior is uploaded to the server whenever a driving behavior event is detected. The driving behavior data can be used as independent variables to construct a driving behavior scoring model, whose dependent variable can be the claim rate, the claim payment amount, and so on; the model's prediction is then converted into a driving behavior score according to a given rule. Other types of data are not limited to the above acquisition modes; data that a computer can process can be obtained through assignment, conversion, direct acquisition, and other means. For example, the driver's socioeconomic attributes include gender, age, nature of work, claims record, insurance purchases, and so on. Male is assigned 1 and female 0; an age under 31 is assigned 1, indicating a younger driver, and all other ages are assigned 2. The vehicle attributes include the purchase price of the vehicle, the age of the vehicle, the size of the vehicle, the number of airbags, whether a seat belt alarm is fitted, and so on.
Taking the vehicle attributes as an example, the purchase price of the vehicle can be obtained directly, in units of 10,000 yuan.
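The assignments described above can be sketched in a few lines. This is only an illustration: the function names `encode_driver` and `encode_vehicle_price` and the input labels `"male"`/`"female"` are hypothetical, and any gender value other than `"male"` is mapped to 0 here for simplicity.

```python
def encode_driver(gender: str, age: int) -> dict:
    """Encode two driver attributes using the assignments given in the text."""
    return {
        # male -> 1, female -> 0 (any other label also falls back to 0 in this sketch)
        "gender": 1 if gender == "male" else 0,
        # under 31 -> 1 (younger driver), all other ages -> 2
        "age_group": 1 if age < 31 else 2,
    }


def encode_vehicle_price(price_yuan: float) -> float:
    """Express the purchase price in units of 10,000 yuan."""
    return price_yuan / 10_000


print(encode_driver("male", 28))       # {'gender': 1, 'age_group': 1}
print(encode_vehicle_price(156_000))   # 15.6
```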
S20: Process the raw data to obtain a single mileage factor and a dependent variable. The data processing mainly consists of data cleaning, which covers three aspects:
Null filling: null values of continuous variables are filled with the median, and null values of categorical variables are filled with the mode. For example, if the median of the annual mileage variable is 10000 km and the mode of the power type is pure electric, the null entries of those fields are filled with 10000 and pure electric respectively;
Correcting abnormal data: values of a variable that fall outside a reasonable range are corrected using a moving average method. The formula is as follows:
Figure BDA0004013100840000071
Extreme value processing: samples with a normal or approximately normal distribution are processed according to the Pauta (3σ) criterion: data within the interval (μ-3σ, μ+3σ) are treated as normal, and data outside that interval are deleted, where μ is the data mean and σ is the data standard deviation. The deleted data are filled with the median or the mean.
Other treatments are also possible: if the algorithm is insensitive to outliers, they may be left unprocessed; if the values are obviously abnormal and few in number, they can be deleted directly; or the abnormal data may be treated as missing values and handled with the missing-value methods.
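As a rough illustration of two of the cleaning steps above (null filling and the 3σ extreme-value rule), the following sketch uses only the Python standard library. The helper names are hypothetical, nulls are assumed to be represented as `None`, the population standard deviation is used for σ, and out-of-range values are refilled with the median of the remaining data (the text allows either median or mean). The moving-average correction is not sketched because its formula is not reproduced in the text.

```python
from statistics import mean, median, mode, pstdev


def fill_nulls(values, categorical=False):
    """Fill None entries with the mode (categorical) or the median (continuous)."""
    present = [v for v in values if v is not None]
    fill = mode(present) if categorical else median(present)
    return [fill if v is None else v for v in values]


def three_sigma_clean(values):
    """Replace values outside (mu - 3*sigma, mu + 3*sigma) with the median of the rest."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:                      # all values identical: nothing to clean
        return list(values)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    kept = [v for v in values if lo < v < hi]
    fill = median(kept)
    return [v if lo < v < hi else fill for v in values]


print(fill_nulls([8000, None, 10000, 12000]))
print(fill_nulls(["pure electric", None, "pure electric", "fuel"], categorical=True))

mileage = [9000, 10000, 11000] * 6 + [10000, 500000]   # one implausible reading
print(three_sigma_clean(mileage)[-1])                   # the 500000 entry is replaced
```

Note that with very small samples no single value can lie more than 3σ from the mean, so the rule only becomes meaningful once enough observations are available.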
S30: Construct a plurality of mileage interval factors from the single mileage factor. FIG. 2 is a flow chart for constructing the plurality of mileage interval factors. The preset mileage grouping mode may be any common grouping mode, such as grouping by equal earned vehicle-years, equal-width grouping, or grouping according to one-dimensional analysis results. Earned vehicle-years are the vehicle-years of exposure that have elapsed within the statistical period, that is, for each policy, the proportion of the insurance period that has expired. The calculation formula is as follows:
earned vehicle-years = number of expired days / number of days in the insurance period
In grouping by earned vehicle-years, the number of groups is chosen according to the actual situation, and the earned vehicle-years in each group are equal.
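The earned-exposure formula and the equal-exposure grouping can be sketched as follows. The function names and the record layout `(annual_mileage_km, earned_years)` are assumptions made for illustration; the sketch simply walks the policies in order of mileage and cuts a new group each time the running exposure reaches the next equal share.

```python
def earned_vehicle_years(expired_days: int, term_days: int) -> float:
    """Earned exposure of one policy: expired days / days in the insurance period."""
    return expired_days / term_days


def equal_exposure_thresholds(records, n_groups):
    """Return upper mileage thresholds for the first n_groups - 1 groups.

    records: iterable of (annual_mileage_km, earned_years) pairs.
    Each group ends up with roughly the same total earned vehicle-years.
    """
    ordered = sorted(records)
    total = sum(earned for _, earned in ordered)
    target = total / n_groups
    thresholds, running = [], 0.0
    for mileage, earned in ordered:
        running += earned
        # small tolerance guards against floating-point round-off
        reached = running >= target * (len(thresholds) + 1) - 1e-9
        if len(thresholds) < n_groups - 1 and reached:
            thresholds.append(mileage)
    return thresholds


print(earned_vehicle_years(73, 365))   # 0.2

policies = [(m, 1.0) for m in (2000, 5000, 8000, 11000, 15000, 30000)]
print(equal_exposure_thresholds(policies, 3))   # [5000, 11000]
```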
Equal-width grouping fixes a width according to the number of groups, so that every group spans the same range of the variable. The specific steps are:
1. determining the number of groups;
2. determining the group width.
The number of groups is determined according to the distribution characteristics of the data and the actual demand; it should be neither too large nor too small, and is generally 5 to 15. The group width is determined from the difference between the maximum value and the minimum value, namely:
group width = (maximum value - minimum value) / number of groups
When the number of data points is between 50 and 1000, 5 to 15 groups are generally used.
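A minimal sketch of equal-width grouping, assuming the data contain at least two distinct values. The helper names are hypothetical, and placing the maximum value in the last group is a choice made here, not something the text specifies.

```python
def equal_width_edges(values, n_groups):
    """Group edges with width = (max - min) / n_groups."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    return [lo + i * width for i in range(n_groups + 1)]


def assign_group(value, edges):
    """0-based group index of a value; the maximum falls into the last group."""
    n_groups = len(edges) - 1
    width = edges[1] - edges[0]
    idx = int((value - edges[0]) // width)
    return min(max(idx, 0), n_groups - 1)


mileage = [0, 3000, 7500, 12000, 30000]
edges = equal_width_edges(mileage, 5)
print(edges)                                  # width is 6000 km
print([assign_group(m, edges) for m in mileage])
```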
The group number K can also be obtained according to Sturges' rule, and the calculation formula is as follows:
K=1+3.322*lg N
where N is the number of data samples
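As a quick check of Sturges' rule, the sketch below evaluates K for a few sample sizes. Rounding K to the nearest integer is an assumption; the text does not say how a fractional K is handled.

```python
import math


def sturges_groups(n_samples: int) -> int:
    """Number of groups K = 1 + 3.322 * lg(N), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n_samples))


for n in (50, 100, 1000):
    print(n, sturges_groups(n))
```

For sample sizes between 50 and 1000 this gives between 7 and 11 groups, which sits inside the 5 to 15 range mentioned above.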
The number of groups can also be obtained from a modified Sturges' rule:
K = 1 + 3.322 lg(N²/100)
The driving mileage is grouped according to the chosen segmentation, generating N mileage increment factors: factor 1, factor 2, factor 3, and so on.
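This passage does not spell out how an "increment" factor is encoded. One plausible reading, shown here purely as an assumption, is a piecewise encoding in which each factor holds the part of the mileage that falls inside its interval, so the factors sum back to the original mileage. The function name and thresholds are hypothetical.

```python
def increment_factors(mileage, thresholds):
    """Split one mileage value into per-interval increments.

    thresholds: ascending upper bounds of every interval except the last,
    which is open-ended. Assumed encoding, not confirmed by the text.
    """
    lowers = [0.0] + list(thresholds)
    uppers = list(thresholds) + [float("inf")]
    # portion of the mileage lying inside each [lower, upper) interval
    return [max(0.0, min(mileage, up) - lo) for lo, up in zip(lowers, uppers)]


factors = increment_factors(13000, [5000, 10000, 20000])
print(factors)        # increments for factor 1 .. factor 4
print(sum(factors))   # recovers the original 13000 km
```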
S40: Take the plurality of mileage interval factors as independent variables, put them together with the dependent variable into a preset model, and establish a mileage pricing model.
The independent variables may be the plurality of mileage interval factors alone, in which case the resulting mileage pricing model prices by mileage. Alternatively, the mileage interval factors may serve as core factors, supplemented by auxiliary factors, to construct the mileage pricing model. Specifically, the auxiliary factors are divided into conventional and emerging pricing factors. Conventional pricing factors include vehicle attributes, driver socioeconomic attributes, NCD (no-claim discount), and so on. Emerging pricing factors include driving behavior, ride-hailing vehicle identification, second-hand vehicles, accident vehicles, and so on.
The dependent variable is the odds. The independent and dependent variables undergo data integration: data from multiple data sources are merged and stored in a consistent data store, such as a data warehouse. During the unified merge, the different data sources are standardized and de-duplicated.
The method for determining the preset model comprises the following specific steps:
determining independent variables and dependent variables;
determining the independent variable and the data type of the dependent variable;
and selecting a corresponding model according to the independent variable and the data type of the dependent variable.
In this embodiment, the model selected for the data types of the independent and dependent variables is preferably the Tweedie model, a type of generalized linear model. The independent variables and the dependent variable are put into the Tweedie model and the model is solved; the mileage pricing model is obtained by repeatedly revising the model. Model revision includes performing a hypothesis test using the model result and the P value corresponding to each factor. The null hypothesis is that dividing the single mileage factor into a plurality of mileage interval factors is ineffective; the alternative hypothesis is that the division is effective. The mileage interval factors are put into the model as independent variables, and the model result and the P value of each factor are checked. If a P value is greater than 0.05, the effect of that factor is not significant, and the group is merged with the group whose coefficient is closest, or the grouping thresholds are reselected according to the one-dimensional analysis result. If P < 0.05, the null hypothesis is rejected.
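The revision rule can be sketched as a small helper that flags insignificant interval factors and proposes a merge partner. This is an illustration only: `merge_suggestions` is a hypothetical name, the coefficients and P values are assumed to come from an already fitted Tweedie GLM (the fitting itself is not shown), and restricting the merge to a neighbouring mileage interval is an assumption.

```python
def merge_suggestions(coefs, p_values, alpha=0.05):
    """Map each insignificant factor (p > alpha) to the adjacent factor
    whose coefficient is closest, as a candidate for merging."""
    suggestions = {}
    for i, p in enumerate(p_values):
        if p <= alpha:
            continue                      # significant: keep the group as is
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(coefs)]
        if not neighbours:
            continue                      # single factor: nothing to merge with
        nearest = min(neighbours, key=lambda j: abs(coefs[j] - coefs[i]))
        suggestions[i] = nearest
    return suggestions


# made-up coefficients and P values for four mileage interval factors
coefs = [0.10, 0.12, 0.30, 0.55]
p_values = [0.001, 0.20, 0.01, 0.03]
print(merge_suggestions(coefs, p_values))   # {1: 0}
```

After a merge, the model would be refitted and the P values checked again, which matches the repeated revision described above.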
After the model is built, model effect evaluation is required to be carried out on the model. In this embodiment, four indexes are selected for model evaluation.
The model effect is evaluated with a scatter plot of model predictions against actual values: the less dispersed the points, the smaller the deviation between predicted and actual values. If the points are distributed randomly around the diagonal, show no obvious pattern, and do not drift as the predicted value increases, the model has small prediction deviation, good generalization ability, and a high degree of fit between actual and predicted values.
The model effect is evaluated with the Gini coefficient, which represents the area between the black diagonal line and the blue curve and reflects how well risks are differentiated: the larger the coefficient, the higher the overall degree of risk differentiation.
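A minimal sketch of a Gini coefficient computed from the area between the diagonal and a Lorenz-type curve. Several choices here are assumptions: records are sorted by predicted risk in ascending order, every record carries equal exposure, the curve plots the cumulative share of actual losses, and the result is normalized as twice the area between diagonal and curve.

```python
def gini_coefficient(predicted, actual):
    """Twice the area between the diagonal and the Lorenz curve of actual
    losses, with records ordered by predicted risk (ascending)."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    total = sum(actual)
    n = len(actual)
    cum, area_under = 0.0, 0.0
    for i in order:
        prev = cum
        cum += actual[i] / total
        area_under += (prev + cum) / 2 / n   # trapezoid of width 1/n
    return 1 - 2 * area_under


print(gini_coefficient([1, 2, 3, 4], [0, 0, 0, 10]))   # 0.75: losses concentrated in the top-ranked record
print(gini_coefficient([1, 2, 3, 4], [1, 1, 1, 1]))    # 0.0: no differentiation
```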
The model effect is evaluated according to the lift curve, which measures how many times better the scoring model is at predicting bad samples than random selection. A lift greater than 1 indicates that the model performs better than random selection.
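A grouped lift curve of this kind can be sketched as below. The grouping rule is an assumption: records are sorted by predicted value in ascending order and assigned to groups of roughly equal earned years, and a record is never split between groups. The function name `lift_by_group` and its argument names are hypothetical, and the total actual loss is assumed to be non-zero.

```python
def lift_by_group(predicted, actual, exposure, n_groups=10):
    """Actual loss rate of each group divided by the overall loss rate.

    Groups are ordered from lowest to highest predicted value.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    total_exposure = sum(exposure)
    overall_rate = sum(actual) / total_exposure
    group_loss = [0.0] * n_groups
    group_exposure = [0.0] * n_groups
    cumulative = 0.0
    for i in order:
        # Assign the record by the midpoint of its exposure span.
        midpoint = cumulative + exposure[i] / 2.0
        g = min(n_groups - 1, int(midpoint / total_exposure * n_groups))
        group_loss[g] += actual[i]
        group_exposure[g] += exposure[i]
        cumulative += exposure[i]
    return [
        (group_loss[g] / group_exposure[g]) / overall_rate
        if group_exposure[g] > 0 else None
        for g in range(n_groups)
    ]
```

Dividing the last entry by the first gives a high-to-low multiple for the curve; a last entry greater than 1 means the highest-scored group is worse than average.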
The model effect is evaluated according to the cumulative lift curve. Lift (the lift index) measures, as a multiple of random selection, the ability of a prediction model to capture "responses" in the target, with 1 as the boundary: a lift greater than 1 indicates that the model or rule captures more "responses" than random selection, a lift equal to 1 indicates that the model performs no differently from random selection, and a lift less than 1 indicates that the model or rule captures fewer "responses" than random selection. The larger the lift index, the better the prediction model identifies risk.
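The cumulative version can be sketched in the same spirit. Here "response" is assumed to mean actual loss, and the baseline of random selection is assumed to capture losses in proportion to earned years; both are interpretations, not statements from the text. The function name `cumulative_lift` is hypothetical, and the total actual loss is assumed to be non-zero.

```python
def cumulative_lift(predicted, actual, exposure):
    """Return (share of earned years, cumulative lift) per record.

    Records are taken from the highest predicted value downwards.
    Lift = share of losses captured / share of earned years selected.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    total_exposure = sum(exposure)
    total_loss = sum(actual)
    cum_exposure = cum_loss = 0.0
    curve = []
    for i in order:
        cum_exposure += exposure[i]
        cum_loss += actual[i]
        share_exposure = cum_exposure / total_exposure
        share_loss = cum_loss / total_loss
        curve.append((share_exposure, share_loss / share_exposure))
    return curve
```

The curve always ends at a lift of 1, because selecting every record captures every loss.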
For a better understanding of the present invention, its contents are further elucidated below in connection with some specific embodiments verified by the inventors' experiments. The following examples are for illustration only, and the parameters may be adapted according to the inventive concept of the present invention.
Example 1
A single mileage factor is constructed into a plurality of mileage interval factors by the following steps:
Check the extreme values and the 25%, 50% and 75% quantiles of the annual mileage (km) in the data set, and divide the annual mileage into 6 groups of equal earned years, so that the earned years of each group are equal and the loss ratio gradually increases as the annual mileage increases.
Group the annual mileage according to this 6-group segmentation to generate 6 mileage increment factors, namely factor 1, factor 2, factor 3, factor 4, factor 5 and factor 6, where the data in each factor is recorded in incremental form. The specific method is as follows:
The annual mileage intervals of the 6 factors are: factor 1 [0, 5000], factor 2 (5000, 10000], factor 3 (10000, 15000], factor 4 (15000, 20000], factor 5 (20000, 25000], factor 6 (25000, +∞).
When the annual mileage is 10000 km:
factor 1 [0, 5000] = 5000;
factor 2 (5000, 10000] = 10000 - 5000 = 5000;
factor 3 (10000, 15000] = 0;
factor 4 (15000, 20000] = 0;
factor 5 (20000, 25000] = 0;
factor 6 (25000, +∞) = 0;
When the annual mileage is 11000 km:
factor 1 [0, 5000] = 5000;
factor 2 (5000, 10000] = 10000 - 5000 = 5000;
factor 3 (10000, 15000] = 11000 - 10000 = 1000;
factor 4 (15000, 20000] = 0;
factor 5 (20000, 25000] = 0;
factor 6 (25000, +∞) = 0.
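The two steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `equal_exposure_cut_points` and `incremental_factors` are hypothetical, and the rule that a record is never split between groups is an assumption. The default interval bounds are the ones listed in this example.

```python
def equal_exposure_cut_points(mileage, earned_years, n_groups=6):
    """Mileage cut points that split the records into groups holding
    approximately equal earned years."""
    records = sorted(zip(mileage, earned_years))
    total = sum(earned_years)
    cuts, cumulative, k = [], 0.0, 1
    for km, years in records:
        cumulative += years
        # A record is never split, so one very large record can
        # produce repeated cut points.
        while k < n_groups and cumulative >= total * k / n_groups:
            cuts.append(km)
            k += 1
    return cuts


def incremental_factors(km, upper_bounds=(5000, 10000, 15000, 20000, 25000)):
    """Split an annual mileage into per-interval increments.

    Each factor holds the part of the mileage that falls inside its
    interval; the last interval is open-ended.
    """
    factors = []
    lower = 0
    for upper in upper_bounds:
        factors.append(min(max(km - lower, 0), upper - lower))
        lower = upper
    factors.append(max(km - lower, 0))
    return factors
```

For the two mileages worked through above, `incremental_factors(10000)` gives `[5000, 5000, 0, 0, 0, 0]` and `incremental_factors(11000)` gives `[5000, 5000, 1000, 0, 0, 0]`. In practice the cut points found from the data would typically be rounded to convenient values such as the 5000 km steps used here.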
The 6 factors are put into the model as independent variables (6 is only an example; other numbers are possible), a mileage pricing model is built, and the model is evaluated. In a preferred embodiment, the present invention further refines the evaluation relative to the prior art: after the model is built, the model effect evaluation further includes:
First, the model effect is evaluated according to a scatter plot of model predicted values against actual values. Specifically, the 45-degree diagonal of the scatter plot is taken as the reference, and the distance between a scattered point and the 45-degree diagonal being smaller than a first preset distance is taken as a first pre-judging condition; a model in which the number of scattered points meeting the first pre-judging condition exceeds a first preset number is determined to be an excellent model. When the first pre-judging condition is not met, the distance between a scattered point and the 45-degree diagonal being smaller than a second preset distance is taken as a second pre-judging condition, where the second preset distance is larger than the first preset distance; a model in which the number of scattered points meeting the second pre-judging condition exceeds a second preset number is determined to be a standard-reaching model, where the second preset number is greater than or equal to the first preset number. When the second pre-judging condition is not met either, the model clearly fails the requirements and must be rebuilt. A standard-reaching model that meets the second pre-judging condition only needs appropriate optimization, such as adjusting some of the independent variables and dependent variables, to become an excellent model.
Second, when the constructed model is determined to be an excellent model, the model effect is evaluated according to the Gini coefficient, and the requirement is met when the Gini coefficient is within a preset threshold range. In this application, the Gini coefficient represents the area between the diagonal and the curve and reflects the difference in risk; it lies between 0 and 0.5, and the larger it is, the higher the overall degree of risk differentiation. The smaller the coefficient, the more equal the risks appear and the harder they are to distinguish. When the risks are hard to distinguish, the samples (independent and dependent variables) need to be reselected.
Third, after the Gini coefficient meets the model evaluation requirement, the model effect is evaluated according to the lift curve. If the multiple given by the lift curve is greater than 1, the model effect meets the requirement; otherwise the model is optimized.
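The three-stage evaluation above can be sketched as a single decision function. All thresholds are placeholders: the preset distances `d1` and `d2`, the preset counts `n1` and `n2`, and the acceptable range `coefficient_range` are hypothetical parameters whose values the text does not give, and the coefficient and lift multiple are assumed to have been computed beforehand. The distance of a point (predicted, actual) from the 45-degree diagonal is taken as |predicted - actual| / sqrt(2), which assumes both axes use the same scale.

```python
import math


def staged_evaluation(predicted, actual, coefficient, lift,
                      d1, d2, n1, n2, coefficient_range=(0.2, 0.5)):
    """Apply the scatter, coefficient and lift checks in order."""
    # Stage 1: distance of each scattered point from the 45-degree diagonal.
    distances = [abs(p - a) / math.sqrt(2) for p, a in zip(predicted, actual)]
    within_d1 = sum(1 for d in distances if d < d1)
    within_d2 = sum(1 for d in distances if d < d2)
    if within_d1 <= n1:
        if within_d2 > n2:
            return "standard: optimize the model"
        return "rebuild the model"
    # Stage 2: the model is excellent, so check the area-based coefficient.
    low, high = coefficient_range
    if not (low <= coefficient <= high):
        return "reselect samples"
    # Stage 3: the lift multiple must exceed 1.
    if lift > 1:
        return "pass"
    return "optimize the model"
```

Returning a short label at each stage keeps the order of the checks explicit: a later stage is reached only when the earlier one has been satisfied.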
The experimental results of the above steps are shown in Figs. 3 to 6. Fig. 3 is a schematic diagram of the scatter plot evaluation of a mileage pricing model according to an embodiment of the invention. The scattered points lie close to the 45-degree diagonal, are distributed randomly around it, show no obvious pattern, and do not drift significantly as the predicted value increases, which indicates that the model has a small prediction deviation and good generalization capability. Fig. 4 is a schematic diagram of the Gini coefficient evaluation of a mileage pricing model according to an embodiment of the invention; the Gini coefficient is 0.3138, indicating a good ability to distinguish risk levels. Fig. 5 is a schematic diagram of the lift curve evaluation of a mileage pricing model according to an embodiment of the invention: the records of the test set are sorted in ascending order of model predicted value and divided into 10 groups of equal earned years, and the predicted and actual values of the 10 groups, from the lowest to the highest predicted value, are compared. The differentiation of the lift curve is 6.16 times (7.25 times for the predicted values), which shows that the model can effectively distinguish high-risk vehicles from low-risk vehicles and price them reasonably. Fig. 6 is a schematic diagram of the cumulative lift curve evaluation of a mileage pricing model according to an embodiment of the invention; the lift of the cumulative lift curve is 2.82 times, which demonstrates the effectiveness of the model.
Example 2
A single mileage factor is constructed into a plurality of mileage interval factors by the following steps:
Check the extreme values and the 25%, 50% and 75% quantiles of the annual mileage (km) in the data set, and divide the annual mileage into 6 groups of equal earned years, so that the earned years of each group are equal and the loss ratio gradually increases as the annual mileage increases.
Group the annual mileage according to this 6-group segmentation to generate 6 mileage increment factors, namely factor 1, factor 2, factor 3, factor 4, factor 5 and factor 6, where the data in each factor is recorded in incremental form. The specific method is as follows:
The annual mileage intervals of the 6 factors are: factor 1 [0, 5000], factor 2 (5000, 10000], factor 3 (10000, 15000], factor 4 (15000, 20000], factor 5 (20000, 25000], factor 6 (25000, +∞).
When the annual mileage is 10000 km:
factor 1 [0, 5000] = 5000;
factor 2 (5000, 10000] = 10000 - 5000 = 5000;
factor 3 (10000, 15000] = 0;
factor 4 (15000, 20000] = 0;
factor 5 (20000, 25000] = 0;
factor 6 (25000, +∞) = 0;
When the annual mileage is 11000 km:
factor 1 [0, 5000] = 5000;
factor 2 (5000, 10000] = 10000 - 5000 = 5000;
factor 3 (10000, 15000] = 11000 - 10000 = 1000;
factor 4 (15000, 20000] = 0;
factor 5 (20000, 25000] = 0;
factor 6 (25000, +∞) = 0.
The 6 factors are put into the model as independent variables, a mileage pricing model is built, and the model is evaluated. Fig. 7 is a schematic diagram of the scatter plot evaluation of a mileage pricing model according to an embodiment of the invention. The scattered points lie close to the 45-degree diagonal, are distributed randomly around it, show no obvious pattern, and do not drift significantly as the predicted value increases, which indicates that the model has a small prediction deviation and good generalization capability. Fig. 8 is a schematic diagram of the Gini coefficient evaluation of a mileage pricing model according to an embodiment of the invention; the Gini coefficient is 0.3167, indicating a good ability to distinguish risk levels. Fig. 9 is a schematic diagram of the lift curve evaluation of a mileage pricing model according to an embodiment of the invention: the records of the test set are sorted in ascending order of model predicted value and divided into 10 groups of equal earned years, and the predicted and actual values of the 10 groups, from the lowest to the highest predicted value, are compared. The differentiation of the lift curve is 6.05 times (7.21 for the predicted values), which shows that the model can effectively predict the mileage-based premium. Fig. 10 is a schematic diagram of the cumulative lift curve evaluation of a mileage pricing model according to an embodiment of the invention; the lift of the cumulative lift curve is 2.75 times, which demonstrates the effectiveness of the model.
Example 3
A single mileage factor is constructed into a plurality of mileage interval factors by the following steps:
Check the extreme values and the 25%, 50% and 75% quantiles of the annual mileage (km) in the data set, and divide the annual mileage into 6 groups of equal earned years, so that the earned years of each group are equal and the loss ratio gradually increases as the annual mileage increases.
Group the annual mileage according to this 6-group segmentation to generate 6 mileage increment factors, namely factor 1, factor 2, factor 3, factor 4, factor 5 and factor 6, where the data in each factor is recorded in incremental form. The specific method is as follows:
The annual mileage intervals of the 6 factors are: factor 1 [0, 5000], factor 2 (5000, 10000], factor 3 (10000, 15000], factor 4 (15000, 20000], factor 5 (20000, 25000], factor 6 (25000, +∞).
When the accumulated annual mileage is 10000 km:
factor 1 [0, 5000] = 0;
factor 2 (5000, 10000] = 10000;
factor 3 (10000, 15000] = 0;
factor 4 (15000, 20000] = 0;
factor 5 (20000, 25000] = 0;
factor 6 (25000, +∞) = 0;
When the annual mileage is 11000 km:
factor 1 [0, 5000] = 0;
factor 2 (5000, 10000] = 0;
factor 3 (10000, 15000] = 11000;
factor 4 (15000, 20000] = 0;
factor 5 (20000, 25000] = 0;
factor 6 (25000, +∞) = 0.
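The values listed in this example follow a different rule from Example 1: the whole annual mileage is recorded in the one interval it falls into, and every other factor is 0. A minimal sketch of that rule, with the hypothetical function name `single_interval_factors` and the interval bounds listed above as defaults:

```python
def single_interval_factors(km, upper_bounds=(5000, 10000, 15000, 20000, 25000)):
    """Record the whole mileage in the interval that contains it.

    Intervals are closed on the right, matching (5000, 10000] and so on;
    the last interval is open-ended.
    """
    factors = [0] * (len(upper_bounds) + 1)
    for index, upper in enumerate(upper_bounds):
        if km <= upper:
            factors[index] = km
            return factors
    factors[-1] = km
    return factors
```

Here `single_interval_factors(10000)` gives `[0, 10000, 0, 0, 0, 0]` and `single_interval_factors(11000)` gives `[0, 0, 11000, 0, 0, 0]`, matching the two cases listed above.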
The 6 factors are put into the model as independent variables, a mileage pricing model is built, and the model is evaluated. Fig. 11 is a schematic diagram of the scatter plot evaluation of a mileage pricing model according to an embodiment of the invention. The scattered points lie close to the 45-degree diagonal, are distributed randomly around it, show no obvious pattern, and do not drift significantly as the predicted value increases, which indicates that the model has a small prediction deviation and good generalization capability. Fig. 12 is a schematic diagram of the Gini coefficient evaluation of a mileage pricing model according to an embodiment of the invention; the Gini coefficient is 0.3181, indicating a good ability to distinguish risk levels. Fig. 13 is a schematic diagram of the lift curve evaluation of a mileage pricing model according to an embodiment of the invention: the records of the test set are sorted in ascending order of model predicted value and divided into 10 groups of equal earned years, and the predicted and actual values of the 10 groups, from the lowest to the highest predicted value, are compared. The differentiation of the lift curve is 6.45 times, which shows that the model can effectively predict the mileage-based premium. Fig. 14 is a schematic diagram of the cumulative lift curve evaluation of a mileage pricing model according to an embodiment of the invention; the lift of the cumulative lift curve is 2.8 times, which demonstrates the effectiveness of the model.
The vehicle mileage-based vehicle risk pricing model construction method of the embodiment of the invention can be implemented by a vehicle mileage-based vehicle risk pricing model construction device, which mainly comprises:
the data acquisition module is used for acquiring original data of the vehicle in a historical time period;
the data processing module is used for processing the original data and acquiring a single mileage factor and a dependent variable;
the mileage factor construction module is used for constructing a plurality of mileage interval factors according to the single mileage factor;
and the data modeling module is used for establishing a mileage pricing model.
The data acquisition module comprises a static data acquisition unit and a dynamic data acquisition unit. The static data acquisition unit acquires static data related to vehicle risk pricing, such as the socioeconomic attributes of the driver. The dynamic data acquisition unit acquires dynamic data related to vehicle insurance pricing, such as driving mileage. Specifically, a driving mileage acquisition subunit is further included, which acquires the driving mileage from GPS, the odometer, the on-board diagnostics (OBD) device, and the like.
The data processing module comprises a data cleaning unit and a data centering unit. The data cleaning unit processes the acquired data and converts it into data usable for subsequent modeling. The data centering unit centers the data, which facilitates building the model. Before data centering is performed, the multiple mileage interval factors are first constructed in the mileage factor construction module, and the mileage interval factors are then put into the data centering unit together with the other factors.
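The text does not define the centering operation. One common reading, shown here purely as an assumption, is mean centering: subtracting each factor's mean from its values so that every column is centered on zero. The function name `center_columns` and the row-per-record layout are hypothetical.

```python
def center_columns(rows):
    """Subtract the column mean from every value.

    rows: list of records, each a list of numeric factor values.
    Returns the centered rows and the column means that were removed.
    """
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    centered = [[value - mean for value, mean in zip(row, means)] for row in rows]
    return centered, means
```

Keeping the means alongside the centered data allows new records to be shifted in the same way before a fitted model is applied to them.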
The mileage factor construction module comprises a single mileage factor segmentation module and a multiple mileage interval factor construction module. The single mileage factor segmentation module groups the mileage factor according to a preset mileage grouping mode and a preset number of mileage groups. Specifically, it comprises a mileage grouping mode determining unit, which determines the mileage grouping mode, and a mileage grouping number determining unit, which determines the number of mileage groups. The multiple mileage interval factor construction module puts the driving mileage into the segmented mileage intervals to obtain the multiple mileage interval factors corresponding to that mileage.
The data modeling module comprises a model determining unit, a model solving unit and a model evaluating unit. The model determining unit determines the model to be used from the data types of the independent variables and the dependent variable. The model solving unit solves the mileage pricing model using computer software, which further includes performing hypothesis tests on the established mileage pricing model and repeatedly revising the model. The model evaluating unit evaluates the effect of the established mileage pricing model using different evaluation methods.
Since each functional module of the model building apparatus according to the exemplary embodiment of the present invention corresponds to a step of the exemplary embodiment of the model building method, for details not disclosed in the embodiment of the apparatus according to the present invention, please refer to the embodiment of the model building method according to the present invention.
In addition, an embodiment of the invention also provides a vehicle mileage-based vehicle risk pricing model construction system, which mainly comprises: at least one server, one processor, at least one memory, and a mileage pricing model construction program stored in the memory. When the program is executed by the processor and/or the raw data is loaded into the server, the method described above is implemented.
In particular, the processor described above may comprise a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may comprise a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or universal serial bus (USB) drive, or a combination of two or more of the foregoing. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile solid state memory. In a particular embodiment, the memory includes read-only memory (ROM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.
The central processing unit (CPU) of a computer can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) or a program loaded from another storage section into random access memory (RAM). The RAM also stores the various programs and data required for system operation. The CPU, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.

Claims (11)

1. A vehicle mileage-based vehicle risk pricing model construction method, characterized by comprising the following steps:
S10: acquiring raw data of a vehicle in a historical time period;
S20: performing data processing on the raw data to obtain a single mileage factor and a dependent variable;
S30: constructing a plurality of mileage interval factors according to the single mileage factor;
S40: taking the plurality of mileage interval factors as independent variables and putting them into a preset model to establish a mileage pricing model.
2. The method of claim 1, wherein the raw data of the vehicle over a historical time period includes at least the historical time period vehicle mileage.
3. The method of claim 1, wherein the dependent variable obtained after the data processing of the raw data is a loss ratio or a claim amount.
4. The method of claim 1, wherein constructing a plurality of mileage interval factors from the single mileage factor comprises the following steps:
S31: acquiring the single mileage factor;
S32: dividing the single mileage factor according to a preset mileage grouping mode and a preset number of mileage groups;
S33: generating the corresponding plurality of mileage interval factors according to the division result.
5. The method of claim 1, wherein the preset model is a Tweedie model in the generalized linear model family.
6. The method of claim 1, wherein putting the plurality of mileage interval factors, as independent variables, together with the dependent variable into the preset model to obtain the mileage pricing model comprises the following steps:
S41: acquiring the plurality of mileage interval factors and the dependent variable;
S42: putting the plurality of mileage interval factors and the dependent variable into a Tweedie model, solving the model, and establishing the mileage pricing model;
S43: evaluating the model effect of the mileage pricing model.
7. The method of claim 6, wherein putting the plurality of mileage interval factors and the loss ratio into the Tweedie model for model solving comprises: putting the plurality of factors into the model as independent variables, checking the model results and the P value of each factor to carry out a hypothesis test, and, if P > 0.05, merging the groups with non-significant P values with groups whose coefficients are close, or reselecting the grouping thresholds according to the one-dimensional analysis results.
8. The method of claim 6, wherein evaluating the model effect of the mileage pricing model uses: a scatter plot of model predicted values against actual values, a Gini coefficient, a lift curve, and a cumulative lift curve.
9. The method of claim 1, wherein the historical time period is one month, two months, one quarter, half a year, or a time period specified according to actual conditions.
10. Vehicle mileage-based vehicle risk pricing model construction device, which is characterized by comprising:
the data acquisition module is used for acquiring original data of the vehicle in a historical time period;
the data processing module is used for processing the original data and acquiring a single mileage factor and a dependent variable;
the mileage factor construction module is used for constructing a plurality of mileage interval factors according to the single mileage factor;
and the data modeling module is used for establishing a mileage pricing model.
11. A vehicle mileage-based vehicle risk pricing model building system, the system comprising:
at least one server, one processor, at least one memory and a vehicle risk pricing model construction program stored in the memory, wherein the method of any of claims 1-9 is implemented when the program is executed by the processor and/or the raw data is loaded into the server.
CN202211659332.1A 2022-12-22 2022-12-22 Vehicle mileage-based vehicle risk pricing model construction method, device and system Pending CN116228438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211659332.1A CN116228438A (en) 2022-12-22 2022-12-22 Vehicle mileage-based vehicle risk pricing model construction method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211659332.1A CN116228438A (en) 2022-12-22 2022-12-22 Vehicle mileage-based vehicle risk pricing model construction method, device and system

Publications (1)

Publication Number Publication Date
CN116228438A true CN116228438A (en) 2023-06-06

Family

ID=86581394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211659332.1A Pending CN116228438A (en) 2022-12-22 2022-12-22 Vehicle mileage-based vehicle risk pricing model construction method, device and system

Country Status (1)

Country Link
CN (1) CN116228438A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination