CN116091254B - Commercial vehicle risk analysis method - Google Patents


Info

Publication number
CN116091254B
Authority
CN
China
Prior art keywords
model
coefficient
feature
features
auxiliary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310375492.1A
Other languages
Chinese (zh)
Other versions
CN116091254A (en)
Inventor
徐显杰
金彪
简雄
胡敏智
赵伟亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suoto Hangzhou Automotive Intelligent Equipment Co Ltd
Tianjin Soterea Automotive Technology Co Ltd
Original Assignee
Suoto Hangzhou Automotive Intelligent Equipment Co Ltd
Tianjin Soterea Automotive Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suoto Hangzhou Automotive Intelligent Equipment Co Ltd, Tianjin Soterea Automotive Technology Co Ltd
Priority to CN202310375492.1A
Publication of CN116091254A
Application granted
Publication of CN116091254B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention relates to the field of data processing, and discloses a method for analyzing risk of a commercial vehicle, which comprises the following steps: acquiring internet of vehicles data of a commercial vehicle to be evaluated; determining a first feature associated with risk information of a commercial vehicle to be evaluated based on the internet of vehicles data, and determining a first main model coefficient to be evaluated corresponding to the first feature based on an association relationship between a pure risk premium and the first main model feature and the second main model feature; inputting the vehicle networking data into a first auxiliary model to obtain a first auxiliary model result to be converted, and determining a first auxiliary model coefficient to be evaluated according to the corresponding relation between the first auxiliary model result to be converted and the first result coefficient; inputting the internet of vehicles data into a second auxiliary model to obtain a second auxiliary model result to be converted, determining a second auxiliary model coefficient to be evaluated according to the corresponding relation between the second auxiliary model result to be converted and the second result coefficient, and determining target risk information of the commercial vehicle to be evaluated so as to consider various data dimensions and improve the accuracy of risk index evaluation of the commercial vehicle.

Description

Commercial vehicle risk analysis method
Technical Field
The invention relates to the field of data processing, in particular to a method for analyzing risks of a commercial vehicle.
Background
With the popularization and development of the internet of vehicles technology, the internet of vehicles technology is being combined with various industries. UBI (Usage-based insurance) is one of applications of internet of vehicles technology in the vehicle insurance industry, and can be understood as an insurance based on driving behavior.
At present, commercial vehicle risk analysis mainly uses internet of vehicles data such as driving time, driving speed and driving mileage as indicators. Because these indicators cover only a single dimension, risk analysis methods built on them have poor accuracy. The claims data generated by a commercial vehicle are an objective result obtained through accident liability determination and vehicle loss assessment after a real traffic accident. Therefore, the claims data of a commercial vehicle (such as the pure risk premium and the standard-premium loss ratio) can objectively reflect its risk: the larger the claims data, the greater the risk of the commercial vehicle. How to combine the claims data and the internet of vehicles data of commercial vehicles for risk prediction is a problem that urgently needs to be solved.
In view of this, the present invention has been made.
Disclosure of Invention
In order to solve the technical problems, the invention provides a method for analyzing the risk of a commercial vehicle, which fully considers various data dimensions and improves the accuracy of risk analysis of the commercial vehicle.
The embodiment of the invention provides a method for analyzing risk of a commercial vehicle, which comprises the following steps:
acquiring internet of vehicles data of a commercial vehicle to be evaluated;
determining a first feature associated with risk information of a commercial vehicle to be evaluated based on the internet of vehicles data, determining a first main model coefficient to be evaluated corresponding to the first feature based on an association relation between pure risk premium and the first main model feature and the second main model feature, wherein the first main model coefficient to be evaluated represents the influence degree of the first feature on the risk information of the commercial vehicle to be evaluated, and the association relation is determined based on a generalized linear model;
inputting the internet of vehicles data into a pre-trained first auxiliary model to obtain a first auxiliary model result to be converted, and determining a first auxiliary model coefficient to be evaluated according to the first auxiliary model result to be converted and a first result coefficient correspondence; the first auxiliary model and the first result coefficient correspondence are determined based on the XGBoost algorithm and on the first auxiliary model features and the pure risk premium corresponding to each historical commercial vehicle, and the first auxiliary model coefficient to be evaluated represents the degree of influence of the internet of vehicles data on the risk information of the commercial vehicle to be evaluated; the first auxiliary model features are features extracted based on the internet of vehicles data of each historical commercial vehicle;
inputting the internet of vehicles data into a pre-trained second auxiliary model to obtain a second auxiliary model result to be converted, and determining a second auxiliary model coefficient to be evaluated according to the second auxiliary model result to be converted and a second result coefficient correspondence; the second auxiliary model and the second result coefficient correspondence are determined based on a neural network algorithm and on the second auxiliary model features and the standard-premium loss ratio corresponding to each historical commercial vehicle, and the second auxiliary model coefficient to be evaluated represents the degree of influence of the internet of vehicles data on the risk information of the commercial vehicle to be evaluated; the second auxiliary model features and the first auxiliary model features are features of different dimensions extracted based on the internet of vehicles data of each historical commercial vehicle;
and determining target risk information corresponding to the commercial vehicle to be evaluated according to the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated.
The embodiment of the invention also provides a method for analyzing the risk of the commercial vehicle, which comprises the following steps:
acquiring vehicle attribute data of a commercial vehicle to be evaluated;
determining a second feature associated with risk information of the commercial vehicle to be evaluated based on the vehicle attribute data, determining a second main model coefficient to be evaluated corresponding to the second feature based on an association relation between the risk information and the attribute feature, wherein the second main model coefficient to be evaluated represents the influence degree of the second feature on the risk information of the commercial vehicle to be evaluated, and the association relation is determined based on a generalized linear model;
and determining target risk information corresponding to the commercial vehicle to be evaluated according to the second main model coefficient to be evaluated.
The embodiment of the invention has the following technical effects:
Internet of vehicles data of the commercial vehicle to be evaluated are acquired, so that data of multiple dimensions can be used in the subsequent determination of the risk information of the commercial vehicle. A first feature is determined based on the internet of vehicles data, and a first main model coefficient to be evaluated corresponding to the first feature is determined based on the association relation between the pure risk premium and the first main model feature and the second main model feature, which yields the main relevant information in the risk information of the commercial vehicle. The internet of vehicles data are input into a pre-trained first auxiliary model to obtain a first auxiliary model result to be converted, and a first auxiliary model coefficient to be evaluated is determined according to that result and the first result coefficient correspondence. The internet of vehicles data are likewise input into a pre-trained second auxiliary model to obtain a second auxiliary model result to be converted, and a second auxiliary model coefficient to be evaluated is determined according to that result and the second result coefficient correspondence, which yields the other relevant information in the risk information of the commercial vehicle. Combining multiple models improves the accuracy, generalization and robustness of the model. The target risk information is then determined according to the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated, so that multiple data dimensions are fully considered and the accuracy of commercial vehicle risk evaluation is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for analyzing risk of a commercial vehicle according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for analyzing risk of a commercial vehicle according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the invention, are within the scope of the invention.
The analysis method for the risk of the commercial vehicle provided by the embodiment of the invention is mainly suitable for the situation of evaluating the risk of the commercial vehicle by means of the Internet of vehicles data. The analysis method for the risk of the commercial vehicle provided by the embodiment of the invention can be executed by the electronic equipment.
FIG. 1 is a flowchart of a method for analyzing risk of a commercial vehicle according to an embodiment of the present invention. Referring to FIG. 1, the method specifically includes:
S110, acquiring the internet of vehicles data of the commercial vehicle to be evaluated.
The commercial vehicle to be evaluated is a commercial vehicle requiring risk analysis, and its risk information can serve as a pricing reference for its UBI vehicle insurance. A typical commercial vehicle may be a truck or the like. The risk information represents the probability that the commercial vehicle to be evaluated has a risk accident during operation: the higher the risk information, the higher that probability, and the higher the price of the UBI vehicle insurance usually is. Many factors cause risk accidents in the operation of commercial vehicles, such as the driving skill of the driver, the performance of the commercial vehicle itself and the surroundings of the commercial vehicle. To quantify the risk information, the internet of vehicles data of the commercial vehicle to be evaluated are taken as the data basis, and the quantified risk information is obtained through data analysis. The risk information may also be represented by a UBI score.
The internet of vehicles data may be dynamic data collected by devices installed on the commercial vehicle to be evaluated, and the internet of vehicles data includes at least one of driving habit data, driving technology data, commercial vehicle information data, and surrounding environment data.
Specifically, the internet of vehicles data are the driving habit data, driving technology data, commercial vehicle information data, surrounding environment data and the like acquired by devices mounted on the commercial vehicle to be evaluated, such as an autonomous emergency braking system (Autonomous Emergency Braking, AEB), an advanced driving assistance system (Advanced Driving Assistance System, ADAS), blind spot detection (Blind Spot Detection, BSD), a driver monitoring system (Driver Monitoring System, DMS), a beacon machine and the like. These internet of vehicles data may include the following information:
1) Mileage;
2) Travel speed distribution;
3) Driving time distribution;
4) Trips;
5) Location-based services (Location Based Services, LBS), for example: frequent cities/provinces, frequent routes, radius of activity, cross-province travel, etc.;
6) AEB early warning/braking;
7) DMS early warning, for example: unsafe driving behaviors such as distraction, eye closure, yawning, smoking and phone calls;
8) ADAS early warning, for example: rapid acceleration, rapid deceleration, sharp turning, overspeed, prolonged driving, frequent car following, frequent lane changing, forward collision warning, pedestrian collision warning, close following distance warning, lane departure warning, etc.;
9) TTA (Time to Arrival) segments, which are segments whose TTA value stays within a dangerous threshold for a short time and are used to measure the pre-collision risk when approaching another vehicle, where TTA value = distance to the other vehicle / vehicle speed;
10) Fatigue index, which is the fatigue probability value output by a fatigue model and is used to measure the degree of fatigued driving of the driver when the DMS device raises an alarm;
11) Stay segments, used to measure whether the driver's rest time is sufficient;
12) Steering segments, in which turning and lane-changing segments are identified through the cumulative change of the heading angle within a set time, used to measure the driving risk of the vehicle when passing through an intersection;
13) Active days, meaning days with a driving distance of 150 km or more;
14) Working days/non-working days;
15) Weather, which may include cloud, fog, rain and snow, temperature, visibility, etc.;
16) Cruise segments;
17) Violation segments, which at present only judge whether the lights are used correctly and are used to measure whether driving is compliant when the vehicle turns or changes lanes;
18) Blind spot detection segments, which are segments divided according to the early-warning data of the BSD device and are used to measure the degree of risk arising in the blind spots of the vehicle.
Of course, the above information may also include sub-information at various levels, all of which may serve as internet of vehicles data. By analyzing the internet of vehicles data, driving habits, driving skill, information related to the commercial vehicle itself (such as its degree of wear) and the like can be determined, and this information can represent the level of the risk information of the commercial vehicle.
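The TTA segment in item 9 can be illustrated with a minimal Python sketch. It uses only the formula stated above (TTA value = distance to the other vehicle / vehicle speed). The `Sample` record, the field names and the 2.0 second danger threshold are hypothetical choices made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    t: float          # timestamp in seconds (hypothetical field)
    gap_m: float      # distance to the other vehicle, in metres
    speed_mps: float  # own vehicle speed, in metres per second


def tta(sample: Sample) -> float:
    """TTA value = distance to the other vehicle / vehicle speed."""
    if sample.speed_mps <= 0:
        return float("inf")  # a stationary vehicle never closes the gap
    return sample.gap_m / sample.speed_mps


def tta_segments(samples: List[Sample], threshold_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of consecutive runs whose TTA is below the threshold.

    The 2.0 s default is an assumed danger threshold, not a value from the patent.
    """
    segments: List[Tuple[float, float]] = []
    start = None
    last_t = None
    for s in samples:
        if tta(s) < threshold_s:
            if start is None:
                start = s.t  # a new dangerous run begins here
            last_t = s.t
        elif start is not None:
            segments.append((start, last_t))  # the run ended at the previous sample
            start = None
    if start is not None:
        segments.append((start, last_t))  # close a run that lasts until the final sample
    return segments
```

Counts or durations of such segments per vehicle and month could then serve as candidate initial features.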
For example, internet of vehicles data of different levels are shown in Table 1. Of course, the internet of vehicles data are not limited to the items in Table 1 and may also include other internet of vehicles data.
Table 1 Internet of vehicles data
S120, determining a first feature associated with the risk information of the commercial vehicle to be evaluated based on the internet of vehicles data, and determining a first main model coefficient to be evaluated corresponding to the first feature based on the association relation between the pure risk premium and the first main model feature and the second main model feature.
The first feature is a feature obtained by feature extraction from the internet of vehicles data. The first main model coefficient to be evaluated represents the degree of influence of the first feature on the risk information of the commercial vehicle to be evaluated, and the association relation is determined based on the generalized linear model.
Specifically, feature extraction is performed on the internet of vehicles data to obtain the first feature. The feature coefficient corresponding to the first feature is then determined from the association relation between the pure risk premium and the first main model feature and the second main model feature, and this feature coefficient is the first main model coefficient to be evaluated.
On the basis of the above example, the association relationship between the pure risk premium and the first and second main model features is obtained based on the following manner:
S121, acquiring pure risk premium corresponding to each historical commercial vehicle;
S122, extracting the first main model features and the second main model features corresponding to each historical commercial vehicle based on the internet of vehicles data of each historical commercial vehicle;
S123, training the generalized linear model according to each pure risk premium and the corresponding first main model features and second main model features, and updating the first main model features and the second main model features, to obtain the association relation between the pure risk premium and the first main model feature and the second main model feature.
The pure risk premium may be the claim amount per unit time, and the types of pure risk premium include commercial insurance and compulsory traffic insurance. The first main model feature may be a model feature remaining after processing such as feature screening. The second main model feature may be a model feature obtained by processing the remaining features other than the first main model feature.
Specifically, the pure risk premium can be determined from the information records of each historical commercial vehicle, and the first main model features can be obtained by performing feature screening, feature extraction and other processing on the various internet of vehicles data collected on each historical commercial vehicle. The remaining internet of vehicles data other than the first main model features are processed to obtain the second main model features. Furthermore, the first main model features and the second main model features are taken as independent variables and the pure risk premium as the dependent variable, and GLM-Tweedie is used for development and training. A model whose p values are smaller than a preset value (such as 0.05) and which converges is taken as the pre-trained first main model. The first main model features and the second main model features used by the first main model (with insignificant features removed) are determined, and the association relation between the pure risk premium and the first main model feature and the second main model feature can be determined from the first main model.
On the basis of the above example, the first main model features respectively corresponding to the historic commercial vehicles may be extracted based on the internet of vehicles data of the historic commercial vehicles in the following manner:
S1221, determining the internet of vehicles data of each historical commercial vehicle as the initial features corresponding to each historical commercial vehicle, and performing feature screening on the initial features to obtain initially selected main model features;
S1222, for each initially selected main model feature, dividing the feature into a preset number of feature groups to obtain extended main model features;
S1223, training the generalized linear model according to the extended main model features and the pure risk premium, and determining the extended main model features that meet a preset condition as the first main model features.
The initial features may be the internet of vehicles data collected by the devices installed on each historical commercial vehicle. The number of initial features can be large, for example more than 600. If all the initial features were fitted directly in model training, the model might fail to converge so that no model result could be obtained, and too many features would also harm the stability of the model and make it harder to interpret. The initial features therefore need to be screened first, to extract the features that better represent the risk of the commercial vehicle and work better in the model. The feature screening includes at least one of information contribution degree screening, interpretability screening, temporal consistency screening and correlation screening. The initially selected main model features are the initial features that remain after feature screening. The preset number of feature groups is the number of groups into which an initially selected main model feature is divided; it may be determined according to how the feature varies, or it may be a preset fixed value, for example 3 to 5 groups. The extended main model features are the features obtained after each initially selected main model feature is divided into the preset number of feature groups. The generalized linear model (Generalized Linear Model, GLM) extends the linear model: a link function establishes the relationship between the mathematical expectation of the response variable and a linear combination of the predictor variables. The generalized linear model may be GLM-Tweedie (a claims-fitting model). The preset condition includes that the p value is smaller than a preset value and that the model converges. The p value is a parameter used to judge the result of a hypothesis test and, depending on the distribution, may be compared against the rejection region of that distribution.
Specifically, the internet of vehicles data collected by the devices on each historical commercial vehicle are taken as the initial features. Feature screening is performed on the initial features, and the initial features that remain after screening are taken as the initially selected main model features, of which there are several. Each initially selected main model feature is then analyzed: if it is a continuous feature, it can be divided, according to the preset number of feature groups, into several groups that show a trend. The features obtained by this division, together with the initially selected main model features that were not divided, are taken as the extended main model features. With the extended main model features as independent variables and the pure risk premium as the dependent variable, GLM-Tweedie is used for development and training, and the features entering the model that meet the preset condition are taken as the first main model features.
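For orientation, the relationship that a Tweedie GLM fits can be written out as below. The text only states that a link function connects the expectation of the response to a linear combination of the predictors. The log link and the power range 1 < p < 2 shown here are assumptions (they are the usual choices when fitting claim amounts), and the symbols are introduced only for this illustration.

```latex
% Y_i    : pure risk premium of historical commercial vehicle i
% x_{ij} : value of extended main model feature j for vehicle i
% beta_j : feature coefficient of feature j (the main model coefficient)
g(\mu_i) = \log \mu_i = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ij},
\qquad \mu_i = \mathbb{E}[Y_i],
\qquad \operatorname{Var}(Y_i) = \phi \, \mu_i^{\,p}, \quad 1 < p < 2
```

Under this assumed log link, each coefficient acts multiplicatively on the expected pure risk premium through the factor exp(beta_j x_ij), and a feature is retained when the p value of its coefficient is below the preset value.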
The information contribution degree screening may be IV (Information Value) screening. The IV represents the contribution of a feature to the prediction of the target, that is, the predictive power of the feature; in general, the higher the IV, the stronger the predictive power and the higher the information contribution degree. For example, initial features whose information contribution degree is lower than a preset contribution degree can be removed, so that only initial features whose information contribution degree is not lower than the preset contribution degree are retained. The interpretability screening may select features that are easy to interpret and have business meaning. The temporal consistency screening may remove initial features that fluctuate strongly over time, that is, initial features with poor consistency over time. The correlation screening may be based on the correlation between initial features: for example, if the correlation between two initial features is higher than a preset correlation, the two features carry similar information and only one of them needs to be retained.
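The IV screening step can be sketched as follows. IV is conventionally defined for a binary target, so this sketch assumes the target has been reduced to a 0/1 flag (for example, whether a vehicle had any claim in the period); that reduction, the smoothing constant `eps` and the 0.02 cut-off are assumptions made for illustration and are not specified in the patent.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def information_value(bins: Sequence[Hashable], target: Sequence[int], eps: float = 0.5) -> float:
    """IV = sum over bins of (pct_event - pct_non_event) * ln(pct_event / pct_non_event).

    `bins` holds the bin label of each record and `target` the assumed 0/1 flag.
    `eps` is added to every bin count so that empty cells do not produce log(0).
    """
    events = defaultdict(float)
    non_events = defaultdict(float)
    for label, y in zip(bins, target):
        if y == 1:
            events[label] += 1
        else:
            non_events[label] += 1
    labels = set(events) | set(non_events)
    total_events = sum(events.values()) + eps * len(labels)
    total_non_events = sum(non_events.values()) + eps * len(labels)
    iv = 0.0
    for label in labels:
        pct_event = (events[label] + eps) / total_events
        pct_non_event = (non_events[label] + eps) / total_non_events
        iv += (pct_event - pct_non_event) * math.log(pct_event / pct_non_event)
    return iv


def screen_by_iv(features: Dict[str, Sequence[Hashable]], target: Sequence[int],
                 min_iv: float = 0.02) -> List[str]:
    """Keep the features whose IV is not lower than the preset contribution degree."""
    return [name for name, bins in features.items()
            if information_value(bins, target) >= min_iv]
```

A feature whose bins have the same share of events and non-events gets an IV of zero and is removed, while a feature whose bins separate the two groups gets a high IV and is kept.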
It can be understood that the insurance data and the internet of vehicles data can be matched according to information such as the license plate number, the vehicle frame number and the month, so as to determine the pure risk premium and the initial features corresponding to each historical commercial vehicle. If some information does not cover a whole month, for example in the month when the equipment was installed on the historical commercial vehicle, the internet of vehicles data can be standardized and expanded, for example by interpolation, and converted into whole-month information.
For example, if the initially selected main model feature A varies monotonically with the pure risk premium, A may be divided by data value into the preset number of feature groups, for example 3 groups, to obtain A1, A2 and A3, which replace A as extended main model features.
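The division of a feature A into A1, A2 and A3 can be sketched as below. The patent only says that the feature is divided by data value into a preset number of groups. Equal-frequency cut points and 0/1 indicator columns are assumptions chosen here for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def quantile_edges(values: Sequence[float], n_groups: int = 3) -> List[float]:
    """Cut points that split the sorted values into n_groups roughly equal-sized groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_groups] for k in range(1, n_groups)]


def expand_feature(name: str, values: Sequence[float], n_groups: int = 3) -> Dict[str, List[int]]:
    """Replace one continuous feature by n_groups indicator columns name1..nameN."""
    edges = quantile_edges(values, n_groups)
    columns: Dict[str, List[int]] = {f"{name}{g + 1}": [] for g in range(n_groups)}
    for v in values:
        group = sum(v >= e for e in edges)  # number of cut points at or below v
        for g in range(n_groups):
            columns[f"{name}{g + 1}"].append(1 if g == group else 0)
    return columns
```

For six values 10 to 60 in steps of 10, the cut points are 30 and 50, so the first two values fall into A1, the next two into A2 and the last two into A3.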
Therefore, the initial features can be screened through S1221 to S1223, so as to obtain a part of initial features with better effect, namely, the first main model features.
On the basis of the above example, the second main model features respectively corresponding to the historic commercial vehicles may be extracted based on the internet of vehicles data of the historic commercial vehicles in the following manner:
S1224, performing feature preprocessing on the remaining features to obtain features to be analyzed;
S1225, performing principal component analysis on the features to be analyzed to obtain a first number of principal components to be selected, and determining the interpretability of each principal component to be selected;
S1226, determining, according to each interpretability, the principal components to be selected that meet the interpretation threshold as the second main model features.
The remaining features are the initial features other than the first main model features. The feature preprocessing includes removing remaining features whose missing-value ratio is larger than a preset proportion, removing remaining features of character type, performing dummy-variable processing on remaining features of string type, filling missing values in the remaining features with the median, and standardizing the remaining features. The preset proportion is set in advance, for example 20%, and its specific value is not limited here. The standardization may be computed as (x-mean)/std, where x is the value of a remaining feature, mean is the average value of that feature, and std is its standard deviation. The features to be analyzed are the features obtained by performing feature preprocessing on the remaining features and are used in the subsequent principal component analysis. The first number is the preset number of principal components to be selected and can be set according to actual requirements, for example 100. The principal components to be selected are the principal components obtained by performing principal component analysis on the features to be analyzed. The interpretability may be used to represent the correlation between each principal component to be selected and the pure risk premium; the stronger the interpretability, the larger the information content and the greater the benefit for model training and use. The interpretability may be the variance contribution rate. The interpretation threshold is the cumulative interpretability used to pick out the second main model features, for example 0.7, where the cumulative interpretability is obtained by adding the interpretabilities one by one from high to low.
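The numeric part of the feature preprocessing (dropping features with too many missing values, median filling, and the (x-mean)/std standardization) can be sketched as follows. Representing missing values as `None`, using the population standard deviation, and dropping constant columns are assumptions made for this illustration; the character-type and string-type steps are left out.

```python
import statistics
from typing import Dict, List, Optional


def preprocess(features: Dict[str, List[Optional[float]]],
               max_missing: float = 0.2) -> Dict[str, List[float]]:
    """Drop sparse features, fill missing values with the median, then standardize."""
    out: Dict[str, List[float]] = {}
    for name, column in features.items():
        missing = sum(v is None for v in column)
        if missing / len(column) > max_missing:
            continue  # missing-value ratio above the preset proportion: remove the feature
        observed = [v for v in column if v is not None]
        median = statistics.median(observed)
        filled = [median if v is None else v for v in column]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled)  # population standard deviation (assumed choice)
        if std == 0:
            continue  # assumption: a constant column cannot be standardized and is dropped
        out[name] = [(v - mean) / std for v in filled]
    return out
```

After this step every retained feature has mean 0 and standard deviation 1, which keeps features with large numeric ranges from dominating the principal component analysis.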
Specifically, the initial features other than the first main model features are determined to be the remaining features, feature preprocessing is performed on the remaining features, and the features that remain after preprocessing are taken as the features to be analyzed. Principal component analysis is then performed on the features to be analyzed according to the first number to obtain the first number of principal components, namely the principal components to be selected. The variance contribution rate of each principal component to be selected is obtained as its interpretability, and the interpretabilities are accumulated from high to low. When the cumulative interpretability reaches the interpretation threshold (for example, 70%), the principal components to be selected that have been accumulated are considered to contain enough information and are taken as the second main model features.
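The accumulation rule described above can be sketched in a few lines. The sketch assumes the variance contribution rates have already been computed by a principal component analysis; the function name is hypothetical, and the 0.7 default mirrors the example threshold in the text.

```python
from typing import List, Sequence


def select_components(variance_ratios: Sequence[float], threshold: float = 0.7) -> List[int]:
    """Return the indices of the components kept as second main model features.

    Components are taken from the highest variance contribution rate downwards
    until the cumulative rate reaches the interpretation threshold.
    """
    order = sorted(range(len(variance_ratios)),
                   key=lambda i: variance_ratios[i], reverse=True)
    selected: List[int] = []
    cumulative = 0.0
    for i in order:
        selected.append(i)
        cumulative += variance_ratios[i]
        if cumulative >= threshold:
            break  # enough information has been accumulated
    return selected
```

With contribution rates 0.4, 0.25, 0.15, 0.1 and 0.1, the first three components are kept because the cumulative rate first reaches 0.7 at the third one.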
On the basis of the above example, the first main model coefficient to be evaluated corresponding to the first feature may be determined based on the association relationship between the pure risk premium and the first main model feature and the second main model feature by:
s1227, determining a main model coefficient to be selected corresponding to each main model feature according to the association relation between the pure risk premium and the first main model feature and the second main model feature;
s1228, determining each main model coefficient to be selected corresponding to the first feature as a first main model coefficient to be evaluated according to the first feature and the corresponding relation of each main model feature.
The main model coefficients to be selected may be the feature coefficients corresponding to the respective main model features. The main model features include the first main model features and the second main model features.
Specifically, after the association relationship between the pure risk premium and the first and second main model features is obtained by training, the feature coefficient corresponding to each main model feature, namely the main model coefficient to be selected, can be determined. The first feature is then matched against the main model features according to their correspondence, and each main model coefficient to be selected that is successfully matched is determined to be a first main model coefficient to be evaluated.
S130, inputting the vehicle networking data into a pre-trained first auxiliary model to obtain a first auxiliary model result to be converted, and determining a first auxiliary model coefficient to be evaluated according to the first auxiliary model result to be converted and the corresponding relation of the first result coefficient.
The first auxiliary model and the first result coefficient correspondence are determined based on the xgboost algorithm and on the first auxiliary model features and pure risk premium corresponding to each historical commercial vehicle. The first auxiliary model features are features extracted from the internet of vehicles data of each historical commercial vehicle. The first auxiliary model result to be converted may be the output of the first auxiliary model after it processes the internet of vehicles data. The first result coefficient correspondence may be a pre-established correspondence between the output of the first auxiliary model and a result coefficient, and may take the form of a functional relationship or the like. The first auxiliary model coefficient to be evaluated represents the degree to which the internet of vehicles data influence the risk information of the commercial vehicle to be evaluated, and may be the result coefficient obtained from the first result coefficient correspondence.
Specifically, the internet of vehicles data is input into a first auxiliary model which is trained in advance, and the output result of the first auxiliary model is used as a first auxiliary model result to be converted. And carrying out conversion processing on the first auxiliary model result to be converted according to the corresponding relation of the first result coefficient, and determining the result coefficient corresponding to the first auxiliary model result to be converted, namely the first auxiliary model coefficient to be evaluated.
On the basis of the above example, the first auxiliary model is trained based on the following manner:
s131, acquiring pure risk premium corresponding to each historical commercial vehicle and first auxiliary model features;
and S132, training the xgboost model according to each pure risk premium and the first auxiliary model characteristics corresponding to each pure risk premium to obtain a first auxiliary model.
The types of the pure risk premium include business insurance and traffic insurance. Since the risk of both insurance is subsequently assessed, the data of the historical commercial vehicle with both commercial and traffic insurance is selected for subsequent processing.
Specifically, the pure risk premium can be determined from the information record of each historical commercial vehicle, and the first auxiliary model features can be obtained by feature screening, feature extraction and similar processing of the various internet of vehicles data collected on each historical commercial vehicle. The xgboost model is trained with the first auxiliary model features corresponding to each pure risk premium as the independent variables and each pure risk premium as the dependent variable, and the trained xgboost model is used as the first auxiliary model.
Based on the above example, the first auxiliary model features corresponding to each historical commercial vehicle may be obtained by:
S1311, acquiring initial features corresponding to each historical commercial vehicle, and performing feature preprocessing on each initial feature to obtain auxiliary model first primary-selected features;
S1312, performing principal component analysis on the auxiliary model first primary-selected features according to the categories of the internet of vehicles data, and determining the importance degree corresponding to each auxiliary model first primary-selected feature;
S1313, determining a second number of auxiliary model first check features for each category according to the importance degrees of the auxiliary model first primary-selected features;
S1314, training the xgboost model according to the auxiliary model first check features and the pure risk premium, and determining the first auxiliary model features.
The feature preprocessing comprises: removing initial features whose missing-value ratio exceeds a preset proportion, removing initial features of character type, performing dummy-variable processing on initial features of string type, filling missing values in the initial features with the median, and removing initial features whose distance correlation coefficient is smaller than a preset distance coefficient. The auxiliary model first primary-selected features are the initial features that remain after feature preprocessing. The distance correlation coefficient is the distance correlation coefficient between an initial feature and the pure risk premium, and the preset distance coefficient is a value set in advance, for example 0.01; its specific value is not limited. The categories are obtained by classifying the initial features according to a certain rule. For example, the categories may be driving habit, driving technique, commercial vehicle information and surrounding environment, or may be a number of other categories obtained by cluster analysis, and may be set according to actual requirements. The second number is the number of auxiliary model first check features determined in each category, for example 3. The importance degree of an auxiliary model first primary-selected feature is the sum, over all principal components obtained by the principal component analysis, of the product of the feature's principal component coefficient and the variance contribution rate of that principal component. The auxiliary model first check features are the second number of auxiliary model first primary-selected features ranked highest by importance degree in each category.
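The distance-correlation filter in the preprocessing above can be illustrated with the standard sample distance correlation (double-centred pairwise distance matrices). The sketch below is an assumption about how the coefficient might be computed for one-dimensional features; the function names and the dictionary layout of the features are hypothetical.

```python
from math import sqrt


def _centered_distances(values):
    """Double-centred matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row = [sum(r) / n for r in d]  # matrix is symmetric: row mean == column mean
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equally long sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant sequence carries no dependence information
    return sqrt(max(dcov2, 0.0) / sqrt(dvar_x * dvar_y))


def filter_by_distance_correlation(features, target, min_coefficient=0.01):
    """Keep feature names whose distance correlation with the target
    (here: the pure risk premium) is at least min_coefficient."""
    return [name for name, values in features.items()
            if distance_correlation(values, target) >= min_coefficient]
```

The pairwise computation is quadratic in the number of vehicles, so a real pipeline would likely rely on an optimized library routine rather than this pure-Python version.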
Specifically, the internet of vehicles data collected by the equipment on each historical commercial vehicle are used as the initial features. Feature preprocessing is performed on the initial features, and the initial features that remain after preprocessing are taken as the auxiliary model first primary-selected features. The auxiliary model first primary-selected features are divided into categories, and for each category principal component analysis is performed on the auxiliary model first primary-selected features in that category. The principal component analysis yields a number of principal components. Each principal component involves several auxiliary model first primary-selected features together with the principal component coefficient of each of them, and the variance contribution rate of each principal component can be determined. For each auxiliary model first primary-selected feature, its principal component coefficient in each principal component is multiplied by the variance contribution rate of that principal component, and the products are added to obtain the importance degree of the feature. The second number of auxiliary model first primary-selected features ranked highest by importance degree in each category are then taken as the auxiliary model first check features. The xgboost model is developed and trained with the auxiliary model first check features as the independent variables and the pure risk premium as the dependent variable, and the features entering the model that meet a preset condition are taken as the first auxiliary model features.
Illustratively, more than 600 auxiliary model first primary-selected features are divided into 42 categories. The importance degree of each auxiliary model first primary-selected feature is computed by the PCA (Principal Component Analysis) coefficient method: PCA is applied to the features, and the PCA coefficient (importance degree) of a single feature is obtained by multiplying the absolute value of the feature's coefficient in each principal component by the variance contribution rate of that principal component and summing the products. For each category, the three auxiliary model first primary-selected features ranked highest by importance degree are determined to be auxiliary model first check features, so the number of auxiliary model first check features is 42×3=126.
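The PCA coefficient method described above can be sketched for one category as follows. This is an illustrative sketch only: the loadings (principal component coefficients) and variance contribution rates are assumed to come from a PCA run elsewhere, and the function names are hypothetical.

```python
def pca_importance(loadings, variance_ratios):
    """Importance of each feature within one category.

    loadings[k][j] is the coefficient of feature j in principal
    component k; variance_ratios[k] is that component's variance
    contribution rate. Importance = sum_k |loadings[k][j]| * ratio_k.
    """
    n_features = len(loadings[0])
    return [
        sum(abs(component[j]) * ratio
            for component, ratio in zip(loadings, variance_ratios))
        for j in range(n_features)
    ]


def top_k_features(names, importance, k=3):
    """Names of the k features with the highest importance."""
    ranked = sorted(zip(names, importance),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy category with three features and two principal components.
loadings = [[0.8, -0.6, 0.0],
            [0.0, 0.0, 1.0]]
ratios = [0.75, 0.25]
importance = pca_importance(loadings, ratios)
check_features = top_k_features(["a", "b", "c"], importance, k=2)
```

Applying `top_k_features` with k=3 to each of the 42 categories in the example would give the 126 check features mentioned in the text.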
Illustratively, the final modeling features, i.e., the first auxiliary model features, are determined by a feature stability selection method, as follows: i. training the xgboost model with all principal components; ii. in each training run, performing random sampling to determine a training set and a test set, and randomly selecting a group of model parameters within a specified range; iii. after each training run, selecting the secondary principal components whose importance is greater than 0.01; iv. after a preset number of training runs (for example, 1000), counting the selection probability of each secondary principal component, i.e., the ratio of the number of times it was selected to the preset number of runs, and finally keeping the secondary principal components whose selection probability is greater than a preset probability (for example, 0.6) as the first auxiliary model features.
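The counting logic of the stability selection can be sketched as below. The model training itself is deliberately left out: `importance_fn` is a hypothetical stand-in for one training run (random train/test split, randomly drawn parameters, fitted xgboost model) that returns a mapping from candidate name to importance. Only the thresholds 0.01, 1000 and 0.6 come from the text.

```python
import random


def stability_selection(candidates, importance_fn, n_rounds=1000,
                        min_importance=0.01, min_probability=0.6, seed=0):
    """Keep candidates selected in more than min_probability of the rounds.

    importance_fn(rng) stands in for one training run and must return
    a dict {candidate: importance}.
    """
    rng = random.Random(seed)
    counts = {name: 0 for name in candidates}
    for _ in range(n_rounds):
        importances = importance_fn(rng)
        for name in candidates:
            if importances.get(name, 0.0) > min_importance:
                counts[name] += 1
    return [name for name in candidates
            if counts[name] / n_rounds > min_probability]


def toy_importance(rng):
    """Toy stand-in: 'a' is always important, 'b' only occasionally."""
    return {"a": 0.5, "b": 0.02 if rng.random() < 0.3 else 0.0, "c": 0.0}


kept = stability_selection(["a", "b", "c"], toy_importance)
```

In the toy run only "a" survives, since "b" exceeds the importance threshold in roughly 30% of the rounds, well below the 0.6 selection probability.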
On the basis of the above example, the first result coefficient correspondence may be obtained based on the following manner:
s133, inputting the characteristics of each first auxiliary model into the first auxiliary model to obtain each first model result;
s134, sorting and dividing each first model result into first result groups with preset first group numbers according to the size;
s135, determining that the sum of the pure risk premium corresponding to the first result group is a first dependent variable true value and the sum of the first model result corresponding to the first result group is a first dependent variable predicted value according to each first result group, and determining the quotient of the first dependent variable true value and the first dependent variable predicted value as a first coefficient of the first result group;
s136, according to the first coefficient maximum value, the first coefficient minimum value and the first coefficients corresponding to each first result group, the corresponding relation of the first result coefficients is determined through smoothing processing.
Here, a first model result is an output of the first auxiliary model. The preset first group number is the preset number of groups into which the results are divided, for example 20. The first result groups are the groups obtained by grouping the first model results. The first coefficient reflects the residual of a first result group, expressed as the ratio of the true value to the predicted value within the group. The first coefficient maximum value is a preset upper limit of the first coefficient and the first coefficient minimum value is a preset lower limit; their specific values may be determined according to the distribution of the first coefficients.
Specifically, each set of first auxiliary model features is input into the first auxiliary model to obtain a first model result. The first model results are sorted in descending or ascending order, and the sorted results are divided into the preset first group number of groups, each of which is a first result group. It should be noted that the number of first model results in each first result group may be the same or different. For each first result group, the corresponding first coefficient is determined as follows: the sum of the pure risk premiums corresponding to the first model results in the group is taken as the first dependent variable true value, the sum of the first model results in the group is taken as the first dependent variable predicted value, and the true value divided by the predicted value is the first coefficient of the group. The first coefficient of each first result group is kept between the first coefficient minimum value and the first coefficient maximum value: if the first coefficient is larger than the first coefficient maximum value, it is set to the first coefficient maximum value; if it is smaller than the first coefficient minimum value, it is set to the first coefficient minimum value; otherwise it is left unchanged.
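The grouping and coefficient calculation can be sketched as follows. This is a minimal sketch under stated assumptions: the bounds 0.5 and 2.0 are placeholder values (the text leaves them to be chosen from the coefficient distribution), groups are made as equal in size as possible, model results are assumed positive, and the function name is hypothetical.

```python
def group_coefficients(predicted, actual, n_groups=20,
                       coef_min=0.5, coef_max=2.0):
    """Sort by model result, split into n_groups, and compute for each
    group sum(actual) / sum(predicted), clamped to [coef_min, coef_max].

    Returns one (lowest result, highest result, coefficient) per group.
    """
    if len(predicted) < n_groups:
        raise ValueError("need at least one model result per group")
    pairs = sorted(zip(predicted, actual))      # ascending model result
    size, extra = divmod(len(pairs), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        chunk = pairs[start:end]
        start = end
        predicted_sum = sum(p for p, _ in chunk)  # dependent variable predicted value
        true_sum = sum(a for _, a in chunk)       # dependent variable true value
        coefficient = min(max(true_sum / predicted_sum, coef_min), coef_max)
        groups.append((chunk[0][0], chunk[-1][0], coefficient))
    return groups


groups = group_coefficients([3.0, 1.0, 4.0, 2.0], [9.0, 1.0, 9.0, 2.0],
                            n_groups=2)
```

In this toy call the second group has a raw ratio of 18/7, which exceeds the assumed upper bound and is therefore clamped to 2.0.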
After the first coefficients are adjusted, smoothing processing can be performed on each first coefficient according to the first model result distribution and the first coefficients in each first result group, so as to obtain a functional relationship, namely, a first result coefficient corresponding relationship.
Optionally, the first coefficients corresponding to the plurality of first result groups are combined according to the size, and the combining principle may be: 1) The number of first model results in the first result group cannot be less than a preset number (e.g.: 300, etc.); 2) Combining two or more first result groups with similar first coefficients; 3) The first coefficients corresponding to the first result groups after combination should have obvious distinguishability. Further, the first result group after the combination may be used as a new first result group, and the first coefficient obtained again after the combination may be used as a new first coefficient.
By way of example, the smoothing may proceed as follows. A rectangular coordinate system is established with the first model result on the horizontal axis and the first coefficient on the vertical axis, and the piecewise function is plotted. The maximum and minimum first model results in each first result group are determined, and their average is taken as the turning point of that group. The turning points of first result groups that are adjacent in terms of first model result are connected to obtain a continuously changing curve, and the functional relationship of this curve is the first result coefficient correspondence.
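One way to realize the connected turning points is piecewise linear interpolation, sketched below. The text does not say what happens outside the first and last turning points, so holding the coefficient constant there is an assumption, as is the function name.

```python
from bisect import bisect_right


def build_coefficient_curve(groups):
    """groups: (lowest result, highest result, coefficient) per result
    group, ordered by model result. Returns a function mapping a model
    result to a smoothed coefficient."""
    # Turning point of a group: midpoint of its min and max model result.
    xs = [(low + high) / 2 for low, high, _ in groups]
    ys = [coefficient for _, _, coefficient in groups]

    def curve(result):
        if result <= xs[0]:
            return ys[0]           # assumption: constant below first point
        if result >= xs[-1]:
            return ys[-1]          # assumption: constant above last point
        i = bisect_right(xs, result)
        x0, x1 = xs[i - 1], xs[i]
        t = (result - x0) / (x1 - x0)
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return curve


curve = build_coefficient_curve([(0.0, 2.0, 1.0), (2.0, 6.0, 2.0)])
```

Here the turning points lie at model results 1.0 and 4.0, so a model result of 2.5 sits halfway between them and maps to a coefficient of 1.5.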
And S140, inputting the vehicle networking data into a pre-trained second auxiliary model to obtain a second auxiliary model result to be converted, and determining a second auxiliary model coefficient to be evaluated according to the second auxiliary model result to be converted and the second result coefficient corresponding relation.
The second auxiliary model and the second result coefficient correspondence are determined based on a neural network algorithm and on the second auxiliary model features and standard-premium payout rate corresponding to each historical commercial vehicle. The second auxiliary model result to be converted may be the output of the second auxiliary model after it processes the internet of vehicles data. The second auxiliary model coefficient to be evaluated represents the degree to which the internet of vehicles data influence the risk information of the commercial vehicle to be evaluated, and may be the result coefficient obtained from the second result coefficient correspondence. The second auxiliary model features and the first auxiliary model features are features of different dimensions extracted from the internet of vehicles data of each historical commercial vehicle. The second result coefficient correspondence may be a pre-established correspondence between the output of the second auxiliary model and a result coefficient, and may take the form of a functional relationship or the like.
Specifically, the internet of vehicles data are input into a pre-trained second auxiliary model, and the output result of the second auxiliary model is used as a second auxiliary model result to be converted. And converting the second auxiliary model result to be converted according to the second result coefficient corresponding relation, and determining the result coefficient corresponding to the second auxiliary model result to be converted, namely the second auxiliary model coefficient to be evaluated.
On the basis of the above example, the second auxiliary model is trained based on the following manner:
S141, acquiring the standard-premium payout rate and the second auxiliary model features corresponding to each historical commercial vehicle;
S142, training the neural network model according to each standard-premium payout rate and the second auxiliary model features corresponding to each standard-premium payout rate to obtain the second auxiliary model.
The standard-premium payout rate may be one kind of insurance data of each historical commercial vehicle, specifically the number of payouts per unit time, and its types include commercial insurance and traffic insurance. Since the risk of both kinds of insurance is assessed subsequently, the data of historical commercial vehicles that carry both commercial insurance and traffic insurance are selected for subsequent processing. The neural network model may be any of various neural network models, for example a BP (Back Propagation) neural network.
Specifically, the standard-premium payout rate can be determined from the information record of each historical commercial vehicle, and the second auxiliary model features can be obtained by feature screening, feature extraction and similar processing of the various internet of vehicles data collected on each historical commercial vehicle. The neural network model is trained with the second auxiliary model features corresponding to each standard-premium payout rate as the independent variables and each standard-premium payout rate as the dependent variable, and the trained neural network model is used as the second auxiliary model.
Based on the above example, the second auxiliary model features corresponding to each historical commercial vehicle may be obtained by:
S1411, acquiring the initial features and the standard-premium payout rate corresponding to each historical commercial vehicle, and performing feature preprocessing on each initial feature to obtain auxiliary model second primary-selected features;
S1412, training a preset model number of tree models according to the auxiliary model second primary-selected features and the standard-premium payout rates, and determining the auxiliary model second primary-selected features corresponding to each tree model;
s1413, determining second auxiliary model features according to the second initial selection features of the auxiliary models corresponding to the various tree models.
The feature preprocessing comprises: removing initial features whose missing-value ratio exceeds a preset proportion, removing initial features whose standard deviation is zero, removing initial features whose correlation is higher than a preset correlation, removing initial features of character type, performing dummy-variable processing on initial features of string type, and filling missing values in the initial features with the median. Removing initial features whose correlation is higher than the preset correlation means that if the correlation between two initial features is higher than the preset correlation (for example, 0.99), the two features carry similar amounts of information and only one of them needs to be kept. The auxiliary model second primary-selected features are the initial features that remain after feature preprocessing. The preset model number is the preselected number of models used for pre-training, i.e., the number of tree models. The preset model number is more than 2 so as to ensure the pre-training effect. The tree models include at least two of random forest, adaboost and ExtraTree. Of course, tree models other than these three may also be used, for example GBDT (Gradient Boosting Decision Tree) or CART (Classification and Regression Tree).
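The zero-variance and high-correlation filters above can be sketched as follows. The text does not name the correlation measure, so the Pearson coefficient, the use of its absolute value, and the rule of keeping whichever feature of a correlated pair comes first are all assumptions; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_redundant(features, max_correlation=0.99):
    """Remove zero-variance features, then drop any feature whose
    absolute correlation with an already kept feature exceeds
    max_correlation. Returns the kept feature names in input order."""
    kept = {}
    for name, values in features.items():
        if pstdev(values) == 0:
            continue  # standard deviation of zero: no information
        if any(abs(pearson(values, other)) > max_correlation
               for other in kept.values()):
            continue  # nearly duplicates a feature that is already kept
        kept[name] = values
    return list(kept)


kept = drop_redundant({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],   # almost a linear copy of "a"
    "c": [5.0, 5.0, 5.0, 5.0],   # constant
    "d": [4.0, 1.0, 3.0, 2.0],
})
```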
Specifically, the internet of vehicles data collected by the equipment on each historical commercial vehicle are used as the initial features. Feature preprocessing is performed on the initial features, and the initial features that remain after preprocessing are taken as the auxiliary model second primary-selected features. A preset model number of tree models are selected. For each tree model, the model is trained with the auxiliary model second primary-selected features as the independent variables and the standard-premium payout rate as the dependent variable, and the preset feature number (for example, 30) of auxiliary model second primary-selected features ranked highest by importance in the trained tree model are taken as the auxiliary model second primary-selected features corresponding to that tree model. The auxiliary model second primary-selected features corresponding to all tree models are then pooled, and the pooled features are the second auxiliary model features.
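The pooling of the top-ranked features across tree models can be sketched as below. The per-model importances are assumed to come from tree models trained elsewhere, pooling is interpreted here as a union without duplicates, and the function name and the model names in the example are hypothetical.

```python
def pool_top_features(importances_by_model, top_k=30):
    """Union of the top_k most important features of each tree model.

    importances_by_model maps a model name to {feature: importance}.
    The result preserves first-seen order and contains no duplicates.
    """
    pooled = []
    for importances in importances_by_model.values():
        ranked = sorted(importances, key=importances.get, reverse=True)
        for name in ranked[:top_k]:
            if name not in pooled:
                pooled.append(name)
    return pooled


pooled = pool_top_features(
    {
        "random_forest": {"a": 0.5, "b": 0.3, "c": 0.2},
        "extra_trees": {"a": 0.1, "b": 0.2, "c": 0.7},
    },
    top_k=2,
)
```

With two features taken per model, the first model contributes "a" and "b" and the second adds "c", so the pooled set holds three features even though each model nominated only two.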
On the basis of the above example, the second result coefficient correspondence relationship may be obtained based on the following manner:
s143, inputting the characteristics of each second auxiliary model into the second auxiliary model to obtain each second model result;
s144, sorting and dividing each second model result into second result groups with preset second group numbers according to the size;
S145, for each second result group, determining the sum of the standard-premium payout rates corresponding to the second result group as a second dependent variable true value and the sum of the second model results corresponding to the second result group as a second dependent variable predicted value, and determining the quotient of the second dependent variable true value and the second dependent variable predicted value as the second coefficient of the second result group;
s146, according to the maximum value and the minimum value of the second coefficient and the second coefficient corresponding to each second result group, the corresponding relation of the second result coefficient is determined through smoothing processing.
It is to be understood that the specific embodiments of S143 to S146 are similar to those of S133 to S136, and reference is made to the foregoing, and detailed description thereof will be omitted.
And S150, determining target risk information corresponding to the commercial vehicle to be evaluated according to the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated.
The target risk information may be risk information obtained after risk analysis of the commercial vehicle to be evaluated, and the risk information is related information for representing probability of occurrence of risk accidents of the commercial vehicle to be evaluated in an operation process.
Specifically, there may be one or more first main model coefficients to be evaluated, their number being the same as the number of first features, while there is one first auxiliary model coefficient to be evaluated and one second auxiliary model coefficient to be evaluated. The first main model coefficients to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated can be integrated in a preset manner, and the integrated value is used as the target risk information corresponding to the commercial vehicle to be evaluated. The preset manner may include simple arithmetic such as multiplication and addition, more complex function calculations, calculation by a pre-trained model, and the like.
Optionally, in order to prevent the first to-be-evaluated main model coefficient, the first to-be-evaluated auxiliary model coefficient and the second to-be-evaluated auxiliary model coefficient from being excessively large in value and affecting calculation of the target risk information, normalization processing or limiting the calculation to be within a preset range may be performed on the target risk information.
On the basis of the above example, the target risk information corresponding to the commercial vehicle under evaluation may be determined from the first main model coefficient under evaluation, the first auxiliary model coefficient under evaluation, and the second auxiliary model coefficient under evaluation by:
s151, taking the product of the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated as a target risk coefficient corresponding to the commercial vehicle to be evaluated;
and S152, determining target risk information corresponding to the commercial vehicle to be evaluated according to the target risk coefficient and the predetermined coefficient information corresponding relation.
The target risk coefficient may be the product of each first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated. The coefficient information correspondence may be a functional relationship that converts a risk coefficient into risk information, for example a correspondence that converts the risk coefficient into risk information on a hundred-point scale.
Specifically, the target risk coefficient is obtained by multiplying together each first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated. The target risk coefficient is then converted according to the predetermined coefficient information correspondence to obtain the target risk information corresponding to the commercial vehicle to be evaluated.
For example, the risk coefficient is mapped to risk information in the range 40 to 100, and the coefficient information correspondence may be set so that the highest risk coefficient corresponds to 40 points and the lowest risk coefficient corresponds to 100 points. The coefficient information correspondence may be, for example, a linear function.
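The product in S151 and a linear coefficient information correspondence as in the example can be sketched as follows. The bounds `coef_min` and `coef_max` stand for the lowest and highest risk coefficient and are hypothetical parameters, as are the function names; clipping coefficients that fall outside the bounds is an assumption in line with the optional range limiting mentioned earlier.

```python
from math import prod


def target_risk_coefficient(main_coefficients, first_aux, second_aux):
    """Product of all first main model coefficients and the two
    auxiliary model coefficients."""
    return prod(main_coefficients) * first_aux * second_aux


def risk_score(coefficient, coef_min, coef_max,
               score_min=40.0, score_max=100.0):
    """Linear map: lowest coefficient -> score_max (100 points),
    highest coefficient -> score_min (40 points)."""
    clipped = min(max(coefficient, coef_min), coef_max)
    share = (clipped - coef_min) / (coef_max - coef_min)
    return score_max - share * (score_max - score_min)


coefficient = target_risk_coefficient([1.5, 0.5], first_aux=2.0, second_aux=1.0)
score = risk_score(coefficient, coef_min=0.5, coef_max=2.5)
```

In this toy case the coefficient is 1.5, halfway between the assumed bounds, so the score lands at 70 points; a higher coefficient (higher risk) would give a lower score.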
The technical solution of this embodiment has the following advantages. The internet of vehicles data of the commercial vehicle to be evaluated are acquired, so that multidimensional internet of vehicles data are available for the subsequent determination of the commercial vehicle's risk information. The first feature is determined based on the internet of vehicles data, and the first main model coefficient to be evaluated corresponding to the first feature is determined based on the association relationship between the pure risk premium and the first and second main model features, which yields the main relevant information in the risk information of the commercial vehicle. The internet of vehicles data are input into the pre-trained first auxiliary model to obtain the first auxiliary model result to be converted, and the first auxiliary model coefficient to be evaluated is determined according to that result and the first result coefficient correspondence. The internet of vehicles data are likewise input into the pre-trained second auxiliary model to obtain the second auxiliary model result to be converted, and the second auxiliary model coefficient to be evaluated is determined according to that result and the second result coefficient correspondence, which yields the other relevant information in the risk information of the commercial vehicle. Combining several models improves accuracy and robustness. The target risk information corresponding to the commercial vehicle to be evaluated is determined from the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated, so that the internet of vehicles data are fully used and the accuracy of the risk evaluation is improved.
Where the commercial vehicle to be evaluated has no equipment record (internet of vehicles data), or where equipment has only recently been installed and the number of driving days recorded is not yet sufficient for risk analysis, the risk analysis can be performed using relatively fixed vehicle attribute data. Fig. 2 is a flowchart of another method for analyzing risk of a commercial vehicle according to an embodiment of the present invention. Referring to fig. 2, the method specifically includes:
s210, acquiring vehicle attribute data of the commercial vehicle to be evaluated.
S220, determining second characteristics associated with risk information of the commercial vehicle to be evaluated based on vehicle attribute data, and determining a second main model coefficient to be evaluated corresponding to the second characteristics based on association relation between the risk information and the attribute characteristics.
S230, determining target risk information corresponding to the commercial vehicle to be evaluated according to the second main model coefficient to be evaluated.
The second main model coefficient to be evaluated represents the influence degree of the second feature on the risk information of the commercial vehicle to be evaluated, and the association relation is determined based on the generalized linear model.
S210 to S220 are similar to S110 to S120, with the following specific difference: the vehicle attribute data describes basic attributes of the commercial vehicle to be evaluated, can be obtained without additionally installed equipment, and may include at least one of license plate attribution, vehicle age, vehicle type, vehicle brand, and energy type.
On the basis of the above example, the association relationship between the risk information and the attribute features is obtained based on the following manner:
S221, acquiring historical risk information and attribute features corresponding to each historical commercial vehicle;
S222, training the generalized linear model according to each attribute feature and the historical risk information corresponding to each attribute feature to obtain the association relationship between the risk information and the attribute features.
The attribute features are features obtained by applying feature processing to the attribute information of the historical commercial vehicles, i.e. the historical attributes. The historical risk information may be calculated from Internet of Vehicles data, for example through S110 to S150.
Specifically, the historical risk information corresponding to each historical commercial vehicle can be determined from the vehicle's information records (such as Internet of Vehicles information). Feature extraction, feature screening and similar processing are then performed on the vehicle attribute data of each historical commercial vehicle to determine the attribute features. With the attribute features as independent variables and the historical risk information as the dependent variable, a generalized linear model with a Tweedie distribution (GLM-Tweedie) is developed and trained, and the association relationship between the risk information and the attribute features is determined from the model that meets the preset conditions.
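To make the role of the fitted coefficients concrete, the following minimal Python sketch covers only a simplified special case and is not the patent's training procedure. It assumes a log link, a single categorical attribute feature and equal weight per vehicle. Under those assumptions the fitted mean of each feature level in a Tweedie GLM equals the average observed value in that level, so the multiplicative coefficient of a level relative to a base level can be computed directly. A model with several features would need an iterative GLM solver instead. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def single_factor_coefficients(levels, risk_values, base_level):
    """Multiplicative coefficient of each level relative to base_level.

    Simplified case: one categorical feature, log link, equal weights.
    The fitted mean per level is then the plain average of the observed
    risk values in that level.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, value in zip(levels, risk_values):
        totals[level] += value
        counts[level] += 1
    means = {level: totals[level] / counts[level] for level in totals}
    base_mean = means[base_level]  # assumes base_mean > 0
    return {level: mean / base_mean for level, mean in means.items()}


# Hypothetical history: energy type and observed risk value per vehicle.
energy_types = ["diesel", "diesel", "electric", "electric"]
observed_risk = [100.0, 300.0, 50.0, 250.0]
coefficients = single_factor_coefficients(energy_types, observed_risk, "diesel")
```

A coefficient above 1 marks a level that is riskier than the base level, and a coefficient below 1 marks a safer one.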
On the basis of the above example, the attribute characteristics of each history commercial vehicle can be obtained by:
S2221, acquiring the historical attributes corresponding to each historical commercial vehicle, and performing feature screening on the historical attributes to obtain initial features;
S2222, for each initial feature, dividing the initial feature into a preset number of feature groups to obtain attribute features.
Wherein, the feature screening comprises information contribution screening, interpretability screening, time consistency screening, monotonicity screening and relevance screening.
Specifically, feature screening is performed on the historical attributes corresponding to each historical commercial vehicle, and the historical attributes retained after screening are used as initial features. Each initial feature is then analyzed: if it is a continuous feature (such as vehicle age), it is divided into the preset number of feature groups so that the groups exhibit a trend. The features obtained after division, together with the initial features that were not divided, are used as attribute features.
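The grouping of a continuous feature can be sketched as follows. The source does not state how group boundaries are chosen, so this example assumes equal-frequency binning; the function names and the sample vehicle ages are illustrative only.

```python
import bisect


def equal_frequency_edges(values, n_groups):
    """Return n_groups - 1 cut points that split the sorted values into
    groups of nearly equal size (an assumed binning rule)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[g * n // n_groups] for g in range(1, n_groups)]


def assign_group(value, edges):
    """Index of the group a value falls into; a value equal to a cut
    point belongs to the higher group."""
    return bisect.bisect_right(edges, value)


# Hypothetical vehicle ages in years, split into three groups.
vehicle_ages = [1, 2, 3, 4, 5, 6, 7, 8, 9]
edges = equal_frequency_edges(vehicle_ages, 3)
groups = [assign_group(age, edges) for age in vehicle_ages]
```

In practice the resulting groups would still be checked for the trend mentioned above, for example that average risk changes monotonically from one age group to the next.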
On the basis of the above example, the second main model coefficient to be evaluated corresponding to the second feature may be determined based on the association relationship between the risk information and the attribute feature by:
S2223, determining a main model coefficient to be determined corresponding to each attribute feature according to the association relation between the risk information and the attribute feature;
S2224, determining each main model coefficient to be determined corresponding to the second feature as a second main model coefficient to be evaluated according to the vehicle attribute data and the corresponding relation of each attribute feature.
S2223 to S2224 are similar to S1228 to S1229, and reference is made to the above description, and a detailed description thereof will be omitted.
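Steps S2223 and S2224 amount to a table lookup, which the following minimal Python sketch illustrates. The nested-dictionary layout of the coefficient table, the attribute names, and the neutral coefficient of 1.0 for unseen attribute values are all assumptions made for the example and are not taken from the source.

```python
def select_coefficients(vehicle_attributes, coefficient_table):
    """Pick, for each modelled attribute of one vehicle, the coefficient
    that the trained model assigns to the vehicle's attribute value.

    coefficient_table maps feature name -> {feature value: coefficient}.
    """
    selected = {}
    for feature, value in vehicle_attributes.items():
        levels = coefficient_table.get(feature)
        if levels is None:
            continue  # attribute is not a feature of the model
        # Assumption: an unseen value gets the neutral coefficient 1.0.
        selected[feature] = levels.get(value, 1.0)
    return selected


# Hypothetical coefficients derived from a trained model.
table = {
    "energy_type": {"diesel": 1.0, "electric": 0.8},
    "vehicle_type": {"tractor": 1.3, "van": 0.9},
}
vehicle = {"energy_type": "electric", "vehicle_type": "tractor", "colour": "red"}
second_main_model_coefficients = select_coefficients(vehicle, table)
```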
On the basis of the above example, the target risk information corresponding to the commercial vehicle under evaluation may be determined from the second main model coefficient under evaluation by:
S231, taking the product of the second main model coefficients to be evaluated as a risk coefficient corresponding to the commercial vehicle to be evaluated;
S232, determining target risk information corresponding to the commercial vehicle to be evaluated according to the risk coefficient and the predetermined coefficient information corresponding relation.
S232 is similar to S152, and reference is made to the above description, and will not be repeated here.
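Steps S231 and S232 can be sketched in a few lines of Python. The product follows the text directly. The coefficient-information correspondence is modelled here as a set of thresholds with risk labels, which is only one possible form: the thresholds, the labels and the function names are hypothetical.

```python
import bisect
import math


def risk_coefficient(coefficients):
    """S231: the risk coefficient is the product of the coefficients."""
    return math.prod(coefficients)


def target_risk_information(coefficient, thresholds, labels):
    """S232: map the risk coefficient to risk information.

    thresholds must be sorted in ascending order and labels must have
    exactly one more entry than thresholds.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("labels must have len(thresholds) + 1 entries")
    return labels[bisect.bisect_right(thresholds, coefficient)]


# Hypothetical coefficients and correspondence.
coefficient = risk_coefficient([0.8, 1.3, 1.1])
level = target_risk_information(coefficient, [0.9, 1.2], ["low", "medium", "high"])
```

Because the coefficients are multiplicative, a value of 1.0 leaves the result unchanged, so a neutral attribute does not affect the final risk coefficient.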
The advantages of this embodiment are as follows. By acquiring the vehicle attribute data of the commercial vehicle to be evaluated, multidimensional vehicle attribute data can be used for the subsequent determination of the vehicle's risk information when Internet of Vehicles data cannot be obtained. The second feature is determined based on the vehicle attribute data, and the second main model coefficient to be evaluated corresponding to the second feature is determined based on the association relationship between the risk information and the attribute features, which yields the relevant information in the risk information of the commercial vehicle. The target risk information corresponding to the commercial vehicle to be evaluated is then determined according to the second main model coefficient to be evaluated. In this way multiple data dimensions are fully considered even without Internet of Vehicles data, and the accuracy of the commercial vehicle risk assessment is improved.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application. As used in the specification and in the claims, the terms "a," "an," "the," and/or "said" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, or apparatus. Without further limitation, an element defined by the phrase "comprising a ……" does not exclude the presence of other like elements in a process, method or apparatus comprising such elements.
It should also be noted that the orientation or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the orientation or positional relationships shown in the drawings. They are used merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or element in question must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Unless specifically stated or limited otherwise, the terms "mounted," "connected," and the like are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting it. Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for analyzing risk of a commercial vehicle, comprising:
acquiring internet of vehicles data of a commercial vehicle to be evaluated;
determining a first feature associated with risk information of the commercial vehicle to be evaluated based on the internet of vehicles data, and determining a first main model coefficient to be evaluated corresponding to the first feature based on an association relation between pure risk premium and first main model features and second main model features, wherein the first main model coefficient to be evaluated represents the influence degree of the first feature on the risk information of the commercial vehicle to be evaluated, and the association relation is determined based on a generalized linear model; the first main model features and the second main model features are extracted based on internet of vehicles data of each historical commercial vehicle in the following manner: determining the internet of vehicles data of each historical commercial vehicle as initial features corresponding to each historical commercial vehicle, and performing feature screening on each initial feature to obtain primary selected main model features; for each primary selected main model feature, dividing the primary selected main model feature into a preset number of feature groups to obtain extended main model features; training a generalized linear model according to the extended main model features and the pure risk premium, and determining the extended main model features meeting preset conditions as first main model features, wherein the preset conditions comprise that the p value is smaller than a preset value and the model converges; performing feature preprocessing on the remaining features to obtain features to be analyzed, wherein the remaining features are initial features other than the first main model features; performing principal component analysis on the features to be analyzed to obtain a first number of principal components to be selected, and determining the interpretability corresponding to each principal component to be selected; and determining, according to each interpretability, the principal components to be selected that meet an interpretation threshold as second main model features;
inputting the internet of vehicles data into a pre-trained first auxiliary model to obtain a first auxiliary model result to be converted, and determining a first auxiliary model coefficient to be evaluated according to the first auxiliary model result to be converted and a first result coefficient corresponding relation; the first auxiliary model is determined based on an xgboost algorithm, first auxiliary model features corresponding to each historical commercial vehicle and pure risk premium, and the first auxiliary model coefficient to be evaluated represents the influence degree of the internet of vehicles data on risk information of the commercial vehicle to be evaluated; the first auxiliary model features are features extracted based on internet of vehicles data of each historical commercial vehicle; the first result coefficient corresponding relation is a pre-established corresponding relation between an output result of the first auxiliary model and a result coefficient, and is obtained in the following manner: inputting each first auxiliary model feature into the first auxiliary model to obtain each first model result; sorting the first model results by size and dividing them into a preset first number of first result groups; for each first result group, determining that the sum of the pure risk premium corresponding to the first result group is a first dependent variable true value and the sum of the first model results corresponding to the first result group is a first dependent variable predicted value, and determining that the quotient of the first dependent variable true value and the first dependent variable predicted value is a first coefficient of the first result group; determining the first result coefficient corresponding relation through smoothing processing according to the first coefficient maximum value, the first coefficient minimum value and the first coefficients corresponding to the first result groups;
inputting the internet of vehicles data into a pre-trained second auxiliary model to obtain a second auxiliary model result to be converted, and determining a second auxiliary model coefficient to be evaluated according to the second auxiliary model result to be converted and a second result coefficient corresponding relation; the second auxiliary model is determined based on a neural network algorithm, second auxiliary model features corresponding to each historical commercial vehicle and standard-premium loss ratios, and the second auxiliary model coefficient to be evaluated represents the influence degree of the internet of vehicles data on risk information of the commercial vehicle to be evaluated; the second result coefficient corresponding relation is a pre-established corresponding relation between the output result of the second auxiliary model and the result coefficient; the second auxiliary model features and the first auxiliary model features are features of different dimensions extracted based on internet of vehicles data of each historical commercial vehicle;
and determining target risk information corresponding to the commercial vehicle to be evaluated according to the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated.
2. The method of claim 1, wherein the association between the pure risk premium and the first and second master model features is based on:
acquiring the pure risk premium corresponding to each historical commercial vehicle; wherein the types of the pure risk premium include commercial insurance and compulsory traffic insurance;
extracting first main model features and second main model features corresponding to the historical commercial vehicles respectively based on the Internet of vehicles data of the historical commercial vehicles;
training a generalized linear model according to the pure risk premium and the first main model characteristic and the second main model characteristic corresponding to the pure risk premium, and updating the first main model characteristic and the second main model characteristic to obtain the association relation between the pure risk premium and the first main model characteristic and the second main model characteristic.
3. The method of claim 1, wherein said feature screening of each of said initial features comprises at least one of information contribution screening, interpretability screening, temporal consistency screening, and correlation screening; and the feature preprocessing of the remaining features comprises removing remaining features whose missing value ratio is larger than a preset proportion, removing remaining features of character type, performing dummy variable processing on remaining features of character string type, performing median filling on the missing values in the remaining features, and standardizing the remaining features.
4. The method according to claim 2, wherein the determining the first to-be-evaluated master model coefficient corresponding to the first feature based on the association relationship between the pure risk premium and the first and second master model features includes:
determining a main model coefficient to be selected corresponding to each main model feature according to the association relation between the pure risk premium and the first main model feature and the second main model feature; wherein the main model features include the first main model feature and the second main model feature;
and determining each main model coefficient to be selected corresponding to the first feature as a first main model coefficient to be evaluated according to the first feature and the corresponding relation between each main model feature and the main model coefficient to be selected.
5. The method of claim 1, wherein the first auxiliary model is trained based on:
acquiring pure risk premium and first auxiliary model features corresponding to each historical commercial vehicle; wherein the types of the pure risk premium include commercial insurance and compulsory traffic insurance;
and training the xgboost model according to the pure risk premium and the first auxiliary model characteristics corresponding to the pure risk premium to obtain a first auxiliary model.
6. The method of claim 5, wherein obtaining the corresponding first auxiliary model feature for each historical commercial vehicle comprises:
acquiring initial features corresponding to each historical commercial vehicle, and performing feature preprocessing on each initial feature to obtain auxiliary model first primary selected features; wherein the feature preprocessing of the initial features comprises removing initial features whose missing value ratio is larger than a preset proportion, removing initial features of character type, performing dummy variable processing on initial features of character string type, performing median filling on missing values in the initial features, and removing initial features whose distance correlation coefficient is smaller than a preset distance coefficient;
performing principal component analysis on the auxiliary model first primary selected features according to the category of the internet of vehicles data, and determining the importance degree corresponding to each auxiliary model first primary selected feature; wherein the importance degree is the sum, over all principal components obtained through the principal component analysis, of the products of the principal component coefficient corresponding to the auxiliary model first primary selected feature and the principal component variance contribution rate;
determining a second number of auxiliary model first check features corresponding to each category according to the importance degree of each auxiliary model first primary selected feature;
and training the xgboost model according to each auxiliary model first check feature and the pure risk premium, and determining the first auxiliary model features.
7. The method of claim 1, wherein the second auxiliary model is trained based on:
acquiring the standard-premium loss ratio and the second auxiliary model features corresponding to each historical commercial vehicle; wherein the types of the standard-premium loss ratio include commercial insurance and compulsory traffic insurance;
training the neural network model according to each standard-premium loss ratio and the second auxiliary model features corresponding to each standard-premium loss ratio to obtain the second auxiliary model;
correspondingly, the second result coefficient corresponding relation is obtained based on the following mode:
inputting the characteristics of each second auxiliary model into the second auxiliary model to obtain each second model result;
sorting and dividing each second model result into second result groups with preset second group numbers according to the size;
for each second result group, determining that the sum of the standard-premium loss ratios corresponding to the second result group is a second dependent variable true value and the sum of the second model results corresponding to the second result group is a second dependent variable predicted value, and determining that the quotient of the second dependent variable true value and the second dependent variable predicted value is a second coefficient of the second result group;
and determining the second result coefficient corresponding relation through smoothing processing according to the second coefficient maximum value, the second coefficient minimum value and the second coefficients corresponding to the second result groups.
8. The method of claim 7, wherein obtaining the corresponding second auxiliary model feature for each historical commercial vehicle comprises:
acquiring initial features and the standard-premium loss ratio corresponding to each historical commercial vehicle, and performing feature preprocessing on each initial feature to obtain auxiliary model second primary selected features; wherein the feature preprocessing of the initial features comprises removing initial features whose missing value ratio is larger than a preset proportion, removing initial features with a standard deviation of zero, removing initial features whose correlation is higher than a preset correlation, removing initial features of character type, performing dummy variable processing on initial features of character string type, and performing median filling on missing values in the initial features;
training a preset number of tree models according to the auxiliary model second primary selected features and the standard-premium loss ratio, and determining the auxiliary model second primary selected features corresponding to each tree model; wherein the preset number of models is more than 2; the tree models comprise at least two of a random forest, an Adaboost and an ExtraTree;
and determining the second auxiliary model features according to the auxiliary model second primary selected features corresponding to the tree models.
9. The method of claim 1, wherein the determining target risk information corresponding to the commercial vehicle to be evaluated according to the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated, and the second auxiliary model coefficient to be evaluated comprises:
taking the product of the first main model coefficient to be evaluated, the first auxiliary model coefficient to be evaluated and the second auxiliary model coefficient to be evaluated as a target risk coefficient corresponding to the commercial vehicle to be evaluated;
and determining target risk information corresponding to the commercial vehicle to be evaluated according to the target risk coefficient and a predetermined coefficient information corresponding relation.
10. The method as recited in claim 1, further comprising:
acquiring vehicle attribute data of a commercial vehicle to be evaluated under the condition that the vehicle networking data of the commercial vehicle to be evaluated cannot be acquired;
determining a second feature associated with risk information of the commercial vehicle to be evaluated based on the vehicle attribute data, determining a second main model coefficient to be evaluated corresponding to the second feature based on an association relation between the risk information and the attribute feature, wherein the second main model coefficient to be evaluated represents the influence degree of the second feature on the risk information of the commercial vehicle to be evaluated, and the association relation is determined based on a generalized linear model;
and determining target risk information corresponding to the commercial vehicle to be evaluated according to the second main model coefficient to be evaluated.
CN202310375492.1A 2023-04-11 2023-04-11 Commercial vehicle risk analysis method Active CN116091254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310375492.1A CN116091254B (en) 2023-04-11 2023-04-11 Commercial vehicle risk analysis method

Publications (2)

Publication Number Publication Date
CN116091254A CN116091254A (en) 2023-05-09
CN116091254B true CN116091254B (en) 2023-08-01

Family

ID=86188210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310375492.1A Active CN116091254B (en) 2023-04-11 2023-04-11 Commercial vehicle risk analysis method

Country Status (1)

Country Link
CN (1) CN116091254B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109064002A (en) * 2018-07-26 2018-12-21 阿里巴巴集团控股有限公司 Vehicle risk appraisal procedure, device and equipment
CN109377398A (en) * 2018-11-19 2019-02-22 北京金州世纪信息技术有限公司 The classification risk method and device of UBI insurance
CN109544351A (en) * 2018-10-12 2019-03-29 平安科技(深圳)有限公司 Vehicle risk appraisal procedure, device, computer equipment and storage medium
CN112801393A (en) * 2021-02-05 2021-05-14 中国银行保险信息技术管理有限公司 Transfer factor-based vehicle insurance risk prediction method and device and storage medium
CN114997458A (en) * 2022-04-01 2022-09-02 上海评驾科技有限公司 Vehicle insurance claim rate prediction method based on principal component analysis and linear regression
CN115510990A (en) * 2022-10-10 2022-12-23 上海汽车集团股份有限公司 Model training method and related device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210110480A1 (en) * 2019-10-13 2021-04-15 TrueLite Trace, Inc. Intelligent machine sensing and machine learning-based commercial vehicle insurance risk scoring system
CN112862621A (en) * 2021-01-15 2021-05-28 广州亚美信息科技有限公司 Vehicle insurance risk assessment method and device and computer equipment
CN114493895A (en) * 2021-12-30 2022-05-13 武汉东湖大数据交易中心股份有限公司 Method, system, equipment and storage medium for vehicle insurance claim payment rate pricing
CN115578205A (en) * 2022-09-22 2023-01-06 上海七炅信息科技有限公司 Vehicle insurance pure risk premium prediction method and device based on GLM and machine learning algorithm

Also Published As

Publication number Publication date
CN116091254A (en) 2023-05-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant