CN115293808A - Construction method of vehicle insurance cost prediction model based on artificial intelligence and related equipment

Info

Publication number
CN115293808A
Authority
CN
China
Prior art keywords
cost
data
factor
factors
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210951208.6A
Other languages
Chinese (zh)
Inventor
喻芳
柴松举
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202210951208.6A
Publication of CN115293808A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

The application provides a construction method and device for an artificial intelligence-based vehicle insurance cost prediction model, an electronic device and a storage medium. The construction method comprises the following steps: storing all historical cost data to construct a cost data set, the historical cost data including the actual cost and the factor values of all category type factors and numerical type factors; constructing a cost decision tree based on the actual cost in the cost data set and all the category type factors; dividing all historical cost data into a plurality of cost data subsets based on the cost decision tree; fitting a cost sub-model for each cost data subset based on its actual costs and all numerical type factors; combining the cost decision tree and all the cost sub-models into a vehicle insurance cost prediction model; and acquiring target vehicle insurance data and obtaining a cost prediction result for the target vehicle insurance data based on the vehicle insurance cost prediction model. By jointly considering all category type factors and numerical type factors, the method and the device improve the accuracy of vehicle insurance cost prediction.

Description

Construction method of vehicle insurance cost prediction model based on artificial intelligence and related equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a construction method and device of a vehicle insurance cost prediction model based on artificial intelligence, electronic equipment and a storage medium.
Background
Against the background of the comprehensive reform of vehicle insurance, accurate monitoring of vehicle insurance cost has become an important means of ensuring the stable development of financial insurance. Operating strategies determined through cost monitoring are an important reference for financial insurance companies, and vehicle insurance cost prediction is the key to accurate monitoring of vehicle insurance cost.
At present, a vehicle insurance cost prediction result is generally obtained either by directly computing a weighted sum of the values of different factors or by means of a neural network. However, these approaches do not consider the factors at a fine enough granularity and ignore the logical relationships among different dimensions, so the accuracy of vehicle insurance cost prediction is low and cannot meet the fine-grained requirements of daily operation management.
Disclosure of Invention
In view of the above, there is a need to provide a method for constructing an artificial intelligence-based vehicle insurance cost prediction model and a related device, so as to solve the technical problem of how to improve the accuracy of vehicle insurance cost prediction, wherein the related device includes a construction apparatus of an artificial intelligence-based vehicle insurance cost prediction model, an electronic device and a storage medium.
The application provides a construction method of an artificial intelligence-based vehicle insurance cost prediction model, which comprises the following steps:
storing all historical cost data to construct a cost data set, wherein the historical cost data comprises the actual cost and the factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
constructing a cost decision tree based on the actual cost of each piece of historical cost data in the cost data set and the factor values of all category factors, wherein the cost decision tree comprises a plurality of leaf nodes;
classifying all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets, wherein the cost data subsets correspond to leaf nodes of the cost decision tree one to one;
fitting a cost sub-model for each leaf node based on the actual cost of each piece of historical cost data in the cost data subset and the factor values of all numerical type factors;
combining the cost decision tree and the cost sub-model of each leaf node together to form the car insurance cost prediction model;
and acquiring target vehicle insurance data, and acquiring a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, wherein the target vehicle insurance data comprises factor values of all preset factors.
In some embodiments, storing all historical cost data to construct the cost data set, wherein the historical cost data comprises the actual cost and the factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors, includes:
acquiring all preset factors influencing the vehicle insurance cost, and classifying each preset factor based on its value range to obtain a classification result, wherein the classification result is either a category type factor or a numerical type factor;
if the value range of a preset factor is continuous, the classification result is a numerical type factor, and the numerical type factors comprise a traffic violation coefficient, a no-claim discount coefficient, a channel preferential coefficient and an autonomous underwriting coefficient;
if the value range of a preset factor is discrete, the classification result is a category type factor, and the category type factors comprise a preset vehicle category, an insurance type, a vehicle use and a vehicle model;
acquiring, from each historical vehicle insurance policy in the historical vehicle insurance data, the factor values of all preset factors and the corresponding actual cost as one piece of historical cost data;
storing all historical cost data to construct the cost data set.
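The classification and storage steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `PresetFactor` class, the function names, the English factor identifiers and the sample policies are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PresetFactor:
    name: str          # hypothetical English identifier for the factor
    continuous: bool   # True if the value range is continuous


def classify_factors(factors):
    """Split preset factors into (category type, numerical type) by value range."""
    category = [f.name for f in factors if not f.continuous]
    numerical = [f.name for f in factors if f.continuous]
    return category, numerical


def build_cost_data_set(policies, factors):
    """Keep only the actual cost and the preset factor values of each policy."""
    names = [f.name for f in factors]
    return [
        {"actual_cost": p["actual_cost"], **{n: p[n] for n in names}}
        for p in policies
    ]


# Illustrative factors and policies (made-up values).
FACTORS = [
    PresetFactor("traffic_violation_coefficient", True),
    PresetFactor("no_claim_discount_coefficient", True),
    PresetFactor("vehicle_use", False),
    PresetFactor("insurance_type", False),
]
policies = [
    {"policy_id": "P1", "actual_cost": 3,
     "traffic_violation_coefficient": 1.0, "no_claim_discount_coefficient": 0.8,
     "vehicle_use": "household", "insurance_type": "joint"},
    {"policy_id": "P2", "actual_cost": 5,
     "traffic_violation_coefficient": 1.2, "no_claim_discount_coefficient": 1.0,
     "vehicle_use": "commercial", "insurance_type": "compulsory_only"},
]

category_factors, numerical_factors = classify_factors(FACTORS)
cost_data_set = build_cost_data_set(policies, FACTORS)
```

Each entry of `cost_data_set` corresponds to one piece of historical cost data; fields that are not preset factors (here `policy_id`) are dropped.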
In some embodiments, said constructing a cost decision tree based on the actual cost of each piece of historical cost data in said set of cost data and the factor values for all category factors, said cost decision tree comprising a plurality of leaf nodes, comprises:
A1, acquiring the actual cost and the factor values of all category type factors of each piece of historical cost data in the cost data set to construct a category type factor data set;
A2, calculating the information gain of each category type factor based on the category type factor data set;
A3, selecting the category type factor with the maximum information gain as a target factor, and taking the target factor as node information to obtain a newly added node;
A4, drawing a preset number of directed edges with the newly added node as the starting point, wherein the preset number equals the number of factor values in the value range of the target factor, and the directed edges correspond to the factor values of the target factor one to one;
A5, screening all historical cost data in the category type factor data set based on the factor value corresponding to each directed edge to obtain a data subset of each directed edge;
A6, calculating the information entropy of the actual cost in the data subset of a target directed edge as a target cost entropy, wherein the target directed edge is any one of the directed edges; if the target cost entropy is smaller than a preset entropy value, generating an empty node as the end point of the target directed edge; if the target cost entropy is not smaller than the preset entropy value, deleting all factor values of the target factor from the category type factor data set to obtain an updated category type factor data set, executing steps A2 to A3 again to obtain a non-empty newly added node, and taking that newly added node as the end point of the target directed edge;
A7, traversing all directed edges to obtain the end point of each directed edge; if the end points of all directed edges are empty nodes, the construction of the cost decision tree is complete; if the end point of at least one directed edge is a non-empty newly added node, executing steps A4 to A7 for each non-empty newly added node until the construction of the cost decision tree is complete.
In some embodiments, said calculating an information gain for each category factor based on said set of category factor data comprises:
calculating the information entropy of the actual cost in the category factor data set, wherein the information entropy of the actual cost satisfies the relation:
$$E(X) = -\sum_{i=\min(X)}^{\max(X)} \frac{N_i(X)}{N(X)} \log \frac{N_i(X)}{N(X)}$$
wherein $X$ represents the category type factor data set, $\min(X)$ is the minimum value of the actual cost in $X$, $\max(X)$ is the maximum value of the actual cost in $X$, $N(X)$ is the number of all actual costs in $X$, $N_i(X)$ is the number of actual costs in $X$ whose value equals $i$, and $E(X)$ is the information entropy of the actual cost in $X$;
selecting, from the category type factor data set, all historical cost data that share the same factor value of a target category type factor to obtain a data subset for each factor value of the target category type factor, wherein the target category type factor is any one of the category type factors;
calculating the information gain of the target category type factor based on the information entropy of the actual cost in the category type factor data set and in the data subset of each factor value of the target category type factor, the information gain satisfying the relation:
$$\mathrm{Gain}(M) = E(X) - \sum_{k=1}^{v} \frac{N(M_k)}{N(X)} E(M_k)$$
wherein $X$ is the category type factor data set, $E(X)$ is the information entropy of the actual cost in $X$, $v$ is the number of factor values in the value range of the target category type factor $M$, $M_k$ is the data subset of the $k$-th factor value of $M$, $E(M_k)$ is the information entropy of the actual cost in $M_k$, $N(X)$ is the number of all actual costs in $X$, $N(M_k)$ is the number of all actual costs in $M_k$, and $\mathrm{Gain}(M)$ is the information gain of the target category type factor $M$;
traversing all the category factors obtains the information gain of each category factor.
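A minimal sketch of the entropy and information gain calculation described above. It assumes that actual costs are discretized (for example, integer cost bands) and that the logarithm is base 2; neither detail is stated in the text, and the record layout and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(costs):
    """Information entropy of a list of discretized actual costs (base 2 assumed)."""
    n = len(costs)
    return -sum((c / n) * log2(c / n) for c in Counter(costs).values())


def information_gain(records, factor):
    """Entropy of all costs minus the size-weighted entropy of each factor-value subset."""
    costs = [r["actual_cost"] for r in records]
    subsets = defaultdict(list)
    for r in records:
        subsets[r[factor]].append(r["actual_cost"])
    weighted = sum(len(s) / len(costs) * entropy(s) for s in subsets.values())
    return entropy(costs) - weighted


# Illustrative data: vehicle_use fully determines the cost, insurance_type does not.
records = [
    {"vehicle_use": "household", "insurance_type": "joint", "actual_cost": 1},
    {"vehicle_use": "household", "insurance_type": "compulsory_only", "actual_cost": 1},
    {"vehicle_use": "commercial", "insurance_type": "joint", "actual_cost": 2},
    {"vehicle_use": "commercial", "insurance_type": "compulsory_only", "actual_cost": 2},
]
gains = {f: information_gain(records, f) for f in ("vehicle_use", "insurance_type")}
```

In this toy data set `vehicle_use` has the larger gain, so it would be chosen as the target factor. Factor values with no matching records contribute nothing to the weighted sum, so iterating only over observed values is equivalent to summing over the full value range.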
In some embodiments, the classifying all historical cost data in the cost data set based on the cost decision tree results in a plurality of cost data subsets, including:
randomly selecting a piece of historical cost data from the cost data set as target data, wherein the target data comprises actual cost and factor values of all preset factors;
sequentially matching all directed edges of nodes in the cost decision tree and factor values of category factors in the target data by taking a root node of the cost decision tree as a starting point to obtain a classification result, wherein the classification result is a leaf node corresponding to the target data;
storing the target data in a cost data subset of leaf nodes in the classification result;
traversing all historical cost data in the cost data set to continually update a subset of cost data for each leaf node;
when all the historical cost data is traversed, the cost data subset of each leaf node is obtained.
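The routing of historical cost data to leaf nodes can be sketched as below. The nested-dictionary tree layout (`factor`, `children`, `leaf_id`) is an assumption made for illustration; the text does not prescribe a data structure.

```python
from collections import defaultdict

# Hypothetical tree: inner nodes name a category type factor,
# edges are keyed by factor value, leaves carry an id.
tree = {
    "factor": "vehicle_use",
    "children": {
        "household": {"leaf_id": 0},
        "commercial": {
            "factor": "insurance_type",
            "children": {
                "compulsory_only": {"leaf_id": 1},
                "joint": {"leaf_id": 2},
            },
        },
    },
}


def find_leaf(tree, record):
    """Follow the edge matching the record's factor value until a leaf is reached."""
    node = tree
    while "leaf_id" not in node:
        node = node["children"][record[node["factor"]]]
    return node["leaf_id"]


def split_into_subsets(tree, cost_data_set):
    """Group every piece of historical cost data under its leaf node."""
    subsets = defaultdict(list)
    for record in cost_data_set:
        subsets[find_leaf(tree, record)].append(record)
    return dict(subsets)


cost_data_set = [
    {"vehicle_use": "household", "insurance_type": "joint", "actual_cost": 1},
    {"vehicle_use": "commercial", "insurance_type": "joint", "actual_cost": 2},
    {"vehicle_use": "commercial", "insurance_type": "compulsory_only", "actual_cost": 3},
    {"vehicle_use": "household", "insurance_type": "compulsory_only", "actual_cost": 1},
]
cost_data_subsets = split_into_subsets(tree, cost_data_set)
```

Because each record follows exactly one path from the root, the subsets are disjoint and each corresponds to one leaf node.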
In some embodiments, said fitting a cost sub-model for each leaf node based on the actual cost of each piece of historical cost data in the subset of cost data and the factor values of all numerical type factors comprises:
acquiring the actual cost of each piece of historical cost data in the cost data subset and factor values of all numerical type factors to construct a numerical type factor data set of each leaf node;
fitting an initial cost sub-model based on the numerical type factor data set of a target leaf node to obtain the cost sub-model of the target leaf node, wherein the target leaf node is any one of the leaf nodes, and the initial cost sub-model satisfies the relation:
$$C = \sum_{j=1}^{Num} w_j Z_j$$
where $Num$ is the number of all numerical type factors, $w_j$ is the influence factor of the $j$-th numerical type factor, $Z_j$ is the factor value of the $j$-th numerical type factor, and $C$ is the predicted cost of the initial cost sub-model, the coefficients $w_j$ being undetermined until fitting;
the numerical type factor data sets for all leaf nodes are traversed to obtain a cost sub-model for each leaf node.
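The text does not say how the undetermined coefficients are fitted. One common choice, assumed here purely for illustration, is ordinary least squares on a weighted-sum model without an intercept, solved through the normal equations. The function name and sample data are hypothetical.

```python
def fit_cost_sub_model(rows, costs):
    """Least-squares fit of C = sum_j w_j * Z_j (assumed form, no intercept).

    rows  : list of factor-value lists, one per piece of historical cost data
    costs : list of actual costs, aligned with rows
    """
    num = len(rows[0])
    # Normal equations: (Z^T Z) w = Z^T c
    a = [[sum(r[i] * r[j] for r in rows) for j in range(num)] for i in range(num)]
    b = [sum(r[i] * c for r, c in zip(rows, costs)) for i in range(num)]
    # Gaussian elimination with partial pivoting
    for col in range(num):
        pivot = max(range(col, num), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("numerical type factors are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, num):
            ratio = a[row][col] / a[col][col]
            for k in range(col, num):
                a[row][k] -= ratio * a[col][k]
            b[row] -= ratio * b[col]
    # Back substitution
    w = [0.0] * num
    for i in reversed(range(num)):
        tail = sum(a[i][k] * w[k] for k in range(i + 1, num))
        w[i] = (b[i] - tail) / a[i][i]
    return w


# Made-up leaf data generated from w = [1000, 500] so the fit can be checked.
rows = [[1.0, 0.8], [1.2, 1.0], [0.9, 0.6], [1.5, 2.0]]
costs = [1000 * z1 + 500 * z2 for z1, z2 in rows]
weights = fit_cost_sub_model(rows, costs)
```

Running the same function once per leaf node, on that leaf's numerical type factor data set, yields one coefficient vector per leaf.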
In some embodiments, the obtaining target vehicle insurance data and obtaining a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, where the target vehicle insurance data includes factor values of all preset factors, includes:
acquiring target vehicle insurance data, wherein the target vehicle insurance data comprise factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
taking a root node of a cost decision tree in the vehicle insurance cost prediction model as a starting point, and sequentially matching all directed edges of the nodes in the cost decision tree with factor values of category factors in the target vehicle insurance data to obtain a classification result, wherein the classification result is a leaf node corresponding to the target vehicle insurance data;
taking the cost sub-model corresponding to the leaf node in the classification result as a target sub-model;
and inputting the factor values of the numerical type factors in the target vehicle insurance data into the target sub-model to calculate the cost prediction result of the target vehicle insurance data.
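The prediction steps above combine tree traversal with the leaf's sub-model. The sketch below assumes a nested-dictionary tree and per-leaf coefficient lists; these structures, the factor identifiers and the coefficient values are illustrative, not taken from the application.

```python
def predict_cost(tree, sub_models, numerical_factors, target):
    """Route the target data to a leaf, then apply that leaf's weighted-sum sub-model."""
    node = tree
    while "leaf_id" not in node:
        # Match the outgoing edge against the target's category type factor value.
        node = node["children"][target[node["factor"]]]
    weights = sub_models[node["leaf_id"]]
    return sum(w * target[name] for w, name in zip(weights, numerical_factors))


# Hypothetical model: one category type factor, two numerical type factors.
tree = {
    "factor": "vehicle_use",
    "children": {"household": {"leaf_id": 0}, "commercial": {"leaf_id": 1}},
}
numerical_factors = ["traffic_violation_coefficient", "no_claim_discount_coefficient"]
sub_models = {0: [1000.0, 500.0], 1: [2000.0, 800.0]}  # made-up coefficients

target = {
    "vehicle_use": "commercial",
    "traffic_violation_coefficient": 1.0,
    "no_claim_discount_coefficient": 0.75,
}
predicted = predict_cost(tree, sub_models, numerical_factors, target)
```

A target whose category type factor value has no matching edge raises `KeyError` in this sketch; how unseen values should be handled is not covered by the text.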
The embodiment of the present application further provides a device for constructing a vehicle insurance cost prediction model based on artificial intelligence, the device includes:
a storage unit, configured to store all historical cost data to construct a cost data set, wherein the historical cost data comprises the actual cost and the factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
a construction unit, configured to construct a cost decision tree based on an actual cost of each piece of historical cost data in the cost data set and factor values of all category factors, where the cost decision tree includes a plurality of leaf nodes;
a classification unit, configured to classify all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets, where the cost data subsets correspond to leaf nodes of the cost decision tree one to one;
a fitting unit for fitting a cost sub-model of each leaf node based on the actual cost of each piece of historical cost data in the cost data subset and the factor values of all numerical type factors;
a combination unit for combining the cost decision tree and the cost sub-model of each leaf node together to form the vehicle insurance cost prediction model;
and the prediction unit is used for acquiring target vehicle insurance data and obtaining a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, wherein the target vehicle insurance data comprises factor values of all preset factors.
An embodiment of the present application further provides an electronic device, where the electronic device includes:
a memory storing at least one instruction;
and the processor executes the instructions stored in the memory to realize the construction method of the artificial intelligence-based vehicle insurance cost prediction model.
The embodiment of the application also provides a computer-readable storage medium, wherein at least one instruction is stored in the computer-readable storage medium and executed by a processor in an electronic device to implement the method for constructing the artificial intelligence-based vehicle insurance cost prediction model.
In summary, the present application comprehensively considers all category type factors and numerical type factors influencing the vehicle insurance cost. The category type factors are used to construct a decision tree so as to obtain a cost sub-model corresponding to the decision logic of the category type factors, and the vehicle insurance cost prediction result is obtained from that cost sub-model and the numerical type factors, which improves the accuracy of vehicle insurance cost prediction.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for building an artificial intelligence based vehicle insurance cost prediction model to which the present application relates.
Fig. 2 is a schematic diagram of a structure of a cost decision tree to which the present application relates.
Fig. 3 is a schematic structural diagram of a vehicle insurance cost prediction model according to the present application.
FIG. 4 is a functional block diagram of a preferred embodiment of an apparatus for building an artificial intelligence based vehicle insurance cost prediction model according to the present application.
Fig. 5 is a schematic structural diagram of an electronic device according to a preferred embodiment of the method for constructing an artificial intelligence-based vehicle insurance cost prediction model according to the present application.
Detailed Description
For a clearer understanding of the objects, features and advantages of the present application, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and the features of the embodiments of the present application may be combined with each other where no conflict arises. Numerous specific details are set forth in the following description to provide a thorough understanding of the present application; the described embodiments are only some, not all, of the embodiments of the present application.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" and "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The embodiment of the present application provides a method for constructing an artificial intelligence-based vehicle insurance cost prediction model, which can be applied to one or more electronic devices, where an electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a client, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an Internet Protocol Television (IPTV), an intelligent wearable device, and the like.
The electronic device may also include a network device and/or a client device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The Network where the electronic device is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
FIG. 1 is a flow chart of a preferred embodiment of the method for constructing an artificial intelligence-based vehicle insurance cost prediction model of the present application. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
S10, storing all historical cost data to construct a cost data set, wherein the historical cost data comprises the actual cost and the factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors.
In an alternative embodiment, storing all historical cost data to construct the cost data set, wherein the historical cost data comprises the actual cost and the factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors, includes:
acquiring all preset factors influencing the vehicle insurance cost, and classifying each preset factor based on its value range to obtain a classification result, wherein the classification result is either a category type factor or a numerical type factor;
if the value range of a preset factor is continuous, the classification result is a numerical type factor, and the numerical type factors comprise a traffic violation coefficient, a no-claim discount coefficient, a channel preferential coefficient and an autonomous underwriting coefficient;
if the value range of a preset factor is discrete, the classification result is a category type factor, and the category type factors comprise a preset vehicle category, an insurance type, a vehicle use and a vehicle model;
acquiring, from each historical vehicle insurance policy in the historical vehicle insurance data, the factor values of all preset factors and the corresponding actual cost as one piece of historical cost data;
storing all historical cost data to construct the cost data set.
The historical cost data comprises the actual cost of one historical vehicle insurance policy in the historical vehicle insurance data and the factor values of all preset factors of the historical vehicle insurance policy.
In this optional embodiment, the traffic violation coefficient is formulated according to the historical traffic violation records of each region and ranges from 0.9 to 1.5; the no-claim discount coefficient is set uniformly by the China Insurance Regulatory Commission according to the vehicle type, the vehicle price and the vehicle age and ranges from 0.6 to 2.0; the channel preferential coefficient and the autonomous underwriting coefficient are set by each insurance company according to the historical claim records and the risk quality of the insured vehicle, and each ranges from 0.75 to 1.
In this optional embodiment, the value range of the preset vehicle category includes 4 factor values: Guangdong-Hong Kong-Macao vehicles, motorcycles, trailers and single-trip vehicles, where a single trip refers to temporarily moving an unsold automobile from one place to another; the value range of the vehicle use includes two factor values: commercial vehicles and household vehicles; the value range of the insurance type includes three factor values: compulsory-only, commercial-only and joint coverage, where compulsory-only means the vehicle is insured only under compulsory motor vehicle traffic accident liability insurance, commercial-only means the vehicle is insured only under commercial motor vehicle insurance, and joint coverage means the vehicle is insured under both; the value range of the vehicle model includes 3 factor values: compact car, SUV and MPV.
It should be noted that finer-grained preset factors help ensure the accuracy of the vehicle insurance cost prediction result, and the preset factors are not limited to the numerical type factors and category type factors listed above.
In this way, a cost data set is obtained, which includes a large amount of historical cost data, and provides a data basis for the construction of a subsequent vehicle insurance cost prediction model.
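The value ranges given in this embodiment can be captured as a small schema used to check each piece of historical cost data before it enters the cost data set. The validation step itself, the English identifiers and the value labels are assumptions added for illustration; only the numeric ranges and the number of factor values come from the text.

```python
# Continuous ranges stated in this embodiment (identifiers are hypothetical).
NUMERICAL_RANGES = {
    "traffic_violation_coefficient": (0.9, 1.5),
    "no_claim_discount_coefficient": (0.6, 2.0),
    "channel_preferential_coefficient": (0.75, 1.0),
    "autonomous_underwriting_coefficient": (0.75, 1.0),
}

# Discrete value ranges stated in this embodiment (labels are hypothetical).
CATEGORY_DOMAINS = {
    "preset_vehicle_category": {"guangdong_hk_macao", "motorcycle", "trailer", "single_trip"},
    "vehicle_use": {"commercial", "household"},
    "insurance_type": {"compulsory_only", "commercial_only", "joint"},
    "vehicle_model": {"compact", "suv", "mpv"},
}


def validate_record(record):
    """Return a list of problems; an empty list means every factor value is in range."""
    problems = []
    for name, (low, high) in NUMERICAL_RANGES.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            problems.append(f"{name}={value!r} outside [{low}, {high}]")
    for name, domain in CATEGORY_DOMAINS.items():
        if record.get(name) not in domain:
            problems.append(f"{name}={record.get(name)!r} not in value range")
    return problems


good = {
    "traffic_violation_coefficient": 1.0,
    "no_claim_discount_coefficient": 0.8,
    "channel_preferential_coefficient": 0.9,
    "autonomous_underwriting_coefficient": 0.85,
    "preset_vehicle_category": "motorcycle",
    "vehicle_use": "household",
    "insurance_type": "joint",
    "vehicle_model": "suv",
}
bad = dict(good, traffic_violation_coefficient=1.8, vehicle_model="sedan")
```

Here `validate_record(good)` returns an empty list, while `validate_record(bad)` reports the out-of-range coefficient and the unknown vehicle model.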
S11, constructing a cost decision tree based on the actual cost of each piece of historical cost data in the cost data set and the factor values of all the category factors, wherein the cost decision tree comprises a plurality of leaf nodes.
In an alternative embodiment, said constructing a cost decision tree based on the actual cost of each piece of historical cost data in said set of cost data and the factor values of all category factors, said cost decision tree comprising a plurality of leaf nodes, comprises:
A1, acquiring the actual cost and the factor values of all category type factors of each piece of historical cost data in the cost data set to construct a category type factor data set;
In this optional embodiment, for each piece of historical cost data in the cost data set, the actual cost and the factor values of all category type factors are extracted to obtain one piece of category type factor cost data, and all category type factor cost data are stored to obtain the category type factor data set.
A2, calculating the information gain of each category type factor based on the category type factor data set;
In an alternative embodiment, calculating the information gain of each category type factor based on the category type factor data set comprises:
calculating the information entropy of the actual cost in the category factor data set, wherein the information entropy of the actual cost satisfies the relation:
$$E(X) = -\sum_{i=\min(X)}^{\max(X)} \frac{N_i(X)}{N(X)} \log \frac{N_i(X)}{N(X)}$$
wherein $X$ represents the category type factor data set, $\min(X)$ is the minimum value of the actual cost in $X$, $\max(X)$ is the maximum value of the actual cost in $X$, $N(X)$ is the number of all actual costs in $X$, $N_i(X)$ is the number of actual costs in $X$ whose value equals $i$, and $E(X)$ is the information entropy of the actual cost in $X$;
selecting, from the category type factor data set, all historical cost data that share the same factor value of a target category type factor to obtain a data subset for each factor value of the target category type factor, wherein the target category type factor is any one of the category type factors;
calculating the information gain of the target category type factor based on the information entropy of the actual cost in the category type factor data set and in the data subset of each factor value of the target category type factor, the information gain satisfying the relation:
$$\mathrm{Gain}(M) = E(X) - \sum_{k=1}^{v} \frac{N(M_k)}{N(X)} E(M_k)$$
wherein $X$ is the category type factor data set, $E(X)$ is the information entropy of the actual cost in $X$, $v$ is the number of factor values in the value range of the target category type factor $M$, $M_k$ is the data subset of the $k$-th factor value of $M$, $E(M_k)$ is the information entropy of the actual cost in $M_k$, $N(X)$ is the number of all actual costs in $X$, $N(M_k)$ is the number of all actual costs in $M_k$, and $\mathrm{Gain}(M)$ is the information gain of the target category type factor $M$;
traversing all the category factors obtains the information gain of each category factor.
It should be noted that, the larger the information gain is, the greater the influence of the category-type factor on the prediction result of the vehicle insurance cost is.
A3, selecting the category type factor with the maximum information gain as a target factor, and taking the target factor as node information to obtain a newly added node;
A4, drawing a preset number of directed edges with the newly added node as the starting point, wherein the preset number equals the number of factor values in the value range of the target factor, and the directed edges correspond to the factor values of the target factor one to one;
In this optional embodiment, if the value range of the target factor includes three factor values in total, three directed edges are drawn with the newly added node as the starting point, and each directed edge corresponds to one factor value.
A5, screening all historical cost data in the category type factor data set based on the factor values corresponding to the directed edges to obtain a data subset of each directed edge;
in this optional embodiment, the filtering all historical cost data in the category-type factor data set based on the factor value corresponding to the directed edge to obtain the data subset of each directed edge includes:
randomly selecting one directed edge as a target directed edge, and taking a factor value of the target directed edge as a target factor value;
selecting all historical cost data with the target factors as the target factor values in the category factor data set as the data subsets of the target directed edges;
all the directed edges are traversed to obtain a data subset for each directed edge.
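The screening in step A5 amounts to partitioning the data by the target factor's value, one subset per directed edge. A minimal sketch, with a hypothetical function name and record layout:

```python
def subsets_by_edge(records, target_factor, value_range):
    """Map each factor value (one directed edge) to the records carrying that value."""
    return {
        value: [r for r in records if r[target_factor] == value]
        for value in value_range
    }


records = [
    {"vehicle_use": "household", "actual_cost": 1},
    {"vehicle_use": "commercial", "actual_cost": 3},
    {"vehicle_use": "household", "actual_cost": 2},
]
edge_subsets = subsets_by_edge(records, "vehicle_use", ["commercial", "household"])
```

Iterating over the full value range, rather than only the values present in the data, keeps one entry per directed edge even when an edge's subset is empty.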
A6, calculating the information entropy of the actual cost in the data subset of a target directed edge as a target cost entropy, wherein the target directed edge is any one of all the directed edges; if the target cost entropy is smaller than a preset entropy value, generating an empty node as the end point of the target directed edge; if the target cost entropy is not smaller than the preset entropy value, deleting all factor values of the target factor from the category type factor data set to obtain an updated category type factor data set, executing step A2 to step A3 again on the updated data set to obtain a non-empty newly added node, and taking that newly added node as the end point of the target directed edge;
in this optional embodiment, the value of the preset entropy is 0.1.
A7, traversing all the directed edges to obtain the end point of each directed edge, and if the end points of all the directed edges are empty nodes, completing the construction of the cost decision tree; and if at least one directed edge has a non-empty newly added node as the end point, executing the steps A4 to A7 aiming at each non-empty newly added node until the construction of the cost decision tree is completed.
In this optional embodiment, leaf nodes in the cost decision tree are all empty nodes, each non-empty node in the cost decision tree corresponds to one type factor, and each directed edge corresponds to one factor value of the type factor in a parent node, where the parent node is a node corresponding to the start point of the directed edge. The structure diagram of the cost decision tree is shown in fig. 2.
Therefore, the cost decision tree is constructed according to the actual cost of each piece of historical cost data in the cost data set and the factor values of all the category factors, the cost decision tree can accurately reflect the logic relation among all the category factors, and the accuracy of the vehicle insurance cost prediction model is guaranteed.
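Steps A1 to A7 can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation, and it rests on several assumptions that the text does not state: the entropy uses a base-2 logarithm, recursion works on the data subset of each directed edge, and an edge whose subset is empty or for which no category type factor remains also ends in an empty node. The record layout (`"cost"` plus one key per factor) and all names are hypothetical.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 0.1  # preset entropy value given in the embodiment


def entropy(costs):
    """Entropy of the empirical distribution of actual costs (base 2 is an assumption)."""
    n = len(costs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(costs).values())


def information_gain(rows, factor):
    """Gain(M) = E(X) - sum_k N(M_k) / N(X) * E(M_k)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor], []).append(row["cost"])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row["cost"] for row in rows]) - weighted


def build_tree(rows, factors, domains):
    """Return {"factor": name, "edges": {value: child}}; an empty node is None."""
    # A2-A3: the factor with the largest information gain becomes the new node.
    target = max(factors, key=lambda f: information_gain(rows, f))
    remaining = [f for f in factors if f != target]
    node = {"factor": target, "edges": {}}
    for value in domains[target]:  # A4: one directed edge per factor value
        subset = [row for row in rows if row[target] == value]  # A5
        # A6: low entropy ends the branch; the first two tests are assumptions.
        if not subset or not remaining or \
                entropy([row["cost"] for row in subset]) < ENTROPY_THRESHOLD:
            node["edges"][value] = None
        else:
            node["edges"][value] = build_tree(subset, remaining, domains)  # A7
    return node


# Hypothetical toy data: two category type factors and four policies.
domains = {"use": ["commercial", "household"], "ins": ["compulsory", "commercial"]}
rows = [
    {"use": "commercial", "ins": "compulsory", "cost": 900},
    {"use": "commercial", "ins": "commercial", "cost": 900},
    {"use": "household", "ins": "compulsory", "cost": 500},
    {"use": "household", "ins": "commercial", "cost": 700},
]
tree = build_tree(rows, ["use", "ins"], domains)
```

In this toy data the factor `use` has the larger information gain, so it becomes the root; the `commercial` branch ends at once because its costs are identical, while the `household` branch is split again on `ins`.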
S12, classifying all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets, wherein the cost data subsets correspond to leaf nodes of the cost decision tree one to one.
In an optional embodiment, the classifying all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets includes:
randomly selecting a piece of historical cost data from the cost data set as target data, wherein the target data comprises actual cost and factor values of all preset factors;
sequentially matching all directed edges of nodes in the cost decision tree and factor values of category factors in the target data by taking a root node of the cost decision tree as a starting point to obtain a classification result, wherein the classification result is a leaf node corresponding to the target data;
storing the target data in a cost data subset of leaf nodes in the classification result;
traversing all historical cost data in the cost data set to continually update a subset of cost data for each leaf node;
when all the historical cost data is traversed, the cost data subset of each leaf node is obtained.
In this optional embodiment, the taking a root node of the cost decision tree as a starting point, sequentially matching all directed edges of the nodes in the cost decision tree with factor values of category factors in the target data to obtain a classification result, where the classification result is a leaf node corresponding to the target data, includes:
acquiring a root node of the cost decision tree as a node to be matched, and taking a type factor in node information of the node to be matched as a factor to be matched;
matching each directed edge of the node to be matched with the factor value of the factor to be matched in the target data to obtain a matched directed edge, and taking the end point of the matched directed edge as a new node to be matched;
if the new node to be matched is a leaf node, taking the new node to be matched as a classification result of the target data;
and if the new node to be matched is a non-leaf node, taking the type factor in the new node to be matched as a new factor to be matched, and repeatedly executing the matching action until the classification result of the target data is obtained.
For example, assuming that the structure of the cost decision tree is as shown in fig. 2, if the factor values of the category type factor 1, the category type factor 2, the category type factor 3, the category type factor 4, and the category type factor 5 in the target data are the factor value 12, the factor value 22, the factor value 31, the factor value 42, and the factor value 51 in sequence, the classification node of the target data is a leaf node pointed to by the directed edge corresponding to the factor value 42 in the cost decision tree.
Therefore, the cost data set can be divided into a plurality of cost data subsets, each cost data subset corresponds to the leaf nodes in the cost decision tree one by one, the influence of category type factors on vehicle insurance cost prediction is eliminated from each cost data subset, and the accuracy of a vehicle insurance cost prediction model is guaranteed.
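The matching procedure of step S12 can be illustrated with a short Python sketch. The tree below is hypothetical and is not the tree of fig. 2; for readability, each leaf is represented by a string identifier instead of an empty node, which is an assumption made only for this example.

```python
from collections import defaultdict

# Hypothetical cost decision tree: internal nodes are dicts, leaves are strings.
TREE = {
    "factor": "vehicle_use",
    "edges": {
        "commercial": "leaf_commercial",
        "household": {
            "factor": "insurance_type",
            "edges": {
                "compulsory_only": "leaf_household_compulsory",
                "commercial_only": "leaf_household_commercial",
                "combined": "leaf_household_combined",
            },
        },
    },
}


def classify(tree, record):
    """Follow the directed edge matching the record's factor value until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["edges"][record[node["factor"]]]
    return node


def partition(tree, records):
    """Store every record in the cost data subset of its leaf node."""
    subsets = defaultdict(list)
    for record in records:
        subsets[classify(tree, record)].append(record)
    return dict(subsets)


records = [
    {"vehicle_use": "commercial", "insurance_type": "combined", "cost": 900},
    {"vehicle_use": "household", "insurance_type": "combined", "cost": 700},
    {"vehicle_use": "household", "insurance_type": "compulsory_only", "cost": 500},
]
subsets = partition(TREE, records)
```

Note that the first record never consults `insurance_type`: matching stops as soon as a leaf is reached, so only the factors on the path from the root matter.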
S13, fitting the cost sub-model of each leaf node based on the actual cost of each piece of historical cost data in the cost data subset and the factor values of all numerical type factors.
In an alternative embodiment, in the cost decision tree, each leaf node represents a type factor judgment logic, the influence of the type factor on the car insurance cost prediction is eliminated in the cost data subset of one leaf node, and the cost sub-model corresponding to each leaf node can be obtained based on the cost data subset.
In an alternative embodiment, said fitting a cost sub-model for each leaf node based on the actual cost of each of the historical cost data in the subset of cost data and the factor values of all numerical type factors comprises:
acquiring the actual cost of each piece of historical cost data in the cost data subset and factor values of all numerical type factors to construct a numerical type factor data set of each leaf node;
fitting an initial cost sub-model based on the numerical type factor data set of a target leaf node to obtain the cost sub-model of the target leaf node, wherein the target leaf node is any one of all leaf nodes, and the initial cost sub-model satisfies the relation:
$$C = \sum_{j=1}^{Num} w_j Z_j$$
where Num is the number of all numerical type factors, w_j is the influence factor of the j-th numerical type factor and is an undetermined coefficient, Z_j is the factor value of the j-th numerical type factor, and C is the predicted cost of the initial cost sub-model;
the numerical type factor data sets for all leaf nodes are traversed to obtain a cost sub-model for each leaf node.
In this alternative embodiment, the fitting the initial cost sub-model based on the numerical type factor data set of the target leaf node to obtain the cost sub-model of the target leaf node includes: and determining all undetermined coefficients of the initial cost sub-model by using a least square method to obtain the cost sub-model of the target leaf node.
It should be noted that the decision logic of the type factor represented by different leaf nodes is different, so the cost submodels of different leaf nodes are different.
Therefore, the initial cost sub-model is fitted on the numerical type factor data set of each leaf node to obtain the cost sub-model of that leaf node. Because each cost sub-model corresponds to a different judgment logic of the category type factors, the accuracy of the prediction result of the cost sub-model is guaranteed.
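The least-squares fit of one leaf node can be sketched as below. The sketch assumes the cost sub-model is a weighted sum of the numerical type factor values without an intercept term, and it solves the normal equations with plain Gaussian elimination so that no third-party library is needed. The function names and the sample data are hypothetical.

```python
def fit_least_squares(Z, c):
    """Find w minimising sum_i (c_i - sum_j w_j * Z[i][j])**2 via the normal equations."""
    num = len(Z[0])
    # Augmented matrix [Z^T Z | Z^T c].
    A = [
        [sum(row[a] * row[b] for row in Z) for b in range(num)]
        + [sum(row[a] * ci for row, ci in zip(Z, c))]
        for a in range(num)
    ]
    for col in range(num):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, num), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("factors are collinear; coefficients are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(num):
            if r != col:
                ratio = A[r][col] / A[col][col]
                A[r] = [x - ratio * y for x, y in zip(A[r], A[col])]
    return [A[j][num] / A[j][j] for j in range(num)]


def predict(w, z):
    """Evaluate the fitted sub-model for one vector of numerical factor values."""
    return sum(wj * zj for wj, zj in zip(w, z))


# Hypothetical leaf data: two numerical factors per policy and the actual cost.
Z = [[1.0, 0.8], [1.2, 0.9], [0.9, 1.0], [1.5, 0.75]]
c = [1400.0, 1650.0, 1400.0, 1875.0]
w = fit_least_squares(Z, c)
```

The sample costs were generated from the coefficients 1000 and 500, so the fit recovers approximately those values. A leaf with fewer records than numerical factors, or with collinear factors, has no unique solution; the sketch raises an error in that case, which is a design choice of this example and not something the text prescribes.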
And S14, combining the cost decision tree and the cost sub-model of each leaf node together to form the vehicle insurance cost prediction model.
In an alternative embodiment, the cost sub-model of each leaf node is used as the node information of the leaf node to obtain a car insurance cost prediction model, the car insurance cost prediction model includes the cost decision tree and the cost sub-model of each leaf node, and a schematic structural diagram of the car insurance cost prediction model is shown in fig. 3.
Therefore, the construction of the vehicle insurance cost prediction model is completed, all category type factors and numerical type factors are comprehensively considered by the vehicle insurance cost prediction model, and the accuracy of the vehicle insurance cost prediction model is ensured.
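One possible way to store the combined model of step S14 is sketched below: every empty leaf node of the cost decision tree is replaced by a node that carries its fitted coefficients. The representation (a leaf is `None`, a sub-model is a list of coefficients, and sub-models are keyed by the path of factor and value pairs from the root) is an assumption made for this illustration.

```python
def attach_submodels(node, submodels, path=()):
    """Return a copy of the tree in which each empty leaf carries its cost sub-model."""
    if node is None:  # empty leaf node reached through `path`
        return {"submodel": submodels[path]}
    return {
        "factor": node["factor"],
        "edges": {
            value: attach_submodels(child, submodels, path + ((node["factor"], value),))
            for value, child in node["edges"].items()
        },
    }


# Hypothetical tree with one category type factor and two leaves.
tree = {"factor": "vehicle_use", "edges": {"commercial": None, "household": None}}
submodels = {
    (("vehicle_use", "commercial"),): [1200.0, 300.0],
    (("vehicle_use", "household"),): [800.0, 150.0],
}
model = attach_submodels(tree, submodels)
```

Keying the sub-models by path keeps the one-to-one correspondence between leaf nodes and cost data subsets explicit, and the original tree is left unmodified.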
S15, obtaining target vehicle insurance data, and obtaining a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, wherein the target vehicle insurance data comprises factor values of all preset factors.
In an optional embodiment, the obtaining target vehicle insurance data and obtaining a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, where the target vehicle insurance data includes factor values of all preset factors, includes:
acquiring target vehicle insurance data, wherein the target vehicle insurance data comprise factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
taking a root node of a cost decision tree in the vehicle insurance cost prediction model as a starting point, and sequentially matching all directed edges of the nodes in the cost decision tree with factor values of category factors in the target vehicle insurance data to obtain a classification result, wherein the classification result is a leaf node corresponding to the target vehicle insurance data;
taking the cost submodel corresponding to the leaf node in the classification result as a target submodel;
and inputting the factor values of the numerical type factors in the target vehicle insurance data into the target sub-model to calculate the cost prediction result of the target vehicle insurance data.
Therefore, in the vehicle insurance cost prediction model, the target sub-model is determined according to the factor values of the category type factors in the target vehicle insurance data, and the factor values of the numerical type factors in the target vehicle insurance data are input into the target sub-model to obtain an accurate cost prediction result.
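The prediction flow of step S15 can be sketched as follows, assuming that each leaf stores the coefficients of a weighted-sum cost sub-model. The model, the factor names and the coefficient values are hypothetical and serve only to show the routing and evaluation.

```python
# Hypothetical vehicle insurance cost prediction model: a one-level decision
# tree whose leaves hold the coefficients of their cost sub-models.
MODEL = {
    "factor": "vehicle_use",
    "edges": {
        "commercial": {"weights": {"traffic_violation": 1000.0, "claim_free": 500.0}},
        "household": {"weights": {"traffic_violation": 600.0, "claim_free": 400.0}},
    },
}


def predict_cost(model, policy):
    """Route the policy to a leaf by its category factors, then apply the leaf's sub-model."""
    node = model
    while "factor" in node:
        value = policy[node["factor"]]
        if value not in node["edges"]:
            raise KeyError(f"no directed edge for {node['factor']}={value!r}")
        node = node["edges"][value]
    # Target sub-model: weighted sum of the numerical factor values.
    return sum(weight * policy[name] for name, weight in node["weights"].items())


policy = {"vehicle_use": "household", "traffic_violation": 1.0, "claim_free": 0.5}
cost = predict_cost(MODEL, policy)
```

A factor value that has no directed edge cannot be routed to any leaf; the sketch raises an error in that case, whereas the text does not say how such input is handled.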
According to the technical scheme, all category type factors and numerical type factors influencing the vehicle insurance cost can be comprehensively considered, the category type factors are used for constructing the decision tree to obtain the cost submodel corresponding to the judgment logic of each category type factor, an accurate vehicle insurance cost prediction result is obtained based on the cost submodel and the numerical type factors, and the accuracy of vehicle insurance cost prediction is improved.
Referring to fig. 4, fig. 4 is a functional block diagram of a preferred embodiment of an apparatus for building an artificial intelligence-based vehicle insurance cost prediction model according to the present invention. The construction device 11 of the artificial intelligence-based vehicle insurance cost prediction model comprises a storage unit 110, a construction unit 111, a classification unit 112, a fitting unit 113, a combination unit 114 and a prediction unit 115. A module/unit as referred to herein is a series of computer readable instruction segments capable of being executed by the processor 13 and performing a fixed function, and is stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
In an alternative embodiment, the storage unit 110 is configured to store all historical cost data to construct the cost data set, the historical cost data including the actual cost and the factor values of all preset factors, and the preset factors including category type factors and numerical type factors.
In an alternative embodiment, the storing all historical cost data to construct the cost data set, the historical cost data including actual cost and factor values for all preset factors, the preset factors including category type factors and numerical type factors, includes:
acquiring all preset factors influencing the vehicle insurance cost, and classifying all the preset factors based on the value range of the preset factors to obtain a classification result, wherein the classification result comprises a category factor and a numerical factor;
if the value range of a preset factor is continuous, the classification result is a numerical type factor, and the numerical type factors comprise a traffic violation coefficient, a claim-free preferential treatment coefficient, a channel preferential coefficient and an autonomous underwriting coefficient;
if the value range of a preset factor is discrete, the classification result is a category type factor, and the category type factors comprise a preset vehicle type, an insurance type, a vehicle use and a vehicle model;
acquiring factor values of all preset factors and corresponding actual costs in each historical vehicle insurance policy in the historical vehicle insurance data as a piece of historical cost data;
all historical cost data is stored to build the cost data set.
The historical cost data comprises the actual cost of one historical vehicle insurance policy in the historical vehicle insurance data and the factor values of all preset factors of the historical vehicle insurance policy.
In this optional embodiment, the traffic violation coefficient is formulated according to the historical record of traffic violations in each region, and its range is 0.9-1.5; the claim-free preferential treatment coefficient is uniformly set by the China Insurance Regulatory Commission according to the vehicle model, the vehicle price and the vehicle age, and its range is 0.6-2.0; the channel preferential coefficient and the autonomous underwriting coefficient are set by each insurance company according to the historical claims and the risk of the insured vehicle, and their range is 0.75-1.
In this optional embodiment, the value range of the preset vehicle type includes four factor values: Hong Kong, Macao and Guangdong (Yue) vehicles, motorcycles, trailers and single-trip vehicle delivery, where single-trip vehicle delivery refers to temporarily moving an unsold automobile from one place to another; the value range of the vehicle use includes two factor values: commercial vehicle and household vehicle; the value range of the insurance type includes three factor values: compulsory-only, commercial-only and combined compulsory and commercial insurance, where compulsory-only means that the vehicle is covered only by compulsory motor vehicle traffic accident liability insurance, commercial-only means that the vehicle is covered only by commercial motor vehicle insurance, and combined means that the vehicle is covered by both; the value range of the vehicle model includes three factor values: compact car, SUV and MPV.
It should be noted that the accuracy of the prediction result of the vehicle insurance cost can be ensured by refined preset factors, and the preset factors are not limited to the numerical type factors and the category type factors.
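The split between numerical type factors and category type factors, together with the value ranges given in this embodiment, can be captured in a small validation sketch. The key names are hypothetical, only three of the category type factors are listed, and applying the range 0.75-1 to the channel preferential coefficient and the autonomous underwriting coefficient separately is an assumption.

```python
# Continuous value ranges quoted from the embodiment (key names are hypothetical).
NUMERICAL_RANGES = {
    "traffic_violation": (0.9, 1.5),
    "claim_free": (0.6, 2.0),
    "channel": (0.75, 1.0),
    "autonomous_underwriting": (0.75, 1.0),
}
# Discrete value ranges of three of the category type factors.
CATEGORY_DOMAINS = {
    "vehicle_use": {"commercial", "household"},
    "insurance_type": {"compulsory_only", "commercial_only", "combined"},
    "vehicle_model": {"compact", "suv", "mpv"},
}


def factor_kind(name):
    """Classify a preset factor by whether its value range is continuous or discrete."""
    if name in NUMERICAL_RANGES:
        return "numerical"
    if name in CATEGORY_DOMAINS:
        return "category"
    raise KeyError(name)


def invalid_factors(record):
    """List the factors of one piece of historical cost data that fall outside their range."""
    bad = []
    for name, (low, high) in NUMERICAL_RANGES.items():
        if not low <= record[name] <= high:
            bad.append(name)
    for name, domain in CATEGORY_DOMAINS.items():
        if record[name] not in domain:
            bad.append(name)
    return bad


good = {
    "traffic_violation": 1.1, "claim_free": 0.85, "channel": 0.8,
    "autonomous_underwriting": 1.0, "vehicle_use": "household",
    "insurance_type": "combined", "vehicle_model": "suv", "cost": 3200.0,
}
bad = dict(good, traffic_violation=1.8, vehicle_model="truck")
```

Checking records against these ranges before they enter the cost data set would keep out-of-range values from distorting the entropy calculation and the least-squares fit.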
In an alternative embodiment, the construction unit 111 is configured to construct a cost decision tree based on the actual cost of each piece of historical cost data in the cost data set and the factor values of all category factors, the cost decision tree comprising a plurality of leaf nodes.
In an alternative embodiment, the constructing a cost decision tree based on the actual cost of each piece of historical cost data in the cost data set and the factor values of all categorical factors, the cost decision tree comprising a plurality of leaf nodes, comprises:
A1, acquiring the actual cost of each piece of historical cost data in the cost data set and factor values of all type factors to construct a type factor data set;
in this optional embodiment, for each piece of historical cost data in the cost data set, the actual cost and the factor values of all the type factors in the historical cost data are obtained to obtain the type factor cost data, and all the type factor cost data are further stored to obtain the type factor data set.
A2, calculating the information gain of each type factor based on the type factor data set;
in an alternative embodiment, said calculating an information gain for each category factor based on said set of category factor data comprises:
calculating the information entropy of the actual cost in the category factor data set, wherein the information entropy of the actual cost satisfies the relation:
$$E(X) = -\sum_{i=\min(X)}^{\max(X)} \frac{N_i(X)}{N(X)} \log \frac{N_i(X)}{N(X)}$$
wherein X represents the type factor data set, min(X) is the minimum value of the actual cost in X, max(X) is the maximum value of the actual cost in X, N(X) is the number of all actual costs in X, N_i(X) is the number of actual costs in X that are equal to i, and E(X) is the information entropy of the actual cost in X; a value i that does not occur in X contributes zero to the sum;
selecting all historical cost data with the same target type factor as the same factor value from the type factor data set to obtain a data subset of each factor value in the target type factor, wherein the target type factor is any one of all the type factors;
calculating an information gain for the target type factor based on the information entropy of the actual cost in the data subset for each factor value of the category type factor data set and the target type factor, the information gain satisfying the relationship:
$$\mathrm{Gain}(M) = E(X) - \sum_{k=1}^{v} \frac{N(M_k)}{N(X)} E(M_k)$$
wherein X is the category type factor data set, E(X) is the information entropy of the actual cost in X, v is the number of factor values in the value range of the target type factor M, M_k is the data subset of the k-th factor value of the target type factor M, E(M_k) is the information entropy of the actual cost in M_k, N(X) is the number of all actual costs in X, N(M_k) is the number of all actual costs in M_k, and Gain(M) is the information gain of the target type factor M;
traversing all the category factors yields the information gain of each category factor.
It should be noted that, the larger the information gain is, the greater the influence of the category-type factor on the prediction result of the vehicle insurance cost is.
A3, selecting a type factor corresponding to the maximum value of the information gain as a target factor, and taking the target factor as node information to obtain a newly added node;
A4, drawing a preset number of directed edges by taking the newly added node as a starting point, wherein the preset number is the same as the number of factor values in the value range of the target factor, and the directed edges correspond to the factor values of the target factor one to one;
in this optional embodiment, if the value range of the target factor includes three factor values in total, three directed edges are drawn with the newly added node as a starting point, and each directed edge corresponds to one factor value.
A5, screening all historical cost data in the category factor data set based on the factor values corresponding to the directed edges to obtain a data subset of each directed edge;
in this optional embodiment, the filtering all historical cost data in the category-type factor data set based on the factor value corresponding to the directed edge to obtain a data subset of each directed edge includes:
randomly selecting one directed edge as a target directed edge, and taking a factor value of the target directed edge as a target factor value;
selecting all historical cost data with the target factors as the target factor values in the category factor data set as the data subsets of the target directed edges;
all the directed edges are traversed to obtain a subset of data for each directed edge.
A6, calculating the information entropy of the actual cost in the data subset of a target directed edge as a target cost entropy, wherein the target directed edge is any one of all the directed edges; if the target cost entropy is smaller than a preset entropy value, generating an empty node as the end point of the target directed edge; if the target cost entropy is not smaller than the preset entropy value, deleting all factor values of the target factor from the category type factor data set to obtain an updated category type factor data set, executing step A2 to step A3 again on the updated data set to obtain a non-empty newly added node, and taking that newly added node as the end point of the target directed edge;
in this optional embodiment, the value of the preset entropy is 0.1.
A7, traversing all the directed edges to obtain the end point of each directed edge, and if the end points of all the directed edges are all empty nodes, completing the construction of the cost decision tree; and if at least one directed edge has a non-empty newly added node as the end point, executing the steps A4 to A7 aiming at each non-empty newly added node until the construction of the cost decision tree is completed.
In this optional embodiment, leaf nodes in the cost decision tree are all empty nodes, each non-empty node in the cost decision tree corresponds to one type factor, and each directed edge corresponds to one factor value of a type factor in a parent node, where the parent node is a node corresponding to the start point of the directed edge. The structure diagram of the cost decision tree is shown in fig. 2.
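As a worked illustration of steps A2 and A6, consider a hypothetical data set X with four actual costs, 900, 900, 500 and 700, and a target type factor M with two factor values whose data subsets are M_1 = {900, 900} and M_2 = {500, 700}. The numbers are invented for this example, and a base-2 logarithm is assumed because the text does not state the base.

```latex
\begin{aligned}
E(X) &= -\left(\tfrac{2}{4}\log_2\tfrac{2}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}
        + \tfrac{1}{4}\log_2\tfrac{1}{4}\right) = 1.5, \\
E(M_1) &= -\left(\tfrac{2}{2}\log_2\tfrac{2}{2}\right) = 0, \qquad
E(M_2) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1, \\
\mathrm{Gain}(M) &= E(X) - \left(\tfrac{2}{4}E(M_1) + \tfrac{2}{4}E(M_2)\right)
        = 1.5 - 0.5 = 1.0 .
\end{aligned}
```

With the preset entropy value of 0.1, the directed edge of M_1 ends in an empty node because E(M_1) = 0 is below the threshold, while the directed edge of M_2 receives a new non-empty node because E(M_2) = 1 is not.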
In an alternative embodiment, the classifying unit 112 is configured to classify all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets, where the cost data subsets correspond to leaf nodes of the cost decision tree in a one-to-one manner.
In an optional embodiment, the classifying all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets includes:
randomly selecting a piece of historical cost data from the cost data set as target data, wherein the target data comprises actual cost and factor values of all preset factors;
sequentially matching all directed edges of nodes in the cost decision tree and factor values of category factors in the target data by taking a root node of the cost decision tree as a starting point to obtain a classification result, wherein the classification result is a leaf node corresponding to the target data;
storing the target data in a cost data subset of leaf nodes in the classification result;
traversing all historical cost data in the cost data set to continually update a subset of cost data for each leaf node;
when all the historical cost data is traversed, a subset of the cost data for each leaf node is obtained.
In this optional embodiment, the taking a root node of the cost decision tree as a starting point, sequentially matching all directed edges of the nodes in the cost decision tree with factor values of category factors in the target data to obtain a classification result, where the classification result is a leaf node corresponding to the target data, includes:
acquiring a root node of the cost decision tree as a node to be matched, and taking a type factor in node information of the node to be matched as a factor to be matched;
matching each directed edge of the node to be matched with the factor value of the factor to be matched in the target data to obtain a matched directed edge, and taking the end point of the matched directed edge as a new node to be matched;
if the new node to be matched is a leaf node, taking the new node to be matched as a classification result of the target data;
and if the new node to be matched is a non-leaf node, taking the type factor in the new node to be matched as a new factor to be matched, and repeatedly executing the matching action until a classification result of the target data is obtained.
For example, assuming that the structure of the cost decision tree is as shown in fig. 2, if the factor values of the category type factor 1, the category type factor 2, the category type factor 3, the category type factor 4, and the category type factor 5 in the target data are the factor value 12, the factor value 22, the factor value 31, the factor value 42, and the factor value 51 in sequence, the classification node of the target data is a leaf node pointed to by the directed edge corresponding to the factor value 42 in the cost decision tree.
In an alternative embodiment, the fitting unit 113 is configured to fit the cost sub-model for each leaf node based on the actual cost of each of the historical cost data in the cost data subset and the factor values of all numerical type factors.
In an alternative embodiment, in the cost decision tree, each leaf node represents a type factor judgment logic, the influence of the type factor on the vehicle insurance cost prediction is eliminated in the cost data subset of one leaf node, and the cost sub-model corresponding to each leaf node can be obtained based on the cost data subset.
In an alternative embodiment, said fitting the cost sub-model for each leaf node based on the actual cost of each piece of historical cost data in the subset of cost data and the factor values of all numerical type factors comprises:
acquiring the actual cost of each piece of historical cost data in the cost data subset and factor values of all numerical type factors to construct a numerical type factor data set of each leaf node;
fitting an initial cost sub-model based on the numerical type factor data set of a target leaf node to obtain the cost sub-model of the target leaf node, wherein the target leaf node is any one of all leaf nodes, and the initial cost sub-model satisfies the relation:
$$C = \sum_{j=1}^{Num} w_j Z_j$$
where Num is the number of all numerical type factors, w_j is the influence factor of the j-th numerical type factor and is an undetermined coefficient, Z_j is the factor value of the j-th numerical type factor, and C is the predicted cost of the initial cost sub-model;
the numerical type factor data sets for all leaf nodes are traversed to obtain a cost sub-model for each leaf node.
In this optional embodiment, the fitting the initial cost sub-model based on the numeric type factor data set of the target leaf node to obtain the cost sub-model of the target leaf node includes: and determining all undetermined coefficients of the initial cost sub-model by using a least square method to obtain the cost sub-model of the target leaf node.
It should be noted that the decision logic of the type factor represented by different leaf nodes is different, so the cost submodels of different leaf nodes are different.
In an alternative embodiment, the combining unit 114 is configured to combine the cost decision tree and the cost sub-model of each leaf node together to form the car insurance cost prediction model.
In an alternative embodiment, the cost sub-model of each leaf node is used as the node information of the leaf node to obtain a car insurance cost prediction model, the car insurance cost prediction model includes the cost decision tree and the cost sub-model of each leaf node, and a schematic structural diagram of the car insurance cost prediction model is shown in fig. 3.
In an optional embodiment, the prediction unit 115 is configured to obtain target vehicle insurance data, and obtain a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, where the target vehicle insurance data includes factor values of all preset factors.
In an optional embodiment, the obtaining target vehicle insurance data and obtaining a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, where the target vehicle insurance data includes factor values of all preset factors, includes:
acquiring target vehicle insurance data, wherein the target vehicle insurance data comprise factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
taking a root node of a cost decision tree in the vehicle insurance cost prediction model as a starting point, and sequentially matching all directed edges of the nodes in the cost decision tree with factor values of category factors in the target vehicle insurance data to obtain a classification result, wherein the classification result is a leaf node corresponding to the target vehicle insurance data;
taking the cost sub-model corresponding to the leaf node in the classification result as a target sub-model;
and inputting the factor values of the numerical type factors in the target vehicle insurance data into the target sub-model to calculate the cost prediction result of the target vehicle insurance data.
According to the technical scheme, all category type factors and numerical type factors influencing the vehicle insurance cost can be comprehensively considered, the category type factors are used for constructing the decision tree to obtain the cost submodel corresponding to the judgment logic of each category type factor, an accurate vehicle insurance cost prediction result is obtained based on the cost submodel and the numerical type factors, and the accuracy of vehicle insurance cost prediction is improved.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1 comprises a memory 12 and a processor 13. The memory 12 is used for storing computer readable instructions, and the processor 13 is used for executing the computer readable instructions stored in the memory to implement the method for constructing the artificial intelligence based vehicle insurance cost prediction model according to any one of the above embodiments.
In an alternative embodiment, the electronic device 1 further comprises a bus and a computer program that is stored in the memory 12 and executable on the processor 13, such as a construction program for the artificial intelligence-based vehicle insurance cost prediction model.
Fig. 5 shows only the electronic device 1 with the memory 12 and the processor 13, and it will be understood by those skilled in the art that the structure shown in fig. 5 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
Referring to fig. 1, the memory 12 of the electronic device 1 stores a plurality of computer readable instructions to implement a method for constructing an artificial intelligence based vehicle insurance cost prediction model, and the processor 13 can execute the plurality of instructions to implement:
storing all historical cost data to construct a cost data set, wherein the historical cost data comprises actual cost and factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
constructing a cost decision tree based on the actual cost of each piece of historical cost data in the cost data set and the factor values of all category factors, wherein the cost decision tree comprises a plurality of leaf nodes;
classifying all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets, wherein the cost data subsets correspond to leaf nodes of the cost decision tree one to one;
fitting a cost sub-model for each leaf node based on the actual cost of each historical cost data in the cost data subset and the factor values of all numerical type factors;
combining the cost decision tree and the cost sub-model of each leaf node together to form the vehicle insurance cost prediction model;
and acquiring target vehicle insurance data, and acquiring a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, wherein the target vehicle insurance data comprises factor values of all preset factors.
Specifically, for the implementation of the above instructions by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
It will be understood by those skilled in the art that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation of it. The electronic device 1 may have a bus-type structure or a star-type structure, and may include more or less hardware or software than shown in the figure, or a different arrangement of components; for example, the electronic device 1 may further include an input and output device, a network access device, and the like.
It should be noted that the electronic device 1 is only an example; other existing or future electronic products that can be adapted to the present application also fall within the scope of protection of the present application and are incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium, which may be non-volatile or volatile. The readable storage medium includes flash memory, removable hard disks, multimedia cards, card type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments the memory 12 may be an internal storage unit of the electronic device 1, e.g. a removable hard disk of the electronic device 1. In other embodiments the memory 12 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device 1. The memory 12 can be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of a construction program for the artificial intelligence based vehicle insurance cost prediction model, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 13 may be composed of integrated circuits, for example a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 13 is the control unit of the electronic device 1; it connects the components of the whole electronic device 1 by using various interfaces and lines, and executes the functions of the electronic device 1 and processes its data by running or executing programs or modules stored in the memory 12 (for example, a construction program for the artificial intelligence based vehicle insurance cost prediction model) and calling data stored in the memory 12.
The processor 13 executes an operating system of the electronic device 1 and various installed application programs. The processor 13 executes the application program to implement the steps in each embodiment of the method for constructing the artificial intelligence based vehicle insurance cost prediction model, such as the steps shown in fig. 1.
Illustratively, the computer program may be partitioned into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the electronic device 1. For example, the computer program may be divided into a storage unit 110, a construction unit 111, a classification unit 112, a fitting unit 113, a combination unit 114, a prediction unit 115.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a computer device, or a network device, etc.) or a Processor (Processor) to execute part of the method for constructing the artificial intelligence based vehicle insurance cost prediction model according to the embodiments of the present application.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow in the method of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the steps of the embodiments of the methods described above may be implemented.
The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), and other memories.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus. The bus is arranged to enable connection communication between the memory 12 and at least one processor 13 etc.
The embodiment of the present application further provides a computer-readable storage medium (not shown), in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the method for constructing an artificial intelligence based vehicle insurance cost prediction model according to any of the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; the division of the modules is only one logical functional division, and other divisions are possible in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present application and not for limiting, and although the present application is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

Claims (10)

1. A construction method of an artificial intelligence-based vehicle insurance cost prediction model is characterized by comprising the following steps:
storing all historical cost data to construct a cost data set, wherein the historical cost data comprises actual cost and factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
constructing a cost decision tree based on the actual cost of each piece of historical cost data in the cost data set and the factor values of all category factors, wherein the cost decision tree comprises a plurality of leaf nodes;
classifying all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets, wherein the cost data subsets correspond to leaf nodes of the cost decision tree in a one-to-one mode;
fitting a cost sub-model for each leaf node based on the actual cost of each historical cost data in the cost data subset and the factor values of all numerical type factors;
combining the cost decision tree and the cost sub-model of each leaf node together to form the vehicle insurance cost prediction model;
and acquiring target vehicle insurance data, and acquiring a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, wherein the target vehicle insurance data comprises factor values of all preset factors.
2. The method for constructing an artificial intelligence based vehicle insurance cost prediction model according to claim 1, wherein the step of storing all historical cost data to construct a cost data set, the historical cost data including actual cost and factor values of all preset factors, the preset factors including category type factors and numerical type factors includes:
acquiring all preset factors influencing the vehicle insurance cost, and classifying all the preset factors based on the value range of the preset factors to obtain a classification result, wherein the classification result comprises a category factor and a numerical factor;
if the value range of the preset factor is continuous, the classification result is a numerical type factor, and the numerical type factors comprise a traffic violation coefficient, a no-claim discount coefficient, a channel discount coefficient and an independent underwriting coefficient;
if the value range of the preset factor is discrete, the classification result is a category type factor, and the category type factors comprise preset vehicle types, insurance types, vehicle purposes and vehicle types;
acquiring factor values of all preset factors and corresponding actual costs in each historical vehicle insurance policy in the historical vehicle insurance data as a piece of historical cost data;
storing all historical cost data to construct the cost data set.
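The classification of preset factors described in claim 2 can be sketched as follows. This is only an illustrative sketch: it assumes that a discrete value range is given as a set of allowed values and a continuous value range as a (low, high) interval. The factor names, the ranges and the policy fields are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass

# Hypothetical value ranges: a set means a discrete range, a (low, high)
# tuple means a continuous range. None of these values come from the patent.
PRESET_FACTORS = {
    "vehicle_use": {"private", "commercial"},
    "vehicle_type": {"sedan", "suv"},
    "traffic_violation_coeff": (0.9, 1.5),
    "underwriting_coeff": (0.5, 2.0),
}


def classify_factors(value_ranges):
    """Split factors into category type (discrete) and numerical type (continuous)."""
    category, numerical = [], []
    for name, value_range in value_ranges.items():
        if isinstance(value_range, (set, frozenset)):
            category.append(name)
        else:
            numerical.append(name)
    return category, numerical


@dataclass
class HistoricalCostRecord:
    """One piece of historical cost data: factor values plus the actual cost."""
    category_values: dict
    numerical_values: dict
    actual_cost: float


def build_cost_data_set(policies, value_ranges):
    """Turn raw policies (dicts with a hypothetical 'cost' key) into a cost data set."""
    category, numerical = classify_factors(value_ranges)
    return [
        HistoricalCostRecord(
            category_values={f: p[f] for f in category},
            numerical_values={f: p[f] for f in numerical},
            actual_cost=p["cost"],
        )
        for p in policies
    ]


policies = [
    {"vehicle_use": "private", "vehicle_type": "sedan",
     "traffic_violation_coeff": 1.0, "underwriting_coeff": 0.7, "cost": 3200.0},
]
cost_data_set = build_cost_data_set(policies, PRESET_FACTORS)
```

Keeping the category type and numerical type values in separate fields mirrors the later steps, in which the decision tree uses only the former and the leaf sub-models use only the latter.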
3. The method of constructing an artificial intelligence based vehicle insurance cost prediction model according to claim 1, wherein the constructing a cost decision tree based on the actual cost of each piece of historical cost data in the cost data set and the factor values of all category type factors, the cost decision tree including a plurality of leaf nodes, comprises:
A1, acquiring the actual cost of each piece of historical cost data in the cost data set and the factor values of all category type factors to construct a category type factor data set;
A2, calculating the information gain of each category type factor based on the category type factor data set;
A3, selecting the category type factor corresponding to the maximum value of the information gain as a target factor, and obtaining a newly added node by using the target factor as node information;
A4, drawing a preset number of directed edges by taking the newly added node as a starting point, wherein the preset number is the same as the number of factor values in the value range of the target factor, and the directed edges correspond to the factor values of the target factor one to one;
A5, screening all historical cost data in the category type factor data set based on the factor values corresponding to the directed edges to obtain a data subset of each directed edge;
A6, calculating the information entropy of the actual cost in the data subset of a target directed edge as a target cost entropy, wherein the target directed edge is any one of all the directed edges; if the target cost entropy is smaller than a preset entropy value, generating an empty node as the end point of the target directed edge; if the target cost entropy is not smaller than the preset entropy value, deleting all factor values of the target factor from the category type factor data set to obtain an updated category type factor data set, repeating step A2 to step A3 to obtain a non-empty newly added node, and taking the newly added node as the end point of the target directed edge;
A7, traversing all the directed edges to obtain the end point of each directed edge, and if the end points of all the directed edges are empty nodes, completing the construction of the cost decision tree; if the end point of at least one directed edge is a non-empty newly added node, executing step A4 to step A7 for each non-empty newly added node until the construction of the cost decision tree is completed.
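Steps A1 to A7 can be sketched as a recursive procedure. The sketch below rests on several assumptions that are not stated in the claim: the entropy uses a base-2 logarithm, edges are drawn only for factor values actually observed in the data (an edge with an empty data subset would have no defined entropy), and construction also stops when no category type factor remains. The record layout (a dict with a `cost` key) and the names `build_tree` and `max_entropy` are hypothetical.

```python
import math
from collections import Counter


def entropy(costs):
    """Information entropy of a list of actual costs (base-2 log assumed)."""
    total = len(costs)
    return -sum(n / total * math.log2(n / total) for n in Counter(costs).values())


def gain(records, factor):
    """Information gain obtained by splitting the records on one factor."""
    groups = {}
    for r in records:
        groups.setdefault(r[factor], []).append(r["cost"])
    all_costs = [r["cost"] for r in records]
    return entropy(all_costs) - sum(
        len(g) / len(all_costs) * entropy(g) for g in groups.values())


def build_tree(records, factors, max_entropy):
    """Return a nested dict {"factor": name, "edges": {value: subtree}};
    a leaf (the empty node of step A6) is represented by None."""
    # A2-A3: the factor with the largest information gain labels the node.
    target = max(factors, key=lambda f: gain(records, f))
    remaining = [f for f in factors if f != target]
    node = {"factor": target, "edges": {}}
    # A4-A5: one directed edge, and one data subset, per observed factor value.
    for value in sorted({r[target] for r in records}):
        subset = [r for r in records if r[target] == value]
        # A6: stop when the costs on this edge are homogeneous enough.
        # Assumption: also stop when no category type factor is left.
        if entropy([r["cost"] for r in subset]) < max_entropy or not remaining:
            node["edges"][value] = None
        else:
            node["edges"][value] = build_tree(subset, remaining, max_entropy)
    return node


records = [
    {"vehicle_use": "private", "vehicle_type": "sedan", "cost": 100},
    {"vehicle_use": "private", "vehicle_type": "suv", "cost": 100},
    {"vehicle_use": "commercial", "vehicle_type": "sedan", "cost": 200},
    {"vehicle_use": "commercial", "vehicle_type": "suv", "cost": 300},
]
tree = build_tree(records, ["vehicle_use", "vehicle_type"], max_entropy=0.5)
```

In this toy data set `vehicle_use` has the larger gain and becomes the root; the `private` edge ends in a leaf because its costs are identical, while the `commercial` edge is split again on `vehicle_type`. The recursion plays the role of the repeated execution of steps A4 to A7 for each non-empty newly added node.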
4. The method of constructing an artificial intelligence based vehicle insurance cost prediction model according to claim 3, wherein said calculating the information gain of each category type factor based on the category type factor data set comprises:
calculating the information entropy of the actual cost in the category type factor data set, wherein the information entropy of the actual cost satisfies the relation:
$$E(X) = -\sum_{i=\min(X)}^{\max(X)} \frac{N_i(X)}{N(X)} \log \frac{N_i(X)}{N(X)}$$
wherein X represents the category type factor data set, min(X) is the minimum value of the actual cost in X, max(X) is the maximum value of the actual cost in X, N(X) is the number of all actual costs in X, N_i(X) is the number of actual costs equal to i in X, and E(X) is the information entropy of the actual cost in X;
selecting, from the category type factor data set, all historical cost data having the same factor value of a target category type factor to obtain a data subset for each factor value of the target category type factor, wherein the target category type factor is any one of all the category type factors;
calculating the information gain of the target category type factor based on the information entropy of the actual cost in the category type factor data set and in the data subset of each factor value of the target category type factor, the information gain satisfying the relation:
$$\mathrm{Gain}(M) = E(X) - \sum_{k=1}^{v} \frac{N(M_k)}{N(X)} E(M_k)$$
wherein X is the category type factor data set, E(X) is the information entropy of the actual cost in X, v is the number of all factor values in the value range of the target category type factor M, M_k is the data subset of the k-th factor value of M, E(M_k) is the information entropy of the actual cost in M_k, N(X) is the number of all actual costs in X, N(M_k) is the number of all actual costs in M_k, and Gain(M) is the information gain of the target category type factor M;
traversing all the category factors yields the information gain of each category factor.
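As a worked illustration of the entropy and information gain of claim 4, the following sketch computes both quantities for a four-record data set. It assumes the standard Shannon form with a base-2 logarithm and treats each distinct actual cost as one value i; the record layout and the function names are hypothetical.

```python
import math
from collections import Counter


def cost_entropy(costs):
    """E(X): entropy of the actual costs, with N_i(X) counted by Counter."""
    total = len(costs)
    counts = Counter(costs)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records, factor):
    """Gain(M): E(X) minus the size-weighted entropy of each subset M_k
    that shares one factor value of `factor`."""
    costs = [r["cost"] for r in records]
    subsets = {}
    for r in records:
        subsets.setdefault(r[factor], []).append(r["cost"])
    weighted = sum(len(s) / len(costs) * cost_entropy(s) for s in subsets.values())
    return cost_entropy(costs) - weighted


records = [
    {"vehicle_use": "private", "vehicle_type": "sedan", "cost": 100},
    {"vehicle_use": "private", "vehicle_type": "suv", "cost": 100},
    {"vehicle_use": "commercial", "vehicle_type": "sedan", "cost": 200},
    {"vehicle_use": "commercial", "vehicle_type": "suv", "cost": 200},
]
gains = {f: information_gain(records, f) for f in ("vehicle_use", "vehicle_type")}
```

Here the costs 100 and 200 each occur twice, so E(X) is 1 bit. Splitting on `vehicle_use` separates the two costs completely (gain 1), whereas splitting on `vehicle_type` leaves each subset as mixed as the whole set (gain 0), so `vehicle_use` would be selected as the target factor.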
5. The method of constructing an artificial intelligence based vehicle insurance cost prediction model according to claim 1, wherein the classifying all historical cost data in the cost data set based on the cost decision tree into a plurality of cost data subsets comprises:
randomly selecting a piece of historical cost data from the cost data set as target data, wherein the target data comprises actual cost and factor values of all preset factors;
sequentially matching all directed edges of nodes in the cost decision tree and factor values of category factors in the target data by taking a root node of the cost decision tree as a starting point to obtain a classification result, wherein the classification result is a leaf node corresponding to the target data;
storing the target data in a cost data subset of leaf nodes in the classification result;
traversing all historical cost data in the cost data set to continually update a subset of cost data for each leaf node;
when all the historical cost data is traversed, the cost data subset of each leaf node is obtained.
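The classification of claim 5 can be sketched as a walk from the root to a leaf followed by grouping. The tree literal, the leaf identifiers and the record fields below are hypothetical; since every record is eventually visited, the order of traversal (random in the claim) does not affect the resulting subsets.

```python
from collections import defaultdict

# Hypothetical cost decision tree: inner nodes name a category type factor and
# map each factor value (a directed edge) to a child; strings are leaf ids.
TREE = {
    "factor": "vehicle_use",
    "edges": {
        "private": "leaf_private",
        "commercial": {
            "factor": "vehicle_type",
            "edges": {"sedan": "leaf_commercial_sedan", "suv": "leaf_commercial_suv"},
        },
    },
}


def find_leaf(tree, record):
    """Follow, from the root, the edge matching the record's factor value."""
    node = tree
    while isinstance(node, dict):
        node = node["edges"][record[node["factor"]]]
    return node


def split_by_leaf(tree, records):
    """Group every historical cost record under the leaf node it reaches."""
    subsets = defaultdict(list)
    for record in records:
        subsets[find_leaf(tree, record)].append(record)
    return dict(subsets)


records = [
    {"vehicle_use": "private", "vehicle_type": "suv", "cost": 100},
    {"vehicle_use": "commercial", "vehicle_type": "sedan", "cost": 200},
    {"vehicle_use": "commercial", "vehicle_type": "suv", "cost": 300},
    {"vehicle_use": "private", "vehicle_type": "sedan", "cost": 100},
]
cost_data_subsets = split_by_leaf(TREE, records)
```

A factor value with no matching edge raises a `KeyError` in this sketch; how such records should be handled is not specified in the claim.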
6. The method of constructing an artificial intelligence based vehicle insurance cost prediction model of claim 1, wherein said fitting a cost sub-model for each leaf node based on actual cost of each piece of historical cost data in the subset of cost data and factor values of all numerical type factors comprises:
acquiring the actual cost of each piece of historical cost data in the cost data subset and factor values of all numerical type factors to construct a numerical type factor data set of each leaf node;
fitting an initial cost sub-model based on the numerical type factor data set of a target leaf node to obtain the cost sub-model of the target leaf node, wherein the target leaf node is any one of all the leaf nodes, and the initial cost sub-model satisfies the relation:
$$C = \sum_{j=1}^{Num} w_j Z_j$$
where Num is the number of all numerical type factors, w_j is the influence factor of the j-th numerical type factor, Z_j is the value of the j-th numerical type factor, and C is the predicted cost of the initial cost sub-model, w_j being an undetermined coefficient;
the numerical type factor data sets for all leaf nodes are traversed to obtain a cost sub-model for each leaf node.
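One possible way to determine the coefficients w_j of claim 6 is an ordinary least-squares fit. The sketch below assumes that the initial cost sub-model is a weighted sum of the numerical type factor values without an intercept, and that the fit minimizes the squared error; neither the fitting criterion nor the solver is specified in the claim. The function names and the sample values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            ratio = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= ratio * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cost_submodel(factor_rows, costs):
    """Least-squares weights w for C = sum_j w[j] * Z[j] (normal equations)."""
    num = len(factor_rows[0])
    ata = [[sum(row[i] * row[j] for row in factor_rows) for j in range(num)]
           for i in range(num)]
    atb = [sum(row[i] * c for row, c in zip(factor_rows, costs)) for i in range(num)]
    return solve_linear_system(ata, atb)


# Hypothetical numerical type factor data set of one leaf node: each row holds
# (traffic violation coefficient, underwriting coefficient) of one record.
leaf_rows = [[1.0, 0.5], [1.2, 1.0], [0.9, 0.8]]
leaf_costs = [2000.0, 3200.0, 2500.0]
weights = fit_cost_submodel(leaf_rows, leaf_costs)
```

The sample costs were generated with weights 1000 and 2000, so the fit recovers approximately those values. Repeating the call for the data set of every leaf node yields one cost sub-model per leaf; a leaf whose rows are linearly dependent would make the normal equations singular and would need separate handling.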
7. The method for constructing the artificial intelligence-based vehicle insurance cost prediction model according to claim 1, wherein the acquiring of target vehicle insurance data and the obtaining of a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, the target vehicle insurance data comprising factor values of all preset factors, comprises:
acquiring target vehicle insurance data, wherein the target vehicle insurance data comprise factor values of all preset factors, and the preset factors comprise category type factors and numerical type factors;
taking a root node of a cost decision tree in the vehicle insurance cost prediction model as a starting point, and sequentially matching all directed edges of the nodes in the cost decision tree with factor values of category factors in the target vehicle insurance data to obtain a classification result, wherein the classification result is a leaf node corresponding to the target vehicle insurance data;
taking the cost submodel corresponding to the leaf node in the classification result as a target submodel;
and inputting the factor values of the numerical type factors in the target vehicle insurance data into the target sub-model to calculate the cost prediction result of the target vehicle insurance data.
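The prediction of claim 7 combines the two parts of the model: the category type factor values select a leaf node, and the numerical type factor values are fed into that leaf's cost sub-model. The sketch below assumes a linear sub-model stored as a list of weights at each leaf; the model literal, the factor names and the weights are hypothetical.

```python
# Hypothetical fitted model: a cost decision tree over category type factors
# whose leaves hold the weights of a linear cost sub-model.
NUMERICAL_FACTORS = ["traffic_violation_coeff", "underwriting_coeff"]
MODEL = {
    "factor": "vehicle_use",
    "edges": {
        "private": {"weights": [1000.0, 2000.0]},
        "commercial": {"weights": [1500.0, 2500.0]},
    },
}


def predict_cost(model, policy):
    """Walk from the root to a leaf, then evaluate the leaf's sub-model."""
    node = model
    while "weights" not in node:
        # Match the policy's category type factor value against the edges.
        node = node["edges"][policy[node["factor"]]]
    # The target sub-model: weighted sum of the numerical type factor values.
    return sum(w * policy[f] for w, f in zip(node["weights"], NUMERICAL_FACTORS))


private_policy = {"vehicle_use": "private",
                  "traffic_violation_coeff": 1.0, "underwriting_coeff": 0.5}
commercial_policy = {"vehicle_use": "commercial",
                     "traffic_violation_coeff": 1.5, "underwriting_coeff": 1.0}
private_cost = predict_cost(MODEL, private_policy)        # 2000.0
commercial_cost = predict_cost(MODEL, commercial_policy)  # 4750.0
```

Two policies with identical numerical type factor values can therefore receive different predictions if their category type factor values lead to different leaf nodes.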
8. An apparatus for constructing an artificial intelligence-based vehicle insurance cost prediction model, the apparatus comprising:
a storage unit, configured to store all historical cost data to construct a cost data set, where the historical cost data includes actual cost and factor values of all preset factors, and the preset factors include category type factors and numerical type factors;
a construction unit, configured to construct a cost decision tree based on an actual cost of each piece of historical cost data in the cost data set and factor values of all category factors, where the cost decision tree includes a plurality of leaf nodes;
a classification unit, configured to classify all historical cost data in the cost data set based on the cost decision tree to obtain a plurality of cost data subsets, where the cost data subsets correspond to leaf nodes of the cost decision tree one to one;
a fitting unit for fitting a cost sub-model of each leaf node based on the actual cost of each piece of historical cost data in the cost data subset and the factor values of all numerical type factors;
a combination unit for combining the cost decision tree and the cost sub-model of each leaf node together to form the vehicle insurance cost prediction model;
and the prediction unit is used for acquiring target vehicle insurance data and obtaining a cost prediction result of the target vehicle insurance data based on the vehicle insurance cost prediction model, wherein the target vehicle insurance data comprises factor values of all preset factors.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the method of constructing an artificial intelligence based vehicle insurance cost prediction model as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which when executed by a processor implement the method of constructing an artificial intelligence based vehicle insurance cost prediction model according to any one of claims 1 to 7.
CN202210951208.6A 2022-08-09 2022-08-09 Construction method of vehicle insurance cost prediction model based on artificial intelligence and related equipment Pending CN115293808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210951208.6A CN115293808A (en) 2022-08-09 2022-08-09 Construction method of vehicle insurance cost prediction model based on artificial intelligence and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210951208.6A CN115293808A (en) 2022-08-09 2022-08-09 Construction method of vehicle insurance cost prediction model based on artificial intelligence and related equipment

Publications (1)

Publication Number Publication Date
CN115293808A true CN115293808A (en) 2022-11-04

Family

ID=83828009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210951208.6A Pending CN115293808A (en) 2022-08-09 2022-08-09 Construction method of vehicle insurance cost prediction model based on artificial intelligence and related equipment

Country Status (1)

Country Link
CN (1) CN115293808A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination